All of lore.kernel.org
 help / color / mirror / Atom feed
* Ceph Hackathon: More Memory Allocator Testing
@ 2015-08-19  4:45 Mark Nelson
  2015-08-19  5:13 ` Shinobu Kinjo
                   ` (4 more replies)
  0 siblings, 5 replies; 55+ messages in thread
From: Mark Nelson @ 2015-08-19  4:45 UTC (permalink / raw)
  To: ceph-devel

Hi Everyone,

One of the goals at the Ceph Hackathon last week was to examine how to 
improve Ceph Small IO performance.  Jian Zhang presented findings 
showing a dramatic improvement in small random IO performance when Ceph 
is used with jemalloc.  His results build upon Sandisk's original 
findings that the default thread cache values are a major bottleneck in 
TCMalloc 2.1.  To further verify these results, we sat down at the 
Hackathon and configured the new performance test cluster that Intel 
generously donated to the Ceph community laboratory to run through a 
variety of tests with different memory allocator configurations.  I've 
since written the results of those tests up in pdf form for folks who 
are interested.

The results are located here:

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf

I want to be clear that many other folks have done the heavy lifting 
here.  These results are simply a validation of the many tests that 
other folks have already done.  Many thanks to Sandisk and others for 
figuring this out as it's a pretty big deal!

Side note:  Very little tuning other than swapping the memory allocator 
and a couple of quick and dirty ceph tunables were set during these 
tests. It's quite possible that higher IOPS will be achieved as we 
really start digging into the cluster and learning what the bottlenecks are.

Thanks,
Mark

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19  4:45 Ceph Hackathon: More Memory Allocator Testing Mark Nelson
@ 2015-08-19  5:13 ` Shinobu Kinjo
  2015-08-19  5:36 ` Somnath Roy
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 55+ messages in thread
From: Shinobu Kinjo @ 2015-08-19  5:13 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-devel

Thank you.
That's pretty interesting to me.

 Shinobu

----- Original Message -----
From: "Mark Nelson" <mnelson@redhat.com>
To: "ceph-devel" <ceph-devel@vger.kernel.org>
Sent: Wednesday, August 19, 2015 1:45:36 PM
Subject: Ceph Hackathon: More Memory Allocator Testing

Hi Everyone,

One of the goals at the Ceph Hackathon last week was to examine how to 
improve Ceph Small IO performance.  Jian Zhang presented findings 
showing a dramatic improvement in small random IO performance when Ceph 
is used with jemalloc.  His results build upon Sandisk's original 
findings that the default thread cache values are a major bottleneck in 
TCMalloc 2.1.  To further verify these results, we sat down at the 
Hackathon and configured the new performance test cluster that Intel 
generously donated to the Ceph community laboratory to run through a 
variety of tests with different memory allocator configurations.  I've 
since written the results of those tests up in pdf form for folks who 
are interested.

The results are located here:

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf

I want to be clear that many other folks have done the heavy lifting 
here.  These results are simply a validation of the many tests that 
other folks have already done.  Many thanks to Sandisk and others for 
figuring this out as it's a pretty big deal!

Side note:  Very little tuning other than swapping the memory allocator 
and a couple of quick and dirty ceph tunables were set during these 
tests. It's quite possible that higher IOPS will be achieved as we 
really start digging into the cluster and learning what the bottlenecks are.

Thanks,
Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19  4:45 Ceph Hackathon: More Memory Allocator Testing Mark Nelson
  2015-08-19  5:13 ` Shinobu Kinjo
@ 2015-08-19  5:36 ` Somnath Roy
  2015-08-19  8:07   ` Haomai Wang
  2015-08-19 12:10   ` Mark Nelson
  2015-08-19  6:33 ` Stefan Priebe - Profihost AG
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 55+ messages in thread
From: Somnath Roy @ 2015-08-19  5:36 UTC (permalink / raw)
  To: Mark Nelson, ceph-devel

Mark,
Thanks for verifying this. Nice report !
Since there is a big difference in memory consumption with jemalloc, I would say a recovery performance data or client performance data during recovery would be helpful.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Tuesday, August 18, 2015 9:46 PM
To: ceph-devel
Subject: Ceph Hackathon: More Memory Allocator Testing

Hi Everyone,

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance.  Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc.  His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1.  To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations.  I've since written the results of those tests up in pdf form for folks who are interested.

The results are located here:

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf

I want to be clear that many other folks have done the heavy lifting here.  These results are simply a validation of the many tests that other folks have already done.  Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!

Side note:  Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.

Thanks,
Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19  4:45 Ceph Hackathon: More Memory Allocator Testing Mark Nelson
  2015-08-19  5:13 ` Shinobu Kinjo
  2015-08-19  5:36 ` Somnath Roy
@ 2015-08-19  6:33 ` Stefan Priebe - Profihost AG
  2015-08-19 12:20   ` Mark Nelson
  2015-08-19 14:01 ` Alexandre DERUMIER
  2015-08-19 20:50 ` Zhang, Jian
  4 siblings, 1 reply; 55+ messages in thread
From: Stefan Priebe - Profihost AG @ 2015-08-19  6:33 UTC (permalink / raw)
  To: Mark Nelson, ceph-devel


Thanks for sharing. Do those tests use jemalloc for fio too? Otherwise
librbd on client side is running with tcmalloc again.

Stefan

Am 19.08.2015 um 06:45 schrieb Mark Nelson:
> Hi Everyone,
> 
> One of the goals at the Ceph Hackathon last week was to examine how to
> improve Ceph Small IO performance.  Jian Zhang presented findings
> showing a dramatic improvement in small random IO performance when Ceph
> is used with jemalloc.  His results build upon Sandisk's original
> findings that the default thread cache values are a major bottleneck in
> TCMalloc 2.1.  To further verify these results, we sat down at the
> Hackathon and configured the new performance test cluster that Intel
> generously donated to the Ceph community laboratory to run through a
> variety of tests with different memory allocator configurations.  I've
> since written the results of those tests up in pdf form for folks who
> are interested.
> 
> The results are located here:
> 
> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf
> 
> I want to be clear that many other folks have done the heavy lifting
> here.  These results are simply a validation of the many tests that
> other folks have already done.  Many thanks to Sandisk and others for
> figuring this out as it's a pretty big deal!
> 
> Side note:  Very little tuning other than swapping the memory allocator
> and a couple of quick and dirty ceph tunables were set during these
> tests. It's quite possible that higher IOPS will be achieved as we
> really start digging into the cluster and learning what the bottlenecks
> are.
> 
> Thanks,
> Mark
> -- 
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19  5:36 ` Somnath Roy
@ 2015-08-19  8:07   ` Haomai Wang
  2015-08-19  9:06     ` Shinobu Kinjo
  2015-08-19 12:17     ` Mark Nelson
  2015-08-19 12:10   ` Mark Nelson
  1 sibling, 2 replies; 55+ messages in thread
From: Haomai Wang @ 2015-08-19  8:07 UTC (permalink / raw)
  To: Somnath Roy; +Cc: Mark Nelson, ceph-devel

On Wed, Aug 19, 2015 at 1:36 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> Mark,
> Thanks for verifying this. Nice report !
> Since there is a big difference in memory consumption with jemalloc, I would say a recovery performance data or client performance data during recovery would be helpful.
>

The RSS memory usage in the report is per OSD I guess(really?). It
can't be ignored since it's really a great improvement memory usage.

> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Tuesday, August 18, 2015 9:46 PM
> To: ceph-devel
> Subject: Ceph Hackathon: More Memory Allocator Testing
>
> Hi Everyone,
>
> One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance.  Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc.  His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1.  To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations.  I've since written the results of those tests up in pdf form for folks who are interested.
>
> The results are located here:
>
> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf
>
> I want to be clear that many other folks have done the heavy lifting here.  These results are simply a validation of the many tests that other folks have already done.  Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!
>
> Side note:  Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.
>
> Thanks,
> Mark
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> ________________________________
>
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>



-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19  8:07   ` Haomai Wang
@ 2015-08-19  9:06     ` Shinobu Kinjo
  2015-08-19 12:17     ` Mark Nelson
  1 sibling, 0 replies; 55+ messages in thread
From: Shinobu Kinjo @ 2015-08-19  9:06 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Somnath Roy, Mark Nelson, ceph-devel

yes, that's true.

----- Original Message -----
From: "Haomai Wang" <haomaiwang@gmail.com>
To: "Somnath Roy" <Somnath.Roy@sandisk.com>
Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Sent: Wednesday, August 19, 2015 5:07:53 PM
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

On Wed, Aug 19, 2015 at 1:36 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> Mark,
> Thanks for verifying this. Nice report !
> Since there is a big difference in memory consumption with jemalloc, I would say a recovery performance data or client performance data during recovery would be helpful.
>

The RSS memory usage in the report is per OSD I guess(really?). It
can't be ignored since it's really a great improvement memory usage.

> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Tuesday, August 18, 2015 9:46 PM
> To: ceph-devel
> Subject: Ceph Hackathon: More Memory Allocator Testing
>
> Hi Everyone,
>
> One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance.  Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc.  His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1.  To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations.  I've since written the results of those tests up in pdf form for folks who are interested.
>
> The results are located here:
>
> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf
>
> I want to be clear that many other folks have done the heavy lifting here.  These results are simply a validation of the many tests that other folks have already done.  Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!
>
> Side note:  Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.
>
> Thanks,
> Mark
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> ________________________________
>
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>



-- 
Best Regards,

Wheat
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19  5:36 ` Somnath Roy
  2015-08-19  8:07   ` Haomai Wang
@ 2015-08-19 12:10   ` Mark Nelson
  1 sibling, 0 replies; 55+ messages in thread
From: Mark Nelson @ 2015-08-19 12:10 UTC (permalink / raw)
  To: Somnath Roy, ceph-devel

Yes, I agree.  I think that's the next step.  Half of the cluster is 
being used this week for QOS testing, but I may be able to examine this 
on the other half of the cluster, or wait until next week when I can get 
the whole cluster back together.

Mark

On 08/19/2015 12:36 AM, Somnath Roy wrote:
> Mark,
> Thanks for verifying this. Nice report !
> Since there is a big difference in memory consumption with jemalloc, I would say a recovery performance data or client performance data during recovery would be helpful.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Tuesday, August 18, 2015 9:46 PM
> To: ceph-devel
> Subject: Ceph Hackathon: More Memory Allocator Testing
>
> Hi Everyone,
>
> One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance.  Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc.  His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1.  To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations.  I've since written the results of those tests up in pdf form for folks who are interested.
>
> The results are located here:
>
> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf
>
> I want to be clear that many other folks have done the heavy lifting here.  These results are simply a validation of the many tests that other folks have already done.  Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!
>
> Side note:  Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.
>
> Thanks,
> Mark
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> ________________________________
>
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19  8:07   ` Haomai Wang
  2015-08-19  9:06     ` Shinobu Kinjo
@ 2015-08-19 12:17     ` Mark Nelson
  2015-08-19 12:36       ` Dałek, Piotr
  1 sibling, 1 reply; 55+ messages in thread
From: Mark Nelson @ 2015-08-19 12:17 UTC (permalink / raw)
  To: Haomai Wang, Somnath Roy; +Cc: ceph-devel



On 08/19/2015 03:07 AM, Haomai Wang wrote:
> On Wed, Aug 19, 2015 at 1:36 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>> Mark,
>> Thanks for verifying this. Nice report !
>> Since there is a big difference in memory consumption with jemalloc, I would say a recovery performance data or client performance data during recovery would be helpful.
>>
>
> The RSS memory usage in the report is per OSD I guess(really?). It
> can't be ignored since it's really a great improvement memory usage.

Do you mean with tcmalloc?  I think it's a tough decision.  For 
jemalloc, 300MB more of RSS per OSD does add up (about 18GB for 60 
OSDs).  On the other hand, the cost of memory is such a small fraction 
of the overall cost of systems like this that it might be worth it to 
switch over anyway.  In the 4K write tests it's pretty clear that even 
with 128MB TC, TCMalloc is suffering and jemalloc appears to still have 
headroom left.  It's possible that bumping the thread cache even higher 
might help TCMalloc close the gap though.  It's also possible that 
jemalloc might have worse memory behavior under recovery scenarios as we 
discussed at the hackathon (And Somnath mentioned above), so I think we 
probably need to run the tests.

>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
>> Sent: Tuesday, August 18, 2015 9:46 PM
>> To: ceph-devel
>> Subject: Ceph Hackathon: More Memory Allocator Testing
>>
>> Hi Everyone,
>>
>> One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance.  Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc.  His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1.  To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations.  I've since written the results of those tests up in pdf form for folks who are interested.
>>
>> The results are located here:
>>
>> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf
>>
>> I want to be clear that many other folks have done the heavy lifting here.  These results are simply a validation of the many tests that other folks have already done.  Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!
>>
>> Side note:  Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.
>>
>> Thanks,
>> Mark
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> ________________________________
>>
>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>
>
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19  6:33 ` Stefan Priebe - Profihost AG
@ 2015-08-19 12:20   ` Mark Nelson
  0 siblings, 0 replies; 55+ messages in thread
From: Mark Nelson @ 2015-08-19 12:20 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, ceph-devel

Nope!  So in this case it's just server side.

On 08/19/2015 01:33 AM, Stefan Priebe - Profihost AG wrote:
>
> Thanks for sharing. Do those tests use jemalloc for fio too? Otherwise
> librbd on client side is running with tcmalloc again.
>
> Stefan
>
> Am 19.08.2015 um 06:45 schrieb Mark Nelson:
>> Hi Everyone,
>>
>> One of the goals at the Ceph Hackathon last week was to examine how to
>> improve Ceph Small IO performance.  Jian Zhang presented findings
>> showing a dramatic improvement in small random IO performance when Ceph
>> is used with jemalloc.  His results build upon Sandisk's original
>> findings that the default thread cache values are a major bottleneck in
>> TCMalloc 2.1.  To further verify these results, we sat down at the
>> Hackathon and configured the new performance test cluster that Intel
>> generously donated to the Ceph community laboratory to run through a
>> variety of tests with different memory allocator configurations.  I've
>> since written the results of those tests up in pdf form for folks who
>> are interested.
>>
>> The results are located here:
>>
>> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf
>>
>> I want to be clear that many other folks have done the heavy lifting
>> here.  These results are simply a validation of the many tests that
>> other folks have already done.  Many thanks to Sandisk and others for
>> figuring this out as it's a pretty big deal!
>>
>> Side note:  Very little tuning other than swapping the memory allocator
>> and a couple of quick and dirty ceph tunables were set during these
>> tests. It's quite possible that higher IOPS will be achieved as we
>> really start digging into the cluster and learning what the bottlenecks
>> are.
>>
>> Thanks,
>> Mark
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 12:17     ` Mark Nelson
@ 2015-08-19 12:36       ` Dałek, Piotr
  2015-08-19 12:44         ` Mark Nelson
  0 siblings, 1 reply; 55+ messages in thread
From: Dałek, Piotr @ 2015-08-19 12:36 UTC (permalink / raw)
  To: Mark Nelson, Haomai Wang, Somnath Roy; +Cc: ceph-devel

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Wednesday, August 19, 2015 2:17 PM
> 
> > The RSS memory usage in the report is per OSD I guess(really?). It
> > can't be ignored since it's really a great improvement memory usage.
> 
> Do you mean with tcmalloc?  I think it's a tough decision.  For jemalloc, 300MB
> more of RSS per OSD does add up (about 18GB for 60 OSDs).  On the other
> hand, the cost of memory is such a small fraction of the overall cost of
> systems like this that it might be worth it to switch over anyway.  In the 4K
> write tests it's pretty clear that even with 128MB TC, TCMalloc is suffering
> and jemalloc appears to still have headroom left.  It's possible that bumping
> the thread cache even higher might help TCMalloc close the gap though.  It's
> also possible that jemalloc might have worse memory behavior under
> recovery scenarios as we discussed at the hackathon (And Somnath
> mentioned above), so I think we probably need to run the tests.

Have you tried running these tests again with TCMalloc after applying patches from https://github.com/ceph/ceph/pull/5534? Because switching to jemalloc alone won't fix the root issues, only make it less painful.

With best regards / Pozdrawiam
Piotr Dałek


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 12:36       ` Dałek, Piotr
@ 2015-08-19 12:44         ` Mark Nelson
  2015-08-19 12:47           ` Dałek, Piotr
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Nelson @ 2015-08-19 12:44 UTC (permalink / raw)
  To: "Dałek, Piotr", Haomai Wang, Somnath Roy; +Cc: ceph-devel

On 08/19/2015 07:36 AM, Dałek, Piotr wrote:
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
>> owner@vger.kernel.org] On Behalf Of Mark Nelson
>> Sent: Wednesday, August 19, 2015 2:17 PM
>>
>>> The RSS memory usage in the report is per OSD I guess(really?). It
>>> can't be ignored since it's really a great improvement memory usage.
>>
>> Do you mean with tcmalloc?  I think it's a tough decision.  For jemalloc, 300MB
>> more of RSS per OSD does add up (about 18GB for 60 OSDs).  On the other
>> hand, the cost of memory is such a small fraction of the overall cost of
>> systems like this that it might be worth it to switch over anyway.  In the 4K
>> write tests it's pretty clear that even with 128MB TC, TCMalloc is suffering
>> and jemalloc appears to still have headroom left.  It's possible that bumping
>> the thread cache even higher might help TCMalloc close the gap though.  It's
>> also possible that jemalloc might have worse memory behavior under
>> recovery scenarios as we discussed at the hackathon (And Somnath
>> mentioned above), so I think we probably need to run the tests.
>
> Have you tried running these tests again with TCMalloc after applying patches from https://github.com/ceph/ceph/pull/5534? Because switching to jemalloc alone won't fix the root issues, only make it less painful.

Not yet, my first goal was to simply replicate the tests that have 
already been done using hammer to get a baseline on the new cluster.  It 
definitely looks like that's (yet another!) good next step.

Mark

>
> With best regards / Pozdrawiam
> Piotr Dałek
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 12:44         ` Mark Nelson
@ 2015-08-19 12:47           ` Dałek, Piotr
  0 siblings, 0 replies; 55+ messages in thread
From: Dałek, Piotr @ 2015-08-19 12:47 UTC (permalink / raw)
  To: Mark Nelson, Haomai Wang, Somnath Roy; +Cc: ceph-devel

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Mark Nelson
> Sent: Wednesday, August 19, 2015 2:45 PM
>
> > Have you tried running these tests again with TCMalloc after applying
> patches from https://github.com/ceph/ceph/pull/5534? Because switching
> to jemalloc alone won't fix the root issues, only make it less painful.
> 
> Not yet, my first goal was to simply replicate the tests that have already been
> done using hammer to get a baseline on the new cluster.  It definitely looks
> like that's (yet another!) good next step.

Ah, OK.
Waiting for more results then!

With best regards / Pozdrawiam
Piotr Dałek

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19  4:45 Ceph Hackathon: More Memory Allocator Testing Mark Nelson
                   ` (2 preceding siblings ...)
  2015-08-19  6:33 ` Stefan Priebe - Profihost AG
@ 2015-08-19 14:01 ` Alexandre DERUMIER
  2015-08-19 16:05   ` Alexandre DERUMIER
  2015-08-19 20:16   ` Somnath Roy
  2015-08-19 20:50 ` Zhang, Jian
  4 siblings, 2 replies; 55+ messages in thread
From: Alexandre DERUMIER @ 2015-08-19 14:01 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-devel

Thanks Marc,

Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc.

and indeed tcmalloc, even with bigger cache, seem decrease over time.


What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads.


Switching both server and client to jemalloc give me best performance on small read currently.






----- Mail original -----
De: "Mark Nelson" <mnelson@redhat.com>
À: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 06:45:36
Objet: Ceph Hackathon: More Memory Allocator Testing

Hi Everyone, 

One of the goals at the Ceph Hackathon last week was to examine how to 
improve Ceph Small IO performance. Jian Zhang presented findings 
showing a dramatic improvement in small random IO performance when Ceph 
is used with jemalloc. His results build upon Sandisk's original 
findings that the default thread cache values are a major bottleneck in 
TCMalloc 2.1. To further verify these results, we sat down at the 
Hackathon and configured the new performance test cluster that Intel 
generously donated to the Ceph community laboratory to run through a 
variety of tests with different memory allocator configurations. I've 
since written the results of those tests up in pdf form for folks who 
are interested. 

The results are located here: 

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf 

I want to be clear that many other folks have done the heavy lifting 
here. These results are simply a validation of the many tests that 
other folks have already done. Many thanks to Sandisk and others for 
figuring this out as it's a pretty big deal! 

Side note: Very little tuning other than swapping the memory allocator 
and a couple of quick and dirty ceph tunables were set during these 
tests. It's quite possible that higher IOPS will be achieved as we 
really start digging into the cluster and learning what the bottlenecks are. 

Thanks, 
Mark 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
the body of a message to majordomo@vger.kernel.org 
More majordomo info at http://vger.kernel.org/majordomo-info.html 

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 14:01 ` Alexandre DERUMIER
@ 2015-08-19 16:05   ` Alexandre DERUMIER
  2015-08-19 16:27     ` Somnath Roy
  2015-08-19 20:16   ` Somnath Roy
  1 sibling, 1 reply; 55+ messages in thread
From: Alexandre DERUMIER @ 2015-08-19 16:05 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-devel

I was listening at the today meeting,

and seem that the blocker to have jemalloc as default,

is that it's used more memory by osd (around 300MB?),
and some guys could have boxes with 60disks.


I just wonder if the memory increase is related to osd_op_num_shards/osd_op_threads value ?

Seem that as hackaton, the bench has been done on super big cpus boxed 36cores/72T,
http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx
with osd_op_threads = 32.

I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process.

Maybe jemalloc allocated memory by threads.



(I think guys with 60disks box, dont use ssd, so low iops by osd, and they don't need a lot of threads by osd)



----- Mail original -----
De: "aderumier" <aderumier@odiso.com>
À: "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 16:01:28
Objet: Re: Ceph Hackathon: More Memory Allocator Testing

Thanks Marc, 

Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc. 

and indeed tcmalloc, even with bigger cache, seem decrease over time. 


What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads. 


Switching both server and client to jemalloc give me best performance on small read currently. 






----- Mail original ----- 
De: "Mark Nelson" <mnelson@redhat.com> 
À: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 06:45:36 
Objet: Ceph Hackathon: More Memory Allocator Testing 

Hi Everyone, 

One of the goals at the Ceph Hackathon last week was to examine how to 
improve Ceph Small IO performance. Jian Zhang presented findings 
showing a dramatic improvement in small random IO performance when Ceph 
is used with jemalloc. His results build upon Sandisk's original 
findings that the default thread cache values are a major bottleneck in 
TCMalloc 2.1. To further verify these results, we sat down at the 
Hackathon and configured the new performance test cluster that Intel 
generously donated to the Ceph community laboratory to run through a 
variety of tests with different memory allocator configurations. I've 
since written the results of those tests up in pdf form for folks who 
are interested. 

The results are located here: 

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf 

I want to be clear that many other folks have done the heavy lifting 
here. These results are simply a validation of the many tests that 
other folks have already done. Many thanks to Sandisk and others for 
figuring this out as it's a pretty big deal! 

Side note: Very little tuning other than swapping the memory allocator 
and a couple of quick and dirty ceph tunables were set during these 
tests. It's quite possible that higher IOPS will be achieved as we 
really start digging into the cluster and learning what the bottlenecks are. 

Thanks, 
Mark 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
the body of a message to majordomo@vger.kernel.org 
More majordomo info at http://vger.kernel.org/majordomo-info.html 

-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
the body of a message to majordomo@vger.kernel.org 
More majordomo info at http://vger.kernel.org/majordomo-info.html 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 16:05   ` Alexandre DERUMIER
@ 2015-08-19 16:27     ` Somnath Roy
  2015-08-19 16:55       ` Alexandre DERUMIER
  0 siblings, 1 reply; 55+ messages in thread
From: Somnath Roy @ 2015-08-19 16:27 UTC (permalink / raw)
  To: Alexandre DERUMIER, Mark Nelson; +Cc: ceph-devel

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process.

I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box.

Also, I think there is no point of increasing osd_op_threads as it is not in IO path anymore..Mark is using default 5:2 for shard:thread per shard..

But, yes, it could be related to number of threads OSDs are using, need to understand how jemalloc works..Also, there may be some tuning to reduce memory usage (?).

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
Sent: Wednesday, August 19, 2015 9:06 AM
To: Mark Nelson
Cc: ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

I was listening at the today meeting,

and seem that the blocker to have jemalloc as default,

is that it's used more memory by osd (around 300MB?), and some guys could have boxes with 60disks.


I just wonder if the memory increase is related to osd_op_num_shards/osd_op_threads value ?

Seem that as hackaton, the bench has been done on super big cpus boxed 36cores/72T, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx
with osd_op_threads = 32.

I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process.

Maybe jemalloc allocated memory by threads.



(I think guys with 60disks box, dont use ssd, so low iops by osd, and they don't need a lot of threads by osd)



----- Mail original -----
De: "aderumier" <aderumier@odiso.com>
À: "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 16:01:28
Objet: Re: Ceph Hackathon: More Memory Allocator Testing

Thanks Marc,

Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc.

and indeed tcmalloc, even with bigger cache, seem decrease over time.


What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads.


Switching both server and client to jemalloc give me best performance on small read currently.






----- Mail original -----
De: "Mark Nelson" <mnelson@redhat.com>
À: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 06:45:36
Objet: Ceph Hackathon: More Memory Allocator Testing

Hi Everyone,

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested.

The results are located here:

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf

I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!

Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.

Thanks,
Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 16:27     ` Somnath Roy
@ 2015-08-19 16:55       ` Alexandre DERUMIER
  2015-08-19 16:57         ` Blinick, Stephen L
  2015-08-19 17:29         ` Somnath Roy
  0 siblings, 2 replies; 55+ messages in thread
From: Alexandre DERUMIER @ 2015-08-19 16:55 UTC (permalink / raw)
  To: Somnath Roy; +Cc: Mark Nelson, ceph-devel

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

>>I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

What is num_tcmalloc_instance ? I think 1 osd process use a defined TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size ?

I'm saying that, because I have exactly the same bug, client side, with librbd + tcmalloc + qemu + iothreads.
When I defined too much iothread threads, I'm hitting the bug directly. (can reproduce 100%).
Like the thread_cache size is divide by number of threads?






----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 18:27:30
Objet: RE: Ceph Hackathon: More Memory Allocator Testing

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

Also, I think there is no point of increasing osd_op_threads as it is not in IO path anymore..Mark is using default 5:2 for shard:thread per shard.. 

But, yes, it could be related to number of threads OSDs are using, need to understand how jemalloc works..Also, there may be some tuning to reduce memory usage (?). 

Thanks & Regards 
Somnath 

-----Original Message----- 
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER 
Sent: Wednesday, August 19, 2015 9:06 AM 
To: Mark Nelson 
Cc: ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

I was listening at the today meeting, 

and seem that the blocker to have jemalloc as default, 

is that it's used more memory by osd (around 300MB?), and some guys could have boxes with 60disks. 


I just wonder if the memory increase is related to osd_op_num_shards/osd_op_threads value ? 

Seem that as hackaton, the bench has been done on super big cpus boxed 36cores/72T, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx 
with osd_op_threads = 32. 

I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

Maybe jemalloc allocated memory by threads. 



(I think guys with 60disks box, dont use ssd, so low iops by osd, and they don't need a lot of threads by osd) 



----- Mail original ----- 
De: "aderumier" <aderumier@odiso.com> 
À: "Mark Nelson" <mnelson@redhat.com> 
Cc: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 16:01:28 
Objet: Re: Ceph Hackathon: More Memory Allocator Testing 

Thanks Marc, 

Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc. 

and indeed tcmalloc, even with bigger cache, seem decrease over time. 


What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads. 


Switching both server and client to jemalloc give me best performance on small read currently. 






----- Mail original ----- 
De: "Mark Nelson" <mnelson@redhat.com> 
À: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 06:45:36 
Objet: Ceph Hackathon: More Memory Allocator Testing 

Hi Everyone, 

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested. 

The results are located here: 

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf 

I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal! 

Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are. 

Thanks, 
Mark 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 16:55       ` Alexandre DERUMIER
@ 2015-08-19 16:57         ` Blinick, Stephen L
  2015-08-20  6:35           ` Dałek, Piotr
  2015-08-19 17:29         ` Somnath Roy
  1 sibling, 1 reply; 55+ messages in thread
From: Blinick, Stephen L @ 2015-08-19 16:57 UTC (permalink / raw)
  To: Alexandre DERUMIER, Somnath Roy; +Cc: Mark Nelson, ceph-devel

First, I wanted chime in to say as well: awesome report!  This will be useful for us as well to refer to when deciding which allocator to use.  I like the performance as well as the stability in the measurements with jemalloc.  Looking at  TCMaloc w/ TC128, didn't seem to stabilize on small write performance over the test period.  

Regarding the all-HDD or high density HDD nodes, is it certain these issues with tcmalloc don't apply, due to lower performance, or would it potentially be something that would manifest over a longer period of time (weeks/months) of running?   I know we've seen some weirdness attributed to tcmalloc on our 10-disk 20-node cluster with HDD's &  SSD journals, but it took a few weeks.

Thanks,

Stephen


-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
Sent: Wednesday, August 19, 2015 9:55 AM
To: Somnath Roy
Cc: Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

>>I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

What is num_tcmalloc_instance ? I think 1 osd process use a defined TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size ?

I'm saying that, because I have exactly the same bug, client side, with librbd + tcmalloc + qemu + iothreads.
When I defined too much iothread threads, I'm hitting the bug directly. (can reproduce 100%).
Like the thread_cache size is divide by number of threads?






----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 18:27:30
Objet: RE: Ceph Hackathon: More Memory Allocator Testing

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

Also, I think there is no point of increasing osd_op_threads as it is not in IO path anymore..Mark is using default 5:2 for shard:thread per shard.. 

But, yes, it could be related to number of threads OSDs are using, need to understand how jemalloc works..Also, there may be some tuning to reduce memory usage (?). 

Thanks & Regards
Somnath 

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
Sent: Wednesday, August 19, 2015 9:06 AM
To: Mark Nelson
Cc: ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

I was listening at the today meeting, 

and seem that the blocker to have jemalloc as default, 

is that it's used more memory by osd (around 300MB?), and some guys could have boxes with 60disks. 


I just wonder if the memory increase is related to osd_op_num_shards/osd_op_threads value ? 

Seem that as hackaton, the bench has been done on super big cpus boxed 36cores/72T, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx
with osd_op_threads = 32. 

I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

Maybe jemalloc allocated memory by threads. 



(I think guys with 60disks box, dont use ssd, so low iops by osd, and they don't need a lot of threads by osd) 



----- Mail original -----
De: "aderumier" <aderumier@odiso.com>
À: "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 16:01:28
Objet: Re: Ceph Hackathon: More Memory Allocator Testing 

Thanks Marc, 

Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc. 

and indeed tcmalloc, even with bigger cache, seem decrease over time. 


What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads. 


Switching both server and client to jemalloc give me best performance on small read currently. 






----- Mail original -----
De: "Mark Nelson" <mnelson@redhat.com>
À: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 06:45:36
Objet: Ceph Hackathon: More Memory Allocator Testing 

Hi Everyone, 

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested. 

The results are located here: 

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf 

I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal! 

Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are. 

Thanks,
Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 16:55       ` Alexandre DERUMIER
  2015-08-19 16:57         ` Blinick, Stephen L
@ 2015-08-19 17:29         ` Somnath Roy
  2015-08-19 18:20           ` Allen Samuels
  2015-08-19 18:47           ` Alexandre DERUMIER
  1 sibling, 2 replies; 55+ messages in thread
From: Somnath Roy @ 2015-08-19 17:29 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: Mark Nelson, ceph-devel

Yes, it should be 1 per OSD...
There is no doubt that TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is relative to the number of threads running..
But, I don't know if number of threads is a factor for jemalloc..

Thanks & Regards
Somnath

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@odiso.com] 
Sent: Wednesday, August 19, 2015 9:55 AM
To: Somnath Roy
Cc: Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

>>I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

What is num_tcmalloc_instance ? I think 1 osd process use a defined TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size ?

I'm saying that, because I have exactly the same bug, client side, with librbd + tcmalloc + qemu + iothreads.
When I defined too much iothread threads, I'm hitting the bug directly. (can reproduce 100%).
Like the thread_cache size is divide by number of threads?






----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 18:27:30
Objet: RE: Ceph Hackathon: More Memory Allocator Testing

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

Also, I think there is no point of increasing osd_op_threads as it is not in IO path anymore..Mark is using default 5:2 for shard:thread per shard.. 

But, yes, it could be related to number of threads OSDs are using, need to understand how jemalloc works..Also, there may be some tuning to reduce memory usage (?). 

Thanks & Regards 
Somnath 

-----Original Message----- 
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER 
Sent: Wednesday, August 19, 2015 9:06 AM 
To: Mark Nelson 
Cc: ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

I was listening at the today meeting, 

and seem that the blocker to have jemalloc as default, 

is that it's used more memory by osd (around 300MB?), and some guys could have boxes with 60disks. 


I just wonder if the memory increase is related to osd_op_num_shards/osd_op_threads value ? 

Seem that as hackaton, the bench has been done on super big cpus boxed 36cores/72T, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx 
with osd_op_threads = 32. 

I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

Maybe jemalloc allocated memory by threads. 



(I think guys with 60disks box, dont use ssd, so low iops by osd, and they don't need a lot of threads by osd) 



----- Mail original ----- 
De: "aderumier" <aderumier@odiso.com> 
À: "Mark Nelson" <mnelson@redhat.com> 
Cc: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 16:01:28 
Objet: Re: Ceph Hackathon: More Memory Allocator Testing 

Thanks Marc, 

Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc. 

and indeed tcmalloc, even with bigger cache, seem decrease over time. 


What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads. 


Switching both server and client to jemalloc give me best performance on small read currently. 






----- Mail original ----- 
De: "Mark Nelson" <mnelson@redhat.com> 
À: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 06:45:36 
Objet: Ceph Hackathon: More Memory Allocator Testing 

Hi Everyone, 

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested. 

The results are located here: 

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf 

I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal! 

Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are. 

Thanks, 
Mark 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 17:29         ` Somnath Roy
@ 2015-08-19 18:20           ` Allen Samuels
  2015-08-19 18:36             ` Mark Nelson
  2015-08-20  6:25             ` Dałek, Piotr
  2015-08-19 18:47           ` Alexandre DERUMIER
  1 sibling, 2 replies; 55+ messages in thread
From: Allen Samuels @ 2015-08-19 18:20 UTC (permalink / raw)
  To: Somnath Roy, Alexandre DERUMIER; +Cc: Mark Nelson, ceph-devel

It was a surprising result that the memory allocator is making such a large difference in performance. All of the recent work in fiddling with TCmalloc's and Jemalloc's various knobs and switches has been excellent a great example of group collaboration. But I think it's only a partial optimization of the underlying problem. The real take-away from this activity is that the code base is doing a LOT of memory allocation/deallocation which is consuming substantial CPU time-- regardless of how much we optimize the memory allocator, you can't get away from the fact that it macroscopically MATTERs. The better long-term solution is to reduce reliance on the general-purpose memory allocator and to implement strategies that are more specific to our usage model. 

What really needs to happen initially is to instrument the allocation/deallocation. Most likely we'll find that 80+% of the work is coming from just a few object classes and it will be easy to create custom allocation strategies for those usages. This will lead to even higher performance that's much less sensitive to easy-to-misconfigure environmental factors and the entire tcmalloc/jemalloc -- oops it uses more memory discussion will go away.


Allen Samuels
Software Architect, Systems and Software Solutions 

2880 Junction Avenue, San Jose, CA 95134
T: +1 408 801 7030| M: +1 408 780 6416
allen.samuels@SanDisk.com


-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Somnath Roy
Sent: Wednesday, August 19, 2015 10:30 AM
To: Alexandre DERUMIER
Cc: Mark Nelson; ceph-devel
Subject: RE: Ceph Hackathon: More Memory Allocator Testing

Yes, it should be 1 per OSD...
There is no doubt that TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is relative to the number of threads running..
But, I don't know if number of threads is a factor for jemalloc..

Thanks & Regards
Somnath

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
Sent: Wednesday, August 19, 2015 9:55 AM
To: Somnath Roy
Cc: Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

>>I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

What is num_tcmalloc_instance ? I think 1 osd process use a defined TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size ?

I'm saying that, because I have exactly the same bug, client side, with librbd + tcmalloc + qemu + iothreads.
When I defined too much iothread threads, I'm hitting the bug directly. (can reproduce 100%).
Like the thread_cache size is divide by number of threads?






----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 18:27:30
Objet: RE: Ceph Hackathon: More Memory Allocator Testing

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

Also, I think there is no point of increasing osd_op_threads as it is not in IO path anymore..Mark is using default 5:2 for shard:thread per shard.. 

But, yes, it could be related to number of threads OSDs are using, need to understand how jemalloc works..Also, there may be some tuning to reduce memory usage (?). 

Thanks & Regards
Somnath 

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
Sent: Wednesday, August 19, 2015 9:06 AM
To: Mark Nelson
Cc: ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

I was listening at the today meeting, 

and seem that the blocker to have jemalloc as default, 

is that it's used more memory by osd (around 300MB?), and some guys could have boxes with 60disks. 


I just wonder if the memory increase is related to osd_op_num_shards/osd_op_threads value ? 

Seem that as hackaton, the bench has been done on super big cpus boxed 36cores/72T, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx
with osd_op_threads = 32. 

I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

Maybe jemalloc allocated memory by threads. 



(I think guys with 60disks box, dont use ssd, so low iops by osd, and they don't need a lot of threads by osd) 



----- Mail original -----
De: "aderumier" <aderumier@odiso.com>
À: "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 16:01:28
Objet: Re: Ceph Hackathon: More Memory Allocator Testing 

Thanks Marc, 

Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc. 

and indeed tcmalloc, even with bigger cache, seem decrease over time. 


What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads. 


Switching both server and client to jemalloc give me best performance on small read currently. 






----- Mail original -----
De: "Mark Nelson" <mnelson@redhat.com>
À: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 06:45:36
Objet: Ceph Hackathon: More Memory Allocator Testing 

Hi Everyone, 

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested. 

The results are located here: 

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf 

I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal! 

Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are. 

Thanks,
Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 
N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay \x1dʇڙ ,j   f   h   z \x1e w       j:+v   w j m         zZ+     ݢj"  ! i

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 18:20           ` Allen Samuels
@ 2015-08-19 18:36             ` Mark Nelson
  2015-08-19 18:47               ` Łukasz Redynk
  2015-08-20  6:25             ` Dałek, Piotr
  1 sibling, 1 reply; 55+ messages in thread
From: Mark Nelson @ 2015-08-19 18:36 UTC (permalink / raw)
  To: Allen Samuels, Somnath Roy, Alexandre DERUMIER; +Cc: ceph-devel

On 08/19/2015 01:20 PM, Allen Samuels wrote:
> It was a surprising result that the memory allocator is making such a large difference in performance. All of the recent work in fiddling with TCmalloc's and Jemalloc's various knobs and switches has been excellent a great example of group collaboration. But I think it's only a partial optimization of the underlying problem. The real take-away from this activity is that the code base is doing a LOT of memory allocation/deallocation which is consuming substantial CPU time-- regardless of how much we optimize the memory allocator, you can't get away from the fact that it macroscopically MATTERs. The better long-term solution is to reduce reliance on the general-purpose memory allocator and to implement strategies that are more specific to our usage model.
>
> What really needs to happen initially is to instrument the allocation/deallocation. Most likely we'll find that 80+% of the work is coming from just a few object classes and it will be easy to create custom allocation strategies for those usages. This will lead to even higher performance that's much less sensitive to easy-to-misconfigure environmental factors and the entire tcmalloc/jemalloc -- oops it uses more memory discussion will go away.

Yes, I think the real take away is the Ceph is really hard on memory 
allocators.  I think a lot of us have sort of had a feeling this was the 
case for a long time.  The current discussion/results just draws it a 
lot more sharply into focus.

On the plus side there is work going on to make things a little more 
manageable, though a more comprehensive analysis would be very welcome! 
  I see the jemalloc has some interesting looking profiling options in 
the newer releases.

Mark

>
>
> Allen Samuels
> Software Architect, Systems and Software Solutions
>
> 2880 Junction Avenue, San Jose, CA 95134
> T: +1 408 801 7030| M: +1 408 780 6416
> allen.samuels@SanDisk.com
>
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Somnath Roy
> Sent: Wednesday, August 19, 2015 10:30 AM
> To: Alexandre DERUMIER
> Cc: Mark Nelson; ceph-devel
> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
>
> Yes, it should be 1 per OSD...
> There is no doubt that TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is relative to the number of threads running..
> But, I don't know if number of threads is a factor for jemalloc..
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
> Sent: Wednesday, August 19, 2015 9:55 AM
> To: Somnath Roy
> Cc: Mark Nelson; ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
> << I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process.
>
>>> I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box.
>
> What is num_tcmalloc_instance ? I think 1 osd process use a defined TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size ?
>
> I'm saying that, because I have exactly the same bug, client side, with librbd + tcmalloc + qemu + iothreads.
> When I defined too much iothread threads, I'm hitting the bug directly. (can reproduce 100%).
> Like the thread_cache size is divide by number of threads?
>
>
>
>
>
>
> ----- Mail original -----
> De: "Somnath Roy" <Somnath.Roy@sandisk.com>
> À: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
> Envoyé: Mercredi 19 Août 2015 18:27:30
> Objet: RE: Ceph Hackathon: More Memory Allocator Testing
>
> << I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process.
>
> I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box.
>
> Also, I think there is no point of increasing osd_op_threads as it is not in IO path anymore..Mark is using default 5:2 for shard:thread per shard..
>
> But, yes, it could be related to number of threads OSDs are using, need to understand how jemalloc works..Also, there may be some tuning to reduce memory usage (?).
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
> Sent: Wednesday, August 19, 2015 9:06 AM
> To: Mark Nelson
> Cc: ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
> I was listening at the today meeting,
>
> and seem that the blocker to have jemalloc as default,
>
> is that it's used more memory by osd (around 300MB?), and some guys could have boxes with 60disks.
>
>
> I just wonder if the memory increase is related to osd_op_num_shards/osd_op_threads value ?
>
> Seem that as hackaton, the bench has been done on super big cpus boxed 36cores/72T, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx
> with osd_op_threads = 32.
>
> I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process.
>
> Maybe jemalloc allocated memory by threads.
>
>
>
> (I think guys with 60disks box, dont use ssd, so low iops by osd, and they don't need a lot of threads by osd)
>
>
>
> ----- Mail original -----
> De: "aderumier" <aderumier@odiso.com>
> À: "Mark Nelson" <mnelson@redhat.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
> Envoyé: Mercredi 19 Août 2015 16:01:28
> Objet: Re: Ceph Hackathon: More Memory Allocator Testing
>
> Thanks Marc,
>
> Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc.
>
> and indeed tcmalloc, even with bigger cache, seem decrease over time.
>
>
> What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads.
>
>
> Switching both server and client to jemalloc give me best performance on small read currently.
>
>
>
>
>
>
> ----- Mail original -----
> De: "Mark Nelson" <mnelson@redhat.com>
> À: "ceph-devel" <ceph-devel@vger.kernel.org>
> Envoyé: Mercredi 19 Août 2015 06:45:36
> Objet: Ceph Hackathon: More Memory Allocator Testing
>
> Hi Everyone,
>
> One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested.
>
> The results are located here:
>
> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf
>
> I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!
>
> Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.
>
> Thanks,
> Mark
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> ________________________________
>
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay \x1dʇڙ ,j   f   h   z \x1e w       j:+v   w j m         zZ+     ݢj"  ! i
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 17:29         ` Somnath Roy
  2015-08-19 18:20           ` Allen Samuels
@ 2015-08-19 18:47           ` Alexandre DERUMIER
  2015-08-20  1:09             ` Blinick, Stephen L
  1 sibling, 1 reply; 55+ messages in thread
From: Alexandre DERUMIER @ 2015-08-19 18:47 UTC (permalink / raw)
  To: Somnath Roy; +Cc: Mark Nelson, ceph-devel

Just have done a small test with jemalloc, change osd_op_threads value, and check the memory just after daemon restart.

osd_op_threads = 2 (default)


USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      10246  6.0  0.3 1086656 245760 ?      Ssl  20:36   0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f

osd_op_threads = 32

USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      10736 19.5  0.4 1474672 307412 ?      Ssl  20:37   0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f



I'll try to compare with tcmalloc tommorow and under load.



----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "aderumier" <aderumier@odiso.com>
Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 19:29:56
Objet: RE: Ceph Hackathon: More Memory Allocator Testing

Yes, it should be 1 per OSD... 
There is no doubt that TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is relative to the number of threads running.. 
But, I don't know if number of threads is a factor for jemalloc.. 

Thanks & Regards 
Somnath 

-----Original Message----- 
From: Alexandre DERUMIER [mailto:aderumier@odiso.com] 
Sent: Wednesday, August 19, 2015 9:55 AM 
To: Somnath Roy 
Cc: Mark Nelson; ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

>>I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

What is num_tcmalloc_instance ? I think 1 osd process use a defined TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size ? 

I'm saying that, because I have exactly the same bug, client side, with librbd + tcmalloc + qemu + iothreads. 
When I defined too much iothread threads, I'm hitting the bug directly. (can reproduce 100%). 
Like the thread_cache size is divide by number of threads? 






----- Mail original ----- 
De: "Somnath Roy" <Somnath.Roy@sandisk.com> 
À: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com> 
Cc: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 18:27:30 
Objet: RE: Ceph Hackathon: More Memory Allocator Testing 

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

Also, I think there is no point of increasing osd_op_threads as it is not in IO path anymore..Mark is using default 5:2 for shard:thread per shard.. 

But, yes, it could be related to number of threads OSDs are using, need to understand how jemalloc works..Also, there may be some tuning to reduce memory usage (?). 

Thanks & Regards 
Somnath 

-----Original Message----- 
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER 
Sent: Wednesday, August 19, 2015 9:06 AM 
To: Mark Nelson 
Cc: ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

I was listening at the today meeting, 

and seem that the blocker to have jemalloc as default, 

is that it's used more memory by osd (around 300MB?), and some guys could have boxes with 60disks. 


I just wonder if the memory increase is related to osd_op_num_shards/osd_op_threads value ? 

Seem that as hackaton, the bench has been done on super big cpus boxed 36cores/72T, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx 
with osd_op_threads = 32. 

I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

Maybe jemalloc allocated memory by threads. 



(I think guys with 60disks box, dont use ssd, so low iops by osd, and they don't need a lot of threads by osd) 



----- Mail original ----- 
De: "aderumier" <aderumier@odiso.com> 
À: "Mark Nelson" <mnelson@redhat.com> 
Cc: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 16:01:28 
Objet: Re: Ceph Hackathon: More Memory Allocator Testing 

Thanks Marc, 

Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc. 

and indeed tcmalloc, even with bigger cache, seem decrease over time. 


What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads. 


Switching both server and client to jemalloc give me best performance on small read currently. 






----- Mail original ----- 
De: "Mark Nelson" <mnelson@redhat.com> 
À: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 06:45:36 
Objet: Ceph Hackathon: More Memory Allocator Testing 

Hi Everyone, 

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested. 

The results are located here: 

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf 

I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal! 

Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are. 

Thanks, 
Mark 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 18:36             ` Mark Nelson
@ 2015-08-19 18:47               ` Łukasz Redynk
  0 siblings, 0 replies; 55+ messages in thread
From: Łukasz Redynk @ 2015-08-19 18:47 UTC (permalink / raw)
  To: Mark Nelson; +Cc: Allen Samuels, Somnath Roy, Alexandre DERUMIER, ceph-devel

@Mark: could you also post your ceph.conf?

2015-08-19 11:36 GMT-07:00 Mark Nelson <mnelson@redhat.com>:
> On 08/19/2015 01:20 PM, Allen Samuels wrote:
>>
>> It was a surprising result that the memory allocator is making such a
>> large difference in performance. All of the recent work in fiddling with
>> TCmalloc's and Jemalloc's various knobs and switches has been excellent a
>> great example of group collaboration. But I think it's only a partial
>> optimization of the underlying problem. The real take-away from this
>> activity is that the code base is doing a LOT of memory
>> allocation/deallocation which is consuming substantial CPU time-- regardless
>> of how much we optimize the memory allocator, you can't get away from the
>> fact that it macroscopically MATTERs. The better long-term solution is to
>> reduce reliance on the general-purpose memory allocator and to implement
>> strategies that are more specific to our usage model.
>>
>> What really needs to happen initially is to instrument the
>> allocation/deallocation. Most likely we'll find that 80+% of the work is
>> coming from just a few object classes and it will be easy to create custom
>> allocation strategies for those usages. This will lead to even higher
>> performance that's much less sensitive to easy-to-misconfigure environmental
>> factors and the entire tcmalloc/jemalloc -- oops it uses more memory
>> discussion will go away.
>
>
> Yes, I think the real take away is the Ceph is really hard on memory
> allocators.  I think a lot of us have sort of had a feeling this was the
> case for a long time.  The current discussion/results just draws it a lot
> more sharply into focus.
>
> On the plus side there is work going on to make things a little more
> manageable, though a more comprehensive analysis would be very welcome!  I
> see the jemalloc has some interesting looking profiling options in the newer
> releases.
>
> Mark
>
>
>>
>>
>> Allen Samuels
>> Software Architect, Systems and Software Solutions
>>
>> 2880 Junction Avenue, San Jose, CA 95134
>> T: +1 408 801 7030| M: +1 408 780 6416
>> allen.samuels@SanDisk.com
>>
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Somnath Roy
>> Sent: Wednesday, August 19, 2015 10:30 AM
>> To: Alexandre DERUMIER
>> Cc: Mark Nelson; ceph-devel
>> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
>>
>> Yes, it should be 1 per OSD...
>> There is no doubt that TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is relative
>> to the number of threads running..
>> But, I don't know if number of threads is a factor for jemalloc..
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
>> Sent: Wednesday, August 19, 2015 9:55 AM
>> To: Somnath Roy
>> Cc: Mark Nelson; ceph-devel
>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>>
>> << I think that tcmalloc have a fixed size
>> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process.
>>
>>>> I think it is per tcmalloc instance loaded , so, at least with num_osds
>>>> * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box.
>>
>>
>> What is num_tcmalloc_instance ? I think 1 osd process use a defined
>> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size ?
>>
>> I'm saying that, because I have exactly the same bug, client side, with
>> librbd + tcmalloc + qemu + iothreads.
>> When I defined too much iothread threads, I'm hitting the bug directly.
>> (can reproduce 100%).
>> Like the thread_cache size is divide by number of threads?
>>
>>
>>
>>
>>
>>
>> ----- Mail original -----
>> De: "Somnath Roy" <Somnath.Roy@sandisk.com>
>> À: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>
>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
>> Envoyé: Mercredi 19 Août 2015 18:27:30
>> Objet: RE: Ceph Hackathon: More Memory Allocator Testing
>>
>> << I think that tcmalloc have a fixed size
>> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process.
>>
>> I think it is per tcmalloc instance loaded , so, at least with num_osds *
>> num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box.
>>
>> Also, I think there is no point of increasing osd_op_threads as it is not
>> in IO path anymore..Mark is using default 5:2 for shard:thread per shard..
>>
>> But, yes, it could be related to number of threads OSDs are using, need to
>> understand how jemalloc works..Also, there may be some tuning to reduce
>> memory usage (?).
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
>> Sent: Wednesday, August 19, 2015 9:06 AM
>> To: Mark Nelson
>> Cc: ceph-devel
>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>>
>> I was listening at the today meeting,
>>
>> and seem that the blocker to have jemalloc as default,
>>
>> is that it's used more memory by osd (around 300MB?), and some guys could
>> have boxes with 60disks.
>>
>>
>> I just wonder if the memory increase is related to
>> osd_op_num_shards/osd_op_threads value ?
>>
>> Seem that as hackaton, the bench has been done on super big cpus boxed
>> 36cores/72T, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx
>> with osd_op_threads = 32.
>>
>> I think that tcmalloc have a fixed size
>> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process.
>>
>> Maybe jemalloc allocated memory by threads.
>>
>>
>>
>> (I think guys with 60disks box, dont use ssd, so low iops by osd, and they
>> don't need a lot of threads by osd)
>>
>>
>>
>> ----- Mail original -----
>> De: "aderumier" <aderumier@odiso.com>
>> À: "Mark Nelson" <mnelson@redhat.com>
>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
>> Envoyé: Mercredi 19 Août 2015 16:01:28
>> Objet: Re: Ceph Hackathon: More Memory Allocator Testing
>>
>> Thanks Marc,
>>
>> Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs
>> jemalloc.
>>
>> and indeed tcmalloc, even with bigger cache, seem decrease over time.
>>
>>
>> What is funny, is that I see exactly same behaviour client librbd side,
>> with qemu and multiple iothreads.
>>
>>
>> Switching both server and client to jemalloc give me best performance on
>> small read currently.
>>
>>
>>
>>
>>
>>
>> ----- Mail original -----
>> De: "Mark Nelson" <mnelson@redhat.com>
>> À: "ceph-devel" <ceph-devel@vger.kernel.org>
>> Envoyé: Mercredi 19 Août 2015 06:45:36
>> Objet: Ceph Hackathon: More Memory Allocator Testing
>>
>> Hi Everyone,
>>
>> One of the goals at the Ceph Hackathon last week was to examine how to
>> improve Ceph Small IO performance. Jian Zhang presented findings showing a
>> dramatic improvement in small random IO performance when Ceph is used with
>> jemalloc. His results build upon Sandisk's original findings that the
>> default thread cache values are a major bottleneck in TCMalloc 2.1. To
>> further verify these results, we sat down at the Hackathon and configured
>> the new performance test cluster that Intel generously donated to the Ceph
>> community laboratory to run through a variety of tests with different memory
>> allocator configurations. I've since written the results of those tests up
>> in pdf form for folks who are interested.
>>
>> The results are located here:
>>
>> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf
>>
>> I want to be clear that many other folks have done the heavy lifting here.
>> These results are simply a validation of the many tests that other folks
>> have already done. Many thanks to Sandisk and others for figuring this out
>> as it's a pretty big deal!
>>
>> Side note: Very little tuning other than swapping the memory allocator and
>> a couple of quick and dirty ceph tunables were set during these tests. It's
>> quite possible that higher IOPS will be achieved as we really start digging
>> into the cluster and learning what the bottlenecks are.
>>
>> Thanks,
>> Mark
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
>>
>> ________________________________
>>
>> PLEASE NOTE: The information contained in this electronic mail message is
>> intended only for the use of the designated recipient(s) named above. If the
>> reader of this message is not the intended recipient, you are hereby
>> notified that you have received this message in error and that any review,
>> dissemination, distribution, or copying of this message is strictly
>> prohibited. If you have received this communication in error, please notify
>> the sender by telephone or e-mail (as shown above) immediately and destroy
>> any and all copies of this message in your possession (whether hard copies
>> or electronically stored copies).
>> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay  ʇڙ ,j   f   h   z   w
>> j:+v   w j m         zZ+     ݢj"  ! i
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 14:01 ` Alexandre DERUMIER
  2015-08-19 16:05   ` Alexandre DERUMIER
@ 2015-08-19 20:16   ` Somnath Roy
  2015-08-19 20:17     ` Stefan Priebe
  1 sibling, 1 reply; 55+ messages in thread
From: Somnath Roy @ 2015-08-19 20:16 UTC (permalink / raw)
  To: Alexandre DERUMIER, Mark Nelson; +Cc: ceph-devel

Alexandre,
I am not able to build librados/librbd by using the following config option.

./configure –without-tcmalloc –with-jemalloc

It seems it is building osd/mon/Mds/RGW with jemalloc enabled..

root@emsnode10:~/ceph-latest/src# ldd ./ceph-osd
        linux-vdso.so.1 =>  (0x00007ffd0eb43000)
        libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1 (0x00007f5f92d70000)
        .......

root@emsnode10:~/ceph-latest/src/.libs# ldd ./librados.so.2.0.0
        linux-vdso.so.1 =>  (0x00007ffed46f2000)
        libboost_thread.so.1.55.0 => /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.55.0 (0x00007ff687887000)
        liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0 (0x00007ff68763d000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff687438000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff68721a000)
        libnss3.so => /usr/lib/x86_64-linux-gnu/libnss3.so (0x00007ff686ee0000)
        libsmime3.so => /usr/lib/x86_64-linux-gnu/libsmime3.so (0x00007ff686cb3000)
        libnspr4.so => /usr/lib/x86_64-linux-gnu/libnspr4.so (0x00007ff686a76000)
        libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007ff686871000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff686668000)
        libboost_system.so.1.55.0 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.55.0 (0x00007ff686464000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff686160000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff685e59000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff685a94000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff68587e000)
        liblttng-ust-tracepoint.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0 (0x00007ff685663000)
        liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x00007ff68545c000)
        liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x00007ff685255000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ff68a0f6000)
        libnssutil3.so => /usr/lib/x86_64-linux-gnu/libnssutil3.so (0x00007ff685029000)
        libplc4.so => /usr/lib/x86_64-linux-gnu/libplc4.so (0x00007ff684e24000)
        libplds4.so => /usr/lib/x86_64-linux-gnu/libplds4.so (0x00007ff684c20000)

It is building with libcmalloc always...

Did you change the ceph makefiles to build librbd/librados with jemalloc ?

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
Sent: Wednesday, August 19, 2015 7:01 AM
To: Mark Nelson
Cc: ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

Thanks Marc,

Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc.

and indeed tcmalloc, even with bigger cache, seem decrease over time.


What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads.


Switching both server and client to jemalloc give me best performance on small read currently.






----- Mail original -----
De: "Mark Nelson" <mnelson@redhat.com>
À: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 06:45:36
Objet: Ceph Hackathon: More Memory Allocator Testing

Hi Everyone,

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested.

The results are located here:

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf

I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!

Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.

Thanks,
Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 20:16   ` Somnath Roy
@ 2015-08-19 20:17     ` Stefan Priebe
  2015-08-19 20:29       ` Somnath Roy
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Priebe @ 2015-08-19 20:17 UTC (permalink / raw)
  To: Somnath Roy, Alexandre DERUMIER, Mark Nelson; +Cc: ceph-devel


Am 19.08.2015 um 22:16 schrieb Somnath Roy:
> Alexandre,
> I am not able to build librados/librbd by using the following config option.
>
> ./configure –without-tcmalloc –with-jemalloc

Same issue to me. You have to remove libcmalloc out of your build 
environment to get this done.

Stefan


> It seems it is building osd/mon/Mds/RGW with jemalloc enabled..
>
> root@emsnode10:~/ceph-latest/src# ldd ./ceph-osd
>          linux-vdso.so.1 =>  (0x00007ffd0eb43000)
>          libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1 (0x00007f5f92d70000)
>          .......
>
> root@emsnode10:~/ceph-latest/src/.libs# ldd ./librados.so.2.0.0
>          linux-vdso.so.1 =>  (0x00007ffed46f2000)
>          libboost_thread.so.1.55.0 => /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.55.0 (0x00007ff687887000)
>          liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0 (0x00007ff68763d000)
>          libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff687438000)
>          libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff68721a000)
>          libnss3.so => /usr/lib/x86_64-linux-gnu/libnss3.so (0x00007ff686ee0000)
>          libsmime3.so => /usr/lib/x86_64-linux-gnu/libsmime3.so (0x00007ff686cb3000)
>          libnspr4.so => /usr/lib/x86_64-linux-gnu/libnspr4.so (0x00007ff686a76000)
>          libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007ff686871000)
>          librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff686668000)
>          libboost_system.so.1.55.0 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.55.0 (0x00007ff686464000)
>          libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff686160000)
>          libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff685e59000)
>          libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff685a94000)
>          libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff68587e000)
>          liblttng-ust-tracepoint.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0 (0x00007ff685663000)
>          liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x00007ff68545c000)
>          liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x00007ff685255000)
>          /lib64/ld-linux-x86-64.so.2 (0x00007ff68a0f6000)
>          libnssutil3.so => /usr/lib/x86_64-linux-gnu/libnssutil3.so (0x00007ff685029000)
>          libplc4.so => /usr/lib/x86_64-linux-gnu/libplc4.so (0x00007ff684e24000)
>          libplds4.so => /usr/lib/x86_64-linux-gnu/libplds4.so (0x00007ff684c20000)
>
> It is building with libcmalloc always...
>
> Did you change the ceph makefiles to build librbd/librados with jemalloc ?
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
> Sent: Wednesday, August 19, 2015 7:01 AM
> To: Mark Nelson
> Cc: ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
> Thanks Marc,
>
> Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc.
>
> and indeed tcmalloc, even with bigger cache, seem decrease over time.
>
>
> What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads.
>
>
> Switching both server and client to jemalloc give me best performance on small read currently.
>
>
>
>
>
>
> ----- Mail original -----
> De: "Mark Nelson" <mnelson@redhat.com>
> À: "ceph-devel" <ceph-devel@vger.kernel.org>
> Envoyé: Mercredi 19 Août 2015 06:45:36
> Objet: Ceph Hackathon: More Memory Allocator Testing
>
> Hi Everyone,
>
> One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested.
>
> The results are located here:
>
> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf
>
> I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!
>
> Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.
>
> Thanks,
> Mark
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> ________________________________
>
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>
> N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay�\x1dʇڙ�,j\a��f���h���z�\x1e�w���\f���j:+v���w�j�m����\a����zZ+�����ݢj"��!tml=
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 20:17     ` Stefan Priebe
@ 2015-08-19 20:29       ` Somnath Roy
  2015-08-19 20:31         ` Stefan Priebe
  0 siblings, 1 reply; 55+ messages in thread
From: Somnath Roy @ 2015-08-19 20:29 UTC (permalink / raw)
  To: Stefan Priebe, Alexandre DERUMIER, Mark Nelson; +Cc: ceph-devel

Hmm...We need to fix that as part of configure/Makefile I guess (?)..
Since we have done this jemalloc integration originally, we can take that ownership unless anybody sees a problem of enabling tcmalloc/jemalloc with librbd/librados. 

<< You have to remove libcmalloc out of your build environment to get this done
How do I do that ? I am using Ubuntu and can't afford to remove libc* packages.

Thanks & Regards
Somnath

-----Original Message-----
From: Stefan Priebe [mailto:s.priebe@profihost.ag] 
Sent: Wednesday, August 19, 2015 1:18 PM
To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
Cc: ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing


Am 19.08.2015 um 22:16 schrieb Somnath Roy:
> Alexandre,
> I am not able to build librados/librbd by using the following config option.
>
> ./configure –without-tcmalloc –with-jemalloc

Same issue to me. You have to remove libcmalloc out of your build environment to get this done.

Stefan


> It seems it is building osd/mon/Mds/RGW with jemalloc enabled..
>
> root@emsnode10:~/ceph-latest/src# ldd ./ceph-osd
>          linux-vdso.so.1 =>  (0x00007ffd0eb43000)
>          libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1 (0x00007f5f92d70000)
>          .......
>
> root@emsnode10:~/ceph-latest/src/.libs# ldd ./librados.so.2.0.0
>          linux-vdso.so.1 =>  (0x00007ffed46f2000)
>          libboost_thread.so.1.55.0 => /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.55.0 (0x00007ff687887000)
>          liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0 (0x00007ff68763d000)
>          libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff687438000)
>          libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff68721a000)
>          libnss3.so => /usr/lib/x86_64-linux-gnu/libnss3.so (0x00007ff686ee0000)
>          libsmime3.so => /usr/lib/x86_64-linux-gnu/libsmime3.so (0x00007ff686cb3000)
>          libnspr4.so => /usr/lib/x86_64-linux-gnu/libnspr4.so (0x00007ff686a76000)
>          libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007ff686871000)
>          librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff686668000)
>          libboost_system.so.1.55.0 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.55.0 (0x00007ff686464000)
>          libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff686160000)
>          libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff685e59000)
>          libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff685a94000)
>          libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff68587e000)
>          liblttng-ust-tracepoint.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0 (0x00007ff685663000)
>          liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x00007ff68545c000)
>          liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x00007ff685255000)
>          /lib64/ld-linux-x86-64.so.2 (0x00007ff68a0f6000)
>          libnssutil3.so => /usr/lib/x86_64-linux-gnu/libnssutil3.so (0x00007ff685029000)
>          libplc4.so => /usr/lib/x86_64-linux-gnu/libplc4.so (0x00007ff684e24000)
>          libplds4.so => /usr/lib/x86_64-linux-gnu/libplds4.so 
> (0x00007ff684c20000)
>
> It is building with libcmalloc always...
>
> Did you change the ceph makefiles to build librbd/librados with jemalloc ?
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org 
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre 
> DERUMIER
> Sent: Wednesday, August 19, 2015 7:01 AM
> To: Mark Nelson
> Cc: ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
> Thanks Marc,
>
> Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc.
>
> and indeed tcmalloc, even with bigger cache, seem decrease over time.
>
>
> What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads.
>
>
> Switching both server and client to jemalloc give me best performance on small read currently.
>
>
>
>
>
>
> ----- Mail original -----
> De: "Mark Nelson" <mnelson@redhat.com>
> À: "ceph-devel" <ceph-devel@vger.kernel.org>
> Envoyé: Mercredi 19 Août 2015 06:45:36
> Objet: Ceph Hackathon: More Memory Allocator Testing
>
> Hi Everyone,
>
> One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested.
>
> The results are located here:
>
> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.
> pdf
>
> I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!
>
> Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.
>
> Thanks,
> Mark
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
>
> ________________________________
>
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>
> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay \x1dʇڙ ,j   f   h   z \x1e w   
   j:+v   w j m         zZ+     ݢj"  !tml=
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 20:29       ` Somnath Roy
@ 2015-08-19 20:31         ` Stefan Priebe
  2015-08-19 20:34           ` Somnath Roy
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Priebe @ 2015-08-19 20:31 UTC (permalink / raw)
  To: Somnath Roy, Alexandre DERUMIER, Mark Nelson; +Cc: ceph-devel


Am 19.08.2015 um 22:29 schrieb Somnath Roy:
> Hmm...We need to fix that as part of configure/Makefile I guess (?)..
> Since we have done this jemalloc integration originally, we can take that ownership unless anybody sees a problem of enabling tcmalloc/jemalloc with librbd/librados.
>
> << You have to remove libcmalloc out of your build environment to get this done
> How do I do that ? I am using Ubuntu and can't afford to remove libc* packages.

I always use a chroot to build packages where only a minimal bootstrap + 
the build deps are installed. googleperftools where libtcmalloc comes 
from is not Ubuntu "core/minimal".

Stefan

>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
> Sent: Wednesday, August 19, 2015 1:18 PM
> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
> Cc: ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
>
> Am 19.08.2015 um 22:16 schrieb Somnath Roy:
>> Alexandre,
>> I am not able to build librados/librbd by using the following config option.
>>
>> ./configure –without-tcmalloc –with-jemalloc
>
> Same issue to me. You have to remove libcmalloc out of your build environment to get this done.
>
> Stefan
>
>
>> It seems it is building osd/mon/Mds/RGW with jemalloc enabled..
>>
>> root@emsnode10:~/ceph-latest/src# ldd ./ceph-osd
>>           linux-vdso.so.1 =>  (0x00007ffd0eb43000)
>>           libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1 (0x00007f5f92d70000)
>>           .......
>>
>> root@emsnode10:~/ceph-latest/src/.libs# ldd ./librados.so.2.0.0
>>           linux-vdso.so.1 =>  (0x00007ffed46f2000)
>>           libboost_thread.so.1.55.0 => /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.55.0 (0x00007ff687887000)
>>           liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0 (0x00007ff68763d000)
>>           libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff687438000)
>>           libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff68721a000)
>>           libnss3.so => /usr/lib/x86_64-linux-gnu/libnss3.so (0x00007ff686ee0000)
>>           libsmime3.so => /usr/lib/x86_64-linux-gnu/libsmime3.so (0x00007ff686cb3000)
>>           libnspr4.so => /usr/lib/x86_64-linux-gnu/libnspr4.so (0x00007ff686a76000)
>>           libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007ff686871000)
>>           librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff686668000)
>>           libboost_system.so.1.55.0 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.55.0 (0x00007ff686464000)
>>           libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff686160000)
>>           libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff685e59000)
>>           libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff685a94000)
>>           libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff68587e000)
>>           liblttng-ust-tracepoint.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0 (0x00007ff685663000)
>>           liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x00007ff68545c000)
>>           liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x00007ff685255000)
>>           /lib64/ld-linux-x86-64.so.2 (0x00007ff68a0f6000)
>>           libnssutil3.so => /usr/lib/x86_64-linux-gnu/libnssutil3.so (0x00007ff685029000)
>>           libplc4.so => /usr/lib/x86_64-linux-gnu/libplc4.so (0x00007ff684e24000)
>>           libplds4.so => /usr/lib/x86_64-linux-gnu/libplds4.so
>> (0x00007ff684c20000)
>>
>> It is building with libcmalloc always...
>>
>> Did you change the ceph makefiles to build librbd/librados with jemalloc ?
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre
>> DERUMIER
>> Sent: Wednesday, August 19, 2015 7:01 AM
>> To: Mark Nelson
>> Cc: ceph-devel
>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>>
>> Thanks Marc,
>>
>> Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc.
>>
>> and indeed tcmalloc, even with bigger cache, seem decrease over time.
>>
>>
>> What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads.
>>
>>
>> Switching both server and client to jemalloc give me best performance on small read currently.
>>
>>
>>
>>
>>
>>
>> ----- Mail original -----
>> De: "Mark Nelson" <mnelson@redhat.com>
>> À: "ceph-devel" <ceph-devel@vger.kernel.org>
>> Envoyé: Mercredi 19 Août 2015 06:45:36
>> Objet: Ceph Hackathon: More Memory Allocator Testing
>>
>> Hi Everyone,
>>
>> One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested.
>>
>> The results are located here:
>>
>> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.
>> pdf
>>
>> I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!
>>
>> Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.
>>
>> Thanks,
>> Mark
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo
>> info at http://vger.kernel.org/majordomo-info.html
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo
>> info at  http://vger.kernel.org/majordomo-info.html
>>
>> ________________________________
>>
>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>
>> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay \x1dʇڙ ,j   f   h   z \x1e w
>     j:+v   w j m         zZ+     ݢj"  !tml=
>>
> N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay�\x1dʇڙ�,j\a��f���h���z�\x1e�w���\f���j:+v���w�j�m����\a����zZ+�����ݢj"��!tml=
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 20:31         ` Stefan Priebe
@ 2015-08-19 20:34           ` Somnath Roy
  2015-08-19 20:40             ` Stefan Priebe
  0 siblings, 1 reply; 55+ messages in thread
From: Somnath Roy @ 2015-08-19 20:34 UTC (permalink / raw)
  To: Stefan Priebe, Alexandre DERUMIER, Mark Nelson; +Cc: ceph-devel

But, you said you need to remove libcmalloc *not* libtcmalloc...
I saw librbd/librados is built with libcmalloc not with libtcmalloc..
So, are you saying to remove libtcmalloc (not libcmalloc) to enable jemalloc ?

-----Original Message-----
From: Stefan Priebe [mailto:s.priebe@profihost.ag] 
Sent: Wednesday, August 19, 2015 1:31 PM
To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
Cc: ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing


Am 19.08.2015 um 22:29 schrieb Somnath Roy:
> Hmm...We need to fix that as part of configure/Makefile I guess (?)..
> Since we have done this jemalloc integration originally, we can take that ownership unless anybody sees a problem of enabling tcmalloc/jemalloc with librbd/librados.
>
> << You have to remove libcmalloc out of your build environment to get 
> this done How do I do that ? I am using Ubuntu and can't afford to remove libc* packages.

I always use a chroot to build packages where only a minimal bootstrap + the build deps are installed. googleperftools where libtcmalloc comes from is not Ubuntu "core/minimal".

Stefan

>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
> Sent: Wednesday, August 19, 2015 1:18 PM
> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
> Cc: ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
>
> Am 19.08.2015 um 22:16 schrieb Somnath Roy:
>> Alexandre,
>> I am not able to build librados/librbd by using the following config option.
>>
>> ./configure –without-tcmalloc –with-jemalloc
>
> Same issue to me. You have to remove libcmalloc out of your build environment to get this done.
>
> Stefan
>
>
>> It seems it is building osd/mon/Mds/RGW with jemalloc enabled..
>>
>> root@emsnode10:~/ceph-latest/src# ldd ./ceph-osd
>>           linux-vdso.so.1 =>  (0x00007ffd0eb43000)
>>           libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1 (0x00007f5f92d70000)
>>           .......
>>
>> root@emsnode10:~/ceph-latest/src/.libs# ldd ./librados.so.2.0.0
>>           linux-vdso.so.1 =>  (0x00007ffed46f2000)
>>           libboost_thread.so.1.55.0 => /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.55.0 (0x00007ff687887000)
>>           liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0 (0x00007ff68763d000)
>>           libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff687438000)
>>           libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff68721a000)
>>           libnss3.so => /usr/lib/x86_64-linux-gnu/libnss3.so (0x00007ff686ee0000)
>>           libsmime3.so => /usr/lib/x86_64-linux-gnu/libsmime3.so (0x00007ff686cb3000)
>>           libnspr4.so => /usr/lib/x86_64-linux-gnu/libnspr4.so (0x00007ff686a76000)
>>           libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007ff686871000)
>>           librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff686668000)
>>           libboost_system.so.1.55.0 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.55.0 (0x00007ff686464000)
>>           libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff686160000)
>>           libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff685e59000)
>>           libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff685a94000)
>>           libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff68587e000)
>>           liblttng-ust-tracepoint.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0 (0x00007ff685663000)
>>           liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x00007ff68545c000)
>>           liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x00007ff685255000)
>>           /lib64/ld-linux-x86-64.so.2 (0x00007ff68a0f6000)
>>           libnssutil3.so => /usr/lib/x86_64-linux-gnu/libnssutil3.so (0x00007ff685029000)
>>           libplc4.so => /usr/lib/x86_64-linux-gnu/libplc4.so (0x00007ff684e24000)
>>           libplds4.so => /usr/lib/x86_64-linux-gnu/libplds4.so
>> (0x00007ff684c20000)
>>
>> It is building with libcmalloc always...
>>
>> Did you change the ceph makefiles to build librbd/librados with jemalloc ?
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org 
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre 
>> DERUMIER
>> Sent: Wednesday, August 19, 2015 7:01 AM
>> To: Mark Nelson
>> Cc: ceph-devel
>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>>
>> Thanks Marc,
>>
>> Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc.
>>
>> and indeed tcmalloc, even with bigger cache, seem decrease over time.
>>
>>
>> What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads.
>>
>>
>> Switching both server and client to jemalloc give me best performance on small read currently.
>>
>>
>>
>>
>>
>>
>> ----- Mail original -----
>> De: "Mark Nelson" <mnelson@redhat.com>
>> À: "ceph-devel" <ceph-devel@vger.kernel.org>
>> Envoyé: Mercredi 19 Août 2015 06:45:36
>> Objet: Ceph Hackathon: More Memory Allocator Testing
>>
>> Hi Everyone,
>>
>> One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested.
>>
>> The results are located here:
>>
>> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.
>> pdf
>>
>> I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!
>>
>> Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.
>>
>> Thanks,
>> Mark
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at http://vger.kernel.org/majordomo-info.html
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>>
>> ________________________________
>>
>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>
>> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay \x1dʇڙ ,j   f   h   z \x1e w
>     j:+v   w j m         zZ+     ݢj"  !tml=
>>
> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay \x1dʇڙ ,j   f   h   z \x1e w   
   j:+v   w j m         zZ+     ݢj"  !tml=
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 20:34           ` Somnath Roy
@ 2015-08-19 20:40             ` Stefan Priebe
  2015-08-19 20:44               ` Somnath Roy
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Priebe @ 2015-08-19 20:40 UTC (permalink / raw)
  To: Somnath Roy, Alexandre DERUMIER, Mark Nelson; +Cc: ceph-devel


Am 19.08.2015 um 22:34 schrieb Somnath Roy:
> But, you said you need to remove libcmalloc *not* libtcmalloc...
> I saw librbd/librados is built with libcmalloc not with libtcmalloc..
> So, are you saying to remove libtcmalloc (not libcmalloc) to enable jemalloc ?

Ouch my mistake. I read libtcmalloc - too late here.

My build (Hammer) says:
# ldd /usr/lib/librados.so.2.0.0
         linux-vdso.so.1 =>  (0x00007fff4f71d000)
         libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fafdb26c000)
         libboost_thread.so.1.49.0 => /usr/lib/libboost_thread.so.1.49.0 
(0x00007fafdb24f000)
         libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 
(0x00007fafdb032000)
         libcrypto++.so.9 => /usr/lib/libcrypto++.so.9 (0x00007fafda924000)
         libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 
(0x00007fafda71f000)
         librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fafda516000)
         libboost_system.so.1.49.0 => /usr/lib/libboost_system.so.1.49.0 
(0x00007fafda512000)
         libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 
(0x00007fafda20b000)
         libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fafd9f88000)
         libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fafd9bfd000)
         libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 
(0x00007fafd99e7000)
         /lib64/ld-linux-x86-64.so.2 (0x000056358ecfe000)

Only ceph-osd is linked against libjemalloc for me.

Stefan

> -----Original Message-----
> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
> Sent: Wednesday, August 19, 2015 1:31 PM
> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
> Cc: ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
>
> Am 19.08.2015 um 22:29 schrieb Somnath Roy:
>> Hmm...We need to fix that as part of configure/Makefile I guess (?)..
>> Since we have done this jemalloc integration originally, we can take that ownership unless anybody sees a problem of enabling tcmalloc/jemalloc with librbd/librados.
>>
>> << You have to remove libcmalloc out of your build environment to get
>> this done How do I do that ? I am using Ubuntu and can't afford to remove libc* packages.
>
> I always use a chroot to build packages where only a minimal bootstrap + the build deps are installed. googleperftools where libtcmalloc comes from is not Ubuntu "core/minimal".
>
> Stefan
>
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
>> Sent: Wednesday, August 19, 2015 1:18 PM
>> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
>> Cc: ceph-devel
>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>>
>>
>> Am 19.08.2015 um 22:16 schrieb Somnath Roy:
>>> Alexandre,
>>> I am not able to build librados/librbd by using the following config option.
>>>
>>> ./configure –without-tcmalloc –with-jemalloc
>>
>> Same issue to me. You have to remove libcmalloc out of your build environment to get this done.
>>
>> Stefan
>>
>>
>>> It seems it is building osd/mon/Mds/RGW with jemalloc enabled..
>>>
>>> root@emsnode10:~/ceph-latest/src# ldd ./ceph-osd
>>>            linux-vdso.so.1 =>  (0x00007ffd0eb43000)
>>>            libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1 (0x00007f5f92d70000)
>>>            .......
>>>
>>> root@emsnode10:~/ceph-latest/src/.libs# ldd ./librados.so.2.0.0
>>>            linux-vdso.so.1 =>  (0x00007ffed46f2000)
>>>            libboost_thread.so.1.55.0 => /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.55.0 (0x00007ff687887000)
>>>            liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0 (0x00007ff68763d000)
>>>            libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff687438000)
>>>            libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff68721a000)
>>>            libnss3.so => /usr/lib/x86_64-linux-gnu/libnss3.so (0x00007ff686ee0000)
>>>            libsmime3.so => /usr/lib/x86_64-linux-gnu/libsmime3.so (0x00007ff686cb3000)
>>>            libnspr4.so => /usr/lib/x86_64-linux-gnu/libnspr4.so (0x00007ff686a76000)
>>>            libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007ff686871000)
>>>            librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff686668000)
>>>            libboost_system.so.1.55.0 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.55.0 (0x00007ff686464000)
>>>            libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff686160000)
>>>            libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff685e59000)
>>>            libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff685a94000)
>>>            libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff68587e000)
>>>            liblttng-ust-tracepoint.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0 (0x00007ff685663000)
>>>            liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x00007ff68545c000)
>>>            liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x00007ff685255000)
>>>            /lib64/ld-linux-x86-64.so.2 (0x00007ff68a0f6000)
>>>            libnssutil3.so => /usr/lib/x86_64-linux-gnu/libnssutil3.so (0x00007ff685029000)
>>>            libplc4.so => /usr/lib/x86_64-linux-gnu/libplc4.so (0x00007ff684e24000)
>>>            libplds4.so => /usr/lib/x86_64-linux-gnu/libplds4.so
>>> (0x00007ff684c20000)
>>>
>>> It is building with libcmalloc always...
>>>
>>> Did you change the ceph makefiles to build librbd/librados with jemalloc ?
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre
>>> DERUMIER
>>> Sent: Wednesday, August 19, 2015 7:01 AM
>>> To: Mark Nelson
>>> Cc: ceph-devel
>>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>>>
>>> Thanks Marc,
>>>
>>> Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc.
>>>
>>> and indeed tcmalloc, even with bigger cache, seem decrease over time.
>>>
>>>
>>> What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads.
>>>
>>>
>>> Switching both server and client to jemalloc give me best performance on small read currently.
>>>
>>>
>>>
>>>
>>>
>>>
>>> ----- Mail original -----
>>> De: "Mark Nelson" <mnelson@redhat.com>
>>> À: "ceph-devel" <ceph-devel@vger.kernel.org>
>>> Envoyé: Mercredi 19 Août 2015 06:45:36
>>> Objet: Ceph Hackathon: More Memory Allocator Testing
>>>
>>> Hi Everyone,
>>>
>>> One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested.
>>>
>>> The results are located here:
>>>
>>> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.
>>> pdf
>>>
>>> I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!
>>>
>>> Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.
>>>
>>> Thanks,
>>> Mark
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>> info at http://vger.kernel.org/majordomo-info.html
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>> info at  http://vger.kernel.org/majordomo-info.html
>>>
>>> ________________________________
>>>
>>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>>
>>> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay \x1dʇڙ ,j   f   h   z \x1e w
>>      j:+v   w j m         zZ+     ݢj"  !tml=
>>>
>> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay \x1dʇڙ ,j   f   h   z \x1e w
>     j:+v   w j m         zZ+     ݢj"  !tml=
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 20:40             ` Stefan Priebe
@ 2015-08-19 20:44               ` Somnath Roy
  2015-08-21  3:45                 ` Shishir Gowda
  2015-08-21  4:22                 ` Shishir Gowda
  0 siblings, 2 replies; 55+ messages in thread
From: Somnath Roy @ 2015-08-19 20:44 UTC (permalink / raw)
  To: Stefan Priebe, Alexandre DERUMIER, Mark Nelson; +Cc: ceph-devel

Yeah , I can see ceph-osd/ceph-mon built with jemalloc.

Thanks & Regards
Somnath

-----Original Message-----
From: Stefan Priebe [mailto:s.priebe@profihost.ag] 
Sent: Wednesday, August 19, 2015 1:41 PM
To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
Cc: ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing


Am 19.08.2015 um 22:34 schrieb Somnath Roy:
> But, you said you need to remove libcmalloc *not* libtcmalloc...
> I saw librbd/librados is built with libcmalloc not with libtcmalloc..
> So, are you saying to remove libtcmalloc (not libcmalloc) to enable jemalloc ?

Ouch my mistake. I read libtcmalloc - too late here.

My build (Hammer) says:
# ldd /usr/lib/librados.so.2.0.0
         linux-vdso.so.1 =>  (0x00007fff4f71d000)
         libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fafdb26c000)
         libboost_thread.so.1.49.0 => /usr/lib/libboost_thread.so.1.49.0
(0x00007fafdb24f000)
         libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
(0x00007fafdb032000)
         libcrypto++.so.9 => /usr/lib/libcrypto++.so.9 (0x00007fafda924000)
         libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1
(0x00007fafda71f000)
         librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fafda516000)
         libboost_system.so.1.49.0 => /usr/lib/libboost_system.so.1.49.0
(0x00007fafda512000)
         libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(0x00007fafda20b000)
         libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fafd9f88000)
         libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fafd9bfd000)
         libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
(0x00007fafd99e7000)
         /lib64/ld-linux-x86-64.so.2 (0x000056358ecfe000)

Only ceph-osd is linked against libjemalloc for me.

Stefan

> -----Original Message-----
> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
> Sent: Wednesday, August 19, 2015 1:31 PM
> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
> Cc: ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
>
> Am 19.08.2015 um 22:29 schrieb Somnath Roy:
>> Hmm...We need to fix that as part of configure/Makefile I guess (?)..
>> Since we have done this jemalloc integration originally, we can take that ownership unless anybody sees a problem of enabling tcmalloc/jemalloc with librbd/librados.
>>
>> << You have to remove libcmalloc out of your build environment to get 
>> this done How do I do that ? I am using Ubuntu and can't afford to remove libc* packages.
>
> I always use a chroot to build packages where only a minimal bootstrap + the build deps are installed. googleperftools where libtcmalloc comes from is not Ubuntu "core/minimal".
>
> Stefan
>
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
>> Sent: Wednesday, August 19, 2015 1:18 PM
>> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
>> Cc: ceph-devel
>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>>
>>
>> Am 19.08.2015 um 22:16 schrieb Somnath Roy:
>>> Alexandre,
>>> I am not able to build librados/librbd by using the following config option.
>>>
>>> ./configure –without-tcmalloc –with-jemalloc
>>
>> Same issue to me. You have to remove libcmalloc out of your build environment to get this done.
>>
>> Stefan
>>
>>
>>> It seems it is building osd/mon/Mds/RGW with jemalloc enabled..
>>>
>>> root@emsnode10:~/ceph-latest/src# ldd ./ceph-osd
>>>            linux-vdso.so.1 =>  (0x00007ffd0eb43000)
>>>            libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1 (0x00007f5f92d70000)
>>>            .......
>>>
>>> root@emsnode10:~/ceph-latest/src/.libs# ldd ./librados.so.2.0.0
>>>            linux-vdso.so.1 =>  (0x00007ffed46f2000)
>>>            libboost_thread.so.1.55.0 => /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.55.0 (0x00007ff687887000)
>>>            liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0 (0x00007ff68763d000)
>>>            libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff687438000)
>>>            libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff68721a000)
>>>            libnss3.so => /usr/lib/x86_64-linux-gnu/libnss3.so (0x00007ff686ee0000)
>>>            libsmime3.so => /usr/lib/x86_64-linux-gnu/libsmime3.so (0x00007ff686cb3000)
>>>            libnspr4.so => /usr/lib/x86_64-linux-gnu/libnspr4.so (0x00007ff686a76000)
>>>            libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007ff686871000)
>>>            librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff686668000)
>>>            libboost_system.so.1.55.0 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.55.0 (0x00007ff686464000)
>>>            libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff686160000)
>>>            libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff685e59000)
>>>            libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff685a94000)
>>>            libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff68587e000)
>>>            liblttng-ust-tracepoint.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0 (0x00007ff685663000)
>>>            liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x00007ff68545c000)
>>>            liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x00007ff685255000)
>>>            /lib64/ld-linux-x86-64.so.2 (0x00007ff68a0f6000)
>>>            libnssutil3.so => /usr/lib/x86_64-linux-gnu/libnssutil3.so (0x00007ff685029000)
>>>            libplc4.so => /usr/lib/x86_64-linux-gnu/libplc4.so (0x00007ff684e24000)
>>>            libplds4.so => /usr/lib/x86_64-linux-gnu/libplds4.so
>>> (0x00007ff684c20000)
>>>
>>> It is building with libcmalloc always...
>>>
>>> Did you change the ceph makefiles to build librbd/librados with jemalloc ?
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org 
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre 
>>> DERUMIER
>>> Sent: Wednesday, August 19, 2015 7:01 AM
>>> To: Mark Nelson
>>> Cc: ceph-devel
>>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>>>
>>> Thanks Marc,
>>>
>>> Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc.
>>>
>>> and indeed tcmalloc, even with bigger cache, seem decrease over time.
>>>
>>>
>>> What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads.
>>>
>>>
>>> Switching both server and client to jemalloc give me best performance on small read currently.
>>>
>>>
>>>
>>>
>>>
>>>
>>> ----- Mail original -----
>>> De: "Mark Nelson" <mnelson@redhat.com>
>>> À: "ceph-devel" <ceph-devel@vger.kernel.org>
>>> Envoyé: Mercredi 19 Août 2015 06:45:36
>>> Objet: Ceph Hackathon: More Memory Allocator Testing
>>>
>>> Hi Everyone,
>>>
>>> One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested.
>>>
>>> The results are located here:
>>>
>>> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.
>>> pdf
>>>
>>> I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!
>>>
>>> Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.
>>>
>>> Thanks,
>>> Mark
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>>> info at http://vger.kernel.org/majordomo-info.html
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>>> info at  http://vger.kernel.org/majordomo-info.html
>>>
>>> ________________________________
>>>
>>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>>
>>> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay \x1dʇڙ ,j   f   h   z \x1e w
>>      j:+v   w j m         zZ+     ݢj"  !tml=
>>>
>> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay \x1dʇڙ ,j   f   h   z \x1e w
>     j:+v   w j m         zZ+     ݢj"  !tml=
>>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19  4:45 Ceph Hackathon: More Memory Allocator Testing Mark Nelson
                   ` (3 preceding siblings ...)
  2015-08-19 14:01 ` Alexandre DERUMIER
@ 2015-08-19 20:50 ` Zhang, Jian
  4 siblings, 0 replies; 55+ messages in thread
From: Zhang, Jian @ 2015-08-19 20:50 UTC (permalink / raw)
  To: Mark Nelson, ceph-devel

Mark, 
Great dutiful report! Really helpful to understanding the impact in difference scenarios. 

Thanks
Jian

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Tuesday, August 18, 2015 9:46 PM
To: ceph-devel
Subject: Ceph Hackathon: More Memory Allocator Testing

Hi Everyone,

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance.  Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc.  His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1.  To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations.  I've since written the results of those tests up in pdf form for folks who are interested.

The results are located here:

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf

I want to be clear that many other folks have done the heavy lifting here.  These results are simply a validation of the many tests that other folks have already done.  Many thanks to Sandisk and others for figuring this out as it's a pretty big deal!

Side note:  Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are.

Thanks,
Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 18:47           ` Alexandre DERUMIER
@ 2015-08-20  1:09             ` Blinick, Stephen L
  2015-08-20  2:00               ` Shinobu Kinjo
  0 siblings, 1 reply; 55+ messages in thread
From: Blinick, Stephen L @ 2015-08-20  1:09 UTC (permalink / raw)
  To: Alexandre DERUMIER, Somnath Roy; +Cc: Mark Nelson, ceph-devel

Would it make more sense to try this comparison while changing the size of the worker thread pool?  i.e.  changing "osd_op_num_threads_per_shard" and "osd_op_num_shards"   (default is currently 2 and 5 respectively, for a total of 10 worker threads).

Thanks,

Stephen


-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
Sent: Wednesday, August 19, 2015 11:47 AM
To: Somnath Roy
Cc: Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

Just have done a small test with jemalloc, change osd_op_threads value, and check the memory just after daemon restart.

osd_op_threads = 2 (default)


USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      10246  6.0  0.3 1086656 245760 ?      Ssl  20:36   0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f

osd_op_threads = 32

USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      10736 19.5  0.4 1474672 307412 ?      Ssl  20:37   0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f



I'll try to compare with tcmalloc tommorow and under load.



----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "aderumier" <aderumier@odiso.com>
Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 19:29:56
Objet: RE: Ceph Hackathon: More Memory Allocator Testing

Yes, it should be 1 per OSD... 
There is no doubt that TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is relative to the number of threads running.. 
But, I don't know if number of threads is a factor for jemalloc.. 

Thanks & Regards
Somnath 

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
Sent: Wednesday, August 19, 2015 9:55 AM
To: Somnath Roy
Cc: Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

>>I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

What is num_tcmalloc_instance ? I think 1 osd process use a defined TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size ? 

I'm saying that, because I have exactly the same bug, client side, with librbd + tcmalloc + qemu + iothreads. 
When I defined too much iothread threads, I'm hitting the bug directly. (can reproduce 100%). 
Like the thread_cache size is divide by number of threads? 






----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 18:27:30
Objet: RE: Ceph Hackathon: More Memory Allocator Testing 

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

Also, I think there is no point of increasing osd_op_threads as it is not in IO path anymore..Mark is using default 5:2 for shard:thread per shard.. 

But, yes, it could be related to number of threads OSDs are using, need to understand how jemalloc works..Also, there may be some tuning to reduce memory usage (?). 

Thanks & Regards
Somnath 

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
Sent: Wednesday, August 19, 2015 9:06 AM
To: Mark Nelson
Cc: ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

I was listening at the today meeting, 

and seem that the blocker to have jemalloc as default, 

is that it's used more memory by osd (around 300MB?), and some guys could have boxes with 60disks. 


I just wonder if the memory increase is related to osd_op_num_shards/osd_op_threads value ? 

Seem that as hackaton, the bench has been done on super big cpus boxed 36cores/72T, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx
with osd_op_threads = 32. 

I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

Maybe jemalloc allocated memory by threads. 



(I think guys with 60disks box, dont use ssd, so low iops by osd, and they don't need a lot of threads by osd) 



----- Mail original -----
De: "aderumier" <aderumier@odiso.com>
À: "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 16:01:28
Objet: Re: Ceph Hackathon: More Memory Allocator Testing 

Thanks Marc, 

Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc. 

and indeed tcmalloc, even with bigger cache, seem decrease over time. 


What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads. 


Switching both server and client to jemalloc give me best performance on small read currently. 






----- Mail original -----
De: "Mark Nelson" <mnelson@redhat.com>
À: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 06:45:36
Objet: Ceph Hackathon: More Memory Allocator Testing 

Hi Everyone, 

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested. 

The results are located here: 

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf 

I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal! 

Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are. 

Thanks,
Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-20  1:09             ` Blinick, Stephen L
@ 2015-08-20  2:00               ` Shinobu Kinjo
  2015-08-20  5:29                 ` Alexandre DERUMIER
  0 siblings, 1 reply; 55+ messages in thread
From: Shinobu Kinjo @ 2015-08-20  2:00 UTC (permalink / raw)
  To: Stephen L Blinick
  Cc: Alexandre DERUMIER, Somnath Roy, Mark Nelson, ceph-devel

How about making any sheet for testing patter?

 Shinobu

----- Original Message -----
From: "Stephen L Blinick" <stephen.l.blinick@intel.com>
To: "Alexandre DERUMIER" <aderumier@odiso.com>, "Somnath Roy" <Somnath.Roy@sandisk.com>
Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Sent: Thursday, August 20, 2015 10:09:36 AM
Subject: RE: Ceph Hackathon: More Memory Allocator Testing

Would it make more sense to try this comparison while changing the size of the worker thread pool?  i.e.  changing "osd_op_num_threads_per_shard" and "osd_op_num_shards"   (default is currently 2 and 5 respectively, for a total of 10 worker threads).

Thanks,

Stephen


-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
Sent: Wednesday, August 19, 2015 11:47 AM
To: Somnath Roy
Cc: Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

Just have done a small test with jemalloc, change osd_op_threads value, and check the memory just after daemon restart.

osd_op_threads = 2 (default)


USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      10246  6.0  0.3 1086656 245760 ?      Ssl  20:36   0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f

osd_op_threads = 32

USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      10736 19.5  0.4 1474672 307412 ?      Ssl  20:37   0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f



I'll try to compare with tcmalloc tommorow and under load.



----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "aderumier" <aderumier@odiso.com>
Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 19:29:56
Objet: RE: Ceph Hackathon: More Memory Allocator Testing

Yes, it should be 1 per OSD... 
There is no doubt that TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is relative to the number of threads running.. 
But, I don't know if number of threads is a factor for jemalloc.. 

Thanks & Regards
Somnath 

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
Sent: Wednesday, August 19, 2015 9:55 AM
To: Somnath Roy
Cc: Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

>>I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

What is num_tcmalloc_instance ? I think 1 osd process use a defined TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size ? 

I'm saying that, because I have exactly the same bug, client side, with librbd + tcmalloc + qemu + iothreads. 
When I defined too much iothread threads, I'm hitting the bug directly. (can reproduce 100%). 
Like the thread_cache size is divide by number of threads? 






----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 18:27:30
Objet: RE: Ceph Hackathon: More Memory Allocator Testing 

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

Also, I think there is no point of increasing osd_op_threads as it is not in IO path anymore..Mark is using default 5:2 for shard:thread per shard.. 

But, yes, it could be related to number of threads OSDs are using, need to understand how jemalloc works..Also, there may be some tuning to reduce memory usage (?). 

Thanks & Regards
Somnath 

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
Sent: Wednesday, August 19, 2015 9:06 AM
To: Mark Nelson
Cc: ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

I was listening at the today meeting, 

and seem that the blocker to have jemalloc as default, 

is that it's used more memory by osd (around 300MB?), and some guys could have boxes with 60disks. 


I just wonder if the memory increase is related to osd_op_num_shards/osd_op_threads value ? 

Seem that as hackaton, the bench has been done on super big cpus boxed 36cores/72T, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx
with osd_op_threads = 32. 

I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

Maybe jemalloc allocated memory by threads. 



(I think guys with 60disks box, dont use ssd, so low iops by osd, and they don't need a lot of threads by osd) 



----- Mail original -----
De: "aderumier" <aderumier@odiso.com>
À: "Mark Nelson" <mnelson@redhat.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 16:01:28
Objet: Re: Ceph Hackathon: More Memory Allocator Testing 

Thanks Marc, 

Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc. 

and indeed tcmalloc, even with bigger cache, seem decrease over time. 


What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads. 


Switching both server and client to jemalloc give me best performance on small read currently. 






----- Mail original -----
De: "Mark Nelson" <mnelson@redhat.com>
À: "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Mercredi 19 Août 2015 06:45:36
Objet: Ceph Hackathon: More Memory Allocator Testing 

Hi Everyone, 

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested. 

The results are located here: 

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf 

I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal! 

Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are. 

Thanks,
Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay�ʇڙ�,j��f���h���z��w������j:+v���w�j�m��������zZ+��ݢj"��
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-20  2:00               ` Shinobu Kinjo
@ 2015-08-20  5:29                 ` Alexandre DERUMIER
  2015-08-20  8:17                   ` Alexandre DERUMIER
  0 siblings, 1 reply; 55+ messages in thread
From: Alexandre DERUMIER @ 2015-08-20  5:29 UTC (permalink / raw)
  To: Shinobu Kinjo; +Cc: Stephen L Blinick, Somnath Roy, Mark Nelson, ceph-devel

Hi,

jemmaloc 4.0 has been released 2 days agos

https://github.com/jemalloc/jemalloc/releases

I'm curious to see performance/memory usage improvement :)


----- Mail original -----
De: "Shinobu Kinjo" <skinjo@redhat.com>
À: "Stephen L Blinick" <stephen.l.blinick@intel.com>
Cc: "aderumier" <aderumier@odiso.com>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Jeudi 20 Août 2015 04:00:15
Objet: Re: Ceph Hackathon: More Memory Allocator Testing

How about making any sheet for testing patter? 

Shinobu 

----- Original Message ----- 
From: "Stephen L Blinick" <stephen.l.blinick@intel.com> 
To: "Alexandre DERUMIER" <aderumier@odiso.com>, "Somnath Roy" <Somnath.Roy@sandisk.com> 
Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org> 
Sent: Thursday, August 20, 2015 10:09:36 AM 
Subject: RE: Ceph Hackathon: More Memory Allocator Testing 

Would it make more sense to try this comparison while changing the size of the worker thread pool? i.e. changing "osd_op_num_threads_per_shard" and "osd_op_num_shards" (default is currently 2 and 5 respectively, for a total of 10 worker threads). 

Thanks, 

Stephen 


-----Original Message----- 
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER 
Sent: Wednesday, August 19, 2015 11:47 AM 
To: Somnath Roy 
Cc: Mark Nelson; ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

Just have done a small test with jemalloc, change osd_op_threads value, and check the memory just after daemon restart. 

osd_op_threads = 2 (default) 


USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND 
root 10246 6.0 0.3 1086656 245760 ? Ssl 20:36 0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f 

osd_op_threads = 32 

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND 
root 10736 19.5 0.4 1474672 307412 ? Ssl 20:37 0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f 



I'll try to compare with tcmalloc tommorow and under load. 



----- Mail original ----- 
De: "Somnath Roy" <Somnath.Roy@sandisk.com> 
À: "aderumier" <aderumier@odiso.com> 
Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 19:29:56 
Objet: RE: Ceph Hackathon: More Memory Allocator Testing 

Yes, it should be 1 per OSD... 
There is no doubt that TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is relative to the number of threads running.. 
But, I don't know if number of threads is a factor for jemalloc.. 

Thanks & Regards 
Somnath 

-----Original Message----- 
From: Alexandre DERUMIER [mailto:aderumier@odiso.com] 
Sent: Wednesday, August 19, 2015 9:55 AM 
To: Somnath Roy 
Cc: Mark Nelson; ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

>>I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

What is num_tcmalloc_instance ? I think 1 osd process use a defined TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size ? 

I'm saying that, because I have exactly the same bug, client side, with librbd + tcmalloc + qemu + iothreads. 
When I defined too much iothread threads, I'm hitting the bug directly. (can reproduce 100%). 
Like the thread_cache size is divide by number of threads? 






----- Mail original ----- 
De: "Somnath Roy" <Somnath.Roy@sandisk.com> 
À: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com> 
Cc: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 18:27:30 
Objet: RE: Ceph Hackathon: More Memory Allocator Testing 

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

Also, I think there is no point of increasing osd_op_threads as it is not in IO path anymore..Mark is using default 5:2 for shard:thread per shard.. 

But, yes, it could be related to number of threads OSDs are using, need to understand how jemalloc works..Also, there may be some tuning to reduce memory usage (?). 

Thanks & Regards 
Somnath 

-----Original Message----- 
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER 
Sent: Wednesday, August 19, 2015 9:06 AM 
To: Mark Nelson 
Cc: ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

I was listening at the today meeting, 

and seem that the blocker to have jemalloc as default, 

is that it's used more memory by osd (around 300MB?), and some guys could have boxes with 60disks. 


I just wonder if the memory increase is related to osd_op_num_shards/osd_op_threads value ? 

Seem that as hackaton, the bench has been done on super big cpus boxed 36cores/72T, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx 
with osd_op_threads = 32. 

I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

Maybe jemalloc allocated memory by threads. 



(I think guys with 60disks box, dont use ssd, so low iops by osd, and they don't need a lot of threads by osd) 



----- Mail original ----- 
De: "aderumier" <aderumier@odiso.com> 
À: "Mark Nelson" <mnelson@redhat.com> 
Cc: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 16:01:28 
Objet: Re: Ceph Hackathon: More Memory Allocator Testing 

Thanks Marc, 

Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc. 

and indeed tcmalloc, even with bigger cache, seem decrease over time. 


What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads. 


Switching both server and client to jemalloc give me best performance on small read currently. 






----- Mail original ----- 
De: "Mark Nelson" <mnelson@redhat.com> 
À: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 06:45:36 
Objet: Ceph Hackathon: More Memory Allocator Testing 

Hi Everyone, 

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested. 

The results are located here: 

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf 

I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal! 

Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are. 

Thanks, 
Mark 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 
N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay�ʇڙ�,j��f���h���z��w������j:+v���w�j�m��������zZ+��ݢj"�� 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 18:20           ` Allen Samuels
  2015-08-19 18:36             ` Mark Nelson
@ 2015-08-20  6:25             ` Dałek, Piotr
  1 sibling, 0 replies; 55+ messages in thread
From: Dałek, Piotr @ 2015-08-20  6:25 UTC (permalink / raw)
  To: Allen Samuels, Somnath Roy, Alexandre DERUMIER; +Cc: Mark Nelson, ceph-devel

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Allen Samuels
> Sent: Wednesday, August 19, 2015 8:20 PM

> It was a surprising result that the memory allocator is making such a large
> difference in performance. All of the recent work in fiddling with TCmalloc's
> and Jemalloc's various knobs and switches has been excellent a great
> example of group collaboration. But I think it's only a partial optimization of
> the underlying problem. The real take-away from this activity is that the code
> base is doing a LOT of memory allocation/deallocation which is consuming
> substantial CPU time-- regardless of how much we optimize the memory
> allocator, you can't get away from the fact that it macroscopically MATTERs.
> The better long-term solution is to reduce reliance on the general-purpose
> memory allocator and to implement strategies that are more specific to our
> usage model.

That's what some are trying to do right now. See, for example, https://github.com/ceph/ceph/pull/5534 - one of first patches in this patchset increased Ceph performance on small I/O by around 3% - depending on kind of messenger used, it was either decreased CPU usage, increased bandwidth, or both. 

> What really needs to happen initially is to instrument the
> allocation/deallocation. Most likely we'll find that 80+% of the work is coming
> from just a few object classes and it will be easy to create custom allocation
> strategies for those usages.

I've done this in the past, most allocations and deallocations come from bufferlist code itself (for example, in ::rebuild() method). Other than that, it's scattered around entire Ceph code base. Even simple, small message objects (like heartbeats) are constantly allocated and freed. It's especially super-tricky, because buffers are constantly moved between threads, so I guess current Ceph code might constitute a worst-case scenario for memory managers.

> This will lead to even higher performance that's
> much less sensitive to easy-to-misconfigure environmental factors and the
> entire tcmalloc/jemalloc -- oops it uses more memory discussion will go
> away.

That memory issue probably won't go away, most high-performance memory allocators do their best to not return freed memory to the OS too soon, so even if application would free some memory, on the OS side it still would be seen as used.
On the bright side, in worst case scenario (physical RAM exhausted), swapping wouldn't be as big issue here, since OS tracks which memory pages are actually used and would move to swap pages that aren't used -- including allocated by memory allocator, but not used by actual application.

With best regards / Pozdrawiam
Piotr Dałek

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 16:57         ` Blinick, Stephen L
@ 2015-08-20  6:35           ` Dałek, Piotr
  2015-08-20  7:08             ` Haomai Wang
  0 siblings, 1 reply; 55+ messages in thread
From: Dałek, Piotr @ 2015-08-20  6:35 UTC (permalink / raw)
  To: Blinick, Stephen L, Alexandre DERUMIER, Somnath Roy
  Cc: Mark Nelson, ceph-devel

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Blinick, Stephen L
> Sent: Wednesday, August 19, 2015 6:58 PM
> 
> [..
> Regarding the all-HDD or high density HDD nodes, is it certain these issues
> with tcmalloc don't apply, due to lower performance, or would it potentially
> be something that would manifest over a longer period of time
> (weeks/months) of running?   I know we've seen some weirdness attributed
> to tcmalloc on our 10-disk 20-node cluster with HDD's &  SSD journals, but it
> took a few weeks.

And it takes me just a few minutes with rados bench to reproduce this issue on mixed-storage node (SSDs, SAS disks, high-capacity SATA disks, etc). 
See here: http://ceph.predictor.org.pl/cpu_usage_over_time.xlsx 
It gets even worse when rebalancing starts...

With best regards / Pozdrawiam
Piotr Dałek

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-20  6:35           ` Dałek, Piotr
@ 2015-08-20  7:08             ` Haomai Wang
  2015-08-20  7:18               ` Dałek, Piotr
  0 siblings, 1 reply; 55+ messages in thread
From: Haomai Wang @ 2015-08-20  7:08 UTC (permalink / raw)
  To: Dałek, Piotr
  Cc: Blinick, Stephen L, Alexandre DERUMIER, Somnath Roy, Mark Nelson,
	ceph-devel

On Thu, Aug 20, 2015 at 2:35 PM, Dałek, Piotr
<Piotr.Dalek@ts.fujitsu.com> wrote:
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
>> owner@vger.kernel.org] On Behalf Of Blinick, Stephen L
>> Sent: Wednesday, August 19, 2015 6:58 PM
>>
>> [..
>> Regarding the all-HDD or high density HDD nodes, is it certain these issues
>> with tcmalloc don't apply, due to lower performance, or would it potentially
>> be something that would manifest over a longer period of time
>> (weeks/months) of running?   I know we've seen some weirdness attributed
>> to tcmalloc on our 10-disk 20-node cluster with HDD's &  SSD journals, but it
>> took a few weeks.
>
> And it takes me just a few minutes with rados bench to reproduce this issue on mixed-storage node (SSDs, SAS disks, high-capacity SATA disks, etc).
> See here: http://ceph.predictor.org.pl/cpu_usage_over_time.xlsx
> It gets even worse when rebalancing starts...

Cool, it met my thought. I guess the only way to lighten memory
problem is solve this for each heavy memory allocation use case.

>
> With best regards / Pozdrawiam
> Piotr Dałek



-- 
Best Regards,

Wheat
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-20  7:08             ` Haomai Wang
@ 2015-08-20  7:18               ` Dałek, Piotr
  0 siblings, 0 replies; 55+ messages in thread
From: Dałek, Piotr @ 2015-08-20  7:18 UTC (permalink / raw)
  To: Haomai Wang
  Cc: Blinick, Stephen L, Alexandre DERUMIER, Somnath Roy, Mark Nelson,
	ceph-devel

> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@gmail.com]
> Sent: Thursday, August 20, 2015 9:09 AM
>
> > And it takes me just a few minutes with rados bench to reproduce this
> issue on mixed-storage node (SSDs, SAS disks, high-capacity SATA disks, etc).
> > See here: http://ceph.predictor.org.pl/cpu_usage_over_time.xlsx
> > It gets even worse when rebalancing starts...
> 
> Cool, it met my thought. I guess the only way to lighten memory problem is
> solve this for each heavy memory allocation use case.

Unfortunately, yes. 

With best regards / Pozdrawiam
Piotr Dałek

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-20  5:29                 ` Alexandre DERUMIER
@ 2015-08-20  8:17                   ` Alexandre DERUMIER
  2015-08-20 12:54                     ` Shinobu Kinjo
  0 siblings, 1 reply; 55+ messages in thread
From: Alexandre DERUMIER @ 2015-08-20  8:17 UTC (permalink / raw)
  To: Shinobu Kinjo; +Cc: Stephen L Blinick, Somnath Roy, Mark Nelson, ceph-devel

memory results of osd daemon under load,

jemalloc use always more memory than tcmalloc,
jemalloc 4.0 seem to reduce memory usage but still a little bit more than tcmalloc



osd_op_threads=2 : tcmalloc 2.1
------------------------------------------
root      38066  2.3  0.7 1223088 505144 ?      Ssl  08:35   1:32 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
root      38165  2.4  0.7 1247828 525356 ?      Ssl  08:35   1:34 /usr/bin/ceph-osd --cluster=ceph -i 5 -f


osd_op_threads=32: tcmalloc 2.1
------------------------------------------

root      39002  102  0.7 1455928 488584 ?      Ssl  09:41   0:30 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
root      39168  114  0.7 1483752 518368 ?      Ssl  09:41   0:30 /usr/bin/ceph-osd --cluster=ceph -i 5 -f


osd_op_threads=2 jemalloc 3.5
-----------------------------
root      18402 72.0  1.1 1642000 769000 ?      Ssl  09:43   0:17 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
root      18434 89.1  1.2 1677444 797508 ?      Ssl  09:43   0:21 /usr/bin/ceph-osd --cluster=ceph -i 1 -f


osd_op_threads=32 jemalloc 3.5
-----------------------------
root      17204  3.7  1.2 2030616 816520 ?      Ssl  08:35   2:31 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
root      17228  4.6  1.2 2064928 830060 ?      Ssl  08:35   3:05 /usr/bin/ceph-osd --cluster=ceph -i 1 -f


osd_op_threads=2 jemalloc 4.0
-----------------------------
root      19967  113  1.1 1432520 737988 ?      Ssl  10:04   0:31 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
root      19976 93.6  1.0 1409376 711192 ?      Ssl  10:04   0:26 /usr/bin/ceph-osd --cluster=ceph -i 0 -f


osd_op_threads=32 jemalloc 4.0
-----------------------------
root      20484  128  1.1 1689176 778508 ?      Ssl  10:06   0:26 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
root      20502  170  1.2 1720524 810668 ?      Ssl  10:06   0:35 /usr/bin/ceph-osd --cluster=ceph -i 1 -f



----- Mail original -----
De: "aderumier" <aderumier@odiso.com>
À: "Shinobu Kinjo" <skinjo@redhat.com>
Cc: "Stephen L Blinick" <stephen.l.blinick@intel.com>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Jeudi 20 Août 2015 07:29:22
Objet: Re: Ceph Hackathon: More Memory Allocator Testing

Hi, 

jemmaloc 4.0 has been released 2 days agos 

https://github.com/jemalloc/jemalloc/releases 

I'm curious to see performance/memory usage improvement :) 


----- Mail original ----- 
De: "Shinobu Kinjo" <skinjo@redhat.com> 
À: "Stephen L Blinick" <stephen.l.blinick@intel.com> 
Cc: "aderumier" <aderumier@odiso.com>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Jeudi 20 Août 2015 04:00:15 
Objet: Re: Ceph Hackathon: More Memory Allocator Testing 

How about making any sheet for testing patter? 

Shinobu 

----- Original Message ----- 
From: "Stephen L Blinick" <stephen.l.blinick@intel.com> 
To: "Alexandre DERUMIER" <aderumier@odiso.com>, "Somnath Roy" <Somnath.Roy@sandisk.com> 
Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org> 
Sent: Thursday, August 20, 2015 10:09:36 AM 
Subject: RE: Ceph Hackathon: More Memory Allocator Testing 

Would it make more sense to try this comparison while changing the size of the worker thread pool? i.e. changing "osd_op_num_threads_per_shard" and "osd_op_num_shards" (default is currently 2 and 5 respectively, for a total of 10 worker threads). 

Thanks, 

Stephen 


-----Original Message----- 
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER 
Sent: Wednesday, August 19, 2015 11:47 AM 
To: Somnath Roy 
Cc: Mark Nelson; ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

Just have done a small test with jemalloc, change osd_op_threads value, and check the memory just after daemon restart. 

osd_op_threads = 2 (default) 


USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND 
root 10246 6.0 0.3 1086656 245760 ? Ssl 20:36 0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f 

osd_op_threads = 32 

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND 
root 10736 19.5 0.4 1474672 307412 ? Ssl 20:37 0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f 



I'll try to compare with tcmalloc tommorow and under load. 



----- Mail original ----- 
De: "Somnath Roy" <Somnath.Roy@sandisk.com> 
À: "aderumier" <aderumier@odiso.com> 
Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 19:29:56 
Objet: RE: Ceph Hackathon: More Memory Allocator Testing 

Yes, it should be 1 per OSD... 
There is no doubt that TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is relative to the number of threads running.. 
But, I don't know if number of threads is a factor for jemalloc.. 

Thanks & Regards 
Somnath 

-----Original Message----- 
From: Alexandre DERUMIER [mailto:aderumier@odiso.com] 
Sent: Wednesday, August 19, 2015 9:55 AM 
To: Somnath Roy 
Cc: Mark Nelson; ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

>>I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

What is num_tcmalloc_instance ? I think 1 osd process use a defined TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size ? 

I'm saying that, because I have exactly the same bug, client side, with librbd + tcmalloc + qemu + iothreads. 
When I defined too much iothread threads, I'm hitting the bug directly. (can reproduce 100%). 
Like the thread_cache size is divide by number of threads? 






----- Mail original ----- 
De: "Somnath Roy" <Somnath.Roy@sandisk.com> 
À: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com> 
Cc: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 18:27:30 
Objet: RE: Ceph Hackathon: More Memory Allocator Testing 

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

Also, I think there is no point of increasing osd_op_threads as it is not in IO path anymore..Mark is using default 5:2 for shard:thread per shard.. 

But, yes, it could be related to number of threads OSDs are using, need to understand how jemalloc works..Also, there may be some tuning to reduce memory usage (?). 

Thanks & Regards 
Somnath 

-----Original Message----- 
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER 
Sent: Wednesday, August 19, 2015 9:06 AM 
To: Mark Nelson 
Cc: ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

I was listening at the today meeting, 

and seem that the blocker to have jemalloc as default, 

is that it's used more memory by osd (around 300MB?), and some guys could have boxes with 60disks. 


I just wonder if the memory increase is related to osd_op_num_shards/osd_op_threads value ? 

Seem that as hackaton, the bench has been done on super big cpus boxed 36cores/72T, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx 
with osd_op_threads = 32. 

I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

Maybe jemalloc allocated memory by threads. 



(I think guys with 60disks box, dont use ssd, so low iops by osd, and they don't need a lot of threads by osd) 



----- Mail original ----- 
De: "aderumier" <aderumier@odiso.com> 
À: "Mark Nelson" <mnelson@redhat.com> 
Cc: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 16:01:28 
Objet: Re: Ceph Hackathon: More Memory Allocator Testing 

Thanks Marc, 

Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc. 

and indeed tcmalloc, even with bigger cache, seem decrease over time. 


What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads. 


Switching both server and client to jemalloc give me best performance on small read currently. 






----- Mail original ----- 
De: "Mark Nelson" <mnelson@redhat.com> 
À: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 06:45:36 
Objet: Ceph Hackathon: More Memory Allocator Testing 

Hi Everyone, 

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested. 

The results are located here: 

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf 

I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal! 

Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are. 

Thanks, 
Mark 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 
N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay�ʇڙ�,j��f���h���z��w������j:+v���w�j�m��������zZ+��ݢj"�� 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
the body of a message to majordomo@vger.kernel.org 
More majordomo info at http://vger.kernel.org/majordomo-info.html 

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-20  8:17                   ` Alexandre DERUMIER
@ 2015-08-20 12:54                     ` Shinobu Kinjo
  2015-08-20 14:46                       ` Matt Benjamin
  0 siblings, 1 reply; 55+ messages in thread
From: Shinobu Kinjo @ 2015-08-20 12:54 UTC (permalink / raw)
  To: Alexandre DERUMIER
  Cc: Stephen L Blinick, Somnath Roy, Mark Nelson, ceph-devel

Thank you for that result.
So it might make sense to know difference between jemalloc and jemalloc 4.0.

 Shinobu

----- Original Message -----
From: "Alexandre DERUMIER" <aderumier@odiso.com>
To: "Shinobu Kinjo" <skinjo@redhat.com>
Cc: "Stephen L Blinick" <stephen.l.blinick@intel.com>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Sent: Thursday, August 20, 2015 5:17:46 PM
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

memory results of osd daemon under load,

jemalloc use always more memory than tcmalloc,
jemalloc 4.0 seem to reduce memory usage but still a little bit more than tcmalloc



osd_op_threads=2 : tcmalloc 2.1
------------------------------------------
root      38066  2.3  0.7 1223088 505144 ?      Ssl  08:35   1:32 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
root      38165  2.4  0.7 1247828 525356 ?      Ssl  08:35   1:34 /usr/bin/ceph-osd --cluster=ceph -i 5 -f


osd_op_threads=32: tcmalloc 2.1
------------------------------------------

root      39002  102  0.7 1455928 488584 ?      Ssl  09:41   0:30 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
root      39168  114  0.7 1483752 518368 ?      Ssl  09:41   0:30 /usr/bin/ceph-osd --cluster=ceph -i 5 -f


osd_op_threads=2 jemalloc 3.5
-----------------------------
root      18402 72.0  1.1 1642000 769000 ?      Ssl  09:43   0:17 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
root      18434 89.1  1.2 1677444 797508 ?      Ssl  09:43   0:21 /usr/bin/ceph-osd --cluster=ceph -i 1 -f


osd_op_threads=32 jemalloc 3.5
-----------------------------
root      17204  3.7  1.2 2030616 816520 ?      Ssl  08:35   2:31 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
root      17228  4.6  1.2 2064928 830060 ?      Ssl  08:35   3:05 /usr/bin/ceph-osd --cluster=ceph -i 1 -f


osd_op_threads=2 jemalloc 4.0
-----------------------------
root      19967  113  1.1 1432520 737988 ?      Ssl  10:04   0:31 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
root      19976 93.6  1.0 1409376 711192 ?      Ssl  10:04   0:26 /usr/bin/ceph-osd --cluster=ceph -i 0 -f


osd_op_threads=32 jemalloc 4.0
-----------------------------
root      20484  128  1.1 1689176 778508 ?      Ssl  10:06   0:26 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
root      20502  170  1.2 1720524 810668 ?      Ssl  10:06   0:35 /usr/bin/ceph-osd --cluster=ceph -i 1 -f



----- Mail original -----
De: "aderumier" <aderumier@odiso.com>
À: "Shinobu Kinjo" <skinjo@redhat.com>
Cc: "Stephen L Blinick" <stephen.l.blinick@intel.com>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Jeudi 20 Août 2015 07:29:22
Objet: Re: Ceph Hackathon: More Memory Allocator Testing

Hi, 

jemmaloc 4.0 has been released 2 days agos 

https://github.com/jemalloc/jemalloc/releases 

I'm curious to see performance/memory usage improvement :) 


----- Mail original ----- 
De: "Shinobu Kinjo" <skinjo@redhat.com> 
À: "Stephen L Blinick" <stephen.l.blinick@intel.com> 
Cc: "aderumier" <aderumier@odiso.com>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Jeudi 20 Août 2015 04:00:15 
Objet: Re: Ceph Hackathon: More Memory Allocator Testing 

How about making any sheet for testing patter? 

Shinobu 

----- Original Message ----- 
From: "Stephen L Blinick" <stephen.l.blinick@intel.com> 
To: "Alexandre DERUMIER" <aderumier@odiso.com>, "Somnath Roy" <Somnath.Roy@sandisk.com> 
Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org> 
Sent: Thursday, August 20, 2015 10:09:36 AM 
Subject: RE: Ceph Hackathon: More Memory Allocator Testing 

Would it make more sense to try this comparison while changing the size of the worker thread pool? i.e. changing "osd_op_num_threads_per_shard" and "osd_op_num_shards" (default is currently 2 and 5 respectively, for a total of 10 worker threads). 

Thanks, 

Stephen 


-----Original Message----- 
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER 
Sent: Wednesday, August 19, 2015 11:47 AM 
To: Somnath Roy 
Cc: Mark Nelson; ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

Just have done a small test with jemalloc, change osd_op_threads value, and check the memory just after daemon restart. 

osd_op_threads = 2 (default) 


USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND 
root 10246 6.0 0.3 1086656 245760 ? Ssl 20:36 0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f 

osd_op_threads = 32 

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND 
root 10736 19.5 0.4 1474672 307412 ? Ssl 20:37 0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f 



I'll try to compare with tcmalloc tommorow and under load. 



----- Mail original ----- 
De: "Somnath Roy" <Somnath.Roy@sandisk.com> 
À: "aderumier" <aderumier@odiso.com> 
Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 19:29:56 
Objet: RE: Ceph Hackathon: More Memory Allocator Testing 

Yes, it should be 1 per OSD... 
There is no doubt that TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is relative to the number of threads running.. 
But, I don't know if number of threads is a factor for jemalloc.. 

Thanks & Regards 
Somnath 

-----Original Message----- 
From: Alexandre DERUMIER [mailto:aderumier@odiso.com] 
Sent: Wednesday, August 19, 2015 9:55 AM 
To: Somnath Roy 
Cc: Mark Nelson; ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

>>I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

What is num_tcmalloc_instance ? I think 1 osd process use a defined TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size ? 

I'm saying that, because I have exactly the same bug, client side, with librbd + tcmalloc + qemu + iothreads. 
When I defined too much iothread threads, I'm hitting the bug directly. (can reproduce 100%). 
Like the thread_cache size is divide by number of threads? 






----- Mail original ----- 
De: "Somnath Roy" <Somnath.Roy@sandisk.com> 
À: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com> 
Cc: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 18:27:30 
Objet: RE: Ceph Hackathon: More Memory Allocator Testing 

<< I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

I think it is per tcmalloc instance loaded , so, at least with num_osds * num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box. 

Also, I think there is no point of increasing osd_op_threads as it is not in IO path anymore..Mark is using default 5:2 for shard:thread per shard.. 

But, yes, it could be related to number of threads OSDs are using, need to understand how jemalloc works..Also, there may be some tuning to reduce memory usage (?). 

Thanks & Regards 
Somnath 

-----Original Message----- 
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER 
Sent: Wednesday, August 19, 2015 9:06 AM 
To: Mark Nelson 
Cc: ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

I was listening at the today meeting, 

and seem that the blocker to have jemalloc as default, 

is that it's used more memory by osd (around 300MB?), and some guys could have boxes with 60disks. 


I just wonder if the memory increase is related to osd_op_num_shards/osd_op_threads value ? 

Seem that as hackaton, the bench has been done on super big cpus boxed 36cores/72T, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx 
with osd_op_threads = 32. 

I think that tcmalloc have a fixed size (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process. 

Maybe jemalloc allocated memory by threads. 



(I think guys with 60disks box, dont use ssd, so low iops by osd, and they don't need a lot of threads by osd) 



----- Mail original ----- 
De: "aderumier" <aderumier@odiso.com> 
À: "Mark Nelson" <mnelson@redhat.com> 
Cc: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 16:01:28 
Objet: Re: Ceph Hackathon: More Memory Allocator Testing 

Thanks Marc, 

Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs jemalloc. 

and indeed tcmalloc, even with bigger cache, seem decrease over time. 


What is funny, is that I see exactly same behaviour client librbd side, with qemu and multiple iothreads. 


Switching both server and client to jemalloc give me best performance on small read currently. 






----- Mail original ----- 
De: "Mark Nelson" <mnelson@redhat.com> 
À: "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Mercredi 19 Août 2015 06:45:36 
Objet: Ceph Hackathon: More Memory Allocator Testing 

Hi Everyone, 

One of the goals at the Ceph Hackathon last week was to examine how to improve Ceph Small IO performance. Jian Zhang presented findings showing a dramatic improvement in small random IO performance when Ceph is used with jemalloc. His results build upon Sandisk's original findings that the default thread cache values are a major bottleneck in TCMalloc 2.1. To further verify these results, we sat down at the Hackathon and configured the new performance test cluster that Intel generously donated to the Ceph community laboratory to run through a variety of tests with different memory allocator configurations. I've since written the results of those tests up in pdf form for folks who are interested. 

The results are located here: 

http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf 

I want to be clear that many other folks have done the heavy lifting here. These results are simply a validation of the many tests that other folks have already done. Many thanks to Sandisk and others for figuring this out as it's a pretty big deal! 

Side note: Very little tuning other than swapping the memory allocator and a couple of quick and dirty ceph tunables were set during these tests. It's quite possible that higher IOPS will be achieved as we really start digging into the cluster and learning what the bottlenecks are. 

Thanks, 
Mark 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 
N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay�ʇڙ�,j��f���h���z��w������j:+v���w�j�m��������zZ+��ݢj"�� 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
the body of a message to majordomo@vger.kernel.org 
More majordomo info at http://vger.kernel.org/majordomo-info.html 

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-20 12:54                     ` Shinobu Kinjo
@ 2015-08-20 14:46                       ` Matt Benjamin
  0 siblings, 0 replies; 55+ messages in thread
From: Matt Benjamin @ 2015-08-20 14:46 UTC (permalink / raw)
  To: Shinobu Kinjo
  Cc: Alexandre DERUMIER, Stephen L Blinick, Somnath Roy, Mark Nelson,
	ceph-devel

Jemalloc 4.0 seems to have some shiny new capabilities, at least.

Matt

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-761-4689
fax.  734-769-8938
cel.  734-216-5309

----- Original Message -----
> From: "Shinobu Kinjo" <skinjo@redhat.com>
> To: "Alexandre DERUMIER" <aderumier@odiso.com>
> Cc: "Stephen L Blinick" <stephen.l.blinick@intel.com>, "Somnath Roy" <Somnath.Roy@sandisk.com>, "Mark Nelson"
> <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Thursday, August 20, 2015 8:54:59 AM
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> Thank you for that result.
> So it might make sense to know difference between jemalloc and jemalloc 4.0.
> 
>  Shinobu
> 
> ----- Original Message -----
> From: "Alexandre DERUMIER" <aderumier@odiso.com>
> To: "Shinobu Kinjo" <skinjo@redhat.com>
> Cc: "Stephen L Blinick" <stephen.l.blinick@intel.com>, "Somnath Roy"
> <Somnath.Roy@sandisk.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel"
> <ceph-devel@vger.kernel.org>
> Sent: Thursday, August 20, 2015 5:17:46 PM
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> memory results of osd daemon under load,
> 
> jemalloc use always more memory than tcmalloc,
> jemalloc 4.0 seem to reduce memory usage but still a little bit more than
> tcmalloc
> 
> 
> 
> osd_op_threads=2 : tcmalloc 2.1
> ------------------------------------------
> root      38066  2.3  0.7 1223088 505144 ?      Ssl  08:35   1:32
> /usr/bin/ceph-osd --cluster=ceph -i 4 -f
> root      38165  2.4  0.7 1247828 525356 ?      Ssl  08:35   1:34
> /usr/bin/ceph-osd --cluster=ceph -i 5 -f
> 
> 
> osd_op_threads=32: tcmalloc 2.1
> ------------------------------------------
> 
> root      39002  102  0.7 1455928 488584 ?      Ssl  09:41   0:30
> /usr/bin/ceph-osd --cluster=ceph -i 4 -f
> root      39168  114  0.7 1483752 518368 ?      Ssl  09:41   0:30
> /usr/bin/ceph-osd --cluster=ceph -i 5 -f
> 
> 
> osd_op_threads=2 jemalloc 3.5
> -----------------------------
> root      18402 72.0  1.1 1642000 769000 ?      Ssl  09:43   0:17
> /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> root      18434 89.1  1.2 1677444 797508 ?      Ssl  09:43   0:21
> /usr/bin/ceph-osd --cluster=ceph -i 1 -f
> 
> 
> osd_op_threads=32 jemalloc 3.5
> -----------------------------
> root      17204  3.7  1.2 2030616 816520 ?      Ssl  08:35   2:31
> /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> root      17228  4.6  1.2 2064928 830060 ?      Ssl  08:35   3:05
> /usr/bin/ceph-osd --cluster=ceph -i 1 -f
> 
> 
> osd_op_threads=2 jemalloc 4.0
> -----------------------------
> root      19967  113  1.1 1432520 737988 ?      Ssl  10:04   0:31
> /usr/bin/ceph-osd --cluster=ceph -i 1 -f
> root      19976 93.6  1.0 1409376 711192 ?      Ssl  10:04   0:26
> /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> 
> 
> osd_op_threads=32 jemalloc 4.0
> -----------------------------
> root      20484  128  1.1 1689176 778508 ?      Ssl  10:06   0:26
> /usr/bin/ceph-osd --cluster=ceph -i 0 -f
> root      20502  170  1.2 1720524 810668 ?      Ssl  10:06   0:35
> /usr/bin/ceph-osd --cluster=ceph -i 1 -f
> 
> 
> 
> ----- Mail original -----
> De: "aderumier" <aderumier@odiso.com>
> À: "Shinobu Kinjo" <skinjo@redhat.com>
> Cc: "Stephen L Blinick" <stephen.l.blinick@intel.com>, "Somnath Roy"
> <Somnath.Roy@sandisk.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel"
> <ceph-devel@vger.kernel.org>
> Envoyé: Jeudi 20 Août 2015 07:29:22
> Objet: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> Hi,
> 
> jemmaloc 4.0 has been released 2 days agos
> 
> https://github.com/jemalloc/jemalloc/releases
> 
> I'm curious to see performance/memory usage improvement :)
> 
> 
> ----- Mail original -----
> De: "Shinobu Kinjo" <skinjo@redhat.com>
> À: "Stephen L Blinick" <stephen.l.blinick@intel.com>
> Cc: "aderumier" <aderumier@odiso.com>, "Somnath Roy"
> <Somnath.Roy@sandisk.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel"
> <ceph-devel@vger.kernel.org>
> Envoyé: Jeudi 20 Août 2015 04:00:15
> Objet: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> How about making any sheet for testing patter?
> 
> Shinobu
> 
> ----- Original Message -----
> From: "Stephen L Blinick" <stephen.l.blinick@intel.com>
> To: "Alexandre DERUMIER" <aderumier@odiso.com>, "Somnath Roy"
> <Somnath.Roy@sandisk.com>
> Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel"
> <ceph-devel@vger.kernel.org>
> Sent: Thursday, August 20, 2015 10:09:36 AM
> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
> 
> Would it make more sense to try this comparison while changing the size of
> the worker thread pool? i.e. changing "osd_op_num_threads_per_shard" and
> "osd_op_num_shards" (default is currently 2 and 5 respectively, for a total
> of 10 worker threads).
> 
> Thanks,
> 
> Stephen
> 
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
> Sent: Wednesday, August 19, 2015 11:47 AM
> To: Somnath Roy
> Cc: Mark Nelson; ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> Just have done a small test with jemalloc, change osd_op_threads value, and
> check the memory just after daemon restart.
> 
> osd_op_threads = 2 (default)
> 
> 
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 10246 6.0 0.3 1086656 245760 ? Ssl 20:36 0:01 /usr/bin/ceph-osd
> --cluster=ceph -i 0 -f
> 
> osd_op_threads = 32
> 
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 10736 19.5 0.4 1474672 307412 ? Ssl 20:37 0:01 /usr/bin/ceph-osd
> --cluster=ceph -i 0 -f
> 
> 
> 
> I'll try to compare with tcmalloc tommorow and under load.
> 
> 
> 
> ----- Mail original -----
> De: "Somnath Roy" <Somnath.Roy@sandisk.com>
> À: "aderumier" <aderumier@odiso.com>
> Cc: "Mark Nelson" <mnelson@redhat.com>, "ceph-devel"
> <ceph-devel@vger.kernel.org>
> Envoyé: Mercredi 19 Août 2015 19:29:56
> Objet: RE: Ceph Hackathon: More Memory Allocator Testing
> 
> Yes, it should be 1 per OSD...
> There is no doubt that TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is relative to
> the number of threads running..
> But, I don't know if number of threads is a factor for jemalloc..
> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
> Sent: Wednesday, August 19, 2015 9:55 AM
> To: Somnath Roy
> Cc: Mark Nelson; ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> << I think that tcmalloc have a fixed size
> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process.
> 
> >>I think it is per tcmalloc instance loaded , so, at least with num_osds *
> >>num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box.
> 
> What is num_tcmalloc_instance ? I think 1 osd process use a defined
> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES size ?
> 
> I'm saying that, because I have exactly the same bug, client side, with
> librbd + tcmalloc + qemu + iothreads.
> When I defined too much iothread threads, I'm hitting the bug directly. (can
> reproduce 100%).
> Like the thread_cache size is divide by number of threads?
> 
> 
> 
> 
> 
> 
> ----- Mail original -----
> De: "Somnath Roy" <Somnath.Roy@sandisk.com>
> À: "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
> Envoyé: Mercredi 19 Août 2015 18:27:30
> Objet: RE: Ceph Hackathon: More Memory Allocator Testing
> 
> << I think that tcmalloc have a fixed size
> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process.
> 
> I think it is per tcmalloc instance loaded , so, at least with num_osds *
> num_tcmalloc_instance * TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES in a box.
> 
> Also, I think there is no point of increasing osd_op_threads as it is not in
> IO path anymore..Mark is using default 5:2 for shard:thread per shard..
> 
> But, yes, it could be related to number of threads OSDs are using, need to
> understand how jemalloc works..Also, there may be some tuning to reduce
> memory usage (?).
> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre DERUMIER
> Sent: Wednesday, August 19, 2015 9:06 AM
> To: Mark Nelson
> Cc: ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> I was listening at the today meeting,
> 
> and seem that the blocker to have jemalloc as default,
> 
> is that it's used more memory by osd (around 300MB?), and some guys could
> have boxes with 60disks.
> 
> 
> I just wonder if the memory increase is related to
> osd_op_num_shards/osd_op_threads value ?
> 
> Seem that as hackaton, the bench has been done on super big cpus boxed
> 36cores/72T, http://ceph.com/hackathon/2015-08-ceph-hammer-full-ssd.pptx
> with osd_op_threads = 32.
> 
> I think that tcmalloc have a fixed size
> (TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES), and share it between all process.
> 
> Maybe jemalloc allocated memory by threads.
> 
> 
> 
> (I think guys with 60disks box, dont use ssd, so low iops by osd, and they
> don't need a lot of threads by osd)
> 
> 
> 
> ----- Mail original -----
> De: "aderumier" <aderumier@odiso.com>
> À: "Mark Nelson" <mnelson@redhat.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>
> Envoyé: Mercredi 19 Août 2015 16:01:28
> Objet: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> Thanks Marc,
> 
> Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs
> jemalloc.
> 
> and indeed tcmalloc, even with bigger cache, seem decrease over time.
> 
> 
> What is funny, is that I see exactly same behaviour client librbd side, with
> qemu and multiple iothreads.
> 
> 
> Switching both server and client to jemalloc give me best performance on
> small read currently.
> 
> 
> 
> 
> 
> 
> ----- Mail original -----
> De: "Mark Nelson" <mnelson@redhat.com>
> À: "ceph-devel" <ceph-devel@vger.kernel.org>
> Envoyé: Mercredi 19 Août 2015 06:45:36
> Objet: Ceph Hackathon: More Memory Allocator Testing
> 
> Hi Everyone,
> 
> One of the goals at the Ceph Hackathon last week was to examine how to
> improve Ceph Small IO performance. Jian Zhang presented findings showing a
> dramatic improvement in small random IO performance when Ceph is used with
> jemalloc. His results build upon Sandisk's original findings that the
> default thread cache values are a major bottleneck in TCMalloc 2.1. To
> further verify these results, we sat down at the Hackathon and configured
> the new performance test cluster that Intel generously donated to the Ceph
> community laboratory to run through a variety of tests with different memory
> allocator configurations. I've since written the results of those tests up
> in pdf form for folks who are interested.
> 
> The results are located here:
> 
> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Testing.pdf
> 
> I want to be clear that many other folks have done the heavy lifting here.
> These results are simply a validation of the many tests that other folks
> have already done. Many thanks to Sandisk and others for figuring this out
> as it's a pretty big deal!
> 
> Side note: Very little tuning other than swapping the memory allocator and a
> couple of quick and dirty ceph tunables were set during these tests. It's
> quite possible that higher IOPS will be achieved as we really start digging
> into the cluster and learning what the bottlenecks are.
> 
> Thanks,
> Mark
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
> 
> ________________________________
> 
> PLEASE NOTE: The information contained in this electronic mail message is
> intended only for the use of the designated recipient(s) named above. If the
> reader of this message is not the intended recipient, you are hereby
> notified that you have received this message in error and that any review,
> dissemination, distribution, or copying of this message is strictly
> prohibited. If you have received this communication in error, please notify
> the sender by telephone or e-mail (as shown above) immediately and destroy
> any and all copies of this message in your possession (whether hard copies
> or electronically stored copies).
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
> N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay�ʇڙ�,j��f���h���z��w������j:+v���w�j�m��������zZ+��ݢj"��
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 20:44               ` Somnath Roy
@ 2015-08-21  3:45                 ` Shishir Gowda
  2015-08-21  4:22                 ` Shishir Gowda
  1 sibling, 0 replies; 55+ messages in thread
From: Shishir Gowda @ 2015-08-21  3:45 UTC (permalink / raw)
  To: ceph-devel

Hi All,

Have sent out a pull request which enables building librados/librbd with either tcmalloc(as default)  or jemalloc.

Please find the pull request @ https://github.com/ceph/ceph/pull/5628

With regards,
Shishir

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Somnath Roy
> Sent: Thursday, August 20, 2015 2:14 AM
> To: Stefan Priebe; Alexandre DERUMIER; Mark Nelson
> Cc: ceph-devel
> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
>
> Yeah , I can see ceph-osd/ceph-mon built with jemalloc.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
> Sent: Wednesday, August 19, 2015 1:41 PM
> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
> Cc: ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
>
> Am 19.08.2015 um 22:34 schrieb Somnath Roy:
> > But, you said you need to remove libcmalloc *not* libtcmalloc...
> > I saw librbd/librados is built with libcmalloc not with libtcmalloc..
> > So, are you saying to remove libtcmalloc (not libcmalloc) to enable jemalloc
> ?
>
> Ouch my mistake. I read libtcmalloc - too late here.
>
> My build (Hammer) says:
> # ldd /usr/lib/librados.so.2.0.0
>          linux-vdso.so.1 =>  (0x00007fff4f71d000)
>          libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fafdb26c000)
>          libboost_thread.so.1.49.0 => /usr/lib/libboost_thread.so.1.49.0
> (0x00007fafdb24f000)
>          libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> (0x00007fafdb032000)
>          libcrypto++.so.9 => /usr/lib/libcrypto++.so.9 (0x00007fafda924000)
>          libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1
> (0x00007fafda71f000)
>          librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fafda516000)
>          libboost_system.so.1.49.0 => /usr/lib/libboost_system.so.1.49.0
> (0x00007fafda512000)
>          libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> (0x00007fafda20b000)
>          libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fafd9f88000)
>          libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fafd9bfd000)
>          libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> (0x00007fafd99e7000)
>          /lib64/ld-linux-x86-64.so.2 (0x000056358ecfe000)
>
> Only ceph-osd is linked against libjemalloc for me.
>
> Stefan
>
> > -----Original Message-----
> > From: Stefan Priebe [mailto:s.priebe@profihost.ag]
> > Sent: Wednesday, August 19, 2015 1:31 PM
> > To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
> > Cc: ceph-devel
> > Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> >
> >
> > Am 19.08.2015 um 22:29 schrieb Somnath Roy:
> >> Hmm...We need to fix that as part of configure/Makefile I guess (?)..
> >> Since we have done this jemalloc integration originally, we can take that
> ownership unless anybody sees a problem of enabling tcmalloc/jemalloc with
> librbd/librados.
> >>
> >> << You have to remove libcmalloc out of your build environment to get
> >> this done How do I do that ? I am using Ubuntu and can't afford to remove
> libc* packages.
> >
> > I always use a chroot to build packages where only a minimal bootstrap +
> the build deps are installed. googleperftools where libtcmalloc comes from is
> not Ubuntu "core/minimal".
> >
> > Stefan
> >
> >>
> >> Thanks & Regards
> >> Somnath
> >>
> >> -----Original Message-----
> >> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
> >> Sent: Wednesday, August 19, 2015 1:18 PM
> >> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
> >> Cc: ceph-devel
> >> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> >>
> >>
> >> Am 19.08.2015 um 22:16 schrieb Somnath Roy:
> >>> Alexandre,
> >>> I am not able to build librados/librbd by using the following config option.
> >>>
> >>> ./configure –without-tcmalloc –with-jemalloc
> >>
> >> Same issue to me. You have to remove libcmalloc out of your build
> environment to get this done.
> >>
> >> Stefan
> >>
> >>
> >>> It seems it is building osd/mon/Mds/RGW with jemalloc enabled..
> >>>
> >>> root@emsnode10:~/ceph-latest/src# ldd ./ceph-osd
> >>>            linux-vdso.so.1 =>  (0x00007ffd0eb43000)
> >>>            libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
> (0x00007f5f92d70000)
> >>>            .......
> >>>
> >>> root@emsnode10:~/ceph-latest/src/.libs# ldd ./librados.so.2.0.0
> >>>            linux-vdso.so.1 =>  (0x00007ffed46f2000)
> >>>            libboost_thread.so.1.55.0 => /usr/lib/x86_64-linux-
> gnu/libboost_thread.so.1.55.0 (0x00007ff687887000)
> >>>            liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0
> (0x00007ff68763d000)
> >>>            libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff687438000)
> >>>            libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> (0x00007ff68721a000)
> >>>            libnss3.so => /usr/lib/x86_64-linux-gnu/libnss3.so
> (0x00007ff686ee0000)
> >>>            libsmime3.so => /usr/lib/x86_64-linux-gnu/libsmime3.so
> (0x00007ff686cb3000)
> >>>            libnspr4.so => /usr/lib/x86_64-linux-gnu/libnspr4.so
> (0x00007ff686a76000)
> >>>            libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1
> (0x00007ff686871000)
> >>>            librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff686668000)
> >>>            libboost_system.so.1.55.0 => /usr/lib/x86_64-linux-
> gnu/libboost_system.so.1.55.0 (0x00007ff686464000)
> >>>            libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> (0x00007ff686160000)
> >>>            libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff685e59000)
> >>>            libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff685a94000)
> >>>            libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> (0x00007ff68587e000)
> >>>            liblttng-ust-tracepoint.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-
> ust-tracepoint.so.0 (0x00007ff685663000)
> >>>            liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x00007ff68545c000)
> >>>            liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x00007ff685255000)
> >>>            /lib64/ld-linux-x86-64.so.2 (0x00007ff68a0f6000)
> >>>            libnssutil3.so => /usr/lib/x86_64-linux-gnu/libnssutil3.so
> (0x00007ff685029000)
> >>>            libplc4.so => /usr/lib/x86_64-linux-gnu/libplc4.so
> (0x00007ff684e24000)
> >>>            libplds4.so => /usr/lib/x86_64-linux-gnu/libplds4.so
> >>> (0x00007ff684c20000)
> >>>
> >>> It is building with libcmalloc always...
> >>>
> >>> Did you change the ceph makefiles to build librbd/librados with jemalloc
> ?
> >>>
> >>> Thanks & Regards
> >>> Somnath
> >>>
> >>> -----Original Message-----
> >>> From: ceph-devel-owner@vger.kernel.org
> >>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre
> >>> DERUMIER
> >>> Sent: Wednesday, August 19, 2015 7:01 AM
> >>> To: Mark Nelson
> >>> Cc: ceph-devel
> >>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> >>>
> >>> Thanks Marc,
> >>>
> >>> Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs
> jemalloc.
> >>>
> >>> and indeed tcmalloc, even with bigger cache, seem decrease over time.
> >>>
> >>>
> >>> What is funny, is that I see exactly same behaviour client librbd side, with
> qemu and multiple iothreads.
> >>>
> >>>
> >>> Switching both server and client to jemalloc give me best performance
> on small read currently.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> ----- Mail original -----
> >>> De: "Mark Nelson" <mnelson@redhat.com>
> >>> À: "ceph-devel" <ceph-devel@vger.kernel.org>
> >>> Envoyé: Mercredi 19 Août 2015 06:45:36
> >>> Objet: Ceph Hackathon: More Memory Allocator Testing
> >>>
> >>> Hi Everyone,
> >>>
> >>> One of the goals at the Ceph Hackathon last week was to examine how
> to improve Ceph Small IO performance. Jian Zhang presented findings
> showing a dramatic improvement in small random IO performance when
> Ceph is used with jemalloc. His results build upon Sandisk's original findings
> that the default thread cache values are a major bottleneck in TCMalloc 2.1.
> To further verify these results, we sat down at the Hackathon and configured
> the new performance test cluster that Intel generously donated to the Ceph
> community laboratory to run through a variety of tests with different
> memory allocator configurations. I've since written the results of those tests
> up in pdf form for folks who are interested.
> >>>
> >>> The results are located here:
> >>>
> >>>
> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Tes
> ting.
> >>> pdf
> >>>
> >>> I want to be clear that many other folks have done the heavy lifting
> here. These results are simply a validation of the many tests that other folks
> have already done. Many thanks to Sandisk and others for figuring this out as
> it's a pretty big deal!
> >>>
> >>> Side note: Very little tuning other than swapping the memory allocator
> and a couple of quick and dirty ceph tunables were set during these tests. It's
> quite possible that higher IOPS will be achieved as we really start digging into
> the cluster and learning what the bottlenecks are.
> >>>
> >>> Thanks,
> >>> Mark
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >>> in the body of a message to majordomo@vger.kernel.org More
> majordomo
> >>> info at http://vger.kernel.org/majordomo-info.html
> >>>
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >>> in the body of a message to majordomo@vger.kernel.org More
> majordomo
> >>> info at  http://vger.kernel.org/majordomo-info.html
> >>>
> >>> ________________________________
> >>>
> >>> PLEASE NOTE: The information contained in this electronic mail message
> is intended only for the use of the designated recipient(s) named above. If
> the reader of this message is not the intended recipient, you are hereby
> notified that you have received this message in error and that any review,
> dissemination, distribution, or copying of this message is strictly prohibited. If
> you have received this communication in error, please notify the sender by
> telephone or e-mail (as shown above) immediately and destroy any and all
> copies of this message in your possession (whether hard copies or
> electronically stored copies).
> >>>
> >>> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay \x1dʇڙ ,j   f   h   z \x1e w
> >>      j:+v   w j m         zZ+     ݢj"  !tml=
> >>>
> >> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay \x1dʇڙ ,j   f   h   z \x1e w
> >     j:+v   w j m         zZ+     ݢj"  !tml=
> >>
> N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay�\x1dʇڙ�,j
��f���h���z�\x1e
> �w���
>
> ���j:+v���w�j�m����
����zZ+�����ݢj"��!�i

________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).


^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-19 20:44               ` Somnath Roy
  2015-08-21  3:45                 ` Shishir Gowda
@ 2015-08-21  4:22                 ` Shishir Gowda
  2015-08-21 14:26                   ` Milosz Tanski
  1 sibling, 1 reply; 55+ messages in thread
From: Shishir Gowda @ 2015-08-21  4:22 UTC (permalink / raw)
  To: Somnath Roy, Stefan Priebe, Alexandre DERUMIER, Mark Nelson; +Cc: ceph-devel

Hi All,

Have sent out a pull request which enables building librados/librbd with either tcmalloc(as default)  or jemalloc.

Please find the pull request @ https://github.com/ceph/ceph/pull/5628

With regards,
Shishir

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Somnath Roy
> Sent: Thursday, August 20, 2015 2:14 AM
> To: Stefan Priebe; Alexandre DERUMIER; Mark Nelson
> Cc: ceph-devel
> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
>
> Yeah , I can see ceph-osd/ceph-mon built with jemalloc.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
> Sent: Wednesday, August 19, 2015 1:41 PM
> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
> Cc: ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
>
> Am 19.08.2015 um 22:34 schrieb Somnath Roy:
> > But, you said you need to remove libcmalloc *not* libtcmalloc...
> > I saw librbd/librados is built with libcmalloc not with libtcmalloc..
> > So, are you saying to remove libtcmalloc (not libcmalloc) to enable jemalloc
> ?
>
> Ouch my mistake. I read libtcmalloc - too late here.
>
> My build (Hammer) says:
> # ldd /usr/lib/librados.so.2.0.0
>          linux-vdso.so.1 =>  (0x00007fff4f71d000)
>          libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fafdb26c000)
>          libboost_thread.so.1.49.0 => /usr/lib/libboost_thread.so.1.49.0
> (0x00007fafdb24f000)
>          libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> (0x00007fafdb032000)
>          libcrypto++.so.9 => /usr/lib/libcrypto++.so.9 (0x00007fafda924000)
>          libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1
> (0x00007fafda71f000)
>          librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fafda516000)
>          libboost_system.so.1.49.0 => /usr/lib/libboost_system.so.1.49.0
> (0x00007fafda512000)
>          libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> (0x00007fafda20b000)
>          libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fafd9f88000)
>          libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fafd9bfd000)
>          libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> (0x00007fafd99e7000)
>          /lib64/ld-linux-x86-64.so.2 (0x000056358ecfe000)
>
> Only ceph-osd is linked against libjemalloc for me.
>
> Stefan
>
> > -----Original Message-----
> > From: Stefan Priebe [mailto:s.priebe@profihost.ag]
> > Sent: Wednesday, August 19, 2015 1:31 PM
> > To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
> > Cc: ceph-devel
> > Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> >
> >
> > Am 19.08.2015 um 22:29 schrieb Somnath Roy:
> >> Hmm...We need to fix that as part of configure/Makefile I guess (?)..
> >> Since we have done this jemalloc integration originally, we can take that
> ownership unless anybody sees a problem of enabling tcmalloc/jemalloc with
> librbd/librados.
> >>
> >> << You have to remove libcmalloc out of your build environment to get
> >> this done How do I do that ? I am using Ubuntu and can't afford to remove
> libc* packages.
> >
> > I always use a chroot to build packages where only a minimal bootstrap +
> the build deps are installed. googleperftools where libtcmalloc comes from is
> not Ubuntu "core/minimal".
> >
> > Stefan
> >
> >>
> >> Thanks & Regards
> >> Somnath
> >>
> >> -----Original Message-----
> >> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
> >> Sent: Wednesday, August 19, 2015 1:18 PM
> >> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
> >> Cc: ceph-devel
> >> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> >>
> >>
> >> Am 19.08.2015 um 22:16 schrieb Somnath Roy:
> >>> Alexandre,
> >>> I am not able to build librados/librbd by using the following config option.
> >>>
> >>> ./configure –without-tcmalloc –with-jemalloc
> >>
> >> Same issue to me. You have to remove libcmalloc out of your build
> environment to get this done.
> >>
> >> Stefan
> >>
> >>
> >>> It seems it is building osd/mon/Mds/RGW with jemalloc enabled..
> >>>
> >>> root@emsnode10:~/ceph-latest/src# ldd ./ceph-osd
> >>>            linux-vdso.so.1 =>  (0x00007ffd0eb43000)
> >>>            libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
> (0x00007f5f92d70000)
> >>>            .......
> >>>
> >>> root@emsnode10:~/ceph-latest/src/.libs# ldd ./librados.so.2.0.0
> >>>            linux-vdso.so.1 =>  (0x00007ffed46f2000)
> >>>            libboost_thread.so.1.55.0 => /usr/lib/x86_64-linux-
> gnu/libboost_thread.so.1.55.0 (0x00007ff687887000)
> >>>            liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0
> (0x00007ff68763d000)
> >>>            libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff687438000)
> >>>            libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> (0x00007ff68721a000)
> >>>            libnss3.so => /usr/lib/x86_64-linux-gnu/libnss3.so
> (0x00007ff686ee0000)
> >>>            libsmime3.so => /usr/lib/x86_64-linux-gnu/libsmime3.so
> (0x00007ff686cb3000)
> >>>            libnspr4.so => /usr/lib/x86_64-linux-gnu/libnspr4.so
> (0x00007ff686a76000)
> >>>            libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1
> (0x00007ff686871000)
> >>>            librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff686668000)
> >>>            libboost_system.so.1.55.0 => /usr/lib/x86_64-linux-
> gnu/libboost_system.so.1.55.0 (0x00007ff686464000)
> >>>            libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> (0x00007ff686160000)
> >>>            libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff685e59000)
> >>>            libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff685a94000)
> >>>            libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> (0x00007ff68587e000)
> >>>            liblttng-ust-tracepoint.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-
> ust-tracepoint.so.0 (0x00007ff685663000)
> >>>            liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x00007ff68545c000)
> >>>            liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x00007ff685255000)
> >>>            /lib64/ld-linux-x86-64.so.2 (0x00007ff68a0f6000)
> >>>            libnssutil3.so => /usr/lib/x86_64-linux-gnu/libnssutil3.so
> (0x00007ff685029000)
> >>>            libplc4.so => /usr/lib/x86_64-linux-gnu/libplc4.so
> (0x00007ff684e24000)
> >>>            libplds4.so => /usr/lib/x86_64-linux-gnu/libplds4.so
> >>> (0x00007ff684c20000)
> >>>
> >>> It is building with libcmalloc always...
> >>>
> >>> Did you change the ceph makefiles to build librbd/librados with jemalloc
> ?
> >>>
> >>> Thanks & Regards
> >>> Somnath
> >>>
> >>> -----Original Message-----
> >>> From: ceph-devel-owner@vger.kernel.org
> >>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre
> >>> DERUMIER
> >>> Sent: Wednesday, August 19, 2015 7:01 AM
> >>> To: Mark Nelson
> >>> Cc: ceph-devel
> >>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> >>>
> >>> Thanks Marc,
> >>>
> >>> Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs
> jemalloc.
> >>>
> >>> and indeed tcmalloc, even with bigger cache, seem decrease over time.
> >>>
> >>>
> >>> What is funny, is that I see exactly same behaviour client librbd side, with
> qemu and multiple iothreads.
> >>>
> >>>
> >>> Switching both server and client to jemalloc give me best performance
> on small read currently.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> ----- Mail original -----
> >>> De: "Mark Nelson" <mnelson@redhat.com>
> >>> À: "ceph-devel" <ceph-devel@vger.kernel.org>
> >>> Envoyé: Mercredi 19 Août 2015 06:45:36
> >>> Objet: Ceph Hackathon: More Memory Allocator Testing
> >>>
> >>> Hi Everyone,
> >>>
> >>> One of the goals at the Ceph Hackathon last week was to examine how
> to improve Ceph Small IO performance. Jian Zhang presented findings
> showing a dramatic improvement in small random IO performance when
> Ceph is used with jemalloc. His results build upon Sandisk's original findings
> that the default thread cache values are a major bottleneck in TCMalloc 2.1.
> To further verify these results, we sat down at the Hackathon and configured
> the new performance test cluster that Intel generously donated to the Ceph
> community laboratory to run through a variety of tests with different
> memory allocator configurations. I've since written the results of those tests
> up in pdf form for folks who are interested.
> >>>
> >>> The results are located here:
> >>>
> >>>
> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Tes
> ting.
> >>> pdf
> >>>
> >>> I want to be clear that many other folks have done the heavy lifting
> here. These results are simply a validation of the many tests that other folks
> have already done. Many thanks to Sandisk and others for figuring this out as
> it's a pretty big deal!
> >>>
> >>> Side note: Very little tuning other than swapping the memory allocator
> and a couple of quick and dirty ceph tunables were set during these tests. It's
> quite possible that higher IOPS will be achieved as we really start digging into
> the cluster and learning what the bottlenecks are.
> >>>
> >>> Thanks,
> >>> Mark
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >>> in the body of a message to majordomo@vger.kernel.org More
> majordomo
> >>> info at http://vger.kernel.org/majordomo-info.html
> >>>
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >>> in the body of a message to majordomo@vger.kernel.org More
> majordomo
> >>> info at  http://vger.kernel.org/majordomo-info.html
> >>>
> >>> ________________________________
> >>>
> >>> PLEASE NOTE: The information contained in this electronic mail message
> is intended only for the use of the designated recipient(s) named above. If
> the reader of this message is not the intended recipient, you are hereby
> notified that you have received this message in error and that any review,
> dissemination, distribution, or copying of this message is strictly prohibited. If
> you have received this communication in error, please notify the sender by
> telephone or e-mail (as shown above) immediately and destroy any and all
> copies of this message in your possession (whether hard copies or
> electronically stored copies).
> >>>
> >>> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay \x1dʇڙ ,j   f   h   z \x1e w
> >>      j:+v   w j m         zZ+     ݢj"  !tml=
> >>>
> >> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay \x1dʇڙ ,j   f   h   z \x1e w
> >     j:+v   w j m         zZ+     ݢj"  !tml=
> >>
> N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay�\x1dʇڙ�,j
��f���h���z�\x1e
> �w���
>
> ���j:+v���w�j�m����
����zZ+�����ݢj"��!�i

________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-21  4:22                 ` Shishir Gowda
@ 2015-08-21 14:26                   ` Milosz Tanski
  2015-08-21 19:07                     ` Robert LeBlanc
  2015-08-22 13:55                     ` Sage Weil
  0 siblings, 2 replies; 55+ messages in thread
From: Milosz Tanski @ 2015-08-21 14:26 UTC (permalink / raw)
  To: Shishir Gowda
  Cc: Somnath Roy, Stefan Priebe, Alexandre DERUMIER, Mark Nelson, ceph-devel

On Fri, Aug 21, 2015 at 12:22 AM, Shishir Gowda
<Shishir.Gowda@sandisk.com> wrote:
> Hi All,
>
> Have sent out a pull request which enables building librados/librbd with either tcmalloc(as default)  or jemalloc.
>
> Please find the pull request @ https://github.com/ceph/ceph/pull/5628
>
> With regards,
> Shishir

Unless I'm missing something here, this seams like the wrong thing to.
Libraries that will be linked in by other external applications should
not have a 3rd party malloc linked in there. That seams like an
application choice. At the very least the default should not be to
link in a 3rd party malloc.

>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
>> owner@vger.kernel.org] On Behalf Of Somnath Roy
>> Sent: Thursday, August 20, 2015 2:14 AM
>> To: Stefan Priebe; Alexandre DERUMIER; Mark Nelson
>> Cc: ceph-devel
>> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
>>
>> Yeah , I can see ceph-osd/ceph-mon built with jemalloc.
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
>> Sent: Wednesday, August 19, 2015 1:41 PM
>> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
>> Cc: ceph-devel
>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>>
>>
>> Am 19.08.2015 um 22:34 schrieb Somnath Roy:
>> > But, you said you need to remove libcmalloc *not* libtcmalloc...
>> > I saw librbd/librados is built with libcmalloc not with libtcmalloc..
>> > So, are you saying to remove libtcmalloc (not libcmalloc) to enable jemalloc
>> ?
>>
>> Ouch my mistake. I read libtcmalloc - too late here.
>>
>> My build (Hammer) says:
>> # ldd /usr/lib/librados.so.2.0.0
>>          linux-vdso.so.1 =>  (0x00007fff4f71d000)
>>          libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fafdb26c000)
>>          libboost_thread.so.1.49.0 => /usr/lib/libboost_thread.so.1.49.0
>> (0x00007fafdb24f000)
>>          libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
>> (0x00007fafdb032000)
>>          libcrypto++.so.9 => /usr/lib/libcrypto++.so.9 (0x00007fafda924000)
>>          libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1
>> (0x00007fafda71f000)
>>          librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fafda516000)
>>          libboost_system.so.1.49.0 => /usr/lib/libboost_system.so.1.49.0
>> (0x00007fafda512000)
>>          libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> (0x00007fafda20b000)
>>          libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fafd9f88000)
>>          libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fafd9bfd000)
>>          libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
>> (0x00007fafd99e7000)
>>          /lib64/ld-linux-x86-64.so.2 (0x000056358ecfe000)
>>
>> Only ceph-osd is linked against libjemalloc for me.
>>
>> Stefan
>>
>> > -----Original Message-----
>> > From: Stefan Priebe [mailto:s.priebe@profihost.ag]
>> > Sent: Wednesday, August 19, 2015 1:31 PM
>> > To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
>> > Cc: ceph-devel
>> > Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>> >
>> >
>> > Am 19.08.2015 um 22:29 schrieb Somnath Roy:
>> >> Hmm...We need to fix that as part of configure/Makefile I guess (?)..
>> >> Since we have done this jemalloc integration originally, we can take that
>> ownership unless anybody sees a problem of enabling tcmalloc/jemalloc with
>> librbd/librados.
>> >>
>> >> << You have to remove libcmalloc out of your build environment to get
>> >> this done How do I do that ? I am using Ubuntu and can't afford to remove
>> libc* packages.
>> >
>> > I always use a chroot to build packages where only a minimal bootstrap +
>> the build deps are installed. googleperftools where libtcmalloc comes from is
>> not Ubuntu "core/minimal".
>> >
>> > Stefan
>> >
>> >>
>> >> Thanks & Regards
>> >> Somnath
>> >>
>> >> -----Original Message-----
>> >> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
>> >> Sent: Wednesday, August 19, 2015 1:18 PM
>> >> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
>> >> Cc: ceph-devel
>> >> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>> >>
>> >>
>> >> Am 19.08.2015 um 22:16 schrieb Somnath Roy:
>> >>> Alexandre,
>> >>> I am not able to build librados/librbd by using the following config option.
>> >>>
>> >>> ./configure –without-tcmalloc –with-jemalloc
>> >>
>> >> Same issue to me. You have to remove libcmalloc out of your build
>> environment to get this done.
>> >>
>> >> Stefan
>> >>
>> >>
>> >>> It seems it is building osd/mon/Mds/RGW with jemalloc enabled..
>> >>>
>> >>> root@emsnode10:~/ceph-latest/src# ldd ./ceph-osd
>> >>>            linux-vdso.so.1 =>  (0x00007ffd0eb43000)
>> >>>            libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
>> (0x00007f5f92d70000)
>> >>>            .......
>> >>>
>> >>> root@emsnode10:~/ceph-latest/src/.libs# ldd ./librados.so.2.0.0
>> >>>            linux-vdso.so.1 =>  (0x00007ffed46f2000)
>> >>>            libboost_thread.so.1.55.0 => /usr/lib/x86_64-linux-
>> gnu/libboost_thread.so.1.55.0 (0x00007ff687887000)
>> >>>            liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0
>> (0x00007ff68763d000)
>> >>>            libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff687438000)
>> >>>            libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
>> (0x00007ff68721a000)
>> >>>            libnss3.so => /usr/lib/x86_64-linux-gnu/libnss3.so
>> (0x00007ff686ee0000)
>> >>>            libsmime3.so => /usr/lib/x86_64-linux-gnu/libsmime3.so
>> (0x00007ff686cb3000)
>> >>>            libnspr4.so => /usr/lib/x86_64-linux-gnu/libnspr4.so
>> (0x00007ff686a76000)
>> >>>            libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1
>> (0x00007ff686871000)
>> >>>            librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff686668000)
>> >>>            libboost_system.so.1.55.0 => /usr/lib/x86_64-linux-
>> gnu/libboost_system.so.1.55.0 (0x00007ff686464000)
>> >>>            libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> (0x00007ff686160000)
>> >>>            libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff685e59000)
>> >>>            libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff685a94000)
>> >>>            libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
>> (0x00007ff68587e000)
>> >>>            liblttng-ust-tracepoint.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-
>> ust-tracepoint.so.0 (0x00007ff685663000)
>> >>>            liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x00007ff68545c000)
>> >>>            liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x00007ff685255000)
>> >>>            /lib64/ld-linux-x86-64.so.2 (0x00007ff68a0f6000)
>> >>>            libnssutil3.so => /usr/lib/x86_64-linux-gnu/libnssutil3.so
>> (0x00007ff685029000)
>> >>>            libplc4.so => /usr/lib/x86_64-linux-gnu/libplc4.so
>> (0x00007ff684e24000)
>> >>>            libplds4.so => /usr/lib/x86_64-linux-gnu/libplds4.so
>> >>> (0x00007ff684c20000)
>> >>>
>> >>> It is building with libcmalloc always...
>> >>>
>> >>> Did you change the ceph makefiles to build librbd/librados with jemalloc
>> ?
>> >>>
>> >>> Thanks & Regards
>> >>> Somnath
>> >>>
>> >>> -----Original Message-----
>> >>> From: ceph-devel-owner@vger.kernel.org
>> >>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre
>> >>> DERUMIER
>> >>> Sent: Wednesday, August 19, 2015 7:01 AM
>> >>> To: Mark Nelson
>> >>> Cc: ceph-devel
>> >>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>> >>>
>> >>> Thanks Marc,
>> >>>
>> >>> Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs
>> jemalloc.
>> >>>
>> >>> and indeed tcmalloc, even with bigger cache, seem decrease over time.
>> >>>
>> >>>
>> >>> What is funny, is that I see exactly same behaviour client librbd side, with
>> qemu and multiple iothreads.
>> >>>
>> >>>
>> >>> Switching both server and client to jemalloc give me best performance
>> on small read currently.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> ----- Mail original -----
>> >>> De: "Mark Nelson" <mnelson@redhat.com>
>> >>> À: "ceph-devel" <ceph-devel@vger.kernel.org>
>> >>> Envoyé: Mercredi 19 Août 2015 06:45:36
>> >>> Objet: Ceph Hackathon: More Memory Allocator Testing
>> >>>
>> >>> Hi Everyone,
>> >>>
>> >>> One of the goals at the Ceph Hackathon last week was to examine how
>> to improve Ceph Small IO performance. Jian Zhang presented findings
>> showing a dramatic improvement in small random IO performance when
>> Ceph is used with jemalloc. His results build upon Sandisk's original findings
>> that the default thread cache values are a major bottleneck in TCMalloc 2.1.
>> To further verify these results, we sat down at the Hackathon and configured
>> the new performance test cluster that Intel generously donated to the Ceph
>> community laboratory to run through a variety of tests with different
>> memory allocator configurations. I've since written the results of those tests
>> up in pdf form for folks who are interested.
>> >>>
>> >>> The results are located here:
>> >>>
>> >>>
>> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Tes
>> ting.
>> >>> pdf
>> >>>
>> >>> I want to be clear that many other folks have done the heavy lifting
>> here. These results are simply a validation of the many tests that other folks
>> have already done. Many thanks to Sandisk and others for figuring this out as
>> it's a pretty big deal!
>> >>>
>> >>> Side note: Very little tuning other than swapping the memory allocator
>> and a couple of quick and dirty ceph tunables were set during these tests. It's
>> quite possible that higher IOPS will be achieved as we really start digging into
>> the cluster and learning what the bottlenecks are.
>> >>>
>> >>> Thanks,
>> >>> Mark
>> >>> --
>> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> >>> in the body of a message to majordomo@vger.kernel.org More
>> majordomo
>> >>> info at http://vger.kernel.org/majordomo-info.html
>> >>>
>> >>> --
>> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> >>> in the body of a message to majordomo@vger.kernel.org More
>> majordomo
>> >>> info at  http://vger.kernel.org/majordomo-info.html
>> >>>
>> >>> ________________________________
>> >>>
>> >>> PLEASE NOTE: The information contained in this electronic mail message
>> is intended only for the use of the designated recipient(s) named above. If
>> the reader of this message is not the intended recipient, you are hereby
>> notified that you have received this message in error and that any review,
>> dissemination, distribution, or copying of this message is strictly prohibited. If
>> you have received this communication in error, please notify the sender by
>> telephone or e-mail (as shown above) immediately and destroy any and all
>> copies of this message in your possession (whether hard copies or
>> electronically stored copies).
>> >>>
>> >>> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay  ʇڙ ,j   f   h   z   w
>> >>      j:+v   w j m         zZ+     ݢj"  !tml=
>> >>>
>> >> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay  ʇڙ ,j   f   h   z   w
>> >     j:+v   w j m         zZ+     ݢj"  !tml=
>> >>
>> N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay� ʇڙ�,j
> ��f���h���z�
>> �w���
>>
>> ���j:+v���w�j�m����
> ����zZ+�����ݢj"��!�i
>
> ________________________________
>
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>



-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@adfin.com
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-21 14:26                   ` Milosz Tanski
@ 2015-08-21 19:07                     ` Robert LeBlanc
  2015-08-22 13:52                       ` Sage Weil
  2015-08-22 13:55                     ` Sage Weil
  1 sibling, 1 reply; 55+ messages in thread
From: Robert LeBlanc @ 2015-08-21 19:07 UTC (permalink / raw)
  To: Milosz Tanski
  Cc: Shishir Gowda, Somnath Roy, Stefan Priebe, Alexandre DERUMIER,
	Mark Nelson, ceph-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Please excuse me if I'm naive about this (I'm not a programmer by
trade), but wouldn't it be possible to load/link to tcmalloc or
jemalloc at run time instead of having to do so at compile time?
Except for tcmalloc being statically linked (is this even a
requirement?) it seems like it would be much more flexible to have a
config option, or a list of allocators to try (i.e. jemalloc, then
tcmalloc then glib depending on what is available). Then people would
have the option of choosing an allocator without having to recompile
packages.

We'd like to use jemalloc, but don't want to deviate from the Ceph
provided packages if at all possible. I also understand from some
discussion earlier that it would be very difficult/near impossible to
perform adequate testing on all Ceph versions x Distro allocators.

A possible compromise might be that Ceph is tested and distributed
with the recommended tcmalloc/jemalloc/etc, but if an end user chooses
can override the allocator in the config. We would be able to perform
the testing of our allocator of choce, Ceph version on our distro to
our satisfaction. Ceph could also test specific versions of
jemalloc/tcmalloc/glibc and just state that it is "certified" on
specific versions of these allocators and running a different version
is not tested or supported. I would suspect that a 3-way allocator
testing would help find most of the bugs/issues to make it fairly
stable across most versions.

On a different note, we ran into some memory allocation/deallocation
issues in some kernel code that we are writing. We wound up moving the
allocation code to a parent function and are just rewriting the memory
space in the child which has resolved a lot of the performance and
stability issues we were seeing. I think Ceph has some more complex
challenges like unknown buffer size (not sure we want to allocate max
buffers for each request/thread), thread safety and even thread scope.
If we can somehow reduce these memory operations, I think it will be a
great win regardless of the allocator. (I think I'm stating the
obvious here, sorry).
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Fri, Aug 21, 2015 at 8:26 AM, Milosz Tanski  wrote:
> On Fri, Aug 21, 2015 at 12:22 AM, Shishir Gowda
>  wrote:
>> Hi All,
>>
>> Have sent out a pull request which enables building librados/librbd with either tcmalloc(as default)  or jemalloc.
>>
>> Please find the pull request @ https://github.com/ceph/ceph/pull/5628
>>
>> With regards,
>> Shishir
>
> Unless I'm missing something here, this seams like the wrong thing to.
> Libraries that will be linked in by other external applications should
> not have a 3rd party malloc linked in there. That seams like an
> application choice. At the very least the default should not be to
> link in a 3rd party malloc.
>
>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
>>> owner@vger.kernel.org] On Behalf Of Somnath Roy
>>> Sent: Thursday, August 20, 2015 2:14 AM
>>> To: Stefan Priebe; Alexandre DERUMIER; Mark Nelson
>>> Cc: ceph-devel
>>> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
>>>
>>> Yeah , I can see ceph-osd/ceph-mon built with jemalloc.
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
>>> Sent: Wednesday, August 19, 2015 1:41 PM
>>> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
>>> Cc: ceph-devel
>>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>>>
>>>
>>> Am 19.08.2015 um 22:34 schrieb Somnath Roy:
>>> > But, you said you need to remove libcmalloc *not* libtcmalloc...
>>> > I saw librbd/librados is built with libcmalloc not with libtcmalloc..
>>> > So, are you saying to remove libtcmalloc (not libcmalloc) to enable jemalloc
>>> ?
>>>
>>> Ouch my mistake. I read libtcmalloc - too late here.
>>>
>>> My build (Hammer) says:
>>> # ldd /usr/lib/librados.so.2.0.0
>>>          linux-vdso.so.1 =>  (0x00007fff4f71d000)
>>>          libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fafdb26c000)
>>>          libboost_thread.so.1.49.0 => /usr/lib/libboost_thread.so.1.49.0
>>> (0x00007fafdb24f000)
>>>          libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
>>> (0x00007fafdb032000)
>>>          libcrypto++.so.9 => /usr/lib/libcrypto++.so.9 (0x00007fafda924000)
>>>          libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1
>>> (0x00007fafda71f000)
>>>          librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fafda516000)
>>>          libboost_system.so.1.49.0 => /usr/lib/libboost_system.so.1.49.0
>>> (0x00007fafda512000)
>>>          libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>> (0x00007fafda20b000)
>>>          libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fafd9f88000)
>>>          libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fafd9bfd000)
>>>          libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
>>> (0x00007fafd99e7000)
>>>          /lib64/ld-linux-x86-64.so.2 (0x000056358ecfe000)
>>>
>>> Only ceph-osd is linked against libjemalloc for me.
>>>
>>> Stefan
>>>
>>> > -----Original Message-----
>>> > From: Stefan Priebe [mailto:s.priebe@profihost.ag]
>>> > Sent: Wednesday, August 19, 2015 1:31 PM
>>> > To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
>>> > Cc: ceph-devel
>>> > Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>>> >
>>> >
>>> > Am 19.08.2015 um 22:29 schrieb Somnath Roy:
>>> >> Hmm...We need to fix that as part of configure/Makefile I guess (?)..
>>> >> Since we have done this jemalloc integration originally, we can take that
>>> ownership unless anybody sees a problem of enabling tcmalloc/jemalloc with
>>> librbd/librados.
>>> >>
>>> >> << You have to remove libcmalloc out of your build environment to get
>>> >> this done How do I do that ? I am using Ubuntu and can't afford to remove
>>> libc* packages.
>>> >
>>> > I always use a chroot to build packages where only a minimal bootstrap +
>>> the build deps are installed. googleperftools where libtcmalloc comes from is
>>> not Ubuntu "core/minimal".
>>> >
>>> > Stefan
>>> >
>>> >>
>>> >> Thanks & Regards
>>> >> Somnath
>>> >>
>>> >> -----Original Message-----
>>> >> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
>>> >> Sent: Wednesday, August 19, 2015 1:18 PM
>>> >> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
>>> >> Cc: ceph-devel
>>> >> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>>> >>
>>> >>
>>> >> Am 19.08.2015 um 22:16 schrieb Somnath Roy:
>>> >>> Alexandre,
>>> >>> I am not able to build librados/librbd by using the following config option.
>>> >>>
>>> >>> ./configure –without-tcmalloc –with-jemalloc
>>> >>
>>> >> Same issue to me. You have to remove libcmalloc out of your build
>>> environment to get this done.
>>> >>
>>> >> Stefan
>>> >>
>>> >>
>>> >>> It seems it is building osd/mon/Mds/RGW with jemalloc enabled..
>>> >>>
>>> >>> root@emsnode10:~/ceph-latest/src# ldd ./ceph-osd
>>> >>>            linux-vdso.so.1 =>  (0x00007ffd0eb43000)
>>> >>>            libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
>>> (0x00007f5f92d70000)
>>> >>>            .......
>>> >>>
>>> >>> root@emsnode10:~/ceph-latest/src/.libs# ldd ./librados.so.2.0.0
>>> >>>            linux-vdso.so.1 =>  (0x00007ffed46f2000)
>>> >>>            libboost_thread.so.1.55.0 => /usr/lib/x86_64-linux-
>>> gnu/libboost_thread.so.1.55.0 (0x00007ff687887000)
>>> >>>            liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0
>>> (0x00007ff68763d000)
>>> >>>            libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff687438000)
>>> >>>            libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
>>> (0x00007ff68721a000)
>>> >>>            libnss3.so => /usr/lib/x86_64-linux-gnu/libnss3.so
>>> (0x00007ff686ee0000)
>>> >>>            libsmime3.so => /usr/lib/x86_64-linux-gnu/libsmime3.so
>>> (0x00007ff686cb3000)
>>> >>>            libnspr4.so => /usr/lib/x86_64-linux-gnu/libnspr4.so
>>> (0x00007ff686a76000)
>>> >>>            libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1
>>> (0x00007ff686871000)
>>> >>>            librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff686668000)
>>> >>>            libboost_system.so.1.55.0 => /usr/lib/x86_64-linux-
>>> gnu/libboost_system.so.1.55.0 (0x00007ff686464000)
>>> >>>            libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>> (0x00007ff686160000)
>>> >>>            libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff685e59000)
>>> >>>            libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff685a94000)
>>> >>>            libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
>>> (0x00007ff68587e000)
>>> >>>            liblttng-ust-tracepoint.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-
>>> ust-tracepoint.so.0 (0x00007ff685663000)
>>> >>>            liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x00007ff68545c000)
>>> >>>            liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x00007ff685255000)
>>> >>>            /lib64/ld-linux-x86-64.so.2 (0x00007ff68a0f6000)
>>> >>>            libnssutil3.so => /usr/lib/x86_64-linux-gnu/libnssutil3.so
>>> (0x00007ff685029000)
>>> >>>            libplc4.so => /usr/lib/x86_64-linux-gnu/libplc4.so
>>> (0x00007ff684e24000)
>>> >>>            libplds4.so => /usr/lib/x86_64-linux-gnu/libplds4.so
>>> >>> (0x00007ff684c20000)
>>> >>>
>>> >>> It is building with libcmalloc always...
>>> >>>
>>> >>> Did you change the ceph makefiles to build librbd/librados with jemalloc
>>> ?
>>> >>>
>>> >>> Thanks & Regards
>>> >>> Somnath
>>> >>>
>>> >>> -----Original Message-----
>>> >>> From: ceph-devel-owner@vger.kernel.org
>>> >>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre
>>> >>> DERUMIER
>>> >>> Sent: Wednesday, August 19, 2015 7:01 AM
>>> >>> To: Mark Nelson
>>> >>> Cc: ceph-devel
>>> >>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>>> >>>
>>> >>> Thanks Marc,
>>> >>>
>>> >>> Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs
>>> jemalloc.
>>> >>>
>>> >>> and indeed tcmalloc, even with bigger cache, seem decrease over time.
>>> >>>
>>> >>>
>>> >>> What is funny, is that I see exactly same behaviour client librbd side, with
>>> qemu and multiple iothreads.
>>> >>>
>>> >>>
>>> >>> Switching both server and client to jemalloc give me best performance
>>> on small read currently.
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> ----- Mail original -----
>>> >>> De: "Mark Nelson"
>>> >>> À: "ceph-devel"
>>> >>> Envoyé: Mercredi 19 Août 2015 06:45:36
>>> >>> Objet: Ceph Hackathon: More Memory Allocator Testing
>>> >>>
>>> >>> Hi Everyone,
>>> >>>
>>> >>> One of the goals at the Ceph Hackathon last week was to examine how
>>> to improve Ceph Small IO performance. Jian Zhang presented findings
>>> showing a dramatic improvement in small random IO performance when
>>> Ceph is used with jemalloc. His results build upon Sandisk's original findings
>>> that the default thread cache values are a major bottleneck in TCMalloc 2.1.
>>> To further verify these results, we sat down at the Hackathon and configured
>>> the new performance test cluster that Intel generously donated to the Ceph
>>> community laboratory to run through a variety of tests with different
>>> memory allocator configurations. I've since written the results of those tests
>>> up in pdf form for folks who are interested.
>>> >>>
>>> >>> The results are located here:
>>> >>>
>>> >>>
>>> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Tes
>>> ting.
>>> >>> pdf
>>> >>>
>>> >>> I want to be clear that many other folks have done the heavy lifting
>>> here. These results are simply a validation of the many tests that other folks
>>> have already done. Many thanks to Sandisk and others for figuring this out as
>>> it's a pretty big deal!
>>> >>>
>>> >>> Side note: Very little tuning other than swapping the memory allocator
>>> and a couple of quick and dirty ceph tunables were set during these tests. It's
>>> quite possible that higher IOPS will be achieved as we really start digging into
>>> the cluster and learning what the bottlenecks are.
>>> >>>
>>> >>> Thanks,
>>> >>> Mark
>>> >>> --
>>> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> >>> in the body of a message to majordomo@vger.kernel.org More
>>> majordomo
>>> >>> info at http://vger.kernel.org/majordomo-info.html
>>> >>>
>>> >>> --
>>> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> >>> in the body of a message to majordomo@vger.kernel.org More
>>> majordomo
>>> >>> info at  http://vger.kernel.org/majordomo-info.html
>>> >>>
>>> >>> ________________________________
>>> >>>
>>> >>> PLEASE NOTE: The information contained in this electronic mail message
>>> is intended only for the use of the designated recipient(s) named above. If
>>> the reader of this message is not the intended recipient, you are hereby
>>> notified that you have received this message in error and that any review,
>>> dissemination, distribution, or copying of this message is strictly prohibited. If
>>> you have received this communication in error, please notify the sender by
>>> telephone or e-mail (as shown above) immediately and destroy any and all
>>> copies of this message in your possession (whether hard copies or
>>> electronically stored copies).
>>> >>>
>>> >>> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay  ʇڙ ,j   f   h   z   w
>>> >>      j:+v   w j m         zZ+     ݢj"  !tml=
>>> >>>
>>> >> N     r  y   b X  ǧv ^ )޺{.n +   z ]z   {ay  ʇڙ ,j   f   h   z   w
>>> >     j:+v   w j m         zZ+     ݢj"  !tml=
>>> >>
>>> N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay� ʇڙ�,j
>> ��f���h���z�
>>> �w���
>>>
>>> ���j:+v���w�j�m����
>> ����zZ+�����ݢj"��!�i
>>
>> ________________________________
>>
>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>
>
>
>
> --
> Milosz Tanski
> CTO
> 16 East 34th Street, 15th floor
> New York, NY 10016
>
> p: 646-253-9055
> e: milosz@adfin.com
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-----BEGIN PGP SIGNATURE-----
Version: Mailvelope v1.0.0
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJV13b/CRDmVDuy+mK58QAAxvkQAI7s/5W4hJ/DCp3h50Lh
1zz9oq2RM6wyTX5SFOcTgdqUKvZOPFHVAt2M3s/q1aCwT2X7+N+AJkFU6rya
d2xCz9BXXgb3EWXdYAIY96QTA4ZL3khcz8HznNVd4bJwRAT8DcM2Q3/O+KPT
GkyaSE1WDvC1M1jKZH1O1CNk0t0qn2TbABvsnPHmfaJ7kA/HdXGA/wGTnFoK
ugAEVVaCBQezxFlU+FOYa72ov0m8IGaoPx7AEbkkzXcH2jNBb2toBMjQPVjo
xes9TkZcw99hMFStlUFMhzuopB9N11yS/UBXjrQm2g1irgFpT/6XKqqrNZwl
AEtk4iC8sAw4CNLzSPx1i6errWqi7Bo2V9ylH+mhBEUZ2I7m40HtWqlu7RyK
FjmDBEEyeI4Osim3r1h7jb4juaq0uuQXZzAeRgyHaH/IDA5ZStwUdOSZ4YNJ
xvq1TLctO64CG6GZeLM45q7V/yOCnOL8wLIDjtea8mAz8x6ugkV5LjdLZ9oh
dEStoZyDEfKmgud8NMbAmCNJOrBSsA9a4Sxe2uoroSiwN60hJxYXmZ11dNv3
9Sraox44Sq4FWWZgCIqS0sJK11kYeF03Cy7fllr9mq9BHr9E4dlhXVLzdPDE
z23QBHSEJDtvOfGXg+nP0UTTjwStWA3UTX7poy+ydR2goTcNZscMSDSOjIFo
ChIM
=WlNV
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-21 19:07                     ` Robert LeBlanc
@ 2015-08-22 13:52                       ` Sage Weil
  0 siblings, 0 replies; 55+ messages in thread
From: Sage Weil @ 2015-08-22 13:52 UTC (permalink / raw)
  To: Robert LeBlanc
  Cc: Milosz Tanski, Shishir Gowda, Somnath Roy, Stefan Priebe,
	Alexandre DERUMIER, Mark Nelson, ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 17682 bytes --]

On Fri, 21 Aug 2015, Robert LeBlanc wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> 
> Please excuse me if I'm naive about this (I'm not a programmer by
> trade), but wouldn't it be possible to load/link to tcmalloc or
> jemalloc at run time instead of having to do so at compile time?
> Except for tcmalloc being statically linked (is this even a
> requirement?) it seems like it would be much more flexible to have a
> config option, or a list of allocators to try (i.e. jemalloc, then
> tcmalloc then glib depending on what is available). Then people would
> have the option of choosing an allocator without having to recompile
> packages.
> 
> We'd like to use jemalloc, but don't want to deviate from the Ceph
> provided packages if at all possible. I also understand from some
> discussion earlier that it would be very difficult/near impossible to
> perform adequate testing on all Ceph versions x Distro allocators.
> 
> A possible compromise might be that Ceph is tested and distributed
> with the recommended tcmalloc/jemalloc/etc, but if an end user chooses
> can override the allocator in the config. We would be able to perform

I don't think it's possible to select the allocator at runtime, since the 
dynamic loading happens while loading the executable.  You can select an 
allocator using LD_PRELOAD, though, so in your environment it would be 
pretty straightforward to do this by modifying the init script or 
upstart/systemd file:

 LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1 ceph-osd ...

I suppose we could bake this into the upstream init files?

sage




> the testing of our allocator of choce, Ceph version on our distro to
> our satisfaction. Ceph could also test specific versions of
> jemalloc/tcmalloc/glibc and just state that it is "certified" on
> specific versions of these allocators and running a different version
> is not tested or supported. I would suspect that a 3-way allocator
> testing would help find most of the bugs/issues to make it fairly
> stable across most versions.
> 
> On a different note, we ran into some memory allocation/deallocation
> issues in some kernel code that we are writing. We wound up moving the
> allocation code to a parent function and are just rewriting the memory
> space in the child which has resolved a lot of the performance and
> stability issues we were seeing. I think Ceph has some more complex
> challenges like unknown buffer size (not sure we want to allocate max
> buffers for each request/thread), thread safety and even thread scope.
> If we can somehow reduce these memory operations, I think it will be a
> great win regardless of the allocator. (I think I'm stating the
> obvious here, sorry).
> - ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Fri, Aug 21, 2015 at 8:26 AM, Milosz Tanski  wrote:
> > On Fri, Aug 21, 2015 at 12:22 AM, Shishir Gowda
> >  wrote:
> >> Hi All,
> >>
> >> Have sent out a pull request which enables building librados/librbd with either tcmalloc(as default)  or jemalloc.
> >>
> >> Please find the pull request @ https://github.com/ceph/ceph/pull/5628
> >>
> >> With regards,
> >> Shishir
> >
> > Unless I'm missing something here, this seams like the wrong thing to.
> > Libraries that will be linked in by other external applications should
> > not have a 3rd party malloc linked in there. That seams like an
> > application choice. At the very least the default should not be to
> > link in a 3rd party malloc.
> >
> >>
> >>> -----Original Message-----
> >>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> >>> owner@vger.kernel.org] On Behalf Of Somnath Roy
> >>> Sent: Thursday, August 20, 2015 2:14 AM
> >>> To: Stefan Priebe; Alexandre DERUMIER; Mark Nelson
> >>> Cc: ceph-devel
> >>> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
> >>>
> >>> Yeah , I can see ceph-osd/ceph-mon built with jemalloc.
> >>>
> >>> Thanks & Regards
> >>> Somnath
> >>>
> >>> -----Original Message-----
> >>> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
> >>> Sent: Wednesday, August 19, 2015 1:41 PM
> >>> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
> >>> Cc: ceph-devel
> >>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> >>>
> >>>
> >>> Am 19.08.2015 um 22:34 schrieb Somnath Roy:
> >>> > But, you said you need to remove libcmalloc *not* libtcmalloc...
> >>> > I saw librbd/librados is built with libcmalloc not with libtcmalloc..
> >>> > So, are you saying to remove libtcmalloc (not libcmalloc) to enable jemalloc
> >>> ?
> >>>
> >>> Ouch my mistake. I read libtcmalloc - too late here.
> >>>
> >>> My build (Hammer) says:
> >>> # ldd /usr/lib/librados.so.2.0.0
> >>>          linux-vdso.so.1 =>  (0x00007fff4f71d000)
> >>>          libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fafdb26c000)
> >>>          libboost_thread.so.1.49.0 => /usr/lib/libboost_thread.so.1.49.0
> >>> (0x00007fafdb24f000)
> >>>          libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> >>> (0x00007fafdb032000)
> >>>          libcrypto++.so.9 => /usr/lib/libcrypto++.so.9 (0x00007fafda924000)
> >>>          libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1
> >>> (0x00007fafda71f000)
> >>>          librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fafda516000)
> >>>          libboost_system.so.1.49.0 => /usr/lib/libboost_system.so.1.49.0
> >>> (0x00007fafda512000)
> >>>          libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >>> (0x00007fafda20b000)
> >>>          libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fafd9f88000)
> >>>          libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fafd9bfd000)
> >>>          libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> >>> (0x00007fafd99e7000)
> >>>          /lib64/ld-linux-x86-64.so.2 (0x000056358ecfe000)
> >>>
> >>> Only ceph-osd is linked against libjemalloc for me.
> >>>
> >>> Stefan
> >>>
> >>> > -----Original Message-----
> >>> > From: Stefan Priebe [mailto:s.priebe@profihost.ag]
> >>> > Sent: Wednesday, August 19, 2015 1:31 PM
> >>> > To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
> >>> > Cc: ceph-devel
> >>> > Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> >>> >
> >>> >
> >>> > Am 19.08.2015 um 22:29 schrieb Somnath Roy:
> >>> >> Hmm...We need to fix that as part of configure/Makefile I guess (?)..
> >>> >> Since we have done this jemalloc integration originally, we can take that
> >>> ownership unless anybody sees a problem of enabling tcmalloc/jemalloc with
> >>> librbd/librados.
> >>> >>
> >>> >> << You have to remove libcmalloc out of your build environment to get
> >>> >> this done How do I do that ? I am using Ubuntu and can't afford to remove
> >>> libc* packages.
> >>> >
> >>> > I always use a chroot to build packages where only a minimal bootstrap +
> >>> the build deps are installed. googleperftools where libtcmalloc comes from is
> >>> not Ubuntu "core/minimal".
> >>> >
> >>> > Stefan
> >>> >
> >>> >>
> >>> >> Thanks & Regards
> >>> >> Somnath
> >>> >>
> >>> >> -----Original Message-----
> >>> >> From: Stefan Priebe [mailto:s.priebe@profihost.ag]
> >>> >> Sent: Wednesday, August 19, 2015 1:18 PM
> >>> >> To: Somnath Roy; Alexandre DERUMIER; Mark Nelson
> >>> >> Cc: ceph-devel
> >>> >> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> >>> >>
> >>> >>
> >>> >> Am 19.08.2015 um 22:16 schrieb Somnath Roy:
> >>> >>> Alexandre,
> >>> >>> I am not able to build librados/librbd by using the following config option.
> >>> >>>
> >>> >>> ./configure ?without-tcmalloc ?with-jemalloc
> >>> >>
> >>> >> Same issue to me. You have to remove libcmalloc out of your build
> >>> environment to get this done.
> >>> >>
> >>> >> Stefan
> >>> >>
> >>> >>
> >>> >>> It seems it is building osd/mon/Mds/RGW with jemalloc enabled..
> >>> >>>
> >>> >>> root@emsnode10:~/ceph-latest/src# ldd ./ceph-osd
> >>> >>>            linux-vdso.so.1 =>  (0x00007ffd0eb43000)
> >>> >>>            libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
> >>> (0x00007f5f92d70000)
> >>> >>>            .......
> >>> >>>
> >>> >>> root@emsnode10:~/ceph-latest/src/.libs# ldd ./librados.so.2.0.0
> >>> >>>            linux-vdso.so.1 =>  (0x00007ffed46f2000)
> >>> >>>            libboost_thread.so.1.55.0 => /usr/lib/x86_64-linux-
> >>> gnu/libboost_thread.so.1.55.0 (0x00007ff687887000)
> >>> >>>            liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0
> >>> (0x00007ff68763d000)
> >>> >>>            libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff687438000)
> >>> >>>            libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> >>> (0x00007ff68721a000)
> >>> >>>            libnss3.so => /usr/lib/x86_64-linux-gnu/libnss3.so
> >>> (0x00007ff686ee0000)
> >>> >>>            libsmime3.so => /usr/lib/x86_64-linux-gnu/libsmime3.so
> >>> (0x00007ff686cb3000)
> >>> >>>            libnspr4.so => /usr/lib/x86_64-linux-gnu/libnspr4.so
> >>> (0x00007ff686a76000)
> >>> >>>            libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1
> >>> (0x00007ff686871000)
> >>> >>>            librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff686668000)
> >>> >>>            libboost_system.so.1.55.0 => /usr/lib/x86_64-linux-
> >>> gnu/libboost_system.so.1.55.0 (0x00007ff686464000)
> >>> >>>            libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> >>> (0x00007ff686160000)
> >>> >>>            libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff685e59000)
> >>> >>>            libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff685a94000)
> >>> >>>            libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> >>> (0x00007ff68587e000)
> >>> >>>            liblttng-ust-tracepoint.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-
> >>> ust-tracepoint.so.0 (0x00007ff685663000)
> >>> >>>            liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x00007ff68545c000)
> >>> >>>            liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x00007ff685255000)
> >>> >>>            /lib64/ld-linux-x86-64.so.2 (0x00007ff68a0f6000)
> >>> >>>            libnssutil3.so => /usr/lib/x86_64-linux-gnu/libnssutil3.so
> >>> (0x00007ff685029000)
> >>> >>>            libplc4.so => /usr/lib/x86_64-linux-gnu/libplc4.so
> >>> (0x00007ff684e24000)
> >>> >>>            libplds4.so => /usr/lib/x86_64-linux-gnu/libplds4.so
> >>> >>> (0x00007ff684c20000)
> >>> >>>
> >>> >>> It is building with libcmalloc always...
> >>> >>>
> >>> >>> Did you change the ceph makefiles to build librbd/librados with jemalloc
> >>> ?
> >>> >>>
> >>> >>> Thanks & Regards
> >>> >>> Somnath
> >>> >>>
> >>> >>> -----Original Message-----
> >>> >>> From: ceph-devel-owner@vger.kernel.org
> >>> >>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Alexandre
> >>> >>> DERUMIER
> >>> >>> Sent: Wednesday, August 19, 2015 7:01 AM
> >>> >>> To: Mark Nelson
> >>> >>> Cc: ceph-devel
> >>> >>> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> >>> >>>
> >>> >>> Thanks Marc,
> >>> >>>
> >>> >>> Results are matching exactly what I have seen with tcmalloc 2.1 vs 2.4 vs
> >>> jemalloc.
> >>> >>>
> >>> >>> and indeed tcmalloc, even with bigger cache, seem decrease over time.
> >>> >>>
> >>> >>>
> >>> >>> What is funny, is that I see exactly same behaviour client librbd side, with
> >>> qemu and multiple iothreads.
> >>> >>>
> >>> >>>
> >>> >>> Switching both server and client to jemalloc give me best performance
> >>> on small read currently.
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> ----- Mail original -----
> >>> >>> De: "Mark Nelson"
> >>> >>> À: "ceph-devel"
> >>> >>> Envoyé: Mercredi 19 Août 2015 06:45:36
> >>> >>> Objet: Ceph Hackathon: More Memory Allocator Testing
> >>> >>>
> >>> >>> Hi Everyone,
> >>> >>>
> >>> >>> One of the goals at the Ceph Hackathon last week was to examine how
> >>> to improve Ceph Small IO performance. Jian Zhang presented findings
> >>> showing a dramatic improvement in small random IO performance when
> >>> Ceph is used with jemalloc. His results build upon Sandisk's original findings
> >>> that the default thread cache values are a major bottleneck in TCMalloc 2.1.
> >>> To further verify these results, we sat down at the Hackathon and configured
> >>> the new performance test cluster that Intel generously donated to the Ceph
> >>> community laboratory to run through a variety of tests with different
> >>> memory allocator configurations. I've since written the results of those tests
> >>> up in pdf form for folks who are interested.
> >>> >>>
> >>> >>> The results are located here:
> >>> >>>
> >>> >>>
> >>> http://nhm.ceph.com/hackathon/Ceph_Hackathon_Memory_Allocator_Tes
> >>> ting.
> >>> >>> pdf
> >>> >>>
> >>> >>> I want to be clear that many other folks have done the heavy lifting
> >>> here. These results are simply a validation of the many tests that other folks
> >>> have already done. Many thanks to Sandisk and others for figuring this out as
> >>> it's a pretty big deal!
> >>> >>>
> >>> >>> Side note: Very little tuning other than swapping the memory allocator
> >>> and a couple of quick and dirty ceph tunables were set during these tests. It's
> >>> quite possible that higher IOPS will be achieved as we really start digging into
> >>> the cluster and learning what the bottlenecks are.
> >>> >>>
> >>> >>> Thanks,
> >>> >>> Mark
> >>> >>> --
> >>> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >>> >>> in the body of a message to majordomo@vger.kernel.org More
> >>> majordomo
> >>> >>> info at http://vger.kernel.org/majordomo-info.html
> >>> >>>
> >>> >>> --
> >>> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >>> >>> in the body of a message to majordomo@vger.kernel.org More
> >>> majordomo
> >>> >>> info at  http://vger.kernel.org/majordomo-info.html
> >>> >>>
> >>> >>> ________________________________
> >>> >>>
> >>> >>> PLEASE NOTE: The information contained in this electronic mail message
> >>> is intended only for the use of the designated recipient(s) named above. If
> >>> the reader of this message is not the intended recipient, you are hereby
> >>> notified that you have received this message in error and that any review,
> >>> dissemination, distribution, or copying of this message is strictly prohibited. If
> >>> you have received this communication in error, please notify the sender by
> >>> telephone or e-mail (as shown above) immediately and destroy any and all
> >>> copies of this message in your possession (whether hard copies or
> >>> electronically stored copies).
> >>> >>>
> >>> >>> N     r  y   b X  ?v ^ )?{.n +   z ]z   {ay  ?? ,j   f   h   z   w
> >>> >>      j:+v   w j m         zZ+     ?j"  !tml=
> >>> >>>
> >>> >> N     r  y   b X  ?v ^ )?{.n +   z ]z   {ay  ?? ,j   f   h   z   w
> >>> >     j:+v   w j m         zZ+     ?j"  !tml=
> >>> >>
> >>> N???????????????r??????y?????????b???X???????v???^???)?{.n???+?????????z???]z?????????{ay??? ?????,j
> >> ??????f?????????h?????????z???
> >>> ???w?????????
> >>>
> >>> ?????????j:+v?????????w???j???m????????????
> >> ????????????zZ+????????????????j"??????!???i
> >>
> >> ________________________________
> >>
> >> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
> >>
> >
> >
> >
> > --
> > Milosz Tanski
> > CTO
> > 16 East 34th Street, 15th floor
> > New York, NY 10016
> >
> > p: 646-253-9055
> > e: milosz@adfin.com
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> -----BEGIN PGP SIGNATURE-----
> Version: Mailvelope v1.0.0
> Comment: https://www.mailvelope.com
> 
> wsFcBAEBCAAQBQJV13b/CRDmVDuy+mK58QAAxvkQAI7s/5W4hJ/DCp3h50Lh
> 1zz9oq2RM6wyTX5SFOcTgdqUKvZOPFHVAt2M3s/q1aCwT2X7+N+AJkFU6rya
> d2xCz9BXXgb3EWXdYAIY96QTA4ZL3khcz8HznNVd4bJwRAT8DcM2Q3/O+KPT
> GkyaSE1WDvC1M1jKZH1O1CNk0t0qn2TbABvsnPHmfaJ7kA/HdXGA/wGTnFoK
> ugAEVVaCBQezxFlU+FOYa72ov0m8IGaoPx7AEbkkzXcH2jNBb2toBMjQPVjo
> xes9TkZcw99hMFStlUFMhzuopB9N11yS/UBXjrQm2g1irgFpT/6XKqqrNZwl
> AEtk4iC8sAw4CNLzSPx1i6errWqi7Bo2V9ylH+mhBEUZ2I7m40HtWqlu7RyK
> FjmDBEEyeI4Osim3r1h7jb4juaq0uuQXZzAeRgyHaH/IDA5ZStwUdOSZ4YNJ
> xvq1TLctO64CG6GZeLM45q7V/yOCnOL8wLIDjtea8mAz8x6ugkV5LjdLZ9oh
> dEStoZyDEfKmgud8NMbAmCNJOrBSsA9a4Sxe2uoroSiwN60hJxYXmZ11dNv3
> 9Sraox44Sq4FWWZgCIqS0sJK11kYeF03Cy7fllr9mq9BHr9E4dlhXVLzdPDE
> z23QBHSEJDtvOfGXg+nP0UTTjwStWA3UTX7poy+ydR2goTcNZscMSDSOjIFo
> ChIM
> =WlNV
> -----END PGP SIGNATURE-----
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-21 14:26                   ` Milosz Tanski
  2015-08-21 19:07                     ` Robert LeBlanc
@ 2015-08-22 13:55                     ` Sage Weil
  2015-08-22 16:15                       ` Somnath Roy
  2015-08-24 17:01                       ` Robert LeBlanc
  1 sibling, 2 replies; 55+ messages in thread
From: Sage Weil @ 2015-08-22 13:55 UTC (permalink / raw)
  To: Milosz Tanski
  Cc: Shishir Gowda, Somnath Roy, Stefan Priebe, Alexandre DERUMIER,
	Mark Nelson, ceph-devel

On Fri, 21 Aug 2015, Milosz Tanski wrote:
> On Fri, Aug 21, 2015 at 12:22 AM, Shishir Gowda
> <Shishir.Gowda@sandisk.com> wrote:
> > Hi All,
> >
> > Have sent out a pull request which enables building librados/librbd with either tcmalloc(as default)  or jemalloc.
> >
> > Please find the pull request @ https://github.com/ceph/ceph/pull/5628
> >
> > With regards,
> > Shishir
> 
> Unless I'm missing something here, this seams like the wrong thing to.
> Libraries that will be linked in by other external applications should
> not have a 3rd party malloc linked in there. That seams like an
> application choice. At the very least the default should not be to
> link in a 3rd party malloc.

Yeah, I think you're right.

Note that this isn't/wasn't always the case, though.. on precise, for 
instance, libleveldb links libtcmalloc.  They stopped doing this 
sometime before trusty.

sage

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-22 13:55                     ` Sage Weil
@ 2015-08-22 16:15                       ` Somnath Roy
  2015-08-22 16:57                         ` Alexandre DERUMIER
  2015-08-24 17:01                       ` Robert LeBlanc
  1 sibling, 1 reply; 55+ messages in thread
From: Somnath Roy @ 2015-08-22 16:15 UTC (permalink / raw)
  To: Sage Weil, Milosz Tanski
  Cc: Shishir Gowda, Stefan Priebe, Alexandre DERUMIER, Mark Nelson,
	ceph-devel

Yes, even today rocksdb also linked with tcmalloc. It doesn't mean all the application using rocksdb needs to be built with tcmalloc.
Sage,
Wanted to know is there any reason we didn't link client libraries with tcmalloc at the first place (but did link only OSDs/mon/RGW) ?

Thanks & Regards
Somnath

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net]
Sent: Saturday, August 22, 2015 6:56 AM
To: Milosz Tanski
Cc: Shishir Gowda; Somnath Roy; Stefan Priebe; Alexandre DERUMIER; Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

On Fri, 21 Aug 2015, Milosz Tanski wrote:
> On Fri, Aug 21, 2015 at 12:22 AM, Shishir Gowda
> <Shishir.Gowda@sandisk.com> wrote:
> > Hi All,
> >
> > Have sent out a pull request which enables building librados/librbd with either tcmalloc(as default)  or jemalloc.
> >
> > Please find the pull request @
> > https://github.com/ceph/ceph/pull/5628
> >
> > With regards,
> > Shishir
>
> Unless I'm missing something here, this seams like the wrong thing to.
> Libraries that will be linked in by other external applications should
> not have a 3rd party malloc linked in there. That seams like an
> application choice. At the very least the default should not be to
> link in a 3rd party malloc.

Yeah, I think you're right.

Note that this isn't/wasn't always the case, though.. on precise, for instance, libleveldb links libtcmalloc.  They stopped doing this sometime before trusty.

sage

________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-22 16:15                       ` Somnath Roy
@ 2015-08-22 16:57                         ` Alexandre DERUMIER
  2015-08-22 17:03                           ` Somnath Roy
  0 siblings, 1 reply; 55+ messages in thread
From: Alexandre DERUMIER @ 2015-08-22 16:57 UTC (permalink / raw)
  To: Somnath Roy
  Cc: Sage Weil, Milosz Tanski, Shishir Gowda, Stefan Priebe,
	Mark Nelson, ceph-devel

>>Wanted to know is there any reason we didn't link client libraries with tcmalloc at the first place (but did link only OSDs/mon/RGW) ?

Do we need to link client librairies ?

I'm building qemu with jemalloc , and it's seem to be enough.



----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "Sage Weil" <sage@newdream.net>, "Milosz Tanski" <milosz@adfin.com>
Cc: "Shishir Gowda" <Shishir.Gowda@sandisk.com>, "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Samedi 22 Août 2015 18:15:36
Objet: RE: Ceph Hackathon: More Memory Allocator Testing

Yes, even today rocksdb also linked with tcmalloc. It doesn't mean all the application using rocksdb needs to be built with tcmalloc. 
Sage, 
Wanted to know is there any reason we didn't link client libraries with tcmalloc at the first place (but did link only OSDs/mon/RGW) ? 

Thanks & Regards 
Somnath 

-----Original Message----- 
From: Sage Weil [mailto:sage@newdream.net] 
Sent: Saturday, August 22, 2015 6:56 AM 
To: Milosz Tanski 
Cc: Shishir Gowda; Somnath Roy; Stefan Priebe; Alexandre DERUMIER; Mark Nelson; ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

On Fri, 21 Aug 2015, Milosz Tanski wrote: 
> On Fri, Aug 21, 2015 at 12:22 AM, Shishir Gowda 
> <Shishir.Gowda@sandisk.com> wrote: 
> > Hi All, 
> > 
> > Have sent out a pull request which enables building librados/librbd with either tcmalloc(as default) or jemalloc. 
> > 
> > Please find the pull request @ 
> > https://github.com/ceph/ceph/pull/5628 
> > 
> > With regards, 
> > Shishir 
> 
> Unless I'm missing something here, this seams like the wrong thing to. 
> Libraries that will be linked in by other external applications should 
> not have a 3rd party malloc linked in there. That seams like an 
> application choice. At the very least the default should not be to 
> link in a 3rd party malloc. 

Yeah, I think you're right. 

Note that this isn't/wasn't always the case, though.. on precise, for instance, libleveldb links libtcmalloc. They stopped doing this sometime before trusty. 

sage 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-22 16:57                         ` Alexandre DERUMIER
@ 2015-08-22 17:03                           ` Somnath Roy
  2015-08-23 13:12                             ` Alexandre DERUMIER
  2015-09-03  9:13                             ` Shinobu Kinjo
  0 siblings, 2 replies; 55+ messages in thread
From: Somnath Roy @ 2015-08-22 17:03 UTC (permalink / raw)
  To: Alexandre DERUMIER
  Cc: Sage Weil, Milosz Tanski, Shishir Gowda, Stefan Priebe,
	Mark Nelson, ceph-devel

Need to see if client is overriding the libraries built with different malloc libraries I guess..
I am not sure in your case the benefit you are seeing is because of qemu is more efficient with tcmalloc/jemalloc or the entire client stack ?

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@odiso.com] 
Sent: Saturday, August 22, 2015 9:57 AM
To: Somnath Roy
Cc: Sage Weil; Milosz Tanski; Shishir Gowda; Stefan Priebe; Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

>>Wanted to know is there any reason we didn't link client libraries with tcmalloc at the first place (but did link only OSDs/mon/RGW) ?

Do we need to link client librairies ?

I'm building qemu with jemalloc , and it's seem to be enough.



----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "Sage Weil" <sage@newdream.net>, "Milosz Tanski" <milosz@adfin.com>
Cc: "Shishir Gowda" <Shishir.Gowda@sandisk.com>, "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Samedi 22 Août 2015 18:15:36
Objet: RE: Ceph Hackathon: More Memory Allocator Testing

Yes, even today rocksdb also linked with tcmalloc. It doesn't mean all the application using rocksdb needs to be built with tcmalloc. 
Sage,
Wanted to know is there any reason we didn't link client libraries with tcmalloc at the first place (but did link only OSDs/mon/RGW) ? 

Thanks & Regards
Somnath 

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net]
Sent: Saturday, August 22, 2015 6:56 AM
To: Milosz Tanski
Cc: Shishir Gowda; Somnath Roy; Stefan Priebe; Alexandre DERUMIER; Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

On Fri, 21 Aug 2015, Milosz Tanski wrote: 
> On Fri, Aug 21, 2015 at 12:22 AM, Shishir Gowda 
> <Shishir.Gowda@sandisk.com> wrote:
> > Hi All,
> > 
> > Have sent out a pull request which enables building librados/librbd with either tcmalloc(as default) or jemalloc. 
> > 
> > Please find the pull request @
> > https://github.com/ceph/ceph/pull/5628
> > 
> > With regards,
> > Shishir
> 
> Unless I'm missing something here, this seams like the wrong thing to. 
> Libraries that will be linked in by other external applications should 
> not have a 3rd party malloc linked in there. That seams like an 
> application choice. At the very least the default should not be to 
> link in a 3rd party malloc.

Yeah, I think you're right. 

Note that this isn't/wasn't always the case, though.. on precise, for instance, libleveldb links libtcmalloc. They stopped doing this sometime before trusty. 

sage 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-22 17:03                           ` Somnath Roy
@ 2015-08-23 13:12                             ` Alexandre DERUMIER
  2015-08-23 16:38                               ` Somnath Roy
  2015-09-03  9:13                             ` Shinobu Kinjo
  1 sibling, 1 reply; 55+ messages in thread
From: Alexandre DERUMIER @ 2015-08-23 13:12 UTC (permalink / raw)
  To: Somnath Roy
  Cc: Sage Weil, Milosz Tanski, Shishir Gowda, Stefan Priebe,
	Mark Nelson, ceph-devel

>>I am not sure in your case the benefit you are seeing is because of qemu is more efficient with tcmalloc/jemalloc or the entire client stack ? 

From my test, qemu, fio or "rados bench" are more efficient with tcmalloc/jemmaloc when using librbd.

For qemu, I don't see any difference with other backends (iscsi,nfs,local), only rbd backend have a big difference when using another memory allocator.

Here some qemu results (1disk / 1 iothread):

glibc malloc
------------

1 disk      29052
2 disks     55878
4 disks     127899
8 disks     240566
15 disks    269976

jemalloc
--------

1 disk      41278
2 disks     75781
4 disks     195351
8 disks     294241
15 disks    298199

tcmalloc  default cache (increasing threads hit tcmalloc bug)
----------------------------

1 disk   37911
2 disks  67698
4 disks  41076
8 disks  43312
15 disks 37569

tcmalloc : 256M cache
---------------------------

1 disk     33914
2 disks    58839
4 disks    148205
8 disks    213298
15 disks   218383


----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "aderumier" <aderumier@odiso.com>
Cc: "Sage Weil" <sage@newdream.net>, "Milosz Tanski" <milosz@adfin.com>, "Shishir Gowda" <Shishir.Gowda@sandisk.com>, "Stefan Priebe" <s.priebe@profihost.ag>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Samedi 22 Août 2015 19:03:41
Objet: RE: Ceph Hackathon: More Memory Allocator Testing

Need to see if client is overriding the libraries built with different malloc libraries I guess.. 
I am not sure in your case the benefit you are seeing is because of qemu is more efficient with tcmalloc/jemalloc or the entire client stack ? 

-----Original Message----- 
From: Alexandre DERUMIER [mailto:aderumier@odiso.com] 
Sent: Saturday, August 22, 2015 9:57 AM 
To: Somnath Roy 
Cc: Sage Weil; Milosz Tanski; Shishir Gowda; Stefan Priebe; Mark Nelson; ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

>>Wanted to know is there any reason we didn't link client libraries with tcmalloc at the first place (but did link only OSDs/mon/RGW) ? 

Do we need to link client librairies ? 

I'm building qemu with jemalloc , and it's seem to be enough. 



----- Mail original ----- 
De: "Somnath Roy" <Somnath.Roy@sandisk.com> 
À: "Sage Weil" <sage@newdream.net>, "Milosz Tanski" <milosz@adfin.com> 
Cc: "Shishir Gowda" <Shishir.Gowda@sandisk.com>, "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org> 
Envoyé: Samedi 22 Août 2015 18:15:36 
Objet: RE: Ceph Hackathon: More Memory Allocator Testing 

Yes, even today rocksdb also linked with tcmalloc. It doesn't mean all the application using rocksdb needs to be built with tcmalloc. 
Sage, 
Wanted to know is there any reason we didn't link client libraries with tcmalloc at the first place (but did link only OSDs/mon/RGW) ? 

Thanks & Regards 
Somnath 

-----Original Message----- 
From: Sage Weil [mailto:sage@newdream.net] 
Sent: Saturday, August 22, 2015 6:56 AM 
To: Milosz Tanski 
Cc: Shishir Gowda; Somnath Roy; Stefan Priebe; Alexandre DERUMIER; Mark Nelson; ceph-devel 
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

On Fri, 21 Aug 2015, Milosz Tanski wrote: 
> On Fri, Aug 21, 2015 at 12:22 AM, Shishir Gowda 
> <Shishir.Gowda@sandisk.com> wrote: 
> > Hi All, 
> > 
> > Have sent out a pull request which enables building librados/librbd with either tcmalloc(as default) or jemalloc. 
> > 
> > Please find the pull request @ 
> > https://github.com/ceph/ceph/pull/5628 
> > 
> > With regards, 
> > Shishir 
> 
> Unless I'm missing something here, this seams like the wrong thing to. 
> Libraries that will be linked in by other external applications should 
> not have a 3rd party malloc linked in there. That seams like an 
> application choice. At the very least the default should not be to 
> link in a 3rd party malloc. 

Yeah, I think you're right. 

Note that this isn't/wasn't always the case, though.. on precise, for instance, libleveldb links libtcmalloc. They stopped doing this sometime before trusty. 

sage 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: Ceph Hackathon: More Memory Allocator Testing
  2015-08-23 13:12                             ` Alexandre DERUMIER
@ 2015-08-23 16:38                               ` Somnath Roy
  0 siblings, 0 replies; 55+ messages in thread
From: Somnath Roy @ 2015-08-23 16:38 UTC (permalink / raw)
  To: Alexandre DERUMIER
  Cc: Sage Weil, Milosz Tanski, Shishir Gowda, Stefan Priebe,
	Mark Nelson, ceph-devel

Hmm..You said you built qemu with tcmalloc/jemalloc and based on your result it seems it is overriding the librbd allocator probably. We need to see if qemu is built with glibc and librbd is built with say jemalloc, still you will see improvement or not. If you have some spare time, could you please use the patch to verify this ?
Did you build fio/rados bench also with tcmalloc/jemalloc ? If not, how/why it is improving ?

Thanks & Regards
Somnath

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@odiso.com] 
Sent: Sunday, August 23, 2015 6:13 AM
To: Somnath Roy
Cc: Sage Weil; Milosz Tanski; Shishir Gowda; Stefan Priebe; Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

>>I am not sure in your case the benefit you are seeing is because of qemu is more efficient with tcmalloc/jemalloc or the entire client stack ? 

From my test, qemu, fio or "rados bench" are more efficient with tcmalloc/jemmaloc when using librbd.

For qemu, I don't see any difference with other backends (iscsi,nfs,local), only rbd backend have a big difference when using another memory allocator.

Here some qemu results (1disk / 1 iothread):

glibc malloc
------------

1 disk      29052
2 disks     55878
4 disks     127899
8 disks     240566
15 disks    269976

jemalloc
--------

1 disk      41278
2 disks     75781
4 disks     195351
8 disks     294241
15 disks    298199

tcmalloc  default cache (increasing threads hit tcmalloc bug)
----------------------------

1 disk   37911
2 disks  67698
4 disks  41076
8 disks  43312
15 disks 37569

tcmalloc : 256M cache
---------------------------

1 disk     33914
2 disks    58839
4 disks    148205
8 disks    213298
15 disks   218383


----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "aderumier" <aderumier@odiso.com>
Cc: "Sage Weil" <sage@newdream.net>, "Milosz Tanski" <milosz@adfin.com>, "Shishir Gowda" <Shishir.Gowda@sandisk.com>, "Stefan Priebe" <s.priebe@profihost.ag>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Samedi 22 Août 2015 19:03:41
Objet: RE: Ceph Hackathon: More Memory Allocator Testing

Need to see if client is overriding the libraries built with different malloc libraries I guess.. 
I am not sure in your case the benefit you are seeing is because of qemu is more efficient with tcmalloc/jemalloc or the entire client stack ? 

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
Sent: Saturday, August 22, 2015 9:57 AM
To: Somnath Roy
Cc: Sage Weil; Milosz Tanski; Shishir Gowda; Stefan Priebe; Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

>>Wanted to know is there any reason we didn't link client libraries with tcmalloc at the first place (but did link only OSDs/mon/RGW) ? 

Do we need to link client librairies ? 

I'm building qemu with jemalloc , and it's seem to be enough. 



----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "Sage Weil" <sage@newdream.net>, "Milosz Tanski" <milosz@adfin.com>
Cc: "Shishir Gowda" <Shishir.Gowda@sandisk.com>, "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Samedi 22 Août 2015 18:15:36
Objet: RE: Ceph Hackathon: More Memory Allocator Testing 

Yes, even today rocksdb also linked with tcmalloc. It doesn't mean all the application using rocksdb needs to be built with tcmalloc. 
Sage,
Wanted to know is there any reason we didn't link client libraries with tcmalloc at the first place (but did link only OSDs/mon/RGW) ? 

Thanks & Regards
Somnath 

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net]
Sent: Saturday, August 22, 2015 6:56 AM
To: Milosz Tanski
Cc: Shishir Gowda; Somnath Roy; Stefan Priebe; Alexandre DERUMIER; Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

On Fri, 21 Aug 2015, Milosz Tanski wrote: 
> On Fri, Aug 21, 2015 at 12:22 AM, Shishir Gowda 
> <Shishir.Gowda@sandisk.com> wrote:
> > Hi All,
> > 
> > Have sent out a pull request which enables building librados/librbd with either tcmalloc(as default) or jemalloc. 
> > 
> > Please find the pull request @
> > https://github.com/ceph/ceph/pull/5628
> > 
> > With regards,
> > Shishir
> 
> Unless I'm missing something here, this seams like the wrong thing to. 
> Libraries that will be linked in by other external applications should 
> not have a 3rd party malloc linked in there. That seams like an 
> application choice. At the very least the default should not be to 
> link in a 3rd party malloc.

Yeah, I think you're right. 

Note that this isn't/wasn't always the case, though.. on precise, for instance, libleveldb links libtcmalloc. They stopped doing this sometime before trusty. 

sage 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-22 13:55                     ` Sage Weil
  2015-08-22 16:15                       ` Somnath Roy
@ 2015-08-24 17:01                       ` Robert LeBlanc
  1 sibling, 0 replies; 55+ messages in thread
From: Robert LeBlanc @ 2015-08-24 17:01 UTC (permalink / raw)
  To: ceph-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Probably possible. The question is if it would it be worth the extra
logic. It seems a lot more complicated than an include line. I don't
know if you would have to use introspection or if you could assume
that functions exist.

https://en.m.wikipedia.org/wiki/Dynamic_loading

Robert LeBlanc

Sent from a mobile device please excuse any typos.

On Aug 22, 2015 7:55 AM, "Sage Weil"  wrote:
On Fri, 21 Aug 2015, Milosz Tanski wrote:
> On Fri, Aug 21, 2015 at 12:22 AM, Shishir Gowda
>  wrote:
> > Hi All,
> >
> > Have sent out a pull request which enables building librados/librbd with either tcmalloc(as default)  or jemalloc.
> >
> > Please find the pull request @ https://github.com/ceph/ceph/pull/5628
> >
> > With regards,
> > Shishir
>
> Unless I'm missing something here, this seams like the wrong thing to.
> Libraries that will be linked in by other external applications should
> not have a 3rd party malloc linked in there. That seams like an
> application choice. At the very least the default should not be to
> link in a 3rd party malloc.

Yeah, I think you're right.

Note that this isn't/wasn't always the case, though.. on precise, for
instance, libleveldb links libtcmalloc.  They stopped doing this
sometime before trusty.

sage
- --
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
-----BEGIN PGP SIGNATURE-----
Version: Mailvelope v1.0.0
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJV203VCRDmVDuy+mK58QAA4nkQAMjitAd+qP7o6Oq6T/Xq
QnAhiUXzgIW4EPcuKl6CsuQp2jeKn0QYzhbf0qqARnIwTisM7PiKyH7lMgYR
izH7ZA9IJt5KeCrgOJrbpOxWNWHAJgUwPqFPCkOIvjeQjjLa8vh4oki072dC
A8cIgMAo/baRNiRtPn+B4OUpgkzYBUvePHfBMACFfwB6Tu+7p4kqRSXMshOs
3zdV8dG3S4vTS5hJO0zMKN3XDq9Nz0p/+XsSQ9C8bw2LMD/ctVxBBn+g9jiD
HsImeSRHoZpna/XZ8EFdwDaEPiRRdCLxueVMpcwTDQYh86DoPL9tqnmzJeD4
lgFnBJfAztvfSqwTaA0BZWAZyt5XkdUZrY365/ZfhfBpdLeXb8ozdaQcWNd4
nE28ce47m6nU8MenC1/cVSFF1VJ9cxsvc3jOqxcvObs6OwzblGWboxAxMhXD
kS4IdTbOdixiWKgCBvmWvjgI9dUSr00EMaW6LtrPiNjRjF+fiEzKNQrvYkZJ
gVTcXJQVpYZirUj7z+PjnmuFVOuq1aQ0cscl/cDfew/II9ZLtf5+nScE+z2c
SqWfi+q/FI+Z7yJtzHlmY5No19rlFycP4pjawCZhuy7bZts5adh+JhciVv9B
7Y7NM3K4/nmU4DYwC0RbyK1ty2xVwuvME/Ws6Ds4ywjGi29uXBA9ryf8Cgvq
c7zh
=MlV0
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-08-22 17:03                           ` Somnath Roy
  2015-08-23 13:12                             ` Alexandre DERUMIER
@ 2015-09-03  9:13                             ` Shinobu Kinjo
  2015-09-03 13:06                               ` Daniel Gryniewicz
  1 sibling, 1 reply; 55+ messages in thread
From: Shinobu Kinjo @ 2015-09-03  9:13 UTC (permalink / raw)
  To: Ceph Development

Pre loading jemalloc after compiling with malloc

$ cat hoge.c 
#include <stdlib.h>

int main()
{
    int *ptr = malloc(sizeof(int) * 10);

    if (ptr == NULL)
        exit(EXIT_FAILURE);
    free(ptr);
}


$ gcc ./hoge.c


$ ldd ./a.out 
	linux-vdso.so.1 (0x00007fffe17e5000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fc989c5f000)
	/lib64/ld-linux-x86-64.so.2 (0x000055a718762000)


$ nm ./a.out | grep malloc
                 U malloc@@GLIBC_2.2.5                       // malloc loaded


$ LD_PRELOAD=/usr/lib64/libjemalloc.so.1 \
> ldd a.out
	linux-vdso.so.1 (0x00007fff7fd36000)
	/usr/lib64/libjemalloc.so.1 (0x00007fe6ffe39000)    // jemallo loaded
	libc.so.6 => /lib64/libc.so.6 (0x00007fe6ffa61000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe6ff844000)
	/lib64/ld-linux-x86-64.so.2 (0x0000560342ddf000)


Logically it could work, but in real world I'm not 100% sure if it works for large scale application.

Shinobu

----- Original Message -----
From: "Somnath Roy" <Somnath.Roy@sandisk.com>
To: "Alexandre DERUMIER" <aderumier@odiso.com>
Cc: "Sage Weil" <sage@newdream.net>, "Milosz Tanski" <milosz@adfin.com>, "Shishir Gowda" <Shishir.Gowda@sandisk.com>, "Stefan Priebe" <s.priebe@profihost.ag>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Sent: Sunday, August 23, 2015 2:03:41 AM
Subject: RE: Ceph Hackathon: More Memory Allocator Testing

Need to see if client is overriding the libraries built with different malloc libraries I guess..
I am not sure in your case the benefit you are seeing is because of qemu is more efficient with tcmalloc/jemalloc or the entire client stack ?

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderumier@odiso.com] 
Sent: Saturday, August 22, 2015 9:57 AM
To: Somnath Roy
Cc: Sage Weil; Milosz Tanski; Shishir Gowda; Stefan Priebe; Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing

>>Wanted to know is there any reason we didn't link client libraries with tcmalloc at the first place (but did link only OSDs/mon/RGW) ?

Do we need to link client librairies ?

I'm building qemu with jemalloc , and it's seem to be enough.



----- Mail original -----
De: "Somnath Roy" <Somnath.Roy@sandisk.com>
À: "Sage Weil" <sage@newdream.net>, "Milosz Tanski" <milosz@adfin.com>
Cc: "Shishir Gowda" <Shishir.Gowda@sandisk.com>, "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Envoyé: Samedi 22 Août 2015 18:15:36
Objet: RE: Ceph Hackathon: More Memory Allocator Testing

Yes, even today rocksdb also linked with tcmalloc. It doesn't mean all the application using rocksdb needs to be built with tcmalloc. 
Sage,
Wanted to know is there any reason we didn't link client libraries with tcmalloc at the first place (but did link only OSDs/mon/RGW) ? 

Thanks & Regards
Somnath 

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net]
Sent: Saturday, August 22, 2015 6:56 AM
To: Milosz Tanski
Cc: Shishir Gowda; Somnath Roy; Stefan Priebe; Alexandre DERUMIER; Mark Nelson; ceph-devel
Subject: Re: Ceph Hackathon: More Memory Allocator Testing 

On Fri, 21 Aug 2015, Milosz Tanski wrote: 
> On Fri, Aug 21, 2015 at 12:22 AM, Shishir Gowda 
> <Shishir.Gowda@sandisk.com> wrote:
> > Hi All,
> > 
> > Have sent out a pull request which enables building librados/librbd with either tcmalloc(as default) or jemalloc. 
> > 
> > Please find the pull request @
> > https://github.com/ceph/ceph/pull/5628
> > 
> > With regards,
> > Shishir
> 
> Unless I'm missing something here, this seams like the wrong thing to. 
> Libraries that will be linked in by other external applications should 
> not have a 3rd party malloc linked in there. That seams like an 
> application choice. At the very least the default should not be to 
> link in a 3rd party malloc.

Yeah, I think you're right. 

Note that this isn't/wasn't always the case, though.. on precise, for instance, libleveldb links libtcmalloc. They stopped doing this sometime before trusty. 

sage 

________________________________ 

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). 

N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay�ʇڙ�,j��f���h���z��w������j:+v���w�j�m��������zZ+��ݢj"��
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-09-03  9:13                             ` Shinobu Kinjo
@ 2015-09-03 13:06                               ` Daniel Gryniewicz
  2015-09-03 13:12                                 ` Matt Benjamin
  0 siblings, 1 reply; 55+ messages in thread
From: Daniel Gryniewicz @ 2015-09-03 13:06 UTC (permalink / raw)
  To: Ceph Development

I believe preloading should work fine.  It has been a common way to
debug buffer overruns using electric fence and similar tools for
years, and I have used it in large applications of similar size to
Ceph.

Daniel

On Thu, Sep 3, 2015 at 5:13 AM, Shinobu Kinjo <skinjo@redhat.com> wrote:
>
> Pre loading jemalloc after compiling with malloc
>
> $ cat hoge.c
> #include <stdlib.h>
>
> int main()
> {
>     int *ptr = malloc(sizeof(int) * 10);
>
>     if (ptr == NULL)
>         exit(EXIT_FAILURE);
>     free(ptr);
> }
>
>
> $ gcc ./hoge.c
>
>
> $ ldd ./a.out
>         linux-vdso.so.1 (0x00007fffe17e5000)
>         libc.so.6 => /lib64/libc.so.6 (0x00007fc989c5f000)
>         /lib64/ld-linux-x86-64.so.2 (0x000055a718762000)
>
>
> $ nm ./a.out | grep malloc
>                  U malloc@@GLIBC_2.2.5                       // malloc loaded
>
>
> $ LD_PRELOAD=/usr/lib64/libjemalloc.so.1 \
> > ldd a.out
>         linux-vdso.so.1 (0x00007fff7fd36000)
>         /usr/lib64/libjemalloc.so.1 (0x00007fe6ffe39000)    // jemallo loaded
>         libc.so.6 => /lib64/libc.so.6 (0x00007fe6ffa61000)
>         libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe6ff844000)
>         /lib64/ld-linux-x86-64.so.2 (0x0000560342ddf000)
>
>
> Logically it could work, but in real world I'm not 100% sure if it works for large scale application.
>
> Shinobu
>
> ----- Original Message -----
> From: "Somnath Roy" <Somnath.Roy@sandisk.com>
> To: "Alexandre DERUMIER" <aderumier@odiso.com>
> Cc: "Sage Weil" <sage@newdream.net>, "Milosz Tanski" <milosz@adfin.com>, "Shishir Gowda" <Shishir.Gowda@sandisk.com>, "Stefan Priebe" <s.priebe@profihost.ag>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Sunday, August 23, 2015 2:03:41 AM
> Subject: RE: Ceph Hackathon: More Memory Allocator Testing
>
> Need to see if client is overriding the libraries built with different malloc libraries I guess..
> I am not sure in your case the benefit you are seeing is because of qemu is more efficient with tcmalloc/jemalloc or the entire client stack ?
>
> -----Original Message-----
> From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
> Sent: Saturday, August 22, 2015 9:57 AM
> To: Somnath Roy
> Cc: Sage Weil; Milosz Tanski; Shishir Gowda; Stefan Priebe; Mark Nelson; ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
> >>Wanted to know is there any reason we didn't link client libraries with tcmalloc at the first place (but did link only OSDs/mon/RGW) ?
>
> Do we need to link client librairies ?
>
> I'm building qemu with jemalloc , and it's seem to be enough.
>
>
>
> ----- Mail original -----
> De: "Somnath Roy" <Somnath.Roy@sandisk.com>
> À: "Sage Weil" <sage@newdream.net>, "Milosz Tanski" <milosz@adfin.com>
> Cc: "Shishir Gowda" <Shishir.Gowda@sandisk.com>, "Stefan Priebe" <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
> Envoyé: Samedi 22 Août 2015 18:15:36
> Objet: RE: Ceph Hackathon: More Memory Allocator Testing
>
> Yes, even today rocksdb also linked with tcmalloc. It doesn't mean all the application using rocksdb needs to be built with tcmalloc.
> Sage,
> Wanted to know is there any reason we didn't link client libraries with tcmalloc at the first place (but did link only OSDs/mon/RGW) ?
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Sage Weil [mailto:sage@newdream.net]
> Sent: Saturday, August 22, 2015 6:56 AM
> To: Milosz Tanski
> Cc: Shishir Gowda; Somnath Roy; Stefan Priebe; Alexandre DERUMIER; Mark Nelson; ceph-devel
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
>
> On Fri, 21 Aug 2015, Milosz Tanski wrote:
> > On Fri, Aug 21, 2015 at 12:22 AM, Shishir Gowda
> > <Shishir.Gowda@sandisk.com> wrote:
> > > Hi All,
> > >
> > > Have sent out a pull request which enables building librados/librbd with either tcmalloc(as default) or jemalloc.
> > >
> > > Please find the pull request @
> > > https://github.com/ceph/ceph/pull/5628
> > >
> > > With regards,
> > > Shishir
> >
> > Unless I'm missing something here, this seams like the wrong thing to.
> > Libraries that will be linked in by other external applications should
> > not have a 3rd party malloc linked in there. That seams like an
> > application choice. At the very least the default should not be to
> > link in a 3rd party malloc.
>
> Yeah, I think you're right.
>
> Note that this isn't/wasn't always the case, though.. on precise, for instance, libleveldb links libtcmalloc. They stopped doing this sometime before trusty.
>
> sage
>
> ________________________________
>
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>
> N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay�ʇڙ�,j��f���h���z��w������j:+v���w�j�m��������zZ+��ݢj"��
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Ceph Hackathon: More Memory Allocator Testing
  2015-09-03 13:06                               ` Daniel Gryniewicz
@ 2015-09-03 13:12                                 ` Matt Benjamin
  0 siblings, 0 replies; 55+ messages in thread
From: Matt Benjamin @ 2015-09-03 13:12 UTC (permalink / raw)
  To: Daniel Gryniewicz; +Cc: Ceph Development

We've frequently run fio + libosd (cohort ceph-osd linked as a library) with jemalloc preloaded, without problems.

Matt

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-761-4689
fax.  734-769-8938
cel.  734-216-5309

----- Original Message -----
> From: "Daniel Gryniewicz" <dang@redhat.com>
> To: "Ceph Development" <ceph-devel@vger.kernel.org>
> Sent: Thursday, September 3, 2015 9:06:47 AM
> Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> 
> I believe preloading should work fine.  It has been a common way to
> debug buffer overruns using electric fence and similar tools for
> years, and I have used it in large applications of similar size to
> Ceph.
> 
> Daniel
> 
> On Thu, Sep 3, 2015 at 5:13 AM, Shinobu Kinjo <skinjo@redhat.com> wrote:
> >
> > Pre loading jemalloc after compiling with malloc
> >
> > $ cat hoge.c
> > #include <stdlib.h>
> >
> > int main()
> > {
> >     int *ptr = malloc(sizeof(int) * 10);
> >
> >     if (ptr == NULL)
> >         exit(EXIT_FAILURE);
> >     free(ptr);
> > }
> >
> >
> > $ gcc ./hoge.c
> >
> >
> > $ ldd ./a.out
> >         linux-vdso.so.1 (0x00007fffe17e5000)
> >         libc.so.6 => /lib64/libc.so.6 (0x00007fc989c5f000)
> >         /lib64/ld-linux-x86-64.so.2 (0x000055a718762000)
> >
> >
> > $ nm ./a.out | grep malloc
> >                  U malloc@@GLIBC_2.2.5                       // malloc
> >                  loaded
> >
> >
> > $ LD_PRELOAD=/usr/lib64/libjemalloc.so.1 \
> > > ldd a.out
> >         linux-vdso.so.1 (0x00007fff7fd36000)
> >         /usr/lib64/libjemalloc.so.1 (0x00007fe6ffe39000)    // jemallo
> >         loaded
> >         libc.so.6 => /lib64/libc.so.6 (0x00007fe6ffa61000)
> >         libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe6ff844000)
> >         /lib64/ld-linux-x86-64.so.2 (0x0000560342ddf000)
> >
> >
> > Logically it could work, but in real world I'm not 100% sure if it works
> > for large scale application.
> >
> > Shinobu
> >
> > ----- Original Message -----
> > From: "Somnath Roy" <Somnath.Roy@sandisk.com>
> > To: "Alexandre DERUMIER" <aderumier@odiso.com>
> > Cc: "Sage Weil" <sage@newdream.net>, "Milosz Tanski" <milosz@adfin.com>,
> > "Shishir Gowda" <Shishir.Gowda@sandisk.com>, "Stefan Priebe"
> > <s.priebe@profihost.ag>, "Mark Nelson" <mnelson@redhat.com>, "ceph-devel"
> > <ceph-devel@vger.kernel.org>
> > Sent: Sunday, August 23, 2015 2:03:41 AM
> > Subject: RE: Ceph Hackathon: More Memory Allocator Testing
> >
> > Need to see if client is overriding the libraries built with different
> > malloc libraries I guess..
> > I am not sure in your case the benefit you are seeing is because of qemu is
> > more efficient with tcmalloc/jemalloc or the entire client stack ?
> >
> > -----Original Message-----
> > From: Alexandre DERUMIER [mailto:aderumier@odiso.com]
> > Sent: Saturday, August 22, 2015 9:57 AM
> > To: Somnath Roy
> > Cc: Sage Weil; Milosz Tanski; Shishir Gowda; Stefan Priebe; Mark Nelson;
> > ceph-devel
> > Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> >
> > >>Wanted to know is there any reason we didn't link client libraries with
> > >>tcmalloc at the first place (but did link only OSDs/mon/RGW) ?
> >
> > Do we need to link client librairies ?
> >
> > I'm building qemu with jemalloc , and it's seem to be enough.
> >
> >
> >
> > ----- Mail original -----
> > De: "Somnath Roy" <Somnath.Roy@sandisk.com>
> > À: "Sage Weil" <sage@newdream.net>, "Milosz Tanski" <milosz@adfin.com>
> > Cc: "Shishir Gowda" <Shishir.Gowda@sandisk.com>, "Stefan Priebe"
> > <s.priebe@profihost.ag>, "aderumier" <aderumier@odiso.com>, "Mark Nelson"
> > <mnelson@redhat.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
> > Envoyé: Samedi 22 Août 2015 18:15:36
> > Objet: RE: Ceph Hackathon: More Memory Allocator Testing
> >
> > Yes, even today rocksdb also linked with tcmalloc. It doesn't mean all the
> > application using rocksdb needs to be built with tcmalloc.
> > Sage,
> > Wanted to know is there any reason we didn't link client libraries with
> > tcmalloc at the first place (but did link only OSDs/mon/RGW) ?
> >
> > Thanks & Regards
> > Somnath
> >
> > -----Original Message-----
> > From: Sage Weil [mailto:sage@newdream.net]
> > Sent: Saturday, August 22, 2015 6:56 AM
> > To: Milosz Tanski
> > Cc: Shishir Gowda; Somnath Roy; Stefan Priebe; Alexandre DERUMIER; Mark
> > Nelson; ceph-devel
> > Subject: Re: Ceph Hackathon: More Memory Allocator Testing
> >
> > On Fri, 21 Aug 2015, Milosz Tanski wrote:
> > > On Fri, Aug 21, 2015 at 12:22 AM, Shishir Gowda
> > > <Shishir.Gowda@sandisk.com> wrote:
> > > > Hi All,
> > > >
> > > > Have sent out a pull request which enables building librados/librbd
> > > > with either tcmalloc(as default) or jemalloc.
> > > >
> > > > Please find the pull request @
> > > > https://github.com/ceph/ceph/pull/5628
> > > >
> > > > With regards,
> > > > Shishir
> > >
> > > Unless I'm missing something here, this seams like the wrong thing to.
> > > Libraries that will be linked in by other external applications should
> > > not have a 3rd party malloc linked in there. That seams like an
> > > application choice. At the very least the default should not be to
> > > link in a 3rd party malloc.
> >
> > Yeah, I think you're right.
> >
> > Note that this isn't/wasn't always the case, though.. on precise, for
> > instance, libleveldb links libtcmalloc. They stopped doing this sometime
> > before trusty.
> >
> > sage
> >
> > ________________________________
> >
> > PLEASE NOTE: The information contained in this electronic mail message is
> > intended only for the use of the designated recipient(s) named above. If
> > the reader of this message is not the intended recipient, you are hereby
> > notified that you have received this message in error and that any review,
> > dissemination, distribution, or copying of this message is strictly
> > prohibited. If you have received this communication in error, please
> > notify the sender by telephone or e-mail (as shown above) immediately and
> > destroy any and all copies of this message in your possession (whether
> > hard copies or electronically stored copies).
> >
> > N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay�ʇڙ�,j��f���h���z��w������j:+v���w�j�m��������zZ+��ݢj"��
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2015-09-03 13:12 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-19  4:45 Ceph Hackathon: More Memory Allocator Testing Mark Nelson
2015-08-19  5:13 ` Shinobu Kinjo
2015-08-19  5:36 ` Somnath Roy
2015-08-19  8:07   ` Haomai Wang
2015-08-19  9:06     ` Shinobu Kinjo
2015-08-19 12:17     ` Mark Nelson
2015-08-19 12:36       ` Dałek, Piotr
2015-08-19 12:44         ` Mark Nelson
2015-08-19 12:47           ` Dałek, Piotr
2015-08-19 12:10   ` Mark Nelson
2015-08-19  6:33 ` Stefan Priebe - Profihost AG
2015-08-19 12:20   ` Mark Nelson
2015-08-19 14:01 ` Alexandre DERUMIER
2015-08-19 16:05   ` Alexandre DERUMIER
2015-08-19 16:27     ` Somnath Roy
2015-08-19 16:55       ` Alexandre DERUMIER
2015-08-19 16:57         ` Blinick, Stephen L
2015-08-20  6:35           ` Dałek, Piotr
2015-08-20  7:08             ` Haomai Wang
2015-08-20  7:18               ` Dałek, Piotr
2015-08-19 17:29         ` Somnath Roy
2015-08-19 18:20           ` Allen Samuels
2015-08-19 18:36             ` Mark Nelson
2015-08-19 18:47               ` Łukasz Redynk
2015-08-20  6:25             ` Dałek, Piotr
2015-08-19 18:47           ` Alexandre DERUMIER
2015-08-20  1:09             ` Blinick, Stephen L
2015-08-20  2:00               ` Shinobu Kinjo
2015-08-20  5:29                 ` Alexandre DERUMIER
2015-08-20  8:17                   ` Alexandre DERUMIER
2015-08-20 12:54                     ` Shinobu Kinjo
2015-08-20 14:46                       ` Matt Benjamin
2015-08-19 20:16   ` Somnath Roy
2015-08-19 20:17     ` Stefan Priebe
2015-08-19 20:29       ` Somnath Roy
2015-08-19 20:31         ` Stefan Priebe
2015-08-19 20:34           ` Somnath Roy
2015-08-19 20:40             ` Stefan Priebe
2015-08-19 20:44               ` Somnath Roy
2015-08-21  3:45                 ` Shishir Gowda
2015-08-21  4:22                 ` Shishir Gowda
2015-08-21 14:26                   ` Milosz Tanski
2015-08-21 19:07                     ` Robert LeBlanc
2015-08-22 13:52                       ` Sage Weil
2015-08-22 13:55                     ` Sage Weil
2015-08-22 16:15                       ` Somnath Roy
2015-08-22 16:57                         ` Alexandre DERUMIER
2015-08-22 17:03                           ` Somnath Roy
2015-08-23 13:12                             ` Alexandre DERUMIER
2015-08-23 16:38                               ` Somnath Roy
2015-09-03  9:13                             ` Shinobu Kinjo
2015-09-03 13:06                               ` Daniel Gryniewicz
2015-09-03 13:12                                 ` Matt Benjamin
2015-08-24 17:01                       ` Robert LeBlanc
2015-08-19 20:50 ` Zhang, Jian

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.