* mem leaks in Bluestore?
@ 2016-06-27 14:11 Igor Fedotov
  2016-06-27 14:18 ` Sage Weil
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Igor Fedotov @ 2016-06-27 14:11 UTC (permalink / raw)
  To: ceph-devel

Hi All,

let me share some observations I collected while running 
ceph_test_objectstore against BlueStore.

I initially started this investigation due to a failure in the 
SyntheticMatrixCompressionAlgorithm test case. The issue appeared only 
while running the whole test suite and had a pretty odd symptom: the 
failing test case ran with settings that aren't provided for it 
(compression = none, compression algorithm = snappy), and no run was 
attempted for zlib despite zlib being first in the list. When running 
the test case on its own everything is OK.
Further investigation showed that free RAM on my VM drops almost to 
zero while the suite is running, which probably prevents the desired 
config params from being applied. Hence I proceeded with a memory leak 
investigation.

Since the Synthetic test cases are pretty complex I switched to a 
simpler one - Many4KWriteNoCSumTest. As it performs writes against a 
single object only, this removes other ops, compression, csum, 
multiple-object handling etc. from suspicion.
Currently I see ~6GB of memory consumed after ~3000 random writes (up 
to 4K each) over a 4MB object. Counting BlueStore's Buffer, Blob and 
Onode objects shows that their numbers do not grow unexpectedly over 
the course of the test.
I then changed the test case to perform fixed-length (64K) writes - 
memory consumption for 3000 writes dropped to ~500MB, but the Buffer 
count grows steadily, one buffer per write. So the original issue 
seems specific to small writes, though there is probably a separate 
issue with the buffer cache for large ones.
That's all I have so far.
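
For reference, I run the single case on its own via a gtest filter, 
roughly like this (the exact filter string may need adjusting for 
your build):

   bin/ceph_test_objectstore --gtest_filter=*Many4KWriteNoCSum*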

Any comments/ideas are appreciated.


Thanks,
Igor




* Re: mem leaks in Bluestore?
  2016-06-27 14:11 mem leaks in Bluestore? Igor Fedotov
@ 2016-06-27 14:18 ` Sage Weil
  2016-06-27 16:49 ` Milosz Tanski
       [not found] ` <66c9974d-8cf8-2876-8dd5-7040aef4630c@redhat.com>
  2 siblings, 0 replies; 6+ messages in thread
From: Sage Weil @ 2016-06-27 14:18 UTC (permalink / raw)
  To: Igor Fedotov; +Cc: ceph-devel

On Mon, 27 Jun 2016, Igor Fedotov wrote:
> Hi All,
> 
> let me share some observations I collected while running ceph_test_objectstore
> against BlueStore.
> 
> I initially started this investigation due to a failure in the
> SyntheticMatrixCompressionAlgorithm test case. The issue appeared only while
> running the whole test suite and had a pretty odd symptom: the failing test
> case ran with settings that aren't provided for it (compression = none,
> compression algorithm = snappy), and no run was attempted for zlib despite
> zlib being first in the list. When running the test case on its own
> everything is OK.
> Further investigation showed that free RAM on my VM drops almost to zero
> while the suite is running, which probably prevents the desired config params
> from being applied. Hence I proceeded with a memory leak investigation.
> 
> Since the Synthetic test cases are pretty complex I switched to a simpler one
> - Many4KWriteNoCSumTest. As it performs writes against a single object only,
> this removes other ops, compression, csum, multiple-object handling etc. from
> suspicion.
> Currently I see ~6GB of memory consumed after ~3000 random writes (up to 4K
> each) over a 4MB object. Counting BlueStore's Buffer, Blob and Onode objects
> shows that their numbers do not grow unexpectedly over the course of the test.
> I then changed the test case to perform fixed-length (64K) writes - memory
> consumption for 3000 writes dropped to ~500MB, but the Buffer count grows
> steadily, one buffer per write. So the original issue seems specific to small
> writes, though there is probably a separate issue with the buffer cache for
> large ones.
> That's all I have so far.
> 
> Any comments/ideas are appreciated.

valgrind --tool=massif bin/ceph_test_objectstore --gtest_filter=*Many4K*/2

should generate a heap profile (massif.* iirc?) that you can look at with 
ms_print.
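
(The output file is typically named massif.out.<pid>; something like

  ms_print massif.out.<pid> | less

should give a readable breakdown of the allocation snapshots.)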

sage


* Re: mem leaks in Bluestore?
  2016-06-27 14:11 mem leaks in Bluestore? Igor Fedotov
  2016-06-27 14:18 ` Sage Weil
@ 2016-06-27 16:49 ` Milosz Tanski
       [not found] ` <66c9974d-8cf8-2876-8dd5-7040aef4630c@redhat.com>
  2 siblings, 0 replies; 6+ messages in thread
From: Milosz Tanski @ 2016-06-27 16:49 UTC (permalink / raw)
  To: Igor Fedotov; +Cc: ceph-devel

On Mon, Jun 27, 2016 at 10:11 AM, Igor Fedotov <ifedotov@mirantis.com> wrote:
>
> Hi All,
>
> let me share some observations I collected while running ceph_test_objectstore against BlueStore.
>
> I initially started this investigation due to a failure in the SyntheticMatrixCompressionAlgorithm test case. The issue appeared only while running the whole test suite and had a pretty odd symptom: the failing test case ran with settings that aren't provided for it (compression = none, compression algorithm = snappy), and no run was attempted for zlib despite zlib being first in the list. When running the test case on its own everything is OK.
> Further investigation showed that free RAM on my VM drops almost to zero while the suite is running, which probably prevents the desired config params from being applied. Hence I proceeded with a memory leak investigation.
>
> Since the Synthetic test cases are pretty complex I switched to a simpler one - Many4KWriteNoCSumTest. As it performs writes against a single object only, this removes other ops, compression, csum, multiple-object handling etc. from suspicion.
> Currently I see ~6GB of memory consumed after ~3000 random writes (up to 4K each) over a 4MB object. Counting BlueStore's Buffer, Blob and Onode objects shows that their numbers do not grow unexpectedly over the course of the test.
> I then changed the test case to perform fixed-length (64K) writes - memory consumption for 3000 writes dropped to ~500MB, but the Buffer count grows steadily, one buffer per write. So the original issue seems specific to small writes, though there is probably a separate issue with the buffer cache for large ones.
> That's all I have so far.
>
> Any comments/ideas are appreciated.
>

Igor,

Both clang and gcc nowadays have a performant address sanitizer that
supports leak detection:
https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer
I would try using that first. It's just so much faster than Valgrind -
something like an order of magnitude.
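
A rough recipe (treat this as a sketch - the exact cmake invocation for
the ceph tree may differ):

  # build the test binary with ASan (LeakSanitizer is part of it)
  CC=clang CXX=clang++ \
    cmake -DCMAKE_C_FLAGS="-fsanitize=address -g" \
          -DCMAKE_CXX_FLAGS="-fsanitize=address -g" ..
  make ceph_test_objectstore

  # run it; leaks are reported when the process exits
  ASAN_OPTIONS=detect_leaks=1 \
    bin/ceph_test_objectstore --gtest_filter=*Many4K*/2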



-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@adfin.com


* Re: mem leaks in Bluestore?
       [not found] ` <66c9974d-8cf8-2876-8dd5-7040aef4630c@redhat.com>
@ 2016-07-04 13:11   ` Igor Fedotov
  2016-07-04 13:15     ` Sage Weil
  0 siblings, 1 reply; 6+ messages in thread
From: Igor Fedotov @ 2016-07-04 13:11 UTC (permalink / raw)
  To: Mark Nelson, ceph-devel

Hi Mark,

I suspect that BlueStore's buffer cache is holding the memory. Could 
you please set 'bluestore buffer cache size = 4194304' in your config 
file and check what happens?

And another question - how many pools do you have in your test case? 
In BlueStore each collection can allocate up to 512MB for the buffer 
cache by default. I'm not sure whether collection == pool, but there 
is some correlation between them, so this might be the cause.
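
I.e. something along these lines in ceph.conf (the section placement 
here is just an illustration):

   [osd]
       bluestore buffer cache size = 4194304   # 4 MB instead of the default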

Thanks,

Igor



On 30.06.2016 22:30, Mark Nelson wrote:
> Not sure how related this is, but for both reads and random reads I'm 
> seeing very large memory spikes (not necessarily leaks).  I've 
> included both the massif output and a screenshot from massif-visualizer.
>
> The spikes appear to be due to:
>
> bufferptr p = buffer::create_page_aligned(len)
>
> in KernelDevice::read on line ~477 in KernelDevice.cc.  Running via 
> massif slows things down enough that the node doesn't go out of memory 
> and the tests keep running, but if I attempt these tests without 
> massif, very quickly the OSD will spike to ~64GB of memory and then 
> get killed by the kernel.
>
> There were a number of recent commits that modified KernelDevice.cc as 
> well as buffer.cc:
>
> https://github.com/ceph/ceph/commits/master/src/os/bluestore/KernelDevice.cc 
>
> https://github.com/ceph/ceph/commits/master/src/common/buffer.cc
>
> Still looking through them for signs of what might be amiss.
>
> Mark
>
> On 06/27/2016 09:11 AM, Igor Fedotov wrote:
>> Hi All,
>>
>> let me share some observations I collected while running
>> ceph_test_objectstore against BlueStore.
>>
>> I initially started this investigation due to a failure in the
>> SyntheticMatrixCompressionAlgorithm test case. The issue appeared only
>> while running the whole test suite and had a pretty odd symptom: the
>> failing test case ran with settings that aren't provided for it
>> (compression = none, compression algorithm = snappy), and no run was
>> attempted for zlib despite zlib being first in the list. When running
>> the test case on its own everything is OK.
>> Further investigation showed that free RAM on my VM drops almost to
>> zero while the suite is running, which probably prevents the desired
>> config params from being applied. Hence I proceeded with a memory leak
>> investigation.
>>
>> Since the Synthetic test cases are pretty complex I switched to a
>> simpler one - Many4KWriteNoCSumTest. As it performs writes against a
>> single object only, this removes other ops, compression, csum,
>> multiple-object handling etc. from suspicion.
>> Currently I see ~6GB of memory consumed after ~3000 random writes (up
>> to 4K each) over a 4MB object. Counting BlueStore's Buffer, Blob and
>> Onode objects shows that their numbers do not grow unexpectedly over
>> the course of the test.
>> I then changed the test case to perform fixed-length (64K) writes -
>> memory consumption for 3000 writes dropped to ~500MB, but the Buffer
>> count grows steadily, one buffer per write. So the original issue
>> seems specific to small writes, though there is probably a separate
>> issue with the buffer cache for large ones.
>> That's all I have so far.
>>
>> Any comments/ideas are appreciated.
>>
>>
>> Thanks,
>> Igor
>>
>>



* Re: mem leaks in Bluestore?
  2016-07-04 13:11   ` Igor Fedotov
@ 2016-07-04 13:15     ` Sage Weil
  2016-07-04 23:38       ` Mark Nelson
  0 siblings, 1 reply; 6+ messages in thread
From: Sage Weil @ 2016-07-04 13:15 UTC (permalink / raw)
  To: Igor Fedotov; +Cc: Mark Nelson, ceph-devel

I think this was fixed by adding a call to trim the cache in the read 
path.  Previously it was only happening on writes.
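
Schematically the problem and the fix look like this (a toy sketch 
only, not the actual BlueStore code):

  // Toy illustration: a cache that is trimmed on writes but never on
  // reads grows without bound under a read-heavy workload.
  #include <cstdint>
  #include <list>
  #include <string>

  struct ToyBufferCache {
    std::list<std::string> buffers;   // newest entries at the front
    uint64_t size = 0;

    void add(const std::string& data) {
      buffers.push_front(data);
      size += data.size();
    }
    void trim(uint64_t max) {         // evict oldest entries over the limit
      while (size > max && !buffers.empty()) {
        size -= buffers.back().size();
        buffers.pop_back();
      }
    }
  };

  // write path: caches and trims -> bounded
  void on_write(ToyBufferCache& c, const std::string& data, uint64_t max) {
    c.add(data);
    c.trim(max);
  }

  // read path before the fix: caches but never trims -> unbounded growth;
  // the fix is simply to call c.trim(max) here as well.
  void on_read(ToyBufferCache& c, const std::string& data) {
    c.add(data);
  }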

On Mon, 4 Jul 2016, Igor Fedotov wrote:
> Hi Mark,
> 
> I suspect that BlueStore's buffer cache is holding the memory. Could you
> please set 'bluestore buffer cache size = 4194304' in your config file and
> check what happens?
> 
> And another question - how many pools do you have in your test case? In
> BlueStore each collection can allocate up to 512MB for the buffer cache by
> default. I'm not sure whether collection == pool, but there is some
> correlation between them, so this might be the cause.

This could cause a problem if we got nothing but requests that touch 2 
collections and the important one was always second.  In the OSD's case, 
though, we generally touch the PG collection first, so I don't think this 
will be a problem.

sage

> 
> Thanks,
> 
> Igor
> 
> 
> 
> On 30.06.2016 22:30, Mark Nelson wrote:
> > Not sure how related this is, but for both reads and random reads I'm seeing
> > very large memory spikes (not necessarily leaks).  I've included both the
> > massif output and a screenshot from massif-visualizer.
> > 
> > The spikes appear to be due to:
> > 
> > bufferptr p = buffer::create_page_aligned(len)
> > 
> > in KernelDevice::read on line ~477 in KernelDevice.cc.  Running via massif
> > slows things down enough that the node doesn't go out of memory and the
> > tests keep running, but if I attempt these tests without massif, very
> > quickly the OSD will spike to ~64GB of memory and then get killed by the
> > kernel.
> > 
> > There were a number of recent commits that modified KernelDevice.cc as well
> > as buffer.cc:
> > 
> > https://github.com/ceph/ceph/commits/master/src/os/bluestore/KernelDevice.cc 
> > https://github.com/ceph/ceph/commits/master/src/common/buffer.cc
> > 
> > Still looking through them for signs of what might be amiss.
> > 
> > Mark
> > 
> > On 06/27/2016 09:11 AM, Igor Fedotov wrote:
> > > Hi All,
> > > 
> > > let me share some observations I collected while running
> > > ceph_test_objectstore against BlueStore.
> > > 
> > > I initially started this investigation due to a failure in the
> > > SyntheticMatrixCompressionAlgorithm test case. The issue appeared only
> > > while running the whole test suite and had a pretty odd symptom: the
> > > failing test case ran with settings that aren't provided for it
> > > (compression = none, compression algorithm = snappy), and no run was
> > > attempted for zlib despite zlib being first in the list. When running
> > > the test case on its own everything is OK.
> > > Further investigation showed that free RAM on my VM drops almost to
> > > zero while the suite is running, which probably prevents the desired
> > > config params from being applied. Hence I proceeded with a memory leak
> > > investigation.
> > > 
> > > Since the Synthetic test cases are pretty complex I switched to a
> > > simpler one - Many4KWriteNoCSumTest. As it performs writes against a
> > > single object only, this removes other ops, compression, csum,
> > > multiple-object handling etc. from suspicion.
> > > Currently I see ~6GB of memory consumed after ~3000 random writes (up
> > > to 4K each) over a 4MB object. Counting BlueStore's Buffer, Blob and
> > > Onode objects shows that their numbers do not grow unexpectedly over
> > > the course of the test.
> > > I then changed the test case to perform fixed-length (64K) writes -
> > > memory consumption for 3000 writes dropped to ~500MB, but the Buffer
> > > count grows steadily, one buffer per write. So the original issue
> > > seems specific to small writes, though there is probably a separate
> > > issue with the buffer cache for large ones.
> > > That's all I have so far.
> > > 
> > > Any comments/ideas are appreciated.
> > > 
> > > 
> > > Thanks,
> > > Igor
> > > 
> > > 
> 
> 


* Re: mem leaks in Bluestore?
  2016-07-04 13:15     ` Sage Weil
@ 2016-07-04 23:38       ` Mark Nelson
  0 siblings, 0 replies; 6+ messages in thread
From: Mark Nelson @ 2016-07-04 23:38 UTC (permalink / raw)
  To: Sage Weil, Igor Fedotov; +Cc: ceph-devel

Yep, this was due to the lack of trim in the read path.  We are now 
spiking up to around 2.7GB RSS per OSD but holding steady and not 
exceeding that under normal operation.  I did notice that under valgrind 
we can spike higher.  I wonder if perhaps trim isn't keeping up under 
the strain when massif is used.

I'm now fighting segfaults during random writes on master.  I was able 
to reproduce them with logging enabled and got a core dump, but I still 
need to look through the code to figure out why it's happening.
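
(For anyone who wants to poke at the core, a full backtrace can be
pulled out with gdb along these lines - the binary path here is just
illustrative:

   gdb bin/ceph-osd /path/to/core
   (gdb) thread apply all bt
)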

Mark

On 07/04/2016 08:15 AM, Sage Weil wrote:
> I think this was fixed by adding a call to trim the cache in the read
> path.  Previously it was only happening on writes.
>
> On Mon, 4 Jul 2016, Igor Fedotov wrote:
>> Hi Mark,
>>
>> I suspect that BlueStore's buffer cache is holding the memory. Could you
>> please set 'bluestore buffer cache size = 4194304' in your config file and
>> check what happens?
>>
>> And another question - how many pools do you have in your test case? In
>> BlueStore each collection can allocate up to 512MB for the buffer cache by
>> default. I'm not sure whether collection == pool, but there is some
>> correlation between them, so this might be the cause.
>
> This could cause a problem if we got nothing but requests that touch 2
> collections and the important one was always second.  In the OSD's case,
> though, we generally touch the PG collection first, so I don't think this
> will be a problem.
>
> sage
>
>>
>> Thanks,
>>
>> Igor
>>
>>
>>
>> On 30.06.2016 22:30, Mark Nelson wrote:
>>> Not sure how related this is, but for both reads and random reads I'm seeing
>>> very large memory spikes (not necessarily leaks).  I've included both the
>>> massif output and a screenshot from massif-visualizer.
>>>
>>> The spikes appear to be due to:
>>>
>>> bufferptr p = buffer::create_page_aligned(len)
>>>
>>> in KernelDevice::read on line ~477 in KernelDevice.cc.  Running via massif
>>> slows things down enough that the node doesn't go out of memory and the
>>> tests keep running, but if I attempt these tests without massif, very
>>> quickly the OSD will spike to ~64GB of memory and then get killed by the
>>> kernel.
>>>
>>> There were a number of recent commits that modified KernelDevice.cc as well
>>> as buffer.cc:
>>>
>>> https://github.com/ceph/ceph/commits/master/src/os/bluestore/KernelDevice.cc
>>> https://github.com/ceph/ceph/commits/master/src/common/buffer.cc
>>>
>>> Still looking through them for signs of what might be amiss.
>>>
>>> Mark
>>>
>>> On 06/27/2016 09:11 AM, Igor Fedotov wrote:
>>>> Hi All,
>>>>
>>>> let me share some observations I collected while running
>>>> ceph_test_objectstore against BlueStore.
>>>>
>>>> I initially started this investigation due to a failure in the
>>>> SyntheticMatrixCompressionAlgorithm test case. The issue appeared only
>>>> while running the whole test suite and had a pretty odd symptom: the
>>>> failing test case ran with settings that aren't provided for it
>>>> (compression = none, compression algorithm = snappy), and no run was
>>>> attempted for zlib despite zlib being first in the list. When running
>>>> the test case on its own everything is OK.
>>>> Further investigation showed that free RAM on my VM drops almost to
>>>> zero while the suite is running, which probably prevents the desired
>>>> config params from being applied. Hence I proceeded with a memory leak
>>>> investigation.
>>>>
>>>> Since the Synthetic test cases are pretty complex I switched to a
>>>> simpler one - Many4KWriteNoCSumTest. As it performs writes against a
>>>> single object only, this removes other ops, compression, csum,
>>>> multiple-object handling etc. from suspicion.
>>>> Currently I see ~6GB of memory consumed after ~3000 random writes (up
>>>> to 4K each) over a 4MB object. Counting BlueStore's Buffer, Blob and
>>>> Onode objects shows that their numbers do not grow unexpectedly over
>>>> the course of the test.
>>>> I then changed the test case to perform fixed-length (64K) writes -
>>>> memory consumption for 3000 writes dropped to ~500MB, but the Buffer
>>>> count grows steadily, one buffer per write. So the original issue
>>>> seems specific to small writes, though there is probably a separate
>>>> issue with the buffer cache for large ones.
>>>> That's all I have so far.
>>>>
>>>> Any comments/ideas are appreciated.
>>>>
>>>>
>>>> Thanks,
>>>> Igor
>>>>
>>>>
>

