* mem leaks in Bluestore?
@ 2016-06-27 14:11 Igor Fedotov
From: Igor Fedotov @ 2016-06-27 14:11 UTC (permalink / raw)
To: ceph-devel
Hi All,
Let me share some observations I collected while running
ceph_test_objectstore against BlueStore.
Initially I started this investigation due to a failure in the
SyntheticMatrixCompressionAlgorithm test case. The issue appeared while
running the whole test suite and had a pretty odd symptom: the failing
test case ran with settings that weren't specified for it (compression
= none, compression algorithm = snappy), with no attempt to run with
zlib despite the fact that zlib is first in the list. When the test
case is run alone, everything is OK.
Further investigation showed that free RAM on my VM drops almost to
zero while the test suite runs, which probably prevents the desired
config params from being applied. Hence I proceeded with the
memory-leak investigation.
Since the Synthetic test cases are pretty complex, I switched to a
simpler one, Many4KWriteNoCSumTest. As it performs writes against a
single object only, this removes other ops, compression, csum,
multiple-object handling, etc. from suspicion.
Currently I see ~6GB of memory consumed when doing ~3000 random writes
(up to 4K) over a 4M object. Counting BlueStore's Buffer, Blob and
Onode objects shows that their counts don't grow unexpectedly over time
for this test case.
I then changed the test case to perform fixed-length (64K) writes:
memory consumption for 3000 writes dropped to 500M, but the Buffer
count grows steadily, one buffer per write. Thus the original issue
seems specific to small writes, though there is probably another issue
with the buffer cache for big ones.
That's all I have so far.
Any comments/ideas are appreciated.
Thanks,
Igor
* Re: mem leaks in Bluestore?
From: Sage Weil @ 2016-06-27 14:18 UTC (permalink / raw)
To: Igor Fedotov; +Cc: ceph-devel
On Mon, 27 Jun 2016, Igor Fedotov wrote:
> [...]
>
> Any comments/ideas are appreciated.
valgrind --tool=massif bin/ceph_test_objectstore --gtest_filter=*Many4K*/2
should generate a heap profile (massif.*, IIRC) that you can inspect
with ms_print.
sage
* Re: mem leaks in Bluestore?
From: Milosz Tanski @ 2016-06-27 16:49 UTC (permalink / raw)
To: Igor Fedotov; +Cc: ceph-devel
On Mon, Jun 27, 2016 at 10:11 AM, Igor Fedotov <ifedotov@mirantis.com> wrote:
> [...]
>
> Any comments/ideas are appreciated.
>
Igor,
Both clang and gcc nowadays ship a performant address sanitizer that
supports leak detection:
https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer
I would try using that first. It's much faster than Valgrind, by
something like an order of magnitude.
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016
p: 646-253-9055
e: milosz@adfin.com
* Re: mem leaks in Bluestore?
From: Igor Fedotov @ 2016-07-04 13:11 UTC (permalink / raw)
To: Mark Nelson, ceph-devel
Hi Mark,
I suspect that BlueStore's buffer cache is holding the memory. Could
you please try setting 'bluestore buffer cache size = 4194304' in your
config file and check whether the behavior changes.
And another question: how many pools do you have in your test case? In
BlueStore each collection can allocate up to 512M for the buffer cache
by default. I'm not sure whether collection == pool, but there is some
correlation between them, so this might be the cause.
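In ceph.conf form (option name as given above; 4194304 bytes = 4 MiB;
putting it in [global] is an assumption, any section that reaches the
OSD would do), that would look something like:

```ini
[global]
# Shrink BlueStore's buffer cache to 4 MiB (down from the much larger
# per-collection default) to test whether it is what holds the memory.
bluestore buffer cache size = 4194304
```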
Thanks,
Igor
On 30.06.2016 22:30, Mark Nelson wrote:
> Not sure how related this is, but for both reads and random reads I'm
> seeing very large memory spikes (not necessarily leaks). I've
> included both the massif output and a screenshot from massif-visualizer.
>
> The spikes appear to be due to:
>
> bufferptr p = buffer::create_page_aligned(len)
>
> in KernelDevice::read at line ~477 in KernelDevice.cc. Running via
> massif slows things down enough that the node doesn't run out of memory
> and the tests keep running, but if I attempt these tests without
> massif, the OSD very quickly spikes to ~64GB of memory and then
> gets killed by the kernel.
>
> There were a number of recent commits that modified KernelDevice.cc as
> well as buffer.cc:
>
> https://github.com/ceph/ceph/commits/master/src/os/bluestore/KernelDevice.cc
>
> https://github.com/ceph/ceph/commits/master/src/common/buffer.cc
>
> Still looking through them for signs of what might be amiss.
>
> Mark
* Re: mem leaks in Bluestore?
From: Sage Weil @ 2016-07-04 13:15 UTC (permalink / raw)
To: Igor Fedotov; +Cc: Mark Nelson, ceph-devel
I think this was fixed by adding a call to trim the cache in the read
path. Previously it was only happening on writes.
On Mon, 4 Jul 2016, Igor Fedotov wrote:
> Hi Mark,
>
> I suspect that BlueStore's buffer cache holds the memory. Could you please try
> to set 'bluestore buffer cache size = 4194304' in your config file and check
> what's happening again.
>
> And another question - how many pools do you have for your test case? In
> Bluestore each collection can allocate up to 512M for the buffer cache by
> default. Not sure if collection == pool but there is some correlation between
> them hence this might be the case.
This could cause a problem if we got nothing but requests that touch 2
collections and the important one was always second. In the OSD's case,
though, we generally touch the PG collection first, so I don't think this
will be a problem.
sage
* Re: mem leaks in Bluestore?
From: Mark Nelson @ 2016-07-04 23:38 UTC (permalink / raw)
To: Sage Weil, Igor Fedotov; +Cc: ceph-devel
Yep, this was due to the lack of trim in the read path. We are now
spiking up to around 2.7GB RSS per OSD but holding steady and not
exceeding that under normal operation. I did notice that under valgrind
we can spike higher; perhaps trim can't keep up when massif slows
things down.
I'm now fighting segfaults during random writes on master. I was able
to reproduce them with logs and got a core dump, but I still need to
look through the code to figure out why they're happening.
Mark
On 07/04/2016 08:15 AM, Sage Weil wrote:
> I think this was fixed by adding a call to trim the cache in the read
> path. Previously it was only happening on writes.
>
> On Mon, 4 Jul 2016, Igor Fedotov wrote:
>> Hi Mark,
>>
>> I suspect that BlueStore's buffer cache holds the memory. Could you please try
>> to set 'bluestore buffer cache size = 4194304' in your config file and check
>> what's happening again.
>>
>> And another question - how many pools do you have for your test case? In
>> Bluestore each collection can allocate up to 512M for the buffer cache by
>> default. Not sure if collection == pool but there is some correlation between
>> them hence this might be the case.
>
> This could cause a problem if we got nothing but requests that touch 2
> collections and the important one was always second. In the OSD's case,
> though, we generally touch the PG collection first, so I don't think this
> will be a problem.
>
> sage
Thread overview: 6+ messages
2016-06-27 14:11 mem leaks in Bluestore? Igor Fedotov
2016-06-27 14:18 ` Sage Weil
2016-06-27 16:49 ` Milosz Tanski
[not found] ` <66c9974d-8cf8-2876-8dd5-7040aef4630c@redhat.com>
2016-07-04 13:11 ` Igor Fedotov
2016-07-04 13:15 ` Sage Weil
2016-07-04 23:38 ` Mark Nelson