* Segmentation faults in ceph-osd
@ 2013-05-21 11:21 Emil Renner Berthing
       [not found] ` <CAPYLRziXri6RYPZGCqQbnVS87XrnH1Bq2deX3yY4A8G5g3Yk5Q@mail.gmail.com>
  0 siblings, 1 reply; 12+ messages in thread
From: Emil Renner Berthing @ 2013-05-21 11:21 UTC (permalink / raw)
  To: ceph-devel

Hi,

We're experiencing random segmentation faults in the OSD daemon from
the 0.61.2-1~bpo70+1 Debian packages. It happens across all our
servers, and we've seen around 40 crashes in the last week.

It seems to happen more often on loaded servers, but at least they all
log the same error. An example can be found here:
http://esmil.dk/osdcrash.txt

Here is the backtrace from the core dump:

#0  0x00007f87b148eefb in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000000000853a89 in reraise_fatal (signum=11) at
global/signal_handler.cc:58
#2  handle_fatal_signal (signum=11) at global/signal_handler.cc:104
#3  <signal handler called>
#4  0x00007f87b06a96f3 in do_malloc (size=388987616) at src/tcmalloc.cc:1059
#5  cpp_alloc (nothrow=false, size=388987616) at src/tcmalloc.cc:1354
#6  tc_new (size=388987616) at src/tcmalloc.cc:1530
#7  0x00007f87a60c89b0 in ?? ()
#8  0x00000000172f7ae0 in ?? ()
#9  0x00007f87b0459b21 in ?? () from /usr/lib/x86_64-linux-gnu/libleveldb.so.1
#10 0x00007f87b0456ba8 in ?? () from /usr/lib/x86_64-linux-gnu/libleveldb.so.1
#11 0x00007f87b04424d4 in ?? () from /usr/lib/x86_64-linux-gnu/libleveldb.so.1
#12 0x0000000000840977 in
LevelDBStore::LevelDBWholeSpaceIteratorImpl::lower_bound
(this=0x20910a20, prefix=..., to=...) at os/LevelDBStore.h:204
#13 0x000000000083f351 in LevelDBStore::get (this=<optimized out>,
prefix=..., keys=..., out=0x7f87a60c8d00) at os/LevelDBStore.cc:106
#14 0x0000000000838449 in DBObjectMap::_lookup_map_header
(this=this@entry=0x316d4a0, hoid=...) at os/DBObjectMap.cc:1080
#15 0x000000000083e4a9 in DBObjectMap::lookup_map_header
(this=this@entry=0x316d4a0, hoid=...) at os/DBObjectMap.h:404
#16 0x0000000000839e06 in DBObjectMap::rm_keys (this=0x316d4a0,
hoid=..., to_clear=..., spos=0x7f87a60c9400) at os/DBObjectMap.cc:696
#17 0x00000000007f40c1 in FileStore::_omap_rmkeys
(this=this@entry=0x3188000, cid=..., hoid=..., keys=..., spos=...) at
os/FileStore.cc:4765
#18 0x000000000080f610 in FileStore::_do_transaction
(this=this@entry=0x3188000, t=..., op_seq=op_seq@entry=4760123,
trans_num=trans_num@entry=0) at os/FileStore.cc:2595
#19 0x0000000000812999 in FileStore::_do_transactions
(this=this@entry=0x3188000, tls=..., op_seq=4760123,
handle=handle@entry=0x7f87a60c9b80) at os/FileStore.cc:2151
#20 0x0000000000812b2e in FileStore::_do_op (this=0x3188000,
osr=<optimized out>, handle=...) at os/FileStore.cc:1985
#21 0x00000000008f52ea in ThreadPool::worker (this=0x3188a08,
wt=0x319c3e0) at common/WorkQueue.cc:119
#22 0x00000000008f6590 in ThreadPool::WorkThread::entry
(this=<optimized out>) at common/WorkQueue.h:316
#23 0x00007f87b1486b50 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#24 0x00007f87af9c2a7d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#25 0x0000000000000000 in ?? ()
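
Frames #9-#12 put the failing allocation (size=388987616, roughly 370 MB)
underneath leveldb's iterator code, entered through the Ceph wrapper at
os/LevelDBStore.h:204. As a rough idea of what that wrapper does, here is a
paraphrased sketch; it is a guess for illustration, not the actual Ceph
source. The point is that lower_bound() essentially just seeks the underlying
leveldb iterator, so the oversized request is made on behalf of libleveldb:

// Paraphrased sketch of the wrapper named in frame #12 (os/LevelDBStore.h).
// Illustration only, not the real code: lower_bound() combines the prefix
// and key and seeks the leveldb iterator, so the huge "size=388987616"
// operator new in frames #4-#6 happens inside libleveldb.
#include <string>
#include <leveldb/iterator.h>
#include <leveldb/slice.h>
#include <leveldb/status.h>

struct WholeSpaceIteratorSketch {
  leveldb::Iterator *dbiter;   // iterator over the OSD's omap LevelDB store

  int lower_bound(const std::string &prefix, const std::string &to) {
    std::string bound = prefix;
    bound.push_back('\0');                    // assumed prefix/key separator
    bound.append(to);
    dbiter->Seek(leveldb::Slice(bound));      // allocation occurs below here
    return dbiter->status().ok() ? 0 : -1;
  }
};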

Please let me know if I can provide any other info to help find this bug.
/Emil

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segmentation faults in ceph-osd
       [not found] ` <CAPYLRziXri6RYPZGCqQbnVS87XrnH1Bq2deX3yY4A8G5g3Yk5Q@mail.gmail.com>
@ 2013-05-21 15:44   ` Emil Renner Berthing
  2013-05-21 15:55     ` Gregory Farnum
  0 siblings, 1 reply; 12+ messages in thread
From: Emil Renner Berthing @ 2013-05-21 15:44 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

Hi Greg,

Here are some more stats on our servers:
- each server has 64 GB of RAM,
- there are 12 OSDs per server,
- each OSD uses around 1.5 GB of memory,
- we have 18432 PGs,
- around 5 to 10 MB/s is written to each OSD, with almost no reads (yet).

/Emil

On 21 May 2013 17:10, Gregory Farnum <greg@inktank.com> wrote:
> That looks like an attempt at a 370MB memory allocation. :? What's the
> memory use like on those nodes, and what's your workload?
> -Greg
>
>
> On Tuesday, May 21, 2013, Emil Renner Berthing wrote:
>> [quoted original report and backtrace snipped]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segmentation faults in ceph-osd
  2013-05-21 15:44   ` Emil Renner Berthing
@ 2013-05-21 15:55     ` Gregory Farnum
  2013-05-21 16:01       ` Emil Renner Berthing
  0 siblings, 1 reply; 12+ messages in thread
From: Gregory Farnum @ 2013-05-21 15:55 UTC (permalink / raw)
  To: Emil Renner Berthing; +Cc: ceph-devel

On Tue, May 21, 2013 at 8:44 AM, Emil Renner Berthing <ceph@esmil.dk> wrote:
> Hi Greg,
>
> Here are some more stats on our servers:
> - each server has 64 GB of RAM,
> - there are 12 OSDs per server,
> - each OSD uses around 1.5 GB of memory,
> - we have 18432 PGs,
> - around 5 to 10 MB/s is written to each OSD, with almost no reads (yet).

What interface are you writing with? How many OSD servers are there?
-Greg


>
> /Emil
> [earlier quoted messages and backtrace snipped]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segmentation faults in ceph-osd
  2013-05-21 15:55     ` Gregory Farnum
@ 2013-05-21 16:01       ` Emil Renner Berthing
  2013-05-21 16:19         ` Gregory Farnum
  0 siblings, 1 reply; 12+ messages in thread
From: Emil Renner Berthing @ 2013-05-21 16:01 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

On 21 May 2013 17:55, Gregory Farnum <greg@inktank.com> wrote:
> On Tue, May 21, 2013 at 8:44 AM, Emil Renner Berthing <ceph@esmil.dk> wrote:
>> Hi Greg,
>>
>> Here are some more stats on our servers:
>> - each server has 64 GB of RAM,
>> - there are 12 OSDs per server,
>> - each OSD uses around 1.5 GB of memory,
>> - we have 18432 PGs,
>> - around 5 to 10 MB/s is written to each OSD, with almost no reads (yet).
>
> What interface are you writing with? How many OSD servers are there?

We're using librados and there are 132 OSDs so far.

/Emil

> -Greg
> [earlier quoted messages and backtrace snipped]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segmentation faults in ceph-osd
  2013-05-21 16:01       ` Emil Renner Berthing
@ 2013-05-21 16:19         ` Gregory Farnum
  2013-05-21 16:32           ` Emil Renner Berthing
  2013-05-21 16:34           ` Anders Saaby
  0 siblings, 2 replies; 12+ messages in thread
From: Gregory Farnum @ 2013-05-21 16:19 UTC (permalink / raw)
  To: Emil Renner Berthing; +Cc: ceph-devel, Samuel Just

On Tue, May 21, 2013 at 9:01 AM, Emil Renner Berthing <ceph@esmil.dk> wrote:
> On 21 May 2013 17:55, Gregory Farnum <greg@inktank.com> wrote:
>> On Tue, May 21, 2013 at 8:44 AM, Emil Renner Berthing <ceph@esmil.dk> wrote:
>>> Hi Greg,
>>>
>>> Here are some more stats on our servers:
>>> - each server has 64 GB of RAM,
>>> - there are 12 OSDs per server,
>>> - each OSD uses around 1.5 GB of memory,
>>> - we have 18432 PGs,
>>> - around 5 to 10 MB/s is written to each OSD, with almost no reads (yet).
>>
>> What interface are you writing with? How many OSD servers are there?
>
> We're using librados and there are 132 OSDs so far.

Okay, so the allocation is happening in the depths of LevelDB — maybe
the issue is there somewhere. Are you doing anything weird with omap,
snapshots, or xattrs?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

>
> /Emil
> [earlier quoted messages and backtrace snipped]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segmentation faults in ceph-osd
  2013-05-21 16:19         ` Gregory Farnum
@ 2013-05-21 16:32           ` Emil Renner Berthing
  2013-05-21 16:34           ` Anders Saaby
  1 sibling, 0 replies; 12+ messages in thread
From: Emil Renner Berthing @ 2013-05-21 16:32 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel, Samuel Just

On 21 May 2013 18:19, Gregory Farnum <greg@inktank.com> wrote:
> [earlier quoted messages snipped]
>
> Okay, so the allocation is happening in the depths of LevelDB — maybe
> the issue is there somewhere. Are you doing anything weird with omap,
> snapshots, or xattrs?

All the drives are 4 TB, with an xfs-formatted sdx1 partition and a 10 GB
journal on sdx2. The filesystems are mounted as xfs (rw,noatime,attr2,noquota).

We don't use snapshots.
/Emil

> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> [earlier quoted messages and backtrace snipped]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segmentation faults in ceph-osd
  2013-05-21 16:19         ` Gregory Farnum
  2013-05-21 16:32           ` Emil Renner Berthing
@ 2013-05-21 16:34           ` Anders Saaby
  2013-05-21 17:05             ` Samuel Just
  1 sibling, 1 reply; 12+ messages in thread
From: Anders Saaby @ 2013-05-21 16:34 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Emil Renner Berthing, ceph-devel, Samuel Just

On 21/05/2013, at 18.19, Gregory Farnum <greg@inktank.com> wrote:
> [earlier quoted messages snipped]
> 
> Okay, so the allocation is happening in the depths of LevelDB — maybe
> the issue is there somewhere. Are you doing anything weird with omap,
> snapshots, or xattrs?

I can help here: no, we are not using omap, snapshots, or anything weird with xattrs.

We are storing objects ranging in size from a few KB to several GB. Also, we currently have a quirk in the application design: we store an object, write it again under a new name, and then delete the original object. I mention it in case it is of any value in finding this.
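
In librados terms the quirk is roughly the pattern below; this is only a
sketch with made-up names, not our actual application code:

// Hypothetical sketch of the "store, re-write under a new name, delete the
// original" pattern described above, using the librados C++ API. Object
// names and error handling are placeholders, not our real code.
#include <rados/librados.hpp>
#include <string>

int store_then_rename(librados::IoCtx &ioctx,
                      const std::string &old_name,
                      const std::string &new_name,
                      librados::bufferlist &data)
{
  int r = ioctx.write_full(old_name, data);   // initial store of the object
  if (r < 0)
    return r;
  r = ioctx.write_full(new_name, data);       // same payload under a new name
  if (r < 0)
    return r;
  return ioctx.remove(old_name);              // original object is deleted
}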


-- 
Anders


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segmentation faults in ceph-osd
  2013-05-21 16:34           ` Anders Saaby
@ 2013-05-21 17:05             ` Samuel Just
  2013-05-21 17:13               ` Emil Renner Berthing
  0 siblings, 1 reply; 12+ messages in thread
From: Samuel Just @ 2013-05-21 17:05 UTC (permalink / raw)
  To: Anders Saaby; +Cc: Gregory Farnum, Emil Renner Berthing, ceph-devel

Do you use xattrs at all?
-Sam

On Tue, May 21, 2013 at 9:34 AM, Anders Saaby <anders@saaby.com> wrote:
> [earlier quoted messages snipped]
>> Okay, so the allocation is happening in the depths of LevelDB — maybe
>> the issue is there somewhere. Are you doing anything weird with omap,
>> snapshots, or xattrs?
>
> I can help here: no, we are not using omap, snapshots, or anything weird with xattrs.
>
> We are storing objects ranging in size from a few KB to several GB. Also, we currently have a quirk in the application design: we store an object, write it again under a new name, and then delete the original object. I mention it in case it is of any value in finding this.
>
>
> --
> Anders
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segmentation faults in ceph-osd
  2013-05-21 17:05             ` Samuel Just
@ 2013-05-21 17:13               ` Emil Renner Berthing
  2013-05-21 19:00                 ` Samuel Just
  0 siblings, 1 reply; 12+ messages in thread
From: Emil Renner Berthing @ 2013-05-21 17:13 UTC (permalink / raw)
  To: Samuel Just; +Cc: Anders Saaby, Gregory Farnum, ceph-devel

On 21 May 2013 19:05, Samuel Just <sam.just@inktank.com> wrote:
> Do you use xattrs at all?

Yes, on each object we set between 2 and 4 attributes at write time,
which are then left unchanged.
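
For what it's worth, the write path looks roughly like the sketch below;
attribute names and values are placeholders, and this is not our actual code:

// Rough sketch of setting a few small xattrs alongside each object write
// via the librados C++ API. Names and values are made up for illustration.
#include <rados/librados.hpp>
#include <string>

int write_with_xattrs(librados::IoCtx &ioctx,
                      const std::string &oid,
                      librados::bufferlist &data)
{
  librados::ObjectWriteOperation op;
  op.write_full(data);                         // the object payload

  librados::bufferlist owner, created;         // a couple of small attributes, set once
  owner.append(std::string("some-user"));
  created.append(std::string("1369130460"));
  op.setxattr("owner", owner);
  op.setxattr("created", created);

  return ioctx.operate(oid, &op);              // single compound write + xattrs
}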
/Emil

> -Sam
> [earlier quoted messages snipped]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segmentation faults in ceph-osd
  2013-05-21 17:13               ` Emil Renner Berthing
@ 2013-05-21 19:00                 ` Samuel Just
  2013-05-21 21:27                   ` Anders Saaby
  0 siblings, 1 reply; 12+ messages in thread
From: Samuel Just @ 2013-05-21 19:00 UTC (permalink / raw)
  To: Emil Renner Berthing; +Cc: Anders Saaby, Gregory Farnum, ceph-devel

How large are the xattrs?
-Sam

On Tue, May 21, 2013 at 10:13 AM, Emil Renner Berthing <ceph@esmil.dk> wrote:
> On 21 May 2013 19:05, Samuel Just <sam.just@inktank.com> wrote:
>> Do you use xattrs at all?
>
> Yes, on each object we set between 2 and 4 attributes at write time,
> which are then left unchanged.
> /Emil
>
>> [earlier quoted messages snipped]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segmentation faults in ceph-osd
  2013-05-21 19:00                 ` Samuel Just
@ 2013-05-21 21:27                   ` Anders Saaby
  2013-06-03 15:42                     ` Emil Renner Berthing
  0 siblings, 1 reply; 12+ messages in thread
From: Anders Saaby @ 2013-05-21 21:27 UTC (permalink / raw)
  To: Samuel Just; +Cc: Emil Renner Berthing, Gregory Farnum, ceph-devel

On 21/05/2013, at 21.00, Samuel Just <sam.just@inktank.com> wrote:
> How large are the xattrs?
> -Sam

Sam,

They are quite small; as far as I can see, 10-50 characters each.

However, as you just explained on IRC, our current unbounded 1:1 files-to-objects approach, which in rare cases leads to objects of roughly 25 GB, is not a good idea. Could those large objects be what triggers the large leveldb allocation (and eventually the segfault)?


-- 
Anders

> [earlier quoted messages snipped]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segmentation faults in ceph-osd
  2013-05-21 21:27                   ` Anders Saaby
@ 2013-06-03 15:42                     ` Emil Renner Berthing
  0 siblings, 0 replies; 12+ messages in thread
From: Emil Renner Berthing @ 2013-06-03 15:42 UTC (permalink / raw)
  To: Anders Saaby; +Cc: Samuel Just, Gregory Farnum, ceph-devel

Hi,

Unfortunately we keep getting these segmentation faults even when the
cluster contains nothing but objects written by the rados benchmark tool.

I've opened an issue for it here:
http://tracker.ceph.com/issues/5239

On 21 May 2013 23:27, Anders Saaby <anders@saaby.com> wrote:
> On 21/05/2013, at 21.00, Samuel Just <sam.just@inktank.com> wrote:
>> How large are the xattrs?
>> -Sam
>
> Sam,
>
> They are quite small; as far as I can see, 10-50 characters each.
>
> However, as you just explained on IRC, our current unbounded 1:1 files-to-objects approach, which in rare cases leads to objects of roughly 25 GB, is not a good idea. Could those large objects be what triggers the large leveldb allocation (and eventually the segfault)?
>
>
> --
> Anders
>
>> [earlier quoted messages snipped]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread

Thread overview: 12+ messages
2013-05-21 11:21 Segmentation faults in ceph-osd Emil Renner Berthing
     [not found] ` <CAPYLRziXri6RYPZGCqQbnVS87XrnH1Bq2deX3yY4A8G5g3Yk5Q@mail.gmail.com>
2013-05-21 15:44   ` Emil Renner Berthing
2013-05-21 15:55     ` Gregory Farnum
2013-05-21 16:01       ` Emil Renner Berthing
2013-05-21 16:19         ` Gregory Farnum
2013-05-21 16:32           ` Emil Renner Berthing
2013-05-21 16:34           ` Anders Saaby
2013-05-21 17:05             ` Samuel Just
2013-05-21 17:13               ` Emil Renner Berthing
2013-05-21 19:00                 ` Samuel Just
2013-05-21 21:27                   ` Anders Saaby
2013-06-03 15:42                     ` Emil Renner Berthing
