* Re: Help...MDS Continuously Segfaulting
@ 2012-10-30  2:22 Nick Couchman
  2012-11-03 17:45 ` Gregory Farnum
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Couchman @ 2012-10-30  2:22 UTC (permalink / raw)
  To: greg; +Cc: ceph-devel

Okay, that patch worked and it seems to be running, again.  Should I continue to run with that patch, or go back to the original binaries?

>>> Gregory Farnum  10/19/12 4:16 PM >>>
I've written a small patch on top of v0.48.1argonaut which should
avoid this. It's in branch 3369-mds-session-workaround and will simply
log an error in the monitor central log instead of segfaulting. There
should shortly be packages available at
http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/3369-mds-session-workaround/
(for Precise amd64; or elsewhere if you're on a different platform?).
-Greg

On Fri, Oct 19, 2012 at 1:52 PM, Nick Couchman  wrote:
> One of the MDSs crashed over the weekend (late Friday night), but I believe that one was not active and was just in Replay mode.  Other than that, I don't know of anything that would have affected the MDSs.
>
> -Nick
>
>>>> On 2012/10/18 at 16:55, Gregory Farnum  wrote:
>> Okay, looked at this a little bit. Can you describe what was happening
>> before you got into this failed-replay loop? (So, why was it in replay
>> at all?) I see that the monitor marked it as laggy for some reason;
>> was the cluster under load; did the monitors break; something else?
>> I can see why it's failed here and I think I can do a simple code
>> patch to work around it, but the root cause is something that happened
>> while the MDS was still alive.
>>
>> Basic technical content:
>> The MDS journals all open client sessions. It brings them back into
>> memory during replay, and then operates on them to do things like open
>> new sessions or close ones that it turns out not to need. Your log
>> contains two close events for the same client session, and it's
>> causing a big freak out. This actually feels somewhat familiar; I'll
>> talk about it with our team here and get back to you tomorrow
>> sometime.
>> -Greg
>>
>> On Thu, Oct 18, 2012 at 8:56 AM, Nick Couchman 
>> wrote:
>>> Hopefully this is what you're looking for...
>>> (gdb) bt
>>> #0  ESession::replay (this=0x7fffcc49a7c0, mds=0x127d5f0) at
>> mds/journal.cc:828
>>> #1  0x00000000006a2446 in MDLog::_replay_thread (this=0x1281390) at
>> mds/MDLog.cc:580
>>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry (this=) at
>> mds/MDLog.h:86
>>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>>>
>>>>>> On 2012/10/17 at 09:53, Sam Lang  wrote:
>>>> On 10/17/2012 09:42 AM, Nick Couchman wrote:
>>>>> Thanks...here's the backtrace:
>>>>> (gdb) bt
>>>>> #0  0x00000000004dcfea in ESession::replay(MDS*) ()
>>>>> #1  0x00000000006a2446 in MDLog::_replay_thread() ()
>>>>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry() ()
>>>>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>>>>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>>>>
>>>> Hi Nick,
>>>>
>>>> This doesn't have the debug symbols (line numbers in the source) we were
>>>> hoping for.  Could you install the ceph-dbg package and rerun?  You will
>>>> probably have to first uninstall the ceph package.
>>>>
>>>> Thanks,
>>>> -sam
>>>>
>>>>>
>>>>>>>> On 2012/10/17 at 07:34, Sam Lang  wrote:
>>>>>> On 10/16/2012 06:04 PM, Gregory Farnum wrote:
>>>>>>> Okay, that's the right debugging but it wasn't quite as helpful on its
>>>>>>> own as I expected. Can you get a core dump (you might already have
>>>>>>> one, depending on system settings) of the crash and open it up with
>>>>>>> gdb and get a full backtrace?
>>>>>>
>>>>>> You can also run the mds directly in gdb and avoid any core file ulimit
>>>>>> settings you have set:
>>>>>>
>>>>>>   > gdb --args ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>> ...
>>>>>> (gdb) run
>>>>>>
>>>>>> Once you hit the segfault you can get the backtrace with:
>>>>>>
>>>>>> (gdb) bt
>>>>>>
>>>>>> -sam
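The interactive gdb steps above can also be run unattended. This is only a minimal sketch, assuming gdb is on the PATH and reusing the daemon arguments quoted in this thread; the helper names `gdb_backtrace_command` and `capture_backtrace` are made up for illustration. It relies on gdb's `--batch`, `-ex` and `--args` options.

```python
import subprocess


def gdb_backtrace_command(daemon_argv):
    """Build a gdb command line that runs the daemon and prints a backtrace.

    --batch makes gdb exit once the -ex commands have finished, so no
    interactive session is needed.
    """
    return ["gdb", "--batch", "-ex", "run", "-ex", "bt", "--args", *daemon_argv]


def capture_backtrace(daemon_argv):
    """Run the daemon under gdb and return gdb's combined output as text."""
    result = subprocess.run(
        gdb_backtrace_command(daemon_argv),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,  # a crashing daemon is the expected case here
    )
    return result.stdout


# Daemon arguments as quoted in this thread.
MDS_ARGV = ["ceph-mds", "-n", "mds.b", "-c", "/etc/ceph/ceph.conf", "-f"]
```

Calling `capture_backtrace(MDS_ARGV)` on the affected host should return the same kind of output as typing `run` and then `bt` by hand. Without debug symbols installed, the frames will still lack source line numbers.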
>>>>>>
>>>>>>
>>>>>>> -Greg
>>>>>>>
>>>>>>> On Mon, Oct 15, 2012 at 10:59 AM, Nick Couchman 
>>>>>> wrote:
>>>>>>>> Well, hopefully this is still okay...8.5MB bzip2d, 230MB unzipped.
>>>>>>>>
>>>>>>>> -Nick
>>>>>>>>
>>>>>>>>>>> On 2012/10/15 at 11:47, Gregory Farnum  wrote:
>>>>>>>>> Yeah, zip it and post - somebody's going to have to download it and do
>>>>>>>>> fun things. :)
>>>>>>>>> -Greg
>>>>>>>>>
>>>>>>>>> On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman
>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>>> Anywhere in particular I should make it available?  It's a little
>>>>>>>> over a
>>>>>>>>> million lines of debug in the file - I can put it on a pastebin, if
>>>>>>>> that
>>>>>>>>> works, or perhaps zip it up and throw it somewhere?
>>>>>>>>>>
>>>>>>>>>> -Nick
>>>>>>>>>>
>>>>>>>>>>>>> On 2012/10/15 at 11:26, Gregory Farnum  wrote:
>>>>>>>>>>> Something in the MDS log is bad or is poking at a bug in the code.
>>>>>>>> Can
>>>>>>>>>>> you turn on MDS debugging and restart a daemon and put that log
>>>>>>>>>>> somewhere accessible?
>>>>>>>>>>> debug mds = 20
>>>>>>>>>>> debug journaler = 20
>>>>>>>>>>> debug ms = 1
>>>>>>>>>>> -Greg
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman
>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Well, both of my MDSs seem to be down right now, and then
>>>>>>>> continually
>>>>>>>>>>> segfault (every time I try to start them) with the following:
>>>>>>>>>>>>
>>>>>>>>>>>> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>>>>>>>> starting mds.b at :/0
>>>>>>>>>>>> *** Caught signal (Segmentation fault) **
>>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>>> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal
>>>>>>>> (Segmentation
>>>>>>>>>>> fault) **
>>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>>
>>>>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>>        0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught
>>>>>>>> signal
>>>>>>>>>>> (Segmentation fault) **
>>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>>
>>>>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>> Segmentation fault
>>>>>>>>>>>>
>>>>>>>>>>>> Anyone have any hints on recovering?  I'm running 0.48.1argonaut -
>>>>>>>> I can
>>>>>>>>>>> attempt to upgrade to 0.48.2 and see if that helps, but I figured
>>>>>>>> if anyone
>>>>>>>>>>> can offer any insight as to what to do to get the replay to run
>>>>>>>> without
>>>>>>>>>>> segfaulting?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --------
>>>>>>>>>>>> This e-mail may contain confidential and privileged material for
>>>>>>>> the sole use
>>>>>>>>>>> of the intended recipient.  If this email is not intended for you,
>>>>>>>> or you
>>>>>>>>> are
>>>>>>>>>>> not responsible for the delivery of this message to the intended
>>>>>>>> recipient,
>>>>>>>>>>> please note that this message may contain SEAKR Engineering
>>>>>>>> (SEAKR)
>>>>>>>>>>> Privileged/Proprietary Information.  In such a case, you are
>>>>>>>> strictly
>>>>>>>>>>> prohibited from downloading, photocopying, distributing or
>>>>>>>> otherwise using
>>>>>>>>>>> this message, its contents or attachments in any way.  If you have
>>>>>>>> received
>>>>>>>>>>> this message in error, please notify us immediately by replying to
>>>>>>>> this
>>>>>>>>> e-mail
>>>>>>>>>>> and delete the message from your mailbox.  Information contained in
>>>>>>>> this
>>>>>>>>>>> message that does not relate to the business of SEAKR is neither
>>>>>>>> endorsed by
>>>>>>>>>>> nor attributable to SEAKR.
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>>> ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>> More majordomo info at
>>>>>>>> http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>
>
>





* Re: Help...MDS Continuously Segfaulting
  2012-10-30  2:22 Help...MDS Continuously Segfaulting Nick Couchman
@ 2012-11-03 17:45 ` Gregory Farnum
  0 siblings, 0 replies; 19+ messages in thread
From: Gregory Farnum @ 2012-11-03 17:45 UTC (permalink / raw)
  To: Nick Couchman; +Cc: ceph-devel

Sage merged it into master, so whatever you like. If you remove the
patch and the error happens again, your MDS will fail on replay as it
did here. If you leave it in, it has no effect other than handling
that particular bad case.
-Greg
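
The failure mode described in this thread (a journal that contains two close events for the same client session) and the workaround's behaviour (log an error and carry on rather than crash) can be illustrated with a toy replay loop. This is only a sketch of the idea and not the actual change to `ESession::replay`, which is C++ inside Ceph; `SessionEvent`, `replay` and the client name are hypothetical.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("replay-sketch")


@dataclass(frozen=True)
class SessionEvent:
    client: str    # hypothetical client identifier
    is_open: bool  # True = session open, False = session close


def replay(events, sessions=None):
    """Apply journaled session events to an in-memory session table.

    Returns the resulting set of open sessions plus a list of error
    messages for events that did not match the table.
    """
    sessions = set(sessions or ())
    errors = []
    for ev in events:
        if ev.is_open:
            sessions.add(ev.client)
        elif ev.client in sessions:
            sessions.remove(ev.client)
        else:
            # Tolerant path: a close for a session that is not in the
            # table is reported and skipped instead of being acted on.
            msg = f"replayed close for unknown session {ev.client}; ignoring"
            log.error(msg)
            errors.append(msg)
    return sessions, errors
```

For example, replaying an open followed by two closes for `client.4100` leaves the session table empty and records exactly one error for the second close. For a journal without such a duplicate, the tolerant branch is never reached, which matches the statement above that the patch has no effect other than handling that particular bad case.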

On Tue, Oct 30, 2012 at 3:22 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
> Okay, that patch worked and it seems to be running, again.  Should I continue to run with that patch, or go back to the original binaries?
>
>>>> Gregory Farnum  10/19/12 4:16 PM >>>
> I've written a small patch on top of v0.48.1argonaut which should
> avoid this. It's in branch 3369-mds-session-workaround and will simply
> log an error in the monitor central log instead of segfaulting. There
> should shortly be packages available at
> http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/3369-mds-session-workaround/
> (for Precise amd64; or elsewhere if you're on a different platform?).
> -Greg
>
> On Fri, Oct 19, 2012 at 1:52 PM, Nick Couchman  wrote:
>> One of the MDSs crashed over the weekend (late Friday night), but I believe that one was not active and was just in Replay mode.  Other than that, I don't know of anything that would have affected the MDSs.
>>
>> -Nick
>>
>>>>> On 2012/10/18 at 16:55, Gregory Farnum  wrote:
>>> Okay, looked at this a little bit. Can you describe what was happening
>>> before you got into this failed-replay loop? (So, why was it in replay
>>> at all?) I see that the monitor marked it as laggy for some reason;
>>> was the cluster under load; did the monitors break; something else?
>>> I can see why it's failed here and I think I can do a simple code
>>> patch to work around it, but the root cause is something that happened
>>> while the MDS was still alive.
>>>
>>> Basic technical content:
>>> The MDS journals all open client sessions. It brings them back into
>>> memory during replay, and then operates on them to do things like open
>>> new sessions or close ones that it turns out not to need. Your log
>>> contains two close events for the same client session, and it's
>>> causing a big freak out. This actually feels somewhat familiar; I'll
>>> talk about it with our team here and get back to you tomorrow
>>> sometime.
>>> -Greg
>>>
>>> On Thu, Oct 18, 2012 at 8:56 AM, Nick Couchman
>>> wrote:
>>>> Hopefully this is what you're looking for...
>>>> (gdb) bt
>>>> #0  ESession::replay (this=0x7fffcc49a7c0, mds=0x127d5f0) at
>>> mds/journal.cc:828
>>>> #1  0x00000000006a2446 in MDLog::_replay_thread (this=0x1281390) at
>>> mds/MDLog.cc:580
>>>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry (this=) at
>>> mds/MDLog.h:86
>>>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>>>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>>>>
>>>>>>> On 2012/10/17 at 09:53, Sam Lang  wrote:
>>>>> On 10/17/2012 09:42 AM, Nick Couchman wrote:
>>>>>> Thanks...here's the backtrace:
>>>>>> (gdb) bt
>>>>>> #0  0x00000000004dcfea in ESession::replay(MDS*) ()
>>>>>> #1  0x00000000006a2446 in MDLog::_replay_thread() ()
>>>>>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry() ()
>>>>>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>>>>>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>>>>>
>>>>> Hi Nick,
>>>>>
>>>>> This doesn't have the debug symbols (line numbers in the source) we were
>>>>> hoping for.  Could you install the ceph-dbg package and rerun?  You will
>>>>> probably have to first uninstall the ceph package.
>>>>>
>>>>> Thanks,
>>>>> -sam
>>>>>
>>>>>>
>>>>>>>>> On 2012/10/17 at 07:34, Sam Lang  wrote:
>>>>>>> On 10/16/2012 06:04 PM, Gregory Farnum wrote:
>>>>>>>> Okay, that's the right debugging but it wasn't quite as helpful on its
>>>>>>>> own as I expected. Can you get a core dump (you might already have
>>>>>>>> one, depending on system settings) of the crash and open it up with
>>>>>>>> gdb and get a full backtrace?
>>>>>>>
>>>>>>> You can also run the mds directly in gdb and avoid any core file ulimit
>>>>>>> settings you have set:
>>>>>>>
>>>>>>>   > gdb --args ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>>> ...
>>>>>>> (gdb) run
>>>>>>>
>>>>>>> Once you hit the segfault you can get the backtrace with:
>>>>>>>
>>>>>>> (gdb) bt
>>>>>>>
>>>>>>> -sam
>>>>>>>
>>>>>>>
>>>>>>>> -Greg
>>>>>>>>
>>>>>>>> On Mon, Oct 15, 2012 at 10:59 AM, Nick Couchman
>>>>>>> wrote:
>>>>>>>>> Well, hopefully this is still okay...8.5MB bzip2d, 230MB unzipped.
>>>>>>>>>
>>>>>>>>> -Nick
>>>>>>>>>
>>>>>>>>>>>> On 2012/10/15 at 11:47, Gregory Farnum  wrote:
>>>>>>>>>> Yeah, zip it and post - somebody's going to have to download it and do
>>>>>>>>>> fun things. :)
>>>>>>>>>> -Greg
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman
>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>> Anywhere in particular I should make it available?  It's a little
>>>>>>>>> over a
>>>>>>>>>> million lines of debug in the file - I can put it on a pastebin, if
>>>>>>>>> that
>>>>>>>>>> works, or perhaps zip it up and throw it somewhere?
>>>>>>>>>>>
>>>>>>>>>>> -Nick
>>>>>>>>>>>
>>>>>>>>>>>>>> On 2012/10/15 at 11:26, Gregory Farnum  wrote:
>>>>>>>>>>>> Something in the MDS log is bad or is poking at a bug in the code.
>>>>>>>>> Can
>>>>>>>>>>>> you turn on MDS debugging and restart a daemon and put that log
>>>>>>>>>>>> somewhere accessible?
>>>>>>>>>>>> debug mds = 20
>>>>>>>>>>>> debug journaler = 20
>>>>>>>>>>>> debug ms = 1
>>>>>>>>>>>> -Greg
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman
>>>>>>>>>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Well, both of my MDSs seem to be down right now, and then
>>>>>>>>> continually
>>>>>>>>>>>> segfault (every time I try to start them) with the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>>>>>>>>> starting mds.b at :/0
>>>>>>>>>>>>> *** Caught signal (Segmentation fault) **
>>>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>>>> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal
>>>>>>>>> (Segmentation
>>>>>>>>>>>> fault) **
>>>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>>>
>>>>>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>>>
>>>>>>>>>>>>>        0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught
>>>>>>>>> signal
>>>>>>>>>>>> (Segmentation fault) **
>>>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>>>
>>>>>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Segmentation fault
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anyone have any hints on recovering?  I'm running 0.48.1argonaut -
>>>>>>>>> I can
>>>>>>>>>>>> attempt to upgrade to 0.48.2 and see if that helps, but I figured
>>>>>>>>> if anyone
>>>>>>>>>>>> can offer any insight as to what to do to get the replay to run
>>>>>>>>> without
>>>>>>>>>>>> segfaulting?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>
>>
>>
>
>
>


* Re: Help...MDS Continuously Segfaulting
  2012-11-03 18:27 Nick Couchman
@ 2012-11-03 18:38 ` Gregory Farnum
  0 siblings, 0 replies; 19+ messages in thread
From: Gregory Farnum @ 2012-11-03 18:38 UTC (permalink / raw)
  To: Nick Couchman; +Cc: ceph-devel

It should apply cleanly on top of 0.48.2. There may be a 0.48.3, but
it won't be driven by this patch.
-Greg

On Sat, Nov 3, 2012 at 7:27 PM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
> Okay - I'm planning to try to go to version 0.48.2, the latest stable - is the patch available for that branch, or will there be a 0.48.3 release coming?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
@ 2012-11-03 18:27 Nick Couchman
  2012-11-03 18:38 ` Gregory Farnum
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Couchman @ 2012-11-03 18:27 UTC (permalink / raw)
  To: greg; +Cc: ceph-devel

Okay - I'm planning to try to go to version 0.48.2, the latest stable - is the patch available for that branch, or will there be a 0.48.3 release coming?

>>> Gregory Farnum  11/03/12 11:45 AM >>>
Sage merged it into master, so whatever you like. If you remove the
patch and the error happens again, your MDS will fail on replay as it
did here. If you leave it in, it has no effect other than handling
that particular bad case.
-Greg
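
The two cases described above can be pictured with a toy Python model. This is only a sketch under stated assumptions, not the Ceph C++ code and not the actual patch (neither appears in this thread). The function name `replay_sessions`, the `tolerate_missing` flag, the event tuples, and the client id are all hypothetical. The unpatched path raises an exception as a stand-in for the segfault; the patched path records an error and keeps replaying.

```python
def replay_sessions(events, tolerate_missing):
    """Replay (kind, client) journal events into an in-memory session table.

    Hypothetical model: 'open' adds a session, 'close' removes it.
    A 'close' for a session that is no longer present is the bad case.
    """
    sessions = {}   # client id -> state
    errors = []     # stand-in for messages sent to a central log
    for kind, client in events:
        if kind == "open":
            sessions[client] = "open"
        elif kind == "close":
            if client in sessions:
                del sessions[client]
            elif tolerate_missing:
                # Patched behaviour in this model: log and carry on.
                errors.append(f"replay: close for unknown session {client}")
            else:
                # Unpatched behaviour in this model: replay fails here.
                raise KeyError(client)
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return sessions, errors


if __name__ == "__main__":
    # Hypothetical journal with two close events for the same session.
    journal = [("open", "client.1"), ("close", "client.1"), ("close", "client.1")]
    print(replay_sessions(journal, tolerate_missing=True))
```

With a well-formed journal both modes behave identically, which matches the point that the patch has no effect other than handling that one bad case.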

On Tue, Oct 30, 2012 at 3:22 AM, Nick Couchman  wrote:
> Okay, that patch worked and it seems to be running, again.  Should I continue to run with that patch, or go back to the original binaries?
>
>>>> Gregory Farnum  10/19/12 4:16 PM >>>
> I've written a small patch on top of v0.48.1argonaut which should
> avoid this. It's in branch 3369-mds-session-workaround and will simply
> log an error in the monitor central log instead of segfaulting. There
> should shortly be packages available at
> http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/3369-mds-session-workaround/
> (for Precise amd64; or elsewhere if you're on a different platform?).
> -Greg
>
> On Fri, Oct 19, 2012 at 1:52 PM, Nick Couchman  wrote:
>> One of the MDSs crashed over the weekend (late Friday night), but I believe that one was not active and was just in Replay mode.  Other than that, I don't know of anything that would have affected the MDSs.
>>
>> -Nick
>>
>>>>> On 2012/10/18 at 16:55, Gregory Farnum  wrote:
>>> Okay, looked at this a little bit. Can you describe what was happening
>>> before you got into this failed-replay loop? (So, why was it in replay
>>> at all?) I see that the monitor marked it as laggy for some reason;
>>> was the cluster under load; did the monitors break; something else?
>>> I can see why it's failed here and I think I can do a simple code
>>> patch to work around it, but the root cause is something that happened
>>> while the MDS was still alive.
>>>
>>> Basic technical content:
>>> The MDS journals all open client sessions. It brings them back into
>>> memory during replay, and then operates on them to do things like open
>>> new sessions or close ones that it turns out not to need. Your log
>>> contains two close events for the same client session, and it's
>>> causing a big freak out. This actually feels somewhat familiar; I'll
>>> talk about it with our team here and get back to you tomorrow
>>> sometime.
>>> -Greg
>>>
>>> On Thu, Oct 18, 2012 at 8:56 AM, Nick Couchman wrote:
>>>> Hopefully this is what you're looking for...
>>>> (gdb) bt
>>>> #0  ESession::replay (this=0x7fffcc49a7c0, mds=0x127d5f0) at mds/journal.cc:828
>>>> #1  0x00000000006a2446 in MDLog::_replay_thread (this=0x1281390) at mds/MDLog.cc:580
>>>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry (this=) at mds/MDLog.h:86
>>>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>>>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>>>>
>>>>>>> On 2012/10/17 at 09:53, Sam Lang  wrote:
>>>>> On 10/17/2012 09:42 AM, Nick Couchman wrote:
>>>>>> Thanks...here's the backtrace:
>>>>>> (gdb) bt
>>>>>> #0  0x00000000004dcfea in ESession::replay(MDS*) ()
>>>>>> #1  0x00000000006a2446 in MDLog::_replay_thread() ()
>>>>>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry() ()
>>>>>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>>>>>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>>>>>
>>>>> Hi Nick,
>>>>>
>>>>> This doesn't have the debug symbols (line numbers in the source) we were
>>>>> hoping for.  Could you install the ceph-dbg package and rerun?  You will
>>>>> probably have to first uninstall the ceph package.
>>>>>
>>>>> Thanks,
>>>>> -sam
>>>>>
>>>>>>
>>>>>>>>> On 2012/10/17 at 07:34, Sam Lang  wrote:
>>>>>>> On 10/16/2012 06:04 PM, Gregory Farnum wrote:
>>>>>>>> Okay, that's the right debugging but it wasn't quite as helpful on its
>>>>>>>> own as I expected. Can you get a core dump (you might already have
>>>>>>>> one, depending on system settings) of the crash and open it up with
>>>>>>>> gdb and get a full backtrace?
>>>>>>>
>>>>>>> You can also run the mds directly in gdb and avoid any core file ulimit
>>>>>>> settings you have set:
>>>>>>>
>>>>>>>   > gdb --args ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>>> ...
>>>>>>> (gdb) run
>>>>>>>
>>>>>>> Once you hit the segfault you can get the backtrace with:
>>>>>>>
>>>>>>> (gdb) bt
>>>>>>>
>>>>>>> -sam
>>>>>>>
>>>>>>>
>>>>>>>> -Greg
>>>>>>>>
>>>>>>>> On Mon, Oct 15, 2012 at 10:59 AM, Nick Couchman wrote:
>>>>>>>>> Well, hopefully this is still okay...8.5MB bzip2d, 230MB unzipped.
>>>>>>>>>
>>>>>>>>> -Nick
>>>>>>>>>
>>>>>>>>>>>> On 2012/10/15 at 11:47, Gregory Farnum  wrote:
>>>>>>>>>> Yeah, zip it and post - somebody's going to have to download it and do
>>>>>>>>>> fun things. :)
>>>>>>>>>> -Greg
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman wrote:
>>>>>>>>>>> Anywhere in particular I should make it available?  It's a little over a
>>>>>>>>>>> million lines of debug in the file - I can put it on a pastebin, if that
>>>>>>>>>>> works, or perhaps zip it up and throw it somewhere?
>>>>>>>>>>>
>>>>>>>>>>> -Nick
>>>>>>>>>>>
>>>>>>>>>>>>>> On 2012/10/15 at 11:26, Gregory Farnum  wrote:
>>>>>>>>>>>> Something in the MDS log is bad or is poking at a bug in the code. Can
>>>>>>>>>>>> you turn on MDS debugging and restart a daemon and put that log
>>>>>>>>>>>> somewhere accessible?
>>>>>>>>>>>> debug mds = 20
>>>>>>>>>>>> debug journaler = 20
>>>>>>>>>>>> debug ms = 1
>>>>>>>>>>>> -Greg
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman wrote:
>>>>>>>>>>>>> Well, both of my MDSs seem to be down right now, and then continually
>>>>>>>>>>>>> segfault (every time I try to start them) with the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>>>>>>>>> starting mds.b at :/0
>>>>>>>>>>>>> *** Caught signal (Segmentation fault) **
>>>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>>>    ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>>>> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation fault) **
>>>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>>>
>>>>>>>>>>>>>    ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
>>>>>>>>>>>>>
>>>>>>>>>>>>>        0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation fault) **
>>>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>>>
>>>>>>>>>>>>>    ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Segmentation fault
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anyone have any hints on recovering?  I'm running 0.48.1argonaut - I can
>>>>>>>>>>>>> attempt to upgrade to 0.48.2 and see if that helps, but I figured I'd ask
>>>>>>>>>>>>> whether anyone can offer any insight as to what to do to get the replay
>>>>>>>>>>>>> to run without segfaulting.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
  2012-10-19 20:52                     ` Nick Couchman
@ 2012-10-19 22:15                       ` Gregory Farnum
  0 siblings, 0 replies; 19+ messages in thread
From: Gregory Farnum @ 2012-10-19 22:15 UTC (permalink / raw)
  To: Nick Couchman; +Cc: ceph-devel

I've written a small patch on top of v0.48.1argonaut which should
avoid this. It's in branch 3369-mds-session-workaround and will simply
log an error in the monitor central log instead of segfaulting. There
should shortly be packages available at
http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/3369-mds-session-workaround/
(for Precise amd64; or elsewhere if you're on a different platform?).
-Greg
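
For illustration only, the behavior described above (log an error and keep
going instead of crashing) could look roughly like the Python sketch below.
The real change is in the 3369-mds-session-workaround branch and is not shown
in this thread. replay_tolerant, the (op, client) event tuples, the client
name and the errors list are all made up for the example; the list merely
stands in for the monitor central log.

```python
# Hypothetical sketch; the real patch is C++ and is not shown in this thread.
def replay_tolerant(events, errors):
    """Replay (op, client) events, recording unexpected closes in `errors`."""
    sessions = set()
    for op, client in events:
        if op == "open":
            sessions.add(client)
        elif op == "close":
            if client in sessions:
                sessions.remove(client)
            else:
                # Report the inconsistency and keep replaying.
                errors.append("replay: close for unknown session " + client)
    return sessions


errors = []  # stands in for the monitor central log
remaining = replay_tolerant(
    [("open", "client.1"), ("close", "client.1"), ("close", "client.1")],
    errors,
)
print(remaining, errors)
```

The second close is reported rather than acted on, so replay can run to the
end of the journal.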

On Fri, Oct 19, 2012 at 1:52 PM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
> One of the MDSs crashed over the weekend (late Friday night), but I believe that one was not active and was just in Replay mode.  Other than that, I don't know of anything that would have affected the MDSs.
>
> -Nick
>
>>>> On 2012/10/18 at 16:55, Gregory Farnum <greg@inktank.com> wrote:
>> Okay, looked at this a little bit. Can you describe what was happening
>> before you got into this failed-replay loop? (So, why was it in replay
>> at all?) I see that the monitor marked it as laggy for some reason;
>> was the cluster under load; did the monitors break; something else?
>> I can see why it's failed here and I think I can do a simple code
>> patch to work around it, but the root cause is something that happened
>> while the MDS was still alive.
>>
>> Basic technical content:
>> The MDS journals all open client sessions. It brings them back into
>> memory during replay, and then operates on them to do things like open
>> new sessions or close ones that it turns out not to need. Your log
>> contains two close events for the same client session, and it's
>> causing a big freak out. This actually feels somewhat familiar; I'll
>> talk about it with our team here and get back to you tomorrow
>> sometime.
>> -Greg
>>
>> On Thu, Oct 18, 2012 at 8:56 AM, Nick Couchman <Nick.Couchman@seakr.com>
>> wrote:
>>> Hopefully this is what you're looking for...
>>> (gdb) bt
>>> #0  ESession::replay (this=0x7fffcc49a7c0, mds=0x127d5f0) at mds/journal.cc:828
>>> #1  0x00000000006a2446 in MDLog::_replay_thread (this=0x1281390) at mds/MDLog.cc:580
>>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry (this=<optimized out>) at mds/MDLog.h:86
>>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>>>
>>>>>> On 2012/10/17 at 09:53, Sam Lang <sam.lang@inktank.com> wrote:
>>>> On 10/17/2012 09:42 AM, Nick Couchman wrote:
>>>>> Thanks...here's the backtrace:
>>>>> (gdb) bt
>>>>> #0  0x00000000004dcfea in ESession::replay(MDS*) ()
>>>>> #1  0x00000000006a2446 in MDLog::_replay_thread() ()
>>>>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry() ()
>>>>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>>>>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>>>>
>>>> Hi Nick,
>>>>
>>>> This doesn't have the debug symbols (line numbers in the source) we were
>>>> hoping for.  Could you install the ceph-dbg package and rerun?  You will
>>>> probably have to first uninstall the ceph package.
>>>>
>>>> Thanks,
>>>> -sam
>>>>
>>>>>
>>>>>>>> On 2012/10/17 at 07:34, Sam Lang <sam.lang@inktank.com> wrote:
>>>>>> On 10/16/2012 06:04 PM, Gregory Farnum wrote:
>>>>>>> Okay, that's the right debugging but it wasn't quite as helpful on its
>>>>>>> own as I expected. Can you get a core dump (you might already have
>>>>>>> one, depending on system settings) of the crash and open it up with
>>>>>>> gdb and get a full backtrace?
>>>>>>
>>>>>> You can also run the mds directly in gdb and avoid any core file ulimit
>>>>>> settings you have set:
>>>>>>
>>>>>>   > gdb --args ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>> ...
>>>>>> (gdb) run
>>>>>>
>>>>>> Once you hit the segfault you can get the backtrace with:
>>>>>>
>>>>>> (gdb) bt
>>>>>>
>>>>>> -sam
>>>>>>
>>>>>>
>>>>>>> -Greg
>>>>>>>
>>>>>>> On Mon, Oct 15, 2012 at 10:59 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
>>>>>>>> Well, hopefully this is still okay...8.5MB bzip2d, 230MB unzipped.
>>>>>>>>
>>>>>>>> -Nick
>>>>>>>>
>>>>>>>>>>> On 2012/10/15 at 11:47, Gregory Farnum <greg@inktank.com> wrote:
>>>>>>>>> Yeah, zip it and post - somebody's going to have to download it and do
>>>>>>>>> fun things. :)
>>>>>>>>> -Greg
>>>>>>>>>
>>>>>>>>> On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
>>>>>>>>>> Anywhere in particular I should make it available?  It's a little over a
>>>>>>>>>> million lines of debug in the file - I can put it on a pastebin, if that
>>>>>>>>>> works, or perhaps zip it up and throw it somewhere?
>>>>>>>>>>
>>>>>>>>>> -Nick
>>>>>>>>>>
>>>>>>>>>>>>> On 2012/10/15 at 11:26, Gregory Farnum <greg@inktank.com> wrote:
>>>>>>>>>>> Something in the MDS log is bad or is poking at a bug in the code. Can
>>>>>>>>>>> you turn on MDS debugging and restart a daemon and put that log
>>>>>>>>>>> somewhere accessible?
>>>>>>>>>>> debug mds = 20
>>>>>>>>>>> debug journaler = 20
>>>>>>>>>>> debug ms = 1
>>>>>>>>>>> -Greg
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
>>>>>>>>>>>> Well, both of my MDSs seem to be down right now, and then continually
>>>>>>>>>>>> segfault (every time I try to start them) with the following:
>>>>>>>>>>>>
>>>>>>>>>>>> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>>>>>>>> starting mds.b at :/0
>>>>>>>>>>>> *** Caught signal (Segmentation fault) **
>>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>>    ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>>> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation fault) **
>>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>>
>>>>>>>>>>>>    ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>>        0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation fault) **
>>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>>
>>>>>>>>>>>>    ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>>
>>>>>>>>>>>> Segmentation fault
>>>>>>>>>>>>
>>>>>>>>>>>> Anyone have any hints on recovering?  I'm running 0.48.1argonaut - I can
>>>>>>>>>>>> attempt to upgrade to 0.48.2 and see if that helps, but I figured if anyone
>>>>>>>>>>>> can offer any insight as to what to do to get the replay to run without
>>>>>>>>>>>> segfaulting?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
  2012-10-18 22:55                   ` Gregory Farnum
@ 2012-10-19 20:52                     ` Nick Couchman
  2012-10-19 22:15                       ` Gregory Farnum
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Couchman @ 2012-10-19 20:52 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Sam Lang, ceph-devel

One of the MDSs crashed over the weekend (late Friday night), but I believe that one was not active and was just in Replay mode.  Other than that, I don't know of anything that would have affected the MDSs.

-Nick

>>> On 2012/10/18 at 16:55, Gregory Farnum <greg@inktank.com> wrote: 
> Okay, looked at this a little bit. Can you describe what was happening
> before you got into this failed-replay loop? (So, why was it in replay
> at all?) I see that the monitor marked it as laggy for some reason;
> was the cluster under load; did the monitors break; something else?
> I can see why it's failed here and I think I can do a simple code
> patch to work around it, but the root cause is something that happened
> while the MDS was still alive.
> 
> Basic technical content:
> The MDS journals all open client sessions. It brings them back into
> memory during replay, and then operates on them to do things like open
> new sessions or close ones that it turns out not to need. Your log
> contains two close events for the same client session, and it's
> causing a big freak out. This actually feels somewhat familiar; I'll
> talk about it with our team here and get back to you tomorrow
> sometime.
> -Greg
> 
> On Thu, Oct 18, 2012 at 8:56 AM, Nick Couchman <Nick.Couchman@seakr.com> 
> wrote:
>> Hopefully this is what you're looking for...
>> (gdb) bt
>> #0  ESession::replay (this=0x7fffcc49a7c0, mds=0x127d5f0) at mds/journal.cc:828
>> #1  0x00000000006a2446 in MDLog::_replay_thread (this=0x1281390) at mds/MDLog.cc:580
>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry (this=<optimized out>) at mds/MDLog.h:86
>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>>
>>>>> On 2012/10/17 at 09:53, Sam Lang <sam.lang@inktank.com> wrote:
>>> On 10/17/2012 09:42 AM, Nick Couchman wrote:
>>>> Thanks...here's the backtrace:
>>>> (gdb) bt
>>>> #0  0x00000000004dcfea in ESession::replay(MDS*) ()
>>>> #1  0x00000000006a2446 in MDLog::_replay_thread() ()
>>>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry() ()
>>>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>>>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>>>
>>> Hi Nick,
>>>
>>> This doesn't have the debug symbols (line numbers in the source) we were
>>> hoping for.  Could you install the ceph-dbg package and rerun?  You will
>>> probably have to first uninstall the ceph package.
>>>
>>> Thanks,
>>> -sam
>>>
>>>>
>>>>>>> On 2012/10/17 at 07:34, Sam Lang <sam.lang@inktank.com> wrote:
>>>>> On 10/16/2012 06:04 PM, Gregory Farnum wrote:
>>>>>> Okay, that's the right debugging but it wasn't quite as helpful on its
>>>>>> own as I expected. Can you get a core dump (you might already have
>>>>>> one, depending on system settings) of the crash and open it up with
>>>>>> gdb and get a full backtrace?
>>>>>
>>>>> You can also run the mds directly in gdb and avoid any core file ulimit
>>>>> settings you have set:
>>>>>
>>>>>   > gdb --args ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>> ...
>>>>> (gdb) run
>>>>>
>>>>> Once you hit the segfault you can get the backtrace with:
>>>>>
>>>>> (gdb) bt
>>>>>
>>>>> -sam
>>>>>
>>>>>
>>>>>> -Greg
>>>>>>
>>>>>> On Mon, Oct 15, 2012 at 10:59 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
>>>>>>> Well, hopefully this is still okay...8.5MB bzip2d, 230MB unzipped.
>>>>>>>
>>>>>>> -Nick
>>>>>>>
>>>>>>>>>> On 2012/10/15 at 11:47, Gregory Farnum <greg@inktank.com> wrote:
>>>>>>>> Yeah, zip it and post - somebody's going to have to download it and do
>>>>>>>> fun things. :)
>>>>>>>> -Greg
>>>>>>>>
>>>>>>>> On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
>>>>>>>>> Anywhere in particular I should make it available?  It's a little over a
>>>>>>>>> million lines of debug in the file - I can put it on a pastebin, if that
>>>>>>>>> works, or perhaps zip it up and throw it somewhere?
>>>>>>>>>
>>>>>>>>> -Nick
>>>>>>>>>
>>>>>>>>>>>> On 2012/10/15 at 11:26, Gregory Farnum <greg@inktank.com> wrote:
>>>>>>>>>> Something in the MDS log is bad or is poking at a bug in the code. Can
>>>>>>>>>> you turn on MDS debugging and restart a daemon and put that log
>>>>>>>>>> somewhere accessible?
>>>>>>>>>> debug mds = 20
>>>>>>>>>> debug journaler = 20
>>>>>>>>>> debug ms = 1
>>>>>>>>>> -Greg
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
>>>>>>>>>>> Well, both of my MDSs seem to be down right now, and then continually
>>>>>>>>>>> segfault (every time I try to start them) with the following:
>>>>>>>>>>>
>>>>>>>>>>> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>>>>>>> starting mds.b at :/0
>>>>>>>>>>> *** Caught signal (Segmentation fault) **
>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>    ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation fault) **
>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>
>>>>>>>>>>>    ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>
>>>>>>>>>>>        0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation fault) **
>>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>>
>>>>>>>>>>>    ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>
>>>>>>>>>>> Segmentation fault
>>>>>>>>>>>
>>>>>>>>>>> Anyone have any hints on recovering?  I'm running 0.48.1argonaut - I can
>>>>>>>>>>> attempt to upgrade to 0.48.2 and see if that helps, but I figured if anyone
>>>>>>>>>>> can offer any insight as to what to do to get the replay to run without
>>>>>>>>>>> segfaulting?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
  2012-10-18 15:56                 ` Nick Couchman
  2012-10-18 16:20                   ` Gregory Farnum
@ 2012-10-18 22:55                   ` Gregory Farnum
  2012-10-19 20:52                     ` Nick Couchman
  1 sibling, 1 reply; 19+ messages in thread
From: Gregory Farnum @ 2012-10-18 22:55 UTC (permalink / raw)
  To: Nick Couchman; +Cc: Sam Lang, ceph-devel

Okay, looked at this a little bit. Can you describe what was happening
before you got into this failed-replay loop? (So, why was it in replay
at all?) I see that the monitor marked it as laggy for some reason;
was the cluster under load; did the monitors break; something else?
I can see why it's failed here and I think I can do a simple code
patch to work around it, but the root cause is something that happened
while the MDS was still alive.

Basic technical content:
The MDS journals all open client sessions. It brings them back into
memory during replay, and then operates on them to do things like open
new sessions or close ones that it turns out not to need. Your log
contains two close events for the same client session, and it's
causing a big freak out. This actually feels somewhat familiar; I'll
talk about it with our team here and get back to you tomorrow
sometime.
-Greg
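
A rough sketch of the failure mode described above, written in Python rather
than the MDS's C++. Everything in it is hypothetical (the replay function, the
(op, client) event tuples, the client name). It only models the idea that a
replay step which assumes the session still exists breaks on a second close
for the same client, and it says nothing about what journal.cc actually does
at line 828.

```python
# Hypothetical model of journal replay; none of these names come from Ceph.
def replay(events):
    """Rebuild the open-session table from journaled (op, client) events."""
    sessions = {}
    for op, client in events:
        if op == "open":
            sessions[client] = "open"
        elif op == "close":
            # Naive replay assumes the session still exists. A second close
            # for the same client finds nothing and raises KeyError, a tame
            # stand-in for the segfault in ESession::replay.
            del sessions[client]
    return sessions


# Assumed journal contents: one open followed by two closes for one client.
journal = [
    ("open", "client.1"),
    ("close", "client.1"),
    ("close", "client.1"),
]

try:
    replay(journal)
    failed = False
except KeyError:
    failed = True

print("replay failed:", failed)
```

Because the bad event is in the journal itself, every restart replays it and
fails in the same place, which matches the loop seen here.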

On Thu, Oct 18, 2012 at 8:56 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
> Hopefully this is what you're looking for...
> (gdb) bt
> #0  ESession::replay (this=0x7fffcc49a7c0, mds=0x127d5f0) at mds/journal.cc:828
> #1  0x00000000006a2446 in MDLog::_replay_thread (this=0x1281390) at mds/MDLog.cc:580
> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry (this=<optimized out>) at mds/MDLog.h:86
> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>
>>>> On 2012/10/17 at 09:53, Sam Lang <sam.lang@inktank.com> wrote:
>> On 10/17/2012 09:42 AM, Nick Couchman wrote:
>>> Thanks...here's the backtrace:
>>> (gdb) bt
>>> #0  0x00000000004dcfea in ESession::replay(MDS*) ()
>>> #1  0x00000000006a2446 in MDLog::_replay_thread() ()
>>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry() ()
>>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>>
>> Hi Nick,
>>
>> This doesn't have the debug symbols (line numbers in the source) we were
>> hoping for.  Could you install the ceph-dbg package and rerun?  You will
>> probably have to first uninstall the ceph package.
>>
>> Thanks,
>> -sam
>>
>>>
>>>>>> On 2012/10/17 at 07:34, Sam Lang <sam.lang@inktank.com> wrote:
>>>> On 10/16/2012 06:04 PM, Gregory Farnum wrote:
>>>>> Okay, that's the right debugging but it wasn't quite as helpful on its
>>>>> own as I expected. Can you get a core dump (you might already have
>>>>> one, depending on system settings) of the crash and open it up with
>>>>> gdb and get a full backtrace?
>>>>
>>>> You can also run the mds directly in gdb and avoid any core file ulimit
>>>> settings you have set:
>>>>
>>>>   > gdb --args ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>> ...
>>>> (gdb) run
>>>>
>>>> Once you hit the segfault you can get the backtrace with:
>>>>
>>>> (gdb) bt
>>>>
>>>> -sam
>>>>
>>>>
>>>>> -Greg
>>>>>
>>>>> On Mon, Oct 15, 2012 at 10:59 AM, Nick Couchman <Nick.Couchman@seakr.com>
>>>> wrote:
>>>>>> Well, hopefully this is still okay...8.5MB bzip2d, 230MB unzipped.
>>>>>>
>>>>>> -Nick
>>>>>>
>>>>>>>>> On 2012/10/15 at 11:47, Gregory Farnum <greg@inktank.com> wrote:
>>>>>>> Yeah, zip it and post - somebody's going to have to download it and
>>>>>> do
>>>>>>> fun things. :)
>>>>>>> -Greg
>>>>>>>
>>>>>>> On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman
>>>>>> <Nick.Couchman@seakr.com>
>>>>>>> wrote:
>>>>>>>> Anywhere in particular I should make it available?  It's a little
>>>>>> over a
>>>>>>> million lines of debug in the file - I can put it on a pastebin, if
>>>>>> that
>>>>>>> works, or perhaps zip it up and throw it somewhere?
>>>>>>>>
>>>>>>>> -Nick
>>>>>>>>
>>>>>>>>>>> On 2012/10/15 at 11:26, Gregory Farnum <greg@inktank.com> wrote:
>>>>>>>>> Something in the MDS log is bad or is poking at a bug in the code.
>>>>>> Can
>>>>>>>>> you turn on MDS debugging and restart a daemon and put that log
>>>>>>>>> somewhere accessible?
>>>>>>>>> debug mds = 20
>>>>>>>>> debug journaler = 20
>>>>>>>>> debug ms = 1
>>>>>>>>> -Greg
>>>>>>>>>
>>>>>>>>> On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman
>>>>>> <Nick.Couchman@seakr.com>
>>>>>>>>> wrote:
>>>>>>>>>> Well, both of my MDSs seem to be down right now, and then
>>>>>> continually
>>>>>>>>> segfault (every time I try to start them) with the following:
>>>>>>>>>>
>>>>>>>>>> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>>>>>> starting mds.b at :/0
>>>>>>>>>> *** Caught signal (Segmentation fault) **
>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal
>>>>>> (Segmentation
>>>>>>>>> fault) **
>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>
>>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>> needed to
>>>>>>>>> interpret this.
>>>>>>>>>>
>>>>>>>>>>        0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught
>>>>>> signal
>>>>>>>>> (Segmentation fault) **
>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>
>>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>> needed to
>>>>>>>>> interpret this.
>>>>>>>>>>
>>>>>>>>>> Segmentation fault
>>>>>>>>>>
>>>>>>>>>> Anyone have any hints on recovering?  I'm running 0.48.1argonaut -
>>>>>> I can
>>>>>>>>> attempt to upgrade to 0.48.2 and see if that helps, but I figured
>>>>>> if anyone
>>>>>>>>> can offer any insight as to what to do to get the replay to run
>>>>>> without
>>>>>>>>> segfaulting?
>>>>>>>>>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
  2012-10-18 15:56                 ` Nick Couchman
@ 2012-10-18 16:20                   ` Gregory Farnum
  2012-10-18 22:55                   ` Gregory Farnum
  1 sibling, 0 replies; 19+ messages in thread
From: Gregory Farnum @ 2012-10-18 16:20 UTC (permalink / raw)
  To: Nick Couchman; +Cc: Sam Lang, ceph-devel

Yep, thanks! I'll have to go through and see if I can figure out
what's going on there.

On Thu, Oct 18, 2012 at 8:56 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
> Hopefully this is what you're looking for...
> (gdb) bt
> #0  ESession::replay (this=0x7fffcc49a7c0, mds=0x127d5f0) at mds/journal.cc:828
> #1  0x00000000006a2446 in MDLog::_replay_thread (this=0x1281390) at mds/MDLog.cc:580
> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry (this=<optimized out>) at mds/MDLog.h:86
> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>
>>>> On 2012/10/17 at 09:53, Sam Lang <sam.lang@inktank.com> wrote:
>> On 10/17/2012 09:42 AM, Nick Couchman wrote:
>>> Thanks...here's the backtrace:
>>> (gdb) bt
>>> #0  0x00000000004dcfea in ESession::replay(MDS*) ()
>>> #1  0x00000000006a2446 in MDLog::_replay_thread() ()
>>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry() ()
>>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>>
>> Hi Nick,
>>
>> This doesn't have the debug symbols (line numbers in the source) we were
>> hoping for.  Could you install the ceph-dbg package and rerun?  You will
>> probably have to first uninstall the ceph package.
>>
>> Thanks,
>> -sam
>>
>>>
>>>>>> On 2012/10/17 at 07:34, Sam Lang <sam.lang@inktank.com> wrote:
>>>> On 10/16/2012 06:04 PM, Gregory Farnum wrote:
>>>>> Okay, that's the right debugging but it wasn't quite as helpful on its
>>>>> own as I expected. Can you get a core dump (you might already have
>>>>> one, depending on system settings) of the crash and open it up with
>>>>> gdb and get a full backtrace?
>>>>
>>>> You can also run the mds directly in gdb and avoid any core file ulimit
>>>> settings you have set:
>>>>
>>>>   > gdb --args ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>> ...
>>>> (gdb) run
>>>>
>>>> Once you hit the segfault you can get the backtrace with:
>>>>
>>>> (gdb) bt
>>>>
>>>> -sam
>>>>
>>>>
>>>>> -Greg
>>>>>
>>>>> On Mon, Oct 15, 2012 at 10:59 AM, Nick Couchman <Nick.Couchman@seakr.com>
>>>> wrote:
>>>>>> Well, hopefully this is still okay...8.5MB bzip2d, 230MB unzipped.
>>>>>>
>>>>>> -Nick
>>>>>>
>>>>>>>>> On 2012/10/15 at 11:47, Gregory Farnum <greg@inktank.com> wrote:
>>>>>>> Yeah, zip it and post - somebody's going to have to download it and
>>>>>> do
>>>>>>> fun things. :)
>>>>>>> -Greg
>>>>>>>
>>>>>>> On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman
>>>>>> <Nick.Couchman@seakr.com>
>>>>>>> wrote:
>>>>>>>> Anywhere in particular I should make it available?  It's a little
>>>>>> over a
>>>>>>> million lines of debug in the file - I can put it on a pastebin, if
>>>>>> that
>>>>>>> works, or perhaps zip it up and throw it somewhere?
>>>>>>>>
>>>>>>>> -Nick
>>>>>>>>
>>>>>>>>>>> On 2012/10/15 at 11:26, Gregory Farnum <greg@inktank.com> wrote:
>>>>>>>>> Something in the MDS log is bad or is poking at a bug in the code.
>>>>>> Can
>>>>>>>>> you turn on MDS debugging and restart a daemon and put that log
>>>>>>>>> somewhere accessible?
>>>>>>>>> debug mds = 20
>>>>>>>>> debug journaler = 20
>>>>>>>>> debug ms = 1
>>>>>>>>> -Greg
>>>>>>>>>
>>>>>>>>> On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman
>>>>>> <Nick.Couchman@seakr.com>
>>>>>>>>> wrote:
>>>>>>>>>> Well, both of my MDSs seem to be down right now, and then
>>>>>> continually
>>>>>>>>> segfault (every time I try to start them) with the following:
>>>>>>>>>>
>>>>>>>>>> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>>>>>> starting mds.b at :/0
>>>>>>>>>> *** Caught signal (Segmentation fault) **
>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal
>>>>>> (Segmentation
>>>>>>>>> fault) **
>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>
>>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>> needed to
>>>>>>>>> interpret this.
>>>>>>>>>>
>>>>>>>>>>        0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught
>>>>>> signal
>>>>>>>>> (Segmentation fault) **
>>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>>
>>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>> needed to
>>>>>>>>> interpret this.
>>>>>>>>>>
>>>>>>>>>> Segmentation fault
>>>>>>>>>>
>>>>>>>>>> Anyone have any hints on recovering?  I'm running 0.48.1argonaut -
>>>>>> I can
>>>>>>>>> attempt to upgrade to 0.48.2 and see if that helps, but I figured
>>>>>> if anyone
>>>>>>>>> can offer any insight as to what to do to get the replay to run
>>>>>> without
>>>>>>>>> segfaulting?
>>>>>>>>>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
  2012-10-17 15:53               ` Sam Lang
  2012-10-17 16:23                 ` Nick Couchman
@ 2012-10-18 15:56                 ` Nick Couchman
  2012-10-18 16:20                   ` Gregory Farnum
  2012-10-18 22:55                   ` Gregory Farnum
  1 sibling, 2 replies; 19+ messages in thread
From: Nick Couchman @ 2012-10-18 15:56 UTC (permalink / raw)
  To: Sam Lang; +Cc: Gregory Farnum, ceph-devel

Hopefully this is what you're looking for...
(gdb) bt
#0  ESession::replay (this=0x7fffcc49a7c0, mds=0x127d5f0) at mds/journal.cc:828
#1  0x00000000006a2446 in MDLog::_replay_thread (this=0x1281390) at mds/MDLog.cc:580
#2  0x00000000004cf5ed in MDLog::ReplayThread::entry (this=<optimized out>) at mds/MDLog.h:86
#3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff680d10d in clone () from /lib64/libc.so.6

>>> On 2012/10/17 at 09:53, Sam Lang <sam.lang@inktank.com> wrote: 
> On 10/17/2012 09:42 AM, Nick Couchman wrote:
>> Thanks...here's the backtrace:
>> (gdb) bt
>> #0  0x00000000004dcfea in ESession::replay(MDS*) ()
>> #1  0x00000000006a2446 in MDLog::_replay_thread() ()
>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry() ()
>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
> 
> Hi Nick,
> 
> This doesn't have the debug symbols (line numbers in the source) we were 
> hoping for.  Could you install the ceph-dbg package and rerun?  You will
> probably have to first uninstall the ceph package.
> 
> Thanks,
> -sam
> 
>>
>>>>> On 2012/10/17 at 07:34, Sam Lang <sam.lang@inktank.com> wrote:
>>> On 10/16/2012 06:04 PM, Gregory Farnum wrote:
>>>> Okay, that's the right debugging but it wasn't quite as helpful on its
>>>> own as I expected. Can you get a core dump (you might already have
>>>> one, depending on system settings) of the crash and open it up with
>>>> gdb and get a full backtrace?
>>>
>>> You can also run the mds directly in gdb and avoid any core file ulimit
>>> settings you have set:
>>>
>>>   > gdb --args ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>> ...
>>> (gdb) run
>>>
>>> Once you hit the segfault you can get the backtrace with:
>>>
>>> (gdb) bt
>>>
>>> -sam
>>>
>>>
>>>> -Greg
>>>>
>>>> On Mon, Oct 15, 2012 at 10:59 AM, Nick Couchman <Nick.Couchman@seakr.com>
>>> wrote:
>>>>> Well, hopefully this is still okay...8.5MB bzip2d, 230MB unzipped.
>>>>>
>>>>> -Nick
>>>>>
>>>>>>>> On 2012/10/15 at 11:47, Gregory Farnum <greg@inktank.com> wrote:
>>>>>> Yeah, zip it and post - somebody's going to have to download it and
>>>>> do
>>>>>> fun things. :)
>>>>>> -Greg
>>>>>>
>>>>>> On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman
>>>>> <Nick.Couchman@seakr.com>
>>>>>> wrote:
>>>>>>> Anywhere in particular I should make it available?  It's a little
>>>>> over a
>>>>>> million lines of debug in the file - I can put it on a pastebin, if
>>>>> that
>>>>>> works, or perhaps zip it up and throw it somewhere?
>>>>>>>
>>>>>>> -Nick
>>>>>>>
>>>>>>>>>> On 2012/10/15 at 11:26, Gregory Farnum <greg@inktank.com> wrote:
>>>>>>>> Something in the MDS log is bad or is poking at a bug in the code.
>>>>> Can
>>>>>>>> you turn on MDS debugging and restart a daemon and put that log
>>>>>>>> somewhere accessible?
>>>>>>>> debug mds = 20
>>>>>>>> debug journaler = 20
>>>>>>>> debug ms = 1
>>>>>>>> -Greg
>>>>>>>>
>>>>>>>> On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman
>>>>> <Nick.Couchman@seakr.com>
>>>>>>>> wrote:
>>>>>>>>> Well, both of my MDSs seem to be down right now, and then
>>>>> continually
>>>>>>>> segfault (every time I try to start them) with the following:
>>>>>>>>>
>>>>>>>>> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>>>>> starting mds.b at :/0
>>>>>>>>> *** Caught signal (Segmentation fault) **
>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal
>>>>> (Segmentation
>>>>>>>> fault) **
>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>
>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>> needed to
>>>>>>>> interpret this.
>>>>>>>>>
>>>>>>>>>        0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught
>>>>> signal
>>>>>>>> (Segmentation fault) **
>>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>>
>>>>>>>>>    ceph version 0.48.1argonaut
>>>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>> needed to
>>>>>>>> interpret this.
>>>>>>>>>
>>>>>>>>> Segmentation fault
>>>>>>>>>
>>>>>>>>> Anyone have any hints on recovering?  I'm running 0.48.1argonaut -
>>>>> I can
>>>>>>>> attempt to upgrade to 0.48.2 and see if that helps, but I figured
>>>>> if anyone
>>>>>>>> can offer any insight as to what to do to get the replay to run
>>>>> without
>>>>>>>> segfaulting?
>>>>>>>>>



--------

This e-mail may contain confidential and privileged material for the sole use of the intended recipient.  If this email is not intended for you, or you are not responsible for the delivery of this message to the intended recipient, please note that this message may contain SEAKR Engineering (SEAKR) Privileged/Proprietary Information.  In such a case, you are strictly prohibited from downloading, photocopying, distributing or otherwise using this message, its contents or attachments in any way.  If you have received this message in error, please notify us immediately by replying to this e-mail and delete the message from your mailbox.  Information contained in this message that does not relate to the business of SEAKR is neither endorsed by nor attributable to SEAKR.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
  2012-10-17 16:23                 ` Nick Couchman
@ 2012-10-17 17:03                   ` Sam Lang
  0 siblings, 0 replies; 19+ messages in thread
From: Sam Lang @ 2012-10-17 17:03 UTC (permalink / raw)
  To: Nick Couchman; +Cc: Gregory Farnum, ceph-devel

On 10/17/2012 11:23 AM, Nick Couchman wrote:
> Hmmm...I don't seem to have the dbg packages built...will have to go back and figure out how to build those.
>

Ah, I thought you had installed from the Debian binaries.  If you compiled
ceph yourself, you have to reconfigure with -g in CXXFLAGS to get the
debugging symbols:

./configure CXXFLAGS=-g
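
A minimal sketch of the rebuild plus a check that the result really has
debug info.  The helper names rebuild_with_debug and has_debug_info are
made up for illustration, and the "make" step and the readelf check are
assumptions added here, not something stated in the thread:

```shell
#!/bin/sh
# Sketch only: rebuild_with_debug and has_debug_info are hypothetical
# helper names, not part of Ceph.

# Reconfigure with -g (as suggested above) and rebuild.  Run this from the
# top of the source tree; the "make" step is an assumption.
rebuild_with_debug() {
    ./configure CXXFLAGS=-g && make
}

# Succeed only if the given ELF binary carries a DWARF .debug_info section.
# Fails (non-zero) for missing files, non-ELF files, or stripped binaries.
has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}
```

Something like "has_debug_info src/ceph-mds && echo ok" after the build
would then confirm the symbols are there before rerunning gdb (the
src/ceph-mds path is a guess at where the build leaves the binary).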

-sam

> -Nick
>
>>>> On 2012/10/17 at 09:53, Sam Lang <sam.lang@inktank.com> wrote:
>> On 10/17/2012 09:42 AM, Nick Couchman wrote:
>>> Thanks...here's the backtrace:
>>> (gdb) bt
>>> #0  0x00000000004dcfea in ESession::replay(MDS*) ()
>>> #1  0x00000000006a2446 in MDLog::_replay_thread() ()
>>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry() ()
>>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
>>
>> Hi Nick,
>>
>> This doesn't have the debug symbols (line numbers in the source) we were
>> hoping for.  Could you install the ceph-dbg package and rerun?  You will
>> probably have to first uninstall the ceph package.
>>
>> Thanks,
>> -sam
>>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
  2012-10-17 15:53               ` Sam Lang
@ 2012-10-17 16:23                 ` Nick Couchman
  2012-10-17 17:03                   ` Sam Lang
  2012-10-18 15:56                 ` Nick Couchman
  1 sibling, 1 reply; 19+ messages in thread
From: Nick Couchman @ 2012-10-17 16:23 UTC (permalink / raw)
  To: Sam Lang; +Cc: Gregory Farnum, ceph-devel

Hmmm...I don't seem to have the dbg packages built...will have to go back and figure out how to build those.

-Nick

>>> On 2012/10/17 at 09:53, Sam Lang <sam.lang@inktank.com> wrote: 
> On 10/17/2012 09:42 AM, Nick Couchman wrote:
>> Thanks...here's the backtrace:
>> (gdb) bt
>> #0  0x00000000004dcfea in ESession::replay(MDS*) ()
>> #1  0x00000000006a2446 in MDLog::_replay_thread() ()
>> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry() ()
>> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
>> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
> 
> Hi Nick,
> 
> This doesn't have the debug symbols (line numbers in the source) we were
> hoping for.  Could you install the ceph-dbg package and rerun?  You will
> probably have to first uninstall the ceph package.
> 
> Thanks,
> -sam
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
  2012-10-17 14:42             ` Nick Couchman
@ 2012-10-17 15:53               ` Sam Lang
  2012-10-17 16:23                 ` Nick Couchman
  2012-10-18 15:56                 ` Nick Couchman
  0 siblings, 2 replies; 19+ messages in thread
From: Sam Lang @ 2012-10-17 15:53 UTC (permalink / raw)
  To: Nick Couchman; +Cc: Gregory Farnum, ceph-devel

On 10/17/2012 09:42 AM, Nick Couchman wrote:
> Thanks...here's the backtrace:
> (gdb) bt
> #0  0x00000000004dcfea in ESession::replay(MDS*) ()
> #1  0x00000000006a2446 in MDLog::_replay_thread() ()
> #2  0x00000000004cf5ed in MDLog::ReplayThread::entry() ()
> #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
> #4  0x00007ffff680d10d in clone () from /lib64/libc.so.6

Hi Nick,

This doesn't have the debug symbols (line numbers in the source) we were
hoping for.  Could you install the ceph-dbg package and rerun?  You will
probably have to first uninstall the ceph package.
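
Once a binary with debug info is available, the frame addresses from the
backtrace above can also be mapped to source lines without rerunning the
daemon.  A small sketch; bt_addresses is a hypothetical helper name, and
the input format is assumed to be gdb's "#N  0xADDR in func ()" lines as
quoted above:

```shell
#!/bin/sh
# Sketch only: bt_addresses is a hypothetical helper name.

# Read a gdb "bt" listing on standard input and print the program-counter
# address of each frame, one per line.  Lines that are not frames are
# ignored.
bt_addresses() {
    sed -n 's/^#[0-9][0-9]* *\(0x[0-9a-f][0-9a-f]*\) in .*/\1/p'
}
```

The output can be piped into binutils' addr2line, which reads addresses
from standard input when none are given on the command line, e.g.
"bt_addresses < bt.txt | addr2line -Cfe /usr/bin/ceph-mds" (bt.txt and the
binary path are assumptions).  Only frames that live in ceph-mds itself
will resolve this way; the libpthread and libc frames will not, and a
binary without debug info only yields "??" placeholders.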

Thanks,
-sam

>
>>>> On 2012/10/17 at 07:34, Sam Lang <sam.lang@inktank.com> wrote:
>> On 10/16/2012 06:04 PM, Gregory Farnum wrote:
>>> Okay, that's the right debugging but it wasn't quite as helpful on its
>>> own as I expected. Can you get a core dump (you might already have
>>> one, depending on system settings) of the crash and open it up with
>>> gdb and get a full backtrace?
>>
>> You can also run the mds directly in gdb and avoid any core file ulimit
>> settings you have set:
>>
>>   > gdb --args ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>> ...
>> (gdb) run
>>
>> Once you hit the segfault you can get the backtrace with:
>>
>> (gdb) bt
>>
>> -sam
>>
>>
>>> -Greg
>>>
>>> On Mon, Oct 15, 2012 at 10:59 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
>>>> Well, hopefully this is still okay...8.5MB bzip2d, 230MB unzipped.
>>>>
>>>> -Nick
>>>>
>>>>>>> On 2012/10/15 at 11:47, Gregory Farnum <greg@inktank.com> wrote:
>>>>> Yeah, zip it and post - somebody's going to have to download it and do
>>>>> fun things. :)
>>>>> -Greg
>>>>>
>>>>> On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
>>>>>> Anywhere in particular I should make it available?  It's a little over a
>>>>>> million lines of debug in the file - I can put it on a pastebin, if that
>>>>>> works, or perhaps zip it up and throw it somewhere?
>>>>>>
>>>>>> -Nick
>>>>>>
>>>>>>>>> On 2012/10/15 at 11:26, Gregory Farnum <greg@inktank.com> wrote:
>>>>>>> Something in the MDS log is bad or is poking at a bug in the code. Can
>>>>>>> you turn on MDS debugging and restart a daemon and put that log
>>>>>>> somewhere accessible?
>>>>>>> debug mds = 20
>>>>>>> debug journaler = 20
>>>>>>> debug ms = 1
>>>>>>> -Greg
>>>>>>>
>>>>>>> On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
>>>>>>>> Well, both of my MDSs seem to be down right now, and then continually
>>>>>>>> segfault (every time I try to start them) with the following:
>>>>>>>>
>>>>>>>> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>>>> starting mds.b at :/0
>>>>>>>> *** Caught signal (Segmentation fault) **
>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>    ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation fault) **
>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>
>>>>>>>>    ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>
>>>>>>>>        0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation fault) **
>>>>>>>>    in thread 7fbe0d61d700
>>>>>>>>
>>>>>>>>    ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>>    1: ceph-mds() [0x7ef83a]
>>>>>>>>    2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>>    3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>>    4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>>    5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>>    6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>>    7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>
>>>>>>>> Segmentation fault
>>>>>>>>
>>>>>>>> Anyone have any hints on recovering?  I'm running 0.48.1argonaut - I can
>>>>>>>> attempt to upgrade to 0.48.2 and see if that helps, but can anyone offer
>>>>>>>> any insight as to what to do to get the replay to run without
>>>>>>>> segfaulting?
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
  2012-10-17 13:34           ` Sam Lang
@ 2012-10-17 14:42             ` Nick Couchman
  2012-10-17 15:53               ` Sam Lang
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Couchman @ 2012-10-17 14:42 UTC (permalink / raw)
  To: Gregory Farnum, Sam Lang; +Cc: ceph-devel

Thanks...here's the backtrace:
(gdb) bt
#0  0x00000000004dcfea in ESession::replay(MDS*) ()
#1  0x00000000006a2446 in MDLog::_replay_thread() ()
#2  0x00000000004cf5ed in MDLog::ReplayThread::entry() ()
#3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
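
For anyone who wants to compare this backtrace with the frames in the original crash output, here is a minimal Python sketch that splits symbol-less `bt` lines like the ones above into frame number, address, function and library. The `parse_bt` name and the regular expression are illustrative assumptions, not part of gdb or Ceph, and the pattern only handles the argument-less `func ()` form shown in this message.

```python
import re

# Matches symbol-less gdb frames such as:
#   #0  0x00000000004dcfea in ESession::replay(MDS*) ()
#   #3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
FRAME_RE = re.compile(
    r"^#(\d+)\s+(0x[0-9a-fA-F]+) in (.+?) \(\)(?: from (\S+))?$"
)


def parse_bt(text):
    """Return a list of (index, address, function, library) tuples."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            idx, addr, func, lib = match.groups()
            frames.append((int(idx), int(addr, 16), func, lib))
    return frames


# The backtrace exactly as posted in this message.
BT = """\
#0  0x00000000004dcfea in ESession::replay(MDS*) ()
#1  0x00000000006a2446 in MDLog::_replay_thread() ()
#2  0x00000000004cf5ed in MDLog::ReplayThread::entry() ()
#3  0x00007ffff764df05 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff680d10d in clone () from /lib64/libc.so.6
"""

if __name__ == "__main__":
    for idx, addr, func, lib in parse_bt(BT):
        # lib is None when gdb printed no "from <library>" suffix.
        print(f"#{idx} {addr:#x} {func} {lib or '(no library shown)'}")
```

Frame #0 is at 0x4dcfea, the same address the original crash output reported for `ESession::replay(MDS*)+0x3ea`, so gdb and the signal handler agree on where the segfault happens.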

>>> On 2012/10/17 at 07:34, Sam Lang <sam.lang@inktank.com> wrote: 
> On 10/16/2012 06:04 PM, Gregory Farnum wrote:
>> Okay, that's the right debugging but it wasn't quite as helpful on its
>> own as I expected. Can you get a core dump (you might already have
>> one, depending on system settings) of the crash and open it up with
>> gdb and get a full backtrace?
> 
> You can also run the mds directly in gdb and avoid any core file ulimit 
> settings you have set:
> 
>  > gdb --args ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
> ...
> (gdb) run
> 
> Once you hit the segfault you can get the backtrace with:
> 
> (gdb) bt
> 
> -sam
> 
> 
>> -Greg
>>
>> On Mon, Oct 15, 2012 at 10:59 AM, Nick Couchman <Nick.Couchman@seakr.com> 
> wrote:
>>> Well, hopefully this is still okay...8.5MB bzip2d, 230MB unzipped.
>>>
>>> -Nick
>>>
>>>>>> On 2012/10/15 at 11:47, Gregory Farnum <greg@inktank.com> wrote:
>>>> Yeah, zip it and post - somebody's going to have to download it and do
>>>> fun things. :)
>>>> -Greg
>>>>
>>>> On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman
>>> <Nick.Couchman@seakr.com>
>>>> wrote:
>>>>> Anywhere in particular I should make it available?  It's a little
>>> over a
>>>> million lines of debug in the file - I can put it on a pastebin, if
>>> that
>>>> works, or perhaps zip it up and throw it somewhere?
>>>>>
>>>>> -Nick
>>>>>
>>>>>>>> On 2012/10/15 at 11:26, Gregory Farnum <greg@inktank.com> wrote:
>>>>>> Something in the MDS log is bad or is poking at a bug in the code.
>>> Can
>>>>>> you turn on MDS debugging and restart a daemon and put that log
>>>>>> somewhere accessible?
>>>>>> debug mds = 20
>>>>>> debug journaler = 20
>>>>>> debug ms = 1
>>>>>> -Greg
>>>>>>
>>>>>> On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman
>>> <Nick.Couchman@seakr.com>
>>>>>> wrote:
>>>>>>> Well, both of my MDSs seem to be down right now, and then
>>> continually
>>>>>> segfault (every time I try to start them) with the following:
>>>>>>>
>>>>>>> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>>> starting mds.b at :/0
>>>>>>> *** Caught signal (Segmentation fault) **
>>>>>>>   in thread 7fbe0d61d700
>>>>>>>   ceph version 0.48.1argonaut
>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>   1: ceph-mds() [0x7ef83a]
>>>>>>>   2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>   3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>   4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>   5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>   6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>   7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal
>>> (Segmentation
>>>>>> fault) **
>>>>>>>   in thread 7fbe0d61d700
>>>>>>>
>>>>>>>   ceph version 0.48.1argonaut
>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>   1: ceph-mds() [0x7ef83a]
>>>>>>>   2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>   3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>   4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>   5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>   6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>   7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to
>>>>>> interpret this.
>>>>>>>
>>>>>>>       0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught
>>> signal
>>>>>> (Segmentation fault) **
>>>>>>>   in thread 7fbe0d61d700
>>>>>>>
>>>>>>>   ceph version 0.48.1argonaut
>>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>>   1: ceph-mds() [0x7ef83a]
>>>>>>>   2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>>   3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>>   4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>>   5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>>   6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>>   7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to
>>>>>> interpret this.
>>>>>>>
>>>>>>> Segmentation fault
>>>>>>>
>>>>>>> Anyone have any hints on recovering?  I'm running 0.48.1argonaut -
>>> I can
>>>>>> attempt to upgrade to 0.48.2 and see if that helps, but I figured
>>> if anyone
>>>>>> can offer any insight as to what to do to get the replay to run
>>> without
>>>>>> segfaulting?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
  2012-10-16 23:04         ` Gregory Farnum
@ 2012-10-17 13:34           ` Sam Lang
  2012-10-17 14:42             ` Nick Couchman
  0 siblings, 1 reply; 19+ messages in thread
From: Sam Lang @ 2012-10-17 13:34 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Nick Couchman, ceph-devel

On 10/16/2012 06:04 PM, Gregory Farnum wrote:
> Okay, that's the right debugging but it wasn't quite as helpful on its
> own as I expected. Can you get a core dump (you might already have
> one, depending on system settings) of the crash and open it up with
> gdb and get a full backtrace?

You can also run the mds directly in gdb and avoid any core file ulimit 
settings you have set:

 > gdb --args ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
...
(gdb) run

Once you hit the segfault you can get the backtrace with:

(gdb) bt
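
If an interactive session is inconvenient, the same two steps can be run non-interactively. The following is a minimal Python sketch, assuming gdb's standard `--batch` and `-ex` options; the helper name `gdb_backtrace_command` is hypothetical, and the script only prints the command line rather than starting the daemon.

```python
import shlex


def gdb_backtrace_command(daemon_argv):
    """Build a gdb command line that runs the daemon and prints a
    backtrace once the process stops (for example on SIGSEGV)."""
    # --batch makes gdb exit after the -ex commands have run;
    # everything after --args is passed to the program being debugged.
    return ["gdb", "--batch", "-ex", "run", "-ex", "bt", "--args", *daemon_argv]


if __name__ == "__main__":
    cmd = gdb_backtrace_command(
        ["ceph-mds", "-n", "mds.b", "-c", "/etc/ceph/ceph.conf", "-f"]
    )
    # Print the command so it can be reviewed before running it by hand.
    print(shlex.join(cmd))
```

If the daemon exits normally instead of crashing, `bt` has no stack to print, so this is only useful while the segfault is reproducible.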

-sam


> -Greg
>
> On Mon, Oct 15, 2012 at 10:59 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
>> Well, hopefully this is still okay...8.5MB bzip2d, 230MB unzipped.
>>
>> -Nick
>>
>>>>> On 2012/10/15 at 11:47, Gregory Farnum <greg@inktank.com> wrote:
>>> Yeah, zip it and post - somebody's going to have to download it and do
>>> fun things. :)
>>> -Greg
>>>
>>> On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman
>> <Nick.Couchman@seakr.com>
>>> wrote:
>>>> Anywhere in particular I should make it available?  It's a little
>> over a
>>> million lines of debug in the file - I can put it on a pastebin, if
>> that
>>> works, or perhaps zip it up and throw it somewhere?
>>>>
>>>> -Nick
>>>>
>>>>>>> On 2012/10/15 at 11:26, Gregory Farnum <greg@inktank.com> wrote:
>>>>> Something in the MDS log is bad or is poking at a bug in the code.
>> Can
>>>>> you turn on MDS debugging and restart a daemon and put that log
>>>>> somewhere accessible?
>>>>> debug mds = 20
>>>>> debug journaler = 20
>>>>> debug ms = 1
>>>>> -Greg
>>>>>
>>>>> On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman
>> <Nick.Couchman@seakr.com>
>>>>> wrote:
>>>>>> Well, both of my MDSs seem to be down right now, and then
>> continually
>>>>> segfault (every time I try to start them) with the following:
>>>>>>
>>>>>> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>>> starting mds.b at :/0
>>>>>> *** Caught signal (Segmentation fault) **
>>>>>>   in thread 7fbe0d61d700
>>>>>>   ceph version 0.48.1argonaut
>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>   1: ceph-mds() [0x7ef83a]
>>>>>>   2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>   3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>   4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>   5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>   6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>   7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal
>> (Segmentation
>>>>> fault) **
>>>>>>   in thread 7fbe0d61d700
>>>>>>
>>>>>>   ceph version 0.48.1argonaut
>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>   1: ceph-mds() [0x7ef83a]
>>>>>>   2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>   3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>   4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>   5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>   6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>   7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to
>>>>> interpret this.
>>>>>>
>>>>>>       0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught
>> signal
>>>>> (Segmentation fault) **
>>>>>>   in thread 7fbe0d61d700
>>>>>>
>>>>>>   ceph version 0.48.1argonaut
>>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>>   1: ceph-mds() [0x7ef83a]
>>>>>>   2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>>   3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>>   4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>>   5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>>   6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>>   7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to
>>>>> interpret this.
>>>>>>
>>>>>> Segmentation fault
>>>>>>
>>>>>> Anyone have any hints on recovering?  I'm running 0.48.1argonaut -
>> I can
>>>>> attempt to upgrade to 0.48.2 and see if that helps, but I figured
>> if anyone
>>>>> can offer any insight as to what to do to get the replay to run
>> without
>>>>> segfaulting?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
       [not found]       ` <507BFAA102000099000ECA3B@collaborate.seakr.com>
@ 2012-10-16 23:04         ` Gregory Farnum
  2012-10-17 13:34           ` Sam Lang
  0 siblings, 1 reply; 19+ messages in thread
From: Gregory Farnum @ 2012-10-16 23:04 UTC (permalink / raw)
  To: Nick Couchman; +Cc: ceph-devel

Okay, that's the right debugging but it wasn't quite as helpful on its
own as I expected. Can you get a core dump (you might already have
one, depending on system settings) of the crash and open it up with
gdb and get a full backtrace?
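
Whether a core file already exists depends mostly on the core size limit of the process that started the daemon. A minimal Python sketch for checking that limit, assuming a Linux or other Unix host where the standard `resource` module is available; the helper name `core_dumps_enabled` is illustrative only.

```python
import resource


def core_dumps_enabled(soft_limit):
    """A soft RLIMIT_CORE of 0 means no core file will be written."""
    return soft_limit != 0


if __name__ == "__main__":
    # Limits of the current process; a daemon started from this shell
    # inherits them, one started by an init script may not.
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    state = "enabled" if core_dumps_enabled(soft) else "disabled"
    print(f"core dumps {state} (soft={soft}, hard={hard})")
```

Even with a non-zero limit, where the core file ends up is controlled on Linux by `kernel.core_pattern`, so it may not be in the daemon's working directory.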
-Greg

On Mon, Oct 15, 2012 at 10:59 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
> Well, hopefully this is still okay...8.5MB bzip2d, 230MB unzipped.
>
> -Nick
>
>>>> On 2012/10/15 at 11:47, Gregory Farnum <greg@inktank.com> wrote:
>> Yeah, zip it and post - somebody's going to have to download it and do
>> fun things. :)
>> -Greg
>>
>> On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman
> <Nick.Couchman@seakr.com>
>> wrote:
>>> Anywhere in particular I should make it available?  It's a little
> over a
>> million lines of debug in the file - I can put it on a pastebin, if
> that
>> works, or perhaps zip it up and throw it somewhere?
>>>
>>> -Nick
>>>
>>>>>> On 2012/10/15 at 11:26, Gregory Farnum <greg@inktank.com> wrote:
>>>> Something in the MDS log is bad or is poking at a bug in the code.
> Can
>>>> you turn on MDS debugging and restart a daemon and put that log
>>>> somewhere accessible?
>>>> debug mds = 20
>>>> debug journaler = 20
>>>> debug ms = 1
>>>> -Greg
>>>>
>>>> On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman
> <Nick.Couchman@seakr.com>
>>>> wrote:
>>>>> Well, both of my MDSs seem to be down right now, and then
> continually
>>>> segfault (every time I try to start them) with the following:
>>>>>
>>>>> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>>>> starting mds.b at :/0
>>>>> *** Caught signal (Segmentation fault) **
>>>>>  in thread 7fbe0d61d700
>>>>>  ceph version 0.48.1argonaut
>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>  1: ceph-mds() [0x7ef83a]
>>>>>  2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>  6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>  7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal
> (Segmentation
>>>> fault) **
>>>>>  in thread 7fbe0d61d700
>>>>>
>>>>>  ceph version 0.48.1argonaut
>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>  1: ceph-mds() [0x7ef83a]
>>>>>  2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>  6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>  7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to
>>>> interpret this.
>>>>>
>>>>>      0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught
> signal
>>>> (Segmentation fault) **
>>>>>  in thread 7fbe0d61d700
>>>>>
>>>>>  ceph version 0.48.1argonaut
>>>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>>>  1: ceph-mds() [0x7ef83a]
>>>>>  2: (()+0xfd00) [0x7fbe15a0cd00]
>>>>>  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>>>  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>>>  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>>>  6: (()+0x7f05) [0x7fbe15a04f05]
>>>>>  7: (clone()+0x6d) [0x7fbe14bc410d]
>>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to
>>>> interpret this.
>>>>>
>>>>> Segmentation fault
>>>>>
>>>>> Anyone have any hints on recovering?  I'm running 0.48.1argonaut -
> I can
>>>> attempt to upgrade to 0.48.2 and see if that helps, but I figured
> if anyone
>>>> can offer any insight as to what to do to get the replay to run
> without
>>>> segfaulting?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
  2012-10-15 17:43   ` Nick Couchman
@ 2012-10-15 17:47     ` Gregory Farnum
       [not found]       ` <507BFAA102000099000ECA3B@collaborate.seakr.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Gregory Farnum @ 2012-10-15 17:47 UTC (permalink / raw)
  To: Nick Couchman; +Cc: ceph-devel

Yeah, zip it and post — somebody's going to have to download it and do
fun things. :)
-Greg

On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
> Anywhere in particular I should make it available?  It's a little over a million lines of debug in the file - I can put it on a pastebin, if that works, or perhaps zip it up and throw it somewhere?
>
> -Nick
>
>>>> On 2012/10/15 at 11:26, Gregory Farnum <greg@inktank.com> wrote:
>> Something in the MDS log is bad or is poking at a bug in the code. Can
>> you turn on MDS debugging and restart a daemon and put that log
>> somewhere accessible?
>> debug mds = 20
>> debug journaler = 20
>> debug ms = 1
>> -Greg
>>
>> On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman <Nick.Couchman@seakr.com>
>> wrote:
>>> Well, both of my MDSs seem to be down right now, and then continually
>> segfault (every time I try to start them) with the following:
>>>
>>> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>>> starting mds.b at :/0
>>> *** Caught signal (Segmentation fault) **
>>>  in thread 7fbe0d61d700
>>>  ceph version 0.48.1argonaut
>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>  1: ceph-mds() [0x7ef83a]
>>>  2: (()+0xfd00) [0x7fbe15a0cd00]
>>>  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>  6: (()+0x7f05) [0x7fbe15a04f05]
>>>  7: (clone()+0x6d) [0x7fbe14bc410d]
>>> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation
>> fault) **
>>>  in thread 7fbe0d61d700
>>>
>>>  ceph version 0.48.1argonaut
>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>  1: ceph-mds() [0x7ef83a]
>>>  2: (()+0xfd00) [0x7fbe15a0cd00]
>>>  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>  6: (()+0x7f05) [0x7fbe15a04f05]
>>>  7: (clone()+0x6d) [0x7fbe14bc410d]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
>> interpret this.
>>>
>>>      0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal
>> (Segmentation fault) **
>>>  in thread 7fbe0d61d700
>>>
>>>  ceph version 0.48.1argonaut
>> (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>>  1: ceph-mds() [0x7ef83a]
>>>  2: (()+0xfd00) [0x7fbe15a0cd00]
>>>  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>>  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>>  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>>  6: (()+0x7f05) [0x7fbe15a04f05]
>>>  7: (clone()+0x6d) [0x7fbe14bc410d]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
>> interpret this.
>>>
>>> Segmentation fault
>>>
>>> Anyone have any hints on recovering?  I'm running 0.48.1argonaut - I can
>> attempt to upgrade to 0.48.2 and see if that helps, but I figured if anyone
>> can offer any insight as to what to do to get the replay to run without
>> segfaulting?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
  2012-10-15 17:26 ` Gregory Farnum
@ 2012-10-15 17:43   ` Nick Couchman
  2012-10-15 17:47     ` Gregory Farnum
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Couchman @ 2012-10-15 17:43 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

Anywhere in particular I should make it available?  It's a little over a million lines of debug in the file - I can put it on a pastebin, if that works, or perhaps zip it up and throw it somewhere?

-Nick

>>> On 2012/10/15 at 11:26, Gregory Farnum <greg@inktank.com> wrote: 
> Something in the MDS log is bad or is poking at a bug in the code. Can
> you turn on MDS debugging and restart a daemon and put that log
> somewhere accessible?
> debug mds = 20
> debug journaler = 20
> debug ms = 1
> -Greg
> 
> On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
>> Well, both of my MDSs seem to be down right now, and then continually segfault (every time I try to start them) with the following:
>>
>> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
>> starting mds.b at :/0
>> *** Caught signal (Segmentation fault) **
>>  in thread 7fbe0d61d700
>>  ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>  1: ceph-mds() [0x7ef83a]
>>  2: (()+0xfd00) [0x7fbe15a0cd00]
>>  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>  6: (()+0x7f05) [0x7fbe15a04f05]
>>  7: (clone()+0x6d) [0x7fbe14bc410d]
>> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation fault) **
>>  in thread 7fbe0d61d700
>>
>>  ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>  1: ceph-mds() [0x7ef83a]
>>  2: (()+0xfd00) [0x7fbe15a0cd00]
>>  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>  6: (()+0x7f05) [0x7fbe15a04f05]
>>  7: (clone()+0x6d) [0x7fbe14bc410d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>>      0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation fault) **
>>  in thread 7fbe0d61d700
>>
>>  ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>>  1: ceph-mds() [0x7ef83a]
>>  2: (()+0xfd00) [0x7fbe15a0cd00]
>>  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>>  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>>  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>>  6: (()+0x7f05) [0x7fbe15a04f05]
>>  7: (clone()+0x6d) [0x7fbe14bc410d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> Segmentation fault
>>
>> Does anyone have any hints on recovering?  I'm running 0.48.1argonaut.  I can attempt to upgrade to 0.48.2 and see if that helps, but I'd appreciate any insight into how to get the replay to run without segfaulting.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Help...MDS Continuously Segfaulting
  2012-10-15 17:02 Nick Couchman
@ 2012-10-15 17:26 ` Gregory Farnum
  2012-10-15 17:43   ` Nick Couchman
  0 siblings, 1 reply; 19+ messages in thread
From: Gregory Farnum @ 2012-10-15 17:26 UTC (permalink / raw)
  To: Nick Couchman; +Cc: ceph-devel

Something in the MDS log is bad or is poking at a bug in the code. Can
you turn on MDS debugging and restart a daemon and put that log
somewhere accessible?
debug mds = 20
debug journaler = 20
debug ms = 1
-Greg
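
The message above does not say where the three debug settings go. A minimal sketch, assuming they belong in an `[mds]` section of the `ceph.conf` that the daemon is started with (the section choice and the helper name `mds_debug_fragment` are assumptions, not something stated in this thread):

```python
import configparser
import io


def mds_debug_fragment():
    """Render the debug settings from the message as an ini-style fragment."""
    cfg = configparser.ConfigParser()
    # Assumption: the settings are placed under [mds]; the thread only lists
    # the three key/value pairs, not the section they should live in.
    cfg["mds"] = {
        "debug mds": "20",
        "debug journaler": "20",
        "debug ms": "1",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Print the fragment so it can be reviewed and merged into ceph.conf by hand.
print(mds_debug_fragment())
```

The fragment is only printed, not written to disk, so an existing configuration file is never modified by this sketch.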

On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
> Well, both of my MDSs seem to be down right now, and then continually segfault (every time I try to start them) with the following:
>
> ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
> starting mds.b at :/0
> *** Caught signal (Segmentation fault) **
>  in thread 7fbe0d61d700
>  ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>  1: ceph-mds() [0x7ef83a]
>  2: (()+0xfd00) [0x7fbe15a0cd00]
>  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>  6: (()+0x7f05) [0x7fbe15a04f05]
>  7: (clone()+0x6d) [0x7fbe14bc410d]
> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation fault) **
>  in thread 7fbe0d61d700
>
>  ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>  1: ceph-mds() [0x7ef83a]
>  2: (()+0xfd00) [0x7fbe15a0cd00]
>  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>  6: (()+0x7f05) [0x7fbe15a04f05]
>  7: (clone()+0x6d) [0x7fbe14bc410d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
>      0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation fault) **
>  in thread 7fbe0d61d700
>
>  ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
>  1: ceph-mds() [0x7ef83a]
>  2: (()+0xfd00) [0x7fbe15a0cd00]
>  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
>  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
>  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
>  6: (()+0x7f05) [0x7fbe15a04f05]
>  7: (clone()+0x6d) [0x7fbe14bc410d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Segmentation fault
>
> Does anyone have any hints on recovering?  I'm running 0.48.1argonaut.  I can attempt to upgrade to 0.48.2 and see if that helps, but I'd appreciate any insight into how to get the replay to run without segfaulting.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Help...MDS Continuously Segfaulting
@ 2012-10-15 17:02 Nick Couchman
  2012-10-15 17:26 ` Gregory Farnum
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Couchman @ 2012-10-15 17:02 UTC (permalink / raw)
  To: ceph-devel

Well, both of my MDSs seem to be down right now, and then continually segfault (every time I try to start them) with the following:

ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
starting mds.b at :/0
*** Caught signal (Segmentation fault) **
 in thread 7fbe0d61d700
 ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
 1: ceph-mds() [0x7ef83a]
 2: (()+0xfd00) [0x7fbe15a0cd00]
 3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
 4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
 5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
 6: (()+0x7f05) [0x7fbe15a04f05]
 7: (clone()+0x6d) [0x7fbe14bc410d]
2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fbe0d61d700

 ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
 1: ceph-mds() [0x7ef83a]
 2: (()+0xfd00) [0x7fbe15a0cd00]
 3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
 4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
 5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
 6: (()+0x7f05) [0x7fbe15a04f05]
 7: (clone()+0x6d) [0x7fbe14bc410d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

     0> 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fbe0d61d700

 ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
 1: ceph-mds() [0x7ef83a]
 2: (()+0xfd00) [0x7fbe15a0cd00]
 3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
 4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
 5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
 6: (()+0x7f05) [0x7fbe15a04f05]
 7: (clone()+0x6d) [0x7fbe14bc410d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Segmentation fault

Does anyone have any hints on recovering?  I'm running 0.48.1argonaut.  I can attempt to upgrade to 0.48.2 and see if that helps, but I'd appreciate any insight into how to get the replay to run without segfaulting.
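
The NOTE in the trace says the executable is needed to interpret the addresses. A minimal sketch of one way to prepare that lookup: it pulls the numbered frames out of the trace and builds an `addr2line` command for the addresses that appear to lie in the `ceph-mds` binary itself. The binary path `/usr/bin/ceph-mds`, the low-address cutoff, and the helper names are assumptions for illustration, and `addr2line` only gives source locations if the binary (or a matching debug package) carries debug information.

```python
import re

# Frame lines look like " 3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]".
FRAME_RE = re.compile(r"^\s*(\d+):\s+(.*?)\s+\[0x([0-9a-f]+)\]\s*$")

SAMPLE = """
 1: ceph-mds() [0x7ef83a]
 2: (()+0xfd00) [0x7fbe15a0cd00]
 3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
 4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
 5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
 6: (()+0x7f05) [0x7fbe15a04f05]
 7: (clone()+0x6d) [0x7fbe14bc410d]
"""


def parse_frames(trace):
    """Return (frame number, symbol text, address) for each numbered frame."""
    frames = []
    for line in trace.splitlines():
        # Drop email quote markers such as "> " or ">> " before matching.
        line = re.sub(r"^(?:>\s?)+", "", line)
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(2), int(m.group(3), 16)))
    return frames


def addr2line_command(frames, binary="/usr/bin/ceph-mds", cutoff=0x10000000):
    """Build an addr2line invocation for frames inside the main executable.

    Assumption: low addresses belong to the (non-relocated) executable and
    high 0x7f... addresses belong to shared libraries, which is a heuristic.
    """
    addrs = [hex(addr) for _, _, addr in frames if addr < cutoff]
    # -C demangles C++ names, -f prints function names, -e names the binary.
    return ["addr2line", "-C", "-f", "-e", binary] + addrs


if __name__ == "__main__":
    print(" ".join(addr2line_command(parse_frames(SAMPLE))))
```

The command is only printed, so it can be checked and then run on the host where the crashing binary is installed.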

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-11-03 18:38 UTC | newest]

Thread overview: 19+ messages
2012-10-30  2:22 Help...MDS Continuously Segfaulting Nick Couchman
2012-11-03 17:45 ` Gregory Farnum
  -- strict thread matches above, loose matches on Subject: below --
2012-11-03 18:27 Nick Couchman
2012-11-03 18:38 ` Gregory Farnum
2012-10-15 17:02 Nick Couchman
2012-10-15 17:26 ` Gregory Farnum
2012-10-15 17:43   ` Nick Couchman
2012-10-15 17:47     ` Gregory Farnum
     [not found]       ` <507BFAA102000099000ECA3B@collaborate.seakr.com>
2012-10-16 23:04         ` Gregory Farnum
2012-10-17 13:34           ` Sam Lang
2012-10-17 14:42             ` Nick Couchman
2012-10-17 15:53               ` Sam Lang
2012-10-17 16:23                 ` Nick Couchman
2012-10-17 17:03                   ` Sam Lang
2012-10-18 15:56                 ` Nick Couchman
2012-10-18 16:20                   ` Gregory Farnum
2012-10-18 22:55                   ` Gregory Farnum
2012-10-19 20:52                     ` Nick Couchman
2012-10-19 22:15                       ` Gregory Farnum
