* OSD failure on start
@ 2013-02-13 19:57 Mandell Degerness
  2013-02-13 22:08 ` Mike Dawson
From: Mandell Degerness @ 2013-02-13 19:57 UTC (permalink / raw)
  To: ceph-devel

I'm getting this error on one of my OSDs when I try to start it.

I can gather more complete log data if no one recognizes the error from this:

Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.612847 7f4f607e7780  0 filestore(/mnt/osd96) mount found snaps <>
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.615147 7f4f607e7780  0 filestore(/mnt/osd96) mount: enabling WRITEAHEAD journal mode: btrfs not detected
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.658965 7f4f607e7780  1 journal _open /mnt/osd96/journal fd 30: 8589934592 bytes, block size 4096 bytes, directio = 1, aio = 0
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.720091 7f4f607e7780  1 journal _open /mnt/osd96/journal fd 30: 8589934592 bytes, block size 4096 bytes, directio = 1, aio = 0
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.721871 7f4f607e7780 -1 osd/OSD.cc: In function 'OSDMapRef OSD::get_map(epoch_t)' thread 7f4f607e7780 time 2013-02-13 19:30:04.721278
osd/OSD.cc: 4029: FAILED assert(_get_map_bl(epoch, bl))

 ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
 1: (OSD::get_map(unsigned int)+0x560) [0x7f4f60a411e0]
 2: (OSD::init()+0x5a3) [0x7f4f60a53ce3]
 3: (main()+0x4462) [0x7f4f6096d182]
 4: (__libc_start_main()+0xfd) [0x7f4f5e64b26d]
 5: (()+0x16e829) [0x7f4f60968829]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: --- begin dump of recent events ---
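When matching a crash like this against known tracker issues, the two details that matter most are the failed assertion (with its source location) and the exact Ceph version. Below is a minimal Python sketch that pulls both out of a pasted ceph-osd log excerpt. The helper names are hypothetical, and the regular expressions are assumptions fitted to the log lines shown above; other releases may format these lines differently.

```python
import re
from typing import NamedTuple, Optional


class FailedAssert(NamedTuple):
    source_file: str
    line: int
    expression: str


# Fitted to lines like: "osd/OSD.cc: 4029: FAILED assert(_get_map_bl(epoch, bl))"
ASSERT_RE = re.compile(
    r"(?P<file>[\w./-]+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)
# Fitted to lines like: "ceph version 0.48.1argonaut (commit:...)"
VERSION_RE = re.compile(r"ceph version (?P<version>\S+)")


def find_failed_assert(log_text: str) -> Optional[FailedAssert]:
    """Return the first FAILED assert found in the log text, or None."""
    for line in log_text.splitlines():
        m = ASSERT_RE.search(line)
        if m:
            return FailedAssert(m.group("file"), int(m.group("line")), m.group("expr"))
    return None


def find_ceph_version(log_text: str) -> Optional[str]:
    """Return the version token from the first 'ceph version' line, or None."""
    m = VERSION_RE.search(log_text)
    return m.group("version") if m else None


if __name__ == "__main__":
    excerpt = (
        "osd/OSD.cc: 4029: FAILED assert(_get_map_bl(epoch, bl))\n"
        " ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)\n"
    )
    print(find_failed_assert(excerpt))
    # FailedAssert(source_file='osd/OSD.cc', line=4029, expression='_get_map_bl(epoch, bl)')
    print(find_ceph_version(excerpt))  # 0.48.1argonaut
```

The assertion text alone is not enough to identify a bug: as the rest of the thread shows, the same assert can come from different root causes in different releases, so the version should always be checked alongside it.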


* Re: OSD failure on start
  2013-02-13 19:57 OSD failure on start Mandell Degerness
@ 2013-02-13 22:08 ` Mike Dawson
  2013-02-13 22:47   ` Mandell Degerness
From: Mike Dawson @ 2013-02-13 22:08 UTC (permalink / raw)
  To: Mandell Degerness; +Cc: ceph-devel

Mandell,

A few of us saw a similar failure on 0.56.1.

http://tracker.ceph.com/issues/3770

Sam Just patched the issue for 0.56.2. My understanding is Sam's patch 
prevents the issue in the future, but doesn't repair a previously 
damaged OSD.

If you have good replication (or a good backup), I have had luck 
removing the affected OSD, formatting, and re-adding it. I believe Sam 
may have a manual process to fix it if you can't wipe this OSD.
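The remove, format, and re-add approach corresponds roughly to the manual OSD removal and addition steps described in the Ceph documentation of this era. Below is a hedged Python sketch that only builds the command list for an operator to review and executes nothing. The helper name is hypothetical; the example OSD id 96 is inferred from the /mnt/osd96 mount path, and the keyring path is an assumption you must supply. It deliberately omits formatting and mounting the data filesystem and re-adding the OSD to the CRUSH map, since those steps depend on your layout and release. Check every command against the documentation for your version before running it.

```python
from typing import List


def rebuild_osd_plan(osd_id: int, new_id: int, keyring_path: str) -> List[str]:
    """Return (but do not run) CLI steps for replacing a damaged OSD.

    Hypothetical helper: filesystem formatting/mounting and the CRUSH
    re-add step are intentionally left out because they are site- and
    release-specific.
    """
    old = f"osd.{osd_id}"
    return [
        # Mark the broken OSD out so its data is re-replicated elsewhere.
        # Wait for recovery to finish (e.g. watch `ceph -s`) before continuing.
        f"ceph osd out {osd_id}",
        # Remove it from the CRUSH map, delete its auth key, drop it from the OSD map.
        f"ceph osd crush remove {old}",
        f"ceph auth del {old}",
        f"ceph osd rm {osd_id}",
        # (Reformat and remount the data directory at this point.)
        # Allocate an OSD id; the command prints the id it chose, which is new_id.
        "ceph osd create",
        # Initialize the empty data directory and generate a new key.
        f"ceph-osd -i {new_id} --mkfs --mkkey",
        # Register the new key with the monitors.
        f"ceph auth add osd.{new_id} osd 'allow *' mon 'allow rwx' -i {keyring_path}",
    ]


if __name__ == "__main__":
    # Example values only: id 96 is inferred from /mnt/osd96; the keyring path is assumed.
    for cmd in rebuild_osd_plan(96, 96, "/mnt/osd96/keyring"):
        print(cmd)
```

Keeping `new_id` separate from `osd_id` matters because `ceph osd create` hands out the lowest free id, which is not guaranteed to be the one just removed.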

Good Luck,
Mike





* Re: OSD failure on start
  2013-02-13 22:08 ` Mike Dawson
@ 2013-02-13 22:47   ` Mandell Degerness
  2013-02-14  2:52     ` Samuel Just
From: Mandell Degerness @ 2013-02-13 22:47 UTC (permalink / raw)
  To: Mike Dawson; +Cc: ceph-devel

Thanks.  I'm glad to hear it is fixed in the new version.  Wiping the OSD worked.



* Re: OSD failure on start
  2013-02-13 22:47   ` Mandell Degerness
@ 2013-02-14  2:52     ` Samuel Just
From: Samuel Just @ 2013-02-14  2:52 UTC (permalink / raw)
  To: Mandell Degerness; +Cc: Mike Dawson, ceph-devel

Actually, that bug did not exist in 0.48.1, so this must have been something
different.  Was the node you had the trouble with the pg logs on?
-Sam
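The point above rests on a version comparison: the crash log reports 0.48.1argonaut, while the thread only reports issue #3770 on 0.56.1, with the fix in 0.56.2. A small Python sketch of that check follows. The function names are hypothetical, and treating 0.56.1 up to (but not including) 0.56.2 as the affected window is an illustrative assumption; the thread does not say in which release the bug first appeared.

```python
import re
from typing import Tuple

Version = Tuple[int, ...]

# Versions mentioned in this thread for tracker issue #3770. Treating them
# as a half-open "affected" window is an assumption for illustration only.
FIRST_REPORTED: Version = (0, 56, 1)
FIXED_IN: Version = (0, 56, 2)


def parse_ceph_version(text: str) -> Version:
    """'0.48.1argonaut' -> (0, 48, 1); any release-name suffix is ignored."""
    m = re.match(r"\d+(?:\.\d+)*", text.strip())
    if m is None:
        raise ValueError(f"no numeric version in {text!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def in_reported_window(version: Version) -> bool:
    """True if version falls in [FIRST_REPORTED, FIXED_IN)."""
    return FIRST_REPORTED <= version < FIXED_IN


if __name__ == "__main__":
    v = parse_ceph_version("0.48.1argonaut")
    print(v, in_reported_window(v))  # (0, 48, 1) False
```

Versions are compared as integer tuples rather than strings so that, for example, 0.9 correctly sorts before 0.10.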



Thread overview: 4 messages
2013-02-13 19:57 OSD failure on start Mandell Degerness
2013-02-13 22:08 ` Mike Dawson
2013-02-13 22:47   ` Mandell Degerness
2013-02-14  2:52     ` Samuel Just