* OSD failure on start
@ 2013-02-13 19:57 Mandell Degerness
2013-02-13 22:08 ` Mike Dawson
0 siblings, 1 reply; 4+ messages in thread
From: Mandell Degerness @ 2013-02-13 19:57 UTC (permalink / raw)
To: ceph-devel
I'm getting this error on one of my OSDs when I try to start it.
I can gather more complete log data if no one recognizes the error from this:
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.612847 7f4f607e7780 0 filestore(/mnt/osd96) mount found snaps <>
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.615147 7f4f607e7780 0 filestore(/mnt/osd96) mount: enabling WRITEAHEAD journal mode: btrfs not detected
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.658965 7f4f607e7780 1 journal _open /mnt/osd96/journal fd 30: 8589934592 bytes, block size 4096 bytes, directio = 1, aio = 0
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.720091 7f4f607e7780 1 journal _open /mnt/osd96/journal fd 30: 8589934592 bytes, block size 4096 bytes, directio = 1, aio = 0
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.721871 7f4f607e7780 -1 osd/OSD.cc: In function 'OSDMapRef OSD::get_map(epoch_t)' thread 7f4f607e7780 time 2013-02-13 19:30:04.721278
osd/OSD.cc: 4029: FAILED assert(_get_map_bl(epoch, bl))
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
1: (OSD::get_map(unsigned int)+0x560) [0x7f4f60a411e0]
2: (OSD::init()+0x5a3) [0x7f4f60a53ce3]
3: (main()+0x4462) [0x7f4f6096d182]
4: (__libc_start_main()+0xfd) [0x7f4f5e64b26d]
5: (()+0x16e829) [0x7f4f60968829]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Feb 13 19:30:04 node-192-168-8-14 ceph-osd: --- begin dump of recent events ---
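For anyone triaging a similar log, the useful part is the assertion site: the source file, the line number, and the expression that failed. Below is a minimal sketch of pulling that out of a syslog capture. It is not part of Ceph; the function name find_failed_assert and the regular expression are assumptions made for illustration, modeled only on the "FAILED assert" line shown above.

```python
import re

# Matches lines shaped like:
#   osd/OSD.cc: 4029: FAILED assert(_get_map_bl(epoch, bl))
# The pattern is an assumption based on the single line in this thread.
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)


def find_failed_assert(log_text):
    """Return (file, line, expression) for the first failed assert, or None."""
    for raw in log_text.splitlines():
        match = ASSERT_RE.search(raw)
        if match:
            return (
                match.group("file"),
                int(match.group("line")),
                match.group("expr"),
            )
    return None


sample = "osd/OSD.cc: 4029: FAILED assert(_get_map_bl(epoch, bl))"
print(find_failed_assert(sample))
```

The file and line number only make sense against the exact source tree named in the "ceph version" line, so it is worth reporting the commit hash alongside them.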
* Re: OSD failure on start
2013-02-13 19:57 OSD failure on start Mandell Degerness
@ 2013-02-13 22:08 ` Mike Dawson
2013-02-13 22:47 ` Mandell Degerness
0 siblings, 1 reply; 4+ messages in thread
From: Mike Dawson @ 2013-02-13 22:08 UTC (permalink / raw)
To: Mandell Degerness; +Cc: ceph-devel
Mandell,
A few of us saw a similar failure on 0.56.1.
http://tracker.ceph.com/issues/3770
Sam Just patched the issue for 0.56.2. My understanding is Sam's patch
prevents the issue in the future, but doesn't repair a previously
damaged OSD.
If you have good replication (or a good backup), I have had luck
removing the affected OSD, formatting, and re-adding it. I believe Sam
may have a manual process to fix it if you can't wipe this OSD.
Good Luck,
Mike
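The "remove the affected OSD" half of the workaround above can be sketched as a dry run that only prints the commands. This is a sketch under assumptions, not a procedure anyone in this thread posted: the OSD id 96 is inferred from the /mnt/osd96 path in the log, the helper name osd_removal_commands is made up, and the ceph subcommands are the ones in the general Ceph OSD removal documentation, so check them against the documentation for the release you run. Reformatting the data directory and re-adding the OSD depend on the deployment and are left out.

```python
import shlex


def osd_removal_commands(osd_id):
    """Build (but do not run) commands that remove one OSD from a cluster.

    Assumes the ceph-osd daemon is already stopped, as it is when the
    daemon fails on start.
    """
    name = "osd.%d" % osd_id
    return [
        ["ceph", "osd", "out", str(osd_id)],       # stop mapping data to it
        ["ceph", "osd", "crush", "remove", name],  # drop it from the CRUSH map
        ["ceph", "auth", "del", name],             # delete its auth key
        ["ceph", "osd", "rm", str(osd_id)],        # remove it from the OSD map
    ]


# Dry run: print each command instead of executing it.
for cmd in osd_removal_commands(96):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Printing first and running by hand keeps an operator in the loop. As noted above, this is only reasonable when the remaining replicas (or a backup) are known to be good.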
* Re: OSD failure on start
2013-02-13 22:08 ` Mike Dawson
@ 2013-02-13 22:47 ` Mandell Degerness
2013-02-14 2:52 ` Samuel Just
0 siblings, 1 reply; 4+ messages in thread
From: Mandell Degerness @ 2013-02-13 22:47 UTC (permalink / raw)
To: Mike Dawson; +Cc: ceph-devel
Thanks. I'm glad to hear it is fixed in the new version. Wiping the OSD worked.
* Re: OSD failure on start
2013-02-13 22:47 ` Mandell Degerness
@ 2013-02-14 2:52 ` Samuel Just
0 siblings, 0 replies; 4+ messages in thread
From: Samuel Just @ 2013-02-14 2:52 UTC (permalink / raw)
To: Mandell Degerness; +Cc: Mike Dawson, ceph-devel
Actually, that bug did not exist in 0.48.1, so this must have been something
different. Was this the node you had the trouble with the pg logs on?
-Sam