* pgs stuck inactive
@ 2012-04-04  8:12 Damien Churchill
  2012-04-04 20:51 ` Samuel Just
  0 siblings, 1 reply; 25+ messages in thread
From: Damien Churchill @ 2012-04-04  8:12 UTC (permalink / raw)
  To: ceph-devel

Hi,

I'm having trouble getting some pgs out of the inactive state. The
cluster is running 0.44.1 and the kernel version is 3.2.x.

ceph -s reports:
2012-04-04 09:08:57.816029    pg v188540: 990 pgs: 223 inactive, 767
active+clean; 205 GB data, 1013 GB used, 8204 GB / 9315 GB avail
2012-04-04 09:08:57.817970   mds e2198: 1/1/1 up {0=node24=up:active},
4 up:standby
2012-04-04 09:08:57.818024   osd e5910: 5 osds: 5 up, 5 in
2012-04-04 09:08:57.818201   log 2012-04-04 09:04:03.838358 osd.3
172.22.10.24:6801/30000 159 : [INF] 0.13d scrub ok
2012-04-04 09:08:57.818280   mon e7: 3 mons at
{node21=172.22.10.21:6789/0,node22=172.22.10.22:6789/0,node23=172.22.10.23:6789/0}

ceph health says:
2012-04-04 09:09:01.651053 mon <- [health]
2012-04-04 09:09:01.666585 mon.1 -> 'HEALTH_WARN 223 pgs stuck
inactive; 223 pgs stuck unclean' (0)

I was wondering if anyone has any suggestions on how to resolve
this, or things to look for. I've tried restarting the ceph daemons on
the various nodes a few times, to no avail. I don't think there is
anything wrong with any of the nodes either.

Thanks in advance,
Damien

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pgs stuck inactive
  2012-04-04  8:12 pgs stuck inactive Damien Churchill
@ 2012-04-04 20:51 ` Samuel Just
  2012-04-04 21:44   ` Damien Churchill
  0 siblings, 1 reply; 25+ messages in thread
From: Samuel Just @ 2012-04-04 20:51 UTC (permalink / raw)
  To: Damien Churchill; +Cc: ceph-devel

Can you post a copy of your osd map and the output of 'ceph pg dump'?
You can get the osdmap via 'ceph osd getmap -o <filename>'.
-Sam
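As an aside, once a 'ceph pg dump' capture like the one requested here is saved to a file, the rows that are not active+clean can be pulled out with a small filter. This is a sketch only: it assumes the pg id is the first whitespace-separated column and keys off any state token containing "inactive", rather than relying on the exact column schema, which varies by release.

```python
# Sketch: filter a saved `ceph pg dump` capture down to stuck pgs.
# Assumption (not the real schema): pg id is column 0, and a stuck row
# carries a state token containing "inactive" somewhere on the line.

def stuck_pgs(dump_text):
    """Return [(pgid, state), ...] for rows whose state mentions 'inactive'."""
    stuck = []
    for line in dump_text.splitlines():
        cols = line.split()
        if len(cols) < 2:
            continue
        pgid = cols[0]
        states = [c for c in cols[1:] if "inactive" in c]
        if states:
            stuck.append((pgid, states[0]))
    return stuck

if __name__ == "__main__":
    sample = "0.13d  active+clean  [0,3]\n0.138  inactive  [0]\n"
    print(stuck_pgs(sample))
```

Run against a real dump file with `stuck_pgs(open("pg_dump").read())`; the 223 rows reported by `ceph health` should come back.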

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: pgs stuck inactive
  2012-04-04 20:51 ` Samuel Just
@ 2012-04-04 21:44   ` Damien Churchill
  2012-04-06 16:50     ` Samuel Just
  0 siblings, 1 reply; 25+ messages in thread
From: Damien Churchill @ 2012-04-04 21:44 UTC (permalink / raw)
  To: Samuel Just; +Cc: ceph-devel

I've uploaded them to:

http://damoxc.net/ceph/osdmap
http://damoxc.net/ceph/pg_dump

Thanks



* Re: pgs stuck inactive
  2012-04-04 21:44   ` Damien Churchill
@ 2012-04-06 16:50     ` Samuel Just
  2012-04-06 16:55       ` Damien Churchill
  0 siblings, 1 reply; 25+ messages in thread
From: Samuel Just @ 2012-04-06 16:50 UTC (permalink / raw)
  To: Damien Churchill; +Cc: ceph-devel

Is there a 0.138_head directory under current/ on any of your osds?
If so, can you post the log for that osd?  I could also use the osd.0
log.
-Sam



* Re: pgs stuck inactive
  2012-04-06 16:50     ` Samuel Just
@ 2012-04-06 16:55       ` Damien Churchill
  2012-04-06 17:30         ` Samuel Just
  0 siblings, 1 reply; 25+ messages in thread
From: Damien Churchill @ 2012-04-06 16:55 UTC (permalink / raw)
  To: Samuel Just; +Cc: ceph-devel

I've got that directory on 3 of the osds: 0, 3 and 4. Do you want the
logs for all 3 of them?



* Re: pgs stuck inactive
  2012-04-06 16:55       ` Damien Churchill
@ 2012-04-06 17:30         ` Samuel Just
  2012-04-06 17:53           ` Damien Churchill
  0 siblings, 1 reply; 25+ messages in thread
From: Samuel Just @ 2012-04-06 17:30 UTC (permalink / raw)
  To: Damien Churchill; +Cc: ceph-devel

Hmm, osd.0 should be enough.  I need osd debugging at around 20 from
when the osd started.  Restarting the osd with debugging at 20 would
also work fine.
-Sam
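The setting Damien goes on to use ("debug osd = 20") can be expressed as a ceph.conf fragment; a minimal sketch, to be placed under the daemon's section before restarting it so the level applies from startup:

```ini
; Raise osd debug logging to the level Sam is asking for.
; Scope to a specific daemon section (e.g. [osd.0]) instead if only
; one osd should be verbose.
[osd]
    debug osd = 20
```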



* Re: pgs stuck inactive
  2012-04-06 17:30         ` Samuel Just
@ 2012-04-06 17:53           ` Damien Churchill
       [not found]             ` <CACLRD_3hqoOqbZWnyft3B+pEaPG=pmiDt5HeQASUC=OPS-uA7Q@mail.gmail.com>
  0 siblings, 1 reply; 25+ messages in thread
From: Damien Churchill @ 2012-04-06 17:53 UTC (permalink / raw)
  To: Samuel Just; +Cc: ceph-devel

Okay, uploaded it to http://damoxc.net/ceph/osd.0.log.gz. I restarted
the osd with debug osd = 20 and let it run for 5 minutes or so.



* Re: pgs stuck inactive
       [not found]               ` <CAFtEh-dnR0zu=ct-gG0L+76BTYHVFimjFKNhHjnDNZWA=Scs1g@mail.gmail.com>
@ 2012-04-06 18:20                 ` Samuel Just
       [not found]                   ` <CAFtEh-cx2by6yG_xiQbq5e2fbViOWyHoaOLFPBCZgX7x-7gwFA@mail.gmail.com>
  0 siblings, 1 reply; 25+ messages in thread
From: Samuel Just @ 2012-04-06 18:20 UTC (permalink / raw)
  To: Damien Churchill; +Cc: ceph-devel

Hmm, can you post the monitor logs as well for that period?  It looks
like the osd is requesting a map change and not getting it.
-Sam

On Fri, Apr 6, 2012 at 10:59 AM, Damien Churchill <damoxc@gmail.com> wrote:
> Oops, was set to 600, chmod'd to 644, should be good now.
>
> On 6 April 2012 18:56, Samuel Just <sam.just@dreamhost.com> wrote:
>> Error 403.
>> -Sam


* Re: pgs stuck inactive
       [not found]                       ` <CAFtEh-fwt9BwZ7NwZaJVnwq3hgzvkrb1XvKT0uF81vePRcu7oA@mail.gmail.com>
@ 2012-04-10 21:49                         ` Samuel Just
  2012-04-10 22:03                           ` Damien Churchill
  0 siblings, 1 reply; 25+ messages in thread
From: Samuel Just @ 2012-04-10 21:49 UTC (permalink / raw)
  To: Damien Churchill; +Cc: ceph-devel

Nothing apparent from the backtrace.  I need monitor logs from when
the osd is sending pg_temp requests.  Can you restart the osd and post
the osd and all three monitor logs from when you restarted the osd?
You'll have to enable monitor logging on all three.
-Sam
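For the monitor side, a sketch of the logging change being requested, assuming the `debug mon` knob mirrors the `debug osd` one used earlier (set it on each of the three mons and restart them so the logs cover the pg_temp requests; the `debug ms` line is an extra assumption, added on the guess that message-level tracing helps here):

```ini
; Hypothetical fragment: raise monitor debug output on all three mons
; before restarting the osd, so the mons log the incoming pg_temp requests.
[mon]
    debug mon = 20
    debug ms = 1
```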

On Tue, Apr 10, 2012 at 4:34 AM, Damien Churchill <damoxc@gmail.com> wrote:
> Okay done that now:
>
> (gdb) thread apply all bt
>
> Thread 61 (Thread 0x7fb5cb00c700 (LWP 12771)):
> #0  0x00007fb5cccdf3f1 in sem_timedwait ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00000000006981f7 in CephContextServiceThread::entry (this=0x2b52bc0)
>    at common/ceph_context.cc:53
> #2  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x0000000000000000 in ?? ()
>
> Thread 60 (Thread 0x7fb5ca80b700 (LWP 12774)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000068abf9 in AdminSocket::entry (this=0x2b5c000)
>    at common/admin_socket.cc:211
> #2  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x0000000000000000 in ?? ()
>
> Thread 59 (Thread 0x7fb5c5801700 (LWP 12831)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000067060e in SimpleMessenger::Accepter::entry (this=0x2b496b8)
>    at msg/SimpleMessenger.cc:209
> #2  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x0000000000000000 in ?? ()
>
> Thread 58 (Thread 0x7fb5c6002700 (LWP 12832)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000672aba in Wait (mutex=..., this=0x2b49aa0)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::reaper_entry (this=0x2b49680)
>    at msg/SimpleMessenger.cc:2336
> #3  0x000000000052122d in SimpleMessenger::ReaperThread::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:522
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 57 (Thread 0x7fb5c6803700 (LWP 12833)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000672aba in Wait (mutex=..., this=0x2b491a0)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::reaper_entry (this=0x2b48d80)
>    at msg/SimpleMessenger.cc:2336
> #3  0x000000000052122d in SimpleMessenger::ReaperThread::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:522
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 56 (Thread 0x7fb5c9008700 (LWP 12834)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000067060e in SimpleMessenger::Accepter::entry (this=0x2b49b38)
>    at msg/SimpleMessenger.cc:209
> #2  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x0000000000000000 in ?? ()
>
> Thread 55 (Thread 0x7fb5ca00a700 (LWP 12835)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000672aba in Wait (mutex=..., this=0x2b49f20)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::reaper_entry (this=0x2b49b00)
>    at msg/SimpleMessenger.cc:2336
> #3  0x000000000052122d in SimpleMessenger::ReaperThread::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:522
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 54 (Thread 0x7fb5c9809700 (LWP 12836)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000067060e in SimpleMessenger::Accepter::entry (this=0x2b49238)
>    at msg/SimpleMessenger.cc:209
> #2  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x0000000000000000 in ?? ()
>
> Thread 53 (Thread 0x7fb5c8807700 (LWP 12837)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000672aba in Wait (mutex=..., this=0x2b49620)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::reaper_entry (this=0x2b49200)
>    at msg/SimpleMessenger.cc:2336
> #3  0x000000000052122d in SimpleMessenger::ReaperThread::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:522
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 52 (Thread 0x7fb5c8006700 (LWP 12838)):
> #0  0x00007fb5cb302613 in select () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00000000006fb210 in SignalHandler::entry (this=0x2b4e420)
>    at global/signal_handler.cc:201
> #2  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x0000000000000000 in ?? ()
>
> Thread 51 (Thread 0x7fb5c7805700 (LWP 12839)):
> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000685bf7 in WaitUntil (when=<optimized out>, mutex=...,
>    this=0x2b66060) at common/Cond.h:67
> #2  SafeTimer::timer_thread (this=0x2b66050) at common/Timer.cc:110
> #3  0x000000000068666d in SafeTimerThread::entry (this=<optimized out>)
>    at common/Timer.cc:38
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 50 (Thread 0x7fb5c7004700 (LWP 12840)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000685853 in Wait (mutex=..., this=0x2b67460) at common/Cond.h:48
> #2  SafeTimer::timer_thread (this=0x2b67450) at common/Timer.cc:108
> #3  0x000000000068666d in SafeTimerThread::entry (this=<optimized out>)
>    at common/Timer.cc:38
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 49 (Thread 0x7fb5c5000700 (LWP 12854)):
> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00000000007992d0 in WaitUntil (when=<optimized out>, mutex=...,
>    this=0x2b63500) at ./common/Cond.h:67
> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>    this=0x2b63500) at ./common/Cond.h:74
> #3  FileStore::sync_entry (this=0x2b63000) at os/FileStore.cc:3352
> #4  0x00000000007a65ed in FileStore::SyncThread::entry (this=<optimized out>)
>    at os/FileStore.h:103
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 48 (Thread 0x7fb5c47ff700 (LWP 12859)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x000000000077807c in Wait (mutex=..., this=0x2b6ab10)
>    at ./common/Cond.h:48
> #2  FileJournal::write_thread_entry (this=0x2b6a800) at os/FileJournal.cc:1052
>
> #3  0x00000000005d4c7d in FileJournal::Writer::entry (this=<optimized out>)
>    at ./os/FileJournal.h:249
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 47 (Thread 0x7fb5c3ffe700 (LWP 12860)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x000000000077222f in Wait (mutex=..., this=0x2b6a8c0)
>    at ./common/Cond.h:48
> #2  FileJournal::write_finish_thread_entry (this=0x2b6a800)
>    at os/FileJournal.cc:1245
> #3  0x00000000005d4c5d in FileJournal::WriteFinisher::entry (
>    this=<optimized out>) at ./os/FileJournal.h:259
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 46 (Thread 0x7fb5c37fd700 (LWP 12861)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000761f8e in Wait (mutex=..., this=0x2b630b8)
>    at ./common/Cond.h:48
> #2  Finisher::finisher_thread_entry (this=0x2b63070) at common/Finisher.cc:76
> #3  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #5  0x0000000000000000 in ?? ()
>
> Thread 45 (Thread 0x7fb5c2ffc700 (LWP 12862)):
> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>    this=0x2b63868) at common/Cond.h:67
> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>    this=0x2b63868) at common/Cond.h:74
> #3  ThreadPool::worker (this=0x2b63810) at common/WorkQueue.cc:71
> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>    at ./common/WorkQueue.h:121
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 44 (Thread 0x7fb5c27fb700 (LWP 12863)):
> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>    this=0x2b63868) at common/Cond.h:67
> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>    this=0x2b63868) at common/Cond.h:74
> #3  ThreadPool::worker (this=0x2b63810) at common/WorkQueue.cc:71
> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>    at ./common/WorkQueue.h:121
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 43 (Thread 0x7fb5c1ffa700 (LWP 12864)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000781ebd in Wait (mutex=..., this=0x2b63970)
>    at ./common/Cond.h:48
> #2  FileStore::flusher_entry (this=0x2b63000) at os/FileStore.cc:3312
> #3  0x00000000007a49ed in FileStore::FlusherThread::entry (
>    this=<optimized out>) at os/FileStore.h:242
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 42 (Thread 0x7fb5c17f9700 (LWP 12865)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000761f8e in Wait (mutex=..., this=0x2b63750)
>    at ./common/Cond.h:48
> #2  Finisher::finisher_thread_entry (this=0x2b63708) at common/Finisher.cc:76
> #3  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #5  0x0000000000000000 in ?? ()
>
> Thread 41 (Thread 0x7fb5c0ff8700 (LWP 12866)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000761f8e in Wait (mutex=..., this=0x2b63400)
>    at ./common/Cond.h:48
> #2  Finisher::finisher_thread_entry (this=0x2b633b8) at common/Finisher.cc:76
> #3  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #5  0x0000000000000000 in ?? ()
>
> Thread 40 (Thread 0x7fb5c07f7700 (LWP 12867)):
> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000685bf7 in WaitUntil (when=<optimized out>, mutex=...,
>    this=0x2b63590) at common/Cond.h:67
> #2  SafeTimer::timer_thread (this=0x2b63580) at common/Timer.cc:110
> #3  0x000000000068666d in SafeTimerThread::entry (this=<optimized out>)
>    at common/Timer.cc:38
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 39 (Thread 0x7fb5bfff6700 (LWP 12868)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00000000006748da in Wait (mutex=..., this=0x2b49718)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::dispatch_entry (this=0x2b49680)
>    at msg/SimpleMessenger.cc:374
> #3  0x000000000052120d in SimpleMessenger::DispatchThread::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:559
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 38 (Thread 0x7fb5bf7f5700 (LWP 12869)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00000000006748da in Wait (mutex=..., this=0x2b49298)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::dispatch_entry (this=0x2b49200)
>    at msg/SimpleMessenger.cc:374
> #3  0x000000000052120d in SimpleMessenger::DispatchThread::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:559
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 37 (Thread 0x7fb5beff4700 (LWP 12870)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00000000006748da in Wait (mutex=..., this=0x2b48e18)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::dispatch_entry (this=0x2b48d80)
>    at msg/SimpleMessenger.cc:374
> #3  0x000000000052120d in SimpleMessenger::DispatchThread::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:559
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 36 (Thread 0x7fb5be7f3700 (LWP 12871)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00000000006748da in Wait (mutex=..., this=0x2b49b98)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::dispatch_entry (this=0x2b49b00)
>    at msg/SimpleMessenger.cc:374
> #3  0x000000000052120d in SimpleMessenger::DispatchThread::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:559
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 35 (Thread 0x7fb5bdff2700 (LWP 12872)):
> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000685bf7 in WaitUntil (when=<optimized out>, mutex=...,
>    this=0x7fff73395748) at common/Cond.h:67
> #2  SafeTimer::timer_thread (this=0x7fff73395738) at common/Timer.cc:110
> #3  0x000000000068666d in SafeTimerThread::entry (this=<optimized out>)
>    at common/Timer.cc:38
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 34 (Thread 0x7fb5bd7f1700 (LWP 12873)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000761f8e in Wait (mutex=..., this=0x7fff73395838)
>    at ./common/Cond.h:48
> #2  Finisher::finisher_thread_entry (this=0x7fff733957f0)
>    at common/Finisher.cc:76
> #3  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #4  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #5  0x0000000000000000 in ?? ()
>
> Thread 33 (Thread 0x7fb5cd2e3700 (LWP 12874)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4965950)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::Pipe::writer (this=0x4965780)
>    at msg/SimpleMessenger.cc:1821
> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:258
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 32 (Thread 0x7fb5bcff0700 (LWP 12877)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>    timeout=<optimized out>) at msg/tcp.cc:53
> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=31,
>    buf=0x7fb5bcfefd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4965780)
>    at msg/SimpleMessenger.cc:1606
> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:250
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 31 (Thread 0x7fb5bceef700 (LWP 12878)):
> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>    this=0x2b66470) at common/Cond.h:67
> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>    this=0x2b66470) at common/Cond.h:74
> #3  ThreadPool::worker (this=0x2b66418) at common/WorkQueue.cc:71
> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>    at ./common/WorkQueue.h:121
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 30 (Thread 0x7fb5bc6ee700 (LWP 12879)):
> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>    this=0x2b66470) at common/Cond.h:67
> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>    this=0x2b66470) at common/Cond.h:74
> #3  ThreadPool::worker (this=0x2b66418) at common/WorkQueue.cc:71
> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>    at ./common/WorkQueue.h:121
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 29 (Thread 0x7fb5bbeed700 (LWP 12880)):
> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>    this=0x2b665a0) at common/Cond.h:67
> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>    this=0x2b665a0) at common/Cond.h:74
> #3  ThreadPool::worker (this=0x2b66548) at common/WorkQueue.cc:71
> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>    at ./common/WorkQueue.h:121
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 28 (Thread 0x7fb5bb6ec700 (LWP 12881)):
> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>    this=0x2b666d0) at common/Cond.h:67
> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>    this=0x2b666d0) at common/Cond.h:74
> #3  ThreadPool::worker (this=0x2b66678) at common/WorkQueue.cc:71
> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>    at ./common/WorkQueue.h:121
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 27 (Thread 0x7fb5baeeb700 (LWP 12882)):
> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>    this=0x2b66800) at common/Cond.h:67
> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>    this=0x2b66800) at common/Cond.h:74
> #3  ThreadPool::worker (this=0x2b667a8) at common/WorkQueue.cc:71
> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>    at ./common/WorkQueue.h:121
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 26 (Thread 0x7fb5ba6ea700 (LWP 12883)):
> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00000000005b43f2 in WaitUntil (when=<optimized out>, mutex=...,
>    this=0x2b66918) at ./common/Cond.h:67
> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>    this=0x2b66918) at ./common/Cond.h:74
> #3  OSD::heartbeat_entry (this=0x2b66000) at osd/OSD.cc:1699
> #4  0x00000000005e68ad in OSD::T_Heartbeat::entry (this=<optimized out>)
>    at osd/OSD.h:280
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 25 (Thread 0x7fb5b9ee9700 (LWP 12885)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4965e50)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::Pipe::writer (this=0x4965c80)
>    at msg/SimpleMessenger.cc:1821
> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:258
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 24 (Thread 0x7fb5b9de8700 (LWP 12886)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4995450)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::Pipe::writer (this=0x4995280)
>    at msg/SimpleMessenger.cc:1821
> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:258
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 23 (Thread 0x7fb5b9ce7700 (LWP 12887)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49951d0)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::Pipe::writer (this=0x4995000)
>    at msg/SimpleMessenger.cc:1821
> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:258
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 22 (Thread 0x7fb5b9be6700 (LWP 12888)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4995bd0)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::Pipe::writer (this=0x4995a00)
>    at msg/SimpleMessenger.cc:1821
> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:258
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 21 (Thread 0x7fb5b9ae5700 (LWP 12889)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4995950)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::Pipe::writer (this=0x4995780)
>    at msg/SimpleMessenger.cc:1821
> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:258
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 20 (Thread 0x7fb5b99e4700 (LWP 12890)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49956d0)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::Pipe::writer (this=0x4995500)
>    at msg/SimpleMessenger.cc:1821
> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:258
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 19 (Thread 0x7fb5b98e3700 (LWP 12891)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4995e50)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::Pipe::writer (this=0x4995c80)
>    at msg/SimpleMessenger.cc:1821
> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:258
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 18 (Thread 0x7fb5b97e2700 (LWP 12892)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4d62bd0)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::Pipe::writer (this=0x4d62a00)
>    at msg/SimpleMessenger.cc:1821
> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:258
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 17 (Thread 0x7fb5b96e1700 (LWP 12893)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>    timeout=<optimized out>) at msg/tcp.cc:53
> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=33,
>    buf=0x7fb5b96e0d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995280)
>    at msg/SimpleMessenger.cc:1606
> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:250
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 16 (Thread 0x7fb5b95e0700 (LWP 12894)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>    timeout=<optimized out>) at msg/tcp.cc:53
> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=35,
>    buf=0x7fb5b95dfd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995a00)
>    at msg/SimpleMessenger.cc:1606
> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:250
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 15 (Thread 0x7fb5b94df700 (LWP 12895)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>    timeout=<optimized out>) at msg/tcp.cc:53
> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=32,
>    buf=0x7fb5b94ded9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4965c80)
>    at msg/SimpleMessenger.cc:1606
> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:250
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 14 (Thread 0x7fb5b93de700 (LWP 12896)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>    timeout=<optimized out>) at msg/tcp.cc:53
> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=38,
>    buf=0x7fb5b93ddd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995c80)
>    at msg/SimpleMessenger.cc:1606
> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:250
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 13 (Thread 0x7fb5b92dd700 (LWP 12897)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>    timeout=<optimized out>) at msg/tcp.cc:53
> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=34,
>    buf=0x7fb5b92dcd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995000)
>    at msg/SimpleMessenger.cc:1606
> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:250
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 12 (Thread 0x7fb5b91dc700 (LWP 12898)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>    timeout=<optimized out>) at msg/tcp.cc:53
> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=36,
>    buf=0x7fb5b91dbd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995780)
>    at msg/SimpleMessenger.cc:1606
> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:250
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 11 (Thread 0x7fb5b90db700 (LWP 12900)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>    timeout=<optimized out>) at msg/tcp.cc:53
> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=39,
>    buf=0x7fb5b90dad9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4d62a00)
>    at msg/SimpleMessenger.cc:1606
> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:250
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 10 (Thread 0x7fb5b8fda700 (LWP 12899)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>    timeout=<optimized out>) at msg/tcp.cc:53
> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=37,
>    buf=0x7fb5b8fd9d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995500)
>    at msg/SimpleMessenger.cc:1606
> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:250
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 9 (Thread 0x7fb5b8ed9700 (LWP 12903)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>    timeout=<optimized out>) at msg/tcp.cc:53
> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=40,
>    buf=0x7fb5b8ed8d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4d62c80)
>    at msg/SimpleMessenger.cc:1606
> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:250
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 8 (Thread 0x7fb5b8dd8700 (LWP 12904)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4d62e50)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::Pipe::writer (this=0x4d62c80)
>    at msg/SimpleMessenger.cc:1821
> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:258
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 7 (Thread 0x7fb5b8cd7700 (LWP 12908)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>    timeout=<optimized out>) at msg/tcp.cc:53
> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=42,
>    buf=0x7fb5b8cd6d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x49ed280)
>    at msg/SimpleMessenger.cc:1606
> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:250
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 6 (Thread 0x7fb5b8bd6700 (LWP 12909)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49ed450)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::Pipe::writer (this=0x49ed280)
>    at msg/SimpleMessenger.cc:1821
> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:258
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 5 (Thread 0x7fb5b8ad5700 (LWP 12910)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>    timeout=<optimized out>) at msg/tcp.cc:53
> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=41,
>    buf=0x7fb5b8ad4d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x49ed000)
>    at msg/SimpleMessenger.cc:1606
> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:250
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 4 (Thread 0x7fb5b89d4700 (LWP 12911)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49ed1d0)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::Pipe::writer (this=0x49ed000)
>    at msg/SimpleMessenger.cc:1821
> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:258
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 3 (Thread 0x7fb5b88d3700 (LWP 12912)):
> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>    timeout=<optimized out>) at msg/tcp.cc:53
> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=43,
>    buf=0x7fb5b88d2d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x49eda00)
>    at msg/SimpleMessenger.cc:1606
> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:250
> #5  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #7  0x0000000000000000 in ?? ()
>
> Thread 2 (Thread 0x7fb5b87d2700 (LWP 12913)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49edbd0)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::Pipe::writer (this=0x49eda00)
>    at msg/SimpleMessenger.cc:1821
> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>    this=<optimized out>) at msg/SimpleMessenger.h:258
> #4  0x00007fb5cccd8efc in start_thread ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> Thread 1 (Thread 0x7fb5cd3017a0 (LWP 12770)):
> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>   from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x0000000000672c47 in Wait (mutex=..., this=0x2b49838)
>    at ./common/Cond.h:48
> #2  SimpleMessenger::wait (this=0x2b49680) at msg/SimpleMessenger.cc:2654
> #3  0x000000000051efc7 in main (argc=<optimized out>, argv=<optimized out>)
>    at ceph_osd.cc:423
>
>
> On 6 April 2012 22:06, Samuel Just <sam.just@dreamhost.com> wrote:
>> Actually, can you restart osd0, wait the same amount of time, attach
>> to the process with gdb, and post a backtrace from all threads?
>> -Sam
>>
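As an aside, a full-thread dump like the one above can be captured without an interactive gdb session. This is only a sketch, not a command from the thread: the PID lookup and the output path are assumptions, so adjust them for your setup (e.g. use `pgrep -f 'ceph-osd -i 0'` if several OSDs run on one host).

```shell
# Assumption: a single ceph-osd process on this host; take its first PID.
pid=$(pidof ceph-osd | awk '{print $1}')

# --batch makes gdb exit after running the -ex commands;
# "thread apply all bt" runs "bt" in every thread, producing output
# in the same shape as the dump quoted above.
gdb --batch -p "$pid" -ex 'thread apply all bt' > /tmp/osd.backtrace.txt 2>&1
```

Note that attaching stops the process while gdb walks the stacks, so OSD heartbeats may lag for a moment.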
>> On Fri, Apr 6, 2012 at 12:32 PM, Damien Churchill <damoxc@gmail.com> wrote:
>>> I've upped the monitor debug logging and re-uploaded the osd log, is that okay?
>>>
>>> http://damoxc.net/ceph/mon.node21.log.gz
>>>
>>> On 6 April 2012 19:20, Samuel Just <sam.just@dreamhost.com> wrote:
>>>> Hmm, can you post the monitor logs as well for that period?  It looks
>>>> like the osd is requesting a map change and not getting it.
>>>> -Sam
>>>>
>>>> On Fri, Apr 6, 2012 at 10:59 AM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>> Oops, was set to 600, chmod'd to 644, should be good now.
>>>>>
>>>>> On 6 April 2012 18:56, Samuel Just <sam.just@dreamhost.com> wrote:
>>>>>> Error 403.
>>>>>> -Sam
>>>>>>
>>>>>> On Fri, Apr 6, 2012 at 10:53 AM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>>>> Okay, uploaded it to http://damoxc.net/ceph/osd.0.log.gz. I restarted
>>>>>>> the osd with debug osd = 20 and let it run for 5 minutes or so.
>>>>>>>
>>>>>>> On 6 April 2012 18:30, Samuel Just <sam.just@dreamhost.com> wrote:
>>>>>>>> Hmm, osd.0 should be enough.  I need osd debugging at around 20 from
>>>>>>>> when the osd started.  Restarting the osd with debugging at 20 would
>>>>>>>> also work fine.
>>>>>>>> -Sam
>>>>>>>>
>>>>>>>> On Fri, Apr 6, 2012 at 9:55 AM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>>>>>> I've got that directory on 3 of the osds: 0, 3 and 4. Do you want the
>>>>>>>>> logs to all 3 of them?
>>>>>>>>>
>>>>>>>>> On 6 April 2012 17:50, Samuel Just <sam.just@dreamhost.com> wrote:
>>>>>>>>>> Is there a 0.138_head directory under current/ on any of your osds?
>>>>>>>>>> If so, can you post the log to that osd?  I could also use the osd.0
>>>>>>>>>> log.
>>>>>>>>>> -Sam
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 4, 2012 at 2:44 PM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>>>>>>>> I've uploaded them to:
>>>>>>>>>>>
>>>>>>>>>>> http://damoxc.net/ceph/osdmap
>>>>>>>>>>> http://damoxc.net/ceph/pg_dump
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> On 4 April 2012 21:51, Samuel Just <sam.just@dreamhost.com> wrote:
>>>>>>>>>>>> Can you post a copy of your osd map and the output of 'ceph pg dump' ?
>>>>>>>>>>>>  You can get the osdmap via 'ceph osd getmap -o <filename>'.
>>>>>>>>>>>> -Sam
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 4, 2012 at 1:12 AM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm having some trouble getting some pgs to stop being inactive. The
>>>>>>>>>>>>> cluster is running 0.44.1 and the kernel version is 3.2.x.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ceph -s reports:
>>>>>>>>>>>>> 2012-04-04 09:08:57.816029    pg v188540: 990 pgs: 223 inactive, 767
>>>>>>>>>>>>> active+clean; 205 GB data, 1013 GB used, 8204 GB / 9315 GB avail
>>>>>>>>>>>>> 2012-04-04 09:08:57.817970   mds e2198: 1/1/1 up {0=node24=up:active},
>>>>>>>>>>>>> 4 up:standby
>>>>>>>>>>>>> 2012-04-04 09:08:57.818024   osd e5910: 5 osds: 5 up, 5 in
>>>>>>>>>>>>> 2012-04-04 09:08:57.818201   log 2012-04-04 09:04:03.838358 osd.3
>>>>>>>>>>>>> 172.22.10.24:6801/30000 159 : [INF] 0.13d scrub ok
>>>>>>>>>>>>> 2012-04-04 09:08:57.818280   mon e7: 3 mons at
>>>>>>>>>>>>> {node21=172.22.10.21:6789/0,node22=172.22.10.22:6789/0,node23=172.22.10.23:6789/0}
>>>>>>>>>>>>>
>>>>>>>>>>>>> ceph health says:
>>>>>>>>>>>>> 2012-04-04 09:09:01.651053 mon <- [health]
>>>>>>>>>>>>> 2012-04-04 09:09:01.666585 mon.1 -> 'HEALTH_WARN 223 pgs stuck
>>>>>>>>>>>>> inactive; 223 pgs stuck unclean' (0)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I was wondering if anyone has any suggestions about how to resolve
>>>>>>>>>>>>> this, or things to look for. I've tried restarting the ceph daemons on
>>>>>>>>>>>>> the various nodes a few times to no avail. I don't think that there is
>>>>>>>>>>>>> anything wrong with any of the nodes either.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>> Damien
>>>>>>>>>>>>> --
>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
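(For anyone following along with the gdb step requested above: attaching to the running osd and capturing a backtrace of all threads can be sketched roughly as below. The `pidof ceph-osd` lookup is an assumption — it presumes a single ceph-osd process on the node; with several osds, pick the right pid from `ps` instead.)

```shell
# Sketch: attach gdb to the running ceph-osd and dump all thread backtraces.
# Assumes exactly one ceph-osd process on this node.
pid=$(pidof ceph-osd)

# -batch exits gdb after running the -ex commands; the osd resumes normally.
gdb -p "$pid" -batch \
    -ex 'thread apply all bt' > /tmp/ceph-osd-backtrace.txt
```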

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pgs stuck inactive
  2012-04-10 21:49                         ` Samuel Just
@ 2012-04-10 22:03                           ` Damien Churchill
  2012-04-10 23:00                             ` Samuel Just
  0 siblings, 1 reply; 25+ messages in thread
From: Damien Churchill @ 2012-04-10 22:03 UTC (permalink / raw)
  To: Samuel Just; +Cc: ceph-devel

Here are the monitor logs. They run from when the monitors started; I
restarted the osd shortly afterwards and left it running for 5 or so
minutes.

http://damoxc.net/ceph/mon.node21.log.gz
http://damoxc.net/ceph/mon.node22.log.gz
http://damoxc.net/ceph/mon.node23.log.gz
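(For reference, raising monitor debug output can be done either in ceph.conf before restarting the mons, or injected at runtime. The option names below — `debug mon`, `debug ms` — are the standard subsystem knobs, but the exact injectargs syntax has varied across versions, so treat this as a sketch rather than the definitive 0.44 invocation.)

```shell
# Sketch: bump monitor debug levels, then restart each mon.
# In ceph.conf, under [mon] (or each [mon.X]) section:
#   debug mon = 20
#   debug ms = 1
#
# Or, injected at runtime without a restart (syntax varies by release):
ceph tell 'mon.*' injectargs '--debug_mon 20 --debug_ms 1'
```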

On 10 April 2012 22:49, Samuel Just <sam.just@dreamhost.com> wrote:
> Nothing apparent from the backtrace.  I need monitor logs from when
> the osd is sending pg_temp requests.  Can you restart the osd and post
> the osd and all three monitor logs from when you restarted the osd?
> You'll have to enable monitor logging on all three.
> -Sam
>
> On Tue, Apr 10, 2012 at 4:34 AM, Damien Churchill <damoxc@gmail.com> wrote:
>> Okay done that now:
>>
>> (gdb) thread apply all bt
>>
>> Thread 61 (Thread 0x7fb5cb00c700 (LWP 12771)):
>> #0  0x00007fb5cccdf3f1 in sem_timedwait ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x00000000006981f7 in CephContextServiceThread::entry (this=0x2b52bc0)
>>    at common/ceph_context.cc:53
>> #2  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #4  0x0000000000000000 in ?? ()
>>
>> Thread 60 (Thread 0x7fb5ca80b700 (LWP 12774)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000068abf9 in AdminSocket::entry (this=0x2b5c000)
>>    at common/admin_socket.cc:211
>> #2  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #4  0x0000000000000000 in ?? ()
>>
>> Thread 59 (Thread 0x7fb5c5801700 (LWP 12831)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000067060e in SimpleMessenger::Accepter::entry (this=0x2b496b8)
>>    at msg/SimpleMessenger.cc:209
>> #2  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #4  0x0000000000000000 in ?? ()
>>
>> Thread 58 (Thread 0x7fb5c6002700 (LWP 12832)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000672aba in Wait (mutex=..., this=0x2b49aa0)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::reaper_entry (this=0x2b49680)
>>    at msg/SimpleMessenger.cc:2336
>> #3  0x000000000052122d in SimpleMessenger::ReaperThread::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:522
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 57 (Thread 0x7fb5c6803700 (LWP 12833)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000672aba in Wait (mutex=..., this=0x2b491a0)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::reaper_entry (this=0x2b48d80)
>>    at msg/SimpleMessenger.cc:2336
>> #3  0x000000000052122d in SimpleMessenger::ReaperThread::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:522
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 56 (Thread 0x7fb5c9008700 (LWP 12834)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000067060e in SimpleMessenger::Accepter::entry (this=0x2b49b38)
>>    at msg/SimpleMessenger.cc:209
>> #2  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #4  0x0000000000000000 in ?? ()
>>
>> Thread 55 (Thread 0x7fb5ca00a700 (LWP 12835)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000672aba in Wait (mutex=..., this=0x2b49f20)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::reaper_entry (this=0x2b49b00)
>>    at msg/SimpleMessenger.cc:2336
>> #3  0x000000000052122d in SimpleMessenger::ReaperThread::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:522
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 54 (Thread 0x7fb5c9809700 (LWP 12836)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000067060e in SimpleMessenger::Accepter::entry (this=0x2b49238)
>>    at msg/SimpleMessenger.cc:209
>> #2  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #4  0x0000000000000000 in ?? ()
>>
>> Thread 53 (Thread 0x7fb5c8807700 (LWP 12837)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000672aba in Wait (mutex=..., this=0x2b49620)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::reaper_entry (this=0x2b49200)
>>    at msg/SimpleMessenger.cc:2336
>> #3  0x000000000052122d in SimpleMessenger::ReaperThread::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:522
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 52 (Thread 0x7fb5c8006700 (LWP 12838)):
>> #0  0x00007fb5cb302613 in select () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x00000000006fb210 in SignalHandler::entry (this=0x2b4e420)
>>    at global/signal_handler.cc:201
>> #2  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #4  0x0000000000000000 in ?? ()
>>
>> Thread 51 (Thread 0x7fb5c7805700 (LWP 12839)):
>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000685bf7 in WaitUntil (when=<optimized out>, mutex=...,
>>    this=0x2b66060) at common/Cond.h:67
>> #2  SafeTimer::timer_thread (this=0x2b66050) at common/Timer.cc:110
>> #3  0x000000000068666d in SafeTimerThread::entry (this=<optimized out>)
>>    at common/Timer.cc:38
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 50 (Thread 0x7fb5c7004700 (LWP 12840)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000685853 in Wait (mutex=..., this=0x2b67460) at common/Cond.h:48
>> #2  SafeTimer::timer_thread (this=0x2b67450) at common/Timer.cc:108
>> #3  0x000000000068666d in SafeTimerThread::entry (this=<optimized out>)
>>    at common/Timer.cc:38
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 49 (Thread 0x7fb5c5000700 (LWP 12854)):
>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x00000000007992d0 in WaitUntil (when=<optimized out>, mutex=...,
>>    this=0x2b63500) at ./common/Cond.h:67
>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>    this=0x2b63500) at ./common/Cond.h:74
>> #3  FileStore::sync_entry (this=0x2b63000) at os/FileStore.cc:3352
>> #4  0x00000000007a65ed in FileStore::SyncThread::entry (this=<optimized out>)
>>    at os/FileStore.h:103
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 48 (Thread 0x7fb5c47ff700 (LWP 12859)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x000000000077807c in Wait (mutex=..., this=0x2b6ab10)
>>    at ./common/Cond.h:48
>> #2  FileJournal::write_thread_entry (this=0x2b6a800) at os/FileJournal.cc:1052
>> #3  0x00000000005d4c7d in FileJournal::Writer::entry (this=<optimized out>)
>>    at ./os/FileJournal.h:249
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 47 (Thread 0x7fb5c3ffe700 (LWP 12860)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x000000000077222f in Wait (mutex=..., this=0x2b6a8c0)
>>    at ./common/Cond.h:48
>> #2  FileJournal::write_finish_thread_entry (this=0x2b6a800)
>>    at os/FileJournal.cc:1245
>> #3  0x00000000005d4c5d in FileJournal::WriteFinisher::entry (
>>    this=<optimized out>) at ./os/FileJournal.h:259
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 46 (Thread 0x7fb5c37fd700 (LWP 12861)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000761f8e in Wait (mutex=..., this=0x2b630b8)
>>    at ./common/Cond.h:48
>> #2  Finisher::finisher_thread_entry (this=0x2b63070) at common/Finisher.cc:76
>> #3  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #4  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #5  0x0000000000000000 in ?? ()
>>
>> Thread 45 (Thread 0x7fb5c2ffc700 (LWP 12862)):
>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>>    this=0x2b63868) at common/Cond.h:67
>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>    this=0x2b63868) at common/Cond.h:74
>> #3  ThreadPool::worker (this=0x2b63810) at common/WorkQueue.cc:71
>> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>>    at ./common/WorkQueue.h:121
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 44 (Thread 0x7fb5c27fb700 (LWP 12863)):
>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>>    this=0x2b63868) at common/Cond.h:67
>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>    this=0x2b63868) at common/Cond.h:74
>> #3  ThreadPool::worker (this=0x2b63810) at common/WorkQueue.cc:71
>> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>>    at ./common/WorkQueue.h:121
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 43 (Thread 0x7fb5c1ffa700 (LWP 12864)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000781ebd in Wait (mutex=..., this=0x2b63970)
>>    at ./common/Cond.h:48
>> #2  FileStore::flusher_entry (this=0x2b63000) at os/FileStore.cc:3312
>> #3  0x00000000007a49ed in FileStore::FlusherThread::entry (
>>    this=<optimized out>) at os/FileStore.h:242
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 42 (Thread 0x7fb5c17f9700 (LWP 12865)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000761f8e in Wait (mutex=..., this=0x2b63750)
>>    at ./common/Cond.h:48
>> #2  Finisher::finisher_thread_entry (this=0x2b63708) at common/Finisher.cc:76
>> #3  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #4  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #5  0x0000000000000000 in ?? ()
>>
>> Thread 41 (Thread 0x7fb5c0ff8700 (LWP 12866)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000761f8e in Wait (mutex=..., this=0x2b63400)
>>    at ./common/Cond.h:48
>> #2  Finisher::finisher_thread_entry (this=0x2b633b8) at common/Finisher.cc:76
>> #3  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #4  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #5  0x0000000000000000 in ?? ()
>>
>> Thread 40 (Thread 0x7fb5c07f7700 (LWP 12867)):
>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000685bf7 in WaitUntil (when=<optimized out>, mutex=...,
>>    this=0x2b63590) at common/Cond.h:67
>> #2  SafeTimer::timer_thread (this=0x2b63580) at common/Timer.cc:110
>> #3  0x000000000068666d in SafeTimerThread::entry (this=<optimized out>)
>>    at common/Timer.cc:38
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 39 (Thread 0x7fb5bfff6700 (LWP 12868)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x00000000006748da in Wait (mutex=..., this=0x2b49718)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::dispatch_entry (this=0x2b49680)
>>    at msg/SimpleMessenger.cc:374
>> #3  0x000000000052120d in SimpleMessenger::DispatchThread::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:559
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 38 (Thread 0x7fb5bf7f5700 (LWP 12869)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x00000000006748da in Wait (mutex=..., this=0x2b49298)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::dispatch_entry (this=0x2b49200)
>>    at msg/SimpleMessenger.cc:374
>> #3  0x000000000052120d in SimpleMessenger::DispatchThread::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:559
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 37 (Thread 0x7fb5beff4700 (LWP 12870)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x00000000006748da in Wait (mutex=..., this=0x2b48e18)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::dispatch_entry (this=0x2b48d80)
>>    at msg/SimpleMessenger.cc:374
>> #3  0x000000000052120d in SimpleMessenger::DispatchThread::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:559
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 36 (Thread 0x7fb5be7f3700 (LWP 12871)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x00000000006748da in Wait (mutex=..., this=0x2b49b98)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::dispatch_entry (this=0x2b49b00)
>>    at msg/SimpleMessenger.cc:374
>> #3  0x000000000052120d in SimpleMessenger::DispatchThread::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:559
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 35 (Thread 0x7fb5bdff2700 (LWP 12872)):
>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000685bf7 in WaitUntil (when=<optimized out>, mutex=...,
>>    this=0x7fff73395748) at common/Cond.h:67
>> #2  SafeTimer::timer_thread (this=0x7fff73395738) at common/Timer.cc:110
>> #3  0x000000000068666d in SafeTimerThread::entry (this=<optimized out>)
>>    at common/Timer.cc:38
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 34 (Thread 0x7fb5bd7f1700 (LWP 12873)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000761f8e in Wait (mutex=..., this=0x7fff73395838)
>>    at ./common/Cond.h:48
>> #2  Finisher::finisher_thread_entry (this=0x7fff733957f0)
>>    at common/Finisher.cc:76
>> #3  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #4  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #5  0x0000000000000000 in ?? ()
>>
>> Thread 33 (Thread 0x7fb5cd2e3700 (LWP 12874)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4965950)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::Pipe::writer (this=0x4965780)
>>    at msg/SimpleMessenger.cc:1821
>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 32 (Thread 0x7fb5bcff0700 (LWP 12877)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>    timeout=<optimized out>) at msg/tcp.cc:53
>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=31,
>>    buf=0x7fb5bcfefd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4965780)
>>    at msg/SimpleMessenger.cc:1606
>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 31 (Thread 0x7fb5bceef700 (LWP 12878)):
>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>>    this=0x2b66470) at common/Cond.h:67
>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>    this=0x2b66470) at common/Cond.h:74
>> #3  ThreadPool::worker (this=0x2b66418) at common/WorkQueue.cc:71
>> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>>    at ./common/WorkQueue.h:121
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 30 (Thread 0x7fb5bc6ee700 (LWP 12879)):
>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>>    this=0x2b66470) at common/Cond.h:67
>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>    this=0x2b66470) at common/Cond.h:74
>> #3  ThreadPool::worker (this=0x2b66418) at common/WorkQueue.cc:71
>> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>>    at ./common/WorkQueue.h:121
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 29 (Thread 0x7fb5bbeed700 (LWP 12880)):
>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>>    this=0x2b665a0) at common/Cond.h:67
>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>    this=0x2b665a0) at common/Cond.h:74
>> #3  ThreadPool::worker (this=0x2b66548) at common/WorkQueue.cc:71
>> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>>    at ./common/WorkQueue.h:121
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 28 (Thread 0x7fb5bb6ec700 (LWP 12881)):
>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>>    this=0x2b666d0) at common/Cond.h:67
>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>    this=0x2b666d0) at common/Cond.h:74
>> #3  ThreadPool::worker (this=0x2b66678) at common/WorkQueue.cc:71
>> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>>    at ./common/WorkQueue.h:121
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 27 (Thread 0x7fb5baeeb700 (LWP 12882)):
>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>>    this=0x2b66800) at common/Cond.h:67
>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>    this=0x2b66800) at common/Cond.h:74
>> #3  ThreadPool::worker (this=0x2b667a8) at common/WorkQueue.cc:71
>> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>>    at ./common/WorkQueue.h:121
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 26 (Thread 0x7fb5ba6ea700 (LWP 12883)):
>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x00000000005b43f2 in WaitUntil (when=<optimized out>, mutex=...,
>>    this=0x2b66918) at ./common/Cond.h:67
>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>    this=0x2b66918) at ./common/Cond.h:74
>> #3  OSD::heartbeat_entry (this=0x2b66000) at osd/OSD.cc:1699
>> #4  0x00000000005e68ad in OSD::T_Heartbeat::entry (this=<optimized out>)
>>    at osd/OSD.h:280
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 25 (Thread 0x7fb5b9ee9700 (LWP 12885)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4965e50)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::Pipe::writer (this=0x4965c80)
>>    at msg/SimpleMessenger.cc:1821
>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 24 (Thread 0x7fb5b9de8700 (LWP 12886)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4995450)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::Pipe::writer (this=0x4995280)
>>    at msg/SimpleMessenger.cc:1821
>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 23 (Thread 0x7fb5b9ce7700 (LWP 12887)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49951d0)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::Pipe::writer (this=0x4995000)
>>    at msg/SimpleMessenger.cc:1821
>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 22 (Thread 0x7fb5b9be6700 (LWP 12888)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4995bd0)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::Pipe::writer (this=0x4995a00)
>>    at msg/SimpleMessenger.cc:1821
>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 21 (Thread 0x7fb5b9ae5700 (LWP 12889)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4995950)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::Pipe::writer (this=0x4995780)
>>    at msg/SimpleMessenger.cc:1821
>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 20 (Thread 0x7fb5b99e4700 (LWP 12890)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49956d0)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::Pipe::writer (this=0x4995500)
>>    at msg/SimpleMessenger.cc:1821
>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 19 (Thread 0x7fb5b98e3700 (LWP 12891)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4995e50)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::Pipe::writer (this=0x4995c80)
>>    at msg/SimpleMessenger.cc:1821
>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 18 (Thread 0x7fb5b97e2700 (LWP 12892)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4d62bd0)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::Pipe::writer (this=0x4d62a00)
>>    at msg/SimpleMessenger.cc:1821
>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 17 (Thread 0x7fb5b96e1700 (LWP 12893)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>    timeout=<optimized out>) at msg/tcp.cc:53
>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=33,
>>    buf=0x7fb5b96e0d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995280)
>>    at msg/SimpleMessenger.cc:1606
>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 16 (Thread 0x7fb5b95e0700 (LWP 12894)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>    timeout=<optimized out>) at msg/tcp.cc:53
>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=35,
>>    buf=0x7fb5b95dfd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995a00)
>>    at msg/SimpleMessenger.cc:1606
>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 15 (Thread 0x7fb5b94df700 (LWP 12895)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>    timeout=<optimized out>) at msg/tcp.cc:53
>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=32,
>>    buf=0x7fb5b94ded9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4965c80)
>>    at msg/SimpleMessenger.cc:1606
>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 14 (Thread 0x7fb5b93de700 (LWP 12896)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>    timeout=<optimized out>) at msg/tcp.cc:53
>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=38,
>>    buf=0x7fb5b93ddd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995c80)
>>    at msg/SimpleMessenger.cc:1606
>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 13 (Thread 0x7fb5b92dd700 (LWP 12897)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>    timeout=<optimized out>) at msg/tcp.cc:53
>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=34,
>>    buf=0x7fb5b92dcd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995000)
>>    at msg/SimpleMessenger.cc:1606
>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 12 (Thread 0x7fb5b91dc700 (LWP 12898)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>    timeout=<optimized out>) at msg/tcp.cc:53
>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=36,
>>    buf=0x7fb5b91dbd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995780)
>>    at msg/SimpleMessenger.cc:1606
>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 11 (Thread 0x7fb5b90db700 (LWP 12900)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>    timeout=<optimized out>) at msg/tcp.cc:53
>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=39,
>>    buf=0x7fb5b90dad9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4d62a00)
>>    at msg/SimpleMessenger.cc:1606
>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 10 (Thread 0x7fb5b8fda700 (LWP 12899)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>    timeout=<optimized out>) at msg/tcp.cc:53
>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=37,
>>    buf=0x7fb5b8fd9d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995500)
>>    at msg/SimpleMessenger.cc:1606
>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 9 (Thread 0x7fb5b8ed9700 (LWP 12903)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>    timeout=<optimized out>) at msg/tcp.cc:53
>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=40,
>>    buf=0x7fb5b8ed8d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4d62c80)
>>    at msg/SimpleMessenger.cc:1606
>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 8 (Thread 0x7fb5b8dd8700 (LWP 12904)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4d62e50)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::Pipe::writer (this=0x4d62c80)
>>    at msg/SimpleMessenger.cc:1821
>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 7 (Thread 0x7fb5b8cd7700 (LWP 12908)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>    timeout=<optimized out>) at msg/tcp.cc:53
>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=42,
>>    buf=0x7fb5b8cd6d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x49ed280)
>>    at msg/SimpleMessenger.cc:1606
>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 6 (Thread 0x7fb5b8bd6700 (LWP 12909)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49ed450)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::Pipe::writer (this=0x49ed280)
>>    at msg/SimpleMessenger.cc:1821
>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 5 (Thread 0x7fb5b8ad5700 (LWP 12910)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>    timeout=<optimized out>) at msg/tcp.cc:53
>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=41,
>>    buf=0x7fb5b8ad4d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x49ed000)
>>    at msg/SimpleMessenger.cc:1606
>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 4 (Thread 0x7fb5b89d4700 (LWP 12911)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49ed1d0)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::Pipe::writer (this=0x49ed000)
>>    at msg/SimpleMessenger.cc:1821
>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 3 (Thread 0x7fb5b88d3700 (LWP 12912)):
>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>    timeout=<optimized out>) at msg/tcp.cc:53
>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=43,
>>    buf=0x7fb5b88d2d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x49eda00)
>>    at msg/SimpleMessenger.cc:1606
>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>> #5  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #7  0x0000000000000000 in ?? ()
>>
>> Thread 2 (Thread 0x7fb5b87d2700 (LWP 12913)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49edbd0)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::Pipe::writer (this=0x49eda00)
>>    at msg/SimpleMessenger.cc:1821
>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>> #4  0x00007fb5cccd8efc in start_thread ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #6  0x0000000000000000 in ?? ()
>>
>> Thread 1 (Thread 0x7fb5cd3017a0 (LWP 12770)):
>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000672c47 in Wait (mutex=..., this=0x2b49838)
>>    at ./common/Cond.h:48
>> #2  SimpleMessenger::wait (this=0x2b49680) at msg/SimpleMessenger.cc:2654
>> #3  0x000000000051efc7 in main (argc=<optimized out>, argv=<optimized out>)
>>    at ceph_osd.cc:423
>>
>>
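For readers following along: a full-thread dump like the one above is easier to scan if you count threads by their innermost frame — here nearly everything is parked in pthread_cond_wait() or poll(), i.e. idle messenger pipes rather than one wedged thread. A rough Python sketch (GDB's default frame layout assumed; not part of the thread):

```python
import re
from collections import Counter

def top_frames(backtrace_text):
    """Count threads by the function named in their innermost (#0) frame."""
    counts = Counter()
    for line in backtrace_text.splitlines():
        # GDB prints innermost frames as: '#0  0x<addr> in <function> (...)'
        m = re.match(r"#0\s+0x[0-9a-f]+ in (\S+)", line.strip())
        if m:
            counts[m.group(1)] += 1
    return counts

# Two-thread sample in GDB's usual output format.
sample = """\
Thread 2 (Thread 0x7fb5b87d2700 (LWP 12913)):
#0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
Thread 1 (Thread 0x7fb5cd3017a0 (LWP 12770)):
#0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
"""
print(top_frames(sample))
```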
>> On 6 April 2012 22:06, Samuel Just <sam.just@dreamhost.com> wrote:
>>> Actually, can you restart osd0, wait the same amount of time, attach
>>> to the process with gdb, and post a backtrace from all threads?
>>> -Sam
>>>
>>> On Fri, Apr 6, 2012 at 12:32 PM, Damien Churchill <damoxc@gmail.com> wrote:
>>>> I've upped the monitor debug logging and re-uploaded the osd log, is that okay?
>>>>
>>>> http://damoxc.net/ceph/mon.node21.log.gz
>>>>
>>>> On 6 April 2012 19:20, Samuel Just <sam.just@dreamhost.com> wrote:
>>>>> Hmm, can you post the monitor logs as well for that period?  It looks
>>>>> like the osd is requesting a map change and not getting it.
>>>>> -Sam
>>>>>
>>>>> On Fri, Apr 6, 2012 at 10:59 AM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>>> Oops, was set to 600, chmod'd to 644, should be good now.
>>>>>>
>>>>>> On 6 April 2012 18:56, Samuel Just <sam.just@dreamhost.com> wrote:
>>>>>>> Error 403.
>>>>>>> -Sam
>>>>>>>
>>>>>>> On Fri, Apr 6, 2012 at 10:53 AM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>>>>> Okay, uploaded it to http://damoxc.net/ceph/osd.0.log.gz. I restarted
>>>>>>>> the osd with debug osd = 20 and let it run for 5 minutes or so.
>>>>>>>>
>>>>>>>> On 6 April 2012 18:30, Samuel Just <sam.just@dreamhost.com> wrote:
>>>>>>>>> Hmm, osd.0 should be enough.  I need osd debugging at around 20 from
>>>>>>>>> when the osd started.  Restarting the osd with debugging at 20 would
>>>>>>>>> also work fine.
>>>>>>>>> -Sam
>>>>>>>>>
>>>>>>>>> On Fri, Apr 6, 2012 at 9:55 AM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>>>>>>> I've got that directory on 3 of the osds: 0, 3 and 4. Do you want the
>>>>>>>>>> logs to all 3 of them?
>>>>>>>>>>
>>>>>>>>>> On 6 April 2012 17:50, Samuel Just <sam.just@dreamhost.com> wrote:
>>>>>>>>>>> Is there a 0.138_head directory under current/ on any of your osds?
>>>>>>>>>>> If so, can you post the log to that osd?  I could also use the osd.0
>>>>>>>>>>> log.
>>>>>>>>>>> -Sam
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 4, 2012 at 2:44 PM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>>>>>>>>> I've uploaded them to:
>>>>>>>>>>>>
>>>>>>>>>>>> http://damoxc.net/ceph/osdmap
>>>>>>>>>>>> http://damoxc.net/ceph/pg_dump
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> On 4 April 2012 21:51, Samuel Just <sam.just@dreamhost.com> wrote:
>>>>>>>>>>>>> Can you post a copy of your osd map and the output of 'ceph pg dump' ?
>>>>>>>>>>>>>  You can get the osdmap via 'ceph osd getmap -o <filename>'.
>>>>>>>>>>>>> -Sam
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 4, 2012 at 1:12 AM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm having some trouble getting some pgs to stop being inactive. The
>>>>>>>>>>>>>> cluster is running 0.44.1 and the kernel version is 3.2.x.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ceph -s reports:
>>>>>>>>>>>>>> 2012-04-04 09:08:57.816029    pg v188540: 990 pgs: 223 inactive, 767
>>>>>>>>>>>>>> active+clean; 205 GB data, 1013 GB used, 8204 GB / 9315 GB avail
>>>>>>>>>>>>>> 2012-04-04 09:08:57.817970   mds e2198: 1/1/1 up {0=node24=up:active},
>>>>>>>>>>>>>> 4 up:standby
>>>>>>>>>>>>>> 2012-04-04 09:08:57.818024   osd e5910: 5 osds: 5 up, 5 in
>>>>>>>>>>>>>> 2012-04-04 09:08:57.818201   log 2012-04-04 09:04:03.838358 osd.3
>>>>>>>>>>>>>> 172.22.10.24:6801/30000 159 : [INF] 0.13d scrub ok
>>>>>>>>>>>>>> 2012-04-04 09:08:57.818280   mon e7: 3 mons at
>>>>>>>>>>>>>> {node21=172.22.10.21:6789/0,node22=172.22.10.22:6789/0,node23=172.22.10.23:6789/0}
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ceph health says:
>>>>>>>>>>>>>> 2012-04-04 09:09:01.651053 mon <- [health]
>>>>>>>>>>>>>> 2012-04-04 09:09:01.666585 mon.1 -> 'HEALTH_WARN 223 pgs stuck
>>>>>>>>>>>>>> inactive; 223 pgs stuck unclean' (0)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was wondering if anyone has any suggestions about how to resolve
>>>>>>>>>>>>>> this, or things to look for. I've tried restarting the ceph daemons on
>>>>>>>>>>>>>> the various nodes a few times to no avail. I don't think that there is
>>>>>>>>>>>>>> anything wrong with any of the nodes either.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>> Damien
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
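For anyone scripting against this: the HEALTH_WARN line quoted above parses cleanly with a couple of regexes. A minimal Python sketch, assuming the message format shown in this thread:

```python
import re

def parse_health(line):
    # Extract the status word and per-state stuck-pg counts from a
    # 'ceph health' summary line (format as seen in this thread).
    m = re.search(r"HEALTH_(\w+)", line)
    status = m.group(1) if m else "UNKNOWN"
    counts = {state: int(n)
              for n, state in re.findall(r"(\d+) pgs stuck (\w+)", line)}
    return status, counts

status, counts = parse_health(
    "HEALTH_WARN 223 pgs stuck inactive; 223 pgs stuck unclean")
print(status, counts)
```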

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pgs stuck inactive
  2012-04-10 22:03                           ` Damien Churchill
@ 2012-04-10 23:00                             ` Samuel Just
  2012-04-10 23:40                               ` Greg Farnum
  0 siblings, 1 reply; 25+ messages in thread
From: Samuel Just @ 2012-04-10 23:00 UTC (permalink / raw)
  To: Damien Churchill; +Cc: ceph-devel

Can you send along the osd log as well for comparison?
-Sam

On Tue, Apr 10, 2012 at 3:03 PM, Damien Churchill <damoxc@gmail.com> wrote:
> Here are the monitor logs; they're from when the monitors started, but I
> restarted the osd shortly afterwards and left it to run for 5 or so
> minutes.
>
> http://damoxc.net/ceph/mon.node21.log.gz
> http://damoxc.net/ceph/mon.node22.log.gz
> http://damoxc.net/ceph/mon.node23.log.gz
>
> On 10 April 2012 22:49, Samuel Just <sam.just@dreamhost.com> wrote:
>> Nothing apparent from the backtrace.  I need monitor logs from when
>> the osd is sending pg_temp requests.  Can you restart the osd and post
>> the osd and all three monitor logs from when you restarted the osd?
>> You'll have to enable monitor logging on all three.
>> -Sam
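The logging Sam asks for can also be made persistent in ceph.conf before restarting the daemons — a sketch using the standard Ceph debug options; the exact levels shown are illustrative:

```ini
[mon]
    debug mon = 20
    debug ms = 1

[osd]
    debug osd = 20
    debug ms = 1
```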
>>
>> On Tue, Apr 10, 2012 at 4:34 AM, Damien Churchill <damoxc@gmail.com> wrote:
>>> Okay done that now:
>>>
>>> (gdb) thread apply all bt
>>>
>>> Thread 61 (Thread 0x7fb5cb00c700 (LWP 12771)):
>>> #0  0x00007fb5cccdf3f1 in sem_timedwait ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x00000000006981f7 in CephContextServiceThread::entry (this=0x2b52bc0)
>>>    at common/ceph_context.cc:53
>>> #2  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #4  0x0000000000000000 in ?? ()
>>>
>>> Thread 60 (Thread 0x7fb5ca80b700 (LWP 12774)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000068abf9 in AdminSocket::entry (this=0x2b5c000)
>>>    at common/admin_socket.cc:211
>>> #2  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #4  0x0000000000000000 in ?? ()
>>>
>>> Thread 59 (Thread 0x7fb5c5801700 (LWP 12831)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000067060e in SimpleMessenger::Accepter::entry (this=0x2b496b8)
>>>    at msg/SimpleMessenger.cc:209
>>> #2  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #4  0x0000000000000000 in ?? ()
>>>
>>> Thread 58 (Thread 0x7fb5c6002700 (LWP 12832)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000672aba in Wait (mutex=..., this=0x2b49aa0)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::reaper_entry (this=0x2b49680)
>>>    at msg/SimpleMessenger.cc:2336
>>> #3  0x000000000052122d in SimpleMessenger::ReaperThread::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:522
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 57 (Thread 0x7fb5c6803700 (LWP 12833)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000672aba in Wait (mutex=..., this=0x2b491a0)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::reaper_entry (this=0x2b48d80)
>>>    at msg/SimpleMessenger.cc:2336
>>> #3  0x000000000052122d in SimpleMessenger::ReaperThread::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:522
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 56 (Thread 0x7fb5c9008700 (LWP 12834)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000067060e in SimpleMessenger::Accepter::entry (this=0x2b49b38)
>>>    at msg/SimpleMessenger.cc:209
>>> #2  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #4  0x0000000000000000 in ?? ()
>>>
>>> Thread 55 (Thread 0x7fb5ca00a700 (LWP 12835)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000672aba in Wait (mutex=..., this=0x2b49f20)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::reaper_entry (this=0x2b49b00)
>>>    at msg/SimpleMessenger.cc:2336
>>> #3  0x000000000052122d in SimpleMessenger::ReaperThread::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:522
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 54 (Thread 0x7fb5c9809700 (LWP 12836)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000067060e in SimpleMessenger::Accepter::entry (this=0x2b49238)
>>>    at msg/SimpleMessenger.cc:209
>>> #2  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #4  0x0000000000000000 in ?? ()
>>>
>>> Thread 53 (Thread 0x7fb5c8807700 (LWP 12837)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000672aba in Wait (mutex=..., this=0x2b49620)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::reaper_entry (this=0x2b49200)
>>>    at msg/SimpleMessenger.cc:2336
>>> #3  0x000000000052122d in SimpleMessenger::ReaperThread::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:522
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 52 (Thread 0x7fb5c8006700 (LWP 12838)):
>>> #0  0x00007fb5cb302613 in select () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x00000000006fb210 in SignalHandler::entry (this=0x2b4e420)
>>>    at global/signal_handler.cc:201
>>> #2  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #3  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #4  0x0000000000000000 in ?? ()
>>>
>>> Thread 51 (Thread 0x7fb5c7805700 (LWP 12839)):
>>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000685bf7 in WaitUntil (when=<optimized out>, mutex=...,
>>>    this=0x2b66060) at common/Cond.h:67
>>> #2  SafeTimer::timer_thread (this=0x2b66050) at common/Timer.cc:110
>>> #3  0x000000000068666d in SafeTimerThread::entry (this=<optimized out>)
>>>    at common/Timer.cc:38
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 50 (Thread 0x7fb5c7004700 (LWP 12840)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000685853 in Wait (mutex=..., this=0x2b67460) at common/Cond.h:48
>>> #2  SafeTimer::timer_thread (this=0x2b67450) at common/Timer.cc:108
>>> #3  0x000000000068666d in SafeTimerThread::entry (this=<optimized out>)
>>>    at common/Timer.cc:38
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 49 (Thread 0x7fb5c5000700 (LWP 12854)):
>>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x00000000007992d0 in WaitUntil (when=<optimized out>, mutex=...,
>>>    this=0x2b63500) at ./common/Cond.h:67
>>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>>    this=0x2b63500) at ./common/Cond.h:74
>>> #3  FileStore::sync_entry (this=0x2b63000) at os/FileStore.cc:3352
>>> #4  0x00000000007a65ed in FileStore::SyncThread::entry (this=<optimized out>)
>>>    at os/FileStore.h:103
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 48 (Thread 0x7fb5c47ff700 (LWP 12859)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x000000000077807c in Wait (mutex=..., this=0x2b6ab10)
>>>    at ./common/Cond.h:48
>>> #2  FileJournal::write_thread_entry (this=0x2b6a800) at os/FileJournal.cc:1052
>>> #3  0x00000000005d4c7d in FileJournal::Writer::entry (this=<optimized out>)
>>>    at ./os/FileJournal.h:249
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 47 (Thread 0x7fb5c3ffe700 (LWP 12860)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x000000000077222f in Wait (mutex=..., this=0x2b6a8c0)
>>>    at ./common/Cond.h:48
>>> #2  FileJournal::write_finish_thread_entry (this=0x2b6a800)
>>>    at os/FileJournal.cc:1245
>>> #3  0x00000000005d4c5d in FileJournal::WriteFinisher::entry (
>>>    this=<optimized out>) at ./os/FileJournal.h:259
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 46 (Thread 0x7fb5c37fd700 (LWP 12861)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000761f8e in Wait (mutex=..., this=0x2b630b8)
>>>    at ./common/Cond.h:48
>>> #2  Finisher::finisher_thread_entry (this=0x2b63070) at common/Finisher.cc:76
>>> #3  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #4  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #5  0x0000000000000000 in ?? ()
>>>
>>> Thread 45 (Thread 0x7fb5c2ffc700 (LWP 12862)):
>>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>>>    this=0x2b63868) at common/Cond.h:67
>>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>>    this=0x2b63868) at common/Cond.h:74
>>> #3  ThreadPool::worker (this=0x2b63810) at common/WorkQueue.cc:71
>>> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>>>    at ./common/WorkQueue.h:121
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 44 (Thread 0x7fb5c27fb700 (LWP 12863)):
>>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>>>    this=0x2b63868) at common/Cond.h:67
>>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>>    this=0x2b63868) at common/Cond.h:74
>>> #3  ThreadPool::worker (this=0x2b63810) at common/WorkQueue.cc:71
>>> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>>>    at ./common/WorkQueue.h:121
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 43 (Thread 0x7fb5c1ffa700 (LWP 12864)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000781ebd in Wait (mutex=..., this=0x2b63970)
>>>    at ./common/Cond.h:48
>>> #2  FileStore::flusher_entry (this=0x2b63000) at os/FileStore.cc:3312
>>> #3  0x00000000007a49ed in FileStore::FlusherThread::entry (
>>>    this=<optimized out>) at os/FileStore.h:242
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 42 (Thread 0x7fb5c17f9700 (LWP 12865)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000761f8e in Wait (mutex=..., this=0x2b63750)
>>>    at ./common/Cond.h:48
>>> #2  Finisher::finisher_thread_entry (this=0x2b63708) at common/Finisher.cc:76
>>> #3  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #4  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #5  0x0000000000000000 in ?? ()
>>>
>>> Thread 41 (Thread 0x7fb5c0ff8700 (LWP 12866)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000761f8e in Wait (mutex=..., this=0x2b63400)
>>>    at ./common/Cond.h:48
>>> #2  Finisher::finisher_thread_entry (this=0x2b633b8) at common/Finisher.cc:76
>>> #3  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #4  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #5  0x0000000000000000 in ?? ()
>>>
>>> Thread 40 (Thread 0x7fb5c07f7700 (LWP 12867)):
>>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000685bf7 in WaitUntil (when=<optimized out>, mutex=...,
>>>    this=0x2b63590) at common/Cond.h:67
>>> #2  SafeTimer::timer_thread (this=0x2b63580) at common/Timer.cc:110
>>> #3  0x000000000068666d in SafeTimerThread::entry (this=<optimized out>)
>>>    at common/Timer.cc:38
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 39 (Thread 0x7fb5bfff6700 (LWP 12868)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x00000000006748da in Wait (mutex=..., this=0x2b49718)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::dispatch_entry (this=0x2b49680)
>>>    at msg/SimpleMessenger.cc:374
>>> #3  0x000000000052120d in SimpleMessenger::DispatchThread::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:559
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 38 (Thread 0x7fb5bf7f5700 (LWP 12869)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x00000000006748da in Wait (mutex=..., this=0x2b49298)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::dispatch_entry (this=0x2b49200)
>>>    at msg/SimpleMessenger.cc:374
>>> #3  0x000000000052120d in SimpleMessenger::DispatchThread::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:559
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 37 (Thread 0x7fb5beff4700 (LWP 12870)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x00000000006748da in Wait (mutex=..., this=0x2b48e18)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::dispatch_entry (this=0x2b48d80)
>>>    at msg/SimpleMessenger.cc:374
>>> #3  0x000000000052120d in SimpleMessenger::DispatchThread::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:559
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 36 (Thread 0x7fb5be7f3700 (LWP 12871)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x00000000006748da in Wait (mutex=..., this=0x2b49b98)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::dispatch_entry (this=0x2b49b00)
>>>    at msg/SimpleMessenger.cc:374
>>> #3  0x000000000052120d in SimpleMessenger::DispatchThread::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:559
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 35 (Thread 0x7fb5bdff2700 (LWP 12872)):
>>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000685bf7 in WaitUntil (when=<optimized out>, mutex=...,
>>>    this=0x7fff73395748) at common/Cond.h:67
>>> #2  SafeTimer::timer_thread (this=0x7fff73395738) at common/Timer.cc:110
>>> #3  0x000000000068666d in SafeTimerThread::entry (this=<optimized out>)
>>>    at common/Timer.cc:38
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 34 (Thread 0x7fb5bd7f1700 (LWP 12873)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000761f8e in Wait (mutex=..., this=0x7fff73395838)
>>>    at ./common/Cond.h:48
>>> #2  Finisher::finisher_thread_entry (this=0x7fff733957f0)
>>>    at common/Finisher.cc:76
>>> #3  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #4  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #5  0x0000000000000000 in ?? ()
>>>
>>> Thread 33 (Thread 0x7fb5cd2e3700 (LWP 12874)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4965950)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::Pipe::writer (this=0x4965780)
>>>    at msg/SimpleMessenger.cc:1821
>>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 32 (Thread 0x7fb5bcff0700 (LWP 12877)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>>    timeout=<optimized out>) at msg/tcp.cc:53
>>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=31,
>>>    buf=0x7fb5bcfefd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4965780)
>>>    at msg/SimpleMessenger.cc:1606
>>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 31 (Thread 0x7fb5bceef700 (LWP 12878)):
>>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>>>    this=0x2b66470) at common/Cond.h:67
>>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>>    this=0x2b66470) at common/Cond.h:74
>>> #3  ThreadPool::worker (this=0x2b66418) at common/WorkQueue.cc:71
>>> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>>>    at ./common/WorkQueue.h:121
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 30 (Thread 0x7fb5bc6ee700 (LWP 12879)):
>>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>>>    this=0x2b66470) at common/Cond.h:67
>>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>>    this=0x2b66470) at common/Cond.h:74
>>> #3  ThreadPool::worker (this=0x2b66418) at common/WorkQueue.cc:71
>>> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>>>    at ./common/WorkQueue.h:121
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 29 (Thread 0x7fb5bbeed700 (LWP 12880)):
>>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>>>    this=0x2b665a0) at common/Cond.h:67
>>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>>    this=0x2b665a0) at common/Cond.h:74
>>> #3  ThreadPool::worker (this=0x2b66548) at common/WorkQueue.cc:71
>>> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>>>    at ./common/WorkQueue.h:121
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 28 (Thread 0x7fb5bb6ec700 (LWP 12881)):
>>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>>>    this=0x2b666d0) at common/Cond.h:67
>>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>>    this=0x2b666d0) at common/Cond.h:74
>>> #3  ThreadPool::worker (this=0x2b66678) at common/WorkQueue.cc:71
>>> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>>>    at ./common/WorkQueue.h:121
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 27 (Thread 0x7fb5baeeb700 (LWP 12882)):
>>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000688a05 in WaitUntil (when=<optimized out>, mutex=...,
>>>    this=0x2b66800) at common/Cond.h:67
>>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>>    this=0x2b66800) at common/Cond.h:74
>>> #3  ThreadPool::worker (this=0x2b667a8) at common/WorkQueue.cc:71
>>> #4  0x00000000005d4cad in ThreadPool::WorkThread::entry (this=<optimized out>)
>>>    at ./common/WorkQueue.h:121
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 26 (Thread 0x7fb5ba6ea700 (LWP 12883)):
>>> #0  0x00007fb5cccdd3cb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x00000000005b43f2 in WaitUntil (when=<optimized out>, mutex=...,
>>>    this=0x2b66918) at ./common/Cond.h:67
>>> #2  WaitInterval (interval=<optimized out>, mutex=..., cct=<optimized out>,
>>>    this=0x2b66918) at ./common/Cond.h:74
>>> #3  OSD::heartbeat_entry (this=0x2b66000) at osd/OSD.cc:1699
>>> #4  0x00000000005e68ad in OSD::T_Heartbeat::entry (this=<optimized out>)
>>>    at osd/OSD.h:280
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 25 (Thread 0x7fb5b9ee9700 (LWP 12885)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4965e50)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::Pipe::writer (this=0x4965c80)
>>>    at msg/SimpleMessenger.cc:1821
>>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 24 (Thread 0x7fb5b9de8700 (LWP 12886)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4995450)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::Pipe::writer (this=0x4995280)
>>>    at msg/SimpleMessenger.cc:1821
>>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 23 (Thread 0x7fb5b9ce7700 (LWP 12887)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49951d0)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::Pipe::writer (this=0x4995000)
>>>    at msg/SimpleMessenger.cc:1821
>>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 22 (Thread 0x7fb5b9be6700 (LWP 12888)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4995bd0)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::Pipe::writer (this=0x4995a00)
>>>    at msg/SimpleMessenger.cc:1821
>>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 21 (Thread 0x7fb5b9ae5700 (LWP 12889)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4995950)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::Pipe::writer (this=0x4995780)
>>>    at msg/SimpleMessenger.cc:1821
>>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 20 (Thread 0x7fb5b99e4700 (LWP 12890)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49956d0)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::Pipe::writer (this=0x4995500)
>>>    at msg/SimpleMessenger.cc:1821
>>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 19 (Thread 0x7fb5b98e3700 (LWP 12891)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4995e50)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::Pipe::writer (this=0x4995c80)
>>>    at msg/SimpleMessenger.cc:1821
>>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 18 (Thread 0x7fb5b97e2700 (LWP 12892)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4d62bd0)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::Pipe::writer (this=0x4d62a00)
>>>    at msg/SimpleMessenger.cc:1821
>>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 17 (Thread 0x7fb5b96e1700 (LWP 12893)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>>    timeout=<optimized out>) at msg/tcp.cc:53
>>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=33,
>>>    buf=0x7fb5b96e0d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995280)
>>>    at msg/SimpleMessenger.cc:1606
>>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 16 (Thread 0x7fb5b95e0700 (LWP 12894)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>>    timeout=<optimized out>) at msg/tcp.cc:53
>>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=35,
>>>    buf=0x7fb5b95dfd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995a00)
>>>    at msg/SimpleMessenger.cc:1606
>>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 15 (Thread 0x7fb5b94df700 (LWP 12895)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>>    timeout=<optimized out>) at msg/tcp.cc:53
>>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=32,
>>>    buf=0x7fb5b94ded9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4965c80)
>>>    at msg/SimpleMessenger.cc:1606
>>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 14 (Thread 0x7fb5b93de700 (LWP 12896)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>>    timeout=<optimized out>) at msg/tcp.cc:53
>>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=38,
>>>    buf=0x7fb5b93ddd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995c80)
>>>    at msg/SimpleMessenger.cc:1606
>>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 13 (Thread 0x7fb5b92dd700 (LWP 12897)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>>    timeout=<optimized out>) at msg/tcp.cc:53
>>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=34,
>>>    buf=0x7fb5b92dcd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995000)
>>>    at msg/SimpleMessenger.cc:1606
>>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 12 (Thread 0x7fb5b91dc700 (LWP 12898)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>>    timeout=<optimized out>) at msg/tcp.cc:53
>>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=36,
>>>    buf=0x7fb5b91dbd9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995780)
>>>    at msg/SimpleMessenger.cc:1606
>>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 11 (Thread 0x7fb5b90db700 (LWP 12900)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>>    timeout=<optimized out>) at msg/tcp.cc:53
>>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=39,
>>>    buf=0x7fb5b90dad9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4d62a00)
>>>    at msg/SimpleMessenger.cc:1606
>>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 10 (Thread 0x7fb5b8fda700 (LWP 12899)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>>    timeout=<optimized out>) at msg/tcp.cc:53
>>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=37,
>>>    buf=0x7fb5b8fd9d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4995500)
>>>    at msg/SimpleMessenger.cc:1606
>>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 9 (Thread 0x7fb5b8ed9700 (LWP 12903)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>>    timeout=<optimized out>) at msg/tcp.cc:53
>>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=40,
>>>    buf=0x7fb5b8ed8d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x4d62c80)
>>>    at msg/SimpleMessenger.cc:1606
>>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 8 (Thread 0x7fb5b8dd8700 (LWP 12904)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x4d62e50)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::Pipe::writer (this=0x4d62c80)
>>>    at msg/SimpleMessenger.cc:1821
>>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 7 (Thread 0x7fb5b8cd7700 (LWP 12908)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>>    timeout=<optimized out>) at msg/tcp.cc:53
>>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=42,
>>>    buf=0x7fb5b8cd6d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x49ed280)
>>>    at msg/SimpleMessenger.cc:1606
>>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 6 (Thread 0x7fb5b8bd6700 (LWP 12909)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49ed450)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::Pipe::writer (this=0x49ed280)
>>>    at msg/SimpleMessenger.cc:1821
>>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 5 (Thread 0x7fb5b8ad5700 (LWP 12910)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>>    timeout=<optimized out>) at msg/tcp.cc:53
>>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=41,
>>>    buf=0x7fb5b8ad4d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x49ed000)
>>>    at msg/SimpleMessenger.cc:1606
>>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 4 (Thread 0x7fb5b89d4700 (LWP 12911)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49ed1d0)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::Pipe::writer (this=0x49ed000)
>>>    at msg/SimpleMessenger.cc:1821
>>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 3 (Thread 0x7fb5b88d3700 (LWP 12912)):
>>> #0  0x00007fb5cb2fd473 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x000000000066af87 in tcp_read_wait (sd=<optimized out>,
>>>    timeout=<optimized out>) at msg/tcp.cc:53
>>> #2  0x000000000066b238 in tcp_read (cct=0x2b4e000, sd=43,
>>>    buf=0x7fb5b88d2d9f "\377", len=1, timeout=900000) at msg/tcp.cc:26
>>> #3  0x000000000067e2d3 in SimpleMessenger::Pipe::reader (this=0x49eda00)
>>>    at msg/SimpleMessenger.cc:1606
>>> #4  0x000000000052126d in SimpleMessenger::Pipe::Reader::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:250
>>> #5  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #6  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> Thread 2 (Thread 0x7fb5b87d2700 (LWP 12913)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000679f1a in Wait (mutex=..., this=0x49edbd0)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::Pipe::writer (this=0x49eda00)
>>>    at msg/SimpleMessenger.cc:1821
>>> #3  0x000000000052124d in SimpleMessenger::Pipe::Writer::entry (
>>>    this=<optimized out>) at msg/SimpleMessenger.h:258
>>> #4  0x00007fb5cccd8efc in start_thread ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #5  0x00007fb5cb30959d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #6  0x0000000000000000 in ?? ()
>>>
>>> Thread 1 (Thread 0x7fb5cd3017a0 (LWP 12770)):
>>> #0  0x00007fb5cccdd04c in pthread_cond_wait@@GLIBC_2.3.2 ()
>>>   from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x0000000000672c47 in Wait (mutex=..., this=0x2b49838)
>>>    at ./common/Cond.h:48
>>> #2  SimpleMessenger::wait (this=0x2b49680) at msg/SimpleMessenger.cc:2654
>>> #3  0x000000000051efc7 in main (argc=<optimized out>, argv=<optimized out>)
>>>    at ceph_osd.cc:423
>>>
>>>
>>> On 6 April 2012 22:06, Samuel Just <sam.just@dreamhost.com> wrote:
>>>> Actually, can you restart osd0, wait the same amount of time, attach
>>>> to the process with gdb, and post a backtrace from all threads?
>>>> -Sam
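[Editorial note: Sam's request above — attach to the process with gdb and get a backtrace from all threads — can be done non-interactively. A sketch, assuming a single ceph-osd process on the node (adjust the `pidof` if you run several):

```shell
# Attach gdb to the running ceph-osd, dump a backtrace from every
# thread, then detach; --batch keeps the pause on the daemon short.
gdb --batch -p "$(pidof ceph-osd)" \
    -ex "thread apply all bt" > osd-threads.txt 2>&1
```

The resulting osd-threads.txt is what Damien pasted above.]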
>>>>
>>>> On Fri, Apr 6, 2012 at 12:32 PM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>> I've upped the monitor debug logging and re-uploaded the osd log, is that okay?
>>>>>
>>>>> http://damoxc.net/ceph/mon.node21.log.gz
>>>>>
>>>>> On 6 April 2012 19:20, Samuel Just <sam.just@dreamhost.com> wrote:
>>>>>> Hmm, can you post the monitor logs as well for that period?  It looks
>>>>>> like the osd is requesting a map change and not getting it.
>>>>>> -Sam
>>>>>>
>>>>>> On Fri, Apr 6, 2012 at 10:59 AM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>>>> Oops, was set to 600, chmod'd to 644, should be good now.
>>>>>>>
>>>>>>> On 6 April 2012 18:56, Samuel Just <sam.just@dreamhost.com> wrote:
>>>>>>>> Error 403.
>>>>>>>> -Sam
>>>>>>>>
>>>>>>>> On Fri, Apr 6, 2012 at 10:53 AM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>>>>>> Okay, uploaded it to http://damoxc.net/ceph/osd.0.log.gz. I restarted
>>>>>>>>> the osd with debug osd = 20 and let it run for 5 minutes or so.
>>>>>>>>>
>>>>>>>>> On 6 April 2012 18:30, Samuel Just <sam.just@dreamhost.com> wrote:
>>>>>>>>>> Hmm, osd.0 should be enough.  I need osd debugging at around 20 from
>>>>>>>>>> when the osd started.  Restarting the osd with debugging at 20 would
>>>>>>>>>> also work fine.
>>>>>>>>>> -Sam
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 6, 2012 at 9:55 AM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>>>>>>>> I've got that directory on 3 of the osds: 0, 3 and 4. Do you want the
>>>>>>>>>>> logs to all 3 of them?
>>>>>>>>>>>
>>>>>>>>>>> On 6 April 2012 17:50, Samuel Just <sam.just@dreamhost.com> wrote:
>>>>>>>>>>>> Is there a 0.138_head directory under current/ on any of your osds?
>>>>>>>>>>>> If so, can you post the log to that osd?  I could also use the osd.0
>>>>>>>>>>>> log.
>>>>>>>>>>>> -Sam
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 4, 2012 at 2:44 PM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>>>>>>>>>> I've uploaded them to:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://damoxc.net/ceph/osdmap
>>>>>>>>>>>>> http://damoxc.net/ceph/pg_dump
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 4 April 2012 21:51, Samuel Just <sam.just@dreamhost.com> wrote:
>>>>>>>>>>>>>> Can you post a copy of your osd map and the output of 'ceph pg dump' ?
>>>>>>>>>>>>>>  You can get the osdmap via 'ceph osd getmap -o <filename>'.
>>>>>>>>>>>>>> -Sam
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Apr 4, 2012 at 1:12 AM, Damien Churchill <damoxc@gmail.com> wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm having some trouble getting some pgs to stop being inactive. The
>>>>>>>>>>>>>>> cluster is running 0.44.1 and the kernel version is 3.2.x.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ceph -s reports:
>>>>>>>>>>>>>>> 2012-04-04 09:08:57.816029    pg v188540: 990 pgs: 223 inactive, 767
>>>>>>>>>>>>>>> active+clean; 205 GB data, 1013 GB used, 8204 GB / 9315 GB avail
>>>>>>>>>>>>>>> 2012-04-04 09:08:57.817970   mds e2198: 1/1/1 up {0=node24=up:active},
>>>>>>>>>>>>>>> 4 up:standby
>>>>>>>>>>>>>>> 2012-04-04 09:08:57.818024   osd e5910: 5 osds: 5 up, 5 in
>>>>>>>>>>>>>>> 2012-04-04 09:08:57.818201   log 2012-04-04 09:04:03.838358 osd.3
>>>>>>>>>>>>>>> 172.22.10.24:6801/30000 159 : [INF] 0.13d scrub ok
>>>>>>>>>>>>>>> 2012-04-04 09:08:57.818280   mon e7: 3 mons at
>>>>>>>>>>>>>>> {node21=172.22.10.21:6789/0,node22=172.22.10.22:6789/0,node23=172.22.10.23:6789/0}
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ceph health says:
>>>>>>>>>>>>>>> 2012-04-04 09:09:01.651053 mon <- [health]
>>>>>>>>>>>>>>> 2012-04-04 09:09:01.666585 mon.1 -> 'HEALTH_WARN 223 pgs stuck
>>>>>>>>>>>>>>> inactive; 223 pgs stuck unclean' (0)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I was wondering if anyone has any suggestions about how to resolve
>>>>>>>>>>>>>>> this, or things to look for. I've tried restarted the ceph daemons on
>>>>>>>>>>>>>>> the various nodes a few times to no-avail. I don't think that there is
>>>>>>>>>>>>>>> anything wrong with any of the nodes either.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>> Damien
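For anyone following along: the all-thread backtrace Sam asked for can be captured non-interactively, roughly like this (a sketch, not taken from this thread; it assumes gdb and the ceph debug symbols are installed, and the pid lookup and output path are placeholders):

```shell
# Attach gdb to the running ceph-osd, dump a backtrace of every thread,
# then detach so the daemon keeps running.
# (Placeholders: adjust the pid lookup and the output path as needed.)
OSD_PID=$(pidof ceph-osd | awk '{print $1}')

gdb --batch --quiet \
    -ex 'thread apply all bt' \
    -ex 'detach' \
    -p "$OSD_PID" > /tmp/ceph-osd-backtrace.txt 2>&1
```

Running it in batch mode keeps the daemon paused only for as long as the dump itself takes.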
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: pgs stuck inactive
  2012-04-10 23:00                             ` Samuel Just
@ 2012-04-10 23:40                               ` Greg Farnum
  2012-04-12 15:29                                 ` Damien Churchill
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Farnum @ 2012-04-10 23:40 UTC (permalink / raw)
  To: Samuel Just, Damien Churchill; +Cc: ceph-devel

On Tuesday, April 10, 2012 at 4:00 PM, Samuel Just wrote:
> Can you send along the osd log as well for comparison?
> -Sam
> 
> On Tue, Apr 10, 2012 at 3:03 PM, Damien Churchill <damoxc@gmail.com (mailto:damoxc@gmail.com)> wrote:
> > Here are the monitor logs, they're from the monitor starts, however I
> > restarted the osd shortly afterwards and left it to run for 5 or so
> > minutes.
> > 
> > http://damoxc.net/ceph/mon.node21.log.gz
> > http://damoxc.net/ceph/mon.node22.log.gz
> > http://damoxc.net/ceph/mon.node23.log.gz
> > 
> > On 10 April 2012 22:49, Samuel Just <sam.just@dreamhost.com (mailto:sam.just@dreamhost.com)> wrote:
> > > Nothing apparent from the backtrace. I need monitor logs from when
> > > the osd is sending pg_temp requests. Can you restart the osd and post
> > > the osd and all three monitor logs from when you restarted the osd?
> > > You'll have to enable monitor logging on all three.
> > > -Sam
> > 
> 


A quick glance through these shows that all the pg_temp requests aren't actually requesting any changes from the monitor. It's either a very serious mon bug which happened a while ago (unlikely, given the restarts and ongoing map changes, etc), or an OSD bug. I think we want logs from both osd.0 and osd.3 at the same time, from what I'm seeing. :)
-Greg
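As an aside, the debug logging being requested here can be raised on live daemons without a restart via injectargs; a sketch (the daemon ids and levels are illustrative, not prescribed by this thread):

```shell
# Bump debug levels on all monitors and on osd.0 at runtime.
# (Illustrative ids; repeat the osd line for each osd of interest.)
ceph mon tell \* injectargs '--debug-mon 20 --debug-ms 1'
ceph osd tell 0 injectargs '--debug-osd 20 --debug-ms 1'

# To make the levels survive a restart, add to ceph.conf instead:
#   [mon]
#       debug mon = 20
#   [osd]
#       debug osd = 20
```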



* Re: pgs stuck inactive
  2012-04-10 23:40                               ` Greg Farnum
@ 2012-04-12 15:29                                 ` Damien Churchill
  2012-04-13 19:30                                   ` Greg Farnum
  0 siblings, 1 reply; 25+ messages in thread
From: Damien Churchill @ 2012-04-12 15:29 UTC (permalink / raw)
  To: Greg Farnum; +Cc: Samuel Just, ceph-devel

On 11 April 2012 00:40, Greg Farnum <gregory.farnum@dreamhost.com> wrote:
>
> A quick glance through these shows that all the pg_temp requests aren't actually requesting any changes from the monitor. It's either a very serious mon bug which happened a while ago (unlikely, given the restarts and ongoing map changes, etc), or an OSD bug. I think we want logs from both osd.0 and osd.3 at the same time, from what I'm seeing. :)
> -Greg
>

Just to make sure all bases are covered:

http://damoxc.net/ceph/ceph-logs-20120412142537.tar.gz

This contains all 5 osd logs and all 3 monitor logs, everything
restarted with debug logging prior to capturing the logs.


* Re: pgs stuck inactive
  2012-04-12 15:29                                 ` Damien Churchill
@ 2012-04-13 19:30                                   ` Greg Farnum
  2012-04-13 19:42                                     ` Damien Churchill
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Farnum @ 2012-04-13 19:30 UTC (permalink / raw)
  To: Damien Churchill; +Cc: Samuel Just, ceph-devel

On Thursday, April 12, 2012 at 8:29 AM, Damien Churchill wrote:
> On 11 April 2012 00:40, Greg Farnum <gregory.farnum@dreamhost.com (mailto:gregory.farnum@dreamhost.com)> wrote:
> >  
> > A quick glance through these shows that all the pg_temp requests aren't actually requesting any changes from the monitor. It's either a very serious mon bug which happened a while ago (unlikely, given the restarts and ongoing map changes, etc), or an OSD bug. I think we want logs from both osd.0 and osd.3 at the same time, from what I'm seeing. :)
> > -Greg
>  
>  
>  
> Just to make sure all bases are covered:
>  
> http://damoxc.net/ceph/ceph-logs-20120412142537.tar.gz
>  
> This contains all 5 osd logs and all 3 monitor logs, everything
> restarted with debug logging prior to capturing the logs.

I (and Sam) spent some time looking at this very closely. It continues to tell me that the OSD and the monitor are disagreeing on whether osd 3 should be in the pg temp set for some things, but they seem to agree on everything else….  
Can you zip up for me:
1) The files matching osdmap* of osd0's store from the current/meta/ directory,
2) The contents of your lead monitor's osdmap and osdmap_full directories?

We can check these for differences and then run them through some of our tools and stuff to try and identify the issue.
Thanks!
-Greg
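Collecting the requested files might look like the following; the data directories are placeholders, since the real locations come from the 'osd data' and 'mon data' settings in ceph.conf:

```shell
# Placeholders: substitute the 'osd data' and 'mon data' directories
# from ceph.conf for this cluster.
OSD_DATA=/path/to/osd.0
MON_DATA=/path/to/mon.node21

# 1) The osdmap* files from osd.0's current/meta/ directory.
( cd "$OSD_DATA/current/meta" && tar czf /tmp/osdmap.0.tar.gz osdmap* )

# 2) The lead monitor's osdmap and osdmap_full directories.
( cd "$MON_DATA" && tar czf /tmp/mon.osdmap.tar.gz osdmap osdmap_full )
```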



* Re: pgs stuck inactive
  2012-04-13 19:30                                   ` Greg Farnum
@ 2012-04-13 19:42                                     ` Damien Churchill
  2012-04-17  0:06                                       ` Greg Farnum
  0 siblings, 1 reply; 25+ messages in thread
From: Damien Churchill @ 2012-04-13 19:42 UTC (permalink / raw)
  To: Greg Farnum; +Cc: Samuel Just, ceph-devel

Hi,

On 13 April 2012 20:30, Greg Farnum <gregory.farnum@dreamhost.com> wrote:
> On Thursday, April 12, 2012 at 8:29 AM, Damien Churchill wrote:
>> On 11 April 2012 00:40, Greg Farnum <gregory.farnum@dreamhost.com (mailto:gregory.farnum@dreamhost.com)> wrote:
>> >
>> > A quick glance through these shows that all the pg_temp requests aren't actually requesting any changes from the monitor. It's either a very serious mon bug which happened a while ago (unlikely, given the restarts and ongoing map changes, etc), or an OSD bug. I think we want logs from both osd.0 and osd.3 at the same time, from what I'm seeing. :)
>> > -Greg
>>
>>
>>
>> Just to make sure all bases are covered:
>>
>> http://damoxc.net/ceph/ceph-logs-20120412142537.tar.gz
>>
>> This contains all 5 osd logs and all 3 monitor logs, everything
>> restarted with debug logging prior to capturing the logs.
>
> I (and Sam) spent some time looking at this very closely. It continues to tell me that the OSD and the monitor are disagreeing on whether osd 3 should be in the pg temp set for some things, but they seem to agree on everything else….
> Can you zip up for me:
> 1) The files matching osdmap* of osd0's store from the current/meta/ directory,
> 2) The contents of your lead monitor's osdmap and osdmap_full directories?
>

Here they are

http://damoxc.net/ceph/osdmap.0.tar.gz
http://damoxc.net/ceph/mon.node21.osdmap.tar.gz

Hopefully I got the right files :)

Regards,
Damien


* Re: pgs stuck inactive
  2012-04-13 19:42                                     ` Damien Churchill
@ 2012-04-17  0:06                                       ` Greg Farnum
  2012-04-17  5:32                                         ` Damien Churchill
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Farnum @ 2012-04-17  0:06 UTC (permalink / raw)
  To: Damien Churchill; +Cc: Samuel Just, ceph-devel



On Friday, April 13, 2012 at 12:42 PM, Damien Churchill wrote:

> Hi,
>  
> On 13 April 2012 20:30, Greg Farnum <gregory.farnum@dreamhost.com (mailto:gregory.farnum@dreamhost.com)> wrote:
> > On Thursday, April 12, 2012 at 8:29 AM, Damien Churchill wrote:
> > > On 11 April 2012 00:40, Greg Farnum <gregory.farnum@dreamhost.com (mailto:gregory.farnum@dreamhost.com)> wrote:
> > > >  
> > > > A quick glance through these shows that all the pg_temp requests aren't actually requesting any changes from the monitor. It's either a very serious mon bug which happened a while ago (unlikely, given the restarts and ongoing map changes, etc), or an OSD bug. I think we want logs from both osd.0 and osd.3 at the same time, from what I'm seeing. :)
> > > > -Greg
> > >  
> > >  
> > >  
> > >  
> > >  
> > > Just to make sure all bases are covered:
> > >  
> > > http://damoxc.net/ceph/ceph-logs-20120412142537.tar.gz
> > >  
> > > This contains all 5 osd logs and all 3 monitor logs, everything
> > > restarted with debug logging prior to capturing the logs.
> >  
> >  
> >  
> > I (and Sam) spent some time looking at this very closely. It continues to tell me that the OSD and the monitor are disagreeing on whether osd 3 should be in the pg temp set for some things, but they seem to agree on everything else….
> > Can you zip up for me:
> > 1) The files matching osdmap* of osd0's store from the current/meta/ directory,
> > 2) The contents of your lead monitor's osdmap and osdmap_full directories?
>  
>  
>  
> Here they are
>  
> http://damoxc.net/ceph/osdmap.0.tar.gz
> http://damoxc.net/ceph/mon.node21.osdmap.tar.gz
>  
> Hopefully I got the right files :)
Yep!
We looked into this more today and have discovered some definite oddness. Have you by any chance tried to change the number of PGs in your pools?



* Re: pgs stuck inactive
  2012-04-17  0:06                                       ` Greg Farnum
@ 2012-04-17  5:32                                         ` Damien Churchill
  2012-04-17 16:49                                           ` Greg Farnum
  0 siblings, 1 reply; 25+ messages in thread
From: Damien Churchill @ 2012-04-17  5:32 UTC (permalink / raw)
  To: Greg Farnum; +Cc: Samuel Just, ceph-devel

On 17 April 2012 01:06, Greg Farnum <gregory.farnum@dreamhost.com> wrote:
> Yep!
> We looked into this more today and have discovered some definite oddness. Have you by any chance tried to change the number of PGs in your pools?

I haven't no (at least certainly not on purpose!). All I've really
done is copy a bit of stuff onto the unix fs and create a few rbd
volumes, as well as upgrade it when a new version comes out.


* Re: pgs stuck inactive
  2012-04-17  5:32                                         ` Damien Churchill
@ 2012-04-17 16:49                                           ` Greg Farnum
  2012-04-18  4:35                                             ` Martin Wilderoth
  2012-04-18  6:41                                             ` Damien Churchill
  0 siblings, 2 replies; 25+ messages in thread
From: Greg Farnum @ 2012-04-17 16:49 UTC (permalink / raw)
  To: Damien Churchill; +Cc: Samuel Just, ceph-devel

On Monday, April 16, 2012 at 10:32 PM, Damien Churchill wrote:
> On 17 April 2012 01:06, Greg Farnum <gregory.farnum@dreamhost.com (mailto:gregory.farnum@dreamhost.com)> wrote:
> > Yep!
> > We looked into this more today and have discovered some definite oddness. Have you by any chance tried to change the number of PGs in your pools?
> 
> 
> 
> I haven't no (at least certainly not on purpose!). All I've really
> done is copy a bit of stuff onto the unix fs and create a few rbd
> volumes, as well as upgrade it when a new version comes out.

Drat, that means there's actually a problem to track down somewhere. 
Do you know what version this was created with, and what upgrades you've been through? My best guess right now is that there's a problem with the encoding and decoding that I'm going to have to track down, and more context will make it a lot easier. :)
-Greg



* Re: pgs stuck inactive
  2012-04-17 16:49                                           ` Greg Farnum
@ 2012-04-18  4:35                                             ` Martin Wilderoth
  2012-04-18  6:41                                             ` Damien Churchill
  1 sibling, 0 replies; 25+ messages in thread
From: Martin Wilderoth @ 2012-04-18  4:35 UTC (permalink / raw)
  To: ceph-devel

Hello,

I think I have a similar problem, but only 2 pgs inactive and 2 pgs unclean.
They have been there since yesterday. I'm running kernel 3.2 and ceph 0.45

 /Regards Martin

----- Original message -----

From: "Greg Farnum" <gregory.farnum@dreamhost.com>
To: "Damien Churchill" <damoxc@gmail.com>
Cc: "Samuel Just" <sam.just@dreamhost.com>, ceph-devel@vger.kernel.org
Sent: Tuesday, 17 Apr 2012 18:49:03
Subject: Re: pgs stuck inactive

On Monday, April 16, 2012 at 10:32 PM, Damien Churchill wrote: 
> On 17 April 2012 01:06, Greg Farnum <gregory.farnum@dreamhost.com (mailto:gregory.farnum@dreamhost.com)> wrote: 
> > Yep! 
> > We looked into this more today and have discovered some definite oddness. Have you by any chance tried to change the number of PGs in your pools? 
> 
> 
> 
> I haven't no (at least certainly not on purpose!). All I've really 
> done is copy a bit of stuff onto the unix fs and create a few rbd 
> volumes, as well as upgrade it when a new version comes out. 

Drat, that means there's actually a problem to track down somewhere. 
Do you know what version this was created with, and what upgrades you've been through? My best guess right now is that there's a problem with the encoding and decoding that I'm going to have to track down, and more context will make it a lot easier. :) 
-Greg 



* Re: pgs stuck inactive
  2012-04-17 16:49                                           ` Greg Farnum
  2012-04-18  4:35                                             ` Martin Wilderoth
@ 2012-04-18  6:41                                             ` Damien Churchill
  2012-04-18 18:41                                               ` Greg Farnum
  1 sibling, 1 reply; 25+ messages in thread
From: Damien Churchill @ 2012-04-18  6:41 UTC (permalink / raw)
  To: Greg Farnum; +Cc: Samuel Just, ceph-devel

On 17 April 2012 17:49, Greg Farnum <gregory.farnum@dreamhost.com> wrote:
> Do you know what version this was created with, and what upgrades you've been through? My best guess right now is that there's a problem with the encoding and decoding that I'm going to have to track down, and more context will make it a lot easier. :)

Hmmm, that's testing my memory. I'd say that cluster has been alive at
least since 0.34. I think the occasional version was skipped during
upgrades; not sure if that could cause any issues?


* Re: pgs stuck inactive
  2012-04-18  6:41                                             ` Damien Churchill
@ 2012-04-18 18:41                                               ` Greg Farnum
  2012-04-18 19:04                                                 ` Damien Churchill
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Farnum @ 2012-04-18 18:41 UTC (permalink / raw)
  To: Damien Churchill; +Cc: ceph-devel

On Tuesday, April 17, 2012 at 11:41 PM, Damien Churchill wrote:
> On 17 April 2012 17:49, Greg Farnum <gregory.farnum@dreamhost.com (mailto:gregory.farnum@dreamhost.com)> wrote:
> > Do you know what version this was created with, and what upgrades you've been through? My best guess right now is that there's a problem with the encoding and decoding that I'm going to have to track down, and more context will make it a lot easier. :)
> 
> 
> 
> Hmmm that's testing my memory, I'd say that cluster has been alive at
> least since 0.34. Occasionally I think there was a version skipped,
> not sure if that could cause any issues?

Okay. So the good news is that we can see what's broken now and have a kludge to prevent it happening to others; the bad news is we still have no idea how it actually occurred. :( But I don't think it's worth investing the time given what we have available, so all we can do is repair your cluster. 

Are you building your binaries from source, and can you run a patched version of the monitors? If you can I'll give you a patch to enable a simple command that should make things work; otherwise we'll need to start editing things by hand. (Yucky)
-Greg



* Re: pgs stuck inactive
  2012-04-18 18:41                                               ` Greg Farnum
@ 2012-04-18 19:04                                                 ` Damien Churchill
  2012-04-18 20:41                                                   ` Greg Farnum
  0 siblings, 1 reply; 25+ messages in thread
From: Damien Churchill @ 2012-04-18 19:04 UTC (permalink / raw)
  To: Greg Farnum; +Cc: ceph-devel

On 18 April 2012 19:41, Greg Farnum <gregory.farnum@dreamhost.com> wrote:
>
> Are you building your binaries from source, and can you run a patched version of the monitors? If you can I'll give you a patch to enable a simple command that should make things work; otherwise we'll need to start editing things by hand. (Yucky)
> -Greg
>

I was using the Ubuntu packages but I can quite happily build my own
packages if you give me the patch :-)

I agree it's a waste of time if it's not obvious what caused it; it
could be some obscure issue that occurred while upgrading between older
versions.


* Re: pgs stuck inactive
  2012-04-18 19:04                                                 ` Damien Churchill
@ 2012-04-18 20:41                                                   ` Greg Farnum
  2012-04-19  9:07                                                     ` Damien Churchill
  0 siblings, 1 reply; 25+ messages in thread
From: Greg Farnum @ 2012-04-18 20:41 UTC (permalink / raw)
  To: Damien Churchill; +Cc: ceph-devel

On Wednesday, April 18, 2012 at 12:04 PM, Damien Churchill wrote:
> On 18 April 2012 19:41, Greg Farnum <gregory.farnum@dreamhost.com (mailto:gregory.farnum@dreamhost.com)> wrote:
> > 
> > Are you building your binaries from source, and can you run a patched version of the monitors? If you can I'll give you a patch to enable a simple command that should make things work; otherwise we'll need to start editing things by hand. (Yucky)
> > -Greg
> 
> 
> 
> I was using the Ubuntu packages but I can quite happily build my own
> packages if you give me the patch :-)
> 
> I agree it's a waste of time if it's not obvious what's caused it,
> could be some obscure cause occurred due to upgrading between older
> versions.

Okay, assuming you're still on 0.44.1, can you check out the git branch "for-damien" and build it?
Then shut down your monitors, replace their executables with the freshly-built ones, and run
"ceph osd pool set vmimages pg_num 320 --a-dev-told-me-to"
and
"ceph osd pool set vmimages pgp_num 320"



That should get everything back up and running. The one sour note is that due to the bug in the past, your data (ie, filesystem) and vmimages pools have gotten conflated. It shouldn't cause any issues (they use very different naming schemes), but they're tied together in terms of replication and the raw pool statistics.
(If that's important you can create a new pool and move the rbd images to it.)
-Greg
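If separating the pools again ever becomes worthwhile, the move Greg mentions could be sketched as follows (the new pool name and pg count are placeholders; `rbd cp` copies a single image between pools):

```shell
# Create a fresh pool and copy each image across.
# (Placeholders: pool name and pg count; verify copies before removing
# the originals.)
ceph osd pool create vmimages-new 320
for img in $(rbd ls -p vmimages); do
    rbd cp vmimages/"$img" vmimages-new/"$img"
done
# Once verified:  rbd rm vmimages/<image>
```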



* Re: pgs stuck inactive
  2012-04-18 20:41                                                   ` Greg Farnum
@ 2012-04-19  9:07                                                     ` Damien Churchill
  2012-04-19 17:09                                                       ` Greg Farnum
  0 siblings, 1 reply; 25+ messages in thread
From: Damien Churchill @ 2012-04-19  9:07 UTC (permalink / raw)
  To: Greg Farnum; +Cc: ceph-devel

On 18 April 2012 21:41, Greg Farnum <gregory.farnum@dreamhost.com> wrote:
> That should get everything back up and running. The one sour note is that due to the bug in the past, your data (ie, filesystem) and vmimages pools have gotten conflated. It shouldn't cause any issues (they use very different naming schemes), but they're tied together in terms of replication and the raw pool statistics.
> (If that's important you can create a new pool and move the rbd images to it.)

Thanks a lot Greg!

All back up and running now. What negative side effects could having
the pools mixed together have, given that I'm not doing any special
placement of them?


* Re: pgs stuck inactive
  2012-04-19  9:07                                                     ` Damien Churchill
@ 2012-04-19 17:09                                                       ` Greg Farnum
  0 siblings, 0 replies; 25+ messages in thread
From: Greg Farnum @ 2012-04-19 17:09 UTC (permalink / raw)
  To: Damien Churchill; +Cc: ceph-devel

On Thursday, April 19, 2012 at 2:07 AM, Damien Churchill wrote:
> On 18 April 2012 21:41, Greg Farnum <gregory.farnum@dreamhost.com (mailto:gregory.farnum@dreamhost.com)> wrote:
> > That should get everything back up and running. The one sour note is that due to the bug in the past, your data (ie, filesystem) and vmimages pools have gotten conflated. It shouldn't cause any issues (they use very different naming schemes), but they're tied together in terms of replication and the raw pool statistics.
> > (If that's important you can create a new pool and move the rbd images to it.)
> 
> 
> 
> Thanks a lot Greg!
> 
> All back up and running now. What negative side effects could having
> the pools mixed together have, given that I'm not doing any special
> placement of them?

There shouldn't be any negative side effects from it at all. It just means that you've got a mixed namespace, and if you don't care about that none of our current tools do either. (Something like the still-entirely-theoretical ceph-fsck probably wouldn't appreciate it, but we don't have anything like that right now.) 



end of thread, other threads:[~2012-04-19 17:09 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-04  8:12 pgs stuck inactive Damien Churchill
2012-04-04 20:51 ` Samuel Just
2012-04-04 21:44   ` Damien Churchill
2012-04-06 16:50     ` Samuel Just
2012-04-06 16:55       ` Damien Churchill
2012-04-06 17:30         ` Samuel Just
2012-04-06 17:53           ` Damien Churchill
     [not found]             ` <CACLRD_3hqoOqbZWnyft3B+pEaPG=pmiDt5HeQASUC=OPS-uA7Q@mail.gmail.com>
     [not found]               ` <CAFtEh-dnR0zu=ct-gG0L+76BTYHVFimjFKNhHjnDNZWA=Scs1g@mail.gmail.com>
2012-04-06 18:20                 ` Samuel Just
     [not found]                   ` <CAFtEh-cx2by6yG_xiQbq5e2fbViOWyHoaOLFPBCZgX7x-7gwFA@mail.gmail.com>
     [not found]                     ` <CACLRD_0UrwKJuELiZqrL7Q_5AjB1EvOd5gqLRzrxw_7AFMcjxg@mail.gmail.com>
     [not found]                       ` <CAFtEh-fwt9BwZ7NwZaJVnwq3hgzvkrb1XvKT0uF81vePRcu7oA@mail.gmail.com>
2012-04-10 21:49                         ` Samuel Just
2012-04-10 22:03                           ` Damien Churchill
2012-04-10 23:00                             ` Samuel Just
2012-04-10 23:40                               ` Greg Farnum
2012-04-12 15:29                                 ` Damien Churchill
2012-04-13 19:30                                   ` Greg Farnum
2012-04-13 19:42                                     ` Damien Churchill
2012-04-17  0:06                                       ` Greg Farnum
2012-04-17  5:32                                         ` Damien Churchill
2012-04-17 16:49                                           ` Greg Farnum
2012-04-18  4:35                                             ` Martin Wilderoth
2012-04-18  6:41                                             ` Damien Churchill
2012-04-18 18:41                                               ` Greg Farnum
2012-04-18 19:04                                                 ` Damien Churchill
2012-04-18 20:41                                                   ` Greg Farnum
2012-04-19  9:07                                                     ` Damien Churchill
2012-04-19 17:09                                                       ` Greg Farnum
