* OSD crashing in ~Job()
@ 2017-02-27 14:37 Willem Jan Withagen
  2017-02-27 14:54 ` Sage Weil
  2017-02-27 15:24 ` Gregory Farnum
  0 siblings, 2 replies; 4+ messages in thread
From: Willem Jan Withagen @ 2017-02-27 14:37 UTC (permalink / raw)
  To: Ceph Development

Hi,

About once every 10 runs, test-erasure-code.sh is terminated with a
timeout during my FreeBSD tests.

It hits an assert in:
/usr/srcs/Ceph/work/ceph/src/osd/OSDMapMapping.h:31
class ParallelPGMapper {
public:
  struct Job {
....
    virtual ~Job() {
      assert(shards == 0);
    }
  };
};

Does anybody have suggestions on what this could be, and/or how to
start debugging it?

--WjW


2017-02-27 14:15:28.034683 b134480 -1
/usr/srcs/Ceph/work/ceph/src/osd/OSDMapMapping.h: In function 'virtual
ParallelPGMapper::Job::~Job()' thread b134480 time
2017-02-27 14:15:27.975342
/usr/srcs/Ceph/work/ceph/src/osd/OSDMapMapping.h: 31: FAILED
assert(shards == 0)

 ceph version Development (no_version)
 1: <ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0xb21> at /usr/srcs/Ceph/work/ceph/build/bin/ceph-mon
 2: <ParallelPGMapper::Job::~Job(void)+0x50> at
/usr/srcs/Ceph/work/ceph/build/bin/ceph-mon
 3: <OSDMapMapping::MappingJob::~MappingJob(void)+0x15> at
/usr/srcs/Ceph/work/ceph/build/bin/ceph-mon
 4: <OSDMapMapping::MappingJob::~MappingJob(void)+0x19> at
/usr/srcs/Ceph/work/ceph/build/bin/ceph-mon
 5:
<OSDMonitor::encode_pending(std::__1::shared_ptr<MonitorDBStore::Transaction>)+0xefd>
at /usr/srcs/Ceph/work/ceph/build/bin/ceph-mon
 6: <PaxosService::propose_pending(void)+0x91d> at
/usr/srcs/Ceph/work/ceph/build/bin/ceph-mon
 7:
<PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)::C_Propose::finish(int)+0x3a>
at /usr/srcs/Ceph/work/ceph/build/bin/ceph-mon
 8: <Context::complete(int)+0x22> at
/usr/srcs/Ceph/work/ceph/build/bin/ceph-mon
 9: <SafeTimer::timer_thread(void)+0x8d7> at
/usr/srcs/Ceph/work/ceph/build/bin/ceph-mon
 10: <SafeTimerThread::entry(void)+0x19> at
/usr/srcs/Ceph/work/ceph/build/bin/ceph-mon
 11: <Thread::entry_wrapper(void)+0xc6> at
/usr/srcs/Ceph/work/ceph/build/bin/ceph-mon
 12: <Thread::_entry_func(void*)+0x15> at
/usr/srcs/Ceph/work/ceph/build/bin/ceph-mon
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.



* Re: OSD crashing in ~Job()
  2017-02-27 14:37 OSD crashing in ~Job() Willem Jan Withagen
@ 2017-02-27 14:54 ` Sage Weil
  2017-02-27 15:24 ` Gregory Farnum
  1 sibling, 0 replies; 4+ messages in thread
From: Sage Weil @ 2017-02-27 14:54 UTC (permalink / raw)
  To: Willem Jan Withagen; +Cc: Ceph Development

On Mon, 27 Feb 2017, Willem Jan Withagen wrote:
> Hi,
> 
> About once every 10 runs, test-erasure-code.sh is terminated with a
> timeout during my FreeBSD tests.
> 
> It hits an assert in:
> /usr/srcs/Ceph/work/ceph/src/osd/OSDMapMapping.h:31
> class ParallelPGMapper {
> public:
>   struct Job {
> ....
>     virtual ~Job() {
>       assert(shards == 0);
>     }
> } }
> 
> Does anybody have suggestions on what this could be, and/or how to
> start debugging it?

I'm guessing this fixes it: https://github.com/ceph/ceph/pull/13574

sage



* Re: OSD crashing in ~Job()
  2017-02-27 14:37 OSD crashing in ~Job() Willem Jan Withagen
  2017-02-27 14:54 ` Sage Weil
@ 2017-02-27 15:24 ` Gregory Farnum
  2017-02-28  8:03   ` Willem Jan Withagen
  1 sibling, 1 reply; 4+ messages in thread
From: Gregory Farnum @ 2017-02-27 15:24 UTC (permalink / raw)
  To: Willem Jan Withagen; +Cc: Ceph Development

On Mon, Feb 27, 2017 at 6:37 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> Hi,
>
> About once every 10 runs, test-erasure-code.sh is terminated with a
> timeout during my FreeBSD tests.
>
> It hits an assert in:
> /usr/srcs/Ceph/work/ceph/src/osd/OSDMapMapping.h:31
> class ParallelPGMapper {
> public:
>   struct Job {
> ....
>     virtual ~Job() {
>       assert(shards == 0);
>     }
> } }
>
> Does anybody have suggestions on what this could be, and/or how to
> start debugging it?

Sounds like maybe
1) your test hardware is slower than what we mostly use, in this
particular job, and
2) the OSDMapMapping code doesn't shut down cleanly when it gets a
TERM signal. :/

So, extend the timeout or fix the OSDMapMapping, I guess?
-Greg
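
To illustrate point 2, here is a minimal sketch of the kind of invariant
that assert protects, and of what a clean shutdown could look like. This
is not the Ceph code and not the content of any fix: DemoJob, start_one,
finish_one, abort, wait and run_demo are all hypothetical names made up
for this example. The idea is only that a job counting outstanding shards
must be aborted and waited on before it is destroyed, otherwise its
destructor sees shards != 0.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a sharded job; NOT the Ceph class.
struct DemoJob {
  std::mutex lock;
  std::condition_variable cond;
  unsigned shards = 0;   // shards queued but not yet finished
  bool aborted = false;  // set when the owner wants to shut down early

  // Register one more outstanding shard.
  void start_one() {
    std::lock_guard<std::mutex> l(lock);
    ++shards;
  }

  // Mark one shard as finished and wake waiters when none remain.
  void finish_one() {
    std::lock_guard<std::mutex> l(lock);
    assert(shards > 0);
    if (--shards == 0)
      cond.notify_all();
  }

  // Ask workers to skip any remaining work.
  void abort() {
    std::lock_guard<std::mutex> l(lock);
    aborted = true;
  }

  bool is_aborted() {
    std::lock_guard<std::mutex> l(lock);
    return aborted;
  }

  // Block until every registered shard has finished.
  void wait() {
    std::unique_lock<std::mutex> l(lock);
    cond.wait(l, [this] { return shards == 0; });
  }

  // Same shape as the assert in the report: destroying a job that
  // still has outstanding shards is a bug.
  ~DemoJob() { assert(shards == 0); }
};

// Start n one-shard workers, then shut down cleanly: abort, wait for the
// shard count to drain, join the threads, and only then let the job go
// out of scope. Returns the shard count seen just before destruction.
unsigned run_demo(unsigned n) {
  DemoJob job;
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < n; ++i) {
    job.start_one();
    workers.emplace_back([&job] {
      if (!job.is_aborted()) {
        // real per-shard work would go here
      }
      job.finish_one();  // always account for the shard, aborted or not
    });
  }
  job.abort();
  job.wait();
  for (auto& t : workers)
    t.join();
  return job.shards;
}
```

Calling run_demo(4) goes through the abort-then-wait sequence and leaves
the destructor assert satisfied. Dropping the wait() call and destroying
the job while shards are still outstanding is the situation the assert
is there to catch.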



* Re: OSD crashing in ~Job()
  2017-02-27 15:24 ` Gregory Farnum
@ 2017-02-28  8:03   ` Willem Jan Withagen
  0 siblings, 0 replies; 4+ messages in thread
From: Willem Jan Withagen @ 2017-02-28  8:03 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Ceph Development

On 27-2-2017 16:24, Gregory Farnum wrote:
> On Mon, Feb 27, 2017 at 6:37 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>> Hi,
>>
>> About once every 10 runs, test-erasure-code.sh is terminated with a
>> timeout during my FreeBSD tests.
>>
>> It hits an assert in:
>> /usr/srcs/Ceph/work/ceph/src/osd/OSDMapMapping.h:31
>> class ParallelPGMapper {
>> public:
>>   struct Job {
>> ....
>>     virtual ~Job() {
>>       assert(shards == 0);
>>     }
>> } }
>>
>> Does anybody have suggestions on what this could be, and/or how to
>> start debugging it?
> 
> Sounds like maybe
> 1) your test hardware is slower than what we mostly use, in this
> particular job, and
> 2) the OSDMapMapping code doesn't shut down cleanly when it gets a
> TERM signal. :/
> 
> So, extend the timeout or fix the OSDMapMapping, I guess?

Got a fix PR from Sage, and now it completes 100 test runs without any
trouble. The timeouts were due to the test starting only one mon, which
made the next rados command hang and ctest time out.

--WjW


