* cmon: PGMonitor::encode_pending() assert failure
From: Chris Dunlop @ 2011-02-03  1:51 UTC
  To: ceph-devel

G'day,

I received this assert failure after copying about 110 GB of data into
a previously-empty ceph 0.24.2:

ceph version 0.25~rc (commit:73e76723e35562c9391872e07cf314b4465f30af)
2011-02-03 08:05:26.779951 409b9950 mon.0@0(leader).pg v19635 PGMonitor::update_from_paxos: error parsing incremental update: buffer::end_of_buffer
2011-02-03 08:05:28.651238 42b99950 mon.0@0(leader).pg v19635 PGMonitor::update_from_paxos: error parsing incremental update: buffer::end_of_buffer
mon/PGMonitor.cc: In function 'virtual void PGMonitor::encode_pending(ceph::bufferlist&)', In thread 409b9950
mon/PGMonitor.cc:178: FAILED assert(paxos->get_version() + 1 == pending_inc.version)
 ceph version 0.25~rc (commit:73e76723e35562c9391872e07cf314b4465f30af)
 1: (PGMonitor::encode_pending(ceph::buffer::list&)+0x442) [0x4d4332]
 2: (PaxosService::propose_pending()+0x26d) [0x4995ad]
 3: (SafeTimer::timer_thread()+0x65f) [0x5602bf]
 4: (SafeTimerThread::entry()+0xd) [0x563a3d]
 5: (Thread::_entry_func(void*)+0xa) [0x46fe0a]
 6: /lib/libpthread.so.0 [0x7f282fd87fc7]
 7: (clone()+0x6d) [0x7f282ec6764d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (Aborted) ***
in thread 409b9950
 ceph version 0.25~rc (commit:73e76723e35562c9391872e07cf314b4465f30af)
 1: /usr/bin/cmon [0x58054e]
 2: /lib/libpthread.so.0 [0x7f282fd8fa80]
 3: (gsignal()+0x35) [0x7f282ebc9ed5]
 4: (abort()+0x183) [0x7f282ebcb3f3]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x114) [0x7f282f44d294]
 6: /usr/lib/libstdc++.so.6 [0x7f282f44b696]
 7: /usr/lib/libstdc++.so.6 [0x7f282f44b6c3]
 8: /usr/lib/libstdc++.so.6 [0x7f282f44b7aa]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x3f4) [0x563f84]
 a: (PGMonitor::encode_pending(ceph::buffer::list&)+0x442) [0x4d4332]
 b: (PaxosService::propose_pending()+0x26d) [0x4995ad]
 c: (SafeTimer::timer_thread()+0x65f) [0x5602bf]
 d: (SafeTimerThread::entry()+0xd) [0x563a3d]
 e: (Thread::_entry_func(void*)+0xa) [0x46fe0a]
 f: /lib/libpthread.so.0 [0x7f282fd87fc7]
 10: (clone()+0x6d) [0x7f282ec6764d]

If needed, the cmon executable is available here:

http://www.onthe.net.au/private/cmon.gz

If you need any other info, just holler!

Cheers,

Chris
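
For anyone following the NOTE in the trace above, here is a minimal sketch of pulling the bracketed frame addresses out of a saved copy of the backtrace so they can be passed to `addr2line` from binutils. The file name `trace.txt` and the helper `extract_addrs` are hypothetical. Only the low addresses (such as 0x4d4332) belong to cmon itself; the 0x7f28... frames live in shared libraries and will not resolve against the cmon binary.

```shell
#!/bin/sh
# Print the bracketed return address that ends each backtrace line,
# one per line. Lines without a trailing [0x...] are skipped.
# $1: text file holding the saved trace (hypothetical, e.g. trace.txt)
extract_addrs() {
    sed -n 's/.*\[\(0x[0-9a-f]*\)\]$/\1/p' "$1"
}

# Example, assuming cmon.gz has been unpacked with `gunzip cmon.gz`:
#   extract_addrs trace.txt | addr2line -C -f -e ./cmon
```

`addr2line` reads addresses from standard input when none are given on the command line; `-f` prints function names and `-C` demangles them. How much source detail comes back depends on whether the binary still carries debug information.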



* Re: cmon: PGMonitor::encode_pending() assert failure
From: Sage Weil @ 2011-02-03 21:03 UTC
  To: chris; +Cc: ceph-devel


Hi Chris,

This is an interesting one.  Would it be possible for you to tar up your 
mondata directory on the failed node and post it somewhere I can get at 
it?  From the looks of things the pgmap incremental state file is 
truncated, but I'd like to confirm.

http://tracker.newdream.net/issues/762

Thanks!
sage
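
A minimal sketch of packaging a monitor data directory as requested above. The helper name `pack_mondata` and both paths are assumptions; the real directory is whatever the monitor's `mon data` setting points at on the failed node.

```shell
#!/bin/sh
# Pack a monitor data directory into a gzip-compressed tarball.
# $1: monitor data directory (assumed path; check your own configuration)
# $2: output archive path
pack_mondata() {
    # -C switches to the parent directory first, so the archive stores
    # a relative path (e.g. mondata/...) instead of an absolute one.
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Example with placeholder paths:
#   pack_mondata /path/to/mondata /tmp/mondata.tar.gz
```

Taking the copy while cmon is stopped avoids files changing underneath tar, and `tar -tzf` on the result lists what was captured.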


On Thu, 3 Feb 2011, Gregory Farnum wrote:

> ---------- Forwarded message ----------
> From: Chris Dunlop <chris@onthe.net.au>
> Date: Wed, Feb 2, 2011 at 5:51 PM
> Subject: cmon: PGMonitor::encode_pending() assert failure
> To: ceph-devel@vger.kernel.org
>
> [...]


* Re: cmon: PGMonitor::encode_pending() assert failure
From: Chris Dunlop @ 2011-02-03 22:02 UTC
  To: Sage Weil; +Cc: ceph-devel

On Thu, Feb 03, 2011 at 01:03:17PM -0800, Sage Weil wrote:
> Hi Chris,
> 
> This is an interesting one.  Would it be possible for you to
> tar up your mondata directory on the failed node and post it
> somewhere I can get at it?  From the looks of things the pgmap
> incremental state file is truncated, but I'd like to confirm.
> 
> http://tracker.newdream.net/issues/762

Aw crap, sorry, I blew that fs away installing the latest master
to see what happened there ...whereupon overnight I've promptly
hit the "WARNING: at fs/btrfs/inode.c:2143" problem*.

I can revert back to my previous install and run the same
workload to see if it crops up again if that's useful (it took
about 12 hours of rsync'ing files into the fs to get there), or
I can try the workload using latest ceph with Josef Bacik's
btrfs-work** to see if either problem crops up again. Any
preference?

* http://article.gmane.org/gmane.comp.file-systems.ceph.devel/1726
** http://article.gmane.org/gmane.comp.file-systems.ceph.devel/1719

Cheers,

Chris


* Re: cmon: PGMonitor::encode_pending() assert failure
From: Sage Weil @ 2011-02-03 22:24 UTC
  To: Chris Dunlop; +Cc: ceph-devel

On Fri, 4 Feb 2011, Chris Dunlop wrote:
> On Thu, Feb 03, 2011 at 01:03:17PM -0800, Sage Weil wrote:
> > Hi Chris,
> > 
> > This is an interesting one.  Would it be possible for you to
> > tar up your mondata directory on the failed node and post it
> > somewhere I can get at it?  From the looks of things the pgmap
> > incremental state file is truncated, but I'd like to confirm.
> > 
> > http://tracker.newdream.net/issues/762
> 
> Aw crap, sorry, I blew that fs away installing the latest master
> to see what happened there ...whereupon overnight I've promptly
> hit the "WARNING: at fs/btrfs/inode.c:2143" problem*.
> 
> I can revert back to my previous install and run the same
> workload to see if it crops up again if that's useful (it took
> about 12 hours of rsync'ing files into the fs to get there), or
> I can try the workload using latest ceph with Josef Bacik's
> btrfs-work** to see if either problem crops up again. Any
> preference?

Let's go with latest master and latest bits from Josef.  It's the future!

Thanks!
sage



* Re: cmon: PGMonitor::encode_pending() assert failure
From: Chris Dunlop @ 2011-02-03 23:24 UTC
  To: Sage Weil; +Cc: ceph-devel

On Thu, Feb 03, 2011 at 02:24:43PM -0800, Sage Weil wrote:
> On Fri, 4 Feb 2011, Chris Dunlop wrote:
>> On Thu, Feb 03, 2011 at 01:03:17PM -0800, Sage Weil wrote:
>>> Hi Chris,
>>> 
>>> This is an interesting one.  Would it be possible for you to
>>> tar up your mondata directory on the failed node and post it
>>> somewhere I can get at it?  From the looks of things the pgmap
>>> incremental state file is truncated, but I'd like to confirm.
>>> 
>>> http://tracker.newdream.net/issues/762
>> 
>> Aw crap, sorry, I blew that fs away installing the latest master
>> to see what happened there ...whereupon overnight I've promptly
>> hit the "WARNING: at fs/btrfs/inode.c:2143" problem*.
>> 
>> I can revert back to my previous install and run the same
>> workload to see if it crops up again if that's useful (it took
>> about 12 hours of rsync'ing files into the fs to get there), or
>> I can try the workload using latest ceph with Josef Bacik's
>> btrfs-work** to see if either problem crops up again. Any
>> preference?
> 
> Let's go with latest master and latest bits from Josef.  It's the future!

No worries, I'll get started on that as soon as I increase my
git-fu to the point where I know how to work with multiple
repositories... hmmm, looks like git-remote is what I need...
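
A minimal sketch of the git-remote approach mentioned above: registering a second upstream repository inside an existing clone. The helper name `add_second_remote`, the remote name `btrfs-work`, and the URL are all hypothetical; the thread does not give the actual repository locations.

```shell
#!/bin/sh
# Register an additional remote in an existing clone.
# $1: path to the local clone
# $2: short name for the new remote (hypothetical, e.g. btrfs-work)
# $3: URL or path of the other repository (placeholder)
add_second_remote() {
    git -C "$1" remote add "$2" "$3"
}

# Example with placeholder values:
#   add_second_remote ~/src/ceph-client btrfs-work <url-of-btrfs-work-repo>
```

After that, `git fetch btrfs-work` pulls the other repository's branches into the clone as `btrfs-work/<branch>`, where they can be merged or checked out alongside the existing ones.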


* Re: cmon: PGMonitor::encode_pending() assert failure
From: Chris Dunlop @ 2011-02-07  3:34 UTC
  To: Sage Weil; +Cc: ceph-devel

On Thu, Feb 03, 2011 at 02:24:43PM -0800, Sage Weil wrote:
> On Fri, 4 Feb 2011, Chris Dunlop wrote:
>> On Thu, Feb 03, 2011 at 01:03:17PM -0800, Sage Weil wrote:
>>> 
>>> http://tracker.newdream.net/issues/762
>> 
>> I can revert back to my previous install and run the same
>> workload to see if it crops up again if that's useful (it
>> took about 12 hours of rsync'ing files into the fs to get
>> there), or I can try the workload using latest ceph with
>> Josef Bacik's btrfs-work** to see if either problem crops up
>> again. Any preference?
> 
> Let's go with latest master and latest bits from Josef.  It's
> the future!

The PGMonitor::encode_pending() assert failure didn't show up in
copying 500 GB into a fresh ceph fs using:

ceph-client 9aae8faf
+ Josef's btrfs-work bacae123

So either it's fixed or it's now craftily hidden, waiting to
pounce again.
