* cmon: PGMonitor::encode_pending() assert failure
From: Chris Dunlop @ 2011-02-03 1:51 UTC
To: ceph-devel

G'day,

I received this assert failure after copying about 110 GB of data into
a previously-empty ceph 0.24.2:

ceph version 0.25~rc (commit:73e76723e35562c9391872e07cf314b4465f30af)
2011-02-03 08:05:26.779951 409b9950 mon.0@0(leader).pg v19635 PGMonitor::update_from_paxos: error parsing incremental update: buffer::end_of_buffer
2011-02-03 08:05:28.651238 42b99950 mon.0@0(leader).pg v19635 PGMonitor::update_from_paxos: error parsing incremental update: buffer::end_of_buffer
mon/PGMonitor.cc: In function 'virtual void PGMonitor::encode_pending(ceph::bufferlist&)', In thread 409b9950
mon/PGMonitor.cc:178: FAILED assert(paxos->get_version() + 1 == pending_inc.version)
 ceph version 0.25~rc (commit:73e76723e35562c9391872e07cf314b4465f30af)
 1: (PGMonitor::encode_pending(ceph::buffer::list&)+0x442) [0x4d4332]
 2: (PaxosService::propose_pending()+0x26d) [0x4995ad]
 3: (SafeTimer::timer_thread()+0x65f) [0x5602bf]
 4: (SafeTimerThread::entry()+0xd) [0x563a3d]
 5: (Thread::_entry_func(void*)+0xa) [0x46fe0a]
 6: /lib/libpthread.so.0 [0x7f282fd87fc7]
 7: (clone()+0x6d) [0x7f282ec6764d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (Aborted) ***
 in thread 409b9950
 ceph version 0.25~rc (commit:73e76723e35562c9391872e07cf314b4465f30af)
 1: /usr/bin/cmon [0x58054e]
 2: /lib/libpthread.so.0 [0x7f282fd8fa80]
 3: (gsignal()+0x35) [0x7f282ebc9ed5]
 4: (abort()+0x183) [0x7f282ebcb3f3]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x114) [0x7f282f44d294]
 6: /usr/lib/libstdc++.so.6 [0x7f282f44b696]
 7: /usr/lib/libstdc++.so.6 [0x7f282f44b6c3]
 8: /usr/lib/libstdc++.so.6 [0x7f282f44b7aa]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x3f4) [0x563f84]
 a: (PGMonitor::encode_pending(ceph::buffer::list&)+0x442) [0x4d4332]
 b: (PaxosService::propose_pending()+0x26d) [0x4995ad]
 c: (SafeTimer::timer_thread()+0x65f) [0x5602bf]
 d: (SafeTimerThread::entry()+0xd) [0x563a3d]
 e: (Thread::_entry_func(void*)+0xa) [0x46fe0a]
 f: /lib/libpthread.so.0 [0x7f282fd87fc7]
 10: (clone()+0x6d) [0x7f282ec6764d]

If needed, the cmon executable is available here:

  http://www.onthe.net.au/private/cmon.gz

If you need any other info, just holler!

Cheers,

Chris

* Re: cmon: PGMonitor::encode_pending() assert failure
From: Sage Weil @ 2011-02-03 21:03 UTC
To: chris; +Cc: ceph-devel

Hi Chris,

This is an interesting one. Would it be possible for you to tar up your
mondata directory on the failed node and post it somewhere I can get at
it? From the looks of things the pgmap incremental state file is
truncated, but I'd like to confirm.

http://tracker.newdream.net/issues/762

Thanks!
sage

On Thu, 3 Feb 2011, Gregory Farnum wrote:

> ---------- Forwarded message ----------
> From: Chris Dunlop <chris@onthe.net.au>
> Date: Wed, Feb 2, 2011 at 5:51 PM
> Subject: cmon: PGMonitor::encode_pending() assert failure
> To: ceph-devel@vger.kernel.org
>
>
> G'day,
>
> I received this assert failure after copying about 110 GB of data into
> a previously-empty ceph 0.24.2:
>
> ceph version 0.25~rc (commit:73e76723e35562c9391872e07cf314b4465f30af)
> 2011-02-03 08:05:26.779951 409b9950 mon.0@0(leader).pg v19635
> PGMonitor::update_from_paxos: error parsing incremental update:
> buffer::end_of_buffer
> 2011-02-03 08:05:28.651238 42b99950 mon.0@0(leader).pg v19635
> PGMonitor::update_from_paxos: error parsing incremental update:
> buffer::end_of_buffer
> mon/PGMonitor.cc: In function 'virtual void
> PGMonitor::encode_pending(ceph::bufferlist&)', In thread 409b9950
> mon/PGMonitor.cc:178: FAILED assert(paxos->get_version() + 1 ==
> pending_inc.version)
> ceph version 0.25~rc (commit:73e76723e35562c9391872e07cf314b4465f30af)
> 1: (PGMonitor::encode_pending(ceph::buffer::list&)+0x442) [0x4d4332]
> 2: (PaxosService::propose_pending()+0x26d) [0x4995ad]
> 3: (SafeTimer::timer_thread()+0x65f) [0x5602bf]
> 4: (SafeTimerThread::entry()+0xd) [0x563a3d]
> 5: (Thread::_entry_func(void*)+0xa) [0x46fe0a]
> 6: /lib/libpthread.so.0 [0x7f282fd87fc7]
> 7: (clone()+0x6d) [0x7f282ec6764d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> *** Caught signal (Aborted) ***
> in thread 409b9950
> ceph version 0.25~rc (commit:73e76723e35562c9391872e07cf314b4465f30af)
> 1: /usr/bin/cmon [0x58054e]
> 2: /lib/libpthread.so.0 [0x7f282fd8fa80]
> 3: (gsignal()+0x35) [0x7f282ebc9ed5]
> 4: (abort()+0x183) [0x7f282ebcb3f3]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x114) [0x7f282f44d294]
> 6: /usr/lib/libstdc++.so.6 [0x7f282f44b696]
> 7: /usr/lib/libstdc++.so.6 [0x7f282f44b6c3]
> 8: /usr/lib/libstdc++.so.6 [0x7f282f44b7aa]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x3f4) [0x563f84]
> a: (PGMonitor::encode_pending(ceph::buffer::list&)+0x442) [0x4d4332]
> b: (PaxosService::propose_pending()+0x26d) [0x4995ad]
> c: (SafeTimer::timer_thread()+0x65f) [0x5602bf]
> d: (SafeTimerThread::entry()+0xd) [0x563a3d]
> e: (Thread::_entry_func(void*)+0xa) [0x46fe0a]
> f: /lib/libpthread.so.0 [0x7f282fd87fc7]
> 10: (clone()+0x6d) [0x7f282ec6764d]
>
> If needed, the cmon executable is available here:
>
> http://www.onthe.net.au/private/cmon.gz
>
> If you need any other info, just holler!
>
> Cheers,
>
> Chris
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
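
[Editorial sketch] The hypothesis in the message above is that a truncated pgmap incremental makes decoding fail (the `buffer::end_of_buffer` log lines), after which the version check in `encode_pending()` trips. The toy model below shows one plausible way those two symptoms can be linked. It is not Ceph code: the on-disk layout, the `ToyMonitor` class, and every name in it are assumptions made up for illustration.

```python
import struct


class EndOfBuffer(Exception):
    """Raised when a decode would read past the end of the available bytes."""


# Hypothetical layout: 8-byte version, 4-byte payload length, then payload.
HEADER = "<QI"
HEADER_SIZE = struct.calcsize(HEADER)


def encode_incremental(version, payload):
    """Encode one incremental update in the toy layout."""
    return struct.pack(HEADER, version, len(payload)) + payload


def decode_incremental(data):
    """Decode one incremental; raise EndOfBuffer if the bytes are cut short."""
    if len(data) < HEADER_SIZE:
        raise EndOfBuffer("truncated header")
    version, length = struct.unpack_from(HEADER, data, 0)
    if len(data) < HEADER_SIZE + length:
        raise EndOfBuffer("truncated payload")
    return version, data[HEADER_SIZE:HEADER_SIZE + length]


class ToyMonitor:
    """Toy stand-in for a service that replays a versioned log."""

    def __init__(self):
        self.log = {}             # version -> encoded incremental
        self.committed = 0        # newest version stored in the log
        self.applied = 0          # newest version applied in memory
        self.pending_version = 1  # version the next proposal will carry

    def commit(self, blob):
        """Store a new encoded incremental as the next log version."""
        self.committed += 1
        self.log[self.committed] = blob

    def update_from_log(self):
        """Apply outstanding incrementals; stop at the first decode error."""
        while self.applied < self.committed:
            try:
                decode_incremental(self.log[self.applied + 1])
            except EndOfBuffer:
                # In-memory state now lags the log, and the pending
                # version is never refreshed.
                return False
            self.applied += 1
            self.pending_version = self.applied + 1
        return True

    def can_encode_pending(self):
        """Mirror of the invariant: committed version + 1 == pending version."""
        return self.committed + 1 == self.pending_version
```

In this model a single truncated entry leaves `pending_version` stale while `committed` keeps advancing, so the invariant fails on the next proposal. Whether the real monitor failed this way is exactly what the requested mondata tarball would have confirmed.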

* Re: cmon: PGMonitor::encode_pending() assert failure
From: Chris Dunlop @ 2011-02-03 22:02 UTC
To: Sage Weil; +Cc: ceph-devel

On Thu, Feb 03, 2011 at 01:03:17PM -0800, Sage Weil wrote:
> Hi Chris,
>
> This is an interesting one. Would it be possible for you to
> tar up your mondata directory on the failed node and post it
> somewhere I can get at it? From the looks of things the pgmap
> incremental state file is truncated, but I'd like to confirm.
>
> http://tracker.newdream.net/issues/762

Aw crap, sorry, I blew that fs away installing the latest master
to see what happened there ...whereupon overnight I've promptly
hit the "WARNING: at fs/btrfs/inode.c:2143" problem*.

I can revert back to my previous install and run the same
workload to see if it crops up again if that's useful (it took
about 12 hours of rsync'ing files into the fs to get there), or
I can try the workload using latest ceph with Josef Bacik's
btrfs-work** to see if either problem crops up again. Any
preference?

*  http://article.gmane.org/gmane.comp.file-systems.ceph.devel/1726
** http://article.gmane.org/gmane.comp.file-systems.ceph.devel/1719

Cheers,

Chris

* Re: cmon: PGMonitor::encode_pending() assert failure
From: Sage Weil @ 2011-02-03 22:24 UTC
To: Chris Dunlop; +Cc: ceph-devel

On Fri, 4 Feb 2011, Chris Dunlop wrote:
> On Thu, Feb 03, 2011 at 01:03:17PM -0800, Sage Weil wrote:
> > Hi Chris,
> >
> > This is an interesting one. Would it be possible for you to
> > tar up your mondata directory on the failed node and post it
> > somewhere I can get at it? From the looks of things the pgmap
> > incremental state file is truncated, but I'd like to confirm.
> >
> > http://tracker.newdream.net/issues/762
>
> Aw crap, sorry, I blew that fs away installing the latest master
> to see what happened there ...whereupon overnight I've promptly
> hit the "WARNING: at fs/btrfs/inode.c:2143" problem*.
>
> I can revert back to my previous install and run the same
> workload to see if it crops up again if that's useful (it took
> about 12 hours of rsync'ing files into the fs to get there), or
> I can try the workload using latest ceph with Josef Bacik's
> btrfs-work** to see if either problem crops up again. Any
> preference?

Let's go with latest master and latest bits from Josef. It's the future!

Thanks!
sage

* Re: cmon: PGMonitor::encode_pending() assert failure
From: Chris Dunlop @ 2011-02-03 23:24 UTC
To: Sage Weil; +Cc: ceph-devel

On Thu, Feb 03, 2011 at 02:24:43PM -0800, Sage Weil wrote:
> On Fri, 4 Feb 2011, Chris Dunlop wrote:
>> On Thu, Feb 03, 2011 at 01:03:17PM -0800, Sage Weil wrote:
>>> Hi Chris,
>>>
>>> This is an interesting one. Would it be possible for you to
>>> tar up your mondata directory on the failed node and post it
>>> somewhere I can get at it? From the looks of things the pgmap
>>> incremental state file is truncated, but I'd like to confirm.
>>>
>>> http://tracker.newdream.net/issues/762
>>
>> Aw crap, sorry, I blew that fs away installing the latest master
>> to see what happened there ...whereupon overnight I've promptly
>> hit the "WARNING: at fs/btrfs/inode.c:2143" problem*.
>>
>> I can revert back to my previous install and run the same
>> workload to see if it crops up again if that's useful (it took
>> about 12 hours of rsync'ing files into the fs to get there), or
>> I can try the workload using latest ceph with Josef Bacik's
>> btrfs-work** to see if either problem crops up again. Any
>> preference?
>
> Let's go with latest master and latest bits from Josef. It's the future!

No worries, I'll get started on that as soon as I increase my
git-foo to the point where I know how to work with multiple
repositories... hmmm, looks like git-remote is what I need...

* Re: cmon: PGMonitor::encode_pending() assert failure
From: Chris Dunlop @ 2011-02-07 3:34 UTC
To: Sage Weil; +Cc: ceph-devel

On Thu, Feb 03, 2011 at 02:24:43PM -0800, Sage Weil wrote:
> On Fri, 4 Feb 2011, Chris Dunlop wrote:
>> On Thu, Feb 03, 2011 at 01:03:17PM -0800, Sage Weil wrote:
>>>
>>> http://tracker.newdream.net/issues/762
>>
>> I can revert back to my previous install and run the same
>> workload to see if it crops up again if that's useful (it
>> took about 12 hours of rsync'ing files into the fs to get
>> there), or I can try the workload using latest ceph with
>> Josef Bacik's btrfs-work** to see if either problem crops up
>> again. Any preference?
>
> Let's go with latest master and latest bits from Josef. It's
> the future!

The PGMonitor::encode_pending() assert failure didn't show up in
copying 500 GB into a fresh ceph fs using:

  ceph-client 9aae8faf
  + Josef's btrfs-work bacae123

So either it's fixed or it's now craftily hidden, waiting to
pounce again.