* debugging librbd async
@ 2013-08-16  5:00 James Harper
  2013-08-16  5:08 ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: James Harper @ 2013-08-16  5:00 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sylvain Munaut (s.munaut@whatever-company.com)

I'm testing out the tapdisk rbd driver that Sylvain wrote, under Xen, and have been having all sorts of problems because the tapdisk process is segfaulting. To make matters worse, any attempt to use gdb on the resulting core just tells me it can't find the threads ('generic error'). Google suggests that linking the main executable (tapdisk) with libpthread gets around this error, but that doesn't help.

With strategic printfs I have confirmed that in most cases the crash happens after a call to rbd_aio_read or rbd_aio_write and before the callback is called. Given the async nature of tapdisk it's impossible to be sure, but I'm confident that the crash is not happening in any of the tapdisk code. It's possible that there is an off-by-one error in a buffer somewhere, with the corruption showing up later, but there really isn't a lot of code there, and I've been over it very closely and it appears quite sound.

I have also tested for multiple completions of the same request, and for corrupt pointers being passed into the completion routine, and nothing shows up there either.
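For anyone not familiar with it, the async pattern in question looks roughly like this (a minimal sketch against the public librbd C API, with the sort of double-completion guard I mean; the request structure and names are illustrative, not the actual tapdisk code):

#include <stdio.h>
#include <stdint.h>
#include <assert.h>
#include <sys/types.h>
#include <rbd/librbd.h>

struct request {                  /* illustrative stand-in for a tapdisk request */
    rbd_completion_t comp;
    int completed;                /* set once; guards against a second completion */
    char *buf;
    uint64_t off;
    size_t len;
};

static void write_cb(rbd_completion_t comp, void *arg)
{
    struct request *req = arg;
    ssize_t r = rbd_aio_get_return_value(comp);

    assert(!req->completed);      /* would catch a double complete */
    req->completed = 1;
    if (r < 0)
        fprintf(stderr, "aio write failed: %zd\n", r);
    rbd_aio_release(comp);
}

static int submit_write(rbd_image_t image, struct request *req)
{
    int r = rbd_aio_create_completion(req, write_cb, &req->comp);
    if (r < 0)
        return r;
    /* the window under discussion: after this returns, before write_cb fires */
    return rbd_aio_write(image, req->off, req->len, req->buf, req->comp);
}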

In most cases there is nothing precipitating the crash, aside from a tendency to crash more often when the cluster is disturbed (e.g. a mon node is rebooted). I have one VM which will be unbootable for long periods of time, with the crash happening during boot, typically when postgres starts. This can be reproduced for hours and is useful for debugging, but then suddenly the problem goes away spontaneously and I can no longer reproduce it even after hundreds of reboots.

I'm using Debian, and the problem exists with both the latest cuttlefish and dumpling debs.

So... does librbd have any internal self-checking options I can enable? If I'm going to start injecting printfs around the place, can anyone suggest which code paths are most likely to be causing the above?

Thanks

James



* Re: debugging librbd async
  2013-08-16  5:00 debugging librbd async James Harper
@ 2013-08-16  5:08 ` Sage Weil
  2013-08-16  5:38   ` James Harper
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2013-08-16  5:08 UTC (permalink / raw)
  To: James Harper; +Cc: ceph-devel, Sylvain Munaut (s.munaut@whatever-company.com)

On Fri, 16 Aug 2013, James Harper wrote:
> I'm testing out the tapdisk rbd that Sylvain wrote under Xen, and have been having all sorts of problems as the tapdisk process is segfaulting. To make matters worse, any attempt to use gdb on the resulting core just tells me it can't find the threads ('generic error'). Google tells me that I can get around this error by linking the main exe (tapdisk) with libpthread, but that doesn't help.
> 
> With strategic printf's I have confirmed that in most cases the crash happens after a call to rbd_aio_read or rbd_aio_write and before the callback is called. Given the async nature of tapdisk it's impossible to be sure but I'm confident that the crash is not happening in any of the tapdisk code. It's possible that there is an off-by-one error in a buffer somewhere with the corruption showing up later but there really isn't a lot of code there and I've been over it very closely and it appears quite sound.
> 
> I have also tested for multiple complete's for the same request, and corrupt pointers being passed into the completion routine, and nothing shows up there either.
> 
> In most cases there is nothing pre-empting the crash, aside from a tendency to seemingly crash more often when the cluster is disturbed (eg a mon node is rebooted). I have one VM which will be unbootable for long periods of time with the crash happening during boot, typically when postgres starts. This can be reproduced for hours and is useful for debugging, but then suddenly the problem goes away spontaneously and I can no longer reproduce it even after hundreds of reboots.
> 
> I'm using Debian and the problem exists with both the latest cuttlefish and dumpling deb's.
> 
> So... does librbd have any internal self-checking options I can enable? If I'm going to start injecting printf's around the place, can anyone suggest what code paths are most likely to be causing the above?

This is usually about the time when we try running things under
valgrind.  Is that an option with tapdisk?

Of course, the old standby is to just crank up the logging detail and try 
to narrow down where the crash happens.  Have you tried that yet?

There is a probable issue with aio_flush and caching enabled that Mike 
Dawson is trying to reproduce.  Are you running with caching on or off?

sage


* RE: debugging librbd async
  2013-08-16  5:08 ` Sage Weil
@ 2013-08-16  5:38   ` James Harper
  2013-08-16  5:49     ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: James Harper @ 2013-08-16  5:38 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel, Sylvain Munaut (s.munaut@whatever-company.com)

> 
> On Fri, 16 Aug 2013, James Harper wrote:
> > I'm testing out the tapdisk rbd that Sylvain wrote under Xen, and have
> > been having all sorts of problems as the tapdisk process is segfaulting. To
> > make matters worse, any attempt to use gdb on the resulting core just tells
> > me it can't find the threads ('generic error'). Google tells me that I can get
> > around this error by linking the main exe (tapdisk) with libpthread, but that
> > doesn't help.
> >
> > With strategic printf's I have confirmed that in most cases the crash
> > happens after a call to rbd_aio_read or rbd_aio_write and before the
> > callback is called. Given the async nature of tapdisk it's impossible to be sure
> > but I'm confident that the crash is not happening in any of the tapdisk code.
> > It's possible that there is an off-by-one error in a buffer somewhere with the
> > corruption showing up later but there really isn't a lot of code there and I've
> > been over it very closely and it appears quite sound.
> >
> > I have also tested for multiple complete's for the same request, and
> > corrupt pointers being passed into the completion routine, and nothing
> > shows up there either.
> >
> > In most cases there is nothing pre-empting the crash, aside from a
> > tendency to seemingly crash more often when the cluster is disturbed (eg a
> > mon node is rebooted). I have one VM which will be unbootable for long
> > periods of time with the crash happening during boot, typically when
> > postgres starts. This can be reproduced for hours and is useful for debugging,
> > but then suddenly the problem goes away spontaneously and I can no longer
> > reproduce it even after hundreds of reboots.
> >
> > I'm using Debian and the problem exists with both the latest cuttlefish and
> > dumpling deb's.
> >
> > So... does librbd have any internal self-checking options I can enable? If I'm
> > going to start injecting printf's around the place, can anyone suggest what
> > code paths are most likely to be causing the above?
> 
> This is usually about the time when we trying running things under
> valgrind.  Is that an option with tapdisk?

Never used it before. I guess I can find out pretty easily; I'll try that next.

> Of course, the old standby is to just crank up the logging detail and try
> to narrow down where the crash happens.  Have you tried that yet?

I haven't touched the rbd code. Is increased logging a compile-time option or a config option?

> 
> There is a probable issue with aio_flush and caching enabled that Mike
> Dawson is trying to reproduce.  Are you running with caching on or off?

I have not enabled caching, and I believe it's disabled by default.
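
For reference, I believe the switch in question is "rbd cache" in the [client] section of ceph.conf, which (just to show where it would go, with what I understand to be the default) would look like:

 [client]
  rbd cache = false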

Thanks

James


* RE: debugging librbd async
  2013-08-16  5:38   ` James Harper
@ 2013-08-16  5:49     ` Sage Weil
  2013-08-16  6:13       ` James Harper
  0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2013-08-16  5:49 UTC (permalink / raw)
  To: James Harper; +Cc: ceph-devel, Sylvain Munaut (s.munaut@whatever-company.com)

On Fri, 16 Aug 2013, James Harper wrote:
> > 
> > On Fri, 16 Aug 2013, James Harper wrote:
> > > I'm testing out the tapdisk rbd that Sylvain wrote under Xen, and have
> > > been having all sorts of problems as the tapdisk process is segfaulting. To
> > > make matters worse, any attempt to use gdb on the resulting core just tells
> > > me it can't find the threads ('generic error'). Google tells me that I can get
> > > around this error by linking the main exe (tapdisk) with libpthread, but that
> > > doesn't help.
> > >
> > > With strategic printf's I have confirmed that in most cases the crash
> > > happens after a call to rbd_aio_read or rbd_aio_write and before the
> > > callback is called. Given the async nature of tapdisk it's impossible to be sure
> > > but I'm confident that the crash is not happening in any of the tapdisk code.
> > > It's possible that there is an off-by-one error in a buffer somewhere with the
> > > corruption showing up later but there really isn't a lot of code there and I've
> > > been over it very closely and it appears quite sound.
> > >
> > > I have also tested for multiple complete's for the same request, and
> > > corrupt pointers being passed into the completion routine, and nothing
> > > shows up there either.
> > >
> > > In most cases there is nothing pre-empting the crash, aside from a
> > > tendency to seemingly crash more often when the cluster is disturbed (eg a
> > > mon node is rebooted). I have one VM which will be unbootable for long
> > > periods of time with the crash happening during boot, typically when
> > > postgres starts. This can be reproduced for hours and is useful for debugging,
> > > but then suddenly the problem goes away spontaneously and I can no longer
> > > reproduce it even after hundreds of reboots.
> > >
> > > I'm using Debian and the problem exists with both the latest cuttlefish and
> > > dumpling deb's.
> > >
> > > So... does librbd have any internal self-checking options I can enable? If I'm
> > > going to start injecting printf's around the place, can anyone suggest what
> > > code paths are most likely to be causing the above?
> > 
> > This is usually about the time when we trying running things under
> > valgrind.  Is that an option with tapdisk?
> 
> Never used it before. I guess I can find out pretty easy, I'll try that next.
> 
> > Of course, the old standby is to just crank up the logging detail and try
> > to narrow down where the crash happens.  Have you tried that yet?
> 
> I haven't touched the rbd code. Is increased logging a compile-time 
> option or a config option?

That is probably the first thing you should try, then.  In the [client]
section of ceph.conf on the node where tapdisk is running, add something like

 [client]
  debug rbd = 20
  debug rados = 20
  debug ms = 1
  log file = /var/log/ceph/client.$name.$pid.log

and make sure the log directory is writeable.

> > There is a probable issue with aio_flush and caching enabled that Mike
> > Dawson is trying to reproduce.  Are you running with caching on or off?
> 
> I have not enabled caching, and I believe it's disabled by default.

There is a fix for an aio hang that just hit the cuttlefish branch today 
that could conceivably be the issue.  It causes a hang on qemu but maybe 
tapdisk is more sensitive?  I'd make sure you're running with that in any 
case to rule it out.

sage


* RE: debugging librbd async
  2013-08-16  5:49     ` Sage Weil
@ 2013-08-16  6:13       ` James Harper
  2013-08-16 15:57         ` Sage Weil
  0 siblings, 1 reply; 11+ messages in thread
From: James Harper @ 2013-08-16  6:13 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel, Sylvain Munaut (s.munaut@whatever-company.com)

> > > Of course, the old standby is to just crank up the logging detail and try
> > > to narrow down where the crash happens.  Have you tried that yet?
> >
> > I haven't touched the rbd code. Is increased logging a compile-time
> > option or a config option?
> 
> That is probably the first you should try then.  In the [client] section
> of ceph.conf on the node where tapdisk is running add something like
> 
>  [client]
>   debug rbd = 20
>   debug rados = 20
>   debug ms = 1
>   log file = /var/log/ceph/client.$name.$pid.log
> 
> and make sure the log directory is writeable.
> 

Excellent. How noisy are those levels likely to be?

Is it the consumer of librbd that reads those values? I mean, all I need to do is restart the tapdisk process and the logging should happen, right?

> > > There is a probable issue with aio_flush and caching enabled that Mike
> > > Dawson is trying to reproduce.  Are you running with caching on or off?
> >
> > I have not enabled caching, and I believe it's disabled by default.
> 
> There is a fix for an aio hang that just hit the cuttlefish branch today
> that could conceivably be the issue.  It causes a hang on qemu but maybe
> tapdisk is more sensitive?  I'd make sure you're running with that in any
> case to rule it out.
> 

I switched to dumpling in the last few days to see if the problem existed there. Is the fix you mention in dumpling? I'm not yet running mission-critical production code on ceph, just a secondary Windows domain controller, a secondary spam filter, and a few other machines that don't affect production if they crash.

I'm also testing valgrind at the moment, just basic memcheck, but suddenly everything is quite stable even though it's under reasonable load right now. Stupid heisenbugs.

Thanks

James





* RE: debugging librbd async
  2013-08-16  6:13       ` James Harper
@ 2013-08-16 15:57         ` Sage Weil
  2013-08-17  0:07           ` James Harper
  2013-08-17 14:10           ` James Harper
  0 siblings, 2 replies; 11+ messages in thread
From: Sage Weil @ 2013-08-16 15:57 UTC (permalink / raw)
  To: James Harper; +Cc: ceph-devel, Sylvain Munaut (s.munaut@whatever-company.com)

On Fri, 16 Aug 2013, James Harper wrote:
> > > > Of course, the old standby is to just crank up the logging detail and try
> > > > to narrow down where the crash happens.  Have you tried that yet?
> > >
> > > I haven't touched the rbd code. Is increased logging a compile-time
> > > option or a config option?
> > 
> > That is probably the first you should try then.  In the [client] section
> > of ceph.conf on the node where tapdisk is running add something like
> > 
> >  [client]
> >   debug rbd = 20
> >   debug rados = 20
> >   debug ms = 1
> >   log file = /var/log/ceph/client.$name.$pid.log
> > 
> > and make sure the log directory is writeable.
> > 
> 
> Excellent. How noisy are those levels likely to be?
> 
> Is it the consumer of librbd that reads those values? I mean all I need 
> to do is restart tapdisk process and the logging should happen right?

That should do it, yeah.
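
(For what it's worth, the config gets picked up when the librbd consumer initialises librados and reads ceph.conf, so a restarted tapdisk will see the new [client] settings.  A typical open path, sketched from the public API rather than from the actual tapdisk driver, looks roughly like this:)

#include <rados/librados.h>
#include <rbd/librbd.h>

/* Sketch of a typical librbd consumer's open path -- not the tapdisk
   driver itself.  rados_conf_read_file() with a NULL path is where
   ceph.conf (and therefore the [client] debug/log settings) is read. */
static int open_image(const char *pool, const char *name,
                      rados_t *cluster, rados_ioctx_t *ioctx,
                      rbd_image_t *image)
{
    int r;

    r = rados_create(cluster, NULL);            /* client.admin by default */
    if (r < 0)
        return r;
    r = rados_conf_read_file(*cluster, NULL);   /* NULL = default search path */
    if (r < 0)
        return r;
    r = rados_connect(*cluster);
    if (r < 0)
        return r;
    r = rados_ioctx_create(*cluster, pool, ioctx);
    if (r < 0)
        return r;
    return rbd_open(*ioctx, name, image, NULL); /* NULL snap = head */
}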

> > > > There is a probable issue with aio_flush and caching enabled that Mike
> > > > Dawson is trying to reproduce.  Are you running with caching on or off?
> > >
> > > I have not enabled caching, and I believe it's disabled by default.
> > 
> > There is a fix for an aio hang that just hit the cuttlefish branch today
> > that could conceivably be the issue.  It causes a hang on qemu but maybe
> > tapdisk is more sensitive?  I'd make sure you're running with that in any
> > case to rule it out.
> > 
> 
> I switched to dumpling in the last few days to see if the problem existed there. Is the fix you mention in dumpling? I'm not yet running mission critical production code on ceph, just a secondary windows domain controller, secondary spam filter, and a few other machines that don't affect production if they crash.

The fix is in the dumpling branch, but not in v0.67.  It will be in 
v0.67.1.
 
> I'm also testing valgrind at the moment, just basic memtest, but suddenly everything is quite stable even though it's under reasonable load right now. Stupid heisenbugs.

Valgrind makes things go very slow (~10x?), which can have a huge effect 
on timing. Sometimes that reveals new races, other times it hides others.  
:/

sage


* RE: debugging librbd async
  2013-08-16 15:57         ` Sage Weil
@ 2013-08-17  0:07           ` James Harper
  2013-08-17 14:10           ` James Harper
  1 sibling, 0 replies; 11+ messages in thread
From: James Harper @ 2013-08-17  0:07 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel, Sylvain Munaut (s.munaut@whatever-company.com)

> > I'm also testing valgrind at the moment, just basic memtest, but suddenly
> > everything is quite stable even though it's under reasonable load right now.
> > Stupid heisenbugs.
> 
> Valgrind makes things go very slow (~10x?), which can have a huge effect
> on timing. Sometimes that reveals new races, other times it hides others.
> :/
> 

That's what I figured. It's run overnight with no hiccups so far under valgrind (helgrind - but no crashes under memcheck either). The helgrind process on one tapdisk instance is working pretty hard, but performance is tolerable. I'll let it run and poke at it occasionally to see if I can reproduce an actual crash.

So far helgrind has identified a bunch of possible races like:

==12478== Possible data race during write of size 4 at 0x913F250 by thread #1
==12478== Locks held: 2, at addresses 0x9119B80 0x913EF88
==12478==    at 0x5821D57: Pipe::Pipe(SimpleMessenger*, int, Connection*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x5838FC7: SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*, Message*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x5839414: SimpleMessenger::get_connection(entity_inst_t const&) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56D062F: Objecter::get_session(int) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56D0FAA: Objecter::recalc_op_target(Objecter::Op*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56D6140: Objecter::_op_submit(Objecter::Op*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56C4640: librados::IoCtxImpl::operate_read(object_t const&, ObjectOperation*, ceph::buffer::list*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x56C53D1: librados::IoCtxImpl::stat(object_t const&, unsigned long*, long*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x569B83E: librados::IoCtx::stat(std::string const&, unsigned long*, long*) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x63E45BF: librbd::detect_format(librados::IoCtx&, std::string const&, bool*, unsigned long*) (in /usr/lib/librbd.so.1.0.0)
==12478==    by 0x63DCF6E: librbd::ImageCtx::init() (in /usr/lib/librbd.so.1.0.0)
==12478==    by 0x63F45C7: librbd::open_image(librbd::ImageCtx*) (in /usr/lib/librbd.so.1.0.0)
==12478==
==12478== This conflicts with a previous read of size 4 by thread #9
==12478== Locks held: none
==12478==    at 0x581E6A1: Pipe::tcp_read_wait() (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x581E96F: Pipe::tcp_read(char*, int) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x581F965: Pipe::read_message(Message**) (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x583138B: Pipe::reader() (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x5834D5C: Pipe::Reader::entry() (in /usr/lib/librados.so.2.0.0)
==12478==    by 0x4C2B5AD: mythread_wrapper (hg_intercepts.c:219)
==12478==    by 0x6650B4F: start_thread (pthread_create.c:304)
==12478==    by 0x6940A7C: clone (clone.S:112)

But I don't think they are correct. There are also some races that appear to relate to memcpy.

Also, I believe tapdisk serialises all callbacks, and I don't think helgrind is fully aware of that, as it identifies some cases that just can't happen.
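
If I wanted to teach it about that serialisation, I gather Helgrind has annotation macros for exactly this; something like the following (purely illustrative - these are not the actual tapdisk structures) would declare the happens-before edge and should quieten those reports:

#include <valgrind/helgrind.h>

struct cb_req {                      /* hypothetical request handed to the dispatcher */
    void (*cb)(void *arg);
    void *arg;
};

/* Producer side: everything written to req before this point is ordered
   before the matching HAPPENS_AFTER in the dispatch thread. */
static void hand_off(struct cb_req *req)
{
    ANNOTATE_HAPPENS_BEFORE(req);
    /* ... queue req for the single callback/dispatch thread ... */
}

/* Consumer side: the dispatch thread runs the callbacks one at a time. */
static void dispatch(struct cb_req *req)
{
    ANNOTATE_HAPPENS_AFTER(req);
    req->cb(req->arg);
}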

The last thing logged is:

==12478== More than 10000000 total errors detected.  I'm not reporting any more.
==12478== Final error counts will be inaccurate.  Go fix your program!
==12478== Rerun with --error-limit=no to disable this cutoff.  Note
==12478== that errors may occur in your program without prior warning from
==12478== Valgrind, because errors are no longer being displayed.

So I think maybe it's stopped being useful in that instance. I can post the full output if you want; it's only about 40kB.

James


* RE: debugging librbd async
  2013-08-16 15:57         ` Sage Weil
  2013-08-17  0:07           ` James Harper
@ 2013-08-17 14:10           ` James Harper
  2013-08-28  4:49             ` James Harper
  1 sibling, 1 reply; 11+ messages in thread
From: James Harper @ 2013-08-17 14:10 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel, Sylvain Munaut (s.munaut@whatever-company.com)

I can now reliably reproduce this with fio (see the config at the end of this email), but so far never under valgrind.

James

[global]
directory=/tmp/fio
size=128M
ioengine=libaio

[randwrite1]
rw=randwrite
iodepth=32

[randread1]
rw=randread
iodepth=32

[randwrite2]
rw=randwrite
iodepth=32

[randread2]
rw=randread
iodepth=32


* RE: debugging librbd async
  2013-08-17 14:10           ` James Harper
@ 2013-08-28  4:49             ` James Harper
  2013-08-28 15:45               ` Sage Weil
  2013-08-28 21:40               ` James Harper
  0 siblings, 2 replies; 11+ messages in thread
From: James Harper @ 2013-08-28  4:49 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel, Sylvain Munaut (s.munaut@whatever-company.com)

Still having crashes with the rbd module for blktap. The problem is that I can't get consistent debugging info out of librbd: when it writes to a file the logging is buffered, so the tail is always missing, and when it logs to syslog I thought I was getting everything but now I'm not so sure.

What is the best way to ensure that each log message gets flushed to the logfile? I'm looking now but a hint would get the job done faster!

Thanks

James


* RE: debugging librbd async
  2013-08-28  4:49             ` James Harper
@ 2013-08-28 15:45               ` Sage Weil
  2013-08-28 21:40               ` James Harper
  1 sibling, 0 replies; 11+ messages in thread
From: Sage Weil @ 2013-08-28 15:45 UTC (permalink / raw)
  To: James Harper; +Cc: ceph-devel, Sylvain Munaut (s.munaut@whatever-company.com)

On Wed, 28 Aug 2013, James Harper wrote:
> Still having crashes with the rbd module for blktap. I think I can't get 
> consistent debugging info out of librbd. When it writes to a file the 
> logging is buffered so the tail is always missing. When it logs to 
> syslog I thought I was getting everything but now I'm not so sure.

You can disable buffering with

 log max recent = 0

That should catch everything!
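
In context that sits in the same [client] section as the debug settings from earlier in the thread, i.e. something like

 [client]
  debug rbd = 20
  debug rados = 20
  debug ms = 1
  log file = /var/log/ceph/client.$name.$pid.log
  log max recent = 0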

> What is the best way to ensure that each log message gets flushed to the 
> logfile? I'm looking now but a hint would get the job done faster!

sage


> 
> Thanks
> 
> James


* RE: debugging librbd async
  2013-08-28  4:49             ` James Harper
  2013-08-28 15:45               ` Sage Weil
@ 2013-08-28 21:40               ` James Harper
  1 sibling, 0 replies; 11+ messages in thread
From: James Harper @ 2013-08-28 21:40 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel, Sylvain Munaut (s.munaut@whatever-company.com)

I have set the logfile to be opened with SYNC, and that seems to be giving me more consistent output.

I see the crash is mostly happening around queue_aio_write. Most of the time the last thing I see is this entry: "librados: queue_aio_write 0x7f0928004390 completion 0x1ea65d0 write_seq 147". I've never seen it happen in a read operation, and it's always when there are lots of writes queued.

From adding further debug statements, I can see that the exact point of the crash is very soon after that, but not at a consistent point.

I'm thinking that the crash is actually happening somewhere in the callback chain, although definitely before the librbd callbacks are invoked, as debug prints at the rados end of the callbacks show nothing.

Where is the 'top' of the callback chain on a write? I can see that librados calls librbd (which then calls the callback in tapdisk), but what calls librados?

Thanks

James


