* still crashes with tapdisk rbd
@ 2013-09-12 11:21 James Harper
From: James Harper @ 2013-09-12 11:21 UTC (permalink / raw)
To: ceph-devel
I'm still getting crashes with tapdisk rbd. Most of the time the crash takes gdb down with it when I try to get a backtrace. When I do get something, the crashing thread is always segfaulting in pthread_cond_wait and the stack is always corrupt:
(gdb) bt
#0 0x00007faae20c52d7 in pthread_cond_wait@@GLIBC_2.3.2 () from remote:/lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00c1c435c10e782c in ?? ()
#2 0xe0bc294e52000010 in ?? ()
#3 0x08481380b00400fa in ?? ()
#4 0x3326aab400000000 in ?? ()
#5 0x0000000008001e00 in ?? ()
#6 0x000004043326aab4 in ?? ()
#7 0x7aef0100040595ef in ?? ()
When I examine the memory on the stack, I see something like:
0x7faae3cc7c10: 0x00 0x00 0x00 0x00 0xb4 0xaa 0x26 0x32
0x7faae3cc7c18: 0x00 0x1e 0x00 0x08 0x00 0x00 0x00 0x00
0x7faae3cc7c20: 0xb4 0xaa 0x26 0x32 0x04 0x04 0x00 0x00
0x7faae3cc7c28: 0xef 0x95 0x05 0x04 0x00 0x01 0xef 0x79
0x7faae3cc7c30: 0x06 0x04 0x00 0x00 0x00 0x01 0x2b 0xf8
0x7faae3cc7c38: 0x2c 0x78 0x0e 0xc1 0x35 0xc4 0xc1 0x00
0x7faae3cc7c40: 0x10 0x00 0x00 0x52 0x4e 0x29 0xbc 0xe0
0x7faae3cc7c48: 0xfa 0x00 0x04 0xb0 0x80 0x13 0x48 0x08
0x7faae3cc7c50: 0x00 0x00 0x00 0x00 0xb4 0xaa 0x26 0x33
0x7faae3cc7c58: 0x00 0x1e 0x00 0x08 0x00 0x00 0x00 0x00
0x7faae3cc7c60: 0xb4 0xaa 0x26 0x33 0x04 0x04 0x00 0x00
0x7faae3cc7c68: 0xef 0x95 0x05 0x04 0x00 0x01 0xef 0x7a
0x7faae3cc7c70: 0x06 0x04 0x00 0x00 0x00 0x01 0x2c 0x38
0x7faae3cc7c78: 0x2c 0xb8 0x0e 0xc1 0x35 0xc5 0xc1 0x00
0x7faae3cc7c80: 0x10 0x00 0x00 0x52 0x4e 0x29 0xbc 0xe0
0x7faae3cc7c88: 0xfa 0x00 0x04 0xb0 0x80 0x13 0x5c 0x08
And I see very similar byte patterns in a tcpdump taken at the time of the crash, so I'm wondering if data read from the network (or about to be written to it) is overflowing a buffer somewhere and corrupting the stack.
Does ceph use a magic start-of-message number or something similar that I could identify?
Thanks
James
* Re: still crashes with tapdisk rbd
From: Sage Weil @ 2013-09-12 14:31 UTC (permalink / raw)
To: James Harper; +Cc: ceph-devel
Hi James,
On Thu, 12 Sep 2013, James Harper wrote:
> I'm still getting crashes with tapdisk rbd. Most of the time the crash takes gdb down with it when I try to get a backtrace. When I do get something, the crashing thread is always segfaulting in pthread_cond_wait and the stack is always corrupt:
>
> (gdb) bt
> #0 0x00007faae20c52d7 in pthread_cond_wait@@GLIBC_2.3.2 () from remote:/lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00c1c435c10e782c in ?? ()
> #2 0xe0bc294e52000010 in ?? ()
> #3 0x08481380b00400fa in ?? ()
> #4 0x3326aab400000000 in ?? ()
> #5 0x0000000008001e00 in ?? ()
> #6 0x000004043326aab4 in ?? ()
> #7 0x7aef0100040595ef in ?? ()
>
> When I examine the memory on the stack, I see something like:
>
> 0x7faae3cc7c10: 0x00 0x00 0x00 0x00 0xb4 0xaa 0x26 0x32
> 0x7faae3cc7c18: 0x00 0x1e 0x00 0x08 0x00 0x00 0x00 0x00
> 0x7faae3cc7c20: 0xb4 0xaa 0x26 0x32 0x04 0x04 0x00 0x00
> 0x7faae3cc7c28: 0xef 0x95 0x05 0x04 0x00 0x01 0xef 0x79
> 0x7faae3cc7c30: 0x06 0x04 0x00 0x00 0x00 0x01 0x2b 0xf8
> 0x7faae3cc7c38: 0x2c 0x78 0x0e 0xc1 0x35 0xc4 0xc1 0x00
> 0x7faae3cc7c40: 0x10 0x00 0x00 0x52 0x4e 0x29 0xbc 0xe0
> 0x7faae3cc7c48: 0xfa 0x00 0x04 0xb0 0x80 0x13 0x48 0x08
> 0x7faae3cc7c50: 0x00 0x00 0x00 0x00 0xb4 0xaa 0x26 0x33
> 0x7faae3cc7c58: 0x00 0x1e 0x00 0x08 0x00 0x00 0x00 0x00
> 0x7faae3cc7c60: 0xb4 0xaa 0x26 0x33 0x04 0x04 0x00 0x00
> 0x7faae3cc7c68: 0xef 0x95 0x05 0x04 0x00 0x01 0xef 0x7a
> 0x7faae3cc7c70: 0x06 0x04 0x00 0x00 0x00 0x01 0x2c 0x38
> 0x7faae3cc7c78: 0x2c 0xb8 0x0e 0xc1 0x35 0xc5 0xc1 0x00
> 0x7faae3cc7c80: 0x10 0x00 0x00 0x52 0x4e 0x29 0xbc 0xe0
> 0x7faae3cc7c88: 0xfa 0x00 0x04 0xb0 0x80 0x13 0x5c 0x08
>
> And I see very similar byte patterns in a tcpdump taken at the time of the crash, so I'm wondering if data read from the network (or about to be written to it) is overflowing a buffer somewhere and corrupting the stack.
>
> Does ceph use a magic start-of-message number or something similar that I could identify?
There isn't a simple magic string I can point to except for struct
ceph_msg_header, but I doubt that will help, since it is reading the
headers and message bodies into different buffers. I forget: are you able
to reproduce any of this with debugging enabled? I would suggest adding
pointer values to the debug statements in msg/Pipe.cc to narrow down what
is using some of this memory.
You might also want to just look at read_message, connect, and accept in
Pipe.cc as I think those are the only places where data is read off the
network into a buffer/struct on the stack.
sage
* RE: still crashes with tapdisk rbd
From: James Harper @ 2013-09-12 23:50 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
>
> There isn't a simple magic string I can point to except for struct
> ceph_msg_header, but I doubt that will help, since it is reading the
> headers and message bodies into different buffers.
Ok thanks.
> I forget: are you able
> to reproduce any of this with debugging enabled? I would suggest adding
> pointer values to the debug statements in msg/Pipe.cc to narrow down what
> is using some of this memory.
I'll have a look. gdb is useless here: it crashes most of the time, and when it doesn't, it doesn't give me a stack trace. The only way I can do anything is from another machine running a newer gdb (Debian Jessie instead of Wheezy), but even then it sometimes gets confused about threads.
> You might also want to just look at read_message, connect, and accept in
> Pipe.cc as I think those are the only places where data is read off the
> network into a buffer/struct on the stack.
>
Thanks for the pointers.
James
* RE: still crashes with tapdisk rbd
From: James Harper @ 2013-09-13 9:19 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
>
> You might also want to just look at read_message, connect, and accept in
> Pipe.cc as I think those are the only places where data is read off the
> network into a buffer/struct on the stack.
>
After adding the following to the [client] section of the config file, the problem seems to have gone away; at least I haven't been able to reproduce it, whereas previously I had found a way to reproduce it reliably:
[client]
ms rwthread stack bytes = 8388608
It seems that ceph uses a stack size of 1MB in the absence of the above setting. I'm not sure if the original problem was a stack overflow that this fixes, or if I'm just working around it by making the stack bigger...
Previously I'd created a wrapper around pthread_cond_wait, and had that wrapper:
. allocate a large (256KB when stack size was 1MB) array on the stack
. fill that array with increasing values (1, 2, 3, etc)
. protect the memory pages (whole pages only) in that array with mprotect
. call pthread_cond_wait proper
. unprotect the pages
. check the array to make sure it was still intact
It pretty much always appears to be the writer wait thread whose stack is corrupted during the wait, but nothing ever tripped the checks above, and much of the time the wrapper itself would crash halfway through filling the array, always on a page boundary. Does gcc use a guard page on the stack?
James