* Fwd: Intermittent Assert Failure
From: Noah Watkins @ 2011-12-08  5:20 UTC
  To: ceph-devel; +Cc: John Ralston

Stack trace from a simple Ceph client that does nothing more than open a 
file and call ceph_read(...) on it.

- Noah

Hey,
I just wanted to note that I occasionally hit this failure when running
ceph_read on issdm-29:

@issdm-29:~$ time ./ceph_read /etc/ceph/ceph.conf /john.1gb.bin
client/Client.cc: In function 'void Client::put_inode(Inode*, int)', in thread '7fbccb1ea760'
client/Client.cc: 1763: FAILED assert(!unclean)
  ceph version 0.38-259-gd4aef20 (commit:d4aef20210d43e25eefe945009e6f77d5b045381)
  1: (Client::put_inode(Inode*, int)+0x615) [0x7fbccabe0455]
  2: (Client::unlink(Dentry*, bool)+0x27d) [0x7fbccabe1fed]
  3: (Client::trim_dentry(Dentry*)+0x73) [0x7fbccabe31a3]
  4: (Client::trim_cache()+0x215) [0x7fbccabe3585]
  5: (Client::unmount()+0x4d4) [0x7fbccac06474]
  6: (ceph_shutdown()+0x79) [0x7fbccabd00e9]
  7: ./ceph_read() [0x400e3e]
  8: (__libc_start_main()+0xfe) [0x7fbcca7fcd8e]
  9: ./ceph_read() [0x400a69]
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted
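The backtrace shows the assert firing inside Client::put_inode() while Client::unmount() trims the cache, i.e. when a dentry is unlinked and an inode reference is dropped. Below is a minimal, hypothetical Python model of that kind of check. It assumes (not confirmed in this thread) that `unclean` means some cached buffer for the inode could not be released when its last reference went away, for example because an I/O completion callback had not run yet. `ModelInode`, `busy_buffers` and this `put_inode()` are invented names for illustration, not Ceph's code.

```python
from dataclasses import dataclass


class FailedAssertion(Exception):
    """Stand-in for the exception Ceph's assert throws (ceph::FailedAssertion)."""


@dataclass
class ModelInode:
    ref: int = 0           # outstanding references
    busy_buffers: int = 0  # cached buffers that cannot be dropped yet (assumed meaning)


def put_inode(inode: ModelInode, n: int = 1) -> None:
    """Drop n references; on the last one, release the inode's cached data."""
    inode.ref -= n
    if inode.ref == 0:
        # Model of the "unclean" flag: was anything left that could not be released?
        unclean = inode.busy_buffers > 0
        inode.busy_buffers = 0
        if unclean:
            raise FailedAssertion("FAILED assert(!unclean)")


clean = ModelInode(ref=1)
put_inode(clean)  # last reference, nothing busy: no error

racy = ModelInode(ref=1, busy_buffers=1)  # e.g. a completion callback not yet run
try:
    put_inode(racy)
except FailedAssertion as err:
    print(err)  # prints: FAILED assert(!unclean)
```

In this model the outcome depends only on timing: if whatever clears `busy_buffers` runs before the last `put_inode()`, nothing fires. That is one way a check like this can fail intermittently rather than on every run.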

* Re: Fwd: Intermittent Assert Failure
From: Sage Weil @ 2011-12-08  5:54 UTC
  To: Noah Watkins; +Cc: ceph-devel, John Ralston

On Wed, 7 Dec 2011, Noah Watkins wrote:
> Stack trace from a simple Ceph client that does nothing more than open a file
> and call ceph_read(...) on it.

This just looks like a crash we've periodically been seeing in qa, but 
haven't been able to reproduce with logging (or diagnose from the cores).

I did some cleanup in Client::unmount() and rearranged some stuff.  Can 
you see if it still happens with the latest master?

Thanks!
sage

* Re: Intermittent Assert Failure
From: Joe Buck @ 2012-04-16 18:33 UTC
  To: Sage Weil; +Cc: Noah Watkins, ceph-devel, John Ralston

Sage,

With the 0.45 build of Ceph, we're still seeing this. It seems to happen in about 10% of the tasks that spin up when we launch a MapReduce job.

I can reproduce this pretty reliably. What log files would be useful?
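As a rough sanity check on "pretty reliably": if each task independently hits the crash with probability p of about 0.10, a job of n tasks sees at least one crash with probability 1 - (1 - p)^n. The independence assumption and the task counts below are illustrative, not measurements from this cluster.

```python
def p_job_hits_crash(p_task: float, n_tasks: int) -> float:
    """Chance that at least one of n_tasks independent tasks crashes."""
    return 1.0 - (1.0 - p_task) ** n_tasks


# Illustrative job sizes; p_task = 0.10 is the per-task rate reported above.
for n in (1, 10, 20, 50):
    print(f"{n:3d} tasks: {p_job_hits_crash(0.10, n):.3f}")
```

At p = 0.10 a 20-task job would see at least one failed task about 88% of the time, so even a modest per-task rate makes the bug easy to trigger at the job level.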

-Joe Buck

* Re: Intermittent Assert Failure
From: Sage Weil @ 2012-04-16 18:41 UTC
  To: Joe Buck; +Cc: Noah Watkins, ceph-devel, John Ralston

On Mon, 16 Apr 2012, Joe Buck wrote:
> Sage,
> 
> With the 0.45 build of Ceph, we're still seeing this. It seems to happen in about 10% of the tasks that spin up when we launch a MapReduce job.
> 
> I can reproduce this pretty reliably. What log files would be useful?

I think

	debug ms = 1
	debug client = 20
	debug objectcacher = 20

would be enough!  My guess is that this is related to the objectcacher 
callbacks in the client.
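For reference, one way to apply these settings is a [client] section in the ceph.conf the tasks already read (the /etc/ceph/ceph.conf passed to ceph_read above). The `log file` line is an assumption added here, not part of Sage's suggestion: it gives each client process its own log, provided your Ceph version expands the `$name` and `$pid` metavariables and the directory is writable by the task user.

```ini
[client]
	; verbose client-side logging for the put_inode assert
	debug ms = 1
	debug client = 20
	debug objectcacher = 20
	; assumed path: one log file per client process
	log file = /var/log/ceph/$name.$pid.log
```

With `debug client = 20` the logs grow quickly, so expect large files from long-running tasks.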

Thanks-
sage