All of lore.kernel.org
 help / color / mirror / Atom feed
* crash in __jbd2_journal_file_buffer
@ 2013-07-30 22:41 Sage Weil
  2013-07-31 19:02 ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Sage Weil @ 2013-07-30 22:41 UTC (permalink / raw)
  To: linux-ext4

Hi everyone,

Hit this on 3.10.  Is this a known issue?

Thanks-
sage


Stack traceback for pid 23944
0xffff8802206edeb0    23944    23840  1    2   R  0xffff8802206ee338 
*ceph-osd
 ffff88020bf17b78 0000000000000018 ffffffff81267398 ffff8802256d40c0
 ffff88020b5a1230 ffff88020bf17bb8 ffffffff81325d9c ffff88020b5a1230
 ffff8802256d40c0 ffff88020b5a1230 00000000698a8d24 ffff88020bf17be8
Call Trace:
 [<ffffffff81267398>] ? __jbd2_journal_file_buffer+0x188/0x260
 [<ffffffff81325d9c>] ? do_raw_spin_lock+0x10c/0x150
 [<ffffffff81268118>] ? do_get_write_access+0x448/0x650
 [<ffffffff81213caf>] ? ext4_read_inode_bitmap+0x9f/0x5f0
 [<ffffffff81167849>] ? kmem_cache_alloc+0x39/0x160
 [<ffffffff81268490>] ? jbd2_journal_get_write_access+0x30/0x50
 [<ffffffff81244d43>] ? __ext4_journal_get_write_access+0x43/0x90
 [<ffffffff812143d7>] ? ext4_free_inode+0x1d7/0x5d0
 [<ffffffff8121e601>] ? ext4_evict_inode+0x341/0x4d0
 [<ffffffff8121b268>] ? ext4_mark_inode_dirty+0x88/0x230
 [<ffffffff8121e610>] ? ext4_evict_inode+0x350/0x4d0
 [<ffffffff8118a098>] ? evict+0xb8/0x1c0
 [<ffffffff8118a9d5>] ? iput+0x105/0x190
 [<ffffffff8117d341>] ? do_unlinkat+0x201/0x270
 [<ffffffff8131e9be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81180356>] ? SyS_unlink+0x16/0x20


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
  2013-07-30 22:41 crash in __jbd2_journal_file_buffer Sage Weil
@ 2013-07-31 19:02 ` Jan Kara
  2013-08-09 17:36   ` Sage Weil
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-07-31 19:02 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-ext4

  Hello,

On Tue 30-07-13 15:41:40, Sage Weil wrote:
> Hit this on 3.10.  Is this a known issue?
  No, I haven't seen it. Why did the kernel crash?

								Honza
> 
> Thanks-
> sage
> 
> 
> Stack traceback for pid 23944
> 0xffff8802206edeb0    23944    23840  1    2   R  0xffff8802206ee338 
> *ceph-osd
>  ffff88020bf17b78 0000000000000018 ffffffff81267398 ffff8802256d40c0
>  ffff88020b5a1230 ffff88020bf17bb8 ffffffff81325d9c ffff88020b5a1230
>  ffff8802256d40c0 ffff88020b5a1230 00000000698a8d24 ffff88020bf17be8
> Call Trace:
>  [<ffffffff81267398>] ? __jbd2_journal_file_buffer+0x188/0x260
>  [<ffffffff81325d9c>] ? do_raw_spin_lock+0x10c/0x150
>  [<ffffffff81268118>] ? do_get_write_access+0x448/0x650
>  [<ffffffff81213caf>] ? ext4_read_inode_bitmap+0x9f/0x5f0
>  [<ffffffff81167849>] ? kmem_cache_alloc+0x39/0x160
>  [<ffffffff81268490>] ? jbd2_journal_get_write_access+0x30/0x50
>  [<ffffffff81244d43>] ? __ext4_journal_get_write_access+0x43/0x90
>  [<ffffffff812143d7>] ? ext4_free_inode+0x1d7/0x5d0
>  [<ffffffff8121e601>] ? ext4_evict_inode+0x341/0x4d0
>  [<ffffffff8121b268>] ? ext4_mark_inode_dirty+0x88/0x230
>  [<ffffffff8121e610>] ? ext4_evict_inode+0x350/0x4d0
>  [<ffffffff8118a098>] ? evict+0xb8/0x1c0
>  [<ffffffff8118a9d5>] ? iput+0x105/0x190
>  [<ffffffff8117d341>] ? do_unlinkat+0x201/0x270
>  [<ffffffff8131e9be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff81180356>] ? SyS_unlink+0x16/0x20
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
  2013-07-31 19:02 ` Jan Kara
@ 2013-08-09 17:36   ` Sage Weil
  2013-08-09 21:24     ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Sage Weil @ 2013-08-09 17:36 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

Hi Jan,

Sorry for the slow response; took a while for this to happen again.  This 
time I'm keeping the machine sitting in kdb in case there is more 
information needed.

<4>[19307.314449] ------------[ cut here ]------------
<4>[19307.319114] WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237 jbd2_journal_dirty_metadata+0x1ba/0x260()

<4>[19307.382324] CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G        W    3.10.0-ceph-00049-g68d04c9 #1
<4>[19307.391256] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
<4>[19307.398879]  ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
<4>[19307.407572]  ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
<4>[19307.415153]  0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
<4>[19307.422633] Call Trace:
<4>[19307.425209]  [<ffffffff816311b0>] dump_stack+0x19/0x1b
<4>[19307.430368]  [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
<4>[19307.436502]  [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
<4>[19307.442356]  [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
<4>[19307.449271]  [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
<4>[19307.456192]  [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
<4>[19307.462742]  [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
<4>[19307.469049]  [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
<4>[19307.475445]  [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
<4>[19307.481300]  [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
<4>[19307.486995]  [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
<4>[19307.492936]  [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
<4>[19307.498734]  [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
<4>[19307.505049]  [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
<4>[19307.510380]  [<ffffffff8119421e>] setxattr+0x13e/0x1e0
<4>[19307.515646]  [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
<4>[19307.521587]  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
<4>[19307.527812]  [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
<4>[19307.533230]  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
<4>[19307.539522]  [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
<4>[19307.545492]  [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
<4>[19307.551001]  [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
<4>[19307.557137] ---[ end trace 3e447f9462172c58 ]---
<2>[19307.561776] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
<3>[19307.570181] Aborting journal on device sda1-8.
<2>[19307.604589] EXT4-fs (sda1): Remounting filesystem read-only
<2>[19307.610273] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
<0>[19307.623337] journal commit I/O error
<0>[19307.623342] journal commit I/O error
<0>[19307.623405] journal commit I/O error
<0>[19307.623519] journal commit I/O error
<2>[19307.623585] EXT4-fs error (device sda1): __ext4_journal_start_sb:62: Detected aborted journal
<1>[19307.623642] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
<1>[19307.623649] IP: [<ffffffff81267c4a>] jbd2_journal_dirty_metadata+0x1da/0x260
<4>[19307.623653] PGD 20bc32067 PUD 2245fc067 PMD 0 
<4>[19307.623657] Oops: 0000 [#1] SMP 
[dumpcommon]kdb>   -bt

Stack traceback for pid 8877
0xffff88020db7bf20     8877     8795  1    4   R  0xffff88020db7c3a8 *ceph-osd
 ffff880214469a08 0000000000000018 ffffffff81267ad0 ffffea000834d600
 ffff880226c03300 ffffffff81256562 0000000000000282 ffff880214469a68
 0000000000000000 00000000000011a5 ffffffff81825270 ffff8802256e7510
Call Trace:
 [<ffffffff81267ad0>] ? jbd2_journal_dirty_metadata+0x60/0x260
 [<ffffffff81256562>] ? ext4_xattr_block_set+0xc2/0x910
 [<ffffffff81245093>] ? __ext4_handle_dirty_metadata+0xa3/0x140
 [<ffffffff8121aeee>] ? ext4_mark_iloc_dirty+0x40e/0x660
 [<ffffffff81257835>] ? ext4_xattr_set_handle+0x265/0x4a0
 [<ffffffff81257b32>] ? ext4_xattr_set+0xc2/0x140
 [<ffffffff81258547>] ? ext4_xattr_user_set+0x47/0x50
 [<ffffffff811935ce>] ? generic_setxattr+0x6e/0x90
 [<ffffffff81193ecb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
 [<ffffffff811940d4>] ? vfs_setxattr+0xc4/0xd0
 [<ffffffff8119421e>] ? setxattr+0x13e/0x1e0
 [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
 [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
 [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
 [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
 [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
 [<ffffffff811946be>] ? SyS_fsetxattr+0xbe/0x100
 [<ffffffff816407c2>] ? system_call_fastpath+0x16/0x1b


The workload is a ceph-osd daemon, which tends to hammer pretty heavily on 
the xattr paths.  I don't have a nice self-contained reproducer or 
anything since this is is falling out of our integration tests.  
Hopefully there is enough here (or that can be gleaned from kdb) that it 
is clear what is going on.  It's v3.10 (plus some unrelated patches).

Thanks!
sage



On Wed, 31 Jul 2013, Jan Kara wrote:
>   Hello,
> 
> On Tue 30-07-13 15:41:40, Sage Weil wrote:
> > Hit this on 3.10.  Is this a known issue?
>   No, I haven't seen it. Why did the kernel crash?
> 
> 								Honza
> > 
> > Thanks-
> > sage
> > 
> > 
> > Stack traceback for pid 23944
> > 0xffff8802206edeb0    23944    23840  1    2   R  0xffff8802206ee338 
> > *ceph-osd
> >  ffff88020bf17b78 0000000000000018 ffffffff81267398 ffff8802256d40c0
> >  ffff88020b5a1230 ffff88020bf17bb8 ffffffff81325d9c ffff88020b5a1230
> >  ffff8802256d40c0 ffff88020b5a1230 00000000698a8d24 ffff88020bf17be8
> > Call Trace:
> >  [<ffffffff81267398>] ? __jbd2_journal_file_buffer+0x188/0x260
> >  [<ffffffff81325d9c>] ? do_raw_spin_lock+0x10c/0x150
> >  [<ffffffff81268118>] ? do_get_write_access+0x448/0x650
> >  [<ffffffff81213caf>] ? ext4_read_inode_bitmap+0x9f/0x5f0
> >  [<ffffffff81167849>] ? kmem_cache_alloc+0x39/0x160
> >  [<ffffffff81268490>] ? jbd2_journal_get_write_access+0x30/0x50
> >  [<ffffffff81244d43>] ? __ext4_journal_get_write_access+0x43/0x90
> >  [<ffffffff812143d7>] ? ext4_free_inode+0x1d7/0x5d0
> >  [<ffffffff8121e601>] ? ext4_evict_inode+0x341/0x4d0
> >  [<ffffffff8121b268>] ? ext4_mark_inode_dirty+0x88/0x230
> >  [<ffffffff8121e610>] ? ext4_evict_inode+0x350/0x4d0
> >  [<ffffffff8118a098>] ? evict+0xb8/0x1c0
> >  [<ffffffff8118a9d5>] ? iput+0x105/0x190
> >  [<ffffffff8117d341>] ? do_unlinkat+0x201/0x270
> >  [<ffffffff8131e9be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> >  [<ffffffff81180356>] ? SyS_unlink+0x16/0x20
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
  2013-08-09 17:36   ` Sage Weil
@ 2013-08-09 21:24     ` Jan Kara
  2013-08-09 22:11       ` Sage Weil
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-08-09 21:24 UTC (permalink / raw)
  To: Sage Weil; +Cc: Jan Kara, linux-ext4

On Fri 09-08-13 10:36:37, Sage Weil wrote:
> Hi Jan,
> 
> Sorry for the slow response; took a while for this to happen again.  This 
> time I'm keeping the machine sitting in kdb in case there is more 
> information needed.
> 
> <4>[19307.314449] ------------[ cut here ]------------
> <4>[19307.319114] WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237 jbd2_journal_dirty_metadata+0x1ba/0x260()
> 
> <4>[19307.382324] CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G        W    3.10.0-ceph-00049-g68d04c9 #1
> <4>[19307.391256] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> <4>[19307.398879]  ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
> <4>[19307.407572]  ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
> <4>[19307.415153]  0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
> <4>[19307.422633] Call Trace:
> <4>[19307.425209]  [<ffffffff816311b0>] dump_stack+0x19/0x1b
> <4>[19307.430368]  [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
> <4>[19307.436502]  [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
> <4>[19307.442356]  [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
> <4>[19307.449271]  [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
> <4>[19307.456192]  [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
> <4>[19307.462742]  [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
> <4>[19307.469049]  [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
> <4>[19307.475445]  [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
> <4>[19307.481300]  [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
> <4>[19307.486995]  [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
> <4>[19307.492936]  [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
> <4>[19307.498734]  [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
> <4>[19307.505049]  [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
> <4>[19307.510380]  [<ffffffff8119421e>] setxattr+0x13e/0x1e0
> <4>[19307.515646]  [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
> <4>[19307.521587]  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> <4>[19307.527812]  [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
> <4>[19307.533230]  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> <4>[19307.539522]  [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
> <4>[19307.545492]  [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
> <4>[19307.551001]  [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
> <4>[19307.557137] ---[ end trace 3e447f9462172c58 ]---
  Hum, weird. So we returned EUCLEAN from jbd2_journal_dirty_metadata()
meaning that buffer head passed in there didn't have journal head attached.
However ext4_xattr_release_block() does call
ext4_journal_get_write_access() for the bh before calling
ext4_handle_dirty_xattr_block(). So we should have bh attached to the
running transaction which also keeps journal head in place.

> <2>[19307.561776] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
> <3>[19307.570181] Aborting journal on device sda1-8.
> <2>[19307.604589] EXT4-fs (sda1): Remounting filesystem read-only
> <2>[19307.610273] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
> <0>[19307.623337] journal commit I/O error
> <0>[19307.623342] journal commit I/O error
> <0>[19307.623405] journal commit I/O error
> <0>[19307.623519] journal commit I/O error
> <2>[19307.623585] EXT4-fs error (device sda1): __ext4_journal_start_sb:62: Detected aborted journal
> <1>[19307.623642] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> <1>[19307.623649] IP: [<ffffffff81267c4a>] jbd2_journal_dirty_metadata+0x1da/0x260
  Can you possibly disassemble where exactly did we crash in this function?
Apparently we didn't properly handle the fact that the journal got aborted
here. But I'm not sure what exactly failed.

								Honza

> <4>[19307.623653] PGD 20bc32067 PUD 2245fc067 PMD 0 
> <4>[19307.623657] Oops: 0000 [#1] SMP 
> [dumpcommon]kdb>   -bt
> 
> Stack traceback for pid 8877
> 0xffff88020db7bf20     8877     8795  1    4   R  0xffff88020db7c3a8 *ceph-osd
>  ffff880214469a08 0000000000000018 ffffffff81267ad0 ffffea000834d600
>  ffff880226c03300 ffffffff81256562 0000000000000282 ffff880214469a68
>  0000000000000000 00000000000011a5 ffffffff81825270 ffff8802256e7510
> Call Trace:
>  [<ffffffff81267ad0>] ? jbd2_journal_dirty_metadata+0x60/0x260
>  [<ffffffff81256562>] ? ext4_xattr_block_set+0xc2/0x910
>  [<ffffffff81245093>] ? __ext4_handle_dirty_metadata+0xa3/0x140
>  [<ffffffff8121aeee>] ? ext4_mark_iloc_dirty+0x40e/0x660
>  [<ffffffff81257835>] ? ext4_xattr_set_handle+0x265/0x4a0
>  [<ffffffff81257b32>] ? ext4_xattr_set+0xc2/0x140
>  [<ffffffff81258547>] ? ext4_xattr_user_set+0x47/0x50
>  [<ffffffff811935ce>] ? generic_setxattr+0x6e/0x90
>  [<ffffffff81193ecb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
>  [<ffffffff811940d4>] ? vfs_setxattr+0xc4/0xd0
>  [<ffffffff8119421e>] ? setxattr+0x13e/0x1e0
>  [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
>  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
>  [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
>  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
>  [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
>  [<ffffffff811946be>] ? SyS_fsetxattr+0xbe/0x100
>  [<ffffffff816407c2>] ? system_call_fastpath+0x16/0x1b
> 
> 
> The workload is a ceph-osd daemon, which tends to hammer pretty heavily on 
> the xattr paths.  I don't have a nice self-contained reproducer or 
> anything since this is is falling out of our integration tests.  
> Hopefully there is enough here (or that can be gleaned from kdb) that it 
> is clear what is going on.  It's v3.10 (plus some unrelated patches).
> 
> Thanks!
> sage
> 
> 
> 
> On Wed, 31 Jul 2013, Jan Kara wrote:
> >   Hello,
> > 
> > On Tue 30-07-13 15:41:40, Sage Weil wrote:
> > > Hit this on 3.10.  Is this a known issue?
> >   No, I haven't seen it. Why did the kernel crash?
> > 
> > 								Honza
> > > 
> > > Thanks-
> > > sage
> > > 
> > > 
> > > Stack traceback for pid 23944
> > > 0xffff8802206edeb0    23944    23840  1    2   R  0xffff8802206ee338 
> > > *ceph-osd
> > >  ffff88020bf17b78 0000000000000018 ffffffff81267398 ffff8802256d40c0
> > >  ffff88020b5a1230 ffff88020bf17bb8 ffffffff81325d9c ffff88020b5a1230
> > >  ffff8802256d40c0 ffff88020b5a1230 00000000698a8d24 ffff88020bf17be8
> > > Call Trace:
> > >  [<ffffffff81267398>] ? __jbd2_journal_file_buffer+0x188/0x260
> > >  [<ffffffff81325d9c>] ? do_raw_spin_lock+0x10c/0x150
> > >  [<ffffffff81268118>] ? do_get_write_access+0x448/0x650
> > >  [<ffffffff81213caf>] ? ext4_read_inode_bitmap+0x9f/0x5f0
> > >  [<ffffffff81167849>] ? kmem_cache_alloc+0x39/0x160
> > >  [<ffffffff81268490>] ? jbd2_journal_get_write_access+0x30/0x50
> > >  [<ffffffff81244d43>] ? __ext4_journal_get_write_access+0x43/0x90
> > >  [<ffffffff812143d7>] ? ext4_free_inode+0x1d7/0x5d0
> > >  [<ffffffff8121e601>] ? ext4_evict_inode+0x341/0x4d0
> > >  [<ffffffff8121b268>] ? ext4_mark_inode_dirty+0x88/0x230
> > >  [<ffffffff8121e610>] ? ext4_evict_inode+0x350/0x4d0
> > >  [<ffffffff8118a098>] ? evict+0xb8/0x1c0
> > >  [<ffffffff8118a9d5>] ? iput+0x105/0x190
> > >  [<ffffffff8117d341>] ? do_unlinkat+0x201/0x270
> > >  [<ffffffff8131e9be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > >  [<ffffffff81180356>] ? SyS_unlink+0x16/0x20
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > -- 
> > Jan Kara <jack@suse.cz>
> > SUSE Labs, CR
> > 
> > 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
  2013-08-09 21:24     ` Jan Kara
@ 2013-08-09 22:11       ` Sage Weil
  2013-08-12 12:52         ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Sage Weil @ 2013-08-09 22:11 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

On Fri, 9 Aug 2013, Jan Kara wrote:
> On Fri 09-08-13 10:36:37, Sage Weil wrote:
> > Hi Jan,
> > 
> > Sorry for the slow response; took a while for this to happen again.  This 
> > time I'm keeping the machine sitting in kdb in case there is more 
> > information needed.
> > 
> > <4>[19307.314449] ------------[ cut here ]------------
> > <4>[19307.319114] WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237 jbd2_journal_dirty_metadata+0x1ba/0x260()
> > 
> > <4>[19307.382324] CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G        W    3.10.0-ceph-00049-g68d04c9 #1
> > <4>[19307.391256] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > <4>[19307.398879]  ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
> > <4>[19307.407572]  ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
> > <4>[19307.415153]  0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
> > <4>[19307.422633] Call Trace:
> > <4>[19307.425209]  [<ffffffff816311b0>] dump_stack+0x19/0x1b
> > <4>[19307.430368]  [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
> > <4>[19307.436502]  [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
> > <4>[19307.442356]  [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
> > <4>[19307.449271]  [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
> > <4>[19307.456192]  [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
> > <4>[19307.462742]  [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
> > <4>[19307.469049]  [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
> > <4>[19307.475445]  [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
> > <4>[19307.481300]  [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
> > <4>[19307.486995]  [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
> > <4>[19307.492936]  [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
> > <4>[19307.498734]  [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > <4>[19307.505049]  [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
> > <4>[19307.510380]  [<ffffffff8119421e>] setxattr+0x13e/0x1e0
> > <4>[19307.515646]  [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
> > <4>[19307.521587]  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> > <4>[19307.527812]  [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
> > <4>[19307.533230]  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> > <4>[19307.539522]  [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
> > <4>[19307.545492]  [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
> > <4>[19307.551001]  [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
> > <4>[19307.557137] ---[ end trace 3e447f9462172c58 ]---
>   Hum, weird. So we returned EUCLEAN from jbd2_journal_dirty_metadata()
> meaning that buffer head passed in there didn't have journal head attached.
> However ext4_xattr_release_block() does call
> ext4_journal_get_write_access() for the bh before calling
> ext4_handle_dirty_xattr_block(). So we should have bh attached to the
> running transaction which also keeps journal head in place.
> 
> > <2>[19307.561776] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
> > <3>[19307.570181] Aborting journal on device sda1-8.
> > <2>[19307.604589] EXT4-fs (sda1): Remounting filesystem read-only
> > <2>[19307.610273] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
> > <0>[19307.623337] journal commit I/O error
> > <0>[19307.623342] journal commit I/O error
> > <0>[19307.623405] journal commit I/O error
> > <0>[19307.623519] journal commit I/O error
> > <2>[19307.623585] EXT4-fs error (device sda1): __ext4_journal_start_sb:62: Detected aborted journal
> > <1>[19307.623642] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> > <1>[19307.623649] IP: [<ffffffff81267c4a>] jbd2_journal_dirty_metadata+0x1da/0x260
>   Can you possibly disassemble where exactly did we crash in this function?
> Apparently we didn't properly handle the fact that the journal got aborted
> here. But I'm not sure what exactly failed.

Hrm, I'm having trouble getting from a bzImage back to something I can 
objdump with inline code, so this is a separate compile with the same 
.config that is hopefully right:

                jh->b_modified = 1;
                J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
    1e33:       0f 0b                   ud2    
         */
        if (jh->b_transaction != transaction) {
                JBUFFER_TRACE(jh, "already on other transaction");
                if (unlikely(jh->b_transaction !=
                             journal->j_committing_transaction)) {
                        printk(KERN_EMERG "JBD: %s: "
    1e35:       4d 85 c9                test   %r9,%r9
    1e38:       74 04                   je     1e3e <jbd2_journal_dirty_metadata+0x1de>
    1e3a:       41 8b 41 08             mov    0x8(%r9),%eax
**here^^^
    1e3e:       45 31 c0                xor    %r8d,%r8d
    1e41:       48 85 c9                test   %rcx,%rcx
    1e44:       74 04                   je     1e4a <jbd2_journal_dirty_metadata+0x1ea>
    1e46:       44 8b 41 08             mov    0x8(%rcx),%r8d
    1e4a:       48 8b 53 18             mov    0x18(%rbx),%rdx
                               "jh->b_transaction (%llu, %p, %u) != "
                               "journal->j_committing_transaction (%p, %u)",
                               journal->j_devname,
    1e4e:       49 8d b6 80 05 00 00    lea    0x580(%r14),%rsi
         */
        if (jh->b_transaction != transaction) {
                JBUFFER_TRACE(jh, "already on other transaction");
                if (unlikely(jh->b_transaction !=
                             journal->j_committing_transaction)) {
                        printk(KERN_EMERG "JBD: %s: "
    1e55:       89 04 24                mov    %eax,(%rsp)
    1e58:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
                        1e5b: R_X86_64_32S      .rodata.str1.8+0x160
    1e5f:       31 c0                   xor    %eax,%eax
    1e61:       e8 00 00 00 00          callq  1e66 <jbd2_journal_dirty_metadata+0x206>
                        1e62: R_X86_64_PC32     printk-0x4
                               jh->b_transaction,
                               jh->b_transaction ? jh->b_transaction->t_tid : 0,
                               journal->j_committing_transaction,
                               journal->j_committing_transaction ?
                               journal->j_committing_transaction->t_tid : 0);
                        ret = -EINVAL;
    1e66:       b8 ea ff ff ff          mov    $0xffffffea,%eax
    1e6b:       e9 a4 fe ff ff          jmpq   1d14 <jbd2_journal_dirty_metadata+0xb4>
                }
                if (unlikely(jh->b_next_transaction != transaction)) {
                        printk(KERN_EMERG "JBD: %s: "
    1e70:       45 31 c0                xor    %r8d,%r8d
    1e73:       48 85 c9                test   %rcx,%rcx
    1e76:       41 8b 44 24 08          mov    0x8(%r12),%eax
    1e7b:       74 04                   je     1e81 <jbd2_journal_dirty_metadata+0x221>
    1e7d:       44 8b 41 08             mov    0x8(%rcx),%r8d
    1e81:       48 8b 53 18             mov    0x18(%rbx),%rdx
                               "jh->b_next_transaction (%llu, %p, %u) != "
                               "transaction (%p, %u)",
                               journal->j_devname,
    1e85:       49 8d b6 80 05 00 00    lea    0x580(%r14),%rsi
                               journal->j_committing_transaction ?
                               journal->j_committing_transaction->t_tid : 0);
                        ret = -EINVAL;
                }
                if (unlikely(jh->b_next_transaction != transaction)) {
                        printk(KERN_EMERG "JBD: %s: "
    1e8c:       89 04 24                mov    %eax,(%rsp)
    1e8f:       4d 89 e1                mov    %r12,%r9
    1e92:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
                        1e95: R_X86_64_32S      .rodata.str1.8+0x1c0
    1e99:       31 c0                   xor    %eax,%eax
    1e9b:       e8 00 00 00 00          callq  1ea0 <jbd2_journal_dirty_metadata+0x240>
                        1e9c: R_X86_64_PC32     printk-0x4
                               (unsigned long long) bh->b_blocknr,
                               jh->b_next_transaction,
                               jh->b_next_transaction ?
                               jh->b_next_transaction->t_tid : 0,
                               transaction, transaction->t_tid);
                        ret = -EINVAL;
    1ea0:       b8 ea ff ff ff          mov    $0xffffffea,%eax
    1ea5:       e9 77 fe ff ff          jmpq   1d21 <jbd2_journal_dirty_metadata+0xc1>
 *  bit-based spin_unlock()
 */
static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
{
#ifdef CONFIG_DEBUG_SPINLOCK
        BUG_ON(!test_bit(bitnum, addr));
    1eaa:       e8 00 00 00 00          callq  1eaf <jbd2_journal_dirty_metadata+0x24f>
                        1eab: R_X86_64_PC32     .text.unlikely-0x4
                 * transaction's data buffer, ever. */

Registers are

[4]kdb> rd
ax: 0000000000000000  bx: ffff8801f57837b8  cx: 0000000000000000
dx: 000000000091c029  si: 000000000091c029  di: 0000000000000001
bp: ffff880214469a58  sp: ffff880214469a08  r8: ffff8801f57837b8
r9: 0000000000000008  r10: 0000000000000000  r11: 0000000000000000
r12: ffff8802256e7870  r13: ffff8801893c52a0  r14: ffff8802256e76c0
r15: ffff8802256e7510  ip: ffffffff81267c4a  flags: 00010202  cs: 00000010
ss: 00000018  ds: 00000018  es: 00000018  fs: 00000018  gs: 00000018

journal == NULL?

sage


> 
> 								Honza
> 
> > <4>[19307.623653] PGD 20bc32067 PUD 2245fc067 PMD 0 
> > <4>[19307.623657] Oops: 0000 [#1] SMP 
> > [dumpcommon]kdb>   -bt
> > 
> > Stack traceback for pid 8877
> > 0xffff88020db7bf20     8877     8795  1    4   R  0xffff88020db7c3a8 *ceph-osd
> >  ffff880214469a08 0000000000000018 ffffffff81267ad0 ffffea000834d600
> >  ffff880226c03300 ffffffff81256562 0000000000000282 ffff880214469a68
> >  0000000000000000 00000000000011a5 ffffffff81825270 ffff8802256e7510
> > Call Trace:
> >  [<ffffffff81267ad0>] ? jbd2_journal_dirty_metadata+0x60/0x260
> >  [<ffffffff81256562>] ? ext4_xattr_block_set+0xc2/0x910
> >  [<ffffffff81245093>] ? __ext4_handle_dirty_metadata+0xa3/0x140
> >  [<ffffffff8121aeee>] ? ext4_mark_iloc_dirty+0x40e/0x660
> >  [<ffffffff81257835>] ? ext4_xattr_set_handle+0x265/0x4a0
> >  [<ffffffff81257b32>] ? ext4_xattr_set+0xc2/0x140
> >  [<ffffffff81258547>] ? ext4_xattr_user_set+0x47/0x50
> >  [<ffffffff811935ce>] ? generic_setxattr+0x6e/0x90
> >  [<ffffffff81193ecb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
> >  [<ffffffff811940d4>] ? vfs_setxattr+0xc4/0xd0
> >  [<ffffffff8119421e>] ? setxattr+0x13e/0x1e0
> >  [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
> >  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> >  [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
> >  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> >  [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
> >  [<ffffffff811946be>] ? SyS_fsetxattr+0xbe/0x100
> >  [<ffffffff816407c2>] ? system_call_fastpath+0x16/0x1b
> > 
> > 
> > The workload is a ceph-osd daemon, which tends to hammer pretty heavily on 
> > the xattr paths.  I don't have a nice self-contained reproducer or 
> > anything since this is is falling out of our integration tests.  
> > Hopefully there is enough here (or that can be gleaned from kdb) that it 
> > is clear what is going on.  It's v3.10 (plus some unrelated patches).
> > 
> > Thanks!
> > sage
> > 
> > 
> > 
> > On Wed, 31 Jul 2013, Jan Kara wrote:
> > >   Hello,
> > > 
> > > On Tue 30-07-13 15:41:40, Sage Weil wrote:
> > > > Hit this on 3.10.  Is this a known issue?
> > >   No, I haven't seen it. Why did the kernel crash?
> > > 
> > > 								Honza
> > > > 
> > > > Thanks-
> > > > sage
> > > > 
> > > > 
> > > > Stack traceback for pid 23944
> > > > 0xffff8802206edeb0    23944    23840  1    2   R  0xffff8802206ee338 
> > > > *ceph-osd
> > > >  ffff88020bf17b78 0000000000000018 ffffffff81267398 ffff8802256d40c0
> > > >  ffff88020b5a1230 ffff88020bf17bb8 ffffffff81325d9c ffff88020b5a1230
> > > >  ffff8802256d40c0 ffff88020b5a1230 00000000698a8d24 ffff88020bf17be8
> > > > Call Trace:
> > > >  [<ffffffff81267398>] ? __jbd2_journal_file_buffer+0x188/0x260
> > > >  [<ffffffff81325d9c>] ? do_raw_spin_lock+0x10c/0x150
> > > >  [<ffffffff81268118>] ? do_get_write_access+0x448/0x650
> > > >  [<ffffffff81213caf>] ? ext4_read_inode_bitmap+0x9f/0x5f0
> > > >  [<ffffffff81167849>] ? kmem_cache_alloc+0x39/0x160
> > > >  [<ffffffff81268490>] ? jbd2_journal_get_write_access+0x30/0x50
> > > >  [<ffffffff81244d43>] ? __ext4_journal_get_write_access+0x43/0x90
> > > >  [<ffffffff812143d7>] ? ext4_free_inode+0x1d7/0x5d0
> > > >  [<ffffffff8121e601>] ? ext4_evict_inode+0x341/0x4d0
> > > >  [<ffffffff8121b268>] ? ext4_mark_inode_dirty+0x88/0x230
> > > >  [<ffffffff8121e610>] ? ext4_evict_inode+0x350/0x4d0
> > > >  [<ffffffff8118a098>] ? evict+0xb8/0x1c0
> > > >  [<ffffffff8118a9d5>] ? iput+0x105/0x190
> > > >  [<ffffffff8117d341>] ? do_unlinkat+0x201/0x270
> > > >  [<ffffffff8131e9be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > > >  [<ffffffff81180356>] ? SyS_unlink+0x16/0x20
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > -- 
> > > Jan Kara <jack@suse.cz>
> > > SUSE Labs, CR
> > > 
> > > 
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
  2013-08-09 22:11       ` Sage Weil
@ 2013-08-12 12:52         ` Jan Kara
       [not found]           ` <alpine.DEB.2.00.1308121106320.29150@cobra.newdream.net>
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-08-12 12:52 UTC (permalink / raw)
  To: Sage Weil; +Cc: Jan Kara, linux-ext4

On Fri 09-08-13 15:11:05, Sage Weil wrote:
> On Fri, 9 Aug 2013, Jan Kara wrote:
> > On Fri 09-08-13 10:36:37, Sage Weil wrote:
> > > Hi Jan,
> > > 
> > > Sorry for the slow response; took a while for this to happen again.  This 
> > > time I'm keeping the machine sitting in kdb in case there is more 
> > > information needed.
> > > 
> > > <4>[19307.314449] ------------[ cut here ]------------
> > > <4>[19307.319114] WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237 jbd2_journal_dirty_metadata+0x1ba/0x260()
> > > 
> > > <4>[19307.382324] CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G        W    3.10.0-ceph-00049-g68d04c9 #1
> > > <4>[19307.391256] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > > <4>[19307.398879]  ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
> > > <4>[19307.407572]  ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
> > > <4>[19307.415153]  0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
> > > <4>[19307.422633] Call Trace:
> > > <4>[19307.425209]  [<ffffffff816311b0>] dump_stack+0x19/0x1b
> > > <4>[19307.430368]  [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
> > > <4>[19307.436502]  [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
> > > <4>[19307.442356]  [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
> > > <4>[19307.449271]  [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
> > > <4>[19307.456192]  [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
> > > <4>[19307.462742]  [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
> > > <4>[19307.469049]  [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
> > > <4>[19307.475445]  [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
> > > <4>[19307.481300]  [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
> > > <4>[19307.486995]  [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
> > > <4>[19307.492936]  [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
> > > <4>[19307.498734]  [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > > <4>[19307.505049]  [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
> > > <4>[19307.510380]  [<ffffffff8119421e>] setxattr+0x13e/0x1e0
> > > <4>[19307.515646]  [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
> > > <4>[19307.521587]  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> > > <4>[19307.527812]  [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
> > > <4>[19307.533230]  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> > > <4>[19307.539522]  [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
> > > <4>[19307.545492]  [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
> > > <4>[19307.551001]  [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
> > > <4>[19307.557137] ---[ end trace 3e447f9462172c58 ]---
> >   Hum, weird. So we returned EUCLEAN from jbd2_journal_dirty_metadata()
> > meaning that buffer head passed in there didn't have journal head attached.
> > However ext4_xattr_release_block() does call
> > ext4_journal_get_write_access() for the bh before calling
> > ext4_handle_dirty_xattr_block(). So we should have bh attached to the
> > running transaction which also keeps journal head in place.
> > 
> > > <2>[19307.561776] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
> > > <3>[19307.570181] Aborting journal on device sda1-8.
> > > <2>[19307.604589] EXT4-fs (sda1): Remounting filesystem read-only
> > > <2>[19307.610273] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
> > > <0>[19307.623337] journal commit I/O error
> > > <0>[19307.623342] journal commit I/O error
> > > <0>[19307.623405] journal commit I/O error
> > > <0>[19307.623519] journal commit I/O error
> > > <2>[19307.623585] EXT4-fs error (device sda1): __ext4_journal_start_sb:62: Detected aborted journal
> > > <1>[19307.623642] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> > > <1>[19307.623649] IP: [<ffffffff81267c4a>] jbd2_journal_dirty_metadata+0x1da/0x260
> >   Can you possibly disassemble where exactly did we crash in this function?
> > Apparently we didn't properly handle the fact that the journal got aborted
> > here. But I'm not sure what exactly failed.
> 
> Hrm, I'm having trouble getting from a bzImage back to something I can 
> objdump with inline code, so this is a separate compile with the same 
> .config that is hopefully right:
> 
>                 jh->b_modified = 1;
>                 J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
>     1e33:       0f 0b                   ud2    
>          */
>         if (jh->b_transaction != transaction) {
>                 JBUFFER_TRACE(jh, "already on other transaction");
>                 if (unlikely(jh->b_transaction !=
>                              journal->j_committing_transaction)) {
>                         printk(KERN_EMERG "JBD: %s: "
>     1e35:       4d 85 c9                test   %r9,%r9
>     1e38:       74 04                   je     1e3e <jbd2_journal_dirty_metadata+0x1de>
>     1e3a:       41 8b 41 08             mov    0x8(%r9),%eax
> **here^^^
>     1e3e:       45 31 c0                xor    %r8d,%r8d
>     1e41:       48 85 c9                test   %rcx,%rcx
>     1e44:       74 04                   je     1e4a <jbd2_journal_dirty_metadata+0x1ea>
>     1e46:       44 8b 41 08             mov    0x8(%rcx),%r8d
>     1e4a:       48 8b 53 18             mov    0x18(%rbx),%rdx
>                                "jh->b_transaction (%llu, %p, %u) != "
>                                "journal->j_committing_transaction (%p, %u)",
>                                journal->j_devname,
>     1e4e:       49 8d b6 80 05 00 00    lea    0x580(%r14),%rsi
>          */
>         if (jh->b_transaction != transaction) {
>                 JBUFFER_TRACE(jh, "already on other transaction");
>                 if (unlikely(jh->b_transaction !=
>                              journal->j_committing_transaction)) {
>                         printk(KERN_EMERG "JBD: %s: "
>     1e55:       89 04 24                mov    %eax,(%rsp)
>     1e58:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
>                         1e5b: R_X86_64_32S      .rodata.str1.8+0x160
>     1e5f:       31 c0                   xor    %eax,%eax
>     1e61:       e8 00 00 00 00          callq  1e66 <jbd2_journal_dirty_metadata+0x206>
>                         1e62: R_X86_64_PC32     printk-0x4
>                                jh->b_transaction,
>                                jh->b_transaction ? jh->b_transaction->t_tid : 0,
>                                journal->j_committing_transaction,
>                                journal->j_committing_transaction ?
>                                journal->j_committing_transaction->t_tid : 0);
>                         ret = -EINVAL;
>     1e66:       b8 ea ff ff ff          mov    $0xffffffea,%eax
>     1e6b:       e9 a4 fe ff ff          jmpq   1d14 <jbd2_journal_dirty_metadata+0xb4>
>                 }
>                 if (unlikely(jh->b_next_transaction != transaction)) {
>                         printk(KERN_EMERG "JBD: %s: "
>     1e70:       45 31 c0                xor    %r8d,%r8d
>     1e73:       48 85 c9                test   %rcx,%rcx
>     1e76:       41 8b 44 24 08          mov    0x8(%r12),%eax
>     1e7b:       74 04                   je     1e81 <jbd2_journal_dirty_metadata+0x221>
>     1e7d:       44 8b 41 08             mov    0x8(%rcx),%r8d
>     1e81:       48 8b 53 18             mov    0x18(%rbx),%rdx
>                                "jh->b_next_transaction (%llu, %p, %u) != "
>                                "transaction (%p, %u)",
>                                journal->j_devname,
>     1e85:       49 8d b6 80 05 00 00    lea    0x580(%r14),%rsi
>                                journal->j_committing_transaction ?
>                                journal->j_committing_transaction->t_tid : 0);
>                         ret = -EINVAL;
>                 }
>                 if (unlikely(jh->b_next_transaction != transaction)) {
>                         printk(KERN_EMERG "JBD: %s: "
>     1e8c:       89 04 24                mov    %eax,(%rsp)
>     1e8f:       4d 89 e1                mov    %r12,%r9
>     1e92:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
>                         1e95: R_X86_64_32S      .rodata.str1.8+0x1c0
>     1e99:       31 c0                   xor    %eax,%eax
>     1e9b:       e8 00 00 00 00          callq  1ea0 <jbd2_journal_dirty_metadata+0x240>
>                         1e9c: R_X86_64_PC32     printk-0x4
>                                (unsigned long long) bh->b_blocknr,
>                                jh->b_next_transaction,
>                                jh->b_next_transaction ?
>                                jh->b_next_transaction->t_tid : 0,
>                                transaction, transaction->t_tid);
>                         ret = -EINVAL;
>     1ea0:       b8 ea ff ff ff          mov    $0xffffffea,%eax
>     1ea5:       e9 77 fe ff ff          jmpq   1d21 <jbd2_journal_dirty_metadata+0xc1>
>  *  bit-based spin_unlock()
>  */
> static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
> {
> #ifdef CONFIG_DEBUG_SPINLOCK
>         BUG_ON(!test_bit(bitnum, addr));
>     1eaa:       e8 00 00 00 00          callq  1eaf <jbd2_journal_dirty_metadata+0x24f>
>                         1eab: R_X86_64_PC32     .text.unlikely-0x4
>                  * transaction's data buffer, ever. */
> 
> Registers are
> 
> [4]kdb> rd
> ax: 0000000000000000  bx: ffff8801f57837b8  cx: 0000000000000000
> dx: 000000000091c029  si: 000000000091c029  di: 0000000000000001
> bp: ffff880214469a58  sp: ffff880214469a08  r8: ffff8801f57837b8
> r9: 0000000000000008  r10: 0000000000000000  r11: 0000000000000000
> r12: ffff8802256e7870  r13: ffff8801893c52a0  r14: ffff8802256e76c0
> r15: ffff8802256e7510  ip: ffffffff81267c4a  flags: 00010202  cs: 00000010
> ss: 00000018  ds: 00000018  es: 00000018  fs: 00000018  gs: 00000018
> 
> journal == NULL?
  Umm, r9 is 0x8 so it seems jh->b_transaction or
journal->j_running_transaction was 8 (instead of NULL vs valid pointer) and
thus we oopsed while looking at ->t_tid... Oh, I see where's the problem.
__ext4_handle_dirty_metadata() will stop the handle if there's an error but
caller really doesn't count with this and further uses freed handle. So at
least the BUG part of the problem is clear now - I'll look into fixing
that.

But how did we pass bh without jh into jbd2_journal_dirty_metadata() still
isn't clear to me. BTW, I see 'W' taint flag in the warning which means
there was some warning before this one. What was that?

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
       [not found]           ` <alpine.DEB.2.00.1308121106320.29150@cobra.newdream.net>
@ 2013-08-13 10:34             ` Jan Kara
  2013-08-22 23:35               ` Sage Weil
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-08-13 10:34 UTC (permalink / raw)
  To: Sage Weil; +Cc: Jan Kara, linux-ext4

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

On Mon 12-08-13 11:13:06, Sage Weil wrote:
> Full dmesg is attached.
  Hum, nothing interesting in there...

> Our QA seems to hit this with some regularity.  Let me know if there's 
> some combination of patches that would help shed more light!
  If they can run with attached debug patch it could maybe sched some more
light. Please send also your System.map file together with the dmesg of the
kernel when the crash happens so that I can map addresses to function
names... Thanks!

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-jbd2-Debug-journal_head-additions-and-removals.patch --]
[-- Type: text/x-patch, Size: 2139 bytes --]

>From ae473a63b28ee1719cf47a637c73d9d163ddb4db Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 13 Aug 2013 12:29:35 +0200
Subject: [PATCH] jbd2: Debug journal_head additions and removals

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/jbd2/journal.c           | 4 ++++
 fs/jbd2/transaction.c       | 4 ++++
 include/linux/buffer_head.h | 4 ++++
 3 files changed, 12 insertions(+)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 02c7ad9..227a1ab 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2433,6 +2433,8 @@ repeat:
 		bh->b_private = jh;
 		jh->b_bh = bh;
 		get_bh(bh);
+		bh->added_from = _RET_IP_;
+		bh->added_jiffies = jiffies;
 		BUFFER_TRACE(bh, "added journal_head");
 	}
 	jh->b_jcount++;
@@ -2498,6 +2500,8 @@ void jbd2_journal_put_journal_head(struct journal_head *jh)
 	--jh->b_jcount;
 	if (!jh->b_jcount) {
 		__journal_remove_journal_head(bh);
+		bh->removed_from = _RET_IP_;
+		bh->removed_jiffies = jiffies;
 		jbd_unlock_bh_journal_head(bh);
 		__brelse(bh);
 	} else
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 7aa9a32..aacd1cd 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1275,6 +1275,10 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
 	journal = transaction->t_journal;
 	jh = jbd2_journal_grab_journal_head(bh);
 	if (!jh) {
+		printk(KERN_ERR "Dirtying buffer without jh at %lu: state %lx,"
+			"jh added from 0x%lx at %lu, removed from 0x%lx "
+			"at %lu\n", jiffies, bh->b_state, bh->removed_from,
+			bh->removed_jiffies, bh->added_from, bh->added_jiffies);
 		ret = -EUCLEAN;
 		goto out;
 	}
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 91fa9a9..f24073e 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -74,6 +74,10 @@ struct buffer_head {
 	struct address_space *b_assoc_map;	/* mapping this buffer is
 						   associated with */
 	atomic_t b_count;		/* users using this buffer_head */
+	unsigned long removed_from;
+	unsigned long removed_jiffies;
+	unsigned long added_from;
+	unsigned long added_jiffies;
 };
 
 /*
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
  2013-08-13 10:34             ` Jan Kara
@ 2013-08-22 23:35               ` Sage Weil
  2013-08-23  9:54                 ` Jan Kara
  2013-08-23 20:48                 ` Jan Kara
  0 siblings, 2 replies; 16+ messages in thread
From: Sage Weil @ 2013-08-22 23:35 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

On Tue, 13 Aug 2013, Jan Kara wrote:
> On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > Full dmesg is attached.
>   Hum, nothing interesting in there...
> 
> > Our QA seems to hit this with some regularity.  Let me know if there's 
> > some combination of patches that would help shed more light!
>   If they can run with attached debug patch it could maybe sched some more
> light. Please send also your System.map file together with the dmesg of the
> kernel when the crash happens so that I can map addresses to function
> names... Thanks!

Okay, finally hit it again:

<6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
<3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
<4>[77877.441200] ------------[ cut here ]------------
<4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()

<4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
<4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
<4>[77877.513213]  0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
<4>[77877.520694]  0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
<4>[77877.528218]  ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
<4>[77877.535756] Call Trace:
<4>[77877.538279]  [<ffffffff81642d85>] dump_stack+0x46/0x58
<4>[77877.543439]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
<4>[77877.549548]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
<4>[77877.555413]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
<4>[77877.562288]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
<4>[77877.569155]  [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
<4>[77877.575723]  [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
<4>[77877.581990]  [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
<4>[77877.588335]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
<4>[77877.594188]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
<4>[77877.599837]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
<4>[77877.605779]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
<4>[77877.611514]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
<4>[77877.617773]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
<4>[77877.623103]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
<4>[77877.628317]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
<4>[77877.634260]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
<4>[77877.640428]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
<4>[77877.645847]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
<4>[77877.652015]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
<4>[77877.657897]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
<4>[77877.663405]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
<4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
<2>[77877.674126] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
<3>[77877.692983] Aborting journal on device sda1-8.
<2>[77877.721561] EXT4-fs (sda1): Remounting filesystem read-only
<0>[77877.721657] journal commit I/O error
<0>[77877.721706] journal commit I/O error
<0>[77877.721707] journal commit I/O error
<2>[77877.727300] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
<2>[77877.727338] EXT4-fs (sda1): Remounting filesystem read-only
<2>[77877.727613] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
<2>[77877.727618] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
<2>[77877.727625] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
<2>[77877.778239] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
<3>[77877.786051] Dirtying buffer without jh at 4302720332: state 10c029,jh added from 0xffffffff8127eb88 at 4302720326, removed from 0xffffffff8127b5b0 at 4302720274
<4>[77877.800516] ------------[ cut here ]------------
<4>[77877.805156] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()

<4>[77877.856583] CPU: 7 PID: 26045 Comm: ceph-osd Tainted: G        W    3.11.0-rc5-ceph-00061-g546140d #1
<4>[77877.865896] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
<4>[77877.873475]  0000000000000564 ffff880131ca19b8 ffffffff81642d85 ffff8802272ef290
<4>[77877.880954]  0000000000000000 ffff880131ca19f8 ffffffff8104985c ffff880131ca1a20
<4>[77877.888488]  ffff8801c499be58 0000000000000000 ffff880029010000 ffff880029010c30
<4>[77877.895962] Call Trace:
<4>[77877.898484]  [<ffffffff81642d85>] dump_stack+0x46/0x58
<4>[77877.903643]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
<4>[77877.909724]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
<4>[77877.915577]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
<4>[77877.922443]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
<4>[77877.928353]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
<4>[77877.935165]  [<ffffffff8122ca3e>] ext4_mark_iloc_dirty+0x40e/0x660
<4>[77877.941421]  [<ffffffff8126a465>] ext4_xattr_set_handle+0x265/0x4a0
<4>[77877.947766]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
<4>[77877.953358]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
<4>[77877.959354]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
<4>[77877.965034]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
<4>[77877.971289]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
<4>[77877.976621]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
<4>[77877.981837]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
<4>[77877.987830]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
<4>[77877.993944]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
<4>[77877.999417]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
<4>[77878.005530]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
<4>[77878.011349]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
<4>[77878.016856]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
<4>[77878.022941] ---[ end trace bb7933908cd5a32b ]---
[7]kdb> rd
ax: ffff88020aadbf20  bx: ffff8800290100a0  cx: 0000000000000000
dx: ffff8800290100a0  si: ffff8800290100a0  di: ffff8800290100a0
bp: ffff880131ca1a28  sp: ffff880131ca1978  r8: 0000000000000002
r9: 0000000000000000  r10: 0000000000000001  r11: 0000000000000000
r12: ffff880029010000  r13: 0000000000000000  r14: ffff880029010c30
r15: 00000000ffffff8b  ip: ffffffff81279f84  flags: 00010286  cs: 00000010
ss: 00000018  ds: 00000018  es: 00000018  fs: 00000018  gs: 00000018
[7]kdb> bt
Stack traceback for pid 26045
0xffff88020aadbf20    26045    25958  1    7   R  0xffff88020aadc3a8 *ceph-osd
 ffff880131ca1978 0000000000000018 ffff880131ca1978 ffff880131ca1998
 bb7933908cd5a32b 0000000000000000 0000000000000001 0000000000000002
 0000000000000000 ffff880131ca19b8 0000000000000000 ffff880131ca19f8
Call Trace:
 [<ffffffff8104986f>] ? warn_slowpath_common+0x9f/0xc0
 [<ffffffff81257308>] ? __ext4_journal_stop+0x78/0xa0
 [<ffffffff812578dc>] ? __ext4_handle_dirty_metadata+0xbc/0x140
 [<ffffffff8122ca3e>] ? ext4_mark_iloc_dirty+0x40e/0x660
 [<ffffffff8126a465>] ? ext4_xattr_set_handle+0x265/0x4a0
 [<ffffffff8126a765>] ? ext4_xattr_set+0xc5/0x140
 [<ffffffff8126b177>] ? ext4_xattr_user_set+0x47/0x50
 [<ffffffff811a3fee>] ? generic_setxattr+0x6e/0x90
 [<ffffffff811a48eb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
 [<ffffffff811a4af4>] ? vfs_setxattr+0xc4/0xd0
 [<ffffffff811a4c3e>] ? setxattr+0x13e/0x1e0
 [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
 [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
 [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
 [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
 [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
 [<ffffffff811a50de>] ? SyS_fsetxattr+0xbe/0x100
 [<ffffffff81653782>] ? system_call_fastpath+0x16/0x1b

Let me know if there is anything else I can gather from this machine that 
will help!

sage

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
  2013-08-22 23:35               ` Sage Weil
@ 2013-08-23  9:54                 ` Jan Kara
  2013-08-23 15:02                   ` Sage Weil
  2013-08-23 20:48                 ` Jan Kara
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-08-23  9:54 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-ext4

On Thu 22-08-13 16:35:15, Sage Weil wrote:
> On Tue, 13 Aug 2013, Jan Kara wrote:
> > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > Full dmesg is attached.
> >   Hum, nothing interesting in there...
> > 
> > > Our QA seems to hit this with some regularity.  Let me know if there's 
> > > some combination of patches that would help shed more light!
> >   If they can run with attached debug patch it could maybe sched some more
> > light. Please send also your System.map file together with the dmesg of the
> > kernel when the crash happens so that I can map addresses to function
> > names... Thanks!
> 
> Okay, finally hit it again:
> 
> <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
  Great! Can you please send me /proc/kallsyms from the machine?

								Honza

> <4>[77877.441200] ------------[ cut here ]------------
> <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> 
> <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> <4>[77877.513213]  0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> <4>[77877.520694]  0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> <4>[77877.528218]  ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> <4>[77877.535756] Call Trace:
> <4>[77877.538279]  [<ffffffff81642d85>] dump_stack+0x46/0x58
> <4>[77877.543439]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> <4>[77877.549548]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> <4>[77877.555413]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> <4>[77877.562288]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> <4>[77877.569155]  [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> <4>[77877.575723]  [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> <4>[77877.581990]  [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> <4>[77877.588335]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> <4>[77877.594188]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> <4>[77877.599837]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> <4>[77877.605779]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> <4>[77877.611514]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> <4>[77877.617773]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> <4>[77877.623103]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> <4>[77877.628317]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> <4>[77877.634260]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.640428]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> <4>[77877.645847]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.652015]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> <4>[77877.657897]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> <4>[77877.663405]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
> <2>[77877.674126] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
> <3>[77877.692983] Aborting journal on device sda1-8.
> <2>[77877.721561] EXT4-fs (sda1): Remounting filesystem read-only
> <0>[77877.721657] journal commit I/O error
> <0>[77877.721706] journal commit I/O error
> <0>[77877.721707] journal commit I/O error
> <2>[77877.727300] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> <2>[77877.727338] EXT4-fs (sda1): Remounting filesystem read-only
> <2>[77877.727613] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> <2>[77877.727618] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> <2>[77877.727625] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> <2>[77877.778239] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
> <3>[77877.786051] Dirtying buffer without jh at 4302720332: state 10c029,jh added from 0xffffffff8127eb88 at 4302720326, removed from 0xffffffff8127b5b0 at 4302720274
> <4>[77877.800516] ------------[ cut here ]------------
> <4>[77877.805156] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> 
> <4>[77877.856583] CPU: 7 PID: 26045 Comm: ceph-osd Tainted: G        W    3.11.0-rc5-ceph-00061-g546140d #1
> <4>[77877.865896] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> <4>[77877.873475]  0000000000000564 ffff880131ca19b8 ffffffff81642d85 ffff8802272ef290
> <4>[77877.880954]  0000000000000000 ffff880131ca19f8 ffffffff8104985c ffff880131ca1a20
> <4>[77877.888488]  ffff8801c499be58 0000000000000000 ffff880029010000 ffff880029010c30
> <4>[77877.895962] Call Trace:
> <4>[77877.898484]  [<ffffffff81642d85>] dump_stack+0x46/0x58
> <4>[77877.903643]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> <4>[77877.909724]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> <4>[77877.915577]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> <4>[77877.922443]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> <4>[77877.928353]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> <4>[77877.935165]  [<ffffffff8122ca3e>] ext4_mark_iloc_dirty+0x40e/0x660
> <4>[77877.941421]  [<ffffffff8126a465>] ext4_xattr_set_handle+0x265/0x4a0
> <4>[77877.947766]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> <4>[77877.953358]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> <4>[77877.959354]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> <4>[77877.965034]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> <4>[77877.971289]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> <4>[77877.976621]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> <4>[77877.981837]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> <4>[77877.987830]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.993944]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> <4>[77877.999417]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77878.005530]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> <4>[77878.011349]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> <4>[77878.016856]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> <4>[77878.022941] ---[ end trace bb7933908cd5a32b ]---
> [7]kdb> rd
> ax: ffff88020aadbf20  bx: ffff8800290100a0  cx: 0000000000000000
> dx: ffff8800290100a0  si: ffff8800290100a0  di: ffff8800290100a0
> bp: ffff880131ca1a28  sp: ffff880131ca1978  r8: 0000000000000002
> r9: 0000000000000000  r10: 0000000000000001  r11: 0000000000000000
> r12: ffff880029010000  r13: 0000000000000000  r14: ffff880029010c30
> r15: 00000000ffffff8b  ip: ffffffff81279f84  flags: 00010286  cs: 00000010
> ss: 00000018  ds: 00000018  es: 00000018  fs: 00000018  gs: 00000018
> [7]kdb> bt
> Stack traceback for pid 26045
> 0xffff88020aadbf20    26045    25958  1    7   R  0xffff88020aadc3a8 *ceph-osd
>  ffff880131ca1978 0000000000000018 ffff880131ca1978 ffff880131ca1998
>  bb7933908cd5a32b 0000000000000000 0000000000000001 0000000000000002
>  0000000000000000 ffff880131ca19b8 0000000000000000 ffff880131ca19f8
> Call Trace:
>  [<ffffffff8104986f>] ? warn_slowpath_common+0x9f/0xc0
>  [<ffffffff81257308>] ? __ext4_journal_stop+0x78/0xa0
>  [<ffffffff812578dc>] ? __ext4_handle_dirty_metadata+0xbc/0x140
>  [<ffffffff8122ca3e>] ? ext4_mark_iloc_dirty+0x40e/0x660
>  [<ffffffff8126a465>] ? ext4_xattr_set_handle+0x265/0x4a0
>  [<ffffffff8126a765>] ? ext4_xattr_set+0xc5/0x140
>  [<ffffffff8126b177>] ? ext4_xattr_user_set+0x47/0x50
>  [<ffffffff811a3fee>] ? generic_setxattr+0x6e/0x90
>  [<ffffffff811a48eb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
>  [<ffffffff811a4af4>] ? vfs_setxattr+0xc4/0xd0
>  [<ffffffff811a4c3e>] ? setxattr+0x13e/0x1e0
>  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
>  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
>  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
>  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
>  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
>  [<ffffffff811a50de>] ? SyS_fsetxattr+0xbe/0x100
>  [<ffffffff81653782>] ? system_call_fastpath+0x16/0x1b
> 
> Let me know if there is anything else I can gather from this machine that 
> will help!
> 
> sage
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
  2013-08-23  9:54                 ` Jan Kara
@ 2013-08-23 15:02                   ` Sage Weil
  2013-08-23 19:52                     ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Sage Weil @ 2013-08-23 15:02 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

[-- Attachment #1: Type: TEXT/PLAIN, Size: 9026 bytes --]

On Fri, 23 Aug 2013, Jan Kara wrote:
> On Thu 22-08-13 16:35:15, Sage Weil wrote:
> > On Tue, 13 Aug 2013, Jan Kara wrote:
> > > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > > Full dmesg is attached.
> > >   Hum, nothing interesting in there...
> > > 
> > > > Our QA seems to hit this with some regularity.  Let me know if there's 
> > > > some combination of patches that would help shed more light!
> > >   If they can run with attached debug patch it could maybe sched some more
> > > light. Please send also your System.map file together with the dmesg of the
> > > kernel when the crash happens so that I can map addresses to function
> > > names... Thanks!
> > 
> > Okay, finally hit it again:
> > 
> > <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> > <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
>   Great! Can you please send me /proc/kallsyms from the machine?

Attached!

sage

> 
> 								Honza
> 
> > <4>[77877.441200] ------------[ cut here ]------------
> > <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> > 
> > <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> > <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > <4>[77877.513213]  0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> > <4>[77877.520694]  0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> > <4>[77877.528218]  ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> > <4>[77877.535756] Call Trace:
> > <4>[77877.538279]  [<ffffffff81642d85>] dump_stack+0x46/0x58
> > <4>[77877.543439]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> > <4>[77877.549548]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> > <4>[77877.555413]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> > <4>[77877.562288]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> > <4>[77877.569155]  [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> > <4>[77877.575723]  [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> > <4>[77877.581990]  [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> > <4>[77877.588335]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> > <4>[77877.594188]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> > <4>[77877.599837]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> > <4>[77877.605779]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> > <4>[77877.611514]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > <4>[77877.617773]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> > <4>[77877.623103]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> > <4>[77877.628317]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > <4>[77877.634260]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77877.640428]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > <4>[77877.645847]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77877.652015]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > <4>[77877.657897]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> > <4>[77877.663405]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> > <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
> > <2>[77877.674126] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
> > <3>[77877.692983] Aborting journal on device sda1-8.
> > <2>[77877.721561] EXT4-fs (sda1): Remounting filesystem read-only
> > <0>[77877.721657] journal commit I/O error
> > <0>[77877.721706] journal commit I/O error
> > <0>[77877.721707] journal commit I/O error
> > <2>[77877.727300] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> > <2>[77877.727338] EXT4-fs (sda1): Remounting filesystem read-only
> > <2>[77877.727613] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> > <2>[77877.727618] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> > <2>[77877.727625] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> > <2>[77877.778239] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
> > <3>[77877.786051] Dirtying buffer without jh at 4302720332: state 10c029,jh added from 0xffffffff8127eb88 at 4302720326, removed from 0xffffffff8127b5b0 at 4302720274
> > <4>[77877.800516] ------------[ cut here ]------------
> > <4>[77877.805156] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> > 
> > <4>[77877.856583] CPU: 7 PID: 26045 Comm: ceph-osd Tainted: G        W    3.11.0-rc5-ceph-00061-g546140d #1
> > <4>[77877.865896] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > <4>[77877.873475]  0000000000000564 ffff880131ca19b8 ffffffff81642d85 ffff8802272ef290
> > <4>[77877.880954]  0000000000000000 ffff880131ca19f8 ffffffff8104985c ffff880131ca1a20
> > <4>[77877.888488]  ffff8801c499be58 0000000000000000 ffff880029010000 ffff880029010c30
> > <4>[77877.895962] Call Trace:
> > <4>[77877.898484]  [<ffffffff81642d85>] dump_stack+0x46/0x58
> > <4>[77877.903643]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> > <4>[77877.909724]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> > <4>[77877.915577]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> > <4>[77877.922443]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> > <4>[77877.928353]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> > <4>[77877.935165]  [<ffffffff8122ca3e>] ext4_mark_iloc_dirty+0x40e/0x660
> > <4>[77877.941421]  [<ffffffff8126a465>] ext4_xattr_set_handle+0x265/0x4a0
> > <4>[77877.947766]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> > <4>[77877.953358]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> > <4>[77877.959354]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> > <4>[77877.965034]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > <4>[77877.971289]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> > <4>[77877.976621]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> > <4>[77877.981837]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > <4>[77877.987830]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77877.993944]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > <4>[77877.999417]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77878.005530]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > <4>[77878.011349]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> > <4>[77878.016856]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> > <4>[77878.022941] ---[ end trace bb7933908cd5a32b ]---
> > [7]kdb> rd
> > ax: ffff88020aadbf20  bx: ffff8800290100a0  cx: 0000000000000000
> > dx: ffff8800290100a0  si: ffff8800290100a0  di: ffff8800290100a0
> > bp: ffff880131ca1a28  sp: ffff880131ca1978  r8: 0000000000000002
> > r9: 0000000000000000  r10: 0000000000000001  r11: 0000000000000000
> > r12: ffff880029010000  r13: 0000000000000000  r14: ffff880029010c30
> > r15: 00000000ffffff8b  ip: ffffffff81279f84  flags: 00010286  cs: 00000010
> > ss: 00000018  ds: 00000018  es: 00000018  fs: 00000018  gs: 00000018
> > [7]kdb> bt
> > Stack traceback for pid 26045
> > 0xffff88020aadbf20    26045    25958  1    7   R  0xffff88020aadc3a8 *ceph-osd
> >  ffff880131ca1978 0000000000000018 ffff880131ca1978 ffff880131ca1998
> >  bb7933908cd5a32b 0000000000000000 0000000000000001 0000000000000002
> >  0000000000000000 ffff880131ca19b8 0000000000000000 ffff880131ca19f8
> > Call Trace:
> >  [<ffffffff8104986f>] ? warn_slowpath_common+0x9f/0xc0
> >  [<ffffffff81257308>] ? __ext4_journal_stop+0x78/0xa0
> >  [<ffffffff812578dc>] ? __ext4_handle_dirty_metadata+0xbc/0x140
> >  [<ffffffff8122ca3e>] ? ext4_mark_iloc_dirty+0x40e/0x660
> >  [<ffffffff8126a465>] ? ext4_xattr_set_handle+0x265/0x4a0
> >  [<ffffffff8126a765>] ? ext4_xattr_set+0xc5/0x140
> >  [<ffffffff8126b177>] ? ext4_xattr_user_set+0x47/0x50
> >  [<ffffffff811a3fee>] ? generic_setxattr+0x6e/0x90
> >  [<ffffffff811a48eb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
> >  [<ffffffff811a4af4>] ? vfs_setxattr+0xc4/0xd0
> >  [<ffffffff811a4c3e>] ? setxattr+0x13e/0x1e0
> >  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> >  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> >  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> >  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> >  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> >  [<ffffffff811a50de>] ? SyS_fsetxattr+0xbe/0x100
> >  [<ffffffff81653782>] ? system_call_fastpath+0x16/0x1b
> > 
> > Let me know if there is anything else I can gather from this machine that 
> > will help!
> > 
> > sage
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
> 
> 

[-- Attachment #2: Type: APPLICATION/octet-stream, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
  2013-08-23 15:02                   ` Sage Weil
@ 2013-08-23 19:52                     ` Jan Kara
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2013-08-23 19:52 UTC (permalink / raw)
  To: Sage Weil; +Cc: Jan Kara, linux-ext4

On Fri 23-08-13 08:02:45, Sage Weil wrote:
> On Fri, 23 Aug 2013, Jan Kara wrote:
> > On Thu 22-08-13 16:35:15, Sage Weil wrote:
> > > On Tue, 13 Aug 2013, Jan Kara wrote:
> > > > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > > > Full dmesg is attached.
> > > >   Hum, nothing interesting in there...
> > > > 
> > > > > Our QA seems to hit this with some regularity.  Let me know if there's 
> > > > > some combination of patches that would help shed more light!
> > > >   If they can run with attached debug patch it could maybe sched some more
> > > > light. Please send also your System.map file together with the dmesg of the
> > > > kernel when the crash happens so that I can map addresses to function
> > > > names... Thanks!
> > > 
> > > Okay, finally hit it again:
> > > 
> > > <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> > > <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> >   Great! Can you please send me /proc/kallsyms from the machine?
> 
> Attached!
  Sadly not. I think your MUA got confused by the fact that files in /proc
have 0 length and attached empty file.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
  2013-08-22 23:35               ` Sage Weil
  2013-08-23  9:54                 ` Jan Kara
@ 2013-08-23 20:48                 ` Jan Kara
  2013-09-10 22:19                   ` Jan Kara
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-08-23 20:48 UTC (permalink / raw)
  To: Sage Weil; +Cc: Jan Kara, linux-ext4

On Thu 22-08-13 16:35:15, Sage Weil wrote:
> On Tue, 13 Aug 2013, Jan Kara wrote:
> > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > Full dmesg is attached.
> >   Hum, nothing interesting in there...
> > 
> > > Our QA seems to hit this with some regularity.  Let me know if there's 
> > > some combination of patches that would help shed more light!
> >   If they can run with attached debug patch it could maybe sched some more
> > light. Please send also your System.map file together with the dmesg of the
> > kernel when the crash happens so that I can map addresses to function
> > names... Thanks!
> 
> Okay, finally hit it again:
> 
> <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> <4>[77877.441200] ------------[ cut here ]------------
> <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> 
> <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> <4>[77877.513213]  0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> <4>[77877.520694]  0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> <4>[77877.528218]  ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> <4>[77877.535756] Call Trace:
> <4>[77877.538279]  [<ffffffff81642d85>] dump_stack+0x46/0x58
> <4>[77877.543439]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> <4>[77877.549548]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> <4>[77877.555413]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> <4>[77877.562288]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> <4>[77877.569155]  [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> <4>[77877.575723]  [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> <4>[77877.581990]  [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> <4>[77877.588335]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> <4>[77877.594188]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> <4>[77877.599837]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> <4>[77877.605779]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> <4>[77877.611514]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> <4>[77877.617773]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> <4>[77877.623103]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> <4>[77877.628317]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> <4>[77877.634260]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.640428]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> <4>[77877.645847]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.652015]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> <4>[77877.657897]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> <4>[77877.663405]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
  I was scratching my head for a while how this could happen but I think I
see the race now. Think we have two inodes A and B which share the same
xattr block (so &BHDR(bh)->h_refcount == 2). Now we are changing xattrs for
both inodes in parallel - that can easily happen because xattr locking is
generally per inode (EXT4_I(inode)->xattr_sem). The following race then
happens:
CPU1					CPU2
ext4_xattr_release_block()		ext4_xattr_release_block()
lock_buffer(bh);
/* False */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
} else {
	le32_add_cpu(&BHDR(bh)->h_refcount, -1);
	unlock_buffer(bh);
					lock_buffer(bh);
					/* True */
					if (BHDR(bh)->h_refcount == cpu_to_le32(1))
						get_bh(bh);
						ext4_free_blocks()
							...
							jbd2_journal_forget()
								jbd2_journal_unfile_buffer()
								-> JH is gone
	error = ext4_handle_dirty_xattr_block(handle, inode, bh);
	-> triggers warning

Now easy fix would be to move ext4_handle_dirty_xattr_block() before
unlock_buffer() but I don't really like that because of locking
constraints. I'll think about it...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
  2013-08-23 20:48                 ` Jan Kara
@ 2013-09-10 22:19                   ` Jan Kara
  2013-09-10 22:32                     ` Sage Weil
  2014-04-05 19:20                     ` Sage Weil
  0 siblings, 2 replies; 16+ messages in thread
From: Jan Kara @ 2013-09-10 22:19 UTC (permalink / raw)
  To: Sage Weil; +Cc: Jan Kara, linux-ext4

[-- Attachment #1: Type: text/plain, Size: 4706 bytes --]

On Fri 23-08-13 22:48:10, Jan Kara wrote:
> On Thu 22-08-13 16:35:15, Sage Weil wrote:
> > On Tue, 13 Aug 2013, Jan Kara wrote:
> > > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > > Full dmesg is attached.
> > >   Hum, nothing interesting in there...
> > > 
> > > > Our QA seems to hit this with some regularity.  Let me know if there's 
> > > > some combination of patches that would help shed more light!
> > >   If they can run with attached debug patch it could maybe sched some more
> > > light. Please send also your System.map file together with the dmesg of the
> > > kernel when the crash happens so that I can map addresses to function
> > > names... Thanks!
> > 
> > Okay, finally hit it again:
> > 
> > <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> > <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> > <4>[77877.441200] ------------[ cut here ]------------
> > <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> > 
> > <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> > <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > <4>[77877.513213]  0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> > <4>[77877.520694]  0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> > <4>[77877.528218]  ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> > <4>[77877.535756] Call Trace:
> > <4>[77877.538279]  [<ffffffff81642d85>] dump_stack+0x46/0x58
> > <4>[77877.543439]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> > <4>[77877.549548]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> > <4>[77877.555413]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> > <4>[77877.562288]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> > <4>[77877.569155]  [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> > <4>[77877.575723]  [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> > <4>[77877.581990]  [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> > <4>[77877.588335]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> > <4>[77877.594188]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> > <4>[77877.599837]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> > <4>[77877.605779]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> > <4>[77877.611514]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > <4>[77877.617773]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> > <4>[77877.623103]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> > <4>[77877.628317]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > <4>[77877.634260]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77877.640428]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > <4>[77877.645847]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77877.652015]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > <4>[77877.657897]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> > <4>[77877.663405]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> > <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
>   I was scratching my head for a while how this could happen but I think I
> see the race now. Think we have two inodes A and B which share the same
> xattr block (so &BHDR(bh)->h_refcount == 2). Now we are changing xattrs for
> both inodes in parallel - that can easily happen because xattr locking is
> generally per inode (EXT4_I(inode)->xattr_sem). The following race then
> happens:
> CPU1					CPU2
> ext4_xattr_release_block()		ext4_xattr_release_block()
> lock_buffer(bh);
> /* False */
> if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> } else {
> 	le32_add_cpu(&BHDR(bh)->h_refcount, -1);
> 	unlock_buffer(bh);
> 					lock_buffer(bh);
> 					/* True */
> 					if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> 						get_bh(bh);
> 						ext4_free_blocks()
> 							...
> 							jbd2_journal_forget()
> 								jbd2_journal_unfile_buffer()
> 								-> JH is gone
> 	error = ext4_handle_dirty_xattr_block(handle, inode, bh);
> 	-> triggers warning
> 
> Now easy fix would be to move ext4_handle_dirty_xattr_block() before
> unlock_buffer() but I don't really like that because of locking
> constraints. I'll think about it...
  So finally I've got back to this. Attached is a somewhat ugly patch that
should fix this issue. Can you please test it? Thanks!

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-ext4-Fix-jbd2-warning-under-heavy-xattr-load.patch --]
[-- Type: text/x-patch, Size: 5425 bytes --]

>From f70fdf2341118eddc2ad1d5ef29498097b3630ef Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 11 Sep 2013 00:04:25 +0200
Subject: [PATCH] ext4: Fix jbd2 warning under heavy xattr load

When heavily exercising xattr code the assertion that
jbd2_journal_dirty_metadata() shouldn't return error was triggered:

WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237
jbd2_journal_dirty_metadata+0x1ba/0x260()

CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G    W 3.10.0-ceph-00049-g68d04c9 #1
Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
 ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
 ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
 0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
Call Trace:
 [<ffffffff816311b0>] dump_stack+0x19/0x1b
 [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
 [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
 [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
 [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
 [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
 [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
 [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
 [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
 [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
 [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
 [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
 [<ffffffff8119421e>] setxattr+0x13e/0x1e0
 [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
 [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
 [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
 [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
 [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
 [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
 [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b

The reason for the warning is that buffer_head passed into
jbd2_journal_dirty_metadata() didn't have journal_head attached. This is
caused by the following race of two ext4_xattr_release_block() calls:

CPU1                                CPU2
ext4_xattr_release_block()          ext4_xattr_release_block()
lock_buffer(bh);
/* False */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
} else {
  le32_add_cpu(&BHDR(bh)->h_refcount, -1);
  unlock_buffer(bh);
                                    lock_buffer(bh);
                                    /* True */
                                    if (BHDR(bh)->h_refcount == cpu_to_le32(1))
                                      get_bh(bh);
                                      ext4_free_blocks()
                                        ...
                                        jbd2_journal_forget()
                                          jbd2_journal_unfile_buffer()
                                          -> JH is gone
  error = ext4_handle_dirty_xattr_block(handle, inode, bh);
  -> triggers the warning

We fix the problem by moving ext4_handle_dirty_xattr_block() under the
buffer lock. Sadly this cannot be done in nojournal mode as that
function can call sync_dirty_buffer() which would deadlock. Luckily in
nojournal mode the race is harmless (we only dirty already freed buffer)
and thus for nojournal mode we leave the dirtying outside of the buffer
lock.

Reported-by: Sage Weil <sage@inktank.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/xattr.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index c081e34..dc99e17 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -517,8 +517,8 @@ static void ext4_xattr_update_super_block(handle_t *handle,
 }
 
 /*
- * Release the xattr block BH: If the reference count is > 1, decrement
- * it; otherwise free the block.
+ * Release the xattr block BH: If the reference count is > 1, decrement it;
+ * otherwise free the block.
  */
 static void
 ext4_xattr_release_block(handle_t *handle, struct inode *inode,
@@ -538,16 +538,31 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
 		if (ce)
 			mb_cache_entry_free(ce);
 		get_bh(bh);
+		unlock_buffer(bh);
 		ext4_free_blocks(handle, inode, bh, 0, 1,
 				 EXT4_FREE_BLOCKS_METADATA |
 				 EXT4_FREE_BLOCKS_FORGET);
-		unlock_buffer(bh);
 	} else {
 		le32_add_cpu(&BHDR(bh)->h_refcount, -1);
 		if (ce)
 			mb_cache_entry_release(ce);
+		/*
+		 * Beware of this ugliness: Releasing of xattr block references
+		 * from different inodes can race and so we have to protect
+		 * from a race where someone else frees the block (and releases
+		 * its journal_head) before we are done dirtying the buffer. In
+		 * nojournal mode this race is harmless and we actually cannot
+		 * call ext4_handle_dirty_xattr_block() with locked buffer as
+		 * that function can call sync_dirty_buffer() so for that case
+		 * we handle the dirtying after unlocking the buffer.
+		 */
+		if (ext4_handle_valid(handle))
+			error = ext4_handle_dirty_xattr_block(handle, inode,
+							      bh);
 		unlock_buffer(bh);
-		error = ext4_handle_dirty_xattr_block(handle, inode, bh);
+		if (!ext4_handle_valid(handle))
+			error = ext4_handle_dirty_xattr_block(handle, inode,
+							      bh);
 		if (IS_SYNC(inode))
 			ext4_handle_sync(handle);
 		dquot_free_block(inode, EXT4_C2B(EXT4_SB(inode->i_sb), 1));
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
  2013-09-10 22:19                   ` Jan Kara
@ 2013-09-10 22:32                     ` Sage Weil
  2013-11-11 22:20                       ` Jan Kara
  2014-04-05 19:20                     ` Sage Weil
  1 sibling, 1 reply; 16+ messages in thread
From: Sage Weil @ 2013-09-10 22:32 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

On Wed, 11 Sep 2013, Jan Kara wrote:
> On Fri 23-08-13 22:48:10, Jan Kara wrote:
> > On Thu 22-08-13 16:35:15, Sage Weil wrote:
> > > On Tue, 13 Aug 2013, Jan Kara wrote:
> > > > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > > > Full dmesg is attached.
> > > >   Hum, nothing interesting in there...
> > > > 
> > > > > Our QA seems to hit this with some regularity.  Let me know if there's 
> > > > > some combination of patches that would help shed more light!
> > > >   If they can run with attached debug patch it could maybe sched some more
> > > > light. Please send also your System.map file together with the dmesg of the
> > > > kernel when the crash happens so that I can map addresses to function
> > > > names... Thanks!
> > > 
> > > Okay, finally hit it again:
> > > 
> > > <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> > > <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> > > <4>[77877.441200] ------------[ cut here ]------------
> > > <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> > > 
> > > <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> > > <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > > <4>[77877.513213]  0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> > > <4>[77877.520694]  0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> > > <4>[77877.528218]  ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> > > <4>[77877.535756] Call Trace:
> > > <4>[77877.538279]  [<ffffffff81642d85>] dump_stack+0x46/0x58
> > > <4>[77877.543439]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> > > <4>[77877.549548]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> > > <4>[77877.555413]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> > > <4>[77877.562288]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> > > <4>[77877.569155]  [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> > > <4>[77877.575723]  [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> > > <4>[77877.581990]  [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> > > <4>[77877.588335]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> > > <4>[77877.594188]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> > > <4>[77877.599837]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> > > <4>[77877.605779]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> > > <4>[77877.611514]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > > <4>[77877.617773]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> > > <4>[77877.623103]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> > > <4>[77877.628317]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > > <4>[77877.634260]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > > <4>[77877.640428]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > > <4>[77877.645847]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > > <4>[77877.652015]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > > <4>[77877.657897]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> > > <4>[77877.663405]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> > > <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
> >   I was scratching my head for a while how this could happen but I think I
> > see the race now. Think we have two inodes A and B which share the same
> > xattr block (so &BHDR(bh)->h_refcount == 2). Now we are changing xattrs for
> > both inodes in parallel - that can easily happen because xattr locking is
> > generally per inode (EXT4_I(inode)->xattr_sem). The following race then
> > happens:
> > CPU1					CPU2
> > ext4_xattr_release_block()		ext4_xattr_release_block()
> > lock_buffer(bh);
> > /* False */
> > if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> > } else {
> > 	le32_add_cpu(&BHDR(bh)->h_refcount, -1);
> > 	unlock_buffer(bh);
> > 					lock_buffer(bh);
> > 					/* True */
> > 					if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> > 						get_bh(bh);
> > 						ext4_free_blocks()
> > 							...
> > 							jbd2_journal_forget()
> > 								jbd2_journal_unfile_buffer()
> > 								-> JH is gone
> > 	error = ext4_handle_dirty_xattr_block(handle, inode, bh);
> > 	-> triggers warning
> > 
> > Now easy fix would be to move ext4_handle_dirty_xattr_block() before
> > unlock_buffer() but I don't really like that because of locking
> > constraints. I'll think about it...
>   So finally I've got back to this. Attached is a somewhat ugly patch that
> should fix this issue. Can you please test it? Thanks!

Sure; added it to our test tree.  I haven't seen this in a week probably, 
so it'll be hard to definitively say it's fixed, but it'll get plenty of 
testing.  :)

Thanks!
sage

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
  2013-09-10 22:32                     ` Sage Weil
@ 2013-11-11 22:20                       ` Jan Kara
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2013-11-11 22:20 UTC (permalink / raw)
  To: Sage Weil; +Cc: Jan Kara, linux-ext4

On Tue 10-09-13 15:32:47, Sage Weil wrote:
> On Wed, 11 Sep 2013, Jan Kara wrote:
> > On Fri 23-08-13 22:48:10, Jan Kara wrote:
> > > On Thu 22-08-13 16:35:15, Sage Weil wrote:
> > > > On Tue, 13 Aug 2013, Jan Kara wrote:
> > > > > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > > > > Full dmesg is attached.
> > > > >   Hum, nothing interesting in there...
> > > > > 
> > > > > > Our QA seems to hit this with some regularity.  Let me know if there's 
> > > > > > some combination of patches that would help shed more light!
> > > > >   If they can run with attached debug patch it could maybe sched some more
> > > > > light. Please send also your System.map file together with the dmesg of the
> > > > > kernel when the crash happens so that I can map addresses to function
> > > > > names... Thanks!
> > > > 
> > > > Okay, finally hit it again:
> > > > 
> > > > <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> > > > <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> > > > <4>[77877.441200] ------------[ cut here ]------------
> > > > <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> > > > 
> > > > <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> > > > <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > > > <4>[77877.513213]  0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> > > > <4>[77877.520694]  0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> > > > <4>[77877.528218]  ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> > > > <4>[77877.535756] Call Trace:
> > > > <4>[77877.538279]  [<ffffffff81642d85>] dump_stack+0x46/0x58
> > > > <4>[77877.543439]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> > > > <4>[77877.549548]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> > > > <4>[77877.555413]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> > > > <4>[77877.562288]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> > > > <4>[77877.569155]  [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> > > > <4>[77877.575723]  [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> > > > <4>[77877.581990]  [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> > > > <4>[77877.588335]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> > > > <4>[77877.594188]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> > > > <4>[77877.599837]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> > > > <4>[77877.605779]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> > > > <4>[77877.611514]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > > > <4>[77877.617773]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> > > > <4>[77877.623103]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> > > > <4>[77877.628317]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > > > <4>[77877.634260]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > > > <4>[77877.640428]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > > > <4>[77877.645847]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > > > <4>[77877.652015]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > > > <4>[77877.657897]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> > > > <4>[77877.663405]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> > > > <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
> > >   I was scratching my head for a while how this could happen but I think I
> > > see the race now. Think we have two inodes A and B which share the same
> > > xattr block (so &BHDR(bh)->h_refcount == 2). Now we are changing xattrs for
> > > both inodes in parallel - that can easily happen because xattr locking is
> > > generally per inode (EXT4_I(inode)->xattr_sem). The following race then
> > > happens:
> > > CPU1					CPU2
> > > ext4_xattr_release_block()		ext4_xattr_release_block()
> > > lock_buffer(bh);
> > > /* False */
> > > if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> > > } else {
> > > 	le32_add_cpu(&BHDR(bh)->h_refcount, -1);
> > > 	unlock_buffer(bh);
> > > 					lock_buffer(bh);
> > > 					/* True */
> > > 					if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> > > 						get_bh(bh);
> > > 						ext4_free_blocks()
> > > 							...
> > > 							jbd2_journal_forget()
> > > 								jbd2_journal_unfile_buffer()
> > > 								-> JH is gone
> > > 	error = ext4_handle_dirty_xattr_block(handle, inode, bh);
> > > 	-> triggers warning
> > > 
> > > Now easy fix would be to move ext4_handle_dirty_xattr_block() before
> > > unlock_buffer() but I don't really like that because of locking
> > > constraints. I'll think about it...
> >   So finally I've got back to this. Attached is a somewhat ugly patch that
> > should fix this issue. Can you please test it? Thanks!
> 
> Sure; added it to our test tree.  I haven't seen this in a week probably, 
> so it'll be hard to definitively say it's fixed, but it'll get plenty of 
> testing.  :)
  Any luck with testing? It has been two months so if the bug didn't
reappear I'd hope it really got fixed...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: crash in __jbd2_journal_file_buffer
  2013-09-10 22:19                   ` Jan Kara
  2013-09-10 22:32                     ` Sage Weil
@ 2014-04-05 19:20                     ` Sage Weil
  1 sibling, 0 replies; 16+ messages in thread
From: Sage Weil @ 2014-04-05 19:20 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, ceph-devel

Hi Jan,

Thanks for reminding me about this at LSF.  Both your debug patch and this 
fix were dropped from our testing branch a while ago, but we just hit the 
crash again yesterday on 3.14-rc5.  The fix was in our tree for quite some 
time, though, and did not cause any problems; I think it got dropped when 
we rebased for the 3.14 cycle.  I'll add it back now.

Unfortunately this triggers pretty infrequently so I'm not sure I can give 
a definitive "this fixes it," but it feels that way, and we haven't seen 
any fallout as a result.

Let me know if there's anything else we can put into the test kernel that 
might help out.

Thanks!
sage


On Wed, 11 Sep 2013, Jan Kara wrote:

> On Fri 23-08-13 22:48:10, Jan Kara wrote:
> > On Thu 22-08-13 16:35:15, Sage Weil wrote:
> > > On Tue, 13 Aug 2013, Jan Kara wrote:
> > > > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > > > Full dmesg is attached.
> > > >   Hum, nothing interesting in there...
> > > > 
> > > > > Our QA seems to hit this with some regularity.  Let me know if there's 
> > > > > some combination of patches that would help shed more light!
> > > >   If they can run with attached debug patch it could maybe sched some more
> > > > light. Please send also your System.map file together with the dmesg of the
> > > > kernel when the crash happens so that I can map addresses to function
> > > > names... Thanks!
> > > 
> > > Okay, finally hit it again:
> > > 
> > > <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> > > <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> > > <4>[77877.441200] ------------[ cut here ]------------
> > > <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> > > 
> > > <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> > > <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > > <4>[77877.513213]  0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> > > <4>[77877.520694]  0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> > > <4>[77877.528218]  ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> > > <4>[77877.535756] Call Trace:
> > > <4>[77877.538279]  [<ffffffff81642d85>] dump_stack+0x46/0x58
> > > <4>[77877.543439]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> > > <4>[77877.549548]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> > > <4>[77877.555413]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> > > <4>[77877.562288]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> > > <4>[77877.569155]  [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> > > <4>[77877.575723]  [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> > > <4>[77877.581990]  [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> > > <4>[77877.588335]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> > > <4>[77877.594188]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> > > <4>[77877.599837]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> > > <4>[77877.605779]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> > > <4>[77877.611514]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > > <4>[77877.617773]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> > > <4>[77877.623103]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> > > <4>[77877.628317]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > > <4>[77877.634260]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > > <4>[77877.640428]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > > <4>[77877.645847]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > > <4>[77877.652015]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > > <4>[77877.657897]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> > > <4>[77877.663405]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> > > <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
> >   I was scratching my head for a while how this could happen but I think I
> > see the race now. Think we have two inodes A and B which share the same
> > xattr block (so &BHDR(bh)->h_refcount == 2). Now we are changing xattrs for
> > both inodes in parallel - that can easily happen because xattr locking is
> > generally per inode (EXT4_I(inode)->xattr_sem). The following race then
> > happens:
> > CPU1					CPU2
> > ext4_xattr_release_block()		ext4_xattr_release_block()
> > lock_buffer(bh);
> > /* False */
> > if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> > } else {
> > 	le32_add_cpu(&BHDR(bh)->h_refcount, -1);
> > 	unlock_buffer(bh);
> > 					lock_buffer(bh);
> > 					/* True */
> > 					if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> > 						get_bh(bh);
> > 						ext4_free_blocks()
> > 							...
> > 							jbd2_journal_forget()
> > 								jbd2_journal_unfile_buffer()
> > 								-> JH is gone
> > 	error = ext4_handle_dirty_xattr_block(handle, inode, bh);
> > 	-> triggers warning
> > 
> > Now easy fix would be to move ext4_handle_dirty_xattr_block() before
> > unlock_buffer() but I don't really like that because of locking
> > constraints. I'll think about it...
>   So finally I've got back to this. Attached is a somewhat ugly patch that
> should fix this issue. Can you please test it? Thanks!
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-04-05 19:20 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-30 22:41 crash in __jbd2_journal_file_buffer Sage Weil
2013-07-31 19:02 ` Jan Kara
2013-08-09 17:36   ` Sage Weil
2013-08-09 21:24     ` Jan Kara
2013-08-09 22:11       ` Sage Weil
2013-08-12 12:52         ` Jan Kara
     [not found]           ` <alpine.DEB.2.00.1308121106320.29150@cobra.newdream.net>
2013-08-13 10:34             ` Jan Kara
2013-08-22 23:35               ` Sage Weil
2013-08-23  9:54                 ` Jan Kara
2013-08-23 15:02                   ` Sage Weil
2013-08-23 19:52                     ` Jan Kara
2013-08-23 20:48                 ` Jan Kara
2013-09-10 22:19                   ` Jan Kara
2013-09-10 22:32                     ` Sage Weil
2013-11-11 22:20                       ` Jan Kara
2014-04-05 19:20                     ` Sage Weil

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.