* kernel BUG at fs/btrfs/extent_io.c:3982!
@ 2012-04-10 19:39 Jim Schutt
  2012-04-10 20:24 ` Chris Mason
  2012-04-11 19:09 ` Josef Bacik
  0 siblings, 2 replies; 14+ messages in thread
From: Jim Schutt @ 2012-04-10 19:39 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I hit this BUG today.

I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
i.e. 3.3.1 +
   commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
   commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"

The btrfs filesystem in question is backing a Ceph OSD under
a heavy write load.

Here's the bug:

[510342.517157] ------------[ cut here ]------------
[510342.521855] kernel BUG at fs/btrfs/extent_io.c:3982!
[510342.526894] invalid opcode: 0000 [#1] SMP
[510342.531102] CPU 4
[510342.533028] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun kvm uinput sg sd_mod joydev ata_piix libata button microcode mpt2sas scsi_transport_sas raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_mad ib_core mlx4_en mlx4_core cxgb4 i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ehci_hcd uhci_hcd ioatdma dm_mod i7core_edac edac_core nfs nfs_acl auth_rpcgss fscache lockd sunrpc tg3 bnx2 igb dca e1000 [last unloaded: scsi_wait_scan]
[510342.587836]
[510342.589412] Pid: 16609, comm: kworker/4:2 Not tainted 3.3.1-00162-gd8b2857 #15 Supermicro X8DTH-i/6/iF/6F/X8DTH
[510342.599601] RIP: 0010:[<ffffffffa057924c>]  [<ffffffffa057924c>] btrfs_release_extent_buffer_page.clone.0+0x2c/0x130 [btrfs]
[510342.610893] RSP: 0018:ffff88015fb6ba10  EFLAGS: 00010202
[510342.616277] RAX: 0000000000000004 RBX: ffff880ab81865a0 RCX: ffff880174bc0230
[510342.623476] RDX: ffff8801335bf9b1 RSI: 00000000000d0fb8 RDI: ffff880ab81865a0
[510342.630675] RBP: ffff88015fb6ba40 R08: 0000000000000038 R09: 0000000000000003
[510342.637874] R10: 0000000000000008 R11: ffff8804658c9e40 R12: ffff88015fb6a000
[510342.645069] R13: ffff880ab81865a0 R14: 000000000000000e R15: ffff88015fb6bc10
[510342.652268] FS:  0000000000000000(0000) GS:ffff880627c80000(0000) knlGS:0000000000000000
[510342.660418] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[510342.666234] CR2: ffffffffff600400 CR3: 0000000001a05000 CR4: 00000000000006e0
[510342.673427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[510342.680627] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[510342.687827] Process kworker/4:2 (pid: 16609, threadinfo ffff88015fb6a000, task ffff880102ca4410)
[510342.696669] Stack:
[510342.698769]  ffff880100000000 ffff880ab81865a0 ffff88015fb6a000 ffff8806057d2eb0
[510342.706297]  000000000000000e ffff88015fb6bc10 ffff88015fb6ba70 ffffffffa05793f2
[510342.713825]  ffff88015fb6bb80 ffff880ab81865a0 ffff88015fb6bb50 0000000000000008
[510342.721362] Call Trace:
[510342.723912]  [<ffffffffa05793f2>] release_extent_buffer+0xa2/0xe0 [btrfs]
[510342.730790]  [<ffffffffa05795b4>] free_extent_buffer+0x34/0x80 [btrfs]
[510342.737407]  [<ffffffffa057a126>] btree_write_cache_pages+0x246/0x410 [btrfs]
[510342.744637]  [<ffffffffa054e96a>] btree_writepages+0x3a/0x50 [btrfs]
[510342.751060]  [<ffffffff810fc421>] do_writepages+0x21/0x40
[510342.756537]  [<ffffffff810f0b0b>] __filemap_fdatawrite_range+0x5b/0x60
[510342.763136]  [<ffffffff810f0de3>] filemap_fdatawrite_range+0x13/0x20
[510342.769568]  [<ffffffffa0554ecf>] btrfs_write_marked_extents+0x7f/0xe0 [btrfs]
[510342.776867]  [<ffffffffa0554f5e>] btrfs_write_and_wait_marked_extents+0x2e/0x60 [btrfs]
[510342.784951]  [<ffffffffa0554fbb>] btrfs_write_and_wait_transaction+0x2b/0x50 [btrfs]
[510342.792768]  [<ffffffffa055604c>] btrfs_commit_transaction+0x7ac/0xa10 [btrfs]
[510342.800060]  [<ffffffff81079540>] ? set_next_entity+0x90/0xa0
[510342.805875]  [<ffffffff8105f5d0>] ? wake_up_bit+0x40/0x40
[510342.811365]  [<ffffffffa0556590>] ? btrfs_end_transaction+0x20/0x20 [btrfs]
[510342.818403]  [<ffffffffa05565af>] do_async_commit+0x1f/0x30 [btrfs]
[510342.824748]  [<ffffffffa0556590>] ? btrfs_end_transaction+0x20/0x20 [btrfs]
[510342.831774]  [<ffffffff81058680>] process_one_work+0x140/0x490
[510342.837673]  [<ffffffff8105a417>] worker_thread+0x187/0x3f0
[510342.843319]  [<ffffffff8105a290>] ? manage_workers+0x120/0x120
[510342.849225]  [<ffffffff8105f02e>] kthread+0x9e/0xb0
[510342.854176]  [<ffffffff81486c64>] kernel_thread_helper+0x4/0x10
[510342.860168]  [<ffffffff8147d84a>] ? retint_restore_args+0xe/0xe
[510342.866161]  [<ffffffff8105ef90>] ? kthread_freezable_should_stop+0x80/0x80
[510342.873198]  [<ffffffff81486c60>] ? gs_change+0xb/0xb
[510342.878322] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 66 66 66 66 90 8b 47 38 49 89 fd 85 c0 75 0c 48 8b 47 20 4c 8d 7f 20 84 c0 79 04 <0f> 0b eb fe 48 8b 47 20 a8 04 75 f4 48 8b 07 49 89 c4 4c 03 67
[510342.898331] RIP  [<ffffffffa057924c>] btrfs_release_extent_buffer_page.clone.0+0x2c/0x130 [btrfs]
[510342.907294]  RSP <ffff88015fb6ba10>
[510342.911241] ---[ end trace 62013c6b6e2e5135 ]---


Please let me know if there is anything I can do
to help track this down.

Thanks -- Jim
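
A note on reading the trace above: on x86, BUG() is compiled to the ud2 instruction (bytes 0f 0b), which is why the oops is reported as "invalid opcode", and the byte wrapped in <..> on the "Code:" line is the one at the faulting RIP. The following is only a minimal userspace sketch of that check; the function name code_line_faults_on_ud2 is made up for illustration and is not part of any kernel tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the byte marked <xx> on an oops
 * "Code:" line, together with the byte that follows it, is 0f 0b,
 * the x86 encoding of ud2 that BUG() emits.
 */
static bool code_line_faults_on_ud2(const char *line)
{
	const char *mark;
	unsigned int first, second;

	assert(line != NULL);

	/* The first '<' on a Code: line opens the marked byte. */
	mark = strchr(line, '<');
	if (mark == NULL)
		return false;

	/* Parse "<xx> yy": the marked byte and its successor. */
	if (sscanf(mark, "<%2x> %2x", &first, &second) != 2)
		return false;

	return first == 0x0f && second == 0x0b;
}
```

Feeding it the "Code:" line from the oops above (with or without the timestamp prefix) should return true, since the marked bytes there are <0f> 0b.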



* Re: kernel BUG at fs/btrfs/extent_io.c:3982!
  2012-04-10 19:39 kernel BUG at fs/btrfs/extent_io.c:3982! Jim Schutt
@ 2012-04-10 20:24 ` Chris Mason
  2012-04-10 20:32   ` Jim Schutt
  2012-04-11 19:09 ` Josef Bacik
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Mason @ 2012-04-10 20:24 UTC (permalink / raw)
  To: Jim Schutt; +Cc: linux-btrfs

On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
> Hi,
> 
> I hit this BUG today.
> 
> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
> i.e. 3.3.1 +
>   commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>   commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
> 
> The btrfs filesystem in question is backing a Ceph OSD under
> a heavy write load.
> 
> Here's the bug:
> 
> [510342.517157] ------------[ cut here ]------------
> [510342.521855] kernel BUG at fs/btrfs/extent_io.c:3982!

Could you please confirm that line number is this BUG_ON()?

        BUG_ON(extent_buffer_under_io(eb));

Josef has a theory on this one, but I want to make sure we're chasing
the right thing.

-chris
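
For readers unfamiliar with the assertion being discussed: BUG_ON(extent_buffer_under_io(eb)) fires when an extent buffer is being torn down while it still looks busy. The sketch below is a userspace model only, written under the assumption that the predicate means "pages still in flight, or the buffer is flagged dirty or under writeback". The struct, the function names and the bit numbers are all hypothetical stand-ins; the authoritative definition is whatever is in the fs/btrfs/extent_io.c of the tree being tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed bit numbers, chosen for this model only. */
enum model_eb_flag {
	MODEL_EB_DIRTY     = 2,
	MODEL_EB_WRITEBACK = 7,
};

/* Stand-in for the two extent_buffer fields the check looks at. */
struct model_eb {
	unsigned long bflags;	/* models eb->bflags */
	int io_pages;		/* models the count of pages with I/O pending */
};

static bool model_test_bit(int nr, unsigned long flags)
{
	return ((flags >> nr) & 1UL) != 0;
}

/* True when any "still busy" condition holds for the buffer. */
static bool model_eb_under_io(const struct model_eb *eb)
{
	assert(eb != NULL);
	return eb->io_pages != 0 ||
	       model_test_bit(MODEL_EB_WRITEBACK, eb->bflags) ||
	       model_test_bit(MODEL_EB_DIRTY, eb->bflags);
}
```

In this model a buffer is safe to release only when all three conditions are false, so the BUG above would mean that at least one of them was still true when the last reference was dropped.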


* Re: kernel BUG at fs/btrfs/extent_io.c:3982!
  2012-04-10 20:24 ` Chris Mason
@ 2012-04-10 20:32   ` Jim Schutt
  0 siblings, 0 replies; 14+ messages in thread
From: Jim Schutt @ 2012-04-10 20:32 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs

On 04/10/2012 02:24 PM, Chris Mason wrote:
> On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
>> Hi,
>>
>> I hit this BUG today.
>>
>> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
>> i.e. 3.3.1 +
>>    commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>>    commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
>>
>> The btrfs filesystem in question is backing a Ceph OSD under
>> a heavy write load.
>>
>> Here's the bug:
>>
>> [510342.517157] ------------[ cut here ]------------
>> [510342.521855] kernel BUG at fs/btrfs/extent_io.c:3982!
>
> Could you please confirm that line number is this BUG_ON()?
>
>          BUG_ON(extent_buffer_under_io(eb));

Yep, that's definitely it:

git blame fs/btrfs/extent_io.c | grep -w 3982
0b32f4bb (Josef Bacik        2012-03-13 09:38:00 -0400 3982) 	BUG_ON(extent_buffer_under_io(eb));

>
> Josef has a theory on this one, but I want to make sure we're chasing
> the right thing.

Great, thanks.  I'll be happy to test any patches, if needed.

-- Jim

>
> -chris
>




* Re: kernel BUG at fs/btrfs/extent_io.c:3982!
  2012-04-10 19:39 kernel BUG at fs/btrfs/extent_io.c:3982! Jim Schutt
  2012-04-10 20:24 ` Chris Mason
@ 2012-04-11 19:09 ` Josef Bacik
  2012-04-11 20:24   ` Jim Schutt
  1 sibling, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2012-04-11 19:09 UTC (permalink / raw)
  To: Jim Schutt; +Cc: linux-btrfs

On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
> Hi,
> 
> I hit this BUG today.
> 
> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
> i.e. 3.3.1 +
>   commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>   commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
> 
> The btrfs filesystem in question is backing a Ceph OSD under
> a heavy write load.
> 
> Here's the bug:
> 

Can you give this a whirl and let me know how it goes?  If I'm right you should
see a warning pop up in your messages.  Thanks,

Josef

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 26fbe1c..0d81fd4 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -55,6 +55,7 @@ struct extent_page_data {
 };
 
 static noinline void flush_write_bio(void *data);
+static void check_buffer_tree_ref(struct extent_buffer *eb);
 static inline struct btrfs_fs_info *
 tree_fs_info(struct extent_io_tree *tree)
 {
@@ -3264,6 +3265,12 @@ retry:
 				continue;
 			}
 
+			if (unlikely(!test_bit(EXTENT_BUFFER_TREE_REF,
+					       &eb->bflags))) {
+				WARN_ON(1);
+				check_buffer_tree_ref(eb);
+			}
+
 			prev_eb = eb;
 			ret = lock_extent_buffer_for_io(eb, fs_info, &epd);
 			if (!ret) {
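
What the hunk above does, in outline: before a buffer is locked for I/O, it warns if EXTENT_BUFFER_TREE_REF is not set and then calls check_buffer_tree_ref() to repair that. The body of check_buffer_tree_ref() is not shown in this thread, so the following userspace sketch assumes, from its name and call site, that it sets the flag and takes one extra reference when the flag was missing. All names prefixed ref_model_ and the bit number are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define REF_MODEL_TREE_REF 5	/* assumed bit number, illustration only */

struct ref_model_eb {
	atomic_ulong bflags;	/* models eb->bflags */
	atomic_int refs;	/* models eb->refs */
};

/* Set the bit and report whether it was already set. */
static bool ref_model_test_and_set_bit(int nr, atomic_ulong *flags)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(flags, mask) & mask) != 0;
}

/* Assumed behaviour: the flag marks one reference held on behalf of the tree. */
static void ref_model_check_buffer_tree_ref(struct ref_model_eb *eb)
{
	assert(eb != NULL);
	if (!ref_model_test_and_set_bit(REF_MODEL_TREE_REF, &eb->bflags))
		atomic_fetch_add(&eb->refs, 1);
}

/*
 * Same shape as the debug hunk: returns true when the warning would
 * have fired, after repairing the missing flag and reference.
 */
static bool ref_model_debug_check(struct ref_model_eb *eb)
{
	unsigned long mask = 1UL << REF_MODEL_TREE_REF;

	if ((atomic_load(&eb->bflags) & mask) == 0) {
		ref_model_check_buffer_tree_ref(eb);
		return true;
	}
	return false;
}
```

Under these assumptions the check is idempotent: the first call on a buffer without the flag reports the problem and takes one reference, and later calls do nothing.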


* Re: kernel BUG at fs/btrfs/extent_io.c:3982!
  2012-04-11 19:09 ` Josef Bacik
@ 2012-04-11 20:24   ` Jim Schutt
  2012-04-11 20:28     ` Josef Bacik
  2012-05-01 16:00     ` Josef Bacik
  0 siblings, 2 replies; 14+ messages in thread
From: Jim Schutt @ 2012-04-11 20:24 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

On 04/11/2012 01:09 PM, Josef Bacik wrote:
> On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
>> Hi,
>>
>> I hit this BUG today.
>>
>> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
>> i.e. 3.3.1 +
>>    commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>>    commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
>>
>> The btrfs filesystem in question is backing a Ceph OSD under
>> a heavy write load.
>>
>> Here's the bug:
>>
>
> Can you give this a whirl and let me know how it goes?  If I'm right you should
> see a warning pop up in your messages.  Thanks,

OK, I've got my test running with your patch applied
to my previous kernel.

Do you expect your warning to only fire when my
previous kernel would have BUGged?  I ask because I've
only seen the BUG once, so it may be a low-probability
occurrence.

It seems like I should keep testing until I see either
your new warning or the BUG, right?

Thanks -- Jim

>
> Josef



* Re: kernel BUG at fs/btrfs/extent_io.c:3982!
  2012-04-11 20:24   ` Jim Schutt
@ 2012-04-11 20:28     ` Josef Bacik
  2012-04-11 21:39       ` Jim Schutt
  2012-05-01 16:00     ` Josef Bacik
  1 sibling, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2012-04-11 20:28 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Josef Bacik, linux-btrfs

On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
> On 04/11/2012 01:09 PM, Josef Bacik wrote:
> >On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
> >>Hi,
> >>
> >>I hit this BUG today.
> >>
> >>I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
> >>i.e. 3.3.1 +
> >>   commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
> >>   commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
> >>
> >>The btrfs filesystem in question is backing a Ceph OSD under
> >>a heavy write load.
> >>
> >>Here's the bug:
> >>
> >
> >Can you give this a whirl and let me know how it goes?  If I'm right you should
> >see a warning pop up in your messages.  Thanks,
> 
> OK, I've got my test running with your patch applied
> to my previous kernel.
> 
> Do you expect your warning to only fire when my
> previous kernel would have BUGged?  I ask because I've
> only seen the BUG once, so it may be a low-probability
> occurrence.
> 
> It seems like I should keep testing until I see either
> your new warning or the BUG, right?
> 

So hopefully you will see my WARN with no BUG, but yes keep running until you
see one or the other please ;).  Thanks,

Josef


* Re: kernel BUG at fs/btrfs/extent_io.c:3982!
  2012-04-11 20:28     ` Josef Bacik
@ 2012-04-11 21:39       ` Jim Schutt
  2012-04-12  0:29         ` Chris Mason
  0 siblings, 1 reply; 14+ messages in thread
From: Jim Schutt @ 2012-04-11 21:39 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

On 04/11/2012 02:28 PM, Josef Bacik wrote:
> On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
>> On 04/11/2012 01:09 PM, Josef Bacik wrote:
>>> On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
>>>> Hi,
>>>>
>>>> I hit this BUG today.
>>>>
>>>> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
>>>> i.e. 3.3.1 +
>>>>    commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>>>>    commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
>>>>
>>>> The btrfs filesystem in question is backing a Ceph OSD under
>>>> a heavy write load.
>>>>
>>>> Here's the bug:
>>>>
>>>
>>> Can you give this a whirl and let me know how it goes?  If I'm right you should
>>> see a warning pop up in your messages.  Thanks,
>>
>> OK, I've got my test running with your patch applied
>> to my previous kernel.
>>
>> Do you expect your warning to only fire when my
>> previous kernel would have BUGged?  I ask because I've
>> only seen the BUG once, so it may be a low-probability
>> occurrence.
>>
>> It seems like I should keep testing until I see either
>> your new warning or the BUG, right?
>>
>
> So hopefully you will see my WARN with no BUG, but yes keep running until you
> see one or the other please ;).  Thanks,

Hmmm, the BUG won:

[ 6202.249041] ------------[ cut here ]------------
[ 6202.253654] kernel BUG at fs/btrfs/extent_io.c:3989!
[ 6202.258607] invalid opcode: 0000 [#1] SMP
[ 6202.262737] CPU 5
[ 6202.264578] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun kvm uinput sg joydev sd_mod ata_piix libata microcode button mpt2sas scsi_transport_sas raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_mad ib_core mlx4_en mlx4_core cxgb4 i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ehci_hcd uhci_hcd ioatdma dm_mod i7core_edac edac_core nfs nfs_acl auth_rpcgss fscache lockd sunrpc tg3 bnx2 igb dca e1000 [last unloaded: scsi_wait_scan]
[ 6202.319360]
[ 6202.320862] Pid: 1676, comm: kworker/5:2 Not tainted 3.3.1-00163-gdf6ae83 #17 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 6202.330900] RIP: 0010:[<ffffffffa057724c>]  [<ffffffffa057724c>] btrfs_release_extent_buffer_page.clone.0+0x2c/0x130 [btrfs]
[ 6202.342121] RSP: 0018:ffff88060c74da00  EFLAGS: 00010202
[ 6202.347417] RAX: 0000000000000004 RBX: ffff88049b4d3b20 RCX: ffff8809135bf9a8
[ 6202.354521] RDX: ffff8802df769cd9 RSI: 00000000001409bc RDI: ffff88049b4d3b20
[ 6202.361626] RBP: ffff88060c74da30 R08: 000000000000003c R09: 0000000000000003
[ 6202.368734] R10: 0000000000000008 R11: ffff8802a9aa6a20 R12: ffff88060c74c000
[ 6202.375848] R13: ffff88049b4d3b20 R14: 000000000000000e R15: ffff88060c74dc10
[ 6202.382963] FS:  0000000000000000(0000) GS:ffff880627ca0000(0000) knlGS:0000000000000000
[ 6202.391029] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6202.396758] CR2: ffffffffff600400 CR3: 000000061e956000 CR4: 00000000000006e0
[ 6202.403872] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6202.410986] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 6202.418104] Process kworker/5:2 (pid: 1676, threadinfo ffff88060c74c000, task ffff8806166616b0)
[ 6202.426776] Stack:
[ 6202.428792]  ffff880600000000 ffff88049b4d3b20 ffff88060c74c000 ffff8802fc3c3290
[ 6202.436257]  000000000000000e ffff88060c74dc10 ffff88060c74da60 ffffffffa05773f2
[ 6202.443735]  ffff88060c74db80 ffff88049b4d3b20 ffff88060c74db10 0000000000000000
[ 6202.451211] Call Trace:
[ 6202.453690]  [<ffffffffa05773f2>] release_extent_buffer+0xa2/0xe0 [btrfs]
[ 6202.460505]  [<ffffffffa05775b4>] free_extent_buffer+0x34/0x80 [btrfs]
[ 6202.467051]  [<ffffffffa0578152>] btree_write_cache_pages+0x272/0x480 [btrfs]
[ 6202.474169]  [<ffffffff81077588>] ? update_curr+0x128/0x1f0
[ 6202.479761]  [<ffffffffa054c96a>] btree_writepages+0x3a/0x50 [btrfs]
[ 6202.486110]  [<ffffffff810fc421>] do_writepages+0x21/0x40
[ 6202.491500]  [<ffffffff810f0b0b>] __filemap_fdatawrite_range+0x5b/0x60
[ 6202.498019]  [<ffffffff810f0de3>] filemap_fdatawrite_range+0x13/0x20
[ 6202.504407]  [<ffffffffa0552ecf>] btrfs_write_marked_extents+0x7f/0xe0 [btrfs]
[ 6202.511639]  [<ffffffffa0552f5e>] btrfs_write_and_wait_marked_extents+0x2e/0x60 [btrfs]
[ 6202.519679]  [<ffffffffa0552fbb>] btrfs_write_and_wait_transaction+0x2b/0x50 [btrfs]
[ 6202.527464]  [<ffffffffa055404c>] btrfs_commit_transaction+0x7ac/0xa10 [btrfs]
[ 6202.534675]  [<ffffffff81079540>] ? set_next_entity+0x90/0xa0
[ 6202.540418]  [<ffffffff8105f5d0>] ? wake_up_bit+0x40/0x40
[ 6202.545830]  [<ffffffffa0554590>] ? btrfs_end_transaction+0x20/0x20 [btrfs]
[ 6202.552825]  [<ffffffffa05545af>] do_async_commit+0x1f/0x30 [btrfs]
[ 6202.559111]  [<ffffffffa0554590>] ? btrfs_end_transaction+0x20/0x20 [btrfs]
[ 6202.566062]  [<ffffffff81058680>] process_one_work+0x140/0x490
[ 6202.571886]  [<ffffffff8105a417>] worker_thread+0x187/0x3f0
[ 6202.577453]  [<ffffffff8105a290>] ? manage_workers+0x120/0x120
[ 6202.583281]  [<ffffffff8105f02e>] kthread+0x9e/0xb0
[ 6202.588159]  [<ffffffff81486c64>] kernel_thread_helper+0x4/0x10
[ 6202.594076]  [<ffffffff8147d84a>] ? retint_restore_args+0xe/0xe
[ 6202.599988]  [<ffffffff8105ef90>] ? kthread_freezable_should_stop+0x80/0x80
[ 6202.606936]  [<ffffffff81486c60>] ? gs_change+0xb/0xb
[ 6202.611975] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 66 66 66 66 90 8b 47 38 49 89 fd 85 c0 75 0c 48 8b 47 20 4c 8d 7f 20 84 c0 79 04 <0f> 0b eb fe 48 8b 47 20 a8 04 75 f4 48 8b 07 49 89 c4 4c 03 67
[ 6202.631894] RIP  [<ffffffffa057724c>] btrfs_release_extent_buffer_page.clone.0+0x2c/0x130 [btrfs]
[ 6202.640773]  RSP <ffff88060c74da00>
[ 6202.644691] ---[ end trace de7af0e9a646be3b ]---

git blame fs/btrfs/extent_io.c | grep -w 3989
0b32f4bb (Josef Bacik        2012-03-13 09:38:00 -0400 3989) 	BUG_ON(extent_buffer_under_io(eb));

-- Jim


>
> Josef
>




* Re: kernel BUG at fs/btrfs/extent_io.c:3982!
  2012-04-11 21:39       ` Jim Schutt
@ 2012-04-12  0:29         ` Chris Mason
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Mason @ 2012-04-12  0:29 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Josef Bacik, linux-btrfs

On Wed, Apr 11, 2012 at 03:39:07PM -0600, Jim Schutt wrote:
> On 04/11/2012 02:28 PM, Josef Bacik wrote:
> >On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
> >>On 04/11/2012 01:09 PM, Josef Bacik wrote:
> >>>On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
> >>>>Hi,
> >>>>
> >>>>I hit this BUG today.
> >>>>
> >>>>I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
> >>>>i.e. 3.3.1 +
> >>>>   commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
> >>>>   commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
> >>>>
> >>>>The btrfs filesystem in question is backing a Ceph OSD under
> >>>>a heavy write load.
> >>>>
> >>>>Here's the bug:
> >>>>
> >>>
> >>>Can you give this a whirl and let me know how it goes?  If I'm right you should
> >>>see a warning pop up in your messages.  Thanks,
> >>
> >>OK, I've got my test running with your patch applied
> >>to my previous kernel.
> >>
> >>Do you expect your warning to only fire when my
> >>previous kernel would have BUGged?  I ask because I've
> >>only seen the BUG once, so it may be a low-probability
> >>occurrence.
> >>
> >>It seems like I should keep testing until I see either
> >>your new warning or the BUG, right?
> >>
> >
> >So hopefully you will see my WARN with no BUG, but yes keep running until you
> >see one or the other please ;).  Thanks,
> 
> Hmmm, the BUG won:
> 
> [ 6202.249041] ------------[ cut here ]------------
> [ 6202.253654] kernel BUG at fs/btrfs/extent_io.c:3989!

Since this is exactly the same call trace, we can assume ref count on
the buffer is correct.  I think it means we're racing on removing the
buffer from the radix tree.  I'm adding some diagnostics here to try and
grow the window a bit.

-chris


* Re: kernel BUG at fs/btrfs/extent_io.c:3982!
  2012-04-11 20:24   ` Jim Schutt
  2012-04-11 20:28     ` Josef Bacik
@ 2012-05-01 16:00     ` Josef Bacik
  2012-05-01 16:41       ` Jim Schutt
  1 sibling, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2012-05-01 16:00 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Josef Bacik, linux-btrfs

On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
> On 04/11/2012 01:09 PM, Josef Bacik wrote:
> >On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
> >>Hi,
> >>
> >>I hit this BUG today.
> >>
> >>I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
> >>i.e. 3.3.1 +
> >>   commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
> >>   commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
> >>
> >>The btrfs filesystem in question is backing a Ceph OSD under
> >>a heavy write load.
> >>
> >>Here's the bug:
> >>
> >
> >Can you give this a whirl and let me know how it goes?  If I'm right you should
> >see a warning pop up in your messages.  Thanks,
> 
> OK, I've got my test running with your patch applied
> to my previous kernel.
> 
> Do you expect your warning to only fire when my
> previous kernel would have BUGged?  I ask because I've
> only seen the BUG once, so it may be a low-probability
> occurrence.
> 
> It seems like I should keep testing until I see either
> your new warning or the BUG, right?

Hey Jim,

I just sent a patch to the list

[PATCH] Btrfs: fix page leak when allocing extent buffers 

Could you try that and see if you can reproduce your problem?  Thanks,

Josef


* Re: kernel BUG at fs/btrfs/extent_io.c:3982!
  2012-05-01 16:00     ` Josef Bacik
@ 2012-05-01 16:41       ` Jim Schutt
  2012-05-03 14:43         ` Jim Schutt
  0 siblings, 1 reply; 14+ messages in thread
From: Jim Schutt @ 2012-05-01 16:41 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

On 05/01/2012 10:00 AM, Josef Bacik wrote:
> On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
>> On 04/11/2012 01:09 PM, Josef Bacik wrote:
>>> On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
>>>> Hi,
>>>>
>>>> I hit this BUG today.
>>>>
>>>> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
>>>> i.e. 3.3.1 +
>>>>    commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>>>>    commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
>>>>
>>>> The btrfs filesystem in question is backing a Ceph OSD under
>>>> a heavy write load.
>>>>
>>>> Here's the bug:
>>>>
>>>
>>> Can you give this a whirl and let me know how it goes?  If I'm right you should
>>> see a warning pop up in your messages.  Thanks,
>>
>> OK, I've got my test running with your patch applied
>> to my previous kernel.
>>
>> Do you expect your warning to only fire when my
>> previous kernel would have BUGged?  I ask because I've
>> only seen the BUG once, so it may be a low-probability
>> occurrence.
>>
>> It seems like I should keep testing until I see either
>> your new warning or the BUG, right?
>
> Hey Jim,
>
> I just sent a patch to the list
>
> [PATCH] Btrfs: fix page leak when allocing extent buffers
>
> Could you try that and see if you can reproduce your problem?

Taking it for a spin now...

Thanks -- Jim

> Thanks,
>
> Josef
>




* Re: kernel BUG at fs/btrfs/extent_io.c:3982!
  2012-05-01 16:41       ` Jim Schutt
@ 2012-05-03 14:43         ` Jim Schutt
  2012-05-03 14:53           ` Josef Bacik
  0 siblings, 1 reply; 14+ messages in thread
From: Jim Schutt @ 2012-05-03 14:43 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

On 05/01/2012 10:41 AM, Jim Schutt wrote:
> On 05/01/2012 10:00 AM, Josef Bacik wrote:
>> On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
>>> On 04/11/2012 01:09 PM, Josef Bacik wrote:
>>>> On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
>>>>> Hi,
>>>>>
>>>>> I hit this BUG today.
>>>>>
>>>>> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
>>>>> i.e. 3.3.1 +
>>>>> commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>>>>> commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
>>>>>
>>>>> The btrfs filesystem in question is backing a Ceph OSD under
>>>>> a heavy write load.
>>>>>
>>>>> Here's the bug:
>>>>>
>>>>
>>>> Can you give this a whirl and let me know how it goes? If I'm right you should
>>>> see a warning pop up in your messages. Thanks,
>>>
>>> OK, I've got my test running with your patch applied
>>> to my previous kernel.
>>>
>>> Do you expect your warning to only fire when my
>>> previous kernel would have BUGged? I ask because I've
>>> only seen the BUG once, so it may be a low-probability
>>> occurrence.
>>>
>>> It seems like I should keep testing until I see either
>>> your new warning or the BUG, right?
>>
>> Hey Jim,
>>
>> I just sent a patch to the list
>>
>> [PATCH] Btrfs: fix page leak when allocing extent buffers
>>
>> Could you try that and see if you can reproduce your problem?
>
> Taking it for a spin now...
>

Hit it again:

[ 4638.295231] ------------[ cut here ]------------
[ 4638.299840] kernel BUG at fs/btrfs/extent_io.c:3993!
[ 4638.304792] invalid opcode: 0000 [#1] SMP
[ 4638.308912] CPU 3
[ 4638.310745] Modules linked in: btrfs zlib_deflate dm_round_robin ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa iw_cxgb4 sg sd_mod dm_mirror dm_region_hash dm_log dm_multipath scsi_dh vhost_net macvtap macvlan tun kvm uinput joydev button ata_piix libata mpt2sas scsi_transport_sas raid_class scsi_mod microcode serio_raw pcspkr mlx4_ib ib_mad ib_core mlx4_en mlx4_core cxgb4 i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ehci_hcd uhci_hcd ioatdma i7core_edac edac_core dm_mod nfs nfs_acl auth_rpcgss fscache lockd sunrpc broadcom tg3 bnx2 igb dca e1000 [last unloaded: scsi_wait_scan]
[ 4638.366288]
[ 4638.367786] Pid: 32179, comm: kworker/3:5 Not tainted 3.3.4-00186-g56a0ae2 #65 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 4638.377898] RIP: 0010:[<ffffffffa057717c>]  [<ffffffffa057717c>] btrfs_release_extent_buffer_page.clone.0+0x2c/0x130 [btrfs]
[ 4638.389112] RSP: 0018:ffff8805ff6cba00  EFLAGS: 00010202
[ 4638.394408] RAX: 0000000000000004 RBX: ffff880152ba8c18 RCX: ffff8800be9e4468
[ 4638.401529] RDX: ffff8802f7d64b19 RSI: 00000000000858ec RDI: ffff880152ba8c18
[ 4638.408644] RBP: ffff8805ff6cba30 R08: 000000000000002c R09: 0000000000000003
[ 4638.415759] R10: 0000000000000008 R11: ffff880618cee0c0 R12: ffff8805ff6ca000
[ 4638.422874] R13: ffff880152ba8c18 R14: 000000000000000e R15: ffff8805ff6cbc10
[ 4638.429987] FS:  0000000000000000(0000) GS:ffff880627c60000(0000) knlGS:0000000000000000
[ 4638.438052] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 4638.443781] CR2: ffffffffff600400 CR3: 0000000a06461000 CR4: 00000000000006e0
[ 4638.450900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4638.458018] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 4638.465130] Process kworker/3:5 (pid: 32179, threadinfo ffff8805ff6ca000, task ffff88060bf14500)
[ 4638.473886] Stack:
[ 4638.475899]  ffff880500000000 ffff880152ba8c18 ffff8805ff6ca000 ffff880a0cf1aeb0
[ 4638.483350]  000000000000000e ffff8805ff6cbc10 ffff8805ff6cba60 ffffffffa0577322
[ 4638.490782]  ffff8805ff6cbb80 ffff880152ba8c18 ffff8805ff6cbb50 0000000000000008
[ 4638.498234] Call Trace:
[ 4638.500705]  [<ffffffffa0577322>] release_extent_buffer+0xa2/0xe0 [btrfs]
[ 4638.507492]  [<ffffffffa05774e4>] free_extent_buffer+0x34/0x80 [btrfs]
[ 4638.514036]  [<ffffffffa05780a2>] btree_write_cache_pages+0x272/0x480 [btrfs]
[ 4638.521155]  [<ffffffff81075b18>] ? enqueue_sleeper+0x248/0x2c0
[ 4638.527072]  [<ffffffffa054c92a>] btree_writepages+0x3a/0x50 [btrfs]
[ 4638.533411]  [<ffffffff810fc9f1>] do_writepages+0x21/0x40
[ 4638.538794]  [<ffffffff810f10db>] __filemap_fdatawrite_range+0x5b/0x60
[ 4638.545300]  [<ffffffff810f13b3>] filemap_fdatawrite_range+0x13/0x20
[ 4638.551654]  [<ffffffffa0552e8f>] btrfs_write_marked_extents+0x7f/0xe0 [btrfs]
[ 4638.558862]  [<ffffffffa0552f1e>] btrfs_write_and_wait_marked_extents+0x2e/0x60 [btrfs]
[ 4638.566867]  [<ffffffffa0552f7b>] btrfs_write_and_wait_transaction+0x2b/0x50 [btrfs]
[ 4638.574615]  [<ffffffffa0554058>] btrfs_commit_transaction+0x7d8/0x9f0 [btrfs]
[ 4638.581818]  [<ffffffff81079910>] ? set_next_entity+0x90/0xa0
[ 4638.587556]  [<ffffffff8105f970>] ? wake_up_bit+0x40/0x40
[ 4638.592957]  [<ffffffffa0554570>] ? btrfs_end_transaction+0x20/0x20 [btrfs]
[ 4638.599902]  [<ffffffffa055458f>] do_async_commit+0x1f/0x30 [btrfs]
[ 4638.606161]  [<ffffffffa0554570>] ? btrfs_end_transaction+0x20/0x20 [btrfs]
[ 4638.613106]  [<ffffffff81058a20>] process_one_work+0x140/0x490
[ 4638.618926]  [<ffffffff8105a7b7>] worker_thread+0x187/0x3f0
[ 4638.624484]  [<ffffffff8105a630>] ? manage_workers+0x120/0x120
[ 4638.630303]  [<ffffffff8105f3ce>] kthread+0x9e/0xb0
[ 4638.635171]  [<ffffffff81487a24>] kernel_thread_helper+0x4/0x10
[ 4638.641072]  [<ffffffff8147e60a>] ? retint_restore_args+0xe/0xe
[ 4638.646974]  [<ffffffff8105f330>] ? kthread_freezable_should_stop+0x80/0x80
[ 4638.653915]  [<ffffffff81487a20>] ? gs_change+0xb/0xb
[ 4638.658952] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 66 66 66 66 90 8b 47 38 49 89 fd 85 c0 75 0c 48 8b 47 20 4c 8d 7f 20 84 c0 79 04 <0f> 0b eb fe 48 8b 47 20 a8 04 75 f4 48 8b 07 49 89 c4 4c 03 67
[ 4638.678851] RIP  [<ffffffffa057717c>] btrfs_release_extent_buffer_page.clone.0+0x2c/0x130 [btrfs]
[ 4638.687729]  RSP <ffff8805ff6cba00>
[ 4638.691654] ---[ end trace 51121d321f4755d6 ]---


Kernel is 3.3.4 + for-linus branch (commit c666601a93) of
     git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
+ for-linus branch (commit dc7fdde39e) of
     git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
+ your debug patch for me: "kernel BUG at fs/btrfs/extent_io.c:3982!"
+ your patch "Btrfs: fix page leak when allocing extent buffers"


git blame fs/btrfs/extent_io.c | grep -w -C 10 3993
0b32f4bb (Josef Bacik        2012-03-13 09:38:00 -0400 3983)
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3984) /*
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3985)  * Helper for releasing extent buffer page.
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3986)  */
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3987) static void btrfs_release_extent_buffer_page(struct extent_buffer *eb,
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3988) 						unsigned long start_idx)
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3989) {
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3990) 	unsigned long index;
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3991) 	struct page *page;
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3992)
0b32f4bb (Josef Bacik        2012-03-13 09:38:00 -0400 3993) 	BUG_ON(extent_buffer_under_io(eb));
0b32f4bb (Josef Bacik        2012-03-13 09:38:00 -0400 3994)
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3995) 	index = num_extent_pages(eb->start, eb->len);
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3996) 	if (start_idx >= index)
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3997) 		return;
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3998)
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 3999) 	do {
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 4000) 		index--;
897ca6e9 (Miao Xie           2010-10-26 20:57:29 -0400 4001) 		page = extent_buffer_page(eb, index);
4f2de97a (Josef Bacik        2012-03-07 16:20:05 -0500 4002) 		if (page) {
4f2de97a (Josef Bacik        2012-03-07 16:20:05 -0500 4003) 			spin_lock(&page->map

FWIW, it takes a while to hit this - load was 128 Ceph clients writing
to a Ceph filesystem with 288 OSDs - the above bug hit several tens of
TB into a 65 TB aggregate write test.

-- Jim

> Thanks -- Jim
>
>> Thanks,
>>
>> Josef
>>
>




* Re: kernel BUG at fs/btrfs/extent_io.c:3982!
  2012-05-03 14:43         ` Jim Schutt
@ 2012-05-03 14:53           ` Josef Bacik
  2012-05-03 15:46             ` [EXTERNAL] " Jim Schutt
  0 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2012-05-03 14:53 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Josef Bacik, linux-btrfs

On Thu, May 03, 2012 at 08:43:32AM -0600, Jim Schutt wrote:
> On 05/01/2012 10:41 AM, Jim Schutt wrote:
> >On 05/01/2012 10:00 AM, Josef Bacik wrote:
> >>On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
> >>>On 04/11/2012 01:09 PM, Josef Bacik wrote:
> >>>>On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
> >>>>>Hi,
> >>>>>
> >>>>>I hit this BUG today.
> >>>>>
> >>>>>I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
> >>>>>i.e. 3.3.1 +
> >>>>>commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
> >>>>>commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
> >>>>>
> >>>>>The btrfs filesystem in question is backing a Ceph OSD under
> >>>>>a heavy write load.
> >>>>>
> >>>>>Here's the bug:
> >>>>>
> >>>>
> >>>>Can you give this a whirl and let me know how it goes? If I'm right you should
> >>>>see a warning pop up in your messages. Thanks,
> >>>
> >>>OK, I've got my test running with your patch applied
> >>>to my previous kernel.
> >>>
> >>>Do you expect your warning to only fire when my
> >>>previous kernel would have BUGged? I ask because I've
> >>>only seen the BUG once, so it may be a low-probability
> >>>occurrence.
> >>>
> >>>It seems like I should keep testing until I see either
> >>>your new warning or the BUG, right?
> >>
> >>Hey Jim,
> >>
> >>I just sent a patch to the list
> >>
> >>[PATCH] Btrfs: fix page leak when allocing extent buffers
> >>
> >>Could you try that and see if you can reproduce your problem?
> >
> >Taking it for a spin now...
> >
> 
> Hit it again:
> 

Argh ok it's time to stop hopping around the problem and see what exactly the
state is when this happens so I know where to look.  Can you run with this patch
and give me the dmesg?  The important information will be above the --- cut here
 --- line so make sure to grab that part.  Thanks,

Josef


diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7af9343..72249e3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3972,7 +3972,13 @@ static void btrfs_release_extent_buffer_page(struct extent_buffer *eb,
 	unsigned long num_pages;
 	struct page *page;
 
-	BUG_ON(extent_buffer_under_io(eb));
+	if (extent_buffer_under_io(eb)) {
+		printk(KERN_ERR "io_pages=%d, writeback=%d, dirty=%d, stale=%d, tree_ref=%d\n",
+		       atomic_read(&eb->io_pages), test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags),
+		       test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags), test_bit(EXTENT_BUFFER_STALE, &eb->bflags),
+		       test_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags));
+		BUG();
+	}
 
 	num_pages = num_extent_pages(eb->start, eb->len);
 	index = start_idx + num_pages;


* Re: [EXTERNAL] Re: kernel BUG at fs/btrfs/extent_io.c:3982!
  2012-05-03 14:53           ` Josef Bacik
@ 2012-05-03 15:46             ` Jim Schutt
  2012-05-03 15:53               ` Josef Bacik
  0 siblings, 1 reply; 14+ messages in thread
From: Jim Schutt @ 2012-05-03 15:46 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

On 05/03/2012 08:53 AM, Josef Bacik wrote:
> On Thu, May 03, 2012 at 08:43:32AM -0600, Jim Schutt wrote:
>> On 05/01/2012 10:41 AM, Jim Schutt wrote:
>>> On 05/01/2012 10:00 AM, Josef Bacik wrote:
>>>> On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
>>>>> On 04/11/2012 01:09 PM, Josef Bacik wrote:
>>>>>> On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I hit this BUG today.
>>>>>>>
>>>>>>> I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
>>>>>>> i.e. 3.3.1 +
>>>>>>> commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
>>>>>>> commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
>>>>>>>
>>>>>>> The btrfs filesystem in question is backing a Ceph OSD under
>>>>>>> a heavy write load.
>>>>>>>
>>>>>>> Here's the bug:
>>>>>>>
>>>>>>
>>>>>> Can you give this a whirl and let me know how it goes? If I'm right you should
>>>>>> see a warning pop up in your messages. Thanks,
>>>>>
>>>>> OK, I've got my test running with your patch applied
>>>>> to my previous kernel.
>>>>>
>>>>> Do you expect your warning to only fire when my
>>>>> previous kernel would have BUGged? I ask because I've
>>>>> only seen the BUG once, so it may be a low-probability
>>>>> occurrence.
>>>>>
>>>>> It seems like I should keep testing until I see either
>>>>> your new warning or the BUG, right?
>>>>
>>>> Hey Jim,
>>>>
>>>> I just sent a patch to the list
>>>>
>>>> [PATCH] Btrfs: fix page leak when allocing extent buffers
>>>>
>>>> Could you try that and see if you can reproduce your problem?
>>>
>>> Taking it for a spin now...
>>>
>>
>> Hit it again:
>>
>
> Argh ok it's time to stop hopping around the problem and see what exactly the
> state is when this happens so I know where to look.  Can you run with this patch
> and give me the dmesg?  The important information will be above the --- cut here
>   --- line so make sure to grab that part.  Thanks,

Working on it...

BTW, when I recompiled, I noticed this warning:

   CC [M]  fs/btrfs/extent_io.o
fs/btrfs/extent_io.c: In function ‘write_one_eb’:
fs/btrfs/extent_io.c:3195: warning: ‘ret’ may be used uninitialized in this function

Is there ever any chance at all that write_one_eb() can be
called by mistake for an eb with zero pages?  If so, could
that be part of the problem?

-- Jim

>
> Josef
>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [EXTERNAL] Re: kernel BUG at fs/btrfs/extent_io.c:3982!
  2012-05-03 15:46             ` [EXTERNAL] " Jim Schutt
@ 2012-05-03 15:53               ` Josef Bacik
  0 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2012-05-03 15:53 UTC (permalink / raw)
  To: Jim Schutt; +Cc: Josef Bacik, linux-btrfs

On Thu, May 03, 2012 at 09:46:15AM -0600, Jim Schutt wrote:
> On 05/03/2012 08:53 AM, Josef Bacik wrote:
> >On Thu, May 03, 2012 at 08:43:32AM -0600, Jim Schutt wrote:
> >>On 05/01/2012 10:41 AM, Jim Schutt wrote:
> >>>On 05/01/2012 10:00 AM, Josef Bacik wrote:
> >>>>On Wed, Apr 11, 2012 at 02:24:30PM -0600, Jim Schutt wrote:
> >>>>>On 04/11/2012 01:09 PM, Josef Bacik wrote:
> >>>>>>On Tue, Apr 10, 2012 at 01:39:14PM -0600, Jim Schutt wrote:
> >>>>>>>Hi,
> >>>>>>>
> >>>>>>>I hit this BUG today.
> >>>>>>>
> >>>>>>>I'm running 3.3.1 merged with the ceph and btrfs bits for 3.4,
> >>>>>>>i.e. 3.3.1 +
> >>>>>>>commit bc3f116fec194 "Btrfs: update the checks for mixed block groups with big metadata blocks"
> >>>>>>>commit c666601a935b9 "rbd: move snap_rwsem to the device, rename to header_rwsem"
> >>>>>>>
> >>>>>>>The btrfs filesystem in question is backing a Ceph OSD under
> >>>>>>>a heavy write load.
> >>>>>>>
> >>>>>>>Here's the bug:
> >>>>>>>
> >>>>>>
> >>>>>>Can you give this a whirl and let me know how it goes? If I'm right you should
> >>>>>>see a warning pop up in your messages. Thanks,
> >>>>>
> >>>>>OK, I've got my test running with your patch applied
> >>>>>to my previous kernel.
> >>>>>
> >>>>>Do you expect your warning to only fire when my
> >>>>>previous kernel would have BUGged? I ask because I've
> >>>>>only seen the BUG once, so it may be a low-probability
> >>>>>occurrence.
> >>>>>
> >>>>>It seems like I should keep testing until I see either
> >>>>>your new warning or the BUG, right?
> >>>>
> >>>>Hey Jim,
> >>>>
> >>>>I just sent a patch to the list
> >>>>
> >>>>[PATCH] Btrfs: fix page leak when allocing extent buffers
> >>>>
> >>>>Could you try that and see if you can reproduce your problem?
> >>>
> >>>Taking it for a spin now...
> >>>
> >>
> >>Hit it again:
> >>
> >
> >Argh ok it's time to stop hopping around the problem and see what exactly the
> >state is when this happens so I know where to look.  Can you run with this patch
> >and give me the dmesg?  The important information will be above the --- cut here
> >  --- line so make sure to grab that part.  Thanks,
>
> Working on it...
>
> BTW, when I recompiled, I noticed this warning:
>
>   CC [M]  fs/btrfs/extent_io.o
> fs/btrfs/extent_io.c: In function ‘write_one_eb’:
> fs/btrfs/extent_io.c:3195: warning: ‘ret’ may be used uninitialized in this function
>
> Is there ever any chance at all that write_one_eb() can be
> called by mistake for an eb with zero pages?  If so, could
> that be part of the problem?
>

It shouldn't happen but really neither should this bug sooooo go ahead and set
ret = 0 and put a BUG_ON(!num_pages); in write_one_eb after the

        num_pages = num_extent_pages(eb->start, eb->len);

and let it ride.  Thanks,

Josef


Thread overview: 14+ messages
2012-04-10 19:39 kernel BUG at fs/btrfs/extent_io.c:3982! Jim Schutt
2012-04-10 20:24 ` Chris Mason
2012-04-10 20:32   ` Jim Schutt
2012-04-11 19:09 ` Josef Bacik
2012-04-11 20:24   ` Jim Schutt
2012-04-11 20:28     ` Josef Bacik
2012-04-11 21:39       ` Jim Schutt
2012-04-12  0:29         ` Chris Mason
2012-05-01 16:00     ` Josef Bacik
2012-05-01 16:41       ` Jim Schutt
2012-05-03 14:43         ` Jim Schutt
2012-05-03 14:53           ` Josef Bacik
2012-05-03 15:46             ` [EXTERNAL] " Jim Schutt
2012-05-03 15:53               ` Josef Bacik
