* Re: btrfs btree_ctree_super fault
@ 2017-02-13 3:38 Sam McLeod
0 siblings, 0 replies; 8+ messages in thread
From: Sam McLeod @ 2017-02-13 3:38 UTC (permalink / raw)
To: linux-btrfs
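> On 11/17/2016 12:39 AM, Chris Cui wrote:
>> We have just encountered the same bug on 4.9.0-rc2. Any solution now?
>>
>>> kernel BUG at fs/btrfs/ctree.c:3172!
>>> invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>>> CPU: 0 PID: 22702 Comm: trinity-c40 Not tainted 4.9.0-rc4-think+ #1
>>> task: ffff8804ffde37c0 task.stack: ffffc90002188000
>>> RIP: 0010:[<ffffffffa00576b9>]
>>> [<ffffffffa00576b9>] btrfs_set_item_key_safe+0x179/0x190 [btrfs]
>>> RSP: 0000:ffffc9000218b8a8 EFLAGS: 00010246
>>> RAX: 0000000000000000 RBX: ffff8804fddcf348 RCX: 0000000000001000
>>> RDX: 0000000000000000 RSI: ffffc9000218b9ce RDI: ffffc9000218b8c7
>>> RBP: ffffc9000218b908 R08: 0000000000004000 R09: ffffc9000218b8c8
>>> R10: 0000000000000000 R11: 0000000000000001 R12: ffffc9000218b8b6
>>> R13: ffffc9000218b9ce R14: 0000000000000001 R15: ffff880480684a88
>>> FS: 00007f7c7f998b40(0000) GS:ffff880507800000(0000) knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 0000000000000000 CR3: 000000044f15f000 CR4: 00000000001406f0
>>> DR0: 00007f4ce439d000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>>> Stack:
>>> ffff880501430000 d305ffffa00a2245 006c000000000002 0500000000000010
>>> 6c000000000002d3 0000000000001000 000000006427eebb ffff880480684a88
>>> 0000000000000000 ffff8804fddcf348 0000000000002000 0000000000000000
>>> Call Trace:
>>> [<ffffffffa009cff0>] __btrfs_drop_extents+0xb00/0xe30 [btrfs]
> We're going to bash on Josef's patch and probably send it with the next
> merge window (queued for stable as well).
> https://patchwork.kernel.org/patch/9431679/
> -chris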
Hello,
We are seeing this issue regularly across many of the CentOS 7 servers we use for automated software builds.
We've hit what seems to be this bug from kernel 3.10 through to 4.9.5-1 on physical hardware (HP BL460C G7 Blades, P410i RAID controller in RAID1) for several years now.
I'm finding it a little hard to navigate the plethora of mailing list archives and changelogs I've found thus far, and from the patch Chris provided above I couldn't find a way to see whether it had been merged into the kernel, so I'm wondering:
1) Did it make it in?
2) If so, in what kernel version? (And, if possible, how can one correlate this information to a release in the future?)
3) And finally, if so, do people generally agree that it's resolved the issue?
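On the second question, one common way to map a commit to a release is `git describe --contains`, run inside a clone of the mainline kernel tree. A minimal Python sketch follows. The repository path and commit ID are placeholders to be supplied by the caller (this thread does not give the commit ID of Josef's patch), and the sample string is made up purely to show the output format.

```python
import re
import subprocess


def describe_containing(repo_path: str, commit: str) -> str:
    """Run `git describe --contains` in repo_path and return its raw output.

    repo_path and commit are placeholders supplied by the caller.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "describe", "--contains", commit],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def tag_from_describe(described: str) -> str:
    """Strip the ~N / ^N ancestry suffix, leaving only the tag name."""
    return re.split(r"[~^]", described, maxsplit=1)[0]


# Illustrative string only, not real output for any commit in this thread.
sample = "v4.99-rc1~12^2~3"
print(tag_from_describe(sample))
```

Fixes backported to a stable series are cherry-picked and get new commit IDs, so for a stable kernel it is usually easier to search that branch's log for the patch subject instead.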
Below is a crash (resulting in a reboot) we experienced this morning on one of the hosts.
(Note that since rebooting, this host has booted into a newer 4.9.9 kernel).
Kernel at time of crash: 4.9.5-1.el7.elrepo.x86_64
root@s1-b12:~ # btrfs --version
btrfs-progs v4.4.1
root@s1-b12:~ # btrfs fi show
Label: none uuid: 87f6d740-0675-41d7-896d-b04d252c7783
Total devices 1 FS bytes used 1.08GiB
devid 1 size 426.61GiB used 4.02GiB path /dev/sda3
root@s1-b12:~ # btrfs fi df /var/lib/docker
Data, single: total=2.01GiB, used=1.00GiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=76.88MiB
GlobalReserve, single: total=16.00MiB, used=0.00B
[1712950.168671] ------------[ cut here ]------------
[1712950.169806] kernel BUG at fs/btrfs/ctree.c:3172!
[1712950.170925] invalid opcode: 0000 [#1] SMP
[1712950.172034] Modules linked in: fuse ufs hfsplus hfs vfat msdos fat veth binfmt_misc mptctl mptbase ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack bonding xfs libcrc32c intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd intel_cstate btrfs xor ipmi_devintf raid6_pq iTCO_wdt gpio_ich iTCO_vendor_support pcspkr sg lpc_ich mfd_core hpwdt hpilo ipmi_si ipmi_msghandler be2iscsi iscsi_boot_sysfs libiscsi i7core_edac acpi_power_meter scsi_transport_iscsi edac_core shpchp pcc_cpufreq acpi_cpufreq ip_tables ext4 jbd2 mbcache sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
[1712950.179662] crc32c_intel fb_sys_fops serio_raw ttm hpsa drm scsi_transport_sas be2net fjes dm_mirror dm_region_hash dm_log dm_mod
[1712950.182391] CPU: 7 PID: 18324 Comm: apt-get Tainted: G I 4.9.5-1.el7.elrepo.x86_64 #1
[1712950.183805] Hardware name: HP ProLiant BL460c G7, BIOS I27 08/16/2015
[1712950.185223] task: ffff880549f48000 task.stack: ffffc9000d640000
[1712950.186655] RIP: 0010:[<ffffffffa04042b2>] [<ffffffffa04042b2>] btrfs_set_item_key_safe+0x172/0x180 [btrfs]
[1712950.188180] RSP: 0018:ffffc9000d643920 EFLAGS: 00010246
[1712950.189664] RAX: 0000000000000000 RBX: 0000000000000031 RCX: 00000000000a0000
[1712950.191155] RDX: 0000000000000000 RSI: ffffc9000d643a3e RDI: ffffc9000d64393f
[1712950.192639] RBP: ffffc9000d643980 R08: 0000000000004000 R09: ffffc9000d643940
[1712950.194111] R10: 0000000000000000 R11: 0000000000000003 R12: ffffc9000d64392e
[1712950.195569] R13: ffff8808efb15d90 R14: ffffc9000d643a3e R15: ffff8807ef220d20
[1712950.197044] FS: 00007ff1686d56e0(0000) GS:ffff880bdb8c0000(0000) knlGS:0000000000000000
[1712950.198529] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1712950.200008] CR2: 00007ff1672adb8c CR3: 0000000ac136f000 CR4: 00000000000006e0
[1712950.201524] Stack:
[1712950.203021] ffff8812b46f0000 438effffa044a2c8 006c000000000000 8e00000000000a00
[1712950.204562] 6c00000000000043 00000000000a0000 0000000050c4c66c ffff8808efb15d90
[1712950.206119] 0000000000003540 0000000000000000 0000000000c00000 ffff8807ef220d20
[1712950.207684] Call Trace:
[1712950.209248] [<ffffffffa0444dc6>] __btrfs_drop_extents+0x536/0xd90 [btrfs]
[1712950.210889] [<ffffffffa04709c6>] btrfs_log_changed_extents+0x356/0x650 [btrfs]
[1712950.212480] [<ffffffffa0470161>] ? fill_inode_item.isra.17+0x231/0x290 [btrfs]
[1712950.214058] [<ffffffffa04765e6>] btrfs_log_inode+0xa56/0xc20 [btrfs]
[1712950.215626] [<ffffffffa046f99c>] ? check_parent_dirs_for_sync+0xec/0x120 [btrfs]
[1712950.217216] [<ffffffffa0476abb>] btrfs_log_inode_parent+0x27b/0x970 [btrfs]
[1712950.218814] [<ffffffffa042cea1>] ? wait_current_trans.isra.23+0x31/0x110 [btrfs]
[1712950.220394] [<ffffffff81202847>] ? kmem_cache_alloc+0xd7/0x1a0
[1712950.221985] [<ffffffffa042f5fc>] ? start_transaction+0x11c/0x4b0 [btrfs]
[1712950.223583] [<ffffffffa0478112>] btrfs_log_dentry_safe+0x62/0x80 [btrfs]
[1712950.225173] [<ffffffffa0447702>] btrfs_sync_file+0x2a2/0x3f0 [btrfs]
[1712950.226748] [<ffffffff8125e8bd>] vfs_fsync_range+0x3d/0xb0
[1712950.228331] [<ffffffff811de57e>] SyS_msync+0x16e/0x1f0
[1712950.229903] [<ffffffff81003a47>] do_syscall_64+0x67/0x180
[1712950.231466] [<ffffffff8175692b>] entry_SYSCALL64_slow_path+0x25/0x25
[1712950.233023] Code: 48 8b 45 b7 48 8d 7d bf 4c 89 f6 48 89 45 c8 0f b6 45 b6 88 45 c7 48 8b 45 ae 48 89 45 bf e8 c6 f2 ff ff 85 c0 0f 8f 46 ff ff ff <0f> 0b e8 e7 dd c7 e0 0f 0b 0f 1f 44 00 00 66 66 66 66 90 55 48
[1712950.236350] RIP [<ffffffffa04042b2>] btrfs_set_item_key_safe+0x172/0x180 [btrfs]
[1712950.238008] RSP <ffffc9000d643920>
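When the same crash turns up on many hosts, it can help to reduce each oops to a short signature (BUG location plus faulting function) so that reports can be grouped. A rough Python sketch, assuming the log format shown above; the name `crash_signature` and both regular expressions are illustrative and not part of any kernel tooling.

```python
import re

# "kernel BUG at fs/btrfs/ctree.c:3172!"
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# "RIP [<addr>] func+0xoff/0xsize [module]"
RIP_RE = re.compile(
    r"RIP\b.*?\]\s+(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def crash_signature(log_text: str):
    """Return (source file, line number, faulting function) or Nones."""
    bug = BUG_RE.search(log_text)
    rip = RIP_RE.search(log_text)
    return (
        bug.group("file") if bug else None,
        int(bug.group("line")) if bug else None,
        rip.group("func") if rip else None,
    )


# Two lines copied from the trace above.
sample = (
    "[1712950.169806] kernel BUG at fs/btrfs/ctree.c:3172!\n"
    "[1712950.236350] RIP [<ffffffffa04042b2>] "
    "btrfs_set_item_key_safe+0x172/0x180 [btrfs]\n"
)
print(crash_signature(sample))
```

The offsets after the function name are deliberately left out of the signature, since they differ between kernel builds (0x172/0x180 here versus 0x179/0x190 in the 4.9.0-rc4 report).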
I apologise in advance if this is incorrectly posted in any way, I don't often post to mailing lists.
--
Sam McLeod
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: btrfs btree_ctree_super fault
2016-11-17 5:39 Chris Cui
@ 2016-11-17 15:34 ` Chris Mason
0 siblings, 0 replies; 8+ messages in thread
From: Chris Mason @ 2016-11-17 15:34 UTC (permalink / raw)
To: Chris Cui, linux-btrfs
On 11/17/2016 12:39 AM, Chris Cui wrote:
> We have just encountered the same bug on 4.9.0-rc2. Any solution now?
>
>> kernel BUG at fs/btrfs/ctree.c:3172!
>> invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> CPU: 0 PID: 22702 Comm: trinity-c40 Not tainted 4.9.0-rc4-think+ #1
>> task: ffff8804ffde37c0 task.stack: ffffc90002188000
>> RIP: 0010:[<ffffffffa00576b9>]
>> [<ffffffffa00576b9>] btrfs_set_item_key_safe+0x179/0x190 [btrfs]
>> RSP: 0000:ffffc9000218b8a8 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: ffff8804fddcf348 RCX: 0000000000001000
>> RDX: 0000000000000000 RSI: ffffc9000218b9ce RDI: ffffc9000218b8c7
>> RBP: ffffc9000218b908 R08: 0000000000004000 R09: ffffc9000218b8c8
>> R10: 0000000000000000 R11: 0000000000000001 R12: ffffc9000218b8b6
>> R13: ffffc9000218b9ce R14: 0000000000000001 R15: ffff880480684a88
>> FS: 00007f7c7f998b40(0000) GS:ffff880507800000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000000 CR3: 000000044f15f000 CR4: 00000000001406f0
>> DR0: 00007f4ce439d000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
>> Stack:
>> ffff880501430000 d305ffffa00a2245 006c000000000002 0500000000000010
>> 6c000000000002d3 0000000000001000 000000006427eebb ffff880480684a88
>> 0000000000000000 ffff8804fddcf348 0000000000002000 0000000000000000
>> Call Trace:
>> [<ffffffffa009cff0>] __btrfs_drop_extents+0xb00/0xe30 [btrfs]
We're going to bash on Josef's patch and probably send it with the next
merge window (queued for stable as well).
https://patchwork.kernel.org/patch/9431679/
-chris
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: btrfs btree_ctree_super fault
@ 2016-11-17 5:39 Chris Cui
2016-11-17 15:34 ` Chris Mason
0 siblings, 1 reply; 8+ messages in thread
From: Chris Cui @ 2016-11-17 5:39 UTC (permalink / raw)
To: linux-btrfs
We have just encountered the same bug on 4.9.0-rc2. Any solution now?
> kernel BUG at fs/btrfs/ctree.c:3172!
> invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU: 0 PID: 22702 Comm: trinity-c40 Not tainted 4.9.0-rc4-think+ #1
> task: ffff8804ffde37c0 task.stack: ffffc90002188000
> RIP: 0010:[<ffffffffa00576b9>]
> [<ffffffffa00576b9>] btrfs_set_item_key_safe+0x179/0x190 [btrfs]
> RSP: 0000:ffffc9000218b8a8 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff8804fddcf348 RCX: 0000000000001000
> RDX: 0000000000000000 RSI: ffffc9000218b9ce RDI: ffffc9000218b8c7
> RBP: ffffc9000218b908 R08: 0000000000004000 R09: ffffc9000218b8c8
> R10: 0000000000000000 R11: 0000000000000001 R12: ffffc9000218b8b6
> R13: ffffc9000218b9ce R14: 0000000000000001 R15: ffff880480684a88
> FS: 00007f7c7f998b40(0000) GS:ffff880507800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000044f15f000 CR4: 00000000001406f0
> DR0: 00007f4ce439d000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> Stack:
> ffff880501430000 d305ffffa00a2245 006c000000000002 0500000000000010
> 6c000000000002d3 0000000000001000 000000006427eebb ffff880480684a88
> 0000000000000000 ffff8804fddcf348 0000000000002000 0000000000000000
> Call Trace:
> [<ffffffffa009cff0>] __btrfs_drop_extents+0xb00/0xe30 [btrfs]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: btrfs btree_ctree_super fault
2016-11-10 14:35 ` Dave Jones
@ 2016-11-10 15:27 ` Chris Mason
0 siblings, 0 replies; 8+ messages in thread
From: Chris Mason @ 2016-11-10 15:27 UTC (permalink / raw)
To: Dave Jones, Linus Torvalds, Jens Axboe, Andy Lutomirski,
Andy Lutomirski, Al Viro, Josef Bacik, David Sterba, linux-btrfs,
Linux Kernel, Dave Chinner
On 11/10/2016 09:35 AM, Dave Jones wrote:
> On Tue, Nov 08, 2016 at 10:08:04AM -0500, Chris Mason wrote:
>
> > > And another new one:
> > >
> > > kernel BUG at fs/btrfs/ctree.c:3172!
> > >
> > > Call Trace:
> > > [<ffffffffa009cff0>] __btrfs_drop_extents+0xb00/0xe30 [btrfs]
> >
> > We've been hunting this one for at least two years. It's the white
> > whale of btrfs bugs. Josef has a semi-reliable reproducer now, but I
> > think it's not the same as the pagevec based problems you reported earlier.
>
> Great, now for whatever reason, I'm hitting this over and over.
>
> Even better, after the last time I hit it, it rebooted and this happened during boot..
>
> BTRFS info (device sda6): disk space caching is enabled
> BTRFS info (device sda6): has skinny extents
> BTRFS info (device sda3): disk space caching is enabled
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 443 at fs/btrfs/file.c:546 btrfs_drop_extent_cache+0x411/0x420 [btrfs]
> CPU: 1 PID: 443 Comm: mount Not tainted 4.9.0-rc4-think+ #1
> ffffc90000c4b468 ffffffff813b66bc 0000000000000000 0000000000000000
> ffffc90000c4b4a8 ffffffff81086d2b 0000022200c4b488 000000000002f265
> 40c8dded1afd6000 ffff8804ff5cddc8 ffff8804ef26f2b8 40c8dded1afd5000
> Call Trace:
> [<ffffffff813b66bc>] dump_stack+0x4f/0x73
> [<ffffffff81086d2b>] __warn+0xcb/0xf0
> [<ffffffff81086e5d>] warn_slowpath_null+0x1d/0x20
> [<ffffffffa009c0f1>] btrfs_drop_extent_cache+0x411/0x420 [btrfs]
> [<ffffffff81215923>] ? alloc_debug_processing+0x73/0x1b0
> [<ffffffffa009c93f>] __btrfs_drop_extents+0x44f/0xe30 [btrfs]
> [<ffffffffa005426a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
> [<ffffffffa005426a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
> [<ffffffff8121842a>] ? kmem_cache_alloc+0x2aa/0x330
> [<ffffffffa005426a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
> [<ffffffffa009e399>] btrfs_drop_extents+0x79/0xa0 [btrfs]
> [<ffffffffa00ce4d1>] replay_one_extent+0x1e1/0x710 [btrfs]
> [<ffffffffa00cec6d>] replay_one_buffer+0x26d/0x7e0 [btrfs]
> [<ffffffff8121732c>] ? ___slab_alloc.constprop.83+0x27c/0x5c0
> [<ffffffffa005426a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
> [<ffffffff813d5b87>] ? debug_smp_processor_id+0x17/0x20
> [<ffffffffa00ca3db>] walk_up_log_tree+0xeb/0x240 [btrfs]
> [<ffffffffa00ca5d6>] walk_log_tree+0xa6/0x1d0 [btrfs]
> [<ffffffffa00d32fc>] btrfs_recover_log_trees+0x1dc/0x460 [btrfs]
> [<ffffffffa00cea00>] ? replay_one_extent+0x710/0x710 [btrfs]
> [<ffffffffa0081f65>] open_ctree+0x2575/0x2670 [btrfs]
> [<ffffffffa005144b>] btrfs_mount+0xd0b/0xe10 [btrfs]
> [<ffffffff811da804>] ? pcpu_alloc+0x2d4/0x660
> [<ffffffff810dce41>] ? lockdep_init_map+0x61/0x200
> [<ffffffff810d39eb>] ? __init_waitqueue_head+0x3b/0x50
> [<ffffffff81243794>] mount_fs+0x14/0xa0
> [<ffffffff8126305b>] vfs_kern_mount+0x6b/0x150
> [<ffffffffa0050a08>] btrfs_mount+0x2c8/0xe10 [btrfs]
> [<ffffffff811da804>] ? pcpu_alloc+0x2d4/0x660
> [<ffffffff810dce41>] ? lockdep_init_map+0x61/0x200
> [<ffffffff810dce41>] ? lockdep_init_map+0x61/0x200
> [<ffffffff810d39eb>] ? __init_waitqueue_head+0x3b/0x50
> [<ffffffff81243794>] mount_fs+0x14/0xa0
> [<ffffffff8126305b>] vfs_kern_mount+0x6b/0x150
> [<ffffffff81265bd2>] do_mount+0x1c2/0xda0
> [<ffffffff811d41c0>] ? memdup_user+0x60/0x90
> [<ffffffff81266ac3>] SyS_mount+0x83/0xd0
> [<ffffffff81002d81>] do_syscall_64+0x61/0x170
> [<ffffffff81894ccb>] entry_SYSCALL64_slow_path+0x25/0x25
> ---[ end trace d3fa03bb9c115bbe ]---
> BTRFS: error (device sda3) in btrfs_replay_log:2491: errno=-17 Object already exists (Failed to recover log tree)
> BTRFS error (device sda3): cleaner transaction attach returned -30
> BTRFS error (device sda3): open_ctree failed
>
>
> Guess I'll hit it with btrfsck and hope for the best..
You can zero the log if you need to. Josef has a ton of tracing around
this right now, so I'm hoping we nail it down very soon.
-chris
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: btrfs btree_ctree_super fault
2016-11-08 15:08 ` Chris Mason
@ 2016-11-10 14:35 ` Dave Jones
2016-11-10 15:27 ` Chris Mason
0 siblings, 1 reply; 8+ messages in thread
From: Dave Jones @ 2016-11-10 14:35 UTC (permalink / raw)
To: Chris Mason
Cc: Linus Torvalds, Jens Axboe, Andy Lutomirski, Andy Lutomirski,
Al Viro, Josef Bacik, David Sterba, linux-btrfs, Linux Kernel,
Dave Chinner
On Tue, Nov 08, 2016 at 10:08:04AM -0500, Chris Mason wrote:
> > And another new one:
> >
> > kernel BUG at fs/btrfs/ctree.c:3172!
> >
> > Call Trace:
> > [<ffffffffa009cff0>] __btrfs_drop_extents+0xb00/0xe30 [btrfs]
>
> We've been hunting this one for at least two years. It's the white
> whale of btrfs bugs. Josef has a semi-reliable reproducer now, but I
> think it's not the same as the pagevec based problems you reported earlier.
Great, now for whatever reason, I'm hitting this over and over.
Even better, after the last time I hit it, it rebooted and this happened during boot..
BTRFS info (device sda6): disk space caching is enabled
BTRFS info (device sda6): has skinny extents
BTRFS info (device sda3): disk space caching is enabled
------------[ cut here ]------------
WARNING: CPU: 1 PID: 443 at fs/btrfs/file.c:546 btrfs_drop_extent_cache+0x411/0x420 [btrfs]
CPU: 1 PID: 443 Comm: mount Not tainted 4.9.0-rc4-think+ #1
ffffc90000c4b468 ffffffff813b66bc 0000000000000000 0000000000000000
ffffc90000c4b4a8 ffffffff81086d2b 0000022200c4b488 000000000002f265
40c8dded1afd6000 ffff8804ff5cddc8 ffff8804ef26f2b8 40c8dded1afd5000
Call Trace:
[<ffffffff813b66bc>] dump_stack+0x4f/0x73
[<ffffffff81086d2b>] __warn+0xcb/0xf0
[<ffffffff81086e5d>] warn_slowpath_null+0x1d/0x20
[<ffffffffa009c0f1>] btrfs_drop_extent_cache+0x411/0x420 [btrfs]
[<ffffffff81215923>] ? alloc_debug_processing+0x73/0x1b0
[<ffffffffa009c93f>] __btrfs_drop_extents+0x44f/0xe30 [btrfs]
[<ffffffffa005426a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[<ffffffffa005426a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[<ffffffff8121842a>] ? kmem_cache_alloc+0x2aa/0x330
[<ffffffffa005426a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[<ffffffffa009e399>] btrfs_drop_extents+0x79/0xa0 [btrfs]
[<ffffffffa00ce4d1>] replay_one_extent+0x1e1/0x710 [btrfs]
[<ffffffffa00cec6d>] replay_one_buffer+0x26d/0x7e0 [btrfs]
[<ffffffff8121732c>] ? ___slab_alloc.constprop.83+0x27c/0x5c0
[<ffffffffa005426a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[<ffffffff813d5b87>] ? debug_smp_processor_id+0x17/0x20
[<ffffffffa00ca3db>] walk_up_log_tree+0xeb/0x240 [btrfs]
[<ffffffffa00ca5d6>] walk_log_tree+0xa6/0x1d0 [btrfs]
[<ffffffffa00d32fc>] btrfs_recover_log_trees+0x1dc/0x460 [btrfs]
[<ffffffffa00cea00>] ? replay_one_extent+0x710/0x710 [btrfs]
[<ffffffffa0081f65>] open_ctree+0x2575/0x2670 [btrfs]
[<ffffffffa005144b>] btrfs_mount+0xd0b/0xe10 [btrfs]
[<ffffffff811da804>] ? pcpu_alloc+0x2d4/0x660
[<ffffffff810dce41>] ? lockdep_init_map+0x61/0x200
[<ffffffff810d39eb>] ? __init_waitqueue_head+0x3b/0x50
[<ffffffff81243794>] mount_fs+0x14/0xa0
[<ffffffff8126305b>] vfs_kern_mount+0x6b/0x150
[<ffffffffa0050a08>] btrfs_mount+0x2c8/0xe10 [btrfs]
[<ffffffff811da804>] ? pcpu_alloc+0x2d4/0x660
[<ffffffff810dce41>] ? lockdep_init_map+0x61/0x200
[<ffffffff810dce41>] ? lockdep_init_map+0x61/0x200
[<ffffffff810d39eb>] ? __init_waitqueue_head+0x3b/0x50
[<ffffffff81243794>] mount_fs+0x14/0xa0
[<ffffffff8126305b>] vfs_kern_mount+0x6b/0x150
[<ffffffff81265bd2>] do_mount+0x1c2/0xda0
[<ffffffff811d41c0>] ? memdup_user+0x60/0x90
[<ffffffff81266ac3>] SyS_mount+0x83/0xd0
[<ffffffff81002d81>] do_syscall_64+0x61/0x170
[<ffffffff81894ccb>] entry_SYSCALL64_slow_path+0x25/0x25
---[ end trace d3fa03bb9c115bbe ]---
BTRFS: error (device sda3) in btrfs_replay_log:2491: errno=-17 Object already exists (Failed to recover log tree)
BTRFS error (device sda3): cleaner transaction attach returned -30
BTRFS error (device sda3): open_ctree failed
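The two negative numbers in these messages are kernel errno values. A quick way to decode them, as a small Python sketch: the helper name `describe_errno` is made up for illustration, the numeric values assume Linux, and btrfs prints its own wording for -17, so the libc text will differ from "Object already exists".

```python
import errno
import os


def describe_errno(kernel_code: int) -> str:
    """Map a negative kernel return code to its symbolic name and libc text."""
    number = -kernel_code
    name = errno.errorcode.get(number, "unknown")
    return f"{name} ({os.strerror(number)})"


for code in (-17, -30):
    print(code, describe_errno(code))
```

-17 is EEXIST and -30 is EROFS; the latter would be consistent with the filesystem having been forced read-only once log replay failed.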
Guess I'll hit it with btrfsck and hope for the best..
Dave
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: btrfs btree_ctree_super fault
2016-11-08 14:59 ` Dave Jones
@ 2016-11-08 15:08 ` Chris Mason
2016-11-10 14:35 ` Dave Jones
0 siblings, 1 reply; 8+ messages in thread
From: Chris Mason @ 2016-11-08 15:08 UTC (permalink / raw)
To: Dave Jones, Linus Torvalds, Jens Axboe, Andy Lutomirski,
Andy Lutomirski, Al Viro, Josef Bacik, David Sterba, linux-btrfs,
Linux Kernel, Dave Chinner
On 11/08/2016 09:59 AM, Dave Jones wrote:
> On Sun, Nov 06, 2016 at 11:55:39AM -0500, Dave Jones wrote:
> > <subject changed, hopefully we're done with bio corruption for now>
> >
> > On Mon, Oct 31, 2016 at 01:44:55PM -0600, Chris Mason wrote:
> > > On Mon, Oct 31, 2016 at 12:35:16PM -0700, Linus Torvalds wrote:
> > > >On Mon, Oct 31, 2016 at 11:55 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
> > > >>
> > > >> BUG: Bad page state in process kworker/u8:12 pfn:4e0e39
> > > >> page:ffffea0013838e40 count:0 mapcount:0 mapping:ffff8804a20310e0 index:0x100c
> > > >> flags: 0x400000000000000c(referenced|uptodate)
> > > >> page dumped because: non-NULL mapping
> > > >
> > > >Hmm. So this seems to be btrfs-specific, right?
> > > >
> > > >I searched for all your "non-NULL mapping" cases, and they all seem to
> > > >have basically the same call trace, with some work thread doing
> > > >writeback and going through btrfs_writepages().
> > > >
> > > >Sounds like it's a race with either fallocate hole-punching or
> > > >truncate. I'm not seeing it, but I suspect it's btrfs, since DaveJ
> > > >clearly ran other filesystems too but I am not seeing this backtrace
> > > >for anything else.
> > >
> > > Agreed, I think this is a separate bug, almost certainly btrfs specific.
> > > I'll work with Dave on a better reproducer.
> >
> > Still refining my 'capture ftrace when trinity detects taint' feature,
> > but in the meantime, here's a variant I don't think we've seen before:
>
> And another new one:
>
> kernel BUG at fs/btrfs/ctree.c:3172!
> invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU: 0 PID: 22702 Comm: trinity-c40 Not tainted 4.9.0-rc4-think+ #1
> task: ffff8804ffde37c0 task.stack: ffffc90002188000
> RIP: 0010:[<ffffffffa00576b9>]
> [<ffffffffa00576b9>] btrfs_set_item_key_safe+0x179/0x190 [btrfs]
> RSP: 0000:ffffc9000218b8a8 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff8804fddcf348 RCX: 0000000000001000
> RDX: 0000000000000000 RSI: ffffc9000218b9ce RDI: ffffc9000218b8c7
> RBP: ffffc9000218b908 R08: 0000000000004000 R09: ffffc9000218b8c8
> R10: 0000000000000000 R11: 0000000000000001 R12: ffffc9000218b8b6
> R13: ffffc9000218b9ce R14: 0000000000000001 R15: ffff880480684a88
> FS: 00007f7c7f998b40(0000) GS:ffff880507800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000044f15f000 CR4: 00000000001406f0
> DR0: 00007f4ce439d000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> Stack:
> ffff880501430000 d305ffffa00a2245 006c000000000002 0500000000000010
> 6c000000000002d3 0000000000001000 000000006427eebb ffff880480684a88
> 0000000000000000 ffff8804fddcf348 0000000000002000 0000000000000000
> Call Trace:
> [<ffffffffa009cff0>] __btrfs_drop_extents+0xb00/0xe30 [btrfs]
We've been hunting this one for at least two years. It's the white
whale of btrfs bugs. Josef has a semi-reliable reproducer now, but I
think it's not the same as the pagevec based problems you reported earlier.
-chris
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: btrfs btree_ctree_super fault
2016-11-06 16:55 ` btrfs btree_ctree_super fault Dave Jones
@ 2016-11-08 14:59 ` Dave Jones
2016-11-08 15:08 ` Chris Mason
0 siblings, 1 reply; 8+ messages in thread
From: Dave Jones @ 2016-11-08 14:59 UTC (permalink / raw)
To: Chris Mason, Linus Torvalds, Jens Axboe, Andy Lutomirski,
Andy Lutomirski, Al Viro, Josef Bacik, David Sterba, linux-btrfs,
Linux Kernel, Dave Chinner
On Sun, Nov 06, 2016 at 11:55:39AM -0500, Dave Jones wrote:
> <subject changed, hopefully we're done with bio corruption for now>
>
> On Mon, Oct 31, 2016 at 01:44:55PM -0600, Chris Mason wrote:
> > On Mon, Oct 31, 2016 at 12:35:16PM -0700, Linus Torvalds wrote:
> > >On Mon, Oct 31, 2016 at 11:55 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
> > >>
> > >> BUG: Bad page state in process kworker/u8:12 pfn:4e0e39
> > >> page:ffffea0013838e40 count:0 mapcount:0 mapping:ffff8804a20310e0 index:0x100c
> > >> flags: 0x400000000000000c(referenced|uptodate)
> > >> page dumped because: non-NULL mapping
> > >
> > >Hmm. So this seems to be btrfs-specific, right?
> > >
> > >I searched for all your "non-NULL mapping" cases, and they all seem to
> > >have basically the same call trace, with some work thread doing
> > >writeback and going through btrfs_writepages().
> > >
> > >Sounds like it's a race with either fallocate hole-punching or
> > >truncate. I'm not seeing it, but I suspect it's btrfs, since DaveJ
> > >clearly ran other filesystems too but I am not seeing this backtrace
> > >for anything else.
> >
> > Agreed, I think this is a separate bug, almost certainly btrfs specific.
> > I'll work with Dave on a better reproducer.
>
> Still refining my 'capture ftrace when trinity detects taint' feature,
> but in the meantime, here's a variant I don't think we've seen before:
And another new one:
kernel BUG at fs/btrfs/ctree.c:3172!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 0 PID: 22702 Comm: trinity-c40 Not tainted 4.9.0-rc4-think+ #1
task: ffff8804ffde37c0 task.stack: ffffc90002188000
RIP: 0010:[<ffffffffa00576b9>]
[<ffffffffa00576b9>] btrfs_set_item_key_safe+0x179/0x190 [btrfs]
RSP: 0000:ffffc9000218b8a8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8804fddcf348 RCX: 0000000000001000
RDX: 0000000000000000 RSI: ffffc9000218b9ce RDI: ffffc9000218b8c7
RBP: ffffc9000218b908 R08: 0000000000004000 R09: ffffc9000218b8c8
R10: 0000000000000000 R11: 0000000000000001 R12: ffffc9000218b8b6
R13: ffffc9000218b9ce R14: 0000000000000001 R15: ffff880480684a88
FS: 00007f7c7f998b40(0000) GS:ffff880507800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000044f15f000 CR4: 00000000001406f0
DR0: 00007f4ce439d000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
ffff880501430000 d305ffffa00a2245 006c000000000002 0500000000000010
6c000000000002d3 0000000000001000 000000006427eebb ffff880480684a88
0000000000000000 ffff8804fddcf348 0000000000002000 0000000000000000
Call Trace:
[<ffffffffa009cff0>] __btrfs_drop_extents+0xb00/0xe30 [btrfs]
[<ffffffff8116c80c>] ? function_trace_call+0x13c/0x190
[<ffffffffa009c4f5>] ? __btrfs_drop_extents+0x5/0xe30 [btrfs]
[<ffffffff810e2b00>] ? do_raw_write_lock+0xb0/0xc0
[<ffffffffa00cd43d>] btrfs_log_changed_extents+0x35d/0x630 [btrfs]
[<ffffffffa00a6a74>] ? release_extent_buffer+0xa4/0x110 [btrfs]
[<ffffffffa00cd0e5>] ? btrfs_log_changed_extents+0x5/0x630 [btrfs]
[<ffffffffa00d1085>] btrfs_log_inode+0xb05/0x11d0 [btrfs]
[<ffffffff8116536c>] ? trace_function+0x6c/0x80
[<ffffffffa00d0580>] ? log_directory_changes+0xc0/0xc0 [btrfs]
[<ffffffffa00d1a20>] ? btrfs_log_inode_parent+0x240/0x940 [btrfs]
[<ffffffff8116c80c>] ? function_trace_call+0x13c/0x190
[<ffffffffa00d1a20>] btrfs_log_inode_parent+0x240/0x940 [btrfs]
[<ffffffffa00d17e5>] ? btrfs_log_inode_parent+0x5/0x940 [btrfs]
[<ffffffff81259131>] ? dget_parent+0x71/0x150
[<ffffffffa00d3102>] btrfs_log_dentry_safe+0x62/0x80 [btrfs]
[<ffffffffa009f404>] btrfs_sync_file+0x344/0x4d0 [btrfs]
[<ffffffff81278a1b>] vfs_fsync_range+0x4b/0xb0
[<ffffffff812607c5>] ? __fget_light+0x5/0x60
[<ffffffff81278add>] do_fsync+0x3d/0x70
[<ffffffff81278aa5>] ? do_fsync+0x5/0x70
[<ffffffff81278db3>] SyS_fdatasync+0x13/0x20
[<ffffffff81002d81>] do_syscall_64+0x61/0x170
[<ffffffff81894ccb>] entry_SYSCALL64_slow_path+0x25/0x25
Code: 48 8b 45 b7 48 8d 7d bf 4c 89 ee 48 89 45 c8 0f b6 45 b6 88 45 c7 48 8b 45 ae 48 89 45 bf e8 af f2 ff ff 85 c0 0f 8f 43 ff ff ff <0f> 0b 0f 0b e8 ee f3 02 e1 0f 1f 40 00 66 2e 0f 1f 84 00 00 00
Unfortunately, because this was a BUG_ON, it locked up the box so it
didn't save any additional debug info. Tempted to see if making BUG_ON
a no-op will at least let it live long enough to save the ftrace buffer.
Given this seems to be mutating every time I see something go wrong,
I'm wondering if this is fallout from memory corruption again.
Dave
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: btrfs btree_ctree_super fault
2016-10-31 19:44 ` Chris Mason
@ 2016-11-06 16:55 ` Dave Jones
2016-11-08 14:59 ` Dave Jones
0 siblings, 1 reply; 8+ messages in thread
From: Dave Jones @ 2016-11-06 16:55 UTC (permalink / raw)
To: Chris Mason, Linus Torvalds, Jens Axboe, Andy Lutomirski,
Andy Lutomirski, Al Viro, Josef Bacik, David Sterba, linux-btrfs,
Linux Kernel, Dave Chinner
<subject changed, hopefully we're done with bio corruption for now>
On Mon, Oct 31, 2016 at 01:44:55PM -0600, Chris Mason wrote:
> On Mon, Oct 31, 2016 at 12:35:16PM -0700, Linus Torvalds wrote:
> >On Mon, Oct 31, 2016 at 11:55 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
> >>
> >> BUG: Bad page state in process kworker/u8:12 pfn:4e0e39
> >> page:ffffea0013838e40 count:0 mapcount:0 mapping:ffff8804a20310e0 index:0x100c
> >> flags: 0x400000000000000c(referenced|uptodate)
> >> page dumped because: non-NULL mapping
> >
> >Hmm. So this seems to be btrfs-specific, right?
> >
> >I searched for all your "non-NULL mapping" cases, and they all seem to
> >have basically the same call trace, with some work thread doing
> >writeback and going through btrfs_writepages().
> >
> >Sounds like it's a race with either fallocate hole-punching or
> >truncate. I'm not seeing it, but I suspect it's btrfs, since DaveJ
> >clearly ran other filesystems too but I am not seeing this backtrace
> >for anything else.
>
> Agreed, I think this is a separate bug, almost certainly btrfs specific.
> I'll work with Dave on a better reproducer.
Still refining my 'capture ftrace when trinity detects taint' feature,
but in the meantime, here's a variant I don't think we've seen before:
general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 3 PID: 1913 Comm: trinity-c51 Not tainted 4.9.0-rc3-think+ #3
task: ffff880503350040 task.stack: ffffc90000240000
RIP: 0010:[<ffffffffa007c2f6>]
[<ffffffffa007c2f6>] write_ctree_super+0x96/0xb30 [btrfs]
RSP: 0018:ffffc90000243c90 EFLAGS: 00010286
RAX: dae05adadadad000 RBX: 0000000000000000 RCX: 0000000000000002
RDX: ffff8804fdfcc000 RSI: ffff8804edcee313 RDI: ffff8804edcee1c3
RBP: ffffc90000243d00 R08: 0000000000000003 R09: ffff880000000000
R10: 0000000000000001 R11: 0000000000000100 R12: ffff88045151c548
R13: 0000000000000000 R14: ffff8804ee5122a8 R15: ffff8804572267e8
FS: 00007f25c3e0eb40(0000) GS:ffff880507e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f25c1560d44 CR3: 0000000454e20000 CR4: 00000000001406e0
DR0: 00007fee93506000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
0000000000000001 ffff88050227b3f8 ffff8804fff01b28 00000001810b7f35
ffffffffa007c265 0000000000000001 ffffc90000243cb8 000000002c9a8645
ffff8804fff01b28 ffff8804fff01b28 ffff88045151c548 0000000000000000
Call Trace:
[<ffffffffa007c265>] ? write_ctree_super+0x5/0xb30 [btrfs]
[<ffffffffa00d2956>] btrfs_sync_log+0x886/0xa60 [btrfs]
[<ffffffffa009f4f9>] btrfs_sync_file+0x479/0x4d0 [btrfs]
[<ffffffff812789ab>] vfs_fsync_range+0x4b/0xb0
[<ffffffff81260755>] ? __fget_light+0x5/0x60
[<ffffffff81278a6d>] do_fsync+0x3d/0x70
[<ffffffff81278a35>] ? do_fsync+0x5/0x70
[<ffffffff81278d20>] SyS_fsync+0x10/0x20
[<ffffffff81002d81>] do_syscall_64+0x61/0x170
[<ffffffff81894a8b>] entry_SYSCALL64_slow_path+0x25/0x25
Code: c7 48 8b 42 30 4c 8b 08 48 b8 00 00 00 00 00 16 00 00 49 03 81 a0 01 00 00 49 b9 00 00 00 00 00 88 ff ff 48 c1 f8 06 48 c1 e0 0c <4a> 8b 44 08 50 48 39 46 08 0f 84 8d 08 00 00 49 63 c0 48 8d 0c
RIP
[<ffffffffa007c2f6>] write_ctree_super+0x96/0xb30 [btrfs]
RSP <ffffc90000243c90>
All code
========
0: c7 (bad)
1: 48 8b 42 30 mov 0x30(%rdx),%rax
5: 4c 8b 08 mov (%rax),%r9
8: 48 b8 00 00 00 00 00 movabs $0x160000000000,%rax
f: 16 00 00
12: 49 03 81 a0 01 00 00 add 0x1a0(%r9),%rax
19: 49 b9 00 00 00 00 00 movabs $0xffff880000000000,%r9
20: 88 ff ff
23: 48 c1 f8 06 sar $0x6,%rax
27: 48 c1 e0 0c shl $0xc,%rax
2b:* 4a 8b 44 08 50 mov 0x50(%rax,%r9,1),%rax <-- trapping instruction
30: 48 39 46 08 cmp %rax,0x8(%rsi)
34: 0f 84 8d 08 00 00 je 0x8c7
3a: 49 63 c0 movslq %r8d,%rax
3d: 48 rex.W
3e: 8d .byte 0x8d
3f: 0c .byte 0xc
Code starting with the faulting instruction
===========================================
0: 4a 8b 44 08 50 mov 0x50(%rax,%r9,1),%rax
5: 48 39 46 08 cmp %rax,0x8(%rsi)
9: 0f 84 8d 08 00 00 je 0x89c
f: 49 63 c0 movslq %r8d,%rax
12: 48 rex.W
13: 8d .byte 0x8d
14: 0c .byte 0xc
According to objdump -S, it looks like this is an inlined copy of backup_super_roots
root_backup = info->super_for_commit->super_roots + last_backup;
2706: 48 8d b8 2b 0b 00 00 lea 0xb2b(%rax),%rdi
270d: 48 63 c1 movslq %ecx,%rax
2710: 48 8d 34 80 lea (%rax,%rax,4),%rsi
2714: 48 8d 04 b0 lea (%rax,%rsi,4),%rax
2718: 48 8d 34 c7 lea (%rdi,%rax,8),%rsi
btrfs_header_generation(info->tree_root->node))
271c: 48 8b 42 30 mov 0x30(%rdx),%rax
2720: 4c 8b 08 mov (%rax),%r9
2723: 48 b8 00 00 00 00 00 movabs $0x160000000000,%rax
272a: 16 00 00
272d: 49 03 81 a0 01 00 00 add 0x1a0(%r9),%rax
if (btrfs_backup_tree_root_gen(root_backup) ==
2734: 49 b9 00 00 00 00 00 movabs $0xffff880000000000,%r9
273b: 88 ff ff
273e: 48 c1 f8 06 sar $0x6,%rax
2742: 48 c1 e0 0c shl $0xc,%rax
2746: 4a 8b 44 08 50 mov 0x50(%rax,%r9,1),%rax <-- trapping instruction
274b: 48 39 46 08 cmp %rax,0x8(%rsi)
274f: 0f 84 8d 08 00 00 je 2fe2 <write_ctree_super+0x932>
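As a cross-check on the disassembly above, the arithmetic leading up to the trapping instruction can be replayed by hand. The Python sketch below mirrors the `movabs`/`add`/`sar`/`shl` sequence. The input value `0x6b6b6b6b6b6b6b6b` is an assumption (it is the slab `POISON_FREE` byte pattern), chosen because it reproduces the RAX value in the register dump; this is one reading of the dump, not something confirmed in this thread.

```python
MASK64 = (1 << 64) - 1


def sar64(value: int, shift: int) -> int:
    """Arithmetic right shift of a 64-bit register value."""
    if value & (1 << 63):
        value -= 1 << 64
    return (value >> shift) & MASK64


def rax_before_trap(loaded: int) -> int:
    """Replay: movabs $0x160000000000,%rax; add 0x1a0(%r9),%rax; sar $6; shl $12."""
    rax = (0x160000000000 + loaded) & MASK64
    rax = sar64(rax, 6)
    return (rax << 12) & MASK64


def is_canonical(addr: int) -> bool:
    """x86-64 with 48-bit virtual addresses: bits 63..47 must all be equal."""
    top = addr >> 47
    return top == 0 or top == (1 << 17) - 1


ASSUMED_LOADED = 0x6B6B6B6B6B6B6B6B  # assumption: slab POISON_FREE pattern
R9 = 0xFFFF880000000000              # second movabs in the listing

rax = rax_before_trap(ASSUMED_LOADED)
effective = (rax + R9 + 0x50) & MASK64  # mov 0x50(%rax,%r9,1),%rax

print(hex(rax))                 # compare with RAX in the register dump
print(is_canonical(effective))  # a non-canonical address raises #GP
```

With that input the sketch yields the dumped RAX value, dae05adadadad000, and a non-canonical effective address, which on x86-64 raises a general protection fault rather than a page fault. If the assumption holds, it would suggest the value read at 0x1a0(%r9) came from memory that had already been freed.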
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-02-13 3:38 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-13 3:38 btrfs btree_ctree_super fault Sam McLeod
-- strict thread matches above, loose matches on Subject: below --
2016-11-17 5:39 Chris Cui
2016-11-17 15:34 ` Chris Mason
2016-10-26 22:51 bio linked list corruption Linus Torvalds
2016-10-26 22:58 ` Linus Torvalds
2016-10-26 23:03 ` Jens Axboe
2016-10-26 23:08 ` Linus Torvalds
2016-10-26 23:20 ` Jens Axboe
2016-10-26 23:38 ` Chris Mason
2016-10-26 23:47 ` Dave Jones
2016-10-31 18:55 ` Dave Jones
2016-10-31 19:35 ` Linus Torvalds
2016-10-31 19:44 ` Chris Mason
2016-11-06 16:55 ` btrfs btree_ctree_super fault Dave Jones
2016-11-08 14:59 ` Dave Jones
2016-11-08 15:08 ` Chris Mason
2016-11-10 14:35 ` Dave Jones
2016-11-10 15:27 ` Chris Mason