* kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
@ 2015-09-14 11:46 Stéphane Lesimple
2015-09-15 14:47 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-14 11:46 UTC (permalink / raw)
To: linux-btrfs
Hello btrfs-aholics,
I've been experiencing repeated "kernel BUG" occurrences in the past
few days trying to balance a raid5 filesystem after adding a new drive.
It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
The raid5 setup was 2x4T drives (created 3 days ago to upgrade smoothly
from mdadm/ext4 to btrfs), then I added a 3rd drive and tried to
balance.
metadata is in raid1.
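For reference, the sequence that produced this state would look roughly like the following (a sketch only; device names are placeholders, not the actual LUKS-mapped paths):

```shell
# Sketch of the reported setup steps; paths are illustrative placeholders.

# Create the filesystem on the first two drives: raid1 metadata, raid5 data.
mkfs.btrfs -L tank -m raid1 -d raid5 \
    /dev/mapper/luks-disk1 /dev/mapper/luks-disk2
mount /dev/mapper/luks-disk1 /tank

# Add the third drive to the mounted filesystem...
btrfs device add /dev/mapper/luks-disk3 /tank

# ...then rebalance so existing chunks are restriped over all 3 devices.
btrfs balance start /tank
```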
root@nas:~# uname -a
Linux nas 4.1.7-040107-generic #201509131330 SMP Sun Sep 13 17:32:28 UTC
2015 x86_64 x86_64 x86_64 GNU/Linux
(and also
Linux version 4.2.0-7-generic (buildd@lgw01-60) (gcc version 5.2.1
20150825 (Ubuntu 5.2.1-15ubuntu5) ) #7-Ubuntu SMP Tue Sep 1 16:43:10 UTC
2015 (Ubuntu 4.2.0-7.7-generic 4.2.0)
)
root@nas:~# btrfs --version
btrfs-progs v4.2
root@nas:~# btrfs fi show
Label: 'tank' uuid: 6bec1608-d9c0-453e-87eb-8b8663c9010d
Total devices 3 FS bytes used 2.66TiB
devid 1 size 2.73TiB used 2.50TiB path
/dev/mapper/luks-WDC_WD30EFRX-68EUZN0_WD-WCC4N2STUCVR
devid 2 size 2.73TiB used 2.50TiB path
/dev/mapper/luks-WDC_WD30EFRX-68EUZN0_WD-WCC4N2DVRDXF
devid 4 size 2.73TiB used 190.03GiB path
/dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164
btrfs-progs v4.2
root@nas:~# btrfs fi df /tank/
Data, RAID5: total=2.67TiB, used=2.65TiB
System, RAID1: total=32.00MiB, used=384.00KiB
Metadata, RAID1: total=6.00GiB, used=4.38GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
root@nas:~# btrfs fi usage /tank/
WARNING: RAID56 detected, not implemented
Overall:
Device size: 8.19TiB
Device allocated: 12.06GiB
Device unallocated: 8.17TiB
Device missing: 0.00B
Used: 8.76GiB
Free (estimated): 0.00B (min: 8.00EiB)
Data ratio: 0.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00B)
Data,RAID5: Size:2.67TiB, Used:2.65TiB
/dev/dm-1 2.49TiB
/dev/dm-2 2.49TiB
/dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164
184.00GiB
Metadata,RAID1: Size:6.00GiB, Used:4.38GiB
/dev/dm-1 3.00GiB
/dev/dm-2 3.00GiB
/dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164
6.00GiB
System,RAID1: Size:32.00MiB, Used:384.00KiB
/dev/dm-2 32.00MiB
/dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164
32.00MiB
Unallocated:
/dev/dm-1 239.52GiB
/dev/dm-2 239.49GiB
/dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164
2.54TiB
Each drive has LUKS configured directly on it (/dev/sdX, no
partition), and the resulting virtual device is used directly as a
btrfs device.
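Such a stack is typically assembled along these lines (a sketch; device and mapping names are placeholders):

```shell
# Whole-disk LUKS, no partition table; btrfs only ever sees the dm-crypt
# device. Names below are placeholders.
cryptsetup luksFormat /dev/sdb
cryptsetup open /dev/sdb luks-disk1    # exposes /dev/mapper/luks-disk1

# The mapped device is then used like any other btrfs member device.
mkfs.btrfs /dev/mapper/luks-disk1
```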
root@nas:~# time btrfs balance start /tank
Segmentation fault
real 750m55.550s
with the following kernel BUG in the log:
nas kernel: [17863.907793] ------------[ cut here ]------------
nas kernel: [17863.907833] kernel BUG at
/build/linux-4dBub_/linux-4.2.0/fs/btrfs/extent-tree.c:1833!
nas kernel: [17863.907857] invalid opcode: 0000 [#1] SMP
nas kernel: [17863.907877] Modules linked in: xts gf128mul drbg
ansi_cprng xt_multiport xt_comment xt_conntrack xt_nat xt_tcpudp
nfnetlink_queue nfnetlink_log nfne
nas kernel: [17863.908264] CPU: 1 PID: 17379 Comm: btrfs Not tainted
4.2.0-7-generic #7-Ubuntu
nas kernel: [17863.908281] Hardware name: ASUS All Series/H87I-PLUS,
BIOS 1005 01/06/2014
nas kernel: [17863.908297] task: ffff880036184c80 ti: ffff8800507f4000
task.ti: ffff8800507f4000
nas kernel: [17863.908314] RIP: 0010:[<ffffffffc0311ab6>]
[<ffffffffc0311ab6>] insert_inline_extent_backref+0xc6/0xd0 [btrfs]
nas kernel: [17863.908349] RSP: 0018:ffff8800507f7698 EFLAGS: 00010293
nas kernel: [17863.908362] RAX: 0000000000000000 RBX: 0000000000000001
RCX: 0000000000000001
nas kernel: [17863.908378] RDX: ffff880000000000 RSI: 0000000000000001
RDI: 0000000000000000
nas kernel: [17863.908394] RBP: ffff8800507f7718 R08: 0000000000004000
R09: ffff8800507f7598
nas kernel: [17863.908410] R10: 0000000000000000 R11: 0000000000000003
R12: ffff8800c5c65000
nas kernel: [17863.908427] R13: 00000307b70ac000 R14: 0000000000000000
R15: ffff880108d5c630
nas kernel: [17863.908443] FS: 00007f9300a7d900(0000)
GS:ffff88011fb00000(0000) knlGS:0000000000000000
nas kernel: [17863.908461] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
nas kernel: [17863.908475] CR2: 00007f0a351c6000 CR3: 0000000118c0d000
CR4: 00000000000406e0
nas kernel: [17863.908491] Stack:
nas kernel: [17863.908496] 00000307b70ac000 0000000000000d0b
0000000000000001 0000000000000000
nas kernel: [17863.908516] 0000030600000001 ffffffff811cf4ca
0000000000000000 ffffffffc030550a
nas kernel: [17863.908535] 0000000000270026 00000000000035d7
ffff88001fdd95c0 ffff8800927ae000
nas kernel: [17863.908555] Call Trace:
nas kernel: [17863.908564] [<ffffffff811cf4ca>] ?
kmem_cache_alloc+0x1ca/0x200
nas kernel: [17863.908582] [<ffffffffc030550a>] ?
btrfs_alloc_path+0x1a/0x20 [btrfs]
nas kernel: [17863.908601] [<ffffffffc0311f98>]
__btrfs_inc_extent_ref.isra.52+0x98/0x250 [btrfs]
nas kernel: [17863.908623] [<ffffffffc031757a>]
__btrfs_run_delayed_refs+0xc4a/0x1050 [btrfs]
nas kernel: [17863.908643] [<ffffffffc030f980>] ?
add_pinned_bytes+0x70/0x80 [btrfs]
nas kernel: [17863.908662] [<ffffffffc0318087>] ?
walk_up_proc+0xd7/0x4a0 [btrfs]
nas kernel: [17863.908681] [<ffffffffc031a5be>]
btrfs_run_delayed_refs.part.73+0x6e/0x270 [btrfs]
nas kernel: [17863.908702] [<ffffffffc031a7d5>]
btrfs_run_delayed_refs+0x15/0x20 [btrfs]
nas kernel: [17863.908723] [<ffffffffc032e38a>]
btrfs_should_end_transaction+0x5a/0x60 [btrfs]
nas kernel: [17863.908744] [<ffffffffc0318dad>]
btrfs_drop_snapshot+0x43d/0x820 [btrfs]
nas kernel: [17863.908765] [<ffffffffc0328c00>] ?
btrfs_get_fs_root+0x30/0x80 [btrfs]
nas kernel: [17863.908787] [<ffffffffc03813c2>]
merge_reloc_roots+0xd2/0x240 [btrfs]
nas kernel: [17863.908808] [<ffffffffc038178a>]
relocate_block_group+0x25a/0x690 [btrfs]
nas kernel: [17863.908829] [<ffffffffc0381d8a>]
btrfs_relocate_block_group+0x1ca/0x2c0 [btrfs]
nas kernel: [17863.909470] [<ffffffffc03564de>]
btrfs_relocate_chunk.isra.39+0x3e/0xb0 [btrfs]
nas kernel: [17863.910108] [<ffffffffc0357847>]
__btrfs_balance+0x4c7/0x8b0 [btrfs]
nas kernel: [17863.910748] [<ffffffffc0357ec0>]
btrfs_balance+0x290/0x610 [btrfs]
nas kernel: [17863.911406] [<ffffffffc0364014>] ?
btrfs_ioctl_balance+0x274/0x3c0 [btrfs]
nas kernel: [17863.912065] [<ffffffffc0363f09>]
btrfs_ioctl_balance+0x169/0x3c0 [btrfs]
nas kernel: [17863.912734] [<ffffffffc03658d8>]
btrfs_ioctl+0x548/0x26d0 [btrfs]
nas kernel: [17863.913398] [<ffffffff811c5f12>] ?
alloc_pages_vma+0xc2/0x230
nas kernel: [17863.914014] [<ffffffff81185d6b>] ?
lru_cache_add_active_or_unevictable+0x2b/0xa0
nas kernel: [17863.914651] [<ffffffff811a6d25>] ?
handle_mm_fault+0xbc5/0x16a0
nas kernel: [17863.915260] [<ffffffff811aa4dd>] ?
__vma_link_rb+0xfd/0x110
nas kernel: [17863.915841] [<ffffffff811aa5a9>] ? vma_link+0xb9/0xc0
nas kernel: [17863.916427] [<ffffffff811fffd5>]
do_vfs_ioctl+0x285/0x470
nas kernel: [17863.916970] [<ffffffff810630a4>] ?
__do_page_fault+0x1b4/0x400
nas kernel: [17863.917528] [<ffffffff81200239>] SyS_ioctl+0x79/0x90
nas kernel: [17863.918037] [<ffffffff817b6cf2>]
entry_SYSCALL_64_fastpath+0x16/0x75
nas kernel: [17863.918564] Code: 45 10 49 89 d9 48 8b 55 c8 4c 89 34 24
4c 89 e9 4c 89 fe 4c 89 e7 48 89 44 24 10 8b 45 28 89 44 24 08 e8 fe d6
ff ff 31 c0 eb bb <
nas kernel: [17863.919683] RIP [<ffffffffc0311ab6>]
insert_inline_extent_backref+0xc6/0xd0 [btrfs]
nas kernel: [17863.920202] RSP <ffff8800507f7698>
nas kernel: [17863.922890] ---[ end trace f9b514d72fc0a628 ]---
I downgraded to 4.1.7 just in case, and got the same thing after a
couple of hours:
nas kernel: [47155.229661] ------------[ cut here ]------------
nas kernel: [47155.229670] WARNING: CPU: 1 PID: 9145 at
/home/kernel/COD/linux/fs/btrfs/delayed-ref.c:475
update_existing_ref+0x18b/0x1e0 [btrfs]()
nas kernel: [47155.229671] Modules linked in: ufs qnx4 hfsplus hfs minix
ntfs msdos jfs xfs libcrc32c xts gf128mul xt_multiport xt_comment
xt_conntrack xt_nat xt_t
nas kernel: [47155.229704] CPU: 1 PID: 9145 Comm: btrfs Tainted: P
W OE 4.1.7-040107-generic #201509131330
nas kernel: [47155.229705] Hardware name: ASUS All Series/H87I-PLUS,
BIOS 1005 01/06/2014
nas kernel: [47155.229706] ffffffffc0381b30 ffff880103eff658
ffffffff817d0ee3 0000000000000000
nas kernel: [47155.229707] 0000000000000000 ffff880103eff698
ffffffff81079c3a 0000000000001000
nas kernel: [47155.229708] ffff88009c3806e0 ffff88009a96a428
ffff88009a96a3c0 ffff8800a3064420
nas kernel: [47155.229710] Call Trace:
nas kernel: [47155.229713] [<ffffffff817d0ee3>] dump_stack+0x45/0x57
nas kernel: [47155.229714] [<ffffffff81079c3a>]
warn_slowpath_common+0x8a/0xc0
nas kernel: [47155.229715] [<ffffffff81079d2a>]
warn_slowpath_null+0x1a/0x20
nas kernel: [47155.229723] [<ffffffffc0349cdb>]
update_existing_ref+0x18b/0x1e0 [btrfs]
nas kernel: [47155.229730] [<ffffffffc034a0cb>]
add_delayed_tree_ref+0xeb/0x1a0 [btrfs]
nas kernel: [47155.229737] [<ffffffffc034accc>]
btrfs_add_delayed_tree_ref+0x10c/0x180 [btrfs]
nas kernel: [47155.229744] [<ffffffffc02e6610>]
btrfs_free_extent+0xe0/0x140 [btrfs]
nas kernel: [47155.229750] [<ffffffffc02d3735>] ?
btrfs_release_path+0x25/0xb0 [btrfs]
nas kernel: [47155.229757] [<ffffffffc02e6958>]
do_walk_down+0x2e8/0x940 [btrfs]
nas kernel: [47155.229763] [<ffffffffc02e3b82>] ?
walk_down_proc+0x2e2/0x310 [btrfs]
nas kernel: [47155.229771] [<ffffffffc02fc68d>] ?
join_transaction.isra.14+0xfd/0x410 [btrfs]
nas kernel: [47155.229777] [<ffffffffc02e7076>]
walk_down_tree+0xc6/0x100 [btrfs]
nas kernel: [47155.229784] [<ffffffffc02eaa4a>]
btrfs_drop_snapshot+0x41a/0x880 [btrfs]
nas kernel: [47155.229792] [<ffffffffc034cb00>] ?
should_ignore_root.part.15+0x50/0x50 [btrfs]
nas kernel: [47155.229800] [<ffffffffc0351d49>]
merge_reloc_roots+0xd9/0x240 [btrfs]
nas kernel: [47155.229807] [<ffffffffc0352119>]
relocate_block_group+0x269/0x670 [btrfs]
nas kernel: [47155.229814] [<ffffffffc03526f6>]
btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
nas kernel: [47155.229822] [<ffffffffc0325cbe>]
btrfs_relocate_chunk.isra.38+0x3e/0xc0 [btrfs]
nas kernel: [47155.229830] [<ffffffffc03270a4>]
__btrfs_balance+0x4e4/0x8b0 [btrfs]
nas kernel: [47155.229838] [<ffffffffc032781a>]
btrfs_balance+0x3aa/0x680 [btrfs]
nas kernel: [47155.229846] [<ffffffffc033086b>] ?
btrfs_ioctl_balance+0x29b/0x520 [btrfs]
nas kernel: [47155.229853] [<ffffffffc0330734>]
btrfs_ioctl_balance+0x164/0x520 [btrfs]
nas kernel: [47155.229860] [<ffffffffc03355f7>]
btrfs_ioctl+0x597/0x2b30 [btrfs]
nas kernel: [47155.229862] [<ffffffff811d2ad5>] ?
alloc_pages_vma+0xb5/0x200
nas kernel: [47155.229864] [<ffffffff81191a3b>] ?
lru_cache_add_active_or_unevictable+0x2b/0xa0
nas kernel: [47155.229865] [<ffffffff811b280c>] ?
handle_mm_fault+0xbac/0x17e0
nas kernel: [47155.229866] [<ffffffff811b6a08>] ?
__vma_link_rb+0xc8/0xf0
nas kernel: [47155.229867] [<ffffffff8120ce68>]
do_vfs_ioctl+0x2f8/0x510
nas kernel: [47155.229869] [<ffffffff81066f76>] ?
__do_page_fault+0x1b6/0x450
nas kernel: [47155.229870] [<ffffffff8120d101>] SyS_ioctl+0x81/0xa0
nas kernel: [47155.229871] [<ffffffff81067240>] ?
do_page_fault+0x30/0x80
nas kernel: [47155.229873] [<ffffffff817d8ab2>]
system_call_fastpath+0x16/0x75
nas kernel: [47155.229874] ---[ end trace e4064ae1c7878a22 ]---
and 2 seconds later:
nas kernel: [47157.228137] ------------[ cut here ]------------
nas kernel: [47157.228190] kernel BUG at
/home/kernel/COD/linux/fs/btrfs/extent-tree.c:2248!
nas kernel: [47157.228259] invalid opcode: 0000 [#1] SMP
nas kernel: [47157.228301] Modules linked in: ufs qnx4 hfsplus hfs minix
ntfs msdos jfs xfs libcrc32c xts gf128mul xt_multiport xt_comment
xt_conntrack xt_nat xt_t
nas kernel: [47157.229656] CPU: 0 PID: 9145 Comm: btrfs Tainted: P
W OE 4.1.7-040107-generic #201509131330
nas kernel: [47157.229741] Hardware name: ASUS All Series/H87I-PLUS,
BIOS 1005 01/06/2014
nas kernel: [47157.229807] task: ffff88011a8cd080 ti: ffff880103efc000
task.ti: ffff880103efc000
nas kernel: [47157.229875] RIP: 0010:[<ffffffffc02e8251>]
[<ffffffffc02e8251>] __btrfs_run_delayed_refs+0x11a1/0x1230 [btrfs]
nas kernel: [47157.229998] RSP: 0018:ffff880103eff7c8 EFLAGS: 00010202
nas kernel: [47157.230048] RAX: 0000000000000001 RBX: 0000000000000000
RCX: 00000000000001e1
nas kernel: [47157.230113] RDX: ffff8800c61ad000 RSI: ffff8800c6adaed0
RDI: ffff8800c6adaec8
nas kernel: [47157.230179] RBP: ffff880103eff8f8 R08: 0000000000000000
R09: 00000001802e002c
nas kernel: [47157.230244] R10: ffffffffc02e75d3 R11: 0000000000000d0a
R12: ffff880056f0c9f8
nas kernel: [47157.230310] R13: 000003cdf0f80000 R14: ffff8800c6adae60
R15: 0000000000000000
nas kernel: [47157.230377] FS: 00007f5f63146900(0000)
GS:ffff88011fa00000(0000) knlGS:0000000000000000
nas kernel: [47157.230451] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
nas kernel: [47157.230504] CR2: 00007f6126ad5000 CR3: 00000000041be000
CR4: 00000000000406f0
nas kernel: [47157.230569] Stack:
nas kernel: [47157.230590] 0000000000000001 0000000000000000
0000042000000001 0000000000000001
nas kernel: [47157.230669] 0000000000000000 0000000000000cf6
ffff88009a930480 00000000000020ae
nas kernel: [47157.230748] 0000000203eff838 0000000000004000
ffff88009a930480 ffff88009a930480
nas kernel: [47157.230827] Call Trace:
nas kernel: [47157.230882] [<ffffffffc02ec483>]
btrfs_run_delayed_refs.part.66+0x73/0x270 [btrfs]
nas kernel: [47157.230975] [<ffffffffc02ec697>]
btrfs_run_delayed_refs+0x17/0x20 [btrfs]
nas kernel: [47157.231065] [<ffffffffc02fd169>]
btrfs_should_end_transaction+0x49/0x60 [btrfs]
nas kernel: [47157.231155] [<ffffffffc02eaaa2>]
btrfs_drop_snapshot+0x472/0x880 [btrfs]
nas kernel: [47157.231251] [<ffffffffc034cb00>] ?
should_ignore_root.part.15+0x50/0x50 [btrfs]
nas kernel: [47157.231347] [<ffffffffc0351d49>]
merge_reloc_roots+0xd9/0x240 [btrfs]
nas kernel: [47157.231433] [<ffffffffc0352119>]
relocate_block_group+0x269/0x670 [btrfs]
nas kernel: [47157.231521] [<ffffffffc03526f6>]
btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
nas kernel: [47157.231618] [<ffffffffc0325cbe>]
btrfs_relocate_chunk.isra.38+0x3e/0xc0 [btrfs]
nas kernel: [47157.231714] [<ffffffffc03270a4>]
__btrfs_balance+0x4e4/0x8b0 [btrfs]
nas kernel: [47157.231799] [<ffffffffc032781a>]
btrfs_balance+0x3aa/0x680 [btrfs]
nas kernel: [47157.231885] [<ffffffffc033086b>] ?
btrfs_ioctl_balance+0x29b/0x520 [btrfs]
nas kernel: [47157.231974] [<ffffffffc0330734>]
btrfs_ioctl_balance+0x164/0x520 [btrfs]
nas kernel: [47157.232062] [<ffffffffc03355f7>]
btrfs_ioctl+0x597/0x2b30 [btrfs]
nas kernel: [47157.232125] [<ffffffff811d2ad5>] ?
alloc_pages_vma+0xb5/0x200
nas kernel: [47157.232183] [<ffffffff81191a3b>] ?
lru_cache_add_active_or_unevictable+0x2b/0xa0
nas kernel: [47157.232253] [<ffffffff811b280c>] ?
handle_mm_fault+0xbac/0x17e0
nas kernel: [47157.232311] [<ffffffff811b6a08>] ?
__vma_link_rb+0xc8/0xf0
nas kernel: [47157.232365] [<ffffffff8120ce68>]
do_vfs_ioctl+0x2f8/0x510
nas kernel: [47157.232421] [<ffffffff81066f76>] ?
__do_page_fault+0x1b6/0x450
nas kernel: [47157.232477] [<ffffffff8120d101>] SyS_ioctl+0x81/0xa0
nas kernel: [47157.232527] [<ffffffff81067240>] ?
do_page_fault+0x30/0x80
nas kernel: [47157.232584] [<ffffffff817d8ab2>]
system_call_fastpath+0x16/0x75
nas kernel: [47157.232640] Code: 48 c7 c7 68 e4 37 c0 e8 de 1a d9 c0 e9
55 f0 ff ff 0f 0b be ba 00 00 00 48 c7 c7 68 e4 37 c0 e8 c6 1a d9 c0 e9
4d f1 ff ff 0f 0b <
nas kernel: [47157.232977] RIP [<ffffffffc02e8251>]
__btrfs_run_delayed_refs+0x11a1/0x1230 [btrfs]
nas kernel: [47157.233072] RSP <ffff880103eff7c8>
nas kernel: [47157.256409] ---[ end trace e4064ae1c7878a23 ]---
When it happens, the system is obviously unstable and I can't umount or
reboot (without the sysreq keys, that is).
When I do reboot, the filesystem is still mountable and seems OK at
first glance (I didn't try a scrub yet). This is reproducible on my
side, and I'm willing to help you debug this!
I can attach the complete dmesg if necessary.
If you need me to try more stuff or dump more information to help
debugging, just ask!
Thanks,
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-14 11:46 kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance Stéphane Lesimple
@ 2015-09-15 14:47 ` Stéphane Lesimple
2015-09-15 14:56 ` Josef Bacik
From: Stéphane Lesimple @ 2015-09-15 14:47 UTC (permalink / raw)
To: linux-btrfs
> I've been experiencing repeated "kernel BUG" occurrences in the past
> few days trying to balance a raid5 filesystem after adding a new drive.
> It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
I ran a scrub on this filesystem after the crash happened twice, and
it found no errors.
The BUG_ON() condition that my filesystem triggers is the following:
BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
// in insert_inline_extent_backref() of extent-tree.c.
I've compiled a fresh 4.3.0-rc1 with a couple of added printk's just before
the BUG_ON(), to dump the parameters passed to
insert_inline_extent_backref() when the problem occurs.
Here is an excerpt of the resulting dmesg:
{btrfs} in insert_inline_extent_backref, got owner <
BTRFS_FIRST_FREE_OBJECTID
{btrfs} with bytenr=4557830635520 num_bytes=16384 parent=4558111506432
root_objectid=3339 owner=1 offset=0 refs_to_add=1
BTRFS_FIRST_FREE_OBJECTID=256
------------[ cut here ]------------
kernel BUG at fs/btrfs/extent-tree.c:1837!
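Plugging the dumped values into the check makes the failure plain: BTRFS_FIRST_FREE_OBJECTID (256) is the first objectid available for regular files, so an owner of 1 cannot be a file's inode number. A trivial sketch of the predicate (not kernel code):

```shell
# Evaluate the BUG_ON() predicate with the values printed above.
BTRFS_FIRST_FREE_OBJECTID=256   # first objectid usable for user files
owner=1                         # value dumped by the debug printk

if [ "$owner" -lt "$BTRFS_FIRST_FREE_OBJECTID" ]; then
    echo "owner < BTRFS_FIRST_FREE_OBJECTID -> BUG_ON() fires"
fi
```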
I'll retry with the exact same kernel once I get the machine back up,
and see if the bug happens again at the same filesystem spot or a
different one.
The variable amount of time that elapses between a balance start and
the bug suggests that this would be a different one.
--
Stéphane.
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-15 14:47 ` Stéphane Lesimple
@ 2015-09-15 14:56 ` Josef Bacik
2015-09-15 21:47 ` Stéphane Lesimple
From: Josef Bacik @ 2015-09-15 14:56 UTC (permalink / raw)
To: Stéphane Lesimple, linux-btrfs
On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:
>> I've been experiencing repeated "kernel BUG" occurrences in the past
>> few days trying to balance a raid5 filesystem after adding a new drive.
>> It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
>
> I ran a scrub on this filesystem after the crash happened twice, and
> it found no errors.
>
> The BUG_ON() condition that my filesystem triggers is the following:
>
> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
> // in insert_inline_extent_backref() of extent-tree.c.
>
> I've compiled a fresh 4.3.0-rc1 with a couple of added printk's just before
> the BUG_ON(), to dump the parameters passed to
> insert_inline_extent_backref() when the problem occurs.
> Here is an excerpt of the resulting dmesg:
>
> {btrfs} in insert_inline_extent_backref, got owner <
> BTRFS_FIRST_FREE_OBJECTID
> {btrfs} with bytenr=4557830635520 num_bytes=16384 parent=4558111506432
> root_objectid=3339 owner=1 offset=0 refs_to_add=1
> BTRFS_FIRST_FREE_OBJECTID=256
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/extent-tree.c:1837!
>
> I'll retry with the exact same kernel once I get the machine back up,
> and see if the bug happens again at the same filesystem spot or a
> different one.
> The variable amount of time that elapses between a balance start and
> the bug suggests that this would be a different one.
>
Does btrfsck complain at all? Thanks,
Josef
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-15 14:56 ` Josef Bacik
@ 2015-09-15 21:47 ` Stéphane Lesimple
2015-09-16 5:02 ` Duncan
From: Stéphane Lesimple @ 2015-09-15 21:47 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs
On 2015-09-15 16:56, Josef Bacik wrote:
> On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:
>>> I've been experiencing repeated "kernel BUG" occurrences in the past
>>> few days trying to balance a raid5 filesystem after adding a new
>>> drive.
>>> It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
>>
>> I ran a scrub on this filesystem after the crash happened twice,
>> and
>> it found no errors.
>>
>> The BUG_ON() condition that my filesystem triggers is the following:
>>
>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
>> // in insert_inline_extent_backref() of extent-tree.c.
>>
>> I've compiled a fresh 4.3.0-rc1 with a couple of added printk's just
>> before
>> the BUG_ON(), to dump the parameters passed to
>> insert_inline_extent_backref() when the problem occurs.
>> Here is an excerpt of the resulting dmesg:
>>
>> {btrfs} in insert_inline_extent_backref, got owner <
>> BTRFS_FIRST_FREE_OBJECTID
>> {btrfs} with bytenr=4557830635520 num_bytes=16384 parent=4558111506432
>> root_objectid=3339 owner=1 offset=0 refs_to_add=1
>> BTRFS_FIRST_FREE_OBJECTID=256
>> ------------[ cut here ]------------
>> kernel BUG at fs/btrfs/extent-tree.c:1837!
>>
>> I'll retry with the exact same kernel once I get the machine back up,
>> and see if the bug happens again at the same filesystem spot or a
>> different one.
>> The variable amount of time that elapses between a balance start and
>> the bug suggests that this would be a different one.
>>
>
> Does btrfsck complain at all?
Thanks for your suggestion.
You're right: even though btrfs scrub didn't complain, btrfsck does:
checking extents
bad metadata [4179166806016, 4179166822400) crossing stripe boundary
bad metadata [4179166871552, 4179166887936) crossing stripe boundary
bad metadata [4179166937088, 4179166953472) crossing stripe boundary
[... some more ...]
extent buffer leak: start 4561066901504 len 16384
extent buffer leak: start 4561078812672 len 16384
extent buffer leak: start 4561078861824 len 16384
[... some more ...]
then some complaints about mismatched counts for qgroups.
I can see from the btrfsck source code that --repair will not work
here, so I didn't try it.
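As a side note, the flagged ranges can be checked against the 64KiB stripe size with plain shell arithmetic (assuming BTRFS_STRIPE_LEN is 64KiB here, which is my reading of the checker):

```shell
# Position of the flagged [start, end) metadata ranges within 64KiB stripes.
stripe=65536
for r in "4179166806016 4179166822400" \
         "4179166871552 4179166887936" \
         "4179166937088 4179166953472"; do
    set -- $r
    echo "start%64K=$(( $1 % stripe )) end%64K=$(( $2 % stripe ))"
done
# each line prints: start%64K=49152 end%64K=0
```

Each flagged 16KiB extent starts 48KiB into a stripe and its exclusive end lands exactly on the next boundary; how btrfsck compares that end offset against the stripe boundary would be the thing to inspect in its source.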
I'm not sure if those errors would be a cause or a consequence of the
bug. As the filesystem was only a few days old and as there was always a
balance running during the crashes, I would be tempted to think it might
actually be a consequence, but I can't be sure.
In your experience, could these inconsistencies cause the crash?
If you think so, then I'll btrfs dev del the 3rd device, then remount
the array degraded with just 1 disk and create a new btrfs system from
scratch on the second, then copy the data in single redundancy, then
re-add the 2 disks and balance convert in raid5.
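Spelled out, that plan would be roughly (a sketch; device names are placeholders):

```shell
# Sketch of the rebuild plan; paths are placeholders.
btrfs device delete /dev/mapper/luks-disk3 /tank    # back to 2 devices
umount /tank

mkfs.btrfs -f /dev/mapper/luks-disk2                # fresh fs on the 2nd disk
mount -o degraded /dev/mapper/luks-disk1 /tank      # old data, one disk short
mount /dev/mapper/luks-disk2 /mnt/new
cp -a /tank/. /mnt/new/                             # copy, single redundancy
umount /tank

# Re-add the freed disks (-f: overwrite their stale superblocks),
# then convert to the target profiles.
btrfs device add -f /dev/mapper/luks-disk1 /dev/mapper/luks-disk3 /mnt/new
btrfs balance start -dconvert=raid5 -mconvert=raid1 /mnt/new
```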
If you think not, then this array could still help you debug a corner
case, and I can keep it that way for a couple days if more testing/debug
is needed.
Thanks,
--
Stéphane
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-15 21:47 ` Stéphane Lesimple
@ 2015-09-16 5:02 ` Duncan
2015-09-16 10:28 ` Stéphane Lesimple
From: Duncan @ 2015-09-16 5:02 UTC (permalink / raw)
To: linux-btrfs
Stéphane Lesimple posted on Tue, 15 Sep 2015 23:47:01 +0200 as excerpted:
> On 2015-09-15 16:56, Josef Bacik wrote:
>> On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:
>>>> I've been experiencing repeated "kernel BUG" occurrences in the past
>>>> few days trying to balance a raid5 filesystem after adding a new
>>>> drive.
>>>> It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
>>>
>>> I ran a scrub on this filesystem after the crash happened twice,
>>> and it found no errors.
>>>
>>> The BUG_ON() condition that my filesystem triggers is the following:
>>>
>>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
>>> // in insert_inline_extent_backref() of extent-tree.c.
>>>
>> Does btrfsck complain at all?
Just to elucidate a bit...
Scrub is designed to detect, and, where there's a second copy available
(dup or raid1/10 modes; raid5/6 modes can reconstruct from parity),
correct exactly one problem: corruption where the checksum stored at
data-write time doesn't match the one computed on the data read back
from storage. As such, it detects/corrects media errors and (perhaps
more commonly) data corrupted by crashes in the middle of a write. But
if the data was bad when it was written in the first place, the checksum
covering it simply validates what was already bad before the write
happened; scrub will be none the wiser and will happily validate the
incorrect data, since it's a totally valid checksum covering data that
was bad before the checksum was ever created.
Which is where btrfs check comes in and why JB asked you to run it, since
unlike scrub, check is designed to catch filesystem logic errors.
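In command terms, the distinction looks like this (a sketch; the device path is a placeholder):

```shell
# Scrub: runs against a *mounted* filesystem; verifies checksums and
# repairs from the second copy / parity where it can.
btrfs scrub start -B /tank        # -B: stay in the foreground, print stats

# Check: walks the filesystem structures looking for logic errors;
# run against an *unmounted* device, read-only unless --repair is given.
btrfs check /dev/mapper/luks-disk1
```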
> Thanks for your suggestion.
> You're right: even though btrfs scrub didn't complain, btrfsck does:
>
> checking extents
> bad metadata [4179166806016, 4179166822400) crossing stripe boundary
> bad metadata [4179166871552, 4179166887936) crossing stripe boundary
> bad metadata [4179166937088, 4179166953472) crossing stripe boundary
This is an actively in-focus bug ATM, and while I'm not a dev and can't
tell you for sure that it's behind the specific balance-related crash and
traces you posted (tho I believe it so), it certainly has the potential
to be that serious, yes.
The most common cause is a buggy btrfs-convert that was creating invalid
btrfs when converting from ext* at one point. AFAIK they've hotfixed the
immediate convert issue, but are still actively working on a longer term
proper fix. Meanwhile, while btrfs check does now detect the issue (and
even that is quite new code, added in 4.2 I believe), there's still no
real fix for what was after all a defective btrfs from the moment the
convert was done.
So where that's the cause, the filesystem was created from an ext* fs
using a buggy btrfs-convert and is thus actually invalid due to this
cross-stripe-metadata, the current fix is to back up the files you want
to keep (and FWIW, as any good sysadmin will tell you, a backup that
hasn't been tested restorable isn't yet a backup, as the job isn't
complete), then blow away and recreate the filesystem properly, using
mkfs.btrfs, and of course then restore to the new filesystem.
If, however, you created the filesystem using mkfs.btrfs, then the
problem must have occurred some other way. Whether there's some other
cause beyond the known cause, a buggy btrfs-convert, has in fact been in
question, so in this case the devs are likely to be quite interested
indeed in your case and perhaps the filesystem history that brought you
to this point. The ultimate fix is likely to be the same (unless the
devs have you test new fix code for btrfs check --repair), but I'd
strongly urge you to delay blowing away the filesystem, if possible,
until the devs have a chance to ask you to run other diagnostics and
perhaps even get a btrfs-image for them, since you may well have
accidentally found a corner-case they'll have trouble reproducing,
without your information.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-16 5:02 ` Duncan
@ 2015-09-16 10:28 ` Stéphane Lesimple
2015-09-16 10:46 ` Holger Hoffstätte
From: Stéphane Lesimple @ 2015-09-16 10:28 UTC (permalink / raw)
To: Duncan; +Cc: linux-btrfs
On 2015-09-16 07:02, Duncan wrote:
> Stéphane Lesimple posted on Tue, 15 Sep 2015 23:47:01 +0200 as
> excerpted:
>
>> On 2015-09-15 16:56, Josef Bacik wrote:
>>> On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:
>>>>> I've been experiencing repeated "kernel BUG" occurrences in the
>>>>> past
>>>>> few days trying to balance a raid5 filesystem after adding a new
>>>>> drive.
>>>>> It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
>>>>
>>>> I ran a scrub on this filesystem after the crash happened twice,
>>>> and it found no errors.
>>>>
>>>> The BUG_ON() condition that my filesystem triggers is the following:
>>>>
>>>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
>>>> // in insert_inline_extent_backref() of extent-tree.c.
>>>>
>>> Does btrfsck complain at all?
>
> Just to elucidate a bit...
> [...]
> Which is where btrfs check comes in and why JB asked you to run it,
> since
> unlike scrub, check is designed to catch filesystem logic errors.
Thanks for the clarification, Duncan, that makes perfect sense.
>> You're right: even though btrfs scrub didn't complain, btrfsck does:
>>
>> checking extents
>> bad metadata [4179166806016, 4179166822400) crossing stripe boundary
>> bad metadata [4179166871552, 4179166887936) crossing stripe boundary
>> bad metadata [4179166937088, 4179166953472) crossing stripe boundary
>
> This is an actively in-focus bug ATM, and while I'm not a dev and can't
> tell you for sure that it's behind the specific balance-related crash
> and
> traces you posted (tho I believe it so), it certainly has the potential
> to be that serious, yes.
>
> The most common cause is a buggy btrfs-convert that was creating
> invalid
> btrfs when converting from ext* at one point. AFAIK they've hotfixed
> the
> immediate convert issue, but are still actively working on a longer
> term
> proper fix. Meanwhile, while btrfs check does now detect the issue
> (and
> even that is quite new code, added in 4.2 I believe), there's still no
> real fix for what was after all a defective btrfs from the moment the
> convert was done.
> [...]
> If, however, you created the filesystem using mkfs.btrfs, then the
> problem must have occurred some other way. Whether there's some other
> cause beyond the known cause, a buggy btrfs-convert, has in fact been
> in
> question, so in this case the devs are likely to be quite interested
> indeed in your case and perhaps the filesystem history that brought you
> to this point. The ultimate fix is likely to be the same (unless the
> devs have you test new fix code for btrfs check --repair), but I'd
> strongly urge you to delay blowing away the filesystem, if possible,
> until the devs have a chance to ask you to run other diagnostics and
> perhaps even get a btrfs-image for them, since you may well have
> accidentally found a corner-case they'll have trouble reproducing,
> without your information.
Nice to know that this bug was already somewhat known, but I can confirm
that it actually doesn't come from an ext4 conversion in my case.
Here is the filesystem history, which is actually quite short:
- FS created from scratch, no convert, on 2x4T devices using mkfs.btrfs
with raid1 metadata, raid5 data. This was done with the 4.2 tools and
kernel 3.19, so a couple of incompat features were turned on by default
(such as skinny metadata).
- Approx. 4T worth of files copied to it, a bit less, I had around 30G
free after the copy.
- Upgraded to kernel 4.2.0
- Added a third 4T device to the filesystem
- Ran a balance to get an even distribution of data/metadata across the
3 drives
- Kernel BUG after a couple of hours. The btrfs balance userspace tool
segfaulted at the same time. Due to apport's default configuration (damn
you, Ubuntu!), the core file was discarded, but I don't think the
segfault is really interesting. The kernel trace is.
This was all done within ~1 week.
I've just created an image of the metadata, using btrfs-image -s. The
image is 2.9G; I can drop it somewhere in case a dev would like to
have a look at it.
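For completeness, the capture and an eventual restore would look roughly like this (a sketch; paths are placeholders):

```shell
# Metadata-only image; -s sanitizes file names, -c9 compresses.
btrfs-image -c9 -s /dev/mapper/luks-disk1 /tmp/tank.img

# A dev can later materialize it onto a scratch device for inspection:
btrfs-image -r /tmp/tank.img /dev/scratch
```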
For what it's worth, I've been hitting another kernel BUG, almost
certainly related, while trying to dev del the 3rd device, after 8 hours
of work (kernel 4.1.7):
kernel BUG at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:2248!
in __btrfs_run_delayed_refs+0x11a1/0x1230 [btrfs]
Trace:
[<ffffffff813d9a65>] ? __percpu_counter_add+0x55/0x70
[<ffffffffc02ea483>] btrfs_run_delayed_refs.part.66+0x73/0x270 [btrfs]
[<ffffffffc02ea697>] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
[<ffffffffc02fb169>] btrfs_should_end_transaction+0x49/0x60 [btrfs]
[<ffffffffc02e8aa2>] btrfs_drop_snapshot+0x472/0x880 [btrfs]
[<ffffffffc034ab00>] ? should_ignore_root.part.15+0x50/0x50 [btrfs]
[<ffffffffc034fd49>] merge_reloc_roots+0xd9/0x240 [btrfs]
[<ffffffffc0350119>] relocate_block_group+0x269/0x670 [btrfs]
[<ffffffffc03506f6>] btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
[<ffffffffc0323cbe>] btrfs_relocate_chunk.isra.38+0x3e/0xc0 [btrfs]
[<ffffffffc0324944>] btrfs_shrink_device+0x1d4/0x450 [btrfs]
[<ffffffffc0328d43>] btrfs_rm_device+0x323/0x810 [btrfs]
[<ffffffffc0334ee6>] btrfs_ioctl+0x1e86/0x2b30 [btrfs]
[<ffffffff81183544>] ? filemap_map_pages+0x1d4/0x230
[<ffffffff811b29f5>] ? handle_mm_fault+0xd95/0x17e0
[<ffffffff81115112>] ? from_kgid_munged+0x12/0x20
[<ffffffff811fe710>] ? cp_new_stat+0x140/0x160
[<ffffffff8120ce68>] do_vfs_ioctl+0x2f8/0x510
[<ffffffff81066f76>] ? __do_page_fault+0x1b6/0x450
[<ffffffff811fe75f>] ? SYSC_newstat+0x2f/0x40
[<ffffffff8120d101>] SyS_ioctl+0x81/0xa0
[<ffffffff81067240>] ? do_page_fault+0x30/0x80
[<ffffffff817d8ab2>] system_call_fastpath+0x16/0x75
If JB or any other btrfs dev wants me to try anything on this
filesystem before I recreate it from scratch, such as a kernel patch, a
userland tool patch, or running a more verbose debug balance, I would be
happy to do so.
If this is the case, please tell me, so I can keep the filesystem as it
is. On the other hand, if you're sure the btrfs-image is enough, please
tell me too, so I can go forward and fix my system :)
Thanks,
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-16 10:28 ` Stéphane Lesimple
@ 2015-09-16 10:46 ` Holger Hoffstätte
2015-09-16 13:04 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Holger Hoffstätte @ 2015-09-16 10:46 UTC (permalink / raw)
To: Stéphane Lesimple; +Cc: linux-btrfs
On 09/16/15 12:28, Stéphane Lesimple wrote:
> Nice to know that this bug was already somewhat known, but I can
> confirm that it actually doesn't come from an ext4 conversion in my
> case.
In that case the "crossing stripe boundary" messages are false positives
in btrfs-progs-4.2: http://www.spinics.net/lists/linux-btrfs/msg47059.html
This should be fixed in the next release.
-h
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-16 10:46 ` Holger Hoffstätte
@ 2015-09-16 13:04 ` Stéphane Lesimple
2015-09-16 20:18 ` Duncan
2015-09-17 6:29 ` Stéphane Lesimple
0 siblings, 2 replies; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-16 13:04 UTC (permalink / raw)
To: linux-btrfs
On 2015-09-16 12:46, Holger Hoffstätte wrote:
> On 09/16/15 12:28, Stéphane Lesimple wrote:
>> Nice to know that this bug was already somewhat known, but I can
>> confirm that it actually doesn't come from an ext4 conversion in my
>> case.
>
> In that case the "crossing stripe boundary" messages are false
> positives
> in btrfs-progs-4.2:
> http://www.spinics.net/lists/linux-btrfs/msg47059.html
>
> This should be fixed in the next release.
Out of curiosity, I compiled the btrfs-progs-4.2 release patched with
the diff you referenced to fix the off-by-one error, and ran a btrfsck
again.
Indeed, those errors disappear and my filesystem seems clean in this
regard. I also disabled quota, because it almost certainly has nothing
to do with the bug, and now btrfsck is 100% happy:
---------
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on
/dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164
UUID: 6bec1608-d9c0-453e-87eb-8b8663c9010d
found 2922178546042 bytes used err is 0
total csum bytes: 2849102736
total tree bytes: 4697341952
total fs tree bytes: 1276395520
total extent tree bytes: 90963968
btree space waste bytes: 640514848
file data blocks allocated: 2959998808064
referenced 2959997575168
btrfs-progs v4.2-dirty
---------
So this is even more interesting: my filesystem is reported by scrub
and fsck as being in perfect shape, but still crashes the kernel from
time to time on balance.
Next step: reboot under 4.3.0-rc1 with my printk's, run a balance, log
the crash, reboot, balance again, crash again, and compare. If the same
filesystem spot triggers the crash twice, that would point to an
undetected metadata/filesystem internal integrity corruption; if it
crashes at 2 different spots, then maybe it's some kind of race
condition that, for some reason, my system hits way more often than
others.
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-16 13:04 ` Stéphane Lesimple
@ 2015-09-16 20:18 ` Duncan
2015-09-16 20:41 ` Stéphane Lesimple
2015-09-17 6:29 ` Stéphane Lesimple
1 sibling, 1 reply; 37+ messages in thread
From: Duncan @ 2015-09-16 20:18 UTC (permalink / raw)
To: linux-btrfs
Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as excerpted:
> On 2015-09-16 12:46, Holger Hoffstätte wrote:
>>
>> In that case the "crossing stripe boundary" messages are false
>> positives in btrfs-progs-4.2:
>> http://www.spinics.net/lists/linux-btrfs/msg47059.html
>>
>> This should be fixed in the next release.
Thanks for reminding me of that, Holger. I think I was too sleepy when I
read it on-list, and forgot about it...
> Out of curiosity I compiled the btrfs-progs-4.2 release patched with the
> diff you're referencing to fix the off-by-one error, and ran a btrfsck
> again.
> Indeed those errors disappear and my filesystem seems clean in this
> regard. I also disabled quota because it has almost for sure nothing to
> do with the bug, and now btrfsck is 100% happy:
Yes. Quotas have been a continuing issue on btrfs. AFAIK, they're on
the third rewrite now, and still have some bugs to work out. So what
I've been recommending is unless you're (a) directly and specifically
working with the devs to find and fix the quota issues (in which case
please keep at it! =:^), either you (b) really depend on quotas working
and btrfs isn't appropriate for you at this time, since they're known to
be buggy, so use a more mature filesystem where they're known to work, or
(c) you don't actually need quotas at all, so disable them and remove the
quota tracking metadata, thus avoiding the bugs in the feature entirely.
=:^)
(Agreed with your conclusions and nothing to add to the rest of your
message, so omitted here.)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-16 20:18 ` Duncan
@ 2015-09-16 20:41 ` Stéphane Lesimple
2015-09-17 3:03 ` Qu Wenruo
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-16 20:41 UTC (permalink / raw)
To: linux-btrfs
On 2015-09-16 22:18, Duncan wrote:
> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
> excerpted:
>
>> On 2015-09-16 12:46, Holger Hoffstätte wrote:
>>>
>> I also disabled quota because it has almost for sure nothing to
>> do with the bug, and now btrfsck is 100% happy:
>
> Yes. Quotas have been a continuing issue on btrfs. AFAIK, they're on
> the third rewrite now, and still have some bugs to work out. So what
> I've been recommending is unless you're (a) directly and specifically
> working with the devs to find and fix the quota issues (in which case
> please keep at it! =:^), either you (b) really depend on quotas working
> and btrfs isn't appropriate for you at this time, since they're known
> to
> be buggy, so use a more mature filesystem where they're known to work,
> or
> (c) you don't actually need quotas at all, so disable them and remove
> the
> quota tracking metadata, thus avoiding the bugs in the feature
> entirely.
> =:^)
Well, actually it's the (d) option ;)
I activated the quota feature for only one reason: being able to track
down how much space my snapshots are taking.
I've been using ZFS in the past, and under btrfs I was really missing
the "zfs list" command, which can tell you how much space a given
snapshot is actually taking.
With quota enabled, I was able to somewhat mimic zfs list with a perl
script I wrote, btrfs-list:
PATH ID TYPE REFER USED
'tank' -1 df - 2.66T
(13.26G free)
/tank 5 vol 48.00K 48.00K
media 1906 subvol 1.04T 1.04T
photos 1909 subvol 116.37G 116.37G
main 1911 subvol 973.23G 973.23G
bkp-slash 3270 subvol 15.86G 15.86G
bkp-quasar 3314 subvol 18.26G 16.00K
.snaps/bkp-quasar@2015-01-17 3317 rosnap 17.77G 40.76M
.snaps/bkp-quasar@2015-03-06 3318 rosnap 17.89G 88.88M
.snaps/bkp-quasar@2015-04-05 3319 rosnap 17.92G 90.97M
.snaps/bkp-quasar@2015-05-31 3320 rosnap 17.95G 1.02M
.snaps/bkp-quasar@2015-06-13 3321 rosnap 17.95G 760.00K
.snaps/bkp-quasar@2015-07-26 3322 rosnap 18.19G 17.88M
.snaps/bkp-quasar@2015-07-31 3323 rosnap 18.19G 14.58M
.snaps/bkp-quasar@2015-08-03 3324 rosnap 18.19G 17.43M
bkp-liznodisk 3341 subvol 7.01G 16.00K
.snaps/bkp-liznodisk@2015-03-01 3342 rosnap 6.96G 75.37M
.snaps/bkp-liznodisk@2015-03-28 3343 rosnap 6.98G 84.93M
.snaps/bkp-liznodisk@2015-04-26 3344 rosnap 6.96G 67.14M
.snaps/bkp-liznodisk@2015-05-24 3345 rosnap 6.95G 47.67M
.snaps/bkp-liznodisk@2015-06-27 3346 rosnap 6.96G 67.97M
.snaps/bkp-liznodisk@2015-07-25 3347 rosnap 6.98G 60.30M
.snaps/bkp-liznodisk@2015-08-16 3348 rosnap 7.10G 159.44M
bkp-skyline 3367 subvol 22.52G 22.52G
I just pushed it to https://github.com/speed47/btrfs-list, if anybody is
interested.
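For the curious, the core idea behind such a script can be sketched in a few lines of Python. This is a minimal sketch, not the actual btrfs-list code, and the sample text below only mimics the assumed `btrfs qgroup show` column layout: with quota enabled, every subvolume or snapshot gets a level-0 qgroup "0/<subvolid>", and its rfer/excl counters play the role of zfs's REFER/USED.

```python
# Minimal sketch (not the actual btrfs-list code): parse qgroup output
# and print per-subvolume referenced/exclusive sizes, zfs-list style.

def human(n):
    """Format a byte count the way btrfs tools do (1024-based, 2 decimals)."""
    for suffix in ("B", "K", "M", "G", "T"):
        if n < 1024:
            return "%.2f%s" % (n, suffix)
        n /= 1024.0
    return "%.2fP" % n

def parse_qgroup_show(text):
    """Return {subvol_id: (referenced_bytes, exclusive_bytes)} from
    level-0 qgroup lines ("0/<subvolid> <rfer> <excl>")."""
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].startswith("0/"):
            usage[int(fields[0][2:])] = (int(fields[1]), int(fields[2]))
    return usage

# sample input mimicking the assumed `btrfs qgroup show` layout
sample = """\
qgroupid         rfer         excl
--------         ----         ----
0/3317    19079790592     42738647
0/3341          16384        16384
"""

for subvol, (rfer, excl) in sorted(parse_qgroup_show(sample).items()):
    print("subvol %-5d  refer %8s  used %8s" % (subvol, human(rfer), human(excl)))
```

A snapshot whose excl is tiny while its rfer is large (like 0/3317 here) is exactly the "this snapshot costs almost nothing" answer zfs list gives.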
Anyway, balance has been running again for 7+ hours, trying to
reproduce the bug a second time, but no crash yet... it should happen
soon now!
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-16 20:41 ` Stéphane Lesimple
@ 2015-09-17 3:03 ` Qu Wenruo
2015-09-17 6:11 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-17 3:03 UTC (permalink / raw)
To: Stéphane Lesimple, linux-btrfs
Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:
> On 2015-09-16 22:18, Duncan wrote:
>> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as excerpted:
>>
>>> On 2015-09-16 12:46, Holger Hoffstätte wrote:
>>>>
>>> I also disabled quota because it has almost for sure nothing to
>>> do with the bug, and now btrfsck is 100% happy:
>>
>> Yes. Quotas have been a continuing issue on btrfs. AFAIK, they're on
>> the third rewrite now, and still have some bugs to work out. So what
>> I've been recommending is unless you're (a) directly and specifically
>> working with the devs to find and fix the quota issues (in which case
>> please keep at it! =:^), either you (b) really depend on quotas working
>> and btrfs isn't appropriate for you at this time, since they're known to
>> be buggy, so use a more mature filesystem where they're known to work, or
>> (c) you don't actually need quotas at all, so disable them and remove the
>> quota tracking metadata, thus avoiding the bugs in the feature entirely.
>> =:^)
>
> Well actually it's the (d) option ;)
> I activate the quota feature for only one reason : being able to track
> down how much space my snapshots are taking.
> I've been using ZFS in the past, and I was really missing the "zfs list"
> command that is able to tell you how much space a given snapshot is
> actually taking under btrfs.
> With quota enabled, I was able to somehow mimic zfs list with a perl
> script I wrote, btrfs-list :
>
> PATH ID TYPE REFER USED
> 'tank' -1 df - 2.66T
> (13.26G free)
> /tank 5 vol 48.00K 48.00K
> media 1906 subvol 1.04T 1.04T
> photos 1909 subvol 116.37G 116.37G
> main 1911 subvol 973.23G 973.23G
> bkp-slash 3270 subvol 15.86G 15.86G
> bkp-quasar 3314 subvol 18.26G 16.00K
> .snaps/bkp-quasar@2015-01-17 3317 rosnap 17.77G 40.76M
> .snaps/bkp-quasar@2015-03-06 3318 rosnap 17.89G 88.88M
> .snaps/bkp-quasar@2015-04-05 3319 rosnap 17.92G 90.97M
> .snaps/bkp-quasar@2015-05-31 3320 rosnap 17.95G 1.02M
> .snaps/bkp-quasar@2015-06-13 3321 rosnap 17.95G 760.00K
> .snaps/bkp-quasar@2015-07-26 3322 rosnap 18.19G 17.88M
> .snaps/bkp-quasar@2015-07-31 3323 rosnap 18.19G 14.58M
> .snaps/bkp-quasar@2015-08-03 3324 rosnap 18.19G 17.43M
> bkp-liznodisk 3341 subvol 7.01G 16.00K
> .snaps/bkp-liznodisk@2015-03-01 3342 rosnap 6.96G 75.37M
> .snaps/bkp-liznodisk@2015-03-28 3343 rosnap 6.98G 84.93M
> .snaps/bkp-liznodisk@2015-04-26 3344 rosnap 6.96G 67.14M
> .snaps/bkp-liznodisk@2015-05-24 3345 rosnap 6.95G 47.67M
> .snaps/bkp-liznodisk@2015-06-27 3346 rosnap 6.96G 67.97M
> .snaps/bkp-liznodisk@2015-07-25 3347 rosnap 6.98G 60.30M
> .snaps/bkp-liznodisk@2015-08-16 3348 rosnap 7.10G 159.44M
> bkp-skyline 3367 subvol 22.52G 22.52G
>
> I just pushed it to https://github.com/speed47/btrfs-list, if anybody is
> interested.
>
> Anyway, balance is running again for 7+ hours, trying to reproduce the
> bug twice, but no crash yet ... should happen soon now !
>
Yeah, that's exactly one of the ideal use cases of btrfs qgroup.
But I'm quite curious about the btrfsck error report on qgroup.
If btrfsck reports such errors, it means either I was too confident
about the recent qgroup accounting rework, or btrfsck has some bug that
I didn't take into consideration during the kernel rework.
Would you please provide the full result of the previous btrfsck with
the qgroup errors?
Thanks,
Qu
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-17 3:03 ` Qu Wenruo
@ 2015-09-17 6:11 ` Stéphane Lesimple
2015-09-17 6:42 ` Qu Wenruo
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-17 6:11 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On 2015-09-17 05:03, Qu Wenruo wrote:
> Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:
>> On 2015-09-16 22:18, Duncan wrote:
>>> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
>>> excerpted:
>>>
>>
>> Well actually it's the (d) option ;)
>> I activate the quota feature for only one reason : being able to track
>> down how much space my snapshots are taking.
>
> Yeah, that's completely one of the ideal use case of btrfs qgroup.
>
> But I'm quite curious about the btrfsck error report on qgroup.
>
> If btrfsck report such error, it means either I'm too confident about
> the recent qgroup accounting rework, or btrfsck has some bug which I
> didn't take much consideration during the kernel rework.
>
> Would you please provide the full result of previous btrfsck with
> qgroup error?
Sure, I've saved the log somewhere just in case, here you are:
Counts for qgroup id: 3359 are different
our: referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our: exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3361 are different
our: referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our: exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3362 are different
our: referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our: exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3363 are different
our: referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our: exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3361 are different
our: referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our: exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3362 are different
our: referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our: exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3363 are different
our: referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our: exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3364 are different
our: referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our: exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3365 are different
our: referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our: exclusive 49152 exclusive compressed 49152
disk: exclusive 32768 exclusive compressed 32768
diff: exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3366 are different
our: referenced 7530119168 referenced compressed 7530119168
disk: referenced 7530086400 referenced compressed 7530086400
diff: referenced 32768 referenced compressed 32768
our: exclusive 16384 exclusive compressed 16384
disk: exclusive 16384 exclusive compressed 16384
If you need the really complete log (even with the "crossing stripe
boundary" false positives), I can post it somewhere for you.
I can't do a btrfs qgroup show because I've since disabled quota as
stated above. However you can find the IDs corresponding to subvolumes
and snapshots in my previous post.
Don't hesitate to ask for more information if needed.
Thanks,
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-16 13:04 ` Stéphane Lesimple
2015-09-16 20:18 ` Duncan
@ 2015-09-17 6:29 ` Stéphane Lesimple
2015-09-17 7:54 ` Stéphane Lesimple
1 sibling, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-17 6:29 UTC (permalink / raw)
To: linux-btrfs
On 2015-09-16 15:04, Stéphane Lesimple wrote:
> I also disabled quota because it has almost for sure nothing
> to do with the bug
As it turns out, this assertion was completely wrong.
I've had balance running for more than 16 hours now, without a crash.
This is almost 50% of the work done without any issue. Before, a crash
would happen within minutes, sometimes an hour, but not much more. The
thing is, I didn't change anything on the filesystem, well, apart from
the benign quota disable. So Qu's question about the qgroup errors in
fsck made me wonder: if I activate quota again, it'll still continue to
balance flawlessly, right?
Well, it doesn't. I just ran btrfs quota enable on my filesystem; it
completed successfully after some minutes (rescan -s said that no rescan
was pending). Then, less than 5 minutes later, the kernel crashed, at
the same BUG_ON() as usual:
[60156.062082] BTRFS info (device dm-3): relocating block group
972839452672 flags 129
[60185.203626] BTRFS info (device dm-3): found 1463 extents
[60414.452890] {btrfs} in insert_inline_extent_backref, got owner <
BTRFS_FIRST_FREE_OBJECTID
[60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384
parent=5336636473344 root_objectid=3358 owner=1 offset=0 refs_to_add=1
BTRFS_FIRST_FREE_OBJECTID=256
[60414.452924] ------------[ cut here ]------------
[60414.452928] kernel BUG at fs/btrfs/extent-tree.c:1837!
owner is 1 again at this point in the code (this is still kernel
4.3.0-rc1 with my added printks).
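To make the failing invariant concrete (this is my reading of the check; the authoritative condition is the BUG_ON in fs/btrfs/extent-tree.c), here is a tiny Python restatement of what the added printk is flagging:

```python
# Objectids below BTRFS_FIRST_FREE_OBJECTID (256) are reserved for
# btrfs' internal trees; real inodes always get objectids >= 256.
# An inline *data* backref stores the owning inode number, so seeing
# owner=1 on that path means a metadata-style owner landed where a file
# objectid belongs, which is what the BUG_ON refuses to accept.
BTRFS_FIRST_FREE_OBJECTID = 256

def valid_data_backref_owner(owner):
    """True if `owner` can legitimately be an inode number in a data backref."""
    return owner >= BTRFS_FIRST_FREE_OBJECTID

# values printed by the debug printk above
assert not valid_data_backref_owner(1)    # owner=1 -> trips the BUG_ON
assert valid_data_backref_owner(256)      # first valid inode objectid
```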
So I'll disable quota, again, and resume the balance. If I'm right, it
should proceed without issue for 18 more hours !
Qu, my filesystem is at your disposal :)
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-17 6:11 ` Stéphane Lesimple
@ 2015-09-17 6:42 ` Qu Wenruo
2015-09-17 8:02 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-17 6:42 UTC (permalink / raw)
To: Stéphane Lesimple; +Cc: linux-btrfs
Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:
> On 2015-09-17 05:03, Qu Wenruo wrote:
>> Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:
>>> On 2015-09-16 22:18, Duncan wrote:
>>>> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
>>>> excerpted:
>>>>
>>>
>>> Well actually it's the (d) option ;)
>>> I activate the quota feature for only one reason : being able to track
>>> down how much space my snapshots are taking.
>>
>> Yeah, that's completely one of the ideal use case of btrfs qgroup.
>>
>> But I'm quite curious about the btrfsck error report on qgroup.
>>
>> If btrfsck report such error, it means either I'm too confident about
>> the recent qgroup accounting rework, or btrfsck has some bug which I
>> didn't take much consideration during the kernel rework.
>>
>> Would you please provide the full result of previous btrfsck with
>> qgroup error?
>
> Sure, I've saved the log somewhere just in case, here your are :
>
> Counts for qgroup id: 3359 are different
> our: referenced 7530119168 referenced compressed 7530119168
> disk: referenced 7530086400 referenced compressed 7530086400
> diff: referenced 32768 referenced compressed 32768
> our: exclusive 49152 exclusive compressed 49152
> disk: exclusive 32768 exclusive compressed 32768
> diff: exclusive 16384 exclusive compressed 16384
> [...]
>
> If you need the really complete log (even with the "crossing stripe
> boundary" false positives), I can post it somewhere for you.
>
> I can't do a btrfs qgroup show because I've since disabled quota as
> stated above. However you can find the IDs corresponding to subvolumes
> and snapshots in my previous post.
>
> Don't hesitate to ask for more information if needed.
>
> Thanks,
>
Thanks for your log, pretty interesting result.
BTW, did you enable qgroup on an old kernel, earlier than 4.2-rc1?
If so, I would be much more relaxed, as the errors could then be a
problem of the old kernels.
If it's OK for you, would you please enable quota again after
reproducing the bug, use the filesystem for some time, and then recheck
it?
Thanks,
Qu
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-17 6:29 ` Stéphane Lesimple
@ 2015-09-17 7:54 ` Stéphane Lesimple
0 siblings, 0 replies; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-17 7:54 UTC (permalink / raw)
To: linux-btrfs
On 2015-09-17 08:29, Stéphane Lesimple wrote:
> On 2015-09-16 15:04, Stéphane Lesimple wrote:
>> I also disabled quota because it has almost for sure nothing
>> to do with the bug
>
> As it turns out, it seems that this assertion was completely wrong.
>
> I've got balance running for more than 16 hours now, without a crash.
> This is almost 50% of the work done without any issue. Before, a crash
> would happen within minutes, sometimes 1 hour, but not much more. The
> problem is, I didn't change anything to the filesystem, well, appart
> from the benign quota disable. So Qu's question about the qgroups
> errors in fsck made me wonder : if I activate quota again, it'll still
> continue to balance flawlessly, right ?
>
> Well, it doesn't. I just ran btrfs quota enable on my filesystem, it
> completed successfully after some minutes (rescan -s said that no
> rescan was pending). Then less than 5 minutes later, the kernel
> crashed, at the same BUG_ON() than usually :
>
> [60156.062082] BTRFS info (device dm-3): relocating block group
> 972839452672 flags 129
> [60185.203626] BTRFS info (device dm-3): found 1463 extents
> [60414.452890] {btrfs} in insert_inline_extent_backref, got owner <
> BTRFS_FIRST_FREE_OBJECTID
> [60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384
> parent=5336636473344 root_objectid=3358 owner=1 offset=0 refs_to_add=1
> BTRFS_FIRST_FREE_OBJECTID=256
> [60414.452924] ------------[ cut here ]------------
> [60414.452928] kernel BUG at fs/btrfs/extent-tree.c:1837!
>
> owner is=1 again at this point in the code (this is still kernel
> 4.3.0-rc1 with my added printks).
>
> So I'll disable quota, again, and resume the balance. If I'm right, it
> should proceed without issue for 18 more hours !
Damn, wrong again. It just re-crashed without quota enabled :(
The fact that it went perfectly well for 17+ hours and then crashed
minutes after I reactivated quota might have been complete chance,
then...
[ 5487.706499] {btrfs} in insert_inline_extent_backref, got owner <
BTRFS_FIRST_FREE_OBJECTID
[ 5487.706504] {btrfs} with bytenr=6906661109760 num_bytes=16384
parent=6905020874752 root_objectid=18446744073709551608 owner=1 offset=0
refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256
[ 5487.706536] ------------[ cut here ]------------
[ 5487.706539] kernel BUG at fs/btrfs/extent-tree.c:1837!
For reference, the crash I had earlier this morning was as follows:
[60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384
parent=5336636473344 root_objectid=3358 owner=1 offset=0 refs_to_add=1
BTRFS_FIRST_FREE_OBJECTID=256
So, this is a completely different part of the filesystem.
The bug is always the same, though: owner=1, where it should be >= 256
(BTRFS_FIRST_FREE_OBJECTID).
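As an aside, the huge root_objectid in the second trace is not garbage but a negative tree id printed as an unsigned 64-bit value; decoding it (my reading, using the -8 constant that the btrfs on-disk format headers name BTRFS_TREE_RELOC_OBJECTID, the relocation tree used by balance):

```python
# root_objectid=18446744073709551608 from the trace is -8 stored in a
# u64, i.e. BTRFS_TREE_RELOC_OBJECTID (the per-balance relocation
# tree), which fits a crash happening during relocation.
def u64_to_signed(v):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    return v - (1 << 64) if v >= (1 << 63) else v

BTRFS_TREE_RELOC_OBJECTID = -8  # from the btrfs on-disk format

root_objectid = 18446744073709551608  # value printed in the trace above
assert u64_to_signed(root_objectid) == BTRFS_TREE_RELOC_OBJECTID
print("root_objectid decodes to tree %d" % u64_to_signed(root_objectid))
```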
Balance cancelled.
To me, it sounds like some sort of race condition. But I'm out of ideas
on what to test now.
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-17 6:42 ` Qu Wenruo
@ 2015-09-17 8:02 ` Stéphane Lesimple
2015-09-17 8:11 ` Qu Wenruo
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-17 8:02 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On 2015-09-17 08:42, Qu Wenruo wrote:
> Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:
>> On 2015-09-17 05:03, Qu Wenruo wrote:
>>> Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:
>>>> On 2015-09-16 22:18, Duncan wrote:
>>>>> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
>>>>> excerpted:
>>>>>
>>>>
>>>> Well actually it's the (d) option ;)
>>>> I activate the quota feature for only one reason : being able to
>>>> track
>>>> down how much space my snapshots are taking.
>>>
>>> Yeah, that's completely one of the ideal use case of btrfs qgroup.
>>>
>>> But I'm quite curious about the btrfsck error report on qgroup.
>>>
>>> If btrfsck report such error, it means either I'm too confident about
>>> the recent qgroup accounting rework, or btrfsck has some bug which I
>>> didn't take much consideration during the kernel rework.
>>>
>>> Would you please provide the full result of previous btrfsck with
>>> qgroup error?
>>
>> Sure, I've saved the log somewhere just in case, here your are :
>>
>> [...]
> Thanks for your log, pretty interesting result.
>
> BTW, did you enabled qgroup from old kernel earlier than 4.2-rc1?
> If so, I would be much relaxed as they can be the problem of old
> kernels.
The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled quota
precisely under 4.2.0. My kern.log tends to confirm that (looking for
'qgroup scan completed').
> If it's OK for you, would you please enable quota after reproducing
> the bug and use for sometime and recheck it?
Sure, I've just reproduced the bug twice as I wanted, and posted the
info, so now I've cancelled the balance and can reenable quota. I'll do
it under 4.3.0-rc1, and I'll keep you posted if btrfsck complains about
it in the following days.
Regards,
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-17 8:02 ` Stéphane Lesimple
@ 2015-09-17 8:11 ` Qu Wenruo
2015-09-17 10:08 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-17 8:11 UTC (permalink / raw)
To: Stéphane Lesimple; +Cc: linux-btrfs
Stéphane Lesimple wrote on 2015/09/17 10:02 +0200:
> Le 2015-09-17 08:42, Qu Wenruo a écrit :
>> Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:
>>> Le 2015-09-17 05:03, Qu Wenruo a écrit :
>>>> Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:
>>>>> Le 2015-09-16 22:18, Duncan a écrit :
>>>>>> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
>>>>>> excerpted:
>>>>>>
>>>>>
>>>>> Well actually it's the (d) option ;)
>>>>> I activated the quota feature for only one reason: being able to
>>>>> track down how much space my snapshots are taking.
>>>>
>>>> Yeah, that's exactly one of the ideal use cases of btrfs qgroup.
>>>>
>>>> But I'm quite curious about the btrfsck error report on qgroup.
>>>>
>>>> If btrfsck reports such an error, it means either I'm too confident
>>>> about the recent qgroup accounting rework, or btrfsck has some bug
>>>> which I didn't take into consideration during the kernel rework.
>>>>
>>>> Would you please provide the full result of previous btrfsck with
>>>> qgroup error?
>>>
>>> Sure, I've saved the log somewhere just in case, here you are:
>>>
>>> [...]
>> Thanks for your log, pretty interesting result.
>>
>> BTW, did you enable qgroup on an old kernel, earlier than 4.2-rc1?
>> If so, I would be much more relaxed, as these could be problems of old kernels.
>
> The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled quota
> precisely under 4.2.0. My kern.log tends to confirm that (searching for
> 'qgroup scan completed').
Emmm, seems I need to pay more attention to this case now.
Any info about the workload for this btrfs fs?
>
>> If it's OK for you, would you please enable quota after reproducing
>> the bug, use it for some time, and recheck it?
>
> Sure, I've just reproduced the bug twice as I wanted, and posted the
> info, so now I've cancelled the balance and I can reenable quota. Will
> do it under 4.3.0-rc1. I'll keep you posted if btrfsck complains about
> it in the following days.
>
> Regards,
>
Thanks for your patience and detailed report.
But I still have another question: did you do any snapshot deletion
after quota was enabled?
(I'll assume you did, as there are a lot of backup snapshots; old ones
should already be deleted.)
That's one of the known bugs, and Mark is actively working on it.
If you delete non-empty snapshots a lot, then I'd better add a hot fix
to mark qgroups inconsistent after snapshot deletion, and trigger a
rescan if possible.
Thanks,
Qu
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-17 8:11 ` Qu Wenruo
@ 2015-09-17 10:08 ` Stéphane Lesimple
2015-09-17 10:41 ` Qu Wenruo
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-17 10:08 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
Le 2015-09-17 10:11, Qu Wenruo a écrit :
> Stéphane Lesimple wrote on 2015/09/17 10:02 +0200:
>> Le 2015-09-17 08:42, Qu Wenruo a écrit :
>>> Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:
>>>> Le 2015-09-17 05:03, Qu Wenruo a écrit :
>>>>> Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:
>>>>>> Le 2015-09-16 22:18, Duncan a écrit :
>>>>>>> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
>>>>>>> excerpted:
>>>>>>>
>>>>>>
>>>>>> Well actually it's the (d) option ;)
>>>>>> I activated the quota feature for only one reason: being able to
>>>>>> track down how much space my snapshots are taking.
>>>>>
>>>>> Yeah, that's exactly one of the ideal use cases of btrfs qgroup.
>>>>>
>>>>> But I'm quite curious about the btrfsck error report on qgroup.
>>>>>
>>>>> If btrfsck reports such an error, it means either I'm too confident
>>>>> about the recent qgroup accounting rework, or btrfsck has some bug
>>>>> which I didn't take into consideration during the kernel rework.
>>>>>
>>>>> Would you please provide the full result of previous btrfsck with
>>>>> qgroup error?
>>>>
>>>> Sure, I've saved the log somewhere just in case, here you are:
>>>>
>>>> [...]
>>> Thanks for your log, pretty interesting result.
>>>
>>> BTW, did you enable qgroup on an old kernel, earlier than 4.2-rc1?
>>> If so, I would be much more relaxed, as these could be problems of old
>>> kernels.
>>
>> The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled
>> quota precisely under 4.2.0. My kern.log tends to confirm that
>> (searching for 'qgroup scan completed').
>
> Emmm, seems I need to pay more attention to this case now.
> Any info about the workload for this btrfs fs?
>
>>
>>> If it's OK for you, would you please enable quota after reproducing
>>> the bug, use it for some time, and recheck it?
>>
>> Sure, I've just reproduced the bug twice as I wanted, and posted the
>> info, so now I've cancelled the balance and I can reenable quota. Will
>> do it under 4.3.0-rc1. I'll keep you posted if btrfsck complains about
>> it in the following days.
>>
>> Regards,
>>
> Thanks for your patience and detailed report.
You're very welcome.
> But I still have another question: did you do any snapshot deletion
> after quota was enabled?
> (I'll assume you did, as there are a lot of backup snapshots; old
> ones should already be deleted.)
Actually no: this btrfs filesystem is quite new (less than a week old),
as I'm migrating from mdadm(raid1)+ext4 to btrfs. So those snapshots
were actually rsynced one by one from my hardlink-based "snapshots"
under ext4 (those pseudo-snapshots are created using a program named
"rsnapshot", if you know it; it's basically a wrapper around cp -la). I
haven't yet activated automatic snapshot creation/deletion on my btrfs
system, due to the bugs I'm tripping on. So no snapshot was deleted.
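For readers unfamiliar with rsnapshot, the hardlink trick can be sketched
in a couple of lines (hypothetical paths and function name; this only
illustrates the cp -la idea, it is not rsnapshot itself):

```shell
# make_pseudo_snapshot: copy a tree where each file is a hardlink to the
# original, so unchanged files consume no extra data space.
make_pseudo_snapshot() {
    src="$1"   # live data directory
    dst="$2"   # new pseudo-snapshot directory (must not exist yet)
    cp -la "$src" "$dst"
}

# Example: make_pseudo_snapshot /data /backups/daily.0
```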
> That's one of the known bugs, and Mark is actively working on it.
> If you delete non-empty snapshots a lot, then I'd better add a hot fix
> to mark qgroups inconsistent after snapshot deletion, and trigger a
> rescan if possible.
I've made a btrfs-image of the filesystem just before disabling quotas
(which I did to get a clean btrfsck and eliminate quotas from the
equation while trying to reproduce the bug I have). Would it be of any
use if I drop it somewhere for you to pick it up? (2.9G in size.)
In the meantime, I've reactivated quotas, unmounted the filesystem and
ran a btrfsck on it: as you would expect, there's no qgroup problem
reported so far. I'll clear all my snapshots, run a quota rescan, then
re-create them one by one by rsyncing from the ext4 system I still
have. Maybe I'll run into the issue again.
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-17 10:08 ` Stéphane Lesimple
@ 2015-09-17 10:41 ` Qu Wenruo
2015-09-17 18:47 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-17 10:41 UTC (permalink / raw)
To: Stéphane Lesimple, Qu Wenruo; +Cc: linux-btrfs
On 2015-09-17 18:08, Stéphane Lesimple wrote:
> Le 2015-09-17 10:11, Qu Wenruo a écrit :
>> Stéphane Lesimple wrote on 2015/09/17 10:02 +0200:
>>> Le 2015-09-17 08:42, Qu Wenruo a écrit :
>>>> Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:
>>>>> Le 2015-09-17 05:03, Qu Wenruo a écrit :
>>>>>> Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:
>>>>>>> Le 2015-09-16 22:18, Duncan a écrit :
>>>>>>>> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
>>>>>>>> excerpted:
>>>>>>>>
>>>>>>>
>>>>>>> Well actually it's the (d) option ;)
>>>>>>> I activated the quota feature for only one reason: being able to
>>>>>>> track down how much space my snapshots are taking.
>>>>>>
>>>>>> Yeah, that's exactly one of the ideal use cases of btrfs qgroup.
>>>>>>
>>>>>> But I'm quite curious about the btrfsck error report on qgroup.
>>>>>>
>>>>>> If btrfsck reports such an error, it means either I'm too confident
>>>>>> about the recent qgroup accounting rework, or btrfsck has some bug
>>>>>> which I didn't take into consideration during the kernel rework.
>>>>>>
>>>>>> Would you please provide the full result of previous btrfsck with
>>>>>> qgroup error?
>>>>>
>>>>> Sure, I've saved the log somewhere just in case, here you are:
>>>>>
>>>>> [...]
>>>> Thanks for your log, pretty interesting result.
>>>>
>>>> BTW, did you enable qgroup on an old kernel, earlier than 4.2-rc1?
>>>> If so, I would be much more relaxed, as these could be problems of old
>>>> kernels.
>>>
>>> The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled quota
>>> precisely under 4.2.0. My kern.log tends to confirm that (searching for
>>> 'qgroup scan completed').
>>
>> Emmm, seems I need to pay more attention to this case now.
>> Any info about the workload for this btrfs fs?
>>
>>>
>>>> If it's OK for you, would you please enable quota after reproducing
>>>> the bug, use it for some time, and recheck it?
>>>
>>> Sure, I've just reproduced the bug twice as I wanted, and posted the
>>> info, so now I've cancelled the balance and I can reenable quota. Will
>>> do it under 4.3.0-rc1. I'll keep you posted if btrfsck complains about
>>> it in the following days.
>>>
>>> Regards,
>>>
>> Thanks for your patience and detailed report.
>
> You're very welcome.
>
>> But I still have another question: did you do any snapshot deletion
>> after quota was enabled?
>> (I'll assume you did, as there are a lot of backup snapshots; old
>> ones should already be deleted.)
>
> Actually no: this btrfs filesystem is quite new (less than a week old),
> as I'm migrating from mdadm(raid1)+ext4 to btrfs. So those snapshots
> were actually rsynced one by one from my hardlink-based "snapshots"
> under ext4 (those pseudo-snapshots are created using a program named
> "rsnapshot", if you know it; it's basically a wrapper around cp -la). I
> haven't yet activated automatic snapshot creation/deletion on my btrfs
> system, due to the bugs I'm tripping on. So no snapshot was deleted.
Now things are getting tricky: as all known bugs are ruled out, it must
be another hidden bug, even though we reworked the qgroup accounting code.
>
>> That's one of the known bugs, and Mark is actively working on it.
>> If you delete non-empty snapshots a lot, then I'd better add a hot fix
>> to mark qgroups inconsistent after snapshot deletion, and trigger a
>> rescan if possible.
>
> I've made a btrfs-image of the filesystem just before disabling quotas
> (which I did to get a clean btrfsck and eliminate quotas from the
> equation while trying to reproduce the bug I have). Would it be of any
> use if I drop it somewhere for you to pick it up? (2.9G in size.)
For a mismatch case, a static btrfs-image dump won't really help,
as the important point is when, and by which operation, the qgroup
accounting started to mismatch.
>
> In the meantime, I've reactivated quotas, unmounted the filesystem and
> ran a btrfsck on it: as you would expect, there's no qgroup problem
> reported so far.
At least, the rescan code is working without problems.
> I'll clear all my snapshots, run a quota rescan, then
> re-create them one by one by rsyncing from the ext4 system I still have.
> Maybe I'll run into the issue again.
>
Would you mind doing the following check for each subvolume rsync?
1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
2) Create the needed snapshot
3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
4) Avoid doing IO if possible until step 6)
5) Do 'btrfs quota rescan -w' and save it
6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
7) Rsync data from ext4 to the newly created snapshot
The point is: as you mentioned, rescan is working fine, so we can compare
the output from 3), 6) and 1) to see which qgroup accounting numbers
change.
If they differ, it means either the qgroup update at write time OR
snapshot creation has something wrong; at least we can narrow the
problem down to the qgroup update routine or to snapshot creation.
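The steps above can be wrapped in a small helper run around each snapshot
creation. This is a sketch under assumptions: the function name, the $mnt
mount point and $prefix output naming are mine, the snapshot command for
step 2 is passed in by the caller, and step 7's rsync happens afterwards:

```shell
# qgroup_checkpoint: save 'btrfs qgroup show' output before and after a
# snapshot, and again after a blocking quota rescan (steps 1-6 above).
qgroup_checkpoint() {
    mnt="$1"; prefix="$2"; snap_cmd="$3"
    sync; btrfs qgroup show -prce --raw "$mnt" > "$prefix.step1"  # 1)
    eval "$snap_cmd"                                              # 2)
    sync; btrfs qgroup show -prce --raw "$mnt" > "$prefix.step3"  # 3)
    # 4) avoid doing other IO here
    btrfs quota rescan -w "$mnt"               > "$prefix.step5"  # 5)
    sync; btrfs qgroup show -prce --raw "$mnt" > "$prefix.step6"  # 6)
    # 7) rsync the data into the newly created snapshot afterwards
}

# Example (hypothetical paths):
# qgroup_checkpoint /tank /root/qlog/snap01 \
#     'btrfs subvolume snapshot -r /tank/data /tank/snapshots/snap01'
```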
Thanks,
Qu
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-17 10:41 ` Qu Wenruo
@ 2015-09-17 18:47 ` Stéphane Lesimple
2015-09-18 0:59 ` Qu Wenruo
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-17 18:47 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs
Le 2015-09-17 12:41, Qu Wenruo a écrit :
>> In the meantime, I've reactivated quotas, unmounted the filesystem and
>> ran a btrfsck on it: as you would expect, there's no qgroup problem
>> reported so far.
>
> At least, the rescan code is working without problems.
>
>> I'll clear all my snapshots, run a quota rescan, then
>> re-create them one by one by rsyncing from the ext4 system I still
>> have.
>> Maybe I'll run into the issue again.
>>
>
> Would you mind doing the following check for each subvolume rsync?
>
> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
> 2) Create the needed snapshot
> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
> 4) Avoid doing IO if possible until step 6)
> 5) Do 'btrfs quota rescan -w' and save it
> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
> 7) Rsync data from ext4 to the newly created snapshot
>
> The point is: as you mentioned, rescan is working fine, so we can compare
> the output from 3), 6) and 1) to see which qgroup accounting numbers
> change.
>
> If they differ, it means either the qgroup update at write time OR
> snapshot creation has something wrong; at least we can narrow the
> problem down to the qgroup update routine or to snapshot creation.
I was about to do that, but first there's something that seems strange:
I began by trashing all my snapshots, then ran a quota rescan, and
waited for it to complete, to start from a sane base.
However, this is the output of qgroup show now:
qgroupid rfer          excl                 max_rfer max_excl parent child
-------- ----          ----                 -------- -------- ------ -----
0/5      16384         16384                none     none     ---    ---
0/1906   1657848029184 1657848029184        none     none     ---    ---
0/1909   124950921216  124950921216         none     none     ---    ---
0/1911   1054587293696 1054587293696        none     none     ---    ---
0/3270   23727300608   23727300608          none     none     ---    ---
0/3314   23206055936   23206055936          none     none     ---    ---
0/3317   18472996864   0                    none     none     ---    ---
0/3318   22235709440   18446744073708421120 none     none     ---    ---
0/3319   22240333824   0                    none     none     ---    ---
0/3320   22289608704   0                    none     none     ---    ---
0/3321   22289608704   0                    none     none     ---    ---
0/3322   18461151232   0                    none     none     ---    ---
0/3323   18423902208   0                    none     none     ---    ---
0/3324   18423902208   0                    none     none     ---    ---
0/3325   18463506432   0                    none     none     ---    ---
0/3326   18463506432   0                    none     none     ---    ---
0/3327   18463506432   0                    none     none     ---    ---
0/3328   18463506432   0                    none     none     ---    ---
0/3329   18585427968   0                    none     none     ---    ---
0/3330   18621472768   18446744073251348480 none     none     ---    ---
0/3331   18621472768   0                    none     none     ---    ---
0/3332   18621472768   0                    none     none     ---    ---
0/3333   18783076352   0                    none     none     ---    ---
0/3334   18799804416   0                    none     none     ---    ---
0/3335   18799804416   0                    none     none     ---    ---
0/3336   18816217088   0                    none     none     ---    ---
0/3337   18816266240   0                    none     none     ---    ---
0/3338   18816266240   0                    none     none     ---    ---
0/3339   18816266240   0                    none     none     ---    ---
0/3340   18816364544   0                    none     none     ---    ---
0/3341   7530119168    7530119168           none     none     ---    ---
0/3342   4919283712    0                    none     none     ---    ---
0/3343   4921724928    0                    none     none     ---    ---
0/3344   4921724928    0                    none     none     ---    ---
0/3345   6503317504    18446744073690902528 none     none     ---    ---
0/3346   6503452672    0                    none     none     ---    ---
0/3347   6509514752    0                    none     none     ---    ---
0/3348   6515793920    0                    none     none     ---    ---
0/3349   6515793920    0                    none     none     ---    ---
0/3350   6518685696    0                    none     none     ---    ---
0/3351   6521511936    0                    none     none     ---    ---
0/3352   6521511936    0                    none     none     ---    ---
0/3353   6521544704    0                    none     none     ---    ---
0/3354   6597963776    0                    none     none     ---    ---
0/3355   6598275072    0                    none     none     ---    ---
0/3356   6635880448    0                    none     none     ---    ---
0/3357   6635880448    0                    none     none     ---    ---
0/3358   6635880448    0                    none     none     ---    ---
0/3359   6635880448    0                    none     none     ---    ---
0/3360   6635880448    0                    none     none     ---    ---
0/3361   6635880448    0                    none     none     ---    ---
0/3362   6635880448    0                    none     none     ---    ---
0/3363   6635880448    0                    none     none     ---    ---
0/3364   6635880448    0                    none     none     ---    ---
0/3365   6635880448    0                    none     none     ---    ---
0/3366   6635896832    0                    none     none     ---    ---
0/3367   24185790464   24185790464          none     none     ---    ---
I would have expected all these qgroupids to have been trashed along
with the snapshots, but it seems not. It reminded me of the bug you were
talking about, where deleted snapshots don't always clear their qgroup
correctly, but since these don't disappear after a rescan either... I'm
a bit surprised.
I've just tried quota disable / quota enable, and not it seems OK. Just
wanted to let you know, in case it's not known behavior ...
The procedure I'll use will be slightly different from what you
proposed, but to my understanding it won't change the result:
> 0) Rsync data from the next ext4 "snapshot" to the subvolume
> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
> 2) Create the needed readonly snapshot on btrfs
> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
> 4) Avoid doing IO if possible until step 6)
> 5) Do 'btrfs quota rescan -w' and save it
> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
I'll post the results once this is done.
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-17 18:47 ` Stéphane Lesimple
@ 2015-09-18 0:59 ` Qu Wenruo
2015-09-18 7:36 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-18 0:59 UTC (permalink / raw)
To: Stéphane Lesimple, Qu Wenruo; +Cc: linux-btrfs
Stéphane Lesimple wrote on 2015/09/17 20:47 +0200:
> Le 2015-09-17 12:41, Qu Wenruo a écrit :
>>> In the meantime, I've reactivated quotas, unmounted the filesystem and
>>> ran a btrfsck on it: as you would expect, there's no qgroup problem
>>> reported so far.
>>
>> At least, the rescan code is working without problems.
>>
>>> I'll clear all my snapshots, run a quota rescan, then
>>> re-create them one by one by rsyncing from the ext4 system I still have.
>>> Maybe I'll run into the issue again.
>>>
>>
>> Would you mind doing the following check for each subvolume rsync?
>>
>> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>> 2) Create the needed snapshot
>> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>> 4) Avoid doing IO if possible until step 6)
>> 5) Do 'btrfs quota rescan -w' and save it
>> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>> 7) Rsync data from ext4 to the newly created snapshot
>>
>> The point is: as you mentioned, rescan is working fine, so we can compare
>> the output from 3), 6) and 1) to see which qgroup accounting numbers
>> change.
>>
>> If they differ, it means either the qgroup update at write time OR
>> snapshot creation has something wrong; at least we can narrow the
>> problem down to the qgroup update routine or to snapshot creation.
>
> I was about to do that, but first there's something that seems strange:
> I began by trashing all my snapshots, then ran a quota rescan, and
> waited for it to complete, to start from a sane base.
> However, this is the output of qgroup show now:
By "trashing", did you mean deleting all the files inside the subvolume?
Or "btrfs subv del"?
>
> qgroupid rfer          excl                 max_rfer max_excl parent child
> -------- ----          ----                 -------- -------- ------ -----
> 0/5      16384         16384                none     none     ---    ---
> 0/1906   1657848029184 1657848029184        none     none     ---    ---
> 0/1909   124950921216  124950921216         none     none     ---    ---
> 0/1911   1054587293696 1054587293696        none     none     ---    ---
> 0/3270   23727300608   23727300608          none     none     ---    ---
> 0/3314   23206055936   23206055936          none     none     ---    ---
> 0/3317   18472996864   0                    none     none     ---    ---
> 0/3318   22235709440   18446744073708421120 none     none     ---    ---
> 0/3319   22240333824   0                    none     none     ---    ---
> 0/3320   22289608704   0                    none     none     ---    ---
> 0/3321   22289608704   0                    none     none     ---    ---
> 0/3322   18461151232   0                    none     none     ---    ---
> 0/3323   18423902208   0                    none     none     ---    ---
> 0/3324   18423902208   0                    none     none     ---    ---
> 0/3325   18463506432   0                    none     none     ---    ---
> 0/3326   18463506432   0                    none     none     ---    ---
> 0/3327   18463506432   0                    none     none     ---    ---
> 0/3328   18463506432   0                    none     none     ---    ---
> 0/3329   18585427968   0                    none     none     ---    ---
> 0/3330   18621472768   18446744073251348480 none     none     ---    ---
> 0/3331   18621472768   0                    none     none     ---    ---
> 0/3332   18621472768   0                    none     none     ---    ---
> 0/3333   18783076352   0                    none     none     ---    ---
> 0/3334   18799804416   0                    none     none     ---    ---
> 0/3335   18799804416   0                    none     none     ---    ---
> 0/3336   18816217088   0                    none     none     ---    ---
> 0/3337   18816266240   0                    none     none     ---    ---
> 0/3338   18816266240   0                    none     none     ---    ---
> 0/3339   18816266240   0                    none     none     ---    ---
> 0/3340   18816364544   0                    none     none     ---    ---
> 0/3341   7530119168    7530119168           none     none     ---    ---
> 0/3342   4919283712    0                    none     none     ---    ---
> 0/3343   4921724928    0                    none     none     ---    ---
> 0/3344   4921724928    0                    none     none     ---    ---
> 0/3345   6503317504    18446744073690902528 none     none     ---    ---
> 0/3346   6503452672    0                    none     none     ---    ---
> 0/3347   6509514752    0                    none     none     ---    ---
> 0/3348   6515793920    0                    none     none     ---    ---
> 0/3349   6515793920    0                    none     none     ---    ---
> 0/3350   6518685696    0                    none     none     ---    ---
> 0/3351   6521511936    0                    none     none     ---    ---
> 0/3352   6521511936    0                    none     none     ---    ---
> 0/3353   6521544704    0                    none     none     ---    ---
> 0/3354   6597963776    0                    none     none     ---    ---
> 0/3355   6598275072    0                    none     none     ---    ---
> 0/3356   6635880448    0                    none     none     ---    ---
> 0/3357   6635880448    0                    none     none     ---    ---
> 0/3358   6635880448    0                    none     none     ---    ---
> 0/3359   6635880448    0                    none     none     ---    ---
> 0/3360   6635880448    0                    none     none     ---    ---
> 0/3361   6635880448    0                    none     none     ---    ---
> 0/3362   6635880448    0                    none     none     ---    ---
> 0/3363   6635880448    0                    none     none     ---    ---
> 0/3364   6635880448    0                    none     none     ---    ---
> 0/3365   6635880448    0                    none     none     ---    ---
> 0/3366   6635896832    0                    none     none     ---    ---
> 0/3367   24185790464   24185790464          none     none     ---    ---
>
Nooooo!! What a weird result here!
Qgroup 3345 has a negative number again, even after a qgroup rescan...
IIRC, from the code, rescan just passes old_roots as NULL and uses the
correct new_roots to build up "rfer" and "excl".
So in theory it should never go below zero in a rescan.
The only hope for me is that it's an orphan qgroup (mentioned below).
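The huge "excl" values above are in fact small negative numbers printed as
unsigned 64-bit integers. Bash's signed 64-bit arithmetic makes this
visible (the literals are taken from the qgroup show output above):

```shell
# btrfs prints rfer/excl as u64, so a small negative value wraps around
# to just under 2^64. Bash arithmetic is signed 64-bit, so the same
# literals wrap back to their negative counterparts:
echo $((18446744073708421120))   # excl of 0/3318
echo $((18446744073251348480))   # excl of 0/3330
echo $((18446744073690902528))   # excl of 0/3345
```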
> I would have expected all these qgroupids to have been trashed along
> with the snapshots, but it seems not. It reminded me of the bug you were
> talking about, where deleted snapshots don't always clear their qgroup
> correctly, but since these don't disappear after a rescan either... I'm
> a bit surprised.
If you mean you did "btrfs qgroup del" on the subvolume, then it's known
that the qgroup won't be deleted and won't be associated with any
subvolume.
(It's possible that a later created subvolume reuses the old subvolid
and gets associated with the qgroup again.)
If the above qgroups with 0 or even negative "excl" numbers are orphans,
I'll be much relieved, as it'll be a minor orphan qgroup bug rather than
another possible qgroup rework (or at least a huge review).
>
> I've just tried quota disable / quota enable, and not it seems OK. Just
> wanted to let you know, in case it's not known behavior ...
Thanks a lot for your info, which indeed exposes something we didn't
take into consideration.
And if the qgroups match the above description, would you mind removing
these qgroups?
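One way to clean them up is 'btrfs qgroup destroy' on every level-0
qgroup whose subvolume id no longer exists. A hedged sketch (the function
name and mount point are assumptions, and the id parsing is simplistic —
inspect what it would delete before trusting it):

```shell
# destroy_orphan_qgroups: remove 0/N qgroups whose subvolume id is no
# longer listed by 'btrfs subvolume list'. Review the candidates first!
destroy_orphan_qgroups() {
    mnt="$1"
    live=$(btrfs subvolume list "$mnt" | awk '{print $2}')
    btrfs qgroup show "$mnt" | awk '{print $1}' | grep '^0/' |
    while IFS=/ read -r _ id; do
        echo "$live" | grep -qx "$id" ||
            btrfs qgroup destroy "0/$id" "$mnt"
    done
}

# Example (hypothetical mount point): destroy_orphan_qgroups /tank
```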
>
> The procedure I'll use will be slightly different from what you
> proposed, but to my understanding it won't change the result:
>
>> 0) Rsync data from the next ext4 "snapshot" to the subvolume
>> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>> 2) Create the needed readonly snapshot on btrfs
>> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>> 4) Avoid doing IO if possible until step 6)
>> 5) Do 'btrfs quota rescan -w' and save it
>> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>
> I'll post the results once this is done.
>
Thanks a lot!
Qu
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-18 0:59 ` Qu Wenruo
@ 2015-09-18 7:36 ` Stéphane Lesimple
2015-09-18 10:15 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-18 7:36 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs
Le 2015-09-18 02:59, Qu Wenruo a écrit :
> Stéphane Lesimple wrote on 2015/09/17 20:47 +0200:
>> Le 2015-09-17 12:41, Qu Wenruo a écrit :
>>>> In the meantime, I've reactivated quotas, unmounted the filesystem
>>>> and ran a btrfsck on it: as you would expect, there's no qgroup
>>>> problem reported so far.
>>>
>>> At least, the rescan code is working without problems.
>>>
>>>> I'll clear all my snapshots, run a quota rescan, then
>>>> re-create them one by one by rsyncing from the ext4 system I still
>>>> have.
>>>> Maybe I'll run into the issue again.
>>>>
>>>
>>> Would you mind doing the following check for each subvolume rsync?
>>>
>>> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>>> 2) Create the needed snapshot
>>> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>>> 4) Avoid doing IO if possible until step 6)
>>> 5) Do 'btrfs quota rescan -w' and save it
>>> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>>> 7) Rsync data from ext4 to the newly created snapshot
>>>
>>> The point is: as you mentioned, rescan is working fine, so we can
>>> compare the output from 3), 6) and 1) to see which qgroup accounting
>>> numbers change.
>>>
>>> If they differ, it means either the qgroup update at write time OR
>>> snapshot creation has something wrong; at least we can narrow the
>>> problem down to the qgroup update routine or to snapshot creation.
>>
>> I was about to do that, but first there's something that seems strange:
>> I began by trashing all my snapshots, then ran a quota rescan, and
>> waited for it to complete, to start from a sane base.
>> However, this is the output of qgroup show now:
>
> By "trashing", did you mean deleting all the files inside the
> subvolume?
> Or "btrfs subv del"?
Sorry for the confusion here, yes, I meant btrfs subvolume del.
>> qgroupid rfer          excl                 max_rfer max_excl parent child
>> -------- ----          ----                 -------- -------- ------ -----
>> 0/5      16384         16384                none     none     ---    ---
>> 0/1906   1657848029184 1657848029184        none     none     ---    ---
>> 0/1909   124950921216  124950921216         none     none     ---    ---
>> 0/1911   1054587293696 1054587293696        none     none     ---    ---
>> 0/3270   23727300608   23727300608          none     none     ---    ---
>> 0/3314   23206055936   23206055936          none     none     ---    ---
>> 0/3317   18472996864   0                    none     none     ---    ---
>> 0/3318   22235709440   18446744073708421120 none     none     ---    ---
>> 0/3319   22240333824   0                    none     none     ---    ---
>> 0/3320   22289608704   0                    none     none     ---    ---
>> 0/3321   22289608704   0                    none     none     ---    ---
>> 0/3322   18461151232   0                    none     none     ---    ---
>> 0/3323   18423902208   0                    none     none     ---    ---
>> 0/3324   18423902208   0                    none     none     ---    ---
>> 0/3325   18463506432   0                    none     none     ---    ---
>> 0/3326   18463506432   0                    none     none     ---    ---
>> 0/3327   18463506432   0                    none     none     ---    ---
>> 0/3328   18463506432   0                    none     none     ---    ---
>> 0/3329   18585427968   0                    none     none     ---    ---
>> 0/3330   18621472768   18446744073251348480 none     none     ---    ---
>> 0/3331   18621472768   0                    none     none     ---    ---
>> 0/3332   18621472768   0                    none     none     ---    ---
>> 0/3333   18783076352   0                    none     none     ---    ---
>> 0/3334   18799804416   0                    none     none     ---    ---
>> 0/3335   18799804416   0                    none     none     ---    ---
>> 0/3336   18816217088   0                    none     none     ---    ---
>> 0/3337   18816266240   0                    none     none     ---    ---
>> 0/3338   18816266240   0                    none     none     ---    ---
>> 0/3339   18816266240   0                    none     none     ---    ---
>> 0/3340   18816364544   0                    none     none     ---    ---
>> 0/3341   7530119168    7530119168           none     none     ---    ---
>> 0/3342   4919283712    0                    none     none     ---    ---
>> 0/3343   4921724928    0                    none     none     ---    ---
>> 0/3344   4921724928    0                    none     none     ---    ---
>> 0/3345   6503317504    18446744073690902528 none     none     ---    ---
>> 0/3346   6503452672    0                    none     none     ---    ---
>> 0/3347   6509514752    0                    none     none     ---    ---
>> 0/3348   6515793920    0                    none     none     ---    ---
>> 0/3349   6515793920    0                    none     none     ---    ---
>> 0/3350   6518685696    0                    none     none     ---    ---
>> 0/3351   6521511936    0                    none     none     ---    ---
>> 0/3352   6521511936    0                    none     none     ---    ---
>> 0/3353   6521544704    0                    none     none     ---    ---
>> 0/3354   6597963776    0                    none     none     ---    ---
>> 0/3355   6598275072    0                    none     none     ---    ---
>> 0/3356   6635880448    0                    none     none     ---    ---
>> 0/3357   6635880448    0                    none     none     ---    ---
>> 0/3358   6635880448    0                    none     none     ---    ---
>> 0/3359   6635880448    0                    none     none     ---    ---
>> 0/3360   6635880448    0                    none     none     ---    ---
>> 0/3361   6635880448    0                    none     none     ---    ---
>> 0/3362   6635880448    0                    none     none     ---    ---
>> 0/3363   6635880448    0                    none     none     ---    ---
>> 0/3364   6635880448    0                    none     none     ---    ---
>> 0/3365   6635880448    0                    none     none     ---    ---
>> 0/3366   6635896832    0                    none     none     ---    ---
>> 0/3367   24185790464   24185790464          none     none     ---    ---
>>
>
> Nooooo!! What a weird result here!
> Qgroup 3345 has a negative number again, even after a qgroup rescan...
> IIRC, from the code, rescan just passes old_roots as NULL and uses the
> correct new_roots to build up "rfer" and "excl".
> So in theory it should never go below zero in a rescan.
>
> The only hope for me is that it's an orphan qgroup (mentioned below).
>
>> I would have expected all these qgroupids to have been trashed along
>> with the snapshots, but it seems not. It reminded me of the bug you
>> were talking about, where deleted snapshots don't always clear their
>> qgroup correctly, but since these don't disappear after a rescan
>> either... I'm a bit surprised.
>
> If you mean you did "btrfs qgroup del" on the subvolume, then it's known
> that the qgroup won't be deleted and won't be associated with any
> subvolume.
> (It's possible that a later created subvolume reuses the old subvolid
> and gets associated with the qgroup again.)
>
> If the above qgroups with 0 or even negative "excl" numbers are orphans,
> I'll be much relieved, as it'll be a minor orphan qgroup bug rather than
> another possible qgroup rework (or at least a huge review).
The only qgroup subcommand I use is qgroup show, I never deleted a qgroup
directly using qgroup del... I guess this is not good news :(
>> I've just tried quota disable / quota enable, and not it seems OK.
>> Just
>> wanted to let you know, in case it's not known behavior ...
There's a typo above, I was meaning "and *now* it seems OK".
I'm sure you corrected, I just want to be sure there's no possibility of
misinterpretation.
> Thanks a lot for your info, which indeed exposes something we didn't
> take into consideration.
>
> And if the qgroups match the above description, would you mind removing
> these qgroups?
Sure, I did a quota disable / quota enable before running the snapshot
debug procedure, so the qgroups were clean again when I started:
qgroupid rfer          excl          max_rfer max_excl parent child
-------- ----          ----          -------- -------- ------ -----
0/5      16384         16384         none     none     ---    ---
0/1906   1657848029184 1657848029184 none     none     ---    ---
0/1909   124950921216  124950921216  none     none     ---    ---
0/1911   1054587293696 1054587293696 none     none     ---    ---
0/3270   23727300608   23727300608   none     none     ---    ---
0/3314   23221784576   23221784576   none     none     ---    ---
0/3341   7479275520    7479275520    none     none     ---    ---
0/3367   24185790464   24185790464   none     none     ---    ---
The test is running, I expect to post the results within an hour or two.
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-18 7:36 ` Stéphane Lesimple
@ 2015-09-18 10:15 ` Stéphane Lesimple
2015-09-18 10:26 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-18 10:15 UTC (permalink / raw)
To: Stéphane Lesimple; +Cc: Qu Wenruo, Qu Wenruo, linux-btrfs
On 2015-09-18 09:36, Stéphane Lesimple wrote:
> Sure, I did a quota disable / quota enable before running the snapshot
> debug procedure, so the qgroups were clean again when I started :
>
> qgroupid rfer excl max_rfer max_excl parent child
> -------- ---- ---- -------- -------- ------ -----
> 0/5 16384 16384 none none --- ---
> 0/1906 1657848029184 1657848029184 none none --- ---
> 0/1909 124950921216 124950921216 none none --- ---
> 0/1911 1054587293696 1054587293696 none none --- ---
> 0/3270 23727300608 23727300608 none none --- ---
> 0/3314 23221784576 23221784576 none none --- ---
> 0/3341 7479275520 7479275520 none none --- ---
> 0/3367 24185790464 24185790464 none none --- ---
>
> The test is running, I expect to post the results within an hour or
> two.
Well, my system crashed twice while running the procedure...
By "crashed" I mean : the machine no longer pings, and nothing is logged
in kern.log unfortunately :
[ 7096.735731] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[ 7172.614851] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[ 7242.870259] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[ 7321.466931] BTRFS info (device dm-3): qgroup scan completed
(inconsistency flag cleared)
[ 0.000000] Initializing cgroup subsys cpuset
The even stranger part is that the last 2 stdout dump files exist but
are empty :
-rw-r--r-- 1 root root 21 Sep 18 10:29 snap32.step5
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap32.step6
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap33.step1
-rw-r--r-- 1 root root 3.3K Sep 18 10:29 snap33.step3
-rw-r--r-- 1 root root 21 Sep 18 10:30 snap33.step5
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap33.step6
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap34.step1
-rw-r--r-- 1 root root 0 Sep 18 10:30 snap34.step3 <==
-rw-r--r-- 1 root root 0 Sep 18 10:30 snap34.step5 <==
The mentioned steps are as follows :
0) Rsync data from the next ext4 "snapshot" to the subvolume
1) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
2) Create the needed readonly snapshot on btrfs
3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
4) Avoid doing IO if possible until step 6)
5) Do 'btrfs quota rescan -w' and save it <==
6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
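As a consolidated sketch, one iteration of the steps above might look
like the following. The subvolume path, snapshot naming, and output
directory are my assumptions (the original script wasn't posted), and
$BTRFS allows a dry run with BTRFS=echo:

```shell
#!/bin/sh
# Sketch of steps 1-6 above for one snapshot; paths are hypothetical.
BTRFS=${BTRFS:-btrfs}

one_snapshot() {
    mnt=$1 subvol=$2 n=$3 out=$4
    # step 1: qgroup state before the snapshot
    sync; $BTRFS qgroup show -prce --raw "$mnt" > "$out/snap$n.step1"
    # step 2: create the readonly snapshot
    $BTRFS subvolume snapshot -r "$subvol" "$mnt/snap$n"
    # step 3: qgroup state right after the snapshot
    sync; $BTRFS qgroup show -prce --raw "$mnt" > "$out/snap$n.step3"
    # steps 4/5: avoid other I/O, then rescan and wait for completion
    $BTRFS quota rescan -w "$mnt" > "$out/snap$n.step5"
    # step 6: qgroup state after the rescan settled
    sync; $BTRFS qgroup show -prce --raw "$mnt" > "$out/snap$n.step6"
}

# e.g. (after the rsync of step 0):
#   one_snapshot /tank /tank/backup 34 /root/qgroup-logs
```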
The resulting files are available here:
http://speed47.net/tmp2/qgroup.tar.gz
Run2 is the more complete one; during run1 the machine crashed even
faster.
It's interesting to note, however, that it seems to have crashed the
same way and at the same step in the process.
As the machine is now, qgroups seem OK :
~# btrfs qgroup show -pcre --raw /tank/
qgroupid rfer excl max_rfer max_excl parent child
-------- ---- ---- -------- -------- ------ -----
0/5 32768 32768 none none --- ---
0/1906 3315696058368 3315696058368 none none --- ---
0/1909 249901842432 249901842432 none none --- ---
0/1911 2109174587392 2109174587392 none none --- ---
0/3270 47454601216 47454601216 none none --- ---
0/3314 46408499200 32768 none none --- ---
0/3341 14991097856 32768 none none --- ---
0/3367 48371580928 48371580928 none none --- ---
0/5335 56523751424 280592384 none none --- ---
0/5336 60175253504 2599960576 none none --- ---
0/5337 45751746560 250888192 none none --- ---
0/5338 45804650496 186531840 none none --- ---
0/5339 45875167232 190521344 none none --- ---
0/5340 45933486080 327680 none none --- ---
0/5341 45933502464 344064 none none --- ---
0/5342 46442815488 35454976 none none --- ---
0/5343 46442520576 30638080 none none --- ---
0/5344 46448312320 36495360 none none --- ---
0/5345 46425235456 86204416 none none --- ---
0/5346 46081941504 119398400 none none --- ---
0/5347 46402715648 55615488 none none --- ---
0/5348 46403534848 50528256 none none --- ---
0/5349 45486301184 91463680 none none --- ---
0/5351 46414635008 393216 none none --- ---
0/5352 46414667776 294912 none none --- ---
0/5353 46414667776 294912 none none --- ---
0/5354 46406148096 24829952 none none --- ---
0/5355 46415986688 33103872 none none --- ---
0/5356 46406262784 23216128 none none --- ---
0/5357 46408245248 17408000 none none --- ---
0/5358 46416052224 25280512 none none --- ---
0/5359 46406336512 23158784 none none --- ---
0/5360 46408335360 25157632 none none --- ---
0/5361 46406402048 24395776 none none --- ---
0/5362 46415273984 32260096 none none --- ---
0/5363 46408499200 32768 none none --- ---
0/5364 14949441536 139812864 none none --- ---
0/5365 14996299776 176889856 none none --- ---
0/5366 14958616576 143065088 none none --- ---
0/5367 14919172096 100171776 none none --- ---
0/5368 14945968128 142409728 none none --- ---
0/5369 14991097856 32768 none none --- ---
But I'm pretty sure I can get that (u64)-1 value again by deleting
snapshots. Shall I ? Or do you have something else for me to run before
that ?
So, as a quick summary of this big thread, it seems I've been hitting 3
bugs, all reproducible :
- kernel BUG on balance (this original thread)
- negative or zero "excl" qgroups
- hard freezes without kernel trace when playing with snapshots and
quota
Still available to dig deeper where needed.
--
Stéphane.
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-18 10:15 ` Stéphane Lesimple
@ 2015-09-18 10:26 ` Stéphane Lesimple
2015-09-20 1:22 ` Qu Wenruo
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-18 10:26 UTC (permalink / raw)
To: Stéphane Lesimple; +Cc: Qu Wenruo, Qu Wenruo, linux-btrfs
On 2015-09-18 12:15, Stéphane Lesimple wrote:
> On 2015-09-18 09:36, Stéphane Lesimple wrote:
>> Sure, I did a quota disable / quota enable before running the snapshot
>> debug procedure, so the qgroups were clean again when I started :
>>
>> qgroupid rfer excl max_rfer max_excl parent child
>> -------- ---- ---- -------- -------- ------ -----
>> 0/5 16384 16384 none none --- ---
>> 0/1906 1657848029184 1657848029184 none none --- ---
>> 0/1909 124950921216 124950921216 none none --- ---
>> 0/1911 1054587293696 1054587293696 none none --- ---
>> 0/3270 23727300608 23727300608 none none --- ---
>> 0/3314 23221784576 23221784576 none none --- ---
>> 0/3341 7479275520 7479275520 none none --- ---
>> 0/3367 24185790464 24185790464 none none --- ---
>>
>> The test is running, I expect to post the results within an hour or
>> two.
>
> Well, my system crashed twice while running the procedure...
> By "crashed" I mean : the machine no longer pings, and nothing is
> logged in kern.log unfortunately :
>
> [ 7096.735731] BTRFS info (device dm-3): qgroup scan completed
> (inconsistency flag cleared)
> [ 7172.614851] BTRFS info (device dm-3): qgroup scan completed
> (inconsistency flag cleared)
> [ 7242.870259] BTRFS info (device dm-3): qgroup scan completed
> (inconsistency flag cleared)
> [ 7321.466931] BTRFS info (device dm-3): qgroup scan completed
> (inconsistency flag cleared)
> [ 0.000000] Initializing cgroup subsys cpuset
>
> The even stranger part is that the last 2 stdout dump files exist but
> are empty :
>
> -rw-r--r-- 1 root root 21 Sep 18 10:29 snap32.step5
> -rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap32.step6
> -rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap33.step1
> -rw-r--r-- 1 root root 3.3K Sep 18 10:29 snap33.step3
> -rw-r--r-- 1 root root 21 Sep 18 10:30 snap33.step5
> -rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap33.step6
> -rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap34.step1
> -rw-r--r-- 1 root root 0 Sep 18 10:30 snap34.step3 <==
> -rw-r--r-- 1 root root 0 Sep 18 10:30 snap34.step5 <==
>
> The mentioned steps are as follows :
>
> 0) Rsync data from the next ext4 "snapshot" to the subvolume
> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
> 2) Create the needed readonly snapshot on btrfs
> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
> 4) Avoid doing IO if possible until step 6)
> 5) Do 'btrfs quota rescan -w' and save it <==
> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
>
> The resulting files are available here:
> http://speed47.net/tmp2/qgroup.tar.gz
> Run2 is the more complete one; during run1 the machine crashed even
> faster.
> It's interesting to note, however, that it seems to have crashed the
> same way and at the same step in the process.
Actually, about that: I forgot that I had set up netconsole before
starting the second run after the first "muted" crash, and it did work :
even though I have no logs in kern.log, netconsole managed to send them
to my other machine before going down, so here it is :
---
[ 5738.172692] BUG: unable to handle kernel NULL pointer dereference at
00000000000001f0
[ 5738.172702] IP: [<ffffffffc03150db>] start_transaction+0x1b/0x580
[btrfs]
[ 5738.172719] PGD c0aa7067 PUD c0aa6067 PMD 0
[ 5738.172723] Oops: 0000 [#1] SMP
[ 5738.172726] Modules linked in: netconsole configfs xts gf128mul drbg
ansi_cprng xt_multiport xt_comment xt_conntrack xt_nat xt_tcpudp
nfnetlink_queue nfnetlink_log nfnetlink nf_conntrack_ftp
nf_conntrack_sane iptable_security iptable_filter iptable_mangle
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack iptable_raw ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs
cmac dm_crypt rfcomm bnep lockd grace sunrpc fscache binfmt_misc
intel_rapl snd_hda_codec_realtek iosf_mbi x86_pkg_temp_thermal
intel_powerclamp kvm_intel snd_hda_codec_generic snd_hda_intel
snd_hda_codec kvm eeepc_wmi asus_wmi snd_hda_core btusb sparse_keymap
btrtl snd_hwdep btbcm snd_pcm btintel 8021q bluetooth snd_seq_midi
dm_multipath snd_seq_midi_event garp snd_rawmidi mrp snd_seq stp llc
snd_seq_device snd_timer crct10dif_pclmul crc32_pclmul snd
ghash_clmulni_intel cryptd serio_raw soundcore mei_me mei lpc_ich shpchp
mac_hid parport_pc ppdev nct6775 hwmon_vid coretemp lp parport btrfs
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq raid0 multipath linear nbd raid1 i915 e1000e
i2c_algo_bit drm_kms_helper syscopyarea ptp sysfillrect sysimgblt
fb_sys_fops psmouse ahci drm libahci pps_core wmi video [last unloaded:
netconsole]
[ 5738.172831] CPU: 1 PID: 10932 Comm: kworker/u4:14 Not tainted
4.3.0-rc1 #1
[ 5738.172833] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
01/06/2014
[ 5738.172843] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
[btrfs]
[ 5738.172845] task: ffff8800c7010000 ti: ffff88006acf4000 task.ti:
ffff88006acf4000
[ 5738.172847] RIP: 0010:[<ffffffffc03150db>] [<ffffffffc03150db>]
start_transaction+0x1b/0x580 [btrfs]
[ 5738.172855] RSP: 0018:ffff88006acf7ca8 EFLAGS: 00010282
[ 5738.172856] RAX: 0000000000000004 RBX: 0000000000000201 RCX:
0000000000000002
[ 5738.172857] RDX: 0000000000000201 RSI: 0000000000000001 RDI:
0000000000000000
[ 5738.172858] RBP: ffff88006acf7cf0 R08: ffff88010990eab0 R09:
00000001801c0017
[ 5738.172860] R10: 000000000990e701 R11: ffffea0004264380 R12:
0000000000000000
[ 5738.172861] R13: ffff8800c73a6e08 R14: ffff880027963800 R15:
0000160000000000
[ 5738.172862] FS: 0000000000000000(0000) GS:ffff88011fb00000(0000)
knlGS:0000000000000000
[ 5738.172863] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5738.172864] CR2: 00000000000001f0 CR3: 0000000027a65000 CR4:
00000000000406e0
[ 5738.172866] Stack:
[ 5738.172867] ffff8800c73a6e08 ffff880027963800 0000160000000000
ffff88006acf7ce8
[ 5738.172871] 00000000000000be 00000000fffffffc ffff8800c73a6e08
ffff880027963800
[ 5738.172875] 0000160000000000 ffff88006acf7d00 ffffffffc031565b
ffff88006acf7dc0
[ 5738.172879] Call Trace:
[ 5738.172887] [<ffffffffc031565b>] btrfs_start_transaction+0x1b/0x20
[btrfs]
[ 5738.172896] [<ffffffffc0378038>]
btrfs_qgroup_rescan_worker+0x388/0x5a0 [btrfs]
[ 5738.172904] [<ffffffffc03444e0>] normal_work_helper+0xc0/0x270
[btrfs]
[ 5738.172912] [<ffffffffc03448a2>]
btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
[ 5738.172915] [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
[ 5738.172917] [<ffffffff8109192a>] worker_thread+0x11a/0x470
[ 5738.172919] [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
[ 5738.172921] [<ffffffff81097059>] kthread+0xc9/0xe0
[ 5738.172923] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
[ 5738.172926] [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
[ 5738.172928] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
[ 5738.172929] Code: 49 c1 e9 5c ff ff ff 66 0f 1f 84 00 00 00 00 00 0f
1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 89 d3 48 83
ec 20 <48> 8b 87 f0 01 00 00 48 8b 90 60 0e 00 00 83 e2 01 0f 85 86 00
[ 5738.172973] RIP [<ffffffffc03150db>] start_transaction+0x1b/0x580
[btrfs]
[ 5738.172981] RSP <ffff88006acf7ca8>
[ 5738.172982] CR2: 00000000000001f0
[ 5738.172984] ---[ end trace 9feb85def1327ee9 ]---
[ 5738.173010] BUG: unable to handle kernel paging request at
ffffffffffffffd8
[ 5738.173012] IP: [<ffffffff810977d0>] kthread_data+0x10/0x20
[ 5738.173015] PGD 1c13067 PUD 1c15067 PMD 0
[ 5738.173019] Oops: 0000 [#2] SMP
[ 5738.173021] Modules linked in: netconsole configfs xts gf128mul drbg
---
Clearly this is during a rescan.
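For readers wanting to capture traces the same way, a netconsole setup
along these lines works; all addresses, ports, and the interface below
are illustrative placeholders, not the reporter's actual configuration:

```shell
# Build the netconsole module parameter:
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]
# (tgt-mac may be left empty; it then defaults to broadcast)
netconsole_param() {
    src_ip=$1 dev=$2 tgt_ip=$3
    echo "netconsole=6665@$src_ip/$dev,6666@$tgt_ip/"
}

# On the crashing machine (as root, values are examples):
#   modprobe netconsole "$(netconsole_param 192.168.0.10 eth0 192.168.0.20)"
# On the receiving machine (BSD netcat syntax):
#   nc -l -u 6666 | tee netconsole.log
```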
> As the machine is now, qgroups seem OK :
>
> ~# btrfs qgroup show -pcre --raw /tank/
> qgroupid rfer excl max_rfer max_excl parent child
> -------- ---- ---- -------- -------- ------ -----
> 0/5 32768 32768 none none --- ---
> 0/1906 3315696058368 3315696058368 none none --- ---
> 0/1909 249901842432 249901842432 none none --- ---
> 0/1911 2109174587392 2109174587392 none none --- ---
> 0/3270 47454601216 47454601216 none none --- ---
> 0/3314 46408499200 32768 none none --- ---
> 0/3341 14991097856 32768 none none --- ---
> 0/3367 48371580928 48371580928 none none --- ---
> 0/5335 56523751424 280592384 none none --- ---
> 0/5336 60175253504 2599960576 none none --- ---
> 0/5337 45751746560 250888192 none none --- ---
> 0/5338 45804650496 186531840 none none --- ---
> 0/5339 45875167232 190521344 none none --- ---
> 0/5340 45933486080 327680 none none --- ---
> 0/5341 45933502464 344064 none none --- ---
> 0/5342 46442815488 35454976 none none --- ---
> 0/5343 46442520576 30638080 none none --- ---
> 0/5344 46448312320 36495360 none none --- ---
> 0/5345 46425235456 86204416 none none --- ---
> 0/5346 46081941504 119398400 none none --- ---
> 0/5347 46402715648 55615488 none none --- ---
> 0/5348 46403534848 50528256 none none --- ---
> 0/5349 45486301184 91463680 none none --- ---
> 0/5351 46414635008 393216 none none --- ---
> 0/5352 46414667776 294912 none none --- ---
> 0/5353 46414667776 294912 none none --- ---
> 0/5354 46406148096 24829952 none none --- ---
> 0/5355 46415986688 33103872 none none --- ---
> 0/5356 46406262784 23216128 none none --- ---
> 0/5357 46408245248 17408000 none none --- ---
> 0/5358 46416052224 25280512 none none --- ---
> 0/5359 46406336512 23158784 none none --- ---
> 0/5360 46408335360 25157632 none none --- ---
> 0/5361 46406402048 24395776 none none --- ---
> 0/5362 46415273984 32260096 none none --- ---
> 0/5363 46408499200 32768 none none --- ---
> 0/5364 14949441536 139812864 none none --- ---
> 0/5365 14996299776 176889856 none none --- ---
> 0/5366 14958616576 143065088 none none --- ---
> 0/5367 14919172096 100171776 none none --- ---
> 0/5368 14945968128 142409728 none none --- ---
> 0/5369 14991097856 32768 none none --- ---
>
>
> But I'm pretty sure I can get that (u64)-1 value again by deleting
> snapshots. Shall I ? Or do you have something else for me to run
> before that ?
>
> So, as a quick summary of this big thread, it seems I've been hitting
> 3 bugs, all reproducible :
> - kernel BUG on balance (this original thread)
> - negative or zero "excl" qgroups
> - hard freezes without kernel trace when playing with snapshots and
> quota
>
> Still available to dig deeper where needed.
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-18 10:26 ` Stéphane Lesimple
@ 2015-09-20 1:22 ` Qu Wenruo
2015-09-20 10:35 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-20 1:22 UTC (permalink / raw)
To: Stéphane Lesimple; +Cc: Qu Wenruo, linux-btrfs
On 2015-09-18 18:26, Stéphane Lesimple wrote:
> On 2015-09-18 12:15, Stéphane Lesimple wrote:
>> On 2015-09-18 09:36, Stéphane Lesimple wrote:
>>> Sure, I did a quota disable / quota enable before running the snapshot
>>> debug procedure, so the qgroups were clean again when I started :
>>>
>>> qgroupid rfer excl max_rfer max_excl parent
>>> child
>>> -------- ---- ---- -------- -------- ------
>>> -----
>>> 0/5 16384 16384 none none --- ---
>>> 0/1906 1657848029184 1657848029184 none none --- ---
>>> 0/1909 124950921216 124950921216 none none --- ---
>>> 0/1911 1054587293696 1054587293696 none none --- ---
>>> 0/3270 23727300608 23727300608 none none --- ---
>>> 0/3314 23221784576 23221784576 none none --- ---
>>> 0/3341 7479275520 7479275520 none none --- ---
>>> 0/3367 24185790464 24185790464 none none --- ---
>>>
>>> The test is running, I expect to post the results within an hour or two.
>>
>> Well, my system crashed twice while running the procedure...
>> By "crashed" I mean : the machine no longer pings, and nothing is
>> logged in kern.log unfortunately :
>>
>> [ 7096.735731] BTRFS info (device dm-3): qgroup scan completed
>> (inconsistency flag cleared)
>> [ 7172.614851] BTRFS info (device dm-3): qgroup scan completed
>> (inconsistency flag cleared)
>> [ 7242.870259] BTRFS info (device dm-3): qgroup scan completed
>> (inconsistency flag cleared)
>> [ 7321.466931] BTRFS info (device dm-3): qgroup scan completed
>> (inconsistency flag cleared)
>> [ 0.000000] Initializing cgroup subsys cpuset
>>
>> The even stranger part is that the last 2 stdout dump files exist but
>> are empty :
>>
>> -rw-r--r-- 1 root root 21 Sep 18 10:29 snap32.step5
>> -rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap32.step6
>> -rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap33.step1
>> -rw-r--r-- 1 root root 3.3K Sep 18 10:29 snap33.step3
>> -rw-r--r-- 1 root root 21 Sep 18 10:30 snap33.step5
>> -rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap33.step6
>> -rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap34.step1
>> -rw-r--r-- 1 root root 0 Sep 18 10:30 snap34.step3 <==
>> -rw-r--r-- 1 root root 0 Sep 18 10:30 snap34.step5 <==
>>
>> The mentioned steps are as follows :
>>
>> 0) Rsync data from the next ext4 "snapshot" to the subvolume
>> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
>> 2) Create the needed readonly snapshot on btrfs
>> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
>> 4) Avoid doing IO if possible until step 6)
>> 5) Do 'btrfs quota rescan -w' and save it <==
>> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
>>
>> The resulting files are available here:
>> http://speed47.net/tmp2/qgroup.tar.gz
>> Run2 is the more complete one; during run1 the machine crashed even
>> faster.
>> It's interesting to note, however, that it seems to have crashed the
>> same way and at the same step in the process.
Your data really helps a lot!!
And the good news is, the qgroup accounting part is working as expected.
Although I only checked steps 1/3/6 of about 5 snapshots, they are all
OK.
I can make a script to cross-check them, but from the last few results,
I think qgroup works fine.
I'm more confident about the negative numbers, which should be a result
of deleted subvolumes; the real problem is that such qgroups are not
handled well by qgroup rescan.
I'll try to add a hotfix for such cases if needed.
But right now, I don't have a good idea for it until Mark's
subtree-rescan work lands.
Maybe I can add a new option for btrfs-progs to automatically remove the
qgroup and trigger a rescan?
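Until such an option exists, the workaround can be sketched with the
existing subcommands (btrfs qgroup destroy, btrfs quota rescan). The
helper below is a pure function so the matching logic is testable; the
commented invocation showing how to feed it on a real filesystem is an
untested assumption:

```shell
# find_orphans: given the list of existing subvolume IDs and the list of
# level-0 qgroup IDs ("0/<subvolid>"), print the qgroups whose subvolume
# no longer exists.
find_orphans() {
    subvols=" $1 "
    for qg in $2; do
        id=${qg#0/}
        case "$subvols" in
            *" $id "*) ;;          # subvolume still present: keep qgroup
            *) echo "$qg" ;;       # no matching subvolume: orphan
        esac
    done
}

# On a real filesystem (sketch, run as root):
#   MNT=/tank
#   subvols=$(btrfs subvolume list "$MNT" | awk '{print $2}')
#   qgroups=$(btrfs qgroup show --raw "$MNT" | awk '$1 ~ /^0\// {print $1}')
#   for qg in $(find_orphans "$subvols" "$qgroups"); do
#       btrfs qgroup destroy "$qg" "$MNT"
#   done
#   btrfs quota rescan -w "$MNT"
```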
> Actually, about that: I forgot that I had set up netconsole before
> starting the second run after the first "muted" crash, and it did work :
> even though I have no logs in kern.log, netconsole managed to send them
> to my other machine before going down, so here it is :
>
> ---
> [ 5738.172692] BUG: unable to handle kernel NULL pointer dereference at
> 00000000000001f0
> [ 5738.172702] IP: [<ffffffffc03150db>] start_transaction+0x1b/0x580
> [btrfs]
> [ 5738.172719] PGD c0aa7067 PUD c0aa6067 PMD 0
> [ 5738.172723] Oops: 0000 [#1] SMP
> [ 5738.172726] Modules linked in: netconsole configfs xts gf128mul drbg
> ansi_cprng xt_multiport xt_comment xt_conntrack xt_nat xt_tcpudp
> nfnetlink_queue nfnetlink_log nfnetlink nf_conntrack_ftp
> nf_conntrack_sane iptable_security iptable_filter iptable_mangle
> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
> nf_conntrack iptable_raw ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs
> cmac dm_crypt rfcomm bnep lockd grace sunrpc fscache binfmt_misc
> intel_rapl snd_hda_codec_realtek iosf_mbi x86_pkg_temp_thermal
> intel_powerclamp kvm_intel snd_hda_codec_generic snd_hda_intel
> snd_hda_codec kvm eeepc_wmi asus_wmi snd_hda_core btusb sparse_keymap
> btrtl snd_hwdep btbcm snd_pcm btintel 8021q bluetooth snd_seq_midi
> dm_multipath snd_seq_midi_event garp snd_rawmidi mrp snd_seq stp llc
> snd_seq_device snd_timer crct10dif_pclmul crc32_pclmul snd
> ghash_clmulni_intel cryptd serio_raw soundcore mei_me mei lpc_ich shpchp
> mac_hid parport_pc ppdev nct6775 hwmon_vid coretemp lp parport btrfs
> raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
> async_tx xor raid6_pq raid0 multipath linear nbd raid1 i915 e1000e
> i2c_algo_bit drm_kms_helper syscopyarea ptp sysfillrect sysimgblt
> fb_sys_fops psmouse ahci drm libahci pps_core wmi video [last unloaded:
> netconsole]
> [ 5738.172831] CPU: 1 PID: 10932 Comm: kworker/u4:14 Not tainted
> 4.3.0-rc1 #1
> [ 5738.172833] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
> 01/06/2014
> [ 5738.172843] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
> [btrfs]
> [ 5738.172845] task: ffff8800c7010000 ti: ffff88006acf4000 task.ti:
> ffff88006acf4000
> [ 5738.172847] RIP: 0010:[<ffffffffc03150db>] [<ffffffffc03150db>]
> start_transaction+0x1b/0x580 [btrfs]
> [ 5738.172855] RSP: 0018:ffff88006acf7ca8 EFLAGS: 00010282
> [ 5738.172856] RAX: 0000000000000004 RBX: 0000000000000201 RCX:
> 0000000000000002
> [ 5738.172857] RDX: 0000000000000201 RSI: 0000000000000001 RDI:
> 0000000000000000
> [ 5738.172858] RBP: ffff88006acf7cf0 R08: ffff88010990eab0 R09:
> 00000001801c0017
> [ 5738.172860] R10: 000000000990e701 R11: ffffea0004264380 R12:
> 0000000000000000
> [ 5738.172861] R13: ffff8800c73a6e08 R14: ffff880027963800 R15:
> 0000160000000000
> [ 5738.172862] FS: 0000000000000000(0000) GS:ffff88011fb00000(0000)
> knlGS:0000000000000000
> [ 5738.172863] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5738.172864] CR2: 00000000000001f0 CR3: 0000000027a65000 CR4:
> 00000000000406e0
> [ 5738.172866] Stack:
> [ 5738.172867] ffff8800c73a6e08 ffff880027963800 0000160000000000
> ffff88006acf7ce8
> [ 5738.172871] 00000000000000be 00000000fffffffc ffff8800c73a6e08
> ffff880027963800
> [ 5738.172875] 0000160000000000 ffff88006acf7d00 ffffffffc031565b
> ffff88006acf7dc0
> [ 5738.172879] Call Trace:
> [ 5738.172887] [<ffffffffc031565b>] btrfs_start_transaction+0x1b/0x20
> [btrfs]
> [ 5738.172896] [<ffffffffc0378038>]
> btrfs_qgroup_rescan_worker+0x388/0x5a0 [btrfs]
Your netconsole backtrace is also of great value.
This one implies that either my rework caused some stupid bug (yeah, I
always make such bugs), or there's some existing, previously unexposed
rescan bug.
Would you please use gdb to show the code at
"btrfs_qgroup_rescan_worker+0x388"?
(You'll need kernel debuginfo.)
My guess is the following line (pretty sure, but not 100% sure):
------
        /*
         * only update status, since the previous part has already
         * updated the qgroup info.
         */
        trans = btrfs_start_transaction(fs_info->quota_root, 1); <<<<<
        if (IS_ERR(trans)) {
                err = PTR_ERR(trans);
                btrfs_err(fs_info,
                        "fail to start transaction for status update: %d\n",
                        err);
                goto done;
        }
------
But that means that at rescan time, fs_info->quota_root is still NULL,
which is quite weird.
I can add an extra check to avoid such a NULL pointer for now, but it's
better to review the existing rescan workflow, as I think there is a
race in how quota_root gets initialized.
You can also try the following hotfix patch to see if it works:
http://pastebin.com/966GQXPk
My concern is that this may cause qgroup rescan to exit without updating
its accounting info...
So I still need your help.
Or I can use your reproducer script to test it next Monday.
> [ 5738.172904] [<ffffffffc03444e0>] normal_work_helper+0xc0/0x270 [btrfs]
> [ 5738.172912] [<ffffffffc03448a2>]
> btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
> [ 5738.172915] [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
> [ 5738.172917] [<ffffffff8109192a>] worker_thread+0x11a/0x470
> [ 5738.172919] [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
> [ 5738.172921] [<ffffffff81097059>] kthread+0xc9/0xe0
> [ 5738.172923] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
> [ 5738.172926] [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
> [ 5738.172928] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
> [ 5738.172929] Code: 49 c1 e9 5c ff ff ff 66 0f 1f 84 00 00 00 00 00 0f
> 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 89 d3 48 83
> ec 20 <48> 8b 87 f0 01 00 00 48 8b 90 60 0e 00 00 83 e2 01 0f 85 86 00
> [ 5738.172973] RIP [<ffffffffc03150db>] start_transaction+0x1b/0x580
> [btrfs]
> [ 5738.172981] RSP <ffff88006acf7ca8>
> [ 5738.172982] CR2: 00000000000001f0
> [ 5738.172984] ---[ end trace 9feb85def1327ee9 ]---
> [ 5738.173010] BUG: unable to handle kernel paging request at
> ffffffffffffffd8
> [ 5738.173012] IP: [<ffffffff810977d0>] kthread_data+0x10/0x20
> [ 5738.173015] PGD 1c13067 PUD 1c15067 PMD 0
> [ 5738.173019] Oops: 0000 [#2] SMP
> [ 5738.173021] Modules linked in: netconsole configfs xts gf128mul drbg
> ---
>
> Clearly this is during a rescan.
>
>> As the machine is now, qgroups seem OK :
>>
>> ~# btrfs qgroup show -pcre --raw /tank/
>> qgroupid rfer excl max_rfer max_excl parent
>> child
>> -------- ---- ---- -------- -------- ------
>> -----
>> 0/5 32768 32768 none none --- ---
>> 0/1906 3315696058368 3315696058368 none none --- ---
>> 0/1909 249901842432 249901842432 none none --- ---
>> 0/1911 2109174587392 2109174587392 none none --- ---
>> 0/3270 47454601216 47454601216 none none --- ---
>> 0/3314 46408499200 32768 none none --- ---
>> 0/3341 14991097856 32768 none none --- ---
>> 0/3367 48371580928 48371580928 none none --- ---
>> 0/5335 56523751424 280592384 none none --- ---
>> 0/5336 60175253504 2599960576 none none --- ---
>> 0/5337 45751746560 250888192 none none --- ---
>> 0/5338 45804650496 186531840 none none --- ---
>> 0/5339 45875167232 190521344 none none --- ---
>> 0/5340 45933486080 327680 none none --- ---
>> 0/5341 45933502464 344064 none none --- ---
>> 0/5342 46442815488 35454976 none none --- ---
>> 0/5343 46442520576 30638080 none none --- ---
>> 0/5344 46448312320 36495360 none none --- ---
>> 0/5345 46425235456 86204416 none none --- ---
>> 0/5346 46081941504 119398400 none none --- ---
>> 0/5347 46402715648 55615488 none none --- ---
>> 0/5348 46403534848 50528256 none none --- ---
>> 0/5349 45486301184 91463680 none none --- ---
>> 0/5351 46414635008 393216 none none --- ---
>> 0/5352 46414667776 294912 none none --- ---
>> 0/5353 46414667776 294912 none none --- ---
>> 0/5354 46406148096 24829952 none none --- ---
>> 0/5355 46415986688 33103872 none none --- ---
>> 0/5356 46406262784 23216128 none none --- ---
>> 0/5357 46408245248 17408000 none none --- ---
>> 0/5358 46416052224 25280512 none none --- ---
>> 0/5359 46406336512 23158784 none none --- ---
>> 0/5360 46408335360 25157632 none none --- ---
>> 0/5361 46406402048 24395776 none none --- ---
>> 0/5362 46415273984 32260096 none none --- ---
>> 0/5363 46408499200 32768 none none --- ---
>> 0/5364 14949441536 139812864 none none --- ---
>> 0/5365 14996299776 176889856 none none --- ---
>> 0/5366 14958616576 143065088 none none --- ---
>> 0/5367 14919172096 100171776 none none --- ---
>> 0/5368 14945968128 142409728 none none --- ---
>> 0/5369 14991097856 32768 none none --- ---
>>
>>
>> But I'm pretty sure I can get that (u64)-1 value again by deleting
>> snapshots. Shall I ? Or do you have something else for me to run
>> before that ?
You have already done a great job in helping qgroups mature.
The negative numbers and 0 excl are somewhat expected for deleted
snapshots.
Good news is, 1) it doesn't affect valid (non-orphan) qgroups, and
2) Mark is already working on it.
I'll try to add a btrfs-progs hotfix for you to delete and rescan
qgroups to avoid such problems.
>>
>> So, as a quick summary of this big thread, it seems I've been hitting
>> 3 bugs, all reproductible :
>> - kernel BUG on balance (this original thread)
For this, I can't provide much help, as extent backref bugs are quite
hard to debug, unless a developer is interested in it and finds a stable
way to reproduce it.
The other two are explained or have the hotfix mentioned above.
Thanks,
Qu
>> - negative or zero "excl" qgroups
>> - hard freezes without kernel trace when playing with snapshots and quota
>>
>> Still available to dig deeper where needed.
>
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-20 1:22 ` Qu Wenruo
@ 2015-09-20 10:35 ` Stéphane Lesimple
2015-09-20 10:51 ` Qu Wenruo
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-20 10:35 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs
On 2015-09-20 03:22, Qu Wenruo wrote:
>>> The mentioned steps are as follows :
>>>
>>> 0) Rsync data from the next ext4 "snapshot" to the subvolume
>>> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
>>> 2) Create the needed readonly snapshot on btrfs
>>> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
>>> 4) Avoid doing IO if possible until step 6)
>>> 5) Do 'btrfs quota rescan -w' and save it <==
>>> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
>>>
>>> The resulting files are available here:
>>> http://speed47.net/tmp2/qgroup.tar.gz
>>> Run2 is the more complete one; during run1 the machine crashed even
>>> faster.
>>> It's interesting to note, however, that it seems to have crashed the
>>> same way and at the same step in the process.
>
> Your data really helps a lot!!
>
> And the good news is, the qgroup accounting part is working as expected.
> Although I only checked steps 1/3/6 of about 5 snapshots, they are all
> OK.
>
> I can make a script to cross-check them, but from the last few results,
> I think qgroup works fine.
>
> I'm more confident about the negative numbers, which should be a result
> of deleted subvolumes; the real problem is that such qgroups are not
> handled well by qgroup rescan.
I agree with your analysis, this matches what I observed.
> I'll try to add a hotfix for such cases if needed.
> But right now, I don't have a good idea for it until Mark's
> subtree-rescan work lands.
>
> Maybe I can add a new option for btrfs-progs to automatically remove
> the qgroup and trigger a rescan?
Until this is properly fixed in the kernel code (and it's good news to
know that you and Mark are working on it), this would be a good
workaround, yes!
>> [ 5738.172879] Call Trace:
>> [ 5738.172887] [<ffffffffc031565b>] btrfs_start_transaction+0x1b/0x20
>> [btrfs]
>> [ 5738.172896] [<ffffffffc0378038>]
>> btrfs_qgroup_rescan_worker+0x388/0x5a0 [btrfs]
>
> Your netconsole backtrace is also of great value.
> This one implies that either my rework caused some stupid bug (yeah, I
> always make such bugs), or there's some existing, previously unexposed
> rescan bug.
>
> Would you please use gdb to show the code at
> "btrfs_qgroup_rescan_worker+0x388"?
> (You'll need kernel debuginfo.)
>
> My guess is the following line (pretty sure, but not 100%):
> ------
> /*
>  * only update status, since the previous part has alreay updated the
>  * qgroup info.
>  */
> trans = btrfs_start_transaction(fs_info->quota_root, 1);  <<<<<
> if (IS_ERR(trans)) {
>         err = PTR_ERR(trans);
>         btrfs_err(fs_info,
>                   "fail to start transaction for status update: %d\n",
>                   err);
>         goto done;
> }
> ------
The kernel and modules were already compiled with debuginfo.
However, for some reason I couldn't get the gdb disassembly of
/proc/kcore properly aligned with the source I compiled: the asm code
doesn't match the C code shown by gdb. In any case, looking at the
source of this function, this is the only place btrfs_start_transaction
is called, so we can be 100% sure that's indeed where the crash happens.
> But that means, at rescan time, fs_info->quota_root is still NULL,
> which is quite weird.
> I can add an extra check to avoid such a NULL pointer for now, but it's
> better to review the existing rescan workflow, as I think there is
> some race in initializing quota_root.
>
> You can also try the following hotfix patch to see if it works:
> http://pastebin.com/966GQXPk
>
> My concern is, this may cause qgroup rescan to exit without updating
> its accounting info...
>
> So still need your help.
> Or I can use your reproducer script to test it next Monday.
Compiling with your patch now, amended with a little printk to tell
whether the execution flow enters the added if condition. Will let you
know about the results.
>>> But I'm pretty sure I can get that (u64)-1 value again by deleting
>>> snapshots. Shall I ? Or do you have something else for me to run
>>> before that ?
>
> You have already done a great job in helping to mature qgroups.
> The negative numbers and 0 excl are somewhat expected for deleted snapshots.
>
> Good news is, 1) it doesn't affect valid (non-orphan) qgroups.
> 2) Mark is already working on it.
>
> I'll try to add a btrfs-progs hotfix for you to delete and rescan
> qgroups to avoid such problems.
That would be good !
>>> So, as a quick summary of this big thread, it seems I've been hitting
>>> 3 bugs, all reproducible:
>>> - kernel BUG on balance (this original thread)
>
> For this, I can't provide much help, as the extent backref bug is quite
> hard to debug, unless a developer is interested in it and finds a
> stable way to reproduce it.
Yes, unfortunately. As it looks so much like a race condition, I know I
can reproduce it with my workflow, but it can take between 1 minute and
12 hours, so I wouldn't call it a "stable way" to reproduce it,
unfortunately :(
Still, if any dev is interested in it, I can reproduce it, with a
patched kernel if needed.
> The other two are explained or have hotfixes mentioned above.
And thanks for that, will keep you posted.
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-20 10:35 ` Stéphane Lesimple
@ 2015-09-20 10:51 ` Qu Wenruo
2015-09-20 11:14 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-20 10:51 UTC (permalink / raw)
To: Stéphane Lesimple; +Cc: Qu Wenruo, linux-btrfs
On 2015-09-20 18:35, Stéphane Lesimple wrote:
> On 2015-09-20 03:22, Qu Wenruo wrote:
>>>> The mentioned steps are as follows :
>>>>
>>>> 0) Rsync data from the next ext4 "snapshot" to the subvolume
>>>> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
>>>> 2) Create the needed readonly snapshot on btrfs
>>>> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
>>>> 4) Avoid doing IO if possible until step 6)
>>>> 5) Do 'btrfs quota rescan -w' and save it <==
>>>> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
>>>>
>>>> The resulting files are available here:
>>>> http://speed47.net/tmp2/qgroup.tar.gz
>>>> The run2 is the more complete one, during run1 the machine crashed
>>>> even faster.
>>>> It's interesting to note, however, that it seems to have crashed the
>>>> same way and at the same step in the process.
>>
>> Your data really helps a lot!!
>>
>> And the good news is, the qgroup accounting part is working as expected.
>> Although I only checked the step 1/3/6 outputs for about 5 snapshots, they are all OK.
>>
>> I can make a script to cross check them, but from the last few result,
>> I think qgroup works fine.
>>
>> I'm fairly confident the negative numbers are a result of deleted
>> subvolumes; the real problem is that such qgroups are not handled
>> well by qgroup rescan.
>
> I agree with your analysis, this matches what I observed.
>
>> I'll try to add a hot fix for such case if needed.
>> But right now, I don't have a good idea for it until Mark's work of
>> rescan subtree.
>>
>> Maybe I can add a new option for btrfs-progs to automatically remove
>> the qgroup and trigger a rescan?
>
> Until this is properly fixed in the kernel code, and this is good news to
> know Mark and you are working on it, this would be a good workaround yes!
>
>>> [ 5738.172879] Call Trace:
>>> [ 5738.172887] [<ffffffffc031565b>] btrfs_start_transaction+0x1b/0x20
>>> [btrfs]
>>> [ 5738.172896] [<ffffffffc0378038>]
>>> btrfs_qgroup_rescan_worker+0x388/0x5a0 [btrfs]
>>
>> Your netconsole backtrace is also of great value.
>> This one implies that my rework introduced another stupid bug
>> (yeah, I always make such bugs), or exposed an existing rescan bug.
>>
>> Would you please use gdb to show the codes of
>> "btrfs_qgroup_rescan_worker+0x388" ?
>> (Need kernel debuginfo)
>>
>> My guess is the following line:(pretty sure, but not 100% sure)
>> ------
>> /*
>>  * only update status, since the previous part has alreay updated the
>>  * qgroup info.
>>  */
>> trans = btrfs_start_transaction(fs_info->quota_root, 1);  <<<<<
>> if (IS_ERR(trans)) {
>>         err = PTR_ERR(trans);
>>         btrfs_err(fs_info,
>>                   "fail to start transaction for status update: %d\n",
>>                   err);
>>         goto done;
>> }
>> ------
>
> The kernel and modules were already compiled with debuginfo.
> However, for some reason I couldn't get the gdb disassembly of
> /proc/kcore properly aligned with the source I compiled: the asm code
> doesn't match the C code shown by gdb. In any case, looking at the
> source of this function, this is the only place btrfs_start_transaction
> is called, so we can be 100% sure that's indeed where the crash happens.
Yep, that's the only caller.
Here is a useful small hint for locating the code, if you are
interested in kernel development.
# Not sure about whether ubuntu gzipped modules, at least Arch does
# compress it
$ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
$ gunzip /tmp/btrfs.ko.gz
$ gdb /tmp/btrfs.ko
# Make sure gdb reads all the needed debuginfo
(gdb) list *(btrfs_qgroup_rescan_worker+0x388)
And gdb will find the code position for you.
It's quite easy; only the backtrace info is needed.
Another hint is about how to collect kernel crash info.
Your netconsole setup is definitely one good practice.
Another one I use to collect crash info is kdump.
Ubuntu should have a good wiki on it.
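For readers who have not set up netconsole: it streams printk output over UDP to another host, so an oops that takes the box down can still be captured remotely. A minimal sketch, with placeholder addresses and interface names (substitute your own):

```shell
# On the crashing box (as root; all values below are placeholders):
#   modprobe netconsole netconsole=6665@192.168.0.10/eth0,6666@192.168.0.2/00:11:22:33:44:55
# Parameter format: [src-port]@[src-ip]/[dev],[tgt-port]@[tgt-ip]/[tgt-mac]
#
# On the logging box, capture the UDP stream:
#   nc -lu 6666 | tee netconsole.log    # or: nc -l -u -p 6666 (traditional netcat)
#
# Sanity-check the parameter string before loading the module:
ARGS="netconsole=6665@192.168.0.10/eth0,6666@192.168.0.2/00:11:22:33:44:55"
echo "$ARGS" | grep -Eq '^netconsole=[0-9]+@[0-9.]+/[a-z0-9]+,[0-9]+@[0-9.]+/([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$' \
    && echo "netconsole argument looks well-formed"
```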
>
>> But that means, at rescan time, fs_info->quota_root is still NULL,
>> which is quite weird.
>> I can add an extra check to avoid such a NULL pointer for now, but it's
>> better to review the existing rescan workflow, as I think there is
>> some race in initializing quota_root.
>>
>> You can also try the following hotfix patch to see if it works:
>> http://pastebin.com/966GQXPk
>>
>> My concern is, this may cause qgroup rescan to exit without updating
>> its accounting info...
>>
>> So still need your help.
>> Or I can use your reproducer script to test it next Monday.
>
> Compiling with your patch, just amended of a little printk to know if
> the execution
> flow enters the added if condition. Will let you know about the results.
>
>>>> But I'm pretty sure I can get that (u64)-1 value again by deleting
>>>> snapshots. Shall I ? Or do you have something else for me to run
>>>> before that ?
>>
>> You have already done a great job in helping maturing qgroups.
>> The minus number and 0 excl is somewhat expected for deleted snapshots.
>>
>> Good news is, 1) it doesn't affect valid(non-orphan) qgroup.
>> 2) Mark is already working on it.
>>
>> I'll try to add a btrfs-progs hotfix for you to delete and rescan
>> qgroups to avoid such problem.
>
> That would be good !
>
>>>> So, as a quick summary of this big thread, it seems I've been hitting
>>>> 3 bugs, all reproducible:
>>>> - kernel BUG on balance (this original thread)
>>
>> For this, I can't provide much help, as extent backref bug is quite
>> hard to debug, unless a developer is interested in it and find a
>> stable way to reproduce it.
>
> Yes, unfortunately. As it looks so much like a race condition, I know I can
> reproduce it with my workflow, but it can take between 1 minute and 12
> hours, so I wouldn't call it a "stable way" to reproduce it,
> unfortunately :(
>
> Still if any dev is interested in it, I can reproduce it, with a patched
> kernel if needed.
Maybe you are already doing it, but you can compile only the btrfs
module, which is far faster than compiling the whole kernel, provided
the resulting module can be loaded.
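The module-only rebuild suggested here can be sketched as follows. The make steps must be run from inside the kernel source tree that produced the running kernel, and the /tank mount point is an assumption; the dry-run guard makes the sketch print the commands rather than execute them:

```shell
#!/bin/sh
# Sketch: rebuild and swap in only the btrfs module.
# Run the make steps from inside the matching kernel source tree;
# /tank is an assumption. DRY_RUN=1 (the default) just prints commands.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

run make modules_prepare     # prepare the tree for a partial module build
run make M=fs/btrfs          # compiles only fs/btrfs/btrfs.ko
# swapping the module requires every btrfs filesystem to be unmounted first
run umount /tank
run rmmod btrfs
run insmod fs/btrfs/btrfs.ko
run mount /tank
```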
Thanks,
Qu
>
>> The rest two are explained or have hot fix mentioned above.
>
> And thanks for that, will keep you posted.
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-20 10:51 ` Qu Wenruo
@ 2015-09-20 11:14 ` Stéphane Lesimple
2015-09-22 1:30 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-20 11:14 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs
On 2015-09-20 12:51, Qu Wenruo wrote:
>>> Would you please use gdb to show the codes of
>>> "btrfs_qgroup_rescan_worker+0x388" ?
>>> (Need kernel debuginfo)
>>>
>>> My guess is the following line:(pretty sure, but not 100% sure)
>>> ------
>>> /*
>>> * only update status, since the previous part has alreay
>>> updated the
>>> * qgroup info.
>>> */
>>> trans = btrfs_start_transaction(fs_info->quota_root, 1);
>>> <<<<<
>>> if (IS_ERR(trans)) {
>>> err = PTR_ERR(trans);
>>> btrfs_err(fs_info,
>>> "fail to start transaction for status
>>> update: %d\n",
>>> err);
>>> goto done;
>>> }
>>> ------
>>
>> The kernel and modules were already compiled with debuginfo.
>> However for some reason, I couldn't get gdb disassembly of /proc/kcore
>> properly
>> aligned with the source I compiled: the asm code doesn't match the C
>> code shown
>> by gdb. In any case, watching the source of this function, this is the
>> only place
>> btrfs_start_transaction is called, so we can be 100% sure it's where
>> the
>> crash
>> happens indeed.
>
> Yep, that's the only caller.
>
> Here is a useful small hint for locating the code, if you are
> interested in kernel development.
>
> # Not sure about whether ubuntu gzipped modules, at least Arch does
> # compress it
> $ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
> $ gunzip /tmp/btrfs.ko.gz
> $ gdb /tmp/btrfs.ko
> # Make sure gdb reads all the needed debuginfo
> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
>
> And gdb will find the code position for you.
> Quite easy one, only backtrace info is needed.
Ah, thanks for the tips. I was loading the whole vmlinux and using
/proc/kcore as the core info, then adding the module with
"add-symbol-file". But as we're just looking for the code and not the
variables, it was indeed completely overkill.
(gdb) list *(btrfs_qgroup_rescan_worker+0x388)
0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
2323
2324            /*
2325             * only update status, since the previous part has alreay updated the
2326             * qgroup info.
2327             */
2328            trans = btrfs_start_transaction(fs_info->quota_root, 1);
2329            if (IS_ERR(trans)) {
2330                    err = PTR_ERR(trans);
2331                    btrfs_err(fs_info,
2332                              "fail to start transaction for status update: %d\n",
So this just confirms what we were already 99% sure of.
> Another hint is about how to collect the kernel crash info.
> Your netconsole setup would be definitely one good practice.
>
> Another one I use to collect crash info is kdump.
> Ubuntu should have a good wiki on it.
I've already come across kdump a few times, but never really looked
into it. To debug the other complicated extent backref bug, it could be
of some use.
>>>>> So, as a quick summary of this big thread, it seems I've been
>>>>> hitting
>>>>> 3 bugs, all reproducible:
>>>>> - kernel BUG on balance (this original thread)
>>>
>>> For this, I can't provide much help, as extent backref bug is quite
>>> hard to debug, unless a developer is interested in it and find a
>>> stable way to reproduce it.
>>
>> Yes, unfortunately as it looks so much like a race condition, I know I
>> can
>> reproduce it with my workflow, but it can take between 1 minute and 12
>> hours,
>> so I wouldn't call it a "stable way" to reproduce it unfortunately :(
>>
>> Still if any dev is interested in it, I can reproduce it, with a
>> patched
>> kernel if needed.
>
> Maybe you are already doing it, but you can compile only the btrfs
> module, which is far faster than compiling the whole kernel, provided
> the resulting module can be loaded.
Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so I'll
try to load the modified module and see if the running kernel accepts
it. I have to rmmod the loaded module first, hence unmounting any btrfs
fs before that. Should be able to do it in a couple of hours.
I'll delete all my snapshots again and run my script. Should be easy to
trigger the (hopefully worked-around) bug again.
Regards,
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-20 11:14 ` Stéphane Lesimple
@ 2015-09-22 1:30 ` Stéphane Lesimple
2015-09-22 1:37 ` Qu Wenruo
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-22 1:30 UTC (permalink / raw)
To: Stéphane Lesimple; +Cc: Qu Wenruo, Qu Wenruo, linux-btrfs
On 2015-09-20 13:14, Stéphane Lesimple wrote:
> On 2015-09-20 12:51, Qu Wenruo wrote:
>>>> Would you please use gdb to show the codes of
>>>> "btrfs_qgroup_rescan_worker+0x388" ?
>>>> (Need kernel debuginfo)
>>>>
>>>> My guess is the following line:(pretty sure, but not 100% sure)
>>>> ------
>>>> /*
>>>> * only update status, since the previous part has alreay
>>>> updated the
>>>> * qgroup info.
>>>> */
>>>> trans = btrfs_start_transaction(fs_info->quota_root, 1);
>>>> <<<<<
>>>> if (IS_ERR(trans)) {
>>>> err = PTR_ERR(trans);
>>>> btrfs_err(fs_info,
>>>> "fail to start transaction for status
>>>> update: %d\n",
>>>> err);
>>>> goto done;
>>>> }
>>>> ------
>>>
>>> The kernel and modules were already compiled with debuginfo.
>>> However for some reason, I couldn't get gdb disassembly of
>>> /proc/kcore
>>> properly
>>> aligned with the source I compiled: the asm code doesn't match the C
>>> code shown
>>> by gdb. In any case, watching the source of this function, this is
>>> the
>>> only place
>>> btrfs_start_transaction is called, so we can be 100% sure it's where
>>> the
>>> crash
>>> happens indeed.
>>
>> Yep, that's the only caller.
>>
>> Here is a useful small hint for locating the code, if you are
>> interested in kernel development.
>>
>> # Not sure about whether ubuntu gzipped modules, at least Arch does
>> # compress it
>> $ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
>> $ gunzip /tmp/btrfs.ko.gz
>> $ gdb /tmp/btrfs.ko
>> # Make sure gdb reads all the needed debuginfo
>> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
>>
>> And gdb will find the code position for you.
>> Quite easy one, only backtrace info is needed.
>
> Ah, thanks for the tips, I was loading whole vmlinux and using
> /proc/kcore
> as the core info, then adding the module with "add-symbol-file". But as
> we're just looking for the code and not the variables, it was indeed
> completely overkill.
>
> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
> 0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
> 2323
> 2324 /*
> 2325 * only update status, since the previous part has
> alreay updated the
> 2326 * qgroup info.
> 2327 */
> 2328 trans = btrfs_start_transaction(fs_info->quota_root,
> 1);
> 2329 if (IS_ERR(trans)) {
> 2330 err = PTR_ERR(trans);
> 2331 btrfs_err(fs_info,
> 2332 "fail to start transaction for
> status update: %d\n",
>
> So this just confirms what we were already 99% sure of.
>
>> Another hint is about how to collect the kernel crash info.
>> Your netconsole setup would be definitely one good practice.
>>
>> Another one I use to collect crash info is kdump.
>> Ubuntu should have a good wiki on it.
>
> I've already come across kdump a few times, but never really look into
> it.
> To debug the other complicated extend backref bug, it could be of some
> use.
>
>>>>>> So, as a quick summary of this big thread, it seems I've been
>>>>>> hitting
>>>>>> 3 bugs, all reproducible:
>>>>>> - kernel BUG on balance (this original thread)
>>>>
>>>> For this, I can't provide much help, as extent backref bug is quite
>>>> hard to debug, unless a developer is interested in it and find a
>>>> stable way to reproduce it.
>>>
>>> Yes, unfortunately as it looks so much like a race condition, I know
>>> I can
>>> reproduce it with my workflow, but it can take between 1 minute and 12
>>> hours,
>>> so I wouldn't call it a "stable way" to reproduce it unfortunately :(
>>>
>>> Still if any dev is interested in it, I can reproduce it, with a
>>> patched
>>> kernel if needed.
>>
>> Maybe you are already doing it, you can only compile the btrfs
>> modules, which will be far more faster than compile the whole kernel,
>> if and only if the compiled module can be loaded.
>
> Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so I'll
> try to
> load the modified module and see if the running kernel accepts it. I
> have to rmmod
> the loaded module first, hence umounting any btrfs fs before that.
> Should be able
> to do it in a couple hours.
>
> I'll delete again all my snapshots and run my script. Should be easy to
> trigger
> the (hopefully worked-around) bug again.
Well, I didn't trigger this exact bug, but another one, no less severe
though, as it also crashed the system:
[92098.841309] general protection fault: 0000 [#1] SMP
[92098.841338] Modules linked in: ...
[92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
4.3.0-rc1 #1
[92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
01/06/2014
[92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
[btrfs]
[92098.841889] task: ffff8800b6cc4100 ti: ffff8800a3dc8000 task.ti:
ffff8800a3dc8000
[92098.841910] RIP: 0010:[<ffffffff813ae6c6>] [<ffffffff813ae6c6>]
memcpy_erms+0x6/0x10
[92098.841935] RSP: 0018:ffff8800a3dcbcc8 EFLAGS: 00010207
[92098.841950] RAX: ffff8800a3dcbd67 RBX: 0000000000000009 RCX:
0000000000000009
[92098.841970] RDX: 0000000000000009 RSI: 0005080000000000 RDI:
ffff8800a3dcbd67
[92098.841989] RBP: ffff8800a3dcbd00 R08: 0000000000019c60 R09:
ffff88011fb19c60
[92098.842009] R10: ffffea0003006480 R11: 0000000001000000 R12:
ffff8800b76c32c0
[92098.842028] R13: 0000160000000000 R14: ffff8800a3dcbd70 R15:
0000000000000009
[92098.842048] FS: 0000000000000000(0000) GS:ffff88011fb00000(0000)
knlGS:0000000000000000
[92098.842070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[92098.842086] CR2: 00007fe1f2bd8000 CR3: 0000000001c10000 CR4:
00000000000406e0
[92098.842105] Stack:
[92098.842111] ffffffffc035a5d8 ffffffffc0396d00 000000000000028b
0000000000000000
[92098.842212] 0000cc6c00000000 ffff8800b76c3200 0000160000000000
ffff8800a3dcbdc0
[92098.842237] ffffffffc039af3d ffff8800c7196dc8 ffff8800c7196e08
ffff8800c7196da0
[92098.842261] Call Trace:
[92098.842277] [<ffffffffc035a5d8>] ? read_extent_buffer+0xb8/0x110
[btrfs]
[92098.842304] [<ffffffffc0396d00>] ? btrfs_find_all_roots+0x60/0x70
[btrfs]
[92098.842329] [<ffffffffc039af3d>]
btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
[92098.842351] [<ffffffff810a1a0d>] ?
ttwu_do_activate.constprop.90+0x5d/0x70
[92098.842377] [<ffffffffc03674e0>] normal_work_helper+0xc0/0x270
[btrfs]
[92098.842401] [<ffffffffc03678a2>]
btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
[92098.842421] [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
[92098.842438] [<ffffffff8109192a>] worker_thread+0x11a/0x470
[92098.842454] [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
[92098.842471] [<ffffffff81097059>] kthread+0xc9/0xe0
[92098.842485] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
[92098.842502] [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
[92098.842517] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
[92098.842532] Code: ff eb eb 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48
c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48
89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
[92098.842658] RIP [<ffffffff813ae6c6>] memcpy_erms+0x6/0x10
[92098.842675] RSP <ffff8800a3dcbcc8>
[92098.849594] ---[ end trace 9d5fb7931a3ec713 ]---
I would definitely say that rescans should be avoided on current
kernels, as the possibility that they'll bring the system down
shouldn't be ignored.
It confirms that this code really needs a rewrite!
Regards,
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-22 1:30 ` Stéphane Lesimple
@ 2015-09-22 1:37 ` Qu Wenruo
2015-09-22 7:34 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-22 1:37 UTC (permalink / raw)
To: Stéphane Lesimple; +Cc: Qu Wenruo, linux-btrfs
Stéphane Lesimple wrote on 2015/09/22 03:30 +0200:
> On 2015-09-20 13:14, Stéphane Lesimple wrote:
>> On 2015-09-20 12:51, Qu Wenruo wrote:
>>>>> Would you please use gdb to show the codes of
>>>>> "btrfs_qgroup_rescan_worker+0x388" ?
>>>>> (Need kernel debuginfo)
>>>>>
>>>>> My guess is the following line:(pretty sure, but not 100% sure)
>>>>> ------
>>>>> /*
>>>>> * only update status, since the previous part has alreay
>>>>> updated the
>>>>> * qgroup info.
>>>>> */
>>>>> trans = btrfs_start_transaction(fs_info->quota_root, 1); <<<<<
>>>>> if (IS_ERR(trans)) {
>>>>> err = PTR_ERR(trans);
>>>>> btrfs_err(fs_info,
>>>>> "fail to start transaction for status
>>>>> update: %d\n",
>>>>> err);
>>>>> goto done;
>>>>> }
>>>>> ------
>>>>
>>>> The kernel and modules were already compiled with debuginfo.
>>>> However for some reason, I couldn't get gdb disassembly of /proc/kcore
>>>> properly
>>>> aligned with the source I compiled: the asm code doesn't match the C
>>>> code shown
>>>> by gdb. In any case, watching the source of this function, this is the
>>>> only place
>>>> btrfs_start_transaction is called, so we can be 100% sure it's where
>>>> the
>>>> crash
>>>> happens indeed.
>>>
>>> Yep, that's the only caller.
>>>
>>> Here is a useful small hint for locating the code, if you are
>>> interested in kernel development.
>>>
>>> # Not sure about whether ubuntu gzipped modules, at least Arch does
>>> # compress it
>>> $ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
>>> $ gunzip /tmp/btrfs.ko.gz
>>> $ gdb /tmp/btrfs.ko
>>> # Make sure gdb reads all the needed debuginfo
>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
>>>
>>> And gdb will find the code position for you.
>>> Quite easy one, only backtrace info is needed.
>>
>> Ah, thanks for the tips, I was loading whole vmlinux and using
>> /proc/kcore
>> as the core info, then adding the module with "add-symbol-file". But as
>> we're just looking for the code and not the variables, it was indeed
>> completely overkill.
>>
>> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
>> 0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
>> 2323
>> 2324 /*
>> 2325 * only update status, since the previous part has
>> alreay updated the
>> 2326 * qgroup info.
>> 2327 */
>> 2328 trans = btrfs_start_transaction(fs_info->quota_root, 1);
>> 2329 if (IS_ERR(trans)) {
>> 2330 err = PTR_ERR(trans);
>> 2331 btrfs_err(fs_info,
>> 2332 "fail to start transaction for
>> status update: %d\n",
>>
>> So this just confirms what we were already 99% sure of.
>>
>>> Another hint is about how to collect the kernel crash info.
>>> Your netconsole setup would be definitely one good practice.
>>>
>>> Another one I use to collect crash info is kdump.
>>> Ubuntu should have a good wiki on it.
>>
>> I've already come across kdump a few times, but never really look into
>> it.
>> To debug the other complicated extend backref bug, it could be of some
>> use.
>>
>>>>>>> So, as a quick summary of this big thread, it seems I've been
>>>>>>> hitting
>>>>>>> 3 bugs, all reproducible:
>>>>>>> - kernel BUG on balance (this original thread)
>>>>>
>>>>> For this, I can't provide much help, as extent backref bug is quite
>>>>> hard to debug, unless a developer is interested in it and find a
>>>>> stable way to reproduce it.
>>>>
>>>> Yes, unfortunately as it looks so much like a race condition, I know
>>>> I can
>>>> reproduce it with my workflow, but it can take between 1 minute and 12
>>>> hours,
>>>> so I wouldn't call it a "stable way" to reproduce it unfortunately :(
>>>>
>>>> Still if any dev is interested in it, I can reproduce it, with a
>>>> patched
>>>> kernel if needed.
>>>
>>> Maybe you are already doing it, but you can compile only the btrfs
>>> module, which is far faster than compiling the whole kernel, provided
>>> the resulting module can be loaded.
>>
>> Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so
>> I'll try to
>> load the modified module and see if the running kernel accepts it. I
>> have to rmmod
>> the loaded module first, hence umounting any btrfs fs before that.
>> Should be able
>> to do it in a couple hours.
>>
>> I'll delete again all my snapshots and run my script. Should be easy
>> to trigger
>> the (hopefully worked-around) bug again.
>
> Well, I didn't trigger this exact bug, but another one, not less severe
> though, as it also crashed the system:
>
> [92098.841309] general protection fault: 0000 [#1] SMP
> [92098.841338] Modules linked in: ...
> [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
> 4.3.0-rc1 #1
> [92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
> 01/06/2014
> [92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
> [btrfs]
> [92098.841889] task: ffff8800b6cc4100 ti: ffff8800a3dc8000 task.ti:
> ffff8800a3dc8000
> [92098.841910] RIP: 0010:[<ffffffff813ae6c6>] [<ffffffff813ae6c6>]
> memcpy_erms+0x6/0x10
> [92098.841935] RSP: 0018:ffff8800a3dcbcc8 EFLAGS: 00010207
> [92098.841950] RAX: ffff8800a3dcbd67 RBX: 0000000000000009 RCX:
> 0000000000000009
> [92098.841970] RDX: 0000000000000009 RSI: 0005080000000000 RDI:
> ffff8800a3dcbd67
> [92098.841989] RBP: ffff8800a3dcbd00 R08: 0000000000019c60 R09:
> ffff88011fb19c60
> [92098.842009] R10: ffffea0003006480 R11: 0000000001000000 R12:
> ffff8800b76c32c0
> [92098.842028] R13: 0000160000000000 R14: ffff8800a3dcbd70 R15:
> 0000000000000009
> [92098.842048] FS: 0000000000000000(0000) GS:ffff88011fb00000(0000)
> knlGS:0000000000000000
> [92098.842070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [92098.842086] CR2: 00007fe1f2bd8000 CR3: 0000000001c10000 CR4:
> 00000000000406e0
> [92098.842105] Stack:
> [92098.842111] ffffffffc035a5d8 ffffffffc0396d00 000000000000028b
> 0000000000000000
> [92098.842212] 0000cc6c00000000 ffff8800b76c3200 0000160000000000
> ffff8800a3dcbdc0
> [92098.842237] ffffffffc039af3d ffff8800c7196dc8 ffff8800c7196e08
> ffff8800c7196da0
> [92098.842261] Call Trace:
> [92098.842277] [<ffffffffc035a5d8>] ? read_extent_buffer+0xb8/0x110
> [btrfs]
> [92098.842304] [<ffffffffc0396d00>] ? btrfs_find_all_roots+0x60/0x70
> [btrfs]
> [92098.842329] [<ffffffffc039af3d>]
> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
Would you please show the code at that offset too?
This one seems to be another stupid bug I made when rewriting the
framework. Maybe I forgot to reinit some variable, or I'm corrupting
memory...
Thanks,
Qu
> [92098.842351] [<ffffffff810a1a0d>] ?
> ttwu_do_activate.constprop.90+0x5d/0x70
> [92098.842377] [<ffffffffc03674e0>] normal_work_helper+0xc0/0x270 [btrfs]
> [92098.842401] [<ffffffffc03678a2>]
> btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
> [92098.842421] [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
> [92098.842438] [<ffffffff8109192a>] worker_thread+0x11a/0x470
> [92098.842454] [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
> [92098.842471] [<ffffffff81097059>] kthread+0xc9/0xe0
> [92098.842485] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
> [92098.842502] [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
> [92098.842517] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
> [92098.842532] Code: ff eb eb 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48
> c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48
> 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
> [92098.842658] RIP [<ffffffff813ae6c6>] memcpy_erms+0x6/0x10
> [92098.842675] RSP <ffff8800a3dcbcc8>
> [92098.849594] ---[ end trace 9d5fb7931a3ec713 ]---
>
> I would definitely say that rescans should be avoided on current kernels
> as the possibility that it'll bring the system down shouldn't be ignored.
> It confirms that this code really needs a rewrite !
>
> Regards,
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-22 1:37 ` Qu Wenruo
@ 2015-09-22 7:34 ` Stéphane Lesimple
2015-09-22 8:40 ` Qu Wenruo
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-22 7:34 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs
On 2015-09-22 03:37, Qu Wenruo wrote:
> Stéphane Lesimple wrote on 2015/09/22 03:30 +0200:
> On 2015-09-20 13:14, Stéphane Lesimple wrote:
>> On 2015-09-20 12:51, Qu Wenruo wrote:
>>>>>> Would you please use gdb to show the codes of
>>>>>> "btrfs_qgroup_rescan_worker+0x388" ?
>>>>>> (Need kernel debuginfo)
>>>>>>
>>>>>> My guess is the following line:(pretty sure, but not 100% sure)
>>>>>> ------
>>>>>> /*
>>>>>> * only update status, since the previous part has alreay
>>>>>> updated the
>>>>>> * qgroup info.
>>>>>> */
>>>>>> trans = btrfs_start_transaction(fs_info->quota_root, 1);
>>>>>> <<<<<
>>>>>> if (IS_ERR(trans)) {
>>>>>> err = PTR_ERR(trans);
>>>>>> btrfs_err(fs_info,
>>>>>> "fail to start transaction for status
>>>>>> update: %d\n",
>>>>>> err);
>>>>>> goto done;
>>>>>> }
>>>>>> ------
>>>>>
>>>>> The kernel and modules were already compiled with debuginfo.
>>>>> However for some reason, I couldn't get gdb disassembly of
>>>>> /proc/kcore
>>>>> properly
>>>>> aligned with the source I compiled: the asm code doesn't match the
>>>>> C
>>>>> code shown
>>>>> by gdb. In any case, watching the source of this function, this is
>>>>> the
>>>>> only place
>>>>> btrfs_start_transaction is called, so we can be 100% sure it's
>>>>> where
>>>>> the
>>>>> crash
>>>>> happens indeed.
>>>>
>>>> Yep, that's the only caller.
>>>>
>>>> Here is a useful small hint for locating the code, if you are
>>>> interested in kernel development.
>>>>
>>>> # Not sure about whether ubuntu gzipped modules, at least Arch does
>>>> # compress it
>>>> $ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
>>>> $ gunzip /tmp/btrfs.ko.gz
>>>> $ gdb /tmp/btrfs.ko
>>>> # Make sure gdb reads all the needed debuginfo
>>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
>>>>
>>>> And gdb will find the code position for you.
>>>> Quite easy one, only backtrace info is needed.
>>>
>>> Ah, thanks for the tips. I was loading the whole vmlinux and using
>>> /proc/kcore as the core info, then adding the module with
>>> "add-symbol-file". But as we're just looking for the code and not
>>> the variables, it was indeed completely overkill.
>>>
>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
>>> 0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
>>> 2323
>>> 2324 /*
>>> 2325 * only update status, since the previous part has
>>> alreay updated the
>>> 2326 * qgroup info.
>>> 2327 */
>>> 2328 trans = btrfs_start_transaction(fs_info->quota_root,
>>> 1);
>>> 2329 if (IS_ERR(trans)) {
>>> 2330 err = PTR_ERR(trans);
>>> 2331 btrfs_err(fs_info,
>>> 2332 "fail to start transaction for
>>> status update: %d\n",
>>>
>>> So this just confirms what we were already 99% sure of.
>>>
>>>> Another hint is about how to collect the kernel crash info.
>>>> Your netconsole setup would definitely be one good practice.
>>>>
>>>> Another one I use to collect crash info is kdump.
>>>> Ubuntu should have a good wiki on it.
>>>
>>> I've already come across kdump a few times, but never really looked
>>> into it. To debug the other complicated extent backref bug, it could
>>> be of some use.
>>>
>>>>>>>> So, as a quick summary of this big thread, it seems I've been
>>>>>>>> hitting
>>>>>>>> 3 bugs, all reproducible:
>>>>>>>> - kernel BUG on balance (this original thread)
>>>>>>
>>>>>> For this, I can't provide much help, as extent backref bug is
>>>>>> quite
>>>>>> hard to debug, unless a developer is interested in it and finds a
>>>>>> stable way to reproduce it.
>>>>>
>>>>> Yes, unfortunately as it looks so much like a race condition, I
>>>>> know
>>>>> I can
>>>>> reproduce it with my workflow, but it can take between 1 minute and
>>>>> 12
>>>>> hours,
>>>>> so I wouldn't call it a "stable way" to reproduce it unfortunately
>>>>> :(
>>>>>
>>>>> Still if any dev is interested in it, I can reproduce it, with a
>>>>> patched
>>>>> kernel if needed.
>>>>
>>>> Maybe you are already doing it: you can compile only the btrfs
>>>> module, which will be far faster than compiling the whole kernel,
>>>> provided the compiled module can be loaded.
>>>
>>> Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so
>>> I'll try to load the modified module and see if the running kernel
>>> accepts it. I have to rmmod the loaded module first, hence umounting
>>> any btrfs fs before that. Should be able to do it in a couple of
>>> hours.
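The module-only rebuild and reload described here can be sketched as follows. This is a dry run: the commands are echoed rather than executed, since for real they need a configured kernel source tree matching the running kernel, plus root; /tank is the mount point used in this thread.

```shell
# Dry-run sketch of rebuilding and reloading only the btrfs module.
# To execute for real, redefine run() { "$@"; } and run from the top of
# a configured kernel tree whose version matches the running kernel.
run() { echo "+ $*"; }
run make modules_prepare         # prepare kbuild for a partial build
run make M=fs/btrfs modules      # compile only fs/btrfs/btrfs.ko
run umount /tank                 # no btrfs fs may be mounted before rmmod
run rmmod btrfs
run insmod fs/btrfs/btrfs.ko     # vermagic must match the running kernel
```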
>>>
>>> I'll delete all my snapshots again and run my script. Should be easy
>>> to trigger the (hopefully worked-around) bug again.
>>
>> Well, I didn't trigger this exact bug, but another one, no less severe
>> though, as it also crashed the system:
>>
>> [92098.841309] general protection fault: 0000 [#1] SMP
>> [92098.841338] Modules linked in: ...
>> [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
>> 4.3.0-rc1 #1
>> [92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
>> 01/06/2014
>> [92098.841868] Workqueue: btrfs-qgroup-rescan
>> btrfs_qgroup_rescan_helper
>> [btrfs]
>> [92098.841889] task: ffff8800b6cc4100 ti: ffff8800a3dc8000 task.ti:
>> ffff8800a3dc8000
>> [92098.841910] RIP: 0010:[<ffffffff813ae6c6>] [<ffffffff813ae6c6>]
>> memcpy_erms+0x6/0x10
>> [92098.841935] RSP: 0018:ffff8800a3dcbcc8 EFLAGS: 00010207
>> [92098.841950] RAX: ffff8800a3dcbd67 RBX: 0000000000000009 RCX:
>> 0000000000000009
>> [92098.841970] RDX: 0000000000000009 RSI: 0005080000000000 RDI:
>> ffff8800a3dcbd67
>> [92098.841989] RBP: ffff8800a3dcbd00 R08: 0000000000019c60 R09:
>> ffff88011fb19c60
>> [92098.842009] R10: ffffea0003006480 R11: 0000000001000000 R12:
>> ffff8800b76c32c0
>> [92098.842028] R13: 0000160000000000 R14: ffff8800a3dcbd70 R15:
>> 0000000000000009
>> [92098.842048] FS: 0000000000000000(0000) GS:ffff88011fb00000(0000)
>> knlGS:0000000000000000
>> [92098.842070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [92098.842086] CR2: 00007fe1f2bd8000 CR3: 0000000001c10000 CR4:
>> 00000000000406e0
>> [92098.842105] Stack:
>> [92098.842111] ffffffffc035a5d8 ffffffffc0396d00 000000000000028b
>> 0000000000000000
>> [92098.842212] 0000cc6c00000000 ffff8800b76c3200 0000160000000000
>> ffff8800a3dcbdc0
>> [92098.842237] ffffffffc039af3d ffff8800c7196dc8 ffff8800c7196e08
>> ffff8800c7196da0
>> [92098.842261] Call Trace:
>> [92098.842277] [<ffffffffc035a5d8>] ? read_extent_buffer+0xb8/0x110
>> [btrfs]
>> [92098.842304] [<ffffffffc0396d00>] ? btrfs_find_all_roots+0x60/0x70
>> [btrfs]
>> [92098.842329] [<ffffffffc039af3d>]
>> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>
> Would you please show the code of it?
> This one seems to be another stupid bug I made when rewriting the
> framework.
> Maybe I forgot to reinit some variables, or I'm corrupting memory...
(gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
2755
2756 static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
2757 struct btrfs_disk_key
*disk)
2758 {
2759 cpu->offset = le64_to_cpu(disk->offset);
2760 cpu->type = disk->type;
2761 cpu->objectid = le64_to_cpu(disk->objectid);
2762 }
2763
2764 static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key
*disk,
(gdb)
Does it make sense?
>> [92098.842351] [<ffffffff810a1a0d>] ?
>> ttwu_do_activate.constprop.90+0x5d/0x70
>> [92098.842377] [<ffffffffc03674e0>] normal_work_helper+0xc0/0x270
>> [btrfs]
>> [92098.842401] [<ffffffffc03678a2>]
>> btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
>> [92098.842421] [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
>> [92098.842438] [<ffffffff8109192a>] worker_thread+0x11a/0x470
>> [92098.842454] [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
>> [92098.842471] [<ffffffff81097059>] kthread+0xc9/0xe0
>> [92098.842485] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
>> [92098.842502] [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
>> [92098.842517] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
>> [92098.842532] Code: ff eb eb 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1
>> 48
>> c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8
>> 48
>> 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
>> [92098.842658] RIP [<ffffffff813ae6c6>] memcpy_erms+0x6/0x10
>> [92098.842675] RSP <ffff8800a3dcbcc8>
>> [92098.849594] ---[ end trace 9d5fb7931a3ec713 ]---
>>
>> I would definitely say that rescans should be avoided on current
>> kernels
>> as the possibility that it'll bring the system down shouldn't be
>> ignored.
>> It confirms that this code really needs a rewrite !
>>
>> Regards,
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-22 7:34 ` Stéphane Lesimple
@ 2015-09-22 8:40 ` Qu Wenruo
2015-09-22 8:51 ` Qu Wenruo
0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-22 8:40 UTC (permalink / raw)
To: Stéphane Lesimple, Qu Wenruo; +Cc: linux-btrfs
在 2015年09月22日 15:34, Stéphane Lesimple 写道:
> Le 2015-09-22 03:37, Qu Wenruo a écrit :
>> Stéphane Lesimple wrote on 2015/09/22 03:30 +0200:
>>> Le 2015-09-20 13:14, Stéphane Lesimple a écrit :
>>>> Le 2015-09-20 12:51, Qu Wenruo a écrit :
>>>>>>> Would you please use gdb to show the codes of
>>>>>>> "btrfs_qgroup_rescan_worker+0x388" ?
>>>>>>> (Need kernel debuginfo)
>>>>>>>
>>>>>>> My guess is the following line (pretty sure, but not 100% sure):
>>>>>>> ------
>>>>>>> /*
>>>>>>> * only update status, since the previous part has alreay
>>>>>>> updated the
>>>>>>> * qgroup info.
>>>>>>> */
>>>>>>> trans = btrfs_start_transaction(fs_info->quota_root, 1);
>>>>>>> <<<<<
>>>>>>> if (IS_ERR(trans)) {
>>>>>>> err = PTR_ERR(trans);
>>>>>>> btrfs_err(fs_info,
>>>>>>> "fail to start transaction for status
>>>>>>> update: %d\n",
>>>>>>> err);
>>>>>>> goto done;
>>>>>>> }
>>>>>>> ------
>>>>>>
>>>>>> The kernel and modules were already compiled with debuginfo.
>>>>>> However, for some reason I couldn't get the gdb disassembly of
>>>>>> /proc/kcore properly aligned with the source I compiled: the asm
>>>>>> code doesn't match the C code shown by gdb. In any case, looking at
>>>>>> the source of this function, this is the only place
>>>>>> btrfs_start_transaction is called, so we can be 100% sure that's
>>>>>> indeed where the crash happens.
>>>>>
>>>>> Yep, that's the only caller.
>>>>>
>>>>> Here is a small but useful hint for locating the code, if you are
>>>>> interested in kernel development.
>>>>>
>>>>> # Not sure whether Ubuntu gzips its modules; Arch at least
>>>>> # compresses them
>>>>> $ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
>>>>> $ gunzip /tmp/btrfs.ko.gz
>>>>> $ gdb /tmp/btrfs.ko
>>>>> # Make sure gdb read all the needed debuginfo
>>>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
>>>>>
>>>>> And gdb will find the code position for you.
>>>>> Quite easy one, only backtrace info is needed.
>>>>
>>>> Ah, thanks for the tips, I was loading whole vmlinux and using
>>>> /proc/kcore
>>>> as the core info, then adding the module with "add-symbol-file". But as
>>>> we're just looking for the code and not the variables, it was indeed
>>>> completely overkill.
>>>>
>>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
>>>> 0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
>>>> 2323
>>>> 2324 /*
>>>> 2325 * only update status, since the previous part has
>>>> alreay updated the
>>>> 2326 * qgroup info.
>>>> 2327 */
>>>> 2328 trans = btrfs_start_transaction(fs_info->quota_root,
>>>> 1);
>>>> 2329 if (IS_ERR(trans)) {
>>>> 2330 err = PTR_ERR(trans);
>>>> 2331 btrfs_err(fs_info,
>>>> 2332 "fail to start transaction for
>>>> status update: %d\n",
>>>>
>>>> So this just confirms what we were already 99% sure of.
>>>>
>>>>> Another hint is about how to collect the kernel crash info.
>>>>> Your netconsole setup would definitely be one good practice.
>>>>>
>>>>> Another one I use to collect crash info is kdump.
>>>>> Ubuntu should have a good wiki on it.
>>>>
>>>> I've already come across kdump a few times, but never really looked
>>>> into it. To debug the other complicated extent backref bug, it could
>>>> be of some use.
>>>>
>>>>>>>>> So, as a quick summary of this big thread, it seems I've been
>>>>>>>>> hitting
>>>>>>>>> 3 bugs, all reproducible:
>>>>>>>>> - kernel BUG on balance (this original thread)
>>>>>>>
>>>>>>> For this, I can't provide much help, as extent backref bug is quite
>>>>>>> hard to debug, unless a developer is interested in it and finds a
>>>>>>> stable way to reproduce it.
>>>>>>
>>>>>> Yes, unfortunately as it looks so much like a race condition, I know
>>>>>> I can
>>>>>> reproduce it with my workflow, but it can take between 1 minute and 12
>>>>>> hours,
>>>>>> so I wouldn't call it a "stable way" to reproduce it unfortunately :(
>>>>>>
>>>>>> Still if any dev is interested in it, I can reproduce it, with a
>>>>>> patched
>>>>>> kernel if needed.
>>>>>
>>>>> Maybe you are already doing it: you can compile only the btrfs
>>>>> module, which will be far faster than compiling the whole kernel,
>>>>> provided the compiled module can be loaded.
>>>>
>>>> Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so
>>>> I'll try to
>>>> load the modified module and see if the running kernel accepts it. I
>>>> have to rmmod
>>>> the loaded module first, hence umounting any btrfs fs before that.
>>>> Should be able
>>>> to do it in a couple hours.
>>>>
>>>> I'll delete again all my snapshots and run my script. Should be easy
>>>> to trigger
>>>> the (hopefully worked-around) bug again.
>>>
>>> Well, I didn't trigger this exact bug, but another one, no less severe
>>> though, as it also crashed the system:
>>>
>>> [92098.841309] general protection fault: 0000 [#1] SMP
>>> [92098.841338] Modules linked in: ...
>>> [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
>>> 4.3.0-rc1 #1
>>> [92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
>>> 01/06/2014
>>> [92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
>>> [btrfs]
>>> [92098.841889] task: ffff8800b6cc4100 ti: ffff8800a3dc8000 task.ti:
>>> ffff8800a3dc8000
>>> [92098.841910] RIP: 0010:[<ffffffff813ae6c6>] [<ffffffff813ae6c6>]
>>> memcpy_erms+0x6/0x10
>>> [92098.841935] RSP: 0018:ffff8800a3dcbcc8 EFLAGS: 00010207
>>> [92098.841950] RAX: ffff8800a3dcbd67 RBX: 0000000000000009 RCX:
>>> 0000000000000009
>>> [92098.841970] RDX: 0000000000000009 RSI: 0005080000000000 RDI:
>>> ffff8800a3dcbd67
>>> [92098.841989] RBP: ffff8800a3dcbd00 R08: 0000000000019c60 R09:
>>> ffff88011fb19c60
>>> [92098.842009] R10: ffffea0003006480 R11: 0000000001000000 R12:
>>> ffff8800b76c32c0
>>> [92098.842028] R13: 0000160000000000 R14: ffff8800a3dcbd70 R15:
>>> 0000000000000009
>>> [92098.842048] FS: 0000000000000000(0000) GS:ffff88011fb00000(0000)
>>> knlGS:0000000000000000
>>> [92098.842070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [92098.842086] CR2: 00007fe1f2bd8000 CR3: 0000000001c10000 CR4:
>>> 00000000000406e0
>>> [92098.842105] Stack:
>>> [92098.842111] ffffffffc035a5d8 ffffffffc0396d00 000000000000028b
>>> 0000000000000000
>>> [92098.842212] 0000cc6c00000000 ffff8800b76c3200 0000160000000000
>>> ffff8800a3dcbdc0
>>> [92098.842237] ffffffffc039af3d ffff8800c7196dc8 ffff8800c7196e08
>>> ffff8800c7196da0
>>> [92098.842261] Call Trace:
>>> [92098.842277] [<ffffffffc035a5d8>] ? read_extent_buffer+0xb8/0x110
>>> [btrfs]
>>> [92098.842304] [<ffffffffc0396d00>] ? btrfs_find_all_roots+0x60/0x70
>>> [btrfs]
>>> [92098.842329] [<ffffffffc039af3d>]
>>> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>>
>> Would you please show the code of it?
>> This one seems to be another stupid bug I made when rewriting the
>> framework.
>> Maybe I forgot to reinit some variables, or I'm corrupting memory...
>
> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
> 2755
> 2756 static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
> 2757 struct btrfs_disk_key
> *disk)
> 2758 {
> 2759 cpu->offset = le64_to_cpu(disk->offset);
> 2760 cpu->type = disk->type;
> 2761 cpu->objectid = le64_to_cpu(disk->objectid);
> 2762 }
> 2763
> 2764 static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key
> *disk,
> (gdb)
>
>
> Does it make sense?
So it seems that the memory of the cpu key is being corrupted...
The code is a specific, thin inline function, so what about the other
stack frames? Like btrfs_qgroup_rescan_helper+0x12?
Thanks,
Qu
>
>
>>> [92098.842351] [<ffffffff810a1a0d>] ?
>>> ttwu_do_activate.constprop.90+0x5d/0x70
>>> [92098.842377] [<ffffffffc03674e0>] normal_work_helper+0xc0/0x270
>>> [btrfs]
>>> [92098.842401] [<ffffffffc03678a2>]
>>> btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
>>> [92098.842421] [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
>>> [92098.842438] [<ffffffff8109192a>] worker_thread+0x11a/0x470
>>> [92098.842454] [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
>>> [92098.842471] [<ffffffff81097059>] kthread+0xc9/0xe0
>>> [92098.842485] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
>>> [92098.842502] [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
>>> [92098.842517] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
>>> [92098.842532] Code: ff eb eb 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48
>>> c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48
>>> 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
>>> [92098.842658] RIP [<ffffffff813ae6c6>] memcpy_erms+0x6/0x10
>>> [92098.842675] RSP <ffff8800a3dcbcc8>
>>> [92098.849594] ---[ end trace 9d5fb7931a3ec713 ]---
>>>
>>> I would definitely say that rescans should be avoided on current kernels
>>> as the possibility that it'll bring the system down shouldn't be
>>> ignored.
>>> It confirms that this code really needs a rewrite !
>>>
>>> Regards,
>>>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-22 8:40 ` Qu Wenruo
@ 2015-09-22 8:51 ` Qu Wenruo
2015-09-22 14:31 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-22 8:51 UTC (permalink / raw)
To: Stéphane Lesimple, Qu Wenruo; +Cc: linux-btrfs
在 2015年09月22日 16:40, Qu Wenruo 写道:
>
>
> 在 2015年09月22日 15:34, Stéphane Lesimple 写道:
>> Le 2015-09-22 03:37, Qu Wenruo a écrit :
>>> Stéphane Lesimple wrote on 2015/09/22 03:30 +0200:
>>>> Le 2015-09-20 13:14, Stéphane Lesimple a écrit :
>>>>> Le 2015-09-20 12:51, Qu Wenruo a écrit :
>>>>>>>> Would you please use gdb to show the codes of
>>>>>>>> "btrfs_qgroup_rescan_worker+0x388" ?
>>>>>>>> (Need kernel debuginfo)
>>>>>>>>
>>>>>>>> My guess is the following line (pretty sure, but not 100% sure):
>>>>>>>> ------
>>>>>>>> /*
>>>>>>>> * only update status, since the previous part has alreay
>>>>>>>> updated the
>>>>>>>> * qgroup info.
>>>>>>>> */
>>>>>>>> trans = btrfs_start_transaction(fs_info->quota_root, 1);
>>>>>>>> <<<<<
>>>>>>>> if (IS_ERR(trans)) {
>>>>>>>> err = PTR_ERR(trans);
>>>>>>>> btrfs_err(fs_info,
>>>>>>>> "fail to start transaction for status
>>>>>>>> update: %d\n",
>>>>>>>> err);
>>>>>>>> goto done;
>>>>>>>> }
>>>>>>>> ------
>>>>>>>
>>>>>>> The kernel and modules were already compiled with debuginfo.
>>>>>>> However, for some reason I couldn't get the gdb disassembly of
>>>>>>> /proc/kcore properly aligned with the source I compiled: the asm
>>>>>>> code doesn't match the C code shown by gdb. In any case, looking at
>>>>>>> the source of this function, this is the only place
>>>>>>> btrfs_start_transaction is called, so we can be 100% sure that's
>>>>>>> indeed where the crash happens.
>>>>>>
>>>>>> Yep, that's the only caller.
>>>>>>
>>>>>> Here is a small but useful hint for locating the code, if you are
>>>>>> interested in kernel development.
>>>>>>
>>>>>> # Not sure whether Ubuntu gzips its modules; Arch at least
>>>>>> # compresses them
>>>>>> $ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
>>>>>> $ gunzip /tmp/btrfs.ko.gz
>>>>>> $ gdb /tmp/btrfs.ko
>>>>>> # Make sure gdb read all the needed debuginfo
>>>>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
>>>>>>
>>>>>> And gdb will find the code position for you.
>>>>>> Quite easy one, only backtrace info is needed.
>>>>>
>>>>> Ah, thanks for the tips, I was loading whole vmlinux and using
>>>>> /proc/kcore
>>>>> as the core info, then adding the module with "add-symbol-file".
>>>>> But as
>>>>> we're just looking for the code and not the variables, it was indeed
>>>>> completely overkill.
>>>>>
>>>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
>>>>> 0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
>>>>> 2323
>>>>> 2324 /*
>>>>> 2325 * only update status, since the previous part has
>>>>> alreay updated the
>>>>> 2326 * qgroup info.
>>>>> 2327 */
>>>>> 2328 trans = btrfs_start_transaction(fs_info->quota_root,
>>>>> 1);
>>>>> 2329 if (IS_ERR(trans)) {
>>>>> 2330 err = PTR_ERR(trans);
>>>>> 2331 btrfs_err(fs_info,
>>>>> 2332 "fail to start transaction for
>>>>> status update: %d\n",
>>>>>
>>>>> So this just confirms what we were already 99% sure of.
>>>>>
>>>>>> Another hint is about how to collect the kernel crash info.
>>>>>> Your netconsole setup would definitely be one good practice.
>>>>>>
>>>>>> Another one I use to collect crash info is kdump.
>>>>>> Ubuntu should have a good wiki on it.
>>>>>
>>>>> I've already come across kdump a few times, but never really looked
>>>>> into it. To debug the other complicated extent backref bug, it could
>>>>> be of some use.
>>>>>
>>>>>>>>>> So, as a quick summary of this big thread, it seems I've been
>>>>>>>>>> hitting
>>>>>>>>>> 3 bugs, all reproducible:
>>>>>>>>>> - kernel BUG on balance (this original thread)
>>>>>>>>
>>>>>>>> For this, I can't provide much help, as extent backref bug is quite
>>>>>>>> hard to debug, unless a developer is interested in it and finds a
>>>>>>>> stable way to reproduce it.
>>>>>>>
>>>>>>> Yes, unfortunately as it looks so much like a race condition, I know
>>>>>>> I can
>>>>>>> reproduce it with my worflow, but it can take between 1 minute
>>>>>>> and 12
>>>>>>> hours,
>>>>>>> so I wouldn't call it a "stable way" to reproduce it
>>>>>>> unfortunately :(
>>>>>>>
>>>>>>> Still if any dev is interested in it, I can reproduce it, with a
>>>>>>> patched
>>>>>>> kernel if needed.
>>>>>>
>>>>>> Maybe you are already doing it: you can compile only the btrfs
>>>>>> module, which will be far faster than compiling the whole kernel,
>>>>>> provided the compiled module can be loaded.
>>>>>
>>>>> Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so
>>>>> I'll try to
>>>>> load the modified module and see if the running kernel accepts it. I
>>>>> have to rmmod
>>>>> the loaded module first, hence umounting any btrfs fs before that.
>>>>> Should be able
>>>>> to do it in a couple hours.
>>>>>
>>>>> I'll delete again all my snapshots and run my script. Should be easy
>>>>> to trigger
>>>>> the (hopefully worked-around) bug again.
>>>>
>>>> Well, I didn't trigger this exact bug, but another one, no less severe
>>>> though, as it also crashed the system:
>>>>
>>>> [92098.841309] general protection fault: 0000 [#1] SMP
>>>> [92098.841338] Modules linked in: ...
>>>> [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
>>>> 4.3.0-rc1 #1
>>>> [92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
>>>> 01/06/2014
>>>> [92098.841868] Workqueue: btrfs-qgroup-rescan
>>>> btrfs_qgroup_rescan_helper
>>>> [btrfs]
>>>> [92098.841889] task: ffff8800b6cc4100 ti: ffff8800a3dc8000 task.ti:
>>>> ffff8800a3dc8000
>>>> [92098.841910] RIP: 0010:[<ffffffff813ae6c6>] [<ffffffff813ae6c6>]
>>>> memcpy_erms+0x6/0x10
>>>> [92098.841935] RSP: 0018:ffff8800a3dcbcc8 EFLAGS: 00010207
>>>> [92098.841950] RAX: ffff8800a3dcbd67 RBX: 0000000000000009 RCX:
>>>> 0000000000000009
>>>> [92098.841970] RDX: 0000000000000009 RSI: 0005080000000000 RDI:
>>>> ffff8800a3dcbd67
>>>> [92098.841989] RBP: ffff8800a3dcbd00 R08: 0000000000019c60 R09:
>>>> ffff88011fb19c60
>>>> [92098.842009] R10: ffffea0003006480 R11: 0000000001000000 R12:
>>>> ffff8800b76c32c0
>>>> [92098.842028] R13: 0000160000000000 R14: ffff8800a3dcbd70 R15:
>>>> 0000000000000009
>>>> [92098.842048] FS: 0000000000000000(0000) GS:ffff88011fb00000(0000)
>>>> knlGS:0000000000000000
>>>> [92098.842070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [92098.842086] CR2: 00007fe1f2bd8000 CR3: 0000000001c10000 CR4:
>>>> 00000000000406e0
>>>> [92098.842105] Stack:
>>>> [92098.842111] ffffffffc035a5d8 ffffffffc0396d00 000000000000028b
>>>> 0000000000000000
>>>> [92098.842212] 0000cc6c00000000 ffff8800b76c3200 0000160000000000
>>>> ffff8800a3dcbdc0
>>>> [92098.842237] ffffffffc039af3d ffff8800c7196dc8 ffff8800c7196e08
>>>> ffff8800c7196da0
>>>> [92098.842261] Call Trace:
>>>> [92098.842277] [<ffffffffc035a5d8>] ? read_extent_buffer+0xb8/0x110
>>>> [btrfs]
>>>> [92098.842304] [<ffffffffc0396d00>] ? btrfs_find_all_roots+0x60/0x70
>>>> [btrfs]
>>>> [92098.842329] [<ffffffffc039af3d>]
>>>> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>>>
>>> Would you please show the code of it?
>>> This one seems to be another stupid bug I made when rewriting the
>>> framework.
>>> Maybe I forgot to reinit some variables, or I'm corrupting memory...
>>
>> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
>> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
>> 2755
>> 2756 static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
>> 2757 struct btrfs_disk_key
>> *disk)
>> 2758 {
>> 2759 cpu->offset = le64_to_cpu(disk->offset);
>> 2760 cpu->type = disk->type;
>> 2761 cpu->objectid = le64_to_cpu(disk->objectid);
>> 2762 }
>> 2763
>> 2764 static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key
>> *disk,
>> (gdb)
>>
>>
>> Does it make sense?
> So it seems that the memory of the cpu key is being corrupted...
>
> The code is a specific, thin inline function, so what about the other
> stack frames? Like btrfs_qgroup_rescan_helper+0x12?
>
> Thanks,
> Qu
Oh, I forgot that you can just change the offset in
btrfs_qgroup_rescan_worker+0x28d to a smaller value.
Try +0x280 for example, which steps 13 bytes of asm code back and may
jump out of the inline function range, giving you a good hint.
Or gdb may have a better mode for inline functions, but I don't know...
Thanks,
Qu
>>
>>
>>>> [92098.842351] [<ffffffff810a1a0d>] ?
>>>> ttwu_do_activate.constprop.90+0x5d/0x70
>>>> [92098.842377] [<ffffffffc03674e0>] normal_work_helper+0xc0/0x270
>>>> [btrfs]
>>>> [92098.842401] [<ffffffffc03678a2>]
>>>> btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
>>>> [92098.842421] [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
>>>> [92098.842438] [<ffffffff8109192a>] worker_thread+0x11a/0x470
>>>> [92098.842454] [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
>>>> [92098.842471] [<ffffffff81097059>] kthread+0xc9/0xe0
>>>> [92098.842485] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
>>>> [92098.842502] [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
>>>> [92098.842517] [<ffffffff81096f90>] ? kthread_park+0x60/0x60
>>>> [92098.842532] Code: ff eb eb 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48
>>>> c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48
>>>> 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
>>>> [92098.842658] RIP [<ffffffff813ae6c6>] memcpy_erms+0x6/0x10
>>>> [92098.842675] RSP <ffff8800a3dcbcc8>
>>>> [92098.849594] ---[ end trace 9d5fb7931a3ec713 ]---
>>>>
>>>> I would definitely say that rescans should be avoided on current
>>>> kernels
>>>> as the possibility that it'll bring the system down shouldn't be
>>>> ignored.
>>>> It confirms that this code really needs a rewrite !
>>>>
>>>> Regards,
>>>>
>>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-22 8:51 ` Qu Wenruo
@ 2015-09-22 14:31 ` Stéphane Lesimple
2015-09-23 7:03 ` Qu Wenruo
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-22 14:31 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs
Le 2015-09-22 10:51, Qu Wenruo a écrit :
>>>>> [92098.842261] Call Trace:
>>>>> [92098.842277] [<ffffffffc035a5d8>] ?
>>>>> read_extent_buffer+0xb8/0x110
>>>>> [btrfs]
>>>>> [92098.842304] [<ffffffffc0396d00>] ?
>>>>> btrfs_find_all_roots+0x60/0x70
>>>>> [btrfs]
>>>>> [92098.842329] [<ffffffffc039af3d>]
>>>>> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>>>>
>>>> Would you please show the code of it?
>>>> This one seems to be another stupid bug I made when rewriting the
>>>> framework.
>>>> Maybe I forgot to reinit some variables, or I'm corrupting memory...
>>>
>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
>>> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
>>> 2755
>>> 2756 static inline void btrfs_disk_key_to_cpu(struct btrfs_key
>>> *cpu,
>>> 2757 struct
>>> btrfs_disk_key
>>> *disk)
>>> 2758 {
>>> 2759 cpu->offset = le64_to_cpu(disk->offset);
>>> 2760 cpu->type = disk->type;
>>> 2761 cpu->objectid = le64_to_cpu(disk->objectid);
>>> 2762 }
>>> 2763
>>> 2764 static inline void btrfs_cpu_key_to_disk(struct
>>> btrfs_disk_key
>>> *disk,
>>> (gdb)
>>>
>>>
>>> Does it make sense?
>> So it seems that the memory of the cpu key is being corrupted...
>>
>> The code is a specific, thin inline function, so what about the other
>> stack frames? Like btrfs_qgroup_rescan_helper+0x12?
>>
>> Thanks,
>> Qu
> Oh, I forgot that you can just change the offset in
> btrfs_qgroup_rescan_worker+0x28d to a smaller value.
> Try +0x280 for example, which steps 13 bytes of asm code back and may
> jump out of the inline function range, giving you a good hint.
>
> Or gdb may have a better mode for inline functions, but I don't know...
Actually, "list -" is our friend here (it shows the 10 lines before the
last source output):
(gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
2755
2756 static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
2757 struct btrfs_disk_key
*disk)
2758 {
2759 cpu->offset = le64_to_cpu(disk->offset);
2760 cpu->type = disk->type;
2761 cpu->objectid = le64_to_cpu(disk->objectid);
2762 }
2763
2764 static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key
*disk,
(gdb) list -
2745 struct
btrfs_disk_key *key)
2746 {
2747 write_eb_member(eb, h, struct btrfs_free_space_header,
location, key);
2748 }
2749
2750 /* struct btrfs_disk_key */
2751 BTRFS_SETGET_STACK_FUNCS(disk_key_objectid, struct
btrfs_disk_key,
2752 objectid, 64);
2753 BTRFS_SETGET_STACK_FUNCS(disk_key_offset, struct btrfs_disk_key,
offset, 64);
2754 BTRFS_SETGET_STACK_FUNCS(disk_key_type, struct btrfs_disk_key,
type, 8);
(gdb) list -
2735
2736 static inline void btrfs_free_space_key(struct extent_buffer
*eb,
2737 struct
btrfs_free_space_header *h,
2738 struct btrfs_disk_key
*key)
2739 {
2740 read_eb_member(eb, h, struct btrfs_free_space_header,
location, key);
2741 }
2742
2743 static inline void btrfs_set_free_space_key(struct extent_buffer
*eb,
2744 struct
btrfs_free_space_header *h,
(gdb)
Lots of inline funcs and macros it seems.
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-22 14:31 ` Stéphane Lesimple
@ 2015-09-23 7:03 ` Qu Wenruo
2015-09-23 9:40 ` Stéphane Lesimple
0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-23 7:03 UTC (permalink / raw)
To: Stéphane Lesimple, Qu Wenruo; +Cc: linux-btrfs
Stéphane Lesimple wrote on 2015/09/22 16:31 +0200:
> Le 2015-09-22 10:51, Qu Wenruo a écrit :
>>>>>> [92098.842261] Call Trace:
>>>>>> [92098.842277] [<ffffffffc035a5d8>] ? read_extent_buffer+0xb8/0x110
>>>>>> [btrfs]
>>>>>> [92098.842304] [<ffffffffc0396d00>] ? btrfs_find_all_roots+0x60/0x70
>>>>>> [btrfs]
>>>>>> [92098.842329] [<ffffffffc039af3d>]
>>>>>> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>>>>>
>>>>> Would you please show the code of it?
>>>>> This one seems to be another stupid bug I made when rewriting the
>>>>> framework.
>>>>> Maybe I forgot to reinit some variables, or I'm corrupting memory...
>>>>
>>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
>>>> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
>>>> 2755
>>>> 2756 static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
>>>> 2757 struct btrfs_disk_key
>>>> *disk)
>>>> 2758 {
>>>> 2759 cpu->offset = le64_to_cpu(disk->offset);
>>>> 2760 cpu->type = disk->type;
>>>> 2761 cpu->objectid = le64_to_cpu(disk->objectid);
>>>> 2762 }
>>>> 2763
>>>> 2764 static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key
>>>> *disk,
>>>> (gdb)
>>>>
>>>>
>>>> Does it make sense?
>>> So it seems that the memory of the cpu key is being corrupted...
>>>
>>> The code is a specific, thin inline function, so what about the other
>>> stack frames? Like btrfs_qgroup_rescan_helper+0x12?
>>>
>>> Thanks,
>>> Qu
>> Oh, I forgot that you can just change the offset in
>> btrfs_qgroup_rescan_worker+0x28d to a smaller value.
>> Try +0x280 for example, which steps 13 bytes of asm code back and may
>> jump out of the inline function range, giving you a good hint.
>>
>> Or gdb may have a better mode for inline functions, but I don't know...
>
> Actually, "list -" is our friend here (it shows the 10 lines before the
> last source output):
No, that's not the case.
"list -" will only show lines around the same source code.
What I need is the higher caller in the stack.
When debugging a running program, it's quite easy to just use the frame
command.
But in this situation we don't have a call stack, so I'd like to move
the +0x28d several bytes backward, until we jump out of the inline
function call and see the meaningful code.
BTW, did you try the following patch?
https://patchwork.kernel.org/patch/7114321/
btrfs: qgroup: exit the rescan worker during umount
The problem seems somewhat related to the bug you encountered, so I'd
recommend giving it a try.
Thanks,
Qu
>
> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
> 2755
> 2756 static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
> 2757 struct btrfs_disk_key
> *disk)
> 2758 {
> 2759 cpu->offset = le64_to_cpu(disk->offset);
> 2760 cpu->type = disk->type;
> 2761 cpu->objectid = le64_to_cpu(disk->objectid);
> 2762 }
> 2763
> 2764 static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key
> *disk,
> (gdb) list -
> 2745 struct
> btrfs_disk_key *key)
> 2746 {
> 2747 write_eb_member(eb, h, struct btrfs_free_space_header,
> location, key);
> 2748 }
> 2749
> 2750 /* struct btrfs_disk_key */
> 2751 BTRFS_SETGET_STACK_FUNCS(disk_key_objectid, struct btrfs_disk_key,
> 2752 objectid, 64);
> 2753 BTRFS_SETGET_STACK_FUNCS(disk_key_offset, struct btrfs_disk_key,
> offset, 64);
> 2754 BTRFS_SETGET_STACK_FUNCS(disk_key_type, struct btrfs_disk_key,
> type, 8);
> (gdb) list -
> 2735
> 2736 static inline void btrfs_free_space_key(struct extent_buffer *eb,
> 2737 struct
> btrfs_free_space_header *h,
> 2738 struct btrfs_disk_key *key)
> 2739 {
> 2740 read_eb_member(eb, h, struct btrfs_free_space_header,
> location, key);
> 2741 }
> 2742
> 2743 static inline void btrfs_set_free_space_key(struct extent_buffer
> *eb,
> 2744 struct
> btrfs_free_space_header *h,
> (gdb)
>
> Lots of inline funcs and macros it seems.
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-23 7:03 ` Qu Wenruo
@ 2015-09-23 9:40 ` Stéphane Lesimple
2015-09-23 10:13 ` Qu Wenruo
0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-23 9:40 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs
On 2015-09-23 09:03, Qu Wenruo wrote:
> Stéphane Lesimple wrote on 2015/09/22 16:31 +0200:
>> On 2015-09-22 10:51, Qu Wenruo wrote:
>>>>>>> [92098.842261] Call Trace:
>>>>>>> [92098.842277] [<ffffffffc035a5d8>] ?
>>>>>>> read_extent_buffer+0xb8/0x110
>>>>>>> [btrfs]
>>>>>>> [92098.842304] [<ffffffffc0396d00>] ?
>>>>>>> btrfs_find_all_roots+0x60/0x70
>>>>>>> [btrfs]
>>>>>>> [92098.842329] [<ffffffffc039af3d>]
>>>>>>> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>>>>>>
>>>>>> Would you please show the code of it?
>>>>>> This one seems to be another stupid bug I made when rewriting the
>>>>>> framework.
>>>>>> Maybe I forgot to reinit some variables or I'm screwing up memory...
>>>>>
>>>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
>>>>> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
>>>>> 2755
>>>>> 2756 static inline void btrfs_disk_key_to_cpu(struct btrfs_key
>>>>> *cpu,
>>>>> 2757 struct
>>>>> btrfs_disk_key
>>>>> *disk)
>>>>> 2758 {
>>>>> 2759 cpu->offset = le64_to_cpu(disk->offset);
>>>>> 2760 cpu->type = disk->type;
>>>>> 2761 cpu->objectid = le64_to_cpu(disk->objectid);
>>>>> 2762 }
>>>>> 2763
>>>>> 2764 static inline void btrfs_cpu_key_to_disk(struct
>>>>> btrfs_disk_key
>>>>> *disk,
>>>>> (gdb)
>>>>>
>>>>>
>>>>> Does it make sense?
>>>> So it seems that the memory of cpu key is being screwed up...
>>>>
>>>> The code is a specific thin inline function, so what about the rest
>>>> of the stack?
>>>> Like btrfs_qgroup_rescan_helper+0x12?
>>>>
>>>> Thanks,
>>>> Qu
>>> Oh, I forgot that you can just change the offset in
>>> btrfs_qgroup_rescan_worker+0x28d to a smaller value.
>>> Try +0x280 for example, which steps 13 (0xd) bytes back in the
>>> machine code; that may jump out of the inline function's range and
>>> give you a good hint.
>>>
>>> Or gdb may have a better mode for inline function, but I don't
>>> know...
>>
>> Actually, "list -" is our friend here (it shows 10 lines before the
>> last source output).
> No, that's not the case.
>
> "list -" will only show lines around the same source code.
>
> What I need is the higher caller in the stack.
> When debugging a running program, it's quite easy to just use the
> frame command.
>
> But in this situation we don't have a call stack, so I'd like to move
> the +0x28d several bytes backward, until we jump out of the inline
> function call and see the meaningful code.
Ah, you're right.
I had a hard time finding a value where I wouldn't end up in another
inline function or somewhere else entirely in the kernel code, but here
it is:
(gdb) list *(btrfs_qgroup_rescan_worker+0x26e)
0x97f4e is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2237).
2232 memcpy(scratch_leaf, path->nodes[0],
sizeof(*scratch_leaf));
2233 slot = path->slots[0];
2234 btrfs_release_path(path);
2235 mutex_unlock(&fs_info->qgroup_rescan_lock);
2236
2237 for (; slot < btrfs_header_nritems(scratch_leaf);
++slot) {
2238 btrfs_item_key_to_cpu(scratch_leaf, &found,
slot); <== here
2239 if (found.type != BTRFS_EXTENT_ITEM_KEY &&
2240 found.type != BTRFS_METADATA_ITEM_KEY)
2241 continue;
the btrfs_item_key_to_cpu() inline function calls two other inline functions:
static inline void btrfs_item_key_to_cpu(struct extent_buffer *eb,
struct btrfs_key *key, int nr)
{
struct btrfs_disk_key disk_key;
btrfs_item_key(eb, &disk_key, nr);
btrfs_disk_key_to_cpu(key, &disk_key); <== this is 0x28d
}
btrfs_disk_key_to_cpu() is the inline function referenced by 0x28d, and
this is where the GPF happens.
> BTW, did you try the following patch?
> https://patchwork.kernel.org/patch/7114321/
> btrfs: qgroup: exit the rescan worker during umount
>
> The problem seems somewhat related to the bug you encountered, so I'd
> recommend giving it a try.
Not yet, but I've come across this bug too during my tests: starting a
rescan and umounting gets you a crash. I didn't mention it because I was
sure it was an already known bug. Nice to see it has been fixed though!
I'll certainly give it a try, but I'm not really sure it'll fix the
specific bug we're talking about.
However, the group of patches posted by Mark should fix the qgroup count
discrepancies as I understand it, right? It might be of interest to try
them all at once, for sure.
Thanks,
--
Stéphane.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
2015-09-23 9:40 ` Stéphane Lesimple
@ 2015-09-23 10:13 ` Qu Wenruo
0 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2015-09-23 10:13 UTC (permalink / raw)
To: Stéphane Lesimple, Qu Wenruo; +Cc: linux-btrfs
On 2015-09-23 17:40, Stéphane Lesimple wrote:
> On 2015-09-23 09:03, Qu Wenruo wrote:
>> Stéphane Lesimple wrote on 2015/09/22 16:31 +0200:
>>> On 2015-09-22 10:51, Qu Wenruo wrote:
>>>>>>>> [92098.842261] Call Trace:
>>>>>>>> [92098.842277] [<ffffffffc035a5d8>] ?
>>>>>>>> read_extent_buffer+0xb8/0x110
>>>>>>>> [btrfs]
>>>>>>>> [92098.842304] [<ffffffffc0396d00>] ?
>>>>>>>> btrfs_find_all_roots+0x60/0x70
>>>>>>>> [btrfs]
>>>>>>>> [92098.842329] [<ffffffffc039af3d>]
>>>>>>>> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>>>>>>>
>>>>>>> Would you please show the code of it?
>>>>>>> This one seems to be another stupid bug I made when rewriting the
>>>>>>> framework.
>>>>>>> Maybe I forgot to reinit some variables or I'm screwing up memory...
>>>>>>
>>>>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
>>>>>> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
>>>>>> 2755
>>>>>> 2756 static inline void btrfs_disk_key_to_cpu(struct btrfs_key
>>>>>> *cpu,
>>>>>> 2757 struct
>>>>>> btrfs_disk_key
>>>>>> *disk)
>>>>>> 2758 {
>>>>>> 2759 cpu->offset = le64_to_cpu(disk->offset);
>>>>>> 2760 cpu->type = disk->type;
>>>>>> 2761 cpu->objectid = le64_to_cpu(disk->objectid);
>>>>>> 2762 }
>>>>>> 2763
>>>>>> 2764 static inline void btrfs_cpu_key_to_disk(struct
>>>>>> btrfs_disk_key
>>>>>> *disk,
>>>>>> (gdb)
>>>>>>
>>>>>>
>>>>>> Does it make sense?
>>>>> So it seems that the memory of cpu key is being screwed up...
>>>>>
>>>>> The code is a specific thin inline function, so what about the rest
>>>>> of the stack?
>>>>> Like btrfs_qgroup_rescan_helper+0x12?
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>> Oh, I forgot that you can just change the offset in
>>>> btrfs_qgroup_rescan_worker+0x28d to a smaller value.
>>>> Try +0x280 for example, which steps 13 (0xd) bytes back in the
>>>> machine code; that may jump out of the inline function's range and
>>>> give you a good hint.
>>>>
>>>> Or gdb may have a better mode for inline function, but I don't know...
>>>
>>> Actually, "list -" is our friend here (it shows 10 lines before the
>>> last source output).
>> No, that's not the case.
>>
>> "list -" will only show lines around the same source code.
>>
>> What I need is the higher caller in the stack.
>> When debugging a running program, it's quite easy to just use the
>> frame command.
>>
>> But in this situation we don't have a call stack, so I'd like to move
>> the +0x28d several bytes backward, until we jump out of the inline
>> function call and see the meaningful code.
>
> Ah, you're right.
> I had a hard time finding a value where I wouldn't end up in another
> inline function or somewhere else entirely in the kernel code, but
> here it is:
>
> (gdb) list *(btrfs_qgroup_rescan_worker+0x26e)
> 0x97f4e is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2237).
> 2232 memcpy(scratch_leaf, path->nodes[0],
> sizeof(*scratch_leaf));
> 2233 slot = path->slots[0];
> 2234 btrfs_release_path(path);
> 2235 mutex_unlock(&fs_info->qgroup_rescan_lock);
> 2236
> 2237 for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
> 2238 btrfs_item_key_to_cpu(scratch_leaf, &found,
> slot); <== here
>
> 2239 if (found.type != BTRFS_EXTENT_ITEM_KEY &&
> 2240 found.type != BTRFS_METADATA_ITEM_KEY)
> 2241 continue;
>
> the btrfs_item_key_to_cpu() inline function calls two other inline functions:
>
> static inline void btrfs_item_key_to_cpu(struct extent_buffer *eb,
> struct btrfs_key *key, int nr)
> {
> struct btrfs_disk_key disk_key;
> btrfs_item_key(eb, &disk_key, nr);
> btrfs_disk_key_to_cpu(key, &disk_key); <== this is 0x28d
> }
>
> btrfs_disk_key_to_cpu() is the inline function referenced by 0x28d,
> and this is where the GPF happens.
Thanks, now things are much clearer.
Not completely sure, but scratch_leaf seems to be invalid, causing the
bug. (found is in stack memory, so I don't think it's the cause.)
But this is less related to the qgroup rework, as that's pre-existing
code.
A quick glance already shows a dirty and maybe deadly hack: copying the
whole extent buffer, which includes pages and all kinds of locks.
I'm not 100% sure that's the problem, but I'll create a patch for you
to test in the coming days.
>
>
>> BTW, did you try the following patch?
>> https://patchwork.kernel.org/patch/7114321/
>> btrfs: qgroup: exit the rescan worker during umount
>>
>> The problem seems somewhat related to the bug you encountered, so I'd
>> recommend giving it a try.
>
> Not yet, but I've come across this bug too during my tests: starting a
> rescan and umounting gets you a crash. I didn't mention it because I
> was sure it was an already known bug. Nice to see it has been fixed
> though!
> I'll certainly give it a try, but I'm not really sure it'll fix the
> specific bug we're talking about.
> However, the group of patches posted by Mark should fix the qgroup
> count discrepancies as I understand it, right? It might be of interest
> to try them all at once, for sure.
Yes, his patch should fix the qgroup count mismatch problem for
subvolume removal.
If I read the code correctly, after removal and sync, the accounting
numbers for the deleted subvolume's qgroup should be:
rfer = 0 and excl = 0.
Thanks,
Qu
>
> Thanks,
>
^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2015-09-23 10:14 UTC | newest]
Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-14 11:46 kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance Stéphane Lesimple
2015-09-15 14:47 ` Stéphane Lesimple
2015-09-15 14:56 ` Josef Bacik
2015-09-15 21:47 ` Stéphane Lesimple
2015-09-16 5:02 ` Duncan
2015-09-16 10:28 ` Stéphane Lesimple
2015-09-16 10:46 ` Holger Hoffstätte
2015-09-16 13:04 ` Stéphane Lesimple
2015-09-16 20:18 ` Duncan
2015-09-16 20:41 ` Stéphane Lesimple
2015-09-17 3:03 ` Qu Wenruo
2015-09-17 6:11 ` Stéphane Lesimple
2015-09-17 6:42 ` Qu Wenruo
2015-09-17 8:02 ` Stéphane Lesimple
2015-09-17 8:11 ` Qu Wenruo
2015-09-17 10:08 ` Stéphane Lesimple
2015-09-17 10:41 ` Qu Wenruo
2015-09-17 18:47 ` Stéphane Lesimple
2015-09-18 0:59 ` Qu Wenruo
2015-09-18 7:36 ` Stéphane Lesimple
2015-09-18 10:15 ` Stéphane Lesimple
2015-09-18 10:26 ` Stéphane Lesimple
2015-09-20 1:22 ` Qu Wenruo
2015-09-20 10:35 ` Stéphane Lesimple
2015-09-20 10:51 ` Qu Wenruo
2015-09-20 11:14 ` Stéphane Lesimple
2015-09-22 1:30 ` Stéphane Lesimple
2015-09-22 1:37 ` Qu Wenruo
2015-09-22 7:34 ` Stéphane Lesimple
2015-09-22 8:40 ` Qu Wenruo
2015-09-22 8:51 ` Qu Wenruo
2015-09-22 14:31 ` Stéphane Lesimple
2015-09-23 7:03 ` Qu Wenruo
2015-09-23 9:40 ` Stéphane Lesimple
2015-09-23 10:13 ` Qu Wenruo
2015-09-17 6:29 ` Stéphane Lesimple
2015-09-17 7:54 ` Stéphane Lesimple