* kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
@ 2015-09-14 11:46 Stéphane Lesimple
  2015-09-15 14:47 ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-14 11:46 UTC (permalink / raw)
  To: linux-btrfs

Hello btrfs-aholics,

I've been experiencing repeated "kernel BUG" occurrences over the past
few days while trying to balance a raid5 filesystem after adding a new
drive. It occurs on both 4.2.0 and 4.1.7, using the 4.2 userspace tools.

The raid5 setup was 2x3TB drives (created 3 days ago to upgrade smoothly
from mdadm/ext4 to btrfs); I then added a 3rd drive and tried to
balance. Metadata is in raid1.

root@nas:~# uname -a
Linux nas 4.1.7-040107-generic #201509131330 SMP Sun Sep 13 17:32:28 UTC 
2015 x86_64 x86_64 x86_64 GNU/Linux

(and also
Linux version 4.2.0-7-generic (buildd@lgw01-60) (gcc version 5.2.1 
20150825 (Ubuntu 5.2.1-15ubuntu5) ) #7-Ubuntu SMP Tue Sep 1 16:43:10 UTC 
2015 (Ubuntu 4.2.0-7.7-generic 4.2.0)
)

root@nas:~# btrfs --version
btrfs-progs v4.2

root@nas:~# btrfs fi show
Label: 'tank'  uuid: 6bec1608-d9c0-453e-87eb-8b8663c9010d
         Total devices 3 FS bytes used 2.66TiB
         devid    1 size 2.73TiB used 2.50TiB path /dev/mapper/luks-WDC_WD30EFRX-68EUZN0_WD-WCC4N2STUCVR
         devid    2 size 2.73TiB used 2.50TiB path /dev/mapper/luks-WDC_WD30EFRX-68EUZN0_WD-WCC4N2DVRDXF
         devid    4 size 2.73TiB used 190.03GiB path /dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164

btrfs-progs v4.2

root@nas:~# btrfs fi df /tank/
Data, RAID5: total=2.67TiB, used=2.65TiB
System, RAID1: total=32.00MiB, used=384.00KiB
Metadata, RAID1: total=6.00GiB, used=4.38GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

root@nas:~# btrfs fi usage /tank/
WARNING: RAID56 detected, not implemented
Overall:
     Device size:                   8.19TiB
     Device allocated:             12.06GiB
     Device unallocated:            8.17TiB
     Device missing:                  0.00B
     Used:                          8.76GiB
     Free (estimated):                0.00B      (min: 8.00EiB)
     Data ratio:                       0.00
     Metadata ratio:                   2.00
     Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID5: Size:2.67TiB, Used:2.65TiB
    /dev/dm-1       2.49TiB
    /dev/dm-2       2.49TiB
    /dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164     184.00GiB

Metadata,RAID1: Size:6.00GiB, Used:4.38GiB
    /dev/dm-1       3.00GiB
    /dev/dm-2       3.00GiB
    /dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164       6.00GiB

System,RAID1: Size:32.00MiB, Used:384.00KiB
    /dev/dm-2      32.00MiB
    /dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164      32.00MiB

Unallocated:
    /dev/dm-1     239.52GiB
    /dev/dm-2     239.49GiB
    /dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164       2.54TiB


Each drive has LUKS configured directly on /dev/sdX (no partition
table), and the resulting dm-crypt device is used directly as a btrfs
device.


root@nas:~# time btrfs balance start /tank


Segmentation fault

real    750m55.550s

with the following kernel BUG in the log:


nas kernel: [17863.907793] ------------[ cut here ]------------
nas kernel: [17863.907833] kernel BUG at 
/build/linux-4dBub_/linux-4.2.0/fs/btrfs/extent-tree.c:1833!
nas kernel: [17863.907857] invalid opcode: 0000 [#1] SMP
nas kernel: [17863.907877] Modules linked in: xts gf128mul drbg 
ansi_cprng xt_multiport xt_comment xt_conntrack xt_nat xt_tcpudp 
nfnetlink_queue nfnetlink_log nfne
nas kernel: [17863.908264] CPU: 1 PID: 17379 Comm: btrfs Not tainted 
4.2.0-7-generic #7-Ubuntu
nas kernel: [17863.908281] Hardware name: ASUS All Series/H87I-PLUS, 
BIOS 1005 01/06/2014
nas kernel: [17863.908297] task: ffff880036184c80 ti: ffff8800507f4000 
task.ti: ffff8800507f4000
nas kernel: [17863.908314] RIP: 0010:[<ffffffffc0311ab6>]  
[<ffffffffc0311ab6>] insert_inline_extent_backref+0xc6/0xd0 [btrfs]
nas kernel: [17863.908349] RSP: 0018:ffff8800507f7698  EFLAGS: 00010293
nas kernel: [17863.908362] RAX: 0000000000000000 RBX: 0000000000000001 
RCX: 0000000000000001
nas kernel: [17863.908378] RDX: ffff880000000000 RSI: 0000000000000001 
RDI: 0000000000000000
nas kernel: [17863.908394] RBP: ffff8800507f7718 R08: 0000000000004000 
R09: ffff8800507f7598
nas kernel: [17863.908410] R10: 0000000000000000 R11: 0000000000000003 
R12: ffff8800c5c65000
nas kernel: [17863.908427] R13: 00000307b70ac000 R14: 0000000000000000 
R15: ffff880108d5c630
nas kernel: [17863.908443] FS:  00007f9300a7d900(0000) 
GS:ffff88011fb00000(0000) knlGS:0000000000000000
nas kernel: [17863.908461] CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
nas kernel: [17863.908475] CR2: 00007f0a351c6000 CR3: 0000000118c0d000 
CR4: 00000000000406e0
nas kernel: [17863.908491] Stack:
nas kernel: [17863.908496]  00000307b70ac000 0000000000000d0b 
0000000000000001 0000000000000000
nas kernel: [17863.908516]  0000030600000001 ffffffff811cf4ca 
0000000000000000 ffffffffc030550a
nas kernel: [17863.908535]  0000000000270026 00000000000035d7 
ffff88001fdd95c0 ffff8800927ae000
nas kernel: [17863.908555] Call Trace:
nas kernel: [17863.908564]  [<ffffffff811cf4ca>] ? 
kmem_cache_alloc+0x1ca/0x200
nas kernel: [17863.908582]  [<ffffffffc030550a>] ? 
btrfs_alloc_path+0x1a/0x20 [btrfs]
nas kernel: [17863.908601]  [<ffffffffc0311f98>] 
__btrfs_inc_extent_ref.isra.52+0x98/0x250 [btrfs]
nas kernel: [17863.908623]  [<ffffffffc031757a>] 
__btrfs_run_delayed_refs+0xc4a/0x1050 [btrfs]
nas kernel: [17863.908643]  [<ffffffffc030f980>] ? 
add_pinned_bytes+0x70/0x80 [btrfs]
nas kernel: [17863.908662]  [<ffffffffc0318087>] ? 
walk_up_proc+0xd7/0x4a0 [btrfs]
nas kernel: [17863.908681]  [<ffffffffc031a5be>] 
btrfs_run_delayed_refs.part.73+0x6e/0x270 [btrfs]
nas kernel: [17863.908702]  [<ffffffffc031a7d5>] 
btrfs_run_delayed_refs+0x15/0x20 [btrfs]
nas kernel: [17863.908723]  [<ffffffffc032e38a>] 
btrfs_should_end_transaction+0x5a/0x60 [btrfs]
nas kernel: [17863.908744]  [<ffffffffc0318dad>] 
btrfs_drop_snapshot+0x43d/0x820 [btrfs]
nas kernel: [17863.908765]  [<ffffffffc0328c00>] ? 
btrfs_get_fs_root+0x30/0x80 [btrfs]
nas kernel: [17863.908787]  [<ffffffffc03813c2>] 
merge_reloc_roots+0xd2/0x240 [btrfs]
nas kernel: [17863.908808]  [<ffffffffc038178a>] 
relocate_block_group+0x25a/0x690 [btrfs]
nas kernel: [17863.908829]  [<ffffffffc0381d8a>] 
btrfs_relocate_block_group+0x1ca/0x2c0 [btrfs]
nas kernel: [17863.909470]  [<ffffffffc03564de>] 
btrfs_relocate_chunk.isra.39+0x3e/0xb0 [btrfs]
nas kernel: [17863.910108]  [<ffffffffc0357847>] 
__btrfs_balance+0x4c7/0x8b0 [btrfs]
nas kernel: [17863.910748]  [<ffffffffc0357ec0>] 
btrfs_balance+0x290/0x610 [btrfs]
nas kernel: [17863.911406]  [<ffffffffc0364014>] ? 
btrfs_ioctl_balance+0x274/0x3c0 [btrfs]
nas kernel: [17863.912065]  [<ffffffffc0363f09>] 
btrfs_ioctl_balance+0x169/0x3c0 [btrfs]
nas kernel: [17863.912734]  [<ffffffffc03658d8>] 
btrfs_ioctl+0x548/0x26d0 [btrfs]
nas kernel: [17863.913398]  [<ffffffff811c5f12>] ? 
alloc_pages_vma+0xc2/0x230
nas kernel: [17863.914014]  [<ffffffff81185d6b>] ? 
lru_cache_add_active_or_unevictable+0x2b/0xa0
nas kernel: [17863.914651]  [<ffffffff811a6d25>] ? 
handle_mm_fault+0xbc5/0x16a0
nas kernel: [17863.915260]  [<ffffffff811aa4dd>] ? 
__vma_link_rb+0xfd/0x110
nas kernel: [17863.915841]  [<ffffffff811aa5a9>] ? vma_link+0xb9/0xc0
nas kernel: [17863.916427]  [<ffffffff811fffd5>] 
do_vfs_ioctl+0x285/0x470
nas kernel: [17863.916970]  [<ffffffff810630a4>] ? 
__do_page_fault+0x1b4/0x400
nas kernel: [17863.917528]  [<ffffffff81200239>] SyS_ioctl+0x79/0x90
nas kernel: [17863.918037]  [<ffffffff817b6cf2>] 
entry_SYSCALL_64_fastpath+0x16/0x75
nas kernel: [17863.918564] Code: 45 10 49 89 d9 48 8b 55 c8 4c 89 34 24 
4c 89 e9 4c 89 fe 4c 89 e7 48 89 44 24 10 8b 45 28 89 44 24 08 e8 fe d6 
ff ff 31 c0 eb bb <
nas kernel: [17863.919683] RIP  [<ffffffffc0311ab6>] 
insert_inline_extent_backref+0xc6/0xd0 [btrfs]
nas kernel: [17863.920202]  RSP <ffff8800507f7698>
nas kernel: [17863.922890] ---[ end trace f9b514d72fc0a628 ]---


I downgraded to 4.1.7 just in case, and got the same thing after a
couple of hours:


nas kernel: [47155.229661] ------------[ cut here ]------------
nas kernel: [47155.229670] WARNING: CPU: 1 PID: 9145 at 
/home/kernel/COD/linux/fs/btrfs/delayed-ref.c:475 
update_existing_ref+0x18b/0x1e0 [btrfs]()
nas kernel: [47155.229671] Modules linked in: ufs qnx4 hfsplus hfs minix 
ntfs msdos jfs xfs libcrc32c xts gf128mul xt_multiport xt_comment 
xt_conntrack xt_nat xt_t
nas kernel: [47155.229704] CPU: 1 PID: 9145 Comm: btrfs Tainted: P       
  W  OE   4.1.7-040107-generic #201509131330
nas kernel: [47155.229705] Hardware name: ASUS All Series/H87I-PLUS, 
BIOS 1005 01/06/2014
nas kernel: [47155.229706]  ffffffffc0381b30 ffff880103eff658 
ffffffff817d0ee3 0000000000000000
nas kernel: [47155.229707]  0000000000000000 ffff880103eff698 
ffffffff81079c3a 0000000000001000
nas kernel: [47155.229708]  ffff88009c3806e0 ffff88009a96a428 
ffff88009a96a3c0 ffff8800a3064420
nas kernel: [47155.229710] Call Trace:
nas kernel: [47155.229713]  [<ffffffff817d0ee3>] dump_stack+0x45/0x57
nas kernel: [47155.229714]  [<ffffffff81079c3a>] 
warn_slowpath_common+0x8a/0xc0
nas kernel: [47155.229715]  [<ffffffff81079d2a>] 
warn_slowpath_null+0x1a/0x20
nas kernel: [47155.229723]  [<ffffffffc0349cdb>] 
update_existing_ref+0x18b/0x1e0 [btrfs]
nas kernel: [47155.229730]  [<ffffffffc034a0cb>] 
add_delayed_tree_ref+0xeb/0x1a0 [btrfs]
nas kernel: [47155.229737]  [<ffffffffc034accc>] 
btrfs_add_delayed_tree_ref+0x10c/0x180 [btrfs]
nas kernel: [47155.229744]  [<ffffffffc02e6610>] 
btrfs_free_extent+0xe0/0x140 [btrfs]
nas kernel: [47155.229750]  [<ffffffffc02d3735>] ? 
btrfs_release_path+0x25/0xb0 [btrfs]
nas kernel: [47155.229757]  [<ffffffffc02e6958>] 
do_walk_down+0x2e8/0x940 [btrfs]
nas kernel: [47155.229763]  [<ffffffffc02e3b82>] ? 
walk_down_proc+0x2e2/0x310 [btrfs]
nas kernel: [47155.229771]  [<ffffffffc02fc68d>] ? 
join_transaction.isra.14+0xfd/0x410 [btrfs]
nas kernel: [47155.229777]  [<ffffffffc02e7076>] 
walk_down_tree+0xc6/0x100 [btrfs]
nas kernel: [47155.229784]  [<ffffffffc02eaa4a>] 
btrfs_drop_snapshot+0x41a/0x880 [btrfs]
nas kernel: [47155.229792]  [<ffffffffc034cb00>] ? 
should_ignore_root.part.15+0x50/0x50 [btrfs]
nas kernel: [47155.229800]  [<ffffffffc0351d49>] 
merge_reloc_roots+0xd9/0x240 [btrfs]
nas kernel: [47155.229807]  [<ffffffffc0352119>] 
relocate_block_group+0x269/0x670 [btrfs]
nas kernel: [47155.229814]  [<ffffffffc03526f6>] 
btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
nas kernel: [47155.229822]  [<ffffffffc0325cbe>] 
btrfs_relocate_chunk.isra.38+0x3e/0xc0 [btrfs]
nas kernel: [47155.229830]  [<ffffffffc03270a4>] 
__btrfs_balance+0x4e4/0x8b0 [btrfs]
nas kernel: [47155.229838]  [<ffffffffc032781a>] 
btrfs_balance+0x3aa/0x680 [btrfs]
nas kernel: [47155.229846]  [<ffffffffc033086b>] ? 
btrfs_ioctl_balance+0x29b/0x520 [btrfs]
nas kernel: [47155.229853]  [<ffffffffc0330734>] 
btrfs_ioctl_balance+0x164/0x520 [btrfs]
nas kernel: [47155.229860]  [<ffffffffc03355f7>] 
btrfs_ioctl+0x597/0x2b30 [btrfs]
nas kernel: [47155.229862]  [<ffffffff811d2ad5>] ? 
alloc_pages_vma+0xb5/0x200
nas kernel: [47155.229864]  [<ffffffff81191a3b>] ? 
lru_cache_add_active_or_unevictable+0x2b/0xa0
nas kernel: [47155.229865]  [<ffffffff811b280c>] ? 
handle_mm_fault+0xbac/0x17e0
nas kernel: [47155.229866]  [<ffffffff811b6a08>] ? 
__vma_link_rb+0xc8/0xf0
nas kernel: [47155.229867]  [<ffffffff8120ce68>] 
do_vfs_ioctl+0x2f8/0x510
nas kernel: [47155.229869]  [<ffffffff81066f76>] ? 
__do_page_fault+0x1b6/0x450
nas kernel: [47155.229870]  [<ffffffff8120d101>] SyS_ioctl+0x81/0xa0
nas kernel: [47155.229871]  [<ffffffff81067240>] ? 
do_page_fault+0x30/0x80
nas kernel: [47155.229873]  [<ffffffff817d8ab2>] 
system_call_fastpath+0x16/0x75
nas kernel: [47155.229874] ---[ end trace e4064ae1c7878a22 ]---


And 2 seconds later:


nas kernel: [47157.228137] ------------[ cut here ]------------
nas kernel: [47157.228190] kernel BUG at 
/home/kernel/COD/linux/fs/btrfs/extent-tree.c:2248!
nas kernel: [47157.228259] invalid opcode: 0000 [#1] SMP
nas kernel: [47157.228301] Modules linked in: ufs qnx4 hfsplus hfs minix 
ntfs msdos jfs xfs libcrc32c xts gf128mul xt_multiport xt_comment 
xt_conntrack xt_nat xt_t
nas kernel: [47157.229656] CPU: 0 PID: 9145 Comm: btrfs Tainted: P       
  W  OE   4.1.7-040107-generic #201509131330
nas kernel: [47157.229741] Hardware name: ASUS All Series/H87I-PLUS, 
BIOS 1005 01/06/2014
nas kernel: [47157.229807] task: ffff88011a8cd080 ti: ffff880103efc000 
task.ti: ffff880103efc000
nas kernel: [47157.229875] RIP: 0010:[<ffffffffc02e8251>]  
[<ffffffffc02e8251>] __btrfs_run_delayed_refs+0x11a1/0x1230 [btrfs]
nas kernel: [47157.229998] RSP: 0018:ffff880103eff7c8  EFLAGS: 00010202
nas kernel: [47157.230048] RAX: 0000000000000001 RBX: 0000000000000000 
RCX: 00000000000001e1
nas kernel: [47157.230113] RDX: ffff8800c61ad000 RSI: ffff8800c6adaed0 
RDI: ffff8800c6adaec8
nas kernel: [47157.230179] RBP: ffff880103eff8f8 R08: 0000000000000000 
R09: 00000001802e002c
nas kernel: [47157.230244] R10: ffffffffc02e75d3 R11: 0000000000000d0a 
R12: ffff880056f0c9f8
nas kernel: [47157.230310] R13: 000003cdf0f80000 R14: ffff8800c6adae60 
R15: 0000000000000000
nas kernel: [47157.230377] FS:  00007f5f63146900(0000) 
GS:ffff88011fa00000(0000) knlGS:0000000000000000
nas kernel: [47157.230451] CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
nas kernel: [47157.230504] CR2: 00007f6126ad5000 CR3: 00000000041be000 
CR4: 00000000000406f0
nas kernel: [47157.230569] Stack:
nas kernel: [47157.230590]  0000000000000001 0000000000000000 
0000042000000001 0000000000000001
nas kernel: [47157.230669]  0000000000000000 0000000000000cf6 
ffff88009a930480 00000000000020ae
nas kernel: [47157.230748]  0000000203eff838 0000000000004000 
ffff88009a930480 ffff88009a930480
nas kernel: [47157.230827] Call Trace:
nas kernel: [47157.230882]  [<ffffffffc02ec483>] 
btrfs_run_delayed_refs.part.66+0x73/0x270 [btrfs]
nas kernel: [47157.230975]  [<ffffffffc02ec697>] 
btrfs_run_delayed_refs+0x17/0x20 [btrfs]
nas kernel: [47157.231065]  [<ffffffffc02fd169>] 
btrfs_should_end_transaction+0x49/0x60 [btrfs]
nas kernel: [47157.231155]  [<ffffffffc02eaaa2>] 
btrfs_drop_snapshot+0x472/0x880 [btrfs]
nas kernel: [47157.231251]  [<ffffffffc034cb00>] ? 
should_ignore_root.part.15+0x50/0x50 [btrfs]
nas kernel: [47157.231347]  [<ffffffffc0351d49>] 
merge_reloc_roots+0xd9/0x240 [btrfs]
nas kernel: [47157.231433]  [<ffffffffc0352119>] 
relocate_block_group+0x269/0x670 [btrfs]
nas kernel: [47157.231521]  [<ffffffffc03526f6>] 
btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
nas kernel: [47157.231618]  [<ffffffffc0325cbe>] 
btrfs_relocate_chunk.isra.38+0x3e/0xc0 [btrfs]
nas kernel: [47157.231714]  [<ffffffffc03270a4>] 
__btrfs_balance+0x4e4/0x8b0 [btrfs]
nas kernel: [47157.231799]  [<ffffffffc032781a>] 
btrfs_balance+0x3aa/0x680 [btrfs]
nas kernel: [47157.231885]  [<ffffffffc033086b>] ? 
btrfs_ioctl_balance+0x29b/0x520 [btrfs]
nas kernel: [47157.231974]  [<ffffffffc0330734>] 
btrfs_ioctl_balance+0x164/0x520 [btrfs]
nas kernel: [47157.232062]  [<ffffffffc03355f7>] 
btrfs_ioctl+0x597/0x2b30 [btrfs]
nas kernel: [47157.232125]  [<ffffffff811d2ad5>] ? 
alloc_pages_vma+0xb5/0x200
nas kernel: [47157.232183]  [<ffffffff81191a3b>] ? 
lru_cache_add_active_or_unevictable+0x2b/0xa0
nas kernel: [47157.232253]  [<ffffffff811b280c>] ? 
handle_mm_fault+0xbac/0x17e0
nas kernel: [47157.232311]  [<ffffffff811b6a08>] ? 
__vma_link_rb+0xc8/0xf0
nas kernel: [47157.232365]  [<ffffffff8120ce68>] 
do_vfs_ioctl+0x2f8/0x510
nas kernel: [47157.232421]  [<ffffffff81066f76>] ? 
__do_page_fault+0x1b6/0x450
nas kernel: [47157.232477]  [<ffffffff8120d101>] SyS_ioctl+0x81/0xa0
nas kernel: [47157.232527]  [<ffffffff81067240>] ? 
do_page_fault+0x30/0x80
nas kernel: [47157.232584]  [<ffffffff817d8ab2>] 
system_call_fastpath+0x16/0x75
nas kernel: [47157.232640] Code: 48 c7 c7 68 e4 37 c0 e8 de 1a d9 c0 e9 
55 f0 ff ff 0f 0b be ba 00 00 00 48 c7 c7 68 e4 37 c0 e8 c6 1a d9 c0 e9 
4d f1 ff ff 0f 0b <
nas kernel: [47157.232977] RIP  [<ffffffffc02e8251>] 
__btrfs_run_delayed_refs+0x11a1/0x1230 [btrfs]
nas kernel: [47157.233072]  RSP <ffff880103eff7c8>
nas kernel: [47157.256409] ---[ end trace e4064ae1c7878a23 ]---

When it happens, the system is obviously unstable and I can't umount or
reboot (without the sysrq keys, that is).
When I do reboot, the filesystem is still mountable and at first glance
seems OK (I didn't try a scrub yet). This is reproducible on my side,
and I'm willing to help you debug this!
I can attach the complete dmesg if necessary.

If you need me to try more stuff or dump more information to help with
debugging, just ask!

Thanks,

Stéphane.




* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-14 11:46 kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance Stéphane Lesimple
@ 2015-09-15 14:47 ` Stéphane Lesimple
  2015-09-15 14:56   ` Josef Bacik
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-15 14:47 UTC (permalink / raw)
  To: linux-btrfs

> I've been experiencing repetitive "kernel BUG" occurrences in the past
> few days trying to balance a raid5 filesystem after adding a new drive.
> It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.

I've run a scrub on this filesystem after the crash happened twice, and
it found no errors.

The BUG_ON() condition that my filesystem triggers is the following:

BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
// in insert_inline_extent_backref() of extent-tree.c.

I've compiled a fresh 4.3.0-rc1 with a couple added printk's just before 
the BUG_ON(), to dump the parameters passed to 
insert_inline_extent_backref() when the problem occurs.
Here is an excerpt of the resulting dmesg:

{btrfs} in insert_inline_extent_backref, got owner < 
BTRFS_FIRST_FREE_OBJECTID
{btrfs} with bytenr=4557830635520 num_bytes=16384 parent=4558111506432 
root_objectid=3339 owner=1 offset=0 refs_to_add=1 
BTRFS_FIRST_FREE_OBJECTID=256
------------[ cut here ]------------
kernel BUG at fs/btrfs/extent-tree.c:1837!
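
Roughly, the added debug code looks like this (sketched from memory; the
exact context inside insert_inline_extent_backref() in
fs/btrfs/extent-tree.c differs between kernel versions, so treat it as an
illustration rather than the literal patch):

if (owner < BTRFS_FIRST_FREE_OBJECTID) {
        /* Dump the arguments of insert_inline_extent_backref() before
         * the BUG_ON() below fires, so the offending backref shows up
         * in dmesg. */
        printk(KERN_ERR "{btrfs} in insert_inline_extent_backref, "
               "got owner < BTRFS_FIRST_FREE_OBJECTID\n");
        printk(KERN_ERR "{btrfs} with bytenr=%llu num_bytes=%llu "
               "parent=%llu root_objectid=%llu owner=%llu offset=%llu "
               "refs_to_add=%d BTRFS_FIRST_FREE_OBJECTID=%llu\n",
               bytenr, num_bytes, parent, root_objectid, owner, offset,
               refs_to_add, (u64)BTRFS_FIRST_FREE_OBJECTID);
}
BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);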

I'll retry with the exact same kernel once I get the machine back up,
and see if the bug happens again at the same filesystem spot or at a
different one.
The variable amount of time that elapses between starting a balance and
hitting the bug suggests it will be a different one.

-- 
Stéphane.


* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-15 14:47 ` Stéphane Lesimple
@ 2015-09-15 14:56   ` Josef Bacik
  2015-09-15 21:47     ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Josef Bacik @ 2015-09-15 14:56 UTC (permalink / raw)
  To: Stéphane Lesimple, linux-btrfs

On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:
>> I've been experiencing repetitive "kernel BUG" occurrences in the past
>> few days trying to balance a raid5 filesystem after adding a new drive.
>> It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
>
> I've run a scrub on this filesystem after the crash happened twice, and
> it found no errors.
>
> The BUG_ON() condition that my filesystem triggers is the following :
>
> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
> // in insert_inline_extent_backref() of extent-tree.c.
>
> I've compiled a fresh 4.3.0-rc1 with a couple added printk's just before
> the BUG_ON(), to dump the parameters passed to
> insert_inline_extent_backref() when the problem occurs.
> Here is an excerpt of the resulting dmesg :
>
> {btrfs} in insert_inline_extent_backref, got owner <
> BTRFS_FIRST_FREE_OBJECTID
> {btrfs} with bytenr=4557830635520 num_bytes=16384 parent=4558111506432
> root_objectid=3339 owner=1 offset=0 refs_to_add=1
> BTRFS_FIRST_FREE_OBJECTID=256
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/extent-tree.c:1837!
>
> I'll retry with the exact same kernel once I get the machine back up,
> and see if the bug happens again at the same filesystem spot or a
> different one.
> The variable amount of time after a balance start elapsed before I get
> the bug suggests that this would be a different one.
>

Does btrfsck complain at all?  Thanks,

Josef


* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-15 14:56   ` Josef Bacik
@ 2015-09-15 21:47     ` Stéphane Lesimple
  2015-09-16  5:02       ` Duncan
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-15 21:47 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

On 2015-09-15 16:56, Josef Bacik wrote:
> On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:
>>> I've been experiencing repetitive "kernel BUG" occurrences in the past
>>> few days trying to balance a raid5 filesystem after adding a new 
>>> drive.
>>> It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
>> 
>> I've run a scrub on this filesystem after the crash happened twice,
>> and it found no errors.
>> 
>> The BUG_ON() condition that my filesystem triggers is the following :
>> 
>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
>> // in insert_inline_extent_backref() of extent-tree.c.
>> 
>> I've compiled a fresh 4.3.0-rc1 with a couple added printk's just 
>> before
>> the BUG_ON(), to dump the parameters passed to
>> insert_inline_extent_backref() when the problem occurs.
>> Here is an excerpt of the resulting dmesg :
>> 
>> {btrfs} in insert_inline_extent_backref, got owner <
>> BTRFS_FIRST_FREE_OBJECTID
>> {btrfs} with bytenr=4557830635520 num_bytes=16384 parent=4558111506432
>> root_objectid=3339 owner=1 offset=0 refs_to_add=1
>> BTRFS_FIRST_FREE_OBJECTID=256
>> ------------[ cut here ]------------
>> kernel BUG at fs/btrfs/extent-tree.c:1837!
>> 
>> I'll retry with the exact same kernel once I get the machine back up,
>> and see if the bug happens again at the same filesystem spot or a
>> different one.
>> The variable amount of time after a balance start elapsed before I get
>> the bug suggests that this would be a different one.
>> 
> 
> Does btrfsck complain at all?

Thanks for your suggestion.
You're right: even though btrfs scrub didn't complain, btrfsck does:

checking extents
bad metadata [4179166806016, 4179166822400) crossing stripe boundary
bad metadata [4179166871552, 4179166887936) crossing stripe boundary
bad metadata [4179166937088, 4179166953472) crossing stripe boundary
[... some more ...]
extent buffer leak: start 4561066901504 len 16384
extent buffer leak: start 4561078812672 len 16384
extent buffer leak: start 4561078861824 len 16384
[... some more ...]
followed by some complaints about mismatched counts for qgroups.

I can see from the btrfsck source code that --repair will not work here,
so I didn't try it.

I'm not sure whether those errors are a cause or a consequence of the
bug. As the filesystem is only a few days old and there was always a
balance running during the crashes, I would be tempted to think they are
actually a consequence, but I can't be sure.
In your experience, could these inconsistencies cause the crash?

If you think so, then I'll btrfs dev del the 3rd device, remount the
array degraded with just one disk, create a new btrfs filesystem from
scratch on the second, copy the data over with single redundancy, then
re-add the two disks and balance-convert back to raid5.

If you think not, then this array could still help you debug a corner
case, and I can keep it as it is for a couple of days if more
testing/debugging is needed.

Thanks,

-- 
Stéphane


* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-15 21:47     ` Stéphane Lesimple
@ 2015-09-16  5:02       ` Duncan
  2015-09-16 10:28         ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Duncan @ 2015-09-16  5:02 UTC (permalink / raw)
  To: linux-btrfs

Stéphane Lesimple posted on Tue, 15 Sep 2015 23:47:01 +0200 as excerpted:

> On 2015-09-15 16:56, Josef Bacik wrote:
>> On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:
>>>> I've been experiencing repetitive "kernel BUG" occurrences in the past
>>>> few days trying to balance a raid5 filesystem after adding a new
>>>> drive.
>>>> It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
>>> 
>>> I've run a scrub on this filesystem after the crash happened twice,
>>> and it found no errors.
>>> 
>>> The BUG_ON() condition that my filesystem triggers is the following :
>>> 
>>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
>>> // in insert_inline_extent_backref() of extent-tree.c.
>>> 
>> Does btrfsck complain at all?

Just to elucidate a bit...

Scrub is designed to detect, and where there's a second copy available 
(dup or raid1/10 modes; raid5/6 modes can reconstruct from parity) 
correct, exactly one problem: corruption where the checksum stored at 
data write time doesn't match the one computed on the data read back 
from storage.  As such, it detects/corrects media errors and (perhaps 
more commonly) data corrupted by crashes in the middle of a write.  But 
if the data was already bad when it was written in the first place, the 
checksum covering it simply validates what was bad before the write 
happened; scrub will be none the wiser and will happily validate the 
incorrect data, since it's a totally valid checksum covering data that 
was bad before the checksum was ever created.
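
To make that concrete, here's a tiny standalone C illustration (not 
btrfs code, it just uses the same CRC32C checksum family btrfs does) of 
why a checksum computed over data that was already bad at write time 
will happily verify later:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bitwise CRC32C (Castagnoli polynomial). */
static uint32_t crc32c(uint32_t crc, const uint8_t *buf, size_t len)
{
        crc = ~crc;
        while (len--) {
                crc ^= *buf++;
                for (int i = 0; i < 8; i++)
                        crc = (crc >> 1) ^ (0x82F63B78 & (0U - (crc & 1U)));
        }
        return ~crc;
}

int main(void)
{
        uint8_t block[4096] = "intended contents";

        /* The data is corrupted *before* it is written (bad RAM, a
         * software bug, ...): the checksum is computed over bad bytes. */
        memcpy(block, "corrupted contents", 18);
        uint32_t csum_at_write = crc32c(0, block, sizeof(block));

        /* Scrub later reads the block back and recomputes the checksum:
         * it matches, so the bad data passes verification. */
        uint32_t csum_at_read = crc32c(0, block, sizeof(block));

        printf("scrub verdict: %s\n",
               csum_at_read == csum_at_write ?
               "csum OK (but the data was wrong all along)" : "csum error");
        return 0;
}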

Which is where btrfs check comes in and why JB asked you to run it, since 
unlike scrub, check is designed to catch filesystem logic errors.

> Thanks for your suggestion.
> You're right, even if btrfs scrub didn't complain, btrfsck does :
> 
> checking extents
> bad metadata [4179166806016, 4179166822400) crossing stripe boundary
> bad metadata [4179166871552, 4179166887936) crossing stripe boundary
> bad metadata [4179166937088, 4179166953472) crossing stripe boundary

This is an actively in-focus bug ATM, and while I'm not a dev and can't 
tell you for sure that it's behind the specific balance-related crash and 
traces you posted (tho I believe it so), it certainly has the potential 
to be that serious, yes.

The most common cause is a buggy btrfs-convert that was creating invalid 
btrfs when converting from ext* at one point.  AFAIK they've hotfixed the 
immediate convert issue, but are still actively working on a longer term 
proper fix.  Meanwhile, while btrfs check does now detect the issue (and 
even that is quite new code, added in 4.2 I believe), there's still no 
real fix for what was after all a defective btrfs from the moment the 
convert was done.

So where that's the cause (the filesystem was created from an ext* fs 
using a buggy btrfs-convert and is thus actually invalid due to this 
cross-stripe metadata), the current fix is to back up the files you want 
to keep (and FWIW, as any good sysadmin will tell you, a backup that 
hasn't been tested restorable isn't yet a backup, as the job isn't 
complete), then blow away and recreate the filesystem properly using 
mkfs.btrfs, and of course then restore to the new filesystem.

If, however, you created the filesystem using mkfs.btrfs, then the 
problem must have occurred some other way.  Whether there's some other 
cause beyond the known cause, a buggy btrfs-convert, has in fact been in 
question, so in this case the devs are likely to be quite interested 
indeed in your case and perhaps the filesystem history that brought you 
to this point.  The ultimate fix is likely to be the same (unless the 
devs have you test new fix code for btrfs check --repair), but I'd 
strongly urge you to delay blowing away the filesystem, if possible, 
until the devs have a chance to ask you to run other diagnostics and 
perhaps even get a btrfs-image for them, since you may well have 
accidentally found a corner case they'll have trouble reproducing 
without your information.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-16  5:02       ` Duncan
@ 2015-09-16 10:28         ` Stéphane Lesimple
  2015-09-16 10:46           ` Holger Hoffstätte
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-16 10:28 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On 2015-09-16 07:02, Duncan wrote:
> Stéphane Lesimple posted on Tue, 15 Sep 2015 23:47:01 +0200 as
> excerpted:
>
>> On 2015-09-15 16:56, Josef Bacik wrote:
>>> On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:
>>>>> I've been experiencing repetitive "kernel BUG" occurrences in the past
>>>>> few days trying to balance a raid5 filesystem after adding a new
>>>>> drive.
>>>>> It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
>>>> 
>>>> I've run a scrub on this filesystem after the crash happened twice,
>>>> and it found no errors.
>>>> 
>>>> The BUG_ON() condition that my filesystem triggers is the following 
>>>> :
>>>> 
>>>> BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
>>>> // in insert_inline_extent_backref() of extent-tree.c.
>>>> 
>>> Does btrfsck complain at all?
> 
> Just to elucidate a bit...
> [...]
> Which is where btrfs check comes in and why JB asked you to run it, 
> since
> unlike scrub, check is designed to catch filesystem logic errors.

Thanks for the clarification, Duncan, that makes perfect sense.

>> You're right, even if btrfs scrub didn't complain, btrfsck does :
>> 
>> checking extents
>> bad metadata [4179166806016, 4179166822400) crossing stripe boundary
>> bad metadata [4179166871552, 4179166887936) crossing stripe boundary
>> bad metadata [4179166937088, 4179166953472) crossing stripe boundary
> 
> This is an actively in-focus bug ATM, and while I'm not a dev and can't
> tell you for sure that it's behind the specific balance-related crash 
> and
> traces you posted (tho I believe it so), it certainly has the potential
> to be that serious, yes.
> 
> The most common cause is a buggy btrfs-convert that was creating 
> invalid
> btrfs when converting from ext* at one point.  AFAIK they've hotfixed 
> the
> immediate convert issue, but are still actively working on a longer 
> term
> proper fix.  Meanwhile, while btrfs check does now detect the issue 
> (and
> even that is quite new code, added in 4.2 I believe), there's still no
> real fix for what was after all a defective btrfs from the moment the
> convert was done.
> [...]
> If, however, you created the filesystem using mkfs.btrfs, then the
> problem must have occurred some other way.  Whether there's some other
> cause beyond the known cause, a buggy btrfs-convert, has in fact been 
> in
> question, so in this case the devs are likely to be quite interested
> indeed in your case and perhaps the filesystem history that brought you
> to this point.  The ultimate fix is likely to be the same (unless the
> devs have you test new fix code for btrfs check --repair), but I'd
> strongly urge you to delay blowing away the filesystem, if possible,
> until the devs have a chance to ask you to run other diagnostics and
> perhaps even get a btrfs-image for them, since you may well have
> accidentally found a corner-case they'll have trouble reproducing,
> without your information.

Nice to know that this bug was already somewhat known, but I can confirm
that it actually doesn't come from an ext4 conversion in my case.

Here is the filesystem history, which is actually quite short:
- FS created from scratch, no convert, on 2x3TB devices using mkfs.btrfs
with raid1 metadata and raid5 data. This was done using the 4.2 tools
and kernel 3.19, so a couple of incompat features were turned on by
default (such as skinny metadata).
- Approx. 3TB worth of files copied to it (a bit less, actually: I had
around 30G free after the copy).
- Upgraded to kernel 4.2.0.
- Added a third 3TB device to the filesystem.
- Ran a balance to get an even distribution of data/metadata across the
3 drives.
- Kernel BUG after a couple of hours. The btrfs balance userspace tool
segfaulted at the same time. Due to apport's default configuration (damn
you, Ubuntu!), the core file was discarded, but I don't think the
segfault is really interesting. The kernel trace is.

This was all done within ~1 week.

I've just created an image of the metadata, using btrfs-image -s. The
image is 2.9G; I can drop it somewhere in case a dev would like to have
a look at it.

For what it's worth, I've also hit another kernel BUG, almost certainly
related, while trying to dev del the 3rd device, after 8 hours of work
(kernel 4.1.7):

kernel BUG at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:2248!
in __btrfs_run_delayed_refs+0x11a1/0x1230 [btrfs]

Trace:
[<ffffffff813d9a65>] ? __percpu_counter_add+0x55/0x70
[<ffffffffc02ea483>] btrfs_run_delayed_refs.part.66+0x73/0x270 [btrfs]
[<ffffffffc02ea697>] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
[<ffffffffc02fb169>] btrfs_should_end_transaction+0x49/0x60 [btrfs]
[<ffffffffc02e8aa2>] btrfs_drop_snapshot+0x472/0x880 [btrfs]
[<ffffffffc034ab00>] ? should_ignore_root.part.15+0x50/0x50 [btrfs]
[<ffffffffc034fd49>] merge_reloc_roots+0xd9/0x240 [btrfs]
[<ffffffffc0350119>] relocate_block_group+0x269/0x670 [btrfs]
[<ffffffffc03506f6>] btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
[<ffffffffc0323cbe>] btrfs_relocate_chunk.isra.38+0x3e/0xc0 [btrfs]
[<ffffffffc0324944>] btrfs_shrink_device+0x1d4/0x450 [btrfs]
[<ffffffffc0328d43>] btrfs_rm_device+0x323/0x810 [btrfs]
[<ffffffffc0334ee6>] btrfs_ioctl+0x1e86/0x2b30 [btrfs]
[<ffffffff81183544>] ? filemap_map_pages+0x1d4/0x230
[<ffffffff811b29f5>] ? handle_mm_fault+0xd95/0x17e0
[<ffffffff81115112>] ? from_kgid_munged+0x12/0x20
[<ffffffff811fe710>] ? cp_new_stat+0x140/0x160
[<ffffffff8120ce68>] do_vfs_ioctl+0x2f8/0x510
[<ffffffff81066f76>] ? __do_page_fault+0x1b6/0x450
[<ffffffff811fe75f>] ? SYSC_newstat+0x2f/0x40
[<ffffffff8120d101>] SyS_ioctl+0x81/0xa0
[<ffffffff81067240>] ? do_page_fault+0x30/0x80
[<ffffffff817d8ab2>] system_call_fastpath+0x16/0x75


If JB or any other btrfs dev wants me to try anything on this filesystem
before I recreate it from scratch, such as a kernel patch or a userland
tool patch, or to run a more verbose debug balance, I would be happy to
do so.
If that is the case, please tell me, so I can keep the filesystem as it
is. On the other hand, if you're sure the btrfs-image is enough, please
tell me too, so I can go ahead and fix my system :)

Thanks,

-- 
Stéphane.



* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-16 10:28         ` Stéphane Lesimple
@ 2015-09-16 10:46           ` Holger Hoffstätte
  2015-09-16 13:04             ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Holger Hoffstätte @ 2015-09-16 10:46 UTC (permalink / raw)
  To: Stéphane Lesimple; +Cc: linux-btrfs

On 09/16/15 12:28, Stéphane Lesimple wrote:
> Nice to know that this bug was already somewhat known, but I can
> confirm that it actually doesn't come from an ext4 conversion on my
> case.

In that case the "crossing stripe boundary" messages are false positives
in btrfs-progs-4.2: http://www.spinics.net/lists/linux-btrfs/msg47059.html

This should be fixed in the next release.
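
If I read the referenced fix right, it's an off-by-one in how the end of
the extent is rounded to a stripe.  Roughly like this (illustrative only,
names approximated, not the literal btrfs-progs code):

#define BTRFS_STRIPE_LEN (64 * 1024)    /* 64KiB stripe length */

/* A metadata extent [start, start + len) crosses a stripe boundary only
 * if its first and *last* byte land in different stripes. */
static inline int check_crossing_stripes(unsigned long long start,
                                         unsigned long long len)
{
        /* Buggy variant: rounding with (start + len), one byte past the
         * end, also flags extents that merely *end* on a boundary:
         *
         *   return (start / BTRFS_STRIPE_LEN) !=
         *          ((start + len) / BTRFS_STRIPE_LEN);
         */

        /* Fixed variant: compare against the last byte of the extent. */
        return (start / BTRFS_STRIPE_LEN) !=
               ((start + len - 1) / BTRFS_STRIPE_LEN);
}

That would match the ranges reported earlier in the thread: for example
[4179166806016, 4179166822400) is a 16KiB node whose end falls exactly
on a 64KiB boundary, which the first variant misreports as crossing.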

-h



* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-16 10:46           ` Holger Hoffstätte
@ 2015-09-16 13:04             ` Stéphane Lesimple
  2015-09-16 20:18               ` Duncan
  2015-09-17  6:29               ` Stéphane Lesimple
  0 siblings, 2 replies; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-16 13:04 UTC (permalink / raw)
  To: linux-btrfs

On 2015-09-16 12:46, Holger Hoffstätte wrote:
> On 09/16/15 12:28, Stéphane Lesimple wrote:
>> Nice to know that this bug was already somewhat known, but I can
>> confirm that it actually doesn't come from an ext4 conversion on my
>> case.
> 
> In that case the "crossing stripe boundary" messages are false 
> positives
> in btrfs-progs-4.2: 
> http://www.spinics.net/lists/linux-btrfs/msg47059.html
> 
> This should be fixed in the next release.

Out of curiosity, I compiled the btrfs-progs-4.2 release patched with
the diff you're referencing to fix the off-by-one error, and ran btrfsck
again.
Indeed, those errors disappear and my filesystem seems clean in this
regard. I also disabled quota because it almost certainly has nothing to
do with the bug, and now btrfsck is 100% happy:

---------
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on 
/dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164
UUID: 6bec1608-d9c0-453e-87eb-8b8663c9010d
found 2922178546042 bytes used err is 0
total csum bytes: 2849102736
total tree bytes: 4697341952
total fs tree bytes: 1276395520
total extent tree bytes: 90963968
btree space waste bytes: 640514848
file data blocks allocated: 2959998808064
  referenced 2959997575168
btrfs-progs v4.2-dirty
---------

So this is even more interesting: my filesystem is reported by both
scrub and fsck as being in perfect shape, but it still crashes the
kernel from time to time on balance.

Next step: reboot under 4.3.0-rc1 with my printk's, run a balance, log
the crash, reboot, balance again, crash again, and compare. If the same
filesystem spot triggers the crash twice, then it would point to an
undetected metadata/filesystem-internal integrity corruption; if it
crashes at 2 different spots, then maybe it's some kind of race
condition that, for some reason, my system hits way more often than
others.

-- 
Stéphane.



* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-16 13:04             ` Stéphane Lesimple
@ 2015-09-16 20:18               ` Duncan
  2015-09-16 20:41                 ` Stéphane Lesimple
  2015-09-17  6:29               ` Stéphane Lesimple
  1 sibling, 1 reply; 37+ messages in thread
From: Duncan @ 2015-09-16 20:18 UTC (permalink / raw)
  To: linux-btrfs

Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as excerpted:

> On 2015-09-16 12:46, Holger Hoffstätte wrote:
>> 
>> In that case the "crossing stripe boundary" messages are false
>> positives in btrfs-progs-4.2:
>> http://www.spinics.net/lists/linux-btrfs/msg47059.html
>> 
>> This should be fixed in the next release.

Thanks for reminding me of that, Holger.  I think I was too sleepy when I 
read it on-list, and forgot about it...

> Out of curiosity I compiled the btrfs-progs-4.2 release patched with the
> diff you're referencing to fix the off-by-one error, and ran a btrfsck
> again.
> Indeed those errors disappear and my filesystem seems clean in this
> regard. I also disabled quota because it almost certainly has nothing to
> do with the bug, and now btrfsck is 100% happy:

Yes.  Quotas have been a continuing issue on btrfs.  AFAIK, they're on 
the third rewrite now, and still have some bugs to work out.  So what 
I've been recommending is: unless you're (a) directly and specifically 
working with the devs to find and fix the quota issues (in which case 
please keep at it! =:^), then either (b) you really depend on quotas 
working, in which case btrfs isn't appropriate for you at this time 
since they're known to be buggy, so use a more mature filesystem where 
they're known to work, or (c) you don't actually need quotas at all, so 
disable them and remove the quota-tracking metadata, thus avoiding the 
bugs in the feature entirely. =:^)

(Agreed with your conclusions and nothing to add to the rest of your 
message, so omitted here.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-16 20:18               ` Duncan
@ 2015-09-16 20:41                 ` Stéphane Lesimple
  2015-09-17  3:03                   ` Qu Wenruo
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-16 20:41 UTC (permalink / raw)
  To: linux-btrfs

On 2015-09-16 22:18, Duncan wrote:
> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
> excerpted:
>
>> On 2015-09-16 12:46, Holger Hoffstätte wrote:
>>> 
>> I also disabled quota because it almost certainly has nothing to
>> do with the bug, and now btrfsck is 100% happy:
> 
> Yes.  Quotas have been a continuing issue on btrfs.  AFAIK, they're on
> the third rewrite now, and still have some bugs to work out.  So what
> I've been recommending is: unless you're (a) directly and specifically
> working with the devs to find and fix the quota issues (in which case
> please keep at it! =:^), then either (b) you really depend on quotas
> working, in which case btrfs isn't appropriate for you at this time
> since they're known to be buggy, so use a more mature filesystem where
> they're known to work, or (c) you don't actually need quotas at all, so
> disable them and remove the quota-tracking metadata, thus avoiding the
> bugs in the feature entirely. =:^)

Well, actually it's the (d) option ;)
I activate the quota feature for only one reason: being able to track
down how much space my snapshots are taking.
I've used ZFS in the past, and I was really missing the "zfs list"
command, which is able to tell you how much space a given snapshot is
actually taking; there's no equivalent under btrfs out of the box.
With quota enabled, I was able to somewhat mimic zfs list with a perl
script I wrote, btrfs-list:

PATH                                    ID     TYPE     REFER      USED
'tank'                                  -1       df         -     2.66T 
(13.26G free)
    /tank                                 5      vol    48.00K    48.00K
    media                              1906   subvol     1.04T     1.04T
    photos                             1909   subvol   116.37G   116.37G
    main                               1911   subvol   973.23G   973.23G
    bkp-slash                          3270   subvol    15.86G    15.86G
    bkp-quasar                         3314   subvol    18.26G    16.00K
       .snaps/bkp-quasar@2015-01-17    3317   rosnap    17.77G    40.76M
       .snaps/bkp-quasar@2015-03-06    3318   rosnap    17.89G    88.88M
       .snaps/bkp-quasar@2015-04-05    3319   rosnap    17.92G    90.97M
       .snaps/bkp-quasar@2015-05-31    3320   rosnap    17.95G     1.02M
       .snaps/bkp-quasar@2015-06-13    3321   rosnap    17.95G   760.00K
       .snaps/bkp-quasar@2015-07-26    3322   rosnap    18.19G    17.88M
       .snaps/bkp-quasar@2015-07-31    3323   rosnap    18.19G    14.58M
       .snaps/bkp-quasar@2015-08-03    3324   rosnap    18.19G    17.43M
    bkp-liznodisk                      3341   subvol     7.01G    16.00K
       .snaps/bkp-liznodisk@2015-03-01 3342   rosnap     6.96G    75.37M
       .snaps/bkp-liznodisk@2015-03-28 3343   rosnap     6.98G    84.93M
       .snaps/bkp-liznodisk@2015-04-26 3344   rosnap     6.96G    67.14M
       .snaps/bkp-liznodisk@2015-05-24 3345   rosnap     6.95G    47.67M
       .snaps/bkp-liznodisk@2015-06-27 3346   rosnap     6.96G    67.97M
       .snaps/bkp-liznodisk@2015-07-25 3347   rosnap     6.98G    60.30M
       .snaps/bkp-liznodisk@2015-08-16 3348   rosnap     7.10G   159.44M
    bkp-skyline                        3367   subvol    22.52G    22.52G

I just pushed it to https://github.com/speed47/btrfs-list, if anybody is 
interested.

Anyway, balance has been running again for 7+ hours, trying to reproduce
the bug a second time, but no crash yet... it should happen soon now!

-- 
Stéphane.



* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-16 20:41                 ` Stéphane Lesimple
@ 2015-09-17  3:03                   ` Qu Wenruo
  2015-09-17  6:11                     ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-17  3:03 UTC (permalink / raw)
  To: Stéphane Lesimple, linux-btrfs



Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:
> On 2015-09-16 22:18, Duncan wrote:
>> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as excerpted:
>>
>>> On 2015-09-16 12:46, Holger Hoffstätte wrote:
>>>>
>>> I also disabled quota because it almost certainly has nothing to
>>> do with the bug, and now btrfsck is 100% happy:
>>
>> Yes.  Quotas have been a continuing issue on btrfs.  AFAIK, they're on
>> the third rewrite now, and still have some bugs to work out.  So what
>> I've been recommending is: unless you're (a) directly and specifically
>> working with the devs to find and fix the quota issues (in which case
>> please keep at it! =:^), then either (b) you really depend on quotas
>> working, in which case btrfs isn't appropriate for you at this time
>> since they're known to be buggy, so use a more mature filesystem where
>> they're known to work, or (c) you don't actually need quotas at all, so
>> disable them and remove the quota-tracking metadata, thus avoiding the
>> bugs in the feature entirely. =:^)
>
> Well actually it's the (d) option ;)
> I activate the quota feature for only one reason : being able to track
> down how much space my snapshots are taking.
> I've been using ZFS in the past, and I was really missing the "zfs list"
> command that is able to tell you how much space a given snapshot is
> actually taking under btrfs.
> With quota enabled, I was able to somehow mimic zfs list with a perl
> script I wrote, btrfs-list :
>
> PATH                                    ID     TYPE     REFER      USED
> 'tank'                                  -1       df         -     2.66T
> (13.26G free)
>     /tank                                 5      vol    48.00K    48.00K
>     media                              1906   subvol     1.04T     1.04T
>     photos                             1909   subvol   116.37G   116.37G
>     main                               1911   subvol   973.23G   973.23G
>     bkp-slash                          3270   subvol    15.86G    15.86G
>     bkp-quasar                         3314   subvol    18.26G    16.00K
>        .snaps/bkp-quasar@2015-01-17    3317   rosnap    17.77G    40.76M
>        .snaps/bkp-quasar@2015-03-06    3318   rosnap    17.89G    88.88M
>        .snaps/bkp-quasar@2015-04-05    3319   rosnap    17.92G    90.97M
>        .snaps/bkp-quasar@2015-05-31    3320   rosnap    17.95G     1.02M
>        .snaps/bkp-quasar@2015-06-13    3321   rosnap    17.95G   760.00K
>        .snaps/bkp-quasar@2015-07-26    3322   rosnap    18.19G    17.88M
>        .snaps/bkp-quasar@2015-07-31    3323   rosnap    18.19G    14.58M
>        .snaps/bkp-quasar@2015-08-03    3324   rosnap    18.19G    17.43M
>     bkp-liznodisk                      3341   subvol     7.01G    16.00K
>        .snaps/bkp-liznodisk@2015-03-01 3342   rosnap     6.96G    75.37M
>        .snaps/bkp-liznodisk@2015-03-28 3343   rosnap     6.98G    84.93M
>        .snaps/bkp-liznodisk@2015-04-26 3344   rosnap     6.96G    67.14M
>        .snaps/bkp-liznodisk@2015-05-24 3345   rosnap     6.95G    47.67M
>        .snaps/bkp-liznodisk@2015-06-27 3346   rosnap     6.96G    67.97M
>        .snaps/bkp-liznodisk@2015-07-25 3347   rosnap     6.98G    60.30M
>        .snaps/bkp-liznodisk@2015-08-16 3348   rosnap     7.10G   159.44M
>     bkp-skyline                        3367   subvol    22.52G    22.52G
>
> I just pushed it to https://github.com/speed47/btrfs-list, if anybody is
> interested.
>
> Anyway, balance is running again for 7+ hours, trying to reproduce the
> bug twice, but no crash yet ... should happen soon now !
>
Yeah, that's exactly one of the ideal use cases of btrfs qgroups.

But I'm quite curious about the btrfsck error report on qgroups.

If btrfsck reports such errors, it means either I'm too confident about
the recent qgroup accounting rework, or btrfsck has some bug which I
didn't take into consideration during the kernel rework.

Would you please provide the full result of the previous btrfsck run
with the qgroup errors?

Thanks,
Qu


* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-17  3:03                   ` Qu Wenruo
@ 2015-09-17  6:11                     ` Stéphane Lesimple
  2015-09-17  6:42                       ` Qu Wenruo
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-17  6:11 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On 2015-09-17 05:03, Qu Wenruo wrote:
> Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:
>> On 2015-09-16 22:18, Duncan wrote:
>>> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as 
>>> excerpted:
>>> 
>> 
>> Well actually it's the (d) option ;)
>> I activate the quota feature for only one reason : being able to track
>> down how much space my snapshots are taking.
> 
> Yeah, that's exactly one of the ideal use cases of btrfs qgroups.
>
> But I'm quite curious about the btrfsck error report on qgroups.
>
> If btrfsck reports such errors, it means either I'm too confident about
> the recent qgroup accounting rework, or btrfsck has some bug which I
> didn't take into consideration during the kernel rework.
>
> Would you please provide the full result of the previous btrfsck run
> with the qgroup errors?

Sure, I've saved the log somewhere just in case, so here you are:

Counts for qgroup id: 3359 are different
our:            referenced 7530119168 referenced compressed 7530119168
disk:           referenced 7530086400 referenced compressed 7530086400
diff:           referenced 32768 referenced compressed 32768
our:            exclusive 49152 exclusive compressed 49152
disk:           exclusive 32768 exclusive compressed 32768
diff:           exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3361 are different
our:            referenced 7530119168 referenced compressed 7530119168
disk:           referenced 7530086400 referenced compressed 7530086400
diff:           referenced 32768 referenced compressed 32768
our:            exclusive 49152 exclusive compressed 49152
disk:           exclusive 32768 exclusive compressed 32768
diff:           exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3362 are different
our:            referenced 7530119168 referenced compressed 7530119168
disk:           referenced 7530086400 referenced compressed 7530086400
diff:           referenced 32768 referenced compressed 32768
our:            exclusive 49152 exclusive compressed 49152
disk:           exclusive 32768 exclusive compressed 32768
diff:           exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3363 are different
our:            referenced 7530119168 referenced compressed 7530119168
disk:           referenced 7530086400 referenced compressed 7530086400
diff:           referenced 32768 referenced compressed 32768
our:            exclusive 49152 exclusive compressed 49152
disk:           exclusive 32768 exclusive compressed 32768
diff:           exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3361 are different
our:            referenced 7530119168 referenced compressed 7530119168
disk:           referenced 7530086400 referenced compressed 7530086400
diff:           referenced 32768 referenced compressed 32768
our:            exclusive 49152 exclusive compressed 49152
disk:           exclusive 32768 exclusive compressed 32768
diff:           exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3362 are different
our:            referenced 7530119168 referenced compressed 7530119168
disk:           referenced 7530086400 referenced compressed 7530086400
diff:           referenced 32768 referenced compressed 32768
our:            exclusive 49152 exclusive compressed 49152
disk:           exclusive 32768 exclusive compressed 32768
diff:           exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3363 are different
our:            referenced 7530119168 referenced compressed 7530119168
disk:           referenced 7530086400 referenced compressed 7530086400
diff:           referenced 32768 referenced compressed 32768
our:            exclusive 49152 exclusive compressed 49152
disk:           exclusive 32768 exclusive compressed 32768
diff:           exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3364 are different
our:            referenced 7530119168 referenced compressed 7530119168
disk:           referenced 7530086400 referenced compressed 7530086400
diff:           referenced 32768 referenced compressed 32768
our:            exclusive 49152 exclusive compressed 49152
disk:           exclusive 32768 exclusive compressed 32768
diff:           exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3365 are different
our:            referenced 7530119168 referenced compressed 7530119168
disk:           referenced 7530086400 referenced compressed 7530086400
diff:           referenced 32768 referenced compressed 32768
our:            exclusive 49152 exclusive compressed 49152
disk:           exclusive 32768 exclusive compressed 32768
diff:           exclusive 16384 exclusive compressed 16384
Counts for qgroup id: 3366 are different
our:            referenced 7530119168 referenced compressed 7530119168
disk:           referenced 7530086400 referenced compressed 7530086400
diff:           referenced 32768 referenced compressed 32768
our:            exclusive 16384 exclusive compressed 16384
disk:           exclusive 16384 exclusive compressed 16384

If you need the really complete log (even with the "crossing stripe 
boundary" false positives), I can post it somewhere for you.

I can't do a btrfs qgroup show because I've since disabled quota as 
stated above. However you can find the IDs corresponding to subvolumes 
and snapshots in my previous post.

Don't hesitate to ask for more information if needed.

Thanks,

-- 
Stéphane.



* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-16 13:04             ` Stéphane Lesimple
  2015-09-16 20:18               ` Duncan
@ 2015-09-17  6:29               ` Stéphane Lesimple
  2015-09-17  7:54                 ` Stéphane Lesimple
  1 sibling, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-17  6:29 UTC (permalink / raw)
  To: linux-btrfs

On 2015-09-16 15:04, Stéphane Lesimple wrote:
> I also disabled quota because it almost certainly has nothing
> to do with the bug

As it turns out, it seems that this assertion was completely wrong.

I've had balance running for more than 16 hours now, without a crash.
This is almost 50% of the work done without any issue. Before, a crash
would happen within minutes, sometimes an hour, but not much more. The
problem is, I didn't change anything on the filesystem, well, apart from
the supposedly benign quota disable. So Qu's question about the qgroup
errors in fsck made me wonder: if I activate quota again, it'll still
continue to balance flawlessly, right?

Well, it doesn't. I just ran btrfs quota enable on my filesystem; it
completed successfully after a few minutes (rescan -s said that no
rescan was pending). Then, less than 5 minutes later, the kernel crashed
at the same BUG_ON() as usual:

[60156.062082] BTRFS info (device dm-3): relocating block group 
972839452672 flags 129
[60185.203626] BTRFS info (device dm-3): found 1463 extents
[60414.452890] {btrfs} in insert_inline_extent_backref, got owner < 
BTRFS_FIRST_FREE_OBJECTID
[60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384 
parent=5336636473344 root_objectid=3358 owner=1 offset=0 refs_to_add=1 
BTRFS_FIRST_FREE_OBJECTID=256
[60414.452924] ------------[ cut here ]------------
[60414.452928] kernel BUG at fs/btrfs/extent-tree.c:1837!

owner is 1 again at this point in the code (this is still kernel
4.3.0-rc1 with my added printks).

So I'll disable quota again and resume the balance. If I'm right, it
should proceed without issue for 18 more hours!

Qu, my filesystem is at your disposal :)

-- 
Stéphane.




* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-17  6:11                     ` Stéphane Lesimple
@ 2015-09-17  6:42                       ` Qu Wenruo
  2015-09-17  8:02                         ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-17  6:42 UTC (permalink / raw)
  To: Stéphane Lesimple; +Cc: linux-btrfs



Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:
> Le 2015-09-17 05:03, Qu Wenruo a écrit :
>> Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:
>>> Le 2015-09-16 22:18, Duncan a écrit :
>>>> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
>>>> excerpted:
>>>>
>>>
>>> Well actually it's the (d) option ;)
>>> I activate the quota feature for only one reason : being able to track
>>> down how much space my snapshots are taking.
>>
>> Yeah, that's completely one of the ideal use case of btrfs qgroup.
>>
>> But I'm quite curious about the btrfsck error report on qgroup.
>>
>> If btrfsck report such error, it means either I'm too confident about
>> the recent qgroup accounting rework, or btrfsck has some bug which I
>> didn't take much consideration during the kernel rework.
>>
>> Would you please provide the full result of previous btrfsck with
>> qgroup error?
>
> Sure, I've saved the log somewhere just in case, here your are :
>
> Counts for qgroup id: 3359 are different
> our:            referenced 7530119168 referenced compressed 7530119168
> disk:           referenced 7530086400 referenced compressed 7530086400
> diff:           referenced 32768 referenced compressed 32768
> our:            exclusive 49152 exclusive compressed 49152
> disk:           exclusive 32768 exclusive compressed 32768
> diff:           exclusive 16384 exclusive compressed 16384
> Counts for qgroup id: 3361 are different
> our:            referenced 7530119168 referenced compressed 7530119168
> disk:           referenced 7530086400 referenced compressed 7530086400
> diff:           referenced 32768 referenced compressed 32768
> our:            exclusive 49152 exclusive compressed 49152
> disk:           exclusive 32768 exclusive compressed 32768
> diff:           exclusive 16384 exclusive compressed 16384
> Counts for qgroup id: 3362 are different
> our:            referenced 7530119168 referenced compressed 7530119168
> disk:           referenced 7530086400 referenced compressed 7530086400
> diff:           referenced 32768 referenced compressed 32768
> our:            exclusive 49152 exclusive compressed 49152
> disk:           exclusive 32768 exclusive compressed 32768
> diff:           exclusive 16384 exclusive compressed 16384
> Counts for qgroup id: 3363 are different
> our:            referenced 7530119168 referenced compressed 7530119168
> disk:           referenced 7530086400 referenced compressed 7530086400
> diff:           referenced 32768 referenced compressed 32768
> our:            exclusive 49152 exclusive compressed 49152
> disk:           exclusive 32768 exclusive compressed 32768
> diff:           exclusive 16384 exclusive compressed 16384
> Counts for qgroup id: 3361 are different
> our:            referenced 7530119168 referenced compressed 7530119168
> disk:           referenced 7530086400 referenced compressed 7530086400
> diff:           referenced 32768 referenced compressed 32768
> our:            exclusive 49152 exclusive compressed 49152
> disk:           exclusive 32768 exclusive compressed 32768
> diff:           exclusive 16384 exclusive compressed 16384
> Counts for qgroup id: 3362 are different
> our:            referenced 7530119168 referenced compressed 7530119168
> disk:           referenced 7530086400 referenced compressed 7530086400
> diff:           referenced 32768 referenced compressed 32768
> our:            exclusive 49152 exclusive compressed 49152
> disk:           exclusive 32768 exclusive compressed 32768
> diff:           exclusive 16384 exclusive compressed 16384
> Counts for qgroup id: 3363 are different
> our:            referenced 7530119168 referenced compressed 7530119168
> disk:           referenced 7530086400 referenced compressed 7530086400
> diff:           referenced 32768 referenced compressed 32768
> our:            exclusive 49152 exclusive compressed 49152
> disk:           exclusive 32768 exclusive compressed 32768
> diff:           exclusive 16384 exclusive compressed 16384
> Counts for qgroup id: 3364 are different
> our:            referenced 7530119168 referenced compressed 7530119168
> disk:           referenced 7530086400 referenced compressed 7530086400
> diff:           referenced 32768 referenced compressed 32768
> our:            exclusive 49152 exclusive compressed 49152
> disk:           exclusive 32768 exclusive compressed 32768
> diff:           exclusive 16384 exclusive compressed 16384
> Counts for qgroup id: 3365 are different
> our:            referenced 7530119168 referenced compressed 7530119168
> disk:           referenced 7530086400 referenced compressed 7530086400
> diff:           referenced 32768 referenced compressed 32768
> our:            exclusive 49152 exclusive compressed 49152
> disk:           exclusive 32768 exclusive compressed 32768
> diff:           exclusive 16384 exclusive compressed 16384
> Counts for qgroup id: 3366 are different
> our:            referenced 7530119168 referenced compressed 7530119168
> disk:           referenced 7530086400 referenced compressed 7530086400
> diff:           referenced 32768 referenced compressed 32768
> our:            exclusive 16384 exclusive compressed 16384
> disk:           exclusive 16384 exclusive compressed 16384
>
> If you need the really complete log (even with the "crossing stripe
> boundary" false positives), I can post it somewhere for you.
>
> I can't do a btrfs qgroup show because I've since disabled quota as
> stated above. However you can find the IDs corresponding to subvolumes
> and snapshots in my previous post.
>
> Don't hesitate to ask for more information if needed.
>
> Thanks,
>
Thanks for your log, pretty interesting result.

BTW, did you enable qgroups on an old kernel, earlier than 4.2-rc1?
If so, I would be much more relaxed, as the errors could then be a 
problem of the old kernels.

If it's OK for you, would you please enable quota again after reproducing 
the bug, use the filesystem for some time, and then recheck it?

Thanks,
Qu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-17  6:29               ` Stéphane Lesimple
@ 2015-09-17  7:54                 ` Stéphane Lesimple
  0 siblings, 0 replies; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-17  7:54 UTC (permalink / raw)
  To: linux-btrfs

On 2015-09-17 08:29, Stéphane Lesimple wrote:
> Le 2015-09-16 15:04, Stéphane Lesimple a écrit :
>> I also disabled quota because it has almost for sure nothing
>> to do with the bug
> 
> As it turns out, it seems that this assertion was completely wrong.
> 
> I've got balance running for more than 16 hours now, without a crash.
> This is almost 50% of the work done without any issue. Before, a crash
> would happen within minutes, sometimes 1 hour, but not much more. The
> problem is, I didn't change anything to the filesystem, well, appart
> from the benign quota disable. So Qu's question about the qgroups
> errors in fsck made me wonder : if I activate quota again, it'll still
> continue to balance flawlessly, right ?
> 
> Well, it doesn't. I just ran btrfs quota enable on my filesystem, it
> completed successfully after some minutes (rescan -s said that no
> rescan was pending). Then less than 5 minutes later, the kernel
> crashed, at the same BUG_ON() than usually :
> 
> [60156.062082] BTRFS info (device dm-3): relocating block group
> 972839452672 flags 129
> [60185.203626] BTRFS info (device dm-3): found 1463 extents
> [60414.452890] {btrfs} in insert_inline_extent_backref, got owner <
> BTRFS_FIRST_FREE_OBJECTID
> [60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384
> parent=5336636473344 root_objectid=3358 owner=1 offset=0 refs_to_add=1
> BTRFS_FIRST_FREE_OBJECTID=256
> [60414.452924] ------------[ cut here ]------------
> [60414.452928] kernel BUG at fs/btrfs/extent-tree.c:1837!
> 
> owner is=1 again at this point in the code (this is still kernel
> 4.3.0-rc1 with my added printks).
> 
> So I'll disable quota, again, and resume the balance. If I'm right, it
> should proceed without issue for 18 more hours !

Damn, wrong again. It just re-crashed without quota enabled :(
The fact that it went perfectly well for 17+ hours and then crashed 
minutes after I reactivated quota might have been pure chance, then...

[ 5487.706499] {btrfs} in insert_inline_extent_backref, got owner < 
BTRFS_FIRST_FREE_OBJECTID
[ 5487.706504] {btrfs} with bytenr=6906661109760 num_bytes=16384 
parent=6905020874752 root_objectid=18446744073709551608 owner=1 offset=0 
refs_to_add=1 BTRFS_FIRST_FREE_OBJECTID=256
[ 5487.706536] ------------[ cut here ]------------
[ 5487.706539] kernel BUG at fs/btrfs/extent-tree.c:1837!

For reference, the crash I had earlier this morning was as follows:

[60414.452894] {btrfs} with bytenr=5197436141568 num_bytes=16384 
parent=5336636473344 root_objectid=3358 owner=1 offset=0 refs_to_add=1 
BTRFS_FIRST_FREE_OBJECTID=256

So, this is a completely different part of the filesystem.
The bug is always the same, though: owner=1, where it shouldn't be < 256.
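
As a side note, the root_objectid in this second trace also looks like a 
small negative number stored in a u64; a quick shell check on a 64-bit 
machine (nothing btrfs specific):

  printf '%u\n' -8    # prints 18446744073709551608

If I read ctree.h correctly, that would be BTRFS_TREE_RELOC_OBJECTID 
(-8ULL), which at least is consistent with the crash happening while a 
balance is relocating block groups.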

Balance cancelled.

To me, it sounds like some sort of race condition. But I'm out of ideas 
on what to test now.

-- 
Stéphane.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-17  6:42                       ` Qu Wenruo
@ 2015-09-17  8:02                         ` Stéphane Lesimple
  2015-09-17  8:11                           ` Qu Wenruo
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-17  8:02 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On 2015-09-17 08:42, Qu Wenruo wrote:
> Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:
>> Le 2015-09-17 05:03, Qu Wenruo a écrit :
>>> Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:
>>>> Le 2015-09-16 22:18, Duncan a écrit :
>>>>> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
>>>>> excerpted:
>>>>> 
>>>> 
>>>> Well actually it's the (d) option ;)
>>>> I activate the quota feature for only one reason : being able to 
>>>> track
>>>> down how much space my snapshots are taking.
>>> 
>>> Yeah, that's completely one of the ideal use case of btrfs qgroup.
>>> 
>>> But I'm quite curious about the btrfsck error report on qgroup.
>>> 
>>> If btrfsck report such error, it means either I'm too confident about
>>> the recent qgroup accounting rework, or btrfsck has some bug which I
>>> didn't take much consideration during the kernel rework.
>>> 
>>> Would you please provide the full result of previous btrfsck with
>>> qgroup error?
>> 
>> Sure, I've saved the log somewhere just in case, here your are :
>> 
>> [...]
> Thanks for your log, pretty interesting result.
> 
> BTW, did you enabled qgroup from old kernel earlier than 4.2-rc1?
> If so, I would be much relaxed as they can be the problem of old 
> kernels.

The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled quota 
precisely under 4.2.0. My kern.log tends to confirm that (grepping for 
'qgroup scan completed').
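
For example, something along these lines is enough to check (log paths 
and rotation may differ on other distros):

  grep 'qgroup scan completed' /var/log/kern.log*
  grep 'Linux version' /var/log/kern.log*

and then matching the timestamps of the first command's output against 
the kernel version that was booted at that time.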

> If it's OK for you, would you please enable quota after reproducing
> the bug and use for sometime and recheck it?

Sure. I've just reproduced the bug twice, as I wanted, and posted the 
info, so now I've cancelled the balance and can re-enable quota. I'll 
do it under 4.3.0-rc1, and I'll keep you posted if btrfsck complains 
about it in the following days.

Regards,

-- 
Stéphane.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-17  8:02                         ` Stéphane Lesimple
@ 2015-09-17  8:11                           ` Qu Wenruo
  2015-09-17 10:08                             ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-17  8:11 UTC (permalink / raw)
  To: Stéphane Lesimple; +Cc: linux-btrfs



Stéphane Lesimple wrote on 2015/09/17 10:02 +0200:
> Le 2015-09-17 08:42, Qu Wenruo a écrit :
>> Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:
>>> Le 2015-09-17 05:03, Qu Wenruo a écrit :
>>>> Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:
>>>>> Le 2015-09-16 22:18, Duncan a écrit :
>>>>>> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
>>>>>> excerpted:
>>>>>>
>>>>>
>>>>> Well actually it's the (d) option ;)
>>>>> I activate the quota feature for only one reason : being able to track
>>>>> down how much space my snapshots are taking.
>>>>
>>>> Yeah, that's completely one of the ideal use case of btrfs qgroup.
>>>>
>>>> But I'm quite curious about the btrfsck error report on qgroup.
>>>>
>>>> If btrfsck report such error, it means either I'm too confident about
>>>> the recent qgroup accounting rework, or btrfsck has some bug which I
>>>> didn't take much consideration during the kernel rework.
>>>>
>>>> Would you please provide the full result of previous btrfsck with
>>>> qgroup error?
>>>
>>> Sure, I've saved the log somewhere just in case, here your are :
>>>
>>> [...]
>> Thanks for your log, pretty interesting result.
>>
>> BTW, did you enabled qgroup from old kernel earlier than 4.2-rc1?
>> If so, I would be much relaxed as they can be the problem of old kernels.
>
> The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled quota
> under 4.2.0 precisely. My kern.log tends to confirm that (looking for
> 'qgroup scan completed').

Emmm, it seems I need to pay more attention to this case now.
Any info about the workload on this btrfs filesystem?

>
>> If it's OK for you, would you please enable quota after reproducing
>> the bug and use for sometime and recheck it?
>
> Sure, I've just reproduced the bug twice as I wanted, and posted the
> info, so now I've cancelled the balance and I can reenable quota. Will
> do it under 4.3.0-rc1. I'll keep you posted if btrfsck complains about
> it in the following days.
>
> Regards,
>
Thanks for your patience and detailed report.

But I still have another question: did you do any snapshot deletion 
after quota was enabled?
(I'll assume you did, as there are a lot of backup snapshots, so the old 
ones should already have been deleted.)

That's one of the known bugs, and Mark is actively working on it.
If you delete non-empty snapshots a lot, then I'd better add a hot fix to 
mark the qgroups inconsistent after a snapshot is deleted, and trigger a 
rescan if possible.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-17  8:11                           ` Qu Wenruo
@ 2015-09-17 10:08                             ` Stéphane Lesimple
  2015-09-17 10:41                               ` Qu Wenruo
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-17 10:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On 2015-09-17 10:11, Qu Wenruo wrote:
> Stéphane Lesimple wrote on 2015/09/17 10:02 +0200:
>> Le 2015-09-17 08:42, Qu Wenruo a écrit :
>>> Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:
>>>> Le 2015-09-17 05:03, Qu Wenruo a écrit :
>>>>> Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:
>>>>>> Le 2015-09-16 22:18, Duncan a écrit :
>>>>>>> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
>>>>>>> excerpted:
>>>>>>> 
>>>>>> 
>>>>>> Well actually it's the (d) option ;)
>>>>>> I activate the quota feature for only one reason : being able to 
>>>>>> track
>>>>>> down how much space my snapshots are taking.
>>>>> 
>>>>> Yeah, that's completely one of the ideal use case of btrfs qgroup.
>>>>> 
>>>>> But I'm quite curious about the btrfsck error report on qgroup.
>>>>> 
>>>>> If btrfsck report such error, it means either I'm too confident 
>>>>> about
>>>>> the recent qgroup accounting rework, or btrfsck has some bug which 
>>>>> I
>>>>> didn't take much consideration during the kernel rework.
>>>>> 
>>>>> Would you please provide the full result of previous btrfsck with
>>>>> qgroup error?
>>>> 
>>>> Sure, I've saved the log somewhere just in case, here your are :
>>>> 
>>>> [...]
>>> Thanks for your log, pretty interesting result.
>>> 
>>> BTW, did you enabled qgroup from old kernel earlier than 4.2-rc1?
>>> If so, I would be much relaxed as they can be the problem of old 
>>> kernels.
>> 
>> The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled 
>> quota
>> under 4.2.0 precisely. My kern.log tends to confirm that (looking for
>> 'qgroup scan completed').
> 
> Emmm, seems I need to pay more attention on this case now.
> Any info about the workload for this btrfs fs?
> 
>> 
>>> If it's OK for you, would you please enable quota after reproducing
>>> the bug and use for sometime and recheck it?
>> 
>> Sure, I've just reproduced the bug twice as I wanted, and posted the
>> info, so now I've cancelled the balance and I can reenable quota. Will
>> do it under 4.3.0-rc1. I'll keep you posted if btrfsck complains about
>> it in the following days.
>> 
>> Regards,
>> 
> Thanks for your patience and detailed report.

You're very welcome.

> But I still have another question, did you do any snapshot deletion
> after quota enabled?
> (I'll assume you did it, as there are a lot of backup snapshot, old
> ones should be already deleted)

Actually no: this btrfs system is quite new (less than a week old), as 
I'm migrating from mdadm(raid1)+ext4 to btrfs. Those snapshots were 
actually rsynced one by one from my hardlink-based "snapshots" under 
ext4 (those pseudo-snapshots are created using a program named 
"rsnapshot", if you know it; it's basically a wrapper around cp -la). I 
haven't yet activated automatic snapshot creation/deletion on my btrfs 
system, due to the bugs I'm tripping over. So no snapshot was deleted.
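
For reference, each rsnapshot rotation boils down to roughly this (the 
paths here are just an example, not my actual layout):

  # hardlink-copy the newest pseudo-snapshot, then refresh it in place
  cp -al /backup/daily.0 /backup/daily.1
  rsync -a --delete /data/ /backup/daily.0/

so on ext4 a "snapshot" is only a tree of hardlinks, nothing that btrfs 
would ever see as a real snapshot.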

> That's one of the known bug and Mark is working on it actively.
> If you delete non-empty snapshot a lot, then I'd better add a hot fix
> to mark qgroup inconsistent after snapshot delete, and trigger a
> rescan if possible.

I've made a btrfs-image of the filesystem just before disabling quotas 
(which I did to get a clean btrfsck and to eliminate quotas from the 
equation while trying to reproduce the bug I have). Would it be of any 
use if I dropped it somewhere for you to pick up? (It's 2.9G in size.)

In the meantime, I've reactivated quotas, unmounted the filesystem and 
run a btrfsck on it: as you would expect, no qgroup problem is reported 
so far. I'll clear all my snapshots, run a quota rescan, then re-create 
them one by one by rsyncing from the ext4 system I still have. Maybe 
I'll run into the issue again.

-- 
Stéphane.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-17 10:08                             ` Stéphane Lesimple
@ 2015-09-17 10:41                               ` Qu Wenruo
  2015-09-17 18:47                                 ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-17 10:41 UTC (permalink / raw)
  To: Stéphane Lesimple, Qu Wenruo; +Cc: linux-btrfs



On 2015-09-17 18:08, Stéphane Lesimple wrote:
> Le 2015-09-17 10:11, Qu Wenruo a écrit :
>> Stéphane Lesimple wrote on 2015/09/17 10:02 +0200:
>>> Le 2015-09-17 08:42, Qu Wenruo a écrit :
>>>> Stéphane Lesimple wrote on 2015/09/17 08:11 +0200:
>>>>> Le 2015-09-17 05:03, Qu Wenruo a écrit :
>>>>>> Stéphane Lesimple wrote on 2015/09/16 22:41 +0200:
>>>>>>> Le 2015-09-16 22:18, Duncan a écrit :
>>>>>>>> Stéphane Lesimple posted on Wed, 16 Sep 2015 15:04:20 +0200 as
>>>>>>>> excerpted:
>>>>>>>>
>>>>>>>
>>>>>>> Well actually it's the (d) option ;)
>>>>>>> I activate the quota feature for only one reason : being able to
>>>>>>> track
>>>>>>> down how much space my snapshots are taking.
>>>>>>
>>>>>> Yeah, that's completely one of the ideal use case of btrfs qgroup.
>>>>>>
>>>>>> But I'm quite curious about the btrfsck error report on qgroup.
>>>>>>
>>>>>> If btrfsck report such error, it means either I'm too confident about
>>>>>> the recent qgroup accounting rework, or btrfsck has some bug which I
>>>>>> didn't take much consideration during the kernel rework.
>>>>>>
>>>>>> Would you please provide the full result of previous btrfsck with
>>>>>> qgroup error?
>>>>>
>>>>> Sure, I've saved the log somewhere just in case, here your are :
>>>>>
>>>>> [...]
>>>> Thanks for your log, pretty interesting result.
>>>>
>>>> BTW, did you enabled qgroup from old kernel earlier than 4.2-rc1?
>>>> If so, I would be much relaxed as they can be the problem of old
>>>> kernels.
>>>
>>> The mkfs.btrfs was done under 3.19, but I'm almost sure I enabled quota
>>> under 4.2.0 precisely. My kern.log tends to confirm that (looking for
>>> 'qgroup scan completed').
>>
>> Emmm, seems I need to pay more attention on this case now.
>> Any info about the workload for this btrfs fs?
>>
>>>
>>>> If it's OK for you, would you please enable quota after reproducing
>>>> the bug and use for sometime and recheck it?
>>>
>>> Sure, I've just reproduced the bug twice as I wanted, and posted the
>>> info, so now I've cancelled the balance and I can reenable quota. Will
>>> do it under 4.3.0-rc1. I'll keep you posted if btrfsck complains about
>>> it in the following days.
>>>
>>> Regards,
>>>
>> Thanks for your patience and detailed report.
>
> You're very welcome.
>
>> But I still have another question, did you do any snapshot deletion
>> after quota enabled?
>> (I'll assume you did it, as there are a lot of backup snapshot, old
>> ones should be already deleted)
>
> Actually no : this btrfs system is quite new (less than a week old) as
> I'm migrating from mdadm(raid1)+ext4 to btrfs. So those snapshots were
> actually rsynced one by one from my hardlinks-based "snapshots" under
> ext4 (those pseudo-snapshots are created using a program named
> "rsnapshot", if you know it. This is basically a wrapper to cp -la). I
> didn't activate yet an automatic snapshot/delete on my btrfs system, due
> to the bugs I'm tripping on. So no snapshot was deleted.

Now things are getting tricky: with all the known bugs ruled out, it must 
be another hidden bug, even though we reworked the qgroup accounting code.

>
>> That's one of the known bug and Mark is working on it actively.
>> If you delete non-empty snapshot a lot, then I'd better add a hot fix
>> to mark qgroup inconsistent after snapshot delete, and trigger a
>> rescan if possible.
>
> I've made a btrfs-image of the filesystem just before disabling quotas
> (which I did to get a clean btrfsck and eliminate quotas from the
> equation trying to reproduce the bug I have). Would it be of any use if
> I drop it somewhere for you to pick it up ? (2.9G in size).

For a mismatch case, a static btrfs-image dump won't really help, 
as the important point is when, and by which operation, the qgroup 
accounting started to mismatch.

>
> In the meantime, I've reactivated quotas, umounted the filesystem and
> ran a btrfsck on it : as you would expect, there's no qgroup problem
> reported so far.

At least the rescan code is working without problems.

> I'll clear all my snapshots, run an quota rescan, then
> re-create them one by one by rsyncing from my ext4 system I still have.
> Maybe I'll run into the issue again.
>

Would you mind doing the following check for each subvolume rsync?

1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
2) Create the needed snapshot
3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
4) Avoid doing IO if possible until step 6)
5) Do 'btrfs quota rescan -w' and save it
6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
7) Rsync data from ext4 to the newly created snapshot

The point is that, as you mentioned, rescan is working fine, so we can 
compare the output from 3), 6) and 1) to see which qgroup accounting 
numbers change.

And if they differ, which would mean that the qgroup update at write time 
OR snapshot creation has something wrong, we can at least narrow the 
problem down to either the qgroup update routine or snapshot creation.
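
For example, a small wrapper script along these lines would cover steps 
1) to 6) (the mount point and snapshot paths below are only placeholders, 
to be adapted):

  #!/bin/sh
  MNT=/tank                                  # placeholder mount point
  SRC=/tank/backup                           # placeholder source subvolume
  SNAP=/tank/snapshots/backup-$(date +%F)    # placeholder snapshot name

  sync; btrfs qgroup show -prce --raw "$MNT" > step1.txt   # 1) before snapshot
  btrfs subvolume snapshot -r "$SRC" "$SNAP"               # 2) create ro snapshot
  sync; btrfs qgroup show -prce --raw "$MNT" > step3.txt   # 3) after snapshot
  # 4) no other IO until step 6
  btrfs quota rescan -w "$MNT" > step5.txt                 # 5) rescan and wait
  sync; btrfs qgroup show -prce --raw "$MNT" > step6.txt   # 6) after rescan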

Thanks,
Qu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-17 10:41                               ` Qu Wenruo
@ 2015-09-17 18:47                                 ` Stéphane Lesimple
  2015-09-18  0:59                                   ` Qu Wenruo
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-17 18:47 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On 2015-09-17 12:41, Qu Wenruo wrote:
>> In the meantime, I've reactivated quotas, umounted the filesystem and
>> ran a btrfsck on it : as you would expect, there's no qgroup problem
>> reported so far.
> 
> At least, rescan code is working without problem.
> 
>> I'll clear all my snapshots, run an quota rescan, then
>> re-create them one by one by rsyncing from my ext4 system I still 
>> have.
>> Maybe I'll run into the issue again.
>> 
> 
> Would you mind to do the following check for each subvolume rsync?
> 
> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
> 2) Create the needed snapshot
> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
> 4) Avoid doing IO if possible until step 6)
> 5) Do 'btrfs quota rescan -w' and save it
> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
> 7) Rsync data from ext4 to the newly created snapshot
> 
> The point is, as you mentioned, rescan is working fine, we can compare
> output from 3), 6) and 1) to see which qgroup accounting number
> changes.
> 
> And if differs, which means the qgroup update at write time OR
> snapshot creation has something wrong, at least we can locate the
> problem to qgroup update routine or snapshot creation.

I was about to do that, but first there's something that sounds strange: 
I began by trashing all my snapshots, then ran a quota rescan and waited 
for it to complete, to start from a sane base.
However, this is the output of qgroup show now:

qgroupid          rfer                 excl     max_rfer     max_excl 
parent  child
--------          ----                 ----     --------     -------- 
------  -----
0/5              16384                16384         none         none 
---     ---
0/1906   1657848029184        1657848029184         none         none 
---     ---
0/1909    124950921216         124950921216         none         none 
---     ---
0/1911   1054587293696        1054587293696         none         none 
---     ---
0/3270     23727300608          23727300608         none         none 
---     ---
0/3314     23206055936          23206055936         none         none 
---     ---
0/3317     18472996864                    0         none         none 
---     ---
0/3318     22235709440 18446744073708421120         none         none 
---     ---
0/3319     22240333824                    0         none         none 
---     ---
0/3320     22289608704                    0         none         none 
---     ---
0/3321     22289608704                    0         none         none 
---     ---
0/3322     18461151232                    0         none         none 
---     ---
0/3323     18423902208                    0         none         none 
---     ---
0/3324     18423902208                    0         none         none 
---     ---
0/3325     18463506432                    0         none         none 
---     ---
0/3326     18463506432                    0         none         none 
---     ---
0/3327     18463506432                    0         none         none 
---     ---
0/3328     18463506432                    0         none         none 
---     ---
0/3329     18585427968                    0         none         none 
---     ---
0/3330     18621472768 18446744073251348480         none         none 
---     ---
0/3331     18621472768                    0         none         none 
---     ---
0/3332     18621472768                    0         none         none 
---     ---
0/3333     18783076352                    0         none         none 
---     ---
0/3334     18799804416                    0         none         none 
---     ---
0/3335     18799804416                    0         none         none 
---     ---
0/3336     18816217088                    0         none         none 
---     ---
0/3337     18816266240                    0         none         none 
---     ---
0/3338     18816266240                    0         none         none 
---     ---
0/3339     18816266240                    0         none         none 
---     ---
0/3340     18816364544                    0         none         none 
---     ---
0/3341      7530119168           7530119168         none         none 
---     ---
0/3342      4919283712                    0         none         none 
---     ---
0/3343      4921724928                    0         none         none 
---     ---
0/3344      4921724928                    0         none         none 
---     ---
0/3345      6503317504 18446744073690902528         none         none 
---     ---
0/3346      6503452672                    0         none         none 
---     ---
0/3347      6509514752                    0         none         none 
---     ---
0/3348      6515793920                    0         none         none 
---     ---
0/3349      6515793920                    0         none         none 
---     ---
0/3350      6518685696                    0         none         none 
---     ---
0/3351      6521511936                    0         none         none 
---     ---
0/3352      6521511936                    0         none         none 
---     ---
0/3353      6521544704                    0         none         none 
---     ---
0/3354      6597963776                    0         none         none 
---     ---
0/3355      6598275072                    0         none         none 
---     ---
0/3356      6635880448                    0         none         none 
---     ---
0/3357      6635880448                    0         none         none 
---     ---
0/3358      6635880448                    0         none         none 
---     ---
0/3359      6635880448                    0         none         none 
---     ---
0/3360      6635880448                    0         none         none 
---     ---
0/3361      6635880448                    0         none         none 
---     ---
0/3362      6635880448                    0         none         none 
---     ---
0/3363      6635880448                    0         none         none 
---     ---
0/3364      6635880448                    0         none         none 
---     ---
0/3365      6635880448                    0         none         none 
---     ---
0/3366      6635896832                    0         none         none 
---     ---
0/3367     24185790464          24185790464         none         none 
---     ---

I would have expected all these qgroupids to have been trashed along 
with the snapshots, but it seems they weren't. It reminded me of the bug 
you were talking about, where deleted snapshots don't always clear their 
qgroup correctly, but as these don't disappear after a rescan either... 
I'm a bit surprised.

I've just tried quota disable / quota enable, and not it seems OK. Just 
wanted to let you know, in case it's not known behavior ...

The procedure I'll use will be slightly different from what you 
proposed, but to my understanding it won't change the result:

> 0) Rsync data from the next ext4 "snapshot" to the subvolume
> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
> 2) Create the needed readonly snapshot on btrfs
> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
> 4) Avoid doing IO if possible until step 6)
> 5) Do 'btrfs quota rescan -w' and save it
> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output

I'll post the results once this is done.

-- 
Stéphane.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-17 18:47                                 ` Stéphane Lesimple
@ 2015-09-18  0:59                                   ` Qu Wenruo
  2015-09-18  7:36                                     ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-18  0:59 UTC (permalink / raw)
  To: Stéphane Lesimple, Qu Wenruo; +Cc: linux-btrfs



Stéphane Lesimple wrote on 2015/09/17 20:47 +0200:
> Le 2015-09-17 12:41, Qu Wenruo a écrit :
>>> In the meantime, I've reactivated quotas, umounted the filesystem and
>>> ran a btrfsck on it : as you would expect, there's no qgroup problem
>>> reported so far.
>>
>> At least, rescan code is working without problem.
>>
>>> I'll clear all my snapshots, run an quota rescan, then
>>> re-create them one by one by rsyncing from my ext4 system I still have.
>>> Maybe I'll run into the issue again.
>>>
>>
>> Would you mind to do the following check for each subvolume rsync?
>>
>> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>> 2) Create the needed snapshot
>> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>> 4) Avoid doing IO if possible until step 6)
>> 5) Do 'btrfs quota rescan -w' and save it
>> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>> 7) Rsync data from ext4 to the newly created snapshot
>>
>> The point is, as you mentioned, rescan is working fine, we can compare
>> output from 3), 6) and 1) to see which qgroup accounting number
>> changes.
>>
>> And if differs, which means the qgroup update at write time OR
>> snapshot creation has something wrong, at least we can locate the
>> problem to qgroup update routine or snapshot creation.
>
> I was about to do that, but first there's something that sounds strange
> : I've begun by trashing all my snapshots, then ran a quota rescan, and
> waited for it to complete, to start on a sane base.
> However, this is the output of qgroup show now :

By "trashing", did you mean deleting all the files inside the subvolume?
Or "btrfs subv del"?

>
> qgroupid          rfer                 excl     max_rfer     max_excl
> parent  child
> --------          ----                 ----     --------     --------
> ------  -----
> 0/5              16384                16384         none         none
> ---     ---
> 0/1906   1657848029184        1657848029184         none         none
> ---     ---
> 0/1909    124950921216         124950921216         none         none
> ---     ---
> 0/1911   1054587293696        1054587293696         none         none
> ---     ---
> 0/3270     23727300608          23727300608         none         none
> ---     ---
> 0/3314     23206055936          23206055936         none         none
> ---     ---
> 0/3317     18472996864                    0         none         none
> ---     ---
> 0/3318     22235709440 18446744073708421120         none         none
> ---     ---
> 0/3319     22240333824                    0         none         none
> ---     ---
> 0/3320     22289608704                    0         none         none
> ---     ---
> 0/3321     22289608704                    0         none         none
> ---     ---
> 0/3322     18461151232                    0         none         none
> ---     ---
> 0/3323     18423902208                    0         none         none
> ---     ---
> 0/3324     18423902208                    0         none         none
> ---     ---
> 0/3325     18463506432                    0         none         none
> ---     ---
> 0/3326     18463506432                    0         none         none
> ---     ---
> 0/3327     18463506432                    0         none         none
> ---     ---
> 0/3328     18463506432                    0         none         none
> ---     ---
> 0/3329     18585427968                    0         none         none
> ---     ---
> 0/3330     18621472768 18446744073251348480         none         none
> ---     ---
> 0/3331     18621472768                    0         none         none
> ---     ---
> 0/3332     18621472768                    0         none         none
> ---     ---
> 0/3333     18783076352                    0         none         none
> ---     ---
> 0/3334     18799804416                    0         none         none
> ---     ---
> 0/3335     18799804416                    0         none         none
> ---     ---
> 0/3336     18816217088                    0         none         none
> ---     ---
> 0/3337     18816266240                    0         none         none
> ---     ---
> 0/3338     18816266240                    0         none         none
> ---     ---
> 0/3339     18816266240                    0         none         none
> ---     ---
> 0/3340     18816364544                    0         none         none
> ---     ---
> 0/3341      7530119168           7530119168         none         none
> ---     ---
> 0/3342      4919283712                    0         none         none
> ---     ---
> 0/3343      4921724928                    0         none         none
> ---     ---
> 0/3344      4921724928                    0         none         none
> ---     ---
> 0/3345      6503317504 18446744073690902528         none         none
> ---     ---
> 0/3346      6503452672                    0         none         none
> ---     ---
> 0/3347      6509514752                    0         none         none
> ---     ---
> 0/3348      6515793920                    0         none         none
> ---     ---
> 0/3349      6515793920                    0         none         none
> ---     ---
> 0/3350      6518685696                    0         none         none
> ---     ---
> 0/3351      6521511936                    0         none         none
> ---     ---
> 0/3352      6521511936                    0         none         none
> ---     ---
> 0/3353      6521544704                    0         none         none
> ---     ---
> 0/3354      6597963776                    0         none         none
> ---     ---
> 0/3355      6598275072                    0         none         none
> ---     ---
> 0/3356      6635880448                    0         none         none
> ---     ---
> 0/3357      6635880448                    0         none         none
> ---     ---
> 0/3358      6635880448                    0         none         none
> ---     ---
> 0/3359      6635880448                    0         none         none
> ---     ---
> 0/3360      6635880448                    0         none         none
> ---     ---
> 0/3361      6635880448                    0         none         none
> ---     ---
> 0/3362      6635880448                    0         none         none
> ---     ---
> 0/3363      6635880448                    0         none         none
> ---     ---
> 0/3364      6635880448                    0         none         none
> ---     ---
> 0/3365      6635880448                    0         none         none
> ---     ---
> 0/3366      6635896832                    0         none         none
> ---     ---
> 0/3367     24185790464          24185790464         none         none
> ---     ---
>

Nooooo!! What a weird result here!
Qgroup 3345 has a negative number again, even after a qgroup rescan...
IIRC, from the code, rescan just passes old_roots as NULL and uses the 
correct new_roots to build up "rfer" and "excl".
So in theory it should never go below zero during a rescan.
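
(For what it's worth, those huge "excl" values are just small negative 
numbers wrapped around in an unsigned 64-bit counter; a quick shell check 
on a 64-bit machine, nothing btrfs specific:

  printf '%u\n' -1130496    # prints 18446744073708421120

which is exactly the "excl" reported for qgroup 0/3318 above.)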

The only hope for me is that it's an orphan qgroup (mentioned below).

> I would have expected all these qgroupids to have been trashed with the
> snapshots, but it seems not. It reminded me of the bug you were talking
> about, where deleted snapshots don't always clear correctly their
> qgroup, but as these don't disappear after a rescan either... I'm a bit
> surprised.

If you mean you "btrfs qgroup del" the subvolume, then it's known that 
the qgroup won't be deleted and won't be associated with any subvolume.
(It's possible that a later created subvolume reuses the old subvolid and 
gets associated with the qgroup again.)

If the above qgroups with 0 or even negative "excl" numbers are orphans, 
I'll be much relieved, as it would be a minor orphan qgroup bug rather 
than another possible qgroup rework (or at least a huge review).

>
> I've just tried quota disable / quota enable, and not it seems OK. Just
> wanted to let you know, in case it's not known behavior ...

Thanks a lot for your info, which indeed exposes something we didn't 
take into much consideration.

And if the qgroups match the above description, would you mind removing 
these qgroups?

>
> The procedure I'll use will be slighlty different from what you
> proposed, but to my understanding it won't change the result :
>
>> 0) Rsync data from the next ext4 "snapshot" to the subvolume
>> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>> 2) Create the needed readonly snapshot on btrfs
>> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>> 4) Avoid doing IO if possible until step 6)
>> 5) Do 'btrfs quota rescan -w' and save it
>> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>
> I'll post the results once this is done.
>

Thanks a lot!
Qu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-18  0:59                                   ` Qu Wenruo
@ 2015-09-18  7:36                                     ` Stéphane Lesimple
  2015-09-18 10:15                                       ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-18  7:36 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On 2015-09-18 02:59, Qu Wenruo wrote:
> Stéphane Lesimple wrote on 2015/09/17 20:47 +0200:
>> Le 2015-09-17 12:41, Qu Wenruo a écrit :
>>>> In the meantime, I've reactivated quotas, umounted the filesystem 
>>>> and
>>>> ran a btrfsck on it : as you would expect, there's no qgroup problem
>>>> reported so far.
>>> 
>>> At least, rescan code is working without problem.
>>> 
>>>> I'll clear all my snapshots, run an quota rescan, then
>>>> re-create them one by one by rsyncing from my ext4 system I still 
>>>> have.
>>>> Maybe I'll run into the issue again.
>>>> 
>>> 
>>> Would you mind to do the following check for each subvolume rsync?
>>> 
>>> 1) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>>> 2) Create the needed snapshot
>>> 3) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>>> 4) Avoid doing IO if possible until step 6)
>>> 5) Do 'btrfs quota rescan -w' and save it
>>> 6) Do 'sync; btrfs qgroup show -prce --raw' and save the output
>>> 7) Rsync data from ext4 to the newly created snapshot
>>> 
>>> The point is, as you mentioned, rescan is working fine, we can 
>>> compare
>>> output from 3), 6) and 1) to see which qgroup accounting number
>>> changes.
>>> 
>>> And if differs, which means the qgroup update at write time OR
>>> snapshot creation has something wrong, at least we can locate the
>>> problem to qgroup update routine or snapshot creation.
>> 
>> I was about to do that, but first there's something that sounds 
>> strange
>> : I've begun by trashing all my snapshots, then ran a quota rescan, 
>> and
>> waited for it to complete, to start on a sane base.
>> However, this is the output of qgroup show now :
> 
> By "trashing", did you mean deleting all the files inside the 
> subvolume?
> Or "btrfs subv del"?

Sorry for the confusion here, yes, I meant btrfs subvolume del.

>> qgroupid          rfer                 excl     max_rfer     max_excl
>> parent  child
>> --------          ----                 ----     --------     --------
>> ------  -----
>> 0/5              16384                16384         none         none
>> ---     ---
>> 0/1906   1657848029184        1657848029184         none         none
>> ---     ---
>> 0/1909    124950921216         124950921216         none         none
>> ---     ---
>> 0/1911   1054587293696        1054587293696         none         none
>> ---     ---
>> 0/3270     23727300608          23727300608         none         none
>> ---     ---
>> 0/3314     23206055936          23206055936         none         none
>> ---     ---
>> 0/3317     18472996864                    0         none         none
>> ---     ---
>> 0/3318     22235709440 18446744073708421120         none         none
>> ---     ---
>> 0/3319     22240333824                    0         none         none
>> ---     ---
>> 0/3320     22289608704                    0         none         none
>> ---     ---
>> 0/3321     22289608704                    0         none         none
>> ---     ---
>> 0/3322     18461151232                    0         none         none
>> ---     ---
>> 0/3323     18423902208                    0         none         none
>> ---     ---
>> 0/3324     18423902208                    0         none         none
>> ---     ---
>> 0/3325     18463506432                    0         none         none
>> ---     ---
>> 0/3326     18463506432                    0         none         none
>> ---     ---
>> 0/3327     18463506432                    0         none         none
>> ---     ---
>> 0/3328     18463506432                    0         none         none
>> ---     ---
>> 0/3329     18585427968                    0         none         none
>> ---     ---
>> 0/3330     18621472768 18446744073251348480         none         none
>> ---     ---
>> 0/3331     18621472768                    0         none         none
>> ---     ---
>> 0/3332     18621472768                    0         none         none
>> ---     ---
>> 0/3333     18783076352                    0         none         none
>> ---     ---
>> 0/3334     18799804416                    0         none         none
>> ---     ---
>> 0/3335     18799804416                    0         none         none
>> ---     ---
>> 0/3336     18816217088                    0         none         none
>> ---     ---
>> 0/3337     18816266240                    0         none         none
>> ---     ---
>> 0/3338     18816266240                    0         none         none
>> ---     ---
>> 0/3339     18816266240                    0         none         none
>> ---     ---
>> 0/3340     18816364544                    0         none         none
>> ---     ---
>> 0/3341      7530119168           7530119168         none         none
>> ---     ---
>> 0/3342      4919283712                    0         none         none
>> ---     ---
>> 0/3343      4921724928                    0         none         none
>> ---     ---
>> 0/3344      4921724928                    0         none         none
>> ---     ---
>> 0/3345      6503317504 18446744073690902528         none         none
>> ---     ---
>> 0/3346      6503452672                    0         none         none
>> ---     ---
>> 0/3347      6509514752                    0         none         none
>> ---     ---
>> 0/3348      6515793920                    0         none         none
>> ---     ---
>> 0/3349      6515793920                    0         none         none
>> ---     ---
>> 0/3350      6518685696                    0         none         none
>> ---     ---
>> 0/3351      6521511936                    0         none         none
>> ---     ---
>> 0/3352      6521511936                    0         none         none
>> ---     ---
>> 0/3353      6521544704                    0         none         none
>> ---     ---
>> 0/3354      6597963776                    0         none         none
>> ---     ---
>> 0/3355      6598275072                    0         none         none
>> ---     ---
>> 0/3356      6635880448                    0         none         none
>> ---     ---
>> 0/3357      6635880448                    0         none         none
>> ---     ---
>> 0/3358      6635880448                    0         none         none
>> ---     ---
>> 0/3359      6635880448                    0         none         none
>> ---     ---
>> 0/3360      6635880448                    0         none         none
>> ---     ---
>> 0/3361      6635880448                    0         none         none
>> ---     ---
>> 0/3362      6635880448                    0         none         none
>> ---     ---
>> 0/3363      6635880448                    0         none         none
>> ---     ---
>> 0/3364      6635880448                    0         none         none
>> ---     ---
>> 0/3365      6635880448                    0         none         none
>> ---     ---
>> 0/3366      6635896832                    0         none         none
>> ---     ---
>> 0/3367     24185790464          24185790464         none         none
>> ---     ---
>> 
> 
> Nooooo!! What a wired result here!
> Qg 3345 is having minus number again, even after a qgroup rescan....
> IIRC, from the code, rescan is just passing old_roots as NULL, and use
> correct new_roots to build up "rfer" and "excl".
> So in theory it should never go below zero in rescan.
> 
> The only hope for me is, that's a orphan qgroup.(mentioned below)
> 
>> I would have expected all these qgroupids to have been trashed with 
>> the
>> snapshots, but it seems not. It reminded me of the bug you were 
>> talking
>> about, where deleted snapshots don't always clear correctly their
>> qgroup, but as these don't disappear after a rescan either... I'm a 
>> bit
>> surprised.
> 
> If you mean you "btrfs qgroup del" the subvolume, then it's known the
> qgroup won't be deleted, and won't be associated to any subvolume.
> (It's possible later created subvolume uses the old subvolid, and be
> associated to the qgroup again).
> 
> If above qgroups with 0 or even minus "excl" number are orphan, I'll
> be much relieved, as it'll be a minor orphan qgroup bug other than
> another possible qgroup rework(or at least huge review).

The only qgroup subcommand I use is qgroup show; I never deleted a 
qgroup directly using qgroup del... I guess this is not good news :(

>> I've just tried quota disable / quota enable, and not it seems OK. 
>> Just
>> wanted to let you know, in case it's not known behavior ...

There's a typo above, I meant "and *now* it seems OK".
I'm sure you corrected it yourself, I just want to be sure there's no 
possibility of misinterpretation.

> Thanks for your info a lot, which indeed expose something we didn't
> take much consideration.
> 
> And if the qgroups are the same with above description, would you mind
> to remove these qgroups?

Sure, I did a quota disable / quota enable before running the snapshot 
debug procedure, so the qgroups were clean again when I started:

qgroupid          rfer          excl     max_rfer     max_excl parent  
child
--------          ----          ----     --------     -------- ------  
-----
0/5              16384         16384         none         none ---     
---
0/1906   1657848029184 1657848029184         none         none ---     
---
0/1909    124950921216  124950921216         none         none ---     
---
0/1911   1054587293696 1054587293696         none         none ---     
---
0/3270     23727300608   23727300608         none         none ---     
---
0/3314     23221784576   23221784576         none         none ---     
---
0/3341      7479275520    7479275520         none         none ---     
---
0/3367     24185790464   24185790464         none         none ---     
---

The test is running, I expect to post the results within an hour or two.

-- 
Stéphane.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-18  7:36                                     ` Stéphane Lesimple
@ 2015-09-18 10:15                                       ` Stéphane Lesimple
  2015-09-18 10:26                                         ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-18 10:15 UTC (permalink / raw)
  To: Stéphane Lesimple; +Cc: Qu Wenruo, Qu Wenruo, linux-btrfs

On 2015-09-18 09:36, Stéphane Lesimple wrote:
> Sure, I did a quota disable / quota enable before running the snapshot
> debug procedure, so the qgroups were clean again when I started :
> 
> qgroupid          rfer          excl     max_rfer     max_excl parent  
> child
> --------          ----          ----     --------     -------- ------  
> -----
> 0/5              16384         16384         none         none ---     
> ---
> 0/1906   1657848029184 1657848029184         none         none ---     
> ---
> 0/1909    124950921216  124950921216         none         none ---     
> ---
> 0/1911   1054587293696 1054587293696         none         none ---     
> ---
> 0/3270     23727300608   23727300608         none         none ---     
> ---
> 0/3314     23221784576   23221784576         none         none ---     
> ---
> 0/3341      7479275520    7479275520         none         none ---     
> ---
> 0/3367     24185790464   24185790464         none         none ---     
> ---
> 
> The test is running, I expect to post the results within an hour or 
> two.

Well, my system crashed twice while running the procedure...
By "crashed" I mean: the machine no longer pings, and unfortunately 
nothing is logged in kern.log:

[ 7096.735731] BTRFS info (device dm-3): qgroup scan completed 
(inconsistency flag cleared)
[ 7172.614851] BTRFS info (device dm-3): qgroup scan completed 
(inconsistency flag cleared)
[ 7242.870259] BTRFS info (device dm-3): qgroup scan completed 
(inconsistency flag cleared)
[ 7321.466931] BTRFS info (device dm-3): qgroup scan completed 
(inconsistency flag cleared)
[    0.000000] Initializing cgroup subsys cpuset

The even stranger part is that the last 2 stdout dump files exist but 
are empty:

-rw-r--r-- 1 root root   21 Sep 18 10:29 snap32.step5
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap32.step6
-rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap33.step1
-rw-r--r-- 1 root root 3.3K Sep 18 10:29 snap33.step3
-rw-r--r-- 1 root root   21 Sep 18 10:30 snap33.step5
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap33.step6
-rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap34.step1
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step3 <==
-rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step5 <==

The mentioned steps are as follows:

0) Rsync data from the next ext4 "snapshot" to the subvolume
1) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
2) Create the needed readonly snapshot on btrfs
3) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==
4) Avoid doing IO if possible until step 6)
5) Do 'btrfs quota rescan -w' and save it <==
6) Do 'sync; btrfs qgroup show -prce --raw' and save the output <==

The resulting files are available here: 
http://speed47.net/tmp2/qgroup.tar.gz
Run2 is the more complete one; during run1 the machine crashed even 
faster.
It's interesting to note, however, that it seems to have crashed the 
same way and at the same step in the process.

As the machine is now, the qgroups seem OK:

~# btrfs qgroup show -pcre --raw /tank/
qgroupid          rfer          excl     max_rfer     max_excl parent child
--------          ----          ----     --------     -------- ------ -----
0/5              32768         32768         none         none --- ---
0/1906   3315696058368 3315696058368         none         none --- ---
0/1909    249901842432  249901842432         none         none --- ---
0/1911   2109174587392 2109174587392         none         none --- ---
0/3270     47454601216   47454601216         none         none --- ---
0/3314     46408499200         32768         none         none --- ---
0/3341     14991097856         32768         none         none --- ---
0/3367     48371580928   48371580928         none         none --- ---
0/5335     56523751424     280592384         none         none --- ---
0/5336     60175253504    2599960576         none         none --- ---
0/5337     45751746560     250888192         none         none --- ---
0/5338     45804650496     186531840         none         none --- ---
0/5339     45875167232     190521344         none         none --- ---
0/5340     45933486080        327680         none         none --- ---
0/5341     45933502464        344064         none         none --- ---
0/5342     46442815488      35454976         none         none --- ---
0/5343     46442520576      30638080         none         none --- ---
0/5344     46448312320      36495360         none         none --- ---
0/5345     46425235456      86204416         none         none --- ---
0/5346     46081941504     119398400         none         none --- ---
0/5347     46402715648      55615488         none         none --- ---
0/5348     46403534848      50528256         none         none --- ---
0/5349     45486301184      91463680         none         none --- ---
0/5351     46414635008        393216         none         none --- ---
0/5352     46414667776        294912         none         none --- ---
0/5353     46414667776        294912         none         none --- ---
0/5354     46406148096      24829952         none         none --- ---
0/5355     46415986688      33103872         none         none --- ---
0/5356     46406262784      23216128         none         none --- ---
0/5357     46408245248      17408000         none         none --- ---
0/5358     46416052224      25280512         none         none --- ---
0/5359     46406336512      23158784         none         none --- ---
0/5360     46408335360      25157632         none         none --- ---
0/5361     46406402048      24395776         none         none --- ---
0/5362     46415273984      32260096         none         none --- ---
0/5363     46408499200         32768         none         none --- ---
0/5364     14949441536     139812864         none         none --- ---
0/5365     14996299776     176889856         none         none --- ---
0/5366     14958616576     143065088         none         none --- ---
0/5367     14919172096     100171776         none         none --- ---
0/5368     14945968128     142409728         none         none --- ---
0/5369     14991097856         32768         none         none --- ---


But I'm pretty sure I can get that (u64)-1 value again by deleting 
snapshots. Shall I ? Or do you have something else for me to run before 
that ?

So, as a quick summary of this big thread, it seems I've been hitting 3 
bugs, all reproducible :
- kernel BUG on balance (this original thread)
- negative or zero "excl" qgroups
- hard freezes without kernel trace when playing with snapshots and 
quota

Still available to dig deeper where needed.

-- 
Stéphane.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-18 10:15                                       ` Stéphane Lesimple
@ 2015-09-18 10:26                                         ` Stéphane Lesimple
  2015-09-20  1:22                                           ` Qu Wenruo
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-18 10:26 UTC (permalink / raw)
  To: Stéphane Lesimple; +Cc: Qu Wenruo, Qu Wenruo, linux-btrfs

On 2015-09-18 12:15, Stéphane Lesimple wrote:
> On 2015-09-18 09:36, Stéphane Lesimple wrote:
>> Sure, I did a quota disable / quota enable before running the snapshot
>> debug procedure, so the qgroups were clean again when I started :
>> 
>> qgroupid          rfer          excl     max_rfer     max_excl parent  
>> child
>> --------          ----          ----     --------     -------- ------  
>> -----
>> 0/5              16384         16384         none         none ---     
>> ---
>> 0/1906   1657848029184 1657848029184         none         none ---     
>> ---
>> 0/1909    124950921216  124950921216         none         none ---     
>> ---
>> 0/1911   1054587293696 1054587293696         none         none ---     
>> ---
>> 0/3270     23727300608   23727300608         none         none ---     
>> ---
>> 0/3314     23221784576   23221784576         none         none ---     
>> ---
>> 0/3341      7479275520    7479275520         none         none ---     
>> ---
>> 0/3367     24185790464   24185790464         none         none ---     
>> ---
>> 
>> The test is running, I expect to post the results within an hour or 
>> two.
> 
> Well, my system crashed twice while running the procedure...
> By "crashed" I mean : the machine no longer pings, and nothing is
> logged in kern.log unfortunately :
> 
> [ 7096.735731] BTRFS info (device dm-3): qgroup scan completed
> (inconsistency flag cleared)
> [ 7172.614851] BTRFS info (device dm-3): qgroup scan completed
> (inconsistency flag cleared)
> [ 7242.870259] BTRFS info (device dm-3): qgroup scan completed
> (inconsistency flag cleared)
> [ 7321.466931] BTRFS info (device dm-3): qgroup scan completed
> (inconsistency flag cleared)
> [    0.000000] Initializing cgroup subsys cpuset
> 
> The even stranger part is that the last 2 stdout dump files exist but
> are empty :
> 
> -rw-r--r-- 1 root root   21 Sep 18 10:29 snap32.step5
> -rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap32.step6
> -rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap33.step1
> -rw-r--r-- 1 root root 3.3K Sep 18 10:29 snap33.step3
> -rw-r--r-- 1 root root   21 Sep 18 10:30 snap33.step5
> -rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap33.step6
> -rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap34.step1
> -rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step3 <==
> -rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step5 <==
> 
> The mentioned steps are as follows :
> 
> 0) Rsync data from the next ext4 "snapshot" to the subvolume
> 1) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==
> 2) Create the needed readonly snapshot on btrfs
> 3) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==
> 4) Avoid doing IO if possible until step 6)
> 5) Do 'btrfs quota rescan -⁠w' and save it <==
> 6) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==
> 
> The resulting files are available here: 
> http://speed47.net/tmp2/qgroup.tar.gz
> The run2 is the more complete one, during run1 the machine crashed even 
> faster.
> It's interesting to note, however, that it seems to have crashed the
> same way and at the same step in the process.

Actually, about that: I forgot that I had set up netconsole before starting 
the second run after the first "muted" crash, and it did work: even though I 
have no logs in kern.log, netconsole managed to send them to my other 
machine before it went down, so here it is :

---
[ 5738.172692] BUG: unable to handle kernel NULL pointer dereference at 
00000000000001f0
[ 5738.172702] IP: [<ffffffffc03150db>] start_transaction+0x1b/0x580 
[btrfs]
[ 5738.172719] PGD c0aa7067 PUD c0aa6067 PMD 0
[ 5738.172723] Oops: 0000 [#1] SMP
[ 5738.172726] Modules linked in: netconsole configfs xts gf128mul drbg 
ansi_cprng xt_multiport xt_comment xt_conntrack xt_nat xt_tcpudp 
nfnetlink_queue nfnetlink_log nfnetlink nf_conntrack_ftp 
nf_conntrack_sane iptable_security iptable_filter iptable_mangle 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack iptable_raw ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs 
cmac dm_crypt rfcomm bnep lockd grace sunrpc fscache binfmt_misc 
intel_rapl snd_hda_codec_realtek iosf_mbi x86_pkg_temp_thermal 
intel_powerclamp kvm_intel snd_hda_codec_generic snd_hda_intel 
snd_hda_codec kvm eeepc_wmi asus_wmi snd_hda_core btusb sparse_keymap 
btrtl snd_hwdep btbcm snd_pcm btintel 8021q bluetooth snd_seq_midi 
dm_multipath snd_seq_midi_event garp snd_rawmidi mrp snd_seq stp llc 
snd_seq_device snd_timer crct10dif_pclmul crc32_pclmul snd 
ghash_clmulni_intel cryptd serio_raw soundcore mei_me mei lpc_ich shpchp 
mac_hid parport_pc ppdev nct6775 hwmon_vid coretemp lp parport btrfs 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor 
async_tx xor raid6_pq raid0 multipath linear nbd raid1 i915 e1000e 
i2c_algo_bit drm_kms_helper syscopyarea ptp sysfillrect sysimgblt 
fb_sys_fops psmouse ahci drm libahci pps_core wmi video [last unloaded: 
netconsole]
[ 5738.172831] CPU: 1 PID: 10932 Comm: kworker/u4:14 Not tainted 
4.3.0-rc1 #1
[ 5738.172833] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005 
01/06/2014
[ 5738.172843] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper 
[btrfs]
[ 5738.172845] task: ffff8800c7010000 ti: ffff88006acf4000 task.ti: 
ffff88006acf4000
[ 5738.172847] RIP: 0010:[<ffffffffc03150db>]  [<ffffffffc03150db>] 
start_transaction+0x1b/0x580 [btrfs]
[ 5738.172855] RSP: 0018:ffff88006acf7ca8  EFLAGS: 00010282
[ 5738.172856] RAX: 0000000000000004 RBX: 0000000000000201 RCX: 
0000000000000002
[ 5738.172857] RDX: 0000000000000201 RSI: 0000000000000001 RDI: 
0000000000000000
[ 5738.172858] RBP: ffff88006acf7cf0 R08: ffff88010990eab0 R09: 
00000001801c0017
[ 5738.172860] R10: 000000000990e701 R11: ffffea0004264380 R12: 
0000000000000000
[ 5738.172861] R13: ffff8800c73a6e08 R14: ffff880027963800 R15: 
0000160000000000
[ 5738.172862] FS:  0000000000000000(0000) GS:ffff88011fb00000(0000) 
knlGS:0000000000000000
[ 5738.172863] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5738.172864] CR2: 00000000000001f0 CR3: 0000000027a65000 CR4: 
00000000000406e0
[ 5738.172866] Stack:
[ 5738.172867]  ffff8800c73a6e08 ffff880027963800 0000160000000000 
ffff88006acf7ce8
[ 5738.172871]  00000000000000be 00000000fffffffc ffff8800c73a6e08 
ffff880027963800
[ 5738.172875]  0000160000000000 ffff88006acf7d00 ffffffffc031565b 
ffff88006acf7dc0
[ 5738.172879] Call Trace:
[ 5738.172887]  [<ffffffffc031565b>] btrfs_start_transaction+0x1b/0x20 
[btrfs]
[ 5738.172896]  [<ffffffffc0378038>] 
btrfs_qgroup_rescan_worker+0x388/0x5a0 [btrfs]
[ 5738.172904]  [<ffffffffc03444e0>] normal_work_helper+0xc0/0x270 
[btrfs]
[ 5738.172912]  [<ffffffffc03448a2>] 
btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
[ 5738.172915]  [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
[ 5738.172917]  [<ffffffff8109192a>] worker_thread+0x11a/0x470
[ 5738.172919]  [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
[ 5738.172921]  [<ffffffff81097059>] kthread+0xc9/0xe0
[ 5738.172923]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
[ 5738.172926]  [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
[ 5738.172928]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
[ 5738.172929] Code: 49 c1 e9 5c ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 
1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 89 d3 48 83 
ec 20 <48> 8b 87 f0 01 00 00 48 8b 90 60 0e 00 00 83 e2 01 0f 85 86 00
[ 5738.172973] RIP  [<ffffffffc03150db>] start_transaction+0x1b/0x580 
[btrfs]
[ 5738.172981]  RSP <ffff88006acf7ca8>
[ 5738.172982] CR2: 00000000000001f0
[ 5738.172984] ---[ end trace 9feb85def1327ee9 ]---
[ 5738.173010] BUG: unable to handle kernel paging request at 
ffffffffffffffd8
[ 5738.173012] IP: [<ffffffff810977d0>] kthread_data+0x10/0x20
[ 5738.173015] PGD 1c13067 PUD 1c15067 PMD 0
[ 5738.173019] Oops: 0000 [#2] SMP
[ 5738.173021] Modules linked in: netconsole configfs xts gf128mul drbg
---

Clearly this is during a rescan.

> As the machine is now, qgroups seems OK :
> 
> ~# btrfs qgroup show -pcre --raw /tank/
> qgroupid          rfer          excl     max_rfer     max_excl parent  
> child
> --------          ----          ----     --------     -------- ------  
> -----
> 0/5              32768         32768         none         none ---     
> ---
> 0/1906   3315696058368 3315696058368         none         none ---     
> ---
> 0/1909    249901842432  249901842432         none         none ---     
> ---
> 0/1911   2109174587392 2109174587392         none         none ---     
> ---
> 0/3270     47454601216   47454601216         none         none ---     
> ---
> 0/3314     46408499200         32768         none         none ---     
> ---
> 0/3341     14991097856         32768         none         none ---     
> ---
> 0/3367     48371580928   48371580928         none         none ---     
> ---
> 0/5335     56523751424     280592384         none         none ---     
> ---
> 0/5336     60175253504    2599960576         none         none ---     
> ---
> 0/5337     45751746560     250888192         none         none ---     
> ---
> 0/5338     45804650496     186531840         none         none ---     
> ---
> 0/5339     45875167232     190521344         none         none ---     
> ---
> 0/5340     45933486080        327680         none         none ---     
> ---
> 0/5341     45933502464        344064         none         none ---     
> ---
> 0/5342     46442815488      35454976         none         none ---     
> ---
> 0/5343     46442520576      30638080         none         none ---     
> ---
> 0/5344     46448312320      36495360         none         none ---     
> ---
> 0/5345     46425235456      86204416         none         none ---     
> ---
> 0/5346     46081941504     119398400         none         none ---     
> ---
> 0/5347     46402715648      55615488         none         none ---     
> ---
> 0/5348     46403534848      50528256         none         none ---     
> ---
> 0/5349     45486301184      91463680         none         none ---     
> ---
> 0/5351     46414635008        393216         none         none ---     
> ---
> 0/5352     46414667776        294912         none         none ---     
> ---
> 0/5353     46414667776        294912         none         none ---     
> ---
> 0/5354     46406148096      24829952         none         none ---     
> ---
> 0/5355     46415986688      33103872         none         none ---     
> ---
> 0/5356     46406262784      23216128         none         none ---     
> ---
> 0/5357     46408245248      17408000         none         none ---     
> ---
> 0/5358     46416052224      25280512         none         none ---     
> ---
> 0/5359     46406336512      23158784         none         none ---     
> ---
> 0/5360     46408335360      25157632         none         none ---     
> ---
> 0/5361     46406402048      24395776         none         none ---     
> ---
> 0/5362     46415273984      32260096         none         none ---     
> ---
> 0/5363     46408499200         32768         none         none ---     
> ---
> 0/5364     14949441536     139812864         none         none ---     
> ---
> 0/5365     14996299776     176889856         none         none ---     
> ---
> 0/5366     14958616576     143065088         none         none ---     
> ---
> 0/5367     14919172096     100171776         none         none ---     
> ---
> 0/5368     14945968128     142409728         none         none ---     
> ---
> 0/5369     14991097856         32768         none         none ---     
> ---
> 
> 
> But I'm pretty sure I can get that (u64)-1 value again by deleting
> snapshots. Shall I ? Or do you have something else for me to run
> before that ?
> 
> So, as a quick summary of this big thread, it seems I've been hitting
> 3 bugs, all reproductible :
> - kernel BUG on balance (this original thread)
> - negative or zero "excl" qgroups
> - hard freezes without kernel trace when playing with snapshots and 
> quota
> 
> Still available to dig deeper where needed.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-18 10:26                                         ` Stéphane Lesimple
@ 2015-09-20  1:22                                           ` Qu Wenruo
  2015-09-20 10:35                                             ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-20  1:22 UTC (permalink / raw)
  To: Stéphane Lesimple; +Cc: Qu Wenruo, linux-btrfs



On 2015-09-18 18:26, Stéphane Lesimple wrote:
> On 2015-09-18 12:15, Stéphane Lesimple wrote:
>> On 2015-09-18 09:36, Stéphane Lesimple wrote:
>>> Sure, I did a quota disable / quota enable before running the snapshot
>>> debug procedure, so the qgroups were clean again when I started :
>>>
>>> qgroupid          rfer          excl     max_rfer     max_excl parent
>>> child
>>> --------          ----          ----     --------     -------- ------
>>> -----
>>> 0/5              16384         16384         none         none --- ---
>>> 0/1906   1657848029184 1657848029184         none         none --- ---
>>> 0/1909    124950921216  124950921216         none         none --- ---
>>> 0/1911   1054587293696 1054587293696         none         none --- ---
>>> 0/3270     23727300608   23727300608         none         none --- ---
>>> 0/3314     23221784576   23221784576         none         none --- ---
>>> 0/3341      7479275520    7479275520         none         none --- ---
>>> 0/3367     24185790464   24185790464         none         none --- ---
>>>
>>> The test is running, I expect to post the results within an hour or two.
>>
>> Well, my system crashed twice while running the procedure...
>> By "crashed" I mean : the machine no longer pings, and nothing is
>> logged in kern.log unfortunately :
>>
>> [ 7096.735731] BTRFS info (device dm-3): qgroup scan completed
>> (inconsistency flag cleared)
>> [ 7172.614851] BTRFS info (device dm-3): qgroup scan completed
>> (inconsistency flag cleared)
>> [ 7242.870259] BTRFS info (device dm-3): qgroup scan completed
>> (inconsistency flag cleared)
>> [ 7321.466931] BTRFS info (device dm-3): qgroup scan completed
>> (inconsistency flag cleared)
>> [    0.000000] Initializing cgroup subsys cpuset
>>
>> The even stranger part is that the last 2 stdout dump files exist but
>> are empty :
>>
>> -rw-r--r-- 1 root root   21 Sep 18 10:29 snap32.step5
>> -rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap32.step6
>> -rw-r--r-- 1 root root 3.2K Sep 18 10:29 snap33.step1
>> -rw-r--r-- 1 root root 3.3K Sep 18 10:29 snap33.step3
>> -rw-r--r-- 1 root root   21 Sep 18 10:30 snap33.step5
>> -rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap33.step6
>> -rw-r--r-- 1 root root 3.3K Sep 18 10:30 snap34.step1
>> -rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step3 <==
>> -rw-r--r-- 1 root root    0 Sep 18 10:30 snap34.step5 <==
>>
>> The mentioned steps are as follows :
>>
>> 0) Rsync data from the next ext4 "snapshot" to the subvolume
>> 1) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==
>> 2) Create the needed readonly snapshot on btrfs
>> 3) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==
>> 4) Avoid doing IO if possible until step 6)
>> 5) Do 'btrfs quota rescan -⁠w' and save it <==
>> 6) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==
>>
>> The resulting files are available here:
>> http://speed47.net/tmp2/qgroup.tar.gz
>> The run2 is the more complete one, during run1 the machine crashed
>> even faster.
>> It's interesting to note, however, that it seems to have crashed the
>> same way and at the same step in the process.

Your data really helps a lot!!

And the good news is, the qgroup accounting part is working as expected.
Although I only checked steps 1/3/6 of about 5 snaps, they are all OK.

I can make a script to cross check them, but from the last few results, I 
think the qgroups work fine.
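
A first-pass cross check could be as simple as this (it only diffs the step 3 
and step 6 dumps of each snapshot, assuming the file naming shown above; if 
the accounting is right, the rescan in step 5 should not change any 
rfer/excl value, so the two dumps should be identical):

for f in snap*.step3; do
    n=${f%.step3}
    # compare "right after snapshot" with "after rescan"
    diff -q "$n.step3" "$n.step6" >/dev/null && echo "$n: OK" || echo "$n: MISMATCH"
done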

I'm now fairly confident that the negative numbers are a result of deleted 
subvolumes, and the real problem is that such qgroups are not handled well 
by qgroup rescan.

I'll try to add a hotfix for such a case if needed.
But right now, I don't have a good idea for it until Mark's rescan-subtree 
work lands.

Maybe I can add a new option for btrfs-progs to automatically remove the 
qgroup and trigger a rescan?
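
In the meantime the same thing can already be done by hand; roughly (the 
qgroup id below is only a placeholder, pick the 0/xxx entries whose 
subvolume has already been deleted):

# spot the stale qgroups (bogus/negative excl, subvolume gone)
btrfs qgroup show -prce --raw /tank
# drop one of them by hand, then rescan
btrfs qgroup destroy 0/1234 /tank
btrfs quota rescan -w /tank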

> Actually about that, I forgot I did set up netconsole before starting
> the second run after the first "muted" crash, and it did work : even if
> I have no logs in kern.log, netconsole managed to send them to my other
> machine before going down, so here it is :
>
> ---
> [ 5738.172692] BUG: unable to handle kernel NULL pointer dereference at
> 00000000000001f0
> [ 5738.172702] IP: [<ffffffffc03150db>] start_transaction+0x1b/0x580
> [btrfs]
> [ 5738.172719] PGD c0aa7067 PUD c0aa6067 PMD 0
> [ 5738.172723] Oops: 0000 [#1] SMP
> [ 5738.172726] Modules linked in: netconsole configfs xts gf128mul drbg
> ansi_cprng xt_multiport xt_comment xt_conntrack xt_nat xt_tcpudp
> nfnetlink_queue nfnetlink_log nfnetlink nf_conntrack_ftp
> nf_conntrack_sane iptable_security iptable_filter iptable_mangle
> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
> nf_conntrack iptable_raw ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs
> cmac dm_crypt rfcomm bnep lockd grace sunrpc fscache binfmt_misc
> intel_rapl snd_hda_codec_realtek iosf_mbi x86_pkg_temp_thermal
> intel_powerclamp kvm_intel snd_hda_codec_generic snd_hda_intel
> snd_hda_codec kvm eeepc_wmi asus_wmi snd_hda_core btusb sparse_keymap
> btrtl snd_hwdep btbcm snd_pcm btintel 8021q bluetooth snd_seq_midi
> dm_multipath snd_seq_midi_event garp snd_rawmidi mrp snd_seq stp llc
> snd_seq_device snd_timer crct10dif_pclmul crc32_pclmul snd
> ghash_clmulni_intel cryptd serio_raw soundcore mei_me mei lpc_ich shpchp
> mac_hid parport_pc ppdev nct6775 hwmon_vid coretemp lp parport btrfs
> raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
> async_tx xor raid6_pq raid0 multipath linear nbd raid1 i915 e1000e
> i2c_algo_bit drm_kms_helper syscopyarea ptp sysfillrect sysimgblt
> fb_sys_fops psmouse ahci drm libahci pps_core wmi video [last unloaded:
> netconsole]
> [ 5738.172831] CPU: 1 PID: 10932 Comm: kworker/u4:14 Not tainted
> 4.3.0-rc1 #1
> [ 5738.172833] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
> 01/06/2014
> [ 5738.172843] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
> [btrfs]
> [ 5738.172845] task: ffff8800c7010000 ti: ffff88006acf4000 task.ti:
> ffff88006acf4000
> [ 5738.172847] RIP: 0010:[<ffffffffc03150db>]  [<ffffffffc03150db>]
> start_transaction+0x1b/0x580 [btrfs]
> [ 5738.172855] RSP: 0018:ffff88006acf7ca8  EFLAGS: 00010282
> [ 5738.172856] RAX: 0000000000000004 RBX: 0000000000000201 RCX:
> 0000000000000002
> [ 5738.172857] RDX: 0000000000000201 RSI: 0000000000000001 RDI:
> 0000000000000000
> [ 5738.172858] RBP: ffff88006acf7cf0 R08: ffff88010990eab0 R09:
> 00000001801c0017
> [ 5738.172860] R10: 000000000990e701 R11: ffffea0004264380 R12:
> 0000000000000000
> [ 5738.172861] R13: ffff8800c73a6e08 R14: ffff880027963800 R15:
> 0000160000000000
> [ 5738.172862] FS:  0000000000000000(0000) GS:ffff88011fb00000(0000)
> knlGS:0000000000000000
> [ 5738.172863] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5738.172864] CR2: 00000000000001f0 CR3: 0000000027a65000 CR4:
> 00000000000406e0
> [ 5738.172866] Stack:
> [ 5738.172867]  ffff8800c73a6e08 ffff880027963800 0000160000000000
> ffff88006acf7ce8
> [ 5738.172871]  00000000000000be 00000000fffffffc ffff8800c73a6e08
> ffff880027963800
> [ 5738.172875]  0000160000000000 ffff88006acf7d00 ffffffffc031565b
> ffff88006acf7dc0
> [ 5738.172879] Call Trace:
> [ 5738.172887]  [<ffffffffc031565b>] btrfs_start_transaction+0x1b/0x20
> [btrfs]
> [ 5738.172896]  [<ffffffffc0378038>]
> btrfs_qgroup_rescan_worker+0x388/0x5a0 [btrfs]

Your netconsole backtrace is also of great value.
It implies that my rework also caused some stupid bug (yeah, I always make 
such bugs), or that it exposed some existing rescan bug.

Would you please use gdb to show the code at 
"btrfs_qgroup_rescan_worker+0x388"?
(Needs kernel debuginfo.)

My guess is the following line (pretty sure, but not 100% sure):
------
        /*
         * only update status, since the previous part has alreay updated the
         * qgroup info.
         */
        trans = btrfs_start_transaction(fs_info->quota_root, 1); <<<<<
        if (IS_ERR(trans)) {
                err = PTR_ERR(trans);
                btrfs_err(fs_info,
                          "fail to start transaction for status update: %d\n",
                          err);
                goto done;
        }
------

But that means that, at rescan time, fs_info->quota_root is still NULL, 
which is quite weird.
I can add an extra check to avoid such a NULL pointer for now, but it's 
better to review the existing rescan workflow, as I think there is a race 
in how it initializes quota_root.

You can also try the following hotfix patch to see if it works:
http://pastebin.com/966GQXPk

My concern is, this may cause qgroup rescan to exit without updating its 
accounting info...

So I still need your help.
Or I can use your reproducer script to test it next Monday.


> [ 5738.172904]  [<ffffffffc03444e0>] normal_work_helper+0xc0/0x270 [btrfs]
> [ 5738.172912]  [<ffffffffc03448a2>]
> btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
> [ 5738.172915]  [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
> [ 5738.172917]  [<ffffffff8109192a>] worker_thread+0x11a/0x470
> [ 5738.172919]  [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
> [ 5738.172921]  [<ffffffff81097059>] kthread+0xc9/0xe0
> [ 5738.172923]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
> [ 5738.172926]  [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
> [ 5738.172928]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
> [ 5738.172929] Code: 49 c1 e9 5c ff ff ff 66 0f 1f 84 00 00 00 00 00 0f
> 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 89 d3 48 83
> ec 20 <48> 8b 87 f0 01 00 00 48 8b 90 60 0e 00 00 83 e2 01 0f 85 86 00
> [ 5738.172973] RIP  [<ffffffffc03150db>] start_transaction+0x1b/0x580
> [btrfs]
> [ 5738.172981]  RSP <ffff88006acf7ca8>
> [ 5738.172982] CR2: 00000000000001f0
> [ 5738.172984] ---[ end trace 9feb85def1327ee9 ]---
> [ 5738.173010] BUG: unable to handle kernel paging request at
> ffffffffffffffd8
> [ 5738.173012] IP: [<ffffffff810977d0>] kthread_data+0x10/0x20
> [ 5738.173015] PGD 1c13067 PUD 1c15067 PMD 0
> [ 5738.173019] Oops: 0000 [#2] SMP
> [ 5738.173021] Modules linked in: netconsole configfs xts gf128mul drbg
> ---
>
> Clearly this is during a rescan.
>
>> As the machine is now, qgroups seems OK :
>>
>> ~# btrfs qgroup show -pcre --raw /tank/
>> qgroupid          rfer          excl     max_rfer     max_excl parent
>> child
>> --------          ----          ----     --------     -------- ------
>> -----
>> 0/5              32768         32768         none         none --- ---
>> 0/1906   3315696058368 3315696058368         none         none --- ---
>> 0/1909    249901842432  249901842432         none         none --- ---
>> 0/1911   2109174587392 2109174587392         none         none --- ---
>> 0/3270     47454601216   47454601216         none         none --- ---
>> 0/3314     46408499200         32768         none         none --- ---
>> 0/3341     14991097856         32768         none         none --- ---
>> 0/3367     48371580928   48371580928         none         none --- ---
>> 0/5335     56523751424     280592384         none         none --- ---
>> 0/5336     60175253504    2599960576         none         none --- ---
>> 0/5337     45751746560     250888192         none         none --- ---
>> 0/5338     45804650496     186531840         none         none --- ---
>> 0/5339     45875167232     190521344         none         none --- ---
>> 0/5340     45933486080        327680         none         none --- ---
>> 0/5341     45933502464        344064         none         none --- ---
>> 0/5342     46442815488      35454976         none         none --- ---
>> 0/5343     46442520576      30638080         none         none --- ---
>> 0/5344     46448312320      36495360         none         none --- ---
>> 0/5345     46425235456      86204416         none         none --- ---
>> 0/5346     46081941504     119398400         none         none --- ---
>> 0/5347     46402715648      55615488         none         none --- ---
>> 0/5348     46403534848      50528256         none         none --- ---
>> 0/5349     45486301184      91463680         none         none --- ---
>> 0/5351     46414635008        393216         none         none --- ---
>> 0/5352     46414667776        294912         none         none --- ---
>> 0/5353     46414667776        294912         none         none --- ---
>> 0/5354     46406148096      24829952         none         none --- ---
>> 0/5355     46415986688      33103872         none         none --- ---
>> 0/5356     46406262784      23216128         none         none --- ---
>> 0/5357     46408245248      17408000         none         none --- ---
>> 0/5358     46416052224      25280512         none         none --- ---
>> 0/5359     46406336512      23158784         none         none --- ---
>> 0/5360     46408335360      25157632         none         none --- ---
>> 0/5361     46406402048      24395776         none         none --- ---
>> 0/5362     46415273984      32260096         none         none --- ---
>> 0/5363     46408499200         32768         none         none --- ---
>> 0/5364     14949441536     139812864         none         none --- ---
>> 0/5365     14996299776     176889856         none         none --- ---
>> 0/5366     14958616576     143065088         none         none --- ---
>> 0/5367     14919172096     100171776         none         none --- ---
>> 0/5368     14945968128     142409728         none         none --- ---
>> 0/5369     14991097856         32768         none         none --- ---
>>
>>
>> But I'm pretty sure I can get that (u64)-1 value again by deleting
>> snapshots. Shall I ? Or do you have something else for me to run
>> before that ?

You have already done a great job in helping to mature qgroups.
The negative numbers and 0 excl are somewhat expected for deleted snapshots.

Good news is, 1) it doesn't affect valid (non-orphan) qgroups, and
2) Mark is already working on it.

I'll try to add a btrfs-progs hotfix for you to delete and rescan 
qgroups to avoid such problems.

>>
>> So, as a quick summary of this big thread, it seems I've been hitting
>> 3 bugs, all reproductible :
>> - kernel BUG on balance (this original thread)

For this, I can't provide much help, as the extent backref bug is quite hard 
to debug, unless a developer gets interested in it and finds a stable way 
to reproduce it.

The other two are explained or have the hotfix mentioned above.

Thanks,
Qu

>> - negative or zero "excl" qgroups
>> - hard freezes without kernel trace when playing with snapshots and quota
>>
>> Still available to dig deeper where needed.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-20  1:22                                           ` Qu Wenruo
@ 2015-09-20 10:35                                             ` Stéphane Lesimple
  2015-09-20 10:51                                               ` Qu Wenruo
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-20 10:35 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On 2015-09-20 03:22, Qu Wenruo wrote:
>>> The mentioned steps are as follows :
>>> 
>>> 0) Rsync data from the next ext4 "snapshot" to the subvolume
>>> 1) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output 
>>> <==
>>> 2) Create the needed readonly snapshot on btrfs
>>> 3) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output 
>>> <==
>>> 4) Avoid doing IO if possible until step 6)
>>> 5) Do 'btrfs quota rescan -⁠w' and save it <==
>>> 6) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output 
>>> <==
>>> 
>>> The resulting files are available here:
>>> http://speed47.net/tmp2/qgroup.tar.gz
>>> The run2 is the more complete one, during run1 the machine crashed
>>> even faster.
>>> It's interesting to note, however, that it seems to have crashed the
>>> same way and at the same step in the process.
> 
> Your data really helps a lot!!
> 
> And the good news is, the qgroup accouting part is working as expected.
> Although I only checked about 1/3/6 of about 5 snaps, they are all OK.
> 
> I can make a script to cross check them, but from the last few result,
> I think qgroup works fine.
> 
> I'm more confident about the minus number, which should be a result of
> deleted subvolume, and the real problem is, such qgroup is not handled
> well with qgroup rescan.

I agree with your analysis, this matches what I observed.

> I'll try to add a hot fix for such case if needed.
> But right now, I don't have a good idea for it until Mark's work of
> rescan subtree.
> 
> Maybe I can add a new option for btrfs-progs to automatically remove
> the qgroup and trigger a rescan?

Until this is properly fixed in the kernel code (and it's good news to know 
that you and Mark are working on it), this would be a good workaround, yes!

>> [ 5738.172879] Call Trace:
>> [ 5738.172887]  [<ffffffffc031565b>] btrfs_start_transaction+0x1b/0x20
>> [btrfs]
>> [ 5738.172896]  [<ffffffffc0378038>]
>> btrfs_qgroup_rescan_worker+0x388/0x5a0 [btrfs]
> 
> Your netconsole backtrace is also of greate value.
> This one implies that, my rework also caused some stupid bug.
> (Yeah, I always make such bugs) or some existing unexposed rescan bug.
> 
> Would you please use gdb to show the codes of
> "btrfs_qgroup_rescan_worker+0x388" ?
> (Need kernel debuginfo)
> 
> My guess is the following line:(pretty sure, but not 100% sure)
> ------
> /*
>          * only update status, since the previous part has alreay 
> updated the
>          * qgroup info.
>          */
>         trans = btrfs_start_transaction(fs_info->quota_root, 1); <<<<<
>         if (IS_ERR(trans)) {
>                 err = PTR_ERR(trans);
>                 btrfs_err(fs_info,
>                           "fail to start transaction for status update: 
> %d\n",
>                           err);
>                 goto done;
>         }
> ------

The kernel and modules were already compiled with debuginfo.
However, for some reason, I couldn't get the gdb disassembly of /proc/kcore 
properly aligned with the source I compiled: the asm code doesn't match the 
C code shown by gdb. In any case, looking at the source of this function, 
this is the only place btrfs_start_transaction is called, so we can be 100% 
sure that's indeed where the crash happens.

> But that means, at rescan time, fs_info->quota_root is still NULL,
> which is quite wired.
> I can add extra check to avoid such NULL pointer for now, but it's
> better to review the existing rescan workflow, as I think there is
> some race for it to init quota_root.
> 
> You can also try the following hotfix patch to see if it works:
> http://pastebin.com/966GQXPk
> 
> My concern is, this may cause qgroup rescan to exit without updating
> its accounting info...
> 
> So still need your help.
> Or I can use your reproducer script to test it next Monday.

Compiling with your patch now, just amended with a little printk to know 
whether the execution flow enters the added if condition. Will let you know 
about the results.

>>> But I'm pretty sure I can get that (u64)-1 value again by deleting
>>> snapshots. Shall I ? Or do you have something else for me to run
>>> before that ?
> 
> You have already done a great job in helping maturing qgroups.
> The minus number and 0 excl is somewhat expected for deleted snapshots.
> 
> Good news is, 1) it doesn't affect valid(non-orphan) qgroup.
> 2) Mark is already working on it.
> 
> I'll try to add a btrfs-progs hotfix for you to delete and rescan
> qgroups to avoid such problem.

That would be good !

>>> So, as a quick summary of this big thread, it seems I've been hitting
>>> 3 bugs, all reproductible :
>>> - kernel BUG on balance (this original thread)
> 
> For this, I can't provide much help, as extent backref bug is quite
> hard to debug, unless a developer is interested in it and find a
> stable way to reproduce it.

Yes, unfortunately, as it looks so much like a race condition, I know I can 
reproduce it with my workflow, but it can take between 1 minute and 12 
hours, so I wouldn't call it a "stable way" to reproduce it :(

Still if any dev is interested in it, I can reproduce it, with a patched
kernel if needed.

> The rest two are explained or have hot fix mentioned above.

And thanks for that, will keep you posted.

-- 
Stéphane.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-20 10:35                                             ` Stéphane Lesimple
@ 2015-09-20 10:51                                               ` Qu Wenruo
  2015-09-20 11:14                                                 ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-20 10:51 UTC (permalink / raw)
  To: Stéphane Lesimple; +Cc: Qu Wenruo, linux-btrfs



On 2015-09-20 18:35, Stéphane Lesimple wrote:
> On 2015-09-20 03:22, Qu Wenruo wrote:
>>>> The mentioned steps are as follows :
>>>>
>>>> 0) Rsync data from the next ext4 "snapshot" to the subvolume
>>>> 1) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==
>>>> 2) Create the needed readonly snapshot on btrfs
>>>> 3) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==
>>>> 4) Avoid doing IO if possible until step 6)
>>>> 5) Do 'btrfs quota rescan -⁠w' and save it <==
>>>> 6) Do 'sync; btrfs qgroup show -⁠prce -⁠-⁠raw' and save the output <==
>>>>
>>>> The resulting files are available here:
>>>> http://speed47.net/tmp2/qgroup.tar.gz
>>>> The run2 is the more complete one, during run1 the machine crashed
>>>> even faster.
>>>> It's interesting to note, however, that it seems to have crashed the
>>>> same way and at the same step in the process.
>>
>> Your data really helps a lot!!
>>
>> And the good news is, the qgroup accouting part is working as expected.
>> Although I only checked about 1/3/6 of about 5 snaps, they are all OK.
>>
>> I can make a script to cross check them, but from the last few result,
>> I think qgroup works fine.
>>
>> I'm more confident about the minus number, which should be a result of
>> deleted subvolume, and the real problem is, such qgroup is not handled
>> well with qgroup rescan.
>
> I agree with your analysis, this matches what I observed.
>
>> I'll try to add a hot fix for such case if needed.
>> But right now, I don't have a good idea for it until Mark's work of
>> rescan subtree.
>>
>> Maybe I can add a new option for btrfs-progs to automatically remove
>> the qgroup and trigger a rescan?
>
> Until this is properly fixed in the kernel code, and this is good news to
> know Mark and you are working on it, this would be a good workaround yes!
>
>>> [ 5738.172879] Call Trace:
>>> [ 5738.172887]  [<ffffffffc031565b>] btrfs_start_transaction+0x1b/0x20
>>> [btrfs]
>>> [ 5738.172896]  [<ffffffffc0378038>]
>>> btrfs_qgroup_rescan_worker+0x388/0x5a0 [btrfs]
>>
>> Your netconsole backtrace is also of greate value.
>> This one implies that, my rework also caused some stupid bug.
>> (Yeah, I always make such bugs) or some existing unexposed rescan bug.
>>
>> Would you please use gdb to show the codes of
>> "btrfs_qgroup_rescan_worker+0x388" ?
>> (Need kernel debuginfo)
>>
>> My guess is the following line:(pretty sure, but not 100% sure)
>> ------
>> /*
>>          * only update status, since the previous part has alreay
>> updated the
>>          * qgroup info.
>>          */
>>         trans = btrfs_start_transaction(fs_info->quota_root, 1); <<<<<
>>         if (IS_ERR(trans)) {
>>                 err = PTR_ERR(trans);
>>                 btrfs_err(fs_info,
>>                           "fail to start transaction for status
>> update: %d\n",
>>                           err);
>>                 goto done;
>>         }
>> ------
>
> The kernel and modules were already compiled with debuginfo.
> However for some reason, I couldn't get gdb disassembly of /proc/kcore
> properly
> aligned with the source I compiled: the asm code doesn't match the C
> code shown
> by gdb. In any case, watching the source of this function, this is the
> only place
> btrfs_start_transaction is called, so we can be 100% sure it's where the
> crash
> happens indeed.

Yep, that's the only caller.

Here is a small but useful hint to locate the code, if you are interested 
in kernel development.

# Not sure whether Ubuntu gzips its modules; at least Arch does
# compress them
$ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
$ gunzip /tmp/btrfs.ko.gz
$ gdb /tmp/btrfs.ko
# Make sure gdb reads all the needed debuginfo, then inside gdb:
(gdb) list *(btrfs_qgroup_rescan_worker+0x388)

And gdb will find the code position for you.
Quite an easy one; only the backtrace info is needed.

Another hint is about how to collect kernel crash info.
Your netconsole setup is definitely one good practice.

Another one I use to collect crash info is kdump.
Ubuntu should have a good wiki on it.
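
For reference, a minimal netconsole setup is just something like this (the 
addresses, interface and MAC below are made up):

# on the box under test: stream kernel messages to 192.168.0.2:6666 over UDP
modprobe netconsole netconsole=6665@192.168.0.1/eth0,6666@192.168.0.2/aa:bb:cc:dd:ee:ff
# on the receiving box: capture whatever arrives before the crash
nc -l -u 6666 | tee netconsole.log   # some netcat flavours want 'nc -u -l -p 6666'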

>
>> But that means, at rescan time, fs_info->quota_root is still NULL,
>> which is quite wired.
>> I can add extra check to avoid such NULL pointer for now, but it's
>> better to review the existing rescan workflow, as I think there is
>> some race for it to init quota_root.
>>
>> You can also try the following hotfix patch to see if it works:
>> http://pastebin.com/966GQXPk
>>
>> My concern is, this may cause qgroup rescan to exit without updating
>> its accounting info...
>>
>> So still need your help.
>> Or I can use your reproducer script to test it next Monday.
>
> Compiling with your patch, just amended of a little printk to know if
> the execution
> flow enters the added if condition. Will let you know about the results.
>
>>>> But I'm pretty sure I can get that (u64)-1 value again by deleting
>>>> snapshots. Shall I ? Or do you have something else for me to run
>>>> before that ?
>>
>> You have already done a great job in helping maturing qgroups.
>> The minus number and 0 excl is somewhat expected for deleted snapshots.
>>
>> Good news is, 1) it doesn't affect valid(non-orphan) qgroup.
>> 2) Mark is already working on it.
>>
>> I'll try to add a btrfs-progs hotfix for you to delete and rescan
>> qgroups to avoid such problem.
>
> That would be good !
>
>>>> So, as a quick summary of this big thread, it seems I've been hitting
>>>> 3 bugs, all reproductible :
>>>> - kernel BUG on balance (this original thread)
>>
>> For this, I can't provide much help, as extent backref bug is quite
>> hard to debug, unless a developer is interested in it and find a
>> stable way to reproduce it.
>
> Yes, unfortunately as it looks so much like a race condition, I know I can
> reproduce it with my worflow, but it can take between 1 minute and 12
> hours,
> so I wouldn't call it a "stable way" to reproduce it unfortunately :(
>
> Still if any dev is interested in it, I can reproduce it, with a patched
> kernel if needed.

Maybe you are already doing it, but you can compile only the btrfs module, 
which is far faster than compiling the whole kernel, as long as the compiled 
module can be loaded.
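
From the already-configured source tree that the running kernel was built 
from, that is roughly (the path is just an example):

cd ~/src/linux-4.3-rc1
make modules_prepare       # only needed once for a freshly configured tree
make M=fs/btrfs modules    # rebuilds only fs/btrfs/btrfs.ko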

Thanks,
Qu
>
>> The rest two are explained or have hot fix mentioned above.
>
> And thanks for that, will keep you posted.
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-20 10:51                                               ` Qu Wenruo
@ 2015-09-20 11:14                                                 ` Stéphane Lesimple
  2015-09-22  1:30                                                   ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-20 11:14 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On 2015-09-20 12:51, Qu Wenruo wrote:
>>> Would you please use gdb to show the codes of
>>> "btrfs_qgroup_rescan_worker+0x388" ?
>>> (Need kernel debuginfo)
>>> 
>>> My guess is the following line:(pretty sure, but not 100% sure)
>>> ------
>>> /*
>>>          * only update status, since the previous part has alreay
>>> updated the
>>>          * qgroup info.
>>>          */
>>>         trans = btrfs_start_transaction(fs_info->quota_root, 1); 
>>> <<<<<
>>>         if (IS_ERR(trans)) {
>>>                 err = PTR_ERR(trans);
>>>                 btrfs_err(fs_info,
>>>                           "fail to start transaction for status
>>> update: %d\n",
>>>                           err);
>>>                 goto done;
>>>         }
>>> ------
>> 
>> The kernel and modules were already compiled with debuginfo.
>> However for some reason, I couldn't get gdb disassembly of /proc/kcore
>> properly
>> aligned with the source I compiled: the asm code doesn't match the C
>> code shown
>> by gdb. In any case, watching the source of this function, this is the
>> only place
>> btrfs_start_transaction is called, so we can be 100% sure it's where 
>> the
>> crash
>> happens indeed.
> 
> Yep, that's the only caller.
> 
> Here is some useful small hint to locate the code, if you are
> interestied in kernel development.
> 
> # Not sure about whether ubuntu gzipped modules, at least Arch does
> # compress it
> $ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
> $ gunzip /tmp/btrfs.ko.gz
> $ gdb /tmp/btrfs.ko
> # Make sure gdb read all the needed debuginfo
> $ gdb list *(btrfs_qgroup_rescan_worker+0x388)
> 
> And gdb will find the code position for you.
> Quite easy one, only backtrace info is needed.

Ah, thanks for the tips. I was loading the whole vmlinux and using 
/proc/kcore as the core file, then adding the module with 
"add-symbol-file". But as we're just looking for the code and not the 
variables, that was indeed complete overkill.

(gdb) list *(btrfs_qgroup_rescan_worker+0x388)
0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
2323
2324            /*
2325             * only update status, since the previous part has alreay updated the
2326             * qgroup info.
2327             */
2328            trans = btrfs_start_transaction(fs_info->quota_root, 1);
2329            if (IS_ERR(trans)) {
2330                    err = PTR_ERR(trans);
2331                    btrfs_err(fs_info,
2332                              "fail to start transaction for status update: %d\n",

So this just confirms what we were already 99% sure of.

> Another hint is about how to collect the kernel crash info.
> Your netconsole setup would be definitely one good practice.
> 
> Another one I use to collect crash info is kdump.
> Ubuntu should have a good wiki on it.

I've already come across kdump a few times, but never really looked into it.
To debug the other, more complicated extent backref bug, it could be of 
some use.

>>>>> So, as a quick summary of this big thread, it seems I've been 
>>>>> hitting
>>>>> 3 bugs, all reproductible :
>>>>> - kernel BUG on balance (this original thread)
>>> 
>>> For this, I can't provide much help, as extent backref bug is quite
>>> hard to debug, unless a developer is interested in it and find a
>>> stable way to reproduce it.
>> 
>> Yes, unfortunately as it looks so much like a race condition, I know I 
>> can
>> reproduce it with my worflow, but it can take between 1 minute and 12
>> hours,
>> so I wouldn't call it a "stable way" to reproduce it unfortunately :(
>> 
>> Still if any dev is interested in it, I can reproduce it, with a 
>> patched
>> kernel if needed.
> 
> Maybe you are already doing it, you can only compile the btrfs
> modules, which will be far more faster than compile the whole kernel,
> if and only if the compiled module can be loaded.

Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so I'll try 
to load the modified module and see if the running kernel accepts it. I 
have to rmmod the loaded module first, hence unmounting any btrfs fs before 
that. Should be able to do it in a couple of hours.
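
The swap itself should be nothing more than something like this (assuming 
/tank is my only btrfs mount and has an fstab entry; the module path is 
just where my build tree happens to live):

umount /tank                                   # btrfs.ko can't be unloaded while a fs is mounted
rmmod btrfs
insmod ~/src/linux-4.3-rc1/fs/btrfs/btrfs.ko   # the freshly patched module
mount /tank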

I'll delete all my snapshots again and run my script. It should be easy to 
trigger the (hopefully worked-around) bug again.

Regards,

-- 
Stéphane.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-20 11:14                                                 ` Stéphane Lesimple
@ 2015-09-22  1:30                                                   ` Stéphane Lesimple
  2015-09-22  1:37                                                     ` Qu Wenruo
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-22  1:30 UTC (permalink / raw)
  To: Stéphane Lesimple; +Cc: Qu Wenruo, Qu Wenruo, linux-btrfs

On 2015-09-20 13:14, Stéphane Lesimple wrote:
> On 2015-09-20 12:51, Qu Wenruo wrote:
>>>> Would you please use gdb to show the codes of
>>>> "btrfs_qgroup_rescan_worker+0x388" ?
>>>> (Need kernel debuginfo)
>>>> 
>>>> My guess is the following line:(pretty sure, but not 100% sure)
>>>> ------
>>>> /*
>>>>          * only update status, since the previous part has alreay
>>>> updated the
>>>>          * qgroup info.
>>>>          */
>>>>         trans = btrfs_start_transaction(fs_info->quota_root, 1); 
>>>> <<<<<
>>>>         if (IS_ERR(trans)) {
>>>>                 err = PTR_ERR(trans);
>>>>                 btrfs_err(fs_info,
>>>>                           "fail to start transaction for status
>>>> update: %d\n",
>>>>                           err);
>>>>                 goto done;
>>>>         }
>>>> ------
>>> 
>>> The kernel and modules were already compiled with debuginfo.
>>> However for some reason, I couldn't get gdb disassembly of 
>>> /proc/kcore
>>> properly
>>> aligned with the source I compiled: the asm code doesn't match the C
>>> code shown
>>> by gdb. In any case, watching the source of this function, this is 
>>> the
>>> only place
>>> btrfs_start_transaction is called, so we can be 100% sure it's where 
>>> the
>>> crash
>>> happens indeed.
>> 
>> Yep, that's the only caller.
>> 
>> Here is some useful small hint to locate the code, if you are
>> interestied in kernel development.
>> 
>> # Not sure about whether ubuntu gzipped modules, at least Arch does
>> # compress it
>> $ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
>> $ gunzip /tmp/btrfs.ko.gz
>> $ gdb /tmp/btrfs.ko
>> # Make sure gdb read all the needed debuginfo
>> $ gdb list *(btrfs_qgroup_rescan_worker+0x388)
>> 
>> And gdb will find the code position for you.
>> Quite easy one, only backtrace info is needed.
> 
> Ah, thanks for the tips, I was loading whole vmlinux and using 
> /proc/kcore
> as the core info, then adding the module with "add-symbol-file". But as
> we're just looking for the code and not the variables, it was indeed
> completely overkill.
> 
> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
> 0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
> 2323
> 2324            /*
> 2325             * only update status, since the previous part has
> alreay updated the
> 2326             * qgroup info.
> 2327             */
> 2328            trans = btrfs_start_transaction(fs_info->quota_root, 
> 1);
> 2329            if (IS_ERR(trans)) {
> 2330                    err = PTR_ERR(trans);
> 2331                    btrfs_err(fs_info,
> 2332                              "fail to start transaction for
> status update: %d\n",
> 
> So this just confirms what we were already 99% sure of.
> 
>> Another hint is about how to collect the kernel crash info.
>> Your netconsole setup would be definitely one good practice.
>> 
>> Another one I use to collect crash info is kdump.
>> Ubuntu should have a good wiki on it.
> 
> I've already come across kdump a few times, but never really look into 
> it.
> To debug the other complicated extend backref bug, it could be of some 
> use.
> 
>>>>>> So, as a quick summary of this big thread, it seems I've been 
>>>>>> hitting
>>>>>> 3 bugs, all reproductible :
>>>>>> - kernel BUG on balance (this original thread)
>>>> 
>>>> For this, I can't provide much help, as extent backref bug is quite
>>>> hard to debug, unless a developer is interested in it and find a
>>>> stable way to reproduce it.
>>> 
>>> Yes, unfortunately as it looks so much like a race condition, I know 
>>> I can
>>> reproduce it with my worflow, but it can take between 1 minute and 12
>>> hours,
>>> so I wouldn't call it a "stable way" to reproduce it unfortunately :(
>>> 
>>> Still if any dev is interested in it, I can reproduce it, with a 
>>> patched
>>> kernel if needed.
>> 
>> Maybe you are already doing it, you can only compile the btrfs
>> modules, which will be far more faster than compile the whole kernel,
>> if and only if the compiled module can be loaded.
> 
> Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so I'll 
> try to
> load the modified module and see if the running kernel accepts it. I
> have to rmmod
> the loaded module first, hence umounting any btrfs fs before that.
> Should be able
> to do it in a couple hours.
> 
> I'll delete again all my snapshots and run my script. Should be easy to 
> trigger
> the (hopefully worked-around) bug again.

Well, I didn't trigger this exact bug, but another one, no less severe 
though, as it also crashed the system:
[92098.841309] general protection fault: 0000 [#1] SMP
[92098.841338] Modules linked in: ...
[92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted 
4.3.0-rc1 #1
[92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005 
01/06/2014
[92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper 
[btrfs]
[92098.841889] task: ffff8800b6cc4100 ti: ffff8800a3dc8000 task.ti: 
ffff8800a3dc8000
[92098.841910] RIP: 0010:[<ffffffff813ae6c6>]  [<ffffffff813ae6c6>] 
memcpy_erms+0x6/0x10
[92098.841935] RSP: 0018:ffff8800a3dcbcc8  EFLAGS: 00010207
[92098.841950] RAX: ffff8800a3dcbd67 RBX: 0000000000000009 RCX: 
0000000000000009
[92098.841970] RDX: 0000000000000009 RSI: 0005080000000000 RDI: 
ffff8800a3dcbd67
[92098.841989] RBP: ffff8800a3dcbd00 R08: 0000000000019c60 R09: 
ffff88011fb19c60
[92098.842009] R10: ffffea0003006480 R11: 0000000001000000 R12: 
ffff8800b76c32c0
[92098.842028] R13: 0000160000000000 R14: ffff8800a3dcbd70 R15: 
0000000000000009
[92098.842048] FS:  0000000000000000(0000) GS:ffff88011fb00000(0000) 
knlGS:0000000000000000
[92098.842070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[92098.842086] CR2: 00007fe1f2bd8000 CR3: 0000000001c10000 CR4: 
00000000000406e0
[92098.842105] Stack:
[92098.842111]  ffffffffc035a5d8 ffffffffc0396d00 000000000000028b 
0000000000000000
[92098.842212]  0000cc6c00000000 ffff8800b76c3200 0000160000000000 
ffff8800a3dcbdc0
[92098.842237]  ffffffffc039af3d ffff8800c7196dc8 ffff8800c7196e08 
ffff8800c7196da0
[92098.842261] Call Trace:
[92098.842277]  [<ffffffffc035a5d8>] ? read_extent_buffer+0xb8/0x110 
[btrfs]
[92098.842304]  [<ffffffffc0396d00>] ? btrfs_find_all_roots+0x60/0x70 
[btrfs]
[92098.842329]  [<ffffffffc039af3d>] 
btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
[92098.842351]  [<ffffffff810a1a0d>] ? 
ttwu_do_activate.constprop.90+0x5d/0x70
[92098.842377]  [<ffffffffc03674e0>] normal_work_helper+0xc0/0x270 
[btrfs]
[92098.842401]  [<ffffffffc03678a2>] 
btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
[92098.842421]  [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
[92098.842438]  [<ffffffff8109192a>] worker_thread+0x11a/0x470
[92098.842454]  [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
[92098.842471]  [<ffffffff81097059>] kthread+0xc9/0xe0
[92098.842485]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
[92098.842502]  [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
[92098.842517]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
[92098.842532] Code: ff eb eb 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 
c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 
89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
[92098.842658] RIP  [<ffffffff813ae6c6>] memcpy_erms+0x6/0x10
[92098.842675]  RSP <ffff8800a3dcbcc8>
[92098.849594] ---[ end trace 9d5fb7931a3ec713 ]---

I would definitely say that rescans should be avoided on current kernels, 
as the possibility that they'll bring the system down shouldn't be ignored.
It confirms that this code really needs a rewrite !
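
For anyone else hitting this, the safest stopgap is probably to check for 
and avoid rescans altogether until this is fixed, e.g.:

btrfs quota rescan -s /tank    # shows whether a rescan is currently running
btrfs quota disable /tank      # or simply keep quotas off for now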

Regards,

-- 
Stéphane.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-22  1:30                                                   ` Stéphane Lesimple
@ 2015-09-22  1:37                                                     ` Qu Wenruo
  2015-09-22  7:34                                                       ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-22  1:37 UTC (permalink / raw)
  To: Stéphane Lesimple; +Cc: Qu Wenruo, linux-btrfs



Stéphane Lesimple wrote on 2015/09/22 03:30 +0200:
> On 2015-09-20 13:14, Stéphane Lesimple wrote:
>> On 2015-09-20 12:51, Qu Wenruo wrote:
>>>>> Would you please use gdb to show the codes of
>>>>> "btrfs_qgroup_rescan_worker+0x388" ?
>>>>> (Need kernel debuginfo)
>>>>>
>>>>> My guess is the following line:(pretty sure, but not 100% sure)
>>>>> ------
>>>>> /*
>>>>>          * only update status, since the previous part has alreay
>>>>> updated the
>>>>>          * qgroup info.
>>>>>          */
>>>>>         trans = btrfs_start_transaction(fs_info->quota_root, 1); <<<<<
>>>>>         if (IS_ERR(trans)) {
>>>>>                 err = PTR_ERR(trans);
>>>>>                 btrfs_err(fs_info,
>>>>>                           "fail to start transaction for status
>>>>> update: %d\n",
>>>>>                           err);
>>>>>                 goto done;
>>>>>         }
>>>>> ------
>>>>
>>>> The kernel and modules were already compiled with debuginfo.
>>>> However for some reason, I couldn't get gdb disassembly of /proc/kcore
>>>> properly
>>>> aligned with the source I compiled: the asm code doesn't match the C
>>>> code shown
>>>> by gdb. In any case, watching the source of this function, this is the
>>>> only place
>>>> btrfs_start_transaction is called, so we can be 100% sure it's where
>>>> the
>>>> crash
>>>> happens indeed.
>>>
>>> Yep, that's the only caller.
>>>
>>> Here is some useful small hint to locate the code, if you are
>>> interestied in kernel development.
>>>
>>> # Not sure about whether ubuntu gzipped modules, at least Arch does
>>> # compress it
>>> $ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
>>> $ gunzip /tmp/btrfs.ko.gz
>>> $ gdb /tmp/btrfs.ko
>>> # Make sure gdb read all the needed debuginfo
>>> $ gdb list *(btrfs_qgroup_rescan_worker+0x388)
>>>
>>> And gdb will find the code position for you.
>>> Quite easy one, only backtrace info is needed.
>>
>> Ah, thanks for the tips, I was loading whole vmlinux and using
>> /proc/kcore
>> as the core info, then adding the module with "add-symbol-file". But as
>> we're just looking for the code and not the variables, it was indeed
>> completely overkill.
>>
>> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
>> 0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
>> 2323
>> 2324            /*
>> 2325             * only update status, since the previous part has
>> alreay updated the
>> 2326             * qgroup info.
>> 2327             */
>> 2328            trans = btrfs_start_transaction(fs_info->quota_root, 1);
>> 2329            if (IS_ERR(trans)) {
>> 2330                    err = PTR_ERR(trans);
>> 2331                    btrfs_err(fs_info,
>> 2332                              "fail to start transaction for
>> status update: %d\n",
>>
>> So this just confirms what we were already 99% sure of.
>>
>>> Another hint is about how to collect the kernel crash info.
>>> Your netconsole setup would be definitely one good practice.
>>>
>>> Another one I use to collect crash info is kdump.
>>> Ubuntu should have a good wiki on it.
>>
>> I've already come across kdump a few times, but never really look into
>> it.
>> To debug the other complicated extend backref bug, it could be of some
>> use.
>>
>>>>>>> So, as a quick summary of this big thread, it seems I've been
>>>>>>> hitting
>>>>>>> 3 bugs, all reproductible :
>>>>>>> - kernel BUG on balance (this original thread)
>>>>>
>>>>> For this, I can't provide much help, as extent backref bug is quite
>>>>> hard to debug, unless a developer is interested in it and find a
>>>>> stable way to reproduce it.
>>>>
>>>> Yes, unfortunately as it looks so much like a race condition, I know
>>>> I can
>>>> reproduce it with my worflow, but it can take between 1 minute and 12
>>>> hours,
>>>> so I wouldn't call it a "stable way" to reproduce it unfortunately :(
>>>>
>>>> Still if any dev is interested in it, I can reproduce it, with a
>>>> patched
>>>> kernel if needed.
>>>
>>> Maybe you are already doing it, you can only compile the btrfs
>>> modules, which will be far more faster than compile the whole kernel,
>>> if and only if the compiled module can be loaded.
>>
>> Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so
>> I'll try to
>> load the modified module and see if the running kernel accepts it. I
>> have to rmmod
>> the loaded module first, hence umounting any btrfs fs before that.
>> Should be able
>> to do it in a couple hours.
>>
>> I'll delete again all my snapshots and run my script. Should be easy
>> to trigger
>> the (hopefully worked-around) bug again.
>
> Well, I didn't trigger this exact bug, but another one, not less severe
> though, as it also crashed the system:
>
> [92098.841309] general protection fault: 0000 [#1] SMP
> [92098.841338] Modules linked in: ...
> [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
> 4.3.0-rc1 #1
> [92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
> 01/06/2014
> [92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
> [btrfs]
> [92098.841889] task: ffff8800b6cc4100 ti: ffff8800a3dc8000 task.ti:
> ffff8800a3dc8000
> [92098.841910] RIP: 0010:[<ffffffff813ae6c6>]  [<ffffffff813ae6c6>]
> memcpy_erms+0x6/0x10
> [92098.841935] RSP: 0018:ffff8800a3dcbcc8  EFLAGS: 00010207
> [92098.841950] RAX: ffff8800a3dcbd67 RBX: 0000000000000009 RCX:
> 0000000000000009
> [92098.841970] RDX: 0000000000000009 RSI: 0005080000000000 RDI:
> ffff8800a3dcbd67
> [92098.841989] RBP: ffff8800a3dcbd00 R08: 0000000000019c60 R09:
> ffff88011fb19c60
> [92098.842009] R10: ffffea0003006480 R11: 0000000001000000 R12:
> ffff8800b76c32c0
> [92098.842028] R13: 0000160000000000 R14: ffff8800a3dcbd70 R15:
> 0000000000000009
> [92098.842048] FS:  0000000000000000(0000) GS:ffff88011fb00000(0000)
> knlGS:0000000000000000
> [92098.842070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [92098.842086] CR2: 00007fe1f2bd8000 CR3: 0000000001c10000 CR4:
> 00000000000406e0
> [92098.842105] Stack:
> [92098.842111]  ffffffffc035a5d8 ffffffffc0396d00 000000000000028b
> 0000000000000000
> [92098.842212]  0000cc6c00000000 ffff8800b76c3200 0000160000000000
> ffff8800a3dcbdc0
> [92098.842237]  ffffffffc039af3d ffff8800c7196dc8 ffff8800c7196e08
> ffff8800c7196da0
> [92098.842261] Call Trace:
> [92098.842277]  [<ffffffffc035a5d8>] ? read_extent_buffer+0xb8/0x110
> [btrfs]
> [92098.842304]  [<ffffffffc0396d00>] ? btrfs_find_all_roots+0x60/0x70
> [btrfs]
> [92098.842329]  [<ffffffffc039af3d>]
> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]

Would you please show the code for it?
This one seems to be another stupid bug I made when rewriting the framework.
Maybe I forgot to reinit some variable, or I'm corrupting memory somewhere...

Thanks,
Qu

> [92098.842351]  [<ffffffff810a1a0d>] ?
> ttwu_do_activate.constprop.90+0x5d/0x70
> [92098.842377]  [<ffffffffc03674e0>] normal_work_helper+0xc0/0x270 [btrfs]
> [92098.842401]  [<ffffffffc03678a2>]
> btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
> [92098.842421]  [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
> [92098.842438]  [<ffffffff8109192a>] worker_thread+0x11a/0x470
> [92098.842454]  [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
> [92098.842471]  [<ffffffff81097059>] kthread+0xc9/0xe0
> [92098.842485]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
> [92098.842502]  [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
> [92098.842517]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
> [92098.842532] Code: ff eb eb 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48
> c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48
> 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
> [92098.842658] RIP  [<ffffffff813ae6c6>] memcpy_erms+0x6/0x10
> [92098.842675]  RSP <ffff8800a3dcbcc8>
> [92098.849594] ---[ end trace 9d5fb7931a3ec713 ]---
>
> I would definitely say that rescans should be avoided on current kernels
> as the possibility that it'll bring the system down shouldn't be ignored.
> It confirms that this code really needs a rewrite !
>
> Regards,
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-22  1:37                                                     ` Qu Wenruo
@ 2015-09-22  7:34                                                       ` Stéphane Lesimple
  2015-09-22  8:40                                                         ` Qu Wenruo
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-22  7:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

Le 2015-09-22 03:37, Qu Wenruo a écrit :
> Stéphane Lesimple wrote on 2015/09/22 03:30 +0200:
>> Le 2015-09-20 13:14, Stéphane Lesimple a écrit :
>>> Le 2015-09-20 12:51, Qu Wenruo a écrit :
>>>>>> Would you please use gdb to show the codes of
>>>>>> "btrfs_qgroup_rescan_worker+0x388" ?
>>>>>> (Need kernel debuginfo)
>>>>>> 
>>>>>> My guess is the following line:(pretty sure, but not 100% sure)
>>>>>> ------
>>>>>> /*
>>>>>>          * only update status, since the previous part has alreay
>>>>>> updated the
>>>>>>          * qgroup info.
>>>>>>          */
>>>>>>         trans = btrfs_start_transaction(fs_info->quota_root, 1); 
>>>>>> <<<<<
>>>>>>         if (IS_ERR(trans)) {
>>>>>>                 err = PTR_ERR(trans);
>>>>>>                 btrfs_err(fs_info,
>>>>>>                           "fail to start transaction for status
>>>>>> update: %d\n",
>>>>>>                           err);
>>>>>>                 goto done;
>>>>>>         }
>>>>>> ------
>>>>> 
>>>>> The kernel and modules were already compiled with debuginfo.
>>>>> However for some reason, I couldn't get gdb disassembly of 
>>>>> /proc/kcore
>>>>> properly
>>>>> aligned with the source I compiled: the asm code doesn't match the 
>>>>> C
>>>>> code shown
>>>>> by gdb. In any case, watching the source of this function, this is 
>>>>> the
>>>>> only place
>>>>> btrfs_start_transaction is called, so we can be 100% sure it's 
>>>>> where
>>>>> the
>>>>> crash
>>>>> happens indeed.
>>>> 
>>>> Yep, that's the only caller.
>>>> 
>>>> Here is some useful small hint to locate the code, if you are
>>>> interestied in kernel development.
>>>> 
>>>> # Not sure about whether ubuntu gzipped modules, at least Arch does
>>>> # compress it
>>>> $ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
>>>> $ gunzip /tmp/btrfs.ko.gz
>>>> $ gdb /tmp/btrfs.ko
>>>> # Make sure gdb read all the needed debuginfo
>>>> $ gdb list *(btrfs_qgroup_rescan_worker+0x388)
>>>> 
>>>> And gdb will find the code position for you.
>>>> Quite easy one, only backtrace info is needed.
>>> 
>>> Ah, thanks for the tips, I was loading whole vmlinux and using
>>> /proc/kcore
>>> as the core info, then adding the module with "add-symbol-file". But 
>>> as
>>> we're just looking for the code and not the variables, it was indeed
>>> completely overkill.
>>> 
>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
>>> 0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
>>> 2323
>>> 2324            /*
>>> 2325             * only update status, since the previous part has
>>> alreay updated the
>>> 2326             * qgroup info.
>>> 2327             */
>>> 2328            trans = btrfs_start_transaction(fs_info->quota_root, 
>>> 1);
>>> 2329            if (IS_ERR(trans)) {
>>> 2330                    err = PTR_ERR(trans);
>>> 2331                    btrfs_err(fs_info,
>>> 2332                              "fail to start transaction for
>>> status update: %d\n",
>>> 
>>> So this just confirms what we were already 99% sure of.
>>> 
>>>> Another hint is about how to collect the kernel crash info.
>>>> Your netconsole setup would be definitely one good practice.
>>>> 
>>>> Another one I use to collect crash info is kdump.
>>>> Ubuntu should have a good wiki on it.
>>> 
>>> I've already come across kdump a few times, but never really look 
>>> into
>>> it.
>>> To debug the other complicated extend backref bug, it could be of 
>>> some
>>> use.
>>> 
>>>>>>>> So, as a quick summary of this big thread, it seems I've been
>>>>>>>> hitting
>>>>>>>> 3 bugs, all reproductible :
>>>>>>>> - kernel BUG on balance (this original thread)
>>>>>> 
>>>>>> For this, I can't provide much help, as extent backref bug is 
>>>>>> quite
>>>>>> hard to debug, unless a developer is interested in it and find a
>>>>>> stable way to reproduce it.
>>>>> 
>>>>> Yes, unfortunately as it looks so much like a race condition, I 
>>>>> know
>>>>> I can
>>>>> reproduce it with my worflow, but it can take between 1 minute and 
>>>>> 12
>>>>> hours,
>>>>> so I wouldn't call it a "stable way" to reproduce it unfortunately 
>>>>> :(
>>>>> 
>>>>> Still if any dev is interested in it, I can reproduce it, with a
>>>>> patched
>>>>> kernel if needed.
>>>> 
>>>> Maybe you are already doing it, you can only compile the btrfs
>>>> modules, which will be far more faster than compile the whole 
>>>> kernel,
>>>> if and only if the compiled module can be loaded.
>>> 
>>> Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so
>>> I'll try to
>>> load the modified module and see if the running kernel accepts it. I
>>> have to rmmod
>>> the loaded module first, hence umounting any btrfs fs before that.
>>> Should be able
>>> to do it in a couple hours.
>>> 
>>> I'll delete again all my snapshots and run my script. Should be easy
>>> to trigger
>>> the (hopefully worked-around) bug again.
>> 
>> Well, I didn't trigger this exact bug, but another one, not less 
>> severe
>> though, as it also crashed the system:
>> 
>> [92098.841309] general protection fault: 0000 [#1] SMP
>> [92098.841338] Modules linked in: ...
>> [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
>> 4.3.0-rc1 #1
>> [92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
>> 01/06/2014
>> [92098.841868] Workqueue: btrfs-qgroup-rescan 
>> btrfs_qgroup_rescan_helper
>> [btrfs]
>> [92098.841889] task: ffff8800b6cc4100 ti: ffff8800a3dc8000 task.ti:
>> ffff8800a3dc8000
>> [92098.841910] RIP: 0010:[<ffffffff813ae6c6>]  [<ffffffff813ae6c6>]
>> memcpy_erms+0x6/0x10
>> [92098.841935] RSP: 0018:ffff8800a3dcbcc8  EFLAGS: 00010207
>> [92098.841950] RAX: ffff8800a3dcbd67 RBX: 0000000000000009 RCX:
>> 0000000000000009
>> [92098.841970] RDX: 0000000000000009 RSI: 0005080000000000 RDI:
>> ffff8800a3dcbd67
>> [92098.841989] RBP: ffff8800a3dcbd00 R08: 0000000000019c60 R09:
>> ffff88011fb19c60
>> [92098.842009] R10: ffffea0003006480 R11: 0000000001000000 R12:
>> ffff8800b76c32c0
>> [92098.842028] R13: 0000160000000000 R14: ffff8800a3dcbd70 R15:
>> 0000000000000009
>> [92098.842048] FS:  0000000000000000(0000) GS:ffff88011fb00000(0000)
>> knlGS:0000000000000000
>> [92098.842070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [92098.842086] CR2: 00007fe1f2bd8000 CR3: 0000000001c10000 CR4:
>> 00000000000406e0
>> [92098.842105] Stack:
>> [92098.842111]  ffffffffc035a5d8 ffffffffc0396d00 000000000000028b
>> 0000000000000000
>> [92098.842212]  0000cc6c00000000 ffff8800b76c3200 0000160000000000
>> ffff8800a3dcbdc0
>> [92098.842237]  ffffffffc039af3d ffff8800c7196dc8 ffff8800c7196e08
>> ffff8800c7196da0
>> [92098.842261] Call Trace:
>> [92098.842277]  [<ffffffffc035a5d8>] ? read_extent_buffer+0xb8/0x110
>> [btrfs]
>> [92098.842304]  [<ffffffffc0396d00>] ? btrfs_find_all_roots+0x60/0x70
>> [btrfs]
>> [92098.842329]  [<ffffffffc039af3d>]
>> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
> 
> Would you please show the code of it?
> This one seems to be another stupid bug I made when rewriting the 
> framework.
> Maybe I forgot to reinit some variants or I'm screwing memory...

(gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
2755
2756    static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
2757                                             struct btrfs_disk_key 
*disk)
2758    {
2759            cpu->offset = le64_to_cpu(disk->offset);
2760            cpu->type = disk->type;
2761            cpu->objectid = le64_to_cpu(disk->objectid);
2762    }
2763
2764    static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key 
*disk,
(gdb)


Does it make sense?


>> [92098.842351]  [<ffffffff810a1a0d>] ?
>> ttwu_do_activate.constprop.90+0x5d/0x70
>> [92098.842377]  [<ffffffffc03674e0>] normal_work_helper+0xc0/0x270 
>> [btrfs]
>> [92098.842401]  [<ffffffffc03678a2>]
>> btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
>> [92098.842421]  [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
>> [92098.842438]  [<ffffffff8109192a>] worker_thread+0x11a/0x470
>> [92098.842454]  [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
>> [92098.842471]  [<ffffffff81097059>] kthread+0xc9/0xe0
>> [92098.842485]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
>> [92098.842502]  [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
>> [92098.842517]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
>> [92098.842532] Code: ff eb eb 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 
>> 48
>> c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 
>> 48
>> 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
>> [92098.842658] RIP  [<ffffffff813ae6c6>] memcpy_erms+0x6/0x10
>> [92098.842675]  RSP <ffff8800a3dcbcc8>
>> [92098.849594] ---[ end trace 9d5fb7931a3ec713 ]---
>> 
>> I would definitely say that rescans should be avoided on current 
>> kernels
>> as the possibility that it'll bring the system down shouldn't be 
>> ignored.
>> It confirms that this code really needs a rewrite !
>> 
>> Regards,
>> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-22  7:34                                                       ` Stéphane Lesimple
@ 2015-09-22  8:40                                                         ` Qu Wenruo
  2015-09-22  8:51                                                           ` Qu Wenruo
  0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-22  8:40 UTC (permalink / raw)
  To: Stéphane Lesimple, Qu Wenruo; +Cc: linux-btrfs



在 2015年09月22日 15:34, Stéphane Lesimple 写道:
> Le 2015-09-22 03:37, Qu Wenruo a écrit :
>> Stéphane Lesimple wrote on 2015/09/22 03:30 +0200:
>>> Le 2015-09-20 13:14, Stéphane Lesimple a écrit :
>>>> Le 2015-09-20 12:51, Qu Wenruo a écrit :
>>>>>>> Would you please use gdb to show the codes of
>>>>>>> "btrfs_qgroup_rescan_worker+0x388" ?
>>>>>>> (Need kernel debuginfo)
>>>>>>>
>>>>>>> My guess is the following line:(pretty sure, but not 100% sure)
>>>>>>> ------
>>>>>>> /*
>>>>>>>          * only update status, since the previous part has alreay
>>>>>>> updated the
>>>>>>>          * qgroup info.
>>>>>>>          */
>>>>>>>         trans = btrfs_start_transaction(fs_info->quota_root, 1);
>>>>>>> <<<<<
>>>>>>>         if (IS_ERR(trans)) {
>>>>>>>                 err = PTR_ERR(trans);
>>>>>>>                 btrfs_err(fs_info,
>>>>>>>                           "fail to start transaction for status
>>>>>>> update: %d\n",
>>>>>>>                           err);
>>>>>>>                 goto done;
>>>>>>>         }
>>>>>>> ------
>>>>>>
>>>>>> The kernel and modules were already compiled with debuginfo.
>>>>>> However for some reason, I couldn't get gdb disassembly of
>>>>>> /proc/kcore
>>>>>> properly
>>>>>> aligned with the source I compiled: the asm code doesn't match the C
>>>>>> code shown
>>>>>> by gdb. In any case, watching the source of this function, this is
>>>>>> the
>>>>>> only place
>>>>>> btrfs_start_transaction is called, so we can be 100% sure it's where
>>>>>> the
>>>>>> crash
>>>>>> happens indeed.
>>>>>
>>>>> Yep, that's the only caller.
>>>>>
>>>>> Here is some useful small hint to locate the code, if you are
>>>>> interestied in kernel development.
>>>>>
>>>>> # Not sure about whether ubuntu gzipped modules, at least Arch does
>>>>> # compress it
>>>>> $ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
>>>>> $ gunzip /tmp/btrfs.ko.gz
>>>>> $ gdb /tmp/btrfs.ko
>>>>> # Make sure gdb read all the needed debuginfo
>>>>> $ gdb list *(btrfs_qgroup_rescan_worker+0x388)
>>>>>
>>>>> And gdb will find the code position for you.
>>>>> Quite easy one, only backtrace info is needed.
>>>>
>>>> Ah, thanks for the tips, I was loading whole vmlinux and using
>>>> /proc/kcore
>>>> as the core info, then adding the module with "add-symbol-file". But as
>>>> we're just looking for the code and not the variables, it was indeed
>>>> completely overkill.
>>>>
>>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
>>>> 0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
>>>> 2323
>>>> 2324            /*
>>>> 2325             * only update status, since the previous part has
>>>> alreay updated the
>>>> 2326             * qgroup info.
>>>> 2327             */
>>>> 2328            trans = btrfs_start_transaction(fs_info->quota_root,
>>>> 1);
>>>> 2329            if (IS_ERR(trans)) {
>>>> 2330                    err = PTR_ERR(trans);
>>>> 2331                    btrfs_err(fs_info,
>>>> 2332                              "fail to start transaction for
>>>> status update: %d\n",
>>>>
>>>> So this just confirms what we were already 99% sure of.
>>>>
>>>>> Another hint is about how to collect the kernel crash info.
>>>>> Your netconsole setup would be definitely one good practice.
>>>>>
>>>>> Another one I use to collect crash info is kdump.
>>>>> Ubuntu should have a good wiki on it.
>>>>
>>>> I've already come across kdump a few times, but never really look into
>>>> it.
>>>> To debug the other complicated extend backref bug, it could be of some
>>>> use.
>>>>
>>>>>>>>> So, as a quick summary of this big thread, it seems I've been
>>>>>>>>> hitting
>>>>>>>>> 3 bugs, all reproductible :
>>>>>>>>> - kernel BUG on balance (this original thread)
>>>>>>>
>>>>>>> For this, I can't provide much help, as extent backref bug is quite
>>>>>>> hard to debug, unless a developer is interested in it and find a
>>>>>>> stable way to reproduce it.
>>>>>>
>>>>>> Yes, unfortunately as it looks so much like a race condition, I know
>>>>>> I can
>>>>>> reproduce it with my worflow, but it can take between 1 minute and 12
>>>>>> hours,
>>>>>> so I wouldn't call it a "stable way" to reproduce it unfortunately :(
>>>>>>
>>>>>> Still if any dev is interested in it, I can reproduce it, with a
>>>>>> patched
>>>>>> kernel if needed.
>>>>>
>>>>> Maybe you are already doing it, you can only compile the btrfs
>>>>> modules, which will be far more faster than compile the whole kernel,
>>>>> if and only if the compiled module can be loaded.
>>>>
>>>> Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so
>>>> I'll try to
>>>> load the modified module and see if the running kernel accepts it. I
>>>> have to rmmod
>>>> the loaded module first, hence umounting any btrfs fs before that.
>>>> Should be able
>>>> to do it in a couple hours.
>>>>
>>>> I'll delete again all my snapshots and run my script. Should be easy
>>>> to trigger
>>>> the (hopefully worked-around) bug again.
>>>
>>> Well, I didn't trigger this exact bug, but another one, not less severe
>>> though, as it also crashed the system:
>>>
>>> [92098.841309] general protection fault: 0000 [#1] SMP
>>> [92098.841338] Modules linked in: ...
>>> [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
>>> 4.3.0-rc1 #1
>>> [92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
>>> 01/06/2014
>>> [92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
>>> [btrfs]
>>> [92098.841889] task: ffff8800b6cc4100 ti: ffff8800a3dc8000 task.ti:
>>> ffff8800a3dc8000
>>> [92098.841910] RIP: 0010:[<ffffffff813ae6c6>]  [<ffffffff813ae6c6>]
>>> memcpy_erms+0x6/0x10
>>> [92098.841935] RSP: 0018:ffff8800a3dcbcc8  EFLAGS: 00010207
>>> [92098.841950] RAX: ffff8800a3dcbd67 RBX: 0000000000000009 RCX:
>>> 0000000000000009
>>> [92098.841970] RDX: 0000000000000009 RSI: 0005080000000000 RDI:
>>> ffff8800a3dcbd67
>>> [92098.841989] RBP: ffff8800a3dcbd00 R08: 0000000000019c60 R09:
>>> ffff88011fb19c60
>>> [92098.842009] R10: ffffea0003006480 R11: 0000000001000000 R12:
>>> ffff8800b76c32c0
>>> [92098.842028] R13: 0000160000000000 R14: ffff8800a3dcbd70 R15:
>>> 0000000000000009
>>> [92098.842048] FS:  0000000000000000(0000) GS:ffff88011fb00000(0000)
>>> knlGS:0000000000000000
>>> [92098.842070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [92098.842086] CR2: 00007fe1f2bd8000 CR3: 0000000001c10000 CR4:
>>> 00000000000406e0
>>> [92098.842105] Stack:
>>> [92098.842111]  ffffffffc035a5d8 ffffffffc0396d00 000000000000028b
>>> 0000000000000000
>>> [92098.842212]  0000cc6c00000000 ffff8800b76c3200 0000160000000000
>>> ffff8800a3dcbdc0
>>> [92098.842237]  ffffffffc039af3d ffff8800c7196dc8 ffff8800c7196e08
>>> ffff8800c7196da0
>>> [92098.842261] Call Trace:
>>> [92098.842277]  [<ffffffffc035a5d8>] ? read_extent_buffer+0xb8/0x110
>>> [btrfs]
>>> [92098.842304]  [<ffffffffc0396d00>] ? btrfs_find_all_roots+0x60/0x70
>>> [btrfs]
>>> [92098.842329]  [<ffffffffc039af3d>]
>>> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>>
>> Would you please show the code of it?
>> This one seems to be another stupid bug I made when rewriting the
>> framework.
>> Maybe I forgot to reinit some variants or I'm screwing memory...
>
> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
> 2755
> 2756    static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
> 2757                                             struct btrfs_disk_key
> *disk)
> 2758    {
> 2759            cpu->offset = le64_to_cpu(disk->offset);
> 2760            cpu->type = disk->type;
> 2761            cpu->objectid = le64_to_cpu(disk->objectid);
> 2762    }
> 2763
> 2764    static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key
> *disk,
> (gdb)
>
>
> Does it makes sense ?
So it seems that the memory holding the cpu key is being corrupted...

The faulting code is just a thin inline function, so what about the other
frames on the stack? Like btrfs_qgroup_rescan_helper+0x12?
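
(Just a sketch of what I mean, using the same gdb-on-the-module procedure as
before; the +0x12 offset is taken from your backtrace:)

$ gdb /tmp/btrfs.ko
(gdb) list *(btrfs_qgroup_rescan_helper+0x12)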

Thanks,
Qu
>
>
>>> [92098.842351]  [<ffffffff810a1a0d>] ?
>>> ttwu_do_activate.constprop.90+0x5d/0x70
>>> [92098.842377]  [<ffffffffc03674e0>] normal_work_helper+0xc0/0x270
>>> [btrfs]
>>> [92098.842401]  [<ffffffffc03678a2>]
>>> btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
>>> [92098.842421]  [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
>>> [92098.842438]  [<ffffffff8109192a>] worker_thread+0x11a/0x470
>>> [92098.842454]  [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
>>> [92098.842471]  [<ffffffff81097059>] kthread+0xc9/0xe0
>>> [92098.842485]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
>>> [92098.842502]  [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
>>> [92098.842517]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
>>> [92098.842532] Code: ff eb eb 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48
>>> c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48
>>> 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
>>> [92098.842658] RIP  [<ffffffff813ae6c6>] memcpy_erms+0x6/0x10
>>> [92098.842675]  RSP <ffff8800a3dcbcc8>
>>> [92098.849594] ---[ end trace 9d5fb7931a3ec713 ]---
>>>
>>> I would definitely say that rescans should be avoided on current kernels
>>> as the possibility that it'll bring the system down shouldn't be
>>> ignored.
>>> It confirms that this code really needs a rewrite !
>>>
>>> Regards,
>>>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-22  8:40                                                         ` Qu Wenruo
@ 2015-09-22  8:51                                                           ` Qu Wenruo
  2015-09-22 14:31                                                             ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-22  8:51 UTC (permalink / raw)
  To: Stéphane Lesimple, Qu Wenruo; +Cc: linux-btrfs



在 2015年09月22日 16:40, Qu Wenruo 写道:
>
>
> 在 2015年09月22日 15:34, Stéphane Lesimple 写道:
>> Le 2015-09-22 03:37, Qu Wenruo a écrit :
>>> Stéphane Lesimple wrote on 2015/09/22 03:30 +0200:
>>>> Le 2015-09-20 13:14, Stéphane Lesimple a écrit :
>>>>> Le 2015-09-20 12:51, Qu Wenruo a écrit :
>>>>>>>> Would you please use gdb to show the codes of
>>>>>>>> "btrfs_qgroup_rescan_worker+0x388" ?
>>>>>>>> (Need kernel debuginfo)
>>>>>>>>
>>>>>>>> My guess is the following line:(pretty sure, but not 100% sure)
>>>>>>>> ------
>>>>>>>> /*
>>>>>>>>          * only update status, since the previous part has alreay
>>>>>>>> updated the
>>>>>>>>          * qgroup info.
>>>>>>>>          */
>>>>>>>>         trans = btrfs_start_transaction(fs_info->quota_root, 1);
>>>>>>>> <<<<<
>>>>>>>>         if (IS_ERR(trans)) {
>>>>>>>>                 err = PTR_ERR(trans);
>>>>>>>>                 btrfs_err(fs_info,
>>>>>>>>                           "fail to start transaction for status
>>>>>>>> update: %d\n",
>>>>>>>>                           err);
>>>>>>>>                 goto done;
>>>>>>>>         }
>>>>>>>> ------
>>>>>>>
>>>>>>> The kernel and modules were already compiled with debuginfo.
>>>>>>> However for some reason, I couldn't get gdb disassembly of
>>>>>>> /proc/kcore
>>>>>>> properly
>>>>>>> aligned with the source I compiled: the asm code doesn't match the C
>>>>>>> code shown
>>>>>>> by gdb. In any case, watching the source of this function, this is
>>>>>>> the
>>>>>>> only place
>>>>>>> btrfs_start_transaction is called, so we can be 100% sure it's where
>>>>>>> the
>>>>>>> crash
>>>>>>> happens indeed.
>>>>>>
>>>>>> Yep, that's the only caller.
>>>>>>
>>>>>> Here is some useful small hint to locate the code, if you are
>>>>>> interestied in kernel development.
>>>>>>
>>>>>> # Not sure about whether ubuntu gzipped modules, at least Arch does
>>>>>> # compress it
>>>>>> $ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
>>>>>> $ gunzip /tmp/btrfs.ko.gz
>>>>>> $ gdb /tmp/btrfs.ko
>>>>>> # Make sure gdb read all the needed debuginfo
>>>>>> $ gdb list *(btrfs_qgroup_rescan_worker+0x388)
>>>>>>
>>>>>> And gdb will find the code position for you.
>>>>>> Quite easy one, only backtrace info is needed.
>>>>>
>>>>> Ah, thanks for the tips, I was loading whole vmlinux and using
>>>>> /proc/kcore
>>>>> as the core info, then adding the module with "add-symbol-file".
>>>>> But as
>>>>> we're just looking for the code and not the variables, it was indeed
>>>>> completely overkill.
>>>>>
>>>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x388)
>>>>> 0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
>>>>> 2323
>>>>> 2324            /*
>>>>> 2325             * only update status, since the previous part has
>>>>> alreay updated the
>>>>> 2326             * qgroup info.
>>>>> 2327             */
>>>>> 2328            trans = btrfs_start_transaction(fs_info->quota_root,
>>>>> 1);
>>>>> 2329            if (IS_ERR(trans)) {
>>>>> 2330                    err = PTR_ERR(trans);
>>>>> 2331                    btrfs_err(fs_info,
>>>>> 2332                              "fail to start transaction for
>>>>> status update: %d\n",
>>>>>
>>>>> So this just confirms what we were already 99% sure of.
>>>>>
>>>>>> Another hint is about how to collect the kernel crash info.
>>>>>> Your netconsole setup would be definitely one good practice.
>>>>>>
>>>>>> Another one I use to collect crash info is kdump.
>>>>>> Ubuntu should have a good wiki on it.
>>>>>
>>>>> I've already come across kdump a few times, but never really look into
>>>>> it.
>>>>> To debug the other complicated extend backref bug, it could be of some
>>>>> use.
>>>>>
>>>>>>>>>> So, as a quick summary of this big thread, it seems I've been
>>>>>>>>>> hitting
>>>>>>>>>> 3 bugs, all reproductible :
>>>>>>>>>> - kernel BUG on balance (this original thread)
>>>>>>>>
>>>>>>>> For this, I can't provide much help, as extent backref bug is quite
>>>>>>>> hard to debug, unless a developer is interested in it and find a
>>>>>>>> stable way to reproduce it.
>>>>>>>
>>>>>>> Yes, unfortunately as it looks so much like a race condition, I know
>>>>>>> I can
>>>>>>> reproduce it with my worflow, but it can take between 1 minute
>>>>>>> and 12
>>>>>>> hours,
>>>>>>> so I wouldn't call it a "stable way" to reproduce it
>>>>>>> unfortunately :(
>>>>>>>
>>>>>>> Still if any dev is interested in it, I can reproduce it, with a
>>>>>>> patched
>>>>>>> kernel if needed.
>>>>>>
>>>>>> Maybe you are already doing it, you can only compile the btrfs
>>>>>> modules, which will be far more faster than compile the whole kernel,
>>>>>> if and only if the compiled module can be loaded.
>>>>>
>>>>> Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so
>>>>> I'll try to
>>>>> load the modified module and see if the running kernel accepts it. I
>>>>> have to rmmod
>>>>> the loaded module first, hence umounting any btrfs fs before that.
>>>>> Should be able
>>>>> to do it in a couple hours.
>>>>>
>>>>> I'll delete again all my snapshots and run my script. Should be easy
>>>>> to trigger
>>>>> the (hopefully worked-around) bug again.
>>>>
>>>> Well, I didn't trigger this exact bug, but another one, not less severe
>>>> though, as it also crashed the system:
>>>>
>>>> [92098.841309] general protection fault: 0000 [#1] SMP
>>>> [92098.841338] Modules linked in: ...
>>>> [92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
>>>> 4.3.0-rc1 #1
>>>> [92098.841834] Hardware name: ASUS All Series/H87I-PLUS, BIOS 1005
>>>> 01/06/2014
>>>> [92098.841868] Workqueue: btrfs-qgroup-rescan
>>>> btrfs_qgroup_rescan_helper
>>>> [btrfs]
>>>> [92098.841889] task: ffff8800b6cc4100 ti: ffff8800a3dc8000 task.ti:
>>>> ffff8800a3dc8000
>>>> [92098.841910] RIP: 0010:[<ffffffff813ae6c6>]  [<ffffffff813ae6c6>]
>>>> memcpy_erms+0x6/0x10
>>>> [92098.841935] RSP: 0018:ffff8800a3dcbcc8  EFLAGS: 00010207
>>>> [92098.841950] RAX: ffff8800a3dcbd67 RBX: 0000000000000009 RCX:
>>>> 0000000000000009
>>>> [92098.841970] RDX: 0000000000000009 RSI: 0005080000000000 RDI:
>>>> ffff8800a3dcbd67
>>>> [92098.841989] RBP: ffff8800a3dcbd00 R08: 0000000000019c60 R09:
>>>> ffff88011fb19c60
>>>> [92098.842009] R10: ffffea0003006480 R11: 0000000001000000 R12:
>>>> ffff8800b76c32c0
>>>> [92098.842028] R13: 0000160000000000 R14: ffff8800a3dcbd70 R15:
>>>> 0000000000000009
>>>> [92098.842048] FS:  0000000000000000(0000) GS:ffff88011fb00000(0000)
>>>> knlGS:0000000000000000
>>>> [92098.842070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [92098.842086] CR2: 00007fe1f2bd8000 CR3: 0000000001c10000 CR4:
>>>> 00000000000406e0
>>>> [92098.842105] Stack:
>>>> [92098.842111]  ffffffffc035a5d8 ffffffffc0396d00 000000000000028b
>>>> 0000000000000000
>>>> [92098.842212]  0000cc6c00000000 ffff8800b76c3200 0000160000000000
>>>> ffff8800a3dcbdc0
>>>> [92098.842237]  ffffffffc039af3d ffff8800c7196dc8 ffff8800c7196e08
>>>> ffff8800c7196da0
>>>> [92098.842261] Call Trace:
>>>> [92098.842277]  [<ffffffffc035a5d8>] ? read_extent_buffer+0xb8/0x110
>>>> [btrfs]
>>>> [92098.842304]  [<ffffffffc0396d00>] ? btrfs_find_all_roots+0x60/0x70
>>>> [btrfs]
>>>> [92098.842329]  [<ffffffffc039af3d>]
>>>> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>>>
>>> Would you please show the code of it?
>>> This one seems to be another stupid bug I made when rewriting the
>>> framework.
>>> Maybe I forgot to reinit some variants or I'm screwing memory...
>>
>> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
>> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
>> 2755
>> 2756    static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
>> 2757                                             struct btrfs_disk_key
>> *disk)
>> 2758    {
>> 2759            cpu->offset = le64_to_cpu(disk->offset);
>> 2760            cpu->type = disk->type;
>> 2761            cpu->objectid = le64_to_cpu(disk->objectid);
>> 2762    }
>> 2763
>> 2764    static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key
>> *disk,
>> (gdb)
>>
>>
>> Does it makes sense ?
> So it seems that the memory of cpu key is being screwed up...
>
> The code is be specific thin inline function, so what about other stack?
> Like btrfs_qgroup_rescan_helper+0x12?
>
> Thanks,
> Qu
Oh, I forgot that you can just change the offset in
btrfs_qgroup_rescan_worker+0x28d to a smaller value.
Try +0x280 for example, which steps back a dozen or so bytes of asm code;
that may jump out of the inline function range and give you a good hint.
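
(For example, something along these lines; the exact offsets are just
guesses, keep stepping back until the listing leaves ctree.h and lands in
qgroup.c:)

(gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
(gdb) list *(btrfs_qgroup_rescan_worker+0x280)
(gdb) list *(btrfs_qgroup_rescan_worker+0x270)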

Or gdb may have a better mode for inline functions, but I don't know...

Thanks,
Qu
>>
>>
>>>> [92098.842351]  [<ffffffff810a1a0d>] ?
>>>> ttwu_do_activate.constprop.90+0x5d/0x70
>>>> [92098.842377]  [<ffffffffc03674e0>] normal_work_helper+0xc0/0x270
>>>> [btrfs]
>>>> [92098.842401]  [<ffffffffc03678a2>]
>>>> btrfs_qgroup_rescan_helper+0x12/0x20 [btrfs]
>>>> [92098.842421]  [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
>>>> [92098.842438]  [<ffffffff8109192a>] worker_thread+0x11a/0x470
>>>> [92098.842454]  [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
>>>> [92098.842471]  [<ffffffff81097059>] kthread+0xc9/0xe0
>>>> [92098.842485]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
>>>> [92098.842502]  [<ffffffff817aac4f>] ret_from_fork+0x3f/0x70
>>>> [92098.842517]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60
>>>> [92098.842532] Code: ff eb eb 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48
>>>> c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48
>>>> 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
>>>> [92098.842658] RIP  [<ffffffff813ae6c6>] memcpy_erms+0x6/0x10
>>>> [92098.842675]  RSP <ffff8800a3dcbcc8>
>>>> [92098.849594] ---[ end trace 9d5fb7931a3ec713 ]---
>>>>
>>>> I would definitely say that rescans should be avoided on current
>>>> kernels
>>>> as the possibility that it'll bring the system down shouldn't be
>>>> ignored.
>>>> It confirms that this code really needs a rewrite !
>>>>
>>>> Regards,
>>>>
>>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-22  8:51                                                           ` Qu Wenruo
@ 2015-09-22 14:31                                                             ` Stéphane Lesimple
  2015-09-23  7:03                                                               ` Qu Wenruo
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-22 14:31 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

Le 2015-09-22 10:51, Qu Wenruo a écrit :
>>>>> [92098.842261] Call Trace:
>>>>> [92098.842277]  [<ffffffffc035a5d8>] ? 
>>>>> read_extent_buffer+0xb8/0x110
>>>>> [btrfs]
>>>>> [92098.842304]  [<ffffffffc0396d00>] ? 
>>>>> btrfs_find_all_roots+0x60/0x70
>>>>> [btrfs]
>>>>> [92098.842329]  [<ffffffffc039af3d>]
>>>>> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>>>> 
>>>> Would you please show the code of it?
>>>> This one seems to be another stupid bug I made when rewriting the
>>>> framework.
>>>> Maybe I forgot to reinit some variants or I'm screwing memory...
>>> 
>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
>>> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
>>> 2755
>>> 2756    static inline void btrfs_disk_key_to_cpu(struct btrfs_key 
>>> *cpu,
>>> 2757                                             struct 
>>> btrfs_disk_key
>>> *disk)
>>> 2758    {
>>> 2759            cpu->offset = le64_to_cpu(disk->offset);
>>> 2760            cpu->type = disk->type;
>>> 2761            cpu->objectid = le64_to_cpu(disk->objectid);
>>> 2762    }
>>> 2763
>>> 2764    static inline void btrfs_cpu_key_to_disk(struct 
>>> btrfs_disk_key
>>> *disk,
>>> (gdb)
>>> 
>>> 
>>> Does it makes sense ?
>> So it seems that the memory of cpu key is being screwed up...
>> 
>> The code is be specific thin inline function, so what about other 
>> stack?
>> Like btrfs_qgroup_rescan_helper+0x12?
>> 
>> Thanks,
>> Qu
> Oh, I forgot that you can just change the number of
> btrfs_qgroup_rescan_worker+0x28d to smaller value.
> Try +0x280 for example, which will revert to 14 bytes asm code back,
> which may jump out of the inline function range, and may give you a
> good hint.
> 
> Or gdb may have a better mode for inline function, but I don't know...

Actually, "list -" is our friend here (it shows the 10 lines before the
last source output)

(gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
2755
2756    static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
2757                                             struct btrfs_disk_key 
*disk)
2758    {
2759            cpu->offset = le64_to_cpu(disk->offset);
2760            cpu->type = disk->type;
2761            cpu->objectid = le64_to_cpu(disk->objectid);
2762    }
2763
2764    static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key 
*disk,
(gdb) list -
2745                                                struct 
btrfs_disk_key *key)
2746    {
2747            write_eb_member(eb, h, struct btrfs_free_space_header, 
location, key);
2748    }
2749
2750    /* struct btrfs_disk_key */
2751    BTRFS_SETGET_STACK_FUNCS(disk_key_objectid, struct 
btrfs_disk_key,
2752                             objectid, 64);
2753    BTRFS_SETGET_STACK_FUNCS(disk_key_offset, struct btrfs_disk_key, 
offset, 64);
2754    BTRFS_SETGET_STACK_FUNCS(disk_key_type, struct btrfs_disk_key, 
type, 8);
(gdb) list -
2735
2736    static inline void btrfs_free_space_key(struct extent_buffer 
*eb,
2737                                            struct 
btrfs_free_space_header *h,
2738                                            struct btrfs_disk_key 
*key)
2739    {
2740            read_eb_member(eb, h, struct btrfs_free_space_header, 
location, key);
2741    }
2742
2743    static inline void btrfs_set_free_space_key(struct extent_buffer 
*eb,
2744                                                struct 
btrfs_free_space_header *h,
(gdb)

Lots of inline funcs and macros it seems.

-- 
Stéphane.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-22 14:31                                                             ` Stéphane Lesimple
@ 2015-09-23  7:03                                                               ` Qu Wenruo
  2015-09-23  9:40                                                                 ` Stéphane Lesimple
  0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2015-09-23  7:03 UTC (permalink / raw)
  To: Stéphane Lesimple, Qu Wenruo; +Cc: linux-btrfs



Stéphane Lesimple wrote on 2015/09/22 16:31 +0200:
> Le 2015-09-22 10:51, Qu Wenruo a écrit :
>>>>>> [92098.842261] Call Trace:
>>>>>> [92098.842277]  [<ffffffffc035a5d8>] ? read_extent_buffer+0xb8/0x110
>>>>>> [btrfs]
>>>>>> [92098.842304]  [<ffffffffc0396d00>] ? btrfs_find_all_roots+0x60/0x70
>>>>>> [btrfs]
>>>>>> [92098.842329]  [<ffffffffc039af3d>]
>>>>>> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>>>>>
>>>>> Would you please show the code of it?
>>>>> This one seems to be another stupid bug I made when rewriting the
>>>>> framework.
>>>>> Maybe I forgot to reinit some variants or I'm screwing memory...
>>>>
>>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
>>>> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
>>>> 2755
>>>> 2756    static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
>>>> 2757                                             struct btrfs_disk_key
>>>> *disk)
>>>> 2758    {
>>>> 2759            cpu->offset = le64_to_cpu(disk->offset);
>>>> 2760            cpu->type = disk->type;
>>>> 2761            cpu->objectid = le64_to_cpu(disk->objectid);
>>>> 2762    }
>>>> 2763
>>>> 2764    static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key
>>>> *disk,
>>>> (gdb)
>>>>
>>>>
>>>> Does it makes sense ?
>>> So it seems that the memory of cpu key is being screwed up...
>>>
>>> The code is be specific thin inline function, so what about other stack?
>>> Like btrfs_qgroup_rescan_helper+0x12?
>>>
>>> Thanks,
>>> Qu
>> Oh, I forgot that you can just change the number of
>> btrfs_qgroup_rescan_worker+0x28d to smaller value.
>> Try +0x280 for example, which will revert to 14 bytes asm code back,
>> which may jump out of the inline function range, and may give you a
>> good hint.
>>
>> Or gdb may have a better mode for inline function, but I don't know...
>
> Actually, "list -" is our friend here (show 10 lignes before the last
> src output)
No, that's not the case.

"list -" will only show the lines around that source location.

What I need is to look further up the call stack.
When debugging a running program, that's quite easy with the frame command.

But in this situation we don't have a live call stack, so I'd like to move 
the +0x28d offset back by several bytes, until we jump out of the inline 
function call and see the meaningful code.


BTW, did you try the following patch?
https://patchwork.kernel.org/patch/7114321/
btrfs: qgroup: exit the rescan worker during umount

The problem seems somewhat related to the bug you encountered, so I'd 
recommend giving it a try.

Thanks,
Qu
>
> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
> 2755
> 2756    static inline void btrfs_disk_key_to_cpu(struct btrfs_key *cpu,
> 2757                                             struct btrfs_disk_key
> *disk)
> 2758    {
> 2759            cpu->offset = le64_to_cpu(disk->offset);
> 2760            cpu->type = disk->type;
> 2761            cpu->objectid = le64_to_cpu(disk->objectid);
> 2762    }
> 2763
> 2764    static inline void btrfs_cpu_key_to_disk(struct btrfs_disk_key
> *disk,
> (gdb) list -
> 2745                                                struct
> btrfs_disk_key *key)
> 2746    {
> 2747            write_eb_member(eb, h, struct btrfs_free_space_header,
> location, key);
> 2748    }
> 2749
> 2750    /* struct btrfs_disk_key */
> 2751    BTRFS_SETGET_STACK_FUNCS(disk_key_objectid, struct btrfs_disk_key,
> 2752                             objectid, 64);
> 2753    BTRFS_SETGET_STACK_FUNCS(disk_key_offset, struct btrfs_disk_key,
> offset, 64);
> 2754    BTRFS_SETGET_STACK_FUNCS(disk_key_type, struct btrfs_disk_key,
> type, 8);
> (gdb) list -
> 2735
> 2736    static inline void btrfs_free_space_key(struct extent_buffer *eb,
> 2737                                            struct
> btrfs_free_space_header *h,
> 2738                                            struct btrfs_disk_key *key)
> 2739    {
> 2740            read_eb_member(eb, h, struct btrfs_free_space_header,
> location, key);
> 2741    }
> 2742
> 2743    static inline void btrfs_set_free_space_key(struct extent_buffer
> *eb,
> 2744                                                struct
> btrfs_free_space_header *h,
> (gdb)
>
> Lots of inline funcs and macros it seems.
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-23  7:03                                                               ` Qu Wenruo
@ 2015-09-23  9:40                                                                 ` Stéphane Lesimple
  2015-09-23 10:13                                                                   ` Qu Wenruo
  0 siblings, 1 reply; 37+ messages in thread
From: Stéphane Lesimple @ 2015-09-23  9:40 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

Le 2015-09-23 09:03, Qu Wenruo a écrit :
> Stéphane Lesimple wrote on 2015/09/22 16:31 +0200:
>> Le 2015-09-22 10:51, Qu Wenruo a écrit :
>>>>>>> [92098.842261] Call Trace:
>>>>>>> [92098.842277]  [<ffffffffc035a5d8>] ? 
>>>>>>> read_extent_buffer+0xb8/0x110
>>>>>>> [btrfs]
>>>>>>> [92098.842304]  [<ffffffffc0396d00>] ? 
>>>>>>> btrfs_find_all_roots+0x60/0x70
>>>>>>> [btrfs]
>>>>>>> [92098.842329]  [<ffffffffc039af3d>]
>>>>>>> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>>>>>> 
>>>>>> Would you please show the code of it?
>>>>>> This one seems to be another stupid bug I made when rewriting the
>>>>>> framework.
>>>>>> Maybe I forgot to reinit some variants or I'm screwing memory...
>>>>> 
>>>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
>>>>> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
>>>>> 2755
>>>>> 2756    static inline void btrfs_disk_key_to_cpu(struct btrfs_key 
>>>>> *cpu,
>>>>> 2757                                             struct 
>>>>> btrfs_disk_key
>>>>> *disk)
>>>>> 2758    {
>>>>> 2759            cpu->offset = le64_to_cpu(disk->offset);
>>>>> 2760            cpu->type = disk->type;
>>>>> 2761            cpu->objectid = le64_to_cpu(disk->objectid);
>>>>> 2762    }
>>>>> 2763
>>>>> 2764    static inline void btrfs_cpu_key_to_disk(struct 
>>>>> btrfs_disk_key
>>>>> *disk,
>>>>> (gdb)
>>>>> 
>>>>> 
>>>>> Does it makes sense ?
>>>> So it seems that the memory of cpu key is being screwed up...
>>>> 
>>>> The code is be specific thin inline function, so what about other 
>>>> stack?
>>>> Like btrfs_qgroup_rescan_helper+0x12?
>>>> 
>>>> Thanks,
>>>> Qu
>>> Oh, I forgot that you can just change the number of
>>> btrfs_qgroup_rescan_worker+0x28d to smaller value.
>>> Try +0x280 for example, which will revert to 14 bytes asm code back,
>>> which may jump out of the inline function range, and may give you a
>>> good hint.
>>> 
>>> Or gdb may have a better mode for inline function, but I don't 
>>> know...
>> 
>> Actually, "list -" is our friend here (show 10 lignes before the last
>> src output)
> No, that's not the case.
> 
> List - will only show lines around the source code.
> 
> What I need is to get the higher caller stack.
> If debugging a running program, it's quite easy to just use frame 
> command.
> 
> But in this situation, we don't have call stack, so I'd like to change
> the +0x28d to several bytes backward, until we jump out of the inline
> function call, and see the meaningful codes.

Ah, you're right.
I had a hard time finding a value that wouldn't land in yet another inline
function, or somewhere else entirely in the kernel code, but here it is:

(gdb) list *(btrfs_qgroup_rescan_worker+0x26e)
0x97f4e is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2237).
2232            memcpy(scratch_leaf, path->nodes[0], 
sizeof(*scratch_leaf));
2233            slot = path->slots[0];
2234            btrfs_release_path(path);
2235            mutex_unlock(&fs_info->qgroup_rescan_lock);
2236
2237            for (; slot < btrfs_header_nritems(scratch_leaf); 
++slot) {
2238                    btrfs_item_key_to_cpu(scratch_leaf, &found, 
slot); <== here

2239                    if (found.type != BTRFS_EXTENT_ITEM_KEY &&
2240                        found.type != BTRFS_METADATA_ITEM_KEY)
2241                            continue;

the btrfs_item_key_to_cpu() inline func calls 2 other inline funcs:

static inline void btrfs_item_key_to_cpu(struct extent_buffer *eb,
                                   struct btrfs_key *key, int nr)
{
         struct btrfs_disk_key disk_key;
         btrfs_item_key(eb, &disk_key, nr);
         btrfs_disk_key_to_cpu(key, &disk_key); <== this is 0x28d
}

btrfs_disk_key_to_cpu() is the inline referenced by 0x28d and this is 
where
the GPF happens.


> BTW, did you tried the following patch?
> https://patchwork.kernel.org/patch/7114321/
> btrfs: qgroup: exit the rescan worker during umount
> 
> The problem seems a little related to the bug you encountered, so I'd
> recommend to give it a try.

Not yet, but I've come across this bug too during my tests: starting a 
rescan and then unmounting gets you a crash. I didn't mention it because I 
was sure it was an already-known bug. Nice to see it has been fixed, though!
I'll certainly give it a try, but I'm not really sure it'll fix the specific 
bug we're talking about.
However, the group of patches posted by Mark should fix the qgroup count 
discrepancies as I understand it, right? It might well be worth trying them 
all at once.

Thanks,

-- 
Stéphane.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance
  2015-09-23  9:40                                                                 ` Stéphane Lesimple
@ 2015-09-23 10:13                                                                   ` Qu Wenruo
  0 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2015-09-23 10:13 UTC (permalink / raw)
  To: Stéphane Lesimple, Qu Wenruo; +Cc: linux-btrfs



在 2015年09月23日 17:40, Stéphane Lesimple 写道:
> Le 2015-09-23 09:03, Qu Wenruo a écrit :
>> Stéphane Lesimple wrote on 2015/09/22 16:31 +0200:
>>> Le 2015-09-22 10:51, Qu Wenruo a écrit :
>>>>>>>> [92098.842261] Call Trace:
>>>>>>>> [92098.842277]  [<ffffffffc035a5d8>] ?
>>>>>>>> read_extent_buffer+0xb8/0x110
>>>>>>>> [btrfs]
>>>>>>>> [92098.842304]  [<ffffffffc0396d00>] ?
>>>>>>>> btrfs_find_all_roots+0x60/0x70
>>>>>>>> [btrfs]
>>>>>>>> [92098.842329]  [<ffffffffc039af3d>]
>>>>>>>> btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]
>>>>>>>
>>>>>>> Would you please show the code of it?
>>>>>>> This one seems to be another stupid bug I made when rewriting the
>>>>>>> framework.
>>>>>>> Maybe I forgot to reinit some variants or I'm screwing memory...
>>>>>>
>>>>>> (gdb) list *(btrfs_qgroup_rescan_worker+0x28d)
>>>>>> 0x97f6d is in btrfs_qgroup_rescan_worker (fs/btrfs/ctree.h:2760).
>>>>>> 2755
>>>>>> 2756    static inline void btrfs_disk_key_to_cpu(struct btrfs_key
>>>>>> *cpu,
>>>>>> 2757                                             struct
>>>>>> btrfs_disk_key
>>>>>> *disk)
>>>>>> 2758    {
>>>>>> 2759            cpu->offset = le64_to_cpu(disk->offset);
>>>>>> 2760            cpu->type = disk->type;
>>>>>> 2761            cpu->objectid = le64_to_cpu(disk->objectid);
>>>>>> 2762    }
>>>>>> 2763
>>>>>> 2764    static inline void btrfs_cpu_key_to_disk(struct
>>>>>> btrfs_disk_key
>>>>>> *disk,
>>>>>> (gdb)
>>>>>>
>>>>>>
>>>>>> Does it makes sense ?
>>>>> So it seems that the memory of cpu key is being screwed up...
>>>>>
>>>>> The code is be specific thin inline function, so what about other
>>>>> stack?
>>>>> Like btrfs_qgroup_rescan_helper+0x12?
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>> Oh, I forgot that you can just change the number of
>>>> btrfs_qgroup_rescan_worker+0x28d to smaller value.
>>>> Try +0x280 for example, which will revert to 14 bytes asm code back,
>>>> which may jump out of the inline function range, and may give you a
>>>> good hint.
>>>>
>>>> Or gdb may have a better mode for inline function, but I don't know...
>>>
>>> Actually, "list -" is our friend here (show 10 lignes before the last
>>> src output)
>> No, that's not the case.
>>
>> List - will only show lines around the source code.
>>
>> What I need is to get the higher caller stack.
>> If debugging a running program, it's quite easy to just use frame
>> command.
>>
>> But in this situation, we don't have call stack, so I'd like to change
>> the +0x28d to several bytes backward, until we jump out of the inline
>> function call, and see the meaningful codes.
>
> Ah, you're right.
> I had a hard time finding a value where I wouldn't end up in another inline
> function or entirely somewhere else in the kernel code, but here it is :
>
> (gdb) list *(btrfs_qgroup_rescan_worker+0x26e)
> 0x97f4e is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2237).
> 2232            memcpy(scratch_leaf, path->nodes[0],
> sizeof(*scratch_leaf));
> 2233            slot = path->slots[0];
> 2234            btrfs_release_path(path);
> 2235            mutex_unlock(&fs_info->qgroup_rescan_lock);
> 2236
> 2237            for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
> 2238                    btrfs_item_key_to_cpu(scratch_leaf, &found,
> slot); <== here
>
> 2239                    if (found.type != BTRFS_EXTENT_ITEM_KEY &&
> 2240                        found.type != BTRFS_METADATA_ITEM_KEY)
> 2241                            continue;
>
> the btrfs_item_key_to_cpu() inline func calls 2 other inline funcs:
>
> static inline void btrfs_item_key_to_cpu(struct extent_buffer *eb,
>                                    struct btrfs_key *key, int nr)
> {
>          struct btrfs_disk_key disk_key;
>          btrfs_item_key(eb, &disk_key, nr);
>          btrfs_disk_key_to_cpu(key, &disk_key); <== this is 0x28d
> }
>
> btrfs_disk_key_to_cpu() is the inline referenced by 0x28d and this is where
> the GPF happens.

Thanks, now things are much clearer.
Not completely sure, but scratch_leaf seems to be invalid and to be causing 
the bug.
(found is in stack memory, so I don't think it's the cause).

But that's less related to the qgroup rework, as it's existing code.

A quick glance already shows a dirty and maybe deadly hack: copying the 
whole extent buffer, which includes the pages and all kinds of locks 
(roughly the pattern sketched below).

But I'm not 100% sure if that's the problem, but I'll create a patch to 
for you to test in recent days.
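
Roughly the direction I have in mind -- this is only an untested sketch,
not the actual patch (the error path and the 'out' label are placeholders):

	/*
	 * Instead of memcpy()ing the whole extent buffer (which also copies
	 * its pages[] pointers, spinlocks and refcount), take a private
	 * clone of the leaf and work on that.  btrfs_clone_extent_buffer()
	 * allocates fresh pages and copies the data, so the clone shares no
	 * state with the original buffer.
	 */
	scratch_leaf = btrfs_clone_extent_buffer(path->nodes[0]);
	if (!scratch_leaf) {
		ret = -ENOMEM;
		goto out;
	}
	slot = path->slots[0];
	btrfs_release_path(path);
	mutex_unlock(&fs_info->qgroup_rescan_lock);

	for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
		btrfs_item_key_to_cpu(scratch_leaf, &found, slot);
		/* ... existing per-item accounting stays the same ... */
	}

	free_extent_buffer(scratch_leaf);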

>
>
>> BTW, did you try the following patch?
>> https://patchwork.kernel.org/patch/7114321/
>> btrfs: qgroup: exit the rescan worker during umount
>>
>> The problem it fixes seems a little related to the bug you encountered,
>> so I'd recommend giving it a try.
>
> Not yet, but I've come across this bug too during my tests: starting a
> rescan and then unmounting gets you a crash. I didn't mention it because I
> was sure it was an already known bug. Nice to see it has been fixed, though!
> I'll certainly give it a try, but I'm not really sure it'll fix the specific
> bug we're talking about.
> However, the group of patches posted by Mark should fix the qgroup count
> discrepancies as I understand it, right? It might be worth trying them all
> at once, for sure.

Yes, his patches should fix the qgroup count mismatch problem for
subvolume removal.

If I read the code correctly, after the removal and a sync, the accounting
numbers for the deleted subvolume's qgroup should be:
rfer = 0 and excl = 0.
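
Something along these lines should let you verify it (the paths are only
examples, and this assumes quota is enabled on the filesystem):

	btrfs subvolume delete /tank/some-subvolume
	btrfs filesystem sync /tank
	btrfs qgroup show /tank
	# the row for the deleted subvolume's qgroup should now show
	# rfer = 0 and excl = 0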

Thanks,
Qu
>
> Thanks,
>


end of thread, other threads:[~2015-09-23 10:14 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-14 11:46 kernel BUG at linux-4.2.0/fs/btrfs/extent-tree.c:1833 on rebalance Stéphane Lesimple
2015-09-15 14:47 ` Stéphane Lesimple
2015-09-15 14:56   ` Josef Bacik
2015-09-15 21:47     ` Stéphane Lesimple
2015-09-16  5:02       ` Duncan
2015-09-16 10:28         ` Stéphane Lesimple
2015-09-16 10:46           ` Holger Hoffstätte
2015-09-16 13:04             ` Stéphane Lesimple
2015-09-16 20:18               ` Duncan
2015-09-16 20:41                 ` Stéphane Lesimple
2015-09-17  3:03                   ` Qu Wenruo
2015-09-17  6:11                     ` Stéphane Lesimple
2015-09-17  6:42                       ` Qu Wenruo
2015-09-17  8:02                         ` Stéphane Lesimple
2015-09-17  8:11                           ` Qu Wenruo
2015-09-17 10:08                             ` Stéphane Lesimple
2015-09-17 10:41                               ` Qu Wenruo
2015-09-17 18:47                                 ` Stéphane Lesimple
2015-09-18  0:59                                   ` Qu Wenruo
2015-09-18  7:36                                     ` Stéphane Lesimple
2015-09-18 10:15                                       ` Stéphane Lesimple
2015-09-18 10:26                                         ` Stéphane Lesimple
2015-09-20  1:22                                           ` Qu Wenruo
2015-09-20 10:35                                             ` Stéphane Lesimple
2015-09-20 10:51                                               ` Qu Wenruo
2015-09-20 11:14                                                 ` Stéphane Lesimple
2015-09-22  1:30                                                   ` Stéphane Lesimple
2015-09-22  1:37                                                     ` Qu Wenruo
2015-09-22  7:34                                                       ` Stéphane Lesimple
2015-09-22  8:40                                                         ` Qu Wenruo
2015-09-22  8:51                                                           ` Qu Wenruo
2015-09-22 14:31                                                             ` Stéphane Lesimple
2015-09-23  7:03                                                               ` Qu Wenruo
2015-09-23  9:40                                                                 ` Stéphane Lesimple
2015-09-23 10:13                                                                   ` Qu Wenruo
2015-09-17  6:29               ` Stéphane Lesimple
2015-09-17  7:54                 ` Stéphane Lesimple
