* 4.17-rc1 FS went read-only during balance
@ 2018-04-21 14:55 Dmitrii Tcvetkov
  2018-04-22  8:12 ` Dmitrii Tcvetkov
  2018-04-23  1:23 ` Qu Wenruo
  0 siblings, 2 replies; 6+ messages in thread
From: Dmitrii Tcvetkov @ 2018-04-21 14:55 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2169 bytes --]

TL;DR This looks like a regression in 4.17, but I managed to find a
workaround that makes the filesystem rw-mountable again.

Kernel built from tag v4.17-rc1
btrfs-progs 4.16

Last night two of my machines (a PC with ECC RAM and a laptop with
non-ECC RAM) were running their usual weekly balance via cron with
this command:
btrfs balance start -musage=50 -dusage=50 <mountpoint>
Both machines run the same kernel version.

On the PC this caused the root and "data" filesystems to go read-only.
Root is on an SSD with data single and metadata DUP; the "data"
filesystem is on 2 HDDs with RAID1 for data and metadata.

On the laptop only /home went ro; it's on an NVMe SSD with data single
and metadata DUP.

Btrfs check of the PC rootfs reported no errors in either mode
(original and lowmem). I ran each mode once before reboot on the
read-only filesystem with the --force flag, and then again from a live
USB. Same output, without any errors.

After reboot the kernel refused to mount the rootfs rw, with the same
error as during the cron balance; an ro mount was accepted. Errors
during the rw mount:
BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117 unknown
BTRFS info (device dm-17): forced readonly
BTRFS info (device dm-17): delayed_refs has NO entry
BTRFS error (device dm-17): cleaner transaction attach returned -3

Mounting rw with the skip_balance option didn't help either.

After that I mounted the rootfs rw with a 4.16.2 kernel; the mount was
successful and the kernel finished the balance. Since then the
filesystem is mountable rw by the 4.17-rc1 kernel without errors, and
btrfs check is clean too.

The data filesystem behaves the same; an rw mount on the 4.17-rc1 kernel yields:
[ 2321.370113] BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117 unknown
[ 2321.370119] BTRFS warning (device dm-17): failed to recover relocation: -30
[ 2321.370137] BTRFS info (device dm-17): delayed_refs has NO entry
[ 2321.370155] BTRFS error (device dm-17): cleaner transaction attach returned -30
[ 2321.414219] BTRFS error (device dm-17): open_ctree failed

The rw mount on 4.16.2 works, and after the balance finishes the
filesystem is mountable by 4.17-rc1 again. I saved the laptop's /home
filesystem in the state that 4.17-rc1 cannot mount, and can test
patches and/or create a btrfs-image if needed.

[-- Attachment #2: pc-dmesg --]
[-- Type: text/plain, Size: 3707 bytes --]

Apr 20 23:46:00 fire kernel: BTRFS: device label root devid 1 transid 350197 /dev/dm-2
Apr 20 23:46:00 fire kernel: BTRFS info (device dm-2): enabling auto defrag
Apr 20 23:46:00 fire kernel: BTRFS info (device dm-2): use lzo compression, level 0
Apr 20 23:46:00 fire kernel: BTRFS info (device dm-2): using free space tree
Apr 20 23:46:00 fire kernel: BTRFS info (device dm-2): has skinny extents
Apr 20 23:46:00 fire kernel: BTRFS info (device dm-2): using free space tree
Apr 20 23:46:10 fire kernel: BTRFS: device label home devid 2 transid 358906 /dev/dm-5
Apr 20 23:46:13 fire kernel: BTRFS: device label home devid 1 transid 358906 /dev/dm-12
Apr 20 23:46:13 fire kernel: BTRFS info (device dm-12): use zstd compression, level 0
Apr 20 23:46:13 fire kernel: BTRFS info (device dm-12): enabling auto defrag
Apr 20 23:46:13 fire kernel: BTRFS info (device dm-12): using free space tree
Apr 20 23:46:13 fire kernel: BTRFS info (device dm-12): has skinny extents
Apr 20 23:52:32 fire kernel: BTRFS: device label storage devid 1 transid 357668 /dev/dm-17
Apr 20 23:52:32 fire kernel: BTRFS: device label backup devid 1 transid 5383 /dev/dm-18
Apr 20 23:52:41 fire kernel: BTRFS: device label storage devid 2 transid 357668 /dev/dm-21
Apr 20 23:52:41 fire kernel: BTRFS: device label backup devid 2 transid 5383 /dev/dm-22
Apr 20 23:52:42 fire kernel: BTRFS info (device dm-17): enabling auto defrag
Apr 20 23:52:42 fire kernel: BTRFS info (device dm-17): use zstd compression, level 0
Apr 20 23:52:42 fire kernel: BTRFS info (device dm-17): using free space tree
Apr 20 23:52:42 fire kernel: BTRFS info (device dm-17): has skinny extents
Apr 20 23:52:45 fire kernel: BTRFS info (device dm-18): enabling auto defrag
Apr 20 23:52:45 fire kernel: BTRFS info (device dm-18): use zstd compression, level 0
Apr 20 23:52:45 fire kernel: BTRFS info (device dm-18): using free space tree
Apr 20 23:52:45 fire kernel: BTRFS info (device dm-18): has skinny extents
Apr 21 01:30:00 fire kernel: BTRFS info (device dm-2): relocating block group 27309113344 flags system|dup
Apr 21 01:30:00 fire kernel: BTRFS info (device dm-12): relocating block group 62910365696 flags system|raid1
Apr 21 01:30:00 fire kernel: BTRFS info (device dm-17): relocating block group 2140869230592 flags metadata|raid1
Apr 21 01:30:00 fire kernel: BTRFS info (device dm-2): relocating block group 27040677888 flags metadata|dup
Apr 21 01:30:00 fire kernel: BTRFS info (device dm-12): relocating block group 61836623872 flags data|raid1
Apr 21 01:30:01 fire kernel: BTRFS info (device dm-12): found 5 extents
Apr 21 01:30:01 fire kernel: BTRFS: error (device dm-2) in merge_reloc_roots:2465: errno=-117 unknown
Apr 21 01:30:01 fire kernel: BTRFS info (device dm-2): forced readonly
Apr 21 01:30:03 fire kernel: BTRFS info (device dm-12): found 5 extents
Apr 21 01:30:04 fire kernel: BTRFS info (device dm-12): relocating block group 60762882048 flags metadata|raid1
Apr 21 01:30:08 fire kernel: BTRFS info (device dm-17): found 1521 extents
Apr 21 01:30:08 fire kernel: BTRFS info (device dm-17): relocating block group 2139761934336 flags metadata|raid1
Apr 21 01:30:22 fire kernel: BTRFS info (device dm-12): found 1709 extents
Apr 21 01:30:25 fire kernel: BTRFS info (device dm-17): found 4297 extents
Apr 21 01:30:25 fire kernel: BTRFS info (device dm-2): delayed_refs has NO entry
Apr 21 01:30:26 fire kernel: BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117 unknown
Apr 21 01:30:26 fire kernel: BTRFS info (device dm-17): forced readonly
Apr 21 01:30:26 fire kernel: BTRFS info (device dm-17): delayed_refs has NO entry
Apr 21 10:28:29 fire kernel: BTRFS error (device dm-17): cleaner transaction attach returned -30

[-- Attachment #3: laptop-dmesg --]
[-- Type: text/plain, Size: 7130 bytes --]

Apr 12 14:56:42 xps kernel: BTRFS info (device dm-3): enabling auto defrag
Apr 12 14:56:42 xps kernel: BTRFS info (device dm-3): using free space tree
Apr 12 14:56:42 xps kernel: BTRFS info (device dm-3): has skinny extents
Apr 14 02:00:00 xps kernel: BTRFS info (device dm-1): relocating block group 30496784384 flags data
Apr 14 02:00:00 xps kernel: BTRFS info (device dm-1): found 2 extents
Apr 14 02:00:00 xps kernel: BTRFS info (device dm-1): found 2 extents
Apr 14 02:00:00 xps kernel: BTRFS info (device dm-1): relocating block group 30463229952 flags system|dup
Apr 14 02:00:00 xps kernel: BTRFS info (device dm-1): found 1 extents
Apr 14 02:00:00 xps kernel: BTRFS info (device dm-1): relocating block group 29892804608 flags metadata|dup
Apr 14 02:00:00 xps kernel: BTRFS info (device dm-1): found 849 extents
Apr 14 02:05:00 xps kernel: BTRFS info (device dm-3): relocating block group 64453869568 flags metadata|dup
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): found 2145 extents
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): relocating block group 64420315136 flags system|dup
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): found 1 extents
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): relocating block group 61165535232 flags data
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): found 13 extents
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): found 13 extents
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): relocating block group 60091793408 flags data
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): found 5 extents
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): found 5 extents
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): relocating block group 59018051584 flags data
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): found 5 extents
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): found 5 extents
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): relocating block group 57944309760 flags data
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): found 4 extents
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): found 4 extents
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): relocating block group 56870567936 flags data
Apr 14 02:05:01 xps kernel: BTRFS info (device dm-3): found 3 extents
Apr 14 02:05:02 xps kernel: BTRFS info (device dm-3): found 3 extents
Apr 14 02:05:02 xps kernel: BTRFS info (device dm-3): relocating block group 55796826112 flags data
Apr 14 02:05:02 xps kernel: BTRFS info (device dm-3): found 4 extents
Apr 14 02:05:02 xps kernel: BTRFS info (device dm-3): found 4 extents
Apr 14 02:05:02 xps kernel: BTRFS info (device dm-3): relocating block group 54723084288 flags data
Apr 14 02:05:02 xps kernel: BTRFS info (device dm-3): found 7 extents
Apr 14 02:05:02 xps kernel: BTRFS info (device dm-3): found 7 extents
Apr 14 02:05:02 xps kernel: BTRFS info (device dm-3): relocating block group 53649342464 flags data
Apr 14 02:05:02 xps kernel: BTRFS info (device dm-3): found 8 extents
Apr 14 02:05:02 xps kernel: BTRFS info (device dm-3): found 8 extents
Apr 14 02:05:02 xps kernel: BTRFS info (device dm-3): relocating block group 52575600640 flags data
Apr 14 02:05:02 xps kernel: BTRFS info (device dm-3): found 2 extents
Apr 14 02:05:03 xps kernel: BTRFS info (device dm-3): found 2 extents
Apr 14 02:05:03 xps kernel: BTRFS info (device dm-3): relocating block group 51501858816 flags data
Apr 14 02:05:03 xps kernel: BTRFS info (device dm-3): found 4 extents
Apr 14 02:05:03 xps kernel: BTRFS info (device dm-3): found 4 extents
Apr 14 02:05:03 xps kernel: BTRFS info (device dm-3): relocating block group 50428116992 flags data
Apr 14 02:05:03 xps kernel: BTRFS info (device dm-3): found 3 extents
Apr 14 02:05:03 xps kernel: BTRFS info (device dm-3): found 3 extents
Apr 14 02:05:03 xps kernel: BTRFS info (device dm-3): relocating block group 49354375168 flags data
Apr 14 02:05:03 xps kernel: BTRFS info (device dm-3): found 7 extents
Apr 14 02:05:03 xps kernel: BTRFS info (device dm-3): found 7 extents
Apr 14 02:05:03 xps kernel: BTRFS info (device dm-3): relocating block group 48280633344 flags data
Apr 14 02:05:03 xps kernel: BTRFS info (device dm-3): found 7 extents
Apr 14 02:05:04 xps kernel: BTRFS info (device dm-3): found 7 extents
Apr 14 02:05:04 xps kernel: BTRFS info (device dm-3): relocating block group 47206891520 flags data
Apr 14 02:05:04 xps kernel: BTRFS info (device dm-3): found 7 extents
Apr 14 02:05:04 xps kernel: BTRFS info (device dm-3): found 7 extents
Apr 14 02:05:04 xps kernel: BTRFS info (device dm-3): relocating block group 46133149696 flags data
Apr 14 02:05:04 xps kernel: BTRFS info (device dm-3): found 8 extents
Apr 14 02:05:04 xps kernel: BTRFS info (device dm-3): found 8 extents
Apr 14 02:05:04 xps kernel: BTRFS info (device dm-3): relocating block group 45059407872 flags data
Apr 14 02:05:04 xps kernel: BTRFS info (device dm-3): found 7 extents
Apr 14 02:05:05 xps kernel: BTRFS info (device dm-3): found 7 extents
Apr 17 22:00:37 xps kernel: BTRFS: device label root devid 1 transid 186700 /dev/dm-1
Apr 17 22:00:37 xps kernel: BTRFS: device label home devid 1 transid 269843 /dev/dm-3
Apr 17 22:00:37 xps kernel: BTRFS info (device dm-1): enabling auto defrag
Apr 17 22:00:37 xps kernel: BTRFS info (device dm-1): using free space tree
Apr 17 22:00:37 xps kernel: BTRFS info (device dm-1): has skinny extents
Apr 17 22:00:37 xps kernel: BTRFS info (device dm-1): using free space tree
Apr 17 22:00:38 xps kernel: BTRFS info (device dm-3): enabling auto defrag
Apr 17 22:00:38 xps kernel: BTRFS info (device dm-3): using free space tree
Apr 17 22:00:38 xps kernel: BTRFS info (device dm-3): has skinny extents
Apr 21 02:00:00 xps kernel: BTRFS info (device dm-1): relocating block group 31100764160 flags data
Apr 21 02:00:00 xps kernel: BTRFS info (device dm-1): found 2 extents
Apr 21 02:00:00 xps kernel: BTRFS info (device dm-1): found 2 extents
Apr 21 02:00:00 xps kernel: BTRFS info (device dm-1): relocating block group 30832328704 flags metadata|dup
Apr 21 02:00:01 xps kernel: BTRFS info (device dm-1): found 1104 extents
Apr 21 02:00:01 xps kernel: BTRFS info (device dm-1): found 188 extents
Apr 21 02:00:01 xps kernel: BTRFS info (device dm-1): found 188 extents
Apr 21 02:00:01 xps kernel: BTRFS info (device dm-1): found 188 extents
Apr 21 02:00:01 xps kernel: BTRFS info (device dm-1): relocating block group 30798774272 flags system|dup
Apr 21 02:00:01 xps kernel: BTRFS info (device dm-1): found 1 extents
Apr 21 02:05:00 xps kernel: BTRFS info (device dm-3): relocating block group 68815945728 flags data
Apr 21 02:05:00 xps kernel: BTRFS info (device dm-3): found 18 extents
Apr 21 02:05:00 xps kernel: BTRFS info (device dm-3): found 18 extents
Apr 21 02:05:00 xps kernel: BTRFS info (device dm-3): relocating block group 67742203904 flags metadata|dup
Apr 21 02:05:01 xps kernel: BTRFS: error (device dm-3) in merge_reloc_roots:2465: errno=-117 unknown
Apr 21 02:05:01 xps kernel: BTRFS info (device dm-3): forced readonly
Apr 21 02:05:11 xps kernel: BTRFS error (device dm-3): pending csums is 13107200

[-- Attachment #4: pc-data-original-check --]
[-- Type: text/plain, Size: 568 bytes --]

# btrfs check --readonly /dev/disk/by-uuid/011912ff-8375-4fd0-8568-715cd28c2b8b
Checking filesystem on /dev/disk/by-uuid/011912ff-8375-4fd0-8568-715cd28c2b8b
UUID: 011912ff-8375-4fd0-8568-715cd28c2b8b
checking extents
checking free space tree
checking fs roots
checking csums
checking root refs
found 1023097102336 bytes used, no error found
total csum bytes: 995795540
total tree bytes: 1777254400
total fs tree bytes: 499777536
total extent tree bytes: 156712960
btree space waste bytes: 206534832
file data blocks allocated: 2580921212928
 referenced 1103042093056

[-- Attachment #5: pc-data-lowmem-check --]
[-- Type: text/plain, Size: 2561 bytes --]

# btrfs check --mode lowmem --readonly /dev/disk/by-uuid/011912ff-8375-4fd0-8568-715cd28c2b8b
Checking filesystem on /dev/disk/by-uuid/011912ff-8375-4fd0-8568-715cd28c2b8b
UUID: 011912ff-8375-4fd0-8568-715cd28c2b8b
checking extents
ERROR: extent[1211914780672, 16777216] referencer count mismatch (root: 2073, owner: 236200, offset: 33554432) wanted: 17, have: 18
ERROR: extent[1623008329728, 16777216] referencer count mismatch (root: 2073, owner: 241424, offset: 83886080) wanted: 12, have: 17
ERROR: extent[1636104806400, 16777216] referencer count mismatch (root: 2073, owner: 195667, offset: 117440512) wanted: 13, have: 14
ERROR: extent[1639128367104, 16777216] referencer count mismatch (root: 2073, owner: 197430, offset: 0) wanted: 13, have: 17
ERROR: extent[1689763446784, 16777216] referencer count mismatch (root: 2073, owner: 246255, offset: 67108864) wanted: 4, have: 17
ERROR: extent[1811447328768, 16777216] referencer count mismatch (root: 2073, owner: 253089, offset: 33554432) wanted: 15, have: 16
ERROR: extent[1853376790528, 16777216] referencer count mismatch (root: 2073, owner: 265031, offset: 117440512) wanted: 4, have: 5
ERROR: extent[1864968953856, 16777216] referencer count mismatch (root: 2073, owner: 271372, offset: 100663296) wanted: 12, have: 15
ERROR: extent[1865509302272, 16777216] referencer count mismatch (root: 2073, owner: 271480, offset: 0) wanted: 15, have: 19
ERROR: extent[1873773236224, 16777216] referencer count mismatch (root: 260, owner: 28919, offset: 33554432) wanted: 1, have: 2
ERROR: extent[1875367809024, 16777216] referencer count mismatch (root: 2073, owner: 275001, offset: 0) wanted: 12, have: 15
ERROR: extent[2060846788608, 134217728] referencer count mismatch (root: 263, owner: 265, offset: 50054324224) wanted: 31, have: 111
ERROR: extent[2080751099904, 134217728] referencer count mismatch (root: 263, owner: 265, offset: 32112349184) wanted: 2, have: 12
ERROR: extent[2094597668864, 186949632] referencer count mismatch (root: 259, owner: 33786, offset: 0) wanted: 27, have: 69
ERROR: errors found in extent allocation tree or chunk allocation
checking free space tree
checking fs roots
checking csums
checking root refs
found 1023097102336 bytes used, error(s) found
total csum bytes: 995795540
total tree bytes: 1976434688
total fs tree bytes: 698957824
total extent tree bytes: 156712960
btree space waste bytes: 219547602
file data blocks allocated: 2849277431808
 referenced 1426447798272

[-- Attachment #6: laptop-home-original-check --]
[-- Type: text/plain, Size: 496 bytes --]

WARNING: filesystem mounted, continuing because of --force
checking extents
checking free space tree
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/xps/home
UUID: 4b4b80dc-e2e3-4d76-96ae-02d42879771d
found 24599171072 bytes used, no error found
total csum bytes: 23610896
total tree bytes: 405340160
total fs tree bytes: 357548032
total extent tree bytes: 18857984
btree space waste bytes: 79738263
file data blocks allocated: 38386794496
 referenced 38672330752

[-- Attachment #7: laptop-home-lowmem-check --]
[-- Type: text/plain, Size: 563 bytes --]

WARNING: filesystem mounted, continuing because of --force
checking extents
checking free space tree
checking fs roots
ERROR: root 5 DIR INODE[791126] shouldn't have more than one link(0)
ERROR: errors found in fs roots
Checking filesystem on /dev/xps/home
UUID: 4b4b80dc-e2e3-4d76-96ae-02d42879771d
found 24599171072 bytes used, error(s) found
total csum bytes: 23610896
total tree bytes: 353976320
total fs tree bytes: 306184192
total extent tree bytes: 18857984
btree space waste bytes: 66493351
file data blocks allocated: 24513560576
 referenced 24935272448


* Re: 4.17-rc1 FS went read-only during balance
  2018-04-21 14:55 4.17-rc1 FS went read-only during balance Dmitrii Tcvetkov
@ 2018-04-22  8:12 ` Dmitrii Tcvetkov
  2018-04-23  1:23 ` Qu Wenruo
  1 sibling, 0 replies; 6+ messages in thread
From: Dmitrii Tcvetkov @ 2018-04-22  8:12 UTC (permalink / raw)
  To: linux-btrfs

> I saved /home filesystem from laptop in unmountable 
> by 4.17-rc1 state and can test patches and/or create 
> btrfs-image if it's needed.

Here is a link to the image (103 MB):
https://demfloro.ru/static/home-btrfs.image


* Re: 4.17-rc1 FS went read-only during balance
  2018-04-21 14:55 4.17-rc1 FS went read-only during balance Dmitrii Tcvetkov
  2018-04-22  8:12 ` Dmitrii Tcvetkov
@ 2018-04-23  1:23 ` Qu Wenruo
       [not found]   ` <20180423080745.5a9dc6be@demfloro.ru>
  1 sibling, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2018-04-23  1:23 UTC (permalink / raw)
  To: Dmitrii Tcvetkov, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2707 bytes --]



On 2018-04-21 22:55, Dmitrii Tcvetkov wrote:
> After reboot kernel refused rw mount rootfs with the same error as
> during cron balance, ro mount was accepted, error during rw mount:
> BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117

117 means EUCLEAN, which could be caused by the newly introduced
first_key and level check.

Please apply this hotfix to fix it.
btrfs: Only check first key for committed tree blocks
(Which is included in latest pull request)

Also, please consider enabling CONFIG_BTRFS_DEBUG to provide extra debug info.

Thanks,
Qu





* Re: 4.17-rc1 FS went read-only during balance
       [not found]   ` <20180423080745.5a9dc6be@demfloro.ru>
@ 2018-04-23  6:13     ` Qu Wenruo
       [not found]       ` <20180423105543.43f13e3a@job>
  0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2018-04-23  6:13 UTC (permalink / raw)
  To: Dmitrii Tcvetkov, linux-btrfs


[-- Attachment #1.1.1: Type: text/plain, Size: 2338 bytes --]



On 2018-04-23 13:08, Dmitrii Tcvetkov wrote:
> I tried 4.17-rc2 (as the pull request was pulled) with
> CONFIG_BTRFS_DEBUG on LVM snapshot of laptop home partition (/dev/vdb)
> in a VM (VM kernel sees only snapshot so no UUID collisions). Dmesg
> attached.

Thanks for the info and your previous btrfs-image.

The image itself shows nothing wrong, so it should be a runtime problem.
Would you please apply these two debug patches?
https://patchwork.kernel.org/patch/10335133/
https://patchwork.kernel.org/patch/10335135/

And also the attached diff file?

My guess is that the parent node is not initialized correctly in this case.

Thanks,
Qu

[-- Attachment #1.1.2: debug.diff --]
[-- Type: text/x-patch; name="debug.diff", Size: 925 bytes --]

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 60caa68c3618..79f482578e02 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -458,6 +458,7 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
 			  eb->start, first_key->objectid, first_key->type,
 			  first_key->offset, found_key.objectid,
 			  found_key.type, found_key.offset);
+		btrfs_print_tree(eb, false);
 	}
 #endif
 	return ret;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 00b7d3231821..cde0cb6c9786 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1870,6 +1870,8 @@ int replace_path(struct btrfs_trans_handle *trans,
 					     level - 1, &first_key);
 			if (IS_ERR(eb)) {
 				ret = PTR_ERR(eb);
+				btrfs_err(fs_info, "parent leaf, slot: %d:", slot);
+				btrfs_print_tree(parent, false);
 				break;
 			} else if (!extent_buffer_uptodate(eb)) {
 				ret = -EIO;



* Re: 4.17-rc1 FS went read-only during balance
       [not found]       ` <20180423105543.43f13e3a@job>
@ 2018-04-23  8:23         ` Qu Wenruo
  2018-04-23  8:40           ` Dmitrii Tcvetkov
  0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2018-04-23  8:23 UTC (permalink / raw)
  To: Dmitrii Tcvetkov, linux-btrfs


[-- Attachment #1.1.1: Type: text/plain, Size: 2697 bytes --]



On 2018-04-23 16:04, Dmitrii Tcvetkov wrote:
> Dmesg from kernel with all three patches applied attached.
> 
Thanks for the debug info, it really helps a lot!

It turns out that I'm just a super idiot: a typo in replace_path()
caused this, and it could not be triggered unless we enter it from
relocation recovery.

Please try the attached patch to see if it solves the problem.

Thanks,
Qu

[-- Attachment #1.1.2: 0001-btrfs-Fix-wrong-first_key-parameter-in-replace_path.patch --]
[-- Type: text/x-patch; name="0001-btrfs-Fix-wrong-first_key-parameter-in-replace_path.patch", Size: 1546 bytes --]

From 4b70eb864192ec5cf54a7e67e2957ddf0e5c0f6f Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu@suse.com>
Date: Mon, 23 Apr 2018 16:13:55 +0800
Subject: [PATCH] btrfs: Fix wrong first_key parameter in replace_path

Commit 581c1760415c ("btrfs: Validate child tree block's level and first
key") introduced a new @first_key parameter for read_tree_block(); however,
the caller in replace_path() is passing the wrong key to read_tree_block().

It should use @first_key rather than @key.

Normally this doesn't expose a problem, as @key is normally initialized to
the same value as the @first_key we expect.
However, in the relocation recovery case @key can be set to (0, 0, 0), and
since no valid key in the relocation tree can be (0, 0, 0), it will cause
read_tree_block() to return -EUCLEAN and interrupt relocation recovery.

Fix it by setting @first_key correctly.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/relocation.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 00b7d3231821..b041b945a7ae 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1841,7 +1841,7 @@ int replace_path(struct btrfs_trans_handle *trans,
 		old_bytenr = btrfs_node_blockptr(parent, slot);
 		blocksize = fs_info->nodesize;
 		old_ptr_gen = btrfs_node_ptr_generation(parent, slot);
-		btrfs_node_key_to_cpu(parent, &key, slot);
+		btrfs_node_key_to_cpu(parent, &first_key, slot);
 
 		if (level <= max_level) {
 			eb = path->nodes[level];
-- 
2.17.0




* Re: 4.17-rc1 FS went read-only during balance
  2018-04-23  8:23         ` Qu Wenruo
@ 2018-04-23  8:40           ` Dmitrii Tcvetkov
  0 siblings, 0 replies; 6+ messages in thread
From: Dmitrii Tcvetkov @ 2018-04-23  8:40 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 4125 bytes --]

> Thanks for the debug info, it really helps a lot!
> 
> It turns out that I'm just a super idiot, a typo in replace_path()
> caused this, and it could not be triggered unless we enter it from
> relocation recovery.
> 
> Please try the attached patch to see if it solves the problem.
> 
> Thanks,
> Qu
Glad to help. The patch solved the problem: rw mount is successful and
the balance finished with no errors or debug output, and btrfs check is
clean in both modes.

[    2.842718] BTRFS: device label home devid 1 transid 277952 /dev/vdb
[    2.924965] BTRFS: device label root devid 1 transid 84092 /dev/vda2
[    3.072271] BTRFS info (device vda2): use lzo compression, level 0
[    3.072897] BTRFS info (device vda2): enabling auto defrag
[    3.073476] BTRFS info (device vda2): using free space tree
[    3.074049] BTRFS info (device vda2): has skinny extents
[    5.411821] BTRFS info (device vda2): using free space tree
[   24.925293] BTRFS info (device vdb): using free space tree
[   24.925324] BTRFS info (device vdb): has skinny extents
[   31.711868] BTRFS info (device vdb): continuing balance
[   31.721658] BTRFS info (device vdb): checking UUID tree
[   31.822920] BTRFS info (device vdb): relocating block group 69889687552 flags data
[   33.730399] BTRFS info (device vdb): found 12 extents
[   36.950699] BTRFS info (device vdb): found 12 extents
[   37.030813] BTRFS info (device vdb): relocating block group 67742203904 flags metadata|dup
[   37.104174] BTRFS info (device vdb): relocating block group 67708649472 flags system|dup 
[   37.189843] BTRFS info (device vdb): found 1 extents


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


end of thread, other threads:[~2018-04-23  8:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-21 14:55 4.17-rc1 FS went read-only during balance Dmitrii Tcvetkov
2018-04-22  8:12 ` Dmitrii Tcvetkov
2018-04-23  1:23 ` Qu Wenruo
     [not found]   ` <20180423080745.5a9dc6be@demfloro.ru>
2018-04-23  6:13     ` Qu Wenruo
     [not found]       ` <20180423105543.43f13e3a@job>
2018-04-23  8:23         ` Qu Wenruo
2018-04-23  8:40           ` Dmitrii Tcvetkov
