* bad file extent, some csum missing - how to check that restored volumes are error-free?
@ 2021-07-14 17:53 Dave T
  2021-07-14 22:51 ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Dave T @ 2021-07-14 17:53 UTC (permalink / raw)
  To: Btrfs BTRFS

I was running btrfs send | receive to a target host via ssh and the
operation suddenly failed in the middle.

I ran this check:

btrfs check /dev/mapper/${xyz}

This shows lots of these errors:
  root 329 inode 262 errors 1040, bad file extent, some csum missing
  root 329 inode 7070 errors 1040, bad file extent, some csum missing
  root 329 inode 7242 errors 1040, bad file extent, some csum missing
  root 329 inode 7246 errors 1040, bad file extent, some csum missing
  root 329 inode 7252 errors 1040, bad file extent, some csum missing
  root 329 inode 7401 errors 1040, bad file extent, some csum missing
  root 329 inode 7753 errors 1040, bad file extent, some csum missing
  root 330 inode 588 errors 1040, bad file extent, some csum missing
  root 334 inode 258 errors 1040, bad file extent, some csum missing
  root 334 inode 636 errors 1040, bad file extent, some csum missing
  root 334 inode 3151 errors 1040, bad file extent, some csum missing
  ...
  root 334 inode 184871 errors 1040, bad file extent, some csum missing
  root 334 inode 184872 errors 1040, bad file extent, some csum missing
  root 334 inode 184874 errors 1040, bad file extent, some csum missing

I rebooted without any problems, then connected an external USB HDD.
Then I created new snapshots and used btrfs send | receive to send
them to the USB HDD.
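
For reference, that backup flow can be sketched roughly like this (subvolume and mount-point names are placeholders of mine, not taken from the thread; btrfs send requires read-only snapshots):

```shell
# Rough sketch of the snapshot + send/receive backup path described above.
# SRC and DST are illustrative paths, not the real ones.
SRC=/mnt/pool/home            # subvolume to back up
DST=/mnt/usb-hdd/backups      # btrfs filesystem on the USB HDD
SNAP="$SRC-snap-$(date +%F)"
btrfs subvolume snapshot -r "$SRC" "$SNAP"   # read-only snapshot
btrfs send "$SNAP" | btrfs receive "$DST"    # full (non-incremental) send
```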

Next I installed a new SSD and restored the snapshots. Then I ran
"btrfs check --check-data-csum /dev/mapper/abc" on the new device. It
shows:

Opening filesystem to check...
Checking filesystem on /dev/mapper/abc
UUID: fac54a70-8c27-4cbe-a8d0-325e761ba01d
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking csums against data
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 128390598656 bytes used, no error found
total csum bytes: 124046564
total tree bytes: 1335197696
total fs tree bytes: 1140211712
total extent tree bytes: 50757632
btree space waste bytes: 168388261
file data blocks allocated: 127058169856
 referenced 142833545216

What else can or should I do to be sure my restored snapshots are error-free?
What additional checks would you recommend on the new device?
The new device is a Samsung EVO 970 Plus.
The old device was a Samsung 950 Pro.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-14 17:53 bad file extent, some csum missing - how to check that restored volumes are error-free? Dave T
@ 2021-07-14 22:51 ` Qu Wenruo
       [not found]   ` <CAGdWbB44nH7dgdP3qO_bFYZwbkrW37OwFEVTE2Bn+rn4d7zWiQ@mail.gmail.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2021-07-14 22:51 UTC (permalink / raw)
  To: Dave T, Btrfs BTRFS



On 2021/7/15 1:53 AM, Dave T wrote:
> I was running btrfs send | receive to a target host via ssh and the
> operation suddenly failed in the middle.
>
> I ran this check:
>
> btrfs check /dev/mapper/${xyz}
>
> This shows lots of these errors:
>    root 329 inode 262 errors 1040, bad file extent, some csum missing

Normally this is a minor error, usually caused by older kernels.

The original mode has a very unhelpful report format.

You may want to run "btrfs check --mode=lowmem" to get a more
human-readable report.
From that we can get a full view of the problem and give better advice.

Thanks,
Qu

>    root 329 inode 7070 errors 1040, bad file extent, some csum missing
>    root 329 inode 7242 errors 1040, bad file extent, some csum missing
>    root 329 inode 7246 errors 1040, bad file extent, some csum missing
>    root 329 inode 7252 errors 1040, bad file extent, some csum missing
>    root 329 inode 7401 errors 1040, bad file extent, some csum missing
>    root 329 inode 7753 errors 1040, bad file extent, some csum missing
>    root 330 inode 588 errors 1040, bad file extent, some csum missing
>    root 334 inode 258 errors 1040, bad file extent, some csum missing
>    root 334 inode 636 errors 1040, bad file extent, some csum missing
>    root 334 inode 3151 errors 1040, bad file extent, some csum missing
>    ...
>    root 334 inode 184871 errors 1040, bad file extent, some csum missing
>    root 334 inode 184872 errors 1040, bad file extent, some csum missing
>    root 334 inode 184874 errors 1040, bad file extent, some csum missing
>
> I rebooted without any problems, then connected an external USB HDD.
> Then I created new snapshots and used btrfs send | receive to send
> them to the USB HDD.
>
> Next I installed a new SSD and restored the snapshots. Then I ran
> "btrfs check --check-data-csum /dev/mapper/abc" on the new device. It
> shows:
>
> Opening filesystem to check...
> Checking filesystem on /dev/mapper/abc
> UUID: fac54a70-8c27-4cbe-a8d0-325e761ba01d
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> [5/7] checking csums against data
> [6/7] checking root refs
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 128390598656 bytes used, no error found
> total csum bytes: 124046564
> total tree bytes: 1335197696
> total fs tree bytes: 1140211712
> total extent tree bytes: 50757632
> btree space waste bytes: 168388261
> file data blocks allocated: 127058169856
>   referenced 142833545216
>
> What else can or should I do to be sure my restored snapshots are error-free?
> What additional checks would you recommend on the new device?
> The new device is a Samsung EVO 970 Plus.
> The old device was a Samsung 950 Pro.
>


* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
       [not found]       ` <CAGdWbB7Q98tSbPgHUBF+yjqYRBPZ-a42hd=xLwMZUMO46gfd0A@mail.gmail.com>
@ 2021-07-15 22:19         ` Dave T
  2021-07-15 22:30           ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Dave T @ 2021-07-15 22:19 UTC (permalink / raw)
  To: Qu Wenruo, Btrfs BTRFS

> > >> You may want to run "btrfs check --mode=lowmem" to get a more human
> > >> readable report.
> > >>   From that we can get a full view of the problem and give better advice.
> > >
> > > Thank you. I will try to do that after I finish fully setting up the new SSD.
> > >
> > Looking forward to the output.

kernel version 5.12.15-arch1-1 (linux@archlinux)

# btrfs scrub start -B /
scrub done for ff2b04eb-088c-4fb0-9ad4-84780d23f821
Scrub started:    Thu Jul 15 11:44:47 2021
Status:           finished
Duration:         0:15:53
Total to scrub:   310.04GiB
Rate:             327.54MiB/s
Error summary:    no errors found

# btrfs check --mode=lowmem /dev/mapper/xyz
Opening filesystem to check...
Checking filesystem on /dev/mapper/extluks
UUID: ff2b04eb-088c-4fb0-9ad4-84780d23f821
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
ERROR: root 329 EXTENT_DATA[262 536576] compressed extent must have
csum, but only 0 bytes have, expect 65536
ERROR: root 329 EXTENT_DATA[262 536576] is compressed, but inode flag
doesn't allow it
ERROR: root 329 EXTENT_DATA[7070 0] compressed extent must have csum,
but only 0 bytes have, expect 4096
ERROR: root 329 EXTENT_DATA[7070 0] is compressed, but inode flag
doesn't allow it
ERROR: root 329 EXTENT_DATA[7242 0] compressed extent must have csum,
but only 0 bytes have, expect 28672
ERROR: root 329 EXTENT_DATA[7242 0] is compressed, but inode flag
doesn't allow it
ERROR: root 329 EXTENT_DATA[7246 0] compressed extent must have csum,
but only 0 bytes have, expect 16384
ERROR: root 329 EXTENT_DATA[7246 0] is compressed, but inode flag
doesn't allow it
ERROR: root 329 EXTENT_DATA[7252 0] compressed extent must have csum,
but only 0 bytes have, expect 32768
ERROR: root 329 EXTENT_DATA[7252 0] is compressed, but inode flag
doesn't allow it
ERROR: root 329 EXTENT_DATA[7401 0] compressed extent must have csum,
but only 0 bytes have, expect 12288
ERROR: root 329 EXTENT_DATA[7401 0] is compressed, but inode flag
doesn't allow it

and hundreds more errors of this same type... (I guess you don't want
to see every error line.)

ERROR: root 334 EXTENT_DATA[184874 0] compressed extent must have
csum, but only 0 bytes have, expect 16384
ERROR: root 334 EXTENT_DATA[184874 0] is compressed, but inode flag
doesn't allow it
ERROR: errors found in fs roots
found 327307210752 bytes used, error(s) found
total csum bytes: 282325056
total tree bytes: 5130452992
total fs tree bytes: 4535648256
total extent tree bytes: 249790464
btree space waste bytes: 848096029
file data blocks allocated: 588119937024
 referenced 568343642112
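
As an aside, hundreds of repeated error lines like these can be condensed into a per-root summary; a minimal sketch (the sample lines mimic the real output, and /tmp/check.log is an illustrative path):

```shell
# Count distinct affected inodes per root in saved
# "btrfs check --mode=lowmem" output.
cat > /tmp/check.log <<'EOF'
ERROR: root 329 EXTENT_DATA[262 536576] compressed extent must have csum, but only 0 bytes have, expect 65536
ERROR: root 329 EXTENT_DATA[262 536576] is compressed, but inode flag doesn't allow it
ERROR: root 329 EXTENT_DATA[7070 0] compressed extent must have csum, but only 0 bytes have, expect 4096
ERROR: root 334 EXTENT_DATA[184874 0] is compressed, but inode flag doesn't allow it
EOF
# extract "root <id> EXTENT_DATA[<inode>", dedupe inodes, count per root
grep -o 'root [0-9]* EXTENT_DATA\[[0-9]*' /tmp/check.log \
  | sort -u | awk '{print $2}' | sort | uniq -c
# prints one count line per root (here: 2 inodes in 329, 1 in 334)
```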

I'm interested in your thoughts about what might have caused this, and
how I should repair / fix it. Are any of these options appropriate?

-  btrfs rescue chunk-recover /dev/mapper/xyz

-  btrfs check --repair --init-extent-tree /dev/mapper/xyz

- btrfs check --repair --init-csum-tree /dev/mapper/xyz

Thank you.


* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-15 22:19         ` Dave T
@ 2021-07-15 22:30           ` Qu Wenruo
  2021-07-15 22:49             ` Dave T
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2021-07-15 22:30 UTC (permalink / raw)
  To: Dave T, Qu Wenruo, Btrfs BTRFS



On 2021/7/16 6:19 AM, Dave T wrote:
>>>>> You may want to run "btrfs check --mode=lowmem" to get a more human
>>>>> readable report.
>>>>>    From that we can get a full view of the problem and give better advice.
>>>>
>>>> Thank you. I will try to do that after I finish fully setting up the new SSD.
>>>>
>>> Looking forward to the output.
> 
> kernel version 5.12.15-arch1-1 (linux@archlinux)
> 
> # btrfs scrub start -B /
> scrub done for ff2b04eb-088c-4fb0-9ad4-84780d23f821
> Scrub started:    Thu Jul 15 11:44:47 2021
> Status:           finished
> Duration:         0:15:53
> Total to scrub:   310.04GiB
> Rate:             327.54MiB/s
> Error summary:    no errors found
> 
> # btrfs check --mode=lowmem /dev/mapper/xyz
> Opening filesystem to check...
> Checking filesystem on /dev/mapper/extluks
> UUID: ff2b04eb-088c-4fb0-9ad4-84780d23f821
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> ERROR: root 329 EXTENT_DATA[262 536576] compressed extent must have
> csum, but only 0 bytes have, expect 65536
> ERROR: root 329 EXTENT_DATA[262 536576] is compressed, but inode flag
> doesn't allow it
> ERROR: root 329 EXTENT_DATA[7070 0] compressed extent must have csum,
> but only 0 bytes have, expect 4096
> ERROR: root 329 EXTENT_DATA[7070 0] is compressed, but inode flag
> doesn't allow it
> ERROR: root 329 EXTENT_DATA[7242 0] compressed extent must have csum,
> but only 0 bytes have, expect 28672
> ERROR: root 329 EXTENT_DATA[7242 0] is compressed, but inode flag
> doesn't allow it
> ERROR: root 329 EXTENT_DATA[7246 0] compressed extent must have csum,
> but only 0 bytes have, expect 16384
> ERROR: root 329 EXTENT_DATA[7246 0] is compressed, but inode flag
> doesn't allow it
> ERROR: root 329 EXTENT_DATA[7252 0] compressed extent must have csum,
> but only 0 bytes have, expect 32768
> ERROR: root 329 EXTENT_DATA[7252 0] is compressed, but inode flag
> doesn't allow it
> ERROR: root 329 EXTENT_DATA[7401 0] compressed extent must have csum,
> but only 0 bytes have, expect 12288
> ERROR: root 329 EXTENT_DATA[7401 0] is compressed, but inode flag
> doesn't allow it

OK, lowmem mode indeed did a much better job.

This is a very strange bug.

This means:

- The compressed extent doesn't have csum
   Which shouldn't be possible for recent kernels.

- The compressed extent exists for inode which has NODATASUM flag
   Not possible again for recent kernels.

But IIRC there are old kernels allowing such compression + nodatasum.

I guess that's the reason why you got EIO when reading it.

When we fail to find a csum, we just put 0x00 as the csum; then when
you read the data, it's definitely going to cause a csum mismatch and
nothing gets read out.

This can be worked around by the recent "rescue=idatacsums" mount
option.

But to me, this really looks like an old fs, with some inodes created
by older kernels.
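
As a sketch of that workaround (device and mount point are placeholders; the rescue= mount option group only exists in recent kernels, roughly v5.11+):

```shell
# Mount read-only with data csum verification skipped, copy the
# affected files somewhere safe, then unmount. Paths are illustrative.
mount -o ro,rescue=idatacsums /dev/mapper/xyz /mnt
cp -a /mnt/subvol/affected-dir /safe/location/
umount /mnt
```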

> 
> and hundreds more errors of this same type... (I guess you don't want
> to see every error line.)
> 
> ERROR: root 334 EXTENT_DATA[184874 0] compressed extent must have
> csum, but only 0 bytes have, expect 16384
> ERROR: root 334 EXTENT_DATA[184874 0] is compressed, but inode flag
> doesn't allow it
> ERROR: errors found in fs roots
> found 327307210752 bytes used, error(s) found
> total csum bytes: 282325056
> total tree bytes: 5130452992
> total fs tree bytes: 4535648256
> total extent tree bytes: 249790464
> btree space waste bytes: 848096029
> file data blocks allocated: 588119937024
>   referenced 568343642112
> 
> I'm interested in your thoughts about what might have caused this, and
> how I should repair / fix it. Are any of these options appropriate?
> 
> -  btrfs rescue chunk-recover /dev/mapper/xyz

Definitely not.

Any rescue command should only be used when a developer suggests it.

> 
> -  btrfs check --repair --init-extent-tree /dev/mapper/zyz

No again, this is even more dangerous.

> 
> - btrfs check --repair --init-csum-tree /dev/mapper/xyz

This may solve the read error, but we will still report the NODATASUM 
problem for the compressed extent.

Have you tried to remove the NODATASUM option for those involved inodes?

If it's possible to remove NODATASUM for those inodes, then 
--init-csum-tree should be able to solve the problem.
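
One plausible per-file approach (my own sketch, not a procedure from this thread: on btrfs the nodatacow state that implies NODATASUM shows up as the 'C' attribute in lsattr, and rewriting the data into a fresh inode under a current kernel should produce csummed extents):

```shell
# Inspect one affected file and rewrite it as a new inode.
# The path is a placeholder.
f=/mnt/subvol/affected-file
lsattr "$f"                        # 'C' = nodatacow (implies nodatasum)
cp --reflink=never "$f" "$f.new"   # new inode, data rewritten with csums
touch -r "$f" "$f.new"             # carry over the mtime
mv "$f.new" "$f"
```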

Thanks,
Qu

> 
> Thank you.
> 



* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-15 22:30           ` Qu Wenruo
@ 2021-07-15 22:49             ` Dave T
  2021-07-16  1:05               ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Dave T @ 2021-07-15 22:49 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, Btrfs BTRFS

> OK, lowmem mode indeed did a much better job.
>
> This is a very strange bug.
>
> This means:
>
> - The compressed extent doesn't have csum
>    Which shouldn't be possible for recent kernels.
>
> - The compressed extent exists for inode which has NODATASUM flag
>    Not possible again for recent kernels.
>
> But IIRC there are old kernels allowing such compression + nodatasum.
>
> I guess that's the reason why you got EIO when reading it.
>
> When we failed to find csum, we just put 0x00 as csum, and then when you
> read the data, it's definitely going to cause csum mismatch and nothing
> get read out.
>
> This can be worked around by recent "rescue=idatacsums" mount option.
>
> But to me, this really looks like some old fs, with some inodes created
> by older kernels.

I'm running:
kernel version 5.12.15-arch1-1 (linux@archlinux)

I've been running arch + btrfs since 2014. I keep arch linux fully
updated. I'm running new kernels and new btrfs progs. However, I
created this filesystem around 2014.

Is there an option to "update" my BTRFS filesystem? Is that even a thing?

I have multiple devices running on BTRFS filesystems created around
2014 to 2016. Are those all in danger of having some problems now?
BTRFS has been mostly problem-free for me since before 2014. I do
regular balance and scrubs. However, I'm getting worried about my data
now...

I hope I do not need to back up every device, recreate the filesystems,
and restore them. That would be weeks of work and I'm already
overworked... but losing data would be worse.

BTW, even my backup disks run on BTRFS filesystems that were created years ago.

> > Are any of these options appropriate?
> >
> > -  btrfs rescue chunk-recover /dev/mapper/xyz
>
> Definite no.
>
> Any rescue command should only be used when some developer suggested.

Thank you for reminding me! There's a lot of bad BTRFS advice on all
the various forums, and it is easy to be influenced by it when you are
a casual user like me.


> > - btrfs check --repair --init-csum-tree /dev/mapper/xyz
>
> This may solve the read error, but we will still report the NODATACSUM
> problem for the compressed extent.
>
> Have you tried to remove the NODATASUM option for those involved inodes?

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)
says:
Note: If compression is enabled, nodatacow and nodatasum are disabled.

My mount options are:
rw,autodefrag,noatime,nodiratime,compress=lzo,space_cache,subvol=xyz

Do I understand it correctly? My compression option should already
"remove the NODATASUM".

>
> If it's possible to remove NODATASUM for those inodes, then
> --init-csum-tree should be able to solve the problem.

What do you recommend now?


* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-15 22:49             ` Dave T
@ 2021-07-16  1:05               ` Qu Wenruo
  2021-07-16  2:32                 ` Qu Wenruo
  2021-07-16 13:15                 ` Dave T
  0 siblings, 2 replies; 17+ messages in thread
From: Qu Wenruo @ 2021-07-16  1:05 UTC (permalink / raw)
  To: Dave T, Qu Wenruo; +Cc: Btrfs BTRFS



On 2021/7/16 6:49 AM, Dave T wrote:
>> OK, lowmem mode indeed did a much better job.
>>
>> This is a very strange bug.
>>
>> This means:
>>
>> - The compressed extent doesn't have csum
>>     Which shouldn't be possible for recent kernels.
>>
>> - The compressed extent exists for inode which has NODATASUM flag
>>     Not possible again for recent kernels.
>>
>> But IIRC there are old kernels allowing such compression + nodatasum.
>>
>> I guess that's the reason why you got EIO when reading it.
>>
>> When we failed to find csum, we just put 0x00 as csum, and then when you
>> read the data, it's definitely going to cause csum mismatch and nothing
>> get read out.
>>
>> This can be worked around by recent "rescue=idatacsums" mount option.
>>
>> But to me, this really looks like some old fs, with some inodes created
>> by older kernels.
>
> I'm running:
> kernel version 5.12.15-arch1-1 (linux@archlinux)
>
> I've been running arch + btrfs since 2014. I keep arch linux fully
> updated. I'm running new kernels and new btrfs progs. However, I
> created this filesystem around 2014.

The change that disallows compression if the inode has the NODATASUM
flag was introduced in commit 42c16da6d684 ("btrfs: inode: Don't
compress if NODATASUM or NODATACOW set"), which is from v5.2 in 2019.

Thus such old fs indeed can be affected.

>
> Is there an option to "update" my BTRFS filesystem? Is that even a thing?

I don't think so, but please allow me to do more testing; then I may
craft a fix in btrfs-progs to let btrfs check repair such problems.

If possible I would enhance the kernel to handle such existing file
extents better, so that all you really need to do is run "pacman -Syu"
as usual, nothing more.

Thanks,
Qu

>
> I have multiple devices running on BTRFS filesystems created around
> 2014 to 2016. Are those all in danger of having some problems now?
> BTRFS has been mostly problem-free for me since before 2014. I do
> regular balance and scrubs. However, I'm getting worried about my data
> now...
>
> I hope I do not need to backup every device, recreate the filesystems,
> and restore them. That would be weeks of work and I'm already
> overworked... but losing data would be worse.
>
> BTW, even my backup disks run on BTRFS filesystems that were created years ago.
>
>>> Are any of these options appropriate?
>>>
>>> -  btrfs rescue chunk-recover /dev/mapper/xyz
>>
>> Definite no.
>>
>> Any rescue command should only be used when some developer suggested.
>
> Thank you for reminding me! There's a lot of bad BTRFS advice on all
> the various forums, and it is easy to be influenced by it when you are
> a casual user like me.
>
>
>>> - btrfs check --repair --init-csum-tree /dev/mapper/xyz
>>
>> This may solve the read error, but we will still report the NODATACSUM
>> problem for the compressed extent.
>>
>> Have you tried to remove the NODATASUM option for those involved inodes?
>
> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)
> says:
> Note: If compression is enabled, nodatacow and nodatasum are disabled.
>
> My mount options are:
> rw,autodefrag,noatime,nodiratime,compress=lzo,space_cache,subvol=xyz
>
> Do I understand it correctly? My compression option should already
> "remove the NODATASUM".
>
>>
>> If it's possible to remove NODATASUM for those inodes, then
>> --init-csum-tree should be able to solve the problem.
>
> What do you recommend now?
>


* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-16  1:05               ` Qu Wenruo
@ 2021-07-16  2:32                 ` Qu Wenruo
  2021-07-16 13:15                 ` Dave T
  1 sibling, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2021-07-16  2:32 UTC (permalink / raw)
  To: Dave T, Qu Wenruo; +Cc: Btrfs BTRFS



On 2021/7/16 9:05 AM, Qu Wenruo wrote:
>
>
> On 2021/7/16 6:49 AM, Dave T wrote:
>>> OK, lowmem mode indeed did a much better job.
>>>
>>> This is a very strange bug.
>>>
>>> This means:
>>>
>>> - The compressed extent doesn't have csum
>>>     Which shouldn't be possible for recent kernels.
>>>
>>> - The compressed extent exists for inode which has NODATASUM flag
>>>     Not possible again for recent kernels.
>>>
>>> But IIRC there are old kernels allowing such compression + nodatasum.
>>>
>>> I guess that's the reason why you got EIO when reading it.
>>>
>>> When we failed to find csum, we just put 0x00 as csum, and then when you
>>> read the data, it's definitely going to cause csum mismatch and nothing
>>> get read out.
>>>
>>> This can be worked around by recent "rescue=idatacsums" mount option.
>>>
>>> But to me, this really looks like some old fs, with some inodes created
>>> by older kernels.
>>
>> I'm running:
>> kernel version 5.12.15-arch1-1 (linux@archlinux)
>>
>> I've been running arch + btrfs since 2014. I keep arch linux fully
>> updated. I'm running new kernels and new btrfs progs. However, I
>> created this filesystem around 2014.
>
> The change that disallows compression if the inode has the NODATASUM
> flag was introduced in commit 42c16da6d684 ("btrfs: inode: Don't
> compress if NODATASUM or NODATACOW set"), which is from v5.2 in 2019.
>
> Thus such old fs indeed can be affected.
>
>>
>> Is there an option to "update" my BTRFS filesystem? Is that even a thing?
>
> I don't think so, but please allow me to do more testing and then I may
> craft a fix in btrfs-progs to allow btrfs-check to repair such problems.

Something is wrong here.

I created a btrfs image with exactly the same layout as yours:
compressed extents, inodes with the NODATASUM flag, and no csum for
those extents.

btrfs check reports the same error as yours:

[4/7] checking fs roots
ERROR: root 5 EXTENT_DATA[257 0] compressed extent must have csum, but
only 0 bytes have, expect 4096
ERROR: root 5 EXTENT_DATA[257 0] is compressed, but inode flag doesn't
allow it
...
ERROR: root 5 EXTENT_DATA[257 917504] is compressed, but inode flag
doesn't allow it
ERROR: errors found in fs roots
found 163840 bytes used, error(s) found

But the current kernel (v5.13-rc7) has no problem reading such extents:
check_compressed_extent() skips the csum verification if the inode has
the NODATASUM flag, and that check has been there for a very long time.

So I'm afraid something different is involved in your read error
problem.

When the read error happens, is there really no extra kernel error message?

Thanks,
Qu
>
> If possible I would enhance kernel to handle such existing file extents
> better so that what you really need is just run "pacman -Syu" as usual,
> nothing to bother.
>
> Thanks,
> Qu
>
>>
>> I have multiple devices running on BTRFS filesystems created around
>> 2014 to 2016. Are those all in danger of having some problems now?
>> BTRFS has been mostly problem-free for me since before 2014. I do
>> regular balance and scrubs. However, I'm getting worried about my data
>> now...
>>
>> I hope I do not need to backup every device, recreate the filesystems,
>> and restore them. That would be weeks of work and I'm already
>> overworked... but losing data would be worse.
>>
>> BTW, even my backup disks run on BTRFS filesystems that were created
>> years ago.
>>
>>>> Are any of these options appropriate?
>>>>
>>>> -  btrfs rescue chunk-recover /dev/mapper/xyz
>>>
>>> Definite no.
>>>
>>> Any rescue command should only be used when some developer suggested.
>>
>> Thank you for reminding me! There's a lot of bad BTRFS advice on all
>> the various forums, and it is easy to be influenced by it when you are
>> a casual user like me.
>>
>>
>>>> - btrfs check --repair --init-csum-tree /dev/mapper/xyz
>>>
>>> This may solve the read error, but we will still report the NODATACSUM
>>> problem for the compressed extent.
>>>
>>> Have you tried to remove the NODATASUM option for those involved inodes?
>>
>> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)
>> says:
>> Note: If compression is enabled, nodatacow and nodatasum are disabled.
>>
>> My mount options are:
>> rw,autodefrag,noatime,nodiratime,compress=lzo,space_cache,subvol=xyz
>>
>> Do I understand it correctly? My compression option should already
>> "remove the NODATASUM".
>>
>>>
>>> If it's possible to remove NODATASUM for those inodes, then
>>> --init-csum-tree should be able to solve the problem.
>>
>> What do you recommend now?
>>


* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-16  1:05               ` Qu Wenruo
  2021-07-16  2:32                 ` Qu Wenruo
@ 2021-07-16 13:15                 ` Dave T
  2021-07-16 13:28                   ` Qu Wenruo
  1 sibling, 1 reply; 17+ messages in thread
From: Dave T @ 2021-07-16 13:15 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, Btrfs BTRFS

On Thu, Jul 15, 2021 at 9:05 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
> > I've been running arch + btrfs since 2014. I keep arch linux fully
> > updated. I'm running new kernels and new btrfs progs. However, I
> > created this filesystem around 2014.
>
> The change that disallows compression if the inode has the NODATASUM
> flag was introduced in commit 42c16da6d684 ("btrfs: inode: Don't
> compress if NODATASUM or NODATACOW set"), which is from v5.2 in 2019.
>
> Thus such old fs indeed can be affected.
>
> >
> > Is there an option to "update" my BTRFS filesystem? Is that even a thing?
>
> I don't think so, but please allow me to do more testing and then I may
> craft a fix in btrfs-progs to allow btrfs-check to repair such problems.

I hope that there is soon a way to run a btrfs-progs command to update
an old filesystem to the current standards.
Where can I send you a small donation to express my support for
something like this?

>
> If possible I would enhance kernel to handle such existing file extents
> better so that what you really need is just run "pacman -Syu" as usual,
> nothing to bother.

This would indeed be a fantastic solution!

> So I'm afraid there is something different involved for your read error problem.

I am less worried about this specific problem than about the general
problem of having an old filesystem on a fully updated rolling release
linux system. I was able to restore all my data to a new SSD and I am
just testing this old SSD to give you feedback.

However, I do have some general questions. As it stands currently,
what exactly is an "old filesystem"? If I run "mkfs.btrfs" with linux
kernel v5.1, is all data in that filesystem somehow affected even
after I install newer kernels? Or are files created when running the
newer kernel not affected? What about files copied?

If I do a btrfs send|receive from a fs originally created in 2014 but
now I am running the latest arch linux kernel, what is the result? Do
my transferred files still have hallmarks of the 2014 filesystem they
originally lived on?

Are there some checks I should do now on my other devices with btrfs
filesystems originally created around 2014? (I have a lot of such
devices because in 2014 I decided to run arch linux and btrfs
everywhere.)

> When the read error happens, is there really no extra kernel error message?

I can do more testing and let you know. Can you suggest any tests you
would like me to try? I could run "journalctl -f" in one window and do
some file operations in another program, for example. But I am not a
developer, so there may be limits on what I can do.

Thank you.
Dave


* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-16 13:15                 ` Dave T
@ 2021-07-16 13:28                   ` Qu Wenruo
  2021-07-16 15:40                     ` Dave T
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2021-07-16 13:28 UTC (permalink / raw)
  To: Dave T; +Cc: Qu Wenruo, Btrfs BTRFS



On 2021/7/16 9:15 PM, Dave T wrote:
> On Thu, Jul 15, 2021 at 9:05 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>> I've been running arch + btrfs since 2014. I keep arch linux fully
>>> updated. I'm running new kernels and new btrfs progs. However, I
>>> created this filesystem around 2014.
>>
>> The change that disallows compression if the inode has the NODATASUM
>> flag was introduced in commit 42c16da6d684 ("btrfs: inode: Don't
>> compress if NODATASUM or NODATACOW set"), which is from v5.2 in 2019.
>>
>> Thus such old fs indeed can be affected.
>>
>>>
>>> Is there an option to "update" my BTRFS filesystem? Is that even a thing?
>>
>> I don't think so, but please allow me to do more testing and then I may
>> craft a fix in btrfs-progs to allow btrfs-check to repair such problems.
>
> I hope that there is soon a way to run a btrfs-progs command to update
> an old filesystem to the current standards.
> Where can I send you a small donation to express my support for
> something like this?

No need, we're an open community and I have a day job (working on btrfs).

>
>>
>> If possible I would enhance kernel to handle such existing file extents
>> better so that what you really need is just run "pacman -Syu" as usual,
>> nothing to bother.
>
> This would indeed be a fantastic solution!
>
>> So I'm afraid there is something different involved for your read error problem.
>
> I am less worried about this specific problem than about the general
> problem of having an old filesystem on a fully updated rolling release
> linux system. I was able to restore all my data to a new SSD and I am
> just testing this old SSD to give you feedback.
>
> However, I do have some general questions. As it stands currently,
> what exactly is an "old filesystem"? If I run "mkfs.btrfs" with linux
> kernel v5.1, is all data in that filesystem somehow affected even
> after I install newer kernels?

It's not about the mkfs, but about data written using older kernels.

You could even mkfs with v5.12, then mount with v5.1 and write some
data, and you may end up with the file extents described above.

> Or are files created when running the
> newer kernel not affected? What about files copied?

If using newer kernel, btrfs won't create any compressed extents if the
inode has NODATASUM flag.

So your "files created when running the newer kernel not affected" part
is correct.

Copied files count the same.

>
> If I do a btrfs send|receive from a fs originally created in 2014 but
> now I am running the latest arch linux kernel, what is the result?

Btrfs receive is just doing the file writes in user space.

Then the newer kernel will follow its new behavior, so received files
are all fine.

> Do
> my transferred files still have hallmarks of the 2014 filesystem they
> originally lived on?

Nope.

>
> Are there some checks I should do now on my other devices with btrfs
> filesystems originally created around 2014? (I have a lot of such
> devices because in 2014 I decided to run arch linux and btrfs
> everywhere.)

No need, at least not for the problem of compressed file extents on
NODATASUM inodes.

The kernel is able to read them without problems.

That's also why I can't reproduce your original problem.

>
>> When the read error happens, is there really no extra kernel error message?
>
> I can do more testing and let you know. Can you suggest any tests you
> would like me to try?

1. Try to read the affected file:

- Mount the btrfs

- Read inode 262 in root 329 (just one example)
   You can use "find -inum 262" inside root 329 to locate the file.

   If you have a file you know you can't read, that would be the best
   case, just try to read that.

- Unmount the btrfs

- Attach the full dmesg.
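Step 1 can be sketched as a small script. This is only an illustration: a
throwaway directory stands in for the mounted subvolume so it runs anywhere;
on the real fs, "$base" would be the mount path of root 329 and "$inum" the
reported inode 262.

```shell
# Sketch of step 1: locate a file by inode number, then read it.
# A temp directory stands in for the mounted subvolume (root 329).
base=$(mktemp -d)
echo data > "$base/example"
inum=$(stat -c %i "$base/example")   # stands in for the reported inode 262

# Same lookup as "find -inum 262" inside root 329
found=$(find "$base" -inum "$inum")

# A plain read is enough: a csum failure would surface here as EIO
cat "$found" > /dev/null && result="read OK" || result="read error"
echo "$result"
```

A plain read already goes through checksum verification, so no copy is needed.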


2. Reproduce the Read-only fs problem.

- Find a way to reproduce the read-only fs problem
   If you don't have a reliable way to reproduce it, just ignore this
   part.

- Attach the full dmesg

Thanks,
Qu

> I could run "journalctl -f" in one window and do
> some file operations in another program, for example. But I am not a
> developer, so there may be limits on what I can do.
>
> Thank you.
> Dave
>


* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-16 13:28                   ` Qu Wenruo
@ 2021-07-16 15:40                     ` Dave T
  2021-07-16 23:06                       ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Dave T @ 2021-07-16 15:40 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, Btrfs BTRFS

On Fri, Jul 16, 2021 at 9:28 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> > I can do more testing and let you know. Can you suggest any tests you
> > would like me to try?
>
> 1. Try to read the affected file:
>
> - Mount the btrfs
>
> - Read inode 262 in root 329 (just one example)
>    You can use "find -inum 262" inside root 329 to locate the file.
>

I have reconnected and mounted the affected SSD.
Most of the csum errors reported are for root 334 like this:

root 334 inode 184115 errors 1040, bad file extent, some csum missing
root 334 inode 184116 errors 1040, bad file extent, some csum missing
There are hundreds of similar error lines.

There were only a few for root 329 and one for 330.

What is the method to map root 334, for example, to a file system
path? Is it like this?

# btrfs su li /
ID 257 gen 1106448 top level 5 path @root
...
ID 329 gen 1105905 top level 326 path @home/live/snapshot/user1/.cache
ID 330 gen 1105905 top level 326 path @home/live/snapshot/user2/.cache
ID 331 gen 1105905 top level 326 path @home/live/snapshot/user3/.cache
ID 332 gen 1105905 top level 326 path @home/live/snapshot/user4.cache
ID 333 gen 1105905 top level 326 path @home/live/snapshot/user5/.cache
ID 334 gen 1105905 top level 326 path @home/live/snapshot/user6/.cache

# cd /home/user6/.cache
# find . -inum 184116
./mozilla/firefox/profile1/cache2/entries/3E5DF2A295E7D36F537DFDC221EBD6153F46DC30

Did I do that correctly?

# less ./mozilla/firefox/profile1/cache2/entries/3E5DF2A295E7D36F537DFDC221EBD6153F46DC30
"./mozilla/firefox/profile1/cache2/entries/3E5DF2A295E7D36F537DFDC221EBD6153F46DC30"
may be a binary file.  See it anyway?

I viewed it and there are no errors in terminal or systemd journal
when reading it.

Next I tested every reported inode in root 334 (assuming I identified
the root correctly) using this method:

find /home/user6/.cache/ -inum 184874 -exec bash -c 'cp "{}" /tmp; out=$(basename "{}"); rm /tmp/$out' \;

I got a list of every inode number (e.g., 184874) from the output of
my prior checks and looped through them all. No errors were reported.
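The whole procedure can be condensed into a couple of shell helpers. This is
just a sketch: extract_inodes and read_inode are made-up names, and the awk
field positions assume the error-line format shown above.

```shell
# Extract "root inode" pairs from saved btrfs-check output lines like:
#   root 334 inode 184115 errors 1040, bad file extent, some csum missing
extract_inodes() {
    awk '/bad file extent/ { print $2, $4 }' "$1"
}

# Read one flagged file; a plain read already exercises the checksum
# verification path, so the cp/rm round-trip is unnecessary.
read_inode() {
    base=$1
    inum=$2
    find "$base" -inum "$inum" -exec cat {} + > /dev/null \
        && echo "inode $inum: read OK" \
        || echo "inode $inum: read error"
}

# On the real fs this would be driven as:
#   extract_inodes check.log | while read -r root inum; do
#       read_inode /home/user6/.cache "$inum"   # root 334 -> user6/.cache
#   done
```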

I do not see any related errors in dmesg either.

# dmesg | grep -i btrfs
[  +0.032192] Btrfs loaded, crc32c=crc32c-intel, zoned=yes
[  +0.000546] BTRFS: device label top_level devid 1 transid 1106559
/dev/dm-0 scanned by systemd-udevd (120)
[  +0.029879] BTRFS info (device dm-0): disk space caching is enabled
[  +0.000003] BTRFS info (device dm-0): has skinny extents
[  +0.096620] BTRFS info (device dm-0): enabling ssd optimizations
[  +0.002567] BTRFS info (device dm-0): enabling auto defrag
[  +0.000005] BTRFS info (device dm-0): use lzo compression, level 0
[  +0.000005] BTRFS info (device dm-0): disk space caching is enabled
[  +0.044004] BTRFS info (device dm-0): devid 1 device path
/dev/mapper/root changed to /dev/dm-0 scanned by systemd-udevd (275)
[  +0.000829] BTRFS info (device dm-0): devid 1 device path /dev/dm-0
changed to /dev/mapper/root scanned by system

The only other FS-related messages in dmesg are:

[  +0.142425] FS-Cache: Netfs 'nfs' registered for caching
[  +0.018228] Key type dns_resolver registered
[  +0.194893] NFS: Registering the id_resolver key type
[  +0.000016] Key type id_resolver registered
[  +0.000001] Key type id_legacy registered
[  +0.022450] FS-Cache: Duplicate cookie detected

If I have done that correctly, it raises some interesting questions.
First, I started using a btrfs subvolume for user .cache directories
in late 2018. I do this:

users_list="user1 user2 user3 ... userN"
for uu in $users_list; do
    btrfs su cr $destination/@home/live/snapshot/${uu}/.cache
    chattr +C $destination/@home/live/snapshot/${uu}/.cache
    chown ${uu}:${uu} $destination/@home/live/snapshot/${uu}/.cache
done

The reason is to not include the cache contents in snapshots & backups.

The user6 user has apparently not logged into this particular device
since May 15, 2019. (It is now used primarily by someone else.) The
files in /home/user6/.cache appear to all have dates equal or prior to
May 15, 2019, but no older than Feb 3, 2019. The vast majority of the
reported errors were in these files. However, I do not see errors when
accessing those files now.

> - Find a way to reproduce the read-only fs problem

This happened when I was using btrbk to send|receive snapshots to a
target via ssh. I do not think it is a coincidence that I was doing a
btrfs operation at the time this happened.

I did the same btrbk operation on another device (a ThinkPad T450
laptop) that has been running Arch Linux and BTRFS for many years
(since around 2015). However, on that device the btrbk operation
succeeded with no errors.

Here is exactly what I did when the read-only problem first happened:

# btrbk dryrun
--------------------------------------------------------------------------------
Backup Summary (btrbk command line client, version 0.31.2)

    Date:   Tue Jul 13 23:11:32 2021
    Config: /etc/btrbk/btrbk.conf
    Dryrun: YES

Legend:
    ===  up-to-date subvolume (source snapshot)
    +++  created subvolume (source snapshot)
    ---  deleted subvolume
    ***  received subvolume (non-incremental)
    >>>  received subvolume (incremental)
--------------------------------------------------------------------------------
/mnt/top_level/@root/live/snapshot
+++ /mnt/top_level/@root/_btrbk_snap/root.20210713T231132-0400
*** backupsrv:/backup/clnt/laptop2/@root/root.20210713T231132-0400

/mnt/top_level/@home/live/snapshot
+++ /mnt/top_level/@home/_btrbk_snap/home.20210713T231132-0400
*** backupsrv:/backup/clnt/laptop2/@home/home.20210713T231132-0400

/mnt/top_level/@logs/live/snapshot
+++ /mnt/top_level/@logs/_btrbk_snap/vlog.20210713T231132-0400
*** backupsrv:/backup/clnt/laptop2/@log/vlog.20210713T231132-0400

NOTE: Dryrun was active, none of the operations above were actually executed!

# systemctl disable --now snapper-timeline.timer

# systemctl enable --now btrbk.timer
Created symlink /etc/systemd/system/timers.target.wants/btrbk.timer →
/usr/lib/systemd/system/btrbk.timer.

# systemctl list-timers --all
NEXT                        LEFT        LAST                        PASSED        UNIT                         ACTIVATES
Wed 2021-07-14 00:00:00 EDT 47min left  n/a                         n/a           btrbk.timer                  btrbk.service
Wed 2021-07-14 00:00:00 EDT 47min left  Tue 2021-07-13 09:05:48 EDT 14h ago       logrotate.timer              logrotate.service
Wed 2021-07-14 00:00:00 EDT 47min left  Tue 2021-07-13 09:05:48 EDT 14h ago       man-db.timer                 man-db.service
Wed 2021-07-14 00:00:00 EDT 47min left  Tue 2021-07-13 09:05:48 EDT 14h ago       shadow.timer                 shadow.service
Wed 2021-07-14 17:31:57 EDT 18h left    Tue 2021-07-13 17:31:57 EDT 5h 40min ago  snapper-cleanup.timer        snapper-cleanup.service
Wed 2021-07-14 17:36:17 EDT 18h left    Tue 2021-07-13 17:36:17 EDT 5h 36min ago  systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Mon 2021-07-19 01:11:06 EDT 5 days left Mon 2021-07-12 01:24:15 EDT 1 day 21h ago fstrim.timer                 fstrim.service

7 timers listed.

# systemctl start btrbk.service

# systemctl status btrbk
○ btrbk.service - btrbk backup
     Loaded: loaded (/usr/lib/systemd/system/btrbk.service; static)
    Drop-In: /etc/systemd/system/btrbk.service.d
             └─override.conf
     Active: inactive (dead) since Tue 2021-07-13 23:17:54 EDT; 20s ago
TriggeredBy: ● btrbk.timer
       Docs: man:btrbk(1)
    Process: 6827 ExecStart=/usr/local/bin/btrbk_run.sh (code=exited,
status=0/SUCCESS)
   Main PID: 6827 (code=exited, status=0/SUCCESS)
        CPU: 2min 40.794s

# mount /mnt/top_level/
mount: /mnt/top_level: wrong fs type, bad option, bad superblock on
/dev/mapper/root, missing codepage or helper program, or other error.

# ls /mnt/top_level/
total 0
drwxr-x--x 1 root root   0 Nov  1  2017 .
drwxr-xr-x 1 root root 116 Apr 10  2020 ..

My prompt includes a timestamp like this:

 !2813 [13-Jul 23:19:18] root@laptop2
# journalctl -r
An error was encountered while opening journal file or directory
/var/log/journal/7db5321aaf884af786868ec2f2e9c7b0/system.journal,
ignoring file: Input/output error
-- Journal begins at Thu 2021-06-17 15:14:31 EDT, ends at Tue
2021-07-13 16:19:12 EDT. --
Jul 13 16:19:12 laptop2 sudo[674]: pam_unix(sudo:session): session
opened for user root(uid=0) by user0(uid=1000)

As far as I can tell, the last 7 hours of the journal are missing at that point.

That's exactly how the read-only problem happened. I did a btrbk
dryrun to validate the configuration. Then I started the backup. Near
(or at) the end of the backup for the root subvolume, the backup
process exited, but I could not see the journal entries for it because
they were missing and the filesystem was read-only.


* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-16 15:40                     ` Dave T
@ 2021-07-16 23:06                       ` Qu Wenruo
  2021-07-17  0:18                         ` Dave T
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2021-07-16 23:06 UTC (permalink / raw)
  To: Dave T; +Cc: Qu Wenruo, Btrfs BTRFS



On 2021/7/16 下午11:40, Dave T wrote:
> On Fri, Jul 16, 2021 at 9:28 AM Qu Wenruo <quwenruo.btrfs@gmx.com>
>>> I can do more testing and let you know. Can you suggest any tests you
>>> would like me to try?
>>
>> 1. Try to read the affected file:
>>
>> - Mount the btrfs
>>
>> - Read inode 262 in root 329 (just one example)
>>     You can use "find -inum 262" inside root 329 to locate the file.
>>
>
> I have reconnected and mounted the affected SSD.
> Most of the csum errors reported are for root 334 like this:
>
> root 334 inode 184115 errors 1040, bad file extent, some csum missing
> root 334 inode 184116 errors 1040, bad file extent, some csum missing
> There are hundreds of similar error lines.
>
> There were only a few for root 329 and one for 330.
>
> What is the method to map root 334, for example, to a file system
> path? Is it like this?
>
> # btrfs su li /
> ID 257 gen 1106448 top level 5 path @root
> ...
> ID 329 gen 1105905 top level 326 path @home/live/snapshot/user1/.cache
> ID 330 gen 1105905 top level 326 path @home/live/snapshot/user2/.cache
> ID 331 gen 1105905 top level 326 path @home/live/snapshot/user3/.cache
> ID 332 gen 1105905 top level 326 path @home/live/snapshot/user4.cache
> ID 333 gen 1105905 top level 326 path @home/live/snapshot/user5/.cache
> ID 334 gen 1105905 top level 326 path @home/live/snapshot/user6/.cache
>
> # cd /home/user6/.cache
> # find . -inum 184116
> ./mozilla/firefox/profile1/cache2/entries/3E5DF2A295E7D36F537DFDC221EBD6153F46DC30
>
> Did I do that correctly?

Yes, you're doing it correctly.

>
> # less ./mozilla/firefox/profile1/cache2/entries/3E5DF2A295E7D36F537DFDC221EBD6153F46DC30
> "./mozilla/firefox/profile1/cache2/entries/3E5DF2A295E7D36F537DFDC221EBD6153F46DC30"
> may be a binary file.  See it anyway?
>
> I viewed it and there are no errors in terminal or systemd journal
> when reading it.
>
> Next I tested every reported inode in root 334 (assuming I identified
> the root correctly) using this method:
>
> find /home/user6/.cache/ -inum 184874 -exec bash -c 'cp "{}" /tmp ;
> out=$(basename "{}"); rm /tmp/$out' \;
>
> I got a list of every inode number (e.g., 184874) from the output of
> my prior checks and looped through them all. No errors were reported.
>
> I do not see any related errors in dmesg either.


That's the expected behavior.

So the original read failure is a separate problem.
>
> # dmesg | grep -i btrfs
> [  +0.032192] Btrfs loaded, crc32c=crc32c-intel, zoned=yes
> [  +0.000546] BTRFS: device label top_level devid 1 transid 1106559
> /dev/dm-0 scanned by systemd-udevd (120)
> [  +0.029879] BTRFS info (device dm-0): disk space caching is enabled
> [  +0.000003] BTRFS info (device dm-0): has skinny extents
> [  +0.096620] BTRFS info (device dm-0): enabling ssd optimizations
> [  +0.002567] BTRFS info (device dm-0): enabling auto defrag
> [  +0.000005] BTRFS info (device dm-0): use lzo compression, level 0
> [  +0.000005] BTRFS info (device dm-0): disk space caching is enabled
> [  +0.044004] BTRFS info (device dm-0): devid 1 device path
> /dev/mapper/root changed to /dev/dm-0 scanned by systemd-udevd (275)
> [  +0.000829] BTRFS info (device dm-0): devid 1 device path /dev/dm-0
> changed to /dev/mapper/root scanned by system
>
> The only other FS-related messages in dmesg are:
>
> [  +0.142425] FS-Cache: Netfs 'nfs' registered for caching
> [  +0.018228] Key type dns_resolver registered
> [  +0.194893] NFS: Registering the id_resolver key type
> [  +0.000016] Key type id_resolver registered
> [  +0.000001] Key type id_legacy registered
> [  +0.022450] FS-Cache: Duplicate cookie detected
>
> If I have done that correctly, it raises some interesting questions.
> First, I started using a btrfs subvolume for user .cache directories
> in late 2018. I do this:
>
> users_list="user1 user2 user3 ... userN"
> for uu in $users_list; do \
>    btrfs su cr $destination/@home/live/snapshot/${uu}/.cache
>      chattr +C $destination/@home/live/snapshot/${uu}/.cache
>      chown ${uu}:${uu} $destination/@home/live/snapshot/${uu}/.cache
> done
>
> The reason is to not include the cache contents in snapshots & backups.
>
> The user6 user has apparently not logged into this particular device
> since May 15, 2019. (It is now used primarily by someone else.) The
> files in /home/user6/.cache appear to all have dates equal or prior to
> May 15, 2019, but no older than Feb 3, 2019. The vast majority of the
> reported errors were in these files. However, I do not see errors when
> accessing those files now.

So far so good, everything is working as expected.

It's just that btrfs check is a little paranoid.

BTW, apart from the bad file extent and missing csum errors, is there
any other error reported by btrfs check?

>
>> - Find a way to reproduce the read-only fs problem
>
> This happened when I was using btrbk to send|receive snapshots to a
> target via ssh. I do not think it is a coincidence that I was doing a
> btrfs operation at the time this happened.
>
> I did the same btrbk operation on another device (a ThinkPad T450
> laptop) that has been running Arch Linux and BTRFS since many years
> ago (probably around 2015). However, the btrbk operation succeeded
> with no errors.
>
> Here is exactly what I did when the read-only problem first happened:
>
> # btrbk dryrun
> --------------------------------------------------------------------------------
> Backup Summary (btrbk command line client, version 0.31.2)
>
>      Date:   Tue Jul 13 23:11:32 2021
>      Config: /etc/btrbk/btrbk.conf
>      Dryrun: YES
>
> Legend:
>      ===  up-to-date subvolume (source snapshot)
>      +++  created subvolume (source snapshot)
>      ---  deleted subvolume
>      ***  received subvolume (non-incremental)
>      >>>  received subvolume (incremental)
> --------------------------------------------------------------------------------
> /mnt/top_level/@root/live/snapshot
> +++ /mnt/top_level/@root/_btrbk_snap/root.20210713T231132-0400
> *** backupsrv:/backup/clnt/laptop2/@root/root.20210713T231132-0400
>
> /mnt/top_level/@home/live/snapshot
> +++ /mnt/top_level/@home/_btrbk_snap/home.20210713T231132-0400
> *** backupsrv:/backup/clnt/laptop2/@home/home.20210713T231132-0400
>
> /mnt/top_level/@logs/live/snapshot
> +++ /mnt/top_level/@logs/_btrbk_snap/vlog.20210713T231132-0400
> *** backupsrv:/backup/clnt/laptop2/@log/vlog.20210713T231132-0400
>
> NOTE: Dryrun was active, none of the operations above were actually executed!
>
> # systemctl disable --now snapper-timeline.timer
>
> # systemctl enable --now btrbk.timer
> Created symlink /etc/systemd/system/timers.target.wants/btrbk.timer →
> /usr/lib/systemd/system/btrbk.timer.
>
> # systemctl list-timers --all
> NEXT                        LEFT        LAST
> PASSED        UNIT                         ACTIVATES
> Wed 2021-07-14 00:00:00 EDT 47min left  n/a
> n/a           btrbk.timer                  btrbk.service
> Wed 2021-07-14 00:00:00 EDT 47min left  Tue 2021-07-13 09:05:48 EDT
> 14h ago       logrotate.timer              logrotate.service
> Wed 2021-07-14 00:00:00 EDT 47min left  Tue 2021-07-13 09:05:48 EDT
> 14h ago       man-db.timer                 man-db.service
> Wed 2021-07-14 00:00:00 EDT 47min left  Tue 2021-07-13 09:05:48 EDT
> 14h ago       shadow.timer                 shadow.service
> Wed 2021-07-14 17:31:57 EDT 18h left    Tue 2021-07-13 17:31:57 EDT 5h
> 40min ago  snapper-cleanup.timer        snapper-cleanup.service
> Wed 2021-07-14 17:36:17 EDT 18h left    Tue 2021-07-13 17:36:17 EDT 5h
> 36min ago  systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
> Mon 2021-07-19 01:11:06 EDT 5 days left Mon 2021-07-12 01:24:15 EDT 1
> day 21h ago fstrim.timer                 fstrim.service
>
> 7 timers listed.
>
> # systemctl start btrbk.service
>
> # systemctl status btrbk
> ○ btrbk.service - btrbk backup
>       Loaded: loaded (/usr/lib/systemd/system/btrbk.service; static)
>      Drop-In: /etc/systemd/system/btrbk.service.d
>               └─override.conf
>       Active: inactive (dead) since Tue 2021-07-13 23:17:54 EDT; 20s ago
> TriggeredBy: ● btrbk.timer
>         Docs: man:btrbk(1)
>      Process: 6827 ExecStart=/usr/local/bin/btrbk_run.sh (code=exited,
> status=0/SUCCESS)
>     Main PID: 6827 (code=exited, status=0/SUCCESS)
>          CPU: 2min 40.794s
>
> # mount /mnt/top_level/
> mount: /mnt/top_level: wrong fs type, bad option, bad superblock on
> /dev/mapper/root, missing codepage or helper program, or other error.
>
> # ls /mnt/top_level/
> total 0
> drwxr-x--x 1 root root   0 Nov  1  2017 .
> drwxr-xr-x 1 root root 116 Apr 10  2020 ..
>
> My prompt includes a timestamp like this:
>
>   !2813 [13-Jul 23:19:18] root@laptop2
> # journalctl -r
> An error was encountered while opening journal file or directory
> /var/log/journal/7db5321aaf884af786868ec2f2e9c7b0/system.journal,
> ignoring file: Input/output error
> -- Journal begins at Thu 2021-06-17 15:14:31 EDT, ends at Tue
> 2021-07-13 16:19:12 EDT. --
> Jul 13 16:19:12 laptop2 sudo[674]: pam_unix(sudo:session): session
> opened for user root(uid=0) by user0(uid=1000)
>
> As far as I can tell, the last 7 hours of the journal are missing at that point.
>
> That's exactly how the read-only problem happened. I did a btrbk
> dryrun to validate the configuration. Then I started the backup. Near
> (or at) the end of the backup for the root subvolume, the backup
> process exited, but I could not see the journal entries for it because
> they were missing and the filesystem was read-only.

It's a pity that we didn't get the dmesg of that RO event; it would
have contained the most valuable info.

But at least so far your old fs is pretty fine, you can continue using it.

Thanks,
Qu

>


* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-16 23:06                       ` Qu Wenruo
@ 2021-07-17  0:18                         ` Dave T
  2021-07-17  0:25                           ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Dave T @ 2021-07-17  0:18 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, Btrfs BTRFS

On Fri, Jul 16, 2021 at 7:06 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
> So far so good, everything is working as expected.

Thank you for confirming. I learned a lot in this discussion.


> Just the btrfs-check is a little paranoid.
>
> BTW, despite the bad file extent and csum missing error, is there any
> other error reported from btrfs check?

No, there was not. However... (see below)

> It's a pity that we didn't get the dmesg of that RO event, it should
> contain the most valuable info.
>
> But at least so far your old fs is pretty fine, you can continue using it.

... since you don't need me to do any more testing on this fs and I
don't need the old fs anymore, I decided to experiment.

I did the following operations:

btrfs check --mode=lowmem /dev/mapper/${mydev}luks
This reported exactly the same csum issue that I showed you
previously. For example:
ERROR: root 334 EXTENT_DATA[258 73728] compressed extent must have
csum, but only 0 bytes have, expect 4096
ERROR: root 334 EXTENT_DATA[258 73728] is compressed, but inode flag
doesn't allow it
The roots and inodes appear to be the same ones reported previously.
Nothing new.

So I experimented with these operations:
# btrfs check --clear-space-cache v1 /dev/mapper/${mydev}luks
Checking filesystem on /dev/mapper/sda2luks
UUID: ff2b04ab-088c-4fb0-9ad4-84780c23f821
Free space cache cleared
(no errors reported)

I wanted to try that on a fs I don't care about before I try it for
real. I also wanted to try the next operation.

# btrfs check --clear-ino-cache  /dev/mapper/${mydev}luks
...
Successfully cleaned up ino cache for root id: 5
Successfully cleaned up ino cache for root id: 257
Successfully cleaned up ino cache for root id: 258
(no errors reported)

I have never used the repair option, but I decided to see what would
happen with this next operation. Maybe I should not have combined
these parameters?

# btrfs check --repair --init-csum-tree /dev/mapper/${mydev}luks
...
Reinitialize checksum tree
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
ref mismatch on [22921216 16384] extent item 1, found 0
backref 22921216 root 7 not referenced back 0x56524a54f850
incorrect global backref count on 22921216 found 1 wanted 0
backpointer mismatch on [22921216 16384]
owner ref check failed [22921216 16384]
repair deleting extent record: key [22921216,169,0]
Repaired extent references for 22921216
ref mismatch on [23085056 16384] extent item 1, found 0
backref 23085056 root 7 not referenced back 0x565264430000
incorrect global backref count on 23085056 found 1 wanted 0
backpointer mismatch on [23085056 16384]
owner ref check failed [23085056 16384]
repair deleting extent record: key [23085056,169,0]
... more
(The above operation reported tons of errors. Maybe I did damage to
the fs with this operation? Are any of the errors of interest to you?)

I ran it again, but with just the --repair option:
# btrfs check --repair /dev/mapper/${mydev}luks
Starting repair.
Opening filesystem to check...
Checking filesystem on /dev/mapper/xyzluks
UUID: ff2b04ab-088c-4fb0-9ad4-84780c23f821
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
ref mismatch on [21625421824 28672] extent item 17, found 16
incorrect local backref count on 21625421824 parent 106806263808 owner
0 offset 0 found 0 wanted 1 back 0x55798f5fdc10
backref disk bytenr does not match extent record, bytenr=21625421824,
ref bytenr=0
backpointer mismatch on [21625421824 28672]
repair deleting extent record: key [21625421824,168,28672]
adding new data backref on 21625421824 parent 368825810944 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 309755756544 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 122323271680 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 139575754752 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 107060248576 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 107140677632 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 107212980224 owner 0
offset 0 found 1
adding new data backref on 21625421824 parent 771014656 owner 0 offset 0 found 1
adding new data backref on 21625421824 parent 180469760 owner 0 offset 0 found 1
adding new data backref on 21625421824 root 26792 owner 359 offset 0 found 1
adding new data backref on 21625421824 parent 160677888 owner 0 offset 0 found 1
adding new data backref on 21625421824 parent 461373440 owner 0 offset 0 found 1
adding new data backref on 21625421824 root 1761 owner 359 offset 0 found 1
adding new data backref on 21625421824 root 280 owner 359 offset 0 found 1
adding new data backref on 21625421824 root 326 owner 359 offset 0 found 1
adding new data backref on 21625421824 root 26786 owner 359 offset 0 found 1
Repaired extent references for 21625421824
ref mismatch on [21625450496 4096] extent item 17, found 16
incorrect local backref count on 21625450496 parent 106806263808 owner
0 offset 0 found 0 wanted 1 back 0x55798f5fe340
backref disk bytenr does not match extent record, bytenr=21625450496,
ref bytenr=0
backpointer mismatch on [21625450496 4096]
repair deleting extent record: key [21625450496,168,4096]
adding new data backref on 21625450496 parent 368825810944 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 309755756544 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 122323271680 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 139575754752 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 107060248576 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 107140677632 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 107212980224 owner 0
offset 0 found 1
adding new data backref on 21625450496 parent 771014656 owner 0 offset 0 found 1
adding new data backref on 21625450496 parent 180469760 owner 0 offset 0 found 1
adding new data backref on 21625450496 root 26792 owner 369 offset 0 found 1
adding new data backref on 21625450496 parent 160677888 owner 0 offset 0 found 1
...more
It reported many, many more errors. I'm not sure if any of that
interests you. My plan now is to wipe and reuse this SSD for something
else (with a BTRFS fs of course).

I'm just curious about one thing. Did I create all these problems with
the repair option or were these underlying issues that were not
previously found?


* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-17  0:18                         ` Dave T
@ 2021-07-17  0:25                           ` Qu Wenruo
  2021-07-17  0:57                             ` Dave T
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2021-07-17  0:25 UTC (permalink / raw)
  To: Dave T; +Cc: Qu Wenruo, Btrfs BTRFS



On 2021/7/17 上午8:18, Dave T wrote:
> On Fri, Jul 16, 2021 at 7:06 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>> So far so good, every thing is working as expected.
>
> Thank you for confirming. I learned a lot in this discussion.
>
>
>> Just the btrfs-check is a little paranoid.
>>
>> BTW, despite the bad file extent and csum missing error, is there any
>> other error reported from btrfs check?
>
> No, there was not. However... (see below)
>
>> It's a pity that we didn't get the dmesg of that RO event, it should
>> contain the most valuable info.
>>
>> But at least so far your old fs is pretty fine, you can continue using it.
>
> ... since you don't need me to do any more testing on this fs and I
> don't need the old fs anymore, I decided to experiment.
>
> I did the following operations:
>
> btrfs check --mode=lowmem /dev/mapper/${mydev}luks
> This reported exactly the same csum issue that I showed you
> previously. For example:
> ERROR: root 334 EXTENT_DATA[258 73728] compressed extent must have
> csum, but only 0 bytes have, expect 4096
> ERROR: root 334 EXTENT_DATA[258 73728] is compressed, but inode flag
> doesn't allow it
> The roots and inodes appear to be the same ones reported previously.
> Nothing new.
>
> So I experimented with these operations:
> # btrfs check --clear-space-cache v1 /dev/mapper/${mydev}luks
> Checking filesystem on /dev/mapper/sda2luks
> UUID: ff2b04ab-088c-4fb0-9ad4-84780c23f821
> Free space cache cleared
> (no errors reported)

This is pretty safe, but can be slow on a very large fs.

>
> I wanted to try that on a fs I don't care about before I try it for
> real. I also wanted to try the next operation.
>
> # btrfs check --clear-ino-cache  /dev/mapper/${mydev}luks
> ...
> Successfully cleaned up ino cache for root id: 5
> Successfully cleaned up ino cache for root id: 257
> Successfully cleaned up ino cache for root id: 258
> (no errors reported)

Inode cache is now deprecated and rarely used. It should do nothing on
your fs anyway.

>
> I have never used the repair option, but I decided to see what would
> happen with this next operation. Maybe I should not have combined
> these parameters?
>
> # btrfs check --repair --init-csum-tree /dev/mapper/${mydev}luks

This is a little dangerous, especially as there hasn't been much
experimentation or testing of it with missing csums.

> ...
> Reinitialize checksum tree
> [1/7] checking root items
> Fixed 0 roots.
> [2/7] checking extents
> ref mismatch on [22921216 16384] extent item 1, found 0
> backref 22921216 root 7 not referenced back 0x56524a54f850
> incorrect global backref count on 22921216 found 1 wanted 0
> backpointer mismatch on [22921216 16384]
> owner ref check failed [22921216 16384]
> repair deleting extent record: key [22921216,169,0]
> Repaired extent references for 22921216
> ref mismatch on [23085056 16384] extent item 1, found 0
> backref 23085056 root 7 not referenced back 0x565264430000
> incorrect global backref count on 23085056 found 1 wanted 0
> backpointer mismatch on [23085056 16384]
> owner ref check failed [23085056 16384]
> repair deleting extent record: key [23085056,169,0]
> ... more
> (The above operation reported tons of errors. Maybe I did damage to
> the fs with this operation? Are any of the errors of interest to you?)

This is definitely caused by the repair, but I don't think it's a big deal.

>
> I ran it again, but with just the --repair option:
> # btrfs check --repair /dev/mapper/${mydev}luks
> Starting repair.
> Opening filesystem to check...
> Checking filesystem on /dev/mapper/xyzluks
> UUID: ff2b04ab-088c-4fb0-9ad4-84780c23f821
> [1/7] checking root items
> Fixed 0 roots.
> [2/7] checking extents
> ref mismatch on [21625421824 28672] extent item 17, found 16
> incorrect local backref count on 21625421824 parent 106806263808 owner
> 0 offset 0 found 0 wanted 1 back 0x55798f5fdc10
> backref disk bytenr does not match extent record, bytenr=21625421824,
> ref bytenr=0
> backpointer mismatch on [21625421824 28672]
> repair deleting extent record: key [21625421824,168,28672]
> adding new data backref on 21625421824 parent 368825810944 owner 0
> offset 0 found 1
> adding new data backref on 21625421824 parent 309755756544 owner 0
> offset 0 found 1
> adding new data backref on 21625421824 parent 122323271680 owner 0
> offset 0 found 1
> adding new data backref on 21625421824 parent 139575754752 owner 0
> offset 0 found 1
> adding new data backref on 21625421824 parent 107060248576 owner 0
> offset 0 found 1
> adding new data backref on 21625421824 parent 107140677632 owner 0
> offset 0 found 1
> adding new data backref on 21625421824 parent 107212980224 owner 0
> offset 0 found 1
> adding new data backref on 21625421824 parent 771014656 owner 0 offset 0 found 1
> adding new data backref on 21625421824 parent 180469760 owner 0 offset 0 found 1
> adding new data backref on 21625421824 root 26792 owner 359 offset 0 found 1
> adding new data backref on 21625421824 parent 160677888 owner 0 offset 0 found 1
> adding new data backref on 21625421824 parent 461373440 owner 0 offset 0 found 1
> adding new data backref on 21625421824 root 1761 owner 359 offset 0 found 1
> adding new data backref on 21625421824 root 280 owner 359 offset 0 found 1
> adding new data backref on 21625421824 root 326 owner 359 offset 0 found 1
> adding new data backref on 21625421824 root 26786 owner 359 offset 0 found 1
> Repaired extent references for 21625421824
> ref mismatch on [21625450496 4096] extent item 17, found 16
> incorrect local backref count on 21625450496 parent 106806263808 owner
> 0 offset 0 found 0 wanted 1 back 0x55798f5fe340
> backref disk bytenr does not match extent record, bytenr=21625450496,
> ref bytenr=0
> backpointer mismatch on [21625450496 4096]
> repair deleting extent record: key [21625450496,168,4096]
> adding new data backref on 21625450496 parent 368825810944 owner 0
> offset 0 found 1
> adding new data backref on 21625450496 parent 309755756544 owner 0
> offset 0 found 1
> adding new data backref on 21625450496 parent 122323271680 owner 0
> offset 0 found 1
> adding new data backref on 21625450496 parent 139575754752 owner 0
> offset 0 found 1
> adding new data backref on 21625450496 parent 107060248576 owner 0
> offset 0 found 1
> adding new data backref on 21625450496 parent 107140677632 owner 0
> offset 0 found 1
> adding new data backref on 21625450496 parent 107212980224 owner 0
> offset 0 found 1
> adding new data backref on 21625450496 parent 771014656 owner 0 offset 0 found 1
> adding new data backref on 21625450496 parent 180469760 owner 0 offset 0 found 1
> adding new data backref on 21625450496 root 26792 owner 369 offset 0 found 1
> adding new data backref on 21625450496 parent 160677888 owner 0 offset 0 found 1
> ...more
> It reported many, many more errors.

At the same time, it also says it's repairing these problems.

> I'm not sure if any of that
> interests you. My plan now is to wipe and reuse this SSD for something
> else (with a BTRFS fs of course).

That's completely fine.

But before that, would you mind running "btrfs check" again on the fs to
see if it reports any errors?

I'm interested to see the result.
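That re-check can be verified mechanically: a clean `btrfs check` run ends with a "no error found" summary line, at least in recent btrfs-progs. The helper below is a sketch under that assumption; the exact phrasing may differ between versions, and the log path is illustrative:

```shell
# Return success if a saved `btrfs check` log reports a clean filesystem.
# Assumption: recent btrfs-progs prints a closing "... no error found"
# line on a clean check; verify the phrasing on your build first.
check_clean() {
    grep -q 'no error found' "$1"
}

# Usage sketch:
#   btrfs check /dev/mapper/xyzluks 2>&1 | tee /tmp/recheck.log
#   check_clean /tmp/recheck.log && echo clean || echo "errors remain"
```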

>
> I'm just curious about one thing. Did I create all these problems with
> the repair option or were these underlying issues that were not
> previously found?
>
They were mostly created by the repair: as --init-csum-tree regenerates
the csums, it also causes the old csum items to mismatch their extent
items.

That's mostly expected, and normally btrfs check --repair should be able
to fix them.
If not, we need to fix btrfs-progs.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-17  0:25                           ` Qu Wenruo
@ 2021-07-17  0:57                             ` Dave T
  2021-07-17  0:59                               ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Dave T @ 2021-07-17  0:57 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, Btrfs BTRFS

> But before that, would you mind to run "btrfs check" again on the fs to
> see if it reports any error?

> I'm interested to see the result though.

First I will send you the full output of the command I ran:
btrfs check --repair --init-csum-tree /dev/mapper/xyz
It's a lot of output - around 50MB before I zip it up.
How about if I send that to you as an attachment and mail it directly
to you, not the list?

Next step: I have remounted the old fs and I'm going to run a scrub on it.

Then I will unmount it and run btrfs check again and send you the
output. Again, I'll send it to you privately, OK?
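The scrub step above can be verified the same way: `btrfs scrub start -B` runs in the foreground and prints a completion summary when it finishes. The helper below assumes that summary contains "with 0 errors", which is the common phrasing but may vary across btrfs-progs versions; the mountpoint and log path are illustrative:

```shell
# Run-and-verify sketch for the scrub step. -B keeps scrub in the
# foreground so the final statistics land in the saved log:
#   btrfs scrub start -B /mnt/oldfs 2>&1 | tee /tmp/scrub.log
scrub_ok() {
    # Succeeds if the saved scrub summary reports zero errors; the
    # "with 0 errors" phrasing is an assumption about the output format.
    grep -q 'with 0 errors' "$1"
}
```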


* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-17  0:57                             ` Dave T
@ 2021-07-17  0:59                               ` Qu Wenruo
  2021-07-25 17:34                                 ` Dave T
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2021-07-17  0:59 UTC (permalink / raw)
  To: Dave T; +Cc: Qu Wenruo, Btrfs BTRFS



On 2021/7/17 8:57 AM, Dave T wrote:
>> But before that, would you mind to run "btrfs check" again on the fs to
>> see if it reports any error?
>
>> I'm interested to see the result though.
>
> First I will send you the full output of the command I ran:
> btrfs check --repair --init-csum-tree /dev/mapper/xyz
> It's a lot of output - around 50MB before I zip it up.
> How about if I send that to you as an attachment and mail it directly
> to you, not the list?

It works for me either way.

>
> Next step: I have remounted the old fs and I'm going to run a scrub on it.

Scrub shouldn't detect much else, but it won't hurt anyway.

>
> Then I will unmount it and run btrfs check again and send you the
> output. Again, I'll send it to you privately, OK?
>

That's fine with me.

Thanks,
Qu


* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-17  0:59                               ` Qu Wenruo
@ 2021-07-25 17:34                                 ` Dave T
  2021-07-25 23:51                                   ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Dave T @ 2021-07-25 17:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, Btrfs BTRFS

Hi Qu. Was the information I sent helpful? Is there any final lesson I
should take away from this? Thank you.

On Fri, Jul 16, 2021 at 9:00 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2021/7/17 8:57 AM, Dave T wrote:
> >> But before that, would you mind to run "btrfs check" again on the fs to
> >> see if it reports any error?
> >
> >> I'm interested to see the result though.
> >
> > First I will send you the full output of the command I ran:
> > btrfs check --repair --init-csum-tree /dev/mapper/xyz
> > It's a lot of output - around 50MB before I zip it up.
> > How about if I send that to you as an attachment and mail it directly
> > to you, not the list?
>
> It works for me either way.
>
> >
> > Next step: I have remounted the old fs and I'm going to run a scrub on it.
>
> Scrub shouldn't detect much thing else, but it won't hurt anyway.
>
> >
> > Then I will unmount it and run btrfs check again and send you the
> > output. Again, I'll send it to you privately, OK?
> >
>
> That's fine to me.
>
> THanks,
> Qu


* Re: bad file extent, some csum missing - how to check that restored volumes are error-free?
  2021-07-25 17:34                                 ` Dave T
@ 2021-07-25 23:51                                   ` Qu Wenruo
  0 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2021-07-25 23:51 UTC (permalink / raw)
  To: Dave T; +Cc: Qu Wenruo, Btrfs BTRFS



On 2021/7/26 1:34 AM, Dave T wrote:
> HI Qu. Was the information I sent helpful? Is there any final lesson I
> should take away from this? Thank you.

Sorry, there isn't much more I can offer.

It mostly looks like btrfs check --init-extent-tree does a pretty bad
job of rebuilding the extent tree.

Thus I wouldn't recommend it for future repairs.

Thanks,
Qu

>
> On Fri, Jul 16, 2021 at 9:00 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2021/7/17 8:57 AM, Dave T wrote:
>>>> But before that, would you mind to run "btrfs check" again on the fs to
>>>> see if it reports any error?
>>>
>>>> I'm interested to see the result though.
>>>
>>> First I will send you the full output of the command I ran:
>>> btrfs check --repair --init-csum-tree /dev/mapper/xyz
>>> It's a lot of output - around 50MB before I zip it up.
>>> How about if I send that to you as an attachment and mail it directly
>>> to you, not the list?
>>
>> It works for me either way.
>>
>>>
>>> Next step: I have remounted the old fs and I'm going to run a scrub on it.
>>
>> Scrub shouldn't detect much thing else, but it won't hurt anyway.
>>
>>>
>>> Then I will unmount it and run btrfs check again and send you the
>>> output. Again, I'll send it to you privately, OK?
>>>
>>
>> That's fine to me.
>>
>> THanks,
>> Qu


end of thread, other threads:[~2021-07-25 23:51 UTC | newest]

Thread overview: 17+ messages
2021-07-14 17:53 bad file extent, some csum missing - how to check that restored volumes are error-free? Dave T
2021-07-14 22:51 ` Qu Wenruo
     [not found]   ` <CAGdWbB44nH7dgdP3qO_bFYZwbkrW37OwFEVTE2Bn+rn4d7zWiQ@mail.gmail.com>
     [not found]     ` <43e7dc04-c862-fff1-45af-fd779206d71c@gmx.com>
     [not found]       ` <CAGdWbB7Q98tSbPgHUBF+yjqYRBPZ-a42hd=xLwMZUMO46gfd0A@mail.gmail.com>
2021-07-15 22:19         ` Dave T
2021-07-15 22:30           ` Qu Wenruo
2021-07-15 22:49             ` Dave T
2021-07-16  1:05               ` Qu Wenruo
2021-07-16  2:32                 ` Qu Wenruo
2021-07-16 13:15                 ` Dave T
2021-07-16 13:28                   ` Qu Wenruo
2021-07-16 15:40                     ` Dave T
2021-07-16 23:06                       ` Qu Wenruo
2021-07-17  0:18                         ` Dave T
2021-07-17  0:25                           ` Qu Wenruo
2021-07-17  0:57                             ` Dave T
2021-07-17  0:59                               ` Qu Wenruo
2021-07-25 17:34                                 ` Dave T
2021-07-25 23:51                                   ` Qu Wenruo
