linux-btrfs.vger.kernel.org archive mirror
* How to recover from btrfs scrub errors? (uncorrectable errors, checksum error at logical)
@ 2018-10-15  7:50 Otto Kekäläinen
  2018-10-15  8:30 ` Qu Wenruo
  2018-10-22  6:29 ` Otto Kekäläinen
  0 siblings, 2 replies; 4+ messages in thread
From: Otto Kekäläinen @ 2018-10-15  7:50 UTC (permalink / raw)
  To: linux-btrfs

Hello!

I am trying to figure out how to recover from errors detected by btrfs scrub.

Scrub status reports:

scrub status for 4f4479d5-648a-45b9-bcbf-978c766aeb41
        scrub started at Mon Oct 15 10:02:28 2018, running for 00:35:39
        total bytes scrubbed: 791.15GiB with 18 errors
        error details: csum=18
        corrected errors: 0, uncorrectable errors: 18, unverified errors: 0
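
For anyone scripting around this output, here is a minimal sketch (an illustration, not something posted in the thread) that pulls the three error counters out of the status text above. It assumes the wording printed by the btrfs-progs version used here; other versions may format the summary differently. The name parse_scrub_status is hypothetical.

```python
import re

def parse_scrub_status(text):
    """Extract the error counters from `btrfs scrub status` output.

    Assumes the 'corrected errors: N, uncorrectable errors: N, ...' wording
    shown in this message; returns only the counters that were found.
    """
    counts = {}
    for key in ("corrected", "uncorrectable", "unverified"):
        match = re.search(rf"{key} errors: (\d+)", text)
        if match:
            counts[key] = int(match.group(1))
    return counts

status = """scrub status for 4f4479d5-648a-45b9-bcbf-978c766aeb41
        scrub started at Mon Oct 15 10:02:28 2018, running for 00:35:39
        total bytes scrubbed: 791.15GiB with 18 errors
        error details: csum=18
        corrected errors: 0, uncorrectable errors: 18, unverified errors: 0"""

counts = parse_scrub_status(status)
print(counts)
```

With data stored as single (see the setup described further down) there is no second copy to repair from, which is consistent with all 18 errors being reported as uncorrectable.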

Kernel log contains lines like

  BTRFS warning (device dm-8): checksum error at logical 7351706472448 on dev
  /dev/mapper/disk6tb, sector 61412648, root 12725, inode 152358265,
  offset 483328:
  path resolving failed with ret=-2
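
The root and inode numbers in such a warning are what later steps need, so a rough sketch of extracting them may help. This is an illustration under assumptions: the regular expression is modelled on the single sample above, and kernels of other versions may word the message differently. The helper name parse_scrub_warning is hypothetical. (ret=-2 is the kernel's -ENOENT, that is, no path could be resolved for the inode.)

```python
import re

# Pattern modelled on the one warning quoted above (an assumption; the
# field layout may differ between kernel versions).
WARNING_RE = re.compile(
    r"checksum error at logical (?P<logical>\d+) on dev (?P<dev>\S+), "
    r"sector (?P<sector>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+)"
)

def parse_scrub_warning(text):
    """Return the fields of one warning as a dict, or None if it does not match."""
    # Mail clients and terminals wrap long log lines, so collapse whitespace first.
    flat = " ".join(text.split())
    match = WARNING_RE.search(flat)
    if match is None:
        return None
    # Keep the device path as a string, convert everything else to int.
    return {k: (v if k == "dev" else int(v)) for k, v in match.groupdict().items()}

sample = """BTRFS warning (device dm-8): checksum error at logical 7351706472448 on dev
  /dev/mapper/disk6tb, sector 61412648, root 12725, inode 152358265,
offset 483328:
  path resolving failed with ret=-2"""

info = parse_scrub_warning(sample)
print(info)
```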

I've tried so far:
- deleting the files (when path is visible)
- overwriting the files with new data
- changing the disk (with btrfs replace)

The checksum errors however persist.
How do I get rid of them?


The files are logs and other non-vital information. I am fine with
deleting the corrupted files. It is OK to recover so that I lose a
few gigabytes of data, but not the entire filesystem.

Setup is a multi-disk btrfs filesystem, data single, metadata RAID-1
Mounted with:

/dev/mapper/wdc3td on /data type btrfs
(rw,noatime,compress=lzo,space_cache,subvolid=5,subvol=/)

I've read lots of online sources on the topic, but none of them explain
how to recover from the current state:

https://btrfs.wiki.kernel.org/index.php/Btrfsck
http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html
https://wiki.archlinux.org/index.php/Identify_damaged_files#Find_damaged_files


* Re: How to recover from btrfs scrub errors? (uncorrectable errors, checksum error at logical)
  2018-10-15  7:50 How to recover from btrfs scrub errors? (uncorrectable errors, checksum error at logical) Otto Kekäläinen
@ 2018-10-15  8:30 ` Qu Wenruo
  2018-10-22  6:29 ` Otto Kekäläinen
  1 sibling, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2018-10-15  8:30 UTC (permalink / raw)
  To: Otto Kekäläinen, linux-btrfs


On 2018/10/15 3:50 PM, Otto Kekäläinen wrote:
> Hello!
> 
> I am trying to figure out how to recover from errors detected by btrfs scrub.
> 
> Scrub status reports:
> 
> scrub status for 4f4479d5-648a-45b9-bcbf-978c766aeb41
>         scrub started at Mon Oct 15 10:02:28 2018, running for 00:35:39
>         total bytes scrubbed: 791.15GiB with 18 errors
>         error details: csum=18
>         corrected errors: 0, uncorrectable errors: 18, unverified errors: 0
> 
> Kernel log contains lines like
> 
>   BTRFS warning (device dm-8): checksum error at logical 7351706472448 on dev
>   /dev/mapper/disk6tb, sector 61412648, root 12725, inode 152358265,
> offset 483328:
>   path resolving failed with ret=-2
> 
> I've tried so far:
> - deleting the files (when path is visible)

Please make sure that no other subvolume or snapshot contains the same
file, and that nothing reflinks to it.

If the path is not visible, use the root and inode numbers from the
kernel message to locate the culprit file.
The "find" command can search by inode number (-inum), and
"btrfs subvolume list" shows the subvolume IDs, which correspond to
the root number.

It is also recommended to sync the filesystem before scrubbing, in case
the culprit inode has only been orphaned and not yet deleted from disk.
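
The inode lookup described above can be sketched in Python as follows. This is an illustration under assumptions, not a tool from btrfs-progs: find_paths_by_inode is a hypothetical helper, and it relies on btrfs reporting a different st_dev for each subvolume, so the walk stays inside the one subvolume whose inode numbers are meaningful (roughly what "find -xdev -inum" does).

```python
import os

def find_paths_by_inode(subvol_root, inode):
    """Return all paths below subvol_root whose inode number equals `inode`.

    Directories that live on a different st_dev (nested subvolumes, other
    mounts) are skipped, because inode numbers are only unique per subvolume.
    """
    root_dev = os.lstat(subvol_root).st_dev
    hits = []
    for dirpath, dirnames, filenames in os.walk(subvol_root):
        kept = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_dev != root_dev:
                continue  # nested subvolume or mount point: do not descend
            kept.append(name)
            if st.st_ino == inode:
                hits.append(path)
        dirnames[:] = kept  # prune the walk in place
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # file vanished while walking
            if st.st_dev == root_dev and st.st_ino == inode:
                hits.append(path)
    return hits
```

A call such as find_paths_by_inode("/data/some-subvolume", 152358265) would then list the candidates, where "/data/some-subvolume" stands in for wherever the subvolume with ID 12725 is reachable. More than one path can come back if the file has hard links.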

> - overwriting the files with new data

If you only overwrite the culprit sector, the write can get CoWed,
leaving the original data extent in place.

You need to make sure the old data is not referenced by any other
root/inode, so first check that there is no reflink or snapshot.

Then delete the file or overwrite the whole culprit file.
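
One possible way to check for remaining references is to ask btrfs-progs directly. The sketch below only builds the command lines; running them needs root and a mounted filesystem. It is an assumption-laden illustration: reference_check_commands and run_all are hypothetical helper names, "/data/some-subvolume" is a placeholder path, and logical-resolve may itself fail when the extent is no longer referenced by anything it can resolve.

```python
import subprocess

def reference_check_commands(mount_point, logical, inode, subvol_path):
    """Build btrfs-progs invocations that show what still references the bad data."""
    return [
        # Every subvolume and snapshot on the filesystem, with its ID.
        ["btrfs", "subvolume", "list", mount_point],
        # File paths that reference the extent at this logical address.
        ["btrfs", "inspect-internal", "logical-resolve", str(logical), mount_point],
        # Path(s) of the inode inside one specific subvolume.
        ["btrfs", "inspect-internal", "inode-resolve", str(inode), subvol_path],
    ]

def run_all(commands):
    """Run each command and print its output (requires root and btrfs-progs)."""
    for argv in commands:
        print("$", " ".join(argv))
        result = subprocess.run(argv, capture_output=True, text=True)
        print(result.stdout or result.stderr)

# Values taken from the kernel warning earlier in the thread; the subvolume
# path is a placeholder.
commands = reference_check_commands(
    "/data", 7351706472448, 152358265, "/data/some-subvolume"
)
for argv in commands:
    print(" ".join(argv))
# run_all(commands)  # uncomment on the affected machine
```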

Thanks,
Qu




* Re: How to recover from btrfs scrub errors? (uncorrectable errors, checksum error at logical)
  2018-10-15  7:50 How to recover from btrfs scrub errors? (uncorrectable errors, checksum error at logical) Otto Kekäläinen
  2018-10-15  8:30 ` Qu Wenruo
@ 2018-10-22  6:29 ` Otto Kekäläinen
  2018-10-22  6:53   ` Qu Wenruo
  1 sibling, 1 reply; 4+ messages in thread
From: Otto Kekäläinen @ 2018-10-22  6:29 UTC (permalink / raw)
  To: linux-btrfs

I never got a reply to this thread, but I am now replying to myself in
case somebody has the same issue and is reading the archive:

The problem went away after I:
- deleted all snapshots, as they seemed to slow down btrfs I/O so much
that simple commands like rm and rsync were unusable
- replaced the disk that had the corrupted file with btrfs replace
(just in case; smartctl did not indicate any disk failures)
- rsynced files from another location to this filesystem so that the
corrupted files got overwritten
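
The three steps above can be written down as a dry-run plan. This is a sketch under assumptions, not the exact commands that were run: all paths and device names are placeholders, recovery_plan is a hypothetical helper, and the trailing sync and scrub are additions taken from the advice elsewhere in this thread. Nothing is executed; the function only returns the command lines.

```python
import shlex

def recovery_plan(mount_point, snapshots, old_dev, new_dev, backup_dir, target_dir):
    """Return the recovery steps as shell command strings, in order."""
    # 1. Drop every snapshot so nothing keeps the damaged extents referenced.
    plan = [["btrfs", "subvolume", "delete", snap] for snap in snapshots]
    # 2. Move the data off the suspect disk.
    plan.append(["btrfs", "replace", "start", old_dev, new_dev, mount_point])
    # 3. Copy known-good files over the damaged ones. By default rsync writes
    #    a new copy and renames it over the old file instead of patching it
    #    in place, so the whole file is replaced.
    plan.append(["rsync", "-a", backup_dir.rstrip("/") + "/", target_dir])
    # 4. Flush pending deletions, then verify with a foreground scrub.
    plan.append(["sync"])
    plan.append(["btrfs", "scrub", "start", "-B", mount_point])
    return [shlex.join(argv) for argv in plan]

# Placeholder names throughout; substitute the real ones before use.
plan = recovery_plan(
    "/data",
    ["/data/snapshots/example"],
    "/dev/mapper/old-disk",
    "/dev/mapper/new-disk",
    "/backup/logs",
    "/data/logs",
)
for line in plan:
    print(line)
```

Note that rsync only rewrites files it considers changed, so a damaged file whose size and modification time still match the source copy may need to be deleted first.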

Now btrfs scrub does not find any corruption anymore and the
filesystem I/O speed is usable, though still slower than what it used
to be in the past.




-- 
Otto Kekäläinen
CEO
Seravo
+358 44 566 2204

Follow me at @ottokekalainen


* Re: How to recover from btrfs scrub errors? (uncorrectable errors, checksum error at logical)
  2018-10-22  6:29 ` Otto Kekäläinen
@ 2018-10-22  6:53   ` Qu Wenruo
  0 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2018-10-22  6:53 UTC (permalink / raw)
  To: Otto Kekäläinen, linux-btrfs


On 2018/10/22 2:29 PM, Otto Kekäläinen wrote:
> I never got a reply to this thread, 

I replied to you but got no reply:

https://lore.kernel.org/linux-btrfs/eba5de6f-535a-0f5d-e415-9cd622d71b36@gmx.com/

And your steps are just what I suggested.

Thanks,
Qu

> but I am now replying to myself in
> case somebody has the same issue and is reading the archive:
> 
> The problem went away after I:
> - deleted all snapshots, as they seemed to slow down btrfs I/O so much
> that simple commands like rm and rsync were unusable
> - replaced the disk that had the corrupted file with btrfs replace
> (just in case; smartctl did not indicate any disk failures)
> - rsynced files from another location to this filesystem so that the
> corrupted files got overwritten
> 
> Now btrfs scrub does not find any corruption anymore and the
> filesystem I/O speed is usable, though still slower than what it used
> to be in the past.
> 



