* qcow2 images make scrub believe the filesystem is corrupted.
@ 2017-08-16  1:12 Paulo Dias
  2017-08-16  1:40 ` Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Paulo Dias @ 2017-08-16  1:12 UTC (permalink / raw)
  To: linux-btrfs

Hello all,

I'm using libvirt with a qcow2 image, and every time I run btrfs scrub
-H /home (the subvolume where the image is), I get:

ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 30, gen 0
ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 289831161856 on dev /dev/sda3
ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 31, gen 0
ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 289830309888 on dev /dev/sda3
ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 32, gen 0
ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 289831055360 on dev /dev/sda3
ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 33, gen 0
ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 289861591040 on dev /dev/sda3
ago 15 21:58:09 kerberos kernel: BTRFS warning (device sda3): checksum error at logical 290297204736 on dev /dev/sda3, sector 67982824, root 258, inode 968837, offset 17455849472, length 4096, links 1 (path: groo/Fedora/Fedora.qcow2)
ago 15 21:58:09 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 34, gen 0
ago 15 21:58:09 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 290297204736 on dev /dev/sda3

The thing is, as soon as I move the image to another subvolume (root
in this case) and delete it, the errors go away and scrub reports
zero errors again.

Then if I copy the file back to /home, I get the same errors again.

qemu-img check tells me the qcow2 file is fine, and SMART doesn't show
anything wrong with my SSD:

root@kerberos:/home/groo# smartctl -Ai /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.13.0-041300rc4-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 850 EVO M.2 500GB
Serial Number:    S33DNX0H812686V
LU WWN Device Id: 5 002538 d4130d027
Firmware Version: EMT21B6Q
User Capacity:    500.107.862.016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      M.2
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Aug 15 21:59:34 2017 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1739
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       392
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       7
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   061   050   000    Old_age   Always       -       39
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       54
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       7997549567

This is the usage for /home:

root@kerberos:/home/groo# btrfs filesystem usage -T /home/
Overall:
    Device size:                 333.50GiB
    Device allocated:             74.12GiB
    Device unallocated:          259.38GiB
    Device missing:                  0.00B
    Used:                         32.70GiB
    Free (estimated):            297.36GiB      (min: 167.67GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:               58.12MiB      (used: 0.00B)

             Data     Metadata  System
Id Path      single   RAID1     RAID1    Unallocated
-- --------- -------- --------- -------- -----------
 1 /dev/sda3 68.00GiB   2.00GiB 64.00MiB   129.94GiB
 2 /dev/sdb7  2.00GiB   2.00GiB 64.00MiB   128.96GiB
 3 /dev/sdb8        -         -        -   488.13MiB
-- --------- -------- --------- -------- -----------
   Total     70.00GiB   2.00GiB 64.00MiB   259.38GiB
   Used      32.02GiB 348.12MiB 16.00KiB

And for the root subvolume:

root@kerberos:/home/groo# btrfs filesystem usage -T /
Overall:
    Device size:                  65.29GiB
    Device allocated:             65.28GiB
    Device unallocated:           12.00MiB
    Device missing:                  0.00B
    Used:                         14.94GiB
    Free (estimated):             48.72GiB      (min: 48.72GiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:               42.20MiB      (used: 0.00B)

             Data     Metadata  System
Id Path      single   single    single   Unallocated
-- --------- -------- --------- -------- -----------
 1 /dev/sda2 63.24GiB   2.01GiB 32.00MiB    12.00MiB
-- --------- -------- --------- -------- -----------
   Total     63.24GiB   2.01GiB 32.00MiB    12.00MiB
   Used      14.52GiB 425.16MiB 16.00KiB

I see this with both kernel 4.12 and 4.13-rc4.

The btrfs-progs version is:

root@kerberos:/home/groo# btrfs version
btrfs-progs v4.12-dirty

/etc/fstab:

UUID=e31faa09-99e5-4c75-815c-629402ec92f2 /               btrfs   defaults,discard,subvol=@ 0       1
# /boot was on /dev/sda1 during installation
UUID=55796428-a9b8-4f1b-9a7e-8fe3aa8d8097 /boot           ext4    defaults        0       2
# /boot/efi was on /dev/sdb2 during installation
UUID=D4F8-9F87  /boot/efi       vfat    umask=0077      0       1
# /home was on /dev/sda3 during installation
UUID=ae9ae869-720d-4643-b673-6924d09b2fe0 /home           btrfs   defaults,discard,subvol=@home 0       2
# swap was on /dev/sdb6 during installation
#UUID=fc2a432b-4c40-4fe4-9730-869a1d1911ef none            swap    sw              0       0
/dev/mapper/cryptswap1 none swap sw 0 0


This is reproducible every single time.

Is btrfs scrub maybe getting confused by a sparse file? Is it
possible to get a bad checksum with RAID1 in this scenario?

Any help is appreciated.

| Paulo Dias
| paulo.miguel.dias@gmail.com

Tempora mutantur, nos et mutamur in illis.


* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-16  1:12 qcow2 images make scrub believe the filesystem is corrupted Paulo Dias
@ 2017-08-16  1:40 ` Qu Wenruo
  2017-08-16  1:51   ` Paulo Dias
  2017-08-16 23:32 ` Chris Murphy
  2017-08-17 23:39 ` Josef Bacik
  2 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2017-08-16  1:40 UTC (permalink / raw)
  To: Paulo Dias, linux-btrfs



On 2017年08月16日 09:12, Paulo Dias wrote:
> Hello/2 all
> 
> I'm using libvirt with a qcow2 image and everytime i run btrfs scrub
> -H /home (subvolume where the image is), i get:
> 
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 30, gen 0
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
> fixup (regular) error at logical 289831161856 on dev /dev/sda3
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 31, gen 0
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
> fixup (regular) error at logical 289830309888 on dev /dev/sda3
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 32, gen 0
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
> fixup (regular) error at logical 289831055360 on dev /dev/sda3
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 33, gen 0
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
> fixup (regular) error at logical 289861591040 on dev /dev/sda3
> ago 15 21:58:09 kerberos kernel: BTRFS warning (device sda3): checksum
> error at logical 290297204736 on dev /dev/sda3, sector 67982824, root
> 258, inode 968837, offset 17455849472, length 4096, links 1 (path:
> groo/Fedora/Fedora.qcow2)

Any special setting on the file or the Fedora directory? Like nodatasum?

And is there any special setup like off-line dedupe?

Considering the number of corrupted blocks, fewer than 50 and not
contiguous at all, this is a little weird.
With ordinary corruption (at least on an HDD) the corrupted range should be
contiguous, and more errors should be detected.
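
To make the contiguity point concrete, here is a small shell sketch (not
part of the original report) that takes the five logical addresses from the
scrub messages quoted above and prints the gap between neighbours in 4 KiB
blocks. The helper name logical_gaps is made up for illustration, and the
4096-byte block size is an assumption taken from the "length 4096" in the
checksum warning.

```shell
#!/bin/sh
# Print the gap, in 4 KiB blocks, between each pair of neighbouring
# logical addresses read from stdin (one address per line).
# A gap of 1 would mean two reported blocks sit right next to each other.
# The 4096 divisor is an assumed block size (see "length 4096" in the log).
logical_gaps() {
    sort -n | awk 'NR > 1 { print ($1 - prev) / 4096 } { prev = $1 }'
}

# Logical addresses copied from the scrub messages in the report.
printf '%s\n' \
    289831161856 289830309888 289831055360 \
    289861591040 290297204736 | logical_gaps
```

For these five addresses the gaps come out as 182, 26, 7429 and 106351
blocks, so no two reported blocks are adjacent.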

> ago 15 21:58:09 kerberos kernel: BTRFS error (device sda3): bdev
> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 34, gen 0
> ago 15 21:58:09 kerberos kernel: BTRFS error (device sda3): unable to
> fixup (regular) error at logical 290297204736 on dev /dev/sda3
> 
> The thing is, as soon as i move the image to another subvolume, root
> in this case, and delete it, the errors go away and scrub tells me i
> have zero errors again.

This makes things even weirder.

If you're *moving* the file to another subvolume, its data stays
where it was; nothing is modified.

If you're *copying* the file to another subvolume without reflinking,
the kernel reads the data and writes it to a new place.
During the read it verifies the data checksum, and if it doesn't match,
you'll get an EIO error during the copy.

If you're *reflinking* the file, using cp --reflink=always, the result
is the same as *moving*.

Either way, the data of your image is either kept as it is or rewritten to
a new place.
If there really is corruption, in the copy case you should get an
error, and in the moving/reflinking case scrub will keep reporting errors.
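
A quick way to exercise the *copying* path without writing anything is to
read the file end to end; on btrfs a data checksum mismatch should then
surface as a read error (EIO). A minimal sketch follows. The helper name
read_check is made up, the path in the comment is only an example, and data
that is already in the page cache may not be read from disk again.

```shell
#!/bin/sh
# Read a file from start to end and discard the data.
# On btrfs, data read from disk this way is checked against its checksum,
# so a mismatch shows up as a failed read rather than silent bad data.
# read_check is a hypothetical helper name, not a btrfs tool.
read_check() {
    if cat -- "$1" > /dev/null; then
        echo "read OK: $1"
    else
        echo "read FAILED: $1"
        return 1
    fi
}

# Example call (path is illustrative):
# read_check /home/groo/Fedora/Fedora.qcow2
```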

I wonder whether there is something wrong with scrub.

Can you reproduce it with a smaller sparse file, for example one of
several megabytes?
And does it only happen in that specific Fedora directory?

Thanks,
Qu



* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-16  1:40 ` Qu Wenruo
@ 2017-08-16  1:51   ` Paulo Dias
  2017-08-16  2:28     ` Qu Wenruo
  0 siblings, 1 reply; 19+ messages in thread
From: Paulo Dias @ 2017-08-16  1:51 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Hi, thanks for the quick answer.

Since I wrote this, I have tested further.

First, as you predicted, if I try to cp the file to another
location I get read errors:

root@kerberos:/home/groo# cp Fedora/Fedora.qcow2 /
cp: error reading 'Fedora/Fedora.qcow2': Input/output error

So I used this trick:

# modprobe nbd
# qemu-nbd --connect=/dev/nbd0 Fedora2.qcow2
# ddrescue /dev/nbd0 new_file.raw
# qemu-nbd --disconnect /dev/nbd0
# qemu-img convert -O qcow2 new_file.raw new_file.qcow2

And sure enough, I was able to recreate the qcow2, but with these errors:

ago 15 22:19:49 kerberos kernel: block nbd0: Other side returned error (5)
ago 15 22:19:49 kerberos kernel: print_req_error: I/O error, dev nbd0, sector 22159872
ago 15 22:19:49 kerberos kernel: BTRFS warning (device sda3): csum failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected csum 0xe3338de1 mirror 1
ago 15 22:19:49 kerberos kernel: block nbd0: Other side returned error (5)
ago 15 22:19:49 kerberos kernel: print_req_error: I/O error, dev nbd0, sector 22160016
ago 15 22:19:49 kerberos kernel: Buffer I/O error on dev nbd0, logical block 2770002, async page read
ago 15 22:19:49 kerberos kernel: BTRFS warning (device sda3): csum failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected csum 0xe3338de1 mirror 1
ago 15 22:19:49 kerberos kernel: block nbd0: Other side returned error (5)
ago 15 22:19:49 kerberos kernel: print_req_error: I/O error, dev nbd0, sector 22160016
ago 15 22:19:49 kerberos kernel: Buffer I/O error on dev nbd0, logical block 2770002, async page read
ago 15 22:20:47 kerberos kernel: BTRFS warning (device sda3): csum failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected csum 0xe3338de1 mirror 1
ago 15 22:20:47 kerberos kernel: BTRFS warning (device sda3): csum failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected csum 0xe3338de1 mirror 1
ago 15 22:20:47 kerberos kernel: block nbd0: Other side returned error (5)
ago 15 22:20:47 kerberos kernel: print_req_error: I/O error, dev nbd0, sector 22160016
ago 15 22:20:47 kerberos kernel: BTRFS warning (device sda3): csum failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected csum 0xe3338de1 mirror 1
ago 15 22:20:47 kerberos kernel: block nbd0: Other side returned error (5)
ago 15 22:20:47 kerberos kernel: print_req_error: I/O error, dev nbd0, sector 22160016
ago 15 22:20:47 kerberos kernel: Buffer I/O error on dev nbd0, logical block 2770002, async page read
ago 15 22:20:47 kerberos kernel: BTRFS warning (device sda3): csum failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected csum 0xe3338de1 mirror 1
ago 15 22:20:47 kerberos kernel: block nbd0: Other side returned error (5)
ago 15 22:20:47 kerberos kernel: print_req_error: I/O error, dev nbd0, sector 22160016
ago 15 22:20:47 kerberos kernel: Buffer I/O error on dev nbd0, logical block 2770002, async page read
ago 15 22:20:47 kerberos kernel: BTRFS warning (device sda3): csum failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected csum 0xe3338de1 mirror 1
ago 15 22:20:47 kerberos kernel: block nbd0: Other side returned error (5)
ago 15 22:20:47 kerberos kernel: print_req_error: I/O error, dev nbd0, sector 22160016
ago 15 22:20:47 kerberos kernel: Buffer I/O error on dev nbd0, logical block 2770002, async page read
ago 15 22:20:47 kerberos kernel: BTRFS warning (device sda3): csum failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected csum 0xe3338de1 mirror 1
ago 15 22:20:47 kerberos kernel: block nbd0: Other side returned error (5)
ago 15 22:20:47 kerberos kernel: print_req_error: I/O error, dev nbd0, sector 22160016
ago 15 22:20:47 kerberos kernel: Buffer I/O error on dev nbd0, logical block 2770002, async page read
ago 15 22:20:47 kerberos kernel: BTRFS warning (device sda3): csum failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected csum 0xe3338de1 mirror 1
ago 15 22:20:47 kerberos kernel: block nbd0: Other side returned error (5)
ago 15 22:20:47 kerberos kernel: print_req_error: I/O error, dev nbd0, sector 22160016
ago 15 22:20:47 kerberos kernel: Buffer I/O error on dev nbd0, logical block 2770002, async page read
ago 15 22:20:47 kerberos kernel: BTRFS warning (device sda3): csum failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected csum 0xe3338de1 mirror 1
ago 15 22:20:47 kerberos kernel: block nbd0: Other side returned error (5)
ago 15 22:20:47 kerberos kernel: print_req_error: I/O error, dev nbd0, sector 22160016
ago 15 22:20:47 kerberos kernel: Buffer I/O error on dev nbd0, logical block 2770002, async page read
ago 15 22:20:47 kerberos kernel: BTRFS warning (device sda3): csum failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected csum 0xe3338de1 mirror 1
ago 15 22:20:47 kerberos kernel: block nbd0: Other side returned error (5)
ago 15 22:20:47 kerberos kernel: print_req_error: I/O error, dev nbd0, sector 22160016
ago 15 22:20:47 kerberos kernel: Buffer I/O error on dev nbd0, logical block 2770002, async page read
ago 15 22:20:47 kerberos kernel: BTRFS warning (device sda3): csum failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected csum 0xe3338de1 mirror 1
ago 15 22:20:47 kerberos kernel: block nbd0: Other side returned error (5)
ago 15 22:20:47 kerberos kernel: print_req_error: I/O error, dev nbd0, sector 22160016
ago 15 22:20:47 kerberos kernel: Buffer I/O error on dev nbd0, logical block 2770002, async page read
ago 15 22:21:32 kerberos kernel: block nbd0: NBD_DISCONNECT
ago 15 22:21:32 kerberos kernel: block nbd0: shutting down sockets
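
The nbd messages above name the same spot twice, once as a sector and once
as a logical block. A small shell check (helper names are invented for
illustration; the 512-byte sector and 4096-byte block sizes are assumptions,
not something the log states) suggests that both refer to the same byte
offset on /dev/nbd0:

```shell
#!/bin/sh
# Convert a sector number to a byte offset, assuming 512-byte sectors.
sector_to_bytes() { echo $(( $1 * 512 )); }
# Convert a block number to a byte offset, assuming 4096-byte blocks.
block_to_bytes() { echo $(( $1 * 4096 )); }

sector_to_bytes 22160016   # sector from print_req_error
block_to_bytes 2770002     # logical block from "Buffer I/O error"
```

Both come out as 11345928192, roughly 10.6 GiB into the virtual disk. That
differs from the offset 17455849472 in the btrfs warning, which is a
position inside the qcow2 file on the host; qcow2 maps guest clusters to
its own offsets within the image file, so the two are not expected to match.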

I deleted the original Fedora.qcow2, and again scrub said I didn't have
any errors. So I wondered whether it could be the RAID1 code (a long shot),
and I moved the metadata back to DUP:

btrfs fi balance start -dconvert=single -mconvert=dup /home/

root@kerberos:/home/groo# btrfs filesystem usage -T /home/
Overall:
    Device size:                 333.50GiB
    Device allocated:             18.06GiB
    Device unallocated:          315.44GiB
    Device missing:                  0.00B
    Used:                         16.25GiB
    Free (estimated):            315.83GiB      (min: 158.11GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:               39.45MiB      (used: 0.00B)

             Data     Metadata  System
Id Path      single   DUP       DUP      Unallocated
-- --------- -------- --------- -------- -----------
 1 /dev/sda3 16.00GiB   2.00GiB 64.00MiB   181.94GiB
 2 /dev/sdb7        -         -        -   133.03GiB
 3 /dev/sdb8        -         -        -   488.13MiB
-- --------- -------- --------- -------- -----------
   Total     16.00GiB   1.00GiB 32.00MiB   315.44GiB
   Used      15.61GiB 329.27MiB 16.00KiB

Once again I copied the NEW fedora.qcow2 back to /home and reran scrub,

and once again I got errors:

root@kerberos:/home/groo# btrfs scrub start -B /home/
scrub done for ae9ae869-720d-4643-b673-6924d09b2fe0
        scrub started at Tue Aug 15 22:36:32 2017 and finished after 00:01:04
        total bytes scrubbed: 32.56GiB with 13 errors
        error details: csum=13
        corrected errors: 0, uncorrectable errors: 13, unverified errors: 0

ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 35, gen 0
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 418909777920 on dev /dev/sda3
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 36, gen 0
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 418913218560 on dev /dev/sda3
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 37, gen 0
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 418913234944 on dev /dev/sda3
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 38, gen 0
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 418909618176 on dev /dev/sda3
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 39, gen 0
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 418909630464 on dev /dev/sda3
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 40, gen 0
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 418910056448 on dev /dev/sda3
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 41, gen 0
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 418910064640 on dev /dev/sda3
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 42, gen 0
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 418913071104 on dev /dev/sda3
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 43, gen 0
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 418912890880 on dev /dev/sda3
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 44, gen 0
ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): unable to fixup (regular) error at logical 418912997376 on dev /dev/sda3

Since I still have the original (recovered) Fedora.qcow2 in the
root volume, I went back and changed the metadata back to RAID1.

root@kerberos:/home/groo# btrfs filesystem usage -T /home/
Overall:
    Device size:                 333.50GiB
    Device allocated:             18.06GiB
    Device unallocated:          315.44GiB
    Device missing:                  0.00B
    Used:                         16.25GiB
    Free (estimated):            315.83GiB      (min: 158.11GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:               38.98MiB      (used: 0.00B)

             Data     Metadata  System
Id Path      single   RAID1     RAID1    Unallocated
-- --------- -------- --------- -------- -----------
 1 /dev/sda3 16.00GiB   1.00GiB 32.00MiB   182.97GiB
 2 /dev/sdb7        -   1.00GiB 32.00MiB   132.00GiB
 3 /dev/sdb8        -         -        -   488.13MiB
-- --------- -------- --------- -------- -----------
   Total     16.00GiB   1.00GiB 32.00MiB   315.44GiB
   Used      15.61GiB 328.80MiB 16.00KiB

And that's when you answered my email.

Now to answer your questions:

> Any special setting on the file or the Fedora directory? Like nodatasum?

Nope.

> And is there any special setup like off-line dedupe?

Nope.

It's a plain btrfs setup with discard, and that's it.

The qcow2 is the plain one created via libvirt/virt-manager.

Also, it's not the only one: if I create an image with minishift (an
OpenShift dockerized solution) I get even more errors, since I then have 2
sparse files. If I delete them, the errors go away.

I'm stumped by this.

Any ideas?
| Paulo Dias
| paulo.miguel.dias@gmail.com

Tempora mutantur, nos et mutamur in illis.


On Tue, Aug 15, 2017 at 10:40 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2017年08月16日 09:12, Paulo Dias wrote:
>>
>> Hello/2 all
>>
>> I'm using libvirt with a qcow2 image and everytime i run btrfs scrub
>> -H /home (subvolume where the image is), i get:
>>
>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
>> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 30, gen 0
>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
>> fixup (regular) error at logical 289831161856 on dev /dev/sda3
>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
>> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 31, gen 0
>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
>> fixup (regular) error at logical 289830309888 on dev /dev/sda3
>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
>> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 32, gen 0
>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
>> fixup (regular) error at logical 289831055360 on dev /dev/sda3
>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
>> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 33, gen 0
>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
>> fixup (regular) error at logical 289861591040 on dev /dev/sda3
>> ago 15 21:58:09 kerberos kernel: BTRFS warning (device sda3): checksum
>> error at logical 290297204736 on dev /dev/sda3, sector 67982824, root
>> 258, inode 968837, offset 17455849472, length 4096, links 1 (path:
>> groo/Fedora/Fedora.qcow2)
>
>
> Any special setting on the file or the Fedora directory? Like nodatasum?
>
> And is there any special setup like off-line dedupe?
>
> Considering the number of corruption, only less than 50 and not continuous
> at all, it's a little weird.
> For normal corruption, (at least on HDD) corruption range should be
> continuous, and more errors should be detected.
>
>> ago 15 21:58:09 kerberos kernel: BTRFS error (device sda3): bdev
>> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 34, gen 0
>> ago 15 21:58:09 kerberos kernel: BTRFS error (device sda3): unable to
>> fixup (regular) error at logical 290297204736 on dev /dev/sda3
>>
>> The thing is, as soon as i move the image to another subvolume, root
>> in this case, and delete it, the errors go away and scrub tells me i
>> have zero errors again.
>
>
> This makes things even more weird.
>
> If you're *moving* the file to another subvolume, its data stays where
> it was, nothing is modified.
>
> If you're *copying* the file to another subvolume without reflinking, then
> the kernel will read the data and write it to a new place.
> During the read, it will verify the data checksum, and if it doesn't match,
> you'll get an EIO error during the copy.
>
> If you're *reflinking* the file, using cp --reflink=always, the result is
> the same as *moving*.
>
> Anyway, the data of your image is either kept as it is or rewritten to a
> new place.
> If there is really some corruption, in the copy case you should get an
> error, and in the moving/reflinking case scrub will always report errors.
>
> I suspect there is something wrong with scrub.
>
> Can you reproduce it with a smaller sparse file, for example one that is
> a few megabytes in size?
> And does it only happen in that specific Fedora directory?
>
> Thanks,
> Qu
>
>>
>> Then if i AGAIN copy the file back to /home, i get the same errors.
>>
>> qemu-img check tells me the qcow2 file is fine, and smart doesnt show
>> me anything wrong with my ssd:
>>
>> root@kerberos:/home/groo# smartctl -Ai /dev/sda
>> smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.13.0-041300rc4-generic]
>> (local build)
>> Copyright (C) 2002-16, Bruce Allen, Christian Franke,
>> www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Model Family:     Samsung based SSDs
>> Device Model:     Samsung SSD 850 EVO M.2 500GB
>> Serial Number:    S33DNX0H812686V
>> LU WWN Device Id: 5 002538 d4130d027
>> Firmware Version: EMT21B6Q
>> User Capacity:    500.107.862.016 bytes [500 GB]
>> Sector Size:      512 bytes logical/physical
>> Rotation Rate:    Solid State Device
>> Form Factor:      M.2
>> Device is:        In smartctl database [for details use: -P show]
>> ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is:    Tue Aug 15 21:59:34 2017 -03
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART Attributes Data Structure revision number: 1
>> Vendor Specific SMART Attributes with Thresholds:
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE
>> UPDATED  WHEN_FAILED RAW_VALUE
>>    5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail
>> Always       -       0
>>    9 Power_On_Hours          0x0032   099   099   000    Old_age
>> Always       -       1739
>>   12 Power_Cycle_Count       0x0032   099   099   000    Old_age
>> Always       -       392
>> 177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail
>> Always       -       7
>> 179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail
>> Always       -       0
>> 181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age
>> Always       -       0
>> 182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age
>> Always       -       0
>> 183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail
>> Always       -       0
>> 187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age
>> Always       -       0
>> 190 Airflow_Temperature_Cel 0x0032   061   050   000    Old_age
>> Always       -       39
>> 195 ECC_Error_Rate          0x001a   200   200   000    Old_age
>> Always       -       0
>> 199 CRC_Error_Count         0x003e   100   100   000    Old_age
>> Always       -       0
>> 235 POR_Recovery_Count      0x0012   099   099   000    Old_age
>> Always       -       54
>> 241 Total_LBAs_Written      0x0032   099   099   000    Old_age
>> Always       -       7997549567
>>
>> this is the usage for /home:
>>
>> root@kerberos:/home/groo# btrfs filesystem usage -T /home/
>> Overall:
>>      Device size:                 333.50GiB
>>      Device allocated:             74.12GiB
>>      Device unallocated:          259.38GiB
>>      Device missing:                  0.00B
>>      Used:                         32.70GiB
>>      Free (estimated):            297.36GiB      (min: 167.67GiB)
>>      Data ratio:                       1.00
>>      Metadata ratio:                   2.00
>>      Global reserve:               58.12MiB      (used: 0.00B)
>>
>>               Data     Metadata  System
>> Id Path      single   RAID1     RAID1    Unallocated
>> -- --------- -------- --------- -------- -----------
>>   1 /dev/sda3 68.00GiB   2.00GiB 64.00MiB   129.94GiB
>>   2 /dev/sdb7  2.00GiB   2.00GiB 64.00MiB   128.96GiB
>>   3 /dev/sdb8        -         -        -   488.13MiB
>> -- --------- -------- --------- -------- -----------
>>     Total     70.00GiB   2.00GiB 64.00MiB   259.38GiB
>>     Used      32.02GiB 348.12MiB 16.00KiB
>>
>> and for root subvolume:
>>
>> root@kerberos:/home/groo# btrfs filesystem usage -T /
>> Overall:
>>      Device size:                  65.29GiB
>>      Device allocated:             65.28GiB
>>      Device unallocated:           12.00MiB
>>      Device missing:                  0.00B
>>      Used:                         14.94GiB
>>      Free (estimated):             48.72GiB      (min: 48.72GiB)
>>      Data ratio:                       1.00
>>      Metadata ratio:                   1.00
>>      Global reserve:               42.20MiB      (used: 0.00B)
>>
>>               Data     Metadata  System
>> Id Path      single   single    single   Unallocated
>> -- --------- -------- --------- -------- -----------
>>   1 /dev/sda2 63.24GiB   2.01GiB 32.00MiB    12.00MiB
>> -- --------- -------- --------- -------- -----------
>>     Total     63.24GiB   2.01GiB 32.00MiB    12.00MiB
>>     Used      14.52GiB 425.16MiB 16.00KiB
>>
>> i see this with both kernel 4.12 and 4.13rc4
>>
>> the btrfstools are:
>>
>> root@kerberos:/home/groo# btrfs version
>> btrfs-progs v4.12-dirty
>>
>> /etc/fstab:
>>
>> UUID=e31faa09-99e5-4c75-815c-629402ec92f2 /               btrfs
>> defaults,discard,subvol=@ 0       1
>> # /boot was on /dev/sda1 during installation
>> UUID=55796428-a9b8-4f1b-9a7e-8fe3aa8d8097 /boot           ext4
>> defaults        0       2
>> # /boot/efi was on /dev/sdb2 during installation
>> UUID=D4F8-9F87  /boot/efi       vfat    umask=0077      0       1
>> # /home was on /dev/sda3 during installation
>> UUID=ae9ae869-720d-4643-b673-6924d09b2fe0 /home           btrfs
>> defaults,discard,subvol=@home 0       2
>> # swap was on /dev/sdb6 during installation
>> #UUID=fc2a432b-4c40-4fe4-9730-869a1d1911ef none            swap    sw
>>              0       0
>> /dev/mapper/cryptswap1 none swap sw 0 0
>>
>>
>> this is reproducible every single time.
>>
>> is btrfs scrub maybe getting confused with a sparse file? is it
>> possible to get a bad checksum with raid1 in this scenario?
>>
>> any help is appreciated
>>
>> | Paulo Dias
>> | paulo.miguel.dias@gmail.com
>>
>> Tempora mutantur, nos et mutamur in illis.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-16  1:51   ` Paulo Dias
@ 2017-08-16  2:28     ` Qu Wenruo
  2017-08-16  2:46       ` Qu Wenruo
  2017-08-16  7:47       ` Qu Wenruo
  0 siblings, 2 replies; 19+ messages in thread
From: Qu Wenruo @ 2017-08-16  2:28 UTC (permalink / raw)
  To: Paulo Dias; +Cc: linux-btrfs



On 2017-08-16 09:51, Paulo Dias wrote:
> Hi, thanks for the quick answer.
> 
> So, since i wrote this i tested this even further.
> 
> First, and as you predicted, if i try to cp the file to another
> location i get read errors:
> 
> root@kerberos:/home/groo# cp Fedora/Fedora.qcow2 /
> cp: error reading 'Fedora/Fedora.qcow2': Input/output error

Scrub is less likely to be the culprit now.
Since the normal read routine also reports this error, it may be a real
corruption of the file.

> 
> so i used this trick:
> 
> # modprobe nbd
> # qemu-nbd --connect=/dev/nbd0 Fedora2.qcow2
> # ddrescue /dev/nbd0 new_file.raw
> # qemu-nbd --disconnect /dev/nbd0
> # qemu-img convert -O qcow2 new_file.raw new_file.qcow2
> 
> and sure enough i was able to recreate the qcow2 but with these errors:
> 
> ago 15 22:19:49 kerberos kernel: block nbd0: Other side returned error (5)
> ago 15 22:19:49 kerberos kernel: print_req_error: I/O error, dev nbd0,
> sector 22159872
> ago 15 22:19:49 kerberos kernel: BTRFS warning (device sda3): csum
> failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected
> csum 0xe3338de1 mirror 1

Still a csum error.
Furthermore, neither the expected csum nor the on-disk one is a special
value, such as the crc32 of an all-zero page.
So it may well be a real corruption.
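
To check the "special value" point, one can compute the csum of an
all-zero 4K page and compare it with the two values in the log. This is
only a minimal sketch: it assumes the filesystem uses the default crc32c
checksum and that this matches standard CRC-32C (initial value 0xFFFFFFFF,
final inversion). crc32c is a helper name made up for this example.

```shell
#!/bin/sh
# Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
# Reads raw bytes from stdin and prints the checksum as 0x%08x.
# Assumption: this is the same function btrfs uses for its default csum.
# Note: plain sh has no local variables, so crc, byte and i are global.
crc32c() {
    crc=$(( 0xFFFFFFFF ))
    # od prints every input byte as an unsigned decimal number;
    # -v keeps it from collapsing repeated lines (needed for zero pages).
    for byte in $(od -An -v -tu1); do
        crc=$(( crc ^ byte ))
        i=0
        while [ "$i" -lt 8 ]; do
            if [ $(( crc & 1 )) -eq 1 ]; then
                crc=$(( (crc >> 1) ^ 0x82F63B78 ))
            else
                crc=$(( crc >> 1 ))
            fi
            i=$(( i + 1 ))
        done
    done
    printf '0x%08x\n' "$(( crc ^ 0xFFFFFFFF ))"
}

# Example: csum of one all-zero 4K page
#   dd if=/dev/zero bs=4096 count=1 2>/dev/null | crc32c
```

If the printed value differs from both 0xcc028588 and 0xe3338de1, neither
csum in the log belongs to a zero-filled page.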

> ago 15 22:19:49 kerberos kernel: block nbd0: Other side returned error (5)
> ago 15 22:19:49 kerberos kernel: print_req_error: I/O error, dev nbd0,
> sector 22160016
> ago 15 22:19:49 kerberos kernel: Buffer I/O error on dev nbd0, logical
> block 2770002, async page read
> ago 15 22:19:49 kerberos kernel: BTRFS warning (device sda3): csum
> failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected
> csum 0xe3338de1 mirror 1

At least we now know which inode (968837 of root 258) and which file
offset (17455849472, length 4K) are corrupted.
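
With the inode and offset known, the bad block can be re-read on its own
instead of copying the whole image. A rough sketch, assuming a 4K block
size and the path reported by scrub earlier in the thread; offset_to_block
and read_block are names made up for this example.

```shell
#!/bin/sh
# Convert a byte offset inside the file into a block index.
# The block size defaults to 4096, matching "length 4096" in the scrub log.
offset_to_block() {
    echo $(( $1 / ${2:-4096} ))
}

# Read exactly one 4K block at the given byte offset and discard it.
# If the block really fails its checksum, dd should report an I/O error.
read_block() {
    dd if="$1" of=/dev/null bs=4096 skip="$(offset_to_block "$2")" count=1
}

# Example (path as shown in the scrub log, relative to the /home mount):
#   read_block /home/groo/Fedora/Fedora.qcow2 17455849472
```

If the path were not already known from the scrub log, something like
"btrfs inspect-internal inode-resolve 968837 /home" should map the inode
number back to a file name.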

> ago 15 22:19:49 kerberos kernel: block nbd0: Other side returned error (5)
> ago 15 22:19:49 kerberos kernel: print_req_error: I/O error, dev nbd0,
> sector 22160016
> ago 15 22:19:49 kerberos kernel: Buffer I/O error on dev nbd0, logical
> block 2770002, async page read
> ago 15 22:20:47 kerberos kernel: BTRFS warning (device sda3): csum
> failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected
> csum 0xe3338de1 mirror 1
> ago 15 22:20:47 kerberos kernel: BTRFS warning (device sda3): csum
> failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected
> csum 0xe3338de1 mirror 1
<snip>
> block 2770002, async page read
> ago 15 22:21:32 kerberos kernel: block nbd0: NBD_DISCONNECT
> ago 15 22:21:32 kerberos kernel: block nbd0: shutting down sockets
> 
> i deleted the original Fedora.qcow2 and again scrub said i didn't have
> any errors, so i wondered, could it be the raid1 code (long shot), so
> i moved the metadata back to DUP.
> 
> btrfs fi balance start -dconvert=single -mconvert=dup /home/

OK, data is not touched.
Single to single, so data chunks are not touched.
And your metadata is always good, so no problem should happen during 
balance.

BTW, if you balance data (no need to convert, just balance all data),
it should also report errors if my assumption is correct:
some data is *really* corrupted.

> 
> root@kerberos:/home/groo# btrfs filesystem usage -T /home/
> Overall:
>      Device size:                 333.50GiB
>      Device allocated:             18.06GiB
>      Device unallocated:          315.44GiB
>      Device missing:                  0.00B
>      Used:                         16.25GiB
>      Free (estimated):            315.83GiB      (min: 158.11GiB)
>      Data ratio:                       1.00
>      Metadata ratio:                   2.00
>      Global reserve:               39.45MiB      (used: 0.00B)
> 
>               Data     Metadata  System
> Id Path      single   DUP       DUP      Unallocated
> -- --------- -------- --------- -------- -----------
>   1 /dev/sda3 16.00GiB   2.00GiB 64.00MiB   181.94GiB
>   2 /dev/sdb7        -         -        -   133.03GiB
>   3 /dev/sdb8        -         -        -   488.13MiB
> -- --------- -------- --------- -------- -----------
>     Total     16.00GiB   1.00GiB 32.00MiB   315.44GiB
>     Used      15.61GiB 329.27MiB 16.00KiB
> 
> and once again copied the NEW fedora.qcow2 back to home and rerun scrub
> and once again i got errors:
> 
> root@kerberos:/home/groo# btrfs scrub start -B /home/
> scrub done for ae9ae869-720d-4643-b673-6924d09b2fe0
>          scrub started at Tue Aug 15 22:36:32 2017 and finished after 00:01:04
>          total bytes scrubbed: 32.56GiB with 13 errors
>          error details: csum=13
>          corrected errors: 0, uncorrectable errors: 13, unverified errors: 0
> 
> ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev
> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 35, gen 0
> ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): unable to
> fixup (regular) error at logical 418909777920 on dev /dev/sda3
> ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev
<snip>
> ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev
> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 44, gen 0
> ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): unable to
> fixup (regular) error at logical 418912997376 on dev /dev/sda3
> 
> since i still have the original (recovered) Fedora.qcow2 back in the
> root volume, i went back and changed the metadata back to raid1.
> 
> root@kerberos:/home/groo# btrfs filesystem usage -T /home/
> Overall:
>      Device size:                 333.50GiB
>      Device allocated:             18.06GiB
>      Device unallocated:          315.44GiB
>      Device missing:                  0.00B
>      Used:                         16.25GiB
>      Free (estimated):            315.83GiB      (min: 158.11GiB)
>      Data ratio:                       1.00
>      Metadata ratio:                   2.00
>      Global reserve:               38.98MiB      (used: 0.00B)
> 
>               Data     Metadata  System
> Id Path      single   RAID1     RAID1    Unallocated
> -- --------- -------- --------- -------- -----------
>   1 /dev/sda3 16.00GiB   1.00GiB 32.00MiB   182.97GiB
>   2 /dev/sdb7        -   1.00GiB 32.00MiB   132.00GiB
>   3 /dev/sdb8        -         -        -   488.13MiB
> -- --------- -------- --------- -------- -----------
>     Total     16.00GiB   1.00GiB 32.00MiB   315.44GiB
>     Used      15.61GiB 328.80MiB 16.00KiB
> 
> and thats when you answered my email.
> 
> now to answer your questions:
> 
> Any special setting on the file or the Fedora directory? Like nodatasum?
> 
> nope
> 
> And is there any special setup like off-line dedupe?
> 
> nope
> 
> its a plain btrfs setup with discard and thats it.

Oh, discard.
IIRC there used to be some discard-related problems that led to data
corruption.
Not sure if it's related.

As a general recommendation, it's better to run fstrim periodically
rather than use the discard mount option.

Would you please try mounting without discard and deleting the related
files, then make sure that scrub and cat (just cat all files and redirect
the output to /dev/null; in that case the error reporting is better than
scrub's) report nothing wrong?

Then recreate the file from another backup (not on the same btrfs), and
scrub again to verify whether it's good or not.
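
The steps above could look roughly like this. It is only a sketch: the
mount point /home and the fstab options come from earlier in the thread,
strip_discard and read_all_files are hypothetical helper names, and the
commands in the trailing comments need root.

```shell
#!/bin/sh
# Drop "discard" (and "discard=...") from a comma-separated mount option
# list, e.g. the options field of an /etc/fstab entry.
strip_discard() {
    out=
    old_ifs=$IFS
    IFS=,
    for opt in $1; do
        case $opt in
            discard|discard=*) ;;
            *) out="${out:+$out,}$opt" ;;
        esac
    done
    IFS=$old_ifs
    echo "$out"
}

# Read every regular file under a directory and throw the data away.
# Files that fail checksum verification show up as cat errors on stderr.
read_all_files() {
    find "$1" -xdev -type f -exec cat {} + > /dev/null
}

# Possible sequence, as root:
#   mount -o remount,nodiscard /home   # and remove "discard" from /etc/fstab
#   read_all_files /home
#   btrfs scrub start -B /home
#   fstrim -v /home                    # run periodically instead of discard
```

For the /home entry quoted above, strip_discard
"defaults,discard,subvol=@home" prints "defaults,subvol=@home".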

Thanks,
Qu

> 
> the qcow2 is the plain one created via libvirt/virt-manager.
> 
> also, it's not the only one, if i create an image with minishift (an
> openshift dockerized solution) i get even more errors, since i have 2
> sparse files. if i delete them, the errors go away.
> 
> im stumped at this.
> 
> any ideas?
> | Paulo Dias
> | paulo.miguel.dias@gmail.com
> 
> Tempora mutantur, nos et mutamur in illis.
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-16  2:28     ` Qu Wenruo
@ 2017-08-16  2:46       ` Qu Wenruo
  2017-08-16  7:47       ` Qu Wenruo
  1 sibling, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2017-08-16  2:46 UTC (permalink / raw)
  To: Paulo Dias; +Cc: linux-btrfs



On 2017-08-16 10:28, Qu Wenruo wrote:
> 
> OK, data is not touched.
> Single to single, so data chunks are not touched.
> And your metadata is always good, so no problem should happen during 
> balance.
Sorry, this part is wrong.

The data chunks are relocated, so I'm curious why there is no kernel log
warning about the csum mismatch.
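
One way to look for that warning is to run a plain data balance and then
filter the kernel log. A small sketch; count_csum_failures is a
hypothetical helper, and the pattern is taken from the "csum failed" lines
quoted earlier in this thread.

```shell
#!/bin/sh
# Count kernel log lines that report a btrfs data checksum failure.
# Reads log text from stdin and prints the number of matching lines.
# grep -c exits non-zero when nothing matches, hence the "|| true".
count_csum_failures() {
    grep -c 'BTRFS warning.*csum failed' || true
}

# Possible use, as root:
#   btrfs balance start -d /home        # relocate all data chunks
#   dmesg | count_csum_failures
```

A count of 0 after the balance would match the observation above that no
warning was logged.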

Thanks,
Qu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-16  2:28     ` Qu Wenruo
  2017-08-16  2:46       ` Qu Wenruo
@ 2017-08-16  7:47       ` Qu Wenruo
  1 sibling, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2017-08-16  7:47 UTC (permalink / raw)
  To: Paulo Dias; +Cc: linux-btrfs

BTW, to determine whether it's really data corruption, you could verify
the data checksums by running "btrfs check --check-data-csum".

--check-data-csum has the limitation that it skips the remaining mirrors
if the first mirror is correct, but since your data profile is single,
that limitation is not a problem at all.

Or, you could also try the out-of-tree btrfs-progs with offline scrub
support:
https://github.com/adam900710/btrfs-progs/tree/offline_scrub

It should work much like a btrfs-progs equivalent of the kernel scrub.
Running "btrfs scrub start --offline <your device>" should verify all
checksums for data and metadata.

If btrfs-progs reports csum errors (for data), then the data is really
corrupted, very likely because of the discard mount option.
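
A possible way to run the first check. btrfs check is meant to be run on
an unmounted filesystem, so this sketch first looks the device up in a
mounts table. is_mounted and check_data_csum are hypothetical helper
names, and /dev/sda3 is the device from the logs above.

```shell
#!/bin/sh
# Succeed if the device appears as a source in a mounts table.
# The table defaults to /proc/mounts; a second argument allows testing.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Verify data checksums offline. Needs root and an unmounted filesystem.
check_data_csum() {
    if is_mounted "$1"; then
        echo "$1 is mounted, unmount it first" >&2
        return 1
    fi
    btrfs check --check-data-csum "$1"
}

# Example:
#   umount /home && check_data_csum /dev/sda3
```

On a multi-device filesystem the mounts table may list a different member
device, so the is_mounted test is only a convenience, not a guarantee.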

Thanks,
Qu

On 2017-08-16 10:28, Qu Wenruo wrote:
> 
> 
> On 2017-08-16 09:51, Paulo Dias wrote:
>> Hi, thanks for the quick answer.
>>
>> So, since i wrote this i tested this even further.
>>
>> First, and as you predicted, if i try to cp the file to another
>> location i get read errors:
>>
>> root@kerberos:/home/groo# cp Fedora/Fedora.qcow2 /
>> cp: error reading 'Fedora/Fedora.qcow2': Input/output error
> 
> Scrub is less likely to be the culprit now.
> Since the normal read routine also reports this error, it may be a real
> corruption of the file.
> 
>>
>> so i used this trick:
>>
>> # modprobe nbd
>> # qemu-nbd --connect=/dev/nbd0 Fedora2.qcow2
>> # ddrescue /dev/nbd0 new_file.raw
>> # qemu-nbd --disconnect /dev/nbd0
>> # qemu-img convert -O qcow2 new_file.raw new_file.qcow2
>>
>> and sure enough i was able to recreate the qcow2 but with these errors:
>>
>> ago 15 22:19:49 kerberos kernel: block nbd0: Other side returned error 
>> (5)
>> ago 15 22:19:49 kerberos kernel: print_req_error: I/O error, dev nbd0,
>> sector 22159872
>> ago 15 22:19:49 kerberos kernel: BTRFS warning (device sda3): csum
>> failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected
>> csum 0xe3338de1 mirror 1
> 
> Still a csum error.
> Furthermore, neither the expected csum nor the on-disk one is a special
> value, such as the crc32 of an all-zero page.
> So it may well be a real corruption.
> 
>> ago 15 22:19:49 kerberos kernel: block nbd0: Other side returned error 
>> (5)
>> ago 15 22:19:49 kerberos kernel: print_req_error: I/O error, dev nbd0,
>> sector 22160016
>> ago 15 22:19:49 kerberos kernel: Buffer I/O error on dev nbd0, logical
>> block 2770002, async page read
>> ago 15 22:19:49 kerberos kernel: BTRFS warning (device sda3): csum
>> failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected
>> csum 0xe3338de1 mirror 1
> 
> At least we now know which inode (968837 of root 258) and which file
> offset (17455849472, length 4K) are corrupted.
> 
>> ago 15 22:19:49 kerberos kernel: block nbd0: Other side returned error 
>> (5)
>> ago 15 22:19:49 kerberos kernel: print_req_error: I/O error, dev nbd0,
>> sector 22160016
>> ago 15 22:19:49 kerberos kernel: Buffer I/O error on dev nbd0, logical
>> block 2770002, async page read
>> ago 15 22:20:47 kerberos kernel: BTRFS warning (device sda3): csum
>> failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected
>> csum 0xe3338de1 mirror 1
>> ago 15 22:20:47 kerberos kernel: BTRFS warning (device sda3): csum
>> failed root 258 ino 968837 off 17455849472 csum 0xcc028588 expected
>> csum 0xe3338de1 mirror 1
> <snip>
>> block 2770002, async page read
>> ago 15 22:21:32 kerberos kernel: block nbd0: NBD_DISCONNECT
>> ago 15 22:21:32 kerberos kernel: block nbd0: shutting down sockets
>>
>> i deleted the original Fedora.qcow2 and again scrub said i didn't have
>> any errors, so i wondered, could it be the raid1 code (long shot), so
>> i moved the metadata back to DUP.
>>
>> btrfs fi balance start -dconvert=single -mconvert=dup /home/
> 
> OK, data is not touched.
> Single to single, so data chunks are not touched.
> And your metadata is always good, so no problem should happen during 
> balance.
> 
> BTW, if you balance data (no need to convert, just balance all data),
> it should also report errors if my assumption is correct:
> some data is *really* corrupted.
> 
>>
>> root@kerberos:/home/groo# btrfs filesystem usage -T /home/
>> Overall:
>>      Device size:                 333.50GiB
>>      Device allocated:             18.06GiB
>>      Device unallocated:          315.44GiB
>>      Device missing:                  0.00B
>>      Used:                         16.25GiB
>>      Free (estimated):            315.83GiB      (min: 158.11GiB)
>>      Data ratio:                       1.00
>>      Metadata ratio:                   2.00
>>      Global reserve:               39.45MiB      (used: 0.00B)
>>
>>               Data     Metadata  System
>> Id Path      single   DUP       DUP      Unallocated
>> -- --------- -------- --------- -------- -----------
>>   1 /dev/sda3 16.00GiB   2.00GiB 64.00MiB   181.94GiB
>>   2 /dev/sdb7        -         -        -   133.03GiB
>>   3 /dev/sdb8        -         -        -   488.13MiB
>> -- --------- -------- --------- -------- -----------
>>     Total     16.00GiB   1.00GiB 32.00MiB   315.44GiB
>>     Used      15.61GiB 329.27MiB 16.00KiB
>>
>> and once again copied the NEW fedora.qcow2 back to home and rerun scrub
>> and once again i got errors:
>>
>> root@kerberos:/home/groo# btrfs scrub start -B /home/
>> scrub done for ae9ae869-720d-4643-b673-6924d09b2fe0
>>          scrub started at Tue Aug 15 22:36:32 2017 and finished after 
>> 00:01:04
>>          total bytes scrubbed: 32.56GiB with 13 errors
>>          error details: csum=13
>>          corrected errors: 0, uncorrectable errors: 13, unverified 
>> errors: 0
>>
>> ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev
>> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 35, gen 0
>> ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): unable to
>> fixup (regular) error at logical 418909777920 on dev /dev/sda3
>> ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev
> <snip>
>> ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): bdev
>> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 44, gen 0
>> ago 15 22:37:36 kerberos kernel: BTRFS error (device sda3): unable to
>> fixup (regular) error at logical 418912997376 on dev /dev/sda3
>>
>> since i still have the original (recovered) Fedora.qcow2 back in the
>> root volume, i went back and changed the metadata back to raid1.
>>
>> root@kerberos:/home/groo# btrfs filesystem usage -T /home/
>> Overall:
>>      Device size:                 333.50GiB
>>      Device allocated:             18.06GiB
>>      Device unallocated:          315.44GiB
>>      Device missing:                  0.00B
>>      Used:                         16.25GiB
>>      Free (estimated):            315.83GiB      (min: 158.11GiB)
>>      Data ratio:                       1.00
>>      Metadata ratio:                   2.00
>>      Global reserve:               38.98MiB      (used: 0.00B)
>>
>>               Data     Metadata  System
>> Id Path      single   RAID1     RAID1    Unallocated
>> -- --------- -------- --------- -------- -----------
>>   1 /dev/sda3 16.00GiB   1.00GiB 32.00MiB   182.97GiB
>>   2 /dev/sdb7        -   1.00GiB 32.00MiB   132.00GiB
>>   3 /dev/sdb8        -         -        -   488.13MiB
>> -- --------- -------- --------- -------- -----------
>>     Total     16.00GiB   1.00GiB 32.00MiB   315.44GiB
>>     Used      15.61GiB 328.80MiB 16.00KiB
>>
>> and thats when you answered my email.
>>
>> now to answer your questions:
>>
>> Any special setting on the file or the Fedora directory? Like nodatasum?
>>
>> nope
>>
>> And is there any special setup like off-line dedupe?
>>
>> nope
>>
>> its a plain btrfs setup with discard and thats it.
> 
> Oh, discard.
> IIRC there used to be some discard-related problems that led to data
> corruption.
> Not sure if it's related.
> 
> As a general recommendation, it's better to run fstrim periodically
> rather than use the discard mount option.
> 
> Would you please try to mount without discard and delete the affected
> files, then make sure that scrub and cat (just cat all files to
> /dev/null; in that case the error report is better than scrub's)
> report nothing wrong.
> 
> Then recreate the file from another backup (not on the same btrfs), and
> scrub again to verify whether it's good or not.
> 
> Thanks,
> Qu
> 
>>
>> the qcow2 is the plain one created via libvirt/virt-manager.
>>
>> also, its not the only one, if i create an image with minishift (a
>> openshift dockerized solution) i get even more errors, since i have 2
>> sparse files. if i delete them, the errors go away.
>>
>> im stumped at this.
>>
>> any ideas?
>> | Paulo Dias
>> | paulo.miguel.dias@gmail.com
>>
>> Tempora mutantur, nos et mutamur in illis.
>>
>>
>> On Tue, Aug 15, 2017 at 10:40 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>>
>>> On 2017-08-16 09:12, Paulo Dias wrote:
>>>>
>>>> Hello/2 all
>>>>
>>>> I'm using libvirt with a qcow2 image and everytime i run btrfs scrub
>>>> -H /home (subvolume where the image is), i get:
>>>>
>>>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
>>>> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 30, gen 0
>>>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
>>>> fixup (regular) error at logical 289831161856 on dev /dev/sda3
>>>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
>>>> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 31, gen 0
>>>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
>>>> fixup (regular) error at logical 289830309888 on dev /dev/sda3
>>>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
>>>> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 32, gen 0
>>>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
>>>> fixup (regular) error at logical 289831055360 on dev /dev/sda3
>>>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
>>>> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 33, gen 0
>>>> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
>>>> fixup (regular) error at logical 289861591040 on dev /dev/sda3
>>>> ago 15 21:58:09 kerberos kernel: BTRFS warning (device sda3): checksum
>>>> error at logical 290297204736 on dev /dev/sda3, sector 67982824, root
>>>> 258, inode 968837, offset 17455849472, length 4096, links 1 (path:
>>>> groo/Fedora/Fedora.qcow2)
>>>
>>>
>>> Any special setting on the file or the Fedora directory? Like nodatasum?
>>>
>>> And is there any special setup like off-line dedupe?
>>>
>>> Considering the number of corruptions, fewer than 50 and not
>>> contiguous at all, it's a little weird.
>>> For normal corruption (at least on HDD), the corrupted range should be
>>> contiguous, and more errors should be detected.
>>>
>>>> ago 15 21:58:09 kerberos kernel: BTRFS error (device sda3): bdev
>>>> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 34, gen 0
>>>> ago 15 21:58:09 kerberos kernel: BTRFS error (device sda3): unable to
>>>> fixup (regular) error at logical 290297204736 on dev /dev/sda3
>>>>
>>>> The thing is, as soon as i move the image to another subvolume, root
>>>> in this case, and delete it, the errors go away and scrub tells me i
>>>> have zero errors again.
>>>
>>>
>>> This makes things even more weird.
>>>
>>> If you're *moving* the file to another subvolume, its data stays
>>> where it was; nothing is modified.
>>>
>>> If you're *copying* the file to another subvolume without reflinking,
>>> the kernel will read the data out and write it back to a new place.
>>> During the read, it will verify the data checksum, and if it doesn't
>>> match, you'll get an EIO error during the copy.
>>>
>>> If you're *reflinking* the file, using cp --reflink=always, the result
>>> is the same as *moving*.
>>>
>>> Anyway, the data of your image is either kept as it is, or rewritten
>>> to a new place.
>>> If there is really some corruption, in the copy case you should get an
>>> error, and in the moving/reflinking case scrub will always report an
>>> error.
>>>
>>> I doubt if there is something wrong with scrub.
>>>
>>> Can you even reproduce it with a smaller sparse file? For example,
>>> several megabytes in size.
>>> And is it only happening in that specific Fedora directory?
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>> Then if i AGAIN copy the file back to /home, i get the same errors.
>>>>
>>>> qemu-img check tells me the qcow2 file is fine, and smart doesnt show
>>>> me anything wrong with my ssd:
>>>>
>>>> root@kerberos:/home/groo# smartctl -Ai /dev/sda
>>>> smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.13.0-041300rc4-generic] (local build)
>>>> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>
>>>> === START OF INFORMATION SECTION ===
>>>> Model Family:     Samsung based SSDs
>>>> Device Model:     Samsung SSD 850 EVO M.2 500GB
>>>> Serial Number:    S33DNX0H812686V
>>>> LU WWN Device Id: 5 002538 d4130d027
>>>> Firmware Version: EMT21B6Q
>>>> User Capacity:    500.107.862.016 bytes [500 GB]
>>>> Sector Size:      512 bytes logical/physical
>>>> Rotation Rate:    Solid State Device
>>>> Form Factor:      M.2
>>>> Device is:        In smartctl database [for details use: -P show]
>>>> ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
>>>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>>>> Local Time is:    Tue Aug 15 21:59:34 2017 -03
>>>> SMART support is: Available - device has SMART capability.
>>>> SMART support is: Enabled
>>>>
>>>> === START OF READ SMART DATA SECTION ===
>>>> SMART Attributes Data Structure revision number: 1
>>>> Vendor Specific SMART Attributes with Thresholds:
>>>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>>>   5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
>>>>   9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1739
>>>>  12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       392
>>>> 177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       7
>>>> 179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
>>>> 181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
>>>> 182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
>>>> 183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
>>>> 187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
>>>> 190 Airflow_Temperature_Cel 0x0032   061   050   000    Old_age   Always       -       39
>>>> 195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
>>>> 199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
>>>> 235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       54
>>>> 241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       7997549567
>>>>
>>>> this is the usage for /home:
>>>>
>>>> root@kerberos:/home/groo# btrfs filesystem usage -T /home/
>>>> Overall:
>>>>       Device size:                 333.50GiB
>>>>       Device allocated:             74.12GiB
>>>>       Device unallocated:          259.38GiB
>>>>       Device missing:                  0.00B
>>>>       Used:                         32.70GiB
>>>>       Free (estimated):            297.36GiB      (min: 167.67GiB)
>>>>       Data ratio:                       1.00
>>>>       Metadata ratio:                   2.00
>>>>       Global reserve:               58.12MiB      (used: 0.00B)
>>>>
>>>>                Data     Metadata  System
>>>> Id Path      single   RAID1     RAID1    Unallocated
>>>> -- --------- -------- --------- -------- -----------
>>>>    1 /dev/sda3 68.00GiB   2.00GiB 64.00MiB   129.94GiB
>>>>    2 /dev/sdb7  2.00GiB   2.00GiB 64.00MiB   128.96GiB
>>>>    3 /dev/sdb8        -         -        -   488.13MiB
>>>> -- --------- -------- --------- -------- -----------
>>>>      Total     70.00GiB   2.00GiB 64.00MiB   259.38GiB
>>>>      Used      32.02GiB 348.12MiB 16.00KiB
>>>>
>>>> and for root subvolume:
>>>>
>>>> root@kerberos:/home/groo# btrfs filesystem usage -T /
>>>> Overall:
>>>>       Device size:                  65.29GiB
>>>>       Device allocated:             65.28GiB
>>>>       Device unallocated:           12.00MiB
>>>>       Device missing:                  0.00B
>>>>       Used:                         14.94GiB
>>>>       Free (estimated):             48.72GiB      (min: 48.72GiB)
>>>>       Data ratio:                       1.00
>>>>       Metadata ratio:                   1.00
>>>>       Global reserve:               42.20MiB      (used: 0.00B)
>>>>
>>>>                Data     Metadata  System
>>>> Id Path      single   single    single   Unallocated
>>>> -- --------- -------- --------- -------- -----------
>>>>    1 /dev/sda2 63.24GiB   2.01GiB 32.00MiB    12.00MiB
>>>> -- --------- -------- --------- -------- -----------
>>>>      Total     63.24GiB   2.01GiB 32.00MiB    12.00MiB
>>>>      Used      14.52GiB 425.16MiB 16.00KiB
>>>>
>>>> i see this with both kernel 4.12 and 4.13rc4
>>>>
>>>> the btrfstools are:
>>>>
>>>> root@kerberos:/home/groo# btrfs version
>>>> btrfs-progs v4.12-dirty
>>>>
>>>> /etc/fstab:
>>>>
>>>> UUID=e31faa09-99e5-4c75-815c-629402ec92f2 /               btrfs   defaults,discard,subvol=@ 0       1
>>>> # /boot was on /dev/sda1 during installation
>>>> UUID=55796428-a9b8-4f1b-9a7e-8fe3aa8d8097 /boot           ext4    defaults        0       2
>>>> # /boot/efi was on /dev/sdb2 during installation
>>>> UUID=D4F8-9F87  /boot/efi       vfat    umask=0077      0       1
>>>> # /home was on /dev/sda3 during installation
>>>> UUID=ae9ae869-720d-4643-b673-6924d09b2fe0 /home           btrfs   defaults,discard,subvol=@home 0       2
>>>> # swap was on /dev/sdb6 during installation
>>>> #UUID=fc2a432b-4c40-4fe4-9730-869a1d1911ef none            swap    sw              0       0
>>>> /dev/mapper/cryptswap1 none swap sw 0 0
>>>>
>>>>
>>>> this is reproducible every single time.
>>>>
>>>> is btrfs scrub maybe getting confused with a sparse file? is it
>>>> possible to get a bad checksum with raid1 in this scenario?
>>>>
>>>> any help is appreciated
>>>>
>>>> | Paulo Dias
>>>> | paulo.miguel.dias@gmail.com
>>>>
>>>> Tempora mutantur, nos et mutamur in illis.
>>>> -- 
>>>> To unsubscribe from this list: send the line "unsubscribe 
>>>> linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>


* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-16  1:12 qcow2 images make scrub believe the filesystem is corrupted Paulo Dias
  2017-08-16  1:40 ` Qu Wenruo
@ 2017-08-16 23:32 ` Chris Murphy
  2017-08-17  8:04   ` Duncan
  2017-08-17 23:39 ` Josef Bacik
  2 siblings, 1 reply; 19+ messages in thread
From: Chris Murphy @ 2017-08-16 23:32 UTC (permalink / raw)
  To: Paulo Dias; +Cc: Btrfs BTRFS

>>>
On Tue, Aug 15, 2017 at 7:12 PM, Paulo Dias <paulo.miguel.dias@gmail.com> wrote:
Device Model:     Samsung SSD 850 EVO M.2 500GB
Serial Number:    S33DNX0H812686V
LU WWN Device Id: 5 002538 d4130d027
Firmware Version: EMT21B6Q
>>>

Unfortunately no firmware updates listed with Samsung for this model.
It's worth filing a bug report with them, and then try not using
either fstrim or discard for a while and see if the problem reoccurs.
If not, then that suggests trim bug in the firmware. If it does still
occur it could just be defective hardware.
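To confirm that discard is really off after a remount, the active mount options can be checked. Below is a minimal Python sketch, assuming the usual /proc/mounts field layout (device, mount point, filesystem type, comma-separated options). The function name and the sample lines are made up for illustration and are not taken from the reporter's machine.

```python
def btrfs_mounts_with_discard(mounts_text):
    """Return mount points of btrfs filesystems whose options include discard."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] != "btrfs":
            continue
        options = fields[3].split(",")
        # Match plain "discard" and any "discard=..." variant; "nodiscard" does not match.
        if any(opt == "discard" or opt.startswith("discard=") for opt in options):
            hits.append(fields[1])
    return hits


# Illustrative sample in /proc/mounts format (hypothetical values).
sample = (
    "/dev/sda3 /home btrfs rw,relatime,ssd,discard,space_cache,subvol=/@home 0 0\n"
    "/dev/sda1 /boot ext4 rw,relatime 0 0\n"
)
print(btrfs_mounts_with_discard(sample))  # ['/home']
```

On a live system the input would be the contents of /proc/mounts; an empty list would mean no btrfs filesystem is currently mounted with discard.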

Does smartctl -x reveal any issues?



-- 
Chris Murphy


* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-16 23:32 ` Chris Murphy
@ 2017-08-17  8:04   ` Duncan
  2017-08-17 19:10     ` Chris Murphy
  0 siblings, 1 reply; 19+ messages in thread
From: Duncan @ 2017-08-17  8:04 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Wed, 16 Aug 2017 17:32:36 -0600 as excerpted:


> On Tue, Aug 15, 2017 at 7:12 PM, Paulo Dias
> <paulo.miguel.dias@gmail.com> wrote:
> Device Model:     Samsung SSD 850 EVO M.2 500GB
> Serial Number:    S33DNX0H812686V
> LU WWN Device Id: 5 002538 d4130d027
> Firmware Version: EMT21B6Q
>
> Unfortunately no firmware updates listed with Samsung for this model.
> It's worth filing a bug report with them, and then try not using either
> fstrim or discard for a while and see if the problem reoccurs.
> If not, then that suggests trim bug in the firmware. If it does still
> occur it could just be defective hardware.

Heh, may not be worth filing a bug after all, unless they've changed 
policy recently.

Google samsung ssds queued trim.  They had a bad firmware that was 
/supposed/ to support the new ATA standard with queued-trim/discard, but 
apparently it simply didn't (and there remains an open question as to 
whether the hardware can actually support it at all).  The MS side of 
things worked just fine with the firmware... because apparently no MS 
Windows supported queued-trim yet.  But Linux users had all sorts of 
problems, and...

*** Samsung support *refused* to support Linux on the devices, saying it 
was because "anyone" could change the code! ***

They repeatedly told a number of people the same thing, refusing Linux 
support.  One of the kernel block/ATA subsystem folks finally got them to 
investigate and eventually update the firmware to turn off queued-trim 
again, by omitting any reference to Linux, instead saying he was 
developing a new SATA chipset and was having trouble verifying queued-
trim on Samsung ssds.

Meanwhile, on the kernel side, *all* samsung ssds now have queued-trim 
blacklisted.  (OTOH, while there has been so much trouble with devices 
lying about flush and returning before it's actually done in order to 
enhance their performance scores, Samsung devices are among the few 
actually whitelisted for reliable flushing, so it's not /all/ bad news.)


The point being, as I said, unless Samsung's changed policy recently, 
there's no point in filing Linux related bug reports with them.  All 
you're likely to get is them putting their fingers in their ears while 
singing loudly about not supporting Linux.

Unfortunately, I found all this out /after/ having bought a pair of 1 TB 
Samsung evo 850s myself (after seeing them recommended here...), while 
googling, as suggested above, samsung ssd queued trim (tho I actually put 
in evo 850 since that's what I had), in order to see if I could safely 
mount with discard and not have it hurt performance due to lack of queued-
trim support.  Obviously not, so I'm running without discard, and letting 
the systemd fstrim timer do its thing every week, instead.

Tho to be fair I've had no problems with them... if only because I'm not 
trying to mount with discard, and the kernel blacklisting would turn off 
queued-trim if I did try it (tho I don't believe my firmware's actually 
one of the lying ones anyway, it doesn't claim support of whatever ATA 
revision made support mandatory, as the lying firmware apparently did).

But I'd have been rather unlikely to buy samsung if I knew they /refused/ 
to support Linux users because "anyone" can modify the code, that's for 
sure!

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-17  8:04   ` Duncan
@ 2017-08-17 19:10     ` Chris Murphy
  2017-08-17 20:17       ` Paulo Dias
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Murphy @ 2017-08-17 19:10 UTC (permalink / raw)
  To: Duncan; +Cc: Btrfs BTRFS

On Thu, Aug 17, 2017 at 2:04 AM, Duncan <1i5t5.duncan@cox.net> wrote:

> The point being, as I said, unless Samsung's changed policy recently,
> there's no point in filing Linux related bug reports with them.  All
> you're likely to get is them putting their fingers in their ears while
> singing loudly about not supporting Linux.

OK good to know.


> Unfortunately, I found all this out /after/ having bought a pair of 1 TB
> Samsung evo 850s myself (after seeing them recommended here...), while
> googling, as suggested above, samsung ssd queued trim (tho I actually put
> in evo 850 since that's what I had), in order to see if I could safely
> mount with discard and not have it hurt performance due to lack of queued-
> trim support.  Obviously not, so I'm running without discard, and letting
> the systemd fstrim timer do its thing every week, instead.

I have one of these:

SAMSUNG MZVLV256HCHP-000H1

It came in the HP Spectre laptop I'm using, and I've intentionally
been using discard mount option to see if things go bad eventually.
It's been 10 months. Zero problems.



> But I'd have been rather unlikely to buy samsung if I knew they /refused/
> to support Linux users because "anyone" can modify the code, that's for
> sure!

That's silly. Someone's just a bad manager (or a series of them).
Samsung is a multi-tentacled beast. They clearly have another tentacle
that supports Linux.
https://en.wikipedia.org/wiki/F2FS


-- 
Chris Murphy


* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-17 19:10     ` Chris Murphy
@ 2017-08-17 20:17       ` Paulo Dias
  2017-08-17 20:58         ` Chris Murphy
  0 siblings, 1 reply; 19+ messages in thread
From: Paulo Dias @ 2017-08-17 20:17 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Duncan, Btrfs BTRFS

Hi/2 all, once again thank you for taking the time to look at this.

So I disabled discard in the mount options yesterday, recreated the
Fedora.qcow2 file outside of the /home subvolume, and copied it back to
/home.
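The read-back check Qu suggested earlier in the thread (cat every file to /dev/null and watch for errors) can be sketched roughly as below. This is only an illustration in Python: read_all is a made-up helper, and it relies on the kernel returning an I/O error (EIO) when a data checksum does not match, as Qu described for the copy case.

```python
import errno
import os
import tempfile


def read_all(root, chunk=1 << 20):
    """Read every regular file under root; return (path, errno name) for failed reads."""
    failures = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    while fh.read(chunk):
                        pass  # data is discarded, like redirecting cat to /dev/null
            except OSError as exc:
                failures.append((path, errno.errorcode.get(exc.errno, str(exc.errno))))
    return failures


# Small self-contained demo on a throwaway directory.
with tempfile.TemporaryDirectory() as demo:
    with open(os.path.join(demo, "a.bin"), "wb") as fh:
        fh.write(b"\0" * 8192)
    print(read_all(demo))  # a healthy directory yields an empty list
```

Pointing read_all at the affected directory instead of the demo directory would list any files that cannot be read back, together with the error name.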

It "appeared" OK for a while, but today I ran scrub on /home again,
and sure enough:

ago 17 17:05:54 kerberos kernel: BTRFS error (device sda3): bdev
/dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 64, gen 0
ago 17 17:05:54 kerberos kernel: BTRFS error (device sda3): unable to
fixup (regular) error at logical 422343614464 on dev /dev/sda3
ago 17 17:05:54 kerberos kernel: BTRFS error (device sda3): bdev
/dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 65, gen 0
ago 17 17:05:54 kerberos kernel: BTRFS error (device sda3): unable to
fixup (regular) error at logical 422343618560 on dev /dev/sda3
ago 17 17:05:54 kerberos kernel: BTRFS error (device sda3): bdev
/dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 66, gen 0
ago 17 17:05:54 kerberos kernel: BTRFS error (device sda3): unable to
fixup (regular) error at logical 422343630848 on dev /dev/sda3
ago 17 17:05:54 kerberos kernel: BTRFS error (device sda3): bdev
/dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 67, gen 0
ago 17 17:05:54 kerberos kernel: BTRFS error (device sda3): unable to
fixup (regular) error at logical 422343634944 on dev /dev/sda3
ago 17 17:05:54 kerberos kernel: BTRFS error (device sda3): bdev
/dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 68, gen 0
ago 17 17:05:54 kerberos kernel: BTRFS error (device sda3): unable to
fixup (regular) error at logical 422343639040 on dev /dev/sda3
ago 17 17:05:54 kerberos kernel: BTRFS error (device sda3): bdev
/dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 69, gen 0
ago 17 17:05:54 kerberos kernel: BTRFS error (device sda3): unable to
fixup (regular) error at logical 422343643136 on dev /dev/sda3
ago 17 17:06:30 kerberos kernel: BTRFS error (device sda3): bdev
/dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 70, gen 0
ago 17 17:06:30 kerberos kernel: BTRFS error (device sda3): unable to
fixup (regular) error at logical 441140600832 on dev /dev/sda3
ago 17 17:06:30 kerberos kernel: BTRFS error (device sda3): bdev
/dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 71, gen 0
ago 17 17:06:30 kerberos kernel: BTRFS error (device sda3): unable to
fixup (regular) error at logical 441140629504 on dev /dev/sda3

root@kerberos:/home/groo# btrfs scrub start -B /home/
scrub done for ae9ae869-720d-4643-b673-6924d09b2fe0
       scrub started at Thu Aug 17 17:05:10 2017 and finished after 00:01:20
       total bytes scrubbed: 37.71GiB with 8 errors
       error details: csum=8
       corrected errors: 0, uncorrectable errors: 8, unverified errors: 0
ERROR: there are uncorrectable errors
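To see how the reported blocks are spread out, the logical addresses in the kernel log can be pulled out and compared. The following is a rough Python sketch, not part of btrfs-progs; the helper names are hypothetical, and the 4096-byte block size is an assumption based on the "length 4096" in the earlier checksum warning. The sample uses the first four addresses from the log above.

```python
import re

BLOCK = 4096  # assumed block size, matching "length 4096" in the earlier warning


def logical_addresses(log_text):
    """Return the logical addresses from 'error at logical N' entries, in log order."""
    return [int(m) for m in re.findall(r"error at logical (\d+)", log_text)]


def gaps_in_blocks(addresses):
    """Distance between consecutive distinct addresses, counted in 4 KiB blocks."""
    ordered = sorted(set(addresses))
    return [(b - a) // BLOCK for a, b in zip(ordered, ordered[1:])]


# The second half of each wrapped log entry is enough for the pattern to match.
sample = """\
fixup (regular) error at logical 422343614464 on dev /dev/sda3
fixup (regular) error at logical 422343618560 on dev /dev/sda3
fixup (regular) error at logical 422343630848 on dev /dev/sda3
fixup (regular) error at logical 422343634944 on dev /dev/sda3
"""
addrs = logical_addresses(sample)
print(gaps_in_blocks(addrs))  # [1, 3, 1]
```

A gap of 1 means two bad blocks are directly adjacent, so in this run the errors sit within a few blocks of each other rather than being spread across the device.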

this is what smartctl -x /dev/sda shows me:

root@kerberos:/home/groo# smartctl -x /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.12.5-041205-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 850 EVO M.2 500GB
Serial Number:    S33DNX0H812686V
LU WWN Device Id: 5 002538 d4130d027
Firmware Version: EMT21B6Q
User Capacity:    500.107.862.016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      M.2
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Aug 17 17:15:44 2017 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                       was never started.
                                       Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                       without error or no self-test has ever
                                       been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                       Auto Offline data collection on/off support.
                                       Suspend Offline collection upon new
                                       command.
                                       No Offline surface scan supported.
                                       Self-test supported.
                                       No Conveyance Self-test supported.
                                       Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                       power-saving mode.
                                       Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                       General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 265) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                       SCT Error Recovery Control supported.
                                       SCT Feature Control supported.
                                       SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  9 Power_On_Hours          -O--CK   099   099   000    -    1753
 12 Power_Cycle_Count       -O--CK   099   099   000    -    395
177 Wear_Leveling_Count     PO--C-   099   099   000    -    7
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   100   100   010    -    0
181 Program_Fail_Cnt_Total  -O--CK   100   100   010    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   010    -    0
183 Runtime_Bad_Block       PO--C-   100   100   010    -    0
187 Uncorrectable_Error_Cnt -O--CK   100   100   000    -    0
190 Airflow_Temperature_Cel -O--CK   068   050   000    -    32
195 ECC_Error_Rate          -O-RC-   200   200   000    -    0
199 CRC_Error_Count         -OSRCK   100   100   000    -    0
235 POR_Recovery_Count      -O--C-   099   099   000    -    55
241 Total_LBAs_Written      -O--CK   099   099   000    -    8386128219
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
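For anyone who wants to check such a table mechanically, here is a small Python sketch that parses attribute rows in the `smartctl -x` layout shown above and lists attributes whose normalized VALUE has dropped to or below THRESH. The helper names are hypothetical, and treating a threshold of 000 as "never fails" is an assumption of this sketch. The sample rows are copied from the table above.

```python
def parse_attributes(table_text):
    """Parse attribute rows: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE."""
    rows = []
    for line in table_text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID; header and legend lines do not.
        if len(parts) >= 8 and parts[0].isdigit():
            rows.append({
                "id": int(parts[0]),
                "name": parts[1],
                "value": int(parts[3]),
                "worst": int(parts[4]),
                "thresh": int(parts[5]),
                "raw": parts[7],
            })
    return rows


def at_or_below_threshold(rows):
    """Names of attributes whose normalized value is at or below a nonzero threshold."""
    # Assumption: a threshold of 000 means the attribute can never be flagged.
    return [r["name"] for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


sample = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
177 Wear_Leveling_Count     PO--C-   099   099   000    -    7
183 Runtime_Bad_Block       PO--C-   100   100   010    -    0
"""
rows = parse_attributes(sample)
print(at_or_below_threshold(rows))  # []
```

An empty list here only says that no attribute has crossed its vendor threshold; it does not rule out a firmware problem of the kind discussed in this thread.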

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x13       GPL     R/O      1  SATA NCQ Send and Receive log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1           SL  VS      16  Device vendor specific log
0xa5           SL  VS      16  Device vendor specific log
0xce           SL  VS      16  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1504         -
# 2  Short offline       Aborted by host               90%       857         -
# 3  Offline             Completed without error       00%       857         -
# 4  Short offline       Completed without error       00%       504         -
# 5  Short offline       Aborted by host               70%         8         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing
 255        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
 After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    43 Celsius
Power Cycle Min/Max Temperature:     30/43 Celsius
Lifetime    Min/Max Temperature:     22/50 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        10 minutes
Min/Max recommended Temperature:      0/70 Celsius
Min/Max Temperature Limit:            0/70 Celsius
Temperature History Size (Index):    128 (15)

Index    Estimated Time   Temperature Celsius
 16    2017-08-16 20:00     ?  -
 17    2017-08-16 20:10    33  **************
 18    2017-08-16 20:20    32  *************
 19    2017-08-16 20:30    36  *****************
 20    2017-08-16 20:40    33  **************
 21    2017-08-16 20:50    33  **************
 22    2017-08-16 21:00    34  ***************
 23    2017-08-16 21:10    35  ****************
 24    2017-08-16 21:20    34  ***************
 25    2017-08-16 21:30    34  ***************
 26    2017-08-16 21:40    35  ****************
 27    2017-08-16 21:50    34  ***************
 28    2017-08-16 22:00    34  ***************
 29    2017-08-16 22:10    33  **************
 30    2017-08-16 22:20    33  **************
 31    2017-08-16 22:30    34  ***************
 32    2017-08-16 22:40    35  ****************
 33    2017-08-16 22:50    32  *************
...    ..(  2 skipped).    ..  *************
 36    2017-08-16 23:20    32  *************
 37    2017-08-16 23:30    31  ************
...    ..(  4 skipped).    ..  ************
 42    2017-08-17 00:20    31  ************
 43    2017-08-17 00:30    32  *************
 44    2017-08-17 00:40    32  *************
 45    2017-08-17 00:50    33  **************
 46    2017-08-17 01:00    32  *************
 47    2017-08-17 01:10    32  *************
 48    2017-08-17 01:20    33  **************
 49    2017-08-17 01:30     ?  -
 50    2017-08-17 01:40    32  *************
 51    2017-08-17 01:50    47  ****************************
 52    2017-08-17 02:00    48  *****************************
 53    2017-08-17 02:10    34  ***************
 54    2017-08-17 02:20    32  *************
 55    2017-08-17 02:30    32  *************
 56    2017-08-17 02:40    43  ************************
 57    2017-08-17 02:50    43  ************************
 58    2017-08-17 03:00    33  **************
 59    2017-08-17 03:10    30  ***********
 60    2017-08-17 03:20    36  *****************
 61    2017-08-17 03:30    43  ************************
 62    2017-08-17 03:40    42  ***********************
 63    2017-08-17 03:50    32  *************
 64    2017-08-17 04:00    31  ************
 65    2017-08-17 04:10    36  *****************
 66    2017-08-17 04:20    31  ************
...    ..(  2 skipped).    ..  ************
 69    2017-08-17 04:50    31  ************
 70    2017-08-17 05:00    30  ***********
...    ..( 43 skipped).    ..  ***********
114    2017-08-17 12:20    30  ***********
115    2017-08-17 12:30    31  ************
116    2017-08-17 12:40    31  ************
117    2017-08-17 12:50    31  ************
118    2017-08-17 13:00     ?  -
119    2017-08-17 13:10    35  ****************
120    2017-08-17 13:20    36  *****************
121    2017-08-17 13:30    34  ***************
122    2017-08-17 13:40    32  *************
123    2017-08-17 13:50    36  *****************
124    2017-08-17 14:00     ?  -
125    2017-08-17 14:10    32  *************
126    2017-08-17 14:20    33  **************
127    2017-08-17 14:30    31  ************
  0    2017-08-17 14:40     ?  -
  1    2017-08-17 14:50    31  ************
  2    2017-08-17 15:00    40  *********************
  3    2017-08-17 15:10    33  **************
  4    2017-08-17 15:20    32  *************
  5    2017-08-17 15:30    31  ************
  6    2017-08-17 15:40    31  ************
  7    2017-08-17 15:50    30  ***********
  8    2017-08-17 16:00    30  ***********
  9    2017-08-17 16:10    33  **************
 10    2017-08-17 16:20    31  ************
...    ..(  3 skipped).    ..  ************
 14    2017-08-17 17:00    31  ************
 15    2017-08-17 17:10    43  ************************

SCT Error Recovery Control:
          Read: Disabled
         Write: Disabled

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           17  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           17  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC

Is there a way to check whether the logical addresses above point to
either of my sparse files? I have two, fedora.qcow2 and
minishift.img.
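
One way to approach this from the logs alone: the scrub "checksum error"
warnings already name the file, so they can be parsed directly. The
sketch below is only an illustration; the function name and the regular
expression are assumptions written against the single warning line
quoted in the first message of this thread, joined onto one line. The
"unable to fixup" lines carry only a logical address; for those,
`btrfs inspect-internal logical-resolve <logical> <path>` is the usual
tool, which this sketch does not call.

```python
import re

# One scrub warning from the first message in this thread, on a single line.
LOG = ("BTRFS warning (device sda3): checksum error at logical 290297204736 "
       "on dev /dev/sda3, sector 67982824, root 258, inode 968837, "
       "offset 17455849472, length 4096, links 1 "
       "(path: groo/Fedora/Fedora.qcow2)")

# Hypothetical pattern, matched against the message layout shown above.
PATTERN = re.compile(
    r"checksum error at logical (?P<logical>\d+) on dev (?P<dev>\S+), "
    r"sector (?P<sector>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+), length (?P<length>\d+), links \d+ "
    r"\(path: (?P<path>[^)]*)\)"
)


def parse_scrub_warning(line):
    """Return the fields of a scrub checksum warning, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Everything except the device and the path is a decimal number.
    return {key: (value if key in ("dev", "path") else int(value))
            for key, value in fields.items()}


info = parse_scrub_warning(LOG)
print(info["path"], info["inode"], info["logical"])
```

Feeding every kernel log line through `parse_scrub_warning` and
collecting the distinct `path` values would show which files the
reported checksum errors belong to.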

thanks for everything so far

| Paulo Dias
| paulo.miguel.dias@gmail.com

Tempora mutantur, nos et mutamur in illis.


On Thu, Aug 17, 2017 at 4:10 PM, Chris Murphy <lists@colorremedies.com> wrote:
> On Thu, Aug 17, 2017 at 2:04 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>
>> The point being, as I said, unless Samsung's changed policy recently,
>> there's no point in filing Linux related bug reports with them.  All
>> you're likely to get is them putting their fingers in their ears while
>> singing loudly about not supporting Linux.
>
> OK good to know.
>
>
>> Unfortunately, I found all this out /after/ having bought a pair of 1 TB
>> Samsung evo 850s myself (after seeing them recommended here...), while
>> googling, as suggested above, samsung ssd queued trim (tho I actually put
> >> in evo 850 since that's what I had), in order to see if I could safely
>> mount with discard and not have it hurt performance due to lack of queued-
>> trim support.  Obviously not, so I'm running without discard, and letting
>> the systemd fstrim timer do its thing every week, instead.
>
> I have one of these:
>
> SAMSUNG MZVLV256HCHP-000H1
>
> It came in the HP Spectre laptop I'm using, and I've intentionally
> been using discard mount option to see if things go bad eventually.
> It's been 10 months. Zero problems.
>
>
>
>> But I'd have been rather unlikely to buy samsung if I knew they /refused/
>> to support Linux users because "anyone" can modify the code, that's for
>> sure!
>
> That's silly. Someone's just a bad manager (or a series of them).
> Samsung is a multi-tentacled beast. They clearly have another tentacle
> that supports Linux.
> https://en.wikipedia.org/wiki/F2FS
>
>
> --
> Chris Murphy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-17 20:17       ` Paulo Dias
@ 2017-08-17 20:58         ` Chris Murphy
  0 siblings, 0 replies; 19+ messages in thread
From: Chris Murphy @ 2017-08-17 20:58 UTC (permalink / raw)
  To: Paulo Dias; +Cc: Chris Murphy, Duncan, Btrfs BTRFS

First post

235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       54

Recent post

235 POR_Recovery_Count      -O--C-   099   099   000    -    55

So you've had one more of these POR events.

http://www.samsung.com/semiconductor/minisite/ssd/downloads/document/Samsung_SSD_White_Paper.pdf
ID # 235 Power Recovery Count
A count of the number of sudden power off cases. If there is a sudden
power off, the firmware must recover all of the
mapping and user data during the next power on. This is a count of the
number of times this has happened.

I wonder if there is a correlation with these POR events and
corruption? And I wonder what's causing the POR event? Is this machine
crashing/hanging and you're doing a force power off? Or is it suspend
to RAM or suspend to disk? And if you stop doing those things, does
the corruption still happen?


--------------
Chris Murphy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-16  1:12 qcow2 images make scrub believe the filesystem is corrupted Paulo Dias
  2017-08-16  1:40 ` Qu Wenruo
  2017-08-16 23:32 ` Chris Murphy
@ 2017-08-17 23:39 ` Josef Bacik
  2017-08-18 16:23   ` Goffredo Baroncelli
  2 siblings, 1 reply; 19+ messages in thread
From: Josef Bacik @ 2017-08-17 23:39 UTC (permalink / raw)
  To: Paulo Dias; +Cc: linux-btrfs

On Tue, Aug 15, 2017 at 10:12:28PM -0300, Paulo Dias wrote:
> Hello/2 all
> 
> I'm using libvirt with a qcow2 image and everytime i run btrfs scrub
> -H /home (subvolume where the image is), i get:
> 
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 30, gen 0
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
> fixup (regular) error at logical 289831161856 on dev /dev/sda3
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 31, gen 0
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
> fixup (regular) error at logical 289830309888 on dev /dev/sda3
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 32, gen 0
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
> fixup (regular) error at logical 289831055360 on dev /dev/sda3
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): bdev
> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 33, gen 0
> ago 15 21:58:08 kerberos kernel: BTRFS error (device sda3): unable to
> fixup (regular) error at logical 289861591040 on dev /dev/sda3
> ago 15 21:58:09 kerberos kernel: BTRFS warning (device sda3): checksum
> error at logical 290297204736 on dev /dev/sda3, sector 67982824, root
> 258, inode 968837, offset 17455849472, length 4096, links 1 (path:
> groo/Fedora/Fedora.qcow2)
> ago 15 21:58:09 kerberos kernel: BTRFS error (device sda3): bdev
> /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 34, gen 0
> ago 15 21:58:09 kerberos kernel: BTRFS error (device sda3): unable to
> fixup (regular) error at logical 290297204736 on dev /dev/sda3
>

Tried replying from my phone, forgot the app defaults to HTML, trying again.

This is happening because the app (the guest OS in this case, we saw this a lot
with windows guests) is changing the pages while they are in flight.  We
calculate the checksum of the page before it's written, so if it changes while
in flight we'll end up with a csum mismatch.

To fix this change kvm to not use O_DIRECT or set NODATASUM on your qcow2 image.
You'll have to re-create the image because NODATASUM won't apply to the already
invalid checksums.  Thanks,

Josef 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-17 23:39 ` Josef Bacik
@ 2017-08-18 16:23   ` Goffredo Baroncelli
  2017-08-18 17:43     ` Josef Bacik
  2017-08-18 17:59     ` Liu Bo
  0 siblings, 2 replies; 19+ messages in thread
From: Goffredo Baroncelli @ 2017-08-18 16:23 UTC (permalink / raw)
  To: Josef Bacik, Paulo Dias; +Cc: linux-btrfs

On 08/18/2017 01:39 AM, Josef Bacik wrote:
[...]
> This is happening because the app (the guest OS in this case, we saw this a lot
> with windows guests) is changing the pages while they are in flight.  We
> calculate the checksum of the page before it's written, so if it changes while
> in flight we'll end up with a csum mismatch.
> 
> To fix this change kvm to not use O_DIRECT or set NODATASUM on your qcow2 image.
> You'll have to re-create the image because NODATASUM won't apply to the already
> invalid checksums.  Thanks,

Hi Josef,

Could you elaborate: are you saying that using O_DIRECT is incompatible with DATASUM?

Let me know

BR
G.Baroncelli

> 
> Josef 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-18 16:23   ` Goffredo Baroncelli
@ 2017-08-18 17:43     ` Josef Bacik
  2017-08-18 22:19       ` Goffredo Baroncelli
  2017-08-18 23:29       ` Qu Wenruo
  2017-08-18 17:59     ` Liu Bo
  1 sibling, 2 replies; 19+ messages in thread
From: Josef Bacik @ 2017-08-18 17:43 UTC (permalink / raw)
  To: kreijack; +Cc: Josef Bacik, Paulo Dias, linux-btrfs

On Fri, Aug 18, 2017 at 06:23:18PM +0200, Goffredo Baroncelli wrote:
> On 08/18/2017 01:39 AM, Josef Bacik wrote:
> [...]
> > This is happening because the app (the guest OS in this case, we saw this a lot
> > with windows guests) is changing the pages while they are in flight.  We
> > calculate the checksum of the page before it's written, so if it changes while
> > in flight we'll end up with a csum mismatch.
> > 
> > To fix this change kvm to not use O_DIRECT or set NODATASUM on your qcow2 image.
> > You'll have to re-create the image because NODATASUM won't apply to the already
> > invalid checksums.  Thanks,
> 
> Hi Josef,
> 
> Could you elaborate: are you saying that using O_DIRECT is incompatible with DATASUM?
> 

No, I'm saying using O_DIRECT with applications that don't protect in-flight
memory are incompatible with DATASUM.  We have no way of making sure nobody
touches the page while we're writing it out, so after we calculate the checksum
any changes to the page are going to cause a checksum mismatch.  O_DIRECT are
user space pages, there's nothing we can do to stop user space from doing stupid
things.

The options I looked into before were things like detecting the page had changed
since we calculated the checksum, and re-submitting the write.  This punishes
applications that do the right thing (databases for example) by forcing us to
calculate checksums twice.
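
As a rough illustration of the detect-and-resubmit idea described above:
the sketch below is a toy user-space model, not btrfs code. The function
name, the `mutate_in_flight` callback and the retry limit are all
hypothetical, and `zlib.crc32` only stands in for the filesystem's data
checksum. It shows where the second checksum pass comes from: every
write pays for two checksum calculations, even when nothing changed.

```python
import zlib


def write_with_recheck(page, mutate_in_flight=None, max_attempts=3):
    """Model 'checksum, write, re-checksum, resubmit if the page changed'.

    Returns (on_disk_bytes, checksum, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        csum = zlib.crc32(page)            # checksum taken before submission
        if mutate_in_flight is not None:
            mutate_in_flight(page, attempt)  # user space touching the page
        on_disk = bytes(page)              # what the device ends up storing
        if zlib.crc32(page) == csum:       # second checksum: page unchanged?
            return on_disk, csum, attempt
        # The page changed after the first checksum, so resubmit the write.
    raise RuntimeError("page kept changing while in flight")


def touch_once(page, attempt):
    # Hypothetical misbehaving writer: modifies the page during attempt 1 only.
    if attempt == 1:
        page[0] ^= 0xFF


data, csum, attempts = write_with_recheck(bytearray(4096), touch_once)
print(attempts, zlib.crc32(data) == csum)
```

A well-behaved writer (no callback) finishes on the first attempt but
still pays for the second `crc32` call, which is the cost objected to
above.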

This is a shit situation because users aren't going to understand this
limitation, and it bites them in the ass with all these weird errors.  I think
maybe we need to go back to the double-checksum thing by default, and have a
flag or something for users to set if they know their application behaves
properly.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-18 16:23   ` Goffredo Baroncelli
  2017-08-18 17:43     ` Josef Bacik
@ 2017-08-18 17:59     ` Liu Bo
  2017-08-18 18:25       ` Paulo Dias
  1 sibling, 1 reply; 19+ messages in thread
From: Liu Bo @ 2017-08-18 17:59 UTC (permalink / raw)
  To: kreijack; +Cc: Josef Bacik, Paulo Dias, linux-btrfs

On Fri, Aug 18, 2017 at 06:23:18PM +0200, Goffredo Baroncelli wrote:
> On 08/18/2017 01:39 AM, Josef Bacik wrote:
> [...]
> > This is happening because the app (the guest OS in this case, we saw this a lot
> > with windows guests) is changing the pages while they are in flight.  We
> > calculate the checksum of the page before it's written, so if it changes while
> > in flight we'll end up with a csum mismatch.
> > 
> > To fix this change kvm to not use O_DIRECT or set NODATASUM on your qcow2 image.
> > You'll have to re-create the image because NODATASUM won't apply to the already
> > invalid checksums.  Thanks,
> 
> Hi Josef,
> 
> Could you elaborate: are you saying that using O_DIRECT is incompatible with DATASUM?
>

They're compatible, but applications need to be careful.

O_DIRECT takes userspace page, and it works like
DIO write
  p = get_user_page();
  add p to bio
  #btrfs submits this bio
  calc_checksum(bio);
  submit_bio();

There's a chance that page p got changed between calc_checksum() and
submit_bio(), which then causes the mismatch.

For buffered IO, dirty page cache pages are synchronized with page
faults by the page lock and the page writeback bit.
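
The window between the checksum and the actual write can be modelled in
a few lines. This is a toy simulation in user space, not kernel code:
`dio_write`, `scrub_ok` and `guest_rewrites` are made-up names, and
`zlib.crc32` is only a stand-in for the filesystem's data checksum.

```python
import zlib

PAGE_SIZE = 4096


def dio_write(page, touch_in_flight=None):
    """Toy DIO write: checksum first, then the device reads the page."""
    csum = zlib.crc32(page)        # corresponds to calc_checksum(bio)
    if touch_in_flight is not None:
        touch_in_flight(page)      # user space modifies the page here
    on_disk = bytes(page)          # corresponds to submit_bio()
    return on_disk, csum


def scrub_ok(on_disk, csum):
    """What a later scrub does: recompute the checksum and compare."""
    return zlib.crc32(on_disk) == csum


def guest_rewrites(page):
    # Hypothetical writer that changes the buffer while it is in flight.
    page[0] ^= 0xFF


quiet = dio_write(bytearray(PAGE_SIZE))
busy = dio_write(bytearray(PAGE_SIZE), guest_rewrites)
print(scrub_ok(*quiet), scrub_ok(*busy))
```

In the second case the data on disk is exactly what the writer last put
in the buffer; only the stored checksum is stale, and that is what scrub
then reports as corruption.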

thanks,
-liubo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-18 17:59     ` Liu Bo
@ 2017-08-18 18:25       ` Paulo Dias
  0 siblings, 0 replies; 19+ messages in thread
From: Paulo Dias @ 2017-08-18 18:25 UTC (permalink / raw)
  To: bo.li.liu; +Cc: kreijack, Josef Bacik, Btrfs BTRFS

Hi all,

So this sucks: having to check every single disk image so it won't get
corrupted, while it works on other filesystems, isn't exactly
stellar.

I changed the qcow2 settings per this page:
https://pve.proxmox.com/wiki/Performance_Tweaks

I went with cache=writeback since it uses neither O_DSYNC nor O_DIRECT
semantics.

I understand the reasons checksumming might fail, and that this isn't
exactly btrfs's fault, but I scoured the entire btrfs wiki and couldn't
find anything warning about btrfs + qcow2 (or other image types).
Maybe adding a warning would help unlucky people like myself?

Also, is it safe to enable compress=lzo, or is that also a no-no? I'm
also starting to suspect that discard wasn't the culprit, since I have
had it enabled for more than a year and the only corruption I have
seen was precisely in these images.
| Paulo Dias
| paulo.miguel.dias@gmail.com

Tempora mutantur, nos et mutamur in illis.


On Fri, Aug 18, 2017 at 2:59 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Fri, Aug 18, 2017 at 06:23:18PM +0200, Goffredo Baroncelli wrote:
>> On 08/18/2017 01:39 AM, Josef Bacik wrote:
>> [...]
>> > This is happening because the app (the guest OS in this case, we saw this a lot
>> > with windows guests) is changing the pages while they are in flight.  We
>> > calculate the checksum of the page before it's written, so if it changes while
>> > in flight we'll end up with a csum mismatch.
>> >
>> > To fix this change kvm to not use O_DIRECT or set NODATASUM on your qcow2 image.
>> > You'll have to re-create the image because NODATASUM won't apply to the already
>> > invalid checksums.  Thanks,
>>
>> Hi Josef,
>>
>> Could you elaborate: are you saying that using O_DIRECT is incompatible with DATASUM?
>>
>
> They're compatible, but applications need to be careful.
>
> O_DIRECT takes userspace page, and it works like
> DIO write
>   p = get_user_page();
>   add p to bio
>   #btrfs submits this bio
>   calc_checksum(bio);
>   submit_bio();
>
> There's a chance that page p got changed between calc_checksum() and
> submit_bio(), which then causes the mismatch.
>
> For buffered IO, dirty page cache pages is synchronized with page
> fault by page lock and page writeback bit.
>
> thanks,
> -liubo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-18 17:43     ` Josef Bacik
@ 2017-08-18 22:19       ` Goffredo Baroncelli
  2017-08-19 13:08         ` Goffredo Baroncelli
  2017-08-18 23:29       ` Qu Wenruo
  1 sibling, 1 reply; 19+ messages in thread
From: Goffredo Baroncelli @ 2017-08-18 22:19 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Paulo Dias, linux-btrfs

On 08/18/2017 07:43 PM, Josef Bacik wrote:
> On Fri, Aug 18, 2017 at 06:23:18PM +0200, Goffredo Baroncelli wrote:
>> On 08/18/2017 01:39 AM, Josef Bacik wrote:
>> [...]
>>> This is happening because the app (the guest OS in this case, we saw this a lot
>>> with windows guests) is changing the pages while they are in flight.  We
>>> calculate the checksum of the page before it's written, so if it changes while
>>> in flight we'll end up with a csum mismatch.
>>>
>>> To fix this change kvm to not use O_DIRECT or set NODATASUM on your qcow2 image.
>>> You'll have to re-create the image because NODATASUM won't apply to the already
>>> invalid checksums.  Thanks,
>>
>> Hi Josef,
>>
>> Could you elaborate: are you saying that using O_DIRECT is incompatible with DATASUM?
>>
> 
> No, I'm saying using O_DIRECT with applications that don't protect in-flight
> memory are incompatible with DATASUM.  

This is what I call an 'incompatibility'. Even if it is a "corner" case, it is still an incompatibility. And to be honest, it is hard to call a VM a "corner" case.

> We have no way of making sure nobody
> touches the page while we're writing it out, so after we calculate the checksum
> any changes to the page are going to cause a checksum mismatch.  O_DIRECT are
> user space pages, there's nothing we can do to stop user space from doing stupid
> things.

I understand the technical difficulties; however, I can't agree about "user space [...] doing *stupid* things". If it is not explicitly forbidden, it is legal, not "stupid".

How does the application know that the pages aren't in flight anymore? Is it sufficient to wait for the write() syscall to return? Or does it have to wait for an fsync() to complete?
 
> The options I looked into before were things like detecting the page had changed
> since we calculated the checksum, and re-submitting the write.  This punishes
> applications that do the right thing (databases for example) by forcing us to
> calculate checksums twice.

Are there other cases where the same problem can occur? Is it the same for mmap()?

> 
> This is a shit situation because users aren't going to understand this
> limitation, and it bites them in the ass with all these weird errors.  I think
> maybe we need to go back to the double-checksum thing by default, and have a
> flag or something for users to set if they know their application behaves
> properly.  

Or... disable checksums for O_DIRECT writes. If you can't trust the checksums 100%, they don't make sense.

> 
> Josef
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-18 17:43     ` Josef Bacik
  2017-08-18 22:19       ` Goffredo Baroncelli
@ 2017-08-18 23:29       ` Qu Wenruo
  1 sibling, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2017-08-18 23:29 UTC (permalink / raw)
  To: Josef Bacik, kreijack; +Cc: Paulo Dias, linux-btrfs



On 2017年08月19日 01:43, Josef Bacik wrote:
> On Fri, Aug 18, 2017 at 06:23:18PM +0200, Goffredo Baroncelli wrote:
>> On 08/18/2017 01:39 AM, Josef Bacik wrote:
>> [...]
>>> This is happening because the app (the guest OS in this case, we saw this a lot
>>> with windows guests) is changing the pages while they are in flight.  We
>>> calculate the checksum of the page before it's written, so if it changes while
>>> in flight we'll end up with a csum mismatch.
>>>
>>> To fix this change kvm to not use O_DIRECT or set NODATASUM on your qcow2 image.
>>> You'll have to re-create the image because NODATASUM won't apply to the already
>>> invalid checksums.  Thanks,
>>
>> Hi Josef,
>>
>> Could you elaborate: are you saying that using O_DIRECT is incompatible with DATASUM?
>>
> 
> No, I'm saying using O_DIRECT with applications that don't protect in-flight
> memory are incompatible with DATASUM.  We have no way of making sure nobody
> touches the page while we're writing it out, so after we calculate the checksum
> any changes to the page are going to cause a checksum mismatch.  O_DIRECT are
> user space pages, there's nothing we can do to stop user space from doing stupid
> things.
> 
> The options I looked into before were things like detecting the page had changed
> since we calculated the checksum, and re-submitting the write.  This punishes
> applications that do the right thing (databases for example) by forcing us to
> calculate checksums twice.

Just curious about this.

Why not just scrub data/metadata in commit roots, and not use the page
cache at all but always read the data from disk?

In the datacsum case the data is always CoWed, so it won't change in flight.

The cost is obvious, though: such a method can only check data/metadata
from the previous transaction, and not using the page cache means tons of IO.

Thanks,
Qu
> 
> This is a shit situation because users aren't going to understand this
> limitation, and it bites them in the ass with all these weird errors.  I think
> maybe we need to go back to the double-checksum thing by default, and have a
> flag or something for users to set if they know their application behaves
> properly.  Thanks,
> 
> Josef

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: qcow2 images make scrub believe the filesystem is corrupted.
  2017-08-18 22:19       ` Goffredo Baroncelli
@ 2017-08-19 13:08         ` Goffredo Baroncelli
  0 siblings, 0 replies; 19+ messages in thread
From: Goffredo Baroncelli @ 2017-08-19 13:08 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Paulo Dias, linux-btrfs

On 08/19/2017 12:19 AM, Goffredo Baroncelli wrote:
>> We have no way of making sure nobody
>> touches the page while we're writing it out, so after we calculate the checksum
>> any changes to the page are going to cause a checksum mismatch.  O_DIRECT are
>> user space pages, there's nothing we can do to stop user space from doing stupid
>> things.
> I understand the technical difficulties; 

I looked at how ZFS deals with this problem. According to this thread [1], it seems that ZFS (on Linux) currently doesn't support O_DIRECT. The thread mentions that one of the problems is the coherency between data and checksum.

I think it is better not to support O_DIRECT for CoW files than to have mismatched checksums.

BR
G.Baroncelli

[1] https://github.com/zfsonlinux/zfs/issues/224


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 19+ messages in thread

Thread overview: 19+ messages
2017-08-16  1:12 qcow2 images make scrub believe the filesystem is corrupted Paulo Dias
2017-08-16  1:40 ` Qu Wenruo
2017-08-16  1:51   ` Paulo Dias
2017-08-16  2:28     ` Qu Wenruo
2017-08-16  2:46       ` Qu Wenruo
2017-08-16  7:47       ` Qu Wenruo
2017-08-16 23:32 ` Chris Murphy
2017-08-17  8:04   ` Duncan
2017-08-17 19:10     ` Chris Murphy
2017-08-17 20:17       ` Paulo Dias
2017-08-17 20:58         ` Chris Murphy
2017-08-17 23:39 ` Josef Bacik
2017-08-18 16:23   ` Goffredo Baroncelli
2017-08-18 17:43     ` Josef Bacik
2017-08-18 22:19       ` Goffredo Baroncelli
2017-08-19 13:08         ` Goffredo Baroncelli
2017-08-18 23:29       ` Qu Wenruo
2017-08-18 17:59     ` Liu Bo
2017-08-18 18:25       ` Paulo Dias
