linux-lvm.redhat.com archive mirror
* [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
@ 2019-02-25 15:33 Ingo Franzki
  2019-02-27  0:00 ` Cesare Leonardi
  2019-02-28 14:36 ` Ilia Zykov
  0 siblings, 2 replies; 37+ messages in thread
From: Ingo Franzki @ 2019-02-25 15:33 UTC (permalink / raw)
  To: linux-lvm

Hi,

we just encountered an error when using LVM's pvmove command to move the data from an un-encrypted LVM physical volume onto an encrypted volume. 
After the pvmove has completed, the file system on the logical volume that resides on the moved physical volumes is corrupted and all data on this LV is lost.

As a special condition, we are using the sector-size option with cryptsetup (LUKS2) to use 4096-byte sectors to improve encryption performance. 

According to Ondrej Kozina in https://www.saout.de/pipermail/dm-crypt/2019-February/006078.html this is due to the fact that a sector size of 4096 also causes the physical block size to be (at least) 4096. Using the default sector size of 512 does not show the problem. 

The source PV of the pvmove has a physical block size of 512 (SCSI disk in our case). So when moving this to the encrypted volume the physical block size becomes 4096. The file system does not seem to like this....

Please note that this problem can also happen in other cases, such as mixing disks with different block sizes (e.g. SCSI disks with 512 bytes and s390x-DASDs with 4096 block size).

In my opinion this is a general problem that needs to be addressed by LVM. 
Whenever you pvmove data to a PV with a larger physical block size you will corrupt the file system and thus cause data loss.
The same probably happens when using lvextend to enlarge an LV onto a PV with a larger physical block size. 

Bottom line: LVM should reject extending a volume group with a PV that has a larger physical block size than the PVs that already belong to the volume group. There might be a force option to override this check in case the user really knows what they are doing. But the default behavior should be to protect the user from file system corruption and thus from data loss.

Yes, having a backup before doing pvmove or similar is of course wise, but I still think that LVM should prevent a user from damaging the file system through LVM's command line tools. 
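Until LVM implements such a check, the mismatch can at least be detected manually before running pvmove, for example by comparing the physical block sizes of the source and the target PV. A minimal sketch using the devices from the reproduction below (the expected values follow from Ondrej's explanation):

# blockdev --getpbsz /dev/loop0
512
# blockdev --getpbsz /dev/mapper/enc-loop
4096

If the target reports a larger value than the source, the pvmove is at risk.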


Here is how to reproduce this (note the error messages on the very last command): 

# sudo dd if=/dev/zero of=loopbackfile1.img bs=500M count=1
1+0 records in
1+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 2.32777 s, 225 MB/s

# sudo dd if=/dev/zero of=loopbackfile2.img bs=500M count=1
1+0 records in
1+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 1.89992 s, 276 MB/s

# losetup -fP loopbackfile1.img

# losetup -fP loopbackfile2.img

# pvcreate /dev/loop0
  Physical volume "/dev/loop0" successfully created.

# vgcreate LOOP_VG /dev/loop0
  Volume group "LOOP_VG" successfully created

# lvcreate -L 300MB LOOP_VG -n LV /dev/loop0
  Logical volume "LV" created.

# mkfs.ext4 /dev/mapper/LOOP_VG-LV
mke2fs 1.44.1 (24-Mar-2018)
Discarding device blocks: done
Creating filesystem with 307200 1k blocks and 76912 inodes
Filesystem UUID: 344289a3-e251-4d88-b03d-a71a4be2a8ec
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729, 204801, 221185

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

# mount /dev/mapper/LOOP_VG-LV /mnt

# cryptsetup luksFormat --type luks2 --sector-size 4096 /dev/loop1

WARNING!
========
This will overwrite data on /dev/loop1 irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase for /dev/loop1: loop
Verify passphrase: loop

# cryptsetup luksOpen /dev/loop1 enc-loop
Enter passphrase for /dev/loop1: loop

# pvcreate /dev/mapper/enc-loop
  Physical volume "/dev/mapper/enc-loop" successfully created.

# vgextend LOOP_VG /dev/mapper/enc-loop
  Volume group "LOOP_VG" successfully extended

# pvs
  PV                   VG      Fmt  Attr PSize   PFree
  /dev/loop0           LOOP_VG lvm2 a--  496.00m 196.00m
  /dev/mapper/enc-loop LOOP_VG lvm2 a--  492.00m 492.00m

# pvmove /dev/loop0 /dev/mapper/enc-loop
  /dev/loop0: Moved: 30.67%
  /dev/loop0: Moved: 100.00%

# pvs
  /dev/LOOP_VG/LV: read failed after 0 of 1024 at 0: Invalid argument
  /dev/LOOP_VG/LV: read failed after 0 of 1024 at 314507264: Invalid argument
  /dev/LOOP_VG/LV: read failed after 0 of 1024 at 314564608: Invalid argument
  /dev/LOOP_VG/LV: read failed after 0 of 1024 at 4096: Invalid argument
  PV                   VG      Fmt  Attr PSize   PFree
  /dev/loop0           LOOP_VG lvm2 a--  496.00m 496.00m
  /dev/mapper/enc-loop LOOP_VG lvm2 a--  492.00m 192.00m

In case the filesystem of the logical volume is not mounted at the time of pvmove, it gets corrupted anyway, but you only see errors when trying to mount it.
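The failing 1024-byte reads above suggest that the larger block size limits of the dm-crypt PV propagate up to the LV itself, while the ext4 filesystem on it was created with 1k blocks (see the mkfs.ext4 output). A way to inspect the limits the LV exposes (one would expect 4096 for both values after the move):

# blockdev --getss --getpbsz /dev/mapper/LOOP_VG-LV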

-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-25 15:33 [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size Ingo Franzki
@ 2019-02-27  0:00 ` Cesare Leonardi
  2019-02-27  8:49   ` Ingo Franzki
  2019-02-28 14:36 ` Ilia Zykov
  1 sibling, 1 reply; 37+ messages in thread
From: Cesare Leonardi @ 2019-02-27  0:00 UTC (permalink / raw)
  To: LVM general discussion and development, Ingo Franzki

On 25/02/19 16:33, Ingo Franzki wrote:
> we just encountered an error when using LVM's pvmove command to move the data from an un-encrypted LVM physical volume onto an encrypted volume.
> After the pvmove has completed, the file system on the logical volume that resides on the moved physical volumes is corrupted and all data on this LV is lost.

Hello, your message is interesting. And also this thread:
https://www.redhat.com/archives/linux-lvm/2019-February/msg00002.html

But I'd like to know if I understood correctly.
Should I care about the physical block size when I use LVM? Is mixing disks 
with different sector sizes (512b and 4k) dangerous?

Your message, and others from the other thread, seem to say that LVM 
doesn't handle that situation correctly and that if I pvmove data 
between a 512b disk and a 4k disk (or vice versa), it will lead to 
massive filesystem corruption. If I understood correctly, the problem 
you described looks unrelated to encrypted volumes and was only 
surfaced by them. Right?

Cesare.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-27  0:00 ` Cesare Leonardi
@ 2019-02-27  8:49   ` Ingo Franzki
  2019-02-27 14:59     ` Stuart D. Gathman
  2019-02-28  1:31     ` Cesare Leonardi
  0 siblings, 2 replies; 37+ messages in thread
From: Ingo Franzki @ 2019-02-27  8:49 UTC (permalink / raw)
  To: Cesare Leonardi, LVM general discussion and development

On 27.02.2019 01:00, Cesare Leonardi wrote:
> On 25/02/19 16:33, Ingo Franzki wrote:
>> we just encountered an error when using LVM's pvmove command to move the data from an un-encrypted LVM physical volume onto an encrypted volume.
>> After the pvmove has completed, the file system on the logical volume that resides on the moved physical volumes is corrupted and all data on this LV is lost.
> 
> Hello, your message is interesting. And also this thread:
> https://www.redhat.com/archives/linux-lvm/2019-February/msg00002.html
> 
> But I'd like to know if I understood correctly.
> Should I care about the physical disk size when I use LVM? Mixing disk with different sector size (512b and 4k) is dangerous?

As far as I can tell: yes, pvmoving data around or lvextending an LV onto another PV with a larger physical block size is dangerous.
Creating new LVs, and thus new file systems, on mixed configurations seems to be OK. 

> 
> Your message and others from the other thread, seems to say that LVM doesn't handle correctly that situation and that if I pvmove data between a 512b disk and a 4k disk (or viceversa), it will lead to a massive filesystem corruption. If I understood correctly, the problem that you described looks unrelated to encrypted volume and was only exacerbated by that. Right?

Moving from 512 to 4096 seems to cause FS corruption, moving from 4096 to 512 does not. So I guess it's only a problem when moving to larger physical block sizes.

And yes, it's unrelated to encrypted volumes; it can happen with any block device with a different physical block size that you use as a PV.
E.g. SCSI disks exist with 512-byte and with 4096-byte block sizes, or s390 DASDs, which always have 4096-byte blocks. 

We just encountered it using encrypted disks with the sector-size option to increase encryption performance. 
So actually we want people to use larger sector sizes, but this seems to cause problems with LVM. 

The good thing about the example with encrypted volumes on loopback devices is that you can reproduce the problem on any platform, without having certain hardware requirements.

> 
> Cesare.
> 
> 


-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-27  8:49   ` Ingo Franzki
@ 2019-02-27 14:59     ` Stuart D. Gathman
  2019-02-27 17:05       ` Ingo Franzki
  2019-02-28  1:31     ` Cesare Leonardi
  1 sibling, 1 reply; 37+ messages in thread
From: Stuart D. Gathman @ 2019-02-27 14:59 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Cesare Leonardi

On Wed, 27 Feb 2019, Ingo Franzki wrote:

> The good thing about the example with encrypted volumes on loopback
> devices is that you can reproduce the problem on any platform, without
> having certain hardware requirements.

The losetup command has a --sector-size option that sets the logical
sector size.  I wonder if that is sufficient to reproduce the problem.

-- 
 	      Stuart D. Gathman <stuart@gathman.org>
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-27 14:59     ` Stuart D. Gathman
@ 2019-02-27 17:05       ` Ingo Franzki
  2019-03-02  1:37         ` L A Walsh
  0 siblings, 1 reply; 37+ messages in thread
From: Ingo Franzki @ 2019-02-27 17:05 UTC (permalink / raw)
  To: LVM general discussion and development, Stuart D. Gathman; +Cc: Cesare Leonardi

On 27.02.2019 15:59, Stuart D. Gathman wrote:
> On Wed, 27 Feb 2019, Ingo Franzki wrote:
> 
>> The good thing about the example with encrypted volumes on loopback
>> devices is that you can reproduce the problem on any platform, without
>> having certain hardware requirements.
> 
> The losetup command has a --sector-size option that sets the logical
> sector size.  I wonder if that is sufficient to reproduce the problem.
> 
Yes that should work: 

# losetup -fP loopbackfile.img --sector-size 4096
# blockdev --getpbsz /dev/loop0
4096


-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-27  8:49   ` Ingo Franzki
  2019-02-27 14:59     ` Stuart D. Gathman
@ 2019-02-28  1:31     ` Cesare Leonardi
  2019-02-28  1:52       ` Stuart D. Gathman
  2019-02-28  8:41       ` Ingo Franzki
  1 sibling, 2 replies; 37+ messages in thread
From: Cesare Leonardi @ 2019-02-28  1:31 UTC (permalink / raw)
  To: Ingo Franzki, LVM general discussion and development

On 27/02/19 09:49, Ingo Franzki wrote:
> As far as I can tell: Yes if you pvmove data around or lvextend an LV onto another PV with a larger physical block size that is dangerous.
> Creating new LVs and thus new file systems on mixed configurations seem to be OK.

[...]

> And yes, its unrelated to encrypted volumes, it can happen with any block device of different physical block sizes that you use as PV.

Thank you Ingo for the valuable information you are giving here.

Not to be pedantic, but what do you mean by physical block? Because 
with modern disks the term is not always clear. Let's take a mechanical 
disk with 512e sectors, that is with 4k sectors but exposed as 512 byte 
sectors. Fdisk will refer to it with these terms:
Sector size (logical/physical): 512 bytes / 4096 bytes

What you are referring to as the physical size is actually the logical size 
reported by fdisk, right? And if that's correct, I guess it should be 
safe to add the above disk with 512e sectors to an LVM storage composed 
only of disks with real 512 byte sectors. I expect that from the LVM 
point of view this should not even be considered a mixed sector size 
setup, even if the real physical sector size of the added disk is 4096 bytes.

Do you agree or do you think it would be better to test this specific setup?
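(For reference, both values can also be read directly from sysfs; sdX is a placeholder:

# cat /sys/block/sdX/queue/logical_block_size
# cat /sys/block/sdX/queue/physical_block_size

A 512e drive reports 512 and 4096, a native 4Kn drive 4096 and 4096.)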

Cesare.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-28  1:31     ` Cesare Leonardi
@ 2019-02-28  1:52       ` Stuart D. Gathman
  2019-02-28  8:41       ` Ingo Franzki
  1 sibling, 0 replies; 37+ messages in thread
From: Stuart D. Gathman @ 2019-02-28  1:52 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Ingo Franzki

On Thu, 28 Feb 2019, Cesare Leonardi wrote:

> Not to be pedantic, but what do you mean with physical block? Because with 
> modern disks the term is not always clear. Let's take a mechanical disk with 
> 512e sectors, that is with 4k sectors but exposed as 512 byte sectors. Fdisk 
> will refer to it with these terms:
> Sector size (logical/physical): 512 bytes / 4096 bytes
>
> What you are referring as physical size is actually the logical size reported 
> by fdisk, right? And if it's correct, I guess that should be safe to add the 
> above disk with 512e sectors to an LVM storage composed only by disks with 
> real 512 byte sectors. I expect that from the LVM point of view this should 
> not be even considered a mixed sector size setup, even if the real physical 
> sector size of the added disk is 4096 byte.
>
> Do you agree or do you think it would be better to test this specific setup?

I would definitely test it, using the same test script that reproduces the 
problem with loopback devices.

That said, I believe you are right - it should definitely work.  Most of
my drives are 512/4096 logical/phys.  If you actually write a single 512
byte sector, however, the disk firmware will have to do a
read/modify/write cycle - which can tank performance.

hdparm will report the logical and physical sector size - but there doesn't
seem to be an option to set the logical sector size.  There really is no
need once you already support a smaller logical sector size, as the
performance hit can be avoided by aligned filesystems with a 4k+ block
size (most modern filesystems).
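If you want to verify that a given partition is actually aligned for 4k physical sectors, parted can check it; a sketch, with sdX and the partition number as placeholders:

# parted /dev/sdX align-check optimal 1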

Once I encountered a bug in drive firmware where the R/M/W did not
work correctly with certain read/write patterns (involving unaligned
multi sector writes).  I do not wish that on anyone.  (don't worry,
that drive model is long gone...).

-- 
 	      Stuart D. Gathman <stuart@gathman.org>
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-28  1:31     ` Cesare Leonardi
  2019-02-28  1:52       ` Stuart D. Gathman
@ 2019-02-28  8:41       ` Ingo Franzki
  2019-02-28  9:48         ` Ilia Zykov
  2019-03-01  1:24         ` Cesare Leonardi
  1 sibling, 2 replies; 37+ messages in thread
From: Ingo Franzki @ 2019-02-28  8:41 UTC (permalink / raw)
  To: LVM general discussion and development, Cesare Leonardi

On 28.02.2019 02:31, Cesare Leonardi wrote:
> On 27/02/19 09:49, Ingo Franzki wrote:
>> As far as I can tell: Yes if you pvmove data around or lvextend an LV onto another PV with a larger physical block size that is dangerous.
>> Creating new LVs and thus new file systems on mixed configurations seem to be OK.
> 
> [...]
> 
>> And yes, its unrelated to encrypted volumes, it can happen with any block device of different physical block sizes that you use as PV.
> 
> Thank you Ingo for the precious informations you are giving here.
> 
> Not to be pedantic, but what do you mean with physical block? Because with modern disks the term is not always clear. Let's take a mechanical disk with 512e sectors, that is with 4k sectors but exposed as 512 byte sectors. Fdisk will refer to it with these terms:
> Sector size (logical/physical): 512 bytes / 4096 bytes
> 
> What you are referring as physical size is actually the logical size reported by fdisk, right? And if it's correct, I guess that should be safe to add the above disk with 512e sectors to an LVM storage composed only by disks with real 512 byte sectors. I expect that from the LVM point of view this should not be even considered a mixed sector size setup, even if the real physical sector size of the added disk is 4096 byte.
> 
> Do you agree or do you think it would be better to test this specific setup?

Well, there are the following 2 commands:

Get physical block size: 
 blockdev --getpbsz <device>
Get logical block size:
 blockdev --getbsz <device>

Filesystems seem to care about the physical block size only, not the logical block size.

So as soon as you have PVs with different physical block sizes (as reported by blockdev --getpbsz) I would be very careful...
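A quick way to spot a mixed setup is to print the physical block size of every PV on the system, for example (a sketch, run as root):

# for pv in $(pvs --noheadings -o pv_name); do echo -n "$pv: "; blockdev --getpbsz "$pv"; done

If the values differ, pvmove or lvextend across those PVs may run into this problem.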
> 
> Cesare.
> 
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
> 
> 


-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-28  8:41       ` Ingo Franzki
@ 2019-02-28  9:48         ` Ilia Zykov
  2019-02-28 10:10           ` Ingo Franzki
  2019-03-01  1:24         ` Cesare Leonardi
  1 sibling, 1 reply; 37+ messages in thread
From: Ilia Zykov @ 2019-02-28  9:48 UTC (permalink / raw)
  To: LVM general discussion and development, Ingo Franzki

[-- Attachment #1: Type: text/plain, Size: 1246 bytes --]

> 
> Well, there are the following 2 commands:
> 
> Get physical block size: 
>  blockdev --getpbsz <device>
> Get logical block size:
>  blockdev --getbsz <device>
> 
> Filesystems seem to care about the physical block size only, not the logical block size.
> 
> So as soon as you have PVs with different physical block sizes (as reported by blockdev --getpbsz) I would be very careful...

Hello everybody.
Maybe I don't understand what you mean. What does the logical block size
mean? But on my machines (CentOS 7), this utility gives me strange
results (output reduced):

 smartctl -i /dev/sda; blockdev --getbsz --getpbsz /dev/sda
Device Model:     INTEL SSDSC2KB480G8
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
4096
4096

 smartctl -i /dev/sdb; blockdev --getbsz --getpbsz /dev/sdb
Device Model:     HGST HUS722T2TALA604
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
4096
512

As you see, "--getbsz" is always 4096.
But I think it should always be 512.
What does it mean?

Thank you.
Ilia.


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3591 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-28  9:48         ` Ilia Zykov
@ 2019-02-28 10:10           ` Ingo Franzki
  2019-02-28 10:41             ` Ilia Zykov
  2019-02-28 10:50             ` Ilia Zykov
  0 siblings, 2 replies; 37+ messages in thread
From: Ingo Franzki @ 2019-02-28 10:10 UTC (permalink / raw)
  To: Ilia Zykov, LVM general discussion and development

On 28.02.2019 10:48, Ilia Zykov wrote:
>>
>> Well, there are the following 2 commands:
>>
>> Get physical block size: 
>>  blockdev --getpbsz <device>
>> Get logical block size:
>>  blockdev --getbsz <device>
>>
>> Filesystems seem to care about the physical block size only, not the logical block size.
>>
>> So as soon as you have PVs with different physical block sizes (as reported by blockdev --getpbsz) I would be very careful...
> 
> Hello everybody.
> Maybe, I don’t understand what do you mean. What the logical block size
> mean? But on my machines(CentOS7), this utility get me the strange
> results (output reduced):
> 
>  smartctl -i /dev/sda; blockdev --getbsz --getpbsz /dev/sda
> Device Model:     INTEL SSDSC2KB480G8
> User Capacity:    480,103,981,056 bytes [480 GB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Rotation Rate:    Solid State Device
> 4096
> 4096
> 
>  smartctl -i /dev/sdb; blockdev --getbsz --getpbsz /dev/sdb
> Device Model:     HGST HUS722T2TALA604
> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
> Sector Size:      512 bytes logical/physical
> Rotation Rate:    7200 rpm
> Form Factor:      3.5 inches
> 4096
> 512
> 
> As you see “–getbsz” forever 4096.
I also see logical block size to be 4096 for all devices on my system.
> But I think it must be forever 512.
> What does it mean?
I have seen the following description of logical and physical block sizes somewhere on the internet:
"Logical block sizes are the units used by the 'kernel' for read/write operations.
Physical block sizes are the units which 'disk controllers' use for read/write operations."

For the problem mentioned in this thread, the physical block size is what you are looking for.
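For a quick overview, lsblk can also print both values side by side for all block devices (column names as in recent util-linux):

# lsblk -o NAME,LOG-SEC,PHY-SEC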
> 
> Thank you.
> Ilia.
> 


-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-28 10:10           ` Ingo Franzki
@ 2019-02-28 10:41             ` Ilia Zykov
  2019-02-28 10:50             ` Ilia Zykov
  1 sibling, 0 replies; 37+ messages in thread
From: Ilia Zykov @ 2019-02-28 10:41 UTC (permalink / raw)
  To: LVM general discussion and development, Ingo Franzki

[-- Attachment #1: Type: text/plain, Size: 1175 bytes --]

>>
>>  smartctl -i /dev/sdb; blockdev --getbsz --getpbsz /dev/sdb
>> Device Model:     HGST HUS722T2TALA604
>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>> Sector Size:      512 bytes logical/physical
>> Rotation Rate:    7200 rpm
>> Form Factor:      3.5 inches
>> 4096
>> 512
>>
>> As you see “–getbsz” forever 4096.
> I also see logical block size to be 4096 for all devices on my system.
>> But I think it must be forever 512.
>> What does it mean?
> I have seen the following description about logical and physical block sizes somewhere in the internet:
> "Logical block sizes are the units used by the 'kernel' for read/write operations.

The kernel can, but usually does not want to, because it reduces performance.

> Physical block sizes are the units which 'disk controllers' use for read/write operations."

Not the disk controller on the motherboard, but the controller inside the disk. We
don't have access to it.

> 
> For the problem mentioned in this thread, the physical block size is what you are looking for.
>>

I think it is a BUG in "blockdev".
My question was:

Can this error (or a similar one) be related to the problem in pvmove?


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3599 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-28 10:10           ` Ingo Franzki
  2019-02-28 10:41             ` Ilia Zykov
@ 2019-02-28 10:50             ` Ilia Zykov
  2019-02-28 13:13               ` Ilia Zykov
  1 sibling, 1 reply; 37+ messages in thread
From: Ilia Zykov @ 2019-02-28 10:50 UTC (permalink / raw)
  To: LVM general discussion and development, Ingo Franzki

[-- Attachment #1: Type: text/plain, Size: 1189 bytes --]

>>
>>  smartctl -i /dev/sdb; blockdev --getbsz --getpbsz /dev/sdb
>> Device Model:     HGST HUS722T2TALA604
>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>> Sector Size:      512 bytes logical/physical
>> Rotation Rate:    7200 rpm
>> Form Factor:      3.5 inches
>> 4096
>> 512
>>
>> As you see “–getbsz” forever 4096.
> I also see logical block size to be 4096 for all devices on my system.
>> But I think it must be forever 512.
>> What does it mean?
> I have seen the following description about logical and physical block sizes somewhere in the internet:
> "Logical block sizes are the units used by the 'kernel' for read/write operations.

The kernel can, but usually does not want to, because it reduces performance.

> Physical block sizes are the units which 'disk controllers' use for read/write operations."

Not the disk controller on the motherboard, but the controller inside the disk. We
don't have access to it.

> 
> For the problem mentioned in this thread, the physical block size is what you are looking for.
>>

I think it is a BUG in "blockdev" (util-linux).
My question was:

Can this error (or a similar one) be related to the problem in pvmove?



[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3591 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-28 10:50             ` Ilia Zykov
@ 2019-02-28 13:13               ` Ilia Zykov
  0 siblings, 0 replies; 37+ messages in thread
From: Ilia Zykov @ 2019-02-28 13:13 UTC (permalink / raw)
  To: LVM general discussion and development, Ingo Franzki

[-- Attachment #1: Type: text/plain, Size: 457 bytes --]

> 
>>
>> For the problem mentioned in this thread, the physical block size is what you are looking for.
>>>
> 
> I think it is BUG in the "blockdev(util-linux)".

It's not a bug, it's a feature :O
https://bugzilla.redhat.com/show_bug.cgi?id=1684078

> My question was:
> 
> Can this error(or similar) be related to a problem in pvmove?
> 
> 

My question now is:
Can this feature (or a similar one) be related to the problem in pvmove?

Thank you.


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3591 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-25 15:33 [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size Ingo Franzki
  2019-02-27  0:00 ` Cesare Leonardi
@ 2019-02-28 14:36 ` Ilia Zykov
  2019-02-28 16:30   ` Ingo Franzki
  1 sibling, 1 reply; 37+ messages in thread
From: Ilia Zykov @ 2019-02-28 14:36 UTC (permalink / raw)
  To: LVM general discussion and development, Ingo Franzki

[-- Attachment #1: Type: text/plain, Size: 1010 bytes --]

> Discarding device blocks: done
> Creating filesystem with 307200 1k blocks and 76912 inodes
> ......
> # pvs
>   /dev/LOOP_VG/LV: read failed after 0 of 1024 at 0: Invalid argument
>   /dev/LOOP_VG/LV: read failed after 0 of 1024 at 314507264: Invalid argument
>   /dev/LOOP_VG/LV: read failed after 0 of 1024 at 314564608: Invalid argument
>   /dev/LOOP_VG/LV: read failed after 0 of 1024 at 4096: Invalid argument
>   PV                   VG      Fmt  Attr PSize   PFree
>   /dev/loop0           LOOP_VG lvm2 a--  496.00m 496.00m
>   /dev/mapper/enc-loop LOOP_VG lvm2 a--  492.00m 192.00m
> 
> In case the filesystem of the logical volume is not mounted at the time of pvmove, it gets corrupted anyway, but you only see errors when trying to mount it.
> 

It's because your FS had 1k blocks.
The new device can't do reads in 1k blocks.
If you plan to pvmove onto a device with 4k blocks, maybe you need to make the fs with:
"mkfs.ext4 -b 4096"

See comments:
https://bugzilla.redhat.com/show_bug.cgi?id=1684078
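Whether an existing filesystem would survive such a move can be estimated beforehand by comparing its block size with the block size the target device exposes. A sketch using the device names from the reproduction (the exact limits depend on the dm-crypt setup):

# tune2fs -l /dev/mapper/LOOP_VG-LV | grep 'Block size'
# blockdev --getss /dev/mapper/enc-loop

If the filesystem block size is smaller than what the target exposes, failing reads like the 1k ones above are to be expected after the pvmove.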


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3591 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-28 14:36 ` Ilia Zykov
@ 2019-02-28 16:30   ` Ingo Franzki
  2019-02-28 18:11     ` Ilia Zykov
  0 siblings, 1 reply; 37+ messages in thread
From: Ingo Franzki @ 2019-02-28 16:30 UTC (permalink / raw)
  To: Ilia Zykov, LVM general discussion and development

On 28.02.2019 15:36, Ilia Zykov wrote:
>> Discarding device blocks: done
>> Creating filesystem with 307200 1k blocks and 76912 inodes
>> ......
>> # pvs
>>   /dev/LOOP_VG/LV: read failed after 0 of 1024 at 0: Invalid argument
>>   /dev/LOOP_VG/LV: read failed after 0 of 1024 at 314507264: Invalid argument
>>   /dev/LOOP_VG/LV: read failed after 0 of 1024 at 314564608: Invalid argument
>>   /dev/LOOP_VG/LV: read failed after 0 of 1024 at 4096: Invalid argument
>>   PV                   VG      Fmt  Attr PSize   PFree
>>   /dev/loop0           LOOP_VG lvm2 a--  496.00m 496.00m
>>   /dev/mapper/enc-loop LOOP_VG lvm2 a--  492.00m 192.00m
>>
>> In case the filesystem of the logical volume is not mounted at the time of pvmove, it gets corrupted anyway, but you only see errors when trying to mount it.
>>
> 
> It's because you FS had 1k blocks.
> New device can't read with block 1k.
> If you plan pvmove on device with 4k. Maybe you need make fs with:
> "mkfs.ext4 -b 4096"
The beauty of pvmove is that it works on existing LVs with filesystems on them.
So I might have filesystems on LVs that used the default block size that I want to move via pvmove.
At the time the file system was created (possibly many years ago), I did not know that I would ever move it to a device with a larger block size.
If I have to create a new filesystem after the move, then pvmove is useless.... I could then just create a new LV on the desired PV(s). 
> 
> See comments:
> https://bugzilla.redhat.com/show_bug.cgi?id=1684078
> 


-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-28 16:30   ` Ingo Franzki
@ 2019-02-28 18:11     ` Ilia Zykov
  0 siblings, 0 replies; 37+ messages in thread
From: Ilia Zykov @ 2019-02-28 18:11 UTC (permalink / raw)
  To: LVM general discussion and development, Ingo Franzki

[-- Attachment #1: Type: text/plain, Size: 300 bytes --]

> At the time the file system was created (possibly may years ago), I did not know that I would ever move it to a device with a larger block size.
>

For this purpose all 4k disks have a logical sector size of 512.
Don't look at "blockdev --getbsz"; it's not a property of the physical (real)
device.




[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3591 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-28  8:41       ` Ingo Franzki
  2019-02-28  9:48         ` Ilia Zykov
@ 2019-03-01  1:24         ` Cesare Leonardi
  2019-03-01  2:56           ` [linux-lvm] Filesystem corruption with LVM's pvmove onto a PVwith " Bernd Eckenfels
                             ` (2 more replies)
  1 sibling, 3 replies; 37+ messages in thread
From: Cesare Leonardi @ 2019-03-01  1:24 UTC (permalink / raw)
  To: Ingo Franzki, LVM general discussion and development

On 28/02/19 09:41, Ingo Franzki wrote:
> Well, there are the following 2 commands:
> 
> Get physical block size:
>   blockdev --getpbsz <device>
> Get logical block size:
>   blockdev --getbsz <device>

I didn't know the blockdev command and, to recap, we have:
--getpbsz: physical sector size
--getss: logical sector size
--getbsz: blocksize used internally by the kernel

getpbsz/getss correspond to physical/logical sector size reported by 
fdisk, smartctl, etc.

> Filesystems seem to care about the physical block size only, not the logical block size.
> 
> So as soon as you have PVs with different physical block sizes (as reported by blockdev --getpbsz) I would be very careful...

I've done the test suggested by Stuart and it seems to contradict this.
I have pvmoved data from a 512/512 (logical/physical) disk to a newly 
added 512/4096 disk but I had no data corruption. Unfortunately I 
haven't any native 4k disk to repeat the same test.

Here is what I've done.
/dev/sdb: SSD with 512/512 sector size
/dev/sdc: mechanical disk with 512/4096 sector size

# blockdev -v --getss --getpbsz --getbsz /dev/sdb
get logical block (sector) size: 512
get physical block (sector) size: 512
get blocksize: 4096

# blockdev -v --getss --getpbsz --getbsz /dev/sdc
get logical block (sector) size: 512
get physical block (sector) size: 4096
get blocksize: 4096

# pvcreate /dev/sdb4
# vgcreate vgtest /dev/sdb4
# lvcreate -L 1G vgtest /dev/sdb4
# mkfs.ext4 /dev/mapper/vgtest-lvol0
# mkdir /media/test
# mount /dev/mapper/vgtest-lvol0 /media/test
# cp -a SOMEDATA /media/test/
# umount /media/test
# fsck.ext4 -f /dev/mapper/vgtest-lvol0

Filesystem created and no error on it. Now the disk with different 
physical size:

# pvcreate /dev/sdc1
# vgextend vgtest /dev/sdc1
# pvmove /dev/sdb4 /dev/sdc1
# fsck.ext4 -f /dev/mapper/vgtest-lvol0
# mount /dev/mapper/vgtest-lvol0 /media/test
# ls /media/test/

The fsck command reports no errors, the filesystem is mountable and the 
data is there.

Looks like the physical sector size didn't matter here. Or am I missing 
something?

Cesare.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PVwith a larger physical block size
  2019-03-01  1:24         ` Cesare Leonardi
@ 2019-03-01  2:56           ` Bernd Eckenfels
  2019-03-01  8:00             ` Ingo Franzki
  2019-03-01  3:41           ` [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with " Stuart D. Gathman
  2019-03-01  8:05           ` Ingo Franzki
  2 siblings, 1 reply; 37+ messages in thread
From: Bernd Eckenfels @ 2019-03-01  2:56 UTC (permalink / raw)
  To: LVM general discussion and development, Ingo Franzki

[-- Attachment #1: Type: text/plain, Size: 3267 bytes --]

Hello,

I think filesystems address in logical blocks, so this is the size which should match.

However, the physical size might be relevant for alignment/sizing decisions of mkfs (but you would expect those to be encoded in the metadata of the filesystem so you can transport them, losing only proper alignment, which might affect performance or robustness).

For ext3/4 I think mkfs will use -b 4k by default if your FS is at least 0.5 GB.
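On many distributions these defaults come from /etc/mke2fs.conf, where the "small" and "floppy" fs types override the 4k default with a 1k block size; the exact contents vary between distributions, but something like this shows the relevant section:

# grep -A 3 'small =' /etc/mke2fs.conf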

BTW: some applications (like SQL Server) also care about the physical size to make sure they always write complete sectors in transactions and avoid read-modify-write scenarios.
 
Gruss
Bernd
-- 
http://bernd.eckenfels.net

From: Cesare Leonardi
Sent: Friday, 1 March 2019 02:24
To: Ingo Franzki; LVM general discussion and development
Subject: Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PVwith a larger physical block size

On 28/02/19 09:41, Ingo Franzki wrote:
> Well, there are the following 2 commands:
> 
> Get physical block size:
>   blockdev --getpbsz <device>
> Get logical block size:
>   blockdev --getbsz <device>

I didn't know the blockdev command and, to recap, we have:
--getpbsz: physical sector size
--getss: logical sector size
--getbsz: blocksize used internally by the kernel

getpbsz/getss correspond to physical/logical sector size reported by 
fdisk, smartctl, etc.

> Filesystems seem to care about the physical block size only, not the logical block size.
> 
> So as soon as you have PVs with different physical block sizes (as reported by blockdev --getpbsz) I would be very careful...

I've done the test suggested by Stuart and it seems to contradict this.
I have pvmoved data from a 512/512 (logical/physical) disk to a newly 
added 512/4096 disk but I had no data corruption. Unfortunately I 
haven't any native 4k disk to repeat the same test.

Here is what I've done.
/dev/sdb: SSD with 512/512 sector size
/dev/sdc: mechanical disk with 512/4096 sector size

# blockdev -v --getss --getpbsz --getbsz /dev/sdb
get logical block (sector) size: 512
get physical block (sector) size: 512
get blocksize: 4096

# blockdev -v --getss --getpbsz --getbsz /dev/sdc
get logical block (sector) size: 512
get physical block (sector) size: 4096
get blocksize: 4096

# pvcreate /dev/sdb4
# vgcreate vgtest /dev/sdb4
# lvcreate -L 1G vgtest /dev/sdb4
# mkfs.ext4 /dev/mapper/vgtest-lvol0
# mkdir /media/test
# mount /dev/mapper/vgtest-lvol0 /media/test
# cp -a SOMEDATA /media/test/
# umount /media/test
# fsck.ext4 -f /dev/mapper/vgtest-lvol0

Filesystem created and no error on it. Now the disk with different 
physical size:

# pvcreate /dev/sdc1
# vgextend vgtest /dev/sdc1
# pvmove /dev/sdb4 /dev/sdc1
# fsck.ext4 -f /dev/mapper/vgtest-lvol0
# mount /dev/mapper/vgtest-lvol0 /media/test
# ls /media/test/

The fsck command reports no errors, the filesystem is mountable and the 
data is there.

Looks like physical sector size didn't matter here. Or I'm missing 
something?

Cesare.

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


[-- Attachment #2: Type: text/html, Size: 7020 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-01  1:24         ` Cesare Leonardi
  2019-03-01  2:56           ` [linux-lvm] Filesystem corruption with LVM's pvmove onto a PVwith " Bernd Eckenfels
@ 2019-03-01  3:41           ` Stuart D. Gathman
  2019-03-01  7:59             ` Ingo Franzki
  2019-03-01  8:05           ` Ingo Franzki
  2 siblings, 1 reply; 37+ messages in thread
From: Stuart D. Gathman @ 2019-03-01  3:41 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Ingo Franzki

On Fri, 1 Mar 2019, Cesare Leonardi wrote:

> I've done the test suggested by Stuart and it seems to contradict this.
> I have pvmoved data from a 512/512 (logical/physical) disk to a newly added 
> 512/4096 disk but I had no data corruption. Unfortunately I haven't any 
> native 4k disk to repeat the same test.

Use a loopback device with logical block size set to 4096 to confirm
that your test does detect corruption (using the same LV, filesystem,
data).

I believe by "physical sector", the original reporter means logical,
as he was using an encrypted block device that was virtual - there
was no "physical" sector size.  It was "physical" as far as the
file system was concerned - where "physical" means "the next layer
down".

Indeed, even the rotating disk drives make the physical sector size
invisible except to performance tests.  SSD drives have a "sector" size
of 128k or 256k - the erase block, and performance improves when aligned
to that.

-- 
 	      Stuart D. Gathman <stuart@gathman.org>
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-01  3:41           ` [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with " Stuart D. Gathman
@ 2019-03-01  7:59             ` Ingo Franzki
  0 siblings, 0 replies; 37+ messages in thread
From: Ingo Franzki @ 2019-03-01  7:59 UTC (permalink / raw)
  To: Stuart D. Gathman, LVM general discussion and development

On 01.03.2019 04:41, Stuart D. Gathman wrote:
> On Fri, 1 Mar 2019, Cesare Leonardi wrote:
> 
>> I've done the test suggested by Stuart and it seems to contradict this.
>> I have pvmoved data from a 512/512 (logical/physical) disk to a newly added 512/4096 disk but I had no data corruption. Unfortunately I haven't any native 4k disk to repeat the same test.
> 
> Use a loopback device with logical block size set to 4096 to confirm
> that your test does detect corruption (using the same LV, filesystem,
> data).
> 
> I believe by "physical sector", the original reporter means logical,
> as he was using an encrypted block device that was virtual - there
> was no "physical" sector size.  It was "physical" as far as the
> file system was concerned - where "physical" means "the next layer
> down".
Well, let me cite from https://www.saout.de/pipermail/dm-crypt/2019-February/006078.html from Ondrej Kozina which is also referenced in my original post:

"dm-crypt advertise itself as a block device with physical sector size 
*at least* equal to encryption sector size. Traditionally it's been only 
512B. So classical dm-crypt mapped over device with phys. sector size = 
512B has no impact. If you mapped dm-crypt over block device with native 
physical sector size = 4096 you got dm-crypt exposing same limits as 
underlying block device. Again no problem. Just internally dm-crypt 
performed encryption on 512B blocks, but it had no impact on exposed limits.

But things get a bit different with encryption sector size > 512B.

If you map dm-crypt with encryption sector size set to 4096B over block 
device with physical sector size = 512B, dm-crypt must increase device 
limits to 4096. Because when it does encryption it must be aligned to 
4096 bytes (and same wrt minimal i/o size)."
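The limits a concrete dm-crypt mapping ends up advertising can be checked directly; for the enc-loop device from my reproduction, something like the following should show them (output details depend on the cryptsetup version):

# cryptsetup status enc-loop | grep 'sector size'
# blockdev --getss --getpbsz /dev/mapper/enc-loop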

> 
> Indeed, even the rotating disk drives make the physical sector size
> invisible except to performance tests.  SSD drives have a "sector" size
> of 128k or 256k - the erase block, and performance improves when aligned
> to that.
> 


-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PVwith a larger physical block size
  2019-03-01  2:56           ` [linux-lvm] Filesystem corruption with LVM's pvmove onto a PVwith " Bernd Eckenfels
@ 2019-03-01  8:00             ` Ingo Franzki
  0 siblings, 0 replies; 37+ messages in thread
From: Ingo Franzki @ 2019-03-01  8:00 UTC (permalink / raw)
  To: Bernd Eckenfels, LVM general discussion and development

On 01.03.2019 03:56, Bernd Eckenfels wrote:
> Hello,
> 
> I think the filesystems address in Logical blocks, so this is the size which should match.
> 
> However the physical size might be relevant for alignment/sizing decisions of mkfs  (but you would expect the to be encoded in the metadata of the filesystem so you can transport them (losing proper alignment which might affect Performance or robustness).
> 
> For ext3/4 I think the mkfs will use -b 4k by Default if your FS is at least 0,5GB.
That may make a difference. My tests were with relatively small volumes, so it might not be using -b 4k due to the size of the volume?
> 
> BTW: some applications (like SQL Server) also care about the physical size to make sure they always write complete sectors in transactions and avoid read-modify-write scenarios.
>  
> Gruss
> Bernd
> 


-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-01  1:24         ` Cesare Leonardi
  2019-03-01  2:56           ` [linux-lvm] Filesystem corruption with LVM's pvmove onto a PVwith " Bernd Eckenfels
  2019-03-01  3:41           ` [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with " Stuart D. Gathman
@ 2019-03-01  8:05           ` Ingo Franzki
  2019-03-02  1:36             ` Cesare Leonardi
  2 siblings, 1 reply; 37+ messages in thread
From: Ingo Franzki @ 2019-03-01  8:05 UTC (permalink / raw)
  To: LVM general discussion and development, Cesare Leonardi

On 01.03.2019 02:24, Cesare Leonardi wrote:
> On 28/02/19 09:41, Ingo Franzki wrote:
>> Well, there are the following 2 commands:
>>
>> Get physical block size:
>>   blockdev --getpbsz <device>
>> Get logical block size:
>>   blockdev --getbsz <device>
> 
> I didn't know the blockdev command and, to recap, we have:
> --getpbsz: physical sector size
> --getss: logical sector size
> --getbsz: blocksize used internally by the kernel
> 
> getpbsz/getss correspond to physical/logical sector size reported by fdisk, smartctl, etc.
> 
>> Filesystems seem to care about the physical block size only, not the logical block size.
>>
>> So as soon as you have PVs with different physical block sizes (as reported by blockdev --getpbsz) I would be very careful...
> 
> I've done the test suggested by Stuart and it seems to contradict this.
> I have pvmoved data from a 512/512 (logical/physical) disk to a newly added 512/4096 disk but I had no data corruption. Unfortunately I haven't any native 4k disk to repeat the same test.

Hmm, maybe the size of the volume plays a role, as Bernd has pointed out. ext4 may use -b 4K by default on larger devices. 
Once the FS uses 4K blocks anyway, you won't see the problem.

Use tune2fs -l <device> after you have created the file system and check whether it is using 4K blocks on your 512/512 device. If so, you won't see the problem when it is moved to a 4K block size device.
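For example, with the LV from your test (only the relevant line of the output matters):

# tune2fs -l /dev/mapper/vgtest-lvol0 | grep 'Block size'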

> 
> Here is what I've done.
> /dev/sdb: SSD with 512/512 sector size
> /dev/sdc: mechanical disk with 512/4096 sector size
> 
> # blockdev -v --getss --getpbsz --getbsz /dev/sdb
> get logical block (sector) size: 512
> get physical block (sector) size: 512
> get blocksize: 4096
> 
> # blockdev -v --getss --getpbsz --getbsz /dev/sdc
> get logical block (sector) size: 512
> get physical block (sector) size: 4096
> get blocksize: 4096
> 
> # pvcreate /dev/sdb4
> # vgcreate vgtest /dev/sdb4
> # lvcreate -L 1G vgtest /dev/sdb4
> # mkfs.ext4 /dev/mapper/vgtest-lvol0
> # mkdir /media/test
> # mount /dev/mapper/vgtest-lvol0 /media/test
> # cp -a SOMEDATA /media/test/
> # umount /media/test
> # fsck.ext4 -f /dev/mapper/vgtest-lvol0
> 
> Filesystem created and no error on it. Now the disk with different physical size:
> 
> # pvcreate /dev/sdc1
> # vgextend vgtest /dev/sdc1
> # pvmove /dev/sdb4 /dev/sdc1
> # fsck.ext4 -f /dev/mapper/vgtest-lvol0
> # mount /dev/mapper/vgtest-lvol0 /media/test
> # ls /media/test/
> 
> The fsck command reports no errors, the filesystem is mountable and the data is there.
> 
> Looks like physical sector size didn't matter here. Or I'm missing something?
> 
> Cesare.
> 
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
> 
> 


-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-01  8:05           ` Ingo Franzki
@ 2019-03-02  1:36             ` Cesare Leonardi
  2019-03-02 20:25               ` Nir Soffer
  2019-03-04  9:12               ` Ingo Franzki
  0 siblings, 2 replies; 37+ messages in thread
From: Cesare Leonardi @ 2019-03-02  1:36 UTC (permalink / raw)
  To: Ingo Franzki, LVM general discussion and development

Hello Ingo, I've made several tests but I was unable to trigger any 
filesystem corruption. Maybe the trouble you encountered is specific to 
encrypted devices?

Yesterday and today I've used:
Debian unstable
kernel 4.19.20
lvm2 2.03.02
e2fsprogs 1.44.5

On 01/03/19 09:05, Ingo Franzki wrote:
> Hmm, maybe the size of the volume plays a role as Bernd has pointed out. ext4 may use -b 4K by default on larger devices.
> Once the FS uses 4K block anyway you wont see the problem.
> 
> Use  tune2fs -l <device> after you created the file system and check if it is using 4K blocks on your 512/512 device. If so, then you won't see the problem when moved to a 4K block size device.

I confirm that tune2fs reports 4096 block size for the 1 GB ext4 
filesystem I've used.
I've also verified what Bernd said: mkfs.ext4 still uses a 4096 block size 
for a +512M partition, but uses 1024 for +500M.

As suggested by Stuart, I also made a test using a 4k loop device and 
pvmoving the LV into it. As you expected, no data corruption.
To do it I've recreated the same setup as yesterday: 
/dev/mapper/vgtest-lvol0 on /dev/sdb4, a 512/512 disk, with some data on 
it. Then:
# fallocate -l 10G testdisk.img
# losetup -f -L -P -b 4096 testdisk.img
# pvcreate /dev/loop0
# vgextend vgtest /dev/loop0
# pvmove /dev/sdb4 /dev/loop0
# fsck.ext4 -f /dev/mapper/vgtest-lvol0

While I was there, out of curiosity, I've created an ext4 filesystem on 
a <500MB LV (block size = 1024) and I've tried pvmoving data from the 
512/512 disk to 512/4096, then to the 4096/4096 loop device.
New partitions and a new VG were used for that.

The setup:
/dev/sdb5: 512/512
/dev/sdc2: 512/4096
/dev/loop0 4096/4096

# blockdev -v --getss --getpbsz --getbsz /dev/sdb
get logical block (sector) size: 512
get physical block (sector) size: 512
get blocksize: 4096

# blockdev -v --getss --getpbsz --getbsz /dev/sdc
get logical block (sector) size: 512
get physical block (sector) size: 4096
get blocksize: 4096

# blockdev -v --getss --getpbsz --getbsz /dev/loop0
get logical block (sector) size: 4096
get physical block (sector) size: 4096
get blocksize: 4096

# pvcreate /dev/sdb5
# vgcreate vgtest2 /dev/sdb5
# lvcreate -L 400M vgtest2 /dev/sdb5
# mkfs.ext4 /dev/mapper/vgtest2-lvol0

# tune2fs -l /dev/mapper/vgtest2-lvol0
[...]
Block size:               1024
[...]

# mount /dev/mapper/vgtest2-lvol0 /media/test
# cp -a SOMEDATA /media/test/
# umount /media/test
# fsck.ext4 -f /dev/mapper/vgtest2-lvol0

Now I've moved data from the 512/512 to the 512/4096 disk:
# pvcreate /dev/sdc2
# vgextend vgtest2 /dev/sdc2
# pvmove /dev/sdb5 /dev/sdc2
# fsck.ext4 -f /dev/mapper/vgtest2-lvol0

No error reported.
Now I've moved data to the 4096/4096 loop device:
# pvcreate /dev/loop0
# vgextend vgtest2 /dev/loop0
# pvmove /dev/sdc2 /dev/loop0
# fsck.ext4 -f /dev/mapper/vgtest2-lvol0

Still no data corruption.

Cesare.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-02-27 17:05       ` Ingo Franzki
@ 2019-03-02  1:37         ` L A Walsh
  0 siblings, 0 replies; 37+ messages in thread
From: L A Walsh @ 2019-03-02  1:37 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Cesare Leonardi

On 2/27/2019 9:05 AM, Ingo Franzki wrote:
> Yes that should work:
> # losetup -fP loopbackfile.img --sector-size 4096
> # blockdev --getpbsz /dev/loop0
> 4096
>   
-----
Something I noticed that is troublesome.  When I first got my 4K sector
size disks, one of the numbers in the kernel listed it as 4K; now, ~4-5
years later, that difference no longer appears in the kernel!  To me, this
seems a bit unwise -- even though these disks are "512_e_" (a 512 byte
sector size can be emulated, but writes will usually or often take a minimum
of 2 revolutions so that the rest of the 4k sector can be read, have
512 bytes updated, and then the whole 4k sector written back out).  In other
words -- hideous performance on writes. 

Now the RAID controller that manages this disk also has info on the
disks -- and there it correctly shows the 4k sector size.  So I am wondering
what in the kernel "broke"(?) to disable detection & display of its native
4k sector size.

Anyway, I don't think it should cause problems unless I reformat or move the
partitions, but it would be nice to know that the various disk software knows
not to try anything smaller than a 4k sector size.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-02  1:36             ` Cesare Leonardi
@ 2019-03-02 20:25               ` Nir Soffer
  2019-03-04 22:45                 ` Cesare Leonardi
  2019-03-04  9:12               ` Ingo Franzki
  1 sibling, 1 reply; 37+ messages in thread
From: Nir Soffer @ 2019-03-02 20:25 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Ingo Franzki

[-- Attachment #1: Type: text/plain, Size: 7552 bytes --]

On Sat, Mar 2, 2019 at 3:38 AM Cesare Leonardi <celeonar@gmail.com> wrote:

> Hello Ingo, I've made several tests but I was unable to trigger any
> filesystem corruption. Maybe the trouble you encountered is specific to
> encrypted devices?
>
> Yesterday and today I've used:
> Debian unstable
> kernel 4.19.20
> lvm2 2.03.02
> e2fsprogs 1.44.5
>
> On 01/03/19 09:05, Ingo Franzki wrote:
> > Hmm, maybe the size of the volume plays a role as Bernd has pointed out.
> ext4 may use -b 4K by default on larger devices.
> > Once the FS uses 4K blocks anyway you won't see the problem.
> >
> > Use  tune2fs -l <device> after you created the file system and check if
> it is using 4K blocks on your 512/512 device. If so, then you won't see the
> problem when moved to a 4K block size device.
>
> I confirm that tune2fs reports 4096 block size for the 1 GB ext4
> filesystem I've used.
> I've also verified what Bernd said: mkfs.ext4 still uses 4096 block size
> for a +512M partition, but uses 1024 for +500M.
>
> As suggested by Stuart, I also made a test using a 4k loop device and
> pvmoving the LV into it. As you expected, no data corruption.
> To do it I've recreated the same setup as yesterday:
> /dev/mapper/vgtest-lvol0 on /dev/sdb4, a 512/512 disk, with some data on
> it. Then:
> # fallocate -l 10G testdisk.img
> # losetup -f -L -P -b 4096 testdisk.img
> # pvcreate /dev/loop0
> # vgextend vgtest /dev/loop0
> # pvmove /dev/sdb4 /dev/loop0
> # fsck.ext4 -f /dev/mapper/vgtest-lvol0
>
> While I was there, out of curiosity, I've created an ext4 filesystem on
> a <500MB LV (block size = 1024) and I've tried pvmoving data from the
> 512/512 disk to 512/4096, then to the 4096/4096 loop device.
> New partitions and a new VG was used for that.
>
> The setup:
> /dev/sdb5: 512/512
> /dev/sdc2: 512/4096
> /dev/loop0 4096/4096
>
> # blockdev -v --getss --getpbsz --getbsz /dev/sdb
> get logical block (sector) size: 512
> get physical block (sector) size: 512
> get blocksize: 4096
>
> # blockdev -v --getss --getpbsz --getbsz /dev/sdc
> get logical block (sector) size: 512
> get physical block (sector) size: 4096
> get blocksize: 4096
>
> # blockdev -v --getss --getpbsz --getbsz /dev/loop0
> get logical block (sector) size: 4096
> get physical block (sector) size: 4096
> get blocksize: 4096
>
> # pvcreate /dev/sdb5
> # vgcreate vgtest2 /dev/sdb5
> # lvcreate -L 400M vgtest2 /dev/sdb5
> # mkfs.ext4 /dev/mapper/vgtest2-lvol0
>
> # tune2fs -l /dev/mapper/vgtest2-lvol0
> [...]
> Block size:               1024
> [...]
>
> # mount /dev/mapper/vgtest2-lvol0 /media/test
> # cp -a SOMEDATA /media/test/
> # umount /media/test
> # fsck.ext4 -f /dev/mapper/vgtest2-lvol0
>
> Now I've moved data from the 512/512 to the 512/4096 disk:
> # pvcreate /dev/sdc2
> # vgextend vgtest2 /dev/sdc2
> # pvmove /dev/sdb5 /dev/sdc2
> # fsck.ext4 -f /dev/mapper/vgtest2-lvol0
>
> No error reported.
>

Did you try to mount the lv after the pvmove?


> Now I've moved data to the 4096/4096 loop device:
> # pvcreate /dev/loop0
> # vgextend vgtest2 /dev/loop0
> # pvmove /dev/sdc2 /dev/loop0
> # fsck.ext4 -f /dev/mapper/vgtest2-lvol0
>
> Still no data corruption.
>

I can reproduce this without moving data, just by extending the VG with a 4k
device, and then extending the LV to use both devices.

Here is what I tested:

# truncate -s 500m disk1
# truncate -s 500m disk2
# losetup -f disk1 --sector-size 512 --show
/dev/loop2
# losetup -f disk2 --sector-size 4096 --show
/dev/loop3

# pvcreate /dev/loop2
  Physical volume "/dev/loop2" successfully created.
# pvcreate /dev/loop3
  Physical volume "/dev/loop3" successfully created.
# vgcreate test /dev/loop2
  Volume group "test" successfully created
# lvcreate -L400m -n lv1 test
  Logical volume "lv1" created.

# mkfs.xfs /dev/test/lv1
meta-data=/dev/test/lv1          isize=512    agcount=4, agsize=25600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=102400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=855, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

# mkdir /tmp/mnt
# mount /dev/test/lv1 /tmp/mnt
# vgextend test /dev/loop3
  Volume group "test" successfully extended
# lvextend -L+400m test/lv1
  Size of logical volume test/lv1 changed from 400.00 MiB (100 extents) to
800.00 MiB (200 extents).
  Logical volume test/lv1 successfully resized.
# umount /tmp/mnt

# mount /dev/test/lv1 /tmp/mnt
mount: /tmp/mnt: mount(2) system call failed: Function not implemented.

From journalctl:
Mar 02 21:52:53 lean.local kernel: XFS (dm-7): Unmounting Filesystem
Mar 02 21:53:01 lean.local kernel: XFS (dm-7): device supports 4096 byte
sectors (not 512)


I also tried the same with ext4:

(same disks/vg/lv setup as above)

# mkfs.ext4 /dev/test/lv1
mke2fs 1.44.2 (14-May-2018)
Discarding device blocks: done
Creating filesystem with 409600 1k blocks and 102400 inodes
Filesystem UUID: 9283880e-ee89-4d79-9c29-41f4af98f894
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

# vgextend test /dev/loop3
  Volume group "test" successfully extended
# lvextend -L+400 test/lv1
  Size of logical volume test/lv1 changed from 400.00 MiB (100 extents) to
800.00 MiB (200 extents).
  Logical volume test/lv1 successfully resized.

# mount /dev/test/lv1 /tmp/mnt
mount: /tmp/mnt: wrong fs type, bad option, bad superblock on
/dev/mapper/test-lv1, missing codepage or helper program, or other error.

From journalctl:
Mar 02 22:06:09 lean.local kernel: EXT4-fs (dm-7): bad block size 1024


Now same with pvmove:

(same setup as above, using xfs)

# mount /dev/test/lv1 /tmp/mnt
# dd if=/dev/urandom bs=8M count=1 of=/tmp/mnt/data
# vgextend test /dev/loop3
  Physical volume "/dev/loop3" successfully created.
  Volume group "test" successfully extended

# pvmove -v /dev/loop2 /dev/loop3
    Cluster mirror log daemon is not running.
    Wiping internal VG cache
    Wiping cache of LVM-capable devices
    Archiving volume group "test" metadata (seqno 3).
    Creating logical volume pvmove0
    Moving 100 extents of logical volume test/lv1.
    activation/volume_list configuration setting not defined: Checking only
host tags for test/lv1.
    Creating test-pvmove0
    Loading table for test-pvmove0 (253:8).
    Loading table for test-lv1 (253:7).
    Suspending test-lv1 (253:7) with device flush
    Resuming test-pvmove0 (253:8).
    Resuming test-lv1 (253:7).
    Creating volume group backup "/etc/lvm/backup/test" (seqno 4).
    activation/volume_list configuration setting not defined: Checking only
host tags for test/pvmove0.
    Checking progress before waiting every 15 seconds.
  /dev/loop2: Moved: 15.00%
  /dev/loop2: Moved: 100.00%
    Polling finished successfully.
# umount /tmp/mnt
# mount /dev/test/lv1 /tmp/mnt
mount: /tmp/mnt: mount(2) system call failed: Function not implemented.

From journalctl:
Mar 02 22:20:36 lean.local kernel: XFS (dm-7): device supports 4096 byte
sectors (not 512)


Tested on Fedora 28 with:
kernel-4.20.5-100.fc28.x86_64
lvm2-2.02.177-5.fc28.x86_64

Nir


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-02  1:36             ` Cesare Leonardi
  2019-03-02 20:25               ` Nir Soffer
@ 2019-03-04  9:12               ` Ingo Franzki
  2019-03-04 22:10                 ` Cesare Leonardi
  1 sibling, 1 reply; 37+ messages in thread
From: Ingo Franzki @ 2019-03-04  9:12 UTC (permalink / raw)
  To: LVM general discussion and development, Cesare Leonardi

On 02.03.2019 02:36, Cesare Leonardi wrote:
> Hello Ingo, I've made several tests but I was unable to trigger any filesystem corruption. Maybe the trouble you encountered is specific to encrypted devices?
> 
> Yesterday and today I've used:
> Debian unstable
> kernel 4.19.20
> lvm2 2.03.02
> e2fsprogs 1.44.5
> 
> On 01/03/19 09:05, Ingo Franzki wrote:
>> Hmm, maybe the size of the volume plays a role as Bernd has pointed out. ext4 may use -b 4K by default on larger devices.
>> Once the FS uses 4K blocks anyway you won't see the problem.
>>
>> Use  tune2fs -l <device> after you created the file system and check if it is using 4K blocks on your 512/512 device. If so, then you won't see the problem when moved to a 4K block size device.
> 
> I confirm that tune2fs reports 4096 block size for the 1 GB ext4 filesystem I've used.
> I've also verified what Bernd said: mkfs.ext4 still uses 4096 block size for a +512M partition, but uses 1024 for +500M.
> 
> As suggested by Stuart, I also made a test using a 4k loop device and pvmoving the LV into it. As you expected, no data corruption.
> To do it I've recreated the same setup as yesterday: /dev/mapper/vgtest-lvol0 on /dev/sdb4, a 512/512 disk, with some data on it. Then:
> # fallocate -l 10G testdisk.img
> # losetup -f -L -P -b 4096 testdisk.img
> # pvcreate /dev/loop0
> # vgextend vgtest /dev/loop0
> # pvmove /dev/sdb4 /dev/loop0
> # fsck.ext4 -f /dev/mapper/vgtest-lvol0
> 
> While I was there, out of curiosity, I've created an ext4 filesystem on a <500MB LV (block size = 1024) and I've tried pvmoving data from the 512/512 disk to 512/4096, then to the 4096/4096 loop device.
> New partitions and a new VG was used for that.
> 
> The setup:
> /dev/sdb5: 512/512
> /dev/sdc2: 512/4096
> /dev/loop0 4096/4096
> 
> # blockdev -v --getss --getpbsz --getbsz /dev/sdb
> get logical block (sector) size: 512
> get physical block (sector) size: 512
> get blocksize: 4096
You display the physical block size of /dev/sdb here, but you use /dev/sdb5 later on.
Not sure if this makes a difference ....
> 
> # blockdev -v --getss --getpbsz --getbsz /dev/sdc
> get logical block (sector) size: 512
> get physical block (sector) size: 4096
> get blocksize: 4096
Again, you display the physical block size of /dev/sdc here, but you use /dev/sdc2 later on.
> 
> # blockdev -v --getss --getpbsz --getbsz /dev/loop0
> get logical block (sector) size: 4096
> get physical block (sector) size: 4096
> get blocksize: 4096
> 
> # pvcreate /dev/sdb5
> # vgcreate vgtest2 /dev/sdb5
> # lvcreate -L 400M vgtest2 /dev/sdb5
> # mkfs.ext4 /dev/mapper/vgtest2-lvol0
> 
> # tune2fs -l /dev/mapper/vgtest2-lvol0
> [...]
> Block size:               1024
> [...]
> 
> # mount /dev/mapper/vgtest2-lvol0 /media/test
> # cp -a SOMEDATA /media/test/
> # umount /media/test
> # fsck.ext4 -f /dev/mapper/vgtest2-lvol0
> 
> Now I've moved data from the 512/512 to the 512/4096 disk:
> # pvcreate /dev/sdc2
> # vgextend vgtest2 /dev/sdc2
> # pvmove /dev/sdb5 /dev/sdc2
> # fsck.ext4 -f /dev/mapper/vgtest2-lvol0
Please note that fsck.ext4 does not seem to detect this kind of corruption. 
In my case fsck.ext4 reported that the FS would be clean (!), but a mount could not mount it anymore...

Do a 'pvs' command here, this should show some error messages.
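
For reference, a minimal sketch of such a check (the pvs call is the one
meant above; lvs and dmesg are just additional, optional sanity checks, and
the exact messages, if any, will depend on the kernel and LVM versions):

# pvs -v
# lvs -o +devices vgtest2
# dmesg | tail -n 20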
> 
> No error reported.
> Now I've moved data to the 4096/4096 loop device:
> # pvcreate /dev/loop0
> # vgextend vgtest2 /dev/loop0
> # pvmove /dev/sdc2 /dev/loop0
> # fsck.ext4 -f /dev/mapper/vgtest2-lvol0
Again, fsck did not show the corruption for me, but pvs did show error messages.
> 
> Still no data corruption.
> 
> Cesare.
> 


-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-04  9:12               ` Ingo Franzki
@ 2019-03-04 22:10                 ` Cesare Leonardi
  2019-03-05  0:12                   ` Stuart D. Gathman
  0 siblings, 1 reply; 37+ messages in thread
From: Cesare Leonardi @ 2019-03-04 22:10 UTC (permalink / raw)
  To: Ingo Franzki, LVM general discussion and development

On 04/03/19 10:12, Ingo Franzki wrote:
>> # blockdev -v --getss --getpbsz --getbsz /dev/sdb
>> get logical block (sector) size: 512
>> get physical block (sector) size: 512
>> get blocksize: 4096
> You display the physical block size of /dev/sdb here, but you use /dev/sdb5 later on.
> Not sure if this makes a difference ....

I thought that was the right thing to do, as they are disk parameters. 
At least the first two, for the last I'm not sure.
However the output looks the same:
# blockdev -v --getss --getpbsz --getbsz /dev/sdb5
get logical block (sector) size: 512
get physical block (sector) size: 512
get blocksize: 4096

# blockdev -v --getss --getpbsz --getbsz /dev/sdc2
get logical block (sector) size: 512
get physical block (sector) size: 4096
get blocksize: 4096

> Please note that fsck.ext4 does not seem to detect this kind of corruption.
> In my case fsck.ext4 reported that the FS would be clean (!), but a mount count not mount it anymore...
> 
> Do a 'pvs' command here, this should show some error messages.

Uh, I didn't really expect that such corruption could pass unnoticed by an
fsck.ext4 check. During my tests, initially I surely tried to mount the 
filesystem and I did ls on it but it's possible that after some steps I 
only trusted fsck.

Today I repeated all the tests and indeed in one case the mount failed: 
after pvmoving from the 512/4096 disk to the 4096/4096 disk, with the LV 
ext4 using 1024 block size.

Here is what I've tested:
/dev/sdb: SSD with 512/512 sector size
/dev/sdc: mechanical disk with 512/4096 sector size
/dev/loop0: emulated disk with 4096/4096 sector size

TEST 1
VG vgtest1: /dev/sdb4 /dev/sdc1 /dev/loop0p1
LV vgtest1-lvol0: filesystem ext4 with 4096 block size
pvmove ext4-4096:
- from /dev/sdb4 (512/512) to /dev/sdc1 (512/4096): ok
- from /dev/sdc1 (512/4096) to /dev/loop0p1 (4096/4096): ok

TEST 2
VG vgtest2: /dev/sdb5 /dev/sdc2 /dev/loop0p2
LV vgtest2-lvol0: filesystem ext4 with 1024 block size
pvmove ext4-1024:
- from /dev/sdb5 (512/512) to /dev/sdc2 (512/4096): ok
- from /dev/sdc2 (512/4096) to /dev/loop0p2 (4096/4096): fail

Here the outputs of the failed test:
=======================
# pvmove /dev/sdc2 /dev/loop0p2
   /dev/sdc2: Moved: 9,00%
   /dev/sdc2: Moved: 100,00%

# mount /dev/mapper/vgtest2-lvol0 /media/test/
mount: /media/test: wrong fs type, bad option, bad superblock on
/dev/mapper/vgtest2-lvol0, missing codepage or helper program, or other
error.

# fsck.ext4 -f  /dev/mapper/vgtest2-lvol0
e2fsck 1.44.5 (15-Dec-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/vgtest2-lvol0: 35/102400 files (17.1% non-contiguous), 
304877/409600 blocks
=======================

The error happened where you guys expected. And also for me fsck showed 
no errors.

But it doesn't look like filesystem corruption: if you pvmove the data back,
it becomes readable again:
# pvmove /dev/loop0p2 /dev/sdc2
   /dev/loop0p2: Moved: 1,00%
   /dev/loop0p2: Moved: 100,00%
# mount /dev/mapper/vgtest2-lvol0 /media/test/
# ls /media/test/
epson  hp  kerio  lost+found

Also notice that the pvmove that generated the unreadable filesystem started
at an unusually high percentage (9%). In all other tests it started from 0%
or a small number near 1%. This happened in more than one case.

Cesare.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-02 20:25               ` Nir Soffer
@ 2019-03-04 22:45                 ` Cesare Leonardi
  2019-03-04 23:22                   ` Nir Soffer
  0 siblings, 1 reply; 37+ messages in thread
From: Cesare Leonardi @ 2019-03-04 22:45 UTC (permalink / raw)
  To: LVM general discussion and development, Nir Soffer; +Cc: Ingo Franzki

On 02/03/19 21:25, Nir Soffer wrote:
> # mkfs.xfs /dev/test/lv1
> meta-data=/dev/test/lv1          isize=512    agcount=4, agsize=25600 blks
>           =                       sectsz=512   attr=2, projid32bit=1
>           =                       crc=1        finobt=1, sparse=0, 
> rmapbt=0, reflink=0
> data     =                       bsize=4096   blocks=102400, imaxpct=25
>           =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal log           bsize=4096   blocks=855, version=2
>           =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0

Does the problem here have the same root cause as for ext4? I guess sectsz
should be >=4096 to avoid trouble, shouldn't it?

Just to draw some conclusions: could we say that currently, if we are
going to move data around with LVM, it's better to check that the
filesystem is using a block size >= "blockdev --getbsz
DESTINATIONDEVICE"? At least with ext4 and xfs.

Something that might not hold with really small devices (< 500 MB), where
mkfs.ext4 defaults to a 1024-byte block size.
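
A rough sketch of that check, using the names from the earlier tests
(/dev/mapper/vgtest2-lvol0 as the LV and /dev/loop0p2 as the destination PV;
substitute your own):

# tune2fs -l /dev/mapper/vgtest2-lvol0 | grep 'Block size'
# blockdev --getbsz /dev/loop0p2

If the filesystem block size is smaller than the value reported for the
destination device, the LV may become unmountable after the move, as seen
above.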

Is there already an open bug regarding the problem discussed in this thread?

Cesare.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-04 22:45                 ` Cesare Leonardi
@ 2019-03-04 23:22                   ` Nir Soffer
  2019-03-05  7:54                     ` Ingo Franzki
  0 siblings, 1 reply; 37+ messages in thread
From: Nir Soffer @ 2019-03-04 23:22 UTC (permalink / raw)
  To: Cesare Leonardi
  Cc: Ingo Franzki, David Teigland, LVM general discussion and development


On Tue, Mar 5, 2019 at 12:45 AM Cesare Leonardi <celeonar@gmail.com> wrote:

> On 02/03/19 21:25, Nir Soffer wrote:
> > # mkfs.xfs /dev/test/lv1
> > meta-data=/dev/test/lv1          isize=512    agcount=4, agsize=25600
> blks
> >           =                       sectsz=512   attr=2, projid32bit=1
> >           =                       crc=1        finobt=1, sparse=0,
> > rmapbt=0, reflink=0
> > data     =                       bsize=4096   blocks=102400, imaxpct=25
> >           =                       sunit=0      swidth=0 blks
> > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > log      =internal log           bsize=4096   blocks=855, version=2
> >           =                       sectsz=512   sunit=0 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> Has the problem here the same root as for ext4? I guess sectsz should be
>  >=4096 to avoid troubles, isn't it?
>
> Just to draw some conlusion, could we say that currently, if we are
> going to move data around with LVM, it's better to check that the
> filesystem is using a block size >= than "blockdev --getbsz
> DESTINATIONDEVICE"? At least with ext4 and xfs.
>
> Something that couldn't be true with really small devices (< 500 MB).
>
> Is there already an open bug regarding the problem discussed in this
> thread?
>

There is this bug about lvextend:
https://bugzilla.redhat.com/1669751

And this old bug from 2011, discussing mixing PVs with different block sizes.
Comment 2 is very clear about this issue:
https://bugzilla.redhat.com/732980#c2

Nir


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-04 22:10                 ` Cesare Leonardi
@ 2019-03-05  0:12                   ` Stuart D. Gathman
  2019-03-05  7:53                     ` Ingo Franzki
  0 siblings, 1 reply; 37+ messages in thread
From: Stuart D. Gathman @ 2019-03-05  0:12 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Ingo Franzki

On Mon, 4 Mar 2019, Cesare Leonardi wrote:

> Today I repeated all the tests and indeed in one case the mount failed: after 
> pvmoving from the 512/4096 disk to the 4096/4096 disk, with the LV ext4 using 
> 1024 block size.
  ...
> The error happened where you guys expected. And also for me fsck showed no 
> errors.
>
> But doesn't look like a filesystem corruption: if you pvmove back the data, 
> it will become readable again:
  ...

THAT is a crucial observation.  It's not an LVM bug, but the filesystem
trying to read 1024 bytes on a 4096 device.  I suspect it could also
happen with an unaligned filesystem on a 4096 device.
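
A hedged way to see the same constraint outside of any filesystem, using
/dev/loop0p2 (the 4096/4096 device from Cesare's test) as an example: direct
I/O has to be a multiple of the device's logical block size, so a 1024-byte
direct read should be rejected with EINVAL, while the same read done through
the page cache succeeds.

# dd if=/dev/loop0p2 of=/dev/null bs=1024 count=1 iflag=direct
# dd if=/dev/loop0p2 of=/dev/null bs=1024 count=1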

-- 
 	      Stuart D. Gathman <stuart@gathman.org>
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-05  0:12                   ` Stuart D. Gathman
@ 2019-03-05  7:53                     ` Ingo Franzki
  2019-03-05  9:29                       ` Ilia Zykov
  0 siblings, 1 reply; 37+ messages in thread
From: Ingo Franzki @ 2019-03-05  7:53 UTC (permalink / raw)
  To: Stuart D. Gathman, LVM general discussion and development

On 05.03.2019 01:12, Stuart D. Gathman wrote:
> On Mon, 4 Mar 2019, Cesare Leonardi wrote:
> 
>> Today I repeated all the tests and indeed in one case the mount failed: after pvmoving from the 512/4096 disk to the 4096/4096 disk, with the LV ext4 using 1024 block size.
>  ...
>> The error happened where you guys expected. And also for me fsck showed no errors.
>>
>> But doesn't look like a filesystem corruption: if you pvmove back the data, it will become readable again:
>  ...
> 
> THAT is a crucial observation.  It's not an LVM bug, but the filesystem
> trying to read 1024 bytes on a 4096 device.  
Yes that's probably the reason. Nevertheless, it's not really the FS's fault, since it was moved by LVM to a 4096 device.
The FS does not know anything about the move, so it reads in the block size it was created with (1024 in this case).

I still think LVM should prevent one from mixing devices with different physical block sizes, or at least warn when pvmoving or lvextending onto a PV with a larger block size, since this can cause trouble.

> I suspect it could also
> happen with an unaligned filesystem on a 4096 device.
> 

-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-04 23:22                   ` Nir Soffer
@ 2019-03-05  7:54                     ` Ingo Franzki
  0 siblings, 0 replies; 37+ messages in thread
From: Ingo Franzki @ 2019-03-05  7:54 UTC (permalink / raw)
  To: Nir Soffer, Cesare Leonardi
  Cc: David Teigland, LVM general discussion and development

On 05.03.2019 00:22, Nir Soffer wrote:
> On Tue, Mar 5, 2019 at 12:45 AM Cesare Leonardi <celeonar@gmail.com> wrote:
> 
>> On 02/03/19 21:25, Nir Soffer wrote:
>>> # mkfs.xfs /dev/test/lv1
>>> meta-data=/dev/test/lv1          isize=512    agcount=4, agsize=25600
>> blks
>>>           =                       sectsz=512   attr=2, projid32bit=1
>>>           =                       crc=1        finobt=1, sparse=0,
>>> rmapbt=0, reflink=0
>>> data     =                       bsize=4096   blocks=102400, imaxpct=25
>>>           =                       sunit=0      swidth=0 blks
>>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
>>> log      =internal log           bsize=4096   blocks=855, version=2
>>>           =                       sectsz=512   sunit=0 blks, lazy-count=1
>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>
>> Has the problem here the same root as for ext4? I guess sectsz should be
>>  >=4096 to avoid troubles, isn't it?
>>
>> Just to draw some conlusion, could we say that currently, if we are
>> going to move data around with LVM, it's better to check that the
>> filesystem is using a block size >= than "blockdev --getbsz
>> DESTINATIONDEVICE"? At least with ext4 and xfs.
>>
>> Something that couldn't be true with really small devices (< 500 MB).
>>
>> Is there already an open bug regarding the problem discussed in this
>> thread?
>>
> 
> There is this bug about lvextend:
> https://bugzilla.redhat.com/1669751
> 
> And this old bug from 2011, discussing mixing PVs with different block size.
> Comment 2 is very clear about this issue:
> https://bugzilla.redhat.com/732980#c2
I don't have access to that one, can you cite comment 2?
> 
> Nir
> 


-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-05  7:53                     ` Ingo Franzki
@ 2019-03-05  9:29                       ` Ilia Zykov
  2019-03-05 11:42                         ` Ingo Franzki
  2019-03-05 16:29                         ` Nir Soffer
  0 siblings, 2 replies; 37+ messages in thread
From: Ilia Zykov @ 2019-03-05  9:29 UTC (permalink / raw)
  To: LVM general discussion and development, Ingo Franzki


Hello.

>> THAT is a crucial observation.  It's not an LVM bug, but the filesystem
>> trying to read 1024 bytes on a 4096 device.  
> Yes that's probably the reason. Nevertheless, it's not really the FS's fault, since it was moved by LVM to a 4096 device.
> The FS does not know anything about the move, so it reads in the block size it was created with (1024 in this case).
> 
> I still think LVM should prevent one from mixing devices with different physical block sizes, or at least warn when pvmoving or lvextending onto a PV with a larger block size, since this can cause trouble.
> 

In this case, the "dd" tool and others should prevent this too.

Because after:

dd if=/dev/DiskWith512block bs=4096 of=/dev/DiskWith4Kblock

You couldn't mount "/dev/DiskWith4Kblock" -- it fails with the same error ;)
(/dev/DiskWith512block has an ext4 fs with a 1k block size.)

P.S.
LVM, dd, etc. are low-level tools and don't know anything about the higher
levels. In your case, and in other cases, they can't know. You should test
(if you need to) the block size with other tools before moving or copying.
Not an LVM bug.
Thank you.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-05  9:29                       ` Ilia Zykov
@ 2019-03-05 11:42                         ` Ingo Franzki
  2019-03-05 16:29                         ` Nir Soffer
  1 sibling, 0 replies; 37+ messages in thread
From: Ingo Franzki @ 2019-03-05 11:42 UTC (permalink / raw)
  To: Ilia Zykov, LVM general discussion and development

On 05.03.2019 10:29, Ilia Zykov wrote:
> Hello.
> 
>>> THAT is a crucial observation.  It's not an LVM bug, but the filesystem
>>> trying to read 1024 bytes on a 4096 device.  
>> Yes that's probably the reason. Nevertheless, it's not really the FS's fault, since it was moved by LVM to a 4096 device.
>> The FS does not know anything about the move, so it reads in the block size it was created with (1024 in this case).
>>
>> I still think LVM should prevent one from mixing devices with different physical block sizes, or at least warn when pvmoving or lvextending onto a PV with a larger block size, since this can cause trouble.
>>
> 
> In this case, "dd" tool and others should prevent too.

Well, no, it's LVM's pvmove that moves the data from a 512 block size device
to a 4096 block size device.
So it's not dd that's causing the problem, but pvmove.

> 
> Because after:
> 
> dd if=/dev/DiskWith512block bs=4096 of=/dev/DiskWith4Kblock
> 
> You couldn't mount the "/dev/DiskWith4Kblock" with the same error ;)
> /dev/DiskWith512block has ext4 fs with 1k block.
> 
> P.S.
> LVM,dd .. are low level tools and doesn't know about hi level anything.
> And in the your case and others cases can't know. You should test(if you
> need) the block size with other tools before moving or copying.
> Not a lvm bug.
Well, maybe not a bug, but I would still like to see LVM's pvmove/lvextend and/or vgextend issue at least a warning when you have mixed block sizes.

LVM knows the block sizes of the PVs it manages and it also knows when it changes the block size of an LV due to a pvmove or lvextend. So it can issue a warning (better a confirmation prompt) when this happens.
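
Until something like that exists in LVM itself, a crude external check along
these lines could look like the following sketch (entirely hypothetical, not
an existing LVM option; the VG name vgtest2 and the candidate PV /dev/loop0
are just names used earlier in this thread):

for pv in $(pvs --noheadings -o pv_name --select vg_name=vgtest2); do
    blockdev --getss "$pv"
done | sort -u
blockdev --getss /dev/loop0

If the candidate PV reports a larger logical block size than the PVs already
in the VG, stop and ask for confirmation before running vgextend or pvmove.
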
> Thank you.
> 


-- 
Ingo Franzki
eMail: ifranzki@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany

IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-05  9:29                       ` Ilia Zykov
  2019-03-05 11:42                         ` Ingo Franzki
@ 2019-03-05 16:29                         ` Nir Soffer
  2019-03-05 16:36                           ` David Teigland
  1 sibling, 1 reply; 37+ messages in thread
From: Nir Soffer @ 2019-03-05 16:29 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Ingo Franzki, David Teigland


On Tue, Mar 5, 2019 at 11:30 AM Ilia Zykov <mail@izyk.ru> wrote:

> Hello.
>
> >> THAT is a crucial observation.  It's not an LVM bug, but the filesystem
> >> trying to read 1024 bytes on a 4096 device.
> > Yes that's probably the reason. Nevertheless, it's not really the FS's
> fault, since it was moved by LVM to a 4096 device.
> > The FS does not know anything about the move, so it reads in the block
> size it was created with (1024 in this case).
> >
> > I still think LVM should prevent one from mixing devices with different
> physical block sizes, or at least warn when pvmoving or lvextending onto a
> PV with a larger block size, since this can cause trouble.
> >
>
> In this case, "dd" tool and others should prevent too.
>
> Because after:
>
> dd if=/dev/DiskWith512block bs=4096 of=/dev/DiskWith4Kblock
>
> You couldn't mount the "/dev/DiskWith4Kblock" with the same error ;)
> /dev/DiskWith512block has ext4 fs with 1k block.
>
> P.S.
> LVM,dd .. are low level tools and doesn't know about hi level anything.
> And in the your case and others cases can't know. You should test(if you
> need) the block size with other tools before moving or copying.
> Not a lvm bug.
>

I don't think this way of thinking is useful. If we go this way, then write()
should not let you write data, and later maybe the disk controller should
avoid this?

LVM is not a low-level tool like dd. It is a high-level tool for managing
device mapper, and for providing higher-level tools that create user-level
abstractions. We can expect it to prevent the system administrator from
doing the wrong thing.

Maybe LVM should let you mix PVs with different logical block sizes, but it
should require --force.

David, what do you think?


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-05 16:29                         ` Nir Soffer
@ 2019-03-05 16:36                           ` David Teigland
  2019-03-05 16:56                             ` Stuart D. Gathman
  0 siblings, 1 reply; 37+ messages in thread
From: David Teigland @ 2019-03-05 16:36 UTC (permalink / raw)
  To: Nir Soffer; +Cc: Ingo Franzki, LVM general discussion and development

On Tue, Mar 05, 2019 at 06:29:31PM +0200, Nir Soffer wrote:
> I don't think this way of thinking is useful. If we go this way, then write()
> should not let you write data, and later maybe the disk controller should
> avoid this?
> 
> LVM is not a low level tool like dd. It is high level tool for managing
> device mapper,
> and providing high level tools to create user level abstractions. We can
> expect it
> to prevent system administrator from doing the wrong thing.
> 
> Maybe LVM should let you mix PVs with different logical block size, but it
> should
> require --force.
> 
> David, what do you think?

LVM needs to fix this, your solution sounds like the right one.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
  2019-03-05 16:36                           ` David Teigland
@ 2019-03-05 16:56                             ` Stuart D. Gathman
  0 siblings, 0 replies; 37+ messages in thread
From: Stuart D. Gathman @ 2019-03-05 16:56 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Nir Soffer, Ingo Franzki

On Tue, 5 Mar 2019, David Teigland wrote:

> On Tue, Mar 05, 2019 at 06:29:31PM +0200, Nir Soffer wrote:
>> Maybe LVM should let you mix PVs with different logical block size, but it
>> should
>> require --force.
>
> LVM needs to fix this, your solution sounds like the right one.

Also, since nearly every modern device has a physical block size of 4k or
more, and performance degradation occurs with smaller filesystem blocks even
when the logical block size is (emulated) 512, the savvy admin should ensure
that all filesystems have a minimum 4k block size - except in special
circumstances.
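
One practical way to follow this advice at mkfs time is sketched below (the
device name is a placeholder; -b and -s are the documented block-size and
sector-size options of mkfs.ext4 and mkfs.xfs):

# mkfs.ext4 -b 4096 /dev/mapper/VG-LV
# mkfs.xfs -s size=4096 /dev/mapper/VG-LV

For XFS the data block size is already 4k by default; the -s option raises
the sector size, which is what the earlier "device supports 4096 byte
sectors (not 512)" mount failure complained about.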

-- 
 	      Stuart D. Gathman <stuart@gathman.org>
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2019-03-05 16:56 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-25 15:33 [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size Ingo Franzki
2019-02-27  0:00 ` Cesare Leonardi
2019-02-27  8:49   ` Ingo Franzki
2019-02-27 14:59     ` Stuart D. Gathman
2019-02-27 17:05       ` Ingo Franzki
2019-03-02  1:37         ` L A Walsh
2019-02-28  1:31     ` Cesare Leonardi
2019-02-28  1:52       ` Stuart D. Gathman
2019-02-28  8:41       ` Ingo Franzki
2019-02-28  9:48         ` Ilia Zykov
2019-02-28 10:10           ` Ingo Franzki
2019-02-28 10:41             ` Ilia Zykov
2019-02-28 10:50             ` Ilia Zykov
2019-02-28 13:13               ` Ilia Zykov
2019-03-01  1:24         ` Cesare Leonardi
2019-03-01  2:56           ` [linux-lvm] Filesystem corruption with LVM's pvmove onto a PVwith " Bernd Eckenfels
2019-03-01  8:00             ` Ingo Franzki
2019-03-01  3:41           ` [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with " Stuart D. Gathman
2019-03-01  7:59             ` Ingo Franzki
2019-03-01  8:05           ` Ingo Franzki
2019-03-02  1:36             ` Cesare Leonardi
2019-03-02 20:25               ` Nir Soffer
2019-03-04 22:45                 ` Cesare Leonardi
2019-03-04 23:22                   ` Nir Soffer
2019-03-05  7:54                     ` Ingo Franzki
2019-03-04  9:12               ` Ingo Franzki
2019-03-04 22:10                 ` Cesare Leonardi
2019-03-05  0:12                   ` Stuart D. Gathman
2019-03-05  7:53                     ` Ingo Franzki
2019-03-05  9:29                       ` Ilia Zykov
2019-03-05 11:42                         ` Ingo Franzki
2019-03-05 16:29                         ` Nir Soffer
2019-03-05 16:36                           ` David Teigland
2019-03-05 16:56                             ` Stuart D. Gathman
2019-02-28 14:36 ` Ilia Zykov
2019-02-28 16:30   ` Ingo Franzki
2019-02-28 18:11     ` Ilia Zykov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).