What prevents discarding a cluster during rewrite?

* What prevents discarding a cluster during rewrite?
@ 2021-02-22 21:30 Vladimir Sementsov-Ogievskiy
  2021-02-22 21:42 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 6+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-02-22 21:30 UTC (permalink / raw)
  To: qemu block; +Cc: qemu-devel, Max Reitz, Kevin Wolf, Denis V. Lunev, Eric Blake

Hi all!

Thinking of how to prevent dereferencing to zero (and discard) of host cluster during flush of compressed cache (which I'm working on now), I have a question.. What prevents it for normal writes?

A simple interactive qemu-io session on master branch:

./qemu-img create -f qcow2 x 1M

[root@kvm build]# ./qemu-io blkdebug::x

do initial write:

qemu-io> write -P 1 0 64K
wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.12 sec (556.453 KiB/sec and 8.6946 ops/sec)

rewrite, and break before write (assume long write by fs or hardware for some reason)

qemu-io> break write_aio A
qemu-io> aio_write -P 2 0 64K
blkdebug: Suspended request 'A'

OK, we stopped before write. Everything is already allocated on initial write, mutex now resumed.. And suddenly we do discard:

qemu-io> discard 0 64K
discard 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.00 sec (146.034 MiB/sec and 2336.5414 ops/sec)

Now, start another write, to another place.. But it will allocate same host cluster!!!

qemu-io> write -P 3 128K 64K
wrote 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.08 sec (787.122 KiB/sec and 12.2988 ops/sec)

Check it:

qemu-io> read -P 3 128K 64K
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (188.238 MiB/sec and 3011.8033 ops/sec)

resume our old write:

qemu-io> resume A
blkdebug: Resuming request 'A'
qemu-io> wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0:05:07.10 (213.400382 bytes/sec and 0.0033 ops/sec)

of course it doesn't influence first cluster, as it is discarded:

qemu-io> read -P 2 0 64K
Pattern verification failed at offset 0, 65536 bytes
read 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.00 sec (726.246 MiB/sec and 11619.9352 ops/sec)
qemu-io> read -P 0 0 64K
read 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.00 sec (632.348 MiB/sec and 10117.5661 ops/sec)

But in 3rd cluster data is corrupted now:

qemu-io> read -P 3 128K 64K
Pattern verification failed at offset 131072, 65536 bytes
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (163.922 MiB/sec and 2622.7444 ops/sec)
qemu-io> read -P 2 128K 64K
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (257.058 MiB/sec and 4112.9245 ops/sec

So, that's a classical use-after-free... For user it looks like racy write/discard to one cluster may corrupt another cluster... It may be even worse, if use-after-free corrupts metadata.

Note, that initial write is significant, as when we do allocate cluster we write L2 entry after data write (as I understand), so the race doesn't happen. But, if consider compressed writes, they allocate everything before write.. Let's check:

[root@kvm build]# ./qemu-img create -f qcow2 x 1M; ./qemu-io blkdebug::x
Formatting 'x', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16
qemu-io> break write_compressed A
qemu-io> aio_write -c -P 1 0 64K
qemu-io> compressed: 327680 79
blkdebug: Suspended request 'A'

qemu-io> discard 0 64K
discarded: 327680
discard 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.01 sec (7.102 MiB/sec and 113.6297 ops/sec)
qemu-io> write -P 3 128K 64K
normal cluster alloc: 327680
wrote 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.06 sec (1.005 MiB/sec and 16.0774 ops/sec)
qemu-io> resume A
blkdebug: Resuming request 'A'
qemu-io> wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0:00:15.90 (4.026 KiB/sec and 0.0629 ops/sec)

qemu-io> read -P 3 128K 64K
Pattern verification failed at offset 131072, 65536 bytes
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (237.791 MiB/sec and 3804.6539 ops/sec)

(strange, but seems it didn't fail several times for me.. But now it fails several times... Anyway, it's all not good).

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 6+ messages in thread