ublk-qcow2: ublk-qcow2 is available

* ublk-qcow2: ublk-qcow2 is available
@ 2022-09-30  9:24 Ming Lei
  2022-10-03 19:53 ` Stefan Hajnoczi
  2022-10-04  5:43 ` Manuel Bentele
  0 siblings, 2 replies; 44+ messages in thread
From: Ming Lei @ 2022-09-30  9:24 UTC (permalink / raw)
  To: io-uring, linux-block, linux-kernel
  Cc: Kirill Tkhai, Manuel Bentele, Stefan Hajnoczi

Hello,

ublk-qcow2 is available now.

So far it provides basic read/write function, and compression and snapshot
aren't supported yet. The target/backend implementation is completely
based on io_uring, and share the same io_uring with ublk IO command
handler, just like what ublk-loop does.

Follows the main motivations of ublk-qcow2:

- building one complicated target from scratch helps libublksrv APIs/functions
  become mature/stable more quickly, since qcow2 is complicated and needs more
  requirement from libublksrv compared with other simple ones(loop, null)

- there are several attempts of implementing qcow2 driver in kernel, such as
  ``qloop`` [2], ``dm-qcow2`` [3] and ``in kernel qcow2(ro)`` [4], so ublk-qcow2
  might useful be for covering requirement in this field

- performance comparison with qemu-nbd, and it was my 1st thought to evaluate
  performance of ublk/io_uring backend by writing one ublk-qcow2 since ublksrv
  is started

- help to abstract common building block or design pattern for writing new ublk
  target/backend

So far it basically passes xfstest(XFS) test by using ublk-qcow2 block
device as TEST_DEV, and kernel building workload is verified too. Also
soft update approach is applied in meta flushing, and meta data
integrity is guaranteed, 'make test T=qcow2/040' covers this kind of
test, and only cluster leak is reported during this test.

The performance data looks much better compared with qemu-nbd, see
details in commit log[1], README[5] and STATUS[6]. And the test covers both
empty image and pre-allocated image, for example of pre-allocated qcow2
image(8GB):

- qemu-nbd (make test T=qcow2/002)
	randwrite(4k): jobs 1, iops 24605
	randread(4k): jobs 1, iops 30938
	randrw(4k): jobs 1, iops read 13981 write 14001
	rw(512k): jobs 1, iops read 724 write 728

- ublk-qcow2 (make test T=qcow2/022)
	randwrite(4k): jobs 1, iops 104481
	randread(4k): jobs 1, iops 114937
	randrw(4k): jobs 1, iops read 53630 write 53577
	rw(512k): jobs 1, iops read 1412 write 1423

Also ublk-qcow2 aligns queue's chunk_sectors limit with qcow2's cluster size,
which is 64KB at default, this way simplifies backend io handling, but
it could be increased to 512K or more proper size for improving sequential
IO perf, just need one coroutine to handle more than one IOs.

[1] https://github.com/ming1/ubdsrv/commit/9faabbec3a92ca83ddae92335c66eabbeff654e7
[2] https://upcommons.upc.edu/bitstream/handle/2099.1/9619/65757.pdf?sequence=1&isAllowed=y
[3] https://lwn.net/Articles/889429/
[4] https://lab.ks.uni-freiburg.de/projects/kernel-qcow2/repository
[5] https://github.com/ming1/ubdsrv/blob/master/qcow2/README.rst
[6] https://github.com/ming1/ubdsrv/blob/master/qcow2/STATUS.rst

Thanks,
Ming

^ permalink raw reply	[flat|nested] 44+ messages in thread