From: Kirill Tkhai <kirill.tkhai@openvz.org>
To: agk@redhat.com, snitzer@redhat.com, dm-devel@redhat.com,
	song@kernel.org, linux-kernel@vger.kernel.org,
	khorenko@virtuozzo.com, kirill.tkhai@openvz.org
Subject: [PATCH 0/4] dm: Introduce dm-qcow2 driver to attach QCOW2 files as block device
Date: Mon, 28 Mar 2022 14:18:16 +0300	[thread overview]
Message-ID: <164846619932.251310.3668540533992131988.stgit@pro> (raw)

This patchset adds a new driver that allows attaching QCOW2 files
as block devices. The idea is to implement in the kernel only the
features that affect runtime IO performance (IO request processing),
while maintenance operations are processed synchronously in
userspace, with the device suspended.

Userspace is only allowed to perform operations that never modify
the virtual disk's data; it may only modify the QCOW2 file metadata
that describes that data. Examples of allowed operations are
snapshot creation and resize.

The userspace part is handled by already existing tools (qemu-img).

For instance, snapshot creation on attached dm-qcow2 device looks like:

# dmsetup suspend $device
# qemu-img snapshot -c <snapshot_name> $device.qcow2
# dmsetup resume $device

1) Suspend flushes all pending IO and related metadata to the file,
   leaving the file in consistent QCOW2 format.
   The driver's .postsuspend drops all of the image's cached metadata.
2) qemu-img creates the snapshot, changing/moving metadata inside
   the QCOW2 file.
3) The driver's .preresume reads the new version of the metadata
   from the file (one page is required), and the device is ready
   to continue handling IO requests.
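
Wrapped into a script, the whole procedure might look like the
sketch below (the script name and arguments are illustrative, and
the error handling is intentionally minimal):

    #!/bin/sh
    # Sketch: qcow2-snapshot.sh <dm-device> <qcow2-file> <snapshot-name>
    device=$1
    image=$2
    snapshot=$3

    # Flush pending IO and metadata; the file is left in consistent
    # QCOW2 format, and .postsuspend drops cached metadata.
    dmsetup suspend "$device" || exit 1

    # Create the snapshot in userspace; only QCOW2 metadata is modified.
    qemu-img snapshot -c "$snapshot" "$image"
    ret=$?

    # .preresume re-reads the updated metadata; resume even if
    # qemu-img failed, so the device does not stay suspended.
    dmsetup resume "$device"
    exit $ret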

This example shows how the device-mapper infrastructure allows
implementing drivers that follow this kernel/userspace demarcation:
the driver takes advantage of device-mapper instead of implementing
its own suspend/resume engine.

The fio test below was used to measure performance:

# fio --name=test --ioengine=libaio --direct=1 --bs=$bs --filename=$dev
      --readwrite=$rw --runtime=60 --numjobs=2 --iodepth=8

The collected results consist of both the fio measurements and the
system load taken from /proc/loadavg. Since the minimum loadavg
period is 60 seconds, fio's runtime is 60 seconds as well.
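
A collection loop for these numbers could look roughly like the
sketch below (an illustrative assumption, not the exact script used
here): $dev is the device under test, and the 1-minute load average,
the first field of /proc/loadavg, is sampled right after each
60-second run.

    for bs in 4K 64K 512K 1M 2M; do
        for rw in read write; do
            fio --name=test --ioengine=libaio --direct=1 --bs=$bs \
                --filename=$dev --readwrite=$rw --runtime=60 \
                --numjobs=2 --iodepth=8
            cut -d' ' -f1 /proc/loadavg    # 1-minute load average
        done
    done
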
Here are the averages of 5 runs (IO/loadavg is also averaged over
the 5 runs):

----+-------+------------------------------+---------------------------------+--------------------------+
    |       |    qemu-nbd (native aio)     |            dm-qcow2             |          diff, %         |
bs  |  RW   | IO,MiB/s  loadavg  IO/loadavg|  IO,MiB/s   loadavg   IO/loadavg|IO     loadavg  IO/loadavg|
----+-------+------------------------------+---------------------------------+--------------------------+
4K  | READ  |  279       1.986     147     |  512        2.088     248       |+83.7    +5.1     +68.4   |
4K  | WRITE |  242       2.31      105     |  770        2.172     357       |+217.9   -5.9     +239.7  |
----+-------+------------------------------+---------------------------------+--------------------------+
64K | READ  |  1199      1.794     691     |  1218       1.118     1217      |+1.6     -37.7    +76     |
64K | WRITE |  946       1.084     877     |  1003       0.466     2144      |+6.1     -57      +144.5  |
----+-------+------------------------------+---------------------------------+--------------------------+
512K| READ  |  1741      1.142     1526    |  2196       0.546     4197      |+26.1    -52.2    +175.1  |
512K| WRITE |  1016      1.084     941     |  993        0.306     3267      |-2.2     -71.7    +246.9  |
----+-------+------------------------------+---------------------------------+--------------------------+
1M  | READ  |  1793      1.174     1542    |  2373       0.566     4384      |+32.4    -51.8    +184.2  |
1M  | WRITE |  1037      0.894     1165    |  1068       0.892     1196      |+2.9     -0.2     +2.7    |
----+-------+------------------------------+---------------------------------+--------------------------+
2M  | READ  |  1784      1.084     1654    |  2431       0.788     3090      |+36.3    -27.3    +86.8   |
2M  | WRITE |  1027      0.878     1172    |  1063       0.878     1212      |+3.6     0        +3.4    |
----+-------+------------------------------+---------------------------------+--------------------------+
(NBD attach command: qemu-nbd -c $dev --aio=native --nocache file.qcow2)

As the diff column shows, the dm-qcow2 driver has the higher
throughput (the only exception is 512K WRITE) and the lower
loadavg (the only exception is 4K READ). The density
of dm-qcow2 is significantly better.

(Note that the tests were made on preallocated images, where the
 whole L2 table is already allocated, since QEMU has a lazy L2
 allocation feature that is not implemented in dm-qcow2 yet.)
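
For reference, an image preallocated this way can be created with
qemu-img's preallocation option (the file name and size below are
arbitrary):

# qemu-img create -f qcow2 -o preallocation=metadata test.qcow2 10G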

So, one reason for implementing the driver is to provide better
performance and density than qemu-nbd does. The second reason is
the possibility of unifying the virtual disk format for VMs and
containers, so the same disk image can be used to start either.

This patchset consists of 4 patches. Patches [1-2] make small
changes to the dm core: [1] exports a function, while [2] makes
.io_hints be called for drivers that have no .iterate_devices.
Patch [3] adds dm-qcow2 itself, while patch [4] adds a userspace
wrapper for attaching such devices.

---

Kirill Tkhai (4):
      dm: Export dm_complete_request()
      dm: Process .io_hints for drivers not having underlying devices
      dm-qcow2: Introduce driver to create block devices over QCOW2 files
      dm-qcow2: Add helper for working with dm-qcow2 devices


 drivers/md/Kconfig           |   17 +
 drivers/md/Makefile          |    2 +
 drivers/md/dm-qcow2-cmd.c    |  383 +++
 drivers/md/dm-qcow2-map.c    | 4256 ++++++++++++++++++++++++++++++++++
 drivers/md/dm-qcow2-target.c | 1026 ++++++++
 drivers/md/dm-qcow2.h        |  368 +++
 drivers/md/dm-rq.c           |    3 +-
 drivers/md/dm-rq.h           |    2 +
 drivers/md/dm-table.c        |    5 +-
 scripts/qcow2-dm.sh          |  249 ++
 10 files changed, 6309 insertions(+), 2 deletions(-)
 create mode 100644 drivers/md/dm-qcow2-cmd.c
 create mode 100644 drivers/md/dm-qcow2-map.c
 create mode 100644 drivers/md/dm-qcow2-target.c
 create mode 100644 drivers/md/dm-qcow2.h
 create mode 100755 scripts/qcow2-dm.sh

--
Signed-off-by: Kirill Tkhai <kirill.tkhai@openvz.org>



Thread overview:
2022-03-28 11:18 Kirill Tkhai [this message]
2022-03-28 11:18 ` [PATCH 1/4] dm: Export dm_complete_request() Kirill Tkhai
2022-03-28 11:18 ` [PATCH 2/4] dm: Process .io_hints for drivers not having underlying devices Kirill Tkhai
2022-03-28 11:18 ` [PATCH 3/4] dm-qcow2: Introduce driver to create block devices over QCOW2 files Kirill Tkhai
2022-03-28 20:03   ` kernel test robot
2022-03-28 23:42   ` kernel test robot
2022-03-29 10:42   ` [PATCH 3/4 v1.5] dm-qcow2: Introduce driver to create block devices over QCOW2 files Kirill Tkhai
2022-03-29 13:34   ` [PATCH 3/4] dm-qcow2: Introduce driver to create block devices over QCOW2 files Christoph Hellwig
2022-03-29 15:24     ` Kirill Tkhai
2022-03-29 22:30       ` Kirill Tkhai
2022-03-28 11:18 ` [PATCH 4/4] dm-qcow2: Add helper for working with dm-qcow2 devices Kirill Tkhai
2022-03-29 13:08 ` [PATCH 0/4] dm: Introduce dm-qcow2 driver to attach QCOW2 files as block device Christoph Hellwig
2022-03-29 15:14   ` Kirill Tkhai
