linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver
@ 2019-08-23 22:56 development
  2019-08-23 22:56 ` [PATCH 2/5] doc: admin-guide: add loop block device documentation development
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: development @ 2019-08-23 22:56 UTC (permalink / raw)
  To: linux-block; +Cc: Manuel Bentele

From: Manuel Bentele <development@manuel-bentele.de>

Hi

Regarding to the following discussion [1] on the mailing list I show you 
the result of my work as announced at the end of the discussion [2].

The discussion was about the project topic of how to implement the 
reading/writing of QCOW2 in the kernel. The project focuses on an read-only 
in-kernel QCOW2 implementation to increase the read/write performance 
and tries to avoid nbd. Furthermore, the project is part of a project 
series to develop a in-kernel network boot infrastructure that has no need 
for any user space interaction (e.g. nbd) anymore.

During the discussion, it turned out that the implementation as device 
mapper target is not applicable. The device mapper stacks different 
functionality such as compression or encryption on multiple block device 
layers whereas an implementation for the QCOW2 container format provides 
these functionalities on one block device layer. Using FUSE is also not 
possible due to performance reasons and user space interaction.

Therefore, I propose the extension of the loop device module. I created a 
new file format subsystem which is part of the loop device module. The file 
format subsystem abstracts the direct file access and provides an driver 
API to implement various disk file formats such as QCOW2, VDI and VMDK. 
File format drivers are implemented as kernel modules and can be registered 
by the file format subsystem.

The patch series contains documentation for the file format subsystem and 
the loop device module, too. Also, it provides a default RAW file format 
driver and a read-only QCOW2 driver. The RAW file format driver is based on 
the file specific parts of the existing loop device implementation and 
preserves the default behaviour of a loop device. More specific information 
can be found in the commit logs of the following patches.

Regards,
Manuel

[1] https://www.spinics.net/lists/linux-block/msg39538.html
[2] https://www.spinics.net/lists/linux-block/msg40479.html

Manuel Bentele (5):
  block: loop: add file format subsystem for loop devices
  doc: admin-guide: add loop block device documentation
  doc: driver-api: add loop file format subsystem API documentation
  block: loop: add QCOW2 loop file format driver (read-only)
  doc: admin-guide: add QCOW2 file format to loop device documentation

 Documentation/admin-guide/blockdev/index.rst  |   1 +
 Documentation/admin-guide/blockdev/loop.rst   |  85 ++
 Documentation/driver-api/index.rst            |   1 +
 Documentation/driver-api/loop-file-fmt.rst    | 137 +++
 arch/alpha/configs/defconfig                  |   1 +
 arch/arc/configs/axs103_defconfig             |   1 +
 arch/arc/configs/axs103_smp_defconfig         |   1 +
 arch/arm/configs/am200epdkit_defconfig        |   1 +
 arch/arm/configs/aspeed_g4_defconfig          |   1 +
 arch/arm/configs/aspeed_g5_defconfig          |   1 +
 arch/arm/configs/assabet_defconfig            |   1 +
 arch/arm/configs/at91_dt_defconfig            |   1 +
 arch/arm/configs/axm55xx_defconfig            |   1 +
 arch/arm/configs/badge4_defconfig             |   1 +
 arch/arm/configs/cerfcube_defconfig           |   1 +
 arch/arm/configs/cm_x2xx_defconfig            |   1 +
 arch/arm/configs/cm_x300_defconfig            |   1 +
 arch/arm/configs/cns3420vb_defconfig          |   1 +
 arch/arm/configs/colibri_pxa270_defconfig     |   1 +
 arch/arm/configs/collie_defconfig             |   1 +
 arch/arm/configs/corgi_defconfig              |   1 +
 arch/arm/configs/davinci_all_defconfig        |   1 +
 arch/arm/configs/dove_defconfig               |   1 +
 arch/arm/configs/em_x270_defconfig            |   1 +
 arch/arm/configs/eseries_pxa_defconfig        |   1 +
 arch/arm/configs/exynos_defconfig             |   1 +
 arch/arm/configs/ezx_defconfig                |   1 +
 arch/arm/configs/footbridge_defconfig         |   1 +
 arch/arm/configs/h3600_defconfig              |   1 +
 arch/arm/configs/imote2_defconfig             |   1 +
 arch/arm/configs/imx_v6_v7_defconfig          |   1 +
 arch/arm/configs/integrator_defconfig         |   1 +
 arch/arm/configs/iop32x_defconfig             |   1 +
 arch/arm/configs/ixp4xx_defconfig             |   1 +
 arch/arm/configs/jornada720_defconfig         |   1 +
 arch/arm/configs/keystone_defconfig           |   1 +
 arch/arm/configs/lpc32xx_defconfig            |   1 +
 arch/arm/configs/milbeaut_m10v_defconfig      |   1 +
 arch/arm/configs/mini2440_defconfig           |   1 +
 arch/arm/configs/multi_v5_defconfig           |   1 +
 arch/arm/configs/multi_v7_defconfig           |   1 +
 arch/arm/configs/mv78xx0_defconfig            |   1 +
 arch/arm/configs/mvebu_v5_defconfig           |   1 +
 arch/arm/configs/netwinder_defconfig          |   1 +
 arch/arm/configs/nhk8815_defconfig            |   1 +
 arch/arm/configs/omap1_defconfig              |   1 +
 arch/arm/configs/omap2plus_defconfig          |   1 +
 arch/arm/configs/orion5x_defconfig            |   1 +
 arch/arm/configs/oxnas_v6_defconfig           |   1 +
 arch/arm/configs/palmz72_defconfig            |   1 +
 arch/arm/configs/pleb_defconfig               |   1 +
 arch/arm/configs/prima2_defconfig             |   1 +
 arch/arm/configs/pxa3xx_defconfig             |   1 +
 arch/arm/configs/pxa_defconfig                |   1 +
 arch/arm/configs/qcom_defconfig               |   1 +
 arch/arm/configs/rpc_defconfig                |   1 +
 arch/arm/configs/s3c2410_defconfig            |   1 +
 arch/arm/configs/s3c6400_defconfig            |   1 +
 arch/arm/configs/s5pv210_defconfig            |   1 +
 arch/arm/configs/sama5_defconfig              |   1 +
 arch/arm/configs/simpad_defconfig             |   1 +
 arch/arm/configs/socfpga_defconfig            |   1 +
 arch/arm/configs/spitz_defconfig              |   1 +
 arch/arm/configs/tango4_defconfig             |   1 +
 arch/arm/configs/tegra_defconfig              |   1 +
 arch/arm/configs/trizeps4_defconfig           |   1 +
 arch/arm/configs/viper_defconfig              |   1 +
 arch/arm/configs/zeus_defconfig               |   1 +
 arch/arm/configs/zx_defconfig                 |   1 +
 arch/arm64/configs/defconfig                  |   1 +
 arch/c6x/configs/dsk6455_defconfig            |   1 +
 arch/c6x/configs/evmc6457_defconfig           |   1 +
 arch/c6x/configs/evmc6472_defconfig           |   1 +
 arch/c6x/configs/evmc6474_defconfig           |   1 +
 arch/c6x/configs/evmc6678_defconfig           |   1 +
 arch/csky/configs/defconfig                   |   1 +
 arch/hexagon/configs/comet_defconfig          |   1 +
 arch/ia64/configs/bigsur_defconfig            |   1 +
 arch/ia64/configs/generic_defconfig           |   1 +
 arch/ia64/configs/gensparse_defconfig         |   1 +
 arch/ia64/configs/tiger_defconfig             |   1 +
 arch/ia64/configs/zx1_defconfig               |   1 +
 arch/m68k/configs/amiga_defconfig             |   1 +
 arch/m68k/configs/apollo_defconfig            |   1 +
 arch/m68k/configs/atari_defconfig             |   1 +
 arch/m68k/configs/bvme6000_defconfig          |   1 +
 arch/m68k/configs/hp300_defconfig             |   1 +
 arch/m68k/configs/mac_defconfig               |   1 +
 arch/m68k/configs/multi_defconfig             |   1 +
 arch/m68k/configs/mvme147_defconfig           |   1 +
 arch/m68k/configs/mvme16x_defconfig           |   1 +
 arch/m68k/configs/q40_defconfig               |   1 +
 arch/m68k/configs/sun3_defconfig              |   1 +
 arch/m68k/configs/sun3x_defconfig             |   1 +
 arch/mips/configs/bigsur_defconfig            |   1 +
 arch/mips/configs/cavium_octeon_defconfig     |   1 +
 arch/mips/configs/cobalt_defconfig            |   1 +
 arch/mips/configs/decstation_64_defconfig     |   1 +
 arch/mips/configs/decstation_defconfig        |   1 +
 arch/mips/configs/decstation_r4k_defconfig    |   1 +
 arch/mips/configs/fuloong2e_defconfig         |   1 +
 arch/mips/configs/generic/board-ocelot.config |   1 +
 arch/mips/configs/gpr_defconfig               |   1 +
 arch/mips/configs/ip27_defconfig              |   1 +
 arch/mips/configs/ip32_defconfig              |   1 +
 arch/mips/configs/jazz_defconfig              |   1 +
 arch/mips/configs/lemote2f_defconfig          |   1 +
 arch/mips/configs/loongson1b_defconfig        |   1 +
 arch/mips/configs/loongson1c_defconfig        |   1 +
 arch/mips/configs/loongson3_defconfig         |   1 +
 arch/mips/configs/malta_defconfig             |   1 +
 arch/mips/configs/malta_kvm_defconfig         |   1 +
 arch/mips/configs/malta_kvm_guest_defconfig   |   1 +
 arch/mips/configs/malta_qemu_32r6_defconfig   |   1 +
 arch/mips/configs/maltaaprp_defconfig         |   1 +
 arch/mips/configs/maltasmvp_defconfig         |   1 +
 arch/mips/configs/maltasmvp_eva_defconfig     |   1 +
 arch/mips/configs/maltaup_defconfig           |   1 +
 arch/mips/configs/maltaup_xpa_defconfig       |   1 +
 arch/mips/configs/markeins_defconfig          |   1 +
 arch/mips/configs/mips_paravirt_defconfig     |   1 +
 arch/mips/configs/nlm_xlp_defconfig           |   1 +
 arch/mips/configs/nlm_xlr_defconfig           |   1 +
 arch/mips/configs/pic32mzda_defconfig         |   1 +
 arch/mips/configs/pistachio_defconfig         |   1 +
 arch/mips/configs/pnx8335_stb225_defconfig    |   1 +
 arch/mips/configs/rbtx49xx_defconfig          |   1 +
 arch/mips/configs/rm200_defconfig             |   1 +
 arch/mips/configs/tb0219_defconfig            |   1 +
 arch/mips/configs/tb0226_defconfig            |   1 +
 arch/mips/configs/tb0287_defconfig            |   1 +
 arch/nios2/configs/10m50_defconfig            |   1 +
 arch/nios2/configs/3c120_defconfig            |   1 +
 arch/parisc/configs/712_defconfig             |   1 +
 arch/parisc/configs/a500_defconfig            |   1 +
 arch/parisc/configs/b180_defconfig            |   1 +
 arch/parisc/configs/c3000_defconfig           |   1 +
 arch/parisc/configs/c8000_defconfig           |   1 +
 arch/parisc/configs/defconfig                 |   1 +
 arch/parisc/configs/generic-32bit_defconfig   |   1 +
 arch/parisc/configs/generic-64bit_defconfig   |   1 +
 arch/powerpc/configs/40x/virtex_defconfig     |   1 +
 arch/powerpc/configs/44x/sam440ep_defconfig   |   1 +
 arch/powerpc/configs/44x/virtex5_defconfig    |   1 +
 arch/powerpc/configs/52xx/cm5200_defconfig    |   1 +
 arch/powerpc/configs/52xx/lite5200b_defconfig |   1 +
 arch/powerpc/configs/52xx/motionpro_defconfig |   1 +
 arch/powerpc/configs/52xx/tqm5200_defconfig   |   1 +
 arch/powerpc/configs/83xx/asp8347_defconfig   |   1 +
 .../configs/83xx/mpc8313_rdb_defconfig        |   1 +
 .../configs/83xx/mpc8315_rdb_defconfig        |   1 +
 .../configs/83xx/mpc832x_mds_defconfig        |   1 +
 .../configs/83xx/mpc832x_rdb_defconfig        |   1 +
 .../configs/83xx/mpc834x_itx_defconfig        |   1 +
 .../configs/83xx/mpc834x_itxgp_defconfig      |   1 +
 .../configs/83xx/mpc834x_mds_defconfig        |   1 +
 .../configs/83xx/mpc836x_mds_defconfig        |   1 +
 .../configs/83xx/mpc836x_rdk_defconfig        |   1 +
 .../configs/83xx/mpc837x_mds_defconfig        |   1 +
 .../configs/83xx/mpc837x_rdb_defconfig        |   1 +
 arch/powerpc/configs/85xx/ge_imp3a_defconfig  |   1 +
 arch/powerpc/configs/85xx/ksi8560_defconfig   |   1 +
 .../configs/85xx/mpc8540_ads_defconfig        |   1 +
 .../configs/85xx/mpc8560_ads_defconfig        |   1 +
 .../configs/85xx/mpc85xx_cds_defconfig        |   1 +
 arch/powerpc/configs/85xx/sbc8548_defconfig   |   1 +
 arch/powerpc/configs/85xx/socrates_defconfig  |   1 +
 arch/powerpc/configs/85xx/stx_gp3_defconfig   |   1 +
 arch/powerpc/configs/85xx/tqm8540_defconfig   |   1 +
 arch/powerpc/configs/85xx/tqm8541_defconfig   |   1 +
 arch/powerpc/configs/85xx/tqm8548_defconfig   |   1 +
 arch/powerpc/configs/85xx/tqm8555_defconfig   |   1 +
 arch/powerpc/configs/85xx/tqm8560_defconfig   |   1 +
 .../configs/85xx/xes_mpc85xx_defconfig        |   1 +
 arch/powerpc/configs/amigaone_defconfig       |   1 +
 arch/powerpc/configs/cell_defconfig           |   1 +
 arch/powerpc/configs/chrp32_defconfig         |   1 +
 arch/powerpc/configs/ep8248e_defconfig        |   1 +
 arch/powerpc/configs/fsl-emb-nonhw.config     |   1 +
 arch/powerpc/configs/g5_defconfig             |   1 +
 arch/powerpc/configs/gamecube_defconfig       |   1 +
 arch/powerpc/configs/holly_defconfig          |   1 +
 arch/powerpc/configs/linkstation_defconfig    |   1 +
 arch/powerpc/configs/mgcoge_defconfig         |   1 +
 arch/powerpc/configs/mpc5200_defconfig        |   1 +
 arch/powerpc/configs/mpc7448_hpc2_defconfig   |   1 +
 arch/powerpc/configs/mpc8272_ads_defconfig    |   1 +
 arch/powerpc/configs/mpc83xx_defconfig        |   1 +
 arch/powerpc/configs/mpc866_ads_defconfig     |   1 +
 arch/powerpc/configs/mvme5100_defconfig       |   1 +
 arch/powerpc/configs/pasemi_defconfig         |   1 +
 arch/powerpc/configs/pmac32_defconfig         |   1 +
 arch/powerpc/configs/powernv_defconfig        |   1 +
 arch/powerpc/configs/ppc64_defconfig          |   1 +
 arch/powerpc/configs/ppc64e_defconfig         |   1 +
 arch/powerpc/configs/ppc6xx_defconfig         |   1 +
 arch/powerpc/configs/pq2fads_defconfig        |   1 +
 arch/powerpc/configs/ps3_defconfig            |   1 +
 arch/powerpc/configs/pseries_defconfig        |   1 +
 arch/powerpc/configs/skiroot_defconfig        |   1 +
 arch/powerpc/configs/wii_defconfig            |   1 +
 arch/riscv/configs/defconfig                  |   1 +
 arch/riscv/configs/rv32_defconfig             |   1 +
 arch/s390/configs/debug_defconfig             |   1 +
 arch/s390/configs/defconfig                   |   1 +
 arch/sh/configs/cayman_defconfig              |   1 +
 arch/sh/configs/landisk_defconfig             |   1 +
 arch/sh/configs/lboxre2_defconfig             |   1 +
 arch/sh/configs/rsk7264_defconfig             |   1 +
 arch/sh/configs/sdk7780_defconfig             |   1 +
 arch/sh/configs/sdk7786_defconfig             |   1 +
 arch/sh/configs/se7206_defconfig              |   1 +
 arch/sh/configs/se7780_defconfig              |   1 +
 arch/sh/configs/sh03_defconfig                |   1 +
 arch/sh/configs/sh2007_defconfig              |   1 +
 arch/sh/configs/sh7785lcr_32bit_defconfig     |   1 +
 arch/sh/configs/shmin_defconfig               |   1 +
 arch/sh/configs/titan_defconfig               |   1 +
 arch/sparc/configs/sparc32_defconfig          |   1 +
 arch/sparc/configs/sparc64_defconfig          |   1 +
 arch/um/configs/i386_defconfig                |   1 +
 arch/um/configs/x86_64_defconfig              |   1 +
 arch/unicore32/configs/defconfig              |   1 +
 arch/x86/configs/i386_defconfig               |   1 +
 arch/x86/configs/x86_64_defconfig             |   1 +
 arch/xtensa/configs/audio_kc705_defconfig     |   1 +
 arch/xtensa/configs/cadence_csp_defconfig     |   1 +
 arch/xtensa/configs/generic_kc705_defconfig   |   1 +
 arch/xtensa/configs/nommu_kc705_defconfig     |   1 +
 arch/xtensa/configs/smp_lx200_defconfig       |   1 +
 arch/xtensa/configs/virt_defconfig            |   1 +
 drivers/block/Kconfig                         |  73 +-
 drivers/block/Makefile                        |   4 +-
 drivers/block/loop/Kconfig                    |  93 ++
 drivers/block/loop/Makefile                   |  13 +
 drivers/block/{ => loop}/cryptoloop.c         |   2 +-
 drivers/block/loop/loop_file_fmt.c            | 328 ++++++
 drivers/block/loop/loop_file_fmt.h            | 351 +++++++
 drivers/block/loop/loop_file_fmt_qcow_cache.c | 218 ++++
 drivers/block/loop/loop_file_fmt_qcow_cache.h |  51 +
 .../block/loop/loop_file_fmt_qcow_cluster.c   | 270 +++++
 .../block/loop/loop_file_fmt_qcow_cluster.h   |  23 +
 drivers/block/loop/loop_file_fmt_qcow_main.c  | 945 ++++++++++++++++++
 drivers/block/loop/loop_file_fmt_qcow_main.h  | 417 ++++++++
 drivers/block/loop/loop_file_fmt_raw.c        | 449 +++++++++
 drivers/block/{loop.c => loop/loop_main.c}    | 567 ++++-------
 drivers/block/{loop.h => loop/loop_main.h}    |  14 +-
 include/uapi/linux/loop.h                     |  14 +-
 248 files changed, 3861 insertions(+), 422 deletions(-)
 create mode 100644 Documentation/admin-guide/blockdev/loop.rst
 create mode 100644 Documentation/driver-api/loop-file-fmt.rst
 create mode 100644 drivers/block/loop/Kconfig
 create mode 100644 drivers/block/loop/Makefile
 rename drivers/block/{ => loop}/cryptoloop.c (99%)
 create mode 100644 drivers/block/loop/loop_file_fmt.c
 create mode 100644 drivers/block/loop/loop_file_fmt.h
 create mode 100644 drivers/block/loop/loop_file_fmt_qcow_cache.c
 create mode 100644 drivers/block/loop/loop_file_fmt_qcow_cache.h
 create mode 100644 drivers/block/loop/loop_file_fmt_qcow_cluster.c
 create mode 100644 drivers/block/loop/loop_file_fmt_qcow_cluster.h
 create mode 100644 drivers/block/loop/loop_file_fmt_qcow_main.c
 create mode 100644 drivers/block/loop/loop_file_fmt_qcow_main.h
 create mode 100644 drivers/block/loop/loop_file_fmt_raw.c
 rename drivers/block/{loop.c => loop/loop_main.c} (86%)
 rename drivers/block/{loop.h => loop/loop_main.h} (92%)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/5] doc: admin-guide: add loop block device documentation
  2019-08-23 22:56 [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver development
@ 2019-08-23 22:56 ` development
  2019-08-23 22:56 ` [PATCH 3/5] doc: driver-api: add loop file format subsystem API documentation development
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: development @ 2019-08-23 22:56 UTC (permalink / raw)
  To: linux-block; +Cc: Manuel Bentele

From: Manuel Bentele <development@manuel-bentele.de>

The configuration of the loop block device module with file format support
is documented in the reST kernel documentation format.

Signed-off-by: Manuel Bentele <development@manuel-bentele.de>
---
 Documentation/admin-guide/blockdev/index.rst |  1 +
 Documentation/admin-guide/blockdev/loop.rst  | 74 ++++++++++++++++++++
 2 files changed, 75 insertions(+)
 create mode 100644 Documentation/admin-guide/blockdev/loop.rst

diff --git a/Documentation/admin-guide/blockdev/index.rst b/Documentation/admin-guide/blockdev/index.rst
index b903cf152091..127e921a0ccc 100644
--- a/Documentation/admin-guide/blockdev/index.rst
+++ b/Documentation/admin-guide/blockdev/index.rst
@@ -8,6 +8,7 @@ The Linux RapidIO Subsystem
    :maxdepth: 1
 
    floppy
+   loop
    nbd
    paride
    ramdisk
diff --git a/Documentation/admin-guide/blockdev/loop.rst b/Documentation/admin-guide/blockdev/loop.rst
new file mode 100644
index 000000000000..69d8172c85db
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/loop.rst
@@ -0,0 +1,74 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Loopback Block Device
+=====================
+
+Overview
+--------
+
+The loopback device driver allows you to use a regular file as a block device.
+You can then create a file system on that block device and mount it just as you
+would mount other block devices such as hard drive partitions, CD-ROM drives or
+floppy drives. The loop devices are block special device files with major
+number 7 and typically called /dev/loop0, /dev/loop1 etc.
+
+To use the loop device, you need the losetup utility, found in the `util-linux
+package <https://www.kernel.org/pub/linux/utils/util-linux/>`_.
+
+.. note::
+	Note that this loop device has nothing to do with the loopback device \
+	used for network connections from the machine to itself.
+
+
+Parameters
+----------
+
+Kernel Command Line Parameters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+	max_loop
+		The number of loop block devices that get unconditionally
+		pre-created at init time. The default number is configured by
+		BLK_DEV_LOOP_MIN_COUNT. Instead of statically allocating a
+		predefined number, loop devices can be requested on-demand
+		with the /dev/loop-control interface.
+
+
+Module parameters
+~~~~~~~~~~~~~~~~~
+
+	max_part
+		Maximum number of partitions per loop device (default: 0).
+
+		If max_part is given, partition scanning is globally enabled
+		for all loop devices.
+
+	max_loop
+		Maximum number of loop devices that should be initialized
+		(default: 8). The default number is configured by
+		BLK_DEV_LOOP_MIN_COUNT.
+
+
+File format drivers
+-------------------
+
+The loopback device driver provides an interface for kernel modules to
+implement custom file formats. By default, an initialized loop device uses the
+**RAW** file format driver.
+
+.. note::
+	If you want to create and set up a new loop device with the losetup \
+	utility make sure that the suitable file format driver is loaded \
+	before.
+
+The following file format drivers are available.
+
+
+RAW
+~~~
+
+The RAW file format driver implements the binary reading and writing of a disk
+image file. It supports discarding, asynchrounous IO, flushing and cryptoloop
+support.
+
+The driver's kernel module is named *loop_file_fmt_raw*.
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/5] doc: driver-api: add loop file format subsystem API documentation
  2019-08-23 22:56 [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver development
  2019-08-23 22:56 ` [PATCH 2/5] doc: admin-guide: add loop block device documentation development
@ 2019-08-23 22:56 ` development
  2019-08-23 22:56 ` [PATCH 4/5] block: loop: add QCOW2 loop file format driver (read-only) development
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: development @ 2019-08-23 22:56 UTC (permalink / raw)
  To: linux-block; +Cc: Manuel Bentele

From: Manuel Bentele <development@manuel-bentele.de>

The entire API of the file format subsystem for loop devices is documented
in the reST kernel documentation format. The documentation deals with the
description of the internal API of the file format subsystem to access its
functionality and adds a section on how to write own loop file format
drivers using the driver API of the subsystem.

Signed-off-by: Manuel Bentele <development@manuel-bentele.de>
---
 Documentation/driver-api/index.rst         |   1 +
 Documentation/driver-api/loop-file-fmt.rst | 137 +++++++++++++++++++++
 2 files changed, 138 insertions(+)
 create mode 100644 Documentation/driver-api/loop-file-fmt.rst

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 38e638abe3eb..88736bd668f3 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -51,6 +51,7 @@ available subsections can be seen below.
    mmc/index
    nvdimm/index
    w1
+   loop-file-fmt
    rapidio/index
    s390-drivers
    vme
diff --git a/Documentation/driver-api/loop-file-fmt.rst b/Documentation/driver-api/loop-file-fmt.rst
new file mode 100644
index 000000000000..1f47b19bdef0
--- /dev/null
+++ b/Documentation/driver-api/loop-file-fmt.rst
@@ -0,0 +1,137 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
+Loopback block device file format subsystem
+===========================================
+
+This document outlines the file format subsystem used in the loopback block
+device module. This subsystem deals with the abstraction of direct file access
+to allow the implementation of various disk file formats. The subsystem can
+handle ...
+
+   - read
+   - write
+   - discard
+   - flush
+   - sector size
+
+... operations of a loop device.
+
+Therefore, the subsystem provides an internal API for the loop device module to
+access its functionality and exports a file format driver API to implement any
+file format driver for loop devices.
+
+
+Use the file format subsystem
+=============================
+
+At the moment, the file format subsystem is only intended to be used from the
+loopback device module to provide a specific file format implementation per
+configured loop device. Therefore, the loop device module can use the following
+internal file format API functions to set up loop file formats and access the
+file format subsystem.
+
+
+Internal subsystem API
+----------------------
+
+.. kernel-doc:: drivers/block/loop/loop_file_fmt.h
+   :functions: loop_file_fmt_alloc loop_file_fmt_free \
+               loop_file_fmt_set_lo loop_file_fmt_get_lo
+               loop_file_fmt_init loop_file_fmt_exit \
+               loop_file_fmt_read loop_file_fmt_read_aio \
+               loop_file_fmt_write loop_file_fmt_write_aio \
+               loop_file_fmt_discard loop_file_fmt_flush \
+               loop_file_fmt_sector_size loop_file_fmt_change
+
+
+Finite state machine
+--------------------
+
+To prevent a misuse of the internal file format API, the file format subsystem
+implements an finite state machine. The state machine consists of two states
+and a transition for each internal API function. The state
+*file_fmt_uninitialized* of a loop file format denotes that the file format is
+already allocated but not initialized. After the initialization, the file
+format's state is set to *file_fmt_initialized*. In this state, all IO related
+file format operations can be accessed.
+
+.. note:: If an internal API call does not succeed the file format's state \
+          does not change accordingly to its transition and remains in the \
+          original state before the API call.
+
+The entire implemented finite state machine looks like the following:
+
+.. kernel-render:: DOT
+   :alt: loop file format states
+   :caption: File format states and transitions
+
+   digraph file_fmt_states {
+       rankdir = LR;
+       node [ shape = point,        label = "" ] ENTRY, EXIT;
+       node [ shape = circle,       label = "file_fmt_uninitialized" ] UN;
+       node [ shape = doublecircle, label = "file_fmt_initialized" ]   IN;
+       subgraph helper {
+           rank = "same";
+           ENTRY -> UN   [ label = "loop_file_fmt_alloc()" ];
+           UN    -> EXIT [ label = "loop_file_fmt_free()" ];
+       }
+       UN    -> IN   [ label = "loop_file_fmt_init()" ];
+       IN    -> UN   [ label = "loop_file_fmt_exit()" ];
+       IN    -> IN   [ label = "loop_file_fmt_read()\nloop_file_fmt_read_aio()\nloop_file_fmt_write()\n loop_file_fmt_write_aio()\nloop_file_fmt_discard()\nloop_file_fmt_flush()\nloop_file_fmt_sector_size()\nloop_file_fmt_change()" ];
+   }
+
+
+Write file format drivers
+=========================
+
+A file format driver for the loop file format subsystem is implemented as
+kernel module. In the kernel module's code, the file format driver structure is
+statically allocated and must be initialized. An example definition would look
+like::
+
+   struct loop_file_fmt_driver raw_file_fmt_driver = {
+       .name          = "RAW",
+       .file_fmt_type = LO_FILE_FMT_RAW,
+       .ops           = &raw_file_fmt_ops,
+       .owner         = THIS_MODULE
+   };
+
+The definition assigns a *name* to the file format driver. The *file_fmt_type*
+field is set to the file format type that the driver implements. The *owner*
+specifies the driver's owner and is used to lock the kernel module of the
+driver if the file format driver is in use. The most important field of a loop
+file format driver is the specification of its implementation. Therefore, the
+*ops* field proposes all file format operations that the driver implement by
+link to a statically allocated operations structure.
+
+.. note:: All fields of the **loop_file_fmt_driver** structure must be \
+          initialized and set up accordingly, otherwise the driver does not \
+          work properly.
+
+An example of such an operations structure looks like::
+
+   struct loop_file_fmt_ops raw_file_fmt_ops = {
+       .init        = NULL,
+       .exit        = NULL,
+       .read        = raw_file_fmt_read,
+       .write       = raw_file_fmt_write,
+       .read_aio    = raw_file_fmt_read_aio,
+       .write_aio   = raw_file_fmt_write_aio,
+       .discard     = raw_file_fmt_discard,
+       .flush       = raw_file_fmt_flush,
+       .sector_size = raw_file_fmt_sector_size
+   };
+
+The operations structure consists of a bunch of functions pointers which are
+set in this example to some functions of the binary raw disk file format
+implemented in the example driver. If a function is not available in the
+driver's implementation the function pointer in the operations structure must
+be set to *NULL*.
+
+If all definitions are available and set up correctly the driver can be
+registered and later on unregistered by using the following functions exported
+by the file format subsystem:
+
+.. kernel-doc:: drivers/block/loop/loop_file_fmt.h
+   :functions: loop_file_fmt_register_driver loop_file_fmt_unregister_driver
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/5] block: loop: add QCOW2 loop file format driver (read-only)
  2019-08-23 22:56 [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver development
  2019-08-23 22:56 ` [PATCH 2/5] doc: admin-guide: add loop block device documentation development
  2019-08-23 22:56 ` [PATCH 3/5] doc: driver-api: add loop file format subsystem API documentation development
@ 2019-08-23 22:56 ` development
  2019-08-23 22:56 ` [PATCH 5/5] doc: admin-guide: add QCOW2 file format to loop device documentation development
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: development @ 2019-08-23 22:56 UTC (permalink / raw)
  To: linux-block; +Cc: Manuel Bentele

From: Manuel Bentele <development@manuel-bentele.de>

The QCOW2 file format is added as a new file format driver module to the
existing loop device file format subsystem. The implementation of the QCOW2
file format is based on the original implementation from the QEMU project
and was ported to the Linux kernel space.

The current implementation of the QCOW2 file format supports the reading of
normal QCOW2 disk images as well as the reading of sparsed or compressed
QCOW2 images. Write support is missing and is not ported yet. Discard,
flush and reading or writing aio is missing, too, and can be implemented
together with QCOW version 1 support in the future.

Signed-off-by: Manuel Bentele <development@manuel-bentele.de>
---
 drivers/block/loop/Kconfig                    |   9 +
 drivers/block/loop/Makefile                   |   5 +
 drivers/block/loop/loop_file_fmt_qcow_cache.c | 218 ++++
 drivers/block/loop/loop_file_fmt_qcow_cache.h |  51 +
 .../block/loop/loop_file_fmt_qcow_cluster.c   | 270 +++++
 .../block/loop/loop_file_fmt_qcow_cluster.h   |  23 +
 drivers/block/loop/loop_file_fmt_qcow_main.c  | 945 ++++++++++++++++++
 drivers/block/loop/loop_file_fmt_qcow_main.h  | 417 ++++++++
 8 files changed, 1938 insertions(+)
 create mode 100644 drivers/block/loop/loop_file_fmt_qcow_cache.c
 create mode 100644 drivers/block/loop/loop_file_fmt_qcow_cache.h
 create mode 100644 drivers/block/loop/loop_file_fmt_qcow_cluster.c
 create mode 100644 drivers/block/loop/loop_file_fmt_qcow_cluster.h
 create mode 100644 drivers/block/loop/loop_file_fmt_qcow_main.c
 create mode 100644 drivers/block/loop/loop_file_fmt_qcow_main.h

diff --git a/drivers/block/loop/Kconfig b/drivers/block/loop/Kconfig
index 355f1554b848..a3fa6768c7d7 100644
--- a/drivers/block/loop/Kconfig
+++ b/drivers/block/loop/Kconfig
@@ -82,3 +82,12 @@ config BLK_DEV_LOOP_FILE_FMT_RAW
 	---help---
 	  Say Y or M here if you want to enable the binary (RAW) file format
 	  support of the loop device module.
+
+config BLK_DEV_LOOP_FILE_FMT_QCOW
+	tristate "Loop device QCOW file format support"
+	  select ZLIB_INFLATE
+	  select ZLIB_DEFLATE
+	  depends on BLK_DEV_LOOP
+	  ---help---
+	    Say Y or M here if you want to enable the QEMU's copy on write (QCOW)
+	    file format support of the loop device module.
diff --git a/drivers/block/loop/Makefile b/drivers/block/loop/Makefile
index 2cd69e878453..f2fe116c2954 100644
--- a/drivers/block/loop/Makefile
+++ b/drivers/block/loop/Makefile
@@ -6,3 +6,8 @@ obj-$(CONFIG_BLK_DEV_LOOP)               += loop.o
 obj-$(CONFIG_BLK_DEV_CRYPTOLOOP)         += cryptoloop.o
 
 obj-$(CONFIG_BLK_DEV_LOOP_FILE_FMT_RAW)  += loop_file_fmt_raw.o
+
+loop_file_fmt_qcow-y                     += loop_file_fmt_qcow_main.o \
+                                            loop_file_fmt_qcow_cluster.o \
+                                            loop_file_fmt_qcow_cache.o
+obj-$(CONFIG_BLK_DEV_LOOP_FILE_FMT_QCOW) += loop_file_fmt_qcow.o
diff --git a/drivers/block/loop/loop_file_fmt_qcow_cache.c b/drivers/block/loop/loop_file_fmt_qcow_cache.c
new file mode 100644
index 000000000000..7d3af7398f04
--- /dev/null
+++ b/drivers/block/loop/loop_file_fmt_qcow_cache.c
@@ -0,0 +1,218 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * loop_file_fmt_qcow_cache.c
+ *
+ * QCOW file format driver for the loop device module.
+ *
+ * Ported QCOW2 implementation of the QEMU project (GPL-2.0):
+ * L2/refcount table cache for the QCOW2 format.
+ *
+ * The copyright (C) 2010 of the original code is owned by
+ * Kevin Wolf <kwolf@redhat.com>
+ *
+ * Copyright (C) 2019 Manuel Bentele <development@manuel-bentele.de>
+ */
+
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/types.h>
+#include <linux/limits.h>
+#include <linux/fs.h>
+#include <linux/vmalloc.h>
+
+#include "loop_file_fmt_qcow_main.h"
+#include "loop_file_fmt_qcow_cache.h"
+
+static inline void *__loop_file_fmt_qcow_cache_get_table_addr(
+	struct loop_file_fmt_qcow_cache *c, int table)
+{
+	return (u8 *) c->table_array + (size_t) table * c->table_size;
+}
+
+static inline int __loop_file_fmt_qcow_cache_get_table_idx(
+	struct loop_file_fmt_qcow_cache *c, void *table)
+{
+	ptrdiff_t table_offset = (u8 *) table - (u8 *) c->table_array;
+	int idx = table_offset / c->table_size;
+	ASSERT(idx >= 0 && idx < c->size && table_offset % c->table_size == 0);
+	return idx;
+}
+
+static inline const char *__loop_file_fmt_qcow_cache_get_name(
+	struct loop_file_fmt *lo_fmt, struct loop_file_fmt_qcow_cache *c)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+
+	if (c == qcow_data->refcount_block_cache) {
+		return "refcount block";
+	} else if (c == qcow_data->l2_table_cache) {
+		return "L2 table";
+	} else {
+		/* do not abort, because this is not critical */
+		return "unknown";
+	}
+}
+
+struct loop_file_fmt_qcow_cache *loop_file_fmt_qcow_cache_create(
+	struct loop_file_fmt *lo_fmt, int num_tables, unsigned table_size)
+{
+#ifdef CONFIG_DEBUG_DRIVER
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+#endif
+	struct loop_file_fmt_qcow_cache *c;
+
+	ASSERT(num_tables > 0);
+	ASSERT(is_power_of_2(table_size));
+	ASSERT(table_size >= (1 << QCOW_MIN_CLUSTER_BITS));
+	ASSERT(table_size <= qcow_data->cluster_size);
+
+	c = kzalloc(sizeof(*c), GFP_KERNEL);
+	if (!c) {
+		return NULL;
+	}
+
+	c->size = num_tables;
+	c->table_size = table_size;
+	c->entries = vzalloc(sizeof(struct loop_file_fmt_qcow_cache_table) *
+		num_tables);
+	c->table_array = vzalloc(num_tables * c->table_size);
+
+	if (!c->entries || !c->table_array) {
+		vfree(c->table_array);
+		vfree(c->entries);
+		kfree(c);
+		c = NULL;
+	}
+
+	return c;
+}
+
+void loop_file_fmt_qcow_cache_destroy(struct loop_file_fmt *lo_fmt)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	struct loop_file_fmt_qcow_cache *c = qcow_data->l2_table_cache;
+	int i;
+
+	for (i = 0; i < c->size; i++) {
+		ASSERT(c->entries[i].ref == 0);
+	}
+
+	vfree(c->table_array);
+	vfree(c->entries);
+	kfree(c);
+}
+
+static int __loop_file_fmt_qcow_cache_entry_flush(
+	struct loop_file_fmt_qcow_cache *c, int i)
+{
+	if (!c->entries[i].dirty || !c->entries[i].offset) {
+		return 0;
+	} else {
+		printk(KERN_ERR "loop_file_fmt_qcow: Flush dirty cache tables "
+			"is not supported yet\n");
+		return -ENOSYS;
+	}
+}
+
+static int __loop_file_fmt_qcow_cache_do_get(struct loop_file_fmt *lo_fmt,
+	struct loop_file_fmt_qcow_cache *c, u64 offset, void **table,
+	bool read_from_disk)
+{
+	struct loop_device *lo = loop_file_fmt_get_lo(lo_fmt);
+	int i;
+	int ret;
+	int lookup_index;
+	u64 min_lru_counter = U64_MAX;
+	int min_lru_index = -1;
+	u64 read_offset;
+	size_t len;
+
+	ASSERT(offset != 0);
+
+	if (!IS_ALIGNED(offset, c->table_size)) {
+		printk_ratelimited(KERN_ERR "loop_file_fmt_qcow: Cannot get "
+			"entry from %s cache: offset %llx is unaligned\n",
+			__loop_file_fmt_qcow_cache_get_name(lo_fmt, c),
+			offset);
+		return -EIO;
+	}
+
+	/* Check if the table is already cached */
+	i = lookup_index = (offset / c->table_size * 4) % c->size;
+	do {
+		const struct loop_file_fmt_qcow_cache_table *t =
+			&c->entries[i];
+		if (t->offset == offset) {
+			goto found;
+		}
+		if (t->ref == 0 && t->lru_counter < min_lru_counter) {
+			min_lru_counter = t->lru_counter;
+			min_lru_index = i;
+		}
+		if (++i == c->size) {
+			i = 0;
+		}
+	} while (i != lookup_index);
+
+	if (min_lru_index == -1) {
+		BUG();
+		panic("Oops: This can't happen in current synchronous code, "
+			"but leave the check here as a reminder for whoever "
+			"starts using AIO with the QCOW cache");
+	}
+
+	/* Cache miss: write a table back and replace it */
+	i = min_lru_index;
+
+	ret = __loop_file_fmt_qcow_cache_entry_flush(c, i);
+	if (ret < 0) {
+		return ret;
+	}
+
+	c->entries[i].offset = 0;
+	if (read_from_disk) {
+		read_offset = offset;
+		len = kernel_read(lo->lo_backing_file,
+			__loop_file_fmt_qcow_cache_get_table_addr(c, i),
+			c->table_size, &read_offset);
+		if (len < 0) {
+			len = ret;
+			return ret;
+		}
+	}
+
+	c->entries[i].offset = offset;
+
+	/* And return the right table */
+found:
+	c->entries[i].ref++;
+	*table = __loop_file_fmt_qcow_cache_get_table_addr(c, i);
+
+	return 0;
+}
+
+int loop_file_fmt_qcow_cache_get(struct loop_file_fmt *lo_fmt, u64 offset,
+	void **table)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	struct loop_file_fmt_qcow_cache *c = qcow_data->l2_table_cache;
+
+	return __loop_file_fmt_qcow_cache_do_get(lo_fmt, c, offset, table,
+		true);
+}
+
+void loop_file_fmt_qcow_cache_put(struct loop_file_fmt *lo_fmt, void **table)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	struct loop_file_fmt_qcow_cache *c = qcow_data->l2_table_cache;
+	int i = __loop_file_fmt_qcow_cache_get_table_idx(c, *table);
+
+	c->entries[i].ref--;
+	*table = NULL;
+
+	if (c->entries[i].ref == 0) {
+		c->entries[i].lru_counter = ++c->lru_counter;
+	}
+
+	ASSERT(c->entries[i].ref >= 0);
+}
diff --git a/drivers/block/loop/loop_file_fmt_qcow_cache.h b/drivers/block/loop/loop_file_fmt_qcow_cache.h
new file mode 100644
index 000000000000..1abf9b2b7c09
--- /dev/null
+++ b/drivers/block/loop/loop_file_fmt_qcow_cache.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * loop_file_fmt_qcow_cache.h
+ *
+ * Ported QCOW2 implementation of the QEMU project (GPL-2.0):
+ * L2/refcount table cache for the QCOW2 format.
+ *
+ * The copyright (C) 2010 of the original code is owned by
+ * Kevin Wolf <kwolf@redhat.com>
+ *
+ * Copyright (C) 2019 Manuel Bentele <development@manuel-bentele.de>
+ */
+
+#ifndef _LINUX_LOOP_FILE_FMT_QCOW_CACHE_H
+#define _LINUX_LOOP_FILE_FMT_QCOW_CACHE_H
+
+#include "loop_file_fmt.h"
+
+struct loop_file_fmt_qcow_cache_table {
+	s64 offset;
+	u64 lru_counter;
+	int ref;
+	bool dirty;
+};
+
+struct loop_file_fmt_qcow_cache {
+	struct loop_file_fmt_qcow_cache_table *entries;
+	struct loop_file_fmt_qcow_cache *depends;
+	int size;
+	int table_size;
+	bool depends_on_flush;
+	void *table_array;
+	u64 lru_counter;
+	u64 cache_clean_lru_counter;
+};
+
+extern struct loop_file_fmt_qcow_cache *loop_file_fmt_qcow_cache_create(
+	struct loop_file_fmt *lo_fmt,
+	int num_tables,
+	unsigned table_size);
+
+extern void loop_file_fmt_qcow_cache_destroy(struct loop_file_fmt *lo_fmt);
+
+extern int loop_file_fmt_qcow_cache_get(struct loop_file_fmt *lo_fmt,
+					u64 offset,
+					void **table);
+
+extern void loop_file_fmt_qcow_cache_put(struct loop_file_fmt *lo_fmt,
+					 void **table);
+
+#endif
diff --git a/drivers/block/loop/loop_file_fmt_qcow_cluster.c b/drivers/block/loop/loop_file_fmt_qcow_cluster.c
new file mode 100644
index 000000000000..9c91a8b4aeb7
--- /dev/null
+++ b/drivers/block/loop/loop_file_fmt_qcow_cluster.c
@@ -0,0 +1,270 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * loop_file_fmt_qcow_cluster.c
+ *
+ * Ported QCOW2 implementation of the QEMU project (GPL-2.0):
+ * Cluster calculation and lookup for the QCOW2 format.
+ *
+ * The copyright (C) 2004-2006 of the original code is owned by Fabrice Bellard.
+ *
+ * Copyright (C) 2019 Manuel Bentele <development@manuel-bentele.de>
+ */
+
+#include <linux/kernel.h>
+#include <linux/string.h>
+
+#include "loop_file_fmt.h"
+#include "loop_file_fmt_qcow_main.h"
+#include "loop_file_fmt_qcow_cache.h"
+#include "loop_file_fmt_qcow_cluster.h"
+
+/*
+ * Loads a L2 slice into memory (L2 slices are the parts of L2 tables
+ * that are loaded by the qcow2 cache). If the slice is in the cache,
+ * the cache is used; otherwise the L2 slice is loaded from the image
+ * file.
+ */
+static int __loop_file_fmt_qcow_cluster_l2_load(struct loop_file_fmt *lo_fmt,
+	u64 offset, u64 l2_offset, u64 **l2_slice)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+
+	int start_of_slice = sizeof(u64) * (
+		loop_file_fmt_qcow_offset_to_l2_index(qcow_data, offset) -
+		loop_file_fmt_qcow_offset_to_l2_slice_index(qcow_data, offset)
+	);
+
+	ASSERT(qcow_data->l2_table_cache != NULL);
+	return loop_file_fmt_qcow_cache_get(lo_fmt, l2_offset + start_of_slice,
+		(void **) l2_slice);
+}
+
+/*
+ * Checks how many clusters in a given L2 slice are contiguous in the image
+ * file. As soon as one of the flags in the bitmask stop_flags changes compared
+ * to the first cluster, the search is stopped and the cluster is not counted
+ * as contiguous. (This allows it, for example, to stop at the first compressed
+ * cluster which may require a different handling)
+ */
+static int __loop_file_fmt_qcow_cluster_count_contiguous(
+	struct loop_file_fmt *lo_fmt, int nb_clusters, int cluster_size,
+	u64 *l2_slice, u64 stop_flags)
+{
+	int i;
+	enum loop_file_fmt_qcow_cluster_type first_cluster_type;
+	u64 mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
+	u64 first_entry = be64_to_cpu(l2_slice[0]);
+	u64 offset = first_entry & mask;
+
+	first_cluster_type = loop_file_fmt_qcow_get_cluster_type(lo_fmt,
+		first_entry);
+	if (first_cluster_type == QCOW_CLUSTER_UNALLOCATED) {
+		return 0;
+	}
+
+	/* must be allocated */
+	ASSERT(first_cluster_type == QCOW_CLUSTER_NORMAL ||
+		first_cluster_type == QCOW_CLUSTER_ZERO_ALLOC);
+
+	for (i = 0; i < nb_clusters; i++) {
+		u64 l2_entry = be64_to_cpu(l2_slice[i]) & mask;
+		if (offset + (u64) i * cluster_size != l2_entry) {
+			break;
+		}
+	}
+
+	return i;
+}
+
+/*
+ * Checks how many consecutive unallocated clusters in a given L2
+ * slice have the same cluster type.
+ */
+static int __loop_file_fmt_qcow_cluster_count_contiguous_unallocated(
+	struct loop_file_fmt *lo_fmt, int nb_clusters, u64 *l2_slice,
+	enum loop_file_fmt_qcow_cluster_type wanted_type)
+{
+	int i;
+
+	ASSERT(wanted_type == QCOW_CLUSTER_ZERO_PLAIN ||
+		wanted_type == QCOW_CLUSTER_UNALLOCATED);
+
+	for (i = 0; i < nb_clusters; i++) {
+		u64 entry = be64_to_cpu(l2_slice[i]);
+		enum loop_file_fmt_qcow_cluster_type type =
+			loop_file_fmt_qcow_get_cluster_type(lo_fmt, entry);
+
+		if (type != wanted_type) {
+			break;
+		}
+	}
+
+	return i;
+}
+
+/*
+ * For a given offset of the virtual disk, find the cluster type and offset in
+ * the qcow2 file. The offset is stored in *cluster_offset.
+ *
+ * On entry, *bytes is the maximum number of contiguous bytes starting at
+ * offset that we are interested in.
+ *
+ * On exit, *bytes is the number of bytes starting at offset that have the same
+ * cluster type and (if applicable) are stored contiguously in the image file.
+ * Compressed clusters are always returned one by one.
+ *
+ * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
+ * cases.
+ */
+int loop_file_fmt_qcow_cluster_get_offset(struct loop_file_fmt *lo_fmt,
+	u64 offset, unsigned int *bytes, u64 *cluster_offset)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	unsigned int l2_index;
+	u64 l1_index, l2_offset, *l2_slice;
+	int c;
+	unsigned int offset_in_cluster;
+	u64 bytes_available, bytes_needed, nb_clusters;
+	enum loop_file_fmt_qcow_cluster_type type;
+	int ret;
+
+	offset_in_cluster = loop_file_fmt_qcow_offset_into_cluster(qcow_data,
+		offset);
+	bytes_needed = (u64) *bytes + offset_in_cluster;
+
+	/* compute how many bytes there are between the start of the cluster
+	 * containing offset and the end of the l2 slice that contains
+	 * the entry pointing to it */
+	bytes_available = ((u64)(
+		qcow_data->l2_slice_size -
+		loop_file_fmt_qcow_offset_to_l2_slice_index(qcow_data, offset))
+	) << qcow_data->cluster_bits;
+
+	if (bytes_needed > bytes_available) {
+		bytes_needed = bytes_available;
+	}
+
+	*cluster_offset = 0;
+
+	/* seek to the l2 offset in the l1 table */
+	l1_index = loop_file_fmt_qcow_offset_to_l1_index(qcow_data, offset);
+	if (l1_index >= qcow_data->l1_size) {
+		type = QCOW_CLUSTER_UNALLOCATED;
+		goto out;
+	}
+
+	l2_offset = qcow_data->l1_table[l1_index] & L1E_OFFSET_MASK;
+	if (!l2_offset) {
+		type = QCOW_CLUSTER_UNALLOCATED;
+		goto out;
+	}
+
+	if (loop_file_fmt_qcow_offset_into_cluster(qcow_data, l2_offset)) {
+		printk_ratelimited(KERN_ERR "loop_file_fmt_qcow: L2 table "
+			"offset %llx unaligned (L1 index: %llx)", l2_offset,
+			l1_index);
+		return -EIO;
+	}
+
+	/* load the l2 slice in memory */
+	ret = __loop_file_fmt_qcow_cluster_l2_load(lo_fmt, offset, l2_offset,
+		&l2_slice);
+	if (ret < 0) {
+		return ret;
+	}
+
+	/* find the cluster offset for the given disk offset */
+	l2_index = loop_file_fmt_qcow_offset_to_l2_slice_index(qcow_data,
+		offset);
+	*cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+
+	nb_clusters = loop_file_fmt_qcow_size_to_clusters(qcow_data,
+		bytes_needed);
+	/* bytes_needed <= *bytes + offset_in_cluster, both of which are
+	 * unsigned integers; the minimum cluster size is 512, so this
+	 * assertion is always true */
+	ASSERT(nb_clusters <= INT_MAX);
+
+	type = loop_file_fmt_qcow_get_cluster_type(lo_fmt, *cluster_offset);
+	if (qcow_data->qcow_version < 3 && (
+			type == QCOW_CLUSTER_ZERO_PLAIN ||
+			type == QCOW_CLUSTER_ZERO_ALLOC)) {
+		printk_ratelimited(KERN_ERR "loop_file_fmt_qcow: zero cluster "
+			"entry found in pre-v3 image (L2 offset: %llx, "
+			"L2 index: %x)\n", l2_offset, l2_index);
+		ret = -EIO;
+		goto fail;
+	}
+	switch (type) {
+	case QCOW_CLUSTER_COMPRESSED:
+		if (loop_file_fmt_qcow_has_data_file(lo_fmt)) {
+			printk_ratelimited(KERN_ERR "loop_file_fmt_qcow: "
+				"compressed cluster entry found in image with "
+				"external data file (L2 offset: %llx, "
+				"L2 index: %x)", l2_offset, l2_index);
+			ret = -EIO;
+			goto fail;
+		}
+		/* Compressed clusters can only be processed one by one */
+		c = 1;
+		*cluster_offset &= L2E_COMPRESSED_OFFSET_SIZE_MASK;
+		break;
+	case QCOW_CLUSTER_ZERO_PLAIN:
+	case QCOW_CLUSTER_UNALLOCATED:
+		/* how many empty clusters ? */
+		c = __loop_file_fmt_qcow_cluster_count_contiguous_unallocated(
+			lo_fmt, nb_clusters, &l2_slice[l2_index], type);
+		*cluster_offset = 0;
+		break;
+	case QCOW_CLUSTER_ZERO_ALLOC:
+	case QCOW_CLUSTER_NORMAL:
+		/* how many allocated clusters ? */
+		c = __loop_file_fmt_qcow_cluster_count_contiguous(lo_fmt,
+			nb_clusters, qcow_data->cluster_size,
+			&l2_slice[l2_index], QCOW_OFLAG_ZERO);
+		*cluster_offset &= L2E_OFFSET_MASK;
+		if (loop_file_fmt_qcow_offset_into_cluster(qcow_data,
+				*cluster_offset)) {
+			printk_ratelimited(KERN_ERR "loop_file_fmt_qcow: "
+				"cluster allocation offset %llx unaligned "
+				"(L2 offset: %llx, L2 index: %x)\n",
+				*cluster_offset, l2_offset, l2_index);
+			ret = -EIO;
+			goto fail;
+		}
+		if (loop_file_fmt_qcow_has_data_file(lo_fmt) &&
+			*cluster_offset != offset - offset_in_cluster) {
+			printk_ratelimited(KERN_ERR "loop_file_fmt_qcow: "
+				"external data file host cluster offset %llx "
+				"does not match guest cluster offset: %llx, "
+				"L2 index: %x)", *cluster_offset,
+				offset - offset_in_cluster, l2_index);
+			ret = -EIO;
+			goto fail;
+		}
+		break;
+	default:
+		BUG();
+	}
+
+	loop_file_fmt_qcow_cache_put(lo_fmt, (void **) &l2_slice);
+
+	bytes_available = (s64) c * qcow_data->cluster_size;
+
+out:
+	if (bytes_available > bytes_needed) {
+		bytes_available = bytes_needed;
+	}
+
+	/* bytes_available <= bytes_needed <= *bytes + offset_in_cluster;
+	 * subtracting offset_in_cluster will therefore definitely yield
+	 * something not exceeding UINT_MAX */
+	ASSERT(bytes_available - offset_in_cluster <= UINT_MAX);
+	*bytes = bytes_available - offset_in_cluster;
+
+	return type;
+
+fail:
+	loop_file_fmt_qcow_cache_put(lo_fmt, (void **) &l2_slice);
+	return ret;
+}
diff --git a/drivers/block/loop/loop_file_fmt_qcow_cluster.h b/drivers/block/loop/loop_file_fmt_qcow_cluster.h
new file mode 100644
index 000000000000..d62e3318f6ce
--- /dev/null
+++ b/drivers/block/loop/loop_file_fmt_qcow_cluster.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * loop_file_fmt_qcow_cluster.h
+ *
+ * Ported QCOW2 implementation of the QEMU project (GPL-2.0):
+ * Cluster calculation and lookup for the QCOW2 format.
+ *
+ * The copyright (C) 2004-2006 of the original code is owned by Fabrice Bellard.
+ *
+ * Copyright (C) 2019 Manuel Bentele <development@manuel-bentele.de>
+ */
+
+#ifndef _LINUX_LOOP_FILE_FMT_QCOW_CLUSTER_H
+#define _LINUX_LOOP_FILE_FMT_QCOW_CLUSTER_H
+
+#include "loop_file_fmt.h"
+
+extern int loop_file_fmt_qcow_cluster_get_offset(struct loop_file_fmt *lo_fmt,
+						 u64 offset,
+						 unsigned int *bytes,
+						 u64 *cluster_offset);
+
+#endif
diff --git a/drivers/block/loop/loop_file_fmt_qcow_main.c b/drivers/block/loop/loop_file_fmt_qcow_main.c
new file mode 100644
index 000000000000..4fb786b340f7
--- /dev/null
+++ b/drivers/block/loop/loop_file_fmt_qcow_main.c
@@ -0,0 +1,945 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * loop_file_fmt_qcow.c
+ *
+ * QCOW file format driver for the loop device module.
+ *
+ * Copyright (C) 2019 Manuel Bentele <development@manuel-bentele.de>
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/fs.h>
+#include <linux/types.h>
+#include <linux/limits.h>
+#include <linux/blkdev.h>
+#include <linux/bio.h>
+#include <linux/bvec.h>
+#include <linux/mutex.h>
+#include <linux/uio.h>
+#include <linux/string.h>
+#include <linux/vmalloc.h>
+#include <linux/zlib.h>
+
+#include "loop_file_fmt.h"
+#include "loop_file_fmt_qcow_main.h"
+#include "loop_file_fmt_qcow_cache.h"
+#include "loop_file_fmt_qcow_cluster.h"
+
+static int __qcow_file_fmt_header_read(struct loop_file_fmt *lo_fmt,
+	struct loop_file_fmt_qcow_header *header)
+{
+	struct loop_device *lo = loop_file_fmt_get_lo(lo_fmt);
+	ssize_t len;
+	loff_t offset;
+	int ret = 0;
+
+	/* read QCOW header */
+	offset = 0;
+	len = kernel_read(lo->lo_backing_file, header, sizeof(*header),
+		&offset);
+	if (len < 0) {
+		printk(KERN_ERR "loop_file_fmt_qcow: could not read QCOW "
+			"header");
+		return len;
+	}
+
+	header->magic = be32_to_cpu(header->magic);
+	header->version = be32_to_cpu(header->version);
+	header->backing_file_offset = be64_to_cpu(header->backing_file_offset);
+	header->backing_file_size = be32_to_cpu(header->backing_file_size);
+	header->cluster_bits = be32_to_cpu(header->cluster_bits);
+	header->size = be64_to_cpu(header->size);
+	header->crypt_method = be32_to_cpu(header->crypt_method);
+	header->l1_size = be32_to_cpu(header->l1_size);
+	header->l1_table_offset = be64_to_cpu(header->l1_table_offset);
+	header->refcount_table_offset =
+		be64_to_cpu(header->refcount_table_offset);
+	header->refcount_table_clusters =
+		be32_to_cpu(header->refcount_table_clusters);
+	header->nb_snapshots = be32_to_cpu(header->nb_snapshots);
+	header->snapshots_offset = be64_to_cpu(header->snapshots_offset);
+
+	/* check QCOW file format and header version */
+	if (header->magic != QCOW_MAGIC) {
+		printk(KERN_ERR "loop_file_fmt_qcow: image is not in QCOW "
+			"format");
+		return -EINVAL;
+	}
+
+	if (header->version < 2 || header->version > 3) {
+		printk(KERN_ERR "loop_file_fmt_qcow: unsupported QCOW version "
+			"%d", header->version);
+		return -ENOTSUPP;
+	}
+
+	/* initialize version 3 header fields */
+	if (header->version == 2) {
+		header->incompatible_features =  0;
+		header->compatible_features   =  0;
+		header->autoclear_features    =  0;
+		header->refcount_order        =  4;
+		header->header_length         = 72;
+	} else {
+		header->incompatible_features =
+			be64_to_cpu(header->incompatible_features);
+		header->compatible_features =
+			be64_to_cpu(header->compatible_features);
+		header->autoclear_features =
+			be64_to_cpu(header->autoclear_features);
+		header->refcount_order = be32_to_cpu(header->refcount_order);
+		header->header_length = be32_to_cpu(header->header_length);
+
+		if (header->header_length < 104) {
+			printk(KERN_ERR "loop_file_fmt_qcow: QCOW header too "
+				"short");
+			return -EINVAL;
+		}
+	}
+
+	return ret;
+}
+
+static int __qcow_file_fmt_validate_table(struct loop_file_fmt *lo_fmt,
+	u64 offset, u64 entries, size_t entry_len, s64 max_size_bytes,
+	const char *table_name)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+
+	if (entries > max_size_bytes / entry_len) {
+		printk(KERN_INFO "loop_file_fmt_qcow: %s too large",
+			table_name);
+		return -EFBIG;
+	}
+
+	/* Use signed S64_MAX as the maximum even for u64 header fields,
+	 * because values will be passed to qemu functions taking s64. */
+	if ((S64_MAX - entries * entry_len < offset) || (
+		loop_file_fmt_qcow_offset_into_cluster(qcow_data, offset) != 0)
+	) {
+		printk(KERN_INFO "loop_file_fmt_qcow: %s offset invalid",
+			table_name);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static inline loff_t __qcow_file_fmt_rq_get_pos(struct loop_file_fmt *lo_fmt,
+						struct request *rq)
+{
+	struct loop_device *lo = loop_file_fmt_get_lo(lo_fmt);
+	return ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset;
+}
+
+static int __qcow_file_fmt_compression_init(struct loop_file_fmt *lo_fmt)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	int ret = 0;
+
+	qcow_data->strm = kzalloc(sizeof(*qcow_data->strm), GFP_KERNEL);
+	if (!qcow_data->strm) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	qcow_data->strm->workspace = vzalloc(zlib_inflate_workspacesize());
+	if (!qcow_data->strm->workspace) {
+		ret = -ENOMEM;
+		goto out_free_strm;
+	}
+
+	return ret;
+
+out_free_strm:
+	kfree(qcow_data->strm);
+out:
+	return ret;
+}
+
+static void __qcow_file_fmt_compression_exit(struct loop_file_fmt *lo_fmt)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+
+	if (qcow_data->strm->workspace)
+		vfree(qcow_data->strm->workspace);
+
+	if (qcow_data->strm)
+		kfree(qcow_data->strm);
+}
+
+#ifdef CONFIG_DEBUG_FS
+static void __qcow_file_fmt_header_to_buf(struct loop_file_fmt *lo_fmt,
+	const struct loop_file_fmt_qcow_header *header)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	char *header_buf = qcow_data->dbgfs_file_qcow_header_buf;
+	ssize_t len = 0;
+
+	len += sprintf(header_buf + len, "magic: %d\n",
+		header->magic);
+	len += sprintf(header_buf + len, "version: %d\n",
+		header->version);
+	len += sprintf(header_buf + len, "backing_file_offset: %lld\n",
+		header->backing_file_offset);
+	len += sprintf(header_buf + len, "backing_file_size: %d\n",
+		header->backing_file_size);
+	len += sprintf(header_buf + len, "cluster_bits: %d\n",
+		header->cluster_bits);
+	len += sprintf(header_buf + len, "size: %lld\n",
+		header->size);
+	len += sprintf(header_buf + len, "crypt_method: %d\n",
+		header->crypt_method);
+	len += sprintf(header_buf + len, "l1_size: %d\n",
+		header->l1_size);
+	len += sprintf(header_buf + len, "l1_table_offset: %lld\n",
+		header->l1_table_offset);
+	len += sprintf(header_buf + len, "refcount_table_offset: %lld\n",
+		header->refcount_table_offset);
+	len += sprintf(header_buf + len, "refcount_table_clusters: %d\n",
+		header->refcount_table_clusters);
+	len += sprintf(header_buf + len, "nb_snapshots: %d\n",
+		header->nb_snapshots);
+	len += sprintf(header_buf + len, "snapshots_offset: %lld\n",
+		header->snapshots_offset);
+
+	if (header->version == 3) {
+		len += sprintf(header_buf + len,
+			"incompatible_features: %lld\n",
+			header->incompatible_features);
+		len += sprintf(header_buf + len,
+			"compatible_features: %lld\n",
+			header->compatible_features);
+		len += sprintf(header_buf + len,
+			"autoclear_features: %lld\n",
+			header->autoclear_features);
+		len += sprintf(header_buf + len,
+			"refcount_order: %d\n",
+			header->refcount_order);
+		len += sprintf(header_buf + len,
+			"header_length: %d\n",
+			header->header_length);
+	}
+
+	ASSERT(len < QCOW_HEADER_BUF_LEN);
+}
+
+static ssize_t __qcow_file_fmt_dbgfs_hdr_read(struct file *file,
+	char __user *buf, size_t size, loff_t *ppos)
+{
+	struct loop_file_fmt *lo_fmt = file->private_data;
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	char *header_buf = qcow_data->dbgfs_file_qcow_header_buf;
+
+	return simple_read_from_buffer(buf, size, ppos, header_buf,
+		strlen(header_buf));
+}
+
+static const struct file_operations qcow_file_fmt_dbgfs_hdr_fops = {
+	.open = simple_open,
+	.read = __qcow_file_fmt_dbgfs_hdr_read
+};
+
+static ssize_t __qcow_file_fmt_dbgfs_ofs_read(struct file *file,
+	char __user *buf, size_t size, loff_t *ppos)
+{
+	struct loop_file_fmt *lo_fmt = file->private_data;
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	unsigned int cur_bytes = 1;
+	u64 offset = 0;
+	u64 cluster_offset = 0;
+	s64 offset_in_cluster = 0;
+	ssize_t len = 0;
+	int ret = 0;
+
+	/* read the share debugfs offset */
+	ret = mutex_lock_interruptible(&qcow_data->dbgfs_qcow_offset_mutex);
+	if (ret)
+		return ret;
+
+	offset = qcow_data->dbgfs_qcow_offset;
+	mutex_unlock(&qcow_data->dbgfs_qcow_offset_mutex);
+
+	/* calculate and print the cluster offset */
+	ret = loop_file_fmt_qcow_cluster_get_offset(lo_fmt,
+		offset, &cur_bytes, &cluster_offset);
+	if (ret < 0)
+		return -EINVAL;
+
+	offset_in_cluster = loop_file_fmt_qcow_offset_into_cluster(qcow_data,
+		offset);
+
+	len = sprintf(qcow_data->dbgfs_file_qcow_cluster_buf,
+		"offset: %lld\ncluster_offset: %lld\noffset_in_cluster: %lld\n",
+		offset, cluster_offset, offset_in_cluster);
+
+	ASSERT(len < QCOW_CLUSTER_BUF_LEN);
+
+	return simple_read_from_buffer(buf, size, ppos,
+		qcow_data->dbgfs_file_qcow_cluster_buf, len);
+}
+
+static ssize_t __qcow_file_fmt_dbgfs_ofs_write(struct file *file,
+	const char __user *buf, size_t size, loff_t *ppos)
+{
+	struct loop_file_fmt *lo_fmt = file->private_data;
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	ssize_t len = 0;
+	int ret = 0;
+
+	if (*ppos > QCOW_OFFSET_BUF_LEN || size > QCOW_OFFSET_BUF_LEN)
+		return -EINVAL;
+
+	len = simple_write_to_buffer(qcow_data->dbgfs_file_qcow_offset_buf,
+		QCOW_OFFSET_BUF_LEN, ppos, buf, size);
+	if (len < 0)
+		return len;
+
+	qcow_data->dbgfs_file_qcow_offset_buf[len] = '\0';
+
+	ret = mutex_lock_interruptible(&qcow_data->dbgfs_qcow_offset_mutex);
+	if (ret)
+		return ret;
+
+	ret = kstrtou64(qcow_data->dbgfs_file_qcow_offset_buf, 10,
+		&qcow_data->dbgfs_qcow_offset);
+	if (ret < 0)
+		goto out;
+
+	ret = len;
+out:
+	mutex_unlock(&qcow_data->dbgfs_qcow_offset_mutex);
+	return ret;
+}
+
+static const struct file_operations qcow_file_fmt_dbgfs_ofs_fops = {
+	.open = simple_open,
+	.read = __qcow_file_fmt_dbgfs_ofs_read,
+	.write = __qcow_file_fmt_dbgfs_ofs_write
+};
+
+static int __qcow_file_fmt_dbgfs_init(struct loop_file_fmt *lo_fmt)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	struct loop_device *lo = loop_file_fmt_get_lo(lo_fmt);
+	int ret = 0;
+
+	qcow_data->dbgfs_dir = debugfs_create_dir("QCOW", lo->lo_dbgfs_dir);
+	if (IS_ERR_OR_NULL(qcow_data->dbgfs_dir)) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	qcow_data->dbgfs_file_qcow_header = debugfs_create_file("header",
+		S_IRUGO, qcow_data->dbgfs_dir, lo_fmt,
+		&qcow_file_fmt_dbgfs_hdr_fops);
+	if (IS_ERR_OR_NULL(qcow_data->dbgfs_file_qcow_header)) {
+		ret = -ENODEV;
+		goto out_free_dbgfs_dir;
+	}
+
+	qcow_data->dbgfs_file_qcow_offset = debugfs_create_file("offset",
+		S_IRUGO | S_IWUSR, qcow_data->dbgfs_dir, lo_fmt,
+		&qcow_file_fmt_dbgfs_ofs_fops);
+	if (IS_ERR_OR_NULL(qcow_data->dbgfs_file_qcow_offset)) {
+		qcow_data->dbgfs_file_qcow_offset = NULL;
+		ret = -ENODEV;
+		goto out_free_dbgfs_hdr;
+	}
+
+	qcow_data->dbgfs_qcow_offset = 0;
+	mutex_init(&qcow_data->dbgfs_qcow_offset_mutex);
+
+	return ret;
+
+out_free_dbgfs_hdr:
+	debugfs_remove(qcow_data->dbgfs_file_qcow_header);
+	qcow_data->dbgfs_file_qcow_header = NULL;
+out_free_dbgfs_dir:
+	debugfs_remove(qcow_data->dbgfs_dir);
+	qcow_data->dbgfs_dir = NULL;
+out:
+	return ret;
+}
+
+static void __qcow_file_fmt_dbgfs_exit(struct loop_file_fmt *lo_fmt)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+
+	if (qcow_data->dbgfs_file_qcow_offset)
+		debugfs_remove(qcow_data->dbgfs_file_qcow_offset);
+
+	mutex_destroy(&qcow_data->dbgfs_qcow_offset_mutex);
+
+	if (qcow_data->dbgfs_file_qcow_header)
+		debugfs_remove(qcow_data->dbgfs_file_qcow_header);
+
+	if (qcow_data->dbgfs_dir)
+		debugfs_remove(qcow_data->dbgfs_dir);
+}
+#endif
+
+static int qcow_file_fmt_init(struct loop_file_fmt *lo_fmt)
+{
+	struct loop_file_fmt_qcow_data *qcow_data;
+	struct loop_device *lo = loop_file_fmt_get_lo(lo_fmt);
+	struct loop_file_fmt_qcow_header header;
+	u64 l1_vm_state_index;
+	u64 l2_cache_size;
+	u64 l2_cache_entry_size;
+	ssize_t len;
+	unsigned int i;
+	int ret = 0;
+
+	/* allocate memory for saving QCOW file format data */
+	qcow_data = kzalloc(sizeof(*qcow_data), GFP_KERNEL);
+	if (!qcow_data)
+		return -ENOMEM;
+
+	lo_fmt->private_data = qcow_data;
+
+	/* read the QCOW file header */
+	ret = __qcow_file_fmt_header_read(lo_fmt, &header);
+	if (ret)
+		goto free_qcow_data;
+
+	/* save information of the header fields in human readable format in
+	 * a file buffer to access it with debugfs */
+#ifdef CONFIG_DEBUG_FS
+	__qcow_file_fmt_header_to_buf(lo_fmt, &header);
+#endif
+
+	qcow_data->qcow_version = header.version;
+
+	/* Initialise cluster size */
+	if (header.cluster_bits < QCOW_MIN_CLUSTER_BITS
+		|| header.cluster_bits > QCOW_MAX_CLUSTER_BITS) {
+		printk(KERN_ERR "loop_file_fmt_qcow: unsupported cluster "
+			"size: 2^%d", header.cluster_bits);
+		ret = -EINVAL;
+		goto free_qcow_data;
+	}
+
+	qcow_data->cluster_bits = header.cluster_bits;
+	qcow_data->cluster_size = 1 << qcow_data->cluster_bits;
+	qcow_data->cluster_sectors = 1 <<
+		(qcow_data->cluster_bits - SECTOR_SHIFT);
+
+	if (header.header_length > qcow_data->cluster_size) {
+		printk(KERN_ERR "loop_file_fmt_qcow: QCOW header exceeds "
+			"cluster size");
+		ret = -EINVAL;
+		goto free_qcow_data;
+	}
+
+	if (header.backing_file_offset > qcow_data->cluster_size) {
+		printk(KERN_ERR "loop_file_fmt_qcow: invalid backing file "
+			"offset");
+		ret = -EINVAL;
+		goto free_qcow_data;
+	}
+
+	if (header.backing_file_offset) {
+		printk(KERN_ERR "loop_file_fmt_qcow: backing file support not "
+			"available");
+		ret = -ENOTSUPP;
+		goto free_qcow_data;
+	}
+
+	/* handle feature bits */
+	qcow_data->incompatible_features = header.incompatible_features;
+	qcow_data->compatible_features = header.compatible_features;
+	qcow_data->autoclear_features = header.autoclear_features;
+
+	if (qcow_data->incompatible_features & QCOW_INCOMPAT_DIRTY) {
+		printk(KERN_ERR "loop_file_fmt_qcow: image contains "
+			"inconsistent refcounts");
+		ret = -EACCES;
+		goto free_qcow_data;
+	}
+
+	if (qcow_data->incompatible_features & QCOW_INCOMPAT_CORRUPT) {
+		printk(KERN_ERR "loop_file_fmt_qcow: image is corrupt; cannot "
+			"be opened read/write");
+		ret = -EACCES;
+		goto free_qcow_data;
+	}
+
+	if (qcow_data->incompatible_features & QCOW_INCOMPAT_DATA_FILE) {
+		printk(KERN_ERR "loop_file_fmt_qcow: clusters in the external "
+			"data file are not refcounted");
+		ret = -EACCES;
+		goto free_qcow_data;
+	}
+
+	/* Check support for various header values */
+	if (header.refcount_order > 6) {
+		printk(KERN_ERR "loop_file_fmt_qcow: reference count entry "
+			"width too large; may not exceed 64 bits");
+		ret = -EINVAL;
+		goto free_qcow_data;
+	}
+	qcow_data->refcount_order = header.refcount_order;
+	qcow_data->refcount_bits = 1 << qcow_data->refcount_order;
+	qcow_data->refcount_max = U64_C(1) << (qcow_data->refcount_bits - 1);
+	qcow_data->refcount_max += qcow_data->refcount_max - 1;
+
+	qcow_data->crypt_method_header = header.crypt_method;
+	if (qcow_data->crypt_method_header) {
+		printk(KERN_ERR "loop_file_fmt_qcow: encryption support not "
+			"available");
+		ret = -ENOTSUPP;
+		goto free_qcow_data;
+	}
+
+	/* L2 is always one cluster */
+	qcow_data->l2_bits = qcow_data->cluster_bits - 3;
+	qcow_data->l2_size = 1 << qcow_data->l2_bits;
+	/* 2^(qcow_data->refcount_order - 3) is the refcount width in bytes */
+	qcow_data->refcount_block_bits = qcow_data->cluster_bits -
+		(qcow_data->refcount_order - 3);
+	qcow_data->refcount_block_size = 1 << qcow_data->refcount_block_bits;
+	qcow_data->size = header.size;
+	qcow_data->csize_shift = (62 - (qcow_data->cluster_bits - 8));
+	qcow_data->csize_mask = (1 << (qcow_data->cluster_bits - 8)) - 1;
+	qcow_data->cluster_offset_mask = (1LL << qcow_data->csize_shift) - 1;
+
+	qcow_data->refcount_table_offset = header.refcount_table_offset;
+	qcow_data->refcount_table_size = header.refcount_table_clusters <<
+		(qcow_data->cluster_bits - 3);
+
+	if (header.refcount_table_clusters == 0) {
+		printk(KERN_ERR "loop_file_fmt_qcow: image does not contain a "
+			"reference count table");
+		ret = -EINVAL;
+		goto free_qcow_data;
+	}
+
+	ret = __qcow_file_fmt_validate_table(lo_fmt,
+		qcow_data->refcount_table_offset,
+		header.refcount_table_clusters, qcow_data->cluster_size,
+		QCOW_MAX_REFTABLE_SIZE, "Reference count table");
+	if (ret < 0) {
+		goto free_qcow_data;
+	}
+
+	/* The total size in bytes of the snapshot table is checked in
+	 * qcow2_read_snapshots() because the size of each snapshot is
+	 * variable and we don't know it yet.
+	 * Here we only check the offset and number of snapshots. */
+	ret = __qcow_file_fmt_validate_table(lo_fmt, header.snapshots_offset,
+		header.nb_snapshots,
+		sizeof(struct loop_file_fmt_qcow_snapshot_header),
+		sizeof(struct loop_file_fmt_qcow_snapshot_header) *
+		QCOW_MAX_SNAPSHOTS, "Snapshot table");
+	if (ret < 0) {
+		goto free_qcow_data;
+	}
+
+	/* read the level 1 table */
+	ret = __qcow_file_fmt_validate_table(lo_fmt, header.l1_table_offset,
+		header.l1_size, sizeof(u64), QCOW_MAX_L1_SIZE,
+		"Active L1 table");
+	if (ret < 0) {
+		goto free_qcow_data;
+	}
+	qcow_data->l1_size = header.l1_size;
+	qcow_data->l1_table_offset = header.l1_table_offset;
+
+	l1_vm_state_index = loop_file_fmt_qcow_size_to_l1(qcow_data,
+		header.size);
+	if (l1_vm_state_index > INT_MAX) {
+		printk(KERN_ERR "loop_file_fmt_qcow: image is too big");
+		ret = -EFBIG;
+		goto free_qcow_data;
+	}
+	qcow_data->l1_vm_state_index = l1_vm_state_index;
+
+	/* the L1 table must contain at least enough entries to put header.size
+	 * bytes */
+	if (qcow_data->l1_size < qcow_data->l1_vm_state_index) {
+		printk(KERN_ERR "loop_file_fmt_qcow: L1 table is too small");
+		ret = -EINVAL;
+		goto free_qcow_data;
+	}
+
+	if (qcow_data->l1_size > 0) {
+		qcow_data->l1_table = vzalloc(round_up(qcow_data->l1_size *
+			sizeof(u64), 512));
+		if (qcow_data->l1_table == NULL) {
+			printk(KERN_ERR "loop_file_fmt_qcow: could not "
+				"allocate L1 table");
+			ret = -ENOMEM;
+			goto free_qcow_data;
+		}
+		len = kernel_read(lo->lo_backing_file, qcow_data->l1_table,
+			qcow_data->l1_size * sizeof(u64),
+			&qcow_data->l1_table_offset);
+		if (len < 0) {
+			printk(KERN_ERR "loop_file_fmt_qcow: could not read L1 "
+				"table");
+			ret = len;
+			goto free_l1_table;
+		}
+		for (i = 0; i < qcow_data->l1_size; i++) {
+			qcow_data->l1_table[i] =
+				be64_to_cpu(qcow_data->l1_table[i]);
+		}
+	}
+
+	/* Internal snapshots */
+	qcow_data->snapshots_offset = header.snapshots_offset;
+	qcow_data->nb_snapshots = header.nb_snapshots;
+
+	if (qcow_data->nb_snapshots > 0) {
+		printk(KERN_ERR "loop_file_fmt_qcow: snapshots support not "
+			"available");
+		ret = -ENOTSUPP;
+		goto free_l1_table;
+	}
+
+
+	/* create cache for L2 */
+	l2_cache_size =  qcow_data->size / (qcow_data->cluster_size / 8);
+	l2_cache_entry_size = min(qcow_data->cluster_size, (int)4096);
+
+	/* limit the L2 size to maximum QCOW_DEFAULT_L2_CACHE_MAX_SIZE */
+	l2_cache_size = min(l2_cache_size, (u64)QCOW_DEFAULT_L2_CACHE_MAX_SIZE);
+
+	/* calculate the number of cache tables */
+	l2_cache_size /= l2_cache_entry_size;
+	if (l2_cache_size < QCOW_MIN_L2_CACHE_SIZE) {
+		l2_cache_size = QCOW_MIN_L2_CACHE_SIZE;
+	}
+
+	if (l2_cache_size > INT_MAX) {
+		printk(KERN_ERR "loop_file_fmt_qcow: L2 cache size too big");
+		ret = -EINVAL;
+		goto free_l1_table;
+	}
+
+	qcow_data->l2_slice_size = l2_cache_entry_size / sizeof(u64);
+
+	qcow_data->l2_table_cache = loop_file_fmt_qcow_cache_create(lo_fmt,
+		l2_cache_size, l2_cache_entry_size);
+	if (!qcow_data->l2_table_cache) {
+		ret = -ENOMEM;
+		goto free_l1_table;
+	}
+
+	/* initialize compression support */
+	ret = __qcow_file_fmt_compression_init(lo_fmt);
+	if (ret < 0)
+		goto free_l2_cache;
+
+	/* initialize debugfs entries */
+#ifdef CONFIG_DEBUG_FS
+	ret = __qcow_file_fmt_dbgfs_init(lo_fmt);
+	if (ret < 0)
+		goto free_l2_cache;
+#endif
+
+	return ret;
+
+free_l2_cache:
+	loop_file_fmt_qcow_cache_destroy(lo_fmt);
+free_l1_table:
+	vfree(qcow_data->l1_table);
+free_qcow_data:
+	kfree(qcow_data);
+	lo_fmt->private_data = NULL;
+	return ret;
+}
+
+static void qcow_file_fmt_exit(struct loop_file_fmt *lo_fmt)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+
+#ifdef CONFIG_DEBUG_FS
+	__qcow_file_fmt_dbgfs_exit(lo_fmt);
+#endif
+
+	__qcow_file_fmt_compression_exit(lo_fmt);
+
+	if (qcow_data->l1_table) {
+		vfree(qcow_data->l1_table);
+	}
+
+	if (qcow_data->l2_table_cache) {
+		loop_file_fmt_qcow_cache_destroy(lo_fmt);
+	}
+
+	if (qcow_data) {
+		kfree(qcow_data);
+		lo_fmt->private_data = NULL;
+	}
+}
+
+static ssize_t __qcow_file_fmt_buffer_decompress(struct loop_file_fmt *lo_fmt,
+						 void *dest,
+						 size_t dest_size,
+						 const void *src,
+						 size_t src_size)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	int ret = 0;
+
+	qcow_data->strm->avail_in = src_size;
+	qcow_data->strm->next_in = (void *) src;
+	qcow_data->strm->avail_out = dest_size;
+	qcow_data->strm->next_out = dest;
+
+	ret = zlib_inflateInit2(qcow_data->strm, -12);
+	if (ret != Z_OK) {
+		return -1;
+	}
+
+	ret = zlib_inflate(qcow_data->strm, Z_FINISH);
+	if ((ret != Z_STREAM_END && ret != Z_BUF_ERROR)
+		|| qcow_data->strm->avail_out != 0) {
+		/* We approve Z_BUF_ERROR because we need @dest buffer to be
+		 * filled, but @src buffer may be processed partly (because in
+		 * qcow2 we know size of compressed data with precision of one
+		 * sector) */
+		ret = -1;
+	}
+
+	zlib_inflateEnd(qcow_data->strm);
+
+	return ret;
+}
+
+static int __qcow_file_fmt_read_compressed(struct loop_file_fmt *lo_fmt,
+					   struct bio_vec *bvec,
+					   u64 file_cluster_offset,
+					   u64 offset,
+					   u64 bytes,
+					   u64 bytes_done)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	struct loop_device *lo = loop_file_fmt_get_lo(lo_fmt);
+	int ret = 0, csize, nb_csectors;
+	u64 coffset;
+	u8 *in_buf, *out_buf;
+	ssize_t len;
+	void *data;
+	unsigned long irq_flags;
+	int offset_in_cluster = loop_file_fmt_qcow_offset_into_cluster(
+		qcow_data, offset);
+
+	coffset = file_cluster_offset & qcow_data->cluster_offset_mask;
+	nb_csectors = ((file_cluster_offset >> qcow_data->csize_shift) &
+		qcow_data->csize_mask) + 1;
+	csize = nb_csectors * QCOW_COMPRESSED_SECTOR_SIZE -
+		(coffset & ~QCOW_COMPRESSED_SECTOR_MASK);
+
+	in_buf = vmalloc(csize);
+	if (!in_buf) {
+		return -ENOMEM;
+	}
+
+	out_buf = vmalloc(qcow_data->cluster_size);
+	if (!out_buf) {
+		ret = -ENOMEM;
+		goto out_free_in_buf;
+	}
+
+	len = kernel_read(lo->lo_backing_file, in_buf, csize, &coffset);
+	if (len < 0) {
+		ret = len;
+		goto out_free_out_buf;
+	}
+
+	if (__qcow_file_fmt_buffer_decompress(lo_fmt, out_buf,
+		qcow_data->cluster_size, in_buf, csize) < 0) {
+		ret = -EIO;
+		goto out_free_out_buf;
+	}
+
+	ASSERT(bytes <= bvec->bv_len);
+	data = bvec_kmap_irq(bvec, &irq_flags) + bytes_done;
+	memcpy(data, out_buf + offset_in_cluster, bytes);
+	flush_dcache_page(bvec->bv_page);
+	bvec_kunmap_irq(data, &irq_flags);
+
+out_free_out_buf:
+	vfree(out_buf);
+out_free_in_buf:
+	vfree(in_buf);
+
+	return ret;
+}
+
+static int __qcow_file_fmt_read_bvec(struct loop_file_fmt *lo_fmt,
+				     struct bio_vec *bvec,
+				     loff_t *ppos)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	struct loop_device *lo = loop_file_fmt_get_lo(lo_fmt);
+	int offset_in_cluster;
+	int ret;
+	unsigned int cur_bytes; /* number of bytes in current iteration */
+	u64 bytes;
+	u64 cluster_offset = 0;
+	u64 bytes_done = 0;
+	void *data;
+	unsigned long irq_flags;
+	ssize_t len;
+	loff_t pos_read;
+
+	bytes = bvec->bv_len;
+
+	while (bytes != 0) {
+
+		/* prepare next request */
+		cur_bytes = bytes;
+
+		ret = loop_file_fmt_qcow_cluster_get_offset(lo_fmt, *ppos,
+			&cur_bytes, &cluster_offset);
+		if (ret < 0) {
+			goto fail;
+		}
+
+		offset_in_cluster = loop_file_fmt_qcow_offset_into_cluster(
+			qcow_data, *ppos);
+
+		switch (ret) {
+		case QCOW_CLUSTER_UNALLOCATED:
+		case QCOW_CLUSTER_ZERO_PLAIN:
+		case QCOW_CLUSTER_ZERO_ALLOC:
+			data = bvec_kmap_irq(bvec, &irq_flags) + bytes_done;
+			memset(data, 0, cur_bytes);
+			flush_dcache_page(bvec->bv_page);
+			bvec_kunmap_irq(data, &irq_flags);
+			break;
+
+		case QCOW_CLUSTER_COMPRESSED:
+			ret = __qcow_file_fmt_read_compressed(lo_fmt, bvec,
+				cluster_offset, *ppos, cur_bytes, bytes_done);
+			if (ret < 0) {
+				goto fail;
+			}
+
+			break;
+
+		case QCOW_CLUSTER_NORMAL:
+			if ((cluster_offset & 511) != 0) {
+				ret = -EIO;
+				goto fail;
+			}
+
+			pos_read = cluster_offset + offset_in_cluster;
+
+			data = bvec_kmap_irq(bvec, &irq_flags) + bytes_done;
+			len = kernel_read(lo->lo_backing_file, data, cur_bytes,
+				&pos_read);
+			flush_dcache_page(bvec->bv_page);
+			bvec_kunmap_irq(data, &irq_flags);
+
+			if (len < 0)
+				return len;
+
+			break;
+
+		default:
+			ret = -EIO;
+			goto fail;
+		}
+
+		bytes -= cur_bytes;
+		*ppos += cur_bytes;
+		bytes_done += cur_bytes;
+	}
+
+	ret = 0;
+
+fail:
+	return ret;
+}
+
+static int qcow_file_fmt_read(struct loop_file_fmt *lo_fmt,
+			      struct request *rq)
+{
+	struct bio_vec bvec;
+	struct req_iterator iter;
+	loff_t pos;
+	int ret = 0;
+
+	pos = __qcow_file_fmt_rq_get_pos(lo_fmt, rq);
+
+	rq_for_each_segment(bvec, rq, iter) {
+		ret = __qcow_file_fmt_read_bvec(lo_fmt, &bvec, &pos);
+		if (ret)
+			return ret;
+
+		cond_resched();
+	}
+
+	return ret;
+}
+
+static loff_t qcow_file_fmt_sector_size(struct loop_file_fmt *lo_fmt)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	struct loop_device *lo = loop_file_fmt_get_lo(lo_fmt);
+	loff_t loopsize;
+
+	if (qcow_data->size > 0)
+		loopsize = qcow_data->size;
+	else
+		return 0;
+
+	if (lo->lo_offset > 0)
+		loopsize -= lo->lo_offset;
+
+	if (lo->lo_sizelimit > 0 && lo->lo_sizelimit < loopsize)
+		loopsize = lo->lo_sizelimit;
+
+	/*
+	 * Unfortunately, if we want to do I/O on the device,
+	 * the number of 512-byte sectors has to fit into a sector_t.
+	 */
+	return loopsize >> 9;
+}
+
+static struct loop_file_fmt_ops qcow_file_fmt_ops = {
+	.init = qcow_file_fmt_init,
+	.exit = qcow_file_fmt_exit,
+	.read = qcow_file_fmt_read,
+	.write = NULL,
+	.read_aio = NULL,
+	.write_aio = NULL,
+	.discard = NULL,
+	.flush = NULL,
+	.sector_size = qcow_file_fmt_sector_size
+};
+
+static struct loop_file_fmt_driver qcow_file_fmt_driver = {
+	.name = "QCOW",
+	.file_fmt_type = LO_FILE_FMT_QCOW,
+	.ops = &qcow_file_fmt_ops,
+	.owner = THIS_MODULE
+};
+
+static int __init loop_file_fmt_qcow_init(void)
+{
+	printk(KERN_INFO "loop_file_fmt_qcow: init loop device QCOW file "
+		"format driver");
+	return loop_file_fmt_register_driver(&qcow_file_fmt_driver);
+}
+
+static void __exit loop_file_fmt_qcow_exit(void)
+{
+	printk(KERN_INFO "loop_file_fmt_qcow: exit loop device QCOW file "
+		"format driver");
+	loop_file_fmt_unregister_driver(&qcow_file_fmt_driver);
+}
+
+module_init(loop_file_fmt_qcow_init);
+module_exit(loop_file_fmt_qcow_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Manuel Bentele <development@manuel-bentele.de>");
+MODULE_DESCRIPTION("Loop device QCOW file format driver");
+MODULE_SOFTDEP("pre: loop");
diff --git a/drivers/block/loop/loop_file_fmt_qcow_main.h b/drivers/block/loop/loop_file_fmt_qcow_main.h
new file mode 100644
index 000000000000..9e4951fba079
--- /dev/null
+++ b/drivers/block/loop/loop_file_fmt_qcow_main.h
@@ -0,0 +1,417 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * loop_file_fmt_qcow.h
+ *
+ * QCOW file format driver for the loop device module.
+ *
+ * Ported QCOW2 implementation of the QEMU project (GPL-2.0):
+ * Declarations for the QCOW2 file format.
+ *
+ * The copyright (C) 2004-2006 of the original code is owned by Fabrice Bellard.
+ *
+ * Copyright (C) 2019 Manuel Bentele <development@manuel-bentele.de>
+ */
+
+#ifndef _LINUX_LOOP_FILE_FMT_QCOW_H
+#define _LINUX_LOOP_FILE_FMT_QCOW_H
+
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+#include <linux/zlib.h>
+
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+#endif
+
+#include "loop_file_fmt.h"
+
+#ifdef CONFIG_DEBUG_DRIVER
+#define ASSERT(x)  							\
+do {									\
+	if (!(x)) {							\
+		printk(KERN_EMERG "assertion failed %s: %d: %s\n",	\
+		       __FILE__, __LINE__, #x);				\
+		BUG();							\
+	}								\
+} while (0)
+#else
+#define ASSERT(x) do { } while (0)
+#endif
+
+#define KiB (1024)
+#define MiB (1024 * 1024)
+
+#define QCOW_MAGIC (('Q' << 24) | ('F' << 16) | ('I' << 8) | 0xfb)
+
+#define QCOW_CRYPT_NONE 0
+#define QCOW_CRYPT_AES  1
+#define QCOW_CRYPT_LUKS 2
+
+#define QCOW_MAX_CRYPT_CLUSTERS 32
+#define QCOW_MAX_SNAPSHOTS 65536
+
+/* Field widths in QCOW mean normal cluster offsets cannot reach
+ * 64PB; depending on cluster size, compressed clusters can have a
+ * smaller limit (64PB for up to 16k clusters, then ramps down to
+ * 512TB for 2M clusters).  */
+#define QCOW_MAX_CLUSTER_OFFSET ((1ULL << 56) - 1)
+
+/* 8 MB refcount table is enough for 2 PB images at 64k cluster size
+ * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */
+#define QCOW_MAX_REFTABLE_SIZE (8 * MiB)
+
+/* 32 MB L1 table is enough for 2 PB images at 64k cluster size
+ * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */
+#define QCOW_MAX_L1_SIZE (32 * MiB)
+
+/* Allow for an average of 1k per snapshot table entry, should be plenty of
+ * space for snapshot names and IDs */
+#define QCOW_MAX_SNAPSHOTS_SIZE (1024 * QCOW_MAX_SNAPSHOTS)
+
+/* Bitmap header extension constraints */
+#define QCOW_MAX_BITMAPS 65535
+#define QCOW_MAX_BITMAP_DIRECTORY_SIZE (1024 * QCOW_MAX_BITMAPS)
+
+/* indicate that the refcount of the referenced cluster is exactly one. */
+#define QCOW_OFLAG_COPIED     (1ULL << 63)
+/* indicate that the cluster is compressed (they never have the copied flag) */
+#define QCOW_OFLAG_COMPRESSED (1ULL << 62)
+/* The cluster reads as all zeros */
+#define QCOW_OFLAG_ZERO (1ULL << 0)
+
+#define QCOW_MIN_CLUSTER_BITS 9
+#define QCOW_MAX_CLUSTER_BITS 21
+
+/* Defined in the qcow2 spec (compressed cluster descriptor) */
+#define QCOW_COMPRESSED_SECTOR_SIZE 512U
+#define QCOW_COMPRESSED_SECTOR_MASK (~(QCOW_COMPRESSED_SECTOR_SIZE - 1))
+
+/* Must be at least 2 to cover COW */
+#define QCOW_MIN_L2_CACHE_SIZE 2 /* cache entries */
+
+/* Must be at least 4 to cover all cases of refcount table growth */
+#define QCOW_MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */
+
+#define QCOW_DEFAULT_L2_CACHE_MAX_SIZE (32 * MiB)
+#define QCOW_DEFAULT_CACHE_CLEAN_INTERVAL 600  /* seconds */
+
+#define QCOW_DEFAULT_CLUSTER_SIZE 65536
+
+/* Buffer size for debugfs file buffer to display QCOW header information */
+#define QCOW_HEADER_BUF_LEN 1024
+
+/* Buffer size for debugfs file buffer to receive and display offset and
+ * cluster offset information */
+#define QCOW_OFFSET_BUF_LEN 32
+#define QCOW_CLUSTER_BUF_LEN 128
+
+struct loop_file_fmt_qcow_header {
+	u32 magic;
+	u32 version;
+	u64 backing_file_offset;
+	u32 backing_file_size;
+	u32 cluster_bits;
+	u64 size; /* in bytes */
+	u32 crypt_method;
+	u32 l1_size;
+	u64 l1_table_offset;
+	u64 refcount_table_offset;
+	u32 refcount_table_clusters;
+	u32 nb_snapshots;
+	u64 snapshots_offset;
+
+	/* The following fields are only valid for version >= 3 */
+	u64 incompatible_features;
+	u64 compatible_features;
+	u64 autoclear_features;
+
+	u32 refcount_order;
+	u32 header_length;
+} __attribute__((packed));
+
+struct loop_file_fmt_qcow_snapshot_header {
+	/* header is 8 byte aligned */
+	u64 l1_table_offset;
+
+	u32 l1_size;
+	u16 id_str_size;
+	u16 name_size;
+
+	u32 date_sec;
+	u32 date_nsec;
+
+	u64 vm_clock_nsec;
+
+	u32 vm_state_size;
+	/* for extension */
+	u32 extra_data_size;
+	/* extra data follows */
+	/* id_str follows */
+	/* name follows  */
+} __attribute__((packed));
+
+enum {
+	QCOW_FEAT_TYPE_INCOMPATIBLE    = 0,
+	QCOW_FEAT_TYPE_COMPATIBLE      = 1,
+	QCOW_FEAT_TYPE_AUTOCLEAR       = 2,
+};
+
+/* incompatible feature bits */
+enum {
+	QCOW_INCOMPAT_DIRTY_BITNR      = 0,
+	QCOW_INCOMPAT_CORRUPT_BITNR    = 1,
+	QCOW_INCOMPAT_DATA_FILE_BITNR  = 2,
+	QCOW_INCOMPAT_DIRTY            = 1 << QCOW_INCOMPAT_DIRTY_BITNR,
+	QCOW_INCOMPAT_CORRUPT          = 1 << QCOW_INCOMPAT_CORRUPT_BITNR,
+	QCOW_INCOMPAT_DATA_FILE        = 1 << QCOW_INCOMPAT_DATA_FILE_BITNR,
+
+	QCOW_INCOMPAT_MASK             = QCOW_INCOMPAT_DIRTY
+					| QCOW_INCOMPAT_CORRUPT
+					| QCOW_INCOMPAT_DATA_FILE,
+};
+
+/* compatible feature bits */
+enum {
+	QCOW_COMPAT_LAZY_REFCOUNTS_BITNR = 0,
+	QCOW_COMPAT_LAZY_REFCOUNTS       = 1 << QCOW_COMPAT_LAZY_REFCOUNTS_BITNR,
+
+	QCOW_COMPAT_FEAT_MASK            = QCOW_COMPAT_LAZY_REFCOUNTS,
+};
+
+/* autoclear feature bits */
+enum {
+	QCOW_AUTOCLEAR_BITMAPS_BITNR       = 0,
+	QCOW_AUTOCLEAR_DATA_FILE_RAW_BITNR = 1,
+	QCOW_AUTOCLEAR_BITMAPS             = 1 << QCOW_AUTOCLEAR_BITMAPS_BITNR,
+	QCOW_AUTOCLEAR_DATA_FILE_RAW       = 1 << QCOW_AUTOCLEAR_DATA_FILE_RAW_BITNR,
+
+	QCOW_AUTOCLEAR_MASK                = QCOW_AUTOCLEAR_BITMAPS |
+						QCOW_AUTOCLEAR_DATA_FILE_RAW,
+};
+
+struct loop_file_fmt_qcow_data {
+	u64 size;
+	int cluster_bits;
+	int cluster_size;
+	int cluster_sectors;
+	int l2_slice_size;
+	int l2_bits;
+	int l2_size;
+	int l1_size;
+	int l1_vm_state_index;
+	int refcount_block_bits;
+	int refcount_block_size;
+	int csize_shift;
+	int csize_mask;
+	u64 cluster_offset_mask;
+	u64 l1_table_offset;
+	u64 *l1_table;
+
+	struct loop_file_fmt_qcow_cache *l2_table_cache;
+	struct loop_file_fmt_qcow_cache *refcount_block_cache;
+
+	u64 *refcount_table;
+	u64 refcount_table_offset;
+	u32 refcount_table_size;
+	u32 max_refcount_table_index; /* Last used entry in refcount_table */
+	u64 free_cluster_index;
+	u64 free_byte_offset;
+
+	u32 crypt_method_header;
+	u64 snapshots_offset;
+	int snapshots_size;
+	unsigned int nb_snapshots;
+
+	u32 nb_bitmaps;
+	u64 bitmap_directory_size;
+	u64 bitmap_directory_offset;
+
+	int qcow_version;
+	bool use_lazy_refcounts;
+	int refcount_order;
+	int refcount_bits;
+	u64 refcount_max;
+
+	u64 incompatible_features;
+	u64 compatible_features;
+	u64 autoclear_features;
+
+	struct z_stream_s *strm;
+
+	/* debugfs entries */
+#ifdef CONFIG_DEBUG_FS
+	struct dentry *dbgfs_dir;
+	struct dentry *dbgfs_file_qcow_header;
+	char dbgfs_file_qcow_header_buf[QCOW_HEADER_BUF_LEN];
+	struct dentry *dbgfs_file_qcow_offset;
+	char dbgfs_file_qcow_offset_buf[QCOW_OFFSET_BUF_LEN];
+	char dbgfs_file_qcow_cluster_buf[QCOW_CLUSTER_BUF_LEN];
+	u64 dbgfs_qcow_offset;
+	struct mutex dbgfs_qcow_offset_mutex;
+#endif
+};
+
+struct loop_file_fmt_qcow_cow_region {
+	/**
+	 * Offset of the COW region in bytes from the start of the first
+	 * cluster touched by the request.
+	 */
+	unsigned offset;
+
+	/** Number of bytes to copy */
+	unsigned nb_bytes;
+};
+
+enum loop_file_fmt_qcow_cluster_type {
+	QCOW_CLUSTER_UNALLOCATED,
+	QCOW_CLUSTER_ZERO_PLAIN,
+	QCOW_CLUSTER_ZERO_ALLOC,
+	QCOW_CLUSTER_NORMAL,
+	QCOW_CLUSTER_COMPRESSED,
+};
+
+enum loop_file_fmt_qcow_metadata_overlap {
+	QCOW_OL_MAIN_HEADER_BITNR      = 0,
+	QCOW_OL_ACTIVE_L1_BITNR        = 1,
+	QCOW_OL_ACTIVE_L2_BITNR        = 2,
+	QCOW_OL_REFCOUNT_TABLE_BITNR   = 3,
+	QCOW_OL_REFCOUNT_BLOCK_BITNR   = 4,
+	QCOW_OL_SNAPSHOT_TABLE_BITNR   = 5,
+	QCOW_OL_INACTIVE_L1_BITNR      = 6,
+	QCOW_OL_INACTIVE_L2_BITNR      = 7,
+	QCOW_OL_BITMAP_DIRECTORY_BITNR = 8,
+
+	QCOW_OL_MAX_BITNR              = 9,
+
+	QCOW_OL_NONE             = 0,
+	QCOW_OL_MAIN_HEADER      = (1 << QCOW_OL_MAIN_HEADER_BITNR),
+	QCOW_OL_ACTIVE_L1        = (1 << QCOW_OL_ACTIVE_L1_BITNR),
+	QCOW_OL_ACTIVE_L2        = (1 << QCOW_OL_ACTIVE_L2_BITNR),
+	QCOW_OL_REFCOUNT_TABLE   = (1 << QCOW_OL_REFCOUNT_TABLE_BITNR),
+	QCOW_OL_REFCOUNT_BLOCK   = (1 << QCOW_OL_REFCOUNT_BLOCK_BITNR),
+	QCOW_OL_SNAPSHOT_TABLE   = (1 << QCOW_OL_SNAPSHOT_TABLE_BITNR),
+	QCOW_OL_INACTIVE_L1      = (1 << QCOW_OL_INACTIVE_L1_BITNR),
+	/* NOTE: Checking overlaps with inactive L2 tables will result in bdrv
+	 * reads. */
+	QCOW_OL_INACTIVE_L2      = (1 << QCOW_OL_INACTIVE_L2_BITNR),
+	QCOW_OL_BITMAP_DIRECTORY = (1 << QCOW_OL_BITMAP_DIRECTORY_BITNR),
+};
+
+/* Perform all overlap checks which can be done in constant time */
+#define QCOW_OL_CONSTANT \
+	(QCOW_OL_MAIN_HEADER | QCOW_OL_ACTIVE_L1 | QCOW_OL_REFCOUNT_TABLE | \
+		QCOW_OL_SNAPSHOT_TABLE | QCOW_OL_BITMAP_DIRECTORY)
+
+/* Perform all overlap checks which don't require disk access */
+#define QCOW_OL_CACHED \
+	(QCOW_OL_CONSTANT | QCOW_OL_ACTIVE_L2 | QCOW_OL_REFCOUNT_BLOCK | \
+		QCOW_OL_INACTIVE_L1)
+
+/* Perform all overlap checks */
+#define QCOW_OL_ALL \
+	(QCOW_OL_CACHED | QCOW_OL_INACTIVE_L2)
+
+#define L1E_OFFSET_MASK 0x00fffffffffffe00ULL
+#define L2E_OFFSET_MASK 0x00fffffffffffe00ULL
+#define L2E_COMPRESSED_OFFSET_SIZE_MASK 0x3fffffffffffffffULL
+
+#define REFT_OFFSET_MASK 0xfffffffffffffe00ULL
+
+#define INV_OFFSET (-1ULL)
+
+static inline bool loop_file_fmt_qcow_has_data_file(
+	struct loop_file_fmt *lo_fmt)
+{
+	/* At the moment, there is no support for copy on write! */
+	return false;
+}
+
+static inline bool loop_file_fmt_qcow_data_file_is_raw(
+	struct loop_file_fmt *lo_fmt)
+{
+	struct loop_file_fmt_qcow_data *qcow_data = lo_fmt->private_data;
+	return !!(qcow_data->autoclear_features &
+		QCOW_AUTOCLEAR_DATA_FILE_RAW);
+}
+
+static inline s64 loop_file_fmt_qcow_start_of_cluster(
+	struct loop_file_fmt_qcow_data *qcow_data, s64 offset)
+{
+	return offset & ~(qcow_data->cluster_size - 1);
+}
+
+static inline s64 loop_file_fmt_qcow_offset_into_cluster(
+	struct loop_file_fmt_qcow_data *qcow_data, s64 offset)
+{
+	return offset & (qcow_data->cluster_size - 1);
+}
+
+static inline s64 loop_file_fmt_qcow_size_to_clusters(
+	struct loop_file_fmt_qcow_data *qcow_data, u64 size)
+{
+	return (size + (qcow_data->cluster_size - 1)) >>
+		qcow_data->cluster_bits;
+}
+
+static inline s64 loop_file_fmt_qcow_size_to_l1(
+	struct loop_file_fmt_qcow_data *qcow_data, s64 size)
+{
+	int shift = qcow_data->cluster_bits + qcow_data->l2_bits;
+	return (size + (1ULL << shift) - 1) >> shift;
+}
+
+static inline int loop_file_fmt_qcow_offset_to_l1_index(
+	struct loop_file_fmt_qcow_data *qcow_data, u64 offset)
+{
+	return offset >> (qcow_data->l2_bits + qcow_data->cluster_bits);
+}
+
+static inline int loop_file_fmt_qcow_offset_to_l2_index(
+	struct loop_file_fmt_qcow_data *qcow_data, s64 offset)
+{
+	return (offset >> qcow_data->cluster_bits) & (qcow_data->l2_size - 1);
+}
+
+static inline int loop_file_fmt_qcow_offset_to_l2_slice_index(
+	struct loop_file_fmt_qcow_data *qcow_data, s64 offset)
+{
+	return (offset >> qcow_data->cluster_bits) &
+		(qcow_data->l2_slice_size - 1);
+}
+
+static inline s64 loop_file_fmt_qcow_vm_state_offset(
+	struct loop_file_fmt_qcow_data *qcow_data)
+{
+	return (s64)qcow_data->l1_vm_state_index <<
+		(qcow_data->cluster_bits + qcow_data->l2_bits);
+}
+
+static inline enum loop_file_fmt_qcow_cluster_type
+loop_file_fmt_qcow_get_cluster_type(struct loop_file_fmt *lo_fmt, u64 l2_entry)
+{
+	if (l2_entry & QCOW_OFLAG_COMPRESSED) {
+		return QCOW_CLUSTER_COMPRESSED;
+	} else if (l2_entry & QCOW_OFLAG_ZERO) {
+		if (l2_entry & L2E_OFFSET_MASK) {
+			return QCOW_CLUSTER_ZERO_ALLOC;
+		}
+		return QCOW_CLUSTER_ZERO_PLAIN;
+	} else if (!(l2_entry & L2E_OFFSET_MASK)) {
+		/* Offset 0 generally means unallocated, but it is ambiguous
+		 * with external data files because 0 is a valid offset there.
+		 * However, all clusters in external data files always have
+		 * refcount 1, so we can rely on QCOW_OFLAG_COPIED to
+		 * disambiguate. */
+		if (loop_file_fmt_qcow_has_data_file(lo_fmt) &&
+			(l2_entry & QCOW_OFLAG_COPIED)) {
+			return QCOW_CLUSTER_NORMAL;
+		} else {
+			return QCOW_CLUSTER_UNALLOCATED;
+		}
+	} else {
+		return QCOW_CLUSTER_NORMAL;
+	}
+}
+
+#endif
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/5] doc: admin-guide: add QCOW2 file format to loop device documentation
  2019-08-23 22:56 [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver development
                   ` (2 preceding siblings ...)
  2019-08-23 22:56 ` [PATCH 4/5] block: loop: add QCOW2 loop file format driver (read-only) development
@ 2019-08-23 22:56 ` development
  2019-08-24  3:37 ` [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver Bart Van Assche
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: development @ 2019-08-23 22:56 UTC (permalink / raw)
  To: linux-block; +Cc: Manuel Bentele

From: Manuel Bentele <development@manuel-bentele.de>

The existing documentation about the loop block device is extended by
a section about the QCOW2 file format driver. The documentation is written
in the reST kernel documentation format.

Signed-off-by: Manuel Bentele <development@manuel-bentele.de>
---
 Documentation/admin-guide/blockdev/loop.rst | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/Documentation/admin-guide/blockdev/loop.rst b/Documentation/admin-guide/blockdev/loop.rst
index 69d8172c85db..3a5897a14c8b 100644
--- a/Documentation/admin-guide/blockdev/loop.rst
+++ b/Documentation/admin-guide/blockdev/loop.rst
@@ -72,3 +72,14 @@ image file. It supports discarding, asynchrounous IO, flushing and cryptoloop
 support.
 
 The driver's kernel module is named *loop_file_fmt_raw*.
+
+
+QCOW
+~~~~
+
+The QCOW file format driver implements QEMU's copy on write file format in
+version 2. At the moment, the file format driver only supports the reading
+of QCOW2 disk image files. It does not support writing to QCOW2 images, the
+recovery of broken QCOW images, snapshots and reference counts.
+
+The driver's kernel module is named *loop_file_fmt_qcow*.
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver
  2019-08-23 22:56 [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver development
                   ` (3 preceding siblings ...)
  2019-08-23 22:56 ` [PATCH 5/5] doc: admin-guide: add QCOW2 file format to loop device documentation development
@ 2019-08-24  3:37 ` Bart Van Assche
  2019-08-24  9:14   ` Manuel Bentele
  2019-08-24 11:10 ` Manuel Bentele
  2019-09-12  2:24 ` Ming Lei
  6 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2019-08-24  3:37 UTC (permalink / raw)
  To: development, linux-block

On 8/23/19 3:56 PM, development@manuel-bentele.de wrote:
> During the discussion, it turned out that the implementation as device
> mapper target is not applicable. The device mapper stacks different
> functionality such as compression or encryption on multiple block device
> layers whereas an implementation for the QCOW2 container format provides
> these functionalities on one block device layer.

Hi Manuel,

Is there a more detailed discussion available of this subject? Are you 
familiar with the dm-crypt driver?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver
  2019-08-24  3:37 ` [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver Bart Van Assche
@ 2019-08-24  9:14   ` Manuel Bentele
  2019-08-24 16:04     ` Bart Van Assche
  0 siblings, 1 reply; 15+ messages in thread
From: Manuel Bentele @ 2019-08-24  9:14 UTC (permalink / raw)
  To: Bart Van Assche, development, linux-block

Hi Bart

Thanks for your quick reply.

On 8/24/19 5:37 AM, Bart Van Assche wrote:
> On 8/23/19 3:56 PM, development@manuel-bentele.de wrote:
>> During the discussion, it turned out that the implementation as device
>> mapper target is not applicable. The device mapper stacks different
>> functionality such as compression or encryption on multiple block device
>> layers whereas an implementation for the QCOW2 container format provides
>> these functionalities on one block device layer.
>
> Hi Manuel,
>
> Is there a more detailed discussion available of this subject?
No, the only discussion is the referenced one [1]. But there was a
similar discussion in the master's thesis of Francesc Zacarias Ribot
[2]. Unfortunately, I found no attempt on the mailing list that proposes
his solution.

> Are you familiar with the dm-crypt driver?
I don't know the specific implementation details, but I use this driver
personally and I like it. Do you want to propose that only the storage
aspect of the QCOW2 container format should be used and all other
functionality inside the container should be provided by available
device mapper targets?

> [...]

Regards,
Manuel

[1] https://www.spinics.net/lists/linux-block/msg39538.html
[2] Francesc Zacarias Ribot: QLOOP Linux driver to mount QCOW2 virtual
disks; June 23, 2010;
https://upcommons.upc.edu/bitstream/handle/2099.1/9619/65757.pdf


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver
  2019-08-23 22:56 [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver development
                   ` (4 preceding siblings ...)
  2019-08-24  3:37 ` [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver Bart Van Assche
@ 2019-08-24 11:10 ` Manuel Bentele
  2019-09-12  2:24 ` Ming Lei
  6 siblings, 0 replies; 15+ messages in thread
From: Manuel Bentele @ 2019-08-24 11:10 UTC (permalink / raw)
  To: linux-block; +Cc: development

Hi

I realized that the first patch of my patch series is missing, although 
I successfully send them to the block mailing list. In addition to that, 
I checked my mail server's log and I personally received a copy of my 
patches. Also, I have not received any undelivered mail message from the 
mailing list sever. So everything seems fine.

In preparation for submitting the patch series, I checked the size of 
each patch. I can confirm that all patches are smaller than the 300kB 
size limit stated in the documentation [1]. Did I do something wrong?

Regards,
Manuel

[1] 
https://www.kernel.org/doc/html/v5.3-rc5/process/submitting-patches.html#e-mail-size

On 8/24/19 12:56 AM, development@manuel-bentele.de wrote:
> From: Manuel Bentele <development@manuel-bentele.de>
>
> Hi
>
> Regarding to the following discussion [1] on the mailing list I show you
> the result of my work as announced at the end of the discussion [2].
>
> The discussion was about the project topic of how to implement the
> reading/writing of QCOW2 in the kernel. The project focuses on an read-only
> in-kernel QCOW2 implementation to increase the read/write performance
> and tries to avoid nbd. Furthermore, the project is part of a project
> series to develop a in-kernel network boot infrastructure that has no need
> for any user space interaction (e.g. nbd) anymore.
>
> During the discussion, it turned out that the implementation as device
> mapper target is not applicable. The device mapper stacks different
> functionality such as compression or encryption on multiple block device
> layers whereas an implementation for the QCOW2 container format provides
> these functionalities on one block device layer. Using FUSE is also not
> possible due to performance reasons and user space interaction.
>
> Therefore, I propose the extension of the loop device module. I created a
> new file format subsystem which is part of the loop device module. The file
> format subsystem abstracts the direct file access and provides an driver
> API to implement various disk file formats such as QCOW2, VDI and VMDK.
> File format drivers are implemented as kernel modules and can be registered
> by the file format subsystem.
>
> The patch series contains documentation for the file format subsystem and
> the loop device module, too. Also, it provides a default RAW file format
> driver and a read-only QCOW2 driver. The RAW file format driver is based on
> the file specific parts of the existing loop device implementation and
> preserves the default behaviour of a loop device. More specific information
> can be found in the commit logs of the following patches.
>
> Regards,
> Manuel
>
> [1] https://www.spinics.net/lists/linux-block/msg39538.html
> [2] https://www.spinics.net/lists/linux-block/msg40479.html
>
> Manuel Bentele (5):
>    block: loop: add file format subsystem for loop devices
>    doc: admin-guide: add loop block device documentation
>    doc: driver-api: add loop file format subsystem API documentation
>    block: loop: add QCOW2 loop file format driver (read-only)
>    doc: admin-guide: add QCOW2 file format to loop device documentation
>
>   Documentation/admin-guide/blockdev/index.rst  |   1 +
>   Documentation/admin-guide/blockdev/loop.rst   |  85 ++
>   Documentation/driver-api/index.rst            |   1 +
>   Documentation/driver-api/loop-file-fmt.rst    | 137 +++
>   arch/alpha/configs/defconfig                  |   1 +
>   arch/arc/configs/axs103_defconfig             |   1 +
>   arch/arc/configs/axs103_smp_defconfig         |   1 +
>   arch/arm/configs/am200epdkit_defconfig        |   1 +
>   arch/arm/configs/aspeed_g4_defconfig          |   1 +
>   arch/arm/configs/aspeed_g5_defconfig          |   1 +
>   arch/arm/configs/assabet_defconfig            |   1 +
>   arch/arm/configs/at91_dt_defconfig            |   1 +
>   arch/arm/configs/axm55xx_defconfig            |   1 +
>   arch/arm/configs/badge4_defconfig             |   1 +
>   arch/arm/configs/cerfcube_defconfig           |   1 +
>   arch/arm/configs/cm_x2xx_defconfig            |   1 +
>   arch/arm/configs/cm_x300_defconfig            |   1 +
>   arch/arm/configs/cns3420vb_defconfig          |   1 +
>   arch/arm/configs/colibri_pxa270_defconfig     |   1 +
>   arch/arm/configs/collie_defconfig             |   1 +
>   arch/arm/configs/corgi_defconfig              |   1 +
>   arch/arm/configs/davinci_all_defconfig        |   1 +
>   arch/arm/configs/dove_defconfig               |   1 +
>   arch/arm/configs/em_x270_defconfig            |   1 +
>   arch/arm/configs/eseries_pxa_defconfig        |   1 +
>   arch/arm/configs/exynos_defconfig             |   1 +
>   arch/arm/configs/ezx_defconfig                |   1 +
>   arch/arm/configs/footbridge_defconfig         |   1 +
>   arch/arm/configs/h3600_defconfig              |   1 +
>   arch/arm/configs/imote2_defconfig             |   1 +
>   arch/arm/configs/imx_v6_v7_defconfig          |   1 +
>   arch/arm/configs/integrator_defconfig         |   1 +
>   arch/arm/configs/iop32x_defconfig             |   1 +
>   arch/arm/configs/ixp4xx_defconfig             |   1 +
>   arch/arm/configs/jornada720_defconfig         |   1 +
>   arch/arm/configs/keystone_defconfig           |   1 +
>   arch/arm/configs/lpc32xx_defconfig            |   1 +
>   arch/arm/configs/milbeaut_m10v_defconfig      |   1 +
>   arch/arm/configs/mini2440_defconfig           |   1 +
>   arch/arm/configs/multi_v5_defconfig           |   1 +
>   arch/arm/configs/multi_v7_defconfig           |   1 +
>   arch/arm/configs/mv78xx0_defconfig            |   1 +
>   arch/arm/configs/mvebu_v5_defconfig           |   1 +
>   arch/arm/configs/netwinder_defconfig          |   1 +
>   arch/arm/configs/nhk8815_defconfig            |   1 +
>   arch/arm/configs/omap1_defconfig              |   1 +
>   arch/arm/configs/omap2plus_defconfig          |   1 +
>   arch/arm/configs/orion5x_defconfig            |   1 +
>   arch/arm/configs/oxnas_v6_defconfig           |   1 +
>   arch/arm/configs/palmz72_defconfig            |   1 +
>   arch/arm/configs/pleb_defconfig               |   1 +
>   arch/arm/configs/prima2_defconfig             |   1 +
>   arch/arm/configs/pxa3xx_defconfig             |   1 +
>   arch/arm/configs/pxa_defconfig                |   1 +
>   arch/arm/configs/qcom_defconfig               |   1 +
>   arch/arm/configs/rpc_defconfig                |   1 +
>   arch/arm/configs/s3c2410_defconfig            |   1 +
>   arch/arm/configs/s3c6400_defconfig            |   1 +
>   arch/arm/configs/s5pv210_defconfig            |   1 +
>   arch/arm/configs/sama5_defconfig              |   1 +
>   arch/arm/configs/simpad_defconfig             |   1 +
>   arch/arm/configs/socfpga_defconfig            |   1 +
>   arch/arm/configs/spitz_defconfig              |   1 +
>   arch/arm/configs/tango4_defconfig             |   1 +
>   arch/arm/configs/tegra_defconfig              |   1 +
>   arch/arm/configs/trizeps4_defconfig           |   1 +
>   arch/arm/configs/viper_defconfig              |   1 +
>   arch/arm/configs/zeus_defconfig               |   1 +
>   arch/arm/configs/zx_defconfig                 |   1 +
>   arch/arm64/configs/defconfig                  |   1 +
>   arch/c6x/configs/dsk6455_defconfig            |   1 +
>   arch/c6x/configs/evmc6457_defconfig           |   1 +
>   arch/c6x/configs/evmc6472_defconfig           |   1 +
>   arch/c6x/configs/evmc6474_defconfig           |   1 +
>   arch/c6x/configs/evmc6678_defconfig           |   1 +
>   arch/csky/configs/defconfig                   |   1 +
>   arch/hexagon/configs/comet_defconfig          |   1 +
>   arch/ia64/configs/bigsur_defconfig            |   1 +
>   arch/ia64/configs/generic_defconfig           |   1 +
>   arch/ia64/configs/gensparse_defconfig         |   1 +
>   arch/ia64/configs/tiger_defconfig             |   1 +
>   arch/ia64/configs/zx1_defconfig               |   1 +
>   arch/m68k/configs/amiga_defconfig             |   1 +
>   arch/m68k/configs/apollo_defconfig            |   1 +
>   arch/m68k/configs/atari_defconfig             |   1 +
>   arch/m68k/configs/bvme6000_defconfig          |   1 +
>   arch/m68k/configs/hp300_defconfig             |   1 +
>   arch/m68k/configs/mac_defconfig               |   1 +
>   arch/m68k/configs/multi_defconfig             |   1 +
>   arch/m68k/configs/mvme147_defconfig           |   1 +
>   arch/m68k/configs/mvme16x_defconfig           |   1 +
>   arch/m68k/configs/q40_defconfig               |   1 +
>   arch/m68k/configs/sun3_defconfig              |   1 +
>   arch/m68k/configs/sun3x_defconfig             |   1 +
>   arch/mips/configs/bigsur_defconfig            |   1 +
>   arch/mips/configs/cavium_octeon_defconfig     |   1 +
>   arch/mips/configs/cobalt_defconfig            |   1 +
>   arch/mips/configs/decstation_64_defconfig     |   1 +
>   arch/mips/configs/decstation_defconfig        |   1 +
>   arch/mips/configs/decstation_r4k_defconfig    |   1 +
>   arch/mips/configs/fuloong2e_defconfig         |   1 +
>   arch/mips/configs/generic/board-ocelot.config |   1 +
>   arch/mips/configs/gpr_defconfig               |   1 +
>   arch/mips/configs/ip27_defconfig              |   1 +
>   arch/mips/configs/ip32_defconfig              |   1 +
>   arch/mips/configs/jazz_defconfig              |   1 +
>   arch/mips/configs/lemote2f_defconfig          |   1 +
>   arch/mips/configs/loongson1b_defconfig        |   1 +
>   arch/mips/configs/loongson1c_defconfig        |   1 +
>   arch/mips/configs/loongson3_defconfig         |   1 +
>   arch/mips/configs/malta_defconfig             |   1 +
>   arch/mips/configs/malta_kvm_defconfig         |   1 +
>   arch/mips/configs/malta_kvm_guest_defconfig   |   1 +
>   arch/mips/configs/malta_qemu_32r6_defconfig   |   1 +
>   arch/mips/configs/maltaaprp_defconfig         |   1 +
>   arch/mips/configs/maltasmvp_defconfig         |   1 +
>   arch/mips/configs/maltasmvp_eva_defconfig     |   1 +
>   arch/mips/configs/maltaup_defconfig           |   1 +
>   arch/mips/configs/maltaup_xpa_defconfig       |   1 +
>   arch/mips/configs/markeins_defconfig          |   1 +
>   arch/mips/configs/mips_paravirt_defconfig     |   1 +
>   arch/mips/configs/nlm_xlp_defconfig           |   1 +
>   arch/mips/configs/nlm_xlr_defconfig           |   1 +
>   arch/mips/configs/pic32mzda_defconfig         |   1 +
>   arch/mips/configs/pistachio_defconfig         |   1 +
>   arch/mips/configs/pnx8335_stb225_defconfig    |   1 +
>   arch/mips/configs/rbtx49xx_defconfig          |   1 +
>   arch/mips/configs/rm200_defconfig             |   1 +
>   arch/mips/configs/tb0219_defconfig            |   1 +
>   arch/mips/configs/tb0226_defconfig            |   1 +
>   arch/mips/configs/tb0287_defconfig            |   1 +
>   arch/nios2/configs/10m50_defconfig            |   1 +
>   arch/nios2/configs/3c120_defconfig            |   1 +
>   arch/parisc/configs/712_defconfig             |   1 +
>   arch/parisc/configs/a500_defconfig            |   1 +
>   arch/parisc/configs/b180_defconfig            |   1 +
>   arch/parisc/configs/c3000_defconfig           |   1 +
>   arch/parisc/configs/c8000_defconfig           |   1 +
>   arch/parisc/configs/defconfig                 |   1 +
>   arch/parisc/configs/generic-32bit_defconfig   |   1 +
>   arch/parisc/configs/generic-64bit_defconfig   |   1 +
>   arch/powerpc/configs/40x/virtex_defconfig     |   1 +
>   arch/powerpc/configs/44x/sam440ep_defconfig   |   1 +
>   arch/powerpc/configs/44x/virtex5_defconfig    |   1 +
>   arch/powerpc/configs/52xx/cm5200_defconfig    |   1 +
>   arch/powerpc/configs/52xx/lite5200b_defconfig |   1 +
>   arch/powerpc/configs/52xx/motionpro_defconfig |   1 +
>   arch/powerpc/configs/52xx/tqm5200_defconfig   |   1 +
>   arch/powerpc/configs/83xx/asp8347_defconfig   |   1 +
>   .../configs/83xx/mpc8313_rdb_defconfig        |   1 +
>   .../configs/83xx/mpc8315_rdb_defconfig        |   1 +
>   .../configs/83xx/mpc832x_mds_defconfig        |   1 +
>   .../configs/83xx/mpc832x_rdb_defconfig        |   1 +
>   .../configs/83xx/mpc834x_itx_defconfig        |   1 +
>   .../configs/83xx/mpc834x_itxgp_defconfig      |   1 +
>   .../configs/83xx/mpc834x_mds_defconfig        |   1 +
>   .../configs/83xx/mpc836x_mds_defconfig        |   1 +
>   .../configs/83xx/mpc836x_rdk_defconfig        |   1 +
>   .../configs/83xx/mpc837x_mds_defconfig        |   1 +
>   .../configs/83xx/mpc837x_rdb_defconfig        |   1 +
>   arch/powerpc/configs/85xx/ge_imp3a_defconfig  |   1 +
>   arch/powerpc/configs/85xx/ksi8560_defconfig   |   1 +
>   .../configs/85xx/mpc8540_ads_defconfig        |   1 +
>   .../configs/85xx/mpc8560_ads_defconfig        |   1 +
>   .../configs/85xx/mpc85xx_cds_defconfig        |   1 +
>   arch/powerpc/configs/85xx/sbc8548_defconfig   |   1 +
>   arch/powerpc/configs/85xx/socrates_defconfig  |   1 +
>   arch/powerpc/configs/85xx/stx_gp3_defconfig   |   1 +
>   arch/powerpc/configs/85xx/tqm8540_defconfig   |   1 +
>   arch/powerpc/configs/85xx/tqm8541_defconfig   |   1 +
>   arch/powerpc/configs/85xx/tqm8548_defconfig   |   1 +
>   arch/powerpc/configs/85xx/tqm8555_defconfig   |   1 +
>   arch/powerpc/configs/85xx/tqm8560_defconfig   |   1 +
>   .../configs/85xx/xes_mpc85xx_defconfig        |   1 +
>   arch/powerpc/configs/amigaone_defconfig       |   1 +
>   arch/powerpc/configs/cell_defconfig           |   1 +
>   arch/powerpc/configs/chrp32_defconfig         |   1 +
>   arch/powerpc/configs/ep8248e_defconfig        |   1 +
>   arch/powerpc/configs/fsl-emb-nonhw.config     |   1 +
>   arch/powerpc/configs/g5_defconfig             |   1 +
>   arch/powerpc/configs/gamecube_defconfig       |   1 +
>   arch/powerpc/configs/holly_defconfig          |   1 +
>   arch/powerpc/configs/linkstation_defconfig    |   1 +
>   arch/powerpc/configs/mgcoge_defconfig         |   1 +
>   arch/powerpc/configs/mpc5200_defconfig        |   1 +
>   arch/powerpc/configs/mpc7448_hpc2_defconfig   |   1 +
>   arch/powerpc/configs/mpc8272_ads_defconfig    |   1 +
>   arch/powerpc/configs/mpc83xx_defconfig        |   1 +
>   arch/powerpc/configs/mpc866_ads_defconfig     |   1 +
>   arch/powerpc/configs/mvme5100_defconfig       |   1 +
>   arch/powerpc/configs/pasemi_defconfig         |   1 +
>   arch/powerpc/configs/pmac32_defconfig         |   1 +
>   arch/powerpc/configs/powernv_defconfig        |   1 +
>   arch/powerpc/configs/ppc64_defconfig          |   1 +
>   arch/powerpc/configs/ppc64e_defconfig         |   1 +
>   arch/powerpc/configs/ppc6xx_defconfig         |   1 +
>   arch/powerpc/configs/pq2fads_defconfig        |   1 +
>   arch/powerpc/configs/ps3_defconfig            |   1 +
>   arch/powerpc/configs/pseries_defconfig        |   1 +
>   arch/powerpc/configs/skiroot_defconfig        |   1 +
>   arch/powerpc/configs/wii_defconfig            |   1 +
>   arch/riscv/configs/defconfig                  |   1 +
>   arch/riscv/configs/rv32_defconfig             |   1 +
>   arch/s390/configs/debug_defconfig             |   1 +
>   arch/s390/configs/defconfig                   |   1 +
>   arch/sh/configs/cayman_defconfig              |   1 +
>   arch/sh/configs/landisk_defconfig             |   1 +
>   arch/sh/configs/lboxre2_defconfig             |   1 +
>   arch/sh/configs/rsk7264_defconfig             |   1 +
>   arch/sh/configs/sdk7780_defconfig             |   1 +
>   arch/sh/configs/sdk7786_defconfig             |   1 +
>   arch/sh/configs/se7206_defconfig              |   1 +
>   arch/sh/configs/se7780_defconfig              |   1 +
>   arch/sh/configs/sh03_defconfig                |   1 +
>   arch/sh/configs/sh2007_defconfig              |   1 +
>   arch/sh/configs/sh7785lcr_32bit_defconfig     |   1 +
>   arch/sh/configs/shmin_defconfig               |   1 +
>   arch/sh/configs/titan_defconfig               |   1 +
>   arch/sparc/configs/sparc32_defconfig          |   1 +
>   arch/sparc/configs/sparc64_defconfig          |   1 +
>   arch/um/configs/i386_defconfig                |   1 +
>   arch/um/configs/x86_64_defconfig              |   1 +
>   arch/unicore32/configs/defconfig              |   1 +
>   arch/x86/configs/i386_defconfig               |   1 +
>   arch/x86/configs/x86_64_defconfig             |   1 +
>   arch/xtensa/configs/audio_kc705_defconfig     |   1 +
>   arch/xtensa/configs/cadence_csp_defconfig     |   1 +
>   arch/xtensa/configs/generic_kc705_defconfig   |   1 +
>   arch/xtensa/configs/nommu_kc705_defconfig     |   1 +
>   arch/xtensa/configs/smp_lx200_defconfig       |   1 +
>   arch/xtensa/configs/virt_defconfig            |   1 +
>   drivers/block/Kconfig                         |  73 +-
>   drivers/block/Makefile                        |   4 +-
>   drivers/block/loop/Kconfig                    |  93 ++
>   drivers/block/loop/Makefile                   |  13 +
>   drivers/block/{ => loop}/cryptoloop.c         |   2 +-
>   drivers/block/loop/loop_file_fmt.c            | 328 ++++++
>   drivers/block/loop/loop_file_fmt.h            | 351 +++++++
>   drivers/block/loop/loop_file_fmt_qcow_cache.c | 218 ++++
>   drivers/block/loop/loop_file_fmt_qcow_cache.h |  51 +
>   .../block/loop/loop_file_fmt_qcow_cluster.c   | 270 +++++
>   .../block/loop/loop_file_fmt_qcow_cluster.h   |  23 +
>   drivers/block/loop/loop_file_fmt_qcow_main.c  | 945 ++++++++++++++++++
>   drivers/block/loop/loop_file_fmt_qcow_main.h  | 417 ++++++++
>   drivers/block/loop/loop_file_fmt_raw.c        | 449 +++++++++
>   drivers/block/{loop.c => loop/loop_main.c}    | 567 ++++-------
>   drivers/block/{loop.h => loop/loop_main.h}    |  14 +-
>   include/uapi/linux/loop.h                     |  14 +-
>   248 files changed, 3861 insertions(+), 422 deletions(-)
>   create mode 100644 Documentation/admin-guide/blockdev/loop.rst
>   create mode 100644 Documentation/driver-api/loop-file-fmt.rst
>   create mode 100644 drivers/block/loop/Kconfig
>   create mode 100644 drivers/block/loop/Makefile
>   rename drivers/block/{ => loop}/cryptoloop.c (99%)
>   create mode 100644 drivers/block/loop/loop_file_fmt.c
>   create mode 100644 drivers/block/loop/loop_file_fmt.h
>   create mode 100644 drivers/block/loop/loop_file_fmt_qcow_cache.c
>   create mode 100644 drivers/block/loop/loop_file_fmt_qcow_cache.h
>   create mode 100644 drivers/block/loop/loop_file_fmt_qcow_cluster.c
>   create mode 100644 drivers/block/loop/loop_file_fmt_qcow_cluster.h
>   create mode 100644 drivers/block/loop/loop_file_fmt_qcow_main.c
>   create mode 100644 drivers/block/loop/loop_file_fmt_qcow_main.h
>   create mode 100644 drivers/block/loop/loop_file_fmt_raw.c
>   rename drivers/block/{loop.c => loop/loop_main.c} (86%)
>   rename drivers/block/{loop.h => loop/loop_main.h} (92%)
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver
  2019-08-24  9:14   ` Manuel Bentele
@ 2019-08-24 16:04     ` Bart Van Assche
  2019-08-25 12:15       ` Manuel Bentele
  0 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2019-08-24 16:04 UTC (permalink / raw)
  To: Manuel Bentele, development, linux-block, Mike Snitzer, Jens Axboe

On 8/24/19 2:14 AM, Manuel Bentele wrote:
> On 8/24/19 5:37 AM, Bart Van Assche wrote:
>> On 8/23/19 3:56 PM, development@manuel-bentele.de wrote:
>>> During the discussion, it turned out that the implementation as device
>>> mapper target is not applicable. The device mapper stacks different
>>> functionality such as compression or encryption on multiple block device
>>> layers whereas an implementation for the QCOW2 container format provides
>>> these functionalities on one block device layer.
>>
>> Is there a more detailed discussion available of this subject?
 >
> No, the only discussion is the referenced one [1]. But there was a
> similar discussion in the master's thesis of Francesc Zacarias Ribot
> [2]. Unfortunately, I found no attempt on the mailing list that proposes
> his solution.
> 
>> Are you familiar with the dm-crypt driver?
 >
> I don't know the specific implementation details, but I use this driver
> personally and I like it. Do you want to propose that only the storage
> aspect of the QCOW2 container format should be used and all other
> functionality inside the container should be provided by available
> device mapper targets?

(+Mike Snitzer)

Hmm, I haven't found any reference to the device mapper in the document 
written by Francesc. Maybe that means that I overlooked something?

I referred to the dm-crypt driver because I think that's an example that 
shows that QCOW2 file format support could be implemented using the 
device mapper framework.

Mike, do you perhaps want to comment on what the most appropriate way is 
to implement such functionality? The entire patch series is available at 
https://lore.kernel.org/linux-block/86279379-32ac-15e9-2f91-68ce9c94cfbf@manuel-bentele.de/T/#t.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver
  2019-08-24 16:04     ` Bart Van Assche
@ 2019-08-25 12:15       ` Manuel Bentele
  2019-09-09 22:12         ` Manuel Bentele
  0 siblings, 1 reply; 15+ messages in thread
From: Manuel Bentele @ 2019-08-25 12:15 UTC (permalink / raw)
  To: Bart Van Assche, Manuel Bentele, linux-block, Mike Snitzer, Jens Axboe

On 8/24/19 6:04 PM, Bart Van Assche wrote:
> On 8/24/19 2:14 AM, Manuel Bentele wrote:
>> On 8/24/19 5:37 AM, Bart Van Assche wrote:
>>> On 8/23/19 3:56 PM, development@manuel-bentele.de wrote:
>>>> During the discussion, it turned out that the implementation as device
>>>> mapper target is not applicable. The device mapper stacks different
>>>> functionality such as compression or encryption on multiple block
>>>> device
>>>> layers whereas an implementation for the QCOW2 container format
>>>> provides
>>>> these functionalities on one block device layer.
>>>
>>> Is there a more detailed discussion available of this subject?
> >
>> No, the only discussion is the referenced one [1]. But there was a
>> similar discussion in the master's thesis of Francesc Zacarias Ribot
>> [2]. Unfortunately, I found no attempt on the mailing list that proposes
>> his solution.
>>
>>> Are you familiar with the dm-crypt driver?
> >
>> I don't know the specific implementation details, but I use this driver
>> personally and I like it. Do you want to propose that only the storage
>> aspect of the QCOW2 container format should be used and all other
>> functionality inside the container should be provided by available
>> device mapper targets?
>
> (+Mike Snitzer)
>
> Hmm, I haven't found any reference to the device mapper in the
> document written by Francesc. Maybe that means that I overlooked
> something?
Oh sorry, you're right. I meant this in general for the topic 'QCOW2 in
the kernel space'.

> I referred to the dm-crypt driver because I think that's an example
> that shows that QCOW2 file format support could be implemented using
> the device mapper framework.
Okay, now I get it :)

> Mike, do you perhaps want to comment on what the most appropriate way
> is to implement such functionality?

To implement the QCOW2 format or other sparse container formats
correctly, the implementation must be able to ...
  - extend the capacity of the mapped block device
  - shrink the capacity of the mapped block device
  - rescan the paritions of the mapped block device

Are all three functionalities feasible using the device mapper framework?

> The entire patch series is available at
> https://lore.kernel.org/linux-block/86279379-32ac-15e9-2f91-68ce9c94cfbf@manuel-bentele.de/T/#t.

Note that PATCH [1/5] is missing in this series, although I've submitted
it twice. I asked already in [1] for the reason but haven't received any
answer, yet. Therefore, I temporarily insert a link to my repository
showing the missing PATCH [1/5]:
https://github.com/bahnwaerter/linux/commit/7a78da744b4c84809ad6aa20673a2b686bafb201

Regards,
Manuel

[1] https://www.spinics.net/lists/linux-block/msg44255.html


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver
  2019-08-25 12:15       ` Manuel Bentele
@ 2019-09-09 22:12         ` Manuel Bentele
  0 siblings, 0 replies; 15+ messages in thread
From: Manuel Bentele @ 2019-09-09 22:12 UTC (permalink / raw)
  To: Manuel Bentele, Bart Van Assche, Mike Snitzer, Jens Axboe; +Cc: linux-block

On 8/25/19 2:15 PM, Manuel Bentele wrote:
> On 8/24/19 6:04 PM, Bart Van Assche wrote:
>> On 8/24/19 2:14 AM, Manuel Bentele wrote:
>>> On 8/24/19 5:37 AM, Bart Van Assche wrote:
>>>> On 8/23/19 3:56 PM, development@manuel-bentele.de wrote:
>>>>> During the discussion, it turned out that the implementation as device
>>>>> mapper target is not applicable. The device mapper stacks different
>>>>> functionality such as compression or encryption on multiple block
>>>>> device
>>>>> layers whereas an implementation for the QCOW2 container format
>>>>> provides
>>>>> these functionalities on one block device layer.
>>>> Is there a more detailed discussion available of this subject?
>>> No, the only discussion is the referenced one [1]. But there was a
>>> similar discussion in the master's thesis of Francesc Zacarias Ribot
>>> [2]. Unfortunately, I found no attempt on the mailing list that proposes
>>> his solution.
>>>
>>>> Are you familiar with the dm-crypt driver?
>>> I don't know the specific implementation details, but I use this driver
>>> personally and I like it. Do you want to propose that only the storage
>>> aspect of the QCOW2 container format should be used and all other
>>> functionality inside the container should be provided by available
>>> device mapper targets?
>> (+Mike Snitzer)
>>
>> Hmm, I haven't found any reference to the device mapper in the
>> document written by Francesc. Maybe that means that I overlooked
>> something?
> Oh sorry, you're right. I meant this in general for the topic 'QCOW2 in
> the kernel space'.
>
>> I referred to the dm-crypt driver because I think that's an example
>> that shows that QCOW2 file format support could be implemented using
>> the device mapper framework.
> Okay, now I get it :)
>
>> Mike, do you perhaps want to comment on what the most appropriate way
>> is to implement such functionality?
> To implement the QCOW2 format or other sparse container formats
> correctly, the implementation must be able to ...
>   - extend the capacity of the mapped block device
>   - shrink the capacity of the mapped block device
>   - rescan the paritions of the mapped block device
>
> Are all three functionalities feasible using the device mapper framework?
Because there was no answer, I have analyzed the device mapper in more
detail. I found out, that one can get access to the virtual and
"underlying" devices. The virtual device (mapped_device) is created and
managed by the device mapper. The mapped_device can be obtained in the
constructor of a device mapper target by calling dm_table_get_md(). The
function call needs the table of the dm_target as parameter and returns
a pointer to the mapped_device structure. The structure contains
pointers to the gendisk and the block_device of the mapped_device. The
"underlying" devices of the table can be obtained or added by calling
dm_get_device() in the constructor, too. The call returns a pointer to a
dm_dev structure. Then, the dm_dev structure contains a pointer to its
referenced block_device. Now there is direct access to the block_device
or gendisk structures. This means that one can implement the three
functionalities to support sparse container formats and implement my
file format subsystem and file format drivers as device mapper targets.
But one should take care of the direct access to the block_device and
gendisk structures in a device mapper target because sometimes there is
the risk of bypassing the device mapper framework. Please be careful and
read the comments and descriptions of the exported functions in the
device mapper framework.

Compared to the proposed loop device module integration, this approach
seems harder to achieve for me. Furthermore, the device mapper target
needs an additional user space utility to simplify the control of the
file format subsystem and drivers and help people who are afraid of the
dmsetup utility ;)

Would you accept the proposed file format subsystem and drivers
implemented as device mapper targets?

>> The entire patch series is available at
>> https://lore.kernel.org/linux-block/86279379-32ac-15e9-2f91-68ce9c94cfbf@manuel-bentele.de/T/#t.
> Note that PATCH [1/5] is missing in this series, although I've submitted
> it twice. I asked already in [1] for the reason but haven't received any
> answer, yet. Therefore, I temporarily insert a link to my repository
> showing the missing PATCH [1/5]:
> https://github.com/bahnwaerter/linux/commit/7a78da744b4c84809ad6aa20673a2b686bafb201
>
> Regards,
> Manuel
>
> [1] https://www.spinics.net/lists/linux-block/msg44255.html

Regards,
Manuel


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver
  2019-08-23 22:56 [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver development
                   ` (5 preceding siblings ...)
  2019-08-24 11:10 ` Manuel Bentele
@ 2019-09-12  2:24 ` Ming Lei
  2019-09-13 11:57   ` Manuel Bentele
  6 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2019-09-12  2:24 UTC (permalink / raw)
  To: development; +Cc: linux-block

On Sat, Aug 24, 2019 at 12:56:14AM +0200, development@manuel-bentele.de wrote:
> From: Manuel Bentele <development@manuel-bentele.de>
> 
> Hi
> 
> Regarding to the following discussion [1] on the mailing list I show you 
> the result of my work as announced at the end of the discussion [2].
> 
> The discussion was about the project topic of how to implement the 
> reading/writing of QCOW2 in the kernel. The project focuses on an read-only 
> in-kernel QCOW2 implementation to increase the read/write performance 
> and tries to avoid nbd. Furthermore, the project is part of a project 
> series to develop a in-kernel network boot infrastructure that has no need 

I'd suggest you to share more details about this use case first:

1) what is the in-kernel network boot infrastructure? which functions
does it provide for user?

2) how does the in kernel QCOW2 interacts with in-kernel network boot
infrastructure?

3) most important thing, what are the exact steps for one user to use
the in-kernel network boot infrastructure and in-kernel QCOW2?

Without knowing the motivation/purpose and exact use case, it doesn't
make sense to discuss the implementation details, IMO.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver
  2019-09-12  2:24 ` Ming Lei
@ 2019-09-13 11:57   ` Manuel Bentele
  2019-09-16  2:11     ` Ming Lei
  0 siblings, 1 reply; 15+ messages in thread
From: Manuel Bentele @ 2019-09-13 11:57 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block

Hi Ming,

On 9/12/19 4:24 AM, Ming Lei wrote:
> On Sat, Aug 24, 2019 at 12:56:14AM +0200, development@manuel-bentele.de wrote:
>> From: Manuel Bentele <development@manuel-bentele.de>
>>
>> Hi
>>
>> Regarding to the following discussion [1] on the mailing list I show you 
>> the result of my work as announced at the end of the discussion [2].
>>
>> The discussion was about the project topic of how to implement the 
>> reading/writing of QCOW2 in the kernel. The project focuses on an read-only 
>> in-kernel QCOW2 implementation to increase the read/write performance 
>> and tries to avoid nbd. Furthermore, the project is part of a project 
>> series to develop a in-kernel network boot infrastructure that has no need 
> I'd suggest you to share more details about this use case first:
>
> 1) what is the in-kernel network boot infrastructure? which functions
> does it provide for user?

Some time ago, I started to describe the setup a little bit in [1]. Now
I want to extend the description:

The boot infrastructure is used in the university environment and
quarrels with network-related limitations. Step-by-step, the network
hardware is renewed and improved, but there are still many university
branches which are spread all over the city and connected by poor uplink
connections. Sometimes there exist cases where 15 until 20 desktop
computers have to share only 1 gigabit uplink. To accelerate the network
boot, the idea came up to use the QCOW2 file format and its compression
feature for the image content. Tests have shown, that the usage of
compression is already measurable at gigabit uplinks and clearly
noticeable at 100 megabit uplinks.

The network boot infrastructure is based on a classical PXE network boot
to load the Linux kernel and the initramfs. In the initramfs, the
compressed QCOW2 image is fetched via nfs or cifs or something else. The
fetched QCOW2 image is now decompressed and read in the kernel. Compared
to a decompression and read in the user space, like qemu-nbd does, this
approach does not need any user space process, is faster and avoids
switchroot problems.

> 2) how does the in kernel QCOW2 interacts with in-kernel network boot
> infrastructure?

The in-kernel QCOW2 implementation uses the fetched QCOW2 image and
exposes it as block device.

Therefore, my implementation extends the loop device module by a general
file format subsystem to implement various file format drivers including
a driver for the QCOW2 and RAW file format. The configuration utility
losetup is used to set up a loop device and specify the file format
driver to use.

> 3) most important thing, what are the exact steps for one user to use
> the in-kernel network boot infrastructure and in-kernel QCOW2?

To achieve a running system one have to complete the following items:

  * Set up a PXE boot server and configure client computers to boot from
    the network
  * Build a Linux kernel for the network boot with built-in QCOW2
    implementation
  * Prepare the initramfs for the network boot. Use a network file
    system or copy tool to fetch the compressed QCOW2 image.
  * Create a compressed QCOW2 image that contains a complete environment
    for the user to work with after a successful network boot
  * Set up the reading of the fetched QCOW2 image using the in-kernel
    QCOW2 implementation and mount the file systems located in the QCOW2
    image.
  * Perform a switchroot to change into the mounted environment of the
    QCOW2 image.


Thanks for your help.

Regards,
Manuel

[1] https://www.spinics.net/lists/linux-block/msg39565.html


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver
  2019-09-13 11:57   ` Manuel Bentele
@ 2019-09-16  2:11     ` Ming Lei
  2019-09-18 10:26       ` Simon Rettberg
  0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2019-09-16  2:11 UTC (permalink / raw)
  To: Manuel Bentele; +Cc: linux-block

On Fri, Sep 13, 2019 at 01:57:33PM +0200, Manuel Bentele wrote:
> Hi Ming,
> 
> On 9/12/19 4:24 AM, Ming Lei wrote:
> > On Sat, Aug 24, 2019 at 12:56:14AM +0200, development@manuel-bentele.de wrote:
> >> From: Manuel Bentele <development@manuel-bentele.de>
> >>
> >> Hi
> >>
> >> Regarding to the following discussion [1] on the mailing list I show you 
> >> the result of my work as announced at the end of the discussion [2].
> >>
> >> The discussion was about the project topic of how to implement the 
> >> reading/writing of QCOW2 in the kernel. The project focuses on an read-only 
> >> in-kernel QCOW2 implementation to increase the read/write performance 
> >> and tries to avoid nbd. Furthermore, the project is part of a project 
> >> series to develop a in-kernel network boot infrastructure that has no need 
> > I'd suggest you to share more details about this use case first:
> >
> > 1) what is the in-kernel network boot infrastructure? which functions
> > does it provide for user?
> 
> Some time ago, I started to describe the setup a little bit in [1]. Now
> I want to extend the description:
> 
> The boot infrastructure is used in the university environment and
> quarrels with network-related limitations. Step-by-step, the network
> hardware is renewed and improved, but there are still many university
> branches which are spread all over the city and connected by poor uplink
> connections. Sometimes there exist cases where 15 until 20 desktop
> computers have to share only 1 gigabit uplink. To accelerate the network
> boot, the idea came up to use the QCOW2 file format and its compression
> feature for the image content. Tests have shown, that the usage of
> compression is already measurable at gigabit uplinks and clearly
> noticeable at 100 megabit uplinks.

Got it, looks a good use case for compression, but not has to be QCOW2.

> 
> The network boot infrastructure is based on a classical PXE network boot
> to load the Linux kernel and the initramfs. In the initramfs, the
> compressed QCOW2 image is fetched via nfs or cifs or something else. The
> fetched QCOW2 image is now decompressed and read in the kernel. Compared
> to a decompression and read in the user space, like qemu-nbd does, this
> approach does not need any user space process, is faster and avoids
> switchroot problems.

This image can be compressed via xz, and fetched via wget or what
ever. 'xz' could have better compression ratio than qcow2, I guess.

> 
> > 2) how does the in kernel QCOW2 interacts with in-kernel network boot
> > infrastructure?
> 
> The in-kernel QCOW2 implementation uses the fetched QCOW2 image and
> exposes it as block device.
> 
> Therefore, my implementation extends the loop device module by a general
> file format subsystem to implement various file format drivers including
> a driver for the QCOW2 and RAW file format. The configuration utility
> losetup is used to set up a loop device and specify the file format
> driver to use.

You still need to update losetup.  xz-utils can be installed for
decompressing the image, then you still can create loop disk over
the image.

> 
> > 3) most important thing, what are the exact steps for one user to use
> > the in-kernel network boot infrastructure and in-kernel QCOW2?
> 
> To achieve a running system one have to complete the following items:
> 
>   * Set up a PXE boot server and configure client computers to boot from
>     the network
>   * Build a Linux kernel for the network boot with built-in QCOW2
>     implementation
>   * Prepare the initramfs for the network boot. Use a network file
>     system or copy tool to fetch the compressed QCOW2 image.
>   * Create a compressed QCOW2 image that contains a complete environment
>     for the user to work with after a successful network boot
>   * Set up the reading of the fetched QCOW2 image using the in-kernel
>     QCOW2 implementation and mount the file systems located in the QCOW2
>     image.
>   * Perform a switchroot to change into the mounted environment of the
>     QCOW2 image.

As I mentioned above, seems not necessary to introduce loop-qcow2.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver
  2019-09-16  2:11     ` Ming Lei
@ 2019-09-18 10:26       ` Simon Rettberg
  0 siblings, 0 replies; 15+ messages in thread
From: Simon Rettberg @ 2019-09-18 10:26 UTC (permalink / raw)
  To: Ming Lei; +Cc: Manuel Bentele, linux-block

Hi everyone,

chiming in for clearing this up a bit.

> Got it, looks a good use case for compression, but not has to be
> QCOW2.
> 
> > 
> > The network boot infrastructure is based on a classical PXE network
> > boot to load the Linux kernel and the initramfs. In the initramfs,
> > the compressed QCOW2 image is fetched via nfs or cifs or something
> > else. The fetched QCOW2 image is now decompressed and read in the
> > kernel. Compared to a decompression and read in the user space,
> > like qemu-nbd does, this approach does not need any user space
> > process, is faster and avoids switchroot problems.  
> 
> This image can be compressed via xz, and fetched via wget or what
> ever. 'xz' could have better compression ratio than qcow2, I guess.

"Fetch" was probably a bit ambiguous. The image isn't downloaded, but
mounted directly from the network (streamed?), so we can benefit from
the per-cluster compression of qcow2, similar to squashfs but on the
block layer. A typical image is between 3 and 10GB with qcow2
compression, so downloading it entirely on boot to be able to
decompress it is not feasible.

> As I mentioned above, seems not necessary to introduce loop-qcow2.

Yes, there are many ways to achieve this. The basic concept of network
booting the workstations has been practiced here for almost 15 years
now using very different approaches like plain old NFS mounts for the
root filesystem, squashfs containers that get downloaded, or streamed
over network. But since our requirement is a stateless system, we need
a copy-on-write layer on top of this. In the beginnings we did this
with unionfs and then aufs, but as these operate on the file-system
layer they have several drawbacks and relatively high complexity
compared to block-layer CoW. So we switched to a block-based approach
about 4 years ago. For reasons stated before, we wanted to use some
form of compression, as was possible with squashfs before, so after
some experimenting, qcow2 proved to be a good fit. However, adding in
user-space tools like qemu-nbd or xmount added too much of a
performance penalty and initially, also some problems during the
switchroot from initrd to the actual root file system.

So the current process looks as follows: kernel + initrd are
loaded via iPXE. initrd sets up network, mounts NFS share or connects
to server via NBD to access the qcow2 image. Modified losetup sets up
access to qcow2 image, either from NFS share or
directly from /dev/nbd0. Finally, mount /dev/loop0pXX and switch to new
root.

Manuel's implementation has so far proven to be very reliable and
brought noticeable performance improvements compared to having a user
space process doing the qcow2 handling.

So we would have really liked the idea of having his changes
upstreamed, I think he did a very good job by designing a plugin
infrastructure for the loop device and making the qcow2 plugin a
separate module. We knew about the concerns of adding code for handling
a file format in the kernel and were hoping that maybe an acceptable
compromise would be to have his changes added to the kernel minus the
actual qcow2 plugin, so it is mostly a refactoring of the old loop
device that's not adding too much complexity (hopefully). But if we're
really such an oddball use-case here that this won't possibly be of any
interest to anybody else we will just have to go forward maintaining
this out of tree entirely.

Thanks for your time,
Simon

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-09-18 11:25 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-23 22:56 [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver development
2019-08-23 22:56 ` [PATCH 2/5] doc: admin-guide: add loop block device documentation development
2019-08-23 22:56 ` [PATCH 3/5] doc: driver-api: add loop file format subsystem API documentation development
2019-08-23 22:56 ` [PATCH 4/5] block: loop: add QCOW2 loop file format driver (read-only) development
2019-08-23 22:56 ` [PATCH 5/5] doc: admin-guide: add QCOW2 file format to loop device documentation development
2019-08-24  3:37 ` [PATCH 0/5] block: loop: add file format subsystem and QCOW2 file format driver Bart Van Assche
2019-08-24  9:14   ` Manuel Bentele
2019-08-24 16:04     ` Bart Van Assche
2019-08-25 12:15       ` Manuel Bentele
2019-09-09 22:12         ` Manuel Bentele
2019-08-24 11:10 ` Manuel Bentele
2019-09-12  2:24 ` Ming Lei
2019-09-13 11:57   ` Manuel Bentele
2019-09-16  2:11     ` Ming Lei
2019-09-18 10:26       ` Simon Rettberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).