All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/4] block: Warn about usage of growing formats over non-growable protocols
@ 2014-07-11 22:23 Max Reitz
  2014-07-11 22:23 ` [Qemu-devel] [PATCH 1/4] block: Correct bs->growable Max Reitz
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Max Reitz @ 2014-07-11 22:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz

Some image formats (e.g. qcow2) require the underlying file to grow on
write accesses, but this is in fact not supported by all protocols (e.g.
nbd does not). If such a format requiring file growth is used
non-read-only over a protocol which does not support this, a warning
should be issued.

This warning is issued for example whenever one tries to export a qcow2
image over nbd-server and use the export from qemu.

We could make this an error, but I decided not to in order to retain
"compatibility" (although if this warning is emitted, the user should
have already complained about I/O errors on write accesses, so it's
questionable what behavior this should be compatible to).

While at it, make BDS.growable actually indicate what it is (as far as I
understood) supposed to: Whether the BDS supports writes beyond the end
of the disk size (which then automatically increase that disk size).


Max Reitz (4):
  block: Correct bs->growable
  block: Introduce requires_growing_file
  iotests: Make some qemu-io commands read-only
  iotests: Skip read and write in 040 for length=0

 block.c                    | 11 +++++++++++
 block/blkdebug.c           |  2 ++
 block/blkverify.c          |  2 ++
 block/cow.c                |  1 +
 block/iscsi.c              |  2 ++
 block/nbd.c                |  2 ++
 block/qcow.c               |  1 +
 block/qcow2.c              |  2 ++
 block/qed.c                |  1 +
 block/raw_bsd.c            |  1 +
 block/vdi.c                |  2 ++
 block/vhdx.c               |  2 ++
 block/vmdk.c               |  1 +
 block/vpc.c                |  2 ++
 include/block/block_int.h  |  4 ++++
 tests/qemu-iotests/040     | 18 +++++++++++-------
 tests/qemu-iotests/072     |  9 ++++++++-
 tests/qemu-iotests/072.out |  1 +
 tests/qemu-iotests/089     |  2 +-
 19 files changed, 57 insertions(+), 9 deletions(-)

-- 
2.0.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 1/4] block: Correct bs->growable
  2014-07-11 22:23 [Qemu-devel] [PATCH 0/4] block: Warn about usage of growing formats over non-growable protocols Max Reitz
@ 2014-07-11 22:23 ` Max Reitz
  2014-08-20 11:40   ` Kevin Wolf
  2014-07-11 22:23 ` [Qemu-devel] [PATCH 2/4] block: Introduce requires_growing_file Max Reitz
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Max Reitz @ 2014-07-11 22:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz

Currently, the field "growable" in a BDS is set iff the BDS is opened in
protocol mode (with O_BDRV_PROTOCOL). However, not every protocol block
driver allows growing: NBD, for instance, does not. On the other hand,
a non-protocol block driver may allow growing: The raw driver does.

Fix this by correcting the "growable" field in the driver-specific open
function for the BDS, if necessary.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/blkdebug.c  | 2 ++
 block/blkverify.c | 2 ++
 block/iscsi.c     | 2 ++
 block/nbd.c       | 2 ++
 block/raw_bsd.c   | 1 +
 5 files changed, 9 insertions(+)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index f51407d..c0e5927 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -418,6 +418,8 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
         goto out;
     }
 
+    bs->growable = bs->file->growable;
+
     /* Set request alignment */
     align = qemu_opt_get_size(opts, "align", bs->request_alignment);
     if (align > 0 && align < INT_MAX && !(align & (align - 1))) {
diff --git a/block/blkverify.c b/block/blkverify.c
index 621b785..fe90ada 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -154,6 +154,8 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
+    bs->growable = bs->file->growable && s->test_file->growable;
+
     ret = 0;
 fail:
     return ret;
diff --git a/block/iscsi.c b/block/iscsi.c
index f3e83e2..91626c7 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1246,6 +1246,8 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
     const char *filename;
     int i, ret;
 
+    bs->growable = false;
+
     if ((BDRV_SECTOR_SIZE % 512) != 0) {
         error_setg(errp, "iSCSI: Invalid BDRV_SECTOR_SIZE. "
                    "BDRV_SECTOR_SIZE(%lld) is not a multiple "
diff --git a/block/nbd.c b/block/nbd.c
index 4eda095..bd977f4 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -253,6 +253,8 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
     int result, sock;
     Error *local_err = NULL;
 
+    bs->growable = false;
+
     /* Pop the config into our state object. Exit if invalid. */
     nbd_config(s, options, &export, &local_err);
     if (local_err) {
diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index 492f58d..34c091e 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -159,6 +159,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
     bs->sg = bs->file->sg;
+    bs->growable = bs->file->growable;
     return 0;
 }
 
-- 
2.0.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 2/4] block: Introduce requires_growing_file
  2014-07-11 22:23 [Qemu-devel] [PATCH 0/4] block: Warn about usage of growing formats over non-growable protocols Max Reitz
  2014-07-11 22:23 ` [Qemu-devel] [PATCH 1/4] block: Correct bs->growable Max Reitz
@ 2014-07-11 22:23 ` Max Reitz
  2014-07-11 22:23 ` [Qemu-devel] [PATCH 3/4] iotests: Make some qemu-io commands read-only Max Reitz
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Max Reitz @ 2014-07-11 22:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz

There are several block drivers which may require the underlying file to
grow on write accesses because new clusters need to be allocated. If
such a format should be used non-read-only over a protocol which does
not allow file growth, emit an appropriate warning.

This is relevant e.g. for qcow2 over NBD, if the user tried to export
the raw image instead of using the qcow2 block driver on the host (e.g.
when using nbd-server).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block.c                   | 11 +++++++++++
 block/cow.c               |  1 +
 block/qcow.c              |  1 +
 block/qcow2.c             |  2 ++
 block/qed.c               |  1 +
 block/vdi.c               |  2 ++
 block/vhdx.c              |  2 ++
 block/vmdk.c              |  1 +
 block/vpc.c               |  2 ++
 include/block/block_int.h |  4 ++++
 10 files changed, 27 insertions(+)

diff --git a/block.c b/block.c
index cb95e08..ba0115d 100644
--- a/block.c
+++ b/block.c
@@ -1474,6 +1474,17 @@ int bdrv_open(BlockDriverState **pbs, const char *filename,
         goto fail;
     }
 
+    if (drv->requires_growing_file && file && !file->growable &&
+        (flags & BDRV_O_RDWR))
+    {
+        error_report("Writes to the image '%s' in format '%s' may require the "
+                     "file to grow which is not supported for this file; "
+                     "please open it read-only or try to export it in a "
+                     "different format over the protocol (e.g. use qemu-nbd "
+                     "instead of nbd-server)",
+                     filename ?: "", drv->format_name);
+    }
+
     if (file && (bs->file != file)) {
         bdrv_unref(file);
         file = NULL;
diff --git a/block/cow.c b/block/cow.c
index 6ee4833..b944074 100644
--- a/block/cow.c
+++ b/block/cow.c
@@ -416,6 +416,7 @@ static BlockDriver bdrv_cow = {
     .bdrv_create    = cow_create,
     .bdrv_has_zero_init     = bdrv_has_zero_init_1,
     .supports_backing       = true,
+    .requires_growing_file  = true,
 
     .bdrv_read              = cow_co_read,
     .bdrv_write             = cow_co_write,
diff --git a/block/qcow.c b/block/qcow.c
index a874056..2dba733 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -942,6 +942,7 @@ static BlockDriver bdrv_qcow = {
     .bdrv_create            = qcow_create,
     .bdrv_has_zero_init     = bdrv_has_zero_init_1,
     .supports_backing       = true,
+    .requires_growing_file  = true,
 
     .bdrv_co_readv          = qcow_co_readv,
     .bdrv_co_writev         = qcow_co_writev,
diff --git a/block/qcow2.c b/block/qcow2.c
index 5b52f97..2d2b851 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2429,6 +2429,8 @@ static BlockDriver bdrv_qcow2 = {
     .supports_backing           = true,
     .bdrv_change_backing_file   = qcow2_change_backing_file,
 
+    .requires_growing_file      = true,
+
     .bdrv_refresh_limits        = qcow2_refresh_limits,
     .bdrv_invalidate_cache      = qcow2_invalidate_cache,
 
diff --git a/block/qed.c b/block/qed.c
index cd4872b..06170b7 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1675,6 +1675,7 @@ static BlockDriver bdrv_qed = {
     .instance_size            = sizeof(BDRVQEDState),
     .create_opts              = &qed_create_opts,
     .supports_backing         = true,
+    .requires_growing_file    = true,
 
     .bdrv_probe               = bdrv_qed_probe,
     .bdrv_rebind              = bdrv_qed_rebind,
diff --git a/block/vdi.c b/block/vdi.c
index 197bd77..4b9ae21 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -873,6 +873,8 @@ static BlockDriver bdrv_vdi = {
 
     .create_opts = &vdi_create_opts,
     .bdrv_check = vdi_check,
+
+    .requires_growing_file = true,
 };
 
 static void bdrv_vdi_init(void)
diff --git a/block/vhdx.c b/block/vhdx.c
index fedcf9f..2ceb5a4 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1927,6 +1927,8 @@ static BlockDriver bdrv_vhdx = {
     .bdrv_check             = vhdx_check,
 
     .create_opts            = &vhdx_create_opts,
+
+    .requires_growing_file  = true,
 };
 
 static void bdrv_vhdx_init(void)
diff --git a/block/vmdk.c b/block/vmdk.c
index f674127..78054cf 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -2180,6 +2180,7 @@ static BlockDriver bdrv_vmdk = {
     .bdrv_attach_aio_context      = vmdk_attach_aio_context,
 
     .supports_backing             = true,
+    .requires_growing_file        = true,
     .create_opts                  = &vmdk_create_opts,
 };
 
diff --git a/block/vpc.c b/block/vpc.c
index 8b376a4..8fc434e 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -934,6 +934,8 @@ static BlockDriver bdrv_vpc = {
 
     .create_opts            = &vpc_create_opts,
     .bdrv_has_zero_init     = vpc_has_zero_init,
+
+    .requires_growing_file  = true,
 };
 
 static void bdrv_vpc_init(void)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index b420b37..adfd89c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -104,6 +104,10 @@ struct BlockDriver {
     /* Set if a driver can support backing files */
     bool supports_backing;
 
+    /* Set if a driver may need its underlying files to be able to grow for
+     * write accesses */
+    bool requires_growing_file;
+
     /* For handling image reopen for split or non-split files */
     int (*bdrv_reopen_prepare)(BDRVReopenState *reopen_state,
                                BlockReopenQueue *queue, Error **errp);
-- 
2.0.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 3/4] iotests: Make some qemu-io commands read-only
  2014-07-11 22:23 [Qemu-devel] [PATCH 0/4] block: Warn about usage of growing formats over non-growable protocols Max Reitz
  2014-07-11 22:23 ` [Qemu-devel] [PATCH 1/4] block: Correct bs->growable Max Reitz
  2014-07-11 22:23 ` [Qemu-devel] [PATCH 2/4] block: Introduce requires_growing_file Max Reitz
@ 2014-07-11 22:23 ` Max Reitz
  2014-07-11 22:23 ` [Qemu-devel] [PATCH 4/4] iotests: Skip read and write in 040 for length=0 Max Reitz
  2014-08-15 15:23 ` [Qemu-devel] [PATCH 0/4] block: Warn about usage of growing formats over non-growable protocols Max Reitz
  4 siblings, 0 replies; 14+ messages in thread
From: Max Reitz @ 2014-07-11 22:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz

Test 072 (and 089 which copied some parts from it) tests image format
nesting. When opening the inner image, qemu now correctly emits a
warning because you should not attempt to write to the inner image of
such a constellation. An example would be:

1. Create a qcow2 image with some relatively large virtual size.
2. Interpret this qcow2 image as a raw image and convert it to a qcow2
   image.
3. This yields a qcow2 image whose virtual size is just enough to hold
   the original image's physical size; trying to grow the original (now
   inner) image would fail because the outer image cannot grow (without
   explicitly changing its size at least).

Therefore, the warnings are correct; but the tests do not attempt to
write to the inner image, they only read from it. Therefore, open it
read-only to get rid of the messages.

While at it, consciously add a check to test 072 that the warning is
actually emitted if the inner image is not opened read-only.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/072     | 9 ++++++++-
 tests/qemu-iotests/072.out | 1 +
 tests/qemu-iotests/089     | 2 +-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/072 b/tests/qemu-iotests/072
index 58faa8b..1d39b16 100755
--- a/tests/qemu-iotests/072
+++ b/tests/qemu-iotests/072
@@ -55,7 +55,7 @@ $QEMU_IO -c 'write -P 42 0 512' -c 'write -P 23 512 512' \
 
 $QEMU_IMG convert -f raw -O $IMGFMT "$TEST_IMG.base" "$TEST_IMG"
 
-$QEMU_IO -c "open -o driver=$IMGFMT,file.driver=$IMGFMT,file.file.filename=$TEST_IMG" \
+$QEMU_IO -c "open -r -o driver=$IMGFMT,file.driver=$IMGFMT,file.file.filename=$TEST_IMG" \
          -c 'read -P 42 0 512' -c 'read -P 23 512 512' \
          -c 'read -P 66 1024 512' | _filter_qemu_io
 
@@ -63,6 +63,13 @@ $QEMU_IO -c "open -o driver=$IMGFMT,file.driver=$IMGFMT,file.file.filename=$TEST
 # should not work for any image formats with a header.
 $QEMU_IO -c 'read -P 42 0 512' "$TEST_IMG" | _filter_qemu_io
 
+# This should emit a warning: Writes to the inner image might require it to grow
+# which it cannot because the outer image has a fixed virtual size. Therefore,
+# opening it non-read-only may result in I/O errors and this is why there should
+# be a warning.
+$QEMU_IO -c "open -o driver=$IMGFMT,file.driver=$IMGFMT,file.file.filename=$TEST_IMG" \
+    2>&1 | _filter_imgfmt
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/072.out b/tests/qemu-iotests/072.out
index efe577c..11d6d81 100644
--- a/tests/qemu-iotests/072.out
+++ b/tests/qemu-iotests/072.out
@@ -18,4 +18,5 @@ read 512/512 bytes at offset 1024
 Pattern verification failed at offset 0, 512 bytes
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Writes to the image '' in format 'IMGFMT' may require the file to grow which is not supported for this file; please open it read-only or try to export it in a different format over the protocol (e.g. use qemu-nbd instead of nbd-server)
 *** done
diff --git a/tests/qemu-iotests/089 b/tests/qemu-iotests/089
index dffc977..2f53735 100755
--- a/tests/qemu-iotests/089
+++ b/tests/qemu-iotests/089
@@ -66,7 +66,7 @@ $QEMU_IO -c 'write -P 42 0 512' -c 'write -P 23 512 512' \
 $QEMU_IMG convert -f raw -O $IMGFMT "$TEST_IMG.base" "$TEST_IMG"
 
 $QEMU_IO -c 'read -P 42 0 512' -c 'read -P 23 512 512' \
-         -c 'read -P 66 1024 512' "json:{
+         -c 'read -P 66 1024 512' -r "json:{
     \"driver\": \"$IMGFMT\",
     \"file\": {
         \"driver\": \"$IMGFMT\",
-- 
2.0.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 4/4] iotests: Skip read and write in 040 for length=0
  2014-07-11 22:23 [Qemu-devel] [PATCH 0/4] block: Warn about usage of growing formats over non-growable protocols Max Reitz
                   ` (2 preceding siblings ...)
  2014-07-11 22:23 ` [Qemu-devel] [PATCH 3/4] iotests: Make some qemu-io commands read-only Max Reitz
@ 2014-07-11 22:23 ` Max Reitz
  2014-08-15 15:23 ` [Qemu-devel] [PATCH 0/4] block: Warn about usage of growing formats over non-growable protocols Max Reitz
  4 siblings, 0 replies; 14+ messages in thread
From: Max Reitz @ 2014-07-11 22:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz

Do not prevent to actually do a meaningful comparison after the commit
in case the backing image has a length of zero: The reason qemu-io does
not return "verification failed" in these tests is simply that it
returns "I/O error" instead. Therefore, it's better to skip these
comparisons instead of pretending they'd be useful.

Furthermore, do not write anything to the backing file in case its
length should be zero: As of "block: Correct bs->growable", raw images
are growable which means the backing file will no longer have a length
of zero after something has been written to it. Thus the write operation
should be skipped in order to actually test a commit operation on a
zero-length backing file.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/040 | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index f1e16c1..4accd3e 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -78,7 +78,8 @@ class TestSingleDrive(ImageCommitTestCase):
         iotests.create_image(backing_img, self.image_len)
         qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, mid_img)
         qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % mid_img, test_img)
-        qemu_io('-c', 'write -P 0xab 0 524288', backing_img)
+        if self.image_len > 0:
+            qemu_io('-c', 'write -P 0xab 0 524288', backing_img)
         qemu_io('-c', 'write -P 0xef 524288 524288', mid_img)
         self.vm = iotests.VM().add_drive(test_img)
         self.vm.launch()
@@ -91,8 +92,9 @@ class TestSingleDrive(ImageCommitTestCase):
 
     def test_commit(self):
         self.run_commit_test(mid_img, backing_img)
-        self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
-        self.assertEqual(-1, qemu_io('-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
+        if self.image_len > 0:
+            self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
+            self.assertEqual(-1, qemu_io('-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
 
     def test_device_not_found(self):
         result = self.vm.qmp('block-commit', device='nonexistent', top='%s' % mid_img)
@@ -118,13 +120,15 @@ class TestSingleDrive(ImageCommitTestCase):
 
     def test_top_is_active(self):
         self.run_commit_test(test_img, backing_img, need_ready=True)
-        self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
-        self.assertEqual(-1, qemu_io('-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
+        if self.image_len > 0:
+            self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
+            self.assertEqual(-1, qemu_io('-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
 
     def test_top_is_default_active(self):
         self.run_default_commit_test()
-        self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
-        self.assertEqual(-1, qemu_io('-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
+        if self.image_len > 0:
+            self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
+            self.assertEqual(-1, qemu_io('-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
 
     def test_top_and_base_reversed(self):
         self.assert_no_active_block_jobs()
-- 
2.0.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] block: Warn about usage of growing formats over non-growable protocols
  2014-07-11 22:23 [Qemu-devel] [PATCH 0/4] block: Warn about usage of growing formats over non-growable protocols Max Reitz
                   ` (3 preceding siblings ...)
  2014-07-11 22:23 ` [Qemu-devel] [PATCH 4/4] iotests: Skip read and write in 040 for length=0 Max Reitz
@ 2014-08-15 15:23 ` Max Reitz
  4 siblings, 0 replies; 14+ messages in thread
From: Max Reitz @ 2014-08-15 15:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

On 12.07.2014 00:23, Max Reitz wrote:
> Some image formats (e.g. qcow2) require the underlying file to grow on
> write accesses, but this is in fact not supported by all protocols (e.g.
> nbd does not). If such a format requiring file growth is used
> non-read-only over a protocol which does not support this, a warning
> should be issued.
>
> This warning is issued for example whenever one tries to export a qcow2
> image over nbd-server and use the export from qemu.
>
> We could make this an error, but I decided not to in order to retain
> "compatibility" (although if this warning is emitted, the user should
> have already complained about I/O errors on write accesses, so it's
> questionable what behavior this should be compatible to).
>
> While at it, make BDS.growable actually indicate what it is (as far as I
> understood) supposed to: Whether the BDS supports writes beyond the end
> of the disk size (which then automatically increase that disk size).
>
>
> Max Reitz (4):
>    block: Correct bs->growable
>    block: Introduce requires_growing_file
>    iotests: Make some qemu-io commands read-only
>    iotests: Skip read and write in 040 for length=0
>
>   block.c                    | 11 +++++++++++
>   block/blkdebug.c           |  2 ++
>   block/blkverify.c          |  2 ++
>   block/cow.c                |  1 +
>   block/iscsi.c              |  2 ++
>   block/nbd.c                |  2 ++
>   block/qcow.c               |  1 +
>   block/qcow2.c              |  2 ++
>   block/qed.c                |  1 +
>   block/raw_bsd.c            |  1 +
>   block/vdi.c                |  2 ++
>   block/vhdx.c               |  2 ++
>   block/vmdk.c               |  1 +
>   block/vpc.c                |  2 ++
>   include/block/block_int.h  |  4 ++++
>   tests/qemu-iotests/040     | 18 +++++++++++-------
>   tests/qemu-iotests/072     |  9 ++++++++-
>   tests/qemu-iotests/072.out |  1 +
>   tests/qemu-iotests/089     |  2 +-
>   19 files changed, 57 insertions(+), 9 deletions(-)

Ping

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] block: Correct bs->growable
  2014-07-11 22:23 ` [Qemu-devel] [PATCH 1/4] block: Correct bs->growable Max Reitz
@ 2014-08-20 11:40   ` Kevin Wolf
  2014-08-20 19:13     ` Max Reitz
  2014-09-04 20:01     ` Max Reitz
  0 siblings, 2 replies; 14+ messages in thread
From: Kevin Wolf @ 2014-08-20 11:40 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, armbru

Am 12.07.2014 um 00:23 hat Max Reitz geschrieben:
> Currently, the field "growable" in a BDS is set iff the BDS is opened in
> protocol mode (with O_BDRV_PROTOCOL). However, not every protocol block
> driver allows growing: NBD, for instance, does not. On the other hand,
> a non-protocol block driver may allow growing: The raw driver does.
> 
> Fix this by correcting the "growable" field in the driver-specific open
> function for the BDS, if necessary.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>

I'm not sure I agree with bs->growable = true for raw. It's certainly
true that the backend can technically provide the functionality that
writes beyond EOF grow the file. That's not the point of bs->growable,
though.

The point of it was to _forbid_ it to grow even when it's technically
possible (non-file protocols weren't really a thing back then, apart
from vvfat, so the assumption was that it's always technically
possible). growable was introduced with bdrv_check_request(), which is
supposed to reject guest requests after the end of the virtual disk (and
this fixed a CVE, see commit 71d0770c). You're now disabling this check
for raw.

I think we need to make sure that bs->growable is only set if it is
opened for an image that has drv->requires_growing_file set and
therefore not directly used by a guest.

Well, except that with node-name a guest will be able to use any image
in the chain... Might this mean that it's really a BlockBackend
property?

Kevin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] block: Correct bs->growable
  2014-08-20 11:40   ` Kevin Wolf
@ 2014-08-20 19:13     ` Max Reitz
  2014-08-21  8:19       ` Kevin Wolf
  2014-09-04 20:01     ` Max Reitz
  1 sibling, 1 reply; 14+ messages in thread
From: Max Reitz @ 2014-08-20 19:13 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi, armbru

On 20.08.2014 13:40, Kevin Wolf wrote:
> Am 12.07.2014 um 00:23 hat Max Reitz geschrieben:
>> Currently, the field "growable" in a BDS is set iff the BDS is opened in
>> protocol mode (with O_BDRV_PROTOCOL). However, not every protocol block
>> driver allows growing: NBD, for instance, does not. On the other hand,
>> a non-protocol block driver may allow growing: The raw driver does.
>>
>> Fix this by correcting the "growable" field in the driver-specific open
>> function for the BDS, if necessary.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> I'm not sure I agree with bs->growable = true for raw. It's certainly
> true that the backend can technically provide the functionality that
> writes beyond EOF grow the file. That's not the point of bs->growable,
> though.
>
> The point of it was to _forbid_ it to grow even when it's technically
> possible (non-file protocols weren't really a thing back then, apart
> from vvfat, so the assumption was that it's always technically
> possible). growable was introduced with bdrv_check_request(), which is
> supposed to reject guest requests after the end of the virtual disk (and
> this fixed a CVE, see commit 71d0770c). You're now disabling this check
> for raw.
>
> I think we need to make sure that bs->growable is only set if it is
> opened for an image that has drv->requires_growing_file set and
> therefore not directly used by a guest.
>
> Well, except that with node-name a guest will be able to use any image
> in the chain... Might this mean that it's really a BlockBackend
> property?

I guess I can make things easy for me by just introducing some 
"really_growable" or "writes_beyond_eof" field or something for the sake 
of this series. ;-)

Max

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] block: Correct bs->growable
  2014-08-20 19:13     ` Max Reitz
@ 2014-08-21  8:19       ` Kevin Wolf
  2014-08-22 13:26         ` Max Reitz
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2014-08-21  8:19 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, armbru

Am 20.08.2014 um 21:13 hat Max Reitz geschrieben:
> On 20.08.2014 13:40, Kevin Wolf wrote:
> >Am 12.07.2014 um 00:23 hat Max Reitz geschrieben:
> >>Currently, the field "growable" in a BDS is set iff the BDS is opened in
> >>protocol mode (with O_BDRV_PROTOCOL). However, not every protocol block
> >>driver allows growing: NBD, for instance, does not. On the other hand,
> >>a non-protocol block driver may allow growing: The raw driver does.
> >>
> >>Fix this by correcting the "growable" field in the driver-specific open
> >>function for the BDS, if necessary.
> >>
> >>Signed-off-by: Max Reitz <mreitz@redhat.com>
> >I'm not sure I agree with bs->growable = true for raw. It's certainly
> >true that the backend can technically provide the functionality that
> >writes beyond EOF grow the file. That's not the point of bs->growable,
> >though.
> >
> >The point of it was to _forbid_ it to grow even when it's technically
> >possible (non-file protocols weren't really a thing back then, apart
> >from vvfat, so the assumption was that it's always technically
> >possible). growable was introduced with bdrv_check_request(), which is
> >supposed to reject guest requests after the end of the virtual disk (and
> >this fixed a CVE, see commit 71d0770c). You're now disabling this check
> >for raw.
> >
> >I think we need to make sure that bs->growable is only set if it is
> >opened for an image that has drv->requires_growing_file set and
> >therefore not directly used by a guest.
> >
> >Well, except that with node-name a guest will be able to use any image
> >in the chain... Might this mean that it's really a BlockBackend
> >property?
> 
> I guess I can make things easy for me by just introducing some
> "really_growable" or "writes_beyond_eof" field or something for the
> sake of this series. ;-)

Nah, don't evade the real solution... Using BDRV_O_PROTOCOL like we
currently do isn't quite right either. If you clear growable when
requires_growing_file isn't set for the parent, you should be fine. I
think. Or hope.

Kevin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] block: Correct bs->growable
  2014-08-21  8:19       ` Kevin Wolf
@ 2014-08-22 13:26         ` Max Reitz
  0 siblings, 0 replies; 14+ messages in thread
From: Max Reitz @ 2014-08-22 13:26 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi, armbru

On 21.08.2014 10:19, Kevin Wolf wrote:
> Am 20.08.2014 um 21:13 hat Max Reitz geschrieben:
>> On 20.08.2014 13:40, Kevin Wolf wrote:
>>> Am 12.07.2014 um 00:23 hat Max Reitz geschrieben:
>>>> Currently, the field "growable" in a BDS is set iff the BDS is opened in
>>>> protocol mode (with O_BDRV_PROTOCOL). However, not every protocol block
>>>> driver allows growing: NBD, for instance, does not. On the other hand,
>>>> a non-protocol block driver may allow growing: The raw driver does.
>>>>
>>>> Fix this by correcting the "growable" field in the driver-specific open
>>>> function for the BDS, if necessary.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> I'm not sure I agree with bs->growable = true for raw. It's certainly
>>> true that the backend can technically provide the functionality that
>>> writes beyond EOF grow the file. That's not the point of bs->growable,
>>> though.
>>>
>>> The point of it was to _forbid_ it to grow even when it's technically
>>> possible (non-file protocols weren't really a thing back then, apart
>> >from vvfat, so the assumption was that it's always technically
>>> possible). growable was introduced with bdrv_check_request(), which is
>>> supposed to reject guest requests after the end of the virtual disk (and
>>> this fixed a CVE, see commit 71d0770c). You're now disabling this check
>>> for raw.
>>>
>>> I think we need to make sure that bs->growable is only set if it is
>>> opened for an image that has drv->requires_growing_file set and
>>> therefore not directly used by a guest.
>>>
>>> Well, except that with node-name a guest will be able to use any image
>>> in the chain... Might this mean that it's really a BlockBackend
>>> property?
>> I guess I can make things easy for me by just introducing some
>> "really_growable" or "writes_beyond_eof" field or something for the
>> sake of this series. ;-)
> Nah, don't evade the real solution... Using BDRV_O_PROTOCOL like we
> currently do isn't quite right either. If you clear growable when
> requires_growing_file isn't set for the parent, you should be fine. I
> think. Or hope.

But then using qcow2 over raw over file will throw a warning. *g*

...Okay, you're right, I can go with growable and just don't set it for raw.

Max

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] block: Correct bs->growable
  2014-08-20 11:40   ` Kevin Wolf
  2014-08-20 19:13     ` Max Reitz
@ 2014-09-04 20:01     ` Max Reitz
  2014-09-05 10:01       ` Kevin Wolf
  1 sibling, 1 reply; 14+ messages in thread
From: Max Reitz @ 2014-09-04 20:01 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi, armbru

On 20.08.2014 13:40, Kevin Wolf wrote:
> Am 12.07.2014 um 00:23 hat Max Reitz geschrieben:
>> Currently, the field "growable" in a BDS is set iff the BDS is opened in
>> protocol mode (with O_BDRV_PROTOCOL). However, not every protocol block
>> driver allows growing: NBD, for instance, does not. On the other hand,
>> a non-protocol block driver may allow growing: The raw driver does.
>>
>> Fix this by correcting the "growable" field in the driver-specific open
>> function for the BDS, if necessary.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> I'm not sure I agree with bs->growable = true for raw. It's certainly
> true that the backend can technically provide the functionality that
> writes beyond EOF grow the file. That's not the point of bs->growable,
> though.
>
> The point of it was to _forbid_ it to grow even when it's technically
> possible (non-file protocols weren't really a thing back then, apart
> from vvfat, so the assumption was that it's always technically
> possible). growable was introduced with bdrv_check_request(), which is
> supposed to reject guest requests after the end of the virtual disk (and
> this fixed a CVE, see commit 71d0770c). You're now disabling this check
> for raw.
>
> I think we need to make sure that bs->growable is only set if it is
> opened for an image that has drv->requires_growing_file set and
> therefore not directly used by a guest.
>
> Well, except that with node-name a guest will be able to use any image
> in the chain... Might this mean that it's really a BlockBackend
> property?

Okay, the more I sit at this problem, the harder it seems to get. The 
only thing I currently know for sure is that I disagree with Anthony's 
opinion in 71d0770c ("this patch makes the BlockDriver API guarantee 
that all requests are within 0..bdrv_getlength() which to me seems like 
a Good Thing").

The initial point was to range check guest requests. In 2009, it may 
have been enough to statically check the BDS type (protocol or format) 
to know whether the guest directly accesses it (format) or not 
(protocol). However, this is no longer sufficient. Now, as far as I 
know, guests can access basically any BDS (as you yourself say). 
Therefore, it seems to me like it's impossible to determine whether the 
device should be marked growable or not when opening it. Instead, I 
think we have to check for each single request whether it comes from the 
guest or from within the block layer and do range checking only for the 
former case; but this should not be the task of the block layer core, 
but of the block devices instead. Theoretically, guests may write beyond 
the image end and grow it that way all they want, but in practice this 
should be limited by the block devices which have a fixed length.

Under this impression, I wanted to simply return to growable = false for 
raw. However, this breaks test 071 which attaches blkdebug to a raw BDS 
after qemu has been started. blkdebug detects raw is not growable, 
therefore reports not to be growable as well, and because qcow2 is on 
top of all that, the warning introduced by this series is emitted. Okay, 
so we will need growable = true for raw in some cases.

It's not trivial to determine whether the superordinate BDS of a certain 
BDS has BlockDriver.requires_growing_file set or not. We could introduce 
a new flag for bdrv_open(), but I'd rather avoid it. In fact, I tried 
something like this, but I quickly got into problems because e.g. 
blkdebug does not have requires_growing_file_set, but decides whether 
its own BDS are growable based on whether the underlying file is 
growable or not. So if a blkdebug BDS is growable, the underlying file 
actually needs to be growable as well. Therefore, we'd need a more 
sophisticated requires_growing_file_set and maybe even propagation of 
growable requirement through the BDS layers which quickly turns ugly 
when one has to consider that a BDS may be used by multiple users.

Also, it's actually irrelevant whether requires_growing_file is set or 
not. growable's current sole purpose apparently is enabling range checks 
for guest-accessible images. If the BDS is only a subordinate to another 
BDS, it doesn't matter whether that other BDS needs growable set or not.

So I scrapped that. Instead, we can just test whether BDRV_O_PROTOCOL is 
given or not. If it is, the BDS is used from within the block layer; if 
it isn't, it probably isn't, and even if it is, the user just has to 
cope with activated range checks. That's at least the idea.

But this doesn't work either: You can create a protocol BDS and then 
pass it to the guest; on second thought, however, this is already 
possible, so I wouldn't bother about this. But on the other hand, this 
breaks 071 as well because 071 creates a (non-guest accessible, but it 
could be) non-protocol BDS and then tries to put blkdebug on top of 
that. I do know that this is not a general use case but it should work 
anyway.

So, in my honest and very humble opinion, I'd reevaluate the usefulness 
of BDS.growable regarding it's original purpose and instead change it to 
be what this patch does.

Anthony argumented that the block layer could very well do the range 
checks. It can, but it cannot trivially know (at least in its current 
state) whether a certain request comes from the guest or not. In 2009, 
this may have been known when the BDS was created; but this is 
absolutely not true today.

On the other hand, the block devices very well know that any request 
coming to them has to fit in a certain range. They should do that range 
checking, not the block layer. I understand the intent of having a 
redundant fail-safe system, but it simply doesn't work anymore. We 
cannot sometimes expected raw BDS to grow (when in the middle of a BDS 
stack) and sometimes not (when directly used by a guest). On the other 
hand, the guest can simply be given a protocol BDS and all the range 
checks are disabled.

Putting BDS.growable into BlockBackend may (and probably) will fix this. 
But I really don't like doing the check in the block layer when it's 
really the block devices who are responsible for it, even if it's just a 
backup check.

The worst thing is, I can't even introduce a new field like 
"writes_beyond_eof" to BDS to circumvent all of this. Again, take the 
blkdebug-on-existing-raw example. With a separate field, the block layer 
will not complain about opening that constellation. But if you actually 
try to write something to the qcow2 BDS which would make the image file 
grow, the range checks breaks everything because it only cares about 
growable. So in the end, the block layer should actually have complained 
about the constellation. But on the other hand, it shouldn't have, 
because the constellation should work. This really is the heart of the 
problem: The raw BDS might be exposed to the guest, so when the guest 
accesses it, range checks should be done. But if it is used through 
qcow2+blkdebug, range checks should be disabled. Using BlockBackend will 
fix this, but we don't have that yet.


tl;dr, I see only two ways to go on: Either I wait until BlockBackend 
exists; or I simply leave this patch as it is because it's actually the 
block device driver's fault if an out-of-range request comes in from the 
guest. Since I remember talking about the former a year ago personally 
with you and Markus, I fear it'll still take a considerable amount of 
time. Therefore, I'm strongly in favor of the latter. If the block 
device drivers do their job, it won't break anything. If they don't, 
they should be fixed (and at least it'll be only raw that's broken).


Max

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] block: Correct bs->growable
  2014-09-04 20:01     ` Max Reitz
@ 2014-09-05 10:01       ` Kevin Wolf
  2014-09-05 12:46         ` Max Reitz
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2014-09-05 10:01 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, armbru

Am 04.09.2014 um 22:01 hat Max Reitz geschrieben:
> On 20.08.2014 13:40, Kevin Wolf wrote:
> >Am 12.07.2014 um 00:23 hat Max Reitz geschrieben:
> >>Currently, the field "growable" in a BDS is set iff the BDS is opened in
> >>protocol mode (with O_BDRV_PROTOCOL). However, not every protocol block
> >>driver allows growing: NBD, for instance, does not. On the other hand,
> >>a non-protocol block driver may allow growing: The raw driver does.
> >>
> >>Fix this by correcting the "growable" field in the driver-specific open
> >>function for the BDS, if necessary.
> >>
> >>Signed-off-by: Max Reitz <mreitz@redhat.com>
> >I'm not sure I agree with bs->growable = true for raw. It's certainly
> >true that the backend can technically provide the functionality that
> >writes beyond EOF grow the file. That's not the point of bs->growable,
> >though.
> >
> >The point of it was to _forbid_ it to grow even when it's technically
> >possible (non-file protocols weren't really a thing back then, apart
> >from vvfat, so the assumption was that it's always technically
> >possible). growable was introduced with bdrv_check_request(), which is
> >supposed to reject guest requests after the end of the virtual disk (and
> >this fixed a CVE, see commit 71d0770c). You're now disabling this check
> >for raw.
> >
> >I think we need to make sure that bs->growable is only set if it is
> >opened for an image that has drv->requires_growing_file set and
> >therefore not directly used by a guest.
> >
> >Well, except that with node-name a guest will be able to use any image
> >in the chain... Might this mean that it's really a BlockBackend
> >property?
> 
> Okay, the more I sit at this problem, the harder it seems to get.
> The only thing I currently know for sure is that I disagree with
> Anthony's opinion in 71d0770c ("this patch makes the BlockDriver API
> guarantee that all requests are within 0..bdrv_getlength() which to
> me seems like a Good Thing").
> 
> The initial point was to range check guest requests. In 2009, it may
> have been enough to statically check the BDS type (protocol or
> format) to know whether the guest directly accesses it (format) or
> not (protocol). However, this is no longer sufficient. Now, as far
> as I know, guests can access basically any BDS (as you yourself
> say). Therefore, it seems to me like it's impossible to determine
> whether the device should be marked growable or not when opening it.
> Instead, I think we have to check for each single request whether it
> comes from the guest or from within the block layer and do range
> checking only for the former case; but this should not be the task
> of the block layer core, but of the block devices instead.
> Theoretically, guests may write beyond the image end and grow it
> that way all they want, but in practice this should be limited by
> the block devices which have a fixed length.

What about block jobs, built-in NBD server, etc.?

Also, we have many device models that I don't trust a bit and having one
central check is much easier to get right than n duplicates in each
device emulation.

> Under this impression, I wanted to simply return to growable = false
> for raw. However, this breaks test 071 which attaches blkdebug to a
> raw BDS after qemu has been started. blkdebug detects raw is not
> growable, therefore reports not to be growable as well, and because
> qcow2 is on top of all that, the warning introduced by this series
> is emitted. Okay, so we will need growable = true for raw in some
> cases.

I don't want to make the shortcut more tempting, but if we come to the
conclusion that a fixed growable=false for raw is the right thing to do,
then we simply need to fix the test case.

> It's not trivial to determine whether the superordinate BDS of a
> certain BDS has BlockDriver.requires_growing_file set or not. We
> could introduce a new flag for bdrv_open(), but I'd rather avoid it.
> In fact, I tried something like this, but I quickly got into
> problems because e.g. blkdebug does not have
> requires_growing_file_set, but decides whether its own BDS are
> growable based on whether the underlying file is growable or not. So
> if a blkdebug BDS is growable, the underlying file actually needs to
> be growable as well. Therefore, we'd need a more sophisticated
> requires_growing_file_set and maybe even propagation of growable
> requirement through the BDS layers which quickly turns ugly when one
> has to consider that a BDS may be used by multiple users.

Ugh. You're right, it's not a static per-driver property, but really a
per-BDS one. That's somewhat unfortunate.

> Also, it's actually irrelevant whether requires_growing_file is set
> or not. growable's current sole purpose apparently is enabling range
> checks for guest-accessible images. If the BDS is only a subordinate
> to another BDS, it doesn't matter whether that other BDS needs
> growable set or not.

Hm... So in blockdev talk, what you're saying is that range checks are a
BlockBackend thing, and if we're on BDS level, we don't care? Perhaps
that's the right approach and gets us rid of bs->growable immediately.

> So I scrapped that. Instead, we can just test whether
> BDRV_O_PROTOCOL is given or not. If it is, the BDS is used from
> within the block layer; if it isn't, it probably isn't, and even if
> it is, the user just has to cope with activated range checks. That's
> at least the idea.

Making any difference based on BDRV_O_PROTOCOL is a dead end in
blockdev-add times, where users are fully expected to build up a BDS
graph node by node with separate blockdev-add invocations.

> But this doesn't work either: You can create a protocol BDS and then
> pass it to the guest; on second thought, however, this is already
> possible, so I wouldn't bother about this. But on the other hand,
> this breaks 071 as well because 071 creates a (non-guest accessible,
> but it could be) non-protocol BDS and then tries to put blkdebug on
> top of that. I do know that this is not a general use case but it
> should work anyway.
> 
> So, in my honest and very humble opinion, I'd reevaluate the
> usefulness of BDS.growable regarding it's original purpose and
> instead change it to be what this patch does.

What use case is left if you don't use it for range checks any more?
Shouldn't we remove rather than change it?

Do your patches still need a per-BDS flag, or would a static per-driver
flag be enough?

Also, should qcow2 over host_device print a warning? It's a legitimate
setup that is frequently used, but a host_device isn't really growable
on its own. We rely on management taking care of it.

> Anthony argumented that the block layer could very well do the range
> checks. It can, but it cannot trivially know (at least in its
> current state) whether a certain request comes from the guest or
> not. In 2009, this may have been known when the BDS was created; but
> this is absolutely not true today.
> 
> On the other hand, the block devices very well know that any request
> coming to them has to fit in a certain range. They should do that
> range checking, not the block layer. I understand the intent of
> having a redundant fail-safe system, but it simply doesn't work
> anymore. We cannot sometimes expected raw BDS to grow (when in the
> middle of a BDS stack) and sometimes not (when directly used by a
> guest). On the other hand, the guest can simply be given a protocol
> BDS and all the range checks are disabled.
> 
> Putting BDS.growable into BlockBackend may (and probably) will fix
> this. But I really don't like doing the check in the block layer
> when it's really the block devices who are responsible for it, even
> if it's just a backup check.

I see, you've come to the same conclusion regarding BlockBackend.

And no, I don't agree that it belongs in the device emulation code.
There are more users of block devices than just those.

> The worst thing is, I can't even introduce a new field like
> "writes_beyond_eof" to BDS to circumvent all of this. Again, take
> the blkdebug-on-existing-raw example. With a separate field, the
> block layer will not complain about opening that constellation. But
> if you actually try to write something to the qcow2 BDS which would
> make the image file grow, the range checks breaks everything because
> it only cares about growable. So in the end, the block layer should
> actually have complained about the constellation. But on the other
> hand, it shouldn't have, because the constellation should work. This
> really is the heart of the problem: The raw BDS might be exposed to
> the guest, so when the guest accesses it, range checks should be
> done. But if it is used through qcow2+blkdebug, range checks should
> be disabled. Using BlockBackend will fix this, but we don't have
> that yet.
> 
> 
> tl;dr, I see only two ways to go on: Either I wait until
> BlockBackend exists; or I simply leave this patch as it is because
> it's actually the block device driver's fault if an out-of-range
> request comes in from the guest. Since I remember talking about the
> former a year ago personally with you and Markus, I fear it'll still
> take a considerable amount of time. Therefore, I'm strongly in favor
> of the latter. If the block device drivers do their job, it won't
> break anything. If they don't, they should be fixed (and at least
> it'll be only raw that's broken).

Wait until BlockBackend exists. Markus seems to have switched from "do
everything sometime" to "do something now" and is working on it, so I
hope we'll be there soon (before KVM Forum) with the initial version.
That should be enough for your requirement.

Not sure if it's helpful in this context, but reading all of this
reminded me of something I suggested before: BDSes must be able to treat
requests differently depending on where they come from. We called it
something like BDSes exposing different "views" or "ports". One of the
use cases was qcow2 exposing read-only views of inactive snapshots,
another one was group throttling, and what you just wrote sounded in
parts like it could make use of the same.

Kevin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] block: Correct bs->growable
  2014-09-05 10:01       ` Kevin Wolf
@ 2014-09-05 12:46         ` Max Reitz
  2014-09-05 13:13           ` Kevin Wolf
  0 siblings, 1 reply; 14+ messages in thread
From: Max Reitz @ 2014-09-05 12:46 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi, armbru

On 05.09.2014 12:01, Kevin Wolf wrote:
> Am 04.09.2014 um 22:01 hat Max Reitz geschrieben:
>> On 20.08.2014 13:40, Kevin Wolf wrote:
>>> Am 12.07.2014 um 00:23 hat Max Reitz geschrieben:
>>>> Currently, the field "growable" in a BDS is set iff the BDS is opened in
>>>> protocol mode (with O_BDRV_PROTOCOL). However, not every protocol block
>>>> driver allows growing: NBD, for instance, does not. On the other hand,
>>>> a non-protocol block driver may allow growing: The raw driver does.
>>>>
>>>> Fix this by correcting the "growable" field in the driver-specific open
>>>> function for the BDS, if necessary.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> I'm not sure I agree with bs->growable = true for raw. It's certainly
>>> true that the backend can technically provide the functionality that
>>> writes beyond EOF grow the file. That's not the point of bs->growable,
>>> though.
>>>
>>> The point of it was to _forbid_ it to grow even when it's technically
>>> possible (non-file protocols weren't really a thing back then, apart
>> >from vvfat, so the assumption was that it's always technically
>>> possible). growable was introduced with bdrv_check_request(), which is
>>> supposed to reject guest requests after the end of the virtual disk (and
>>> this fixed a CVE, see commit 71d0770c). You're now disabling this check
>>> for raw.
>>>
>>> I think we need to make sure that bs->growable is only set if it is
>>> opened for an image that has drv->requires_growing_file set and
>>> therefore not directly used by a guest.
>>>
>>> Well, except that with node-name a guest will be able to use any image
>>> in the chain... Might this mean that it's really a BlockBackend
>>> property?
>> Okay, the more I sit at this problem, the harder it seems to get.
>> The only thing I currently know for sure is that I disagree with
>> Anthony's opinion in 71d0770c ("this patch makes the BlockDriver API
>> guarantee that all requests are within 0..bdrv_getlength() which to
>> me seems like a Good Thing").
>>
>> The initial point was to range check guest requests. In 2009, it may
>> have been enough to statically check the BDS type (protocol or
>> format) to know whether the guest directly accesses it (format) or
>> not (protocol). However, this is no longer sufficient. Now, as far
>> as I know, guests can access basically any BDS (as you yourself
>> say). Therefore, it seems to me like it's impossible to determine
>> whether the device should be marked growable or not when opening it.
>> Instead, I think we have to check for each single request whether it
>> comes from the guest or from within the block layer and do range
>> checking only for the former case; but this should not be the task
>> of the block layer core, but of the block devices instead.
>> Theoretically, guests may write beyond the image end and grow it
>> that way all they want, but in practice this should be limited by
>> the block devices which have a fixed length.
> What about block jobs, built-in NBD server, etc.?

Well, I honestly think they should do the check themselves as well.

> Also, we have many device models that I don't trust a bit and having one
> central check is much easier to get right than n duplicates in each
> device emulation.

That's why I (later) said that I do understand the concept of having a 
backup check.

I still think the block device drivers have to detect out-of-bounds 
accesses to be able to act differently than when just seeing an EIO from 
the block layer. The same applies to the NBD server; and then getting 
the range check into block jobs shouldn't hurt too much either.

I don't think all those guest interfaces (block jobs are not really a 
guest interface, imho, so I'd even keep them out of this) can just rely 
on the block layer to do the check for them.

>> Under this impression, I wanted to simply return to growable = false
>> for raw. However, this breaks test 071 which attaches blkdebug to a
>> raw BDS after qemu has been started. blkdebug detects raw is not
>> growable, therefore reports not to be growable as well, and because
>> qcow2 is on top of all that, the warning introduced by this series
>> is emitted. Okay, so we will need growable = true for raw in some
>> cases.
> I don't want to make the shortcut more tempting, but if we come to the
> conclusion that a fixed growable=false for raw is the right thing to do,
> then we simply need to fix the test case.

As I wrote later: I do know that this is not a general use case but it 
should work anyway.

You create some block device which naturally has raw on top; then you 
put another block format on top of that. It's probably crazy and 
useless, but by no means wrong. When accessing the raw BDS from the 
guest, it should not be growable. When accessing it through the format 
BDS on top of it, it should be.

>> It's not trivial to determine whether the superordinate BDS of a
>> certain BDS has BlockDriver.requires_growing_file set or not. We
>> could introduce a new flag for bdrv_open(), but I'd rather avoid it.
>> In fact, I tried something like this, but I quickly got into
>> problems because e.g. blkdebug does not have
>> requires_growing_file_set, but decides whether its own BDS are
>> growable based on whether the underlying file is growable or not. So
>> if a blkdebug BDS is growable, the underlying file actually needs to
>> be growable as well. Therefore, we'd need a more sophisticated
>> requires_growing_file_set and maybe even propagation of growable
>> requirement through the BDS layers which quickly turns ugly when one
>> has to consider that a BDS may be used by multiple users.
> Ugh. You're right, it's not a static per-driver property, but really a
> per-BDS one. That's somewhat unfortunate.
>
>> Also, it's actually irrelevant whether requires_growing_file is set
>> or not. growable's current sole purpose apparently is enabling range
>> checks for guest-accessible images. If the BDS is only a subordinate
>> to another BDS, it doesn't matter whether that other BDS needs
>> growable set or not.
> Hm... So in blockdev talk, what you're saying is that range checks are a
> BlockBackend thing, and if we're on BDS level, we don't care? Perhaps
> that's the right approach and gets us rid of bs->growable immediately.

Exactly.

>> So I scrapped that. Instead, we can just test whether
>> BDRV_O_PROTOCOL is given or not. If it is, the BDS is used from
>> within the block layer; if it isn't, it probably isn't, and even if
>> it is, the user just has to cope with activated range checks. That's
>> at least the idea.
> Making any difference based on BDRV_O_PROTOCOL is a dead end in
> blockdev-add times, where users are fully expected to build up a BDS
> graph node by node with separate blockdev-add invocations.

Right, but that just means the protection is already broken.

>> But this doesn't work either: You can create a protocol BDS and then
>> pass it to the guest; on second thought, however, this is already
>> possible, so I wouldn't bother about this. But on the other hand,
>> this breaks 071 as well because 071 creates a (non-guest accessible,
>> but it could be) non-protocol BDS and then tries to put blkdebug on
>> top of that. I do know that this is not a general use case but it
>> should work anyway.
>>
>> So, in my honest and very humble opinion, I'd reevaluate the
>> usefulness of BDS.growable regarding it's original purpose and
>> instead change it to be what this patch does.
> What use case is left if you don't use it for range checks any more?
> Shouldn't we remove rather than change it?
>
> Do your patches still need a per-BDS flag, or would a static per-driver
> flag be enough?

Well, that's not so easy. See blkdebug (I know blkdebug is a rather bad 
example, because noone really uses it except for the iotests, but it's 
there) and raw (which is another bad example, because it's what we're 
talking about here and don't really know how to handle it): A blkdebug 
or raw BDS is growable iff the underlying file BDS is growable. We could 
signal this through a special value in the BDS, but enter blkverify 
(just as bad as blkdebug, I know): It's growable iff both the file BDS 
and the tested BDS are growable; a similar thing applies to quorum 
(which I, for whatever reason, did not touch during this series; 
probably I should do that in v2).

> Also, should qcow2 over host_device print a warning? It's a legitimate
> setup that is frequently used, but a host_device isn't really growable
> on its own. We rely on management taking care of it.

I most certainly think it should warn. But I'll keep that in mind and 
reformulate the message (strip the "please fix your configuration" and 
just replace it by something like "write operations may fail").

>> Anthony argumented that the block layer could very well do the range
>> checks. It can, but it cannot trivially know (at least in its
>> current state) whether a certain request comes from the guest or
>> not. In 2009, this may have been known when the BDS was created; but
>> this is absolutely not true today.
>>
>> On the other hand, the block devices very well know that any request
>> coming to them has to fit in a certain range. They should do that
>> range checking, not the block layer. I understand the intent of
>> having a redundant fail-safe system, but it simply doesn't work
>> anymore. We cannot sometimes expected raw BDS to grow (when in the
>> middle of a BDS stack) and sometimes not (when directly used by a
>> guest). On the other hand, the guest can simply be given a protocol
>> BDS and all the range checks are disabled.
>>
>> Putting BDS.growable into BlockBackend may (and probably) will fix
>> this. But I really don't like doing the check in the block layer
>> when it's really the block devices who are responsible for it, even
>> if it's just a backup check.
> I see, you've come to the same conclusion regarding BlockBackend.
>
> And no, I don't agree that it belongs in the device emulation code.
> There are more users of block devices than just those.

I still think it does belong there. But I agree that we should have a 
central backup check in the block layer, if possible and feasible.

>> The worst thing is, I can't even introduce a new field like
>> "writes_beyond_eof" to BDS to circumvent all of this. Again, take
>> the blkdebug-on-existing-raw example. With a separate field, the
>> block layer will not complain about opening that constellation. But
>> if you actually try to write something to the qcow2 BDS which would
>> make the image file grow, the range checks breaks everything because
>> it only cares about growable. So in the end, the block layer should
>> actually have complained about the constellation. But on the other
>> hand, it shouldn't have, because the constellation should work. This
>> really is the heart of the problem: The raw BDS might be exposed to
>> the guest, so when the guest accesses it, range checks should be
>> done. But if it is used through qcow2+blkdebug, range checks should
>> be disabled. Using BlockBackend will fix this, but we don't have
>> that yet.
>>
>>
>> tl;dr, I see only two ways to go on: Either I wait until
>> BlockBackend exists; or I simply leave this patch as it is because
>> it's actually the block device driver's fault if an out-of-range
>> request comes in from the guest. Since I remember talking about the
>> former a year ago personally with you and Markus, I fear it'll still
>> take a considerable amount of time. Therefore, I'm strongly in favor
>> of the latter. If the block device drivers do their job, it won't
>> break anything. If they don't, they should be fixed (and at least
>> it'll be only raw that's broken).
> Wait until BlockBackend exists. Markus seems to have switched from "do
> everything sometime" to "do something now" and is working on it, so I
> hope we'll be there soon (before KVM Forum) with the initial version.
> That should be enough for your requirement.

Good.

> Not sure if it's helpful in this context, but reading all of this
> reminded me of something I suggested before: BDSes must be able to treat
> requests differently depending on where they come from. We called it
> something like BDSes exposing different "views" or "ports". One of the
> use cases was qcow2 exposing read-only views of inactive snapshots,
> another one was group throttling, and what you just wrote sounded in
> parts like it could make use of the same.

Yay, sounds like shaders in the block layer to me. :-)

Max

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] block: Correct bs->growable
  2014-09-05 12:46         ` Max Reitz
@ 2014-09-05 13:13           ` Kevin Wolf
  0 siblings, 0 replies; 14+ messages in thread
From: Kevin Wolf @ 2014-09-05 13:13 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi, armbru

Am 05.09.2014 um 14:46 hat Max Reitz geschrieben:
> On 05.09.2014 12:01, Kevin Wolf wrote:
> >Am 04.09.2014 um 22:01 hat Max Reitz geschrieben:
> >>On 20.08.2014 13:40, Kevin Wolf wrote:
> >>>Am 12.07.2014 um 00:23 hat Max Reitz geschrieben:
> >>>>Currently, the field "growable" in a BDS is set iff the BDS is opened in
> >>>>protocol mode (with O_BDRV_PROTOCOL). However, not every protocol block
> >>>>driver allows growing: NBD, for instance, does not. On the other hand,
> >>>>a non-protocol block driver may allow growing: The raw driver does.
> >>>>
> >>>>Fix this by correcting the "growable" field in the driver-specific open
> >>>>function for the BDS, if necessary.
> >>>>
> >>>>Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>I'm not sure I agree with bs->growable = true for raw. It's certainly
> >>>true that the backend can technically provide the functionality that
> >>>writes beyond EOF grow the file. That's not the point of bs->growable,
> >>>though.
> >>>
> >>>The point of it was to _forbid_ it to grow even when it's technically
> >>>possible (non-file protocols weren't really a thing back then, apart
> >>>from vvfat, so the assumption was that it's always technically
> >>>possible). growable was introduced with bdrv_check_request(), which is
> >>>supposed to reject guest requests after the end of the virtual disk (and
> >>>this fixed a CVE, see commit 71d0770c). You're now disabling this check
> >>>for raw.
> >>>
> >>>I think we need to make sure that bs->growable is only set if it is
> >>>opened for an image that has drv->requires_growing_file set and
> >>>therefore not directly used by a guest.
> >>>
> >>>Well, except that with node-name a guest will be able to use any image
> >>>in the chain... Might this mean that it's really a BlockBackend
> >>>property?
> >>Okay, the more I sit at this problem, the harder it seems to get.
> >>The only thing I currently know for sure is that I disagree with
> >>Anthony's opinion in 71d0770c ("this patch makes the BlockDriver API
> >>guarantee that all requests are within 0..bdrv_getlength() which to
> >>me seems like a Good Thing").
> >>
> >>The initial point was to range check guest requests. In 2009, it may
> >>have been enough to statically check the BDS type (protocol or
> >>format) to know whether the guest directly accesses it (format) or
> >>not (protocol). However, this is no longer sufficient. Now, as far
> >>as I know, guests can access basically any BDS (as you yourself
> >>say). Therefore, it seems to me like it's impossible to determine
> >>whether the device should be marked growable or not when opening it.
> >>Instead, I think we have to check for each single request whether it
> >>comes from the guest or from within the block layer and do range
> >>checking only for the former case; but this should not be the task
> >>of the block layer core, but of the block devices instead.
> >>Theoretically, guests may write beyond the image end and grow it
> >>that way all they want, but in practice this should be limited by
> >>the block devices which have a fixed length.
> >What about block jobs, built-in NBD server, etc.?
> 
> Well, I honestly think they should do the check themselves as well.
> 
> >Also, we have many device models that I don't trust a bit and having one
> >central check is much easier to get right than n duplicates in each
> >device emulation.
> 
> That's why I (later) said that I do understand the concept of having
> a backup check.

Okay, then we're probably in violent agreement.

> I still think the block device drivers have to detect out-of-bounds
> accesses to be able to act differently than when just seeing an EIO
> from the block layer. The same applies to the NBD server; and then
> getting the range check into block jobs shouldn't hurt too much
> either.

True for devices, because they shouldn't apply rerror/werror to
invalid requests, but always report them to the guest. I'm not so sure
about the NBD server, I don't think it behaves differently for this kind
of errors.

> I don't think all those guest interfaces (block jobs are not really
> a guest interface, imho, so I'd even keep them out of this) can just
> rely on the block layer to do the check for them.

Block jobs and NBD servers are in the same category, I think. They are
somewhat more harmless than guest requests, but they are still outside
the core block layer and shouldn't be trusted.

> >>So I scrapped that. Instead, we can just test whether
> >>BDRV_O_PROTOCOL is given or not. If it is, the BDS is used from
> >>within the block layer; if it isn't, it probably isn't, and even if
> >>it is, the user just has to cope with activated range checks. That's
> >>at least the idea.
> >Making any difference based on BDRV_O_PROTOCOL is a dead end in
> >blockdev-add times, where users are fully expected to build up a BDS
> >graph node by node with separate blockdev-add invocations.
> 
> Right, but that just means the protection is already broken.

Is it really? I see how you can get BDSes _without_ BDRV_O_PROTOCOL
where you would have it when created automatically, so you're imposing
additional restriction (probably to the point of making the resulting
BDS useless), but can you actually bypass the protection?

> >>But this doesn't work either: You can create a protocol BDS and then
> >>pass it to the guest; on second thought, however, this is already
> >>possible, so I wouldn't bother about this. But on the other hand,
> >>this breaks 071 as well because 071 creates a (non-guest accessible,
> >>but it could be) non-protocol BDS and then tries to put blkdebug on
> >>top of that. I do know that this is not a general use case but it
> >>should work anyway.
> >>
> >>So, in my honest and very humble opinion, I'd reevaluate the
> >>usefulness of BDS.growable regarding it's original purpose and
> >>instead change it to be what this patch does.
> >What use case is left if you don't use it for range checks any more?
> >Shouldn't we remove rather than change it?
> >
> >Do your patches still need a per-BDS flag, or would a static per-driver
> >flag be enough?
> 
> Well, that's not so easy. See blkdebug (I know blkdebug is a rather
> bad example, because noone really uses it except for the iotests,
> but it's there) and raw (which is another bad example, because it's
> what we're talking about here and don't really know how to handle
> it): A blkdebug or raw BDS is growable iff the underlying file BDS
> is growable. We could signal this through a special value in the
> BDS, but enter blkverify (just as bad as blkdebug, I know): It's
> growable iff both the file BDS and the tested BDS are growable; a
> similar thing applies to quorum (which I, for whatever reason, did
> not touch during this series; probably I should do that in v2).

In short: Filters don't work this way. You're right.

Still begs the question if a callback doesn't make more sense than a BDS
field.

> >Also, should qcow2 over host_device print a warning? It's a legitimate
> >setup that is frequently used, but a host_device isn't really growable
> >on its own. We rely on management taking care of it.
> 
> I most certainly think it should warn. But I'll keep that in mind
> and reformulate the message (strip the "please fix your
> configuration" and just replace it by something like "write
> operations may fail").

Sounds reasonable.

> >>tl;dr, I see only two ways to go on: Either I wait until
> >>BlockBackend exists; or I simply leave this patch as it is because
> >>it's actually the block device driver's fault if an out-of-range
> >>request comes in from the guest. Since I remember talking about the
> >>former a year ago personally with you and Markus, I fear it'll still
> >>take a considerable amount of time. Therefore, I'm strongly in favor
> >>of the latter. If the block device drivers do their job, it won't
> >>break anything. If they don't, they should be fixed (and at least
> >>it'll be only raw that's broken).
> >Wait until BlockBackend exists. Markus seems to have switched from "do
> >everything sometime" to "do something now" and is working on it, so I
> >hope we'll be there soon (before KVM Forum) with the initial version.
> >That should be enough for your requirement.
> 
> Good.

I hope he agrees. ;-)

Kevin

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-09-05 13:14 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-11 22:23 [Qemu-devel] [PATCH 0/4] block: Warn about usage of growing formats over non-growable protocols Max Reitz
2014-07-11 22:23 ` [Qemu-devel] [PATCH 1/4] block: Correct bs->growable Max Reitz
2014-08-20 11:40   ` Kevin Wolf
2014-08-20 19:13     ` Max Reitz
2014-08-21  8:19       ` Kevin Wolf
2014-08-22 13:26         ` Max Reitz
2014-09-04 20:01     ` Max Reitz
2014-09-05 10:01       ` Kevin Wolf
2014-09-05 12:46         ` Max Reitz
2014-09-05 13:13           ` Kevin Wolf
2014-07-11 22:23 ` [Qemu-devel] [PATCH 2/4] block: Introduce requires_growing_file Max Reitz
2014-07-11 22:23 ` [Qemu-devel] [PATCH 3/4] iotests: Make some qemu-io commands read-only Max Reitz
2014-07-11 22:23 ` [Qemu-devel] [PATCH 4/4] iotests: Skip read and write in 040 for length=0 Max Reitz
2014-08-15 15:23 ` [Qemu-devel] [PATCH 0/4] block: Warn about usage of growing formats over non-growable protocols Max Reitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.