qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v2 0/2] Optimize alignment probing
@ 2019-08-25 22:03 Nir Soffer
  2019-08-25 22:03 ` [Qemu-devel] [PATCH v2 1/2] block: posix: Always allocate the first block Nir Soffer
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Nir Soffer @ 2019-08-25 22:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Nir Soffer, qemu-block, Max Reitz

When probing an unallocated area on an XFS filesystem, we cannot detect the
request alignment, so we fall back to a safe value that may not be optimal.
Avoid this fallback by always allocating the first block when creating a new
image or resizing an empty image.

I tested v1 only with the -raw format, and missed some changes in qcow2
tests that create raw images during the test. This time I tested with both
-raw and -qcow2.

Changes in v2:
- Support file descriptors opened with O_DIRECT (e.g. in block_resize) (Max)
- Remove unneeded change in 160 (Max)
- Fix block filter in 175 on filesystems allocating extra blocks (Max)
- Comment why we ignore errors in allocate_first_block() (Max)
- Comment why allocate_first_block() is needed in FALLOC mode (Max)
- Clarify commit message about user visible changes (Maxim)
- Fix 178.out.qcow2
- Fix 150.out with -qcow2 by splitting it into 150.out.{raw,qcow2}
- Add test for allocate_first_block() with block_resize (Max)
- Drop provisioning test results since I ran them only once

v1 was here:
https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00821.html

Nir Soffer (2):
  block: posix: Always allocate the first block
  iotests: Test allocate_first_block() with O_DIRECT

 block/file-posix.c                            | 43 ++++++++++++++++++
 tests/qemu-iotests/{150.out => 150.out.qcow2} |  0
 tests/qemu-iotests/150.out.raw                | 12 +++++
 tests/qemu-iotests/175                        | 44 ++++++++++++++++---
 tests/qemu-iotests/175.out                    | 16 +++++--
 tests/qemu-iotests/178.out.qcow2              |  4 +-
 tests/qemu-iotests/221.out                    | 12 +++--
 tests/qemu-iotests/253.out                    | 12 +++--
 8 files changed, 123 insertions(+), 20 deletions(-)
 rename tests/qemu-iotests/{150.out => 150.out.qcow2} (100%)
 create mode 100644 tests/qemu-iotests/150.out.raw

-- 
2.20.1




* [Qemu-devel] [PATCH v2 1/2] block: posix: Always allocate the first block
  2019-08-25 22:03 [Qemu-devel] [PATCH v2 0/2] Optimize alignment probing Nir Soffer
@ 2019-08-25 22:03 ` Nir Soffer
  2019-08-26 12:31   ` Max Reitz
  2019-08-26 13:46   ` Eric Blake
  2019-08-25 22:03 ` [Qemu-devel] [PATCH v2 2/2] iotests: Test allocate_first_block() with O_DIRECT Nir Soffer
  2019-08-25 22:19 ` [Qemu-devel] [PATCH v2 0/2] Optimize alignment probing no-reply
  2 siblings, 2 replies; 13+ messages in thread
From: Nir Soffer @ 2019-08-25 22:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Nir Soffer, qemu-block, Max Reitz

When creating an image with preallocation "off" or "falloc", the first
block of the image is typically not allocated. When using Gluster
storage backed by an XFS filesystem, reading this block using direct I/O
succeeds regardless of the request length, fooling alignment detection.

In this case we fall back to a safe value (4096) instead of the optimal
value (512), which may lead to unneeded data copying when aligning
requests. Allocating the first block avoids the fallback.
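
To illustrate why an unallocated block defeats probing, here is a minimal
C sketch of a probe loop of this kind. It is a simulation, not QEMU's
actual probing code: the candidate sizes {1, 512, 1024, 2048, 4096}, the
read_ok callback and the names probe_alignment() and simulated_read() are
assumptions. If even a 1-byte direct read succeeds, the probe has learned
nothing, so it returns the safe max_align (4096).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: true if a direct I/O read of `len` bytes at
 * offset 0 would succeed on the storage being probed. */
typedef bool (*read_ok_fn)(size_t len, void *ctx);

/*
 * Sketch of alignment probing: return the smallest candidate request size
 * that succeeds. If a 1-byte read succeeds, the probe learned nothing, so
 * fall back to the safe value max_align. Returns 0 if nothing succeeded.
 */
static size_t probe_alignment(read_ok_fn read_ok, void *ctx, size_t max_align)
{
    static const size_t candidates[] = {1, 512, 1024, 2048, 4096};
    size_t i;

    assert(read_ok != NULL);
    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        size_t align = candidates[i];
        if (read_ok(align, ctx)) {
            return align != 1 ? align : max_align;
        }
    }
    return 0;
}

/* Hypothetical simulated storage: a read succeeds only if len is a
 * multiple of *(size_t *)ctx. A value of 1 models an unallocated XFS
 * block, where any length "works". */
static bool simulated_read(size_t len, void *ctx)
{
    size_t sector = *(size_t *)ctx;
    return len % sector == 0;
}
```

Modelling an unallocated block (every length succeeds) yields the 4096
fallback, while a block that only accepts multiples of 512, as an
allocated block on 512-byte-sector storage would, yields 512.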

Since we allocate the first block even with preallocation=off, we no
longer create images with zero disk size:

    $ ./qemu-img create -f raw test.raw 1g
    Formatting 'test.raw', fmt=raw size=1073741824

    $ ls -lhs test.raw
    4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw

And converting the image requires an additional cluster:

    $ ./qemu-img measure -f raw -O qcow2 test.raw
    required size: 458752
    fully allocated size: 1074135040
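
The additional cost is 458752 - 393216 = 65536 bytes, which is consistent
with the 4 KiB of newly allocated data being rounded up to one whole qcow2
cluster, assuming the default 64 KiB cluster size. A small sketch of that
rounding; data_clusters_bytes() is a hypothetical helper, not part of
qemu-img:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes of qcow2 data clusters needed to hold
 * `allocated` bytes of guest data, rounded up to whole clusters. */
static uint64_t data_clusters_bytes(uint64_t allocated, uint64_t cluster_size)
{
    assert(cluster_size > 0);
    return (allocated + cluster_size - 1) / cluster_size * cluster_size;
}
```

With 64 KiB clusters, an empty raw image needs no data clusters, while a
raw image with its first 4 KiB allocated needs exactly one.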

I did a quick performance test, copying disks with qemu-img convert to a
new raw target image on Gluster storage with a sector size of 512 bytes:

    for i in $(seq 10); do
        rm -f dst.raw
        sleep 10
        time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
    done

Here is a table comparing the total time spent:

Type    Before(s)   After(s)    Diff(%)
---------------------------------------
real      530.028    469.123      -11.4
user       17.204     10.768      -37.4
sys        17.881      7.011      -60.7

We can see a very clear improvement in CPU usage.
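
For reference, Diff(%) is (After - Before) / Before * 100. The column
appears to truncate toward zero rather than round (real is -11.49 and sys
is -60.79 before truncation). A small sketch reproducing the column; the
function names are illustrative only:

```c
#include <assert.h>

/* Percentage change from `before` to `after`, as in the Diff(%) column. */
static double pct_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Percentage in tenths, truncated toward zero (a C cast to long
 * truncates), e.g. -11.49 -> -114, i.e. -11.4. */
static long pct_tenths_truncated(double pct)
{
    return (long)(pct * 10.0);
}
```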

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
---
 block/file-posix.c                            | 43 +++++++++++++++++++
 tests/qemu-iotests/{150.out => 150.out.qcow2} |  0
 tests/qemu-iotests/150.out.raw                | 12 ++++++
 tests/qemu-iotests/175                        | 19 +++++---
 tests/qemu-iotests/175.out                    |  8 ++--
 tests/qemu-iotests/178.out.qcow2              |  4 +-
 tests/qemu-iotests/221.out                    | 12 ++++--
 tests/qemu-iotests/253.out                    | 12 ++++--
 8 files changed, 90 insertions(+), 20 deletions(-)
 rename tests/qemu-iotests/{150.out => 150.out.qcow2} (100%)
 create mode 100644 tests/qemu-iotests/150.out.raw

diff --git a/block/file-posix.c b/block/file-posix.c
index fbeb0068db..51688ae3fc 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1749,6 +1749,39 @@ static int handle_aiocb_discard(void *opaque)
     return ret;
 }
 
+/*
+ * Help alignment probing by allocating the first block.
+ *
+ * When reading with direct I/O from unallocated area on Gluster backed by XFS,
+ * reading succeeds regardless of request length. In this case we fallback to
+ * safe alignment which is not optimal. Allocating the first block avoids this
+ * fallback.
+ *
+ * fd may be opened with O_DIRECT, but we don't know the buffer alignment or
+ * request alignment, so we use safe values.
+ *
+ * Returns: 0 on success, -errno on failure. Since this is an optimization,
+ * caller may ignore failures.
+ */
+static int allocate_first_block(int fd, size_t max_size)
+{
+    size_t write_size = MIN(MAX_BLOCKSIZE, max_size);
+    size_t max_align = MAX(MAX_BLOCKSIZE, getpagesize());
+    void *buf;
+    ssize_t n;
+
+    buf = qemu_memalign(max_align, write_size);
+    memset(buf, 0, write_size);
+
+    do {
+        n = pwrite(fd, buf, write_size, 0);
+    } while (n == -1 && errno == EINTR);
+
+    qemu_vfree(buf);
+
+    return (n == -1) ? -errno : 0;
+}
+
 static int handle_aiocb_truncate(void *opaque)
 {
     RawPosixAIOData *aiocb = opaque;
@@ -1788,6 +1821,13 @@ static int handle_aiocb_truncate(void *opaque)
                 /* posix_fallocate() doesn't set errno. */
                 error_setg_errno(errp, -result,
                                  "Could not preallocate new data");
+            } else if (current_length == 0) {
+                /*
+                 * Needed only if posix_fallocate() used fallocate(), but we
+                 * don't have a way to detect that. Optimize future alignment
+                 * probing; ignore failures.
+                 */
+                allocate_first_block(fd, offset);
             }
         } else {
             result = 0;
@@ -1849,6 +1889,9 @@ static int handle_aiocb_truncate(void *opaque)
         if (ftruncate(fd, offset) != 0) {
             result = -errno;
             error_setg_errno(errp, -result, "Could not resize file");
+        } else if (current_length == 0 && offset > current_length) {
+            /* Optimize future alignment probing; ignore failures. */
+            allocate_first_block(fd, offset);
         }
         return result;
     default:
diff --git a/tests/qemu-iotests/150.out b/tests/qemu-iotests/150.out.qcow2
similarity index 100%
rename from tests/qemu-iotests/150.out
rename to tests/qemu-iotests/150.out.qcow2
diff --git a/tests/qemu-iotests/150.out.raw b/tests/qemu-iotests/150.out.raw
new file mode 100644
index 0000000000..3cdc7727a5
--- /dev/null
+++ b/tests/qemu-iotests/150.out.raw
@@ -0,0 +1,12 @@
+QA output created by 150
+
+=== Mapping sparse conversion ===
+
+Offset          Length          File
+0               0x1000          TEST_DIR/t.IMGFMT
+
+=== Mapping non-sparse conversion ===
+
+Offset          Length          File
+0               0x100000        TEST_DIR/t.IMGFMT
+*** done
diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
index 51e62c8276..d54cb43c39 100755
--- a/tests/qemu-iotests/175
+++ b/tests/qemu-iotests/175
@@ -37,14 +37,16 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # the file size.  This function hides the resulting difference in the
 # stat -c '%b' output.
 # Parameter 1: Number of blocks an empty file occupies
-# Parameter 2: Image size in bytes
+# Parameter 2: Minimal number of blocks in an image
+# Parameter 3: Image size in bytes
 _filter_blocks()
 {
     extra_blocks=$1
-    img_size=$2
+    min_blocks=$2
+    img_size=$3
 
-    sed -e "s/blocks=$extra_blocks\\(\$\\|[^0-9]\\)/nothing allocated/" \
-        -e "s/blocks=$((extra_blocks + img_size / 512))\\(\$\\|[^0-9]\\)/everything allocated/"
+    sed -e "s/blocks=$((min_blocks))\\(\$\\|[^0-9]\\)/min allocation/" \
+        -e "s/blocks=$((extra_blocks + img_size / 512))\\(\$\\|[^0-9]\\)/max allocation/"
 }
 
 # get standard environment, filters and checks
@@ -60,16 +62,21 @@ size=$((1 * 1024 * 1024))
 touch "$TEST_DIR/empty"
 extra_blocks=$(stat -c '%b' "$TEST_DIR/empty")
 
+# We always write the first byte; check how many blocks this filesystem
+# allocates to match empty image allocation.
+printf "\0" > "$TEST_DIR/empty"
+min_blocks=$(stat -c '%b' "$TEST_DIR/empty")
+
 echo
 echo "== creating image with default preallocation =="
 _make_test_img $size | _filter_imgfmt
-stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $size
+stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $min_blocks $size
 
 for mode in off full falloc; do
     echo
     echo "== creating image with preallocation $mode =="
     IMGOPTS=preallocation=$mode _make_test_img $size | _filter_imgfmt
-    stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $size
+    stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $min_blocks $size
 done
 
 # success, all done
diff --git a/tests/qemu-iotests/175.out b/tests/qemu-iotests/175.out
index 6d9a5ed84e..263e521262 100644
--- a/tests/qemu-iotests/175.out
+++ b/tests/qemu-iotests/175.out
@@ -2,17 +2,17 @@ QA output created by 175
 
 == creating image with default preallocation ==
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
-size=1048576, nothing allocated
+size=1048576, min allocation
 
 == creating image with preallocation off ==
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=off
-size=1048576, nothing allocated
+size=1048576, min allocation
 
 == creating image with preallocation full ==
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=full
-size=1048576, everything allocated
+size=1048576, max allocation
 
 == creating image with preallocation falloc ==
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=falloc
-size=1048576, everything allocated
+size=1048576, max allocation
  *** done
diff --git a/tests/qemu-iotests/178.out.qcow2 b/tests/qemu-iotests/178.out.qcow2
index 55a8dc926f..9e7d8c44df 100644
--- a/tests/qemu-iotests/178.out.qcow2
+++ b/tests/qemu-iotests/178.out.qcow2
@@ -101,7 +101,7 @@ converted image file size in bytes: 196608
 == raw input image with data (human) ==
 
 Formatting 'TEST_DIR/t.qcow2', fmt=IMGFMT size=1073741824
-required size: 393216
+required size: 458752
 fully allocated size: 1074135040
 wrote 512/512 bytes at offset 512
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
@@ -257,7 +257,7 @@ converted image file size in bytes: 196608
 
 Formatting 'TEST_DIR/t.qcow2', fmt=IMGFMT size=1073741824
 {
-    "required": 393216,
+    "required": 458752,
     "fully-allocated": 1074135040
 }
 wrote 512/512 bytes at offset 512
diff --git a/tests/qemu-iotests/221.out b/tests/qemu-iotests/221.out
index 9f9dd52bb0..dca024a0c3 100644
--- a/tests/qemu-iotests/221.out
+++ b/tests/qemu-iotests/221.out
@@ -3,14 +3,18 @@ QA output created by 221
 === Check mapping of unaligned raw image ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65537
-[{ "start": 0, "length": 66048, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
-[{ "start": 0, "length": 66048, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
+[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 4096, "length": 61952, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
+[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 4096, "length": 61952, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
 wrote 1/1 bytes at offset 65536
 1 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-[{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
+[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 4096, "length": 61440, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
 { "start": 65536, "length": 1, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 65537, "length": 511, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
-[{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
+[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 4096, "length": 61440, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
 { "start": 65536, "length": 1, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 65537, "length": 511, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
 *** done
diff --git a/tests/qemu-iotests/253.out b/tests/qemu-iotests/253.out
index 607c0baa0b..3d08b305d7 100644
--- a/tests/qemu-iotests/253.out
+++ b/tests/qemu-iotests/253.out
@@ -3,12 +3,16 @@ QA output created by 253
 === Check mapping of unaligned raw image ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048575
-[{ "start": 0, "length": 1048576, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
-[{ "start": 0, "length": 1048576, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
+[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 4096, "length": 1044480, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
+[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 4096, "length": 1044480, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
 wrote 65535/65535 bytes at offset 983040
 63.999 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-[{ "start": 0, "length": 983040, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
+[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 4096, "length": 978944, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
 { "start": 983040, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
-[{ "start": 0, "length": 983040, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
+[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 4096, "length": 978944, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
 { "start": 983040, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
 *** done
-- 
2.20.1




* [Qemu-devel] [PATCH v2 2/2] iotests: Test allocate_first_block() with O_DIRECT
  2019-08-25 22:03 [Qemu-devel] [PATCH v2 0/2] Optimize alignment probing Nir Soffer
  2019-08-25 22:03 ` [Qemu-devel] [PATCH v2 1/2] block: posix: Always allocate the first block Nir Soffer
@ 2019-08-25 22:03 ` Nir Soffer
  2019-08-25 22:41   ` Nir Soffer
  2019-08-26 12:38   ` Max Reitz
  2019-08-25 22:19 ` [Qemu-devel] [PATCH v2 0/2] Optimize alignment probing no-reply
  2 siblings, 2 replies; 13+ messages in thread
From: Nir Soffer @ 2019-08-25 22:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Nir Soffer, qemu-block, Max Reitz

Using block_resize, we can test allocate_first_block() with a file
descriptor opened with O_DIRECT, ensuring that it works for any size of
4096 bytes or more.

Testing smaller sizes is tricky, as the result depends on the filesystem
used for testing. For example, on NFS any size will work, since O_DIRECT
does not require any alignment there.
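
The 4096-byte boundary follows from write_size = MIN(MAX_BLOCKSIZE,
max_size) in allocate_first_block(): for a new size of 4096 bytes or more,
the write is a full 4096-byte block, which satisfies O_DIRECT length
requirements for any power-of-two logical block size up to 4096. For
smaller sizes, it depends on what the filesystem requires. A minimal
sketch, assuming MAX_BLOCKSIZE is 4096 and modelling the filesystem's
requirement as a hypothetical required_align parameter (1 for a filesystem
such as NFS that imposes none):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_BLOCKSIZE 4096 /* assumption: mirrors QEMU's MAX_BLOCKSIZE */

/* Length written by allocate_first_block() for an image of new_size bytes. */
static size_t first_block_write_size(size_t new_size)
{
    return new_size < SKETCH_MAX_BLOCKSIZE ? new_size : SKETCH_MAX_BLOCKSIZE;
}

/* Hypothetical check: would that O_DIRECT write at offset 0 satisfy a
 * filesystem requiring lengths to be multiples of required_align? The
 * buffer is assumed to be suitably aligned, as in the patch. */
static bool first_block_write_aligned(size_t new_size, size_t required_align)
{
    assert(required_align > 0);
    return first_block_write_size(new_size) % required_align == 0;
}
```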
---
 tests/qemu-iotests/175     | 25 +++++++++++++++++++++++++
 tests/qemu-iotests/175.out |  8 ++++++++
 2 files changed, 33 insertions(+)

diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
index d54cb43c39..60cc251eb2 100755
--- a/tests/qemu-iotests/175
+++ b/tests/qemu-iotests/175
@@ -49,6 +49,23 @@ _filter_blocks()
         -e "s/blocks=$((extra_blocks + img_size / 512))\\(\$\\|[^0-9]\\)/max allocation/"
 }
 
+# Resize image using block_resize.
+# Parameter 1: image path
+# Parameter 2: new size
+_block_resize()
+{
+    local path=$1
+    local size=$2
+
+    $QEMU -qmp stdio -nographic -nodefaults \
+        -blockdev file,node-name=file,filename=$path,cache.direct=on \
+        <<EOF
+{'execute': 'qmp_capabilities'}
+{'execute': 'block_resize', 'arguments': {'node-name': 'file', 'size': $size}}
+{'execute': 'quit'}
+EOF
+}
+
 # get standard environment, filters and checks
 . ./common.rc
 . ./common.filter
@@ -79,6 +96,14 @@ for mode in off full falloc; do
     stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $min_blocks $size
 done
 
+for new_size in 4096 1048576; do
+    echo
+    echo "== resize empty image with block_resize =="
+    _make_test_img 0 | _filter_imgfmt
+    _block_resize $TEST_IMG $new_size >/dev/null
+    stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $min_blocks $new_size
+done
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/175.out b/tests/qemu-iotests/175.out
index 263e521262..39c2ee0f62 100644
--- a/tests/qemu-iotests/175.out
+++ b/tests/qemu-iotests/175.out
@@ -15,4 +15,12 @@ size=1048576, max allocation
 == creating image with preallocation falloc ==
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=falloc
 size=1048576, max allocation
+
+== resize empty image with block_resize ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=0
+size=4096, min allocation
+
+== resize empty image with block_resize ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=0
+size=1048576, min allocation
  *** done
-- 
2.20.1

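How test 175 classifies the results: the stat -c '%b' value is treated as
a count of 512-byte units, so _filter_blocks prints "min allocation" when
an image occupies exactly as many blocks as a file containing one written
byte (min_blocks), and "max allocation" when it occupies the empty-file
overhead (extra_blocks) plus img_size / 512. The same logic as a
hypothetical C helper, for illustration only:

```c
#include <assert.h>
#include <stdint.h>

enum alloc_class { ALLOC_UNFILTERED, ALLOC_MIN, ALLOC_MAX };

/*
 * Hypothetical mirror of the two sed substitutions in _filter_blocks.
 * sed applies the "min" expression first; once it has rewritten
 * "blocks=N", the "max" expression can no longer match, so the min check
 * wins if both would match.
 */
static enum alloc_class classify_blocks(uint64_t blocks, uint64_t extra_blocks,
                                        uint64_t min_blocks, uint64_t img_size)
{
    if (blocks == min_blocks) {
        return ALLOC_MIN;
    }
    if (blocks == extra_blocks + img_size / 512) {
        return ALLOC_MAX;
    }
    return ALLOC_UNFILTERED; /* left as "blocks=N" in the output */
}
```

For example, on a filesystem where an empty file occupies 0 blocks and a
1-byte file 8, a fully allocated 1 MiB image reports 2048 blocks and is
classified as max allocation.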



* Re: [Qemu-devel] [PATCH v2 0/2] Optimize alignment probing
  2019-08-25 22:03 [Qemu-devel] [PATCH v2 0/2] Optimize alignment probing Nir Soffer
  2019-08-25 22:03 ` [Qemu-devel] [PATCH v2 1/2] block: posix: Always allocate the first block Nir Soffer
  2019-08-25 22:03 ` [Qemu-devel] [PATCH v2 2/2] iotests: Test allocate_first_block() with O_DIRECT Nir Soffer
@ 2019-08-25 22:19 ` no-reply
  2 siblings, 0 replies; 13+ messages in thread
From: no-reply @ 2019-08-25 22:19 UTC (permalink / raw)
  To: nirsof; +Cc: kwolf, nsoffer, qemu-devel, qemu-block, mreitz

Patchew URL: https://patchew.org/QEMU/20190825220329.7942-1-nsoffer@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PATCH v2 0/2] Optimize alignment probing
Message-id: 20190825220329.7942-1-nsoffer@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]      patchew/20190819213755.26175-1-richard.henderson@linaro.org -> patchew/20190819213755.26175-1-richard.henderson@linaro.org
 * [new tag]         patchew/20190825220329.7942-1-nsoffer@redhat.com -> patchew/20190825220329.7942-1-nsoffer@redhat.com
Submodule 'capstone' (https://git.qemu.org/git/capstone.git) registered for path 'capstone'
Submodule 'dtc' (https://git.qemu.org/git/dtc.git) registered for path 'dtc'
Submodule 'roms/QemuMacDrivers' (https://git.qemu.org/git/QemuMacDrivers.git) registered for path 'roms/QemuMacDrivers'
Submodule 'roms/SLOF' (https://git.qemu.org/git/SLOF.git) registered for path 'roms/SLOF'
Submodule 'roms/edk2' (https://git.qemu.org/git/edk2.git) registered for path 'roms/edk2'
Submodule 'roms/ipxe' (https://git.qemu.org/git/ipxe.git) registered for path 'roms/ipxe'
Submodule 'roms/openbios' (https://git.qemu.org/git/openbios.git) registered for path 'roms/openbios'
Submodule 'roms/openhackware' (https://git.qemu.org/git/openhackware.git) registered for path 'roms/openhackware'
Submodule 'roms/opensbi' (https://git.qemu.org/git/opensbi.git) registered for path 'roms/opensbi'
Submodule 'roms/qemu-palcode' (https://git.qemu.org/git/qemu-palcode.git) registered for path 'roms/qemu-palcode'
Submodule 'roms/seabios' (https://git.qemu.org/git/seabios.git/) registered for path 'roms/seabios'
Submodule 'roms/seabios-hppa' (https://git.qemu.org/git/seabios-hppa.git) registered for path 'roms/seabios-hppa'
Submodule 'roms/sgabios' (https://git.qemu.org/git/sgabios.git) registered for path 'roms/sgabios'
Submodule 'roms/skiboot' (https://git.qemu.org/git/skiboot.git) registered for path 'roms/skiboot'
Submodule 'roms/u-boot' (https://git.qemu.org/git/u-boot.git) registered for path 'roms/u-boot'
Submodule 'roms/u-boot-sam460ex' (https://git.qemu.org/git/u-boot-sam460ex.git) registered for path 'roms/u-boot-sam460ex'
Submodule 'slirp' (https://git.qemu.org/git/libslirp.git) registered for path 'slirp'
Submodule 'tests/fp/berkeley-softfloat-3' (https://git.qemu.org/git/berkeley-softfloat-3.git) registered for path 'tests/fp/berkeley-softfloat-3'
Submodule 'tests/fp/berkeley-testfloat-3' (https://git.qemu.org/git/berkeley-testfloat-3.git) registered for path 'tests/fp/berkeley-testfloat-3'
Submodule 'ui/keycodemapdb' (https://git.qemu.org/git/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into 'capstone'...
Submodule path 'capstone': checked out '22ead3e0bfdb87516656453336160e0a37b066bf'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '88f18909db731a627456f26d779445f84e449536'
Cloning into 'roms/QemuMacDrivers'...
Submodule path 'roms/QemuMacDrivers': checked out '90c488d5f4a407342247b9ea869df1c2d9c8e266'
Cloning into 'roms/SLOF'...
Submodule path 'roms/SLOF': checked out '7bfe584e321946771692711ff83ad2b5850daca7'
Cloning into 'roms/edk2'...
Submodule path 'roms/edk2': checked out '20d2e5a125e34fc8501026613a71549b2a1a3e54'
Submodule 'SoftFloat' (https://github.com/ucb-bar/berkeley-softfloat-3.git) registered for path 'ArmPkg/Library/ArmSoftFloatLib/berkeley-softfloat-3'
Submodule 'CryptoPkg/Library/OpensslLib/openssl' (https://github.com/openssl/openssl) registered for path 'CryptoPkg/Library/OpensslLib/openssl'
Cloning into 'ArmPkg/Library/ArmSoftFloatLib/berkeley-softfloat-3'...
Submodule path 'roms/edk2/ArmPkg/Library/ArmSoftFloatLib/berkeley-softfloat-3': checked out 'b64af41c3276f97f0e181920400ee056b9c88037'
Cloning into 'CryptoPkg/Library/OpensslLib/openssl'...
Submodule path 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl': checked out '50eaac9f3337667259de725451f201e784599687'
Submodule 'boringssl' (https://boringssl.googlesource.com/boringssl) registered for path 'boringssl'
Submodule 'krb5' (https://github.com/krb5/krb5) registered for path 'krb5'
Submodule 'pyca.cryptography' (https://github.com/pyca/cryptography.git) registered for path 'pyca-cryptography'
Cloning into 'boringssl'...
Submodule path 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl/boringssl': checked out '2070f8ad9151dc8f3a73bffaa146b5e6937a583f'
Cloning into 'krb5'...
Submodule path 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl/krb5': checked out 'b9ad6c49505c96a088326b62a52568e3484f2168'
Cloning into 'pyca-cryptography'...
Submodule path 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl/pyca-cryptography': checked out '09403100de2f6f1cdd0d484dcb8e620f1c335c8f'
Cloning into 'roms/ipxe'...
Submodule path 'roms/ipxe': checked out 'de4565cbe76ea9f7913a01f331be3ee901bb6e17'
Cloning into 'roms/openbios'...
Submodule path 'roms/openbios': checked out 'c79e0ecb84f4f1ee3f73f521622e264edd1bf174'
Cloning into 'roms/openhackware'...
Submodule path 'roms/openhackware': checked out 'c559da7c8eec5e45ef1f67978827af6f0b9546f5'
Cloning into 'roms/opensbi'...
Submodule path 'roms/opensbi': checked out 'ce228ee0919deb9957192d723eecc8aaae2697c6'
Cloning into 'roms/qemu-palcode'...
Submodule path 'roms/qemu-palcode': checked out 'bf0e13698872450164fa7040da36a95d2d4b326f'
Cloning into 'roms/seabios'...
Submodule path 'roms/seabios': checked out 'a5cab58e9a3fb6e168aba919c5669bea406573b4'
Cloning into 'roms/seabios-hppa'...
Submodule path 'roms/seabios-hppa': checked out '0f4fe84658165e96ce35870fd19fc634e182e77b'
Cloning into 'roms/sgabios'...
Submodule path 'roms/sgabios': checked out 'cbaee52287e5f32373181cff50a00b6c4ac9015a'
Cloning into 'roms/skiboot'...
Submodule path 'roms/skiboot': checked out '261ca8e779e5138869a45f174caa49be6a274501'
Cloning into 'roms/u-boot'...
Submodule path 'roms/u-boot': checked out 'd3689267f92c5956e09cc7d1baa4700141662bff'
Cloning into 'roms/u-boot-sam460ex'...
Submodule path 'roms/u-boot-sam460ex': checked out '60b3916f33e617a815973c5a6df77055b2e3a588'
Cloning into 'slirp'...
Submodule path 'slirp': checked out '126c04acbabd7ad32c2b018fe10dfac2a3bc1210'
Cloning into 'tests/fp/berkeley-softfloat-3'...
Submodule path 'tests/fp/berkeley-softfloat-3': checked out 'b64af41c3276f97f0e181920400ee056b9c88037'
Cloning into 'tests/fp/berkeley-testfloat-3'...
Submodule path 'tests/fp/berkeley-testfloat-3': checked out '5a59dcec19327396a011a17fd924aed4fec416b3'
Cloning into 'ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
Switched to a new branch 'test'
cb7fbab iotests: Test allocate_first_block() with O_DIRECT
b07ae0f block: posix: Always allocate the first block

=== OUTPUT BEGIN ===
1/2 Checking commit b07ae0ffb5ab (block: posix: Always allocate the first block)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#123: 
rename from tests/qemu-iotests/150.out

total: 0 errors, 1 warnings, 195 lines checked

Patch 1/2 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
2/2 Checking commit cb7fbab36744 (iotests: Test allocate_first_block() with O_DIRECT)
ERROR: Missing Signed-off-by: line(s)

total: 1 errors, 0 warnings, 49 lines checked

Patch 2/2 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1
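
Patch 2/2 fails only because its commit message lacks a Signed-off-by:
line. As a rough illustration, and not how checkpatch.pl actually
implements the check, a C sketch that detects such a trailer:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustration only: true if any line of msg starts with "Signed-off-by: ". */
static bool has_signoff(const char *msg)
{
    static const char tag[] = "Signed-off-by: ";
    const char *line = msg;

    assert(msg != NULL);
    while (line != NULL && *line != '\0') {
        if (strncmp(line, tag, sizeof(tag) - 1) == 0) {
            return true;
        }
        line = strchr(line, '\n');
        if (line != NULL) {
            line++; /* skip the newline, move to the next line */
        }
    }
    return false;
}
```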


The full log is available at
http://patchew.org/logs/20190825220329.7942-1-nsoffer@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [Qemu-devel] [PATCH v2 2/2] iotests: Test allocate_first_block() with O_DIRECT
  2019-08-25 22:03 ` [Qemu-devel] [PATCH v2 2/2] iotests: Test allocate_first_block() with O_DIRECT Nir Soffer
@ 2019-08-25 22:41   ` Nir Soffer
  2019-08-26 12:38   ` Max Reitz
  1 sibling, 0 replies; 13+ messages in thread
From: Nir Soffer @ 2019-08-25 22:41 UTC (permalink / raw)
  To: Nir Soffer; +Cc: Kevin Wolf, QEMU Developers, qemu-block, Max Reitz

On Mon, Aug 26, 2019 at 1:03 AM Nir Soffer <nirsof@gmail.com> wrote:

> Using block_resize we can test allocate_first_block() with file
> descriptor opened with O_DIRECT, ensuring that it works for any size
> larger than 4096 bytes.
>
> Testing smaller sizes is tricky as the result depends on the filesystem
> used for testing. For example on NFS any size will work since O_DIRECT
> does not require any alignment.
>

Forgot to add:

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

---
>  tests/qemu-iotests/175     | 25 +++++++++++++++++++++++++
>  tests/qemu-iotests/175.out |  8 ++++++++
>  2 files changed, 33 insertions(+)
>
> diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
> index d54cb43c39..60cc251eb2 100755
> --- a/tests/qemu-iotests/175
> +++ b/tests/qemu-iotests/175
> @@ -49,6 +49,23 @@ _filter_blocks()
>          -e "s/blocks=$((extra_blocks + img_size / 512))\\(\$\\|[^0-9]\\)/max allocation/"
>  }
>
> +# Resize image using block_resize.
> +# Parameter 1: image path
> +# Parameter 2: new size
> +_block_resize()
> +{
> +    local path=$1
> +    local size=$2
> +
> +    $QEMU -qmp stdio -nographic -nodefaults \
> +        -blockdev file,node-name=file,filename=$path,cache.direct=on \
> +        <<EOF
> +{'execute': 'qmp_capabilities'}
> +{'execute': 'block_resize', 'arguments': {'node-name': 'file', 'size': $size}}
> +{'execute': 'quit'}
> +EOF
> +}
> +
>  # get standard environment, filters and checks
>  . ./common.rc
>  . ./common.filter
> @@ -79,6 +96,14 @@ for mode in off full falloc; do
>      stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $min_blocks $size
>  done
>
> +for new_size in 4096 1048576; do
> +    echo
> +    echo "== resize empty image with block_resize =="
> +    _make_test_img 0 | _filter_imgfmt
> +    _block_resize $TEST_IMG $new_size >/dev/null
> +    stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $min_blocks $new_size
> +done
> +
>  # success, all done
>  echo "*** done"
>  rm -f $seq.full
> diff --git a/tests/qemu-iotests/175.out b/tests/qemu-iotests/175.out
> index 263e521262..39c2ee0f62 100644
> --- a/tests/qemu-iotests/175.out
> +++ b/tests/qemu-iotests/175.out
> @@ -15,4 +15,12 @@ size=1048576, max allocation
>  == creating image with preallocation falloc ==
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=falloc
>  size=1048576, max allocation
> +
> +== resize empty image with block_resize ==
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=0
> +size=4096, min allocation
> +
> +== resize empty image with block_resize ==
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=0
> +size=1048576, min allocation
>   *** done
> --
> 2.20.1
>
>


* Re: [Qemu-devel] [PATCH v2 1/2] block: posix: Always allocate the first block
  2019-08-25 22:03 ` [Qemu-devel] [PATCH v2 1/2] block: posix: Always allocate the first block Nir Soffer
@ 2019-08-26 12:31   ` Max Reitz
  2019-08-26 13:49     ` Eric Blake
  2019-08-26 15:41     ` Nir Soffer
  2019-08-26 13:46   ` Eric Blake
  1 sibling, 2 replies; 13+ messages in thread
From: Max Reitz @ 2019-08-26 12:31 UTC (permalink / raw)
  To: Nir Soffer, qemu-devel; +Cc: Kevin Wolf, Nir Soffer, qemu-block



On 26.08.19 00:03, Nir Soffer wrote:
> When creating an image with preallocation "off" or "falloc", the first
> block of the image is typically not allocated. When using Gluster
> storage backed by XFS filesystem, reading this block using direct I/O
> succeeds regardless of request length, fooling alignment detection.
> 
> In this case we fallback to a safe value (4096) instead of the optimal
> value (512), which may lead to unneeded data copying when aligning
> requests.  Allocating the first block avoids the fallback.
> 
> Since we allocate the first block even with preallocation=off, we no
> longer create images with zero disk size:
> 
>     $ ./qemu-img create -f raw test.raw 1g
>     Formatting 'test.raw', fmt=raw size=1073741824
> 
>     $ ls -lhs test.raw
>     4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw
> 
> And converting the image requires an additional cluster:
> 
>     $ ./qemu-img measure -f raw -O qcow2 test.raw
>     required size: 458752
>     fully allocated size: 1074135040
> 
> I did a quick performance test copying disks with qemu-img convert to
> a new raw target image on Gluster storage with a sector size of 512 bytes:
> 
>     for i in $(seq 10); do
>         rm -f dst.raw
>         sleep 10
>         time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
>     done
> 
> Here is a table comparing the total time spent:
> 
> Type    Before(s)   After(s)    Diff(%)
> ---------------------------------------
> real      530.028    469.123      -11.4
> user       17.204     10.768      -37.4
> sys        17.881      7.011      -60.7
> 
> We can see a very clear improvement in CPU usage.
> 
> Signed-off-by: Nir Soffer <nsoffer@redhat.com>
> ---
>  block/file-posix.c                            | 43 +++++++++++++++++++
>  tests/qemu-iotests/{150.out => 150.out.qcow2} |  0
>  tests/qemu-iotests/150.out.raw                | 12 ++++++
>  tests/qemu-iotests/175                        | 19 +++++---
>  tests/qemu-iotests/175.out                    |  8 ++--
>  tests/qemu-iotests/178.out.qcow2              |  4 +-
>  tests/qemu-iotests/221.out                    | 12 ++++--
>  tests/qemu-iotests/253.out                    | 12 ++++--
>  8 files changed, 90 insertions(+), 20 deletions(-)
>  rename tests/qemu-iotests/{150.out => 150.out.qcow2} (100%)
>  create mode 100644 tests/qemu-iotests/150.out.raw
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index fbeb0068db..51688ae3fc 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1749,6 +1749,39 @@ static int handle_aiocb_discard(void *opaque)
>      return ret;
>  }
>  
> +/*
> + * Help alignment probing by allocating the first block.
> + *
> + * When reading with direct I/O from unallocated area on Gluster backed by XFS,
> + * reading succeeds regardless of request length. In this case we fallback to
> + * safe alignment which is not optimal. Allocating the first block avoids this
> + * fallback.
> + *
> + * fd may be opened with O_DIRECT, but we don't know the buffer alignment or
> + * request alignment, so we use safe values.
> + *
> + * Returns: 0 on success, -errno on failure. Since this is an optimization,
> + * caller may ignore failures.
> + */
> +static int allocate_first_block(int fd, size_t max_size)
> +{
> +    size_t write_size = MIN(MAX_BLOCKSIZE, max_size);

Hm, well, there was a reason why I proposed rounding this down to the
next power of two.  If max_size is not a power of two but below
MAX_BLOCKSIZE, write_size will not be a power of two, and thus the write
below may fail even if write_size exceeds the physical block size.

You can see that in the test case you add by using e.g. 768 as the
destination size (provided your test filesystem has a block size of 512).

Now I would like to say that it’s stupid to resize an O_DIRECT file to a
size that is not a multiple of the block size; but I’ve had a bug
assigned to me before because that didn’t work.

But maybe it’s actually better if it doesn’t work.  I don’t know.
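To make the rounding concrete, here is a minimal sketch in C. `pow2_floor_size()` and `first_block_write_size()` are hypothetical helpers (not QEMU functions), and `SKETCH_MAX_BLOCKSIZE` is assumed to be 4096, the safe value from the commit message. With max_size = 768, the unrounded MIN() yields 768, while the rounded version yields 512.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match MAX_BLOCKSIZE, the 4096-byte "safe value". */
#define SKETCH_MAX_BLOCKSIZE 4096

/* Hypothetical helper: largest power of two that is <= n (n must be > 0). */
static size_t pow2_floor_size(size_t n)
{
    size_t p = 1;

    assert(n > 0);
    while (p <= n / 2) {
        p *= 2;
    }
    return p;
}

/* Clamp to MAX_BLOCKSIZE, then round down to a power of two. */
static size_t first_block_write_size(size_t max_size)
{
    size_t limit = (max_size < SKETCH_MAX_BLOCKSIZE) ? max_size
                                                     : SKETCH_MAX_BLOCKSIZE;

    return pow2_floor_size(limit);
}
```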

> +    size_t max_align = MAX(MAX_BLOCKSIZE, getpagesize());
> +    void *buf;
> +    ssize_t n;
> +
> +    buf = qemu_memalign(max_align, write_size);
> +    memset(buf, 0, write_size);
> +
> +    do {
> +        n = pwrite(fd, buf, write_size, 0);
> +    } while (n == -1 && errno == EINTR);
> +
> +    qemu_vfree(buf);
> +
> +    return (n == -1) ? -errno : 0;
> +}
> +
>  static int handle_aiocb_truncate(void *opaque)
>  {
>      RawPosixAIOData *aiocb = opaque;
> @@ -1788,6 +1821,13 @@ static int handle_aiocb_truncate(void *opaque)
>                  /* posix_fallocate() doesn't set errno. */
>                  error_setg_errno(errp, -result,
>                                   "Could not preallocate new data");
> +            } else if (current_length == 0) {
> +                /*
> +                 * Needed only if posix_fallocate() used fallocate(), but we
> +                 * don't have a way to detect that.

This sounds a bit weird because fallocate() is what we call
posix_fallocate() for.  I’d’ve liked something that states more
explicitly that unaligned reads from fallocated areas may succeed even
with O_DIRECT, hence the need for allocate_first_block().
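As background on what "fooling alignment detection" means, here is a hedged sketch of the general probing idea. It is not QEMU's probing code: `probe_request_alignment()`, the candidate list, and the 4096 fallback are assumptions chosen to match the values in the commit message. Reads of increasing length are tried at offset 0, and the first accepted length is used. On an unallocated Gluster/XFS area, even a 1-byte O_DIRECT read succeeds, so the result is inconclusive and the sketch falls back to 4096. Once the first block is allocated, on 512-byte-sector storage a 1-byte read is expected to fail and a 512-byte read to succeed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed fallback, matching the "safe value (4096)" in the commit message. */
#define SKETCH_SAFE_ALIGN 4096

/*
 * Hypothetical alignment probe: try reads of increasing length at offset 0
 * and use the first length that succeeds. If even a 1-byte read succeeds,
 * the result tells us nothing, so fall back to the safe value.
 */
static size_t probe_request_alignment(int fd)
{
    static const size_t candidates[] = { 1, 512, 1024, 2048, 4096 };
    size_t result = SKETCH_SAFE_ALIGN;
    void *buf;

    /* Align the buffer generously so that only the read length varies. */
    if (posix_memalign(&buf, SKETCH_SAFE_ALIGN, SKETCH_SAFE_ALIGN) != 0) {
        return SKETCH_SAFE_ALIGN;
    }

    for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        if (pread(fd, buf, candidates[i], 0) >= 0) {
            /* A 1-byte success is inconclusive: use the safe fallback. */
            result = (candidates[i] == 1) ? SKETCH_SAFE_ALIGN : candidates[i];
            break;
        }
    }

    free(buf);
    return result;
}
```

Without O_DIRECT every read succeeds, so on an ordinary file this sketch always returns the fallback. The interesting cases need a file opened with O_DIRECT on the storage in question.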

>                                                      Optimize future alignment
> +                 * probing; ignore failures.
> +                 */
> +                allocate_first_block(fd, offset);
>              }
>          } else {
>              result = 0;

[...]

> diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
> index 51e62c8276..d54cb43c39 100755
> --- a/tests/qemu-iotests/175
> +++ b/tests/qemu-iotests/175
> @@ -37,14 +37,16 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>  # the file size.  This function hides the resulting difference in the
>  # stat -c '%b' output.
>  # Parameter 1: Number of blocks an empty file occupies
> -# Parameter 2: Image size in bytes
> +# Parameter 2: Minimal number of blocks in an image
> +# Parameter 3: Image size in bytes
>  _filter_blocks()
>  {
>      extra_blocks=$1
> -    img_size=$2
> +    min_blocks=$2
> +    img_size=$3
>  
> -    sed -e "s/blocks=$extra_blocks\\(\$\\|[^0-9]\\)/nothing allocated/" \
> -        -e "s/blocks=$((extra_blocks + img_size / 512))\\(\$\\|[^0-9]\\)/everything allocated/"
> +    sed -e "s/blocks=$((min_blocks))\\(\$\\|[^0-9]\\)/min allocation/" \

Superfluous parentheses ($(())), but not wrong.

So I think I can give a

Reviewed-by: Max Reitz <mreitz@redhat.com>




* Re: [Qemu-devel] [PATCH v2 2/2] iotests: Test allocate_first_block() with O_DIRECT
  2019-08-25 22:03 ` [Qemu-devel] [PATCH v2 2/2] iotests: Test allocate_first_block() with O_DIRECT Nir Soffer
  2019-08-25 22:41   ` Nir Soffer
@ 2019-08-26 12:38   ` Max Reitz
  1 sibling, 0 replies; 13+ messages in thread
From: Max Reitz @ 2019-08-26 12:38 UTC (permalink / raw)
  To: Nir Soffer, qemu-devel; +Cc: Kevin Wolf, Nir Soffer, qemu-block



On 26.08.19 00:03, Nir Soffer wrote:
> Using block_resize we can test allocate_first_block() with file
> descriptor opened with O_DIRECT, ensuring that it works for any size
> larger than 4096 bytes.
> 
> Testing smaller sizes is tricky as the result depends on the filesystem
> used for testing. For example on NFS any size will work since O_DIRECT
> does not require any alignment.
> ---
>  tests/qemu-iotests/175     | 25 +++++++++++++++++++++++++
>  tests/qemu-iotests/175.out |  8 ++++++++
>  2 files changed, 33 insertions(+)

Thanks for the test!

There’s just one thing: 175 now needs

_default_cache_mode none
_supported_cache_modes none directsync

somewhere near the top (where the rest of _supported*) is.  (Otherwise,
it will fail when the iotests should be run with some other cache mode
instead of being skipped.)

With that added:

Reviewed-by: Max Reitz <mreitz@redhat.com>




* Re: [Qemu-devel] [PATCH v2 1/2] block: posix: Always allocate the first block
  2019-08-25 22:03 ` [Qemu-devel] [PATCH v2 1/2] block: posix: Always allocate the first block Nir Soffer
  2019-08-26 12:31   ` Max Reitz
@ 2019-08-26 13:46   ` Eric Blake
  2019-08-26 15:19     ` Nir Soffer
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Blake @ 2019-08-26 13:46 UTC (permalink / raw)
  To: Nir Soffer, qemu-devel; +Cc: Kevin Wolf, Nir Soffer, qemu-block, Max Reitz



On 8/25/19 5:03 PM, Nir Soffer wrote:
> When creating an image with preallocation "off" or "falloc", the first
> block of the image is typically not allocated. When using Gluster
> storage backed by an XFS filesystem, reading this block using direct I/O
> succeeds regardless of request length, fooling alignment detection.
> 
> In this case we fall back to a safe value (4096) instead of the optimal
> value (512), which may lead to unneeded data copying when aligning
> requests.  Allocating the first block avoids the fallback.
> 

> Here is a table comparing the total time spent:
> 
> Type    Before(s)   After(s)    Diff(%)
> ---------------------------------------
> real      530.028    469.123      -11.4
> user       17.204     10.768      -37.4
> sys        17.881      7.011      -60.7
> 
> We can see a very clear improvement in CPU usage.

Nice justification.


> +/*
> + * Help alignment probing by allocating the first block.
> + *

> +    do {
> +        n = pwrite(fd, buf, write_size, 0);
> +    } while (n == -1 && errno == EINTR);
> +
> +    qemu_vfree(buf);

qemu_vfree() can corrupt errno...

> +
> +    return (n == -1) ? -errno : 0;

...which means you may be returning an unexpected value here.

Either we should patch qemu_vfree() to guarantee that errno is
preserved, or you locally capture errno before calling it here.
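A minimal sketch of the second option (capturing errno locally), assuming POSIX `posix_memalign()` and `free()` as stand-ins for `qemu_memalign()` and `qemu_vfree()`. `allocate_first_block_sketch()` and `SKETCH_MAX_BLOCKSIZE` are hypothetical names. errno is copied immediately after the failing pwrite(), before the buffer is freed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed to match MAX_BLOCKSIZE (4096). */
#define SKETCH_MAX_BLOCKSIZE 4096

/*
 * Hypothetical variant of allocate_first_block(): errno is saved right
 * after the failing pwrite(), so free() cannot change the returned value.
 */
static int allocate_first_block_sketch(int fd, size_t max_size)
{
    size_t write_size = (max_size < SKETCH_MAX_BLOCKSIZE) ? max_size
                                                          : SKETCH_MAX_BLOCKSIZE;
    long page = sysconf(_SC_PAGESIZE);
    size_t align = (page > SKETCH_MAX_BLOCKSIZE) ? (size_t)page
                                                 : SKETCH_MAX_BLOCKSIZE;
    void *buf;
    ssize_t n;
    int saved_errno = 0;

    /* Simplified: treat any allocation failure as -ENOMEM. */
    if (posix_memalign(&buf, align, write_size) != 0) {
        return -ENOMEM;
    }
    memset(buf, 0, write_size);

    do {
        n = pwrite(fd, buf, write_size, 0);
    } while (n == -1 && errno == EINTR);

    if (n == -1) {
        saved_errno = errno;    /* capture before free() runs */
    }

    free(buf);

    return (n == -1) ? -saved_errno : 0;
}
```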

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [PATCH v2 1/2] block: posix: Always allocate the first block
  2019-08-26 12:31   ` Max Reitz
@ 2019-08-26 13:49     ` Eric Blake
  2019-08-26 15:23       ` Nir Soffer
  2019-08-26 15:41     ` Nir Soffer
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Blake @ 2019-08-26 13:49 UTC (permalink / raw)
  To: Max Reitz, Nir Soffer, qemu-devel; +Cc: Kevin Wolf, Nir Soffer, qemu-block



On 8/26/19 7:31 AM, Max Reitz wrote:

>>  # the file size.  This function hides the resulting difference in the
>>  # stat -c '%b' output.
>>  # Parameter 1: Number of blocks an empty file occupies
>> -# Parameter 2: Image size in bytes
>> +# Parameter 2: Minimal number of blocks in an image
>> +# Parameter 3: Image size in bytes
>>  _filter_blocks()
>>  {
>>      extra_blocks=$1
>> -    img_size=$2
>> +    min_blocks=$2
>> +    img_size=$3
>>  
>> -    sed -e "s/blocks=$extra_blocks\\(\$\\|[^0-9]\\)/nothing allocated/" \
>> -        -e "s/blocks=$((extra_blocks + img_size / 512))\\(\$\\|[^0-9]\\)/everything allocated/"
>> +    sed -e "s/blocks=$((min_blocks))\\(\$\\|[^0-9]\\)/min allocation/" \
> 
> Superfluous parentheses ($(())), but not wrong.

Note that $((..)) has a purpose: it can convert any variable content
into decimal.  I can write min_blocks=0x1000, and $((min_blocks))
results in 4096 while $min_blocks is still 0x1000.  But I'd need more
context as to what the callers expect to pass as to whether the $((...))
is superfluous here.





* Re: [Qemu-devel] [PATCH v2 1/2] block: posix: Always allocate the first block
  2019-08-26 13:46   ` Eric Blake
@ 2019-08-26 15:19     ` Nir Soffer
  0 siblings, 0 replies; 13+ messages in thread
From: Nir Soffer @ 2019-08-26 15:19 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, Nir Soffer, QEMU Developers, qemu-block, Max Reitz

On Mon, Aug 26, 2019 at 4:46 PM Eric Blake <eblake@redhat.com> wrote:
>
> On 8/25/19 5:03 PM, Nir Soffer wrote:
> > When creating an image with preallocation "off" or "falloc", the first
> > block of the image is typically not allocated. When using Gluster
> > storage backed by an XFS filesystem, reading this block using direct I/O
> > succeeds regardless of request length, fooling alignment detection.
> >
> > In this case we fall back to a safe value (4096) instead of the optimal
> > value (512), which may lead to unneeded data copying when aligning
> > requests.  Allocating the first block avoids the fallback.
> >
>
> > Here is a table comparing the total time spent:
> >
> > Type    Before(s)   After(s)    Diff(%)
> > ---------------------------------------
> > real      530.028    469.123      -11.4
> > user       17.204     10.768      -37.4
> > sys        17.881      7.011      -60.7
> >
> > We can see a very clear improvement in CPU usage.
>
> Nice justification.
>
>
> > +/*
> > + * Help alignment probing by allocating the first block.
> > + *
>
> > +    do {
> > +        n = pwrite(fd, buf, write_size, 0);
> > +    } while (n == -1 && errno == EINTR);
> > +
> > +    qemu_vfree(buf);
>
> qemu_vfree() can corrupt errno...
>
> > +
> > +    return (n == -1) ? -errno : 0;
>
> ...which means you may be returning an unexpected value here.
>
> Either we should patch qemu_vfree() to guarantee that errno is
> preserved, or you locally capture errno before calling it here.

qemu_vfree() returns void like free(), so changing errno is unexpected, but
other code using it should not be affected, so preserving errno here seems
like the better change.

Thanks!

Nir



* Re: [Qemu-devel] [PATCH v2 1/2] block: posix: Always allocate the first block
  2019-08-26 13:49     ` Eric Blake
@ 2019-08-26 15:23       ` Nir Soffer
  0 siblings, 0 replies; 13+ messages in thread
From: Nir Soffer @ 2019-08-26 15:23 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, Nir Soffer, QEMU Developers, qemu-block, Max Reitz

On Mon, Aug 26, 2019 at 4:49 PM Eric Blake <eblake@redhat.com> wrote:
>
> On 8/26/19 7:31 AM, Max Reitz wrote:
>
> >>  # the file size.  This function hides the resulting difference in the
> >>  # stat -c '%b' output.
> >>  # Parameter 1: Number of blocks an empty file occupies
> >> -# Parameter 2: Image size in bytes
> >> +# Parameter 2: Minimal number of blocks in an image
> >> +# Parameter 3: Image size in bytes
> >>  _filter_blocks()
> >>  {
> >>      extra_blocks=$1
> >> -    img_size=$2
> >> +    min_blocks=$2
> >> +    img_size=$3
> >>
> >> -    sed -e "s/blocks=$extra_blocks\\(\$\\|[^0-9]\\)/nothing allocated/" \
> >> -        -e "s/blocks=$((extra_blocks + img_size / 512))\\(\$\\|[^0-9]\\)/everything allocated/"
> >> +    sed -e "s/blocks=$((min_blocks))\\(\$\\|[^0-9]\\)/min allocation/" \
> >
> > Superfluous parentheses ($(())), but not wrong.
>
> Note that $((..)) has a purpose: it can convert any variable content
> into decimal.  I can write min_blocks=0x1000, and $((min_blocks))
> results in 4096 while $min_blocks is still 0x1000.  But I'd need more
> context as to what the callers expect to pass as to whether the $((...))
> is superfluous here.

In this case min_blocks is computed and always uses base 10, so we don't
need the $(()).

Nir



* Re: [Qemu-devel] [PATCH v2 1/2] block: posix: Always allocate the first block
  2019-08-26 12:31   ` Max Reitz
  2019-08-26 13:49     ` Eric Blake
@ 2019-08-26 15:41     ` Nir Soffer
  2019-08-26 16:20       ` Max Reitz
  1 sibling, 1 reply; 13+ messages in thread
From: Nir Soffer @ 2019-08-26 15:41 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Nir Soffer, QEMU Developers, qemu-block

On Mon, Aug 26, 2019 at 3:31 PM Max Reitz <mreitz@redhat.com> wrote:
>
> On 26.08.19 00:03, Nir Soffer wrote:
...
> > +/*
> > + * Help alignment probing by allocating the first block.
> > + *
> > + * When reading with direct I/O from unallocated area on Gluster backed by XFS,
> > + * reading succeeds regardless of request length. In this case we fallback to
> > + * safe alignment which is not optimal. Allocating the first block avoids this
> > + * fallback.
> > + *
> > + * fd may be opened with O_DIRECT, but we don't know the buffer alignment or
> > + * request alignment, so we use safe values.
> > + *
> > + * Returns: 0 on success, -errno on failure. Since this is an optimization,
> > + * caller may ignore failures.
> > + */
> > +static int allocate_first_block(int fd, size_t max_size)
> > +{
> > +    size_t write_size = MIN(MAX_BLOCKSIZE, max_size);
>
> Hm, well, there was a reason why I proposed rounding this down to the
> next power of two.  If max_size is not a power of two but below
> MAX_BLOCKSIZE, write_size will not be a power of two, and thus the write
> below may fail even if write_size exceeds the physical block size.
>
> You can see that in the test case you add by using e.g. 768 as the
> destination size (provided your test filesystem has a block size of 512).
>
> Now I would like to say that it’s stupid to resize an O_DIRECT file to a
> size that is not a multiple of the block size; but I’ve had a bug
> assigned to me before because that didn’t work.
>
> But maybe it’s actually better if it doesn’t work.  I don’t know.

I tried to avoid complexity that is unlikely to help anyone, but we
can make the (typical) case of a 512-byte sector size work with this:

    size_t write_size = (max_size < MAX_BLOCKSIZE)
        ? BDRV_SECTOR_SIZE
        : MAX_BLOCKSIZE;

Unfortunately, testing max_size < 4096 will not be reliable since we don't know
the underlying storage sector size.
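The proposed selection can be sketched as a tiny helper. `proposed_write_size()` is hypothetical, and the two constants are assumed to match `BDRV_SECTOR_SIZE` (512) and `MAX_BLOCKSIZE` (4096). For the 768-byte example above it picks 512, a valid O_DIRECT length on 512-byte-sector storage. Note that, as proposed, a max_size below 512 still gets a 512-byte write, which extends the file past max_size.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SECTOR_SIZE   512   /* assumed value of BDRV_SECTOR_SIZE */
#define SKETCH_MAX_BLOCKSIZE 4096  /* assumed value of MAX_BLOCKSIZE */

/*
 * Hypothetical helper: small targets get one 512-byte sector, everything
 * else gets MAX_BLOCKSIZE. Both results are powers of two, so an "odd"
 * length such as 768 is never used for the write.
 */
static size_t proposed_write_size(size_t max_size)
{
    return (max_size < SKETCH_MAX_BLOCKSIZE) ? SKETCH_SECTOR_SIZE
                                             : SKETCH_MAX_BLOCKSIZE;
}
```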

...



* Re: [Qemu-devel] [PATCH v2 1/2] block: posix: Always allocate the first block
  2019-08-26 15:41     ` Nir Soffer
@ 2019-08-26 16:20       ` Max Reitz
  0 siblings, 0 replies; 13+ messages in thread
From: Max Reitz @ 2019-08-26 16:20 UTC (permalink / raw)
  To: Nir Soffer; +Cc: Kevin Wolf, Nir Soffer, QEMU Developers, qemu-block



On 26.08.19 17:41, Nir Soffer wrote:
> On Mon, Aug 26, 2019 at 3:31 PM Max Reitz <mreitz@redhat.com> wrote:
>>
>> On 26.08.19 00:03, Nir Soffer wrote:
> ...
>>> +/*
>>> + * Help alignment probing by allocating the first block.
>>> + *
>>> + * When reading with direct I/O from unallocated area on Gluster backed by XFS,
>>> + * reading succeeds regardless of request length. In this case we fallback to
>>> + * safe alignment which is not optimal. Allocating the first block avoids this
>>> + * fallback.
>>> + *
>>> + * fd may be opened with O_DIRECT, but we don't know the buffer alignment or
>>> + * request alignment, so we use safe values.
>>> + *
>>> + * Returns: 0 on success, -errno on failure. Since this is an optimization,
>>> + * caller may ignore failures.
>>> + */
>>> +static int allocate_first_block(int fd, size_t max_size)
>>> +{
>>> +    size_t write_size = MIN(MAX_BLOCKSIZE, max_size);
>>
>> Hm, well, there was a reason why I proposed rounding this down to the
>> next power of two.  If max_size is not a power of two but below
>> MAX_BLOCKSIZE, write_size will not be a power of two, and thus the write
>> below may fail even if write_size exceeds the physical block size.
>>
>> You can see that in the test case you add by using e.g. 768 as the
>> destination size (provided your test filesystem has a block size of 512).
>>
>> Now I would like to say that it’s stupid to resize an O_DIRECT file to a
>> size that is not a multiple of the block size; but I’ve had a bug
>> assigned to me before because that didn’t work.
>>
>> But maybe it’s actually better if it doesn’t work.  I don’t know.
> 
> I tried to avoid complexity that is unlikely to help anyone, but we
> can make the (typical) case of a 512-byte sector size work with this:
> 
>     size_t write_size = (max_size < MAX_BLOCKSIZE)
>         ? BDRV_SECTOR_SIZE
>         : MAX_BLOCKSIZE;
> 
> Unfortunately, testing max_size < 4096 will not be reliable since we don't know
> the underlying storage sector size.

Hm, well, why not, actually.  That’s simple enough and it should work in
all common configurations.

Max




end of thread, other threads:[~2019-08-26 16:28 UTC | newest]

Thread overview: 13+ messages
2019-08-25 22:03 [Qemu-devel] [PATCH v2 0/2] Optimize alignment probing Nir Soffer
2019-08-25 22:03 ` [Qemu-devel] [PATCH v2 1/2] block: posix: Always allocate the first block Nir Soffer
2019-08-26 12:31   ` Max Reitz
2019-08-26 13:49     ` Eric Blake
2019-08-26 15:23       ` Nir Soffer
2019-08-26 15:41     ` Nir Soffer
2019-08-26 16:20       ` Max Reitz
2019-08-26 13:46   ` Eric Blake
2019-08-26 15:19     ` Nir Soffer
2019-08-25 22:03 ` [Qemu-devel] [PATCH v2 2/2] iotests: Test allocate_first_block() with O_DIRECT Nir Soffer
2019-08-25 22:41   ` Nir Soffer
2019-08-26 12:38   ` Max Reitz
2019-08-25 22:19 ` [Qemu-devel] [PATCH v2 0/2] Optimize alignment probing no-reply
