All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Snow <jsnow@redhat.com>
To: Nir Soffer <nirsof@gmail.com>, qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
	qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH] block: posix: Always allocate the first block
Date: Fri, 16 Aug 2019 17:57:05 -0400	[thread overview]
Message-ID: <b24959b4-f2b2-d720-f8b5-4adc25b89278@redhat.com> (raw)
In-Reply-To: <20190816212122.8816-1-nsoffer@redhat.com>



On 8/16/19 5:21 PM, Nir Soffer wrote:
> When creating an image with preallocation "off" or "falloc", the first
> block of the image is typically not allocated. When using Gluster
> storage backed by XFS filesystem, reading this block using direct I/O
> succeeds regardless of request length, fooling alignment detection.
> 
> In this case we fallback to a safe value (4096) instead of the optimal
> value (512), which may lead to unneeded data copying when aligning
> requests.  Allocating the first block avoids the fallback.
> 

Where does this detection/fallback happen? (Can it be improved?)

> When using preallocation=off, we always allocate at least one filesystem
> block:
> 
>     $ ./qemu-img create -f raw test.raw 1g
>     Formatting 'test.raw', fmt=raw size=1073741824
> 
>     $ ls -lhs test.raw
>     4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw
> 
> I did quick performance tests for these flows:
> - Provisioning a VM with a new raw image.
> - Copying disks with qemu-img convert to new raw target image
> 
> I installed Fedora 29 server on raw sparse image, measuring the time
> from clicking "Begin installation" until the "Reboot" button appears:
> 
> Before(s)  After(s)     Diff(%)
> -------------------------------
>      356        389        +8.4
> 
> I ran this only once, so we cannot tell much from these results.
> 

That seems like a pretty big difference for just having pre-allocated a
single block. What was the actual command line / block graph for that test?

Was this over a network that could explain the variance?

> The second test was cloning the installation image with qemu-img
> convert, doing 10 runs:
> 
>     for i in $(seq 10); do
>         rm -f dst.raw
>         sleep 10
>         time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
>     done
> 
> Here is a table comparing the total time spent:
> 
> Type    Before(s)   After(s)    Diff(%)
> ---------------------------------------
> real      530.028    469.123      -11.4
> user       17.204     10.768      -37.4
> sys        17.881      7.011      -60.7
> 
> Here we see very clear improvement in CPU usage.
> 

Hard to argue much with that. I feel a little strange trying to force
the allocation of the first block, but I suppose in practice "almost no
preallocation" is indistinguishable from "exactly no preallocation" if
you squint.

> Signed-off-by: Nir Soffer <nsoffer@redhat.com>
> ---
>  block/file-posix.c         | 25 +++++++++++++++++++++++++
>  tests/qemu-iotests/150.out |  1 +
>  tests/qemu-iotests/160     |  4 ++++
>  tests/qemu-iotests/175     | 19 +++++++++++++------
>  tests/qemu-iotests/175.out |  8 ++++----
>  tests/qemu-iotests/221.out | 12 ++++++++----
>  tests/qemu-iotests/253.out | 12 ++++++++----
>  7 files changed, 63 insertions(+), 18 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index b9c33c8f6c..3964dd2021 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1755,6 +1755,27 @@ static int handle_aiocb_discard(void *opaque)
>      return ret;
>  }
>  
> +/*
> + * Help alignment detection by allocating the first block.
> + *
> + * When reading with direct I/O from unallocated area on Gluster backed by XFS,
> + * reading succeeds regardless of request length. In this case we fallback to
> + * safe aligment which is not optimal. Allocating the first block avoids this
> + * fallback.
> + *
> + * Returns: 0 on success, -errno on failure.
> + */
> +static int allocate_first_block(int fd)
> +{
> +    ssize_t n;
> +
> +    do {
> +        n = pwrite(fd, "\0", 1, 0);
> +    } while (n == -1 && errno == EINTR);
> +
> +    return (n == -1) ? -errno : 0;
> +}
> +
>  static int handle_aiocb_truncate(void *opaque)
>  {
>      RawPosixAIOData *aiocb = opaque;
> @@ -1794,6 +1815,8 @@ static int handle_aiocb_truncate(void *opaque)
>                  /* posix_fallocate() doesn't set errno. */
>                  error_setg_errno(errp, -result,
>                                   "Could not preallocate new data");
> +            } else if (current_length == 0) {
> +                allocate_first_block(fd);
>              }
>          } else {
>              result = 0;
> @@ -1855,6 +1878,8 @@ static int handle_aiocb_truncate(void *opaque)
>          if (ftruncate(fd, offset) != 0) {
>              result = -errno;
>              error_setg_errno(errp, -result, "Could not resize file");
> +        } else if (current_length == 0 && offset > current_length) {
> +            allocate_first_block(fd);
>          }
>          return result;
>      default:
> diff --git a/tests/qemu-iotests/150.out b/tests/qemu-iotests/150.out
> index 2a54e8dcfa..3cdc7727a5 100644
> --- a/tests/qemu-iotests/150.out
> +++ b/tests/qemu-iotests/150.out
> @@ -3,6 +3,7 @@ QA output created by 150
>  === Mapping sparse conversion ===
>  
>  Offset          Length          File
> +0               0x1000          TEST_DIR/t.IMGFMT
>  
>  === Mapping non-sparse conversion ===
>  
> diff --git a/tests/qemu-iotests/160 b/tests/qemu-iotests/160
> index df89d3864b..ad2d054a47 100755
> --- a/tests/qemu-iotests/160
> +++ b/tests/qemu-iotests/160
> @@ -57,6 +57,10 @@ for skip in $TEST_SKIP_BLOCKS; do
>      $QEMU_IMG dd if="$TEST_IMG" of="$TEST_IMG.out" skip="$skip" -O "$IMGFMT" \
>          2> /dev/null
>      TEST_IMG="$TEST_IMG.out" _check_test_img
> +
> +    # We always write the first byte of an image.
> +    printf "\0" > "$TEST_IMG.out.dd"
> +
>      dd if="$TEST_IMG" of="$TEST_IMG.out.dd" skip="$skip" status=none
>  
>      echo
> diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
> index 51e62c8276..c6a3a7bb1e 100755
> --- a/tests/qemu-iotests/175
> +++ b/tests/qemu-iotests/175
> @@ -37,14 +37,16 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>  # the file size.  This function hides the resulting difference in the
>  # stat -c '%b' output.
>  # Parameter 1: Number of blocks an empty file occupies
> -# Parameter 2: Image size in bytes
> +# Parameter 2: Minimal number of blocks in an image
> +# Parameter 3: Image size in bytes
>  _filter_blocks()
>  {
>      extra_blocks=$1
> -    img_size=$2
> +    min_blocks=$2
> +    img_size=$3
>  
> -    sed -e "s/blocks=$extra_blocks\\(\$\\|[^0-9]\\)/nothing allocated/" \
> -        -e "s/blocks=$((extra_blocks + img_size / 512))\\(\$\\|[^0-9]\\)/everything allocated/"
> +    sed -e "s/blocks=$((extra_blocks + min_blocks))\\(\$\\|[^0-9]\\)/min allocation/" \
> +        -e "s/blocks=$((extra_blocks + img_size / 512))\\(\$\\|[^0-9]\\)/max allocation/"
>  }
>  
>  # get standard environment, filters and checks
> @@ -60,16 +62,21 @@ size=$((1 * 1024 * 1024))
>  touch "$TEST_DIR/empty"
>  extra_blocks=$(stat -c '%b' "$TEST_DIR/empty")
>  
> +# We always write the first byte; check how many blocks this filesystem
> +# allocates to match empty image alloation.
> +printf "\0" > "$TEST_DIR/empty"
> +min_blocks=$(stat -c '%b' "$TEST_DIR/empty")
> +
>  echo
>  echo "== creating image with default preallocation =="
>  _make_test_img $size | _filter_imgfmt
> -stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $size
> +stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $min_blocks $size
>  
>  for mode in off full falloc; do
>      echo
>      echo "== creating image with preallocation $mode =="
>      IMGOPTS=preallocation=$mode _make_test_img $size | _filter_imgfmt
> -    stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $size
> +    stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks $min_blocks $size
>  done
>  
>  # success, all done
> diff --git a/tests/qemu-iotests/175.out b/tests/qemu-iotests/175.out
> index 6d9a5ed84e..263e521262 100644
> --- a/tests/qemu-iotests/175.out
> +++ b/tests/qemu-iotests/175.out
> @@ -2,17 +2,17 @@ QA output created by 175
>  
>  == creating image with default preallocation ==
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
> -size=1048576, nothing allocated
> +size=1048576, min allocation
>  
>  == creating image with preallocation off ==
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=off
> -size=1048576, nothing allocated
> +size=1048576, min allocation
>  
>  == creating image with preallocation full ==
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=full
> -size=1048576, everything allocated
> +size=1048576, max allocation
>  
>  == creating image with preallocation falloc ==
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 preallocation=falloc
> -size=1048576, everything allocated
> +size=1048576, max allocation
>   *** done
> diff --git a/tests/qemu-iotests/221.out b/tests/qemu-iotests/221.out
> index 9f9dd52bb0..dca024a0c3 100644
> --- a/tests/qemu-iotests/221.out
> +++ b/tests/qemu-iotests/221.out
> @@ -3,14 +3,18 @@ QA output created by 221
>  === Check mapping of unaligned raw image ===
>  
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65537
> -[{ "start": 0, "length": 66048, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
> -[{ "start": 0, "length": 66048, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
> +[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> +{ "start": 4096, "length": 61952, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
> +[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> +{ "start": 4096, "length": 61952, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
>  wrote 1/1 bytes at offset 65536
>  1 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -[{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
> +[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> +{ "start": 4096, "length": 61440, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
>  { "start": 65536, "length": 1, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
>  { "start": 65537, "length": 511, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
> -[{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
> +[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> +{ "start": 4096, "length": 61440, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
>  { "start": 65536, "length": 1, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
>  { "start": 65537, "length": 511, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
>  *** done
> diff --git a/tests/qemu-iotests/253.out b/tests/qemu-iotests/253.out
> index 607c0baa0b..3d08b305d7 100644
> --- a/tests/qemu-iotests/253.out
> +++ b/tests/qemu-iotests/253.out
> @@ -3,12 +3,16 @@ QA output created by 253
>  === Check mapping of unaligned raw image ===
>  
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048575
> -[{ "start": 0, "length": 1048576, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
> -[{ "start": 0, "length": 1048576, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
> +[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> +{ "start": 4096, "length": 1044480, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
> +[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> +{ "start": 4096, "length": 1044480, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
>  wrote 65535/65535 bytes at offset 983040
>  63.999 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -[{ "start": 0, "length": 983040, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
> +[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> +{ "start": 4096, "length": 978944, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
>  { "start": 983040, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
> -[{ "start": 0, "length": 983040, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
> +[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> +{ "start": 4096, "length": 978944, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
>  { "start": 983040, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
>  *** done
> 


  reply	other threads:[~2019-08-16 21:58 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-16 21:21 [Qemu-devel] [PATCH] block: posix: Always allocate the first block Nir Soffer
2019-08-16 21:57 ` John Snow [this message]
2019-08-16 22:45   ` [Qemu-devel] [Qemu-block] " Nir Soffer
2019-08-16 23:00     ` John Snow
2019-08-22 11:30 ` [Qemu-devel] " Nir Soffer
2019-08-22 14:28 ` Max Reitz
2019-08-22 16:39   ` Nir Soffer
2019-08-22 18:11     ` Max Reitz
2019-08-22 19:01       ` Nir Soffer
2019-08-23 13:58         ` Max Reitz
2019-08-23 16:30           ` Nir Soffer
2019-08-23 17:41             ` Max Reitz
2019-08-23 16:48           ` Nir Soffer
2019-08-23 17:53             ` Max Reitz
2019-08-24 22:57               ` Nir Soffer
2019-08-25  7:44 ` [Qemu-devel] [Qemu-block] " Maxim Levitsky
2019-08-25 19:51   ` Nir Soffer
2019-08-25 22:17     ` Maxim Levitsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b24959b4-f2b2-d720-f8b5-4adc25b89278@redhat.com \
    --to=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=nirsof@gmail.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.