All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nir Soffer <nirsof@gmail.com>
To: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
	qemu-block@nongnu.org, Markus Armbruster <armbru@redhat.com>,
	Max Reitz <mreitz@redhat.com>, Nir Soffer <nsoffer@redhat.com>
Subject: [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate
Date: Mon, 31 Aug 2020 17:01:27 +0300	[thread overview]
Message-ID: <20200831140127.657134-3-nsoffer@redhat.com> (raw)
In-Reply-To: <20200831140127.657134-1-nsoffer@redhat.com>

If fallocate() is not supported, posix_fallocate() falls back to
inefficient allocation, writing one byte for every 4k bytes[1]. This is
very slow compared with writing zeros. In oVirt we measured ~400%
improvement in allocation time when replacing posix_fallocate() with
manually writing zeroes[2].

We also know that posix_fallocated() does not work well when using OFD
locks[3]. We don't know the reason yet for this issue yet.

Change preallocate_falloc() to use fallocate() instead of
posix_falloate(), and fall back to full preallocation if not supported.

Here are quick test results with this change.

Before (qemu-img-5.1.0-2.fc32.x86_64):

$ time qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g
Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc

real 0m42.100s
user 0m0.602s
sys 0m4.137s

NFS stats:
calls      retrans    authrefrsh    write
1571583    0          1572205       1571321

After:

$ time ./qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g
Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc

real 0m15.551s
user 0m0.070s
sys 0m2.623s

NFS stats:
calls      retrans    authrefrsh    write
24620      0          24624         24567  

[1] https://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html#96
[2] https://bugzilla.redhat.com/1850267#c25
[3] https://bugzilla.redhat.com/1851097

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
---
 block/file-posix.c                     | 32 +++++++++-----------------
 docs/system/qemu-block-drivers.rst.inc | 11 +++++----
 docs/tools/qemu-img.rst                | 11 +++++----
 qapi/block-core.json                   |  4 ++--
 4 files changed, 25 insertions(+), 33 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 341ffb1cb4..eac3c0b412 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1835,36 +1835,24 @@ static int allocate_first_block(int fd, size_t max_size)
 static int preallocate_falloc(int fd, int64_t current_length, int64_t offset,
                               Error **errp)
 {
-#ifdef CONFIG_POSIX_FALLOCATE
+#ifdef CONFIG_FALLOCATE
     int result;
 
     if (offset == current_length)
         return 0;
 
-    /*
-     * Truncating before posix_fallocate() makes it about twice slower on
-     * file systems that do not support fallocate(), trying to check if a
-     * block is allocated before allocating it, so don't do that here.
-     */
-
-    result = -posix_fallocate(fd, current_length,
-                              offset - current_length);
+    result = do_fallocate(fd, 0, current_length, offset - current_length);
     if (result != 0) {
-        /* posix_fallocate() doesn't set errno. */
-        error_setg_errno(errp, -result,
-                         "Could not preallocate new data");
+        error_setg_errno(errp, -result, "Could not preallocate new data");
         return result;
     }
 
     if (current_length == 0) {
         /*
-         * posix_fallocate() uses fallocate() if the filesystem supports
-         * it, or fallback to manually writing zeroes. If fallocate()
-         * was used, unaligned reads from the fallocated area in
-         * raw_probe_alignment() will succeed, hence we need to allocate
-         * the first block.
+         * Unaligned reads from the fallocated area in raw_probe_alignment()
+         * will succeed, hence we need to allocate the first block.
          *
-         * Optimize future alignment probing; ignore failures.
+         * Optimizes future alignment probing; ignore failures.
          */
         allocate_first_block(fd, offset);
     }
@@ -1973,10 +1961,12 @@ static int handle_aiocb_truncate(void *opaque)
     }
 
     switch (prealloc) {
-#ifdef CONFIG_POSIX_FALLOCATE
+#ifdef CONFIG_FALLOCATE
     case PREALLOC_MODE_FALLOC:
         result = preallocate_falloc(fd, current_length, offset, errp);
-        goto out;
+        if (result != -ENOTSUP)
+            goto out;
+        /* If fallocate() is not supported, fallback to full preallocation. */
 #endif
     case PREALLOC_MODE_FULL:
         result = preallocate_full(fd, current_length, offset, errp);
@@ -3080,7 +3070,7 @@ static QemuOptsList raw_create_opts = {
             .name = BLOCK_OPT_PREALLOC,
             .type = QEMU_OPT_STRING,
             .help = "Preallocation mode (allowed values: off"
-#ifdef CONFIG_POSIX_FALLOCATE
+#ifdef CONFIG_FALLOCATE
                     ", falloc"
 #endif
                     ", full)"
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index b052a6d14e..8e4acf397e 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -25,11 +25,12 @@ This section describes each format and the options that are supported for it.
   .. program:: raw
   .. option:: preallocation
 
-    Preallocation mode (allowed values: ``off``, ``falloc``,
-    ``full``). ``falloc`` mode preallocates space for image by
-    calling ``posix_fallocate()``. ``full`` mode preallocates space
-    for image by writing data to underlying storage. This data may or
-    may not be zero, depending on the storage location.
+    Preallocation mode (allowed values: ``off``, ``falloc``, ``full``).
+    ``falloc`` mode preallocates space for image by calling
+    ``fallocate()``, and falling back to ``full` mode if not supported.
+    ``full`` mode preallocates space for image by writing data to
+    underlying storage. This data may or may not be zero, depending on
+    the storage location.
 
 .. program:: image-formats
 .. option:: qcow2
diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index c35bd64822..a2089bd1b7 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -750,11 +750,12 @@ Supported image file formats:
   Supported options:
 
   ``preallocation``
-    Preallocation mode (allowed values: ``off``, ``falloc``,
-    ``full``).  ``falloc`` mode preallocates space for image by
-    calling ``posix_fallocate()``.  ``full`` mode preallocates space
-    for image by writing data to underlying storage.  This data may or
-    may not be zero, depending on the storage location.
+    Preallocation mode (allowed values: ``off``, ``falloc``, ``full``).
+    ``falloc`` mode preallocates space for image by calling
+    ``fallocate()``, and falling back to ``full` mode if not supported.
+    ``full`` mode preallocates space for image by writing data to
+    underlying storage. This data may or may not be zero, depending on
+    the storage location.
 
 ``qcow2``
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index db08c58d78..681d79ec63 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5021,8 +5021,8 @@
 #
 # @off: no preallocation
 # @metadata: preallocate only for metadata
-# @falloc: like @full preallocation but allocate disk space by
-#          posix_fallocate() rather than writing data.
+# @falloc: try to allocate disk space by fallocate(), and fallback to
+#          @full preallocation if not supported.
 # @full: preallocate all data by writing it to the device to ensure
 #        disk space is really available. This data may or may not be
 #        zero, depending on the image format and storage.
-- 
2.26.2



  parent reply	other threads:[~2020-08-31 14:04 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-31 14:01 [PATCH 0/2] Replace posix_fallocate() with falloate() Nir Soffer
2020-08-31 14:01 ` [PATCH 1/2] block: file-posix: Extract preallocate helpers Nir Soffer
2020-09-01 10:24   ` Alberto Garcia
2020-09-01 10:26   ` Alberto Garcia
2020-09-01 10:47     ` Nir Soffer
2020-08-31 14:01 ` Nir Soffer [this message]
2020-09-01 15:51   ` [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate Alberto Garcia
2020-09-14 17:32   ` Daniel P. Berrangé
2020-09-15  8:55     ` Nir Soffer
2020-08-31 15:55 ` [PATCH 0/2] Replace posix_fallocate() with falloate() no-reply
2020-09-14 17:19 ` Nir Soffer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200831140127.657134-3-nsoffer@redhat.com \
    --to=nirsof@gmail.com \
    --cc=armbru@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=nsoffer@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.