* [PATCH 0/2] Replace posix_fallocate() with falloate() @ 2020-08-31 14:01 Nir Soffer 2020-08-31 14:01 ` [PATCH 1/2] block: file-posix: Extract preallocate helpers Nir Soffer ` (3 more replies) 0 siblings, 4 replies; 11+ messages in thread From: Nir Soffer @ 2020-08-31 14:01 UTC (permalink / raw) To: qemu-devel Cc: Kevin Wolf, qemu-block, Markus Armbruster, Max Reitz, Nir Soffer Change preallocation=falloc to use fallocate() instead of posix_fallocte(), improving performance when legacy filesystem that do not support fallocate, and avoiding issues seen with OFD locks. More work is needed to respect cache mode when using full preallocation and maybe optimize buffer size. Continuing the discussion at: https://lists.nongnu.org/archive/html/qemu-block/2020-08/msg00947.html Nir Soffer (2): block: file-posix: Extract preallocate helpers block: file-posix: Replace posix_fallocate with fallocate block/file-posix.c | 202 ++++++++++++++----------- docs/system/qemu-block-drivers.rst.inc | 11 +- docs/tools/qemu-img.rst | 11 +- qapi/block-core.json | 4 +- 4 files changed, 127 insertions(+), 101 deletions(-) -- 2.26.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] block: file-posix: Extract preallocate helpers 2020-08-31 14:01 [PATCH 0/2] Replace posix_fallocate() with falloate() Nir Soffer @ 2020-08-31 14:01 ` Nir Soffer 2020-09-01 10:24 ` Alberto Garcia 2020-09-01 10:26 ` Alberto Garcia 2020-08-31 14:01 ` [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate Nir Soffer ` (2 subsequent siblings) 3 siblings, 2 replies; 11+ messages in thread From: Nir Soffer @ 2020-08-31 14:01 UTC (permalink / raw) To: qemu-devel Cc: Kevin Wolf, qemu-block, Markus Armbruster, Max Reitz, Nir Soffer handle_aiocb_truncate() was too big and complex, implementing 3 different preallocation modes. In a future patch I want to introduce a fallback from "falloc" to "full"; it will be too messy and error prone with the current code. Extract a helper for each of the preallocation modes (falloc, full, off) and leave only the common preparation and cleanup code in handle_aiocb_truncate(). Signed-off-by: Nir Soffer <nsoffer@redhat.com> --- block/file-posix.c | 206 ++++++++++++++++++++++++++------------------- 1 file changed, 120 insertions(+), 86 deletions(-) diff --git a/block/file-posix.c b/block/file-posix.c index 9a00d4190a..341ffb1cb4 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -1832,12 +1832,128 @@ static int allocate_first_block(int fd, size_t max_size) return ret; } +static int preallocate_falloc(int fd, int64_t current_length, int64_t offset, + Error **errp) +{ +#ifdef CONFIG_POSIX_FALLOCATE + int result; + + if (offset == current_length) + return 0; + + /* + * Truncating before posix_fallocate() makes it about twice slower on + * file systems that do not support fallocate(), trying to check if a + * block is allocated before allocating it, so don't do that here. + */ + + result = -posix_fallocate(fd, current_length, + offset - current_length); + if (result != 0) { + /* posix_fallocate() doesn't set errno. */ + error_setg_errno(errp, -result, + "Could not preallocate new data"); + return result; + } + + if (current_length == 0) { + /* + * posix_fallocate() uses fallocate() if the filesystem supports + * it, or fallback to manually writing zeroes. If fallocate() + * was used, unaligned reads from the fallocated area in + * raw_probe_alignment() will succeed, hence we need to allocate + * the first block. + * + * Optimize future alignment probing; ignore failures. + */ + allocate_first_block(fd, offset); + } + + return 0; +#else + return -ENOTSUP; +#endif +} + +static int preallocate_full(int fd, int64_t current_length, int64_t offset, + Error **errp) +{ + int64_t num = 0, left = offset - current_length; + off_t seek_result; + int result; + char *buf = NULL; + + /* + * Knowing the final size from the beginning could allow the file + * system driver to do less allocations and possibly avoid + * fragmentation of the file. + */ + if (ftruncate(fd, offset) != 0) { + result = -errno; + error_setg_errno(errp, -result, "Could not resize file"); + goto out; + } + + buf = g_malloc0(65536); + + seek_result = lseek(fd, current_length, SEEK_SET); + if (seek_result < 0) { + result = -errno; + error_setg_errno(errp, -result, + "Failed to seek to the old end of file"); + goto out; + } + + while (left > 0) { + num = MIN(left, 65536); + result = write(fd, buf, num); + if (result < 0) { + if (errno == EINTR) { + continue; + } + result = -errno; + error_setg_errno(errp, -result, + "Could not write zeros for preallocation"); + goto out; + } + left -= result; + } + + result = fsync(fd); + if (result < 0) { + result = -errno; + error_setg_errno(errp, -result, "Could not flush file to disk"); + goto out; + } + +out: + g_free(buf); + + return result; +} + +static int preallocate_off(int fd, int64_t current_length, int64_t offset, + Error **errp) +{ + if (ftruncate(fd, offset) != 0) { + int result = -errno; + error_setg_errno(errp, -result, "Could not resize file"); + return result; + } + + if (current_length == 0 && offset > current_length) { + /* Optimize future alignment probing; ignore failures. */ + allocate_first_block(fd, offset); + } + + return 0; +} + static int handle_aiocb_truncate(void *opaque) { RawPosixAIOData *aiocb = opaque; int result = 0; int64_t current_length = 0; - char *buf = NULL; struct stat st; int fd = aiocb->aio_fildes; int64_t offset = aiocb->aio_offset; @@ -1859,95 +1975,14 @@ static int handle_aiocb_truncate(void *opaque) switch (prealloc) { #ifdef CONFIG_POSIX_FALLOCATE case PREALLOC_MODE_FALLOC: - /* - * Truncating before posix_fallocate() makes it about twice slower on - * file systems that do not support fallocate(), trying to check if a - * block is allocated before allocating it, so don't do that here. - */ - if (offset != current_length) { - result = -posix_fallocate(fd, current_length, - offset - current_length); - if (result != 0) { - /* posix_fallocate() doesn't set errno. */ - error_setg_errno(errp, -result, - "Could not preallocate new data"); - } else if (current_length == 0) { - /* - * posix_fallocate() uses fallocate() if the filesystem - * supports it, or fallback to manually writing zeroes. If - * fallocate() was used, unaligned reads from the fallocated - * area in raw_probe_alignment() will succeed, hence we need to - * allocate the first block. - * - * Optimize future alignment probing; ignore failures. - */ - allocate_first_block(fd, offset); - } - } else { - result = 0; - } + result = preallocate_falloc(fd, current_length, offset, errp); goto out; #endif case PREALLOC_MODE_FULL: - { - int64_t num = 0, left = offset - current_length; - off_t seek_result; - - /* - * Knowing the final size from the beginning could allow the file - * system driver to do less allocations and possibly avoid - * fragmentation of the file. - */ - if (ftruncate(fd, offset) != 0) { - result = -errno; - error_setg_errno(errp, -result, "Could not resize file"); - goto out; - } - - buf = g_malloc0(65536); - - seek_result = lseek(fd, current_length, SEEK_SET); - if (seek_result < 0) { - result = -errno; - error_setg_errno(errp, -result, - "Failed to seek to the old end of file"); - goto out; - } - - while (left > 0) { - num = MIN(left, 65536); - result = write(fd, buf, num); - if (result < 0) { - if (errno == EINTR) { - continue; - } - result = -errno; - error_setg_errno(errp, -result, - "Could not write zeros for preallocation"); - goto out; - } - left -= result; - } - if (result >= 0) { - result = fsync(fd); - if (result < 0) { - result = -errno; - error_setg_errno(errp, -result, - "Could not flush file to disk"); - goto out; - } - } + result = preallocate_full(fd, current_length, offset, errp); goto out; - } case PREALLOC_MODE_OFF: - if (ftruncate(fd, offset) != 0) { - result = -errno; - error_setg_errno(errp, -result, "Could not resize file"); - } else if (current_length == 0 && offset > current_length) { - /* Optimize future alignment probing; ignore failures. */ - allocate_first_block(fd, offset); - } - return result; + return preallocate_off(fd, current_length, offset, errp); default: result = -ENOTSUP; error_setg(errp, "Unsupported preallocation mode: %s", @@ -1963,7 +1998,6 @@ out: } } - g_free(buf); return result; } -- 2.26.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] block: file-posix: Extract preallocate helpers 2020-08-31 14:01 ` [PATCH 1/2] block: file-posix: Extract preallocate helpers Nir Soffer @ 2020-09-01 10:24 ` Alberto Garcia 2020-09-01 10:26 ` Alberto Garcia 1 sibling, 0 replies; 11+ messages in thread From: Alberto Garcia @ 2020-09-01 10:24 UTC (permalink / raw) To: Nir Soffer, qemu-devel Cc: Kevin Wolf, Markus Armbruster, qemu-block, Max Reitz On Mon 31 Aug 2020 04:01:26 PM CEST, Nir Soffer wrote: > handle_aiocb_truncate() was too big and complex, implementing 3 > different preallocation modes. In a future patch I want to introduce a > fallback from "falloc" to "full"; it will be too messy and error prone > with the current code. > > Extract a helper for each of the preallocation modes (falloc, full, off) > and leave only the common preparation and cleanup code in > handle_aiocb_truncate(). > > Signed-off-by: Nir Soffer <nsoffer@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Berto ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] block: file-posix: Extract preallocate helpers 2020-08-31 14:01 ` [PATCH 1/2] block: file-posix: Extract preallocate helpers Nir Soffer 2020-09-01 10:24 ` Alberto Garcia @ 2020-09-01 10:26 ` Alberto Garcia 2020-09-01 10:47 ` Nir Soffer 1 sibling, 1 reply; 11+ messages in thread From: Alberto Garcia @ 2020-09-01 10:26 UTC (permalink / raw) To: Nir Soffer, qemu-devel Cc: Kevin Wolf, Markus Armbruster, qemu-block, Max Reitz On Mon 31 Aug 2020 04:01:26 PM CEST, Nir Soffer wrote: > +static int preallocate_falloc(int fd, int64_t current_length, int64_t offset, > + Error **errp) > +{ > +#ifdef CONFIG_POSIX_FALLOCATE > + int result; > + > + if (offset == current_length) > + return 0; You can also take the chance to add the missing braces here (there's a similar warning for the other patch). Berto ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] block: file-posix: Extract preallocate helpers 2020-09-01 10:26 ` Alberto Garcia @ 2020-09-01 10:47 ` Nir Soffer 0 siblings, 0 replies; 11+ messages in thread From: Nir Soffer @ 2020-09-01 10:47 UTC (permalink / raw) To: Alberto Garcia Cc: Kevin Wolf, Nir Soffer, qemu-block, Markus Armbruster, QEMU Developers, Max Reitz On Tue, Sep 1, 2020 at 1:27 PM Alberto Garcia <berto@igalia.com> wrote: > > On Mon 31 Aug 2020 04:01:26 PM CEST, Nir Soffer wrote: > > +static int preallocate_falloc(int fd, int64_t current_length, int64_t offset, > > + Error **errp) > > +{ > > +#ifdef CONFIG_POSIX_FALLOCATE > > + int result; > > + > > + if (offset == current_length) > > + return 0; > > You can also take the chance to add the missing braces here (there's a > similar warning for the other patch). Sure, I'll change it in the next version. I forgot to run checkpatch.pl, and it also seems extra work when using git publish. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate 2020-08-31 14:01 [PATCH 0/2] Replace posix_fallocate() with falloate() Nir Soffer 2020-08-31 14:01 ` [PATCH 1/2] block: file-posix: Extract preallocate helpers Nir Soffer @ 2020-08-31 14:01 ` Nir Soffer 2020-09-01 15:51 ` Alberto Garcia 2020-09-14 17:32 ` Daniel P. Berrangé 2020-08-31 15:55 ` [PATCH 0/2] Replace posix_fallocate() with falloate() no-reply 2020-09-14 17:19 ` Nir Soffer 3 siblings, 2 replies; 11+ messages in thread From: Nir Soffer @ 2020-08-31 14:01 UTC (permalink / raw) To: qemu-devel Cc: Kevin Wolf, qemu-block, Markus Armbruster, Max Reitz, Nir Soffer If fallocate() is not supported, posix_fallocate() falls back to inefficient allocation, writing one byte for every 4k bytes[1]. This is very slow compared with writing zeros. In oVirt we measured ~400% improvement in allocation time when replacing posix_fallocate() with manually writing zeroes[2]. We also know that posix_fallocated() does not work well when using OFD locks[3]. We don't know the reason yet for this issue yet. Change preallocate_falloc() to use fallocate() instead of posix_falloate(), and fall back to full preallocation if not supported. Here are quick test results with this change. Before (qemu-img-5.1.0-2.fc32.x86_64): $ time qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc real 0m42.100s user 0m0.602s sys 0m4.137s NFS stats: calls retrans authrefrsh write 1571583 0 1572205 1571321 After: $ time ./qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc real 0m15.551s user 0m0.070s sys 0m2.623s NFS stats: calls retrans authrefrsh write 24620 0 24624 24567 [1] https://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html#96 [2] https://bugzilla.redhat.com/1850267#c25 [3] https://bugzilla.redhat.com/1851097 Signed-off-by: Nir Soffer <nsoffer@redhat.com> --- block/file-posix.c | 32 +++++++++----------------- docs/system/qemu-block-drivers.rst.inc | 11 +++++---- docs/tools/qemu-img.rst | 11 +++++---- qapi/block-core.json | 4 ++-- 4 files changed, 25 insertions(+), 33 deletions(-) diff --git a/block/file-posix.c b/block/file-posix.c index 341ffb1cb4..eac3c0b412 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -1835,36 +1835,24 @@ static int allocate_first_block(int fd, size_t max_size) static int preallocate_falloc(int fd, int64_t current_length, int64_t offset, Error **errp) { -#ifdef CONFIG_POSIX_FALLOCATE +#ifdef CONFIG_FALLOCATE int result; if (offset == current_length) return 0; - /* - * Truncating before posix_fallocate() makes it about twice slower on - * file systems that do not support fallocate(), trying to check if a - * block is allocated before allocating it, so don't do that here. - */ - - result = -posix_fallocate(fd, current_length, - offset - current_length); + result = do_fallocate(fd, 0, current_length, offset - current_length); if (result != 0) { - /* posix_fallocate() doesn't set errno. */ - error_setg_errno(errp, -result, - "Could not preallocate new data"); + error_setg_errno(errp, -result, "Could not preallocate new data"); return result; } if (current_length == 0) { /* - * posix_fallocate() uses fallocate() if the filesystem supports - * it, or fallback to manually writing zeroes. If fallocate() - * was used, unaligned reads from the fallocated area in - * raw_probe_alignment() will succeed, hence we need to allocate - * the first block. + * Unaligned reads from the fallocated area in raw_probe_alignment() + * will succeed, hence we need to allocate the first block. * - * Optimize future alignment probing; ignore failures. + * Optimizes future alignment probing; ignore failures. */ allocate_first_block(fd, offset); } @@ -1973,10 +1961,12 @@ static int handle_aiocb_truncate(void *opaque) } switch (prealloc) { -#ifdef CONFIG_POSIX_FALLOCATE +#ifdef CONFIG_FALLOCATE case PREALLOC_MODE_FALLOC: result = preallocate_falloc(fd, current_length, offset, errp); - goto out; + if (result != -ENOTSUP) + goto out; + /* If fallocate() is not supported, fallback to full preallocation. */ #endif case PREALLOC_MODE_FULL: result = preallocate_full(fd, current_length, offset, errp); @@ -3080,7 +3070,7 @@ static QemuOptsList raw_create_opts = { .name = BLOCK_OPT_PREALLOC, .type = QEMU_OPT_STRING, .help = "Preallocation mode (allowed values: off" -#ifdef CONFIG_POSIX_FALLOCATE +#ifdef CONFIG_FALLOCATE ", falloc" #endif ", full)" diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc index b052a6d14e..8e4acf397e 100644 --- a/docs/system/qemu-block-drivers.rst.inc +++ b/docs/system/qemu-block-drivers.rst.inc @@ -25,11 +25,12 @@ This section describes each format and the options that are supported for it. .. program:: raw .. option:: preallocation - Preallocation mode (allowed values: ``off``, ``falloc``, - ``full``). ``falloc`` mode preallocates space for image by - calling ``posix_fallocate()``. ``full`` mode preallocates space - for image by writing data to underlying storage. This data may or - may not be zero, depending on the storage location. + Preallocation mode (allowed values: ``off``, ``falloc``, ``full``). + ``falloc`` mode preallocates space for image by calling + ``fallocate()``, and falling back to ``full` mode if not supported. + ``full`` mode preallocates space for image by writing data to + underlying storage. This data may or may not be zero, depending on + the storage location. .. program:: image-formats .. option:: qcow2 diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst index c35bd64822..a2089bd1b7 100644 --- a/docs/tools/qemu-img.rst +++ b/docs/tools/qemu-img.rst @@ -750,11 +750,12 @@ Supported image file formats: Supported options: ``preallocation`` - Preallocation mode (allowed values: ``off``, ``falloc``, - ``full``). ``falloc`` mode preallocates space for image by - calling ``posix_fallocate()``. ``full`` mode preallocates space - for image by writing data to underlying storage. This data may or - may not be zero, depending on the storage location. + Preallocation mode (allowed values: ``off``, ``falloc``, ``full``). + ``falloc`` mode preallocates space for image by calling + ``fallocate()``, and falling back to ``full` mode if not supported. + ``full`` mode preallocates space for image by writing data to + underlying storage. This data may or may not be zero, depending on + the storage location. ``qcow2`` diff --git a/qapi/block-core.json b/qapi/block-core.json index db08c58d78..681d79ec63 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -5021,8 +5021,8 @@ # # @off: no preallocation # @metadata: preallocate only for metadata -# @falloc: like @full preallocation but allocate disk space by -# posix_fallocate() rather than writing data. +# @falloc: try to allocate disk space by fallocate(), and fallback to +# @full preallocation if not supported. # @full: preallocate all data by writing it to the device to ensure # disk space is really available. This data may or may not be # zero, depending on the image format and storage. -- 2.26.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate 2020-08-31 14:01 ` [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate Nir Soffer @ 2020-09-01 15:51 ` Alberto Garcia 2020-09-14 17:32 ` Daniel P. Berrangé 1 sibling, 0 replies; 11+ messages in thread From: Alberto Garcia @ 2020-09-01 15:51 UTC (permalink / raw) To: Nir Soffer, qemu-devel Cc: Kevin Wolf, Markus Armbruster, qemu-block, Max Reitz On Mon 31 Aug 2020 04:01:27 PM CEST, Nir Soffer wrote: > If fallocate() is not supported, posix_fallocate() falls back to > inefficient allocation, writing one byte for every 4k bytes[1]. This is > very slow compared with writing zeros. In oVirt we measured ~400% > improvement in allocation time when replacing posix_fallocate() with > manually writing zeroes[2]. > > We also know that posix_fallocated() does not work well when using OFD > locks[3]. We don't know the reason yet for this issue yet. > > Change preallocate_falloc() to use fallocate() instead of > posix_falloate(), and fall back to full preallocation if not > supported. > case PREALLOC_MODE_FALLOC: > result = preallocate_falloc(fd, current_length, offset, errp); > - goto out; > + if (result != -ENOTSUP) > + goto out; > + /* If fallocate() is not supported, fallback to full preallocation. */ With the missing braces in this if statement, Reviewed-by: Alberto Garcia <berto@igalia.com> Berto ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate 2020-08-31 14:01 ` [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate Nir Soffer 2020-09-01 15:51 ` Alberto Garcia @ 2020-09-14 17:32 ` Daniel P. Berrangé 2020-09-15 8:55 ` Nir Soffer 1 sibling, 1 reply; 11+ messages in thread From: Daniel P. Berrangé @ 2020-09-14 17:32 UTC (permalink / raw) To: Nir Soffer Cc: Kevin Wolf, qemu-block, Markus Armbruster, qemu-devel, Max Reitz, Nir Soffer On Mon, Aug 31, 2020 at 05:01:27PM +0300, Nir Soffer wrote: > If fallocate() is not supported, posix_fallocate() falls back to > inefficient allocation, writing one byte for every 4k bytes[1]. This is > very slow compared with writing zeros. In oVirt we measured ~400% > improvement in allocation time when replacing posix_fallocate() with > manually writing zeroes[2]. > > We also know that posix_fallocated() does not work well when using OFD > locks[3]. We don't know the reason yet for this issue yet. > > Change preallocate_falloc() to use fallocate() instead of > posix_falloate(), and fall back to full preallocation if not supported. > > Here are quick test results with this change. > > Before (qemu-img-5.1.0-2.fc32.x86_64): > > $ time qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g > Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc > > real 0m42.100s > user 0m0.602s > sys 0m4.137s > > NFS stats: > calls retrans authrefrsh write > 1571583 0 1572205 1571321 > > After: > > $ time ./qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g > Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc > > real 0m15.551s > user 0m0.070s > sys 0m2.623s > > NFS stats: > calls retrans authrefrsh write > 24620 0 24624 24567 > > [1] https://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html#96 > [2] https://bugzilla.redhat.com/1850267#c25 > [3] https://bugzilla.redhat.com/1851097 This bug appears to be private to RH employees only, so rather than link to it, please summarize any important facts in it for benefit of nonn-RH QEMU contributors. > > Signed-off-by: Nir Soffer <nsoffer@redhat.com> > --- > block/file-posix.c | 32 +++++++++----------------- > docs/system/qemu-block-drivers.rst.inc | 11 +++++---- > docs/tools/qemu-img.rst | 11 +++++---- > qapi/block-core.json | 4 ++-- > 4 files changed, 25 insertions(+), 33 deletions(-) Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate 2020-09-14 17:32 ` Daniel P. Berrangé @ 2020-09-15 8:55 ` Nir Soffer 0 siblings, 0 replies; 11+ messages in thread From: Nir Soffer @ 2020-09-15 8:55 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Kevin Wolf, Nir Soffer, qemu-block, Markus Armbruster, QEMU Developers, Max Reitz On Mon, Sep 14, 2020 at 8:32 PM Daniel P. Berrangé <berrange@redhat.com> wrote: > > On Mon, Aug 31, 2020 at 05:01:27PM +0300, Nir Soffer wrote: > > If fallocate() is not supported, posix_fallocate() falls back to > > inefficient allocation, writing one byte for every 4k bytes[1]. This is > > very slow compared with writing zeros. In oVirt we measured ~400% > > improvement in allocation time when replacing posix_fallocate() with > > manually writing zeroes[2]. > > > > We also know that posix_fallocated() does not work well when using OFD > > locks[3]. We don't know the reason yet for this issue yet. > > > > Change preallocate_falloc() to use fallocate() instead of > > posix_falloate(), and fall back to full preallocation if not supported. > > > > Here are quick test results with this change. > > > > Before (qemu-img-5.1.0-2.fc32.x86_64): > > > > $ time qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g > > Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc > > > > real 0m42.100s > > user 0m0.602s > > sys 0m4.137s > > > > NFS stats: > > calls retrans authrefrsh write > > 1571583 0 1572205 1571321 > > > > After: > > > > $ time ./qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g > > Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc > > > > real 0m15.551s > > user 0m0.070s > > sys 0m2.623s > > > > NFS stats: > > calls retrans authrefrsh write > > 24620 0 24624 24567 > > > > [1] https://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html#96 > > [2] https://bugzilla.redhat.com/1850267#c25 > > [3] https://bugzilla.redhat.com/1851097 > > This bug appears to be private to RH employees only, so rather than link > to it, please summarize any important facts in it for benefit of nonn-RH > QEMU contributors. Thanks, I missed that detail when linking to the bug. The bug is public now. > > Signed-off-by: Nir Soffer <nsoffer@redhat.com> > > --- > > block/file-posix.c | 32 +++++++++----------------- > > docs/system/qemu-block-drivers.rst.inc | 11 +++++---- > > docs/tools/qemu-img.rst | 11 +++++---- > > qapi/block-core.json | 4 ++-- > > 4 files changed, 25 insertions(+), 33 deletions(-) > > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] Replace posix_fallocate() with falloate() 2020-08-31 14:01 [PATCH 0/2] Replace posix_fallocate() with falloate() Nir Soffer 2020-08-31 14:01 ` [PATCH 1/2] block: file-posix: Extract preallocate helpers Nir Soffer 2020-08-31 14:01 ` [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate Nir Soffer @ 2020-08-31 15:55 ` no-reply 2020-09-14 17:19 ` Nir Soffer 3 siblings, 0 replies; 11+ messages in thread From: no-reply @ 2020-08-31 15:55 UTC (permalink / raw) To: nirsof; +Cc: kwolf, qemu-block, armbru, qemu-devel, mreitz, nsoffer Patchew URL: https://patchew.org/QEMU/20200831140127.657134-1-nsoffer@redhat.com/ Hi, This series seems to have some coding style problems. See output below for more information: Type: series Message-id: 20200831140127.657134-1-nsoffer@redhat.com Subject: [PATCH 0/2] Replace posix_fallocate() with falloate() === TEST SCRIPT BEGIN === #!/bin/bash git rev-parse base > /dev/null || exit 0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram ./scripts/checkpatch.pl --mailback base.. === TEST SCRIPT END === From https://github.com/patchew-project/qemu * [new tag] patchew/20200831140127.657134-1-nsoffer@redhat.com -> patchew/20200831140127.657134-1-nsoffer@redhat.com Switched to a new branch 'test' 70e35ed block: file-posix: Replace posix_fallocate with fallocate 35d89d1 block: file-posix: Extract preallocate helpers === OUTPUT BEGIN === 1/2 Checking commit 35d89d1300e4 (block: file-posix: Extract preallocate helpers) ERROR: braces {} are necessary for all arms of this statement #33: FILE: block/file-posix.c:1841: + if (offset == current_length) [...] total: 1 errors, 0 warnings, 234 lines checked Patch 1/2 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 2/2 Checking commit 70e35eda5bc9 (block: file-posix: Replace posix_fallocate with fallocate) ERROR: braces {} are necessary for all arms of this statement #110: FILE: block/file-posix.c:1967: + if (result != -ENOTSUP) [...] total: 1 errors, 0 warnings, 108 lines checked Patch 2/2 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. === OUTPUT END === Test command exited with code: 1 The full log is available at http://patchew.org/logs/20200831140127.657134-1-nsoffer@redhat.com/testing.checkpatch/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] Replace posix_fallocate() with falloate() 2020-08-31 14:01 [PATCH 0/2] Replace posix_fallocate() with falloate() Nir Soffer ` (2 preceding siblings ...) 2020-08-31 15:55 ` [PATCH 0/2] Replace posix_fallocate() with falloate() no-reply @ 2020-09-14 17:19 ` Nir Soffer 3 siblings, 0 replies; 11+ messages in thread From: Nir Soffer @ 2020-09-14 17:19 UTC (permalink / raw) To: Nir Soffer Cc: Kevin Wolf, qemu-block, Markus Armbruster, QEMU Developers, Max Reitz On Mon, Aug 31, 2020 at 5:01 PM Nir Soffer <nirsof@gmail.com> wrote: > > Change preallocation=falloc to use fallocate() instead of > posix_fallocte(), improving performance when legacy filesystem that do > not support fallocate, and avoiding issues seen with OFD locks. > > More work is needed to respect cache mode when using full preallocation > and maybe optimize buffer size. > > Continuing the discussion at: > https://lists.nongnu.org/archive/html/qemu-block/2020-08/msg00947.html > > Nir Soffer (2): > block: file-posix: Extract preallocate helpers > block: file-posix: Replace posix_fallocate with fallocate > > block/file-posix.c | 202 ++++++++++++++----------- > docs/system/qemu-block-drivers.rst.inc | 11 +- > docs/tools/qemu-img.rst | 11 +- > qapi/block-core.json | 4 +- > 4 files changed, 127 insertions(+), 101 deletions(-) Ping ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2020-09-15 8:57 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-08-31 14:01 [PATCH 0/2] Replace posix_fallocate() with falloate() Nir Soffer 2020-08-31 14:01 ` [PATCH 1/2] block: file-posix: Extract preallocate helpers Nir Soffer 2020-09-01 10:24 ` Alberto Garcia 2020-09-01 10:26 ` Alberto Garcia 2020-09-01 10:47 ` Nir Soffer 2020-08-31 14:01 ` [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate Nir Soffer 2020-09-01 15:51 ` Alberto Garcia 2020-09-14 17:32 ` Daniel P. Berrangé 2020-09-15 8:55 ` Nir Soffer 2020-08-31 15:55 ` [PATCH 0/2] Replace posix_fallocate() with falloate() no-reply 2020-09-14 17:19 ` Nir Soffer
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.