* [Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)?
@ 2017-02-02 12:30 Stephane Chazelas
  2017-02-02 15:23 ` Laszlo Ersek
  2017-02-07 23:43 ` Max Reitz
  0 siblings, 2 replies; 8+ messages in thread
From: Stephane Chazelas @ 2017-02-02 12:30 UTC (permalink / raw)
  To: qemu-devel

Hello,

since qemu-2.7.0, doing synchronised I/O in a VM (tested with an
Ubuntu 16.04 amd64 VM) while the disk is backed by a qcow2 file
sitting on a ZFS filesystem (ZFS on Linux on Debian jessie (PVE))
gives dreadful performance:

# time dd if=/dev/zero count=1000  of=b oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 21.9908 s, 23.3 kB/s
dd if=/dev/zero count=1000 of=b oflag=dsync  0.00s user 0.04s system 0% cpu 21.992 total

(22 seconds to write that half megabyte). Same with O_SYNC or
O_DIRECT, or doing fsync() or sync_file_range() after each
write().

I first noticed it for dpkg unpacking kernel headers where dpkg
does a sync_file_range() after each file is extracted.

Note that it doesn't happen when writing anything other than
zeroes (e.g. tr '\0' x < /dev/zero | dd count=1000 of=b
oflag=dsync). In the case of the kernel headers, I suppose the
zeroes come from the unfilled parts of the ext4 blocks.

Running strace -fc on the qemu process shows that 98% of the time
is spent in the lseek() system call.

That's the lseek(SEEK_DATA) followed by lseek(SEEK_HOLE) done by
find_allocation(), which is called to find out whether sectors
fall within a hole in a sparse file.

#0  lseek64 () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000561287cf4ca8 in find_allocation (bs=0x7fd898d70000, hole=<synthetic pointer>, data=<synthetic pointer>, start=<optimized out>)
    at block/raw-posix.c:1702
#2  raw_co_get_block_status (bs=0x7fd898d70000, sector_num=<optimized out>, nb_sectors=40, pnum=0x7fd80dd05aac, file=0x7fd80dd05ab0) at block/raw-posix.c:1765
#3  0x0000561287cfae92 in bdrv_co_get_block_status (bs=0x7fd898d70000, sector_num=sector_num@entry=1303680, nb_sectors=40, pnum=pnum@entry=0x7fd80dd05aac,
    file=file@entry=0x7fd80dd05ab0) at block/io.c:1709
#4  0x0000561287cfafaa in bdrv_co_get_block_status (bs=bs@entry=0x7fd898d66000, sector_num=sector_num@entry=33974144, nb_sectors=<optimized out>,
    nb_sectors@entry=40, pnum=pnum@entry=0x7fd80dd05bbc, file=file@entry=0x7fd80dd05bc0) at block/io.c:1742
#5  0x0000561287cfb0bb in bdrv_co_get_block_status_above (file=0x7fd80dd05bc0, pnum=0x7fd80dd05bbc, nb_sectors=40, sector_num=33974144, base=0x0,
    bs=<optimized out>) at block/io.c:1776
#6  bdrv_get_block_status_above_co_entry (opaque=opaque@entry=0x7fd80dd05b40) at block/io.c:1792
#7  0x0000561287cfae08 in bdrv_get_block_status_above (bs=0x7fd898d66000, base=base@entry=0x0, sector_num=<optimized out>, nb_sectors=nb_sectors@entry=40,
    pnum=pnum@entry=0x7fd80dd05bbc, file=file@entry=0x7fd80dd05bc0) at block/io.c:1824
#8  0x0000561287cd372d in is_zero_sectors (bs=<optimized out>, start=<optimized out>, count=40) at block/qcow2.c:2428
#9  0x0000561287cd38ed in is_zero_sectors (count=<optimized out>, start=<optimized out>, bs=<optimized out>) at block/qcow2.c:2471
#10 qcow2_co_pwrite_zeroes (bs=0x7fd898d66000, offset=33974144, count=24576, flags=2724114573) at block/qcow2.c:2452
#11 0x0000561287cfcb7f in bdrv_co_do_pwrite_zeroes (bs=bs@entry=0x7fd898d66000, offset=offset@entry=17394782208, count=count@entry=4096,
    flags=flags@entry=BDRV_REQ_ZERO_WRITE) at block/io.c:1218
#12 0x0000561287cfd0cb in bdrv_aligned_pwritev (bs=0x7fd898d66000, req=<optimized out>, offset=17394782208, bytes=4096, align=1, qiov=0x0,
    flags=<optimized out>) at block/io.c:1320
#13 0x0000561287cfe450 in bdrv_co_do_zero_pwritev (req=<optimized out>, flags=<optimized out>, bytes=<optimized out>, offset=<optimized out>,
    bs=<optimized out>) at block/io.c:1422
#14 bdrv_co_pwritev (child=0x15, offset=17394782208, bytes=4096, qiov=0x7fd8a25eb08d <lseek64+45>, qiov@entry=0x0, flags=231758512) at block/io.c:1492
#15 0x0000561287cefdc7 in blk_co_pwritev (blk=0x7fd898cad540, offset=17394782208, bytes=4096, qiov=0x0, flags=<optimized out>) at block/block-backend.c:788
#16 0x0000561287cefeeb in blk_aio_write_entry (opaque=0x7fd812941440) at block/block-backend.c:982
#17 0x0000561287d67c7a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78
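
In essence, each probe boils down to two seeks. Here is a minimal
sketch of the pattern in C (simplified; the real find_allocation()
in block/raw-posix.c has more error handling around this):

#define _GNU_SOURCE        /* SEEK_DATA/SEEK_HOLE */
#include <unistd.h>
#include <errno.h>

/* Find the first data at or after 'start', then the first hole at or
 * after that data.  Returns 0 on success, -errno on failure (ENXIO
 * meaning no data between 'start' and EOF, i.e. a trailing hole). */
static int find_allocation_sketch(int fd, off_t start,
                                  off_t *data, off_t *hole)
{
    *data = lseek(fd, start, SEEK_DATA);
    if (*data < 0)
        return -errno;
    *hole = lseek(fd, *data, SEEK_HOLE);
    if (*hole < 0)
        return -errno;
    return 0;
}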

Now, performance is really bad on ZFS for those lseek().
I believe that's https://github.com/zfsonlinux/zfs/issues/4306

Until that's fixed in ZFS, I need to find a way to avoid those
lseek()s in the first place.

One way is to downgrade to 2.6.2 where those lseek()s are not
called. The change that introduced them seems to be:

https://github.com/qemu/qemu/commit/2928abce6d1d426d37c0a9bd5f85fb95cf33f709
(and there have been further changes to improve it later).

If I understand correctly, that change was about preventing data
from being allocated when the user is writing unaligned zeroes.

I suppose the idea is that if something is trying to write zeroes
in the middle of an _allocated_ qcow2 cluster, but the
corresponding sectors in the file underneath are in a hole, we
don't want to write those zeroes, as that would allocate the data
at the file level.
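
In other words, something like this, if I read the code right (a
simplified sketch with made-up helper names, assuming the request
fits within one cluster; not the actual qcow2 code):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only: */
bool range_reads_as_zeroes(uint64_t offset, uint64_t bytes); /* lseek()-based */
void mark_cluster_as_zero(uint64_t cluster_offset);
void write_zeroes(uint64_t offset, uint64_t bytes);

/* Unaligned zero write into [offset, offset + bytes): */
void pwrite_zeroes_sketch(uint64_t offset, uint64_t bytes,
                          uint64_t cluster_size)
{
    uint64_t cl_start = offset & ~(cluster_size - 1);
    uint64_t cl_end   = cl_start + cluster_size;

    /* If the head and tail of the cluster already read back as
     * zeroes (this is where is_zero_sectors() does the lseek()s),
     * the whole cluster can be marked as a zero cluster without
     * allocating anything in the underlying file... */
    if (range_reads_as_zeroes(cl_start, offset - cl_start) &&
        range_reads_as_zeroes(offset + bytes, cl_end - (offset + bytes))) {
        mark_cluster_as_zero(cl_start);
    } else {
        /* ...otherwise the zeroes have to be written out as data. */
        write_zeroes(offset, bytes);
    }
}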

I can see it makes sense, but in my case, the little space
efficiency it brings is largely overshadowed by the sharp
decrease in performance.

For now, I work around it by changing the "#ifdef SEEK_DATA"
to "#if 0" in find_allocation().

Note that passing detect-zeroes=off or detect-zeroes=unmap (with
discard) doesn't help (even though FALLOC_FL_PUNCH_HOLE is
supported on ZFS on Linux).
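
For reference, this is the kind of -drive configuration I tried
(the path and interface are just examples):

qemu-system-x86_64 ... \
  -drive file=/tank/vm/disk.qcow2,format=qcow2,if=virtio,discard=unmap,detect-zeroes=unmap

(and the same with detect-zeroes=off).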

Is there any other way to prevent those lseek()s without having
to rebuild qemu?

Would you consider adding an option to disable that behaviour
(skip checking allocation at file level for qcow2 image)?

Thanks,
Stephane


* Re: [Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)?
  2017-02-02 12:30 [Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)? Stephane Chazelas
@ 2017-02-02 15:23 ` Laszlo Ersek
  2017-02-02 16:03   ` Stephane Chazelas
  2017-02-07 23:43 ` Max Reitz
  1 sibling, 1 reply; 8+ messages in thread
From: Laszlo Ersek @ 2017-02-02 15:23 UTC (permalink / raw)
  To: Stephane Chazelas; +Cc: qemu-devel

On 02/02/17 13:30, Stephane Chazelas wrote:
> Hello,
> 
> since qemu-2.7.0, doing synchronised I/O in a VM (tested with an
> Ubuntu 16.04 amd64 VM) while the disk is backed by a qcow2 file
> sitting on a ZFS filesystem (ZFS on Linux on Debian jessie (PVE))
> gives dreadful performance:
> 
> # time dd if=/dev/zero count=1000  of=b oflag=dsync
> 1000+0 records in
> 1000+0 records out
> 512000 bytes (512 kB, 500 KiB) copied, 21.9908 s, 23.3 kB/s
> dd if=/dev/zero count=1000 of=b oflag=dsync  0.00s user 0.04s system 0% cpu 21.992 total
> 
> (22 seconds to write that half megabyte). Same with O_SYNC or
> O_DIRECT, or doing fsync() or sync_file_range() after each
> write().
> 
> I first noticed it for dpkg unpacking kernel headers where dpkg
> does a sync_file_range() after each file is extracted.
> 
> Note that it doesn't happen when writing anything other than
> zeroes (e.g. tr '\0' x < /dev/zero | dd count=1000 of=b
> oflag=dsync). In the case of the kernel headers, I suppose the
> zeroes come from the unfilled parts of the ext4 blocks.
> 
> Running strace -fc on the qemu process shows that 98% of the time
> is spent in the lseek() system call.
> 
> That's the lseek(SEEK_DATA) followed by lseek(SEEK_HOLE) done by
> find_allocation(), which is called to find out whether sectors
> fall within a hole in a sparse file.
> 
> #0  lseek64 () at ../sysdeps/unix/syscall-template.S:81
> #1  0x0000561287cf4ca8 in find_allocation (bs=0x7fd898d70000, hole=<synthetic pointer>, data=<synthetic pointer>, start=<optimized out>)
>     at block/raw-posix.c:1702
> #2  raw_co_get_block_status (bs=0x7fd898d70000, sector_num=<optimized out>, nb_sectors=40, pnum=0x7fd80dd05aac, file=0x7fd80dd05ab0) at block/raw-posix.c:1765
> #3  0x0000561287cfae92 in bdrv_co_get_block_status (bs=0x7fd898d70000, sector_num=sector_num@entry=1303680, nb_sectors=40, pnum=pnum@entry=0x7fd80dd05aac,
>     file=file@entry=0x7fd80dd05ab0) at block/io.c:1709
> #4  0x0000561287cfafaa in bdrv_co_get_block_status (bs=bs@entry=0x7fd898d66000, sector_num=sector_num@entry=33974144, nb_sectors=<optimized out>,
>     nb_sectors@entry=40, pnum=pnum@entry=0x7fd80dd05bbc, file=file@entry=0x7fd80dd05bc0) at block/io.c:1742
> #5  0x0000561287cfb0bb in bdrv_co_get_block_status_above (file=0x7fd80dd05bc0, pnum=0x7fd80dd05bbc, nb_sectors=40, sector_num=33974144, base=0x0,
>     bs=<optimized out>) at block/io.c:1776
> #6  bdrv_get_block_status_above_co_entry (opaque=opaque@entry=0x7fd80dd05b40) at block/io.c:1792
> #7  0x0000561287cfae08 in bdrv_get_block_status_above (bs=0x7fd898d66000, base=base@entry=0x0, sector_num=<optimized out>, nb_sectors=nb_sectors@entry=40,
>     pnum=pnum@entry=0x7fd80dd05bbc, file=file@entry=0x7fd80dd05bc0) at block/io.c:1824
> #8  0x0000561287cd372d in is_zero_sectors (bs=<optimized out>, start=<optimized out>, count=40) at block/qcow2.c:2428
> #9  0x0000561287cd38ed in is_zero_sectors (count=<optimized out>, start=<optimized out>, bs=<optimized out>) at block/qcow2.c:2471
> #10 qcow2_co_pwrite_zeroes (bs=0x7fd898d66000, offset=33974144, count=24576, flags=2724114573) at block/qcow2.c:2452
> #11 0x0000561287cfcb7f in bdrv_co_do_pwrite_zeroes (bs=bs@entry=0x7fd898d66000, offset=offset@entry=17394782208, count=count@entry=4096,
>     flags=flags@entry=BDRV_REQ_ZERO_WRITE) at block/io.c:1218
> #12 0x0000561287cfd0cb in bdrv_aligned_pwritev (bs=0x7fd898d66000, req=<optimized out>, offset=17394782208, bytes=4096, align=1, qiov=0x0,
>     flags=<optimized out>) at block/io.c:1320
> #13 0x0000561287cfe450 in bdrv_co_do_zero_pwritev (req=<optimized out>, flags=<optimized out>, bytes=<optimized out>, offset=<optimized out>,
>     bs=<optimized out>) at block/io.c:1422
> #14 bdrv_co_pwritev (child=0x15, offset=17394782208, bytes=4096, qiov=0x7fd8a25eb08d <lseek64+45>, qiov@entry=0x0, flags=231758512) at block/io.c:1492
> #15 0x0000561287cefdc7 in blk_co_pwritev (blk=0x7fd898cad540, offset=17394782208, bytes=4096, qiov=0x0, flags=<optimized out>) at block/block-backend.c:788
> #16 0x0000561287cefeeb in blk_aio_write_entry (opaque=0x7fd812941440) at block/block-backend.c:982
> #17 0x0000561287d67c7a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78
> 
> Now, performance is really bad on ZFS for those lseek().
> I believe that's https://github.com/zfsonlinux/zfs/issues/4306
> 
> Until that's fixed in ZFS, I need to find a way to avoid those
> lseek()s in the first place.
> 
> One way is to downgrade to 2.6.2 where those lseek()s are not
> called. The change that introduced them seems to be:
> 
> https://github.com/qemu/qemu/commit/2928abce6d1d426d37c0a9bd5f85fb95cf33f709
> (and there have been further changes to improve it later).
> 
> If I understand correctly, that change was about preventing data
> from being allocated when the user is writing unaligned zeroes.
> 
> I suppose the idea is that if something is trying to write zeroes
> in the middle of an _allocated_ qcow2 cluster, but the
> corresponding sectors in the file underneath are in a hole, we
> don't want to write those zeroes, as that would allocate the data
> at the file level.
> 
> I can see it makes sense, but in my case, the little space
> efficiency it brings is largely overshadowed by the sharp
> decrease in performance.
> 
> For now, I work around it by changing the "#ifdef SEEK_DATA"
> to "#if 0" in find_allocation().
> 
> Note that passing detect-zeroes=off or detect-zeroes=unmap (with
> discard) doesn't help (even though FALLOC_FL_PUNCH_HOLE is
> supported on ZFS on Linux).
> 
> Is there any other way to prevent those lseek()s without having
> to rebuild qemu?

My suggestion will likely be incredibly lame, but let's hope it at least
directs some attention to your query.

You didn't mention what qcow2 features you use -- vmstate, snapshots,
backing files (chains of them), compression?

Since commit 2928abce6d1d only modifies "block/qcow2.c", you could
switch / convert the images to "raw". "raw" still benefits from sparse
files (which ZFS-on-Linux apparently supports). Sparse files (i.e., the
disk space savings) are the most important feature to me at least.
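
Something like this should do it (untested, file names made up):

qemu-img convert -f qcow2 -O raw guest.qcow2 guest.raw

qemu-img writes the output sparsely by default, so zero or
unallocated parts of the qcow2 image should become holes in the
raw file.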

Thanks (and sorry again about the lame idea... you likely have good
reasons for qcow2...)
Laszlo

> 
> Would you consider adding an option to disable that behaviour
> (skip checking allocation at file level for qcow2 image)?
> 
> Thanks,
> Stephane
> 
> 


* Re: [Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)?
  2017-02-02 15:23 ` Laszlo Ersek
@ 2017-02-02 16:03   ` Stephane Chazelas
  0 siblings, 0 replies; 8+ messages in thread
From: Stephane Chazelas @ 2017-02-02 16:03 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: qemu-devel

2017-02-02 16:23:53 +0100, Laszlo Ersek:
[...]
> You didn't mention what qcow2 features you use -- vmstate, snapshots,
> backing files (chains of them), compression?
> 
> Since commit 2928abce6d1d only modifies "block/qcow2.c", you could
> switch / convert the images to "raw". "raw" still benefits from sparse
> files (which ZFS-on-Linux apparently supports). Sparse files (i.e., the
> disk space savings) are the most important feature to me at least.
[...]

Thanks for the feedback.

Sorry for not mentioning it in the first place, but I do need the
vmstate and snapshots (non-linear snapshots even, which means ZFS
zvol snapshots as done by Proxmox VE are not an option either,
and neither is vmdk).


I hadn't tested it before now, but what I observe with raw
devices and discard=on,detect-zeroes=unmap (and the virtio-scsi
interface) is that, upon those "synced writes of zeroes" into
allocated data, qemu does a

[pid 10535] fallocate(14, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 136314880, 4096) = 0

into the disk image.

(and no lseek(SEEK_DATA/SEEK_HOLE))

which I don't see when using qcow2 images.
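
For reference, that call is essentially the following (a minimal C
sketch of the syscall shown by strace above):

#define _GNU_SOURCE
#include <fcntl.h>    /* fallocate(), FALLOC_FL_* */

/* Deallocate 'len' bytes at 'offset' without changing the file
 * size; subsequent reads of that range return zeroes. */
static int punch_hole(int fd, off_t offset, off_t len)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, len);
}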

If the qcow2 driver were updated to do the same (punch holes
regardless, instead of checking beforehand whether the data is
allocated), that would also solve my problem (anything that
avoids those lseek()s being called would).

Another thing I've not mentioned clearly is the versions of qemu
I have been testing with: 2.7, 2.7.1 (those two on Proxmox VE
4.4 (based on Debian jessie)) and 2.8.0 (the latter for
verification on a Debian unstable system, not with zfs).

-- 
Stephane


* Re: [Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)?
  2017-02-02 12:30 [Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)? Stephane Chazelas
  2017-02-02 15:23 ` Laszlo Ersek
@ 2017-02-07 23:43 ` Max Reitz
  2017-02-08 14:06   ` Stephane Chazelas
  2017-02-08 14:20   ` Stephane Chazelas
  1 sibling, 2 replies; 8+ messages in thread
From: Max Reitz @ 2017-02-07 23:43 UTC (permalink / raw)
  To: Stephane Chazelas, qemu-devel, Qemu-block, Kevin Wolf


Hi,

I've been thinking about the issue but I'm not sure I've come to a
resolution you'll like much.

I'm not really in favor of optimizing code for ZFS, especially if that
means worse code for every other case. I think it very much makes sense
to assume that lseek(SEEK_{DATA,HOLE}) is faster than writing data to
disk, and actually so much faster that it even pays off if you sometimes
do the lseek() only to find out that you still have to write the data.

Therefore, the patch as it is makes sense. The fact that said lseek() is
slow on ZFS is (in my humble opinion) the ZFS driver's problem that
needs to be fixed there.

If ZFS has a good alternative way for us to check whether a given area
of a file will return zeroes when read, I'm all ears, and it might be a
good idea to use it. That is, if someone else can write the code for
it, because I'd rather not if it requires ZFS headers and a ZFS setup
for testing.

(Determining whether a file has a hole in it, and where that hole is,
has actually plagued us for a while now. lseek() seemed to be the most
widespread way to do it with the fewest pitfalls.)

OTOH, it may make sense to offer a way for the user to disable
lseek(SEEK_{DATA,HOLE}) in our "file" block driver. That way your issue
would be solved, too, I guess. I'll look into it.


Max



On 02.02.2017 13:30, Stephane Chazelas wrote:
> Hello,
> 
> since qemu-2.7.0, doing synchronised I/O in a VM (tested with an
> Ubuntu 16.04 amd64 VM) while the disk is backed by a qcow2 file
> sitting on a ZFS filesystem (ZFS on Linux on Debian jessie (PVE))
> gives dreadful performance:
> 
> # time dd if=/dev/zero count=1000  of=b oflag=dsync
> 1000+0 records in
> 1000+0 records out
> 512000 bytes (512 kB, 500 KiB) copied, 21.9908 s, 23.3 kB/s
> dd if=/dev/zero count=1000 of=b oflag=dsync  0.00s user 0.04s system 0% cpu 21.992 total
> 
> (22 seconds to write that half megabyte). Same with O_SYNC or
> O_DIRECT, or doing fsync() or sync_file_range() after each
> write().
> 
> I first noticed it for dpkg unpacking kernel headers where dpkg
> does a sync_file_range() after each file is extracted.
> 
> Note that it doesn't happen when writing anything other than
> zeroes (e.g. tr '\0' x < /dev/zero | dd count=1000 of=b
> oflag=dsync). In the case of the kernel headers, I suppose the
> zeroes come from the unfilled parts of the ext4 blocks.
> 
> Running strace -fc on the qemu process shows that 98% of the time
> is spent in the lseek() system call.
> 
> That's the lseek(SEEK_DATA) followed by lseek(SEEK_HOLE) done by
> find_allocation(), which is called to find out whether sectors
> fall within a hole in a sparse file.
> 
> #0  lseek64 () at ../sysdeps/unix/syscall-template.S:81
> #1  0x0000561287cf4ca8 in find_allocation (bs=0x7fd898d70000, hole=<synthetic pointer>, data=<synthetic pointer>, start=<optimized out>)
>     at block/raw-posix.c:1702
> #2  raw_co_get_block_status (bs=0x7fd898d70000, sector_num=<optimized out>, nb_sectors=40, pnum=0x7fd80dd05aac, file=0x7fd80dd05ab0) at block/raw-posix.c:1765
> #3  0x0000561287cfae92 in bdrv_co_get_block_status (bs=0x7fd898d70000, sector_num=sector_num@entry=1303680, nb_sectors=40, pnum=pnum@entry=0x7fd80dd05aac,
>     file=file@entry=0x7fd80dd05ab0) at block/io.c:1709
> #4  0x0000561287cfafaa in bdrv_co_get_block_status (bs=bs@entry=0x7fd898d66000, sector_num=sector_num@entry=33974144, nb_sectors=<optimized out>,
>     nb_sectors@entry=40, pnum=pnum@entry=0x7fd80dd05bbc, file=file@entry=0x7fd80dd05bc0) at block/io.c:1742
> #5  0x0000561287cfb0bb in bdrv_co_get_block_status_above (file=0x7fd80dd05bc0, pnum=0x7fd80dd05bbc, nb_sectors=40, sector_num=33974144, base=0x0,
>     bs=<optimized out>) at block/io.c:1776
> #6  bdrv_get_block_status_above_co_entry (opaque=opaque@entry=0x7fd80dd05b40) at block/io.c:1792
> #7  0x0000561287cfae08 in bdrv_get_block_status_above (bs=0x7fd898d66000, base=base@entry=0x0, sector_num=<optimized out>, nb_sectors=nb_sectors@entry=40,
>     pnum=pnum@entry=0x7fd80dd05bbc, file=file@entry=0x7fd80dd05bc0) at block/io.c:1824
> #8  0x0000561287cd372d in is_zero_sectors (bs=<optimized out>, start=<optimized out>, count=40) at block/qcow2.c:2428
> #9  0x0000561287cd38ed in is_zero_sectors (count=<optimized out>, start=<optimized out>, bs=<optimized out>) at block/qcow2.c:2471
> #10 qcow2_co_pwrite_zeroes (bs=0x7fd898d66000, offset=33974144, count=24576, flags=2724114573) at block/qcow2.c:2452
> #11 0x0000561287cfcb7f in bdrv_co_do_pwrite_zeroes (bs=bs@entry=0x7fd898d66000, offset=offset@entry=17394782208, count=count@entry=4096,
>     flags=flags@entry=BDRV_REQ_ZERO_WRITE) at block/io.c:1218
> #12 0x0000561287cfd0cb in bdrv_aligned_pwritev (bs=0x7fd898d66000, req=<optimized out>, offset=17394782208, bytes=4096, align=1, qiov=0x0,
>     flags=<optimized out>) at block/io.c:1320
> #13 0x0000561287cfe450 in bdrv_co_do_zero_pwritev (req=<optimized out>, flags=<optimized out>, bytes=<optimized out>, offset=<optimized out>,
>     bs=<optimized out>) at block/io.c:1422
> #14 bdrv_co_pwritev (child=0x15, offset=17394782208, bytes=4096, qiov=0x7fd8a25eb08d <lseek64+45>, qiov@entry=0x0, flags=231758512) at block/io.c:1492
> #15 0x0000561287cefdc7 in blk_co_pwritev (blk=0x7fd898cad540, offset=17394782208, bytes=4096, qiov=0x0, flags=<optimized out>) at block/block-backend.c:788
> #16 0x0000561287cefeeb in blk_aio_write_entry (opaque=0x7fd812941440) at block/block-backend.c:982
> #17 0x0000561287d67c7a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78
> 
> Now, performance is really bad on ZFS for those lseek().
> I believe that's https://github.com/zfsonlinux/zfs/issues/4306
> 
> Until that's fixed in ZFS, I need to find a way to avoid those
> lseek()s in the first place.
> 
> One way is to downgrade to 2.6.2 where those lseek()s are not
> called. The change that introduced them seems to be:
> 
> https://github.com/qemu/qemu/commit/2928abce6d1d426d37c0a9bd5f85fb95cf33f709
> (and there have been further changes to improve it later).
> 
> If I understand correctly, that change was about preventing data
> from being allocated when the user is writing unaligned zeroes.
> 
> I suppose the idea is that if something is trying to write zeroes
> in the middle of an _allocated_ qcow2 cluster, but the
> corresponding sectors in the file underneath are in a hole, we
> don't want to write those zeroes, as that would allocate the data
> at the file level.
> 
> I can see it makes sense, but in my case, the little space
> efficiency it brings is largely overshadowed by the sharp
> decrease in performance.
> 
> For now, I work around it by changing the "#ifdef SEEK_DATA"
> to "#if 0" in find_allocation().
> 
> Note that passing detect-zeroes=off or detect-zeroes=unmap (with
> discard) doesn't help (even though FALLOC_FL_PUNCH_HOLE is
> supported on ZFS on Linux).
> 
> Is there any other way to prevent those lseek()s without having
> to rebuild qemu?
> 
> Would you consider adding an option to disable that behaviour
> (skip checking allocation at file level for qcow2 image)?
> 
> Thanks,
> Stephane
> 
> 
> 





* Re: [Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)?
  2017-02-07 23:43 ` Max Reitz
@ 2017-02-08 14:06   ` Stephane Chazelas
  2017-02-08 14:27     ` Max Reitz
  2017-02-08 14:20   ` Stephane Chazelas
  1 sibling, 1 reply; 8+ messages in thread
From: Stephane Chazelas @ 2017-02-08 14:06 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Qemu-block, Kevin Wolf

2017-02-08 00:43:18 +0100, Max Reitz:
[...]
> OTOH, it may make sense to offer a way for the user to disable
> lseek(SEEK_{DATA,HOLE}) in our "file" block driver. That way your issue
> would be solved, too, I guess. I'll look into it.
[...]

Thanks Max,

Yes, that would work for me and other users of ZFS. What I do
for now is recompile with those lseek(SEEK_{DATA,HOLE}) disabled
in the code and it's working fine.

As I already hinted, something that would also work for me, and
could benefit everyone (well, at least Linux users on filesystems
supporting hole punching), would be to do a
fallocate(FALLOC_FL_PUNCH_HOLE) instead of checking beforehand
whether the file is allocated; IOW, to tell the underlying layer
to deallocate the data.

That would replace those two lseek()s with a single fallocate(),
and save some extra disk space.

One may argue that's what one would expect to happen when
using detect-zeroes=unmap.

I suppose that would be quite significant work, as it would imply
a framework to pass those "deallocates" down, and you'd probably
have to differentiate "deallocates" that zero (like hole punching
in a regular file) from those that don't (like BLKDISCARD on an
SSD).
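
At the syscall level, the difference looks something like this (a
sketch, not qemu code):

#define _GNU_SOURCE
#include <fcntl.h>       /* fallocate(), FALLOC_FL_* */
#include <sys/ioctl.h>
#include <linux/fs.h>    /* BLKDISCARD */
#include <stdint.h>

void deallocate_sketch(int fd, off_t off, off_t len, int is_blockdev)
{
    if (!is_blockdev) {
        /* regular file: deallocates AND guarantees the range now
         * reads back as zeroes */
        fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  off, len);
    } else {
        /* block device: discards, but afterwards the range may
         * read back as anything */
        uint64_t range[2] = { (uint64_t)off, (uint64_t)len };
        ioctl(fd, BLKDISCARD, range);
    }
}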

I also suppose that could cause fragmentation that would be
unwanted in some contexts, so maybe it should be tunable as
well.

-- 
Stephane


* Re: [Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)?
  2017-02-07 23:43 ` Max Reitz
  2017-02-08 14:06   ` Stephane Chazelas
@ 2017-02-08 14:20   ` Stephane Chazelas
  1 sibling, 0 replies; 8+ messages in thread
From: Stephane Chazelas @ 2017-02-08 14:20 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Qemu-block, Kevin Wolf

2017-02-08 00:43:18 +0100, Max Reitz:
[...]
> Therefore, the patch as it is makes sense. The fact that said lseek() is
> slow on ZFS is (in my humble opinion) the ZFS driver's problem that
> needs to be fixed there.
[...]

For the record, I've mentioned the qemu performance implication at
https://github.com/zfsonlinux/zfs/issues/4306#issuecomment-277000682

Not much more I can do at that point.

That issue was raised a year ago and has not been assigned any
milestone yet.

-- 
Stephane


* Re: [Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)?
  2017-02-08 14:06   ` Stephane Chazelas
@ 2017-02-08 14:27     ` Max Reitz
  2017-02-08 17:16       ` Stephane Chazelas
  0 siblings, 1 reply; 8+ messages in thread
From: Max Reitz @ 2017-02-08 14:27 UTC (permalink / raw)
  To: Stephane Chazelas; +Cc: qemu-devel, Qemu-block, Kevin Wolf


On 08.02.2017 15:06, Stephane Chazelas wrote:
> 2017-02-08 00:43:18 +0100, Max Reitz:
> [...]
>> OTOH, it may make sense to offer a way for the user to disable
>> lseek(SEEK_{DATA,HOLE}) in our "file" block driver. That way your issue
>> would be solved, too, I guess. I'll look into it.
> [...]
> 
> Thanks Max,
> 
> Yes, that would work for me and other users of ZFS. What I do
> for now is recompile with those lseek(SEEK_{DATA,HOLE}) disabled
> in the code and it's working fine.
> 
> As I already hinted, something that would also work for me, and
> could benefit everyone (well, at least Linux users on filesystems
> supporting hole punching), would be to do a
> fallocate(FALLOC_FL_PUNCH_HOLE) instead of checking beforehand
> whether the file is allocated; IOW, to tell the underlying layer
> to deallocate the data.
> 
> That would replace those two lseek()s with a single fallocate(),
> and save some extra disk space.

When using qcow2, however, qcow2 will try to take care of that by
discarding clusters or using special zero clusters.

The lseek() thing just tries to mitigate the effect of writing less than
a cluster of zeroes. Yes, we could punch a hole, but is that faster
than, or at least as fast as, lseek(SEEK_{DATA,HOLE}) on all
filesystems?

Also, is it the same speed on all protocols? We support not only files
as image storage, but also network protocols etc.

But this is just generally speaking for the "write zeroes" case. With
detect-zeroes=unmap (as opposed to detect-zeroes=on), things are a bit
different. It may indeed make sense to fall through to the protocol
level and punch holes there, even if it may be slower than lseek().

> One may argue that's what one would expect to happen when
> using detect-zeroes=unmap.

A bit of a stupid question, but: How is your performance when using
detect-zeroes=off?

> I suppose that would be quite significant work, as it would imply
> a framework to pass those "deallocates" down, and you'd probably
> have to differentiate "deallocates" that zero (like hole punching
> in a regular file) from those that don't (like BLKDISCARD on an
> SSD).

Well, we internally already have different functions for writing zeroes
and discarding. The thing is, though, that qcow2 is supposed to handle
those deallocations and not hand them down, because qcow2 can handle
them -- but not if they're not aligned to whole qcow2 clusters (which,
by default, are 64 kB in size).
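
Roughly (a sketch of the constraint, not the actual qcow2 code):

#include <stdbool.h>
#include <stdint.h>

/* qcow2 can only discard or zero whole clusters by itself; with the
 * default cluster_size of 65536, a single 4 kB zero write can never
 * be handled that way on its own. */
static bool covers_whole_clusters(uint64_t offset, uint64_t bytes,
                                  uint64_t cluster_size)
{
    return offset % cluster_size == 0 && bytes % cluster_size == 0;
}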

Max

> 
> I also suppose that could cause fragmentation that would be
> unwanted in some contexts, so maybe it should be tunable as
> well.
> 





* Re: [Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)?
  2017-02-08 14:27     ` Max Reitz
@ 2017-02-08 17:16       ` Stephane Chazelas
  0 siblings, 0 replies; 8+ messages in thread
From: Stephane Chazelas @ 2017-02-08 17:16 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, qemu-devel, Qemu-block

2017-02-08 15:27:11 +0100, Max Reitz:
[...]
> A bit of a stupid question, but: How is your performance when using
> detect-zeroes=off?
[...]

I did try that. See:

} Note that passing detect-zeroes=off or detect-zeroes=unmap (with
} discard) doesn't help (even though FALLOC_FL_PUNCH_HOLE is
} supported on ZFS on Linux).

In my original message. It makes no difference, I still see
those lseek()s being done.

-- 
Stephane

