From: "Denis V. Lunev" <den@openvz.org>
To: Kevin Wolf <kwolf@redhat.com>, Alberto Garcia <berto@igalia.com>
Cc: qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>,
	qemu-block@nongnu.org, Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation
Date: Thu, 13 Apr 2017 17:42:57 +0300
Message-ID: <4ed1cfc8-81c9-be4e-17c7-599a6b3ac0d3@openvz.org>
In-Reply-To: <20170413135155.GD5095@noname.redhat.com>

On 04/13/2017 04:51 PM, Kevin Wolf wrote:
> On 13.04.2017 at 15:21, Alberto Garcia wrote:
>> This invariant is already broken by the very design of the qcow2 format,
>> subclusters don't really add anything new there. For any given cluster
>> size you can write 4k in every odd cluster, then do the same in every
>> even cluster, and you'll get an equally fragmented image.
> Because this scenario has appeared repeatedly in this thread: Can we
> please use a more realistic one that shows an actual problem? Because
> with 8k or more for the cluster size you don't get any qcow2
> fragmentation with 4k even/odd writes (which is a pathological case
> anyway), and the file systems are clever enough to cope with it, too.
>
> Just to confirm this experimentally, I ran this short script:
>
> ----------------------------------------------------------------
> #!/bin/bash
> ./qemu-img create -f qcow2 /tmp/test.qcow2 64M
>
> echo even blocks
> for i in $(seq 0 32767); do echo "write $((i * 8))k 4k"; done | ./qemu-io /tmp/test.qcow2 > /dev/null
> echo odd blocks
> for i in $(seq 0 32767); do echo "write $((i * 8 + 4))k 4k"; done | ./qemu-io /tmp/test.qcow2 > /dev/null
>
> ./qemu-img map /tmp/test.qcow2
> filefrag -v /tmp/test.qcow2
> ----------------------------------------------------------------
>
> And sure enough, this is the output:
>
> ----------------------------------------------------------------
> Formatting '/tmp/test.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
> even blocks
> odd blocks
> Offset          Length          Mapped to       File
> 0               0x4000000       0x50000         /tmp/test.qcow2
> Filesystem type is: 58465342
> File size of /tmp/test.qcow2 is 67436544 (16464 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:        0..      47:     142955..    143002:     48:            
>    1:       48..      48:     143016..    143016:      1:     143003:
>    2:       64..      79:     142868..    142883:     16:     143017:
>    3:       80..     111:     155386..    155417:     32:     142884:
>    4:      112..     303:     227558..    227749:    192:     155418:
>    5:      304..     559:     228382..    228637:    256:     227750:
>    6:      560..    1071:     455069..    455580:    512:     228638:
>    7:     1072..    2095:     485544..    486567:   1024:     455581:
>    8:     2096..    4143:     497978..    500025:   2048:     486568:
>    9:     4144..    8239:     508509..    512604:   4096:     500026:
>   10:     8240..   16431:     563122..    571313:   8192:     512605:
>   11:    16432..   32815:     632969..    649352:  16384:     571314: eof
> /tmp/test.qcow2: 12 extents found
> ----------------------------------------------------------------
>
> That is, on the qcow2 level we have exactly 0% fragmentation, everything
> is completely contiguous in a single chunk. XFS as the container of the
> test image creates a few more extents, but as you can see, it uses
> fairly large extent sizes in the end (and it would use even larger ones
> if I wrote more than 64 MB).

I am talking about an image like this:

#!/bin/bash
qemu-img create -f qcow2 /tmp/test.qcow2 64M

echo even blocks
for i in $(seq 0 512); do echo "write $((i * 128 + 1))k 4k"; done | qemu-io /tmp/test.qcow2 > /dev/null
echo odd blocks
for i in $(seq 0 512); do echo "write $((i * 128 + 65))k 4k"; done | qemu-io /tmp/test.qcow2 > /dev/null

echo fragmented
strace -f -e pread64 qemu-io -c "read 0 64M" /tmp/test.qcow2 2>&1 | wc -l
rm -rf 1.img

qemu-img create -f qcow2 /tmp/test.qcow2 64M
echo sequential
for i in $(seq 0 1024); do echo "write $((i * 64))k 4k"; done | qemu-io /tmp/test.qcow2 > /dev/null
strace -f -e pread64 qemu-io -c "read 0 64M" /tmp/test.qcow2 2>&1 | wc -l

and the difference is significant. Compare the number of read operations
reported: 1032 vs 9.

iris ~/tmp/2 $ ./1.sh
Formatting '/tmp/test.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
even blocks
odd blocks
fragmented
1032   <------------------------- (1)
Formatting '/tmp/test.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
sequential
main-loop: WARNING: I/O thread spun for 1000 iterations
9        <-------------------------- (2)
iris ~/tmp/2 $

Subclusters will behave exactly like (1) rather than (2). With a big block
(1 MB) one could expect much better performance for sequential
operations, as in (2). File (1) is contiguous from the host's point of
view, correct, but a sequential guest read accesses it in random order.
See the difference in the number of reads performed.
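
To illustrate why (1) needs roughly one read per cluster, here is a rough
model of the two write patterns. It is only a sketch under one assumption
that is not measured here: that a fresh image hands out host clusters in
the order in which guest clusters are first written. The helper name
count_runs is made up for this example, and the model uses exactly 1024
clusters (64 MB / 64 KB) rather than the loop bounds of the script above.
It does not try to reproduce the exact strace counts, which also include
metadata reads and any request splitting done by qemu-io.

```shell
#!/bin/bash
# count_runs (hypothetical helper): reads guest cluster indices on stdin,
# one per line, in the order they were first written. Assumes the N-th
# written cluster lands in host slot N. Prints how many host-contiguous
# runs a sequential guest read of those clusters would touch.
count_runs() {
    awk '{ print $1, NR }' |          # pair: guest cluster, host slot
    sort -n -k1,1 |                   # walk clusters in guest order
    awk 'NR == 1 || $2 != prev + 1 { runs++ }
         { prev = $2 }
         END { print runs + 0 }'
}

# Pattern (1): all even guest clusters first, then all odd ones.
echo "fragmented: $({ seq 0 2 1022; seq 1 2 1023; } | count_runs)"   # prints 1024

# Pattern (2): guest clusters written in ascending order.
echo "sequential: $(seq 0 1023 | count_runs)"                        # prints 1
```

In this model every guest cluster of (1) sits next to a host cluster that
belongs to a different part of the guest address space, so no two reads
can be merged, while (2) collapses into a single run.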

Den

