From: "Denis V. Lunev" <den@openvz.org>
To: Alberto Garcia <berto@igalia.com>, Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>,
	qemu-block@nongnu.org, Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation
Date: Thu, 13 Apr 2017 17:06:24 +0300
Message-ID: <652aee46-5d72-a5c0-944d-a9712efd3948@openvz.org>
In-Reply-To: <w51mvbkv38k.fsf@maestria.local.igalia.com>

On 04/13/2017 04:36 PM, Alberto Garcia wrote:
> On Thu 13 Apr 2017 03:09:53 PM CEST, Denis V. Lunev wrote:
>>>> With today's SSDs we are facing problems somewhere else. Right now I
>>>> can achieve only 100k IOPS on an SSD capable of 350-550k. A 1 MB block
>>>> with preallocation and a fragmented L2 cache gives the same 100k. Tests
>>>> on an initially empty image give around 80k for us.
>>> Preallocated images aren't particularly interesting to me. qcow2 is
>>> used mainly for two reasons. One of them is sparseness (initially
>>> small file size) mostly for desktop use cases with no serious I/O, so
>>> not that interesting either. The other one is snapshots, i.e. backing
>>> files, which doesn't work with preallocation (yet).
>>>
>>> Actually, preallocation with backing files is something that
>>> subclusters would automatically enable: You could already reserve the
>>> space for a cluster, but still leave all subclusters marked as
>>> unallocated.
>> I am talking about fallocate() for the entire cluster before the actual
>> write() to an originally empty image. This increases the performance of
>> 4k random writes 10+ times. In this case we can just write those 4k
>> and do nothing else.
> You're talking about using fallocate() for filling a cluster with zeroes
> before writing data to it.
>
> As noted earlier in this thread, this works if the image is empty or if
> it doesn't have a backing file.
>
> And if the image is not empty you cannot guarantee that the cluster
> contains zeroes (you can use FALLOC_FL_ZERO_RANGE, but that won't work
> in all cases).
>
> Berto
Yes, I agree here.

But COW operations suffer more from the number of I/O operations
required than from the amount of data transferred. Let us
assume that we have a 64 KB cluster represented as [--------]. With
a 4 KB write in the middle of the cluster we now need 5 I/O operations
to complete the request: read head, write head, write 4 KB, read tail,
write tail. Normally this should take 2 operations: read the entire
cluster (64 KB), write the entire cluster (64 KB).
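
To make the operation count concrete, here is a minimal Python sketch.
It is an illustration only, not QEMU code: the function names and the
(op, offset, length) tuples are made up for this example. It lists the
host requests for a 4 KB write in the middle of an unallocated 64 KB
cluster, first with separate head/tail copies, then with one
whole-cluster read-modify-write:

```python
KIB = 1024

def cow_split(cluster_size, write_off, write_len):
    """COW issued as separate head / data / tail requests (hypothetical model)."""
    ops = []
    head_len = write_off
    tail_off = write_off + write_len
    tail_len = cluster_size - tail_off
    if head_len:
        ops.append(("read", 0, head_len))         # read head from the old data
        ops.append(("write", 0, head_len))        # write head into the new cluster
    ops.append(("write", write_off, write_len))   # the guest's own data
    if tail_len:
        ops.append(("read", tail_off, tail_len))  # read tail from the old data
        ops.append(("write", tail_off, tail_len)) # write tail into the new cluster
    return ops

def cow_whole(cluster_size, write_off, write_len):
    """COW issued as one full-cluster read, in-memory merge, one full-cluster write."""
    return [("read", 0, cluster_size), ("write", 0, cluster_size)]

split = cow_split(64 * KIB, 28 * KIB, 4 * KIB)
whole = cow_whole(64 * KIB, 28 * KIB, 4 * KIB)
print(len(split), len(whole))  # 5 2
```

The second variant moves more bytes but issues fewer requests, which is
the trade-off discussed here.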

With this approach a later 64 KB read of this cluster results in a
single host I/O operation, because the cluster is contiguous from the
point of view of the host file system.

With 8 KB subclusters we will have the same 1 read and 1 write after the
tuning. The only difference is the amount of data: an 8 KB read and an
8 KB write instead of a 64 KB read and a 64 KB write.
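
A small sketch of that comparison, again only an illustration under
assumed names (rmw_range is hypothetical, not a QEMU function): it
rounds the guest write out to the enclosing allocation unit, which is
the range that has to be read and rewritten in one read-modify-write.

```python
KIB = 1024

def rmw_range(unit_size, write_off, write_len):
    """Return (start, length) of the allocation-unit-aligned range covering the write."""
    start = (write_off // unit_size) * unit_size
    end = -(-(write_off + write_len) // unit_size) * unit_size  # round up
    return start, end - start

# Same 4 KB guest write at offset 28 KB, two allocation unit sizes.
for name, unit in (("64 KB cluster", 64 * KIB), ("8 KB subcluster", 8 * KIB)):
    start, length = rmw_range(unit, 28 * KIB, 4 * KIB)
    print(f"{name}: 1 read + 1 write of {length // KIB} KB at offset {start // KIB} KB")
```

Both cases cost two requests; only the number of bytes per request
changes.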

Sure, the cluster size should be increased. Personally I like
1 MB due to my past experience.

In my opinion there is no performance difference at all on a
rotational drive between COW of a 1 MB cluster and COW of a 64 KB
cluster without subclusters. Reading 1 MB and reading 64 KB cost the
same (100-150 IOPS with 150 MB/s throughput). Further sequential
reads will be much better with 1 MB clusters and without subclusters.

Yes, there is a difference on "average" SSD drives, which give
40k-100k IOPS. We will experience a slowdown reading 1 MB
instead of 64 KB. But the difference is actually not that big.
Today's top-notch PCIe SSDs cannot be saturated by QEMU.
I am able to reach only 100k IOPS instead of the 300k-500k
available on the host, even when the data is written to existing
clusters. Thus on them we will see no difference between a 1 MB
cluster and a 64 KB cluster in terms of COW.

So, in my opinion, a simple 1 MB cluster size along with a fragmented
L2 cache is very good from all points of view. Even from the COW point
of view ;) In real life the performance will be no worse, or even
better, as we will also avoid additional metadata updates.

Den
