All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alberto Garcia <berto@igalia.com>
To: Kevin Wolf <kwolf@redhat.com>, Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>,
	qemu-block@nongnu.org, Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation
Date: Fri, 07 Apr 2017 16:24:44 +0200	[thread overview]
Message-ID: <w514ly0e1n7.fsf@maestria.local.igalia.com> (raw)
In-Reply-To: <20170407124121.GC4716@noname.redhat.com>

On Fri 07 Apr 2017 02:41:21 PM CEST, Kevin Wolf <kwolf@redhat.com> wrote:
>> 63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
>> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> **<----> <-----------------------------------------------><---------->*
>>   Rsrved              host cluster offset of data             Reserved
>>   (6 bits)                (44 bits)                           (11 bits)
>> 
>> where you have 17 bits plus the "all zeroes" bit to play with, thanks to
>> the three bits of host cluster offset that are now guaranteed to be zero
>> due to cluster size alignment (but you're also right that the "all
>> zeroes" bit is now redundant information with the 8 subcluster-is-zero
>> bits, so repurposing it does not hurt)
>> 
>> > 
>> >     * Pros:
>> >       + Simple. Few changes compared to the current qcow2 format.
>> > 
>> >     * Cons:
>> >       - Only 8 subclusters per cluster. We would not be making the
>> >         most of this feature.
>> > 
>> >       - No reserved bits left for the future.
>> 
>> I just argued you have at least one, and probably 2, bits left over for
>> future in-word expansion.
>
> I think only 8 subclusters is just too few. That the subcluster status
> would be split in two halves doesn't make me like this layout much
> better either.

I also agree that 8 are too few (splitting the subcluster field would
not be strictly necessary, but that's not so important).

>> > (2) Making L2 entries 128-bit wide.
>> > 
>> >     In this alternative we would double the size of L2 entries. The
>> >     first half would remain unchanged and the second one would store
>> >     the bitmap. That would leave us with 32 subclusters per cluster.
>> 
>> Although for smaller cluster sizes (such as 4k clusters), you'd still
>> want to restrict that subclusters are at least 512-byte sectors, so
>> you'd be using fewer than 32 of those subcluster positions until the
>> cluster size is large enough.
>> 
>> > 
>> >     * Pros:
>> >       + More subclusters per cluster. We could have images with
>> >         e.g. 128k clusters with 4k subclusters.
>> 
>> Could allow variable-sized subclusters (your choice of 32 subclusters of
>> 4k each, or 16 subclusters of 8k each)
>
> I don't think using less subclusters is desirable if it doesn't come
> with savings elsewhere. We already need to allocate two clusters for an
> L2 table now, so we want to use it.
>
> The more interesting kind of variable-sized subclusters would be if you
> could select any multiple of 32, meaning three or more clusters per L2
> table (with 192 bits or more per entry).

Yeah, I agree. I think it's worth considering. One more drawback that I
can think of is that if we make L2 entries wider and we have compressed
clusters we'd be wasting space in their entries.

>> >       - One more metadata structure to be updated for each
>> >         allocation. This would probably impact I/O negatively.
>> 
>> Having the subcluster table directly in the L2 means that updating
>> the L2 table is done with a single write. You are definitely right
>> that having the subcluster table as a bitmap in a separate cluster
>> means two writes instead of one, but as always, it's hard to predict
>> how much of an impact that is without benchmarks.
>
> Note that it's not just additional write requests, but that we can't
> update the L2 table entry and the bitmap atomically any more, so we
> have to worry about ordering. The ordering between L2 table and
> refcount blocks is already painful enough, I'm not sure if I would
> want to add a third type. Ordering also means disk flushes, which are
> a lot slower than just additional writes.

You're rightk, I think you just convinced me that this is a bad idea and
I'm also more inclined towards (2) now.

Berto

  reply	other threads:[~2017-04-07 14:56 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-06 15:01 [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation Alberto Garcia
2017-04-06 16:40 ` Eric Blake
2017-04-07  8:49   ` Alberto Garcia
2017-04-07 12:41   ` Kevin Wolf
2017-04-07 14:24     ` Alberto Garcia [this message]
2017-04-21 21:09   ` [Qemu-devel] proposed qcow2 extension: cluster reservations [was: " Eric Blake
2017-04-22 17:56     ` Max Reitz
2017-04-24 11:45       ` Kevin Wolf
2017-04-24 12:46       ` Alberto Garcia
2017-04-07 12:20 ` [Qemu-devel] " Stefan Hajnoczi
2017-04-07 12:24   ` Alberto Garcia
2017-04-07 13:01   ` Kevin Wolf
2017-04-10 15:32     ` Stefan Hajnoczi
2017-04-07 17:10 ` Max Reitz
2017-04-10  8:42   ` Kevin Wolf
2017-04-10 15:03     ` Max Reitz
2017-04-11 12:56   ` Alberto Garcia
2017-04-11 14:04     ` Max Reitz
2017-04-11 14:31       ` Alberto Garcia
2017-04-11 14:45         ` [Qemu-devel] [Qemu-block] " Eric Blake
2017-04-12 12:41           ` Alberto Garcia
2017-04-12 14:10             ` Max Reitz
2017-04-13  8:05               ` Alberto Garcia
2017-04-13  9:02                 ` Kevin Wolf
2017-04-13  9:05                   ` Alberto Garcia
2017-04-11 14:49         ` [Qemu-devel] " Kevin Wolf
2017-04-11 14:58           ` Eric Blake
2017-04-11 14:59           ` Max Reitz
2017-04-11 15:08             ` Eric Blake
2017-04-11 15:18               ` Max Reitz
2017-04-11 15:29                 ` Kevin Wolf
2017-04-11 15:29                   ` Max Reitz
2017-04-11 15:30                 ` Eric Blake
2017-04-11 15:34                   ` Max Reitz
2017-04-12 12:47           ` Alberto Garcia
2017-04-12 16:54 ` Denis V. Lunev
2017-04-13 11:58   ` Alberto Garcia
2017-04-13 12:44     ` Denis V. Lunev
2017-04-13 13:05       ` Kevin Wolf
2017-04-13 13:09         ` Denis V. Lunev
2017-04-13 13:36           ` Alberto Garcia
2017-04-13 14:06             ` Denis V. Lunev
2017-04-13 13:21       ` Alberto Garcia
2017-04-13 13:30         ` Denis V. Lunev
2017-04-13 13:59           ` Kevin Wolf
2017-04-13 15:04           ` Alberto Garcia
2017-04-13 15:17             ` Denis V. Lunev
2017-04-18 11:52               ` Alberto Garcia
2017-04-18 17:27                 ` Denis V. Lunev
2017-04-13 13:51         ` Kevin Wolf
2017-04-13 14:15           ` Alberto Garcia
2017-04-13 14:27             ` Kevin Wolf
2017-04-13 16:42               ` [Qemu-devel] [Qemu-block] " Roman Kagan
2017-04-13 14:42           ` [Qemu-devel] " Denis V. Lunev
2017-04-12 17:55 ` Denis V. Lunev
2017-04-12 18:20   ` Eric Blake
2017-04-12 19:02     ` Denis V. Lunev
2017-04-13  9:44       ` Kevin Wolf
2017-04-13 10:19         ` Denis V. Lunev
2017-04-14  1:06           ` [Qemu-devel] [Qemu-block] " John Snow
2017-04-14  4:17             ` Denis V. Lunev
2017-04-18 11:22               ` Kevin Wolf
2017-04-18 17:30                 ` Denis V. Lunev
2017-04-14  7:40             ` Roman Kagan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=w514ly0e1n7.fsf@maestria.local.igalia.com \
    --to=berto@igalia.com \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.