* [Qemu-devel] [PATCH] qcow2: Metadata preallocation
@ 2009-08-14 15:00 Kevin Wolf
  2009-08-16 11:58 ` Avi Kivity
  0 siblings, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2009-08-14 15:00 UTC
  To: qemu-devel; +Cc: Kevin Wolf

This introduces a qemu-img create option for qcow2 which allows the metadata to
be preallocated, i.e. clusters are reserved in the refcount table and L1/L2
tables, but no data is written to them. Metadata is quite small, so this
happens in almost no time.

Especially with qcow2 on virtio this helps to gain a bit of performance during
the initial writes. However, as soon as you create a snapshot, we're back to the
normal slow speed, obviously. So this isn't the real fix, but kind of a cheat
while we're still having trouble with qcow2 on virtio.

Note that the option is disabled by default and needs to be specified
explicitly using qemu-img create -f qcow2 -o preallocation=metadata.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2.c |   83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 block_int.h   |    1 +
 2 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index a5bf205..88e0c71 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -638,9 +638,56 @@ static int get_bits_from_size(size_t size)
     return res;
 }
 
+
+static int preallocate(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t cluster_offset = 0;
+    uint64_t nb_sectors;
+    uint64_t offset;
+    int num;
+    QCowL2Meta meta;
+
+    nb_sectors = bdrv_getlength(bs) >> 9;
+    offset = 0;
+
+    while (nb_sectors) {
+        num = MIN(nb_sectors, INT_MAX >> 9);
+        cluster_offset = qcow2_alloc_cluster_offset(bs, offset, 0, num, &num,
+            &meta);
+
+        if (cluster_offset == 0) {
+            return -1;
+        }
+
+        if (qcow2_alloc_cluster_link_l2(bs, cluster_offset, &meta) < 0) {
+            qcow2_free_any_clusters(bs, cluster_offset, meta.nb_clusters);
+            return -1;
+        }
+
+        /* TODO Preallocate data if requested */
+
+        nb_sectors -= num;
+        offset += num << 9;
+    }
+
+    /*
+     * It is expected that the image file is large enough to actually contain
+     * all of the allocated clusters (otherwise we get failing reads after
+     * EOF). So just write some zeros to the last sector.
+     */
+    if (cluster_offset != 0) {
+        uint8_t buf[512];
+        memset(buf, 0, 512);
+        bdrv_write(s->hd, (cluster_offset >> 9) + num - 1, buf, 1);
+    }
+
+    return 0;
+}
+
 static int qcow_create2(const char *filename, int64_t total_size,
                         const char *backing_file, const char *backing_format,
-                        int flags, size_t cluster_size)
+                        int flags, size_t cluster_size, int prealloc)
 {
 
     int fd, header_size, backing_filename_len, l1_size, i, shift, l2_bits;
@@ -762,6 +809,16 @@ static int qcow_create2(const char *filename, int64_t total_size,
     qemu_free(s->refcount_table);
     qemu_free(s->refcount_block);
     close(fd);
+
+    /* Preallocate metadata */
+    if (prealloc) {
+        BlockDriverState *bs;
+        bs = bdrv_new("");
+        bdrv_open(bs, filename, BDRV_O_CACHE_WB);
+        preallocate(bs);
+        bdrv_close(bs);
+    }
+
     return 0;
 }
 
@@ -772,6 +829,7 @@ static int qcow_create(const char *filename, QEMUOptionParameter *options)
     uint64_t sectors = 0;
     int flags = 0;
     size_t cluster_size = 65536;
+    int prealloc = 0;
 
     /* Read out options */
     while (options && options->name) {
@@ -787,12 +845,28 @@ static int qcow_create(const char *filename, QEMUOptionParameter *options)
             if (options->value.n) {
                 cluster_size = options->value.n;
             }
+        } else if (!strcmp(options->name, BLOCK_OPT_PREALLOC)) {
+            if (!options->value.s || !strcmp(options->value.s, "off")) {
+                prealloc = 0;
+            } else if (!strcmp(options->value.s, "metadata")) {
+                prealloc = 1;
+            } else {
+                fprintf(stderr, "Invalid preallocation mode: '%s'\n",
+                    options->value.s);
+                return -EINVAL;
+            }
         }
         options++;
     }
 
+    if (backing_file && prealloc) {
+        fprintf(stderr, "Backing file and preallocation cannot be used at "
+            "the same time\n");
+        return -EINVAL;
+    }
+
     return qcow_create2(filename, sectors, backing_file, backing_fmt, flags,
-        cluster_size);
+        cluster_size, prealloc);
 }
 
 static int qcow_make_empty(BlockDriverState *bs)
@@ -982,6 +1056,11 @@ static QEMUOptionParameter qcow_create_options[] = {
         .type = OPT_SIZE,
         .help = "qcow2 cluster size"
     },
+    {
+        .name = BLOCK_OPT_PREALLOC,
+        .type = OPT_STRING,
+        .help = "Preallocation mode (allowed values: off, metadata)"
+    },
     { NULL }
 };
 
diff --git a/block_int.h b/block_int.h
index 8898d91..0902fd4 100644
--- a/block_int.h
+++ b/block_int.h
@@ -37,6 +37,7 @@
 #define BLOCK_OPT_BACKING_FILE  "backing_file"
 #define BLOCK_OPT_BACKING_FMT   "backing_fmt"
 #define BLOCK_OPT_CLUSTER_SIZE  "cluster_size"
+#define BLOCK_OPT_PREALLOC      "preallocation"
 
 typedef struct AIOPool {
     void (*cancel)(BlockDriverAIOCB *acb);
-- 
1.6.0.6
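
The loop in preallocate() walks the image in chunks of at most
INT_MAX >> 9 sectors, so that the byte count of one chunk still fits in
an int, and it lets the allocator shrink each chunk further. A minimal
standalone sketch of that pattern follows. It is not the QEMU code:
walk_image(), alloc_fn, struct walk_stats and grant_capped() are
hypothetical names, and the allocator is a stub that only reports how
many sectors it granted.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_BITS 9
#define MAX_CHUNK_SECTORS (INT_MAX >> SECTOR_BITS)

/* Hypothetical allocator hook: returns how many of the `want` sectors
 * starting at byte `offset` were granted, or <= 0 on failure. */
typedef int (*alloc_fn)(uint64_t offset, int want, void *opaque);

struct walk_stats {
    uint64_t chunks;   /* number of allocator calls that succeeded */
    uint64_t bytes;    /* total bytes covered */
};

/* Walk nb_sectors in chunks small enough that chunk << 9 fits in an int. */
int walk_image(uint64_t nb_sectors, alloc_fn alloc, void *opaque,
               struct walk_stats *st)
{
    uint64_t offset = 0;

    assert(alloc != NULL && st != NULL);
    st->chunks = 0;
    st->bytes = 0;

    while (nb_sectors) {
        int want = nb_sectors < (uint64_t)MAX_CHUNK_SECTORS
                 ? (int)nb_sectors : MAX_CHUNK_SECTORS;
        int got = alloc(offset, want, opaque);

        if (got <= 0 || got > want) {
            return -1;
        }

        nb_sectors -= (uint64_t)got;
        /* Widen before shifting so the byte offset is computed in 64 bits. */
        offset += (uint64_t)got << SECTOR_BITS;
        st->chunks++;
    }

    st->bytes = offset;
    return 0;
}

/* Stub allocator: grants at most *(int *)opaque sectors per call. */
int grant_capped(uint64_t offset, int want, void *opaque)
{
    int cap = *(int *)opaque;

    (void)offset;
    return want < cap ? want : cap;
}
```

With a stub that grants at most 128 sectors per call, walking 1000
sectors takes 8 chunks and covers 1000 * 512 bytes.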

* Re: [Qemu-devel] [PATCH] qcow2: Metadata preallocation
  2009-08-14 15:00 [Qemu-devel] [PATCH] qcow2: Metadata preallocation Kevin Wolf
@ 2009-08-16 11:58 ` Avi Kivity
  2009-08-16 12:12   ` Filip Navara
  2009-08-17  7:11   ` Kevin Wolf
  0 siblings, 2 replies; 9+ messages in thread
From: Avi Kivity @ 2009-08-16 11:58 UTC
  To: Kevin Wolf; +Cc: qemu-devel

On 08/14/2009 06:00 PM, Kevin Wolf wrote:
> This introduces a qemu-img create option for qcow2 which allows the metadata to
> be preallocated, i.e. clusters are reserved in the refcount table and L1/L2
> tables, but no data is written to them. Metadata is quite small, so this
> happens in almost no time.
>
> Especially with qcow2 on virtio this helps to gain a bit of performance during
> the initial writes. However, as soon as you create a snapshot, we're back to the
> normal slow speed, obviously. So this isn't the real fix, but kind of a cheat
> while we're still having trouble with qcow2 on virtio.
>
> Note that the option is disabled by default and needs to be specified
> explicitly using qemu-img create -f qcow2 -o preallocation=metadata.
>
>    

Can't say I'm thrilled with this.  I'd prefer coalescing metadata 
updates on parallel writes.  I don't object to this though.

> +    /*
> +     * It is expected that the image file is large enough to actually contain
> +     * all of the allocated clusters (otherwise we get failing reads after
> +     * EOF). So just write some zeros to the last sector.
> +     */
> +    if (cluster_offset != 0) {
> +        uint8_t buf[512];
> +        memset(buf, 0, 512);
> +        bdrv_write(s->hd, (cluster_offset >> 9) + num - 1, buf, 1);
> +    }
> +
>    

Older versions of Windows don't support sparse files, and newer ones 
need a flag.  It's a good idea to set this flag when opening on Windows.

-- 
error compiling committee.c: too many arguments to function

* Re: [Qemu-devel] [PATCH] qcow2: Metadata preallocation
  2009-08-16 11:58 ` Avi Kivity
@ 2009-08-16 12:12   ` Filip Navara
  2009-08-16 16:48     ` Jamie Lokier
  2009-08-17  7:11   ` Kevin Wolf
  1 sibling, 1 reply; 9+ messages in thread
From: Filip Navara @ 2009-08-16 12:12 UTC
  To: Avi Kivity; +Cc: Kevin Wolf, qemu-devel

On Sun, Aug 16, 2009 at 1:58 PM, Avi Kivity <avi@redhat.com> wrote:
> Older versions of Windows don't support sparse files, and newer ones need a
> flag.

It's supported since Windows 2000, btw.

>  It's a good idea to set this flag when opening on Windows.

FILE_ATTRIBUTE_SPARSE_FILE? You can't actually set it when
opening/creating the file, a separate call to
DeviceIoControl/FSCTL_SET_SPARSE is needed.

Best regards,
Filip Navara

* Re: [Qemu-devel] [PATCH] qcow2: Metadata preallocation
  2009-08-16 12:12   ` Filip Navara
@ 2009-08-16 16:48     ` Jamie Lokier
  2009-08-16 20:28       ` Jamie Lokier
  2009-08-17  7:16       ` Kevin Wolf
  0 siblings, 2 replies; 9+ messages in thread
From: Jamie Lokier @ 2009-08-16 16:48 UTC
  To: Filip Navara; +Cc: Kevin Wolf, Avi Kivity, qemu-devel

Filip Navara wrote:
> FILE_ATTRIBUTE_SPARSE_FILE? You can't actually set it when
> opening/creating the file, a separate call to
> DeviceIoControl/FSCTL_SET_SPARSE is needed.

I see that you increase the file size by writing zeros to the end.

Can't you use the Windows equivalent of unix ftruncate() to extend the
file instead, after FSCTL_SET_SPARSE?
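
For comparison, here is a rough POSIX-only sketch of the two ways to
grow a file to a given size: writing one zeroed sector at the end, as
the patch does through bdrv_write(), and calling ftruncate(). The
helper names extend_by_zero_write(), extend_by_truncate() and
file_size() are made up for this example, and nothing here touches the
Windows calls or QEMU's block layer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512

/* Grow the file by writing one zeroed sector that ends at new_size.
 * Note that this overwrites whatever was stored in that sector. */
int extend_by_zero_write(const char *path, off_t new_size)
{
    uint8_t buf[SECTOR_SIZE] = { 0 };
    int fd, ret = -1;

    assert(new_size >= SECTOR_SIZE);

    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0) {
        return -1;
    }
    if (pwrite(fd, buf, sizeof(buf), new_size - SECTOR_SIZE)
            == (ssize_t)sizeof(buf)) {
        ret = 0;
    }
    if (close(fd) < 0) {
        ret = -1;
    }
    return ret;
}

/* Grow the file with ftruncate(); never shrinks an already larger file. */
int extend_by_truncate(const char *path, off_t new_size)
{
    struct stat st;
    int fd, ret = -1;

    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0) {
        return -1;
    }
    if (fstat(fd, &st) == 0) {
        ret = st.st_size >= new_size ? 0 : ftruncate(fd, new_size);
    }
    if (close(fd) < 0) {
        ret = -1;
    }
    return ret;
}

/* Current length of the file in bytes, or -1 on error. */
off_t file_size(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 ? st.st_size : (off_t)-1;
}
```

Both helpers leave the file at the requested length. The ftruncate()
variant writes no data at all, which is the property being asked about
here.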

-- Jamie

* Re: [Qemu-devel] [PATCH] qcow2: Metadata preallocation
  2009-08-16 16:48     ` Jamie Lokier
@ 2009-08-16 20:28       ` Jamie Lokier
  2009-08-17  7:16       ` Kevin Wolf
  1 sibling, 0 replies; 9+ messages in thread
From: Jamie Lokier @ 2009-08-16 20:28 UTC
  To: Filip Navara; +Cc: Kevin Wolf, Avi Kivity, qemu-devel

Jamie Lokier wrote:
> Filip Navara wrote:
> > FILE_ATTRIBUTE_SPARSE_FILE? You can't actually set it when
> > opening/creating the file, a separate call to
> > DeviceIoControl/FSCTL_SET_SPARSE is needed.
> 
> I see that you increase the file size by writing zeros to the end.
> 
> Can't you use the Windows equivalent of unix ftruncate() to extend the
> file instead, after FSCTL_SET_SPARSE?

Specifically: 

    SetEndOfFile
	Use this after SetFilePointer to change the length of a file
	or stream.  If used on a sparse file or stream, increasing
	the length creates a sparse region.

-- Jamie

* Re: [Qemu-devel] [PATCH] qcow2: Metadata preallocation
  2009-08-16 11:58 ` Avi Kivity
  2009-08-16 12:12   ` Filip Navara
@ 2009-08-17  7:11   ` Kevin Wolf
  2009-08-17  7:45     ` Avi Kivity
  1 sibling, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2009-08-17  7:11 UTC
  To: Avi Kivity; +Cc: qemu-devel

Avi Kivity wrote:
> On 08/14/2009 06:00 PM, Kevin Wolf wrote:
>> This introduces a qemu-img create option for qcow2 which allows the metadata to
>> be preallocated, i.e. clusters are reserved in the refcount table and L1/L2
>> tables, but no data is written to them. Metadata is quite small, so this
>> happens in almost no time.
>>
>> Especially with qcow2 on virtio this helps to gain a bit of performance during
>> the initial writes. However, as soon as you create a snapshot, we're back to the
>> normal slow speed, obviously. So this isn't the real fix, but kind of a cheat
>> while we're still having trouble with qcow2 on virtio.
>>
>> Note that the option is disabled by default and needs to be specified
>> explicitly using qemu-img create -f qcow2 -o preallocation=metadata.
>>
>>    
> 
> Can't say I'm thrilled with this.  I'd prefer coalescing metadata 
> updates on parallel writes.  I don't object to this though.

Even with improved concurrent cluster allocation, you might profit from
metadata preallocation by having less fragmented qcow2 images which
avoids splitting up requests. Not sure if this is relevant in practice
though.
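
To make the fragmentation point concrete, here is a small sketch that
counts how many contiguous host extents a guest request touches. It
assumes a flattened view of the L2 mapping (one host offset per guest
cluster); count_host_extents() and that array layout are hypothetical
and only meant for illustration, not how qcow2 stores or walks its
tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* host_offsets[i] is the host file offset of guest cluster i.
 * Returns the number of contiguous runs in the host file, i.e. how many
 * separate host requests one guest request over these clusters needs. */
size_t count_host_extents(const uint64_t *host_offsets, size_t nb_clusters,
                          uint64_t cluster_size)
{
    size_t extents, i;

    assert(cluster_size > 0);

    if (nb_clusters == 0) {
        return 0;
    }

    extents = 1;
    for (i = 1; i < nb_clusters; i++) {
        /* A new extent starts wherever the next cluster does not
         * directly follow the previous one in the host file. */
        if (host_offsets[i] != host_offsets[i - 1] + cluster_size) {
            extents++;
        }
    }
    return extents;
}
```

Four guest clusters mapped back to back give one extent, while the same
four clusters allocated in interleaved order give four.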

>> +    /*
>> +     * It is expected that the image file is large enough to actually contain
>> +     * all of the allocated clusters (otherwise we get failing reads after
>> +     * EOF). So just write some zeros to the last sector.
>> +     */
>> +    if (cluster_offset != 0) {
>> +        uint8_t buf[512];
>> +        memset(buf, 0, 512);
>> +        bdrv_write(s->hd, (cluster_offset >> 9) + num - 1, buf, 1);
>> +    }
>> +
>>    
> 
> Older versions of Windows don't support sparse files, and newer ones 
> need a flag.  It's a good idea to set this flag when opening on Windows.

I'm certainly hoping that raw-win32 is doing whatever needs to be done?
The mentioned FSCTL_SET_SPARSE seems to be there at least.

Kevin

* Re: [Qemu-devel] [PATCH] qcow2: Metadata preallocation
  2009-08-16 16:48     ` Jamie Lokier
  2009-08-16 20:28       ` Jamie Lokier
@ 2009-08-17  7:16       ` Kevin Wolf
  1 sibling, 0 replies; 9+ messages in thread
From: Kevin Wolf @ 2009-08-17  7:16 UTC
  To: Jamie Lokier; +Cc: Filip Navara, Avi Kivity, qemu-devel

Jamie Lokier wrote:
> Filip Navara wrote:
>> FILE_ATTRIBUTE_SPARSE_FILE? You can't actually set it when
>> opening/creating the file, a separate call to
>> DeviceIoControl/FSCTL_SET_SPARSE is needed.
> 
> I see that you increase the file size by writing zeros to the end.
> 
> Can't you use the Windows equivalent of unix ftruncate() to extend the
> file instead, after FSCTL_SET_SPARSE?

There actually exists a bdrv_truncate(). I wasn't aware of that. If you
prefer, I can resend the patch with bdrv_truncate instead of a zero write.

Kevin

* Re: [Qemu-devel] [PATCH] qcow2: Metadata preallocation
  2009-08-17  7:11   ` Kevin Wolf
@ 2009-08-17  7:45     ` Avi Kivity
  2009-08-17  7:58       ` Kevin Wolf
  0 siblings, 1 reply; 9+ messages in thread
From: Avi Kivity @ 2009-08-17  7:45 UTC
  To: Kevin Wolf; +Cc: qemu-devel

On 08/17/2009 10:11 AM, Kevin Wolf wrote:
> Avi Kivity wrote:
>    
>> On 08/14/2009 06:00 PM, Kevin Wolf wrote:
>>      
>>> This introduces a qemu-img create option for qcow2 which allows the metadata to
>>> be preallocated, i.e. clusters are reserved in the refcount table and L1/L2
>>> tables, but no data is written to them. Metadata is quite small, so this
>>> happens in almost no time.
>>>
>>> Especially with qcow2 on virtio this helps to gain a bit of performance during
>>> the initial writes. However, as soon as you create a snapshot, we're back to the
>>> normal slow speed, obviously. So this isn't the real fix, but kind of a cheat
>>> while we're still having trouble with qcow2 on virtio.
>>>
>>> Note that the option is disabled by default and needs to be specified
>>> explicitly using qemu-img create -f qcow2 -o preallocation=metadata.
>>>
>>>
>>>        
>> Can't say I'm thrilled with this.  I'd prefer coalescing metadata
>> updates on parallel writes.  I don't object to this though.
>>      
> Even with improved concurrent cluster allocation, you might profit from
> metadata preallocation by having less fragmented qcow2 images which
> avoids splitting up requests. Not sure if this is relevant in practice
> though.
>    

What I meant was that I prefer changes that improve performance 
throughout the lifetime of the image rather than the initial writes, 
especially as there's a space tradeoff.  It is not a strong objection, 
just a mild preference.

> I'm certainly hoping that raw-win32 is doing whatever needs to be done?
> The mentioned FSCTL_SET_SPARSE seems to be there at least.
>    

Ah, I looked at the open code and missed it.

-- 
error compiling committee.c: too many arguments to function

* Re: [Qemu-devel] [PATCH] qcow2: Metadata preallocation
  2009-08-17  7:45     ` Avi Kivity
@ 2009-08-17  7:58       ` Kevin Wolf
  0 siblings, 0 replies; 9+ messages in thread
From: Kevin Wolf @ 2009-08-17  7:58 UTC
  To: Avi Kivity; +Cc: qemu-devel

Avi Kivity wrote:
>> Even with improved concurrent cluster allocation, you might profit from
>> metadata preallocation by having less fragmented qcow2 images which
>> avoids splitting up requests. Not sure if this is relevant in practice
>> though.
> 
> What I meant was that I prefer changes that improve performance 
> throughout the lifetime of the image rather than the initial writes, 
> especially as there's a space tradeoff. 

Avoiding fragmentation could improve performance during normal operation
(that is, as long as you don't use snapshots). And I wouldn't worry
about the space tradeoff: Metadata for a 10 GB image is under 2 MB, and
a good part of it would be needed anyway.
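
A back-of-the-envelope version of that estimate, as a sketch: it assumes
8-byte L1, L2 and refcount table entries and 16-bit refcounts, which is
my understanding of the qcow2 format, and it ignores the clusters that
the refcount structures need for themselves. qcow2_metadata_estimate()
is a made-up helper, not a function in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Entry sizes assumed for the qcow2 on-disk format. */
#define L1_ENTRY_BYTES        8
#define L2_ENTRY_BYTES        8
#define REFTABLE_ENTRY_BYTES  8
#define REFCOUNT_ENTRY_BYTES  2   /* 16-bit refcounts */

static uint64_t div_round_up(uint64_t a, uint64_t b)
{
    return (a + b - 1) / b;
}

/* First-order estimate of the metadata bytes of a fully mapped image:
 * header + L1 table + L2 tables + refcount blocks + refcount table. */
uint64_t qcow2_metadata_estimate(uint64_t virtual_size, uint64_t cluster_size)
{
    uint64_t data_clusters, l2_tables, l1_clusters;
    uint64_t counted, refblocks, reftable_clusters;

    /* Cluster size must be a power of two of at least one sector. */
    assert(cluster_size >= 512 && (cluster_size & (cluster_size - 1)) == 0);

    data_clusters = div_round_up(virtual_size, cluster_size);
    l2_tables = div_round_up(data_clusters, cluster_size / L2_ENTRY_BYTES);
    l1_clusters = div_round_up(l2_tables * L1_ENTRY_BYTES, cluster_size);

    /* Refcounts cover data, header, L1 and L2 clusters. The refcount
     * structures' own clusters are left out here (an approximation). */
    counted = data_clusters + 1 + l1_clusters + l2_tables;
    refblocks = div_round_up(counted, cluster_size / REFCOUNT_ENTRY_BYTES);
    reftable_clusters = div_round_up(refblocks * REFTABLE_ENTRY_BYTES,
                                     cluster_size);

    return (1 + l1_clusters + l2_tables + refblocks + reftable_clusters)
           * cluster_size;
}
```

For a 10 GiB image with 64 KiB clusters this comes to 29 clusters, about
1.8 MiB: 20 L2 tables, 6 refcount blocks, and one cluster each for the
header, the L1 table and the refcount table.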

But I completely agree that it is not the solution to all of our problems.

Kevin
