qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org
Cc: fam@euphon.net, kwolf@redhat.com, qemu-devel@nongnu.org,
	armbru@redhat.com, nsoffer@redhat.com, stefanha@redhat.com,
	den@openvz.org
Subject: Re: [PATCH v5 07/10] block: introduce preallocate filter
Date: Wed, 26 Aug 2020 10:52:50 +0200	[thread overview]
Message-ID: <3d155eaf-7365-70a7-f4fe-1abe88383087@redhat.com> (raw)
In-Reply-To: <9195846d-13ab-67d6-dae0-f29056577be1@virtuozzo.com>


[-- Attachment #1.1: Type: text/plain, Size: 6740 bytes --]

On 26.08.20 08:49, Vladimir Sementsov-Ogievskiy wrote:
> 25.08.2020 18:11, Max Reitz wrote:
>> On 21.08.20 16:11, Vladimir Sementsov-Ogievskiy wrote:
>>> It's intended to be inserted between format and protocol nodes to
>>> preallocate additional space (expanding protocol file) on writes
>>> crossing EOF. It improves performance for file-systems with slow
>>> allocation.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   docs/system/qemu-block-drivers.rst.inc |  26 +++
>>>   qapi/block-core.json                   |  20 +-
>>>   block/preallocate.c                    | 291 +++++++++++++++++++++++++
>>>   block/Makefile.objs                    |   1 +
>>>   4 files changed, 337 insertions(+), 1 deletion(-)
>>>   create mode 100644 block/preallocate.c

[...]

>>> diff --git a/block/preallocate.c b/block/preallocate.c
>>> new file mode 100644
>>> index 0000000000..bdf54dbd2f
>>> --- /dev/null
>>> +++ b/block/preallocate.c
>>> @@ -0,0 +1,291 @@
>>> +/*
>>> + * preallocate filter driver
>>> + *
>>> + * The driver performs preallocate operation: it is injected above
>>> + * some node, and before each write over EOF it does additional
>>> preallocating
>>> + * write-zeroes request.
>>> + *
>>> + * Copyright (c) 2020 Virtuozzo International GmbH.
>>> + *
>>> + * Author:
>>> + *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License, or
>>> + * (at your option) any later version.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +
>>> +#include "qapi/error.h"
>>> +#include "qemu/module.h"
>>> +#include "qemu/option.h"
>>> +#include "qemu/units.h"
>>> +#include "block/block_int.h"
>>> +
>>> +
>>> +typedef struct BDRVPreallocateState {
>>> +    int64_t prealloc_size;
>>> +    int64_t prealloc_align;
>>> +
>>> +    /*
>>> +     * Filter is started as not-active, so it doesn't do any
>>> preallocations nor
>>> +     * requires BLK_PERM_RESIZE on its child. This is needed to
>>> create filter
>>> +     * above another node-child and then do bdrv_replace_node to
>>> insert the
>>> +     * filter.
>>
>> A bit weird, but seems fair.  It’s basically just a cache for whether
>> this node has a writer on it or not.
>>
>> Apart from the weirdness, I don’t understand the “another node-child”.
>> Say you have “format” -> “proto”, and then you want to insert
>> “prealloc”.  Wouldn’t you blockdev-add prealloc,file=proto and then
>> blockdev-replace format.file=prealloc?
> 
> Yes something like this. Actually, I'm about inserting the filter
> automatically from block layer code, but your reasoning is about same
> thing and is better.
> 
>> So what is that “other node-child”?
> 
> Hmm, just my misleading wording. At least '-' in wrong place. Just
> "other node child", or "child of another node"..

OK.

>>> +     *
>>> +     * Filter becomes active the first time it detects that its
>>> parents have
>>> +     * BLK_PERM_RESIZE on it.
>>> +     * Filter becomes active forever: it doesn't lose active status
>>> if parents
>>> +     * lose BLK_PERM_RESIZE, otherwise we'll not be able to shrink
>>> the file on
>>> +     * filter close.
>>
>> Oh, the file is shrunk?  That wasn’t clear from the documentation.
>>
>> Hm.  Seems like useful behavior.  I just wonder if keeping the
>> permission around indefinitely makes sense.  Why not just truncate it
>> when the permission is revoked?
> 
> How? Parent is closed earlier, so on close we will not have the
> permission. So, we force-keep the permission up to our close().

I thought that preallocate_child_perm() would be invoked when the parent
is detached, so we could do the truncate there, before relinquishing
preallocate’s RESIZE permission.

[...]

>>> +static void preallocate_close(BlockDriverState *bs)
>>> +{
>>> +    BDRVPreallocateState *s = bs->opaque;
>>> +
>>> +    if (s->active && s->data_end >= 0 &&
>>> +        bdrv_getlength(bs->file->bs) > s->data_end)
>>> +    {
>>> +        bdrv_truncate(bs->file, s->data_end, true,
>>> PREALLOC_MODE_OFF, 0, NULL);
>>
>> Now that I think more about it...  What if there are other writers on
>> bs->file?  This may throw data away.
> 
> Good point. Actually, if bs->file has other writing parents, the logic
> of the filter
> around "data_end" is broken. So we must unshare WRITE and RESIZE
> permissions.

That’s certainly a heavy hammer, but it’d work.

>>  Maybe BDS.wr_highest_offset can
>> help to make a better call?
> 
> Anyway, we'll have to use wr_highest_offset of the filter not the child
> node
> (in the child wr_highest_offset will track preallocations as well),

That’s true.

> so we want to unshare WRITE/RESIZE permissions.

[...]

>>> +static int coroutine_fn
>>> +preallocate_co_truncate(BlockDriverState *bs, int64_t offset,
>>> +                        bool exact, PreallocMode prealloc,
>>> +                        BdrvRequestFlags flags, Error **errp)
>>> +{
>>> +    BDRVPreallocateState *s = bs->opaque;
>>> +    int ret = bdrv_co_truncate(bs->file, offset, exact, prealloc,
>>> flags, errp);
>>> +
>>> +    /* s->data_end may become negative here, which means unknown
>>> data end */
>>> +    s->data_end = bdrv_getlength(bs->file->bs);
>>
>> What would be the problem with just setting data_end = offset?  We only
>> use data_end to truncate down on close, so basically repeat the
>> bdrv_co_truncate() call here, which seems correct.
> 
> We can use offset only if ret >= 0 && exact is true..

Well, on close, we ignore any errors from truncate() anyway.  So even if
we used exact=false here, it shouldn’t matter.

> But simpler is
> just call
> bdrv_getlength() anyway.

Certainly simpler than thinking about potential edge cases, true.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2020-08-26  8:53 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-21 14:11 [PATCH v5 00/10] preallocate filter Vladimir Sementsov-Ogievskiy
2020-08-21 14:11 ` [PATCH v5 01/10] block: simplify comment to BDRV_REQ_SERIALISING Vladimir Sementsov-Ogievskiy
2020-08-21 14:11 ` [PATCH v5 02/10] block/io.c: drop assertion on double waiting for request serialisation Vladimir Sementsov-Ogievskiy
2020-08-21 14:11 ` [PATCH v5 03/10] block/io: split out bdrv_find_conflicting_request Vladimir Sementsov-Ogievskiy
2020-08-21 14:11 ` [PATCH v5 04/10] block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg Vladimir Sementsov-Ogievskiy
2020-08-21 14:11 ` [PATCH v5 05/10] block: bdrv_mark_request_serialising: split non-waiting function Vladimir Sementsov-Ogievskiy
2020-08-25 12:43   ` Max Reitz
2020-08-21 14:11 ` [PATCH v5 06/10] block: introduce BDRV_REQ_NO_WAIT flag Vladimir Sementsov-Ogievskiy
2020-08-25 13:10   ` Max Reitz
2020-08-26  6:26     ` Vladimir Sementsov-Ogievskiy
2020-08-26  8:36       ` Max Reitz
2020-08-26  8:57         ` Vladimir Sementsov-Ogievskiy
2020-08-21 14:11 ` [PATCH v5 07/10] block: introduce preallocate filter Vladimir Sementsov-Ogievskiy
2020-08-24 17:52   ` Vladimir Sementsov-Ogievskiy
2020-08-26  7:06     ` Vladimir Sementsov-Ogievskiy
2020-08-25 15:11   ` Max Reitz
2020-08-26  6:49     ` Vladimir Sementsov-Ogievskiy
2020-08-26  8:52       ` Max Reitz [this message]
2020-08-26  9:15         ` Vladimir Sementsov-Ogievskiy
2020-08-26  9:58           ` Max Reitz
2020-08-26 11:29             ` Vladimir Sementsov-Ogievskiy
2020-08-26 13:51     ` David Edmondson
2020-08-27  9:19       ` Vladimir Sementsov-Ogievskiy
2020-08-21 14:11 ` [PATCH v5 08/10] iotests.py: add verify_o_direct helper Vladimir Sementsov-Ogievskiy
2020-08-21 17:29   ` Nir Soffer
2020-08-21 14:11 ` [PATCH v5 09/10] iotests.py: add filter_img_check Vladimir Sementsov-Ogievskiy
2020-08-21 14:11 ` [PATCH v5 10/10] iotests: add 298 to test new preallocate filter driver Vladimir Sementsov-Ogievskiy
2020-08-27 21:08 ` [PATCH v5 00/10] preallocate filter Vladimir Sementsov-Ogievskiy
2020-09-01 15:07   ` Max Reitz
2020-09-17  9:00     ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d155eaf-7365-70a7-f4fe-1abe88383087@redhat.com \
    --to=mreitz@redhat.com \
    --cc=armbru@redhat.com \
    --cc=den@openvz.org \
    --cc=fam@euphon.net \
    --cc=kwolf@redhat.com \
    --cc=nsoffer@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).