All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Max Reitz <mreitz@redhat.com>, qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
	Anton Nefedov <anton.nefedov@virtuozzo.com>,
	qemu-devel@nongnu.org, qemu-stable@nongnu.org,
	Stefan Hajnoczi <stefanha@redhat.com>,
	"Denis V . Lunev" <den@openvz.org>
Subject: Re: [PATCH for-4.2 v2 3/3] block/file-posix: Let post-EOF fallocate serialize
Date: Tue, 2 Jun 2020 17:43:36 +0300	[thread overview]
Message-ID: <dfe5fbff-ce04-504e-542b-11095a57fd78@virtuozzo.com> (raw)
In-Reply-To: <20191101152510.11719-4-mreitz@redhat.com>

01.11.2019 18:25, Max Reitz wrote:

Sorry for being late, I have some comments

> The XFS kernel driver has a bug that may cause data corruption for qcow2
> images as of qemu commit c8bb23cbdbe32f.  We can work around it by
> treating post-EOF fallocates as serializing up until infinity (INT64_MAX
> in practice).
> 
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/file-posix.c | 36 ++++++++++++++++++++++++++++++++++++
>   1 file changed, 36 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 0b7e904d48..1f0f61a02b 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2721,6 +2721,42 @@ raw_do_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int bytes,
>       RawPosixAIOData acb;
>       ThreadPoolFunc *handler;
>   
> +#ifdef CONFIG_FALLOCATE
> +    if (offset + bytes > bs->total_sectors * BDRV_SECTOR_SIZE) {
> +        BdrvTrackedRequest *req;
> +        uint64_t end;
> +
> +        /*
> +         * This is a workaround for a bug in the Linux XFS driver,
> +         * where writes submitted through the AIO interface will be
> +         * discarded if they happen beyond a concurrently running
> +         * fallocate() that increases the file length (i.e., both the
> +         * write and the fallocate() happen beyond the EOF).
> +         *
> +         * To work around it, we extend the tracked request for this
> +         * zero write until INT64_MAX (effectively infinity), and mark
> +         * it as serializing.
> +         *
> +         * We have to enable this workaround for all filesystems and
> +         * AIO modes (not just XFS with aio=native), because for
> +         * remote filesystems we do not know the host configuration.
> +         */
> +
> +        req = bdrv_co_get_self_request(bs);
> +        assert(req);
> +        assert(req->type == BDRV_TRACKED_WRITE);
> +        assert(req->offset <= offset);
> +        assert(req->offset + req->bytes >= offset + bytes);

Why these assertions? TrackedRequest offset and bytes fields correspond to the original request. When request is being expanded to satisfy request_alignment, these fields are not updated.
So, maybe, we should assert overlap_offset and overlap_bytes?

> +
> +        end = INT64_MAX & -(uint64_t)bs->bl.request_alignment;
> +        req->bytes = end - req->offset;

And I doubt that we should update req->bytes. We never updated it in other places, it corresponds to original request. It's enough to update overlap_bytes to achieve corresponding serialising.

> +        req->overlap_bytes = req->bytes;
> +
> +        bdrv_mark_request_serialising(req, bs->bl.request_alignment);

Not sure, how much should we care about request_alignment here, I think, it's enough to just set req->overlap_bytes = INT64_MAX - req->overlap_offest, but it doesn't really matter.

> +        bdrv_wait_serialising_requests(req);
> +    }
> +#endif
> +
>       acb = (RawPosixAIOData) {
>           .bs             = bs,
>           .aio_fildes     = s->fd,
> 


-- 
Best regards,
Vladimir


  parent reply	other threads:[~2020-06-02 14:46 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-01 15:25 [PATCH for-4.2 v2 0/3] qcow2: Fix data corruption on XFS Max Reitz
2019-11-01 15:25 ` [PATCH for-4.2 v2 1/3] block: Make wait/mark serialising requests public Max Reitz
2019-11-01 15:25 ` [PATCH for-4.2 v2 2/3] block: Add bdrv_co_get_self_request() Max Reitz
2019-11-01 15:25 ` [PATCH for-4.2 v2 3/3] block/file-posix: Let post-EOF fallocate serialize Max Reitz
2019-11-14 16:27   ` Christoph Hellwig
2019-11-14 17:15     ` Max Reitz
2019-11-14 17:16       ` Max Reitz
2020-06-02 14:43   ` Vladimir Sementsov-Ogievskiy [this message]
2020-06-02 15:46     ` Max Reitz
2020-06-02 16:16       ` Vladimir Sementsov-Ogievskiy
2020-06-02 16:38         ` Max Reitz
2020-06-02 17:01           ` Vladimir Sementsov-Ogievskiy
2020-06-02 17:08             ` Vladimir Sementsov-Ogievskiy
2020-08-22 17:03   ` Vladimir Sementsov-Ogievskiy
2020-08-22 17:04     ` Vladimir Sementsov-Ogievskiy
2020-08-26  8:23       ` Vladimir Sementsov-Ogievskiy
2020-08-26 11:20         ` Vladimir Sementsov-Ogievskiy
2019-11-01 15:48 ` [PATCH for-4.2 v2 0/3] qcow2: Fix data corruption on XFS no-reply
2019-11-04  8:29   ` Max Reitz
2019-11-04  9:09 ` Max Reitz
2019-11-04  9:45 ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dfe5fbff-ce04-504e-542b-11095a57fd78@virtuozzo.com \
    --to=vsementsov@virtuozzo.com \
    --cc=anton.nefedov@virtuozzo.com \
    --cc=den@openvz.org \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-stable@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.