From: Maxim Levitsky <mlevitsk@redhat.com>
To: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Jan Kara <jack@suse.cz>,
	qemu-block@nongnu.org,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Peter Lieven <pl@kamp.de>, Max Reitz <mreitz@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 0/2] RFC: Issue with discards on raw block device without O_DIRECT
Date: Wed, 11 Nov 2020 17:44:05 +0200
Message-ID: <03b01c699c9fab64736d04891f1e835aef06c886.camel@redhat.com>
In-Reply-To: <20201111153913.41840-1-mlevitsk@redhat.com>

On Wed, 2020-11-11 at 17:39 +0200, Maxim Levitsky wrote:
> clone of "starship_production"

The git-publish destroyed the cover letter:

For reference, this is for bz #1872633.

The issue is that the current kernel code that implements 'fallocate'
on block devices works roughly like this:

1. Flush the page cache on the range that is about to be discarded.
2. Issue the discard and wait for it to finish.
   (as far as I can see the discard doesn't go through the
   page cache).

3. Check if the page cache is dirty for this range,
   if it is dirty (meaning that someone wrote to it meanwhile)
   return -EBUSY.

This means that if qemu (or qemu-img) issues a write and then a
discard to an area that shares a page with that write, the kernel
can return -EBUSY.
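To illustrate the "shares a page" condition, here is a minimal sketch
of a check for whether a write range and a discard range touch a common
page. It assumes 4 KiB pages; ranges_share_page and ASSUMED_PAGE_SIZE
are hypothetical names, not taken from the kernel or from qemu.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. Real code would query sysconf(_SC_PAGESIZE). */
#define ASSUMED_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper: returns true if byte ranges [a_off, a_off + a_len)
 * and [b_off, b_off + b_len) touch at least one common page.
 */
static bool ranges_share_page(uint64_t a_off, uint64_t a_len,
                              uint64_t b_off, uint64_t b_len)
{
    if (a_len == 0 || b_len == 0) {
        return false; /* an empty range touches no page */
    }

    /* Index of the first and last page touched by each range. */
    uint64_t a_first = a_off / ASSUMED_PAGE_SIZE;
    uint64_t a_last = (a_off + a_len - 1) / ASSUMED_PAGE_SIZE;
    uint64_t b_first = b_off / ASSUMED_PAGE_SIZE;
    uint64_t b_last = (b_off + b_len - 1) / ASSUMED_PAGE_SIZE;

    /* Two closed intervals of page indexes overlap. */
    return a_first <= b_last && b_first <= a_last;
}
```

For example, a 512-byte write at offset 0 followed by a 512-byte discard
at offset 512 lands in the same page, while a page-aligned 4096-byte
write followed by a discard of the next page does not.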

On the other hand, the ext4 implementation of discard, for example,
doesn't seem to be affected. It does take a lock on the inode to avoid
concurrent IO and flushes O_DIRECT writers prior to doing the discard, though.

Doing an fsync and retrying seems to resolve this issue, but it might be
too big a hammer. Just retrying doesn't work, indicating that maybe the
code that flushes the page cache in (1) doesn't do this correctly?

It can also be racy unless special measures are taken to block IO from
qemu during this fsync.
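The fsync-and-retry workaround could look roughly like the sketch
below. This is an illustration only, not code from the patches:
discard_with_fsync_retry is a hypothetical name, and the use of
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE as the discard request is an
assumption. It does nothing to close the race described above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: discard [offset, offset + len) on fd. If the
 * kernel reports -EBUSY, fsync the fd and retry the discard once.
 * Returns 0 on success or a negative errno value.
 */
static int discard_with_fsync_retry(int fd, off_t offset, off_t len)
{
    const int mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;

    if (fallocate(fd, mode, offset, len) == 0) {
        return 0;
    }
    if (errno != EBUSY) {
        return -errno; /* unrelated failure: report it as-is */
    }

    /* Write back dirty pages, then try the discard a second time. */
    if (fsync(fd) != 0) {
        return -errno;
    }
    if (fallocate(fd, mode, offset, len) == 0) {
        return 0;
    }
    return -errno;
}
```

Nothing here prevents another write from dirtying the range between
the fsync and the second fallocate, so the retry can still fail with
-EBUSY.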

This patch series contains two patches:

The first patch just lets file-posix ignore the -EBUSY errors, which is
technically enough to fall back to a plain write in this case, but seems wrong.

And the second patch adds an optimization to qemu-img to avoid such a
fragmented write/discard in the first place.
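As a rough illustration of what falling back to a plain write on -EBUSY
means (the idea behind the first patch), here is a standalone sketch.
It is not the file-posix code: zero_range_with_fallback is a
hypothetical name, and both the 4 KiB bounce buffer and the use of
FALLOC_FL_PUNCH_HOLE are assumptions.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make [offset, offset + len) read back as zeros.
 * Tries a discard first; if the kernel answers -EBUSY, writes explicit
 * zeros instead. Returns 0 on success or a negative errno value.
 */
static int zero_range_with_fallback(int fd, off_t offset, off_t len)
{
    const int mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
    char zeros[4096]; /* assumed bounce buffer size */

    if (fallocate(fd, mode, offset, len) == 0) {
        return 0;
    }
    if (errno != EBUSY) {
        return -errno; /* only -EBUSY triggers the fallback */
    }

    memset(zeros, 0, sizeof(zeros));
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(zeros) ? (size_t)len
                                                  : sizeof(zeros);
        ssize_t n = pwrite(fd, zeros, chunk, offset);

        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted before writing: retry */
            }
            return -errno;
        }
        if (n == 0) {
            return -EIO; /* no progress: avoid looping forever */
        }
        offset += n;
        len -= n;
    }
    return 0;
}
```

The fallback gives the same zeroed contents but does not deallocate
the range.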

Both patches make the reproducer work for this particular bugzilla,
but I don't think they are enough.

What do you think?

Best regards,
	Maxim Levitsky


