qemu-devel.nongnu.org archive mirror
From: Kevin Wolf <kwolf@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: Denis Lunev <den@virtuozzo.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
	Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
Date: Tue, 13 Aug 2019 13:51:15 +0200
Message-ID: <20190813115115.GG4663@localhost.localdomain>
In-Reply-To: <94ccf129-cc7e-2778-7688-fd718f8df249@virtuozzo.com>

On 13.08.2019 at 13:14, Vladimir Sementsov-Ogievskiy wrote:
> 13.08.2019 12:33, Vladimir Sementsov-Ogievskiy wrote:
> > 13.08.2019 12:01, Vladimir Sementsov-Ogievskiy wrote:
> >> 13.08.2019 11:39, Vladimir Sementsov-Ogievskiy wrote:
> >>> 12.08.2019 22:50, Max Reitz wrote:
> >>>> On 12.08.19 21:46, Max Reitz wrote:
> >>>>> On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
> >>>>>> Hi all!
> >>>>>>
> >>>>>> I'm not sure whether it's a bug or a feature, but using qcow2 under raw
> >>>>>> is broken. It should either be fixed as I propose (following Max's
> >>>>>> suggestion) or forbidden somehow (just forbid a node that supports
> >>>>>> backing files from being the file child of a raw-format node).
> >>>>>
> >>>>> I agree, I think only filters should return BDRV_BLOCK_RAW.
> >>>>>
> >>>>> (And not even them, they should just be handled transparently by
> >>>>> bdrv_co_block_status().  But that’s something for later.)
> >>>>>
> >>>>>> Vladimir Sementsov-Ogievskiy (2):
> >>>>>>    block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
> >>>>>>    iotests: test mirroring qcow2 under raw format
> >>>>>>
> >>>>>>   block/raw-format.c         |  2 +-
> >>>>>>   tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
> >>>>>>   tests/qemu-iotests/263.out | 12 ++++++++++
> >>>>>>   tests/qemu-iotests/group   |  1 +
> >>>>>>   4 files changed, 60 insertions(+), 1 deletion(-)
> >>>>>>   create mode 100755 tests/qemu-iotests/263
> >>>>>>   create mode 100644 tests/qemu-iotests/263.out
> >>>>>
> >>>>> Thanks, applied to my block-next branch:
> >>>>>
> >>>>> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next
> >>>>
> >>>> Oops, maybe not.  221 needs to be adjusted.
> >>>>
> >>>
> >>>
> >>> Hmm, yes, I forgot to run the tests. Areas that were zero become data|zero,
> >>> which doesn't look good.
> >>>
> >>> So, it's not quite right to report DATA | RECURSE; we should actually report
> >>> DATA_OR_ZERO | RECURSE, which is really ALLOCATED | RECURSE, as otherwise
> >>> DATA will be set in the final result (the generic layer must not drop it, obviously).
> >>>
> >>> ALLOCATED is never returned by drivers, but it seems it should be. I'll think a bit
> >>> and resend something new.
> >>>
> >>>
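A minimal sketch of why DATA | RECURSE turns zero areas into data|zero, assuming a simplified generic layer that keeps every bit the upper driver returned and only ORs in ZERO from the file child's answer. The BLK_* names and values and merge_recurse() are hypothetical stand-ins chosen for illustration, not QEMU's actual code.

```c
/* Illustrative flag bits named after QEMU's BDRV_BLOCK_* constants;
 * the values here are chosen for this sketch only. */
enum {
    BLK_DATA         = 1 << 0,
    BLK_ZERO         = 1 << 1,
    BLK_OFFSET_VALID = 1 << 2,
    BLK_ALLOCATED    = 1 << 4,
    BLK_RECURSE      = 1 << 6,
};

/* Hypothetical, simplified model of RECURSE handling: the upper layer's
 * answer is kept as-is (minus the RECURSE request itself), and only the
 * ZERO bit is taken from the status reported by the file child. */
static int merge_recurse(int upper, int lower)
{
    int ret = upper & ~BLK_RECURSE;

    if (upper & BLK_RECURSE) {
        ret |= lower & BLK_ZERO;
    }
    return ret;
}
```

With DATA in the upper answer, a zero area in the file child comes back as DATA | ZERO. With ALLOCATED in place of DATA, as proposed above, the same area comes back as ALLOCATED | ZERO and DATA never appears.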
> >>
> >>
> >> Hmm... So we have a raw node, and assume there is a backing chain under it. Who should
> >> loop through it, the generic code or the raw driver?
> >>
> >> Currently it looks like the generic code is responsible for looping through the filtered
> >> chain (backing files and filters), and the driver is responsible for all its children
> >> except the filtered child.
> >>
> >> Or the driver may return something that tells the generic code to handle the whole backing
> >> chain of the returned file at once, as it is a separate backing chain. It also seems that even
> >> RECURSE doesn't work correctly, as it doesn't handle the backing chain in this recursion. It
> >> only works better than RAW because we return it together with the DATA flag, and this DATA
> >> flag is kept anyway, regardless of whether zeros are found.
> >>
> >>
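To make the two options concrete, here is a minimal sketch assuming a toy node type that carries a single status per layer: a chain walk that descends through backing files until some layer reports an allocation, versus a one-level recursion that only asks the file child itself. struct node, status_above() and recurse_one_level() are hypothetical and only roughly approximate what bdrv_block_status_above() and BDRV_BLOCK_RECURSE do.

```c
#include <stddef.h>

/* Illustrative flag bits; values chosen for this sketch only. */
enum { BLK_DATA = 1 << 0, BLK_ZERO = 1 << 1 };

/* Hypothetical node: 'status' is what this layer alone reports for a
 * given range (0 means "not allocated here, look further down"). */
struct node {
    const char *name;
    int status;
    struct node *backing;
};

/* Walk the backing chain from 'top' down to (but not including) 'base'
 * and return the first non-zero layer status, like a simplified
 * block_status_above(). */
static int status_above(const struct node *top, const struct node *base)
{
    for (const struct node *n = top; n && n != base; n = n->backing) {
        if (n->status) {
            return n->status;
        }
    }
    return 0;
}

/* One-level recursion into a file child: only the child itself is
 * asked; its own backing chain is ignored. */
static int recurse_one_level(const struct node *file)
{
    return file ? file->status : 0;
}
```

For a qcow2 overlay (the file child of a raw node) whose data lives in its backing file, the chain walk finds DATA while the one-level recursion reports nothing, which is the gap described above.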
> > 
> > 
> > Hmm, so is it correct to return DATA | RECURSE if we are not really sure that it is data?
> > 
> > If we look at
> > 
> >   * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
> > 
> > it seems we should report DATA only if there is an allocation...
> > 
> >   * DATA ZERO OFFSET_VALID
> >   *  t    t        t       sectors read as zero, returned file is zero at offset
> >   *  t    f        t       sectors read as valid from file at offset
> >   *  f    t        t       sectors preallocated, read as zero, returned file not
> > 
> > So ZERO alone doesn't guarantee that we may safely read?
> > 
> > So, for qcow2 images with metadata preallocation, what about zero-init? We report DATA, probably get ZERO
> > from file, and end up with DATA | ZERO, which guarantees that a read will return zeros. But is that true?
> > 
> > Finally, what does "DATA" mean? That the space is allocated and occupies disk space? Or does it only mean
> > ALLOCATED, i.e. "read from this layer, not from backing", and, when used together with ZERO, add the
> > guarantee that a read will return zeros for sure?

I think DATA means that the data for this block is provided by *file. I
wouldn't necessarily understand it to mean that the data actually takes
up physical disk space there.

> Continuing the self-discussion.
> 
> Consider the following case more closely:
>  >   * DATA ZERO OFFSET_VALID
>  >   *  f    t        t       sectors preallocated, read as zero, returned file not
> 
> It actually means that we must not read, as the read will return wrong
> data when the clusters are actually zero for the guest.

It means that you need to read from bs itself to get the correct data
(which will be zero). Even though OFFSET_VALID is set, reading from
*file (typically bs->file->bs) at the returned offset might not give the
right result.
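The distinction between reading from bs and reading from *file can be written down as a small predicate. A minimal sketch, assuming the three DATA/ZERO/OFFSET_VALID rows quoted above and illustrative flag values; file_read_is_valid() and reads_as_zero() are hypothetical helpers, not QEMU functions.

```c
#include <stdbool.h>

/* Illustrative flag bits; values chosen for this sketch only. */
enum {
    BLK_DATA         = 1 << 0,
    BLK_ZERO         = 1 << 1,
    BLK_OFFSET_VALID = 1 << 2,
};

/* Reading from bs itself is always correct. This answers a narrower
 * question: may the caller read *file at the returned offset instead?
 * In the quoted table, only the rows with DATA set promise that *file
 * holds the guest-visible contents at that offset. */
static bool file_read_is_valid(int flags)
{
    return (flags & BLK_DATA) && (flags & BLK_OFFSET_VALID);
}

/* Does the guest see zeros here when reading through bs? */
static bool reads_as_zero(int flags)
{
    return flags & BLK_ZERO;
}
```

The preallocated "f t t" row is the case discussed here: reads_as_zero() is true, yet file_read_is_valid() is false, so the zeros are only visible through bs.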

> It's OK when, for example, qcow2 returns this combination with a link to
> its file child: it means that if you read from the qcow2 node, you'll see
> correct zeros, as qcow2 has special metadata showing that these clusters
> are zero. But if you read from file directly at the returned offset,
> you'll see garbage, so don't do it.

Correct.

> But what if some node returns this combination with file == itself? It
> actually means that you must not read, but should call block-status to
> learn that there are zeros. So, if some format can return this
> combination with file == itself, it means that you must not read it
> directly, but only after checking block status.

This doesn't make sense to me. Reading from a node is always correct.

But you're right that DATA seems to mean something slightly different at
the protocol level because *file cannot have a meaningful value for the
lower layer there. In this case, DATA still seems to mean that the data
is fetched from the lower layer (i.e. the block device on which the file
system resides). For holes, this is not the case.

> And file-posix is an example of such a driver. But file-posix holes are guaranteed to read as zero, so we could report DATA | ZERO.
> But this would break the user experience, which assumes that DATA means occupation of real disk space.

With the above explanation, DATA shouldn't be set for holes.

But it's still kind of inconsistent because OFFSET_VALID and the offset
refer to bs itself and not to the lower layer.
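Under the interpretation above, a protocol driver would report its extents roughly as follows. A minimal sketch, assuming a file-posix-like driver that has already classified a range as data or hole (for example via lseek() with SEEK_DATA/SEEK_HOLE); enum extent_kind and protocol_block_status() are hypothetical and not file-posix's actual code.

```c
#include <stdint.h>

/* Illustrative flag bits; values chosen for this sketch only. */
enum {
    BLK_DATA         = 1 << 0,
    BLK_ZERO         = 1 << 1,
    BLK_OFFSET_VALID = 1 << 2,
};

/* Hypothetical classification of a range in the host file. */
enum extent_kind { EXTENT_DATA, EXTENT_HOLE };

/* DATA only where the bytes are fetched from the underlying device;
 * a hole still reads as zero, so it gets ZERO instead. The mapped
 * offset refers to bs itself, because there is no lower node whose
 * offset it could describe. */
static int protocol_block_status(enum extent_kind kind, int64_t offset,
                                 int64_t *map)
{
    *map = offset;
    if (kind == EXTENT_HOLE) {
        return BLK_ZERO | BLK_OFFSET_VALID;
    }
    return BLK_DATA | BLK_OFFSET_VALID;
}
```

The *map = offset line is where the inconsistency noted above shows up: OFFSET_VALID is set, but the offset describes bs, not the lower layer that DATA refers to.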

> ...
> let me go and re-read what we've documented in the NBD protocol about block status...
> 
> "DATA" turns into NBD_STATE_HOLE, which formally means nothing and just notes that there is probably no disk space
> occupied by this region. So it's about disk space allocation and says nothing about the correctness of reads.
> And NBD_STATE_ZERO guarantees that the region reads as all zeroes.
> 
> Looking at the code in nbd/server.c... Aha, it calls block_status_above and turns !ALLOCATED into HOLE, which means
> that it will never return HOLE for file-posix...

Hm... This is a mess. :-)

Kevin
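The NBD consequence described in the last quoted paragraph can be traced with a small model. A minimal sketch, assuming (as a simplification) that any block-status answer with DATA or ZERO set counts as ALLOCATED, and that the server maps !ALLOCATED to NBD_STATE_HOLE and ZERO to NBD_STATE_ZERO as described in the message. NBD_STATE_HOLE and NBD_STATE_ZERO are the base:allocation bits from the NBD protocol specification; the BLK_* names, with_allocated() and to_nbd_state() are hypothetical.

```c
#include <stdint.h>

/* Illustrative block-status bits; values chosen for this sketch only. */
enum {
    BLK_DATA      = 1 << 0,
    BLK_ZERO      = 1 << 1,
    BLK_ALLOCATED = 1 << 4,
};

/* base:allocation flags from the NBD protocol specification. */
enum {
    NBD_STATE_HOLE = 1 << 0,
    NBD_STATE_ZERO = 1 << 1,
};

/* Assumption: an answer counts as ALLOCATED whenever DATA or ZERO is set. */
static int with_allocated(int flags)
{
    return (flags & (BLK_DATA | BLK_ZERO)) ? flags | BLK_ALLOCATED : flags;
}

/* Mapping as described: !ALLOCATED becomes HOLE, ZERO passes through. */
static uint32_t to_nbd_state(int flags)
{
    return ((flags & BLK_ALLOCATED) ? 0 : NBD_STATE_HOLE) |
           ((flags & BLK_ZERO) ? NBD_STATE_ZERO : 0);
}
```

Under these assumptions, a file-posix hole reported as ZERO shows up as NBD_STATE_ZERO alone, never NBD_STATE_HOLE; only a range with neither DATA nor ZERO becomes a hole.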



Thread overview: 38+ messages
2019-08-12 18:11 [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW Vladimir Sementsov-Ogievskiy
2019-08-12 18:11 ` [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE Vladimir Sementsov-Ogievskiy
2019-08-13 11:04   ` Kevin Wolf
2019-08-13 11:28     ` Vladimir Sementsov-Ogievskiy
2019-08-13 12:01       ` Kevin Wolf
2019-08-13 13:21         ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2019-08-13 14:46           ` Max Reitz
2019-08-13 14:43     ` [Qemu-devel] " Max Reitz
2019-08-13 14:56       ` Vladimir Sementsov-Ogievskiy
2019-08-13 15:03         ` Max Reitz
2019-08-13 15:22           ` Vladimir Sementsov-Ogievskiy
2019-08-13 16:07             ` Max Reitz
2019-08-13 15:41       ` Kevin Wolf
2019-08-13 15:54         ` Vladimir Sementsov-Ogievskiy
2019-08-13 16:08           ` Kevin Wolf
2019-08-13 16:32             ` Vladimir Sementsov-Ogievskiy
2019-08-14  6:27               ` Vladimir Sementsov-Ogievskiy
2019-08-13 16:21         ` Max Reitz
2019-08-12 18:11 ` [Qemu-devel] [PATCH 2/2] iotests: test mirroring qcow2 under raw format Vladimir Sementsov-Ogievskiy
2019-08-13  9:10   ` Kevin Wolf
2019-08-13  9:22     ` Vladimir Sementsov-Ogievskiy
2019-08-13  9:36       ` Kevin Wolf
2019-08-12 19:46 ` [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW Max Reitz
2019-08-12 19:50   ` Max Reitz
2019-08-13  8:39     ` Vladimir Sementsov-Ogievskiy
2019-08-13  9:01       ` Vladimir Sementsov-Ogievskiy
2019-08-13  9:33         ` Vladimir Sementsov-Ogievskiy
2019-08-13 11:14           ` Vladimir Sementsov-Ogievskiy
2019-08-13 11:51             ` Kevin Wolf [this message]
2019-08-13 13:00               ` Vladimir Sementsov-Ogievskiy
2019-08-13 14:31               ` Max Reitz
2019-08-13 14:46                 ` Vladimir Sementsov-Ogievskiy
2019-08-13 14:53                   ` Max Reitz
2019-08-13 15:03                     ` Kevin Wolf
2019-08-13 15:04                       ` Max Reitz
2019-08-13 14:50                 ` Eric Blake
2019-08-13  9:34   ` Kevin Wolf
2019-08-13 14:38     ` Max Reitz
