All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Denis V. Lunev" <den@openvz.org>
To: Eric Blake <eblake@redhat.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/1] qcow2: handle cluster leak happening with a guest TRIM command
Date: Tue, 23 May 2017 17:52:25 +0300	[thread overview]
Message-ID: <a221266c-ea2e-555c-35e1-13d180c0554c@openvz.org> (raw)
In-Reply-To: <07c51989-1032-8039-0436-515e63f14df1@redhat.com>

On 05/22/2017 04:02 PM, Eric Blake wrote:
> On 05/22/2017 06:56 AM, Denis V. Lunev wrote:
>>>
>>> As it is, your patch doesn't apply to master.  And even if it did, your
>>> patch breaks semantics - we CANNOT discard clusters that must read back
>>> as zeroes.
>>>
>> Can you pls give some details why the cluster can not be
>> discarded? This is something that I can not understand.
> The cluster CAN be discarded if the guest requests it.  There is a
> difference between:
>
> qemu-img -f qcow2 'w -z    0 64k' 1.img
>
> which says to ensure that the cluster reads back as zero, but to NOT
> unmap things (corresponding to NBD_CMD_WRITE_ZEROES with
> NBD_CMD_FLAG_NO_HOLE set), and:
>
> qemu-img -f qcow2 'w -z -u 0 64k' 1.img
>
> which says to ensure that the cluster reads back as zero, and that
> unmapping the cluster is permitted if that will make the operation
> faster and/or use less space (corresponding to NBD_CMD_WRITE_ZEROES with
> NBD_CMD_FLAG_NO_HOLE clear).
>
> In current master, we've gone as far as to make an obvious difference
> between the two types of qcow2 zero clusters (see commit fdfab37);
> QCOW2_CLUSTER_ZERO_PLAIN corresponds to the case where the guest allowed
> a cluster to be unmapped, while QCOW2_CLUSTER_ZERO_ALLOC corresponds to
> the case where we've written zeroes but require the allocation mapping
> to remain intact.
>
> One of the big reasons why we do NOT want the guest to be able to
> convert a cluster to QCOW2_CLUSTER_UNALLOCATED after a write zero
> request (but only after a discard request) is that we don't want to
> audit what will happen if a backing file is later added to the image.
> If a guest does a discard, then they are acknowledging that the data in
> that cluster can read back as anything (we try to make it read back as
> zero where practical, but do not guarantee that).  But if a guest does a
> write zero request, even if they permit unmapping, the semantics require
> that they can read back zero, no matter what happens to the backing chain.
>
> An example that Kevin used to convince me was the process of drive
> mirroring: if you mirror just the active qcow2 layer, then cancel the
> mirror, the destination is supposed to be a snapshot of what the guest
> saw at that moment, once you rebase the destination image to be on top
> of the same backing contents as the source saw.  That implies that I can
> use 'qemu-img rebase -u' to inject a backing chain.  If we mirror a zero
> cluster over to the destination, and make sure the destination sees a
> QCOW2_CLUSTER_ZERO_PLAIN, then no matter what the backing chain we later
> rebase the destination onto, we still read 0 for that cluster (as the
> guest expected).  But if we mirror the zero cluster over and leave the
> destination image with QCOW2_CLUSTER_UNALLOCATED, then supply a backing
> chain where the cluster does not read as zero, then we have altered the
> guest-visible contents.  So it is safer to ALWAYS require that a guest
> write-zero action uses one of the two types of qcow2 zero clusters, to
> keep the guest-visible contents of that cluster constant no matter what
> rebasing actions later occur.
>
> That said, we've also had a conversation on things we can do to improve
> mirroring; one of the ideas was to add a new flag (comparable to the
> existing BDRV_REQ_MAY_UNMAP) that mirroring (but not guest actions) can
> use to request that zero_single_l2() is permitted to reuse
> QCOW2_CLUSTER_UNALLOCATED instead of having to switch to
> QCOW2_CLUSTER_ZERO_PLAIN, or where we can exploit PUNCH_HOLE semantics
> on the underlying bs->file to efficiently write zeroes into the file
> system even while keeping QCOW2_CLUSTER_ZERO_ALLOC at the qcow2 layer.
>
>> At my opinion the cluster is clearly marked with QCOW_OFLAG_ZERO
>> and thus will be read with zeroes regardless of the content of the
>> cluster which is pointed by the offset. This is actually what
>> happens on the FS without PUNCH_HOLE support.
>>
>> That is why this offset can be actually reused for any other
>> block, what is I am trying to propose.
> Yes, when using write zeros with unmap (qemu-io -c 'w -z -u ...'), then
> the code already unmaps the cluster, so that it can be reused for any
> other block.  So it boils down to whether your guest is really
> requesting write zeroes with permission to unmap, and whether any other
> options you have specified regarding the disk are filtering out
> BDRV_REQ_MAY_UNMAP before it reaches the qcow2 layer.
>
this is very clear now. Thank you very much for this letter.

Thank you and best regards,
    Den

      reply	other threads:[~2017-05-23 14:52 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-21 14:21 [Qemu-devel] [PATCH 1/1] qcow2: handle cluster leak happening with a guest TRIM command Denis V. Lunev
2017-05-22 11:37 ` Eric Blake
2017-05-22 11:56   ` Denis V. Lunev
2017-05-22 13:02     ` Eric Blake
2017-05-23 14:52       ` Denis V. Lunev [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a221266c-ea2e-555c-35e1-13d180c0554c@openvz.org \
    --to=den@openvz.org \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.