All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Richard Laager <rlaager@wiktel.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations
Date: Wed, 14 Mar 2012 13:49:48 +0100	[thread overview]
Message-ID: <4F6093EC.2090303@redhat.com> (raw)
In-Reply-To: <4F60911E.6030201@redhat.com>

Il 14/03/2012 13:37, Kevin Wolf ha scritto:
> Am 14.03.2012 13:14, schrieb Paolo Bonzini:
>>> Paolo mentioned a use case as a fast way for guests to write zeros, but
>>> is it really faster than a normal write when we have to emulate it by a
>>> bdrv_write with a temporary buffer of zeros? 
>>
>> No, of course not.
>>
>>> On the other hand we have
>>> the cases where discard really means "I don't care about the data any
>>> more" and emulating it by writing zeros is just a waste of resources there.
>>>
>>> So I think we only want to advertise that discard zeroes data if we can
>>> do it efficiently. This means that the format does support it, and that
>>> the device is able to communicate the discard granularity (= cluster
>>> size) to the guest OS.
>>
>> Note that the discard granularity is only a hint, so it's really more a
>> maximum suggested value than a granularity.  Outside of a cluster
>> boundary the format would still have to write zeros manually.
> 
> You're talking about SCSI here, I guess? Would be one case where being
> able to define sane semantics for virtio-blk would have been an
> advantage... I had hoped that SCSI was already sane, but if doesn't
> distinguish between "I don't care about this any more" and "I want to
> have zeros here", then I'm afraid I can't call it sane any more.

It does make the distinction.  "I don't care" is UNMAP (or WRITE
SAME(16) with the UNMAP bit set); "I want to have zeroes" is WRITE
SAME(10) or WRITE SAME(16) with an all-zero payload.

> We can make the conditions even stricter, i.e. allow it only if protocol
> can pass through discards for unaligned requests. This wouldn't free
> clusters on an image format level, but at least on a file system level.
> 
>> Also, Linux for example will only round the number of sectors down to
>> the granularity, not the start sector.  Rereading the code, for SCSI we
>> want to advertise a zero granularity (aka do whatever you want),
>> otherwise we may get only misaligned discard requests and end up writing
>> zeroes inefficiently all the time.
> 
> Does this make sense with real hardware or is it a Linux bug?

It's a bug, SCSI defines the "optimal unmap request starting LBA" to be
"(n × optimal unmap granularity) + unmap granularity alignment".

>> The problem is that advertising discard_zeroes_data based on the backend
>> calls for trouble as soon as you migrate between storage formats,
>> filesystems or disks.
> 
> True. You would have to emulate if you migrate from a source that can
> discard to zeros efficiently to a destination that can't.
> 
> In the end, I guess we'll just have to accept that we can't fix bad
> semantics of ATA and SCSI, and just need to decide whether "I don't
> care" or "I want to have zeros" is more common. My feeling is that "I
> don't care" is the more useful operation because it can't be expressed
> otherwise, but I haven't checked what guests really do.

Yeah, guests right now only use it for unused filesystem pieces, so the
"do not care" semantics are fine.  I also hoped to use discard to avoid
blowing up thin-provisioned images when streaming.  Perhaps we can use
bdrv_has_zero_init instead, and/or pass down the copy-on-read flag to
the block driver.

Anyhow, there are some patches from this series that are relatively
independent and ready for inclusion, I'll extract them and post them
separately.

Paolo

  reply	other threads:[~2012-03-14 12:50 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-08 17:15 [Qemu-devel] [RFC PATCH 00/17] Improvements around discard and write zeroes Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 01/17] qemu-iotests: add a simple test for write_zeroes Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 02/17] qed: make write-zeroes bounce buffer smaller than a single cluster Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 03/17] block: add discard properties to BlockDriverInfo Paolo Bonzini
2012-03-09 16:47   ` Kevin Wolf
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 04/17] qed: implement bdrv_aio_discard Paolo Bonzini
2012-03-09 16:31   ` Kevin Wolf
2012-03-09 17:53     ` Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 05/17] block: pass around qiov for write_zeroes operation Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations Paolo Bonzini
2012-03-09 16:37   ` Kevin Wolf
2012-03-09 18:06     ` Paolo Bonzini
2012-03-10 18:02       ` Richard Laager
2012-03-12 12:27         ` Paolo Bonzini
2012-03-12 13:04           ` Kevin Wolf
2012-03-13 19:13           ` Richard Laager
2012-03-14  7:41             ` Paolo Bonzini
2012-03-14 12:01               ` Kevin Wolf
2012-03-14 12:14                 ` Paolo Bonzini
2012-03-14 12:37                   ` Kevin Wolf
2012-03-14 12:49                     ` Paolo Bonzini [this message]
2012-03-14 13:04                       ` Kevin Wolf
2012-03-24 15:33                       ` Christoph Hellwig
2012-03-24 15:30                   ` Christoph Hellwig
2012-03-26 19:40                     ` Richard Laager
2012-03-27 10:20                       ` Kevin Wolf
2012-03-24 15:29                 ` Christoph Hellwig
2012-03-26  9:44                   ` Daniel P. Berrange
2012-03-26  9:56                     ` Christoph Hellwig
2012-03-15  0:42               ` Richard Laager
2012-03-15  9:36                 ` Paolo Bonzini
2012-03-16  0:47                   ` Richard Laager
2012-03-16  9:34                     ` Paolo Bonzini
2012-03-24 15:27         ` Christoph Hellwig
2012-03-26 19:40           ` Richard Laager
2012-03-27  9:08             ` Christoph Hellwig
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 07/17] block: make high level discard operation always zero Paolo Bonzini
2012-03-08 17:55   ` Avi Kivity
2012-03-09 16:42     ` Kevin Wolf
2012-03-12 10:42       ` Avi Kivity
2012-03-12 11:04         ` Kevin Wolf
2012-03-12 12:03           ` Avi Kivity
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 08/17] block: kill the write zeroes operation Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 09/17] ide: issue discard asynchronously but serialize the pieces Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 10/17] ide/scsi: add discard_zeroes_data property Paolo Bonzini
2012-03-08 18:13   ` Avi Kivity
2012-03-08 18:14     ` Avi Kivity
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 11/17] ide/scsi: prepare for flipping the discard defaults Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 12/17] ide/scsi: turn on discard Paolo Bonzini
2012-03-08 18:17   ` Avi Kivity
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 13/17] block: fallback from discard to writes Paolo Bonzini
2012-03-24 15:35   ` Christoph Hellwig
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 14/17] block: support FALLOC_FL_PUNCH_HOLE trimming Paolo Bonzini
2012-03-09  8:20   ` Chris Wedgwood
2012-03-09  8:31     ` Paolo Bonzini
2012-03-09  8:35       ` Chris Wedgwood
2012-03-09  8:40         ` Paolo Bonzini
2012-03-09 10:31   ` Stefan Hajnoczi
2012-03-09 10:43     ` Paolo Bonzini
2012-03-09 10:53       ` Stefan Hajnoczi
2012-03-09 10:57         ` Paolo Bonzini
2012-03-09 20:36   ` Richard Laager
2012-03-12  9:34     ` Paolo Bonzini
2012-03-24 15:40     ` Christoph Hellwig
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 15/17] raw: add get_info Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 16/17] qemu-io: fix the alloc command Paolo Bonzini
2012-03-08 17:15 ` [Qemu-devel] [RFC PATCH 17/17] raw: implement is_allocated Paolo Bonzini
2012-03-24 15:42   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F6093EC.2090303@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rlaager@wiktel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.