From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Return-Path: MIME-Version: 1.0 References: <1528258740-6581-1-git-send-email-changpeng.liu@intel.com> <20181012210628.226361-1-dverkamp@chromium.org> <20181015092740.GA3964@infradead.org> In-Reply-To: <20181015092740.GA3964@infradead.org> From: Daniel Verkamp Date: Mon, 15 Oct 2018 16:16:03 -0700 Message-ID: Subject: Re: [PATCH v8] virtio_blk: add discard and write zeroes support To: hch@infradead.org Cc: virtualization@lists.linux-foundation.org, linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com, axboe@kernel.dk, stefanha@redhat.com, Changpeng Liu Content-Type: text/plain; charset="UTF-8" List-ID: On Mon, Oct 15, 2018 at 2:27 AM Christoph Hellwig wrote= : > On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote: > > From: Changpeng Liu > > > > In commit 88c85538, "virtio-blk: add discard and write zeroes features > > to specification" (https://github.com/oasis-tcs/virtio-spec), the virti= o > > There is some issues in this spec. For one using the multiple ranges > also for write zeroes is rather inefficient. Write zeroes really should > use the same format as read and write. I wasn't involved in the writing of the spec, so I'll defer to Michael and Changpeng here, but I'm not sure how "set in stone" the virtio specification is, or if it can be updated somehow without breaking compatibility. I agree that Write Zeroes would be simpler to implement as a single LBA + length rather than a list. However, it's not really possible to use the same format as the regular virtio block read/write commands (VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT), since the read/write commands do not specify a length explicitly; length is implied by the length of the data buffer as defined by the virtio descriptor, but a Write Zeroes command does not require a data buffer. At best, this could be a separate command mirroring the layout of struct virtio_blk_req but with data replaced with a length field; I'm not sure that buys much in the way of consistency. > Second the unmap flag isn't properly specified at all, as nothing > says the device may not unmap without the unmap flag. Please take > a look at the SCSI or NVMe =D1=95pec for some guidance. This could probably use some clarifying text in the specification, but given that there is nothing in the spec describing what the device needs to do when unmap =3D 0, I would assume that the device can do whatever it likes, as long as the blocks read back as 0s afterwards. Reading back 0s is required by the definition of the Write Zeroes command in the same virtio spec change. It would probably be good to clarify this and explicitly define what the device is allowed to do in response to both settings of the unmap bit. My understanding of the corresponding feature in NVMe (the Deallocate bit in the Write Zeroes command) is that the only difference between Deallocate =3D 1 and 0 is that the device "should" versus "may" (no "shall" on either side) deallocate the corresponding blocks, but only if the device supports reading 0s back after blocks are deallocated. If the device does not support reading 0s after deallocation, it is not allowed to deallocate blocks as part of a Write Zeroes command regardless of the setting of the Deallocate bit. Some similar wording could probably be added to the virtio spec to clarify the meaning of unmap, although I would prefer something that makes it a little clearer that the bit is only intended as a hint from the driver to indicate whether the device should attempt to keep storage allocated for the zeroed blocks, if that is indeed the intended behavior. Is there some in-kernel doc that describes what behavior the Linux block layer needs from a write zeroes command? > > +static inline int virtblk_setup_discard_write_zeroes(struct request *r= eq, > > + bool unmap) > > Why is this an inline function? I don't think there's any reason it needs to be inline; I can drop the inline in the next revision. Given (as far as I can tell) your concerns seem to apply to the Write Zeroes command specifically, would it be reasonable to start with a patch that just adds support for the Discard command (along with fixes for Ming's feedback)? This would be sufficient for my particular use case (although I can't speak for Changpeng), and we can revisit Write Zeroes once the spec concerns are worked out. Thanks, -- Daniel From mboxrd@z Thu Jan 1 00:00:00 1970 From: Daniel Verkamp Subject: Re: [PATCH v8] virtio_blk: add discard and write zeroes support Date: Mon, 15 Oct 2018 16:16:03 -0700 Message-ID: References: <1528258740-6581-1-git-send-email-changpeng.liu@intel.com> <20181012210628.226361-1-dverkamp@chromium.org> <20181015092740.GA3964@infradead.org> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Return-path: In-Reply-To: <20181015092740.GA3964@infradead.org> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: virtualization-bounces@lists.linux-foundation.org Errors-To: virtualization-bounces@lists.linux-foundation.org To: hch@infradead.org Cc: axboe@kernel.dk, mst@redhat.com, virtualization@lists.linux-foundation.org, linux-block@vger.kernel.org, stefanha@redhat.com, Changpeng Liu List-Id: virtualization@lists.linuxfoundation.org T24gTW9uLCBPY3QgMTUsIDIwMTggYXQgMjoyNyBBTSBDaHJpc3RvcGggSGVsbHdpZyA8aGNoQGlu ZnJhZGVhZC5vcmc+IHdyb3RlOgo+IE9uIEZyaSwgT2N0IDEyLCAyMDE4IGF0IDAyOjA2OjI4UE0g LTA3MDAsIERhbmllbCBWZXJrYW1wIHdyb3RlOgo+ID4gRnJvbTogQ2hhbmdwZW5nIExpdSA8Y2hh bmdwZW5nLmxpdUBpbnRlbC5jb20+Cj4gPgo+ID4gSW4gY29tbWl0IDg4Yzg1NTM4LCAidmlydGlv LWJsazogYWRkIGRpc2NhcmQgYW5kIHdyaXRlIHplcm9lcyBmZWF0dXJlcwo+ID4gdG8gc3BlY2lm aWNhdGlvbiIgKGh0dHBzOi8vZ2l0aHViLmNvbS9vYXNpcy10Y3MvdmlydGlvLXNwZWMpLCB0aGUg dmlydGlvCj4KPiBUaGVyZSBpcyBzb21lIGlzc3VlcyBpbiB0aGlzIHNwZWMuICBGb3Igb25lIHVz aW5nIHRoZSBtdWx0aXBsZSByYW5nZXMKPiBhbHNvIGZvciB3cml0ZSB6ZXJvZXMgaXMgcmF0aGVy IGluZWZmaWNpZW50LiAgV3JpdGUgemVyb2VzIHJlYWxseSBzaG91bGQKPiB1c2UgdGhlIHNhbWUg Zm9ybWF0IGFzIHJlYWQgYW5kIHdyaXRlLgoKSSB3YXNuJ3QgaW52b2x2ZWQgaW4gdGhlIHdyaXRp bmcgb2YgdGhlIHNwZWMsIHNvIEknbGwgZGVmZXIgdG8gTWljaGFlbAphbmQgQ2hhbmdwZW5nIGhl cmUsIGJ1dCBJJ20gbm90IHN1cmUgaG93ICJzZXQgaW4gc3RvbmUiIHRoZSB2aXJ0aW8Kc3BlY2lm aWNhdGlvbiBpcywgb3IgaWYgaXQgY2FuIGJlIHVwZGF0ZWQgc29tZWhvdyB3aXRob3V0IGJyZWFr aW5nCmNvbXBhdGliaWxpdHkuCgpJIGFncmVlIHRoYXQgV3JpdGUgWmVyb2VzIHdvdWxkIGJlIHNp bXBsZXIgdG8gaW1wbGVtZW50IGFzIGEgc2luZ2xlCkxCQSArIGxlbmd0aCByYXRoZXIgdGhhbiBh IGxpc3QuICBIb3dldmVyLCBpdCdzIG5vdCByZWFsbHkgcG9zc2libGUgdG8KdXNlIHRoZSBzYW1l IGZvcm1hdCBhcyB0aGUgcmVndWxhciB2aXJ0aW8gYmxvY2sgcmVhZC93cml0ZSBjb21tYW5kcwoo VklSVElPX0JMS19UX0lOL1ZJUlRJT19CTEtfVF9PVVQpLCBzaW5jZSB0aGUgcmVhZC93cml0ZSBj b21tYW5kcyBkbwpub3Qgc3BlY2lmeSBhIGxlbmd0aCBleHBsaWNpdGx5OyBsZW5ndGggaXMgaW1w bGllZCBieSB0aGUgbGVuZ3RoIG9mCnRoZSBkYXRhIGJ1ZmZlciBhcyBkZWZpbmVkIGJ5IHRoZSB2 aXJ0aW8gZGVzY3JpcHRvciwgYnV0IGEgV3JpdGUKWmVyb2VzIGNvbW1hbmQgZG9lcyBub3QgcmVx dWlyZSBhIGRhdGEgYnVmZmVyLiAgQXQgYmVzdCwgdGhpcyBjb3VsZCBiZQphIHNlcGFyYXRlIGNv bW1hbmQgbWlycm9yaW5nIHRoZSBsYXlvdXQgb2Ygc3RydWN0IHZpcnRpb19ibGtfcmVxIGJ1dAp3 aXRoIGRhdGEgcmVwbGFjZWQgd2l0aCBhIGxlbmd0aCBmaWVsZDsgSSdtIG5vdCBzdXJlIHRoYXQg YnV5cyBtdWNoIGluCnRoZSB3YXkgb2YgY29uc2lzdGVuY3kuCgo+IFNlY29uZCB0aGUgdW5tYXAg ZmxhZyBpc24ndCBwcm9wZXJseSBzcGVjaWZpZWQgYXQgYWxsLCBhcyBub3RoaW5nCj4gc2F5cyB0 aGUgZGV2aWNlIG1heSBub3QgdW5tYXAgd2l0aG91dCB0aGUgdW5tYXAgZmxhZy4gIFBsZWFzZSB0 YWtlCj4gYSBsb29rIGF0IHRoZSBTQ1NJIG9yIE5WTWUg0ZVwZWMgZm9yIHNvbWUgZ3VpZGFuY2Uu CgpUaGlzIGNvdWxkIHByb2JhYmx5IHVzZSBzb21lIGNsYXJpZnlpbmcgdGV4dCBpbiB0aGUgc3Bl Y2lmaWNhdGlvbiwgYnV0CmdpdmVuIHRoYXQgdGhlcmUgaXMgbm90aGluZyBpbiB0aGUgc3BlYyBk ZXNjcmliaW5nIHdoYXQgdGhlIGRldmljZQpuZWVkcyB0byBkbyB3aGVuIHVubWFwID0gMCwgSSB3 b3VsZCBhc3N1bWUgdGhhdCB0aGUgZGV2aWNlIGNhbiBkbwp3aGF0ZXZlciBpdCBsaWtlcywgYXMg bG9uZyBhcyB0aGUgYmxvY2tzIHJlYWQgYmFjayBhcyAwcyBhZnRlcndhcmRzLgpSZWFkaW5nIGJh Y2sgMHMgaXMgcmVxdWlyZWQgYnkgdGhlIGRlZmluaXRpb24gb2YgdGhlIFdyaXRlIFplcm9lcwpj b21tYW5kIGluIHRoZSBzYW1lIHZpcnRpbyBzcGVjIGNoYW5nZS4gIEl0IHdvdWxkIHByb2JhYmx5 IGJlIGdvb2QgdG8KY2xhcmlmeSB0aGlzIGFuZCBleHBsaWNpdGx5IGRlZmluZSB3aGF0IHRoZSBk ZXZpY2UgaXMgYWxsb3dlZCB0byBkbyBpbgpyZXNwb25zZSB0byBib3RoIHNldHRpbmdzIG9mIHRo ZSB1bm1hcCBiaXQuCgpNeSB1bmRlcnN0YW5kaW5nIG9mIHRoZSBjb3JyZXNwb25kaW5nIGZlYXR1 cmUgaW4gTlZNZSAodGhlIERlYWxsb2NhdGUKYml0IGluIHRoZSBXcml0ZSBaZXJvZXMgY29tbWFu ZCkgaXMgdGhhdCB0aGUgb25seSBkaWZmZXJlbmNlIGJldHdlZW4KRGVhbGxvY2F0ZSA9IDEgYW5k IDAgaXMgdGhhdCB0aGUgZGV2aWNlICJzaG91bGQiIHZlcnN1cyAibWF5IiAobm8KInNoYWxsIiBv biBlaXRoZXIgc2lkZSkgZGVhbGxvY2F0ZSB0aGUgY29ycmVzcG9uZGluZyBibG9ja3MsIGJ1dCBv bmx5CmlmIHRoZSBkZXZpY2Ugc3VwcG9ydHMgcmVhZGluZyAwcyBiYWNrIGFmdGVyIGJsb2NrcyBh cmUgZGVhbGxvY2F0ZWQuCklmIHRoZSBkZXZpY2UgZG9lcyBub3Qgc3VwcG9ydCByZWFkaW5nIDBz IGFmdGVyIGRlYWxsb2NhdGlvbiwgaXQgaXMKbm90IGFsbG93ZWQgdG8gZGVhbGxvY2F0ZSBibG9j a3MgYXMgcGFydCBvZiBhIFdyaXRlIFplcm9lcyBjb21tYW5kCnJlZ2FyZGxlc3Mgb2YgdGhlIHNl dHRpbmcgb2YgdGhlIERlYWxsb2NhdGUgYml0LgoKU29tZSBzaW1pbGFyIHdvcmRpbmcgY291bGQg cHJvYmFibHkgYmUgYWRkZWQgdG8gdGhlIHZpcnRpbyBzcGVjIHRvCmNsYXJpZnkgdGhlIG1lYW5p bmcgb2YgdW5tYXAsIGFsdGhvdWdoIEkgd291bGQgcHJlZmVyIHNvbWV0aGluZyB0aGF0Cm1ha2Vz IGl0IGEgbGl0dGxlIGNsZWFyZXIgdGhhdCB0aGUgYml0IGlzIG9ubHkgaW50ZW5kZWQgYXMgYSBo aW50IGZyb20KdGhlIGRyaXZlciB0byBpbmRpY2F0ZSB3aGV0aGVyIHRoZSBkZXZpY2Ugc2hvdWxk IGF0dGVtcHQgdG8ga2VlcApzdG9yYWdlIGFsbG9jYXRlZCBmb3IgdGhlIHplcm9lZCBibG9ja3Ms IGlmIHRoYXQgaXMgaW5kZWVkIHRoZQppbnRlbmRlZCBiZWhhdmlvci4KCklzIHRoZXJlIHNvbWUg aW4ta2VybmVsIGRvYyB0aGF0IGRlc2NyaWJlcyB3aGF0IGJlaGF2aW9yIHRoZSBMaW51eApibG9j ayBsYXllciBuZWVkcyBmcm9tIGEgd3JpdGUgemVyb2VzIGNvbW1hbmQ/Cgo+ID4gK3N0YXRpYyBp bmxpbmUgaW50IHZpcnRibGtfc2V0dXBfZGlzY2FyZF93cml0ZV96ZXJvZXMoc3RydWN0IHJlcXVl c3QgKnJlcSwKPiA+ICsgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg ICBib29sIHVubWFwKQo+Cj4gV2h5IGlzIHRoaXMgYW4gaW5saW5lIGZ1bmN0aW9uPwoKSSBkb24n dCB0aGluayB0aGVyZSdzIGFueSByZWFzb24gaXQgbmVlZHMgdG8gYmUgaW5saW5lOyBJIGNhbiBk cm9wIHRoZQppbmxpbmUgaW4gdGhlIG5leHQgcmV2aXNpb24uCgpHaXZlbiAoYXMgZmFyIGFzIEkg Y2FuIHRlbGwpIHlvdXIgY29uY2VybnMgc2VlbSB0byBhcHBseSB0byB0aGUgV3JpdGUKWmVyb2Vz IGNvbW1hbmQgc3BlY2lmaWNhbGx5LCB3b3VsZCBpdCBiZSByZWFzb25hYmxlIHRvIHN0YXJ0IHdp dGggYQpwYXRjaCB0aGF0IGp1c3QgYWRkcyBzdXBwb3J0IGZvciB0aGUgRGlzY2FyZCBjb21tYW5k IChhbG9uZyB3aXRoIGZpeGVzCmZvciBNaW5nJ3MgZmVlZGJhY2spPyAgVGhpcyB3b3VsZCBiZSBz dWZmaWNpZW50IGZvciBteSBwYXJ0aWN1bGFyIHVzZQpjYXNlIChhbHRob3VnaCBJIGNhbid0IHNw ZWFrIGZvciBDaGFuZ3BlbmcpLCBhbmQgd2UgY2FuIHJldmlzaXQgV3JpdGUKWmVyb2VzIG9uY2Ug dGhlIHNwZWMgY29uY2VybnMgYXJlIHdvcmtlZCBvdXQuCgpUaGFua3MsCi0tIERhbmllbApfX19f X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fXwpWaXJ0dWFsaXphdGlv biBtYWlsaW5nIGxpc3QKVmlydHVhbGl6YXRpb25AbGlzdHMubGludXgtZm91bmRhdGlvbi5vcmcK aHR0cHM6Ly9saXN0cy5saW51eGZvdW5kYXRpb24ub3JnL21haWxtYW4vbGlzdGluZm8vdmlydHVh bGl6YXRpb24=