linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [virtio-dev] [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
  2017-03-28  8:39 [PATCH] virtio-blk: add DISCARD support to virtio-blk driver Changpeng Liu
@ 2017-03-27 11:34 ` Paolo Bonzini
  2017-03-28  1:52   ` Liu, Changpeng
  2017-03-27 14:56 ` Christoph Hellwig
  2017-03-27 20:20 ` Stefan Hajnoczi
  2 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2017-03-27 11:34 UTC (permalink / raw)
  To: Changpeng Liu, virtio-dev, virtualization, linux-kernel, hch; +Cc: qemu-devel



On 28/03/2017 10:39, Changpeng Liu wrote:
> +	if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
> +		q->limits.discard_zeroes_data = 0;

Maybe you could use another feature bit to populate discard_zeroes_data.

Paolo

> +		q->limits.discard_alignment = blk_size;
> +		q->limits.discard_granularity = blk_size;
> +		blk_queue_max_discard_sectors(q, UINT_MAX);
> +		blk_queue_max_discard_segments(q, 1);
> +		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
> +	}
> +

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
  2017-03-28  8:39 [PATCH] virtio-blk: add DISCARD support to virtio-blk driver Changpeng Liu
  2017-03-27 11:34 ` [virtio-dev] " Paolo Bonzini
@ 2017-03-27 14:56 ` Christoph Hellwig
  2017-03-28  1:43   ` Liu, Changpeng
  2017-03-27 20:20 ` Stefan Hajnoczi
  2 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2017-03-27 14:56 UTC (permalink / raw)
  To: Changpeng Liu; +Cc: virtio-dev, virtualization, linux-kernel, hch, qemu-devel

On Tue, Mar 28, 2017 at 04:39:25PM +0800, Changpeng Liu wrote:
> Currently virtio-blk driver does not provide discard feature flag, so the
> filesystems which built on top of the block device will not send discard
> command. This is okay for HDD backend, but it will impact the performance
> for SSD backend.
> 
> Add a feature flag VIRTIO_BLK_F_DISCARD and command VIRTIO_BLK_T_DISCARD
> to extend exist virtio-blk protocol. virtio-blk protocol uses a single
> 8 bytes descriptor containing type,reserved and sector, currently Linux
> uses the reserved field as IO priority, here we also re-use the reserved
> field as number of discard sectors.

Do you have a link to the specification for this feature?  At least
virtio-v1.0 does not seem to specify a discard feature.

Note that Linux 4.11 and later have support for multi-range discard
ala ATA TRIM, SCSI UNMAP and NVMe deallocate which might be useful
here, too.

> +		q->limits.discard_zeroes_data = 0;

No need to clear this.  Also hopefully this field goes away for 4.12

> +		blk_queue_max_discard_segments(q, 1);

No need to set this.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
  2017-03-28  8:39 [PATCH] virtio-blk: add DISCARD support to virtio-blk driver Changpeng Liu
  2017-03-27 11:34 ` [virtio-dev] " Paolo Bonzini
  2017-03-27 14:56 ` Christoph Hellwig
@ 2017-03-27 20:20 ` Stefan Hajnoczi
  2017-03-28  2:15   ` Liu, Changpeng
  2 siblings, 1 reply; 8+ messages in thread
From: Stefan Hajnoczi @ 2017-03-27 20:20 UTC (permalink / raw)
  To: Changpeng Liu; +Cc: virtio-dev, virtualization, linux-kernel, hch, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 5435 bytes --]

On Tue, Mar 28, 2017 at 04:39:25PM +0800, Changpeng Liu wrote:
> Currently virtio-blk driver does not provide discard feature flag, so the
> filesystems which built on top of the block device will not send discard
> command. This is okay for HDD backend, but it will impact the performance
> for SSD backend.
> 
> Add a feature flag VIRTIO_BLK_F_DISCARD and command VIRTIO_BLK_T_DISCARD
> to extend exist virtio-blk protocol. virtio-blk protocol uses a single
> 8 bytes descriptor containing type,reserved and sector, currently Linux
> uses the reserved field as IO priority, here we also re-use the reserved
> field as number of discard sectors.
> 
> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
> ---
>  drivers/block/virtio_blk.c      | 38 +++++++++++++++++++++++++++++---------
>  include/uapi/linux/virtio_blk.h | 12 ++++++++++--
>  2 files changed, 39 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 1d4c9f8..550cfe7 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -241,6 +241,9 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	case REQ_OP_FLUSH:
>  		type = VIRTIO_BLK_T_FLUSH;
>  		break;
> +	case REQ_OP_DISCARD:
> +		type = VIRTIO_BLK_T_DISCARD;
> +		break;
>  	case REQ_OP_SCSI_IN:
>  	case REQ_OP_SCSI_OUT:
>  		type = VIRTIO_BLK_T_SCSI_CMD;
> @@ -256,16 +259,24 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, type);
>  	vbr->out_hdr.sector = type ?
>  		0 : cpu_to_virtio64(vblk->vdev, blk_rq_pos(req));
> -	vbr->out_hdr.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(req));
> +	vbr->out_hdr.u.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(req));
>  
>  	blk_mq_start_request(req);
>  
> -	num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
> -	if (num) {
> -		if (rq_data_dir(req) == WRITE)
> -			vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_OUT);
> -		else
> -			vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_IN);
> +	if (type == VIRTIO_BLK_T_DISCARD) {
> +		vbr->out_hdr.u.discard_nr_sectors = cpu_to_virtio32(vblk->vdev,
> +								    blk_rq_sectors(req));
> +		num = 0;
> +	} else {
> +		num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
> +		if (num) {
> +			if (rq_data_dir(req) == WRITE)
> +				vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev,
> +								     VIRTIO_BLK_T_OUT);
> +			else
> +				vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev,
> +								     VIRTIO_BLK_T_IN);
> +		}
>  	}
>  
>  	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
> @@ -775,6 +786,15 @@ static int virtblk_probe(struct virtio_device *vdev)
>  	if (!err && opt_io_size)
>  		blk_queue_io_opt(q, blk_size * opt_io_size);
>  
> +	if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
> +		q->limits.discard_zeroes_data = 0;
> +		q->limits.discard_alignment = blk_size;
> +		q->limits.discard_granularity = blk_size;
> +		blk_queue_max_discard_sectors(q, UINT_MAX);
> +		blk_queue_max_discard_segments(q, 1);
> +		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
> +	}

Please add configuration space fields for these limits.  Looking at the
virtio-scsi block limits code in QEMU's scsi_disk_emulate_inquiry() I
can see that the hypervisor has useful values that it wants to
communicate.  They shouldn't be hardcoded to blk_size.

> +
>  	virtio_device_ready(vdev);
>  
>  	device_add_disk(&vdev->dev, vblk->disk);
> @@ -882,14 +902,14 @@ static int virtblk_restore(struct virtio_device *vdev)
>  	VIRTIO_BLK_F_SCSI,
>  #endif
>  	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
> -	VIRTIO_BLK_F_MQ,
> +	VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD,
>  }
>  ;
>  static unsigned int features[] = {
>  	VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
>  	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
>  	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
> -	VIRTIO_BLK_F_MQ,
> +	VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD,
>  };
>  
>  static struct virtio_driver virtio_blk = {
> diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
> index 9ebe4d9..d608649 100644
> --- a/include/uapi/linux/virtio_blk.h
> +++ b/include/uapi/linux/virtio_blk.h
> @@ -38,6 +38,7 @@
>  #define VIRTIO_BLK_F_BLK_SIZE	6	/* Block size of disk is available*/
>  #define VIRTIO_BLK_F_TOPOLOGY	10	/* Topology information is available */
>  #define VIRTIO_BLK_F_MQ		12	/* support more than one vq */
> +#define VIRTIO_BLK_F_DISCARD	13	/* DISCARD command is supported */
>  
>  /* Legacy feature bits */
>  #ifndef VIRTIO_BLK_NO_LEGACY
> @@ -114,6 +115,9 @@ struct virtio_blk_config {
>  /* Get device ID command */
>  #define VIRTIO_BLK_T_GET_ID    8
>  
> +/* Discard command */
> +#define VIRTIO_BLK_T_DISCARD	16
> +
>  #ifndef VIRTIO_BLK_NO_LEGACY
>  /* Barrier before this op. */
>  #define VIRTIO_BLK_T_BARRIER	0x80000000
> @@ -127,8 +131,12 @@ struct virtio_blk_config {
>  struct virtio_blk_outhdr {
>  	/* VIRTIO_BLK_T* */
>  	__virtio32 type;
> -	/* io priority. */
> -	__virtio32 ioprio;
> +	union {
> +		/* io priority. */
> +		__virtio32 ioprio;
> +		/* discard number of sectors */
> +		__virtio32 discard_nr_sectors;
> +	} u;

DISCARD commands have no io priority?  Perhaps it's better to add an
extended header.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
  2017-03-27 14:56 ` Christoph Hellwig
@ 2017-03-28  1:43   ` Liu, Changpeng
  0 siblings, 0 replies; 8+ messages in thread
From: Liu, Changpeng @ 2017-03-28  1:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: virtio-dev, virtualization, linux-kernel, qemu-devel



> -----Original Message-----
> From: Christoph Hellwig [mailto:hch@lst.de]
> Sent: Monday, March 27, 2017 10:56 PM
> To: Liu, Changpeng <changpeng.liu@intel.com>
> Cc: virtio-dev@lists.oasis-open.org; virtualization@lists.linux-foundation.org; linux-
> kernel@vger.kernel.org; hch@lst.de; qemu-devel@nongnu.org
> Subject: Re: [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
> 
> On Tue, Mar 28, 2017 at 04:39:25PM +0800, Changpeng Liu wrote:
> > Currently virtio-blk driver does not provide discard feature flag, so the
> > filesystems which built on top of the block device will not send discard
> > command. This is okay for HDD backend, but it will impact the performance
> > for SSD backend.
> >
> > Add a feature flag VIRTIO_BLK_F_DISCARD and command
> VIRTIO_BLK_T_DISCARD
> > to extend exist virtio-blk protocol. virtio-blk protocol uses a single
> > 8 bytes descriptor containing type,reserved and sector, currently Linux
> > uses the reserved field as IO priority, here we also re-use the reserved
> > field as number of discard sectors.
> 
> Do you have a link to the specification for this feature?  At least
> virtio-v1.0 does not seem to specify a discard feature.
Not yet, patch goes first, there are uses who don't want to migrate from virtio-blk to virtio-scsi,
so this feature seems reasonable, the question is that how we should define the specification to
support this feature.
> 
> Note that Linux 4.11 and later have support for multi-range discard
> ala ATA TRIM, SCSI UNMAP and NVMe deallocate which might be useful
> here, too.
For support multi-range discard features, the DISCARD command must have data_in buffer, similar with
SCSI UNMAP and NVMe DEALLOCATE commands, maybe 16 bytes aligned descriptors are required for each
DISCARD command.
> 
> > +		q->limits.discard_zeroes_data = 0;
> 
> No need to clear this.  Also hopefully this field goes away for 4.12
> 
> > +		blk_queue_max_discard_segments(q, 1);
> 
> No need to set this.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
  2017-03-27 11:34 ` [virtio-dev] " Paolo Bonzini
@ 2017-03-28  1:52   ` Liu, Changpeng
  0 siblings, 0 replies; 8+ messages in thread
From: Liu, Changpeng @ 2017-03-28  1:52 UTC (permalink / raw)
  To: Paolo Bonzini, virtio-dev, virtualization, linux-kernel, hch; +Cc: qemu-devel



> -----Original Message-----
> From: virtio-dev@lists.oasis-open.org [mailto:virtio-dev@lists.oasis-open.org] On
> Behalf Of Paolo Bonzini
> Sent: Monday, March 27, 2017 7:34 PM
> To: Liu, Changpeng <changpeng.liu@intel.com>; virtio-dev@lists.oasis-open.org;
> virtualization@lists.linux-foundation.org; linux-kernel@vger.kernel.org; hch@lst.de
> Cc: qemu-devel@nongnu.org
> Subject: Re: [virtio-dev] [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
> 
> 
> 
> On 28/03/2017 10:39, Changpeng Liu wrote:
> > +	if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
> > +		q->limits.discard_zeroes_data = 0;
> 
> Maybe you could use another feature bit to populate discard_zeroes_data.
> 
> Paolo
> 
Sounds good to me, Christoph Hellwig mentioned this field will be removed in next release, just removed this line makes clear.
> > +		q->limits.discard_alignment = blk_size;
> > +		q->limits.discard_granularity = blk_size;
> > +		blk_queue_max_discard_sectors(q, UINT_MAX);
> > +		blk_queue_max_discard_segments(q, 1);
> > +		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
> > +	}
> > +
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
  2017-03-27 20:20 ` Stefan Hajnoczi
@ 2017-03-28  2:15   ` Liu, Changpeng
  2017-03-28  8:37     ` Stefan Hajnoczi
  0 siblings, 1 reply; 8+ messages in thread
From: Liu, Changpeng @ 2017-03-28  2:15 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-dev, virtualization, linux-kernel, hch, qemu-devel



> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
> Sent: Tuesday, March 28, 2017 4:20 AM
> To: Liu, Changpeng <changpeng.liu@intel.com>
> Cc: virtio-dev@lists.oasis-open.org; virtualization@lists.linux-foundation.org; linux-
> kernel@vger.kernel.org; hch@lst.de; qemu-devel@nongnu.org
> Subject: Re: [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
> 
> On Tue, Mar 28, 2017 at 04:39:25PM +0800, Changpeng Liu wrote:
> > Currently virtio-blk driver does not provide discard feature flag, so the
> > filesystems which built on top of the block device will not send discard
> > command. This is okay for HDD backend, but it will impact the performance
> > for SSD backend.
> >
> > Add a feature flag VIRTIO_BLK_F_DISCARD and command
> VIRTIO_BLK_T_DISCARD
> > to extend exist virtio-blk protocol. virtio-blk protocol uses a single
> > 8 bytes descriptor containing type,reserved and sector, currently Linux
> > uses the reserved field as IO priority, here we also re-use the reserved
> > field as number of discard sectors.
> >
> > Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
> > ---
> >  drivers/block/virtio_blk.c      | 38 +++++++++++++++++++++++++++++---------
> >  include/uapi/linux/virtio_blk.h | 12 ++++++++++--
> >  2 files changed, 39 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index 1d4c9f8..550cfe7 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -241,6 +241,9 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
> >  	case REQ_OP_FLUSH:
> >  		type = VIRTIO_BLK_T_FLUSH;
> >  		break;
> > +	case REQ_OP_DISCARD:
> > +		type = VIRTIO_BLK_T_DISCARD;
> > +		break;
> >  	case REQ_OP_SCSI_IN:
> >  	case REQ_OP_SCSI_OUT:
> >  		type = VIRTIO_BLK_T_SCSI_CMD;
> > @@ -256,16 +259,24 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
> >  	vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, type);
> >  	vbr->out_hdr.sector = type ?
> >  		0 : cpu_to_virtio64(vblk->vdev, blk_rq_pos(req));
> > -	vbr->out_hdr.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(req));
> > +	vbr->out_hdr.u.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(req));
> >
> >  	blk_mq_start_request(req);
> >
> > -	num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
> > -	if (num) {
> > -		if (rq_data_dir(req) == WRITE)
> > -			vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev,
> VIRTIO_BLK_T_OUT);
> > -		else
> > -			vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev,
> VIRTIO_BLK_T_IN);
> > +	if (type == VIRTIO_BLK_T_DISCARD) {
> > +		vbr->out_hdr.u.discard_nr_sectors = cpu_to_virtio32(vblk->vdev,
> > +
> blk_rq_sectors(req));
> > +		num = 0;
> > +	} else {
> > +		num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
> > +		if (num) {
> > +			if (rq_data_dir(req) == WRITE)
> > +				vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev,
> > +
> VIRTIO_BLK_T_OUT);
> > +			else
> > +				vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev,
> > +
> VIRTIO_BLK_T_IN);
> > +		}
> >  	}
> >
> >  	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
> > @@ -775,6 +786,15 @@ static int virtblk_probe(struct virtio_device *vdev)
> >  	if (!err && opt_io_size)
> >  		blk_queue_io_opt(q, blk_size * opt_io_size);
> >
> > +	if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
> > +		q->limits.discard_zeroes_data = 0;
> > +		q->limits.discard_alignment = blk_size;
> > +		q->limits.discard_granularity = blk_size;
> > +		blk_queue_max_discard_sectors(q, UINT_MAX);
> > +		blk_queue_max_discard_segments(q, 1);
> > +		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
> > +	}
> 
> Please add configuration space fields for these limits.  Looking at the
> virtio-scsi block limits code in QEMU's scsi_disk_emulate_inquiry() I
> can see that the hypervisor has useful values that it wants to
> communicate.  They shouldn't be hardcoded to blk_size.
Yes, move discard related parameters to configuration space make sense.
> 
> > +
> >  	virtio_device_ready(vdev);
> >
> >  	device_add_disk(&vdev->dev, vblk->disk);
> > @@ -882,14 +902,14 @@ static int virtblk_restore(struct virtio_device *vdev)
> >  	VIRTIO_BLK_F_SCSI,
> >  #endif
> >  	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY,
> VIRTIO_BLK_F_CONFIG_WCE,
> > -	VIRTIO_BLK_F_MQ,
> > +	VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD,
> >  }
> >  ;
> >  static unsigned int features[] = {
> >  	VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX,
> VIRTIO_BLK_F_GEOMETRY,
> >  	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
> >  	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY,
> VIRTIO_BLK_F_CONFIG_WCE,
> > -	VIRTIO_BLK_F_MQ,
> > +	VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD,
> >  };
> >
> >  static struct virtio_driver virtio_blk = {
> > diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
> > index 9ebe4d9..d608649 100644
> > --- a/include/uapi/linux/virtio_blk.h
> > +++ b/include/uapi/linux/virtio_blk.h
> > @@ -38,6 +38,7 @@
> >  #define VIRTIO_BLK_F_BLK_SIZE	6	/* Block size of disk is available*/
> >  #define VIRTIO_BLK_F_TOPOLOGY	10	/* Topology information is
> available */
> >  #define VIRTIO_BLK_F_MQ		12	/* support more than one vq */
> > +#define VIRTIO_BLK_F_DISCARD	13	/* DISCARD command is supported
> */
> >
> >  /* Legacy feature bits */
> >  #ifndef VIRTIO_BLK_NO_LEGACY
> > @@ -114,6 +115,9 @@ struct virtio_blk_config {
> >  /* Get device ID command */
> >  #define VIRTIO_BLK_T_GET_ID    8
> >
> > +/* Discard command */
> > +#define VIRTIO_BLK_T_DISCARD	16
> > +
> >  #ifndef VIRTIO_BLK_NO_LEGACY
> >  /* Barrier before this op. */
> >  #define VIRTIO_BLK_T_BARRIER	0x80000000
> > @@ -127,8 +131,12 @@ struct virtio_blk_config {
> >  struct virtio_blk_outhdr {
> >  	/* VIRTIO_BLK_T* */
> >  	__virtio32 type;
> > -	/* io priority. */
> > -	__virtio32 ioprio;
> > +	union {
> > +		/* io priority. */
> > +		__virtio32 ioprio;
> > +		/* discard number of sectors */
> > +		__virtio32 discard_nr_sectors;
> > +	} u;
> 
> DISCARD commands have no io priority?  Perhaps it's better to add an
> extended header.
What I think now is that, keep the ioprio field, and let DISCARD command has input data buffers, 16 bytes aligned descriptor for each
DISCARD segment(can support mult-range feature), and this is also aligned with SCSI and NVMe specification.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
  2017-03-28  2:15   ` Liu, Changpeng
@ 2017-03-28  8:37     ` Stefan Hajnoczi
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2017-03-28  8:37 UTC (permalink / raw)
  To: Liu, Changpeng; +Cc: virtio-dev, virtualization, linux-kernel, hch, qemu-devel

On Tue, Mar 28, 2017 at 3:15 AM, Liu, Changpeng <changpeng.liu@intel.com> wrote:
>> -----Original Message-----
>> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
>> Sent: Tuesday, March 28, 2017 4:20 AM
>> To: Liu, Changpeng <changpeng.liu@intel.com>
>> Cc: virtio-dev@lists.oasis-open.org; virtualization@lists.linux-foundation.org; linux-
>> kernel@vger.kernel.org; hch@lst.de; qemu-devel@nongnu.org
>> Subject: Re: [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
>>
>> On Tue, Mar 28, 2017 at 04:39:25PM +0800, Changpeng Liu wrote:
>> > Currently virtio-blk driver does not provide discard feature flag, so the
>> > filesystems which built on top of the block device will not send discard
>> > command. This is okay for HDD backend, but it will impact the performance
>> > for SSD backend.
>> >
>> > Add a feature flag VIRTIO_BLK_F_DISCARD and command
>> VIRTIO_BLK_T_DISCARD
>> > to extend exist virtio-blk protocol. virtio-blk protocol uses a single
>> > 8 bytes descriptor containing type,reserved and sector, currently Linux
>> > uses the reserved field as IO priority, here we also re-use the reserved
>> > field as number of discard sectors.
>> >
>> > Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
>> > ---
>> >  drivers/block/virtio_blk.c      | 38 +++++++++++++++++++++++++++++---------
>> >  include/uapi/linux/virtio_blk.h | 12 ++++++++++--
>> >  2 files changed, 39 insertions(+), 11 deletions(-)
>> >
>> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> > index 1d4c9f8..550cfe7 100644
>> > --- a/drivers/block/virtio_blk.c
>> > +++ b/drivers/block/virtio_blk.c
>> > @@ -241,6 +241,9 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>> >     case REQ_OP_FLUSH:
>> >             type = VIRTIO_BLK_T_FLUSH;
>> >             break;
>> > +   case REQ_OP_DISCARD:
>> > +           type = VIRTIO_BLK_T_DISCARD;
>> > +           break;
>> >     case REQ_OP_SCSI_IN:
>> >     case REQ_OP_SCSI_OUT:
>> >             type = VIRTIO_BLK_T_SCSI_CMD;
>> > @@ -256,16 +259,24 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>> >     vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, type);
>> >     vbr->out_hdr.sector = type ?
>> >             0 : cpu_to_virtio64(vblk->vdev, blk_rq_pos(req));
>> > -   vbr->out_hdr.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(req));
>> > +   vbr->out_hdr.u.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(req));
>> >
>> >     blk_mq_start_request(req);
>> >
>> > -   num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
>> > -   if (num) {
>> > -           if (rq_data_dir(req) == WRITE)
>> > -                   vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev,
>> VIRTIO_BLK_T_OUT);
>> > -           else
>> > -                   vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev,
>> VIRTIO_BLK_T_IN);
>> > +   if (type == VIRTIO_BLK_T_DISCARD) {
>> > +           vbr->out_hdr.u.discard_nr_sectors = cpu_to_virtio32(vblk->vdev,
>> > +
>> blk_rq_sectors(req));
>> > +           num = 0;
>> > +   } else {
>> > +           num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
>> > +           if (num) {
>> > +                   if (rq_data_dir(req) == WRITE)
>> > +                           vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev,
>> > +
>> VIRTIO_BLK_T_OUT);
>> > +                   else
>> > +                           vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev,
>> > +
>> VIRTIO_BLK_T_IN);
>> > +           }
>> >     }
>> >
>> >     spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
>> > @@ -775,6 +786,15 @@ static int virtblk_probe(struct virtio_device *vdev)
>> >     if (!err && opt_io_size)
>> >             blk_queue_io_opt(q, blk_size * opt_io_size);
>> >
>> > +   if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
>> > +           q->limits.discard_zeroes_data = 0;
>> > +           q->limits.discard_alignment = blk_size;
>> > +           q->limits.discard_granularity = blk_size;
>> > +           blk_queue_max_discard_sectors(q, UINT_MAX);
>> > +           blk_queue_max_discard_segments(q, 1);
>> > +           queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
>> > +   }
>>
>> Please add configuration space fields for these limits.  Looking at the
>> virtio-scsi block limits code in QEMU's scsi_disk_emulate_inquiry() I
>> can see that the hypervisor has useful values that it wants to
>> communicate.  They shouldn't be hardcoded to blk_size.
> Yes, move discard related parameters to configuration space make sense.
>>
>> > +
>> >     virtio_device_ready(vdev);
>> >
>> >     device_add_disk(&vdev->dev, vblk->disk);
>> > @@ -882,14 +902,14 @@ static int virtblk_restore(struct virtio_device *vdev)
>> >     VIRTIO_BLK_F_SCSI,
>> >  #endif
>> >     VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY,
>> VIRTIO_BLK_F_CONFIG_WCE,
>> > -   VIRTIO_BLK_F_MQ,
>> > +   VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD,
>> >  }
>> >  ;
>> >  static unsigned int features[] = {
>> >     VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX,
>> VIRTIO_BLK_F_GEOMETRY,
>> >     VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
>> >     VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY,
>> VIRTIO_BLK_F_CONFIG_WCE,
>> > -   VIRTIO_BLK_F_MQ,
>> > +   VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD,
>> >  };
>> >
>> >  static struct virtio_driver virtio_blk = {
>> > diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
>> > index 9ebe4d9..d608649 100644
>> > --- a/include/uapi/linux/virtio_blk.h
>> > +++ b/include/uapi/linux/virtio_blk.h
>> > @@ -38,6 +38,7 @@
>> >  #define VIRTIO_BLK_F_BLK_SIZE      6       /* Block size of disk is available*/
>> >  #define VIRTIO_BLK_F_TOPOLOGY      10      /* Topology information is
>> available */
>> >  #define VIRTIO_BLK_F_MQ            12      /* support more than one vq */
>> > +#define VIRTIO_BLK_F_DISCARD       13      /* DISCARD command is supported
>> */
>> >
>> >  /* Legacy feature bits */
>> >  #ifndef VIRTIO_BLK_NO_LEGACY
>> > @@ -114,6 +115,9 @@ struct virtio_blk_config {
>> >  /* Get device ID command */
>> >  #define VIRTIO_BLK_T_GET_ID    8
>> >
>> > +/* Discard command */
>> > +#define VIRTIO_BLK_T_DISCARD       16
>> > +
>> >  #ifndef VIRTIO_BLK_NO_LEGACY
>> >  /* Barrier before this op. */
>> >  #define VIRTIO_BLK_T_BARRIER       0x80000000
>> > @@ -127,8 +131,12 @@ struct virtio_blk_config {
>> >  struct virtio_blk_outhdr {
>> >     /* VIRTIO_BLK_T* */
>> >     __virtio32 type;
>> > -   /* io priority. */
>> > -   __virtio32 ioprio;
>> > +   union {
>> > +           /* io priority. */
>> > +           __virtio32 ioprio;
>> > +           /* discard number of sectors */
>> > +           __virtio32 discard_nr_sectors;
>> > +   } u;
>>
>> DISCARD commands have no io priority?  Perhaps it's better to add an
>> extended header.
> What I think now is that, keep the ioprio field, and let DISCARD command has input data buffers, 16 bytes aligned descriptor for each
> DISCARD segment(can support mult-range feature), and this is also aligned with SCSI and NVMe specification.

Sounds good, thanks.

Stefan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] virtio-blk: add DISCARD support to virtio-blk driver
@ 2017-03-28  8:39 Changpeng Liu
  2017-03-27 11:34 ` [virtio-dev] " Paolo Bonzini
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Changpeng Liu @ 2017-03-28  8:39 UTC (permalink / raw)
  To: virtio-dev, virtualization, linux-kernel, hch; +Cc: qemu-devel

Currently virtio-blk driver does not provide discard feature flag, so the
filesystems which built on top of the block device will not send discard
command. This is okay for HDD backend, but it will impact the performance
for SSD backend.

Add a feature flag VIRTIO_BLK_F_DISCARD and command VIRTIO_BLK_T_DISCARD
to extend exist virtio-blk protocol. virtio-blk protocol uses a single
8 bytes descriptor containing type,reserved and sector, currently Linux
uses the reserved field as IO priority, here we also re-use the reserved
field as number of discard sectors.

Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
---
 drivers/block/virtio_blk.c      | 38 +++++++++++++++++++++++++++++---------
 include/uapi/linux/virtio_blk.h | 12 ++++++++++--
 2 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 1d4c9f8..550cfe7 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -241,6 +241,9 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
 	case REQ_OP_FLUSH:
 		type = VIRTIO_BLK_T_FLUSH;
 		break;
+	case REQ_OP_DISCARD:
+		type = VIRTIO_BLK_T_DISCARD;
+		break;
 	case REQ_OP_SCSI_IN:
 	case REQ_OP_SCSI_OUT:
 		type = VIRTIO_BLK_T_SCSI_CMD;
@@ -256,16 +259,24 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
 	vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, type);
 	vbr->out_hdr.sector = type ?
 		0 : cpu_to_virtio64(vblk->vdev, blk_rq_pos(req));
-	vbr->out_hdr.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(req));
+	vbr->out_hdr.u.ioprio = cpu_to_virtio32(vblk->vdev, req_get_ioprio(req));
 
 	blk_mq_start_request(req);
 
-	num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
-	if (num) {
-		if (rq_data_dir(req) == WRITE)
-			vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_OUT);
-		else
-			vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_IN);
+	if (type == VIRTIO_BLK_T_DISCARD) {
+		vbr->out_hdr.u.discard_nr_sectors = cpu_to_virtio32(vblk->vdev,
+								    blk_rq_sectors(req));
+		num = 0;
+	} else {
+		num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
+		if (num) {
+			if (rq_data_dir(req) == WRITE)
+				vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev,
+								     VIRTIO_BLK_T_OUT);
+			else
+				vbr->out_hdr.type |= cpu_to_virtio32(vblk->vdev,
+								     VIRTIO_BLK_T_IN);
+		}
 	}
 
 	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
@@ -775,6 +786,15 @@ static int virtblk_probe(struct virtio_device *vdev)
 	if (!err && opt_io_size)
 		blk_queue_io_opt(q, blk_size * opt_io_size);
 
+	if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
+		q->limits.discard_zeroes_data = 0;
+		q->limits.discard_alignment = blk_size;
+		q->limits.discard_granularity = blk_size;
+		blk_queue_max_discard_sectors(q, UINT_MAX);
+		blk_queue_max_discard_segments(q, 1);
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
+	}
+
 	virtio_device_ready(vdev);
 
 	device_add_disk(&vdev->dev, vblk->disk);
@@ -882,14 +902,14 @@ static int virtblk_restore(struct virtio_device *vdev)
 	VIRTIO_BLK_F_SCSI,
 #endif
 	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
-	VIRTIO_BLK_F_MQ,
+	VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD,
 }
 ;
 static unsigned int features[] = {
 	VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
 	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
 	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
-	VIRTIO_BLK_F_MQ,
+	VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD,
 };
 
 static struct virtio_driver virtio_blk = {
diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
index 9ebe4d9..d608649 100644
--- a/include/uapi/linux/virtio_blk.h
+++ b/include/uapi/linux/virtio_blk.h
@@ -38,6 +38,7 @@
 #define VIRTIO_BLK_F_BLK_SIZE	6	/* Block size of disk is available*/
 #define VIRTIO_BLK_F_TOPOLOGY	10	/* Topology information is available */
 #define VIRTIO_BLK_F_MQ		12	/* support more than one vq */
+#define VIRTIO_BLK_F_DISCARD	13	/* DISCARD command is supported */
 
 /* Legacy feature bits */
 #ifndef VIRTIO_BLK_NO_LEGACY
@@ -114,6 +115,9 @@ struct virtio_blk_config {
 /* Get device ID command */
 #define VIRTIO_BLK_T_GET_ID    8
 
+/* Discard command */
+#define VIRTIO_BLK_T_DISCARD	16
+
 #ifndef VIRTIO_BLK_NO_LEGACY
 /* Barrier before this op. */
 #define VIRTIO_BLK_T_BARRIER	0x80000000
@@ -127,8 +131,12 @@ struct virtio_blk_config {
 struct virtio_blk_outhdr {
 	/* VIRTIO_BLK_T* */
 	__virtio32 type;
-	/* io priority. */
-	__virtio32 ioprio;
+	union {
+		/* io priority. */
+		__virtio32 ioprio;
+		/* discard number of sectors */
+		__virtio32 discard_nr_sectors;
+	} u;
 	/* Sector (ie. 512 byte offset) */
 	__virtio64 sector;
 };
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-03-28  8:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-28  8:39 [PATCH] virtio-blk: add DISCARD support to virtio-blk driver Changpeng Liu
2017-03-27 11:34 ` [virtio-dev] " Paolo Bonzini
2017-03-28  1:52   ` Liu, Changpeng
2017-03-27 14:56 ` Christoph Hellwig
2017-03-28  1:43   ` Liu, Changpeng
2017-03-27 20:20 ` Stefan Hajnoczi
2017-03-28  2:15   ` Liu, Changpeng
2017-03-28  8:37     ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).