All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 0/3] xen-blkfront/xen-blkback trim support
@ 2011-08-18  9:34 Li Dongyang
  2011-08-18  9:34 ` [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags Li Dongyang
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Li Dongyang @ 2011-08-18  9:34 UTC (permalink / raw)
  To: xen-devel; +Cc: owen.smith, JBeulich

Hi, List
Here is V2 of the trim support for blkfront/blkback,
it's now rebased on Jeremy's next-3.1 tree and has some adjustments
according to the comments from Jan Beulich, Thanks

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags
  2011-08-18  9:34 [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Li Dongyang
@ 2011-08-18  9:34 ` Li Dongyang
  2011-08-20  0:38   ` Jeremy Fitzhardinge
  2011-08-18  9:34 ` [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request Li Dongyang
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Li Dongyang @ 2011-08-18  9:34 UTC (permalink / raw)
  To: xen-devel; +Cc: owen.smith, JBeulich

This adds the BLKIF_OP_TRIM for blkfront and blkback, also 2 flags telling
us the type of the backend, used in blkback to determine what to do when we
see a trim request.
Part of the patch is just taken from Owen Smith, Thanks

Signed-off-by: Owen Smith <owen.smith@citrix.com>
Signed-off-by: Li Dongyang <lidongyang@novell.com>
---
 include/xen/interface/io/blkif.h |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
index 3d5d6db..b92cf23 100644
--- a/include/xen/interface/io/blkif.h
+++ b/include/xen/interface/io/blkif.h
@@ -57,6 +57,19 @@ typedef uint64_t blkif_sector_t;
  * "feature-flush-cache" node!
  */
 #define BLKIF_OP_FLUSH_DISKCACHE   3
+
+/*
+ * Recognised only if "feature-trim" is present in backend xenbus info.
+ * The "feature-trim" node contains a boolean indicating whether barrier
+ * requests are likely to succeed or fail. Either way, a trim request
+ * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by
+ * the underlying block-device hardware. The boolean simply indicates whether
+ * or not it is worthwhile for the frontend to attempt trim requests.
+ * If a backend does not recognise BLKIF_OP_TRIM, it should *not*
+ * create the "feature-trim" node!
+ */
+#define BLKIF_OP_TRIM            5
+
 /*
  * Maximum scatter/gather segments per request.
  * This is carefully chosen so that sizeof(struct blkif_ring) <= PAGE_SIZE.
@@ -74,6 +87,11 @@ struct blkif_request_rw {
 	} seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 };
 
+struct blkif_request_trim {
+	blkif_sector_t sector_number;
+	uint64_t nr_sectors;
+};
+
 struct blkif_request {
 	uint8_t        operation;    /* BLKIF_OP_???                         */
 	uint8_t        nr_segments;  /* number of segments                   */
@@ -81,6 +99,7 @@ struct blkif_request {
 	uint64_t       id;           /* private guest value, echoed in resp  */
 	union {
 		struct blkif_request_rw rw;
+		struct blkif_request_trim trim;
 	} u;
 };
 
@@ -109,6 +128,8 @@ DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response);
 #define VDISK_CDROM        0x1
 #define VDISK_REMOVABLE    0x2
 #define VDISK_READONLY     0x4
+#define VDISK_FILE_BACKEND 0x8
+#define VDISK_PHY_BACKEND  0x10
 
 /* Xen-defined major numbers for virtual disks, they look strangely
  * familiar */
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request
  2011-08-18  9:34 [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Li Dongyang
  2011-08-18  9:34 ` [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags Li Dongyang
@ 2011-08-18  9:34 ` Li Dongyang
  2011-08-18 15:05   ` Konrad Rzeszutek Wilk
  2011-08-18  9:34 ` [PATCH V2 3/3] xen-blkback: handle trim request in backend driver Li Dongyang
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Li Dongyang @ 2011-08-18  9:34 UTC (permalink / raw)
  To: xen-devel; +Cc: owen.smith, JBeulich

The blkfront driver now will read feature-trim from xenstore,
and set up the request queue with trim params, then we can forward the
discard requests to backend driver.

Signed-off-by: Li Dongyang <lidongyang@novell.com>
---
 drivers/block/xen-blkfront.c |   96 ++++++++++++++++++++++++++++++++----------
 1 files changed, 73 insertions(+), 23 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b536a9c..c22fb07 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -98,6 +98,9 @@ struct blkfront_info
 	unsigned long shadow_free;
 	unsigned int feature_flush;
 	unsigned int flush_op;
+	unsigned int feature_trim;
+	unsigned int discard_granularity;
+	unsigned int discard_alignment;
 	int is_ready;
 };
 
@@ -302,29 +305,36 @@ static int blkif_queue_request(struct request *req)
 		ring_req->operation = info->flush_op;
 	}
 
-	ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg);
-	BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
+	if (unlikely(req->cmd_flags & REQ_DISCARD)) {
+		/* id, sector_number and handle are set above. */
+		ring_req->operation = BLKIF_OP_TRIM;
+		ring_req->nr_segments = 0;
+		ring_req->u.trim.nr_sectors = blk_rq_sectors(req);
+	} else {
+		ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg);
+		BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
 
-	for_each_sg(info->sg, sg, ring_req->nr_segments, i) {
-		buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg)));
-		fsect = sg->offset >> 9;
-		lsect = fsect + (sg->length >> 9) - 1;
-		/* install a grant reference. */
-		ref = gnttab_claim_grant_reference(&gref_head);
-		BUG_ON(ref == -ENOSPC);
+		for_each_sg(info->sg, sg, ring_req->nr_segments, i) {
+			buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg)));
+			fsect = sg->offset >> 9;
+			lsect = fsect + (sg->length >> 9) - 1;
+			/* install a grant reference. */
+			ref = gnttab_claim_grant_reference(&gref_head);
+			BUG_ON(ref == -ENOSPC);
 
-		gnttab_grant_foreign_access_ref(
-				ref,
-				info->xbdev->otherend_id,
-				buffer_mfn,
-				rq_data_dir(req) );
-
-		info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn);
-		ring_req->u.rw.seg[i] =
-				(struct blkif_request_segment) {
-					.gref       = ref,
-					.first_sect = fsect,
-					.last_sect  = lsect };
+			gnttab_grant_foreign_access_ref(
+					ref,
+					info->xbdev->otherend_id,
+					buffer_mfn,
+					rq_data_dir(req));
+
+			info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn);
+			ring_req->u.rw.seg[i] =
+					(struct blkif_request_segment) {
+						.gref       = ref,
+						.first_sect = fsect,
+						.last_sect  = lsect };
+		}
 	}
 
 	info->ring.req_prod_pvt++;
@@ -399,6 +409,7 @@ wait:
 static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size)
 {
 	struct request_queue *rq;
+	struct blkfront_info *info = gd->private_data;
 
 	rq = blk_init_queue(do_blkif_request, &blkif_io_lock);
 	if (rq == NULL)
@@ -406,6 +417,13 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size)
 
 	queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
 
+	if (info->feature_trim) {
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq);
+		blk_queue_max_discard_sectors(rq, get_capacity(gd));
+		rq->limits.discard_granularity = info->discard_granularity;
+		rq->limits.discard_alignment = info->discard_alignment;
+	}
+
 	/* Hard sector size and max sectors impersonate the equiv. hardware. */
 	blk_queue_logical_block_size(rq, sector_size);
 	blk_queue_max_hw_sectors(rq, 512);
@@ -722,6 +740,19 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 
 		error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
 		switch (bret->operation) {
+		case BLKIF_OP_TRIM:
+			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
+				struct request_queue *rq = info->rq;
+				printk(KERN_WARNING "blkfront: %s: trim op failed\n",
+					   info->gd->disk_name);
+				error = -EOPNOTSUPP;
+				info->feature_trim = 0;
+				spin_lock_irq(rq->queue_lock);
+				queue_flag_clear(QUEUE_FLAG_DISCARD, rq);
+				spin_unlock_irq(rq->queue_lock);
+			}
+			__blk_end_request_all(req, error);
+			break;
 		case BLKIF_OP_FLUSH_DISKCACHE:
 		case BLKIF_OP_WRITE_BARRIER:
 			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
@@ -1108,7 +1139,9 @@ static void blkfront_connect(struct blkfront_info *info)
 	unsigned long sector_size;
 	unsigned int binfo;
 	int err;
-	int barrier, flush;
+	unsigned int discard_granularity;
+	unsigned int discard_alignment;
+	int barrier, flush, trim;
 
 	switch (info->connected) {
 	case BLKIF_STATE_CONNECTED:
@@ -1178,7 +1211,24 @@ static void blkfront_connect(struct blkfront_info *info)
 		info->feature_flush = REQ_FLUSH;
 		info->flush_op = BLKIF_OP_FLUSH_DISKCACHE;
 	}
-		
+
+	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+			    "feature-trim", "%d", &trim,
+			    NULL);
+
+	if (!err && trim) {
+		info->feature_trim = 1;
+		err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+			"discard_granularity", "%u", &discard_granularity,
+			"discard_alignment", "%u", &discard_alignment,
+			NULL);
+		if (!err) {
+			info->discard_granularity = discard_granularity;
+			info->discard_alignment = discard_alignment;
+		}
+	} else
+		info->feature_trim = 0;
+
 	err = xlvbd_alloc_gendisk(sectors, info, binfo, sector_size);
 	if (err) {
 		xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s",
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V2 3/3] xen-blkback: handle trim request in backend driver
  2011-08-18  9:34 [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Li Dongyang
  2011-08-18  9:34 ` [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags Li Dongyang
  2011-08-18  9:34 ` [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request Li Dongyang
@ 2011-08-18  9:34 ` Li Dongyang
  2011-08-18 14:56   ` Konrad Rzeszutek Wilk
  2011-08-18 14:56 ` [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Li Dongyang @ 2011-08-18  9:34 UTC (permalink / raw)
  To: xen-devel; +Cc: owen.smith, JBeulich

Now blkback driver can handle the trim request from guest, we will
forward the request to phy device if it really has trim support, or we'll
punch a hole on the image file.

Signed-off-by: Li Dongyang <lidongyang@novell.com>
---
 drivers/block/xen-blkback/blkback.c |   85 +++++++++++++++++++++++++++++------
 drivers/block/xen-blkback/common.h  |    4 +-
 drivers/block/xen-blkback/xenbus.c  |   61 +++++++++++++++++++++++++
 3 files changed, 135 insertions(+), 15 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 2330a9a..5acc37a 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -39,6 +39,9 @@
 #include <linux/list.h>
 #include <linux/delay.h>
 #include <linux/freezer.h>
+#include <linux/loop.h>
+#include <linux/falloc.h>
+#include <linux/fs.h>
 
 #include <xen/events.h>
 #include <xen/page.h>
@@ -258,13 +261,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id)
 
 static void print_stats(struct xen_blkif *blkif)
 {
-	pr_info("xen-blkback (%s): oo %3d  |  rd %4d  |  wr %4d  |  f %4d\n",
+	pr_info("xen-blkback (%s): oo %3d  |  rd %4d  |  wr %4d  |  f %4d"
+		 "  |  tr %4d\n",
 		 current->comm, blkif->st_oo_req,
-		 blkif->st_rd_req, blkif->st_wr_req, blkif->st_f_req);
+		 blkif->st_rd_req, blkif->st_wr_req,
+		 blkif->st_f_req, blkif->st_tr_req);
 	blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000);
 	blkif->st_rd_req = 0;
 	blkif->st_wr_req = 0;
 	blkif->st_oo_req = 0;
+	blkif->st_tr_req = 0;
 }
 
 int xen_blkif_schedule(void *arg)
@@ -563,6 +569,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		blkif->st_f_req++;
 		operation = WRITE_FLUSH;
 		break;
+	case BLKIF_OP_TRIM:
+		blkif->st_tr_req++;
+		operation = REQ_DISCARD;
+		break;
 	case BLKIF_OP_WRITE_BARRIER:
 	default:
 		operation = 0; /* make gcc happy */
@@ -572,7 +582,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
 	/* Check that the number of segments is sane. */
 	nseg = req->nr_segments;
-	if (unlikely(nseg == 0 && operation != WRITE_FLUSH) ||
+	if (unlikely(nseg == 0 && operation != (WRITE_FLUSH | REQ_DISCARD)) ||
 	    unlikely(nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
 		pr_debug(DRV_PFX "Bad number of segments in request (%d)\n",
 			 nseg);
@@ -627,10 +637,13 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	 * the hypercall to unmap the grants - that is all done in
 	 * xen_blkbk_unmap.
 	 */
-	if (xen_blkbk_map(req, pending_req, seg))
+	if (operation != BLKIF_OP_TRIM && xen_blkbk_map(req, pending_req, seg))
 		goto fail_flush;
 
-	/* This corresponding xen_blkif_put is done in __end_block_io_op */
+	/*
+	 * This corresponding xen_blkif_put is done in __end_block_io_op, or
+	 * below if we are handling a BLKIF_OP_TRIM.
+	 */
 	xen_blkif_get(blkif);
 
 	for (i = 0; i < nseg; i++) {
@@ -654,18 +667,62 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		preq.sector_number += seg[i].nsec;
 	}
 
-	/* This will be hit if the operation was a flush. */
+	/* This will be hit if the operation was a flush or trim. */
 	if (!bio) {
-		BUG_ON(operation != WRITE_FLUSH);
+		BUG_ON(operation != (WRITE_FLUSH | REQ_DISCARD));
 
-		bio = bio_alloc(GFP_KERNEL, 0);
-		if (unlikely(bio == NULL))
-			goto fail_put_bio;
+		if (operation == WRITE_FLUSH) {
+			bio = bio_alloc(GFP_KERNEL, 0);
+			if (unlikely(bio == NULL))
+				goto fail_put_bio;
 
-		biolist[nbio++] = bio;
-		bio->bi_bdev    = preq.bdev;
-		bio->bi_private = pending_req;
-		bio->bi_end_io  = end_block_io_op;
+			biolist[nbio++] = bio;
+			bio->bi_bdev    = preq.bdev;
+			bio->bi_private = pending_req;
+			bio->bi_end_io  = end_block_io_op;
+		} else if (operation == REQ_DISCARD) {
+			int err = 0;
+			int status = BLKIF_RSP_OKAY;
+			struct block_device *bdev = blkif->vbd.bdev;
+
+			preq.nr_sects = req->u.trim.nr_sectors;
+			if (blkif->vbd.type & VDISK_PHY_BACKEND)
+				/* just forward the trim request */
+				err = blkdev_issue_discard(bdev,
+						preq.sector_number,
+						preq.nr_sects,
+						GFP_KERNEL, 0);
+			else if (blkif->vbd.type & VDISK_FILE_BACKEND) {
+				/* punch a hole in the backing file */
+				struct loop_device *lo =
+					bdev->bd_disk->private_data;
+				struct file *file = lo->lo_backing_file;
+
+				if (file->f_op->fallocate)
+					err = file->f_op->fallocate(file,
+						FALLOC_FL_KEEP_SIZE |
+						FALLOC_FL_PUNCH_HOLE,
+						preq.sector_number << 9,
+						preq.nr_sects << 9);
+				else
+					err = -EOPNOTSUPP;
+			} else
+				status = BLKIF_RSP_EOPNOTSUPP;
+
+			if (err == -EOPNOTSUPP) {
+				DPRINTK("blkback: discard op failed, "
+						"not supported\n");
+				status = BLKIF_RSP_EOPNOTSUPP;
+			} else if (err)
+				status = BLKIF_RSP_ERROR;
+
+			if (status == BLKIF_RSP_OKAY)
+				blkif->st_tr_sect += preq.nr_sects;
+			make_response(blkif, req->id, req->operation, status);
+			xen_blkif_put(blkif);
+			free_req(pending_req);
+			return 0;
+		}
 	}
 
 	/*
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 9e40b28..1fef727 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -159,8 +159,10 @@ struct xen_blkif {
 	int			st_wr_req;
 	int			st_oo_req;
 	int			st_f_req;
+	int			st_tr_req;
 	int			st_rd_sect;
 	int			st_wr_sect;
+	int			st_tr_sect;
 
 	wait_queue_head_t	waiting_to_free;
 
@@ -182,7 +184,7 @@ struct xen_blkif {
 
 struct phys_req {
 	unsigned short		dev;
-	unsigned short		nr_sects;
+	blkif_sector_t		nr_sects;
 	struct block_device	*bdev;
 	blkif_sector_t		sector_number;
 };
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 3f129b4..05ea8e0 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -272,16 +272,20 @@ VBD_SHOW(oo_req,  "%d\n", be->blkif->st_oo_req);
 VBD_SHOW(rd_req,  "%d\n", be->blkif->st_rd_req);
 VBD_SHOW(wr_req,  "%d\n", be->blkif->st_wr_req);
 VBD_SHOW(f_req,  "%d\n", be->blkif->st_f_req);
+VBD_SHOW(tr_req, "%d\n", be->blkif->st_tr_req);
 VBD_SHOW(rd_sect, "%d\n", be->blkif->st_rd_sect);
 VBD_SHOW(wr_sect, "%d\n", be->blkif->st_wr_sect);
+VBD_SHOW(tr_sect, "%d\n", be->blkif->st_tr_sect);
 
 static struct attribute *xen_vbdstat_attrs[] = {
 	&dev_attr_oo_req.attr,
 	&dev_attr_rd_req.attr,
 	&dev_attr_wr_req.attr,
 	&dev_attr_f_req.attr,
+	&dev_attr_tr_req.attr,
 	&dev_attr_rd_sect.attr,
 	&dev_attr_wr_sect.attr,
+	&dev_attr_tr_sect.attr,
 	NULL
 };
 
@@ -419,6 +423,59 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 	return err;
 }
 
+int xen_blkbk_trim(struct xenbus_transaction xbt, struct backend_info *be)
+{
+	struct xenbus_device *dev = be->dev;
+	struct xen_vbd *vbd = &be->blkif->vbd;
+	char *type;
+	int err;
+	int state = 0;
+
+	type = xenbus_read(XBT_NIL, dev->nodename, "type", NULL);
+	if (!IS_ERR(type)) {
+		if (strcmp(type, "file") == 0)
+			state = 1;
+			vbd->type |= VDISK_FILE_BACKEND;
+		if (strcmp(type, "phy") == 0) {
+			struct block_device *bdev = be->blkif->vbd.bdev;
+			struct request_queue *q = bdev_get_queue(bdev);
+			if (blk_queue_discard(q)) {
+				err = xenbus_printf(xbt, dev->nodename,
+					"discard_granularity", "%u",
+					q->limits.discard_granularity);
+				if (err) {
+					xenbus_dev_fatal(dev, err,
+						"writing discard_granularity");
+					goto kfree;
+				}
+				err = xenbus_printf(xbt, dev->nodename,
+					"discard_alignment", "%u",
+					q->limits.discard_alignment);
+				if (err) {
+					xenbus_dev_fatal(dev, err,
+						"writing discard_alignment");
+					goto kfree;
+				}
+				state = 1;
+				vbd->type |= VDISK_PHY_BACKEND;
+			}
+		}
+	} else {
+		err = PTR_ERR(type);
+		xenbus_dev_fatal(dev, err, "reading type");
+		goto out;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename, "feature-trim",
+			    "%d", state);
+	if (err)
+		xenbus_dev_fatal(dev, err, "writing feature-trim");
+kfree:
+	kfree(type);
+out:
+	return err;
+}
+
 /*
  * Entry point to this code when a new device is created.  Allocate the basic
  * structures, and watch the store waiting for the hotplug scripts to tell us
@@ -650,6 +707,10 @@ again:
 	if (err)
 		goto abort;
 
+	err = xen_blkbk_trim(xbt, be);
+	if (err)
+		goto abort;
+
 	err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
 			    (unsigned long long)vbd_sz(&be->blkif->vbd));
 	if (err) {
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH V2 3/3] xen-blkback: handle trim request in backend driver
  2011-08-18  9:34 ` [PATCH V2 3/3] xen-blkback: handle trim request in backend driver Li Dongyang
@ 2011-08-18 14:56   ` Konrad Rzeszutek Wilk
  2011-08-22  9:43     ` Li Dongyang
  0 siblings, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-08-18 14:56 UTC (permalink / raw)
  To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich

On Thu, Aug 18, 2011 at 05:34:31PM +0800, Li Dongyang wrote:
> Now blkback driver can handle the trim request from guest, we will
> forward the request to phy device if it really has trim support, or we'll
> punch a hole on the image file.
> 
> Signed-off-by: Li Dongyang <lidongyang@novell.com>
> ---
>  drivers/block/xen-blkback/blkback.c |   85 +++++++++++++++++++++++++++++------
>  drivers/block/xen-blkback/common.h  |    4 +-
>  drivers/block/xen-blkback/xenbus.c  |   61 +++++++++++++++++++++++++
>  3 files changed, 135 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 2330a9a..5acc37a 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -39,6 +39,9 @@
>  #include <linux/list.h>
>  #include <linux/delay.h>
>  #include <linux/freezer.h>
> +#include <linux/loop.h>
> +#include <linux/falloc.h>
> +#include <linux/fs.h>
>  
>  #include <xen/events.h>
>  #include <xen/page.h>
> @@ -258,13 +261,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id)
>  
>  static void print_stats(struct xen_blkif *blkif)
>  {
> -	pr_info("xen-blkback (%s): oo %3d  |  rd %4d  |  wr %4d  |  f %4d\n",
> +	pr_info("xen-blkback (%s): oo %3d  |  rd %4d  |  wr %4d  |  f %4d"
> +		 "  |  tr %4d\n",
>  		 current->comm, blkif->st_oo_req,
> -		 blkif->st_rd_req, blkif->st_wr_req, blkif->st_f_req);
> +		 blkif->st_rd_req, blkif->st_wr_req,
> +		 blkif->st_f_req, blkif->st_tr_req);
>  	blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000);
>  	blkif->st_rd_req = 0;
>  	blkif->st_wr_req = 0;
>  	blkif->st_oo_req = 0;
> +	blkif->st_tr_req = 0;
>  }
>  
>  int xen_blkif_schedule(void *arg)
> @@ -563,6 +569,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  		blkif->st_f_req++;
>  		operation = WRITE_FLUSH;
>  		break;
> +	case BLKIF_OP_TRIM:
> +		blkif->st_tr_req++;
> +		operation = REQ_DISCARD;
> +		break;
>  	case BLKIF_OP_WRITE_BARRIER:
>  	default:
>  		operation = 0; /* make gcc happy */
> @@ -572,7 +582,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  
>  	/* Check that the number of segments is sane. */
>  	nseg = req->nr_segments;
> -	if (unlikely(nseg == 0 && operation != WRITE_FLUSH) ||
> +	if (unlikely(nseg == 0 && operation != (WRITE_FLUSH | REQ_DISCARD)) ||
>  	    unlikely(nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
>  		pr_debug(DRV_PFX "Bad number of segments in request (%d)\n",
>  			 nseg);
> @@ -627,10 +637,13 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  	 * the hypercall to unmap the grants - that is all done in
>  	 * xen_blkbk_unmap.
>  	 */
> -	if (xen_blkbk_map(req, pending_req, seg))
> +	if (operation != BLKIF_OP_TRIM && xen_blkbk_map(req, pending_req, seg))
>  		goto fail_flush;
>  
> -	/* This corresponding xen_blkif_put is done in __end_block_io_op */
> +	/*
> +	 * This corresponding xen_blkif_put is done in __end_block_io_op, or
> +	 * below if we are handling a BLKIF_OP_TRIM.
> +	 */
>  	xen_blkif_get(blkif);
>  
>  	for (i = 0; i < nseg; i++) {
> @@ -654,18 +667,62 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  		preq.sector_number += seg[i].nsec;
>  	}
>  
> -	/* This will be hit if the operation was a flush. */
> +	/* This will be hit if the operation was a flush or trim. */
>  	if (!bio) {
> -		BUG_ON(operation != WRITE_FLUSH);
> +		BUG_ON(operation != (WRITE_FLUSH | REQ_DISCARD));
>  
> -		bio = bio_alloc(GFP_KERNEL, 0);
> -		if (unlikely(bio == NULL))
> -			goto fail_put_bio;
> +		if (operation == WRITE_FLUSH) {
> +			bio = bio_alloc(GFP_KERNEL, 0);
> +			if (unlikely(bio == NULL))
> +				goto fail_put_bio;
>  
> -		biolist[nbio++] = bio;
> -		bio->bi_bdev    = preq.bdev;
> -		bio->bi_private = pending_req;
> -		bio->bi_end_io  = end_block_io_op;
> +			biolist[nbio++] = bio;
> +			bio->bi_bdev    = preq.bdev;
> +			bio->bi_private = pending_req;
> +			bio->bi_end_io  = end_block_io_op;
> +		} else if (operation == REQ_DISCARD) {
> +			int err = 0;
> +			int status = BLKIF_RSP_OKAY;
> +			struct block_device *bdev = blkif->vbd.bdev;
> +
> +			preq.nr_sects = req->u.trim.nr_sectors;
> +			if (blkif->vbd.type & VDISK_PHY_BACKEND)
> +				/* just forward the trim request */
> +				err = blkdev_issue_discard(bdev,
> +						preq.sector_number,
> +						preq.nr_sects,
> +						GFP_KERNEL, 0);
> +			else if (blkif->vbd.type & VDISK_FILE_BACKEND) {
> +				/* punch a hole in the backing file */
> +				struct loop_device *lo =
> +					bdev->bd_disk->private_data;
> +				struct file *file = lo->lo_backing_file;
> +
> +				if (file->f_op->fallocate)
> +					err = file->f_op->fallocate(file,
> +						FALLOC_FL_KEEP_SIZE |
> +						FALLOC_FL_PUNCH_HOLE,
> +						preq.sector_number << 9,
> +						preq.nr_sects << 9);
> +				else
> +					err = -EOPNOTSUPP;
> +			} else
> +				status = BLKIF_RSP_EOPNOTSUPP;
> +
> +			if (err == -EOPNOTSUPP) {
> +				DPRINTK("blkback: discard op failed, "
> +						"not supported\n");

Use pr_debug like the rest of the file does.

> +				status = BLKIF_RSP_EOPNOTSUPP;
> +			} else if (err)
> +				status = BLKIF_RSP_ERROR;
> +
> +			if (status == BLKIF_RSP_OKAY)
> +				blkif->st_tr_sect += preq.nr_sects;
> +			make_response(blkif, req->id, req->operation, status);
> +			xen_blkif_put(blkif);
> +			free_req(pending_req);
> +			return 0;
> +		}

All of this should really be moved to its own function.

>  	}
>  
>  	/*
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index 9e40b28..1fef727 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -159,8 +159,10 @@ struct xen_blkif {
>  	int			st_wr_req;
>  	int			st_oo_req;
>  	int			st_f_req;
> +	int			st_tr_req;
>  	int			st_rd_sect;
>  	int			st_wr_sect;
> +	int			st_tr_sect;
>  
>  	wait_queue_head_t	waiting_to_free;
>  
> @@ -182,7 +184,7 @@ struct xen_blkif {
>  
>  struct phys_req {
>  	unsigned short		dev;
> -	unsigned short		nr_sects;
> +	blkif_sector_t		nr_sects;
>  	struct block_device	*bdev;
>  	blkif_sector_t		sector_number;
>  };
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index 3f129b4..05ea8e0 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -272,16 +272,20 @@ VBD_SHOW(oo_req,  "%d\n", be->blkif->st_oo_req);
>  VBD_SHOW(rd_req,  "%d\n", be->blkif->st_rd_req);
>  VBD_SHOW(wr_req,  "%d\n", be->blkif->st_wr_req);
>  VBD_SHOW(f_req,  "%d\n", be->blkif->st_f_req);
> +VBD_SHOW(tr_req, "%d\n", be->blkif->st_tr_req);
>  VBD_SHOW(rd_sect, "%d\n", be->blkif->st_rd_sect);
>  VBD_SHOW(wr_sect, "%d\n", be->blkif->st_wr_sect);
> +VBD_SHOW(tr_sect, "%d\n", be->blkif->st_tr_sect);
>  
>  static struct attribute *xen_vbdstat_attrs[] = {
>  	&dev_attr_oo_req.attr,
>  	&dev_attr_rd_req.attr,
>  	&dev_attr_wr_req.attr,
>  	&dev_attr_f_req.attr,
> +	&dev_attr_tr_req.attr,
>  	&dev_attr_rd_sect.attr,
>  	&dev_attr_wr_sect.attr,
> +	&dev_attr_tr_sect.attr,
>  	NULL
>  };
>  
> @@ -419,6 +423,59 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
>  	return err;
>  }
>  
> +int xen_blkbk_trim(struct xenbus_transaction xbt, struct backend_info *be)
> +{
> +	struct xenbus_device *dev = be->dev;
> +	struct xen_vbd *vbd = &be->blkif->vbd;
> +	char *type;
> +	int err;
> +	int state = 0;
> +
> +	type = xenbus_read(XBT_NIL, dev->nodename, "type", NULL);
> +	if (!IS_ERR(type)) {
> +		if (strcmp(type, "file") == 0)
> +			state = 1;
> +			vbd->type |= VDISK_FILE_BACKEND;
> +		if (strcmp(type, "phy") == 0) {

Use 'strncmp' please.

> +			struct block_device *bdev = be->blkif->vbd.bdev;
> +			struct request_queue *q = bdev_get_queue(bdev);
> +			if (blk_queue_discard(q)) {
> +				err = xenbus_printf(xbt, dev->nodename,
> +					"discard_granularity", "%u",

Hm, most of the items written to the Xenbus use '-', not '_'.
Any particular reason for using '_'?

> +					q->limits.discard_granularity);
> +				if (err) {
> +					xenbus_dev_fatal(dev, err,
> +						"writing discard_granularity");
> +					goto kfree;
> +				}
> +				err = xenbus_printf(xbt, dev->nodename,
> +					"discard_alignment", "%u",

Ditto here.
> +					q->limits.discard_alignment);
> +				if (err) {
> +					xenbus_dev_fatal(dev, err,
> +						"writing discard_alignment");
> +					goto kfree;
> +				}
> +				state = 1;
> +				vbd->type |= VDISK_PHY_BACKEND;
> +			}
> +		}
> +	} else {
> +		err = PTR_ERR(type);
> +		xenbus_dev_fatal(dev, err, "reading type");
> +		goto out;
> +	}
> +
> +	err = xenbus_printf(xbt, dev->nodename, "feature-trim",
> +			    "%d", state);
> +	if (err)
> +		xenbus_dev_fatal(dev, err, "writing feature-trim");
> +kfree:
> +	kfree(type);
> +out:
> +	return err;
> +}
> +
>  /*
>   * Entry point to this code when a new device is created.  Allocate the basic
>   * structures, and watch the store waiting for the hotplug scripts to tell us
> @@ -650,6 +707,10 @@ again:
>  	if (err)
>  		goto abort;
>  
> +	err = xen_blkbk_trim(xbt, be);
> +	if (err)
> +		goto abort;
> +
>  	err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
>  			    (unsigned long long)vbd_sz(&be->blkif->vbd));
>  	if (err) {
> -- 
> 1.7.6
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2 0/3] xen-blkfront/xen-blkback trim support
  2011-08-18  9:34 [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Li Dongyang
                   ` (2 preceding siblings ...)
  2011-08-18  9:34 ` [PATCH V2 3/3] xen-blkback: handle trim request in backend driver Li Dongyang
@ 2011-08-18 14:56 ` Konrad Rzeszutek Wilk
  2011-08-18 15:06 ` Konrad Rzeszutek Wilk
  2011-08-20  0:41 ` Jeremy Fitzhardinge
  5 siblings, 0 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-08-18 14:56 UTC (permalink / raw)
  To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich

On Thu, Aug 18, 2011 at 05:34:28PM +0800, Li Dongyang wrote:
> Hi, List
> Here is V2 of the trim support for blkfront/blkback,
> it's now rebased on Jeremy's next-3.1 tree and has some adjustments
> according to the comments from Jan Beulich, Thanks

Great! Thanks for posting them!
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request
  2011-08-18  9:34 ` [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request Li Dongyang
@ 2011-08-18 15:05   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-08-18 15:05 UTC (permalink / raw)
  To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich

On Thu, Aug 18, 2011 at 05:34:30PM +0800, Li Dongyang wrote:
> The blkfront driver now will read feature-trim from xenstore,
> and set up the request queue with trim params, then we can forward the
> discard requests to backend driver.
> 
> Signed-off-by: Li Dongyang <lidongyang@novell.com>
> ---
>  drivers/block/xen-blkfront.c |   96 ++++++++++++++++++++++++++++++++----------
>  1 files changed, 73 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index b536a9c..c22fb07 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -98,6 +98,9 @@ struct blkfront_info
>  	unsigned long shadow_free;
>  	unsigned int feature_flush;
>  	unsigned int flush_op;
> +	unsigned int feature_trim;
> +	unsigned int discard_granularity;
> +	unsigned int discard_alignment;
>  	int is_ready;
>  };
>  
> @@ -302,29 +305,36 @@ static int blkif_queue_request(struct request *req)
>  		ring_req->operation = info->flush_op;
>  	}
>  
> -	ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg);
> -	BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
> +	if (unlikely(req->cmd_flags & REQ_DISCARD)) {
> +		/* id, sector_number and handle are set above. */
> +		ring_req->operation = BLKIF_OP_TRIM;
> +		ring_req->nr_segments = 0;
> +		ring_req->u.trim.nr_sectors = blk_rq_sectors(req);
> +	} else {
> +		ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg);
> +		BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
>  
> -	for_each_sg(info->sg, sg, ring_req->nr_segments, i) {
> -		buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg)));
> -		fsect = sg->offset >> 9;
> -		lsect = fsect + (sg->length >> 9) - 1;
> -		/* install a grant reference. */
> -		ref = gnttab_claim_grant_reference(&gref_head);
> -		BUG_ON(ref == -ENOSPC);
> +		for_each_sg(info->sg, sg, ring_req->nr_segments, i) {
> +			buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg)));
> +			fsect = sg->offset >> 9;
> +			lsect = fsect + (sg->length >> 9) - 1;
> +			/* install a grant reference. */
> +			ref = gnttab_claim_grant_reference(&gref_head);
> +			BUG_ON(ref == -ENOSPC);
>  
> -		gnttab_grant_foreign_access_ref(
> -				ref,
> -				info->xbdev->otherend_id,
> -				buffer_mfn,
> -				rq_data_dir(req) );
> -
> -		info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn);
> -		ring_req->u.rw.seg[i] =
> -				(struct blkif_request_segment) {
> -					.gref       = ref,
> -					.first_sect = fsect,
> -					.last_sect  = lsect };
> +			gnttab_grant_foreign_access_ref(
> +					ref,
> +					info->xbdev->otherend_id,
> +					buffer_mfn,
> +					rq_data_dir(req));
> +
> +			info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn);
> +			ring_req->u.rw.seg[i] =
> +					(struct blkif_request_segment) {
> +						.gref       = ref,
> +						.first_sect = fsect,
> +						.last_sect  = lsect };
> +		}
>  	}
>  
>  	info->ring.req_prod_pvt++;
> @@ -399,6 +409,7 @@ wait:
>  static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size)
>  {
>  	struct request_queue *rq;
> +	struct blkfront_info *info = gd->private_data;
>  
>  	rq = blk_init_queue(do_blkif_request, &blkif_io_lock);
>  	if (rq == NULL)
> @@ -406,6 +417,13 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size)
>  
>  	queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
>  
> +	if (info->feature_trim) {
> +		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq);
> +		blk_queue_max_discard_sectors(rq, get_capacity(gd));
> +		rq->limits.discard_granularity = info->discard_granularity;
> +		rq->limits.discard_alignment = info->discard_alignment;
> +	}
> +
>  	/* Hard sector size and max sectors impersonate the equiv. hardware. */
>  	blk_queue_logical_block_size(rq, sector_size);
>  	blk_queue_max_hw_sectors(rq, 512);
> @@ -722,6 +740,19 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  
>  		error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
>  		switch (bret->operation) {
> +		case BLKIF_OP_TRIM:
> +			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
> +				struct request_queue *rq = info->rq;
> +				printk(KERN_WARNING "blkfront: %s: trim op failed\n",
> +					   info->gd->disk_name);
> +				error = -EOPNOTSUPP;
> +				info->feature_trim = 0;
> +				spin_lock_irq(rq->queue_lock);
> +				queue_flag_clear(QUEUE_FLAG_DISCARD, rq);
> +				spin_unlock_irq(rq->queue_lock);
> +			}
> +			__blk_end_request_all(req, error);
> +			break;
>  		case BLKIF_OP_FLUSH_DISKCACHE:
>  		case BLKIF_OP_WRITE_BARRIER:
>  			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
> @@ -1108,7 +1139,9 @@ static void blkfront_connect(struct blkfront_info *info)
>  	unsigned long sector_size;
>  	unsigned int binfo;
>  	int err;
> -	int barrier, flush;
> +	unsigned int discard_granularity;
> +	unsigned int discard_alignment;
> +	int barrier, flush, trim;
>  
>  	switch (info->connected) {
>  	case BLKIF_STATE_CONNECTED:
> @@ -1178,7 +1211,24 @@ static void blkfront_connect(struct blkfront_info *info)
>  		info->feature_flush = REQ_FLUSH;
>  		info->flush_op = BLKIF_OP_FLUSH_DISKCACHE;
>  	}
> -		
> +
> +	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
> +			    "feature-trim", "%d", &trim,
> +			    NULL);
> +
> +	if (!err && trim) {
> +		info->feature_trim = 1;
> +		err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
> +			"discard_granularity", "%u", &discard_granularity,
> +			"discard_alignment", "%u", &discard_alignment,

I am just not sure why the '_' was choosen. It seems so odds
with the rest of what is put in XenBus.

> +			NULL);
> +		if (!err) {
> +			info->discard_granularity = discard_granularity;
> +			info->discard_alignment = discard_alignment;
> +		}
> +	} else
> +		info->feature_trim = 0;
> +
>  	err = xlvbd_alloc_gendisk(sectors, info, binfo, sector_size);
>  	if (err) {
>  		xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s",
> -- 
> 1.7.6
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2 0/3] xen-blkfront/xen-blkback trim support
  2011-08-18  9:34 [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Li Dongyang
                   ` (3 preceding siblings ...)
  2011-08-18 14:56 ` [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Konrad Rzeszutek Wilk
@ 2011-08-18 15:06 ` Konrad Rzeszutek Wilk
  2011-08-20  0:41 ` Jeremy Fitzhardinge
  5 siblings, 0 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-08-18 15:06 UTC (permalink / raw)
  To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich

On Thu, Aug 18, 2011 at 05:34:28PM +0800, Li Dongyang wrote:
> Hi, List
> Here is V2 of the trim support for blkfront/blkback,
> it's now rebased on Jeremy's next-3.1 tree and has some adjustments

In the future, please base it on top of a Linus's kernel - say
3.1-rc2.

> according to the comments from Jan Beulich, Thanks
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags
  2011-08-18  9:34 ` [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags Li Dongyang
@ 2011-08-20  0:38   ` Jeremy Fitzhardinge
  2011-08-22  9:36     ` Li Dongyang
  0 siblings, 1 reply; 14+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-20  0:38 UTC (permalink / raw)
  To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich

On 08/18/2011 02:34 AM, Li Dongyang wrote:
> This adds the BLKIF_OP_TRIM for blkfront and blkback, also 2 flags telling
> us the type of the backend, used in blkback to determine what to do when we
> see a trim request.
> Part of the patch is just taken from Owen Smith, Thanks
>
> Signed-off-by: Owen Smith <owen.smith@citrix.com>
> Signed-off-by: Li Dongyang <lidongyang@novell.com>
> ---
>  include/xen/interface/io/blkif.h |   21 +++++++++++++++++++++
>  1 files changed, 21 insertions(+), 0 deletions(-)
>
> diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
> index 3d5d6db..b92cf23 100644
> --- a/include/xen/interface/io/blkif.h
> +++ b/include/xen/interface/io/blkif.h
> @@ -57,6 +57,19 @@ typedef uint64_t blkif_sector_t;
>   * "feature-flush-cache" node!
>   */
>  #define BLKIF_OP_FLUSH_DISKCACHE   3
> +
> +/*
> + * Recognised only if "feature-trim" is present in backend xenbus info.
> + * The "feature-trim" node contains a boolean indicating whether barrier
> + * requests are likely to succeed or fail. Either way, a trim request

Barrier requests?

> + * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by
> + * the underlying block-device hardware. The boolean simply indicates whether
> + * or not it is worthwhile for the frontend to attempt trim requests.
> + * If a backend does not recognise BLKIF_OP_TRIM, it should *not*
> + * create the "feature-trim" node!

Is all this necessary?  What happens if guests just send OP_TRIM
requests, and if the host doesn't understand them then it will fails
them with EOPNOTSUPP?  Is a TRIM request ever anything more than a hint
to the backend that certain blocks are no longer needed?

> + */
> +#define BLKIF_OP_TRIM            5
> +
>  /*
>   * Maximum scatter/gather segments per request.
>   * This is carefully chosen so that sizeof(struct blkif_ring) <= PAGE_SIZE.
> @@ -74,6 +87,11 @@ struct blkif_request_rw {
>  	} seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>  };
>  
> +struct blkif_request_trim {
> +	blkif_sector_t sector_number;
> +	uint64_t nr_sectors;
> +};
> +
>  struct blkif_request {
>  	uint8_t        operation;    /* BLKIF_OP_???                         */
>  	uint8_t        nr_segments;  /* number of segments                   */
> @@ -81,6 +99,7 @@ struct blkif_request {
>  	uint64_t       id;           /* private guest value, echoed in resp  */
>  	union {
>  		struct blkif_request_rw rw;
> +		struct blkif_request_trim trim;
>  	} u;
>  };
>  
> @@ -109,6 +128,8 @@ DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response);
>  #define VDISK_CDROM        0x1
>  #define VDISK_REMOVABLE    0x2
>  #define VDISK_READONLY     0x4
> +#define VDISK_FILE_BACKEND 0x8
> +#define VDISK_PHY_BACKEND  0x10

What are these for?  Why does a frontend care about these?

    J

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2 0/3] xen-blkfront/xen-blkback trim support
  2011-08-18  9:34 [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Li Dongyang
                   ` (4 preceding siblings ...)
  2011-08-18 15:06 ` Konrad Rzeszutek Wilk
@ 2011-08-20  0:41 ` Jeremy Fitzhardinge
  5 siblings, 0 replies; 14+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-20  0:41 UTC (permalink / raw)
  To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich

On 08/18/2011 02:34 AM, Li Dongyang wrote:
> Hi, List
> Here is V2 of the trim support for blkfront/blkback,
> it's now rebased on Jeremy's next-3.1 tree and has some adjustments

Please don't use that branch; its really just for my personal use, and I
don't intend it for general use.  New patches should be based on Linus's
linux.git.

    J

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags
  2011-08-20  0:38   ` Jeremy Fitzhardinge
@ 2011-08-22  9:36     ` Li Dongyang
  2011-08-22 17:44       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 14+ messages in thread
From: Li Dongyang @ 2011-08-22  9:36 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, owen.smith, JBeulich

On Sat, Aug 20, 2011 at 8:38 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 08/18/2011 02:34 AM, Li Dongyang wrote:
>> This adds the BLKIF_OP_TRIM for blkfront and blkback, also 2 flags telling
>> us the type of the backend, used in blkback to determine what to do when we
>> see a trim request.
>> Part of the patch is just taken from Owen Smith, Thanks
>>
>> Signed-off-by: Owen Smith <owen.smith@citrix.com>
>> Signed-off-by: Li Dongyang <lidongyang@novell.com>
>> ---
>>  include/xen/interface/io/blkif.h |   21 +++++++++++++++++++++
>>  1 files changed, 21 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
>> index 3d5d6db..b92cf23 100644
>> --- a/include/xen/interface/io/blkif.h
>> +++ b/include/xen/interface/io/blkif.h
>> @@ -57,6 +57,19 @@ typedef uint64_t blkif_sector_t;
>>   * "feature-flush-cache" node!
>>   */
>>  #define BLKIF_OP_FLUSH_DISKCACHE   3
>> +
>> +/*
>> + * Recognised only if "feature-trim" is present in backend xenbus info.
>> + * The "feature-trim" node contains a boolean indicating whether barrier
>> + * requests are likely to succeed or fail. Either way, a trim request
>
> Barrier requests?
hm, I wonder the same, seems it's a copy & paste mistake,
the BLKIF_OP_TRIM part is taken from Owen's patch back in Jan 2011:
http://lists.xensource.com/archives/html/xen-devel/2011-01/msg01059.html
>
>> + * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by
>> + * the underlying block-device hardware. The boolean simply indicates whether
>> + * or not it is worthwhile for the frontend to attempt trim requests.
>> + * If a backend does not recognise BLKIF_OP_TRIM, it should *not*
>> + * create the "feature-trim" node!
>
> Is all this necessary?  What happens if guests just send OP_TRIM
> requests, and if the host doesn't understand them then it will fails
> them with EOPNOTSUPP?  Is a TRIM request ever anything more than a hint
> to the backend that certain blocks are no longer needed?
that won't happen: we only mark the queue in the guest has TRIM if
blkback tells blkfront
via xenstore. if we don't init the queue with TRIM in guest, if guest
send OP_TRIM,
it gonna fail with ENONOTSUPP in the guest's block layer, see
blkdev_issue_discard.
and yes, trim is just a hint, the basic idea is forward the hint to
phy dev if it has trim, or
punch a hole to reduce disk usage if the backend is a file.
and this comment is taken from Owen, I think he could give sth here.
>
>> + */
>> +#define BLKIF_OP_TRIM            5
>> +
>>  /*
>>   * Maximum scatter/gather segments per request.
>>   * This is carefully chosen so that sizeof(struct blkif_ring) <= PAGE_SIZE.
>> @@ -74,6 +87,11 @@ struct blkif_request_rw {
>>       } seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>>  };
>>
>> +struct blkif_request_trim {
>> +     blkif_sector_t sector_number;
>> +     uint64_t nr_sectors;
>> +};
>> +
>>  struct blkif_request {
>>       uint8_t        operation;    /* BLKIF_OP_???                         */
>>       uint8_t        nr_segments;  /* number of segments                   */
>> @@ -81,6 +99,7 @@ struct blkif_request {
>>       uint64_t       id;           /* private guest value, echoed in resp  */
>>       union {
>>               struct blkif_request_rw rw;
>> +             struct blkif_request_trim trim;
>>       } u;
>>  };
>>
>> @@ -109,6 +128,8 @@ DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response);
>>  #define VDISK_CDROM        0x1
>>  #define VDISK_REMOVABLE    0x2
>>  #define VDISK_READONLY     0x4
>> +#define VDISK_FILE_BACKEND 0x8
>> +#define VDISK_PHY_BACKEND  0x10
>
> What are these for?  Why does a frontend care about these?
they are used for the backend driver to decide what to do when we get
a OP_TRIM, if it's phy then forward the trim,
or punch a hole in the file, Jan also mentioned this is not good,
gonna find another place for the flags, thanks
>
>    J
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2 3/3] xen-blkback: handle trim request in backend driver
  2011-08-18 14:56   ` Konrad Rzeszutek Wilk
@ 2011-08-22  9:43     ` Li Dongyang
  2011-08-22 13:27       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 14+ messages in thread
From: Li Dongyang @ 2011-08-22  9:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, owen.smith, JBeulich

On Thu, Aug 18, 2011 at 10:56 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Thu, Aug 18, 2011 at 05:34:31PM +0800, Li Dongyang wrote:
>> Now blkback driver can handle the trim request from guest, we will
>> forward the request to phy device if it really has trim support, or we'll
>> punch a hole on the image file.
>>
>> Signed-off-by: Li Dongyang <lidongyang@novell.com>
>> ---
>>  drivers/block/xen-blkback/blkback.c |   85 +++++++++++++++++++++++++++++------
>>  drivers/block/xen-blkback/common.h  |    4 +-
>>  drivers/block/xen-blkback/xenbus.c  |   61 +++++++++++++++++++++++++
>>  3 files changed, 135 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
>> index 2330a9a..5acc37a 100644
>> --- a/drivers/block/xen-blkback/blkback.c
>> +++ b/drivers/block/xen-blkback/blkback.c
>> @@ -39,6 +39,9 @@
>>  #include <linux/list.h>
>>  #include <linux/delay.h>
>>  #include <linux/freezer.h>
>> +#include <linux/loop.h>
>> +#include <linux/falloc.h>
>> +#include <linux/fs.h>
>>
>>  #include <xen/events.h>
>>  #include <xen/page.h>
>> @@ -258,13 +261,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id)
>>
>>  static void print_stats(struct xen_blkif *blkif)
>>  {
>> -     pr_info("xen-blkback (%s): oo %3d  |  rd %4d  |  wr %4d  |  f %4d\n",
>> +     pr_info("xen-blkback (%s): oo %3d  |  rd %4d  |  wr %4d  |  f %4d"
>> +              "  |  tr %4d\n",
>>                current->comm, blkif->st_oo_req,
>> -              blkif->st_rd_req, blkif->st_wr_req, blkif->st_f_req);
>> +              blkif->st_rd_req, blkif->st_wr_req,
>> +              blkif->st_f_req, blkif->st_tr_req);
>>       blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000);
>>       blkif->st_rd_req = 0;
>>       blkif->st_wr_req = 0;
>>       blkif->st_oo_req = 0;
>> +     blkif->st_tr_req = 0;
>>  }
>>
>>  int xen_blkif_schedule(void *arg)
>> @@ -563,6 +569,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>>               blkif->st_f_req++;
>>               operation = WRITE_FLUSH;
>>               break;
>> +     case BLKIF_OP_TRIM:
>> +             blkif->st_tr_req++;
>> +             operation = REQ_DISCARD;
>> +             break;
>>       case BLKIF_OP_WRITE_BARRIER:
>>       default:
>>               operation = 0; /* make gcc happy */
>> @@ -572,7 +582,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>>
>>       /* Check that the number of segments is sane. */
>>       nseg = req->nr_segments;
>> -     if (unlikely(nseg == 0 && operation != WRITE_FLUSH) ||
>> +     if (unlikely(nseg == 0 && operation != (WRITE_FLUSH | REQ_DISCARD)) ||
>>           unlikely(nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
>>               pr_debug(DRV_PFX "Bad number of segments in request (%d)\n",
>>                        nseg);
>> @@ -627,10 +637,13 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>>        * the hypercall to unmap the grants - that is all done in
>>        * xen_blkbk_unmap.
>>        */
>> -     if (xen_blkbk_map(req, pending_req, seg))
>> +     if (operation != BLKIF_OP_TRIM && xen_blkbk_map(req, pending_req, seg))
>>               goto fail_flush;
>>
>> -     /* This corresponding xen_blkif_put is done in __end_block_io_op */
>> +     /*
>> +      * This corresponding xen_blkif_put is done in __end_block_io_op, or
>> +      * below if we are handling a BLKIF_OP_TRIM.
>> +      */
>>       xen_blkif_get(blkif);
>>
>>       for (i = 0; i < nseg; i++) {
>> @@ -654,18 +667,62 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>>               preq.sector_number += seg[i].nsec;
>>       }
>>
>> -     /* This will be hit if the operation was a flush. */
>> +     /* This will be hit if the operation was a flush or trim. */
>>       if (!bio) {
>> -             BUG_ON(operation != WRITE_FLUSH);
>> +             BUG_ON(operation != (WRITE_FLUSH | REQ_DISCARD));
>>
>> -             bio = bio_alloc(GFP_KERNEL, 0);
>> -             if (unlikely(bio == NULL))
>> -                     goto fail_put_bio;
>> +             if (operation == WRITE_FLUSH) {
>> +                     bio = bio_alloc(GFP_KERNEL, 0);
>> +                     if (unlikely(bio == NULL))
>> +                             goto fail_put_bio;
>>
>> -             biolist[nbio++] = bio;
>> -             bio->bi_bdev    = preq.bdev;
>> -             bio->bi_private = pending_req;
>> -             bio->bi_end_io  = end_block_io_op;
>> +                     biolist[nbio++] = bio;
>> +                     bio->bi_bdev    = preq.bdev;
>> +                     bio->bi_private = pending_req;
>> +                     bio->bi_end_io  = end_block_io_op;
>> +             } else if (operation == REQ_DISCARD) {
>> +                     int err = 0;
>> +                     int status = BLKIF_RSP_OKAY;
>> +                     struct block_device *bdev = blkif->vbd.bdev;
>> +
>> +                     preq.nr_sects = req->u.trim.nr_sectors;
>> +                     if (blkif->vbd.type & VDISK_PHY_BACKEND)
>> +                             /* just forward the trim request */
>> +                             err = blkdev_issue_discard(bdev,
>> +                                             preq.sector_number,
>> +                                             preq.nr_sects,
>> +                                             GFP_KERNEL, 0);
>> +                     else if (blkif->vbd.type & VDISK_FILE_BACKEND) {
>> +                             /* punch a hole in the backing file */
>> +                             struct loop_device *lo =
>> +                                     bdev->bd_disk->private_data;
>> +                             struct file *file = lo->lo_backing_file;
>> +
>> +                             if (file->f_op->fallocate)
>> +                                     err = file->f_op->fallocate(file,
>> +                                             FALLOC_FL_KEEP_SIZE |
>> +                                             FALLOC_FL_PUNCH_HOLE,
>> +                                             preq.sector_number << 9,
>> +                                             preq.nr_sects << 9);
>> +                             else
>> +                                     err = -EOPNOTSUPP;
>> +                     } else
>> +                             status = BLKIF_RSP_EOPNOTSUPP;
>> +
>> +                     if (err == -EOPNOTSUPP) {
>> +                             DPRINTK("blkback: discard op failed, "
>> +                                             "not supported\n");
>
> Use pr_debug like the rest of the file does.
gonna fix
>
>> +                             status = BLKIF_RSP_EOPNOTSUPP;
>> +                     } else if (err)
>> +                             status = BLKIF_RSP_ERROR;
>> +
>> +                     if (status == BLKIF_RSP_OKAY)
>> +                             blkif->st_tr_sect += preq.nr_sects;
>> +                     make_response(blkif, req->id, req->operation, status);
>> +                     xen_blkif_put(blkif);
>> +                     free_req(pending_req);
>> +                     return 0;
>> +             }
>
> All of this should really be moved to its own function.
not quite clear about this, do you mean we should make sth like
dispatch_trim_block_io only for OP_TRIM?
I added the trim handling stuff to dispatch_rw_block_io because it
also handles flush stuff.
>
>>       }
>>
>>       /*
>> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
>> index 9e40b28..1fef727 100644
>> --- a/drivers/block/xen-blkback/common.h
>> +++ b/drivers/block/xen-blkback/common.h
>> @@ -159,8 +159,10 @@ struct xen_blkif {
>>       int                     st_wr_req;
>>       int                     st_oo_req;
>>       int                     st_f_req;
>> +     int                     st_tr_req;
>>       int                     st_rd_sect;
>>       int                     st_wr_sect;
>> +     int                     st_tr_sect;
>>
>>       wait_queue_head_t       waiting_to_free;
>>
>> @@ -182,7 +184,7 @@ struct xen_blkif {
>>
>>  struct phys_req {
>>       unsigned short          dev;
>> -     unsigned short          nr_sects;
>> +     blkif_sector_t          nr_sects;
>>       struct block_device     *bdev;
>>       blkif_sector_t          sector_number;
>>  };
>> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
>> index 3f129b4..05ea8e0 100644
>> --- a/drivers/block/xen-blkback/xenbus.c
>> +++ b/drivers/block/xen-blkback/xenbus.c
>> @@ -272,16 +272,20 @@ VBD_SHOW(oo_req,  "%d\n", be->blkif->st_oo_req);
>>  VBD_SHOW(rd_req,  "%d\n", be->blkif->st_rd_req);
>>  VBD_SHOW(wr_req,  "%d\n", be->blkif->st_wr_req);
>>  VBD_SHOW(f_req,  "%d\n", be->blkif->st_f_req);
>> +VBD_SHOW(tr_req, "%d\n", be->blkif->st_tr_req);
>>  VBD_SHOW(rd_sect, "%d\n", be->blkif->st_rd_sect);
>>  VBD_SHOW(wr_sect, "%d\n", be->blkif->st_wr_sect);
>> +VBD_SHOW(tr_sect, "%d\n", be->blkif->st_tr_sect);
>>
>>  static struct attribute *xen_vbdstat_attrs[] = {
>>       &dev_attr_oo_req.attr,
>>       &dev_attr_rd_req.attr,
>>       &dev_attr_wr_req.attr,
>>       &dev_attr_f_req.attr,
>> +     &dev_attr_tr_req.attr,
>>       &dev_attr_rd_sect.attr,
>>       &dev_attr_wr_sect.attr,
>> +     &dev_attr_tr_sect.attr,
>>       NULL
>>  };
>>
>> @@ -419,6 +423,59 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
>>       return err;
>>  }
>>
>> +int xen_blkbk_trim(struct xenbus_transaction xbt, struct backend_info *be)
>> +{
>> +     struct xenbus_device *dev = be->dev;
>> +     struct xen_vbd *vbd = &be->blkif->vbd;
>> +     char *type;
>> +     int err;
>> +     int state = 0;
>> +
>> +     type = xenbus_read(XBT_NIL, dev->nodename, "type", NULL);
>> +     if (!IS_ERR(type)) {
>> +             if (strcmp(type, "file") == 0)
>> +                     state = 1;
>> +                     vbd->type |= VDISK_FILE_BACKEND;
>> +             if (strcmp(type, "phy") == 0) {
>
> Use 'strncmp' please.
gonna fix
>
>> +                     struct block_device *bdev = be->blkif->vbd.bdev;
>> +                     struct request_queue *q = bdev_get_queue(bdev);
>> +                     if (blk_queue_discard(q)) {
>> +                             err = xenbus_printf(xbt, dev->nodename,
>> +                                     "discard_granularity", "%u",
>
> Hm, most of the items written to the Xenbus use '-', not '_'.
> Any particular reason for using '_'?
they are taken from struct queue_limits so I used the underscore, no
problem to change them to '-'
>
>> +                                     q->limits.discard_granularity);
>> +                             if (err) {
>> +                                     xenbus_dev_fatal(dev, err,
>> +                                             "writing discard_granularity");
>> +                                     goto kfree;
>> +                             }
>> +                             err = xenbus_printf(xbt, dev->nodename,
>> +                                     "discard_alignment", "%u",
>
> Ditto here.
>> +                                     q->limits.discard_alignment);
>> +                             if (err) {
>> +                                     xenbus_dev_fatal(dev, err,
>> +                                             "writing discard_alignment");
>> +                                     goto kfree;
>> +                             }
>> +                             state = 1;
>> +                             vbd->type |= VDISK_PHY_BACKEND;
>> +                     }
>> +             }
>> +     } else {
>> +             err = PTR_ERR(type);
>> +             xenbus_dev_fatal(dev, err, "reading type");
>> +             goto out;
>> +     }
>> +
>> +     err = xenbus_printf(xbt, dev->nodename, "feature-trim",
>> +                         "%d", state);
>> +     if (err)
>> +             xenbus_dev_fatal(dev, err, "writing feature-trim");
>> +kfree:
>> +     kfree(type);
>> +out:
>> +     return err;
>> +}
>> +
>>  /*
>>   * Entry point to this code when a new device is created.  Allocate the basic
>>   * structures, and watch the store waiting for the hotplug scripts to tell us
>> @@ -650,6 +707,10 @@ again:
>>       if (err)
>>               goto abort;
>>
>> +     err = xen_blkbk_trim(xbt, be);
>> +     if (err)
>> +             goto abort;
>> +
>>       err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
>>                           (unsigned long long)vbd_sz(&be->blkif->vbd));
>>       if (err) {
>> --
>> 1.7.6
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2 3/3] xen-blkback: handle trim request in backend driver
  2011-08-22  9:43     ` Li Dongyang
@ 2011-08-22 13:27       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-08-22 13:27 UTC (permalink / raw)
  To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich

On Mon, Aug 22, 2011 at 05:43:28PM +0800, Li Dongyang wrote:
> On Thu, Aug 18, 2011 at 10:56 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Thu, Aug 18, 2011 at 05:34:31PM +0800, Li Dongyang wrote:
> >> Now blkback driver can handle the trim request from guest, we will
> >> forward the request to phy device if it really has trim support, or we'll
> >> punch a hole on the image file.
> >>
> >> Signed-off-by: Li Dongyang <lidongyang@novell.com>
> >> ---
> >>  drivers/block/xen-blkback/blkback.c |   85 +++++++++++++++++++++++++++++------
> >>  drivers/block/xen-blkback/common.h  |    4 +-
> >>  drivers/block/xen-blkback/xenbus.c  |   61 +++++++++++++++++++++++++
> >>  3 files changed, 135 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> >> index 2330a9a..5acc37a 100644
> >> --- a/drivers/block/xen-blkback/blkback.c
> >> +++ b/drivers/block/xen-blkback/blkback.c
> >> @@ -39,6 +39,9 @@
> >>  #include <linux/list.h>
> >>  #include <linux/delay.h>
> >>  #include <linux/freezer.h>
> >> +#include <linux/loop.h>
> >> +#include <linux/falloc.h>
> >> +#include <linux/fs.h>
> >>
> >>  #include <xen/events.h>
> >>  #include <xen/page.h>
> >> @@ -258,13 +261,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id)
> >>
> >>  static void print_stats(struct xen_blkif *blkif)
> >>  {
> >> -     pr_info("xen-blkback (%s): oo %3d  |  rd %4d  |  wr %4d  |  f %4d\n",
> >> +     pr_info("xen-blkback (%s): oo %3d  |  rd %4d  |  wr %4d  |  f %4d"
> >> +              "  |  tr %4d\n",
> >>                current->comm, blkif->st_oo_req,
> >> -              blkif->st_rd_req, blkif->st_wr_req, blkif->st_f_req);
> >> +              blkif->st_rd_req, blkif->st_wr_req,
> >> +              blkif->st_f_req, blkif->st_tr_req);
> >>       blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000);
> >>       blkif->st_rd_req = 0;
> >>       blkif->st_wr_req = 0;
> >>       blkif->st_oo_req = 0;
> >> +     blkif->st_tr_req = 0;
> >>  }
> >>
> >>  int xen_blkif_schedule(void *arg)
> >> @@ -563,6 +569,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >>               blkif->st_f_req++;
> >>               operation = WRITE_FLUSH;
> >>               break;
> >> +     case BLKIF_OP_TRIM:
> >> +             blkif->st_tr_req++;
> >> +             operation = REQ_DISCARD;
> >> +             break;
> >>       case BLKIF_OP_WRITE_BARRIER:
> >>       default:
> >>               operation = 0; /* make gcc happy */
> >> @@ -572,7 +582,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >>
> >>       /* Check that the number of segments is sane. */
> >>       nseg = req->nr_segments;
> >> -     if (unlikely(nseg == 0 && operation != WRITE_FLUSH) ||
> >> +     if (unlikely(nseg == 0 && operation != (WRITE_FLUSH | REQ_DISCARD)) ||
> >>           unlikely(nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
> >>               pr_debug(DRV_PFX "Bad number of segments in request (%d)\n",
> >>                        nseg);
> >> @@ -627,10 +637,13 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >>        * the hypercall to unmap the grants - that is all done in
> >>        * xen_blkbk_unmap.
> >>        */
> >> -     if (xen_blkbk_map(req, pending_req, seg))
> >> +     if (operation != BLKIF_OP_TRIM && xen_blkbk_map(req, pending_req, seg))
> >>               goto fail_flush;
> >>
> >> -     /* This corresponding xen_blkif_put is done in __end_block_io_op */
> >> +     /*
> >> +      * This corresponding xen_blkif_put is done in __end_block_io_op, or
> >> +      * below if we are handling a BLKIF_OP_TRIM.
> >> +      */
> >>       xen_blkif_get(blkif);
> >>
> >>       for (i = 0; i < nseg; i++) {
> >> @@ -654,18 +667,62 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >>               preq.sector_number += seg[i].nsec;
> >>       }
> >>
> >> -     /* This will be hit if the operation was a flush. */
> >> +     /* This will be hit if the operation was a flush or trim. */
> >>       if (!bio) {
> >> -             BUG_ON(operation != WRITE_FLUSH);
> >> +             BUG_ON(operation != (WRITE_FLUSH | REQ_DISCARD));
> >>
> >> -             bio = bio_alloc(GFP_KERNEL, 0);
> >> -             if (unlikely(bio == NULL))
> >> -                     goto fail_put_bio;
> >> +             if (operation == WRITE_FLUSH) {
> >> +                     bio = bio_alloc(GFP_KERNEL, 0);
> >> +                     if (unlikely(bio == NULL))
> >> +                             goto fail_put_bio;
> >>
> >> -             biolist[nbio++] = bio;
> >> -             bio->bi_bdev    = preq.bdev;
> >> -             bio->bi_private = pending_req;
> >> -             bio->bi_end_io  = end_block_io_op;
> >> +                     biolist[nbio++] = bio;
> >> +                     bio->bi_bdev    = preq.bdev;
> >> +                     bio->bi_private = pending_req;
> >> +                     bio->bi_end_io  = end_block_io_op;
> >> +             } else if (operation == REQ_DISCARD) {
> >> +                     int err = 0;
> >> +                     int status = BLKIF_RSP_OKAY;
> >> +                     struct block_device *bdev = blkif->vbd.bdev;
> >> +
> >> +                     preq.nr_sects = req->u.trim.nr_sectors;
> >> +                     if (blkif->vbd.type & VDISK_PHY_BACKEND)
> >> +                             /* just forward the trim request */
> >> +                             err = blkdev_issue_discard(bdev,
> >> +                                             preq.sector_number,
> >> +                                             preq.nr_sects,
> >> +                                             GFP_KERNEL, 0);
> >> +                     else if (blkif->vbd.type & VDISK_FILE_BACKEND) {
> >> +                             /* punch a hole in the backing file */
> >> +                             struct loop_device *lo =
> >> +                                     bdev->bd_disk->private_data;
> >> +                             struct file *file = lo->lo_backing_file;
> >> +
> >> +                             if (file->f_op->fallocate)
> >> +                                     err = file->f_op->fallocate(file,
> >> +                                             FALLOC_FL_KEEP_SIZE |
> >> +                                             FALLOC_FL_PUNCH_HOLE,
> >> +                                             preq.sector_number << 9,
> >> +                                             preq.nr_sects << 9);
> >> +                             else
> >> +                                     err = -EOPNOTSUPP;
> >> +                     } else
> >> +                             status = BLKIF_RSP_EOPNOTSUPP;
> >> +
> >> +                     if (err == -EOPNOTSUPP) {
> >> +                             DPRINTK("blkback: discard op failed, "
> >> +                                             "not supported\n");
> >
> > Use pr_debug like the rest of the file does.
> gonna fix
> >
> >> +                             status = BLKIF_RSP_EOPNOTSUPP;
> >> +                     } else if (err)
> >> +                             status = BLKIF_RSP_ERROR;
> >> +
> >> +                     if (status == BLKIF_RSP_OKAY)
> >> +                             blkif->st_tr_sect += preq.nr_sects;
> >> +                     make_response(blkif, req->id, req->operation, status);
> >> +                     xen_blkif_put(blkif);
> >> +                     free_req(pending_req);
> >> +                     return 0;
> >> +             }
> >
> > All of this should really be moved to its own function.
> not quite clear about this, do you mean we should make sth like
> dispatch_trim_block_io only for OP_TRIM?
> I added the trim handling stuff to dispatch_rw_block_io because it
> also handles flush stuff.

That function is getting quite busy - it has a lot of code. My thought
was that you could seperate it out. Like

    if (!bio) {
        if (operation == WRITE_FLUSH) {
            bio = bio_alloc(..)
            .. snip (the same as your patch has it.
        } else if (operation == REQ_DISCARD) {
            xen_blk_trim(blkif, preq);
            return 0;
        }

And do all of the operation in xen_blk_trim. Including the make_response ..

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags
  2011-08-22  9:36     ` Li Dongyang
@ 2011-08-22 17:44       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 14+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-22 17:44 UTC (permalink / raw)
  To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich

On 08/22/2011 02:36 AM, Li Dongyang wrote:
> On Sat, Aug 20, 2011 at 8:38 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> On 08/18/2011 02:34 AM, Li Dongyang wrote:
>>> This adds the BLKIF_OP_TRIM for blkfront and blkback, also 2 flags telling
>>> us the type of the backend, used in blkback to determine what to do when we
>>> see a trim request.
>>> Part of the patch is just taken from Owen Smith, Thanks
>>>
>>> Signed-off-by: Owen Smith <owen.smith@citrix.com>
>>> Signed-off-by: Li Dongyang <lidongyang@novell.com>
>>> ---
>>>  include/xen/interface/io/blkif.h |   21 +++++++++++++++++++++
>>>  1 files changed, 21 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
>>> index 3d5d6db..b92cf23 100644
>>> --- a/include/xen/interface/io/blkif.h
>>> +++ b/include/xen/interface/io/blkif.h
>>> @@ -57,6 +57,19 @@ typedef uint64_t blkif_sector_t;
>>>   * "feature-flush-cache" node!
>>>   */
>>>  #define BLKIF_OP_FLUSH_DISKCACHE   3
>>> +
>>> +/*
>>> + * Recognised only if "feature-trim" is present in backend xenbus info.
>>> + * The "feature-trim" node contains a boolean indicating whether barrier
>>> + * requests are likely to succeed or fail. Either way, a trim request
>> Barrier requests?
> hm, I wonder the same, seems it's a copy & paste mistake,
> the BLKIF_OP_TRIM part is taken from Owen's patch back in Jan 2011:
> http://lists.xensource.com/archives/html/xen-devel/2011-01/msg01059.html
>>> + * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by
>>> + * the underlying block-device hardware. The boolean simply indicates whether
>>> + * or not it is worthwhile for the frontend to attempt trim requests.
>>> + * If a backend does not recognise BLKIF_OP_TRIM, it should *not*
>>> + * create the "feature-trim" node!
>> Is all this necessary?  What happens if guests just send OP_TRIM
>> requests, and if the host doesn't understand them then it will fails
>> them with EOPNOTSUPP?  Is a TRIM request ever anything more than a hint
>> to the backend that certain blocks are no longer needed?
> that won't happen: we only mark the queue in the guest has TRIM if
> blkback tells blkfront
> via xenstore. if we don't init the queue with TRIM in guest, if guest
> send OP_TRIM,
> it gonna fail with ENONOTSUPP in the guest's block layer, see
> blkdev_issue_discard.
> and yes, trim is just a hint, the basic idea is forward the hint to
> phy dev if it has trim, or
> punch a hole to reduce disk usage if the backend is a file.
> and this comment is taken from Owen, I think he could give sth here.

Right.  So if this is just a hint, and there's no correctness
implications for the frontend if the backend doesn't support trim, then
the frontend should just start trying to use trim and then suppress them
if they come back as failed.

I guess if the frontend needs other information from the backend (about
trim chunk size?) then this is moot, but this kind of feature
negotiation via xenbus seems pretty flaky and it would be nice to avoid
repeating the mistake.

    J

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-08-22 17:44 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-18  9:34 [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Li Dongyang
2011-08-18  9:34 ` [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags Li Dongyang
2011-08-20  0:38   ` Jeremy Fitzhardinge
2011-08-22  9:36     ` Li Dongyang
2011-08-22 17:44       ` Jeremy Fitzhardinge
2011-08-18  9:34 ` [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request Li Dongyang
2011-08-18 15:05   ` Konrad Rzeszutek Wilk
2011-08-18  9:34 ` [PATCH V2 3/3] xen-blkback: handle trim request in backend driver Li Dongyang
2011-08-18 14:56   ` Konrad Rzeszutek Wilk
2011-08-22  9:43     ` Li Dongyang
2011-08-22 13:27       ` Konrad Rzeszutek Wilk
2011-08-18 14:56 ` [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Konrad Rzeszutek Wilk
2011-08-18 15:06 ` Konrad Rzeszutek Wilk
2011-08-20  0:41 ` Jeremy Fitzhardinge

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.