* [PATCH v3 0/9] xen-block: support multi hardware-queues/rings
@ 2015-09-05 12:39 Bob Liu
  2015-09-05 12:39 ` [PATCH v3 1/9] xen-blkfront: convert to blk-mq APIs Bob Liu
                   ` (14 more replies)
  0 siblings, 15 replies; 83+ messages in thread
From: Bob Liu @ 2015-09-05 12:39 UTC (permalink / raw)
  To: xen-devel
  Cc: david.vrabel, linux-kernel, roger.pau, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies, Bob Liu

Note: These patches are based on Arianna Avanzini's original work during her
internship with GNOME's Outreach Program for Women.

The first patch, which just converts the xen-blkfront driver to the blk-mq API,
has already been applied by David.

With the blk-mq API, a guest has multiple (nr_vcpus) software request queues
associated with each block frontend. These queues can be mapped over several
rings (hardware queues) to the backend, making it easy to run multiple threads
on the backend for a single virtual disk.

With different threads issuing requests at the same time, guest performance
can be improved significantly.

Testing was done with the null_blk driver:
dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
domu: v4.2-rc8 16vcpus 10GB

[test]
rw=read or randread
direct=1
ioengine=libaio
bs=4k
time_based
runtime=30
filename=/dev/xvdb
numjobs=16
iodepth=64
iodepth_batch=64
iodepth_batch_complete=64
group_reporting

Seqread:
        dom0    domU(no_mq)    domU(4 queues)    domU(8 queues)    domU(16 queues)
iops:  1308k       690k         1380k(+200%)         1238k              1471k

Randread:
        dom0    domU(no_mq)    domU(4 queues)    domU(8 queues)    domU(16 queues)
iops:  1310k       279k          810k(+200%)          871k              1000k

With just 4 queues, domU iops improve considerably and nearly catch up with
dom0. Similarly large improvements were seen for writes and for real SSD storage.

---
v3: Rebased to v4.2-rc8

Bob Liu (9):
  xen-blkfront: convert to blk-mq APIs
  xen-block: add document for multi hardware queues/rings
  xen/blkfront: separate per ring information out of device info
  xen/blkfront: pseudo support for multi hardware queues/rings
  xen/blkfront: convert per device io_lock to per ring ring_lock
  xen/blkfront: negotiate the number of hw queues/rings with backend
  xen/blkback: separate ring information out of struct xen_blkif
  xen/blkback: pseudo support for multi hardware queues/rings
  xen/blkback: get number of hardware queues/rings from blkfront

 drivers/block/xen-blkback/blkback.c |  373 +++++-----
 drivers/block/xen-blkback/common.h  |   53 +-
 drivers/block/xen-blkback/xenbus.c  |  376 ++++++----
 drivers/block/xen-blkfront.c        | 1343 ++++++++++++++++++++---------------
 include/xen/interface/io/blkif.h    |   32 +
 5 files changed, 1278 insertions(+), 899 deletions(-)

-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH v3 1/9] xen-blkfront: convert to blk-mq APIs
  2015-09-05 12:39 [PATCH v3 0/9] xen-block: support multi hardware-queues/rings Bob Liu
  2015-09-05 12:39 ` [PATCH v3 1/9] xen-blkfront: convert to blk-mq APIs Bob Liu
@ 2015-09-05 12:39 ` Bob Liu
  2015-09-23 20:31   ` Konrad Rzeszutek Wilk
  2015-09-23 20:31   ` Konrad Rzeszutek Wilk
  2015-09-05 12:39 ` [PATCH v3 2/9] xen-block: add document for multi hardware queues/rings Bob Liu
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 83+ messages in thread
From: Bob Liu @ 2015-09-05 12:39 UTC (permalink / raw)
  To: xen-devel
  Cc: david.vrabel, linux-kernel, roger.pau, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies, Bob Liu

Note: This patch is based on Arianna Avanzini's original work during her
internship with GNOME's Outreach Program for Women.

Only one hardware queue is used for now, so there is no significant
performance change.

The legacy non-mq code is deleted completely, matching other drivers such as
virtio, mtip, and nvme.

Also drop an unnecessary acquisition of info->io_lock around
blk_mq_stop_hw_queues().

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jens Axboe <axboe@fb.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/block/xen-blkfront.c |  146 +++++++++++++++++-------------------------
 1 file changed, 60 insertions(+), 86 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 7a8a73f..5dd591d 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -37,6 +37,7 @@
 
 #include <linux/interrupt.h>
 #include <linux/blkdev.h>
+#include <linux/blk-mq.h>
 #include <linux/hdreg.h>
 #include <linux/cdrom.h>
 #include <linux/module.h>
@@ -148,6 +149,7 @@ struct blkfront_info
 	unsigned int feature_persistent:1;
 	unsigned int max_indirect_segments;
 	int is_ready;
+	struct blk_mq_tag_set tag_set;
 };
 
 static unsigned int nr_minors;
@@ -617,54 +619,41 @@ static inline bool blkif_request_flush_invalid(struct request *req,
 		 !(info->feature_flush & REQ_FUA)));
 }
 
-/*
- * do_blkif_request
- *  read a block; request is in a request queue
- */
-static void do_blkif_request(struct request_queue *rq)
+static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
+			   const struct blk_mq_queue_data *qd)
 {
-	struct blkfront_info *info = NULL;
-	struct request *req;
-	int queued;
-
-	pr_debug("Entered do_blkif_request\n");
-
-	queued = 0;
+	struct blkfront_info *info = qd->rq->rq_disk->private_data;
 
-	while ((req = blk_peek_request(rq)) != NULL) {
-		info = req->rq_disk->private_data;
-
-		if (RING_FULL(&info->ring))
-			goto wait;
+	blk_mq_start_request(qd->rq);
+	spin_lock_irq(&info->io_lock);
+	if (RING_FULL(&info->ring))
+		goto out_busy;
 
-		blk_start_request(req);
+	if (blkif_request_flush_invalid(qd->rq, info))
+		goto out_err;
 
-		if (blkif_request_flush_invalid(req, info)) {
-			__blk_end_request_all(req, -EOPNOTSUPP);
-			continue;
-		}
+	if (blkif_queue_request(qd->rq))
+		goto out_busy;
 
-		pr_debug("do_blk_req %p: cmd %p, sec %lx, "
-			 "(%u/%u) [%s]\n",
-			 req, req->cmd, (unsigned long)blk_rq_pos(req),
-			 blk_rq_cur_sectors(req), blk_rq_sectors(req),
-			 rq_data_dir(req) ? "write" : "read");
-
-		if (blkif_queue_request(req)) {
-			blk_requeue_request(rq, req);
-wait:
-			/* Avoid pointless unplugs. */
-			blk_stop_queue(rq);
-			break;
-		}
+	flush_requests(info);
+	spin_unlock_irq(&info->io_lock);
+	return BLK_MQ_RQ_QUEUE_OK;
 
-		queued++;
-	}
+out_err:
+	spin_unlock_irq(&info->io_lock);
+	return BLK_MQ_RQ_QUEUE_ERROR;
 
-	if (queued != 0)
-		flush_requests(info);
+out_busy:
+	spin_unlock_irq(&info->io_lock);
+	blk_mq_stop_hw_queue(hctx);
+	return BLK_MQ_RQ_QUEUE_BUSY;
 }
 
+static struct blk_mq_ops blkfront_mq_ops = {
+	.queue_rq = blkif_queue_rq,
+	.map_queue = blk_mq_map_queue,
+};
+
 static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 				unsigned int physical_sector_size,
 				unsigned int segments)
@@ -672,9 +661,22 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 	struct request_queue *rq;
 	struct blkfront_info *info = gd->private_data;
 
-	rq = blk_init_queue(do_blkif_request, &info->io_lock);
-	if (rq == NULL)
+	memset(&info->tag_set, 0, sizeof(info->tag_set));
+	info->tag_set.ops = &blkfront_mq_ops;
+	info->tag_set.nr_hw_queues = 1;
+	info->tag_set.queue_depth =  BLK_RING_SIZE(info);
+	info->tag_set.numa_node = NUMA_NO_NODE;
+	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+	info->tag_set.cmd_size = 0;
+	info->tag_set.driver_data = info;
+
+	if (blk_mq_alloc_tag_set(&info->tag_set))
 		return -1;
+	rq = blk_mq_init_queue(&info->tag_set);
+	if (IS_ERR(rq)) {
+		blk_mq_free_tag_set(&info->tag_set);
+		return -1;
+	}
 
 	queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
 
@@ -902,19 +904,15 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 static void xlvbd_release_gendisk(struct blkfront_info *info)
 {
 	unsigned int minor, nr_minors;
-	unsigned long flags;
 
 	if (info->rq == NULL)
 		return;
 
-	spin_lock_irqsave(&info->io_lock, flags);
-
 	/* No more blkif_request(). */
-	blk_stop_queue(info->rq);
+	blk_mq_stop_hw_queues(info->rq);
 
 	/* No more gnttab callback work. */
 	gnttab_cancel_free_callback(&info->callback);
-	spin_unlock_irqrestore(&info->io_lock, flags);
 
 	/* Flush gnttab callback work. Must be done with no locks held. */
 	flush_work(&info->work);
@@ -926,20 +924,18 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
 	xlbd_release_minors(minor, nr_minors);
 
 	blk_cleanup_queue(info->rq);
+	blk_mq_free_tag_set(&info->tag_set);
 	info->rq = NULL;
 
 	put_disk(info->gd);
 	info->gd = NULL;
 }
 
+/* Must be called with io_lock holded */
 static void kick_pending_request_queues(struct blkfront_info *info)
 {
-	if (!RING_FULL(&info->ring)) {
-		/* Re-enable calldowns. */
-		blk_start_queue(info->rq);
-		/* Kick things off immediately. */
-		do_blkif_request(info->rq);
-	}
+	if (!RING_FULL(&info->ring))
+		blk_mq_start_stopped_hw_queues(info->rq, true);
 }
 
 static void blkif_restart_queue(struct work_struct *work)
@@ -964,7 +960,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
 	/* No more blkif_request(). */
 	if (info->rq)
-		blk_stop_queue(info->rq);
+		blk_mq_stop_hw_queues(info->rq);
 
 	/* Remove all persistent grants */
 	if (!list_empty(&info->grants)) {
@@ -1147,7 +1143,6 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	RING_IDX i, rp;
 	unsigned long flags;
 	struct blkfront_info *info = (struct blkfront_info *)dev_id;
-	int error;
 
 	spin_lock_irqsave(&info->io_lock, flags);
 
@@ -1188,37 +1183,37 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 			continue;
 		}
 
-		error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
+		req->errors = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
 		switch (bret->operation) {
 		case BLKIF_OP_DISCARD:
 			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
 				struct request_queue *rq = info->rq;
 				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
 					   info->gd->disk_name, op_name(bret->operation));
-				error = -EOPNOTSUPP;
+				req->errors = -EOPNOTSUPP;
 				info->feature_discard = 0;
 				info->feature_secdiscard = 0;
 				queue_flag_clear(QUEUE_FLAG_DISCARD, rq);
 				queue_flag_clear(QUEUE_FLAG_SECDISCARD, rq);
 			}
-			__blk_end_request_all(req, error);
+			blk_mq_complete_request(req);
 			break;
 		case BLKIF_OP_FLUSH_DISKCACHE:
 		case BLKIF_OP_WRITE_BARRIER:
 			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
 				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
 				       info->gd->disk_name, op_name(bret->operation));
-				error = -EOPNOTSUPP;
+				req->errors = -EOPNOTSUPP;
 			}
 			if (unlikely(bret->status == BLKIF_RSP_ERROR &&
 				     info->shadow[id].req.u.rw.nr_segments == 0)) {
 				printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
 				       info->gd->disk_name, op_name(bret->operation));
-				error = -EOPNOTSUPP;
+				req->errors = -EOPNOTSUPP;
 			}
-			if (unlikely(error)) {
-				if (error == -EOPNOTSUPP)
-					error = 0;
+			if (unlikely(req->errors)) {
+				if (req->errors == -EOPNOTSUPP)
+					req->errors = 0;
 				info->feature_flush = 0;
 				xlvbd_flush(info);
 			}
@@ -1229,7 +1224,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 				dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
 					"request: %x\n", bret->status);
 
-			__blk_end_request_all(req, error);
+			blk_mq_complete_request(req);
 			break;
 		default:
 			BUG();
@@ -1558,28 +1553,6 @@ static int blkif_recover(struct blkfront_info *info)
 
 	kfree(copy);
 
-	/*
-	 * Empty the queue, this is important because we might have
-	 * requests in the queue with more segments than what we
-	 * can handle now.
-	 */
-	spin_lock_irq(&info->io_lock);
-	while ((req = blk_fetch_request(info->rq)) != NULL) {
-		if (req->cmd_flags &
-		    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
-			list_add(&req->queuelist, &requests);
-			continue;
-		}
-		merge_bio.head = req->bio;
-		merge_bio.tail = req->biotail;
-		bio_list_merge(&bio_list, &merge_bio);
-		req->bio = NULL;
-		if (req->cmd_flags & (REQ_FLUSH | REQ_FUA))
-			pr_alert("diskcache flush request found!\n");
-		__blk_end_request_all(req, 0);
-	}
-	spin_unlock_irq(&info->io_lock);
-
 	xenbus_switch_state(info->xbdev, XenbusStateConnected);
 
 	spin_lock_irq(&info->io_lock);
@@ -1594,9 +1567,10 @@ static int blkif_recover(struct blkfront_info *info)
 		/* Requeue pending requests (flush or discard) */
 		list_del_init(&req->queuelist);
 		BUG_ON(req->nr_phys_segments > segs);
-		blk_requeue_request(info->rq, req);
+		blk_mq_requeue_request(req);
 	}
 	spin_unlock_irq(&info->io_lock);
+	blk_mq_kick_requeue_list(info->rq);
 
 	while ((bio = bio_list_pop(&bio_list)) != NULL) {
 		/* Traverse the list of pending bios and re-queue them */
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 2/9] xen-block: add document for multi hardware queues/rings
  2015-09-05 12:39 [PATCH v3 0/9] xen-block: support multi hardware-queues/rings Bob Liu
                   ` (2 preceding siblings ...)
  2015-09-05 12:39 ` [PATCH v3 2/9] xen-block: add document for multi hardware queues/rings Bob Liu
@ 2015-09-05 12:39 ` Bob Liu
  2015-09-23 20:32   ` Konrad Rzeszutek Wilk
                     ` (3 more replies)
  2015-09-05 12:39 ` [PATCH v3 3/9] xen/blkfront: separate per ring information out of device info Bob Liu
                   ` (10 subsequent siblings)
  14 siblings, 4 replies; 83+ messages in thread
From: Bob Liu @ 2015-09-05 12:39 UTC (permalink / raw)
  To: xen-devel
  Cc: david.vrabel, linux-kernel, roger.pau, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies, Bob Liu

Document the multi hardware queue/ring feature of xen-block.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 include/xen/interface/io/blkif.h |   32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
index c33e1c4..b453b70 100644
--- a/include/xen/interface/io/blkif.h
+++ b/include/xen/interface/io/blkif.h
@@ -28,6 +28,38 @@ typedef uint16_t blkif_vdev_t;
 typedef uint64_t blkif_sector_t;
 
 /*
+ * Multiple hardware queues/rings:
+ * If supported, the backend will write the key "multi-queue-max-queues" to
+ * the directory for that vbd, and set its value to the maximum supported
+ * number of queues.
+ * Frontends that are aware of this feature and wish to use it can write the
+ * key "multi-queue-num-queues", set to the number they wish to use, which
+ * must be greater than zero, and no more than the value reported by the backend
+ * in "multi-queue-max-queues".
+ *
+ * For frontends requesting just one queue, the usual event-channel and
+ * ring-ref keys are written as before, simplifying the backend processing
+ * to avoid distinguishing between a frontend that doesn't understand the
+ * multi-queue feature, and one that does, but requested only one queue.
+ *
+ * Frontends requesting two or more queues must not write the toplevel
+ * event-channeland ring-ref keys, instead writing those keys under sub-keys
+ * having the name "queue-N" where N is the integer ID of the queue/ring for
+ * which those keys belong. Queues are indexed from zero.
+ * For example, a frontend with two queues must write the following set of
+ * queue-related keys:
+ *
+ * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2"
+ * /local/domain/1/device/vbd/0/queue-0 = ""
+ * /local/domain/1/device/vbd/0/queue-0/ring-ref = "<ring-ref>"
+ * /local/domain/1/device/vbd/0/queue-0/event-channel = "<evtchn>"
+ * /local/domain/1/device/vbd/0/queue-1 = ""
+ * /local/domain/1/device/vbd/0/queue-1/ring-ref = "<ring-ref>"
+ * /local/domain/1/device/vbd/0/queue-1/event-channel = "<evtchn>"
+ *
+ */
+
+/*
  * REQUEST CODES.
  */
 #define BLKIF_OP_READ              0
-- 
1.7.10.4
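
For illustration only (not part of the patch): a frontend advertising N queues
could populate the keys documented above with the standard xenbus helpers. The
helper below is a hypothetical sketch; transaction handling and the surrounding
negotiation are omitted.

	/*
	 * Illustrative sketch (not from this series): write
	 * queue-<i>/ring-ref and queue-<i>/event-channel under 'dir'.
	 */
	static int write_queue_keys(struct xenbus_transaction xbt,
				    const char *dir, unsigned int queue,
				    grant_ref_t ring_ref, unsigned int evtchn)
	{
		char node[32];
		int err;

		snprintf(node, sizeof(node), "queue-%u/ring-ref", queue);
		err = xenbus_printf(xbt, dir, node, "%u", ring_ref);
		if (err)
			return err;

		snprintf(node, sizeof(node), "queue-%u/event-channel", queue);
		return xenbus_printf(xbt, dir, node, "%u", evtchn);
	}

	/* Plus, written once: the total number of queues in use. */
	err = xenbus_printf(xbt, dir, "multi-queue-num-queues", "%u", nr_queues);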


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 3/9] xen/blkfront: separate per ring information out of device info
  2015-09-05 12:39 [PATCH v3 0/9] xen-block: support multi hardware-queues/rings Bob Liu
                   ` (3 preceding siblings ...)
  2015-09-05 12:39 ` Bob Liu
@ 2015-09-05 12:39 ` Bob Liu
  2015-10-02 17:02   ` Roger Pau Monné
  2015-10-02 17:02   ` Roger Pau Monné
  2015-09-05 12:39 ` Bob Liu
                   ` (9 subsequent siblings)
  14 siblings, 2 replies; 83+ messages in thread
From: Bob Liu @ 2015-09-05 12:39 UTC (permalink / raw)
  To: xen-devel
  Cc: david.vrabel, linux-kernel, roger.pau, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies, Bob Liu

Split per-ring information out into a new structure, blkfront_ring_info, and
rename blkfront_info to blkfront_dev_info.

A ring is the representation of a hardware queue; every vbd device can be
associated with one or more blkfront_ring_info structures, depending on how
many hardware queues/rings are to be used.

This patch is a preparation for supporting real multiple hardware queues/rings.

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkfront.c |  854 ++++++++++++++++++++++--------------------
 1 file changed, 445 insertions(+), 409 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 5dd591d..bf416d5 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
 module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
 MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
 
-#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * (info)->nr_ring_pages)
+#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->nr_ring_pages)
 #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
 /*
  * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
@@ -116,12 +116,31 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
 #define RINGREF_NAME_LEN (20)
 
 /*
+ *  Per-ring info.
+ *  Every blkfront device can associate with one or more blkfront_ring_info,
+ *  depending on how many hardware queues to be used.
+ */
+struct blkfront_ring_info
+{
+	struct blkif_front_ring ring;
+	unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
+	unsigned int evtchn, irq;
+	struct work_struct work;
+	struct gnttab_free_callback callback;
+	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
+	struct list_head grants;
+	struct list_head indirect_pages;
+	unsigned int persistent_gnts_c;
+	unsigned long shadow_free;
+	struct blkfront_dev_info *dinfo;
+};
+
+/*
  * We have one of these per vbd, whether ide, scsi or 'other'.  They
  * hang in private_data off the gendisk structure. We may end up
  * putting all kinds of interesting stuff here :-)
  */
-struct blkfront_info
-{
+struct blkfront_dev_info {
 	spinlock_t io_lock;
 	struct mutex mutex;
 	struct xenbus_device *xbdev;
@@ -129,18 +148,7 @@ struct blkfront_info
 	int vdevice;
 	blkif_vdev_t handle;
 	enum blkif_state connected;
-	int ring_ref[XENBUS_MAX_RING_PAGES];
-	unsigned int nr_ring_pages;
-	struct blkif_front_ring ring;
-	unsigned int evtchn, irq;
 	struct request_queue *rq;
-	struct work_struct work;
-	struct gnttab_free_callback callback;
-	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
-	struct list_head grants;
-	struct list_head indirect_pages;
-	unsigned int persistent_gnts_c;
-	unsigned long shadow_free;
 	unsigned int feature_flush;
 	unsigned int feature_discard:1;
 	unsigned int feature_secdiscard:1;
@@ -149,7 +157,9 @@ struct blkfront_info
 	unsigned int feature_persistent:1;
 	unsigned int max_indirect_segments;
 	int is_ready;
+	unsigned int nr_ring_pages;
 	struct blk_mq_tag_set tag_set;
+	struct blkfront_ring_info rinfo;
 };
 
 static unsigned int nr_minors;
@@ -180,32 +190,33 @@ static DEFINE_SPINLOCK(minor_lock);
 #define INDIRECT_GREFS(_segs) \
 	((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
 
-static int blkfront_setup_indirect(struct blkfront_info *info);
-static int blkfront_gather_backend_features(struct blkfront_info *info);
+static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
+static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
 
-static int get_id_from_freelist(struct blkfront_info *info)
+static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
-	unsigned long free = info->shadow_free;
-	BUG_ON(free >= BLK_RING_SIZE(info));
-	info->shadow_free = info->shadow[free].req.u.rw.id;
-	info->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
+	unsigned long free = rinfo->shadow_free;
+
+	BUG_ON(free >= BLK_RING_SIZE(rinfo->dinfo));
+	rinfo->shadow_free = rinfo->shadow[free].req.u.rw.id;
+	rinfo->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
 	return free;
 }
 
-static int add_id_to_freelist(struct blkfront_info *info,
+static int add_id_to_freelist(struct blkfront_ring_info *rinfo,
 			       unsigned long id)
 {
-	if (info->shadow[id].req.u.rw.id != id)
+	if (rinfo->shadow[id].req.u.rw.id != id)
 		return -EINVAL;
-	if (info->shadow[id].request == NULL)
+	if (rinfo->shadow[id].request == NULL)
 		return -EINVAL;
-	info->shadow[id].req.u.rw.id  = info->shadow_free;
-	info->shadow[id].request = NULL;
-	info->shadow_free = id;
+	rinfo->shadow[id].req.u.rw.id  = rinfo->shadow_free;
+	rinfo->shadow[id].request = NULL;
+	rinfo->shadow_free = id;
 	return 0;
 }
 
-static int fill_grant_buffer(struct blkfront_info *info, int num)
+static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 {
 	struct page *granted_page;
 	struct grant *gnt_list_entry, *n;
@@ -216,7 +227,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 		if (!gnt_list_entry)
 			goto out_of_memory;
 
-		if (info->feature_persistent) {
+		if (rinfo->dinfo->feature_persistent) {
 			granted_page = alloc_page(GFP_NOIO);
 			if (!granted_page) {
 				kfree(gnt_list_entry);
@@ -226,7 +237,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 		}
 
 		gnt_list_entry->gref = GRANT_INVALID_REF;
-		list_add(&gnt_list_entry->node, &info->grants);
+		list_add(&gnt_list_entry->node, &rinfo->grants);
 		i++;
 	}
 
@@ -234,9 +245,9 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 
 out_of_memory:
 	list_for_each_entry_safe(gnt_list_entry, n,
-	                         &info->grants, node) {
+				 &rinfo->grants, node) {
 		list_del(&gnt_list_entry->node);
-		if (info->feature_persistent)
+		if (rinfo->dinfo->feature_persistent)
 			__free_page(pfn_to_page(gnt_list_entry->pfn));
 		kfree(gnt_list_entry);
 		i--;
@@ -246,33 +257,33 @@ out_of_memory:
 }
 
 static struct grant *get_grant(grant_ref_t *gref_head,
-                               unsigned long pfn,
-                               struct blkfront_info *info)
+			       unsigned long pfn,
+			       struct blkfront_ring_info *rinfo)
 {
 	struct grant *gnt_list_entry;
 	unsigned long buffer_mfn;
 
-	BUG_ON(list_empty(&info->grants));
-	gnt_list_entry = list_first_entry(&info->grants, struct grant,
+	BUG_ON(list_empty(&rinfo->grants));
+	gnt_list_entry = list_first_entry(&rinfo->grants, struct grant,
 	                                  node);
 	list_del(&gnt_list_entry->node);
 
 	if (gnt_list_entry->gref != GRANT_INVALID_REF) {
-		info->persistent_gnts_c--;
+		rinfo->persistent_gnts_c--;
 		return gnt_list_entry;
 	}
 
 	/* Assign a gref to this page */
 	gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
 	BUG_ON(gnt_list_entry->gref == -ENOSPC);
-	if (!info->feature_persistent) {
+	if (!rinfo->dinfo->feature_persistent) {
 		BUG_ON(!pfn);
 		gnt_list_entry->pfn = pfn;
 	}
 	buffer_mfn = pfn_to_mfn(gnt_list_entry->pfn);
 	gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
-	                                info->xbdev->otherend_id,
-	                                buffer_mfn, 0);
+					rinfo->dinfo->xbdev->otherend_id,
+					buffer_mfn, 0);
 	return gnt_list_entry;
 }
 
@@ -342,8 +353,9 @@ static void xlbd_release_minors(unsigned int minor, unsigned int nr)
 
 static void blkif_restart_queue_callback(void *arg)
 {
-	struct blkfront_info *info = (struct blkfront_info *)arg;
-	schedule_work(&info->work);
+	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)arg;
+
+	schedule_work(&rinfo->work);
 }
 
 static int blkif_getgeo(struct block_device *bd, struct hd_geometry *hg)
@@ -365,22 +377,22 @@ static int blkif_getgeo(struct block_device *bd, struct hd_geometry *hg)
 static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
 		       unsigned command, unsigned long argument)
 {
-	struct blkfront_info *info = bdev->bd_disk->private_data;
+	struct blkfront_dev_info *dinfo = bdev->bd_disk->private_data;
 	int i;
 
-	dev_dbg(&info->xbdev->dev, "command: 0x%x, argument: 0x%lx\n",
+	dev_dbg(&dinfo->xbdev->dev, "command: 0x%x, argument: 0x%lx\n",
 		command, (long)argument);
 
 	switch (command) {
 	case CDROMMULTISESSION:
-		dev_dbg(&info->xbdev->dev, "FIXME: support multisession CDs later\n");
+		dev_dbg(&dinfo->xbdev->dev, "FIXME: support multisession CDs later\n");
 		for (i = 0; i < sizeof(struct cdrom_multisession); i++)
 			if (put_user(0, (char __user *)(argument + i)))
 				return -EFAULT;
 		return 0;
 
 	case CDROM_GET_CAPABILITY: {
-		struct gendisk *gd = info->gd;
+		struct gendisk *gd = dinfo->gd;
 		if (gd->flags & GENHD_FL_CD)
 			return 0;
 		return -EINVAL;
@@ -401,9 +413,10 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
  *
  * @req: a request struct
  */
-static int blkif_queue_request(struct request *req)
+static int blkif_queue_request(struct request *req,
+			       struct blkfront_ring_info *rinfo)
 {
-	struct blkfront_info *info = req->rq_disk->private_data;
+	struct blkfront_dev_info *dinfo = req->rq_disk->private_data;
 	struct blkif_request *ring_req;
 	unsigned long id;
 	unsigned int fsect, lsect;
@@ -421,7 +434,7 @@ static int blkif_queue_request(struct request *req)
 	struct scatterlist *sg;
 	int nseg, max_grefs;
 
-	if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
+	if (unlikely(dinfo->connected != BLKIF_STATE_CONNECTED))
 		return 1;
 
 	max_grefs = req->nr_phys_segments;
@@ -433,15 +446,15 @@ static int blkif_queue_request(struct request *req)
 		max_grefs += INDIRECT_GREFS(req->nr_phys_segments);
 
 	/* Check if we have enough grants to allocate a requests */
-	if (info->persistent_gnts_c < max_grefs) {
+	if (rinfo->persistent_gnts_c < max_grefs) {
 		new_persistent_gnts = 1;
 		if (gnttab_alloc_grant_references(
-		    max_grefs - info->persistent_gnts_c,
+		    max_grefs - rinfo->persistent_gnts_c,
 		    &gref_head) < 0) {
 			gnttab_request_free_callback(
-				&info->callback,
+				&rinfo->callback,
 				blkif_restart_queue_callback,
-				info,
+				rinfo,
 				max_grefs);
 			return 1;
 		}
@@ -449,25 +462,25 @@ static int blkif_queue_request(struct request *req)
 		new_persistent_gnts = 0;
 
 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
-	id = get_id_from_freelist(info);
-	info->shadow[id].request = req;
+	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
+	id = get_id_from_freelist(rinfo);
+	rinfo->shadow[id].request = req;
 
 	if (unlikely(req->cmd_flags & (REQ_DISCARD | REQ_SECURE))) {
 		ring_req->operation = BLKIF_OP_DISCARD;
 		ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
 		ring_req->u.discard.id = id;
 		ring_req->u.discard.sector_number = (blkif_sector_t)blk_rq_pos(req);
-		if ((req->cmd_flags & REQ_SECURE) && info->feature_secdiscard)
+		if ((req->cmd_flags & REQ_SECURE) && dinfo->feature_secdiscard)
 			ring_req->u.discard.flag = BLKIF_DISCARD_SECURE;
 		else
 			ring_req->u.discard.flag = 0;
 	} else {
-		BUG_ON(info->max_indirect_segments == 0 &&
+		BUG_ON(dinfo->max_indirect_segments == 0 &&
 		       req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
-		BUG_ON(info->max_indirect_segments &&
-		       req->nr_phys_segments > info->max_indirect_segments);
-		nseg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+		BUG_ON(dinfo->max_indirect_segments &&
+		       req->nr_phys_segments > dinfo->max_indirect_segments);
+		nseg = blk_rq_map_sg(req->q, req, rinfo->shadow[id].sg);
 		ring_req->u.rw.id = id;
 		if (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
 			/*
@@ -479,11 +492,11 @@ static int blkif_queue_request(struct request *req)
 			ring_req->u.indirect.indirect_op = rq_data_dir(req) ?
 				BLKIF_OP_WRITE : BLKIF_OP_READ;
 			ring_req->u.indirect.sector_number = (blkif_sector_t)blk_rq_pos(req);
-			ring_req->u.indirect.handle = info->handle;
+			ring_req->u.indirect.handle = dinfo->handle;
 			ring_req->u.indirect.nr_segments = nseg;
 		} else {
 			ring_req->u.rw.sector_number = (blkif_sector_t)blk_rq_pos(req);
-			ring_req->u.rw.handle = info->handle;
+			ring_req->u.rw.handle = dinfo->handle;
 			ring_req->operation = rq_data_dir(req) ?
 				BLKIF_OP_WRITE : BLKIF_OP_READ;
 			if (req->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
@@ -494,7 +507,7 @@ static int blkif_queue_request(struct request *req)
 				 * way.  (It's also a FLUSH+FUA, since it is
 				 * guaranteed ordered WRT previous writes.)
 				 */
-				switch (info->feature_flush &
+				switch (dinfo->feature_flush &
 					((REQ_FLUSH|REQ_FUA))) {
 				case REQ_FLUSH|REQ_FUA:
 					ring_req->operation =
@@ -510,7 +523,7 @@ static int blkif_queue_request(struct request *req)
 			}
 			ring_req->u.rw.nr_segments = nseg;
 		}
-		for_each_sg(info->shadow[id].sg, sg, nseg, i) {
+		for_each_sg(rinfo->shadow[id].sg, sg, nseg, i) {
 			fsect = sg->offset >> 9;
 			lsect = fsect + (sg->length >> 9) - 1;
 
@@ -522,28 +535,28 @@ static int blkif_queue_request(struct request *req)
 					kunmap_atomic(segments);
 
 				n = i / SEGS_PER_INDIRECT_FRAME;
-				if (!info->feature_persistent) {
+				if (!dinfo->feature_persistent) {
 					struct page *indirect_page;
 
 					/* Fetch a pre-allocated page to use for indirect grefs */
-					BUG_ON(list_empty(&info->indirect_pages));
-					indirect_page = list_first_entry(&info->indirect_pages,
+					BUG_ON(list_empty(&rinfo->indirect_pages));
+					indirect_page = list_first_entry(&rinfo->indirect_pages,
 					                                 struct page, lru);
 					list_del(&indirect_page->lru);
 					pfn = page_to_pfn(indirect_page);
 				}
-				gnt_list_entry = get_grant(&gref_head, pfn, info);
-				info->shadow[id].indirect_grants[n] = gnt_list_entry;
+				gnt_list_entry = get_grant(&gref_head, pfn, rinfo);
+				rinfo->shadow[id].indirect_grants[n] = gnt_list_entry;
 				segments = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
 				ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
 			}
 
-			gnt_list_entry = get_grant(&gref_head, page_to_pfn(sg_page(sg)), info);
+			gnt_list_entry = get_grant(&gref_head, page_to_pfn(sg_page(sg)), rinfo);
 			ref = gnt_list_entry->gref;
 
-			info->shadow[id].grants_used[i] = gnt_list_entry;
+			rinfo->shadow[id].grants_used[i] = gnt_list_entry;
 
-			if (rq_data_dir(req) && info->feature_persistent) {
+			if (rq_data_dir(req) && dinfo->feature_persistent) {
 				char *bvec_data;
 				void *shared_data;
 
@@ -587,10 +600,10 @@ static int blkif_queue_request(struct request *req)
 			kunmap_atomic(segments);
 	}
 
-	info->ring.req_prod_pvt++;
+	rinfo->ring.req_prod_pvt++;
 
 	/* Keep a private copy so we can reissue requests when recovering. */
-	info->shadow[id].req = *ring_req;
+	rinfo->shadow[id].req = *ring_req;
 
 	if (new_persistent_gnts)
 		gnttab_free_grant_references(gref_head);
@@ -599,59 +612,70 @@ static int blkif_queue_request(struct request *req)
 }
 
 
-static inline void flush_requests(struct blkfront_info *info)
+static inline void flush_requests(struct blkfront_ring_info *rinfo)
 {
 	int notify;
 
-	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&info->ring, notify);
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&rinfo->ring, notify);
 
 	if (notify)
-		notify_remote_via_irq(info->irq);
+		notify_remote_via_irq(rinfo->irq);
 }
 
 static inline bool blkif_request_flush_invalid(struct request *req,
-					       struct blkfront_info *info)
+					       struct blkfront_dev_info *dinfo)
 {
 	return ((req->cmd_type != REQ_TYPE_FS) ||
 		((req->cmd_flags & REQ_FLUSH) &&
-		 !(info->feature_flush & REQ_FLUSH)) ||
+		 !(dinfo->feature_flush & REQ_FLUSH)) ||
 		((req->cmd_flags & REQ_FUA) &&
-		 !(info->feature_flush & REQ_FUA)));
+		 !(dinfo->feature_flush & REQ_FUA)));
 }
 
 static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
 			   const struct blk_mq_queue_data *qd)
 {
-	struct blkfront_info *info = qd->rq->rq_disk->private_data;
+	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)hctx->driver_data;
+	struct blkfront_dev_info *dinfo = rinfo->dinfo;
 
 	blk_mq_start_request(qd->rq);
-	spin_lock_irq(&info->io_lock);
-	if (RING_FULL(&info->ring))
+	spin_lock_irq(&dinfo->io_lock);
+	if (RING_FULL(&rinfo->ring))
 		goto out_busy;
 
-	if (blkif_request_flush_invalid(qd->rq, info))
+	if (blkif_request_flush_invalid(qd->rq, dinfo))
 		goto out_err;
 
-	if (blkif_queue_request(qd->rq))
+	if (blkif_queue_request(qd->rq, rinfo))
 		goto out_busy;
 
-	flush_requests(info);
-	spin_unlock_irq(&info->io_lock);
+	flush_requests(rinfo);
+	spin_unlock_irq(&dinfo->io_lock);
 	return BLK_MQ_RQ_QUEUE_OK;
 
 out_err:
-	spin_unlock_irq(&info->io_lock);
+	spin_unlock_irq(&dinfo->io_lock);
 	return BLK_MQ_RQ_QUEUE_ERROR;
 
 out_busy:
-	spin_unlock_irq(&info->io_lock);
+	spin_unlock_irq(&dinfo->io_lock);
 	blk_mq_stop_hw_queue(hctx);
 	return BLK_MQ_RQ_QUEUE_BUSY;
 }
 
+static int blk_mq_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
+			    unsigned int index)
+{
+	struct blkfront_dev_info *dinfo = (struct blkfront_dev_info *)data;
+
+	hctx->driver_data = &dinfo->rinfo;
+	return 0;
+}
+
 static struct blk_mq_ops blkfront_mq_ops = {
 	.queue_rq = blkif_queue_rq,
 	.map_queue = blk_mq_map_queue,
+	.init_hctx = blk_mq_init_hctx,
 };
 
 static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
@@ -659,33 +683,33 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 				unsigned int segments)
 {
 	struct request_queue *rq;
-	struct blkfront_info *info = gd->private_data;
-
-	memset(&info->tag_set, 0, sizeof(info->tag_set));
-	info->tag_set.ops = &blkfront_mq_ops;
-	info->tag_set.nr_hw_queues = 1;
-	info->tag_set.queue_depth =  BLK_RING_SIZE(info);
-	info->tag_set.numa_node = NUMA_NO_NODE;
-	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
-	info->tag_set.cmd_size = 0;
-	info->tag_set.driver_data = info;
-
-	if (blk_mq_alloc_tag_set(&info->tag_set))
+	struct blkfront_dev_info *dinfo = gd->private_data;
+
+	memset(&dinfo->tag_set, 0, sizeof(dinfo->tag_set));
+	dinfo->tag_set.ops = &blkfront_mq_ops;
+	dinfo->tag_set.nr_hw_queues = 1;
+	dinfo->tag_set.queue_depth =  BLK_RING_SIZE(dinfo);
+	dinfo->tag_set.numa_node = NUMA_NO_NODE;
+	dinfo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+	dinfo->tag_set.cmd_size = 0;
+	dinfo->tag_set.driver_data = dinfo;
+
+	if (blk_mq_alloc_tag_set(&dinfo->tag_set))
 		return -1;
-	rq = blk_mq_init_queue(&info->tag_set);
+	rq = blk_mq_init_queue(&dinfo->tag_set);
 	if (IS_ERR(rq)) {
-		blk_mq_free_tag_set(&info->tag_set);
+		blk_mq_free_tag_set(&dinfo->tag_set);
 		return -1;
 	}
 
 	queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
 
-	if (info->feature_discard) {
+	if (dinfo->feature_discard) {
 		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq);
 		blk_queue_max_discard_sectors(rq, get_capacity(gd));
-		rq->limits.discard_granularity = info->discard_granularity;
-		rq->limits.discard_alignment = info->discard_alignment;
-		if (info->feature_secdiscard)
+		rq->limits.discard_granularity = dinfo->discard_granularity;
+		rq->limits.discard_alignment = dinfo->discard_alignment;
+		if (dinfo->feature_secdiscard)
 			queue_flag_set_unlocked(QUEUE_FLAG_SECDISCARD, rq);
 	}
 
@@ -724,14 +748,14 @@ static const char *flush_info(unsigned int feature_flush)
 	}
 }
 
-static void xlvbd_flush(struct blkfront_info *info)
+static void xlvbd_flush(struct blkfront_dev_info *dinfo)
 {
-	blk_queue_flush(info->rq, info->feature_flush);
+	blk_queue_flush(dinfo->rq, dinfo->feature_flush);
 	pr_info("blkfront: %s: %s %s %s %s %s\n",
-		info->gd->disk_name, flush_info(info->feature_flush),
-		"persistent grants:", info->feature_persistent ?
+		dinfo->gd->disk_name, flush_info(dinfo->feature_flush),
+		"persistent grants:", dinfo->feature_persistent ?
 		"enabled;" : "disabled;", "indirect descriptors:",
-		info->max_indirect_segments ? "enabled;" : "disabled;");
+		dinfo->max_indirect_segments ? "enabled;" : "disabled;");
 }
 
 static int xen_translate_vdev(int vdevice, int *minor, unsigned int *offset)
@@ -803,7 +827,7 @@ static char *encode_disk_name(char *ptr, unsigned int n)
 }
 
 static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
-			       struct blkfront_info *info,
+			       struct blkfront_dev_info *dinfo,
 			       u16 vdisk_info, u16 sector_size,
 			       unsigned int physical_sector_size)
 {
@@ -815,32 +839,32 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 	int nr_parts;
 	char *ptr;
 
-	BUG_ON(info->gd != NULL);
-	BUG_ON(info->rq != NULL);
+	BUG_ON(dinfo->gd != NULL);
+	BUG_ON(dinfo->rq != NULL);
 
-	if ((info->vdevice>>EXT_SHIFT) > 1) {
+	if ((dinfo->vdevice>>EXT_SHIFT) > 1) {
 		/* this is above the extended range; something is wrong */
-		printk(KERN_WARNING "blkfront: vdevice 0x%x is above the extended range; ignoring\n", info->vdevice);
+		printk(KERN_WARNING "blkfront: vdevice 0x%x is above the extended range; ignoring\n", dinfo->vdevice);
 		return -ENODEV;
 	}
 
-	if (!VDEV_IS_EXTENDED(info->vdevice)) {
-		err = xen_translate_vdev(info->vdevice, &minor, &offset);
+	if (!VDEV_IS_EXTENDED(dinfo->vdevice)) {
+		err = xen_translate_vdev(dinfo->vdevice, &minor, &offset);
 		if (err)
 			return err;		
  		nr_parts = PARTS_PER_DISK;
 	} else {
-		minor = BLKIF_MINOR_EXT(info->vdevice);
+		minor = BLKIF_MINOR_EXT(dinfo->vdevice);
 		nr_parts = PARTS_PER_EXT_DISK;
 		offset = minor / nr_parts;
 		if (xen_hvm_domain() && offset < EMULATED_HD_DISK_NAME_OFFSET + 4)
 			printk(KERN_WARNING "blkfront: vdevice 0x%x might conflict with "
 					"emulated IDE disks,\n\t choose an xvd device name"
-					"from xvde on\n", info->vdevice);
+					"from xvde on\n", dinfo->vdevice);
 	}
 	if (minor >> MINORBITS) {
 		pr_warn("blkfront: %#x's minor (%#x) out of range; ignoring\n",
-			info->vdevice, minor);
+			dinfo->vdevice, minor);
 		return -ENODEV;
 	}
 
@@ -868,21 +892,21 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 	gd->major = XENVBD_MAJOR;
 	gd->first_minor = minor;
 	gd->fops = &xlvbd_block_fops;
-	gd->private_data = info;
-	gd->driverfs_dev = &(info->xbdev->dev);
+	gd->private_data = dinfo;
+	gd->driverfs_dev = &(dinfo->xbdev->dev);
 	set_capacity(gd, capacity);
 
 	if (xlvbd_init_blk_queue(gd, sector_size, physical_sector_size,
-				 info->max_indirect_segments ? :
+				 dinfo->max_indirect_segments ? :
 				 BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
 		del_gendisk(gd);
 		goto release;
 	}
 
-	info->rq = gd->queue;
-	info->gd = gd;
+	dinfo->rq = gd->queue;
+	dinfo->gd = gd;
 
-	xlvbd_flush(info);
+	xlvbd_flush(dinfo);
 
 	if (vdisk_info & VDISK_READONLY)
 		set_disk_ro(gd, 1);
@@ -901,118 +925,120 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 	return err;
 }
 
-static void xlvbd_release_gendisk(struct blkfront_info *info)
+static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
 {
 	unsigned int minor, nr_minors;
+	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
 
-	if (info->rq == NULL)
+	if (dinfo->rq == NULL)
 		return;
 
 	/* No more blkif_request(). */
-	blk_mq_stop_hw_queues(info->rq);
+	blk_mq_stop_hw_queues(dinfo->rq);
 
 	/* No more gnttab callback work. */
-	gnttab_cancel_free_callback(&info->callback);
+	gnttab_cancel_free_callback(&rinfo->callback);
 
 	/* Flush gnttab callback work. Must be done with no locks held. */
-	flush_work(&info->work);
+	flush_work(&rinfo->work);
 
-	del_gendisk(info->gd);
+	del_gendisk(dinfo->gd);
 
-	minor = info->gd->first_minor;
-	nr_minors = info->gd->minors;
+	minor = dinfo->gd->first_minor;
+	nr_minors = dinfo->gd->minors;
 	xlbd_release_minors(minor, nr_minors);
 
-	blk_cleanup_queue(info->rq);
-	blk_mq_free_tag_set(&info->tag_set);
-	info->rq = NULL;
+	blk_cleanup_queue(dinfo->rq);
+	blk_mq_free_tag_set(&dinfo->tag_set);
+	dinfo->rq = NULL;
 
-	put_disk(info->gd);
-	info->gd = NULL;
+	put_disk(dinfo->gd);
+	dinfo->gd = NULL;
 }
 
 /* Must be called with io_lock holded */
-static void kick_pending_request_queues(struct blkfront_info *info)
+static void kick_pending_request_queues(struct blkfront_ring_info *rinfo)
 {
-	if (!RING_FULL(&info->ring))
-		blk_mq_start_stopped_hw_queues(info->rq, true);
+	if (!RING_FULL(&rinfo->ring))
+		blk_mq_start_stopped_hw_queues(rinfo->dinfo->rq, true);
 }
 
 static void blkif_restart_queue(struct work_struct *work)
 {
-	struct blkfront_info *info = container_of(work, struct blkfront_info, work);
+	struct blkfront_ring_info *rinfo = container_of(work, struct blkfront_ring_info, work);
 
-	spin_lock_irq(&info->io_lock);
-	if (info->connected == BLKIF_STATE_CONNECTED)
-		kick_pending_request_queues(info);
-	spin_unlock_irq(&info->io_lock);
+	spin_lock_irq(&rinfo->dinfo->io_lock);
+	if (rinfo->dinfo->connected == BLKIF_STATE_CONNECTED)
+		kick_pending_request_queues(rinfo);
+	spin_unlock_irq(&rinfo->dinfo->io_lock);
 }
 
-static void blkif_free(struct blkfront_info *info, int suspend)
+static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
 {
 	struct grant *persistent_gnt;
 	struct grant *n;
 	int i, j, segs;
+	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
 
 	/* Prevent new requests being issued until we fix things up. */
-	spin_lock_irq(&info->io_lock);
-	info->connected = suspend ?
+	spin_lock_irq(&dinfo->io_lock);
+	dinfo->connected = suspend ?
 		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
 	/* No more blkif_request(). */
-	if (info->rq)
-		blk_mq_stop_hw_queues(info->rq);
+	if (dinfo->rq)
+		blk_mq_stop_hw_queues(dinfo->rq);
 
 	/* Remove all persistent grants */
-	if (!list_empty(&info->grants)) {
+	if (!list_empty(&rinfo->grants)) {
 		list_for_each_entry_safe(persistent_gnt, n,
-		                         &info->grants, node) {
+					 &rinfo->grants, node) {
 			list_del(&persistent_gnt->node);
 			if (persistent_gnt->gref != GRANT_INVALID_REF) {
 				gnttab_end_foreign_access(persistent_gnt->gref,
 				                          0, 0UL);
-				info->persistent_gnts_c--;
+				rinfo->persistent_gnts_c--;
 			}
-			if (info->feature_persistent)
+			if (dinfo->feature_persistent)
 				__free_page(pfn_to_page(persistent_gnt->pfn));
 			kfree(persistent_gnt);
 		}
 	}
-	BUG_ON(info->persistent_gnts_c != 0);
+	BUG_ON(rinfo->persistent_gnts_c != 0);
 
 	/*
 	 * Remove indirect pages, this only happens when using indirect
 	 * descriptors but not persistent grants
 	 */
-	if (!list_empty(&info->indirect_pages)) {
+	if (!list_empty(&rinfo->indirect_pages)) {
 		struct page *indirect_page, *n;
 
-		BUG_ON(info->feature_persistent);
-		list_for_each_entry_safe(indirect_page, n, &info->indirect_pages, lru) {
+		BUG_ON(dinfo->feature_persistent);
+		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
 			list_del(&indirect_page->lru);
 			__free_page(indirect_page);
 		}
 	}
 
-	for (i = 0; i < BLK_RING_SIZE(info); i++) {
+	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
 		/*
 		 * Clear persistent grants present in requests already
 		 * on the shared ring
 		 */
-		if (!info->shadow[i].request)
+		if (!rinfo->shadow[i].request)
 			goto free_shadow;
 
-		segs = info->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
-		       info->shadow[i].req.u.indirect.nr_segments :
-		       info->shadow[i].req.u.rw.nr_segments;
+		segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
+		       rinfo->shadow[i].req.u.indirect.nr_segments :
+		       rinfo->shadow[i].req.u.rw.nr_segments;
 		for (j = 0; j < segs; j++) {
-			persistent_gnt = info->shadow[i].grants_used[j];
+			persistent_gnt = rinfo->shadow[i].grants_used[j];
 			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
-			if (info->feature_persistent)
+			if (dinfo->feature_persistent)
 				__free_page(pfn_to_page(persistent_gnt->pfn));
 			kfree(persistent_gnt);
 		}
 
-		if (info->shadow[i].req.operation != BLKIF_OP_INDIRECT)
+		if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
 			/*
 			 * If this is not an indirect operation don't try to
 			 * free indirect segments
@@ -1020,45 +1046,45 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 			goto free_shadow;
 
 		for (j = 0; j < INDIRECT_GREFS(segs); j++) {
-			persistent_gnt = info->shadow[i].indirect_grants[j];
+			persistent_gnt = rinfo->shadow[i].indirect_grants[j];
 			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
 			__free_page(pfn_to_page(persistent_gnt->pfn));
 			kfree(persistent_gnt);
 		}
 
 free_shadow:
-		kfree(info->shadow[i].grants_used);
-		info->shadow[i].grants_used = NULL;
-		kfree(info->shadow[i].indirect_grants);
-		info->shadow[i].indirect_grants = NULL;
-		kfree(info->shadow[i].sg);
-		info->shadow[i].sg = NULL;
+		kfree(rinfo->shadow[i].grants_used);
+		rinfo->shadow[i].grants_used = NULL;
+		kfree(rinfo->shadow[i].indirect_grants);
+		rinfo->shadow[i].indirect_grants = NULL;
+		kfree(rinfo->shadow[i].sg);
+		rinfo->shadow[i].sg = NULL;
 	}
 
 	/* No more gnttab callback work. */
-	gnttab_cancel_free_callback(&info->callback);
-	spin_unlock_irq(&info->io_lock);
+	gnttab_cancel_free_callback(&rinfo->callback);
+	spin_unlock_irq(&dinfo->io_lock);
 
 	/* Flush gnttab callback work. Must be done with no locks held. */
-	flush_work(&info->work);
+	flush_work(&rinfo->work);
 
 	/* Free resources associated with old device channel. */
-	for (i = 0; i < info->nr_ring_pages; i++) {
-		if (info->ring_ref[i] != GRANT_INVALID_REF) {
-			gnttab_end_foreign_access(info->ring_ref[i], 0, 0);
-			info->ring_ref[i] = GRANT_INVALID_REF;
+	for (i = 0; i < dinfo->nr_ring_pages; i++) {
+		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
+			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
+			rinfo->ring_ref[i] = GRANT_INVALID_REF;
 		}
 	}
-	free_pages((unsigned long)info->ring.sring, get_order(info->nr_ring_pages * PAGE_SIZE));
-	info->ring.sring = NULL;
+	free_pages((unsigned long)rinfo->ring.sring, get_order(dinfo->nr_ring_pages * PAGE_SIZE));
+	rinfo->ring.sring = NULL;
 
-	if (info->irq)
-		unbind_from_irqhandler(info->irq, info);
-	info->evtchn = info->irq = 0;
+	if (rinfo->irq)
+		unbind_from_irqhandler(rinfo->irq, rinfo);
+	rinfo->evtchn = rinfo->irq = 0;
 
 }
 
-static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
+static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *rinfo,
 			     struct blkif_response *bret)
 {
 	int i = 0;
@@ -1066,11 +1092,12 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 	char *bvec_data;
 	void *shared_data;
 	int nseg;
+	struct blkfront_dev_info *dinfo = rinfo->dinfo;
 
 	nseg = s->req.operation == BLKIF_OP_INDIRECT ?
 		s->req.u.indirect.nr_segments : s->req.u.rw.nr_segments;
 
-	if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
+	if (bret->operation == BLKIF_OP_READ && dinfo->feature_persistent) {
 		for_each_sg(s->sg, sg, nseg, i) {
 			BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 			shared_data = kmap_atomic(
@@ -1092,11 +1119,11 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 			 * we add it at the head of the list, so it will be
 			 * reused first.
 			 */
-			if (!info->feature_persistent)
+			if (!dinfo->feature_persistent)
 				pr_alert_ratelimited("backed has not unmapped grant: %u\n",
 						     s->grants_used[i]->gref);
-			list_add(&s->grants_used[i]->node, &info->grants);
-			info->persistent_gnts_c++;
+			list_add(&s->grants_used[i]->node, &rinfo->grants);
+			rinfo->persistent_gnts_c++;
 		} else {
 			/*
 			 * If the grant is not mapped by the backend we end the
@@ -1106,17 +1133,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 			 */
 			gnttab_end_foreign_access(s->grants_used[i]->gref, 0, 0UL);
 			s->grants_used[i]->gref = GRANT_INVALID_REF;
-			list_add_tail(&s->grants_used[i]->node, &info->grants);
+			list_add_tail(&s->grants_used[i]->node, &rinfo->grants);
 		}
 	}
 	if (s->req.operation == BLKIF_OP_INDIRECT) {
 		for (i = 0; i < INDIRECT_GREFS(nseg); i++) {
 			if (gnttab_query_foreign_access(s->indirect_grants[i]->gref)) {
-				if (!info->feature_persistent)
+				if (!dinfo->feature_persistent)
 					pr_alert_ratelimited("backed has not unmapped grant: %u\n",
 							     s->indirect_grants[i]->gref);
-				list_add(&s->indirect_grants[i]->node, &info->grants);
-				info->persistent_gnts_c++;
+				list_add(&s->indirect_grants[i]->node, &rinfo->grants);
+				rinfo->persistent_gnts_c++;
 			} else {
 				struct page *indirect_page;
 
@@ -1125,12 +1152,12 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 				 * Add the used indirect page back to the list of
 				 * available pages for indirect grefs.
 				 */
-				if (!info->feature_persistent) {
+				if (!dinfo->feature_persistent) {
 					indirect_page = pfn_to_page(s->indirect_grants[i]->pfn);
-					list_add(&indirect_page->lru, &info->indirect_pages);
+					list_add(&indirect_page->lru, &rinfo->indirect_pages);
 				}
 				s->indirect_grants[i]->gref = GRANT_INVALID_REF;
-				list_add_tail(&s->indirect_grants[i]->node, &info->grants);
+				list_add_tail(&s->indirect_grants[i]->node, &rinfo->grants);
 			}
 		}
 	}
@@ -1142,44 +1169,45 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	struct blkif_response *bret;
 	RING_IDX i, rp;
 	unsigned long flags;
-	struct blkfront_info *info = (struct blkfront_info *)dev_id;
+	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
+	struct blkfront_dev_info *dinfo = rinfo->dinfo;
 
-	spin_lock_irqsave(&info->io_lock, flags);
+	spin_lock_irqsave(&dinfo->io_lock, flags);
 
-	if (unlikely(info->connected != BLKIF_STATE_CONNECTED)) {
-		spin_unlock_irqrestore(&info->io_lock, flags);
+	if (unlikely(dinfo->connected != BLKIF_STATE_CONNECTED)) {
+		spin_unlock_irqrestore(&dinfo->io_lock, flags);
 		return IRQ_HANDLED;
 	}
 
  again:
-	rp = info->ring.sring->rsp_prod;
+	rp = rinfo->ring.sring->rsp_prod;
 	rmb(); /* Ensure we see queued responses up to 'rp'. */
 
-	for (i = info->ring.rsp_cons; i != rp; i++) {
+	for (i = rinfo->ring.rsp_cons; i != rp; i++) {
 		unsigned long id;
 
-		bret = RING_GET_RESPONSE(&info->ring, i);
+		bret = RING_GET_RESPONSE(&rinfo->ring, i);
 		id   = bret->id;
 		/*
 		 * The backend has messed up and given us an id that we would
 		 * never have given to it (we stamp it up to BLK_RING_SIZE -
 		 * look in get_id_from_freelist.
 		 */
-		if (id >= BLK_RING_SIZE(info)) {
+		if (id >= BLK_RING_SIZE(dinfo)) {
 			WARN(1, "%s: response to %s has incorrect id (%ld)\n",
-			     info->gd->disk_name, op_name(bret->operation), id);
+			     dinfo->gd->disk_name, op_name(bret->operation), id);
 			/* We can't safely get the 'struct request' as
 			 * the id is busted. */
 			continue;
 		}
-		req  = info->shadow[id].request;
+		req  = rinfo->shadow[id].request;
 
 		if (bret->operation != BLKIF_OP_DISCARD)
-			blkif_completion(&info->shadow[id], info, bret);
+			blkif_completion(&rinfo->shadow[id], rinfo, bret);
 
-		if (add_id_to_freelist(info, id)) {
+		if (add_id_to_freelist(rinfo, id)) {
 			WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
-			     info->gd->disk_name, op_name(bret->operation), id);
+			     dinfo->gd->disk_name, op_name(bret->operation), id);
 			continue;
 		}
 
@@ -1187,12 +1215,12 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		switch (bret->operation) {
 		case BLKIF_OP_DISCARD:
 			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
-				struct request_queue *rq = info->rq;
+				struct request_queue *rq = dinfo->rq;
 				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
-					   info->gd->disk_name, op_name(bret->operation));
+					   dinfo->gd->disk_name, op_name(bret->operation));
 				req->errors = -EOPNOTSUPP;
-				info->feature_discard = 0;
-				info->feature_secdiscard = 0;
+				dinfo->feature_discard = 0;
+				dinfo->feature_secdiscard = 0;
 				queue_flag_clear(QUEUE_FLAG_DISCARD, rq);
 				queue_flag_clear(QUEUE_FLAG_SECDISCARD, rq);
 			}
@@ -1202,26 +1230,26 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		case BLKIF_OP_WRITE_BARRIER:
 			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
 				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
-				       info->gd->disk_name, op_name(bret->operation));
+				       dinfo->gd->disk_name, op_name(bret->operation));
 				req->errors = -EOPNOTSUPP;
 			}
 			if (unlikely(bret->status == BLKIF_RSP_ERROR &&
-				     info->shadow[id].req.u.rw.nr_segments == 0)) {
+				     rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
 				printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
-				       info->gd->disk_name, op_name(bret->operation));
+				       dinfo->gd->disk_name, op_name(bret->operation));
 				req->errors = -EOPNOTSUPP;
 			}
 			if (unlikely(req->errors)) {
 				if (req->errors == -EOPNOTSUPP)
 					req->errors = 0;
-				info->feature_flush = 0;
-				xlvbd_flush(info);
+				dinfo->feature_flush = 0;
+				xlvbd_flush(dinfo);
 			}
 			/* fall through */
 		case BLKIF_OP_READ:
 		case BLKIF_OP_WRITE:
 			if (unlikely(bret->status != BLKIF_RSP_OKAY))
-				dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
+				dev_dbg(&dinfo->xbdev->dev, "Bad return from blkdev data "
 					"request: %x\n", bret->status);
 
 			blk_mq_complete_request(req);
@@ -1231,34 +1259,35 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		}
 	}
 
-	info->ring.rsp_cons = i;
+	rinfo->ring.rsp_cons = i;
 
-	if (i != info->ring.req_prod_pvt) {
+	if (i != rinfo->ring.req_prod_pvt) {
 		int more_to_do;
-		RING_FINAL_CHECK_FOR_RESPONSES(&info->ring, more_to_do);
+		RING_FINAL_CHECK_FOR_RESPONSES(&rinfo->ring, more_to_do);
 		if (more_to_do)
 			goto again;
 	} else
-		info->ring.sring->rsp_event = i + 1;
+		rinfo->ring.sring->rsp_event = i + 1;
 
-	kick_pending_request_queues(info);
+	kick_pending_request_queues(rinfo);
 
-	spin_unlock_irqrestore(&info->io_lock, flags);
+	spin_unlock_irqrestore(&dinfo->io_lock, flags);
 
 	return IRQ_HANDLED;
 }
 
 
 static int setup_blkring(struct xenbus_device *dev,
-			 struct blkfront_info *info)
+			 struct blkfront_ring_info *rinfo)
 {
 	struct blkif_sring *sring;
 	int err, i;
-	unsigned long ring_size = info->nr_ring_pages * PAGE_SIZE;
+	struct blkfront_dev_info *dinfo = rinfo->dinfo;
+	unsigned long ring_size = dinfo->nr_ring_pages * PAGE_SIZE;
 	grant_ref_t gref[XENBUS_MAX_RING_PAGES];
 
-	for (i = 0; i < info->nr_ring_pages; i++)
-		info->ring_ref[i] = GRANT_INVALID_REF;
+	for (i = 0; i < dinfo->nr_ring_pages; i++)
+		rinfo->ring_ref[i] = GRANT_INVALID_REF;
 
 	sring = (struct blkif_sring *)__get_free_pages(GFP_NOIO | __GFP_HIGH,
 						       get_order(ring_size));
@@ -1267,58 +1296,59 @@ static int setup_blkring(struct xenbus_device *dev,
 		return -ENOMEM;
 	}
 	SHARED_RING_INIT(sring);
-	FRONT_RING_INIT(&info->ring, sring, ring_size);
+	FRONT_RING_INIT(&rinfo->ring, sring, ring_size);
 
-	err = xenbus_grant_ring(dev, info->ring.sring, info->nr_ring_pages, gref);
+	err = xenbus_grant_ring(dev, rinfo->ring.sring, dinfo->nr_ring_pages, gref);
 	if (err < 0) {
 		free_pages((unsigned long)sring, get_order(ring_size));
-		info->ring.sring = NULL;
+		rinfo->ring.sring = NULL;
 		goto fail;
 	}
-	for (i = 0; i < info->nr_ring_pages; i++)
-		info->ring_ref[i] = gref[i];
+	for (i = 0; i < dinfo->nr_ring_pages; i++)
+		rinfo->ring_ref[i] = gref[i];
 
-	err = xenbus_alloc_evtchn(dev, &info->evtchn);
+	err = xenbus_alloc_evtchn(dev, &rinfo->evtchn);
 	if (err)
 		goto fail;
 
-	err = bind_evtchn_to_irqhandler(info->evtchn, blkif_interrupt, 0,
-					"blkif", info);
+	err = bind_evtchn_to_irqhandler(rinfo->evtchn, blkif_interrupt, 0,
+					"blkif", rinfo);
 	if (err <= 0) {
 		xenbus_dev_fatal(dev, err,
 				 "bind_evtchn_to_irqhandler failed");
 		goto fail;
 	}
-	info->irq = err;
+	rinfo->irq = err;
 
 	return 0;
 fail:
-	blkif_free(info, 0);
+	blkif_free(dinfo, 0);
 	return err;
 }
 
 
 /* Common code used when first setting up, and when resuming. */
 static int talk_to_blkback(struct xenbus_device *dev,
-			   struct blkfront_info *info)
+			   struct blkfront_dev_info *dinfo)
 {
 	const char *message = NULL;
 	struct xenbus_transaction xbt;
 	int err, i;
 	unsigned int max_page_order = 0;
 	unsigned int ring_page_order = 0;
+	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
 
-	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
 			   "max-ring-page-order", "%u", &max_page_order);
 	if (err != 1)
-		info->nr_ring_pages = 1;
+		dinfo->nr_ring_pages = 1;
 	else {
 		ring_page_order = min(xen_blkif_max_ring_order, max_page_order);
-		info->nr_ring_pages = 1 << ring_page_order;
+		dinfo->nr_ring_pages = 1 << ring_page_order;
 	}
 
 	/* Create shared ring, alloc event channel. */
-	err = setup_blkring(dev, info);
+	err = setup_blkring(dev, rinfo);
 	if (err)
 		goto out;
 
@@ -1329,9 +1359,9 @@ again:
 		goto destroy_blkring;
 	}
 
-	if (info->nr_ring_pages == 1) {
+	if (dinfo->nr_ring_pages == 1) {
 		err = xenbus_printf(xbt, dev->nodename,
-				    "ring-ref", "%u", info->ring_ref[0]);
+				    "ring-ref", "%u", rinfo->ring_ref[0]);
 		if (err) {
 			message = "writing ring-ref";
 			goto abort_transaction;
@@ -1344,12 +1374,12 @@ again:
 			goto abort_transaction;
 		}
 
-		for (i = 0; i < info->nr_ring_pages; i++) {
+		for (i = 0; i < dinfo->nr_ring_pages; i++) {
 			char ring_ref_name[RINGREF_NAME_LEN];
 
 			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
 			err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
-					    "%u", info->ring_ref[i]);
+					    "%u", rinfo->ring_ref[i]);
 			if (err) {
 				message = "writing ring-ref";
 				goto abort_transaction;
@@ -1357,7 +1387,7 @@ again:
 		}
 	}
 	err = xenbus_printf(xbt, dev->nodename,
-			    "event-channel", "%u", info->evtchn);
+			    "event-channel", "%u", rinfo->evtchn);
 	if (err) {
 		message = "writing event-channel";
 		goto abort_transaction;
@@ -1382,9 +1412,9 @@ again:
 		goto destroy_blkring;
 	}
 
-	for (i = 0; i < BLK_RING_SIZE(info); i++)
-		info->shadow[i].req.u.rw.id = i+1;
-	info->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
+	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
+		rinfo->shadow[i].req.u.rw.id = i+1;
+	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
 	xenbus_switch_state(dev, XenbusStateInitialised);
 
 	return 0;
@@ -1394,7 +1424,7 @@ again:
 	if (message)
 		xenbus_dev_fatal(dev, err, "%s", message);
  destroy_blkring:
-	blkif_free(info, 0);
+	blkif_free(dinfo, 0);
  out:
 	return err;
 }
@@ -1409,7 +1439,8 @@ static int blkfront_probe(struct xenbus_device *dev,
 			  const struct xenbus_device_id *id)
 {
 	int err, vdevice;
-	struct blkfront_info *info;
+	struct blkfront_dev_info *dinfo;
+	struct blkfront_ring_info *rinfo;
 
 	/* FIXME: Use dynamic device id if this is not set. */
 	err = xenbus_scanf(XBT_NIL, dev->nodename,
@@ -1453,25 +1484,27 @@ static int blkfront_probe(struct xenbus_device *dev,
 		}
 		kfree(type);
 	}
-	info = kzalloc(sizeof(*info), GFP_KERNEL);
-	if (!info) {
+	dinfo = kzalloc(sizeof(*dinfo), GFP_KERNEL);
+	if (!dinfo) {
 		xenbus_dev_fatal(dev, -ENOMEM, "allocating info structure");
 		return -ENOMEM;
 	}
 
-	mutex_init(&info->mutex);
-	spin_lock_init(&info->io_lock);
-	info->xbdev = dev;
-	info->vdevice = vdevice;
-	INIT_LIST_HEAD(&info->grants);
-	INIT_LIST_HEAD(&info->indirect_pages);
-	info->persistent_gnts_c = 0;
-	info->connected = BLKIF_STATE_DISCONNECTED;
-	INIT_WORK(&info->work, blkif_restart_queue);
+	rinfo = &dinfo->rinfo;
+	mutex_init(&dinfo->mutex);
+	spin_lock_init(&dinfo->io_lock);
+	dinfo->xbdev = dev;
+	dinfo->vdevice = vdevice;
+	INIT_LIST_HEAD(&rinfo->grants);
+	INIT_LIST_HEAD(&rinfo->indirect_pages);
+	rinfo->persistent_gnts_c = 0;
+	dinfo->connected = BLKIF_STATE_DISCONNECTED;
+	rinfo->dinfo = dinfo;
+	INIT_WORK(&rinfo->work, blkif_restart_queue);
 
 	/* Front end dir is a number, which is used as the id. */
-	info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
-	dev_set_drvdata(&dev->dev, info);
+	dinfo->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
+	dev_set_drvdata(&dev->dev, dinfo);
 
 	return 0;
 }
@@ -1491,7 +1524,7 @@ static void split_bio_end(struct bio *bio, int error)
 	bio_put(bio);
 }
 
-static int blkif_recover(struct blkfront_info *info)
+static int blkif_recover(struct blkfront_dev_info *dinfo)
 {
 	int i;
 	struct request *req, *n;
@@ -1503,31 +1536,32 @@ static int blkif_recover(struct blkfront_info *info)
 	int pending, size;
 	struct split_bio *split_bio;
 	struct list_head requests;
+	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
 
 	/* Stage 1: Make a safe copy of the shadow state. */
-	copy = kmemdup(info->shadow, sizeof(info->shadow),
+	copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
 		       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
 	if (!copy)
 		return -ENOMEM;
 
 	/* Stage 2: Set up free list. */
-	memset(&info->shadow, 0, sizeof(info->shadow));
-	for (i = 0; i < BLK_RING_SIZE(info); i++)
-		info->shadow[i].req.u.rw.id = i+1;
-	info->shadow_free = info->ring.req_prod_pvt;
-	info->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
+	memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
+	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
+		rinfo->shadow[i].req.u.rw.id = i+1;
+	rinfo->shadow_free = rinfo->ring.req_prod_pvt;
+	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
 
-	rc = blkfront_gather_backend_features(info);
+	rc = blkfront_gather_backend_features(dinfo);
 	if (rc) {
 		kfree(copy);
 		return rc;
 	}
 
-	segs = info->max_indirect_segments ? : BLKIF_MAX_SEGMENTS_PER_REQUEST;
-	blk_queue_max_segments(info->rq, segs);
+	segs = dinfo->max_indirect_segments ? : BLKIF_MAX_SEGMENTS_PER_REQUEST;
+	blk_queue_max_segments(dinfo->rq, segs);
 	bio_list_init(&bio_list);
 	INIT_LIST_HEAD(&requests);
-	for (i = 0; i < BLK_RING_SIZE(info); i++) {
+	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
 		/* Not in use? */
 		if (!copy[i].request)
 			continue;
@@ -1553,15 +1587,15 @@ static int blkif_recover(struct blkfront_info *info)
 
 	kfree(copy);
 
-	xenbus_switch_state(info->xbdev, XenbusStateConnected);
+	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
 
-	spin_lock_irq(&info->io_lock);
+	spin_lock_irq(&dinfo->io_lock);
 
 	/* Now safe for us to use the shared ring */
-	info->connected = BLKIF_STATE_CONNECTED;
+	dinfo->connected = BLKIF_STATE_CONNECTED;
 
 	/* Kick any other new requests queued since we resumed */
-	kick_pending_request_queues(info);
+	kick_pending_request_queues(rinfo);
 
 	list_for_each_entry_safe(req, n, &requests, queuelist) {
 		/* Requeue pending requests (flush or discard) */
@@ -1569,8 +1603,8 @@ static int blkif_recover(struct blkfront_info *info)
 		BUG_ON(req->nr_phys_segments > segs);
 		blk_mq_requeue_request(req);
 	}
-	spin_unlock_irq(&info->io_lock);
-	blk_mq_kick_requeue_list(info->rq);
+	spin_unlock_irq(&dinfo->io_lock);
+	blk_mq_kick_requeue_list(dinfo->rq);
 
 	while ((bio = bio_list_pop(&bio_list)) != NULL) {
 		/* Traverse the list of pending bios and re-queue them */
@@ -1616,14 +1650,14 @@ static int blkif_recover(struct blkfront_info *info)
  */
 static int blkfront_resume(struct xenbus_device *dev)
 {
-	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
+	struct blkfront_dev_info *dinfo = dev_get_drvdata(&dev->dev);
 	int err;
 
 	dev_dbg(&dev->dev, "blkfront_resume: %s\n", dev->nodename);
 
-	blkif_free(info, info->connected == BLKIF_STATE_CONNECTED);
+	blkif_free(dinfo, dinfo->connected == BLKIF_STATE_CONNECTED);
 
-	err = talk_to_blkback(dev, info);
+	err = talk_to_blkback(dev, dinfo);
 
 	/*
 	 * We have to wait for the backend to switch to
@@ -1635,22 +1669,22 @@ static int blkfront_resume(struct xenbus_device *dev)
 }
 
 static void
-blkfront_closing(struct blkfront_info *info)
+blkfront_closing(struct blkfront_dev_info *dinfo)
 {
-	struct xenbus_device *xbdev = info->xbdev;
+	struct xenbus_device *xbdev = dinfo->xbdev;
 	struct block_device *bdev = NULL;
 
-	mutex_lock(&info->mutex);
+	mutex_lock(&dinfo->mutex);
 
 	if (xbdev->state == XenbusStateClosing) {
-		mutex_unlock(&info->mutex);
+		mutex_unlock(&dinfo->mutex);
 		return;
 	}
 
-	if (info->gd)
-		bdev = bdget_disk(info->gd, 0);
+	if (dinfo->gd)
+		bdev = bdget_disk(dinfo->gd, 0);
 
-	mutex_unlock(&info->mutex);
+	mutex_unlock(&dinfo->mutex);
 
 	if (!bdev) {
 		xenbus_frontend_closed(xbdev);
@@ -1664,7 +1698,7 @@ blkfront_closing(struct blkfront_info *info)
 				 "Device in use; refusing to close");
 		xenbus_switch_state(xbdev, XenbusStateClosing);
 	} else {
-		xlvbd_release_gendisk(info);
+		xlvbd_release_gendisk(dinfo);
 		xenbus_frontend_closed(xbdev);
 	}
 
@@ -1672,93 +1706,94 @@ blkfront_closing(struct blkfront_info *info)
 	bdput(bdev);
 }
 
-static void blkfront_setup_discard(struct blkfront_info *info)
+static void blkfront_setup_discard(struct blkfront_dev_info *dinfo)
 {
 	int err;
 	unsigned int discard_granularity;
 	unsigned int discard_alignment;
 	unsigned int discard_secure;
 
-	info->feature_discard = 1;
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	dinfo->feature_discard = 1;
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 		"discard-granularity", "%u", &discard_granularity,
 		"discard-alignment", "%u", &discard_alignment,
 		NULL);
 	if (!err) {
-		info->discard_granularity = discard_granularity;
-		info->discard_alignment = discard_alignment;
+		dinfo->discard_granularity = discard_granularity;
+		dinfo->discard_alignment = discard_alignment;
 	}
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 		    "discard-secure", "%d", &discard_secure,
 		    NULL);
 	if (!err)
-		info->feature_secdiscard = !!discard_secure;
+		dinfo->feature_secdiscard = !!discard_secure;
 }
 
-static int blkfront_setup_indirect(struct blkfront_info *info)
+static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
 {
 	unsigned int segs;
 	int err, i;
+	struct blkfront_dev_info *dinfo = rinfo->dinfo;
 
-	if (info->max_indirect_segments == 0)
+	if (dinfo->max_indirect_segments == 0)
 		segs = BLKIF_MAX_SEGMENTS_PER_REQUEST;
 	else
-		segs = info->max_indirect_segments;
+		segs = dinfo->max_indirect_segments;
 
-	err = fill_grant_buffer(info, (segs + INDIRECT_GREFS(segs)) * BLK_RING_SIZE(info));
+	err = fill_grant_buffer(rinfo, (segs + INDIRECT_GREFS(segs)) * BLK_RING_SIZE(dinfo));
 	if (err)
 		goto out_of_memory;
 
-	if (!info->feature_persistent && info->max_indirect_segments) {
+	if (!dinfo->feature_persistent && dinfo->max_indirect_segments) {
 		/*
 		 * We are using indirect descriptors but not persistent
 		 * grants, we need to allocate a set of pages that can be
 		 * used for mapping indirect grefs
 		 */
-		int num = INDIRECT_GREFS(segs) * BLK_RING_SIZE(info);
+		int num = INDIRECT_GREFS(segs) * BLK_RING_SIZE(dinfo);
 
-		BUG_ON(!list_empty(&info->indirect_pages));
+		BUG_ON(!list_empty(&rinfo->indirect_pages));
 		for (i = 0; i < num; i++) {
 			struct page *indirect_page = alloc_page(GFP_NOIO);
 			if (!indirect_page)
 				goto out_of_memory;
-			list_add(&indirect_page->lru, &info->indirect_pages);
+			list_add(&indirect_page->lru, &rinfo->indirect_pages);
 		}
 	}
 
-	for (i = 0; i < BLK_RING_SIZE(info); i++) {
-		info->shadow[i].grants_used = kzalloc(
-			sizeof(info->shadow[i].grants_used[0]) * segs,
+	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
+		rinfo->shadow[i].grants_used = kzalloc(
+			sizeof(rinfo->shadow[i].grants_used[0]) * segs,
 			GFP_NOIO);
-		info->shadow[i].sg = kzalloc(sizeof(info->shadow[i].sg[0]) * segs, GFP_NOIO);
-		if (info->max_indirect_segments)
-			info->shadow[i].indirect_grants = kzalloc(
-				sizeof(info->shadow[i].indirect_grants[0]) *
+		rinfo->shadow[i].sg = kzalloc(sizeof(rinfo->shadow[i].sg[0]) * segs, GFP_NOIO);
+		if (dinfo->max_indirect_segments)
+			rinfo->shadow[i].indirect_grants = kzalloc(
+				sizeof(rinfo->shadow[i].indirect_grants[0]) *
 				INDIRECT_GREFS(segs),
 				GFP_NOIO);
-		if ((info->shadow[i].grants_used == NULL) ||
-			(info->shadow[i].sg == NULL) ||
-		     (info->max_indirect_segments &&
-		     (info->shadow[i].indirect_grants == NULL)))
+		if ((rinfo->shadow[i].grants_used == NULL) ||
+			(rinfo->shadow[i].sg == NULL) ||
+		     (dinfo->max_indirect_segments &&
+		     (rinfo->shadow[i].indirect_grants == NULL)))
 			goto out_of_memory;
-		sg_init_table(info->shadow[i].sg, segs);
+		sg_init_table(rinfo->shadow[i].sg, segs);
 	}
 
 
 	return 0;
 
 out_of_memory:
-	for (i = 0; i < BLK_RING_SIZE(info); i++) {
-		kfree(info->shadow[i].grants_used);
-		info->shadow[i].grants_used = NULL;
-		kfree(info->shadow[i].sg);
-		info->shadow[i].sg = NULL;
-		kfree(info->shadow[i].indirect_grants);
-		info->shadow[i].indirect_grants = NULL;
+	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
+		kfree(rinfo->shadow[i].grants_used);
+		rinfo->shadow[i].grants_used = NULL;
+		kfree(rinfo->shadow[i].sg);
+		rinfo->shadow[i].sg = NULL;
+		kfree(rinfo->shadow[i].indirect_grants);
+		rinfo->shadow[i].indirect_grants = NULL;
 	}
-	if (!list_empty(&info->indirect_pages)) {
+	if (!list_empty(&rinfo->indirect_pages)) {
 		struct page *indirect_page, *n;
-		list_for_each_entry_safe(indirect_page, n, &info->indirect_pages, lru) {
+		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
 			list_del(&indirect_page->lru);
 			__free_page(indirect_page);
 		}
@@ -1769,15 +1804,15 @@ out_of_memory:
 /*
  * Gather all backend feature-*
  */
-static int blkfront_gather_backend_features(struct blkfront_info *info)
+static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
 {
 	int err;
 	int barrier, flush, discard, persistent;
 	unsigned int indirect_segments;
 
-	info->feature_flush = 0;
+	dinfo->feature_flush = 0;
 
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 			"feature-barrier", "%d", &barrier,
 			NULL);
 
@@ -1789,71 +1824,72 @@ static int blkfront_gather_backend_features(struct blkfront_info *info)
 	 * If there are barriers, then we use flush.
 	 */
 	if (!err && barrier)
-		info->feature_flush = REQ_FLUSH | REQ_FUA;
+		dinfo->feature_flush = REQ_FLUSH | REQ_FUA;
 	/*
 	 * And if there is "feature-flush-cache" use that above
 	 * barriers.
 	 */
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 			"feature-flush-cache", "%d", &flush,
 			NULL);
 
 	if (!err && flush)
-		info->feature_flush = REQ_FLUSH;
+		dinfo->feature_flush = REQ_FLUSH;
 
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 			"feature-discard", "%d", &discard,
 			NULL);
 
 	if (!err && discard)
-		blkfront_setup_discard(info);
+		blkfront_setup_discard(dinfo);
 
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 			"feature-persistent", "%u", &persistent,
 			NULL);
 	if (err)
-		info->feature_persistent = 0;
+		dinfo->feature_persistent = 0;
 	else
-		info->feature_persistent = persistent;
+		dinfo->feature_persistent = persistent;
 
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 			    "feature-max-indirect-segments", "%u", &indirect_segments,
 			    NULL);
 	if (err)
-		info->max_indirect_segments = 0;
+		dinfo->max_indirect_segments = 0;
 	else
-		info->max_indirect_segments = min(indirect_segments,
+		dinfo->max_indirect_segments = min(indirect_segments,
 						  xen_blkif_max_segments);
 
-	return blkfront_setup_indirect(info);
+	return blkfront_setup_indirect(&dinfo->rinfo);
 }
 
 /*
  * Invoked when the backend is finally 'ready' (and has told produced
  * the details about the physical device - #sectors, size, etc).
  */
-static void blkfront_connect(struct blkfront_info *info)
+static void blkfront_connect(struct blkfront_dev_info *dinfo)
 {
 	unsigned long long sectors;
 	unsigned long sector_size;
 	unsigned int physical_sector_size;
 	unsigned int binfo;
 	int err;
+	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
 
-	switch (info->connected) {
+	switch (dinfo->connected) {
 	case BLKIF_STATE_CONNECTED:
 		/*
 		 * Potentially, the back-end may be signalling
 		 * a capacity change; update the capacity.
 		 */
-		err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+		err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
 				   "sectors", "%Lu", &sectors);
 		if (XENBUS_EXIST_ERR(err))
 			return;
 		printk(KERN_INFO "Setting capacity to %Lu\n",
 		       sectors);
-		set_capacity(info->gd, sectors);
-		revalidate_disk(info->gd);
+		set_capacity(dinfo->gd, sectors);
+		revalidate_disk(dinfo->gd);
 
 		return;
 	case BLKIF_STATE_SUSPENDED:
@@ -1863,25 +1899,25 @@ static void blkfront_connect(struct blkfront_info *info)
 		 * reconnecting, at least we need to know if the backend
 		 * supports indirect descriptors, and how many.
 		 */
-		blkif_recover(info);
+		blkif_recover(dinfo);
 		return;
 
 	default:
 		break;
 	}
 
-	dev_dbg(&info->xbdev->dev, "%s:%s.\n",
-		__func__, info->xbdev->otherend);
+	dev_dbg(&dinfo->xbdev->dev, "%s:%s.\n",
+		__func__, dinfo->xbdev->otherend);
 
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 			    "sectors", "%llu", &sectors,
 			    "info", "%u", &binfo,
 			    "sector-size", "%lu", &sector_size,
 			    NULL);
 	if (err) {
-		xenbus_dev_fatal(info->xbdev, err,
+		xenbus_dev_fatal(dinfo->xbdev, err,
 				 "reading backend fields at %s",
-				 info->xbdev->otherend);
+				 dinfo->xbdev->otherend);
 		return;
 	}
 
@@ -1890,37 +1926,37 @@ static void blkfront_connect(struct blkfront_info *info)
 	 * provide this. Assume physical sector size to be the same as
 	 * sector_size in that case.
 	 */
-	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
 			   "physical-sector-size", "%u", &physical_sector_size);
 	if (err != 1)
 		physical_sector_size = sector_size;
 
-	err = blkfront_gather_backend_features(info);
+	err = blkfront_gather_backend_features(dinfo);
 	if (err) {
-		xenbus_dev_fatal(info->xbdev, err, "setup_indirect at %s",
-				 info->xbdev->otherend);
+		xenbus_dev_fatal(dinfo->xbdev, err, "setup_indirect at %s",
+				 dinfo->xbdev->otherend);
 		return;
 	}
 
-	err = xlvbd_alloc_gendisk(sectors, info, binfo, sector_size,
+	err = xlvbd_alloc_gendisk(sectors, dinfo, binfo, sector_size,
 				  physical_sector_size);
 	if (err) {
-		xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s",
-				 info->xbdev->otherend);
+		xenbus_dev_fatal(dinfo->xbdev, err, "xlvbd_add at %s",
+				 dinfo->xbdev->otherend);
 		return;
 	}
 
-	xenbus_switch_state(info->xbdev, XenbusStateConnected);
+	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
 
 	/* Kick pending requests. */
-	spin_lock_irq(&info->io_lock);
-	info->connected = BLKIF_STATE_CONNECTED;
-	kick_pending_request_queues(info);
-	spin_unlock_irq(&info->io_lock);
+	spin_lock_irq(&dinfo->io_lock);
+	dinfo->connected = BLKIF_STATE_CONNECTED;
+	kick_pending_request_queues(rinfo);
+	spin_unlock_irq(&dinfo->io_lock);
 
-	add_disk(info->gd);
+	add_disk(dinfo->gd);
 
-	info->is_ready = 1;
+	dinfo->is_ready = 1;
 }
 
 /**
@@ -1929,7 +1965,7 @@ static void blkfront_connect(struct blkfront_info *info)
 static void blkback_changed(struct xenbus_device *dev,
 			    enum xenbus_state backend_state)
 {
-	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
+	struct blkfront_dev_info *dinfo = dev_get_drvdata(&dev->dev);
 
 	dev_dbg(&dev->dev, "blkfront:blkback_changed to state %d.\n", backend_state);
 
@@ -1937,8 +1973,8 @@ static void blkback_changed(struct xenbus_device *dev,
 	case XenbusStateInitWait:
 		if (dev->state != XenbusStateInitialising)
 			break;
-		if (talk_to_blkback(dev, info)) {
-			kfree(info);
+		if (talk_to_blkback(dev, dinfo)) {
+			kfree(dinfo);
 			dev_set_drvdata(&dev->dev, NULL);
 			break;
 		}
@@ -1950,7 +1986,7 @@ static void blkback_changed(struct xenbus_device *dev,
 		break;
 
 	case XenbusStateConnected:
-		blkfront_connect(info);
+		blkfront_connect(dinfo);
 		break;
 
 	case XenbusStateClosed:
@@ -1958,32 +1994,32 @@ static void blkback_changed(struct xenbus_device *dev,
 			break;
 		/* Missed the backend's Closing state -- fallthrough */
 	case XenbusStateClosing:
-		blkfront_closing(info);
+		blkfront_closing(dinfo);
 		break;
 	}
 }
 
 static int blkfront_remove(struct xenbus_device *xbdev)
 {
-	struct blkfront_info *info = dev_get_drvdata(&xbdev->dev);
+	struct blkfront_dev_info *dinfo = dev_get_drvdata(&xbdev->dev);
 	struct block_device *bdev = NULL;
 	struct gendisk *disk;
 
 	dev_dbg(&xbdev->dev, "%s removed", xbdev->nodename);
 
-	blkif_free(info, 0);
+	blkif_free(dinfo, 0);
 
-	mutex_lock(&info->mutex);
+	mutex_lock(&dinfo->mutex);
 
-	disk = info->gd;
+	disk = dinfo->gd;
 	if (disk)
 		bdev = bdget_disk(disk, 0);
 
-	info->xbdev = NULL;
-	mutex_unlock(&info->mutex);
+	dinfo->xbdev = NULL;
+	mutex_unlock(&dinfo->mutex);
 
 	if (!bdev) {
-		kfree(info);
+		kfree(dinfo);
 		return 0;
 	}
 
@@ -1994,16 +2030,16 @@ static int blkfront_remove(struct xenbus_device *xbdev)
 	 */
 
 	mutex_lock(&bdev->bd_mutex);
-	info = disk->private_data;
+	dinfo = disk->private_data;
 
 	dev_warn(disk_to_dev(disk),
 		 "%s was hot-unplugged, %d stale handles\n",
 		 xbdev->nodename, bdev->bd_openers);
 
-	if (info && !bdev->bd_openers) {
-		xlvbd_release_gendisk(info);
+	if (dinfo && !bdev->bd_openers) {
+		xlvbd_release_gendisk(dinfo);
 		disk->private_data = NULL;
-		kfree(info);
+		kfree(dinfo);
 	}
 
 	mutex_unlock(&bdev->bd_mutex);
@@ -2014,33 +2050,33 @@ static int blkfront_remove(struct xenbus_device *xbdev)
 
 static int blkfront_is_ready(struct xenbus_device *dev)
 {
-	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
+	struct blkfront_dev_info *dinfo = dev_get_drvdata(&dev->dev);
 
-	return info->is_ready && info->xbdev;
+	return dinfo->is_ready && dinfo->xbdev;
 }
 
 static int blkif_open(struct block_device *bdev, fmode_t mode)
 {
 	struct gendisk *disk = bdev->bd_disk;
-	struct blkfront_info *info;
+	struct blkfront_dev_info *dinfo;
 	int err = 0;
 
 	mutex_lock(&blkfront_mutex);
 
-	info = disk->private_data;
-	if (!info) {
+	dinfo = disk->private_data;
+	if (!dinfo) {
 		/* xbdev gone */
 		err = -ERESTARTSYS;
 		goto out;
 	}
 
-	mutex_lock(&info->mutex);
+	mutex_lock(&dinfo->mutex);
 
-	if (!info->gd)
+	if (!dinfo->gd)
 		/* xbdev is closed */
 		err = -ERESTARTSYS;
 
-	mutex_unlock(&info->mutex);
+	mutex_unlock(&dinfo->mutex);
 
 out:
 	mutex_unlock(&blkfront_mutex);
@@ -2049,7 +2085,7 @@ out:
 
 static void blkif_release(struct gendisk *disk, fmode_t mode)
 {
-	struct blkfront_info *info = disk->private_data;
+	struct blkfront_dev_info *dinfo = disk->private_data;
 	struct block_device *bdev;
 	struct xenbus_device *xbdev;
 
@@ -2069,24 +2105,24 @@ static void blkif_release(struct gendisk *disk, fmode_t mode)
 	 * deferred this request, because the bdev was still open.
 	 */
 
-	mutex_lock(&info->mutex);
-	xbdev = info->xbdev;
+	mutex_lock(&dinfo->mutex);
+	xbdev = dinfo->xbdev;
 
 	if (xbdev && xbdev->state == XenbusStateClosing) {
 		/* pending switch to state closed */
 		dev_info(disk_to_dev(bdev->bd_disk), "releasing disk\n");
-		xlvbd_release_gendisk(info);
-		xenbus_frontend_closed(info->xbdev);
+		xlvbd_release_gendisk(dinfo);
+		xenbus_frontend_closed(dinfo->xbdev);
  	}
 
-	mutex_unlock(&info->mutex);
+	mutex_unlock(&dinfo->mutex);
 
 	if (!xbdev) {
 		/* sudden device removal */
 		dev_info(disk_to_dev(bdev->bd_disk), "releasing disk\n");
-		xlvbd_release_gendisk(info);
+		xlvbd_release_gendisk(dinfo);
 		disk->private_data = NULL;
-		kfree(info);
+		kfree(dinfo);
 	}
 
 out:
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 3/9] xen/blkfront: separate per ring information out of device info
  2015-09-05 12:39 [PATCH v3 0/9] xen-block: support multi hardware-queues/rings Bob Liu
                   ` (4 preceding siblings ...)
  2015-09-05 12:39 ` [PATCH v3 3/9] xen/blkfront: separate per ring information out of device info Bob Liu
@ 2015-09-05 12:39 ` Bob Liu
  2015-09-05 12:39 ` [PATCH v3 4/9] xen/blkfront: pseudo support for multi hardware queues/rings Bob Liu
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-09-05 12:39 UTC (permalink / raw)
  To: xen-devel
  Cc: hch, felipe.franciosi, rafal.mielniczuk, linux-kernel,
	jonathan.davies, axboe, Bob Liu, david.vrabel, avanzini.arianna,
	boris.ostrovsky, roger.pau

Split the per-ring information out into a new structure, blkfront_ring_info, and
rename the per-device blkfront_info to blkfront_dev_info.

A ring is the representation of a hardware queue; every vbd device can be
associated with one or more blkfront_ring_info structures, depending on how many
hardware queues/rings are to be used.

This patch is a preparation for supporting real multiple hardware queues/rings.
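
To make the large mechanical rename easier to review, here is a rough,
illustrative sketch of the resulting layout. The field selection below is
trimmed for brevity and only indicative; the authoritative definitions are the
ones added in the first hunk of this patch:

    struct blkfront_ring_info {              /* one per hardware queue/ring */
            struct blkif_front_ring ring;    /* shared ring with the backend */
            struct blk_shadow shadow[BLK_MAX_RING_SIZE];
            struct list_head grants;         /* grants now tracked per ring */
            unsigned int evtchn, irq;
            /* ... remaining per-ring state ... */
            struct blkfront_dev_info *dinfo; /* back-pointer to the device */
    };

    struct blkfront_dev_info {               /* one per vbd */
            struct request_queue *rq;        /* blk-mq queue, still per device */
            struct gendisk *gd;
            spinlock_t io_lock;              /* still a single per-device lock */
            /* ... remaining per-device state ... */
            struct blkfront_ring_info rinfo; /* a single ring here; later
                                                patches in the series extend
                                                this to multiple rings */
    };

Functions that previously took a struct blkfront_info now take either the
rinfo or the dinfo, depending on whether they operate on ring state or on
device-wide state.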

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkfront.c |  854 ++++++++++++++++++++++--------------------
 1 file changed, 445 insertions(+), 409 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 5dd591d..bf416d5 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
 module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
 MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
 
-#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * (info)->nr_ring_pages)
+#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->nr_ring_pages)
 #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
 /*
  * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
@@ -116,12 +116,31 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
 #define RINGREF_NAME_LEN (20)
 
 /*
+ *  Per-ring info.
+ *  Every blkfront device can be associated with one or more blkfront_ring_info
+ *  structures, depending on how many hardware queues are to be used.
+ */
+struct blkfront_ring_info
+{
+	struct blkif_front_ring ring;
+	unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
+	unsigned int evtchn, irq;
+	struct work_struct work;
+	struct gnttab_free_callback callback;
+	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
+	struct list_head grants;
+	struct list_head indirect_pages;
+	unsigned int persistent_gnts_c;
+	unsigned long shadow_free;
+	struct blkfront_dev_info *dinfo;
+};
+
+/*
  * We have one of these per vbd, whether ide, scsi or 'other'.  They
  * hang in private_data off the gendisk structure. We may end up
  * putting all kinds of interesting stuff here :-)
  */
-struct blkfront_info
-{
+struct blkfront_dev_info {
 	spinlock_t io_lock;
 	struct mutex mutex;
 	struct xenbus_device *xbdev;
@@ -129,18 +148,7 @@ struct blkfront_info
 	int vdevice;
 	blkif_vdev_t handle;
 	enum blkif_state connected;
-	int ring_ref[XENBUS_MAX_RING_PAGES];
-	unsigned int nr_ring_pages;
-	struct blkif_front_ring ring;
-	unsigned int evtchn, irq;
 	struct request_queue *rq;
-	struct work_struct work;
-	struct gnttab_free_callback callback;
-	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
-	struct list_head grants;
-	struct list_head indirect_pages;
-	unsigned int persistent_gnts_c;
-	unsigned long shadow_free;
 	unsigned int feature_flush;
 	unsigned int feature_discard:1;
 	unsigned int feature_secdiscard:1;
@@ -149,7 +157,9 @@ struct blkfront_info
 	unsigned int feature_persistent:1;
 	unsigned int max_indirect_segments;
 	int is_ready;
+	unsigned int nr_ring_pages;
 	struct blk_mq_tag_set tag_set;
+	struct blkfront_ring_info rinfo;
 };
 
 static unsigned int nr_minors;
@@ -180,32 +190,33 @@ static DEFINE_SPINLOCK(minor_lock);
 #define INDIRECT_GREFS(_segs) \
 	((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
 
-static int blkfront_setup_indirect(struct blkfront_info *info);
-static int blkfront_gather_backend_features(struct blkfront_info *info);
+static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
+static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
 
-static int get_id_from_freelist(struct blkfront_info *info)
+static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
-	unsigned long free = info->shadow_free;
-	BUG_ON(free >= BLK_RING_SIZE(info));
-	info->shadow_free = info->shadow[free].req.u.rw.id;
-	info->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
+	unsigned long free = rinfo->shadow_free;
+
+	BUG_ON(free >= BLK_RING_SIZE(rinfo->dinfo));
+	rinfo->shadow_free = rinfo->shadow[free].req.u.rw.id;
+	rinfo->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
 	return free;
 }
 
-static int add_id_to_freelist(struct blkfront_info *info,
+static int add_id_to_freelist(struct blkfront_ring_info *rinfo,
 			       unsigned long id)
 {
-	if (info->shadow[id].req.u.rw.id != id)
+	if (rinfo->shadow[id].req.u.rw.id != id)
 		return -EINVAL;
-	if (info->shadow[id].request == NULL)
+	if (rinfo->shadow[id].request == NULL)
 		return -EINVAL;
-	info->shadow[id].req.u.rw.id  = info->shadow_free;
-	info->shadow[id].request = NULL;
-	info->shadow_free = id;
+	rinfo->shadow[id].req.u.rw.id  = rinfo->shadow_free;
+	rinfo->shadow[id].request = NULL;
+	rinfo->shadow_free = id;
 	return 0;
 }
 
-static int fill_grant_buffer(struct blkfront_info *info, int num)
+static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 {
 	struct page *granted_page;
 	struct grant *gnt_list_entry, *n;
@@ -216,7 +227,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 		if (!gnt_list_entry)
 			goto out_of_memory;
 
-		if (info->feature_persistent) {
+		if (rinfo->dinfo->feature_persistent) {
 			granted_page = alloc_page(GFP_NOIO);
 			if (!granted_page) {
 				kfree(gnt_list_entry);
@@ -226,7 +237,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 		}
 
 		gnt_list_entry->gref = GRANT_INVALID_REF;
-		list_add(&gnt_list_entry->node, &info->grants);
+		list_add(&gnt_list_entry->node, &rinfo->grants);
 		i++;
 	}
 
@@ -234,9 +245,9 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
 
 out_of_memory:
 	list_for_each_entry_safe(gnt_list_entry, n,
-	                         &info->grants, node) {
+				 &rinfo->grants, node) {
 		list_del(&gnt_list_entry->node);
-		if (info->feature_persistent)
+		if (rinfo->dinfo->feature_persistent)
 			__free_page(pfn_to_page(gnt_list_entry->pfn));
 		kfree(gnt_list_entry);
 		i--;
@@ -246,33 +257,33 @@ out_of_memory:
 }
 
 static struct grant *get_grant(grant_ref_t *gref_head,
-                               unsigned long pfn,
-                               struct blkfront_info *info)
+			       unsigned long pfn,
+			       struct blkfront_ring_info *rinfo)
 {
 	struct grant *gnt_list_entry;
 	unsigned long buffer_mfn;
 
-	BUG_ON(list_empty(&info->grants));
-	gnt_list_entry = list_first_entry(&info->grants, struct grant,
+	BUG_ON(list_empty(&rinfo->grants));
+	gnt_list_entry = list_first_entry(&rinfo->grants, struct grant,
 	                                  node);
 	list_del(&gnt_list_entry->node);
 
 	if (gnt_list_entry->gref != GRANT_INVALID_REF) {
-		info->persistent_gnts_c--;
+		rinfo->persistent_gnts_c--;
 		return gnt_list_entry;
 	}
 
 	/* Assign a gref to this page */
 	gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
 	BUG_ON(gnt_list_entry->gref == -ENOSPC);
-	if (!info->feature_persistent) {
+	if (!rinfo->dinfo->feature_persistent) {
 		BUG_ON(!pfn);
 		gnt_list_entry->pfn = pfn;
 	}
 	buffer_mfn = pfn_to_mfn(gnt_list_entry->pfn);
 	gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
-	                                info->xbdev->otherend_id,
-	                                buffer_mfn, 0);
+					rinfo->dinfo->xbdev->otherend_id,
+					buffer_mfn, 0);
 	return gnt_list_entry;
 }
 
@@ -342,8 +353,9 @@ static void xlbd_release_minors(unsigned int minor, unsigned int nr)
 
 static void blkif_restart_queue_callback(void *arg)
 {
-	struct blkfront_info *info = (struct blkfront_info *)arg;
-	schedule_work(&info->work);
+	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)arg;
+
+	schedule_work(&rinfo->work);
 }
 
 static int blkif_getgeo(struct block_device *bd, struct hd_geometry *hg)
@@ -365,22 +377,22 @@ static int blkif_getgeo(struct block_device *bd, struct hd_geometry *hg)
 static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
 		       unsigned command, unsigned long argument)
 {
-	struct blkfront_info *info = bdev->bd_disk->private_data;
+	struct blkfront_dev_info *dinfo = bdev->bd_disk->private_data;
 	int i;
 
-	dev_dbg(&info->xbdev->dev, "command: 0x%x, argument: 0x%lx\n",
+	dev_dbg(&dinfo->xbdev->dev, "command: 0x%x, argument: 0x%lx\n",
 		command, (long)argument);
 
 	switch (command) {
 	case CDROMMULTISESSION:
-		dev_dbg(&info->xbdev->dev, "FIXME: support multisession CDs later\n");
+		dev_dbg(&dinfo->xbdev->dev, "FIXME: support multisession CDs later\n");
 		for (i = 0; i < sizeof(struct cdrom_multisession); i++)
 			if (put_user(0, (char __user *)(argument + i)))
 				return -EFAULT;
 		return 0;
 
 	case CDROM_GET_CAPABILITY: {
-		struct gendisk *gd = info->gd;
+		struct gendisk *gd = dinfo->gd;
 		if (gd->flags & GENHD_FL_CD)
 			return 0;
 		return -EINVAL;
@@ -401,9 +413,10 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
  *
  * @req: a request struct
  */
-static int blkif_queue_request(struct request *req)
+static int blkif_queue_request(struct request *req,
+			       struct blkfront_ring_info *rinfo)
 {
-	struct blkfront_info *info = req->rq_disk->private_data;
+	struct blkfront_dev_info *dinfo = req->rq_disk->private_data;
 	struct blkif_request *ring_req;
 	unsigned long id;
 	unsigned int fsect, lsect;
@@ -421,7 +434,7 @@ static int blkif_queue_request(struct request *req)
 	struct scatterlist *sg;
 	int nseg, max_grefs;
 
-	if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
+	if (unlikely(dinfo->connected != BLKIF_STATE_CONNECTED))
 		return 1;
 
 	max_grefs = req->nr_phys_segments;
@@ -433,15 +446,15 @@ static int blkif_queue_request(struct request *req)
 		max_grefs += INDIRECT_GREFS(req->nr_phys_segments);
 
 	/* Check if we have enough grants to allocate a requests */
-	if (info->persistent_gnts_c < max_grefs) {
+	if (rinfo->persistent_gnts_c < max_grefs) {
 		new_persistent_gnts = 1;
 		if (gnttab_alloc_grant_references(
-		    max_grefs - info->persistent_gnts_c,
+		    max_grefs - rinfo->persistent_gnts_c,
 		    &gref_head) < 0) {
 			gnttab_request_free_callback(
-				&info->callback,
+				&rinfo->callback,
 				blkif_restart_queue_callback,
-				info,
+				rinfo,
 				max_grefs);
 			return 1;
 		}
@@ -449,25 +462,25 @@ static int blkif_queue_request(struct request *req)
 		new_persistent_gnts = 0;
 
 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
-	id = get_id_from_freelist(info);
-	info->shadow[id].request = req;
+	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
+	id = get_id_from_freelist(rinfo);
+	rinfo->shadow[id].request = req;
 
 	if (unlikely(req->cmd_flags & (REQ_DISCARD | REQ_SECURE))) {
 		ring_req->operation = BLKIF_OP_DISCARD;
 		ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
 		ring_req->u.discard.id = id;
 		ring_req->u.discard.sector_number = (blkif_sector_t)blk_rq_pos(req);
-		if ((req->cmd_flags & REQ_SECURE) && info->feature_secdiscard)
+		if ((req->cmd_flags & REQ_SECURE) && dinfo->feature_secdiscard)
 			ring_req->u.discard.flag = BLKIF_DISCARD_SECURE;
 		else
 			ring_req->u.discard.flag = 0;
 	} else {
-		BUG_ON(info->max_indirect_segments == 0 &&
+		BUG_ON(dinfo->max_indirect_segments == 0 &&
 		       req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
-		BUG_ON(info->max_indirect_segments &&
-		       req->nr_phys_segments > info->max_indirect_segments);
-		nseg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+		BUG_ON(dinfo->max_indirect_segments &&
+		       req->nr_phys_segments > dinfo->max_indirect_segments);
+		nseg = blk_rq_map_sg(req->q, req, rinfo->shadow[id].sg);
 		ring_req->u.rw.id = id;
 		if (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
 			/*
@@ -479,11 +492,11 @@ static int blkif_queue_request(struct request *req)
 			ring_req->u.indirect.indirect_op = rq_data_dir(req) ?
 				BLKIF_OP_WRITE : BLKIF_OP_READ;
 			ring_req->u.indirect.sector_number = (blkif_sector_t)blk_rq_pos(req);
-			ring_req->u.indirect.handle = info->handle;
+			ring_req->u.indirect.handle = dinfo->handle;
 			ring_req->u.indirect.nr_segments = nseg;
 		} else {
 			ring_req->u.rw.sector_number = (blkif_sector_t)blk_rq_pos(req);
-			ring_req->u.rw.handle = info->handle;
+			ring_req->u.rw.handle = dinfo->handle;
 			ring_req->operation = rq_data_dir(req) ?
 				BLKIF_OP_WRITE : BLKIF_OP_READ;
 			if (req->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
@@ -494,7 +507,7 @@ static int blkif_queue_request(struct request *req)
 				 * way.  (It's also a FLUSH+FUA, since it is
 				 * guaranteed ordered WRT previous writes.)
 				 */
-				switch (info->feature_flush &
+				switch (dinfo->feature_flush &
 					((REQ_FLUSH|REQ_FUA))) {
 				case REQ_FLUSH|REQ_FUA:
 					ring_req->operation =
@@ -510,7 +523,7 @@ static int blkif_queue_request(struct request *req)
 			}
 			ring_req->u.rw.nr_segments = nseg;
 		}
-		for_each_sg(info->shadow[id].sg, sg, nseg, i) {
+		for_each_sg(rinfo->shadow[id].sg, sg, nseg, i) {
 			fsect = sg->offset >> 9;
 			lsect = fsect + (sg->length >> 9) - 1;
 
@@ -522,28 +535,28 @@ static int blkif_queue_request(struct request *req)
 					kunmap_atomic(segments);
 
 				n = i / SEGS_PER_INDIRECT_FRAME;
-				if (!info->feature_persistent) {
+				if (!dinfo->feature_persistent) {
 					struct page *indirect_page;
 
 					/* Fetch a pre-allocated page to use for indirect grefs */
-					BUG_ON(list_empty(&info->indirect_pages));
-					indirect_page = list_first_entry(&info->indirect_pages,
+					BUG_ON(list_empty(&rinfo->indirect_pages));
+					indirect_page = list_first_entry(&rinfo->indirect_pages,
 					                                 struct page, lru);
 					list_del(&indirect_page->lru);
 					pfn = page_to_pfn(indirect_page);
 				}
-				gnt_list_entry = get_grant(&gref_head, pfn, info);
-				info->shadow[id].indirect_grants[n] = gnt_list_entry;
+				gnt_list_entry = get_grant(&gref_head, pfn, rinfo);
+				rinfo->shadow[id].indirect_grants[n] = gnt_list_entry;
 				segments = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
 				ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
 			}
 
-			gnt_list_entry = get_grant(&gref_head, page_to_pfn(sg_page(sg)), info);
+			gnt_list_entry = get_grant(&gref_head, page_to_pfn(sg_page(sg)), rinfo);
 			ref = gnt_list_entry->gref;
 
-			info->shadow[id].grants_used[i] = gnt_list_entry;
+			rinfo->shadow[id].grants_used[i] = gnt_list_entry;
 
-			if (rq_data_dir(req) && info->feature_persistent) {
+			if (rq_data_dir(req) && dinfo->feature_persistent) {
 				char *bvec_data;
 				void *shared_data;
 
@@ -587,10 +600,10 @@ static int blkif_queue_request(struct request *req)
 			kunmap_atomic(segments);
 	}
 
-	info->ring.req_prod_pvt++;
+	rinfo->ring.req_prod_pvt++;
 
 	/* Keep a private copy so we can reissue requests when recovering. */
-	info->shadow[id].req = *ring_req;
+	rinfo->shadow[id].req = *ring_req;
 
 	if (new_persistent_gnts)
 		gnttab_free_grant_references(gref_head);
@@ -599,59 +612,70 @@ static int blkif_queue_request(struct request *req)
 }
 
 
-static inline void flush_requests(struct blkfront_info *info)
+static inline void flush_requests(struct blkfront_ring_info *rinfo)
 {
 	int notify;
 
-	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&info->ring, notify);
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&rinfo->ring, notify);
 
 	if (notify)
-		notify_remote_via_irq(info->irq);
+		notify_remote_via_irq(rinfo->irq);
 }
 
 static inline bool blkif_request_flush_invalid(struct request *req,
-					       struct blkfront_info *info)
+					       struct blkfront_dev_info *dinfo)
 {
 	return ((req->cmd_type != REQ_TYPE_FS) ||
 		((req->cmd_flags & REQ_FLUSH) &&
-		 !(info->feature_flush & REQ_FLUSH)) ||
+		 !(dinfo->feature_flush & REQ_FLUSH)) ||
 		((req->cmd_flags & REQ_FUA) &&
-		 !(info->feature_flush & REQ_FUA)));
+		 !(dinfo->feature_flush & REQ_FUA)));
 }
 
 static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
 			   const struct blk_mq_queue_data *qd)
 {
-	struct blkfront_info *info = qd->rq->rq_disk->private_data;
+	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)hctx->driver_data;
+	struct blkfront_dev_info *dinfo = rinfo->dinfo;
 
 	blk_mq_start_request(qd->rq);
-	spin_lock_irq(&info->io_lock);
-	if (RING_FULL(&info->ring))
+	spin_lock_irq(&dinfo->io_lock);
+	if (RING_FULL(&rinfo->ring))
 		goto out_busy;
 
-	if (blkif_request_flush_invalid(qd->rq, info))
+	if (blkif_request_flush_invalid(qd->rq, dinfo))
 		goto out_err;
 
-	if (blkif_queue_request(qd->rq))
+	if (blkif_queue_request(qd->rq, rinfo))
 		goto out_busy;
 
-	flush_requests(info);
-	spin_unlock_irq(&info->io_lock);
+	flush_requests(rinfo);
+	spin_unlock_irq(&dinfo->io_lock);
 	return BLK_MQ_RQ_QUEUE_OK;
 
 out_err:
-	spin_unlock_irq(&info->io_lock);
+	spin_unlock_irq(&dinfo->io_lock);
 	return BLK_MQ_RQ_QUEUE_ERROR;
 
 out_busy:
-	spin_unlock_irq(&info->io_lock);
+	spin_unlock_irq(&dinfo->io_lock);
 	blk_mq_stop_hw_queue(hctx);
 	return BLK_MQ_RQ_QUEUE_BUSY;
 }
 
+static int blk_mq_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
+			    unsigned int index)
+{
+	struct blkfront_dev_info *dinfo = (struct blkfront_dev_info *)data;
+
+	hctx->driver_data = &dinfo->rinfo;
+	return 0;
+}
+
 static struct blk_mq_ops blkfront_mq_ops = {
 	.queue_rq = blkif_queue_rq,
 	.map_queue = blk_mq_map_queue,
+	.init_hctx = blk_mq_init_hctx,
 };
 
 static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
@@ -659,33 +683,33 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 				unsigned int segments)
 {
 	struct request_queue *rq;
-	struct blkfront_info *info = gd->private_data;
-
-	memset(&info->tag_set, 0, sizeof(info->tag_set));
-	info->tag_set.ops = &blkfront_mq_ops;
-	info->tag_set.nr_hw_queues = 1;
-	info->tag_set.queue_depth =  BLK_RING_SIZE(info);
-	info->tag_set.numa_node = NUMA_NO_NODE;
-	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
-	info->tag_set.cmd_size = 0;
-	info->tag_set.driver_data = info;
-
-	if (blk_mq_alloc_tag_set(&info->tag_set))
+	struct blkfront_dev_info *dinfo = gd->private_data;
+
+	memset(&dinfo->tag_set, 0, sizeof(dinfo->tag_set));
+	dinfo->tag_set.ops = &blkfront_mq_ops;
+	dinfo->tag_set.nr_hw_queues = 1;
+	dinfo->tag_set.queue_depth =  BLK_RING_SIZE(dinfo);
+	dinfo->tag_set.numa_node = NUMA_NO_NODE;
+	dinfo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+	dinfo->tag_set.cmd_size = 0;
+	dinfo->tag_set.driver_data = dinfo;
+
+	if (blk_mq_alloc_tag_set(&dinfo->tag_set))
 		return -1;
-	rq = blk_mq_init_queue(&info->tag_set);
+	rq = blk_mq_init_queue(&dinfo->tag_set);
 	if (IS_ERR(rq)) {
-		blk_mq_free_tag_set(&info->tag_set);
+		blk_mq_free_tag_set(&dinfo->tag_set);
 		return -1;
 	}
 
 	queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
 
-	if (info->feature_discard) {
+	if (dinfo->feature_discard) {
 		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq);
 		blk_queue_max_discard_sectors(rq, get_capacity(gd));
-		rq->limits.discard_granularity = info->discard_granularity;
-		rq->limits.discard_alignment = info->discard_alignment;
-		if (info->feature_secdiscard)
+		rq->limits.discard_granularity = dinfo->discard_granularity;
+		rq->limits.discard_alignment = dinfo->discard_alignment;
+		if (dinfo->feature_secdiscard)
 			queue_flag_set_unlocked(QUEUE_FLAG_SECDISCARD, rq);
 	}
 
@@ -724,14 +748,14 @@ static const char *flush_info(unsigned int feature_flush)
 	}
 }
 
-static void xlvbd_flush(struct blkfront_info *info)
+static void xlvbd_flush(struct blkfront_dev_info *dinfo)
 {
-	blk_queue_flush(info->rq, info->feature_flush);
+	blk_queue_flush(dinfo->rq, dinfo->feature_flush);
 	pr_info("blkfront: %s: %s %s %s %s %s\n",
-		info->gd->disk_name, flush_info(info->feature_flush),
-		"persistent grants:", info->feature_persistent ?
+		dinfo->gd->disk_name, flush_info(dinfo->feature_flush),
+		"persistent grants:", dinfo->feature_persistent ?
 		"enabled;" : "disabled;", "indirect descriptors:",
-		info->max_indirect_segments ? "enabled;" : "disabled;");
+		dinfo->max_indirect_segments ? "enabled;" : "disabled;");
 }
 
 static int xen_translate_vdev(int vdevice, int *minor, unsigned int *offset)
@@ -803,7 +827,7 @@ static char *encode_disk_name(char *ptr, unsigned int n)
 }
 
 static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
-			       struct blkfront_info *info,
+			       struct blkfront_dev_info *dinfo,
 			       u16 vdisk_info, u16 sector_size,
 			       unsigned int physical_sector_size)
 {
@@ -815,32 +839,32 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 	int nr_parts;
 	char *ptr;
 
-	BUG_ON(info->gd != NULL);
-	BUG_ON(info->rq != NULL);
+	BUG_ON(dinfo->gd != NULL);
+	BUG_ON(dinfo->rq != NULL);
 
-	if ((info->vdevice>>EXT_SHIFT) > 1) {
+	if ((dinfo->vdevice>>EXT_SHIFT) > 1) {
 		/* this is above the extended range; something is wrong */
-		printk(KERN_WARNING "blkfront: vdevice 0x%x is above the extended range; ignoring\n", info->vdevice);
+		printk(KERN_WARNING "blkfront: vdevice 0x%x is above the extended range; ignoring\n", dinfo->vdevice);
 		return -ENODEV;
 	}
 
-	if (!VDEV_IS_EXTENDED(info->vdevice)) {
-		err = xen_translate_vdev(info->vdevice, &minor, &offset);
+	if (!VDEV_IS_EXTENDED(dinfo->vdevice)) {
+		err = xen_translate_vdev(dinfo->vdevice, &minor, &offset);
 		if (err)
 			return err;		
  		nr_parts = PARTS_PER_DISK;
 	} else {
-		minor = BLKIF_MINOR_EXT(info->vdevice);
+		minor = BLKIF_MINOR_EXT(dinfo->vdevice);
 		nr_parts = PARTS_PER_EXT_DISK;
 		offset = minor / nr_parts;
 		if (xen_hvm_domain() && offset < EMULATED_HD_DISK_NAME_OFFSET + 4)
 			printk(KERN_WARNING "blkfront: vdevice 0x%x might conflict with "
 					"emulated IDE disks,\n\t choose an xvd device name"
-					"from xvde on\n", info->vdevice);
+					"from xvde on\n", dinfo->vdevice);
 	}
 	if (minor >> MINORBITS) {
 		pr_warn("blkfront: %#x's minor (%#x) out of range; ignoring\n",
-			info->vdevice, minor);
+			dinfo->vdevice, minor);
 		return -ENODEV;
 	}
 
@@ -868,21 +892,21 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 	gd->major = XENVBD_MAJOR;
 	gd->first_minor = minor;
 	gd->fops = &xlvbd_block_fops;
-	gd->private_data = info;
-	gd->driverfs_dev = &(info->xbdev->dev);
+	gd->private_data = dinfo;
+	gd->driverfs_dev = &(dinfo->xbdev->dev);
 	set_capacity(gd, capacity);
 
 	if (xlvbd_init_blk_queue(gd, sector_size, physical_sector_size,
-				 info->max_indirect_segments ? :
+				 dinfo->max_indirect_segments ? :
 				 BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
 		del_gendisk(gd);
 		goto release;
 	}
 
-	info->rq = gd->queue;
-	info->gd = gd;
+	dinfo->rq = gd->queue;
+	dinfo->gd = gd;
 
-	xlvbd_flush(info);
+	xlvbd_flush(dinfo);
 
 	if (vdisk_info & VDISK_READONLY)
 		set_disk_ro(gd, 1);
@@ -901,118 +925,120 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 	return err;
 }
 
-static void xlvbd_release_gendisk(struct blkfront_info *info)
+static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
 {
 	unsigned int minor, nr_minors;
+	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
 
-	if (info->rq == NULL)
+	if (dinfo->rq == NULL)
 		return;
 
 	/* No more blkif_request(). */
-	blk_mq_stop_hw_queues(info->rq);
+	blk_mq_stop_hw_queues(dinfo->rq);
 
 	/* No more gnttab callback work. */
-	gnttab_cancel_free_callback(&info->callback);
+	gnttab_cancel_free_callback(&rinfo->callback);
 
 	/* Flush gnttab callback work. Must be done with no locks held. */
-	flush_work(&info->work);
+	flush_work(&rinfo->work);
 
-	del_gendisk(info->gd);
+	del_gendisk(dinfo->gd);
 
-	minor = info->gd->first_minor;
-	nr_minors = info->gd->minors;
+	minor = dinfo->gd->first_minor;
+	nr_minors = dinfo->gd->minors;
 	xlbd_release_minors(minor, nr_minors);
 
-	blk_cleanup_queue(info->rq);
-	blk_mq_free_tag_set(&info->tag_set);
-	info->rq = NULL;
+	blk_cleanup_queue(dinfo->rq);
+	blk_mq_free_tag_set(&dinfo->tag_set);
+	dinfo->rq = NULL;
 
-	put_disk(info->gd);
-	info->gd = NULL;
+	put_disk(dinfo->gd);
+	dinfo->gd = NULL;
 }
 
 /* Must be called with io_lock holded */
-static void kick_pending_request_queues(struct blkfront_info *info)
+static void kick_pending_request_queues(struct blkfront_ring_info *rinfo)
 {
-	if (!RING_FULL(&info->ring))
-		blk_mq_start_stopped_hw_queues(info->rq, true);
+	if (!RING_FULL(&rinfo->ring))
+		blk_mq_start_stopped_hw_queues(rinfo->dinfo->rq, true);
 }
 
 static void blkif_restart_queue(struct work_struct *work)
 {
-	struct blkfront_info *info = container_of(work, struct blkfront_info, work);
+	struct blkfront_ring_info *rinfo = container_of(work, struct blkfront_ring_info, work);
 
-	spin_lock_irq(&info->io_lock);
-	if (info->connected == BLKIF_STATE_CONNECTED)
-		kick_pending_request_queues(info);
-	spin_unlock_irq(&info->io_lock);
+	spin_lock_irq(&rinfo->dinfo->io_lock);
+	if (rinfo->dinfo->connected == BLKIF_STATE_CONNECTED)
+		kick_pending_request_queues(rinfo);
+	spin_unlock_irq(&rinfo->dinfo->io_lock);
 }
 
-static void blkif_free(struct blkfront_info *info, int suspend)
+static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
 {
 	struct grant *persistent_gnt;
 	struct grant *n;
 	int i, j, segs;
+	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
 
 	/* Prevent new requests being issued until we fix things up. */
-	spin_lock_irq(&info->io_lock);
-	info->connected = suspend ?
+	spin_lock_irq(&dinfo->io_lock);
+	dinfo->connected = suspend ?
 		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
 	/* No more blkif_request(). */
-	if (info->rq)
-		blk_mq_stop_hw_queues(info->rq);
+	if (dinfo->rq)
+		blk_mq_stop_hw_queues(dinfo->rq);
 
 	/* Remove all persistent grants */
-	if (!list_empty(&info->grants)) {
+	if (!list_empty(&rinfo->grants)) {
 		list_for_each_entry_safe(persistent_gnt, n,
-		                         &info->grants, node) {
+					 &rinfo->grants, node) {
 			list_del(&persistent_gnt->node);
 			if (persistent_gnt->gref != GRANT_INVALID_REF) {
 				gnttab_end_foreign_access(persistent_gnt->gref,
 				                          0, 0UL);
-				info->persistent_gnts_c--;
+				rinfo->persistent_gnts_c--;
 			}
-			if (info->feature_persistent)
+			if (dinfo->feature_persistent)
 				__free_page(pfn_to_page(persistent_gnt->pfn));
 			kfree(persistent_gnt);
 		}
 	}
-	BUG_ON(info->persistent_gnts_c != 0);
+	BUG_ON(rinfo->persistent_gnts_c != 0);
 
 	/*
 	 * Remove indirect pages, this only happens when using indirect
 	 * descriptors but not persistent grants
 	 */
-	if (!list_empty(&info->indirect_pages)) {
+	if (!list_empty(&rinfo->indirect_pages)) {
 		struct page *indirect_page, *n;
 
-		BUG_ON(info->feature_persistent);
-		list_for_each_entry_safe(indirect_page, n, &info->indirect_pages, lru) {
+		BUG_ON(dinfo->feature_persistent);
+		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
 			list_del(&indirect_page->lru);
 			__free_page(indirect_page);
 		}
 	}
 
-	for (i = 0; i < BLK_RING_SIZE(info); i++) {
+	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
 		/*
 		 * Clear persistent grants present in requests already
 		 * on the shared ring
 		 */
-		if (!info->shadow[i].request)
+		if (!rinfo->shadow[i].request)
 			goto free_shadow;
 
-		segs = info->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
-		       info->shadow[i].req.u.indirect.nr_segments :
-		       info->shadow[i].req.u.rw.nr_segments;
+		segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
+		       rinfo->shadow[i].req.u.indirect.nr_segments :
+		       rinfo->shadow[i].req.u.rw.nr_segments;
 		for (j = 0; j < segs; j++) {
-			persistent_gnt = info->shadow[i].grants_used[j];
+			persistent_gnt = rinfo->shadow[i].grants_used[j];
 			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
-			if (info->feature_persistent)
+			if (dinfo->feature_persistent)
 				__free_page(pfn_to_page(persistent_gnt->pfn));
 			kfree(persistent_gnt);
 		}
 
-		if (info->shadow[i].req.operation != BLKIF_OP_INDIRECT)
+		if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
 			/*
 			 * If this is not an indirect operation don't try to
 			 * free indirect segments
@@ -1020,45 +1046,45 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 			goto free_shadow;
 
 		for (j = 0; j < INDIRECT_GREFS(segs); j++) {
-			persistent_gnt = info->shadow[i].indirect_grants[j];
+			persistent_gnt = rinfo->shadow[i].indirect_grants[j];
 			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
 			__free_page(pfn_to_page(persistent_gnt->pfn));
 			kfree(persistent_gnt);
 		}
 
 free_shadow:
-		kfree(info->shadow[i].grants_used);
-		info->shadow[i].grants_used = NULL;
-		kfree(info->shadow[i].indirect_grants);
-		info->shadow[i].indirect_grants = NULL;
-		kfree(info->shadow[i].sg);
-		info->shadow[i].sg = NULL;
+		kfree(rinfo->shadow[i].grants_used);
+		rinfo->shadow[i].grants_used = NULL;
+		kfree(rinfo->shadow[i].indirect_grants);
+		rinfo->shadow[i].indirect_grants = NULL;
+		kfree(rinfo->shadow[i].sg);
+		rinfo->shadow[i].sg = NULL;
 	}
 
 	/* No more gnttab callback work. */
-	gnttab_cancel_free_callback(&info->callback);
-	spin_unlock_irq(&info->io_lock);
+	gnttab_cancel_free_callback(&rinfo->callback);
+	spin_unlock_irq(&dinfo->io_lock);
 
 	/* Flush gnttab callback work. Must be done with no locks held. */
-	flush_work(&info->work);
+	flush_work(&rinfo->work);
 
 	/* Free resources associated with old device channel. */
-	for (i = 0; i < info->nr_ring_pages; i++) {
-		if (info->ring_ref[i] != GRANT_INVALID_REF) {
-			gnttab_end_foreign_access(info->ring_ref[i], 0, 0);
-			info->ring_ref[i] = GRANT_INVALID_REF;
+	for (i = 0; i < dinfo->nr_ring_pages; i++) {
+		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
+			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
+			rinfo->ring_ref[i] = GRANT_INVALID_REF;
 		}
 	}
-	free_pages((unsigned long)info->ring.sring, get_order(info->nr_ring_pages * PAGE_SIZE));
-	info->ring.sring = NULL;
+	free_pages((unsigned long)rinfo->ring.sring, get_order(dinfo->nr_ring_pages * PAGE_SIZE));
+	rinfo->ring.sring = NULL;
 
-	if (info->irq)
-		unbind_from_irqhandler(info->irq, info);
-	info->evtchn = info->irq = 0;
+	if (rinfo->irq)
+		unbind_from_irqhandler(rinfo->irq, rinfo);
+	rinfo->evtchn = rinfo->irq = 0;
 
 }
 
-static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
+static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *rinfo,
 			     struct blkif_response *bret)
 {
 	int i = 0;
@@ -1066,11 +1092,12 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 	char *bvec_data;
 	void *shared_data;
 	int nseg;
+	struct blkfront_dev_info *dinfo = rinfo->dinfo;
 
 	nseg = s->req.operation == BLKIF_OP_INDIRECT ?
 		s->req.u.indirect.nr_segments : s->req.u.rw.nr_segments;
 
-	if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
+	if (bret->operation == BLKIF_OP_READ && dinfo->feature_persistent) {
 		for_each_sg(s->sg, sg, nseg, i) {
 			BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 			shared_data = kmap_atomic(
@@ -1092,11 +1119,11 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 			 * we add it at the head of the list, so it will be
 			 * reused first.
 			 */
-			if (!info->feature_persistent)
+			if (!dinfo->feature_persistent)
 				pr_alert_ratelimited("backed has not unmapped grant: %u\n",
 						     s->grants_used[i]->gref);
-			list_add(&s->grants_used[i]->node, &info->grants);
-			info->persistent_gnts_c++;
+			list_add(&s->grants_used[i]->node, &rinfo->grants);
+			rinfo->persistent_gnts_c++;
 		} else {
 			/*
 			 * If the grant is not mapped by the backend we end the
@@ -1106,17 +1133,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 			 */
 			gnttab_end_foreign_access(s->grants_used[i]->gref, 0, 0UL);
 			s->grants_used[i]->gref = GRANT_INVALID_REF;
-			list_add_tail(&s->grants_used[i]->node, &info->grants);
+			list_add_tail(&s->grants_used[i]->node, &rinfo->grants);
 		}
 	}
 	if (s->req.operation == BLKIF_OP_INDIRECT) {
 		for (i = 0; i < INDIRECT_GREFS(nseg); i++) {
 			if (gnttab_query_foreign_access(s->indirect_grants[i]->gref)) {
-				if (!info->feature_persistent)
+				if (!dinfo->feature_persistent)
 					pr_alert_ratelimited("backed has not unmapped grant: %u\n",
 							     s->indirect_grants[i]->gref);
-				list_add(&s->indirect_grants[i]->node, &info->grants);
-				info->persistent_gnts_c++;
+				list_add(&s->indirect_grants[i]->node, &rinfo->grants);
+				rinfo->persistent_gnts_c++;
 			} else {
 				struct page *indirect_page;
 
@@ -1125,12 +1152,12 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 				 * Add the used indirect page back to the list of
 				 * available pages for indirect grefs.
 				 */
-				if (!info->feature_persistent) {
+				if (!dinfo->feature_persistent) {
 					indirect_page = pfn_to_page(s->indirect_grants[i]->pfn);
-					list_add(&indirect_page->lru, &info->indirect_pages);
+					list_add(&indirect_page->lru, &rinfo->indirect_pages);
 				}
 				s->indirect_grants[i]->gref = GRANT_INVALID_REF;
-				list_add_tail(&s->indirect_grants[i]->node, &info->grants);
+				list_add_tail(&s->indirect_grants[i]->node, &rinfo->grants);
 			}
 		}
 	}
@@ -1142,44 +1169,45 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	struct blkif_response *bret;
 	RING_IDX i, rp;
 	unsigned long flags;
-	struct blkfront_info *info = (struct blkfront_info *)dev_id;
+	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
+	struct blkfront_dev_info *dinfo = rinfo->dinfo;
 
-	spin_lock_irqsave(&info->io_lock, flags);
+	spin_lock_irqsave(&dinfo->io_lock, flags);
 
-	if (unlikely(info->connected != BLKIF_STATE_CONNECTED)) {
-		spin_unlock_irqrestore(&info->io_lock, flags);
+	if (unlikely(dinfo->connected != BLKIF_STATE_CONNECTED)) {
+		spin_unlock_irqrestore(&dinfo->io_lock, flags);
 		return IRQ_HANDLED;
 	}
 
  again:
-	rp = info->ring.sring->rsp_prod;
+	rp = rinfo->ring.sring->rsp_prod;
 	rmb(); /* Ensure we see queued responses up to 'rp'. */
 
-	for (i = info->ring.rsp_cons; i != rp; i++) {
+	for (i = rinfo->ring.rsp_cons; i != rp; i++) {
 		unsigned long id;
 
-		bret = RING_GET_RESPONSE(&info->ring, i);
+		bret = RING_GET_RESPONSE(&rinfo->ring, i);
 		id   = bret->id;
 		/*
 		 * The backend has messed up and given us an id that we would
 		 * never have given to it (we stamp it up to BLK_RING_SIZE -
 		 * look in get_id_from_freelist.
 		 */
-		if (id >= BLK_RING_SIZE(info)) {
+		if (id >= BLK_RING_SIZE(dinfo)) {
 			WARN(1, "%s: response to %s has incorrect id (%ld)\n",
-			     info->gd->disk_name, op_name(bret->operation), id);
+			     dinfo->gd->disk_name, op_name(bret->operation), id);
 			/* We can't safely get the 'struct request' as
 			 * the id is busted. */
 			continue;
 		}
-		req  = info->shadow[id].request;
+		req  = rinfo->shadow[id].request;
 
 		if (bret->operation != BLKIF_OP_DISCARD)
-			blkif_completion(&info->shadow[id], info, bret);
+			blkif_completion(&rinfo->shadow[id], rinfo, bret);
 
-		if (add_id_to_freelist(info, id)) {
+		if (add_id_to_freelist(rinfo, id)) {
 			WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
-			     info->gd->disk_name, op_name(bret->operation), id);
+			     dinfo->gd->disk_name, op_name(bret->operation), id);
 			continue;
 		}
 
@@ -1187,12 +1215,12 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		switch (bret->operation) {
 		case BLKIF_OP_DISCARD:
 			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
-				struct request_queue *rq = info->rq;
+				struct request_queue *rq = dinfo->rq;
 				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
-					   info->gd->disk_name, op_name(bret->operation));
+					   dinfo->gd->disk_name, op_name(bret->operation));
 				req->errors = -EOPNOTSUPP;
-				info->feature_discard = 0;
-				info->feature_secdiscard = 0;
+				dinfo->feature_discard = 0;
+				dinfo->feature_secdiscard = 0;
 				queue_flag_clear(QUEUE_FLAG_DISCARD, rq);
 				queue_flag_clear(QUEUE_FLAG_SECDISCARD, rq);
 			}
@@ -1202,26 +1230,26 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		case BLKIF_OP_WRITE_BARRIER:
 			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
 				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
-				       info->gd->disk_name, op_name(bret->operation));
+				       dinfo->gd->disk_name, op_name(bret->operation));
 				req->errors = -EOPNOTSUPP;
 			}
 			if (unlikely(bret->status == BLKIF_RSP_ERROR &&
-				     info->shadow[id].req.u.rw.nr_segments == 0)) {
+				     rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
 				printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
-				       info->gd->disk_name, op_name(bret->operation));
+				       dinfo->gd->disk_name, op_name(bret->operation));
 				req->errors = -EOPNOTSUPP;
 			}
 			if (unlikely(req->errors)) {
 				if (req->errors == -EOPNOTSUPP)
 					req->errors = 0;
-				info->feature_flush = 0;
-				xlvbd_flush(info);
+				dinfo->feature_flush = 0;
+				xlvbd_flush(dinfo);
 			}
 			/* fall through */
 		case BLKIF_OP_READ:
 		case BLKIF_OP_WRITE:
 			if (unlikely(bret->status != BLKIF_RSP_OKAY))
-				dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
+				dev_dbg(&dinfo->xbdev->dev, "Bad return from blkdev data "
 					"request: %x\n", bret->status);
 
 			blk_mq_complete_request(req);
@@ -1231,34 +1259,35 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		}
 	}
 
-	info->ring.rsp_cons = i;
+	rinfo->ring.rsp_cons = i;
 
-	if (i != info->ring.req_prod_pvt) {
+	if (i != rinfo->ring.req_prod_pvt) {
 		int more_to_do;
-		RING_FINAL_CHECK_FOR_RESPONSES(&info->ring, more_to_do);
+		RING_FINAL_CHECK_FOR_RESPONSES(&rinfo->ring, more_to_do);
 		if (more_to_do)
 			goto again;
 	} else
-		info->ring.sring->rsp_event = i + 1;
+		rinfo->ring.sring->rsp_event = i + 1;
 
-	kick_pending_request_queues(info);
+	kick_pending_request_queues(rinfo);
 
-	spin_unlock_irqrestore(&info->io_lock, flags);
+	spin_unlock_irqrestore(&dinfo->io_lock, flags);
 
 	return IRQ_HANDLED;
 }
 
 
 static int setup_blkring(struct xenbus_device *dev,
-			 struct blkfront_info *info)
+			 struct blkfront_ring_info *rinfo)
 {
 	struct blkif_sring *sring;
 	int err, i;
-	unsigned long ring_size = info->nr_ring_pages * PAGE_SIZE;
+	struct blkfront_dev_info *dinfo = rinfo->dinfo;
+	unsigned long ring_size = dinfo->nr_ring_pages * PAGE_SIZE;
 	grant_ref_t gref[XENBUS_MAX_RING_PAGES];
 
-	for (i = 0; i < info->nr_ring_pages; i++)
-		info->ring_ref[i] = GRANT_INVALID_REF;
+	for (i = 0; i < dinfo->nr_ring_pages; i++)
+		rinfo->ring_ref[i] = GRANT_INVALID_REF;
 
 	sring = (struct blkif_sring *)__get_free_pages(GFP_NOIO | __GFP_HIGH,
 						       get_order(ring_size));
@@ -1267,58 +1296,59 @@ static int setup_blkring(struct xenbus_device *dev,
 		return -ENOMEM;
 	}
 	SHARED_RING_INIT(sring);
-	FRONT_RING_INIT(&info->ring, sring, ring_size);
+	FRONT_RING_INIT(&rinfo->ring, sring, ring_size);
 
-	err = xenbus_grant_ring(dev, info->ring.sring, info->nr_ring_pages, gref);
+	err = xenbus_grant_ring(dev, rinfo->ring.sring, dinfo->nr_ring_pages, gref);
 	if (err < 0) {
 		free_pages((unsigned long)sring, get_order(ring_size));
-		info->ring.sring = NULL;
+		rinfo->ring.sring = NULL;
 		goto fail;
 	}
-	for (i = 0; i < info->nr_ring_pages; i++)
-		info->ring_ref[i] = gref[i];
+	for (i = 0; i < dinfo->nr_ring_pages; i++)
+		rinfo->ring_ref[i] = gref[i];
 
-	err = xenbus_alloc_evtchn(dev, &info->evtchn);
+	err = xenbus_alloc_evtchn(dev, &rinfo->evtchn);
 	if (err)
 		goto fail;
 
-	err = bind_evtchn_to_irqhandler(info->evtchn, blkif_interrupt, 0,
-					"blkif", info);
+	err = bind_evtchn_to_irqhandler(rinfo->evtchn, blkif_interrupt, 0,
+					"blkif", rinfo);
 	if (err <= 0) {
 		xenbus_dev_fatal(dev, err,
 				 "bind_evtchn_to_irqhandler failed");
 		goto fail;
 	}
-	info->irq = err;
+	rinfo->irq = err;
 
 	return 0;
 fail:
-	blkif_free(info, 0);
+	blkif_free(dinfo, 0);
 	return err;
 }
 
 
 /* Common code used when first setting up, and when resuming. */
 static int talk_to_blkback(struct xenbus_device *dev,
-			   struct blkfront_info *info)
+			   struct blkfront_dev_info *dinfo)
 {
 	const char *message = NULL;
 	struct xenbus_transaction xbt;
 	int err, i;
 	unsigned int max_page_order = 0;
 	unsigned int ring_page_order = 0;
+	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
 
-	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
 			   "max-ring-page-order", "%u", &max_page_order);
 	if (err != 1)
-		info->nr_ring_pages = 1;
+		dinfo->nr_ring_pages = 1;
 	else {
 		ring_page_order = min(xen_blkif_max_ring_order, max_page_order);
-		info->nr_ring_pages = 1 << ring_page_order;
+		dinfo->nr_ring_pages = 1 << ring_page_order;
 	}
 
 	/* Create shared ring, alloc event channel. */
-	err = setup_blkring(dev, info);
+	err = setup_blkring(dev, rinfo);
 	if (err)
 		goto out;
 
@@ -1329,9 +1359,9 @@ again:
 		goto destroy_blkring;
 	}
 
-	if (info->nr_ring_pages == 1) {
+	if (dinfo->nr_ring_pages == 1) {
 		err = xenbus_printf(xbt, dev->nodename,
-				    "ring-ref", "%u", info->ring_ref[0]);
+				    "ring-ref", "%u", rinfo->ring_ref[0]);
 		if (err) {
 			message = "writing ring-ref";
 			goto abort_transaction;
@@ -1344,12 +1374,12 @@ again:
 			goto abort_transaction;
 		}
 
-		for (i = 0; i < info->nr_ring_pages; i++) {
+		for (i = 0; i < dinfo->nr_ring_pages; i++) {
 			char ring_ref_name[RINGREF_NAME_LEN];
 
 			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
 			err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
-					    "%u", info->ring_ref[i]);
+					    "%u", rinfo->ring_ref[i]);
 			if (err) {
 				message = "writing ring-ref";
 				goto abort_transaction;
@@ -1357,7 +1387,7 @@ again:
 		}
 	}
 	err = xenbus_printf(xbt, dev->nodename,
-			    "event-channel", "%u", info->evtchn);
+			    "event-channel", "%u", rinfo->evtchn);
 	if (err) {
 		message = "writing event-channel";
 		goto abort_transaction;
@@ -1382,9 +1412,9 @@ again:
 		goto destroy_blkring;
 	}
 
-	for (i = 0; i < BLK_RING_SIZE(info); i++)
-		info->shadow[i].req.u.rw.id = i+1;
-	info->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
+	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
+		rinfo->shadow[i].req.u.rw.id = i+1;
+	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
 	xenbus_switch_state(dev, XenbusStateInitialised);
 
 	return 0;
@@ -1394,7 +1424,7 @@ again:
 	if (message)
 		xenbus_dev_fatal(dev, err, "%s", message);
  destroy_blkring:
-	blkif_free(info, 0);
+	blkif_free(dinfo, 0);
  out:
 	return err;
 }
@@ -1409,7 +1439,8 @@ static int blkfront_probe(struct xenbus_device *dev,
 			  const struct xenbus_device_id *id)
 {
 	int err, vdevice;
-	struct blkfront_info *info;
+	struct blkfront_dev_info *dinfo;
+	struct blkfront_ring_info *rinfo;
 
 	/* FIXME: Use dynamic device id if this is not set. */
 	err = xenbus_scanf(XBT_NIL, dev->nodename,
@@ -1453,25 +1484,27 @@ static int blkfront_probe(struct xenbus_device *dev,
 		}
 		kfree(type);
 	}
-	info = kzalloc(sizeof(*info), GFP_KERNEL);
-	if (!info) {
+	dinfo = kzalloc(sizeof(*dinfo), GFP_KERNEL);
+	if (!dinfo) {
 		xenbus_dev_fatal(dev, -ENOMEM, "allocating info structure");
 		return -ENOMEM;
 	}
 
-	mutex_init(&info->mutex);
-	spin_lock_init(&info->io_lock);
-	info->xbdev = dev;
-	info->vdevice = vdevice;
-	INIT_LIST_HEAD(&info->grants);
-	INIT_LIST_HEAD(&info->indirect_pages);
-	info->persistent_gnts_c = 0;
-	info->connected = BLKIF_STATE_DISCONNECTED;
-	INIT_WORK(&info->work, blkif_restart_queue);
+	rinfo = &dinfo->rinfo;
+	mutex_init(&dinfo->mutex);
+	spin_lock_init(&dinfo->io_lock);
+	dinfo->xbdev = dev;
+	dinfo->vdevice = vdevice;
+	INIT_LIST_HEAD(&rinfo->grants);
+	INIT_LIST_HEAD(&rinfo->indirect_pages);
+	rinfo->persistent_gnts_c = 0;
+	dinfo->connected = BLKIF_STATE_DISCONNECTED;
+	rinfo->dinfo = dinfo;
+	INIT_WORK(&rinfo->work, blkif_restart_queue);
 
 	/* Front end dir is a number, which is used as the id. */
-	info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
-	dev_set_drvdata(&dev->dev, info);
+	dinfo->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
+	dev_set_drvdata(&dev->dev, dinfo);
 
 	return 0;
 }
@@ -1491,7 +1524,7 @@ static void split_bio_end(struct bio *bio, int error)
 	bio_put(bio);
 }
 
-static int blkif_recover(struct blkfront_info *info)
+static int blkif_recover(struct blkfront_dev_info *dinfo)
 {
 	int i;
 	struct request *req, *n;
@@ -1503,31 +1536,32 @@ static int blkif_recover(struct blkfront_info *info)
 	int pending, size;
 	struct split_bio *split_bio;
 	struct list_head requests;
+	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
 
 	/* Stage 1: Make a safe copy of the shadow state. */
-	copy = kmemdup(info->shadow, sizeof(info->shadow),
+	copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
 		       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
 	if (!copy)
 		return -ENOMEM;
 
 	/* Stage 2: Set up free list. */
-	memset(&info->shadow, 0, sizeof(info->shadow));
-	for (i = 0; i < BLK_RING_SIZE(info); i++)
-		info->shadow[i].req.u.rw.id = i+1;
-	info->shadow_free = info->ring.req_prod_pvt;
-	info->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
+	memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
+	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
+		rinfo->shadow[i].req.u.rw.id = i+1;
+	rinfo->shadow_free = rinfo->ring.req_prod_pvt;
+	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
 
-	rc = blkfront_gather_backend_features(info);
+	rc = blkfront_gather_backend_features(dinfo);
 	if (rc) {
 		kfree(copy);
 		return rc;
 	}
 
-	segs = info->max_indirect_segments ? : BLKIF_MAX_SEGMENTS_PER_REQUEST;
-	blk_queue_max_segments(info->rq, segs);
+	segs = dinfo->max_indirect_segments ? : BLKIF_MAX_SEGMENTS_PER_REQUEST;
+	blk_queue_max_segments(dinfo->rq, segs);
 	bio_list_init(&bio_list);
 	INIT_LIST_HEAD(&requests);
-	for (i = 0; i < BLK_RING_SIZE(info); i++) {
+	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
 		/* Not in use? */
 		if (!copy[i].request)
 			continue;
@@ -1553,15 +1587,15 @@ static int blkif_recover(struct blkfront_info *info)
 
 	kfree(copy);
 
-	xenbus_switch_state(info->xbdev, XenbusStateConnected);
+	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
 
-	spin_lock_irq(&info->io_lock);
+	spin_lock_irq(&dinfo->io_lock);
 
 	/* Now safe for us to use the shared ring */
-	info->connected = BLKIF_STATE_CONNECTED;
+	dinfo->connected = BLKIF_STATE_CONNECTED;
 
 	/* Kick any other new requests queued since we resumed */
-	kick_pending_request_queues(info);
+	kick_pending_request_queues(rinfo);
 
 	list_for_each_entry_safe(req, n, &requests, queuelist) {
 		/* Requeue pending requests (flush or discard) */
@@ -1569,8 +1603,8 @@ static int blkif_recover(struct blkfront_info *info)
 		BUG_ON(req->nr_phys_segments > segs);
 		blk_mq_requeue_request(req);
 	}
-	spin_unlock_irq(&info->io_lock);
-	blk_mq_kick_requeue_list(info->rq);
+	spin_unlock_irq(&dinfo->io_lock);
+	blk_mq_kick_requeue_list(dinfo->rq);
 
 	while ((bio = bio_list_pop(&bio_list)) != NULL) {
 		/* Traverse the list of pending bios and re-queue them */
@@ -1616,14 +1650,14 @@ static int blkif_recover(struct blkfront_info *info)
  */
 static int blkfront_resume(struct xenbus_device *dev)
 {
-	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
+	struct blkfront_dev_info *dinfo = dev_get_drvdata(&dev->dev);
 	int err;
 
 	dev_dbg(&dev->dev, "blkfront_resume: %s\n", dev->nodename);
 
-	blkif_free(info, info->connected == BLKIF_STATE_CONNECTED);
+	blkif_free(dinfo, dinfo->connected == BLKIF_STATE_CONNECTED);
 
-	err = talk_to_blkback(dev, info);
+	err = talk_to_blkback(dev, dinfo);
 
 	/*
 	 * We have to wait for the backend to switch to
@@ -1635,22 +1669,22 @@ static int blkfront_resume(struct xenbus_device *dev)
 }
 
 static void
-blkfront_closing(struct blkfront_info *info)
+blkfront_closing(struct blkfront_dev_info *dinfo)
 {
-	struct xenbus_device *xbdev = info->xbdev;
+	struct xenbus_device *xbdev = dinfo->xbdev;
 	struct block_device *bdev = NULL;
 
-	mutex_lock(&info->mutex);
+	mutex_lock(&dinfo->mutex);
 
 	if (xbdev->state == XenbusStateClosing) {
-		mutex_unlock(&info->mutex);
+		mutex_unlock(&dinfo->mutex);
 		return;
 	}
 
-	if (info->gd)
-		bdev = bdget_disk(info->gd, 0);
+	if (dinfo->gd)
+		bdev = bdget_disk(dinfo->gd, 0);
 
-	mutex_unlock(&info->mutex);
+	mutex_unlock(&dinfo->mutex);
 
 	if (!bdev) {
 		xenbus_frontend_closed(xbdev);
@@ -1664,7 +1698,7 @@ blkfront_closing(struct blkfront_info *info)
 				 "Device in use; refusing to close");
 		xenbus_switch_state(xbdev, XenbusStateClosing);
 	} else {
-		xlvbd_release_gendisk(info);
+		xlvbd_release_gendisk(dinfo);
 		xenbus_frontend_closed(xbdev);
 	}
 
@@ -1672,93 +1706,94 @@ blkfront_closing(struct blkfront_info *info)
 	bdput(bdev);
 }
 
-static void blkfront_setup_discard(struct blkfront_info *info)
+static void blkfront_setup_discard(struct blkfront_dev_info *dinfo)
 {
 	int err;
 	unsigned int discard_granularity;
 	unsigned int discard_alignment;
 	unsigned int discard_secure;
 
-	info->feature_discard = 1;
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	dinfo->feature_discard = 1;
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 		"discard-granularity", "%u", &discard_granularity,
 		"discard-alignment", "%u", &discard_alignment,
 		NULL);
 	if (!err) {
-		info->discard_granularity = discard_granularity;
-		info->discard_alignment = discard_alignment;
+		dinfo->discard_granularity = discard_granularity;
+		dinfo->discard_alignment = discard_alignment;
 	}
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 		    "discard-secure", "%d", &discard_secure,
 		    NULL);
 	if (!err)
-		info->feature_secdiscard = !!discard_secure;
+		dinfo->feature_secdiscard = !!discard_secure;
 }
 
-static int blkfront_setup_indirect(struct blkfront_info *info)
+static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
 {
 	unsigned int segs;
 	int err, i;
+	struct blkfront_dev_info *dinfo = rinfo->dinfo;
 
-	if (info->max_indirect_segments == 0)
+	if (dinfo->max_indirect_segments == 0)
 		segs = BLKIF_MAX_SEGMENTS_PER_REQUEST;
 	else
-		segs = info->max_indirect_segments;
+		segs = dinfo->max_indirect_segments;
 
-	err = fill_grant_buffer(info, (segs + INDIRECT_GREFS(segs)) * BLK_RING_SIZE(info));
+	err = fill_grant_buffer(rinfo, (segs + INDIRECT_GREFS(segs)) * BLK_RING_SIZE(dinfo));
 	if (err)
 		goto out_of_memory;
 
-	if (!info->feature_persistent && info->max_indirect_segments) {
+	if (!dinfo->feature_persistent && dinfo->max_indirect_segments) {
 		/*
 		 * We are using indirect descriptors but not persistent
 		 * grants, we need to allocate a set of pages that can be
 		 * used for mapping indirect grefs
 		 */
-		int num = INDIRECT_GREFS(segs) * BLK_RING_SIZE(info);
+		int num = INDIRECT_GREFS(segs) * BLK_RING_SIZE(dinfo);
 
-		BUG_ON(!list_empty(&info->indirect_pages));
+		BUG_ON(!list_empty(&rinfo->indirect_pages));
 		for (i = 0; i < num; i++) {
 			struct page *indirect_page = alloc_page(GFP_NOIO);
 			if (!indirect_page)
 				goto out_of_memory;
-			list_add(&indirect_page->lru, &info->indirect_pages);
+			list_add(&indirect_page->lru, &rinfo->indirect_pages);
 		}
 	}
 
-	for (i = 0; i < BLK_RING_SIZE(info); i++) {
-		info->shadow[i].grants_used = kzalloc(
-			sizeof(info->shadow[i].grants_used[0]) * segs,
+	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
+		rinfo->shadow[i].grants_used = kzalloc(
+			sizeof(rinfo->shadow[i].grants_used[0]) * segs,
 			GFP_NOIO);
-		info->shadow[i].sg = kzalloc(sizeof(info->shadow[i].sg[0]) * segs, GFP_NOIO);
-		if (info->max_indirect_segments)
-			info->shadow[i].indirect_grants = kzalloc(
-				sizeof(info->shadow[i].indirect_grants[0]) *
+		rinfo->shadow[i].sg = kzalloc(sizeof(rinfo->shadow[i].sg[0]) * segs, GFP_NOIO);
+		if (dinfo->max_indirect_segments)
+			rinfo->shadow[i].indirect_grants = kzalloc(
+				sizeof(rinfo->shadow[i].indirect_grants[0]) *
 				INDIRECT_GREFS(segs),
 				GFP_NOIO);
-		if ((info->shadow[i].grants_used == NULL) ||
-			(info->shadow[i].sg == NULL) ||
-		     (info->max_indirect_segments &&
-		     (info->shadow[i].indirect_grants == NULL)))
+		if ((rinfo->shadow[i].grants_used == NULL) ||
+			(rinfo->shadow[i].sg == NULL) ||
+		     (dinfo->max_indirect_segments &&
+		     (rinfo->shadow[i].indirect_grants == NULL)))
 			goto out_of_memory;
-		sg_init_table(info->shadow[i].sg, segs);
+		sg_init_table(rinfo->shadow[i].sg, segs);
 	}
 
 
 	return 0;
 
 out_of_memory:
-	for (i = 0; i < BLK_RING_SIZE(info); i++) {
-		kfree(info->shadow[i].grants_used);
-		info->shadow[i].grants_used = NULL;
-		kfree(info->shadow[i].sg);
-		info->shadow[i].sg = NULL;
-		kfree(info->shadow[i].indirect_grants);
-		info->shadow[i].indirect_grants = NULL;
+	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
+		kfree(rinfo->shadow[i].grants_used);
+		rinfo->shadow[i].grants_used = NULL;
+		kfree(rinfo->shadow[i].sg);
+		rinfo->shadow[i].sg = NULL;
+		kfree(rinfo->shadow[i].indirect_grants);
+		rinfo->shadow[i].indirect_grants = NULL;
 	}
-	if (!list_empty(&info->indirect_pages)) {
+	if (!list_empty(&rinfo->indirect_pages)) {
 		struct page *indirect_page, *n;
-		list_for_each_entry_safe(indirect_page, n, &info->indirect_pages, lru) {
+		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
 			list_del(&indirect_page->lru);
 			__free_page(indirect_page);
 		}
@@ -1769,15 +1804,15 @@ out_of_memory:
 /*
  * Gather all backend feature-*
  */
-static int blkfront_gather_backend_features(struct blkfront_info *info)
+static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
 {
 	int err;
 	int barrier, flush, discard, persistent;
 	unsigned int indirect_segments;
 
-	info->feature_flush = 0;
+	dinfo->feature_flush = 0;
 
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 			"feature-barrier", "%d", &barrier,
 			NULL);
 
@@ -1789,71 +1824,72 @@ static int blkfront_gather_backend_features(struct blkfront_info *info)
 	 * If there are barriers, then we use flush.
 	 */
 	if (!err && barrier)
-		info->feature_flush = REQ_FLUSH | REQ_FUA;
+		dinfo->feature_flush = REQ_FLUSH | REQ_FUA;
 	/*
 	 * And if there is "feature-flush-cache" use that above
 	 * barriers.
 	 */
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 			"feature-flush-cache", "%d", &flush,
 			NULL);
 
 	if (!err && flush)
-		info->feature_flush = REQ_FLUSH;
+		dinfo->feature_flush = REQ_FLUSH;
 
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 			"feature-discard", "%d", &discard,
 			NULL);
 
 	if (!err && discard)
-		blkfront_setup_discard(info);
+		blkfront_setup_discard(dinfo);
 
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 			"feature-persistent", "%u", &persistent,
 			NULL);
 	if (err)
-		info->feature_persistent = 0;
+		dinfo->feature_persistent = 0;
 	else
-		info->feature_persistent = persistent;
+		dinfo->feature_persistent = persistent;
 
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 			    "feature-max-indirect-segments", "%u", &indirect_segments,
 			    NULL);
 	if (err)
-		info->max_indirect_segments = 0;
+		dinfo->max_indirect_segments = 0;
 	else
-		info->max_indirect_segments = min(indirect_segments,
+		dinfo->max_indirect_segments = min(indirect_segments,
 						  xen_blkif_max_segments);
 
-	return blkfront_setup_indirect(info);
+	return blkfront_setup_indirect(&dinfo->rinfo);
 }
 
 /*
  * Invoked when the backend is finally 'ready' (and has told produced
  * the details about the physical device - #sectors, size, etc).
  */
-static void blkfront_connect(struct blkfront_info *info)
+static void blkfront_connect(struct blkfront_dev_info *dinfo)
 {
 	unsigned long long sectors;
 	unsigned long sector_size;
 	unsigned int physical_sector_size;
 	unsigned int binfo;
 	int err;
+	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
 
-	switch (info->connected) {
+	switch (dinfo->connected) {
 	case BLKIF_STATE_CONNECTED:
 		/*
 		 * Potentially, the back-end may be signalling
 		 * a capacity change; update the capacity.
 		 */
-		err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+		err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
 				   "sectors", "%Lu", &sectors);
 		if (XENBUS_EXIST_ERR(err))
 			return;
 		printk(KERN_INFO "Setting capacity to %Lu\n",
 		       sectors);
-		set_capacity(info->gd, sectors);
-		revalidate_disk(info->gd);
+		set_capacity(dinfo->gd, sectors);
+		revalidate_disk(dinfo->gd);
 
 		return;
 	case BLKIF_STATE_SUSPENDED:
@@ -1863,25 +1899,25 @@ static void blkfront_connect(struct blkfront_info *info)
 		 * reconnecting, at least we need to know if the backend
 		 * supports indirect descriptors, and how many.
 		 */
-		blkif_recover(info);
+		blkif_recover(dinfo);
 		return;
 
 	default:
 		break;
 	}
 
-	dev_dbg(&info->xbdev->dev, "%s:%s.\n",
-		__func__, info->xbdev->otherend);
+	dev_dbg(&dinfo->xbdev->dev, "%s:%s.\n",
+		__func__, dinfo->xbdev->otherend);
 
-	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_gather(XBT_NIL, dinfo->xbdev->otherend,
 			    "sectors", "%llu", &sectors,
 			    "info", "%u", &binfo,
 			    "sector-size", "%lu", &sector_size,
 			    NULL);
 	if (err) {
-		xenbus_dev_fatal(info->xbdev, err,
+		xenbus_dev_fatal(dinfo->xbdev, err,
 				 "reading backend fields at %s",
-				 info->xbdev->otherend);
+				 dinfo->xbdev->otherend);
 		return;
 	}
 
@@ -1890,37 +1926,37 @@ static void blkfront_connect(struct blkfront_info *info)
 	 * provide this. Assume physical sector size to be the same as
 	 * sector_size in that case.
 	 */
-	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
 			   "physical-sector-size", "%u", &physical_sector_size);
 	if (err != 1)
 		physical_sector_size = sector_size;
 
-	err = blkfront_gather_backend_features(info);
+	err = blkfront_gather_backend_features(dinfo);
 	if (err) {
-		xenbus_dev_fatal(info->xbdev, err, "setup_indirect at %s",
-				 info->xbdev->otherend);
+		xenbus_dev_fatal(dinfo->xbdev, err, "setup_indirect at %s",
+				 dinfo->xbdev->otherend);
 		return;
 	}
 
-	err = xlvbd_alloc_gendisk(sectors, info, binfo, sector_size,
+	err = xlvbd_alloc_gendisk(sectors, dinfo, binfo, sector_size,
 				  physical_sector_size);
 	if (err) {
-		xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s",
-				 info->xbdev->otherend);
+		xenbus_dev_fatal(dinfo->xbdev, err, "xlvbd_add at %s",
+				 dinfo->xbdev->otherend);
 		return;
 	}
 
-	xenbus_switch_state(info->xbdev, XenbusStateConnected);
+	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
 
 	/* Kick pending requests. */
-	spin_lock_irq(&info->io_lock);
-	info->connected = BLKIF_STATE_CONNECTED;
-	kick_pending_request_queues(info);
-	spin_unlock_irq(&info->io_lock);
+	spin_lock_irq(&dinfo->io_lock);
+	dinfo->connected = BLKIF_STATE_CONNECTED;
+	kick_pending_request_queues(rinfo);
+	spin_unlock_irq(&dinfo->io_lock);
 
-	add_disk(info->gd);
+	add_disk(dinfo->gd);
 
-	info->is_ready = 1;
+	dinfo->is_ready = 1;
 }
 
 /**
@@ -1929,7 +1965,7 @@ static void blkfront_connect(struct blkfront_info *info)
 static void blkback_changed(struct xenbus_device *dev,
 			    enum xenbus_state backend_state)
 {
-	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
+	struct blkfront_dev_info *dinfo = dev_get_drvdata(&dev->dev);
 
 	dev_dbg(&dev->dev, "blkfront:blkback_changed to state %d.\n", backend_state);
 
@@ -1937,8 +1973,8 @@ static void blkback_changed(struct xenbus_device *dev,
 	case XenbusStateInitWait:
 		if (dev->state != XenbusStateInitialising)
 			break;
-		if (talk_to_blkback(dev, info)) {
-			kfree(info);
+		if (talk_to_blkback(dev, dinfo)) {
+			kfree(dinfo);
 			dev_set_drvdata(&dev->dev, NULL);
 			break;
 		}
@@ -1950,7 +1986,7 @@ static void blkback_changed(struct xenbus_device *dev,
 		break;
 
 	case XenbusStateConnected:
-		blkfront_connect(info);
+		blkfront_connect(dinfo);
 		break;
 
 	case XenbusStateClosed:
@@ -1958,32 +1994,32 @@ static void blkback_changed(struct xenbus_device *dev,
 			break;
 		/* Missed the backend's Closing state -- fallthrough */
 	case XenbusStateClosing:
-		blkfront_closing(info);
+		blkfront_closing(dinfo);
 		break;
 	}
 }
 
 static int blkfront_remove(struct xenbus_device *xbdev)
 {
-	struct blkfront_info *info = dev_get_drvdata(&xbdev->dev);
+	struct blkfront_dev_info *dinfo = dev_get_drvdata(&xbdev->dev);
 	struct block_device *bdev = NULL;
 	struct gendisk *disk;
 
 	dev_dbg(&xbdev->dev, "%s removed", xbdev->nodename);
 
-	blkif_free(info, 0);
+	blkif_free(dinfo, 0);
 
-	mutex_lock(&info->mutex);
+	mutex_lock(&dinfo->mutex);
 
-	disk = info->gd;
+	disk = dinfo->gd;
 	if (disk)
 		bdev = bdget_disk(disk, 0);
 
-	info->xbdev = NULL;
-	mutex_unlock(&info->mutex);
+	dinfo->xbdev = NULL;
+	mutex_unlock(&dinfo->mutex);
 
 	if (!bdev) {
-		kfree(info);
+		kfree(dinfo);
 		return 0;
 	}
 
@@ -1994,16 +2030,16 @@ static int blkfront_remove(struct xenbus_device *xbdev)
 	 */
 
 	mutex_lock(&bdev->bd_mutex);
-	info = disk->private_data;
+	dinfo = disk->private_data;
 
 	dev_warn(disk_to_dev(disk),
 		 "%s was hot-unplugged, %d stale handles\n",
 		 xbdev->nodename, bdev->bd_openers);
 
-	if (info && !bdev->bd_openers) {
-		xlvbd_release_gendisk(info);
+	if (dinfo && !bdev->bd_openers) {
+		xlvbd_release_gendisk(dinfo);
 		disk->private_data = NULL;
-		kfree(info);
+		kfree(dinfo);
 	}
 
 	mutex_unlock(&bdev->bd_mutex);
@@ -2014,33 +2050,33 @@ static int blkfront_remove(struct xenbus_device *xbdev)
 
 static int blkfront_is_ready(struct xenbus_device *dev)
 {
-	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
+	struct blkfront_dev_info *dinfo = dev_get_drvdata(&dev->dev);
 
-	return info->is_ready && info->xbdev;
+	return dinfo->is_ready && dinfo->xbdev;
 }
 
 static int blkif_open(struct block_device *bdev, fmode_t mode)
 {
 	struct gendisk *disk = bdev->bd_disk;
-	struct blkfront_info *info;
+	struct blkfront_dev_info *dinfo;
 	int err = 0;
 
 	mutex_lock(&blkfront_mutex);
 
-	info = disk->private_data;
-	if (!info) {
+	dinfo = disk->private_data;
+	if (!dinfo) {
 		/* xbdev gone */
 		err = -ERESTARTSYS;
 		goto out;
 	}
 
-	mutex_lock(&info->mutex);
+	mutex_lock(&dinfo->mutex);
 
-	if (!info->gd)
+	if (!dinfo->gd)
 		/* xbdev is closed */
 		err = -ERESTARTSYS;
 
-	mutex_unlock(&info->mutex);
+	mutex_unlock(&dinfo->mutex);
 
 out:
 	mutex_unlock(&blkfront_mutex);
@@ -2049,7 +2085,7 @@ out:
 
 static void blkif_release(struct gendisk *disk, fmode_t mode)
 {
-	struct blkfront_info *info = disk->private_data;
+	struct blkfront_dev_info *dinfo = disk->private_data;
 	struct block_device *bdev;
 	struct xenbus_device *xbdev;
 
@@ -2069,24 +2105,24 @@ static void blkif_release(struct gendisk *disk, fmode_t mode)
 	 * deferred this request, because the bdev was still open.
 	 */
 
-	mutex_lock(&info->mutex);
-	xbdev = info->xbdev;
+	mutex_lock(&dinfo->mutex);
+	xbdev = dinfo->xbdev;
 
 	if (xbdev && xbdev->state == XenbusStateClosing) {
 		/* pending switch to state closed */
 		dev_info(disk_to_dev(bdev->bd_disk), "releasing disk\n");
-		xlvbd_release_gendisk(info);
-		xenbus_frontend_closed(info->xbdev);
+		xlvbd_release_gendisk(dinfo);
+		xenbus_frontend_closed(dinfo->xbdev);
  	}
 
-	mutex_unlock(&info->mutex);
+	mutex_unlock(&dinfo->mutex);
 
 	if (!xbdev) {
 		/* sudden device removal */
 		dev_info(disk_to_dev(bdev->bd_disk), "releasing disk\n");
-		xlvbd_release_gendisk(info);
+		xlvbd_release_gendisk(dinfo);
 		disk->private_data = NULL;
-		kfree(info);
+		kfree(dinfo);
 	}
 
 out:
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 4/9] xen/blkfront: pseudo support for multi hardware queues/rings
  2015-09-05 12:39 [PATCH v3 0/9] xen-block: support multi hardware-queues/rings Bob Liu
                   ` (5 preceding siblings ...)
  2015-09-05 12:39 ` Bob Liu
@ 2015-09-05 12:39 ` Bob Liu
  2015-10-05 10:52   ` Roger Pau Monné
  2015-10-05 10:52   ` Roger Pau Monné
  2015-09-05 12:39 ` Bob Liu
                   ` (7 subsequent siblings)
  14 siblings, 2 replies; 83+ messages in thread
From: Bob Liu @ 2015-09-05 12:39 UTC (permalink / raw)
  To: xen-devel
  Cc: david.vrabel, linux-kernel, roger.pau, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies, Bob Liu

Preparatory patch for multi hardware queues/rings; for now the number of rings
is forced to 1.

* Add 'nr_rings' to the per-device dev_info to record how many hardware
  queues/rings are supported, plus a pointer '*rinfo' to the array of its
  rings (see the sketch after this list).
* Rename 'nr_ring_pages' => 'pages_per_ring' to better distinguish it from
  'nr_rings'.
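
For illustration only (not part of the patch), below is a minimal standalone C
sketch of the intended layout: the per-device structure carries an array of
nr_rings per-ring structures, and each blk-mq hardware context is mapped to one
element of that array, as the init_hctx hunk in the diff does via
hctx->driver_data. The stand-in types and names here are simplified
assumptions, not the real kernel definitions.

	#include <stdio.h>
	#include <stdlib.h>

	/* Simplified stand-ins for the real kernel structures (assumptions). */
	struct ring_info_sketch { unsigned int id; };    /* one per hardware ring      */
	struct hw_ctx_sketch    { void *driver_data; };  /* stand-in for blk_mq_hw_ctx */

	struct dev_info_sketch {
		unsigned int pages_per_ring;     /* shared-ring pages used by each ring */
		unsigned int nr_rings;           /* number of hardware queues/rings     */
		struct ring_info_sketch *rinfo;  /* array of nr_rings per-ring structs  */
	};

	/* Mirrors the init_hctx hunk: hardware context 'index' is backed by ring 'index'. */
	static int init_hctx_sketch(struct hw_ctx_sketch *hctx,
				    struct dev_info_sketch *dinfo, unsigned int index)
	{
		hctx->driver_data = &dinfo->rinfo[index];
		return 0;
	}

	int main(void)
	{
		struct dev_info_sketch dinfo = { .pages_per_ring = 1, .nr_rings = 4 };
		struct hw_ctx_sketch hctx;
		unsigned int i;

		dinfo.rinfo = calloc(dinfo.nr_rings, sizeof(*dinfo.rinfo));
		if (!dinfo.rinfo)
			return 1;
		for (i = 0; i < dinfo.nr_rings; i++)
			dinfo.rinfo[i].id = i;

		/* Hardware queue 2 is served by ring 2. */
		init_hctx_sketch(&hctx, &dinfo, 2);
		printf("hctx 2 -> ring %u\n",
		       ((struct ring_info_sketch *)hctx.driver_data)->id);

		free(dinfo.rinfo);
		return 0;
	}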

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkfront.c |  513 +++++++++++++++++++++++++-----------------
 1 file changed, 308 insertions(+), 205 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index bf416d5..bf45c99 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
 module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
 MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
 
-#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->nr_ring_pages)
+#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->pages_per_ring)
 #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
 /*
  * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
@@ -157,9 +157,10 @@ struct blkfront_dev_info {
 	unsigned int feature_persistent:1;
 	unsigned int max_indirect_segments;
 	int is_ready;
-	unsigned int nr_ring_pages;
+	unsigned int pages_per_ring;
 	struct blk_mq_tag_set tag_set;
-	struct blkfront_ring_info rinfo;
+	struct blkfront_ring_info *rinfo;
+	unsigned int nr_rings;
 };
 
 static unsigned int nr_minors;
@@ -191,7 +192,7 @@ static DEFINE_SPINLOCK(minor_lock);
 	((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
 
 static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
-static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
+static void __blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
 
 static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
@@ -668,7 +669,7 @@ static int blk_mq_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
 {
 	struct blkfront_dev_info *dinfo = (struct blkfront_dev_info *)data;
 
-	hctx->driver_data = &dinfo->rinfo;
+	hctx->driver_data = &dinfo->rinfo[index];
 	return 0;
 }
 
@@ -927,8 +928,8 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 
 static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
 {
-	unsigned int minor, nr_minors;
-	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
+	unsigned int minor, nr_minors, i;
+	struct blkfront_ring_info *rinfo;
 
 	if (dinfo->rq == NULL)
 		return;
@@ -936,11 +937,15 @@ static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
 	/* No more blkif_request(). */
 	blk_mq_stop_hw_queues(dinfo->rq);
 
-	/* No more gnttab callback work. */
-	gnttab_cancel_free_callback(&rinfo->callback);
+	for (i = 0; i < dinfo->nr_rings; i++) {
+		rinfo = &dinfo->rinfo[i];
 
-	/* Flush gnttab callback work. Must be done with no locks held. */
-	flush_work(&rinfo->work);
+		/* No more gnttab callback work. */
+		gnttab_cancel_free_callback(&rinfo->callback);
+
+		/* Flush gnttab callback work. Must be done with no locks held. */
+		flush_work(&rinfo->work);
+	}
 
 	del_gendisk(dinfo->gd);
 
@@ -977,8 +982,8 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
 {
 	struct grant *persistent_gnt;
 	struct grant *n;
-	int i, j, segs;
-	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
+	int i, j, segs, r_index;
+	struct blkfront_ring_info *rinfo;
 
 	/* Prevent new requests being issued until we fix things up. */
 	spin_lock_irq(&dinfo->io_lock);
@@ -988,100 +993,103 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
 	if (dinfo->rq)
 		blk_mq_stop_hw_queues(dinfo->rq);
 
-	/* Remove all persistent grants */
-	if (!list_empty(&rinfo->grants)) {
-		list_for_each_entry_safe(persistent_gnt, n,
-					 &rinfo->grants, node) {
-			list_del(&persistent_gnt->node);
-			if (persistent_gnt->gref != GRANT_INVALID_REF) {
-				gnttab_end_foreign_access(persistent_gnt->gref,
-				                          0, 0UL);
-				rinfo->persistent_gnts_c--;
+	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
+		rinfo = &dinfo->rinfo[r_index];
+
+		/* Remove all persistent grants */
+		if (!list_empty(&rinfo->grants)) {
+			list_for_each_entry_safe(persistent_gnt, n,
+						 &rinfo->grants, node) {
+				list_del(&persistent_gnt->node);
+				if (persistent_gnt->gref != GRANT_INVALID_REF) {
+					gnttab_end_foreign_access(persistent_gnt->gref,
+								  0, 0UL);
+					rinfo->persistent_gnts_c--;
+				}
+				if (dinfo->feature_persistent)
+					__free_page(pfn_to_page(persistent_gnt->pfn));
+				kfree(persistent_gnt);
 			}
-			if (dinfo->feature_persistent)
-				__free_page(pfn_to_page(persistent_gnt->pfn));
-			kfree(persistent_gnt);
 		}
-	}
-	BUG_ON(rinfo->persistent_gnts_c != 0);
+		BUG_ON(rinfo->persistent_gnts_c != 0);
 
-	/*
-	 * Remove indirect pages, this only happens when using indirect
-	 * descriptors but not persistent grants
-	 */
-	if (!list_empty(&rinfo->indirect_pages)) {
-		struct page *indirect_page, *n;
-
-		BUG_ON(dinfo->feature_persistent);
-		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
-			list_del(&indirect_page->lru);
-			__free_page(indirect_page);
-		}
-	}
-
-	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
 		/*
-		 * Clear persistent grants present in requests already
-		 * on the shared ring
+		 * Remove indirect pages, this only happens when using indirect
+		 * descriptors but not persistent grants
 		 */
-		if (!rinfo->shadow[i].request)
-			goto free_shadow;
-
-		segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
-		       rinfo->shadow[i].req.u.indirect.nr_segments :
-		       rinfo->shadow[i].req.u.rw.nr_segments;
-		for (j = 0; j < segs; j++) {
-			persistent_gnt = rinfo->shadow[i].grants_used[j];
-			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
-			if (dinfo->feature_persistent)
-				__free_page(pfn_to_page(persistent_gnt->pfn));
-			kfree(persistent_gnt);
+		if (!list_empty(&rinfo->indirect_pages)) {
+			struct page *indirect_page, *n;
+
+			BUG_ON(dinfo->feature_persistent);
+			list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
+				list_del(&indirect_page->lru);
+				__free_page(indirect_page);
+			}
 		}
 
-		if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
+		for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
 			/*
-			 * If this is not an indirect operation don't try to
-			 * free indirect segments
+			 * Clear persistent grants present in requests already
+			 * on the shared ring
 			 */
-			goto free_shadow;
+			if (!rinfo->shadow[i].request)
+				goto free_shadow;
+
+			segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
+			       rinfo->shadow[i].req.u.indirect.nr_segments :
+			       rinfo->shadow[i].req.u.rw.nr_segments;
+			for (j = 0; j < segs; j++) {
+				persistent_gnt = rinfo->shadow[i].grants_used[j];
+				gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
+				if (dinfo->feature_persistent)
+					__free_page(pfn_to_page(persistent_gnt->pfn));
+				kfree(persistent_gnt);
+			}
 
-		for (j = 0; j < INDIRECT_GREFS(segs); j++) {
-			persistent_gnt = rinfo->shadow[i].indirect_grants[j];
-			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
-			__free_page(pfn_to_page(persistent_gnt->pfn));
-			kfree(persistent_gnt);
-		}
+			if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
+				/*
+				 * If this is not an indirect operation don't try to
+				 * free indirect segments
+				 */
+				goto free_shadow;
+
+			for (j = 0; j < INDIRECT_GREFS(segs); j++) {
+				persistent_gnt = rinfo->shadow[i].indirect_grants[j];
+				gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
+				__free_page(pfn_to_page(persistent_gnt->pfn));
+				kfree(persistent_gnt);
+			}
 
 free_shadow:
-		kfree(rinfo->shadow[i].grants_used);
-		rinfo->shadow[i].grants_used = NULL;
-		kfree(rinfo->shadow[i].indirect_grants);
-		rinfo->shadow[i].indirect_grants = NULL;
-		kfree(rinfo->shadow[i].sg);
-		rinfo->shadow[i].sg = NULL;
-	}
+			kfree(rinfo->shadow[i].grants_used);
+			rinfo->shadow[i].grants_used = NULL;
+			kfree(rinfo->shadow[i].indirect_grants);
+			rinfo->shadow[i].indirect_grants = NULL;
+			kfree(rinfo->shadow[i].sg);
+			rinfo->shadow[i].sg = NULL;
+		}
 
-	/* No more gnttab callback work. */
-	gnttab_cancel_free_callback(&rinfo->callback);
-	spin_unlock_irq(&dinfo->io_lock);
+		/* No more gnttab callback work. */
+		gnttab_cancel_free_callback(&rinfo->callback);
+		spin_unlock_irq(&dinfo->io_lock);
 
-	/* Flush gnttab callback work. Must be done with no locks held. */
-	flush_work(&rinfo->work);
+		/* Flush gnttab callback work. Must be done with no locks held. */
+		flush_work(&rinfo->work);
 
-	/* Free resources associated with old device channel. */
-	for (i = 0; i < dinfo->nr_ring_pages; i++) {
-		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
-			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
-			rinfo->ring_ref[i] = GRANT_INVALID_REF;
+		/* Free resources associated with old device channel. */
+		for (i = 0; i < dinfo->pages_per_ring; i++) {
+			if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
+				gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
+				rinfo->ring_ref[i] = GRANT_INVALID_REF;
+			}
 		}
-	}
-	free_pages((unsigned long)rinfo->ring.sring, get_order(dinfo->nr_ring_pages * PAGE_SIZE));
-	rinfo->ring.sring = NULL;
-
-	if (rinfo->irq)
-		unbind_from_irqhandler(rinfo->irq, rinfo);
-	rinfo->evtchn = rinfo->irq = 0;
+		free_pages((unsigned long)rinfo->ring.sring, get_order(dinfo->pages_per_ring * PAGE_SIZE));
+		rinfo->ring.sring = NULL;
 
+		if (rinfo->irq)
+			unbind_from_irqhandler(rinfo->irq, rinfo);
+		rinfo->evtchn = rinfo->irq = 0;
+	}
 }
 
 static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *rinfo,
@@ -1276,6 +1284,26 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static void destroy_blkring(struct xenbus_device *dev,
+			    struct blkfront_ring_info *rinfo)
+{
+	int i;
+
+	if (rinfo->irq)
+		unbind_from_irqhandler(rinfo->irq, rinfo);
+	if (rinfo->evtchn)
+		xenbus_free_evtchn(dev, rinfo->evtchn);
+
+	for (i = 0; i < rinfo->dinfo->pages_per_ring; i++) {
+		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
+			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
+			rinfo->ring_ref[i] = GRANT_INVALID_REF;
+		}
+	}
+	free_pages((unsigned long)rinfo->ring.sring,
+		   get_order(rinfo->dinfo->pages_per_ring * PAGE_SIZE));
+	rinfo->ring.sring = NULL;
+}
 
 static int setup_blkring(struct xenbus_device *dev,
 			 struct blkfront_ring_info *rinfo)
@@ -1283,10 +1311,10 @@ static int setup_blkring(struct xenbus_device *dev,
 	struct blkif_sring *sring;
 	int err, i;
 	struct blkfront_dev_info *dinfo = rinfo->dinfo;
-	unsigned long ring_size = dinfo->nr_ring_pages * PAGE_SIZE;
+	unsigned long ring_size = dinfo->pages_per_ring * PAGE_SIZE;
 	grant_ref_t gref[XENBUS_MAX_RING_PAGES];
 
-	for (i = 0; i < dinfo->nr_ring_pages; i++)
+	for (i = 0; i < dinfo->pages_per_ring; i++)
 		rinfo->ring_ref[i] = GRANT_INVALID_REF;
 
 	sring = (struct blkif_sring *)__get_free_pages(GFP_NOIO | __GFP_HIGH,
@@ -1298,13 +1326,13 @@ static int setup_blkring(struct xenbus_device *dev,
 	SHARED_RING_INIT(sring);
 	FRONT_RING_INIT(&rinfo->ring, sring, ring_size);
 
-	err = xenbus_grant_ring(dev, rinfo->ring.sring, dinfo->nr_ring_pages, gref);
+	err = xenbus_grant_ring(dev, rinfo->ring.sring, dinfo->pages_per_ring, gref);
 	if (err < 0) {
 		free_pages((unsigned long)sring, get_order(ring_size));
 		rinfo->ring.sring = NULL;
 		goto fail;
 	}
-	for (i = 0; i < dinfo->nr_ring_pages; i++)
+	for (i = 0; i < dinfo->pages_per_ring; i++)
 		rinfo->ring_ref[i] = gref[i];
 
 	err = xenbus_alloc_evtchn(dev, &rinfo->evtchn);
@@ -1322,7 +1350,7 @@ static int setup_blkring(struct xenbus_device *dev,
 
 	return 0;
 fail:
-	blkif_free(dinfo, 0);
+	destroy_blkring(dev, rinfo);
 	return err;
 }
 
@@ -1333,65 +1361,76 @@ static int talk_to_blkback(struct xenbus_device *dev,
 {
 	const char *message = NULL;
 	struct xenbus_transaction xbt;
-	int err, i;
+	int err, i, r_index;
 	unsigned int max_page_order = 0;
 	unsigned int ring_page_order = 0;
-	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
+	struct blkfront_ring_info *rinfo;
 
 	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
 			   "max-ring-page-order", "%u", &max_page_order);
 	if (err != 1)
-		dinfo->nr_ring_pages = 1;
+		dinfo->pages_per_ring = 1;
 	else {
 		ring_page_order = min(xen_blkif_max_ring_order, max_page_order);
-		dinfo->nr_ring_pages = 1 << ring_page_order;
+		dinfo->pages_per_ring = 1 << ring_page_order;
 	}
 
-	/* Create shared ring, alloc event channel. */
-	err = setup_blkring(dev, rinfo);
-	if (err)
-		goto out;
+	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
+		rinfo = &dinfo->rinfo[r_index];
+		/* Create shared ring, alloc event channel. */
+		err = setup_blkring(dev, rinfo);
+		if (err)
+			goto out;
+	}
 
 again:
 	err = xenbus_transaction_start(&xbt);
 	if (err) {
 		xenbus_dev_fatal(dev, err, "starting transaction");
-		goto destroy_blkring;
+		goto out;
 	}
 
-	if (dinfo->nr_ring_pages == 1) {
-		err = xenbus_printf(xbt, dev->nodename,
-				    "ring-ref", "%u", rinfo->ring_ref[0]);
-		if (err) {
-			message = "writing ring-ref";
-			goto abort_transaction;
-		}
-	} else {
-		err = xenbus_printf(xbt, dev->nodename,
-				    "ring-page-order", "%u", ring_page_order);
-		if (err) {
-			message = "writing ring-page-order";
-			goto abort_transaction;
-		}
-
-		for (i = 0; i < dinfo->nr_ring_pages; i++) {
-			char ring_ref_name[RINGREF_NAME_LEN];
+	if (dinfo->nr_rings == 1) {
+		rinfo = &dinfo->rinfo[0];
 
-			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
-			err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
-					    "%u", rinfo->ring_ref[i]);
+		if (dinfo->pages_per_ring == 1) {
+			err = xenbus_printf(xbt, dev->nodename,
+					    "ring-ref", "%u", rinfo->ring_ref[0]);
 			if (err) {
 				message = "writing ring-ref";
 				goto abort_transaction;
 			}
+		} else {
+			err = xenbus_printf(xbt, dev->nodename,
+					    "ring-page-order", "%u", ring_page_order);
+			if (err) {
+				message = "writing ring-page-order";
+				goto abort_transaction;
+			}
+
+			for (i = 0; i < dinfo->pages_per_ring; i++) {
+				char ring_ref_name[RINGREF_NAME_LEN];
+
+				snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
+				err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
+						    "%u", rinfo->ring_ref[i]);
+				if (err) {
+					message = "writing ring-ref";
+					goto abort_transaction;
+				}
+			}
 		}
-	}
-	err = xenbus_printf(xbt, dev->nodename,
-			    "event-channel", "%u", rinfo->evtchn);
-	if (err) {
-		message = "writing event-channel";
+		err = xenbus_printf(xbt, dev->nodename,
+				    "event-channel", "%u", rinfo->evtchn);
+		if (err) {
+			message = "writing event-channel";
+			goto abort_transaction;
+		}
+	} else {
+		/* Not supported at this stage */
 		goto abort_transaction;
 	}
+
 	err = xenbus_printf(xbt, dev->nodename, "protocol", "%s",
 			    XEN_IO_PROTO_ABI_NATIVE);
 	if (err) {
@@ -1409,12 +1448,16 @@ again:
 		if (err == -EAGAIN)
 			goto again;
 		xenbus_dev_fatal(dev, err, "completing transaction");
-		goto destroy_blkring;
+		goto out;
 	}
 
-	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
-		rinfo->shadow[i].req.u.rw.id = i+1;
-	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
+	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
+		rinfo = &dinfo->rinfo[r_index];
+
+		for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
+			rinfo->shadow[i].req.u.rw.id = i+1;
+		rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
+	}
 	xenbus_switch_state(dev, XenbusStateInitialised);
 
 	return 0;
@@ -1423,9 +1466,9 @@ again:
 	xenbus_transaction_end(xbt, 1);
 	if (message)
 		xenbus_dev_fatal(dev, err, "%s", message);
- destroy_blkring:
-	blkif_free(dinfo, 0);
  out:
+	while (--r_index >= 0)
+		destroy_blkring(dev, &dinfo->rinfo[r_index]);
 	return err;
 }
 
@@ -1438,7 +1481,7 @@ again:
 static int blkfront_probe(struct xenbus_device *dev,
 			  const struct xenbus_device_id *id)
 {
-	int err, vdevice;
+	int err, vdevice, r_index;
 	struct blkfront_dev_info *dinfo;
 	struct blkfront_ring_info *rinfo;
 
@@ -1490,17 +1533,29 @@ static int blkfront_probe(struct xenbus_device *dev,
 		return -ENOMEM;
 	}
 
-	rinfo = &dinfo->rinfo;
 	mutex_init(&dinfo->mutex);
 	spin_lock_init(&dinfo->io_lock);
 	dinfo->xbdev = dev;
 	dinfo->vdevice = vdevice;
-	INIT_LIST_HEAD(&rinfo->grants);
-	INIT_LIST_HEAD(&rinfo->indirect_pages);
-	rinfo->persistent_gnts_c = 0;
 	dinfo->connected = BLKIF_STATE_DISCONNECTED;
-	rinfo->dinfo = dinfo;
-	INIT_WORK(&rinfo->work, blkif_restart_queue);
+
+	dinfo->nr_rings = 1;
+	dinfo->rinfo = kzalloc(sizeof(*rinfo) * dinfo->nr_rings, GFP_KERNEL);
+	if (!dinfo->rinfo) {
+		xenbus_dev_fatal(dev, -ENOMEM, "allocating ring_info structure");
+		kfree(dinfo);
+		return -ENOMEM;
+	}
+
+	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
+		rinfo = &dinfo->rinfo[r_index];
+
+		INIT_LIST_HEAD(&rinfo->grants);
+		INIT_LIST_HEAD(&rinfo->indirect_pages);
+		rinfo->persistent_gnts_c = 0;
+		rinfo->dinfo = dinfo;
+		INIT_WORK(&rinfo->work, blkif_restart_queue);
+	}
 
 	/* Front end dir is a number, which is used as the id. */
 	dinfo->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
@@ -1526,7 +1581,7 @@ static void split_bio_end(struct bio *bio, int error)
 
 static int blkif_recover(struct blkfront_dev_info *dinfo)
 {
-	int i;
+	int i, r_index;
 	struct request *req, *n;
 	struct blk_shadow *copy;
 	int rc;
@@ -1536,56 +1591,62 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
 	int pending, size;
 	struct split_bio *split_bio;
 	struct list_head requests;
-	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
-
-	/* Stage 1: Make a safe copy of the shadow state. */
-	copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
-		       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
-	if (!copy)
-		return -ENOMEM;
-
-	/* Stage 2: Set up free list. */
-	memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
-	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
-		rinfo->shadow[i].req.u.rw.id = i+1;
-	rinfo->shadow_free = rinfo->ring.req_prod_pvt;
-	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
-
-	rc = blkfront_gather_backend_features(dinfo);
-	if (rc) {
-		kfree(copy);
-		return rc;
-	}
+	struct blkfront_ring_info *rinfo;
 
+	__blkfront_gather_backend_features(dinfo);
 	segs = dinfo->max_indirect_segments ? : BLKIF_MAX_SEGMENTS_PER_REQUEST;
 	blk_queue_max_segments(dinfo->rq, segs);
 	bio_list_init(&bio_list);
 	INIT_LIST_HEAD(&requests);
-	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
-		/* Not in use? */
-		if (!copy[i].request)
-			continue;
 
-		/*
-		 * Get the bios in the request so we can re-queue them.
-		 */
-		if (copy[i].request->cmd_flags &
-		    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
+	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
+		rinfo = &dinfo->rinfo[r_index];
+
+		/* Stage 1: Make a safe copy of the shadow state. */
+		copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
+			       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
+		if (!copy)
+			return -ENOMEM;
+
+		/* Stage 2: Set up free list. */
+		memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
+		for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
+			rinfo->shadow[i].req.u.rw.id = i+1;
+		rinfo->shadow_free = rinfo->ring.req_prod_pvt;
+		rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
+
+		rc = blkfront_setup_indirect(rinfo);
+		if (rc) {
+			kfree(copy);
+			return rc;
+		}
+
+		for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
+			/* Not in use? */
+			if (!copy[i].request)
+				continue;
+
 			/*
-			 * Flush operations don't contain bios, so
-			 * we need to requeue the whole request
+			 * Get the bios in the request so we can re-queue them.
 			 */
-			list_add(&copy[i].request->queuelist, &requests);
-			continue;
+			if (copy[i].request->cmd_flags &
+			    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
+				/*
+				 * Flush operations don't contain bios, so
+				 * we need to requeue the whole request
+				 */
+				list_add(&copy[i].request->queuelist, &requests);
+				continue;
+			}
+			merge_bio.head = copy[i].request->bio;
+			merge_bio.tail = copy[i].request->biotail;
+			bio_list_merge(&bio_list, &merge_bio);
+			copy[i].request->bio = NULL;
+			blk_end_request_all(copy[i].request, 0);
 		}
-		merge_bio.head = copy[i].request->bio;
-		merge_bio.tail = copy[i].request->biotail;
-		bio_list_merge(&bio_list, &merge_bio);
-		copy[i].request->bio = NULL;
-		blk_end_request_all(copy[i].request, 0);
-	}
 
-	kfree(copy);
+		kfree(copy);
+	}
 
 	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
 
@@ -1594,8 +1655,12 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
 	/* Now safe for us to use the shared ring */
 	dinfo->connected = BLKIF_STATE_CONNECTED;
 
-	/* Kick any other new requests queued since we resumed */
-	kick_pending_request_queues(rinfo);
+	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
+		rinfo = &dinfo->rinfo[r_index];
+
+		/* Kick any other new requests queued since we resumed */
+		kick_pending_request_queues(rinfo);
+	}
 
 	list_for_each_entry_safe(req, n, &requests, queuelist) {
 		/* Requeue pending requests (flush or discard) */
@@ -1729,6 +1794,38 @@ static void blkfront_setup_discard(struct blkfront_dev_info *dinfo)
 		dinfo->feature_secdiscard = !!discard_secure;
 }
 
+static void blkfront_clean_ring(struct blkfront_ring_info *rinfo)
+{
+	int i;
+
+	for (i = 0; i < BLK_RING_SIZE(rinfo->dinfo); i++) {
+		kfree(rinfo->shadow[i].grants_used);
+		rinfo->shadow[i].grants_used = NULL;
+		kfree(rinfo->shadow[i].sg);
+		rinfo->shadow[i].sg = NULL;
+		kfree(rinfo->shadow[i].indirect_grants);
+		rinfo->shadow[i].indirect_grants = NULL;
+	}
+	if (!list_empty(&rinfo->indirect_pages)) {
+		struct page *indirect_page, *n;
+		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
+			list_del(&indirect_page->lru);
+			__free_page(indirect_page);
+		}
+	}
+
+	if (!list_empty(&rinfo->grants)) {
+		struct grant *gnt_list_entry, *n;
+		list_for_each_entry_safe(gnt_list_entry, n,
+				&rinfo->grants, node) {
+			list_del(&gnt_list_entry->node);
+			if (rinfo->dinfo->feature_persistent)
+				__free_page(pfn_to_page(gnt_list_entry->pfn));
+			kfree(gnt_list_entry);
+		}
+	}
+}
+
 static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
 {
 	unsigned int segs;
@@ -1783,28 +1880,14 @@ static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
 	return 0;
 
 out_of_memory:
-	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
-		kfree(rinfo->shadow[i].grants_used);
-		rinfo->shadow[i].grants_used = NULL;
-		kfree(rinfo->shadow[i].sg);
-		rinfo->shadow[i].sg = NULL;
-		kfree(rinfo->shadow[i].indirect_grants);
-		rinfo->shadow[i].indirect_grants = NULL;
-	}
-	if (!list_empty(&rinfo->indirect_pages)) {
-		struct page *indirect_page, *n;
-		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
-			list_del(&indirect_page->lru);
-			__free_page(indirect_page);
-		}
-	}
+	blkfront_clean_ring(rinfo);
 	return -ENOMEM;
 }
 
 /*
  * Gather all backend feature-*
  */
-static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
+static void __blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
 {
 	int err;
 	int barrier, flush, discard, persistent;
@@ -1859,8 +1942,25 @@ static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
 	else
 		dinfo->max_indirect_segments = min(indirect_segments,
 						  xen_blkif_max_segments);
+}
+
+static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
+{
+	int err, i;
+
+	__blkfront_gather_backend_features(dinfo);
 
-	return blkfront_setup_indirect(&dinfo->rinfo);
+	for (i = 0; i < dinfo->nr_rings; i++) {
+		err = blkfront_setup_indirect(&dinfo->rinfo[i]);
+		if (err)
+			goto out;
+	}
+	return 0;
+
+out:
+	while (--i >= 0)
+		blkfront_clean_ring(&dinfo->rinfo[i]);
+	return err;
 }
 
 /*
@@ -1873,8 +1973,8 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
 	unsigned long sector_size;
 	unsigned int physical_sector_size;
 	unsigned int binfo;
-	int err;
-	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
+	int err, i;
+	struct blkfront_ring_info *rinfo;
 
 	switch (dinfo->connected) {
 	case BLKIF_STATE_CONNECTED:
@@ -1951,7 +2051,10 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
 	/* Kick pending requests. */
 	spin_lock_irq(&dinfo->io_lock);
 	dinfo->connected = BLKIF_STATE_CONNECTED;
-	kick_pending_request_queues(rinfo);
+	for (i = 0; i < dinfo->nr_rings; i++) {
+		rinfo = &dinfo->rinfo[i];
+		kick_pending_request_queues(rinfo);
+	}
 	spin_unlock_irq(&dinfo->io_lock);
 
 	add_disk(dinfo->gd);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 4/9] xen/blkfront: pseudo support for multi hardware queues/rings
  2015-09-05 12:39 [PATCH v3 0/9] xen-block: support multi hardware-queues/rings Bob Liu
                   ` (6 preceding siblings ...)
  2015-09-05 12:39 ` [PATCH v3 4/9] xen/blkfront: pseudo support for multi hardware queues/rings Bob Liu
@ 2015-09-05 12:39 ` Bob Liu
  2015-09-05 12:39   ` Bob Liu
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-09-05 12:39 UTC (permalink / raw)
  To: xen-devel
  Cc: hch, felipe.franciosi, rafal.mielniczuk, linux-kernel,
	jonathan.davies, axboe, Bob Liu, david.vrabel, avanzini.arianna,
	boris.ostrovsky, roger.pau

Preparation patch for multi hardware queues/rings; the number of rings is
forced to 1 for now.

* Use 'nr_rings' in the per-device dev_info to record how many hw queues/rings
  are supported, plus a pointer *rinfo to an array holding all of its rings.
* Rename 'nr_ring_pages' => 'pages_per_ring' to distinguish it more clearly
  from 'nr_rings'.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkfront.c |  513 +++++++++++++++++++++++++-----------------
 1 file changed, 308 insertions(+), 205 deletions(-)
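
A minimal userspace sketch of the layout this change moves to: one dev_info
owning an array of nr_rings ring_info structures, each with a back-pointer to
its dev_info, allocated and initialised with the same loop shape as
blkfront_probe(). Apart from the field names nr_rings, pages_per_ring and
rinfo, the types and helpers below are simplified stand-ins rather than the
kernel definitions:

#include <stdio.h>
#include <stdlib.h>

struct dev_info;

struct ring_info {                   /* stand-in for struct blkfront_ring_info */
	unsigned int id;
	struct dev_info *dinfo;      /* back-pointer, like rinfo->dinfo */
};

struct dev_info {                    /* stand-in for struct blkfront_dev_info */
	unsigned int nr_rings;       /* how many hw queues/rings */
	unsigned int pages_per_ring; /* renamed from nr_ring_pages */
	struct ring_info *rinfo;     /* array of nr_rings entries */
};

static int dev_info_init(struct dev_info *dinfo, unsigned int nr_rings)
{
	unsigned int r;

	dinfo->nr_rings = nr_rings;
	dinfo->pages_per_ring = 1;
	dinfo->rinfo = calloc(nr_rings, sizeof(*dinfo->rinfo));
	if (!dinfo->rinfo)
		return -1;

	/* Same loop shape as the per-ring init added to blkfront_probe(). */
	for (r = 0; r < dinfo->nr_rings; r++) {
		dinfo->rinfo[r].id = r;
		dinfo->rinfo[r].dinfo = dinfo;
	}
	return 0;
}

int main(void)
{
	struct dev_info dinfo;

	if (dev_info_init(&dinfo, 1))	/* the ring number is forced to 1 here */
		return 1;
	printf("nr_rings=%u pages_per_ring=%u\n",
	       dinfo.nr_rings, dinfo.pages_per_ring);
	free(dinfo.rinfo);
	return 0;
}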

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index bf416d5..bf45c99 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
 module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
 MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
 
-#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->nr_ring_pages)
+#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->pages_per_ring)
 #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
 /*
  * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
@@ -157,9 +157,10 @@ struct blkfront_dev_info {
 	unsigned int feature_persistent:1;
 	unsigned int max_indirect_segments;
 	int is_ready;
-	unsigned int nr_ring_pages;
+	unsigned int pages_per_ring;
 	struct blk_mq_tag_set tag_set;
-	struct blkfront_ring_info rinfo;
+	struct blkfront_ring_info *rinfo;
+	unsigned int nr_rings;
 };
 
 static unsigned int nr_minors;
@@ -191,7 +192,7 @@ static DEFINE_SPINLOCK(minor_lock);
 	((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
 
 static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
-static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
+static void __blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
 
 static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
@@ -668,7 +669,7 @@ static int blk_mq_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
 {
 	struct blkfront_dev_info *dinfo = (struct blkfront_dev_info *)data;
 
-	hctx->driver_data = &dinfo->rinfo;
+	hctx->driver_data = &dinfo->rinfo[index];
 	return 0;
 }
 
@@ -927,8 +928,8 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 
 static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
 {
-	unsigned int minor, nr_minors;
-	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
+	unsigned int minor, nr_minors, i;
+	struct blkfront_ring_info *rinfo;
 
 	if (dinfo->rq == NULL)
 		return;
@@ -936,11 +937,15 @@ static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
 	/* No more blkif_request(). */
 	blk_mq_stop_hw_queues(dinfo->rq);
 
-	/* No more gnttab callback work. */
-	gnttab_cancel_free_callback(&rinfo->callback);
+	for (i = 0; i < dinfo->nr_rings; i++) {
+		rinfo = &dinfo->rinfo[i];
 
-	/* Flush gnttab callback work. Must be done with no locks held. */
-	flush_work(&rinfo->work);
+		/* No more gnttab callback work. */
+		gnttab_cancel_free_callback(&rinfo->callback);
+
+		/* Flush gnttab callback work. Must be done with no locks held. */
+		flush_work(&rinfo->work);
+	}
 
 	del_gendisk(dinfo->gd);
 
@@ -977,8 +982,8 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
 {
 	struct grant *persistent_gnt;
 	struct grant *n;
-	int i, j, segs;
-	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
+	int i, j, segs, r_index;
+	struct blkfront_ring_info *rinfo;
 
 	/* Prevent new requests being issued until we fix things up. */
 	spin_lock_irq(&dinfo->io_lock);
@@ -988,100 +993,103 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
 	if (dinfo->rq)
 		blk_mq_stop_hw_queues(dinfo->rq);
 
-	/* Remove all persistent grants */
-	if (!list_empty(&rinfo->grants)) {
-		list_for_each_entry_safe(persistent_gnt, n,
-					 &rinfo->grants, node) {
-			list_del(&persistent_gnt->node);
-			if (persistent_gnt->gref != GRANT_INVALID_REF) {
-				gnttab_end_foreign_access(persistent_gnt->gref,
-				                          0, 0UL);
-				rinfo->persistent_gnts_c--;
+	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
+		rinfo = &dinfo->rinfo[r_index];
+
+		/* Remove all persistent grants */
+		if (!list_empty(&rinfo->grants)) {
+			list_for_each_entry_safe(persistent_gnt, n,
+						 &rinfo->grants, node) {
+				list_del(&persistent_gnt->node);
+				if (persistent_gnt->gref != GRANT_INVALID_REF) {
+					gnttab_end_foreign_access(persistent_gnt->gref,
+								  0, 0UL);
+					rinfo->persistent_gnts_c--;
+				}
+				if (dinfo->feature_persistent)
+					__free_page(pfn_to_page(persistent_gnt->pfn));
+				kfree(persistent_gnt);
 			}
-			if (dinfo->feature_persistent)
-				__free_page(pfn_to_page(persistent_gnt->pfn));
-			kfree(persistent_gnt);
 		}
-	}
-	BUG_ON(rinfo->persistent_gnts_c != 0);
+		BUG_ON(rinfo->persistent_gnts_c != 0);
 
-	/*
-	 * Remove indirect pages, this only happens when using indirect
-	 * descriptors but not persistent grants
-	 */
-	if (!list_empty(&rinfo->indirect_pages)) {
-		struct page *indirect_page, *n;
-
-		BUG_ON(dinfo->feature_persistent);
-		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
-			list_del(&indirect_page->lru);
-			__free_page(indirect_page);
-		}
-	}
-
-	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
 		/*
-		 * Clear persistent grants present in requests already
-		 * on the shared ring
+		 * Remove indirect pages, this only happens when using indirect
+		 * descriptors but not persistent grants
 		 */
-		if (!rinfo->shadow[i].request)
-			goto free_shadow;
-
-		segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
-		       rinfo->shadow[i].req.u.indirect.nr_segments :
-		       rinfo->shadow[i].req.u.rw.nr_segments;
-		for (j = 0; j < segs; j++) {
-			persistent_gnt = rinfo->shadow[i].grants_used[j];
-			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
-			if (dinfo->feature_persistent)
-				__free_page(pfn_to_page(persistent_gnt->pfn));
-			kfree(persistent_gnt);
+		if (!list_empty(&rinfo->indirect_pages)) {
+			struct page *indirect_page, *n;
+
+			BUG_ON(dinfo->feature_persistent);
+			list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
+				list_del(&indirect_page->lru);
+				__free_page(indirect_page);
+			}
 		}
 
-		if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
+		for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
 			/*
-			 * If this is not an indirect operation don't try to
-			 * free indirect segments
+			 * Clear persistent grants present in requests already
+			 * on the shared ring
 			 */
-			goto free_shadow;
+			if (!rinfo->shadow[i].request)
+				goto free_shadow;
+
+			segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
+			       rinfo->shadow[i].req.u.indirect.nr_segments :
+			       rinfo->shadow[i].req.u.rw.nr_segments;
+			for (j = 0; j < segs; j++) {
+				persistent_gnt = rinfo->shadow[i].grants_used[j];
+				gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
+				if (dinfo->feature_persistent)
+					__free_page(pfn_to_page(persistent_gnt->pfn));
+				kfree(persistent_gnt);
+			}
 
-		for (j = 0; j < INDIRECT_GREFS(segs); j++) {
-			persistent_gnt = rinfo->shadow[i].indirect_grants[j];
-			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
-			__free_page(pfn_to_page(persistent_gnt->pfn));
-			kfree(persistent_gnt);
-		}
+			if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
+				/*
+				 * If this is not an indirect operation don't try to
+				 * free indirect segments
+				 */
+				goto free_shadow;
+
+			for (j = 0; j < INDIRECT_GREFS(segs); j++) {
+				persistent_gnt = rinfo->shadow[i].indirect_grants[j];
+				gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
+				__free_page(pfn_to_page(persistent_gnt->pfn));
+				kfree(persistent_gnt);
+			}
 
 free_shadow:
-		kfree(rinfo->shadow[i].grants_used);
-		rinfo->shadow[i].grants_used = NULL;
-		kfree(rinfo->shadow[i].indirect_grants);
-		rinfo->shadow[i].indirect_grants = NULL;
-		kfree(rinfo->shadow[i].sg);
-		rinfo->shadow[i].sg = NULL;
-	}
+			kfree(rinfo->shadow[i].grants_used);
+			rinfo->shadow[i].grants_used = NULL;
+			kfree(rinfo->shadow[i].indirect_grants);
+			rinfo->shadow[i].indirect_grants = NULL;
+			kfree(rinfo->shadow[i].sg);
+			rinfo->shadow[i].sg = NULL;
+		}
 
-	/* No more gnttab callback work. */
-	gnttab_cancel_free_callback(&rinfo->callback);
-	spin_unlock_irq(&dinfo->io_lock);
+		/* No more gnttab callback work. */
+		gnttab_cancel_free_callback(&rinfo->callback);
+		spin_unlock_irq(&dinfo->io_lock);
 
-	/* Flush gnttab callback work. Must be done with no locks held. */
-	flush_work(&rinfo->work);
+		/* Flush gnttab callback work. Must be done with no locks held. */
+		flush_work(&rinfo->work);
 
-	/* Free resources associated with old device channel. */
-	for (i = 0; i < dinfo->nr_ring_pages; i++) {
-		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
-			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
-			rinfo->ring_ref[i] = GRANT_INVALID_REF;
+		/* Free resources associated with old device channel. */
+		for (i = 0; i < dinfo->pages_per_ring; i++) {
+			if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
+				gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
+				rinfo->ring_ref[i] = GRANT_INVALID_REF;
+			}
 		}
-	}
-	free_pages((unsigned long)rinfo->ring.sring, get_order(dinfo->nr_ring_pages * PAGE_SIZE));
-	rinfo->ring.sring = NULL;
-
-	if (rinfo->irq)
-		unbind_from_irqhandler(rinfo->irq, rinfo);
-	rinfo->evtchn = rinfo->irq = 0;
+		free_pages((unsigned long)rinfo->ring.sring, get_order(dinfo->pages_per_ring * PAGE_SIZE));
+		rinfo->ring.sring = NULL;
 
+		if (rinfo->irq)
+			unbind_from_irqhandler(rinfo->irq, rinfo);
+		rinfo->evtchn = rinfo->irq = 0;
+	}
 }
 
 static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *rinfo,
@@ -1276,6 +1284,26 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static void destroy_blkring(struct xenbus_device *dev,
+			    struct blkfront_ring_info *rinfo)
+{
+	int i;
+
+	if (rinfo->irq)
+		unbind_from_irqhandler(rinfo->irq, rinfo);
+	if (rinfo->evtchn)
+		xenbus_free_evtchn(dev, rinfo->evtchn);
+
+	for (i = 0; i < rinfo->dinfo->pages_per_ring; i++) {
+		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
+			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
+			rinfo->ring_ref[i] = GRANT_INVALID_REF;
+		}
+	}
+	free_pages((unsigned long)rinfo->ring.sring,
+		   get_order(rinfo->dinfo->pages_per_ring * PAGE_SIZE));
+	rinfo->ring.sring = NULL;
+}
 
 static int setup_blkring(struct xenbus_device *dev,
 			 struct blkfront_ring_info *rinfo)
@@ -1283,10 +1311,10 @@ static int setup_blkring(struct xenbus_device *dev,
 	struct blkif_sring *sring;
 	int err, i;
 	struct blkfront_dev_info *dinfo = rinfo->dinfo;
-	unsigned long ring_size = dinfo->nr_ring_pages * PAGE_SIZE;
+	unsigned long ring_size = dinfo->pages_per_ring * PAGE_SIZE;
 	grant_ref_t gref[XENBUS_MAX_RING_PAGES];
 
-	for (i = 0; i < dinfo->nr_ring_pages; i++)
+	for (i = 0; i < dinfo->pages_per_ring; i++)
 		rinfo->ring_ref[i] = GRANT_INVALID_REF;
 
 	sring = (struct blkif_sring *)__get_free_pages(GFP_NOIO | __GFP_HIGH,
@@ -1298,13 +1326,13 @@ static int setup_blkring(struct xenbus_device *dev,
 	SHARED_RING_INIT(sring);
 	FRONT_RING_INIT(&rinfo->ring, sring, ring_size);
 
-	err = xenbus_grant_ring(dev, rinfo->ring.sring, dinfo->nr_ring_pages, gref);
+	err = xenbus_grant_ring(dev, rinfo->ring.sring, dinfo->pages_per_ring, gref);
 	if (err < 0) {
 		free_pages((unsigned long)sring, get_order(ring_size));
 		rinfo->ring.sring = NULL;
 		goto fail;
 	}
-	for (i = 0; i < dinfo->nr_ring_pages; i++)
+	for (i = 0; i < dinfo->pages_per_ring; i++)
 		rinfo->ring_ref[i] = gref[i];
 
 	err = xenbus_alloc_evtchn(dev, &rinfo->evtchn);
@@ -1322,7 +1350,7 @@ static int setup_blkring(struct xenbus_device *dev,
 
 	return 0;
 fail:
-	blkif_free(dinfo, 0);
+	destroy_blkring(dev, rinfo);
 	return err;
 }
 
@@ -1333,65 +1361,76 @@ static int talk_to_blkback(struct xenbus_device *dev,
 {
 	const char *message = NULL;
 	struct xenbus_transaction xbt;
-	int err, i;
+	int err, i, r_index;
 	unsigned int max_page_order = 0;
 	unsigned int ring_page_order = 0;
-	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
+	struct blkfront_ring_info *rinfo;
 
 	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
 			   "max-ring-page-order", "%u", &max_page_order);
 	if (err != 1)
-		dinfo->nr_ring_pages = 1;
+		dinfo->pages_per_ring = 1;
 	else {
 		ring_page_order = min(xen_blkif_max_ring_order, max_page_order);
-		dinfo->nr_ring_pages = 1 << ring_page_order;
+		dinfo->pages_per_ring = 1 << ring_page_order;
 	}
 
-	/* Create shared ring, alloc event channel. */
-	err = setup_blkring(dev, rinfo);
-	if (err)
-		goto out;
+	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
+		rinfo = &dinfo->rinfo[r_index];
+		/* Create shared ring, alloc event channel. */
+		err = setup_blkring(dev, rinfo);
+		if (err)
+			goto out;
+	}
 
 again:
 	err = xenbus_transaction_start(&xbt);
 	if (err) {
 		xenbus_dev_fatal(dev, err, "starting transaction");
-		goto destroy_blkring;
+		goto out;
 	}
 
-	if (dinfo->nr_ring_pages == 1) {
-		err = xenbus_printf(xbt, dev->nodename,
-				    "ring-ref", "%u", rinfo->ring_ref[0]);
-		if (err) {
-			message = "writing ring-ref";
-			goto abort_transaction;
-		}
-	} else {
-		err = xenbus_printf(xbt, dev->nodename,
-				    "ring-page-order", "%u", ring_page_order);
-		if (err) {
-			message = "writing ring-page-order";
-			goto abort_transaction;
-		}
-
-		for (i = 0; i < dinfo->nr_ring_pages; i++) {
-			char ring_ref_name[RINGREF_NAME_LEN];
+	if (dinfo->nr_rings == 1) {
+		rinfo = &dinfo->rinfo[0];
 
-			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
-			err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
-					    "%u", rinfo->ring_ref[i]);
+		if (dinfo->pages_per_ring == 1) {
+			err = xenbus_printf(xbt, dev->nodename,
+					    "ring-ref", "%u", rinfo->ring_ref[0]);
 			if (err) {
 				message = "writing ring-ref";
 				goto abort_transaction;
 			}
+		} else {
+			err = xenbus_printf(xbt, dev->nodename,
+					    "ring-page-order", "%u", ring_page_order);
+			if (err) {
+				message = "writing ring-page-order";
+				goto abort_transaction;
+			}
+
+			for (i = 0; i < dinfo->pages_per_ring; i++) {
+				char ring_ref_name[RINGREF_NAME_LEN];
+
+				snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
+				err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
+						    "%u", rinfo->ring_ref[i]);
+				if (err) {
+					message = "writing ring-ref";
+					goto abort_transaction;
+				}
+			}
 		}
-	}
-	err = xenbus_printf(xbt, dev->nodename,
-			    "event-channel", "%u", rinfo->evtchn);
-	if (err) {
-		message = "writing event-channel";
+		err = xenbus_printf(xbt, dev->nodename,
+				    "event-channel", "%u", rinfo->evtchn);
+		if (err) {
+			message = "writing event-channel";
+			goto abort_transaction;
+		}
+	} else {
+		/* Not supported at this stage */
 		goto abort_transaction;
 	}
+
 	err = xenbus_printf(xbt, dev->nodename, "protocol", "%s",
 			    XEN_IO_PROTO_ABI_NATIVE);
 	if (err) {
@@ -1409,12 +1448,16 @@ again:
 		if (err == -EAGAIN)
 			goto again;
 		xenbus_dev_fatal(dev, err, "completing transaction");
-		goto destroy_blkring;
+		goto out;
 	}
 
-	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
-		rinfo->shadow[i].req.u.rw.id = i+1;
-	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
+	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
+		rinfo = &dinfo->rinfo[r_index];
+
+		for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
+			rinfo->shadow[i].req.u.rw.id = i+1;
+		rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
+	}
 	xenbus_switch_state(dev, XenbusStateInitialised);
 
 	return 0;
@@ -1423,9 +1466,9 @@ again:
 	xenbus_transaction_end(xbt, 1);
 	if (message)
 		xenbus_dev_fatal(dev, err, "%s", message);
- destroy_blkring:
-	blkif_free(dinfo, 0);
  out:
+	while (--r_index >= 0)
+		destroy_blkring(dev, &dinfo->rinfo[r_index]);
 	return err;
 }
 
@@ -1438,7 +1481,7 @@ again:
 static int blkfront_probe(struct xenbus_device *dev,
 			  const struct xenbus_device_id *id)
 {
-	int err, vdevice;
+	int err, vdevice, r_index;
 	struct blkfront_dev_info *dinfo;
 	struct blkfront_ring_info *rinfo;
 
@@ -1490,17 +1533,29 @@ static int blkfront_probe(struct xenbus_device *dev,
 		return -ENOMEM;
 	}
 
-	rinfo = &dinfo->rinfo;
 	mutex_init(&dinfo->mutex);
 	spin_lock_init(&dinfo->io_lock);
 	dinfo->xbdev = dev;
 	dinfo->vdevice = vdevice;
-	INIT_LIST_HEAD(&rinfo->grants);
-	INIT_LIST_HEAD(&rinfo->indirect_pages);
-	rinfo->persistent_gnts_c = 0;
 	dinfo->connected = BLKIF_STATE_DISCONNECTED;
-	rinfo->dinfo = dinfo;
-	INIT_WORK(&rinfo->work, blkif_restart_queue);
+
+	dinfo->nr_rings = 1;
+	dinfo->rinfo = kzalloc(sizeof(*rinfo) * dinfo->nr_rings, GFP_KERNEL);
+	if (!dinfo->rinfo) {
+		xenbus_dev_fatal(dev, -ENOMEM, "allocating ring_info structure");
+		kfree(dinfo);
+		return -ENOMEM;
+	}
+
+	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
+		rinfo = &dinfo->rinfo[r_index];
+
+		INIT_LIST_HEAD(&rinfo->grants);
+		INIT_LIST_HEAD(&rinfo->indirect_pages);
+		rinfo->persistent_gnts_c = 0;
+		rinfo->dinfo = dinfo;
+		INIT_WORK(&rinfo->work, blkif_restart_queue);
+	}
 
 	/* Front end dir is a number, which is used as the id. */
 	dinfo->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
@@ -1526,7 +1581,7 @@ static void split_bio_end(struct bio *bio, int error)
 
 static int blkif_recover(struct blkfront_dev_info *dinfo)
 {
-	int i;
+	int i, r_index;
 	struct request *req, *n;
 	struct blk_shadow *copy;
 	int rc;
@@ -1536,56 +1591,62 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
 	int pending, size;
 	struct split_bio *split_bio;
 	struct list_head requests;
-	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
-
-	/* Stage 1: Make a safe copy of the shadow state. */
-	copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
-		       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
-	if (!copy)
-		return -ENOMEM;
-
-	/* Stage 2: Set up free list. */
-	memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
-	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
-		rinfo->shadow[i].req.u.rw.id = i+1;
-	rinfo->shadow_free = rinfo->ring.req_prod_pvt;
-	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
-
-	rc = blkfront_gather_backend_features(dinfo);
-	if (rc) {
-		kfree(copy);
-		return rc;
-	}
+	struct blkfront_ring_info *rinfo;
 
+	__blkfront_gather_backend_features(dinfo);
 	segs = dinfo->max_indirect_segments ? : BLKIF_MAX_SEGMENTS_PER_REQUEST;
 	blk_queue_max_segments(dinfo->rq, segs);
 	bio_list_init(&bio_list);
 	INIT_LIST_HEAD(&requests);
-	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
-		/* Not in use? */
-		if (!copy[i].request)
-			continue;
 
-		/*
-		 * Get the bios in the request so we can re-queue them.
-		 */
-		if (copy[i].request->cmd_flags &
-		    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
+	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
+		rinfo = &dinfo->rinfo[r_index];
+
+		/* Stage 1: Make a safe copy of the shadow state. */
+		copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
+			       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
+		if (!copy)
+			return -ENOMEM;
+
+		/* Stage 2: Set up free list. */
+		memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
+		for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
+			rinfo->shadow[i].req.u.rw.id = i+1;
+		rinfo->shadow_free = rinfo->ring.req_prod_pvt;
+		rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
+
+		rc = blkfront_setup_indirect(rinfo);
+		if (rc) {
+			kfree(copy);
+			return rc;
+		}
+
+		for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
+			/* Not in use? */
+			if (!copy[i].request)
+				continue;
+
 			/*
-			 * Flush operations don't contain bios, so
-			 * we need to requeue the whole request
+			 * Get the bios in the request so we can re-queue them.
 			 */
-			list_add(&copy[i].request->queuelist, &requests);
-			continue;
+			if (copy[i].request->cmd_flags &
+			    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
+				/*
+				 * Flush operations don't contain bios, so
+				 * we need to requeue the whole request
+				 */
+				list_add(&copy[i].request->queuelist, &requests);
+				continue;
+			}
+			merge_bio.head = copy[i].request->bio;
+			merge_bio.tail = copy[i].request->biotail;
+			bio_list_merge(&bio_list, &merge_bio);
+			copy[i].request->bio = NULL;
+			blk_end_request_all(copy[i].request, 0);
 		}
-		merge_bio.head = copy[i].request->bio;
-		merge_bio.tail = copy[i].request->biotail;
-		bio_list_merge(&bio_list, &merge_bio);
-		copy[i].request->bio = NULL;
-		blk_end_request_all(copy[i].request, 0);
-	}
 
-	kfree(copy);
+		kfree(copy);
+	}
 
 	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
 
@@ -1594,8 +1655,12 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
 	/* Now safe for us to use the shared ring */
 	dinfo->connected = BLKIF_STATE_CONNECTED;
 
-	/* Kick any other new requests queued since we resumed */
-	kick_pending_request_queues(rinfo);
+	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
+		rinfo = &dinfo->rinfo[r_index];
+
+		/* Kick any other new requests queued since we resumed */
+		kick_pending_request_queues(rinfo);
+	}
 
 	list_for_each_entry_safe(req, n, &requests, queuelist) {
 		/* Requeue pending requests (flush or discard) */
@@ -1729,6 +1794,38 @@ static void blkfront_setup_discard(struct blkfront_dev_info *dinfo)
 		dinfo->feature_secdiscard = !!discard_secure;
 }
 
+static void blkfront_clean_ring(struct blkfront_ring_info *rinfo)
+{
+	int i;
+
+	for (i = 0; i < BLK_RING_SIZE(rinfo->dinfo); i++) {
+		kfree(rinfo->shadow[i].grants_used);
+		rinfo->shadow[i].grants_used = NULL;
+		kfree(rinfo->shadow[i].sg);
+		rinfo->shadow[i].sg = NULL;
+		kfree(rinfo->shadow[i].indirect_grants);
+		rinfo->shadow[i].indirect_grants = NULL;
+	}
+	if (!list_empty(&rinfo->indirect_pages)) {
+		struct page *indirect_page, *n;
+		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
+			list_del(&indirect_page->lru);
+			__free_page(indirect_page);
+		}
+	}
+
+	if (!list_empty(&rinfo->grants)) {
+		struct grant *gnt_list_entry, *n;
+		list_for_each_entry_safe(gnt_list_entry, n,
+				&rinfo->grants, node) {
+			list_del(&gnt_list_entry->node);
+			if (rinfo->dinfo->feature_persistent)
+				__free_page(pfn_to_page(gnt_list_entry->pfn));
+			kfree(gnt_list_entry);
+		}
+	}
+}
+
 static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
 {
 	unsigned int segs;
@@ -1783,28 +1880,14 @@ static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
 	return 0;
 
 out_of_memory:
-	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
-		kfree(rinfo->shadow[i].grants_used);
-		rinfo->shadow[i].grants_used = NULL;
-		kfree(rinfo->shadow[i].sg);
-		rinfo->shadow[i].sg = NULL;
-		kfree(rinfo->shadow[i].indirect_grants);
-		rinfo->shadow[i].indirect_grants = NULL;
-	}
-	if (!list_empty(&rinfo->indirect_pages)) {
-		struct page *indirect_page, *n;
-		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
-			list_del(&indirect_page->lru);
-			__free_page(indirect_page);
-		}
-	}
+	blkfront_clean_ring(rinfo);
 	return -ENOMEM;
 }
 
 /*
  * Gather all backend feature-*
  */
-static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
+static void __blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
 {
 	int err;
 	int barrier, flush, discard, persistent;
@@ -1859,8 +1942,25 @@ static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
 	else
 		dinfo->max_indirect_segments = min(indirect_segments,
 						  xen_blkif_max_segments);
+}
+
+static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
+{
+	int err, i;
+
+	__blkfront_gather_backend_features(dinfo);
 
-	return blkfront_setup_indirect(&dinfo->rinfo);
+	for (i = 0; i < dinfo->nr_rings; i++) {
+		err = blkfront_setup_indirect(&dinfo->rinfo[i]);
+		if (err)
+			goto out;
+	}
+	return 0;
+
+out:
+	while (--i >= 0)
+		blkfront_clean_ring(&dinfo->rinfo[i]);
+	return err;
 }
 
 /*
@@ -1873,8 +1973,8 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
 	unsigned long sector_size;
 	unsigned int physical_sector_size;
 	unsigned int binfo;
-	int err;
-	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
+	int err, i;
+	struct blkfront_ring_info *rinfo;
 
 	switch (dinfo->connected) {
 	case BLKIF_STATE_CONNECTED:
@@ -1951,7 +2051,10 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
 	/* Kick pending requests. */
 	spin_lock_irq(&dinfo->io_lock);
 	dinfo->connected = BLKIF_STATE_CONNECTED;
-	kick_pending_request_queues(rinfo);
+	for (i = 0; i < dinfo->nr_rings; i++) {
+		rinfo = &dinfo->rinfo[i];
+		kick_pending_request_queues(rinfo);
+	}
 	spin_unlock_irq(&dinfo->io_lock);
 
 	add_disk(dinfo->gd);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 5/9] xen/blkfront: convert per device io_lock to per ring ring_lock
  2015-09-05 12:39 [PATCH v3 0/9] xen-block: support multi hardware-queues/rings Bob Liu
@ 2015-09-05 12:39   ` Bob Liu
  2015-09-05 12:39 ` Bob Liu
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-09-05 12:39 UTC (permalink / raw)
  To: xen-devel
  Cc: david.vrabel, linux-kernel, roger.pau, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies, Bob Liu

The per-device io_lock became a coarse-grained lock once multiple queues/rings
were introduced; this patch converts it to a fine-grained per-ring lock.

NOTE: The per-device dev_info structure is no longer protected by any lock.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkfront.c |   44 +++++++++++++++++++-----------------------
 1 file changed, 20 insertions(+), 24 deletions(-)
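
A userspace sketch of the locking rule this change introduces: every ring
carries its own lock and the submit path only takes the lock of the ring it is
queuing on, roughly the shape of blkif_queue_rq() after the patch. A pthread
mutex stands in for the kernel spinlock, and the ring bookkeeping fields are
invented for the example:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct ring {
	pthread_mutex_t ring_lock;   /* per ring, replacing the shared io_lock */
	unsigned int in_flight;      /* invented bookkeeping for the sketch */
	unsigned int depth;
};

static bool ring_queue_request(struct ring *r)
{
	bool queued = false;

	pthread_mutex_lock(&r->ring_lock);
	if (r->in_flight < r->depth) {       /* plays the role of !RING_FULL() */
		r->in_flight++;
		queued = true;
	}
	pthread_mutex_unlock(&r->ring_lock);
	return queued;
}

int main(void)
{
	struct ring r = {
		.ring_lock = PTHREAD_MUTEX_INITIALIZER,
		.in_flight = 0,
		.depth = 32,
	};

	printf("queued: %s\n", ring_queue_request(&r) ? "yes" : "no");
	return 0;
}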

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index bf45c99..1cae76b 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -123,6 +123,7 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
 struct blkfront_ring_info
 {
 	struct blkif_front_ring ring;
+	spinlock_t ring_lock;
 	unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
 	unsigned int evtchn, irq;
 	struct work_struct work;
@@ -141,7 +142,6 @@ struct blkfront_ring_info
  * putting all kinds of interesting stuff here :-)
  */
 struct blkfront_dev_info {
-	spinlock_t io_lock;
 	struct mutex mutex;
 	struct xenbus_device *xbdev;
 	struct gendisk *gd;
@@ -637,29 +637,28 @@ static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
 			   const struct blk_mq_queue_data *qd)
 {
 	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)hctx->driver_data;
-	struct blkfront_dev_info *dinfo = rinfo->dinfo;
 
 	blk_mq_start_request(qd->rq);
-	spin_lock_irq(&dinfo->io_lock);
+	spin_lock_irq(&rinfo->ring_lock);
 	if (RING_FULL(&rinfo->ring))
 		goto out_busy;
 
-	if (blkif_request_flush_invalid(qd->rq, dinfo))
+	if (blkif_request_flush_invalid(qd->rq, rinfo->dinfo))
 		goto out_err;
 
 	if (blkif_queue_request(qd->rq, rinfo))
 		goto out_busy;
 
 	flush_requests(rinfo);
-	spin_unlock_irq(&dinfo->io_lock);
+	spin_unlock_irq(&rinfo->ring_lock);
 	return BLK_MQ_RQ_QUEUE_OK;
 
 out_err:
-	spin_unlock_irq(&dinfo->io_lock);
+	spin_unlock_irq(&rinfo->ring_lock);
 	return BLK_MQ_RQ_QUEUE_ERROR;
 
 out_busy:
-	spin_unlock_irq(&dinfo->io_lock);
+	spin_unlock_irq(&rinfo->ring_lock);
 	blk_mq_stop_hw_queue(hctx);
 	return BLK_MQ_RQ_QUEUE_BUSY;
 }
@@ -961,7 +960,7 @@ static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
 	dinfo->gd = NULL;
 }
 
-/* Must be called with io_lock holded */
+/* Must be called with ring_lock held */
 static void kick_pending_request_queues(struct blkfront_ring_info *rinfo)
 {
 	if (!RING_FULL(&rinfo->ring))
@@ -972,10 +971,10 @@ static void blkif_restart_queue(struct work_struct *work)
 {
 	struct blkfront_ring_info *rinfo = container_of(work, struct blkfront_ring_info, work);
 
-	spin_lock_irq(&rinfo->dinfo->io_lock);
+	spin_lock_irq(&rinfo->ring_lock);
 	if (rinfo->dinfo->connected == BLKIF_STATE_CONNECTED)
 		kick_pending_request_queues(rinfo);
-	spin_unlock_irq(&rinfo->dinfo->io_lock);
+	spin_unlock_irq(&rinfo->ring_lock);
 }
 
 static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
@@ -986,7 +985,6 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
 	struct blkfront_ring_info *rinfo;
 
 	/* Prevent new requests being issued until we fix things up. */
-	spin_lock_irq(&dinfo->io_lock);
 	dinfo->connected = suspend ?
 		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
 	/* No more blkif_request(). */
@@ -996,6 +994,7 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
 	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
 		rinfo = &dinfo->rinfo[r_index];
 
+		spin_lock_irq(&rinfo->ring_lock);
 		/* Remove all persistent grants */
 		if (!list_empty(&rinfo->grants)) {
 			list_for_each_entry_safe(persistent_gnt, n,
@@ -1071,7 +1070,7 @@ free_shadow:
 
 		/* No more gnttab callback work. */
 		gnttab_cancel_free_callback(&rinfo->callback);
-		spin_unlock_irq(&dinfo->io_lock);
+		spin_unlock_irq(&rinfo->ring_lock);
 
 		/* Flush gnttab callback work. Must be done with no locks held. */
 		flush_work(&rinfo->work);
@@ -1180,13 +1179,10 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
 	struct blkfront_dev_info *dinfo = rinfo->dinfo;
 
-	spin_lock_irqsave(&dinfo->io_lock, flags);
-
-	if (unlikely(dinfo->connected != BLKIF_STATE_CONNECTED)) {
-		spin_unlock_irqrestore(&dinfo->io_lock, flags);
+	if (unlikely(dinfo->connected != BLKIF_STATE_CONNECTED))
 		return IRQ_HANDLED;
-	}
 
+	spin_lock_irqsave(&rinfo->ring_lock, flags);
  again:
 	rp = rinfo->ring.sring->rsp_prod;
 	rmb(); /* Ensure we see queued responses up to 'rp'. */
@@ -1279,7 +1275,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 
 	kick_pending_request_queues(rinfo);
 
-	spin_unlock_irqrestore(&dinfo->io_lock, flags);
+	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
 
 	return IRQ_HANDLED;
 }
@@ -1534,7 +1530,6 @@ static int blkfront_probe(struct xenbus_device *dev,
 	}
 
 	mutex_init(&dinfo->mutex);
-	spin_lock_init(&dinfo->io_lock);
 	dinfo->xbdev = dev;
 	dinfo->vdevice = vdevice;
 	dinfo->connected = BLKIF_STATE_DISCONNECTED;
@@ -1550,6 +1545,7 @@ static int blkfront_probe(struct xenbus_device *dev,
 	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
 		rinfo = &dinfo->rinfo[r_index];
 
+		spin_lock_init(&rinfo->ring_lock);
 		INIT_LIST_HEAD(&rinfo->grants);
 		INIT_LIST_HEAD(&rinfo->indirect_pages);
 		rinfo->persistent_gnts_c = 0;
@@ -1650,16 +1646,16 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
 
 	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
 
-	spin_lock_irq(&dinfo->io_lock);
-
 	/* Now safe for us to use the shared ring */
 	dinfo->connected = BLKIF_STATE_CONNECTED;
 
 	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
 		rinfo = &dinfo->rinfo[r_index];
 
+		spin_lock_irq(&rinfo->ring_lock);
 		/* Kick any other new requests queued since we resumed */
 		kick_pending_request_queues(rinfo);
+		spin_unlock_irq(&rinfo->ring_lock);
 	}
 
 	list_for_each_entry_safe(req, n, &requests, queuelist) {
@@ -1668,7 +1664,6 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
 		BUG_ON(req->nr_phys_segments > segs);
 		blk_mq_requeue_request(req);
 	}
-	spin_unlock_irq(&dinfo->io_lock);
 	blk_mq_kick_requeue_list(dinfo->rq);
 
 	while ((bio = bio_list_pop(&bio_list)) != NULL) {
@@ -2049,13 +2044,14 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
 	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
 
 	/* Kick pending requests. */
-	spin_lock_irq(&dinfo->io_lock);
 	dinfo->connected = BLKIF_STATE_CONNECTED;
 	for (i = 0; i < dinfo->nr_rings; i++) {
 		rinfo = &dinfo->rinfo[i];
+
+		spin_lock_irq(&rinfo->ring_lock);
 		kick_pending_request_queues(rinfo);
+		spin_unlock_irq(&rinfo->ring_lock);
 	}
-	spin_unlock_irq(&dinfo->io_lock);
 
 	add_disk(dinfo->gd);
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 6/9] xen/blkfront: negotiate the number of hw queues/rings with backend
  2015-09-05 12:39 [PATCH v3 0/9] xen-block: support multi hardware-queues/rings Bob Liu
@ 2015-09-05 12:39   ` Bob Liu
  2015-09-05 12:39 ` Bob Liu
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-09-05 12:39 UTC (permalink / raw)
  To: xen-devel
  Cc: david.vrabel, linux-kernel, roger.pau, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies, Bob Liu

The maximum number of hardware queues for xen/blkfront is set by the module
parameter 'max_queues', while the number supported by xen/blkback is
advertised through xenstore ("multi-queue-max-queues").

The negotiated number is the smaller of the two, and is written back to
xen/blkback as "multi-queue-num-queues".

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkfront.c |  142 ++++++++++++++++++++++++++++++++----------
 1 file changed, 108 insertions(+), 34 deletions(-)
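
A standalone sketch of the negotiation described above: the smaller of the
frontend's 'max_queues' parameter and the backend's advertised
"multi-queue-max-queues" becomes the number of rings, which the frontend
writes back as "multi-queue-num-queues". The xenstore reads and writes are
replaced by plain variables here, and falling back to a single ring when the
backend advertises nothing is an assumption of this sketch:

#include <stdio.h>

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static unsigned int negotiate_nr_rings(unsigned int frontend_max,
				       unsigned int backend_max)
{
	unsigned int nr_rings = min_u(frontend_max, backend_max);

	return nr_rings ? nr_rings : 1;      /* assume at least one ring */
}

int main(void)
{
	unsigned int xen_blkif_max_queues = 4;   /* frontend module parameter */
	unsigned int backend_max_queues = 8;     /* read from xenstore in the driver */

	/* The driver would write this value back as "multi-queue-num-queues". */
	printf("negotiated nr_rings = %u\n",
	       negotiate_nr_rings(xen_blkif_max_queues, backend_max_queues));
	return 0;
}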

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 1cae76b..1aa66c9 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -107,6 +107,10 @@ static unsigned int xen_blkif_max_ring_order;
 module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
 MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
 
+static unsigned int xen_blkif_max_queues;
+module_param_named(max_queues, xen_blkif_max_queues, uint, S_IRUGO);
+MODULE_PARM_DESC(max_queues, "Maximum number of hardware queues/rings per virtual disk");
+
 #define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->pages_per_ring)
 #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
 /*
@@ -114,6 +118,7 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
  * characters are enough. Define to 20 to keep consist with backend.
  */
 #define RINGREF_NAME_LEN (20)
+#define QUEUE_NAME_LEN (12)
 
 /*
  *  Per-ring info.
@@ -687,7 +692,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 
 	memset(&dinfo->tag_set, 0, sizeof(dinfo->tag_set));
 	dinfo->tag_set.ops = &blkfront_mq_ops;
-	dinfo->tag_set.nr_hw_queues = 1;
+	dinfo->tag_set.nr_hw_queues = dinfo->nr_rings;
 	dinfo->tag_set.queue_depth =  BLK_RING_SIZE(dinfo);
 	dinfo->tag_set.numa_node = NUMA_NO_NODE;
 	dinfo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
@@ -1350,6 +1355,51 @@ fail:
 	return err;
 }
 
+static int write_per_ring_nodes(struct xenbus_transaction xbt,
+				struct blkfront_ring_info *rinfo, const char *dir)
+{
+	int err, i;
+	const char *message = NULL;
+	struct blkfront_dev_info *dinfo = rinfo->dinfo;
+
+	if (dinfo->pages_per_ring == 1) {
+		err = xenbus_printf(xbt, dir, "ring-ref", "%u", rinfo->ring_ref[0]);
+		if (err) {
+			message = "writing ring-ref";
+			goto abort_transaction;
+		}
+		pr_info("%s: write ring-ref:%d\n", dir, rinfo->ring_ref[0]);
+	} else {
+		for (i = 0; i < dinfo->pages_per_ring; i++) {
+			char ring_ref_name[RINGREF_NAME_LEN];
+
+			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
+			err = xenbus_printf(xbt, dir, ring_ref_name,
+					    "%u", rinfo->ring_ref[i]);
+			if (err) {
+				message = "writing ring-ref";
+				goto abort_transaction;
+			}
+			pr_info("%s: write ring-ref:%d\n", dir, rinfo->ring_ref[i]);
+		}
+	}
+
+	err = xenbus_printf(xbt, dir, "event-channel", "%u", rinfo->evtchn);
+	if (err) {
+		message = "writing event-channel";
+		goto abort_transaction;
+	}
+	pr_info("%s: write event-channel:%d\n", dir, rinfo->evtchn);
+
+	return 0;
+
+abort_transaction:
+	xenbus_transaction_end(xbt, 1);
+	if (message)
+		xenbus_dev_fatal(dinfo->xbdev, err, "%s", message);
+
+	return err;
+}
 
 /* Common code used when first setting up, and when resuming. */
 static int talk_to_blkback(struct xenbus_device *dev,
@@ -1386,45 +1436,51 @@ again:
 		goto out;
 	}
 
+	if (dinfo->pages_per_ring > 1) {
+		err = xenbus_printf(xbt, dev->nodename, "ring-page-order", "%u",
+				    ring_page_order);
+		if (err) {
+			message = "writing ring-page-order";
+			goto abort_transaction;
+		}
+	}
+
+	/* We already got the number of queues in _probe */
 	if (dinfo->nr_rings == 1) {
 		rinfo = &dinfo->rinfo[0];
 
-		if (dinfo->pages_per_ring == 1) {
-			err = xenbus_printf(xbt, dev->nodename,
-					    "ring-ref", "%u", rinfo->ring_ref[0]);
-			if (err) {
-				message = "writing ring-ref";
-				goto abort_transaction;
-			}
-		} else {
-			err = xenbus_printf(xbt, dev->nodename,
-					    "ring-page-order", "%u", ring_page_order);
-			if (err) {
-				message = "writing ring-page-order";
-				goto abort_transaction;
-			}
-
-			for (i = 0; i < dinfo->pages_per_ring; i++) {
-				char ring_ref_name[RINGREF_NAME_LEN];
+		err = write_per_ring_nodes(xbt, &dinfo->rinfo[0], dev->nodename);
+		if (err)
+			goto out;
+	} else {
+		char *path;
+		size_t pathsize;
 
-				snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
-				err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
-						    "%u", rinfo->ring_ref[i]);
-				if (err) {
-					message = "writing ring-ref";
-					goto abort_transaction;
-				}
-			}
-		}
-		err = xenbus_printf(xbt, dev->nodename,
-				    "event-channel", "%u", rinfo->evtchn);
+		err = xenbus_printf(xbt, dev->nodename, "multi-queue-num-queues", "%u",
+				    dinfo->nr_rings);
 		if (err) {
-			message = "writing event-channel";
+			message = "writing multi-queue-num-queues";
 			goto abort_transaction;
 		}
-	} else {
-		/* Not supported at this stage */
-		goto abort_transaction;
+
+		pathsize = strlen(dev->nodename) + QUEUE_NAME_LEN;
+		path = kzalloc(pathsize, GFP_KERNEL);
+		if (!path) {
+			err = -ENOMEM;
+			message = "ENOMEM while writing ring references";
+			goto abort_transaction;
+		}
+
+		for (i = 0; i < dinfo->nr_rings; i++) {
+			memset(path, 0, pathsize);
+			snprintf(path, pathsize, "%s/queue-%u", dev->nodename, i);
+			err = write_per_ring_nodes(xbt, &dinfo->rinfo[i], path);
+			if (err) {
+				kfree(path);
+				goto out;
+			}
+		}
+		kfree(path);
 	}
 
 	err = xenbus_printf(xbt, dev->nodename, "protocol", "%s",
@@ -1480,6 +1536,7 @@ static int blkfront_probe(struct xenbus_device *dev,
 	int err, vdevice, r_index;
 	struct blkfront_dev_info *dinfo;
 	struct blkfront_ring_info *rinfo;
+	unsigned int back_max_queues = 0;
 
 	/* FIXME: Use dynamic device id if this is not set. */
 	err = xenbus_scanf(XBT_NIL, dev->nodename,
@@ -1534,7 +1591,17 @@ static int blkfront_probe(struct xenbus_device *dev,
 	dinfo->vdevice = vdevice;
 	dinfo->connected = BLKIF_STATE_DISCONNECTED;
 
-	dinfo->nr_rings = 1;
+	/* Check if backend supports multiple queues */
+	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
+			   "multi-queue-max-queues", "%u", &back_max_queues);
+	if (err < 0)
+		back_max_queues = 1;
+
+	dinfo->nr_rings = min(back_max_queues, xen_blkif_max_queues);
+	if (dinfo->nr_rings <= 0)
+		dinfo->nr_rings = 1;
+	pr_info("dinfo->nr_rings:%u, backend support max-queues:%u\n", dinfo->nr_rings, back_max_queues);
+
 	dinfo->rinfo = kzalloc(sizeof(*rinfo) * dinfo->nr_rings, GFP_KERNEL);
 	if (!dinfo->rinfo) {
 		xenbus_dev_fatal(dev, -ENOMEM, "allocating ring_info structure");
@@ -2257,6 +2324,7 @@ static struct xenbus_driver blkfront_driver = {
 static int __init xlblk_init(void)
 {
 	int ret;
+	int nr_cpus = num_online_cpus();
 
 	if (!xen_domain())
 		return -ENODEV;
@@ -2267,6 +2335,12 @@ static int __init xlblk_init(void)
 		xen_blkif_max_ring_order = 0;
 	}
 
+	if (xen_blkif_max_queues > nr_cpus) {
+		pr_info("Invalid max_queues (%d), will use default max: %d.\n",
+			xen_blkif_max_queues, nr_cpus);
+		xen_blkif_max_queues = nr_cpus;
+	}
+
 	if (!xen_has_pv_disk_devices())
 		return -ENODEV;
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 83+ messages in thread
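
When more than one ring is negotiated, the patch above writes the per-ring
nodes under "queue-<n>" subdirectories of the frontend's xenbus node. A rough
sketch of how the other end could read one queue back; read_per_queue() is a
hypothetical helper covering only the single-page-ring case, while
xenbus_scanf() and the node names follow the patch:

	/* Hypothetical helper: read the nodes written by write_per_ring_nodes(). */
	static int read_per_queue(struct xenbus_device *dev, unsigned int queue,
				  unsigned int *ring_ref, unsigned int *evtchn)
	{
		char dir[64];
		int err;

		/* Same "%s/queue-%u" layout used by talk_to_blkback() above. */
		snprintf(dir, sizeof(dir), "%s/queue-%u", dev->otherend, queue);

		err = xenbus_scanf(XBT_NIL, dir, "ring-ref", "%u", ring_ref);
		if (err < 0)
			return err;

		err = xenbus_scanf(XBT_NIL, dir, "event-channel", "%u", evtchn);
		return err < 0 ? err : 0;
	}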

* [PATCH v3 7/9] xen/blkback: separate ring information out of struct xen_blkif
  2015-09-05 12:39 [PATCH v3 0/9] xen-block: support multi hardware-queues/rings Bob Liu
@ 2015-09-05 12:39   ` Bob Liu
  2015-09-05 12:39 ` Bob Liu
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-09-05 12:39 UTC (permalink / raw)
  To: xen-devel
  Cc: david.vrabel, linux-kernel, roger.pau, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies, Bob Liu

Split per-ring information out into a new structure, xen_blkif_ring, so that
one vbd device can be associated with one or more rings/hardware queues.

This patch is a preparation for supporting multiple hardware queues/rings.
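
In outline, the split keeps device-wide state in struct xen_blkif and moves
everything the per-ring kthread touches into the new structure. A condensed
view, with fields abbreviated from the common.h hunk below, so treat it as a
summary rather than the exact layout:

	/* Editor's condensed view of the split (see the common.h hunk for the full list). */
	struct xen_blkif_ring {
		unsigned int		irq;		/* per-ring event channel */
		union blkif_back_rings	blk_rings;	/* mapped shared ring */
		spinlock_t		blk_ring_lock;
		struct task_struct	*xenblkd;	/* one kthread per ring */
		atomic_t		inflight;
		/* ... persistent grants, pending_free pool, per-ring stats ... */
		struct xen_blkif	*blkif;		/* back-pointer to the device */
	};

	struct xen_blkif {
		domid_t			domid;
		enum blkif_protocol	blk_protocol;
		struct xen_vbd		vbd;		/* the attached VBD */
		/* ... drain/refcnt/aggregated statistics ... */
		struct xen_blkif_ring	ring;		/* still a single ring in this patch */
	};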

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkback/blkback.c |  365 ++++++++++++++++++-----------------
 drivers/block/xen-blkback/common.h  |   52 +++--
 drivers/block/xen-blkback/xenbus.c  |  130 +++++++------
 3 files changed, 295 insertions(+), 252 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 954c002..fd02240 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -113,71 +113,71 @@ module_param(log_stats, int, 0644);
 /* Number of free pages to remove on each call to gnttab_free_pages */
 #define NUM_BATCH_FREE_PAGES 10
 
-static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
+static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
-	if (list_empty(&blkif->free_pages)) {
-		BUG_ON(blkif->free_pages_num != 0);
-		spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
+	if (list_empty(&ring->free_pages)) {
+		BUG_ON(ring->free_pages_num != 0);
+		spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 		return gnttab_alloc_pages(1, page);
 	}
-	BUG_ON(blkif->free_pages_num == 0);
-	page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
+	BUG_ON(ring->free_pages_num == 0);
+	page[0] = list_first_entry(&ring->free_pages, struct page, lru);
 	list_del(&page[0]->lru);
-	blkif->free_pages_num--;
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	ring->free_pages_num--;
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 
 	return 0;
 }
 
-static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
+static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **page,
                                   int num)
 {
 	unsigned long flags;
 	int i;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
 	for (i = 0; i < num; i++)
-		list_add(&page[i]->lru, &blkif->free_pages);
-	blkif->free_pages_num += num;
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+		list_add(&page[i]->lru, &ring->free_pages);
+	ring->free_pages_num += num;
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 }
 
-static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
+static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
 {
 	/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
 	struct page *page[NUM_BATCH_FREE_PAGES];
 	unsigned int num_pages = 0;
 	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
-	while (blkif->free_pages_num > num) {
-		BUG_ON(list_empty(&blkif->free_pages));
-		page[num_pages] = list_first_entry(&blkif->free_pages,
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
+	while (ring->free_pages_num > num) {
+		BUG_ON(list_empty(&ring->free_pages));
+		page[num_pages] = list_first_entry(&ring->free_pages,
 		                                   struct page, lru);
 		list_del(&page[num_pages]->lru);
-		blkif->free_pages_num--;
+		ring->free_pages_num--;
 		if (++num_pages == NUM_BATCH_FREE_PAGES) {
-			spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+			spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 			gnttab_free_pages(num_pages, page);
-			spin_lock_irqsave(&blkif->free_pages_lock, flags);
+			spin_lock_irqsave(&ring->free_pages_lock, flags);
 			num_pages = 0;
 		}
 	}
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 	if (num_pages != 0)
 		gnttab_free_pages(num_pages, page);
 }
 
 #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
 
-static int do_block_io_op(struct xen_blkif *blkif);
-static int dispatch_rw_block_io(struct xen_blkif *blkif,
+static int do_block_io_op(struct xen_blkif_ring *ring);
+static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 				struct blkif_request *req,
 				struct pending_req *pending_req);
-static void make_response(struct xen_blkif *blkif, u64 id,
+static void make_response(struct xen_blkif_ring *ring, u64 id,
 			  unsigned short op, int st);
 
 #define foreach_grant_safe(pos, n, rbtree, node) \
@@ -198,19 +198,19 @@ static void make_response(struct xen_blkif *blkif, u64 id,
  * bit operations to modify the flags of a persistent grant and to count
  * the number of used grants.
  */
-static int add_persistent_gnt(struct xen_blkif *blkif,
+static int add_persistent_gnt(struct xen_blkif_ring *ring,
 			       struct persistent_gnt *persistent_gnt)
 {
 	struct rb_node **new = NULL, *parent = NULL;
 	struct persistent_gnt *this;
 
-	if (blkif->persistent_gnt_c >= xen_blkif_max_pgrants) {
-		if (!blkif->vbd.overflow_max_grants)
-			blkif->vbd.overflow_max_grants = 1;
+	if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+		if (!ring->blkif->vbd.overflow_max_grants)
+			ring->blkif->vbd.overflow_max_grants = 1;
 		return -EBUSY;
 	}
 	/* Figure out where to put new node */
-	new = &blkif->persistent_gnts.rb_node;
+	new = &ring->persistent_gnts.rb_node;
 	while (*new) {
 		this = container_of(*new, struct persistent_gnt, node);
 
@@ -229,19 +229,19 @@ static int add_persistent_gnt(struct xen_blkif *blkif,
 	set_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
 	/* Add new node and rebalance tree. */
 	rb_link_node(&(persistent_gnt->node), parent, new);
-	rb_insert_color(&(persistent_gnt->node), &blkif->persistent_gnts);
-	blkif->persistent_gnt_c++;
-	atomic_inc(&blkif->persistent_gnt_in_use);
+	rb_insert_color(&(persistent_gnt->node), &ring->persistent_gnts);
+	ring->persistent_gnt_c++;
+	atomic_inc(&ring->persistent_gnt_in_use);
 	return 0;
 }
 
-static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
-						 grant_ref_t gref)
+static struct persistent_gnt *get_persistent_gnt(struct xen_blkif_ring *ring,
+						grant_ref_t gref)
 {
 	struct persistent_gnt *data;
 	struct rb_node *node = NULL;
 
-	node = blkif->persistent_gnts.rb_node;
+	node = ring->persistent_gnts.rb_node;
 	while (node) {
 		data = container_of(node, struct persistent_gnt, node);
 
@@ -255,24 +255,24 @@ static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
 				return NULL;
 			}
 			set_bit(PERSISTENT_GNT_ACTIVE, data->flags);
-			atomic_inc(&blkif->persistent_gnt_in_use);
+			atomic_inc(&ring->persistent_gnt_in_use);
 			return data;
 		}
 	}
 	return NULL;
 }
 
-static void put_persistent_gnt(struct xen_blkif *blkif,
+static void put_persistent_gnt(struct xen_blkif_ring *ring,
                                struct persistent_gnt *persistent_gnt)
 {
 	if(!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
 		pr_alert_ratelimited("freeing a grant already unused\n");
 	set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
 	clear_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
-	atomic_dec(&blkif->persistent_gnt_in_use);
+	atomic_dec(&ring->persistent_gnt_in_use);
 }
 
-static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
+static void free_persistent_gnts(struct xen_blkif_ring *ring, struct rb_root *root,
                                  unsigned int num)
 {
 	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
@@ -303,7 +303,7 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
 			unmap_data.count = segs_to_unmap;
 			BUG_ON(gnttab_unmap_refs_sync(&unmap_data));
 
-			put_free_pages(blkif, pages, segs_to_unmap);
+			put_free_pages(ring, pages, segs_to_unmap);
 			segs_to_unmap = 0;
 		}
 
@@ -320,15 +320,15 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
 	struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 	struct persistent_gnt *persistent_gnt;
 	int segs_to_unmap = 0;
-	struct xen_blkif *blkif = container_of(work, typeof(*blkif), persistent_purge_work);
+	struct xen_blkif_ring *ring = container_of(work, typeof(*ring), persistent_purge_work);
 	struct gntab_unmap_queue_data unmap_data;
 
 	unmap_data.pages = pages;
 	unmap_data.unmap_ops = unmap;
 	unmap_data.kunmap_ops = NULL;
 
-	while(!list_empty(&blkif->persistent_purge_list)) {
-		persistent_gnt = list_first_entry(&blkif->persistent_purge_list,
+	while(!list_empty(&ring->persistent_purge_list)) {
+		persistent_gnt = list_first_entry(&ring->persistent_purge_list,
 		                                  struct persistent_gnt,
 		                                  remove_node);
 		list_del(&persistent_gnt->remove_node);
@@ -343,7 +343,7 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
 		if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
 			unmap_data.count = segs_to_unmap;
 			BUG_ON(gnttab_unmap_refs_sync(&unmap_data));
-			put_free_pages(blkif, pages, segs_to_unmap);
+			put_free_pages(ring, pages, segs_to_unmap);
 			segs_to_unmap = 0;
 		}
 		kfree(persistent_gnt);
@@ -351,34 +351,35 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
 	if (segs_to_unmap > 0) {
 		unmap_data.count = segs_to_unmap;
 		BUG_ON(gnttab_unmap_refs_sync(&unmap_data));
-		put_free_pages(blkif, pages, segs_to_unmap);
+		put_free_pages(ring, pages, segs_to_unmap);
 	}
 }
 
-static void purge_persistent_gnt(struct xen_blkif *blkif)
+static void purge_persistent_gnt(struct xen_blkif_ring *ring)
 {
 	struct persistent_gnt *persistent_gnt;
 	struct rb_node *n;
 	unsigned int num_clean, total;
 	bool scan_used = false, clean_used = false;
 	struct rb_root *root;
+	struct xen_blkif *blkif = ring->blkif;
 
-	if (blkif->persistent_gnt_c < xen_blkif_max_pgrants ||
-	    (blkif->persistent_gnt_c == xen_blkif_max_pgrants &&
+	if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
+	    (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
 	    !blkif->vbd.overflow_max_grants)) {
 		return;
 	}
 
-	if (work_busy(&blkif->persistent_purge_work)) {
+	if (work_busy(&ring->persistent_purge_work)) {
 		pr_alert_ratelimited("Scheduled work from previous purge is still busy, cannot purge list\n");
 		return;
 	}
 
 	num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-	num_clean = blkif->persistent_gnt_c - xen_blkif_max_pgrants + num_clean;
-	num_clean = min(blkif->persistent_gnt_c, num_clean);
+	num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + num_clean;
+	num_clean = min(ring->persistent_gnt_c, num_clean);
 	if ((num_clean == 0) ||
-	    (num_clean > (blkif->persistent_gnt_c - atomic_read(&blkif->persistent_gnt_in_use))))
+	    (num_clean > (ring->persistent_gnt_c - atomic_read(&ring->persistent_gnt_in_use))))
 		return;
 
 	/*
@@ -394,8 +395,8 @@ static void purge_persistent_gnt(struct xen_blkif *blkif)
 
 	pr_debug("Going to purge %u persistent grants\n", num_clean);
 
-	BUG_ON(!list_empty(&blkif->persistent_purge_list));
-	root = &blkif->persistent_gnts;
+	BUG_ON(!list_empty(&ring->persistent_purge_list));
+	root = &ring->persistent_gnts;
 purge_list:
 	foreach_grant_safe(persistent_gnt, n, root, node) {
 		BUG_ON(persistent_gnt->handle ==
@@ -414,7 +415,7 @@ purge_list:
 
 		rb_erase(&persistent_gnt->node, root);
 		list_add(&persistent_gnt->remove_node,
-		         &blkif->persistent_purge_list);
+			 &ring->persistent_purge_list);
 		if (--num_clean == 0)
 			goto finished;
 	}
@@ -435,11 +436,11 @@ finished:
 		goto purge_list;
 	}
 
-	blkif->persistent_gnt_c -= (total - num_clean);
+	ring->persistent_gnt_c -= (total - num_clean);
 	blkif->vbd.overflow_max_grants = 0;
 
 	/* We can defer this work */
-	schedule_work(&blkif->persistent_purge_work);
+	schedule_work(&ring->persistent_purge_work);
 	pr_debug("Purged %u/%u\n", (total - num_clean), total);
 	return;
 }
@@ -447,18 +448,18 @@ finished:
 /*
  * Retrieve from the 'pending_reqs' a free pending_req structure to be used.
  */
-static struct pending_req *alloc_req(struct xen_blkif *blkif)
+static struct pending_req *alloc_req(struct xen_blkif_ring *ring)
 {
 	struct pending_req *req = NULL;
 	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->pending_free_lock, flags);
-	if (!list_empty(&blkif->pending_free)) {
-		req = list_entry(blkif->pending_free.next, struct pending_req,
+	spin_lock_irqsave(&ring->pending_free_lock, flags);
+	if (!list_empty(&ring->pending_free)) {
+		req = list_entry(ring->pending_free.next, struct pending_req,
 				 free_list);
 		list_del(&req->free_list);
 	}
-	spin_unlock_irqrestore(&blkif->pending_free_lock, flags);
+	spin_unlock_irqrestore(&ring->pending_free_lock, flags);
 	return req;
 }
 
@@ -466,17 +467,17 @@ static struct pending_req *alloc_req(struct xen_blkif *blkif)
  * Return the 'pending_req' structure back to the freepool. We also
  * wake up the thread if it was waiting for a free page.
  */
-static void free_req(struct xen_blkif *blkif, struct pending_req *req)
+static void free_req(struct xen_blkif_ring *ring, struct pending_req *req)
 {
 	unsigned long flags;
 	int was_empty;
 
-	spin_lock_irqsave(&blkif->pending_free_lock, flags);
-	was_empty = list_empty(&blkif->pending_free);
-	list_add(&req->free_list, &blkif->pending_free);
-	spin_unlock_irqrestore(&blkif->pending_free_lock, flags);
+	spin_lock_irqsave(&ring->pending_free_lock, flags);
+	was_empty = list_empty(&ring->pending_free);
+	list_add(&req->free_list, &ring->pending_free);
+	spin_unlock_irqrestore(&ring->pending_free_lock, flags);
 	if (was_empty)
-		wake_up(&blkif->pending_free_wq);
+		wake_up(&ring->pending_free_wq);
 }
 
 /*
@@ -556,10 +557,10 @@ abort:
 /*
  * Notification from the guest OS.
  */
-static void blkif_notify_work(struct xen_blkif *blkif)
+static void blkif_notify_work(struct xen_blkif_ring *ring)
 {
-	blkif->waiting_reqs = 1;
-	wake_up(&blkif->wq);
+	ring->waiting_reqs = 1;
+	wake_up(&ring->wq);
 }
 
 irqreturn_t xen_blkif_be_int(int irq, void *dev_id)
@@ -572,25 +573,26 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id)
  * SCHEDULER FUNCTIONS
  */
 
-static void print_stats(struct xen_blkif *blkif)
+static void print_stats(struct xen_blkif_ring *ring)
 {
 	pr_info("(%s): oo %3llu  |  rd %4llu  |  wr %4llu  |  f %4llu"
 		 "  |  ds %4llu | pg: %4u/%4d\n",
-		 current->comm, blkif->st_oo_req,
-		 blkif->st_rd_req, blkif->st_wr_req,
-		 blkif->st_f_req, blkif->st_ds_req,
-		 blkif->persistent_gnt_c,
+		 current->comm, ring->st_oo_req,
+		 ring->st_rd_req, ring->st_wr_req,
+		 ring->st_f_req, ring->st_ds_req,
+		 ring->persistent_gnt_c,
 		 xen_blkif_max_pgrants);
-	blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000);
-	blkif->st_rd_req = 0;
-	blkif->st_wr_req = 0;
-	blkif->st_oo_req = 0;
-	blkif->st_ds_req = 0;
+	ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
+	ring->st_rd_req = 0;
+	ring->st_wr_req = 0;
+	ring->st_oo_req = 0;
+	ring->st_ds_req = 0;
 }
 
 int xen_blkif_schedule(void *arg)
 {
-	struct xen_blkif *blkif = arg;
+	struct xen_blkif_ring *ring = arg;
+	struct xen_blkif *blkif = ring->blkif;
 	struct xen_vbd *vbd = &blkif->vbd;
 	unsigned long timeout;
 	int ret;
@@ -606,50 +608,50 @@ int xen_blkif_schedule(void *arg)
 		timeout = msecs_to_jiffies(LRU_INTERVAL);
 
 		timeout = wait_event_interruptible_timeout(
-			blkif->wq,
-			blkif->waiting_reqs || kthread_should_stop(),
+			ring->wq,
+			ring->waiting_reqs || kthread_should_stop(),
 			timeout);
 		if (timeout == 0)
 			goto purge_gnt_list;
 		timeout = wait_event_interruptible_timeout(
-			blkif->pending_free_wq,
-			!list_empty(&blkif->pending_free) ||
+			ring->pending_free_wq,
+			!list_empty(&ring->pending_free) ||
 			kthread_should_stop(),
 			timeout);
 		if (timeout == 0)
 			goto purge_gnt_list;
 
-		blkif->waiting_reqs = 0;
+		ring->waiting_reqs = 0;
 		smp_mb(); /* clear flag *before* checking for work */
 
-		ret = do_block_io_op(blkif);
+		ret = do_block_io_op(ring);
 		if (ret > 0)
-			blkif->waiting_reqs = 1;
+			ring->waiting_reqs = 1;
 		if (ret == -EACCES)
-			wait_event_interruptible(blkif->shutdown_wq,
+			wait_event_interruptible(ring->shutdown_wq,
 						 kthread_should_stop());
 
 purge_gnt_list:
 		if (blkif->vbd.feature_gnt_persistent &&
-		    time_after(jiffies, blkif->next_lru)) {
-			purge_persistent_gnt(blkif);
-			blkif->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL);
+		    time_after(jiffies, ring->next_lru)) {
+			purge_persistent_gnt(ring);
+			ring->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL);
 		}
 
 		/* Shrink if we have more than xen_blkif_max_buffer_pages */
-		shrink_free_pagepool(blkif, xen_blkif_max_buffer_pages);
+		shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
 
-		if (log_stats && time_after(jiffies, blkif->st_print))
-			print_stats(blkif);
+		if (log_stats && time_after(jiffies, ring->st_print))
+			print_stats(ring);
 	}
 
 	/* Drain pending purge work */
-	flush_work(&blkif->persistent_purge_work);
+	flush_work(&ring->persistent_purge_work);
 
 	if (log_stats)
-		print_stats(blkif);
+		print_stats(ring);
 
-	blkif->xenblkd = NULL;
+	ring->xenblkd = NULL;
 	xen_blkif_put(blkif);
 
 	return 0;
@@ -658,22 +660,22 @@ purge_gnt_list:
 /*
  * Remove persistent grants and empty the pool of free pages
  */
-void xen_blkbk_free_caches(struct xen_blkif *blkif)
+void xen_blkbk_free_caches(struct xen_blkif_ring *ring)
 {
 	/* Free all persistent grant pages */
-	if (!RB_EMPTY_ROOT(&blkif->persistent_gnts))
-		free_persistent_gnts(blkif, &blkif->persistent_gnts,
-			blkif->persistent_gnt_c);
+	if (!RB_EMPTY_ROOT(&ring->persistent_gnts))
+		free_persistent_gnts(ring, &ring->persistent_gnts,
+			ring->persistent_gnt_c);
 
-	BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
-	blkif->persistent_gnt_c = 0;
+	BUG_ON(!RB_EMPTY_ROOT(&ring->persistent_gnts));
+	ring->persistent_gnt_c = 0;
 
 	/* Since we are shutting down remove all pages from the buffer */
-	shrink_free_pagepool(blkif, 0 /* All */);
+	shrink_free_pagepool(ring, 0 /* All */);
 }
 
 static unsigned int xen_blkbk_unmap_prepare(
-	struct xen_blkif *blkif,
+	struct xen_blkif_ring *ring,
 	struct grant_page **pages,
 	unsigned int num,
 	struct gnttab_unmap_grant_ref *unmap_ops,
@@ -683,7 +685,7 @@ static unsigned int xen_blkbk_unmap_prepare(
 
 	for (i = 0; i < num; i++) {
 		if (pages[i]->persistent_gnt != NULL) {
-			put_persistent_gnt(blkif, pages[i]->persistent_gnt);
+			put_persistent_gnt(ring, pages[i]->persistent_gnt);
 			continue;
 		}
 		if (pages[i]->handle == BLKBACK_INVALID_HANDLE)
@@ -700,17 +702,18 @@ static unsigned int xen_blkbk_unmap_prepare(
 
 static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_queue_data *data)
 {
-	struct pending_req* pending_req = (struct pending_req*) (data->data);
-	struct xen_blkif *blkif = pending_req->blkif;
+	struct pending_req *pending_req = (struct pending_req *)(data->data);
+	struct xen_blkif_ring *ring = pending_req->ring;
+	struct xen_blkif *blkif = ring->blkif;
 
 	/* BUG_ON used to reproduce existing behaviour,
 	   but is this the best way to deal with this? */
 	BUG_ON(result);
 
-	put_free_pages(blkif, data->pages, data->count);
-	make_response(blkif, pending_req->id,
+	put_free_pages(ring, data->pages, data->count);
+	make_response(ring, pending_req->id,
 		      pending_req->operation, pending_req->status);
-	free_req(blkif, pending_req);
+	free_req(ring, pending_req);
 	/*
 	 * Make sure the request is freed before releasing blkif,
 	 * or there could be a race between free_req and the
@@ -723,7 +726,7 @@ static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_
 	 * pending_free_wq if there's a drain going on, but it has
 	 * to be taken into account if the current model is changed.
 	 */
-	if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
+	if (atomic_dec_and_test(&ring->inflight) && atomic_read(&blkif->drain)) {
 		complete(&blkif->drain_complete);
 	}
 	xen_blkif_put(blkif);
@@ -732,11 +735,11 @@ static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_
 static void xen_blkbk_unmap_and_respond(struct pending_req *req)
 {
 	struct gntab_unmap_queue_data* work = &req->gnttab_unmap_data;
-	struct xen_blkif *blkif = req->blkif;
+	struct xen_blkif_ring *ring = req->ring;
 	struct grant_page **pages = req->segments;
 	unsigned int invcount;
 
-	invcount = xen_blkbk_unmap_prepare(blkif, pages, req->nr_segs,
+	invcount = xen_blkbk_unmap_prepare(ring, pages, req->nr_segs,
 					   req->unmap, req->unmap_pages);
 
 	work->data = req;
@@ -757,7 +760,7 @@ static void xen_blkbk_unmap_and_respond(struct pending_req *req)
  * of hypercalls, but since this is only used in error paths there's
  * no real need.
  */
-static void xen_blkbk_unmap(struct xen_blkif *blkif,
+static void xen_blkbk_unmap(struct xen_blkif_ring *ring,
                             struct grant_page *pages[],
                             int num)
 {
@@ -768,20 +771,20 @@ static void xen_blkbk_unmap(struct xen_blkif *blkif,
 
 	while (num) {
 		unsigned int batch = min(num, BLKIF_MAX_SEGMENTS_PER_REQUEST);
-		
-		invcount = xen_blkbk_unmap_prepare(blkif, pages, batch,
+
+		invcount = xen_blkbk_unmap_prepare(ring, pages, batch,
 						   unmap, unmap_pages);
 		if (invcount) {
 			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
 			BUG_ON(ret);
-			put_free_pages(blkif, unmap_pages, invcount);
+			put_free_pages(ring, unmap_pages, invcount);
 		}
 		pages += batch;
 		num -= batch;
 	}
 }
 
-static int xen_blkbk_map(struct xen_blkif *blkif,
+static int xen_blkbk_map(struct xen_blkif_ring *ring,
 			 struct grant_page *pages[],
 			 int num, bool ro)
 {
@@ -794,6 +797,7 @@ static int xen_blkbk_map(struct xen_blkif *blkif,
 	int ret = 0;
 	int last_map = 0, map_until = 0;
 	int use_persistent_gnts;
+	struct xen_blkif *blkif = ring->blkif;
 
 	use_persistent_gnts = (blkif->vbd.feature_gnt_persistent);
 
@@ -808,7 +812,7 @@ again:
 
 		if (use_persistent_gnts)
 			persistent_gnt = get_persistent_gnt(
-				blkif,
+				ring,
 				pages[i]->gref);
 
 		if (persistent_gnt) {
@@ -819,7 +823,7 @@ again:
 			pages[i]->page = persistent_gnt->page;
 			pages[i]->persistent_gnt = persistent_gnt;
 		} else {
-			if (get_free_page(blkif, &pages[i]->page))
+			if (get_free_page(ring, &pages[i]->page))
 				goto out_of_memory;
 			addr = vaddr(pages[i]->page);
 			pages_to_gnt[segs_to_map] = pages[i]->page;
@@ -852,7 +856,7 @@ again:
 			BUG_ON(new_map_idx >= segs_to_map);
 			if (unlikely(map[new_map_idx].status != 0)) {
 				pr_debug("invalid buffer -- could not remap it\n");
-				put_free_pages(blkif, &pages[seg_idx]->page, 1);
+				put_free_pages(ring, &pages[seg_idx]->page, 1);
 				pages[seg_idx]->handle = BLKBACK_INVALID_HANDLE;
 				ret |= 1;
 				goto next;
@@ -862,7 +866,7 @@ again:
 			continue;
 		}
 		if (use_persistent_gnts &&
-		    blkif->persistent_gnt_c < xen_blkif_max_pgrants) {
+		    ring->persistent_gnt_c < xen_blkif_max_pgrants) {
 			/*
 			 * We are using persistent grants, the grant is
 			 * not mapped but we might have room for it.
@@ -880,7 +884,7 @@ again:
 			persistent_gnt->gnt = map[new_map_idx].ref;
 			persistent_gnt->handle = map[new_map_idx].handle;
 			persistent_gnt->page = pages[seg_idx]->page;
-			if (add_persistent_gnt(blkif,
+			if (add_persistent_gnt(ring,
 			                       persistent_gnt)) {
 				kfree(persistent_gnt);
 				persistent_gnt = NULL;
@@ -888,7 +892,7 @@ again:
 			}
 			pages[seg_idx]->persistent_gnt = persistent_gnt;
 			pr_debug("grant %u added to the tree of persistent grants, using %u/%u\n",
-				 persistent_gnt->gnt, blkif->persistent_gnt_c,
+				 persistent_gnt->gnt, ring->persistent_gnt_c,
 				 xen_blkif_max_pgrants);
 			goto next;
 		}
@@ -913,7 +917,7 @@ next:
 
 out_of_memory:
 	pr_alert("%s: out of memory\n", __func__);
-	put_free_pages(blkif, pages_to_gnt, segs_to_map);
+	put_free_pages(ring, pages_to_gnt, segs_to_map);
 	return -ENOMEM;
 }
 
@@ -921,7 +925,7 @@ static int xen_blkbk_map_seg(struct pending_req *pending_req)
 {
 	int rc;
 
-	rc = xen_blkbk_map(pending_req->blkif, pending_req->segments,
+	rc = xen_blkbk_map(pending_req->ring, pending_req->segments,
 			   pending_req->nr_segs,
 	                   (pending_req->operation != BLKIF_OP_READ));
 
@@ -934,7 +938,7 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
 				    struct phys_req *preq)
 {
 	struct grant_page **pages = pending_req->indirect_pages;
-	struct xen_blkif *blkif = pending_req->blkif;
+	struct xen_blkif_ring *ring = pending_req->ring;
 	int indirect_grefs, rc, n, nseg, i;
 	struct blkif_request_segment *segments = NULL;
 
@@ -945,7 +949,7 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
 	for (i = 0; i < indirect_grefs; i++)
 		pages[i]->gref = req->u.indirect.indirect_grefs[i];
 
-	rc = xen_blkbk_map(blkif, pages, indirect_grefs, true);
+	rc = xen_blkbk_map(ring, pages, indirect_grefs, true);
 	if (rc)
 		goto unmap;
 
@@ -972,15 +976,16 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
 unmap:
 	if (segments)
 		kunmap_atomic(segments);
-	xen_blkbk_unmap(blkif, pages, indirect_grefs);
+	xen_blkbk_unmap(ring, pages, indirect_grefs);
 	return rc;
 }
 
-static int dispatch_discard_io(struct xen_blkif *blkif,
+static int dispatch_discard_io(struct xen_blkif_ring *ring,
 				struct blkif_request *req)
 {
 	int err = 0;
 	int status = BLKIF_RSP_OKAY;
+	struct xen_blkif *blkif = ring->blkif;
 	struct block_device *bdev = blkif->vbd.bdev;
 	unsigned long secure;
 	struct phys_req preq;
@@ -997,7 +1002,7 @@ static int dispatch_discard_io(struct xen_blkif *blkif,
 			preq.sector_number + preq.nr_sects, blkif->vbd.pdevice);
 		goto fail_response;
 	}
-	blkif->st_ds_req++;
+	ring->st_ds_req++;
 
 	secure = (blkif->vbd.discard_secure &&
 		 (req->u.discard.flag & BLKIF_DISCARD_SECURE)) ?
@@ -1013,26 +1018,27 @@ fail_response:
 	} else if (err)
 		status = BLKIF_RSP_ERROR;
 
-	make_response(blkif, req->u.discard.id, req->operation, status);
+	make_response(ring, req->u.discard.id, req->operation, status);
 	xen_blkif_put(blkif);
 	return err;
 }
 
-static int dispatch_other_io(struct xen_blkif *blkif,
+static int dispatch_other_io(struct xen_blkif_ring *ring,
 			     struct blkif_request *req,
 			     struct pending_req *pending_req)
 {
-	free_req(blkif, pending_req);
-	make_response(blkif, req->u.other.id, req->operation,
+	free_req(ring, pending_req);
+	make_response(ring, req->u.other.id, req->operation,
 		      BLKIF_RSP_EOPNOTSUPP);
 	return -EIO;
 }
 
-static void xen_blk_drain_io(struct xen_blkif *blkif)
+static void xen_blk_drain_io(struct xen_blkif_ring *ring)
 {
+	struct xen_blkif *blkif = ring->blkif;
 	atomic_set(&blkif->drain, 1);
 	do {
-		if (atomic_read(&blkif->inflight) == 0)
+		if (atomic_read(&ring->inflight) == 0)
 			break;
 		wait_for_completion_interruptible_timeout(
 				&blkif->drain_complete, HZ);
@@ -1053,12 +1059,12 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
 	if ((pending_req->operation == BLKIF_OP_FLUSH_DISKCACHE) &&
 	    (error == -EOPNOTSUPP)) {
 		pr_debug("flush diskcache op failed, not supported\n");
-		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->blkif->be, 0);
+		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->ring->blkif->be, 0);
 		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
 	} else if ((pending_req->operation == BLKIF_OP_WRITE_BARRIER) &&
 		    (error == -EOPNOTSUPP)) {
 		pr_debug("write barrier op failed, not supported\n");
-		xen_blkbk_barrier(XBT_NIL, pending_req->blkif->be, 0);
+		xen_blkbk_barrier(XBT_NIL, pending_req->ring->blkif->be, 0);
 		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
 	} else if (error) {
 		pr_debug("Buffer not up-to-date at end of operation,"
@@ -1092,9 +1098,9 @@ static void end_block_io_op(struct bio *bio, int error)
  * and transmute  it to the block API to hand it over to the proper block disk.
  */
 static int
-__do_block_io_op(struct xen_blkif *blkif)
+__do_block_io_op(struct xen_blkif_ring *ring)
 {
-	union blkif_back_rings *blk_rings = &blkif->blk_rings;
+	union blkif_back_rings *blk_rings = &ring->blk_rings;
 	struct blkif_request req;
 	struct pending_req *pending_req;
 	RING_IDX rc, rp;
@@ -1107,7 +1113,7 @@ __do_block_io_op(struct xen_blkif *blkif)
 	if (RING_REQUEST_PROD_OVERFLOW(&blk_rings->common, rp)) {
 		rc = blk_rings->common.rsp_prod_pvt;
 		pr_warn("Frontend provided bogus ring requests (%d - %d = %d). Halting ring processing on dev=%04x\n",
-			rp, rc, rp - rc, blkif->vbd.pdevice);
+			rp, rc, rp - rc, ring->blkif->vbd.pdevice);
 		return -EACCES;
 	}
 	while (rc != rp) {
@@ -1120,14 +1126,14 @@ __do_block_io_op(struct xen_blkif *blkif)
 			break;
 		}
 
-		pending_req = alloc_req(blkif);
+		pending_req = alloc_req(ring);
 		if (NULL == pending_req) {
-			blkif->st_oo_req++;
+			ring->st_oo_req++;
 			more_to_do = 1;
 			break;
 		}
 
-		switch (blkif->blk_protocol) {
+		switch (ring->blkif->blk_protocol) {
 		case BLKIF_PROTOCOL_NATIVE:
 			memcpy(&req, RING_GET_REQUEST(&blk_rings->native, rc), sizeof(req));
 			break;
@@ -1151,16 +1157,16 @@ __do_block_io_op(struct xen_blkif *blkif)
 		case BLKIF_OP_WRITE_BARRIER:
 		case BLKIF_OP_FLUSH_DISKCACHE:
 		case BLKIF_OP_INDIRECT:
-			if (dispatch_rw_block_io(blkif, &req, pending_req))
+			if (dispatch_rw_block_io(ring, &req, pending_req))
 				goto done;
 			break;
 		case BLKIF_OP_DISCARD:
-			free_req(blkif, pending_req);
-			if (dispatch_discard_io(blkif, &req))
+			free_req(ring, pending_req);
+			if (dispatch_discard_io(ring, &req))
 				goto done;
 			break;
 		default:
-			if (dispatch_other_io(blkif, &req, pending_req))
+			if (dispatch_other_io(ring, &req, pending_req))
 				goto done;
 			break;
 		}
@@ -1173,13 +1179,13 @@ done:
 }
 
 static int
-do_block_io_op(struct xen_blkif *blkif)
+do_block_io_op(struct xen_blkif_ring *ring)
 {
-	union blkif_back_rings *blk_rings = &blkif->blk_rings;
+	union blkif_back_rings *blk_rings = &ring->blk_rings;
 	int more_to_do;
 
 	do {
-		more_to_do = __do_block_io_op(blkif);
+		more_to_do = __do_block_io_op(ring);
 		if (more_to_do)
 			break;
 
@@ -1192,7 +1198,7 @@ do_block_io_op(struct xen_blkif *blkif)
  * Transmutation of the 'struct blkif_request' to a proper 'struct bio'
  * and call the 'submit_bio' to pass it to the underlying storage.
  */
-static int dispatch_rw_block_io(struct xen_blkif *blkif,
+static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 				struct blkif_request *req,
 				struct pending_req *pending_req)
 {
@@ -1219,17 +1225,17 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
 	switch (req_operation) {
 	case BLKIF_OP_READ:
-		blkif->st_rd_req++;
+		ring->st_rd_req++;
 		operation = READ;
 		break;
 	case BLKIF_OP_WRITE:
-		blkif->st_wr_req++;
+		ring->st_wr_req++;
 		operation = WRITE_ODIRECT;
 		break;
 	case BLKIF_OP_WRITE_BARRIER:
 		drain = true;
 	case BLKIF_OP_FLUSH_DISKCACHE:
-		blkif->st_f_req++;
+		ring->st_f_req++;
 		operation = WRITE_FLUSH;
 		break;
 	default:
@@ -1254,7 +1260,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
 	preq.nr_sects      = 0;
 
-	pending_req->blkif     = blkif;
+	pending_req->ring     = ring;
 	pending_req->id        = req->u.rw.id;
 	pending_req->operation = req_operation;
 	pending_req->status    = BLKIF_RSP_OKAY;
@@ -1281,12 +1287,12 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 			goto fail_response;
 	}
 
-	if (xen_vbd_translate(&preq, blkif, operation) != 0) {
+	if (xen_vbd_translate(&preq, ring->blkif, operation) != 0) {
 		pr_debug("access denied: %s of [%llu,%llu] on dev=%04x\n",
 			 operation == READ ? "read" : "write",
 			 preq.sector_number,
 			 preq.sector_number + preq.nr_sects,
-			 blkif->vbd.pdevice);
+			 ring->blkif->vbd.pdevice);
 		goto fail_response;
 	}
 
@@ -1298,7 +1304,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		if (((int)preq.sector_number|(int)seg[i].nsec) &
 		    ((bdev_logical_block_size(preq.bdev) >> 9) - 1)) {
 			pr_debug("Misaligned I/O request from domain %d\n",
-				 blkif->domid);
+				 ring->blkif->domid);
 			goto fail_response;
 		}
 	}
@@ -1307,7 +1313,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	 * issue the WRITE_FLUSH.
 	 */
 	if (drain)
-		xen_blk_drain_io(pending_req->blkif);
+		xen_blk_drain_io(pending_req->ring);
 
 	/*
 	 * If we have failed at this point, we need to undo the M2P override,
@@ -1322,8 +1328,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	 * This corresponding xen_blkif_put is done in __end_block_io_op, or
 	 * below (in "!bio") if we are handling a BLKIF_OP_DISCARD.
 	 */
-	xen_blkif_get(blkif);
-	atomic_inc(&blkif->inflight);
+	xen_blkif_get(ring->blkif);
+	atomic_inc(&ring->inflight);
 
 	for (i = 0; i < nseg; i++) {
 		while ((bio == NULL) ||
@@ -1371,19 +1377,19 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	blk_finish_plug(&plug);
 
 	if (operation == READ)
-		blkif->st_rd_sect += preq.nr_sects;
+		ring->st_rd_sect += preq.nr_sects;
 	else if (operation & WRITE)
-		blkif->st_wr_sect += preq.nr_sects;
+		ring->st_wr_sect += preq.nr_sects;
 
 	return 0;
 
  fail_flush:
-	xen_blkbk_unmap(blkif, pending_req->segments,
+	xen_blkbk_unmap(ring, pending_req->segments,
 	                pending_req->nr_segs);
  fail_response:
 	/* Haven't submitted any bio's yet. */
-	make_response(blkif, req->u.rw.id, req_operation, BLKIF_RSP_ERROR);
-	free_req(blkif, pending_req);
+	make_response(ring, req->u.rw.id, req_operation, BLKIF_RSP_ERROR);
+	free_req(ring, pending_req);
 	msleep(1); /* back off a bit */
 	return -EIO;
 
@@ -1401,21 +1407,22 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 /*
  * Put a response on the ring on how the operation fared.
  */
-static void make_response(struct xen_blkif *blkif, u64 id,
+static void make_response(struct xen_blkif_ring *ring, u64 id,
 			  unsigned short op, int st)
 {
 	struct blkif_response  resp;
 	unsigned long     flags;
-	union blkif_back_rings *blk_rings = &blkif->blk_rings;
+	union blkif_back_rings *blk_rings;
 	int notify;
 
 	resp.id        = id;
 	resp.operation = op;
 	resp.status    = st;
 
-	spin_lock_irqsave(&blkif->blk_ring_lock, flags);
+	spin_lock_irqsave(&ring->blk_ring_lock, flags);
+	blk_rings = &ring->blk_rings;
 	/* Place on the response ring for the relevant domain. */
-	switch (blkif->blk_protocol) {
+	switch (ring->blkif->blk_protocol) {
 	case BLKIF_PROTOCOL_NATIVE:
 		memcpy(RING_GET_RESPONSE(&blk_rings->native, blk_rings->native.rsp_prod_pvt),
 		       &resp, sizeof(resp));
@@ -1433,9 +1440,9 @@ static void make_response(struct xen_blkif *blkif, u64 id,
 	}
 	blk_rings->common.rsp_prod_pvt++;
 	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blk_rings->common, notify);
-	spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
+	spin_unlock_irqrestore(&ring->blk_ring_lock, flags);
 	if (notify)
-		notify_remote_via_irq(blkif->irq);
+		notify_remote_via_irq(ring->irq);
 }
 
 static int __init xen_blkif_init(void)
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 45a044a..cc253d4 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -260,30 +260,18 @@ struct persistent_gnt {
 	struct list_head remove_node;
 };
 
-struct xen_blkif {
-	/* Unique identifier for this interface. */
-	domid_t			domid;
-	unsigned int		handle;
+/* Per-ring information */
+struct xen_blkif_ring {
 	/* Physical parameters of the comms window. */
 	unsigned int		irq;
-	/* Comms information. */
-	enum blkif_protocol	blk_protocol;
 	union blkif_back_rings	blk_rings;
 	void			*blk_ring;
-	/* The VBD attached to this interface. */
-	struct xen_vbd		vbd;
-	/* Back pointer to the backend_info. */
-	struct backend_info	*be;
 	/* Private fields. */
 	spinlock_t		blk_ring_lock;
-	atomic_t		refcnt;
 
 	wait_queue_head_t	wq;
-	/* for barrier (drain) requests */
-	struct completion	drain_complete;
-	atomic_t		drain;
 	atomic_t		inflight;
-	/* One thread per one blkif. */
+	/* One thread per blkif ring. */
 	struct task_struct	*xenblkd;
 	unsigned int		waiting_reqs;
 
@@ -321,7 +309,37 @@ struct xen_blkif {
 	struct work_struct	free_work;
 	/* Thread shutdown wait queue. */
 	wait_queue_head_t	shutdown_wq;
+	struct xen_blkif *blkif;
+};
+
+struct xen_blkif {
+	/* Unique identifier for this interface. */
+	domid_t			domid;
+	unsigned int		handle;
+	/* Comms information. */
+	enum blkif_protocol	blk_protocol;
+	/* The VBD attached to this interface. */
+	struct xen_vbd		vbd;
+	/* Back pointer to the backend_info. */
+	struct backend_info	*be;
+	/* for barrier (drain) requests */
+	struct completion	drain_complete;
+	atomic_t		drain;
+	atomic_t		refcnt;
+	struct work_struct	free_work;
+
+	/* statistics */
+	unsigned long		st_print;
+	unsigned long long			st_rd_req;
+	unsigned long long			st_wr_req;
+	unsigned long long			st_oo_req;
+	unsigned long long			st_f_req;
+	unsigned long long			st_ds_req;
+	unsigned long long			st_rd_sect;
+	unsigned long long			st_wr_sect;
 	unsigned int nr_ring_pages;
+	/* All rings for this device */
+	struct xen_blkif_ring ring;
 };
 
 struct seg_buf {
@@ -343,7 +361,7 @@ struct grant_page {
  * response queued for it, with the saved 'id' passed back.
  */
 struct pending_req {
-	struct xen_blkif	*blkif;
+	struct xen_blkif_ring   *ring;
 	u64			id;
 	int			nr_segs;
 	atomic_t		pendcnt;
@@ -385,7 +403,7 @@ int xen_blkif_xenbus_init(void);
 irqreturn_t xen_blkif_be_int(int irq, void *dev_id);
 int xen_blkif_schedule(void *arg);
 int xen_blkif_purge_persistent(void *arg);
-void xen_blkbk_free_caches(struct xen_blkif *blkif);
+void xen_blkbk_free_caches(struct xen_blkif_ring *ring);
 
 int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 			      struct backend_info *be, int state);
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index deb3f00..6482ee3 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -88,7 +88,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 	char name[BLKBACK_NAME_LEN];
 
 	/* Not ready to connect? */
-	if (!blkif->irq || !blkif->vbd.bdev)
+	if (!blkif->ring.irq || !blkif->vbd.bdev)
 		return;
 
 	/* Already connected? */
@@ -113,10 +113,10 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 	}
 	invalidate_inode_pages2(blkif->vbd.bdev->bd_inode->i_mapping);
 
-	blkif->xenblkd = kthread_run(xen_blkif_schedule, blkif, "%s", name);
-	if (IS_ERR(blkif->xenblkd)) {
-		err = PTR_ERR(blkif->xenblkd);
-		blkif->xenblkd = NULL;
+	blkif->ring.xenblkd = kthread_run(xen_blkif_schedule, &blkif->ring, "%s", name);
+	if (IS_ERR(blkif->ring.xenblkd)) {
+		err = PTR_ERR(blkif->ring.xenblkd);
+		blkif->ring.xenblkd = NULL;
 		xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
 		return;
 	}
@@ -125,6 +125,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 {
 	struct xen_blkif *blkif;
+	struct xen_blkif_ring *ring;
 
 	BUILD_BUG_ON(MAX_INDIRECT_PAGES > BLKIF_MAX_INDIRECT_PAGES_PER_REQUEST);
 
@@ -133,41 +134,44 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 		return ERR_PTR(-ENOMEM);
 
 	blkif->domid = domid;
-	spin_lock_init(&blkif->blk_ring_lock);
+	ring = &blkif->ring;
+	ring->blkif = blkif;
+	spin_lock_init(&ring->blk_ring_lock);
 	atomic_set(&blkif->refcnt, 1);
-	init_waitqueue_head(&blkif->wq);
+	init_waitqueue_head(&ring->wq);
 	init_completion(&blkif->drain_complete);
 	atomic_set(&blkif->drain, 0);
-	blkif->st_print = jiffies;
-	blkif->persistent_gnts.rb_node = NULL;
-	spin_lock_init(&blkif->free_pages_lock);
-	INIT_LIST_HEAD(&blkif->free_pages);
-	INIT_LIST_HEAD(&blkif->persistent_purge_list);
-	blkif->free_pages_num = 0;
-	atomic_set(&blkif->persistent_gnt_in_use, 0);
-	atomic_set(&blkif->inflight, 0);
-	INIT_WORK(&blkif->persistent_purge_work, xen_blkbk_unmap_purged_grants);
-
-	INIT_LIST_HEAD(&blkif->pending_free);
+	ring->st_print = jiffies;
+	ring->persistent_gnts.rb_node = NULL;
+	spin_lock_init(&ring->free_pages_lock);
+	INIT_LIST_HEAD(&ring->free_pages);
+	INIT_LIST_HEAD(&ring->persistent_purge_list);
+	ring->free_pages_num = 0;
+	atomic_set(&ring->persistent_gnt_in_use, 0);
+	atomic_set(&ring->inflight, 0);
+	INIT_WORK(&ring->persistent_purge_work, xen_blkbk_unmap_purged_grants);
+
+	INIT_LIST_HEAD(&ring->pending_free);
 	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
-	spin_lock_init(&blkif->pending_free_lock);
-	init_waitqueue_head(&blkif->pending_free_wq);
-	init_waitqueue_head(&blkif->shutdown_wq);
+	spin_lock_init(&ring->pending_free_lock);
+	init_waitqueue_head(&ring->pending_free_wq);
+	init_waitqueue_head(&ring->shutdown_wq);
 
 	return blkif;
 }
 
-static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
+static int xen_blkif_map(struct xen_blkif_ring *ring, grant_ref_t *gref,
 			 unsigned int nr_grefs, unsigned int evtchn)
 {
 	int err;
+	struct xen_blkif *blkif = ring->blkif;
 
 	/* Already connected through? */
-	if (blkif->irq)
+	if (ring->irq)
 		return 0;
 
 	err = xenbus_map_ring_valloc(blkif->be->dev, gref, nr_grefs,
-				     &blkif->blk_ring);
+				     &ring->blk_ring);
 	if (err < 0)
 		return err;
 
@@ -175,22 +179,22 @@ static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
 	case BLKIF_PROTOCOL_NATIVE:
 	{
 		struct blkif_sring *sring;
-		sring = (struct blkif_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.native, sring, PAGE_SIZE * nr_grefs);
+		sring = (struct blkif_sring *)ring->blk_ring;
+		BACK_RING_INIT(&ring->blk_rings.native, sring, PAGE_SIZE * nr_grefs);
 		break;
 	}
 	case BLKIF_PROTOCOL_X86_32:
 	{
 		struct blkif_x86_32_sring *sring_x86_32;
-		sring_x86_32 = (struct blkif_x86_32_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32, PAGE_SIZE * nr_grefs);
+		sring_x86_32 = (struct blkif_x86_32_sring *)ring->blk_ring;
+		BACK_RING_INIT(&ring->blk_rings.x86_32, sring_x86_32, PAGE_SIZE * nr_grefs);
 		break;
 	}
 	case BLKIF_PROTOCOL_X86_64:
 	{
 		struct blkif_x86_64_sring *sring_x86_64;
-		sring_x86_64 = (struct blkif_x86_64_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64, PAGE_SIZE * nr_grefs);
+		sring_x86_64 = (struct blkif_x86_64_sring *)ring->blk_ring;
+		BACK_RING_INIT(&ring->blk_rings.x86_64, sring_x86_64, PAGE_SIZE * nr_grefs);
 		break;
 	}
 	default:
@@ -199,44 +203,46 @@ static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
 
 	err = bind_interdomain_evtchn_to_irqhandler(blkif->domid, evtchn,
 						    xen_blkif_be_int, 0,
-						    "blkif-backend", blkif);
+						    "blkif-backend", ring);
 	if (err < 0) {
-		xenbus_unmap_ring_vfree(blkif->be->dev, blkif->blk_ring);
-		blkif->blk_rings.common.sring = NULL;
+		xenbus_unmap_ring_vfree(blkif->be->dev, ring->blk_ring);
+		ring->blk_rings.common.sring = NULL;
 		return err;
 	}
-	blkif->irq = err;
+	ring->irq = err;
 
 	return 0;
 }
 
 static int xen_blkif_disconnect(struct xen_blkif *blkif)
 {
-	if (blkif->xenblkd) {
-		kthread_stop(blkif->xenblkd);
-		wake_up(&blkif->shutdown_wq);
-		blkif->xenblkd = NULL;
+	struct xen_blkif_ring *ring = &blkif->ring;
+
+	if (ring->xenblkd) {
+		kthread_stop(ring->xenblkd);
+		wake_up(&ring->shutdown_wq);
+		ring->xenblkd = NULL;
 	}
 
 	/* The above kthread_stop() guarantees that at this point we
 	 * don't have any discard_io or other_io requests. So, checking
 	 * for inflight IO is enough.
 	 */
-	if (atomic_read(&blkif->inflight) > 0)
+	if (atomic_read(&ring->inflight) > 0)
 		return -EBUSY;
 
-	if (blkif->irq) {
-		unbind_from_irqhandler(blkif->irq, blkif);
-		blkif->irq = 0;
+	if (ring->irq) {
+		unbind_from_irqhandler(ring->irq, ring);
+		ring->irq = 0;
 	}
 
-	if (blkif->blk_rings.common.sring) {
-		xenbus_unmap_ring_vfree(blkif->be->dev, blkif->blk_ring);
-		blkif->blk_rings.common.sring = NULL;
+	if (ring->blk_rings.common.sring) {
+		xenbus_unmap_ring_vfree(blkif->be->dev, ring->blk_ring);
+		ring->blk_rings.common.sring = NULL;
 	}
 
 	/* Remove all persistent grants and the cache of ballooned pages. */
-	xen_blkbk_free_caches(blkif);
+	xen_blkbk_free_caches(ring);
 
 	return 0;
 }
@@ -245,20 +251,21 @@ static void xen_blkif_free(struct xen_blkif *blkif)
 {
 	struct pending_req *req, *n;
 	int i = 0, j;
+	struct xen_blkif_ring *ring = &blkif->ring;
 
 	xen_blkif_disconnect(blkif);
 	xen_vbd_free(&blkif->vbd);
 
 	/* Make sure everything is drained before shutting down */
-	BUG_ON(blkif->persistent_gnt_c != 0);
-	BUG_ON(atomic_read(&blkif->persistent_gnt_in_use) != 0);
-	BUG_ON(blkif->free_pages_num != 0);
-	BUG_ON(!list_empty(&blkif->persistent_purge_list));
-	BUG_ON(!list_empty(&blkif->free_pages));
-	BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
+	BUG_ON(ring->persistent_gnt_c != 0);
+	BUG_ON(atomic_read(&ring->persistent_gnt_in_use) != 0);
+	BUG_ON(ring->free_pages_num != 0);
+	BUG_ON(!list_empty(&ring->persistent_purge_list));
+	BUG_ON(!list_empty(&ring->free_pages));
+	BUG_ON(!RB_EMPTY_ROOT(&ring->persistent_gnts));
 
 	/* Check that there is no request in use */
-	list_for_each_entry_safe(req, n, &blkif->pending_free, free_list) {
+	list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
 		list_del(&req->free_list);
 
 		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++)
@@ -298,6 +305,16 @@ int __init xen_blkif_interface_init(void)
 	{								\
 		struct xenbus_device *dev = to_xenbus_device(_dev);	\
 		struct backend_info *be = dev_get_drvdata(&dev->dev);	\
+		struct xen_blkif *blkif = be->blkif;			\
+		struct xen_blkif_ring *ring = &blkif->ring;		\
+									\
+		blkif->st_oo_req = ring->st_oo_req;			\
+		blkif->st_rd_req = ring->st_rd_req;			\
+		blkif->st_wr_req = ring->st_wr_req;			\
+		blkif->st_f_req = ring->st_f_req;			\
+		blkif->st_ds_req = ring->st_ds_req;			\
+		blkif->st_rd_sect = ring->st_rd_sect;			\
+		blkif->st_wr_sect = ring->st_wr_sect;			\
 									\
 		return sprintf(buf, format, ##args);			\
 	}								\
@@ -830,6 +847,7 @@ static int connect_ring(struct backend_info *be)
 	char protocol[64] = "";
 	struct pending_req *req, *n;
 	int err, i, j;
+	struct xen_blkif_ring *ring = &be->blkif->ring;
 
 	pr_debug("%s %s\n", __func__, dev->otherend);
 
@@ -918,7 +936,7 @@ static int connect_ring(struct backend_info *be)
 		req = kzalloc(sizeof(*req), GFP_KERNEL);
 		if (!req)
 			goto fail;
-		list_add_tail(&req->free_list, &be->blkif->pending_free);
+		list_add_tail(&req->free_list, &ring->pending_free);
 		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++) {
 			req->segments[j] = kzalloc(sizeof(*req->segments[0]), GFP_KERNEL);
 			if (!req->segments[j])
@@ -933,7 +951,7 @@ static int connect_ring(struct backend_info *be)
 	}
 
 	/* Map the shared frame, irq etc. */
-	err = xen_blkif_map(be->blkif, ring_ref, nr_grefs, evtchn);
+	err = xen_blkif_map(ring, ring_ref, nr_grefs, evtchn);
 	if (err) {
 		xenbus_dev_fatal(dev, err, "mapping ring-ref port %u", evtchn);
 		return err;
@@ -942,7 +960,7 @@ static int connect_ring(struct backend_info *be)
 	return 0;
 
 fail:
-	list_for_each_entry_safe(req, n, &be->blkif->pending_free, free_list) {
+	list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
 		list_del(&req->free_list);
 		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++) {
 			if (!req->segments[j])
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 7/9] xen/blkback: separate ring information out of struct xen_blkif
@ 2015-09-05 12:39   ` Bob Liu
  0 siblings, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-09-05 12:39 UTC (permalink / raw)
  To: xen-devel
  Cc: hch, felipe.franciosi, rafal.mielniczuk, linux-kernel,
	jonathan.davies, axboe, Bob Liu, david.vrabel, avanzini.arianna,
	boris.ostrovsky, roger.pau

Split the per-ring information out into a new structure, xen_blkif_ring, so that
one vbd device can be associated with one or more rings/hardware queues.

This patch is a preparation for supporting multiple hardware queues/rings.
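
As an aside for reviewers, a minimal standalone sketch of the resulting data
layout (plain userspace C, not xen-blkback code; the trimmed structs and the
main() driver are only illustrative, and locking, grants and statistics are
left out) looks roughly like this:

/* Simplified model of the split: per-ring state lives in its own struct with
 * a back pointer to the device, and a pending request points at its ring
 * rather than at the whole blkif. */
#include <stdio.h>

struct xen_blkif;                       /* device-wide state (vbd, protocol, ...) */

struct xen_blkif_ring {                 /* one hardware queue/ring */
	unsigned int irq;               /* event-channel IRQ for this ring */
	unsigned int inflight;          /* requests currently in flight */
	struct xen_blkif *blkif;        /* back pointer to the owning device */
};

struct xen_blkif {
	int domid;                      /* guest domain owning the device */
	struct xen_blkif_ring ring;     /* single embedded ring for now */
};

struct pending_req {
	struct xen_blkif_ring *ring;    /* was: struct xen_blkif *blkif */
	unsigned long long id;
};

int main(void)
{
	struct xen_blkif blkif = { .domid = 1 };
	struct pending_req req;

	blkif.ring.blkif = &blkif;      /* wire up the back pointer */

	req.ring = &blkif.ring;         /* a request now knows only its ring */
	req.id = 42;

	/* A completion path reaches device-wide state through the ring. */
	printf("req %llu completed on dom%d\n", req.id, req.ring->blkif->domid);
	return 0;
}

The back pointer is the important design choice: per-ring code paths keep a
single ring argument and still reach device-wide state (vbd, protocol, drain
bookkeeping) via ring->blkif.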

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkback/blkback.c |  365 ++++++++++++++++++-----------------
 drivers/block/xen-blkback/common.h  |   52 +++--
 drivers/block/xen-blkback/xenbus.c  |  130 +++++++------
 3 files changed, 295 insertions(+), 252 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 954c002..fd02240 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -113,71 +113,71 @@ module_param(log_stats, int, 0644);
 /* Number of free pages to remove on each call to gnttab_free_pages */
 #define NUM_BATCH_FREE_PAGES 10
 
-static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
+static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
-	if (list_empty(&blkif->free_pages)) {
-		BUG_ON(blkif->free_pages_num != 0);
-		spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
+	if (list_empty(&ring->free_pages)) {
+		BUG_ON(ring->free_pages_num != 0);
+		spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 		return gnttab_alloc_pages(1, page);
 	}
-	BUG_ON(blkif->free_pages_num == 0);
-	page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
+	BUG_ON(ring->free_pages_num == 0);
+	page[0] = list_first_entry(&ring->free_pages, struct page, lru);
 	list_del(&page[0]->lru);
-	blkif->free_pages_num--;
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	ring->free_pages_num--;
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 
 	return 0;
 }
 
-static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
+static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **page,
                                   int num)
 {
 	unsigned long flags;
 	int i;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
 	for (i = 0; i < num; i++)
-		list_add(&page[i]->lru, &blkif->free_pages);
-	blkif->free_pages_num += num;
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+		list_add(&page[i]->lru, &ring->free_pages);
+	ring->free_pages_num += num;
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 }
 
-static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
+static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
 {
 	/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
 	struct page *page[NUM_BATCH_FREE_PAGES];
 	unsigned int num_pages = 0;
 	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
-	while (blkif->free_pages_num > num) {
-		BUG_ON(list_empty(&blkif->free_pages));
-		page[num_pages] = list_first_entry(&blkif->free_pages,
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
+	while (ring->free_pages_num > num) {
+		BUG_ON(list_empty(&ring->free_pages));
+		page[num_pages] = list_first_entry(&ring->free_pages,
 		                                   struct page, lru);
 		list_del(&page[num_pages]->lru);
-		blkif->free_pages_num--;
+		ring->free_pages_num--;
 		if (++num_pages == NUM_BATCH_FREE_PAGES) {
-			spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+			spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 			gnttab_free_pages(num_pages, page);
-			spin_lock_irqsave(&blkif->free_pages_lock, flags);
+			spin_lock_irqsave(&ring->free_pages_lock, flags);
 			num_pages = 0;
 		}
 	}
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 	if (num_pages != 0)
 		gnttab_free_pages(num_pages, page);
 }
 
 #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
 
-static int do_block_io_op(struct xen_blkif *blkif);
-static int dispatch_rw_block_io(struct xen_blkif *blkif,
+static int do_block_io_op(struct xen_blkif_ring *ring);
+static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 				struct blkif_request *req,
 				struct pending_req *pending_req);
-static void make_response(struct xen_blkif *blkif, u64 id,
+static void make_response(struct xen_blkif_ring *ring, u64 id,
 			  unsigned short op, int st);
 
 #define foreach_grant_safe(pos, n, rbtree, node) \
@@ -198,19 +198,19 @@ static void make_response(struct xen_blkif *blkif, u64 id,
  * bit operations to modify the flags of a persistent grant and to count
  * the number of used grants.
  */
-static int add_persistent_gnt(struct xen_blkif *blkif,
+static int add_persistent_gnt(struct xen_blkif_ring *ring,
 			       struct persistent_gnt *persistent_gnt)
 {
 	struct rb_node **new = NULL, *parent = NULL;
 	struct persistent_gnt *this;
 
-	if (blkif->persistent_gnt_c >= xen_blkif_max_pgrants) {
-		if (!blkif->vbd.overflow_max_grants)
-			blkif->vbd.overflow_max_grants = 1;
+	if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+		if (!ring->blkif->vbd.overflow_max_grants)
+			ring->blkif->vbd.overflow_max_grants = 1;
 		return -EBUSY;
 	}
 	/* Figure out where to put new node */
-	new = &blkif->persistent_gnts.rb_node;
+	new = &ring->persistent_gnts.rb_node;
 	while (*new) {
 		this = container_of(*new, struct persistent_gnt, node);
 
@@ -229,19 +229,19 @@ static int add_persistent_gnt(struct xen_blkif *blkif,
 	set_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
 	/* Add new node and rebalance tree. */
 	rb_link_node(&(persistent_gnt->node), parent, new);
-	rb_insert_color(&(persistent_gnt->node), &blkif->persistent_gnts);
-	blkif->persistent_gnt_c++;
-	atomic_inc(&blkif->persistent_gnt_in_use);
+	rb_insert_color(&(persistent_gnt->node), &ring->persistent_gnts);
+	ring->persistent_gnt_c++;
+	atomic_inc(&ring->persistent_gnt_in_use);
 	return 0;
 }
 
-static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
-						 grant_ref_t gref)
+static struct persistent_gnt *get_persistent_gnt(struct xen_blkif_ring *ring,
+						grant_ref_t gref)
 {
 	struct persistent_gnt *data;
 	struct rb_node *node = NULL;
 
-	node = blkif->persistent_gnts.rb_node;
+	node = ring->persistent_gnts.rb_node;
 	while (node) {
 		data = container_of(node, struct persistent_gnt, node);
 
@@ -255,24 +255,24 @@ static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
 				return NULL;
 			}
 			set_bit(PERSISTENT_GNT_ACTIVE, data->flags);
-			atomic_inc(&blkif->persistent_gnt_in_use);
+			atomic_inc(&ring->persistent_gnt_in_use);
 			return data;
 		}
 	}
 	return NULL;
 }
 
-static void put_persistent_gnt(struct xen_blkif *blkif,
+static void put_persistent_gnt(struct xen_blkif_ring *ring,
                                struct persistent_gnt *persistent_gnt)
 {
 	if(!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
 		pr_alert_ratelimited("freeing a grant already unused\n");
 	set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
 	clear_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
-	atomic_dec(&blkif->persistent_gnt_in_use);
+	atomic_dec(&ring->persistent_gnt_in_use);
 }
 
-static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
+static void free_persistent_gnts(struct xen_blkif_ring *ring, struct rb_root *root,
                                  unsigned int num)
 {
 	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
@@ -303,7 +303,7 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
 			unmap_data.count = segs_to_unmap;
 			BUG_ON(gnttab_unmap_refs_sync(&unmap_data));
 
-			put_free_pages(blkif, pages, segs_to_unmap);
+			put_free_pages(ring, pages, segs_to_unmap);
 			segs_to_unmap = 0;
 		}
 
@@ -320,15 +320,15 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
 	struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 	struct persistent_gnt *persistent_gnt;
 	int segs_to_unmap = 0;
-	struct xen_blkif *blkif = container_of(work, typeof(*blkif), persistent_purge_work);
+	struct xen_blkif_ring *ring = container_of(work, typeof(*ring), persistent_purge_work);
 	struct gntab_unmap_queue_data unmap_data;
 
 	unmap_data.pages = pages;
 	unmap_data.unmap_ops = unmap;
 	unmap_data.kunmap_ops = NULL;
 
-	while(!list_empty(&blkif->persistent_purge_list)) {
-		persistent_gnt = list_first_entry(&blkif->persistent_purge_list,
+	while(!list_empty(&ring->persistent_purge_list)) {
+		persistent_gnt = list_first_entry(&ring->persistent_purge_list,
 		                                  struct persistent_gnt,
 		                                  remove_node);
 		list_del(&persistent_gnt->remove_node);
@@ -343,7 +343,7 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
 		if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
 			unmap_data.count = segs_to_unmap;
 			BUG_ON(gnttab_unmap_refs_sync(&unmap_data));
-			put_free_pages(blkif, pages, segs_to_unmap);
+			put_free_pages(ring, pages, segs_to_unmap);
 			segs_to_unmap = 0;
 		}
 		kfree(persistent_gnt);
@@ -351,34 +351,35 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
 	if (segs_to_unmap > 0) {
 		unmap_data.count = segs_to_unmap;
 		BUG_ON(gnttab_unmap_refs_sync(&unmap_data));
-		put_free_pages(blkif, pages, segs_to_unmap);
+		put_free_pages(ring, pages, segs_to_unmap);
 	}
 }
 
-static void purge_persistent_gnt(struct xen_blkif *blkif)
+static void purge_persistent_gnt(struct xen_blkif_ring *ring)
 {
 	struct persistent_gnt *persistent_gnt;
 	struct rb_node *n;
 	unsigned int num_clean, total;
 	bool scan_used = false, clean_used = false;
 	struct rb_root *root;
+	struct xen_blkif *blkif = ring->blkif;
 
-	if (blkif->persistent_gnt_c < xen_blkif_max_pgrants ||
-	    (blkif->persistent_gnt_c == xen_blkif_max_pgrants &&
+	if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
+	    (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
 	    !blkif->vbd.overflow_max_grants)) {
 		return;
 	}
 
-	if (work_busy(&blkif->persistent_purge_work)) {
+	if (work_busy(&ring->persistent_purge_work)) {
 		pr_alert_ratelimited("Scheduled work from previous purge is still busy, cannot purge list\n");
 		return;
 	}
 
 	num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-	num_clean = blkif->persistent_gnt_c - xen_blkif_max_pgrants + num_clean;
-	num_clean = min(blkif->persistent_gnt_c, num_clean);
+	num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + num_clean;
+	num_clean = min(ring->persistent_gnt_c, num_clean);
 	if ((num_clean == 0) ||
-	    (num_clean > (blkif->persistent_gnt_c - atomic_read(&blkif->persistent_gnt_in_use))))
+	    (num_clean > (ring->persistent_gnt_c - atomic_read(&ring->persistent_gnt_in_use))))
 		return;
 
 	/*
@@ -394,8 +395,8 @@ static void purge_persistent_gnt(struct xen_blkif *blkif)
 
 	pr_debug("Going to purge %u persistent grants\n", num_clean);
 
-	BUG_ON(!list_empty(&blkif->persistent_purge_list));
-	root = &blkif->persistent_gnts;
+	BUG_ON(!list_empty(&ring->persistent_purge_list));
+	root = &ring->persistent_gnts;
 purge_list:
 	foreach_grant_safe(persistent_gnt, n, root, node) {
 		BUG_ON(persistent_gnt->handle ==
@@ -414,7 +415,7 @@ purge_list:
 
 		rb_erase(&persistent_gnt->node, root);
 		list_add(&persistent_gnt->remove_node,
-		         &blkif->persistent_purge_list);
+			 &ring->persistent_purge_list);
 		if (--num_clean == 0)
 			goto finished;
 	}
@@ -435,11 +436,11 @@ finished:
 		goto purge_list;
 	}
 
-	blkif->persistent_gnt_c -= (total - num_clean);
+	ring->persistent_gnt_c -= (total - num_clean);
 	blkif->vbd.overflow_max_grants = 0;
 
 	/* We can defer this work */
-	schedule_work(&blkif->persistent_purge_work);
+	schedule_work(&ring->persistent_purge_work);
 	pr_debug("Purged %u/%u\n", (total - num_clean), total);
 	return;
 }
@@ -447,18 +448,18 @@ finished:
 /*
  * Retrieve from the 'pending_reqs' a free pending_req structure to be used.
  */
-static struct pending_req *alloc_req(struct xen_blkif *blkif)
+static struct pending_req *alloc_req(struct xen_blkif_ring *ring)
 {
 	struct pending_req *req = NULL;
 	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->pending_free_lock, flags);
-	if (!list_empty(&blkif->pending_free)) {
-		req = list_entry(blkif->pending_free.next, struct pending_req,
+	spin_lock_irqsave(&ring->pending_free_lock, flags);
+	if (!list_empty(&ring->pending_free)) {
+		req = list_entry(ring->pending_free.next, struct pending_req,
 				 free_list);
 		list_del(&req->free_list);
 	}
-	spin_unlock_irqrestore(&blkif->pending_free_lock, flags);
+	spin_unlock_irqrestore(&ring->pending_free_lock, flags);
 	return req;
 }
 
@@ -466,17 +467,17 @@ static struct pending_req *alloc_req(struct xen_blkif *blkif)
  * Return the 'pending_req' structure back to the freepool. We also
  * wake up the thread if it was waiting for a free page.
  */
-static void free_req(struct xen_blkif *blkif, struct pending_req *req)
+static void free_req(struct xen_blkif_ring *ring, struct pending_req *req)
 {
 	unsigned long flags;
 	int was_empty;
 
-	spin_lock_irqsave(&blkif->pending_free_lock, flags);
-	was_empty = list_empty(&blkif->pending_free);
-	list_add(&req->free_list, &blkif->pending_free);
-	spin_unlock_irqrestore(&blkif->pending_free_lock, flags);
+	spin_lock_irqsave(&ring->pending_free_lock, flags);
+	was_empty = list_empty(&ring->pending_free);
+	list_add(&req->free_list, &ring->pending_free);
+	spin_unlock_irqrestore(&ring->pending_free_lock, flags);
 	if (was_empty)
-		wake_up(&blkif->pending_free_wq);
+		wake_up(&ring->pending_free_wq);
 }
 
 /*
@@ -556,10 +557,10 @@ abort:
 /*
  * Notification from the guest OS.
  */
-static void blkif_notify_work(struct xen_blkif *blkif)
+static void blkif_notify_work(struct xen_blkif_ring *ring)
 {
-	blkif->waiting_reqs = 1;
-	wake_up(&blkif->wq);
+	ring->waiting_reqs = 1;
+	wake_up(&ring->wq);
 }
 
 irqreturn_t xen_blkif_be_int(int irq, void *dev_id)
@@ -572,25 +573,26 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id)
  * SCHEDULER FUNCTIONS
  */
 
-static void print_stats(struct xen_blkif *blkif)
+static void print_stats(struct xen_blkif_ring *ring)
 {
 	pr_info("(%s): oo %3llu  |  rd %4llu  |  wr %4llu  |  f %4llu"
 		 "  |  ds %4llu | pg: %4u/%4d\n",
-		 current->comm, blkif->st_oo_req,
-		 blkif->st_rd_req, blkif->st_wr_req,
-		 blkif->st_f_req, blkif->st_ds_req,
-		 blkif->persistent_gnt_c,
+		 current->comm, ring->st_oo_req,
+		 ring->st_rd_req, ring->st_wr_req,
+		 ring->st_f_req, ring->st_ds_req,
+		 ring->persistent_gnt_c,
 		 xen_blkif_max_pgrants);
-	blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000);
-	blkif->st_rd_req = 0;
-	blkif->st_wr_req = 0;
-	blkif->st_oo_req = 0;
-	blkif->st_ds_req = 0;
+	ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
+	ring->st_rd_req = 0;
+	ring->st_wr_req = 0;
+	ring->st_oo_req = 0;
+	ring->st_ds_req = 0;
 }
 
 int xen_blkif_schedule(void *arg)
 {
-	struct xen_blkif *blkif = arg;
+	struct xen_blkif_ring *ring = arg;
+	struct xen_blkif *blkif = ring->blkif;
 	struct xen_vbd *vbd = &blkif->vbd;
 	unsigned long timeout;
 	int ret;
@@ -606,50 +608,50 @@ int xen_blkif_schedule(void *arg)
 		timeout = msecs_to_jiffies(LRU_INTERVAL);
 
 		timeout = wait_event_interruptible_timeout(
-			blkif->wq,
-			blkif->waiting_reqs || kthread_should_stop(),
+			ring->wq,
+			ring->waiting_reqs || kthread_should_stop(),
 			timeout);
 		if (timeout == 0)
 			goto purge_gnt_list;
 		timeout = wait_event_interruptible_timeout(
-			blkif->pending_free_wq,
-			!list_empty(&blkif->pending_free) ||
+			ring->pending_free_wq,
+			!list_empty(&ring->pending_free) ||
 			kthread_should_stop(),
 			timeout);
 		if (timeout == 0)
 			goto purge_gnt_list;
 
-		blkif->waiting_reqs = 0;
+		ring->waiting_reqs = 0;
 		smp_mb(); /* clear flag *before* checking for work */
 
-		ret = do_block_io_op(blkif);
+		ret = do_block_io_op(ring);
 		if (ret > 0)
-			blkif->waiting_reqs = 1;
+			ring->waiting_reqs = 1;
 		if (ret == -EACCES)
-			wait_event_interruptible(blkif->shutdown_wq,
+			wait_event_interruptible(ring->shutdown_wq,
 						 kthread_should_stop());
 
 purge_gnt_list:
 		if (blkif->vbd.feature_gnt_persistent &&
-		    time_after(jiffies, blkif->next_lru)) {
-			purge_persistent_gnt(blkif);
-			blkif->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL);
+		    time_after(jiffies, ring->next_lru)) {
+			purge_persistent_gnt(ring);
+			ring->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL);
 		}
 
 		/* Shrink if we have more than xen_blkif_max_buffer_pages */
-		shrink_free_pagepool(blkif, xen_blkif_max_buffer_pages);
+		shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
 
-		if (log_stats && time_after(jiffies, blkif->st_print))
-			print_stats(blkif);
+		if (log_stats && time_after(jiffies, ring->st_print))
+			print_stats(ring);
 	}
 
 	/* Drain pending purge work */
-	flush_work(&blkif->persistent_purge_work);
+	flush_work(&ring->persistent_purge_work);
 
 	if (log_stats)
-		print_stats(blkif);
+		print_stats(ring);
 
-	blkif->xenblkd = NULL;
+	ring->xenblkd = NULL;
 	xen_blkif_put(blkif);
 
 	return 0;
@@ -658,22 +660,22 @@ purge_gnt_list:
 /*
  * Remove persistent grants and empty the pool of free pages
  */
-void xen_blkbk_free_caches(struct xen_blkif *blkif)
+void xen_blkbk_free_caches(struct xen_blkif_ring *ring)
 {
 	/* Free all persistent grant pages */
-	if (!RB_EMPTY_ROOT(&blkif->persistent_gnts))
-		free_persistent_gnts(blkif, &blkif->persistent_gnts,
-			blkif->persistent_gnt_c);
+	if (!RB_EMPTY_ROOT(&ring->persistent_gnts))
+		free_persistent_gnts(ring, &ring->persistent_gnts,
+			ring->persistent_gnt_c);
 
-	BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
-	blkif->persistent_gnt_c = 0;
+	BUG_ON(!RB_EMPTY_ROOT(&ring->persistent_gnts));
+	ring->persistent_gnt_c = 0;
 
 	/* Since we are shutting down remove all pages from the buffer */
-	shrink_free_pagepool(blkif, 0 /* All */);
+	shrink_free_pagepool(ring, 0 /* All */);
 }
 
 static unsigned int xen_blkbk_unmap_prepare(
-	struct xen_blkif *blkif,
+	struct xen_blkif_ring *ring,
 	struct grant_page **pages,
 	unsigned int num,
 	struct gnttab_unmap_grant_ref *unmap_ops,
@@ -683,7 +685,7 @@ static unsigned int xen_blkbk_unmap_prepare(
 
 	for (i = 0; i < num; i++) {
 		if (pages[i]->persistent_gnt != NULL) {
-			put_persistent_gnt(blkif, pages[i]->persistent_gnt);
+			put_persistent_gnt(ring, pages[i]->persistent_gnt);
 			continue;
 		}
 		if (pages[i]->handle == BLKBACK_INVALID_HANDLE)
@@ -700,17 +702,18 @@ static unsigned int xen_blkbk_unmap_prepare(
 
 static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_queue_data *data)
 {
-	struct pending_req* pending_req = (struct pending_req*) (data->data);
-	struct xen_blkif *blkif = pending_req->blkif;
+	struct pending_req *pending_req = (struct pending_req *)(data->data);
+	struct xen_blkif_ring *ring = pending_req->ring;
+	struct xen_blkif *blkif = ring->blkif;
 
 	/* BUG_ON used to reproduce existing behaviour,
 	   but is this the best way to deal with this? */
 	BUG_ON(result);
 
-	put_free_pages(blkif, data->pages, data->count);
-	make_response(blkif, pending_req->id,
+	put_free_pages(ring, data->pages, data->count);
+	make_response(ring, pending_req->id,
 		      pending_req->operation, pending_req->status);
-	free_req(blkif, pending_req);
+	free_req(ring, pending_req);
 	/*
 	 * Make sure the request is freed before releasing blkif,
 	 * or there could be a race between free_req and the
@@ -723,7 +726,7 @@ static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_
 	 * pending_free_wq if there's a drain going on, but it has
 	 * to be taken into account if the current model is changed.
 	 */
-	if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
+	if (atomic_dec_and_test(&ring->inflight) && atomic_read(&blkif->drain)) {
 		complete(&blkif->drain_complete);
 	}
 	xen_blkif_put(blkif);
@@ -732,11 +735,11 @@ static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_
 static void xen_blkbk_unmap_and_respond(struct pending_req *req)
 {
 	struct gntab_unmap_queue_data* work = &req->gnttab_unmap_data;
-	struct xen_blkif *blkif = req->blkif;
+	struct xen_blkif_ring *ring = req->ring;
 	struct grant_page **pages = req->segments;
 	unsigned int invcount;
 
-	invcount = xen_blkbk_unmap_prepare(blkif, pages, req->nr_segs,
+	invcount = xen_blkbk_unmap_prepare(ring, pages, req->nr_segs,
 					   req->unmap, req->unmap_pages);
 
 	work->data = req;
@@ -757,7 +760,7 @@ static void xen_blkbk_unmap_and_respond(struct pending_req *req)
  * of hypercalls, but since this is only used in error paths there's
  * no real need.
  */
-static void xen_blkbk_unmap(struct xen_blkif *blkif,
+static void xen_blkbk_unmap(struct xen_blkif_ring *ring,
                             struct grant_page *pages[],
                             int num)
 {
@@ -768,20 +771,20 @@ static void xen_blkbk_unmap(struct xen_blkif *blkif,
 
 	while (num) {
 		unsigned int batch = min(num, BLKIF_MAX_SEGMENTS_PER_REQUEST);
-		
-		invcount = xen_blkbk_unmap_prepare(blkif, pages, batch,
+
+		invcount = xen_blkbk_unmap_prepare(ring, pages, batch,
 						   unmap, unmap_pages);
 		if (invcount) {
 			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
 			BUG_ON(ret);
-			put_free_pages(blkif, unmap_pages, invcount);
+			put_free_pages(ring, unmap_pages, invcount);
 		}
 		pages += batch;
 		num -= batch;
 	}
 }
 
-static int xen_blkbk_map(struct xen_blkif *blkif,
+static int xen_blkbk_map(struct xen_blkif_ring *ring,
 			 struct grant_page *pages[],
 			 int num, bool ro)
 {
@@ -794,6 +797,7 @@ static int xen_blkbk_map(struct xen_blkif *blkif,
 	int ret = 0;
 	int last_map = 0, map_until = 0;
 	int use_persistent_gnts;
+	struct xen_blkif *blkif = ring->blkif;
 
 	use_persistent_gnts = (blkif->vbd.feature_gnt_persistent);
 
@@ -808,7 +812,7 @@ again:
 
 		if (use_persistent_gnts)
 			persistent_gnt = get_persistent_gnt(
-				blkif,
+				ring,
 				pages[i]->gref);
 
 		if (persistent_gnt) {
@@ -819,7 +823,7 @@ again:
 			pages[i]->page = persistent_gnt->page;
 			pages[i]->persistent_gnt = persistent_gnt;
 		} else {
-			if (get_free_page(blkif, &pages[i]->page))
+			if (get_free_page(ring, &pages[i]->page))
 				goto out_of_memory;
 			addr = vaddr(pages[i]->page);
 			pages_to_gnt[segs_to_map] = pages[i]->page;
@@ -852,7 +856,7 @@ again:
 			BUG_ON(new_map_idx >= segs_to_map);
 			if (unlikely(map[new_map_idx].status != 0)) {
 				pr_debug("invalid buffer -- could not remap it\n");
-				put_free_pages(blkif, &pages[seg_idx]->page, 1);
+				put_free_pages(ring, &pages[seg_idx]->page, 1);
 				pages[seg_idx]->handle = BLKBACK_INVALID_HANDLE;
 				ret |= 1;
 				goto next;
@@ -862,7 +866,7 @@ again:
 			continue;
 		}
 		if (use_persistent_gnts &&
-		    blkif->persistent_gnt_c < xen_blkif_max_pgrants) {
+		    ring->persistent_gnt_c < xen_blkif_max_pgrants) {
 			/*
 			 * We are using persistent grants, the grant is
 			 * not mapped but we might have room for it.
@@ -880,7 +884,7 @@ again:
 			persistent_gnt->gnt = map[new_map_idx].ref;
 			persistent_gnt->handle = map[new_map_idx].handle;
 			persistent_gnt->page = pages[seg_idx]->page;
-			if (add_persistent_gnt(blkif,
+			if (add_persistent_gnt(ring,
 			                       persistent_gnt)) {
 				kfree(persistent_gnt);
 				persistent_gnt = NULL;
@@ -888,7 +892,7 @@ again:
 			}
 			pages[seg_idx]->persistent_gnt = persistent_gnt;
 			pr_debug("grant %u added to the tree of persistent grants, using %u/%u\n",
-				 persistent_gnt->gnt, blkif->persistent_gnt_c,
+				 persistent_gnt->gnt, ring->persistent_gnt_c,
 				 xen_blkif_max_pgrants);
 			goto next;
 		}
@@ -913,7 +917,7 @@ next:
 
 out_of_memory:
 	pr_alert("%s: out of memory\n", __func__);
-	put_free_pages(blkif, pages_to_gnt, segs_to_map);
+	put_free_pages(ring, pages_to_gnt, segs_to_map);
 	return -ENOMEM;
 }
 
@@ -921,7 +925,7 @@ static int xen_blkbk_map_seg(struct pending_req *pending_req)
 {
 	int rc;
 
-	rc = xen_blkbk_map(pending_req->blkif, pending_req->segments,
+	rc = xen_blkbk_map(pending_req->ring, pending_req->segments,
 			   pending_req->nr_segs,
 	                   (pending_req->operation != BLKIF_OP_READ));
 
@@ -934,7 +938,7 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
 				    struct phys_req *preq)
 {
 	struct grant_page **pages = pending_req->indirect_pages;
-	struct xen_blkif *blkif = pending_req->blkif;
+	struct xen_blkif_ring *ring = pending_req->ring;
 	int indirect_grefs, rc, n, nseg, i;
 	struct blkif_request_segment *segments = NULL;
 
@@ -945,7 +949,7 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
 	for (i = 0; i < indirect_grefs; i++)
 		pages[i]->gref = req->u.indirect.indirect_grefs[i];
 
-	rc = xen_blkbk_map(blkif, pages, indirect_grefs, true);
+	rc = xen_blkbk_map(ring, pages, indirect_grefs, true);
 	if (rc)
 		goto unmap;
 
@@ -972,15 +976,16 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
 unmap:
 	if (segments)
 		kunmap_atomic(segments);
-	xen_blkbk_unmap(blkif, pages, indirect_grefs);
+	xen_blkbk_unmap(ring, pages, indirect_grefs);
 	return rc;
 }
 
-static int dispatch_discard_io(struct xen_blkif *blkif,
+static int dispatch_discard_io(struct xen_blkif_ring *ring,
 				struct blkif_request *req)
 {
 	int err = 0;
 	int status = BLKIF_RSP_OKAY;
+	struct xen_blkif *blkif = ring->blkif;
 	struct block_device *bdev = blkif->vbd.bdev;
 	unsigned long secure;
 	struct phys_req preq;
@@ -997,7 +1002,7 @@ static int dispatch_discard_io(struct xen_blkif *blkif,
 			preq.sector_number + preq.nr_sects, blkif->vbd.pdevice);
 		goto fail_response;
 	}
-	blkif->st_ds_req++;
+	ring->st_ds_req++;
 
 	secure = (blkif->vbd.discard_secure &&
 		 (req->u.discard.flag & BLKIF_DISCARD_SECURE)) ?
@@ -1013,26 +1018,27 @@ fail_response:
 	} else if (err)
 		status = BLKIF_RSP_ERROR;
 
-	make_response(blkif, req->u.discard.id, req->operation, status);
+	make_response(ring, req->u.discard.id, req->operation, status);
 	xen_blkif_put(blkif);
 	return err;
 }
 
-static int dispatch_other_io(struct xen_blkif *blkif,
+static int dispatch_other_io(struct xen_blkif_ring *ring,
 			     struct blkif_request *req,
 			     struct pending_req *pending_req)
 {
-	free_req(blkif, pending_req);
-	make_response(blkif, req->u.other.id, req->operation,
+	free_req(ring, pending_req);
+	make_response(ring, req->u.other.id, req->operation,
 		      BLKIF_RSP_EOPNOTSUPP);
 	return -EIO;
 }
 
-static void xen_blk_drain_io(struct xen_blkif *blkif)
+static void xen_blk_drain_io(struct xen_blkif_ring *ring)
 {
+	struct xen_blkif *blkif = ring->blkif;
 	atomic_set(&blkif->drain, 1);
 	do {
-		if (atomic_read(&blkif->inflight) == 0)
+		if (atomic_read(&ring->inflight) == 0)
 			break;
 		wait_for_completion_interruptible_timeout(
 				&blkif->drain_complete, HZ);
@@ -1053,12 +1059,12 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
 	if ((pending_req->operation == BLKIF_OP_FLUSH_DISKCACHE) &&
 	    (error == -EOPNOTSUPP)) {
 		pr_debug("flush diskcache op failed, not supported\n");
-		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->blkif->be, 0);
+		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->ring->blkif->be, 0);
 		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
 	} else if ((pending_req->operation == BLKIF_OP_WRITE_BARRIER) &&
 		    (error == -EOPNOTSUPP)) {
 		pr_debug("write barrier op failed, not supported\n");
-		xen_blkbk_barrier(XBT_NIL, pending_req->blkif->be, 0);
+		xen_blkbk_barrier(XBT_NIL, pending_req->ring->blkif->be, 0);
 		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
 	} else if (error) {
 		pr_debug("Buffer not up-to-date at end of operation,"
@@ -1092,9 +1098,9 @@ static void end_block_io_op(struct bio *bio, int error)
  * and transmute  it to the block API to hand it over to the proper block disk.
  */
 static int
-__do_block_io_op(struct xen_blkif *blkif)
+__do_block_io_op(struct xen_blkif_ring *ring)
 {
-	union blkif_back_rings *blk_rings = &blkif->blk_rings;
+	union blkif_back_rings *blk_rings = &ring->blk_rings;
 	struct blkif_request req;
 	struct pending_req *pending_req;
 	RING_IDX rc, rp;
@@ -1107,7 +1113,7 @@ __do_block_io_op(struct xen_blkif *blkif)
 	if (RING_REQUEST_PROD_OVERFLOW(&blk_rings->common, rp)) {
 		rc = blk_rings->common.rsp_prod_pvt;
 		pr_warn("Frontend provided bogus ring requests (%d - %d = %d). Halting ring processing on dev=%04x\n",
-			rp, rc, rp - rc, blkif->vbd.pdevice);
+			rp, rc, rp - rc, ring->blkif->vbd.pdevice);
 		return -EACCES;
 	}
 	while (rc != rp) {
@@ -1120,14 +1126,14 @@ __do_block_io_op(struct xen_blkif *blkif)
 			break;
 		}
 
-		pending_req = alloc_req(blkif);
+		pending_req = alloc_req(ring);
 		if (NULL == pending_req) {
-			blkif->st_oo_req++;
+			ring->st_oo_req++;
 			more_to_do = 1;
 			break;
 		}
 
-		switch (blkif->blk_protocol) {
+		switch (ring->blkif->blk_protocol) {
 		case BLKIF_PROTOCOL_NATIVE:
 			memcpy(&req, RING_GET_REQUEST(&blk_rings->native, rc), sizeof(req));
 			break;
@@ -1151,16 +1157,16 @@ __do_block_io_op(struct xen_blkif *blkif)
 		case BLKIF_OP_WRITE_BARRIER:
 		case BLKIF_OP_FLUSH_DISKCACHE:
 		case BLKIF_OP_INDIRECT:
-			if (dispatch_rw_block_io(blkif, &req, pending_req))
+			if (dispatch_rw_block_io(ring, &req, pending_req))
 				goto done;
 			break;
 		case BLKIF_OP_DISCARD:
-			free_req(blkif, pending_req);
-			if (dispatch_discard_io(blkif, &req))
+			free_req(ring, pending_req);
+			if (dispatch_discard_io(ring, &req))
 				goto done;
 			break;
 		default:
-			if (dispatch_other_io(blkif, &req, pending_req))
+			if (dispatch_other_io(ring, &req, pending_req))
 				goto done;
 			break;
 		}
@@ -1173,13 +1179,13 @@ done:
 }
 
 static int
-do_block_io_op(struct xen_blkif *blkif)
+do_block_io_op(struct xen_blkif_ring *ring)
 {
-	union blkif_back_rings *blk_rings = &blkif->blk_rings;
+	union blkif_back_rings *blk_rings = &ring->blk_rings;
 	int more_to_do;
 
 	do {
-		more_to_do = __do_block_io_op(blkif);
+		more_to_do = __do_block_io_op(ring);
 		if (more_to_do)
 			break;
 
@@ -1192,7 +1198,7 @@ do_block_io_op(struct xen_blkif *blkif)
  * Transmutation of the 'struct blkif_request' to a proper 'struct bio'
  * and call the 'submit_bio' to pass it to the underlying storage.
  */
-static int dispatch_rw_block_io(struct xen_blkif *blkif,
+static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 				struct blkif_request *req,
 				struct pending_req *pending_req)
 {
@@ -1219,17 +1225,17 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
 	switch (req_operation) {
 	case BLKIF_OP_READ:
-		blkif->st_rd_req++;
+		ring->st_rd_req++;
 		operation = READ;
 		break;
 	case BLKIF_OP_WRITE:
-		blkif->st_wr_req++;
+		ring->st_wr_req++;
 		operation = WRITE_ODIRECT;
 		break;
 	case BLKIF_OP_WRITE_BARRIER:
 		drain = true;
 	case BLKIF_OP_FLUSH_DISKCACHE:
-		blkif->st_f_req++;
+		ring->st_f_req++;
 		operation = WRITE_FLUSH;
 		break;
 	default:
@@ -1254,7 +1260,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
 	preq.nr_sects      = 0;
 
-	pending_req->blkif     = blkif;
+	pending_req->ring     = ring;
 	pending_req->id        = req->u.rw.id;
 	pending_req->operation = req_operation;
 	pending_req->status    = BLKIF_RSP_OKAY;
@@ -1281,12 +1287,12 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 			goto fail_response;
 	}
 
-	if (xen_vbd_translate(&preq, blkif, operation) != 0) {
+	if (xen_vbd_translate(&preq, ring->blkif, operation) != 0) {
 		pr_debug("access denied: %s of [%llu,%llu] on dev=%04x\n",
 			 operation == READ ? "read" : "write",
 			 preq.sector_number,
 			 preq.sector_number + preq.nr_sects,
-			 blkif->vbd.pdevice);
+			 ring->blkif->vbd.pdevice);
 		goto fail_response;
 	}
 
@@ -1298,7 +1304,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		if (((int)preq.sector_number|(int)seg[i].nsec) &
 		    ((bdev_logical_block_size(preq.bdev) >> 9) - 1)) {
 			pr_debug("Misaligned I/O request from domain %d\n",
-				 blkif->domid);
+				 ring->blkif->domid);
 			goto fail_response;
 		}
 	}
@@ -1307,7 +1313,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	 * issue the WRITE_FLUSH.
 	 */
 	if (drain)
-		xen_blk_drain_io(pending_req->blkif);
+		xen_blk_drain_io(pending_req->ring);
 
 	/*
 	 * If we have failed at this point, we need to undo the M2P override,
@@ -1322,8 +1328,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	 * This corresponding xen_blkif_put is done in __end_block_io_op, or
 	 * below (in "!bio") if we are handling a BLKIF_OP_DISCARD.
 	 */
-	xen_blkif_get(blkif);
-	atomic_inc(&blkif->inflight);
+	xen_blkif_get(ring->blkif);
+	atomic_inc(&ring->inflight);
 
 	for (i = 0; i < nseg; i++) {
 		while ((bio == NULL) ||
@@ -1371,19 +1377,19 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	blk_finish_plug(&plug);
 
 	if (operation == READ)
-		blkif->st_rd_sect += preq.nr_sects;
+		ring->st_rd_sect += preq.nr_sects;
 	else if (operation & WRITE)
-		blkif->st_wr_sect += preq.nr_sects;
+		ring->st_wr_sect += preq.nr_sects;
 
 	return 0;
 
  fail_flush:
-	xen_blkbk_unmap(blkif, pending_req->segments,
+	xen_blkbk_unmap(ring, pending_req->segments,
 	                pending_req->nr_segs);
  fail_response:
 	/* Haven't submitted any bio's yet. */
-	make_response(blkif, req->u.rw.id, req_operation, BLKIF_RSP_ERROR);
-	free_req(blkif, pending_req);
+	make_response(ring, req->u.rw.id, req_operation, BLKIF_RSP_ERROR);
+	free_req(ring, pending_req);
 	msleep(1); /* back off a bit */
 	return -EIO;
 
@@ -1401,21 +1407,22 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 /*
  * Put a response on the ring on how the operation fared.
  */
-static void make_response(struct xen_blkif *blkif, u64 id,
+static void make_response(struct xen_blkif_ring *ring, u64 id,
 			  unsigned short op, int st)
 {
 	struct blkif_response  resp;
 	unsigned long     flags;
-	union blkif_back_rings *blk_rings = &blkif->blk_rings;
+	union blkif_back_rings *blk_rings;
 	int notify;
 
 	resp.id        = id;
 	resp.operation = op;
 	resp.status    = st;
 
-	spin_lock_irqsave(&blkif->blk_ring_lock, flags);
+	spin_lock_irqsave(&ring->blk_ring_lock, flags);
+	blk_rings = &ring->blk_rings;
 	/* Place on the response ring for the relevant domain. */
-	switch (blkif->blk_protocol) {
+	switch (ring->blkif->blk_protocol) {
 	case BLKIF_PROTOCOL_NATIVE:
 		memcpy(RING_GET_RESPONSE(&blk_rings->native, blk_rings->native.rsp_prod_pvt),
 		       &resp, sizeof(resp));
@@ -1433,9 +1440,9 @@ static void make_response(struct xen_blkif *blkif, u64 id,
 	}
 	blk_rings->common.rsp_prod_pvt++;
 	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blk_rings->common, notify);
-	spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
+	spin_unlock_irqrestore(&ring->blk_ring_lock, flags);
 	if (notify)
-		notify_remote_via_irq(blkif->irq);
+		notify_remote_via_irq(ring->irq);
 }
 
 static int __init xen_blkif_init(void)
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 45a044a..cc253d4 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -260,30 +260,18 @@ struct persistent_gnt {
 	struct list_head remove_node;
 };
 
-struct xen_blkif {
-	/* Unique identifier for this interface. */
-	domid_t			domid;
-	unsigned int		handle;
+/* Per-ring information */
+struct xen_blkif_ring {
 	/* Physical parameters of the comms window. */
 	unsigned int		irq;
-	/* Comms information. */
-	enum blkif_protocol	blk_protocol;
 	union blkif_back_rings	blk_rings;
 	void			*blk_ring;
-	/* The VBD attached to this interface. */
-	struct xen_vbd		vbd;
-	/* Back pointer to the backend_info. */
-	struct backend_info	*be;
 	/* Private fields. */
 	spinlock_t		blk_ring_lock;
-	atomic_t		refcnt;
 
 	wait_queue_head_t	wq;
-	/* for barrier (drain) requests */
-	struct completion	drain_complete;
-	atomic_t		drain;
 	atomic_t		inflight;
-	/* One thread per one blkif. */
+	/* One thread per blkif ring. */
 	struct task_struct	*xenblkd;
 	unsigned int		waiting_reqs;
 
@@ -321,7 +309,37 @@ struct xen_blkif {
 	struct work_struct	free_work;
 	/* Thread shutdown wait queue. */
 	wait_queue_head_t	shutdown_wq;
+	struct xen_blkif *blkif;
+};
+
+struct xen_blkif {
+	/* Unique identifier for this interface. */
+	domid_t			domid;
+	unsigned int		handle;
+	/* Comms information. */
+	enum blkif_protocol	blk_protocol;
+	/* The VBD attached to this interface. */
+	struct xen_vbd		vbd;
+	/* Back pointer to the backend_info. */
+	struct backend_info	*be;
+	/* for barrier (drain) requests */
+	struct completion	drain_complete;
+	atomic_t		drain;
+	atomic_t		refcnt;
+	struct work_struct	free_work;
+
+	/* statistics */
+	unsigned long		st_print;
+	unsigned long long	st_rd_req;
+	unsigned long long	st_wr_req;
+	unsigned long long	st_oo_req;
+	unsigned long long	st_f_req;
+	unsigned long long	st_ds_req;
+	unsigned long long	st_rd_sect;
+	unsigned long long	st_wr_sect;
 	unsigned int nr_ring_pages;
+	/* All rings for this device */
+	struct xen_blkif_ring ring;
 };
 
 struct seg_buf {
@@ -343,7 +361,7 @@ struct grant_page {
  * response queued for it, with the saved 'id' passed back.
  */
 struct pending_req {
-	struct xen_blkif	*blkif;
+	struct xen_blkif_ring   *ring;
 	u64			id;
 	int			nr_segs;
 	atomic_t		pendcnt;
@@ -385,7 +403,7 @@ int xen_blkif_xenbus_init(void);
 irqreturn_t xen_blkif_be_int(int irq, void *dev_id);
 int xen_blkif_schedule(void *arg);
 int xen_blkif_purge_persistent(void *arg);
-void xen_blkbk_free_caches(struct xen_blkif *blkif);
+void xen_blkbk_free_caches(struct xen_blkif_ring *ring);
 
 int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 			      struct backend_info *be, int state);
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index deb3f00..6482ee3 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -88,7 +88,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 	char name[BLKBACK_NAME_LEN];
 
 	/* Not ready to connect? */
-	if (!blkif->irq || !blkif->vbd.bdev)
+	if (!blkif->ring.irq || !blkif->vbd.bdev)
 		return;
 
 	/* Already connected? */
@@ -113,10 +113,10 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 	}
 	invalidate_inode_pages2(blkif->vbd.bdev->bd_inode->i_mapping);
 
-	blkif->xenblkd = kthread_run(xen_blkif_schedule, blkif, "%s", name);
-	if (IS_ERR(blkif->xenblkd)) {
-		err = PTR_ERR(blkif->xenblkd);
-		blkif->xenblkd = NULL;
+	blkif->ring.xenblkd = kthread_run(xen_blkif_schedule, &blkif->ring, "%s", name);
+	if (IS_ERR(blkif->ring.xenblkd)) {
+		err = PTR_ERR(blkif->ring.xenblkd);
+		blkif->ring.xenblkd = NULL;
 		xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
 		return;
 	}
@@ -125,6 +125,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 {
 	struct xen_blkif *blkif;
+	struct xen_blkif_ring *ring;
 
 	BUILD_BUG_ON(MAX_INDIRECT_PAGES > BLKIF_MAX_INDIRECT_PAGES_PER_REQUEST);
 
@@ -133,41 +134,44 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 		return ERR_PTR(-ENOMEM);
 
 	blkif->domid = domid;
-	spin_lock_init(&blkif->blk_ring_lock);
+	ring = &blkif->ring;
+	ring->blkif = blkif;
+	spin_lock_init(&ring->blk_ring_lock);
 	atomic_set(&blkif->refcnt, 1);
-	init_waitqueue_head(&blkif->wq);
+	init_waitqueue_head(&ring->wq);
 	init_completion(&blkif->drain_complete);
 	atomic_set(&blkif->drain, 0);
-	blkif->st_print = jiffies;
-	blkif->persistent_gnts.rb_node = NULL;
-	spin_lock_init(&blkif->free_pages_lock);
-	INIT_LIST_HEAD(&blkif->free_pages);
-	INIT_LIST_HEAD(&blkif->persistent_purge_list);
-	blkif->free_pages_num = 0;
-	atomic_set(&blkif->persistent_gnt_in_use, 0);
-	atomic_set(&blkif->inflight, 0);
-	INIT_WORK(&blkif->persistent_purge_work, xen_blkbk_unmap_purged_grants);
-
-	INIT_LIST_HEAD(&blkif->pending_free);
+	ring->st_print = jiffies;
+	ring->persistent_gnts.rb_node = NULL;
+	spin_lock_init(&ring->free_pages_lock);
+	INIT_LIST_HEAD(&ring->free_pages);
+	INIT_LIST_HEAD(&ring->persistent_purge_list);
+	ring->free_pages_num = 0;
+	atomic_set(&ring->persistent_gnt_in_use, 0);
+	atomic_set(&ring->inflight, 0);
+	INIT_WORK(&ring->persistent_purge_work, xen_blkbk_unmap_purged_grants);
+
+	INIT_LIST_HEAD(&ring->pending_free);
 	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
-	spin_lock_init(&blkif->pending_free_lock);
-	init_waitqueue_head(&blkif->pending_free_wq);
-	init_waitqueue_head(&blkif->shutdown_wq);
+	spin_lock_init(&ring->pending_free_lock);
+	init_waitqueue_head(&ring->pending_free_wq);
+	init_waitqueue_head(&ring->shutdown_wq);
 
 	return blkif;
 }
 
-static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
+static int xen_blkif_map(struct xen_blkif_ring *ring, grant_ref_t *gref,
 			 unsigned int nr_grefs, unsigned int evtchn)
 {
 	int err;
+	struct xen_blkif *blkif = ring->blkif;
 
 	/* Already connected through? */
-	if (blkif->irq)
+	if (ring->irq)
 		return 0;
 
 	err = xenbus_map_ring_valloc(blkif->be->dev, gref, nr_grefs,
-				     &blkif->blk_ring);
+				     &ring->blk_ring);
 	if (err < 0)
 		return err;
 
@@ -175,22 +179,22 @@ static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
 	case BLKIF_PROTOCOL_NATIVE:
 	{
 		struct blkif_sring *sring;
-		sring = (struct blkif_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.native, sring, PAGE_SIZE * nr_grefs);
+		sring = (struct blkif_sring *)ring->blk_ring;
+		BACK_RING_INIT(&ring->blk_rings.native, sring, PAGE_SIZE * nr_grefs);
 		break;
 	}
 	case BLKIF_PROTOCOL_X86_32:
 	{
 		struct blkif_x86_32_sring *sring_x86_32;
-		sring_x86_32 = (struct blkif_x86_32_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32, PAGE_SIZE * nr_grefs);
+		sring_x86_32 = (struct blkif_x86_32_sring *)ring->blk_ring;
+		BACK_RING_INIT(&ring->blk_rings.x86_32, sring_x86_32, PAGE_SIZE * nr_grefs);
 		break;
 	}
 	case BLKIF_PROTOCOL_X86_64:
 	{
 		struct blkif_x86_64_sring *sring_x86_64;
-		sring_x86_64 = (struct blkif_x86_64_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64, PAGE_SIZE * nr_grefs);
+		sring_x86_64 = (struct blkif_x86_64_sring *)ring->blk_ring;
+		BACK_RING_INIT(&ring->blk_rings.x86_64, sring_x86_64, PAGE_SIZE * nr_grefs);
 		break;
 	}
 	default:
@@ -199,44 +203,46 @@ static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
 
 	err = bind_interdomain_evtchn_to_irqhandler(blkif->domid, evtchn,
 						    xen_blkif_be_int, 0,
-						    "blkif-backend", blkif);
+						    "blkif-backend", ring);
 	if (err < 0) {
-		xenbus_unmap_ring_vfree(blkif->be->dev, blkif->blk_ring);
-		blkif->blk_rings.common.sring = NULL;
+		xenbus_unmap_ring_vfree(blkif->be->dev, ring->blk_ring);
+		ring->blk_rings.common.sring = NULL;
 		return err;
 	}
-	blkif->irq = err;
+	ring->irq = err;
 
 	return 0;
 }
 
 static int xen_blkif_disconnect(struct xen_blkif *blkif)
 {
-	if (blkif->xenblkd) {
-		kthread_stop(blkif->xenblkd);
-		wake_up(&blkif->shutdown_wq);
-		blkif->xenblkd = NULL;
+	struct xen_blkif_ring *ring = &blkif->ring;
+
+	if (ring->xenblkd) {
+		kthread_stop(ring->xenblkd);
+		wake_up(&ring->shutdown_wq);
+		ring->xenblkd = NULL;
 	}
 
 	/* The above kthread_stop() guarantees that at this point we
 	 * don't have any discard_io or other_io requests. So, checking
 	 * for inflight IO is enough.
 	 */
-	if (atomic_read(&blkif->inflight) > 0)
+	if (atomic_read(&ring->inflight) > 0)
 		return -EBUSY;
 
-	if (blkif->irq) {
-		unbind_from_irqhandler(blkif->irq, blkif);
-		blkif->irq = 0;
+	if (ring->irq) {
+		unbind_from_irqhandler(ring->irq, ring);
+		ring->irq = 0;
 	}
 
-	if (blkif->blk_rings.common.sring) {
-		xenbus_unmap_ring_vfree(blkif->be->dev, blkif->blk_ring);
-		blkif->blk_rings.common.sring = NULL;
+	if (ring->blk_rings.common.sring) {
+		xenbus_unmap_ring_vfree(blkif->be->dev, ring->blk_ring);
+		ring->blk_rings.common.sring = NULL;
 	}
 
 	/* Remove all persistent grants and the cache of ballooned pages. */
-	xen_blkbk_free_caches(blkif);
+	xen_blkbk_free_caches(ring);
 
 	return 0;
 }
@@ -245,20 +251,21 @@ static void xen_blkif_free(struct xen_blkif *blkif)
 {
 	struct pending_req *req, *n;
 	int i = 0, j;
+	struct xen_blkif_ring *ring = &blkif->ring;
 
 	xen_blkif_disconnect(blkif);
 	xen_vbd_free(&blkif->vbd);
 
 	/* Make sure everything is drained before shutting down */
-	BUG_ON(blkif->persistent_gnt_c != 0);
-	BUG_ON(atomic_read(&blkif->persistent_gnt_in_use) != 0);
-	BUG_ON(blkif->free_pages_num != 0);
-	BUG_ON(!list_empty(&blkif->persistent_purge_list));
-	BUG_ON(!list_empty(&blkif->free_pages));
-	BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
+	BUG_ON(ring->persistent_gnt_c != 0);
+	BUG_ON(atomic_read(&ring->persistent_gnt_in_use) != 0);
+	BUG_ON(ring->free_pages_num != 0);
+	BUG_ON(!list_empty(&ring->persistent_purge_list));
+	BUG_ON(!list_empty(&ring->free_pages));
+	BUG_ON(!RB_EMPTY_ROOT(&ring->persistent_gnts));
 
 	/* Check that there is no request in use */
-	list_for_each_entry_safe(req, n, &blkif->pending_free, free_list) {
+	list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
 		list_del(&req->free_list);
 
 		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++)
@@ -298,6 +305,16 @@ int __init xen_blkif_interface_init(void)
 	{								\
 		struct xenbus_device *dev = to_xenbus_device(_dev);	\
 		struct backend_info *be = dev_get_drvdata(&dev->dev);	\
+		struct xen_blkif *blkif = be->blkif;			\
+		struct xen_blkif_ring *ring = &blkif->ring;		\
+									\
+		blkif->st_oo_req = ring->st_oo_req;			\
+		blkif->st_rd_req = ring->st_rd_req;			\
+		blkif->st_wr_req = ring->st_wr_req;			\
+		blkif->st_f_req = ring->st_f_req;			\
+		blkif->st_ds_req = ring->st_ds_req;			\
+		blkif->st_rd_sect = ring->st_rd_sect;			\
+		blkif->st_wr_sect = ring->st_wr_sect;			\
 									\
 		return sprintf(buf, format, ##args);			\
 	}								\
@@ -830,6 +847,7 @@ static int connect_ring(struct backend_info *be)
 	char protocol[64] = "";
 	struct pending_req *req, *n;
 	int err, i, j;
+	struct xen_blkif_ring *ring = &be->blkif->ring;
 
 	pr_debug("%s %s\n", __func__, dev->otherend);
 
@@ -918,7 +936,7 @@ static int connect_ring(struct backend_info *be)
 		req = kzalloc(sizeof(*req), GFP_KERNEL);
 		if (!req)
 			goto fail;
-		list_add_tail(&req->free_list, &be->blkif->pending_free);
+		list_add_tail(&req->free_list, &ring->pending_free);
 		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++) {
 			req->segments[j] = kzalloc(sizeof(*req->segments[0]), GFP_KERNEL);
 			if (!req->segments[j])
@@ -933,7 +951,7 @@ static int connect_ring(struct backend_info *be)
 	}
 
 	/* Map the shared frame, irq etc. */
-	err = xen_blkif_map(be->blkif, ring_ref, nr_grefs, evtchn);
+	err = xen_blkif_map(ring, ring_ref, nr_grefs, evtchn);
 	if (err) {
 		xenbus_dev_fatal(dev, err, "mapping ring-ref port %u", evtchn);
 		return err;
@@ -942,7 +960,7 @@ static int connect_ring(struct backend_info *be)
 	return 0;
 
 fail:
-	list_for_each_entry_safe(req, n, &be->blkif->pending_free, free_list) {
+	list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
 		list_del(&req->free_list);
 		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++) {
 			if (!req->segments[j])
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 8/9] xen/blkback: pseudo support for multi hardware queues/rings
  2015-09-05 12:39 [PATCH v3 0/9] xen-block: support multi hardware-queues/rings Bob Liu
@ 2015-09-05 12:39   ` Bob Liu
  2015-09-05 12:39 ` Bob Liu
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-09-05 12:39 UTC (permalink / raw)
  To: xen-devel
  Cc: david.vrabel, linux-kernel, roger.pau, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies, Bob Liu

Preparatory patch for multiple hardware queues/rings; for now the number of
rings is hard-coded to 1.
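
Along the same lines, a minimal standalone sketch (plain userspace C, not
kernel code; alloc_rings() stands in for the real xen_blkif_alloc_rings(),
calloc() for kzalloc(), and the per-ring kthread/irq setup is omitted) of
what this step changes:

/* Simplified model: the single embedded ring becomes a dynamically
 * allocated array of nr_rings entries, initialised in a loop. */
#include <stdio.h>
#include <stdlib.h>

struct xen_blkif;

struct xen_blkif_ring {
	unsigned int irq;
	struct xen_blkif *blkif;        /* back pointer, one per ring */
};

struct xen_blkif {
	struct xen_blkif_ring *rings;   /* was: struct xen_blkif_ring ring; */
	unsigned int nr_rings;          /* forced to 1 in this patch */
};

static int alloc_rings(struct xen_blkif *blkif, unsigned int nr_rings)
{
	unsigned int i;

	blkif->rings = calloc(nr_rings, sizeof(*blkif->rings));
	if (!blkif->rings)
		return -1;
	blkif->nr_rings = nr_rings;

	for (i = 0; i < nr_rings; i++)
		blkif->rings[i].blkif = blkif;  /* per-ring init loop */
	return 0;
}

int main(void)
{
	struct xen_blkif blkif = { NULL, 0 };
	unsigned int i;

	if (alloc_rings(&blkif, 1))     /* nr_rings hard-coded to 1 for now */
		return 1;
	for (i = 0; i < blkif.nr_rings; i++)
		printf("ring %u ready, owner %p\n", i, (void *)blkif.rings[i].blkif);

	free(blkif.rings);
	return 0;
}

With nr_rings fixed at 1 the behaviour is identical to the single-ring code it
replaces; only the allocation and the per-ring init/teardown loops change.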

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkback/common.h |    3 +-
 drivers/block/xen-blkback/xenbus.c |  328 +++++++++++++++++++++++-------------
 2 files changed, 209 insertions(+), 122 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index cc253d4..ba058a0 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -339,7 +339,8 @@ struct xen_blkif {
 	unsigned long long			st_wr_sect;
 	unsigned int nr_ring_pages;
 	/* All rings for this device */
-	struct xen_blkif_ring ring;
+	struct xen_blkif_ring *rings;
+	unsigned int nr_rings;
 };
 
 struct seg_buf {
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 6482ee3..04b8d0d 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -26,6 +26,7 @@
 /* Enlarge the array size in order to fully show blkback name. */
 #define BLKBACK_NAME_LEN (20)
 #define RINGREF_NAME_LEN (20)
+#define RINGREF_NAME_LEN (20)
 
 struct backend_info {
 	struct xenbus_device	*dev;
@@ -84,11 +85,13 @@ static int blkback_name(struct xen_blkif *blkif, char *buf)
 
 static void xen_update_blkif_status(struct xen_blkif *blkif)
 {
-	int err;
+	int err, i;
 	char name[BLKBACK_NAME_LEN];
+	struct xen_blkif_ring *ring;
+	char per_ring_name[BLKBACK_NAME_LEN + 2];
 
 	/* Not ready to connect? */
-	if (!blkif->ring.irq || !blkif->vbd.bdev)
+	if (!blkif->rings || !blkif->rings[0].irq || !blkif->vbd.bdev)
 		return;
 
 	/* Already connected? */
@@ -113,19 +116,68 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 	}
 	invalidate_inode_pages2(blkif->vbd.bdev->bd_inode->i_mapping);
 
-	blkif->ring.xenblkd = kthread_run(xen_blkif_schedule, &blkif->ring, "%s", name);
-	if (IS_ERR(blkif->ring.xenblkd)) {
-		err = PTR_ERR(blkif->ring.xenblkd);
-		blkif->ring.xenblkd = NULL;
-		xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
-		return;
+	if (blkif->nr_rings == 1) {
+		blkif->rings[0].xenblkd = kthread_run(xen_blkif_schedule, &blkif->rings[0], "%s", name);
+		if (IS_ERR(blkif->rings[0].xenblkd)) {
+			err = PTR_ERR(blkif->rings[0].xenblkd);
+			blkif->rings[0].xenblkd = NULL;
+			xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
+			return;
+		}
+	} else {
+		for (i = 0; i < blkif->nr_rings; i++) {
+			snprintf(per_ring_name, BLKBACK_NAME_LEN + 2, "%s-%d", name, i);
+			ring = &blkif->rings[i];
+			ring->xenblkd = kthread_run(xen_blkif_schedule, ring, "%s", per_ring_name);
+			if (IS_ERR(ring->xenblkd)) {
+				err = PTR_ERR(ring->xenblkd);
+				ring->xenblkd = NULL;
+				xenbus_dev_error(blkif->be->dev, err,
+						"start %s xenblkd", per_ring_name);
+				return;
+			}
+		}
+	}
+}
+
+static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
+{
+	struct xen_blkif_ring *ring;
+	int r;
+
+	blkif->rings = kzalloc(blkif->nr_rings * sizeof(struct xen_blkif_ring), GFP_KERNEL);
+	if (!blkif->rings)
+		return -ENOMEM;
+
+	for (r = 0; r < blkif->nr_rings; r++) {
+		ring = &blkif->rings[r];
+
+		spin_lock_init(&ring->blk_ring_lock);
+		init_waitqueue_head(&ring->wq);
+		ring->st_print = jiffies;
+		ring->persistent_gnts.rb_node = NULL;
+		spin_lock_init(&ring->free_pages_lock);
+		INIT_LIST_HEAD(&ring->free_pages);
+		INIT_LIST_HEAD(&ring->persistent_purge_list);
+		ring->free_pages_num = 0;
+		atomic_set(&ring->persistent_gnt_in_use, 0);
+		atomic_set(&ring->inflight, 0);
+		INIT_WORK(&ring->persistent_purge_work, xen_blkbk_unmap_purged_grants);
+		INIT_LIST_HEAD(&ring->pending_free);
+
+		spin_lock_init(&ring->pending_free_lock);
+		init_waitqueue_head(&ring->pending_free_wq);
+		init_waitqueue_head(&ring->shutdown_wq);
+		ring->blkif = blkif;
+		xen_blkif_get(blkif);
 	}
+
+	return 0;
 }
 
 static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 {
 	struct xen_blkif *blkif;
-	struct xen_blkif_ring *ring;
 
 	BUILD_BUG_ON(MAX_INDIRECT_PAGES > BLKIF_MAX_INDIRECT_PAGES_PER_REQUEST);
 
@@ -134,29 +186,16 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 		return ERR_PTR(-ENOMEM);
 
 	blkif->domid = domid;
-	ring = &blkif->ring;
-	ring->blkif = blkif;
-	spin_lock_init(&ring->blk_ring_lock);
 	atomic_set(&blkif->refcnt, 1);
-	init_waitqueue_head(&ring->wq);
 	init_completion(&blkif->drain_complete);
 	atomic_set(&blkif->drain, 0);
-	ring->st_print = jiffies;
-	ring->persistent_gnts.rb_node = NULL;
-	spin_lock_init(&ring->free_pages_lock);
-	INIT_LIST_HEAD(&ring->free_pages);
-	INIT_LIST_HEAD(&ring->persistent_purge_list);
-	ring->free_pages_num = 0;
-	atomic_set(&ring->persistent_gnt_in_use, 0);
-	atomic_set(&ring->inflight, 0);
-	INIT_WORK(&ring->persistent_purge_work, xen_blkbk_unmap_purged_grants);
-
-	INIT_LIST_HEAD(&ring->pending_free);
 	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
-	spin_lock_init(&ring->pending_free_lock);
-	init_waitqueue_head(&ring->pending_free_wq);
-	init_waitqueue_head(&ring->shutdown_wq);
 
+	blkif->nr_rings = 1;
+	if (xen_blkif_alloc_rings(blkif)) {
+		kmem_cache_free(xen_blkif_cachep, blkif);
+		return ERR_PTR(-ENOMEM);
+	}
 	return blkif;
 }
 
@@ -216,70 +255,78 @@ static int xen_blkif_map(struct xen_blkif_ring *ring, grant_ref_t *gref,
 
 static int xen_blkif_disconnect(struct xen_blkif *blkif)
 {
-	struct xen_blkif_ring *ring = &blkif->ring;
+	struct xen_blkif_ring *ring;
+	int i;
+
+	for (i = 0; i < blkif->nr_rings; i++) {
+		ring = &blkif->rings[i];
+		if (ring->xenblkd) {
+			kthread_stop(ring->xenblkd);
+			wake_up(&ring->shutdown_wq);
+			ring->xenblkd = NULL;
+		}
 
-	if (ring->xenblkd) {
-		kthread_stop(ring->xenblkd);
-		wake_up(&ring->shutdown_wq);
-		ring->xenblkd = NULL;
-	}
+		/* The above kthread_stop() guarantees that at this point we
+		 * don't have any discard_io or other_io requests. So, checking
+		 * for inflight IO is enough.
+		 */
+		if (atomic_read(&ring->inflight) > 0)
+			return -EBUSY;
 
-	/* The above kthread_stop() guarantees that at this point we
-	 * don't have any discard_io or other_io requests. So, checking
-	 * for inflight IO is enough.
-	 */
-	if (atomic_read(&ring->inflight) > 0)
-		return -EBUSY;
+		if (ring->irq) {
+			unbind_from_irqhandler(ring->irq, ring);
+			ring->irq = 0;
+		}
 
-	if (ring->irq) {
-		unbind_from_irqhandler(ring->irq, ring);
-		ring->irq = 0;
-	}
+		if (ring->blk_rings.common.sring) {
+			xenbus_unmap_ring_vfree(blkif->be->dev, ring->blk_ring);
+			ring->blk_rings.common.sring = NULL;
+		}
 
-	if (ring->blk_rings.common.sring) {
-		xenbus_unmap_ring_vfree(blkif->be->dev, ring->blk_ring);
-		ring->blk_rings.common.sring = NULL;
+		/* Remove all persistent grants and the cache of ballooned pages. */
+		xen_blkbk_free_caches(ring);
 	}
 
-	/* Remove all persistent grants and the cache of ballooned pages. */
-	xen_blkbk_free_caches(ring);
-
 	return 0;
 }
 
 static void xen_blkif_free(struct xen_blkif *blkif)
 {
 	struct pending_req *req, *n;
-	int i = 0, j;
-	struct xen_blkif_ring *ring = &blkif->ring;
+	int i = 0, j, r;
+	struct xen_blkif_ring *ring;
 
 	xen_blkif_disconnect(blkif);
 	xen_vbd_free(&blkif->vbd);
 
-	/* Make sure everything is drained before shutting down */
-	BUG_ON(ring->persistent_gnt_c != 0);
-	BUG_ON(atomic_read(&ring->persistent_gnt_in_use) != 0);
-	BUG_ON(ring->free_pages_num != 0);
-	BUG_ON(!list_empty(&ring->persistent_purge_list));
-	BUG_ON(!list_empty(&ring->free_pages));
-	BUG_ON(!RB_EMPTY_ROOT(&ring->persistent_gnts));
+	for (r = 0; r < blkif->nr_rings; r++) {
+		ring = &blkif->rings[r];
+		/* Make sure everything is drained before shutting down */
+		BUG_ON(ring->persistent_gnt_c != 0);
+		BUG_ON(atomic_read(&ring->persistent_gnt_in_use) != 0);
+		BUG_ON(ring->free_pages_num != 0);
+		BUG_ON(!list_empty(&ring->persistent_purge_list));
+		BUG_ON(!list_empty(&ring->free_pages));
+		BUG_ON(!RB_EMPTY_ROOT(&ring->persistent_gnts));
 
-	/* Check that there is no request in use */
-	list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
-		list_del(&req->free_list);
+		/* Check that there is no request in use */
+		list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
+			list_del(&req->free_list);
 
-		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++)
-			kfree(req->segments[j]);
+			for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++)
+				kfree(req->segments[j]);
 
-		for (j = 0; j < MAX_INDIRECT_PAGES; j++)
-			kfree(req->indirect_pages[j]);
+			for (j = 0; j < MAX_INDIRECT_PAGES; j++)
+				kfree(req->indirect_pages[j]);
 
-		kfree(req);
-		i++;
-	}
+			kfree(req);
+			i++;
+		}
 
-	WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
+		WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
+	}
 
+	kfree(blkif->rings);
 	kmem_cache_free(xen_blkif_cachep, blkif);
 }
 
@@ -306,15 +353,19 @@ int __init xen_blkif_interface_init(void)
 		struct xenbus_device *dev = to_xenbus_device(_dev);	\
 		struct backend_info *be = dev_get_drvdata(&dev->dev);	\
 		struct xen_blkif *blkif = be->blkif;			\
-		struct xen_blkif_ring *ring = &blkif->ring;		\
+		struct xen_blkif_ring *ring;				\
+		int i;							\
 									\
-		blkif->st_oo_req = ring->st_oo_req;			\
-		blkif->st_rd_req = ring->st_rd_req;			\
-		blkif->st_wr_req = ring->st_wr_req;			\
-		blkif->st_f_req = ring->st_f_req;			\
-		blkif->st_ds_req = ring->st_ds_req;			\
-		blkif->st_rd_sect = ring->st_rd_sect;			\
-		blkif->st_wr_sect = ring->st_wr_sect;			\
+		for (i = 0; i < blkif->nr_rings; i++) {			\
+			ring = &blkif->rings[i];			\
+			blkif->st_oo_req += ring->st_oo_req;		\
+			blkif->st_rd_req += ring->st_rd_req;		\
+			blkif->st_wr_req += ring->st_wr_req;		\
+			blkif->st_f_req += ring->st_f_req;		\
+			blkif->st_ds_req += ring->st_ds_req;		\
+			blkif->st_rd_sect += ring->st_rd_sect;		\
+			blkif->st_wr_sect += ring->st_wr_sect;		\
+		}							\
 									\
 		return sprintf(buf, format, ##args);			\
 	}								\
@@ -438,6 +489,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 static int xen_blkbk_remove(struct xenbus_device *dev)
 {
 	struct backend_info *be = dev_get_drvdata(&dev->dev);
+	int i;
 
 	pr_debug("%s %p %d\n", __func__, dev, dev->otherend_id);
 
@@ -454,7 +506,8 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
 
 	if (be->blkif) {
 		xen_blkif_disconnect(be->blkif);
-		xen_blkif_put(be->blkif);
+		for (i = 0; i < be->blkif->nr_rings; i++)
+			xen_blkif_put(be->blkif);
 	}
 
 	kfree(be->mode);
@@ -837,21 +890,16 @@ again:
 	xenbus_transaction_end(xbt, 1);
 }
 
-
-static int connect_ring(struct backend_info *be)
+static int read_per_ring_refs(struct xen_blkif_ring *ring, const char *dir)
 {
-	struct xenbus_device *dev = be->dev;
+	unsigned int ring_page_order, nr_grefs, evtchn;
 	unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
-	unsigned int evtchn, nr_grefs, ring_page_order;
-	unsigned int pers_grants;
-	char protocol[64] = "";
 	struct pending_req *req, *n;
 	int err, i, j;
-	struct xen_blkif_ring *ring = &be->blkif->ring;
-
-	pr_debug("%s %s\n", __func__, dev->otherend);
+	struct xen_blkif *blkif = ring->blkif;
+	struct xenbus_device *dev = blkif->be->dev;
 
-	err = xenbus_scanf(XBT_NIL, dev->otherend, "event-channel", "%u",
+	err = xenbus_scanf(XBT_NIL, dir, "event-channel", "%u",
 			  &evtchn);
 	if (err != 1) {
 		err = -EINVAL;
@@ -859,12 +907,11 @@ static int connect_ring(struct backend_info *be)
 				 dev->otherend);
 		return err;
 	}
-	pr_info("event-channel %u\n", evtchn);
 
 	err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
 			  &ring_page_order);
 	if (err != 1) {
-		err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-ref",
+		err = xenbus_scanf(XBT_NIL, dir, "ring-ref",
 				  "%u", &ring_ref[0]);
 		if (err != 1) {
 			err = -EINVAL;
@@ -873,7 +920,7 @@ static int connect_ring(struct backend_info *be)
 			return err;
 		}
 		nr_grefs = 1;
-		pr_info("%s:using single page: ring-ref %d\n", dev->otherend,
+		pr_info("%s:using single page: ring-ref %d\n", dir,
 			ring_ref[0]);
 	} else {
 		unsigned int i;
@@ -891,7 +938,7 @@ static int connect_ring(struct backend_info *be)
 			char ring_ref_name[RINGREF_NAME_LEN];
 
 			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
-			err = xenbus_scanf(XBT_NIL, dev->otherend, ring_ref_name,
+			err = xenbus_scanf(XBT_NIL, dir, ring_ref_name,
 					   "%u", &ring_ref[i]);
 			if (err != 1) {
 				err = -EINVAL;
@@ -899,38 +946,12 @@ static int connect_ring(struct backend_info *be)
 						 dev->otherend, ring_ref_name);
 				return err;
 			}
-			pr_info("ring-ref%u: %u\n", i, ring_ref[i]);
+			pr_info("%s: ring-ref%u: %u\n", dir, i, ring_ref[i]);
 		}
 	}
+	blkif->nr_ring_pages = nr_grefs;
 
-	be->blkif->blk_protocol = BLKIF_PROTOCOL_DEFAULT;
-	err = xenbus_gather(XBT_NIL, dev->otherend, "protocol",
-			    "%63s", protocol, NULL);
-	if (err)
-		strcpy(protocol, "unspecified, assuming default");
-	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_NATIVE))
-		be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
-	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_32))
-		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_32;
-	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_64))
-		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_64;
-	else {
-		xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
-		return -1;
-	}
-	err = xenbus_gather(XBT_NIL, dev->otherend,
-			    "feature-persistent", "%u",
-			    &pers_grants, NULL);
-	if (err)
-		pers_grants = 0;
-
-	be->blkif->vbd.feature_gnt_persistent = pers_grants;
-	be->blkif->vbd.overflow_max_grants = 0;
-	be->blkif->nr_ring_pages = nr_grefs;
-
-	pr_info("ring-pages:%d, event-channel %d, protocol %d (%s) %s\n",
-		nr_grefs, evtchn, be->blkif->blk_protocol, protocol,
-		pers_grants ? "persistent grants" : "");
+	pr_info("ring-pages:%d, event-channel %d.\n", nr_grefs, evtchn);
 
 	for (i = 0; i < nr_grefs * XEN_BLKIF_REQS_PER_PAGE; i++) {
 		req = kzalloc(sizeof(*req), GFP_KERNEL);
@@ -975,6 +996,71 @@ fail:
 		kfree(req);
 	}
 	return -ENOMEM;
+
+}
+
+static int connect_ring(struct backend_info *be)
+{
+	struct xenbus_device *dev = be->dev;
+	unsigned int pers_grants;
+	char protocol[64] = "";
+	int err, i;
+	char *xspath;
+	size_t xspathsize;
+	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
+
+	pr_debug("%s %s\n", __func__, dev->otherend);
+
+	be->blkif->blk_protocol = BLKIF_PROTOCOL_DEFAULT;
+	err = xenbus_gather(XBT_NIL, dev->otherend, "protocol",
+			    "%63s", protocol, NULL);
+	if (err)
+		strcpy(protocol, "unspecified, assuming default");
+	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_NATIVE))
+		be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
+	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_32))
+		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_32;
+	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_64))
+		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_64;
+	else {
+		xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
+		return -1;
+	}
+	err = xenbus_gather(XBT_NIL, dev->otherend,
+			    "feature-persistent", "%u",
+			    &pers_grants, NULL);
+	if (err)
+		pers_grants = 0;
+
+	be->blkif->vbd.feature_gnt_persistent = pers_grants;
+	be->blkif->vbd.overflow_max_grants = 0;
+
+	pr_info("nr_rings:%d protocol %d (%s) %s\n", be->blkif->nr_rings,
+		 be->blkif->blk_protocol, protocol,
+		 pers_grants ? "persistent grants" : "");
+
+	if (be->blkif->nr_rings == 1)
+		return read_per_ring_refs(&be->blkif->rings[0], dev->otherend);
+	else {
+		xspathsize = strlen(dev->otherend) + xenstore_path_ext_size;
+		xspath = kzalloc(xspathsize, GFP_KERNEL);
+		if (!xspath) {
+			xenbus_dev_fatal(dev, -ENOMEM, "reading ring references");
+			return -ENOMEM;
+		}
+
+		for (i = 0; i < be->blkif->nr_rings; i++) {
+			memset(xspath, 0, xspathsize);
+			snprintf(xspath, xspathsize, "%s/queue-%u", dev->otherend, i);
+			err = read_per_ring_refs(&be->blkif->rings[i], xspath);
+			if (err) {
+				kfree(xspath);
+				return err;
+			}
+		}
+		kfree(xspath);
+	}
+	return 0;
 }
 
 static const struct xenbus_device_id xen_blkbk_ids[] = {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v3 9/9] xen/blkback: get number of hardware queues/rings from blkfront
  2015-09-05 12:39 [PATCH v3 0/9] xen-block: support multi hardware-queues/rings Bob Liu
@ 2015-09-05 12:39   ` Bob Liu
  2015-09-05 12:39 ` Bob Liu
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-09-05 12:39 UTC (permalink / raw)
  To: xen-devel
  Cc: david.vrabel, linux-kernel, roger.pau, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies, Bob Liu

The backend advertises "multi-queue-max-queues" to the frontend, and then reads
back the final negotiated number of queues/rings from "multi-queue-num-queues",
which is written by blkfront.
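
For illustration, a minimal sketch of the frontend half of this handshake
(not code from this series: negotiate_nr_rings(), its caller-supplied
'wanted' value and the error handling are invented for the example; the
xenstore keys and the xenbus_scanf()/xenbus_printf() helpers are the same
ones this patch uses on the backend side):

	/*
	 * Hypothetical frontend-side helper (assumes <linux/kernel.h> and
	 * <xen/xenbus.h>): read the backend's advertised limit, clamp the
	 * number of rings this guest wants (assumed >= 1), and write the
	 * negotiated value back for the backend to read in connect_ring().
	 */
	static int negotiate_nr_rings(struct xenbus_device *dev,
				      unsigned int wanted,
				      unsigned int *nr_rings)
	{
		unsigned int backend_max;
		int err;

		err = xenbus_scanf(XBT_NIL, dev->otherend,
				   "multi-queue-max-queues", "%u", &backend_max);
		if (err != 1)
			backend_max = 1;	/* backend predates multi-queue */

		*nr_rings = min(wanted, backend_max);

		return xenbus_printf(XBT_NIL, dev->nodename,
				     "multi-queue-num-queues", "%u", *nr_rings);
	}

The backend-side counterpart is what connect_ring() below implements: it
reads "multi-queue-num-queues", rejects zero or anything above
xenblk_max_queues, and sizes blkif->rings accordingly.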

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkback/blkback.c |    8 ++++++++
 drivers/block/xen-blkback/xenbus.c  |   36 ++++++++++++++++++++++++++++++-----
 2 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index fd02240..b904fe05f0 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -83,6 +83,11 @@ module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
                  "Maximum number of grants to map persistently");
 
+unsigned int xenblk_max_queues;
+module_param_named(max_queues, xenblk_max_queues, uint, 0644);
+MODULE_PARM_DESC(max_queues,
+		 "Maximum number of hardware queues per virtual disk");
+
 /*
  * Maximum order of pages to be used for the shared ring between front and
  * backend, 4KB page granularity is used.
@@ -1458,6 +1463,9 @@ static int __init xen_blkif_init(void)
 		xen_blkif_max_ring_order = XENBUS_MAX_RING_PAGE_ORDER;
 	}
 
+	/* Allow as many queues as there are CPUs, by default */
+	xenblk_max_queues = num_online_cpus();
+
 	rc = xen_blkif_interface_init();
 	if (rc)
 		goto failed_init;
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 04b8d0d..aa97ea5 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -28,6 +28,8 @@
 #define RINGREF_NAME_LEN (20)
 #define RINGREF_NAME_LEN (20)
 
+extern unsigned int xenblk_max_queues;
+
 struct backend_info {
 	struct xenbus_device	*dev;
 	struct xen_blkif	*blkif;
@@ -191,11 +193,6 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 	atomic_set(&blkif->drain, 0);
 	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 
-	blkif->nr_rings = 1;
-	if (xen_blkif_alloc_rings(blkif)) {
-		kmem_cache_free(xen_blkif_cachep, blkif);
-		return ERR_PTR(-ENOMEM);
-	}
 	return blkif;
 }
 
@@ -618,6 +615,14 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
 		goto fail;
 	}
 
+	/* Multi-queue: write how many queues the backend supports. */
+	err = xenbus_printf(XBT_NIL, dev->nodename,
+			    "multi-queue-max-queues", "%u", xenblk_max_queues);
+	if (err) {
+		pr_debug("Error writing multi-queue-max-queues\n");
+		goto fail;
+	}
+
 	/* setup back pointer */
 	be->blkif->be = be;
 
@@ -1008,6 +1013,7 @@ static int connect_ring(struct backend_info *be)
 	char *xspath;
 	size_t xspathsize;
 	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
+	unsigned int requested_num_queues = 0;
 
 	pr_debug("%s %s\n", __func__, dev->otherend);
 
@@ -1035,6 +1041,26 @@ static int connect_ring(struct backend_info *be)
 	be->blkif->vbd.feature_gnt_persistent = pers_grants;
 	be->blkif->vbd.overflow_max_grants = 0;
 
+	/*
+	 * Read the number of hardware queues from the frontend.
+	 */
+	err = xenbus_scanf(XBT_NIL, dev->otherend, "multi-queue-num-queues", "%u", &requested_num_queues);
+	if (err < 0) {
+		requested_num_queues = 1;
+	} else {
+		if (requested_num_queues > xenblk_max_queues
+		    || requested_num_queues == 0) {
+			/* buggy or malicious guest */
+			xenbus_dev_fatal(dev, err,
+					"guest requested %u queues, exceeding the maximum of %u.",
+					requested_num_queues, xenblk_max_queues);
+			return -1;
+		}
+	}
+	be->blkif->nr_rings = requested_num_queues;
+	if (xen_blkif_alloc_rings(be->blkif))
+		return -ENOMEM;
+
 	pr_info("nr_rings:%d protocol %d (%s) %s\n", be->blkif->nr_rings,
 		 be->blkif->blk_protocol, protocol,
 		 pers_grants ? "persistent grants" : "");
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 1/9] xen-blkfront: convert to blk-mq APIs
  2015-09-05 12:39 ` Bob Liu
@ 2015-09-23 20:31   ` Konrad Rzeszutek Wilk
  2015-09-23 21:12     ` Konrad Rzeszutek Wilk
  2015-09-23 21:12     ` Konrad Rzeszutek Wilk
  2015-09-23 20:31   ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 83+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-23 20:31 UTC (permalink / raw)
  To: Bob Liu, david.vrabel
  Cc: xen-devel, david.vrabel, linux-kernel, roger.pau,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies

On Sat, Sep 05, 2015 at 08:39:34PM +0800, Bob Liu wrote:
> Note: This patch is based on original work of Arianna's internship for
> GNOME's Outreach Program for Women.
> 
> Only one hardware queue is used now, so there is no significant
> performance change
> 
> The legacy non-mq code is deleted completely which is the same as other
> drivers like virtio, mtip, and nvme.
> 
> Also dropped one unnecessary holding of info->io_lock when calling
> blk_mq_stop_hw_queues().
> 
> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jens Axboe <axboe@fb.com>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Odd.

This should have gone in Linux 4.3 but it did not? I remember seeing it
there? I think?

Anyhow I will put this in my queue for 4.4.
> ---
>  drivers/block/xen-blkfront.c |  146 +++++++++++++++++-------------------------
>  1 file changed, 60 insertions(+), 86 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 7a8a73f..5dd591d 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -37,6 +37,7 @@
>  
>  #include <linux/interrupt.h>
>  #include <linux/blkdev.h>
> +#include <linux/blk-mq.h>
>  #include <linux/hdreg.h>
>  #include <linux/cdrom.h>
>  #include <linux/module.h>
> @@ -148,6 +149,7 @@ struct blkfront_info
>  	unsigned int feature_persistent:1;
>  	unsigned int max_indirect_segments;
>  	int is_ready;
> +	struct blk_mq_tag_set tag_set;
>  };
>  
>  static unsigned int nr_minors;
> @@ -617,54 +619,41 @@ static inline bool blkif_request_flush_invalid(struct request *req,
>  		 !(info->feature_flush & REQ_FUA)));
>  }
>  
> -/*
> - * do_blkif_request
> - *  read a block; request is in a request queue
> - */
> -static void do_blkif_request(struct request_queue *rq)
> +static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
> +			   const struct blk_mq_queue_data *qd)
>  {
> -	struct blkfront_info *info = NULL;
> -	struct request *req;
> -	int queued;
> -
> -	pr_debug("Entered do_blkif_request\n");
> -
> -	queued = 0;
> +	struct blkfront_info *info = qd->rq->rq_disk->private_data;
>  
> -	while ((req = blk_peek_request(rq)) != NULL) {
> -		info = req->rq_disk->private_data;
> -
> -		if (RING_FULL(&info->ring))
> -			goto wait;
> +	blk_mq_start_request(qd->rq);
> +	spin_lock_irq(&info->io_lock);
> +	if (RING_FULL(&info->ring))
> +		goto out_busy;
>  
> -		blk_start_request(req);
> +	if (blkif_request_flush_invalid(qd->rq, info))
> +		goto out_err;
>  
> -		if (blkif_request_flush_invalid(req, info)) {
> -			__blk_end_request_all(req, -EOPNOTSUPP);
> -			continue;
> -		}
> +	if (blkif_queue_request(qd->rq))
> +		goto out_busy;
>  
> -		pr_debug("do_blk_req %p: cmd %p, sec %lx, "
> -			 "(%u/%u) [%s]\n",
> -			 req, req->cmd, (unsigned long)blk_rq_pos(req),
> -			 blk_rq_cur_sectors(req), blk_rq_sectors(req),
> -			 rq_data_dir(req) ? "write" : "read");
> -
> -		if (blkif_queue_request(req)) {
> -			blk_requeue_request(rq, req);
> -wait:
> -			/* Avoid pointless unplugs. */
> -			blk_stop_queue(rq);
> -			break;
> -		}
> +	flush_requests(info);
> +	spin_unlock_irq(&info->io_lock);
> +	return BLK_MQ_RQ_QUEUE_OK;
>  
> -		queued++;
> -	}
> +out_err:
> +	spin_unlock_irq(&info->io_lock);
> +	return BLK_MQ_RQ_QUEUE_ERROR;
>  
> -	if (queued != 0)
> -		flush_requests(info);
> +out_busy:
> +	spin_unlock_irq(&info->io_lock);
> +	blk_mq_stop_hw_queue(hctx);
> +	return BLK_MQ_RQ_QUEUE_BUSY;
>  }
>  
> +static struct blk_mq_ops blkfront_mq_ops = {
> +	.queue_rq = blkif_queue_rq,
> +	.map_queue = blk_mq_map_queue,
> +};
> +
>  static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
>  				unsigned int physical_sector_size,
>  				unsigned int segments)
> @@ -672,9 +661,22 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
>  	struct request_queue *rq;
>  	struct blkfront_info *info = gd->private_data;
>  
> -	rq = blk_init_queue(do_blkif_request, &info->io_lock);
> -	if (rq == NULL)
> +	memset(&info->tag_set, 0, sizeof(info->tag_set));
> +	info->tag_set.ops = &blkfront_mq_ops;
> +	info->tag_set.nr_hw_queues = 1;
> +	info->tag_set.queue_depth =  BLK_RING_SIZE(info);
> +	info->tag_set.numa_node = NUMA_NO_NODE;
> +	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
> +	info->tag_set.cmd_size = 0;
> +	info->tag_set.driver_data = info;
> +
> +	if (blk_mq_alloc_tag_set(&info->tag_set))
>  		return -1;
> +	rq = blk_mq_init_queue(&info->tag_set);
> +	if (IS_ERR(rq)) {
> +		blk_mq_free_tag_set(&info->tag_set);
> +		return -1;
> +	}
>  
>  	queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
>  
> @@ -902,19 +904,15 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
>  static void xlvbd_release_gendisk(struct blkfront_info *info)
>  {
>  	unsigned int minor, nr_minors;
> -	unsigned long flags;
>  
>  	if (info->rq == NULL)
>  		return;
>  
> -	spin_lock_irqsave(&info->io_lock, flags);
> -
>  	/* No more blkif_request(). */
> -	blk_stop_queue(info->rq);
> +	blk_mq_stop_hw_queues(info->rq);
>  
>  	/* No more gnttab callback work. */
>  	gnttab_cancel_free_callback(&info->callback);
> -	spin_unlock_irqrestore(&info->io_lock, flags);
>  
>  	/* Flush gnttab callback work. Must be done with no locks held. */
>  	flush_work(&info->work);
> @@ -926,20 +924,18 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
>  	xlbd_release_minors(minor, nr_minors);
>  
>  	blk_cleanup_queue(info->rq);
> +	blk_mq_free_tag_set(&info->tag_set);
>  	info->rq = NULL;
>  
>  	put_disk(info->gd);
>  	info->gd = NULL;
>  }
>  
> +/* Must be called with io_lock held */
>  static void kick_pending_request_queues(struct blkfront_info *info)
>  {
> -	if (!RING_FULL(&info->ring)) {
> -		/* Re-enable calldowns. */
> -		blk_start_queue(info->rq);
> -		/* Kick things off immediately. */
> -		do_blkif_request(info->rq);
> -	}
> +	if (!RING_FULL(&info->ring))
> +		blk_mq_start_stopped_hw_queues(info->rq, true);
>  }
>  
>  static void blkif_restart_queue(struct work_struct *work)
> @@ -964,7 +960,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
>  		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
>  	/* No more blkif_request(). */
>  	if (info->rq)
> -		blk_stop_queue(info->rq);
> +		blk_mq_stop_hw_queues(info->rq);
>  
>  	/* Remove all persistent grants */
>  	if (!list_empty(&info->grants)) {
> @@ -1147,7 +1143,6 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  	RING_IDX i, rp;
>  	unsigned long flags;
>  	struct blkfront_info *info = (struct blkfront_info *)dev_id;
> -	int error;
>  
>  	spin_lock_irqsave(&info->io_lock, flags);
>  
> @@ -1188,37 +1183,37 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  			continue;
>  		}
>  
> -		error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
> +		req->errors = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
>  		switch (bret->operation) {
>  		case BLKIF_OP_DISCARD:
>  			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
>  				struct request_queue *rq = info->rq;
>  				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
>  					   info->gd->disk_name, op_name(bret->operation));
> -				error = -EOPNOTSUPP;
> +				req->errors = -EOPNOTSUPP;
>  				info->feature_discard = 0;
>  				info->feature_secdiscard = 0;
>  				queue_flag_clear(QUEUE_FLAG_DISCARD, rq);
>  				queue_flag_clear(QUEUE_FLAG_SECDISCARD, rq);
>  			}
> -			__blk_end_request_all(req, error);
> +			blk_mq_complete_request(req);
>  			break;
>  		case BLKIF_OP_FLUSH_DISKCACHE:
>  		case BLKIF_OP_WRITE_BARRIER:
>  			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
>  				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
>  				       info->gd->disk_name, op_name(bret->operation));
> -				error = -EOPNOTSUPP;
> +				req->errors = -EOPNOTSUPP;
>  			}
>  			if (unlikely(bret->status == BLKIF_RSP_ERROR &&
>  				     info->shadow[id].req.u.rw.nr_segments == 0)) {
>  				printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
>  				       info->gd->disk_name, op_name(bret->operation));
> -				error = -EOPNOTSUPP;
> +				req->errors = -EOPNOTSUPP;
>  			}
> -			if (unlikely(error)) {
> -				if (error == -EOPNOTSUPP)
> -					error = 0;
> +			if (unlikely(req->errors)) {
> +				if (req->errors == -EOPNOTSUPP)
> +					req->errors = 0;
>  				info->feature_flush = 0;
>  				xlvbd_flush(info);
>  			}
> @@ -1229,7 +1224,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  				dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
>  					"request: %x\n", bret->status);
>  
> -			__blk_end_request_all(req, error);
> +			blk_mq_complete_request(req);
>  			break;
>  		default:
>  			BUG();
> @@ -1558,28 +1553,6 @@ static int blkif_recover(struct blkfront_info *info)
>  
>  	kfree(copy);
>  
> -	/*
> -	 * Empty the queue, this is important because we might have
> -	 * requests in the queue with more segments than what we
> -	 * can handle now.
> -	 */
> -	spin_lock_irq(&info->io_lock);
> -	while ((req = blk_fetch_request(info->rq)) != NULL) {
> -		if (req->cmd_flags &
> -		    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
> -			list_add(&req->queuelist, &requests);
> -			continue;
> -		}
> -		merge_bio.head = req->bio;
> -		merge_bio.tail = req->biotail;
> -		bio_list_merge(&bio_list, &merge_bio);
> -		req->bio = NULL;
> -		if (req->cmd_flags & (REQ_FLUSH | REQ_FUA))
> -			pr_alert("diskcache flush request found!\n");
> -		__blk_end_request_all(req, 0);
> -	}
> -	spin_unlock_irq(&info->io_lock);
> -
>  	xenbus_switch_state(info->xbdev, XenbusStateConnected);
>  
>  	spin_lock_irq(&info->io_lock);
> @@ -1594,9 +1567,10 @@ static int blkif_recover(struct blkfront_info *info)
>  		/* Requeue pending requests (flush or discard) */
>  		list_del_init(&req->queuelist);
>  		BUG_ON(req->nr_phys_segments > segs);
> -		blk_requeue_request(info->rq, req);
> +		blk_mq_requeue_request(req);
>  	}
>  	spin_unlock_irq(&info->io_lock);
> +	blk_mq_kick_requeue_list(info->rq);
>  
>  	while ((bio = bio_list_pop(&bio_list)) != NULL) {
>  		/* Traverse the list of pending bios and re-queue them */
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 1/9] xen-blkfront: convert to blk-mq APIs
  2015-09-05 12:39 ` Bob Liu
  2015-09-23 20:31   ` Konrad Rzeszutek Wilk
@ 2015-09-23 20:31   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 83+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-23 20:31 UTC (permalink / raw)
  To: Bob Liu
  Cc: hch, felipe.franciosi, rafal.mielniczuk, jonathan.davies,
	linux-kernel, xen-devel, axboe, david.vrabel, avanzini.arianna,
	boris.ostrovsky, roger.pau

On Sat, Sep 05, 2015 at 08:39:34PM +0800, Bob Liu wrote:
> Note: This patch is based on original work of Arianna's internship for
> GNOME's Outreach Program for Women.
> 
> Only one hardware queue is used now, so there is no significant
> performance change
> 
> The legacy non-mq code is deleted completely which is the same as other
> drivers like virtio, mtip, and nvme.
> 
> Also dropped one unnecessary holding of info->io_lock when calling
> blk_mq_stop_hw_queues().
> 
> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Jens Axboe <axboe@fb.com>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Odd.

This should have gone in Linux 4.3 but it did not? I remember seeing it
there? I think?

Anyhow I will put this in my queue for 4.4.
> ---
>  drivers/block/xen-blkfront.c |  146 +++++++++++++++++-------------------------
>  1 file changed, 60 insertions(+), 86 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 7a8a73f..5dd591d 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -37,6 +37,7 @@
>  
>  #include <linux/interrupt.h>
>  #include <linux/blkdev.h>
> +#include <linux/blk-mq.h>
>  #include <linux/hdreg.h>
>  #include <linux/cdrom.h>
>  #include <linux/module.h>
> @@ -148,6 +149,7 @@ struct blkfront_info
>  	unsigned int feature_persistent:1;
>  	unsigned int max_indirect_segments;
>  	int is_ready;
> +	struct blk_mq_tag_set tag_set;
>  };
>  
>  static unsigned int nr_minors;
> @@ -617,54 +619,41 @@ static inline bool blkif_request_flush_invalid(struct request *req,
>  		 !(info->feature_flush & REQ_FUA)));
>  }
>  
> -/*
> - * do_blkif_request
> - *  read a block; request is in a request queue
> - */
> -static void do_blkif_request(struct request_queue *rq)
> +static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
> +			   const struct blk_mq_queue_data *qd)
>  {
> -	struct blkfront_info *info = NULL;
> -	struct request *req;
> -	int queued;
> -
> -	pr_debug("Entered do_blkif_request\n");
> -
> -	queued = 0;
> +	struct blkfront_info *info = qd->rq->rq_disk->private_data;
>  
> -	while ((req = blk_peek_request(rq)) != NULL) {
> -		info = req->rq_disk->private_data;
> -
> -		if (RING_FULL(&info->ring))
> -			goto wait;
> +	blk_mq_start_request(qd->rq);
> +	spin_lock_irq(&info->io_lock);
> +	if (RING_FULL(&info->ring))
> +		goto out_busy;
>  
> -		blk_start_request(req);
> +	if (blkif_request_flush_invalid(qd->rq, info))
> +		goto out_err;
>  
> -		if (blkif_request_flush_invalid(req, info)) {
> -			__blk_end_request_all(req, -EOPNOTSUPP);
> -			continue;
> -		}
> +	if (blkif_queue_request(qd->rq))
> +		goto out_busy;
>  
> -		pr_debug("do_blk_req %p: cmd %p, sec %lx, "
> -			 "(%u/%u) [%s]\n",
> -			 req, req->cmd, (unsigned long)blk_rq_pos(req),
> -			 blk_rq_cur_sectors(req), blk_rq_sectors(req),
> -			 rq_data_dir(req) ? "write" : "read");
> -
> -		if (blkif_queue_request(req)) {
> -			blk_requeue_request(rq, req);
> -wait:
> -			/* Avoid pointless unplugs. */
> -			blk_stop_queue(rq);
> -			break;
> -		}
> +	flush_requests(info);
> +	spin_unlock_irq(&info->io_lock);
> +	return BLK_MQ_RQ_QUEUE_OK;
>  
> -		queued++;
> -	}
> +out_err:
> +	spin_unlock_irq(&info->io_lock);
> +	return BLK_MQ_RQ_QUEUE_ERROR;
>  
> -	if (queued != 0)
> -		flush_requests(info);
> +out_busy:
> +	spin_unlock_irq(&info->io_lock);
> +	blk_mq_stop_hw_queue(hctx);
> +	return BLK_MQ_RQ_QUEUE_BUSY;
>  }
>  
> +static struct blk_mq_ops blkfront_mq_ops = {
> +	.queue_rq = blkif_queue_rq,
> +	.map_queue = blk_mq_map_queue,
> +};
> +
>  static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
>  				unsigned int physical_sector_size,
>  				unsigned int segments)
> @@ -672,9 +661,22 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
>  	struct request_queue *rq;
>  	struct blkfront_info *info = gd->private_data;
>  
> -	rq = blk_init_queue(do_blkif_request, &info->io_lock);
> -	if (rq == NULL)
> +	memset(&info->tag_set, 0, sizeof(info->tag_set));
> +	info->tag_set.ops = &blkfront_mq_ops;
> +	info->tag_set.nr_hw_queues = 1;
> +	info->tag_set.queue_depth =  BLK_RING_SIZE(info);
> +	info->tag_set.numa_node = NUMA_NO_NODE;
> +	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
> +	info->tag_set.cmd_size = 0;
> +	info->tag_set.driver_data = info;
> +
> +	if (blk_mq_alloc_tag_set(&info->tag_set))
>  		return -1;
> +	rq = blk_mq_init_queue(&info->tag_set);
> +	if (IS_ERR(rq)) {
> +		blk_mq_free_tag_set(&info->tag_set);
> +		return -1;
> +	}
>  
>  	queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
>  
> @@ -902,19 +904,15 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
>  static void xlvbd_release_gendisk(struct blkfront_info *info)
>  {
>  	unsigned int minor, nr_minors;
> -	unsigned long flags;
>  
>  	if (info->rq == NULL)
>  		return;
>  
> -	spin_lock_irqsave(&info->io_lock, flags);
> -
>  	/* No more blkif_request(). */
> -	blk_stop_queue(info->rq);
> +	blk_mq_stop_hw_queues(info->rq);
>  
>  	/* No more gnttab callback work. */
>  	gnttab_cancel_free_callback(&info->callback);
> -	spin_unlock_irqrestore(&info->io_lock, flags);
>  
>  	/* Flush gnttab callback work. Must be done with no locks held. */
>  	flush_work(&info->work);
> @@ -926,20 +924,18 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
>  	xlbd_release_minors(minor, nr_minors);
>  
>  	blk_cleanup_queue(info->rq);
> +	blk_mq_free_tag_set(&info->tag_set);
>  	info->rq = NULL;
>  
>  	put_disk(info->gd);
>  	info->gd = NULL;
>  }
>  
> +/* Must be called with io_lock holded */
>  static void kick_pending_request_queues(struct blkfront_info *info)
>  {
> -	if (!RING_FULL(&info->ring)) {
> -		/* Re-enable calldowns. */
> -		blk_start_queue(info->rq);
> -		/* Kick things off immediately. */
> -		do_blkif_request(info->rq);
> -	}
> +	if (!RING_FULL(&info->ring))
> +		blk_mq_start_stopped_hw_queues(info->rq, true);
>  }
>  
>  static void blkif_restart_queue(struct work_struct *work)
> @@ -964,7 +960,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
>  		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
>  	/* No more blkif_request(). */
>  	if (info->rq)
> -		blk_stop_queue(info->rq);
> +		blk_mq_stop_hw_queues(info->rq);
>  
>  	/* Remove all persistent grants */
>  	if (!list_empty(&info->grants)) {
> @@ -1147,7 +1143,6 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  	RING_IDX i, rp;
>  	unsigned long flags;
>  	struct blkfront_info *info = (struct blkfront_info *)dev_id;
> -	int error;
>  
>  	spin_lock_irqsave(&info->io_lock, flags);
>  
> @@ -1188,37 +1183,37 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  			continue;
>  		}
>  
> -		error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
> +		req->errors = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
>  		switch (bret->operation) {
>  		case BLKIF_OP_DISCARD:
>  			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
>  				struct request_queue *rq = info->rq;
>  				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
>  					   info->gd->disk_name, op_name(bret->operation));
> -				error = -EOPNOTSUPP;
> +				req->errors = -EOPNOTSUPP;
>  				info->feature_discard = 0;
>  				info->feature_secdiscard = 0;
>  				queue_flag_clear(QUEUE_FLAG_DISCARD, rq);
>  				queue_flag_clear(QUEUE_FLAG_SECDISCARD, rq);
>  			}
> -			__blk_end_request_all(req, error);
> +			blk_mq_complete_request(req);
>  			break;
>  		case BLKIF_OP_FLUSH_DISKCACHE:
>  		case BLKIF_OP_WRITE_BARRIER:
>  			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
>  				printk(KERN_WARNING "blkfront: %s: %s op failed\n",
>  				       info->gd->disk_name, op_name(bret->operation));
> -				error = -EOPNOTSUPP;
> +				req->errors = -EOPNOTSUPP;
>  			}
>  			if (unlikely(bret->status == BLKIF_RSP_ERROR &&
>  				     info->shadow[id].req.u.rw.nr_segments == 0)) {
>  				printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
>  				       info->gd->disk_name, op_name(bret->operation));
> -				error = -EOPNOTSUPP;
> +				req->errors = -EOPNOTSUPP;
>  			}
> -			if (unlikely(error)) {
> -				if (error == -EOPNOTSUPP)
> -					error = 0;
> +			if (unlikely(req->errors)) {
> +				if (req->errors == -EOPNOTSUPP)
> +					req->errors = 0;
>  				info->feature_flush = 0;
>  				xlvbd_flush(info);
>  			}
> @@ -1229,7 +1224,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  				dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
>  					"request: %x\n", bret->status);
>  
> -			__blk_end_request_all(req, error);
> +			blk_mq_complete_request(req);
>  			break;
>  		default:
>  			BUG();
> @@ -1558,28 +1553,6 @@ static int blkif_recover(struct blkfront_info *info)
>  
>  	kfree(copy);
>  
> -	/*
> -	 * Empty the queue, this is important because we might have
> -	 * requests in the queue with more segments than what we
> -	 * can handle now.
> -	 */
> -	spin_lock_irq(&info->io_lock);
> -	while ((req = blk_fetch_request(info->rq)) != NULL) {
> -		if (req->cmd_flags &
> -		    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
> -			list_add(&req->queuelist, &requests);
> -			continue;
> -		}
> -		merge_bio.head = req->bio;
> -		merge_bio.tail = req->biotail;
> -		bio_list_merge(&bio_list, &merge_bio);
> -		req->bio = NULL;
> -		if (req->cmd_flags & (REQ_FLUSH | REQ_FUA))
> -			pr_alert("diskcache flush request found!\n");
> -		__blk_end_request_all(req, 0);
> -	}
> -	spin_unlock_irq(&info->io_lock);
> -
>  	xenbus_switch_state(info->xbdev, XenbusStateConnected);
>  
>  	spin_lock_irq(&info->io_lock);
> @@ -1594,9 +1567,10 @@ static int blkif_recover(struct blkfront_info *info)
>  		/* Requeue pending requests (flush or discard) */
>  		list_del_init(&req->queuelist);
>  		BUG_ON(req->nr_phys_segments > segs);
> -		blk_requeue_request(info->rq, req);
> +		blk_mq_requeue_request(req);
>  	}
>  	spin_unlock_irq(&info->io_lock);
> +	blk_mq_kick_requeue_list(info->rq);
>  
>  	while ((bio = bio_list_pop(&bio_list)) != NULL) {
>  		/* Traverse the list of pending bios and re-queue them */
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 83+ messages in thread
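
For context on the conversion quoted above: the hunks shown only cover the completion and queue start/stop paths, while the tag-set allocation that pairs with the blk_mq_free_tag_set() call is elsewhere in the patch and not quoted here. Below is a minimal, hypothetical sketch of what such a setup looks like with the v4.2-era blk-mq API, assuming a blk_mq_ops structure (called blkfront_mq_ops here) whose .queue_rq handler puts requests on the ring; the helper name and error handling are illustrative only, not the patch's exact code.

#include <linux/blk-mq.h>
#include <linux/blkdev.h>
#include <linux/err.h>
#include <linux/string.h>

/* Hypothetical sketch, not the patch's exact code: allocate the blk-mq
 * tag set and request queue that blk_mq_free_tag_set()/blk_cleanup_queue()
 * above tear down again.  Assumes blkfront_mq_ops provides .queue_rq. */
static int sketch_init_blk_mq(struct blkfront_info *info)
{
	memset(&info->tag_set, 0, sizeof(info->tag_set));
	info->tag_set.ops = &blkfront_mq_ops;
	info->tag_set.nr_hw_queues = 1;		/* a single ring for now */
	info->tag_set.queue_depth = BLK_RING_SIZE(info);
	info->tag_set.numa_node = NUMA_NO_NODE;
	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
	info->tag_set.driver_data = info;

	if (blk_mq_alloc_tag_set(&info->tag_set))
		return -EINVAL;

	info->rq = blk_mq_init_queue(&info->tag_set);
	if (IS_ERR(info->rq)) {
		blk_mq_free_tag_set(&info->tag_set);
		return PTR_ERR(info->rq);
	}
	return 0;
}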

* Re: [PATCH v3 2/9] xen-block: add document for mutli hardware queues/rings
  2015-09-05 12:39 ` Bob Liu
  2015-09-23 20:32   ` Konrad Rzeszutek Wilk
@ 2015-09-23 20:32   ` Konrad Rzeszutek Wilk
  2015-10-02 16:04   ` Roger Pau Monné
  2015-10-02 16:04   ` Roger Pau Monné
  3 siblings, 0 replies; 83+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-23 20:32 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, david.vrabel, linux-kernel, roger.pau,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies

On Sat, Sep 05, 2015 at 08:39:35PM +0800, Bob Liu wrote:
> Document multi queues/rings of xen-block.

This needs to be posted on Xen devel as well so that the blkif.h header
in Xen has this.

> 
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  include/xen/interface/io/blkif.h |   32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
> index c33e1c4..b453b70 100644
> --- a/include/xen/interface/io/blkif.h
> +++ b/include/xen/interface/io/blkif.h
> @@ -28,6 +28,38 @@ typedef uint16_t blkif_vdev_t;
>  typedef uint64_t blkif_sector_t;
>  
>  /*
> + * Multiple hardware queues/rings:
> + * If supported, the backend will write the key "multi-queue-max-queues" to
> + * the directory for that vbd, and set its value to the maximum supported
> + * number of queues.
> + * Frontends that are aware of this feature and wish to use it can write the
> + * key "multi-queue-num-queues", set to the number they wish to use, which
> + * must be greater than zero, and no more than the value reported by the backend
> + * in "multi-queue-max-queues".
> + *
> + * For frontends requesting just one queue, the usual event-channel and
> + * ring-ref keys are written as before, simplifying the backend processing
> + * to avoid distinguishing between a frontend that doesn't understand the
> + * multi-queue feature, and one that does, but requested only one queue.
> + *
> + * Frontends requesting two or more queues must not write the toplevel
> + * event-channel and ring-ref keys, instead writing those keys under sub-keys
> + * having the name "queue-N" where N is the integer ID of the queue/ring for
> + * which those keys belong. Queues are indexed from zero.
> + * For example, a frontend with two queues must write the following set of
> + * queue-related keys:
> + *
> + * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2"
> + * /local/domain/1/device/vbd/0/queue-0 = ""
> + * /local/domain/1/device/vbd/0/queue-0/ring-ref = "<ring-ref>"
> + * /local/domain/1/device/vbd/0/queue-0/event-channel = "<evtchn>"
> + * /local/domain/1/device/vbd/0/queue-1 = ""
> + * /local/domain/1/device/vbd/0/queue-1/ring-ref = "<ring-ref>"
> + * /local/domain/1/device/vbd/0/queue-1/event-channel = "<evtchn>"
> + *
> + */
> +
> +/*
>   * REQUEST CODES.
>   */
>  #define BLKIF_OP_READ              0
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 83+ messages in thread
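
For reference, the negotiation described in the comment above comes down to a couple of xenstore accesses. Here is a minimal, hypothetical frontend-side sketch using the existing xenbus_scanf()/xenbus_printf() helpers; the key names come from the documented protocol, while the helper itself is illustrative and not part of this series.

#include <linux/kernel.h>
#include <xen/xenbus.h>

/* Hypothetical sketch: read the backend's limit and request a number of
 * queues.  The per-queue "queue-N/ring-ref" and "queue-N/event-channel"
 * keys would be written later, when the rings are actually set up. */
static int sketch_negotiate_queues(struct xenbus_device *dev,
				   unsigned int *nr_queues)
{
	unsigned int max = 1;

	/* Absence of the key means the backend only supports one queue. */
	if (xenbus_scanf(XBT_NIL, dev->otherend,
			 "multi-queue-max-queues", "%u", &max) != 1)
		max = 1;

	*nr_queues = min(*nr_queues, max);
	if (*nr_queues <= 1)
		return 0;	/* keep the legacy single-ring key layout */

	return xenbus_printf(XBT_NIL, dev->nodename,
			     "multi-queue-num-queues", "%u", *nr_queues);
}

The backend side would do the mirror image: read multi-queue-num-queues (defaulting to 1 when absent) and then look for the per-queue sub-keys.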

* Re: [PATCH v3 1/9] xen-blkfront: convert to blk-mq APIs
  2015-09-23 20:31   ` Konrad Rzeszutek Wilk
@ 2015-09-23 21:12     ` Konrad Rzeszutek Wilk
  2015-09-23 21:12     ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 83+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-23 21:12 UTC (permalink / raw)
  To: Bob Liu, david.vrabel
  Cc: xen-devel, linux-kernel, roger.pau, felipe.franciosi, axboe, hch,
	avanzini.arianna, rafal.mielniczuk, boris.ostrovsky,
	jonathan.davies

On Wed, Sep 23, 2015 at 04:31:21PM -0400, Konrad Rzeszutek Wilk wrote:
> On Sat, Sep 05, 2015 at 08:39:34PM +0800, Bob Liu wrote:
> > Note: This patch is based on original work of Arianna's internship for
> > GNOME's Outreach Program for Women.
> > 
> > Only one hardware queue is used now, so there is no significant
> > performance change
> > 
> > The legacy non-mq code is deleted completely which is the same as other
> > drivers like virtio, mtip, and nvme.
> > 
> > Also dropped one unnecessary holding of info->io_lock when calling
> > blk_mq_stop_hw_queues().
> > 
> > Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
> > Signed-off-by: Bob Liu <bob.liu@oracle.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Acked-by: Jens Axboe <axboe@fb.com>
> > Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> 
> Odd.
> 
> This should have gone in Linux 4.3 but it did not? I remember seeing it
> there? I think?

Ignore the noise please. I was in my 'for-jens-4.3' branch which did not
include this as it went through the Xen branch. Sorry.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 0/9] xen-block: support multi hardware-queues/rings
  2015-09-05 12:39 [PATCH v3 0/9] xen-block: support multi hardware-queues/rings Bob Liu
                   ` (12 preceding siblings ...)
  2015-09-05 12:39   ` Bob Liu
@ 2015-10-02  9:57 ` Rafal Mielniczuk
  2015-10-02  9:57 ` Rafal Mielniczuk
  14 siblings, 0 replies; 83+ messages in thread
From: Rafal Mielniczuk @ 2015-10-02  9:57 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: David Vrabel, linux-kernel, Roger Pau Monne, konrad.wilk,
	Felipe Franciosi, axboe, hch, avanzini.arianna, boris.ostrovsky,
	Jonathan Davies

On 05/09/15 13:40, Bob Liu wrote:
> Note: These patches were based on original work of Arianna's internship for
> GNOME's Outreach Program for Women.
>
> The first patch which just convert xen-blkfront driver to use blk-mq api has
> been applied by David.
>
> After using blk-mq api, a guest has more than one(nr_vpus) software request
> queues associated with each block front. These queues can be mapped over several
> rings(hardware queues) to the backend, making it very easy for us to run
> multiple threads on the backend for a single virtual disk.
>
> By having different threads issuing requests at the same time, the performance
> of guest can be improved significantly in the end.
>
> Test was done based on null_blk driver:
> dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
> domu: v4.2-rc8 16vcpus 10GB
>
> [test]
> rw=read or randread
> direct=1
> ioengine=libaio
> bs=4k
> time_based
> runtime=30
> filename=/dev/xvdb
> numjobs=16
> iodepth=64
> iodepth_batch=64
> iodepth_batch_complete=64
> group_reporting
>
> Seqread:
> 	dom0 	domU(no_mq) 	domU(4 queues) 	 8 queues 	16 queues
> iops:  1308k        690k        1380k(+200%)        1238k           1471k
>
> Randread:
> 	dom0 	domU(no_mq) 	domU(4 queues) 	 8 queues 	16 queues
> iops:  1310k        279k        810k(+200%)          871k           1000k
>
> Only with 4queues, iops for domU get improved a lot and nearly catch up with
> dom0. There were also similar huge improvement for write and real SSD storage.
>
> ---
> v3: Rebased to v4.2-rc8
>
> Bob Liu (9):
>   xen-blkfront: convert to blk-mq APIs
>   xen-block: add document for mutli hardware queues/rings
>   xen/blkfront: separate per ring information out of device info
>   xen/blkfront: pseudo support for multi hardware queues/rings
>   xen/blkfront: convert per device io_lock to per ring ring_lock
>   xen/blkfront: negotiate the number of hw queues/rings with backend
>   xen/blkback: separate ring information out of struct xen_blkif
>   xen/blkback: pseudo support for multi hardware queues/rings
>   xen/blkback: get number of hardware queues/rings from blkfront
>
>  drivers/block/xen-blkback/blkback.c |  373 +++++-----
>  drivers/block/xen-blkback/common.h  |   53 +-
>  drivers/block/xen-blkback/xenbus.c  |  376 ++++++----
>  drivers/block/xen-blkfront.c        | 1343 ++++++++++++++++++++---------------
>  include/xen/interface/io/blkif.h    |   32 +
>  5 files changed, 1278 insertions(+), 899 deletions(-)
>
Hello,

Following are the results for sequential reads executed in the guest, using an Intel P3700 SSD dom0 backend equipped with 16 hardware queues,
which makes it a good candidate for multi-queue measurements.

dom0: v4.2 16vcpus 4GB
domU: v4.2 16vcpus 10GB

fio --name=test --ioengine=libaio \
    --time_based=1 --runtime=30 --ramp_time=15 \
    --filename=/dev/xvdc --direct=1 --group_reporting=1 \
    --iodepth=16 --iodepth_batch=16 --iodepth_batch_complete=16 \
    --numjobs=16 --rw=read --bs=$bs

bs      1 queue     2 queues    4 queues    8 queues    16 queues
512     583K        757K        930K        995K        976K
1K      557K        832K        908K        931K        956K
2K      585K        794K        927K        975K        948K
4K      546K        709K        700K        754K        820K
8K      357K        414K        414K        414K        414K
16K     172K        194K        207K        207K        207K
32K     91K         99K         103K        103K        103K
64K     42K         51K         51K         51K         51K
128K    21K         25K         25K         25K         25K

With an increasing number of queues in blkfront we see a gradual improvement in the number of iops,
especially for small block sizes; with larger block sizes we hit the limits of the disk sooner.

Rafal


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 2/9] xen-block: add document for mutli hardware queues/rings
  2015-09-05 12:39 ` Bob Liu
  2015-09-23 20:32   ` Konrad Rzeszutek Wilk
  2015-09-23 20:32   ` Konrad Rzeszutek Wilk
@ 2015-10-02 16:04   ` Roger Pau Monné
  2015-10-02 16:12     ` Wei Liu
  2015-10-02 16:12     ` [Xen-devel] " Wei Liu
  2015-10-02 16:04   ` Roger Pau Monné
  3 siblings, 2 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-02 16:04 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: david.vrabel, linux-kernel, konrad.wilk, felipe.franciosi, axboe,
	hch, avanzini.arianna, rafal.mielniczuk, boris.ostrovsky,
	jonathan.davies

On 05/09/15 at 14.39, Bob Liu wrote:
> Document multi queues/rings of xen-block.
> 
> Signed-off-by: Bob Liu <bob.liu@oracle.com>

As said by Konrad, you should send this against the Xen public headers
also (or even before). I have a comment below.

> ---
>  include/xen/interface/io/blkif.h |   32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
> index c33e1c4..b453b70 100644
> --- a/include/xen/interface/io/blkif.h
> +++ b/include/xen/interface/io/blkif.h
> @@ -28,6 +28,38 @@ typedef uint16_t blkif_vdev_t;
>  typedef uint64_t blkif_sector_t;
>  
>  /*
> + * Multiple hardware queues/rings:
> + * If supported, the backend will write the key "multi-queue-max-queues" to
> + * the directory for that vbd, and set its value to the maximum supported
> + * number of queues.
> + * Frontends that are aware of this feature and wish to use it can write the
> + * key "multi-queue-num-queues", set to the number they wish to use, which
> + * must be greater than zero, and no more than the value reported by the backend
> + * in "multi-queue-max-queues".
> + *
> + * For frontends requesting just one queue, the usual event-channel and
> + * ring-ref keys are written as before, simplifying the backend processing
> + * to avoid distinguishing between a frontend that doesn't understand the
> + * multi-queue feature, and one that does, but requested only one queue.
> + *
> + * Frontends requesting two or more queues must not write the toplevel
> + * event-channel and ring-ref keys, instead writing those keys under sub-keys
> + * having the name "queue-N" where N is the integer ID of the queue/ring for
> + * which those keys belong. Queues are indexed from zero.
> + * For example, a frontend with two queues must write the following set of
> + * queue-related keys:
> + *
> + * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2"
> + * /local/domain/1/device/vbd/0/queue-0 = ""
> + * /local/domain/1/device/vbd/0/queue-0/ring-ref = "<ring-ref>"
> + * /local/domain/1/device/vbd/0/queue-0/event-channel = "<evtchn>"
> + * /local/domain/1/device/vbd/0/queue-1 = ""
> + * /local/domain/1/device/vbd/0/queue-1/ring-ref = "<ring-ref>"
> + * /local/domain/1/device/vbd/0/queue-1/event-channel = "<evtchn>"

AFAICT, it's impossible by design to use multiple queues together with
multipage rings, is that right?

Roger.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Xen-devel] [PATCH v3 2/9] xen-block: add document for mutli hardware queues/rings
  2015-10-02 16:04   ` Roger Pau Monné
  2015-10-02 16:12     ` Wei Liu
@ 2015-10-02 16:12     ` Wei Liu
  2015-10-02 16:22       ` Roger Pau Monné
  2015-10-02 16:22       ` [Xen-devel] " Roger Pau Monné
  1 sibling, 2 replies; 83+ messages in thread
From: Wei Liu @ 2015-10-02 16:12 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Bob Liu, xen-devel, hch, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, jonathan.davies, axboe, david.vrabel,
	avanzini.arianna, boris.ostrovsky, wei.liu2

On Fri, Oct 02, 2015 at 06:04:35PM +0200, Roger Pau Monné wrote:
> On 05/09/15 at 14.39, Bob Liu wrote:
> > Document multi queues/rings of xen-block.
> > 
> > Signed-off-by: Bob Liu <bob.liu@oracle.com>
> 
> As said by Konrad, you should send this against the Xen public headers
> also (or even before). I have a comment below.
> 
> > ---
> >  include/xen/interface/io/blkif.h |   32 ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> > 
> > diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
> > index c33e1c4..b453b70 100644
> > --- a/include/xen/interface/io/blkif.h
> > +++ b/include/xen/interface/io/blkif.h
> > @@ -28,6 +28,38 @@ typedef uint16_t blkif_vdev_t;
> >  typedef uint64_t blkif_sector_t;
> >  
> >  /*
> > + * Multiple hardware queues/rings:
> > + * If supported, the backend will write the key "multi-queue-max-queues" to
> > + * the directory for that vbd, and set its value to the maximum supported
> > + * number of queues.
> > + * Frontends that are aware of this feature and wish to use it can write the
> > + * key "multi-queue-num-queues", set to the number they wish to use, which
> > + * must be greater than zero, and no more than the value reported by the backend
> > + * in "multi-queue-max-queues".
> > + *
> > + * For frontends requesting just one queue, the usual event-channel and
> > + * ring-ref keys are written as before, simplifying the backend processing
> > + * to avoid distinguishing between a frontend that doesn't understand the
> > + * multi-queue feature, and one that does, but requested only one queue.
> > + *
> > + * Frontends requesting two or more queues must not write the toplevel
> > + * event-channel and ring-ref keys, instead writing those keys under sub-keys
> > + * having the name "queue-N" where N is the integer ID of the queue/ring for
> > + * which those keys belong. Queues are indexed from zero.
> > + * For example, a frontend with two queues must write the following set of
> > + * queue-related keys:
> > + *
> > + * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2"
> > + * /local/domain/1/device/vbd/0/queue-0 = ""
> > + * /local/domain/1/device/vbd/0/queue-0/ring-ref = "<ring-ref>"
> > + * /local/domain/1/device/vbd/0/queue-0/event-channel = "<evtchn>"
> > + * /local/domain/1/device/vbd/0/queue-1 = ""
> > + * /local/domain/1/device/vbd/0/queue-1/ring-ref = "<ring-ref>"
> > + * /local/domain/1/device/vbd/0/queue-1/event-channel = "<evtchn>"
> 
> AFAICT, it's impossible by design to use multiple queues together with
> multipage rings, is that right?
> 

As far as I can tell, these two features are not inherently coupled.
Whether you want to couple them together by design or not is
another matter. :-)

> Roger.
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [Xen-devel] [PATCH v3 2/9] xen-block: add document for mutli hardware queues/rings
  2015-10-02 16:12     ` [Xen-devel] " Wei Liu
  2015-10-02 16:22       ` Roger Pau Monné
@ 2015-10-02 16:22       ` Roger Pau Monné
  2015-10-02 23:55         ` Bob Liu
  2015-10-02 23:55         ` Bob Liu
  1 sibling, 2 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-02 16:22 UTC (permalink / raw)
  To: Wei Liu
  Cc: Bob Liu, xen-devel, hch, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, jonathan.davies, axboe, david.vrabel,
	avanzini.arianna, boris.ostrovsky

On 02/10/15 at 18.12, Wei Liu wrote:
> On Fri, Oct 02, 2015 at 06:04:35PM +0200, Roger Pau Monné wrote:
>> On 05/09/15 at 14.39, Bob Liu wrote:
>>> Document multi queues/rings of xen-block.
>>>
>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>
>> As said by Konrad, you should send this against the Xen public headers
>> also (or even before). I have a comment below.
>>
>>> ---
>>>  include/xen/interface/io/blkif.h |   32 ++++++++++++++++++++++++++++++++
>>>  1 file changed, 32 insertions(+)
>>>
>>> diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
>>> index c33e1c4..b453b70 100644
>>> --- a/include/xen/interface/io/blkif.h
>>> +++ b/include/xen/interface/io/blkif.h
>>> @@ -28,6 +28,38 @@ typedef uint16_t blkif_vdev_t;
>>>  typedef uint64_t blkif_sector_t;
>>>  
>>>  /*
>>> + * Multiple hardware queues/rings:
>>> + * If supported, the backend will write the key "multi-queue-max-queues" to
>>> + * the directory for that vbd, and set its value to the maximum supported
>>> + * number of queues.
>>> + * Frontends that are aware of this feature and wish to use it can write the
>>> + * key "multi-queue-num-queues", set to the number they wish to use, which
>>> + * must be greater than zero, and no more than the value reported by the backend
>>> + * in "multi-queue-max-queues".
>>> + *
>>> + * For frontends requesting just one queue, the usual event-channel and
>>> + * ring-ref keys are written as before, simplifying the backend processing
>>> + * to avoid distinguishing between a frontend that doesn't understand the
>>> + * multi-queue feature, and one that does, but requested only one queue.
>>> + *
>>> + * Frontends requesting two or more queues must not write the toplevel
>>> + * event-channel and ring-ref keys, instead writing those keys under sub-keys
>>> + * having the name "queue-N" where N is the integer ID of the queue/ring for
>>> + * which those keys belong. Queues are indexed from zero.
>>> + * For example, a frontend with two queues must write the following set of
>>> + * queue-related keys:
>>> + *
>>> + * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2"
>>> + * /local/domain/1/device/vbd/0/queue-0 = ""
>>> + * /local/domain/1/device/vbd/0/queue-0/ring-ref = "<ring-ref>"
>>> + * /local/domain/1/device/vbd/0/queue-0/event-channel = "<evtchn>"
>>> + * /local/domain/1/device/vbd/0/queue-1 = ""
>>> + * /local/domain/1/device/vbd/0/queue-1/ring-ref = "<ring-ref>"
>>> + * /local/domain/1/device/vbd/0/queue-1/event-channel = "<evtchn>"
>>
>> AFAICT, it's impossible by design to use multiple queues together with
>> multipage rings, is that right?
>>
> 
> As far as I can tell, these two features are not inherently coupled.
> Whether you want to make (by design) them coupled together or not is
> another matter. :-)

I haven't looked at the implementation yet, but some mention of whether
multi-page rings are allowed with multi-queue would be good. For example,
if both can indeed be used in conjunction, I would mention:

If multi-page rings are also used, the format of the grant references
will be:

/local/domain/1/device/vbd/0/queue-0/ring-ref0 = "<ring-ref0>"
/local/domain/1/device/vbd/0/queue-0/ring-ref1 = "<ring-ref1>"
/local/domain/1/device/vbd/0/queue-0/ring-ref2 = "<ring-ref2>"
[...]

Roger.


^ permalink raw reply	[flat|nested] 83+ messages in thread
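
To make the combined layout concrete, here is a hypothetical frontend-side sketch that writes one queue's keys following the "queue-N/ring-ref%u" naming in the example above; the helper name, arguments and error handling are illustrative and not taken from the series.

#include <linux/kernel.h>
#include <linux/slab.h>
#include <xen/grant_table.h>
#include <xen/xenbus.h>

/* Hypothetical sketch: publish one queue's grant references and event
 * channel under ".../queue-N/", one "ring-ref%u" key per ring page. */
static int sketch_write_queue_keys(struct xenbus_device *dev,
				   struct xenbus_transaction xbt,
				   unsigned int queue,
				   grant_ref_t *ring_ref,
				   unsigned int nr_ring_pages,
				   unsigned int evtchn)
{
	char *dir;
	char key[20];
	unsigned int i;
	int err = 0;

	dir = kasprintf(GFP_KERNEL, "%s/queue-%u", dev->nodename, queue);
	if (!dir)
		return -ENOMEM;

	for (i = 0; i < nr_ring_pages && !err; i++) {
		snprintf(key, sizeof(key), "ring-ref%u", i);
		err = xenbus_printf(xbt, dir, key, "%u", ring_ref[i]);
	}
	if (!err)
		err = xenbus_printf(xbt, dir, "event-channel", "%u", evtchn);

	kfree(dir);
	return err;
}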

* Re: [PATCH v3 3/9] xen/blkfront: separate per ring information out of device info
  2015-09-05 12:39 ` [PATCH v3 3/9] xen/blkfront: separate per ring information out of device info Bob Liu
@ 2015-10-02 17:02   ` Roger Pau Monné
  2015-10-03  0:34     ` Bob Liu
                       ` (3 more replies)
  2015-10-02 17:02   ` Roger Pau Monné
  1 sibling, 4 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-02 17:02 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: david.vrabel, linux-kernel, konrad.wilk, felipe.franciosi, axboe,
	hch, avanzini.arianna, rafal.mielniczuk, boris.ostrovsky,
	jonathan.davies

On 05/09/15 at 14.39, Bob Liu wrote:
> Split per ring information to a new structure: blkfront_ring_info, also rename
> per blkfront_info to blkfront_dev_info.
  ^ removed.
> 
> A ring is the representation of a hardware queue, every vbd device can associate
> with one or more blkfront_ring_info depending on how many hardware
> queues/rings to be used.
> 
> This patch is a preparation for supporting real multi hardware queues/rings.
> 
> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkfront.c |  854 ++++++++++++++++++++++--------------------
>  1 file changed, 445 insertions(+), 409 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 5dd591d..bf416d5 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
>  module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
>  MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
>  
> -#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * (info)->nr_ring_pages)
> +#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->nr_ring_pages)

This change looks pointless, any reason to use dinfo instead of info?

>  #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
>  /*
>   * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
> @@ -116,12 +116,31 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
>  #define RINGREF_NAME_LEN (20)
>  
>  /*
> + *  Per-ring info.
> + *  Every blkfront device can associate with one or more blkfront_ring_info,
> + *  depending on how many hardware queues to be used.
> + */
> +struct blkfront_ring_info
> +{
> +	struct blkif_front_ring ring;
> +	unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
> +	unsigned int evtchn, irq;
> +	struct work_struct work;
> +	struct gnttab_free_callback callback;
> +	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
> +	struct list_head grants;
> +	struct list_head indirect_pages;
> +	unsigned int persistent_gnts_c;

persistent grants should be per-device, not per-queue IMHO. Is it really
hard to make this global instead of per-queue?

> +	unsigned long shadow_free;
> +	struct blkfront_dev_info *dinfo;
> +};
> +
> +/*
>   * We have one of these per vbd, whether ide, scsi or 'other'.  They
>   * hang in private_data off the gendisk structure. We may end up
>   * putting all kinds of interesting stuff here :-)
>   */
> -struct blkfront_info
> -{
> +struct blkfront_dev_info {

IMHO, you can leave this as blkfront_info (unless I'm missing something).

>  	spinlock_t io_lock;

Shouldn't the spinlock be per-queue instead of per-device?

>  	struct mutex mutex;
>  	struct xenbus_device *xbdev;
> @@ -129,18 +148,7 @@ struct blkfront_info
>  	int vdevice;
>  	blkif_vdev_t handle;
>  	enum blkif_state connected;
> -	int ring_ref[XENBUS_MAX_RING_PAGES];
> -	unsigned int nr_ring_pages;
> -	struct blkif_front_ring ring;
> -	unsigned int evtchn, irq;
>  	struct request_queue *rq;
> -	struct work_struct work;
> -	struct gnttab_free_callback callback;
> -	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
> -	struct list_head grants;
> -	struct list_head indirect_pages;
> -	unsigned int persistent_gnts_c;
> -	unsigned long shadow_free;
>  	unsigned int feature_flush;
>  	unsigned int feature_discard:1;
>  	unsigned int feature_secdiscard:1;
> @@ -149,7 +157,9 @@ struct blkfront_info
>  	unsigned int feature_persistent:1;
>  	unsigned int max_indirect_segments;
>  	int is_ready;
> +	unsigned int nr_ring_pages;

Spurious change? You are removing it in the chunk above and adding it
back here.

[...]

> @@ -246,33 +257,33 @@ out_of_memory:
>  }
>  
>  static struct grant *get_grant(grant_ref_t *gref_head,
> -                               unsigned long pfn,
> -                               struct blkfront_info *info)
> +			       unsigned long pfn,
> +			       struct blkfront_ring_info *rinfo)

Indentation? (or my email client is mangling emails one more time...)

In order to make this easier to review, do you think you can leave
blkfront_info as "info" for now, and do the renaming to dinfo in a later
patch? That would help separate the mechanical name changes from the
actual meat of the patch.

Roger.

^ permalink raw reply	[flat|nested] 83+ messages in thread
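
To visualise the split being discussed, here is a condensed, hypothetical sketch of the two structures with a per-ring lock and the device-wide persistent-grant pool suggested above; field names follow the quoted patch where they exist, the device struct keeps the old blkfront_info name as suggested, and gnt_lock is an invented name used only for illustration (this is not what v3 of the series does).

#include <linux/spinlock.h>
#include <linux/list.h>

/* Condensed, hypothetical sketch: per-ring state carries its own lock,
 * while the persistent-grant pool stays in the per-device structure and
 * is shared by all rings. */
struct blkfront_ring_info {
	struct blkif_front_ring ring;
	unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
	unsigned int evtchn, irq;
	spinlock_t ring_lock;			/* protects this ring only */
	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
	unsigned long shadow_free;
	struct blkfront_info *dev_info;
};

struct blkfront_info {
	struct xenbus_device *xbdev;
	struct request_queue *rq;
	unsigned int nr_rings;
	struct blkfront_ring_info *rinfo;	/* array of nr_rings entries */

	/* Device-wide persistent-grant pool, shared by all rings. */
	spinlock_t gnt_lock;			/* invented name, illustration only */
	struct list_head grants;
	struct list_head indirect_pages;
	unsigned int persistent_gnts_c;
};

With such a layout the grant-pool operations would take gnt_lock, while request submission and completion would only need the per-ring lock.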

* Re: [Xen-devel] [PATCH v3 2/9] xen-block: add document for mutli hardware queues/rings
  2015-10-02 16:22       ` [Xen-devel] " Roger Pau Monné
@ 2015-10-02 23:55         ` Bob Liu
  2015-10-02 23:55         ` Bob Liu
  1 sibling, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-10-02 23:55 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Wei Liu, xen-devel, hch, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, jonathan.davies, axboe, david.vrabel,
	avanzini.arianna, boris.ostrovsky


On 10/03/2015 12:22 AM, Roger Pau Monné wrote:
> On 02/10/15 at 18.12, Wei Liu wrote:
>> On Fri, Oct 02, 2015 at 06:04:35PM +0200, Roger Pau Monné wrote:
>>> On 05/09/15 at 14.39, Bob Liu wrote:
>>>> Document multi queues/rings of xen-block.
>>>>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>
>>> As said by Konrad, you should send this against the Xen public headers
>>> also (or even before). I have a comment below.
>>>

Sure, I'll do that and also rebase this series after getting more comments.

>>>> ---
>>>>  include/xen/interface/io/blkif.h |   32 ++++++++++++++++++++++++++++++++
>>>>  1 file changed, 32 insertions(+)
>>>>
>>>> diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
>>>> index c33e1c4..b453b70 100644
>>>> --- a/include/xen/interface/io/blkif.h
>>>> +++ b/include/xen/interface/io/blkif.h
>>>> @@ -28,6 +28,38 @@ typedef uint16_t blkif_vdev_t;
>>>>  typedef uint64_t blkif_sector_t;
>>>>  
>>>>  /*
>>>> + * Multiple hardware queues/rings:
>>>> + * If supported, the backend will write the key "multi-queue-max-queues" to
>>>> + * the directory for that vbd, and set its value to the maximum supported
>>>> + * number of queues.
>>>> + * Frontends that are aware of this feature and wish to use it can write the
>>>> + * key "multi-queue-num-queues", set to the number they wish to use, which
>>>> + * must be greater than zero, and no more than the value reported by the backend
>>>> + * in "multi-queue-max-queues".
>>>> + *
>>>> + * For frontends requesting just one queue, the usual event-channel and
>>>> + * ring-ref keys are written as before, simplifying the backend processing
>>>> + * to avoid distinguishing between a frontend that doesn't understand the
>>>> + * multi-queue feature, and one that does, but requested only one queue.
>>>> + *
>>>> + * Frontends requesting two or more queues must not write the toplevel
>>>> + * event-channel and ring-ref keys, instead writing those keys under sub-keys
>>>> + * having the name "queue-N" where N is the integer ID of the queue/ring for
>>>> + * which those keys belong. Queues are indexed from zero.
>>>> + * For example, a frontend with two queues must write the following set of
>>>> + * queue-related keys:
>>>> + *
>>>> + * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2"
>>>> + * /local/domain/1/device/vbd/0/queue-0 = ""
>>>> + * /local/domain/1/device/vbd/0/queue-0/ring-ref = "<ring-ref>"
>>>> + * /local/domain/1/device/vbd/0/queue-0/event-channel = "<evtchn>"
>>>> + * /local/domain/1/device/vbd/0/queue-1 = ""
>>>> + * /local/domain/1/device/vbd/0/queue-1/ring-ref = "<ring-ref>"
>>>> + * /local/domain/1/device/vbd/0/queue-1/event-channel = "<evtchn>"
>>>
>>> AFAICT, it's impossible by design to use multiple queues together with
>>> multipage rings, is that right?
>>>
>>
>> As far as I can tell, these two features are not inherently coupled.
>> Whether you want to make (by design) them coupled together or not is
>> another matter. :-)
> 
> I haven't looked at the implementation yet, but some mention of whether
> multipage-rings are allowed with multiqueue would be good. For example
> if both can indeed be used in conjunction I would mention:
> 
> If multi-page rings are also used, the format of the grant references
> will be:
> 
> /local/domain/1/device/vbd/0/queue-0/ring-ref0 = "<ring-ref0>"
> /local/domain/1/device/vbd/0/queue-0/ring-ref1 = "<ring-ref1>"
> /local/domain/1/device/vbd/0/queue-0/ring-ref2 = "<ring-ref2>"
> [...]
> 

True, and this is already supported. I'll update the document in the next version.

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 3/9] xen/blkfront: separate per ring information out of device info
  2015-10-02 17:02   ` Roger Pau Monné
  2015-10-03  0:34     ` Bob Liu
@ 2015-10-03  0:34     ` Bob Liu
  2015-10-05 15:17       ` Roger Pau Monné
  2015-10-05 15:17       ` Roger Pau Monné
  2015-10-10  8:30     ` Bob Liu
  2015-10-10  8:30     ` Bob Liu
  3 siblings, 2 replies; 83+ messages in thread
From: Bob Liu @ 2015-10-03  0:34 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies


On 10/03/2015 01:02 AM, Roger Pau Monné wrote:
> El 05/09/15 a les 14.39, Bob Liu ha escrit:
>> Split per ring information to an new structure:blkfront_ring_info, also rename
>> per blkfront_info to blkfront_dev_info.
>   ^ removed.
>>
>> A ring is the representation of a hardware queue, every vbd device can associate
>> with one or more blkfront_ring_info depending on how many hardware
>> queues/rings to be used.
>>
>> This patch is a preparation for supporting real multi hardware queues/rings.
>>
>> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkfront.c |  854 ++++++++++++++++++++++--------------------
>>  1 file changed, 445 insertions(+), 409 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 5dd591d..bf416d5 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
>>  module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
>>  MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
>>  
>> -#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * (info)->nr_ring_pages)
>> +#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->nr_ring_pages)
> 
> This change looks pointless, any reason to use dinfo instead of info?
> 
>>  #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
>>  /*
>>   * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
>> @@ -116,12 +116,31 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
>>  #define RINGREF_NAME_LEN (20)
>>  
>>  /*
>> + *  Per-ring info.
>> + *  Every blkfront device can associate with one or more blkfront_ring_info,
>> + *  depending on how many hardware queues to be used.
>> + */
>> +struct blkfront_ring_info
>> +{
>> +	struct blkif_front_ring ring;
>> +	unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
>> +	unsigned int evtchn, irq;
>> +	struct work_struct work;
>> +	struct gnttab_free_callback callback;
>> +	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
>> +	struct list_head grants;
>> +	struct list_head indirect_pages;
>> +	unsigned int persistent_gnts_c;
> 
> persistent grants should be per-device, not per-queue IMHO. Is it really
> hard to make this global instead of per-queue?
> 

The most important thing at this stage is to keep the changes minimal so they are easier to review.
I'll check which way requires the fewest modifications.
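
If it does end up per-device, the grant pool would presumably be pulled back into the device structure behind its own lock. A rough sketch only, with the struct and lock names invented here for illustration (they are not from this series):

struct blkfront_gnt_pool {
	spinlock_t lock;		/* serialises grant get/put across all rings */
	struct list_head grants;	/* free persistent grants, shared by the whole device */
	unsigned int persistent_gnts_c;	/* number of grants currently in the list */
};

get_grant() and blkif_completion() would then take that lock instead of touching a per-ring list.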

>> +	unsigned long shadow_free;
>> +	struct blkfront_dev_info *dinfo;
>> +};
>> +
>> +/*
>>   * We have one of these per vbd, whether ide, scsi or 'other'.  They
>>   * hang in private_data off the gendisk structure. We may end up
>>   * putting all kinds of interesting stuff here :-)
>>   */
>> -struct blkfront_info
>> -{
>> +struct blkfront_dev_info {
> 
> IMHO, you can leave this as blkfront_info (unless I'm missing something).
> 
>>  	spinlock_t io_lock;
> 
> Shouldn't the spinlock be per-queue instead of per-device?
> 

That's done in a separate patch to make review easier:
'[PATCH v3 5/9] xen/blkfront: convert per device io_lock to per ring ring_lock' will do that.
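
In other words (just sketching what that conversion amounts to, going by the naming in the 5/9 title), the request and completion paths end up doing something like:

	spin_lock_irq(&rinfo->ring_lock);
	/* ... touch only this ring's shared ring and shadow state ... */
	spin_unlock_irq(&rinfo->ring_lock);

instead of serialising every ring of the device on dinfo->io_lock.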

>>  	struct mutex mutex;
>>  	struct xenbus_device *xbdev;
>> @@ -129,18 +148,7 @@ struct blkfront_info
>>  	int vdevice;
>>  	blkif_vdev_t handle;
>>  	enum blkif_state connected;
>> -	int ring_ref[XENBUS_MAX_RING_PAGES];
>> -	unsigned int nr_ring_pages;
>> -	struct blkif_front_ring ring;
>> -	unsigned int evtchn, irq;
>>  	struct request_queue *rq;
>> -	struct work_struct work;
>> -	struct gnttab_free_callback callback;
>> -	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
>> -	struct list_head grants;
>> -	struct list_head indirect_pages;
>> -	unsigned int persistent_gnts_c;
>> -	unsigned long shadow_free;
>>  	unsigned int feature_flush;
>>  	unsigned int feature_discard:1;
>>  	unsigned int feature_secdiscard:1;
>> @@ -149,7 +157,9 @@ struct blkfront_info
>>  	unsigned int feature_persistent:1;
>>  	unsigned int max_indirect_segments;
>>  	int is_ready;
>> +	unsigned int nr_ring_pages;
> 
> Spurious change? You are removing it in the chunk above and adding it
> back here.
> 

Will be fixed.

> [...]
> 
>> @@ -246,33 +257,33 @@ out_of_memory:
>>  }
>>  
>>  static struct grant *get_grant(grant_ref_t *gref_head,
>> -                               unsigned long pfn,
>> -                               struct blkfront_info *info)
>> +			       unsigned long pfn,
>> +			       struct blkfront_ring_info *rinfo)
> 
> Indentation? (or my email client is mangling emails one more time...)
> 

Will be fixed.

> In order to make this easier to review, do you think you can leave
> blkfront_info as "info" for now, and do the renaming to dinfo in a later
> patch. That would help figuring out mechanical name changes from the
> actual meat of the patch.
> 

That's what I did in v2, but believe me, it's more difficult to read and review.
There are a lot of places where info and rinfo get mixed together; when you see an 'info' you have to stop
and work out whether it is the device info or the ring info. It's more straightforward to distinguish dinfo and rinfo from the beginning.

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 4/9] xen/blkfront: pseudo support for multi hardware queues/rings
  2015-09-05 12:39 ` [PATCH v3 4/9] xen/blkfront: pseudo support for multi hardware queues/rings Bob Liu
@ 2015-10-05 10:52   ` Roger Pau Monné
  2015-10-07 10:28     ` Bob Liu
  2015-10-07 10:28     ` Bob Liu
  2015-10-05 10:52   ` Roger Pau Monné
  1 sibling, 2 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-05 10:52 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: david.vrabel, linux-kernel, konrad.wilk, felipe.franciosi, axboe,
	hch, avanzini.arianna, rafal.mielniczuk, boris.ostrovsky,
	jonathan.davies

El 05/09/15 a les 14.39, Bob Liu ha escrit:
> Prepare patch for multi hardware queues/rings, the ring number was set to 1 by
> force.
> 
> * Use 'nr_rings' in per dev_info to identify how many hw queues/rings are
>   supported, and a pointer *rinfo for all its rings.
> * Rename 'nr_ring_pages' => 'pages_per_ring' to distinguish from 'nr_rings'
>   better.
> 
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkfront.c |  513 +++++++++++++++++++++++++-----------------
>  1 file changed, 308 insertions(+), 205 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index bf416d5..bf45c99 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
>  module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
>  MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
>  
> -#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->nr_ring_pages)
> +#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->pages_per_ring)
>  #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
>  /*
>   * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
> @@ -157,9 +157,10 @@ struct blkfront_dev_info {
>  	unsigned int feature_persistent:1;
>  	unsigned int max_indirect_segments;
>  	int is_ready;
> -	unsigned int nr_ring_pages;
> +	unsigned int pages_per_ring;

Why do you rename this field? nr_ring_pages seems more consistent with
the nr_rings field that you add below IMO, but that might be a matter of
taste.

>  	struct blk_mq_tag_set tag_set;
> -	struct blkfront_ring_info rinfo;
> +	struct blkfront_ring_info *rinfo;
> +	unsigned int nr_rings;
>  };
>  
>  static unsigned int nr_minors;
> @@ -191,7 +192,7 @@ static DEFINE_SPINLOCK(minor_lock);
>  	((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
>  
>  static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
> -static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
> +static void __blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
>  
>  static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
>  {
> @@ -668,7 +669,7 @@ static int blk_mq_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
>  {
>  	struct blkfront_dev_info *dinfo = (struct blkfront_dev_info *)data;
>  
> -	hctx->driver_data = &dinfo->rinfo;
> +	hctx->driver_data = &dinfo->rinfo[index];
>  	return 0;
>  }
>  
> @@ -927,8 +928,8 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
>  
>  static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
>  {
> -	unsigned int minor, nr_minors;
> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
> +	unsigned int minor, nr_minors, i;
> +	struct blkfront_ring_info *rinfo;
>  
>  	if (dinfo->rq == NULL)
>  		return;
> @@ -936,11 +937,15 @@ static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
>  	/* No more blkif_request(). */
>  	blk_mq_stop_hw_queues(dinfo->rq);
>  
> -	/* No more gnttab callback work. */
> -	gnttab_cancel_free_callback(&rinfo->callback);
> +	for (i = 0; i < dinfo->nr_rings; i++) {

I would be tempted to declare rinfo only inside the for loop, to limit
the scope:

		struct blkfront_ring_info *rinfo = &dinfo->rinfo[i];

> +		rinfo = &dinfo->rinfo[i];
>  
> -	/* Flush gnttab callback work. Must be done with no locks held. */
> -	flush_work(&rinfo->work);
> +		/* No more gnttab callback work. */
> +		gnttab_cancel_free_callback(&rinfo->callback);
> +
> +		/* Flush gnttab callback work. Must be done with no locks held. */
> +		flush_work(&rinfo->work);
> +	}
>  
>  	del_gendisk(dinfo->gd);
>  
> @@ -977,8 +982,8 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
>  {
>  	struct grant *persistent_gnt;
>  	struct grant *n;
> -	int i, j, segs;
> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
> +	int i, j, segs, r_index;
> +	struct blkfront_ring_info *rinfo;
>  
>  	/* Prevent new requests being issued until we fix things up. */
>  	spin_lock_irq(&dinfo->io_lock);
> @@ -988,100 +993,103 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
>  	if (dinfo->rq)
>  		blk_mq_stop_hw_queues(dinfo->rq);
>  
> -	/* Remove all persistent grants */
> -	if (!list_empty(&rinfo->grants)) {
> -		list_for_each_entry_safe(persistent_gnt, n,
> -					 &rinfo->grants, node) {
> -			list_del(&persistent_gnt->node);
> -			if (persistent_gnt->gref != GRANT_INVALID_REF) {
> -				gnttab_end_foreign_access(persistent_gnt->gref,
> -				                          0, 0UL);
> -				rinfo->persistent_gnts_c--;
> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
> +		rinfo = &dinfo->rinfo[r_index];

struct blkfront_ring_info *rinfo = &dinfo->rinfo[r_index];

Would it be helpful to place all this code inside of a helper function,
ie: blkif_free_ring?
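
Something like this, perhaps (only a rough sketch of the shape such a helper could take, derived from the loop body below rather than taken from any posted patch):

static void blkif_free_ring(struct blkfront_ring_info *rinfo)
{
	struct blkfront_dev_info *dinfo = rinfo->dinfo;
	struct grant *persistent_gnt, *n;

	/* Remove all persistent grants of this ring. */
	list_for_each_entry_safe(persistent_gnt, n, &rinfo->grants, node) {
		list_del(&persistent_gnt->node);
		if (persistent_gnt->gref != GRANT_INVALID_REF) {
			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
			rinfo->persistent_gnts_c--;
		}
		if (dinfo->feature_persistent)
			__free_page(pfn_to_page(persistent_gnt->pfn));
		kfree(persistent_gnt);
	}
	BUG_ON(rinfo->persistent_gnts_c != 0);

	/*
	 * Releasing the indirect pages and the shadow entries would follow,
	 * exactly as in the loop below.
	 */
}

blkif_free() would then reduce to a loop calling blkif_free_ring(&dinfo->rinfo[r_index]) for each ring, plus the per-device teardown.
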

> +
> +		/* Remove all persistent grants */
> +		if (!list_empty(&rinfo->grants)) {
> +			list_for_each_entry_safe(persistent_gnt, n,
> +						 &rinfo->grants, node) {
> +				list_del(&persistent_gnt->node);
> +				if (persistent_gnt->gref != GRANT_INVALID_REF) {
> +					gnttab_end_foreign_access(persistent_gnt->gref,
> +								  0, 0UL);
> +					rinfo->persistent_gnts_c--;
> +				}
> +				if (dinfo->feature_persistent)
> +					__free_page(pfn_to_page(persistent_gnt->pfn));
> +				kfree(persistent_gnt);
>  			}
> -			if (dinfo->feature_persistent)
> -				__free_page(pfn_to_page(persistent_gnt->pfn));
> -			kfree(persistent_gnt);
>  		}
> -	}
> -	BUG_ON(rinfo->persistent_gnts_c != 0);
> +		BUG_ON(rinfo->persistent_gnts_c != 0);
>  
> -	/*
> -	 * Remove indirect pages, this only happens when using indirect
> -	 * descriptors but not persistent grants
> -	 */
> -	if (!list_empty(&rinfo->indirect_pages)) {
> -		struct page *indirect_page, *n;
> -
> -		BUG_ON(dinfo->feature_persistent);
> -		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
> -			list_del(&indirect_page->lru);
> -			__free_page(indirect_page);
> -		}
> -	}
> -
> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
>  		/*
> -		 * Clear persistent grants present in requests already
> -		 * on the shared ring
> +		 * Remove indirect pages, this only happens when using indirect
> +		 * descriptors but not persistent grants
>  		 */
> -		if (!rinfo->shadow[i].request)
> -			goto free_shadow;
> -
> -		segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
> -		       rinfo->shadow[i].req.u.indirect.nr_segments :
> -		       rinfo->shadow[i].req.u.rw.nr_segments;
> -		for (j = 0; j < segs; j++) {
> -			persistent_gnt = rinfo->shadow[i].grants_used[j];
> -			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
> -			if (dinfo->feature_persistent)
> -				__free_page(pfn_to_page(persistent_gnt->pfn));
> -			kfree(persistent_gnt);
> +		if (!list_empty(&rinfo->indirect_pages)) {
> +			struct page *indirect_page, *n;
> +
> +			BUG_ON(dinfo->feature_persistent);
> +			list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
> +				list_del(&indirect_page->lru);
> +				__free_page(indirect_page);
> +			}
>  		}
>  
> -		if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
>  			/*
> -			 * If this is not an indirect operation don't try to
> -			 * free indirect segments
> +			 * Clear persistent grants present in requests already
> +			 * on the shared ring
>  			 */
> -			goto free_shadow;
> +			if (!rinfo->shadow[i].request)
> +				goto free_shadow;
> +
> +			segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
> +			       rinfo->shadow[i].req.u.indirect.nr_segments :
> +			       rinfo->shadow[i].req.u.rw.nr_segments;
> +			for (j = 0; j < segs; j++) {
> +				persistent_gnt = rinfo->shadow[i].grants_used[j];
> +				gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
> +				if (dinfo->feature_persistent)
> +					__free_page(pfn_to_page(persistent_gnt->pfn));
> +				kfree(persistent_gnt);
> +			}
>  
> -		for (j = 0; j < INDIRECT_GREFS(segs); j++) {
> -			persistent_gnt = rinfo->shadow[i].indirect_grants[j];
> -			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
> -			__free_page(pfn_to_page(persistent_gnt->pfn));
> -			kfree(persistent_gnt);
> -		}
> +			if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
> +				/*
> +				 * If this is not an indirect operation don't try to
> +				 * free indirect segments
> +				 */
> +				goto free_shadow;
> +
> +			for (j = 0; j < INDIRECT_GREFS(segs); j++) {
> +				persistent_gnt = rinfo->shadow[i].indirect_grants[j];
> +				gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
> +				__free_page(pfn_to_page(persistent_gnt->pfn));
> +				kfree(persistent_gnt);
> +			}
>  
>  free_shadow:
> -		kfree(rinfo->shadow[i].grants_used);
> -		rinfo->shadow[i].grants_used = NULL;
> -		kfree(rinfo->shadow[i].indirect_grants);
> -		rinfo->shadow[i].indirect_grants = NULL;
> -		kfree(rinfo->shadow[i].sg);
> -		rinfo->shadow[i].sg = NULL;
> -	}
> +			kfree(rinfo->shadow[i].grants_used);
> +			rinfo->shadow[i].grants_used = NULL;
> +			kfree(rinfo->shadow[i].indirect_grants);
> +			rinfo->shadow[i].indirect_grants = NULL;
> +			kfree(rinfo->shadow[i].sg);
> +			rinfo->shadow[i].sg = NULL;
> +		}
>  
> -	/* No more gnttab callback work. */
> -	gnttab_cancel_free_callback(&rinfo->callback);
> -	spin_unlock_irq(&dinfo->io_lock);
> +		/* No more gnttab callback work. */
> +		gnttab_cancel_free_callback(&rinfo->callback);
> +		spin_unlock_irq(&dinfo->io_lock);
>  
> -	/* Flush gnttab callback work. Must be done with no locks held. */
> -	flush_work(&rinfo->work);
> +		/* Flush gnttab callback work. Must be done with no locks held. */
> +		flush_work(&rinfo->work);
>  
> -	/* Free resources associated with old device channel. */
> -	for (i = 0; i < dinfo->nr_ring_pages; i++) {
> -		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
> -			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
> -			rinfo->ring_ref[i] = GRANT_INVALID_REF;
> +		/* Free resources associated with old device channel. */
> +		for (i = 0; i < dinfo->pages_per_ring; i++) {
> +			if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
> +				gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
> +				rinfo->ring_ref[i] = GRANT_INVALID_REF;
> +			}
>  		}
> -	}
> -	free_pages((unsigned long)rinfo->ring.sring, get_order(dinfo->nr_ring_pages * PAGE_SIZE));
> -	rinfo->ring.sring = NULL;
> -
> -	if (rinfo->irq)
> -		unbind_from_irqhandler(rinfo->irq, rinfo);
> -	rinfo->evtchn = rinfo->irq = 0;
> +		free_pages((unsigned long)rinfo->ring.sring, get_order(dinfo->pages_per_ring * PAGE_SIZE));
> +		rinfo->ring.sring = NULL;
>  
> +		if (rinfo->irq)
> +			unbind_from_irqhandler(rinfo->irq, rinfo);
> +		rinfo->evtchn = rinfo->irq = 0;
> +	}
>  }
>  
>  static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *rinfo,
> @@ -1276,6 +1284,26 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  	return IRQ_HANDLED;
>  }
>  
> +static void destroy_blkring(struct xenbus_device *dev,
> +			    struct blkfront_ring_info *rinfo)
> +{
> +	int i;
> +
> +	if (rinfo->irq)
> +		unbind_from_irqhandler(rinfo->irq, rinfo);
> +	if (rinfo->evtchn)
> +		xenbus_free_evtchn(dev, rinfo->evtchn);
> +
> +	for (i = 0; i < rinfo->dinfo->pages_per_ring; i++) {
> +		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
> +			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
> +			rinfo->ring_ref[i] = GRANT_INVALID_REF;
> +		}
> +	}
> +	free_pages((unsigned long)rinfo->ring.sring,
> +		   get_order(rinfo->dinfo->pages_per_ring * PAGE_SIZE));
> +	rinfo->ring.sring = NULL;
> +}
>  
>  static int setup_blkring(struct xenbus_device *dev,
>  			 struct blkfront_ring_info *rinfo)
> @@ -1283,10 +1311,10 @@ static int setup_blkring(struct xenbus_device *dev,
>  	struct blkif_sring *sring;
>  	int err, i;
>  	struct blkfront_dev_info *dinfo = rinfo->dinfo;
> -	unsigned long ring_size = dinfo->nr_ring_pages * PAGE_SIZE;
> +	unsigned long ring_size = dinfo->pages_per_ring * PAGE_SIZE;
>  	grant_ref_t gref[XENBUS_MAX_RING_PAGES];
>  
> -	for (i = 0; i < dinfo->nr_ring_pages; i++)
> +	for (i = 0; i < dinfo->pages_per_ring; i++)
>  		rinfo->ring_ref[i] = GRANT_INVALID_REF;
>  
>  	sring = (struct blkif_sring *)__get_free_pages(GFP_NOIO | __GFP_HIGH,
> @@ -1298,13 +1326,13 @@ static int setup_blkring(struct xenbus_device *dev,
>  	SHARED_RING_INIT(sring);
>  	FRONT_RING_INIT(&rinfo->ring, sring, ring_size);
>  
> -	err = xenbus_grant_ring(dev, rinfo->ring.sring, dinfo->nr_ring_pages, gref);
> +	err = xenbus_grant_ring(dev, rinfo->ring.sring, dinfo->pages_per_ring, gref);
>  	if (err < 0) {
>  		free_pages((unsigned long)sring, get_order(ring_size));
>  		rinfo->ring.sring = NULL;
>  		goto fail;
>  	}
> -	for (i = 0; i < dinfo->nr_ring_pages; i++)
> +	for (i = 0; i < dinfo->pages_per_ring; i++)
>  		rinfo->ring_ref[i] = gref[i];
>  
>  	err = xenbus_alloc_evtchn(dev, &rinfo->evtchn);
> @@ -1322,7 +1350,7 @@ static int setup_blkring(struct xenbus_device *dev,
>  
>  	return 0;
>  fail:
> -	blkif_free(dinfo, 0);
> +	destroy_blkring(dev, rinfo);

blkif_free used to clean a lot more than what destroy_blkring does, is
this right?

>  	return err;
>  }
>  
> @@ -1333,65 +1361,76 @@ static int talk_to_blkback(struct xenbus_device *dev,
>  {
>  	const char *message = NULL;
>  	struct xenbus_transaction xbt;
> -	int err, i;
> +	int err, i, r_index;
>  	unsigned int max_page_order = 0;
>  	unsigned int ring_page_order = 0;
> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
> +	struct blkfront_ring_info *rinfo;
>  
>  	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
>  			   "max-ring-page-order", "%u", &max_page_order);
>  	if (err != 1)
> -		dinfo->nr_ring_pages = 1;
> +		dinfo->pages_per_ring = 1;
>  	else {
>  		ring_page_order = min(xen_blkif_max_ring_order, max_page_order);
> -		dinfo->nr_ring_pages = 1 << ring_page_order;
> +		dinfo->pages_per_ring = 1 << ring_page_order;

As said above, I think nr_ring_pages is perfectly fine, and avoids all
these pointless changes.

>  	}
>  
> -	/* Create shared ring, alloc event channel. */
> -	err = setup_blkring(dev, rinfo);
> -	if (err)
> -		goto out;
> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
> +		rinfo = &dinfo->rinfo[r_index];
> +		/* Create shared ring, alloc event channel. */
> +		err = setup_blkring(dev, rinfo);
> +		if (err)
> +			goto out;
> +	}
>  
>  again:
>  	err = xenbus_transaction_start(&xbt);
>  	if (err) {
>  		xenbus_dev_fatal(dev, err, "starting transaction");
> -		goto destroy_blkring;
> +		goto out;
>  	}
>  
> -	if (dinfo->nr_ring_pages == 1) {
> -		err = xenbus_printf(xbt, dev->nodename,
> -				    "ring-ref", "%u", rinfo->ring_ref[0]);
> -		if (err) {
> -			message = "writing ring-ref";
> -			goto abort_transaction;
> -		}
> -	} else {
> -		err = xenbus_printf(xbt, dev->nodename,
> -				    "ring-page-order", "%u", ring_page_order);
> -		if (err) {
> -			message = "writing ring-page-order";
> -			goto abort_transaction;
> -		}
> -
> -		for (i = 0; i < dinfo->nr_ring_pages; i++) {
> -			char ring_ref_name[RINGREF_NAME_LEN];
> +	if (dinfo->nr_rings == 1) {
> +		rinfo = &dinfo->rinfo[0];
>  
> -			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
> -			err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
> -					    "%u", rinfo->ring_ref[i]);
> +		if (dinfo->pages_per_ring == 1) {
> +			err = xenbus_printf(xbt, dev->nodename,
> +					    "ring-ref", "%u", rinfo->ring_ref[0]);
>  			if (err) {
>  				message = "writing ring-ref";
>  				goto abort_transaction;
>  			}
> +		} else {
> +			err = xenbus_printf(xbt, dev->nodename,
> +					    "ring-page-order", "%u", ring_page_order);
> +			if (err) {
> +				message = "writing ring-page-order";
> +				goto abort_transaction;
> +			}
> +
> +			for (i = 0; i < dinfo->pages_per_ring; i++) {
> +				char ring_ref_name[RINGREF_NAME_LEN];
> +
> +				snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
> +				err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
> +						    "%u", rinfo->ring_ref[i]);
> +				if (err) {
> +					message = "writing ring-ref";
> +					goto abort_transaction;
> +				}
> +			}
>  		}
> -	}
> -	err = xenbus_printf(xbt, dev->nodename,
> -			    "event-channel", "%u", rinfo->evtchn);
> -	if (err) {
> -		message = "writing event-channel";
> +		err = xenbus_printf(xbt, dev->nodename,
> +				    "event-channel", "%u", rinfo->evtchn);
> +		if (err) {
> +			message = "writing event-channel";
> +			goto abort_transaction;
> +		}
> +	} else {
> +		/* Not supported at this stage */
>  		goto abort_transaction;
>  	}
> +
>  	err = xenbus_printf(xbt, dev->nodename, "protocol", "%s",
>  			    XEN_IO_PROTO_ABI_NATIVE);
>  	if (err) {
> @@ -1409,12 +1448,16 @@ again:
>  		if (err == -EAGAIN)
>  			goto again;
>  		xenbus_dev_fatal(dev, err, "completing transaction");
> -		goto destroy_blkring;
> +		goto out;
>  	}
>  
> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
> -		rinfo->shadow[i].req.u.rw.id = i+1;
> -	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
> +		rinfo = &dinfo->rinfo[r_index];
> +
> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
> +			rinfo->shadow[i].req.u.rw.id = i+1;
> +		rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
> +	}
>  	xenbus_switch_state(dev, XenbusStateInitialised);
>  
>  	return 0;
> @@ -1423,9 +1466,9 @@ again:
>  	xenbus_transaction_end(xbt, 1);
>  	if (message)
>  		xenbus_dev_fatal(dev, err, "%s", message);
> - destroy_blkring:
> -	blkif_free(dinfo, 0);
>   out:
> +	while (--r_index >= 0)
> +		destroy_blkring(dev, &dinfo->rinfo[r_index]);

Same as above: destroy_blkring does a different cleanup from what
used to be done in blkif_free.

>  	return err;
>  }
>  
> @@ -1438,7 +1481,7 @@ again:
>  static int blkfront_probe(struct xenbus_device *dev,
>  			  const struct xenbus_device_id *id)
>  {
> -	int err, vdevice;
> +	int err, vdevice, r_index;
>  	struct blkfront_dev_info *dinfo;
>  	struct blkfront_ring_info *rinfo;
>  
> @@ -1490,17 +1533,29 @@ static int blkfront_probe(struct xenbus_device *dev,
>  		return -ENOMEM;
>  	}
>  
> -	rinfo = &dinfo->rinfo;
>  	mutex_init(&dinfo->mutex);
>  	spin_lock_init(&dinfo->io_lock);
>  	dinfo->xbdev = dev;
>  	dinfo->vdevice = vdevice;
> -	INIT_LIST_HEAD(&rinfo->grants);
> -	INIT_LIST_HEAD(&rinfo->indirect_pages);
> -	rinfo->persistent_gnts_c = 0;
>  	dinfo->connected = BLKIF_STATE_DISCONNECTED;
> -	rinfo->dinfo = dinfo;
> -	INIT_WORK(&rinfo->work, blkif_restart_queue);
> +
> +	dinfo->nr_rings = 1;
> +	dinfo->rinfo = kzalloc(sizeof(*rinfo) * dinfo->nr_rings, GFP_KERNEL);
> +	if (!dinfo->rinfo) {
> +		xenbus_dev_fatal(dev, -ENOMEM, "allocating ring_info structure");
> +		kfree(dinfo);
> +		return -ENOMEM;
> +	}
> +
> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
> +		rinfo = &dinfo->rinfo[r_index];
> +
> +		INIT_LIST_HEAD(&rinfo->grants);
> +		INIT_LIST_HEAD(&rinfo->indirect_pages);
> +		rinfo->persistent_gnts_c = 0;
> +		rinfo->dinfo = dinfo;
> +		INIT_WORK(&rinfo->work, blkif_restart_queue);
> +	}
>  
>  	/* Front end dir is a number, which is used as the id. */
>  	dinfo->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
> @@ -1526,7 +1581,7 @@ static void split_bio_end(struct bio *bio, int error)
>  
>  static int blkif_recover(struct blkfront_dev_info *dinfo)
>  {
> -	int i;
> +	int i, r_index;
>  	struct request *req, *n;
>  	struct blk_shadow *copy;
>  	int rc;
> @@ -1536,56 +1591,62 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
>  	int pending, size;
>  	struct split_bio *split_bio;
>  	struct list_head requests;
> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
> -
> -	/* Stage 1: Make a safe copy of the shadow state. */
> -	copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
> -		       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
> -	if (!copy)
> -		return -ENOMEM;
> -
> -	/* Stage 2: Set up free list. */
> -	memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
> -		rinfo->shadow[i].req.u.rw.id = i+1;
> -	rinfo->shadow_free = rinfo->ring.req_prod_pvt;
> -	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
> -
> -	rc = blkfront_gather_backend_features(dinfo);
> -	if (rc) {
> -		kfree(copy);
> -		return rc;
> -	}
> +	struct blkfront_ring_info *rinfo;
>  
> +	__blkfront_gather_backend_features(dinfo);
>  	segs = dinfo->max_indirect_segments ? : BLKIF_MAX_SEGMENTS_PER_REQUEST;
>  	blk_queue_max_segments(dinfo->rq, segs);
>  	bio_list_init(&bio_list);
>  	INIT_LIST_HEAD(&requests);
> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
> -		/* Not in use? */
> -		if (!copy[i].request)
> -			continue;
>  
> -		/*
> -		 * Get the bios in the request so we can re-queue them.
> -		 */
> -		if (copy[i].request->cmd_flags &
> -		    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
> +		rinfo = &dinfo->rinfo[r_index];
> +
> +		/* Stage 1: Make a safe copy of the shadow state. */
> +		copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
> +			       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
> +		if (!copy)
> +			return -ENOMEM;
> +
> +		/* Stage 2: Set up free list. */
> +		memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
> +			rinfo->shadow[i].req.u.rw.id = i+1;
> +		rinfo->shadow_free = rinfo->ring.req_prod_pvt;
> +		rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
> +
> +		rc = blkfront_setup_indirect(rinfo);
> +		if (rc) {
> +			kfree(copy);
> +			return rc;
> +		}
> +
> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
> +			/* Not in use? */
> +			if (!copy[i].request)
> +				continue;
> +
>  			/*
> -			 * Flush operations don't contain bios, so
> -			 * we need to requeue the whole request
> +			 * Get the bios in the request so we can re-queue them.
>  			 */
> -			list_add(&copy[i].request->queuelist, &requests);
> -			continue;
> +			if (copy[i].request->cmd_flags &
> +			    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
> +				/*
> +				 * Flush operations don't contain bios, so
> +				 * we need to requeue the whole request
> +				 */
> +				list_add(&copy[i].request->queuelist, &requests);
> +				continue;
> +			}
> +			merge_bio.head = copy[i].request->bio;
> +			merge_bio.tail = copy[i].request->biotail;
> +			bio_list_merge(&bio_list, &merge_bio);
> +			copy[i].request->bio = NULL;
> +			blk_end_request_all(copy[i].request, 0);
>  		}
> -		merge_bio.head = copy[i].request->bio;
> -		merge_bio.tail = copy[i].request->biotail;
> -		bio_list_merge(&bio_list, &merge_bio);
> -		copy[i].request->bio = NULL;
> -		blk_end_request_all(copy[i].request, 0);
> -	}
>  
> -	kfree(copy);
> +		kfree(copy);
> +	}
>  
>  	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
>  
> @@ -1594,8 +1655,12 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
>  	/* Now safe for us to use the shared ring */
>  	dinfo->connected = BLKIF_STATE_CONNECTED;
>  
> -	/* Kick any other new requests queued since we resumed */
> -	kick_pending_request_queues(rinfo);
> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
> +		rinfo = &dinfo->rinfo[r_index];
> +
> +		/* Kick any other new requests queued since we resumed */
> +		kick_pending_request_queues(rinfo);
> +	}
>  
>  	list_for_each_entry_safe(req, n, &requests, queuelist) {
>  		/* Requeue pending requests (flush or discard) */
> @@ -1729,6 +1794,38 @@ static void blkfront_setup_discard(struct blkfront_dev_info *dinfo)
>  		dinfo->feature_secdiscard = !!discard_secure;
>  }
>  
> +static void blkfront_clean_ring(struct blkfront_ring_info *rinfo)
> +{
> +	int i;
> +
> +	for (i = 0; i < BLK_RING_SIZE(rinfo->dinfo); i++) {
> +		kfree(rinfo->shadow[i].grants_used);
> +		rinfo->shadow[i].grants_used = NULL;
> +		kfree(rinfo->shadow[i].sg);
> +		rinfo->shadow[i].sg = NULL;
> +		kfree(rinfo->shadow[i].indirect_grants);
> +		rinfo->shadow[i].indirect_grants = NULL;
> +	}
> +	if (!list_empty(&rinfo->indirect_pages)) {
> +		struct page *indirect_page, *n;
> +		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
> +			list_del(&indirect_page->lru);
> +			__free_page(indirect_page);
> +		}
> +	}
> +
> +	if (!list_empty(&rinfo->grants)) {
> +		struct grant *gnt_list_entry, *n;
> +		list_for_each_entry_safe(gnt_list_entry, n,
> +				&rinfo->grants, node) {
> +			list_del(&gnt_list_entry->node);
> +			if (rinfo->dinfo->feature_persistent)
> +				__free_page(pfn_to_page(gnt_list_entry->pfn));
> +			kfree(gnt_list_entry);
> +		}
> +	}
> +}
> +
>  static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
>  {
>  	unsigned int segs;
> @@ -1783,28 +1880,14 @@ static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
>  	return 0;
>  
>  out_of_memory:
> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
> -		kfree(rinfo->shadow[i].grants_used);
> -		rinfo->shadow[i].grants_used = NULL;
> -		kfree(rinfo->shadow[i].sg);
> -		rinfo->shadow[i].sg = NULL;
> -		kfree(rinfo->shadow[i].indirect_grants);
> -		rinfo->shadow[i].indirect_grants = NULL;
> -	}
> -	if (!list_empty(&rinfo->indirect_pages)) {
> -		struct page *indirect_page, *n;
> -		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
> -			list_del(&indirect_page->lru);
> -			__free_page(indirect_page);
> -		}
> -	}
> +	blkfront_clean_ring(rinfo);
>  	return -ENOMEM;
>  }
>  
>  /*
>   * Gather all backend feature-*
>   */
> -static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
> +static void __blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
>  {
>  	int err;
>  	int barrier, flush, discard, persistent;
> @@ -1859,8 +1942,25 @@ static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
>  	else
>  		dinfo->max_indirect_segments = min(indirect_segments,
>  						  xen_blkif_max_segments);
> +}
> +
> +static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
> +{
> +	int err, i;
> +
> +	__blkfront_gather_backend_features(dinfo);

IMHO, there's no need to introduce __blkfront_gather_backend_features,
just add the chunk below to the existing blkfront_gather_backend_features.

> -	return blkfront_setup_indirect(&dinfo->rinfo);
> +	for (i = 0; i < dinfo->nr_rings; i++) {
> +		err = blkfront_setup_indirect(&dinfo->rinfo[i]);
> +		if (err)
> +			goto out;
> +	}
> +	return 0;
> +
> +out:
> +	while (--i >= 0)
> +		blkfront_clean_ring(&dinfo->rinfo[i]);
> +	return err;
>  }
>  
>  /*
> @@ -1873,8 +1973,8 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
>  	unsigned long sector_size;
>  	unsigned int physical_sector_size;
>  	unsigned int binfo;
> -	int err;
> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
> +	int err, i;
> +	struct blkfront_ring_info *rinfo;
>  
>  	switch (dinfo->connected) {
>  	case BLKIF_STATE_CONNECTED:
> @@ -1951,7 +2051,10 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
>  	/* Kick pending requests. */
>  	spin_lock_irq(&dinfo->io_lock);
>  	dinfo->connected = BLKIF_STATE_CONNECTED;
> -	kick_pending_request_queues(rinfo);
> +	for (i = 0; i < dinfo->nr_rings; i++) {
> +		rinfo = &dinfo->rinfo[i];

If rinfo is only going to be used in the for loop you can declare it inside:

		struct blkfront_ring_info *rinfo = &dinfo->rinfo[i];

> +		kick_pending_request_queues(rinfo);
> +	}
>  	spin_unlock_irq(&dinfo->io_lock);
>  
>  	add_disk(dinfo->gd);
> 


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 4/9] xen/blkfront: pseudo support for multi hardware queues/rings
  2015-09-05 12:39 ` [PATCH v3 4/9] xen/blkfront: pseudo support for multi hardware queues/rings Bob Liu
  2015-10-05 10:52   ` Roger Pau Monné
@ 2015-10-05 10:52   ` Roger Pau Monné
  1 sibling, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-05 10:52 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: hch, felipe.franciosi, rafal.mielniczuk, linux-kernel,
	jonathan.davies, axboe, david.vrabel, avanzini.arianna,
	boris.ostrovsky

El 05/09/15 a les 14.39, Bob Liu ha escrit:
> Prepare patch for multi hardware queues/rings, the ring number was set to 1 by
> force.
> 
> * Use 'nr_rings' in per dev_info to identify how many hw queues/rings are
>   supported, and a pointer *rinfo for all its rings.
> * Rename 'nr_ring_pages' => 'pages_per_ring' to distinguish from 'nr_rings'
>   better.
> 
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkfront.c |  513 +++++++++++++++++++++++++-----------------
>  1 file changed, 308 insertions(+), 205 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index bf416d5..bf45c99 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
>  module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
>  MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
>  
> -#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->nr_ring_pages)
> +#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->pages_per_ring)
>  #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
>  /*
>   * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
> @@ -157,9 +157,10 @@ struct blkfront_dev_info {
>  	unsigned int feature_persistent:1;
>  	unsigned int max_indirect_segments;
>  	int is_ready;
> -	unsigned int nr_ring_pages;
> +	unsigned int pages_per_ring;

Why do you rename this field? nr_ring_pages seems more consistent with
the nr_rings field that you add below IMO, but that might be a matter of
taste.

>  	struct blk_mq_tag_set tag_set;
> -	struct blkfront_ring_info rinfo;
> +	struct blkfront_ring_info *rinfo;
> +	unsigned int nr_rings;
>  };
>  
>  static unsigned int nr_minors;
> @@ -191,7 +192,7 @@ static DEFINE_SPINLOCK(minor_lock);
>  	((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
>  
>  static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
> -static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
> +static void __blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
>  
>  static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
>  {
> @@ -668,7 +669,7 @@ static int blk_mq_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
>  {
>  	struct blkfront_dev_info *dinfo = (struct blkfront_dev_info *)data;
>  
> -	hctx->driver_data = &dinfo->rinfo;
> +	hctx->driver_data = &dinfo->rinfo[index];
>  	return 0;
>  }
>  
> @@ -927,8 +928,8 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
>  
>  static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
>  {
> -	unsigned int minor, nr_minors;
> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
> +	unsigned int minor, nr_minors, i;
> +	struct blkfront_ring_info *rinfo;
>  
>  	if (dinfo->rq == NULL)
>  		return;
> @@ -936,11 +937,15 @@ static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
>  	/* No more blkif_request(). */
>  	blk_mq_stop_hw_queues(dinfo->rq);
>  
> -	/* No more gnttab callback work. */
> -	gnttab_cancel_free_callback(&rinfo->callback);
> +	for (i = 0; i < dinfo->nr_rings; i++) {

I would be tempted to declare rinfo only inside the for loop, to limit
the scope:

		struct blkfront_ring_info *rinfo = &dinfo->rinfo[i];

> +		rinfo = &dinfo->rinfo[i];
>  
> -	/* Flush gnttab callback work. Must be done with no locks held. */
> -	flush_work(&rinfo->work);
> +		/* No more gnttab callback work. */
> +		gnttab_cancel_free_callback(&rinfo->callback);
> +
> +		/* Flush gnttab callback work. Must be done with no locks held. */
> +		flush_work(&rinfo->work);
> +	}
>  
>  	del_gendisk(dinfo->gd);
>  
> @@ -977,8 +982,8 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
>  {
>  	struct grant *persistent_gnt;
>  	struct grant *n;
> -	int i, j, segs;
> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
> +	int i, j, segs, r_index;
> +	struct blkfront_ring_info *rinfo;
>  
>  	/* Prevent new requests being issued until we fix things up. */
>  	spin_lock_irq(&dinfo->io_lock);
> @@ -988,100 +993,103 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
>  	if (dinfo->rq)
>  		blk_mq_stop_hw_queues(dinfo->rq);
>  
> -	/* Remove all persistent grants */
> -	if (!list_empty(&rinfo->grants)) {
> -		list_for_each_entry_safe(persistent_gnt, n,
> -					 &rinfo->grants, node) {
> -			list_del(&persistent_gnt->node);
> -			if (persistent_gnt->gref != GRANT_INVALID_REF) {
> -				gnttab_end_foreign_access(persistent_gnt->gref,
> -				                          0, 0UL);
> -				rinfo->persistent_gnts_c--;
> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
> +		rinfo = &dinfo->rinfo[r_index];

struct blkfront_ring_info *rinfo = &dinfo->rinfo[r_index];

Would it be helpful to place all this code inside of a helper function,
ie: blkif_free_ring?

> +
> +		/* Remove all persistent grants */
> +		if (!list_empty(&rinfo->grants)) {
> +			list_for_each_entry_safe(persistent_gnt, n,
> +						 &rinfo->grants, node) {
> +				list_del(&persistent_gnt->node);
> +				if (persistent_gnt->gref != GRANT_INVALID_REF) {
> +					gnttab_end_foreign_access(persistent_gnt->gref,
> +								  0, 0UL);
> +					rinfo->persistent_gnts_c--;
> +				}
> +				if (dinfo->feature_persistent)
> +					__free_page(pfn_to_page(persistent_gnt->pfn));
> +				kfree(persistent_gnt);
>  			}
> -			if (dinfo->feature_persistent)
> -				__free_page(pfn_to_page(persistent_gnt->pfn));
> -			kfree(persistent_gnt);
>  		}
> -	}
> -	BUG_ON(rinfo->persistent_gnts_c != 0);
> +		BUG_ON(rinfo->persistent_gnts_c != 0);
>  
> -	/*
> -	 * Remove indirect pages, this only happens when using indirect
> -	 * descriptors but not persistent grants
> -	 */
> -	if (!list_empty(&rinfo->indirect_pages)) {
> -		struct page *indirect_page, *n;
> -
> -		BUG_ON(dinfo->feature_persistent);
> -		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
> -			list_del(&indirect_page->lru);
> -			__free_page(indirect_page);
> -		}
> -	}
> -
> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
>  		/*
> -		 * Clear persistent grants present in requests already
> -		 * on the shared ring
> +		 * Remove indirect pages, this only happens when using indirect
> +		 * descriptors but not persistent grants
>  		 */
> -		if (!rinfo->shadow[i].request)
> -			goto free_shadow;
> -
> -		segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
> -		       rinfo->shadow[i].req.u.indirect.nr_segments :
> -		       rinfo->shadow[i].req.u.rw.nr_segments;
> -		for (j = 0; j < segs; j++) {
> -			persistent_gnt = rinfo->shadow[i].grants_used[j];
> -			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
> -			if (dinfo->feature_persistent)
> -				__free_page(pfn_to_page(persistent_gnt->pfn));
> -			kfree(persistent_gnt);
> +		if (!list_empty(&rinfo->indirect_pages)) {
> +			struct page *indirect_page, *n;
> +
> +			BUG_ON(dinfo->feature_persistent);
> +			list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
> +				list_del(&indirect_page->lru);
> +				__free_page(indirect_page);
> +			}
>  		}
>  
> -		if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
>  			/*
> -			 * If this is not an indirect operation don't try to
> -			 * free indirect segments
> +			 * Clear persistent grants present in requests already
> +			 * on the shared ring
>  			 */
> -			goto free_shadow;
> +			if (!rinfo->shadow[i].request)
> +				goto free_shadow;
> +
> +			segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
> +			       rinfo->shadow[i].req.u.indirect.nr_segments :
> +			       rinfo->shadow[i].req.u.rw.nr_segments;
> +			for (j = 0; j < segs; j++) {
> +				persistent_gnt = rinfo->shadow[i].grants_used[j];
> +				gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
> +				if (dinfo->feature_persistent)
> +					__free_page(pfn_to_page(persistent_gnt->pfn));
> +				kfree(persistent_gnt);
> +			}
>  
> -		for (j = 0; j < INDIRECT_GREFS(segs); j++) {
> -			persistent_gnt = rinfo->shadow[i].indirect_grants[j];
> -			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
> -			__free_page(pfn_to_page(persistent_gnt->pfn));
> -			kfree(persistent_gnt);
> -		}
> +			if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
> +				/*
> +				 * If this is not an indirect operation don't try to
> +				 * free indirect segments
> +				 */
> +				goto free_shadow;
> +
> +			for (j = 0; j < INDIRECT_GREFS(segs); j++) {
> +				persistent_gnt = rinfo->shadow[i].indirect_grants[j];
> +				gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
> +				__free_page(pfn_to_page(persistent_gnt->pfn));
> +				kfree(persistent_gnt);
> +			}
>  
>  free_shadow:
> -		kfree(rinfo->shadow[i].grants_used);
> -		rinfo->shadow[i].grants_used = NULL;
> -		kfree(rinfo->shadow[i].indirect_grants);
> -		rinfo->shadow[i].indirect_grants = NULL;
> -		kfree(rinfo->shadow[i].sg);
> -		rinfo->shadow[i].sg = NULL;
> -	}
> +			kfree(rinfo->shadow[i].grants_used);
> +			rinfo->shadow[i].grants_used = NULL;
> +			kfree(rinfo->shadow[i].indirect_grants);
> +			rinfo->shadow[i].indirect_grants = NULL;
> +			kfree(rinfo->shadow[i].sg);
> +			rinfo->shadow[i].sg = NULL;
> +		}
>  
> -	/* No more gnttab callback work. */
> -	gnttab_cancel_free_callback(&rinfo->callback);
> -	spin_unlock_irq(&dinfo->io_lock);
> +		/* No more gnttab callback work. */
> +		gnttab_cancel_free_callback(&rinfo->callback);
> +		spin_unlock_irq(&dinfo->io_lock);
>  
> -	/* Flush gnttab callback work. Must be done with no locks held. */
> -	flush_work(&rinfo->work);
> +		/* Flush gnttab callback work. Must be done with no locks held. */
> +		flush_work(&rinfo->work);
>  
> -	/* Free resources associated with old device channel. */
> -	for (i = 0; i < dinfo->nr_ring_pages; i++) {
> -		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
> -			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
> -			rinfo->ring_ref[i] = GRANT_INVALID_REF;
> +		/* Free resources associated with old device channel. */
> +		for (i = 0; i < dinfo->pages_per_ring; i++) {
> +			if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
> +				gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
> +				rinfo->ring_ref[i] = GRANT_INVALID_REF;
> +			}
>  		}
> -	}
> -	free_pages((unsigned long)rinfo->ring.sring, get_order(dinfo->nr_ring_pages * PAGE_SIZE));
> -	rinfo->ring.sring = NULL;
> -
> -	if (rinfo->irq)
> -		unbind_from_irqhandler(rinfo->irq, rinfo);
> -	rinfo->evtchn = rinfo->irq = 0;
> +		free_pages((unsigned long)rinfo->ring.sring, get_order(dinfo->pages_per_ring * PAGE_SIZE));
> +		rinfo->ring.sring = NULL;
>  
> +		if (rinfo->irq)
> +			unbind_from_irqhandler(rinfo->irq, rinfo);
> +		rinfo->evtchn = rinfo->irq = 0;
> +	}
>  }
>  
>  static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *rinfo,
> @@ -1276,6 +1284,26 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  	return IRQ_HANDLED;
>  }
>  
> +static void destroy_blkring(struct xenbus_device *dev,
> +			    struct blkfront_ring_info *rinfo)
> +{
> +	int i;
> +
> +	if (rinfo->irq)
> +		unbind_from_irqhandler(rinfo->irq, rinfo);
> +	if (rinfo->evtchn)
> +		xenbus_free_evtchn(dev, rinfo->evtchn);
> +
> +	for (i = 0; i < rinfo->dinfo->pages_per_ring; i++) {
> +		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
> +			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
> +			rinfo->ring_ref[i] = GRANT_INVALID_REF;
> +		}
> +	}
> +	free_pages((unsigned long)rinfo->ring.sring,
> +		   get_order(rinfo->dinfo->pages_per_ring * PAGE_SIZE));
> +	rinfo->ring.sring = NULL;
> +}
>  
>  static int setup_blkring(struct xenbus_device *dev,
>  			 struct blkfront_ring_info *rinfo)
> @@ -1283,10 +1311,10 @@ static int setup_blkring(struct xenbus_device *dev,
>  	struct blkif_sring *sring;
>  	int err, i;
>  	struct blkfront_dev_info *dinfo = rinfo->dinfo;
> -	unsigned long ring_size = dinfo->nr_ring_pages * PAGE_SIZE;
> +	unsigned long ring_size = dinfo->pages_per_ring * PAGE_SIZE;
>  	grant_ref_t gref[XENBUS_MAX_RING_PAGES];
>  
> -	for (i = 0; i < dinfo->nr_ring_pages; i++)
> +	for (i = 0; i < dinfo->pages_per_ring; i++)
>  		rinfo->ring_ref[i] = GRANT_INVALID_REF;
>  
>  	sring = (struct blkif_sring *)__get_free_pages(GFP_NOIO | __GFP_HIGH,
> @@ -1298,13 +1326,13 @@ static int setup_blkring(struct xenbus_device *dev,
>  	SHARED_RING_INIT(sring);
>  	FRONT_RING_INIT(&rinfo->ring, sring, ring_size);
>  
> -	err = xenbus_grant_ring(dev, rinfo->ring.sring, dinfo->nr_ring_pages, gref);
> +	err = xenbus_grant_ring(dev, rinfo->ring.sring, dinfo->pages_per_ring, gref);
>  	if (err < 0) {
>  		free_pages((unsigned long)sring, get_order(ring_size));
>  		rinfo->ring.sring = NULL;
>  		goto fail;
>  	}
> -	for (i = 0; i < dinfo->nr_ring_pages; i++)
> +	for (i = 0; i < dinfo->pages_per_ring; i++)
>  		rinfo->ring_ref[i] = gref[i];
>  
>  	err = xenbus_alloc_evtchn(dev, &rinfo->evtchn);
> @@ -1322,7 +1350,7 @@ static int setup_blkring(struct xenbus_device *dev,
>  
>  	return 0;
>  fail:
> -	blkif_free(dinfo, 0);
> +	destroy_blkring(dev, rinfo);

blkif_free used to clean a lot more than what destroy_blkring does, is
this right?

>  	return err;
>  }
>  
> @@ -1333,65 +1361,76 @@ static int talk_to_blkback(struct xenbus_device *dev,
>  {
>  	const char *message = NULL;
>  	struct xenbus_transaction xbt;
> -	int err, i;
> +	int err, i, r_index;
>  	unsigned int max_page_order = 0;
>  	unsigned int ring_page_order = 0;
> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
> +	struct blkfront_ring_info *rinfo;
>  
>  	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
>  			   "max-ring-page-order", "%u", &max_page_order);
>  	if (err != 1)
> -		dinfo->nr_ring_pages = 1;
> +		dinfo->pages_per_ring = 1;
>  	else {
>  		ring_page_order = min(xen_blkif_max_ring_order, max_page_order);
> -		dinfo->nr_ring_pages = 1 << ring_page_order;
> +		dinfo->pages_per_ring = 1 << ring_page_order;

As said above, I think nr_ring_pages is perfectly fine, and avoids all
these pointless changes.

>  	}
>  
> -	/* Create shared ring, alloc event channel. */
> -	err = setup_blkring(dev, rinfo);
> -	if (err)
> -		goto out;
> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
> +		rinfo = &dinfo->rinfo[r_index];
> +		/* Create shared ring, alloc event channel. */
> +		err = setup_blkring(dev, rinfo);
> +		if (err)
> +			goto out;
> +	}
>  
>  again:
>  	err = xenbus_transaction_start(&xbt);
>  	if (err) {
>  		xenbus_dev_fatal(dev, err, "starting transaction");
> -		goto destroy_blkring;
> +		goto out;
>  	}
>  
> -	if (dinfo->nr_ring_pages == 1) {
> -		err = xenbus_printf(xbt, dev->nodename,
> -				    "ring-ref", "%u", rinfo->ring_ref[0]);
> -		if (err) {
> -			message = "writing ring-ref";
> -			goto abort_transaction;
> -		}
> -	} else {
> -		err = xenbus_printf(xbt, dev->nodename,
> -				    "ring-page-order", "%u", ring_page_order);
> -		if (err) {
> -			message = "writing ring-page-order";
> -			goto abort_transaction;
> -		}
> -
> -		for (i = 0; i < dinfo->nr_ring_pages; i++) {
> -			char ring_ref_name[RINGREF_NAME_LEN];
> +	if (dinfo->nr_rings == 1) {
> +		rinfo = &dinfo->rinfo[0];
>  
> -			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
> -			err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
> -					    "%u", rinfo->ring_ref[i]);
> +		if (dinfo->pages_per_ring == 1) {
> +			err = xenbus_printf(xbt, dev->nodename,
> +					    "ring-ref", "%u", rinfo->ring_ref[0]);
>  			if (err) {
>  				message = "writing ring-ref";
>  				goto abort_transaction;
>  			}
> +		} else {
> +			err = xenbus_printf(xbt, dev->nodename,
> +					    "ring-page-order", "%u", ring_page_order);
> +			if (err) {
> +				message = "writing ring-page-order";
> +				goto abort_transaction;
> +			}
> +
> +			for (i = 0; i < dinfo->pages_per_ring; i++) {
> +				char ring_ref_name[RINGREF_NAME_LEN];
> +
> +				snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
> +				err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
> +						    "%u", rinfo->ring_ref[i]);
> +				if (err) {
> +					message = "writing ring-ref";
> +					goto abort_transaction;
> +				}
> +			}
>  		}
> -	}
> -	err = xenbus_printf(xbt, dev->nodename,
> -			    "event-channel", "%u", rinfo->evtchn);
> -	if (err) {
> -		message = "writing event-channel";
> +		err = xenbus_printf(xbt, dev->nodename,
> +				    "event-channel", "%u", rinfo->evtchn);
> +		if (err) {
> +			message = "writing event-channel";
> +			goto abort_transaction;
> +		}
> +	} else {
> +		/* Not supported at this stage */
>  		goto abort_transaction;
>  	}
> +
>  	err = xenbus_printf(xbt, dev->nodename, "protocol", "%s",
>  			    XEN_IO_PROTO_ABI_NATIVE);
>  	if (err) {
> @@ -1409,12 +1448,16 @@ again:
>  		if (err == -EAGAIN)
>  			goto again;
>  		xenbus_dev_fatal(dev, err, "completing transaction");
> -		goto destroy_blkring;
> +		goto out;
>  	}
>  
> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
> -		rinfo->shadow[i].req.u.rw.id = i+1;
> -	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
> +		rinfo = &dinfo->rinfo[r_index];
> +
> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
> +			rinfo->shadow[i].req.u.rw.id = i+1;
> +		rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
> +	}
>  	xenbus_switch_state(dev, XenbusStateInitialised);
>  
>  	return 0;
> @@ -1423,9 +1466,9 @@ again:
>  	xenbus_transaction_end(xbt, 1);
>  	if (message)
>  		xenbus_dev_fatal(dev, err, "%s", message);
> - destroy_blkring:
> -	blkif_free(dinfo, 0);
>   out:
> +	while (--r_index >= 0)
> +		destroy_blkring(dev, &dinfo->rinfo[r_index]);

Same as above: destroy_blkring does different cleanup from what used to
be done in blkif_free.

>  	return err;
>  }
>  
> @@ -1438,7 +1481,7 @@ again:
>  static int blkfront_probe(struct xenbus_device *dev,
>  			  const struct xenbus_device_id *id)
>  {
> -	int err, vdevice;
> +	int err, vdevice, r_index;
>  	struct blkfront_dev_info *dinfo;
>  	struct blkfront_ring_info *rinfo;
>  
> @@ -1490,17 +1533,29 @@ static int blkfront_probe(struct xenbus_device *dev,
>  		return -ENOMEM;
>  	}
>  
> -	rinfo = &dinfo->rinfo;
>  	mutex_init(&dinfo->mutex);
>  	spin_lock_init(&dinfo->io_lock);
>  	dinfo->xbdev = dev;
>  	dinfo->vdevice = vdevice;
> -	INIT_LIST_HEAD(&rinfo->grants);
> -	INIT_LIST_HEAD(&rinfo->indirect_pages);
> -	rinfo->persistent_gnts_c = 0;
>  	dinfo->connected = BLKIF_STATE_DISCONNECTED;
> -	rinfo->dinfo = dinfo;
> -	INIT_WORK(&rinfo->work, blkif_restart_queue);
> +
> +	dinfo->nr_rings = 1;
> +	dinfo->rinfo = kzalloc(sizeof(*rinfo) * dinfo->nr_rings, GFP_KERNEL);
> +	if (!dinfo->rinfo) {
> +		xenbus_dev_fatal(dev, -ENOMEM, "allocating ring_info structure");
> +		kfree(dinfo);
> +		return -ENOMEM;
> +	}
> +
> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
> +		rinfo = &dinfo->rinfo[r_index];
> +
> +		INIT_LIST_HEAD(&rinfo->grants);
> +		INIT_LIST_HEAD(&rinfo->indirect_pages);
> +		rinfo->persistent_gnts_c = 0;
> +		rinfo->dinfo = dinfo;
> +		INIT_WORK(&rinfo->work, blkif_restart_queue);
> +	}
>  
>  	/* Front end dir is a number, which is used as the id. */
>  	dinfo->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
> @@ -1526,7 +1581,7 @@ static void split_bio_end(struct bio *bio, int error)
>  
>  static int blkif_recover(struct blkfront_dev_info *dinfo)
>  {
> -	int i;
> +	int i, r_index;
>  	struct request *req, *n;
>  	struct blk_shadow *copy;
>  	int rc;
> @@ -1536,56 +1591,62 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
>  	int pending, size;
>  	struct split_bio *split_bio;
>  	struct list_head requests;
> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
> -
> -	/* Stage 1: Make a safe copy of the shadow state. */
> -	copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
> -		       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
> -	if (!copy)
> -		return -ENOMEM;
> -
> -	/* Stage 2: Set up free list. */
> -	memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
> -		rinfo->shadow[i].req.u.rw.id = i+1;
> -	rinfo->shadow_free = rinfo->ring.req_prod_pvt;
> -	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
> -
> -	rc = blkfront_gather_backend_features(dinfo);
> -	if (rc) {
> -		kfree(copy);
> -		return rc;
> -	}
> +	struct blkfront_ring_info *rinfo;
>  
> +	__blkfront_gather_backend_features(dinfo);
>  	segs = dinfo->max_indirect_segments ? : BLKIF_MAX_SEGMENTS_PER_REQUEST;
>  	blk_queue_max_segments(dinfo->rq, segs);
>  	bio_list_init(&bio_list);
>  	INIT_LIST_HEAD(&requests);
> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
> -		/* Not in use? */
> -		if (!copy[i].request)
> -			continue;
>  
> -		/*
> -		 * Get the bios in the request so we can re-queue them.
> -		 */
> -		if (copy[i].request->cmd_flags &
> -		    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
> +		rinfo = &dinfo->rinfo[r_index];
> +
> +		/* Stage 1: Make a safe copy of the shadow state. */
> +		copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
> +			       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
> +		if (!copy)
> +			return -ENOMEM;
> +
> +		/* Stage 2: Set up free list. */
> +		memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
> +			rinfo->shadow[i].req.u.rw.id = i+1;
> +		rinfo->shadow_free = rinfo->ring.req_prod_pvt;
> +		rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
> +
> +		rc = blkfront_setup_indirect(rinfo);
> +		if (rc) {
> +			kfree(copy);
> +			return rc;
> +		}
> +
> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
> +			/* Not in use? */
> +			if (!copy[i].request)
> +				continue;
> +
>  			/*
> -			 * Flush operations don't contain bios, so
> -			 * we need to requeue the whole request
> +			 * Get the bios in the request so we can re-queue them.
>  			 */
> -			list_add(&copy[i].request->queuelist, &requests);
> -			continue;
> +			if (copy[i].request->cmd_flags &
> +			    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
> +				/*
> +				 * Flush operations don't contain bios, so
> +				 * we need to requeue the whole request
> +				 */
> +				list_add(&copy[i].request->queuelist, &requests);
> +				continue;
> +			}
> +			merge_bio.head = copy[i].request->bio;
> +			merge_bio.tail = copy[i].request->biotail;
> +			bio_list_merge(&bio_list, &merge_bio);
> +			copy[i].request->bio = NULL;
> +			blk_end_request_all(copy[i].request, 0);
>  		}
> -		merge_bio.head = copy[i].request->bio;
> -		merge_bio.tail = copy[i].request->biotail;
> -		bio_list_merge(&bio_list, &merge_bio);
> -		copy[i].request->bio = NULL;
> -		blk_end_request_all(copy[i].request, 0);
> -	}
>  
> -	kfree(copy);
> +		kfree(copy);
> +	}
>  
>  	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
>  
> @@ -1594,8 +1655,12 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
>  	/* Now safe for us to use the shared ring */
>  	dinfo->connected = BLKIF_STATE_CONNECTED;
>  
> -	/* Kick any other new requests queued since we resumed */
> -	kick_pending_request_queues(rinfo);
> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
> +		rinfo = &dinfo->rinfo[r_index];
> +
> +		/* Kick any other new requests queued since we resumed */
> +		kick_pending_request_queues(rinfo);
> +	}
>  
>  	list_for_each_entry_safe(req, n, &requests, queuelist) {
>  		/* Requeue pending requests (flush or discard) */
> @@ -1729,6 +1794,38 @@ static void blkfront_setup_discard(struct blkfront_dev_info *dinfo)
>  		dinfo->feature_secdiscard = !!discard_secure;
>  }
>  
> +static void blkfront_clean_ring(struct blkfront_ring_info *rinfo)
> +{
> +	int i;
> +
> +	for (i = 0; i < BLK_RING_SIZE(rinfo->dinfo); i++) {
> +		kfree(rinfo->shadow[i].grants_used);
> +		rinfo->shadow[i].grants_used = NULL;
> +		kfree(rinfo->shadow[i].sg);
> +		rinfo->shadow[i].sg = NULL;
> +		kfree(rinfo->shadow[i].indirect_grants);
> +		rinfo->shadow[i].indirect_grants = NULL;
> +	}
> +	if (!list_empty(&rinfo->indirect_pages)) {
> +		struct page *indirect_page, *n;
> +		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
> +			list_del(&indirect_page->lru);
> +			__free_page(indirect_page);
> +		}
> +	}
> +
> +	if (!list_empty(&rinfo->grants)) {
> +		struct grant *gnt_list_entry, *n;
> +		list_for_each_entry_safe(gnt_list_entry, n,
> +				&rinfo->grants, node) {
> +			list_del(&gnt_list_entry->node);
> +			if (rinfo->dinfo->feature_persistent)
> +				__free_page(pfn_to_page(gnt_list_entry->pfn));
> +			kfree(gnt_list_entry);
> +		}
> +	}
> +}
> +
>  static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
>  {
>  	unsigned int segs;
> @@ -1783,28 +1880,14 @@ static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
>  	return 0;
>  
>  out_of_memory:
> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
> -		kfree(rinfo->shadow[i].grants_used);
> -		rinfo->shadow[i].grants_used = NULL;
> -		kfree(rinfo->shadow[i].sg);
> -		rinfo->shadow[i].sg = NULL;
> -		kfree(rinfo->shadow[i].indirect_grants);
> -		rinfo->shadow[i].indirect_grants = NULL;
> -	}
> -	if (!list_empty(&rinfo->indirect_pages)) {
> -		struct page *indirect_page, *n;
> -		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
> -			list_del(&indirect_page->lru);
> -			__free_page(indirect_page);
> -		}
> -	}
> +	blkfront_clean_ring(rinfo);
>  	return -ENOMEM;
>  }
>  
>  /*
>   * Gather all backend feature-*
>   */
> -static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
> +static void __blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
>  {
>  	int err;
>  	int barrier, flush, discard, persistent;
> @@ -1859,8 +1942,25 @@ static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
>  	else
>  		dinfo->max_indirect_segments = min(indirect_segments,
>  						  xen_blkif_max_segments);
> +}
> +
> +static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
> +{
> +	int err, i;
> +
> +	__blkfront_gather_backend_features(dinfo);

IMHO, there's no need to introduce __blkfront_gather_backend_features;
just add the chunk below to the existing blkfront_gather_backend_features.
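
Just to show what I mean, something along these lines (untested sketch,
leaving the existing feature-* parsing where it is):

	static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
	{
		int err, i;

		/* ... existing feature-barrier/flush/discard/persistent parsing ... */

		for (i = 0; i < dinfo->nr_rings; i++) {
			err = blkfront_setup_indirect(&dinfo->rinfo[i]);
			if (err)
				goto out;
		}
		return 0;

	 out:
		while (--i >= 0)
			blkfront_clean_ring(&dinfo->rinfo[i]);
		return err;
	}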

> -	return blkfront_setup_indirect(&dinfo->rinfo);
> +	for (i = 0; i < dinfo->nr_rings; i++) {
> +		err = blkfront_setup_indirect(&dinfo->rinfo[i]);
> +		if (err)
> +			goto out;
> +	}
> +	return 0;
> +
> +out:
> +	while (--i >= 0)
> +		blkfront_clean_ring(&dinfo->rinfo[i]);
> +	return err;
>  }
>  
>  /*
> @@ -1873,8 +1973,8 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
>  	unsigned long sector_size;
>  	unsigned int physical_sector_size;
>  	unsigned int binfo;
> -	int err;
> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
> +	int err, i;
> +	struct blkfront_ring_info *rinfo;
>  
>  	switch (dinfo->connected) {
>  	case BLKIF_STATE_CONNECTED:
> @@ -1951,7 +2051,10 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
>  	/* Kick pending requests. */
>  	spin_lock_irq(&dinfo->io_lock);
>  	dinfo->connected = BLKIF_STATE_CONNECTED;
> -	kick_pending_request_queues(rinfo);
> +	for (i = 0; i < dinfo->nr_rings; i++) {
> +		rinfo = &dinfo->rinfo[i];

If rinfo is only going to be used in the for loop, you can declare it inside:

		struct blkfront_ring_info *rinfo = &dinfo->rinfo[i];

> +		kick_pending_request_queues(rinfo);
> +	}
>  	spin_unlock_irq(&dinfo->io_lock);
>  
>  	add_disk(dinfo->gd);
> 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 5/9] xen/blkfront: convert per device io_lock to per ring ring_lock
  2015-09-05 12:39   ` Bob Liu
  (?)
  (?)
@ 2015-10-05 14:13   ` Roger Pau Monné
  2015-10-07 10:34     ` Bob Liu
  2015-10-07 10:34     ` Bob Liu
  -1 siblings, 2 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-05 14:13 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: david.vrabel, linux-kernel, konrad.wilk, felipe.franciosi, axboe,
	hch, avanzini.arianna, rafal.mielniczuk, boris.ostrovsky,
	jonathan.davies

On 05/09/15 at 14:39, Bob Liu wrote:
> The per device io_lock became a coarser grained lock after multi-queues/rings
> was introduced, this patch converts it to a fine-grained per ring lock.
> 
> NOTE: The per dev_info structure was no more protected by any lock.

I would rewrite this as:

Note that the per-device blkfront_dev_info structure is no longer
protected by any lock.

> 
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkfront.c |   44 +++++++++++++++++++-----------------------
>  1 file changed, 20 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index bf45c99..1cae76b 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -123,6 +123,7 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
>  struct blkfront_ring_info
>  {
>  	struct blkif_front_ring ring;
> +	spinlock_t ring_lock;
>  	unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
>  	unsigned int evtchn, irq;
>  	struct work_struct work;
> @@ -141,7 +142,6 @@ struct blkfront_ring_info
>   * putting all kinds of interesting stuff here :-)
>   */
>  struct blkfront_dev_info {
> -	spinlock_t io_lock;
>  	struct mutex mutex;
>  	struct xenbus_device *xbdev;
>  	struct gendisk *gd;
> @@ -637,29 +637,28 @@ static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
>  			   const struct blk_mq_queue_data *qd)
>  {
>  	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)hctx->driver_data;
> -	struct blkfront_dev_info *dinfo = rinfo->dinfo;
>  
>  	blk_mq_start_request(qd->rq);
> -	spin_lock_irq(&dinfo->io_lock);
> +	spin_lock_irq(&rinfo->ring_lock);
>  	if (RING_FULL(&rinfo->ring))
>  		goto out_busy;
>  
> -	if (blkif_request_flush_invalid(qd->rq, dinfo))
> +	if (blkif_request_flush_invalid(qd->rq, rinfo->dinfo))
>  		goto out_err;
>  
>  	if (blkif_queue_request(qd->rq, rinfo))
>  		goto out_busy;
>  
>  	flush_requests(rinfo);
> -	spin_unlock_irq(&dinfo->io_lock);
> +	spin_unlock_irq(&rinfo->ring_lock);
>  	return BLK_MQ_RQ_QUEUE_OK;
>  
>  out_err:
> -	spin_unlock_irq(&dinfo->io_lock);
> +	spin_unlock_irq(&rinfo->ring_lock);
>  	return BLK_MQ_RQ_QUEUE_ERROR;
>  
>  out_busy:
> -	spin_unlock_irq(&dinfo->io_lock);
> +	spin_unlock_irq(&rinfo->ring_lock);
>  	blk_mq_stop_hw_queue(hctx);
>  	return BLK_MQ_RQ_QUEUE_BUSY;
>  }
> @@ -961,7 +960,7 @@ static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
>  	dinfo->gd = NULL;
>  }
>  
> -/* Must be called with io_lock holded */
> +/* Must be called with ring_lock holded */
                                    ^ held.
>  static void kick_pending_request_queues(struct blkfront_ring_info *rinfo)
>  {
>  	if (!RING_FULL(&rinfo->ring))
> @@ -972,10 +971,10 @@ static void blkif_restart_queue(struct work_struct *work)
>  {
>  	struct blkfront_ring_info *rinfo = container_of(work, struct blkfront_ring_info, work);
>  
> -	spin_lock_irq(&rinfo->dinfo->io_lock);
> +	spin_lock_irq(&rinfo->ring_lock);
>  	if (rinfo->dinfo->connected == BLKIF_STATE_CONNECTED)
>  		kick_pending_request_queues(rinfo);

This seems wrong: why are you acquiring a per-ring lock in order to
check a per-device field? IMHO, you need to either introduce a
per-device lock or drop the locking around this chunk if it's really not
needed.
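
If the connected check really doesn't need to be serialised against
blkif_free(), the second option would look something like this (untested
sketch):

	static void blkif_restart_queue(struct work_struct *work)
	{
		struct blkfront_ring_info *rinfo = container_of(work, struct blkfront_ring_info, work);

		if (rinfo->dinfo->connected != BLKIF_STATE_CONNECTED)
			return;

		spin_lock_irq(&rinfo->ring_lock);
		kick_pending_request_queues(rinfo);
		spin_unlock_irq(&rinfo->ring_lock);
	}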

> -	spin_unlock_irq(&rinfo->dinfo->io_lock);
> +	spin_unlock_irq(&rinfo->ring_lock);
>  }
>  
>  static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
> @@ -986,7 +985,6 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
>  	struct blkfront_ring_info *rinfo;
>  
>  	/* Prevent new requests being issued until we fix things up. */
> -	spin_lock_irq(&dinfo->io_lock);
>  	dinfo->connected = suspend ?
>  		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
>  	/* No more blkif_request(). */
> @@ -996,6 +994,7 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
>  	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>  		rinfo = &dinfo->rinfo[r_index];
>  
> +		spin_lock_irq(&rinfo->ring_lock);
>  		/* Remove all persistent grants */
>  		if (!list_empty(&rinfo->grants)) {
>  			list_for_each_entry_safe(persistent_gnt, n,
> @@ -1071,7 +1070,7 @@ free_shadow:
>  
>  		/* No more gnttab callback work. */
>  		gnttab_cancel_free_callback(&rinfo->callback);
> -		spin_unlock_irq(&dinfo->io_lock);
> +		spin_unlock_irq(&rinfo->ring_lock);
>  
>  		/* Flush gnttab callback work. Must be done with no locks held. */
>  		flush_work(&rinfo->work);
> @@ -1180,13 +1179,10 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
>  	struct blkfront_dev_info *dinfo = rinfo->dinfo;
>  
> -	spin_lock_irqsave(&dinfo->io_lock, flags);
> -
> -	if (unlikely(dinfo->connected != BLKIF_STATE_CONNECTED)) {
> -		spin_unlock_irqrestore(&dinfo->io_lock, flags);
> +	if (unlikely(dinfo->connected != BLKIF_STATE_CONNECTED))
>  		return IRQ_HANDLED;
> -	}
>  
> +	spin_lock_irqsave(&rinfo->ring_lock, flags);
>   again:
>  	rp = rinfo->ring.sring->rsp_prod;
>  	rmb(); /* Ensure we see queued responses up to 'rp'. */
> @@ -1279,7 +1275,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  
>  	kick_pending_request_queues(rinfo);
>  
> -	spin_unlock_irqrestore(&dinfo->io_lock, flags);
> +	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
>  
>  	return IRQ_HANDLED;
>  }
> @@ -1534,7 +1530,6 @@ static int blkfront_probe(struct xenbus_device *dev,
>  	}
>  
>  	mutex_init(&dinfo->mutex);
> -	spin_lock_init(&dinfo->io_lock);
>  	dinfo->xbdev = dev;
>  	dinfo->vdevice = vdevice;
>  	dinfo->connected = BLKIF_STATE_DISCONNECTED;
> @@ -1550,6 +1545,7 @@ static int blkfront_probe(struct xenbus_device *dev,
>  	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>  		rinfo = &dinfo->rinfo[r_index];
>  
> +		spin_lock_init(&rinfo->ring_lock);
>  		INIT_LIST_HEAD(&rinfo->grants);
>  		INIT_LIST_HEAD(&rinfo->indirect_pages);
>  		rinfo->persistent_gnts_c = 0;
> @@ -1650,16 +1646,16 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
>  
>  	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
>  
> -	spin_lock_irq(&dinfo->io_lock);
> -
>  	/* Now safe for us to use the shared ring */
>  	dinfo->connected = BLKIF_STATE_CONNECTED;
>  
>  	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>  		rinfo = &dinfo->rinfo[r_index];
>  
> +		spin_lock_irq(&rinfo->ring_lock);
>  		/* Kick any other new requests queued since we resumed */
>  		kick_pending_request_queues(rinfo);
> +		spin_unlock_irq(&rinfo->ring_lock);
>  	}
>  
>  	list_for_each_entry_safe(req, n, &requests, queuelist) {
> @@ -1668,7 +1664,6 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
>  		BUG_ON(req->nr_phys_segments > segs);
>  		blk_mq_requeue_request(req);
>  	}
> -	spin_unlock_irq(&dinfo->io_lock);
>  	blk_mq_kick_requeue_list(dinfo->rq);
>  
>  	while ((bio = bio_list_pop(&bio_list)) != NULL) {
> @@ -2049,13 +2044,14 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
>  	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
>  
>  	/* Kick pending requests. */
> -	spin_lock_irq(&dinfo->io_lock);
>  	dinfo->connected = BLKIF_STATE_CONNECTED;
>  	for (i = 0; i < dinfo->nr_rings; i++) {
>  		rinfo = &dinfo->rinfo[i];
> +
> +		spin_lock_irq(&rinfo->ring_lock);
>  		kick_pending_request_queues(rinfo);
> +		spin_unlock_irq(&rinfo->ring_lock);
>  	}
> -	spin_unlock_irq(&dinfo->io_lock);
>  
>  	add_disk(dinfo->gd);
>  
> 


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 5/9] xen/blkfront: convert per device io_lock to per ring ring_lock
  2015-09-05 12:39   ` Bob Liu
  (?)
@ 2015-10-05 14:13   ` Roger Pau Monné
  -1 siblings, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-05 14:13 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: hch, felipe.franciosi, rafal.mielniczuk, linux-kernel,
	jonathan.davies, axboe, david.vrabel, avanzini.arianna,
	boris.ostrovsky

On 05/09/15 at 14:39, Bob Liu wrote:
> The per device io_lock became a coarser grained lock after multi-queues/rings
> was introduced, this patch converts it to a fine-grained per ring lock.
> 
> NOTE: The per dev_info structure was no more protected by any lock.

I would rewrite this as:

Note that the per-device blkfront_dev_info structure is no longer
protected by any lock.

> 
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkfront.c |   44 +++++++++++++++++++-----------------------
>  1 file changed, 20 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index bf45c99..1cae76b 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -123,6 +123,7 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
>  struct blkfront_ring_info
>  {
>  	struct blkif_front_ring ring;
> +	spinlock_t ring_lock;
>  	unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
>  	unsigned int evtchn, irq;
>  	struct work_struct work;
> @@ -141,7 +142,6 @@ struct blkfront_ring_info
>   * putting all kinds of interesting stuff here :-)
>   */
>  struct blkfront_dev_info {
> -	spinlock_t io_lock;
>  	struct mutex mutex;
>  	struct xenbus_device *xbdev;
>  	struct gendisk *gd;
> @@ -637,29 +637,28 @@ static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
>  			   const struct blk_mq_queue_data *qd)
>  {
>  	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)hctx->driver_data;
> -	struct blkfront_dev_info *dinfo = rinfo->dinfo;
>  
>  	blk_mq_start_request(qd->rq);
> -	spin_lock_irq(&dinfo->io_lock);
> +	spin_lock_irq(&rinfo->ring_lock);
>  	if (RING_FULL(&rinfo->ring))
>  		goto out_busy;
>  
> -	if (blkif_request_flush_invalid(qd->rq, dinfo))
> +	if (blkif_request_flush_invalid(qd->rq, rinfo->dinfo))
>  		goto out_err;
>  
>  	if (blkif_queue_request(qd->rq, rinfo))
>  		goto out_busy;
>  
>  	flush_requests(rinfo);
> -	spin_unlock_irq(&dinfo->io_lock);
> +	spin_unlock_irq(&rinfo->ring_lock);
>  	return BLK_MQ_RQ_QUEUE_OK;
>  
>  out_err:
> -	spin_unlock_irq(&dinfo->io_lock);
> +	spin_unlock_irq(&rinfo->ring_lock);
>  	return BLK_MQ_RQ_QUEUE_ERROR;
>  
>  out_busy:
> -	spin_unlock_irq(&dinfo->io_lock);
> +	spin_unlock_irq(&rinfo->ring_lock);
>  	blk_mq_stop_hw_queue(hctx);
>  	return BLK_MQ_RQ_QUEUE_BUSY;
>  }
> @@ -961,7 +960,7 @@ static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
>  	dinfo->gd = NULL;
>  }
>  
> -/* Must be called with io_lock holded */
> +/* Must be called with ring_lock holded */
                                    ^ held.
>  static void kick_pending_request_queues(struct blkfront_ring_info *rinfo)
>  {
>  	if (!RING_FULL(&rinfo->ring))
> @@ -972,10 +971,10 @@ static void blkif_restart_queue(struct work_struct *work)
>  {
>  	struct blkfront_ring_info *rinfo = container_of(work, struct blkfront_ring_info, work);
>  
> -	spin_lock_irq(&rinfo->dinfo->io_lock);
> +	spin_lock_irq(&rinfo->ring_lock);
>  	if (rinfo->dinfo->connected == BLKIF_STATE_CONNECTED)
>  		kick_pending_request_queues(rinfo);

This seems wrong: why are you acquiring a per-ring lock in order to
check a per-device field? IMHO, you need to either introduce a
per-device lock or drop the locking around this chunk if it's really not
needed.

> -	spin_unlock_irq(&rinfo->dinfo->io_lock);
> +	spin_unlock_irq(&rinfo->ring_lock);
>  }
>  
>  static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
> @@ -986,7 +985,6 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
>  	struct blkfront_ring_info *rinfo;
>  
>  	/* Prevent new requests being issued until we fix things up. */
> -	spin_lock_irq(&dinfo->io_lock);
>  	dinfo->connected = suspend ?
>  		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
>  	/* No more blkif_request(). */
> @@ -996,6 +994,7 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
>  	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>  		rinfo = &dinfo->rinfo[r_index];
>  
> +		spin_lock_irq(&rinfo->ring_lock);
>  		/* Remove all persistent grants */
>  		if (!list_empty(&rinfo->grants)) {
>  			list_for_each_entry_safe(persistent_gnt, n,
> @@ -1071,7 +1070,7 @@ free_shadow:
>  
>  		/* No more gnttab callback work. */
>  		gnttab_cancel_free_callback(&rinfo->callback);
> -		spin_unlock_irq(&dinfo->io_lock);
> +		spin_unlock_irq(&rinfo->ring_lock);
>  
>  		/* Flush gnttab callback work. Must be done with no locks held. */
>  		flush_work(&rinfo->work);
> @@ -1180,13 +1179,10 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
>  	struct blkfront_dev_info *dinfo = rinfo->dinfo;
>  
> -	spin_lock_irqsave(&dinfo->io_lock, flags);
> -
> -	if (unlikely(dinfo->connected != BLKIF_STATE_CONNECTED)) {
> -		spin_unlock_irqrestore(&dinfo->io_lock, flags);
> +	if (unlikely(dinfo->connected != BLKIF_STATE_CONNECTED))
>  		return IRQ_HANDLED;
> -	}
>  
> +	spin_lock_irqsave(&rinfo->ring_lock, flags);
>   again:
>  	rp = rinfo->ring.sring->rsp_prod;
>  	rmb(); /* Ensure we see queued responses up to 'rp'. */
> @@ -1279,7 +1275,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  
>  	kick_pending_request_queues(rinfo);
>  
> -	spin_unlock_irqrestore(&dinfo->io_lock, flags);
> +	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
>  
>  	return IRQ_HANDLED;
>  }
> @@ -1534,7 +1530,6 @@ static int blkfront_probe(struct xenbus_device *dev,
>  	}
>  
>  	mutex_init(&dinfo->mutex);
> -	spin_lock_init(&dinfo->io_lock);
>  	dinfo->xbdev = dev;
>  	dinfo->vdevice = vdevice;
>  	dinfo->connected = BLKIF_STATE_DISCONNECTED;
> @@ -1550,6 +1545,7 @@ static int blkfront_probe(struct xenbus_device *dev,
>  	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>  		rinfo = &dinfo->rinfo[r_index];
>  
> +		spin_lock_init(&rinfo->ring_lock);
>  		INIT_LIST_HEAD(&rinfo->grants);
>  		INIT_LIST_HEAD(&rinfo->indirect_pages);
>  		rinfo->persistent_gnts_c = 0;
> @@ -1650,16 +1646,16 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
>  
>  	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
>  
> -	spin_lock_irq(&dinfo->io_lock);
> -
>  	/* Now safe for us to use the shared ring */
>  	dinfo->connected = BLKIF_STATE_CONNECTED;
>  
>  	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>  		rinfo = &dinfo->rinfo[r_index];
>  
> +		spin_lock_irq(&rinfo->ring_lock);
>  		/* Kick any other new requests queued since we resumed */
>  		kick_pending_request_queues(rinfo);
> +		spin_unlock_irq(&rinfo->ring_lock);
>  	}
>  
>  	list_for_each_entry_safe(req, n, &requests, queuelist) {
> @@ -1668,7 +1664,6 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
>  		BUG_ON(req->nr_phys_segments > segs);
>  		blk_mq_requeue_request(req);
>  	}
> -	spin_unlock_irq(&dinfo->io_lock);
>  	blk_mq_kick_requeue_list(dinfo->rq);
>  
>  	while ((bio = bio_list_pop(&bio_list)) != NULL) {
> @@ -2049,13 +2044,14 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
>  	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
>  
>  	/* Kick pending requests. */
> -	spin_lock_irq(&dinfo->io_lock);
>  	dinfo->connected = BLKIF_STATE_CONNECTED;
>  	for (i = 0; i < dinfo->nr_rings; i++) {
>  		rinfo = &dinfo->rinfo[i];
> +
> +		spin_lock_irq(&rinfo->ring_lock);
>  		kick_pending_request_queues(rinfo);
> +		spin_unlock_irq(&rinfo->ring_lock);
>  	}
> -	spin_unlock_irq(&dinfo->io_lock);
>  
>  	add_disk(dinfo->gd);
>  
> 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 6/9] xen/blkfront: negotiate the number of hw queues/rings with backend
  2015-09-05 12:39   ` Bob Liu
  (?)
@ 2015-10-05 14:40   ` Roger Pau Monné
  2015-10-07 10:39     ` Bob Liu
  2015-10-07 10:39     ` Bob Liu
  -1 siblings, 2 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-05 14:40 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: david.vrabel, linux-kernel, konrad.wilk, felipe.franciosi, axboe,
	hch, avanzini.arianna, rafal.mielniczuk, boris.ostrovsky,
	jonathan.davies

On 05/09/15 at 14:39, Bob Liu wrote:
> The max number of hardware queues for xen/blkfront is set by parameter
> 'max_queues', while the number xen/blkback supported is notified through
> xenstore("multi-queue-max-queues").
> 
> The negotiated number was the smaller one, and was written back to
> xen/blkback as "multi-queue-num-queues".

I would write this in the present tense instead of the past; I think it's clearer:

The negotiated number _is_ the smaller...

> 
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkfront.c |  142 ++++++++++++++++++++++++++++++++----------
>  1 file changed, 108 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 1cae76b..1aa66c9 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -107,6 +107,10 @@ static unsigned int xen_blkif_max_ring_order;
>  module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
>  MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
>  
> +static unsigned int xen_blkif_max_queues;
> +module_param_named(max_queues, xen_blkif_max_queues, uint, S_IRUGO);
> +MODULE_PARM_DESC(max_queues, "Maximum number of hardware queues/rings per virtual disk");
> +
>  #define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->pages_per_ring)
>  #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
>  /*
> @@ -114,6 +118,7 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
>   * characters are enough. Define to 20 to keep consist with backend.
>   */
>  #define RINGREF_NAME_LEN (20)
> +#define QUEUE_NAME_LEN (12)
>  
>  /*
>   *  Per-ring info.
> @@ -687,7 +692,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
>  
>  	memset(&dinfo->tag_set, 0, sizeof(dinfo->tag_set));
>  	dinfo->tag_set.ops = &blkfront_mq_ops;
> -	dinfo->tag_set.nr_hw_queues = 1;
> +	dinfo->tag_set.nr_hw_queues = dinfo->nr_rings;
>  	dinfo->tag_set.queue_depth =  BLK_RING_SIZE(dinfo);
>  	dinfo->tag_set.numa_node = NUMA_NO_NODE;
>  	dinfo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
> @@ -1350,6 +1355,51 @@ fail:
>  	return err;
>  }
>  
> +static int write_per_ring_nodes(struct xenbus_transaction xbt,
> +				struct blkfront_ring_info *rinfo, const char *dir)
> +{
> +	int err, i;
> +	const char *message = NULL;
> +	struct blkfront_dev_info *dinfo = rinfo->dinfo;
> +
> +	if (dinfo->pages_per_ring == 1) {
> +		err = xenbus_printf(xbt, dir, "ring-ref", "%u", rinfo->ring_ref[0]);
> +		if (err) {
> +			message = "writing ring-ref";
> +			goto abort_transaction;
> +		}
> +		pr_info("%s: write ring-ref:%d\n", dir, rinfo->ring_ref[0]);
> +	} else {
> +		for (i = 0; i < dinfo->pages_per_ring; i++) {
> +			char ring_ref_name[RINGREF_NAME_LEN];
> +
> +			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
> +			err = xenbus_printf(xbt, dir, ring_ref_name,
> +					    "%u", rinfo->ring_ref[i]);
> +			if (err) {
> +				message = "writing ring-ref";
> +				goto abort_transaction;
> +			}
> +			pr_info("%s: write ring-ref:%d\n", dir, rinfo->ring_ref[i]);
> +		}
> +	}
> +
> +	err = xenbus_printf(xbt, dir, "event-channel", "%u", rinfo->evtchn);
> +	if (err) {
> +		message = "writing event-channel";
> +		goto abort_transaction;
> +	}
> +	pr_info("%s: write event-channel:%d\n", dir, rinfo->evtchn);
> +
> +	return 0;
> +
> +abort_transaction:
> +	xenbus_transaction_end(xbt, 1);

The transaction is not started inside the helper, so I would prefer
it to be ended in the same function where it is started. The helper should
just return the error, and the caller should end the transaction.
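
Something along these lines (untested sketch): have write_per_ring_nodes()
just propagate the error and let talk_to_blkback() reuse its existing
abort path:

	/* in write_per_ring_nodes(): no xenbus_transaction_end() here */
	if (err)
		return err;

	/* in talk_to_blkback(): */
	err = write_per_ring_nodes(xbt, &dinfo->rinfo[i], path);
	if (err) {
		kfree(path);
		message = "writing per-ring nodes";	/* message text is just an example */
		goto abort_transaction;
	}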

> +	if (message)
> +		xenbus_dev_fatal(dinfo->xbdev, err, "%s", message);
> +
> +	return err;
> +}
>  
>  /* Common code used when first setting up, and when resuming. */
>  static int talk_to_blkback(struct xenbus_device *dev,
> @@ -1386,45 +1436,51 @@ again:
>  		goto out;
>  	}
>  
> +	if (dinfo->pages_per_ring > 1) {
> +		err = xenbus_printf(xbt, dev->nodename, "ring-page-order", "%u",
> +				    ring_page_order);
> +		if (err) {
> +			message = "writing ring-page-order";
> +			goto abort_transaction;
> +		}
> +	}
> +
> +	/* We already got the number of queues in _probe */
>  	if (dinfo->nr_rings == 1) {
>  		rinfo = &dinfo->rinfo[0];
>  
> -		if (dinfo->pages_per_ring == 1) {
> -			err = xenbus_printf(xbt, dev->nodename,
> -					    "ring-ref", "%u", rinfo->ring_ref[0]);
> -			if (err) {
> -				message = "writing ring-ref";
> -				goto abort_transaction;
> -			}
> -		} else {
> -			err = xenbus_printf(xbt, dev->nodename,
> -					    "ring-page-order", "%u", ring_page_order);
> -			if (err) {
> -				message = "writing ring-page-order";
> -				goto abort_transaction;
> -			}
> -
> -			for (i = 0; i < dinfo->pages_per_ring; i++) {
> -				char ring_ref_name[RINGREF_NAME_LEN];
> +		err = write_per_ring_nodes(xbt, &dinfo->rinfo[0], dev->nodename);
> +		if (err)
> +			goto out;
> +	} else {
> +		char *path;
> +		size_t pathsize;
>  
> -				snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
> -				err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
> -						    "%u", rinfo->ring_ref[i]);
> -				if (err) {
> -					message = "writing ring-ref";
> -					goto abort_transaction;
> -				}
> -			}
> -		}
> -		err = xenbus_printf(xbt, dev->nodename,
> -				    "event-channel", "%u", rinfo->evtchn);
> +		err = xenbus_printf(xbt, dev->nodename, "multi-queue-num-queues", "%u",
> +				    dinfo->nr_rings);
>  		if (err) {
> -			message = "writing event-channel";
> +			message = "writing multi-queue-num-queues";
>  			goto abort_transaction;
>  		}
> -	} else {
> -		/* Not supported at this stage */
> -		goto abort_transaction;
> +
> +		pathsize = strlen(dev->nodename) + QUEUE_NAME_LEN;
> +		path = kzalloc(pathsize, GFP_KERNEL);

kmalloc seems better here, since you are memsetting it below in order to
reuse it.

> +		if (!path) {
> +			err = -ENOMEM;
> +			message = "ENOMEM while writing ring references";
> +			goto abort_transaction;
> +		}
> +
> +		for (i = 0; i < dinfo->nr_rings; i++) {
> +			memset(path, 0, pathsize);
> +			snprintf(path, pathsize, "%s/queue-%u", dev->nodename, i);
> +			err = write_per_ring_nodes(xbt, &dinfo->rinfo[i], path);
> +			if (err) {
> +				kfree(path);
> +				goto out;
> +			}
> +		}
> +		kfree(path);
>  	}
>  
>  	err = xenbus_printf(xbt, dev->nodename, "protocol", "%s",
> @@ -1480,6 +1536,7 @@ static int blkfront_probe(struct xenbus_device *dev,
>  	int err, vdevice, r_index;
>  	struct blkfront_dev_info *dinfo;
>  	struct blkfront_ring_info *rinfo;
> +	unsigned int back_max_queues = 0;
>  
>  	/* FIXME: Use dynamic device id if this is not set. */
>  	err = xenbus_scanf(XBT_NIL, dev->nodename,
> @@ -1534,7 +1591,17 @@ static int blkfront_probe(struct xenbus_device *dev,
>  	dinfo->vdevice = vdevice;
>  	dinfo->connected = BLKIF_STATE_DISCONNECTED;
>  
> -	dinfo->nr_rings = 1;
> +	/* Check if backend supports multiple queues */
> +	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
> +			   "multi-queue-max-queues", "%u", &back_max_queues);
> +	if (err < 0)
> +		back_max_queues = 1;
> +
> +	dinfo->nr_rings = min(back_max_queues, xen_blkif_max_queues);
> +	if (dinfo->nr_rings <= 0)
> +		dinfo->nr_rings = 1;
> +	pr_info("dinfo->nr_rings:%u, backend support max-queues:%u\n", dinfo->nr_rings, back_max_queues);
        ^ pr_debug                           ^ supports

And I would also print the number of queues that are going to be used.
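
I.e. something like (sketch):

	pr_debug("%s: using %u queues, backend supports up to %u\n",
		 dev->nodename, dinfo->nr_rings, back_max_queues);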

> +
>  	dinfo->rinfo = kzalloc(sizeof(*rinfo) * dinfo->nr_rings, GFP_KERNEL);
>  	if (!dinfo->rinfo) {
>  		xenbus_dev_fatal(dev, -ENOMEM, "allocating ring_info structure");
> @@ -2257,6 +2324,7 @@ static struct xenbus_driver blkfront_driver = {
>  static int __init xlblk_init(void)
>  {
>  	int ret;
> +	int nr_cpus = num_online_cpus();
>  
>  	if (!xen_domain())
>  		return -ENODEV;
> @@ -2267,6 +2335,12 @@ static int __init xlblk_init(void)
>  		xen_blkif_max_ring_order = 0;
>  	}
>  
> +	if (xen_blkif_max_queues > nr_cpus) {

Shouldn't there be a default value for xen_blkif_max_queues if the user
hasn't set the parameter on the command line?
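
E.g. something like this (sketch; one queue per vCPU is just an example
default):

	/* if the user didn't ask for anything, pick a sane default */
	if (!xen_blkif_max_queues)
		xen_blkif_max_queues = nr_cpus;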

> +		pr_info("Invalid max_queues (%d), will use default max: %d.\n",
> +			xen_blkif_max_queues, nr_cpus);
> +		xen_blkif_max_queues = nr_cpus;
> +	}
> +
>  	if (!xen_has_pv_disk_devices())
>  		return -ENODEV;
>  
> 


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 6/9] xen/blkfront: negotiate the number of hw queues/rings with backend
  2015-09-05 12:39   ` Bob Liu
  (?)
  (?)
@ 2015-10-05 14:40   ` Roger Pau Monné
  -1 siblings, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-05 14:40 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: hch, felipe.franciosi, rafal.mielniczuk, linux-kernel,
	jonathan.davies, axboe, david.vrabel, avanzini.arianna,
	boris.ostrovsky

On 05/09/15 at 14:39, Bob Liu wrote:
> The max number of hardware queues for xen/blkfront is set by parameter
> 'max_queues', while the number xen/blkback supported is notified through
> xenstore("multi-queue-max-queues").
> 
> The negotiated number was the smaller one, and was written back to
> xen/blkback as "multi-queue-num-queues".

I would write this in the present tense instead of the past; I think it's clearer:

The negotiated number _is_ the smaller...

> 
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkfront.c |  142 ++++++++++++++++++++++++++++++++----------
>  1 file changed, 108 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 1cae76b..1aa66c9 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -107,6 +107,10 @@ static unsigned int xen_blkif_max_ring_order;
>  module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
>  MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
>  
> +static unsigned int xen_blkif_max_queues;
> +module_param_named(max_queues, xen_blkif_max_queues, uint, S_IRUGO);
> +MODULE_PARM_DESC(max_queues, "Maximum number of hardware queues/rings per virtual disk");
> +
>  #define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->pages_per_ring)
>  #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
>  /*
> @@ -114,6 +118,7 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
>   * characters are enough. Define to 20 to keep consist with backend.
>   */
>  #define RINGREF_NAME_LEN (20)
> +#define QUEUE_NAME_LEN (12)
>  
>  /*
>   *  Per-ring info.
> @@ -687,7 +692,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
>  
>  	memset(&dinfo->tag_set, 0, sizeof(dinfo->tag_set));
>  	dinfo->tag_set.ops = &blkfront_mq_ops;
> -	dinfo->tag_set.nr_hw_queues = 1;
> +	dinfo->tag_set.nr_hw_queues = dinfo->nr_rings;
>  	dinfo->tag_set.queue_depth =  BLK_RING_SIZE(dinfo);
>  	dinfo->tag_set.numa_node = NUMA_NO_NODE;
>  	dinfo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
> @@ -1350,6 +1355,51 @@ fail:
>  	return err;
>  }
>  
> +static int write_per_ring_nodes(struct xenbus_transaction xbt,
> +				struct blkfront_ring_info *rinfo, const char *dir)
> +{
> +	int err, i;
> +	const char *message = NULL;
> +	struct blkfront_dev_info *dinfo = rinfo->dinfo;
> +
> +	if (dinfo->pages_per_ring == 1) {
> +		err = xenbus_printf(xbt, dir, "ring-ref", "%u", rinfo->ring_ref[0]);
> +		if (err) {
> +			message = "writing ring-ref";
> +			goto abort_transaction;
> +		}
> +		pr_info("%s: write ring-ref:%d\n", dir, rinfo->ring_ref[0]);
> +	} else {
> +		for (i = 0; i < dinfo->pages_per_ring; i++) {
> +			char ring_ref_name[RINGREF_NAME_LEN];
> +
> +			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
> +			err = xenbus_printf(xbt, dir, ring_ref_name,
> +					    "%u", rinfo->ring_ref[i]);
> +			if (err) {
> +				message = "writing ring-ref";
> +				goto abort_transaction;
> +			}
> +			pr_info("%s: write ring-ref:%d\n", dir, rinfo->ring_ref[i]);
> +		}
> +	}
> +
> +	err = xenbus_printf(xbt, dir, "event-channel", "%u", rinfo->evtchn);
> +	if (err) {
> +		message = "writing event-channel";
> +		goto abort_transaction;
> +	}
> +	pr_info("%s: write event-channel:%d\n", dir, rinfo->evtchn);
> +
> +	return 0;
> +
> +abort_transaction:
> +	xenbus_transaction_end(xbt, 1);

The transaction is not started inside the helper, so I would prefer
it to be ended in the same function where it is started. The helper should
just return the error, and the caller should end the transaction.

> +	if (message)
> +		xenbus_dev_fatal(dinfo->xbdev, err, "%s", message);
> +
> +	return err;
> +}
>  
>  /* Common code used when first setting up, and when resuming. */
>  static int talk_to_blkback(struct xenbus_device *dev,
> @@ -1386,45 +1436,51 @@ again:
>  		goto out;
>  	}
>  
> +	if (dinfo->pages_per_ring > 1) {
> +		err = xenbus_printf(xbt, dev->nodename, "ring-page-order", "%u",
> +				    ring_page_order);
> +		if (err) {
> +			message = "writing ring-page-order";
> +			goto abort_transaction;
> +		}
> +	}
> +
> +	/* We already got the number of queues in _probe */
>  	if (dinfo->nr_rings == 1) {
>  		rinfo = &dinfo->rinfo[0];
>  
> -		if (dinfo->pages_per_ring == 1) {
> -			err = xenbus_printf(xbt, dev->nodename,
> -					    "ring-ref", "%u", rinfo->ring_ref[0]);
> -			if (err) {
> -				message = "writing ring-ref";
> -				goto abort_transaction;
> -			}
> -		} else {
> -			err = xenbus_printf(xbt, dev->nodename,
> -					    "ring-page-order", "%u", ring_page_order);
> -			if (err) {
> -				message = "writing ring-page-order";
> -				goto abort_transaction;
> -			}
> -
> -			for (i = 0; i < dinfo->pages_per_ring; i++) {
> -				char ring_ref_name[RINGREF_NAME_LEN];
> +		err = write_per_ring_nodes(xbt, &dinfo->rinfo[0], dev->nodename);
> +		if (err)
> +			goto out;
> +	} else {
> +		char *path;
> +		size_t pathsize;
>  
> -				snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
> -				err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
> -						    "%u", rinfo->ring_ref[i]);
> -				if (err) {
> -					message = "writing ring-ref";
> -					goto abort_transaction;
> -				}
> -			}
> -		}
> -		err = xenbus_printf(xbt, dev->nodename,
> -				    "event-channel", "%u", rinfo->evtchn);
> +		err = xenbus_printf(xbt, dev->nodename, "multi-queue-num-queues", "%u",
> +				    dinfo->nr_rings);
>  		if (err) {
> -			message = "writing event-channel";
> +			message = "writing multi-queue-num-queues";
>  			goto abort_transaction;
>  		}
> -	} else {
> -		/* Not supported at this stage */
> -		goto abort_transaction;
> +
> +		pathsize = strlen(dev->nodename) + QUEUE_NAME_LEN;
> +		path = kzalloc(pathsize, GFP_KERNEL);

kmalloc seems better here, since you are memsetting it below in order to
reuse it.

> +		if (!path) {
> +			err = -ENOMEM;
> +			message = "ENOMEM while writing ring references";
> +			goto abort_transaction;
> +		}
> +
> +		for (i = 0; i < dinfo->nr_rings; i++) {
> +			memset(path, 0, pathsize);
> +			snprintf(path, pathsize, "%s/queue-%u", dev->nodename, i);
> +			err = write_per_ring_nodes(xbt, &dinfo->rinfo[i], path);
> +			if (err) {
> +				kfree(path);
> +				goto out;
> +			}
> +		}
> +		kfree(path);
>  	}
>  
>  	err = xenbus_printf(xbt, dev->nodename, "protocol", "%s",
> @@ -1480,6 +1536,7 @@ static int blkfront_probe(struct xenbus_device *dev,
>  	int err, vdevice, r_index;
>  	struct blkfront_dev_info *dinfo;
>  	struct blkfront_ring_info *rinfo;
> +	unsigned int back_max_queues = 0;
>  
>  	/* FIXME: Use dynamic device id if this is not set. */
>  	err = xenbus_scanf(XBT_NIL, dev->nodename,
> @@ -1534,7 +1591,17 @@ static int blkfront_probe(struct xenbus_device *dev,
>  	dinfo->vdevice = vdevice;
>  	dinfo->connected = BLKIF_STATE_DISCONNECTED;
>  
> -	dinfo->nr_rings = 1;
> +	/* Check if backend supports multiple queues */
> +	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
> +			   "multi-queue-max-queues", "%u", &back_max_queues);
> +	if (err < 0)
> +		back_max_queues = 1;
> +
> +	dinfo->nr_rings = min(back_max_queues, xen_blkif_max_queues);
> +	if (dinfo->nr_rings <= 0)
> +		dinfo->nr_rings = 1;
> +	pr_info("dinfo->nr_rings:%u, backend support max-queues:%u\n", dinfo->nr_rings, back_max_queues);
        ^ pr_debug                           ^ supports

And I would also print the number of queues that are going to be used.

> +
>  	dinfo->rinfo = kzalloc(sizeof(*rinfo) * dinfo->nr_rings, GFP_KERNEL);
>  	if (!dinfo->rinfo) {
>  		xenbus_dev_fatal(dev, -ENOMEM, "allocating ring_info structure");
> @@ -2257,6 +2324,7 @@ static struct xenbus_driver blkfront_driver = {
>  static int __init xlblk_init(void)
>  {
>  	int ret;
> +	int nr_cpus = num_online_cpus();
>  
>  	if (!xen_domain())
>  		return -ENODEV;
> @@ -2267,6 +2335,12 @@ static int __init xlblk_init(void)
>  		xen_blkif_max_ring_order = 0;
>  	}
>  
> +	if (xen_blkif_max_queues > nr_cpus) {

Shouldn't there be a default value for xen_blkif_max_queues if the user
hasn't set the parameter on the command line?

> +		pr_info("Invalid max_queues (%d), will use default max: %d.\n",
> +			xen_blkif_max_queues, nr_cpus);
> +		xen_blkif_max_queues = nr_cpus;
> +	}
> +
>  	if (!xen_has_pv_disk_devices())
>  		return -ENODEV;
>  
> 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 7/9] xen/blkback: separate ring information out of struct xen_blkif
  2015-09-05 12:39   ` Bob Liu
  (?)
@ 2015-10-05 14:55   ` Roger Pau Monné
  2015-10-07 10:41     ` Bob Liu
                       ` (3 more replies)
  -1 siblings, 4 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-05 14:55 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: david.vrabel, linux-kernel, konrad.wilk, felipe.franciosi, axboe,
	hch, avanzini.arianna, rafal.mielniczuk, boris.ostrovsky,
	jonathan.davies

On 05/09/15 at 14:39, Bob Liu wrote:
> Split per ring information to an new structure:xen_blkif_ring, so that one vbd
> device can associate with one or more rings/hardware queues.
> 
> This patch is a preparation for supporting multi hardware queues/rings.
> 
> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkback/blkback.c |  365 ++++++++++++++++++-----------------
>  drivers/block/xen-blkback/common.h  |   52 +++--
>  drivers/block/xen-blkback/xenbus.c  |  130 +++++++------
>  3 files changed, 295 insertions(+), 252 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 954c002..fd02240 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -113,71 +113,71 @@ module_param(log_stats, int, 0644);
>  /* Number of free pages to remove on each call to gnttab_free_pages */
>  #define NUM_BATCH_FREE_PAGES 10
>  
> -static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
> +static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
>  {
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
> -	if (list_empty(&blkif->free_pages)) {
> -		BUG_ON(blkif->free_pages_num != 0);
> -		spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
> +	if (list_empty(&ring->free_pages)) {

I'm afraid the pool of free pages should be per-device, not per-ring.

> +		BUG_ON(ring->free_pages_num != 0);
> +		spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  		return gnttab_alloc_pages(1, page);
>  	}
> -	BUG_ON(blkif->free_pages_num == 0);
> -	page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
> +	BUG_ON(ring->free_pages_num == 0);
> +	page[0] = list_first_entry(&ring->free_pages, struct page, lru);
>  	list_del(&page[0]->lru);
> -	blkif->free_pages_num--;
> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +	ring->free_pages_num--;
> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  
>  	return 0;
>  }
>  
> -static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
> +static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **page,
>                                    int num)
>  {
>  	unsigned long flags;
>  	int i;
>  
> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
>  	for (i = 0; i < num; i++)
> -		list_add(&page[i]->lru, &blkif->free_pages);
> -	blkif->free_pages_num += num;
> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +		list_add(&page[i]->lru, &ring->free_pages);
> +	ring->free_pages_num += num;
> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  }
>  
> -static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
> +static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
>  {
>  	/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
>  	struct page *page[NUM_BATCH_FREE_PAGES];
>  	unsigned int num_pages = 0;
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
> -	while (blkif->free_pages_num > num) {
> -		BUG_ON(list_empty(&blkif->free_pages));
> -		page[num_pages] = list_first_entry(&blkif->free_pages,
> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
> +	while (ring->free_pages_num > num) {
> +		BUG_ON(list_empty(&ring->free_pages));
> +		page[num_pages] = list_first_entry(&ring->free_pages,
>  		                                   struct page, lru);
>  		list_del(&page[num_pages]->lru);
> -		blkif->free_pages_num--;
> +		ring->free_pages_num--;
>  		if (++num_pages == NUM_BATCH_FREE_PAGES) {
> -			spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +			spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  			gnttab_free_pages(num_pages, page);
> -			spin_lock_irqsave(&blkif->free_pages_lock, flags);
> +			spin_lock_irqsave(&ring->free_pages_lock, flags);
>  			num_pages = 0;
>  		}
>  	}
> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  	if (num_pages != 0)
>  		gnttab_free_pages(num_pages, page);
>  }
>  
>  #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
>  
> -static int do_block_io_op(struct xen_blkif *blkif);
> -static int dispatch_rw_block_io(struct xen_blkif *blkif,
> +static int do_block_io_op(struct xen_blkif_ring *ring);
> +static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
>  				struct blkif_request *req,
>  				struct pending_req *pending_req);
> -static void make_response(struct xen_blkif *blkif, u64 id,
> +static void make_response(struct xen_blkif_ring *ring, u64 id,
>  			  unsigned short op, int st);
>  
>  #define foreach_grant_safe(pos, n, rbtree, node) \
> @@ -198,19 +198,19 @@ static void make_response(struct xen_blkif *blkif, u64 id,
>   * bit operations to modify the flags of a persistent grant and to count
>   * the number of used grants.
>   */
> -static int add_persistent_gnt(struct xen_blkif *blkif,
> +static int add_persistent_gnt(struct xen_blkif_ring *ring,
>  			       struct persistent_gnt *persistent_gnt)
>  {
>  	struct rb_node **new = NULL, *parent = NULL;
>  	struct persistent_gnt *this;
>  
> -	if (blkif->persistent_gnt_c >= xen_blkif_max_pgrants) {
> -		if (!blkif->vbd.overflow_max_grants)
> -			blkif->vbd.overflow_max_grants = 1;
> +	if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
> +		if (!ring->blkif->vbd.overflow_max_grants)
> +			ring->blkif->vbd.overflow_max_grants = 1;

The same goes for the pool of persistent grants: it should be per-device
and not per-ring.

And I think this issue is far worse than the others, because a frontend
might use a persistent grant on different queues, forcing the backend to
map the grant several times, once per queue; this is not acceptable IMO.
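
I.e. I would expect at least these fields to stay in struct xen_blkif
rather than move into xen_blkif_ring (sketch, using the current field
names):

	struct xen_blkif {
		/* ... other fields unchanged ... */

		/* shared by all rings of this device */
		spinlock_t		free_pages_lock;
		int			free_pages_num;
		struct list_head	free_pages;

		struct rb_root		persistent_gnts;
		unsigned int		persistent_gnt_c;
	};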

Roger.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 7/9] xen/blkback: separate ring information out of struct xen_blkif
  2015-09-05 12:39   ` Bob Liu
  (?)
  (?)
@ 2015-10-05 14:55   ` Roger Pau Monné
  -1 siblings, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-05 14:55 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: hch, felipe.franciosi, rafal.mielniczuk, linux-kernel,
	jonathan.davies, axboe, david.vrabel, avanzini.arianna,
	boris.ostrovsky

On 05/09/15 at 14:39, Bob Liu wrote:
> Split per ring information to an new structure:xen_blkif_ring, so that one vbd
> device can associate with one or more rings/hardware queues.
> 
> This patch is a preparation for supporting multi hardware queues/rings.
> 
> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkback/blkback.c |  365 ++++++++++++++++++-----------------
>  drivers/block/xen-blkback/common.h  |   52 +++--
>  drivers/block/xen-blkback/xenbus.c  |  130 +++++++------
>  3 files changed, 295 insertions(+), 252 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 954c002..fd02240 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -113,71 +113,71 @@ module_param(log_stats, int, 0644);
>  /* Number of free pages to remove on each call to gnttab_free_pages */
>  #define NUM_BATCH_FREE_PAGES 10
>  
> -static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
> +static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
>  {
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
> -	if (list_empty(&blkif->free_pages)) {
> -		BUG_ON(blkif->free_pages_num != 0);
> -		spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
> +	if (list_empty(&ring->free_pages)) {

I'm afraid the pool of free pages should be per-device, not per-ring.

> +		BUG_ON(ring->free_pages_num != 0);
> +		spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  		return gnttab_alloc_pages(1, page);
>  	}
> -	BUG_ON(blkif->free_pages_num == 0);
> -	page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
> +	BUG_ON(ring->free_pages_num == 0);
> +	page[0] = list_first_entry(&ring->free_pages, struct page, lru);
>  	list_del(&page[0]->lru);
> -	blkif->free_pages_num--;
> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +	ring->free_pages_num--;
> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  
>  	return 0;
>  }
>  
> -static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
> +static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **page,
>                                    int num)
>  {
>  	unsigned long flags;
>  	int i;
>  
> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
>  	for (i = 0; i < num; i++)
> -		list_add(&page[i]->lru, &blkif->free_pages);
> -	blkif->free_pages_num += num;
> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +		list_add(&page[i]->lru, &ring->free_pages);
> +	ring->free_pages_num += num;
> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  }
>  
> -static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
> +static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
>  {
>  	/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
>  	struct page *page[NUM_BATCH_FREE_PAGES];
>  	unsigned int num_pages = 0;
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
> -	while (blkif->free_pages_num > num) {
> -		BUG_ON(list_empty(&blkif->free_pages));
> -		page[num_pages] = list_first_entry(&blkif->free_pages,
> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
> +	while (ring->free_pages_num > num) {
> +		BUG_ON(list_empty(&ring->free_pages));
> +		page[num_pages] = list_first_entry(&ring->free_pages,
>  		                                   struct page, lru);
>  		list_del(&page[num_pages]->lru);
> -		blkif->free_pages_num--;
> +		ring->free_pages_num--;
>  		if (++num_pages == NUM_BATCH_FREE_PAGES) {
> -			spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +			spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  			gnttab_free_pages(num_pages, page);
> -			spin_lock_irqsave(&blkif->free_pages_lock, flags);
> +			spin_lock_irqsave(&ring->free_pages_lock, flags);
>  			num_pages = 0;
>  		}
>  	}
> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  	if (num_pages != 0)
>  		gnttab_free_pages(num_pages, page);
>  }
>  
>  #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
>  
> -static int do_block_io_op(struct xen_blkif *blkif);
> -static int dispatch_rw_block_io(struct xen_blkif *blkif,
> +static int do_block_io_op(struct xen_blkif_ring *ring);
> +static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
>  				struct blkif_request *req,
>  				struct pending_req *pending_req);
> -static void make_response(struct xen_blkif *blkif, u64 id,
> +static void make_response(struct xen_blkif_ring *ring, u64 id,
>  			  unsigned short op, int st);
>  
>  #define foreach_grant_safe(pos, n, rbtree, node) \
> @@ -198,19 +198,19 @@ static void make_response(struct xen_blkif *blkif, u64 id,
>   * bit operations to modify the flags of a persistent grant and to count
>   * the number of used grants.
>   */
> -static int add_persistent_gnt(struct xen_blkif *blkif,
> +static int add_persistent_gnt(struct xen_blkif_ring *ring,
>  			       struct persistent_gnt *persistent_gnt)
>  {
>  	struct rb_node **new = NULL, *parent = NULL;
>  	struct persistent_gnt *this;
>  
> -	if (blkif->persistent_gnt_c >= xen_blkif_max_pgrants) {
> -		if (!blkif->vbd.overflow_max_grants)
> -			blkif->vbd.overflow_max_grants = 1;
> +	if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
> +		if (!ring->blkif->vbd.overflow_max_grants)
> +			ring->blkif->vbd.overflow_max_grants = 1;

The same for the pool of persistent grants, it should be per-device and
not per-ring.

And I think this issue is far worse than the others, because a frontend
might use a persistent grant on different queues, forcing the backend
map the grant several times for each queue, this is not acceptable IMO.

Roger.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 7/9] xen/blkback: separate ring information out of struct xen_blkif
  2015-09-05 12:39   ` Bob Liu
                     ` (3 preceding siblings ...)
  (?)
@ 2015-10-05 14:55   ` Roger Pau Monné
  -1 siblings, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-05 14:55 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: david.vrabel, linux-kernel, konrad.wilk, felipe.franciosi, axboe,
	hch, avanzini.arianna, rafal.mielniczuk, boris.ostrovsky,
	jonathan.davies

El 05/09/15 a les 14.39, Bob Liu ha escrit:
> Split per ring information to an new structure:xen_blkif_ring, so that one vbd
> device can associate with one or more rings/hardware queues.
> 
> This patch is a preparation for supporting multi hardware queues/rings.
> 
> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkback/blkback.c |  365 ++++++++++++++++++-----------------
>  drivers/block/xen-blkback/common.h  |   52 +++--
>  drivers/block/xen-blkback/xenbus.c  |  130 +++++++------
>  3 files changed, 295 insertions(+), 252 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 954c002..fd02240 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -113,71 +113,71 @@ module_param(log_stats, int, 0644);
>  /* Number of free pages to remove on each call to gnttab_free_pages */
>  #define NUM_BATCH_FREE_PAGES 10
>  
> -static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
> +static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
>  {
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
> -	if (list_empty(&blkif->free_pages)) {
> -		BUG_ON(blkif->free_pages_num != 0);
> -		spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
> +	if (list_empty(&ring->free_pages)) {

I'm afraid the pool of free pages should be per-device, not per-ring.

> +		BUG_ON(ring->free_pages_num != 0);
> +		spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  		return gnttab_alloc_pages(1, page);
>  	}
> -	BUG_ON(blkif->free_pages_num == 0);
> -	page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
> +	BUG_ON(ring->free_pages_num == 0);
> +	page[0] = list_first_entry(&ring->free_pages, struct page, lru);
>  	list_del(&page[0]->lru);
> -	blkif->free_pages_num--;
> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +	ring->free_pages_num--;
> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  
>  	return 0;
>  }
>  
> -static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
> +static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **page,
>                                    int num)
>  {
>  	unsigned long flags;
>  	int i;
>  
> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
>  	for (i = 0; i < num; i++)
> -		list_add(&page[i]->lru, &blkif->free_pages);
> -	blkif->free_pages_num += num;
> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +		list_add(&page[i]->lru, &ring->free_pages);
> +	ring->free_pages_num += num;
> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  }
>  
> -static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
> +static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
>  {
>  	/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
>  	struct page *page[NUM_BATCH_FREE_PAGES];
>  	unsigned int num_pages = 0;
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
> -	while (blkif->free_pages_num > num) {
> -		BUG_ON(list_empty(&blkif->free_pages));
> -		page[num_pages] = list_first_entry(&blkif->free_pages,
> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
> +	while (ring->free_pages_num > num) {
> +		BUG_ON(list_empty(&ring->free_pages));
> +		page[num_pages] = list_first_entry(&ring->free_pages,
>  		                                   struct page, lru);
>  		list_del(&page[num_pages]->lru);
> -		blkif->free_pages_num--;
> +		ring->free_pages_num--;
>  		if (++num_pages == NUM_BATCH_FREE_PAGES) {
> -			spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +			spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  			gnttab_free_pages(num_pages, page);
> -			spin_lock_irqsave(&blkif->free_pages_lock, flags);
> +			spin_lock_irqsave(&ring->free_pages_lock, flags);
>  			num_pages = 0;
>  		}
>  	}
> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  	if (num_pages != 0)
>  		gnttab_free_pages(num_pages, page);
>  }
>  
>  #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
>  
> -static int do_block_io_op(struct xen_blkif *blkif);
> -static int dispatch_rw_block_io(struct xen_blkif *blkif,
> +static int do_block_io_op(struct xen_blkif_ring *ring);
> +static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
>  				struct blkif_request *req,
>  				struct pending_req *pending_req);
> -static void make_response(struct xen_blkif *blkif, u64 id,
> +static void make_response(struct xen_blkif_ring *ring, u64 id,
>  			  unsigned short op, int st);
>  
>  #define foreach_grant_safe(pos, n, rbtree, node) \
> @@ -198,19 +198,19 @@ static void make_response(struct xen_blkif *blkif, u64 id,
>   * bit operations to modify the flags of a persistent grant and to count
>   * the number of used grants.
>   */
> -static int add_persistent_gnt(struct xen_blkif *blkif,
> +static int add_persistent_gnt(struct xen_blkif_ring *ring,
>  			       struct persistent_gnt *persistent_gnt)
>  {
>  	struct rb_node **new = NULL, *parent = NULL;
>  	struct persistent_gnt *this;
>  
> -	if (blkif->persistent_gnt_c >= xen_blkif_max_pgrants) {
> -		if (!blkif->vbd.overflow_max_grants)
> -			blkif->vbd.overflow_max_grants = 1;
> +	if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
> +		if (!ring->blkif->vbd.overflow_max_grants)
> +			ring->blkif->vbd.overflow_max_grants = 1;

The same for the pool of persistent grants, it should be per-device and
not per-ring.

And I think this issue is far worse than the others, because a frontend
might use a persistent grant on different queues, forcing the backend
map the grant several times for each queue, this is not acceptable IMO.

Roger.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 7/9] xen/blkback: separate ring information out of struct xen_blkif
  2015-09-05 12:39   ` Bob Liu
                     ` (2 preceding siblings ...)
  (?)
@ 2015-10-05 14:55   ` Roger Pau Monné
  -1 siblings, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-05 14:55 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: hch, felipe.franciosi, rafal.mielniczuk, linux-kernel,
	jonathan.davies, axboe, david.vrabel, avanzini.arianna,
	boris.ostrovsky

El 05/09/15 a les 14.39, Bob Liu ha escrit:
> Split per ring information to an new structure:xen_blkif_ring, so that one vbd
> device can associate with one or more rings/hardware queues.
> 
> This patch is a preparation for supporting multi hardware queues/rings.
> 
> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkback/blkback.c |  365 ++++++++++++++++++-----------------
>  drivers/block/xen-blkback/common.h  |   52 +++--
>  drivers/block/xen-blkback/xenbus.c  |  130 +++++++------
>  3 files changed, 295 insertions(+), 252 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 954c002..fd02240 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -113,71 +113,71 @@ module_param(log_stats, int, 0644);
>  /* Number of free pages to remove on each call to gnttab_free_pages */
>  #define NUM_BATCH_FREE_PAGES 10
>  
> -static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
> +static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
>  {
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
> -	if (list_empty(&blkif->free_pages)) {
> -		BUG_ON(blkif->free_pages_num != 0);
> -		spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
> +	if (list_empty(&ring->free_pages)) {

I'm afraid the pool of free pages should be per-device, not per-ring.

> +		BUG_ON(ring->free_pages_num != 0);
> +		spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  		return gnttab_alloc_pages(1, page);
>  	}
> -	BUG_ON(blkif->free_pages_num == 0);
> -	page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
> +	BUG_ON(ring->free_pages_num == 0);
> +	page[0] = list_first_entry(&ring->free_pages, struct page, lru);
>  	list_del(&page[0]->lru);
> -	blkif->free_pages_num--;
> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +	ring->free_pages_num--;
> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  
>  	return 0;
>  }
>  
> -static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
> +static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **page,
>                                    int num)
>  {
>  	unsigned long flags;
>  	int i;
>  
> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
>  	for (i = 0; i < num; i++)
> -		list_add(&page[i]->lru, &blkif->free_pages);
> -	blkif->free_pages_num += num;
> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +		list_add(&page[i]->lru, &ring->free_pages);
> +	ring->free_pages_num += num;
> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  }
>  
> -static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
> +static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
>  {
>  	/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
>  	struct page *page[NUM_BATCH_FREE_PAGES];
>  	unsigned int num_pages = 0;
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
> -	while (blkif->free_pages_num > num) {
> -		BUG_ON(list_empty(&blkif->free_pages));
> -		page[num_pages] = list_first_entry(&blkif->free_pages,
> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
> +	while (ring->free_pages_num > num) {
> +		BUG_ON(list_empty(&ring->free_pages));
> +		page[num_pages] = list_first_entry(&ring->free_pages,
>  		                                   struct page, lru);
>  		list_del(&page[num_pages]->lru);
> -		blkif->free_pages_num--;
> +		ring->free_pages_num--;
>  		if (++num_pages == NUM_BATCH_FREE_PAGES) {
> -			spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +			spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  			gnttab_free_pages(num_pages, page);
> -			spin_lock_irqsave(&blkif->free_pages_lock, flags);
> +			spin_lock_irqsave(&ring->free_pages_lock, flags);
>  			num_pages = 0;
>  		}
>  	}
> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>  	if (num_pages != 0)
>  		gnttab_free_pages(num_pages, page);
>  }
>  
>  #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
>  
> -static int do_block_io_op(struct xen_blkif *blkif);
> -static int dispatch_rw_block_io(struct xen_blkif *blkif,
> +static int do_block_io_op(struct xen_blkif_ring *ring);
> +static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
>  				struct blkif_request *req,
>  				struct pending_req *pending_req);
> -static void make_response(struct xen_blkif *blkif, u64 id,
> +static void make_response(struct xen_blkif_ring *ring, u64 id,
>  			  unsigned short op, int st);
>  
>  #define foreach_grant_safe(pos, n, rbtree, node) \
> @@ -198,19 +198,19 @@ static void make_response(struct xen_blkif *blkif, u64 id,
>   * bit operations to modify the flags of a persistent grant and to count
>   * the number of used grants.
>   */
> -static int add_persistent_gnt(struct xen_blkif *blkif,
> +static int add_persistent_gnt(struct xen_blkif_ring *ring,
>  			       struct persistent_gnt *persistent_gnt)
>  {
>  	struct rb_node **new = NULL, *parent = NULL;
>  	struct persistent_gnt *this;
>  
> -	if (blkif->persistent_gnt_c >= xen_blkif_max_pgrants) {
> -		if (!blkif->vbd.overflow_max_grants)
> -			blkif->vbd.overflow_max_grants = 1;
> +	if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
> +		if (!ring->blkif->vbd.overflow_max_grants)
> +			ring->blkif->vbd.overflow_max_grants = 1;

The same for the pool of persistent grants, it should be per-device and
not per-ring.

And I think this issue is far worse than the others, because a frontend
might use a persistent grant on different queues, forcing the backend
map the grant several times for each queue, this is not acceptable IMO.

Roger.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 8/9] xen/blkback: pseudo support for multi hardware queues/rings
  2015-09-05 12:39   ` Bob Liu
  (?)
  (?)
@ 2015-10-05 15:08   ` Roger Pau Monné
  2015-10-07 10:50     ` Bob Liu
  2015-10-07 10:50     ` Bob Liu
  -1 siblings, 2 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-05 15:08 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: david.vrabel, linux-kernel, konrad.wilk, felipe.franciosi, axboe,
	hch, avanzini.arianna, rafal.mielniczuk, boris.ostrovsky,
	jonathan.davies

On 05/09/15 at 14:39, Bob Liu wrote:
> Prepare patch for multi hardware queues/rings, the ring number was set to 1 by
> force.

This should be:

Preparatory patch for multiple hardware queues (rings). The number of
rings is unconditionally set to 1.

But frankly this description is not helpful at all; you should describe
the preparatory changes and why you need them.

> 
> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkback/common.h |    3 +-
>  drivers/block/xen-blkback/xenbus.c |  328 +++++++++++++++++++++++-------------
>  2 files changed, 209 insertions(+), 122 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index cc253d4..ba058a0 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -339,7 +339,8 @@ struct xen_blkif {
>  	unsigned long long			st_wr_sect;
>  	unsigned int nr_ring_pages;
>  	/* All rings for this device */
> -	struct xen_blkif_ring ring;
> +	struct xen_blkif_ring *rings;
> +	unsigned int nr_rings;
>  };
>  
>  struct seg_buf {
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index 6482ee3..04b8d0d 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -26,6 +26,7 @@
>  /* Enlarge the array size in order to fully show blkback name. */
>  #define BLKBACK_NAME_LEN (20)
>  #define RINGREF_NAME_LEN (20)
> +#define RINGREF_NAME_LEN (20)

Duplicate define?

>  
>  struct backend_info {
>  	struct xenbus_device	*dev;
> @@ -84,11 +85,13 @@ static int blkback_name(struct xen_blkif *blkif, char *buf)
>  
>  static void xen_update_blkif_status(struct xen_blkif *blkif)
>  {
> -	int err;
> +	int err, i;
>  	char name[BLKBACK_NAME_LEN];
> +	struct xen_blkif_ring *ring;
> +	char per_ring_name[BLKBACK_NAME_LEN + 2];

Hm, why don't you just add + 2 to the place where BLKBACK_NAME_LEN is
defined and use the same character array ("name")? This is just a waste
of stack.
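
An alternative to enlarging the buffer is to let kthread_run() do the
formatting itself, since its name argument is a printf-style format.
A rough sketch (whether the "-0" suffix is acceptable in the
single-ring case is a separate question):

	for (i = 0; i < blkif->nr_rings; i++) {
		struct xen_blkif_ring *ring = &blkif->rings[i];

		/* No second buffer: kthread_run() formats "<name>-<queue>". */
		ring->xenblkd = kthread_run(xen_blkif_schedule, ring,
					    "%s-%d", name, i);
		if (IS_ERR(ring->xenblkd)) {
			err = PTR_ERR(ring->xenblkd);
			ring->xenblkd = NULL;
			xenbus_dev_error(blkif->be->dev, err,
					 "start %s-%d xenblkd", name, i);
			return;
		}
	}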

>  
>  	/* Not ready to connect? */
> -	if (!blkif->ring.irq || !blkif->vbd.bdev)
> +	if (!blkif->rings || !blkif->rings[0].irq || !blkif->vbd.bdev)
>  		return;
>  
>  	/* Already connected? */
> @@ -113,19 +116,68 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
>  	}
>  	invalidate_inode_pages2(blkif->vbd.bdev->bd_inode->i_mapping);
>  
> -	blkif->ring.xenblkd = kthread_run(xen_blkif_schedule, &blkif->ring, "%s", name);
> -	if (IS_ERR(blkif->ring.xenblkd)) {
> -		err = PTR_ERR(blkif->ring.xenblkd);
> -		blkif->ring.xenblkd = NULL;
> -		xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
> -		return;
> +	if (blkif->nr_rings == 1) {
> +		blkif->rings[0].xenblkd = kthread_run(xen_blkif_schedule, &blkif->rings[0], "%s", name);
> +		if (IS_ERR(blkif->rings[0].xenblkd)) {
> +			err = PTR_ERR(blkif->rings[0].xenblkd);
> +			blkif->rings[0].xenblkd = NULL;
> +			xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
> +			return;
> +		}
> +	} else {
> +		for (i = 0; i < blkif->nr_rings; i++) {
> +			snprintf(per_ring_name, BLKBACK_NAME_LEN + 2, "%s-%d", name, i);
> +			ring = &blkif->rings[i];
> +			ring->xenblkd = kthread_run(xen_blkif_schedule, ring, "%s", per_ring_name);
> +			if (IS_ERR(ring->xenblkd)) {
> +				err = PTR_ERR(ring->xenblkd);
> +				ring->xenblkd = NULL;
> +				xenbus_dev_error(blkif->be->dev, err,
> +						"start %s xenblkd", per_ring_name);
> +				return;
> +			}
> +		}
> +	}
> +}
> +
> +static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
> +{
> +	struct xen_blkif_ring *ring;
> +	int r;
> +
> +	blkif->rings = kzalloc(blkif->nr_rings * sizeof(struct xen_blkif_ring), GFP_KERNEL);
> +	if (!blkif->rings)
> +		return -ENOMEM;
> +
> +	for (r = 0; r < blkif->nr_rings; r++) {
> +		ring = &blkif->rings[r];
> +
> +		spin_lock_init(&ring->blk_ring_lock);
> +		init_waitqueue_head(&ring->wq);
> +		ring->st_print = jiffies;
> +		ring->persistent_gnts.rb_node = NULL;
> +		spin_lock_init(&ring->free_pages_lock);
> +		INIT_LIST_HEAD(&ring->free_pages);
> +		INIT_LIST_HEAD(&ring->persistent_purge_list);
> +		ring->free_pages_num = 0;
> +		atomic_set(&ring->persistent_gnt_in_use, 0);
> +		atomic_set(&ring->inflight, 0);
> +		INIT_WORK(&ring->persistent_purge_work, xen_blkbk_unmap_purged_grants);
> +		INIT_LIST_HEAD(&ring->pending_free);
> +
> +		spin_lock_init(&ring->pending_free_lock);
> +		init_waitqueue_head(&ring->pending_free_wq);
> +		init_waitqueue_head(&ring->shutdown_wq);

I've already commented on the previous patch, but a bunch of this needs
to be per-device rather than per-ring.

> +		ring->blkif = blkif;
> +		xen_blkif_get(blkif);
>  	}
> +
> +	return 0;
>  }
>  
>  static struct xen_blkif *xen_blkif_alloc(domid_t domid)
>  {
>  	struct xen_blkif *blkif;
> -	struct xen_blkif_ring *ring;
>  
>  	BUILD_BUG_ON(MAX_INDIRECT_PAGES > BLKIF_MAX_INDIRECT_PAGES_PER_REQUEST);
>  
> @@ -134,29 +186,16 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
>  		return ERR_PTR(-ENOMEM);
>  
>  	blkif->domid = domid;
> -	ring = &blkif->ring;
> -	ring->blkif = blkif;
> -	spin_lock_init(&ring->blk_ring_lock);
>  	atomic_set(&blkif->refcnt, 1);
> -	init_waitqueue_head(&ring->wq);
>  	init_completion(&blkif->drain_complete);
>  	atomic_set(&blkif->drain, 0);
> -	ring->st_print = jiffies;
> -	ring->persistent_gnts.rb_node = NULL;
> -	spin_lock_init(&ring->free_pages_lock);
> -	INIT_LIST_HEAD(&ring->free_pages);
> -	INIT_LIST_HEAD(&ring->persistent_purge_list);
> -	ring->free_pages_num = 0;
> -	atomic_set(&ring->persistent_gnt_in_use, 0);
> -	atomic_set(&ring->inflight, 0);
> -	INIT_WORK(&ring->persistent_purge_work, xen_blkbk_unmap_purged_grants);
> -
> -	INIT_LIST_HEAD(&ring->pending_free);
>  	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
> -	spin_lock_init(&ring->pending_free_lock);
> -	init_waitqueue_head(&ring->pending_free_wq);
> -	init_waitqueue_head(&ring->shutdown_wq);
>  
> +	blkif->nr_rings = 1;
> +	if (xen_blkif_alloc_rings(blkif)) {
> +		kmem_cache_free(xen_blkif_cachep, blkif);
> +		return ERR_PTR(-ENOMEM);
> +	}
>  	return blkif;
>  }
>  
> @@ -216,70 +255,78 @@ static int xen_blkif_map(struct xen_blkif_ring *ring, grant_ref_t *gref,
>  
>  static int xen_blkif_disconnect(struct xen_blkif *blkif)
>  {
> -	struct xen_blkif_ring *ring = &blkif->ring;
> +	struct xen_blkif_ring *ring;
> +	int i;
> +
> +	for (i = 0; i < blkif->nr_rings; i++) {
> +		ring = &blkif->rings[i];
> +		if (ring->xenblkd) {
> +			kthread_stop(ring->xenblkd);
> +			wake_up(&ring->shutdown_wq);
> +			ring->xenblkd = NULL;
> +		}
>  
> -	if (ring->xenblkd) {
> -		kthread_stop(ring->xenblkd);
> -		wake_up(&ring->shutdown_wq);
> -		ring->xenblkd = NULL;
> -	}
> +		/* The above kthread_stop() guarantees that at this point we
> +		 * don't have any discard_io or other_io requests. So, checking
> +		 * for inflight IO is enough.
> +		 */
> +		if (atomic_read(&ring->inflight) > 0)
> +			return -EBUSY;
>  
> -	/* The above kthread_stop() guarantees that at this point we
> -	 * don't have any discard_io or other_io requests. So, checking
> -	 * for inflight IO is enough.
> -	 */
> -	if (atomic_read(&ring->inflight) > 0)
> -		return -EBUSY;
> +		if (ring->irq) {
> +			unbind_from_irqhandler(ring->irq, ring);
> +			ring->irq = 0;
> +		}
>  
> -	if (ring->irq) {
> -		unbind_from_irqhandler(ring->irq, ring);
> -		ring->irq = 0;
> -	}
> +		if (ring->blk_rings.common.sring) {
> +			xenbus_unmap_ring_vfree(blkif->be->dev, ring->blk_ring);
> +			ring->blk_rings.common.sring = NULL;
> +		}
>  
> -	if (ring->blk_rings.common.sring) {
> -		xenbus_unmap_ring_vfree(blkif->be->dev, ring->blk_ring);
> -		ring->blk_rings.common.sring = NULL;
> +		/* Remove all persistent grants and the cache of ballooned pages. */
> +		xen_blkbk_free_caches(ring);
>  	}
>  
> -	/* Remove all persistent grants and the cache of ballooned pages. */
> -	xen_blkbk_free_caches(ring);
> -
>  	return 0;
>  }
>  
>  static void xen_blkif_free(struct xen_blkif *blkif)
>  {
>  	struct pending_req *req, *n;
> -	int i = 0, j;
> -	struct xen_blkif_ring *ring = &blkif->ring;
> +	int i = 0, j, r;
> +	struct xen_blkif_ring *ring;
>  
>  	xen_blkif_disconnect(blkif);
>  	xen_vbd_free(&blkif->vbd);
>  
> -	/* Make sure everything is drained before shutting down */
> -	BUG_ON(ring->persistent_gnt_c != 0);
> -	BUG_ON(atomic_read(&ring->persistent_gnt_in_use) != 0);
> -	BUG_ON(ring->free_pages_num != 0);
> -	BUG_ON(!list_empty(&ring->persistent_purge_list));
> -	BUG_ON(!list_empty(&ring->free_pages));
> -	BUG_ON(!RB_EMPTY_ROOT(&ring->persistent_gnts));
> +	for (r = 0; r < blkif->nr_rings; r++) {
> +		ring = &blkif->rings[r];
> +		/* Make sure everything is drained before shutting down */
> +		BUG_ON(ring->persistent_gnt_c != 0);
> +		BUG_ON(atomic_read(&ring->persistent_gnt_in_use) != 0);
> +		BUG_ON(ring->free_pages_num != 0);
> +		BUG_ON(!list_empty(&ring->persistent_purge_list));
> +		BUG_ON(!list_empty(&ring->free_pages));
> +		BUG_ON(!RB_EMPTY_ROOT(&ring->persistent_gnts));
>  
> -	/* Check that there is no request in use */
> -	list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
> -		list_del(&req->free_list);
> +		/* Check that there is no request in use */
> +		list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
> +			list_del(&req->free_list);
>  
> -		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++)
> -			kfree(req->segments[j]);
> +			for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++)
> +				kfree(req->segments[j]);
>  
> -		for (j = 0; j < MAX_INDIRECT_PAGES; j++)
> -			kfree(req->indirect_pages[j]);
> +			for (j = 0; j < MAX_INDIRECT_PAGES; j++)
> +				kfree(req->indirect_pages[j]);
>  
> -		kfree(req);
> -		i++;
> -	}
> +			kfree(req);
> +			i++;
> +		}
>  
> -	WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
> +		WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
> +	}
>  
> +	kfree(blkif->rings);
>  	kmem_cache_free(xen_blkif_cachep, blkif);
>  }
>  
> @@ -306,15 +353,19 @@ int __init xen_blkif_interface_init(void)
>  		struct xenbus_device *dev = to_xenbus_device(_dev);	\
>  		struct backend_info *be = dev_get_drvdata(&dev->dev);	\
>  		struct xen_blkif *blkif = be->blkif;			\
> -		struct xen_blkif_ring *ring = &blkif->ring;		\
> +		struct xen_blkif_ring *ring;				\
> +		int i;							\
>  									\
> -		blkif->st_oo_req = ring->st_oo_req;			\
> -		blkif->st_rd_req = ring->st_rd_req;			\
> -		blkif->st_wr_req = ring->st_wr_req;			\
> -		blkif->st_f_req = ring->st_f_req;			\
> -		blkif->st_ds_req = ring->st_ds_req;			\
> -		blkif->st_rd_sect = ring->st_rd_sect;			\
> -		blkif->st_wr_sect = ring->st_wr_sect;			\
> +		for (i = 0; i < blkif->nr_rings; i++) {			\
> +			ring = &blkif->rings[i];			\
> +			blkif->st_oo_req += ring->st_oo_req;		\
> +			blkif->st_rd_req += ring->st_rd_req;		\
> +			blkif->st_wr_req += ring->st_wr_req;		\
> +			blkif->st_f_req += ring->st_f_req;		\
> +			blkif->st_ds_req += ring->st_ds_req;		\
> +			blkif->st_rd_sect += ring->st_rd_sect;		\
> +			blkif->st_wr_sect += ring->st_wr_sect;		\
> +		}							\
>  									\
>  		return sprintf(buf, format, ##args);			\
>  	}								\
> @@ -438,6 +489,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
>  static int xen_blkbk_remove(struct xenbus_device *dev)
>  {
>  	struct backend_info *be = dev_get_drvdata(&dev->dev);
> +	int i;
>  
>  	pr_debug("%s %p %d\n", __func__, dev, dev->otherend_id);
>  
> @@ -454,7 +506,8 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
>  
>  	if (be->blkif) {
>  		xen_blkif_disconnect(be->blkif);
> -		xen_blkif_put(be->blkif);
> +		for (i = 0; i < be->blkif->nr_rings; i++)
> +			xen_blkif_put(be->blkif);
>  	}
>  
>  	kfree(be->mode);
> @@ -837,21 +890,16 @@ again:
>  	xenbus_transaction_end(xbt, 1);
>  }
>  
> -
> -static int connect_ring(struct backend_info *be)
> +static int read_per_ring_refs(struct xen_blkif_ring *ring, const char *dir)
>  {
> -	struct xenbus_device *dev = be->dev;
> +	unsigned int ring_page_order, nr_grefs, evtchn;
>  	unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
> -	unsigned int evtchn, nr_grefs, ring_page_order;
> -	unsigned int pers_grants;
> -	char protocol[64] = "";
>  	struct pending_req *req, *n;
>  	int err, i, j;
> -	struct xen_blkif_ring *ring = &be->blkif->ring;
> -
> -	pr_debug("%s %s\n", __func__, dev->otherend);
> +	struct xen_blkif *blkif = ring->blkif;
> +	struct xenbus_device *dev = blkif->be->dev;
>  
> -	err = xenbus_scanf(XBT_NIL, dev->otherend, "event-channel", "%u",
> +	err = xenbus_scanf(XBT_NIL, dir, "event-channel", "%u",
>  			  &evtchn);
>  	if (err != 1) {
>  		err = -EINVAL;
> @@ -859,12 +907,11 @@ static int connect_ring(struct backend_info *be)
>  				 dev->otherend);
>  		return err;
>  	}
> -	pr_info("event-channel %u\n", evtchn);
>  
>  	err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
>  			  &ring_page_order);
>  	if (err != 1) {
> -		err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-ref",
> +		err = xenbus_scanf(XBT_NIL, dir, "ring-ref",
>  				  "%u", &ring_ref[0]);
>  		if (err != 1) {
>  			err = -EINVAL;
> @@ -873,7 +920,7 @@ static int connect_ring(struct backend_info *be)
>  			return err;
>  		}
>  		nr_grefs = 1;
> -		pr_info("%s:using single page: ring-ref %d\n", dev->otherend,
> +		pr_info("%s:using single page: ring-ref %d\n", dir,
>  			ring_ref[0]);
>  	} else {
>  		unsigned int i;
> @@ -891,7 +938,7 @@ static int connect_ring(struct backend_info *be)
>  			char ring_ref_name[RINGREF_NAME_LEN];
>  
>  			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
> -			err = xenbus_scanf(XBT_NIL, dev->otherend, ring_ref_name,
> +			err = xenbus_scanf(XBT_NIL, dir, ring_ref_name,
>  					   "%u", &ring_ref[i]);
>  			if (err != 1) {
>  				err = -EINVAL;
> @@ -899,38 +946,12 @@ static int connect_ring(struct backend_info *be)
>  						 dev->otherend, ring_ref_name);
>  				return err;
>  			}
> -			pr_info("ring-ref%u: %u\n", i, ring_ref[i]);
> +			pr_info("%s: ring-ref%u: %u\n", dir, i, ring_ref[i]);
>  		}
>  	}
> +	blkif->nr_ring_pages = nr_grefs;
>  
> -	be->blkif->blk_protocol = BLKIF_PROTOCOL_DEFAULT;
> -	err = xenbus_gather(XBT_NIL, dev->otherend, "protocol",
> -			    "%63s", protocol, NULL);
> -	if (err)
> -		strcpy(protocol, "unspecified, assuming default");
> -	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_NATIVE))
> -		be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
> -	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_32))
> -		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_32;
> -	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_64))
> -		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_64;
> -	else {
> -		xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
> -		return -1;
> -	}
> -	err = xenbus_gather(XBT_NIL, dev->otherend,
> -			    "feature-persistent", "%u",
> -			    &pers_grants, NULL);
> -	if (err)
> -		pers_grants = 0;
> -
> -	be->blkif->vbd.feature_gnt_persistent = pers_grants;
> -	be->blkif->vbd.overflow_max_grants = 0;
> -	be->blkif->nr_ring_pages = nr_grefs;
> -
> -	pr_info("ring-pages:%d, event-channel %d, protocol %d (%s) %s\n",
> -		nr_grefs, evtchn, be->blkif->blk_protocol, protocol,
> -		pers_grants ? "persistent grants" : "");
> +	pr_info("ring-pages:%d, event-channel %d.\n", nr_grefs, evtchn);
>  
>  	for (i = 0; i < nr_grefs * XEN_BLKIF_REQS_PER_PAGE; i++) {
>  		req = kzalloc(sizeof(*req), GFP_KERNEL);
> @@ -975,6 +996,71 @@ fail:
>  		kfree(req);
>  	}
>  	return -ENOMEM;
> +
> +}
> +
> +static int connect_ring(struct backend_info *be)
> +{
> +	struct xenbus_device *dev = be->dev;
> +	unsigned int pers_grants;
> +	char protocol[64] = "";
> +	int err, i;
> +	char *xspath;
> +	size_t xspathsize;
> +	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */

Maybe a define would be better than a const local variable here?
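
For example something like this (the name is purely illustrative):

	/* Big enough for the "/queue-NNN" suffix including the trailing NUL. */
	#define QUEUE_XSPATH_EXT_LEN	11

	xspathsize = strlen(dev->otherend) + QUEUE_XSPATH_EXT_LEN;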

> +
> +	pr_debug("%s %s\n", __func__, dev->otherend);
> +
> +	be->blkif->blk_protocol = BLKIF_PROTOCOL_DEFAULT;
> +	err = xenbus_gather(XBT_NIL, dev->otherend, "protocol",
> +			    "%63s", protocol, NULL);
> +	if (err)
> +		strcpy(protocol, "unspecified, assuming default");
> +	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_NATIVE))
> +		be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
> +	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_32))
> +		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_32;
> +	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_64))
> +		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_64;
> +	else {
> +		xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
> +		return -1;
> +	}
> +	err = xenbus_gather(XBT_NIL, dev->otherend,
> +			    "feature-persistent", "%u",
> +			    &pers_grants, NULL);
> +	if (err)
> +		pers_grants = 0;
> +
> +	be->blkif->vbd.feature_gnt_persistent = pers_grants;
> +	be->blkif->vbd.overflow_max_grants = 0;
> +
> +	pr_info("nr_rings:%d protocol %d (%s) %s\n", be->blkif->nr_rings,
> +		 be->blkif->blk_protocol, protocol,
> +		 pers_grants ? "persistent grants" : "");
> +
> +	if (be->blkif->nr_rings == 1)
> +		return read_per_ring_refs(&be->blkif->rings[0], dev->otherend);
> +	else {
> +		xspathsize = strlen(dev->otherend) + xenstore_path_ext_size;
> +		xspath = kzalloc(xspathsize, GFP_KERNEL);
> +		if (!xspath) {
> +			xenbus_dev_fatal(dev, -ENOMEM, "reading ring references");
> +			return -ENOMEM;
> +		}
> +
> +		for (i = 0; i < be->blkif->nr_rings; i++) {
> +			memset(xspath, 0, xspathsize);
> +			snprintf(xspath, xspathsize, "%s/queue-%u", dev->otherend, i);
> +			err = read_per_ring_refs(&be->blkif->rings[i], xspath);
> +			if (err) {
> +				kfree(xspath);
> +				return err;
> +			}
> +		}
> +		kfree(xspath);
> +	}
> +	return 0;
>  }
>  
>  static const struct xenbus_device_id xen_blkbk_ids[] = {
> 


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 9/9] xen/blkback: get number of hardware queues/rings from blkfront
  2015-09-05 12:39   ` Bob Liu
  (?)
  (?)
@ 2015-10-05 15:15   ` Roger Pau Monné
  2015-10-07 10:54     ` Bob Liu
  2015-10-07 10:54     ` Bob Liu
  -1 siblings, 2 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-05 15:15 UTC (permalink / raw)
  To: Bob Liu, xen-devel
  Cc: david.vrabel, linux-kernel, konrad.wilk, felipe.franciosi, axboe,
	hch, avanzini.arianna, rafal.mielniczuk, boris.ostrovsky,
	jonathan.davies

On 05/09/15 at 14:39, Bob Liu wrote:
> Backend advertises "multi-queue-max-queues" to front, and then read back the
> final negotiated queues/rings from "multi-queue-num-queues" which is wrote by
> blkfront.
> 
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkback/blkback.c |    8 ++++++++
>  drivers/block/xen-blkback/xenbus.c  |   36 ++++++++++++++++++++++++++++++-----
>  2 files changed, 39 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index fd02240..b904fe05f0 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -83,6 +83,11 @@ module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
>  MODULE_PARM_DESC(max_persistent_grants,
>                   "Maximum number of grants to map persistently");
>  
> +unsigned int xenblk_max_queues;
> +module_param_named(max_queues, xenblk_max_queues, uint, 0644);
> +MODULE_PARM_DESC(max_queues,
> +		 "Maximum number of hardware queues per virtual disk");
> +
>  /*
>   * Maximum order of pages to be used for the shared ring between front and
>   * backend, 4KB page granularity is used.
> @@ -1458,6 +1463,9 @@ static int __init xen_blkif_init(void)
>  		xen_blkif_max_ring_order = XENBUS_MAX_RING_PAGE_ORDER;
>  	}
>  
> +	/* Allow as many queues as there are CPUs, by default */
> +	xenblk_max_queues = num_online_cpus();

Hm, I'm not sure of the best way to set a default value for this.
Consider for example a scenario where Dom0 is limited to 2 vCPUs, but DomU
has 8 vCPUs. Are we going to limit the number of queues to two? Is that
the most appropriate value from a performance PoV?

I have to admit I don't have a clear idea of a default value for this
field, and maybe the number of CPUs on the backend is indeed what works
better, but there needs to be a comment explaining the reasoning behind
this setting.
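
If the backend CPU count is kept as the basis, the default could at
least be made overridable and capped, with the reasoning spelled out.
A sketch only, the cap of 4 is arbitrary and purely for illustration:

	/* Honour an admin-supplied max_queues; otherwise default to one queue
	 * per backend CPU, capped so a large dom0 does not translate into a
	 * huge amount of per-vbd rings.  The frontend can still ask for fewer.
	 */
	if (xenblk_max_queues == 0)
		xenblk_max_queues = min(num_online_cpus(), 4U);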

>  	rc = xen_blkif_interface_init();
>  	if (rc)
>  		goto failed_init;
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index 04b8d0d..aa97ea5 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -28,6 +28,8 @@
>  #define RINGREF_NAME_LEN (20)
>  #define RINGREF_NAME_LEN (20)
>  
> +extern unsigned int xenblk_max_queues;

This should live in blkback/common.h
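
That is, roughly, next to the other declarations:

	/* drivers/block/xen-blkback/common.h -- sketch */
	extern unsigned int xenblk_max_queues;
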

> +
>  struct backend_info {
>  	struct xenbus_device	*dev;
>  	struct xen_blkif	*blkif;
> @@ -191,11 +193,6 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
>  	atomic_set(&blkif->drain, 0);
>  	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
>  
> -	blkif->nr_rings = 1;
> -	if (xen_blkif_alloc_rings(blkif)) {
> -		kmem_cache_free(xen_blkif_cachep, blkif);
> -		return ERR_PTR(-ENOMEM);
> -	}
>  	return blkif;
>  }
>  
> @@ -618,6 +615,14 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
>  		goto fail;
>  	}
>  
> +	/* Multi-queue: wrte how many queues backend supported. */
                        ^ write how many queues are supported by the
backend.
> +	err = xenbus_printf(XBT_NIL, dev->nodename,
> +			    "multi-queue-max-queues", "%u", xenblk_max_queues);
> +	if (err) {
> +		pr_debug("Error writing multi-queue-num-queues\n");
                ^ pr_warn at least.
> +		goto fail;
> +	}
> +
>  	/* setup back pointer */
>  	be->blkif->be = be;
>  
> @@ -1008,6 +1013,7 @@ static int connect_ring(struct backend_info *be)
>  	char *xspath;
>  	size_t xspathsize;
>  	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
> +	unsigned int requested_num_queues = 0;
>  
>  	pr_debug("%s %s\n", __func__, dev->otherend);
>  
> @@ -1035,6 +1041,26 @@ static int connect_ring(struct backend_info *be)
>  	be->blkif->vbd.feature_gnt_persistent = pers_grants;
>  	be->blkif->vbd.overflow_max_grants = 0;
>  
> +	/*
> +	 * Read the number of hardware queus from frontend.
                                       ^ queues
> +	 */
> +	err = xenbus_scanf(XBT_NIL, dev->otherend, "multi-queue-num-queues", "%u", &requested_num_queues);
> +	if (err < 0) {
> +		requested_num_queues = 1;
> +	} else {
> +		if (requested_num_queues > xenblk_max_queues
> +		    || requested_num_queues == 0) {
> +			/* buggy or malicious guest */
> +			xenbus_dev_fatal(dev, err,
> +					"guest requested %u queues, exceeding the maximum of %u.",
> +					requested_num_queues, xenblk_max_queues);
> +			return -1;
> +		}
> +	}
> +	be->blkif->nr_rings = requested_num_queues;
> +	if (xen_blkif_alloc_rings(be->blkif))
> +		return -ENOMEM;
> +
>  	pr_info("nr_rings:%d protocol %d (%s) %s\n", be->blkif->nr_rings,
>  		 be->blkif->blk_protocol, protocol,
>  		 pers_grants ? "persistent grants" : "");
> 


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 3/9] xen/blkfront: separate per ring information out of device info
  2015-10-03  0:34     ` Bob Liu
  2015-10-05 15:17       ` Roger Pau Monné
@ 2015-10-05 15:17       ` Roger Pau Monné
  1 sibling, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-05 15:17 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies

El 03/10/15 a les 2.34, Bob Liu ha escrit:
> 
> On 10/03/2015 01:02 AM, Roger Pau Monné wrote:
>> El 05/09/15 a les 14.39, Bob Liu ha escrit:
>> In order to make this easier to review, do you think you can leave
>> blkfront_info as "info" for now, and do the renaming to dinfo in a later
>> patch. That would help figuring out mechanical name changes from the
>> actual meat of the patch.
>>
> 
That's what I did in v2, but believe me, it's more difficult to read and review.
There are a lot of places where info and rinfo are mixed together; whenever you see an info you have to work out
whether it is the device info or the ring info. It's more straightforward to use dinfo and rinfo to distinguish them from the beginning.

Ack, I was thinking of ways to limit the diff of this series, because
it's very big and hard to review in general. If you think the current
split is the best one I'm fine with that, thanks for trying to keep it
to a minimum.

Roger.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 4/9] xen/blkfront: pseudo support for multi hardware queues/rings
  2015-10-05 10:52   ` Roger Pau Monné
@ 2015-10-07 10:28     ` Bob Liu
  2015-10-07 10:28     ` Bob Liu
  1 sibling, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-10-07 10:28 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies


On 10/05/2015 06:52 PM, Roger Pau Monné wrote:
> El 05/09/15 a les 14.39, Bob Liu ha escrit:
>> Prepare patch for multi hardware queues/rings, the ring number was set to 1 by
>> force.
>>
>> * Use 'nr_rings' in per dev_info to identify how many hw queues/rings are
>>   supported, and a pointer *rinfo for all its rings.
>> * Rename 'nr_ring_pages' => 'pages_per_ring' to distinguish from 'nr_rings'
>>   better.
>>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkfront.c |  513 +++++++++++++++++++++++++-----------------
>>  1 file changed, 308 insertions(+), 205 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index bf416d5..bf45c99 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
>>  module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
>>  MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
>>  
>> -#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->nr_ring_pages)
>> +#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->pages_per_ring)
>>  #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
>>  /*
>>   * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
>> @@ -157,9 +157,10 @@ struct blkfront_dev_info {
>>  	unsigned int feature_persistent:1;
>>  	unsigned int max_indirect_segments;
>>  	int is_ready;
>> -	unsigned int nr_ring_pages;
>> +	unsigned int pages_per_ring;
> 
> Why do you rename this field? nr_ring_pages seems more consistent with
> the nr_rings field that you add below IMO, but that might be a matter of
> taste.
> 

I think that once nr_rings is introduced, nr_ring_pages is no longer suitable, because it can be
misread as the total number of ring pages rather than the number per ring.

So I prefer to rename it to pages_per_ring.
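
A minimal sketch of the distinction, using the two fields from this patch
(the local variable is only for illustration):

	/* each hardware queue (ring) uses pages_per_ring shared-ring pages,
	 * and there are nr_rings such queues, so in total: */
	unsigned int total_ring_pages = dinfo->nr_rings * dinfo->pages_per_ring;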

>>  	struct blk_mq_tag_set tag_set;
>> -	struct blkfront_ring_info rinfo;
>> +	struct blkfront_ring_info *rinfo;
>> +	unsigned int nr_rings;
>>  };
>>  
>>  static unsigned int nr_minors;
>> @@ -191,7 +192,7 @@ static DEFINE_SPINLOCK(minor_lock);
>>  	((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
>>  
>>  static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
>> -static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
>> +static void __blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
>>  
>>  static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
>>  {
>> @@ -668,7 +669,7 @@ static int blk_mq_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
>>  {
>>  	struct blkfront_dev_info *dinfo = (struct blkfront_dev_info *)data;
>>  
>> -	hctx->driver_data = &dinfo->rinfo;
>> +	hctx->driver_data = &dinfo->rinfo[index];
>>  	return 0;
>>  }
>>  
>> @@ -927,8 +928,8 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
>>  
>>  static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
>>  {
>> -	unsigned int minor, nr_minors;
>> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
>> +	unsigned int minor, nr_minors, i;
>> +	struct blkfront_ring_info *rinfo;
>>  
>>  	if (dinfo->rq == NULL)
>>  		return;
>> @@ -936,11 +937,15 @@ static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
>>  	/* No more blkif_request(). */
>>  	blk_mq_stop_hw_queues(dinfo->rq);
>>  
>> -	/* No more gnttab callback work. */
>> -	gnttab_cancel_free_callback(&rinfo->callback);
>> +	for (i = 0; i < dinfo->nr_rings; i++) {
> 
> I would be tempted to declare rinfo only inside the for loop, to limit
> the scope:

Will update.

> 
> 		struct blkfront_ring_info *rinfo = &dinfo->rinfo[i];
> 
>> +		rinfo = &dinfo->rinfo[i];
>>  
>> -	/* Flush gnttab callback work. Must be done with no locks held. */
>> -	flush_work(&rinfo->work);
>> +		/* No more gnttab callback work. */
>> +		gnttab_cancel_free_callback(&rinfo->callback);
>> +
>> +		/* Flush gnttab callback work. Must be done with no locks held. */
>> +		flush_work(&rinfo->work);
>> +	}
>>  
>>  	del_gendisk(dinfo->gd);
>>  
>> @@ -977,8 +982,8 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
>>  {
>>  	struct grant *persistent_gnt;
>>  	struct grant *n;
>> -	int i, j, segs;
>> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
>> +	int i, j, segs, r_index;
>> +	struct blkfront_ring_info *rinfo;
>>  
>>  	/* Prevent new requests being issued until we fix things up. */
>>  	spin_lock_irq(&dinfo->io_lock);
>> @@ -988,100 +993,103 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
>>  	if (dinfo->rq)
>>  		blk_mq_stop_hw_queues(dinfo->rq);
>>  
>> -	/* Remove all persistent grants */
>> -	if (!list_empty(&rinfo->grants)) {
>> -		list_for_each_entry_safe(persistent_gnt, n,
>> -					 &rinfo->grants, node) {
>> -			list_del(&persistent_gnt->node);
>> -			if (persistent_gnt->gref != GRANT_INVALID_REF) {
>> -				gnttab_end_foreign_access(persistent_gnt->gref,
>> -				                          0, 0UL);
>> -				rinfo->persistent_gnts_c--;
>> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>> +		rinfo = &dinfo->rinfo[r_index];
> 
> struct blkfront_ring_info *rinfo = &dinfo->rinfo[r_index];
> 
> Would it be helpful to place all this code inside of a helper function,
> ie: blkif_free_ring?
> 

Sure, will update.
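
A condensed sketch of that split (blkif_free_ring is the helper name
suggested above; its body is reduced to a comment here):

	static void blkif_free_ring(struct blkfront_ring_info *rinfo)
	{
		/* free this ring's persistent grants, indirect pages,
		 * shadow entries and gnttab callback, i.e. the loop body
		 * quoted in this hunk */
	}

	/* blkif_free() then just iterates over the rings: */
	for (r_index = 0; r_index < dinfo->nr_rings; r_index++)
		blkif_free_ring(&dinfo->rinfo[r_index]);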

>> +
>> +		/* Remove all persistent grants */
>> +		if (!list_empty(&rinfo->grants)) {
>> +			list_for_each_entry_safe(persistent_gnt, n,
>> +						 &rinfo->grants, node) {
>> +				list_del(&persistent_gnt->node);
>> +				if (persistent_gnt->gref != GRANT_INVALID_REF) {
>> +					gnttab_end_foreign_access(persistent_gnt->gref,
>> +								  0, 0UL);
>> +					rinfo->persistent_gnts_c--;
>> +				}
>> +				if (dinfo->feature_persistent)
>> +					__free_page(pfn_to_page(persistent_gnt->pfn));
>> +				kfree(persistent_gnt);
>>  			}
>> -			if (dinfo->feature_persistent)
>> -				__free_page(pfn_to_page(persistent_gnt->pfn));
>> -			kfree(persistent_gnt);
>>  		}
>> -	}
>> -	BUG_ON(rinfo->persistent_gnts_c != 0);
>> +		BUG_ON(rinfo->persistent_gnts_c != 0);
>>  
>> -	/*
>> -	 * Remove indirect pages, this only happens when using indirect
>> -	 * descriptors but not persistent grants
>> -	 */
>> -	if (!list_empty(&rinfo->indirect_pages)) {
>> -		struct page *indirect_page, *n;
>> -
>> -		BUG_ON(dinfo->feature_persistent);
>> -		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
>> -			list_del(&indirect_page->lru);
>> -			__free_page(indirect_page);
>> -		}
>> -	}
>> -
>> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
>>  		/*
>> -		 * Clear persistent grants present in requests already
>> -		 * on the shared ring
>> +		 * Remove indirect pages, this only happens when using indirect
>> +		 * descriptors but not persistent grants
>>  		 */
>> -		if (!rinfo->shadow[i].request)
>> -			goto free_shadow;
>> -
>> -		segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
>> -		       rinfo->shadow[i].req.u.indirect.nr_segments :
>> -		       rinfo->shadow[i].req.u.rw.nr_segments;
>> -		for (j = 0; j < segs; j++) {
>> -			persistent_gnt = rinfo->shadow[i].grants_used[j];
>> -			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
>> -			if (dinfo->feature_persistent)
>> -				__free_page(pfn_to_page(persistent_gnt->pfn));
>> -			kfree(persistent_gnt);
>> +		if (!list_empty(&rinfo->indirect_pages)) {
>> +			struct page *indirect_page, *n;
>> +
>> +			BUG_ON(dinfo->feature_persistent);
>> +			list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
>> +				list_del(&indirect_page->lru);
>> +				__free_page(indirect_page);
>> +			}
>>  		}
>>  
>> -		if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
>> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
>>  			/*
>> -			 * If this is not an indirect operation don't try to
>> -			 * free indirect segments
>> +			 * Clear persistent grants present in requests already
>> +			 * on the shared ring
>>  			 */
>> -			goto free_shadow;
>> +			if (!rinfo->shadow[i].request)
>> +				goto free_shadow;
>> +
>> +			segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
>> +			       rinfo->shadow[i].req.u.indirect.nr_segments :
>> +			       rinfo->shadow[i].req.u.rw.nr_segments;
>> +			for (j = 0; j < segs; j++) {
>> +				persistent_gnt = rinfo->shadow[i].grants_used[j];
>> +				gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
>> +				if (dinfo->feature_persistent)
>> +					__free_page(pfn_to_page(persistent_gnt->pfn));
>> +				kfree(persistent_gnt);
>> +			}
>>  
>> -		for (j = 0; j < INDIRECT_GREFS(segs); j++) {
>> -			persistent_gnt = rinfo->shadow[i].indirect_grants[j];
>> -			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
>> -			__free_page(pfn_to_page(persistent_gnt->pfn));
>> -			kfree(persistent_gnt);
>> -		}
>> +			if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
>> +				/*
>> +				 * If this is not an indirect operation don't try to
>> +				 * free indirect segments
>> +				 */
>> +				goto free_shadow;
>> +
>> +			for (j = 0; j < INDIRECT_GREFS(segs); j++) {
>> +				persistent_gnt = rinfo->shadow[i].indirect_grants[j];
>> +				gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
>> +				__free_page(pfn_to_page(persistent_gnt->pfn));
>> +				kfree(persistent_gnt);
>> +			}
>>  
>>  free_shadow:
>> -		kfree(rinfo->shadow[i].grants_used);
>> -		rinfo->shadow[i].grants_used = NULL;
>> -		kfree(rinfo->shadow[i].indirect_grants);
>> -		rinfo->shadow[i].indirect_grants = NULL;
>> -		kfree(rinfo->shadow[i].sg);
>> -		rinfo->shadow[i].sg = NULL;
>> -	}
>> +			kfree(rinfo->shadow[i].grants_used);
>> +			rinfo->shadow[i].grants_used = NULL;
>> +			kfree(rinfo->shadow[i].indirect_grants);
>> +			rinfo->shadow[i].indirect_grants = NULL;
>> +			kfree(rinfo->shadow[i].sg);
>> +			rinfo->shadow[i].sg = NULL;
>> +		}
>>  
>> -	/* No more gnttab callback work. */
>> -	gnttab_cancel_free_callback(&rinfo->callback);
>> -	spin_unlock_irq(&dinfo->io_lock);
>> +		/* No more gnttab callback work. */
>> +		gnttab_cancel_free_callback(&rinfo->callback);
>> +		spin_unlock_irq(&dinfo->io_lock);
>>  
>> -	/* Flush gnttab callback work. Must be done with no locks held. */
>> -	flush_work(&rinfo->work);
>> +		/* Flush gnttab callback work. Must be done with no locks held. */
>> +		flush_work(&rinfo->work);
>>  
>> -	/* Free resources associated with old device channel. */
>> -	for (i = 0; i < dinfo->nr_ring_pages; i++) {
>> -		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
>> -			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
>> -			rinfo->ring_ref[i] = GRANT_INVALID_REF;
>> +		/* Free resources associated with old device channel. */
>> +		for (i = 0; i < dinfo->pages_per_ring; i++) {
>> +			if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
>> +				gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
>> +				rinfo->ring_ref[i] = GRANT_INVALID_REF;
>> +			}
>>  		}
>> -	}
>> -	free_pages((unsigned long)rinfo->ring.sring, get_order(dinfo->nr_ring_pages * PAGE_SIZE));
>> -	rinfo->ring.sring = NULL;
>> -
>> -	if (rinfo->irq)
>> -		unbind_from_irqhandler(rinfo->irq, rinfo);
>> -	rinfo->evtchn = rinfo->irq = 0;
>> +		free_pages((unsigned long)rinfo->ring.sring, get_order(dinfo->pages_per_ring * PAGE_SIZE));
>> +		rinfo->ring.sring = NULL;
>>  
>> +		if (rinfo->irq)
>> +			unbind_from_irqhandler(rinfo->irq, rinfo);
>> +		rinfo->evtchn = rinfo->irq = 0;
>> +	}
>>  }
>>  
>>  static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *rinfo,
>> @@ -1276,6 +1284,26 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>  	return IRQ_HANDLED;
>>  }
>>  
>> +static void destroy_blkring(struct xenbus_device *dev,
>> +			    struct blkfront_ring_info *rinfo)
>> +{
>> +	int i;
>> +
>> +	if (rinfo->irq)
>> +		unbind_from_irqhandler(rinfo->irq, rinfo);
>> +	if (rinfo->evtchn)
>> +		xenbus_free_evtchn(dev, rinfo->evtchn);
>> +
>> +	for (i = 0; i < rinfo->dinfo->pages_per_ring; i++) {
>> +		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
>> +			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
>> +			rinfo->ring_ref[i] = GRANT_INVALID_REF;
>> +		}
>> +	}
>> +	free_pages((unsigned long)rinfo->ring.sring,
>> +		   get_order(rinfo->dinfo->pages_per_ring * PAGE_SIZE));
>> +	rinfo->ring.sring = NULL;
>> +}
>>  
>>  static int setup_blkring(struct xenbus_device *dev,
>>  			 struct blkfront_ring_info *rinfo)
>> @@ -1283,10 +1311,10 @@ static int setup_blkring(struct xenbus_device *dev,
>>  	struct blkif_sring *sring;
>>  	int err, i;
>>  	struct blkfront_dev_info *dinfo = rinfo->dinfo;
>> -	unsigned long ring_size = dinfo->nr_ring_pages * PAGE_SIZE;
>> +	unsigned long ring_size = dinfo->pages_per_ring * PAGE_SIZE;
>>  	grant_ref_t gref[XENBUS_MAX_RING_PAGES];
>>  
>> -	for (i = 0; i < dinfo->nr_ring_pages; i++)
>> +	for (i = 0; i < dinfo->pages_per_ring; i++)
>>  		rinfo->ring_ref[i] = GRANT_INVALID_REF;
>>  
>>  	sring = (struct blkif_sring *)__get_free_pages(GFP_NOIO | __GFP_HIGH,
>> @@ -1298,13 +1326,13 @@ static int setup_blkring(struct xenbus_device *dev,
>>  	SHARED_RING_INIT(sring);
>>  	FRONT_RING_INIT(&rinfo->ring, sring, ring_size);
>>  
>> -	err = xenbus_grant_ring(dev, rinfo->ring.sring, dinfo->nr_ring_pages, gref);
>> +	err = xenbus_grant_ring(dev, rinfo->ring.sring, dinfo->pages_per_ring, gref);
>>  	if (err < 0) {
>>  		free_pages((unsigned long)sring, get_order(ring_size));
>>  		rinfo->ring.sring = NULL;
>>  		goto fail;
>>  	}
>> -	for (i = 0; i < dinfo->nr_ring_pages; i++)
>> +	for (i = 0; i < dinfo->pages_per_ring; i++)
>>  		rinfo->ring_ref[i] = gref[i];
>>  
>>  	err = xenbus_alloc_evtchn(dev, &rinfo->evtchn);
>> @@ -1322,7 +1350,7 @@ static int setup_blkring(struct xenbus_device *dev,
>>  
>>  	return 0;
>>  fail:
>> -	blkif_free(dinfo, 0);
>> +	destroy_blkring(dev, rinfo);
> 
> blkif_free used to clean a lot more than what destroy_blkring does, is
> this right?
> 

I think it's fine because we are still in the early setup stage.
But to minimize the changes I won't introduce destroy_blkring() in the next version after all.

blkif_free() will also be deleted here, because talk_to_blkback() will call blkif_free() anyway if an error is returned.
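
A sketch of the error handling described above (how the next version is
planned to look, heavily condensed): setup_blkring() simply returns the
error, and talk_to_blkback() funnels every failure into one blkif_free()
call:

	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
		err = setup_blkring(dev, &dinfo->rinfo[r_index]);
		if (err)
			goto out;	/* no per-ring cleanup needed here */
	}
	/* ... xenstore transaction, ring-refs, event channels ... */
	return 0;
 out:
	blkif_free(dinfo, 0);	/* frees whatever has been set up so far */
	return err;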

>>  	return err;
>>  }
>>  
>> @@ -1333,65 +1361,76 @@ static int talk_to_blkback(struct xenbus_device *dev,
>>  {
>>  	const char *message = NULL;
>>  	struct xenbus_transaction xbt;
>> -	int err, i;
>> +	int err, i, r_index;
>>  	unsigned int max_page_order = 0;
>>  	unsigned int ring_page_order = 0;
>> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
>> +	struct blkfront_ring_info *rinfo;
>>  
>>  	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
>>  			   "max-ring-page-order", "%u", &max_page_order);
>>  	if (err != 1)
>> -		dinfo->nr_ring_pages = 1;
>> +		dinfo->pages_per_ring = 1;
>>  	else {
>>  		ring_page_order = min(xen_blkif_max_ring_order, max_page_order);
>> -		dinfo->nr_ring_pages = 1 << ring_page_order;
>> +		dinfo->pages_per_ring = 1 << ring_page_order;
> 
> As said above, I think nr_ring_pages is perfectly fine, and avoids all
> this ponintless changes.
> 
>>  	}
>>  
>> -	/* Create shared ring, alloc event channel. */
>> -	err = setup_blkring(dev, rinfo);
>> -	if (err)
>> -		goto out;
>> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>> +		rinfo = &dinfo->rinfo[r_index];
>> +		/* Create shared ring, alloc event channel. */
>> +		err = setup_blkring(dev, rinfo);
>> +		if (err)
>> +			goto out;
>> +	}
>>  
>>  again:
>>  	err = xenbus_transaction_start(&xbt);
>>  	if (err) {
>>  		xenbus_dev_fatal(dev, err, "starting transaction");
>> -		goto destroy_blkring;
>> +		goto out;
>>  	}
>>  
>> -	if (dinfo->nr_ring_pages == 1) {
>> -		err = xenbus_printf(xbt, dev->nodename,
>> -				    "ring-ref", "%u", rinfo->ring_ref[0]);
>> -		if (err) {
>> -			message = "writing ring-ref";
>> -			goto abort_transaction;
>> -		}
>> -	} else {
>> -		err = xenbus_printf(xbt, dev->nodename,
>> -				    "ring-page-order", "%u", ring_page_order);
>> -		if (err) {
>> -			message = "writing ring-page-order";
>> -			goto abort_transaction;
>> -		}
>> -
>> -		for (i = 0; i < dinfo->nr_ring_pages; i++) {
>> -			char ring_ref_name[RINGREF_NAME_LEN];
>> +	if (dinfo->nr_rings == 1) {
>> +		rinfo = &dinfo->rinfo[0];
>>  
>> -			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
>> -			err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
>> -					    "%u", rinfo->ring_ref[i]);
>> +		if (dinfo->pages_per_ring == 1) {
>> +			err = xenbus_printf(xbt, dev->nodename,
>> +					    "ring-ref", "%u", rinfo->ring_ref[0]);
>>  			if (err) {
>>  				message = "writing ring-ref";
>>  				goto abort_transaction;
>>  			}
>> +		} else {
>> +			err = xenbus_printf(xbt, dev->nodename,
>> +					    "ring-page-order", "%u", ring_page_order);
>> +			if (err) {
>> +				message = "writing ring-page-order";
>> +				goto abort_transaction;
>> +			}
>> +
>> +			for (i = 0; i < dinfo->pages_per_ring; i++) {
>> +				char ring_ref_name[RINGREF_NAME_LEN];
>> +
>> +				snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
>> +				err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
>> +						    "%u", rinfo->ring_ref[i]);
>> +				if (err) {
>> +					message = "writing ring-ref";
>> +					goto abort_transaction;
>> +				}
>> +			}
>>  		}
>> -	}
>> -	err = xenbus_printf(xbt, dev->nodename,
>> -			    "event-channel", "%u", rinfo->evtchn);
>> -	if (err) {
>> -		message = "writing event-channel";
>> +		err = xenbus_printf(xbt, dev->nodename,
>> +				    "event-channel", "%u", rinfo->evtchn);
>> +		if (err) {
>> +			message = "writing event-channel";
>> +			goto abort_transaction;
>> +		}
>> +	} else {
>> +		/* Not supported at this stage */
>>  		goto abort_transaction;
>>  	}
>> +
>>  	err = xenbus_printf(xbt, dev->nodename, "protocol", "%s",
>>  			    XEN_IO_PROTO_ABI_NATIVE);
>>  	if (err) {
>> @@ -1409,12 +1448,16 @@ again:
>>  		if (err == -EAGAIN)
>>  			goto again;
>>  		xenbus_dev_fatal(dev, err, "completing transaction");
>> -		goto destroy_blkring;
>> +		goto out;
>>  	}
>>  
>> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
>> -		rinfo->shadow[i].req.u.rw.id = i+1;
>> -	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
>> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>> +		rinfo = &dinfo->rinfo[r_index];
>> +
>> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
>> +			rinfo->shadow[i].req.u.rw.id = i+1;
>> +		rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
>> +	}
>>  	xenbus_switch_state(dev, XenbusStateInitialised);
>>  
>>  	return 0;
>> @@ -1423,9 +1466,9 @@ again:
>>  	xenbus_transaction_end(xbt, 1);
>>  	if (message)
>>  		xenbus_dev_fatal(dev, err, "%s", message);
>> - destroy_blkring:
>> -	blkif_free(dinfo, 0);
>>   out:
>> +	while (--r_index >= 0)
>> +		destroy_blkring(dev, &dinfo->rinfo[r_index]);
> 
> The same as above, destroy_blkring does a different cleaning of what
> used to be done in blkif_free.

Will revert to blkif_free().

> 
>>  	return err;
>>  }
>>  
>> @@ -1438,7 +1481,7 @@ again:
>>  static int blkfront_probe(struct xenbus_device *dev,
>>  			  const struct xenbus_device_id *id)
>>  {
>> -	int err, vdevice;
>> +	int err, vdevice, r_index;
>>  	struct blkfront_dev_info *dinfo;
>>  	struct blkfront_ring_info *rinfo;
>>  
>> @@ -1490,17 +1533,29 @@ static int blkfront_probe(struct xenbus_device *dev,
>>  		return -ENOMEM;
>>  	}
>>  
>> -	rinfo = &dinfo->rinfo;
>>  	mutex_init(&dinfo->mutex);
>>  	spin_lock_init(&dinfo->io_lock);
>>  	dinfo->xbdev = dev;
>>  	dinfo->vdevice = vdevice;
>> -	INIT_LIST_HEAD(&rinfo->grants);
>> -	INIT_LIST_HEAD(&rinfo->indirect_pages);
>> -	rinfo->persistent_gnts_c = 0;
>>  	dinfo->connected = BLKIF_STATE_DISCONNECTED;
>> -	rinfo->dinfo = dinfo;
>> -	INIT_WORK(&rinfo->work, blkif_restart_queue);
>> +
>> +	dinfo->nr_rings = 1;
>> +	dinfo->rinfo = kzalloc(sizeof(*rinfo) * dinfo->nr_rings, GFP_KERNEL);
>> +	if (!dinfo->rinfo) {
>> +		xenbus_dev_fatal(dev, -ENOMEM, "allocating ring_info structure");
>> +		kfree(dinfo);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>> +		rinfo = &dinfo->rinfo[r_index];
>> +
>> +		INIT_LIST_HEAD(&rinfo->grants);
>> +		INIT_LIST_HEAD(&rinfo->indirect_pages);
>> +		rinfo->persistent_gnts_c = 0;
>> +		rinfo->dinfo = dinfo;
>> +		INIT_WORK(&rinfo->work, blkif_restart_queue);
>> +	}
>>  
>>  	/* Front end dir is a number, which is used as the id. */
>>  	dinfo->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
>> @@ -1526,7 +1581,7 @@ static void split_bio_end(struct bio *bio, int error)
>>  
>>  static int blkif_recover(struct blkfront_dev_info *dinfo)
>>  {
>> -	int i;
>> +	int i, r_index;
>>  	struct request *req, *n;
>>  	struct blk_shadow *copy;
>>  	int rc;
>> @@ -1536,56 +1591,62 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
>>  	int pending, size;
>>  	struct split_bio *split_bio;
>>  	struct list_head requests;
>> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
>> -
>> -	/* Stage 1: Make a safe copy of the shadow state. */
>> -	copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
>> -		       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
>> -	if (!copy)
>> -		return -ENOMEM;
>> -
>> -	/* Stage 2: Set up free list. */
>> -	memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
>> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
>> -		rinfo->shadow[i].req.u.rw.id = i+1;
>> -	rinfo->shadow_free = rinfo->ring.req_prod_pvt;
>> -	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
>> -
>> -	rc = blkfront_gather_backend_features(dinfo);
>> -	if (rc) {
>> -		kfree(copy);
>> -		return rc;
>> -	}
>> +	struct blkfront_ring_info *rinfo;
>>  
>> +	__blkfront_gather_backend_features(dinfo);
>>  	segs = dinfo->max_indirect_segments ? : BLKIF_MAX_SEGMENTS_PER_REQUEST;
>>  	blk_queue_max_segments(dinfo->rq, segs);
>>  	bio_list_init(&bio_list);
>>  	INIT_LIST_HEAD(&requests);
>> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
>> -		/* Not in use? */
>> -		if (!copy[i].request)
>> -			continue;
>>  
>> -		/*
>> -		 * Get the bios in the request so we can re-queue them.
>> -		 */
>> -		if (copy[i].request->cmd_flags &
>> -		    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
>> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>> +		rinfo = &dinfo->rinfo[r_index];
>> +
>> +		/* Stage 1: Make a safe copy of the shadow state. */
>> +		copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
>> +			       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
>> +		if (!copy)
>> +			return -ENOMEM;
>> +
>> +		/* Stage 2: Set up free list. */
>> +		memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
>> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
>> +			rinfo->shadow[i].req.u.rw.id = i+1;
>> +		rinfo->shadow_free = rinfo->ring.req_prod_pvt;
>> +		rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
>> +
>> +		rc = blkfront_setup_indirect(rinfo);
>> +		if (rc) {
>> +			kfree(copy);
>> +			return rc;
>> +		}
>> +
>> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
>> +			/* Not in use? */
>> +			if (!copy[i].request)
>> +				continue;
>> +
>>  			/*
>> -			 * Flush operations don't contain bios, so
>> -			 * we need to requeue the whole request
>> +			 * Get the bios in the request so we can re-queue them.
>>  			 */
>> -			list_add(&copy[i].request->queuelist, &requests);
>> -			continue;
>> +			if (copy[i].request->cmd_flags &
>> +			    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
>> +				/*
>> +				 * Flush operations don't contain bios, so
>> +				 * we need to requeue the whole request
>> +				 */
>> +				list_add(&copy[i].request->queuelist, &requests);
>> +				continue;
>> +			}
>> +			merge_bio.head = copy[i].request->bio;
>> +			merge_bio.tail = copy[i].request->biotail;
>> +			bio_list_merge(&bio_list, &merge_bio);
>> +			copy[i].request->bio = NULL;
>> +			blk_end_request_all(copy[i].request, 0);
>>  		}
>> -		merge_bio.head = copy[i].request->bio;
>> -		merge_bio.tail = copy[i].request->biotail;
>> -		bio_list_merge(&bio_list, &merge_bio);
>> -		copy[i].request->bio = NULL;
>> -		blk_end_request_all(copy[i].request, 0);
>> -	}
>>  
>> -	kfree(copy);
>> +		kfree(copy);
>> +	}
>>  
>>  	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
>>  
>> @@ -1594,8 +1655,12 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
>>  	/* Now safe for us to use the shared ring */
>>  	dinfo->connected = BLKIF_STATE_CONNECTED;
>>  
>> -	/* Kick any other new requests queued since we resumed */
>> -	kick_pending_request_queues(rinfo);
>> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>> +		rinfo = &dinfo->rinfo[r_index];
>> +
>> +		/* Kick any other new requests queued since we resumed */
>> +		kick_pending_request_queues(rinfo);
>> +	}
>>  
>>  	list_for_each_entry_safe(req, n, &requests, queuelist) {
>>  		/* Requeue pending requests (flush or discard) */
>> @@ -1729,6 +1794,38 @@ static void blkfront_setup_discard(struct blkfront_dev_info *dinfo)
>>  		dinfo->feature_secdiscard = !!discard_secure;
>>  }
>>  
>> +static void blkfront_clean_ring(struct blkfront_ring_info *rinfo)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < BLK_RING_SIZE(rinfo->dinfo); i++) {
>> +		kfree(rinfo->shadow[i].grants_used);
>> +		rinfo->shadow[i].grants_used = NULL;
>> +		kfree(rinfo->shadow[i].sg);
>> +		rinfo->shadow[i].sg = NULL;
>> +		kfree(rinfo->shadow[i].indirect_grants);
>> +		rinfo->shadow[i].indirect_grants = NULL;
>> +	}
>> +	if (!list_empty(&rinfo->indirect_pages)) {
>> +		struct page *indirect_page, *n;
>> +		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
>> +			list_del(&indirect_page->lru);
>> +			__free_page(indirect_page);
>> +		}
>> +	}
>> +
>> +	if (!list_empty(&rinfo->grants)) {
>> +		struct grant *gnt_list_entry, *n;
>> +		list_for_each_entry_safe(gnt_list_entry, n,
>> +				&rinfo->grants, node) {
>> +			list_del(&gnt_list_entry->node);
>> +			if (rinfo->dinfo->feature_persistent)
>> +				__free_page(pfn_to_page(gnt_list_entry->pfn));
>> +			kfree(gnt_list_entry);
>> +		}
>> +	}
>> +}
>> +
>>  static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
>>  {
>>  	unsigned int segs;
>> @@ -1783,28 +1880,14 @@ static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
>>  	return 0;
>>  
>>  out_of_memory:
>> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
>> -		kfree(rinfo->shadow[i].grants_used);
>> -		rinfo->shadow[i].grants_used = NULL;
>> -		kfree(rinfo->shadow[i].sg);
>> -		rinfo->shadow[i].sg = NULL;
>> -		kfree(rinfo->shadow[i].indirect_grants);
>> -		rinfo->shadow[i].indirect_grants = NULL;
>> -	}
>> -	if (!list_empty(&rinfo->indirect_pages)) {
>> -		struct page *indirect_page, *n;
>> -		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
>> -			list_del(&indirect_page->lru);
>> -			__free_page(indirect_page);
>> -		}
>> -	}
>> +	blkfront_clean_ring(rinfo);
>>  	return -ENOMEM;
>>  }
>>  
>>  /*
>>   * Gather all backend feature-*
>>   */
>> -static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
>> +static void __blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
>>  {
>>  	int err;
>>  	int barrier, flush, discard, persistent;
>> @@ -1859,8 +1942,25 @@ static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
>>  	else
>>  		dinfo->max_indirect_segments = min(indirect_segments,
>>  						  xen_blkif_max_segments);
>> +}
>> +
>> +static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
>> +{
>> +	int err, i;
>> +
>> +	__blkfront_gather_backend_features(dinfo);
> 
> IMHO, there's no need to introduce __blkfront_gather_backend_features,
> just add the chunk below to the existing blkfront_gather_backend_features.
> 

Will update.
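
A sketch of the merged function being suggested (the feature parsing that
currently lives in __blkfront_gather_backend_features() is elided to a
comment):

	static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
	{
		int err, i;

		/* read feature-barrier, feature-flush-cache, discard,
		 * persistent-grant and indirect-segment nodes here,
		 * exactly as before */

		for (i = 0; i < dinfo->nr_rings; i++) {
			err = blkfront_setup_indirect(&dinfo->rinfo[i]);
			if (err)
				goto out;
		}
		return 0;
	out:
		while (--i >= 0)
			blkfront_clean_ring(&dinfo->rinfo[i]);
		return err;
	}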

>> -	return blkfront_setup_indirect(&dinfo->rinfo);
>> +	for (i = 0; i < dinfo->nr_rings; i++) {
>> +		err = blkfront_setup_indirect(&dinfo->rinfo[i]);
>> +		if (err)
>> +			goto out;
>> +	}
>> +	return 0;
>> +
>> +out:
>> +	while (--i >= 0)
>> +		blkfront_clean_ring(&dinfo->rinfo[i]);
>> +	return err;
>>  }
>>  
>>  /*
>> @@ -1873,8 +1973,8 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
>>  	unsigned long sector_size;
>>  	unsigned int physical_sector_size;
>>  	unsigned int binfo;
>> -	int err;
>> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
>> +	int err, i;
>> +	struct blkfront_ring_info *rinfo;
>>  
>>  	switch (dinfo->connected) {
>>  	case BLKIF_STATE_CONNECTED:
>> @@ -1951,7 +2051,10 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
>>  	/* Kick pending requests. */
>>  	spin_lock_irq(&dinfo->io_lock);
>>  	dinfo->connected = BLKIF_STATE_CONNECTED;
>> -	kick_pending_request_queues(rinfo);
>> +	for (i = 0; i < dinfo->nr_rings; i++) {
>> +		rinfo = &dinfo->rinfo[i];
> 
> If rinfo is only going to be used in the for loop you can declare it inside:
> 

Will update.

Thank you for your patience with these big patches.

> 		struct blkfront_ring_info *rinfo = &dinfo->rinfo[i];
> 
>> +		kick_pending_request_queues(rinfo);
>> +	}
>>  	spin_unlock_irq(&dinfo->io_lock);
>>  
>>  	add_disk(dinfo->gd);
>>
> 

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 4/9] xen/blkfront: pseudo support for multi hardware queues/rings
  2015-10-05 10:52   ` Roger Pau Monné
  2015-10-07 10:28     ` Bob Liu
@ 2015-10-07 10:28     ` Bob Liu
  1 sibling, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-10-07 10:28 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: hch, felipe.franciosi, rafal.mielniczuk, linux-kernel, xen-devel,
	axboe, jonathan.davies, david.vrabel, avanzini.arianna,
	boris.ostrovsky


On 10/05/2015 06:52 PM, Roger Pau Monné wrote:
> El 05/09/15 a les 14.39, Bob Liu ha escrit:
>> Prepare patch for multi hardware queues/rings, the ring number was set to 1 by
>> force.
>>
>> * Use 'nr_rings' in per dev_info to identify how many hw queues/rings are
>>   supported, and a pointer *rinfo for all its rings.
>> * Rename 'nr_ring_pages' => 'pages_per_ring' to distinguish from 'nr_rings'
>>   better.
>>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkfront.c |  513 +++++++++++++++++++++++++-----------------
>>  1 file changed, 308 insertions(+), 205 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index bf416d5..bf45c99 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
>>  module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
>>  MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
>>  
>> -#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->nr_ring_pages)
>> +#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->pages_per_ring)
>>  #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
>>  /*
>>   * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
>> @@ -157,9 +157,10 @@ struct blkfront_dev_info {
>>  	unsigned int feature_persistent:1;
>>  	unsigned int max_indirect_segments;
>>  	int is_ready;
>> -	unsigned int nr_ring_pages;
>> +	unsigned int pages_per_ring;
> 
> Why do you rename this field? nr_ring_pages seems more consistent with
> the nr_rings field that you add below IMO, but that might be a matter of
> taste.
> 

I think that once nr_rings is introduced, nr_ring_pages is no longer suitable, because it can be
misread as the total number of ring pages rather than the number per ring.

So I prefer to rename it to pages_per_ring.

>>  	struct blk_mq_tag_set tag_set;
>> -	struct blkfront_ring_info rinfo;
>> +	struct blkfront_ring_info *rinfo;
>> +	unsigned int nr_rings;
>>  };
>>  
>>  static unsigned int nr_minors;
>> @@ -191,7 +192,7 @@ static DEFINE_SPINLOCK(minor_lock);
>>  	((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
>>  
>>  static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
>> -static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
>> +static void __blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
>>  
>>  static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
>>  {
>> @@ -668,7 +669,7 @@ static int blk_mq_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
>>  {
>>  	struct blkfront_dev_info *dinfo = (struct blkfront_dev_info *)data;
>>  
>> -	hctx->driver_data = &dinfo->rinfo;
>> +	hctx->driver_data = &dinfo->rinfo[index];
>>  	return 0;
>>  }
>>  
>> @@ -927,8 +928,8 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
>>  
>>  static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
>>  {
>> -	unsigned int minor, nr_minors;
>> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
>> +	unsigned int minor, nr_minors, i;
>> +	struct blkfront_ring_info *rinfo;
>>  
>>  	if (dinfo->rq == NULL)
>>  		return;
>> @@ -936,11 +937,15 @@ static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
>>  	/* No more blkif_request(). */
>>  	blk_mq_stop_hw_queues(dinfo->rq);
>>  
>> -	/* No more gnttab callback work. */
>> -	gnttab_cancel_free_callback(&rinfo->callback);
>> +	for (i = 0; i < dinfo->nr_rings; i++) {
> 
> I would be tempted to declare rinfo only inside the for loop, to limit
> the scope:

Will update.

> 
> 		struct blkfront_ring_info *rinfo = &dinfo->rinfo[i];
> 
>> +		rinfo = &dinfo->rinfo[i];
>>  
>> -	/* Flush gnttab callback work. Must be done with no locks held. */
>> -	flush_work(&rinfo->work);
>> +		/* No more gnttab callback work. */
>> +		gnttab_cancel_free_callback(&rinfo->callback);
>> +
>> +		/* Flush gnttab callback work. Must be done with no locks held. */
>> +		flush_work(&rinfo->work);
>> +	}
>>  
>>  	del_gendisk(dinfo->gd);
>>  
>> @@ -977,8 +982,8 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
>>  {
>>  	struct grant *persistent_gnt;
>>  	struct grant *n;
>> -	int i, j, segs;
>> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
>> +	int i, j, segs, r_index;
>> +	struct blkfront_ring_info *rinfo;
>>  
>>  	/* Prevent new requests being issued until we fix things up. */
>>  	spin_lock_irq(&dinfo->io_lock);
>> @@ -988,100 +993,103 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
>>  	if (dinfo->rq)
>>  		blk_mq_stop_hw_queues(dinfo->rq);
>>  
>> -	/* Remove all persistent grants */
>> -	if (!list_empty(&rinfo->grants)) {
>> -		list_for_each_entry_safe(persistent_gnt, n,
>> -					 &rinfo->grants, node) {
>> -			list_del(&persistent_gnt->node);
>> -			if (persistent_gnt->gref != GRANT_INVALID_REF) {
>> -				gnttab_end_foreign_access(persistent_gnt->gref,
>> -				                          0, 0UL);
>> -				rinfo->persistent_gnts_c--;
>> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>> +		rinfo = &dinfo->rinfo[r_index];
> 
> struct blkfront_ring_info *rinfo = &dinfo->rinfo[r_index];
> 
> Would it be helpful to place all this code inside of a helper function,
> ie: blkif_free_ring?
> 

Sure, will update.

>> +
>> +		/* Remove all persistent grants */
>> +		if (!list_empty(&rinfo->grants)) {
>> +			list_for_each_entry_safe(persistent_gnt, n,
>> +						 &rinfo->grants, node) {
>> +				list_del(&persistent_gnt->node);
>> +				if (persistent_gnt->gref != GRANT_INVALID_REF) {
>> +					gnttab_end_foreign_access(persistent_gnt->gref,
>> +								  0, 0UL);
>> +					rinfo->persistent_gnts_c--;
>> +				}
>> +				if (dinfo->feature_persistent)
>> +					__free_page(pfn_to_page(persistent_gnt->pfn));
>> +				kfree(persistent_gnt);
>>  			}
>> -			if (dinfo->feature_persistent)
>> -				__free_page(pfn_to_page(persistent_gnt->pfn));
>> -			kfree(persistent_gnt);
>>  		}
>> -	}
>> -	BUG_ON(rinfo->persistent_gnts_c != 0);
>> +		BUG_ON(rinfo->persistent_gnts_c != 0);
>>  
>> -	/*
>> -	 * Remove indirect pages, this only happens when using indirect
>> -	 * descriptors but not persistent grants
>> -	 */
>> -	if (!list_empty(&rinfo->indirect_pages)) {
>> -		struct page *indirect_page, *n;
>> -
>> -		BUG_ON(dinfo->feature_persistent);
>> -		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
>> -			list_del(&indirect_page->lru);
>> -			__free_page(indirect_page);
>> -		}
>> -	}
>> -
>> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
>>  		/*
>> -		 * Clear persistent grants present in requests already
>> -		 * on the shared ring
>> +		 * Remove indirect pages, this only happens when using indirect
>> +		 * descriptors but not persistent grants
>>  		 */
>> -		if (!rinfo->shadow[i].request)
>> -			goto free_shadow;
>> -
>> -		segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
>> -		       rinfo->shadow[i].req.u.indirect.nr_segments :
>> -		       rinfo->shadow[i].req.u.rw.nr_segments;
>> -		for (j = 0; j < segs; j++) {
>> -			persistent_gnt = rinfo->shadow[i].grants_used[j];
>> -			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
>> -			if (dinfo->feature_persistent)
>> -				__free_page(pfn_to_page(persistent_gnt->pfn));
>> -			kfree(persistent_gnt);
>> +		if (!list_empty(&rinfo->indirect_pages)) {
>> +			struct page *indirect_page, *n;
>> +
>> +			BUG_ON(dinfo->feature_persistent);
>> +			list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
>> +				list_del(&indirect_page->lru);
>> +				__free_page(indirect_page);
>> +			}
>>  		}
>>  
>> -		if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
>> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
>>  			/*
>> -			 * If this is not an indirect operation don't try to
>> -			 * free indirect segments
>> +			 * Clear persistent grants present in requests already
>> +			 * on the shared ring
>>  			 */
>> -			goto free_shadow;
>> +			if (!rinfo->shadow[i].request)
>> +				goto free_shadow;
>> +
>> +			segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
>> +			       rinfo->shadow[i].req.u.indirect.nr_segments :
>> +			       rinfo->shadow[i].req.u.rw.nr_segments;
>> +			for (j = 0; j < segs; j++) {
>> +				persistent_gnt = rinfo->shadow[i].grants_used[j];
>> +				gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
>> +				if (dinfo->feature_persistent)
>> +					__free_page(pfn_to_page(persistent_gnt->pfn));
>> +				kfree(persistent_gnt);
>> +			}
>>  
>> -		for (j = 0; j < INDIRECT_GREFS(segs); j++) {
>> -			persistent_gnt = rinfo->shadow[i].indirect_grants[j];
>> -			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
>> -			__free_page(pfn_to_page(persistent_gnt->pfn));
>> -			kfree(persistent_gnt);
>> -		}
>> +			if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
>> +				/*
>> +				 * If this is not an indirect operation don't try to
>> +				 * free indirect segments
>> +				 */
>> +				goto free_shadow;
>> +
>> +			for (j = 0; j < INDIRECT_GREFS(segs); j++) {
>> +				persistent_gnt = rinfo->shadow[i].indirect_grants[j];
>> +				gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
>> +				__free_page(pfn_to_page(persistent_gnt->pfn));
>> +				kfree(persistent_gnt);
>> +			}
>>  
>>  free_shadow:
>> -		kfree(rinfo->shadow[i].grants_used);
>> -		rinfo->shadow[i].grants_used = NULL;
>> -		kfree(rinfo->shadow[i].indirect_grants);
>> -		rinfo->shadow[i].indirect_grants = NULL;
>> -		kfree(rinfo->shadow[i].sg);
>> -		rinfo->shadow[i].sg = NULL;
>> -	}
>> +			kfree(rinfo->shadow[i].grants_used);
>> +			rinfo->shadow[i].grants_used = NULL;
>> +			kfree(rinfo->shadow[i].indirect_grants);
>> +			rinfo->shadow[i].indirect_grants = NULL;
>> +			kfree(rinfo->shadow[i].sg);
>> +			rinfo->shadow[i].sg = NULL;
>> +		}
>>  
>> -	/* No more gnttab callback work. */
>> -	gnttab_cancel_free_callback(&rinfo->callback);
>> -	spin_unlock_irq(&dinfo->io_lock);
>> +		/* No more gnttab callback work. */
>> +		gnttab_cancel_free_callback(&rinfo->callback);
>> +		spin_unlock_irq(&dinfo->io_lock);
>>  
>> -	/* Flush gnttab callback work. Must be done with no locks held. */
>> -	flush_work(&rinfo->work);
>> +		/* Flush gnttab callback work. Must be done with no locks held. */
>> +		flush_work(&rinfo->work);
>>  
>> -	/* Free resources associated with old device channel. */
>> -	for (i = 0; i < dinfo->nr_ring_pages; i++) {
>> -		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
>> -			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
>> -			rinfo->ring_ref[i] = GRANT_INVALID_REF;
>> +		/* Free resources associated with old device channel. */
>> +		for (i = 0; i < dinfo->pages_per_ring; i++) {
>> +			if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
>> +				gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
>> +				rinfo->ring_ref[i] = GRANT_INVALID_REF;
>> +			}
>>  		}
>> -	}
>> -	free_pages((unsigned long)rinfo->ring.sring, get_order(dinfo->nr_ring_pages * PAGE_SIZE));
>> -	rinfo->ring.sring = NULL;
>> -
>> -	if (rinfo->irq)
>> -		unbind_from_irqhandler(rinfo->irq, rinfo);
>> -	rinfo->evtchn = rinfo->irq = 0;
>> +		free_pages((unsigned long)rinfo->ring.sring, get_order(dinfo->pages_per_ring * PAGE_SIZE));
>> +		rinfo->ring.sring = NULL;
>>  
>> +		if (rinfo->irq)
>> +			unbind_from_irqhandler(rinfo->irq, rinfo);
>> +		rinfo->evtchn = rinfo->irq = 0;
>> +	}
>>  }
>>  
>>  static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *rinfo,
>> @@ -1276,6 +1284,26 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>  	return IRQ_HANDLED;
>>  }
>>  
>> +static void destroy_blkring(struct xenbus_device *dev,
>> +			    struct blkfront_ring_info *rinfo)
>> +{
>> +	int i;
>> +
>> +	if (rinfo->irq)
>> +		unbind_from_irqhandler(rinfo->irq, rinfo);
>> +	if (rinfo->evtchn)
>> +		xenbus_free_evtchn(dev, rinfo->evtchn);
>> +
>> +	for (i = 0; i < rinfo->dinfo->pages_per_ring; i++) {
>> +		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
>> +			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
>> +			rinfo->ring_ref[i] = GRANT_INVALID_REF;
>> +		}
>> +	}
>> +	free_pages((unsigned long)rinfo->ring.sring,
>> +		   get_order(rinfo->dinfo->pages_per_ring * PAGE_SIZE));
>> +	rinfo->ring.sring = NULL;
>> +}
>>  
>>  static int setup_blkring(struct xenbus_device *dev,
>>  			 struct blkfront_ring_info *rinfo)
>> @@ -1283,10 +1311,10 @@ static int setup_blkring(struct xenbus_device *dev,
>>  	struct blkif_sring *sring;
>>  	int err, i;
>>  	struct blkfront_dev_info *dinfo = rinfo->dinfo;
>> -	unsigned long ring_size = dinfo->nr_ring_pages * PAGE_SIZE;
>> +	unsigned long ring_size = dinfo->pages_per_ring * PAGE_SIZE;
>>  	grant_ref_t gref[XENBUS_MAX_RING_PAGES];
>>  
>> -	for (i = 0; i < dinfo->nr_ring_pages; i++)
>> +	for (i = 0; i < dinfo->pages_per_ring; i++)
>>  		rinfo->ring_ref[i] = GRANT_INVALID_REF;
>>  
>>  	sring = (struct blkif_sring *)__get_free_pages(GFP_NOIO | __GFP_HIGH,
>> @@ -1298,13 +1326,13 @@ static int setup_blkring(struct xenbus_device *dev,
>>  	SHARED_RING_INIT(sring);
>>  	FRONT_RING_INIT(&rinfo->ring, sring, ring_size);
>>  
>> -	err = xenbus_grant_ring(dev, rinfo->ring.sring, dinfo->nr_ring_pages, gref);
>> +	err = xenbus_grant_ring(dev, rinfo->ring.sring, dinfo->pages_per_ring, gref);
>>  	if (err < 0) {
>>  		free_pages((unsigned long)sring, get_order(ring_size));
>>  		rinfo->ring.sring = NULL;
>>  		goto fail;
>>  	}
>> -	for (i = 0; i < dinfo->nr_ring_pages; i++)
>> +	for (i = 0; i < dinfo->pages_per_ring; i++)
>>  		rinfo->ring_ref[i] = gref[i];
>>  
>>  	err = xenbus_alloc_evtchn(dev, &rinfo->evtchn);
>> @@ -1322,7 +1350,7 @@ static int setup_blkring(struct xenbus_device *dev,
>>  
>>  	return 0;
>>  fail:
>> -	blkif_free(dinfo, 0);
>> +	destroy_blkring(dev, rinfo);
> 
> blkif_free used to clean a lot more than what destroy_blkring does, is
> this right?
> 

I think it's fine because we are still in the early setup stage.
But to minimize the changes I won't introduce destroy_blkring() in the next version after all.

blkif_free() will also be deleted here, because talk_to_blkback() will call blkif_free() anyway if an error is returned.

>>  	return err;
>>  }
>>  
>> @@ -1333,65 +1361,76 @@ static int talk_to_blkback(struct xenbus_device *dev,
>>  {
>>  	const char *message = NULL;
>>  	struct xenbus_transaction xbt;
>> -	int err, i;
>> +	int err, i, r_index;
>>  	unsigned int max_page_order = 0;
>>  	unsigned int ring_page_order = 0;
>> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
>> +	struct blkfront_ring_info *rinfo;
>>  
>>  	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
>>  			   "max-ring-page-order", "%u", &max_page_order);
>>  	if (err != 1)
>> -		dinfo->nr_ring_pages = 1;
>> +		dinfo->pages_per_ring = 1;
>>  	else {
>>  		ring_page_order = min(xen_blkif_max_ring_order, max_page_order);
>> -		dinfo->nr_ring_pages = 1 << ring_page_order;
>> +		dinfo->pages_per_ring = 1 << ring_page_order;
> 
> As said above, I think nr_ring_pages is perfectly fine, and avoids all
> this ponintless changes.
> 
>>  	}
>>  
>> -	/* Create shared ring, alloc event channel. */
>> -	err = setup_blkring(dev, rinfo);
>> -	if (err)
>> -		goto out;
>> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>> +		rinfo = &dinfo->rinfo[r_index];
>> +		/* Create shared ring, alloc event channel. */
>> +		err = setup_blkring(dev, rinfo);
>> +		if (err)
>> +			goto out;
>> +	}
>>  
>>  again:
>>  	err = xenbus_transaction_start(&xbt);
>>  	if (err) {
>>  		xenbus_dev_fatal(dev, err, "starting transaction");
>> -		goto destroy_blkring;
>> +		goto out;
>>  	}
>>  
>> -	if (dinfo->nr_ring_pages == 1) {
>> -		err = xenbus_printf(xbt, dev->nodename,
>> -				    "ring-ref", "%u", rinfo->ring_ref[0]);
>> -		if (err) {
>> -			message = "writing ring-ref";
>> -			goto abort_transaction;
>> -		}
>> -	} else {
>> -		err = xenbus_printf(xbt, dev->nodename,
>> -				    "ring-page-order", "%u", ring_page_order);
>> -		if (err) {
>> -			message = "writing ring-page-order";
>> -			goto abort_transaction;
>> -		}
>> -
>> -		for (i = 0; i < dinfo->nr_ring_pages; i++) {
>> -			char ring_ref_name[RINGREF_NAME_LEN];
>> +	if (dinfo->nr_rings == 1) {
>> +		rinfo = &dinfo->rinfo[0];
>>  
>> -			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
>> -			err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
>> -					    "%u", rinfo->ring_ref[i]);
>> +		if (dinfo->pages_per_ring == 1) {
>> +			err = xenbus_printf(xbt, dev->nodename,
>> +					    "ring-ref", "%u", rinfo->ring_ref[0]);
>>  			if (err) {
>>  				message = "writing ring-ref";
>>  				goto abort_transaction;
>>  			}
>> +		} else {
>> +			err = xenbus_printf(xbt, dev->nodename,
>> +					    "ring-page-order", "%u", ring_page_order);
>> +			if (err) {
>> +				message = "writing ring-page-order";
>> +				goto abort_transaction;
>> +			}
>> +
>> +			for (i = 0; i < dinfo->pages_per_ring; i++) {
>> +				char ring_ref_name[RINGREF_NAME_LEN];
>> +
>> +				snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
>> +				err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
>> +						    "%u", rinfo->ring_ref[i]);
>> +				if (err) {
>> +					message = "writing ring-ref";
>> +					goto abort_transaction;
>> +				}
>> +			}
>>  		}
>> -	}
>> -	err = xenbus_printf(xbt, dev->nodename,
>> -			    "event-channel", "%u", rinfo->evtchn);
>> -	if (err) {
>> -		message = "writing event-channel";
>> +		err = xenbus_printf(xbt, dev->nodename,
>> +				    "event-channel", "%u", rinfo->evtchn);
>> +		if (err) {
>> +			message = "writing event-channel";
>> +			goto abort_transaction;
>> +		}
>> +	} else {
>> +		/* Not supported at this stage */
>>  		goto abort_transaction;
>>  	}
>> +
>>  	err = xenbus_printf(xbt, dev->nodename, "protocol", "%s",
>>  			    XEN_IO_PROTO_ABI_NATIVE);
>>  	if (err) {
>> @@ -1409,12 +1448,16 @@ again:
>>  		if (err == -EAGAIN)
>>  			goto again;
>>  		xenbus_dev_fatal(dev, err, "completing transaction");
>> -		goto destroy_blkring;
>> +		goto out;
>>  	}
>>  
>> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
>> -		rinfo->shadow[i].req.u.rw.id = i+1;
>> -	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
>> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>> +		rinfo = &dinfo->rinfo[r_index];
>> +
>> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
>> +			rinfo->shadow[i].req.u.rw.id = i+1;
>> +		rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
>> +	}
>>  	xenbus_switch_state(dev, XenbusStateInitialised);
>>  
>>  	return 0;
>> @@ -1423,9 +1466,9 @@ again:
>>  	xenbus_transaction_end(xbt, 1);
>>  	if (message)
>>  		xenbus_dev_fatal(dev, err, "%s", message);
>> - destroy_blkring:
>> -	blkif_free(dinfo, 0);
>>   out:
>> +	while (--r_index >= 0)
>> +		destroy_blkring(dev, &dinfo->rinfo[r_index]);
> 
> The same as above, destroy_blkring does a different cleaning of what
> used to be done in blkif_free.

Will revert to blkif_free().

> 
>>  	return err;
>>  }
>>  
>> @@ -1438,7 +1481,7 @@ again:
>>  static int blkfront_probe(struct xenbus_device *dev,
>>  			  const struct xenbus_device_id *id)
>>  {
>> -	int err, vdevice;
>> +	int err, vdevice, r_index;
>>  	struct blkfront_dev_info *dinfo;
>>  	struct blkfront_ring_info *rinfo;
>>  
>> @@ -1490,17 +1533,29 @@ static int blkfront_probe(struct xenbus_device *dev,
>>  		return -ENOMEM;
>>  	}
>>  
>> -	rinfo = &dinfo->rinfo;
>>  	mutex_init(&dinfo->mutex);
>>  	spin_lock_init(&dinfo->io_lock);
>>  	dinfo->xbdev = dev;
>>  	dinfo->vdevice = vdevice;
>> -	INIT_LIST_HEAD(&rinfo->grants);
>> -	INIT_LIST_HEAD(&rinfo->indirect_pages);
>> -	rinfo->persistent_gnts_c = 0;
>>  	dinfo->connected = BLKIF_STATE_DISCONNECTED;
>> -	rinfo->dinfo = dinfo;
>> -	INIT_WORK(&rinfo->work, blkif_restart_queue);
>> +
>> +	dinfo->nr_rings = 1;
>> +	dinfo->rinfo = kzalloc(sizeof(*rinfo) * dinfo->nr_rings, GFP_KERNEL);
>> +	if (!dinfo->rinfo) {
>> +		xenbus_dev_fatal(dev, -ENOMEM, "allocating ring_info structure");
>> +		kfree(dinfo);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>> +		rinfo = &dinfo->rinfo[r_index];
>> +
>> +		INIT_LIST_HEAD(&rinfo->grants);
>> +		INIT_LIST_HEAD(&rinfo->indirect_pages);
>> +		rinfo->persistent_gnts_c = 0;
>> +		rinfo->dinfo = dinfo;
>> +		INIT_WORK(&rinfo->work, blkif_restart_queue);
>> +	}
>>  
>>  	/* Front end dir is a number, which is used as the id. */
>>  	dinfo->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
>> @@ -1526,7 +1581,7 @@ static void split_bio_end(struct bio *bio, int error)
>>  
>>  static int blkif_recover(struct blkfront_dev_info *dinfo)
>>  {
>> -	int i;
>> +	int i, r_index;
>>  	struct request *req, *n;
>>  	struct blk_shadow *copy;
>>  	int rc;
>> @@ -1536,56 +1591,62 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
>>  	int pending, size;
>>  	struct split_bio *split_bio;
>>  	struct list_head requests;
>> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
>> -
>> -	/* Stage 1: Make a safe copy of the shadow state. */
>> -	copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
>> -		       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
>> -	if (!copy)
>> -		return -ENOMEM;
>> -
>> -	/* Stage 2: Set up free list. */
>> -	memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
>> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
>> -		rinfo->shadow[i].req.u.rw.id = i+1;
>> -	rinfo->shadow_free = rinfo->ring.req_prod_pvt;
>> -	rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
>> -
>> -	rc = blkfront_gather_backend_features(dinfo);
>> -	if (rc) {
>> -		kfree(copy);
>> -		return rc;
>> -	}
>> +	struct blkfront_ring_info *rinfo;
>>  
>> +	__blkfront_gather_backend_features(dinfo);
>>  	segs = dinfo->max_indirect_segments ? : BLKIF_MAX_SEGMENTS_PER_REQUEST;
>>  	blk_queue_max_segments(dinfo->rq, segs);
>>  	bio_list_init(&bio_list);
>>  	INIT_LIST_HEAD(&requests);
>> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
>> -		/* Not in use? */
>> -		if (!copy[i].request)
>> -			continue;
>>  
>> -		/*
>> -		 * Get the bios in the request so we can re-queue them.
>> -		 */
>> -		if (copy[i].request->cmd_flags &
>> -		    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
>> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>> +		rinfo = &dinfo->rinfo[r_index];
>> +
>> +		/* Stage 1: Make a safe copy of the shadow state. */
>> +		copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
>> +			       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
>> +		if (!copy)
>> +			return -ENOMEM;
>> +
>> +		/* Stage 2: Set up free list. */
>> +		memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
>> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++)
>> +			rinfo->shadow[i].req.u.rw.id = i+1;
>> +		rinfo->shadow_free = rinfo->ring.req_prod_pvt;
>> +		rinfo->shadow[BLK_RING_SIZE(dinfo)-1].req.u.rw.id = 0x0fffffff;
>> +
>> +		rc = blkfront_setup_indirect(rinfo);
>> +		if (rc) {
>> +			kfree(copy);
>> +			return rc;
>> +		}
>> +
>> +		for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
>> +			/* Not in use? */
>> +			if (!copy[i].request)
>> +				continue;
>> +
>>  			/*
>> -			 * Flush operations don't contain bios, so
>> -			 * we need to requeue the whole request
>> +			 * Get the bios in the request so we can re-queue them.
>>  			 */
>> -			list_add(&copy[i].request->queuelist, &requests);
>> -			continue;
>> +			if (copy[i].request->cmd_flags &
>> +			    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
>> +				/*
>> +				 * Flush operations don't contain bios, so
>> +				 * we need to requeue the whole request
>> +				 */
>> +				list_add(&copy[i].request->queuelist, &requests);
>> +				continue;
>> +			}
>> +			merge_bio.head = copy[i].request->bio;
>> +			merge_bio.tail = copy[i].request->biotail;
>> +			bio_list_merge(&bio_list, &merge_bio);
>> +			copy[i].request->bio = NULL;
>> +			blk_end_request_all(copy[i].request, 0);
>>  		}
>> -		merge_bio.head = copy[i].request->bio;
>> -		merge_bio.tail = copy[i].request->biotail;
>> -		bio_list_merge(&bio_list, &merge_bio);
>> -		copy[i].request->bio = NULL;
>> -		blk_end_request_all(copy[i].request, 0);
>> -	}
>>  
>> -	kfree(copy);
>> +		kfree(copy);
>> +	}
>>  
>>  	xenbus_switch_state(dinfo->xbdev, XenbusStateConnected);
>>  
>> @@ -1594,8 +1655,12 @@ static int blkif_recover(struct blkfront_dev_info *dinfo)
>>  	/* Now safe for us to use the shared ring */
>>  	dinfo->connected = BLKIF_STATE_CONNECTED;
>>  
>> -	/* Kick any other new requests queued since we resumed */
>> -	kick_pending_request_queues(rinfo);
>> +	for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
>> +		rinfo = &dinfo->rinfo[r_index];
>> +
>> +		/* Kick any other new requests queued since we resumed */
>> +		kick_pending_request_queues(rinfo);
>> +	}
>>  
>>  	list_for_each_entry_safe(req, n, &requests, queuelist) {
>>  		/* Requeue pending requests (flush or discard) */
>> @@ -1729,6 +1794,38 @@ static void blkfront_setup_discard(struct blkfront_dev_info *dinfo)
>>  		dinfo->feature_secdiscard = !!discard_secure;
>>  }
>>  
>> +static void blkfront_clean_ring(struct blkfront_ring_info *rinfo)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < BLK_RING_SIZE(rinfo->dinfo); i++) {
>> +		kfree(rinfo->shadow[i].grants_used);
>> +		rinfo->shadow[i].grants_used = NULL;
>> +		kfree(rinfo->shadow[i].sg);
>> +		rinfo->shadow[i].sg = NULL;
>> +		kfree(rinfo->shadow[i].indirect_grants);
>> +		rinfo->shadow[i].indirect_grants = NULL;
>> +	}
>> +	if (!list_empty(&rinfo->indirect_pages)) {
>> +		struct page *indirect_page, *n;
>> +		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
>> +			list_del(&indirect_page->lru);
>> +			__free_page(indirect_page);
>> +		}
>> +	}
>> +
>> +	if (!list_empty(&rinfo->grants)) {
>> +		struct grant *gnt_list_entry, *n;
>> +		list_for_each_entry_safe(gnt_list_entry, n,
>> +				&rinfo->grants, node) {
>> +			list_del(&gnt_list_entry->node);
>> +			if (rinfo->dinfo->feature_persistent)
>> +				__free_page(pfn_to_page(gnt_list_entry->pfn));
>> +			kfree(gnt_list_entry);
>> +		}
>> +	}
>> +}
>> +
>>  static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
>>  {
>>  	unsigned int segs;
>> @@ -1783,28 +1880,14 @@ static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
>>  	return 0;
>>  
>>  out_of_memory:
>> -	for (i = 0; i < BLK_RING_SIZE(dinfo); i++) {
>> -		kfree(rinfo->shadow[i].grants_used);
>> -		rinfo->shadow[i].grants_used = NULL;
>> -		kfree(rinfo->shadow[i].sg);
>> -		rinfo->shadow[i].sg = NULL;
>> -		kfree(rinfo->shadow[i].indirect_grants);
>> -		rinfo->shadow[i].indirect_grants = NULL;
>> -	}
>> -	if (!list_empty(&rinfo->indirect_pages)) {
>> -		struct page *indirect_page, *n;
>> -		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
>> -			list_del(&indirect_page->lru);
>> -			__free_page(indirect_page);
>> -		}
>> -	}
>> +	blkfront_clean_ring(rinfo);
>>  	return -ENOMEM;
>>  }
>>  
>>  /*
>>   * Gather all backend feature-*
>>   */
>> -static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
>> +static void __blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
>>  {
>>  	int err;
>>  	int barrier, flush, discard, persistent;
>> @@ -1859,8 +1942,25 @@ static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
>>  	else
>>  		dinfo->max_indirect_segments = min(indirect_segments,
>>  						  xen_blkif_max_segments);
>> +}
>> +
>> +static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
>> +{
>> +	int err, i;
>> +
>> +	__blkfront_gather_backend_features(dinfo);
> 
> IMHO, there's no need to introduce __blkfront_gather_backend_features,
> just add the chunk below to the existing blkfront_gather_backend_features.
> 

Will update.
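
Something like this rough sketch is what I have in mind (names as in this
patch; the existing feature-* parsing stays unchanged at the top):

static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo)
{
	int err, i;

	/* existing parsing of feature-barrier/flush/discard/persistent
	 * and max-indirect-segments goes here, unchanged */

	for (i = 0; i < dinfo->nr_rings; i++) {
		err = blkfront_setup_indirect(&dinfo->rinfo[i]);
		if (err)
			goto out;
	}
	return 0;

out:
	while (--i >= 0)
		blkfront_clean_ring(&dinfo->rinfo[i]);
	return err;
}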

>> -	return blkfront_setup_indirect(&dinfo->rinfo);
>> +	for (i = 0; i < dinfo->nr_rings; i++) {
>> +		err = blkfront_setup_indirect(&dinfo->rinfo[i]);
>> +		if (err)
>> +			goto out;
>> +	}
>> +	return 0;
>> +
>> +out:
>> +	while (--i >= 0)
>> +		blkfront_clean_ring(&dinfo->rinfo[i]);
>> +	return err;
>>  }
>>  
>>  /*
>> @@ -1873,8 +1973,8 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
>>  	unsigned long sector_size;
>>  	unsigned int physical_sector_size;
>>  	unsigned int binfo;
>> -	int err;
>> -	struct blkfront_ring_info *rinfo = &dinfo->rinfo;
>> +	int err, i;
>> +	struct blkfront_ring_info *rinfo;
>>  
>>  	switch (dinfo->connected) {
>>  	case BLKIF_STATE_CONNECTED:
>> @@ -1951,7 +2051,10 @@ static void blkfront_connect(struct blkfront_dev_info *dinfo)
>>  	/* Kick pending requests. */
>>  	spin_lock_irq(&dinfo->io_lock);
>>  	dinfo->connected = BLKIF_STATE_CONNECTED;
>> -	kick_pending_request_queues(rinfo);
>> +	for (i = 0; i < dinfo->nr_rings; i++) {
>> +		rinfo = &dinfo->rinfo[i];
> 
> If rinfo is only going to be used in the for loop you can declare it inside:
> 

Will update.

Thank you for your patience with these big patches.

> 		struct blkfront_ring_info *rinfo = &dinfo->rinfo[i];
> 
>> +		kick_pending_request_queues(rinfo);
>> +	}
>>  	spin_unlock_irq(&dinfo->io_lock);
>>  
>>  	add_disk(dinfo->gd);
>>
> 

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 5/9] xen/blkfront: convert per device io_lock to per ring ring_lock
  2015-10-05 14:13   ` Roger Pau Monné
@ 2015-10-07 10:34     ` Bob Liu
  2015-10-07 10:34     ` Bob Liu
  1 sibling, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-10-07 10:34 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies


On 10/05/2015 10:13 PM, Roger Pau Monné wrote:
> On 05/09/15 at 14:39, Bob Liu wrote:
>> The per device io_lock became a coarser grained lock after multi-queues/rings
>> was introduced, this patch converts it to a fine-grained per ring lock.
>>
>> NOTE: The per dev_info structure was no more protected by any lock.
> 
> I would rewrite this as:
> 
> Note that the per-device blkfront_dev_info structure is no longer
> protected by any lock.
> 

Will update.

>>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkfront.c |   44 +++++++++++++++++++-----------------------
>>  1 file changed, 20 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index bf45c99..1cae76b 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -123,6 +123,7 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
>>  struct blkfront_ring_info
>>  {
>>  	struct blkif_front_ring ring;
>> +	spinlock_t ring_lock;
>>  	unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
>>  	unsigned int evtchn, irq;
>>  	struct work_struct work;
>> @@ -141,7 +142,6 @@ struct blkfront_ring_info
>>   * putting all kinds of interesting stuff here :-)
>>   */
>>  struct blkfront_dev_info {
>> -	spinlock_t io_lock;
>>  	struct mutex mutex;
>>  	struct xenbus_device *xbdev;
>>  	struct gendisk *gd;
>> @@ -637,29 +637,28 @@ static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  			   const struct blk_mq_queue_data *qd)
>>  {
>>  	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)hctx->driver_data;
>> -	struct blkfront_dev_info *dinfo = rinfo->dinfo;
>>  
>>  	blk_mq_start_request(qd->rq);
>> -	spin_lock_irq(&dinfo->io_lock);
>> +	spin_lock_irq(&rinfo->ring_lock);
>>  	if (RING_FULL(&rinfo->ring))
>>  		goto out_busy;
>>  
>> -	if (blkif_request_flush_invalid(qd->rq, dinfo))
>> +	if (blkif_request_flush_invalid(qd->rq, rinfo->dinfo))
>>  		goto out_err;
>>  
>>  	if (blkif_queue_request(qd->rq, rinfo))
>>  		goto out_busy;
>>  
>>  	flush_requests(rinfo);
>> -	spin_unlock_irq(&dinfo->io_lock);
>> +	spin_unlock_irq(&rinfo->ring_lock);
>>  	return BLK_MQ_RQ_QUEUE_OK;
>>  
>>  out_err:
>> -	spin_unlock_irq(&dinfo->io_lock);
>> +	spin_unlock_irq(&rinfo->ring_lock);
>>  	return BLK_MQ_RQ_QUEUE_ERROR;
>>  
>>  out_busy:
>> -	spin_unlock_irq(&dinfo->io_lock);
>> +	spin_unlock_irq(&rinfo->ring_lock);
>>  	blk_mq_stop_hw_queue(hctx);
>>  	return BLK_MQ_RQ_QUEUE_BUSY;
>>  }
>> @@ -961,7 +960,7 @@ static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
>>  	dinfo->gd = NULL;
>>  }
>>  
>> -/* Must be called with io_lock holded */
>> +/* Must be called with ring_lock holded */
>                                     ^ held.
>>  static void kick_pending_request_queues(struct blkfront_ring_info *rinfo)
>>  {
>>  	if (!RING_FULL(&rinfo->ring))
>> @@ -972,10 +971,10 @@ static void blkif_restart_queue(struct work_struct *work)
>>  {
>>  	struct blkfront_ring_info *rinfo = container_of(work, struct blkfront_ring_info, work);
>>  
>> -	spin_lock_irq(&rinfo->dinfo->io_lock);
>> +	spin_lock_irq(&rinfo->ring_lock);
>>  	if (rinfo->dinfo->connected == BLKIF_STATE_CONNECTED)
>>  		kick_pending_request_queues(rinfo);
> 
> This seems wrong, why are you acquiring a per-ring lock in order to
> check a per-device field? IMHO, I think you need to introduce a
> per-device lock or drop the locking around this chunk if it's really not
> needed.
> 

The lock here is to protect kick_pending_request_queues(), where we check RING_FULL().
I'll move taking this lock to after the check of the per-device field.
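
Roughly like this (sketch of the intended ordering only, not the final
code):

static void blkif_restart_queue(struct work_struct *work)
{
	struct blkfront_ring_info *rinfo = container_of(work,
				struct blkfront_ring_info, work);

	if (rinfo->dinfo->connected != BLKIF_STATE_CONNECTED)
		return;

	spin_lock_irq(&rinfo->ring_lock);
	/* ring_lock now only covers the ring itself (RING_FULL check) */
	kick_pending_request_queues(rinfo);
	spin_unlock_irq(&rinfo->ring_lock);
}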

Thanks,
-Bob

>> -	spin_unlock_irq(&rinfo->dinfo->io_lock);
>> +	spin_unlock_irq(&rinfo->ring_lock);
>>  }
>>  

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 6/9] xen/blkfront: negotiate the number of hw queues/rings with backend
  2015-10-05 14:40   ` Roger Pau Monné
  2015-10-07 10:39     ` Bob Liu
@ 2015-10-07 10:39     ` Bob Liu
  2015-10-07 11:46       ` Roger Pau Monné
  2015-10-07 11:46       ` Roger Pau Monné
  1 sibling, 2 replies; 83+ messages in thread
From: Bob Liu @ 2015-10-07 10:39 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies


On 10/05/2015 10:40 PM, Roger Pau Monné wrote:
> El 05/09/15 a les 14.39, Bob Liu ha escrit:
>> The max number of hardware queues for xen/blkfront is set by parameter
>> 'max_queues', while the number xen/blkback supported is notified through
>> xenstore("multi-queue-max-queues").
>>
>> The negotiated number was the smaller one, and was written back to
>> xen/blkback as "multi-queue-num-queues".
> 
> I would write this in present instead of past, I think it's clearer:
> 
> The negotiated number _is_ the smaller...

Will update.

> 
>>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkfront.c |  142 ++++++++++++++++++++++++++++++++----------
>>  1 file changed, 108 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 1cae76b..1aa66c9 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -107,6 +107,10 @@ static unsigned int xen_blkif_max_ring_order;
>>  module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
>>  MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
>>  
>> +static unsigned int xen_blkif_max_queues;
>> +module_param_named(max_queues, xen_blkif_max_queues, uint, S_IRUGO);
>> +MODULE_PARM_DESC(max_queues, "Maximum number of hardware queues/rings per virtual disk");
>> +
>>  #define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->pages_per_ring)
>>  #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
>>  /*
>> @@ -114,6 +118,7 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
>>   * characters are enough. Define to 20 to keep consist with backend.
>>   */
>>  #define RINGREF_NAME_LEN (20)
>> +#define QUEUE_NAME_LEN (12)
>>  
>>  /*
>>   *  Per-ring info.
>> @@ -687,7 +692,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
>>  
>>  	memset(&dinfo->tag_set, 0, sizeof(dinfo->tag_set));
>>  	dinfo->tag_set.ops = &blkfront_mq_ops;
>> -	dinfo->tag_set.nr_hw_queues = 1;
>> +	dinfo->tag_set.nr_hw_queues = dinfo->nr_rings;
>>  	dinfo->tag_set.queue_depth =  BLK_RING_SIZE(dinfo);
>>  	dinfo->tag_set.numa_node = NUMA_NO_NODE;
>>  	dinfo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
>> @@ -1350,6 +1355,51 @@ fail:
>>  	return err;
>>  }
>>  
>> +static int write_per_ring_nodes(struct xenbus_transaction xbt,
>> +				struct blkfront_ring_info *rinfo, const char *dir)
>> +{
>> +	int err, i;
>> +	const char *message = NULL;
>> +	struct blkfront_dev_info *dinfo = rinfo->dinfo;
>> +
>> +	if (dinfo->pages_per_ring == 1) {
>> +		err = xenbus_printf(xbt, dir, "ring-ref", "%u", rinfo->ring_ref[0]);
>> +		if (err) {
>> +			message = "writing ring-ref";
>> +			goto abort_transaction;
>> +		}
>> +		pr_info("%s: write ring-ref:%d\n", dir, rinfo->ring_ref[0]);
>> +	} else {
>> +		for (i = 0; i < dinfo->pages_per_ring; i++) {
>> +			char ring_ref_name[RINGREF_NAME_LEN];
>> +
>> +			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
>> +			err = xenbus_printf(xbt, dir, ring_ref_name,
>> +					    "%u", rinfo->ring_ref[i]);
>> +			if (err) {
>> +				message = "writing ring-ref";
>> +				goto abort_transaction;
>> +			}
>> +			pr_info("%s: write ring-ref:%d\n", dir, rinfo->ring_ref[i]);
>> +		}
>> +	}
>> +
>> +	err = xenbus_printf(xbt, dir, "event-channel", "%u", rinfo->evtchn);
>> +	if (err) {
>> +		message = "writing event-channel";
>> +		goto abort_transaction;
>> +	}
>> +	pr_info("%s: write event-channel:%d\n", dir, rinfo->evtchn);
>> +
>> +	return 0;
>> +
>> +abort_transaction:
>> +	xenbus_transaction_end(xbt, 1);
> 
> The transaction is not started inside of the helper, so I would prefer
> it to be ended in the same function where it is started. This should
> just return the error and the caller should end the transaction.
> 

The difficult part is how to return the right error message to the caller.
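
One option (hypothetical sketch, only the ring-ref%u path shown) would be
to hand the message back through an out parameter and leave both the end
of the transaction and xenbus_dev_fatal() to the caller:

static int write_per_ring_nodes(struct xenbus_transaction xbt,
				struct blkfront_ring_info *rinfo,
				const char *dir, const char **message)
{
	int err, i;
	struct blkfront_dev_info *dinfo = rinfo->dinfo;

	for (i = 0; i < dinfo->pages_per_ring; i++) {
		char ring_ref_name[RINGREF_NAME_LEN];

		snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
		err = xenbus_printf(xbt, dir, ring_ref_name, "%u",
				    rinfo->ring_ref[i]);
		if (err) {
			*message = "writing ring-ref";
			return err;
		}
	}

	err = xenbus_printf(xbt, dir, "event-channel", "%u", rinfo->evtchn);
	if (err) {
		*message = "writing event-channel";
		return err;
	}

	return 0;
}

The caller could then goto abort_transaction exactly as it already does
for its own xenbus_printf() failures.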

>> +	if (message)
>> +		xenbus_dev_fatal(dinfo->xbdev, err, "%s", message);
>> +
>> +	return err;
>> +}
>>  
>>  /* Common code used when first setting up, and when resuming. */
>>  static int talk_to_blkback(struct xenbus_device *dev,
>> @@ -1386,45 +1436,51 @@ again:
>>  		goto out;
>>  	}
>>  
>> +	if (dinfo->pages_per_ring > 1) {
>> +		err = xenbus_printf(xbt, dev->nodename, "ring-page-order", "%u",
>> +				    ring_page_order);
>> +		if (err) {
>> +			message = "writing ring-page-order";
>> +			goto abort_transaction;
>> +		}
>> +	}
>> +
>> +	/* We already got the number of queues in _probe */
>>  	if (dinfo->nr_rings == 1) {
>>  		rinfo = &dinfo->rinfo[0];
>>  
>> -		if (dinfo->pages_per_ring == 1) {
>> -			err = xenbus_printf(xbt, dev->nodename,
>> -					    "ring-ref", "%u", rinfo->ring_ref[0]);
>> -			if (err) {
>> -				message = "writing ring-ref";
>> -				goto abort_transaction;
>> -			}
>> -		} else {
>> -			err = xenbus_printf(xbt, dev->nodename,
>> -					    "ring-page-order", "%u", ring_page_order);
>> -			if (err) {
>> -				message = "writing ring-page-order";
>> -				goto abort_transaction;
>> -			}
>> -
>> -			for (i = 0; i < dinfo->pages_per_ring; i++) {
>> -				char ring_ref_name[RINGREF_NAME_LEN];
>> +		err = write_per_ring_nodes(xbt, &dinfo->rinfo[0], dev->nodename);
>> +		if (err)
>> +			goto out;
>> +	} else {
>> +		char *path;
>> +		size_t pathsize;
>>  
>> -				snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
>> -				err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
>> -						    "%u", rinfo->ring_ref[i]);
>> -				if (err) {
>> -					message = "writing ring-ref";
>> -					goto abort_transaction;
>> -				}
>> -			}
>> -		}
>> -		err = xenbus_printf(xbt, dev->nodename,
>> -				    "event-channel", "%u", rinfo->evtchn);
>> +		err = xenbus_printf(xbt, dev->nodename, "multi-queue-num-queues", "%u",
>> +				    dinfo->nr_rings);
>>  		if (err) {
>> -			message = "writing event-channel";
>> +			message = "writing multi-queue-num-queues";
>>  			goto abort_transaction;
>>  		}
>> -	} else {
>> -		/* Not supported at this stage */
>> -		goto abort_transaction;
>> +
>> +		pathsize = strlen(dev->nodename) + QUEUE_NAME_LEN;
>> +		path = kzalloc(pathsize, GFP_KERNEL);
> 
> kmalloc seems better here, since you are memsetting it below in order to
> reuse it.
> 
>> +		if (!path) {
>> +			err = -ENOMEM;
>> +			message = "ENOMEM while writing ring references";
>> +			goto abort_transaction;
>> +		}
>> +
>> +		for (i = 0; i < dinfo->nr_rings; i++) {
>> +			memset(path, 0, pathsize);
>> +			snprintf(path, pathsize, "%s/queue-%u", dev->nodename, i);
>> +			err = write_per_ring_nodes(xbt, &dinfo->rinfo[i], path);
>> +			if (err) {
>> +				kfree(path);
>> +				goto out;
>> +			}
>> +		}
>> +		kfree(path);
>>  	}
>>  
>>  	err = xenbus_printf(xbt, dev->nodename, "protocol", "%s",
>> @@ -1480,6 +1536,7 @@ static int blkfront_probe(struct xenbus_device *dev,
>>  	int err, vdevice, r_index;
>>  	struct blkfront_dev_info *dinfo;
>>  	struct blkfront_ring_info *rinfo;
>> +	unsigned int back_max_queues = 0;
>>  
>>  	/* FIXME: Use dynamic device id if this is not set. */
>>  	err = xenbus_scanf(XBT_NIL, dev->nodename,
>> @@ -1534,7 +1591,17 @@ static int blkfront_probe(struct xenbus_device *dev,
>>  	dinfo->vdevice = vdevice;
>>  	dinfo->connected = BLKIF_STATE_DISCONNECTED;
>>  
>> -	dinfo->nr_rings = 1;
>> +	/* Check if backend supports multiple queues */
>> +	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
>> +			   "multi-queue-max-queues", "%u", &back_max_queues);
>> +	if (err < 0)
>> +		back_max_queues = 1;
>> +
>> +	dinfo->nr_rings = min(back_max_queues, xen_blkif_max_queues);
>> +	if (dinfo->nr_rings <= 0)
>> +		dinfo->nr_rings = 1;
>> +	pr_info("dinfo->nr_rings:%u, backend support max-queues:%u\n", dinfo->nr_rings, back_max_queues);
>         ^ pr_debug                           ^ supports
> 
> And I would also print the number of queues that are going to be used.
> 

Will address all of the comments above.

>> +
>>  	dinfo->rinfo = kzalloc(sizeof(*rinfo) * dinfo->nr_rings, GFP_KERNEL);
>>  	if (!dinfo->rinfo) {
>>  		xenbus_dev_fatal(dev, -ENOMEM, "allocating ring_info structure");
>> @@ -2257,6 +2324,7 @@ static struct xenbus_driver blkfront_driver = {
>>  static int __init xlblk_init(void)
>>  {
>>  	int ret;
>> +	int nr_cpus = num_online_cpus();
>>  
>>  	if (!xen_domain())
>>  		return -ENODEV;
>> @@ -2267,6 +2335,12 @@ static int __init xlblk_init(void)
>>  		xen_blkif_max_ring_order = 0;
>>  	}
>>  
>> +	if (xen_blkif_max_queues > nr_cpus) {
> 
> Shouldn't there be a default value for xen_blkif_max_queues if the user
> hasn't set the parameter on the command line?
> 

The default value is then 0, so multi-queue isn't enabled by default.
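
If a non-zero default is wanted, it would just be a fragment like this in
xlblk_init() (hypothetical, not part of this series):

	/* default to one queue per vCPU if the user didn't set max_queues */
	if (!xen_blkif_max_queues)
		xen_blkif_max_queues = num_online_cpus();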

>> +		pr_info("Invalid max_queues (%d), will use default max: %d.\n",
>> +			xen_blkif_max_queues, nr_cpus);
>> +		xen_blkif_max_queues = nr_cpus;
>> +	}
>> +
>>  	if (!xen_has_pv_disk_devices())
>>  		return -ENODEV;
>>  
>>
> 

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 7/9] xen/blkback: separate ring information out of struct xen_blkif
  2015-10-05 14:55   ` Roger Pau Monné
  2015-10-07 10:41     ` Bob Liu
@ 2015-10-07 10:41     ` Bob Liu
  2015-10-10  4:08     ` Bob Liu
  2015-10-10  4:08     ` Bob Liu
  3 siblings, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-10-07 10:41 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies


On 10/05/2015 10:55 PM, Roger Pau Monné wrote:
> On 05/09/15 at 14:39, Bob Liu wrote:
>> Split per ring information to an new structure:xen_blkif_ring, so that one vbd
>> device can associate with one or more rings/hardware queues.
>>
>> This patch is a preparation for supporting multi hardware queues/rings.
>>
>> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkback/blkback.c |  365 ++++++++++++++++++-----------------
>>  drivers/block/xen-blkback/common.h  |   52 +++--
>>  drivers/block/xen-blkback/xenbus.c  |  130 +++++++------
>>  3 files changed, 295 insertions(+), 252 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
>> index 954c002..fd02240 100644
>> --- a/drivers/block/xen-blkback/blkback.c
>> +++ b/drivers/block/xen-blkback/blkback.c
>> @@ -113,71 +113,71 @@ module_param(log_stats, int, 0644);
>>  /* Number of free pages to remove on each call to gnttab_free_pages */
>>  #define NUM_BATCH_FREE_PAGES 10
>>  
>> -static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
>> +static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
>>  {
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
>> -	if (list_empty(&blkif->free_pages)) {
>> -		BUG_ON(blkif->free_pages_num != 0);
>> -		spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
>> +	if (list_empty(&ring->free_pages)) {
> 
> I'm afraid the pool of free pages should be per-device, not per-ring.
> 
>> +		BUG_ON(ring->free_pages_num != 0);
>> +		spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  		return gnttab_alloc_pages(1, page);
>>  	}
>> -	BUG_ON(blkif->free_pages_num == 0);
>> -	page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
>> +	BUG_ON(ring->free_pages_num == 0);
>> +	page[0] = list_first_entry(&ring->free_pages, struct page, lru);
>>  	list_del(&page[0]->lru);
>> -	blkif->free_pages_num--;
>> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +	ring->free_pages_num--;
>> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  
>>  	return 0;
>>  }
>>  
>> -static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
>> +static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **page,
>>                                    int num)
>>  {
>>  	unsigned long flags;
>>  	int i;
>>  
>> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
>> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
>>  	for (i = 0; i < num; i++)
>> -		list_add(&page[i]->lru, &blkif->free_pages);
>> -	blkif->free_pages_num += num;
>> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +		list_add(&page[i]->lru, &ring->free_pages);
>> +	ring->free_pages_num += num;
>> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  }
>>  
>> -static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
>> +static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
>>  {
>>  	/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
>>  	struct page *page[NUM_BATCH_FREE_PAGES];
>>  	unsigned int num_pages = 0;
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
>> -	while (blkif->free_pages_num > num) {
>> -		BUG_ON(list_empty(&blkif->free_pages));
>> -		page[num_pages] = list_first_entry(&blkif->free_pages,
>> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
>> +	while (ring->free_pages_num > num) {
>> +		BUG_ON(list_empty(&ring->free_pages));
>> +		page[num_pages] = list_first_entry(&ring->free_pages,
>>  		                                   struct page, lru);
>>  		list_del(&page[num_pages]->lru);
>> -		blkif->free_pages_num--;
>> +		ring->free_pages_num--;
>>  		if (++num_pages == NUM_BATCH_FREE_PAGES) {
>> -			spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +			spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  			gnttab_free_pages(num_pages, page);
>> -			spin_lock_irqsave(&blkif->free_pages_lock, flags);
>> +			spin_lock_irqsave(&ring->free_pages_lock, flags);
>>  			num_pages = 0;
>>  		}
>>  	}
>> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  	if (num_pages != 0)
>>  		gnttab_free_pages(num_pages, page);
>>  }
>>  
>>  #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
>>  
>> -static int do_block_io_op(struct xen_blkif *blkif);
>> -static int dispatch_rw_block_io(struct xen_blkif *blkif,
>> +static int do_block_io_op(struct xen_blkif_ring *ring);
>> +static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
>>  				struct blkif_request *req,
>>  				struct pending_req *pending_req);
>> -static void make_response(struct xen_blkif *blkif, u64 id,
>> +static void make_response(struct xen_blkif_ring *ring, u64 id,
>>  			  unsigned short op, int st);
>>  
>>  #define foreach_grant_safe(pos, n, rbtree, node) \
>> @@ -198,19 +198,19 @@ static void make_response(struct xen_blkif *blkif, u64 id,
>>   * bit operations to modify the flags of a persistent grant and to count
>>   * the number of used grants.
>>   */
>> -static int add_persistent_gnt(struct xen_blkif *blkif,
>> +static int add_persistent_gnt(struct xen_blkif_ring *ring,
>>  			       struct persistent_gnt *persistent_gnt)
>>  {
>>  	struct rb_node **new = NULL, *parent = NULL;
>>  	struct persistent_gnt *this;
>>  
>> -	if (blkif->persistent_gnt_c >= xen_blkif_max_pgrants) {
>> -		if (!blkif->vbd.overflow_max_grants)
>> -			blkif->vbd.overflow_max_grants = 1;
>> +	if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
>> +		if (!ring->blkif->vbd.overflow_max_grants)
>> +			ring->blkif->vbd.overflow_max_grants = 1;
> 
> The same for the pool of persistent grants, it should be per-device and
> not per-ring.
> 
> And I think this issue is far worse than the others, because a frontend
> might use a persistent grant on different queues, forcing the backend
> map the grant several times for each queue, this is not acceptable IMO.
> 

Thanks, will be updated.
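
For example, get_free_page() could keep taking a ring but reach the
per-device pool through ring->blkif (sketch only, assuming the pool
members move back into struct xen_blkif):

static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
{
	struct xen_blkif *blkif = ring->blkif;
	unsigned long flags;

	spin_lock_irqsave(&blkif->free_pages_lock, flags);
	if (list_empty(&blkif->free_pages)) {
		BUG_ON(blkif->free_pages_num != 0);
		spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
		return gnttab_alloc_pages(1, page);
	}
	BUG_ON(blkif->free_pages_num == 0);
	page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
	list_del(&page[0]->lru);
	blkif->free_pages_num--;
	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);

	return 0;
}

The persistent-grant rbtree and its counters would move back the same way.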

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 8/9] xen/blkback: pseudo support for multi hardware queues/rings
  2015-10-05 15:08   ` Roger Pau Monné
@ 2015-10-07 10:50     ` Bob Liu
  2015-10-07 11:49       ` Roger Pau Monné
  2015-10-07 11:49       ` Roger Pau Monné
  2015-10-07 10:50     ` Bob Liu
  1 sibling, 2 replies; 83+ messages in thread
From: Bob Liu @ 2015-10-07 10:50 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies


On 10/05/2015 11:08 PM, Roger Pau Monné wrote:
> On 05/09/15 at 14:39, Bob Liu wrote:
>> Prepare patch for multi hardware queues/rings, the ring number was set to 1 by
>> force.
> 
> This should be:
> 
> Preparatory patch for multiple hardware queues (rings). The number of
> rings is unconditionally set to 1.
> 
> But frankly this description is not helpful at all, you should describe
> the preparatory changes and why you need them.
> 

Will update; the purpose was just to keep each patch smaller and more readable.

>>
>> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkback/common.h |    3 +-
>>  drivers/block/xen-blkback/xenbus.c |  328 +++++++++++++++++++++++-------------
>>  2 files changed, 209 insertions(+), 122 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
>> index cc253d4..ba058a0 100644
>> --- a/drivers/block/xen-blkback/common.h
>> +++ b/drivers/block/xen-blkback/common.h
>> @@ -339,7 +339,8 @@ struct xen_blkif {
>>  	unsigned long long			st_wr_sect;
>>  	unsigned int nr_ring_pages;
>>  	/* All rings for this device */
>> -	struct xen_blkif_ring ring;
>> +	struct xen_blkif_ring *rings;
>> +	unsigned int nr_rings;
>>  };
>>  
>>  struct seg_buf {
>> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
>> index 6482ee3..04b8d0d 100644
>> --- a/drivers/block/xen-blkback/xenbus.c
>> +++ b/drivers/block/xen-blkback/xenbus.c
>> @@ -26,6 +26,7 @@
>>  /* Enlarge the array size in order to fully show blkback name. */
>>  #define BLKBACK_NAME_LEN (20)
>>  #define RINGREF_NAME_LEN (20)
>> +#define RINGREF_NAME_LEN (20)
> 
> Duplicate define?
> 

Will update.

>>  
>>  struct backend_info {
>>  	struct xenbus_device	*dev;
>> @@ -84,11 +85,13 @@ static int blkback_name(struct xen_blkif *blkif, char *buf)
>>  
>>  static void xen_update_blkif_status(struct xen_blkif *blkif)
>>  {
>> -	int err;
>> +	int err, i;
>>  	char name[BLKBACK_NAME_LEN];
>> +	struct xen_blkif_ring *ring;
>> +	char per_ring_name[BLKBACK_NAME_LEN + 2];
> 
> Hm, why don't you just add + 2 to the place where BLKBACK_NAME_LEN is
> defined and use the same character array ("name")? This is just a waste
> of stack.
> 

per_ring_name is name plus the queue number.
If name[] were reused, I'm not sure whether
snprintf(name, BLKBACK_NAME_LEN + 2, "%s-%d", name, i); would work,
since snprintf() with overlapping source and destination buffers isn't
guaranteed to behave.
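
A possible way to avoid both the second buffer and the overlapping
snprintf() would be to append the suffix at a saved offset (hypothetical
sketch; the "+ 8" headroom is arbitrary):

	char name[BLKBACK_NAME_LEN + 8];
	size_t base_len;

	err = blkback_name(blkif, name);
	if (err) {
		xenbus_dev_error(blkif->be->dev, err, "get blkback dev name");
		return;
	}
	base_len = strlen(name);

	for (i = 0; i < blkif->nr_rings; i++) {
		struct xen_blkif_ring *ring = &blkif->rings[i];

		/* overwrite only the "-<queue>" suffix on each iteration */
		snprintf(name + base_len, sizeof(name) - base_len, "-%d", i);
		ring->xenblkd = kthread_run(xen_blkif_schedule, ring, "%s", name);
		/* kthread_run() error handling as in the patch, omitted here */
	}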

>>  
>>  	/* Not ready to connect? */
>> -	if (!blkif->ring.irq || !blkif->vbd.bdev)
>> +	if (!blkif->rings || !blkif->rings[0].irq || !blkif->vbd.bdev)
>>  		return;
>>  
>>  	/* Already connected? */
>> @@ -113,19 +116,68 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
>>  	}
>>  	invalidate_inode_pages2(blkif->vbd.bdev->bd_inode->i_mapping);
>>  
>> -	blkif->ring.xenblkd = kthread_run(xen_blkif_schedule, &blkif->ring, "%s", name);
>> -	if (IS_ERR(blkif->ring.xenblkd)) {
>> -		err = PTR_ERR(blkif->ring.xenblkd);
>> -		blkif->ring.xenblkd = NULL;
>> -		xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
>> -		return;
>> +	if (blkif->nr_rings == 1) {
>> +		blkif->rings[0].xenblkd = kthread_run(xen_blkif_schedule, &blkif->rings[0], "%s", name);
>> +		if (IS_ERR(blkif->rings[0].xenblkd)) {
>> +			err = PTR_ERR(blkif->rings[0].xenblkd);
>> +			blkif->rings[0].xenblkd = NULL;
>> +			xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
>> +			return;
>> +		}
>> +	} else {
>> +		for (i = 0; i < blkif->nr_rings; i++) {
>> +			snprintf(per_ring_name, BLKBACK_NAME_LEN + 2, "%s-%d", name, i);
>> +			ring = &blkif->rings[i];
>> +			ring->xenblkd = kthread_run(xen_blkif_schedule, ring, "%s", per_ring_name);
>> +			if (IS_ERR(ring->xenblkd)) {
>> +				err = PTR_ERR(ring->xenblkd);
>> +				ring->xenblkd = NULL;
>> +				xenbus_dev_error(blkif->be->dev, err,
>> +						"start %s xenblkd", per_ring_name);
>> +				return;
>> +			}
>> +		}
>> +	}
>> +}
>> +
>> +static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
>> +{
>> +	struct xen_blkif_ring *ring;
>> +	int r;
>> +
>> +	blkif->rings = kzalloc(blkif->nr_rings * sizeof(struct xen_blkif_ring), GFP_KERNEL);
>> +	if (!blkif->rings)
>> +		return -ENOMEM;
>> +
>> +	for (r = 0; r < blkif->nr_rings; r++) {
>> +		ring = &blkif->rings[r];
>> +
>> +		spin_lock_init(&ring->blk_ring_lock);
>> +		init_waitqueue_head(&ring->wq);
>> +		ring->st_print = jiffies;
>> +		ring->persistent_gnts.rb_node = NULL;
>> +		spin_lock_init(&ring->free_pages_lock);
>> +		INIT_LIST_HEAD(&ring->free_pages);
>> +		INIT_LIST_HEAD(&ring->persistent_purge_list);
>> +		ring->free_pages_num = 0;
>> +		atomic_set(&ring->persistent_gnt_in_use, 0);
>> +		atomic_set(&ring->inflight, 0);
>> +		INIT_WORK(&ring->persistent_purge_work, xen_blkbk_unmap_purged_grants);
>> +		INIT_LIST_HEAD(&ring->pending_free);
>> +
>> +		spin_lock_init(&ring->pending_free_lock);
>> +		init_waitqueue_head(&ring->pending_free_wq);
>> +		init_waitqueue_head(&ring->shutdown_wq);
> 
> I've already commented on the previous patch, but a bunch of this needs
> to be per-device rather than per-ring.
> 

Will update and address all the other comments.

Thanks
-Bob

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 9/9] xen/blkback: get number of hardware queues/rings from blkfront
  2015-10-05 15:15   ` Roger Pau Monné
  2015-10-07 10:54     ` Bob Liu
@ 2015-10-07 10:54     ` Bob Liu
  1 sibling, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-10-07 10:54 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies


On 10/05/2015 11:15 PM, Roger Pau Monné wrote:
> El 05/09/15 a les 14.39, Bob Liu ha escrit:
>> Backend advertises "multi-queue-max-queues" to front, and then read back the
>> final negotiated queues/rings from "multi-queue-num-queues" which is wrote by
>> blkfront.
>>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkback/blkback.c |    8 ++++++++
>>  drivers/block/xen-blkback/xenbus.c  |   36 ++++++++++++++++++++++++++++++-----
>>  2 files changed, 39 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
>> index fd02240..b904fe05f0 100644
>> --- a/drivers/block/xen-blkback/blkback.c
>> +++ b/drivers/block/xen-blkback/blkback.c
>> @@ -83,6 +83,11 @@ module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
>>  MODULE_PARM_DESC(max_persistent_grants,
>>                   "Maximum number of grants to map persistently");
>>  
>> +unsigned int xenblk_max_queues;
>> +module_param_named(max_queues, xenblk_max_queues, uint, 0644);
>> +MODULE_PARM_DESC(max_queues,
>> +		 "Maximum number of hardware queues per virtual disk");
>> +
>>  /*
>>   * Maximum order of pages to be used for the shared ring between front and
>>   * backend, 4KB page granularity is used.
>> @@ -1458,6 +1463,9 @@ static int __init xen_blkif_init(void)
>>  		xen_blkif_max_ring_order = XENBUS_MAX_RING_PAGE_ORDER;
>>  	}
>>  
>> +	/* Allow as many queues as there are CPUs, by default */
>> +	xenblk_max_queues = num_online_cpus();
> 
> Hm, I'm not sure of the best way to set a default value for this.
> Consider for example a scenario were Dom0 is limited to 2vCPUs, but DomU
> has 8 vCPUs. Are we going to limit the number of queues to two? Is that
> the most appropriate value from a performance PoV?
> 
> I have to admit I don't have a clear idea of a default value for this
> field, and maybe the number of CPUs on the backend is indeed what works
> better, but there needs to be a comment explaining the reasoning behind
> this setting.
> 

It looks like xen-netback also chose the number of online CPUs as the default value.
Anyway, that's not a big problem and can be fixed easily in the future.

Thanks again for reviewing this big patch set!

Regards,
-Bob

>>  	rc = xen_blkif_interface_init();
>>  	if (rc)
>>  		goto failed_init;
>> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
>> index 04b8d0d..aa97ea5 100644
>> --- a/drivers/block/xen-blkback/xenbus.c
>> +++ b/drivers/block/xen-blkback/xenbus.c
>> @@ -28,6 +28,8 @@
>>  #define RINGREF_NAME_LEN (20)
>>  #define RINGREF_NAME_LEN (20)
>>  
>> +extern unsigned int xenblk_max_queues;
> 
> This should live in blkback/common.h
> 
>> +
>>  struct backend_info {
>>  	struct xenbus_device	*dev;
>>  	struct xen_blkif	*blkif;
>> @@ -191,11 +193,6 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
>>  	atomic_set(&blkif->drain, 0);
>>  	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
>>  
>> -	blkif->nr_rings = 1;
>> -	if (xen_blkif_alloc_rings(blkif)) {
>> -		kmem_cache_free(xen_blkif_cachep, blkif);
>> -		return ERR_PTR(-ENOMEM);
>> -	}
>>  	return blkif;
>>  }
>>  
>> @@ -618,6 +615,14 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
>>  		goto fail;
>>  	}
>>  
>> +	/* Multi-queue: wrte how many queues backend supported. */
>                         ^ write how many queues are supported by the
> backend.
>> +	err = xenbus_printf(XBT_NIL, dev->nodename,
>> +			    "multi-queue-max-queues", "%u", xenblk_max_queues);
>> +	if (err) {
>> +		pr_debug("Error writing multi-queue-num-queues\n");
>                 ^ pr_warn at least.
>> +		goto fail;
>> +	}
>> +
>>  	/* setup back pointer */
>>  	be->blkif->be = be;
>>  
>> @@ -1008,6 +1013,7 @@ static int connect_ring(struct backend_info *be)
>>  	char *xspath;
>>  	size_t xspathsize;
>>  	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
>> +	unsigned int requested_num_queues = 0;
>>  
>>  	pr_debug("%s %s\n", __func__, dev->otherend);
>>  
>> @@ -1035,6 +1041,26 @@ static int connect_ring(struct backend_info *be)
>>  	be->blkif->vbd.feature_gnt_persistent = pers_grants;
>>  	be->blkif->vbd.overflow_max_grants = 0;
>>  
>> +	/*
>> +	 * Read the number of hardware queus from frontend.
>                                        ^ queues
>> +	 */
>> +	err = xenbus_scanf(XBT_NIL, dev->otherend, "multi-queue-num-queues", "%u", &requested_num_queues);
>> +	if (err < 0) {
>> +		requested_num_queues = 1;
>> +	} else {
>> +		if (requested_num_queues > xenblk_max_queues
>> +		    || requested_num_queues == 0) {
>> +			/* buggy or malicious guest */
>> +			xenbus_dev_fatal(dev, err,
>> +					"guest requested %u queues, exceeding the maximum of %u.",
>> +					requested_num_queues, xenblk_max_queues);
>> +			return -1;
>> +		}
>> +	}
>> +	be->blkif->nr_rings = requested_num_queues;
>> +	if (xen_blkif_alloc_rings(be->blkif))
>> +		return -ENOMEM;
>> +
>>  	pr_info("nr_rings:%d protocol %d (%s) %s\n", be->blkif->nr_rings,
>>  		 be->blkif->blk_protocol, protocol,
>>  		 pers_grants ? "persistent grants" : "");
>>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 6/9] xen/blkfront: negotiate the number of hw queues/rings with backend
  2015-10-07 10:39     ` Bob Liu
@ 2015-10-07 11:46       ` Roger Pau Monné
  2015-10-07 12:19         ` Bob Liu
  2015-10-07 12:19         ` Bob Liu
  2015-10-07 11:46       ` Roger Pau Monné
  1 sibling, 2 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-07 11:46 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies

El 07/10/15 a les 12.39, Bob Liu ha escrit:
> On 10/05/2015 10:40 PM, Roger Pau Monné wrote:
>> El 05/09/15 a les 14.39, Bob Liu ha escrit:
>>> @@ -2267,6 +2335,12 @@ static int __init xlblk_init(void)
>>>  		xen_blkif_max_ring_order = 0;
>>>  	}
>>>  
>>> +	if (xen_blkif_max_queues > nr_cpus) {
>>
>> Shouldn't there be a default value for xen_blkif_max_queues if the user
>> hasn't set the parameter on the command line?
>>
> 
> Then the default value is 0, multi-queue isn't enabled by default.

Why isn't it enabled by default with a sensible number of queues? I
guess something like:

if (xen_blkif_max_queues == 0)
	xen_blkif_max_queues = min(nr_cpus, 4);

Roger.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 8/9] xen/blkback: pseudo support for multi hardware queues/rings
  2015-10-07 10:50     ` Bob Liu
@ 2015-10-07 11:49       ` Roger Pau Monné
  2015-10-07 11:49       ` Roger Pau Monné
  1 sibling, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-07 11:49 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies

El 07/10/15 a les 12.50, Bob Liu ha escrit:
> 
> On 10/05/2015 11:08 PM, Roger Pau Monné wrote:
>> El 05/09/15 a les 14.39, Bob Liu ha escrit:
>>> Prepare patch for multi hardware queues/rings, the ring number was set to 1 by
>>> force.
>>
>> This should be:
>>
>> Preparatory patch for multiple hardware queues (rings). The number of
>> rings is unconditionally set to 1.
>>
>> But frankly this description is not helpful at all, you should describe
>> the preparatory changes and why you need them.
>>
> 
> Will update, the purpose is just to make each patch smaller and more readable.
> 
>>>
>>> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>> ---
>>>  drivers/block/xen-blkback/common.h |    3 +-
>>>  drivers/block/xen-blkback/xenbus.c |  328 +++++++++++++++++++++++-------------
>>>  2 files changed, 209 insertions(+), 122 deletions(-)
>>>
>>> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
>>> index cc253d4..ba058a0 100644
>>> --- a/drivers/block/xen-blkback/common.h
>>> +++ b/drivers/block/xen-blkback/common.h
>>> @@ -339,7 +339,8 @@ struct xen_blkif {
>>>  	unsigned long long			st_wr_sect;
>>>  	unsigned int nr_ring_pages;
>>>  	/* All rings for this device */
>>> -	struct xen_blkif_ring ring;
>>> +	struct xen_blkif_ring *rings;
>>> +	unsigned int nr_rings;
>>>  };
>>>  
>>>  struct seg_buf {
>>> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
>>> index 6482ee3..04b8d0d 100644
>>> --- a/drivers/block/xen-blkback/xenbus.c
>>> +++ b/drivers/block/xen-blkback/xenbus.c
>>> @@ -26,6 +26,7 @@
>>>  /* Enlarge the array size in order to fully show blkback name. */
>>>  #define BLKBACK_NAME_LEN (20)
>>>  #define RINGREF_NAME_LEN (20)
>>> +#define RINGREF_NAME_LEN (20)
>>
>> Duplicate define?
>>
> 
> Will update.
> 
>>>  
>>>  struct backend_info {
>>>  	struct xenbus_device	*dev;
>>> @@ -84,11 +85,13 @@ static int blkback_name(struct xen_blkif *blkif, char *buf)
>>>  
>>>  static void xen_update_blkif_status(struct xen_blkif *blkif)
>>>  {
>>> -	int err;
>>> +	int err, i;
>>>  	char name[BLKBACK_NAME_LEN];
>>> +	struct xen_blkif_ring *ring;
>>> +	char per_ring_name[BLKBACK_NAME_LEN + 2];
>>
>> Hm, why don't you just add + 2 to the place where BLKBACK_NAME_LEN is
>> defined and use the same character array ("name")? This is just a waste
>> of stack.
>>
> 
> per_ring_name = name + 'queue number';
> If reuse name[], I'm not sure whether
> snprintf(name, BLKBACK_NAME_LEN + 2, "%s-%d", name, i); can work.

What I would like to avoid is having two character arrays on the stack
when IMHO we can achieve the same with a single array that's long enough.

Roger.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 6/9] xen/blkfront: negotiate the number of hw queues/rings with backend
  2015-10-07 11:46       ` Roger Pau Monné
@ 2015-10-07 12:19         ` Bob Liu
  2015-10-07 12:19         ` Bob Liu
  1 sibling, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-10-07 12:19 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies


On 10/07/2015 07:46 PM, Roger Pau Monné wrote:
> El 07/10/15 a les 12.39, Bob Liu ha escrit:
>> On 10/05/2015 10:40 PM, Roger Pau Monné wrote:
>>> El 05/09/15 a les 14.39, Bob Liu ha escrit:
>>>> @@ -2267,6 +2335,12 @@ static int __init xlblk_init(void)
>>>>  		xen_blkif_max_ring_order = 0;
>>>>  	}
>>>>  
>>>> +	if (xen_blkif_max_queues > nr_cpus) {
>>>
>>> Shouldn't there be a default value for xen_blkif_max_queues if the user
>>> hasn't set the parameter on the command line?
>>>
>>
>> Then the default value is 0, multi-queue isn't enabled by default.
> 
> Why isn't it enabled by default with a sensible number of queues? I
> guess something like:
> 
> if (xen_blkif_max_queues == 0)
> 	xen_blkif_max_queues = min(nr_cpus, 4);
> 

I'm worried about complaints about higher memory consumption if it is set to 4 by default.
Anyway, if you think it's fine I'll update the default value to 4 in the next version.
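For rough scale (an illustrative estimate only, assuming the default of one
ring page per queue): each extra ring costs one granted 4 KiB page for the
shared ring plus an event channel and its shadow bookkeeping in the frontend,
so going from 1 to 4 queues adds on the order of 12 KiB of granted ring memory
per virtual disk, and more if a larger max_ring_page_order is negotiated.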

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 7/9] xen/blkback: separate ring information out of struct xen_blkif
  2015-10-05 14:55   ` Roger Pau Monné
  2015-10-07 10:41     ` Bob Liu
  2015-10-07 10:41     ` Bob Liu
@ 2015-10-10  4:08     ` Bob Liu
  2015-10-19  9:36       ` Roger Pau Monné
  2015-10-19  9:36       ` Roger Pau Monné
  2015-10-10  4:08     ` Bob Liu
  3 siblings, 2 replies; 83+ messages in thread
From: Bob Liu @ 2015-10-10  4:08 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies


On 10/05/2015 10:55 PM, Roger Pau Monné wrote:
> El 05/09/15 a les 14.39, Bob Liu ha escrit:
>> Split per ring information to an new structure:xen_blkif_ring, so that one vbd
>> device can associate with one or more rings/hardware queues.
>>
>> This patch is a preparation for supporting multi hardware queues/rings.
>>
>> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkback/blkback.c |  365 ++++++++++++++++++-----------------
>>  drivers/block/xen-blkback/common.h  |   52 +++--
>>  drivers/block/xen-blkback/xenbus.c  |  130 +++++++------
>>  3 files changed, 295 insertions(+), 252 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
>> index 954c002..fd02240 100644
>> --- a/drivers/block/xen-blkback/blkback.c
>> +++ b/drivers/block/xen-blkback/blkback.c
>> @@ -113,71 +113,71 @@ module_param(log_stats, int, 0644);
>>  /* Number of free pages to remove on each call to gnttab_free_pages */
>>  #define NUM_BATCH_FREE_PAGES 10
>>  
>> -static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
>> +static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
>>  {
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
>> -	if (list_empty(&blkif->free_pages)) {
>> -		BUG_ON(blkif->free_pages_num != 0);
>> -		spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
>> +	if (list_empty(&ring->free_pages)) {
> 
> I'm afraid the pool of free pages should be per-device, not per-ring.
> 
>> +		BUG_ON(ring->free_pages_num != 0);
>> +		spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  		return gnttab_alloc_pages(1, page);
>>  	}
>> -	BUG_ON(blkif->free_pages_num == 0);
>> -	page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
>> +	BUG_ON(ring->free_pages_num == 0);
>> +	page[0] = list_first_entry(&ring->free_pages, struct page, lru);
>>  	list_del(&page[0]->lru);
>> -	blkif->free_pages_num--;
>> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +	ring->free_pages_num--;
>> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  
>>  	return 0;
>>  }
>>  
>> -static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
>> +static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **page,
>>                                    int num)
>>  {
>>  	unsigned long flags;
>>  	int i;
>>  
>> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
>> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
>>  	for (i = 0; i < num; i++)
>> -		list_add(&page[i]->lru, &blkif->free_pages);
>> -	blkif->free_pages_num += num;
>> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +		list_add(&page[i]->lru, &ring->free_pages);
>> +	ring->free_pages_num += num;
>> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  }
>>  
>> -static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
>> +static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
>>  {
>>  	/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
>>  	struct page *page[NUM_BATCH_FREE_PAGES];
>>  	unsigned int num_pages = 0;
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
>> -	while (blkif->free_pages_num > num) {
>> -		BUG_ON(list_empty(&blkif->free_pages));
>> -		page[num_pages] = list_first_entry(&blkif->free_pages,
>> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
>> +	while (ring->free_pages_num > num) {
>> +		BUG_ON(list_empty(&ring->free_pages));
>> +		page[num_pages] = list_first_entry(&ring->free_pages,
>>  		                                   struct page, lru);
>>  		list_del(&page[num_pages]->lru);
>> -		blkif->free_pages_num--;
>> +		ring->free_pages_num--;
>>  		if (++num_pages == NUM_BATCH_FREE_PAGES) {
>> -			spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +			spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  			gnttab_free_pages(num_pages, page);
>> -			spin_lock_irqsave(&blkif->free_pages_lock, flags);
>> +			spin_lock_irqsave(&ring->free_pages_lock, flags);
>>  			num_pages = 0;
>>  		}
>>  	}
>> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  	if (num_pages != 0)
>>  		gnttab_free_pages(num_pages, page);
>>  }
>>  
>>  #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
>>  
>> -static int do_block_io_op(struct xen_blkif *blkif);
>> -static int dispatch_rw_block_io(struct xen_blkif *blkif,
>> +static int do_block_io_op(struct xen_blkif_ring *ring);
>> +static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
>>  				struct blkif_request *req,
>>  				struct pending_req *pending_req);
>> -static void make_response(struct xen_blkif *blkif, u64 id,
>> +static void make_response(struct xen_blkif_ring *ring, u64 id,
>>  			  unsigned short op, int st);
>>  
>>  #define foreach_grant_safe(pos, n, rbtree, node) \
>> @@ -198,19 +198,19 @@ static void make_response(struct xen_blkif *blkif, u64 id,
>>   * bit operations to modify the flags of a persistent grant and to count
>>   * the number of used grants.
>>   */
>> -static int add_persistent_gnt(struct xen_blkif *blkif,
>> +static int add_persistent_gnt(struct xen_blkif_ring *ring,
>>  			       struct persistent_gnt *persistent_gnt)
>>  {
>>  	struct rb_node **new = NULL, *parent = NULL;
>>  	struct persistent_gnt *this;
>>  
>> -	if (blkif->persistent_gnt_c >= xen_blkif_max_pgrants) {
>> -		if (!blkif->vbd.overflow_max_grants)
>> -			blkif->vbd.overflow_max_grants = 1;
>> +	if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
>> +		if (!ring->blkif->vbd.overflow_max_grants)
>> +			ring->blkif->vbd.overflow_max_grants = 1;
> 
> The same for the pool of persistent grants, it should be per-device and
> not per-ring.
> 
> And I think this issue is far worse than the others, because a frontend
> might use a persistent grant on different queues, forcing the backend
> map the grant several times for each queue, this is not acceptable IMO.
> 

Hi Roger,

I realize it would make things complicated if persistent grants were made per-device instead of per-queue.
Extra locks would be required to protect the per-device pool in both blkfront and blkback.

AFAIR, there was a discussion before about dropping persistent grant mapping altogether.
The only reason we kept this feature was backward compatibility.
So I think we should not complicate the xen-block code any further for a feature that is going to be dropped.

How about disabling feature-persistent if multi-queue is used?

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 3/9] xen/blkfront: separate per ring information out of device info
  2015-10-02 17:02   ` Roger Pau Monné
  2015-10-03  0:34     ` Bob Liu
  2015-10-03  0:34     ` Bob Liu
@ 2015-10-10  8:30     ` Bob Liu
  2015-10-19  9:42       ` Roger Pau Monné
  2015-10-19  9:42       ` Roger Pau Monné
  2015-10-10  8:30     ` Bob Liu
  3 siblings, 2 replies; 83+ messages in thread
From: Bob Liu @ 2015-10-10  8:30 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies


On 10/03/2015 01:02 AM, Roger Pau Monné wrote:
> El 05/09/15 a les 14.39, Bob Liu ha escrit:
>> Split per ring information to an new structure:blkfront_ring_info, also rename
>> per blkfront_info to blkfront_dev_info.
>   ^ removed.
>>
>> A ring is the representation of a hardware queue, every vbd device can associate
>> with one or more blkfront_ring_info depending on how many hardware
>> queues/rings to be used.
>>
>> This patch is a preparation for supporting real multi hardware queues/rings.
>>
>> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkfront.c |  854 ++++++++++++++++++++++--------------------
>>  1 file changed, 445 insertions(+), 409 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 5dd591d..bf416d5 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
>>  module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
>>  MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
>>  
>> -#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * (info)->nr_ring_pages)
>> +#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->nr_ring_pages)
> 
> This change looks pointless, any reason to use dinfo instead of info?
> 
>>  #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
>>  /*
>>   * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
>> @@ -116,12 +116,31 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
>>  #define RINGREF_NAME_LEN (20)
>>  
>>  /*
>> + *  Per-ring info.
>> + *  Every blkfront device can associate with one or more blkfront_ring_info,
>> + *  depending on how many hardware queues to be used.
>> + */
>> +struct blkfront_ring_info
>> +{
>> +	struct blkif_front_ring ring;
>> +	unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
>> +	unsigned int evtchn, irq;
>> +	struct work_struct work;
>> +	struct gnttab_free_callback callback;
>> +	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
>> +	struct list_head grants;
>> +	struct list_head indirect_pages;
>> +	unsigned int persistent_gnts_c;
> 
> persistent grants should be per-device, not per-queue IMHO. Is it really
> hard to make this global instead of per-queue?
> 

I don't see the benefit of making it per-device, only disadvantages:
if persistent grants are per-device, then we have to introduce an extra lock to protect this list,
which will complicate the code and may hurt performance when the queue count is large, e.g. 16 queues.

Regards,
Bob Liu

>> +	unsigned long shadow_free;
>> +	struct blkfront_dev_info *dinfo;
>> +};

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 7/9] xen/blkback: separate ring information out of struct xen_blkif
  2015-10-10  4:08     ` Bob Liu
  2015-10-19  9:36       ` Roger Pau Monné
@ 2015-10-19  9:36       ` Roger Pau Monné
  2015-10-19 10:03         ` Bob Liu
  2015-10-19 10:03         ` Bob Liu
  1 sibling, 2 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-19  9:36 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies

El 10/10/15 a les 6.08, Bob Liu ha escrit:
> On 10/05/2015 10:55 PM, Roger Pau Monné wrote:
>> The same for the pool of persistent grants, it should be per-device and
>> not per-ring.
>>
>> And I think this issue is far worse than the others, because a frontend
>> might use a persistent grant on different queues, forcing the backend
>> map the grant several times for each queue, this is not acceptable IMO.
>>
> 
> Hi Roger,
> 
> I realize it would make things complicated if persistent grants were made per-device instead of per-queue.
> Extra locks would be required to protect the per-device pool in both blkfront and blkback.

Yes, I realize this, but without having at least a prototype it's hard
to tell if contention is going to be a problem or not. We already use a
red-black tree to store persistent grants, which should be quite fast
when performing searches.

IMHO, we are doing things backwards: we should have investigated first
whether using per-device was a problem, and only if it indeed was a problem
moved to per-queue. Using per-device just requires adding locks around
the functions that get/put grants and friends, leaving the data structures
untouched (per-device).
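As a very rough sketch of that direction (the lock and helper names below are
assumptions for illustration, not the actual driver code), the rings would
simply serialise on one device-level lock around the existing rb-tree lookup:

/* Persistent grants stay in struct xen_blkif; pgnt_lock and
 * __lookup_persistent_gnt() are placeholder names for a per-device lock
 * and the existing rb-tree search.
 */
static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
						 grant_ref_t gref)
{
	struct persistent_gnt *gnt;

	spin_lock(&blkif->pgnt_lock);
	gnt = __lookup_persistent_gnt(blkif, gref);
	spin_unlock(&blkif->pgnt_lock);

	return gnt;
}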

> AFAIR, there was a discussion before about dropping persistent grant mapping altogether.
> The only reason we kept this feature was backward compatibility.
> So I think we should not complicate the xen-block code any further for a feature that is going to be dropped.
> 
> How about disabling feature-persistent if multi-queue is used?

This is not what we have done in the past, and there is also no way for
blkback to tell the frontend that persistent grants and multiqueue
cannot be used at the same time. Blkback puts all supported features on
xenstore before knowing anything about the frontend.

Also, if you want to do it per-queue instead of per-device the limits
need to be properly adjusted, not just the persistent grants one, but
also the queue of cached free pages. This also implies that each queue
is going to have its own LRU and purge task.
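A minimal illustration of that kind of adjustment (the per-ring limit fields
are assumed here, not existing code) would be to divide the module-wide limits
when the rings are allocated:

unsigned int r;

for (r = 0; r < blkif->nr_rings; r++) {
	struct xen_blkif_ring *ring = &blkif->rings[r];

	/* max_pgrants / max_free_pages would be new per-ring fields */
	ring->max_pgrants = xen_blkif_max_pgrants / blkif->nr_rings;
	ring->max_free_pages = xen_blkif_max_buffer_pages / blkif->nr_rings;
}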

Roger.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 3/9] xen/blkfront: separate per ring information out of device info
  2015-10-10  8:30     ` Bob Liu
@ 2015-10-19  9:42       ` Roger Pau Monné
  2015-10-19  9:42       ` Roger Pau Monné
  1 sibling, 0 replies; 83+ messages in thread
From: Roger Pau Monné @ 2015-10-19  9:42 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies

On 10/10/15 at 10.30, Bob Liu wrote:
> 
> On 10/03/2015 01:02 AM, Roger Pau Monné wrote:
>> On 05/09/15 at 14.39, Bob Liu wrote:
>>> Split per ring information to an new structure:blkfront_ring_info, also rename
>>> per blkfront_info to blkfront_dev_info.
>>   ^ removed.
>>>
>>> A ring is the representation of a hardware queue, every vbd device can associate
>>> with one or more blkfront_ring_info depending on how many hardware
>>> queues/rings to be used.
>>>
>>> This patch is a preparation for supporting real multi hardware queues/rings.
>>>
>>> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>> ---
>>>  drivers/block/xen-blkfront.c |  854 ++++++++++++++++++++++--------------------
>>>  1 file changed, 445 insertions(+), 409 deletions(-)
>>>
>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>> index 5dd591d..bf416d5 100644
>>> --- a/drivers/block/xen-blkfront.c
>>> +++ b/drivers/block/xen-blkfront.c
>>> @@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
>>>  module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
>>>  MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
>>>  
>>> -#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * (info)->nr_ring_pages)
>>> +#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->nr_ring_pages)
>>
>> This change looks pointless, any reason to use dinfo instead of info?
>>
>>>  #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
>>>  /*
>>>   * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
>>> @@ -116,12 +116,31 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
>>>  #define RINGREF_NAME_LEN (20)
>>>  
>>>  /*
>>> + *  Per-ring info.
>>> + *  Every blkfront device can associate with one or more blkfront_ring_info,
>>> + *  depending on how many hardware queues to be used.
>>> + */
>>> +struct blkfront_ring_info
>>> +{
>>> +	struct blkif_front_ring ring;
>>> +	unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
>>> +	unsigned int evtchn, irq;
>>> +	struct work_struct work;
>>> +	struct gnttab_free_callback callback;
>>> +	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
>>> +	struct list_head grants;
>>> +	struct list_head indirect_pages;
>>> +	unsigned int persistent_gnts_c;
>>
>> persistent grants should be per-device, not per-queue IMHO. Is it really
>> hard to make this global instead of per-queue?
>>
> 
> I didn't see the benefit of making it per-device, only disadvantages:
> if persistent grants are per-device, then we have to introduce an extra lock to protect this list,
> which will complicate the code and may slow down performance when the queue count is large, e.g. 16 queues.

IMHO, and as I said in the reply to patch 7, there's no way to know that
unless you actually implement it, and I think it would have been easier
to just add locks around the existing functions without moving the data
structures (leaving them per-device).

Also, you didn't want to enable multiple queues by default because of
the RAM usage; if we make all this per-device, RAM usage is not going to
increase much, which means we could enable multiple queues by default
with a sensible value (4 maybe?). TBH, I don't think we are going to see
contention with 4 queues per device.
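
To sketch what that per-device layout could look like in blkfront (the names
and locking below are illustrative assumptions, not the patch under review):
the grant list and its counter move back into the device-wide structure,
guarded by a single device-level lock, while the per-ring structure keeps
only the ring, shadow and event-channel state.

#include <linux/list.h>
#include <linux/spinlock.h>
#include <xen/grant_table.h>

struct blkfront_ring_info;	/* per-ring state, as in the patch */

/* Trimmed version of blkfront's grant bookkeeping entry. */
struct grant {
	grant_ref_t		gref;
	unsigned long		pfn;
	struct list_head	node;
};

/* Hypothetical device-wide info: one shared persistent-grant pool. */
struct blkfront_dev_info {
	spinlock_t		gnt_lock;	/* protects the two fields below */
	struct list_head	grants;
	unsigned int		persistent_gnts_c;
	unsigned int		nr_rings;
	struct blkfront_ring_info *rinfo;	/* array of per-ring state */
	/* ... gendisk, xenbus device, feature flags, ... */
};

/* Simplified: any ring takes a free grant from the shared pool. */
static struct grant *dev_get_free_grant(struct blkfront_dev_info *dinfo)
{
	struct grant *gnt = NULL;

	spin_lock(&dinfo->gnt_lock);
	if (!list_empty(&dinfo->grants)) {
		gnt = list_first_entry(&dinfo->grants, struct grant, node);
		list_del(&gnt->node);
		dinfo->persistent_gnts_c--;	/* counter handling simplified */
	}
	spin_unlock(&dinfo->gnt_lock);
	return gnt;
}

With 4 queues the interesting question is simply how often several rings
contend on gnt_lock at once, which is what a prototype would show.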

Roger.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v3 7/9] xen/blkback: separate ring information out of struct xen_blkif
  2015-10-19  9:36       ` Roger Pau Monné
  2015-10-19 10:03         ` Bob Liu
@ 2015-10-19 10:03         ` Bob Liu
  1 sibling, 0 replies; 83+ messages in thread
From: Bob Liu @ 2015-10-19 10:03 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, david.vrabel, linux-kernel, konrad.wilk,
	felipe.franciosi, axboe, hch, avanzini.arianna, rafal.mielniczuk,
	boris.ostrovsky, jonathan.davies


On 10/19/2015 05:36 PM, Roger Pau Monné wrote:
> On 10/10/15 at 6.08, Bob Liu wrote:
>> On 10/05/2015 10:55 PM, Roger Pau Monné wrote:
>>> The same goes for the pool of persistent grants: it should be per-device
>>> and not per-ring.
>>>
>>> And I think this issue is far worse than the others, because a frontend
>>> might use a persistent grant on different queues, forcing the backend to
>>> map the grant several times, once for each queue; this is not acceptable IMO.
>>>
>>
>> Hi Roger,
>>
>> I realize it would complicate things to make persistent grants per-device instead of per-queue.
>> Extra locks would be required to protect the per-device pool in both blkfront and blkback.
> 
> Yes, I realize this, but without at least a prototype it's hard to tell
> whether contention is going to be a problem or not. We already use a
> red-black tree to store persistent grants, which should be quite fast
> when performing searches.
> 
> IMHO, we are doing things backwards: we should have investigated first
> whether using per-device was a problem, and only if it indeed was a
> problem moved to per-queue. Using per-device would just require adding
> locks around the functions that get/put grants and friends, leaving the
> data structures untouched (per-device).
> 
>> AFAIR, there was a discussion before about dropping persistent grant mapping altogether.
>> The only reason we kept this feature was backward compatibility.
>> So I think we should not complicate the xen-block code any further for a feature that is going to be dropped.
>>
>> How about disabling feature-persistent when multi-queue is used?
> 
> This is not what we have done in the past; also, there's no way for
> blkback to tell the frontend that persistent grants and multiqueue
> cannot be used at the same time. Blkback puts all supported features on
> xenstore before knowing anything about the frontend.
> 
> Also, if you want to do it per-queue instead of per-device, the limits
> need to be properly adjusted: not just the persistent grants limit, but
> also the queue of cached free pages. This also implies that each queue
> is going to have its own LRU and purge task.
> 

Okay, then I'll update with a per-device version.

For blkfront there would be two locks: one per-device and the other per-ring.

For blkback, a new lock would be added to protect the red-black tree, e.g. in add_persistent_gnt().
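
A minimal sketch of that blkback change (pers_gnts_lock is the hypothetical
new field in struct xen_blkif; the rb-tree insertion is what
add_persistent_gnt() already does today, with the max_persistent_grants
check and flag handling omitted for brevity):

static int add_persistent_gnt(struct xen_blkif *blkif,
			      struct persistent_gnt *persistent_gnt)
{
	struct rb_node **new, *parent = NULL;
	struct persistent_gnt *this;
	int ret = 0;

	spin_lock(&blkif->pers_gnts_lock);	/* new device-level lock */
	new = &blkif->persistent_gnts.rb_node;
	while (*new) {
		this = container_of(*new, struct persistent_gnt, node);
		parent = *new;
		if (persistent_gnt->gnt < this->gnt)
			new = &((*new)->rb_left);
		else if (persistent_gnt->gnt > this->gnt)
			new = &((*new)->rb_right);
		else {
			ret = -EINVAL;	/* gref is already in the tree */
			goto out;
		}
	}
	/* Add the new node and rebalance the tree. */
	rb_link_node(&persistent_gnt->node, parent, new);
	rb_insert_color(&persistent_gnt->node, &blkif->persistent_gnts);
	blkif->persistent_gnt_c++;
out:
	spin_unlock(&blkif->pers_gnts_lock);
	return ret;
}

The same lock would cover the lookup and teardown paths (get_persistent_gnt()
and free_persistent_gnts()) so that all rings see one consistent pool.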

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 83+ messages in thread

end of thread, other threads:[~2015-10-19 10:04 UTC | newest]

Thread overview: 83+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-05 12:39 [PATCH v3 0/9] xen-block: support multi hardware-queues/rings Bob Liu
2015-09-05 12:39 ` [PATCH v3 1/9] xen-blkfront: convert to blk-mq APIs Bob Liu
2015-09-05 12:39 ` Bob Liu
2015-09-23 20:31   ` Konrad Rzeszutek Wilk
2015-09-23 21:12     ` Konrad Rzeszutek Wilk
2015-09-23 21:12     ` Konrad Rzeszutek Wilk
2015-09-23 20:31   ` Konrad Rzeszutek Wilk
2015-09-05 12:39 ` [PATCH v3 2/9] xen-block: add document for mutli hardware queues/rings Bob Liu
2015-09-05 12:39 ` Bob Liu
2015-09-23 20:32   ` Konrad Rzeszutek Wilk
2015-09-23 20:32   ` Konrad Rzeszutek Wilk
2015-10-02 16:04   ` Roger Pau Monné
2015-10-02 16:12     ` Wei Liu
2015-10-02 16:12     ` [Xen-devel] " Wei Liu
2015-10-02 16:22       ` Roger Pau Monné
2015-10-02 16:22       ` [Xen-devel] " Roger Pau Monné
2015-10-02 23:55         ` Bob Liu
2015-10-02 23:55         ` Bob Liu
2015-10-02 16:04   ` Roger Pau Monné
2015-09-05 12:39 ` [PATCH v3 3/9] xen/blkfront: separate per ring information out of device info Bob Liu
2015-10-02 17:02   ` Roger Pau Monné
2015-10-03  0:34     ` Bob Liu
2015-10-03  0:34     ` Bob Liu
2015-10-05 15:17       ` Roger Pau Monné
2015-10-05 15:17       ` Roger Pau Monné
2015-10-10  8:30     ` Bob Liu
2015-10-19  9:42       ` Roger Pau Monné
2015-10-19  9:42       ` Roger Pau Monné
2015-10-10  8:30     ` Bob Liu
2015-10-02 17:02   ` Roger Pau Monné
2015-09-05 12:39 ` Bob Liu
2015-09-05 12:39 ` [PATCH v3 4/9] xen/blkfront: pseudo support for multi hardware queues/rings Bob Liu
2015-10-05 10:52   ` Roger Pau Monné
2015-10-07 10:28     ` Bob Liu
2015-10-07 10:28     ` Bob Liu
2015-10-05 10:52   ` Roger Pau Monné
2015-09-05 12:39 ` Bob Liu
2015-09-05 12:39 ` [PATCH v3 5/9] xen/blkfront: convert per device io_lock to per ring ring_lock Bob Liu
2015-09-05 12:39   ` Bob Liu
2015-10-05 14:13   ` Roger Pau Monné
2015-10-05 14:13   ` Roger Pau Monné
2015-10-07 10:34     ` Bob Liu
2015-10-07 10:34     ` Bob Liu
2015-09-05 12:39 ` [PATCH v3 6/9] xen/blkfront: negotiate the number of hw queues/rings with backend Bob Liu
2015-09-05 12:39   ` Bob Liu
2015-10-05 14:40   ` Roger Pau Monné
2015-10-07 10:39     ` Bob Liu
2015-10-07 10:39     ` Bob Liu
2015-10-07 11:46       ` Roger Pau Monné
2015-10-07 12:19         ` Bob Liu
2015-10-07 12:19         ` Bob Liu
2015-10-07 11:46       ` Roger Pau Monné
2015-10-05 14:40   ` Roger Pau Monné
2015-09-05 12:39 ` [PATCH v3 7/9] xen/blkback: separate ring information out of struct xen_blkif Bob Liu
2015-09-05 12:39   ` Bob Liu
2015-10-05 14:55   ` Roger Pau Monné
2015-10-07 10:41     ` Bob Liu
2015-10-07 10:41     ` Bob Liu
2015-10-10  4:08     ` Bob Liu
2015-10-19  9:36       ` Roger Pau Monné
2015-10-19  9:36       ` Roger Pau Monné
2015-10-19 10:03         ` Bob Liu
2015-10-19 10:03         ` Bob Liu
2015-10-10  4:08     ` Bob Liu
2015-10-05 14:55   ` Roger Pau Monné
2015-10-05 14:55   ` Roger Pau Monné
2015-10-05 14:55   ` Roger Pau Monné
2015-09-05 12:39 ` [PATCH v3 8/9] xen/blkback: pseudo support for multi hardware queues/rings Bob Liu
2015-09-05 12:39   ` Bob Liu
2015-10-05 15:08   ` Roger Pau Monné
2015-10-05 15:08   ` Roger Pau Monné
2015-10-07 10:50     ` Bob Liu
2015-10-07 11:49       ` Roger Pau Monné
2015-10-07 11:49       ` Roger Pau Monné
2015-10-07 10:50     ` Bob Liu
2015-09-05 12:39 ` [PATCH v3 9/9] xen/blkback: get number of hardware queues/rings from blkfront Bob Liu
2015-09-05 12:39   ` Bob Liu
2015-10-05 15:15   ` Roger Pau Monné
2015-10-05 15:15   ` Roger Pau Monné
2015-10-07 10:54     ` Bob Liu
2015-10-07 10:54     ` Bob Liu
2015-10-02  9:57 ` [PATCH v3 0/9] xen-block: support multi hardware-queues/rings Rafal Mielniczuk
2015-10-02  9:57 ` Rafal Mielniczuk
