* [PATCH v5 00/10] xen-block: multi hardware-queues/rings support
@ 2015-11-14  3:12 Bob Liu
  2015-11-14  3:12 ` [PATCH v5 01/10] xen/blkif: document blkif multi-queue/ring extension Bob Liu
                   ` (22 more replies)
  0 siblings, 23 replies; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, roger.pau, konrad.wilk, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel, Bob Liu

Note: These patches are based on the original work done during Arianna's
internship for GNOME's Outreach Program for Women.

With the blk-mq API, a guest has multiple (nr_vcpus) software request queues
associated with each block frontend. These queues can be mapped over several
rings (hardware queues) to the backend, making it easy to run multiple
backend threads for a single virtual disk.

With several threads issuing requests at the same time, guest performance can
be improved significantly.
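
For context, a minimal sketch of the blk-mq side (using a made-up example_dev
driver, not code from this series): a driver declares how many hardware queues
it has through struct blk_mq_tag_set, and the block layer maps the per-CPU
software queues onto them. In blkfront each such hardware queue is backed by
its own ring, event channel and interrupt.

#include <linux/blk-mq.h>
#include <linux/blkdev.h>
#include <linux/err.h>
#include <linux/errno.h>
#include <linux/numa.h>
#include <linux/string.h>

/* Hypothetical per-device state; stands in for blkfront_info. */
struct example_dev {
	struct blk_mq_tag_set tag_set;
	struct request_queue *rq;
};

/* Trivial submission stub: a real driver would put qd->rq on the ring
 * owned by hctx instead of completing it immediately. */
static int example_queue_rq(struct blk_mq_hw_ctx *hctx,
			    const struct blk_mq_queue_data *qd)
{
	blk_mq_start_request(qd->rq);
	blk_mq_end_request(qd->rq, 0);
	return BLK_MQ_RQ_QUEUE_OK;
}

static struct blk_mq_ops example_mq_ops = {
	.queue_rq  = example_queue_rq,
	.map_queue = blk_mq_map_queue,
};

static int example_init_queues(struct example_dev *dev, unsigned int nr_rings)
{
	struct blk_mq_tag_set *set = &dev->tag_set;
	int err;

	memset(set, 0, sizeof(*set));
	set->ops = &example_mq_ops;
	set->nr_hw_queues = nr_rings;	/* one hardware queue per ring */
	set->queue_depth = 64;
	set->numa_node = NUMA_NO_NODE;
	set->flags = BLK_MQ_F_SHOULD_MERGE;
	set->driver_data = dev;

	err = blk_mq_alloc_tag_set(set);
	if (err)
		return err;

	dev->rq = blk_mq_init_queue(set);
	if (IS_ERR(dev->rq)) {
		blk_mq_free_tag_set(set);
		return PTR_ERR(dev->rq);
	}
	return 0;
}

The number of hardware queues/rings actually used is negotiated with the
backend in patch 05 of this series.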

Tests were done with the null_blk driver:
dom0: v4.3-rc7 16vcpus 10GB "modprobe null_blk"
domU: v4.3-rc7 16vcpus 10GB

[test]
rw=read
direct=1
ioengine=libaio
bs=4k
time_based
runtime=30
filename=/dev/xvdb
numjobs=16
iodepth=64
iodepth_batch=64
iodepth_batch_complete=64
group_reporting

Results:
Iops1: after commit "xen/blkfront: make persistent grants per-queue".
Iops2: after commit "xen/blkback: make persistent grants and free pages pool per-queue".

Queues:           1         4            8            16
Iops orig(k):   810      1064          780           700
Iops1(k):       810      1230(~20%)   1024(~20%)     850(~20%)
Iops2(k):       810      1410(~35%)   1354(~75%)    1440(~100%)

With 4 queues, this series gives a ~75% increase in IOPS, and performance
does not drop as the number of queues increases.

A chart of these results is available here:
https://www.dropbox.com/s/agrcy2pbzbsvmwv/iops.png?dl=0

---
v5:
 * Rebase to xen/tip.git tags/for-linus-4.4-rc0-tag.
 * Comments from Konrad.

v4:
 * Rebase to v4.3-rc7.
 * Comments from Roger.

v3:
 * Rebased to v4.2-rc8.

Bob Liu (10):
  xen/blkif: document blkif multi-queue/ring extension
  xen/blkfront: separate per ring information out of device info
  xen/blkfront: pseudo support for multi hardware queues/rings
  xen/blkfront: split per device io_lock
  xen/blkfront: negotiate number of queues/rings to be used with backend
  xen/blkback: separate ring information out of struct xen_blkif
  xen/blkback: pseudo support for multi hardware queues/rings
  xen/blkback: get the number of hardware queues/rings from blkfront
  xen/blkfront: make persistent grants per-queue
  xen/blkback: make pool of persistent grants and free pages per-queue

 drivers/block/xen-blkback/blkback.c | 386 ++++++++++---------
 drivers/block/xen-blkback/common.h  |  78 ++--
 drivers/block/xen-blkback/xenbus.c  | 359 ++++++++++++------
 drivers/block/xen-blkfront.c        | 718 ++++++++++++++++++++++--------------
 include/xen/interface/io/blkif.h    |  48 +++
 5 files changed, 971 insertions(+), 618 deletions(-)

-- 
1.8.3.1



* [PATCH v5 01/10] xen/blkif: document blkif multi-queue/ring extension
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
  2015-11-14  3:12 ` [PATCH v5 01/10] xen/blkif: document blkif multi-queue/ring extension Bob Liu
@ 2015-11-14  3:12 ` Bob Liu
  2015-11-16 19:27   ` Konrad Rzeszutek Wilk
  2015-11-14  3:12   ` Bob Liu
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, roger.pau, konrad.wilk, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel, Bob Liu

Document the multi-queue/ring feature in terms of XenStore keys to be written by
the backend and by the frontend.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
v2:
Add descriptions together with multi-page ring buffer.
---
 include/xen/interface/io/blkif.h |   48 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
index c33e1c4..8b8cfad 100644
--- a/include/xen/interface/io/blkif.h
+++ b/include/xen/interface/io/blkif.h
@@ -28,6 +28,54 @@ typedef uint16_t blkif_vdev_t;
 typedef uint64_t blkif_sector_t;
 
 /*
+ * Multiple hardware queues/rings:
+ * If supported, the backend will write the key "multi-queue-max-queues" to
+ * the directory for that vbd, and set its value to the maximum supported
+ * number of queues.
+ * Frontends that are aware of this feature and wish to use it can write the
+ * key "multi-queue-num-queues" with the number they wish to use, which must be
+ * greater than zero, and no more than the value reported by the backend in
+ * "multi-queue-max-queues".
+ *
+ * For frontends requesting just one queue, the usual event-channel and
+ * ring-ref keys are written as before, simplifying the backend processing
+ * to avoid distinguishing between a frontend that doesn't understand the
+ * multi-queue feature, and one that does, but requested only one queue.
+ *
+ * Frontends requesting two or more queues must not write the toplevel
+ * event-channel and ring-ref keys, instead writing those keys under sub-keys
+ * having the name "queue-N" where N is the integer ID of the queue/ring for
+ * which those keys belong. Queues are indexed from zero.
+ * For example, a frontend with two queues must write the following set of
+ * queue-related keys:
+ *
+ * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2"
+ * /local/domain/1/device/vbd/0/queue-0 = ""
+ * /local/domain/1/device/vbd/0/queue-0/ring-ref = "<ring-ref#0>"
+ * /local/domain/1/device/vbd/0/queue-0/event-channel = "<evtchn#0>"
+ * /local/domain/1/device/vbd/0/queue-1 = ""
+ * /local/domain/1/device/vbd/0/queue-1/ring-ref = "<ring-ref#1>"
+ * /local/domain/1/device/vbd/0/queue-1/event-channel = "<evtchn#1>"
+ *
+ * It is also possible to use multiple queues/rings together with the
+ * multi-page ring buffer feature.
+ * For example, a frontend requesting two queues/rings, each with a ring
+ * buffer of two pages, must write the following set of related keys:
+ *
+ * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2"
+ * /local/domain/1/device/vbd/0/ring-page-order = "1"
+ * /local/domain/1/device/vbd/0/queue-0 = ""
+ * /local/domain/1/device/vbd/0/queue-0/ring-ref0 = "<ring-ref#0>"
+ * /local/domain/1/device/vbd/0/queue-0/ring-ref1 = "<ring-ref#1>"
+ * /local/domain/1/device/vbd/0/queue-0/event-channel = "<evtchn#0>"
+ * /local/domain/1/device/vbd/0/queue-1 = ""
+ * /local/domain/1/device/vbd/0/queue-1/ring-ref0 = "<ring-ref#2>"
+ * /local/domain/1/device/vbd/0/queue-1/ring-ref1 = "<ring-ref#3>"
+ * /local/domain/1/device/vbd/0/queue-1/event-channel = "<evtchn#1>"
+ *
+ */
+
+/*
  * REQUEST CODES.
  */
 #define BLKIF_OP_READ              0
-- 
1.7.10.4


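As an illustrative aside (not part of this patch), the per-queue layout
documented above could be written by a frontend with the standard xenbus
helpers roughly as in the sketch below. The function name
example_write_queue_keys and the simplified error handling are invented for
this sketch; the real negotiation code is only added later in this series.

#include <linux/kernel.h>
#include <xen/xenbus.h>

static int example_write_queue_keys(struct xenbus_transaction xbt,
				    struct xenbus_device *dev,
				    unsigned int nr_queues,
				    const unsigned int *ring_ref,
				    const unsigned int *evtchn)
{
	char node[32];
	unsigned int i;
	int err;

	err = xenbus_printf(xbt, dev->nodename,
			    "multi-queue-num-queues", "%u", nr_queues);
	if (err)
		return err;

	for (i = 0; i < nr_queues; i++) {
		/* Per-queue keys live under a "queue-N" sub-directory. */
		snprintf(node, sizeof(node), "queue-%u/ring-ref", i);
		err = xenbus_printf(xbt, dev->nodename, node, "%u", ring_ref[i]);
		if (err)
			return err;

		snprintf(node, sizeof(node), "queue-%u/event-channel", i);
		err = xenbus_printf(xbt, dev->nodename, node, "%u", evtchn[i]);
		if (err)
			return err;
	}

	return 0;
}

A frontend asking for a single queue keeps writing the plain "ring-ref" and
"event-channel" keys, exactly as described in the comment above.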


* [PATCH v5 02/10] xen/blkfront: separate per ring information out of device info
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
@ 2015-11-14  3:12   ` Bob Liu
  2015-11-14  3:12 ` Bob Liu
                     ` (21 subsequent siblings)
  22 siblings, 0 replies; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, roger.pau, konrad.wilk, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel, Bob Liu

Split the per-ring information out into a new structure, "blkfront_ring_info".

A ring is the representation of a hardware queue; every vbd device can be
associated with one or more rings, depending on how many hardware queues/rings
are to be used.

This patch is preparation for supporting real multiple hardware queues/rings.

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
v2: Fix build error.
---
 drivers/block/xen-blkfront.c |  359 +++++++++++++++++++++++-------------------
 1 file changed, 197 insertions(+), 162 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 2fee2ee..0c3ad21 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -120,6 +120,23 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
 #define RINGREF_NAME_LEN (20)
 
 /*
+ *  Per-ring info.
+ *  Every blkfront device can associate with one or more blkfront_ring_info,
+ *  depending on how many hardware queues/rings to be used.
+ */
+struct blkfront_ring_info {
+	struct blkif_front_ring ring;
+	unsigned int ring_ref[XENBUS_MAX_RING_GRANTS];
+	unsigned int evtchn, irq;
+	struct work_struct work;
+	struct gnttab_free_callback callback;
+	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
+	struct list_head indirect_pages;
+	unsigned long shadow_free;
+	struct blkfront_info *dev_info;
+};
+
+/*
  * We have one of these per vbd, whether ide, scsi or 'other'.  They
  * hang in private_data off the gendisk structure. We may end up
  * putting all kinds of interesting stuff here :-)
@@ -133,18 +150,10 @@ struct blkfront_info
 	int vdevice;
 	blkif_vdev_t handle;
 	enum blkif_state connected;
-	int ring_ref[XENBUS_MAX_RING_GRANTS];
 	unsigned int nr_ring_pages;
-	struct blkif_front_ring ring;
-	unsigned int evtchn, irq;
 	struct request_queue *rq;
-	struct work_struct work;
-	struct gnttab_free_callback callback;
-	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
 	struct list_head grants;
-	struct list_head indirect_pages;
 	unsigned int persistent_gnts_c;
-	unsigned long shadow_free;
 	unsigned int feature_flush;
 	unsigned int feature_discard:1;
 	unsigned int feature_secdiscard:1;
@@ -155,6 +164,7 @@ struct blkfront_info
 	unsigned int max_indirect_segments;
 	int is_ready;
 	struct blk_mq_tag_set tag_set;
+	struct blkfront_ring_info rinfo;
 };
 
 static unsigned int nr_minors;
@@ -198,33 +208,35 @@ static DEFINE_SPINLOCK(minor_lock);
 
 #define GREFS(_psegs)	((_psegs) * GRANTS_PER_PSEG)
 
-static int blkfront_setup_indirect(struct blkfront_info *info);
+static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
 static int blkfront_gather_backend_features(struct blkfront_info *info);
 
-static int get_id_from_freelist(struct blkfront_info *info)
+static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
-	unsigned long free = info->shadow_free;
-	BUG_ON(free >= BLK_RING_SIZE(info));
-	info->shadow_free = info->shadow[free].req.u.rw.id;
-	info->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
+	unsigned long free = rinfo->shadow_free;
+
+	BUG_ON(free >= BLK_RING_SIZE(rinfo->dev_info));
+	rinfo->shadow_free = rinfo->shadow[free].req.u.rw.id;
+	rinfo->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
 	return free;
 }
 
-static int add_id_to_freelist(struct blkfront_info *info,
+static int add_id_to_freelist(struct blkfront_ring_info *rinfo,
 			       unsigned long id)
 {
-	if (info->shadow[id].req.u.rw.id != id)
+	if (rinfo->shadow[id].req.u.rw.id != id)
 		return -EINVAL;
-	if (info->shadow[id].request == NULL)
+	if (rinfo->shadow[id].request == NULL)
 		return -EINVAL;
-	info->shadow[id].req.u.rw.id  = info->shadow_free;
-	info->shadow[id].request = NULL;
-	info->shadow_free = id;
+	rinfo->shadow[id].req.u.rw.id  = rinfo->shadow_free;
+	rinfo->shadow[id].request = NULL;
+	rinfo->shadow_free = id;
 	return 0;
 }
 
-static int fill_grant_buffer(struct blkfront_info *info, int num)
+static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 {
+	struct blkfront_info *info = rinfo->dev_info;
 	struct page *granted_page;
 	struct grant *gnt_list_entry, *n;
 	int i = 0;
@@ -326,8 +338,8 @@ static struct grant *get_indirect_grant(grant_ref_t *gref_head,
 		struct page *indirect_page;
 
 		/* Fetch a pre-allocated page to use for indirect grefs */
-		BUG_ON(list_empty(&info->indirect_pages));
-		indirect_page = list_first_entry(&info->indirect_pages,
+		BUG_ON(list_empty(&info->rinfo.indirect_pages));
+		indirect_page = list_first_entry(&info->rinfo.indirect_pages,
 						 struct page, lru);
 		list_del(&indirect_page->lru);
 		gnt_list_entry->page = indirect_page;
@@ -403,8 +415,8 @@ static void xlbd_release_minors(unsigned int minor, unsigned int nr)
 
 static void blkif_restart_queue_callback(void *arg)
 {
-	struct blkfront_info *info = (struct blkfront_info *)arg;
-	schedule_work(&info->work);
+	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)arg;
+	schedule_work(&rinfo->work);
 }
 
 static int blkif_getgeo(struct block_device *bd, struct hd_geometry *hg)
@@ -456,16 +468,16 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
 	return 0;
 }
 
-static int blkif_queue_discard_req(struct request *req)
+static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_info *rinfo)
 {
-	struct blkfront_info *info = req->rq_disk->private_data;
+	struct blkfront_info *info = rinfo->dev_info;
 	struct blkif_request *ring_req;
 	unsigned long id;
 
 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
-	id = get_id_from_freelist(info);
-	info->shadow[id].request = req;
+	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
+	id = get_id_from_freelist(rinfo);
+	rinfo->shadow[id].request = req;
 
 	ring_req->operation = BLKIF_OP_DISCARD;
 	ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
@@ -476,10 +488,10 @@ static int blkif_queue_discard_req(struct request *req)
 	else
 		ring_req->u.discard.flag = 0;
 
-	info->ring.req_prod_pvt++;
+	rinfo->ring.req_prod_pvt++;
 
 	/* Keep a private copy so we can reissue requests when recovering. */
-	info->shadow[id].req = *ring_req;
+	rinfo->shadow[id].req = *ring_req;
 
 	return 0;
 }
@@ -487,7 +499,7 @@ static int blkif_queue_discard_req(struct request *req)
 struct setup_rw_req {
 	unsigned int grant_idx;
 	struct blkif_request_segment *segments;
-	struct blkfront_info *info;
+	struct blkfront_ring_info *rinfo;
 	struct blkif_request *ring_req;
 	grant_ref_t gref_head;
 	unsigned int id;
@@ -507,8 +519,9 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 	/* Convenient aliases */
 	unsigned int grant_idx = setup->grant_idx;
 	struct blkif_request *ring_req = setup->ring_req;
-	struct blkfront_info *info = setup->info;
-	struct blk_shadow *shadow = &info->shadow[setup->id];
+	struct blkfront_ring_info *rinfo = setup->rinfo;
+	struct blkfront_info *info = rinfo->dev_info;
+	struct blk_shadow *shadow = &rinfo->shadow[setup->id];
 
 	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
 	    (grant_idx % GRANTS_PER_INDIRECT_FRAME == 0)) {
@@ -566,16 +579,16 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 	(setup->grant_idx)++;
 }
 
-static int blkif_queue_rw_req(struct request *req)
+static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *rinfo)
 {
-	struct blkfront_info *info = req->rq_disk->private_data;
+	struct blkfront_info *info = rinfo->dev_info;
 	struct blkif_request *ring_req;
 	unsigned long id;
 	int i;
 	struct setup_rw_req setup = {
 		.grant_idx = 0,
 		.segments = NULL,
-		.info = info,
+		.rinfo = rinfo,
 		.need_copy = rq_data_dir(req) && info->feature_persistent,
 	};
 
@@ -603,9 +616,9 @@ static int blkif_queue_rw_req(struct request *req)
 		    max_grefs - info->persistent_gnts_c,
 		    &setup.gref_head) < 0) {
 			gnttab_request_free_callback(
-				&info->callback,
+				&rinfo->callback,
 				blkif_restart_queue_callback,
-				info,
+				rinfo,
 				max_grefs);
 			return 1;
 		}
@@ -613,23 +626,23 @@ static int blkif_queue_rw_req(struct request *req)
 		new_persistent_gnts = 0;
 
 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
-	id = get_id_from_freelist(info);
-	info->shadow[id].request = req;
+	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
+	id = get_id_from_freelist(rinfo);
+	rinfo->shadow[id].request = req;
 
 	BUG_ON(info->max_indirect_segments == 0 &&
 	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
 	BUG_ON(info->max_indirect_segments &&
 	       GREFS(req->nr_phys_segments) > info->max_indirect_segments);
 
-	num_sg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+	num_sg = blk_rq_map_sg(req->q, req, rinfo->shadow[id].sg);
 	num_grant = 0;
 	/* Calculate the number of grant used */
-	for_each_sg(info->shadow[id].sg, sg, num_sg, i)
+	for_each_sg(rinfo->shadow[id].sg, sg, num_sg, i)
 	       num_grant += gnttab_count_grant(sg->offset, sg->length);
 
 	ring_req->u.rw.id = id;
-	info->shadow[id].num_sg = num_sg;
+	rinfo->shadow[id].num_sg = num_sg;
 	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
 		/*
 		 * The indirect operation can only be a BLKIF_OP_READ or
@@ -674,7 +687,7 @@ static int blkif_queue_rw_req(struct request *req)
 
 	setup.ring_req = ring_req;
 	setup.id = id;
-	for_each_sg(info->shadow[id].sg, sg, num_sg, i) {
+	for_each_sg(rinfo->shadow[id].sg, sg, num_sg, i) {
 		BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 
 		if (setup.need_copy) {
@@ -694,10 +707,10 @@ static int blkif_queue_rw_req(struct request *req)
 	if (setup.segments)
 		kunmap_atomic(setup.segments);
 
-	info->ring.req_prod_pvt++;
+	rinfo->ring.req_prod_pvt++;
 
 	/* Keep a private copy so we can reissue requests when recovering. */
-	info->shadow[id].req = *ring_req;
+	rinfo->shadow[id].req = *ring_req;
 
 	if (new_persistent_gnts)
 		gnttab_free_grant_references(setup.gref_head);
@@ -711,27 +724,25 @@ static int blkif_queue_rw_req(struct request *req)
  *
  * @req: a request struct
  */
-static int blkif_queue_request(struct request *req)
+static int blkif_queue_request(struct request *req, struct blkfront_ring_info *rinfo)
 {
-	struct blkfront_info *info = req->rq_disk->private_data;
-
-	if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
+	if (unlikely(rinfo->dev_info->connected != BLKIF_STATE_CONNECTED))
 		return 1;
 
 	if (unlikely(req->cmd_flags & (REQ_DISCARD | REQ_SECURE)))
-		return blkif_queue_discard_req(req);
+		return blkif_queue_discard_req(req, rinfo);
 	else
-		return blkif_queue_rw_req(req);
+		return blkif_queue_rw_req(req, rinfo);
 }
 
-static inline void flush_requests(struct blkfront_info *info)
+static inline void flush_requests(struct blkfront_ring_info *rinfo)
 {
 	int notify;
 
-	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&info->ring, notify);
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&rinfo->ring, notify);
 
 	if (notify)
-		notify_remote_via_irq(info->irq);
+		notify_remote_via_irq(rinfo->irq);
 }
 
 static inline bool blkif_request_flush_invalid(struct request *req,
@@ -747,20 +758,21 @@ static inline bool blkif_request_flush_invalid(struct request *req,
 static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
 			   const struct blk_mq_queue_data *qd)
 {
-	struct blkfront_info *info = qd->rq->rq_disk->private_data;
+	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)hctx->driver_data;
+	struct blkfront_info *info = rinfo->dev_info;
 
 	blk_mq_start_request(qd->rq);
 	spin_lock_irq(&info->io_lock);
-	if (RING_FULL(&info->ring))
+	if (RING_FULL(&rinfo->ring))
 		goto out_busy;
 
-	if (blkif_request_flush_invalid(qd->rq, info))
+	if (blkif_request_flush_invalid(qd->rq, rinfo->dev_info))
 		goto out_err;
 
-	if (blkif_queue_request(qd->rq))
+	if (blkif_queue_request(qd->rq, rinfo))
 		goto out_busy;
 
-	flush_requests(info);
+	flush_requests(rinfo);
 	spin_unlock_irq(&info->io_lock);
 	return BLK_MQ_RQ_QUEUE_OK;
 
@@ -774,9 +786,19 @@ out_busy:
 	return BLK_MQ_RQ_QUEUE_BUSY;
 }
 
+static int blk_mq_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
+			    unsigned int index)
+{
+	struct blkfront_info *info = (struct blkfront_info *)data;
+
+	hctx->driver_data = &info->rinfo;
+	return 0;
+}
+
 static struct blk_mq_ops blkfront_mq_ops = {
 	.queue_rq = blkif_queue_rq,
 	.map_queue = blk_mq_map_queue,
+	.init_hctx = blk_mq_init_hctx,
 };
 
 static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
@@ -1029,6 +1051,7 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 static void xlvbd_release_gendisk(struct blkfront_info *info)
 {
 	unsigned int minor, nr_minors;
+	struct blkfront_ring_info *rinfo = &info->rinfo;
 
 	if (info->rq == NULL)
 		return;
@@ -1037,10 +1060,10 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
 	blk_mq_stop_hw_queues(info->rq);
 
 	/* No more gnttab callback work. */
-	gnttab_cancel_free_callback(&info->callback);
+	gnttab_cancel_free_callback(&rinfo->callback);
 
 	/* Flush gnttab callback work. Must be done with no locks held. */
-	flush_work(&info->work);
+	flush_work(&rinfo->work);
 
 	del_gendisk(info->gd);
 
@@ -1057,20 +1080,20 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
 }
 
 /* Must be called with io_lock holded */
-static void kick_pending_request_queues(struct blkfront_info *info)
+static void kick_pending_request_queues(struct blkfront_ring_info *rinfo)
 {
-	if (!RING_FULL(&info->ring))
-		blk_mq_start_stopped_hw_queues(info->rq, true);
+	if (!RING_FULL(&rinfo->ring))
+		blk_mq_start_stopped_hw_queues(rinfo->dev_info->rq, true);
 }
 
 static void blkif_restart_queue(struct work_struct *work)
 {
-	struct blkfront_info *info = container_of(work, struct blkfront_info, work);
+	struct blkfront_ring_info *rinfo = container_of(work, struct blkfront_ring_info, work);
 
-	spin_lock_irq(&info->io_lock);
-	if (info->connected == BLKIF_STATE_CONNECTED)
-		kick_pending_request_queues(info);
-	spin_unlock_irq(&info->io_lock);
+	spin_lock_irq(&rinfo->dev_info->io_lock);
+	if (rinfo->dev_info->connected == BLKIF_STATE_CONNECTED)
+		kick_pending_request_queues(rinfo);
+	spin_unlock_irq(&rinfo->dev_info->io_lock);
 }
 
 static void blkif_free(struct blkfront_info *info, int suspend)
@@ -1078,6 +1101,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 	struct grant *persistent_gnt;
 	struct grant *n;
 	int i, j, segs;
+	struct blkfront_ring_info *rinfo = &info->rinfo;
 
 	/* Prevent new requests being issued until we fix things up. */
 	spin_lock_irq(&info->io_lock);
@@ -1090,7 +1114,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 	/* Remove all persistent grants */
 	if (!list_empty(&info->grants)) {
 		list_for_each_entry_safe(persistent_gnt, n,
-		                         &info->grants, node) {
+					 &info->grants, node) {
 			list_del(&persistent_gnt->node);
 			if (persistent_gnt->gref != GRANT_INVALID_REF) {
 				gnttab_end_foreign_access(persistent_gnt->gref,
@@ -1108,11 +1132,11 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 	 * Remove indirect pages, this only happens when using indirect
 	 * descriptors but not persistent grants
 	 */
-	if (!list_empty(&info->indirect_pages)) {
+	if (!list_empty(&rinfo->indirect_pages)) {
 		struct page *indirect_page, *n;
 
 		BUG_ON(info->feature_persistent);
-		list_for_each_entry_safe(indirect_page, n, &info->indirect_pages, lru) {
+		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
 			list_del(&indirect_page->lru);
 			__free_page(indirect_page);
 		}
@@ -1123,21 +1147,21 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 		 * Clear persistent grants present in requests already
 		 * on the shared ring
 		 */
-		if (!info->shadow[i].request)
+		if (!rinfo->shadow[i].request)
 			goto free_shadow;
 
-		segs = info->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
-		       info->shadow[i].req.u.indirect.nr_segments :
-		       info->shadow[i].req.u.rw.nr_segments;
+		segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
+		       rinfo->shadow[i].req.u.indirect.nr_segments :
+		       rinfo->shadow[i].req.u.rw.nr_segments;
 		for (j = 0; j < segs; j++) {
-			persistent_gnt = info->shadow[i].grants_used[j];
+			persistent_gnt = rinfo->shadow[i].grants_used[j];
 			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
 			if (info->feature_persistent)
 				__free_page(persistent_gnt->page);
 			kfree(persistent_gnt);
 		}
 
-		if (info->shadow[i].req.operation != BLKIF_OP_INDIRECT)
+		if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
 			/*
 			 * If this is not an indirect operation don't try to
 			 * free indirect segments
@@ -1145,41 +1169,41 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 			goto free_shadow;
 
 		for (j = 0; j < INDIRECT_GREFS(segs); j++) {
-			persistent_gnt = info->shadow[i].indirect_grants[j];
+			persistent_gnt = rinfo->shadow[i].indirect_grants[j];
 			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
 			__free_page(persistent_gnt->page);
 			kfree(persistent_gnt);
 		}
 
 free_shadow:
-		kfree(info->shadow[i].grants_used);
-		info->shadow[i].grants_used = NULL;
-		kfree(info->shadow[i].indirect_grants);
-		info->shadow[i].indirect_grants = NULL;
-		kfree(info->shadow[i].sg);
-		info->shadow[i].sg = NULL;
+		kfree(rinfo->shadow[i].grants_used);
+		rinfo->shadow[i].grants_used = NULL;
+		kfree(rinfo->shadow[i].indirect_grants);
+		rinfo->shadow[i].indirect_grants = NULL;
+		kfree(rinfo->shadow[i].sg);
+		rinfo->shadow[i].sg = NULL;
 	}
 
 	/* No more gnttab callback work. */
-	gnttab_cancel_free_callback(&info->callback);
+	gnttab_cancel_free_callback(&rinfo->callback);
 	spin_unlock_irq(&info->io_lock);
 
 	/* Flush gnttab callback work. Must be done with no locks held. */
-	flush_work(&info->work);
+	flush_work(&rinfo->work);
 
 	/* Free resources associated with old device channel. */
 	for (i = 0; i < info->nr_ring_pages; i++) {
-		if (info->ring_ref[i] != GRANT_INVALID_REF) {
-			gnttab_end_foreign_access(info->ring_ref[i], 0, 0);
-			info->ring_ref[i] = GRANT_INVALID_REF;
+		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
+			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
+			rinfo->ring_ref[i] = GRANT_INVALID_REF;
 		}
 	}
-	free_pages((unsigned long)info->ring.sring, get_order(info->nr_ring_pages * PAGE_SIZE));
-	info->ring.sring = NULL;
+	free_pages((unsigned long)rinfo->ring.sring, get_order(info->nr_ring_pages * PAGE_SIZE));
+	rinfo->ring.sring = NULL;
 
-	if (info->irq)
-		unbind_from_irqhandler(info->irq, info);
-	info->evtchn = info->irq = 0;
+	if (rinfo->irq)
+		unbind_from_irqhandler(rinfo->irq, rinfo);
+	rinfo->evtchn = rinfo->irq = 0;
 
 }
 
@@ -1209,12 +1233,13 @@ static void blkif_copy_from_grant(unsigned long gfn, unsigned int offset,
 	kunmap_atomic(shared_data);
 }
 
-static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
+static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *rinfo,
 			     struct blkif_response *bret)
 {
 	int i = 0;
 	struct scatterlist *sg;
 	int num_sg, num_grant;
+	struct blkfront_info *info = rinfo->dev_info;
 	struct copy_from_grant data = {
 		.s = s,
 		.grant_idx = 0,
@@ -1284,7 +1309,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 				 */
 				if (!info->feature_persistent) {
 					indirect_page = s->indirect_grants[i]->page;
-					list_add(&indirect_page->lru, &info->indirect_pages);
+					list_add(&indirect_page->lru, &rinfo->indirect_pages);
 				}
 				s->indirect_grants[i]->gref = GRANT_INVALID_REF;
 				list_add_tail(&s->indirect_grants[i]->node, &info->grants);
@@ -1299,7 +1324,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	struct blkif_response *bret;
 	RING_IDX i, rp;
 	unsigned long flags;
-	struct blkfront_info *info = (struct blkfront_info *)dev_id;
+	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
+	struct blkfront_info *info = rinfo->dev_info;
 	int error;
 
 	spin_lock_irqsave(&info->io_lock, flags);
@@ -1310,13 +1336,13 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	}
 
  again:
-	rp = info->ring.sring->rsp_prod;
+	rp = rinfo->ring.sring->rsp_prod;
 	rmb(); /* Ensure we see queued responses up to 'rp'. */
 
-	for (i = info->ring.rsp_cons; i != rp; i++) {
+	for (i = rinfo->ring.rsp_cons; i != rp; i++) {
 		unsigned long id;
 
-		bret = RING_GET_RESPONSE(&info->ring, i);
+		bret = RING_GET_RESPONSE(&rinfo->ring, i);
 		id   = bret->id;
 		/*
 		 * The backend has messed up and given us an id that we would
@@ -1330,12 +1356,12 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 			 * the id is busted. */
 			continue;
 		}
-		req  = info->shadow[id].request;
+		req  = rinfo->shadow[id].request;
 
 		if (bret->operation != BLKIF_OP_DISCARD)
-			blkif_completion(&info->shadow[id], info, bret);
+			blkif_completion(&rinfo->shadow[id], rinfo, bret);
 
-		if (add_id_to_freelist(info, id)) {
+		if (add_id_to_freelist(rinfo, id)) {
 			WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
 			     info->gd->disk_name, op_name(bret->operation), id);
 			continue;
@@ -1364,7 +1390,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 				error = -EOPNOTSUPP;
 			}
 			if (unlikely(bret->status == BLKIF_RSP_ERROR &&
-				     info->shadow[id].req.u.rw.nr_segments == 0)) {
+				     rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
 				printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
 				       info->gd->disk_name, op_name(bret->operation));
 				error = -EOPNOTSUPP;
@@ -1389,17 +1415,17 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		}
 	}
 
-	info->ring.rsp_cons = i;
+	rinfo->ring.rsp_cons = i;
 
-	if (i != info->ring.req_prod_pvt) {
+	if (i != rinfo->ring.req_prod_pvt) {
 		int more_to_do;
-		RING_FINAL_CHECK_FOR_RESPONSES(&info->ring, more_to_do);
+		RING_FINAL_CHECK_FOR_RESPONSES(&rinfo->ring, more_to_do);
 		if (more_to_do)
 			goto again;
 	} else
-		info->ring.sring->rsp_event = i + 1;
+		rinfo->ring.sring->rsp_event = i + 1;
 
-	kick_pending_request_queues(info);
+	kick_pending_request_queues(rinfo);
 
 	spin_unlock_irqrestore(&info->io_lock, flags);
 
@@ -1408,15 +1434,16 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 
 
 static int setup_blkring(struct xenbus_device *dev,
-			 struct blkfront_info *info)
+			 struct blkfront_ring_info *rinfo)
 {
 	struct blkif_sring *sring;
 	int err, i;
+	struct blkfront_info *info = rinfo->dev_info;
 	unsigned long ring_size = info->nr_ring_pages * XEN_PAGE_SIZE;
 	grant_ref_t gref[XENBUS_MAX_RING_GRANTS];
 
 	for (i = 0; i < info->nr_ring_pages; i++)
-		info->ring_ref[i] = GRANT_INVALID_REF;
+		rinfo->ring_ref[i] = GRANT_INVALID_REF;
 
 	sring = (struct blkif_sring *)__get_free_pages(GFP_NOIO | __GFP_HIGH,
 						       get_order(ring_size));
@@ -1425,29 +1452,29 @@ static int setup_blkring(struct xenbus_device *dev,
 		return -ENOMEM;
 	}
 	SHARED_RING_INIT(sring);
-	FRONT_RING_INIT(&info->ring, sring, ring_size);
+	FRONT_RING_INIT(&rinfo->ring, sring, ring_size);
 
-	err = xenbus_grant_ring(dev, info->ring.sring, info->nr_ring_pages, gref);
+	err = xenbus_grant_ring(dev, rinfo->ring.sring, info->nr_ring_pages, gref);
 	if (err < 0) {
 		free_pages((unsigned long)sring, get_order(ring_size));
-		info->ring.sring = NULL;
+		rinfo->ring.sring = NULL;
 		goto fail;
 	}
 	for (i = 0; i < info->nr_ring_pages; i++)
-		info->ring_ref[i] = gref[i];
+		rinfo->ring_ref[i] = gref[i];
 
-	err = xenbus_alloc_evtchn(dev, &info->evtchn);
+	err = xenbus_alloc_evtchn(dev, &rinfo->evtchn);
 	if (err)
 		goto fail;
 
-	err = bind_evtchn_to_irqhandler(info->evtchn, blkif_interrupt, 0,
-					"blkif", info);
+	err = bind_evtchn_to_irqhandler(rinfo->evtchn, blkif_interrupt, 0,
+					"blkif", rinfo);
 	if (err <= 0) {
 		xenbus_dev_fatal(dev, err,
 				 "bind_evtchn_to_irqhandler failed");
 		goto fail;
 	}
-	info->irq = err;
+	rinfo->irq = err;
 
 	return 0;
 fail:
@@ -1465,6 +1492,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
 	int err, i;
 	unsigned int max_page_order = 0;
 	unsigned int ring_page_order = 0;
+	struct blkfront_ring_info *rinfo = &info->rinfo;
 
 	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
 			   "max-ring-page-order", "%u", &max_page_order);
@@ -1476,7 +1504,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
 	}
 
 	/* Create shared ring, alloc event channel. */
-	err = setup_blkring(dev, info);
+	err = setup_blkring(dev, rinfo);
 	if (err)
 		goto out;
 
@@ -1489,7 +1517,7 @@ again:
 
 	if (info->nr_ring_pages == 1) {
 		err = xenbus_printf(xbt, dev->nodename,
-				    "ring-ref", "%u", info->ring_ref[0]);
+				    "ring-ref", "%u", rinfo->ring_ref[0]);
 		if (err) {
 			message = "writing ring-ref";
 			goto abort_transaction;
@@ -1507,7 +1535,7 @@ again:
 
 			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
 			err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
-					    "%u", info->ring_ref[i]);
+					    "%u", rinfo->ring_ref[i]);
 			if (err) {
 				message = "writing ring-ref";
 				goto abort_transaction;
@@ -1515,7 +1543,7 @@ again:
 		}
 	}
 	err = xenbus_printf(xbt, dev->nodename,
-			    "event-channel", "%u", info->evtchn);
+			    "event-channel", "%u", rinfo->evtchn);
 	if (err) {
 		message = "writing event-channel";
 		goto abort_transaction;
@@ -1541,8 +1569,8 @@ again:
 	}
 
 	for (i = 0; i < BLK_RING_SIZE(info); i++)
-		info->shadow[i].req.u.rw.id = i+1;
-	info->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
+		rinfo->shadow[i].req.u.rw.id = i+1;
+	rinfo->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
 	xenbus_switch_state(dev, XenbusStateInitialised);
 
 	return 0;
@@ -1568,6 +1596,7 @@ static int blkfront_probe(struct xenbus_device *dev,
 {
 	int err, vdevice;
 	struct blkfront_info *info;
+	struct blkfront_ring_info *rinfo;
 
 	/* FIXME: Use dynamic device id if this is not set. */
 	err = xenbus_scanf(XBT_NIL, dev->nodename,
@@ -1617,15 +1646,18 @@ static int blkfront_probe(struct xenbus_device *dev,
 		return -ENOMEM;
 	}
 
+	rinfo = &info->rinfo;
+	INIT_LIST_HEAD(&rinfo->indirect_pages);
+	rinfo->dev_info = info;
+	INIT_WORK(&rinfo->work, blkif_restart_queue);
+
 	mutex_init(&info->mutex);
 	spin_lock_init(&info->io_lock);
 	info->xbdev = dev;
 	info->vdevice = vdevice;
 	INIT_LIST_HEAD(&info->grants);
-	INIT_LIST_HEAD(&info->indirect_pages);
 	info->persistent_gnts_c = 0;
 	info->connected = BLKIF_STATE_DISCONNECTED;
-	INIT_WORK(&info->work, blkif_restart_queue);
 
 	/* Front end dir is a number, which is used as the id. */
 	info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
@@ -1659,19 +1691,20 @@ static int blkif_recover(struct blkfront_info *info)
 	int pending, size;
 	struct split_bio *split_bio;
 	struct list_head requests;
+	struct blkfront_ring_info *rinfo = &info->rinfo;
 
 	/* Stage 1: Make a safe copy of the shadow state. */
-	copy = kmemdup(info->shadow, sizeof(info->shadow),
+	copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
 		       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
 	if (!copy)
 		return -ENOMEM;
 
 	/* Stage 2: Set up free list. */
-	memset(&info->shadow, 0, sizeof(info->shadow));
+	memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
 	for (i = 0; i < BLK_RING_SIZE(info); i++)
-		info->shadow[i].req.u.rw.id = i+1;
-	info->shadow_free = info->ring.req_prod_pvt;
-	info->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
+		rinfo->shadow[i].req.u.rw.id = i+1;
+	rinfo->shadow_free = rinfo->ring.req_prod_pvt;
+	rinfo->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
 
 	rc = blkfront_gather_backend_features(info);
 	if (rc) {
@@ -1717,7 +1750,7 @@ static int blkif_recover(struct blkfront_info *info)
 	info->connected = BLKIF_STATE_CONNECTED;
 
 	/* Kick any other new requests queued since we resumed */
-	kick_pending_request_queues(info);
+	kick_pending_request_queues(rinfo);
 
 	list_for_each_entry_safe(req, n, &requests, queuelist) {
 		/* Requeue pending requests (flush or discard) */
@@ -1851,10 +1884,11 @@ static void blkfront_setup_discard(struct blkfront_info *info)
 		info->feature_secdiscard = !!discard_secure;
 }
 
-static int blkfront_setup_indirect(struct blkfront_info *info)
+static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
 {
 	unsigned int psegs, grants;
 	int err, i;
+	struct blkfront_info *info = rinfo->dev_info;
 
 	if (info->max_indirect_segments == 0)
 		grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
@@ -1862,7 +1896,7 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
 		grants = info->max_indirect_segments;
 	psegs = grants / GRANTS_PER_PSEG;
 
-	err = fill_grant_buffer(info,
+	err = fill_grant_buffer(rinfo,
 				(grants + INDIRECT_GREFS(grants)) * BLK_RING_SIZE(info));
 	if (err)
 		goto out_of_memory;
@@ -1875,31 +1909,31 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
 		 */
 		int num = INDIRECT_GREFS(grants) * BLK_RING_SIZE(info);
 
-		BUG_ON(!list_empty(&info->indirect_pages));
+		BUG_ON(!list_empty(&rinfo->indirect_pages));
 		for (i = 0; i < num; i++) {
 			struct page *indirect_page = alloc_page(GFP_NOIO);
 			if (!indirect_page)
 				goto out_of_memory;
-			list_add(&indirect_page->lru, &info->indirect_pages);
+			list_add(&indirect_page->lru, &rinfo->indirect_pages);
 		}
 	}
 
 	for (i = 0; i < BLK_RING_SIZE(info); i++) {
-		info->shadow[i].grants_used = kzalloc(
-			sizeof(info->shadow[i].grants_used[0]) * grants,
+		rinfo->shadow[i].grants_used = kzalloc(
+			sizeof(rinfo->shadow[i].grants_used[0]) * grants,
 			GFP_NOIO);
-		info->shadow[i].sg = kzalloc(sizeof(info->shadow[i].sg[0]) * psegs, GFP_NOIO);
+		rinfo->shadow[i].sg = kzalloc(sizeof(rinfo->shadow[i].sg[0]) * psegs, GFP_NOIO);
 		if (info->max_indirect_segments)
-			info->shadow[i].indirect_grants = kzalloc(
-				sizeof(info->shadow[i].indirect_grants[0]) *
+			rinfo->shadow[i].indirect_grants = kzalloc(
+				sizeof(rinfo->shadow[i].indirect_grants[0]) *
 				INDIRECT_GREFS(grants),
 				GFP_NOIO);
-		if ((info->shadow[i].grants_used == NULL) ||
-			(info->shadow[i].sg == NULL) ||
+		if ((rinfo->shadow[i].grants_used == NULL) ||
+			(rinfo->shadow[i].sg == NULL) ||
 		     (info->max_indirect_segments &&
-		     (info->shadow[i].indirect_grants == NULL)))
+		     (rinfo->shadow[i].indirect_grants == NULL)))
 			goto out_of_memory;
-		sg_init_table(info->shadow[i].sg, psegs);
+		sg_init_table(rinfo->shadow[i].sg, psegs);
 	}
 
 
@@ -1907,16 +1941,16 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
 
 out_of_memory:
 	for (i = 0; i < BLK_RING_SIZE(info); i++) {
-		kfree(info->shadow[i].grants_used);
-		info->shadow[i].grants_used = NULL;
-		kfree(info->shadow[i].sg);
-		info->shadow[i].sg = NULL;
-		kfree(info->shadow[i].indirect_grants);
-		info->shadow[i].indirect_grants = NULL;
-	}
-	if (!list_empty(&info->indirect_pages)) {
+		kfree(rinfo->shadow[i].grants_used);
+		rinfo->shadow[i].grants_used = NULL;
+		kfree(rinfo->shadow[i].sg);
+		rinfo->shadow[i].sg = NULL;
+		kfree(rinfo->shadow[i].indirect_grants);
+		rinfo->shadow[i].indirect_grants = NULL;
+	}
+	if (!list_empty(&rinfo->indirect_pages)) {
 		struct page *indirect_page, *n;
-		list_for_each_entry_safe(indirect_page, n, &info->indirect_pages, lru) {
+		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
 			list_del(&indirect_page->lru);
 			__free_page(indirect_page);
 		}
@@ -1983,7 +2017,7 @@ static int blkfront_gather_backend_features(struct blkfront_info *info)
 		info->max_indirect_segments = min(indirect_segments,
 						  xen_blkif_max_segments);
 
-	return blkfront_setup_indirect(info);
+	return blkfront_setup_indirect(&info->rinfo);
 }
 
 /*
@@ -1997,6 +2031,7 @@ static void blkfront_connect(struct blkfront_info *info)
 	unsigned int physical_sector_size;
 	unsigned int binfo;
 	int err;
+	struct blkfront_ring_info *rinfo = &info->rinfo;
 
 	switch (info->connected) {
 	case BLKIF_STATE_CONNECTED:
@@ -2073,7 +2108,7 @@ static void blkfront_connect(struct blkfront_info *info)
 	/* Kick pending requests. */
 	spin_lock_irq(&info->io_lock);
 	info->connected = BLKIF_STATE_CONNECTED;
-	kick_pending_request_queues(info);
+	kick_pending_request_queues(rinfo);
 	spin_unlock_irq(&info->io_lock);
 
 	add_disk(info->gd);
-- 
1.7.10.4



* [PATCH v5 02/10] xen/blkfront: separate per ring information out of device info
@ 2015-11-14  3:12   ` Bob Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: jonathan.davies, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, axboe, Bob Liu, david.vrabel, avanzini.arianna,
	roger.pau

Split the per-ring information out into a new structure, "blkfront_ring_info".

A ring is the representation of a hardware queue; every vbd device can be
associated with one or more rings, depending on how many hardware queues/rings
are to be used.

This patch is preparation for supporting real multiple hardware queues/rings.

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
v2: Fix build error.
---
 drivers/block/xen-blkfront.c |  359 +++++++++++++++++++++++-------------------
 1 file changed, 197 insertions(+), 162 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 2fee2ee..0c3ad21 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -120,6 +120,23 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
 #define RINGREF_NAME_LEN (20)
 
 /*
+ *  Per-ring info.
+ *  Every blkfront device can associate with one or more blkfront_ring_info,
+ *  depending on how many hardware queues/rings to be used.
+ */
+struct blkfront_ring_info {
+	struct blkif_front_ring ring;
+	unsigned int ring_ref[XENBUS_MAX_RING_GRANTS];
+	unsigned int evtchn, irq;
+	struct work_struct work;
+	struct gnttab_free_callback callback;
+	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
+	struct list_head indirect_pages;
+	unsigned long shadow_free;
+	struct blkfront_info *dev_info;
+};
+
+/*
  * We have one of these per vbd, whether ide, scsi or 'other'.  They
  * hang in private_data off the gendisk structure. We may end up
  * putting all kinds of interesting stuff here :-)
@@ -133,18 +150,10 @@ struct blkfront_info
 	int vdevice;
 	blkif_vdev_t handle;
 	enum blkif_state connected;
-	int ring_ref[XENBUS_MAX_RING_GRANTS];
 	unsigned int nr_ring_pages;
-	struct blkif_front_ring ring;
-	unsigned int evtchn, irq;
 	struct request_queue *rq;
-	struct work_struct work;
-	struct gnttab_free_callback callback;
-	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
 	struct list_head grants;
-	struct list_head indirect_pages;
 	unsigned int persistent_gnts_c;
-	unsigned long shadow_free;
 	unsigned int feature_flush;
 	unsigned int feature_discard:1;
 	unsigned int feature_secdiscard:1;
@@ -155,6 +164,7 @@ struct blkfront_info
 	unsigned int max_indirect_segments;
 	int is_ready;
 	struct blk_mq_tag_set tag_set;
+	struct blkfront_ring_info rinfo;
 };
 
 static unsigned int nr_minors;
@@ -198,33 +208,35 @@ static DEFINE_SPINLOCK(minor_lock);
 
 #define GREFS(_psegs)	((_psegs) * GRANTS_PER_PSEG)
 
-static int blkfront_setup_indirect(struct blkfront_info *info);
+static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
 static int blkfront_gather_backend_features(struct blkfront_info *info);
 
-static int get_id_from_freelist(struct blkfront_info *info)
+static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
-	unsigned long free = info->shadow_free;
-	BUG_ON(free >= BLK_RING_SIZE(info));
-	info->shadow_free = info->shadow[free].req.u.rw.id;
-	info->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
+	unsigned long free = rinfo->shadow_free;
+
+	BUG_ON(free >= BLK_RING_SIZE(rinfo->dev_info));
+	rinfo->shadow_free = rinfo->shadow[free].req.u.rw.id;
+	rinfo->shadow[free].req.u.rw.id = 0x0fffffee; /* debug */
 	return free;
 }
 
-static int add_id_to_freelist(struct blkfront_info *info,
+static int add_id_to_freelist(struct blkfront_ring_info *rinfo,
 			       unsigned long id)
 {
-	if (info->shadow[id].req.u.rw.id != id)
+	if (rinfo->shadow[id].req.u.rw.id != id)
 		return -EINVAL;
-	if (info->shadow[id].request == NULL)
+	if (rinfo->shadow[id].request == NULL)
 		return -EINVAL;
-	info->shadow[id].req.u.rw.id  = info->shadow_free;
-	info->shadow[id].request = NULL;
-	info->shadow_free = id;
+	rinfo->shadow[id].req.u.rw.id  = rinfo->shadow_free;
+	rinfo->shadow[id].request = NULL;
+	rinfo->shadow_free = id;
 	return 0;
 }
 
-static int fill_grant_buffer(struct blkfront_info *info, int num)
+static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 {
+	struct blkfront_info *info = rinfo->dev_info;
 	struct page *granted_page;
 	struct grant *gnt_list_entry, *n;
 	int i = 0;
@@ -326,8 +338,8 @@ static struct grant *get_indirect_grant(grant_ref_t *gref_head,
 		struct page *indirect_page;
 
 		/* Fetch a pre-allocated page to use for indirect grefs */
-		BUG_ON(list_empty(&info->indirect_pages));
-		indirect_page = list_first_entry(&info->indirect_pages,
+		BUG_ON(list_empty(&info->rinfo.indirect_pages));
+		indirect_page = list_first_entry(&info->rinfo.indirect_pages,
 						 struct page, lru);
 		list_del(&indirect_page->lru);
 		gnt_list_entry->page = indirect_page;
@@ -403,8 +415,8 @@ static void xlbd_release_minors(unsigned int minor, unsigned int nr)
 
 static void blkif_restart_queue_callback(void *arg)
 {
-	struct blkfront_info *info = (struct blkfront_info *)arg;
-	schedule_work(&info->work);
+	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)arg;
+	schedule_work(&rinfo->work);
 }
 
 static int blkif_getgeo(struct block_device *bd, struct hd_geometry *hg)
@@ -456,16 +468,16 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
 	return 0;
 }
 
-static int blkif_queue_discard_req(struct request *req)
+static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_info *rinfo)
 {
-	struct blkfront_info *info = req->rq_disk->private_data;
+	struct blkfront_info *info = rinfo->dev_info;
 	struct blkif_request *ring_req;
 	unsigned long id;
 
 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
-	id = get_id_from_freelist(info);
-	info->shadow[id].request = req;
+	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
+	id = get_id_from_freelist(rinfo);
+	rinfo->shadow[id].request = req;
 
 	ring_req->operation = BLKIF_OP_DISCARD;
 	ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
@@ -476,10 +488,10 @@ static int blkif_queue_discard_req(struct request *req)
 	else
 		ring_req->u.discard.flag = 0;
 
-	info->ring.req_prod_pvt++;
+	rinfo->ring.req_prod_pvt++;
 
 	/* Keep a private copy so we can reissue requests when recovering. */
-	info->shadow[id].req = *ring_req;
+	rinfo->shadow[id].req = *ring_req;
 
 	return 0;
 }
@@ -487,7 +499,7 @@ static int blkif_queue_discard_req(struct request *req)
 struct setup_rw_req {
 	unsigned int grant_idx;
 	struct blkif_request_segment *segments;
-	struct blkfront_info *info;
+	struct blkfront_ring_info *rinfo;
 	struct blkif_request *ring_req;
 	grant_ref_t gref_head;
 	unsigned int id;
@@ -507,8 +519,9 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 	/* Convenient aliases */
 	unsigned int grant_idx = setup->grant_idx;
 	struct blkif_request *ring_req = setup->ring_req;
-	struct blkfront_info *info = setup->info;
-	struct blk_shadow *shadow = &info->shadow[setup->id];
+	struct blkfront_ring_info *rinfo = setup->rinfo;
+	struct blkfront_info *info = rinfo->dev_info;
+	struct blk_shadow *shadow = &rinfo->shadow[setup->id];
 
 	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
 	    (grant_idx % GRANTS_PER_INDIRECT_FRAME == 0)) {
@@ -566,16 +579,16 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 	(setup->grant_idx)++;
 }
 
-static int blkif_queue_rw_req(struct request *req)
+static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *rinfo)
 {
-	struct blkfront_info *info = req->rq_disk->private_data;
+	struct blkfront_info *info = rinfo->dev_info;
 	struct blkif_request *ring_req;
 	unsigned long id;
 	int i;
 	struct setup_rw_req setup = {
 		.grant_idx = 0,
 		.segments = NULL,
-		.info = info,
+		.rinfo = rinfo,
 		.need_copy = rq_data_dir(req) && info->feature_persistent,
 	};
 
@@ -603,9 +616,9 @@ static int blkif_queue_rw_req(struct request *req)
 		    max_grefs - info->persistent_gnts_c,
 		    &setup.gref_head) < 0) {
 			gnttab_request_free_callback(
-				&info->callback,
+				&rinfo->callback,
 				blkif_restart_queue_callback,
-				info,
+				rinfo,
 				max_grefs);
 			return 1;
 		}
@@ -613,23 +626,23 @@ static int blkif_queue_rw_req(struct request *req)
 		new_persistent_gnts = 0;
 
 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
-	id = get_id_from_freelist(info);
-	info->shadow[id].request = req;
+	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
+	id = get_id_from_freelist(rinfo);
+	rinfo->shadow[id].request = req;
 
 	BUG_ON(info->max_indirect_segments == 0 &&
 	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
 	BUG_ON(info->max_indirect_segments &&
 	       GREFS(req->nr_phys_segments) > info->max_indirect_segments);
 
-	num_sg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+	num_sg = blk_rq_map_sg(req->q, req, rinfo->shadow[id].sg);
 	num_grant = 0;
 	/* Calculate the number of grant used */
-	for_each_sg(info->shadow[id].sg, sg, num_sg, i)
+	for_each_sg(rinfo->shadow[id].sg, sg, num_sg, i)
 	       num_grant += gnttab_count_grant(sg->offset, sg->length);
 
 	ring_req->u.rw.id = id;
-	info->shadow[id].num_sg = num_sg;
+	rinfo->shadow[id].num_sg = num_sg;
 	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
 		/*
 		 * The indirect operation can only be a BLKIF_OP_READ or
@@ -674,7 +687,7 @@ static int blkif_queue_rw_req(struct request *req)
 
 	setup.ring_req = ring_req;
 	setup.id = id;
-	for_each_sg(info->shadow[id].sg, sg, num_sg, i) {
+	for_each_sg(rinfo->shadow[id].sg, sg, num_sg, i) {
 		BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 
 		if (setup.need_copy) {
@@ -694,10 +707,10 @@ static int blkif_queue_rw_req(struct request *req)
 	if (setup.segments)
 		kunmap_atomic(setup.segments);
 
-	info->ring.req_prod_pvt++;
+	rinfo->ring.req_prod_pvt++;
 
 	/* Keep a private copy so we can reissue requests when recovering. */
-	info->shadow[id].req = *ring_req;
+	rinfo->shadow[id].req = *ring_req;
 
 	if (new_persistent_gnts)
 		gnttab_free_grant_references(setup.gref_head);
@@ -711,27 +724,25 @@ static int blkif_queue_rw_req(struct request *req)
  *
  * @req: a request struct
  */
-static int blkif_queue_request(struct request *req)
+static int blkif_queue_request(struct request *req, struct blkfront_ring_info *rinfo)
 {
-	struct blkfront_info *info = req->rq_disk->private_data;
-
-	if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
+	if (unlikely(rinfo->dev_info->connected != BLKIF_STATE_CONNECTED))
 		return 1;
 
 	if (unlikely(req->cmd_flags & (REQ_DISCARD | REQ_SECURE)))
-		return blkif_queue_discard_req(req);
+		return blkif_queue_discard_req(req, rinfo);
 	else
-		return blkif_queue_rw_req(req);
+		return blkif_queue_rw_req(req, rinfo);
 }
 
-static inline void flush_requests(struct blkfront_info *info)
+static inline void flush_requests(struct blkfront_ring_info *rinfo)
 {
 	int notify;
 
-	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&info->ring, notify);
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&rinfo->ring, notify);
 
 	if (notify)
-		notify_remote_via_irq(info->irq);
+		notify_remote_via_irq(rinfo->irq);
 }
 
 static inline bool blkif_request_flush_invalid(struct request *req,
@@ -747,20 +758,21 @@ static inline bool blkif_request_flush_invalid(struct request *req,
 static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
 			   const struct blk_mq_queue_data *qd)
 {
-	struct blkfront_info *info = qd->rq->rq_disk->private_data;
+	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)hctx->driver_data;
+	struct blkfront_info *info = rinfo->dev_info;
 
 	blk_mq_start_request(qd->rq);
 	spin_lock_irq(&info->io_lock);
-	if (RING_FULL(&info->ring))
+	if (RING_FULL(&rinfo->ring))
 		goto out_busy;
 
-	if (blkif_request_flush_invalid(qd->rq, info))
+	if (blkif_request_flush_invalid(qd->rq, rinfo->dev_info))
 		goto out_err;
 
-	if (blkif_queue_request(qd->rq))
+	if (blkif_queue_request(qd->rq, rinfo))
 		goto out_busy;
 
-	flush_requests(info);
+	flush_requests(rinfo);
 	spin_unlock_irq(&info->io_lock);
 	return BLK_MQ_RQ_QUEUE_OK;
 
@@ -774,9 +786,19 @@ out_busy:
 	return BLK_MQ_RQ_QUEUE_BUSY;
 }
 
+static int blk_mq_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
+			    unsigned int index)
+{
+	struct blkfront_info *info = (struct blkfront_info *)data;
+
+	hctx->driver_data = &info->rinfo;
+	return 0;
+}
+
 static struct blk_mq_ops blkfront_mq_ops = {
 	.queue_rq = blkif_queue_rq,
 	.map_queue = blk_mq_map_queue,
+	.init_hctx = blk_mq_init_hctx,
 };
 
 static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
@@ -1029,6 +1051,7 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 static void xlvbd_release_gendisk(struct blkfront_info *info)
 {
 	unsigned int minor, nr_minors;
+	struct blkfront_ring_info *rinfo = &info->rinfo;
 
 	if (info->rq == NULL)
 		return;
@@ -1037,10 +1060,10 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
 	blk_mq_stop_hw_queues(info->rq);
 
 	/* No more gnttab callback work. */
-	gnttab_cancel_free_callback(&info->callback);
+	gnttab_cancel_free_callback(&rinfo->callback);
 
 	/* Flush gnttab callback work. Must be done with no locks held. */
-	flush_work(&info->work);
+	flush_work(&rinfo->work);
 
 	del_gendisk(info->gd);
 
@@ -1057,20 +1080,20 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
 }
 
 /* Must be called with io_lock held */
-static void kick_pending_request_queues(struct blkfront_info *info)
+static void kick_pending_request_queues(struct blkfront_ring_info *rinfo)
 {
-	if (!RING_FULL(&info->ring))
-		blk_mq_start_stopped_hw_queues(info->rq, true);
+	if (!RING_FULL(&rinfo->ring))
+		blk_mq_start_stopped_hw_queues(rinfo->dev_info->rq, true);
 }
 
 static void blkif_restart_queue(struct work_struct *work)
 {
-	struct blkfront_info *info = container_of(work, struct blkfront_info, work);
+	struct blkfront_ring_info *rinfo = container_of(work, struct blkfront_ring_info, work);
 
-	spin_lock_irq(&info->io_lock);
-	if (info->connected == BLKIF_STATE_CONNECTED)
-		kick_pending_request_queues(info);
-	spin_unlock_irq(&info->io_lock);
+	spin_lock_irq(&rinfo->dev_info->io_lock);
+	if (rinfo->dev_info->connected == BLKIF_STATE_CONNECTED)
+		kick_pending_request_queues(rinfo);
+	spin_unlock_irq(&rinfo->dev_info->io_lock);
 }
 
 static void blkif_free(struct blkfront_info *info, int suspend)
@@ -1078,6 +1101,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 	struct grant *persistent_gnt;
 	struct grant *n;
 	int i, j, segs;
+	struct blkfront_ring_info *rinfo = &info->rinfo;
 
 	/* Prevent new requests being issued until we fix things up. */
 	spin_lock_irq(&info->io_lock);
@@ -1090,7 +1114,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 	/* Remove all persistent grants */
 	if (!list_empty(&info->grants)) {
 		list_for_each_entry_safe(persistent_gnt, n,
-		                         &info->grants, node) {
+					 &info->grants, node) {
 			list_del(&persistent_gnt->node);
 			if (persistent_gnt->gref != GRANT_INVALID_REF) {
 				gnttab_end_foreign_access(persistent_gnt->gref,
@@ -1108,11 +1132,11 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 	 * Remove indirect pages, this only happens when using indirect
 	 * descriptors but not persistent grants
 	 */
-	if (!list_empty(&info->indirect_pages)) {
+	if (!list_empty(&rinfo->indirect_pages)) {
 		struct page *indirect_page, *n;
 
 		BUG_ON(info->feature_persistent);
-		list_for_each_entry_safe(indirect_page, n, &info->indirect_pages, lru) {
+		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
 			list_del(&indirect_page->lru);
 			__free_page(indirect_page);
 		}
@@ -1123,21 +1147,21 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 		 * Clear persistent grants present in requests already
 		 * on the shared ring
 		 */
-		if (!info->shadow[i].request)
+		if (!rinfo->shadow[i].request)
 			goto free_shadow;
 
-		segs = info->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
-		       info->shadow[i].req.u.indirect.nr_segments :
-		       info->shadow[i].req.u.rw.nr_segments;
+		segs = rinfo->shadow[i].req.operation == BLKIF_OP_INDIRECT ?
+		       rinfo->shadow[i].req.u.indirect.nr_segments :
+		       rinfo->shadow[i].req.u.rw.nr_segments;
 		for (j = 0; j < segs; j++) {
-			persistent_gnt = info->shadow[i].grants_used[j];
+			persistent_gnt = rinfo->shadow[i].grants_used[j];
 			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
 			if (info->feature_persistent)
 				__free_page(persistent_gnt->page);
 			kfree(persistent_gnt);
 		}
 
-		if (info->shadow[i].req.operation != BLKIF_OP_INDIRECT)
+		if (rinfo->shadow[i].req.operation != BLKIF_OP_INDIRECT)
 			/*
 			 * If this is not an indirect operation don't try to
 			 * free indirect segments
@@ -1145,41 +1169,41 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 			goto free_shadow;
 
 		for (j = 0; j < INDIRECT_GREFS(segs); j++) {
-			persistent_gnt = info->shadow[i].indirect_grants[j];
+			persistent_gnt = rinfo->shadow[i].indirect_grants[j];
 			gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
 			__free_page(persistent_gnt->page);
 			kfree(persistent_gnt);
 		}
 
 free_shadow:
-		kfree(info->shadow[i].grants_used);
-		info->shadow[i].grants_used = NULL;
-		kfree(info->shadow[i].indirect_grants);
-		info->shadow[i].indirect_grants = NULL;
-		kfree(info->shadow[i].sg);
-		info->shadow[i].sg = NULL;
+		kfree(rinfo->shadow[i].grants_used);
+		rinfo->shadow[i].grants_used = NULL;
+		kfree(rinfo->shadow[i].indirect_grants);
+		rinfo->shadow[i].indirect_grants = NULL;
+		kfree(rinfo->shadow[i].sg);
+		rinfo->shadow[i].sg = NULL;
 	}
 
 	/* No more gnttab callback work. */
-	gnttab_cancel_free_callback(&info->callback);
+	gnttab_cancel_free_callback(&rinfo->callback);
 	spin_unlock_irq(&info->io_lock);
 
 	/* Flush gnttab callback work. Must be done with no locks held. */
-	flush_work(&info->work);
+	flush_work(&rinfo->work);
 
 	/* Free resources associated with old device channel. */
 	for (i = 0; i < info->nr_ring_pages; i++) {
-		if (info->ring_ref[i] != GRANT_INVALID_REF) {
-			gnttab_end_foreign_access(info->ring_ref[i], 0, 0);
-			info->ring_ref[i] = GRANT_INVALID_REF;
+		if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
+			gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
+			rinfo->ring_ref[i] = GRANT_INVALID_REF;
 		}
 	}
-	free_pages((unsigned long)info->ring.sring, get_order(info->nr_ring_pages * PAGE_SIZE));
-	info->ring.sring = NULL;
+	free_pages((unsigned long)rinfo->ring.sring, get_order(info->nr_ring_pages * PAGE_SIZE));
+	rinfo->ring.sring = NULL;
 
-	if (info->irq)
-		unbind_from_irqhandler(info->irq, info);
-	info->evtchn = info->irq = 0;
+	if (rinfo->irq)
+		unbind_from_irqhandler(rinfo->irq, rinfo);
+	rinfo->evtchn = rinfo->irq = 0;
 
 }
 
@@ -1209,12 +1233,13 @@ static void blkif_copy_from_grant(unsigned long gfn, unsigned int offset,
 	kunmap_atomic(shared_data);
 }
 
-static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
+static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *rinfo,
 			     struct blkif_response *bret)
 {
 	int i = 0;
 	struct scatterlist *sg;
 	int num_sg, num_grant;
+	struct blkfront_info *info = rinfo->dev_info;
 	struct copy_from_grant data = {
 		.s = s,
 		.grant_idx = 0,
@@ -1284,7 +1309,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 				 */
 				if (!info->feature_persistent) {
 					indirect_page = s->indirect_grants[i]->page;
-					list_add(&indirect_page->lru, &info->indirect_pages);
+					list_add(&indirect_page->lru, &rinfo->indirect_pages);
 				}
 				s->indirect_grants[i]->gref = GRANT_INVALID_REF;
 				list_add_tail(&s->indirect_grants[i]->node, &info->grants);
@@ -1299,7 +1324,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	struct blkif_response *bret;
 	RING_IDX i, rp;
 	unsigned long flags;
-	struct blkfront_info *info = (struct blkfront_info *)dev_id;
+	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
+	struct blkfront_info *info = rinfo->dev_info;
 	int error;
 
 	spin_lock_irqsave(&info->io_lock, flags);
@@ -1310,13 +1336,13 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	}
 
  again:
-	rp = info->ring.sring->rsp_prod;
+	rp = rinfo->ring.sring->rsp_prod;
 	rmb(); /* Ensure we see queued responses up to 'rp'. */
 
-	for (i = info->ring.rsp_cons; i != rp; i++) {
+	for (i = rinfo->ring.rsp_cons; i != rp; i++) {
 		unsigned long id;
 
-		bret = RING_GET_RESPONSE(&info->ring, i);
+		bret = RING_GET_RESPONSE(&rinfo->ring, i);
 		id   = bret->id;
 		/*
 		 * The backend has messed up and given us an id that we would
@@ -1330,12 +1356,12 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 			 * the id is busted. */
 			continue;
 		}
-		req  = info->shadow[id].request;
+		req  = rinfo->shadow[id].request;
 
 		if (bret->operation != BLKIF_OP_DISCARD)
-			blkif_completion(&info->shadow[id], info, bret);
+			blkif_completion(&rinfo->shadow[id], rinfo, bret);
 
-		if (add_id_to_freelist(info, id)) {
+		if (add_id_to_freelist(rinfo, id)) {
 			WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
 			     info->gd->disk_name, op_name(bret->operation), id);
 			continue;
@@ -1364,7 +1390,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 				error = -EOPNOTSUPP;
 			}
 			if (unlikely(bret->status == BLKIF_RSP_ERROR &&
-				     info->shadow[id].req.u.rw.nr_segments == 0)) {
+				     rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
 				printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
 				       info->gd->disk_name, op_name(bret->operation));
 				error = -EOPNOTSUPP;
@@ -1389,17 +1415,17 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		}
 	}
 
-	info->ring.rsp_cons = i;
+	rinfo->ring.rsp_cons = i;
 
-	if (i != info->ring.req_prod_pvt) {
+	if (i != rinfo->ring.req_prod_pvt) {
 		int more_to_do;
-		RING_FINAL_CHECK_FOR_RESPONSES(&info->ring, more_to_do);
+		RING_FINAL_CHECK_FOR_RESPONSES(&rinfo->ring, more_to_do);
 		if (more_to_do)
 			goto again;
 	} else
-		info->ring.sring->rsp_event = i + 1;
+		rinfo->ring.sring->rsp_event = i + 1;
 
-	kick_pending_request_queues(info);
+	kick_pending_request_queues(rinfo);
 
 	spin_unlock_irqrestore(&info->io_lock, flags);
 
@@ -1408,15 +1434,16 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 
 
 static int setup_blkring(struct xenbus_device *dev,
-			 struct blkfront_info *info)
+			 struct blkfront_ring_info *rinfo)
 {
 	struct blkif_sring *sring;
 	int err, i;
+	struct blkfront_info *info = rinfo->dev_info;
 	unsigned long ring_size = info->nr_ring_pages * XEN_PAGE_SIZE;
 	grant_ref_t gref[XENBUS_MAX_RING_GRANTS];
 
 	for (i = 0; i < info->nr_ring_pages; i++)
-		info->ring_ref[i] = GRANT_INVALID_REF;
+		rinfo->ring_ref[i] = GRANT_INVALID_REF;
 
 	sring = (struct blkif_sring *)__get_free_pages(GFP_NOIO | __GFP_HIGH,
 						       get_order(ring_size));
@@ -1425,29 +1452,29 @@ static int setup_blkring(struct xenbus_device *dev,
 		return -ENOMEM;
 	}
 	SHARED_RING_INIT(sring);
-	FRONT_RING_INIT(&info->ring, sring, ring_size);
+	FRONT_RING_INIT(&rinfo->ring, sring, ring_size);
 
-	err = xenbus_grant_ring(dev, info->ring.sring, info->nr_ring_pages, gref);
+	err = xenbus_grant_ring(dev, rinfo->ring.sring, info->nr_ring_pages, gref);
 	if (err < 0) {
 		free_pages((unsigned long)sring, get_order(ring_size));
-		info->ring.sring = NULL;
+		rinfo->ring.sring = NULL;
 		goto fail;
 	}
 	for (i = 0; i < info->nr_ring_pages; i++)
-		info->ring_ref[i] = gref[i];
+		rinfo->ring_ref[i] = gref[i];
 
-	err = xenbus_alloc_evtchn(dev, &info->evtchn);
+	err = xenbus_alloc_evtchn(dev, &rinfo->evtchn);
 	if (err)
 		goto fail;
 
-	err = bind_evtchn_to_irqhandler(info->evtchn, blkif_interrupt, 0,
-					"blkif", info);
+	err = bind_evtchn_to_irqhandler(rinfo->evtchn, blkif_interrupt, 0,
+					"blkif", rinfo);
 	if (err <= 0) {
 		xenbus_dev_fatal(dev, err,
 				 "bind_evtchn_to_irqhandler failed");
 		goto fail;
 	}
-	info->irq = err;
+	rinfo->irq = err;
 
 	return 0;
 fail:
@@ -1465,6 +1492,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
 	int err, i;
 	unsigned int max_page_order = 0;
 	unsigned int ring_page_order = 0;
+	struct blkfront_ring_info *rinfo = &info->rinfo;
 
 	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
 			   "max-ring-page-order", "%u", &max_page_order);
@@ -1476,7 +1504,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
 	}
 
 	/* Create shared ring, alloc event channel. */
-	err = setup_blkring(dev, info);
+	err = setup_blkring(dev, rinfo);
 	if (err)
 		goto out;
 
@@ -1489,7 +1517,7 @@ again:
 
 	if (info->nr_ring_pages == 1) {
 		err = xenbus_printf(xbt, dev->nodename,
-				    "ring-ref", "%u", info->ring_ref[0]);
+				    "ring-ref", "%u", rinfo->ring_ref[0]);
 		if (err) {
 			message = "writing ring-ref";
 			goto abort_transaction;
@@ -1507,7 +1535,7 @@ again:
 
 			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
 			err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
-					    "%u", info->ring_ref[i]);
+					    "%u", rinfo->ring_ref[i]);
 			if (err) {
 				message = "writing ring-ref";
 				goto abort_transaction;
@@ -1515,7 +1543,7 @@ again:
 		}
 	}
 	err = xenbus_printf(xbt, dev->nodename,
-			    "event-channel", "%u", info->evtchn);
+			    "event-channel", "%u", rinfo->evtchn);
 	if (err) {
 		message = "writing event-channel";
 		goto abort_transaction;
@@ -1541,8 +1569,8 @@ again:
 	}
 
 	for (i = 0; i < BLK_RING_SIZE(info); i++)
-		info->shadow[i].req.u.rw.id = i+1;
-	info->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
+		rinfo->shadow[i].req.u.rw.id = i+1;
+	rinfo->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
 	xenbus_switch_state(dev, XenbusStateInitialised);
 
 	return 0;
@@ -1568,6 +1596,7 @@ static int blkfront_probe(struct xenbus_device *dev,
 {
 	int err, vdevice;
 	struct blkfront_info *info;
+	struct blkfront_ring_info *rinfo;
 
 	/* FIXME: Use dynamic device id if this is not set. */
 	err = xenbus_scanf(XBT_NIL, dev->nodename,
@@ -1617,15 +1646,18 @@ static int blkfront_probe(struct xenbus_device *dev,
 		return -ENOMEM;
 	}
 
+	rinfo = &info->rinfo;
+	INIT_LIST_HEAD(&rinfo->indirect_pages);
+	rinfo->dev_info = info;
+	INIT_WORK(&rinfo->work, blkif_restart_queue);
+
 	mutex_init(&info->mutex);
 	spin_lock_init(&info->io_lock);
 	info->xbdev = dev;
 	info->vdevice = vdevice;
 	INIT_LIST_HEAD(&info->grants);
-	INIT_LIST_HEAD(&info->indirect_pages);
 	info->persistent_gnts_c = 0;
 	info->connected = BLKIF_STATE_DISCONNECTED;
-	INIT_WORK(&info->work, blkif_restart_queue);
 
 	/* Front end dir is a number, which is used as the id. */
 	info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
@@ -1659,19 +1691,20 @@ static int blkif_recover(struct blkfront_info *info)
 	int pending, size;
 	struct split_bio *split_bio;
 	struct list_head requests;
+	struct blkfront_ring_info *rinfo = &info->rinfo;
 
 	/* Stage 1: Make a safe copy of the shadow state. */
-	copy = kmemdup(info->shadow, sizeof(info->shadow),
+	copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
 		       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
 	if (!copy)
 		return -ENOMEM;
 
 	/* Stage 2: Set up free list. */
-	memset(&info->shadow, 0, sizeof(info->shadow));
+	memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
 	for (i = 0; i < BLK_RING_SIZE(info); i++)
-		info->shadow[i].req.u.rw.id = i+1;
-	info->shadow_free = info->ring.req_prod_pvt;
-	info->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
+		rinfo->shadow[i].req.u.rw.id = i+1;
+	rinfo->shadow_free = rinfo->ring.req_prod_pvt;
+	rinfo->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
 
 	rc = blkfront_gather_backend_features(info);
 	if (rc) {
@@ -1717,7 +1750,7 @@ static int blkif_recover(struct blkfront_info *info)
 	info->connected = BLKIF_STATE_CONNECTED;
 
 	/* Kick any other new requests queued since we resumed */
-	kick_pending_request_queues(info);
+	kick_pending_request_queues(rinfo);
 
 	list_for_each_entry_safe(req, n, &requests, queuelist) {
 		/* Requeue pending requests (flush or discard) */
@@ -1851,10 +1884,11 @@ static void blkfront_setup_discard(struct blkfront_info *info)
 		info->feature_secdiscard = !!discard_secure;
 }
 
-static int blkfront_setup_indirect(struct blkfront_info *info)
+static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
 {
 	unsigned int psegs, grants;
 	int err, i;
+	struct blkfront_info *info = rinfo->dev_info;
 
 	if (info->max_indirect_segments == 0)
 		grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
@@ -1862,7 +1896,7 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
 		grants = info->max_indirect_segments;
 	psegs = grants / GRANTS_PER_PSEG;
 
-	err = fill_grant_buffer(info,
+	err = fill_grant_buffer(rinfo,
 				(grants + INDIRECT_GREFS(grants)) * BLK_RING_SIZE(info));
 	if (err)
 		goto out_of_memory;
@@ -1875,31 +1909,31 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
 		 */
 		int num = INDIRECT_GREFS(grants) * BLK_RING_SIZE(info);
 
-		BUG_ON(!list_empty(&info->indirect_pages));
+		BUG_ON(!list_empty(&rinfo->indirect_pages));
 		for (i = 0; i < num; i++) {
 			struct page *indirect_page = alloc_page(GFP_NOIO);
 			if (!indirect_page)
 				goto out_of_memory;
-			list_add(&indirect_page->lru, &info->indirect_pages);
+			list_add(&indirect_page->lru, &rinfo->indirect_pages);
 		}
 	}
 
 	for (i = 0; i < BLK_RING_SIZE(info); i++) {
-		info->shadow[i].grants_used = kzalloc(
-			sizeof(info->shadow[i].grants_used[0]) * grants,
+		rinfo->shadow[i].grants_used = kzalloc(
+			sizeof(rinfo->shadow[i].grants_used[0]) * grants,
 			GFP_NOIO);
-		info->shadow[i].sg = kzalloc(sizeof(info->shadow[i].sg[0]) * psegs, GFP_NOIO);
+		rinfo->shadow[i].sg = kzalloc(sizeof(rinfo->shadow[i].sg[0]) * psegs, GFP_NOIO);
 		if (info->max_indirect_segments)
-			info->shadow[i].indirect_grants = kzalloc(
-				sizeof(info->shadow[i].indirect_grants[0]) *
+			rinfo->shadow[i].indirect_grants = kzalloc(
+				sizeof(rinfo->shadow[i].indirect_grants[0]) *
 				INDIRECT_GREFS(grants),
 				GFP_NOIO);
-		if ((info->shadow[i].grants_used == NULL) ||
-			(info->shadow[i].sg == NULL) ||
+		if ((rinfo->shadow[i].grants_used == NULL) ||
+			(rinfo->shadow[i].sg == NULL) ||
 		     (info->max_indirect_segments &&
-		     (info->shadow[i].indirect_grants == NULL)))
+		     (rinfo->shadow[i].indirect_grants == NULL)))
 			goto out_of_memory;
-		sg_init_table(info->shadow[i].sg, psegs);
+		sg_init_table(rinfo->shadow[i].sg, psegs);
 	}
 
 
@@ -1907,16 +1941,16 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
 
 out_of_memory:
 	for (i = 0; i < BLK_RING_SIZE(info); i++) {
-		kfree(info->shadow[i].grants_used);
-		info->shadow[i].grants_used = NULL;
-		kfree(info->shadow[i].sg);
-		info->shadow[i].sg = NULL;
-		kfree(info->shadow[i].indirect_grants);
-		info->shadow[i].indirect_grants = NULL;
-	}
-	if (!list_empty(&info->indirect_pages)) {
+		kfree(rinfo->shadow[i].grants_used);
+		rinfo->shadow[i].grants_used = NULL;
+		kfree(rinfo->shadow[i].sg);
+		rinfo->shadow[i].sg = NULL;
+		kfree(rinfo->shadow[i].indirect_grants);
+		rinfo->shadow[i].indirect_grants = NULL;
+	}
+	if (!list_empty(&rinfo->indirect_pages)) {
 		struct page *indirect_page, *n;
-		list_for_each_entry_safe(indirect_page, n, &info->indirect_pages, lru) {
+		list_for_each_entry_safe(indirect_page, n, &rinfo->indirect_pages, lru) {
 			list_del(&indirect_page->lru);
 			__free_page(indirect_page);
 		}
@@ -1983,7 +2017,7 @@ static int blkfront_gather_backend_features(struct blkfront_info *info)
 		info->max_indirect_segments = min(indirect_segments,
 						  xen_blkif_max_segments);
 
-	return blkfront_setup_indirect(info);
+	return blkfront_setup_indirect(&info->rinfo);
 }
 
 /*
@@ -1997,6 +2031,7 @@ static void blkfront_connect(struct blkfront_info *info)
 	unsigned int physical_sector_size;
 	unsigned int binfo;
 	int err;
+	struct blkfront_ring_info *rinfo = &info->rinfo;
 
 	switch (info->connected) {
 	case BLKIF_STATE_CONNECTED:
@@ -2073,7 +2108,7 @@ static void blkfront_connect(struct blkfront_info *info)
 	/* Kick pending requests. */
 	spin_lock_irq(&info->io_lock);
 	info->connected = BLKIF_STATE_CONNECTED;
-	kick_pending_request_queues(info);
+	kick_pending_request_queues(rinfo);
 	spin_unlock_irq(&info->io_lock);
 
 	add_disk(info->gd);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 03/10] xen/blkfront: pseudo support for multi hardware queues/rings
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
@ 2015-11-14  3:12   ` Bob Liu
  2015-11-14  3:12 ` Bob Liu
                     ` (21 subsequent siblings)
  22 siblings, 0 replies; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, roger.pau, konrad.wilk, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel, Bob Liu

Preparatory patch for multiple hardware queues (rings). The number of
rings is unconditionally set to 1 for now; a larger number will be enabled
in the next patch ("xen/blkfront: negotiate number of queues/rings to be
used with backend") so as to keep every single patch small and readable.
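As a rough orientation before the diff, a minimal userspace sketch of the
resulting layout (illustrative only; ring_for_hctx() and alloc_rings() are
invented names, and only the rinfo array and nr_rings counter mirror what
the patch actually adds to blkfront_info):

/* Illustrative sketch, not the actual driver code. */
#include <stdlib.h>

struct ring_info {
	unsigned int evtchn;		/* one event channel per ring */
	struct dev_info *dev_info;	/* back-pointer to the device */
};

struct dev_info {
	unsigned int nr_rings;		/* 1 in this patch, >1 later */
	struct ring_info *rinfo;	/* array with nr_rings entries */
};

/* Mirrors blk_mq_init_hctx(): hardware-context 'index' maps 1:1 to a ring. */
static struct ring_info *ring_for_hctx(struct dev_info *info, unsigned int index)
{
	return index < info->nr_rings ? &info->rinfo[index] : NULL;
}

static int alloc_rings(struct dev_info *info, unsigned int nr_rings)
{
	unsigned int i;

	info->rinfo = calloc(nr_rings, sizeof(*info->rinfo));
	if (!info->rinfo)
		return -1;
	info->nr_rings = nr_rings;
	for (i = 0; i < nr_rings; i++)
		info->rinfo[i].dev_info = info;
	return 0;
}

The 1:1 mapping from blk-mq hardware-context index to ring is what later
allows several rings to be driven in parallel for a single virtual disk.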

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
v2:
 * Fix memleak.
 * Other comments from Konrad.
---
 drivers/block/xen-blkfront.c |  341 ++++++++++++++++++++++++------------------
 1 file changed, 195 insertions(+), 146 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 0c3ad21..d73734f 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -150,6 +150,7 @@ struct blkfront_info
 	int vdevice;
 	blkif_vdev_t handle;
 	enum blkif_state connected;
+	/* Number of pages per ring buffer. */
 	unsigned int nr_ring_pages;
 	struct request_queue *rq;
 	struct list_head grants;
@@ -164,7 +165,8 @@ struct blkfront_info
 	unsigned int max_indirect_segments;
 	int is_ready;
 	struct blk_mq_tag_set tag_set;
-	struct blkfront_ring_info rinfo;
+	struct blkfront_ring_info *rinfo;
+	unsigned int nr_rings;
 };
 
 static unsigned int nr_minors;
@@ -209,7 +211,7 @@ static DEFINE_SPINLOCK(minor_lock);
 #define GREFS(_psegs)	((_psegs) * GRANTS_PER_PSEG)
 
 static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
-static int blkfront_gather_backend_features(struct blkfront_info *info);
+static void blkfront_gather_backend_features(struct blkfront_info *info);
 
 static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
@@ -338,8 +340,8 @@ static struct grant *get_indirect_grant(grant_ref_t *gref_head,
 		struct page *indirect_page;
 
 		/* Fetch a pre-allocated page to use for indirect grefs */
-		BUG_ON(list_empty(&info->rinfo.indirect_pages));
-		indirect_page = list_first_entry(&info->rinfo.indirect_pages,
+		BUG_ON(list_empty(&info->rinfo->indirect_pages));
+		indirect_page = list_first_entry(&info->rinfo->indirect_pages,
 						 struct page, lru);
 		list_del(&indirect_page->lru);
 		gnt_list_entry->page = indirect_page;
@@ -597,7 +599,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 	 * existing persistent grants, or if we have to get new grants,
 	 * as there are not sufficiently many free.
 	 */
-	bool new_persistent_gnts;
 	struct scatterlist *sg;
 	int num_sg, max_grefs, num_grant;
 
@@ -609,12 +610,12 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 		 */
 		max_grefs += INDIRECT_GREFS(max_grefs);
 
-	/* Check if we have enough grants to allocate a requests */
-	if (info->persistent_gnts_c < max_grefs) {
-		new_persistent_gnts = 1;
-		if (gnttab_alloc_grant_references(
-		    max_grefs - info->persistent_gnts_c,
-		    &setup.gref_head) < 0) {
+	/*
+	 * We have to reserve 'max_grefs' grants at first because persistent
+	 * grants are shared by all rings.
+	 */
+	if (max_grefs > 0)
+		if (gnttab_alloc_grant_references(max_grefs, &setup.gref_head) < 0) {
 			gnttab_request_free_callback(
 				&rinfo->callback,
 				blkif_restart_queue_callback,
@@ -622,8 +623,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 				max_grefs);
 			return 1;
 		}
-	} else
-		new_persistent_gnts = 0;
 
 	/* Fill out a communications ring structure. */
 	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
@@ -712,7 +711,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 	/* Keep a private copy so we can reissue requests when recovering. */
 	rinfo->shadow[id].req = *ring_req;
 
-	if (new_persistent_gnts)
+	if (max_grefs > 0)
 		gnttab_free_grant_references(setup.gref_head);
 
 	return 0;
@@ -791,7 +790,8 @@ static int blk_mq_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
 {
 	struct blkfront_info *info = (struct blkfront_info *)data;
 
-	hctx->driver_data = &info->rinfo;
+	BUG_ON(info->nr_rings <= index);
+	hctx->driver_data = &info->rinfo[index];
 	return 0;
 }
 
@@ -1050,8 +1050,7 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 
 static void xlvbd_release_gendisk(struct blkfront_info *info)
 {
-	unsigned int minor, nr_minors;
-	struct blkfront_ring_info *rinfo = &info->rinfo;
+	unsigned int minor, nr_minors, i;
 
 	if (info->rq == NULL)
 		return;
@@ -1059,11 +1058,15 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
 	/* No more blkif_request(). */
 	blk_mq_stop_hw_queues(info->rq);
 
-	/* No more gnttab callback work. */
-	gnttab_cancel_free_callback(&rinfo->callback);
+	for (i = 0; i < info->nr_rings; i++) {
+		struct blkfront_ring_info *rinfo = &info->rinfo[i];
 
-	/* Flush gnttab callback work. Must be done with no locks held. */
-	flush_work(&rinfo->work);
+		/* No more gnttab callback work. */
+		gnttab_cancel_free_callback(&rinfo->callback);
+
+		/* Flush gnttab callback work. Must be done with no locks held. */
+		flush_work(&rinfo->work);
+	}
 
 	del_gendisk(info->gd);
 
@@ -1096,37 +1099,11 @@ static void blkif_restart_queue(struct work_struct *work)
 	spin_unlock_irq(&rinfo->dev_info->io_lock);
 }
 
-static void blkif_free(struct blkfront_info *info, int suspend)
+static void blkif_free_ring(struct blkfront_ring_info *rinfo)
 {
 	struct grant *persistent_gnt;
-	struct grant *n;
+	struct blkfront_info *info = rinfo->dev_info;
 	int i, j, segs;
-	struct blkfront_ring_info *rinfo = &info->rinfo;
-
-	/* Prevent new requests being issued until we fix things up. */
-	spin_lock_irq(&info->io_lock);
-	info->connected = suspend ?
-		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
-	/* No more blkif_request(). */
-	if (info->rq)
-		blk_mq_stop_hw_queues(info->rq);
-
-	/* Remove all persistent grants */
-	if (!list_empty(&info->grants)) {
-		list_for_each_entry_safe(persistent_gnt, n,
-					 &info->grants, node) {
-			list_del(&persistent_gnt->node);
-			if (persistent_gnt->gref != GRANT_INVALID_REF) {
-				gnttab_end_foreign_access(persistent_gnt->gref,
-				                          0, 0UL);
-				info->persistent_gnts_c--;
-			}
-			if (info->feature_persistent)
-				__free_page(persistent_gnt->page);
-			kfree(persistent_gnt);
-		}
-	}
-	BUG_ON(info->persistent_gnts_c != 0);
 
 	/*
 	 * Remove indirect pages, this only happens when using indirect
@@ -1186,7 +1163,6 @@ free_shadow:
 
 	/* No more gnttab callback work. */
 	gnttab_cancel_free_callback(&rinfo->callback);
-	spin_unlock_irq(&info->io_lock);
 
 	/* Flush gnttab callback work. Must be done with no locks held. */
 	flush_work(&rinfo->work);
@@ -1204,7 +1180,43 @@ free_shadow:
 	if (rinfo->irq)
 		unbind_from_irqhandler(rinfo->irq, rinfo);
 	rinfo->evtchn = rinfo->irq = 0;
+}
 
+static void blkif_free(struct blkfront_info *info, int suspend)
+{
+	struct grant *persistent_gnt, *n;
+	unsigned int i;
+
+	/* Prevent new requests being issued until we fix things up. */
+	spin_lock_irq(&info->io_lock);
+	info->connected = suspend ?
+		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
+	/* No more blkif_request(). */
+	if (info->rq)
+		blk_mq_stop_hw_queues(info->rq);
+
+	/* Remove all persistent grants */
+	if (!list_empty(&info->grants)) {
+		list_for_each_entry_safe(persistent_gnt, n,
+					 &info->grants, node) {
+			list_del(&persistent_gnt->node);
+			if (persistent_gnt->gref != GRANT_INVALID_REF) {
+				gnttab_end_foreign_access(persistent_gnt->gref,
+							  0, 0UL);
+				info->persistent_gnts_c--;
+			}
+			if (info->feature_persistent)
+				__free_page(persistent_gnt->page);
+			kfree(persistent_gnt);
+		}
+	}
+	BUG_ON(info->persistent_gnts_c != 0);
+
+	for (i = 0; i < info->nr_rings; i++)
+		blkif_free_ring(&info->rinfo[i]);
+	kfree(info->rinfo);
+	info->nr_rings = 0;
+	spin_unlock_irq(&info->io_lock);
 }
 
 struct copy_from_grant {
@@ -1492,7 +1504,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
 	int err, i;
 	unsigned int max_page_order = 0;
 	unsigned int ring_page_order = 0;
-	struct blkfront_ring_info *rinfo = &info->rinfo;
+	struct blkfront_ring_info *rinfo;
 
 	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
 			   "max-ring-page-order", "%u", &max_page_order);
@@ -1503,10 +1515,13 @@ static int talk_to_blkback(struct xenbus_device *dev,
 		info->nr_ring_pages = 1 << ring_page_order;
 	}
 
-	/* Create shared ring, alloc event channel. */
-	err = setup_blkring(dev, rinfo);
-	if (err)
-		goto out;
+	for (i = 0; i < info->nr_rings; i++) {
+		rinfo = &info->rinfo[i];
+		/* Create shared ring, alloc event channel. */
+		err = setup_blkring(dev, rinfo);
+		if (err)
+			goto destroy_blkring;
+	}
 
 again:
 	err = xenbus_transaction_start(&xbt);
@@ -1515,37 +1530,43 @@ again:
 		goto destroy_blkring;
 	}
 
-	if (info->nr_ring_pages == 1) {
-		err = xenbus_printf(xbt, dev->nodename,
-				    "ring-ref", "%u", rinfo->ring_ref[0]);
-		if (err) {
-			message = "writing ring-ref";
-			goto abort_transaction;
-		}
-	} else {
-		err = xenbus_printf(xbt, dev->nodename,
-				    "ring-page-order", "%u", ring_page_order);
-		if (err) {
-			message = "writing ring-page-order";
-			goto abort_transaction;
-		}
-
-		for (i = 0; i < info->nr_ring_pages; i++) {
-			char ring_ref_name[RINGREF_NAME_LEN];
-
-			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
-			err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
-					    "%u", rinfo->ring_ref[i]);
+	if (info->nr_rings == 1) {
+		rinfo = &info->rinfo[0];
+		if (info->nr_ring_pages == 1) {
+			err = xenbus_printf(xbt, dev->nodename,
+					    "ring-ref", "%u", rinfo->ring_ref[0]);
 			if (err) {
 				message = "writing ring-ref";
 				goto abort_transaction;
 			}
+		} else {
+			err = xenbus_printf(xbt, dev->nodename,
+					    "ring-page-order", "%u", ring_page_order);
+			if (err) {
+				message = "writing ring-page-order";
+				goto abort_transaction;
+			}
+
+			for (i = 0; i < info->nr_ring_pages; i++) {
+				char ring_ref_name[RINGREF_NAME_LEN];
+
+				snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
+				err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
+						    "%u", rinfo->ring_ref[i]);
+				if (err) {
+					message = "writing ring-ref";
+					goto abort_transaction;
+				}
+			}
 		}
-	}
-	err = xenbus_printf(xbt, dev->nodename,
-			    "event-channel", "%u", rinfo->evtchn);
-	if (err) {
-		message = "writing event-channel";
+		err = xenbus_printf(xbt, dev->nodename,
+				    "event-channel", "%u", rinfo->evtchn);
+		if (err) {
+			message = "writing event-channel";
+			goto abort_transaction;
+		}
+	} else {
+		/* Not supported at this stage. */
 		goto abort_transaction;
 	}
 	err = xenbus_printf(xbt, dev->nodename, "protocol", "%s",
@@ -1568,9 +1589,14 @@ again:
 		goto destroy_blkring;
 	}
 
-	for (i = 0; i < BLK_RING_SIZE(info); i++)
-		rinfo->shadow[i].req.u.rw.id = i+1;
-	rinfo->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
+	for (i = 0; i < info->nr_rings; i++) {
+		int j;
+		rinfo = &info->rinfo[i];
+
+		for (j = 0; j < BLK_RING_SIZE(info); j++)
+			rinfo->shadow[j].req.u.rw.id = j + 1;
+		rinfo->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
+	}
 	xenbus_switch_state(dev, XenbusStateInitialised);
 
 	return 0;
@@ -1581,7 +1607,7 @@ again:
 		xenbus_dev_fatal(dev, err, "%s", message);
  destroy_blkring:
 	blkif_free(info, 0);
- out:
+
 	return err;
 }
 
@@ -1594,9 +1620,8 @@ again:
 static int blkfront_probe(struct xenbus_device *dev,
 			  const struct xenbus_device_id *id)
 {
-	int err, vdevice;
+	int err, vdevice, r_index;
 	struct blkfront_info *info;
-	struct blkfront_ring_info *rinfo;
 
 	/* FIXME: Use dynamic device id if this is not set. */
 	err = xenbus_scanf(XBT_NIL, dev->nodename,
@@ -1646,10 +1671,22 @@ static int blkfront_probe(struct xenbus_device *dev,
 		return -ENOMEM;
 	}
 
-	rinfo = &info->rinfo;
-	INIT_LIST_HEAD(&rinfo->indirect_pages);
-	rinfo->dev_info = info;
-	INIT_WORK(&rinfo->work, blkif_restart_queue);
+	info->nr_rings = 1;
+	info->rinfo = kzalloc(sizeof(struct blkfront_ring_info) * info->nr_rings, GFP_KERNEL);
+	if (!info->rinfo) {
+		xenbus_dev_fatal(dev, -ENOMEM, "allocating ring_info structure");
+		kfree(info);
+		return -ENOMEM;
+	}
+
+	for (r_index = 0; r_index < info->nr_rings; r_index++) {
+		struct blkfront_ring_info *rinfo;
+
+		rinfo = &info->rinfo[r_index];
+		INIT_LIST_HEAD(&rinfo->indirect_pages);
+		rinfo->dev_info = info;
+		INIT_WORK(&rinfo->work, blkif_restart_queue);
+	}
 
 	mutex_init(&info->mutex);
 	spin_lock_init(&info->io_lock);
@@ -1681,7 +1718,7 @@ static void split_bio_end(struct bio *bio)
 
 static int blkif_recover(struct blkfront_info *info)
 {
-	int i;
+	int i, r_index;
 	struct request *req, *n;
 	struct blk_shadow *copy;
 	int rc;
@@ -1691,57 +1728,62 @@ static int blkif_recover(struct blkfront_info *info)
 	int pending, size;
 	struct split_bio *split_bio;
 	struct list_head requests;
-	struct blkfront_ring_info *rinfo = &info->rinfo;
-
-	/* Stage 1: Make a safe copy of the shadow state. */
-	copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
-		       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
-	if (!copy)
-		return -ENOMEM;
-
-	/* Stage 2: Set up free list. */
-	memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
-	for (i = 0; i < BLK_RING_SIZE(info); i++)
-		rinfo->shadow[i].req.u.rw.id = i+1;
-	rinfo->shadow_free = rinfo->ring.req_prod_pvt;
-	rinfo->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
-
-	rc = blkfront_gather_backend_features(info);
-	if (rc) {
-		kfree(copy);
-		return rc;
-	}
 
+	blkfront_gather_backend_features(info);
 	segs = info->max_indirect_segments ? : BLKIF_MAX_SEGMENTS_PER_REQUEST;
 	blk_queue_max_segments(info->rq, segs);
 	bio_list_init(&bio_list);
 	INIT_LIST_HEAD(&requests);
-	for (i = 0; i < BLK_RING_SIZE(info); i++) {
-		/* Not in use? */
-		if (!copy[i].request)
-			continue;
 
-		/*
-		 * Get the bios in the request so we can re-queue them.
-		 */
-		if (copy[i].request->cmd_flags &
-		    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
+	for (r_index = 0; r_index < info->nr_rings; r_index++) {
+		struct blkfront_ring_info *rinfo;
+
+		rinfo = &info->rinfo[r_index];
+		/* Stage 1: Make a safe copy of the shadow state. */
+		copy = kmemdup(rinfo->shadow, sizeof(rinfo->shadow),
+			       GFP_NOIO | __GFP_REPEAT | __GFP_HIGH);
+		if (!copy)
+			return -ENOMEM;
+
+		/* Stage 2: Set up free list. */
+		memset(&rinfo->shadow, 0, sizeof(rinfo->shadow));
+		for (i = 0; i < BLK_RING_SIZE(info); i++)
+			rinfo->shadow[i].req.u.rw.id = i+1;
+		rinfo->shadow_free = rinfo->ring.req_prod_pvt;
+		rinfo->shadow[BLK_RING_SIZE(info)-1].req.u.rw.id = 0x0fffffff;
+
+		rc = blkfront_setup_indirect(rinfo);
+		if (rc) {
+			kfree(copy);
+			return rc;
+		}
+
+		for (i = 0; i < BLK_RING_SIZE(info); i++) {
+			/* Not in use? */
+			if (!copy[i].request)
+				continue;
+
 			/*
-			 * Flush operations don't contain bios, so
-			 * we need to requeue the whole request
+			 * Get the bios in the request so we can re-queue them.
 			 */
-			list_add(&copy[i].request->queuelist, &requests);
-			continue;
+			if (copy[i].request->cmd_flags &
+			    (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) {
+				/*
+				 * Flush operations don't contain bios, so
+				 * we need to requeue the whole request
+				 */
+				list_add(&copy[i].request->queuelist, &requests);
+				continue;
+			}
+			merge_bio.head = copy[i].request->bio;
+			merge_bio.tail = copy[i].request->biotail;
+			bio_list_merge(&bio_list, &merge_bio);
+			copy[i].request->bio = NULL;
+			blk_end_request_all(copy[i].request, 0);
 		}
-		merge_bio.head = copy[i].request->bio;
-		merge_bio.tail = copy[i].request->biotail;
-		bio_list_merge(&bio_list, &merge_bio);
-		copy[i].request->bio = NULL;
-		blk_end_request_all(copy[i].request, 0);
-	}
-
-	kfree(copy);
 
+		kfree(copy);
+	}
 	xenbus_switch_state(info->xbdev, XenbusStateConnected);
 
 	spin_lock_irq(&info->io_lock);
@@ -1749,8 +1791,13 @@ static int blkif_recover(struct blkfront_info *info)
 	/* Now safe for us to use the shared ring */
 	info->connected = BLKIF_STATE_CONNECTED;
 
-	/* Kick any other new requests queued since we resumed */
-	kick_pending_request_queues(rinfo);
+	for (r_index = 0; r_index < info->nr_rings; r_index++) {
+		struct blkfront_ring_info *rinfo;
+
+		rinfo = &info->rinfo[r_index];
+		/* Kick any other new requests queued since we resumed */
+		kick_pending_request_queues(rinfo);
+	}
 
 	list_for_each_entry_safe(req, n, &requests, queuelist) {
 		/* Requeue pending requests (flush or discard) */
@@ -1961,7 +2008,7 @@ out_of_memory:
 /*
  * Gather all backend feature-*
  */
-static int blkfront_gather_backend_features(struct blkfront_info *info)
+static void blkfront_gather_backend_features(struct blkfront_info *info)
 {
 	int err;
 	int barrier, flush, discard, persistent;
@@ -2016,8 +2063,6 @@ static int blkfront_gather_backend_features(struct blkfront_info *info)
 	else
 		info->max_indirect_segments = min(indirect_segments,
 						  xen_blkif_max_segments);
-
-	return blkfront_setup_indirect(&info->rinfo);
 }
 
 /*
@@ -2030,8 +2075,7 @@ static void blkfront_connect(struct blkfront_info *info)
 	unsigned long sector_size;
 	unsigned int physical_sector_size;
 	unsigned int binfo;
-	int err;
-	struct blkfront_ring_info *rinfo = &info->rinfo;
+	int err, i;
 
 	switch (info->connected) {
 	case BLKIF_STATE_CONNECTED:
@@ -2088,11 +2132,15 @@ static void blkfront_connect(struct blkfront_info *info)
 	if (err != 1)
 		physical_sector_size = sector_size;
 
-	err = blkfront_gather_backend_features(info);
-	if (err) {
-		xenbus_dev_fatal(info->xbdev, err, "setup_indirect at %s",
-				 info->xbdev->otherend);
-		return;
+	blkfront_gather_backend_features(info);
+	for (i = 0; i < info->nr_rings; i++) {
+		err = blkfront_setup_indirect(&info->rinfo[i]);
+		if (err) {
+			xenbus_dev_fatal(info->xbdev, err, "setup_indirect at %s",
+					 info->xbdev->otherend);
+			blkif_free(info, 0);
+			break;
+		}
 	}
 
 	err = xlvbd_alloc_gendisk(sectors, info, binfo, sector_size,
@@ -2108,7 +2156,8 @@ static void blkfront_connect(struct blkfront_info *info)
 	/* Kick pending requests. */
 	spin_lock_irq(&info->io_lock);
 	info->connected = BLKIF_STATE_CONNECTED;
-	kick_pending_request_queues(rinfo);
+	for (i = 0; i < info->nr_rings; i++)
+		kick_pending_request_queues(&info->rinfo[i]);
 	spin_unlock_irq(&info->io_lock);
 
 	add_disk(info->gd);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 04/10] xen/blkfront: split per device io_lock
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
                   ` (4 preceding siblings ...)
  2015-11-14  3:12 ` [PATCH v5 04/10] xen/blkfront: split per device io_lock Bob Liu
@ 2015-11-14  3:12 ` Bob Liu
  2015-11-16 20:59   ` Konrad Rzeszutek Wilk
  2015-11-14  3:12 ` [PATCH v5 05/10] xen/blkfront: negotiate number of queues/rings to be used with backend Bob Liu
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, roger.pau, konrad.wilk, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel, Bob Liu

After commit "xen/blkfront: separate per ring information out of device
info", per-ring data is protected by a per-device lock('io_lock').

This does not scale well, so introduce a per-ring lock ('ring_lock').

The old 'io_lock' is renamed to 'dev_lock', which protects the ->grants list
and persistent_gnts_c shared by all rings.
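
Condensed from the hunks below, the two levels of locking look roughly like
this (a sketch only, not the exact driver code; the real helpers do a bit
more work):

	/* Per-ring data is only ever touched under that ring's ring_lock. */
	static void kick_pending_request_queues(struct blkfront_ring_info *rinfo)
	{
		unsigned long flags;

		spin_lock_irqsave(&rinfo->ring_lock, flags);
		if (!RING_FULL(&rinfo->ring))
			blk_mq_start_stopped_hw_queues(rinfo->dev_info->rq, true);
		spin_unlock_irqrestore(&rinfo->ring_lock, flags);
	}

	/* Grants are shared by all rings, so they stay under the device-wide
	 * dev_lock. */
	static struct grant *get_free_grant(struct blkfront_info *info)
	{
		struct grant *gnt_list_entry;
		unsigned long flags;

		spin_lock_irqsave(&info->dev_lock, flags);
		gnt_list_entry = list_first_entry(&info->grants, struct grant, node);
		list_del(&gnt_list_entry->node);
		if (gnt_list_entry->gref != GRANT_INVALID_REF)
			info->persistent_gnts_c--;
		spin_unlock_irqrestore(&info->dev_lock, flags);

		return gnt_list_entry;
	}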

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
v2:
 * Introduce kick_pending_request_queues_locked().
 * Add comment for 'ring_lock'.
 * Move locks to more suitable place.
---
 drivers/block/xen-blkfront.c |   73 +++++++++++++++++++++++++++---------------
 1 file changed, 47 insertions(+), 26 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index d73734f..56c9ec6 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -125,6 +125,8 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
  *  depending on how many hardware queues/rings to be used.
  */
 struct blkfront_ring_info {
+	/* Lock to protect data in every ring buffer. */
+	spinlock_t ring_lock;
 	struct blkif_front_ring ring;
 	unsigned int ring_ref[XENBUS_MAX_RING_GRANTS];
 	unsigned int evtchn, irq;
@@ -143,7 +145,6 @@ struct blkfront_ring_info {
  */
 struct blkfront_info
 {
-	spinlock_t io_lock;
 	struct mutex mutex;
 	struct xenbus_device *xbdev;
 	struct gendisk *gd;
@@ -153,6 +154,11 @@ struct blkfront_info
 	/* Number of pages per ring buffer. */
 	unsigned int nr_ring_pages;
 	struct request_queue *rq;
+	/*
+	 * Lock to protect info->grants list and persistent_gnts_c shared by all
+	 * rings.
+	 */
+	spinlock_t dev_lock;
 	struct list_head grants;
 	unsigned int persistent_gnts_c;
 	unsigned int feature_flush;
@@ -258,7 +264,9 @@ static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 		}
 
 		gnt_list_entry->gref = GRANT_INVALID_REF;
+		spin_lock_irq(&info->dev_lock);
 		list_add(&gnt_list_entry->node, &info->grants);
+		spin_unlock_irq(&info->dev_lock);
 		i++;
 	}
 
@@ -267,7 +275,9 @@ static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 out_of_memory:
 	list_for_each_entry_safe(gnt_list_entry, n,
 	                         &info->grants, node) {
+		spin_lock_irq(&info->dev_lock);
 		list_del(&gnt_list_entry->node);
+		spin_unlock_irq(&info->dev_lock);
 		if (info->feature_persistent)
 			__free_page(gnt_list_entry->page);
 		kfree(gnt_list_entry);
@@ -280,7 +290,9 @@ out_of_memory:
 static struct grant *get_free_grant(struct blkfront_info *info)
 {
 	struct grant *gnt_list_entry;
+	unsigned long flags;
 
+	spin_lock_irqsave(&info->dev_lock, flags);
 	BUG_ON(list_empty(&info->grants));
 	gnt_list_entry = list_first_entry(&info->grants, struct grant,
 					  node);
@@ -288,6 +300,7 @@ static struct grant *get_free_grant(struct blkfront_info *info)
 
 	if (gnt_list_entry->gref != GRANT_INVALID_REF)
 		info->persistent_gnts_c--;
+	spin_unlock_irqrestore(&info->dev_lock, flags);
 
 	return gnt_list_entry;
 }
@@ -757,11 +770,11 @@ static inline bool blkif_request_flush_invalid(struct request *req,
 static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
 			   const struct blk_mq_queue_data *qd)
 {
+	unsigned long flags;
 	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)hctx->driver_data;
-	struct blkfront_info *info = rinfo->dev_info;
 
 	blk_mq_start_request(qd->rq);
-	spin_lock_irq(&info->io_lock);
+	spin_lock_irqsave(&rinfo->ring_lock, flags);
 	if (RING_FULL(&rinfo->ring))
 		goto out_busy;
 
@@ -772,15 +785,15 @@ static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
 		goto out_busy;
 
 	flush_requests(rinfo);
-	spin_unlock_irq(&info->io_lock);
+	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
 	return BLK_MQ_RQ_QUEUE_OK;
 
 out_err:
-	spin_unlock_irq(&info->io_lock);
+	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
 	return BLK_MQ_RQ_QUEUE_ERROR;
 
 out_busy:
-	spin_unlock_irq(&info->io_lock);
+	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
 	blk_mq_stop_hw_queue(hctx);
 	return BLK_MQ_RQ_QUEUE_BUSY;
 }
@@ -1082,21 +1095,28 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
 	info->gd = NULL;
 }
 
-/* Must be called with io_lock holded */
-static void kick_pending_request_queues(struct blkfront_ring_info *rinfo)
+/* Already hold rinfo->ring_lock. */
+static inline void kick_pending_request_queues_locked(struct blkfront_ring_info *rinfo)
 {
 	if (!RING_FULL(&rinfo->ring))
 		blk_mq_start_stopped_hw_queues(rinfo->dev_info->rq, true);
 }
 
+static void kick_pending_request_queues(struct blkfront_ring_info *rinfo)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&rinfo->ring_lock, flags);
+	kick_pending_request_queues_locked(rinfo);
+	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
+}
+
 static void blkif_restart_queue(struct work_struct *work)
 {
 	struct blkfront_ring_info *rinfo = container_of(work, struct blkfront_ring_info, work);
 
-	spin_lock_irq(&rinfo->dev_info->io_lock);
 	if (rinfo->dev_info->connected == BLKIF_STATE_CONNECTED)
 		kick_pending_request_queues(rinfo);
-	spin_unlock_irq(&rinfo->dev_info->io_lock);
 }
 
 static void blkif_free_ring(struct blkfront_ring_info *rinfo)
@@ -1188,7 +1208,6 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 	unsigned int i;
 
 	/* Prevent new requests being issued until we fix things up. */
-	spin_lock_irq(&info->io_lock);
 	info->connected = suspend ?
 		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
 	/* No more blkif_request(). */
@@ -1196,6 +1215,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 		blk_mq_stop_hw_queues(info->rq);
 
 	/* Remove all persistent grants */
+	spin_lock_irq(&info->dev_lock);
 	if (!list_empty(&info->grants)) {
 		list_for_each_entry_safe(persistent_gnt, n,
 					 &info->grants, node) {
@@ -1211,12 +1231,12 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 		}
 	}
 	BUG_ON(info->persistent_gnts_c != 0);
+	spin_unlock_irq(&info->dev_lock);
 
 	for (i = 0; i < info->nr_rings; i++)
 		blkif_free_ring(&info->rinfo[i]);
 	kfree(info->rinfo);
 	info->nr_rings = 0;
-	spin_unlock_irq(&info->io_lock);
 }
 
 struct copy_from_grant {
@@ -1251,6 +1271,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 	int i = 0;
 	struct scatterlist *sg;
 	int num_sg, num_grant;
+	unsigned long flags;
 	struct blkfront_info *info = rinfo->dev_info;
 	struct copy_from_grant data = {
 		.s = s,
@@ -1289,8 +1310,10 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 			if (!info->feature_persistent)
 				pr_alert_ratelimited("backed has not unmapped grant: %u\n",
 						     s->grants_used[i]->gref);
+			spin_lock_irqsave(&info->dev_lock, flags);
 			list_add(&s->grants_used[i]->node, &info->grants);
 			info->persistent_gnts_c++;
+			spin_unlock_irqrestore(&info->dev_lock, flags);
 		} else {
 			/*
 			 * If the grant is not mapped by the backend we end the
@@ -1300,7 +1323,9 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 			 */
 			gnttab_end_foreign_access(s->grants_used[i]->gref, 0, 0UL);
 			s->grants_used[i]->gref = GRANT_INVALID_REF;
+			spin_lock_irqsave(&info->dev_lock, flags);
 			list_add_tail(&s->grants_used[i]->node, &info->grants);
+			spin_unlock_irqrestore(&info->dev_lock, flags);
 		}
 	}
 	if (s->req.operation == BLKIF_OP_INDIRECT) {
@@ -1309,8 +1334,10 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 				if (!info->feature_persistent)
 					pr_alert_ratelimited("backed has not unmapped grant: %u\n",
 							     s->indirect_grants[i]->gref);
+				spin_lock_irqsave(&info->dev_lock, flags);
 				list_add(&s->indirect_grants[i]->node, &info->grants);
 				info->persistent_gnts_c++;
+				spin_unlock_irqrestore(&info->dev_lock, flags);
 			} else {
 				struct page *indirect_page;
 
@@ -1324,7 +1351,9 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 					list_add(&indirect_page->lru, &rinfo->indirect_pages);
 				}
 				s->indirect_grants[i]->gref = GRANT_INVALID_REF;
+				spin_lock_irqsave(&info->dev_lock, flags);
 				list_add_tail(&s->indirect_grants[i]->node, &info->grants);
+				spin_unlock_irqrestore(&info->dev_lock, flags);
 			}
 		}
 	}
@@ -1340,13 +1369,10 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	struct blkfront_info *info = rinfo->dev_info;
 	int error;
 
-	spin_lock_irqsave(&info->io_lock, flags);
-
-	if (unlikely(info->connected != BLKIF_STATE_CONNECTED)) {
-		spin_unlock_irqrestore(&info->io_lock, flags);
+	if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
 		return IRQ_HANDLED;
-	}
 
+	spin_lock_irqsave(&rinfo->ring_lock, flags);
  again:
 	rp = rinfo->ring.sring->rsp_prod;
 	rmb(); /* Ensure we see queued responses up to 'rp'. */
@@ -1437,9 +1463,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 	} else
 		rinfo->ring.sring->rsp_event = i + 1;
 
-	kick_pending_request_queues(rinfo);
+	kick_pending_request_queues_locked(rinfo);
 
-	spin_unlock_irqrestore(&info->io_lock, flags);
+	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
 
 	return IRQ_HANDLED;
 }
@@ -1686,14 +1712,14 @@ static int blkfront_probe(struct xenbus_device *dev,
 		INIT_LIST_HEAD(&rinfo->indirect_pages);
 		rinfo->dev_info = info;
 		INIT_WORK(&rinfo->work, blkif_restart_queue);
+		spin_lock_init(&rinfo->ring_lock);
 	}
 
 	mutex_init(&info->mutex);
-	spin_lock_init(&info->io_lock);
+	spin_lock_init(&info->dev_lock);
 	info->xbdev = dev;
 	info->vdevice = vdevice;
 	INIT_LIST_HEAD(&info->grants);
-	info->persistent_gnts_c = 0;
 	info->connected = BLKIF_STATE_DISCONNECTED;
 
 	/* Front end dir is a number, which is used as the id. */
@@ -1786,8 +1812,6 @@ static int blkif_recover(struct blkfront_info *info)
 	}
 	xenbus_switch_state(info->xbdev, XenbusStateConnected);
 
-	spin_lock_irq(&info->io_lock);
-
 	/* Now safe for us to use the shared ring */
 	info->connected = BLKIF_STATE_CONNECTED;
 
@@ -1805,7 +1829,6 @@ static int blkif_recover(struct blkfront_info *info)
 		BUG_ON(req->nr_phys_segments > segs);
 		blk_mq_requeue_request(req);
 	}
-	spin_unlock_irq(&info->io_lock);
 	blk_mq_kick_requeue_list(info->rq);
 
 	while ((bio = bio_list_pop(&bio_list)) != NULL) {
@@ -2154,11 +2177,9 @@ static void blkfront_connect(struct blkfront_info *info)
 	xenbus_switch_state(info->xbdev, XenbusStateConnected);
 
 	/* Kick pending requests. */
-	spin_lock_irq(&info->io_lock);
 	info->connected = BLKIF_STATE_CONNECTED;
 	for (i = 0; i < info->nr_rings; i++)
 		kick_pending_request_queues(&info->rinfo[i]);
-	spin_unlock_irq(&info->io_lock);
 
 	add_disk(info->gd);
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 05/10] xen/blkfront: negotiate number of queues/rings to be used with backend
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
                   ` (5 preceding siblings ...)
  2015-11-14  3:12 ` Bob Liu
@ 2015-11-14  3:12 ` Bob Liu
  2015-11-16 21:27   ` Konrad Rzeszutek Wilk
  2015-11-14  3:12 ` Bob Liu
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, roger.pau, konrad.wilk, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel, Bob Liu

The maximum number of hardware queues for xen/blkfront is set by the module
parameter 'max_queues' (default 4), and is further capped by the maximum value
that xen/blkback exposes through the XenStore key 'multi-queue-max-queues'.

The negotiated number is the smaller of the two and is written back to XenStore
as "multi-queue-num-queues"; blkback needs to read this negotiated number.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
v2:
 * Make 'i' be an unsigned int.
 * Other comments from Konrad.
---
 drivers/block/xen-blkfront.c |  160 +++++++++++++++++++++++++++++++-----------
 1 file changed, 119 insertions(+), 41 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 56c9ec6..84496be 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -99,6 +99,10 @@ static unsigned int xen_blkif_max_segments = 32;
 module_param_named(max, xen_blkif_max_segments, int, S_IRUGO);
 MODULE_PARM_DESC(max, "Maximum amount of segments in indirect requests (default is 32)");
 
+static unsigned int xen_blkif_max_queues = 4;
+module_param_named(max_queues, xen_blkif_max_queues, uint, S_IRUGO);
+MODULE_PARM_DESC(max_queues, "Maximum number of hardware queues/rings used per virtual disk");
+
 /*
  * Maximum order of pages to be used for the shared ring between front and
  * backend, 4KB page granularity is used.
@@ -118,6 +122,10 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
  * characters are enough. Define to 20 to keep consist with backend.
  */
 #define RINGREF_NAME_LEN (20)
+/*
+ * queue-%u would take 7 + 10(UINT_MAX) = 17 characters
+ */
+#define QUEUE_NAME_LEN (17)
 
 /*
  *  Per-ring info.
@@ -823,7 +831,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 
 	memset(&info->tag_set, 0, sizeof(info->tag_set));
 	info->tag_set.ops = &blkfront_mq_ops;
-	info->tag_set.nr_hw_queues = 1;
+	info->tag_set.nr_hw_queues = info->nr_rings;
 	info->tag_set.queue_depth =  BLK_RING_SIZE(info);
 	info->tag_set.numa_node = NUMA_NO_NODE;
 	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
@@ -1520,6 +1528,53 @@ fail:
 	return err;
 }
 
+/*
+ * Write out per-ring/queue nodes including ring-ref and event-channel, and each
+ * ring buffer may have multi pages depending on ->nr_ring_pages.
+ */
+static int write_per_ring_nodes(struct xenbus_transaction xbt,
+				struct blkfront_ring_info *rinfo, const char *dir)
+{
+	int err;
+	unsigned int i;
+	const char *message = NULL;
+	struct blkfront_info *info = rinfo->dev_info;
+
+	if (info->nr_ring_pages == 1) {
+		err = xenbus_printf(xbt, dir, "ring-ref", "%u", rinfo->ring_ref[0]);
+		if (err) {
+			message = "writing ring-ref";
+			goto abort_transaction;
+		}
+	} else {
+		for (i = 0; i < info->nr_ring_pages; i++) {
+			char ring_ref_name[RINGREF_NAME_LEN];
+
+			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
+			err = xenbus_printf(xbt, dir, ring_ref_name,
+					    "%u", rinfo->ring_ref[i]);
+			if (err) {
+				message = "writing ring-ref";
+				goto abort_transaction;
+			}
+		}
+	}
+
+	err = xenbus_printf(xbt, dir, "event-channel", "%u", rinfo->evtchn);
+	if (err) {
+		message = "writing event-channel";
+		goto abort_transaction;
+	}
+
+	return 0;
+
+abort_transaction:
+	xenbus_transaction_end(xbt, 1);
+	if (message)
+		xenbus_dev_fatal(info->xbdev, err, "%s", message);
+
+	return err;
+}
 
 /* Common code used when first setting up, and when resuming. */
 static int talk_to_blkback(struct xenbus_device *dev,
@@ -1527,10 +1582,9 @@ static int talk_to_blkback(struct xenbus_device *dev,
 {
 	const char *message = NULL;
 	struct xenbus_transaction xbt;
-	int err, i;
-	unsigned int max_page_order = 0;
+	int err;
+	unsigned int i, max_page_order = 0;
 	unsigned int ring_page_order = 0;
-	struct blkfront_ring_info *rinfo;
 
 	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
 			   "max-ring-page-order", "%u", &max_page_order);
@@ -1542,7 +1596,8 @@ static int talk_to_blkback(struct xenbus_device *dev,
 	}
 
 	for (i = 0; i < info->nr_rings; i++) {
-		rinfo = &info->rinfo[i];
+		struct blkfront_ring_info *rinfo = &info->rinfo[i];
+
 		/* Create shared ring, alloc event channel. */
 		err = setup_blkring(dev, rinfo);
 		if (err)
@@ -1556,44 +1611,49 @@ again:
 		goto destroy_blkring;
 	}
 
-	if (info->nr_rings == 1) {
-		rinfo = &info->rinfo[0];
-		if (info->nr_ring_pages == 1) {
-			err = xenbus_printf(xbt, dev->nodename,
-					    "ring-ref", "%u", rinfo->ring_ref[0]);
-			if (err) {
-				message = "writing ring-ref";
-				goto abort_transaction;
-			}
-		} else {
-			err = xenbus_printf(xbt, dev->nodename,
-					    "ring-page-order", "%u", ring_page_order);
-			if (err) {
-				message = "writing ring-page-order";
-				goto abort_transaction;
-			}
+	if (info->nr_ring_pages > 1) {
+		err = xenbus_printf(xbt, dev->nodename, "ring-page-order", "%u",
+				    ring_page_order);
+		if (err) {
+			message = "writing ring-page-order";
+			goto abort_transaction;
+		}
+	}
 
-			for (i = 0; i < info->nr_ring_pages; i++) {
-				char ring_ref_name[RINGREF_NAME_LEN];
+	/* We already got the number of queues/rings in _probe */
+	if (info->nr_rings == 1) {
+		err = write_per_ring_nodes(xbt, &info->rinfo[0], dev->nodename);
+		if (err)
+			goto destroy_blkring;
+	} else {
+		char *path;
+		size_t pathsize;
 
-				snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
-				err = xenbus_printf(xbt, dev->nodename, ring_ref_name,
-						    "%u", rinfo->ring_ref[i]);
-				if (err) {
-					message = "writing ring-ref";
-					goto abort_transaction;
-				}
-			}
-		}
-		err = xenbus_printf(xbt, dev->nodename,
-				    "event-channel", "%u", rinfo->evtchn);
+		err = xenbus_printf(xbt, dev->nodename, "multi-queue-num-queues", "%u",
+				    info->nr_rings);
 		if (err) {
-			message = "writing event-channel";
+			message = "writing multi-queue-num-queues";
 			goto abort_transaction;
 		}
-	} else {
-		/* Not supported at this stage. */
-		goto abort_transaction;
+
+		pathsize = strlen(dev->nodename) + QUEUE_NAME_LEN;
+		path = kmalloc(pathsize, GFP_KERNEL);
+		if (!path) {
+			err = -ENOMEM;
+			message = "ENOMEM while writing ring references";
+			goto abort_transaction;
+		}
+
+		for (i = 0; i < info->nr_rings; i++) {
+			memset(path, 0, pathsize);
+			snprintf(path, pathsize, "%s/queue-%u", dev->nodename, i);
+			err = write_per_ring_nodes(xbt, &info->rinfo[i], path);
+			if (err) {
+				kfree(path);
+				goto destroy_blkring;
+			}
+		}
+		kfree(path);
 	}
 	err = xenbus_printf(xbt, dev->nodename, "protocol", "%s",
 			    XEN_IO_PROTO_ABI_NATIVE);
@@ -1617,7 +1677,7 @@ again:
 
 	for (i = 0; i < info->nr_rings; i++) {
 		int j;
-		rinfo = &info->rinfo[i];
+		struct blkfront_ring_info *rinfo = &info->rinfo[i];
 
 		for (j = 0; j < BLK_RING_SIZE(info); j++)
 			rinfo->shadow[j].req.u.rw.id = j + 1;
@@ -1648,6 +1708,7 @@ static int blkfront_probe(struct xenbus_device *dev,
 {
 	int err, vdevice, r_index;
 	struct blkfront_info *info;
+	unsigned int backend_max_queues = 0;
 
 	/* FIXME: Use dynamic device id if this is not set. */
 	err = xenbus_scanf(XBT_NIL, dev->nodename,
@@ -1697,7 +1758,18 @@ static int blkfront_probe(struct xenbus_device *dev,
 		return -ENOMEM;
 	}
 
-	info->nr_rings = 1;
+	info->xbdev = dev;
+	/* Check if backend supports multiple queues. */
+	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+			   "multi-queue-max-queues", "%u", &backend_max_queues);
+	if (err < 0)
+		backend_max_queues = 1;
+
+	info->nr_rings = min(backend_max_queues, xen_blkif_max_queues);
+	/* We need at least one ring. */
+	if (!info->nr_rings)
+		info->nr_rings = 1;
+
 	info->rinfo = kzalloc(sizeof(struct blkfront_ring_info) * info->nr_rings, GFP_KERNEL);
 	if (!info->rinfo) {
 		xenbus_dev_fatal(dev, -ENOMEM, "allocating ring_info structure");
@@ -1717,7 +1789,6 @@ static int blkfront_probe(struct xenbus_device *dev,
 
 	mutex_init(&info->mutex);
 	spin_lock_init(&info->dev_lock);
-	info->xbdev = dev;
 	info->vdevice = vdevice;
 	INIT_LIST_HEAD(&info->grants);
 	info->connected = BLKIF_STATE_DISCONNECTED;
@@ -2386,6 +2457,7 @@ static struct xenbus_driver blkfront_driver = {
 static int __init xlblk_init(void)
 {
 	int ret;
+	int nr_cpus = num_online_cpus();
 
 	if (!xen_domain())
 		return -ENODEV;
@@ -2396,6 +2468,12 @@ static int __init xlblk_init(void)
 		xen_blkif_max_ring_order = 0;
 	}
 
+	if (xen_blkif_max_queues > nr_cpus) {
+		pr_info("Invalid max_queues (%d), will use default max: %d.\n",
+			xen_blkif_max_queues, nr_cpus);
+		xen_blkif_max_queues = nr_cpus;
+	}
+
 	if (!xen_has_pv_disk_devices())
 		return -ENODEV;
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 06/10] xen/blkback: separate ring information out of struct xen_blkif
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
@ 2015-11-14  3:12   ` Bob Liu
  2015-11-14  3:12 ` Bob Liu
                     ` (21 subsequent siblings)
  22 siblings, 0 replies; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, roger.pau, konrad.wilk, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel, Bob Liu

Split per-ring information out into a new structure "xen_blkif_ring", so that
one vbd device can be associated with one or more rings/hardware queues.

Introduce 'pers_gnts_lock' to protect the pool of persistent grants, since
there may now be multiple backend threads.

This patch is preparation for supporting multiple hardware queues/rings.
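
A trimmed outline of the split (only a few representative fields are shown
here, with types inferred from their use; the full definitions are in
common.h in this patch):

	/* Per-ring state: one instance per hardware queue/ring. */
	struct xen_blkif_ring {
		wait_queue_head_t	wq;		/* wakes this ring's kthread */
		struct task_struct	*xenblkd;	/* one backend thread per ring */
		struct list_head	pending_free;	/* free pending_req pool */
		spinlock_t		pending_free_lock;
		wait_queue_head_t	pending_free_wq;
		atomic_t		inflight;
		struct xen_blkif	*blkif;		/* back-pointer to the device */
	};

	/* Per-device state: shared by every ring of the vbd. */
	struct xen_blkif {
		struct xen_vbd		vbd;
		struct rb_root		persistent_gnts;
		unsigned int		persistent_gnt_c;
		spinlock_t		pers_gnts_lock;	/* introduced by this patch */
	};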

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
v2:
 * Have an BUG_ON on the holding of the pers_gnts_lock.
---
 drivers/block/xen-blkback/blkback.c |  235 ++++++++++++++++++++---------------
 drivers/block/xen-blkback/common.h  |   54 ++++----
 drivers/block/xen-blkback/xenbus.c  |   96 +++++++-------
 3 files changed, 214 insertions(+), 171 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index f909994..fb5bfd4 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -173,11 +173,11 @@ static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
 
 #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
 
-static int do_block_io_op(struct xen_blkif *blkif);
-static int dispatch_rw_block_io(struct xen_blkif *blkif,
+static int do_block_io_op(struct xen_blkif_ring *ring);
+static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 				struct blkif_request *req,
 				struct pending_req *pending_req);
-static void make_response(struct xen_blkif *blkif, u64 id,
+static void make_response(struct xen_blkif_ring *ring, u64 id,
 			  unsigned short op, int st);
 
 #define foreach_grant_safe(pos, n, rbtree, node) \
@@ -189,14 +189,8 @@ static void make_response(struct xen_blkif *blkif, u64 id,
 
 
 /*
- * We don't need locking around the persistent grant helpers
- * because blkback uses a single-thread for each backed, so we
- * can be sure that this functions will never be called recursively.
- *
- * The only exception to that is put_persistent_grant, that can be called
- * from interrupt context (by xen_blkbk_unmap), so we have to use atomic
- * bit operations to modify the flags of a persistent grant and to count
- * the number of used grants.
+ * pers_gnts_lock must be used around all the persistent grant helpers
+ * because blkback may use multi-thread/queue for each backend.
  */
 static int add_persistent_gnt(struct xen_blkif *blkif,
 			       struct persistent_gnt *persistent_gnt)
@@ -204,6 +198,7 @@ static int add_persistent_gnt(struct xen_blkif *blkif,
 	struct rb_node **new = NULL, *parent = NULL;
 	struct persistent_gnt *this;
 
+	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
 	if (blkif->persistent_gnt_c >= xen_blkif_max_pgrants) {
 		if (!blkif->vbd.overflow_max_grants)
 			blkif->vbd.overflow_max_grants = 1;
@@ -241,6 +236,7 @@ static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
 	struct persistent_gnt *data;
 	struct rb_node *node = NULL;
 
+	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
 	node = blkif->persistent_gnts.rb_node;
 	while (node) {
 		data = container_of(node, struct persistent_gnt, node);
@@ -265,6 +261,7 @@ static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
 static void put_persistent_gnt(struct xen_blkif *blkif,
                                struct persistent_gnt *persistent_gnt)
 {
+	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
 	if(!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
 		pr_alert_ratelimited("freeing a grant already unused\n");
 	set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
@@ -286,6 +283,7 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
 	unmap_data.unmap_ops = unmap;
 	unmap_data.kunmap_ops = NULL;
 
+	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
 	foreach_grant_safe(persistent_gnt, n, root, node) {
 		BUG_ON(persistent_gnt->handle ==
 			BLKBACK_INVALID_HANDLE);
@@ -322,11 +320,13 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
 	int segs_to_unmap = 0;
 	struct xen_blkif *blkif = container_of(work, typeof(*blkif), persistent_purge_work);
 	struct gntab_unmap_queue_data unmap_data;
+	unsigned long flags;
 
 	unmap_data.pages = pages;
 	unmap_data.unmap_ops = unmap;
 	unmap_data.kunmap_ops = NULL;
 
+	spin_lock_irqsave(&blkif->pers_gnts_lock, flags);
 	while(!list_empty(&blkif->persistent_purge_list)) {
 		persistent_gnt = list_first_entry(&blkif->persistent_purge_list,
 		                                  struct persistent_gnt,
@@ -348,6 +348,7 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
 		}
 		kfree(persistent_gnt);
 	}
+	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
 	if (segs_to_unmap > 0) {
 		unmap_data.count = segs_to_unmap;
 		BUG_ON(gnttab_unmap_refs_sync(&unmap_data));
@@ -362,16 +363,18 @@ static void purge_persistent_gnt(struct xen_blkif *blkif)
 	unsigned int num_clean, total;
 	bool scan_used = false, clean_used = false;
 	struct rb_root *root;
+	unsigned long flags;
 
+	spin_lock_irqsave(&blkif->pers_gnts_lock, flags);
 	if (blkif->persistent_gnt_c < xen_blkif_max_pgrants ||
 	    (blkif->persistent_gnt_c == xen_blkif_max_pgrants &&
 	    !blkif->vbd.overflow_max_grants)) {
-		return;
+		goto out;
 	}
 
 	if (work_busy(&blkif->persistent_purge_work)) {
 		pr_alert_ratelimited("Scheduled work from previous purge is still busy, cannot purge list\n");
-		return;
+		goto out;
 	}
 
 	num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
@@ -379,7 +382,7 @@ static void purge_persistent_gnt(struct xen_blkif *blkif)
 	num_clean = min(blkif->persistent_gnt_c, num_clean);
 	if ((num_clean == 0) ||
 	    (num_clean > (blkif->persistent_gnt_c - atomic_read(&blkif->persistent_gnt_in_use))))
-		return;
+		goto out;
 
 	/*
 	 * At this point, we can assure that there will be no calls
@@ -436,29 +439,35 @@ finished:
 	}
 
 	blkif->persistent_gnt_c -= (total - num_clean);
+	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
 	blkif->vbd.overflow_max_grants = 0;
 
 	/* We can defer this work */
 	schedule_work(&blkif->persistent_purge_work);
 	pr_debug("Purged %u/%u\n", (total - num_clean), total);
 	return;
+
+out:
+	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
+
+	return;
 }
 
 /*
  * Retrieve from the 'pending_reqs' a free pending_req structure to be used.
  */
-static struct pending_req *alloc_req(struct xen_blkif *blkif)
+static struct pending_req *alloc_req(struct xen_blkif_ring *ring)
 {
 	struct pending_req *req = NULL;
 	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->pending_free_lock, flags);
-	if (!list_empty(&blkif->pending_free)) {
-		req = list_entry(blkif->pending_free.next, struct pending_req,
+	spin_lock_irqsave(&ring->pending_free_lock, flags);
+	if (!list_empty(&ring->pending_free)) {
+		req = list_entry(ring->pending_free.next, struct pending_req,
 				 free_list);
 		list_del(&req->free_list);
 	}
-	spin_unlock_irqrestore(&blkif->pending_free_lock, flags);
+	spin_unlock_irqrestore(&ring->pending_free_lock, flags);
 	return req;
 }
 
@@ -466,17 +475,17 @@ static struct pending_req *alloc_req(struct xen_blkif *blkif)
  * Return the 'pending_req' structure back to the freepool. We also
  * wake up the thread if it was waiting for a free page.
  */
-static void free_req(struct xen_blkif *blkif, struct pending_req *req)
+static void free_req(struct xen_blkif_ring *ring, struct pending_req *req)
 {
 	unsigned long flags;
 	int was_empty;
 
-	spin_lock_irqsave(&blkif->pending_free_lock, flags);
-	was_empty = list_empty(&blkif->pending_free);
-	list_add(&req->free_list, &blkif->pending_free);
-	spin_unlock_irqrestore(&blkif->pending_free_lock, flags);
+	spin_lock_irqsave(&ring->pending_free_lock, flags);
+	was_empty = list_empty(&ring->pending_free);
+	list_add(&req->free_list, &ring->pending_free);
+	spin_unlock_irqrestore(&ring->pending_free_lock, flags);
 	if (was_empty)
-		wake_up(&blkif->pending_free_wq);
+		wake_up(&ring->pending_free_wq);
 }
 
 /*
@@ -556,10 +565,10 @@ abort:
 /*
  * Notification from the guest OS.
  */
-static void blkif_notify_work(struct xen_blkif *blkif)
+static void blkif_notify_work(struct xen_blkif_ring *ring)
 {
-	blkif->waiting_reqs = 1;
-	wake_up(&blkif->wq);
+	ring->waiting_reqs = 1;
+	wake_up(&ring->wq);
 }
 
 irqreturn_t xen_blkif_be_int(int irq, void *dev_id)
@@ -590,7 +599,8 @@ static void print_stats(struct xen_blkif *blkif)
 
 int xen_blkif_schedule(void *arg)
 {
-	struct xen_blkif *blkif = arg;
+	struct xen_blkif_ring *ring = arg;
+	struct xen_blkif *blkif = ring->blkif;
 	struct xen_vbd *vbd = &blkif->vbd;
 	unsigned long timeout;
 	int ret;
@@ -606,27 +616,27 @@ int xen_blkif_schedule(void *arg)
 		timeout = msecs_to_jiffies(LRU_INTERVAL);
 
 		timeout = wait_event_interruptible_timeout(
-			blkif->wq,
-			blkif->waiting_reqs || kthread_should_stop(),
+			ring->wq,
+			ring->waiting_reqs || kthread_should_stop(),
 			timeout);
 		if (timeout == 0)
 			goto purge_gnt_list;
 		timeout = wait_event_interruptible_timeout(
-			blkif->pending_free_wq,
-			!list_empty(&blkif->pending_free) ||
+			ring->pending_free_wq,
+			!list_empty(&ring->pending_free) ||
 			kthread_should_stop(),
 			timeout);
 		if (timeout == 0)
 			goto purge_gnt_list;
 
-		blkif->waiting_reqs = 0;
+		ring->waiting_reqs = 0;
 		smp_mb(); /* clear flag *before* checking for work */
 
-		ret = do_block_io_op(blkif);
+		ret = do_block_io_op(ring);
 		if (ret > 0)
-			blkif->waiting_reqs = 1;
+			ring->waiting_reqs = 1;
 		if (ret == -EACCES)
-			wait_event_interruptible(blkif->shutdown_wq,
+			wait_event_interruptible(ring->shutdown_wq,
 						 kthread_should_stop());
 
 purge_gnt_list:
@@ -649,7 +659,7 @@ purge_gnt_list:
 	if (log_stats)
 		print_stats(blkif);
 
-	blkif->xenblkd = NULL;
+	ring->xenblkd = NULL;
 	xen_blkif_put(blkif);
 
 	return 0;
@@ -658,32 +668,40 @@ purge_gnt_list:
 /*
  * Remove persistent grants and empty the pool of free pages
  */
-void xen_blkbk_free_caches(struct xen_blkif *blkif)
+void xen_blkbk_free_caches(struct xen_blkif_ring *ring)
 {
+	struct xen_blkif *blkif = ring->blkif;
+	unsigned long flags;
+
 	/* Free all persistent grant pages */
+	spin_lock_irqsave(&blkif->pers_gnts_lock, flags);
 	if (!RB_EMPTY_ROOT(&blkif->persistent_gnts))
 		free_persistent_gnts(blkif, &blkif->persistent_gnts,
 			blkif->persistent_gnt_c);
 
 	BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
 	blkif->persistent_gnt_c = 0;
+	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
 
 	/* Since we are shutting down remove all pages from the buffer */
 	shrink_free_pagepool(blkif, 0 /* All */);
 }
 
 static unsigned int xen_blkbk_unmap_prepare(
-	struct xen_blkif *blkif,
+	struct xen_blkif_ring *ring,
 	struct grant_page **pages,
 	unsigned int num,
 	struct gnttab_unmap_grant_ref *unmap_ops,
 	struct page **unmap_pages)
 {
 	unsigned int i, invcount = 0;
+	unsigned long flags;
 
 	for (i = 0; i < num; i++) {
 		if (pages[i]->persistent_gnt != NULL) {
-			put_persistent_gnt(blkif, pages[i]->persistent_gnt);
+			spin_lock_irqsave(&ring->blkif->pers_gnts_lock, flags);
+			put_persistent_gnt(ring->blkif, pages[i]->persistent_gnt);
+			spin_unlock_irqrestore(&ring->blkif->pers_gnts_lock, flags);
 			continue;
 		}
 		if (pages[i]->handle == BLKBACK_INVALID_HANDLE)
@@ -700,17 +718,18 @@ static unsigned int xen_blkbk_unmap_prepare(
 
 static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_queue_data *data)
 {
-	struct pending_req* pending_req = (struct pending_req*) (data->data);
-	struct xen_blkif *blkif = pending_req->blkif;
+	struct pending_req *pending_req = (struct pending_req *)(data->data);
+	struct xen_blkif_ring *ring = pending_req->ring;
+	struct xen_blkif *blkif = ring->blkif;
 
 	/* BUG_ON used to reproduce existing behaviour,
 	   but is this the best way to deal with this? */
 	BUG_ON(result);
 
 	put_free_pages(blkif, data->pages, data->count);
-	make_response(blkif, pending_req->id,
+	make_response(ring, pending_req->id,
 		      pending_req->operation, pending_req->status);
-	free_req(blkif, pending_req);
+	free_req(ring, pending_req);
 	/*
 	 * Make sure the request is freed before releasing blkif,
 	 * or there could be a race between free_req and the
@@ -723,7 +742,7 @@ static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_
 	 * pending_free_wq if there's a drain going on, but it has
 	 * to be taken into account if the current model is changed.
 	 */
-	if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
+	if (atomic_dec_and_test(&ring->inflight) && atomic_read(&blkif->drain)) {
 		complete(&blkif->drain_complete);
 	}
 	xen_blkif_put(blkif);
@@ -732,11 +751,11 @@ static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_
 static void xen_blkbk_unmap_and_respond(struct pending_req *req)
 {
 	struct gntab_unmap_queue_data* work = &req->gnttab_unmap_data;
-	struct xen_blkif *blkif = req->blkif;
+	struct xen_blkif_ring *ring = req->ring;
 	struct grant_page **pages = req->segments;
 	unsigned int invcount;
 
-	invcount = xen_blkbk_unmap_prepare(blkif, pages, req->nr_segs,
+	invcount = xen_blkbk_unmap_prepare(ring, pages, req->nr_segs,
 					   req->unmap, req->unmap_pages);
 
 	work->data = req;
@@ -757,7 +776,7 @@ static void xen_blkbk_unmap_and_respond(struct pending_req *req)
  * of hypercalls, but since this is only used in error paths there's
  * no real need.
  */
-static void xen_blkbk_unmap(struct xen_blkif *blkif,
+static void xen_blkbk_unmap(struct xen_blkif_ring *ring,
                             struct grant_page *pages[],
                             int num)
 {
@@ -768,20 +787,20 @@ static void xen_blkbk_unmap(struct xen_blkif *blkif,
 
 	while (num) {
 		unsigned int batch = min(num, BLKIF_MAX_SEGMENTS_PER_REQUEST);
-		
-		invcount = xen_blkbk_unmap_prepare(blkif, pages, batch,
+
+		invcount = xen_blkbk_unmap_prepare(ring, pages, batch,
 						   unmap, unmap_pages);
 		if (invcount) {
 			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
 			BUG_ON(ret);
-			put_free_pages(blkif, unmap_pages, invcount);
+			put_free_pages(ring->blkif, unmap_pages, invcount);
 		}
 		pages += batch;
 		num -= batch;
 	}
 }
 
-static int xen_blkbk_map(struct xen_blkif *blkif,
+static int xen_blkbk_map(struct xen_blkif_ring *ring,
 			 struct grant_page *pages[],
 			 int num, bool ro)
 {
@@ -794,6 +813,8 @@ static int xen_blkbk_map(struct xen_blkif *blkif,
 	int ret = 0;
 	int last_map = 0, map_until = 0;
 	int use_persistent_gnts;
+	struct xen_blkif *blkif = ring->blkif;
+	unsigned long irq_flags;
 
 	use_persistent_gnts = (blkif->vbd.feature_gnt_persistent);
 
@@ -806,10 +827,13 @@ again:
 	for (i = map_until; i < num; i++) {
 		uint32_t flags;
 
-		if (use_persistent_gnts)
+		if (use_persistent_gnts) {
+			spin_lock_irqsave(&blkif->pers_gnts_lock, irq_flags);
 			persistent_gnt = get_persistent_gnt(
 				blkif,
 				pages[i]->gref);
+			spin_unlock_irqrestore(&blkif->pers_gnts_lock, irq_flags);
+		}
 
 		if (persistent_gnt) {
 			/*
@@ -880,8 +904,10 @@ again:
 			persistent_gnt->gnt = map[new_map_idx].ref;
 			persistent_gnt->handle = map[new_map_idx].handle;
 			persistent_gnt->page = pages[seg_idx]->page;
+			spin_lock_irqsave(&blkif->pers_gnts_lock, irq_flags);
 			if (add_persistent_gnt(blkif,
 			                       persistent_gnt)) {
+				spin_unlock_irqrestore(&blkif->pers_gnts_lock, irq_flags);
 				kfree(persistent_gnt);
 				persistent_gnt = NULL;
 				goto next;
@@ -890,6 +916,7 @@ again:
 			pr_debug("grant %u added to the tree of persistent grants, using %u/%u\n",
 				 persistent_gnt->gnt, blkif->persistent_gnt_c,
 				 xen_blkif_max_pgrants);
+			spin_unlock_irqrestore(&blkif->pers_gnts_lock, irq_flags);
 			goto next;
 		}
 		if (use_persistent_gnts && !blkif->vbd.overflow_max_grants) {
@@ -921,7 +948,7 @@ static int xen_blkbk_map_seg(struct pending_req *pending_req)
 {
 	int rc;
 
-	rc = xen_blkbk_map(pending_req->blkif, pending_req->segments,
+	rc = xen_blkbk_map(pending_req->ring, pending_req->segments,
 			   pending_req->nr_segs,
 	                   (pending_req->operation != BLKIF_OP_READ));
 
@@ -934,7 +961,7 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
 				    struct phys_req *preq)
 {
 	struct grant_page **pages = pending_req->indirect_pages;
-	struct xen_blkif *blkif = pending_req->blkif;
+	struct xen_blkif_ring *ring = pending_req->ring;
 	int indirect_grefs, rc, n, nseg, i;
 	struct blkif_request_segment *segments = NULL;
 
@@ -945,7 +972,7 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
 	for (i = 0; i < indirect_grefs; i++)
 		pages[i]->gref = req->u.indirect.indirect_grefs[i];
 
-	rc = xen_blkbk_map(blkif, pages, indirect_grefs, true);
+	rc = xen_blkbk_map(ring, pages, indirect_grefs, true);
 	if (rc)
 		goto unmap;
 
@@ -972,15 +999,16 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
 unmap:
 	if (segments)
 		kunmap_atomic(segments);
-	xen_blkbk_unmap(blkif, pages, indirect_grefs);
+	xen_blkbk_unmap(ring, pages, indirect_grefs);
 	return rc;
 }
 
-static int dispatch_discard_io(struct xen_blkif *blkif,
+static int dispatch_discard_io(struct xen_blkif_ring *ring,
 				struct blkif_request *req)
 {
 	int err = 0;
 	int status = BLKIF_RSP_OKAY;
+	struct xen_blkif *blkif = ring->blkif;
 	struct block_device *bdev = blkif->vbd.bdev;
 	unsigned long secure;
 	struct phys_req preq;
@@ -1013,26 +1041,28 @@ fail_response:
 	} else if (err)
 		status = BLKIF_RSP_ERROR;
 
-	make_response(blkif, req->u.discard.id, req->operation, status);
+	make_response(ring, req->u.discard.id, req->operation, status);
 	xen_blkif_put(blkif);
 	return err;
 }
 
-static int dispatch_other_io(struct xen_blkif *blkif,
+static int dispatch_other_io(struct xen_blkif_ring *ring,
 			     struct blkif_request *req,
 			     struct pending_req *pending_req)
 {
-	free_req(blkif, pending_req);
-	make_response(blkif, req->u.other.id, req->operation,
+	free_req(ring, pending_req);
+	make_response(ring, req->u.other.id, req->operation,
 		      BLKIF_RSP_EOPNOTSUPP);
 	return -EIO;
 }
 
-static void xen_blk_drain_io(struct xen_blkif *blkif)
+static void xen_blk_drain_io(struct xen_blkif_ring *ring)
 {
+	struct xen_blkif *blkif = ring->blkif;
+
 	atomic_set(&blkif->drain, 1);
 	do {
-		if (atomic_read(&blkif->inflight) == 0)
+		if (atomic_read(&ring->inflight) == 0)
 			break;
 		wait_for_completion_interruptible_timeout(
 				&blkif->drain_complete, HZ);
@@ -1053,12 +1083,12 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
 	if ((pending_req->operation == BLKIF_OP_FLUSH_DISKCACHE) &&
 	    (error == -EOPNOTSUPP)) {
 		pr_debug("flush diskcache op failed, not supported\n");
-		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->blkif->be, 0);
+		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->ring->blkif->be, 0);
 		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
 	} else if ((pending_req->operation == BLKIF_OP_WRITE_BARRIER) &&
 		    (error == -EOPNOTSUPP)) {
 		pr_debug("write barrier op failed, not supported\n");
-		xen_blkbk_barrier(XBT_NIL, pending_req->blkif->be, 0);
+		xen_blkbk_barrier(XBT_NIL, pending_req->ring->blkif->be, 0);
 		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
 	} else if (error) {
 		pr_debug("Buffer not up-to-date at end of operation,"
@@ -1092,9 +1122,9 @@ static void end_block_io_op(struct bio *bio)
  * and transmute  it to the block API to hand it over to the proper block disk.
  */
 static int
-__do_block_io_op(struct xen_blkif *blkif)
+__do_block_io_op(struct xen_blkif_ring *ring)
 {
-	union blkif_back_rings *blk_rings = &blkif->blk_rings;
+	union blkif_back_rings *blk_rings = &ring->blk_rings;
 	struct blkif_request req;
 	struct pending_req *pending_req;
 	RING_IDX rc, rp;
@@ -1107,7 +1137,7 @@ __do_block_io_op(struct xen_blkif *blkif)
 	if (RING_REQUEST_PROD_OVERFLOW(&blk_rings->common, rp)) {
 		rc = blk_rings->common.rsp_prod_pvt;
 		pr_warn("Frontend provided bogus ring requests (%d - %d = %d). Halting ring processing on dev=%04x\n",
-			rp, rc, rp - rc, blkif->vbd.pdevice);
+			rp, rc, rp - rc, ring->blkif->vbd.pdevice);
 		return -EACCES;
 	}
 	while (rc != rp) {
@@ -1120,14 +1150,14 @@ __do_block_io_op(struct xen_blkif *blkif)
 			break;
 		}
 
-		pending_req = alloc_req(blkif);
+		pending_req = alloc_req(ring);
 		if (NULL == pending_req) {
-			blkif->st_oo_req++;
+			ring->blkif->st_oo_req++;
 			more_to_do = 1;
 			break;
 		}
 
-		switch (blkif->blk_protocol) {
+		switch (ring->blkif->blk_protocol) {
 		case BLKIF_PROTOCOL_NATIVE:
 			memcpy(&req, RING_GET_REQUEST(&blk_rings->native, rc), sizeof(req));
 			break;
@@ -1151,16 +1181,16 @@ __do_block_io_op(struct xen_blkif *blkif)
 		case BLKIF_OP_WRITE_BARRIER:
 		case BLKIF_OP_FLUSH_DISKCACHE:
 		case BLKIF_OP_INDIRECT:
-			if (dispatch_rw_block_io(blkif, &req, pending_req))
+			if (dispatch_rw_block_io(ring, &req, pending_req))
 				goto done;
 			break;
 		case BLKIF_OP_DISCARD:
-			free_req(blkif, pending_req);
-			if (dispatch_discard_io(blkif, &req))
+			free_req(ring, pending_req);
+			if (dispatch_discard_io(ring, &req))
 				goto done;
 			break;
 		default:
-			if (dispatch_other_io(blkif, &req, pending_req))
+			if (dispatch_other_io(ring, &req, pending_req))
 				goto done;
 			break;
 		}
@@ -1173,13 +1203,13 @@ done:
 }
 
 static int
-do_block_io_op(struct xen_blkif *blkif)
+do_block_io_op(struct xen_blkif_ring *ring)
 {
-	union blkif_back_rings *blk_rings = &blkif->blk_rings;
+	union blkif_back_rings *blk_rings = &ring->blk_rings;
 	int more_to_do;
 
 	do {
-		more_to_do = __do_block_io_op(blkif);
+		more_to_do = __do_block_io_op(ring);
 		if (more_to_do)
 			break;
 
@@ -1192,7 +1222,7 @@ do_block_io_op(struct xen_blkif *blkif)
  * Transmutation of the 'struct blkif_request' to a proper 'struct bio'
  * and call the 'submit_bio' to pass it to the underlying storage.
  */
-static int dispatch_rw_block_io(struct xen_blkif *blkif,
+static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 				struct blkif_request *req,
 				struct pending_req *pending_req)
 {
@@ -1220,17 +1250,17 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
 	switch (req_operation) {
 	case BLKIF_OP_READ:
-		blkif->st_rd_req++;
+		ring->blkif->st_rd_req++;
 		operation = READ;
 		break;
 	case BLKIF_OP_WRITE:
-		blkif->st_wr_req++;
+		ring->blkif->st_wr_req++;
 		operation = WRITE_ODIRECT;
 		break;
 	case BLKIF_OP_WRITE_BARRIER:
 		drain = true;
 	case BLKIF_OP_FLUSH_DISKCACHE:
-		blkif->st_f_req++;
+		ring->blkif->st_f_req++;
 		operation = WRITE_FLUSH;
 		break;
 	default:
@@ -1255,7 +1285,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
 	preq.nr_sects      = 0;
 
-	pending_req->blkif     = blkif;
+	pending_req->ring     = ring;
 	pending_req->id        = req->u.rw.id;
 	pending_req->operation = req_operation;
 	pending_req->status    = BLKIF_RSP_OKAY;
@@ -1282,12 +1312,12 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 			goto fail_response;
 	}
 
-	if (xen_vbd_translate(&preq, blkif, operation) != 0) {
+	if (xen_vbd_translate(&preq, ring->blkif, operation) != 0) {
 		pr_debug("access denied: %s of [%llu,%llu] on dev=%04x\n",
 			 operation == READ ? "read" : "write",
 			 preq.sector_number,
 			 preq.sector_number + preq.nr_sects,
-			 blkif->vbd.pdevice);
+			 ring->blkif->vbd.pdevice);
 		goto fail_response;
 	}
 
@@ -1299,7 +1329,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		if (((int)preq.sector_number|(int)seg[i].nsec) &
 		    ((bdev_logical_block_size(preq.bdev) >> 9) - 1)) {
 			pr_debug("Misaligned I/O request from domain %d\n",
-				 blkif->domid);
+				 ring->blkif->domid);
 			goto fail_response;
 		}
 	}
@@ -1308,7 +1338,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	 * issue the WRITE_FLUSH.
 	 */
 	if (drain)
-		xen_blk_drain_io(pending_req->blkif);
+		xen_blk_drain_io(pending_req->ring);
 
 	/*
 	 * If we have failed at this point, we need to undo the M2P override,
@@ -1323,8 +1353,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	 * This corresponding xen_blkif_put is done in __end_block_io_op, or
 	 * below (in "!bio") if we are handling a BLKIF_OP_DISCARD.
 	 */
-	xen_blkif_get(blkif);
-	atomic_inc(&blkif->inflight);
+	xen_blkif_get(ring->blkif);
+	atomic_inc(&ring->inflight);
 
 	for (i = 0; i < nseg; i++) {
 		while ((bio == NULL) ||
@@ -1372,19 +1402,19 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	blk_finish_plug(&plug);
 
 	if (operation == READ)
-		blkif->st_rd_sect += preq.nr_sects;
+		ring->blkif->st_rd_sect += preq.nr_sects;
 	else if (operation & WRITE)
-		blkif->st_wr_sect += preq.nr_sects;
+		ring->blkif->st_wr_sect += preq.nr_sects;
 
 	return 0;
 
  fail_flush:
-	xen_blkbk_unmap(blkif, pending_req->segments,
+	xen_blkbk_unmap(ring, pending_req->segments,
 	                pending_req->nr_segs);
  fail_response:
 	/* Haven't submitted any bio's yet. */
-	make_response(blkif, req->u.rw.id, req_operation, BLKIF_RSP_ERROR);
-	free_req(blkif, pending_req);
+	make_response(ring, req->u.rw.id, req_operation, BLKIF_RSP_ERROR);
+	free_req(ring, pending_req);
 	msleep(1); /* back off a bit */
 	return -EIO;
 
@@ -1402,21 +1432,22 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 /*
  * Put a response on the ring on how the operation fared.
  */
-static void make_response(struct xen_blkif *blkif, u64 id,
+static void make_response(struct xen_blkif_ring *ring, u64 id,
 			  unsigned short op, int st)
 {
 	struct blkif_response  resp;
 	unsigned long     flags;
-	union blkif_back_rings *blk_rings = &blkif->blk_rings;
+	union blkif_back_rings *blk_rings;
 	int notify;
 
 	resp.id        = id;
 	resp.operation = op;
 	resp.status    = st;
 
-	spin_lock_irqsave(&blkif->blk_ring_lock, flags);
+	spin_lock_irqsave(&ring->blk_ring_lock, flags);
+	blk_rings = &ring->blk_rings;
 	/* Place on the response ring for the relevant domain. */
-	switch (blkif->blk_protocol) {
+	switch (ring->blkif->blk_protocol) {
 	case BLKIF_PROTOCOL_NATIVE:
 		memcpy(RING_GET_RESPONSE(&blk_rings->native, blk_rings->native.rsp_prod_pvt),
 		       &resp, sizeof(resp));
@@ -1434,9 +1465,9 @@ static void make_response(struct xen_blkif *blkif, u64 id,
 	}
 	blk_rings->common.rsp_prod_pvt++;
 	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blk_rings->common, notify);
-	spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
+	spin_unlock_irqrestore(&ring->blk_ring_lock, flags);
 	if (notify)
-		notify_remote_via_irq(blkif->irq);
+		notify_remote_via_irq(ring->irq);
 }
 
 static int __init xen_blkif_init(void)
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 68e87a0..f4dfa5b 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -269,34 +269,50 @@ struct persistent_gnt {
 	struct list_head remove_node;
 };
 
+/* Per-ring information. */
+struct xen_blkif_ring {
+	/* Physical parameters of the comms window. */
+	unsigned int		irq;
+	union blkif_back_rings	blk_rings;
+	void			*blk_ring;
+	/* Private fields. */
+	spinlock_t		blk_ring_lock;
+
+	wait_queue_head_t	wq;
+	atomic_t		inflight;
+	/* One thread per blkif ring. */
+	struct task_struct	*xenblkd;
+	unsigned int		waiting_reqs;
+
+	/* List of all 'pending_req' available */
+	struct list_head	pending_free;
+	/* And its spinlock. */
+	spinlock_t		pending_free_lock;
+	wait_queue_head_t	pending_free_wq;
+
+	struct work_struct	free_work;
+	/* Thread shutdown wait queue. */
+	wait_queue_head_t	shutdown_wq;
+	struct xen_blkif *blkif;
+};
+
 struct xen_blkif {
 	/* Unique identifier for this interface. */
 	domid_t			domid;
 	unsigned int		handle;
-	/* Physical parameters of the comms window. */
-	unsigned int		irq;
 	/* Comms information. */
 	enum blkif_protocol	blk_protocol;
-	union blkif_back_rings	blk_rings;
-	void			*blk_ring;
 	/* The VBD attached to this interface. */
 	struct xen_vbd		vbd;
 	/* Back pointer to the backend_info. */
 	struct backend_info	*be;
-	/* Private fields. */
-	spinlock_t		blk_ring_lock;
 	atomic_t		refcnt;
-
-	wait_queue_head_t	wq;
 	/* for barrier (drain) requests */
 	struct completion	drain_complete;
 	atomic_t		drain;
-	atomic_t		inflight;
-	/* One thread per one blkif. */
-	struct task_struct	*xenblkd;
-	unsigned int		waiting_reqs;
 
 	/* tree to store persistent grants */
+	spinlock_t		pers_gnts_lock;
 	struct rb_root		persistent_gnts;
 	unsigned int		persistent_gnt_c;
 	atomic_t		persistent_gnt_in_use;
@@ -311,12 +327,6 @@ struct xen_blkif {
 	int			free_pages_num;
 	struct list_head	free_pages;
 
-	/* List of all 'pending_req' available */
-	struct list_head	pending_free;
-	/* And its spinlock. */
-	spinlock_t		pending_free_lock;
-	wait_queue_head_t	pending_free_wq;
-
 	/* statistics */
 	unsigned long		st_print;
 	unsigned long long			st_rd_req;
@@ -328,9 +338,9 @@ struct xen_blkif {
 	unsigned long long			st_wr_sect;
 
 	struct work_struct	free_work;
-	/* Thread shutdown wait queue. */
-	wait_queue_head_t	shutdown_wq;
 	unsigned int nr_ring_pages;
+	/* All rings for this device. */
+	struct xen_blkif_ring ring;
 };
 
 struct seg_buf {
@@ -352,7 +362,7 @@ struct grant_page {
  * response queued for it, with the saved 'id' passed back.
  */
 struct pending_req {
-	struct xen_blkif	*blkif;
+	struct xen_blkif_ring   *ring;
 	u64			id;
 	int			nr_segs;
 	atomic_t		pendcnt;
@@ -394,7 +404,7 @@ int xen_blkif_xenbus_init(void);
 irqreturn_t xen_blkif_be_int(int irq, void *dev_id);
 int xen_blkif_schedule(void *arg);
 int xen_blkif_purge_persistent(void *arg);
-void xen_blkbk_free_caches(struct xen_blkif *blkif);
+void xen_blkbk_free_caches(struct xen_blkif_ring *ring);
 
 int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 			      struct backend_info *be, int state);
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index f53cff4..e4bfc92 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -88,7 +88,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 	char name[BLKBACK_NAME_LEN];
 
 	/* Not ready to connect? */
-	if (!blkif->irq || !blkif->vbd.bdev)
+	if (!blkif->ring.irq || !blkif->vbd.bdev)
 		return;
 
 	/* Already connected? */
@@ -113,10 +113,10 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 	}
 	invalidate_inode_pages2(blkif->vbd.bdev->bd_inode->i_mapping);
 
-	blkif->xenblkd = kthread_run(xen_blkif_schedule, blkif, "%s", name);
-	if (IS_ERR(blkif->xenblkd)) {
-		err = PTR_ERR(blkif->xenblkd);
-		blkif->xenblkd = NULL;
+	blkif->ring.xenblkd = kthread_run(xen_blkif_schedule, &blkif->ring, "%s", name);
+	if (IS_ERR(blkif->ring.xenblkd)) {
+		err = PTR_ERR(blkif->ring.xenblkd);
+		blkif->ring.xenblkd = NULL;
 		xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
 		return;
 	}
@@ -125,6 +125,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 {
 	struct xen_blkif *blkif;
+	struct xen_blkif_ring *ring;
 
 	BUILD_BUG_ON(MAX_INDIRECT_PAGES > BLKIF_MAX_INDIRECT_PAGES_PER_REQUEST);
 
@@ -133,41 +134,40 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 		return ERR_PTR(-ENOMEM);
 
 	blkif->domid = domid;
-	spin_lock_init(&blkif->blk_ring_lock);
 	atomic_set(&blkif->refcnt, 1);
-	init_waitqueue_head(&blkif->wq);
 	init_completion(&blkif->drain_complete);
-	atomic_set(&blkif->drain, 0);
-	blkif->st_print = jiffies;
-	blkif->persistent_gnts.rb_node = NULL;
+	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 	spin_lock_init(&blkif->free_pages_lock);
 	INIT_LIST_HEAD(&blkif->free_pages);
 	INIT_LIST_HEAD(&blkif->persistent_purge_list);
-	blkif->free_pages_num = 0;
-	atomic_set(&blkif->persistent_gnt_in_use, 0);
-	atomic_set(&blkif->inflight, 0);
+	blkif->st_print = jiffies;
 	INIT_WORK(&blkif->persistent_purge_work, xen_blkbk_unmap_purged_grants);
 
-	INIT_LIST_HEAD(&blkif->pending_free);
-	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
-	spin_lock_init(&blkif->pending_free_lock);
-	init_waitqueue_head(&blkif->pending_free_wq);
-	init_waitqueue_head(&blkif->shutdown_wq);
+	ring = &blkif->ring;
+	ring->blkif = blkif;
+	spin_lock_init(&ring->blk_ring_lock);
+	init_waitqueue_head(&ring->wq);
+
+	INIT_LIST_HEAD(&ring->pending_free);
+	spin_lock_init(&ring->pending_free_lock);
+	init_waitqueue_head(&ring->pending_free_wq);
+	init_waitqueue_head(&ring->shutdown_wq);
 
 	return blkif;
 }
 
-static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
+static int xen_blkif_map(struct xen_blkif_ring *ring, grant_ref_t *gref,
 			 unsigned int nr_grefs, unsigned int evtchn)
 {
 	int err;
+	struct xen_blkif *blkif = ring->blkif;
 
 	/* Already connected through? */
-	if (blkif->irq)
+	if (ring->irq)
 		return 0;
 
 	err = xenbus_map_ring_valloc(blkif->be->dev, gref, nr_grefs,
-				     &blkif->blk_ring);
+				     &ring->blk_ring);
 	if (err < 0)
 		return err;
 
@@ -175,24 +175,24 @@ static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
 	case BLKIF_PROTOCOL_NATIVE:
 	{
 		struct blkif_sring *sring;
-		sring = (struct blkif_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.native, sring,
+		sring = (struct blkif_sring *)ring->blk_ring;
+		BACK_RING_INIT(&ring->blk_rings.native, sring,
 			       XEN_PAGE_SIZE * nr_grefs);
 		break;
 	}
 	case BLKIF_PROTOCOL_X86_32:
 	{
 		struct blkif_x86_32_sring *sring_x86_32;
-		sring_x86_32 = (struct blkif_x86_32_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32,
+		sring_x86_32 = (struct blkif_x86_32_sring *)ring->blk_ring;
+		BACK_RING_INIT(&ring->blk_rings.x86_32, sring_x86_32,
 			       XEN_PAGE_SIZE * nr_grefs);
 		break;
 	}
 	case BLKIF_PROTOCOL_X86_64:
 	{
 		struct blkif_x86_64_sring *sring_x86_64;
-		sring_x86_64 = (struct blkif_x86_64_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64,
+		sring_x86_64 = (struct blkif_x86_64_sring *)ring->blk_ring;
+		BACK_RING_INIT(&ring->blk_rings.x86_64, sring_x86_64,
 			       XEN_PAGE_SIZE * nr_grefs);
 		break;
 	}
@@ -202,13 +202,13 @@ static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
 
 	err = bind_interdomain_evtchn_to_irqhandler(blkif->domid, evtchn,
 						    xen_blkif_be_int, 0,
-						    "blkif-backend", blkif);
+						    "blkif-backend", ring);
 	if (err < 0) {
-		xenbus_unmap_ring_vfree(blkif->be->dev, blkif->blk_ring);
-		blkif->blk_rings.common.sring = NULL;
+		xenbus_unmap_ring_vfree(blkif->be->dev, ring->blk_ring);
+		ring->blk_rings.common.sring = NULL;
 		return err;
 	}
-	blkif->irq = err;
+	ring->irq = err;
 
 	return 0;
 }
@@ -217,35 +217,36 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 {
 	struct pending_req *req, *n;
 	int i = 0, j;
+	struct xen_blkif_ring *ring = &blkif->ring;
 
-	if (blkif->xenblkd) {
-		kthread_stop(blkif->xenblkd);
-		wake_up(&blkif->shutdown_wq);
-		blkif->xenblkd = NULL;
+	if (ring->xenblkd) {
+		kthread_stop(ring->xenblkd);
+		wake_up(&ring->shutdown_wq);
+		ring->xenblkd = NULL;
 	}
 
 	/* The above kthread_stop() guarantees that at this point we
 	 * don't have any discard_io or other_io requests. So, checking
 	 * for inflight IO is enough.
 	 */
-	if (atomic_read(&blkif->inflight) > 0)
+	if (atomic_read(&ring->inflight) > 0)
 		return -EBUSY;
 
-	if (blkif->irq) {
-		unbind_from_irqhandler(blkif->irq, blkif);
-		blkif->irq = 0;
+	if (ring->irq) {
+		unbind_from_irqhandler(ring->irq, ring);
+		ring->irq = 0;
 	}
 
-	if (blkif->blk_rings.common.sring) {
-		xenbus_unmap_ring_vfree(blkif->be->dev, blkif->blk_ring);
-		blkif->blk_rings.common.sring = NULL;
+	if (ring->blk_rings.common.sring) {
+		xenbus_unmap_ring_vfree(blkif->be->dev, ring->blk_ring);
+		ring->blk_rings.common.sring = NULL;
 	}
 
 	/* Remove all persistent grants and the cache of ballooned pages. */
-	xen_blkbk_free_caches(blkif);
+	xen_blkbk_free_caches(ring);
 
 	/* Check that there is no request in use */
-	list_for_each_entry_safe(req, n, &blkif->pending_free, free_list) {
+	list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
 		list_del(&req->free_list);
 
 		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++)
@@ -835,6 +836,7 @@ static int connect_ring(struct backend_info *be)
 	char protocol[64] = "";
 	struct pending_req *req, *n;
 	int err, i, j;
+	struct xen_blkif_ring *ring = &be->blkif->ring;
 
 	pr_debug("%s %s\n", __func__, dev->otherend);
 
@@ -923,7 +925,7 @@ static int connect_ring(struct backend_info *be)
 		req = kzalloc(sizeof(*req), GFP_KERNEL);
 		if (!req)
 			goto fail;
-		list_add_tail(&req->free_list, &be->blkif->pending_free);
+		list_add_tail(&req->free_list, &ring->pending_free);
 		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++) {
 			req->segments[j] = kzalloc(sizeof(*req->segments[0]), GFP_KERNEL);
 			if (!req->segments[j])
@@ -938,7 +940,7 @@ static int connect_ring(struct backend_info *be)
 	}
 
 	/* Map the shared frame, irq etc. */
-	err = xen_blkif_map(be->blkif, ring_ref, nr_grefs, evtchn);
+	err = xen_blkif_map(ring, ring_ref, nr_grefs, evtchn);
 	if (err) {
 		xenbus_dev_fatal(dev, err, "mapping ring-ref port %u", evtchn);
 		return err;
@@ -947,7 +949,7 @@ static int connect_ring(struct backend_info *be)
 	return 0;
 
 fail:
-	list_for_each_entry_safe(req, n, &be->blkif->pending_free, free_list) {
+	list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
 		list_del(&req->free_list);
 		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++) {
 			if (!req->segments[j])
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 06/10] xen/blkback: separate ring information out of struct xen_blkif
@ 2015-11-14  3:12   ` Bob Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: jonathan.davies, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, axboe, Bob Liu, david.vrabel, avanzini.arianna,
	roger.pau

Split the per-ring information out into a new structure, "xen_blkif_ring", so
that one vbd device can be associated with one or more rings/hardware queues.

Introduce 'pers_gnts_lock' to protect the pool of persistent grants, since
there may now be multiple backend threads accessing it.

This patch is a preparation for supporting multiple hardware queues/rings.
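
As a rough standalone sketch of the resulting layout (plain C, with pthread
mutexes standing in for the kernel spinlocks; the two opaque pointer types are
illustrative placeholders, not the real kernel types), device-wide state stays
in xen_blkif while the I/O-path state moves into xen_blkif_ring:

#include <pthread.h>

struct persistent_gnt_tree;        /* shared tree of persistent grants */
struct pending_req_list;           /* per-ring free list of requests   */

struct xen_blkif_ring {
	unsigned int irq;                  /* event channel IRQ for this ring */
	void *blk_ring;                    /* mapped shared ring page(s)      */
	pthread_mutex_t blk_ring_lock;     /* serializes response production  */
	struct pending_req_list *pending_free;
	pthread_mutex_t pending_free_lock;
	struct xen_blkif *blkif;           /* back-pointer to the device      */
};

struct xen_blkif {
	unsigned int domid;
	struct persistent_gnt_tree *persistent_gnts; /* shared by all rings */
	pthread_mutex_t pers_gnts_lock;              /* guards the tree     */
	struct xen_blkif_ring ring;        /* a single ring for now; later
					      patches turn this into an array */
};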

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
v2:
 * Add a BUG_ON asserting that pers_gnts_lock is held in the persistent grant helpers.
---
 drivers/block/xen-blkback/blkback.c |  235 ++++++++++++++++++++---------------
 drivers/block/xen-blkback/common.h  |   54 ++++----
 drivers/block/xen-blkback/xenbus.c  |   96 +++++++-------
 3 files changed, 214 insertions(+), 171 deletions(-)
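
The locking rule introduced below (every caller of the persistent-grant
helpers must hold pers_gnts_lock, which the helpers assert with
BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock))) can be seen in miniature in
this userspace sketch; a pthread mutex stands in for the kernel spinlock and
the names are borrowed from the patch, so it is purely illustrative:

#include <pthread.h>
#include <stdio.h>

#define NR_RINGS 4

static pthread_mutex_t pers_gnts_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int persistent_gnt_c;     /* shared pool counter */

/* Caller must hold pers_gnts_lock, mirroring the rule enforced in the
 * kernel helpers below. */
static void add_persistent_gnt_locked(void)
{
	persistent_gnt_c++;
}

static void *ring_thread(void *arg)
{
	for (int i = 0; i < 100000; i++) {
		pthread_mutex_lock(&pers_gnts_lock);
		add_persistent_gnt_locked();
		pthread_mutex_unlock(&pers_gnts_lock);
	}
	return arg;
}

int main(void)
{
	pthread_t t[NR_RINGS];

	for (int i = 0; i < NR_RINGS; i++)
		pthread_create(&t[i], NULL, ring_thread, NULL);
	for (int i = 0; i < NR_RINGS; i++)
		pthread_join(t[i], NULL);

	printf("persistent_gnt_c = %u\n", persistent_gnt_c);
	return 0;
}

Build with "cc -pthread". With the mutex the final count is always
NR_RINGS * 100000; without it, it generally is not, which is exactly the race
the per-device pers_gnts_lock guards against once more than one ring thread
exists.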

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index f909994..fb5bfd4 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -173,11 +173,11 @@ static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
 
 #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
 
-static int do_block_io_op(struct xen_blkif *blkif);
-static int dispatch_rw_block_io(struct xen_blkif *blkif,
+static int do_block_io_op(struct xen_blkif_ring *ring);
+static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 				struct blkif_request *req,
 				struct pending_req *pending_req);
-static void make_response(struct xen_blkif *blkif, u64 id,
+static void make_response(struct xen_blkif_ring *ring, u64 id,
 			  unsigned short op, int st);
 
 #define foreach_grant_safe(pos, n, rbtree, node) \
@@ -189,14 +189,8 @@ static void make_response(struct xen_blkif *blkif, u64 id,
 
 
 /*
- * We don't need locking around the persistent grant helpers
- * because blkback uses a single-thread for each backed, so we
- * can be sure that this functions will never be called recursively.
- *
- * The only exception to that is put_persistent_grant, that can be called
- * from interrupt context (by xen_blkbk_unmap), so we have to use atomic
- * bit operations to modify the flags of a persistent grant and to count
- * the number of used grants.
+ * pers_gnts_lock must be used around all the persistent grant helpers
+ * because blkback may use multi-thread/queue for each backend.
  */
 static int add_persistent_gnt(struct xen_blkif *blkif,
 			       struct persistent_gnt *persistent_gnt)
@@ -204,6 +198,7 @@ static int add_persistent_gnt(struct xen_blkif *blkif,
 	struct rb_node **new = NULL, *parent = NULL;
 	struct persistent_gnt *this;
 
+	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
 	if (blkif->persistent_gnt_c >= xen_blkif_max_pgrants) {
 		if (!blkif->vbd.overflow_max_grants)
 			blkif->vbd.overflow_max_grants = 1;
@@ -241,6 +236,7 @@ static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
 	struct persistent_gnt *data;
 	struct rb_node *node = NULL;
 
+	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
 	node = blkif->persistent_gnts.rb_node;
 	while (node) {
 		data = container_of(node, struct persistent_gnt, node);
@@ -265,6 +261,7 @@ static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
 static void put_persistent_gnt(struct xen_blkif *blkif,
                                struct persistent_gnt *persistent_gnt)
 {
+	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
 	if(!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
 		pr_alert_ratelimited("freeing a grant already unused\n");
 	set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
@@ -286,6 +283,7 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
 	unmap_data.unmap_ops = unmap;
 	unmap_data.kunmap_ops = NULL;
 
+	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
 	foreach_grant_safe(persistent_gnt, n, root, node) {
 		BUG_ON(persistent_gnt->handle ==
 			BLKBACK_INVALID_HANDLE);
@@ -322,11 +320,13 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
 	int segs_to_unmap = 0;
 	struct xen_blkif *blkif = container_of(work, typeof(*blkif), persistent_purge_work);
 	struct gntab_unmap_queue_data unmap_data;
+	unsigned long flags;
 
 	unmap_data.pages = pages;
 	unmap_data.unmap_ops = unmap;
 	unmap_data.kunmap_ops = NULL;
 
+	spin_lock_irqsave(&blkif->pers_gnts_lock, flags);
 	while(!list_empty(&blkif->persistent_purge_list)) {
 		persistent_gnt = list_first_entry(&blkif->persistent_purge_list,
 		                                  struct persistent_gnt,
@@ -348,6 +348,7 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
 		}
 		kfree(persistent_gnt);
 	}
+	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
 	if (segs_to_unmap > 0) {
 		unmap_data.count = segs_to_unmap;
 		BUG_ON(gnttab_unmap_refs_sync(&unmap_data));
@@ -362,16 +363,18 @@ static void purge_persistent_gnt(struct xen_blkif *blkif)
 	unsigned int num_clean, total;
 	bool scan_used = false, clean_used = false;
 	struct rb_root *root;
+	unsigned long flags;
 
+	spin_lock_irqsave(&blkif->pers_gnts_lock, flags);
 	if (blkif->persistent_gnt_c < xen_blkif_max_pgrants ||
 	    (blkif->persistent_gnt_c == xen_blkif_max_pgrants &&
 	    !blkif->vbd.overflow_max_grants)) {
-		return;
+		goto out;
 	}
 
 	if (work_busy(&blkif->persistent_purge_work)) {
 		pr_alert_ratelimited("Scheduled work from previous purge is still busy, cannot purge list\n");
-		return;
+		goto out;
 	}
 
 	num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
@@ -379,7 +382,7 @@ static void purge_persistent_gnt(struct xen_blkif *blkif)
 	num_clean = min(blkif->persistent_gnt_c, num_clean);
 	if ((num_clean == 0) ||
 	    (num_clean > (blkif->persistent_gnt_c - atomic_read(&blkif->persistent_gnt_in_use))))
-		return;
+		goto out;
 
 	/*
 	 * At this point, we can assure that there will be no calls
@@ -436,29 +439,35 @@ finished:
 	}
 
 	blkif->persistent_gnt_c -= (total - num_clean);
+	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
 	blkif->vbd.overflow_max_grants = 0;
 
 	/* We can defer this work */
 	schedule_work(&blkif->persistent_purge_work);
 	pr_debug("Purged %u/%u\n", (total - num_clean), total);
 	return;
+
+out:
+	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
+
+	return;
 }
 
 /*
  * Retrieve from the 'pending_reqs' a free pending_req structure to be used.
  */
-static struct pending_req *alloc_req(struct xen_blkif *blkif)
+static struct pending_req *alloc_req(struct xen_blkif_ring *ring)
 {
 	struct pending_req *req = NULL;
 	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->pending_free_lock, flags);
-	if (!list_empty(&blkif->pending_free)) {
-		req = list_entry(blkif->pending_free.next, struct pending_req,
+	spin_lock_irqsave(&ring->pending_free_lock, flags);
+	if (!list_empty(&ring->pending_free)) {
+		req = list_entry(ring->pending_free.next, struct pending_req,
 				 free_list);
 		list_del(&req->free_list);
 	}
-	spin_unlock_irqrestore(&blkif->pending_free_lock, flags);
+	spin_unlock_irqrestore(&ring->pending_free_lock, flags);
 	return req;
 }
 
@@ -466,17 +475,17 @@ static struct pending_req *alloc_req(struct xen_blkif *blkif)
  * Return the 'pending_req' structure back to the freepool. We also
  * wake up the thread if it was waiting for a free page.
  */
-static void free_req(struct xen_blkif *blkif, struct pending_req *req)
+static void free_req(struct xen_blkif_ring *ring, struct pending_req *req)
 {
 	unsigned long flags;
 	int was_empty;
 
-	spin_lock_irqsave(&blkif->pending_free_lock, flags);
-	was_empty = list_empty(&blkif->pending_free);
-	list_add(&req->free_list, &blkif->pending_free);
-	spin_unlock_irqrestore(&blkif->pending_free_lock, flags);
+	spin_lock_irqsave(&ring->pending_free_lock, flags);
+	was_empty = list_empty(&ring->pending_free);
+	list_add(&req->free_list, &ring->pending_free);
+	spin_unlock_irqrestore(&ring->pending_free_lock, flags);
 	if (was_empty)
-		wake_up(&blkif->pending_free_wq);
+		wake_up(&ring->pending_free_wq);
 }
 
 /*
@@ -556,10 +565,10 @@ abort:
 /*
  * Notification from the guest OS.
  */
-static void blkif_notify_work(struct xen_blkif *blkif)
+static void blkif_notify_work(struct xen_blkif_ring *ring)
 {
-	blkif->waiting_reqs = 1;
-	wake_up(&blkif->wq);
+	ring->waiting_reqs = 1;
+	wake_up(&ring->wq);
 }
 
 irqreturn_t xen_blkif_be_int(int irq, void *dev_id)
@@ -590,7 +599,8 @@ static void print_stats(struct xen_blkif *blkif)
 
 int xen_blkif_schedule(void *arg)
 {
-	struct xen_blkif *blkif = arg;
+	struct xen_blkif_ring *ring = arg;
+	struct xen_blkif *blkif = ring->blkif;
 	struct xen_vbd *vbd = &blkif->vbd;
 	unsigned long timeout;
 	int ret;
@@ -606,27 +616,27 @@ int xen_blkif_schedule(void *arg)
 		timeout = msecs_to_jiffies(LRU_INTERVAL);
 
 		timeout = wait_event_interruptible_timeout(
-			blkif->wq,
-			blkif->waiting_reqs || kthread_should_stop(),
+			ring->wq,
+			ring->waiting_reqs || kthread_should_stop(),
 			timeout);
 		if (timeout == 0)
 			goto purge_gnt_list;
 		timeout = wait_event_interruptible_timeout(
-			blkif->pending_free_wq,
-			!list_empty(&blkif->pending_free) ||
+			ring->pending_free_wq,
+			!list_empty(&ring->pending_free) ||
 			kthread_should_stop(),
 			timeout);
 		if (timeout == 0)
 			goto purge_gnt_list;
 
-		blkif->waiting_reqs = 0;
+		ring->waiting_reqs = 0;
 		smp_mb(); /* clear flag *before* checking for work */
 
-		ret = do_block_io_op(blkif);
+		ret = do_block_io_op(ring);
 		if (ret > 0)
-			blkif->waiting_reqs = 1;
+			ring->waiting_reqs = 1;
 		if (ret == -EACCES)
-			wait_event_interruptible(blkif->shutdown_wq,
+			wait_event_interruptible(ring->shutdown_wq,
 						 kthread_should_stop());
 
 purge_gnt_list:
@@ -649,7 +659,7 @@ purge_gnt_list:
 	if (log_stats)
 		print_stats(blkif);
 
-	blkif->xenblkd = NULL;
+	ring->xenblkd = NULL;
 	xen_blkif_put(blkif);
 
 	return 0;
@@ -658,32 +668,40 @@ purge_gnt_list:
 /*
  * Remove persistent grants and empty the pool of free pages
  */
-void xen_blkbk_free_caches(struct xen_blkif *blkif)
+void xen_blkbk_free_caches(struct xen_blkif_ring *ring)
 {
+	struct xen_blkif *blkif = ring->blkif;
+	unsigned long flags;
+
 	/* Free all persistent grant pages */
+	spin_lock_irqsave(&blkif->pers_gnts_lock, flags);
 	if (!RB_EMPTY_ROOT(&blkif->persistent_gnts))
 		free_persistent_gnts(blkif, &blkif->persistent_gnts,
 			blkif->persistent_gnt_c);
 
 	BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
 	blkif->persistent_gnt_c = 0;
+	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
 
 	/* Since we are shutting down remove all pages from the buffer */
 	shrink_free_pagepool(blkif, 0 /* All */);
 }
 
 static unsigned int xen_blkbk_unmap_prepare(
-	struct xen_blkif *blkif,
+	struct xen_blkif_ring *ring,
 	struct grant_page **pages,
 	unsigned int num,
 	struct gnttab_unmap_grant_ref *unmap_ops,
 	struct page **unmap_pages)
 {
 	unsigned int i, invcount = 0;
+	unsigned long flags;
 
 	for (i = 0; i < num; i++) {
 		if (pages[i]->persistent_gnt != NULL) {
-			put_persistent_gnt(blkif, pages[i]->persistent_gnt);
+			spin_lock_irqsave(&ring->blkif->pers_gnts_lock, flags);
+			put_persistent_gnt(ring->blkif, pages[i]->persistent_gnt);
+			spin_unlock_irqrestore(&ring->blkif->pers_gnts_lock, flags);
 			continue;
 		}
 		if (pages[i]->handle == BLKBACK_INVALID_HANDLE)
@@ -700,17 +718,18 @@ static unsigned int xen_blkbk_unmap_prepare(
 
 static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_queue_data *data)
 {
-	struct pending_req* pending_req = (struct pending_req*) (data->data);
-	struct xen_blkif *blkif = pending_req->blkif;
+	struct pending_req *pending_req = (struct pending_req *)(data->data);
+	struct xen_blkif_ring *ring = pending_req->ring;
+	struct xen_blkif *blkif = ring->blkif;
 
 	/* BUG_ON used to reproduce existing behaviour,
 	   but is this the best way to deal with this? */
 	BUG_ON(result);
 
 	put_free_pages(blkif, data->pages, data->count);
-	make_response(blkif, pending_req->id,
+	make_response(ring, pending_req->id,
 		      pending_req->operation, pending_req->status);
-	free_req(blkif, pending_req);
+	free_req(ring, pending_req);
 	/*
 	 * Make sure the request is freed before releasing blkif,
 	 * or there could be a race between free_req and the
@@ -723,7 +742,7 @@ static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_
 	 * pending_free_wq if there's a drain going on, but it has
 	 * to be taken into account if the current model is changed.
 	 */
-	if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
+	if (atomic_dec_and_test(&ring->inflight) && atomic_read(&blkif->drain)) {
 		complete(&blkif->drain_complete);
 	}
 	xen_blkif_put(blkif);
@@ -732,11 +751,11 @@ static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_
 static void xen_blkbk_unmap_and_respond(struct pending_req *req)
 {
 	struct gntab_unmap_queue_data* work = &req->gnttab_unmap_data;
-	struct xen_blkif *blkif = req->blkif;
+	struct xen_blkif_ring *ring = req->ring;
 	struct grant_page **pages = req->segments;
 	unsigned int invcount;
 
-	invcount = xen_blkbk_unmap_prepare(blkif, pages, req->nr_segs,
+	invcount = xen_blkbk_unmap_prepare(ring, pages, req->nr_segs,
 					   req->unmap, req->unmap_pages);
 
 	work->data = req;
@@ -757,7 +776,7 @@ static void xen_blkbk_unmap_and_respond(struct pending_req *req)
  * of hypercalls, but since this is only used in error paths there's
  * no real need.
  */
-static void xen_blkbk_unmap(struct xen_blkif *blkif,
+static void xen_blkbk_unmap(struct xen_blkif_ring *ring,
                             struct grant_page *pages[],
                             int num)
 {
@@ -768,20 +787,20 @@ static void xen_blkbk_unmap(struct xen_blkif *blkif,
 
 	while (num) {
 		unsigned int batch = min(num, BLKIF_MAX_SEGMENTS_PER_REQUEST);
-		
-		invcount = xen_blkbk_unmap_prepare(blkif, pages, batch,
+
+		invcount = xen_blkbk_unmap_prepare(ring, pages, batch,
 						   unmap, unmap_pages);
 		if (invcount) {
 			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
 			BUG_ON(ret);
-			put_free_pages(blkif, unmap_pages, invcount);
+			put_free_pages(ring->blkif, unmap_pages, invcount);
 		}
 		pages += batch;
 		num -= batch;
 	}
 }
 
-static int xen_blkbk_map(struct xen_blkif *blkif,
+static int xen_blkbk_map(struct xen_blkif_ring *ring,
 			 struct grant_page *pages[],
 			 int num, bool ro)
 {
@@ -794,6 +813,8 @@ static int xen_blkbk_map(struct xen_blkif *blkif,
 	int ret = 0;
 	int last_map = 0, map_until = 0;
 	int use_persistent_gnts;
+	struct xen_blkif *blkif = ring->blkif;
+	unsigned long irq_flags;
 
 	use_persistent_gnts = (blkif->vbd.feature_gnt_persistent);
 
@@ -806,10 +827,13 @@ again:
 	for (i = map_until; i < num; i++) {
 		uint32_t flags;
 
-		if (use_persistent_gnts)
+		if (use_persistent_gnts) {
+			spin_lock_irqsave(&blkif->pers_gnts_lock, irq_flags);
 			persistent_gnt = get_persistent_gnt(
 				blkif,
 				pages[i]->gref);
+			spin_unlock_irqrestore(&blkif->pers_gnts_lock, irq_flags);
+		}
 
 		if (persistent_gnt) {
 			/*
@@ -880,8 +904,10 @@ again:
 			persistent_gnt->gnt = map[new_map_idx].ref;
 			persistent_gnt->handle = map[new_map_idx].handle;
 			persistent_gnt->page = pages[seg_idx]->page;
+			spin_lock_irqsave(&blkif->pers_gnts_lock, irq_flags);
 			if (add_persistent_gnt(blkif,
 			                       persistent_gnt)) {
+				spin_unlock_irqrestore(&blkif->pers_gnts_lock, irq_flags);
 				kfree(persistent_gnt);
 				persistent_gnt = NULL;
 				goto next;
@@ -890,6 +916,7 @@ again:
 			pr_debug("grant %u added to the tree of persistent grants, using %u/%u\n",
 				 persistent_gnt->gnt, blkif->persistent_gnt_c,
 				 xen_blkif_max_pgrants);
+			spin_unlock_irqrestore(&blkif->pers_gnts_lock, irq_flags);
 			goto next;
 		}
 		if (use_persistent_gnts && !blkif->vbd.overflow_max_grants) {
@@ -921,7 +948,7 @@ static int xen_blkbk_map_seg(struct pending_req *pending_req)
 {
 	int rc;
 
-	rc = xen_blkbk_map(pending_req->blkif, pending_req->segments,
+	rc = xen_blkbk_map(pending_req->ring, pending_req->segments,
 			   pending_req->nr_segs,
 	                   (pending_req->operation != BLKIF_OP_READ));
 
@@ -934,7 +961,7 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
 				    struct phys_req *preq)
 {
 	struct grant_page **pages = pending_req->indirect_pages;
-	struct xen_blkif *blkif = pending_req->blkif;
+	struct xen_blkif_ring *ring = pending_req->ring;
 	int indirect_grefs, rc, n, nseg, i;
 	struct blkif_request_segment *segments = NULL;
 
@@ -945,7 +972,7 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
 	for (i = 0; i < indirect_grefs; i++)
 		pages[i]->gref = req->u.indirect.indirect_grefs[i];
 
-	rc = xen_blkbk_map(blkif, pages, indirect_grefs, true);
+	rc = xen_blkbk_map(ring, pages, indirect_grefs, true);
 	if (rc)
 		goto unmap;
 
@@ -972,15 +999,16 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
 unmap:
 	if (segments)
 		kunmap_atomic(segments);
-	xen_blkbk_unmap(blkif, pages, indirect_grefs);
+	xen_blkbk_unmap(ring, pages, indirect_grefs);
 	return rc;
 }
 
-static int dispatch_discard_io(struct xen_blkif *blkif,
+static int dispatch_discard_io(struct xen_blkif_ring *ring,
 				struct blkif_request *req)
 {
 	int err = 0;
 	int status = BLKIF_RSP_OKAY;
+	struct xen_blkif *blkif = ring->blkif;
 	struct block_device *bdev = blkif->vbd.bdev;
 	unsigned long secure;
 	struct phys_req preq;
@@ -1013,26 +1041,28 @@ fail_response:
 	} else if (err)
 		status = BLKIF_RSP_ERROR;
 
-	make_response(blkif, req->u.discard.id, req->operation, status);
+	make_response(ring, req->u.discard.id, req->operation, status);
 	xen_blkif_put(blkif);
 	return err;
 }
 
-static int dispatch_other_io(struct xen_blkif *blkif,
+static int dispatch_other_io(struct xen_blkif_ring *ring,
 			     struct blkif_request *req,
 			     struct pending_req *pending_req)
 {
-	free_req(blkif, pending_req);
-	make_response(blkif, req->u.other.id, req->operation,
+	free_req(ring, pending_req);
+	make_response(ring, req->u.other.id, req->operation,
 		      BLKIF_RSP_EOPNOTSUPP);
 	return -EIO;
 }
 
-static void xen_blk_drain_io(struct xen_blkif *blkif)
+static void xen_blk_drain_io(struct xen_blkif_ring *ring)
 {
+	struct xen_blkif *blkif = ring->blkif;
+
 	atomic_set(&blkif->drain, 1);
 	do {
-		if (atomic_read(&blkif->inflight) == 0)
+		if (atomic_read(&ring->inflight) == 0)
 			break;
 		wait_for_completion_interruptible_timeout(
 				&blkif->drain_complete, HZ);
@@ -1053,12 +1083,12 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
 	if ((pending_req->operation == BLKIF_OP_FLUSH_DISKCACHE) &&
 	    (error == -EOPNOTSUPP)) {
 		pr_debug("flush diskcache op failed, not supported\n");
-		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->blkif->be, 0);
+		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->ring->blkif->be, 0);
 		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
 	} else if ((pending_req->operation == BLKIF_OP_WRITE_BARRIER) &&
 		    (error == -EOPNOTSUPP)) {
 		pr_debug("write barrier op failed, not supported\n");
-		xen_blkbk_barrier(XBT_NIL, pending_req->blkif->be, 0);
+		xen_blkbk_barrier(XBT_NIL, pending_req->ring->blkif->be, 0);
 		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
 	} else if (error) {
 		pr_debug("Buffer not up-to-date at end of operation,"
@@ -1092,9 +1122,9 @@ static void end_block_io_op(struct bio *bio)
  * and transmute  it to the block API to hand it over to the proper block disk.
  */
 static int
-__do_block_io_op(struct xen_blkif *blkif)
+__do_block_io_op(struct xen_blkif_ring *ring)
 {
-	union blkif_back_rings *blk_rings = &blkif->blk_rings;
+	union blkif_back_rings *blk_rings = &ring->blk_rings;
 	struct blkif_request req;
 	struct pending_req *pending_req;
 	RING_IDX rc, rp;
@@ -1107,7 +1137,7 @@ __do_block_io_op(struct xen_blkif *blkif)
 	if (RING_REQUEST_PROD_OVERFLOW(&blk_rings->common, rp)) {
 		rc = blk_rings->common.rsp_prod_pvt;
 		pr_warn("Frontend provided bogus ring requests (%d - %d = %d). Halting ring processing on dev=%04x\n",
-			rp, rc, rp - rc, blkif->vbd.pdevice);
+			rp, rc, rp - rc, ring->blkif->vbd.pdevice);
 		return -EACCES;
 	}
 	while (rc != rp) {
@@ -1120,14 +1150,14 @@ __do_block_io_op(struct xen_blkif *blkif)
 			break;
 		}
 
-		pending_req = alloc_req(blkif);
+		pending_req = alloc_req(ring);
 		if (NULL == pending_req) {
-			blkif->st_oo_req++;
+			ring->blkif->st_oo_req++;
 			more_to_do = 1;
 			break;
 		}
 
-		switch (blkif->blk_protocol) {
+		switch (ring->blkif->blk_protocol) {
 		case BLKIF_PROTOCOL_NATIVE:
 			memcpy(&req, RING_GET_REQUEST(&blk_rings->native, rc), sizeof(req));
 			break;
@@ -1151,16 +1181,16 @@ __do_block_io_op(struct xen_blkif *blkif)
 		case BLKIF_OP_WRITE_BARRIER:
 		case BLKIF_OP_FLUSH_DISKCACHE:
 		case BLKIF_OP_INDIRECT:
-			if (dispatch_rw_block_io(blkif, &req, pending_req))
+			if (dispatch_rw_block_io(ring, &req, pending_req))
 				goto done;
 			break;
 		case BLKIF_OP_DISCARD:
-			free_req(blkif, pending_req);
-			if (dispatch_discard_io(blkif, &req))
+			free_req(ring, pending_req);
+			if (dispatch_discard_io(ring, &req))
 				goto done;
 			break;
 		default:
-			if (dispatch_other_io(blkif, &req, pending_req))
+			if (dispatch_other_io(ring, &req, pending_req))
 				goto done;
 			break;
 		}
@@ -1173,13 +1203,13 @@ done:
 }
 
 static int
-do_block_io_op(struct xen_blkif *blkif)
+do_block_io_op(struct xen_blkif_ring *ring)
 {
-	union blkif_back_rings *blk_rings = &blkif->blk_rings;
+	union blkif_back_rings *blk_rings = &ring->blk_rings;
 	int more_to_do;
 
 	do {
-		more_to_do = __do_block_io_op(blkif);
+		more_to_do = __do_block_io_op(ring);
 		if (more_to_do)
 			break;
 
@@ -1192,7 +1222,7 @@ do_block_io_op(struct xen_blkif *blkif)
  * Transmutation of the 'struct blkif_request' to a proper 'struct bio'
  * and call the 'submit_bio' to pass it to the underlying storage.
  */
-static int dispatch_rw_block_io(struct xen_blkif *blkif,
+static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 				struct blkif_request *req,
 				struct pending_req *pending_req)
 {
@@ -1220,17 +1250,17 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
 	switch (req_operation) {
 	case BLKIF_OP_READ:
-		blkif->st_rd_req++;
+		ring->blkif->st_rd_req++;
 		operation = READ;
 		break;
 	case BLKIF_OP_WRITE:
-		blkif->st_wr_req++;
+		ring->blkif->st_wr_req++;
 		operation = WRITE_ODIRECT;
 		break;
 	case BLKIF_OP_WRITE_BARRIER:
 		drain = true;
 	case BLKIF_OP_FLUSH_DISKCACHE:
-		blkif->st_f_req++;
+		ring->blkif->st_f_req++;
 		operation = WRITE_FLUSH;
 		break;
 	default:
@@ -1255,7 +1285,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
 	preq.nr_sects      = 0;
 
-	pending_req->blkif     = blkif;
+	pending_req->ring     = ring;
 	pending_req->id        = req->u.rw.id;
 	pending_req->operation = req_operation;
 	pending_req->status    = BLKIF_RSP_OKAY;
@@ -1282,12 +1312,12 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 			goto fail_response;
 	}
 
-	if (xen_vbd_translate(&preq, blkif, operation) != 0) {
+	if (xen_vbd_translate(&preq, ring->blkif, operation) != 0) {
 		pr_debug("access denied: %s of [%llu,%llu] on dev=%04x\n",
 			 operation == READ ? "read" : "write",
 			 preq.sector_number,
 			 preq.sector_number + preq.nr_sects,
-			 blkif->vbd.pdevice);
+			 ring->blkif->vbd.pdevice);
 		goto fail_response;
 	}
 
@@ -1299,7 +1329,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		if (((int)preq.sector_number|(int)seg[i].nsec) &
 		    ((bdev_logical_block_size(preq.bdev) >> 9) - 1)) {
 			pr_debug("Misaligned I/O request from domain %d\n",
-				 blkif->domid);
+				 ring->blkif->domid);
 			goto fail_response;
 		}
 	}
@@ -1308,7 +1338,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	 * issue the WRITE_FLUSH.
 	 */
 	if (drain)
-		xen_blk_drain_io(pending_req->blkif);
+		xen_blk_drain_io(pending_req->ring);
 
 	/*
 	 * If we have failed at this point, we need to undo the M2P override,
@@ -1323,8 +1353,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	 * This corresponding xen_blkif_put is done in __end_block_io_op, or
 	 * below (in "!bio") if we are handling a BLKIF_OP_DISCARD.
 	 */
-	xen_blkif_get(blkif);
-	atomic_inc(&blkif->inflight);
+	xen_blkif_get(ring->blkif);
+	atomic_inc(&ring->inflight);
 
 	for (i = 0; i < nseg; i++) {
 		while ((bio == NULL) ||
@@ -1372,19 +1402,19 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	blk_finish_plug(&plug);
 
 	if (operation == READ)
-		blkif->st_rd_sect += preq.nr_sects;
+		ring->blkif->st_rd_sect += preq.nr_sects;
 	else if (operation & WRITE)
-		blkif->st_wr_sect += preq.nr_sects;
+		ring->blkif->st_wr_sect += preq.nr_sects;
 
 	return 0;
 
  fail_flush:
-	xen_blkbk_unmap(blkif, pending_req->segments,
+	xen_blkbk_unmap(ring, pending_req->segments,
 	                pending_req->nr_segs);
  fail_response:
 	/* Haven't submitted any bio's yet. */
-	make_response(blkif, req->u.rw.id, req_operation, BLKIF_RSP_ERROR);
-	free_req(blkif, pending_req);
+	make_response(ring, req->u.rw.id, req_operation, BLKIF_RSP_ERROR);
+	free_req(ring, pending_req);
 	msleep(1); /* back off a bit */
 	return -EIO;
 
@@ -1402,21 +1432,22 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 /*
  * Put a response on the ring on how the operation fared.
  */
-static void make_response(struct xen_blkif *blkif, u64 id,
+static void make_response(struct xen_blkif_ring *ring, u64 id,
 			  unsigned short op, int st)
 {
 	struct blkif_response  resp;
 	unsigned long     flags;
-	union blkif_back_rings *blk_rings = &blkif->blk_rings;
+	union blkif_back_rings *blk_rings;
 	int notify;
 
 	resp.id        = id;
 	resp.operation = op;
 	resp.status    = st;
 
-	spin_lock_irqsave(&blkif->blk_ring_lock, flags);
+	spin_lock_irqsave(&ring->blk_ring_lock, flags);
+	blk_rings = &ring->blk_rings;
 	/* Place on the response ring for the relevant domain. */
-	switch (blkif->blk_protocol) {
+	switch (ring->blkif->blk_protocol) {
 	case BLKIF_PROTOCOL_NATIVE:
 		memcpy(RING_GET_RESPONSE(&blk_rings->native, blk_rings->native.rsp_prod_pvt),
 		       &resp, sizeof(resp));
@@ -1434,9 +1465,9 @@ static void make_response(struct xen_blkif *blkif, u64 id,
 	}
 	blk_rings->common.rsp_prod_pvt++;
 	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blk_rings->common, notify);
-	spin_unlock_irqrestore(&blkif->blk_ring_lock, flags);
+	spin_unlock_irqrestore(&ring->blk_ring_lock, flags);
 	if (notify)
-		notify_remote_via_irq(blkif->irq);
+		notify_remote_via_irq(ring->irq);
 }
 
 static int __init xen_blkif_init(void)
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 68e87a0..f4dfa5b 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -269,34 +269,50 @@ struct persistent_gnt {
 	struct list_head remove_node;
 };
 
+/* Per-ring information. */
+struct xen_blkif_ring {
+	/* Physical parameters of the comms window. */
+	unsigned int		irq;
+	union blkif_back_rings	blk_rings;
+	void			*blk_ring;
+	/* Private fields. */
+	spinlock_t		blk_ring_lock;
+
+	wait_queue_head_t	wq;
+	atomic_t		inflight;
+	/* One thread per blkif ring. */
+	struct task_struct	*xenblkd;
+	unsigned int		waiting_reqs;
+
+	/* List of all 'pending_req' available */
+	struct list_head	pending_free;
+	/* And its spinlock. */
+	spinlock_t		pending_free_lock;
+	wait_queue_head_t	pending_free_wq;
+
+	struct work_struct	free_work;
+	/* Thread shutdown wait queue. */
+	wait_queue_head_t	shutdown_wq;
+	struct xen_blkif *blkif;
+};
+
 struct xen_blkif {
 	/* Unique identifier for this interface. */
 	domid_t			domid;
 	unsigned int		handle;
-	/* Physical parameters of the comms window. */
-	unsigned int		irq;
 	/* Comms information. */
 	enum blkif_protocol	blk_protocol;
-	union blkif_back_rings	blk_rings;
-	void			*blk_ring;
 	/* The VBD attached to this interface. */
 	struct xen_vbd		vbd;
 	/* Back pointer to the backend_info. */
 	struct backend_info	*be;
-	/* Private fields. */
-	spinlock_t		blk_ring_lock;
 	atomic_t		refcnt;
-
-	wait_queue_head_t	wq;
 	/* for barrier (drain) requests */
 	struct completion	drain_complete;
 	atomic_t		drain;
-	atomic_t		inflight;
-	/* One thread per one blkif. */
-	struct task_struct	*xenblkd;
-	unsigned int		waiting_reqs;
 
 	/* tree to store persistent grants */
+	spinlock_t		pers_gnts_lock;
 	struct rb_root		persistent_gnts;
 	unsigned int		persistent_gnt_c;
 	atomic_t		persistent_gnt_in_use;
@@ -311,12 +327,6 @@ struct xen_blkif {
 	int			free_pages_num;
 	struct list_head	free_pages;
 
-	/* List of all 'pending_req' available */
-	struct list_head	pending_free;
-	/* And its spinlock. */
-	spinlock_t		pending_free_lock;
-	wait_queue_head_t	pending_free_wq;
-
 	/* statistics */
 	unsigned long		st_print;
 	unsigned long long			st_rd_req;
@@ -328,9 +338,9 @@ struct xen_blkif {
 	unsigned long long			st_wr_sect;
 
 	struct work_struct	free_work;
-	/* Thread shutdown wait queue. */
-	wait_queue_head_t	shutdown_wq;
 	unsigned int nr_ring_pages;
+	/* All rings for this device. */
+	struct xen_blkif_ring ring;
 };
 
 struct seg_buf {
@@ -352,7 +362,7 @@ struct grant_page {
  * response queued for it, with the saved 'id' passed back.
  */
 struct pending_req {
-	struct xen_blkif	*blkif;
+	struct xen_blkif_ring   *ring;
 	u64			id;
 	int			nr_segs;
 	atomic_t		pendcnt;
@@ -394,7 +404,7 @@ int xen_blkif_xenbus_init(void);
 irqreturn_t xen_blkif_be_int(int irq, void *dev_id);
 int xen_blkif_schedule(void *arg);
 int xen_blkif_purge_persistent(void *arg);
-void xen_blkbk_free_caches(struct xen_blkif *blkif);
+void xen_blkbk_free_caches(struct xen_blkif_ring *ring);
 
 int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 			      struct backend_info *be, int state);
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index f53cff4..e4bfc92 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -88,7 +88,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 	char name[BLKBACK_NAME_LEN];
 
 	/* Not ready to connect? */
-	if (!blkif->irq || !blkif->vbd.bdev)
+	if (!blkif->ring.irq || !blkif->vbd.bdev)
 		return;
 
 	/* Already connected? */
@@ -113,10 +113,10 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 	}
 	invalidate_inode_pages2(blkif->vbd.bdev->bd_inode->i_mapping);
 
-	blkif->xenblkd = kthread_run(xen_blkif_schedule, blkif, "%s", name);
-	if (IS_ERR(blkif->xenblkd)) {
-		err = PTR_ERR(blkif->xenblkd);
-		blkif->xenblkd = NULL;
+	blkif->ring.xenblkd = kthread_run(xen_blkif_schedule, &blkif->ring, "%s", name);
+	if (IS_ERR(blkif->ring.xenblkd)) {
+		err = PTR_ERR(blkif->ring.xenblkd);
+		blkif->ring.xenblkd = NULL;
 		xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
 		return;
 	}
@@ -125,6 +125,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 {
 	struct xen_blkif *blkif;
+	struct xen_blkif_ring *ring;
 
 	BUILD_BUG_ON(MAX_INDIRECT_PAGES > BLKIF_MAX_INDIRECT_PAGES_PER_REQUEST);
 
@@ -133,41 +134,40 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 		return ERR_PTR(-ENOMEM);
 
 	blkif->domid = domid;
-	spin_lock_init(&blkif->blk_ring_lock);
 	atomic_set(&blkif->refcnt, 1);
-	init_waitqueue_head(&blkif->wq);
 	init_completion(&blkif->drain_complete);
-	atomic_set(&blkif->drain, 0);
-	blkif->st_print = jiffies;
-	blkif->persistent_gnts.rb_node = NULL;
+	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 	spin_lock_init(&blkif->free_pages_lock);
 	INIT_LIST_HEAD(&blkif->free_pages);
 	INIT_LIST_HEAD(&blkif->persistent_purge_list);
-	blkif->free_pages_num = 0;
-	atomic_set(&blkif->persistent_gnt_in_use, 0);
-	atomic_set(&blkif->inflight, 0);
+	blkif->st_print = jiffies;
 	INIT_WORK(&blkif->persistent_purge_work, xen_blkbk_unmap_purged_grants);
 
-	INIT_LIST_HEAD(&blkif->pending_free);
-	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
-	spin_lock_init(&blkif->pending_free_lock);
-	init_waitqueue_head(&blkif->pending_free_wq);
-	init_waitqueue_head(&blkif->shutdown_wq);
+	ring = &blkif->ring;
+	ring->blkif = blkif;
+	spin_lock_init(&ring->blk_ring_lock);
+	init_waitqueue_head(&ring->wq);
+
+	INIT_LIST_HEAD(&ring->pending_free);
+	spin_lock_init(&ring->pending_free_lock);
+	init_waitqueue_head(&ring->pending_free_wq);
+	init_waitqueue_head(&ring->shutdown_wq);
 
 	return blkif;
 }
 
-static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
+static int xen_blkif_map(struct xen_blkif_ring *ring, grant_ref_t *gref,
 			 unsigned int nr_grefs, unsigned int evtchn)
 {
 	int err;
+	struct xen_blkif *blkif = ring->blkif;
 
 	/* Already connected through? */
-	if (blkif->irq)
+	if (ring->irq)
 		return 0;
 
 	err = xenbus_map_ring_valloc(blkif->be->dev, gref, nr_grefs,
-				     &blkif->blk_ring);
+				     &ring->blk_ring);
 	if (err < 0)
 		return err;
 
@@ -175,24 +175,24 @@ static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
 	case BLKIF_PROTOCOL_NATIVE:
 	{
 		struct blkif_sring *sring;
-		sring = (struct blkif_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.native, sring,
+		sring = (struct blkif_sring *)ring->blk_ring;
+		BACK_RING_INIT(&ring->blk_rings.native, sring,
 			       XEN_PAGE_SIZE * nr_grefs);
 		break;
 	}
 	case BLKIF_PROTOCOL_X86_32:
 	{
 		struct blkif_x86_32_sring *sring_x86_32;
-		sring_x86_32 = (struct blkif_x86_32_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32,
+		sring_x86_32 = (struct blkif_x86_32_sring *)ring->blk_ring;
+		BACK_RING_INIT(&ring->blk_rings.x86_32, sring_x86_32,
 			       XEN_PAGE_SIZE * nr_grefs);
 		break;
 	}
 	case BLKIF_PROTOCOL_X86_64:
 	{
 		struct blkif_x86_64_sring *sring_x86_64;
-		sring_x86_64 = (struct blkif_x86_64_sring *)blkif->blk_ring;
-		BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64,
+		sring_x86_64 = (struct blkif_x86_64_sring *)ring->blk_ring;
+		BACK_RING_INIT(&ring->blk_rings.x86_64, sring_x86_64,
 			       XEN_PAGE_SIZE * nr_grefs);
 		break;
 	}
@@ -202,13 +202,13 @@ static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
 
 	err = bind_interdomain_evtchn_to_irqhandler(blkif->domid, evtchn,
 						    xen_blkif_be_int, 0,
-						    "blkif-backend", blkif);
+						    "blkif-backend", ring);
 	if (err < 0) {
-		xenbus_unmap_ring_vfree(blkif->be->dev, blkif->blk_ring);
-		blkif->blk_rings.common.sring = NULL;
+		xenbus_unmap_ring_vfree(blkif->be->dev, ring->blk_ring);
+		ring->blk_rings.common.sring = NULL;
 		return err;
 	}
-	blkif->irq = err;
+	ring->irq = err;
 
 	return 0;
 }
@@ -217,35 +217,36 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 {
 	struct pending_req *req, *n;
 	int i = 0, j;
+	struct xen_blkif_ring *ring = &blkif->ring;
 
-	if (blkif->xenblkd) {
-		kthread_stop(blkif->xenblkd);
-		wake_up(&blkif->shutdown_wq);
-		blkif->xenblkd = NULL;
+	if (ring->xenblkd) {
+		kthread_stop(ring->xenblkd);
+		wake_up(&ring->shutdown_wq);
+		ring->xenblkd = NULL;
 	}
 
 	/* The above kthread_stop() guarantees that at this point we
 	 * don't have any discard_io or other_io requests. So, checking
 	 * for inflight IO is enough.
 	 */
-	if (atomic_read(&blkif->inflight) > 0)
+	if (atomic_read(&ring->inflight) > 0)
 		return -EBUSY;
 
-	if (blkif->irq) {
-		unbind_from_irqhandler(blkif->irq, blkif);
-		blkif->irq = 0;
+	if (ring->irq) {
+		unbind_from_irqhandler(ring->irq, ring);
+		ring->irq = 0;
 	}
 
-	if (blkif->blk_rings.common.sring) {
-		xenbus_unmap_ring_vfree(blkif->be->dev, blkif->blk_ring);
-		blkif->blk_rings.common.sring = NULL;
+	if (ring->blk_rings.common.sring) {
+		xenbus_unmap_ring_vfree(blkif->be->dev, ring->blk_ring);
+		ring->blk_rings.common.sring = NULL;
 	}
 
 	/* Remove all persistent grants and the cache of ballooned pages. */
-	xen_blkbk_free_caches(blkif);
+	xen_blkbk_free_caches(ring);
 
 	/* Check that there is no request in use */
-	list_for_each_entry_safe(req, n, &blkif->pending_free, free_list) {
+	list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
 		list_del(&req->free_list);
 
 		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++)
@@ -835,6 +836,7 @@ static int connect_ring(struct backend_info *be)
 	char protocol[64] = "";
 	struct pending_req *req, *n;
 	int err, i, j;
+	struct xen_blkif_ring *ring = &be->blkif->ring;
 
 	pr_debug("%s %s\n", __func__, dev->otherend);
 
@@ -923,7 +925,7 @@ static int connect_ring(struct backend_info *be)
 		req = kzalloc(sizeof(*req), GFP_KERNEL);
 		if (!req)
 			goto fail;
-		list_add_tail(&req->free_list, &be->blkif->pending_free);
+		list_add_tail(&req->free_list, &ring->pending_free);
 		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++) {
 			req->segments[j] = kzalloc(sizeof(*req->segments[0]), GFP_KERNEL);
 			if (!req->segments[j])
@@ -938,7 +940,7 @@ static int connect_ring(struct backend_info *be)
 	}
 
 	/* Map the shared frame, irq etc. */
-	err = xen_blkif_map(be->blkif, ring_ref, nr_grefs, evtchn);
+	err = xen_blkif_map(ring, ring_ref, nr_grefs, evtchn);
 	if (err) {
 		xenbus_dev_fatal(dev, err, "mapping ring-ref port %u", evtchn);
 		return err;
@@ -947,7 +949,7 @@ static int connect_ring(struct backend_info *be)
 	return 0;
 
 fail:
-	list_for_each_entry_safe(req, n, &be->blkif->pending_free, free_list) {
+	list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
 		list_del(&req->free_list);
 		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++) {
 			if (!req->segments[j])
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 07/10] xen/blkback: pseudo support for multi hardware queues/rings
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
                   ` (8 preceding siblings ...)
  2015-11-14  3:12   ` Bob Liu
@ 2015-11-14  3:12 ` Bob Liu
  2015-11-25 17:40   ` Konrad Rzeszutek Wilk
  2015-11-25 17:40   ` Konrad Rzeszutek Wilk
  2015-11-14  3:12 ` Bob Liu
                   ` (12 subsequent siblings)
  22 siblings, 2 replies; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, roger.pau, konrad.wilk, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel, Bob Liu

Preparatory patch for multiple hardware queues (rings). The number of
rings is unconditionally set to 1 here; a larger number will be enabled
in the next patch ("xen/blkback: get the number of hardware queues/rings
from blkfront"), so as to keep each individual patch small and readable.
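
For orientation while reading the xenbus.c diff below, here is a reduced
sketch of the per-device vs. per-ring split this patch makes. Field names
are taken from the diff; this is an illustration only, not the complete
structures from drivers/block/xen-blkback/common.h:

struct task_struct;			/* opaque in this sketch */
struct xen_blkif;

struct xen_blkif_ring {			/* one instance per hardware queue */
	unsigned int		irq;		/* event-channel irq, 0 = not connected */
	struct task_struct	*xenblkd;	/* per-ring service thread */
	struct xen_blkif	*blkif;		/* back pointer to the device */
};

struct xen_blkif {			/* one instance per virtual disk */
	struct xen_blkif_ring	*rings;		/* array of nr_rings entries */
	unsigned int		nr_rings;	/* unconditionally 1 after this patch */
};

When nr_rings is later negotiated to be larger than 1, each ring reads its
ring references and event channel from its own "queue-%u" subdirectory of
the frontend's xenstore directory (see read_per_ring_refs() below).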

Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkback/common.h |    3 +-
 drivers/block/xen-blkback/xenbus.c |  277 ++++++++++++++++++++++--------------
 2 files changed, 175 insertions(+), 105 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index f4dfa5b..f2386e3 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -340,7 +340,8 @@ struct xen_blkif {
 	struct work_struct	free_work;
 	unsigned int nr_ring_pages;
 	/* All rings for this device. */
-	struct xen_blkif_ring ring;
+	struct xen_blkif_ring *rings;
+	unsigned int nr_rings;
 };
 
 struct seg_buf {
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index e4bfc92..6c6e048 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -86,9 +86,11 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 {
 	int err;
 	char name[BLKBACK_NAME_LEN];
+	struct xen_blkif_ring *ring;
+	unsigned int i;
 
 	/* Not ready to connect? */
-	if (!blkif->ring.irq || !blkif->vbd.bdev)
+	if (!blkif->rings || !blkif->rings[0].irq || !blkif->vbd.bdev)
 		return;
 
 	/* Already connected? */
@@ -113,19 +115,55 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 	}
 	invalidate_inode_pages2(blkif->vbd.bdev->bd_inode->i_mapping);
 
-	blkif->ring.xenblkd = kthread_run(xen_blkif_schedule, &blkif->ring, "%s", name);
-	if (IS_ERR(blkif->ring.xenblkd)) {
-		err = PTR_ERR(blkif->ring.xenblkd);
-		blkif->ring.xenblkd = NULL;
-		xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
-		return;
+	for (i = 0; i < blkif->nr_rings; i++) {
+		ring = &blkif->rings[i];
+		ring->xenblkd = kthread_run(xen_blkif_schedule, ring, "%s-%d", name, i);
+		if (IS_ERR(ring->xenblkd)) {
+			err = PTR_ERR(ring->xenblkd);
+			ring->xenblkd = NULL;
+			xenbus_dev_fatal(blkif->be->dev, err,
+					"start %s-%d xenblkd", name, i);
+			goto out;
+		}
+	}
+	return;
+
+out:
+	while (i-- > 0) {
+		ring = &blkif->rings[i];
+		kthread_stop(ring->xenblkd);
 	}
+	return;
+}
+
+static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
+{
+	unsigned int r;
+
+	blkif->rings = kzalloc(blkif->nr_rings * sizeof(struct xen_blkif_ring), GFP_KERNEL);
+	if (!blkif->rings)
+		return -ENOMEM;
+
+	for (r = 0; r < blkif->nr_rings; r++) {
+		struct xen_blkif_ring *ring = &blkif->rings[r];
+
+		spin_lock_init(&ring->blk_ring_lock);
+		init_waitqueue_head(&ring->wq);
+		INIT_LIST_HEAD(&ring->pending_free);
+
+		spin_lock_init(&ring->pending_free_lock);
+		init_waitqueue_head(&ring->pending_free_wq);
+		init_waitqueue_head(&ring->shutdown_wq);
+		ring->blkif = blkif;
+		xen_blkif_get(blkif);
+	}
+
+	return 0;
 }
 
 static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 {
 	struct xen_blkif *blkif;
-	struct xen_blkif_ring *ring;
 
 	BUILD_BUG_ON(MAX_INDIRECT_PAGES > BLKIF_MAX_INDIRECT_PAGES_PER_REQUEST);
 
@@ -143,15 +181,11 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 	blkif->st_print = jiffies;
 	INIT_WORK(&blkif->persistent_purge_work, xen_blkbk_unmap_purged_grants);
 
-	ring = &blkif->ring;
-	ring->blkif = blkif;
-	spin_lock_init(&ring->blk_ring_lock);
-	init_waitqueue_head(&ring->wq);
-
-	INIT_LIST_HEAD(&ring->pending_free);
-	spin_lock_init(&ring->pending_free_lock);
-	init_waitqueue_head(&ring->pending_free_wq);
-	init_waitqueue_head(&ring->shutdown_wq);
+	blkif->nr_rings = 1;
+	if (xen_blkif_alloc_rings(blkif)) {
+		kmem_cache_free(xen_blkif_cachep, blkif);
+		return ERR_PTR(-ENOMEM);
+	}
 
 	return blkif;
 }
@@ -216,50 +250,54 @@ static int xen_blkif_map(struct xen_blkif_ring *ring, grant_ref_t *gref,
 static int xen_blkif_disconnect(struct xen_blkif *blkif)
 {
 	struct pending_req *req, *n;
-	int i = 0, j;
-	struct xen_blkif_ring *ring = &blkif->ring;
+	unsigned int j, r;
 
-	if (ring->xenblkd) {
-		kthread_stop(ring->xenblkd);
-		wake_up(&ring->shutdown_wq);
-		ring->xenblkd = NULL;
-	}
+	for (r = 0; r < blkif->nr_rings; r++) {
+		struct xen_blkif_ring *ring = &blkif->rings[r];
+		unsigned int i = 0;
 
-	/* The above kthread_stop() guarantees that at this point we
-	 * don't have any discard_io or other_io requests. So, checking
-	 * for inflight IO is enough.
-	 */
-	if (atomic_read(&ring->inflight) > 0)
-		return -EBUSY;
+		if (ring->xenblkd) {
+			kthread_stop(ring->xenblkd);
+			wake_up(&ring->shutdown_wq);
+			ring->xenblkd = NULL;
+		}
 
-	if (ring->irq) {
-		unbind_from_irqhandler(ring->irq, ring);
-		ring->irq = 0;
-	}
+		/* The above kthread_stop() guarantees that at this point we
+		 * don't have any discard_io or other_io requests. So, checking
+		 * for inflight IO is enough.
+		 */
+		if (atomic_read(&ring->inflight) > 0)
+			return -EBUSY;
 
-	if (ring->blk_rings.common.sring) {
-		xenbus_unmap_ring_vfree(blkif->be->dev, ring->blk_ring);
-		ring->blk_rings.common.sring = NULL;
-	}
+		if (ring->irq) {
+			unbind_from_irqhandler(ring->irq, ring);
+			ring->irq = 0;
+		}
 
-	/* Remove all persistent grants and the cache of ballooned pages. */
-	xen_blkbk_free_caches(ring);
+		if (ring->blk_rings.common.sring) {
+			xenbus_unmap_ring_vfree(blkif->be->dev, ring->blk_ring);
+			ring->blk_rings.common.sring = NULL;
+		}
 
-	/* Check that there is no request in use */
-	list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
-		list_del(&req->free_list);
+		/* Remove all persistent grants and the cache of ballooned pages. */
+		xen_blkbk_free_caches(ring);
 
-		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++)
-			kfree(req->segments[j]);
+		/* Check that there is no request in use */
+		list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
+			list_del(&req->free_list);
 
-		for (j = 0; j < MAX_INDIRECT_PAGES; j++)
-			kfree(req->indirect_pages[j]);
+			for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++)
+				kfree(req->segments[j]);
 
-		kfree(req);
-		i++;
-	}
+			for (j = 0; j < MAX_INDIRECT_PAGES; j++)
+				kfree(req->indirect_pages[j]);
 
-	WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
+			kfree(req);
+			i++;
+		}
+
+		WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
+	}
 	blkif->nr_ring_pages = 0;
 
 	return 0;
@@ -279,6 +317,7 @@ static void xen_blkif_free(struct xen_blkif *blkif)
 	BUG_ON(!list_empty(&blkif->free_pages));
 	BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
 
+	kfree(blkif->rings);
 	kmem_cache_free(xen_blkif_cachep, blkif);
 }
 
@@ -427,6 +466,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 static int xen_blkbk_remove(struct xenbus_device *dev)
 {
 	struct backend_info *be = dev_get_drvdata(&dev->dev);
+	unsigned int i;
 
 	pr_debug("%s %p %d\n", __func__, dev, dev->otherend_id);
 
@@ -443,7 +483,8 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
 
 	if (be->blkif) {
 		xen_blkif_disconnect(be->blkif);
-		xen_blkif_put(be->blkif);
+		for (i = 0; i < be->blkif->nr_rings; i++)
+			xen_blkif_put(be->blkif);
 	}
 
 	kfree(be->mode);
@@ -826,51 +867,43 @@ again:
 	xenbus_transaction_end(xbt, 1);
 }
 
-
-static int connect_ring(struct backend_info *be)
+/*
+ * Each ring may have multiple pages, depending on "ring-page-order".
+ */
+static int read_per_ring_refs(struct xen_blkif_ring *ring, const char *dir)
 {
-	struct xenbus_device *dev = be->dev;
 	unsigned int ring_ref[XENBUS_MAX_RING_GRANTS];
-	unsigned int evtchn, nr_grefs, ring_page_order;
-	unsigned int pers_grants;
-	char protocol[64] = "";
 	struct pending_req *req, *n;
 	int err, i, j;
-	struct xen_blkif_ring *ring = &be->blkif->ring;
-
-	pr_debug("%s %s\n", __func__, dev->otherend);
+	struct xen_blkif *blkif = ring->blkif;
+	struct xenbus_device *dev = blkif->be->dev;
+	unsigned int ring_page_order, nr_grefs, evtchn;
 
-	err = xenbus_scanf(XBT_NIL, dev->otherend, "event-channel", "%u",
+	err = xenbus_scanf(XBT_NIL, dir, "event-channel", "%u",
 			  &evtchn);
 	if (err != 1) {
 		err = -EINVAL;
-		xenbus_dev_fatal(dev, err, "reading %s/event-channel",
-				 dev->otherend);
+		xenbus_dev_fatal(dev, err, "reading %s/event-channel", dir);
 		return err;
 	}
-	pr_info("event-channel %u\n", evtchn);
 
 	err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
 			  &ring_page_order);
 	if (err != 1) {
-		err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-ref",
-				  "%u", &ring_ref[0]);
+		err = xenbus_scanf(XBT_NIL, dir, "ring-ref", "%u", &ring_ref[0]);
 		if (err != 1) {
 			err = -EINVAL;
-			xenbus_dev_fatal(dev, err, "reading %s/ring-ref",
-					 dev->otherend);
+			xenbus_dev_fatal(dev, err, "reading %s/ring-ref", dir);
 			return err;
 		}
 		nr_grefs = 1;
-		pr_info("%s:using single page: ring-ref %d\n", dev->otherend,
-			ring_ref[0]);
 	} else {
 		unsigned int i;
 
 		if (ring_page_order > xen_blkif_max_ring_order) {
 			err = -EINVAL;
 			xenbus_dev_fatal(dev, err, "%s/request %d ring page order exceed max:%d",
-					 dev->otherend, ring_page_order,
+					 dir, ring_page_order,
 					 xen_blkif_max_ring_order);
 			return err;
 		}
@@ -880,46 +913,17 @@ static int connect_ring(struct backend_info *be)
 			char ring_ref_name[RINGREF_NAME_LEN];
 
 			snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
-			err = xenbus_scanf(XBT_NIL, dev->otherend, ring_ref_name,
+			err = xenbus_scanf(XBT_NIL, dir, ring_ref_name,
 					   "%u", &ring_ref[i]);
 			if (err != 1) {
 				err = -EINVAL;
 				xenbus_dev_fatal(dev, err, "reading %s/%s",
-						 dev->otherend, ring_ref_name);
+						 dir, ring_ref_name);
 				return err;
 			}
-			pr_info("ring-ref%u: %u\n", i, ring_ref[i]);
 		}
 	}
-
-	be->blkif->blk_protocol = BLKIF_PROTOCOL_DEFAULT;
-	err = xenbus_gather(XBT_NIL, dev->otherend, "protocol",
-			    "%63s", protocol, NULL);
-	if (err)
-		strcpy(protocol, "unspecified, assuming default");
-	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_NATIVE))
-		be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
-	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_32))
-		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_32;
-	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_64))
-		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_64;
-	else {
-		xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
-		return -1;
-	}
-	err = xenbus_gather(XBT_NIL, dev->otherend,
-			    "feature-persistent", "%u",
-			    &pers_grants, NULL);
-	if (err)
-		pers_grants = 0;
-
-	be->blkif->vbd.feature_gnt_persistent = pers_grants;
-	be->blkif->vbd.overflow_max_grants = 0;
-	be->blkif->nr_ring_pages = nr_grefs;
-
-	pr_info("ring-pages:%d, event-channel %d, protocol %d (%s) %s\n",
-		nr_grefs, evtchn, be->blkif->blk_protocol, protocol,
-		pers_grants ? "persistent grants" : "");
+	blkif->nr_ring_pages = nr_grefs;
 
 	for (i = 0; i < nr_grefs * XEN_BLKIF_REQS_PER_PAGE; i++) {
 		req = kzalloc(sizeof(*req), GFP_KERNEL);
@@ -964,6 +968,71 @@ fail:
 		kfree(req);
 	}
 	return -ENOMEM;
+
+}
+
+static int connect_ring(struct backend_info *be)
+{
+	struct xenbus_device *dev = be->dev;
+	unsigned int pers_grants;
+	char protocol[64] = "";
+	int err, i;
+	char *xspath;
+	size_t xspathsize;
+	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
+
+	pr_debug("%s %s\n", __func__, dev->otherend);
+
+	be->blkif->blk_protocol = BLKIF_PROTOCOL_DEFAULT;
+	err = xenbus_gather(XBT_NIL, dev->otherend, "protocol",
+			    "%63s", protocol, NULL);
+	if (err)
+		strcpy(protocol, "unspecified, assuming default");
+	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_NATIVE))
+		be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
+	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_32))
+		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_32;
+	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_64))
+		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_64;
+	else {
+		xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
+		return -1;
+	}
+	err = xenbus_gather(XBT_NIL, dev->otherend,
+			    "feature-persistent", "%u",
+			    &pers_grants, NULL);
+	if (err)
+		pers_grants = 0;
+
+	be->blkif->vbd.feature_gnt_persistent = pers_grants;
+	be->blkif->vbd.overflow_max_grants = 0;
+
+	pr_info("%s: using %d queues, protocol %d (%s) %s\n", dev->nodename,
+		 be->blkif->nr_rings, be->blkif->blk_protocol, protocol,
+		 pers_grants ? "persistent grants" : "");
+
+	if (be->blkif->nr_rings == 1)
+		return read_per_ring_refs(&be->blkif->rings[0], dev->otherend);
+	else {
+		xspathsize = strlen(dev->otherend) + xenstore_path_ext_size;
+		xspath = kmalloc(xspathsize, GFP_KERNEL);
+		if (!xspath) {
+			xenbus_dev_fatal(dev, -ENOMEM, "reading ring references");
+			return -ENOMEM;
+		}
+
+		for (i = 0; i < be->blkif->nr_rings; i++) {
+			memset(xspath, 0, xspathsize);
+			snprintf(xspath, xspathsize, "%s/queue-%u", dev->otherend, i);
+			err = read_per_ring_refs(&be->blkif->rings[i], xspath);
+			if (err) {
+				kfree(xspath);
+				return err;
+			}
+		}
+		kfree(xspath);
+	}
+	return 0;
 }
 
 static const struct xenbus_device_id xen_blkbk_ids[] = {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread


* [PATCH v5 08/10] xen/blkback: get the number of hardware queues/rings from blkfront
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
                   ` (11 preceding siblings ...)
  2015-11-14  3:12 ` [PATCH v5 08/10] xen/blkback: get the number of hardware queues/rings from blkfront Bob Liu
@ 2015-11-14  3:12 ` Bob Liu
  2015-11-25 17:58   ` Konrad Rzeszutek Wilk
  2015-11-25 17:58   ` Konrad Rzeszutek Wilk
  2015-11-14  3:12 ` [PATCH v5 09/10] xen/blkfront: make persistent grants pool per-queue Bob Liu
                   ` (9 subsequent siblings)
  22 siblings, 2 replies; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, roger.pau, konrad.wilk, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel, Bob Liu

The backend advertises "multi-queue-max-queues" to the frontend, and reads
the negotiated number of queues back from "multi-queue-num-queues" written
by blkfront.
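
As a rough illustration of the rule this adds to connect_ring(), here is a
standalone sketch of the negotiation. negotiate_nr_rings() and its arguments
are invented for this example and are not kernel symbols; the real code reads
the keys with xenbus_scanf() and reports bad values via xenbus_dev_fatal():

#include <stdio.h>

/*
 * The backend publishes its limit in "multi-queue-max-queues"; the frontend
 * writes the number of queues it wants in "multi-queue-num-queues".  A
 * missing frontend key means a legacy single-queue guest; a value of zero,
 * or anything above the advertised maximum, is treated as a buggy or
 * malicious request.
 */
static int negotiate_nr_rings(int key_present, unsigned int requested,
			      unsigned int backend_max)
{
	if (!key_present)
		return 1;		/* legacy frontend: one ring */
	if (requested == 0 || requested > backend_max)
		return -1;		/* reject the request */
	return (int)requested;
}

int main(void)
{
	printf("%d\n", negotiate_nr_rings(0, 0, 4));	/*  1: no key written   */
	printf("%d\n", negotiate_nr_rings(1, 3, 4));	/*  3: within the limit */
	printf("%d\n", negotiate_nr_rings(1, 8, 4));	/* -1: above the limit  */
	return 0;
}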

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkback/blkback.c |   12 ++++++++++++
 drivers/block/xen-blkback/common.h  |    1 +
 drivers/block/xen-blkback/xenbus.c  |   34 ++++++++++++++++++++++++++++------
 3 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index fb5bfd4..acedc46 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -84,6 +84,15 @@ MODULE_PARM_DESC(max_persistent_grants,
                  "Maximum number of grants to map persistently");
 
 /*
+ * Maximum number of rings/queues blkback supports, allow as many queues as there
+ * are CPUs if user has not specified a value.
+ */
+unsigned int xenblk_max_queues;
+module_param_named(max_queues, xenblk_max_queues, uint, 0644);
+MODULE_PARM_DESC(max_queues,
+		 "Maximum number of hardware queues per virtual disk");
+
+/*
  * Maximum order of pages to be used for the shared ring between front and
  * backend, 4KB page granularity is used.
  */
@@ -1483,6 +1492,9 @@ static int __init xen_blkif_init(void)
 		xen_blkif_max_ring_order = XENBUS_MAX_RING_GRANT_ORDER;
 	}
 
+	if (xenblk_max_queues == 0)
+		xenblk_max_queues = num_online_cpus();
+
 	rc = xen_blkif_interface_init();
 	if (rc)
 		goto failed_init;
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index f2386e3..0833dc6 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -46,6 +46,7 @@
 #include <xen/interface/io/protocols.h>
 
 extern unsigned int xen_blkif_max_ring_order;
+extern unsigned int xenblk_max_queues;
 /*
  * This is the maximum number of segments that would be allowed in indirect
  * requests. This value will also be passed to the frontend.
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 6c6e048..d83b790 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -181,12 +181,6 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 	blkif->st_print = jiffies;
 	INIT_WORK(&blkif->persistent_purge_work, xen_blkbk_unmap_purged_grants);
 
-	blkif->nr_rings = 1;
-	if (xen_blkif_alloc_rings(blkif)) {
-		kmem_cache_free(xen_blkif_cachep, blkif);
-		return ERR_PTR(-ENOMEM);
-	}
-
 	return blkif;
 }
 
@@ -595,6 +589,12 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
 		goto fail;
 	}
 
+	/* Multi-queue: write how many queues are supported by the backend. */
+	err = xenbus_printf(XBT_NIL, dev->nodename,
+			    "multi-queue-max-queues", "%u", xenblk_max_queues);
+	if (err)
+		pr_warn("Error writing multi-queue-num-queues\n");
+
 	/* setup back pointer */
 	be->blkif->be = be;
 
@@ -980,6 +980,7 @@ static int connect_ring(struct backend_info *be)
 	char *xspath;
 	size_t xspathsize;
 	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
+	unsigned int requested_num_queues = 0;
 
 	pr_debug("%s %s\n", __func__, dev->otherend);
 
@@ -1007,6 +1008,27 @@ static int connect_ring(struct backend_info *be)
 	be->blkif->vbd.feature_gnt_persistent = pers_grants;
 	be->blkif->vbd.overflow_max_grants = 0;
 
+	/*
+	 * Read the number of hardware queues from frontend.
+	 */
+	err = xenbus_scanf(XBT_NIL, dev->otherend, "multi-queue-num-queues",
+			   "%u", &requested_num_queues);
+	if (err < 0) {
+		requested_num_queues = 1;
+	} else {
+		if (requested_num_queues > xenblk_max_queues
+		    || requested_num_queues == 0) {
+			/* buggy or malicious guest */
+			xenbus_dev_fatal(dev, err,
+					"guest requested %u queues, exceeding the maximum of %u.",
+					requested_num_queues, xenblk_max_queues);
+			return -1;
+		}
+	}
+	be->blkif->nr_rings = requested_num_queues;
+	if (xen_blkif_alloc_rings(be->blkif))
+		return -ENOMEM;
+
 	pr_info("%s: using %d queues, protocol %d (%s) %s\n", dev->nodename,
 		 be->blkif->nr_rings, be->blkif->blk_protocol, protocol,
 		 pers_grants ? "persistent grants" : "");
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread


* [PATCH v5 09/10] xen/blkfront: make persistent grants pool per-queue
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
                   ` (12 preceding siblings ...)
  2015-11-14  3:12 ` Bob Liu
@ 2015-11-14  3:12 ` Bob Liu
  2015-11-14  3:12 ` Bob Liu
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, roger.pau, konrad.wilk, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel, Bob Liu

Make persistent grants per-queue/ring instead of per-device, so that we can
drop the 'dev_lock' and get better scalability.
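
A reduced sketch of where the pool moves; field names are taken from the
diff, this is not the full blkfront structure, and list_head here is only a
minimal stand-in for the kernel's type:

struct list_head { struct list_head *next, *prev; };
struct blkfront_info;

struct blkfront_ring_info {
	/*
	 * Each ring now owns its own grant pool, so the pool is only touched
	 * from that ring's context and no device-wide lock is required.
	 */
	struct list_head	grants;			/* per-ring grant pool */
	unsigned int		persistent_gnts_c;	/* grants cached in the pool */
	struct blkfront_info	*dev_info;
};

/* blkfront_info correspondingly loses dev_lock, grants and persistent_gnts_c. */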

Test was done based on null_blk driver:
dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
domu: v4.2-rc8 16vcpus 10GB

[test]
rw=read
direct=1
ioengine=libaio
bs=4k
time_based
runtime=30
filename=/dev/xvdb
numjobs=16
iodepth=64
iodepth_batch=64
iodepth_batch_complete=64
group_reporting

Queues:			  1 	   4 	  	  8 	 	 16
Iops orig(k):		810 	1064 		780 		700
Iops patched(k):	810     1230(~20%)	1024(~20%)	850(~20%)

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkfront.c |  110 +++++++++++++++++-------------------------
 1 file changed, 43 insertions(+), 67 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 84496be..451f852 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -142,6 +142,8 @@ struct blkfront_ring_info {
 	struct gnttab_free_callback callback;
 	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
 	struct list_head indirect_pages;
+	struct list_head grants;
+	unsigned int persistent_gnts_c;
 	unsigned long shadow_free;
 	struct blkfront_info *dev_info;
 };
@@ -162,13 +164,6 @@ struct blkfront_info
 	/* Number of pages per ring buffer. */
 	unsigned int nr_ring_pages;
 	struct request_queue *rq;
-	/*
-	 * Lock to protect info->grants list and persistent_gnts_c shared by all
-	 * rings.
-	 */
-	spinlock_t dev_lock;
-	struct list_head grants;
-	unsigned int persistent_gnts_c;
 	unsigned int feature_flush;
 	unsigned int feature_discard:1;
 	unsigned int feature_secdiscard:1;
@@ -272,9 +267,7 @@ static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 		}
 
 		gnt_list_entry->gref = GRANT_INVALID_REF;
-		spin_lock_irq(&info->dev_lock);
-		list_add(&gnt_list_entry->node, &info->grants);
-		spin_unlock_irq(&info->dev_lock);
+		list_add(&gnt_list_entry->node, &rinfo->grants);
 		i++;
 	}
 
@@ -282,10 +275,8 @@ static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 
 out_of_memory:
 	list_for_each_entry_safe(gnt_list_entry, n,
-	                         &info->grants, node) {
-		spin_lock_irq(&info->dev_lock);
+	                         &rinfo->grants, node) {
 		list_del(&gnt_list_entry->node);
-		spin_unlock_irq(&info->dev_lock);
 		if (info->feature_persistent)
 			__free_page(gnt_list_entry->page);
 		kfree(gnt_list_entry);
@@ -295,20 +286,17 @@ out_of_memory:
 	return -ENOMEM;
 }
 
-static struct grant *get_free_grant(struct blkfront_info *info)
+static struct grant *get_free_grant(struct blkfront_ring_info *rinfo)
 {
 	struct grant *gnt_list_entry;
-	unsigned long flags;
 
-	spin_lock_irqsave(&info->dev_lock, flags);
-	BUG_ON(list_empty(&info->grants));
-	gnt_list_entry = list_first_entry(&info->grants, struct grant,
+	BUG_ON(list_empty(&rinfo->grants));
+	gnt_list_entry = list_first_entry(&rinfo->grants, struct grant,
 					  node);
 	list_del(&gnt_list_entry->node);
 
 	if (gnt_list_entry->gref != GRANT_INVALID_REF)
-		info->persistent_gnts_c--;
-	spin_unlock_irqrestore(&info->dev_lock, flags);
+		rinfo->persistent_gnts_c--;
 
 	return gnt_list_entry;
 }
@@ -324,9 +312,10 @@ static inline void grant_foreign_access(const struct grant *gnt_list_entry,
 
 static struct grant *get_grant(grant_ref_t *gref_head,
 			       unsigned long gfn,
-			       struct blkfront_info *info)
+			       struct blkfront_ring_info *rinfo)
 {
-	struct grant *gnt_list_entry = get_free_grant(info);
+	struct grant *gnt_list_entry = get_free_grant(rinfo);
+	struct blkfront_info *info = rinfo->dev_info;
 
 	if (gnt_list_entry->gref != GRANT_INVALID_REF)
 		return gnt_list_entry;
@@ -347,9 +336,10 @@ static struct grant *get_grant(grant_ref_t *gref_head,
 }
 
 static struct grant *get_indirect_grant(grant_ref_t *gref_head,
-					struct blkfront_info *info)
+					struct blkfront_ring_info *rinfo)
 {
-	struct grant *gnt_list_entry = get_free_grant(info);
+	struct grant *gnt_list_entry = get_free_grant(rinfo);
+	struct blkfront_info *info = rinfo->dev_info;
 
 	if (gnt_list_entry->gref != GRANT_INVALID_REF)
 		return gnt_list_entry;
@@ -361,8 +351,8 @@ static struct grant *get_indirect_grant(grant_ref_t *gref_head,
 		struct page *indirect_page;
 
 		/* Fetch a pre-allocated page to use for indirect grefs */
-		BUG_ON(list_empty(&info->rinfo->indirect_pages));
-		indirect_page = list_first_entry(&info->rinfo->indirect_pages,
+		BUG_ON(list_empty(&rinfo->indirect_pages));
+		indirect_page = list_first_entry(&rinfo->indirect_pages,
 						 struct page, lru);
 		list_del(&indirect_page->lru);
 		gnt_list_entry->page = indirect_page;
@@ -543,7 +533,6 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 	unsigned int grant_idx = setup->grant_idx;
 	struct blkif_request *ring_req = setup->ring_req;
 	struct blkfront_ring_info *rinfo = setup->rinfo;
-	struct blkfront_info *info = rinfo->dev_info;
 	struct blk_shadow *shadow = &rinfo->shadow[setup->id];
 
 	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
@@ -552,13 +541,13 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 			kunmap_atomic(setup->segments);
 
 		n = grant_idx / GRANTS_PER_INDIRECT_FRAME;
-		gnt_list_entry = get_indirect_grant(&setup->gref_head, info);
+		gnt_list_entry = get_indirect_grant(&setup->gref_head, rinfo);
 		shadow->indirect_grants[n] = gnt_list_entry;
 		setup->segments = kmap_atomic(gnt_list_entry->page);
 		ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
 	}
 
-	gnt_list_entry = get_grant(&setup->gref_head, gfn, info);
+	gnt_list_entry = get_grant(&setup->gref_head, gfn, rinfo);
 	ref = gnt_list_entry->gref;
 	shadow->grants_used[grant_idx] = gnt_list_entry;
 
@@ -1129,7 +1118,7 @@ static void blkif_restart_queue(struct work_struct *work)
 
 static void blkif_free_ring(struct blkfront_ring_info *rinfo)
 {
-	struct grant *persistent_gnt;
+	struct grant *persistent_gnt, *n;
 	struct blkfront_info *info = rinfo->dev_info;
 	int i, j, segs;
 
@@ -1147,6 +1136,23 @@ static void blkif_free_ring(struct blkfront_ring_info *rinfo)
 		}
 	}
 
+	/* Remove all persistent grants. */
+	if (!list_empty(&rinfo->grants)) {
+		list_for_each_entry_safe(persistent_gnt, n,
+					 &rinfo->grants, node) {
+			list_del(&persistent_gnt->node);
+			if (persistent_gnt->gref != GRANT_INVALID_REF) {
+				gnttab_end_foreign_access(persistent_gnt->gref,
+							  0, 0UL);
+				rinfo->persistent_gnts_c--;
+			}
+			if (info->feature_persistent)
+				__free_page(persistent_gnt->page);
+			kfree(persistent_gnt);
+		}
+	}
+	BUG_ON(rinfo->persistent_gnts_c != 0);
+
 	for (i = 0; i < BLK_RING_SIZE(info); i++) {
 		/*
 		 * Clear persistent grants present in requests already
@@ -1212,7 +1218,6 @@ free_shadow:
 
 static void blkif_free(struct blkfront_info *info, int suspend)
 {
-	struct grant *persistent_gnt, *n;
 	unsigned int i;
 
 	/* Prevent new requests being issued until we fix things up. */
@@ -1222,25 +1227,6 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 	if (info->rq)
 		blk_mq_stop_hw_queues(info->rq);
 
-	/* Remove all persistent grants */
-	spin_lock_irq(&info->dev_lock);
-	if (!list_empty(&info->grants)) {
-		list_for_each_entry_safe(persistent_gnt, n,
-					 &info->grants, node) {
-			list_del(&persistent_gnt->node);
-			if (persistent_gnt->gref != GRANT_INVALID_REF) {
-				gnttab_end_foreign_access(persistent_gnt->gref,
-							  0, 0UL);
-				info->persistent_gnts_c--;
-			}
-			if (info->feature_persistent)
-				__free_page(persistent_gnt->page);
-			kfree(persistent_gnt);
-		}
-	}
-	BUG_ON(info->persistent_gnts_c != 0);
-	spin_unlock_irq(&info->dev_lock);
-
 	for (i = 0; i < info->nr_rings; i++)
 		blkif_free_ring(&info->rinfo[i]);
 	kfree(info->rinfo);
@@ -1279,7 +1265,6 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 	int i = 0;
 	struct scatterlist *sg;
 	int num_sg, num_grant;
-	unsigned long flags;
 	struct blkfront_info *info = rinfo->dev_info;
 	struct copy_from_grant data = {
 		.s = s,
@@ -1318,10 +1303,8 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 			if (!info->feature_persistent)
 				pr_alert_ratelimited("backed has not unmapped grant: %u\n",
 						     s->grants_used[i]->gref);
-			spin_lock_irqsave(&info->dev_lock, flags);
-			list_add(&s->grants_used[i]->node, &info->grants);
-			info->persistent_gnts_c++;
-			spin_unlock_irqrestore(&info->dev_lock, flags);
+			list_add(&s->grants_used[i]->node, &rinfo->grants);
+			rinfo->persistent_gnts_c++;
 		} else {
 			/*
 			 * If the grant is not mapped by the backend we end the
@@ -1331,9 +1314,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 			 */
 			gnttab_end_foreign_access(s->grants_used[i]->gref, 0, 0UL);
 			s->grants_used[i]->gref = GRANT_INVALID_REF;
-			spin_lock_irqsave(&info->dev_lock, flags);
-			list_add_tail(&s->grants_used[i]->node, &info->grants);
-			spin_unlock_irqrestore(&info->dev_lock, flags);
+			list_add_tail(&s->grants_used[i]->node, &rinfo->grants);
 		}
 	}
 	if (s->req.operation == BLKIF_OP_INDIRECT) {
@@ -1342,10 +1323,8 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 				if (!info->feature_persistent)
 					pr_alert_ratelimited("backed has not unmapped grant: %u\n",
 							     s->indirect_grants[i]->gref);
-				spin_lock_irqsave(&info->dev_lock, flags);
-				list_add(&s->indirect_grants[i]->node, &info->grants);
-				info->persistent_gnts_c++;
-				spin_unlock_irqrestore(&info->dev_lock, flags);
+				list_add(&s->indirect_grants[i]->node, &rinfo->grants);
+				rinfo->persistent_gnts_c++;
 			} else {
 				struct page *indirect_page;
 
@@ -1359,9 +1338,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 					list_add(&indirect_page->lru, &rinfo->indirect_pages);
 				}
 				s->indirect_grants[i]->gref = GRANT_INVALID_REF;
-				spin_lock_irqsave(&info->dev_lock, flags);
-				list_add_tail(&s->indirect_grants[i]->node, &info->grants);
-				spin_unlock_irqrestore(&info->dev_lock, flags);
+				list_add_tail(&s->indirect_grants[i]->node, &rinfo->grants);
 			}
 		}
 	}
@@ -1782,15 +1759,14 @@ static int blkfront_probe(struct xenbus_device *dev,
 
 		rinfo = &info->rinfo[r_index];
 		INIT_LIST_HEAD(&rinfo->indirect_pages);
+		INIT_LIST_HEAD(&rinfo->grants);
 		rinfo->dev_info = info;
 		INIT_WORK(&rinfo->work, blkif_restart_queue);
 		spin_lock_init(&rinfo->ring_lock);
 	}
 
 	mutex_init(&info->mutex);
-	spin_lock_init(&info->dev_lock);
 	info->vdevice = vdevice;
-	INIT_LIST_HEAD(&info->grants);
 	info->connected = BLKIF_STATE_DISCONNECTED;
 
 	/* Front end dir is a number, which is used as the id. */
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 09/10] xen/blkfront: make persistent grants pool per-queue
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
                   ` (13 preceding siblings ...)
  2015-11-14  3:12 ` [PATCH v5 09/10] xen/blkfront: make persistent grants pool per-queue Bob Liu
@ 2015-11-14  3:12 ` Bob Liu
  2015-11-14  3:12 ` [PATCH v5 10/10] xen/blkback: make pool of persistent grants and free pages per-queue Bob Liu
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: jonathan.davies, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, axboe, Bob Liu, david.vrabel, avanzini.arianna,
	roger.pau

Make persistent grants per-queue/ring instead of per-device, so that we can
drop the 'dev_lock' and get better scalability.

Test was done based on null_blk driver:
dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
domu: v4.2-rc8 16vcpus 10GB

[test]
rw=read
direct=1
ioengine=libaio
bs=4k
time_based
runtime=30
filename=/dev/xvdb
numjobs=16
iodepth=64
iodepth_batch=64
iodepth_batch_complete=64
group_reporting

Queues:			  1 	   4 	  	  8 	 	 16
Iops orig(k):		810 	1064 		780 		700
Iops patched(k):	810     1230(~20%)	1024(~20%)	850(~20%)

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkfront.c |  110 +++++++++++++++++-------------------------
 1 file changed, 43 insertions(+), 67 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 84496be..451f852 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -142,6 +142,8 @@ struct blkfront_ring_info {
 	struct gnttab_free_callback callback;
 	struct blk_shadow shadow[BLK_MAX_RING_SIZE];
 	struct list_head indirect_pages;
+	struct list_head grants;
+	unsigned int persistent_gnts_c;
 	unsigned long shadow_free;
 	struct blkfront_info *dev_info;
 };
@@ -162,13 +164,6 @@ struct blkfront_info
 	/* Number of pages per ring buffer. */
 	unsigned int nr_ring_pages;
 	struct request_queue *rq;
-	/*
-	 * Lock to protect info->grants list and persistent_gnts_c shared by all
-	 * rings.
-	 */
-	spinlock_t dev_lock;
-	struct list_head grants;
-	unsigned int persistent_gnts_c;
 	unsigned int feature_flush;
 	unsigned int feature_discard:1;
 	unsigned int feature_secdiscard:1;
@@ -272,9 +267,7 @@ static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 		}
 
 		gnt_list_entry->gref = GRANT_INVALID_REF;
-		spin_lock_irq(&info->dev_lock);
-		list_add(&gnt_list_entry->node, &info->grants);
-		spin_unlock_irq(&info->dev_lock);
+		list_add(&gnt_list_entry->node, &rinfo->grants);
 		i++;
 	}
 
@@ -282,10 +275,8 @@ static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 
 out_of_memory:
 	list_for_each_entry_safe(gnt_list_entry, n,
-	                         &info->grants, node) {
-		spin_lock_irq(&info->dev_lock);
+	                         &rinfo->grants, node) {
 		list_del(&gnt_list_entry->node);
-		spin_unlock_irq(&info->dev_lock);
 		if (info->feature_persistent)
 			__free_page(gnt_list_entry->page);
 		kfree(gnt_list_entry);
@@ -295,20 +286,17 @@ out_of_memory:
 	return -ENOMEM;
 }
 
-static struct grant *get_free_grant(struct blkfront_info *info)
+static struct grant *get_free_grant(struct blkfront_ring_info *rinfo)
 {
 	struct grant *gnt_list_entry;
-	unsigned long flags;
 
-	spin_lock_irqsave(&info->dev_lock, flags);
-	BUG_ON(list_empty(&info->grants));
-	gnt_list_entry = list_first_entry(&info->grants, struct grant,
+	BUG_ON(list_empty(&rinfo->grants));
+	gnt_list_entry = list_first_entry(&rinfo->grants, struct grant,
 					  node);
 	list_del(&gnt_list_entry->node);
 
 	if (gnt_list_entry->gref != GRANT_INVALID_REF)
-		info->persistent_gnts_c--;
-	spin_unlock_irqrestore(&info->dev_lock, flags);
+		rinfo->persistent_gnts_c--;
 
 	return gnt_list_entry;
 }
@@ -324,9 +312,10 @@ static inline void grant_foreign_access(const struct grant *gnt_list_entry,
 
 static struct grant *get_grant(grant_ref_t *gref_head,
 			       unsigned long gfn,
-			       struct blkfront_info *info)
+			       struct blkfront_ring_info *rinfo)
 {
-	struct grant *gnt_list_entry = get_free_grant(info);
+	struct grant *gnt_list_entry = get_free_grant(rinfo);
+	struct blkfront_info *info = rinfo->dev_info;
 
 	if (gnt_list_entry->gref != GRANT_INVALID_REF)
 		return gnt_list_entry;
@@ -347,9 +336,10 @@ static struct grant *get_grant(grant_ref_t *gref_head,
 }
 
 static struct grant *get_indirect_grant(grant_ref_t *gref_head,
-					struct blkfront_info *info)
+					struct blkfront_ring_info *rinfo)
 {
-	struct grant *gnt_list_entry = get_free_grant(info);
+	struct grant *gnt_list_entry = get_free_grant(rinfo);
+	struct blkfront_info *info = rinfo->dev_info;
 
 	if (gnt_list_entry->gref != GRANT_INVALID_REF)
 		return gnt_list_entry;
@@ -361,8 +351,8 @@ static struct grant *get_indirect_grant(grant_ref_t *gref_head,
 		struct page *indirect_page;
 
 		/* Fetch a pre-allocated page to use for indirect grefs */
-		BUG_ON(list_empty(&info->rinfo->indirect_pages));
-		indirect_page = list_first_entry(&info->rinfo->indirect_pages,
+		BUG_ON(list_empty(&rinfo->indirect_pages));
+		indirect_page = list_first_entry(&rinfo->indirect_pages,
 						 struct page, lru);
 		list_del(&indirect_page->lru);
 		gnt_list_entry->page = indirect_page;
@@ -543,7 +533,6 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 	unsigned int grant_idx = setup->grant_idx;
 	struct blkif_request *ring_req = setup->ring_req;
 	struct blkfront_ring_info *rinfo = setup->rinfo;
-	struct blkfront_info *info = rinfo->dev_info;
 	struct blk_shadow *shadow = &rinfo->shadow[setup->id];
 
 	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
@@ -552,13 +541,13 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 			kunmap_atomic(setup->segments);
 
 		n = grant_idx / GRANTS_PER_INDIRECT_FRAME;
-		gnt_list_entry = get_indirect_grant(&setup->gref_head, info);
+		gnt_list_entry = get_indirect_grant(&setup->gref_head, rinfo);
 		shadow->indirect_grants[n] = gnt_list_entry;
 		setup->segments = kmap_atomic(gnt_list_entry->page);
 		ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
 	}
 
-	gnt_list_entry = get_grant(&setup->gref_head, gfn, info);
+	gnt_list_entry = get_grant(&setup->gref_head, gfn, rinfo);
 	ref = gnt_list_entry->gref;
 	shadow->grants_used[grant_idx] = gnt_list_entry;
 
@@ -1129,7 +1118,7 @@ static void blkif_restart_queue(struct work_struct *work)
 
 static void blkif_free_ring(struct blkfront_ring_info *rinfo)
 {
-	struct grant *persistent_gnt;
+	struct grant *persistent_gnt, *n;
 	struct blkfront_info *info = rinfo->dev_info;
 	int i, j, segs;
 
@@ -1147,6 +1136,23 @@ static void blkif_free_ring(struct blkfront_ring_info *rinfo)
 		}
 	}
 
+	/* Remove all persistent grants. */
+	if (!list_empty(&rinfo->grants)) {
+		list_for_each_entry_safe(persistent_gnt, n,
+					 &rinfo->grants, node) {
+			list_del(&persistent_gnt->node);
+			if (persistent_gnt->gref != GRANT_INVALID_REF) {
+				gnttab_end_foreign_access(persistent_gnt->gref,
+							  0, 0UL);
+				rinfo->persistent_gnts_c--;
+			}
+			if (info->feature_persistent)
+				__free_page(persistent_gnt->page);
+			kfree(persistent_gnt);
+		}
+	}
+	BUG_ON(rinfo->persistent_gnts_c != 0);
+
 	for (i = 0; i < BLK_RING_SIZE(info); i++) {
 		/*
 		 * Clear persistent grants present in requests already
@@ -1212,7 +1218,6 @@ free_shadow:
 
 static void blkif_free(struct blkfront_info *info, int suspend)
 {
-	struct grant *persistent_gnt, *n;
 	unsigned int i;
 
 	/* Prevent new requests being issued until we fix things up. */
@@ -1222,25 +1227,6 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 	if (info->rq)
 		blk_mq_stop_hw_queues(info->rq);
 
-	/* Remove all persistent grants */
-	spin_lock_irq(&info->dev_lock);
-	if (!list_empty(&info->grants)) {
-		list_for_each_entry_safe(persistent_gnt, n,
-					 &info->grants, node) {
-			list_del(&persistent_gnt->node);
-			if (persistent_gnt->gref != GRANT_INVALID_REF) {
-				gnttab_end_foreign_access(persistent_gnt->gref,
-							  0, 0UL);
-				info->persistent_gnts_c--;
-			}
-			if (info->feature_persistent)
-				__free_page(persistent_gnt->page);
-			kfree(persistent_gnt);
-		}
-	}
-	BUG_ON(info->persistent_gnts_c != 0);
-	spin_unlock_irq(&info->dev_lock);
-
 	for (i = 0; i < info->nr_rings; i++)
 		blkif_free_ring(&info->rinfo[i]);
 	kfree(info->rinfo);
@@ -1279,7 +1265,6 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 	int i = 0;
 	struct scatterlist *sg;
 	int num_sg, num_grant;
-	unsigned long flags;
 	struct blkfront_info *info = rinfo->dev_info;
 	struct copy_from_grant data = {
 		.s = s,
@@ -1318,10 +1303,8 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 			if (!info->feature_persistent)
 				pr_alert_ratelimited("backed has not unmapped grant: %u\n",
 						     s->grants_used[i]->gref);
-			spin_lock_irqsave(&info->dev_lock, flags);
-			list_add(&s->grants_used[i]->node, &info->grants);
-			info->persistent_gnts_c++;
-			spin_unlock_irqrestore(&info->dev_lock, flags);
+			list_add(&s->grants_used[i]->node, &rinfo->grants);
+			rinfo->persistent_gnts_c++;
 		} else {
 			/*
 			 * If the grant is not mapped by the backend we end the
@@ -1331,9 +1314,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 			 */
 			gnttab_end_foreign_access(s->grants_used[i]->gref, 0, 0UL);
 			s->grants_used[i]->gref = GRANT_INVALID_REF;
-			spin_lock_irqsave(&info->dev_lock, flags);
-			list_add_tail(&s->grants_used[i]->node, &info->grants);
-			spin_unlock_irqrestore(&info->dev_lock, flags);
+			list_add_tail(&s->grants_used[i]->node, &rinfo->grants);
 		}
 	}
 	if (s->req.operation == BLKIF_OP_INDIRECT) {
@@ -1342,10 +1323,8 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 				if (!info->feature_persistent)
 					pr_alert_ratelimited("backed has not unmapped grant: %u\n",
 							     s->indirect_grants[i]->gref);
-				spin_lock_irqsave(&info->dev_lock, flags);
-				list_add(&s->indirect_grants[i]->node, &info->grants);
-				info->persistent_gnts_c++;
-				spin_unlock_irqrestore(&info->dev_lock, flags);
+				list_add(&s->indirect_grants[i]->node, &rinfo->grants);
+				rinfo->persistent_gnts_c++;
 			} else {
 				struct page *indirect_page;
 
@@ -1359,9 +1338,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 					list_add(&indirect_page->lru, &rinfo->indirect_pages);
 				}
 				s->indirect_grants[i]->gref = GRANT_INVALID_REF;
-				spin_lock_irqsave(&info->dev_lock, flags);
-				list_add_tail(&s->indirect_grants[i]->node, &info->grants);
-				spin_unlock_irqrestore(&info->dev_lock, flags);
+				list_add_tail(&s->indirect_grants[i]->node, &rinfo->grants);
 			}
 		}
 	}
@@ -1782,15 +1759,14 @@ static int blkfront_probe(struct xenbus_device *dev,
 
 		rinfo = &info->rinfo[r_index];
 		INIT_LIST_HEAD(&rinfo->indirect_pages);
+		INIT_LIST_HEAD(&rinfo->grants);
 		rinfo->dev_info = info;
 		INIT_WORK(&rinfo->work, blkif_restart_queue);
 		spin_lock_init(&rinfo->ring_lock);
 	}
 
 	mutex_init(&info->mutex);
-	spin_lock_init(&info->dev_lock);
 	info->vdevice = vdevice;
-	INIT_LIST_HEAD(&info->grants);
 	info->connected = BLKIF_STATE_DISCONNECTED;
 
 	/* Front end dir is a number, which is used as the id. */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 10/10] xen/blkback: make pool of persistent grants and free pages per-queue
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
                   ` (14 preceding siblings ...)
  2015-11-14  3:12 ` Bob Liu
@ 2015-11-14  3:12 ` Bob Liu
  2015-11-14  3:12 ` Bob Liu
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: linux-kernel, roger.pau, konrad.wilk, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel, Bob Liu

Make pool of persistent grants and free pages per-queue/ring instead of
per-device to get better scalability.
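
Concretely, the rb-tree of persistent grants and the buffer of free pages move
from struct xen_blkif into struct xen_blkif_ring, so each ring's kthread works
on its own pool and the device-wide pers_gnts_lock goes away. The free-page
helpers simply trade the blkif-wide lock for a ring-local one; a rough sketch
of the shape get_free_page() takes after this change (simplified from the
hunks below):

static int get_free_page(struct xen_blkif_ring *ring, struct page **page)
{
	unsigned long flags;

	spin_lock_irqsave(&ring->free_pages_lock, flags);
	if (list_empty(&ring->free_pages)) {
		spin_unlock_irqrestore(&ring->free_pages_lock, flags);
		/* Pool is empty: fall back to the grant-table allocator. */
		return gnttab_alloc_pages(1, page);
	}
	page[0] = list_first_entry(&ring->free_pages, struct page, lru);
	list_del(&page[0]->lru);
	ring->free_pages_num--;
	spin_unlock_irqrestore(&ring->free_pages_lock, flags);

	return 0;
}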

Test was done based on null_blk driver:
dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
domU: v4.2-rc8 16vcpus 10GB

[test]
rw=read
direct=1
ioengine=libaio
bs=4k
time_based
runtime=30
filename=/dev/xvdb
numjobs=16
iodepth=64
iodepth_batch=64
iodepth_batch_complete=64
group_reporting

Results:
iops1: After commit("xen/blkfront: make persistent grants per-queue").
iops2: After this commit.

Queues:			  1 	   4 	  	  8 	 	 16
Iops orig(k):		810 	1064 		780 		700
Iops1(k):		810     1230(~20%)	1024(~20%)	850(~20%)
Iops2(k):		810     1410(~35%)	1354(~75%)      1440(~100%)

With 4 queues after this commit we get a ~75% increase in IOPS over the
single-queue baseline (1410k vs. 810k), and performance no longer drops as the
number of queues increases.

Please find the respective chart in this link:
https://www.dropbox.com/s/agrcy2pbzbsvmwv/iops.png?dl=0

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkback/blkback.c |  202 ++++++++++++++++-------------------
 drivers/block/xen-blkback/common.h  |   32 +++---
 drivers/block/xen-blkback/xenbus.c  |   21 ++--
 3 files changed, 118 insertions(+), 137 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index acedc46..0e8a04d 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -122,60 +122,60 @@ module_param(log_stats, int, 0644);
 /* Number of free pages to remove on each call to gnttab_free_pages */
 #define NUM_BATCH_FREE_PAGES 10
 
-static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
+static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
-	if (list_empty(&blkif->free_pages)) {
-		BUG_ON(blkif->free_pages_num != 0);
-		spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
+	if (list_empty(&ring->free_pages)) {
+		BUG_ON(ring->free_pages_num != 0);
+		spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 		return gnttab_alloc_pages(1, page);
 	}
-	BUG_ON(blkif->free_pages_num == 0);
-	page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
+	BUG_ON(ring->free_pages_num == 0);
+	page[0] = list_first_entry(&ring->free_pages, struct page, lru);
 	list_del(&page[0]->lru);
-	blkif->free_pages_num--;
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	ring->free_pages_num--;
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 
 	return 0;
 }
 
-static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
+static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **page,
                                   int num)
 {
 	unsigned long flags;
 	int i;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
 	for (i = 0; i < num; i++)
-		list_add(&page[i]->lru, &blkif->free_pages);
-	blkif->free_pages_num += num;
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+		list_add(&page[i]->lru, &ring->free_pages);
+	ring->free_pages_num += num;
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 }
 
-static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
+static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
 {
 	/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
 	struct page *page[NUM_BATCH_FREE_PAGES];
 	unsigned int num_pages = 0;
 	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
-	while (blkif->free_pages_num > num) {
-		BUG_ON(list_empty(&blkif->free_pages));
-		page[num_pages] = list_first_entry(&blkif->free_pages,
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
+	while (ring->free_pages_num > num) {
+		BUG_ON(list_empty(&ring->free_pages));
+		page[num_pages] = list_first_entry(&ring->free_pages,
 		                                   struct page, lru);
 		list_del(&page[num_pages]->lru);
-		blkif->free_pages_num--;
+		ring->free_pages_num--;
 		if (++num_pages == NUM_BATCH_FREE_PAGES) {
-			spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+			spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 			gnttab_free_pages(num_pages, page);
-			spin_lock_irqsave(&blkif->free_pages_lock, flags);
+			spin_lock_irqsave(&ring->free_pages_lock, flags);
 			num_pages = 0;
 		}
 	}
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 	if (num_pages != 0)
 		gnttab_free_pages(num_pages, page);
 }
@@ -198,23 +198,29 @@ static void make_response(struct xen_blkif_ring *ring, u64 id,
 
 
 /*
- * pers_gnts_lock must be used around all the persistent grant helpers
- * because blkback may use multi-thread/queue for each backend.
+ * We don't need locking around the persistent grant helpers
+ * because blkback uses a single-thread for each backend, so we
+ * can be sure that this functions will never be called recursively.
+ *
+ * The only exception to that is put_persistent_grant, that can be called
+ * from interrupt context (by xen_blkbk_unmap), so we have to use atomic
+ * bit operations to modify the flags of a persistent grant and to count
+ * the number of used grants.
  */
-static int add_persistent_gnt(struct xen_blkif *blkif,
+static int add_persistent_gnt(struct xen_blkif_ring *ring,
 			       struct persistent_gnt *persistent_gnt)
 {
 	struct rb_node **new = NULL, *parent = NULL;
 	struct persistent_gnt *this;
+	struct xen_blkif *blkif = ring->blkif;
 
-	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
-	if (blkif->persistent_gnt_c >= xen_blkif_max_pgrants) {
+	if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
 		if (!blkif->vbd.overflow_max_grants)
 			blkif->vbd.overflow_max_grants = 1;
 		return -EBUSY;
 	}
 	/* Figure out where to put new node */
-	new = &blkif->persistent_gnts.rb_node;
+	new = &ring->persistent_gnts.rb_node;
 	while (*new) {
 		this = container_of(*new, struct persistent_gnt, node);
 
@@ -233,20 +239,19 @@ static int add_persistent_gnt(struct xen_blkif *blkif,
 	set_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
 	/* Add new node and rebalance tree. */
 	rb_link_node(&(persistent_gnt->node), parent, new);
-	rb_insert_color(&(persistent_gnt->node), &blkif->persistent_gnts);
-	blkif->persistent_gnt_c++;
-	atomic_inc(&blkif->persistent_gnt_in_use);
+	rb_insert_color(&(persistent_gnt->node), &ring->persistent_gnts);
+	ring->persistent_gnt_c++;
+	atomic_inc(&ring->persistent_gnt_in_use);
 	return 0;
 }
 
-static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
+static struct persistent_gnt *get_persistent_gnt(struct xen_blkif_ring *ring,
 						 grant_ref_t gref)
 {
 	struct persistent_gnt *data;
 	struct rb_node *node = NULL;
 
-	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
-	node = blkif->persistent_gnts.rb_node;
+	node = ring->persistent_gnts.rb_node;
 	while (node) {
 		data = container_of(node, struct persistent_gnt, node);
 
@@ -260,25 +265,24 @@ static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
 				return NULL;
 			}
 			set_bit(PERSISTENT_GNT_ACTIVE, data->flags);
-			atomic_inc(&blkif->persistent_gnt_in_use);
+			atomic_inc(&ring->persistent_gnt_in_use);
 			return data;
 		}
 	}
 	return NULL;
 }
 
-static void put_persistent_gnt(struct xen_blkif *blkif,
+static void put_persistent_gnt(struct xen_blkif_ring *ring,
                                struct persistent_gnt *persistent_gnt)
 {
-	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
 	if(!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
 		pr_alert_ratelimited("freeing a grant already unused\n");
 	set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
 	clear_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
-	atomic_dec(&blkif->persistent_gnt_in_use);
+	atomic_dec(&ring->persistent_gnt_in_use);
 }
 
-static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
+static void free_persistent_gnts(struct xen_blkif_ring *ring, struct rb_root *root,
                                  unsigned int num)
 {
 	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
@@ -292,7 +296,6 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
 	unmap_data.unmap_ops = unmap;
 	unmap_data.kunmap_ops = NULL;
 
-	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
 	foreach_grant_safe(persistent_gnt, n, root, node) {
 		BUG_ON(persistent_gnt->handle ==
 			BLKBACK_INVALID_HANDLE);
@@ -310,7 +313,7 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
 			unmap_data.count = segs_to_unmap;
 			BUG_ON(gnttab_unmap_refs_sync(&unmap_data));
 
-			put_free_pages(blkif, pages, segs_to_unmap);
+			put_free_pages(ring, pages, segs_to_unmap);
 			segs_to_unmap = 0;
 		}
 
@@ -327,17 +330,15 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
 	struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 	struct persistent_gnt *persistent_gnt;
 	int segs_to_unmap = 0;
-	struct xen_blkif *blkif = container_of(work, typeof(*blkif), persistent_purge_work);
+	struct xen_blkif_ring *ring = container_of(work, typeof(*ring), persistent_purge_work);
 	struct gntab_unmap_queue_data unmap_data;
-	unsigned long flags;
 
 	unmap_data.pages = pages;
 	unmap_data.unmap_ops = unmap;
 	unmap_data.kunmap_ops = NULL;
 
-	spin_lock_irqsave(&blkif->pers_gnts_lock, flags);
-	while(!list_empty(&blkif->persistent_purge_list)) {
-		persistent_gnt = list_first_entry(&blkif->persistent_purge_list,
+	while(!list_empty(&ring->persistent_purge_list)) {
+		persistent_gnt = list_first_entry(&ring->persistent_purge_list,
 		                                  struct persistent_gnt,
 		                                  remove_node);
 		list_del(&persistent_gnt->remove_node);
@@ -352,45 +353,42 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
 		if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
 			unmap_data.count = segs_to_unmap;
 			BUG_ON(gnttab_unmap_refs_sync(&unmap_data));
-			put_free_pages(blkif, pages, segs_to_unmap);
+			put_free_pages(ring, pages, segs_to_unmap);
 			segs_to_unmap = 0;
 		}
 		kfree(persistent_gnt);
 	}
-	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
 	if (segs_to_unmap > 0) {
 		unmap_data.count = segs_to_unmap;
 		BUG_ON(gnttab_unmap_refs_sync(&unmap_data));
-		put_free_pages(blkif, pages, segs_to_unmap);
+		put_free_pages(ring, pages, segs_to_unmap);
 	}
 }
 
-static void purge_persistent_gnt(struct xen_blkif *blkif)
+static void purge_persistent_gnt(struct xen_blkif_ring *ring)
 {
 	struct persistent_gnt *persistent_gnt;
 	struct rb_node *n;
 	unsigned int num_clean, total;
 	bool scan_used = false, clean_used = false;
 	struct rb_root *root;
-	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->pers_gnts_lock, flags);
-	if (blkif->persistent_gnt_c < xen_blkif_max_pgrants ||
-	    (blkif->persistent_gnt_c == xen_blkif_max_pgrants &&
-	    !blkif->vbd.overflow_max_grants)) {
+	if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
+	    (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+	    !ring->blkif->vbd.overflow_max_grants)) {
 		goto out;
 	}
 
-	if (work_busy(&blkif->persistent_purge_work)) {
+	if (work_busy(&ring->persistent_purge_work)) {
 		pr_alert_ratelimited("Scheduled work from previous purge is still busy, cannot purge list\n");
 		goto out;
 	}
 
 	num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-	num_clean = blkif->persistent_gnt_c - xen_blkif_max_pgrants + num_clean;
-	num_clean = min(blkif->persistent_gnt_c, num_clean);
+	num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + num_clean;
+	num_clean = min(ring->persistent_gnt_c, num_clean);
 	if ((num_clean == 0) ||
-	    (num_clean > (blkif->persistent_gnt_c - atomic_read(&blkif->persistent_gnt_in_use))))
+	    (num_clean > (ring->persistent_gnt_c - atomic_read(&ring->persistent_gnt_in_use))))
 		goto out;
 
 	/*
@@ -406,8 +404,8 @@ static void purge_persistent_gnt(struct xen_blkif *blkif)
 
 	pr_debug("Going to purge %u persistent grants\n", num_clean);
 
-	BUG_ON(!list_empty(&blkif->persistent_purge_list));
-	root = &blkif->persistent_gnts;
+	BUG_ON(!list_empty(&ring->persistent_purge_list));
+	root = &ring->persistent_gnts;
 purge_list:
 	foreach_grant_safe(persistent_gnt, n, root, node) {
 		BUG_ON(persistent_gnt->handle ==
@@ -426,7 +424,7 @@ purge_list:
 
 		rb_erase(&persistent_gnt->node, root);
 		list_add(&persistent_gnt->remove_node,
-		         &blkif->persistent_purge_list);
+			 &ring->persistent_purge_list);
 		if (--num_clean == 0)
 			goto finished;
 	}
@@ -447,18 +445,14 @@ finished:
 		goto purge_list;
 	}
 
-	blkif->persistent_gnt_c -= (total - num_clean);
-	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
-	blkif->vbd.overflow_max_grants = 0;
+	ring->persistent_gnt_c -= (total - num_clean);
+	ring->blkif->vbd.overflow_max_grants = 0;
 
 	/* We can defer this work */
-	schedule_work(&blkif->persistent_purge_work);
+	schedule_work(&ring->persistent_purge_work);
 	pr_debug("Purged %u/%u\n", (total - num_clean), total);
-	return;
 
 out:
-	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
-
 	return;
 }
 
@@ -590,14 +584,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id)
  * SCHEDULER FUNCTIONS
  */
 
-static void print_stats(struct xen_blkif *blkif)
+static void print_stats(struct xen_blkif_ring *ring)
 {
+	struct xen_blkif *blkif = ring->blkif;
+
 	pr_info("(%s): oo %3llu  |  rd %4llu  |  wr %4llu  |  f %4llu"
 		 "  |  ds %4llu | pg: %4u/%4d\n",
 		 current->comm, blkif->st_oo_req,
 		 blkif->st_rd_req, blkif->st_wr_req,
 		 blkif->st_f_req, blkif->st_ds_req,
-		 blkif->persistent_gnt_c,
+		 ring->persistent_gnt_c,
 		 xen_blkif_max_pgrants);
 	blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000);
 	blkif->st_rd_req = 0;
@@ -650,23 +646,23 @@ int xen_blkif_schedule(void *arg)
 
 purge_gnt_list:
 		if (blkif->vbd.feature_gnt_persistent &&
-		    time_after(jiffies, blkif->next_lru)) {
-			purge_persistent_gnt(blkif);
-			blkif->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL);
+		    time_after(jiffies, ring->next_lru)) {
+			purge_persistent_gnt(ring);
+			ring->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL);
 		}
 
 		/* Shrink if we have more than xen_blkif_max_buffer_pages */
-		shrink_free_pagepool(blkif, xen_blkif_max_buffer_pages);
+		shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
 
-		if (log_stats && time_after(jiffies, blkif->st_print))
-			print_stats(blkif);
+		if (log_stats && time_after(jiffies, ring->blkif->st_print))
+			print_stats(ring);
 	}
 
 	/* Drain pending purge work */
-	flush_work(&blkif->persistent_purge_work);
+	flush_work(&ring->persistent_purge_work);
 
 	if (log_stats)
-		print_stats(blkif);
+		print_stats(ring);
 
 	ring->xenblkd = NULL;
 	xen_blkif_put(blkif);
@@ -679,21 +675,16 @@ purge_gnt_list:
  */
 void xen_blkbk_free_caches(struct xen_blkif_ring *ring)
 {
-	struct xen_blkif *blkif = ring->blkif;
-	unsigned long flags;
-
 	/* Free all persistent grant pages */
-	spin_lock_irqsave(&blkif->pers_gnts_lock, flags);
-	if (!RB_EMPTY_ROOT(&blkif->persistent_gnts))
-		free_persistent_gnts(blkif, &blkif->persistent_gnts,
-			blkif->persistent_gnt_c);
+	if (!RB_EMPTY_ROOT(&ring->persistent_gnts))
+		free_persistent_gnts(ring, &ring->persistent_gnts,
+			ring->persistent_gnt_c);
 
-	BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
-	blkif->persistent_gnt_c = 0;
-	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
+	BUG_ON(!RB_EMPTY_ROOT(&ring->persistent_gnts));
+	ring->persistent_gnt_c = 0;
 
 	/* Since we are shutting down remove all pages from the buffer */
-	shrink_free_pagepool(blkif, 0 /* All */);
+	shrink_free_pagepool(ring, 0 /* All */);
 }
 
 static unsigned int xen_blkbk_unmap_prepare(
@@ -704,13 +695,10 @@ static unsigned int xen_blkbk_unmap_prepare(
 	struct page **unmap_pages)
 {
 	unsigned int i, invcount = 0;
-	unsigned long flags;
 
 	for (i = 0; i < num; i++) {
 		if (pages[i]->persistent_gnt != NULL) {
-			spin_lock_irqsave(&ring->blkif->pers_gnts_lock, flags);
-			put_persistent_gnt(ring->blkif, pages[i]->persistent_gnt);
-			spin_unlock_irqrestore(&ring->blkif->pers_gnts_lock, flags);
+			put_persistent_gnt(ring, pages[i]->persistent_gnt);
 			continue;
 		}
 		if (pages[i]->handle == BLKBACK_INVALID_HANDLE)
@@ -735,7 +723,7 @@ static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_
 	   but is this the best way to deal with this? */
 	BUG_ON(result);
 
-	put_free_pages(blkif, data->pages, data->count);
+	put_free_pages(ring, data->pages, data->count);
 	make_response(ring, pending_req->id,
 		      pending_req->operation, pending_req->status);
 	free_req(ring, pending_req);
@@ -802,7 +790,7 @@ static void xen_blkbk_unmap(struct xen_blkif_ring *ring,
 		if (invcount) {
 			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
 			BUG_ON(ret);
-			put_free_pages(ring->blkif, unmap_pages, invcount);
+			put_free_pages(ring, unmap_pages, invcount);
 		}
 		pages += batch;
 		num -= batch;
@@ -823,7 +811,6 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
 	int last_map = 0, map_until = 0;
 	int use_persistent_gnts;
 	struct xen_blkif *blkif = ring->blkif;
-	unsigned long irq_flags;
 
 	use_persistent_gnts = (blkif->vbd.feature_gnt_persistent);
 
@@ -837,11 +824,9 @@ again:
 		uint32_t flags;
 
 		if (use_persistent_gnts) {
-			spin_lock_irqsave(&blkif->pers_gnts_lock, irq_flags);
 			persistent_gnt = get_persistent_gnt(
-				blkif,
+				ring,
 				pages[i]->gref);
-			spin_unlock_irqrestore(&blkif->pers_gnts_lock, irq_flags);
 		}
 
 		if (persistent_gnt) {
@@ -852,7 +837,7 @@ again:
 			pages[i]->page = persistent_gnt->page;
 			pages[i]->persistent_gnt = persistent_gnt;
 		} else {
-			if (get_free_page(blkif, &pages[i]->page))
+			if (get_free_page(ring, &pages[i]->page))
 				goto out_of_memory;
 			addr = vaddr(pages[i]->page);
 			pages_to_gnt[segs_to_map] = pages[i]->page;
@@ -885,7 +870,7 @@ again:
 			BUG_ON(new_map_idx >= segs_to_map);
 			if (unlikely(map[new_map_idx].status != 0)) {
 				pr_debug("invalid buffer -- could not remap it\n");
-				put_free_pages(blkif, &pages[seg_idx]->page, 1);
+				put_free_pages(ring, &pages[seg_idx]->page, 1);
 				pages[seg_idx]->handle = BLKBACK_INVALID_HANDLE;
 				ret |= 1;
 				goto next;
@@ -895,7 +880,7 @@ again:
 			continue;
 		}
 		if (use_persistent_gnts &&
-		    blkif->persistent_gnt_c < xen_blkif_max_pgrants) {
+		    ring->persistent_gnt_c < xen_blkif_max_pgrants) {
 			/*
 			 * We are using persistent grants, the grant is
 			 * not mapped but we might have room for it.
@@ -913,19 +898,16 @@ again:
 			persistent_gnt->gnt = map[new_map_idx].ref;
 			persistent_gnt->handle = map[new_map_idx].handle;
 			persistent_gnt->page = pages[seg_idx]->page;
-			spin_lock_irqsave(&blkif->pers_gnts_lock, irq_flags);
-			if (add_persistent_gnt(blkif,
+			if (add_persistent_gnt(ring,
 			                       persistent_gnt)) {
-				spin_unlock_irqrestore(&blkif->pers_gnts_lock, irq_flags);
 				kfree(persistent_gnt);
 				persistent_gnt = NULL;
 				goto next;
 			}
 			pages[seg_idx]->persistent_gnt = persistent_gnt;
 			pr_debug("grant %u added to the tree of persistent grants, using %u/%u\n",
-				 persistent_gnt->gnt, blkif->persistent_gnt_c,
+				 persistent_gnt->gnt, ring->persistent_gnt_c,
 				 xen_blkif_max_pgrants);
-			spin_unlock_irqrestore(&blkif->pers_gnts_lock, irq_flags);
 			goto next;
 		}
 		if (use_persistent_gnts && !blkif->vbd.overflow_max_grants) {
@@ -949,7 +931,7 @@ next:
 
 out_of_memory:
 	pr_alert("%s: out of memory\n", __func__);
-	put_free_pages(blkif, pages_to_gnt, segs_to_map);
+	put_free_pages(ring, pages_to_gnt, segs_to_map);
 	return -ENOMEM;
 }
 
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 0833dc6..54f126e 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -291,6 +291,22 @@ struct xen_blkif_ring {
 	spinlock_t		pending_free_lock;
 	wait_queue_head_t	pending_free_wq;
 
+	/* Tree to store persistent grants. */
+	spinlock_t		pers_gnts_lock;
+	struct rb_root		persistent_gnts;
+	unsigned int		persistent_gnt_c;
+	atomic_t		persistent_gnt_in_use;
+	unsigned long           next_lru;
+
+	/* Used by the kworker that offload work from the persistent purge. */
+	struct list_head	persistent_purge_list;
+	struct work_struct	persistent_purge_work;
+
+	/* Buffer of free pages to map grant refs. */
+	spinlock_t		free_pages_lock;
+	int			free_pages_num;
+	struct list_head	free_pages;
+
 	struct work_struct	free_work;
 	/* Thread shutdown wait queue. */
 	wait_queue_head_t	shutdown_wq;
@@ -312,22 +328,6 @@ struct xen_blkif {
 	struct completion	drain_complete;
 	atomic_t		drain;
 
-	/* tree to store persistent grants */
-	spinlock_t		pers_gnts_lock;
-	struct rb_root		persistent_gnts;
-	unsigned int		persistent_gnt_c;
-	atomic_t		persistent_gnt_in_use;
-	unsigned long           next_lru;
-
-	/* used by the kworker that offload work from the persistent purge */
-	struct list_head	persistent_purge_list;
-	struct work_struct	persistent_purge_work;
-
-	/* buffer of free pages to map grant refs */
-	spinlock_t		free_pages_lock;
-	int			free_pages_num;
-	struct list_head	free_pages;
-
 	/* statistics */
 	unsigned long		st_print;
 	unsigned long long			st_rd_req;
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index d83b790..db8ad57 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -150,6 +150,10 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
 		spin_lock_init(&ring->blk_ring_lock);
 		init_waitqueue_head(&ring->wq);
 		INIT_LIST_HEAD(&ring->pending_free);
+		INIT_LIST_HEAD(&ring->persistent_purge_list);
+		INIT_WORK(&ring->persistent_purge_work, xen_blkbk_unmap_purged_grants);
+		spin_lock_init(&ring->free_pages_lock);
+		INIT_LIST_HEAD(&ring->free_pages);
 
 		spin_lock_init(&ring->pending_free_lock);
 		init_waitqueue_head(&ring->pending_free_wq);
@@ -175,11 +179,7 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 	atomic_set(&blkif->refcnt, 1);
 	init_completion(&blkif->drain_complete);
 	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
-	spin_lock_init(&blkif->free_pages_lock);
-	INIT_LIST_HEAD(&blkif->free_pages);
-	INIT_LIST_HEAD(&blkif->persistent_purge_list);
 	blkif->st_print = jiffies;
-	INIT_WORK(&blkif->persistent_purge_work, xen_blkbk_unmap_purged_grants);
 
 	return blkif;
 }
@@ -290,6 +290,12 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 			i++;
 		}
 
+		BUG_ON(atomic_read(&ring->persistent_gnt_in_use) != 0);
+		BUG_ON(!list_empty(&ring->persistent_purge_list));
+		BUG_ON(!RB_EMPTY_ROOT(&ring->persistent_gnts));
+		BUG_ON(!list_empty(&ring->free_pages));
+		BUG_ON(ring->free_pages_num != 0);
+		BUG_ON(ring->persistent_gnt_c != 0);
 		WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
 	}
 	blkif->nr_ring_pages = 0;
@@ -304,13 +310,6 @@ static void xen_blkif_free(struct xen_blkif *blkif)
 	xen_vbd_free(&blkif->vbd);
 
 	/* Make sure everything is drained before shutting down */
-	BUG_ON(blkif->persistent_gnt_c != 0);
-	BUG_ON(atomic_read(&blkif->persistent_gnt_in_use) != 0);
-	BUG_ON(blkif->free_pages_num != 0);
-	BUG_ON(!list_empty(&blkif->persistent_purge_list));
-	BUG_ON(!list_empty(&blkif->free_pages));
-	BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
-
 	kfree(blkif->rings);
 	kmem_cache_free(xen_blkif_cachep, blkif);
 }
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v5 10/10] xen/blkback: make pool of persistent grants and free pages per-queue
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
                   ` (15 preceding siblings ...)
  2015-11-14  3:12 ` [PATCH v5 10/10] xen/blkback: make pool of persistent grants and free pages per-queue Bob Liu
@ 2015-11-14  3:12 ` Bob Liu
  2015-11-16 21:08 ` [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Konrad Rzeszutek Wilk
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 59+ messages in thread
From: Bob Liu @ 2015-11-14  3:12 UTC (permalink / raw)
  To: xen-devel
  Cc: jonathan.davies, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, axboe, Bob Liu, david.vrabel, avanzini.arianna,
	roger.pau

Make pool of persistent grants and free pages per-queue/ring instead of
per-device to get better scalability.

Test was done based on null_blk driver:
dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
domu: v4.2-rc8 16vcpus 10GB

[test]
rw=read
direct=1
ioengine=libaio
bs=4k
time_based
runtime=30
filename=/dev/xvdb
numjobs=16
iodepth=64
iodepth_batch=64
iodepth_batch_complete=64
group_reporting

Results:
iops1: After commit("xen/blkfront: make persistent grants per-queue").
iops2: After this commit.

Queues:			  1 	   4 	  	  8 	 	 16
Iops orig(k):		810 	1064 		780 		700
Iops1(k):		810     1230(~20%)	1024(~20%)	850(~20%)
Iops2(k):		810     1410(~35%)	1354(~75%)      1440(~100%)

With 4 queues after this commit we can get ~75% increase in IOPS, and
performance won't drop if incresing queue numbers.

Please find the respective chart in this link:
https://www.dropbox.com/s/agrcy2pbzbsvmwv/iops.png?dl=0

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkback/blkback.c |  202 ++++++++++++++++-------------------
 drivers/block/xen-blkback/common.h  |   32 +++---
 drivers/block/xen-blkback/xenbus.c  |   21 ++--
 3 files changed, 118 insertions(+), 137 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index acedc46..0e8a04d 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -122,60 +122,60 @@ module_param(log_stats, int, 0644);
 /* Number of free pages to remove on each call to gnttab_free_pages */
 #define NUM_BATCH_FREE_PAGES 10
 
-static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
+static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
-	if (list_empty(&blkif->free_pages)) {
-		BUG_ON(blkif->free_pages_num != 0);
-		spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
+	if (list_empty(&ring->free_pages)) {
+		BUG_ON(ring->free_pages_num != 0);
+		spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 		return gnttab_alloc_pages(1, page);
 	}
-	BUG_ON(blkif->free_pages_num == 0);
-	page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
+	BUG_ON(ring->free_pages_num == 0);
+	page[0] = list_first_entry(&ring->free_pages, struct page, lru);
 	list_del(&page[0]->lru);
-	blkif->free_pages_num--;
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	ring->free_pages_num--;
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 
 	return 0;
 }
 
-static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
+static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **page,
                                   int num)
 {
 	unsigned long flags;
 	int i;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
 	for (i = 0; i < num; i++)
-		list_add(&page[i]->lru, &blkif->free_pages);
-	blkif->free_pages_num += num;
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+		list_add(&page[i]->lru, &ring->free_pages);
+	ring->free_pages_num += num;
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 }
 
-static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
+static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
 {
 	/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
 	struct page *page[NUM_BATCH_FREE_PAGES];
 	unsigned int num_pages = 0;
 	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->free_pages_lock, flags);
-	while (blkif->free_pages_num > num) {
-		BUG_ON(list_empty(&blkif->free_pages));
-		page[num_pages] = list_first_entry(&blkif->free_pages,
+	spin_lock_irqsave(&ring->free_pages_lock, flags);
+	while (ring->free_pages_num > num) {
+		BUG_ON(list_empty(&ring->free_pages));
+		page[num_pages] = list_first_entry(&ring->free_pages,
 		                                   struct page, lru);
 		list_del(&page[num_pages]->lru);
-		blkif->free_pages_num--;
+		ring->free_pages_num--;
 		if (++num_pages == NUM_BATCH_FREE_PAGES) {
-			spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+			spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 			gnttab_free_pages(num_pages, page);
-			spin_lock_irqsave(&blkif->free_pages_lock, flags);
+			spin_lock_irqsave(&ring->free_pages_lock, flags);
 			num_pages = 0;
 		}
 	}
-	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 	if (num_pages != 0)
 		gnttab_free_pages(num_pages, page);
 }
@@ -198,23 +198,29 @@ static void make_response(struct xen_blkif_ring *ring, u64 id,
 
 
 /*
- * pers_gnts_lock must be used around all the persistent grant helpers
- * because blkback may use multi-thread/queue for each backend.
+ * We don't need locking around the persistent grant helpers
+ * because blkback uses a single-thread for each backend, so we
+ * can be sure that this functions will never be called recursively.
+ *
+ * The only exception to that is put_persistent_grant, that can be called
+ * from interrupt context (by xen_blkbk_unmap), so we have to use atomic
+ * bit operations to modify the flags of a persistent grant and to count
+ * the number of used grants.
  */
-static int add_persistent_gnt(struct xen_blkif *blkif,
+static int add_persistent_gnt(struct xen_blkif_ring *ring,
 			       struct persistent_gnt *persistent_gnt)
 {
 	struct rb_node **new = NULL, *parent = NULL;
 	struct persistent_gnt *this;
+	struct xen_blkif *blkif = ring->blkif;
 
-	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
-	if (blkif->persistent_gnt_c >= xen_blkif_max_pgrants) {
+	if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
 		if (!blkif->vbd.overflow_max_grants)
 			blkif->vbd.overflow_max_grants = 1;
 		return -EBUSY;
 	}
 	/* Figure out where to put new node */
-	new = &blkif->persistent_gnts.rb_node;
+	new = &ring->persistent_gnts.rb_node;
 	while (*new) {
 		this = container_of(*new, struct persistent_gnt, node);
 
@@ -233,20 +239,19 @@ static int add_persistent_gnt(struct xen_blkif *blkif,
 	set_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
 	/* Add new node and rebalance tree. */
 	rb_link_node(&(persistent_gnt->node), parent, new);
-	rb_insert_color(&(persistent_gnt->node), &blkif->persistent_gnts);
-	blkif->persistent_gnt_c++;
-	atomic_inc(&blkif->persistent_gnt_in_use);
+	rb_insert_color(&(persistent_gnt->node), &ring->persistent_gnts);
+	ring->persistent_gnt_c++;
+	atomic_inc(&ring->persistent_gnt_in_use);
 	return 0;
 }
 
-static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
+static struct persistent_gnt *get_persistent_gnt(struct xen_blkif_ring *ring,
 						 grant_ref_t gref)
 {
 	struct persistent_gnt *data;
 	struct rb_node *node = NULL;
 
-	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
-	node = blkif->persistent_gnts.rb_node;
+	node = ring->persistent_gnts.rb_node;
 	while (node) {
 		data = container_of(node, struct persistent_gnt, node);
 
@@ -260,25 +265,24 @@ static struct persistent_gnt *get_persistent_gnt(struct xen_blkif *blkif,
 				return NULL;
 			}
 			set_bit(PERSISTENT_GNT_ACTIVE, data->flags);
-			atomic_inc(&blkif->persistent_gnt_in_use);
+			atomic_inc(&ring->persistent_gnt_in_use);
 			return data;
 		}
 	}
 	return NULL;
 }
 
-static void put_persistent_gnt(struct xen_blkif *blkif,
+static void put_persistent_gnt(struct xen_blkif_ring *ring,
                                struct persistent_gnt *persistent_gnt)
 {
-	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
 	if(!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
 		pr_alert_ratelimited("freeing a grant already unused\n");
 	set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
 	clear_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
-	atomic_dec(&blkif->persistent_gnt_in_use);
+	atomic_dec(&ring->persistent_gnt_in_use);
 }
 
-static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
+static void free_persistent_gnts(struct xen_blkif_ring *ring, struct rb_root *root,
                                  unsigned int num)
 {
 	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
@@ -292,7 +296,6 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
 	unmap_data.unmap_ops = unmap;
 	unmap_data.kunmap_ops = NULL;
 
-	BUG_ON(!spin_is_locked(&blkif->pers_gnts_lock));
 	foreach_grant_safe(persistent_gnt, n, root, node) {
 		BUG_ON(persistent_gnt->handle ==
 			BLKBACK_INVALID_HANDLE);
@@ -310,7 +313,7 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
 			unmap_data.count = segs_to_unmap;
 			BUG_ON(gnttab_unmap_refs_sync(&unmap_data));
 
-			put_free_pages(blkif, pages, segs_to_unmap);
+			put_free_pages(ring, pages, segs_to_unmap);
 			segs_to_unmap = 0;
 		}
 
@@ -327,17 +330,15 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
 	struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 	struct persistent_gnt *persistent_gnt;
 	int segs_to_unmap = 0;
-	struct xen_blkif *blkif = container_of(work, typeof(*blkif), persistent_purge_work);
+	struct xen_blkif_ring *ring = container_of(work, typeof(*ring), persistent_purge_work);
 	struct gntab_unmap_queue_data unmap_data;
-	unsigned long flags;
 
 	unmap_data.pages = pages;
 	unmap_data.unmap_ops = unmap;
 	unmap_data.kunmap_ops = NULL;
 
-	spin_lock_irqsave(&blkif->pers_gnts_lock, flags);
-	while(!list_empty(&blkif->persistent_purge_list)) {
-		persistent_gnt = list_first_entry(&blkif->persistent_purge_list,
+	while(!list_empty(&ring->persistent_purge_list)) {
+		persistent_gnt = list_first_entry(&ring->persistent_purge_list,
 		                                  struct persistent_gnt,
 		                                  remove_node);
 		list_del(&persistent_gnt->remove_node);
@@ -352,45 +353,42 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
 		if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
 			unmap_data.count = segs_to_unmap;
 			BUG_ON(gnttab_unmap_refs_sync(&unmap_data));
-			put_free_pages(blkif, pages, segs_to_unmap);
+			put_free_pages(ring, pages, segs_to_unmap);
 			segs_to_unmap = 0;
 		}
 		kfree(persistent_gnt);
 	}
-	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
 	if (segs_to_unmap > 0) {
 		unmap_data.count = segs_to_unmap;
 		BUG_ON(gnttab_unmap_refs_sync(&unmap_data));
-		put_free_pages(blkif, pages, segs_to_unmap);
+		put_free_pages(ring, pages, segs_to_unmap);
 	}
 }
 
-static void purge_persistent_gnt(struct xen_blkif *blkif)
+static void purge_persistent_gnt(struct xen_blkif_ring *ring)
 {
 	struct persistent_gnt *persistent_gnt;
 	struct rb_node *n;
 	unsigned int num_clean, total;
 	bool scan_used = false, clean_used = false;
 	struct rb_root *root;
-	unsigned long flags;
 
-	spin_lock_irqsave(&blkif->pers_gnts_lock, flags);
-	if (blkif->persistent_gnt_c < xen_blkif_max_pgrants ||
-	    (blkif->persistent_gnt_c == xen_blkif_max_pgrants &&
-	    !blkif->vbd.overflow_max_grants)) {
+	if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
+	    (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+	    !ring->blkif->vbd.overflow_max_grants)) {
 		goto out;
 	}
 
-	if (work_busy(&blkif->persistent_purge_work)) {
+	if (work_busy(&ring->persistent_purge_work)) {
 		pr_alert_ratelimited("Scheduled work from previous purge is still busy, cannot purge list\n");
 		goto out;
 	}
 
 	num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-	num_clean = blkif->persistent_gnt_c - xen_blkif_max_pgrants + num_clean;
-	num_clean = min(blkif->persistent_gnt_c, num_clean);
+	num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + num_clean;
+	num_clean = min(ring->persistent_gnt_c, num_clean);
 	if ((num_clean == 0) ||
-	    (num_clean > (blkif->persistent_gnt_c - atomic_read(&blkif->persistent_gnt_in_use))))
+	    (num_clean > (ring->persistent_gnt_c - atomic_read(&ring->persistent_gnt_in_use))))
 		goto out;
 
 	/*
@@ -406,8 +404,8 @@ static void purge_persistent_gnt(struct xen_blkif *blkif)
 
 	pr_debug("Going to purge %u persistent grants\n", num_clean);
 
-	BUG_ON(!list_empty(&blkif->persistent_purge_list));
-	root = &blkif->persistent_gnts;
+	BUG_ON(!list_empty(&ring->persistent_purge_list));
+	root = &ring->persistent_gnts;
 purge_list:
 	foreach_grant_safe(persistent_gnt, n, root, node) {
 		BUG_ON(persistent_gnt->handle ==
@@ -426,7 +424,7 @@ purge_list:
 
 		rb_erase(&persistent_gnt->node, root);
 		list_add(&persistent_gnt->remove_node,
-		         &blkif->persistent_purge_list);
+			 &ring->persistent_purge_list);
 		if (--num_clean == 0)
 			goto finished;
 	}
@@ -447,18 +445,14 @@ finished:
 		goto purge_list;
 	}
 
-	blkif->persistent_gnt_c -= (total - num_clean);
-	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
-	blkif->vbd.overflow_max_grants = 0;
+	ring->persistent_gnt_c -= (total - num_clean);
+	ring->blkif->vbd.overflow_max_grants = 0;
 
 	/* We can defer this work */
-	schedule_work(&blkif->persistent_purge_work);
+	schedule_work(&ring->persistent_purge_work);
 	pr_debug("Purged %u/%u\n", (total - num_clean), total);
-	return;
 
 out:
-	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
-
 	return;
 }
 
@@ -590,14 +584,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id)
  * SCHEDULER FUNCTIONS
  */
 
-static void print_stats(struct xen_blkif *blkif)
+static void print_stats(struct xen_blkif_ring *ring)
 {
+	struct xen_blkif *blkif = ring->blkif;
+
 	pr_info("(%s): oo %3llu  |  rd %4llu  |  wr %4llu  |  f %4llu"
 		 "  |  ds %4llu | pg: %4u/%4d\n",
 		 current->comm, blkif->st_oo_req,
 		 blkif->st_rd_req, blkif->st_wr_req,
 		 blkif->st_f_req, blkif->st_ds_req,
-		 blkif->persistent_gnt_c,
+		 ring->persistent_gnt_c,
 		 xen_blkif_max_pgrants);
 	blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000);
 	blkif->st_rd_req = 0;
@@ -650,23 +646,23 @@ int xen_blkif_schedule(void *arg)
 
 purge_gnt_list:
 		if (blkif->vbd.feature_gnt_persistent &&
-		    time_after(jiffies, blkif->next_lru)) {
-			purge_persistent_gnt(blkif);
-			blkif->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL);
+		    time_after(jiffies, ring->next_lru)) {
+			purge_persistent_gnt(ring);
+			ring->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL);
 		}
 
 		/* Shrink if we have more than xen_blkif_max_buffer_pages */
-		shrink_free_pagepool(blkif, xen_blkif_max_buffer_pages);
+		shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
 
-		if (log_stats && time_after(jiffies, blkif->st_print))
-			print_stats(blkif);
+		if (log_stats && time_after(jiffies, ring->blkif->st_print))
+			print_stats(ring);
 	}
 
 	/* Drain pending purge work */
-	flush_work(&blkif->persistent_purge_work);
+	flush_work(&ring->persistent_purge_work);
 
 	if (log_stats)
-		print_stats(blkif);
+		print_stats(ring);
 
 	ring->xenblkd = NULL;
 	xen_blkif_put(blkif);
@@ -679,21 +675,16 @@ purge_gnt_list:
  */
 void xen_blkbk_free_caches(struct xen_blkif_ring *ring)
 {
-	struct xen_blkif *blkif = ring->blkif;
-	unsigned long flags;
-
 	/* Free all persistent grant pages */
-	spin_lock_irqsave(&blkif->pers_gnts_lock, flags);
-	if (!RB_EMPTY_ROOT(&blkif->persistent_gnts))
-		free_persistent_gnts(blkif, &blkif->persistent_gnts,
-			blkif->persistent_gnt_c);
+	if (!RB_EMPTY_ROOT(&ring->persistent_gnts))
+		free_persistent_gnts(ring, &ring->persistent_gnts,
+			ring->persistent_gnt_c);
 
-	BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
-	blkif->persistent_gnt_c = 0;
-	spin_unlock_irqrestore(&blkif->pers_gnts_lock, flags);
+	BUG_ON(!RB_EMPTY_ROOT(&ring->persistent_gnts));
+	ring->persistent_gnt_c = 0;
 
 	/* Since we are shutting down remove all pages from the buffer */
-	shrink_free_pagepool(blkif, 0 /* All */);
+	shrink_free_pagepool(ring, 0 /* All */);
 }
 
 static unsigned int xen_blkbk_unmap_prepare(
@@ -704,13 +695,10 @@ static unsigned int xen_blkbk_unmap_prepare(
 	struct page **unmap_pages)
 {
 	unsigned int i, invcount = 0;
-	unsigned long flags;
 
 	for (i = 0; i < num; i++) {
 		if (pages[i]->persistent_gnt != NULL) {
-			spin_lock_irqsave(&ring->blkif->pers_gnts_lock, flags);
-			put_persistent_gnt(ring->blkif, pages[i]->persistent_gnt);
-			spin_unlock_irqrestore(&ring->blkif->pers_gnts_lock, flags);
+			put_persistent_gnt(ring, pages[i]->persistent_gnt);
 			continue;
 		}
 		if (pages[i]->handle == BLKBACK_INVALID_HANDLE)
@@ -735,7 +723,7 @@ static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_
 	   but is this the best way to deal with this? */
 	BUG_ON(result);
 
-	put_free_pages(blkif, data->pages, data->count);
+	put_free_pages(ring, data->pages, data->count);
 	make_response(ring, pending_req->id,
 		      pending_req->operation, pending_req->status);
 	free_req(ring, pending_req);
@@ -802,7 +790,7 @@ static void xen_blkbk_unmap(struct xen_blkif_ring *ring,
 		if (invcount) {
 			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
 			BUG_ON(ret);
-			put_free_pages(ring->blkif, unmap_pages, invcount);
+			put_free_pages(ring, unmap_pages, invcount);
 		}
 		pages += batch;
 		num -= batch;
@@ -823,7 +811,6 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
 	int last_map = 0, map_until = 0;
 	int use_persistent_gnts;
 	struct xen_blkif *blkif = ring->blkif;
-	unsigned long irq_flags;
 
 	use_persistent_gnts = (blkif->vbd.feature_gnt_persistent);
 
@@ -837,11 +824,9 @@ again:
 		uint32_t flags;
 
 		if (use_persistent_gnts) {
-			spin_lock_irqsave(&blkif->pers_gnts_lock, irq_flags);
 			persistent_gnt = get_persistent_gnt(
-				blkif,
+				ring,
 				pages[i]->gref);
-			spin_unlock_irqrestore(&blkif->pers_gnts_lock, irq_flags);
 		}
 
 		if (persistent_gnt) {
@@ -852,7 +837,7 @@ again:
 			pages[i]->page = persistent_gnt->page;
 			pages[i]->persistent_gnt = persistent_gnt;
 		} else {
-			if (get_free_page(blkif, &pages[i]->page))
+			if (get_free_page(ring, &pages[i]->page))
 				goto out_of_memory;
 			addr = vaddr(pages[i]->page);
 			pages_to_gnt[segs_to_map] = pages[i]->page;
@@ -885,7 +870,7 @@ again:
 			BUG_ON(new_map_idx >= segs_to_map);
 			if (unlikely(map[new_map_idx].status != 0)) {
 				pr_debug("invalid buffer -- could not remap it\n");
-				put_free_pages(blkif, &pages[seg_idx]->page, 1);
+				put_free_pages(ring, &pages[seg_idx]->page, 1);
 				pages[seg_idx]->handle = BLKBACK_INVALID_HANDLE;
 				ret |= 1;
 				goto next;
@@ -895,7 +880,7 @@ again:
 			continue;
 		}
 		if (use_persistent_gnts &&
-		    blkif->persistent_gnt_c < xen_blkif_max_pgrants) {
+		    ring->persistent_gnt_c < xen_blkif_max_pgrants) {
 			/*
 			 * We are using persistent grants, the grant is
 			 * not mapped but we might have room for it.
@@ -913,19 +898,16 @@ again:
 			persistent_gnt->gnt = map[new_map_idx].ref;
 			persistent_gnt->handle = map[new_map_idx].handle;
 			persistent_gnt->page = pages[seg_idx]->page;
-			spin_lock_irqsave(&blkif->pers_gnts_lock, irq_flags);
-			if (add_persistent_gnt(blkif,
+			if (add_persistent_gnt(ring,
 			                       persistent_gnt)) {
-				spin_unlock_irqrestore(&blkif->pers_gnts_lock, irq_flags);
 				kfree(persistent_gnt);
 				persistent_gnt = NULL;
 				goto next;
 			}
 			pages[seg_idx]->persistent_gnt = persistent_gnt;
 			pr_debug("grant %u added to the tree of persistent grants, using %u/%u\n",
-				 persistent_gnt->gnt, blkif->persistent_gnt_c,
+				 persistent_gnt->gnt, ring->persistent_gnt_c,
 				 xen_blkif_max_pgrants);
-			spin_unlock_irqrestore(&blkif->pers_gnts_lock, irq_flags);
 			goto next;
 		}
 		if (use_persistent_gnts && !blkif->vbd.overflow_max_grants) {
@@ -949,7 +931,7 @@ next:
 
 out_of_memory:
 	pr_alert("%s: out of memory\n", __func__);
-	put_free_pages(blkif, pages_to_gnt, segs_to_map);
+	put_free_pages(ring, pages_to_gnt, segs_to_map);
 	return -ENOMEM;
 }
 
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 0833dc6..54f126e 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -291,6 +291,22 @@ struct xen_blkif_ring {
 	spinlock_t		pending_free_lock;
 	wait_queue_head_t	pending_free_wq;
 
+	/* Tree to store persistent grants. */
+	spinlock_t		pers_gnts_lock;
+	struct rb_root		persistent_gnts;
+	unsigned int		persistent_gnt_c;
+	atomic_t		persistent_gnt_in_use;
+	unsigned long           next_lru;
+
+	/* Used by the kworker that offload work from the persistent purge. */
+	struct list_head	persistent_purge_list;
+	struct work_struct	persistent_purge_work;
+
+	/* Buffer of free pages to map grant refs. */
+	spinlock_t		free_pages_lock;
+	int			free_pages_num;
+	struct list_head	free_pages;
+
 	struct work_struct	free_work;
 	/* Thread shutdown wait queue. */
 	wait_queue_head_t	shutdown_wq;
@@ -312,22 +328,6 @@ struct xen_blkif {
 	struct completion	drain_complete;
 	atomic_t		drain;
 
-	/* tree to store persistent grants */
-	spinlock_t		pers_gnts_lock;
-	struct rb_root		persistent_gnts;
-	unsigned int		persistent_gnt_c;
-	atomic_t		persistent_gnt_in_use;
-	unsigned long           next_lru;
-
-	/* used by the kworker that offload work from the persistent purge */
-	struct list_head	persistent_purge_list;
-	struct work_struct	persistent_purge_work;
-
-	/* buffer of free pages to map grant refs */
-	spinlock_t		free_pages_lock;
-	int			free_pages_num;
-	struct list_head	free_pages;
-
 	/* statistics */
 	unsigned long		st_print;
 	unsigned long long			st_rd_req;
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index d83b790..db8ad57 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -150,6 +150,10 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
 		spin_lock_init(&ring->blk_ring_lock);
 		init_waitqueue_head(&ring->wq);
 		INIT_LIST_HEAD(&ring->pending_free);
+		INIT_LIST_HEAD(&ring->persistent_purge_list);
+		INIT_WORK(&ring->persistent_purge_work, xen_blkbk_unmap_purged_grants);
+		spin_lock_init(&ring->free_pages_lock);
+		INIT_LIST_HEAD(&ring->free_pages);
 
 		spin_lock_init(&ring->pending_free_lock);
 		init_waitqueue_head(&ring->pending_free_wq);
@@ -175,11 +179,7 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 	atomic_set(&blkif->refcnt, 1);
 	init_completion(&blkif->drain_complete);
 	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
-	spin_lock_init(&blkif->free_pages_lock);
-	INIT_LIST_HEAD(&blkif->free_pages);
-	INIT_LIST_HEAD(&blkif->persistent_purge_list);
 	blkif->st_print = jiffies;
-	INIT_WORK(&blkif->persistent_purge_work, xen_blkbk_unmap_purged_grants);
 
 	return blkif;
 }
@@ -290,6 +290,12 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 			i++;
 		}
 
+		BUG_ON(atomic_read(&ring->persistent_gnt_in_use) != 0);
+		BUG_ON(!list_empty(&ring->persistent_purge_list));
+		BUG_ON(!RB_EMPTY_ROOT(&ring->persistent_gnts));
+		BUG_ON(!list_empty(&ring->free_pages));
+		BUG_ON(ring->free_pages_num != 0);
+		BUG_ON(ring->persistent_gnt_c != 0);
 		WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
 	}
 	blkif->nr_ring_pages = 0;
@@ -304,13 +310,6 @@ static void xen_blkif_free(struct xen_blkif *blkif)
 	xen_vbd_free(&blkif->vbd);
 
 	/* Make sure everything is drained before shutting down */
-	BUG_ON(blkif->persistent_gnt_c != 0);
-	BUG_ON(atomic_read(&blkif->persistent_gnt_in_use) != 0);
-	BUG_ON(blkif->free_pages_num != 0);
-	BUG_ON(!list_empty(&blkif->persistent_purge_list));
-	BUG_ON(!list_empty(&blkif->free_pages));
-	BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
-
 	kfree(blkif->rings);
 	kmem_cache_free(xen_blkif_cachep, blkif);
 }
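
[Editorial note: to illustrate the effect of the move above, here is a minimal
sketch of a ring-local free-page helper once the pool lives in
'struct xen_blkif_ring'. The names follow the fields added in common.h, but
the function body is illustrative and not taken verbatim from the patch.]

	static inline void put_free_pages(struct xen_blkif_ring *ring,
					  struct page **page, int num)
	{
		unsigned long flags;
		int i;

		/* Pages go back to this ring's private pool; no cross-ring lock. */
		spin_lock_irqsave(&ring->free_pages_lock, flags);
		for (i = 0; i < num; i++)
			list_add(&page[i]->lru, &ring->free_pages);
		ring->free_pages_num += num;
		spin_unlock_irqrestore(&ring->free_pages_lock, flags);
	}

Since every ring now owns its own pool and persistent-grant tree, requests on
different rings no longer contend on a single per-device lock.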
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 01/10] xen/blkif: document blkif multi-queue/ring extension
  2015-11-14  3:12 ` Bob Liu
@ 2015-11-16 19:27   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-16 19:27 UTC (permalink / raw)
  To: Bob Liu
  Cc: jonathan.davies, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, xen-devel, axboe, david.vrabel, avanzini.arianna,
	roger.pau

On Sat, Nov 14, 2015 at 11:12:10AM +0800, Bob Liu wrote:
> Document the multi-queue/ring feature in terms of XenStore keys to be written by
> the backend and by the frontend.
> 
> Signed-off-by: Bob Liu <bob.liu@oracle.com>

applied to my 'for-jens-4.5', but this won't go anywhere until the Xen version
is acked.

> ---
> v2:
> Add descriptions together with multi-page ring buffer.
> ---
>  include/xen/interface/io/blkif.h |   48 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
> 
> diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
> index c33e1c4..8b8cfad 100644
> --- a/include/xen/interface/io/blkif.h
> +++ b/include/xen/interface/io/blkif.h
> @@ -28,6 +28,54 @@ typedef uint16_t blkif_vdev_t;
>  typedef uint64_t blkif_sector_t;
>  
>  /*
> + * Multiple hardware queues/rings:
> + * If supported, the backend will write the key "multi-queue-max-queues" to
> + * the directory for that vbd, and set its value to the maximum supported
> + * number of queues.
> + * Frontends that are aware of this feature and wish to use it can write the
> + * key "multi-queue-num-queues" with the number they wish to use, which must be
> + * greater than zero, and no more than the value reported by the backend in
> + * "multi-queue-max-queues".
> + *
> + * For frontends requesting just one queue, the usual event-channel and
> + * ring-ref keys are written as before, simplifying the backend processing
> + * to avoid distinguishing between a frontend that doesn't understand the
> + * multi-queue feature, and one that does, but requested only one queue.
> + *
> + * Frontends requesting two or more queues must not write the toplevel
> + * event-channel and ring-ref keys, instead writing those keys under sub-keys
> + * having the name "queue-N" where N is the integer ID of the queue/ring for
> + * which those keys belong. Queues are indexed from zero.
> + * For example, a frontend with two queues must write the following set of
> + * queue-related keys:
> + *
> + * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2"
> + * /local/domain/1/device/vbd/0/queue-0 = ""
> + * /local/domain/1/device/vbd/0/queue-0/ring-ref = "<ring-ref#0>"
> + * /local/domain/1/device/vbd/0/queue-0/event-channel = "<evtchn#0>"
> + * /local/domain/1/device/vbd/0/queue-1 = ""
> + * /local/domain/1/device/vbd/0/queue-1/ring-ref = "<ring-ref#1>"
> + * /local/domain/1/device/vbd/0/queue-1/event-channel = "<evtchn#1>"
> + *
> + * It is also possible to use multiple queues/rings together with
> + * feature multi-page ring buffer.
> + * For example, a frontend requests two queues/rings and the size of each ring
> + * buffer is two pages must write the following set of related keys:
> + *
> + * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2"
> + * /local/domain/1/device/vbd/0/ring-page-order = "1"
> + * /local/domain/1/device/vbd/0/queue-0 = ""
> + * /local/domain/1/device/vbd/0/queue-0/ring-ref0 = "<ring-ref#0>"
> + * /local/domain/1/device/vbd/0/queue-0/ring-ref1 = "<ring-ref#1>"
> + * /local/domain/1/device/vbd/0/queue-0/event-channel = "<evtchn#0>"
> + * /local/domain/1/device/vbd/0/queue-1 = ""
> + * /local/domain/1/device/vbd/0/queue-1/ring-ref0 = "<ring-ref#2>"
> + * /local/domain/1/device/vbd/0/queue-1/ring-ref1 = "<ring-ref#3>"
> + * /local/domain/1/device/vbd/0/queue-1/event-channel = "<evtchn#1>"
> + *
> + */
> +
> +/*
>   * REQUEST CODES.
>   */
>  #define BLKIF_OP_READ              0
> -- 
> 1.7.10.4
> 
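
[Editorial note: to make the key handshake described above concrete, below is a
rough sketch of the frontend side. It is an illustration only, not code from
this series; error handling is trimmed and the variable names
(backend_max_queues, nr_queues) are made up for the example.]

	unsigned int backend_max_queues = 0, nr_queues;
	int err;

	/* Backend advertises how many queues it can support. */
	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
			   "multi-queue-max-queues", "%u", &backend_max_queues);
	if (err < 0)
		backend_max_queues = 1;	/* key absent: single-queue backend */

	nr_queues = min(num_online_cpus(), backend_max_queues);

	/* Frontend writes back how many queues it will actually use. */
	err = xenbus_printf(XBT_NIL, dev->nodename,
			    "multi-queue-num-queues", "%u", nr_queues);

	/*
	 * For nr_queues == 1 the plain ring-ref/event-channel keys are written
	 * as before; for nr_queues > 1 they go under "queue-N" as documented
	 * above.
	 */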

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 02/10] xen/blkfront: separate per ring information out of device info
  2015-11-14  3:12   ` Bob Liu
  (?)
@ 2015-11-16 19:44   ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-16 19:44 UTC (permalink / raw)
  To: Bob Liu
  Cc: jonathan.davies, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, xen-devel, axboe, david.vrabel, avanzini.arianna,
	roger.pau

On Sat, Nov 14, 2015 at 11:12:11AM +0800, Bob Liu wrote:
> Split per ring information to an new structure "blkfront_ring_info".
> 
> A ring is the representation of a hardware queue, every vbd device can associate
> with one or more rings depending on how many hardware queues/rings to be used.
> 
> This patch is a preparation for supporting real multi hardware queues/rings.

I also added:

     We also add a backpointer to 'struct blkfront_info' (dev_info) which
    is not needed (we could use container_of) but a further patch
    ("xen/blkfront: pseudo support for multi hardware queues/rings")
    will make allocation of 'blkfront_ring_info' dynamic.

Fixed some odd tab spaces.

And put in my 'for-jens-4.5' branch.
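
[Editorial note: the container_of() alternative mentioned above only works
while the ring info is embedded in 'struct blkfront_info'. A hypothetical
sketch, assuming an embedded member named 'rinfo' (the member name is an
assumption for illustration):]

	static inline struct blkfront_info *
	ring_to_info(struct blkfront_ring_info *ring)
	{
		return container_of(ring, struct blkfront_info, rinfo);
	}

Once the rings become a dynamically allocated array (as the later patch does),
this no longer applies, so the explicit dev_info backpointer is the simpler
choice.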

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/10] xen/blkfront: pseudo support for multi hardware queues/rings
  2015-11-14  3:12   ` Bob Liu
  (?)
@ 2015-11-16 20:38   ` Konrad Rzeszutek Wilk
  2015-11-16 21:54     ` Konrad Rzeszutek Wilk
  -1 siblings, 1 reply; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-16 20:38 UTC (permalink / raw)
  To: Bob Liu
  Cc: jonathan.davies, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, xen-devel, axboe, david.vrabel, avanzini.arianna,
	roger.pau

> +static void blkif_free(struct blkfront_info *info, int suspend)
> +{
> +	struct grant *persistent_gnt, *n;
> +	unsigned int i;
> +
> +	/* Prevent new requests being issued until we fix things up. */
> +	spin_lock_irq(&info->io_lock);
> +	info->connected = suspend ?
> +		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
> +	/* No more blkif_request(). */
> +	if (info->rq)
> +		blk_mq_stop_hw_queues(info->rq);
> +
> +	/* Remove all persistent grants */
> +	if (!list_empty(&info->grants)) {
> +		list_for_each_entry_safe(persistent_gnt, n,
> +					 &info->grants, node) {
> +			list_del(&persistent_gnt->node);
> +			if (persistent_gnt->gref != GRANT_INVALID_REF) {
> +				gnttab_end_foreign_access(persistent_gnt->gref,
> +							  0, 0UL);
> +				info->persistent_gnts_c--;
> +			}
> +			if (info->feature_persistent)
> +				__free_page(persistent_gnt->page);
> +			kfree(persistent_gnt);
> +		}
> +	}
> +	BUG_ON(info->persistent_gnts_c != 0);
> +
> +	for (i = 0; i < info->nr_rings; i++)
> +		blkif_free_ring(&info->rinfo[i]);
> +	kfree(info->rinfo);

?? That should be inside the loop.

Anyhow did this change:

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index d73734f..6b46d4d 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -611,7 +611,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
                max_grefs += INDIRECT_GREFS(max_grefs);
 
        /*
-        * We have to reserve 'max_grefs' grants at first because persistent
+        * We have to reserve 'max_grefs' grants because persistent
         * grants are shared by all rings.
         */
        if (max_grefs > 0)
@@ -1212,9 +1212,11 @@ static void blkif_free(struct blkfront_info *info, int suspend)
        }
        BUG_ON(info->persistent_gnts_c != 0);
 
-       for (i = 0; i < info->nr_rings; i++)
+       for (i = 0; i < info->nr_rings; i++) {
                blkif_free_ring(&info->rinfo[i]);
-       kfree(info->rinfo);
+               kfree(info->rinfo);
+               info->ring = NULL;
+       }
        info->nr_rings = 0;
        spin_unlock_irq(&info->io_lock);
 }

and put in my 'for-jens-4.5' branch.

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 04/10] xen/blkfront: split per device io_lock
  2015-11-14  3:12 ` Bob Liu
@ 2015-11-16 20:59   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-16 20:59 UTC (permalink / raw)
  To: Bob Liu
  Cc: jonathan.davies, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, xen-devel, axboe, david.vrabel, avanzini.arianna,
	roger.pau

On Sat, Nov 14, 2015 at 11:12:13AM +0800, Bob Liu wrote:
> After commit "xen/blkfront: separate per ring information out of device
> info", per-ring data is protected by a per-device lock('io_lock').
> 
> This is not a good way and will effect the scalability, so introduces a
> per-ring lock('ring_lock').
> 
> The old 'io_lock' is renamed to 'dev_lock' which protects the ->grants list and
> persistent_gnts_c shared by all rings.

I changed the description a bit:

    xen/blkfront: split per device io_lock
    
    After patch "xen/blkfront: separate per ring information out of device
    info", per-ring data is protected by a per-device lock ('io_lock').
    
    This is not a good way and will affect scalability, so introduce a
    per-ring lock ('ring_lock').
    
    The old 'io_lock' is renamed to 'dev_lock' which protects the ->grants list and
    ->persistent_gnts_c which are shared by all rings.
    
    Note that in 'blkfront_probe' the 'blkfront_info' is set up via kzalloc,
    so setting ->persistent_gnts_c to zero is not needed.
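
[Editorial note: a hypothetical sketch of the locking split described above;
the field names follow the commit message ('ring_lock', 'dev_lock', '->grants',
'persistent_gnts_c'), but the snippet itself is illustrative rather than taken
from the patch:]

	unsigned long flags;

	/* Per-ring state (shadow entries, ring indexes) under the ring's lock. */
	spin_lock_irqsave(&rinfo->ring_lock, flags);
	/* ... touch rinfo->shadow[] and push requests for this ring ... */
	spin_unlock_irqrestore(&rinfo->ring_lock, flags);

	/* The grant list shared by all rings stays under the device lock. */
	spin_lock_irqsave(&info->dev_lock, flags);
	list_add(&gnt_list_entry->node, &info->grants);
	info->persistent_gnts_c++;
	spin_unlock_irqrestore(&info->dev_lock, flags);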

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 00/10] xen-block: multi hardware-queues/rings support
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
                   ` (16 preceding siblings ...)
  2015-11-14  3:12 ` Bob Liu
@ 2015-11-16 21:08 ` Konrad Rzeszutek Wilk
  2015-11-16 21:35 ` [PATCH] cleanups to xen-lkbfront on top of " Konrad Rzeszutek Wilk
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-16 21:08 UTC (permalink / raw)
  To: Bob Liu
  Cc: jonathan.davies, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, xen-devel, axboe, david.vrabel, avanzini.arianna,
	roger.pau

On Sat, Nov 14, 2015 at 11:12:09AM +0800, Bob Liu wrote:
> Note: These patches were based on original work of Arianna's internship for
> GNOME's Outreach Program for Women.
> 
> After using blk-mq api, a guest has more than one(nr_vpus) software request
> queues associated with each block front. These queues can be mapped over several
> rings(hardware queues) to the backend, making it very easy for us to run
> multiple threads on the backend for a single virtual disk.
> 
> By having different threads issuing requests at the same time, the performance
> of guest can be improved significantly.
> 
> Test was done based on null_blk driver:
> dom0: v4.3-rc7 16vcpus 10GB "modprobe null_blk"

Surely v4.4-rc1?

> domU: v4.3-rc7 16vcpus 10GB

Ditto.

> 
> [test]
> rw=read
> direct=1
> ioengine=libaio
> bs=4k
> time_based
> runtime=30
> filename=/dev/xvdb
> numjobs=16
> iodepth=64
> iodepth_batch=64
> iodepth_batch_complete=64
> group_reporting
> 
> Results:
> iops1: After commit("xen/blkfront: make persistent grants per-queue").
> iops2: After commit("xen/blkback: make persistent grants and free pages pool per-queue").
> 
> Queues:			  1 	   4 	  	  8 	 	 16
> Iops orig(k):		810 	1064 		780 		700
> Iops1(k):		810     1230(~20%)	1024(~20%)	850(~20%)
> Iops2(k):		810     1410(~35%)	1354(~75%)      1440(~100%)

Holy cow. That is some contention on a lock (iops1 vs iops2). Thank you for
running these numbers.

> 
> With 4 queues after this series we can get ~75% increase in IOPS, and
> performance won't drop if incresing queue numbers.
> 
> Please find the respective chart in this link:
> https://www.dropbox.com/s/agrcy2pbzbsvmwv/iops.png?dl=0

Thank you for that link.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 05/10] xen/blkfront: negotiate number of queues/rings to be used with backend
  2015-11-14  3:12 ` [PATCH v5 05/10] xen/blkfront: negotiate number of queues/rings to be used with backend Bob Liu
@ 2015-11-16 21:27   ` Konrad Rzeszutek Wilk
  2015-11-16 23:13     ` Bob Liu
  2015-11-16 23:13     ` Bob Liu
  0 siblings, 2 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-16 21:27 UTC (permalink / raw)
  To: Bob Liu
  Cc: jonathan.davies, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, xen-devel, axboe, david.vrabel, avanzini.arianna,
	roger.pau

>  /* Common code used when first setting up, and when resuming. */
>  static int talk_to_blkback(struct xenbus_device *dev,
> @@ -1527,10 +1582,9 @@ static int talk_to_blkback(struct xenbus_device *dev,
>  {
>  	const char *message = NULL;
>  	struct xenbus_transaction xbt;
> -	int err, i;
> -	unsigned int max_page_order = 0;
> +	int err;
> +	unsigned int i, max_page_order = 0;
>  	unsigned int ring_page_order = 0;
> -	struct blkfront_ring_info *rinfo;

Why? You end up doing the 'struct blkfront_ring_info' declaration
in two of the loops below?
>  
>  	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
>  			   "max-ring-page-order", "%u", &max_page_order);
> @@ -1542,7 +1596,8 @@ static int talk_to_blkback(struct xenbus_device *dev,
>  	}
>  
>  	for (i = 0; i < info->nr_rings; i++) {
> -		rinfo = &info->rinfo[i];
> +		struct blkfront_ring_info *rinfo = &info->rinfo[i];
> +

Here..

> @@ -1617,7 +1677,7 @@ again:
>  
>  	for (i = 0; i < info->nr_rings; i++) {
>  		int j;
> -		rinfo = &info->rinfo[i];
> +		struct blkfront_ring_info *rinfo = &info->rinfo[i];

And here?

It is not a big deal, but I am curious why you added this change?

> @@ -1717,7 +1789,6 @@ static int blkfront_probe(struct xenbus_device *dev,
>  
>  	mutex_init(&info->mutex);
>  	spin_lock_init(&info->dev_lock);
> -	info->xbdev = dev;

That looks like a spurious change? Ah, I see that we do the same exact
operation earlier in blkfront_probe.

Let me take this out of this patch and spin it off as a separate patch.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH] cleanups to xen-lkbfront on top of xen-block: multi hardware-queues/rings support.
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
                   ` (17 preceding siblings ...)
  2015-11-16 21:08 ` [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Konrad Rzeszutek Wilk
@ 2015-11-16 21:35 ` Konrad Rzeszutek Wilk
  2015-11-16 21:35   ` [PATCH 1/2] xen/blkfront: Cleanup of comments, fix unaligned variables, and syntax errors Konrad Rzeszutek Wilk
  2015-11-16 21:35   ` [PATCH 2/2] xen/blkfront: Remove duplicate setting of ->xbdev Konrad Rzeszutek Wilk
  2015-11-25 19:25 ` [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Konrad Rzeszutek Wilk
                   ` (3 subsequent siblings)
  22 siblings, 2 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-16 21:35 UTC (permalink / raw)
  To: xen-devel, linux-kernel, felipe.franciosi, avanzini.arianna,
	david.vrabel, bob.liu
  Cc: axboe, jonathan.davies

Hey,

As I was reviewing Bob's patches some things didn't look right. The
first patch is just code cleanup. The second patch is a spin-off of
a change in Bob's 'xen/blkfront: negotiate number of queues/rings to be
used with backend' patch.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 1/2] xen/blkfront: Cleanup of comments, fix unaligned variables, and syntax errors.
  2015-11-16 21:35 ` [PATCH] cleanups to xen-lkbfront on top of " Konrad Rzeszutek Wilk
@ 2015-11-16 21:35   ` Konrad Rzeszutek Wilk
  2015-11-16 21:35   ` [PATCH 2/2] xen/blkfront: Remove duplicate setting of ->xbdev Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-16 21:35 UTC (permalink / raw)
  To: xen-devel, linux-kernel, felipe.franciosi, avanzini.arianna,
	david.vrabel, bob.liu
  Cc: axboe, jonathan.davies, Konrad Rzeszutek Wilk

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/block/xen-blkfront.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 5154f4d..7c131c5 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -118,8 +118,8 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
 	__CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * XENBUS_MAX_RING_GRANTS)
 
 /*
- * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
- * characters are enough. Define to 20 to keep consist with backend.
+ * ring-ref%u i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
+ * characters are enough. Define to 20 to keep consistent with backend.
  */
 #define RINGREF_NAME_LEN (20)
 /*
@@ -238,7 +238,7 @@ static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 }
 
 static int add_id_to_freelist(struct blkfront_ring_info *rinfo,
-			       unsigned long id)
+			      unsigned long id)
 {
 	if (rinfo->shadow[id].req.u.rw.id != id)
 		return -EINVAL;
@@ -257,7 +257,7 @@ static int fill_grant_buffer(struct blkfront_ring_info *rinfo, int num)
 	struct grant *gnt_list_entry, *n;
 	int i = 0;
 
-	while(i < num) {
+	while (i < num) {
 		gnt_list_entry = kzalloc(sizeof(struct grant), GFP_NOIO);
 		if (!gnt_list_entry)
 			goto out_of_memory;
@@ -776,7 +776,7 @@ static inline bool blkif_request_flush_invalid(struct request *req,
 }
 
 static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
-			   const struct blk_mq_queue_data *qd)
+			  const struct blk_mq_queue_data *qd)
 {
 	unsigned long flags;
 	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)hctx->driver_data;
@@ -1968,8 +1968,7 @@ static int blkfront_resume(struct xenbus_device *dev)
 	return err;
 }
 
-static void
-blkfront_closing(struct blkfront_info *info)
+static void blkfront_closing(struct blkfront_info *info)
 {
 	struct xenbus_device *xbdev = info->xbdev;
 	struct block_device *bdev = NULL;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 2/2] xen/blkfront: Remove duplicate setting of ->xbdev.
  2015-11-16 21:35 ` [PATCH] cleanups to xen-lkbfront on top of " Konrad Rzeszutek Wilk
  2015-11-16 21:35   ` [PATCH 1/2] xen/blkfront: Cleanup of comments, fix unaligned variables, and syntax errors Konrad Rzeszutek Wilk
@ 2015-11-16 21:35   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-16 21:35 UTC (permalink / raw)
  To: xen-devel, linux-kernel, felipe.franciosi, avanzini.arianna,
	david.vrabel, bob.liu
  Cc: axboe, jonathan.davies, Konrad Rzeszutek Wilk

From: Bob Liu <bob.liu@oracle.com>

We do the same exact operations a bit earlier in the
function.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/block/xen-blkfront.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 7c131c5..c0f6fcd 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1792,7 +1792,6 @@ static int blkfront_probe(struct xenbus_device *dev,
 
 	mutex_init(&info->mutex);
 	spin_lock_init(&info->dev_lock);
-	info->xbdev = dev;
 	info->vdevice = vdevice;
 	INIT_LIST_HEAD(&info->grants);
 	info->connected = BLKIF_STATE_DISCONNECTED;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 03/10] xen/blkfront: pseudo support for multi hardware queues/rings
  2015-11-16 20:38   ` Konrad Rzeszutek Wilk
@ 2015-11-16 21:54     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-16 21:54 UTC (permalink / raw)
  To: Bob Liu
  Cc: jonathan.davies, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, xen-devel, axboe, david.vrabel, avanzini.arianna,
	roger.pau

> -       for (i = 0; i < info->nr_rings; i++)
> +       for (i = 0; i < info->nr_rings; i++) {
>                 blkif_free_ring(&info->rinfo[i]);
> -       kfree(info->rinfo);
> +               kfree(info->rinfo);
> +               info->ring = NULL;
> +       }

Duh! No, it shouldn't. That is because we do info->rinfo = kzalloc(.. * info->nr_rings).

<sigh> That throws me off every time.
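
[Editorial note: for clarity, since 'rinfo' is a single array allocated with
one kzalloc(sizeof(*rinfo) * info->nr_rings), the teardown pairs exactly one
kfree() with that allocation. A sketch of the intended shape:]

	for (i = 0; i < info->nr_rings; i++)
		blkif_free_ring(&info->rinfo[i]);	/* per-ring resources */
	kfree(info->rinfo);				/* the array itself, once */
	info->rinfo = NULL;
	info->nr_rings = 0;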

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 05/10] xen/blkfront: negotiate number of queues/rings to be used with backend
  2015-11-16 21:27   ` Konrad Rzeszutek Wilk
  2015-11-16 23:13     ` Bob Liu
@ 2015-11-16 23:13     ` Bob Liu
  2015-11-17 14:20       ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 59+ messages in thread
From: Bob Liu @ 2015-11-16 23:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, roger.pau, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel


On 11/17/2015 05:27 AM, Konrad Rzeszutek Wilk wrote:
>>  /* Common code used when first setting up, and when resuming. */
>>  static int talk_to_blkback(struct xenbus_device *dev,
>> @@ -1527,10 +1582,9 @@ static int talk_to_blkback(struct xenbus_device *dev,
>>  {
>>  	const char *message = NULL;
>>  	struct xenbus_transaction xbt;
>> -	int err, i;
>> -	unsigned int max_page_order = 0;
>> +	int err;
>> +	unsigned int i, max_page_order = 0;
>>  	unsigned int ring_page_order = 0;
>> -	struct blkfront_ring_info *rinfo;
> 
> Why? You end up doing the 'struct blkfront_ring_info' decleration
> in two of the loops below?

Oh, that's because Roger mentioned we would be tempted to declare rinfo only inside the for loop, to limit
the scope.

>>  
>>  	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
>>  			   "max-ring-page-order", "%u", &max_page_order);
>> @@ -1542,7 +1596,8 @@ static int talk_to_blkback(struct xenbus_device *dev,
>>  	}
>>  
>>  	for (i = 0; i < info->nr_rings; i++) {
>> -		rinfo = &info->rinfo[i];
>> +		struct blkfront_ring_info *rinfo = &info->rinfo[i];
>> +
> 
> Here..
> 
>> @@ -1617,7 +1677,7 @@ again:
>>  
>>  	for (i = 0; i < info->nr_rings; i++) {
>>  		int j;
>> -		rinfo = &info->rinfo[i];
>> +		struct blkfront_ring_info *rinfo = &info->rinfo[i];
> 
> And here?
> 
> It is not a big deal but I am curious of why add this change?
> 
>> @@ -1717,7 +1789,6 @@ static int blkfront_probe(struct xenbus_device *dev,
>>  
>>  	mutex_init(&info->mutex);
>>  	spin_lock_init(&info->dev_lock);
>> -	info->xbdev = dev;
> 
> That looks like a spurious change? Ah, I see that we do the same exact
> operation earlier in the blkfront_probe.
> 

This line was moved because:

1738         info->xbdev = dev;

1739         /* Check if backend supports multiple queues. */
1740         err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
                                         	^^^^
						We need xbdev to be set in advance.
                                  
1741                            "multi-queue-max-queues", "%u", &backend_max_queues);
1742         if (err < 0)
1743                 backend_max_queues = 1;


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 05/10] xen/blkfront: negotiate number of queues/rings to be used with backend
  2015-11-16 23:13     ` Bob Liu
@ 2015-11-17 14:20       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-17 14:20 UTC (permalink / raw)
  To: Bob Liu
  Cc: jonathan.davies, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, xen-devel, axboe, david.vrabel, avanzini.arianna,
	roger.pau

On Tue, Nov 17, 2015 at 07:13:34AM +0800, Bob Liu wrote:
> 
> On 11/17/2015 05:27 AM, Konrad Rzeszutek Wilk wrote:
> >>  /* Common code used when first setting up, and when resuming. */
> >>  static int talk_to_blkback(struct xenbus_device *dev,
> >> @@ -1527,10 +1582,9 @@ static int talk_to_blkback(struct xenbus_device *dev,
> >>  {
> >>  	const char *message = NULL;
> >>  	struct xenbus_transaction xbt;
> >> -	int err, i;
> >> -	unsigned int max_page_order = 0;
> >> +	int err;
> >> +	unsigned int i, max_page_order = 0;
> >>  	unsigned int ring_page_order = 0;
> >> -	struct blkfront_ring_info *rinfo;
> > 
> > Why? You end up doing the 'struct blkfront_ring_info' decleration
> > in two of the loops below?
> 
> Oh, that's because Roger mentioned we would be tempted to declare rinfo only inside the for loop, to limit
> the scope.
> 
> >>  
> >>  	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
> >>  			   "max-ring-page-order", "%u", &max_page_order);
> >> @@ -1542,7 +1596,8 @@ static int talk_to_blkback(struct xenbus_device *dev,
> >>  	}
> >>  
> >>  	for (i = 0; i < info->nr_rings; i++) {
> >> -		rinfo = &info->rinfo[i];
> >> +		struct blkfront_ring_info *rinfo = &info->rinfo[i];
> >> +
> > 
> > Here..
> > 
> >> @@ -1617,7 +1677,7 @@ again:
> >>  
> >>  	for (i = 0; i < info->nr_rings; i++) {
> >>  		int j;
> >> -		rinfo = &info->rinfo[i];
> >> +		struct blkfront_ring_info *rinfo = &info->rinfo[i];
> > 
> > And here?
> > 
> > It is not a big deal but I am curious of why add this change?
> > 
> >> @@ -1717,7 +1789,6 @@ static int blkfront_probe(struct xenbus_device *dev,
> >>  
> >>  	mutex_init(&info->mutex);
> >>  	spin_lock_init(&info->dev_lock);
> >> -	info->xbdev = dev;
> > 
> > That looks like a spurious change? Ah, I see that we do the same exact
> > operation earlier in the blkfront_probe.
> > 
> 
> The place of this line was changed because:
> 
> 1738         info->xbdev = dev;
> 
> 1739         /* Check if backend supports multiple queues. */
> 1740         err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
>                                          	^^^^
> 						We need xbdev to be set in advance.
>                                   
> 1741                            "multi-queue-max-queues", "%u", &backend_max_queues);
> 1742         if (err < 0)
> 1743                 backend_max_queues = 1;

Duh! Yes. Thanks.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 07/10] xen/blkback: pseudo support for multi hardware queues/rings
  2015-11-14  3:12 ` [PATCH v5 07/10] xen/blkback: pseudo support for multi hardware queues/rings Bob Liu
  2015-11-25 17:40   ` Konrad Rzeszutek Wilk
@ 2015-11-25 17:40   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-25 17:40 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, linux-kernel, roger.pau, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel

> @@ -113,19 +115,55 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
>  	}
>  	invalidate_inode_pages2(blkif->vbd.bdev->bd_inode->i_mapping);
>  
> -	blkif->ring.xenblkd = kthread_run(xen_blkif_schedule, &blkif->ring, "%s", name);
> -	if (IS_ERR(blkif->ring.xenblkd)) {
> -		err = PTR_ERR(blkif->ring.xenblkd);
> -		blkif->ring.xenblkd = NULL;
> -		xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
> -		return;
> +	for (i = 0; i < blkif->nr_rings; i++) {
> +		ring = &blkif->rings[i];
> +		ring->xenblkd = kthread_run(xen_blkif_schedule, ring, "%s-%d", name, i);
> +		if (IS_ERR(ring->xenblkd)) {
> +			err = PTR_ERR(ring->xenblkd);
> +			ring->xenblkd = NULL;
> +			xenbus_dev_fatal(blkif->be->dev, err,
> +					"start %s-%d xenblkd", name, i);
> +			goto out;
> +		}
> +	}
> +	return;
> +
> +out:
> +	while (--i >= 0) {
> +		ring = &blkif->rings[i];
> +		kthread_stop(ring->xenblkd);

That won't work. Imagine us failing at the start of the loop above,
so i == 0. We get here, decrement an unsigned int by one, and wrap
around to 0xffffffff. Naturally 0xffffffff >= 0, so we just keep
going over blkif->rings[0xffffffff].. and BOOM!

This worked when 'i' was an 'int', but now it is an unsigned int.

Let me make it 'int' and then this works, or we can swap
the loop around and use 'i - 1' to access the previous entry.

[Fixed it up in my tree]
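
[Editorial note: a sketch of the two fixes mentioned above. Either keep 'i'
signed so '--i >= 0' terminates, or unwind with the post-decrement idiom so an
unsigned 'i' never wraps:]

	/* 'i' is the index of the ring that failed; stop rings i-1 .. 0. */
	while (i--) {
		ring = &blkif->rings[i];
		kthread_stop(ring->xenblkd);
	}

With a signed 'int i', the original 'while (--i >= 0)' form is equivalent.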
>  	}
> +	return;
> +}
> +
.. snip..
> +static int connect_ring(struct backend_info *be)
> +{
> +	struct xenbus_device *dev = be->dev;
> +	unsigned int pers_grants;
> +	char protocol[64] = "";
> +	int err, i;
> +	char *xspath;
> +	size_t xspathsize;
> +	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
> +
> +	pr_debug("%s %s\n", __func__, dev->otherend);
> +
> +	be->blkif->blk_protocol = BLKIF_PROTOCOL_DEFAULT;
> +	err = xenbus_gather(XBT_NIL, dev->otherend, "protocol",
> +			    "%63s", protocol, NULL);
> +	if (err)
> +		strcpy(protocol, "unspecified, assuming default");
> +	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_NATIVE))
> +		be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
> +	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_32))
> +		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_32;
> +	else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_64))
> +		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_64;
> +	else {
> +		xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
> +		return -1;
> +	}
> +	err = xenbus_gather(XBT_NIL, dev->otherend,
> +			    "feature-persistent", "%u",
> +			    &pers_grants, NULL);
> +	if (err)
> +		pers_grants = 0;
> +
> +	be->blkif->vbd.feature_gnt_persistent = pers_grants;
> +	be->blkif->vbd.overflow_max_grants = 0;
> +
> +	pr_info("%s: using %d queues, protocol %d (%s) %s\n", dev->nodename,
> +		 be->blkif->nr_rings, be->blkif->blk_protocol, protocol,
> +		 pers_grants ? "persistent grants" : "");
> +
> +	if (be->blkif->nr_rings == 1)
> +		return read_per_ring_refs(&be->blkif->rings[0], dev->otherend);
> +	else {
> +		xspathsize = strlen(dev->otherend) + xenstore_path_ext_size;
> +		xspath = kmalloc(xspathsize, GFP_KERNEL);
> +		if (!xspath) {
> +			xenbus_dev_fatal(dev, -ENOMEM, "reading ring references");
> +			return -ENOMEM;
> +		}
> +
> +		for (i = 0; i < be->blkif->nr_rings; i++) {
> +			memset(xspath, 0, xspathsize);
> +			snprintf(xspath, xspathsize, "%s/queue-%u", dev->otherend, i);
> +			err = read_per_ring_refs(&be->blkif->rings[i], xspath);

Say nr_rings is 4 and this fails at the last one. That means
be->blkif->rings[0..2].pending_free has a bunch of pages and
also ring->blk_ring is set. We return out of this function
and end up back in frontend_changed():
 752                 err = connect_ring(be);
 753                 if (err)
 754                         break;

Great. So we have a memory leak until the device goes into
XenbusStateConnected (where we end up calling xen_blkif_disconnect
and free rings[0..2]).

But that may take a while if the guest is not nice. Perhaps we should
add a call to xen_blkif_disconnect in frontend_changed(..) in case
we fail at 'connect_ring', to clear the memory faster. I will prep a
patch for that.
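
[Editorial note: a sketch of the faster cleanup suggested above. The call
already exists in blkback (xen_blkif_disconnect); wiring it into
frontend_changed() like this is illustrative, not the actual follow-up patch:]

		err = connect_ring(be);
		if (err) {
			/*
			 * Free the partially initialised rings now instead of
			 * waiting for the frontend to move through its state
			 * machine.
			 */
			xen_blkif_disconnect(be->blkif);
			break;
		}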

> +			if (err) {
> +				kfree(xspath);
> +				return err;
> +			}
> +		}
> +		kfree(xspath);
> +	}
> +	return 0;
>  }
>  
>  static const struct xenbus_device_id xen_blkbk_ids[] = {
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 08/10] xen/blkback: get the number of hardware queues/rings from blkfront
  2015-11-14  3:12 ` Bob Liu
  2015-11-25 17:58   ` Konrad Rzeszutek Wilk
@ 2015-11-25 17:58   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-25 17:58 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, linux-kernel, roger.pau, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel

On Sat, Nov 14, 2015 at 11:12:17AM +0800, Bob Liu wrote:
> Backend advertises "multi-queue-max-queues" to front, also get the negotiated
> number from "multi-queue-num-queues" written by blkfront.
> 
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkback/blkback.c |   12 ++++++++++++
>  drivers/block/xen-blkback/common.h  |    1 +
>  drivers/block/xen-blkback/xenbus.c  |   34 ++++++++++++++++++++++++++++------
>  3 files changed, 41 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index fb5bfd4..acedc46 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -84,6 +84,15 @@ MODULE_PARM_DESC(max_persistent_grants,
>                   "Maximum number of grants to map persistently");
>  
>  /*
> + * Maximum number of rings/queues blkback supports, allow as many queues as there
> + * are CPUs if user has not specified a value.
> + */
> +unsigned int xenblk_max_queues;
> +module_param_named(max_queues, xenblk_max_queues, uint, 0644);
> +MODULE_PARM_DESC(max_queues,
> +		 "Maximum number of hardware queues per virtual disk");

Added:
 unsigned int xenblk_max_queues;
 module_param_named(max_queues, xenblk_max_queues, uint, 0644);
 MODULE_PARM_DESC(max_queues,
-                "Maximum number of hardware queues per virtual disk");
+                "Maximum number of hardware queues per virtual disk." \
+                "By default it is the number of online CPUs.");
 
 /*

> +
> +/*
>   * Maximum order of pages to be used for the shared ring between front and
>   * backend, 4KB page granularity is used.
>   */
> @@ -1483,6 +1492,9 @@ static int __init xen_blkif_init(void)
>  		xen_blkif_max_ring_order = XENBUS_MAX_RING_GRANT_ORDER;
>  	}
>  
> +	if (xenblk_max_queues == 0)
> +		xenblk_max_queues = num_online_cpus();
> +
>  	rc = xen_blkif_interface_init();
>  	if (rc)
>  		goto failed_init;
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index f2386e3..0833dc6 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -46,6 +46,7 @@
>  #include <xen/interface/io/protocols.h>
>  
>  extern unsigned int xen_blkif_max_ring_order;
> +extern unsigned int xenblk_max_queues;
>  /*
>   * This is the maximum number of segments that would be allowed in indirect
>   * requests. This value will also be passed to the frontend.
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index 6c6e048..d83b790 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -181,12 +181,6 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
>  	blkif->st_print = jiffies;
>  	INIT_WORK(&blkif->persistent_purge_work, xen_blkbk_unmap_purged_grants);
>  
> -	blkif->nr_rings = 1;
> -	if (xen_blkif_alloc_rings(blkif)) {
> -		kmem_cache_free(xen_blkif_cachep, blkif);
> -		return ERR_PTR(-ENOMEM);
> -	}
> -
>  	return blkif;
>  }
>  
> @@ -595,6 +589,12 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
>  		goto fail;
>  	}
>  
> +	/* Multi-queue: write how many queues are supported by the backend. */
> +	err = xenbus_printf(XBT_NIL, dev->nodename,
> +			    "multi-queue-max-queues", "%u", xenblk_max_queues);
> +	if (err)
> +		pr_warn("Error writing multi-queue-num-queues\n");

s/num/max/

> +
>  	/* setup back pointer */
>  	be->blkif->be = be;
>  
> @@ -980,6 +980,7 @@ static int connect_ring(struct backend_info *be)
>  	char *xspath;
>  	size_t xspathsize;
>  	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
> +	unsigned int requested_num_queues = 0;
>  
>  	pr_debug("%s %s\n", __func__, dev->otherend);
>  
> @@ -1007,6 +1008,27 @@ static int connect_ring(struct backend_info *be)
>  	be->blkif->vbd.feature_gnt_persistent = pers_grants;
>  	be->blkif->vbd.overflow_max_grants = 0;
>  
> +	/*
> +	 * Read the number of hardware queues from frontend.
> +	 */
> +	err = xenbus_scanf(XBT_NIL, dev->otherend, "multi-queue-num-queues",
> +			   "%u", &requested_num_queues);
> +	if (err < 0) {
> +		requested_num_queues = 1;
> +	} else {
> +		if (requested_num_queues > xenblk_max_queues
> +		    || requested_num_queues == 0) {
> +			/* buggy or malicious guest */
> +			xenbus_dev_fatal(dev, err,
> +					"guest requested %u queues, exceeding the maximum of %u.",
> +					requested_num_queues, xenblk_max_queues);
> +			return -1;

And made this return -ENOSYS.


> +		}
> +	}
> +	be->blkif->nr_rings = requested_num_queues;
> +	if (xen_blkif_alloc_rings(be->blkif))
> +		return -ENOMEM;
> +
>  	pr_info("%s: using %d queues, protocol %d (%s) %s\n", dev->nodename,
>  		 be->blkif->nr_rings, be->blkif->blk_protocol, protocol,
>  		 pers_grants ? "persistent grants" : "");
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 00/10] xen-block: multi hardware-queues/rings support
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
                   ` (19 preceding siblings ...)
  2015-11-25 19:25 ` [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Konrad Rzeszutek Wilk
@ 2015-11-25 19:25 ` Konrad Rzeszutek Wilk
  2015-11-25 20:56   ` Konrad Rzeszutek Wilk
  2015-11-25 20:56   ` Konrad Rzeszutek Wilk
  2015-11-25 19:35 ` [PATCH RFC] Various fixes to xen block drivers on top of Bob's multi-queue patches Konrad Rzeszutek Wilk
  2015-11-25 19:35 ` [PATCH RFC] Various fixes to xen block drivers on top of Bob's multi-queue patches Konrad Rzeszutek Wilk
  22 siblings, 2 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-25 19:25 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, linux-kernel, roger.pau, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel

>   xen/blkback: separate ring information out of struct xen_blkif
>   xen/blkback: pseudo support for multi hardware queues/rings
>   xen/blkback: get the number of hardware queues/rings from blkfront
>   xen/blkback: make pool of persistent grants and free pages per-queue

OK, got to those as well. I have put them in 'devel/for-jens-4.5' and
am going to test them overnight before pushing them out.

I see two bugs in the code that we MUST deal with:

 - print_stats() is going to show zero values.
 - the sysfs code (VBD_SHOW) isn't converted over to fetch data
   from all the rings (a sketch of the needed aggregation follows).
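
[Editorial note: a sketch of what the VBD_SHOW/print_stats fix could look like,
assuming the st_* counters end up living in 'struct xen_blkif_ring' (that
placement is an assumption for illustration); the reporting code would then sum
over the rings instead of reading the stale per-device fields:]

	unsigned long long rd_req = 0;
	unsigned int i;

	for (i = 0; i < blkif->nr_rings; i++)
		rd_req += blkif->rings[i].st_rd_req;
	/* ...likewise for st_wr_req, st_oo_req, etc., then print/show the sums. */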

> 
>  drivers/block/xen-blkback/blkback.c | 386 ++++++++++---------
>  drivers/block/xen-blkback/common.h  |  78 ++--
>  drivers/block/xen-blkback/xenbus.c  | 359 ++++++++++++------
>  drivers/block/xen-blkfront.c        | 718 ++++++++++++++++++++++--------------
>  include/xen/interface/io/blkif.h    |  48 +++
>  5 files changed, 971 insertions(+), 618 deletions(-)
> 
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH RFC] Various fixes to xen block drivers on top of Bob's multi-queue patches.
  2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
                   ` (20 preceding siblings ...)
  2015-11-25 19:25 ` Konrad Rzeszutek Wilk
@ 2015-11-25 19:35 ` Konrad Rzeszutek Wilk
  2015-11-25 19:35   ` [PATCH RFC 1/2] xen/blocks: Return -EXX instead of -1 Konrad Rzeszutek Wilk
                     ` (3 more replies)
  2015-11-25 19:35 ` [PATCH RFC] Various fixes to xen block drivers on top of Bob's multi-queue patches Konrad Rzeszutek Wilk
  22 siblings, 4 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-25 19:35 UTC (permalink / raw)
  To: roger.pau, jonathan.davies, david.vrabel, linux-kernel, jgross,
	xen-devel, bob.liu

Hey,

As I was reviewing Bob's backend patches I spotted a couple of
oddities that I thought should be fixed.

Please review at your leisure.

 drivers/block/xen-blkback/xenbus.c | 10 ++++++++--
 drivers/block/xen-blkfront.c       |  4 ++--
 2 files changed, 10 insertions(+), 4 deletions(-)


Konrad Rzeszutek Wilk (2):
      xen/blocks: Return -EXX instead of -1
      xen/blkback: Free resources if connect_ring failed.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH RFC 1/2] xen/blocks: Return -EXX instead of -1
  2015-11-25 19:35 ` [PATCH RFC] Various fixes to xen block drivers on top of Bob's multi-queue patches Konrad Rzeszutek Wilk
  2015-11-25 19:35   ` [PATCH RFC 1/2] xen/blocks: Return -EXX instead of -1 Konrad Rzeszutek Wilk
@ 2015-11-25 19:35   ` Konrad Rzeszutek Wilk
  2015-11-25 19:35   ` [PATCH RFC 2/2] xen/blkback: Free resources if connect_ring failed Konrad Rzeszutek Wilk
  2015-11-25 19:35   ` Konrad Rzeszutek Wilk
  3 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-25 19:35 UTC (permalink / raw)
  To: roger.pau, jonathan.davies, david.vrabel, linux-kernel, jgross,
	xen-devel, bob.liu
  Cc: Konrad Rzeszutek Wilk

Let's return sensible values instead of -1.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/block/xen-blkback/xenbus.c | 2 +-
 drivers/block/xen-blkfront.c       | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 2b8650a9..ca3a414 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -996,7 +996,7 @@ static int connect_ring(struct backend_info *be)
 		be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_64;
 	else {
 		xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
-		return -1;
+		return -ENOSYS;
 	}
 	err = xenbus_gather(XBT_NIL, dev->otherend,
 			    "feature-persistent", "%u",
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b48e488..0360c44 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -828,11 +828,11 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 	info->tag_set.driver_data = info;
 
 	if (blk_mq_alloc_tag_set(&info->tag_set))
-		return -1;
+		return -EINVAL;
 	rq = blk_mq_init_queue(&info->tag_set);
 	if (IS_ERR(rq)) {
 		blk_mq_free_tag_set(&info->tag_set);
-		return -1;
+		return PTR_ERR(rq);
 	}
 
 	queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
-- 
2.5.0
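
For context, a hedged sketch of why a real errno helps the caller; the call
site, its arguments and the logging below are illustrative assumptions, not
the actual blkfront code:

	/*
	 * Sketch only: with a real errno the probe path can log and propagate
	 * the underlying reason (-ENOMEM, -EINVAL, ...) instead of a bare -1.
	 */
	int err;

	err = xlvbd_init_blk_queue(gd, sector_size, segments);
	if (err) {
		pr_warn("xen-blkfront: blk-mq queue init failed: %d\n", err);
		return err;	/* propagate the reason to the xenbus connect path */
	}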


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH RFC 2/2] xen/blkback: Free resources if connect_ring failed.
  2015-11-25 19:35 ` [PATCH RFC] Various fixes to xen block drivers on top of Bob's multi-queue patches Konrad Rzeszutek Wilk
  2015-11-25 19:35   ` [PATCH RFC 1/2] xen/blocks: Return -EXX instead of -1 Konrad Rzeszutek Wilk
  2015-11-25 19:35   ` Konrad Rzeszutek Wilk
@ 2015-11-25 19:35   ` Konrad Rzeszutek Wilk
  2015-11-25 19:35   ` Konrad Rzeszutek Wilk
  3 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-25 19:35 UTC (permalink / raw)
  To: roger.pau, jonathan.davies, david.vrabel, linux-kernel, jgross,
	xen-devel, bob.liu
  Cc: Konrad Rzeszutek Wilk

With the multi-queue support we could fail at setting up
some of the rings and fail the connection. That meant that
all resources tied to rings[0..n-1] (where n is the ring
that failed to be set up) would be leaked. Eventually the
frontend will switch states and we will call
xen_blkif_disconnect.

However we do not want to be at the mercy of the frontend
deciding when to change states. This allows us to do the
cleanup right away and free the resources.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/block/xen-blkback/xenbus.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index ca3a414..c92b358 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -749,8 +749,14 @@ static void frontend_changed(struct xenbus_device *dev,
 		}
 
 		err = connect_ring(be);
-		if (err)
+		if (err) {
+			/*
+			 * Clean up so that memory resources can be used by
+			 * other devices. connect_ring already reported the error.
+			 */
+			xen_blkif_disconnect(be->blkif);
 			break;
+		}
 		xen_update_blkif_status(be->blkif);
 		break;
 
-- 
2.5.0
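
For reference, a minimal sketch of the per-ring unwind that the commit message
relies on xen_blkif_disconnect() to perform; the struct, field and helper names
are assumptions for illustration, not the actual blkback code:

	/*
	 * Sketch only: disconnecting tears down every ring that was set up
	 * before connect_ring() failed, so rings[0..n-1] are not leaked.
	 */
	static void sketch_teardown_rings(struct xen_blkif *blkif)
	{
		unsigned int r;

		for (r = 0; r < blkif->nr_rings; r++) {
			struct xen_blkif_ring *ring = &blkif->rings[r];

			if (ring->xenblkd)
				kthread_stop(ring->xenblkd);
			if (ring->irq)
				unbind_from_irqhandler(ring->irq, ring);
			/* also: put persistent grants, free pages, unmap the ring */
		}
	}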


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 00/10] xen-block: multi hardware-queues/rings support
  2015-11-25 19:25 ` Konrad Rzeszutek Wilk
  2015-11-25 20:56   ` Konrad Rzeszutek Wilk
@ 2015-11-25 20:56   ` Konrad Rzeszutek Wilk
  2015-11-25 22:12     ` Konrad Rzeszutek Wilk
  2015-11-25 22:12     ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-25 20:56 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, linux-kernel, roger.pau, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel

On Wed, Nov 25, 2015 at 02:25:07PM -0500, Konrad Rzeszutek Wilk wrote:
> >   xen/blkback: separate ring information out of struct xen_blkif
> >   xen/blkback: pseudo support for multi hardware queues/rings
> >   xen/blkback: get the number of hardware queues/rings from blkfront
> >   xen/blkback: make pool of persistent grants and free pages per-queue
> 
> OK, got to those as well. I have put them in 'devel/for-jens-4.5' and
> are going to test them overnight before pushing them out.
> 
> I see two bugs in the code that we MUST deal with:
> 
>  - print_stats () is going to show zero values.
>  - the sysfs code (VBD_SHOW) aren't converted over to fetch data
>    from all the rings.

- kthread_run can't handle the two "name, i" arguments. I see:

root      5101     2  0 20:47 ?        00:00:00 [blkback.3.xvda-]
root      5102     2  0 20:47 ?        00:00:00 [blkback.3.xvda-]
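
A hedged sketch of one way to keep the ring index visible despite the
TASK_COMM_LEN truncation; the format string and field names are assumptions
for illustration, not the code in this series:

	/*
	 * Sketch only: task->comm holds at most TASK_COMM_LEN (16) bytes
	 * including the NUL, so a long "blkback.<domid>.<dev>-<ring>" name
	 * gets cut off and every ring looks identical in ps.  A shorter
	 * base name keeps the ring index visible.
	 */
	char name[TASK_COMM_LEN];

	snprintf(name, sizeof(name), "blkbk.%u.%u-%u",
		 blkif->domid, blkif->vbd.handle, ring_index);
	ring->xenblkd = kthread_run(xen_blkif_schedule, ring, "%s", name);
	if (IS_ERR(ring->xenblkd))
		return PTR_ERR(ring->xenblkd);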


> 
> > 
> >  drivers/block/xen-blkback/blkback.c | 386 ++++++++++---------
> >  drivers/block/xen-blkback/common.h  |  78 ++--
> >  drivers/block/xen-blkback/xenbus.c  | 359 ++++++++++++------
> >  drivers/block/xen-blkfront.c        | 718 ++++++++++++++++++++++--------------
> >  include/xen/interface/io/blkif.h    |  48 +++
> >  5 files changed, 971 insertions(+), 618 deletions(-)
> > 
> > -- 
> > 1.8.3.1
> > 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 00/10] xen-block: multi hardware-queues/rings support
  2015-11-25 20:56   ` Konrad Rzeszutek Wilk
@ 2015-11-25 22:12     ` Konrad Rzeszutek Wilk
  2015-11-26  2:28       ` Bob Liu
  2015-11-26  2:28       ` Bob Liu
  2015-11-25 22:12     ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-25 22:12 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, linux-kernel, roger.pau, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel

On Wed, Nov 25, 2015 at 03:56:03PM -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 25, 2015 at 02:25:07PM -0500, Konrad Rzeszutek Wilk wrote:
> > >   xen/blkback: separate ring information out of struct xen_blkif
> > >   xen/blkback: pseudo support for multi hardware queues/rings
> > >   xen/blkback: get the number of hardware queues/rings from blkfront
> > >   xen/blkback: make pool of persistent grants and free pages per-queue
> > 
> > OK, got to those as well. I have put them in 'devel/for-jens-4.5' and
> > are going to test them overnight before pushing them out.
> > 
> > I see two bugs in the code that we MUST deal with:
> > 
> >  - print_stats () is going to show zero values.
> >  - the sysfs code (VBD_SHOW) aren't converted over to fetch data
> >    from all the rings.
> 
> - kthread_run can't handle the two "name, i" arguments. I see:
> 
> root      5101     2  0 20:47 ?        00:00:00 [blkback.3.xvda-]
> root      5102     2  0 20:47 ?        00:00:00 [blkback.3.xvda-]

And doing save/restore:

xl save <id> /tmp/A;
xl restore /tmp/A;

ends up with us losing the proper state and not getting the ring setup back.
I see this in the backend:

[ 2719.448600] vbd vbd-22-51712: -1 guest requested 0 queues, exceeding the maximum of 3.

And XenStore agrees:
tool = ""
 xenstored = ""
local = ""
 domain = ""
  0 = ""
   domid = "0"
   name = "Domain-0"
   device-model = ""
    0 = ""
     state = "running"
   error = ""
    backend = ""
     vbd = ""
      2 = ""
       51712 = ""
        error = "-1 guest requested 0 queues, exceeding the maximum of 3."

.. which also leads to a memory leak as xen_blkbk_remove never gets
called.
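
For reference, a hedged sketch of the backend-side check that produces that
message; the xenstore key and the names below are inferred from the error text
and are assumptions, not a copy of the actual connect_ring() code:

	/*
	 * Sketch only: blkif_free() zeroes info->nr_rings in the frontend, so
	 * the resume path presumably ends up advertising 0 queues, which the
	 * backend then rejects.
	 */
	unsigned int requested_num_queues = 0;
	int err;

	err = xenbus_scanf(XBT_NIL, dev->otherend,
			   "multi-queue-num-queues", "%u", &requested_num_queues);
	if (err < 0)
		requested_num_queues = 1;	/* frontend predates multi-queue */
	if (requested_num_queues == 0 ||
	    requested_num_queues > xenblk_max_queues) {
		xenbus_dev_fatal(dev, err,
				 "guest requested %u queues, exceeding the maximum of %u.",
				 requested_num_queues, xenblk_max_queues);
		return -ENOSYS;
	}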
> 
> 
> > 
> > > 
> > >  drivers/block/xen-blkback/blkback.c | 386 ++++++++++---------
> > >  drivers/block/xen-blkback/common.h  |  78 ++--
> > >  drivers/block/xen-blkback/xenbus.c  | 359 ++++++++++++------
> > >  drivers/block/xen-blkfront.c        | 718 ++++++++++++++++++++++--------------
> > >  include/xen/interface/io/blkif.h    |  48 +++
> > >  5 files changed, 971 insertions(+), 618 deletions(-)
> > > 
> > > -- 
> > > 1.8.3.1
> > > 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 00/10] xen-block: multi hardware-queues/rings support
  2015-11-25 22:12     ` Konrad Rzeszutek Wilk
@ 2015-11-26  2:28       ` Bob Liu
  2015-11-26  2:57         ` Konrad Rzeszutek Wilk
  2015-11-26  2:57         ` Konrad Rzeszutek Wilk
  2015-11-26  2:28       ` Bob Liu
  1 sibling, 2 replies; 59+ messages in thread
From: Bob Liu @ 2015-11-26  2:28 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, roger.pau, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel


On 11/26/2015 06:12 AM, Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 25, 2015 at 03:56:03PM -0500, Konrad Rzeszutek Wilk wrote:
>> On Wed, Nov 25, 2015 at 02:25:07PM -0500, Konrad Rzeszutek Wilk wrote:
>>>>   xen/blkback: separate ring information out of struct xen_blkif
>>>>   xen/blkback: pseudo support for multi hardware queues/rings
>>>>   xen/blkback: get the number of hardware queues/rings from blkfront
>>>>   xen/blkback: make pool of persistent grants and free pages per-queue
>>>
>>> OK, got to those as well. I have put them in 'devel/for-jens-4.5' and
>>> are going to test them overnight before pushing them out.
>>>
>>> I see two bugs in the code that we MUST deal with:
>>>
>>>  - print_stats () is going to show zero values.
>>>  - the sysfs code (VBD_SHOW) aren't converted over to fetch data
>>>    from all the rings.
>>
>> - kthread_run can't handle the two "name, i" arguments. I see:
>>
>> root      5101     2  0 20:47 ?        00:00:00 [blkback.3.xvda-]
>> root      5102     2  0 20:47 ?        00:00:00 [blkback.3.xvda-]
> 
> And doing save/restore:
> 
> xl save <id> /tmp/A;
> xl restore /tmp/A;
> 
> ends up us loosing the proper state and not getting the ring setup back.
> I see this is backend:
> 
> [ 2719.448600] vbd vbd-22-51712: -1 guest requested 0 queues, exceeding the maximum of 3.
> 
> And XenStore agrees:
> tool = ""
>  xenstored = ""
> local = ""
>  domain = ""
>   0 = ""
>    domid = "0"
>    name = "Domain-0"
>    device-model = ""
>     0 = ""
>      state = "running"
>    error = ""
>     backend = ""
>      vbd = ""
>       2 = ""
>        51712 = ""
>         error = "-1 guest requested 0 queues, exceeding the maximum of 3."
> 
> .. which also leads to a memory leak as xen_blkbk_remove never gets
> called.

I think that was already fixed by your patch:
[PATCH RFC 2/2] xen/blkback: Free resources if connect_ring failed.

P.S. I didn't see your git tree updated with these patches.

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 00/10] xen-block: multi hardware-queues/rings support
  2015-11-26  2:28       ` Bob Liu
@ 2015-11-26  2:57         ` Konrad Rzeszutek Wilk
  2015-11-26  7:09           ` Bob Liu
  2015-11-26  7:09           ` Bob Liu
  2015-11-26  2:57         ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-26  2:57 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, linux-kernel, roger.pau, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel

On Thu, Nov 26, 2015 at 10:28:10AM +0800, Bob Liu wrote:
> 
> On 11/26/2015 06:12 AM, Konrad Rzeszutek Wilk wrote:
> > On Wed, Nov 25, 2015 at 03:56:03PM -0500, Konrad Rzeszutek Wilk wrote:
> >> On Wed, Nov 25, 2015 at 02:25:07PM -0500, Konrad Rzeszutek Wilk wrote:
> >>>>   xen/blkback: separate ring information out of struct xen_blkif
> >>>>   xen/blkback: pseudo support for multi hardware queues/rings
> >>>>   xen/blkback: get the number of hardware queues/rings from blkfront
> >>>>   xen/blkback: make pool of persistent grants and free pages per-queue
> >>>
> >>> OK, got to those as well. I have put them in 'devel/for-jens-4.5' and
> >>> are going to test them overnight before pushing them out.
> >>>
> >>> I see two bugs in the code that we MUST deal with:
> >>>
> >>>  - print_stats () is going to show zero values.
> >>>  - the sysfs code (VBD_SHOW) aren't converted over to fetch data
> >>>    from all the rings.
> >>
> >> - kthread_run can't handle the two "name, i" arguments. I see:
> >>
> >> root      5101     2  0 20:47 ?        00:00:00 [blkback.3.xvda-]
> >> root      5102     2  0 20:47 ?        00:00:00 [blkback.3.xvda-]
> > 
> > And doing save/restore:
> > 
> > xl save <id> /tmp/A;
> > xl restore /tmp/A;
> > 
> > ends up us loosing the proper state and not getting the ring setup back.
> > I see this is backend:
> > 
> > [ 2719.448600] vbd vbd-22-51712: -1 guest requested 0 queues, exceeding the maximum of 3.
> > 
> > And XenStore agrees:
> > tool = ""
> >  xenstored = ""
> > local = ""
> >  domain = ""
> >   0 = ""
> >    domid = "0"
> >    name = "Domain-0"
> >    device-model = ""
> >     0 = ""
> >      state = "running"
> >    error = ""
> >     backend = ""
> >      vbd = ""
> >       2 = ""
> >        51712 = ""
> >         error = "-1 guest requested 0 queues, exceeding the maximum of 3."
> > 
> > .. which also leads to a memory leak as xen_blkbk_remove never gets
> > called.
> 
> I think which was already fix by your patch:
> [PATCH RFC 2/2] xen/blkback: Free resources if connect_ring failed.

Nope. I get that with or without the patch.

I pushed the patches to the
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
#devel/for-jens-4.5 tree. It also has some extra patches that should
soon be going in via the x86 tree.

With the xen-blkback compiled with #define DEBUG 1 I see:

[   63.887741] xen-blkback: xen_blkbk_probe ffff880026a8cc00 1
[   63.894302] xen-blkback: backend_changed ffff880026a8cc00 1
[   63.895748] xen-blkback: frontend_changed ffff880026a8cc00 Initialising
[   63.922700] xen-blkback: xen_blkbk_probe ffff8800269da800 1
[   63.927849] xen-blkback: backend_changed ffff8800269da800 1
[   63.929117] xen-blkback: Successful creation of handle=ca00 (dom=1)
[   63.930605] xen-blkback: frontend_changed ffff8800269da800 Initialising
[   64.097161] xen-blkback: backend_changed ffff880026a8cc00 1
[   64.098992] xen-blkback: Successful creation of handle=1600 (dom=1)
[   64.345913] device vif1.0 entered promiscuous mode
[   64.351469] IPv6: ADDRCONF(NETDEV_UP): vif1.0: link is not ready
[   64.538682] device vif1.0-emu entered promiscuous mode
[   64.546592] switch: port 3(vif1.0-emu) entered forwarding state
[   64.548357] switch: port 3(vif1.0-emu) entered forwarding state
[   79.544475] switch: port 3(vif1.0-emu) entered forwarding state
[   84.090637] switch: port 3(vif1.0-emu) entered disabled state
[   84.091545] device vif1.0-emu left promiscuous mode
[   84.092416] switch: port 3(vif1.0-emu) entered disabled state
[   89.286901] vif vif-1-0 vif1.0: Guest Rx ready
[   89.287921] IPv6: ADDRCONF(NETDEV_CHANGE): vif1.0: link becomes ready
[   89.288943] switch: port 2(vif1.0) entered forwarding state
[   89.289747] switch: port 2(vif1.0) entered forwarding state
[   89.456176] xen-blkback: frontend_changed ffff880026a8cc00 Closed
[   89.481945] xen-blkback: frontend_changed ffff8800269da800 Initialised
[   89.482802] xen-blkback: connect_ring /local/domain/1/device/vbd/51712
[   89.484068] xen-blkback: backend/vbd/1/51712: using 2 queues, protocol 2 (x86_32-abi) persistent grants
[   89.532755] xen-blkback: connect /local/domain/1/device/vbd/51712
[   89.541694] xen_update_blkif_status: name=[blkback.1.xvda-0]
[   89.542667] xen_update_blkif_status: name=[blkback.1.xvda-1]
[   89.561913] xen-blkback: frontend_changed ffff8800269da800 Connected

.. so here the guest booted and now we are suspending it.

[  104.300579] switch: port 2(vif1.0) entered forwarding state
[  208.057752] xen-blkback: frontend_changed ffff880026a8cc00 Unknown
[  208.061282] xen-blkback: xen_blkbk_remove ffff880026a8cc00 1
[  208.081888] xen-blkback: frontend_changed ffff8800269da800 Unknown
[  208.082759] xen-blkback: xen_blkbk_remove ffff8800269da800 1
[  208.102745] switch: port 2(vif1.0) entered disabled state
[  208.109089] switch: port 2(vif1.0) entered disabled state
[  208.109934] device vif1.0 left promiscuous mode
[  208.110734] switch: port 2(vif1.0) entered disabled state

We are done with suspend and we are resuming:

[  302.128132] xen-blkback: xen_blkbk_probe ffff880027fb1c00 2
[  302.134915] xen-blkback: backend_changed ffff880027fb1c00 2
[  302.136568] xen-blkback: frontend_changed ffff880027fb1c00 Initialising
[  302.165161] xen-blkback: xen_blkbk_probe ffff88001e2cac00 2
[  302.170157] xen-blkback: backend_changed ffff88001e2cac00 2
[  302.171420] xen-blkback: Successful creation of handle=ca00 (dom=2)
[  302.172873] xen-blkback: frontend_changed ffff88001e2cac00 Initialising
[  302.409007] xen-blkback: backend_changed ffff880027fb1c00 2
[  302.411237] xen-blkback: Successful creation of handle=1600 (dom=2)
[  302.634468] device vif2.0 entered promiscuous mode
[  302.639871] IPv6: ADDRCONF(NETDEV_UP): vif2.0: link is not ready
[  302.822459] device vif2.0-emu entered promiscuous mode
[  302.830021] switch: port 3(vif2.0-emu) entered forwarding state
[  302.831568] switch: port 3(vif2.0-emu) entered forwarding state
[  302.849206] switch: port 3(vif2.0-emu) entered disabled state
[  302.850952] device vif2.0-emu left promiscuous mode
[  302.852567] switch: port 3(vif2.0-emu) entered disabled state
[  302.895719] xen-blkback: frontend_changed ffff88002e2cac00 Initialised
[  302.897237] xen-blkback: connect_ring /local/domain/2/device/vbd/51712
[  302.899389] vbd vbd-2-51712: -1 guest requested 0 queues, exceeding the maximum of 3.
[  303.038086] xen-blkback: frontend_changed ffff88001e2cac00 Closing
[  303.073058] vif vif-2-0 vif2.0: Guest Rx ready
[  303.074057] IPv6: ADDRCONF(NETDEV_CHANGE): vif2.0: link becomes ready
[  303.075079] switch: port 2(vif2.0) entered forwarding state
[  303.075976] switch: port 2(vif2.0) entered forwarding state
[  306.204385] switch: port 2(vif2.0) entered disabled state
[  306.234820] xen-blkback: frontend_changed ffff880027fb1c00 Unknown
[  306.236307] xen-blkback: xen_blkbk_remove ffff880027fb1c00 2
[  306.248684] xen-blkback: frontend_changed ffff88001e2cac00 Unknown
[  306.249958] xen-blkback: xen_blkbk_remove ffff88001e2cac00 2
[  306.262379] switch: port 2(vif2.0) entered disabled state
[  306.263280] device vif2.0 left promiscuous mode
[  306.264210] switch: port 2(vif2.0) entered disabled state


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 00/10] xen-block: multi hardware-queues/rings support
  2015-11-26  2:57         ` Konrad Rzeszutek Wilk
  2015-11-26  7:09           ` Bob Liu
@ 2015-11-26  7:09           ` Bob Liu
  2015-11-26 16:20             ` Konrad Rzeszutek Wilk
                               ` (2 more replies)
  1 sibling, 3 replies; 59+ messages in thread
From: Bob Liu @ 2015-11-26  7:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, roger.pau, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel

[-- Attachment #1: Type: text/plain, Size: 2212 bytes --]


On 11/26/2015 10:57 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 26, 2015 at 10:28:10AM +0800, Bob Liu wrote:
>>
>> On 11/26/2015 06:12 AM, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Nov 25, 2015 at 03:56:03PM -0500, Konrad Rzeszutek Wilk wrote:
>>>> On Wed, Nov 25, 2015 at 02:25:07PM -0500, Konrad Rzeszutek Wilk wrote:
>>>>>>   xen/blkback: separate ring information out of struct xen_blkif
>>>>>>   xen/blkback: pseudo support for multi hardware queues/rings
>>>>>>   xen/blkback: get the number of hardware queues/rings from blkfront
>>>>>>   xen/blkback: make pool of persistent grants and free pages per-queue
>>>>>
>>>>> OK, got to those as well. I have put them in 'devel/for-jens-4.5' and
>>>>> are going to test them overnight before pushing them out.
>>>>>
>>>>> I see two bugs in the code that we MUST deal with:
>>>>>
>>>>>  - print_stats () is going to show zero values.
>>>>>  - the sysfs code (VBD_SHOW) aren't converted over to fetch data
>>>>>    from all the rings.
>>>>
>>>> - kthread_run can't handle the two "name, i" arguments. I see:
>>>>
>>>> root      5101     2  0 20:47 ?        00:00:00 [blkback.3.xvda-]
>>>> root      5102     2  0 20:47 ?        00:00:00 [blkback.3.xvda-]
>>>
>>> And doing save/restore:
>>>
>>> xl save <id> /tmp/A;
>>> xl restore /tmp/A;
>>>
>>> ends up us loosing the proper state and not getting the ring setup back.
>>> I see this is backend:
>>>
>>> [ 2719.448600] vbd vbd-22-51712: -1 guest requested 0 queues, exceeding the maximum of 3.
>>>
>>> And XenStore agrees:
>>> tool = ""
>>>  xenstored = ""
>>> local = ""
>>>  domain = ""
>>>   0 = ""
>>>    domid = "0"
>>>    name = "Domain-0"
>>>    device-model = ""
>>>     0 = ""
>>>      state = "running"
>>>    error = ""
>>>     backend = ""
>>>      vbd = ""
>>>       2 = ""
>>>        51712 = ""
>>>         error = "-1 guest requested 0 queues, exceeding the maximum of 3."
>>>
>>> .. which also leads to a memory leak as xen_blkbk_remove never gets
>>> called.
>>
>> I think which was already fix by your patch:
>> [PATCH RFC 2/2] xen/blkback: Free resources if connect_ring failed.
> 
> Nope. I get that with or without the patch.
> 

The attached patches should fix this issue.

-- 
Regards,
-Bob

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-xen-blkfront-fix-compile-error.patch --]
[-- Type: text/x-patch; name="0001-xen-blkfront-fix-compile-error.patch", Size: 1029 bytes --]

From f297a05fc27fb0bc9a3ed15407f8cc6ffd5e2a00 Mon Sep 17 00:00:00 2001
From: Bob Liu <bob.liu@oracle.com>
Date: Wed, 25 Nov 2015 14:56:32 -0500
Subject: [PATCH 1/2] xen/blkfront: fix compile error
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fix this build error:
drivers/block/xen-blkfront.c: In function ‘blkif_free’:
drivers/block/xen-blkfront.c:1234:6: error: ‘struct blkfront_info’ has no member named ‘ring’
  info->ring = NULL;

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkfront.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 625604d..ef5ce43 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1231,7 +1231,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 		blkif_free_ring(&info->rinfo[i]);
 
 	kfree(info->rinfo);
-	info->ring = NULL;
+	info->rinfo = NULL;
 	info->nr_rings = 0;
 }
 
-- 
1.8.3.1


[-- Attachment #3: 0002-xen-blkfront-realloc-ring-info-in-blkif_resume.patch --]
[-- Type: text/x-patch, Size: 1904 bytes --]

From aab0bb1690213e665966ea22b021e0eeaacfc717 Mon Sep 17 00:00:00 2001
From: Bob Liu <bob.liu@oracle.com>
Date: Wed, 25 Nov 2015 17:52:55 -0500
Subject: [PATCH 2/2] xen/blkfront: realloc ring info in blkif_resume

Need to reallocate the ring info in the resume path, because info->rinfo was
freed in blkif_free(). Also, the 'multi-queue-max-queues' value the backend
reports may have changed.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkfront.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index ef5ce43..9634a65 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1926,12 +1926,38 @@ static int blkif_recover(struct blkfront_info *info)
 static int blkfront_resume(struct xenbus_device *dev)
 {
 	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
-	int err;
+	int err = 0;
+	unsigned int max_queues = 0, r_index;
 
 	dev_dbg(&dev->dev, "blkfront_resume: %s\n", dev->nodename);
 
 	blkif_free(info, info->connected == BLKIF_STATE_CONNECTED);
 
+	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+			"multi-queue-max-queues", "%u", &max_queues, NULL);
+	if (err)
+		max_queues = 1;
+
+	info->nr_rings = min(max_queues, xen_blkif_max_queues);
+	/* We need at least one ring. */
+	if (!info->nr_rings)
+		info->nr_rings = 1;
+
+	info->rinfo = kzalloc(sizeof(struct blkfront_ring_info) * info->nr_rings, GFP_KERNEL);
+	if (!info->rinfo)
+		return -ENOMEM;
+
+	for (r_index = 0; r_index < info->nr_rings; r_index++) {
+		struct blkfront_ring_info *rinfo;
+
+		rinfo = &info->rinfo[r_index];
+		INIT_LIST_HEAD(&rinfo->indirect_pages);
+		INIT_LIST_HEAD(&rinfo->grants);
+		rinfo->dev_info = info;
+		INIT_WORK(&rinfo->work, blkif_restart_queue);
+		spin_lock_init(&rinfo->ring_lock);
+	}
+
 	err = talk_to_blkback(dev, info);
 
 	/*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 00/10] xen-block: multi hardware-queues/rings support
  2015-11-26  7:09           ` Bob Liu
@ 2015-11-26 16:20             ` Konrad Rzeszutek Wilk
  2015-11-26 16:20             ` Konrad Rzeszutek Wilk
  2015-11-30 22:03             ` Konrad Rzeszutek Wilk
  2 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-26 16:20 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, linux-kernel, roger.pau, felipe.franciosi, axboe,
	avanzini.arianna, rafal.mielniczuk, jonathan.davies,
	david.vrabel

On November 26, 2015 2:09:02 AM EST, Bob Liu <bob.liu@oracle.com> wrote:
>
>On 11/26/2015 10:57 AM, Konrad Rzeszutek Wilk wrote:
>> On Thu, Nov 26, 2015 at 10:28:10AM +0800, Bob Liu wrote:
>>>
>>> On 11/26/2015 06:12 AM, Konrad Rzeszutek Wilk wrote:
>>>> On Wed, Nov 25, 2015 at 03:56:03PM -0500, Konrad Rzeszutek Wilk
>wrote:
>>>>> On Wed, Nov 25, 2015 at 02:25:07PM -0500, Konrad Rzeszutek Wilk
>wrote:
>>>>>>>   xen/blkback: separate ring information out of struct xen_blkif
>>>>>>>   xen/blkback: pseudo support for multi hardware queues/rings
>>>>>>>   xen/blkback: get the number of hardware queues/rings from
>blkfront
>>>>>>>   xen/blkback: make pool of persistent grants and free pages
>per-queue
>>>>>>
>>>>>> OK, got to those as well. I have put them in 'devel/for-jens-4.5'
>and
>>>>>> are going to test them overnight before pushing them out.
>>>>>>
>>>>>> I see two bugs in the code that we MUST deal with:
>>>>>>
>>>>>>  - print_stats () is going to show zero values.
>>>>>>  - the sysfs code (VBD_SHOW) aren't converted over to fetch data
>>>>>>    from all the rings.
>>>>>
>>>>> - kthread_run can't handle the two "name, i" arguments. I see:
>>>>>
>>>>> root      5101     2  0 20:47 ?        00:00:00 [blkback.3.xvda-]
>>>>> root      5102     2  0 20:47 ?        00:00:00 [blkback.3.xvda-]
>>>>
>>>> And doing save/restore:
>>>>
>>>> xl save <id> /tmp/A;
>>>> xl restore /tmp/A;
>>>>
>>>> ends up us loosing the proper state and not getting the ring setup
>back.
>>>> I see this is backend:
>>>>
>>>> [ 2719.448600] vbd vbd-22-51712: -1 guest requested 0 queues,
>exceeding the maximum of 3.
>>>>
>>>> And XenStore agrees:
>>>> tool = ""
>>>>  xenstored = ""
>>>> local = ""
>>>>  domain = ""
>>>>   0 = ""
>>>>    domid = "0"
>>>>    name = "Domain-0"
>>>>    device-model = ""
>>>>     0 = ""
>>>>      state = "running"
>>>>    error = ""
>>>>     backend = ""
>>>>      vbd = ""
>>>>       2 = ""
>>>>        51712 = ""
>>>>         error = "-1 guest requested 0 queues, exceeding the maximum
>of 3."
>>>>
>>>> .. which also leads to a memory leak as xen_blkbk_remove never gets
>>>> called.
>>>
>>> I think which was already fix by your patch:
>>> [PATCH RFC 2/2] xen/blkback: Free resources if connect_ring failed.
>> 
>> Nope. I get that with or without the patch.
>> 
>
>Attached patch should fix this issue. 

Yup!

Thanks!



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v5 00/10] xen-block: multi hardware-queues/rings support
  2015-11-26  7:09           ` Bob Liu
  2015-11-26 16:20             ` Konrad Rzeszutek Wilk
  2015-11-26 16:20             ` Konrad Rzeszutek Wilk
@ 2015-11-30 22:03             ` Konrad Rzeszutek Wilk
  2 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-30 22:03 UTC (permalink / raw)
  To: Bob Liu
  Cc: jonathan.davies, felipe.franciosi, rafal.mielniczuk,
	linux-kernel, xen-devel, axboe, david.vrabel, avanzini.arianna,
	roger.pau

On Thu, Nov 26, 2015 at 03:09:02PM +0800, Bob Liu wrote:
> 
> On 11/26/2015 10:57 AM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Nov 26, 2015 at 10:28:10AM +0800, Bob Liu wrote:
> >>
> >> On 11/26/2015 06:12 AM, Konrad Rzeszutek Wilk wrote:
> >>> On Wed, Nov 25, 2015 at 03:56:03PM -0500, Konrad Rzeszutek Wilk wrote:
> >>>> On Wed, Nov 25, 2015 at 02:25:07PM -0500, Konrad Rzeszutek Wilk wrote:
> >>>>>>   xen/blkback: separate ring information out of struct xen_blkif
> >>>>>>   xen/blkback: pseudo support for multi hardware queues/rings
> >>>>>>   xen/blkback: get the number of hardware queues/rings from blkfront
> >>>>>>   xen/blkback: make pool of persistent grants and free pages per-queue
> >>>>>
> >>>>> OK, got to those as well. I have put them in 'devel/for-jens-4.5' and
> >>>>> are going to test them overnight before pushing them out.
> >>>>>
> >>>>> I see two bugs in the code that we MUST deal with:
> >>>>>
> >>>>>  - print_stats () is going to show zero values.
> >>>>>  - the sysfs code (VBD_SHOW) aren't converted over to fetch data
> >>>>>    from all the rings.
> >>>>
> >>>> - kthread_run can't handle the two "name, i" arguments. I see:
> >>>>
> >>>> root      5101     2  0 20:47 ?        00:00:00 [blkback.3.xvda-]
> >>>> root      5102     2  0 20:47 ?        00:00:00 [blkback.3.xvda-]
> >>>
> >>> And doing save/restore:
> >>>
> >>> xl save <id> /tmp/A;
> >>> xl restore /tmp/A;
> >>>
> >>> ends up with us losing the proper state and not getting the ring setup back.
> >>> I see this in the backend:
> >>>
> >>> [ 2719.448600] vbd vbd-22-51712: -1 guest requested 0 queues, exceeding the maximum of 3.
> >>>
> >>> And XenStore agrees:
> >>> tool = ""
> >>>  xenstored = ""
> >>> local = ""
> >>>  domain = ""
> >>>   0 = ""
> >>>    domid = "0"
> >>>    name = "Domain-0"
> >>>    device-model = ""
> >>>     0 = ""
> >>>      state = "running"
> >>>    error = ""
> >>>     backend = ""
> >>>      vbd = ""
> >>>       2 = ""
> >>>        51712 = ""
> >>>         error = "-1 guest requested 0 queues, exceeding the maximum of 3."
> >>>
> >>> .. which also leads to a memory leak as xen_blkbk_remove never gets
> >>> called.
> >>
> >> I think that was already fixed by your patch:
> >> [PATCH RFC 2/2] xen/blkback: Free resources if connect_ring failed.
> > 
> > Nope. I get that with or without the patch.
> > 
> 
> The attached patch should fix this issue.

I reworked it a bit.

From 214635bd2d1c331d984a8170be30c7ba82a11fb2 Mon Sep 17 00:00:00 2001
From: Bob Liu <bob.liu@oracle.com>
Date: Wed, 25 Nov 2015 17:52:55 -0500
Subject: [PATCH] xen/blkfront: realloc ring info in blkif_resume

Need to reallocate the ring info in the resume path, because info->rinfo was
freed in blkif_free(). Also, the 'multi-queue-max-queues' value reported by
the backend may have changed.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/block/xen-blkfront.c | 74 +++++++++++++++++++++++++++-----------------
 1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index ef5ce43..4f77d36 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1676,6 +1676,43 @@ again:
 	return err;
 }
 
+static int negotiate_mq(struct blkfront_info *info)
+{
+	unsigned int backend_max_queues = 0;
+	int err;
+	unsigned int i;
+
+	BUG_ON(info->nr_rings);
+
+	/* Check if backend supports multiple queues. */
+	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+			   "multi-queue-max-queues", "%u", &backend_max_queues);
+	if (err < 0)
+		backend_max_queues = 1;
+
+	info->nr_rings = min(backend_max_queues, xen_blkif_max_queues);
+	/* We need at least one ring. */
+	if (!info->nr_rings)
+		info->nr_rings = 1;
+
+	info->rinfo = kzalloc(sizeof(struct blkfront_ring_info) * info->nr_rings, GFP_KERNEL);
+	if (!info->rinfo) {
+		xenbus_dev_fatal(info->xbdev, -ENOMEM, "allocating ring_info structure");
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < info->nr_rings; i++) {
+		struct blkfront_ring_info *rinfo;
+
+		rinfo = &info->rinfo[i];
+		INIT_LIST_HEAD(&rinfo->indirect_pages);
+		INIT_LIST_HEAD(&rinfo->grants);
+		rinfo->dev_info = info;
+		INIT_WORK(&rinfo->work, blkif_restart_queue);
+		spin_lock_init(&rinfo->ring_lock);
+	}
+	return 0;
+}
 /**
  * Entry point to this code when a new device is created.  Allocate the basic
  * structures and the ring buffer for communication with the backend, and
@@ -1686,9 +1723,7 @@ static int blkfront_probe(struct xenbus_device *dev,
 			  const struct xenbus_device_id *id)
 {
 	int err, vdevice;
-	unsigned int r_index;
 	struct blkfront_info *info;
-	unsigned int backend_max_queues = 0;
 
 	/* FIXME: Use dynamic device id if this is not set. */
 	err = xenbus_scanf(XBT_NIL, dev->nodename,
@@ -1739,33 +1774,10 @@ static int blkfront_probe(struct xenbus_device *dev,
 	}
 
 	info->xbdev = dev;
-	/* Check if backend supports multiple queues. */
-	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
-			   "multi-queue-max-queues", "%u", &backend_max_queues);
-	if (err < 0)
-		backend_max_queues = 1;
-
-	info->nr_rings = min(backend_max_queues, xen_blkif_max_queues);
-	/* We need at least one ring. */
-	if (!info->nr_rings)
-		info->nr_rings = 1;
-
-	info->rinfo = kzalloc(sizeof(struct blkfront_ring_info) * info->nr_rings, GFP_KERNEL);
-	if (!info->rinfo) {
-		xenbus_dev_fatal(dev, -ENOMEM, "allocating ring_info structure");
+	err = negotiate_mq(info);
+	if (err) {
 		kfree(info);
-		return -ENOMEM;
-	}
-
-	for (r_index = 0; r_index < info->nr_rings; r_index++) {
-		struct blkfront_ring_info *rinfo;
-
-		rinfo = &info->rinfo[r_index];
-		INIT_LIST_HEAD(&rinfo->indirect_pages);
-		INIT_LIST_HEAD(&rinfo->grants);
-		rinfo->dev_info = info;
-		INIT_WORK(&rinfo->work, blkif_restart_queue);
-		spin_lock_init(&rinfo->ring_lock);
+		return err;
 	}
 
 	mutex_init(&info->mutex);
@@ -1926,12 +1938,16 @@ static int blkif_recover(struct blkfront_info *info)
 static int blkfront_resume(struct xenbus_device *dev)
 {
 	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
-	int err;
+	int err = 0;
 
 	dev_dbg(&dev->dev, "blkfront_resume: %s\n", dev->nodename);
 
 	blkif_free(info, info->connected == BLKIF_STATE_CONNECTED);
 
+	err = negotiate_mq(info);
+	if (err)
+		return err;
+
 	err = talk_to_blkback(dev, info);
 
 	/*
-- 
2.5.0
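
As an aside, below is a minimal standalone sketch of the queue-count
arithmetic that negotiate_mq() above performs; negotiate_nr_rings() is a
hypothetical stand-in for illustration, not a function in the driver. It
also shows why a resume path that skips the renegotiation ends up asking
the backend for 0 queues, which is the error quoted earlier in the thread.

#include <stdio.h>

/*
 * Sketch only: the frontend takes the smaller of the backend's advertised
 * maximum and its own cap (the max_queues module parameter), and always
 * uses at least one ring.  If the resume path never re-runs this step,
 * the ring count stays 0 after blkif_free() and the backend rejects the
 * request ("guest requested 0 queues").
 */
static unsigned int negotiate_nr_rings(unsigned int backend_max_queues,
				       unsigned int frontend_max_queues)
{
	unsigned int nr = backend_max_queues < frontend_max_queues ?
			  backend_max_queues : frontend_max_queues;

	return nr ? nr : 1;	/* we need at least one ring */
}

int main(void)
{
	/* Backend advertises 4 queues, frontend capped at 3 -> 3 rings. */
	printf("%u\n", negotiate_nr_rings(4, 3));
	/* Backend advertises 0 queues -> fall back to a single ring. */
	printf("%u\n", negotiate_nr_rings(0, 3));
	return 0;
}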

^ permalink raw reply related	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2015-11-30 22:03 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-14  3:12 [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Bob Liu
2015-11-14  3:12 ` [PATCH v5 01/10] xen/blkif: document blkif multi-queue/ring extension Bob Liu
2015-11-14  3:12 ` Bob Liu
2015-11-16 19:27   ` Konrad Rzeszutek Wilk
2015-11-14  3:12 ` [PATCH v5 02/10] xen/blkfront: separate per ring information out of device info Bob Liu
2015-11-14  3:12   ` Bob Liu
2015-11-16 19:44   ` Konrad Rzeszutek Wilk
2015-11-14  3:12 ` [PATCH v5 03/10] xen/blkfront: pseudo support for multi hardware queues/rings Bob Liu
2015-11-14  3:12   ` Bob Liu
2015-11-16 20:38   ` Konrad Rzeszutek Wilk
2015-11-16 21:54     ` Konrad Rzeszutek Wilk
2015-11-14  3:12 ` [PATCH v5 04/10] xen/blkfront: split per device io_lock Bob Liu
2015-11-14  3:12 ` Bob Liu
2015-11-16 20:59   ` Konrad Rzeszutek Wilk
2015-11-14  3:12 ` [PATCH v5 05/10] xen/blkfront: negotiate number of queues/rings to be used with backend Bob Liu
2015-11-16 21:27   ` Konrad Rzeszutek Wilk
2015-11-16 23:13     ` Bob Liu
2015-11-16 23:13     ` Bob Liu
2015-11-17 14:20       ` Konrad Rzeszutek Wilk
2015-11-14  3:12 ` Bob Liu
2015-11-14  3:12 ` [PATCH v5 06/10] xen/blkback: separate ring information out of struct xen_blkif Bob Liu
2015-11-14  3:12   ` Bob Liu
2015-11-14  3:12 ` [PATCH v5 07/10] xen/blkback: pseudo support for multi hardware queues/rings Bob Liu
2015-11-25 17:40   ` Konrad Rzeszutek Wilk
2015-11-25 17:40   ` Konrad Rzeszutek Wilk
2015-11-14  3:12 ` Bob Liu
2015-11-14  3:12 ` [PATCH v5 08/10] xen/blkback: get the number of hardware queues/rings from blkfront Bob Liu
2015-11-14  3:12 ` Bob Liu
2015-11-25 17:58   ` Konrad Rzeszutek Wilk
2015-11-25 17:58   ` Konrad Rzeszutek Wilk
2015-11-14  3:12 ` [PATCH v5 09/10] xen/blkfront: make persistent grants pool per-queue Bob Liu
2015-11-14  3:12 ` Bob Liu
2015-11-14  3:12 ` [PATCH v5 10/10] xen/blkback: make pool of persistent grants and free pages per-queue Bob Liu
2015-11-14  3:12 ` Bob Liu
2015-11-16 21:08 ` [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Konrad Rzeszutek Wilk
2015-11-16 21:35 ` [PATCH] cleanups to xen-lkbfront on top of " Konrad Rzeszutek Wilk
2015-11-16 21:35   ` [PATCH 1/2] xen/blkfront: Cleanup of comments, fix unaligned variables, and syntax errors Konrad Rzeszutek Wilk
2015-11-16 21:35   ` [PATCH 2/2] xen/blkfront: Remove duplicate setting of ->xbdev Konrad Rzeszutek Wilk
2015-11-25 19:25 ` [PATCH v5 00/10] xen-block: multi hardware-queues/rings support Konrad Rzeszutek Wilk
2015-11-25 19:25 ` Konrad Rzeszutek Wilk
2015-11-25 20:56   ` Konrad Rzeszutek Wilk
2015-11-25 20:56   ` Konrad Rzeszutek Wilk
2015-11-25 22:12     ` Konrad Rzeszutek Wilk
2015-11-26  2:28       ` Bob Liu
2015-11-26  2:57         ` Konrad Rzeszutek Wilk
2015-11-26  7:09           ` Bob Liu
2015-11-26  7:09           ` Bob Liu
2015-11-26 16:20             ` Konrad Rzeszutek Wilk
2015-11-26 16:20             ` Konrad Rzeszutek Wilk
2015-11-30 22:03             ` Konrad Rzeszutek Wilk
2015-11-26  2:57         ` Konrad Rzeszutek Wilk
2015-11-26  2:28       ` Bob Liu
2015-11-25 22:12     ` Konrad Rzeszutek Wilk
2015-11-25 19:35 ` [PATCH RFC] Various fixes to xen block drivers on top of Bob's multi-queue patches Konrad Rzeszutek Wilk
2015-11-25 19:35   ` [PATCH RFC 1/2] xen/blocks: Return -EXX instead of -1 Konrad Rzeszutek Wilk
2015-11-25 19:35   ` Konrad Rzeszutek Wilk
2015-11-25 19:35   ` [PATCH RFC 2/2] xen/blkback: Free resources if connect_ring failed Konrad Rzeszutek Wilk
2015-11-25 19:35   ` Konrad Rzeszutek Wilk
2015-11-25 19:35 ` [PATCH RFC] Various fixes to xen block drivers on top of Bob's multi-queue patches Konrad Rzeszutek Wilk
