[PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity
@ 2015-09-11 19:31 Julien Grall
  2015-09-11 19:31 ` [PATCH 1/2] block/xen-blkfront: Introduce blkif_ring_get_request Julien Grall
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Julien Grall @ 2015-09-11 19:31 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, stefano.stabellini, linux-kernel, Julien Grall,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Boris Ostrovsky, David Vrabel

Hi all,

This is a follow-up on the previous discussion [1] related to guest using 64KB
page granularity not booting with backend using non-indirect grant.

This has been successly tested on ARM64 with both 64KB and 4KB page granularity
guests and QEMU as the backend. Indeed QEMU is not supported indirect.

For a summary of the previous discussion see patch #2.

This series is based on top of my 64KB page granularity support [2].

Comments are welcomed.

Sincerely yours,

[1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg01659.html
[2] https://lwn.net/Articles/656797/

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>

Julien Grall (2):
  block/xen-blkfront: Introduce blkif_ring_get_request
  block/xen-blkfront: Handle non-indirect grant with 64KB pages

 drivers/block/xen-blkfront.c | 229 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 202 insertions(+), 27 deletions(-)

-- 
2.1.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] block/xen-blkfront: Introduce blkif_ring_get_request
  2015-09-11 19:31 [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity Julien Grall
@ 2015-09-11 19:31 ` Julien Grall
  2015-10-05 15:22   ` Roger Pau Monné
  2015-09-11 19:32 ` [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages Julien Grall
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Julien Grall @ 2015-09-11 19:31 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, stefano.stabellini, linux-kernel, Julien Grall,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Boris Ostrovsky, David Vrabel

The code to get a request is always the same. Therefore we can factorize
it in a single function.

Signed-off-by: Julien Grall <julien.grall@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
---
 drivers/block/xen-blkfront.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 43cda94..f9d55c3 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -456,6 +456,23 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
 	return 0;
 }
 
+static unsigned long blkif_ring_get_request(struct blkfront_info *info,
+					    struct request *req,
+					    struct blkif_request **ring_req)
+{
+	unsigned long id;
+
+	*ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
+	info->ring.req_prod_pvt++;
+
+	id = get_id_from_freelist(info);
+	info->shadow[id].request = req;
+
+	(*ring_req)->u.rw.id = id;
+
+	return id;
+}
+
 static int blkif_queue_discard_req(struct request *req)
 {
 	struct blkfront_info *info = req->rq_disk->private_data;
@@ -463,9 +480,7 @@ static int blkif_queue_discard_req(struct request *req)
 	unsigned long id;
 
 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
-	id = get_id_from_freelist(info);
-	info->shadow[id].request = req;
+	id = blkif_ring_get_request(info, req, &ring_req);
 
 	ring_req->operation = BLKIF_OP_DISCARD;
 	ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
@@ -476,8 +491,6 @@ static int blkif_queue_discard_req(struct request *req)
 	else
 		ring_req->u.discard.flag = 0;
 
-	info->ring.req_prod_pvt++;
-
 	/* Keep a private copy so we can reissue requests when recovering. */
 	info->shadow[id].req = *ring_req;
 
@@ -613,9 +626,7 @@ static int blkif_queue_rw_req(struct request *req)
 		new_persistent_gnts = 0;
 
 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
-	id = get_id_from_freelist(info);
-	info->shadow[id].request = req;
+	id = blkif_ring_get_request(info, req, &ring_req);
 
 	BUG_ON(info->max_indirect_segments == 0 &&
 	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
@@ -628,7 +639,6 @@ static int blkif_queue_rw_req(struct request *req)
 	for_each_sg(info->shadow[id].sg, sg, num_sg, i)
 	       num_grant += gnttab_count_grant(sg->offset, sg->length);
 
-	ring_req->u.rw.id = id;
 	info->shadow[id].num_sg = num_sg;
 	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
 		/*
@@ -694,8 +704,6 @@ static int blkif_queue_rw_req(struct request *req)
 	if (setup.segments)
 		kunmap_atomic(setup.segments);
 
-	info->ring.req_prod_pvt++;
-
 	/* Keep a private copy so we can reissue requests when recovering. */
 	info->shadow[id].req = *ring_req;
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages
  2015-09-11 19:31 [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity Julien Grall
  2015-09-11 19:31 ` [PATCH 1/2] block/xen-blkfront: Introduce blkif_ring_get_request Julien Grall
@ 2015-09-11 19:32 ` Julien Grall
  2015-10-05 16:01   ` Roger Pau Monné
  2015-09-12  9:46 ` [Xen-devel] [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity Bob Liu
  2015-10-02  9:32 ` Julien Grall
  3 siblings, 1 reply; 18+ messages in thread
From: Julien Grall @ 2015-09-11 19:32 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, stefano.stabellini, linux-kernel, Julien Grall,
	Konrad Rzeszutek Wilk, Roger Pau Monné,
	Boris Ostrovsky, David Vrabel

The minimal size of request in the block framework is always PAGE_SIZE.
It means that when 64KB guest is support, the request will at least be
64KB.

Although, if the backend doesn't support indirect grant (such as QDISK
in QEMU), a ring request is only able to accomodate 11 segments of 4KB
(i.e 44KB).

The current frontend is assuming that an I/O request will always fit in
a ring request. This is not true any more when using 64KB page
granularity and will therefore crash during the boot.

On ARM64, the ABI is completely neutral to the page granularity used by
the domU. The guest has the choice between different page granularity
supported by the processors (for instance on ARM64: 4KB, 16KB, 64KB).
This can't be enforced by the hypervisor and therefore it's possible to
run guests using different page granularity.

So we can't mandate the block backend to support non-indirect grant
when the frontend is using 64KB page granularity and have to fix it
properly in the frontend.

The solution exposed below is based on modifying directly the frontend
guest rather than asking the block framework to support smaller size
(i.e < PAGE_SIZE). This is because the change is the block framework are
not trivial as everything seems to relying on a struct *page (see [1]).
Although, it may be possible that someone succeed to do it in the future
and we would therefore be able to use advantage.

Given that a block request may not fit in a single ring request, a
second request is introduced for the data that cannot fit in the first
one. This means that the second request should never be used on Linux
configuration using a page granularity < 44KB.

Note that the parameters blk_queue_max_* helpers haven't been updated.
The block code will set mimimum size supported and we may be able  to
support directly any change in the block framework that lower down the
mimimal size of a request.

[1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html

Signed-off-by: Julien Grall <julien.grall@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
---
 drivers/block/xen-blkfront.c | 199 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 183 insertions(+), 16 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index f9d55c3..03772c9 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -60,6 +60,20 @@
 
 #include <asm/xen/hypervisor.h>
 
+/*
+ * The block framework is always working on segment of PAGE_SIZE minimum.
+ * When Linux is using a different page size than xen, it may not be possible
+ * to put all the data in a single segment.
+ * This can happen when the backend doesn't support indirect grant and
+ * therefore the maximum amount of data that a request can carry is
+ * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE = 44KB
+ *
+ * Note that we only support one extra request. So the Linux page size
+ * should be <= ( 2 * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) =
+ * 88KB.
+ */
+#define HAS_EXTRA_REQ (BLKIF_MAX_SEGMENTS_PER_REQUEST < XEN_PFN_PER_PAGE)
+
 enum blkif_state {
 	BLKIF_STATE_DISCONNECTED,
 	BLKIF_STATE_CONNECTED,
@@ -79,6 +93,19 @@ struct blk_shadow {
 	struct grant **indirect_grants;
 	struct scatterlist *sg;
 	unsigned int num_sg;
+	enum
+	{
+		REQ_WAITING,
+		REQ_DONE,
+		REQ_FAIL
+	} status;
+
+	#define NO_ASSOCIATED_ID ~0UL
+	/*
+	 * Id of the sibling if we ever need 2 requests when handling a
+	 * block I/O request
+	 */
+	unsigned long associated_id;
 };
 
 struct split_bio {
@@ -467,6 +494,8 @@ static unsigned long blkif_ring_get_request(struct blkfront_info *info,
 
 	id = get_id_from_freelist(info);
 	info->shadow[id].request = req;
+	info->shadow[id].status = REQ_WAITING;
+	info->shadow[id].associated_id = NO_ASSOCIATED_ID;
 
 	(*ring_req)->u.rw.id = id;
 
@@ -508,6 +537,9 @@ struct setup_rw_req {
 	bool need_copy;
 	unsigned int bvec_off;
 	char *bvec_data;
+
+	bool require_extra_req;
+	struct blkif_request *ring_req2;
 };
 
 static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
@@ -521,8 +553,24 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 	unsigned int grant_idx = setup->grant_idx;
 	struct blkif_request *ring_req = setup->ring_req;
 	struct blkfront_info *info = setup->info;
+	/*
+	 * We always use the shadow of the first request to store the list
+	 * of grant associated to the block I/O request. This made the
+	 * completion more easy to handle even if the block I/O request is
+	 * split.
+	 */
 	struct blk_shadow *shadow = &info->shadow[setup->id];
 
+	if (unlikely(setup->require_extra_req &&
+		     grant_idx >= BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
+		/*
+		 * We are using the second request, setup grant_idx
+		 * to be the index of the segment array
+		 */
+		grant_idx -= BLKIF_MAX_SEGMENTS_PER_REQUEST;
+		ring_req = setup->ring_req2;
+	}
+
 	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
 	    (grant_idx % GRANTS_PER_INDIRECT_FRAME == 0)) {
 		if (setup->segments)
@@ -537,7 +585,11 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 
 	gnt_list_entry = get_grant(&setup->gref_head, gfn, info);
 	ref = gnt_list_entry->gref;
-	shadow->grants_used[grant_idx] = gnt_list_entry;
+	/*
+	 * All the grants are stored in the shadow of the first
+	 * request. Therefore we have to use the global index
+	 */
+	shadow->grants_used[setup->grant_idx] = gnt_list_entry;
 
 	if (setup->need_copy) {
 		void *shared_data;
@@ -579,11 +631,31 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 	(setup->grant_idx)++;
 }
 
+static void blkif_setup_extra_req(struct blkif_request *first,
+				  struct blkif_request *second)
+{
+	uint16_t nr_segments = first->u.rw.nr_segments;
+
+	/*
+	 * The second request is only present when the first request uses
+	 * all its segments. It's always the continuity of the first one
+	 */
+	first->u.rw.nr_segments = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+
+	second->u.rw.nr_segments = nr_segments - BLKIF_MAX_SEGMENTS_PER_REQUEST;
+	second->u.rw.sector_number = first->u.rw.sector_number +
+		(BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) / 512;
+
+	second->u.rw.handle = first->u.rw.handle;
+	second->operation = first->operation;
+}
+
 static int blkif_queue_rw_req(struct request *req)
 {
 	struct blkfront_info *info = req->rq_disk->private_data;
-	struct blkif_request *ring_req;
-	unsigned long id;
+	struct blkif_request *ring_req, *ring_req2 = NULL;
+	unsigned long id, id2 = NO_ASSOCIATED_ID;
+	bool require_extra_req = false;
 	int i;
 	struct setup_rw_req setup = {
 		.grant_idx = 0,
@@ -628,19 +700,19 @@ static int blkif_queue_rw_req(struct request *req)
 	/* Fill out a communications ring structure. */
 	id = blkif_ring_get_request(info, req, &ring_req);
 
-	BUG_ON(info->max_indirect_segments == 0 &&
-	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
-	BUG_ON(info->max_indirect_segments &&
-	       GREFS(req->nr_phys_segments) > info->max_indirect_segments);
-
 	num_sg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
 	num_grant = 0;
 	/* Calculate the number of grant used */
 	for_each_sg(info->shadow[id].sg, sg, num_sg, i)
 	       num_grant += gnttab_count_grant(sg->offset, sg->length);
 
+	require_extra_req = info->max_indirect_segments == 0 &&
+		num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST;
+	BUG_ON(!HAS_EXTRA_REQ && require_extra_req);
+
 	info->shadow[id].num_sg = num_sg;
-	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST &&
+	    likely(!require_extra_req)) {
 		/*
 		 * The indirect operation can only be a BLKIF_OP_READ or
 		 * BLKIF_OP_WRITE
@@ -680,10 +752,29 @@ static int blkif_queue_rw_req(struct request *req)
 			}
 		}
 		ring_req->u.rw.nr_segments = num_grant;
+		if (unlikely(require_extra_req)) {
+			id2 = blkif_ring_get_request(info, req, &ring_req2);
+			/*
+			 * Only the first request contain the scatter-gather
+			 * list.
+			 */
+			info->shadow[id2].num_sg = 0;
+
+			blkif_setup_extra_req(ring_req, ring_req2);
+
+			/* Link the 2 requests together */
+			info->shadow[id2].associated_id = id;
+			info->shadow[id].associated_id = id2;
+		}
 	}
 
 	setup.ring_req = ring_req;
 	setup.id = id;
+
+	setup.require_extra_req = require_extra_req;
+	if (unlikely(require_extra_req))
+		setup.ring_req2 = ring_req2;
+
 	for_each_sg(info->shadow[id].sg, sg, num_sg, i) {
 		BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 
@@ -706,6 +797,8 @@ static int blkif_queue_rw_req(struct request *req)
 
 	/* Keep a private copy so we can reissue requests when recovering. */
 	info->shadow[id].req = *ring_req;
+	if (unlikely(require_extra_req))
+		info->shadow[id2].req = *ring_req2;
 
 	if (new_persistent_gnts)
 		gnttab_free_grant_references(setup.gref_head);
@@ -797,7 +890,15 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 	memset(&info->tag_set, 0, sizeof(info->tag_set));
 	info->tag_set.ops = &blkfront_mq_ops;
 	info->tag_set.nr_hw_queues = 1;
-	info->tag_set.queue_depth =  BLK_RING_SIZE(info);
+	if (HAS_EXTRA_REQ && info->max_indirect_segments == 0) {
+		/* When non-indirect grant is not supported, the I/O request
+		 * will be split between mutiple request in the ring.
+		 * To avoid problem when sending the request, divide by
+		 * 2 the depth of the queue.
+		 */
+		info->tag_set.queue_depth =  BLK_RING_SIZE(info) / 2;
+	} else
+		info->tag_set.queue_depth = BLK_RING_SIZE(info);
 	info->tag_set.numa_node = NUMA_NO_NODE;
 	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
 	info->tag_set.cmd_size = 0;
@@ -1217,19 +1318,67 @@ static void blkif_copy_from_grant(unsigned long gfn, unsigned int offset,
 	kunmap_atomic(shared_data);
 }
 
-static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
+static bool blkif_completion(unsigned long *id, struct blkfront_info *info,
 			     struct blkif_response *bret)
 {
 	int i = 0;
 	struct scatterlist *sg;
 	int num_sg, num_grant;
+	struct blk_shadow *s = &info->shadow[*id];
 	struct copy_from_grant data = {
-		.s = s,
 		.grant_idx = 0,
 	};
 
 	num_grant = s->req.operation == BLKIF_OP_INDIRECT ?
 		s->req.u.indirect.nr_segments : s->req.u.rw.nr_segments;
+
+	/* The I/O request may be split in two */
+	if (unlikely(s->associated_id != NO_ASSOCIATED_ID)) {
+		struct blk_shadow *s2 = &info->shadow[s->associated_id];
+
+		/* Keep the status of the current response in shadow */
+		s->status = (bret->status == BLKIF_RSP_OKAY) ?
+			REQ_DONE : REQ_FAIL;
+
+		/* Wait the second response if not yet here */
+		if (s2->status == REQ_WAITING)
+			return 0;
+
+		/*
+		 * The status of the current response will be used in
+		 * order to know if the request has failed.
+		 * Update the current response status only if has not
+		 * failed.
+		 */
+		if (bret->status == BLKIF_RSP_OKAY && s2->status == REQ_FAIL)
+			bret->status = BLKIF_RSP_ERROR;
+
+		/*
+		 * All the grants is stored in the first shadow in order
+		 * to make the completion code simpler.
+		 */
+		num_grant += s2->req.u.rw.nr_segments;
+
+		/*
+		 * The two responses may not come in order. Only the
+		 * first request will store the scatter-gather list
+		 */
+		if (s2->num_sg != 0) {
+			/* Update "id" with the ID of the first response */
+			*id = s->associated_id;
+			s = s2;
+		}
+
+		/* We don't need anymore the second request, so recycling
+		 * it now
+		 */
+		if (add_id_to_freelist(info, s->associated_id))
+			WARN(1, "%s: can't recycle the second part (id = %ld)"
+			     "of the request\n",
+			     info->gd->disk_name, s->associated_id);
+	}
+
+	data.s = s;
 	num_sg = s->num_sg;
 
 	if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
@@ -1299,6 +1448,8 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 			}
 		}
 	}
+
+	return 1;
 }
 
 static irqreturn_t blkif_interrupt(int irq, void *dev_id)
@@ -1339,8 +1490,14 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		}
 		req  = info->shadow[id].request;
 
-		if (bret->operation != BLKIF_OP_DISCARD)
-			blkif_completion(&info->shadow[id], info, bret);
+		if (bret->operation != BLKIF_OP_DISCARD) {
+			/*
+			 * We may need to wait for an extra response if the
+			 * I/O request is split in 2
+			 */
+			if (!blkif_completion(&id, info, bret))
+				continue;
+		}
 
 		if (add_id_to_freelist(info, id)) {
 			WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
@@ -1863,8 +2020,18 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
 	unsigned int psegs, grants;
 	int err, i;
 
-	if (info->max_indirect_segments == 0)
-		grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+	if (info->max_indirect_segments == 0) {
+		if (!HAS_EXTRA_REQ)
+			grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+		else {
+			/*
+			 * When an extra req is required, the maximum
+			 * grants supported is related to the size of the
+			 * Linux block segment
+			 */
+			grants = GRANTS_PER_PSEG;
+		}
+	}
 	else
 		grants = info->max_indirect_segments;
 	psegs = grants / GRANTS_PER_PSEG;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity
  2015-09-11 19:31 [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity Julien Grall
  2015-09-11 19:31 ` [PATCH 1/2] block/xen-blkfront: Introduce blkif_ring_get_request Julien Grall
  2015-09-11 19:32 ` [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages Julien Grall
@ 2015-09-12  9:46 ` Bob Liu
  2015-09-13 12:06   ` Julien Grall
  2015-10-02  9:32 ` Julien Grall
  3 siblings, 1 reply; 18+ messages in thread
From: Bob Liu @ 2015-09-12  9:46 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, ian.campbell, stefano.stabellini, linux-kernel,
	David Vrabel, Boris Ostrovsky, Roger Pau Monné

Hi Julien,

On 09/12/2015 03:31 AM, Julien Grall wrote:
> Hi all,
> 
> This is a follow-up on the previous discussion [1] related to guest using 64KB
> page granularity not booting with backend using non-indirect grant.
> 
> This has been successly tested on ARM64 with both 64KB and 4KB page granularity
> guests and QEMU as the backend. Indeed QEMU is not supported indirect.
> 

What's the linux kernel page granularity of the backend(dom0) in your test?

Did you test if using xen-blkback as the backend? Especially when linux kernel page
granularity of dom0 is 4kB while domU is 64KB.

Thanks,
-Bob

> For a summary of the previous discussion see patch #2.
> 
> This series is based on top of my 64KB page granularity support [2].
> 
> Comments are welcomed.
> 
> Sincerely yours,
> 
> [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg01659.html
> [2] https://lwn.net/Articles/656797/
> 
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> 
> Julien Grall (2):
>   block/xen-blkfront: Introduce blkif_ring_get_request
>   block/xen-blkfront: Handle non-indirect grant with 64KB pages
> 
>  drivers/block/xen-blkfront.c | 229 ++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 202 insertions(+), 27 deletions(-)
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity
  2015-09-12  9:46 ` [Xen-devel] [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity Bob Liu
@ 2015-09-13 12:06   ` Julien Grall
  2015-09-13 12:44     ` Bob Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Julien Grall @ 2015-09-13 12:06 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, ian.campbell, stefano.stabellini, linux-kernel,
	David Vrabel, Boris Ostrovsky, Roger Pau Monné

On 12/09/2015 10:46, Bob Liu wrote:
> Hi Julien,

Hi Bob,

> On 09/12/2015 03:31 AM, Julien Grall wrote:
>> Hi all,
>>
>> This is a follow-up on the previous discussion [1] related to guest using 64KB
>> page granularity not booting with backend using non-indirect grant.
>>
>> This has been successly tested on ARM64 with both 64KB and 4KB page granularity
>> guests and QEMU as the backend. Indeed QEMU is not supported indirect.
>>
>
> What's the linux kernel page granularity of the backend(dom0) in your test?

The PV protocol and the grant are always based on 4KB page granularity. 
So the backend page granularity doesn't matter.

>
> Did you test if using xen-blkback as the backend? Especially when linux kernel page
> granularity of dom0 is 4kB while domU is 64KB.

Sorry, but I don't understand what you are trying to know with those 2 
questions.

xen-blkback is always supporting indirect grant mapping (AFAICT it's not 
possible for the user to disable it). The issue is only happening when 
the backend is not supporting non-indirect grant (such as QEMU).

What matters while testing this series is having a backend which doesn't 
not support indirect grant. All the code added should never be called if 
the backend is support indirect grant and/or the frontend is using 4KB 
page granularity. There is some safeguard in patch #2 to ensure that 
this extra request is never created in those situations (see the 
definition of HAS_EXTRA_REQ).
That means that there is no difference for x86 and arm32 given that the 
page granularity is always 4KB.

Nonetheless, this patchset has been tested with DOM0 4KB using both QEMU 
and xen-blkback for the backend.

I have done testing on backend using 64KB page but with some patches not 
yet sent upstream. This is because gnttdev doesn't yet support 64KB pages.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity
  2015-09-13 12:06   ` Julien Grall
@ 2015-09-13 12:44     ` Bob Liu
  2015-09-13 17:47       ` Julien Grall
  0 siblings, 1 reply; 18+ messages in thread
From: Bob Liu @ 2015-09-13 12:44 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, ian.campbell, stefano.stabellini, linux-kernel,
	David Vrabel, Boris Ostrovsky, Roger Pau Monné


On 09/13/2015 08:06 PM, Julien Grall wrote:
> 
> 
> On 12/09/2015 10:46, Bob Liu wrote:
>> Hi Julien,
> 
> Hi Bob,
> 
> 
>> On 09/12/2015 03:31 AM, Julien Grall wrote:
>>> Hi all,
>>>
>>> This is a follow-up on the previous discussion [1] related to guest using 64KB
>>> page granularity not booting with backend using non-indirect grant.
>>>
>>> This has been successly tested on ARM64 with both 64KB and 4KB page granularity
>>> guests and QEMU as the backend. Indeed QEMU is not supported indirect.
>>>
>>
>> What's the linux kernel page granularity of the backend(dom0) in your test?
> 
> The PV protocol and the grant are always based on 4KB page granularity. So the backend page granularity doesn't matter.
> 
>>
>> Did you test if using xen-blkback as the backend? Especially when linux kernel page
>> granularity of dom0 is 4kB while domU is 64KB.
> 
> Sorry, but I don't understand what you are trying to know with those 2 questions.
> 
> xen-blkback is always supporting indirect grant mapping (AFAICT it's not possible for the user to disable it). 
> The issue is only happening when the backend is not supporting non-indirect grant (such as QEMU).
> What matters while testing this series is having a backend which doesn't not support indirect grant. 
> All the code added should never be called if the backend is support indirect grant and/or the frontend is using 4KB page granularity.

I may misunderstood here.
But I think same changes are also required even if backend supports indirect grant when frontend is using 64KB page granularity.
Else
1) How to set up the grant map for requests in domU?
The minimum segment buffer size in a request is PAGE_SIZE(64KB) while grant is 4KB based.

2) Codes like below in blkback.c may not work correctly?
if ((req->u.rw.seg[i].last_sect >= (PAGE_SIZE >> 9)) ||

Because PAGE_SIZE in backend is 4KB, while the written value by domU is 64KB based.

Thanks,
-Bob

> There is some safeguard in patch #2 to ensure that this extra request is never created in those situations (see the definition of HAS_EXTRA_REQ).
> That means that there is no difference for x86 and arm32 given that the page granularity is always 4KB.
> 
> Nonetheless, this patchset has been tested with DOM0 4KB using both QEMU and xen-blkback for the backend.
> 
> I have done testing on backend using 64KB page but with some patches not yet sent upstream. This is because gnttdev doesn't yet support 64KB pages.
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity
  2015-09-13 12:44     ` Bob Liu
@ 2015-09-13 17:47       ` Julien Grall
  2015-09-14  0:37         ` Bob Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Julien Grall @ 2015-09-13 17:47 UTC (permalink / raw)
  To: Bob Liu
  Cc: xen-devel, ian.campbell, stefano.stabellini, linux-kernel,
	David Vrabel, Boris Ostrovsky, Roger Pau Monné



On 13/09/2015 13:44, Bob Liu wrote:
> I may misunderstood here.
> But I think same changes are also required even if backend supports indirect grant when frontend is using 64KB page granularity.
> Else
> 1) How to set up the grant map for requests in domU?
> The minimum segment buffer size in a request is PAGE_SIZE(64KB) while grant is 4KB based.
>
> 2) Codes like below in blkback.c may not work correctly?
> if ((req->u.rw.seg[i].last_sect >= (PAGE_SIZE >> 9)) ||
>
> Because PAGE_SIZE in backend is 4KB, while the written value by domU is 64KB based.

As mention in my cover letter, this patch is not self-sufficient to 
support 64KB guest. It's a follow-up of the 64KB page granularity 
support I sent on the ML (the new version was sent earlier this week [1]).

One of the patch [2] is taking care of breaking down the I/O request in 
multiple 4KB segment that will be used in the ring request. You may want 
to give a look to this patch before looking to this series.

Regards,

[1] https://lwn.net/Articles/656797/
[2] http://www.spinics.net/lists/arm-kernel/msg430468.html

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity
  2015-09-13 17:47       ` Julien Grall
@ 2015-09-14  0:37         ` Bob Liu
  0 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-09-14  0:37 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, ian.campbell, stefano.stabellini, linux-kernel,
	David Vrabel, Boris Ostrovsky, Roger Pau Monné



On 09/14/2015 01:47 AM, Julien Grall wrote:
> 
> 
> On 13/09/2015 13:44, Bob Liu wrote:
>> I may misunderstood here.
>> But I think same changes are also required even if backend supports indirect grant when frontend is using 64KB page granularity.
>> Else
>> 1) How to set up the grant map for requests in domU?
>> The minimum segment buffer size in a request is PAGE_SIZE(64KB) while grant is 4KB based.
>>
>> 2) Codes like below in blkback.c may not work correctly?
>> if ((req->u.rw.seg[i].last_sect >= (PAGE_SIZE >> 9)) ||
>>
>> Because PAGE_SIZE in backend is 4KB, while the written value by domU is 64KB based.
> 
> As mention in my cover letter, this patch is not self-sufficient to support 64KB guest. It's a follow-up of the 64KB page granularity support I sent on the ML (the new version was sent earlier this week [1]).
> 
> One of the patch [2] is taking care of breaking down the I/O request in multiple 4KB segment that will be used in the ring request. You may want to give a look to this patch before looking to this series.
> 

Oh, sorry! I'll have a look at those patches.

Thanks,
-Bob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity
  2015-09-11 19:31 [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity Julien Grall
                   ` (2 preceding siblings ...)
  2015-09-12  9:46 ` [Xen-devel] [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity Bob Liu
@ 2015-10-02  9:32 ` Julien Grall
  3 siblings, 0 replies; 18+ messages in thread
From: Julien Grall @ 2015-10-02  9:32 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, stefano.stabellini, linux-kernel, David Vrabel,
	Boris Ostrovsky, Roger Pau Monné

Hi,

Ping, any comment on this series?

Regards,

On 11/09/15 20:31, Julien Grall wrote:
> Hi all,
> 
> This is a follow-up on the previous discussion [1] related to guest using 64KB
> page granularity not booting with backend using non-indirect grant.
> 
> This has been successly tested on ARM64 with both 64KB and 4KB page granularity
> guests and QEMU as the backend. Indeed QEMU is not supported indirect.
> 
> For a summary of the previous discussion see patch #2.
> 
> This series is based on top of my 64KB page granularity support [2].
> 
> Comments are welcomed.
> 
> Sincerely yours,
> 
> [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg01659.html
> [2] https://lwn.net/Articles/656797/
> 
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> 
> Julien Grall (2):
>   block/xen-blkfront: Introduce blkif_ring_get_request
>   block/xen-blkfront: Handle non-indirect grant with 64KB pages
> 
>  drivers/block/xen-blkfront.c | 229 ++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 202 insertions(+), 27 deletions(-)
> 


-- 
Julien Grall

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] block/xen-blkfront: Introduce blkif_ring_get_request
  2015-09-11 19:31 ` [PATCH 1/2] block/xen-blkfront: Introduce blkif_ring_get_request Julien Grall
@ 2015-10-05 15:22   ` Roger Pau Monné
  0 siblings, 0 replies; 18+ messages in thread
From: Roger Pau Monné @ 2015-10-05 15:22 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: ian.campbell, stefano.stabellini, linux-kernel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel

El 11/09/15 a les 21.31, Julien Grall ha escrit:
> The code to get a request is always the same. Therefore we can factorize
> it in a single function.
> 
> Signed-off-by: Julien Grall <julien.grall@citrix.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages
  2015-09-11 19:32 ` [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages Julien Grall
@ 2015-10-05 16:01   ` Roger Pau Monné
  2015-10-05 17:05     ` [Xen-devel] " Julien Grall
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monné @ 2015-10-05 16:01 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: ian.campbell, stefano.stabellini, linux-kernel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel

El 11/09/15 a les 21.32, Julien Grall ha escrit:
> The minimal size of request in the block framework is always PAGE_SIZE.
> It means that when 64KB guest is support, the request will at least be
> 64KB.
> 
> Although, if the backend doesn't support indirect grant (such as QDISK
> in QEMU), a ring request is only able to accomodate 11 segments of 4KB
> (i.e 44KB).
> 
> The current frontend is assuming that an I/O request will always fit in
> a ring request. This is not true any more when using 64KB page
> granularity and will therefore crash during the boot.
                                       ^ during boot.
> 
> On ARM64, the ABI is completely neutral to the page granularity used by
> the domU. The guest has the choice between different page granularity
> supported by the processors (for instance on ARM64: 4KB, 16KB, 64KB).
> This can't be enforced by the hypervisor and therefore it's possible to
> run guests using different page granularity.
> 
> So we can't mandate the block backend to support non-indirect grant
> when the frontend is using 64KB page granularity and have to fix it
> properly in the frontend.
> 
> The solution exposed below is based on modifying directly the frontend
> guest rather than asking the block framework to support smaller size
> (i.e < PAGE_SIZE). This is because the change is the block framework are
> not trivial as everything seems to relying on a struct *page (see [1]).
> Although, it may be possible that someone succeed to do it in the future
> and we would therefore be able to use advantage.
                                       ^ it. (no advantage IMHO)
> 
> Given that a block request may not fit in a single ring request, a
> second request is introduced for the data that cannot fit in the first
> one. This means that the second request should never be used on Linux
> configuration using a page granularity < 44KB.
  ^ if the page size is smaller than 44KB.
> 
> Note that the parameters blk_queue_max_* helpers haven't been updated.
> The block code will set mimimum size supported and we may be able  to
                          ^ the minimum                extra space ^
> support directly any change in the block framework that lower down the
> mimimal size of a request.
  ^ minimal

I have a concern regarding the splitting done in this patch.

What happens with FUA requests when split? For example the frontend
receives a FUA requests with 64KB of data, and this is split into two
different requests on the ring, is this going to cause trouble in the
higher layers if for example the first request is completed but the
second is not? Could we leave the disk in a bad state as a consequence
of this?

> [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html
> 
> Signed-off-by: Julien Grall <julien.grall@citrix.com>
> 
> ---
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> ---
>  drivers/block/xen-blkfront.c | 199 +++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 183 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index f9d55c3..03772c9 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -60,6 +60,20 @@
>  
>  #include <asm/xen/hypervisor.h>
>  
> +/*
> + * The block framework is always working on segment of PAGE_SIZE minimum.

The above sentence needs to be reworded.

> + * When Linux is using a different page size than xen, it may not be possible
> + * to put all the data in a single segment.
> + * This can happen when the backend doesn't support indirect grant and
                                     indirect requests ^
> + * therefore the maximum amount of data that a request can carry is
> + * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE = 44KB
> + *
> + * Note that we only support one extra request. So the Linux page size
> + * should be <= ( 2 * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) =
> + * 88KB.
> + */
> +#define HAS_EXTRA_REQ (BLKIF_MAX_SEGMENTS_PER_REQUEST < XEN_PFN_PER_PAGE)

Since you are already introducing a preprocessor define, I would like
the code added to be protected by it.

>  enum blkif_state {
>  	BLKIF_STATE_DISCONNECTED,
>  	BLKIF_STATE_CONNECTED,
> @@ -79,6 +93,19 @@ struct blk_shadow {
>  	struct grant **indirect_grants;
>  	struct scatterlist *sg;
>  	unsigned int num_sg;

#if HAS_EXTRA_REQ

> +	enum
> +	{
> +		REQ_WAITING,
> +		REQ_DONE,
> +		REQ_FAIL
> +	} status;
> +
> +	#define NO_ASSOCIATED_ID ~0UL
> +	/*
> +	 * Id of the sibling if we ever need 2 requests when handling a
> +	 * block I/O request
> +	 */
> +	unsigned long associated_id;

#endif

>  };
>  
>  struct split_bio {
> @@ -467,6 +494,8 @@ static unsigned long blkif_ring_get_request(struct blkfront_info *info,
>  
>  	id = get_id_from_freelist(info);
>  	info->shadow[id].request = req;
> +	info->shadow[id].status = REQ_WAITING;
> +	info->shadow[id].associated_id = NO_ASSOCIATED_ID;
>  
>  	(*ring_req)->u.rw.id = id;
>  
> @@ -508,6 +537,9 @@ struct setup_rw_req {
>  	bool need_copy;
>  	unsigned int bvec_off;
>  	char *bvec_data;
> +
> +	bool require_extra_req;
> +	struct blkif_request *ring_req2;

extra_ring_req?

>  };
>  
>  static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
> @@ -521,8 +553,24 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>  	unsigned int grant_idx = setup->grant_idx;
>  	struct blkif_request *ring_req = setup->ring_req;
>  	struct blkfront_info *info = setup->info;
> +	/*
> +	 * We always use the shadow of the first request to store the list
> +	 * of grant associated to the block I/O request. This made the
              ^ grants                                        ^ makes
> +	 * completion more easy to handle even if the block I/O request is
> +	 * split.
> +	 */
>  	struct blk_shadow *shadow = &info->shadow[setup->id];
>  
> +	if (unlikely(setup->require_extra_req &&
> +		     grant_idx >= BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
> +		/*
> +		 * We are using the second request, setup grant_idx
> +		 * to be the index of the segment array
> +		 */
> +		grant_idx -= BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +		ring_req = setup->ring_req2;
> +	}
> +
>  	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
>  	    (grant_idx % GRANTS_PER_INDIRECT_FRAME == 0)) {
>  		if (setup->segments)
> @@ -537,7 +585,11 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>  
>  	gnt_list_entry = get_grant(&setup->gref_head, gfn, info);
>  	ref = gnt_list_entry->gref;
> -	shadow->grants_used[grant_idx] = gnt_list_entry;
> +	/*
> +	 * All the grants are stored in the shadow of the first
> +	 * request. Therefore we have to use the global index
> +	 */
> +	shadow->grants_used[setup->grant_idx] = gnt_list_entry;
>  
>  	if (setup->need_copy) {
>  		void *shared_data;
> @@ -579,11 +631,31 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>  	(setup->grant_idx)++;
>  }
>  
> +static void blkif_setup_extra_req(struct blkif_request *first,
> +				  struct blkif_request *second)
> +{
> +	uint16_t nr_segments = first->u.rw.nr_segments;
> +
> +	/*
> +	 * The second request is only present when the first request uses
> +	 * all its segments. It's always the continuity of the first one
> +	 */
> +	first->u.rw.nr_segments = BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +
> +	second->u.rw.nr_segments = nr_segments - BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +	second->u.rw.sector_number = first->u.rw.sector_number +
> +		(BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) / 512;

Does this need to take into account if sectors have been skipped in the
previous requests due to empty data?

Also I'm not sure how correct it is to hardcode 512 here.

> +
> +	second->u.rw.handle = first->u.rw.handle;
> +	second->operation = first->operation;
> +}
> +
>  static int blkif_queue_rw_req(struct request *req)
>  {
>  	struct blkfront_info *info = req->rq_disk->private_data;
> -	struct blkif_request *ring_req;
> -	unsigned long id;
> +	struct blkif_request *ring_req, *ring_req2 = NULL;
> +	unsigned long id, id2 = NO_ASSOCIATED_ID;
> +	bool require_extra_req = false;
>  	int i;
>  	struct setup_rw_req setup = {
>  		.grant_idx = 0,
> @@ -628,19 +700,19 @@ static int blkif_queue_rw_req(struct request *req)
>  	/* Fill out a communications ring structure. */
>  	id = blkif_ring_get_request(info, req, &ring_req);
>  
> -	BUG_ON(info->max_indirect_segments == 0 &&
> -	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
> -	BUG_ON(info->max_indirect_segments &&
> -	       GREFS(req->nr_phys_segments) > info->max_indirect_segments);
> -
>  	num_sg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
>  	num_grant = 0;
>  	/* Calculate the number of grant used */
>  	for_each_sg(info->shadow[id].sg, sg, num_sg, i)
>  	       num_grant += gnttab_count_grant(sg->offset, sg->length);
>  
> +	require_extra_req = info->max_indirect_segments == 0 &&
> +		num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +	BUG_ON(!HAS_EXTRA_REQ && require_extra_req);
> +
>  	info->shadow[id].num_sg = num_sg;
> -	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
> +	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST &&
> +	    likely(!require_extra_req)) {
>  		/*
>  		 * The indirect operation can only be a BLKIF_OP_READ or
>  		 * BLKIF_OP_WRITE
> @@ -680,10 +752,29 @@ static int blkif_queue_rw_req(struct request *req)
>  			}
>  		}
>  		ring_req->u.rw.nr_segments = num_grant;
> +		if (unlikely(require_extra_req)) {
> +			id2 = blkif_ring_get_request(info, req, &ring_req2);

How can you guarantee that there's always going to be another free
request? AFAICT blkif_queue_rq checks for RING_FULL, but you don't
actually know if there's only one slot or more than one available.

> +			/*
> +			 * Only the first request contain the scatter-gather
> +			 * list.
> +			 */
> +			info->shadow[id2].num_sg = 0;
> +
> +			blkif_setup_extra_req(ring_req, ring_req2);
> +
> +			/* Link the 2 requests together */
> +			info->shadow[id2].associated_id = id;
> +			info->shadow[id].associated_id = id2;
> +		}
>  	}
>  
>  	setup.ring_req = ring_req;
>  	setup.id = id;
> +
> +	setup.require_extra_req = require_extra_req;
> +	if (unlikely(require_extra_req))
> +		setup.ring_req2 = ring_req2;
> +
>  	for_each_sg(info->shadow[id].sg, sg, num_sg, i) {
>  		BUG_ON(sg->offset + sg->length > PAGE_SIZE);
>  
> @@ -706,6 +797,8 @@ static int blkif_queue_rw_req(struct request *req)
>  
>  	/* Keep a private copy so we can reissue requests when recovering. */
>  	info->shadow[id].req = *ring_req;
> +	if (unlikely(require_extra_req))
> +		info->shadow[id2].req = *ring_req2;
>  
>  	if (new_persistent_gnts)
>  		gnttab_free_grant_references(setup.gref_head);
> @@ -797,7 +890,15 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
>  	memset(&info->tag_set, 0, sizeof(info->tag_set));
>  	info->tag_set.ops = &blkfront_mq_ops;
>  	info->tag_set.nr_hw_queues = 1;
> -	info->tag_set.queue_depth =  BLK_RING_SIZE(info);
> +	if (HAS_EXTRA_REQ && info->max_indirect_segments == 0) {
> +		/* When non-indirect grant is not supported, the I/O request
> +		 * will be split between mutiple request in the ring.
> +		 * To avoid problem when sending the request, divide by
                            ^ problems
> +		 * 2 the depth of the queue.
> +		 */
> +		info->tag_set.queue_depth =  BLK_RING_SIZE(info) / 2;
> +	} else
> +		info->tag_set.queue_depth = BLK_RING_SIZE(info);
>  	info->tag_set.numa_node = NUMA_NO_NODE;
>  	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
>  	info->tag_set.cmd_size = 0;
> @@ -1217,19 +1318,67 @@ static void blkif_copy_from_grant(unsigned long gfn, unsigned int offset,
>  	kunmap_atomic(shared_data);
>  }
>  
> -static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
> +static bool blkif_completion(unsigned long *id, struct blkfront_info *info,
>  			     struct blkif_response *bret)
>  {
>  	int i = 0;
>  	struct scatterlist *sg;
>  	int num_sg, num_grant;
> +	struct blk_shadow *s = &info->shadow[*id];
>  	struct copy_from_grant data = {
> -		.s = s,
>  		.grant_idx = 0,
>  	};
>  
>  	num_grant = s->req.operation == BLKIF_OP_INDIRECT ?
>  		s->req.u.indirect.nr_segments : s->req.u.rw.nr_segments;
> +
> +	/* The I/O request may be split in two */
> +	if (unlikely(s->associated_id != NO_ASSOCIATED_ID)) {
> +		struct blk_shadow *s2 = &info->shadow[s->associated_id];
> +
> +		/* Keep the status of the current response in shadow */
> +		s->status = (bret->status == BLKIF_RSP_OKAY) ?
> +			REQ_DONE : REQ_FAIL;
> +
> +		/* Wait the second response if not yet here */
> +		if (s2->status == REQ_WAITING)
> +			return 0;
> +
> +		/*
> +		 * The status of the current response will be used in
> +		 * order to know if the request has failed.
> +		 * Update the current response status only if has not
> +		 * failed.
> +		 */
> +		if (bret->status == BLKIF_RSP_OKAY && s2->status == REQ_FAIL)
> +			bret->status = BLKIF_RSP_ERROR;
> +
> +		/*
> +		 * All the grants is stored in the first shadow in order
> +		 * to make the completion code simpler.
> +		 */
> +		num_grant += s2->req.u.rw.nr_segments;
> +
> +		/*
> +		 * The two responses may not come in order. Only the
> +		 * first request will store the scatter-gather list
> +		 */
> +		if (s2->num_sg != 0) {
> +			/* Update "id" with the ID of the first response */
> +			*id = s->associated_id;
> +			s = s2;
> +		}
> +
> +		/* We don't need anymore the second request, so recycling
> +		 * it now
> +		 */
> +		if (add_id_to_freelist(info, s->associated_id))
> +			WARN(1, "%s: can't recycle the second part (id = %ld)"
> +			     "of the request\n",
> +			     info->gd->disk_name, s->associated_id);
> +	}
> +
> +	data.s = s;
>  	num_sg = s->num_sg;
>  
>  	if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
> @@ -1299,6 +1448,8 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>  			}
>  		}
>  	}
> +
> +	return 1;
>  }
>  
>  static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> @@ -1339,8 +1490,14 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  		}
>  		req  = info->shadow[id].request;
>  
> -		if (bret->operation != BLKIF_OP_DISCARD)
> -			blkif_completion(&info->shadow[id], info, bret);
> +		if (bret->operation != BLKIF_OP_DISCARD) {
> +			/*
> +			 * We may need to wait for an extra response if the
> +			 * I/O request is split in 2
> +			 */
> +			if (!blkif_completion(&id, info, bret))
> +				continue;
> +		}
>  
>  		if (add_id_to_freelist(info, id)) {
>  			WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
> @@ -1863,8 +2020,18 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
>  	unsigned int psegs, grants;
>  	int err, i;
>  
> -	if (info->max_indirect_segments == 0)
> -		grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +	if (info->max_indirect_segments == 0) {
> +		if (!HAS_EXTRA_REQ)
> +			grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +		else {
> +			/*
> +			 * When an extra req is required, the maximum
> +			 * grants supported is related to the size of the
> +			 * Linux block segment
> +			 */
> +			grants = GRANTS_PER_PSEG;
> +		}
> +	}
>  	else
>  		grants = info->max_indirect_segments;
>  	psegs = grants / GRANTS_PER_PSEG;
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages
  2015-10-05 16:01   ` Roger Pau Monné
@ 2015-10-05 17:05     ` Julien Grall
  2015-10-06  9:39       ` Roger Pau Monné
  0 siblings, 1 reply; 18+ messages in thread
From: Julien Grall @ 2015-10-05 17:05 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel
  Cc: ian.campbell, stefano.stabellini, linux-kernel, David Vrabel,
	Boris Ostrovsky

On 05/10/15 17:01, Roger Pau Monné wrote:
> El 11/09/15 a les 21.32, Julien Grall ha escrit:
>> The minimal size of request in the block framework is always PAGE_SIZE.
>> It means that when 64KB guest is support, the request will at least be
>> 64KB.
>>
>> Although, if the backend doesn't support indirect grant (such as QDISK
>> in QEMU), a ring request is only able to accomodate 11 segments of 4KB
>> (i.e 44KB).
>>
>> The current frontend is assuming that an I/O request will always fit in
>> a ring request. This is not true any more when using 64KB page
>> granularity and will therefore crash during the boot.
>                                        ^ during boot.
>>
>> On ARM64, the ABI is completely neutral to the page granularity used by
>> the domU. The guest has the choice between different page granularity
>> supported by the processors (for instance on ARM64: 4KB, 16KB, 64KB).
>> This can't be enforced by the hypervisor and therefore it's possible to
>> run guests using different page granularity.
>>
>> So we can't mandate the block backend to support non-indirect grant
>> when the frontend is using 64KB page granularity and have to fix it
>> properly in the frontend.
>>
>> The solution exposed below is based on modifying directly the frontend
>> guest rather than asking the block framework to support smaller size
>> (i.e < PAGE_SIZE). This is because the change is the block framework are
>> not trivial as everything seems to relying on a struct *page (see [1]).
>> Although, it may be possible that someone succeed to do it in the future
>> and we would therefore be able to use advantage.
>                                        ^ it. (no advantage IMHO)
>>
>> Given that a block request may not fit in a single ring request, a
>> second request is introduced for the data that cannot fit in the first
>> one. This means that the second request should never be used on Linux
>> configuration using a page granularity < 44KB.
>   ^ if the page size is smaller than 44KB.
>>
>> Note that the parameters blk_queue_max_* helpers haven't been updated.
>> The block code will set mimimum size supported and we may be able  to
>                           ^ the minimum                extra space ^
>> support directly any change in the block framework that lower down the
>> mimimal size of a request.
>   ^ minimal
> 
> I have a concern regarding the splitting done in this patch.
> 
> What happens with FUA requests when split? For example the frontend
> receives a FUA requests with 64KB of data, and this is split into two
> different requests on the ring, is this going to cause trouble in the
> higher layers if for example the first request is completed but the
> second is not? Could we leave the disk in a bad state as a consequence
> of this?

If a block request is split into two ring requests, we will wait the two
responses before reporting the completion to the higher layers (see
blkif_interrupt and blkif_completion).

Furthermore, the second ring request will always use the same operation
as the first one. Note that you will flush twice which is not nice but
could be improved.

> 
>> [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html
>>
>> Signed-off-by: Julien Grall <julien.grall@citrix.com>
>>
>> ---
>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Cc: David Vrabel <david.vrabel@citrix.com>
>> ---
>>  drivers/block/xen-blkfront.c | 199 +++++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 183 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index f9d55c3..03772c9 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -60,6 +60,20 @@
>>  
>>  #include <asm/xen/hypervisor.h>
>>  
>> +/*
>> + * The block framework is always working on segment of PAGE_SIZE minimum.
> 
> The above sentence needs to be reworded.

What about:

"The mininal size of the segment supported by the block framework is
PAGE_SIZE."

> 
>> + * When Linux is using a different page size than xen, it may not be possible
>> + * to put all the data in a single segment.
>> + * This can happen when the backend doesn't support indirect grant and
>                                      indirect requests ^
>> + * therefore the maximum amount of data that a request can carry is
>> + * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE = 44KB
>> + *
>> + * Note that we only support one extra request. So the Linux page size
>> + * should be <= ( 2 * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) =
>> + * 88KB.
>> + */
>> +#define HAS_EXTRA_REQ (BLKIF_MAX_SEGMENTS_PER_REQUEST < XEN_PFN_PER_PAGE)
> 
> Since you are already introducing a preprocessor define, I would like
> the code added to be protected by it.

Most of the code is protected by an if (HAS_REQ_EXTRA && ...) or
variable setting based on HAS_REQ_EXTRA. Furthermore there is extra
protection with some BUG_ON.

I would rather avoid to use the preprocessor to avoid ending up with:

#ifdef HAS_EXTRA_REQ
/* Code */
#else
/* Code */
#endif

and potentially miss update on the ARM64 with 64KB pages because
currently most of people are developing PV drivers on x86.

>>  enum blkif_state {
>>  	BLKIF_STATE_DISCONNECTED,
>>  	BLKIF_STATE_CONNECTED,
>> @@ -79,6 +93,19 @@ struct blk_shadow {
>>  	struct grant **indirect_grants;
>>  	struct scatterlist *sg;
>>  	unsigned int num_sg;
> 
> #if HAS_EXTRA_REQ
> 
>> +	enum
>> +	{
>> +		REQ_WAITING,
>> +		REQ_DONE,
>> +		REQ_FAIL
>> +	} status;
>> +
>> +	#define NO_ASSOCIATED_ID ~0UL
>> +	/*
>> +	 * Id of the sibling if we ever need 2 requests when handling a
>> +	 * block I/O request
>> +	 */
>> +	unsigned long associated_id;
> 
> #endif

See my remark above.

> 
>>  };
>>  
>>  struct split_bio {
>> @@ -467,6 +494,8 @@ static unsigned long blkif_ring_get_request(struct blkfront_info *info,
>>  
>>  	id = get_id_from_freelist(info);
>>  	info->shadow[id].request = req;
>> +	info->shadow[id].status = REQ_WAITING;
>> +	info->shadow[id].associated_id = NO_ASSOCIATED_ID;
>>  
>>  	(*ring_req)->u.rw.id = id;
>>  
>> @@ -508,6 +537,9 @@ struct setup_rw_req {
>>  	bool need_copy;
>>  	unsigned int bvec_off;
>>  	char *bvec_data;
>> +
>> +	bool require_extra_req;
>> +	struct blkif_request *ring_req2;
> 
> extra_ring_req?

Will do.


>>  };
>>  
>>  static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>> @@ -521,8 +553,24 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>>  	unsigned int grant_idx = setup->grant_idx;
>>  	struct blkif_request *ring_req = setup->ring_req;
>>  	struct blkfront_info *info = setup->info;
>> +	/*
>> +	 * We always use the shadow of the first request to store the list
>> +	 * of grant associated to the block I/O request. This made the
>               ^ grants                                        ^ makes
>> +	 * completion more easy to handle even if the block I/O request is
>> +	 * split.
>> +	 */
>>  	struct blk_shadow *shadow = &info->shadow[setup->id];
>>  
>> +	if (unlikely(setup->require_extra_req &&
>> +		     grant_idx >= BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
>> +		/*
>> +		 * We are using the second request, setup grant_idx
>> +		 * to be the index of the segment array
>> +		 */
>> +		grant_idx -= BLKIF_MAX_SEGMENTS_PER_REQUEST;
>> +		ring_req = setup->ring_req2;
>> +	}
>> +
>>  	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
>>  	    (grant_idx % GRANTS_PER_INDIRECT_FRAME == 0)) {
>>  		if (setup->segments)
>> @@ -537,7 +585,11 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>>  
>>  	gnt_list_entry = get_grant(&setup->gref_head, gfn, info);
>>  	ref = gnt_list_entry->gref;
>> -	shadow->grants_used[grant_idx] = gnt_list_entry;
>> +	/*
>> +	 * All the grants are stored in the shadow of the first
>> +	 * request. Therefore we have to use the global index
>> +	 */
>> +	shadow->grants_used[setup->grant_idx] = gnt_list_entry;
>>  
>>  	if (setup->need_copy) {
>>  		void *shared_data;
>> @@ -579,11 +631,31 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>>  	(setup->grant_idx)++;
>>  }
>>  
>> +static void blkif_setup_extra_req(struct blkif_request *first,
>> +				  struct blkif_request *second)
>> +{
>> +	uint16_t nr_segments = first->u.rw.nr_segments;
>> +
>> +	/*
>> +	 * The second request is only present when the first request uses
>> +	 * all its segments. It's always the continuity of the first one
>> +	 */
>> +	first->u.rw.nr_segments = BLKIF_MAX_SEGMENTS_PER_REQUEST;
>> +
>> +	second->u.rw.nr_segments = nr_segments - BLKIF_MAX_SEGMENTS_PER_REQUEST;
>> +	second->u.rw.sector_number = first->u.rw.sector_number +
>> +		(BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) / 512;
> 
> Does this need to take into account if sectors have been skipped in the
> previous requests due to empty data?

I'm not sure to understand this question.

AFAIU, the data is always contiguous in a block segment even though it
can be split accross multiple Linux page. The number of segments is
calculated from the amount of data see "Calculate the number of grant
used" the code. The second request will only be created if this number
is greater than BLKIF_MAX_SEGMENTS_PER_REQUEST.

So the second request will always be the continuity of the first one.

> Also I'm not sure how correct it is to hardcode 512 here.

Well, we are hardcoding it in blkfront and the block framework does the
same (see blk_limits_max_hw_sectors).

> 
>> +
>> +	second->u.rw.handle = first->u.rw.handle;
>> +	second->operation = first->operation;
>> +}
>> +
>>  static int blkif_queue_rw_req(struct request *req)
>>  {
>>  	struct blkfront_info *info = req->rq_disk->private_data;
>> -	struct blkif_request *ring_req;
>> -	unsigned long id;
>> +	struct blkif_request *ring_req, *ring_req2 = NULL;
>> +	unsigned long id, id2 = NO_ASSOCIATED_ID;
>> +	bool require_extra_req = false;
>>  	int i;
>>  	struct setup_rw_req setup = {
>>  		.grant_idx = 0,
>> @@ -628,19 +700,19 @@ static int blkif_queue_rw_req(struct request *req)
>>  	/* Fill out a communications ring structure. */
>>  	id = blkif_ring_get_request(info, req, &ring_req);
>>  
>> -	BUG_ON(info->max_indirect_segments == 0 &&
>> -	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
>> -	BUG_ON(info->max_indirect_segments &&
>> -	       GREFS(req->nr_phys_segments) > info->max_indirect_segments);
>> -
>>  	num_sg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
>>  	num_grant = 0;
>>  	/* Calculate the number of grant used */
>>  	for_each_sg(info->shadow[id].sg, sg, num_sg, i)
>>  	       num_grant += gnttab_count_grant(sg->offset, sg->length);
>>  
>> +	require_extra_req = info->max_indirect_segments == 0 &&
>> +		num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST;
>> +	BUG_ON(!HAS_EXTRA_REQ && require_extra_req);
>> +
>>  	info->shadow[id].num_sg = num_sg;
>> -	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
>> +	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST &&
>> +	    likely(!require_extra_req)) {
>>  		/*
>>  		 * The indirect operation can only be a BLKIF_OP_READ or
>>  		 * BLKIF_OP_WRITE
>> @@ -680,10 +752,29 @@ static int blkif_queue_rw_req(struct request *req)
>>  			}
>>  		}
>>  		ring_req->u.rw.nr_segments = num_grant;
>> +		if (unlikely(require_extra_req)) {
>> +			id2 = blkif_ring_get_request(info, req, &ring_req2);
> 
> How can you guarantee that there's always going to be another free
> request? AFAICT blkif_queue_rq checks for RING_FULL, but you don't
> actually know if there's only one slot or more than one available.

Because the depth of the queue is divided by 2 when the extra request is
used (see xlvbd_init_blk_queue).

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages
  2015-10-05 17:05     ` [Xen-devel] " Julien Grall
@ 2015-10-06  9:39       ` Roger Pau Monné
  2015-10-06  9:58         ` Julien Grall
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monné @ 2015-10-06  9:39 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: ian.campbell, stefano.stabellini, linux-kernel, David Vrabel,
	Boris Ostrovsky

El 05/10/15 a les 19.05, Julien Grall ha escrit:
> On 05/10/15 17:01, Roger Pau Monné wrote:
>> El 11/09/15 a les 21.32, Julien Grall ha escrit:
>>> The minimal size of request in the block framework is always PAGE_SIZE.
>>> It means that when 64KB guest is support, the request will at least be
>>> 64KB.
>>>
>>> Although, if the backend doesn't support indirect grant (such as QDISK
>>> in QEMU), a ring request is only able to accomodate 11 segments of 4KB
>>> (i.e 44KB).
>>>
>>> The current frontend is assuming that an I/O request will always fit in
>>> a ring request. This is not true any more when using 64KB page
>>> granularity and will therefore crash during the boot.
>>                                        ^ during boot.
>>>
>>> On ARM64, the ABI is completely neutral to the page granularity used by
>>> the domU. The guest has the choice between different page granularity
>>> supported by the processors (for instance on ARM64: 4KB, 16KB, 64KB).
>>> This can't be enforced by the hypervisor and therefore it's possible to
>>> run guests using different page granularity.
>>>
>>> So we can't mandate the block backend to support non-indirect grant
>>> when the frontend is using 64KB page granularity and have to fix it
>>> properly in the frontend.
>>>
>>> The solution exposed below is based on modifying directly the frontend
>>> guest rather than asking the block framework to support smaller size
>>> (i.e < PAGE_SIZE). This is because the change is the block framework are
>>> not trivial as everything seems to relying on a struct *page (see [1]).
>>> Although, it may be possible that someone succeed to do it in the future
>>> and we would therefore be able to use advantage.
>>                                        ^ it. (no advantage IMHO)
>>>
>>> Given that a block request may not fit in a single ring request, a
>>> second request is introduced for the data that cannot fit in the first
>>> one. This means that the second request should never be used on Linux
>>> configuration using a page granularity < 44KB.
>>   ^ if the page size is smaller than 44KB.
>>>
>>> Note that the parameters blk_queue_max_* helpers haven't been updated.
>>> The block code will set mimimum size supported and we may be able  to
>>                           ^ the minimum                extra space ^
>>> support directly any change in the block framework that lower down the
>>> mimimal size of a request.
>>   ^ minimal
>>
>> I have a concern regarding the splitting done in this patch.
>>
>> What happens with FUA requests when split? For example the frontend
>> receives a FUA requests with 64KB of data, and this is split into two
>> different requests on the ring, is this going to cause trouble in the
>> higher layers if for example the first request is completed but the
>> second is not? Could we leave the disk in a bad state as a consequence
>> of this?
> 
> If a block request is split into two ring requests, we will wait the two
> responses before reporting the completion to the higher layers (see
> blkif_interrupt and blkif_completion).
> 
> Furthermore, the second ring request will always use the same operation
> as the first one. Note that you will flush twice which is not nice but
> could be improved.
> 
>>
>>> [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html
>>>
>>> Signed-off-by: Julien Grall <julien.grall@citrix.com>
>>>
>>> ---
>>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
>>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>> Cc: David Vrabel <david.vrabel@citrix.com>
>>> ---
>>>  drivers/block/xen-blkfront.c | 199 +++++++++++++++++++++++++++++++++++++++----
>>>  1 file changed, 183 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>> index f9d55c3..03772c9 100644
>>> --- a/drivers/block/xen-blkfront.c
>>> +++ b/drivers/block/xen-blkfront.c
>>> @@ -60,6 +60,20 @@
>>>  
>>>  #include <asm/xen/hypervisor.h>
>>>  
>>> +/*
>>> + * The block framework is always working on segment of PAGE_SIZE minimum.
>>
>> The above sentence needs to be reworded.
> 
> What about:
> 
> "The mininal size of the segment supported by the block framework is
> PAGE_SIZE."

That sounds fine.

> 
>>
>>> + * When Linux is using a different page size than xen, it may not be possible
>>> + * to put all the data in a single segment.
>>> + * This can happen when the backend doesn't support indirect grant and
>>                                      indirect requests ^
>>> + * therefore the maximum amount of data that a request can carry is
>>> + * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE = 44KB
>>> + *
>>> + * Note that we only support one extra request. So the Linux page size
>>> + * should be <= ( 2 * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) =
>>> + * 88KB.
>>> + */
>>> +#define HAS_EXTRA_REQ (BLKIF_MAX_SEGMENTS_PER_REQUEST < XEN_PFN_PER_PAGE)
>>
>> Since you are already introducing a preprocessor define, I would like
>> the code added to be protected by it.
> 
> Most of the code is protected by an if (HAS_REQ_EXTRA && ...) or
> variable setting based on HAS_REQ_EXTRA. Furthermore there is extra
> protection with some BUG_ON.
> 
> I would rather avoid to use the preprocessor to avoid ending up with:
> 
> #ifdef HAS_EXTRA_REQ
> /* Code */
> #else
> /* Code */
> #endif
> 
> and potentially miss update on the ARM64 with 64KB pages because
> currently most of people are developing PV drivers on x86.

Ok, I was also thinking whether a compile time assert would be fine:

BUILD_BUG_ON(PAGE_SIZE > 2 * BLKIF_MAX_SEGMENTS_PER_REQUEST *
XEN_PAGE_SIZE);

But then if indirect descriptors are available the assert is no longer true.

>>>  enum blkif_state {
>>>  	BLKIF_STATE_DISCONNECTED,
>>>  	BLKIF_STATE_CONNECTED,
>>> @@ -79,6 +93,19 @@ struct blk_shadow {
>>>  	struct grant **indirect_grants;
>>>  	struct scatterlist *sg;
>>>  	unsigned int num_sg;
>>
>> #if HAS_EXTRA_REQ
>>
>>> +	enum
>>> +	{
>>> +		REQ_WAITING,
>>> +		REQ_DONE,
>>> +		REQ_FAIL
>>> +	} status;
>>> +
>>> +	#define NO_ASSOCIATED_ID ~0UL
>>> +	/*
>>> +	 * Id of the sibling if we ever need 2 requests when handling a
>>> +	 * block I/O request
>>> +	 */
>>> +	unsigned long associated_id;
>>
>> #endif
> 
> See my remark above.
> 
>>
>>>  };
>>>  
>>>  struct split_bio {
>>> @@ -467,6 +494,8 @@ static unsigned long blkif_ring_get_request(struct blkfront_info *info,
>>>  
>>>  	id = get_id_from_freelist(info);
>>>  	info->shadow[id].request = req;
>>> +	info->shadow[id].status = REQ_WAITING;
>>> +	info->shadow[id].associated_id = NO_ASSOCIATED_ID;
>>>  
>>>  	(*ring_req)->u.rw.id = id;
>>>  
>>> @@ -508,6 +537,9 @@ struct setup_rw_req {
>>>  	bool need_copy;
>>>  	unsigned int bvec_off;
>>>  	char *bvec_data;
>>> +
>>> +	bool require_extra_req;
>>> +	struct blkif_request *ring_req2;
>>
>> extra_ring_req?
> 
> Will do.
> 
> 
>>>  };
>>>  
>>>  static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>>> @@ -521,8 +553,24 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>>>  	unsigned int grant_idx = setup->grant_idx;
>>>  	struct blkif_request *ring_req = setup->ring_req;
>>>  	struct blkfront_info *info = setup->info;
>>> +	/*
>>> +	 * We always use the shadow of the first request to store the list
>>> +	 * of grant associated to the block I/O request. This made the
>>               ^ grants                                        ^ makes
>>> +	 * completion more easy to handle even if the block I/O request is
>>> +	 * split.
>>> +	 */
>>>  	struct blk_shadow *shadow = &info->shadow[setup->id];
>>>  
>>> +	if (unlikely(setup->require_extra_req &&
>>> +		     grant_idx >= BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
>>> +		/*
>>> +		 * We are using the second request, setup grant_idx
>>> +		 * to be the index of the segment array
>>> +		 */
>>> +		grant_idx -= BLKIF_MAX_SEGMENTS_PER_REQUEST;
>>> +		ring_req = setup->ring_req2;
>>> +	}
>>> +
>>>  	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
>>>  	    (grant_idx % GRANTS_PER_INDIRECT_FRAME == 0)) {
>>>  		if (setup->segments)
>>> @@ -537,7 +585,11 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>>>  
>>>  	gnt_list_entry = get_grant(&setup->gref_head, gfn, info);
>>>  	ref = gnt_list_entry->gref;
>>> -	shadow->grants_used[grant_idx] = gnt_list_entry;
>>> +	/*
>>> +	 * All the grants are stored in the shadow of the first
>>> +	 * request. Therefore we have to use the global index
>>> +	 */
>>> +	shadow->grants_used[setup->grant_idx] = gnt_list_entry;
>>>  
>>>  	if (setup->need_copy) {
>>>  		void *shared_data;
>>> @@ -579,11 +631,31 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>>>  	(setup->grant_idx)++;
>>>  }
>>>  
>>> +static void blkif_setup_extra_req(struct blkif_request *first,
>>> +				  struct blkif_request *second)
>>> +{
>>> +	uint16_t nr_segments = first->u.rw.nr_segments;
>>> +
>>> +	/*
>>> +	 * The second request is only present when the first request uses
>>> +	 * all its segments. It's always the continuity of the first one
>>> +	 */
>>> +	first->u.rw.nr_segments = BLKIF_MAX_SEGMENTS_PER_REQUEST;
>>> +
>>> +	second->u.rw.nr_segments = nr_segments - BLKIF_MAX_SEGMENTS_PER_REQUEST;
>>> +	second->u.rw.sector_number = first->u.rw.sector_number +
>>> +		(BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) / 512;
>>
>> Does this need to take into account if sectors have been skipped in the
>> previous requests due to empty data?
> 
> I'm not sure to understand this question.
> 
> AFAIU, the data is always contiguous in a block segment even though it
> can be split accross multiple Linux page. The number of segments is
> calculated from the amount of data see "Calculate the number of grant
> used" the code. The second request will only be created if this number
> is greater than BLKIF_MAX_SEGMENTS_PER_REQUEST.
> 
> So the second request will always be the continuity of the first one.

Ack.

>> Also I'm not sure how correct it is to hardcode 512 here.
> 
> Well, we are hardcoding it in blkfront and the block framework does the
> same (see blk_limits_max_hw_sectors).

Right, seems like the whole block system is always working with 512 as
the basic unit.

>>
>>> +
>>> +	second->u.rw.handle = first->u.rw.handle;
>>> +	second->operation = first->operation;
>>> +}
>>> +
>>>  static int blkif_queue_rw_req(struct request *req)
>>>  {
>>>  	struct blkfront_info *info = req->rq_disk->private_data;
>>> -	struct blkif_request *ring_req;
>>> -	unsigned long id;
>>> +	struct blkif_request *ring_req, *ring_req2 = NULL;
>>> +	unsigned long id, id2 = NO_ASSOCIATED_ID;
>>> +	bool require_extra_req = false;
>>>  	int i;
>>>  	struct setup_rw_req setup = {
>>>  		.grant_idx = 0,
>>> @@ -628,19 +700,19 @@ static int blkif_queue_rw_req(struct request *req)
>>>  	/* Fill out a communications ring structure. */
>>>  	id = blkif_ring_get_request(info, req, &ring_req);
>>>  
>>> -	BUG_ON(info->max_indirect_segments == 0 &&
>>> -	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
>>> -	BUG_ON(info->max_indirect_segments &&
>>> -	       GREFS(req->nr_phys_segments) > info->max_indirect_segments);
>>> -
>>>  	num_sg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
>>>  	num_grant = 0;
>>>  	/* Calculate the number of grant used */
>>>  	for_each_sg(info->shadow[id].sg, sg, num_sg, i)
>>>  	       num_grant += gnttab_count_grant(sg->offset, sg->length);
>>>  
>>> +	require_extra_req = info->max_indirect_segments == 0 &&
>>> +		num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST;
>>> +	BUG_ON(!HAS_EXTRA_REQ && require_extra_req);
>>> +
>>>  	info->shadow[id].num_sg = num_sg;
>>> -	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
>>> +	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST &&
>>> +	    likely(!require_extra_req)) {
>>>  		/*
>>>  		 * The indirect operation can only be a BLKIF_OP_READ or
>>>  		 * BLKIF_OP_WRITE
>>> @@ -680,10 +752,29 @@ static int blkif_queue_rw_req(struct request *req)
>>>  			}
>>>  		}
>>>  		ring_req->u.rw.nr_segments = num_grant;
>>> +		if (unlikely(require_extra_req)) {
>>> +			id2 = blkif_ring_get_request(info, req, &ring_req2);
>>
>> How can you guarantee that there's always going to be another free
>> request? AFAICT blkif_queue_rq checks for RING_FULL, but you don't
>> actually know if there's only one slot or more than one available.
> 
> Because the depth of the queue is divided by 2 when the extra request is
> used (see xlvbd_init_blk_queue).

I see, that's quite restrictive but I guess it's better than introducing
a new ring macro in order to figure out if there are at least two free
slots.

Roger.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages
  2015-10-06  9:39       ` Roger Pau Monné
@ 2015-10-06  9:58         ` Julien Grall
  2015-10-06 10:06           ` Roger Pau Monné
  0 siblings, 1 reply; 18+ messages in thread
From: Julien Grall @ 2015-10-06  9:58 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel
  Cc: ian.campbell, stefano.stabellini, linux-kernel, David Vrabel,
	Boris Ostrovsky

Hi Roger,

On 06/10/2015 10:39, Roger Pau Monné wrote:
> El 05/10/15 a les 19.05, Julien Grall ha escrit:
>> On 05/10/15 17:01, Roger Pau Monné wrote:
>>> El 11/09/15 a les 21.32, Julien Grall ha escrit:
>>>>   		ring_req->u.rw.nr_segments = num_grant;
>>>> +		if (unlikely(require_extra_req)) {
>>>> +			id2 = blkif_ring_get_request(info, req, &ring_req2);
>>>
>>> How can you guarantee that there's always going to be another free
>>> request? AFAICT blkif_queue_rq checks for RING_FULL, but you don't
>>> actually know if there's only one slot or more than one available.
>>
>> Because the depth of the queue is divided by 2 when the extra request is
>> used (see xlvbd_init_blk_queue).

I just noticed that I didn't mention this restriction in the commit 
message. I will do it in the next revision.

> I see, that's quite restrictive but I guess it's better than introducing
> a new ring macro in order to figure out if there are at least two free
> slots.

I actually didn't think about your suggestion. I choose to divide by two 
based on the assumption that the block framework will always try to send 
a request with the maximum data possible.

I don't know if this assumption is correct as I'm not fully aware how 
the block framework is working.

If it's valid, in the case of 64KB guest, the maximum size of a request 
would be 64KB when indirect segment is not supported. So we would end up 
with a lot of 64KB request which will require 2 ring request.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages
  2015-10-06  9:58         ` Julien Grall
@ 2015-10-06 10:06           ` Roger Pau Monné
  2015-10-12 18:00             ` Julien Grall
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monné @ 2015-10-06 10:06 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: ian.campbell, stefano.stabellini, linux-kernel, David Vrabel,
	Boris Ostrovsky

El 06/10/15 a les 11.58, Julien Grall ha escrit:
> Hi Roger,
> 
> On 06/10/2015 10:39, Roger Pau Monné wrote:
>> El 05/10/15 a les 19.05, Julien Grall ha escrit:
>>> On 05/10/15 17:01, Roger Pau Monné wrote:
>>>> El 11/09/15 a les 21.32, Julien Grall ha escrit:
>>>>>           ring_req->u.rw.nr_segments = num_grant;
>>>>> +        if (unlikely(require_extra_req)) {
>>>>> +            id2 = blkif_ring_get_request(info, req, &ring_req2);
>>>>
>>>> How can you guarantee that there's always going to be another free
>>>> request? AFAICT blkif_queue_rq checks for RING_FULL, but you don't
>>>> actually know if there's only one slot or more than one available.
>>>
>>> Because the depth of the queue is divided by 2 when the extra request is
>>> used (see xlvbd_init_blk_queue).
> 
> I just noticed that I didn't mention this restriction in the commit
> message. I will do it in the next revision.
> 
>> I see, that's quite restrictive but I guess it's better than introducing
>> a new ring macro in order to figure out if there are at least two free
>> slots.
> 
> I actually didn't think about your suggestion. I choose to divide by two
> based on the assumption that the block framework will always try to send
> a request with the maximum data possible.

AFAIK that depends on the request itself, the block layer will try to
merge requests if possible, but you can also expect that there are going
to be requests that will just contain a single block.

> I don't know if this assumption is correct as I'm not fully aware how
> the block framework is working.
> 
> If it's valid, in the case of 64KB guest, the maximum size of a request
> would be 64KB when indirect segment is not supported. So we would end up
> with a lot of 64KB request which will require 2 ring request.

I guess you could add a counter in order to see how many requests were
split vs total number of requests.

Roger.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages
  2015-10-06 10:06           ` Roger Pau Monné
@ 2015-10-12 18:00             ` Julien Grall
  2015-10-19 11:16               ` Roger Pau Monné
  0 siblings, 1 reply; 18+ messages in thread
From: Julien Grall @ 2015-10-12 18:00 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel
  Cc: linux-kernel, Boris Ostrovsky, David Vrabel, ian.campbell,
	stefano.stabellini

On 06/10/15 11:06, Roger Pau Monné wrote:
> El 06/10/15 a les 11.58, Julien Grall ha escrit:
>> Hi Roger,
>>
>> On 06/10/2015 10:39, Roger Pau Monné wrote:
>>> El 05/10/15 a les 19.05, Julien Grall ha escrit:
>>>> On 05/10/15 17:01, Roger Pau Monné wrote:
>>>>> El 11/09/15 a les 21.32, Julien Grall ha escrit:
>>>>>>           ring_req->u.rw.nr_segments = num_grant;
>>>>>> +        if (unlikely(require_extra_req)) {
>>>>>> +            id2 = blkif_ring_get_request(info, req, &ring_req2);
>>>>>
>>>>> How can you guarantee that there's always going to be another free
>>>>> request? AFAICT blkif_queue_rq checks for RING_FULL, but you don't
>>>>> actually know if there's only one slot or more than one available.
>>>>
>>>> Because the depth of the queue is divided by 2 when the extra request is
>>>> used (see xlvbd_init_blk_queue).
>>
>> I just noticed that I didn't mention this restriction in the commit
>> message. I will do it in the next revision.
>>
>>> I see, that's quite restrictive but I guess it's better than introducing
>>> a new ring macro in order to figure out if there are at least two free
>>> slots.
>>
>> I actually didn't think about your suggestion. I choose to divide by two
>> based on the assumption that the block framework will always try to send
>> a request with the maximum data possible.
> 
> AFAIK that depends on the request itself, the block layer will try to
> merge requests if possible, but you can also expect that there are going
> to be requests that will just contain a single block.
> 
>> I don't know if this assumption is correct as I'm not fully aware how
>> the block framework is working.
>>
>> If it's valid, in the case of 64KB guest, the maximum size of a request
>> would be 64KB when indirect segment is not supported. So we would end up
>> with a lot of 64KB request which will require 2 ring request.
> 
> I guess you could add a counter in order to see how many requests were
> split vs total number of requests.

So the number of 64KB request is fairly small compare to the total
number of request (277 for 4687 request) for general usage (i.e cd, find).

Although as soon as I use dd, the block request will be merged. So I
guess a common usage will not provide enough data to fill a 64KB request.

Although as soon as I use dd with a block size of 64KB, most of the
request fill 64KB and an extra request is required.

Note that I had to implement quickly xen_biovec_phys_mergeable for 64KB
page as I left this aside. Without it, the biovec won't be merge except
with dd if you specific the block size (bs=64KB).

I've also looked to the code to see if it's possible to check if there
is 2 ring requests free and if not waiting until they are available.

Currently, we don't need to check if a request if free because the queue
is sized according to the number of request supported by the ring. This
means that the block layer is handling the check and we will always have
space in the ring.

If we decide to avoid dividing the number of request enqueue by the
block layer, we would have to handle ourself if there is 2 ring requests
free.
AFAICT, when BLK_MQ_RQ_BUSY is returned the block layer will stop the
queue. So we need to have some logic in blkfront to know when the 2 ring
requests become free and restart the queue. I guest it would be similar
to gnttab_request_free_callback.

I'd like your advice to know whether this is worth to implement it in
blockfront given that it will be only used for 64KB guest with backend
not supporting indirect grant.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages
  2015-10-12 18:00             ` Julien Grall
@ 2015-10-19 11:16               ` Roger Pau Monné
  2015-10-19 11:36                 ` Julien Grall
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monné @ 2015-10-19 11:16 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: linux-kernel, Boris Ostrovsky, David Vrabel, ian.campbell,
	stefano.stabellini

El 12/10/15 a les 20.00, Julien Grall ha escrit:
> On 06/10/15 11:06, Roger Pau Monné wrote:
>> El 06/10/15 a les 11.58, Julien Grall ha escrit:
>>> Hi Roger,
>>>
>>> On 06/10/2015 10:39, Roger Pau Monné wrote:
>>>> El 05/10/15 a les 19.05, Julien Grall ha escrit:
>>>>> On 05/10/15 17:01, Roger Pau Monné wrote:
>>>>>> El 11/09/15 a les 21.32, Julien Grall ha escrit:
>>>>>>>           ring_req->u.rw.nr_segments = num_grant;
>>>>>>> +        if (unlikely(require_extra_req)) {
>>>>>>> +            id2 = blkif_ring_get_request(info, req, &ring_req2);
>>>>>>
>>>>>> How can you guarantee that there's always going to be another free
>>>>>> request? AFAICT blkif_queue_rq checks for RING_FULL, but you don't
>>>>>> actually know if there's only one slot or more than one available.
>>>>>
>>>>> Because the depth of the queue is divided by 2 when the extra request is
>>>>> used (see xlvbd_init_blk_queue).
>>>
>>> I just noticed that I didn't mention this restriction in the commit
>>> message. I will do it in the next revision.
>>>
>>>> I see, that's quite restrictive but I guess it's better than introducing
>>>> a new ring macro in order to figure out if there are at least two free
>>>> slots.
>>>
>>> I actually didn't think about your suggestion. I choose to divide by two
>>> based on the assumption that the block framework will always try to send
>>> a request with the maximum data possible.
>>
>> AFAIK that depends on the request itself, the block layer will try to
>> merge requests if possible, but you can also expect that there are going
>> to be requests that will just contain a single block.
>>
>>> I don't know if this assumption is correct as I'm not fully aware how
>>> the block framework is working.
>>>
>>> If it's valid, in the case of 64KB guest, the maximum size of a request
>>> would be 64KB when indirect segment is not supported. So we would end up
>>> with a lot of 64KB request which will require 2 ring request.
>>
>> I guess you could add a counter in order to see how many requests were
>> split vs total number of requests.
> 
> So the number of 64KB request is fairly small compare to the total
> number of request (277 for 4687 request) for general usage (i.e cd, find).
> 
> Although as soon as I use dd, the block request will be merged. So I
> guess a common usage will not provide enough data to fill a 64KB request.
> 
> Although as soon as I use dd with a block size of 64KB, most of the
> request fill 64KB and an extra request is required.
> 
> Note that I had to implement quickly xen_biovec_phys_mergeable for 64KB
> page as I left this aside. Without it, the biovec won't be merge except
> with dd if you specific the block size (bs=64KB).
> 
> I've also looked to the code to see if it's possible to check if there
> is 2 ring requests free and if not waiting until they are available.
> 
> Currently, we don't need to check if a request if free because the queue
> is sized according to the number of request supported by the ring. This
> means that the block layer is handling the check and we will always have
> space in the ring.
> 
> If we decide to avoid dividing the number of request enqueue by the
> block layer, we would have to handle ourself if there is 2 ring requests
> free.
> AFAICT, when BLK_MQ_RQ_BUSY is returned the block layer will stop the
> queue. So we need to have some logic in blkfront to know when the 2 ring
> requests become free and restart the queue. I guest it would be similar
> to gnttab_request_free_callback.
> 
> I'd like your advice to know whether this is worth to implement it in
> blockfront given that it will be only used for 64KB guest with backend
> not supporting indirect grant.

At this point I don't think it's worth implementing it, if you feel like
doing that later in order to improve performance that would be fine, but
I don't think it should be required in order to get this merged.

I think you had to resend the patch anyway to fix some comments, but
apart from that:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Roger.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages
  2015-10-19 11:16               ` Roger Pau Monné
@ 2015-10-19 11:36                 ` Julien Grall
  0 siblings, 0 replies; 18+ messages in thread
From: Julien Grall @ 2015-10-19 11:36 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel
  Cc: Boris Ostrovsky, stefano.stabellini, linux-kernel, ian.campbell,
	David Vrabel

On 19/10/15 12:16, Roger Pau Monné wrote:
> At this point I don't think it's worth implementing it, if you feel like
> doing that later in order to improve performance that would be fine, but
> I don't think it should be required in order to get this merged.

I would rather avoid to improve performance in the frontend and
encourage people to implement indirect descriptor in their backend.

It would be more beneficial for any block frontend (x86, arm, 4K pages,
64K pages...).

> I think you had to resend the patch anyway to fix some comments, but
> apart from that:

Note that I'm planning to update the commit message to summarize my
previous mail.

> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thank you, I will try to resend a new version today.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-10-19 11:37 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-11 19:31 [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity Julien Grall
2015-09-11 19:31 ` [PATCH 1/2] block/xen-blkfront: Introduce blkif_ring_get_request Julien Grall
2015-10-05 15:22   ` Roger Pau Monné
2015-09-11 19:32 ` [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages Julien Grall
2015-10-05 16:01   ` Roger Pau Monné
2015-10-05 17:05     ` [Xen-devel] " Julien Grall
2015-10-06  9:39       ` Roger Pau Monné
2015-10-06  9:58         ` Julien Grall
2015-10-06 10:06           ` Roger Pau Monné
2015-10-12 18:00             ` Julien Grall
2015-10-19 11:16               ` Roger Pau Monné
2015-10-19 11:36                 ` Julien Grall
2015-09-12  9:46 ` [Xen-devel] [PATCH 0/2] block/xen-blkfront: Support non-indirect with 64KB page granularity Bob Liu
2015-09-13 12:06   ` Julien Grall
2015-09-13 12:44     ` Bob Liu
2015-09-13 17:47       ` Julien Grall
2015-09-14  0:37         ` Bob Liu
2015-10-02  9:32 ` Julien Grall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).