* [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
@ 2015-11-18 18:57 Julien Grall
  2015-11-18 18:57 ` [PATCH v3 1/2] block/xen-blkfront: Introduce blkif_ring_get_request Julien Grall
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Julien Grall @ 2015-11-18 18:57 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: Julien Grall, Konrad Rzeszutek Wilk, Roger Pau Monné,
	Boris Ostrovsky, David Vrabel, Bob Liu

Hi all,

This is a follow-up to the previous discussion [1] about guests using 64KB
page granularity, which don't boot when the backend doesn't support indirect
descriptors.

This has been successfully tested on ARM64 with both 64KB and 4KB page
granularity guests, using QEMU as the backend. Indeed, QEMU doesn't support
indirect descriptors.

This series is based on xentip/for-linus-4.4, which includes the support for
64KB Linux guests.

For the full list of changes, see each patch.

Sincerely yours,

[1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg01659.html

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Bob Liu <bob.liu@oracle.com>

Julien Grall (2):
  block/xen-blkfront: Introduce blkif_ring_get_request
  block/xen-blkfront: Handle non-indirect grant with 64KB pages

 drivers/block/xen-blkfront.c | 258 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 231 insertions(+), 27 deletions(-)

-- 
2.1.4


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 1/2] block/xen-blkfront: Introduce blkif_ring_get_request
  2015-11-18 18:57 [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity Julien Grall
@ 2015-11-18 18:57 ` Julien Grall
  2015-11-18 18:57 ` Julien Grall
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2015-11-18 18:57 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: Julien Grall, Roger Pau Monné,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel, Bob Liu

The code to get a request is always the same. Therefore we can factor it
out into a single function.
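
Concretely, the repeated pattern below (copied from the diff that follows,
shown here only for readability) is replaced by a single call:

    ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
    id = get_id_from_freelist(info);
    info->shadow[id].request = req;

    /* ... becomes ... */
    id = blkif_ring_get_request(info, req, &ring_req);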

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Bob Liu <bob.liu@oracle.com>

    Changes in v2:
        - Add Royger's acked-by
---
 drivers/block/xen-blkfront.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 2fee2ee..2248a47 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -456,6 +456,23 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
 	return 0;
 }
 
+static unsigned long blkif_ring_get_request(struct blkfront_info *info,
+					    struct request *req,
+					    struct blkif_request **ring_req)
+{
+	unsigned long id;
+
+	*ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
+	info->ring.req_prod_pvt++;
+
+	id = get_id_from_freelist(info);
+	info->shadow[id].request = req;
+
+	(*ring_req)->u.rw.id = id;
+
+	return id;
+}
+
 static int blkif_queue_discard_req(struct request *req)
 {
 	struct blkfront_info *info = req->rq_disk->private_data;
@@ -463,9 +480,7 @@ static int blkif_queue_discard_req(struct request *req)
 	unsigned long id;
 
 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
-	id = get_id_from_freelist(info);
-	info->shadow[id].request = req;
+	id = blkif_ring_get_request(info, req, &ring_req);
 
 	ring_req->operation = BLKIF_OP_DISCARD;
 	ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
@@ -476,8 +491,6 @@ static int blkif_queue_discard_req(struct request *req)
 	else
 		ring_req->u.discard.flag = 0;
 
-	info->ring.req_prod_pvt++;
-
 	/* Keep a private copy so we can reissue requests when recovering. */
 	info->shadow[id].req = *ring_req;
 
@@ -613,9 +626,7 @@ static int blkif_queue_rw_req(struct request *req)
 		new_persistent_gnts = 0;
 
 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
-	id = get_id_from_freelist(info);
-	info->shadow[id].request = req;
+	id = blkif_ring_get_request(info, req, &ring_req);
 
 	BUG_ON(info->max_indirect_segments == 0 &&
 	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
@@ -628,7 +639,6 @@ static int blkif_queue_rw_req(struct request *req)
 	for_each_sg(info->shadow[id].sg, sg, num_sg, i)
 	       num_grant += gnttab_count_grant(sg->offset, sg->length);
 
-	ring_req->u.rw.id = id;
 	info->shadow[id].num_sg = num_sg;
 	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
 		/*
@@ -694,8 +704,6 @@ static int blkif_queue_rw_req(struct request *req)
 	if (setup.segments)
 		kunmap_atomic(setup.segments);
 
-	info->ring.req_prod_pvt++;
-
 	/* Keep a private copy so we can reissue requests when recovering. */
 	info->shadow[id].req = *ring_req;
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages
  2015-11-18 18:57 [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity Julien Grall
                   ` (2 preceding siblings ...)
  2015-11-18 18:57 ` [PATCH v3 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages Julien Grall
@ 2015-11-18 18:57 ` Julien Grall
  2015-12-01 15:30   ` Roger Pau Monné
  2015-12-01 15:30   ` Roger Pau Monné
  2015-11-30 16:16 ` [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity Julien Grall
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Julien Grall @ 2015-11-18 18:57 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: Julien Grall, Konrad Rzeszutek Wilk, Roger Pau Monné,
	Boris Ostrovsky, David Vrabel, Bob Liu

The minimal size of a request in the block framework is always PAGE_SIZE.
It means that when a 64KB guest is supported, a request will be at least
64KB.

However, if the backend doesn't support indirect descriptors (such as QDISK
in QEMU), a ring request is only able to accommodate 11 segments of 4KB
(i.e. 44KB).
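
To make the mismatch concrete, the arithmetic (only restating the constants
above) is:

    XEN_PAGE_SIZE                  = 4KB
    BLKIF_MAX_SEGMENTS_PER_REQUEST = 11
    max data per ring request      = 11 * 4KB = 44KB
    one 64KB block request         = 16 grants of 4KB

so a single 64KB request needs more grants than one non-indirect ring
request can carry.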

The current frontend assumes that an I/O request will always fit in
a ring request. This is no longer true when using 64KB page
granularity, and the frontend will therefore crash during boot.

On ARM64, the ABI is completely neutral to the page granularity used by
the domU. The guest can choose between the different page granularities
supported by the processor (for instance on ARM64: 4KB, 16KB, 64KB).
This can't be enforced by the hypervisor, and it's therefore possible to
run guests using different page granularities.

So we can't mandate that the block backend support indirect descriptors
when the frontend is using 64KB page granularity, and we have to fix it
properly in the frontend.

The solution exposed below is based on directly modifying the frontend
rather than asking the block framework to support a smaller size
(i.e. < PAGE_SIZE). This is because the changes in the block framework are
not trivial, as everything seems to rely on a struct page (see [1]).
However, someone may succeed in doing it in the future and we would then
be able to use it.

Given that a block request may not fit in a single ring request, a
second request is introduced for the data that cannot fit in the first
one. This means that the second ring request should never be used on
Linux if the page size is smaller than 44KB.
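
As a minimal sketch (not part of the patch, and ignoring the segment offsets
that gnttab_count_grant() takes into account), the number of non-indirect
ring requests needed by a block request follows from the constants already
used by blkfront:

    /* Sketch only; DIV_ROUND_UP() comes from linux/kernel.h. */
    static unsigned int nr_ring_reqs(unsigned int request_bytes)
    {
        /* One grant per 4KB of data (XEN_PAGE_SIZE). */
        unsigned int num_grant = DIV_ROUND_UP(request_bytes, XEN_PAGE_SIZE);

        /* 64KB -> 16 grants -> 11 in the first request + 5 in the extra one. */
        return DIV_ROUND_UP(num_grant, BLKIF_MAX_SEGMENTS_PER_REQUEST);
    }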

To support the extra ring request, the block queue depth is divided by
two. Therefore, the ring will always contain enough space to accommodate 2
ring requests per block request. While this will reduce the overall
performance, it will make the implementation more contained. The way
forward to get better performance is to implement in the backend either
indirect descriptors or a ring using multiple grants.
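
Put differently (just restating the guarantee above):

    queue_depth         = BLK_RING_SIZE(info) / 2
    worst-case ring use = queue_depth * 2 ring requests <= BLK_RING_SIZE(info)

so even if every in-flight block request needs the extra ring request, the
ring cannot overflow.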

Note that the blk_queue_max_* helper parameters haven't been updated.
The block code will set the minimum size supported and we may be able
to directly support any change in the block framework that lowers
the minimal size of a request.

[1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html

Signed-off-by: Julien Grall <julien.grall@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Bob Liu <bob.liu@oracle.com>

    Roger, I haven't kept your acked-by because of the update to the
    final status computation.

    Changes in v3:
        - Typos
        - Change the way the final status is computed to handle the
        possibility of the request being handled by different backends
        (could happen during suspend/resume).

    Changes in v2:
        - Update the commit message
        - Typos
        - Rename ring_req2/id2 to extra_ring_req/extra_id
---
 drivers/block/xen-blkfront.c | 228 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 212 insertions(+), 16 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 2248a47..33f6ff4 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -60,6 +60,20 @@
 
 #include <asm/xen/hypervisor.h>
 
+/*
+ * The minimal size of a segment supported by the block framework is PAGE_SIZE.
+ * When Linux is using a different page size than Xen, it may not be possible
+ * to put all the data in a single segment.
+ * This can happen when the backend doesn't support indirect descriptors and
+ * therefore the maximum amount of data that a request can carry is
+ * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE = 44KB.
+ *
+ * Note that we only support one extra request. So the Linux page size
+ * should be <= (2 * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) =
+ * 88KB.
+ */
+#define HAS_EXTRA_REQ (BLKIF_MAX_SEGMENTS_PER_REQUEST < XEN_PFN_PER_PAGE)
+
 enum blkif_state {
 	BLKIF_STATE_DISCONNECTED,
 	BLKIF_STATE_CONNECTED,
@@ -72,6 +86,13 @@ struct grant {
 	struct list_head node;
 };
 
+enum blk_req_status {
+	REQ_WAITING,
+	REQ_DONE,
+	REQ_ERROR,
+	REQ_EOPNOTSUPP,
+};
+
 struct blk_shadow {
 	struct blkif_request req;
 	struct request *request;
@@ -79,6 +100,14 @@ struct blk_shadow {
 	struct grant **indirect_grants;
 	struct scatterlist *sg;
 	unsigned int num_sg;
+	enum blk_req_status status;
+
+	#define NO_ASSOCIATED_ID ~0UL
+	/*
+	 * Id of the sibling if we ever need 2 requests when handling a
+	 * block I/O request
+	 */
+	unsigned long associated_id;
 };
 
 struct split_bio {
@@ -467,6 +496,8 @@ static unsigned long blkif_ring_get_request(struct blkfront_info *info,
 
 	id = get_id_from_freelist(info);
 	info->shadow[id].request = req;
+	info->shadow[id].status = REQ_WAITING;
+	info->shadow[id].associated_id = NO_ASSOCIATED_ID;
 
 	(*ring_req)->u.rw.id = id;
 
@@ -508,6 +539,9 @@ struct setup_rw_req {
 	bool need_copy;
 	unsigned int bvec_off;
 	char *bvec_data;
+
+	bool require_extra_req;
+	struct blkif_request *extra_ring_req;
 };
 
 static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
@@ -521,8 +555,24 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 	unsigned int grant_idx = setup->grant_idx;
 	struct blkif_request *ring_req = setup->ring_req;
 	struct blkfront_info *info = setup->info;
+	/*
+	 * We always use the shadow of the first request to store the list
+	 * of grants associated with the block I/O request. This makes the
+	 * completion easier to handle even if the block I/O request is
+	 * split.
+	 */
 	struct blk_shadow *shadow = &info->shadow[setup->id];
 
+	if (unlikely(setup->require_extra_req &&
+		     grant_idx >= BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
+		/*
+		 * We are using the second request, setup grant_idx
+		 * to be the index of the segment array
+		 */
+		grant_idx -= BLKIF_MAX_SEGMENTS_PER_REQUEST;
+		ring_req = setup->extra_ring_req;
+	}
+
 	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
 	    (grant_idx % GRANTS_PER_INDIRECT_FRAME == 0)) {
 		if (setup->segments)
@@ -537,7 +587,11 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 
 	gnt_list_entry = get_grant(&setup->gref_head, gfn, info);
 	ref = gnt_list_entry->gref;
-	shadow->grants_used[grant_idx] = gnt_list_entry;
+	/*
+	 * All the grants are stored in the shadow of the first
+	 * request. Therefore we have to use the global index
+	 */
+	shadow->grants_used[setup->grant_idx] = gnt_list_entry;
 
 	if (setup->need_copy) {
 		void *shared_data;
@@ -579,11 +633,31 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 	(setup->grant_idx)++;
 }
 
+static void blkif_setup_extra_req(struct blkif_request *first,
+				  struct blkif_request *second)
+{
+	uint16_t nr_segments = first->u.rw.nr_segments;
+
+	/*
+	 * The second request is only present when the first request uses
+	 * all its segments. It's always the continuation of the first one.
+	 */
+	first->u.rw.nr_segments = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+
+	second->u.rw.nr_segments = nr_segments - BLKIF_MAX_SEGMENTS_PER_REQUEST;
+	second->u.rw.sector_number = first->u.rw.sector_number +
+		(BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) / 512;
+
+	second->u.rw.handle = first->u.rw.handle;
+	second->operation = first->operation;
+}
+
 static int blkif_queue_rw_req(struct request *req)
 {
 	struct blkfront_info *info = req->rq_disk->private_data;
-	struct blkif_request *ring_req;
-	unsigned long id;
+	struct blkif_request *ring_req, *extra_ring_req = NULL;
+	unsigned long id, extra_id = NO_ASSOCIATED_ID;
+	bool require_extra_req = false;
 	int i;
 	struct setup_rw_req setup = {
 		.grant_idx = 0,
@@ -628,19 +702,19 @@ static int blkif_queue_rw_req(struct request *req)
 	/* Fill out a communications ring structure. */
 	id = blkif_ring_get_request(info, req, &ring_req);
 
-	BUG_ON(info->max_indirect_segments == 0 &&
-	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
-	BUG_ON(info->max_indirect_segments &&
-	       GREFS(req->nr_phys_segments) > info->max_indirect_segments);
-
 	num_sg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
 	num_grant = 0;
 	/* Calculate the number of grant used */
 	for_each_sg(info->shadow[id].sg, sg, num_sg, i)
 	       num_grant += gnttab_count_grant(sg->offset, sg->length);
 
+	require_extra_req = info->max_indirect_segments == 0 &&
+		num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST;
+	BUG_ON(!HAS_EXTRA_REQ && require_extra_req);
+
 	info->shadow[id].num_sg = num_sg;
-	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST &&
+	    likely(!require_extra_req)) {
 		/*
 		 * The indirect operation can only be a BLKIF_OP_READ or
 		 * BLKIF_OP_WRITE
@@ -680,10 +754,30 @@ static int blkif_queue_rw_req(struct request *req)
 			}
 		}
 		ring_req->u.rw.nr_segments = num_grant;
+		if (unlikely(require_extra_req)) {
+			extra_id = blkif_ring_get_request(info, req,
+							  &extra_ring_req);
+			/*
+			 * Only the first request contains the scatter-gather
+			 * list.
+			 */
+			info->shadow[extra_id].num_sg = 0;
+
+			blkif_setup_extra_req(ring_req, extra_ring_req);
+
+			/* Link the 2 requests together */
+			info->shadow[extra_id].associated_id = id;
+			info->shadow[id].associated_id = extra_id;
+		}
 	}
 
 	setup.ring_req = ring_req;
 	setup.id = id;
+
+	setup.require_extra_req = require_extra_req;
+	if (unlikely(require_extra_req))
+		setup.extra_ring_req = extra_ring_req;
+
 	for_each_sg(info->shadow[id].sg, sg, num_sg, i) {
 		BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 
@@ -706,6 +800,8 @@ static int blkif_queue_rw_req(struct request *req)
 
 	/* Keep a private copy so we can reissue requests when recovering. */
 	info->shadow[id].req = *ring_req;
+	if (unlikely(require_extra_req))
+		info->shadow[extra_id].req = *extra_ring_req;
 
 	if (new_persistent_gnts)
 		gnttab_free_grant_references(setup.gref_head);
@@ -797,7 +893,16 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 	memset(&info->tag_set, 0, sizeof(info->tag_set));
 	info->tag_set.ops = &blkfront_mq_ops;
 	info->tag_set.nr_hw_queues = 1;
-	info->tag_set.queue_depth =  BLK_RING_SIZE(info);
+	if (HAS_EXTRA_REQ && info->max_indirect_segments == 0) {
+		/*
+		 * When indirect descriptors are not supported, the I/O request
+		 * will be split between multiple requests in the ring.
+		 * To avoid problems when sending the request, divide the
+		 * depth of the queue by 2.
+		 */
+		info->tag_set.queue_depth =  BLK_RING_SIZE(info) / 2;
+	} else
+		info->tag_set.queue_depth = BLK_RING_SIZE(info);
 	info->tag_set.numa_node = NUMA_NO_NODE;
 	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
 	info->tag_set.cmd_size = 0;
@@ -1217,19 +1322,92 @@ static void blkif_copy_from_grant(unsigned long gfn, unsigned int offset,
 	kunmap_atomic(shared_data);
 }
 
-static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
+static enum blk_req_status blkif_rsp_to_req_status(int rsp)
+{
+	switch (rsp)
+	{
+	case BLKIF_RSP_OKAY:
+		return REQ_DONE;
+	case BLKIF_RSP_ERROR:
+		return REQ_ERROR;
+	case BLKIF_RSP_EOPNOTSUPP:
+		return REQ_EOPNOTSUPP;
+	default:
+		return REQ_ERROR;
+	}
+}
+
+/*
+ * Get the final status of the block request based on two ring responses
+ */
+static int blkif_get_final_status(enum blk_req_status s1,
+				  enum blk_req_status s2)
+{
+	BUG_ON(s1 == REQ_WAITING);
+	BUG_ON(s2 == REQ_WAITING);
+
+	if (s1 == REQ_ERROR || s2 == REQ_ERROR)
+		return BLKIF_RSP_ERROR;
+	else if (s1 == REQ_EOPNOTSUPP || s2 == REQ_EOPNOTSUPP)
+		return BLKIF_RSP_EOPNOTSUPP;
+	else
+		return BLKIF_RSP_OKAY;
+}
+
+static bool blkif_completion(unsigned long *id, struct blkfront_info *info,
 			     struct blkif_response *bret)
 {
 	int i = 0;
 	struct scatterlist *sg;
 	int num_sg, num_grant;
+	struct blk_shadow *s = &info->shadow[*id];
 	struct copy_from_grant data = {
-		.s = s,
 		.grant_idx = 0,
 	};
 
 	num_grant = s->req.operation == BLKIF_OP_INDIRECT ?
 		s->req.u.indirect.nr_segments : s->req.u.rw.nr_segments;
+
+	/* The I/O request may be split in two */
+	if (unlikely(s->associated_id != NO_ASSOCIATED_ID)) {
+		struct blk_shadow *s2 = &info->shadow[s->associated_id];
+
+		/* Keep the status of the current response in shadow */
+		s->status = blkif_rsp_to_req_status(bret->status);
+
+		/* Wait the second response if not yet here */
+		if (s2->status == REQ_WAITING)
+			return 0;
+
+		bret->status = blkif_get_final_status(s->status,
+						      s2->status);
+
+		/*
+		 * All the grants are stored in the first shadow in order
+		 * to make the completion code simpler.
+		 */
+		num_grant += s2->req.u.rw.nr_segments;
+
+		/*
+		 * The two responses may not come in order. Only the
+		 * first request will store the scatter-gather list
+		 */
+		if (s2->num_sg != 0) {
+			/* Update "id" with the ID of the first response */
+			*id = s->associated_id;
+			s = s2;
+		}
+
+		/*
+		 * We don't need the second request anymore, so recycle
+		 * it now.
+		 */
+		if (add_id_to_freelist(info, s->associated_id))
+			WARN(1, "%s: can't recycle the second part (id = %ld) of the request\n",
+			     info->gd->disk_name, s->associated_id);
+	}
+
+	data.s = s;
 	num_sg = s->num_sg;
 
 	if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
@@ -1299,6 +1477,8 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 			}
 		}
 	}
+
+	return 1;
 }
 
 static irqreturn_t blkif_interrupt(int irq, void *dev_id)
@@ -1340,8 +1520,14 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		}
 		req  = info->shadow[id].request;
 
-		if (bret->operation != BLKIF_OP_DISCARD)
-			blkif_completion(&info->shadow[id], info, bret);
+		if (bret->operation != BLKIF_OP_DISCARD) {
+			/*
+			 * We may need to wait for an extra response if the
+			 * I/O request is split in 2
+			 */
+			if (!blkif_completion(&id, info, bret))
+				continue;
+		}
 
 		if (add_id_to_freelist(info, id)) {
 			WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
@@ -1864,8 +2050,18 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
 	unsigned int psegs, grants;
 	int err, i;
 
-	if (info->max_indirect_segments == 0)
-		grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+	if (info->max_indirect_segments == 0) {
+		if (!HAS_EXTRA_REQ)
+			grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+		else {
+			/*
+			 * When an extra req is required, the maximum
+			 * grants supported is related to the size of the
+			 * Linux block segment
+			 */
+			grants = GRANTS_PER_PSEG;
+		}
+	}
 	else
 		grants = info->max_indirect_segments;
 	psegs = grants / GRANTS_PER_PSEG;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
  2015-11-18 18:57 [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity Julien Grall
                   ` (4 preceding siblings ...)
  2015-11-30 16:16 ` [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity Julien Grall
@ 2015-11-30 16:16 ` Julien Grall
  2015-11-30 18:45 ` Julien Grall
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2015-11-30 16:16 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: David Vrabel, Boris Ostrovsky, Roger Pau Monné

Hi,

Ping?

Regards,

On 18/11/15 18:57, Julien Grall wrote:
> Hi all,
> 
> This is a follow-up to the previous discussion [1] about guests using 64KB
> page granularity, which don't boot when the backend doesn't support indirect
> descriptors.
> 
> This has been successfully tested on ARM64 with both 64KB and 4KB page
> granularity guests, using QEMU as the backend. Indeed, QEMU doesn't support
> indirect descriptors.
> 
> This series is based on xentip/for-linus-4.4, which includes the support for
> 64KB Linux guests.
> 
> For the full list of changes, see each patch.
> 
> Sincerely yours,
> 
> [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg01659.html
> 
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: Bob Liu <bob.liu@oracle.com>
> 
> Julien Grall (2):
>   block/xen-blkfront: Introduce blkif_ring_get_request
>   block/xen-blkfront: Handle non-indirect grant with 64KB pages
> 
>  drivers/block/xen-blkfront.c | 258 ++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 231 insertions(+), 27 deletions(-)
> 


-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
  2015-11-18 18:57 [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity Julien Grall
                   ` (5 preceding siblings ...)
  2015-11-30 16:16 ` [Xen-devel] " Julien Grall
@ 2015-11-30 18:45 ` Julien Grall
  2015-11-30 18:45 ` Julien Grall
  2015-12-01 15:37 ` Konrad Rzeszutek Wilk
  8 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2015-11-30 18:45 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: David Vrabel, Boris Ostrovsky, Roger Pau Monné,
	bob.liu, konrad.wilk

Hi,

Just noticed that Bob and Konrad have not been correctly CCed.

Regards,

On 18/11/15 18:57, Julien Grall wrote:
> Hi all,
> 
> This is a follow-up to the previous discussion [1] about guests using 64KB
> page granularity, which don't boot when the backend doesn't support indirect
> descriptors.
> 
> This has been successfully tested on ARM64 with both 64KB and 4KB page
> granularity guests, using QEMU as the backend. Indeed, QEMU doesn't support
> indirect descriptors.
> 
> This series is based on xentip/for-linus-4.4, which includes the support for
> 64KB Linux guests.
> 
> For the full list of changes, see each patch.
> 
> Sincerely yours,
> 
> [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg01659.html
> 
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: Bob Liu <bob.liu@oracle.com>
> 
> Julien Grall (2):
>   block/xen-blkfront: Introduce blkif_ring_get_request
>   block/xen-blkfront: Handle non-indirect grant with 64KB pages
> 
>  drivers/block/xen-blkfront.c | 258 ++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 231 insertions(+), 27 deletions(-)
> 


-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages
  2015-11-18 18:57 ` Julien Grall
  2015-12-01 15:30   ` Roger Pau Monné
@ 2015-12-01 15:30   ` Roger Pau Monné
  2015-12-08 12:45     ` Julien Grall
  2015-12-08 12:45     ` [Xen-devel] " Julien Grall
  1 sibling, 2 replies; 26+ messages in thread
From: Roger Pau Monné @ 2015-12-01 15:30 UTC (permalink / raw)
  To: Julien Grall, xen-devel, linux-kernel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel, Bob Liu

On 18/11/15 at 19.57, Julien Grall wrote:
> The minimal size of a request in the block framework is always PAGE_SIZE.
> It means that when a 64KB guest is supported, a request will be at least
> 64KB.
> 
> However, if the backend doesn't support indirect descriptors (such as QDISK
> in QEMU), a ring request is only able to accommodate 11 segments of 4KB
> (i.e. 44KB).
> 
> The current frontend assumes that an I/O request will always fit in
> a ring request. This is no longer true when using 64KB page
> granularity, and the frontend will therefore crash during boot.
> 
> On ARM64, the ABI is completely neutral to the page granularity used by
> the domU. The guest can choose between the different page granularities
> supported by the processor (for instance on ARM64: 4KB, 16KB, 64KB).
> This can't be enforced by the hypervisor, and it's therefore possible to
> run guests using different page granularities.
> 
> So we can't mandate that the block backend support indirect descriptors
> when the frontend is using 64KB page granularity, and we have to fix it
> properly in the frontend.
> 
> The solution exposed below is based on directly modifying the frontend
> rather than asking the block framework to support a smaller size
> (i.e. < PAGE_SIZE). This is because the changes in the block framework are
> not trivial, as everything seems to rely on a struct page (see [1]).
> However, someone may succeed in doing it in the future and we would then
> be able to use it.
> 
> Given that a block request may not fit in a single ring request, a
> second request is introduced for the data that cannot fit in the first
> one. This means that the second ring request should never be used on
> Linux if the page size is smaller than 44KB.
> 
> To support the extra ring request, the block queue depth is divided by
> two. Therefore, the ring will always contain enough space to accommodate 2
> ring requests per block request. While this will reduce the overall
> performance, it will make the implementation more contained. The way
> forward to get better performance is to implement in the backend either
> indirect descriptors or a ring using multiple grants.
> 
> Note that the blk_queue_max_* helper parameters haven't been updated.
> The block code will set the minimum size supported and we may be able
> to directly support any change in the block framework that lowers
> the minimal size of a request.
> 
> [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html
> 
> Signed-off-by: Julien Grall <julien.grall@citrix.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Just a couple of cosmetic comments below. Unless anyone else has real
comments I don't think you need to resend.

> ---
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: Bob Liu <bob.liu@oracle.com>
> 
>     Roger, I haven't kept your acked-by because of the update to the
>     final status computation.
> 
>     Changes in v3:
>         - Typos
>         - Change the way the final status is computed to handle the
>         possibility of the request being handled by different backends
>         (could happen during suspend/resume).
> 
>     Changes in v2:
>         - Update the commit message
>         - Typos
>         - Rename ring_req2/id2 to extra_ring_req/extra_id
> ---
>  drivers/block/xen-blkfront.c | 228 ++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 212 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 2248a47..33f6ff4 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -60,6 +60,20 @@
>  
>  #include <asm/xen/hypervisor.h>
>  
> +/*
> + * The minimal size of segment supported by the block framework is PAGE_SIZE.
> + * When Linux is using a different page size than xen, it may not be possible
> + * to put all the data in a single segment.
> + * This can happen when the backend doesn't support indirect descriptor and
> + * therefore the maximum amount of data that a request can carry is
> + * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE = 44KB
> + *
> + * Note that we only support one extra request. So the Linux page size
> + * should be <= ( 2 * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) =
> + * 88KB.
> + */
> +#define HAS_EXTRA_REQ (BLKIF_MAX_SEGMENTS_PER_REQUEST < XEN_PFN_PER_PAGE)
> +
>  enum blkif_state {
>  	BLKIF_STATE_DISCONNECTED,
>  	BLKIF_STATE_CONNECTED,
> @@ -72,6 +86,13 @@ struct grant {
>  	struct list_head node;
>  };
>  
> +enum blk_req_status {
> +	REQ_WAITING,
> +	REQ_DONE,
> +	REQ_ERROR,
> +	REQ_EOPNOTSUPP,
> +};
> +
>  struct blk_shadow {
>  	struct blkif_request req;
>  	struct request *request;
> @@ -79,6 +100,14 @@ struct blk_shadow {
>  	struct grant **indirect_grants;
>  	struct scatterlist *sg;
>  	unsigned int num_sg;
> +	enum blk_req_status status;
> +
> +	#define NO_ASSOCIATED_ID ~0UL
> +	/*
> +	 * Id of the sibling if we ever need 2 requests when handling a
> +	 * block I/O request
> +	 */
> +	unsigned long associated_id;
>  };
>  
>  struct split_bio {
> @@ -467,6 +496,8 @@ static unsigned long blkif_ring_get_request(struct blkfront_info *info,
>  
>  	id = get_id_from_freelist(info);
>  	info->shadow[id].request = req;
> +	info->shadow[id].status = REQ_WAITING;
> +	info->shadow[id].associated_id = NO_ASSOCIATED_ID;
>  
>  	(*ring_req)->u.rw.id = id;
>  
> @@ -508,6 +539,9 @@ struct setup_rw_req {
>  	bool need_copy;
>  	unsigned int bvec_off;
>  	char *bvec_data;
> +
> +	bool require_extra_req;
> +	struct blkif_request *extra_ring_req;
>  };
>  
>  static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
> @@ -521,8 +555,24 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>  	unsigned int grant_idx = setup->grant_idx;
>  	struct blkif_request *ring_req = setup->ring_req;
>  	struct blkfront_info *info = setup->info;
> +	/*
> +	 * We always use the shadow of the first request to store the list
> +	 * of grant associated to the block I/O request. This made the
> +	 * completion more easy to handle even if the block I/O request is
> +	 * split.
> +	 */
>  	struct blk_shadow *shadow = &info->shadow[setup->id];
>  
> +	if (unlikely(setup->require_extra_req &&
> +		     grant_idx >= BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
> +		/*
> +		 * We are using the second request, setup grant_idx
> +		 * to be the index of the segment array
> +		 */
> +		grant_idx -= BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +		ring_req = setup->extra_ring_req;
> +	}
> +
>  	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
>  	    (grant_idx % GRANTS_PER_INDIRECT_FRAME == 0)) {
>  		if (setup->segments)
> @@ -537,7 +587,11 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>  
>  	gnt_list_entry = get_grant(&setup->gref_head, gfn, info);
>  	ref = gnt_list_entry->gref;
> -	shadow->grants_used[grant_idx] = gnt_list_entry;
> +	/*
> +	 * All the grants are stored in the shadow of the first
> +	 * request. Therefore we have to use the global index
> +	 */
> +	shadow->grants_used[setup->grant_idx] = gnt_list_entry;
>  
>  	if (setup->need_copy) {
>  		void *shared_data;
> @@ -579,11 +633,31 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>  	(setup->grant_idx)++;
>  }
>  
> +static void blkif_setup_extra_req(struct blkif_request *first,
> +				  struct blkif_request *second)
> +{
> +	uint16_t nr_segments = first->u.rw.nr_segments;
> +
> +	/*
> +	 * The second request is only present when the first request uses
> +	 * all its segments. It's always the continuity of the first one.
> +	 */
> +	first->u.rw.nr_segments = BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +
> +	second->u.rw.nr_segments = nr_segments - BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +	second->u.rw.sector_number = first->u.rw.sector_number +
> +		(BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) / 512;
> +
> +	second->u.rw.handle = first->u.rw.handle;
> +	second->operation = first->operation;
> +}
> +
>  static int blkif_queue_rw_req(struct request *req)
>  {
>  	struct blkfront_info *info = req->rq_disk->private_data;
> -	struct blkif_request *ring_req;
> -	unsigned long id;
> +	struct blkif_request *ring_req, *extra_ring_req = NULL;
> +	unsigned long id, extra_id = NO_ASSOCIATED_ID;
> +	bool require_extra_req = false;
>  	int i;
>  	struct setup_rw_req setup = {
>  		.grant_idx = 0,
> @@ -628,19 +702,19 @@ static int blkif_queue_rw_req(struct request *req)
>  	/* Fill out a communications ring structure. */
>  	id = blkif_ring_get_request(info, req, &ring_req);
>  
> -	BUG_ON(info->max_indirect_segments == 0 &&
> -	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
> -	BUG_ON(info->max_indirect_segments &&
> -	       GREFS(req->nr_phys_segments) > info->max_indirect_segments);
> -
>  	num_sg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
>  	num_grant = 0;
>  	/* Calculate the number of grant used */
>  	for_each_sg(info->shadow[id].sg, sg, num_sg, i)
>  	       num_grant += gnttab_count_grant(sg->offset, sg->length);
>  
> +	require_extra_req = info->max_indirect_segments == 0 &&
> +		num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +	BUG_ON(!HAS_EXTRA_REQ && require_extra_req);
> +
>  	info->shadow[id].num_sg = num_sg;
> -	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
> +	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST &&
> +	    likely(!require_extra_req)) {
>  		/*
>  		 * The indirect operation can only be a BLKIF_OP_READ or
>  		 * BLKIF_OP_WRITE
> @@ -680,10 +754,30 @@ static int blkif_queue_rw_req(struct request *req)
>  			}
>  		}
>  		ring_req->u.rw.nr_segments = num_grant;
> +		if (unlikely(require_extra_req)) {
> +			extra_id = blkif_ring_get_request(info, req,
> +							  &extra_ring_req);
> +			/*
> +			 * Only the first request contains the scatter-gather
> +			 * list.
> +			 */
> +			info->shadow[extra_id].num_sg = 0;
> +
> +			blkif_setup_extra_req(ring_req, extra_ring_req);
> +
> +			/* Link the 2 requests together */
> +			info->shadow[extra_id].associated_id = id;
> +			info->shadow[id].associated_id = extra_id;
> +		}
>  	}
>  
>  	setup.ring_req = ring_req;
>  	setup.id = id;
> +
> +	setup.require_extra_req = require_extra_req;
> +	if (unlikely(require_extra_req))
> +		setup.extra_ring_req = extra_ring_req;
> +
>  	for_each_sg(info->shadow[id].sg, sg, num_sg, i) {
>  		BUG_ON(sg->offset + sg->length > PAGE_SIZE);
>  
> @@ -706,6 +800,8 @@ static int blkif_queue_rw_req(struct request *req)
>  
>  	/* Keep a private copy so we can reissue requests when recovering. */
>  	info->shadow[id].req = *ring_req;
> +	if (unlikely(require_extra_req))
> +		info->shadow[extra_id].req = *extra_ring_req;
>  
>  	if (new_persistent_gnts)
>  		gnttab_free_grant_references(setup.gref_head);
> @@ -797,7 +893,16 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
>  	memset(&info->tag_set, 0, sizeof(info->tag_set));
>  	info->tag_set.ops = &blkfront_mq_ops;
>  	info->tag_set.nr_hw_queues = 1;
> -	info->tag_set.queue_depth =  BLK_RING_SIZE(info);
> +	if (HAS_EXTRA_REQ && info->max_indirect_segments == 0) {
> +		/*
> +		 * When indirect descriptior is not supported, the I/O request
                                 ^ descriptor
> +		 * will be split between multiple request in the ring.
> +		 * To avoid problems when sending the request, divide by
> +		 * 2 the depth of the queue.
> +		 */
> +		info->tag_set.queue_depth =  BLK_RING_SIZE(info) / 2;
> +	} else
> +		info->tag_set.queue_depth = BLK_RING_SIZE(info);
>  	info->tag_set.numa_node = NUMA_NO_NODE;
>  	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
>  	info->tag_set.cmd_size = 0;
> @@ -1217,19 +1322,92 @@ static void blkif_copy_from_grant(unsigned long gfn, unsigned int offset,
>  	kunmap_atomic(shared_data);
>  }
>  
> -static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
> +static enum blk_req_status blkif_rsp_to_req_status(int rsp)
> +{
> +	switch (rsp)
> +	{
> +	case BLKIF_RSP_OKAY:
> +		return REQ_DONE;
> +	case BLKIF_RSP_ERROR:
> +		return REQ_ERROR;

This could be joined with the default case with a fall through:

case BLKIF_RSP_ERROR:
default:
	return REQ_ERROR;

> +	case BLKIF_RSP_EOPNOTSUPP:
> +		return REQ_EOPNOTSUPP;
> +	default:
> +		return REQ_ERROR;
> +	}
> +}
> +
> +/*
> + * Get the final status of the block request based on two ring response
> + */
> +static int blkif_get_final_status(enum blk_req_status s1,
> +				  enum blk_req_status s2)
> +{
> +	BUG_ON(s1 == REQ_WAITING);
> +	BUG_ON(s2 == REQ_WAITING);
> +
> +	if (s1 == REQ_ERROR || s2 == REQ_ERROR)
> +		return BLKIF_RSP_ERROR;
> +	else if (s1 == REQ_EOPNOTSUPP || s2 == REQ_EOPNOTSUPP)
> +		return BLKIF_RSP_EOPNOTSUPP;
> +	else

Unneeded else?
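
I.e. something like (untested, just dropping the else chain; same logic):

	if (s1 == REQ_ERROR || s2 == REQ_ERROR)
		return BLKIF_RSP_ERROR;
	if (s1 == REQ_EOPNOTSUPP || s2 == REQ_EOPNOTSUPP)
		return BLKIF_RSP_EOPNOTSUPP;
	return BLKIF_RSP_OKAY;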

> +		return BLKIF_RSP_OKAY;
> +}
> +
> +static bool blkif_completion(unsigned long *id, struct blkfront_info *info,
>  			     struct blkif_response *bret)
>  {
>  	int i = 0;
>  	struct scatterlist *sg;
>  	int num_sg, num_grant;
> +	struct blk_shadow *s = &info->shadow[*id];
>  	struct copy_from_grant data = {
> -		.s = s,
>  		.grant_idx = 0,
>  	};
>  
>  	num_grant = s->req.operation == BLKIF_OP_INDIRECT ?
>  		s->req.u.indirect.nr_segments : s->req.u.rw.nr_segments;
> +
> +	/* The I/O request may be split in two */
> +	if (unlikely(s->associated_id != NO_ASSOCIATED_ID)) {
> +		struct blk_shadow *s2 = &info->shadow[s->associated_id];
> +
> +		/* Keep the status of the current response in shadow */
> +		s->status = blkif_rsp_to_req_status(bret->status);
> +
> +		/* Wait the second response if not yet here */
> +		if (s2->status == REQ_WAITING)
> +			return 0;
> +
> +		bret->status = blkif_get_final_status(s->status,
> +						      s2->status);
> +
> +		/*
> +		 * All the grants is stored in the first shadow in order
> +		 * to make the completion code simpler.
> +		 */
> +		num_grant += s2->req.u.rw.nr_segments;
> +
> +		/*
> +		 * The two responses may not come in order. Only the
> +		 * first request will store the scatter-gather list
> +		 */
> +		if (s2->num_sg != 0) {
> +			/* Update "id" with the ID of the first response */
> +			*id = s->associated_id;
> +			s = s2;
> +		}
> +
> +		/*
> +		 * We don't need anymore the second request, so recycling
> +		 * it now.
> +		 */
> +		if (add_id_to_freelist(info, s->associated_id))
> +			WARN(1, "%s: can't recycle the second part (id = %ld) of the request\n",
> +			     info->gd->disk_name, s->associated_id);
> +	}
> +
> +	data.s = s;
>  	num_sg = s->num_sg;
>  
>  	if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
> @@ -1299,6 +1477,8 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>  			}
>  		}
>  	}
> +
> +	return 1;
>  }
>  
>  static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> @@ -1340,8 +1520,14 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  		}
>  		req  = info->shadow[id].request;
>  
> -		if (bret->operation != BLKIF_OP_DISCARD)
> -			blkif_completion(&info->shadow[id], info, bret);
> +		if (bret->operation != BLKIF_OP_DISCARD) {
> +			/*
> +			 * We may need to wait for an extra response if the
> +			 * I/O request is split in 2
> +			 */
> +			if (!blkif_completion(&id, info, bret))
> +				continue;
> +		}
>  
>  		if (add_id_to_freelist(info, id)) {
>  			WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
> @@ -1864,8 +2050,18 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
>  	unsigned int psegs, grants;
>  	int err, i;
>  
> -	if (info->max_indirect_segments == 0)
> -		grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +	if (info->max_indirect_segments == 0) {
> +		if (!HAS_EXTRA_REQ)
> +			grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +		else {
> +			/*
> +			 * When an extra req is required, the maximum
> +			 * grants supported is related to the size of the
> +			 * Linux block segment
> +			 */
> +			grants = GRANTS_PER_PSEG;
> +		}
> +	}
>  	else
>  		grants = info->max_indirect_segments;
>  	psegs = grants / GRANTS_PER_PSEG;
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages
  2015-11-18 18:57 ` Julien Grall
@ 2015-12-01 15:30   ` Roger Pau Monné
  2015-12-01 15:30   ` Roger Pau Monné
  1 sibling, 0 replies; 26+ messages in thread
From: Roger Pau Monné @ 2015-12-01 15:30 UTC (permalink / raw)
  To: Julien Grall, xen-devel, linux-kernel; +Cc: Boris Ostrovsky, David Vrabel

On 18/11/15 at 19:57, Julien Grall wrote:
> The minimal size of a request in the block framework is always PAGE_SIZE.
> It means that when a 64KB guest is supported, the request will be at least
> 64KB.
> 
> However, if the backend doesn't support indirect descriptors (such as QDISK
> in QEMU), a ring request is only able to accommodate 11 segments of 4KB
> (i.e. 44KB).
> 
> The current frontend is assuming that an I/O request will always fit in
> a ring request. This is not true any more when using 64KB page
> granularity and will therefore crash during boot.
> 
> On ARM64, the ABI is completely neutral to the page granularity used by
> the domU. The guest has the choice between different page granularity
> supported by the processors (for instance on ARM64: 4KB, 16KB, 64KB).
> This can't be enforced by the hypervisor and therefore it's possible to
> run guests using different page granularity.
> 
> So we can't mandate the block backend to support indirect descriptor
> when the frontend is using 64KB page granularity and have to fix it
> properly in the frontend.
> 
> The solution exposed below is based on modifying the frontend guest
> directly rather than asking the block framework to support a smaller size
> (i.e. < PAGE_SIZE). This is because the changes in the block framework are
> not trivial, as everything seems to rely on a struct *page (see [1]).
> It may still be possible that someone succeeds in doing it in the future,
> and we would then be able to use it.
> 
> Given that a block request may not fit in a single ring request, a
> second request is introduced for the data that cannot fit in the first
> one. This means that the second ring request should never be used on
> Linux if the page size is smaller than 44KB.
> 
> To achieve the support of the extra ring request, the block queue size
> is divided by two. Therefore, the ring will always contain enough space
> to accommodate 2 ring requests. While this will reduce the overall
> performance, it will make the implementation more contained. The way
> forward to get better performance is to implement in the backend either
> indirect descriptor or multiple grants ring.
> 
> Note that the blk_queue_max_* helpers haven't been updated. The block
> code will set the minimum size supported, and we may be able to directly
> support any change in the block framework that lowers the minimal size
> of a request.
> 
> [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html
> 
> Signed-off-by: Julien Grall <julien.grall@citrix.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Just a couple of cosmetic comments below. Unless anyone else has real
comments I don't think you need to resend.

> ---
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: Bob Liu <bob.liu@oracle.com>
> 
>     Roger, I haven't kept your acked-by because of the update of the final
>     status.
> 
>     Changes in v3:
>         - Typoes
>         - Change the way the final status is computed to handle the
>         possibility of the request handled by different backend (could
>         happen during suspend/resume).
> 
>     Changes in v2:
>         - Update the commit message
>         - Typoes
>         - Rename ring_req2/id2 to extra_ring_req/extra_id
> ---
>  drivers/block/xen-blkfront.c | 228 ++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 212 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 2248a47..33f6ff4 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -60,6 +60,20 @@
>  
>  #include <asm/xen/hypervisor.h>
>  
> +/*
> + * The minimal size of segment supported by the block framework is PAGE_SIZE.
> + * When Linux is using a different page size than xen, it may not be possible
> + * to put all the data in a single segment.
> + * This can happen when the backend doesn't support indirect descriptor and
> + * therefore the maximum amount of data that a request can carry is
> + * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE = 44KB
> + *
> + * Note that we only support one extra request. So the Linux page size
> + * should be <= ( 2 * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) =
> + * 88KB.
> + */
> +#define HAS_EXTRA_REQ (BLKIF_MAX_SEGMENTS_PER_REQUEST < XEN_PFN_PER_PAGE)
> +
>  enum blkif_state {
>  	BLKIF_STATE_DISCONNECTED,
>  	BLKIF_STATE_CONNECTED,
> @@ -72,6 +86,13 @@ struct grant {
>  	struct list_head node;
>  };
>  
> +enum blk_req_status {
> +	REQ_WAITING,
> +	REQ_DONE,
> +	REQ_ERROR,
> +	REQ_EOPNOTSUPP,
> +};
> +
>  struct blk_shadow {
>  	struct blkif_request req;
>  	struct request *request;
> @@ -79,6 +100,14 @@ struct blk_shadow {
>  	struct grant **indirect_grants;
>  	struct scatterlist *sg;
>  	unsigned int num_sg;
> +	enum blk_req_status status;
> +
> +	#define NO_ASSOCIATED_ID ~0UL
> +	/*
> +	 * Id of the sibling if we ever need 2 requests when handling a
> +	 * block I/O request
> +	 */
> +	unsigned long associated_id;
>  };
>  
>  struct split_bio {
> @@ -467,6 +496,8 @@ static unsigned long blkif_ring_get_request(struct blkfront_info *info,
>  
>  	id = get_id_from_freelist(info);
>  	info->shadow[id].request = req;
> +	info->shadow[id].status = REQ_WAITING;
> +	info->shadow[id].associated_id = NO_ASSOCIATED_ID;
>  
>  	(*ring_req)->u.rw.id = id;
>  
> @@ -508,6 +539,9 @@ struct setup_rw_req {
>  	bool need_copy;
>  	unsigned int bvec_off;
>  	char *bvec_data;
> +
> +	bool require_extra_req;
> +	struct blkif_request *extra_ring_req;
>  };
>  
>  static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
> @@ -521,8 +555,24 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>  	unsigned int grant_idx = setup->grant_idx;
>  	struct blkif_request *ring_req = setup->ring_req;
>  	struct blkfront_info *info = setup->info;
> +	/*
> +	 * We always use the shadow of the first request to store the list
> +	 * of grant associated to the block I/O request. This made the
> +	 * completion more easy to handle even if the block I/O request is
> +	 * split.
> +	 */
>  	struct blk_shadow *shadow = &info->shadow[setup->id];
>  
> +	if (unlikely(setup->require_extra_req &&
> +		     grant_idx >= BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
> +		/*
> +		 * We are using the second request, setup grant_idx
> +		 * to be the index of the segment array
> +		 */
> +		grant_idx -= BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +		ring_req = setup->extra_ring_req;
> +	}
> +
>  	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
>  	    (grant_idx % GRANTS_PER_INDIRECT_FRAME == 0)) {
>  		if (setup->segments)
> @@ -537,7 +587,11 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>  
>  	gnt_list_entry = get_grant(&setup->gref_head, gfn, info);
>  	ref = gnt_list_entry->gref;
> -	shadow->grants_used[grant_idx] = gnt_list_entry;
> +	/*
> +	 * All the grants are stored in the shadow of the first
> +	 * request. Therefore we have to use the global index
> +	 */
> +	shadow->grants_used[setup->grant_idx] = gnt_list_entry;
>  
>  	if (setup->need_copy) {
>  		void *shared_data;
> @@ -579,11 +633,31 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
>  	(setup->grant_idx)++;
>  }
>  
> +static void blkif_setup_extra_req(struct blkif_request *first,
> +				  struct blkif_request *second)
> +{
> +	uint16_t nr_segments = first->u.rw.nr_segments;
> +
> +	/*
> +	 * The second request is only present when the first request uses
> +	 * all its segments. It's always the continuity of the first one.
> +	 */
> +	first->u.rw.nr_segments = BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +
> +	second->u.rw.nr_segments = nr_segments - BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +	second->u.rw.sector_number = first->u.rw.sector_number +
> +		(BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) / 512;
> +
> +	second->u.rw.handle = first->u.rw.handle;
> +	second->operation = first->operation;
> +}
> +
>  static int blkif_queue_rw_req(struct request *req)
>  {
>  	struct blkfront_info *info = req->rq_disk->private_data;
> -	struct blkif_request *ring_req;
> -	unsigned long id;
> +	struct blkif_request *ring_req, *extra_ring_req = NULL;
> +	unsigned long id, extra_id = NO_ASSOCIATED_ID;
> +	bool require_extra_req = false;
>  	int i;
>  	struct setup_rw_req setup = {
>  		.grant_idx = 0,
> @@ -628,19 +702,19 @@ static int blkif_queue_rw_req(struct request *req)
>  	/* Fill out a communications ring structure. */
>  	id = blkif_ring_get_request(info, req, &ring_req);
>  
> -	BUG_ON(info->max_indirect_segments == 0 &&
> -	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
> -	BUG_ON(info->max_indirect_segments &&
> -	       GREFS(req->nr_phys_segments) > info->max_indirect_segments);
> -
>  	num_sg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
>  	num_grant = 0;
>  	/* Calculate the number of grant used */
>  	for_each_sg(info->shadow[id].sg, sg, num_sg, i)
>  	       num_grant += gnttab_count_grant(sg->offset, sg->length);
>  
> +	require_extra_req = info->max_indirect_segments == 0 &&
> +		num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +	BUG_ON(!HAS_EXTRA_REQ && require_extra_req);
> +
>  	info->shadow[id].num_sg = num_sg;
> -	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
> +	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST &&
> +	    likely(!require_extra_req)) {
>  		/*
>  		 * The indirect operation can only be a BLKIF_OP_READ or
>  		 * BLKIF_OP_WRITE
> @@ -680,10 +754,30 @@ static int blkif_queue_rw_req(struct request *req)
>  			}
>  		}
>  		ring_req->u.rw.nr_segments = num_grant;
> +		if (unlikely(require_extra_req)) {
> +			extra_id = blkif_ring_get_request(info, req,
> +							  &extra_ring_req);
> +			/*
> +			 * Only the first request contains the scatter-gather
> +			 * list.
> +			 */
> +			info->shadow[extra_id].num_sg = 0;
> +
> +			blkif_setup_extra_req(ring_req, extra_ring_req);
> +
> +			/* Link the 2 requests together */
> +			info->shadow[extra_id].associated_id = id;
> +			info->shadow[id].associated_id = extra_id;
> +		}
>  	}
>  
>  	setup.ring_req = ring_req;
>  	setup.id = id;
> +
> +	setup.require_extra_req = require_extra_req;
> +	if (unlikely(require_extra_req))
> +		setup.extra_ring_req = extra_ring_req;
> +
>  	for_each_sg(info->shadow[id].sg, sg, num_sg, i) {
>  		BUG_ON(sg->offset + sg->length > PAGE_SIZE);
>  
> @@ -706,6 +800,8 @@ static int blkif_queue_rw_req(struct request *req)
>  
>  	/* Keep a private copy so we can reissue requests when recovering. */
>  	info->shadow[id].req = *ring_req;
> +	if (unlikely(require_extra_req))
> +		info->shadow[extra_id].req = *extra_ring_req;
>  
>  	if (new_persistent_gnts)
>  		gnttab_free_grant_references(setup.gref_head);
> @@ -797,7 +893,16 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
>  	memset(&info->tag_set, 0, sizeof(info->tag_set));
>  	info->tag_set.ops = &blkfront_mq_ops;
>  	info->tag_set.nr_hw_queues = 1;
> -	info->tag_set.queue_depth =  BLK_RING_SIZE(info);
> +	if (HAS_EXTRA_REQ && info->max_indirect_segments == 0) {
> +		/*
> +		 * When indirect descriptior is not supported, the I/O request
                                 ^ descriptor
> +		 * will be split between multiple request in the ring.
> +		 * To avoid problems when sending the request, divide by
> +		 * 2 the depth of the queue.
> +		 */
> +		info->tag_set.queue_depth =  BLK_RING_SIZE(info) / 2;
> +	} else
> +		info->tag_set.queue_depth = BLK_RING_SIZE(info);
>  	info->tag_set.numa_node = NUMA_NO_NODE;
>  	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
>  	info->tag_set.cmd_size = 0;
> @@ -1217,19 +1322,92 @@ static void blkif_copy_from_grant(unsigned long gfn, unsigned int offset,
>  	kunmap_atomic(shared_data);
>  }
>  
> -static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
> +static enum blk_req_status blkif_rsp_to_req_status(int rsp)
> +{
> +	switch (rsp)
> +	{
> +	case BLKIF_RSP_OKAY:
> +		return REQ_DONE;
> +	case BLKIF_RSP_ERROR:
> +		return REQ_ERROR;

This could be joined with the default case with a fall through:

case BLKIF_RSP_ERROR:
default:
	return REQ_ERROR;

> +	case BLKIF_RSP_EOPNOTSUPP:
> +		return REQ_EOPNOTSUPP;
> +	default:
> +		return REQ_ERROR;
> +	}
> +}
> +
> +/*
> + * Get the final status of the block request based on two ring response
> + */
> +static int blkif_get_final_status(enum blk_req_status s1,
> +				  enum blk_req_status s2)
> +{
> +	BUG_ON(s1 == REQ_WAITING);
> +	BUG_ON(s2 == REQ_WAITING);
> +
> +	if (s1 == REQ_ERROR || s2 == REQ_ERROR)
> +		return BLKIF_RSP_ERROR;
> +	else if (s1 == REQ_EOPNOTSUPP || s2 == REQ_EOPNOTSUPP)
> +		return BLKIF_RSP_EOPNOTSUPP;
> +	else

Unneeded else?
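
I.e. something like (untested, just dropping the else chain; same logic):

	if (s1 == REQ_ERROR || s2 == REQ_ERROR)
		return BLKIF_RSP_ERROR;
	if (s1 == REQ_EOPNOTSUPP || s2 == REQ_EOPNOTSUPP)
		return BLKIF_RSP_EOPNOTSUPP;
	return BLKIF_RSP_OKAY;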

> +		return BLKIF_RSP_OKAY;
> +}
> +
> +static bool blkif_completion(unsigned long *id, struct blkfront_info *info,
>  			     struct blkif_response *bret)
>  {
>  	int i = 0;
>  	struct scatterlist *sg;
>  	int num_sg, num_grant;
> +	struct blk_shadow *s = &info->shadow[*id];
>  	struct copy_from_grant data = {
> -		.s = s,
>  		.grant_idx = 0,
>  	};
>  
>  	num_grant = s->req.operation == BLKIF_OP_INDIRECT ?
>  		s->req.u.indirect.nr_segments : s->req.u.rw.nr_segments;
> +
> +	/* The I/O request may be split in two */
> +	if (unlikely(s->associated_id != NO_ASSOCIATED_ID)) {
> +		struct blk_shadow *s2 = &info->shadow[s->associated_id];
> +
> +		/* Keep the status of the current response in shadow */
> +		s->status = blkif_rsp_to_req_status(bret->status);
> +
> +		/* Wait the second response if not yet here */
> +		if (s2->status == REQ_WAITING)
> +			return 0;
> +
> +		bret->status = blkif_get_final_status(s->status,
> +						      s2->status);
> +
> +		/*
> +		 * All the grants is stored in the first shadow in order
> +		 * to make the completion code simpler.
> +		 */
> +		num_grant += s2->req.u.rw.nr_segments;
> +
> +		/*
> +		 * The two responses may not come in order. Only the
> +		 * first request will store the scatter-gather list
> +		 */
> +		if (s2->num_sg != 0) {
> +			/* Update "id" with the ID of the first response */
> +			*id = s->associated_id;
> +			s = s2;
> +		}
> +
> +		/*
> +		 * We don't need anymore the second request, so recycling
> +		 * it now.
> +		 */
> +		if (add_id_to_freelist(info, s->associated_id))
> +			WARN(1, "%s: can't recycle the second part (id = %ld) of the request\n",
> +			     info->gd->disk_name, s->associated_id);
> +	}
> +
> +	data.s = s;
>  	num_sg = s->num_sg;
>  
>  	if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
> @@ -1299,6 +1477,8 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>  			}
>  		}
>  	}
> +
> +	return 1;
>  }
>  
>  static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> @@ -1340,8 +1520,14 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  		}
>  		req  = info->shadow[id].request;
>  
> -		if (bret->operation != BLKIF_OP_DISCARD)
> -			blkif_completion(&info->shadow[id], info, bret);
> +		if (bret->operation != BLKIF_OP_DISCARD) {
> +			/*
> +			 * We may need to wait for an extra response if the
> +			 * I/O request is split in 2
> +			 */
> +			if (!blkif_completion(&id, info, bret))
> +				continue;
> +		}
>  
>  		if (add_id_to_freelist(info, id)) {
>  			WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
> @@ -1864,8 +2050,18 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
>  	unsigned int psegs, grants;
>  	int err, i;
>  
> -	if (info->max_indirect_segments == 0)
> -		grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +	if (info->max_indirect_segments == 0) {
> +		if (!HAS_EXTRA_REQ)
> +			grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +		else {
> +			/*
> +			 * When an extra req is required, the maximum
> +			 * grants supported is related to the size of the
> +			 * Linux block segment
> +			 */
> +			grants = GRANTS_PER_PSEG;
> +		}
> +	}
>  	else
>  		grants = info->max_indirect_segments;
>  	psegs = grants / GRANTS_PER_PSEG;
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
  2015-11-18 18:57 [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity Julien Grall
                   ` (7 preceding siblings ...)
  2015-11-30 18:45 ` Julien Grall
@ 2015-12-01 15:37 ` Konrad Rzeszutek Wilk
  2015-12-01 17:55   ` [Xen-devel] " Julien Grall
  2015-12-01 17:55   ` Julien Grall
  8 siblings, 2 replies; 26+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-12-01 15:37 UTC (permalink / raw)
  To: Julien Grall
  Cc: linux-kernel, David Vrabel, xen-devel, Boris Ostrovsky,
	Roger Pau Monné

On Wed, Nov 18, 2015 at 06:57:23PM +0000, Julien Grall wrote:
> Hi all,
> 
> This is a follow-up on the previous discussion [1] related to guest using 64KB
> page granularity which doesn't boot when the backend isn't using indirect
> descriptor.
> 
> This has been successfully tested on ARM64 with both 64KB and 4KB page
> granularity guests and QEMU as the backend. Indeed QEMU doesn't support
> indirect descriptor.
> 
> This series is based on xentip/for-linus-4.4 which include the support for
> 64KB Linux guest.

In the meantime the multi-queue patches have been put in the queue

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git #devel/for-jens-4.5

I will try rebasing the patches on top of that.
> 
> For all the changes see in each patch.
> 
> Sincerely yours,
> 
> [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg01659.html
> 
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: Bob Liu <bob.liu@oracle.com>
> 
> Julien Grall (2):
>   block/xen-blkfront: Introduce blkif_ring_get_request
>   block/xen-blkfront: Handle non-indirect grant with 64KB pages
> 
>  drivers/block/xen-blkfront.c | 258 ++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 231 insertions(+), 27 deletions(-)
> 
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
  2015-12-01 15:37 ` Konrad Rzeszutek Wilk
@ 2015-12-01 17:55   ` Julien Grall
  2015-12-01 18:52     ` Konrad Rzeszutek Wilk
  2015-12-01 17:55   ` Julien Grall
  1 sibling, 1 reply; 26+ messages in thread
From: Julien Grall @ 2015-12-01 17:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, David Vrabel, xen-devel, Boris Ostrovsky,
	Roger Pau Monné

Hi Konrad,

On 01/12/15 15:37, Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 18, 2015 at 06:57:23PM +0000, Julien Grall wrote:
>> Hi all,
>>
>> This is a follow-up on the previous discussion [1] related to guest using 64KB
>> page granularity which doesn't boot when the backend isn't using indirect
>> descriptor.
>>
>> This has been successfully tested on ARM64 with both 64KB and 4KB page
>> granularity guests and QEMU as the backend. Indeed QEMU doesn't support
>> indirect descriptor.
>>
>> This series is based on xentip/for-linus-4.4 which include the support for
>> 64KB Linux guest.
> 
> In the meantime the multi-queue patches have been put in the queue
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git #devel/for-jens-4.5
> 
> I will try rebasing the patches on top of that.

It will likely clash with the multiqueue changes. I will rebase this
patch series and resend it.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
  2015-12-01 15:37 ` Konrad Rzeszutek Wilk
  2015-12-01 17:55   ` [Xen-devel] " Julien Grall
@ 2015-12-01 17:55   ` Julien Grall
  1 sibling, 0 replies; 26+ messages in thread
From: Julien Grall @ 2015-12-01 17:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Boris Ostrovsky, Roger Pau Monné,
	linux-kernel, David Vrabel

Hi Konrad,

On 01/12/15 15:37, Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 18, 2015 at 06:57:23PM +0000, Julien Grall wrote:
>> Hi all,
>>
>> This is a follow-up on the previous discussion [1] related to guest using 64KB
>> page granularity which doesn't boot when the backend isn't using indirect
>> descriptor.
>>
>> This has been successfully tested on ARM64 with both 64KB and 4KB page
>> granularity guests and QEMU as the backend. Indeed QEMU doesn't support
>> indirect descriptor.
>>
>> This series is based on xentip/for-linus-4.4 which include the support for
>> 64KB Linux guest.
> 
> In the meantime the multi-queue patches have been put in the queue
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git #devel/for-jens-4.5
> 
> I will try rebasing the patches on top of that.

It will likely clash with the multiqueue changes. I will rebase this
patch series and resend it.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
  2015-12-01 17:55   ` [Xen-devel] " Julien Grall
@ 2015-12-01 18:52     ` Konrad Rzeszutek Wilk
  2015-12-07 14:21       ` Julien Grall
                         ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-12-01 18:52 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Boris Ostrovsky, Roger Pau Monné,
	linux-kernel, David Vrabel

[-- Attachment #1: Type: text/plain, Size: 1109 bytes --]

On Tue, Dec 01, 2015 at 05:55:48PM +0000, Julien Grall wrote:
> Hi Konrad,
> 
> On 01/12/15 15:37, Konrad Rzeszutek Wilk wrote:
> > On Wed, Nov 18, 2015 at 06:57:23PM +0000, Julien Grall wrote:
> >> Hi all,
> >>
> >> This is a follow-up on the previous discussion [1] related to guest using 64KB
> >> page granularity which doesn't boot when the backend isn't using indirect
> >> descriptor.
> >>
> >> This has been successfully tested on ARM64 with both 64KB and 4KB page
> >> granularity guests and QEMU as the backend. Indeed QEMU doesn't support
> >> indirect descriptor.
> >>
> >> This series is based on xentip/for-linus-4.4 which include the support for
> >> 64KB Linux guest.
> > 
> > In the meantime the multi-queue patches have been put in the queue
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git #devel/for-jens-4.5
> > 
> > I will try rebasing the patches on top of that.
> 
> It will likely clash with the multiqueue changes. I will rebase this
> patch series and resend it.

I got patch #1 ported over (see attached). Testing it now.
> 
> Regards,
> 
> -- 
> Julien Grall

[-- Attachment #2: 0001-block-xen-blkfront-Introduce-blkif_ring_get_request.patch --]
[-- Type: text/plain, Size: 3203 bytes --]

From ebbda22e54e6557188298a3e1d6c0dcf4b04da26 Mon Sep 17 00:00:00 2001
From: Julien Grall <julien.grall@citrix.com>
Date: Wed, 18 Nov 2015 18:57:24 +0000
Subject: [PATCH] block/xen-blkfront: Introduce blkif_ring_get_request
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The code to get a request is always the same. Therefore we can factorize
it in a single function.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/block/xen-blkfront.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 4f77d36..38af260 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -481,6 +481,24 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
 	return 0;
 }
 
+static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
+					    struct request *req,
+					    struct blkif_request **ring_req)
+{
+	unsigned long id;
+	struct blkfront_info *info = rinfo->dev_info;
+
+	*ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
+	rinfo->ring.req_prod_pvt++;
+
+	id = get_id_from_freelist(rinfo);
+	rinfo->shadow[id].request = req;
+
+	(*ring_req)->u.rw.id = id;
+
+	return id;
+}
+
 static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_info *rinfo)
 {
 	struct blkfront_info *info = rinfo->dev_info;
@@ -488,9 +506,7 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
 	unsigned long id;
 
 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
-	id = get_id_from_freelist(rinfo);
-	rinfo->shadow[id].request = req;
+	id = blkif_ring_get_request(rinfo, req, &ring_req);
 
 	ring_req->operation = BLKIF_OP_DISCARD;
 	ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
@@ -501,8 +517,6 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
 	else
 		ring_req->u.discard.flag = 0;
 
-	rinfo->ring.req_prod_pvt++;
-
 	/* Keep a private copy so we can reissue requests when recovering. */
 	rinfo->shadow[id].req = *ring_req;
 
@@ -635,9 +649,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 		}
 
 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
-	id = get_id_from_freelist(rinfo);
-	rinfo->shadow[id].request = req;
+	id = blkif_ring_get_request(rinfo, req, &ring_req);
 
 	BUG_ON(info->max_indirect_segments == 0 &&
 	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
@@ -716,8 +728,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 	if (setup.segments)
 		kunmap_atomic(setup.segments);
 
-	rinfo->ring.req_prod_pvt++;
-
 	/* Keep a private copy so we can reissue requests when recovering. */
 	rinfo->shadow[id].req = *ring_req;
 
-- 
2.1.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
  2015-12-01 18:52     ` Konrad Rzeszutek Wilk
  2015-12-07 14:21       ` Julien Grall
@ 2015-12-07 14:21       ` Julien Grall
  2015-12-07 15:01         ` Konrad Rzeszutek Wilk
  2015-12-08 12:25       ` Julien Grall
  2015-12-08 12:25       ` [Xen-devel] " Julien Grall
  3 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2015-12-07 14:21 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Boris Ostrovsky, Roger Pau Monné,
	linux-kernel, David Vrabel

Hi Konrad,

On 01/12/15 18:52, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 01, 2015 at 05:55:48PM +0000, Julien Grall wrote:
>> Hi Konrad,
>>
>> On 01/12/15 15:37, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Nov 18, 2015 at 06:57:23PM +0000, Julien Grall wrote:
>>>> Hi all,
>>>>
>>>> This is a follow-up on the previous discussion [1] related to guest using 64KB
>>>> page granularity which doesn't boot when the backend isn't using indirect
>>>> descriptor.
>>>>
>>>> This has been successfully tested on ARM64 with both 64KB and 4KB page
>>>> granularity guests and QEMU as the backend. Indeed QEMU doesn't support
>>>> indirect descriptor.
>>>>
>>>> This series is based on xentip/for-linus-4.4 which include the support for
>>>> 64KB Linux guest.
>>>
>>> In the meantime the multi-queue patches have been put in the queue
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git #devel/for-jens-4.5
>>>
>>> I will try rebasing the patches on top of that.
>>
>> It will likely clash with the multiqueue changes. I will rebase this
>> patch series and resend it.
> 
> I got patch #1 ported over (see attached). Testing it now.

Thank you, the changes look good to me. What about patch #2?

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
  2015-12-01 18:52     ` Konrad Rzeszutek Wilk
@ 2015-12-07 14:21       ` Julien Grall
  2015-12-07 14:21       ` [Xen-devel] " Julien Grall
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2015-12-07 14:21 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, linux-kernel,
	Roger Pau Monné

Hi Konrad,

On 01/12/15 18:52, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 01, 2015 at 05:55:48PM +0000, Julien Grall wrote:
>> Hi Konrad,
>>
>> On 01/12/15 15:37, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Nov 18, 2015 at 06:57:23PM +0000, Julien Grall wrote:
>>>> Hi all,
>>>>
>>>> This is a follow-up on the previous discussion [1] related to guest using 64KB
>>>> page granularity which doesn't boot when the backend isn't using indirect
>>>> descriptor.
>>>>
>>>> This has been successfully tested on ARM64 with both 64KB and 4KB page
>>>> granularity guests and QEMU as the backend. Indeed QEMU doesn't support
>>>> indirect descriptor.
>>>>
>>>> This series is based on xentip/for-linus-4.4 which include the support for
>>>> 64KB Linux guest.
>>>
>>> In the meantime the multi-queue patches have been put in the queue
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git #devel/for-jens-4.5
>>>
>>> I will try rebasing the patches on top of that.
>>
>> It will likely clash with the multiqueue changes. I will rebase this
>> patch series and resend it.
> 
> I got patch #1 ported over (see attached). Testing it now.

Thank you, the changes look good to me. What about patch #2?

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
  2015-12-07 14:21       ` [Xen-devel] " Julien Grall
@ 2015-12-07 15:01         ` Konrad Rzeszutek Wilk
  2015-12-08 12:51           ` Julien Grall
  2015-12-08 12:51           ` [Xen-devel] " Julien Grall
  0 siblings, 2 replies; 26+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-12-07 15:01 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, linux-kernel,
	Roger Pau Monné

On Mon, Dec 07, 2015 at 02:21:46PM +0000, Julien Grall wrote:
> Hi Konrad,
> 
> On 01/12/15 18:52, Konrad Rzeszutek Wilk wrote:
> > On Tue, Dec 01, 2015 at 05:55:48PM +0000, Julien Grall wrote:
> >> Hi Konrad,
> >>
> >> On 01/12/15 15:37, Konrad Rzeszutek Wilk wrote:
> >>> On Wed, Nov 18, 2015 at 06:57:23PM +0000, Julien Grall wrote:
> >>>> Hi all,
> >>>>
> >>>> This is a follow-up on the previous discussion [1] related to guest using 64KB
> >>>> page granularity which doesn't boot when the backend isn't using indirect
> >>>> descriptor.
> >>>>
> >>>> This has been successfully tested on ARM64 with both 64KB and 4KB page
> >>>> granularity guests and QEMU as the backend. Indeed QEMU doesn't support
> >>>> indirect descriptor.
> >>>>
> >>>> This series is based on xentip/for-linus-4.4 which include the support for
> >>>> 64KB Linux guest.
> >>>
> >>> In the meantime the multi-queue patches have been put in the queue
> >>>
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git #devel/for-jens-4.5
> >>>
> >>> I will try rebasing the patches on top of that.
> >>
> >> It will likely clash with the multiqueue changes. I will rebase this
> >> patch series and resend it.
> > 
> > I got patch #1 ported over (see attached). Testing it now.
> 
> Thank you, the changes look good to me. What about patch #2?

Oh, I thought you said you would rebase it - so I had been waiting
for that.

Should have communicated that much better.

If it wouldn't be too much trouble - could you rebase #2 please?



> 
> Regards,
> 
> -- 
> Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
  2015-12-01 18:52     ` Konrad Rzeszutek Wilk
                         ` (2 preceding siblings ...)
  2015-12-08 12:25       ` Julien Grall
@ 2015-12-08 12:25       ` Julien Grall
  3 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2015-12-08 12:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Boris Ostrovsky, Roger Pau Monné,
	linux-kernel, David Vrabel

Hi Konrad,

The rebase of my patch is not correct. It now contains an unused variable and
is missing one change.

I will post the rebase of the two patches.

On 01/12/15 18:52, Konrad Rzeszutek Wilk wrote:
> +static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
> +					    struct request *req,
> +					    struct blkif_request **ring_req)
> +{
> +	unsigned long id;
> +	struct blkfront_info *info = rinfo->dev_info;

This variable is unused within the function.

> +
> +	*ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
> +	rinfo->ring.req_prod_pvt++;
> +
> +	id = get_id_from_freelist(rinfo);
> +	rinfo->shadow[id].request = req;
> +
> +	(*ring_req)->u.rw.id = id;
> +
> +	return id;
> +}
> +
>  static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_info *rinfo)
>  {
>  	struct blkfront_info *info = rinfo->dev_info;
> @@ -488,9 +506,7 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
>  	unsigned long id;
>  
>  	/* Fill out a communications ring structure. */
> -	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
> -	id = get_id_from_freelist(rinfo);
> -	rinfo->shadow[id].request = req;
> +	id = blkif_ring_get_request(rinfo, req, &ring_req);
>  
>  	ring_req->operation = BLKIF_OP_DISCARD;
>  	ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
> @@ -501,8 +517,6 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
>  	else
>  		ring_req->u.discard.flag = 0;
>  
> -	rinfo->ring.req_prod_pvt++;
> -
>  	/* Keep a private copy so we can reissue requests when recovering. */
>  	rinfo->shadow[id].req = *ring_req;
>  
> @@ -635,9 +649,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>  		}
>  
>  	/* Fill out a communications ring structure. */
> -	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
> -	id = get_id_from_freelist(rinfo);
> -	rinfo->shadow[id].request = req;
> +	id = blkif_ring_get_request(rinfo, req, &ring_req);
>  
>  	BUG_ON(info->max_indirect_segments == 0 &&
>  	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);

@@ -650,7 +661,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
        for_each_sg(rinfo->shadow[id].sg, sg, num_sg, i)
               num_grant += gnttab_count_grant(sg->offset, sg->length);
 
-       ring_req->u.rw.id = id;
        rinfo->shadow[id].num_sg = num_sg;
        if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
                /*

> @@ -716,8 +728,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>  	if (setup.segments)
>  		kunmap_atomic(setup.segments);
>  
> -	rinfo->ring.req_prod_pvt++;
> -
>  	/* Keep a private copy so we can reissue requests when recovering. */
>  	rinfo->shadow[id].req = *ring_req;

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
  2015-12-01 18:52     ` Konrad Rzeszutek Wilk
  2015-12-07 14:21       ` Julien Grall
  2015-12-07 14:21       ` [Xen-devel] " Julien Grall
@ 2015-12-08 12:25       ` Julien Grall
  2015-12-08 12:25       ` [Xen-devel] " Julien Grall
  3 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2015-12-08 12:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, linux-kernel,
	Roger Pau Monné

Hi Konrad,

The rebase of my patch is not correct. It now contains an unused variable and
is missing one change.

I will post the rebase of the two patches.

On 01/12/15 18:52, Konrad Rzeszutek Wilk wrote:
> +static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
> +					    struct request *req,
> +					    struct blkif_request **ring_req)
> +{
> +	unsigned long id;
> +	struct blkfront_info *info = rinfo->dev_info;

This variable is unused within the function.

> +
> +	*ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
> +	rinfo->ring.req_prod_pvt++;
> +
> +	id = get_id_from_freelist(rinfo);
> +	rinfo->shadow[id].request = req;
> +
> +	(*ring_req)->u.rw.id = id;
> +
> +	return id;
> +}
> +
>  static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_info *rinfo)
>  {
>  	struct blkfront_info *info = rinfo->dev_info;
> @@ -488,9 +506,7 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
>  	unsigned long id;
>  
>  	/* Fill out a communications ring structure. */
> -	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
> -	id = get_id_from_freelist(rinfo);
> -	rinfo->shadow[id].request = req;
> +	id = blkif_ring_get_request(rinfo, req, &ring_req);
>  
>  	ring_req->operation = BLKIF_OP_DISCARD;
>  	ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
> @@ -501,8 +517,6 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
>  	else
>  		ring_req->u.discard.flag = 0;
>  
> -	rinfo->ring.req_prod_pvt++;
> -
>  	/* Keep a private copy so we can reissue requests when recovering. */
>  	rinfo->shadow[id].req = *ring_req;
>  
> @@ -635,9 +649,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>  		}
>  
>  	/* Fill out a communications ring structure. */
> -	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
> -	id = get_id_from_freelist(rinfo);
> -	rinfo->shadow[id].request = req;
> +	id = blkif_ring_get_request(rinfo, req, &ring_req);
>  
>  	BUG_ON(info->max_indirect_segments == 0 &&
>  	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);

@@ -650,7 +661,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
        for_each_sg(rinfo->shadow[id].sg, sg, num_sg, i)
               num_grant += gnttab_count_grant(sg->offset, sg->length);
 
-       ring_req->u.rw.id = id;
        rinfo->shadow[id].num_sg = num_sg;
        if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
                /*

> @@ -716,8 +728,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>  	if (setup.segments)
>  		kunmap_atomic(setup.segments);
>  
> -	rinfo->ring.req_prod_pvt++;
> -
>  	/* Keep a private copy so we can reissue requests when recovering. */
>  	rinfo->shadow[id].req = *ring_req;

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] [PATCH v3 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages
  2015-12-01 15:30   ` Roger Pau Monné
  2015-12-08 12:45     ` Julien Grall
@ 2015-12-08 12:45     ` Julien Grall
  1 sibling, 0 replies; 26+ messages in thread
From: Julien Grall @ 2015-12-08 12:45 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel, linux-kernel
  Cc: Boris Ostrovsky, David Vrabel

Hi Royger,

On 01/12/15 15:30, Roger Pau Monné wrote:
> Just a couple of cosmetic comments below. Unless anyone else has real
> comments I don't think you need to resend.

Well, I consider those kinds of cosmetic comments minor. They are
not necessary and don't improve the readability of the code. In these
two cases they even make it worse.

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages
  2015-12-01 15:30   ` Roger Pau Monné
@ 2015-12-08 12:45     ` Julien Grall
  2015-12-08 12:45     ` [Xen-devel] " Julien Grall
  1 sibling, 0 replies; 26+ messages in thread
From: Julien Grall @ 2015-12-08 12:45 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel, linux-kernel
  Cc: Boris Ostrovsky, David Vrabel

Hi Royger,

On 01/12/15 15:30, Roger Pau Monné wrote:
> Just a couple of cosmetic comments below. Unless anyone else has real
> comments I don't think you need to resend.

Well, I consider those kinds of cosmetic comments minor. They are
not necessary and don't improve the readability of the code. In these
two cases they even make it worse.

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
  2015-12-07 15:01         ` Konrad Rzeszutek Wilk
  2015-12-08 12:51           ` Julien Grall
@ 2015-12-08 12:51           ` Julien Grall
  2015-12-08 19:28             ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 26+ messages in thread
From: Julien Grall @ 2015-12-08 12:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, linux-kernel,
	Roger Pau Monné

[-- Attachment #1: Type: text/plain, Size: 582 bytes --]

Hi Konrad,

On 07/12/15 15:01, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 07, 2015 at 02:21:46PM +0000, Julien Grall wrote:
>> Thank you, the changes look good to me. What about patch #2?
> 
> Oh, I thought you said you would rebase it - so I had been waiting
> for that.
> 
> Should have communicated that much better.
> 
> If it wouldn't be too much trouble - could you rebase #2 please?

I'm also providing a rebase of #1 because yours wasn't correct. See
attachments.

git://xenbits.xen.org/people/julieng/linux-arm.git
branch blkfront-64-indirect-v4

Regards,

-- 
Julien Grall

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-block-xen-blkfront-Introduce-blkif_ring_get_request.patch --]
[-- Type: text/x-patch; name="0001-block-xen-blkfront-Introduce-blkif_ring_get_request.patch", Size: 3655 bytes --]

From 2e3c97be628f010891c5e1ce807bd649ede32b93 Mon Sep 17 00:00:00 2001
From: Julien Grall <julien.grall@citrix.com>
Date: Thu, 13 Aug 2015 19:23:10 +0100
Subject: [PATCH 1/2] block/xen-blkfront: Introduce blkif_ring_get_request
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The code to get a request is always the same. Therefore we can factorize
it in a single function.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Bob Liu <bob.liu@oracle.com>

    Changes in v4:
        - Rebase on top of konrad/devel/for-jens-4.5

    Changes in v2:
        - Add Royger's acked-by
---
 drivers/block/xen-blkfront.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 4f77d36..33a80d5 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -481,6 +481,23 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
 	return 0;
 }
 
+static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
+					    struct request *req,
+					    struct blkif_request **ring_req)
+{
+	unsigned long id;
+
+	*ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
+	rinfo->ring.req_prod_pvt++;
+
+	id = get_id_from_freelist(rinfo);
+	rinfo->shadow[id].request = req;
+
+	(*ring_req)->u.rw.id = id;
+
+	return id;
+}
+
 static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_info *rinfo)
 {
 	struct blkfront_info *info = rinfo->dev_info;
@@ -488,9 +505,7 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
 	unsigned long id;
 
 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
-	id = get_id_from_freelist(rinfo);
-	rinfo->shadow[id].request = req;
+	id = blkif_ring_get_request(rinfo, req, &ring_req);
 
 	ring_req->operation = BLKIF_OP_DISCARD;
 	ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
@@ -501,8 +516,6 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
 	else
 		ring_req->u.discard.flag = 0;
 
-	rinfo->ring.req_prod_pvt++;
-
 	/* Keep a private copy so we can reissue requests when recovering. */
 	rinfo->shadow[id].req = *ring_req;
 
@@ -635,9 +648,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 		}
 
 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
-	id = get_id_from_freelist(rinfo);
-	rinfo->shadow[id].request = req;
+	id = blkif_ring_get_request(rinfo, req, &ring_req);
 
 	BUG_ON(info->max_indirect_segments == 0 &&
 	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
@@ -650,7 +661,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 	for_each_sg(rinfo->shadow[id].sg, sg, num_sg, i)
 	       num_grant += gnttab_count_grant(sg->offset, sg->length);
 
-	ring_req->u.rw.id = id;
 	rinfo->shadow[id].num_sg = num_sg;
 	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
 		/*
@@ -716,8 +726,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 	if (setup.segments)
 		kunmap_atomic(setup.segments);
 
-	rinfo->ring.req_prod_pvt++;
-
 	/* Keep a private copy so we can reissue requests when recovering. */
 	rinfo->shadow[id].req = *ring_req;
 
-- 
2.1.4


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0002-block-xen-blkfront-Handle-non-indirect-grant-with-64.patch --]
[-- Type: text/x-patch; name="0002-block-xen-blkfront-Handle-non-indirect-grant-with-64.patch", Size: 15497 bytes --]

From c8623233b85a34cfc603da0cba3111f4365e275f Mon Sep 17 00:00:00 2001
From: Julien Grall <julien.grall@citrix.com>
Date: Thu, 13 Aug 2015 13:13:35 +0100
Subject: [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB
 pages
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The minimal size of a request in the block framework is always PAGE_SIZE.
It means that when a 64KB guest is supported, the request will be at least
64KB.

However, if the backend doesn't support indirect descriptors (such as QDISK
in QEMU), a ring request is only able to accommodate 11 segments of 4KB
(i.e. 44KB).

The current frontend is assuming that an I/O request will always fit in
a ring request. This is not true any more when using 64KB page
granularity and will therefore crash during boot.

On ARM64, the ABI is completely neutral to the page granularity used by
the domU. The guest has the choice between different page granularity
supported by the processors (for instance on ARM64: 4KB, 16KB, 64KB).
This can't be enforced by the hypervisor and therefore it's possible to
run guests using different page granularity.

So we can't mandate the block backend to support indirect descriptor
when the frontend is using 64KB page granularity and have to fix it
properly in the frontend.

The solution exposed below is based on modifying the frontend guest
directly rather than asking the block framework to support a smaller size
(i.e. < PAGE_SIZE). This is because the changes in the block framework are
not trivial, as everything seems to rely on a struct *page (see [1]).
It may still be possible that someone succeeds in doing it in the future,
and we would then be able to use it.

Given that a block request may not fit in a single ring request, a
second request is introduced for the data that cannot fit in the first
one. This means that the second ring request should never be used on
Linux if the page size is smaller than 44KB.

To achieve the support of the extra ring request, the block queue size
is divided by two. Therefore, the ring will always contain enough space
to accommodate 2 ring requests. While this will reduce the overall
performance, it will make the implementation more contained. The way
forward to get better performance is to implement in the backend either
indirect descriptor or multiple grants ring.
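
A minimal sketch of the resulting queue depth (illustration only; the
32-entry ring below assumes the usual single-page blkif ring and is not
taken from this patch):

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	unsigned int ring_size = 32;            /* assumed BLK_RING_SIZE() */
	bool has_extra_req = true;              /* 64KB guest, 4KB Xen pages */
	unsigned int max_indirect_segments = 0; /* backend without indirect */
	unsigned int depth;

	if (has_extra_req && max_indirect_segments == 0)
		depth = ring_size / 2;          /* room for 2 ring requests */
	else
		depth = ring_size;

	printf("queue depth: %u\n", depth);     /* 16 instead of 32 */
	return 0;
}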

Note that the parameters of the blk_queue_max_* helpers haven't been
updated. The block code will set the minimum size supported, and we may
be able to directly support any change in the block framework that
lowers the minimal size of a request.

[1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: "Roger Pau Monné" <roger.pau@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Bob Liu <bob.liu@oracle.com>

    Roger, I haven't kept your acked-by because of the update of the
    final status.

    Changes in v4:
        - Rebase on top of konrad/devel/for-jens-4.5

    Changes in v3:
        - Typos
        - Change the way the final status is computed to handle the
        possibility of the request being handled by a different backend
        (could happen during suspend/resume).

    Changes in v2:
        - Update the commit message
        - Typos
        - Rename ring_req2/id2 to extra_ring_req/extra_id
---
 drivers/block/xen-blkfront.c | 228 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 212 insertions(+), 16 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 33a80d5..2d6fdb4 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -60,6 +60,20 @@
 
 #include <asm/xen/hypervisor.h>
 
+/*
+ * The minimal size of a segment supported by the block framework is PAGE_SIZE.
+ * When Linux is using a different page size than Xen, it may not be possible
+ * to put all the data in a single segment.
+ * This can happen when the backend doesn't support indirect descriptors and
+ * therefore the maximum amount of data that a request can carry is
+ * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE = 44KB
+ *
+ * Note that we only support one extra request. So the Linux page size
+ * should be <= ( 2 * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) =
+ * 88KB.
+ */
+#define HAS_EXTRA_REQ (BLKIF_MAX_SEGMENTS_PER_REQUEST < XEN_PFN_PER_PAGE)
+
 enum blkif_state {
 	BLKIF_STATE_DISCONNECTED,
 	BLKIF_STATE_CONNECTED,
@@ -72,6 +86,13 @@ struct grant {
 	struct list_head node;
 };
 
+enum blk_req_status {
+	REQ_WAITING,
+	REQ_DONE,
+	REQ_ERROR,
+	REQ_EOPNOTSUPP,
+};
+
 struct blk_shadow {
 	struct blkif_request req;
 	struct request *request;
@@ -79,6 +100,14 @@ struct blk_shadow {
 	struct grant **indirect_grants;
 	struct scatterlist *sg;
 	unsigned int num_sg;
+	enum blk_req_status status;
+
+	#define NO_ASSOCIATED_ID ~0UL
+	/*
+	 * Id of the sibling if we ever need 2 requests when handling a
+	 * block I/O request
+	 */
+	unsigned long associated_id;
 };
 
 struct split_bio {
@@ -492,6 +521,8 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
 
 	id = get_id_from_freelist(rinfo);
 	rinfo->shadow[id].request = req;
+	rinfo->shadow[id].status = REQ_WAITING;
+	rinfo->shadow[id].associated_id = NO_ASSOCIATED_ID;
 
 	(*ring_req)->u.rw.id = id;
 
@@ -533,6 +564,9 @@ struct setup_rw_req {
 	bool need_copy;
 	unsigned int bvec_off;
 	char *bvec_data;
+
+	bool require_extra_req;
+	struct blkif_request *extra_ring_req;
 };
 
 static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
@@ -546,8 +580,24 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 	unsigned int grant_idx = setup->grant_idx;
 	struct blkif_request *ring_req = setup->ring_req;
 	struct blkfront_ring_info *rinfo = setup->rinfo;
+	/*
+	 * We always use the shadow of the first request to store the list
+	 * of grants associated with the block I/O request. This makes the
+	 * completion easier to handle even if the block I/O request is
+	 * split.
+	 */
 	struct blk_shadow *shadow = &rinfo->shadow[setup->id];
 
+	if (unlikely(setup->require_extra_req &&
+		     grant_idx >= BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
+		/*
+		 * We are using the second request, setup grant_idx
+		 * to be the index of the segment array
+		 */
+		grant_idx -= BLKIF_MAX_SEGMENTS_PER_REQUEST;
+		ring_req = setup->extra_ring_req;
+	}
+
 	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
 	    (grant_idx % GRANTS_PER_INDIRECT_FRAME == 0)) {
 		if (setup->segments)
@@ -562,7 +612,11 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 
 	gnt_list_entry = get_grant(&setup->gref_head, gfn, rinfo);
 	ref = gnt_list_entry->gref;
-	shadow->grants_used[grant_idx] = gnt_list_entry;
+	/*
+	 * All the grants are stored in the shadow of the first
+	 * request. Therefore we have to use the global index
+	 */
+	shadow->grants_used[setup->grant_idx] = gnt_list_entry;
 
 	if (setup->need_copy) {
 		void *shared_data;
@@ -604,11 +658,31 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 	(setup->grant_idx)++;
 }
 
+static void blkif_setup_extra_req(struct blkif_request *first,
+				  struct blkif_request *second)
+{
+	uint16_t nr_segments = first->u.rw.nr_segments;
+
+	/*
+	 * The second request is only present when the first request uses
+	 * all its segments. It's always the continuation of the first one.
+	 */
+	first->u.rw.nr_segments = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+
+	second->u.rw.nr_segments = nr_segments - BLKIF_MAX_SEGMENTS_PER_REQUEST;
+	second->u.rw.sector_number = first->u.rw.sector_number +
+		(BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) / 512;
+
+	second->u.rw.handle = first->u.rw.handle;
+	second->operation = first->operation;
+}
+
 static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *rinfo)
 {
 	struct blkfront_info *info = rinfo->dev_info;
-	struct blkif_request *ring_req;
-	unsigned long id;
+	struct blkif_request *ring_req, *extra_ring_req = NULL;
+	unsigned long id, extra_id = NO_ASSOCIATED_ID;
+	bool require_extra_req = false;
 	int i;
 	struct setup_rw_req setup = {
 		.grant_idx = 0,
@@ -650,19 +724,19 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 	/* Fill out a communications ring structure. */
 	id = blkif_ring_get_request(rinfo, req, &ring_req);
 
-	BUG_ON(info->max_indirect_segments == 0 &&
-	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
-	BUG_ON(info->max_indirect_segments &&
-	       GREFS(req->nr_phys_segments) > info->max_indirect_segments);
-
 	num_sg = blk_rq_map_sg(req->q, req, rinfo->shadow[id].sg);
 	num_grant = 0;
 	/* Calculate the number of grant used */
 	for_each_sg(rinfo->shadow[id].sg, sg, num_sg, i)
 	       num_grant += gnttab_count_grant(sg->offset, sg->length);
 
+	require_extra_req = info->max_indirect_segments == 0 &&
+		num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST;
+	BUG_ON(!HAS_EXTRA_REQ && require_extra_req);
+
 	rinfo->shadow[id].num_sg = num_sg;
-	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST &&
+	    likely(!require_extra_req)) {
 		/*
 		 * The indirect operation can only be a BLKIF_OP_READ or
 		 * BLKIF_OP_WRITE
@@ -702,10 +776,30 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 			}
 		}
 		ring_req->u.rw.nr_segments = num_grant;
+		if (unlikely(require_extra_req)) {
+			extra_id = blkif_ring_get_request(rinfo, req,
+							  &extra_ring_req);
+			/*
+			 * Only the first request contains the scatter-gather
+			 * list.
+			 */
+			rinfo->shadow[extra_id].num_sg = 0;
+
+			blkif_setup_extra_req(ring_req, extra_ring_req);
+
+			/* Link the 2 requests together */
+			rinfo->shadow[extra_id].associated_id = id;
+			rinfo->shadow[id].associated_id = extra_id;
+		}
 	}
 
 	setup.ring_req = ring_req;
 	setup.id = id;
+
+	setup.require_extra_req = require_extra_req;
+	if (unlikely(require_extra_req))
+		setup.extra_ring_req = extra_ring_req;
+
 	for_each_sg(rinfo->shadow[id].sg, sg, num_sg, i) {
 		BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 
@@ -728,6 +822,8 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 
 	/* Keep a private copy so we can reissue requests when recovering. */
 	rinfo->shadow[id].req = *ring_req;
+	if (unlikely(require_extra_req))
+		rinfo->shadow[extra_id].req = *extra_ring_req;
 
 	if (max_grefs > 0)
 		gnttab_free_grant_references(setup.gref_head);
@@ -829,7 +925,16 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 	memset(&info->tag_set, 0, sizeof(info->tag_set));
 	info->tag_set.ops = &blkfront_mq_ops;
 	info->tag_set.nr_hw_queues = info->nr_rings;
-	info->tag_set.queue_depth =  BLK_RING_SIZE(info);
+	if (HAS_EXTRA_REQ && info->max_indirect_segments == 0) {
+		/*
+		 * When indirect descriptors are not supported, the I/O request
+		 * will be split between multiple requests in the ring.
+		 * To avoid problems when sending the requests, divide the
+		 * depth of the queue by 2.
+		 */
+		info->tag_set.queue_depth =  BLK_RING_SIZE(info) / 2;
+	} else
+		info->tag_set.queue_depth = BLK_RING_SIZE(info);
 	info->tag_set.numa_node = NUMA_NO_NODE;
 	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
 	info->tag_set.cmd_size = 0;
@@ -1269,20 +1374,93 @@ static void blkif_copy_from_grant(unsigned long gfn, unsigned int offset,
 	kunmap_atomic(shared_data);
 }
 
-static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *rinfo,
+static enum blk_req_status blkif_rsp_to_req_status(int rsp)
+{
+	switch (rsp)
+	{
+	case BLKIF_RSP_OKAY:
+		return REQ_DONE;
+	case BLKIF_RSP_EOPNOTSUPP:
+		return REQ_EOPNOTSUPP;
+	case BLKIF_RSP_ERROR:
+		/* Fallthrough */
+	default:
+		return REQ_ERROR;
+	}
+}
+
+/*
+ * Get the final status of the block request based on the two ring responses
+ */
+static int blkif_get_final_status(enum blk_req_status s1,
+				  enum blk_req_status s2)
+{
+	BUG_ON(s1 == REQ_WAITING);
+	BUG_ON(s2 == REQ_WAITING);
+
+	if (s1 == REQ_ERROR || s2 == REQ_ERROR)
+		return BLKIF_RSP_ERROR;
+	else if (s1 == REQ_EOPNOTSUPP || s2 == REQ_EOPNOTSUPP)
+		return BLKIF_RSP_EOPNOTSUPP;
+	return BLKIF_RSP_OKAY;
+}
+
+static bool blkif_completion(unsigned long *id,
+			     struct blkfront_ring_info *rinfo,
 			     struct blkif_response *bret)
 {
 	int i = 0;
 	struct scatterlist *sg;
 	int num_sg, num_grant;
 	struct blkfront_info *info = rinfo->dev_info;
+	struct blk_shadow *s = &rinfo->shadow[*id];
 	struct copy_from_grant data = {
-		.s = s,
 		.grant_idx = 0,
 	};
 
 	num_grant = s->req.operation == BLKIF_OP_INDIRECT ?
 		s->req.u.indirect.nr_segments : s->req.u.rw.nr_segments;
+
+	/* The I/O request may be split in two */
+	if (unlikely(s->associated_id != NO_ASSOCIATED_ID)) {
+		struct blk_shadow *s2 = &rinfo->shadow[s->associated_id];
+
+		/* Keep the status of the current response in shadow */
+		s->status = blkif_rsp_to_req_status(bret->status);
+
+		/* Wait for the second response if it's not here yet */
+		if (s2->status == REQ_WAITING)
+			return 0;
+
+		bret->status = blkif_get_final_status(s->status,
+						      s2->status);
+
+		/*
+		 * All the grants are stored in the first shadow in order
+		 * to make the completion code simpler.
+		 */
+		num_grant += s2->req.u.rw.nr_segments;
+
+		/*
+		 * The two responses may not come in order. Only the
+		 * first request will store the scatter-gather list
+		 */
+		if (s2->num_sg != 0) {
+			/* Update "id" with the ID of the first response */
+			*id = s->associated_id;
+			s = s2;
+		}
+
+		/*
+		 * We don't need the second request anymore, so recycle
+		 * it now.
+		 */
+		if (add_id_to_freelist(rinfo, s->associated_id))
+			WARN(1, "%s: can't recycle the second part (id = %ld) of the request\n",
+			     info->gd->disk_name, s->associated_id);
+	}
+
+	data.s = s;
 	num_sg = s->num_sg;
 
 	if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
@@ -1352,6 +1530,8 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 			}
 		}
 	}
+
+	return 1;
 }
 
 static irqreturn_t blkif_interrupt(int irq, void *dev_id)
@@ -1391,8 +1571,14 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		}
 		req  = rinfo->shadow[id].request;
 
-		if (bret->operation != BLKIF_OP_DISCARD)
-			blkif_completion(&rinfo->shadow[id], rinfo, bret);
+		if (bret->operation != BLKIF_OP_DISCARD) {
+			/*
+			 * We may need to wait for an extra response if the
+			 * I/O request is split in 2
+			 */
+			if (!blkif_completion(&id, rinfo, bret))
+				continue;
+		}
 
 		if (add_id_to_freelist(rinfo, id)) {
 			WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
@@ -2033,8 +2219,18 @@ static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
 	int err, i;
 	struct blkfront_info *info = rinfo->dev_info;
 
-	if (info->max_indirect_segments == 0)
-		grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+	if (info->max_indirect_segments == 0) {
+		if (!HAS_EXTRA_REQ)
+			grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+		else {
+			/*
+			 * When an extra req is required, the maximum
+			 * number of grants supported is related to the size
+			 * of the Linux block segment.
+			 */
+			grants = GRANTS_PER_PSEG;
+		}
+	}
 	else
 		grants = info->max_indirect_segments;
 	psegs = grants / GRANTS_PER_PSEG;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
  2015-12-07 15:01         ` Konrad Rzeszutek Wilk
@ 2015-12-08 12:51           ` Julien Grall
  2015-12-08 12:51           ` [Xen-devel] " Julien Grall
  1 sibling, 0 replies; 26+ messages in thread
From: Julien Grall @ 2015-12-08 12:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Boris Ostrovsky, Roger Pau Monné,
	David Vrabel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 582 bytes --]

Hi Konrad,

On 07/12/15 15:01, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 07, 2015 at 02:21:46PM +0000, Julien Grall wrote:
>> Thank you, the changes look good to me. What about patch #2?
> 
> Oh, I thought you said you would rebase it - so I had been waiting
> for that.
> 
> Should have communicated that much better.
> 
> If it wouldn't be too much trouble - could you rebase #2 please?

I'm also providing a rebase of #1 because yours wasn't correct. See
attachments.

git://xenbits.xen.org/people/julieng/linux-arm.git
branch blkfront-64-indirect-v4
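
For instance, one possible way to fetch that branch into an existing
Linux tree (the remote name "julieng" is chosen arbitrarily here):

git remote add julieng git://xenbits.xen.org/people/julieng/linux-arm.git
git fetch julieng blkfront-64-indirect-v4
git checkout -b blkfront-64-indirect-v4 julieng/blkfront-64-indirect-v4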

Regards,

-- 
Julien Grall

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-block-xen-blkfront-Introduce-blkif_ring_get_request.patch --]
[-- Type: text/x-patch; name="0001-block-xen-blkfront-Introduce-blkif_ring_get_request.patch", Size: 3752 bytes --]

>From 2e3c97be628f010891c5e1ce807bd649ede32b93 Mon Sep 17 00:00:00 2001
From: Julien Grall <julien.grall@citrix.com>
Date: Thu, 13 Aug 2015 19:23:10 +0100
Subject: [PATCH 1/2] block/xen-blkfront: Introduce blkif_ring_get_request
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The code to get a request is always the same. Therefore we can factorize
it in a single function.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Bob Liu <bob.liu@oracle.com>

    Changes in v4:
        - Rebase on top of konrad/devel/for-jens-4.5

    Changes in v2:
        - Add Royger's acked-by
---
 drivers/block/xen-blkfront.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 4f77d36..33a80d5 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -481,6 +481,23 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
 	return 0;
 }

+static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
+					    struct request *req,
+					    struct blkif_request **ring_req)
+{
+	unsigned long id;
+
+	*ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
+	rinfo->ring.req_prod_pvt++;
+
+	id = get_id_from_freelist(rinfo);
+	rinfo->shadow[id].request = req;
+
+	(*ring_req)->u.rw.id = id;
+
+	return id;
+}
+
 static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_info *rinfo)
 {
 	struct blkfront_info *info = rinfo->dev_info;
@@ -488,9 +505,7 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
 	unsigned long id;

 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
-	id = get_id_from_freelist(rinfo);
-	rinfo->shadow[id].request = req;
+	id = blkif_ring_get_request(rinfo, req, &ring_req);

 	ring_req->operation = BLKIF_OP_DISCARD;
 	ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
@@ -501,8 +516,6 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
 	else
 		ring_req->u.discard.flag = 0;

-	rinfo->ring.req_prod_pvt++;
-
 	/* Keep a private copy so we can reissue requests when recovering. */
 	rinfo->shadow[id].req = *ring_req;

@@ -635,9 +648,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 		}

 	/* Fill out a communications ring structure. */
-	ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
-	id = get_id_from_freelist(rinfo);
-	rinfo->shadow[id].request = req;
+	id = blkif_ring_get_request(rinfo, req, &ring_req);

 	BUG_ON(info->max_indirect_segments == 0 &&
 	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
@@ -650,7 +661,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 	for_each_sg(rinfo->shadow[id].sg, sg, num_sg, i)
 	       num_grant += gnttab_count_grant(sg->offset, sg->length);

-	ring_req->u.rw.id = id;
 	rinfo->shadow[id].num_sg = num_sg;
 	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
 		/*
@@ -716,8 +726,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 	if (setup.segments)
 		kunmap_atomic(setup.segments);

-	rinfo->ring.req_prod_pvt++;
-
 	/* Keep a private copy so we can reissue requests when recovering. */
 	rinfo->shadow[id].req = *ring_req;

--
2.1.4


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0002-block-xen-blkfront-Handle-non-indirect-grant-with-64.patch --]
[-- Type: text/x-patch; name="0002-block-xen-blkfront-Handle-non-indirect-grant-with-64.patch", Size: 15932 bytes --]

>From c8623233b85a34cfc603da0cba3111f4365e275f Mon Sep 17 00:00:00 2001
From: Julien Grall <julien.grall@citrix.com>
Date: Thu, 13 Aug 2015 13:13:35 +0100
Subject: [PATCH 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB
 pages
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The minimal size of a request in the block framework is always
PAGE_SIZE. It means that when 64KB guests are supported, a request will
be at least 64KB.

However, if the backend doesn't support indirect descriptors (such as
QDISK in QEMU), a ring request can only accommodate 11 segments of 4KB
(i.e. 44KB).

The current frontend assumes that an I/O request will always fit in a
single ring request. This is no longer true when using 64KB page
granularity, and the guest will therefore crash during boot.

On ARM64, the ABI is completely neutral to the page granularity used by
the domU. The guest can choose among the page granularities supported by
the processor (for instance on ARM64: 4KB, 16KB, 64KB). This can't be
enforced by the hypervisor, so it's possible to run guests using
different page granularities.

So we can't require the block backend to support indirect descriptors
when the frontend is using 64KB page granularity, and we have to fix it
properly in the frontend.

The solution below is based on modifying the frontend guest directly
rather than asking the block framework to support smaller sizes
(i.e. < PAGE_SIZE). This is because the changes in the block framework
are not trivial, as everything seems to rely on a struct page (see [1]).
Someone may still succeed in doing it in the future, and we would then
be able to use it.

Given that a block request may not fit in a single ring request, a
second request is introduced for the data that cannot fit in the first
one. This means that the second ring request should never be used on
Linux if the page size is smaller than 44KB.
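
As a rough worked example of that split (illustration only; the values
below assume 11 segments of 4KB per ring request and a 64KB guest page,
as elsewhere in this series):

#include <stdio.h>

#define XEN_PAGE_SIZE                   4096u
#define XEN_SECTOR_SIZE                 512u
#define BLKIF_MAX_SEGMENTS_PER_REQUEST  11u

int main(void)
{
	unsigned int num_grant = 16;    /* one 64KB guest page = 16 x 4KB grants */
	unsigned int first  = BLKIF_MAX_SEGMENTS_PER_REQUEST;  /* 11 segments */
	unsigned int second = num_grant - first;                /* 5 segments  */

	/* The second request starts right after the data of the first one */
	unsigned int sector_offset =
		(BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) / XEN_SECTOR_SIZE;

	printf("segments: %u + %u, second request sector offset: %u\n",
	       first, second, sector_offset);   /* 11 + 5, offset 88 */
	return 0;
}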

To achieve the support of the extra ring request, the block queue size
is divided by two. Therefore, the ring will always contain enough space
to accommodate 2 ring requests. While this will reduce the overall
performance, it will make the implementation more contained. The way
forward to get better performance is to implement in the backend either
indirect descriptor or multiple grants ring.

Note that the parameters of the blk_queue_max_* helpers haven't been
updated. The block code will set the minimum size supported, and we may
be able to directly support any change in the block framework that
lowers the minimal size of a request.

[1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: "Roger Pau Monné" <roger.pau@citrix.com>

---
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Bob Liu <bob.liu@oracle.com>

    Roger, I haven't kept your acked-by because of the update of the
    final status.

    Changes in v4:
        - Rebase on top of konrad/devel/for-jens-4.5

    Changes in v3:
        - Typos
        - Change the way the final status is computed to handle the
        possibility of the request being handled by a different backend
        (could happen during suspend/resume).

    Changes in v2:
        - Update the commit message
        - Typos
        - Rename ring_req2/id2 to extra_ring_req/extra_id
---
 drivers/block/xen-blkfront.c | 228 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 212 insertions(+), 16 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 33a80d5..2d6fdb4 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -60,6 +60,20 @@

 #include <asm/xen/hypervisor.h>

+/*
+ * The minimal size of a segment supported by the block framework is PAGE_SIZE.
+ * When Linux is using a different page size than Xen, it may not be possible
+ * to put all the data in a single segment.
+ * This can happen when the backend doesn't support indirect descriptors and
+ * therefore the maximum amount of data that a request can carry is
+ * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE = 44KB
+ *
+ * Note that we only support one extra request. So the Linux page size
+ * should be <= ( 2 * BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) =
+ * 88KB.
+ */
+#define HAS_EXTRA_REQ (BLKIF_MAX_SEGMENTS_PER_REQUEST < XEN_PFN_PER_PAGE)
+
 enum blkif_state {
 	BLKIF_STATE_DISCONNECTED,
 	BLKIF_STATE_CONNECTED,
@@ -72,6 +86,13 @@ struct grant {
 	struct list_head node;
 };

+enum blk_req_status {
+	REQ_WAITING,
+	REQ_DONE,
+	REQ_ERROR,
+	REQ_EOPNOTSUPP,
+};
+
 struct blk_shadow {
 	struct blkif_request req;
 	struct request *request;
@@ -79,6 +100,14 @@ struct blk_shadow {
 	struct grant **indirect_grants;
 	struct scatterlist *sg;
 	unsigned int num_sg;
+	enum blk_req_status status;
+
+	#define NO_ASSOCIATED_ID ~0UL
+	/*
+	 * Id of the sibling if we ever need 2 requests when handling a
+	 * block I/O request
+	 */
+	unsigned long associated_id;
 };

 struct split_bio {
@@ -492,6 +521,8 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,

 	id = get_id_from_freelist(rinfo);
 	rinfo->shadow[id].request = req;
+	rinfo->shadow[id].status = REQ_WAITING;
+	rinfo->shadow[id].associated_id = NO_ASSOCIATED_ID;

 	(*ring_req)->u.rw.id = id;

@@ -533,6 +564,9 @@ struct setup_rw_req {
 	bool need_copy;
 	unsigned int bvec_off;
 	char *bvec_data;
+
+	bool require_extra_req;
+	struct blkif_request *extra_ring_req;
 };

 static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
@@ -546,8 +580,24 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 	unsigned int grant_idx = setup->grant_idx;
 	struct blkif_request *ring_req = setup->ring_req;
 	struct blkfront_ring_info *rinfo = setup->rinfo;
+	/*
+	 * We always use the shadow of the first request to store the list
+	 * of grants associated with the block I/O request. This makes the
+	 * completion easier to handle even if the block I/O request is
+	 * split.
+	 */
 	struct blk_shadow *shadow = &rinfo->shadow[setup->id];

+	if (unlikely(setup->require_extra_req &&
+		     grant_idx >= BLKIF_MAX_SEGMENTS_PER_REQUEST)) {
+		/*
+		 * We are using the second request, setup grant_idx
+		 * to be the index of the segment array
+		 */
+		grant_idx -= BLKIF_MAX_SEGMENTS_PER_REQUEST;
+		ring_req = setup->extra_ring_req;
+	}
+
 	if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
 	    (grant_idx % GRANTS_PER_INDIRECT_FRAME == 0)) {
 		if (setup->segments)
@@ -562,7 +612,11 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,

 	gnt_list_entry = get_grant(&setup->gref_head, gfn, rinfo);
 	ref = gnt_list_entry->gref;
-	shadow->grants_used[grant_idx] = gnt_list_entry;
+	/*
+	 * All the grants are stored in the shadow of the first
+	 * request. Therefore we have to use the global index
+	 */
+	shadow->grants_used[setup->grant_idx] = gnt_list_entry;

 	if (setup->need_copy) {
 		void *shared_data;
@@ -604,11 +658,31 @@ static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
 	(setup->grant_idx)++;
 }

+static void blkif_setup_extra_req(struct blkif_request *first,
+				  struct blkif_request *second)
+{
+	uint16_t nr_segments = first->u.rw.nr_segments;
+
+	/*
+	 * The second request is only present when the first request uses
+	 * all its segments. It's always the continuation of the first one.
+	 */
+	first->u.rw.nr_segments = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+
+	second->u.rw.nr_segments = nr_segments - BLKIF_MAX_SEGMENTS_PER_REQUEST;
+	second->u.rw.sector_number = first->u.rw.sector_number +
+		(BLKIF_MAX_SEGMENTS_PER_REQUEST * XEN_PAGE_SIZE) / 512;
+
+	second->u.rw.handle = first->u.rw.handle;
+	second->operation = first->operation;
+}
+
 static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *rinfo)
 {
 	struct blkfront_info *info = rinfo->dev_info;
-	struct blkif_request *ring_req;
-	unsigned long id;
+	struct blkif_request *ring_req, *extra_ring_req = NULL;
+	unsigned long id, extra_id = NO_ASSOCIATED_ID;
+	bool require_extra_req = false;
 	int i;
 	struct setup_rw_req setup = {
 		.grant_idx = 0,
@@ -650,19 +724,19 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 	/* Fill out a communications ring structure. */
 	id = blkif_ring_get_request(rinfo, req, &ring_req);

-	BUG_ON(info->max_indirect_segments == 0 &&
-	       GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
-	BUG_ON(info->max_indirect_segments &&
-	       GREFS(req->nr_phys_segments) > info->max_indirect_segments);
-
 	num_sg = blk_rq_map_sg(req->q, req, rinfo->shadow[id].sg);
 	num_grant = 0;
 	/* Calculate the number of grant used */
 	for_each_sg(rinfo->shadow[id].sg, sg, num_sg, i)
 	       num_grant += gnttab_count_grant(sg->offset, sg->length);

+	require_extra_req = info->max_indirect_segments == 0 &&
+		num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST;
+	BUG_ON(!HAS_EXTRA_REQ && require_extra_req);
+
 	rinfo->shadow[id].num_sg = num_sg;
-	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+	if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST &&
+	    likely(!require_extra_req)) {
 		/*
 		 * The indirect operation can only be a BLKIF_OP_READ or
 		 * BLKIF_OP_WRITE
@@ -702,10 +776,30 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
 			}
 		}
 		ring_req->u.rw.nr_segments = num_grant;
+		if (unlikely(require_extra_req)) {
+			extra_id = blkif_ring_get_request(rinfo, req,
+							  &extra_ring_req);
+			/*
+			 * Only the first request contains the scatter-gather
+			 * list.
+			 */
+			rinfo->shadow[extra_id].num_sg = 0;
+
+			blkif_setup_extra_req(ring_req, extra_ring_req);
+
+			/* Link the 2 requests together */
+			rinfo->shadow[extra_id].associated_id = id;
+			rinfo->shadow[id].associated_id = extra_id;
+		}
 	}

 	setup.ring_req = ring_req;
 	setup.id = id;
+
+	setup.require_extra_req = require_extra_req;
+	if (unlikely(require_extra_req))
+		setup.extra_ring_req = extra_ring_req;
+
 	for_each_sg(rinfo->shadow[id].sg, sg, num_sg, i) {
 		BUG_ON(sg->offset + sg->length > PAGE_SIZE);

@@ -728,6 +822,8 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri

 	/* Keep a private copy so we can reissue requests when recovering. */
 	rinfo->shadow[id].req = *ring_req;
+	if (unlikely(require_extra_req))
+		rinfo->shadow[extra_id].req = *extra_ring_req;

 	if (max_grefs > 0)
 		gnttab_free_grant_references(setup.gref_head);
@@ -829,7 +925,16 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 	memset(&info->tag_set, 0, sizeof(info->tag_set));
 	info->tag_set.ops = &blkfront_mq_ops;
 	info->tag_set.nr_hw_queues = info->nr_rings;
-	info->tag_set.queue_depth =  BLK_RING_SIZE(info);
+	if (HAS_EXTRA_REQ && info->max_indirect_segments == 0) {
+		/*
+		 * When indirect descriptors are not supported, the I/O request
+		 * will be split between multiple requests in the ring.
+		 * To avoid problems when sending the requests, divide the
+		 * depth of the queue by 2.
+		 */
+		info->tag_set.queue_depth =  BLK_RING_SIZE(info) / 2;
+	} else
+		info->tag_set.queue_depth = BLK_RING_SIZE(info);
 	info->tag_set.numa_node = NUMA_NO_NODE;
 	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
 	info->tag_set.cmd_size = 0;
@@ -1269,20 +1374,93 @@ static void blkif_copy_from_grant(unsigned long gfn, unsigned int offset,
 	kunmap_atomic(shared_data);
 }

-static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *rinfo,
+static enum blk_req_status blkif_rsp_to_req_status(int rsp)
+{
+	switch (rsp)
+	{
+	case BLKIF_RSP_OKAY:
+		return REQ_DONE;
+	case BLKIF_RSP_EOPNOTSUPP:
+		return REQ_EOPNOTSUPP;
+	case BLKIF_RSP_ERROR:
+		/* Fallthrough */
+	default:
+		return REQ_ERROR;
+	}
+}
+
+/*
+ * Get the final status of the block request based on the two ring responses
+ */
+static int blkif_get_final_status(enum blk_req_status s1,
+				  enum blk_req_status s2)
+{
+	BUG_ON(s1 == REQ_WAITING);
+	BUG_ON(s2 == REQ_WAITING);
+
+	if (s1 == REQ_ERROR || s2 == REQ_ERROR)
+		return BLKIF_RSP_ERROR;
+	else if (s1 == REQ_EOPNOTSUPP || s2 == REQ_EOPNOTSUPP)
+		return BLKIF_RSP_EOPNOTSUPP;
+	return BLKIF_RSP_OKAY;
+}
+
+static bool blkif_completion(unsigned long *id,
+			     struct blkfront_ring_info *rinfo,
 			     struct blkif_response *bret)
 {
 	int i = 0;
 	struct scatterlist *sg;
 	int num_sg, num_grant;
 	struct blkfront_info *info = rinfo->dev_info;
+	struct blk_shadow *s = &rinfo->shadow[*id];
 	struct copy_from_grant data = {
-		.s = s,
 		.grant_idx = 0,
 	};

 	num_grant = s->req.operation == BLKIF_OP_INDIRECT ?
 		s->req.u.indirect.nr_segments : s->req.u.rw.nr_segments;
+
+	/* The I/O request may be split in two */
+	if (unlikely(s->associated_id != NO_ASSOCIATED_ID)) {
+		struct blk_shadow *s2 = &rinfo->shadow[s->associated_id];
+
+		/* Keep the status of the current response in shadow */
+		s->status = blkif_rsp_to_req_status(bret->status);
+
+		/* Wait for the second response if it's not here yet */
+		if (s2->status == REQ_WAITING)
+			return 0;
+
+		bret->status = blkif_get_final_status(s->status,
+						      s2->status);
+
+		/*
+		 * All the grants are stored in the first shadow in order
+		 * to make the completion code simpler.
+		 */
+		num_grant += s2->req.u.rw.nr_segments;
+
+		/*
+		 * The two responses may not come in order. Only the
+		 * first request will store the scatter-gather list
+		 */
+		if (s2->num_sg != 0) {
+			/* Update "id" with the ID of the first response */
+			*id = s->associated_id;
+			s = s2;
+		}
+
+		/*
+		 * We don't need the second request anymore, so recycle
+		 * it now.
+		 */
+		if (add_id_to_freelist(rinfo, s->associated_id))
+			WARN(1, "%s: can't recycle the second part (id = %ld) of the request\n",
+			     info->gd->disk_name, s->associated_id);
+	}
+
+	data.s = s;
 	num_sg = s->num_sg;

 	if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
@@ -1352,6 +1530,8 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_ring_info *ri
 			}
 		}
 	}
+
+	return 1;
 }

 static irqreturn_t blkif_interrupt(int irq, void *dev_id)
@@ -1391,8 +1571,14 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 		}
 		req  = rinfo->shadow[id].request;

-		if (bret->operation != BLKIF_OP_DISCARD)
-			blkif_completion(&rinfo->shadow[id], rinfo, bret);
+		if (bret->operation != BLKIF_OP_DISCARD) {
+			/*
+			 * We may need to wait for an extra response if the
+			 * I/O request is split in 2
+			 */
+			if (!blkif_completion(&id, rinfo, bret))
+				continue;
+		}

 		if (add_id_to_freelist(rinfo, id)) {
 			WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
@@ -2033,8 +2219,18 @@ static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
 	int err, i;
 	struct blkfront_info *info = rinfo->dev_info;

-	if (info->max_indirect_segments == 0)
-		grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+	if (info->max_indirect_segments == 0) {
+		if (!HAS_EXTRA_REQ)
+			grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+		else {
+			/*
+			 * When an extra req is required, the maximum
+			 * number of grants supported is related to the size
+			 * of the Linux block segment.
+			 */
+			grants = GRANTS_PER_PSEG;
+		}
+	}
 	else
 		grants = info->max_indirect_segments;
 	psegs = grants / GRANTS_PER_PSEG;
--
2.1.4


[-- Attachment #4: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
  2015-12-08 12:51           ` [Xen-devel] " Julien Grall
@ 2015-12-08 19:28             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 26+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-12-08 19:28 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Boris Ostrovsky, Roger Pau Monné,
	David Vrabel, linux-kernel

On Tue, Dec 08, 2015 at 12:51:25PM +0000, Julien Grall wrote:
> Hi Konrad,
> 
> On 07/12/15 15:01, Konrad Rzeszutek Wilk wrote:
> > On Mon, Dec 07, 2015 at 02:21:46PM +0000, Julien Grall wrote:
> >> Thank you, the changes look good to me. What about patch #2?
> > 
> > Oh, I thought you said you would rebase it - so I had been waiting
> > for that.
> > 
> > Should have communicated that much better.
> > 
> > If it wouldn't be too much trouble - could you rebase #2 please?
> 
> I'm also providing a rebase of #1 because yours wasn't correct. See
> attachments.
> 
> git://xenbits.xen.org/people/julieng/linux-arm.git
> branch blkfront-64-indirect-v4

OK, I've taken those two and I am testing them now.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity
@ 2015-11-18 18:57 Julien Grall
  0 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2015-11-18 18:57 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: Julien Grall, David Vrabel, Boris Ostrovsky, Roger Pau Monné

Hi all,

This is a follow-up on the previous discussion [1] related to guest using 64KB
page granularity which doesn't boot when the backend isn't using indirect
descriptor.

This has been successfully tested on ARM64 with both 64KB and 4KB page
granularity guests and QEMU as the backend. Indeed QEMU doesn't support
indirect descriptor.

This series is based on xentip/for-linus-4.4 which include the support for
64KB Linux guest.

For all the changes see in each patch.

Sincerely yours,

[1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg01659.html

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Bob Liu <bob.liu@oracle.com>

Julien Grall (2):
  block/xen-blkfront: Introduce blkif_ring_get_request
  block/xen-blkfront: Handle non-indirect grant with 64KB pages

 drivers/block/xen-blkfront.c | 258 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 231 insertions(+), 27 deletions(-)

-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-12-08 19:28 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-18 18:57 [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity Julien Grall
2015-11-18 18:57 ` [PATCH v3 1/2] block/xen-blkfront: Introduce blkif_ring_get_request Julien Grall
2015-11-18 18:57 ` Julien Grall
2015-11-18 18:57 ` [PATCH v3 2/2] block/xen-blkfront: Handle non-indirect grant with 64KB pages Julien Grall
2015-11-18 18:57 ` Julien Grall
2015-12-01 15:30   ` Roger Pau Monné
2015-12-01 15:30   ` Roger Pau Monné
2015-12-08 12:45     ` Julien Grall
2015-12-08 12:45     ` [Xen-devel] " Julien Grall
2015-11-30 16:16 ` [PATCH v3 0/2] block/xen-blkfront: Support non-indirect grant with 64KB page granularity Julien Grall
2015-11-30 16:16 ` [Xen-devel] " Julien Grall
2015-11-30 18:45 ` Julien Grall
2015-11-30 18:45 ` Julien Grall
2015-12-01 15:37 ` Konrad Rzeszutek Wilk
2015-12-01 17:55   ` [Xen-devel] " Julien Grall
2015-12-01 18:52     ` Konrad Rzeszutek Wilk
2015-12-07 14:21       ` Julien Grall
2015-12-07 14:21       ` [Xen-devel] " Julien Grall
2015-12-07 15:01         ` Konrad Rzeszutek Wilk
2015-12-08 12:51           ` Julien Grall
2015-12-08 12:51           ` [Xen-devel] " Julien Grall
2015-12-08 19:28             ` Konrad Rzeszutek Wilk
2015-12-08 12:25       ` Julien Grall
2015-12-08 12:25       ` [Xen-devel] " Julien Grall
2015-12-01 17:55   ` Julien Grall
  -- strict thread matches above, loose matches on Subject: below --
2015-11-18 18:57 Julien Grall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.