* [Qemu-devel] [PATCH 0/3] xen-disk: performance improvements
From: Paul Durrant @ 2017-06-20 13:47 UTC (permalink / raw)
  To: xen-devel, qemu-devel, qemu-block; +Cc: Paul Durrant

Paul Durrant (3):
  xen-disk: only advertise feature-persistent if grant copy is not
    available
  xen-disk: add support for multi-page shared rings
  xen-disk: use an IOThread per instance

 hw/block/trace-events |   7 ++
 hw/block/xen_disk.c   | 200 ++++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 168 insertions(+), 39 deletions(-)

-- 
2.11.0

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [Qemu-devel] [PATCH 1/3] xen-disk: only advertise feature-persistent if grant copy is not available
From: Paul Durrant @ 2017-06-20 13:47 UTC (permalink / raw)
  To: xen-devel, qemu-devel, qemu-block
  Cc: Paul Durrant, Stefano Stabellini, Anthony Perard, Kevin Wolf, Max Reitz

If grant copy is available then it will always be used in preference to
persistent maps. In this case feature-persistent should not be advertised
to the frontend, otherwise the frontend may needlessly copy data into
persistently granted buffers.
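
The advertisement logic described above can be sketched as follows
(illustrative Python rather than the patch's C; the helper and parameter
names are invented for the sketch, only the xenstore key names come from
the patch):

```python
def advertise_backend_features(write_be_int, grant_copy_available):
    """Sketch of the backend feature advertisement after this change.

    If grant copy works, the backend will always prefer it, so
    advertising feature-persistent would only make the frontend copy
    data into persistently granted buffers for no benefit.
    """
    write_be_int("feature-flush-cache", 1)
    write_be_int("feature-persistent", 0 if grant_copy_available else 1)
```

With grant copy available the frontend sees feature-persistent as 0 and
falls back to plain grant references, which the backend then copies from
directly.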

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
---
 hw/block/xen_disk.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 3a22805fbc..9b06e3aa81 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -1023,11 +1023,18 @@ static int blk_init(struct XenDevice *xendev)
 
     blkdev->file_blk  = BLOCK_SIZE;
 
+    blkdev->feature_grant_copy =
+                (xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
+
+    xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
+                  blkdev->feature_grant_copy ? "enabled" : "disabled");
+
     /* fill info
      * blk_connect supplies sector-size and sectors
      */
     xenstore_write_be_int(&blkdev->xendev, "feature-flush-cache", 1);
-    xenstore_write_be_int(&blkdev->xendev, "feature-persistent", 1);
+    xenstore_write_be_int(&blkdev->xendev, "feature-persistent",
+                          !blkdev->feature_grant_copy);
     xenstore_write_be_int(&blkdev->xendev, "info", info);
 
     blk_parse_discard(blkdev);
@@ -1202,12 +1209,6 @@ static int blk_connect(struct XenDevice *xendev)
 
     xen_be_bind_evtchn(&blkdev->xendev);
 
-    blkdev->feature_grant_copy =
-                (xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
-
-    xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
-                  blkdev->feature_grant_copy ? "enabled" : "disabled");
-
     xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
                   "remote port %d, local port %d\n",
                   blkdev->xendev.protocol, blkdev->ring_ref,
-- 
2.11.0

* [Qemu-devel] [PATCH 2/3] xen-disk: add support for multi-page shared rings
From: Paul Durrant @ 2017-06-20 13:47 UTC (permalink / raw)
  To: xen-devel, qemu-devel, qemu-block
  Cc: Paul Durrant, Stefano Stabellini, Anthony Perard, Kevin Wolf, Max Reitz

The blkif protocol has had provision for negotiation of multi-page shared
rings for some time now, and many guest OSes have support in their
frontend drivers.

This patch makes the necessary modifications to allow xen-disk to support
a shared ring of up to order 4 (i.e. 16 pages).
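
As a rough illustration of how ring-page-order maps to request slots, the
following sketch mirrors the __CONST_RING_SIZE calculation (the 112-byte
request/response union and 64-byte ring header are assumptions based on
the canonical native blkif ABI, not values taken from this patch):

```python
PAGE_SIZE = 4096  # XC_PAGE_SIZE

def blkif_ring_entries(nr_pages, entry_size=112, header_size=64):
    """Request slots in a native blkif shared ring spanning nr_pages.

    Mirrors __CONST_RING_SIZE: usable bytes divided by the entry size,
    rounded down to a power of two as the ring macros require.
    """
    usable = nr_pages * PAGE_SIZE - header_size
    entries = usable // entry_size
    # Round down to a power of two.
    size = 1
    while size * 2 <= entries:
        size *= 2
    return size

# A single-page ring yields 32 slots; an order-4 (16-page) ring, 512.
```

This is why the patch replaces the fixed max_requests of 32 with a value
computed from the negotiated ring size.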

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
---
 hw/block/xen_disk.c | 141 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 110 insertions(+), 31 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 9b06e3aa81..a9942d32db 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -36,8 +36,6 @@
 
 static int batch_maps   = 0;
 
-static int max_requests = 32;
-
 /* ------------------------------------------------------------- */
 
 #define BLOCK_SIZE  512
@@ -84,6 +82,8 @@ struct ioreq {
     BlockAcctCookie     acct;
 };
 
+#define MAX_RING_PAGE_ORDER 4
+
 struct XenBlkDev {
     struct XenDevice    xendev;  /* must be first */
     char                *params;
@@ -94,7 +94,8 @@ struct XenBlkDev {
     bool                directiosafe;
     const char          *fileproto;
     const char          *filename;
-    int                 ring_ref;
+    unsigned int        ring_ref[1 << MAX_RING_PAGE_ORDER];
+    unsigned int        nr_ring_ref;
     void                *sring;
     int64_t             file_blk;
     int64_t             file_size;
@@ -110,6 +111,7 @@ struct XenBlkDev {
     int                 requests_total;
     int                 requests_inflight;
     int                 requests_finished;
+    unsigned int        max_requests;
 
     /* Persistent grants extension */
     gboolean            feature_discard;
@@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
     struct ioreq *ioreq = NULL;
 
     if (QLIST_EMPTY(&blkdev->freelist)) {
-        if (blkdev->requests_total >= max_requests) {
+        if (blkdev->requests_total >= blkdev->max_requests) {
             goto out;
         }
         /* allocate new struct */
@@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
         ioreq_runio_qemu_aio(ioreq);
     }
 
-    if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
+    if (blkdev->more_work && blkdev->requests_inflight < blkdev->max_requests) {
         qemu_bh_schedule(blkdev->bh);
     }
 }
@@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
     blk_handle_requests(blkdev);
 }
 
-/*
- * We need to account for the grant allocations requiring contiguous
- * chunks; the worst case number would be
- *     max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
- * but in order to keep things simple just use
- *     2 * max_req * max_seg.
- */
-#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
-
 static void blk_alloc(struct XenDevice *xendev)
 {
     struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
@@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
     if (xen_mode != XEN_EMULATE) {
         batch_maps = 1;
     }
-    if (xengnttab_set_max_grants(xendev->gnttabdev,
-            MAX_GRANTS(max_requests, BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
-        xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
-                      strerror(errno));
-    }
 }
 
 static void blk_parse_discard(struct XenBlkDev *blkdev)
@@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
                           !blkdev->feature_grant_copy);
     xenstore_write_be_int(&blkdev->xendev, "info", info);
 
+    xenstore_write_be_int(&blkdev->xendev, "max-ring-page-order",
+                          MAX_RING_PAGE_ORDER);
+
     blk_parse_discard(blkdev);
 
     g_free(directiosafe);
@@ -1058,12 +1049,25 @@ out_error:
     return -1;
 }
 
+/*
+ * We need to account for the grant allocations requiring contiguous
+ * chunks; the worst case number would be
+ *     max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
+ * but in order to keep things simple just use
+ *     2 * max_req * max_seg.
+ */
+#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
+
 static int blk_connect(struct XenDevice *xendev)
 {
     struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
     int pers, index, qflags;
     bool readonly = true;
     bool writethrough = true;
+    int order, ring_ref;
+    unsigned int ring_size, max_grants;
+    unsigned int i;
+    uint32_t *domids;
 
     /* read-only ? */
     if (blkdev->directiosafe) {
@@ -1138,9 +1142,39 @@ static int blk_connect(struct XenDevice *xendev)
     xenstore_write_be_int64(&blkdev->xendev, "sectors",
                             blkdev->file_size / blkdev->file_blk);
 
-    if (xenstore_read_fe_int(&blkdev->xendev, "ring-ref", &blkdev->ring_ref) == -1) {
+    if (xenstore_read_fe_int(&blkdev->xendev, "ring-page-order",
+                             &order) == -1) {
+        blkdev->nr_ring_ref = 1;
+
+        if (xenstore_read_fe_int(&blkdev->xendev, "ring-ref",
+                                 &ring_ref) == -1) {
+            return -1;
+        }
+        blkdev->ring_ref[0] = ring_ref;
+
+    } else if (order >= 0 && order <= MAX_RING_PAGE_ORDER) {
+        blkdev->nr_ring_ref = 1 << order;
+
+        for (i = 0; i < blkdev->nr_ring_ref; i++) {
+            char *key;
+
+            key = g_strdup_printf("ring-ref%u", i);
+            if (!key) {
+                return -1;
+            }
+
+            if (xenstore_read_fe_int(&blkdev->xendev, key,
+                                     &ring_ref) == -1) {
+                return -1;
+            }
+            blkdev->ring_ref[i] = ring_ref;
+
+            g_free(key);
+        }
+    } else {
         return -1;
     }
+
     if (xenstore_read_fe_int(&blkdev->xendev, "event-channel",
                              &blkdev->xendev.remote_port) == -1) {
         return -1;
@@ -1163,41 +1197,85 @@ static int blk_connect(struct XenDevice *xendev)
         blkdev->protocol = BLKIF_PROTOCOL_NATIVE;
     }
 
-    blkdev->sring = xengnttab_map_grant_ref(blkdev->xendev.gnttabdev,
-                                            blkdev->xendev.dom,
-                                            blkdev->ring_ref,
-                                            PROT_READ | PROT_WRITE);
+    ring_size = XC_PAGE_SIZE * blkdev->nr_ring_ref;
+    switch (blkdev->protocol) {
+    case BLKIF_PROTOCOL_NATIVE:
+    {
+        blkdev->max_requests = __CONST_RING_SIZE(blkif, ring_size);
+        break;
+    }
+    case BLKIF_PROTOCOL_X86_32:
+    {
+        blkdev->max_requests = __CONST_RING_SIZE(blkif_x86_32, ring_size);
+        break;
+    }
+    case BLKIF_PROTOCOL_X86_64:
+    {
+        blkdev->max_requests = __CONST_RING_SIZE(blkif_x86_64, ring_size);
+        break;
+    }
+    default:
+        return -1;
+    }
+
+    /* Calculate the maximum number of grants needed by ioreqs */
+    max_grants = MAX_GRANTS(blkdev->max_requests,
+                            BLKIF_MAX_SEGMENTS_PER_REQUEST);
+    /* Add on the number needed for the ring pages */
+    max_grants += blkdev->nr_ring_ref;
+
+    if (xengnttab_set_max_grants(blkdev->xendev.gnttabdev, max_grants)) {
+        xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
+                      strerror(errno));
+        return -1;
+    }
+
+    domids = g_malloc0_n(blkdev->nr_ring_ref, sizeof(uint32_t));
+    for (i = 0; i < blkdev->nr_ring_ref; i++) {
+        domids[i] = blkdev->xendev.dom;
+    }
+
+    blkdev->sring = xengnttab_map_grant_refs(blkdev->xendev.gnttabdev,
+                                             blkdev->nr_ring_ref,
+                                             domids,
+                                             blkdev->ring_ref,
+                                             PROT_READ | PROT_WRITE);
+
+    g_free(domids);
+
     if (!blkdev->sring) {
         return -1;
     }
+
     blkdev->cnt_map++;
 
     switch (blkdev->protocol) {
     case BLKIF_PROTOCOL_NATIVE:
     {
         blkif_sring_t *sring_native = blkdev->sring;
-        BACK_RING_INIT(&blkdev->rings.native, sring_native, XC_PAGE_SIZE);
+        BACK_RING_INIT(&blkdev->rings.native, sring_native, ring_size);
         break;
     }
     case BLKIF_PROTOCOL_X86_32:
     {
         blkif_x86_32_sring_t *sring_x86_32 = blkdev->sring;
 
-        BACK_RING_INIT(&blkdev->rings.x86_32_part, sring_x86_32, XC_PAGE_SIZE);
+        BACK_RING_INIT(&blkdev->rings.x86_32_part, sring_x86_32, ring_size);
         break;
     }
     case BLKIF_PROTOCOL_X86_64:
     {
         blkif_x86_64_sring_t *sring_x86_64 = blkdev->sring;
 
-        BACK_RING_INIT(&blkdev->rings.x86_64_part, sring_x86_64, XC_PAGE_SIZE);
+        BACK_RING_INIT(&blkdev->rings.x86_64_part, sring_x86_64, ring_size);
         break;
     }
     }
 
     if (blkdev->feature_persistent) {
         /* Init persistent grants */
-        blkdev->max_grants = max_requests * BLKIF_MAX_SEGMENTS_PER_REQUEST;
+        blkdev->max_grants = blkdev->max_requests *
+            BLKIF_MAX_SEGMENTS_PER_REQUEST;
         blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)int_cmp,
                                              NULL, NULL,
                                              batch_maps ?
@@ -1209,9 +1287,9 @@ static int blk_connect(struct XenDevice *xendev)
 
     xen_be_bind_evtchn(&blkdev->xendev);
 
-    xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
+    xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
                   "remote port %d, local port %d\n",
-                  blkdev->xendev.protocol, blkdev->ring_ref,
+                  blkdev->xendev.protocol, blkdev->nr_ring_ref,
                   blkdev->xendev.remote_port, blkdev->xendev.local_port);
     return 0;
 }
@@ -1228,7 +1306,8 @@ static void blk_disconnect(struct XenDevice *xendev)
     xen_pv_unbind_evtchn(&blkdev->xendev);
 
     if (blkdev->sring) {
-        xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring, 1);
+        xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
+                        blkdev->nr_ring_ref);
         blkdev->cnt_map--;
         blkdev->sring = NULL;
     }
-- 
2.11.0


* [Qemu-devel] [PATCH 3/3] xen-disk: use an IOThread per instance
From: Paul Durrant @ 2017-06-20 13:47 UTC (permalink / raw)
  To: xen-devel, qemu-devel, qemu-block
  Cc: Paul Durrant, Stefano Stabellini, Anthony Perard, Kevin Wolf, Max Reitz

This patch allocates an IOThread object for each xen_disk instance and
sets the AIO context appropriately on connect. This allows processing
of I/O to proceed in parallel.

The patch also adds tracepoints into xen_disk to make it possible to
follow the state transitions of an instance in the log.
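
The per-instance IOThread idea can be sketched in miniature (illustrative
Python standing in for QEMU's IOThread/AioContext machinery; the class
and method names are invented for the sketch):

```python
import threading
import queue

class DiskInstance:
    """Each backend instance owns its own event-loop thread, so request
    processing for one disk no longer serialises behind another's."""

    def __init__(self, name):
        self.name = name
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # Stand-in for the IOThread's private AioContext loop.
        while True:
            req = self._requests.get()
            if req is None:
                return
            req()  # process the I/O request on this instance's thread

    def submit(self, fn):
        self._requests.put(fn)

    def stop(self):
        self._requests.put(None)
        self._thread.join()
```

Two instances created this way process their queues on independent
threads, which is the parallelism the patch aims for by giving each
xen_disk its own IOThread instead of the shared main loop.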

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
---
 hw/block/trace-events |  7 +++++++
 hw/block/xen_disk.c   | 44 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/hw/block/trace-events b/hw/block/trace-events
index 65e83dc258..608b24ba66 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int num_reqs, uint64_t offset,
 # hw/block/hd-geometry.c
 hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS %d %d %d"
 hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, int trans) "blk %p CHS %u %u %u trans %d"
+
+# hw/block/xen_disk.c
+xen_disk_alloc(char *name) "%s"
+xen_disk_init(char *name) "%s"
+xen_disk_connect(char *name) "%s"
+xen_disk_disconnect(char *name) "%s"
+xen_disk_free(char *name) "%s"
diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index a9942d32db..ec1085c802 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -27,10 +27,13 @@
 #include "hw/xen/xen_backend.h"
 #include "xen_blkif.h"
 #include "sysemu/blockdev.h"
+#include "sysemu/iothread.h"
 #include "sysemu/block-backend.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
+#include "qom/object_interfaces.h"
+#include "trace.h"
 
 /* ------------------------------------------------------------- */
 
@@ -128,6 +131,9 @@ struct XenBlkDev {
     DriveInfo           *dinfo;
     BlockBackend        *blk;
     QEMUBH              *bh;
+
+    IOThread            *iothread;
+    AioContext          *ctx;
 };
 
 /* ------------------------------------------------------------- */
@@ -923,11 +929,31 @@ static void blk_bh(void *opaque)
 static void blk_alloc(struct XenDevice *xendev)
 {
     struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
+    Object *obj;
+    char *name;
+    Error *err = NULL;
+
+    trace_xen_disk_alloc(xendev->name);
 
     QLIST_INIT(&blkdev->inflight);
     QLIST_INIT(&blkdev->finished);
     QLIST_INIT(&blkdev->freelist);
-    blkdev->bh = qemu_bh_new(blk_bh, blkdev);
+
+    obj = object_new(TYPE_IOTHREAD);
+    name = g_strdup_printf("iothread-%s", xendev->name);
+
+    object_property_add_child(object_get_objects_root(), name, obj, &err);
+    assert(!err);
+
+    g_free(name);
+
+    user_creatable_complete(obj, &err);
+    assert(!err);
+
+    blkdev->iothread = (IOThread *)object_dynamic_cast(obj, TYPE_IOTHREAD);
+    blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
+    blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
+
     if (xen_mode != XEN_EMULATE) {
         batch_maps = 1;
     }
@@ -954,6 +980,8 @@ static int blk_init(struct XenDevice *xendev)
     int info = 0;
     char *directiosafe = NULL;
 
+    trace_xen_disk_init(xendev->name);
+
     /* read xenstore entries */
     if (blkdev->params == NULL) {
         char *h = NULL;
@@ -1069,6 +1097,8 @@ static int blk_connect(struct XenDevice *xendev)
     unsigned int i;
     uint32_t *domids;
 
+    trace_xen_disk_connect(xendev->name);
+
     /* read-only ? */
     if (blkdev->directiosafe) {
         qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
@@ -1285,6 +1315,8 @@ static int blk_connect(struct XenDevice *xendev)
         blkdev->persistent_gnt_count = 0;
     }
 
+    blk_set_aio_context(blkdev->blk, blkdev->ctx);
+
     xen_be_bind_evtchn(&blkdev->xendev);
 
     xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
@@ -1298,13 +1330,20 @@ static void blk_disconnect(struct XenDevice *xendev)
 {
     struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
 
+    trace_xen_disk_disconnect(xendev->name);
+
+    aio_context_acquire(blkdev->ctx);
+
     if (blkdev->blk) {
+        blk_set_aio_context(blkdev->blk, qemu_get_aio_context());
         blk_detach_dev(blkdev->blk, blkdev);
         blk_unref(blkdev->blk);
         blkdev->blk = NULL;
     }
     xen_pv_unbind_evtchn(&blkdev->xendev);
 
+    aio_context_release(blkdev->ctx);
+
     if (blkdev->sring) {
         xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
                         blkdev->nr_ring_ref);
@@ -1338,6 +1377,8 @@ static int blk_free(struct XenDevice *xendev)
     struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
     struct ioreq *ioreq;
 
+    trace_xen_disk_free(xendev->name);
+
     if (blkdev->blk || blkdev->sring) {
         blk_disconnect(xendev);
     }
@@ -1355,6 +1396,7 @@ static int blk_free(struct XenDevice *xendev)
     g_free(blkdev->dev);
     g_free(blkdev->devtype);
     qemu_bh_delete(blkdev->bh);
+    object_unparent(OBJECT(blkdev->iothread));
     return 0;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread


* Re: [Qemu-devel] [PATCH 3/3] xen-disk: use an IOThread per instance
  2017-06-20 13:47   ` Paul Durrant
  (?)
  (?)
@ 2017-06-20 16:07   ` Paolo Bonzini
  2017-06-20 16:09       ` Paul Durrant
  -1 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2017-06-20 16:07 UTC (permalink / raw)
  To: Paul Durrant, xen-devel, qemu-devel, qemu-block
  Cc: Anthony Perard, Kevin Wolf, Stefano Stabellini, Max Reitz

On 20/06/2017 15:47, Paul Durrant wrote:
> This patch allocates an IOThread object for each xen_disk instance and
> sets the AIO context appropriately on connect. This allows processing
> of I/O to proceed in parallel.
> 
> The patch also adds tracepoints into xen_disk to make it possible to
> follow the state transitions of an instance in the log.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

The QEMU block layer is not yet thread safe, but code running in
IOThreads still has to take the AioContext lock.  You need to call
aio_context_acquire/release in blk_bh and qemu_aio_complete.

Paolo

> ---
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Max Reitz <mreitz@redhat.com>
> ---
>  hw/block/trace-events |  7 +++++++
>  hw/block/xen_disk.c   | 44 +++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 65e83dc258..608b24ba66 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int num_reqs, uint64_t offset,
>  # hw/block/hd-geometry.c
>  hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS %d %d %d"
>  hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, int trans) "blk %p CHS %u %u %u trans %d"
> +
> +# hw/block/xen_disk.c
> +xen_disk_alloc(char *name) "%s"
> +xen_disk_init(char *name) "%s"
> +xen_disk_connect(char *name) "%s"
> +xen_disk_disconnect(char *name) "%s"
> +xen_disk_free(char *name) "%s"
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index a9942d32db..ec1085c802 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -27,10 +27,13 @@
>  #include "hw/xen/xen_backend.h"
>  #include "xen_blkif.h"
>  #include "sysemu/blockdev.h"
> +#include "sysemu/iothread.h"
>  #include "sysemu/block-backend.h"
>  #include "qapi/error.h"
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qstring.h"
> +#include "qom/object_interfaces.h"
> +#include "trace.h"
>  
>  /* ------------------------------------------------------------- */
>  
> @@ -128,6 +131,9 @@ struct XenBlkDev {
>      DriveInfo           *dinfo;
>      BlockBackend        *blk;
>      QEMUBH              *bh;
> +
> +    IOThread            *iothread;
> +    AioContext          *ctx;
>  };
>  
>  /* ------------------------------------------------------------- */
> @@ -923,11 +929,31 @@ static void blk_bh(void *opaque)
>  static void blk_alloc(struct XenDevice *xendev)
>  {
>      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
> +    Object *obj;
> +    char *name;
> +    Error *err = NULL;
> +
> +    trace_xen_disk_alloc(xendev->name);
>  
>      QLIST_INIT(&blkdev->inflight);
>      QLIST_INIT(&blkdev->finished);
>      QLIST_INIT(&blkdev->freelist);
> -    blkdev->bh = qemu_bh_new(blk_bh, blkdev);
> +
> +    obj = object_new(TYPE_IOTHREAD);
> +    name = g_strdup_printf("iothread-%s", xendev->name);
> +
> +    object_property_add_child(object_get_objects_root(), name, obj, &err);
> +    assert(!err);
> +
> +    g_free(name);
> +
> +    user_creatable_complete(obj, &err);
> +    assert(!err);
> +
> +    blkdev->iothread = (IOThread *)object_dynamic_cast(obj, TYPE_IOTHREAD);
> +    blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
> +    blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
> +
>      if (xen_mode != XEN_EMULATE) {
>          batch_maps = 1;
>      }
> @@ -954,6 +980,8 @@ static int blk_init(struct XenDevice *xendev)
>      int info = 0;
>      char *directiosafe = NULL;
>  
> +    trace_xen_disk_init(xendev->name);
> +
>      /* read xenstore entries */
>      if (blkdev->params == NULL) {
>          char *h = NULL;
> @@ -1069,6 +1097,8 @@ static int blk_connect(struct XenDevice *xendev)
>      unsigned int i;
>      uint32_t *domids;
>  
> +    trace_xen_disk_connect(xendev->name);
> +
>      /* read-only ? */
>      if (blkdev->directiosafe) {
>          qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
> @@ -1285,6 +1315,8 @@ static int blk_connect(struct XenDevice *xendev)
>          blkdev->persistent_gnt_count = 0;
>      }
>  
> +    blk_set_aio_context(blkdev->blk, blkdev->ctx);
> +
>      xen_be_bind_evtchn(&blkdev->xendev);
>  
>      xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
> @@ -1298,13 +1330,20 @@ static void blk_disconnect(struct XenDevice *xendev)
>  {
>      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
>  
> +    trace_xen_disk_disconnect(xendev->name);
> +
> +    aio_context_acquire(blkdev->ctx);
> +
>      if (blkdev->blk) {
> +        blk_set_aio_context(blkdev->blk, qemu_get_aio_context());
>          blk_detach_dev(blkdev->blk, blkdev);
>          blk_unref(blkdev->blk);
>          blkdev->blk = NULL;
>      }
>      xen_pv_unbind_evtchn(&blkdev->xendev);
>  
> +    aio_context_release(blkdev->ctx);
> +
>      if (blkdev->sring) {
>          xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
>                          blkdev->nr_ring_ref);
> @@ -1338,6 +1377,8 @@ static int blk_free(struct XenDevice *xendev)
>      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
>      struct ioreq *ioreq;
>  
> +    trace_xen_disk_free(xendev->name);
> +
>      if (blkdev->blk || blkdev->sring) {
>          blk_disconnect(xendev);
>      }
> @@ -1355,6 +1396,7 @@ static int blk_free(struct XenDevice *xendev)
>      g_free(blkdev->dev);
>      g_free(blkdev->devtype);
>      qemu_bh_delete(blkdev->bh);
> +    object_unparent(OBJECT(blkdev->iothread));
>      return 0;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] xen-disk: use an IOThread per instance
  2017-06-20 16:07   ` [Qemu-devel] " Paolo Bonzini
@ 2017-06-20 16:09       ` Paul Durrant
  0 siblings, 0 replies; 28+ messages in thread
From: Paul Durrant @ 2017-06-20 16:09 UTC (permalink / raw)
  To: 'Paolo Bonzini', xen-devel, qemu-devel, qemu-block
  Cc: Anthony Perard, Kevin Wolf, Stefano Stabellini, Max Reitz

> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: 20 June 2017 17:08
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org;
> qemu-devel@nongnu.org; qemu-block@nongnu.org
> Cc: Anthony Perard <anthony.perard@citrix.com>; Kevin Wolf
> <kwolf@redhat.com>; Stefano Stabellini <sstabellini@kernel.org>; Max Reitz
> <mreitz@redhat.com>
> Subject: Re: [PATCH 3/3] xen-disk: use an IOThread per instance
> 
> On 20/06/2017 15:47, Paul Durrant wrote:
> > This patch allocates an IOThread object for each xen_disk instance and
> > sets the AIO context appropriately on connect. This allows processing
> > of I/O to proceed in parallel.
> >
> > The patch also adds tracepoints into xen_disk to make it possible to
> > follow the state transitions of an instance in the log.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> The QEMU block layer is not yet thread safe, but code running in
> IOThreads still has to take the AioContext lock.  You need to call
> aio_context_acquire/release in blk_bh and qemu_aio_complete.
> 

Ok, thanks. I'll update the patch and re-test.

Cheers,

  Paul

> Paolo
> 
> > ---
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Anthony Perard <anthony.perard@citrix.com>
> > Cc: Kevin Wolf <kwolf@redhat.com>
> > Cc: Max Reitz <mreitz@redhat.com>
> > ---
> >  hw/block/trace-events |  7 +++++++
> >  hw/block/xen_disk.c   | 44
> +++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 50 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/block/trace-events b/hw/block/trace-events
> > index 65e83dc258..608b24ba66 100644
> > --- a/hw/block/trace-events
> > +++ b/hw/block/trace-events
> > @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int
> num_reqs, uint64_t offset,
> >  # hw/block/hd-geometry.c
> >  hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p
> LCHS %d %d %d"
> >  hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t
> secs, int trans) "blk %p CHS %u %u %u trans %d"
> > +
> > +# hw/block/xen_disk.c
> > +xen_disk_alloc(char *name) "%s"
> > +xen_disk_init(char *name) "%s"
> > +xen_disk_connect(char *name) "%s"
> > +xen_disk_disconnect(char *name) "%s"
> > +xen_disk_free(char *name) "%s"
> > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> > index a9942d32db..ec1085c802 100644
> > --- a/hw/block/xen_disk.c
> > +++ b/hw/block/xen_disk.c
> > @@ -27,10 +27,13 @@
> >  #include "hw/xen/xen_backend.h"
> >  #include "xen_blkif.h"
> >  #include "sysemu/blockdev.h"
> > +#include "sysemu/iothread.h"
> >  #include "sysemu/block-backend.h"
> >  #include "qapi/error.h"
> >  #include "qapi/qmp/qdict.h"
> >  #include "qapi/qmp/qstring.h"
> > +#include "qom/object_interfaces.h"
> > +#include "trace.h"
> >
> >  /* ------------------------------------------------------------- */
> >
> > @@ -128,6 +131,9 @@ struct XenBlkDev {
> >      DriveInfo           *dinfo;
> >      BlockBackend        *blk;
> >      QEMUBH              *bh;
> > +
> > +    IOThread            *iothread;
> > +    AioContext          *ctx;
> >  };
> >
> >  /* ------------------------------------------------------------- */
> > @@ -923,11 +929,31 @@ static void blk_bh(void *opaque)
> >  static void blk_alloc(struct XenDevice *xendev)
> >  {
> >      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev,
> xendev);
> > +    Object *obj;
> > +    char *name;
> > +    Error *err = NULL;
> > +
> > +    trace_xen_disk_alloc(xendev->name);
> >
> >      QLIST_INIT(&blkdev->inflight);
> >      QLIST_INIT(&blkdev->finished);
> >      QLIST_INIT(&blkdev->freelist);
> > -    blkdev->bh = qemu_bh_new(blk_bh, blkdev);
> > +
> > +    obj = object_new(TYPE_IOTHREAD);
> > +    name = g_strdup_printf("iothread-%s", xendev->name);
> > +
> > +    object_property_add_child(object_get_objects_root(), name, obj,
> &err);
> > +    assert(!err);
> > +
> > +    g_free(name);
> > +
> > +    user_creatable_complete(obj, &err);
> > +    assert(!err);
> > +
> > +    blkdev->iothread = (IOThread *)object_dynamic_cast(obj,
> TYPE_IOTHREAD);
> > +    blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
> > +    blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
> > +
> >      if (xen_mode != XEN_EMULATE) {
> >          batch_maps = 1;
> >      }
> > @@ -954,6 +980,8 @@ static int blk_init(struct XenDevice *xendev)
> >      int info = 0;
> >      char *directiosafe = NULL;
> >
> > +    trace_xen_disk_init(xendev->name);
> > +
> >      /* read xenstore entries */
> >      if (blkdev->params == NULL) {
> >          char *h = NULL;
> > @@ -1069,6 +1097,8 @@ static int blk_connect(struct XenDevice *xendev)
> >      unsigned int i;
> >      uint32_t *domids;
> >
> > +    trace_xen_disk_connect(xendev->name);
> > +
> >      /* read-only ? */
> >      if (blkdev->directiosafe) {
> >          qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
> > @@ -1285,6 +1315,8 @@ static int blk_connect(struct XenDevice *xendev)
> >          blkdev->persistent_gnt_count = 0;
> >      }
> >
> > +    blk_set_aio_context(blkdev->blk, blkdev->ctx);
> > +
> >      xen_be_bind_evtchn(&blkdev->xendev);
> >
> >      xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
> > @@ -1298,13 +1330,20 @@ static void blk_disconnect(struct XenDevice
> *xendev)
> >  {
> >      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev,
> xendev);
> >
> > +    trace_xen_disk_disconnect(xendev->name);
> > +
> > +    aio_context_acquire(blkdev->ctx);
> > +
> >      if (blkdev->blk) {
> > +        blk_set_aio_context(blkdev->blk, qemu_get_aio_context());
> >          blk_detach_dev(blkdev->blk, blkdev);
> >          blk_unref(blkdev->blk);
> >          blkdev->blk = NULL;
> >      }
> >      xen_pv_unbind_evtchn(&blkdev->xendev);
> >
> > +    aio_context_release(blkdev->ctx);
> > +
> >      if (blkdev->sring) {
> >          xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
> >                          blkdev->nr_ring_ref);
> > @@ -1338,6 +1377,8 @@ static int blk_free(struct XenDevice *xendev)
> >      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev,
> xendev);
> >      struct ioreq *ioreq;
> >
> > +    trace_xen_disk_free(xendev->name);
> > +
> >      if (blkdev->blk || blkdev->sring) {
> >          blk_disconnect(xendev);
> >      }
> > @@ -1355,6 +1396,7 @@ static int blk_free(struct XenDevice *xendev)
> >      g_free(blkdev->dev);
> >      g_free(blkdev->devtype);
> >      qemu_bh_delete(blkdev->bh);
> > +    object_unparent(OBJECT(blkdev->iothread));
> >      return 0;
> >  }
> >
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] xen-disk: use an IOThread per instance
@ 2017-06-20 16:09       ` Paul Durrant
  0 siblings, 0 replies; 28+ messages in thread
From: Paul Durrant @ 2017-06-20 16:09 UTC (permalink / raw)
  To: 'Paolo Bonzini', xen-devel, qemu-devel, qemu-block
  Cc: Anthony Perard, Kevin Wolf, Stefano Stabellini, Max Reitz

> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: 20 June 2017 17:08
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org;
> qemu-devel@nongnu.org; qemu-block@nongnu.org
> Cc: Anthony Perard <anthony.perard@citrix.com>; Kevin Wolf
> <kwolf@redhat.com>; Stefano Stabellini <sstabellini@kernel.org>; Max Reitz
> <mreitz@redhat.com>
> Subject: Re: [PATCH 3/3] xen-disk: use an IOThread per instance
> 
> On 20/06/2017 15:47, Paul Durrant wrote:
> > This patch allocates an IOThread object for each xen_disk instance and
> > sets the AIO context appropriately on connect. This allows processing
> > of I/O to proceed in parallel.
> >
> > The patch also adds tracepoints into xen_disk to make it possible to
> > follow the state transtions of an instance in the log.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> The QEMU block layer is not yet thread safe, but code running in
> IOThreads still has to take the AioContext lock.  You need to call
> aio_context_acquire/release in blk_bh and qemu_aio_complete.
> 

Ok, thanks. I'll update the patch and re-test.

Cheers,

  Paul

> Paolo
> 
> > ---
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Anthony Perard <anthony.perard@citrix.com>
> > Cc: Kevin Wolf <kwolf@redhat.com>
> > Cc: Max Reitz <mreitz@redhat.com>
> > ---
> >  hw/block/trace-events |  7 +++++++
> >  hw/block/xen_disk.c   | 44
> +++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 50 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/block/trace-events b/hw/block/trace-events
> > index 65e83dc258..608b24ba66 100644
> > --- a/hw/block/trace-events
> > +++ b/hw/block/trace-events
> > @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int
> num_reqs, uint64_t offset,
> >  # hw/block/hd-geometry.c
> >  hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p
> LCHS %d %d %d"
> >  hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t
> secs, int trans) "blk %p CHS %u %u %u trans %d"
> > +
> > +# hw/block/xen_disk.c
> > +xen_disk_alloc(char *name) "%s"
> > +xen_disk_init(char *name) "%s"
> > +xen_disk_connect(char *name) "%s"
> > +xen_disk_disconnect(char *name) "%s"
> > +xen_disk_free(char *name) "%s"
> > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> > index a9942d32db..ec1085c802 100644
> > --- a/hw/block/xen_disk.c
> > +++ b/hw/block/xen_disk.c
> > @@ -27,10 +27,13 @@
> >  #include "hw/xen/xen_backend.h"
> >  #include "xen_blkif.h"
> >  #include "sysemu/blockdev.h"
> > +#include "sysemu/iothread.h"
> >  #include "sysemu/block-backend.h"
> >  #include "qapi/error.h"
> >  #include "qapi/qmp/qdict.h"
> >  #include "qapi/qmp/qstring.h"
> > +#include "qom/object_interfaces.h"
> > +#include "trace.h"
> >
> >  /* ------------------------------------------------------------- */
> >
> > @@ -128,6 +131,9 @@ struct XenBlkDev {
> >      DriveInfo           *dinfo;
> >      BlockBackend        *blk;
> >      QEMUBH              *bh;
> > +
> > +    IOThread            *iothread;
> > +    AioContext          *ctx;
> >  };
> >
> >  /* ------------------------------------------------------------- */
> > @@ -923,11 +929,31 @@ static void blk_bh(void *opaque)
> >  static void blk_alloc(struct XenDevice *xendev)
> >  {
> >      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev,
> xendev);
> > +    Object *obj;
> > +    char *name;
> > +    Error *err = NULL;
> > +
> > +    trace_xen_disk_alloc(xendev->name);
> >
> >      QLIST_INIT(&blkdev->inflight);
> >      QLIST_INIT(&blkdev->finished);
> >      QLIST_INIT(&blkdev->freelist);
> > -    blkdev->bh = qemu_bh_new(blk_bh, blkdev);
> > +
> > +    obj = object_new(TYPE_IOTHREAD);
> > +    name = g_strdup_printf("iothread-%s", xendev->name);
> > +
> > +    object_property_add_child(object_get_objects_root(), name, obj,
> &err);
> > +    assert(!err);
> > +
> > +    g_free(name);
> > +
> > +    user_creatable_complete(obj, &err);
> > +    assert(!err);
> > +
> > +    blkdev->iothread = (IOThread *)object_dynamic_cast(obj,
> TYPE_IOTHREAD);
> > +    blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
> > +    blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
> > +
> >      if (xen_mode != XEN_EMULATE) {
> >          batch_maps = 1;
> >      }
> > @@ -954,6 +980,8 @@ static int blk_init(struct XenDevice *xendev)
> >      int info = 0;
> >      char *directiosafe = NULL;
> >
> > +    trace_xen_disk_init(xendev->name);
> > +
> >      /* read xenstore entries */
> >      if (blkdev->params == NULL) {
> >          char *h = NULL;
> > @@ -1069,6 +1097,8 @@ static int blk_connect(struct XenDevice *xendev)
> >      unsigned int i;
> >      uint32_t *domids;
> >
> > +    trace_xen_disk_connect(xendev->name);
> > +
> >      /* read-only ? */
> >      if (blkdev->directiosafe) {
> >          qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
> > @@ -1285,6 +1315,8 @@ static int blk_connect(struct XenDevice *xendev)
> >          blkdev->persistent_gnt_count = 0;
> >      }
> >
> > +    blk_set_aio_context(blkdev->blk, blkdev->ctx);
> > +
> >      xen_be_bind_evtchn(&blkdev->xendev);
> >
> >      xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
> > @@ -1298,13 +1330,20 @@ static void blk_disconnect(struct XenDevice *xendev)
> >  {
> >      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
> >
> > +    trace_xen_disk_disconnect(xendev->name);
> > +
> > +    aio_context_acquire(blkdev->ctx);
> > +
> >      if (blkdev->blk) {
> > +        blk_set_aio_context(blkdev->blk, qemu_get_aio_context());
> >          blk_detach_dev(blkdev->blk, blkdev);
> >          blk_unref(blkdev->blk);
> >          blkdev->blk = NULL;
> >      }
> >      xen_pv_unbind_evtchn(&blkdev->xendev);
> >
> > +    aio_context_release(blkdev->ctx);
> > +
> >      if (blkdev->sring) {
> >          xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
> >                          blkdev->nr_ring_ref);
> > @@ -1338,6 +1377,8 @@ static int blk_free(struct XenDevice *xendev)
> >      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
> >      struct ioreq *ioreq;
> >
> > +    trace_xen_disk_free(xendev->name);
> > +
> >      if (blkdev->blk || blkdev->sring) {
> >          blk_disconnect(xendev);
> >      }
> > @@ -1355,6 +1396,7 @@ static int blk_free(struct XenDevice *xendev)
> >      g_free(blkdev->dev);
> >      g_free(blkdev->devtype);
> >      qemu_bh_delete(blkdev->bh);
> > +    object_unparent(OBJECT(blkdev->iothread));
> >      return 0;
> >  }
> >
> >


* Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available
  2017-06-20 13:47   ` Paul Durrant
  (?)
@ 2017-06-20 22:19   ` Stefano Stabellini
  2017-06-21  9:17       ` Roger Pau Monné
  -1 siblings, 1 reply; 28+ messages in thread
From: Stefano Stabellini @ 2017-06-20 22:19 UTC (permalink / raw)
  To: Paul Durrant
  Cc: xen-devel, qemu-devel, qemu-block, Stefano Stabellini,
	Anthony Perard, Kevin Wolf, Max Reitz, roger.pau

On Tue, 20 Jun 2017, Paul Durrant wrote:
> If grant copy is available then it will always be used in preference to
> persistent maps. In this case feature-persistent should not be advertized
> to the frontend, otherwise it may needlessly copy data into persistently
> granted buffers.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

CC'ing Roger.

It is true that using feature-persistent together with grant copies is
a very bad idea.

But this change establishes an explicit preference of
feature_grant_copy over feature-persistent in the xen_disk backend. It
is not obvious to me that it should be the case.

Why is feature_grant_copy (without feature-persistent) better than
feature-persistent (without feature_grant_copy)? Shouldn't we simply
avoid grant copies to copy data to persistent grants?


> ---
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Max Reitz <mreitz@redhat.com>
> ---
>  hw/block/xen_disk.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 3a22805fbc..9b06e3aa81 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -1023,11 +1023,18 @@ static int blk_init(struct XenDevice *xendev)
>  
>      blkdev->file_blk  = BLOCK_SIZE;
>  
> +    blkdev->feature_grant_copy =
> +                (xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
> +
> +    xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
> +                  blkdev->feature_grant_copy ? "enabled" : "disabled");
> +
>      /* fill info
>       * blk_connect supplies sector-size and sectors
>       */
>      xenstore_write_be_int(&blkdev->xendev, "feature-flush-cache", 1);
> -    xenstore_write_be_int(&blkdev->xendev, "feature-persistent", 1);
> +    xenstore_write_be_int(&blkdev->xendev, "feature-persistent",
> +                          !blkdev->feature_grant_copy);
>      xenstore_write_be_int(&blkdev->xendev, "info", info);
>  
>      blk_parse_discard(blkdev);
> @@ -1202,12 +1209,6 @@ static int blk_connect(struct XenDevice *xendev)
>  
>      xen_be_bind_evtchn(&blkdev->xendev);
>  
> -    blkdev->feature_grant_copy =
> -                (xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
> -
> -    xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
> -                  blkdev->feature_grant_copy ? "enabled" : "disabled");
> -
>      xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
>                    "remote port %d, local port %d\n",
>                    blkdev->xendev.protocol, blkdev->ring_ref,
> -- 
> 2.11.0
> 



* Re: [Qemu-devel] [PATCH 2/3] xen-disk: add support for multi-page shared rings
  2017-06-20 13:47   ` Paul Durrant
@ 2017-06-20 22:51     ` Stefano Stabellini
  -1 siblings, 0 replies; 28+ messages in thread
From: Stefano Stabellini @ 2017-06-20 22:51 UTC (permalink / raw)
  To: Paul Durrant
  Cc: xen-devel, qemu-devel, qemu-block, Stefano Stabellini,
	Anthony Perard, Kevin Wolf, Max Reitz

On Tue, 20 Jun 2017, Paul Durrant wrote:
> The blkif protocol has had provision for negotiation of multi-page shared
> rings for some time now and many guest OSes have support in their frontend
> drivers.
> 
> This patch makes the necessary modifications to xen-disk to support a shared
> ring up to order 4 (i.e. 16 pages).
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

Thanks for the patch!

> ---
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Max Reitz <mreitz@redhat.com>
> ---
>  hw/block/xen_disk.c | 141 ++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 110 insertions(+), 31 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 9b06e3aa81..a9942d32db 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -36,8 +36,6 @@
>  
>  static int batch_maps   = 0;
>  
> -static int max_requests = 32;
> -
>  /* ------------------------------------------------------------- */
>  
>  #define BLOCK_SIZE  512
> @@ -84,6 +82,8 @@ struct ioreq {
>      BlockAcctCookie     acct;
>  };
>  
> +#define MAX_RING_PAGE_ORDER 4
> +
>  struct XenBlkDev {
>      struct XenDevice    xendev;  /* must be first */
>      char                *params;
> @@ -94,7 +94,8 @@ struct XenBlkDev {
>      bool                directiosafe;
>      const char          *fileproto;
>      const char          *filename;
> -    int                 ring_ref;
> +    unsigned int        ring_ref[1 << MAX_RING_PAGE_ORDER];
> +    unsigned int        nr_ring_ref;
>      void                *sring;
>      int64_t             file_blk;
>      int64_t             file_size;
> @@ -110,6 +111,7 @@ struct XenBlkDev {
>      int                 requests_total;
>      int                 requests_inflight;
>      int                 requests_finished;
> +    unsigned int        max_requests;
>  
>      /* Persistent grants extension */
>      gboolean            feature_discard;
> @@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
>      struct ioreq *ioreq = NULL;
>  
>      if (QLIST_EMPTY(&blkdev->freelist)) {
> -        if (blkdev->requests_total >= max_requests) {
> +        if (blkdev->requests_total >= blkdev->max_requests) {
>              goto out;
>          }
>          /* allocate new struct */
> @@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
>          ioreq_runio_qemu_aio(ioreq);
>      }
>  
> -    if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
> +    if (blkdev->more_work && blkdev->requests_inflight < blkdev->max_requests) {
>          qemu_bh_schedule(blkdev->bh);
>      }
>  }
> @@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
>      blk_handle_requests(blkdev);
>  }
>  
> -/*
> - * We need to account for the grant allocations requiring contiguous
> - * chunks; the worst case number would be
> - *     max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> - * but in order to keep things simple just use
> - *     2 * max_req * max_seg.
> - */
> -#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> -
>  static void blk_alloc(struct XenDevice *xendev)
>  {
>      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
> @@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
>      if (xen_mode != XEN_EMULATE) {
>          batch_maps = 1;
>      }
> -    if (xengnttab_set_max_grants(xendev->gnttabdev,
> -            MAX_GRANTS(max_requests, BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
> -        xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> -                      strerror(errno));
> -    }
>  }
>  
>  static void blk_parse_discard(struct XenBlkDev *blkdev)
> @@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
>                            !blkdev->feature_grant_copy);
>      xenstore_write_be_int(&blkdev->xendev, "info", info);
>  
> +    xenstore_write_be_int(&blkdev->xendev, "max-ring-page-order",
> +                          MAX_RING_PAGE_ORDER);
> +
>      blk_parse_discard(blkdev);
>  
>      g_free(directiosafe);
> @@ -1058,12 +1049,25 @@ out_error:
>      return -1;
>  }
>  
> +/*
> + * We need to account for the grant allocations requiring contiguous
> + * chunks; the worst case number would be
> + *     max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> + * but in order to keep things simple just use
> + *     2 * max_req * max_seg.
> + */
> +#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> +
>  static int blk_connect(struct XenDevice *xendev)
>  {
>      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
>      int pers, index, qflags;
>      bool readonly = true;
>      bool writethrough = true;
> +    int order, ring_ref;
> +    unsigned int ring_size, max_grants;
> +    unsigned int i;
> +    uint32_t *domids;
>  
>      /* read-only ? */
>      if (blkdev->directiosafe) {
> @@ -1138,9 +1142,39 @@ static int blk_connect(struct XenDevice *xendev)
>      xenstore_write_be_int64(&blkdev->xendev, "sectors",
>                              blkdev->file_size / blkdev->file_blk);
>  
> -    if (xenstore_read_fe_int(&blkdev->xendev, "ring-ref", &blkdev->ring_ref) == -1) {
> +    if (xenstore_read_fe_int(&blkdev->xendev, "ring-page-order",
> +                             &order) == -1) {
> +        blkdev->nr_ring_ref = 1;
> +
> +        if (xenstore_read_fe_int(&blkdev->xendev, "ring-ref",
> +                                 &ring_ref) == -1) {
> +            return -1;
> +        }
> +        blkdev->ring_ref[0] = ring_ref;
> +
> +    } else if (order >= 0 && order <= MAX_RING_PAGE_ORDER) {
> +        blkdev->nr_ring_ref = 1 << order;
> +
> +        for (i = 0; i < blkdev->nr_ring_ref; i++) {
> +            char *key;
> +
> +            key = g_strdup_printf("ring-ref%u", i);
> +            if (!key) {
> +                return -1;
> +            }
> +
> +            if (xenstore_read_fe_int(&blkdev->xendev, key,
> +                                     &ring_ref) == -1) {

it looks like we are leaking key.
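For illustration, here is a self-contained sketch of the leak-free pattern, with plain malloc/snprintf standing in for g_strdup_printf and a stub standing in for xenstore_read_fe_int (this is hypothetical demonstration code, not the actual QEMU fix):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stub standing in for xenstore_read_fe_int(); always fails so that
 * the early-return error path below is exercised. */
static int read_fe_int(const char *key, int *out)
{
    (void)key;
    (void)out;
    return -1;
}

/* Read "ring-ref0".."ring-ref<nr-1>", releasing the key string on
 * every exit path, including the early return. */
static int read_ring_refs(unsigned int nr, unsigned int *refs)
{
    for (unsigned int i = 0; i < nr; i++) {
        char *key = malloc(32);
        int ring_ref;

        if (!key) {
            return -1;
        }
        snprintf(key, 32, "ring-ref%u", i);

        if (read_fe_int(key, &ring_ref) == -1) {
            free(key);  /* the fix: free the key before the early return */
            return -1;
        }
        refs[i] = (unsigned int)ring_ref;
        free(key);
    }
    return 0;
}
```

The same shape applies in the patch: a g_free(key) before the early return (or a goto to a common cleanup label) closes the leak.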


> +                return -1;
> +            }
> +            blkdev->ring_ref[i] = ring_ref;
> +
> +            g_free(key);
> +        }
> +    } else {
>          return -1;

I would like to print a warning if the requested ring size exceeds our
limit.
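A minimal sketch of such a check (hypothetical; it uses stderr where the real code would use xen_pv_printf, with the MAX_RING_PAGE_ORDER value from the patch):

```c
#include <assert.h>
#include <stdio.h>

#define MAX_RING_PAGE_ORDER 4

/* Validate a frontend-supplied ring-page-order, warning instead of
 * failing silently when it is out of the advertised range. */
static int check_ring_page_order(int order)
{
    if (order < 0 || order > MAX_RING_PAGE_ORDER) {
        fprintf(stderr,
                "xen-disk: ring-page-order %d out of range (max %d)\n",
                order, MAX_RING_PAGE_ORDER);
        return -1;
    }
    return 0;
}
```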


>      }
> +
>      if (xenstore_read_fe_int(&blkdev->xendev, "event-channel",
>                               &blkdev->xendev.remote_port) == -1) {
>          return -1;
> @@ -1163,41 +1197,85 @@ static int blk_connect(struct XenDevice *xendev)
>          blkdev->protocol = BLKIF_PROTOCOL_NATIVE;
>      }
>  
> -    blkdev->sring = xengnttab_map_grant_ref(blkdev->xendev.gnttabdev,
> -                                            blkdev->xendev.dom,
> -                                            blkdev->ring_ref,
> -                                            PROT_READ | PROT_WRITE);
> +    ring_size = XC_PAGE_SIZE * blkdev->nr_ring_ref;
> +    switch (blkdev->protocol) {
> +    case BLKIF_PROTOCOL_NATIVE:
> +    {
> +        blkdev->max_requests = __CONST_RING_SIZE(blkif, ring_size);
> +        break;
> +    }
> +    case BLKIF_PROTOCOL_X86_32:
> +    {
> +        blkdev->max_requests = __CONST_RING_SIZE(blkif_x86_32, ring_size);
> +        break;
> +    }
> +    case BLKIF_PROTOCOL_X86_64:
> +    {
> +        blkdev->max_requests = __CONST_RING_SIZE(blkif_x86_64, ring_size);
> +        break;
> +    }
> +    default:
> +        return -1;
> +    }
> +
> +    /* Calculate the maximum number of grants needed by ioreqs */
> +    max_grants = MAX_GRANTS(blkdev->max_requests,
> +                            BLKIF_MAX_SEGMENTS_PER_REQUEST);
> +    /* Add on the number needed for the ring pages */
> +    max_grants += blkdev->nr_ring_ref;
> +
> +    if (xengnttab_set_max_grants(blkdev->xendev.gnttabdev, max_grants)) {
> +        xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> +                      strerror(errno));
> +        return -1;
> +    }
> +
> +    domids = g_malloc0_n(blkdev->nr_ring_ref, sizeof(uint32_t));
> +    for (i = 0; i < blkdev->nr_ring_ref; i++) {
> +        domids[i] = blkdev->xendev.dom;
> +    }
> +
> +    blkdev->sring = xengnttab_map_grant_refs(blkdev->xendev.gnttabdev,
> +                                             blkdev->nr_ring_ref,
> +                                             domids,
> +                                             blkdev->ring_ref,
> +                                             PROT_READ | PROT_WRITE);
> +
> +    g_free(domids);
> +
>      if (!blkdev->sring) {
>          return -1;
>      }
> +
>      blkdev->cnt_map++;
>  
>      switch (blkdev->protocol) {
>      case BLKIF_PROTOCOL_NATIVE:
>      {
>          blkif_sring_t *sring_native = blkdev->sring;
> -        BACK_RING_INIT(&blkdev->rings.native, sring_native, XC_PAGE_SIZE);
> +        BACK_RING_INIT(&blkdev->rings.native, sring_native, ring_size);
>          break;
>      }
>      case BLKIF_PROTOCOL_X86_32:
>      {
>          blkif_x86_32_sring_t *sring_x86_32 = blkdev->sring;
>  
> -        BACK_RING_INIT(&blkdev->rings.x86_32_part, sring_x86_32, XC_PAGE_SIZE);
> +        BACK_RING_INIT(&blkdev->rings.x86_32_part, sring_x86_32, ring_size);
>          break;
>      }
>      case BLKIF_PROTOCOL_X86_64:
>      {
>          blkif_x86_64_sring_t *sring_x86_64 = blkdev->sring;
>  
> -        BACK_RING_INIT(&blkdev->rings.x86_64_part, sring_x86_64, XC_PAGE_SIZE);
> +        BACK_RING_INIT(&blkdev->rings.x86_64_part, sring_x86_64, ring_size);
>          break;
>      }
>      }
>  
>      if (blkdev->feature_persistent) {
>          /* Init persistent grants */
> -        blkdev->max_grants = max_requests * BLKIF_MAX_SEGMENTS_PER_REQUEST;
> +        blkdev->max_grants = blkdev->max_requests *
> +            BLKIF_MAX_SEGMENTS_PER_REQUEST;
>          blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)int_cmp,
>                                               NULL, NULL,
>                                               batch_maps ?
> @@ -1209,9 +1287,9 @@ static int blk_connect(struct XenDevice *xendev)
>  
>      xen_be_bind_evtchn(&blkdev->xendev);
>  
> -    xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
> +    xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
>                    "remote port %d, local port %d\n",
> -                  blkdev->xendev.protocol, blkdev->ring_ref,
> +                  blkdev->xendev.protocol, blkdev->nr_ring_ref,
>                    blkdev->xendev.remote_port, blkdev->xendev.local_port);
>      return 0;
>  }
> @@ -1228,7 +1306,8 @@ static void blk_disconnect(struct XenDevice *xendev)
>      xen_pv_unbind_evtchn(&blkdev->xendev);
>  
>      if (blkdev->sring) {
> -        xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring, 1);
> +        xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
> +                        blkdev->nr_ring_ref);
>          blkdev->cnt_map--;
>          blkdev->sring = NULL;
>      }
> -- 
> 2.11.0
> 



* Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available
  2017-06-20 22:19   ` [Qemu-devel] " Stefano Stabellini
@ 2017-06-21  9:17       ` Roger Pau Monné
  0 siblings, 0 replies; 28+ messages in thread
From: Roger Pau Monné @ 2017-06-21  9:17 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Paul Durrant, xen-devel, qemu-devel, qemu-block, Anthony Perard,
	Kevin Wolf, Max Reitz

On Tue, Jun 20, 2017 at 03:19:33PM -0700, Stefano Stabellini wrote:
> On Tue, 20 Jun 2017, Paul Durrant wrote:
> > If grant copy is available then it will always be used in preference to
> > persistent maps. In this case feature-persistent should not be advertized
> > to the frontend, otherwise it may needlessly copy data into persistently
> > granted buffers.
> > 
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> CC'ing Roger.
> 
> It is true that using feature-persistent together with grant copies is a
> very bad idea.
> 
> But this change establishes an explicit preference of
> feature_grant_copy over feature-persistent in the xen_disk backend. It
> is not obvious to me that it should be the case.
> 
> Why is feature_grant_copy (without feature-persistent) better than
> feature-persistent (without feature_grant_copy)? Shouldn't we simply
> avoid grant copies to copy data to persistent grants?

When using persistent grants the frontend must always copy data from
the buffer to the persistent grant, there's no way to avoid this.

Using grant_copy we move the copy from the frontend to the backend,
which means the CPU time of the copy is accounted to the backend. This
is not ideal, but IMHO it's better than persistent grants because it
avoids keeping a pool of mapped grants that consume memory and make
the code more complex.

Do you have some performance data showing the difference between
persistent grants vs grant copy?

Roger.

^ permalink raw reply	[flat|nested] 28+ messages in thread


* Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available
  2017-06-21  9:17       ` Roger Pau Monné
  (?)
@ 2017-06-21  9:35       ` Paul Durrant
  2017-06-21 10:40         ` Paul Durrant
  2017-06-21 10:40         ` Paul Durrant
  -1 siblings, 2 replies; 28+ messages in thread
From: Paul Durrant @ 2017-06-21  9:35 UTC (permalink / raw)
  To: Roger Pau Monne, Stefano Stabellini
  Cc: xen-devel, qemu-devel, qemu-block, Anthony Perard, Kevin Wolf, Max Reitz

> -----Original Message-----
> From: Roger Pau Monne
> Sent: 21 June 2017 10:18
> To: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org;
> qemu-devel@nongnu.org; qemu-block@nongnu.org; Anthony Perard
> <anthony.perard@citrix.com>; Kevin Wolf <kwolf@redhat.com>; Max Reitz
> <mreitz@redhat.com>
> Subject: Re: [PATCH 1/3] xen-disk: only advertize feature-persistent if grant
> copy is not available
> 
> On Tue, Jun 20, 2017 at 03:19:33PM -0700, Stefano Stabellini wrote:
> > On Tue, 20 Jun 2017, Paul Durrant wrote:
> > > If grant copy is available then it will always be used in preference to
> > > persistent maps. In this case feature-persistent should not be advertized
> > > to the frontend, otherwise it may needlessly copy data into persistently
> > > granted buffers.
> > >
> > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> >
> > CC'ing Roger.
> >
> > It is true that using feature-persistent together with grant copies is a
> > very bad idea.
> >
> > But this change establishes an explicit preference of
> > feature_grant_copy over feature-persistent in the xen_disk backend. It
> > is not obvious to me that it should be the case.
> >
> > Why is feature_grant_copy (without feature-persistent) better than
> > feature-persistent (without feature_grant_copy)? Shouldn't we simply
> > avoid grant copies to copy data to persistent grants?
> 
> When using persistent grants the frontend must always copy data from
> the buffer to the persistent grant, there's no way to avoid this.
> 
> Using grant_copy we move the copy from the frontend to the backend,
> which means the CPU time of the copy is accounted to the backend. This
> is not ideal, but IMHO it's better than persistent grants because it
> avoids keeping a pool of mapped grants that consume memory and make
> the code more complex.
> 
> Do you have some performance data showing the difference between
> persistent grants vs grant copy?
> 

No, but I can get some :-)

For a little background... I've been trying to push throughput of fio running in a debian stretch guest on my skull canyon NUC. When I started out, I was getting ~100MB/s. When I finished, with this patch, the IOThreads one, the multi-page ring one and a bit of hackery to turn off all the aio flushes that seem to occur even if the image is opened with O_DIRECT, I was getting ~960MB/s... which is about line rate for the SSD in the NUC.

So, I'll force use of persistent grants on and see what sort of throughput I get.

Cheers,

  Paul

> Roger.

^ permalink raw reply	[flat|nested] 28+ messages in thread


* Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available
  2017-06-21  9:35       ` [Qemu-devel] " Paul Durrant
@ 2017-06-21 10:40         ` Paul Durrant
  2017-06-21 10:50             ` Roger Pau Monne
  2017-06-21 10:40         ` Paul Durrant
  1 sibling, 1 reply; 28+ messages in thread
From: Paul Durrant @ 2017-06-21 10:40 UTC (permalink / raw)
  To: Paul Durrant, Roger Pau Monne, Stefano Stabellini
  Cc: Kevin Wolf, qemu-block, qemu-devel, Max Reitz, Anthony Perard, xen-devel

> -----Original Message-----
> From: Qemu-devel [mailto:qemu-devel-
> bounces+paul.durrant=citrix.com@nongnu.org] On Behalf Of Paul Durrant
> Sent: 21 June 2017 10:36
> To: Roger Pau Monne <roger.pau@citrix.com>; Stefano Stabellini
> <sstabellini@kernel.org>
> Cc: Kevin Wolf <kwolf@redhat.com>; qemu-block@nongnu.org; qemu-
> devel@nongnu.org; Max Reitz <mreitz@redhat.com>; Anthony Perard
> <anthony.perard@citrix.com>; xen-devel@lists.xenproject.org
> Subject: Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-
> persistent if grant copy is not available
> 
> > -----Original Message-----
> > From: Roger Pau Monne
> > Sent: 21 June 2017 10:18
> > To: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> devel@lists.xenproject.org;
> > qemu-devel@nongnu.org; qemu-block@nongnu.org; Anthony Perard
> > <anthony.perard@citrix.com>; Kevin Wolf <kwolf@redhat.com>; Max
> Reitz
> > <mreitz@redhat.com>
> > Subject: Re: [PATCH 1/3] xen-disk: only advertize feature-persistent if
> grant
> > copy is not available
> >
> > On Tue, Jun 20, 2017 at 03:19:33PM -0700, Stefano Stabellini wrote:
> > > On Tue, 20 Jun 2017, Paul Durrant wrote:
> > > > If grant copy is available then it will always be used in preference to
> > > > persistent maps. In this case feature-persistent should not be
> advertized
> > > > to the frontend, otherwise it may needlessly copy data into persistently
> > > > granted buffers.
> > > >
> > > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > >
> > > CC'ing Roger.
> > >
> > > It is true that using feature-persistent together with grant copies is a
> > > very bad idea.
> > >
> > > But this change establishes an explicit preference of
> > > feature_grant_copy over feature-persistent in the xen_disk backend. It
> > > is not obvious to me that it should be the case.
> > >
> > > Why is feature_grant_copy (without feature-persistent) better than
> > > feature-persistent (without feature_grant_copy)? Shouldn't we simply
> > > avoid grant copies to copy data to persistent grants?
> >
> > When using persistent grants the frontend must always copy data from
> > the buffer to the persistent grant, there's no way to avoid this.
> >
> > Using grant_copy we move the copy from the frontend to the backend,
> > which means the CPU time of the copy is accounted to the backend. This
> > is not ideal, but IMHO it's better than persistent grants because it
> > avoids keeping a pool of mapped grants that consume memory and make
> > the code more complex.
> >
> > Do you have some performance data showing the difference between
> > persistent grants vs grant copy?
> >
> 
> No, but I can get some :-)
> 
> For a little background... I've been trying to push throughput of fio running in
> a debian stretch guest on my skull canyon NUC. When I started out, I was
> getting ~100MB/s. When I finished, with this patch, the IOThreads one, the
> multi-page ring one and a bit of hackery to turn off all the aio flushes that
> seem to occur even if the image is opened with O_DIRECT, I was getting
> ~960MB/s... which is about line rate for the SSD in the NUC.
> 
> So, I'll force use of persistent grants on and see what sort of throughput I
> get.

A quick test with grant copy forced off (causing persistent grants to be used)... My VM is debian stretch using a 16 page shared ring from blkfront. The image backing xvdb is a fully inflated 10G qcow2.

root@dhcp-237-70:~# fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64 --size=10G --readwrite=randwrite --ramp_time=4
test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio, iodepth=64
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/539.4MB/0KB /s] [0/1078/0 iops] [eta 00m:05s]
test: (groupid=0, jobs=1): err= 0: pid=633: Wed Jun 21 06:26:06 2017
  write: io=6146.6MB, bw=795905KB/s, iops=1546, runt=  7908msec
  cpu          : usr=2.07%, sys=34.00%, ctx=4490, majf=0, minf=1
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=166.9%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=12230/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=6146.6MB, aggrb=795904KB/s, minb=795904KB/s, maxb=795904KB/s, mint=7908msec, maxt=7908msec

Disk stats (read/write):
  xvdb: ios=54/228860, merge=0/2230616, ticks=16/5403048, in_queue=5409068, util=98.26%

The dom0 cpu usage for the relevant IOThread was ~60%.

The same test with grant copy...

root@dhcp-237-70:~# fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64 --size=10G --readwrite=randwrite --ramp_time=4
test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio, iodepth=64
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/607.7MB/0KB /s] [0/1215/0 iops] [eta 00m:05s]
test: (groupid=0, jobs=1): err= 0: pid=483: Wed Jun 21 06:35:14 2017
  write: io=6232.0MB, bw=810976KB/s, iops=1575, runt=  7869msec
  cpu          : usr=2.44%, sys=37.42%, ctx=3570, majf=0, minf=1
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=164.6%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=12401/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=6232.0MB, aggrb=810975KB/s, minb=810975KB/s, maxb=810975KB/s, mint=7869msec, maxt=7869msec

Disk stats (read/write):
  xvdb: ios=54/229583, merge=0/2235879, ticks=16/5409500, in_queue=5415080, util=98.27%

So, higher throughput and iops. The dom0 cpu usage was running at ~70%, so there is definitely more dom0 overhead by using grant copy. The usage of grant copy could probably be improved though, since the current code issues a copy ioctl per ioreq. With some batching I suspect some, if not all, of the extra overhead could be recovered.

Cheers,

  Paul

> 
> Cheers,
> 
>   Paul
> 
> > Roger.

^ permalink raw reply	[flat|nested] 28+ messages in thread


* Re: [Qemu-devel] [PATCH 2/3] xen-disk: add support for multi-page shared rings
  2017-06-20 22:51     ` Stefano Stabellini
@ 2017-06-21 10:42       ` Paul Durrant
  -1 siblings, 0 replies; 28+ messages in thread
From: Paul Durrant @ 2017-06-21 10:42 UTC (permalink / raw)
  To: 'Stefano Stabellini'
  Cc: xen-devel, qemu-devel, qemu-block, Anthony Perard, Kevin Wolf, Max Reitz

> -----Original Message-----
> From: Stefano Stabellini [mailto:sstabellini@kernel.org]
> Sent: 20 June 2017 23:51
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; qemu-devel@nongnu.org; qemu-
> block@nongnu.org; Stefano Stabellini <sstabellini@kernel.org>; Anthony
> Perard <anthony.perard@citrix.com>; Kevin Wolf <kwolf@redhat.com>;
> Max Reitz <mreitz@redhat.com>
> Subject: Re: [PATCH 2/3] xen-disk: add support for multi-page shared rings
> 
> On Tue, 20 Jun 2017, Paul Durrant wrote:
> > The blkif protocol has had provision for negotiation of multi-page shared
> > rings for some time now and many guest OS have support in their frontend
> > drivers.
> >
> > This patch makes the necessary modifications to xen-disk support a shared
> > ring up to order 4 (i.e. 16 pages).
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> Thanks for the patch!
> 

You're welcome.

> > ---
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Anthony Perard <anthony.perard@citrix.com>
> > Cc: Kevin Wolf <kwolf@redhat.com>
> > Cc: Max Reitz <mreitz@redhat.com>
> > ---
> >  hw/block/xen_disk.c | 141
> ++++++++++++++++++++++++++++++++++++++++------------
> >  1 file changed, 110 insertions(+), 31 deletions(-)
> >
> > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> > index 9b06e3aa81..a9942d32db 100644
> > --- a/hw/block/xen_disk.c
> > +++ b/hw/block/xen_disk.c
> > @@ -36,8 +36,6 @@
> >
> >  static int batch_maps   = 0;
> >
> > -static int max_requests = 32;
> > -
> >  /* ------------------------------------------------------------- */
> >
> >  #define BLOCK_SIZE  512
> > @@ -84,6 +82,8 @@ struct ioreq {
> >      BlockAcctCookie     acct;
> >  };
> >
> > +#define MAX_RING_PAGE_ORDER 4
> > +
> >  struct XenBlkDev {
> >      struct XenDevice    xendev;  /* must be first */
> >      char                *params;
> > @@ -94,7 +94,8 @@ struct XenBlkDev {
> >      bool                directiosafe;
> >      const char          *fileproto;
> >      const char          *filename;
> > -    int                 ring_ref;
> > +    unsigned int        ring_ref[1 << MAX_RING_PAGE_ORDER];
> > +    unsigned int        nr_ring_ref;
> >      void                *sring;
> >      int64_t             file_blk;
> >      int64_t             file_size;
> > @@ -110,6 +111,7 @@ struct XenBlkDev {
> >      int                 requests_total;
> >      int                 requests_inflight;
> >      int                 requests_finished;
> > +    unsigned int        max_requests;
> >
> >      /* Persistent grants extension */
> >      gboolean            feature_discard;
> > @@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev
> *blkdev)
> >      struct ioreq *ioreq = NULL;
> >
> >      if (QLIST_EMPTY(&blkdev->freelist)) {
> > -        if (blkdev->requests_total >= max_requests) {
> > +        if (blkdev->requests_total >= blkdev->max_requests) {
> >              goto out;
> >          }
> >          /* allocate new struct */
> > @@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev
> *blkdev)
> >          ioreq_runio_qemu_aio(ioreq);
> >      }
> >
> > -    if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
> > +    if (blkdev->more_work && blkdev->requests_inflight < blkdev-
> >max_requests) {
> >          qemu_bh_schedule(blkdev->bh);
> >      }
> >  }
> > @@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
> >      blk_handle_requests(blkdev);
> >  }
> >
> > -/*
> > - * We need to account for the grant allocations requiring contiguous
> > - * chunks; the worst case number would be
> > - *     max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> > - * but in order to keep things simple just use
> > - *     2 * max_req * max_seg.
> > - */
> > -#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> > -
> >  static void blk_alloc(struct XenDevice *xendev)
> >  {
> >      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev,
> xendev);
> > @@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
> >      if (xen_mode != XEN_EMULATE) {
> >          batch_maps = 1;
> >      }
> > -    if (xengnttab_set_max_grants(xendev->gnttabdev,
> > -            MAX_GRANTS(max_requests,
> BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
> > -        xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> > -                      strerror(errno));
> > -    }
> >  }
> >
> >  static void blk_parse_discard(struct XenBlkDev *blkdev)
> > @@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
> >                            !blkdev->feature_grant_copy);
> >      xenstore_write_be_int(&blkdev->xendev, "info", info);
> >
> > +    xenstore_write_be_int(&blkdev->xendev, "max-ring-page-order",
> > +                          MAX_RING_PAGE_ORDER);
> > +
> >      blk_parse_discard(blkdev);
> >
> >      g_free(directiosafe);
> > @@ -1058,12 +1049,25 @@ out_error:
> >      return -1;
> >  }
> >
> > +/*
* Re: [PATCH 2/3] xen-disk: add support for multi-page shared rings
@ 2017-06-21 10:42       ` Paul Durrant
  0 siblings, 0 replies; 28+ messages in thread
From: Paul Durrant @ 2017-06-21 10:42 UTC (permalink / raw)
  To: 'Stefano Stabellini'
  Cc: Kevin Wolf, qemu-block, qemu-devel, Max Reitz, Anthony Perard, xen-devel

> -----Original Message-----
> From: Stefano Stabellini [mailto:sstabellini@kernel.org]
> Sent: 20 June 2017 23:51
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; qemu-devel@nongnu.org; qemu-
> block@nongnu.org; Stefano Stabellini <sstabellini@kernel.org>; Anthony
> Perard <anthony.perard@citrix.com>; Kevin Wolf <kwolf@redhat.com>;
> Max Reitz <mreitz@redhat.com>
> Subject: Re: [PATCH 2/3] xen-disk: add support for multi-page shared rings
> 
> On Tue, 20 Jun 2017, Paul Durrant wrote:
> > The blkif protocol has had provision for negotiation of multi-page shared
> > rings for some time now and many guest OS have support in their frontend
> > drivers.
> >
> > This patch makes the necessary modifications to make xen-disk support a shared
> > ring up to order 4 (i.e. 16 pages).
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> Thanks for the patch!
> 

You're welcome.

> > ---
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Anthony Perard <anthony.perard@citrix.com>
> > Cc: Kevin Wolf <kwolf@redhat.com>
> > Cc: Max Reitz <mreitz@redhat.com>
> > ---
> >  hw/block/xen_disk.c | 141 ++++++++++++++++++++++++++++++++++++++++------------
> >  1 file changed, 110 insertions(+), 31 deletions(-)
> >
> > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> > index 9b06e3aa81..a9942d32db 100644
> > --- a/hw/block/xen_disk.c
> > +++ b/hw/block/xen_disk.c
> > @@ -36,8 +36,6 @@
> >
> >  static int batch_maps   = 0;
> >
> > -static int max_requests = 32;
> > -
> >  /* ------------------------------------------------------------- */
> >
> >  #define BLOCK_SIZE  512
> > @@ -84,6 +82,8 @@ struct ioreq {
> >      BlockAcctCookie     acct;
> >  };
> >
> > +#define MAX_RING_PAGE_ORDER 4
> > +
> >  struct XenBlkDev {
> >      struct XenDevice    xendev;  /* must be first */
> >      char                *params;
> > @@ -94,7 +94,8 @@ struct XenBlkDev {
> >      bool                directiosafe;
> >      const char          *fileproto;
> >      const char          *filename;
> > -    int                 ring_ref;
> > +    unsigned int        ring_ref[1 << MAX_RING_PAGE_ORDER];
> > +    unsigned int        nr_ring_ref;
> >      void                *sring;
> >      int64_t             file_blk;
> >      int64_t             file_size;
> > @@ -110,6 +111,7 @@ struct XenBlkDev {
> >      int                 requests_total;
> >      int                 requests_inflight;
> >      int                 requests_finished;
> > +    unsigned int        max_requests;
> >
> >      /* Persistent grants extension */
> >      gboolean            feature_discard;
> > @@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
> >      struct ioreq *ioreq = NULL;
> >
> >      if (QLIST_EMPTY(&blkdev->freelist)) {
> > -        if (blkdev->requests_total >= max_requests) {
> > +        if (blkdev->requests_total >= blkdev->max_requests) {
> >              goto out;
> >          }
> >          /* allocate new struct */
> > @@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
> >          ioreq_runio_qemu_aio(ioreq);
> >      }
> >
> > -    if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
> > +    if (blkdev->more_work && blkdev->requests_inflight < blkdev->max_requests) {
> >          qemu_bh_schedule(blkdev->bh);
> >      }
> >  }
> > @@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
> >      blk_handle_requests(blkdev);
> >  }
> >
> > -/*
> > - * We need to account for the grant allocations requiring contiguous
> > - * chunks; the worst case number would be
> > - *     max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> > - * but in order to keep things simple just use
> > - *     2 * max_req * max_seg.
> > - */
> > -#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> > -
> >  static void blk_alloc(struct XenDevice *xendev)
> >  {
> >      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
> > @@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
> >      if (xen_mode != XEN_EMULATE) {
> >          batch_maps = 1;
> >      }
> > -    if (xengnttab_set_max_grants(xendev->gnttabdev,
> > -            MAX_GRANTS(max_requests, BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
> > -        xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> > -                      strerror(errno));
> > -    }
> >  }
> >
> >  static void blk_parse_discard(struct XenBlkDev *blkdev)
> > @@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
> >                            !blkdev->feature_grant_copy);
> >      xenstore_write_be_int(&blkdev->xendev, "info", info);
> >
> > +    xenstore_write_be_int(&blkdev->xendev, "max-ring-page-order",
> > +                          MAX_RING_PAGE_ORDER);
> > +
> >      blk_parse_discard(blkdev);
> >
> >      g_free(directiosafe);
> > @@ -1058,12 +1049,25 @@ out_error:
> >      return -1;
> >  }
> >
> > +/*
> > + * We need to account for the grant allocations requiring contiguous
> > + * chunks; the worst case number would be
> > + *     max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> > + * but in order to keep things simple just use
> > + *     2 * max_req * max_seg.
> > + */
> > +#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> > +
> >  static int blk_connect(struct XenDevice *xendev)
> >  {
> >      struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
> >      int pers, index, qflags;
> >      bool readonly = true;
> >      bool writethrough = true;
> > +    int order, ring_ref;
> > +    unsigned int ring_size, max_grants;
> > +    unsigned int i;
> > +    uint32_t *domids;
> >
> >      /* read-only ? */
> >      if (blkdev->directiosafe) {
> > @@ -1138,9 +1142,39 @@ static int blk_connect(struct XenDevice *xendev)
> >      xenstore_write_be_int64(&blkdev->xendev, "sectors",
> >                              blkdev->file_size / blkdev->file_blk);
> >
> > -    if (xenstore_read_fe_int(&blkdev->xendev, "ring-ref", &blkdev->ring_ref) == -1) {
> > +    if (xenstore_read_fe_int(&blkdev->xendev, "ring-page-order",
> > +                             &order) == -1) {
> > +        blkdev->nr_ring_ref = 1;
> > +
> > +        if (xenstore_read_fe_int(&blkdev->xendev, "ring-ref",
> > +                                 &ring_ref) == -1) {
> > +            return -1;
> > +        }
> > +        blkdev->ring_ref[0] = ring_ref;
> > +
> > +    } else if (order >= 0 && order <= MAX_RING_PAGE_ORDER) {
> > +        blkdev->nr_ring_ref = 1 << order;
> > +
> > +        for (i = 0; i < blkdev->nr_ring_ref; i++) {
> > +            char *key;
> > +
> > +            key = g_strdup_printf("ring-ref%u", i);
> > +            if (!key) {
> > +                return -1;
> > +            }
> > +
> > +            if (xenstore_read_fe_int(&blkdev->xendev, key,
> > +                                     &ring_ref) == -1) {
> 
> it looks like we are leaking key.
> 

Indeed it does. Good spot.

> 
> > +                return -1;
> > +            }
> > +            blkdev->ring_ref[i] = ring_ref;
> > +
> > +            g_free(key);
> > +        }
> > +    } else {
> >          return -1;
> 
> I would like to print a warning if the requested ring size exceeds our
> limit.
> 

Sure. That's a good idea.

Cheers,

  Paul

> 
> >      }
> > +
> >      if (xenstore_read_fe_int(&blkdev->xendev, "event-channel",
> >                               &blkdev->xendev.remote_port) == -1) {
> >          return -1;
> > @@ -1163,41 +1197,85 @@ static int blk_connect(struct XenDevice *xendev)
> >          blkdev->protocol = BLKIF_PROTOCOL_NATIVE;
> >      }
> >
> > -    blkdev->sring = xengnttab_map_grant_ref(blkdev->xendev.gnttabdev,
> > -                                            blkdev->xendev.dom,
> > -                                            blkdev->ring_ref,
> > -                                            PROT_READ | PROT_WRITE);
> > +    ring_size = XC_PAGE_SIZE * blkdev->nr_ring_ref;
> > +    switch (blkdev->protocol) {
> > +    case BLKIF_PROTOCOL_NATIVE:
> > +    {
> > +        blkdev->max_requests = __CONST_RING_SIZE(blkif, ring_size);
> > +        break;
> > +    }
> > +    case BLKIF_PROTOCOL_X86_32:
> > +    {
> > +        blkdev->max_requests = __CONST_RING_SIZE(blkif_x86_32, ring_size);
> > +        break;
> > +    }
> > +    case BLKIF_PROTOCOL_X86_64:
> > +    {
> > +        blkdev->max_requests = __CONST_RING_SIZE(blkif_x86_64, ring_size);
> > +        break;
> > +    }
> > +    default:
> > +        return -1;
> > +    }
> > +
> > +    /* Calculate the maximum number of grants needed by ioreqs */
> > +    max_grants = MAX_GRANTS(blkdev->max_requests,
> > +                            BLKIF_MAX_SEGMENTS_PER_REQUEST);
> > +    /* Add on the number needed for the ring pages */
> > +    max_grants += blkdev->nr_ring_ref;
> > +
> > +    if (xengnttab_set_max_grants(blkdev->xendev.gnttabdev, max_grants)) {
> > +        xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> > +                      strerror(errno));
> > +        return -1;
> > +    }
> > +
> > +    domids = g_malloc0_n(blkdev->nr_ring_ref, sizeof(uint32_t));
> > +    for (i = 0; i < blkdev->nr_ring_ref; i++) {
> > +        domids[i] = blkdev->xendev.dom;
> > +    }
> > +
> > +    blkdev->sring = xengnttab_map_grant_refs(blkdev->xendev.gnttabdev,
> > +                                             blkdev->nr_ring_ref,
> > +                                             domids,
> > +                                             blkdev->ring_ref,
> > +                                             PROT_READ | PROT_WRITE);
> > +
> > +    g_free(domids);
> > +
> >      if (!blkdev->sring) {
> >          return -1;
> >      }
> > +
> >      blkdev->cnt_map++;
> >
> >      switch (blkdev->protocol) {
> >      case BLKIF_PROTOCOL_NATIVE:
> >      {
> >          blkif_sring_t *sring_native = blkdev->sring;
> > -        BACK_RING_INIT(&blkdev->rings.native, sring_native, XC_PAGE_SIZE);
> > +        BACK_RING_INIT(&blkdev->rings.native, sring_native, ring_size);
> >          break;
> >      }
> >      case BLKIF_PROTOCOL_X86_32:
> >      {
> >          blkif_x86_32_sring_t *sring_x86_32 = blkdev->sring;
> >
> > -        BACK_RING_INIT(&blkdev->rings.x86_32_part, sring_x86_32, XC_PAGE_SIZE);
> > +        BACK_RING_INIT(&blkdev->rings.x86_32_part, sring_x86_32, ring_size);
> >          break;
> >      }
> >      case BLKIF_PROTOCOL_X86_64:
> >      {
> >          blkif_x86_64_sring_t *sring_x86_64 = blkdev->sring;
> >
> > -        BACK_RING_INIT(&blkdev->rings.x86_64_part, sring_x86_64, XC_PAGE_SIZE);
> > +        BACK_RING_INIT(&blkdev->rings.x86_64_part, sring_x86_64, ring_size);
> >          break;
> >      }
> >      }
> >
> >      if (blkdev->feature_persistent) {
> >          /* Init persistent grants */
> > -        blkdev->max_grants = max_requests * BLKIF_MAX_SEGMENTS_PER_REQUEST;
> > +        blkdev->max_grants = blkdev->max_requests *
> > +            BLKIF_MAX_SEGMENTS_PER_REQUEST;
> >          blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)int_cmp,
> >                                               NULL, NULL,
> >                                               batch_maps ?
> > @@ -1209,9 +1287,9 @@ static int blk_connect(struct XenDevice *xendev)
> >
> >      xen_be_bind_evtchn(&blkdev->xendev);
> >
> > -    xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
> > +    xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
> >                    "remote port %d, local port %d\n",
> > -                  blkdev->xendev.protocol, blkdev->ring_ref,
> > +                  blkdev->xendev.protocol, blkdev->nr_ring_ref,
> >                    blkdev->xendev.remote_port, blkdev->xendev.local_port);
> >      return 0;
> >  }
> > @@ -1228,7 +1306,8 @@ static void blk_disconnect(struct XenDevice *xendev)
> >      xen_pv_unbind_evtchn(&blkdev->xendev);
> >
> >      if (blkdev->sring) {
> > -        xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring, 1);
> > +        xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
> > +                        blkdev->nr_ring_ref);
> >          blkdev->cnt_map--;
> >          blkdev->sring = NULL;
> >      }
> > --
> > 2.11.0
> >


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available
  2017-06-21 10:40         ` Paul Durrant
@ 2017-06-21 10:50             ` Roger Pau Monne
  0 siblings, 0 replies; 28+ messages in thread
From: Roger Pau Monne @ 2017-06-21 10:50 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Kevin Wolf, qemu-block, qemu-devel,
	Max Reitz, Anthony Perard, xen-devel

On Wed, Jun 21, 2017 at 11:40:00AM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Qemu-devel [mailto:qemu-devel-
> > bounces+paul.durrant=citrix.com@nongnu.org] On Behalf Of Paul Durrant
> > Sent: 21 June 2017 10:36
> > To: Roger Pau Monne <roger.pau@citrix.com>; Stefano Stabellini
> > <sstabellini@kernel.org>
> > Cc: Kevin Wolf <kwolf@redhat.com>; qemu-block@nongnu.org; qemu-
> > devel@nongnu.org; Max Reitz <mreitz@redhat.com>; Anthony Perard
> > <anthony.perard@citrix.com>; xen-devel@lists.xenproject.org
> > Subject: Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-
> > persistent if grant copy is not available
> > 
> > > -----Original Message-----
> > > From: Roger Pau Monne
> > > Sent: 21 June 2017 10:18
> > > To: Stefano Stabellini <sstabellini@kernel.org>
> > > Cc: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> > devel@lists.xenproject.org;
> > > qemu-devel@nongnu.org; qemu-block@nongnu.org; Anthony Perard
> > > <anthony.perard@citrix.com>; Kevin Wolf <kwolf@redhat.com>; Max
> > Reitz
> > > <mreitz@redhat.com>
> > > Subject: Re: [PATCH 1/3] xen-disk: only advertize feature-persistent if
> > grant
> > > copy is not available
> > >
> > > On Tue, Jun 20, 2017 at 03:19:33PM -0700, Stefano Stabellini wrote:
> > > > On Tue, 20 Jun 2017, Paul Durrant wrote:
> > > > > If grant copy is available then it will always be used in preference to
> > > > > persistent maps. In this case feature-persistent should not be
> > advertized
> > > > > to the frontend, otherwise it may needlessly copy data into persistently
> > > > > granted buffers.
> > > > >
> > > > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > > >
> > > > CC'ing Roger.
> > > >
> > > > It is true that using feature-persistent together with grant copies is a
> > > > very bad idea.
> > > >
> > > > But this change establishes an explicit preference of
> > > > feature_grant_copy over feature-persistent in the xen_disk backend. It
> > > > is not obvious to me that it should be the case.
> > > >
> > > > Why is feature_grant_copy (without feature-persistent) better than
> > > > feature-persistent (without feature_grant_copy)? Shouldn't we simply
> > > > avoid grant copies to copy data to persistent grants?
> > >
> > > When using persistent grants the frontend must always copy data from
> > > the buffer to the persistent grant, there's no way to avoid this.
> > >
> > > Using grant_copy we move the copy from the frontend to the backend,
> > > which means the CPU time of the copy is accounted to the backend. This
> > > is not ideal, but IMHO it's better than persistent grants because it
> > > avoids keeping a pool of mapped grants that consume memory and make
> > > the code more complex.
> > >
> > > Do you have some performance data showing the difference between
> > > persistent grants vs grant copy?
> > >
> > 
> > No, but I can get some :-)
> > 
> > For a little background... I've been trying to push throughput of fio running in
> > a debian stretch guest on my skull canyon NUC. When I started out, I was
> > getting ~100MBbs. When I finished, with this patch, the IOThreads one, the
> > multi-page ring one and a bit of hackery to turn off all the aio flushes that
> > seem to occur even if the image is opened with O_DIRECT, I was getting
> > ~960MB/s... which is about line rate for the SSD in the NUC.
> > 
> > So, I'll force use of persistent grants on and see what sort of throughput I
> > get.
> 
> A quick test with grant copy forced off (causing persistent grants to be used)... My VM is debian stretch using a 16 page shared ring from blkfront. The image backing xvdb is a fully inflated 10G qcow2.
> 
> root@dhcp-237-70:~# fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64 --size=10G --readwrite=randwrite --ramp_time=4
> test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio, iodepth=64
> fio-2.16
> Starting 1 process
> Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/539.4MB/0KB /s] [0/1078/0 iops] [eta 00m:05s]
> test: (groupid=0, jobs=1): err= 0: pid=633: Wed Jun 21 06:26:06 2017
>   write: io=6146.6MB, bw=795905KB/s, iops=1546, runt=  7908msec
>   cpu          : usr=2.07%, sys=34.00%, ctx=4490, majf=0, minf=1
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=166.9%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
>      issued    : total=r=0/w=12230/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=64
> 
> Run status group 0 (all jobs):
>   WRITE: io=6146.6MB, aggrb=795904KB/s, minb=795904KB/s, maxb=795904KB/s, mint=7908msec, maxt=7908msec
> 
> Disk stats (read/write):
>   xvdb: ios=54/228860, merge=0/2230616, ticks=16/5403048, in_queue=5409068, util=98.26%
> 
> The dom0 cpu usage for the relevant IOThread was ~60%
> 
> The same test with grant copy...
> 
> root@dhcp-237-70:~# fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64 --size=10G --readwrite=randwrite --ramp_time=4
> test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio, iodepth=64
> fio-2.16
> Starting 1 process
> Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/607.7MB/0KB /s] [0/1215/0 iops] [eta 00m:05s]
> test: (groupid=0, jobs=1): err= 0: pid=483: Wed Jun 21 06:35:14 2017
>   write: io=6232.0MB, bw=810976KB/s, iops=1575, runt=  7869msec
>   cpu          : usr=2.44%, sys=37.42%, ctx=3570, majf=0, minf=1
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=164.6%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
>      issued    : total=r=0/w=12401/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=64
> 
> Run status group 0 (all jobs):
>   WRITE: io=6232.0MB, aggrb=810975KB/s, minb=810975KB/s, maxb=810975KB/s, mint=7869msec, maxt=7869msec
> 
> Disk stats (read/write):
>   xvdb: ios=54/229583, merge=0/2235879, ticks=16/5409500, in_queue=5415080, util=98.27%
> 
> So, higher throughput and iops. The dom0 cpu usage was running at ~70%, so there is definitely more dom0 overhead by using grant copy. The usage of grant copy could probably be improved though, since the current code issues a copy ioctl per ioreq. With some batching I suspect some, if not all, of the extra overhead could be recovered.

There's almost always going to be more CPU overhead with grant-copy,
since when using persistent grants QEMU can avoid all (or almost all)
of the ioctls to the grant device.

For the persistent-grants benchmark, did you warm up the grant cache
first? (ie: are those results from a first run of fio?)

In any case, I'm happy to use something different than persistent
grants as long as the performance is similar.

Roger.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available
@ 2017-06-21 10:50             ` Roger Pau Monne
  0 siblings, 0 replies; 28+ messages in thread
From: Roger Pau Monne @ 2017-06-21 10:50 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Wolf, Stefano Stabellini, qemu-block, qemu-devel,
	Max Reitz, Anthony Perard, xen-devel

On Wed, Jun 21, 2017 at 11:40:00AM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Qemu-devel [mailto:qemu-devel-
> > bounces+paul.durrant=citrix.com@nongnu.org] On Behalf Of Paul Durrant
> > Sent: 21 June 2017 10:36
> > To: Roger Pau Monne <roger.pau@citrix.com>; Stefano Stabellini
> > <sstabellini@kernel.org>
> > Cc: Kevin Wolf <kwolf@redhat.com>; qemu-block@nongnu.org; qemu-
> > devel@nongnu.org; Max Reitz <mreitz@redhat.com>; Anthony Perard
> > <anthony.perard@citrix.com>; xen-devel@lists.xenproject.org
> > Subject: Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-
> > persistent if grant copy is not available
> > 
> > > -----Original Message-----
> > > From: Roger Pau Monne
> > > Sent: 21 June 2017 10:18
> > > To: Stefano Stabellini <sstabellini@kernel.org>
> > > Cc: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> > devel@lists.xenproject.org;
> > > qemu-devel@nongnu.org; qemu-block@nongnu.org; Anthony Perard
> > > <anthony.perard@citrix.com>; Kevin Wolf <kwolf@redhat.com>; Max
> > Reitz
> > > <mreitz@redhat.com>
> > > Subject: Re: [PATCH 1/3] xen-disk: only advertize feature-persistent if
> > grant
> > > copy is not available
> > >
> > > On Tue, Jun 20, 2017 at 03:19:33PM -0700, Stefano Stabellini wrote:
> > > > On Tue, 20 Jun 2017, Paul Durrant wrote:
> > > > > If grant copy is available then it will always be used in preference to
> > > > > persistent maps. In this case feature-persistent should not be
> > advertized
> > > > > to the frontend, otherwise it may needlessly copy data into persistently
> > > > > granted buffers.
> > > > >
> > > > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > > >
> > > > CC'ing Roger.
> > > >
> > > > It is true that using feature-persistent together with grant copies is a
> > > > a very bad idea.
> > > >
> > > > But this change enstablishes an explicit preference of
> > > > feature_grant_copy over feature-persistent in the xen_disk backend. It
> > > > is not obvious to me that it should be the case.
> > > >
> > > > Why is feature_grant_copy (without feature-persistent) better than
> > > > feature-persistent (without feature_grant_copy)? Shouldn't we simply
> > > > avoid grant copies to copy data to persistent grants?
> > >
> > > When using persistent grants the frontend must always copy data from
> > > the buffer to the persistent grant, there's no way to avoid this.
> > >
> > > Using grant_copy we move the copy from the frontend to the backend,
> > > which means the CPU time of the copy is accounted to the backend. This
> > > is not ideal, but IMHO it's better than persistent grants because it
> > > avoids keeping a pool of mapped grants that consume memory and make
> > > the code more complex.
> > >
> > > Do you have some performance data showing the difference between
> > > persistent grants vs grant copy?
> > >
> > 
> > No, but I can get some :-)
> > 
> > For a little background... I've been trying to push throughput of fio running in
> > a debian stretch guest on my skull canyon NUC. When I started out, I was
> > getting ~100MBbs. When I finished, with this patch, the IOThreads one, the
> > multi-page ring one and a bit of hackery to turn off all the aio flushes that
> > seem to occur even if the image is opened with O_DIRECT, I was getting
> > ~960Mbps... which is about line rate for the SSD in the in NUC.
> > 
> > So, I'll force use of persistent grants on and see what sort of throughput I
> > get.
> 
> A quick test with grant copy forced off (causing persistent grants to be used)... My VM is debian stretch using a 16 page shared ring from blkfront. The image backing xvdb is a fully inflated 10G qcow2.
> 
> root@dhcp-237-70:~# fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64 --size=10G --readwrite=randwrite --ramp_time=4
> test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio, iodepth=64
> fio-2.16
> Starting 1 process
> Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/539.4MB/0KB /s] [0/1078/0 iops] [eta 00m:05s]
> test: (groupid=0, jobs=1): err= 0: pid=633: Wed Jun 21 06:26:06 2017
>   write: io=6146.6MB, bw=795905KB/s, iops=1546, runt=  7908msec
>   cpu          : usr=2.07%, sys=34.00%, ctx=4490, majf=0, minf=1
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=166.9%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
>      issued    : total=r=0/w=12230/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=64
> 
> Run status group 0 (all jobs):
>   WRITE: io=6146.6MB, aggrb=795904KB/s, minb=795904KB/s, maxb=795904KB/s, mint=7908msec, maxt=7908msec
> 
> Disk stats (read/write):
>   xvdb: ios=54/228860, merge=0/2230616, ticks=16/5403048, in_queue=5409068, util=98.26%
> 
> The dom0 cpu usage for the relevant IOThread was ~60%
> 
> The same test with grant copy...
> 
> root@dhcp-237-70:~# fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64 --size=10G --readwrite=randwrite --ramp_time=4
> test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio, iodepth=64
> fio-2.16
> Starting 1 process
> Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/607.7MB/0KB /s] [0/1215/0 iops] [eta 00m:05s]
> test: (groupid=0, jobs=1): err= 0: pid=483: Wed Jun 21 06:35:14 2017
>   write: io=6232.0MB, bw=810976KB/s, iops=1575, runt=  7869msec
>   cpu          : usr=2.44%, sys=37.42%, ctx=3570, majf=0, minf=1
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=164.6%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
>      issued    : total=r=0/w=12401/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=64
> 
> Run status group 0 (all jobs):
>   WRITE: io=6232.0MB, aggrb=810975KB/s, minb=810975KB/s, maxb=810975KB/s, mint=7869msec, maxt=7869msec
> 
> Disk stats (read/write):
>   xvdb: ios=54/229583, merge=0/2235879, ticks=16/5409500, in_queue=5415080, util=98.27%
> 
> So, higher throughput and iops. The dom0 cpu usage was running at ~70%, so there is definitely more dom0 overhead by using grant copy. The usage of grant copy could probably be improved though, since the current code issues a copy ioctl per ioreq. With some batching I suspect some, if not all, of the extra overhead could be recovered.
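The batching idea above can be sketched with a self-contained toy model. This deliberately uses no Xen APIs (the real backend would pack segments into a single gntdev grant-copy ioctl); all names below are hypothetical, and the point is only to show how batching amortizes the per-ioctl cost:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a grant-copy segment; not Xen's real API. */
struct copy_seg {
    unsigned int len;
};

/* Counts how many (simulated) ioctls a submission strategy issues. */
static int ioctl_calls;

/* One simulated "ioctl" covering a span of segments. */
static void grant_copy_submit(const struct copy_seg *segs, size_t n)
{
    (void)segs;
    (void)n;
    ioctl_calls++;
}

/* Unbatched submission: one ioctl per segment (an abstract model of the
 * per-ioreq behaviour described above). Returns the running ioctl count. */
static int submit_unbatched(const struct copy_seg *segs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        grant_copy_submit(&segs[i], 1);
    }
    return ioctl_calls;
}

/* Batched submission: up to 'batch' segments per ioctl, so the number of
 * kernel transitions drops from n to ceil(n / batch). */
static int submit_batched(const struct copy_seg *segs, size_t n, size_t batch)
{
    for (size_t i = 0; i < n; i += batch) {
        size_t chunk = (n - i < batch) ? n - i : batch;
        grant_copy_submit(&segs[i], chunk);
    }
    return ioctl_calls;
}
```

With 10 segments and a batch size of 4, the batched path issues 3 simulated ioctls instead of 10, which is the overhead reduction the paragraph above speculates about.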

There's almost always going to be more CPU overhead with grant-copy,
since when using persistent grants QEMU can avoid all (or almost all)
of the ioctls to the grant device.

For the persistent-grants benchmark, did you warm up the grant cache
first? (ie: are those results from a first run of fio?)

In any case, I'm happy to use something different than persistent
grants as long as the performance is similar.

Roger.


* Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available
  2017-06-21 10:50             ` Roger Pau Monne
  (?)
@ 2017-06-21 11:05             ` Paul Durrant
  -1 siblings, 0 replies; 28+ messages in thread
From: Paul Durrant @ 2017-06-21 11:05 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Kevin Wolf, qemu-block, qemu-devel,
	Max Reitz, Anthony Perard, xen-devel

> -----Original Message-----
> From: Roger Pau Monne
> Sent: 21 June 2017 11:51
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Kevin Wolf
> <kwolf@redhat.com>; qemu-block@nongnu.org; qemu-devel@nongnu.org;
> Max Reitz <mreitz@redhat.com>; Anthony Perard
> <anthony.perard@citrix.com>; xen-devel@lists.xenproject.org
> Subject: Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-
> persistent if grant copy is not available
> 
> On Wed, Jun 21, 2017 at 11:40:00AM +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Qemu-devel [mailto:qemu-devel-
> > > bounces+paul.durrant=citrix.com@nongnu.org] On Behalf Of Paul
> Durrant
> > > Sent: 21 June 2017 10:36
> > > To: Roger Pau Monne <roger.pau@citrix.com>; Stefano Stabellini
> > > <sstabellini@kernel.org>
> > > Cc: Kevin Wolf <kwolf@redhat.com>; qemu-block@nongnu.org; qemu-
> > > devel@nongnu.org; Max Reitz <mreitz@redhat.com>; Anthony Perard
> > > <anthony.perard@citrix.com>; xen-devel@lists.xenproject.org
> > > Subject: Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-
> > > persistent if grant copy is not available
> > >
> > > > -----Original Message-----
> > > > From: Roger Pau Monne
> > > > Sent: 21 June 2017 10:18
> > > > To: Stefano Stabellini <sstabellini@kernel.org>
> > > > Cc: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> > > devel@lists.xenproject.org;
> > > > qemu-devel@nongnu.org; qemu-block@nongnu.org; Anthony Perard
> > > > <anthony.perard@citrix.com>; Kevin Wolf <kwolf@redhat.com>; Max
> > > Reitz
> > > > <mreitz@redhat.com>
> > > > Subject: Re: [PATCH 1/3] xen-disk: only advertize feature-persistent if
> > > grant
> > > > copy is not available
> > > >
> > > > On Tue, Jun 20, 2017 at 03:19:33PM -0700, Stefano Stabellini wrote:
> > > > > On Tue, 20 Jun 2017, Paul Durrant wrote:
> > > > > > If grant copy is available then it will always be used in preference to
> > > > > > persistent maps. In this case feature-persistent should not be
> > > advertized
> > > > > > to the frontend, otherwise it may needlessly copy data into
> persistently
> > > > > > granted buffers.
> > > > > >
> > > > > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > > > >
> > > > > CC'ing Roger.
> > > > >
> > > > > It is true that using feature-persistent together with grant copies is
> > > > > a very bad idea.
> > > > >
> > > > > But this change establishes an explicit preference of
> > > > > feature_grant_copy over feature-persistent in the xen_disk backend.
> It
> > > > > is not obvious to me that it should be the case.
> > > > >
> > > > > Why is feature_grant_copy (without feature-persistent) better than
> > > > > feature-persistent (without feature_grant_copy)? Shouldn't we
> simply
> > > > > avoid grant copies to copy data to persistent grants?
> > > >
> > > > When using persistent grants the frontend must always copy data from
> > > > the buffer to the persistent grant, there's no way to avoid this.
> > > >
> > > > Using grant_copy we move the copy from the frontend to the backend,
> > > > which means the CPU time of the copy is accounted to the backend.
> This
> > > > is not ideal, but IMHO it's better than persistent grants because it
> > > > avoids keeping a pool of mapped grants that consume memory and
> make
> > > > the code more complex.
> > > >
> > > > Do you have some performance data showing the difference between
> > > > persistent grants vs grant copy?
> > > >
> > >
> > > No, but I can get some :-)
> > >
> > > For a little background... I've been trying to push throughput of fio
> running in
> > > a debian stretch guest on my skull canyon NUC. When I started out, I was
> > > getting ~100MB/s. When I finished, with this patch, the IOThreads one,
> the
> > > multi-page ring one and a bit of hackery to turn off all the aio flushes that
> > > seem to occur even if the image is opened with O_DIRECT, I was getting
> > > ~960MB/s... which is about line rate for the SSD in the NUC.
> > >
> > > So, I'll force use of persistent grants on and see what sort of throughput I
> > > get.
> >
> > A quick test with grant copy forced off (causing persistent grants to be
> used)... My VM is debian stretch using a 16 page shared ring from blkfront.
> The image backing xvdb is a fully inflated 10G qcow2.
> >
> > root@dhcp-237-70:~# fio --randrepeat=1 --ioengine=libaio --direct=0 --
> gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64 -
> -size=10G --readwrite=randwrite --ramp_time=4
> > test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K,
> ioengine=libaio, iodepth=64
> > fio-2.16
> > Starting 1 process
> > Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/539.4MB/0KB /s] [0/1078/0 iops]
> [eta 00m:05s]
> > test: (groupid=0, jobs=1): err= 0: pid=633: Wed Jun 21 06:26:06 2017
> >   write: io=6146.6MB, bw=795905KB/s, iops=1546, runt=  7908msec
> >   cpu          : usr=2.07%, sys=34.00%, ctx=4490, majf=0, minf=1
> >   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%,
> >=64=166.9%
> >      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
> >      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,
> >=64=0.0%
> >      issued    : total=r=0/w=12230/d=0, short=r=0/w=0/d=0,
> drop=r=0/w=0/d=0
> >      latency   : target=0, window=0, percentile=100.00%, depth=64
> >
> > Run status group 0 (all jobs):
> >   WRITE: io=6146.6MB, aggrb=795904KB/s, minb=795904KB/s,
> maxb=795904KB/s, mint=7908msec, maxt=7908msec
> >
> > Disk stats (read/write):
> >   xvdb: ios=54/228860, merge=0/2230616, ticks=16/5403048,
> in_queue=5409068, util=98.26%
> >
> > The dom0 cpu usage for the relevant IOThread was ~60%
> >
> > The same test with grant copy...
> >
> > root@dhcp-237-70:~# fio --randrepeat=1 --ioengine=libaio --direct=0 --
> gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64 -
> -size=10G --readwrite=randwrite --ramp_time=4
> > test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K,
> ioengine=libaio, iodepth=64
> > fio-2.16
> > Starting 1 process
> > Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/607.7MB/0KB /s] [0/1215/0 iops]
> [eta 00m:05s]
> > test: (groupid=0, jobs=1): err= 0: pid=483: Wed Jun 21 06:35:14 2017
> >   write: io=6232.0MB, bw=810976KB/s, iops=1575, runt=  7869msec
> >   cpu          : usr=2.44%, sys=37.42%, ctx=3570, majf=0, minf=1
> >   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%,
> >=64=164.6%
> >      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
> >      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,
> >=64=0.0%
> >      issued    : total=r=0/w=12401/d=0, short=r=0/w=0/d=0,
> drop=r=0/w=0/d=0
> >      latency   : target=0, window=0, percentile=100.00%, depth=64
> >
> > Run status group 0 (all jobs):
> >   WRITE: io=6232.0MB, aggrb=810975KB/s, minb=810975KB/s,
> maxb=810975KB/s, mint=7869msec, maxt=7869msec
> >
> > Disk stats (read/write):
> >   xvdb: ios=54/229583, merge=0/2235879, ticks=16/5409500,
> in_queue=5415080, util=98.27%
> >
> > So, higher throughput and iops. The dom0 cpu usage was running at ~70%,
> so there is definitely more dom0 overhead by using grant copy. The usage of
> grant copy could probably be improved though, since the current code issues
> a copy ioctl per ioreq. With some batching I suspect some, if not all, of the
> extra overhead could be recovered.
> 
> There's almost always going to be more CPU overhead with grant-copy,
> since when using persistent grants QEMU can avoid all (or almost all)
> of the ioctls to the grant device.
> 
> For the persistent-grants benchmark, did you warm up the grant cache
> first? (ie: are those results from a first run of fio?)
> 

No, that was the third run I did (and the same in the grant copy case).

> In any case, I'm happy to use something different than persistent
> grants as long as the performance is similar.
> 

Yes, I'd even suggest removing the persistent grant code from xen_disk.c in the interest of reducing the complexity of the code... but I guess that depends on how likely folks are to be using a new QEMU with an older set of Xen libraries (and hence not having grant copy available to them).
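The advertisement rule from patch 1 that this exchange keeps coming back to can be sketched as a small decision function. This is an illustrative model, not xen_disk.c's actual code: the struct and function names are hypothetical, standing in for the backend's runtime probe of grant-copy support and its xenstore feature advertisement:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the backend's feature negotiation. */
struct blk_features {
    bool grant_copy;           /* backend capability, probed at runtime */
    bool advertise_persistent; /* what gets written to the xenstore node */
};

/* Advertise feature-persistent only when grant copy is NOT available.
 * When grant copy is present it is always preferred, so advertising
 * feature-persistent would only make the frontend needlessly copy data
 * into persistently granted buffers (patch 1's rationale). */
static void negotiate_features(struct blk_features *f)
{
    f->advertise_persistent = !f->grant_copy;
}
```

On an older Xen library stack without grant copy, the same function falls back to advertising feature-persistent, which is why dropping the persistent-grant code entirely is a compatibility question rather than a pure simplification.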

  Paul

> Roger.



end of thread, other threads:[~2017-06-21 11:06 UTC | newest]

Thread overview: 28+ messages
-- links below jump to the message on this page --
2017-06-20 13:47 [Qemu-devel] [PATCH 0/3] xen-disk: performance improvements Paul Durrant
2017-06-20 13:47 ` Paul Durrant
2017-06-20 13:47 ` [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available Paul Durrant
2017-06-20 13:47   ` Paul Durrant
2017-06-20 22:19   ` [Qemu-devel] " Stefano Stabellini
2017-06-21  9:17     ` Roger Pau Monné
2017-06-21  9:17       ` Roger Pau Monné
2017-06-21  9:35       ` [Qemu-devel] " Paul Durrant
2017-06-21 10:40         ` Paul Durrant
2017-06-21 10:50           ` Roger Pau Monne
2017-06-21 10:50             ` Roger Pau Monne
2017-06-21 11:05             ` Paul Durrant
2017-06-21 11:05             ` Paul Durrant
2017-06-21 10:40         ` Paul Durrant
2017-06-21  9:35       ` Paul Durrant
2017-06-20 22:19   ` Stefano Stabellini
2017-06-20 13:47 ` [Qemu-devel] [PATCH 2/3] xen-disk: add support for multi-page shared rings Paul Durrant
2017-06-20 13:47   ` Paul Durrant
2017-06-20 22:51   ` [Qemu-devel] " Stefano Stabellini
2017-06-20 22:51     ` Stefano Stabellini
2017-06-21 10:42     ` [Qemu-devel] " Paul Durrant
2017-06-21 10:42       ` Paul Durrant
2017-06-20 13:47 ` [Qemu-devel] [PATCH 3/3] xen-disk: use an IOThread per instance Paul Durrant
2017-06-20 13:47   ` Paul Durrant
2017-06-20 16:07   ` Paolo Bonzini
2017-06-20 16:07   ` [Qemu-devel] " Paolo Bonzini
2017-06-20 16:09     ` Paul Durrant
2017-06-20 16:09       ` Paul Durrant
