* [Qemu-devel] [PATCH v2 0/3] xen-disk: performance improvements
From: Paul Durrant @ 2017-06-21 12:52 UTC
To: xen-devel, qemu-devel, qemu-block; +Cc: Paul Durrant
Paul Durrant (3):
xen-disk: only advertise feature-persistent if grant copy is not
available
xen-disk: add support for multi-page shared rings
xen-disk: use an IOThread per instance
hw/block/trace-events | 7 ++
hw/block/xen_disk.c | 228 +++++++++++++++++++++++++++++++++++++++-----------
2 files changed, 188 insertions(+), 47 deletions(-)
--
2.11.0
* [Qemu-devel] [PATCH v2 1/3] xen-disk: only advertise feature-persistent if grant copy is not available
From: Paul Durrant @ 2017-06-21 12:52 UTC
To: xen-devel, qemu-devel, qemu-block
Cc: Paul Durrant, Stefano Stabellini, Anthony Perard, Kevin Wolf, Max Reitz
If grant copy is available then it will always be used in preference to
persistent maps. In this case feature-persistent should not be advertised
to the frontend, otherwise it may needlessly copy data into persistently
granted buffers.
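[Editor's illustration, not part of the patch: the change only matters because
frontends key their behaviour off this flag. A minimal sketch of the
frontend-side check, modelled on the Linux blkfront driver (the helper and
field names below are the kernel's, shown only as an assumption about typical
frontend behaviour):

    /* Frontend side: use persistent grants only if the backend
     * advertised them; otherwise fall back to plain grant mappings,
     * which a grant-copy-capable backend services without mapping. */
    info->feature_persistent =
        !!xenbus_read_unsigned(info->xbdev->otherend,
                               "feature-persistent", 0);

With the patch applied, a grant-copy-capable backend writes 0, so the guest
stops copying its data into long-lived persistently granted buffers.]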
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
---
hw/block/xen_disk.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 3a22805fbc..9b06e3aa81 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -1023,11 +1023,18 @@ static int blk_init(struct XenDevice *xendev)
blkdev->file_blk = BLOCK_SIZE;
+ blkdev->feature_grant_copy =
+ (xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
+
+ xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
+ blkdev->feature_grant_copy ? "enabled" : "disabled");
+
/* fill info
* blk_connect supplies sector-size and sectors
*/
xenstore_write_be_int(&blkdev->xendev, "feature-flush-cache", 1);
- xenstore_write_be_int(&blkdev->xendev, "feature-persistent", 1);
+ xenstore_write_be_int(&blkdev->xendev, "feature-persistent",
+ !blkdev->feature_grant_copy);
xenstore_write_be_int(&blkdev->xendev, "info", info);
blk_parse_discard(blkdev);
@@ -1202,12 +1209,6 @@ static int blk_connect(struct XenDevice *xendev)
xen_be_bind_evtchn(&blkdev->xendev);
- blkdev->feature_grant_copy =
- (xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
-
- xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
- blkdev->feature_grant_copy ? "enabled" : "disabled");
-
xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
"remote port %d, local port %d\n",
blkdev->xendev.protocol, blkdev->ring_ref,
--
2.11.0
* [Qemu-devel] [PATCH v2 2/3] xen-disk: add support for multi-page shared rings
From: Paul Durrant @ 2017-06-21 12:52 UTC
To: xen-devel, qemu-devel, qemu-block
Cc: Paul Durrant, Stefano Stabellini, Anthony Perard, Kevin Wolf, Max Reitz
The blkif protocol has had provision for negotiation of multi-page shared
rings for some time now, and many guest OSes have support in their frontend
drivers.
This patch makes the necessary modifications for xen-disk to support a
shared ring of up to order 4 (i.e. 16 pages).
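[Editor's illustration, not part of the patch: a quick sketch of what each
ring-page-order buys, assuming 4 KiB pages and the native blkif protocol,
whose 112-byte ring entries round down (to a power of two) to 32 slots per
page:

    #include <stdio.h>

    #define XC_PAGE_SIZE   4096
    #define SLOTS_PER_PAGE 32    /* native blkif, power-of-two rounded */

    int main(void)
    {
        for (int order = 0; order <= 4; order++) {
            unsigned int pages = 1u << order;
            printf("order %d: %2u page(s), %6u-byte ring, %3u request slots\n",
                   order, pages, pages * XC_PAGE_SIZE,
                   pages * SLOTS_PER_PAGE);
        }
        return 0;
    }

So an order-4 ring lifts the in-flight request limit from 32 to 512, which is
where the extra parallelism comes from.]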
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
v2:
- Fix memory leak in error path
- Print warning if ring-page-order exceeds limits
---
hw/block/xen_disk.c | 144 +++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 113 insertions(+), 31 deletions(-)
diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 9b06e3aa81..0e6513708e 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -36,8 +36,6 @@
static int batch_maps = 0;
-static int max_requests = 32;
-
/* ------------------------------------------------------------- */
#define BLOCK_SIZE 512
@@ -84,6 +82,8 @@ struct ioreq {
BlockAcctCookie acct;
};
+#define MAX_RING_PAGE_ORDER 4
+
struct XenBlkDev {
struct XenDevice xendev; /* must be first */
char *params;
@@ -94,7 +94,8 @@ struct XenBlkDev {
bool directiosafe;
const char *fileproto;
const char *filename;
- int ring_ref;
+ unsigned int ring_ref[1 << MAX_RING_PAGE_ORDER];
+ unsigned int nr_ring_ref;
void *sring;
int64_t file_blk;
int64_t file_size;
@@ -110,6 +111,7 @@ struct XenBlkDev {
int requests_total;
int requests_inflight;
int requests_finished;
+ unsigned int max_requests;
/* Persistent grants extension */
gboolean feature_discard;
@@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
struct ioreq *ioreq = NULL;
if (QLIST_EMPTY(&blkdev->freelist)) {
- if (blkdev->requests_total >= max_requests) {
+ if (blkdev->requests_total >= blkdev->max_requests) {
goto out;
}
/* allocate new struct */
@@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
ioreq_runio_qemu_aio(ioreq);
}
- if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
+ if (blkdev->more_work && blkdev->requests_inflight < blkdev->max_requests) {
qemu_bh_schedule(blkdev->bh);
}
}
@@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
blk_handle_requests(blkdev);
}
-/*
- * We need to account for the grant allocations requiring contiguous
- * chunks; the worst case number would be
- * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
- * but in order to keep things simple just use
- * 2 * max_req * max_seg.
- */
-#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
-
static void blk_alloc(struct XenDevice *xendev)
{
struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
@@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
if (xen_mode != XEN_EMULATE) {
batch_maps = 1;
}
- if (xengnttab_set_max_grants(xendev->gnttabdev,
- MAX_GRANTS(max_requests, BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
- xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
- strerror(errno));
- }
}
static void blk_parse_discard(struct XenBlkDev *blkdev)
@@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
!blkdev->feature_grant_copy);
xenstore_write_be_int(&blkdev->xendev, "info", info);
+ xenstore_write_be_int(&blkdev->xendev, "max-ring-page-order",
+ MAX_RING_PAGE_ORDER);
+
blk_parse_discard(blkdev);
g_free(directiosafe);
@@ -1058,12 +1049,25 @@ out_error:
return -1;
}
+/*
+ * We need to account for the grant allocations requiring contiguous
+ * chunks; the worst case number would be
+ * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
+ * but in order to keep things simple just use
+ * 2 * max_req * max_seg.
+ */
+#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
+
static int blk_connect(struct XenDevice *xendev)
{
struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
int pers, index, qflags;
bool readonly = true;
bool writethrough = true;
+ int order, ring_ref;
+ unsigned int ring_size, max_grants;
+ unsigned int i;
+ uint32_t *domids;
/* read-only ? */
if (blkdev->directiosafe) {
@@ -1138,9 +1142,42 @@ static int blk_connect(struct XenDevice *xendev)
xenstore_write_be_int64(&blkdev->xendev, "sectors",
blkdev->file_size / blkdev->file_blk);
- if (xenstore_read_fe_int(&blkdev->xendev, "ring-ref", &blkdev->ring_ref) == -1) {
+ if (xenstore_read_fe_int(&blkdev->xendev, "ring-page-order",
+ &order) == -1) {
+ blkdev->nr_ring_ref = 1;
+
+ if (xenstore_read_fe_int(&blkdev->xendev, "ring-ref",
+ &ring_ref) == -1) {
+ return -1;
+ }
+ blkdev->ring_ref[0] = ring_ref;
+
+ } else if (order >= 0 && order <= MAX_RING_PAGE_ORDER) {
+ blkdev->nr_ring_ref = 1 << order;
+
+ for (i = 0; i < blkdev->nr_ring_ref; i++) {
+ char *key;
+
+ key = g_strdup_printf("ring-ref%u", i);
+ if (!key) {
+ return -1;
+ }
+
+ if (xenstore_read_fe_int(&blkdev->xendev, key,
+ &ring_ref) == -1) {
+ g_free(key);
+ return -1;
+ }
+ blkdev->ring_ref[i] = ring_ref;
+
+ g_free(key);
+ }
+ } else {
+ xen_pv_printf(xendev, 0, "invalid ring-page-order: %d\n",
+ order);
return -1;
}
+
if (xenstore_read_fe_int(&blkdev->xendev, "event-channel",
&blkdev->xendev.remote_port) == -1) {
return -1;
@@ -1163,41 +1200,85 @@ static int blk_connect(struct XenDevice *xendev)
blkdev->protocol = BLKIF_PROTOCOL_NATIVE;
}
- blkdev->sring = xengnttab_map_grant_ref(blkdev->xendev.gnttabdev,
- blkdev->xendev.dom,
- blkdev->ring_ref,
- PROT_READ | PROT_WRITE);
+ ring_size = XC_PAGE_SIZE * blkdev->nr_ring_ref;
+ switch (blkdev->protocol) {
+ case BLKIF_PROTOCOL_NATIVE:
+ {
+ blkdev->max_requests = __CONST_RING_SIZE(blkif, ring_size);
+ break;
+ }
+ case BLKIF_PROTOCOL_X86_32:
+ {
+ blkdev->max_requests = __CONST_RING_SIZE(blkif_x86_32, ring_size);
+ break;
+ }
+ case BLKIF_PROTOCOL_X86_64:
+ {
+ blkdev->max_requests = __CONST_RING_SIZE(blkif_x86_64, ring_size);
+ break;
+ }
+ default:
+ return -1;
+ }
+
+ /* Calculate the maximum number of grants needed by ioreqs */
+ max_grants = MAX_GRANTS(blkdev->max_requests,
+ BLKIF_MAX_SEGMENTS_PER_REQUEST);
+ /* Add on the number needed for the ring pages */
+ max_grants += blkdev->nr_ring_ref;
+
+ if (xengnttab_set_max_grants(blkdev->xendev.gnttabdev, max_grants)) {
+ xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
+ strerror(errno));
+ return -1;
+ }
+
+ domids = g_malloc0_n(blkdev->nr_ring_ref, sizeof(uint32_t));
+ for (i = 0; i < blkdev->nr_ring_ref; i++) {
+ domids[i] = blkdev->xendev.dom;
+ }
+
+ blkdev->sring = xengnttab_map_grant_refs(blkdev->xendev.gnttabdev,
+ blkdev->nr_ring_ref,
+ domids,
+ blkdev->ring_ref,
+ PROT_READ | PROT_WRITE);
+
+ g_free(domids);
+
if (!blkdev->sring) {
return -1;
}
+
blkdev->cnt_map++;
switch (blkdev->protocol) {
case BLKIF_PROTOCOL_NATIVE:
{
blkif_sring_t *sring_native = blkdev->sring;
- BACK_RING_INIT(&blkdev->rings.native, sring_native, XC_PAGE_SIZE);
+ BACK_RING_INIT(&blkdev->rings.native, sring_native, ring_size);
break;
}
case BLKIF_PROTOCOL_X86_32:
{
blkif_x86_32_sring_t *sring_x86_32 = blkdev->sring;
- BACK_RING_INIT(&blkdev->rings.x86_32_part, sring_x86_32, XC_PAGE_SIZE);
+ BACK_RING_INIT(&blkdev->rings.x86_32_part, sring_x86_32, ring_size);
break;
}
case BLKIF_PROTOCOL_X86_64:
{
blkif_x86_64_sring_t *sring_x86_64 = blkdev->sring;
- BACK_RING_INIT(&blkdev->rings.x86_64_part, sring_x86_64, XC_PAGE_SIZE);
+ BACK_RING_INIT(&blkdev->rings.x86_64_part, sring_x86_64, ring_size);
break;
}
}
if (blkdev->feature_persistent) {
/* Init persistent grants */
- blkdev->max_grants = max_requests * BLKIF_MAX_SEGMENTS_PER_REQUEST;
+ blkdev->max_grants = blkdev->max_requests *
+ BLKIF_MAX_SEGMENTS_PER_REQUEST;
blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)int_cmp,
NULL, NULL,
batch_maps ?
@@ -1209,9 +1290,9 @@ static int blk_connect(struct XenDevice *xendev)
xen_be_bind_evtchn(&blkdev->xendev);
- xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
+ xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
"remote port %d, local port %d\n",
- blkdev->xendev.protocol, blkdev->ring_ref,
+ blkdev->xendev.protocol, blkdev->nr_ring_ref,
blkdev->xendev.remote_port, blkdev->xendev.local_port);
return 0;
}
@@ -1228,7 +1309,8 @@ static void blk_disconnect(struct XenDevice *xendev)
xen_pv_unbind_evtchn(&blkdev->xendev);
if (blkdev->sring) {
- xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring, 1);
+ xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
+ blkdev->nr_ring_ref);
blkdev->cnt_map--;
blkdev->sring = NULL;
}
--
2.11.0
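[Editor's note: the worst-case bound quoted in the relocated MAX_GRANTS
comment is easy to sanity-check. A small sketch, assuming max_seg =
BLKIF_MAX_SEGMENTS_PER_REQUEST = 11:

    #include <stdio.h>

    int main(void)
    {
        unsigned int max_seg = 11;          /* BLKIF_MAX_SEGMENTS_PER_REQUEST */
        unsigned int reqs[]  = { 32, 512 }; /* order-0 and order-4 rings */

        for (int i = 0; i < 2; i++) {
            unsigned int r = reqs[i];
            unsigned int exact  = r * max_seg + (r - 1) * (max_seg - 1) + 1;
            unsigned int simple = 2 * r * max_seg;
            printf("max_req=%3u: exact worst case %5u, 2*max_req*max_seg %5u\n",
                   r, exact, simple);
        }
        return 0;
    }

For a single-page ring that is 663 vs 704 grants; for an order-4 ring, 10743
vs 11264. The simpler bound over-reserves by only a few percent, so the
round-number approximation costs little.]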
* [Qemu-devel] [PATCH v2 3/3] xen-disk: use an IOThread per instance
From: Paul Durrant @ 2017-06-21 12:52 UTC
To: xen-devel, qemu-devel, qemu-block
Cc: Paul Durrant, Stefano Stabellini, Anthony Perard, Kevin Wolf, Max Reitz
This patch allocates an IOThread object for each xen_disk instance and
sets the AIO context appropriately on connect. This allows processing
of I/O to proceed in parallel.
The patch also adds tracepoints into xen_disk to make it possible to
follow the state transitions of an instance in the log.
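[Editor's illustration, not part of the patch: internally this is the
equivalent of giving each disk its own "-object iothread,id=..." and pinning
the block backend to it. A condensed sketch of the QOM pattern the patch
uses, with QEMU-internal APIs as they existed around this time (error
handling elided; the object name is hypothetical):

    /* Create a per-device IOThread and move the BlockBackend onto it. */
    Object *obj = object_new(TYPE_IOTHREAD);

    object_property_add_child(object_get_objects_root(),
                              "iothread-xvda", obj, &error_abort);
    user_creatable_complete(obj, &error_abort);   /* starts the thread */

    IOThread *iothread = (IOThread *)object_dynamic_cast(obj, TYPE_IOTHREAD);
    AioContext *ctx = iothread_get_aio_context(iothread);

    blk_set_aio_context(blk, ctx);   /* completions now run in that thread */

Once blk_set_aio_context() has been called, AIO callbacks for the device fire
in the IOThread rather than the main loop, which is why the completion paths
below take the context lock explicitly.]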
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
v2:
- explicitly acquire and release AIO context in qemu_aio_complete() and
blk_bh()
---
hw/block/trace-events | 7 ++++++
hw/block/xen_disk.c | 69 ++++++++++++++++++++++++++++++++++++++++++++-------
2 files changed, 67 insertions(+), 9 deletions(-)
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 65e83dc258..608b24ba66 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int num_reqs, uint64_t offset,
# hw/block/hd-geometry.c
hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS %d %d %d"
hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, int trans) "blk %p CHS %u %u %u trans %d"
+
+# hw/block/xen_disk.c
+xen_disk_alloc(char *name) "%s"
+xen_disk_init(char *name) "%s"
+xen_disk_connect(char *name) "%s"
+xen_disk_disconnect(char *name) "%s"
+xen_disk_free(char *name) "%s"
diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 0e6513708e..8548195195 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -27,10 +27,13 @@
#include "hw/xen/xen_backend.h"
#include "xen_blkif.h"
#include "sysemu/blockdev.h"
+#include "sysemu/iothread.h"
#include "sysemu/block-backend.h"
#include "qapi/error.h"
#include "qapi/qmp/qdict.h"
#include "qapi/qmp/qstring.h"
+#include "qom/object_interfaces.h"
+#include "trace.h"
/* ------------------------------------------------------------- */
@@ -128,6 +131,9 @@ struct XenBlkDev {
DriveInfo *dinfo;
BlockBackend *blk;
QEMUBH *bh;
+
+ IOThread *iothread;
+ AioContext *ctx;
};
/* ------------------------------------------------------------- */
@@ -599,9 +605,12 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq);
static void qemu_aio_complete(void *opaque, int ret)
{
struct ioreq *ioreq = opaque;
+ struct XenBlkDev *blkdev = ioreq->blkdev;
+
+ aio_context_acquire(blkdev->ctx);
if (ret != 0) {
- xen_pv_printf(&ioreq->blkdev->xendev, 0, "%s I/O error\n",
+ xen_pv_printf(&blkdev->xendev, 0, "%s I/O error\n",
ioreq->req.operation == BLKIF_OP_READ ? "read" : "write");
ioreq->aio_errors++;
}
@@ -610,13 +619,13 @@ static void qemu_aio_complete(void *opaque, int ret)
if (ioreq->presync) {
ioreq->presync = 0;
ioreq_runio_qemu_aio(ioreq);
- return;
+ goto done;
}
if (ioreq->aio_inflight > 0) {
- return;
+ goto done;
}
- if (ioreq->blkdev->feature_grant_copy) {
+ if (blkdev->feature_grant_copy) {
switch (ioreq->req.operation) {
case BLKIF_OP_READ:
/* in case of failure ioreq->aio_errors is increased */
@@ -638,7 +647,7 @@ static void qemu_aio_complete(void *opaque, int ret)
}
ioreq->status = ioreq->aio_errors ? BLKIF_RSP_ERROR : BLKIF_RSP_OKAY;
- if (!ioreq->blkdev->feature_grant_copy) {
+ if (!blkdev->feature_grant_copy) {
ioreq_unmap(ioreq);
}
ioreq_finish(ioreq);
@@ -650,16 +659,19 @@ static void qemu_aio_complete(void *opaque, int ret)
}
case BLKIF_OP_READ:
if (ioreq->status == BLKIF_RSP_OKAY) {
- block_acct_done(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct);
+ block_acct_done(blk_get_stats(blkdev->blk), &ioreq->acct);
} else {
- block_acct_failed(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct);
+ block_acct_failed(blk_get_stats(blkdev->blk), &ioreq->acct);
}
break;
case BLKIF_OP_DISCARD:
default:
break;
}
- qemu_bh_schedule(ioreq->blkdev->bh);
+ qemu_bh_schedule(blkdev->bh);
+
+done:
+ aio_context_release(blkdev->ctx);
}
static bool blk_split_discard(struct ioreq *ioreq, blkif_sector_t sector_number,
@@ -917,17 +929,40 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
static void blk_bh(void *opaque)
{
struct XenBlkDev *blkdev = opaque;
+
+ aio_context_acquire(blkdev->ctx);
blk_handle_requests(blkdev);
+ aio_context_release(blkdev->ctx);
}
static void blk_alloc(struct XenDevice *xendev)
{
struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
+ Object *obj;
+ char *name;
+ Error *err = NULL;
+
+ trace_xen_disk_alloc(xendev->name);
QLIST_INIT(&blkdev->inflight);
QLIST_INIT(&blkdev->finished);
QLIST_INIT(&blkdev->freelist);
- blkdev->bh = qemu_bh_new(blk_bh, blkdev);
+
+ obj = object_new(TYPE_IOTHREAD);
+ name = g_strdup_printf("iothread-%s", xendev->name);
+
+ object_property_add_child(object_get_objects_root(), name, obj, &err);
+ assert(!err);
+
+ g_free(name);
+
+ user_creatable_complete(obj, &err);
+ assert(!err);
+
+ blkdev->iothread = (IOThread *)object_dynamic_cast(obj, TYPE_IOTHREAD);
+ blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
+ blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
+
if (xen_mode != XEN_EMULATE) {
batch_maps = 1;
}
@@ -954,6 +989,8 @@ static int blk_init(struct XenDevice *xendev)
int info = 0;
char *directiosafe = NULL;
+ trace_xen_disk_init(xendev->name);
+
/* read xenstore entries */
if (blkdev->params == NULL) {
char *h = NULL;
@@ -1069,6 +1106,8 @@ static int blk_connect(struct XenDevice *xendev)
unsigned int i;
uint32_t *domids;
+ trace_xen_disk_connect(xendev->name);
+
/* read-only ? */
if (blkdev->directiosafe) {
qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
@@ -1288,6 +1327,8 @@ static int blk_connect(struct XenDevice *xendev)
blkdev->persistent_gnt_count = 0;
}
+ blk_set_aio_context(blkdev->blk, blkdev->ctx);
+
xen_be_bind_evtchn(&blkdev->xendev);
xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
@@ -1301,13 +1342,20 @@ static void blk_disconnect(struct XenDevice *xendev)
{
struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
+ trace_xen_disk_disconnect(xendev->name);
+
+ aio_context_acquire(blkdev->ctx);
+
if (blkdev->blk) {
+ blk_set_aio_context(blkdev->blk, qemu_get_aio_context());
blk_detach_dev(blkdev->blk, blkdev);
blk_unref(blkdev->blk);
blkdev->blk = NULL;
}
xen_pv_unbind_evtchn(&blkdev->xendev);
+ aio_context_release(blkdev->ctx);
+
if (blkdev->sring) {
xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
blkdev->nr_ring_ref);
@@ -1341,6 +1389,8 @@ static int blk_free(struct XenDevice *xendev)
struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
struct ioreq *ioreq;
+ trace_xen_disk_free(xendev->name);
+
if (blkdev->blk || blkdev->sring) {
blk_disconnect(xendev);
}
@@ -1358,6 +1408,7 @@ static int blk_free(struct XenDevice *xendev)
g_free(blkdev->dev);
g_free(blkdev->devtype);
qemu_bh_delete(blkdev->bh);
+ object_unparent(OBJECT(blkdev->iothread));
return 0;
}
--
2.11.0
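[Editor's note on the v2 change: because every exit path out of
qemu_aio_complete() must now drop the AioContext lock, the early returns in
the original code become "goto done", exactly as in the diff above. A minimal
sketch of the resulting shape (illustrative only, not the full function):

    static void completion_sketch(struct XenBlkDev *blkdev, bool resubmitted)
    {
        aio_context_acquire(blkdev->ctx);   /* serialize with the main loop */

        if (resubmitted) {
            goto done;                      /* early exit must still unlock */
        }

        /* ... finish the request and schedule the bottom half ... */

    done:
        aio_context_release(blkdev->ctx);
    }

The same acquire/release bracket around blk_bh() keeps ring processing
serialized against xenstore-driven connect/disconnect on the main loop.]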
* Re: [Qemu-devel] [PATCH v2 2/3] xen-disk: add support for multi-page shared rings
From: Stefano Stabellini @ 2017-06-22 0:39 UTC
To: Paul Durrant
Cc: xen-devel, qemu-devel, qemu-block, Stefano Stabellini,
Anthony Perard, Kevin Wolf, Max Reitz
On Wed, 21 Jun 2017, Paul Durrant wrote:
> The blkif protocol has had provision for negotiation of multi-page shared
> rings for some time now, and many guest OSes have support in their frontend
> drivers.
>
> This patch makes the necessary modifications for xen-disk to support a
> shared ring of up to order 4 (i.e. 16 pages).
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> [patch body snipped]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 2/3] xen-disk: add support for multi-page shared rings
2017-06-21 12:52 ` Paul Durrant
(?)
(?)
@ 2017-06-22 0:39 ` Stefano Stabellini
-1 siblings, 0 replies; 24+ messages in thread
From: Stefano Stabellini @ 2017-06-22 0:39 UTC (permalink / raw)
To: Paul Durrant
Cc: Kevin Wolf, Stefano Stabellini, qemu-block, qemu-devel,
Max Reitz, Anthony Perard, xen-devel
On Wed, 21 Jun 2017, Paul Durrant wrote:
> The blkif protocol has had provision for negotiation of multi-page shared
> rings for some time now and many guest OS have support in their frontend
> drivers.
>
> This patch makes the necessary modifications to xen-disk support a shared
> ring up to order 4 (i.e. 16 pages).
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> ---
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Max Reitz <mreitz@redhat.com>
>
> v2:
> - Fix memory leak in error path
> - Print warning if ring-page-order exceeds limits
> ---
> hw/block/xen_disk.c | 144 +++++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 113 insertions(+), 31 deletions(-)
>
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 9b06e3aa81..0e6513708e 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -36,8 +36,6 @@
>
> static int batch_maps = 0;
>
> -static int max_requests = 32;
> -
> /* ------------------------------------------------------------- */
>
> #define BLOCK_SIZE 512
> @@ -84,6 +82,8 @@ struct ioreq {
> BlockAcctCookie acct;
> };
>
> +#define MAX_RING_PAGE_ORDER 4
> +
> struct XenBlkDev {
> struct XenDevice xendev; /* must be first */
> char *params;
> @@ -94,7 +94,8 @@ struct XenBlkDev {
> bool directiosafe;
> const char *fileproto;
> const char *filename;
> - int ring_ref;
> + unsigned int ring_ref[1 << MAX_RING_PAGE_ORDER];
> + unsigned int nr_ring_ref;
> void *sring;
> int64_t file_blk;
> int64_t file_size;
> @@ -110,6 +111,7 @@ struct XenBlkDev {
> int requests_total;
> int requests_inflight;
> int requests_finished;
> + unsigned int max_requests;
>
> /* Persistent grants extension */
> gboolean feature_discard;
> @@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
> struct ioreq *ioreq = NULL;
>
> if (QLIST_EMPTY(&blkdev->freelist)) {
> - if (blkdev->requests_total >= max_requests) {
> + if (blkdev->requests_total >= blkdev->max_requests) {
> goto out;
> }
> /* allocate new struct */
> @@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
> ioreq_runio_qemu_aio(ioreq);
> }
>
> - if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
> + if (blkdev->more_work && blkdev->requests_inflight < blkdev->max_requests) {
> qemu_bh_schedule(blkdev->bh);
> }
> }
> @@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
> blk_handle_requests(blkdev);
> }
>
> -/*
> - * We need to account for the grant allocations requiring contiguous
> - * chunks; the worst case number would be
> - * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> - * but in order to keep things simple just use
> - * 2 * max_req * max_seg.
> - */
> -#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> -
> static void blk_alloc(struct XenDevice *xendev)
> {
> struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
> @@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
> if (xen_mode != XEN_EMULATE) {
> batch_maps = 1;
> }
> - if (xengnttab_set_max_grants(xendev->gnttabdev,
> - MAX_GRANTS(max_requests, BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
> - xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> - strerror(errno));
> - }
> }
>
> static void blk_parse_discard(struct XenBlkDev *blkdev)
> @@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
> !blkdev->feature_grant_copy);
> xenstore_write_be_int(&blkdev->xendev, "info", info);
>
> + xenstore_write_be_int(&blkdev->xendev, "max-ring-page-order",
> + MAX_RING_PAGE_ORDER);
> +
> blk_parse_discard(blkdev);
>
> g_free(directiosafe);
> @@ -1058,12 +1049,25 @@ out_error:
> return -1;
> }
>
> +/*
> + * We need to account for the grant allocations requiring contiguous
> + * chunks; the worst case number would be
> + * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> + * but in order to keep things simple just use
> + * 2 * max_req * max_seg.
> + */
> +#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> +
> static int blk_connect(struct XenDevice *xendev)
> {
> struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
> int pers, index, qflags;
> bool readonly = true;
> bool writethrough = true;
> + int order, ring_ref;
> + unsigned int ring_size, max_grants;
> + unsigned int i;
> + uint32_t *domids;
>
> /* read-only ? */
> if (blkdev->directiosafe) {
> @@ -1138,9 +1142,42 @@ static int blk_connect(struct XenDevice *xendev)
> xenstore_write_be_int64(&blkdev->xendev, "sectors",
> blkdev->file_size / blkdev->file_blk);
>
> - if (xenstore_read_fe_int(&blkdev->xendev, "ring-ref", &blkdev->ring_ref) == -1) {
> + if (xenstore_read_fe_int(&blkdev->xendev, "ring-page-order",
> + &order) == -1) {
> + blkdev->nr_ring_ref = 1;
> +
> + if (xenstore_read_fe_int(&blkdev->xendev, "ring-ref",
> + &ring_ref) == -1) {
> + return -1;
> + }
> + blkdev->ring_ref[0] = ring_ref;
> +
> + } else if (order >= 0 && order <= MAX_RING_PAGE_ORDER) {
> + blkdev->nr_ring_ref = 1 << order;
> +
> + for (i = 0; i < blkdev->nr_ring_ref; i++) {
> + char *key;
> +
> + key = g_strdup_printf("ring-ref%u", i);
> + if (!key) {
> + return -1;
> + }
> +
> + if (xenstore_read_fe_int(&blkdev->xendev, key,
> + &ring_ref) == -1) {
> + g_free(key);
> + return -1;
> + }
> + blkdev->ring_ref[i] = ring_ref;
> +
> + g_free(key);
> + }
> + } else {
> + xen_pv_printf(xendev, 0, "invalid ring-page-order: %d\n",
> + order);
> return -1;
> }
> +
> if (xenstore_read_fe_int(&blkdev->xendev, "event-channel",
> &blkdev->xendev.remote_port) == -1) {
> return -1;
> @@ -1163,41 +1200,85 @@ static int blk_connect(struct XenDevice *xendev)
> blkdev->protocol = BLKIF_PROTOCOL_NATIVE;
> }
>
> - blkdev->sring = xengnttab_map_grant_ref(blkdev->xendev.gnttabdev,
> - blkdev->xendev.dom,
> - blkdev->ring_ref,
> - PROT_READ | PROT_WRITE);
> + ring_size = XC_PAGE_SIZE * blkdev->nr_ring_ref;
> + switch (blkdev->protocol) {
> + case BLKIF_PROTOCOL_NATIVE:
> + {
> + blkdev->max_requests = __CONST_RING_SIZE(blkif, ring_size);
> + break;
> + }
> + case BLKIF_PROTOCOL_X86_32:
> + {
> + blkdev->max_requests = __CONST_RING_SIZE(blkif_x86_32, ring_size);
> + break;
> + }
> + case BLKIF_PROTOCOL_X86_64:
> + {
> + blkdev->max_requests = __CONST_RING_SIZE(blkif_x86_64, ring_size);
> + break;
> + }
> + default:
> + return -1;
> + }
> +
> + /* Calculate the maximum number of grants needed by ioreqs */
> + max_grants = MAX_GRANTS(blkdev->max_requests,
> + BLKIF_MAX_SEGMENTS_PER_REQUEST);
> + /* Add on the number needed for the ring pages */
> + max_grants += blkdev->nr_ring_ref;
> +
> + if (xengnttab_set_max_grants(blkdev->xendev.gnttabdev, max_grants)) {
> + xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> + strerror(errno));
> + return -1;
> + }
> +
> + domids = g_malloc0_n(blkdev->nr_ring_ref, sizeof(uint32_t));
> + for (i = 0; i < blkdev->nr_ring_ref; i++) {
> + domids[i] = blkdev->xendev.dom;
> + }
> +
> + blkdev->sring = xengnttab_map_grant_refs(blkdev->xendev.gnttabdev,
> + blkdev->nr_ring_ref,
> + domids,
> + blkdev->ring_ref,
> + PROT_READ | PROT_WRITE);
> +
> + g_free(domids);
> +
> if (!blkdev->sring) {
> return -1;
> }
> +
> blkdev->cnt_map++;
>
> switch (blkdev->protocol) {
> case BLKIF_PROTOCOL_NATIVE:
> {
> blkif_sring_t *sring_native = blkdev->sring;
> - BACK_RING_INIT(&blkdev->rings.native, sring_native, XC_PAGE_SIZE);
> + BACK_RING_INIT(&blkdev->rings.native, sring_native, ring_size);
> break;
> }
> case BLKIF_PROTOCOL_X86_32:
> {
> blkif_x86_32_sring_t *sring_x86_32 = blkdev->sring;
>
> - BACK_RING_INIT(&blkdev->rings.x86_32_part, sring_x86_32, XC_PAGE_SIZE);
> + BACK_RING_INIT(&blkdev->rings.x86_32_part, sring_x86_32, ring_size);
> break;
> }
> case BLKIF_PROTOCOL_X86_64:
> {
> blkif_x86_64_sring_t *sring_x86_64 = blkdev->sring;
>
> - BACK_RING_INIT(&blkdev->rings.x86_64_part, sring_x86_64, XC_PAGE_SIZE);
> + BACK_RING_INIT(&blkdev->rings.x86_64_part, sring_x86_64, ring_size);
> break;
> }
> }
>
> if (blkdev->feature_persistent) {
> /* Init persistent grants */
> - blkdev->max_grants = max_requests * BLKIF_MAX_SEGMENTS_PER_REQUEST;
> + blkdev->max_grants = blkdev->max_requests *
> + BLKIF_MAX_SEGMENTS_PER_REQUEST;
> blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)int_cmp,
> NULL, NULL,
> batch_maps ?
> @@ -1209,9 +1290,9 @@ static int blk_connect(struct XenDevice *xendev)
>
> xen_be_bind_evtchn(&blkdev->xendev);
>
> - xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
> + xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
> "remote port %d, local port %d\n",
> - blkdev->xendev.protocol, blkdev->ring_ref,
> + blkdev->xendev.protocol, blkdev->nr_ring_ref,
> blkdev->xendev.remote_port, blkdev->xendev.local_port);
> return 0;
> }
> @@ -1228,7 +1309,8 @@ static void blk_disconnect(struct XenDevice *xendev)
> xen_pv_unbind_evtchn(&blkdev->xendev);
>
> if (blkdev->sring) {
> - xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring, 1);
> + xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
> + blkdev->nr_ring_ref);
> blkdev->cnt_map--;
> blkdev->sring = NULL;
> }
> --
> 2.11.0
>
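For readers who don't have the blkif ring macros in their head, the negotiation in the hunks above reduces to a little arithmetic: the frontend publishes ring-page-order, and the backend derives the page count and the request capacity from it. Below is a minimal, self-contained sketch of that arithmetic; the 112-byte entry size and the power-of-two rounding are illustrative assumptions standing in for Xen's real __CONST_RING_SIZE() macro.

#include <stdio.h>

#define XC_PAGE_SIZE        4096
#define MAX_RING_PAGE_ORDER 4

/* Illustrative stand-in for sizeof(union blkif_sring_entry); the real
 * value comes from the blkif protocol headers. */
#define RING_ENTRY_SIZE     112

/* Round down to a power of two, as Xen's ring-size rounding does. */
static unsigned int round_down_pow2(unsigned int n)
{
    unsigned int p = 1;

    while (p * 2 <= n) {
        p *= 2;
    }
    return p;
}

int main(void)
{
    int order;

    for (order = 0; order <= MAX_RING_PAGE_ORDER; order++) {
        unsigned int nr_ring_ref = 1u << order;     /* as in blk_connect() */
        unsigned int ring_size = XC_PAGE_SIZE * nr_ring_ref;
        unsigned int max_requests = round_down_pow2(ring_size / RING_ENTRY_SIZE);

        printf("order %d: %2u page(s), ~%u requests in flight\n",
               order, nr_ring_ref, max_requests);
    }
    return 0;
}

The last column is the point of the patch: each increment of the order doubles the number of requests that can be in flight before the frontend has to wait for ring slots.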
* Re: [Qemu-devel] [PATCH v2 1/3] xen-disk: only advertize feature-persistent if grant copy is not available
2017-06-21 12:52 ` Paul Durrant
@ 2017-06-22 0:40 ` Stefano Stabellini
-1 siblings, 0 replies; 24+ messages in thread
From: Stefano Stabellini @ 2017-06-22 0:40 UTC (permalink / raw)
To: Paul Durrant
Cc: xen-devel, qemu-devel, qemu-block, Stefano Stabellini,
Anthony Perard, Kevin Wolf, Max Reitz
On Wed, 21 Jun 2017, Paul Durrant wrote:
> If grant copy is available then it will always be used in preference to
> persistent maps. In this case feature-persistent should not be advertized
> to the frontend, otherwise it may needlessly copy data into persistently
> granted buffers.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> ---
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Max Reitz <mreitz@redhat.com>
> ---
> hw/block/xen_disk.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 3a22805fbc..9b06e3aa81 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -1023,11 +1023,18 @@ static int blk_init(struct XenDevice *xendev)
>
> blkdev->file_blk = BLOCK_SIZE;
>
> + blkdev->feature_grant_copy =
> + (xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
> +
> + xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
> + blkdev->feature_grant_copy ? "enabled" : "disabled");
> +
> /* fill info
> * blk_connect supplies sector-size and sectors
> */
> xenstore_write_be_int(&blkdev->xendev, "feature-flush-cache", 1);
> - xenstore_write_be_int(&blkdev->xendev, "feature-persistent", 1);
> + xenstore_write_be_int(&blkdev->xendev, "feature-persistent",
> + !blkdev->feature_grant_copy);
> xenstore_write_be_int(&blkdev->xendev, "info", info);
>
> blk_parse_discard(blkdev);
> @@ -1202,12 +1209,6 @@ static int blk_connect(struct XenDevice *xendev)
>
> xen_be_bind_evtchn(&blkdev->xendev);
>
> - blkdev->feature_grant_copy =
> - (xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
> -
> - xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
> - blkdev->feature_grant_copy ? "enabled" : "disabled");
> -
> xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
> "remote port %d, local port %d\n",
> blkdev->xendev.protocol, blkdev->ring_ref,
> --
> 2.11.0
>
* Re: [Qemu-devel] [PATCH v2 3/3] xen-disk: use an IOThread per instance
2017-06-21 12:52 ` Paul Durrant
@ 2017-06-22 22:14 ` Stefano Stabellini
2017-07-07 8:20 ` Paul Durrant
2017-07-07 8:20 ` Paul Durrant
-1 siblings, 2 replies; 24+ messages in thread
From: Stefano Stabellini @ 2017-06-22 22:14 UTC (permalink / raw)
To: Paul Durrant
Cc: xen-devel, qemu-devel, qemu-block, Stefano Stabellini,
Anthony Perard, Kevin Wolf, Max Reitz, afaerber
CC'ing Andreas Färber. Could you please give a quick look below at the
way the iothread object is instantiated and destroyed? I am no object
model expert and would appreciate a second opinion.
On Wed, 21 Jun 2017, Paul Durrant wrote:
> This patch allocates an IOThread object for each xen_disk instance and
> sets the AIO context appropriately on connect. This allows processing
> of I/O to proceed in parallel.
>
> The patch also adds tracepoints into xen_disk to make it possible to
> follow the state transitions of an instance in the log.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> ---
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Max Reitz <mreitz@redhat.com>
>
> v2:
> - explicitly acquire and release AIO context in qemu_aio_complete() and
> blk_bh()
> ---
> hw/block/trace-events | 7 ++++++
> hw/block/xen_disk.c | 69 ++++++++++++++++++++++++++++++++++++++++++++-------
> 2 files changed, 67 insertions(+), 9 deletions(-)
>
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 65e83dc258..608b24ba66 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int num_reqs, uint64_t offset,
> # hw/block/hd-geometry.c
> hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS %d %d %d"
> hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, int trans) "blk %p CHS %u %u %u trans %d"
> +
> +# hw/block/xen_disk.c
> +xen_disk_alloc(char *name) "%s"
> +xen_disk_init(char *name) "%s"
> +xen_disk_connect(char *name) "%s"
> +xen_disk_disconnect(char *name) "%s"
> +xen_disk_free(char *name) "%s"
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 0e6513708e..8548195195 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -27,10 +27,13 @@
> #include "hw/xen/xen_backend.h"
> #include "xen_blkif.h"
> #include "sysemu/blockdev.h"
> +#include "sysemu/iothread.h"
> #include "sysemu/block-backend.h"
> #include "qapi/error.h"
> #include "qapi/qmp/qdict.h"
> #include "qapi/qmp/qstring.h"
> +#include "qom/object_interfaces.h"
> +#include "trace.h"
>
> /* ------------------------------------------------------------- */
>
> @@ -128,6 +131,9 @@ struct XenBlkDev {
> DriveInfo *dinfo;
> BlockBackend *blk;
> QEMUBH *bh;
> +
> + IOThread *iothread;
> + AioContext *ctx;
> };
>
> /* ------------------------------------------------------------- */
> @@ -599,9 +605,12 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq);
> static void qemu_aio_complete(void *opaque, int ret)
> {
> struct ioreq *ioreq = opaque;
> + struct XenBlkDev *blkdev = ioreq->blkdev;
> +
> + aio_context_acquire(blkdev->ctx);
I think that Paolo was right that we need an aio_context_acquire here,
however the issue is that with the current code:
blk_handle_requests -> ioreq_runio_qemu_aio -> qemu_aio_complete
leading to aio_context_acquire being called twice on the same lock,
which I don't think is allowed?
I think we need to get rid of the qemu_aio_complete call from
ioreq_runio_qemu_aio, but to do that we need to be careful with the
accounting of aio_inflight (today it's incremented unconditionally at
the beginning of ioreq_runio_qemu_aio; I think we would have to change
that to increment it only if presync).
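Whether the double acquire "is allowed" comes down to the mutex type behind the AioContext lock. The following standalone demo (plain pthreads, not QEMU code; build with -pthread) contrasts an error-checking mutex, which refuses a nested lock, with a recursive one, which permits it:

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Lock the same mutex twice from one thread and report what happens. */
static void double_lock(int type, const char *name)
{
    pthread_mutex_t m;
    pthread_mutexattr_t attr;
    int ret;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, type);
    pthread_mutex_init(&m, &attr);

    pthread_mutex_lock(&m);
    ret = pthread_mutex_lock(&m);    /* the nested acquire */
    printf("%s: nested lock -> %s\n", name,
           ret == 0 ? "ok" : (ret == EDEADLK ? "EDEADLK" : "error"));

    if (ret == 0) {
        pthread_mutex_unlock(&m);    /* undo the nested acquire */
    }
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    pthread_mutexattr_destroy(&attr);
}

int main(void)
{
    double_lock(PTHREAD_MUTEX_ERRORCHECK, "errorcheck");   /* refuses */
    double_lock(PTHREAD_MUTEX_RECURSIVE,  "recursive");    /* succeeds */
    return 0;
}

As the thread establishes below, QEMU's AioContext lock is the recursive kind, so the blk_bh() -> ioreq_runio_qemu_aio() -> qemu_aio_complete() nesting is safe.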
> if (ret != 0) {
> - xen_pv_printf(&ioreq->blkdev->xendev, 0, "%s I/O error\n",
> + xen_pv_printf(&blkdev->xendev, 0, "%s I/O error\n",
> ioreq->req.operation == BLKIF_OP_READ ? "read" : "write");
> ioreq->aio_errors++;
> }
> @@ -610,13 +619,13 @@ static void qemu_aio_complete(void *opaque, int ret)
> if (ioreq->presync) {
> ioreq->presync = 0;
> ioreq_runio_qemu_aio(ioreq);
> - return;
> + goto done;
> }
> if (ioreq->aio_inflight > 0) {
> - return;
> + goto done;
> }
>
> - if (ioreq->blkdev->feature_grant_copy) {
> + if (blkdev->feature_grant_copy) {
> switch (ioreq->req.operation) {
> case BLKIF_OP_READ:
> /* in case of failure ioreq->aio_errors is increased */
> @@ -638,7 +647,7 @@ static void qemu_aio_complete(void *opaque, int ret)
> }
>
> ioreq->status = ioreq->aio_errors ? BLKIF_RSP_ERROR : BLKIF_RSP_OKAY;
> - if (!ioreq->blkdev->feature_grant_copy) {
> + if (!blkdev->feature_grant_copy) {
> ioreq_unmap(ioreq);
> }
> ioreq_finish(ioreq);
> @@ -650,16 +659,19 @@ static void qemu_aio_complete(void *opaque, int ret)
> }
> case BLKIF_OP_READ:
> if (ioreq->status == BLKIF_RSP_OKAY) {
> - block_acct_done(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct);
> + block_acct_done(blk_get_stats(blkdev->blk), &ioreq->acct);
> } else {
> - block_acct_failed(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct);
> + block_acct_failed(blk_get_stats(blkdev->blk), &ioreq->acct);
> }
> break;
> case BLKIF_OP_DISCARD:
> default:
> break;
> }
> - qemu_bh_schedule(ioreq->blkdev->bh);
> + qemu_bh_schedule(blkdev->bh);
> +
> +done:
> + aio_context_release(blkdev->ctx);
> }
>
> static bool blk_split_discard(struct ioreq *ioreq, blkif_sector_t sector_number,
> @@ -917,17 +929,40 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
> static void blk_bh(void *opaque)
> {
> struct XenBlkDev *blkdev = opaque;
> +
> + aio_context_acquire(blkdev->ctx);
> blk_handle_requests(blkdev);
> + aio_context_release(blkdev->ctx);
> }
>
> static void blk_alloc(struct XenDevice *xendev)
> {
> struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
> + Object *obj;
> + char *name;
> + Error *err = NULL;
> +
> + trace_xen_disk_alloc(xendev->name);
>
> QLIST_INIT(&blkdev->inflight);
> QLIST_INIT(&blkdev->finished);
> QLIST_INIT(&blkdev->freelist);
> - blkdev->bh = qemu_bh_new(blk_bh, blkdev);
> +
> + obj = object_new(TYPE_IOTHREAD);
> + name = g_strdup_printf("iothread-%s", xendev->name);
> +
> + object_property_add_child(object_get_objects_root(), name, obj, &err);
> + assert(!err);
Would it be enough to call object_ref?
> + g_free(name);
> +
> + user_creatable_complete(obj, &err);
Why do we need to call this?
> + assert(!err);
> +
> + blkdev->iothread = (IOThread *)object_dynamic_cast(obj, TYPE_IOTHREAD);
> + blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
> + blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
> +
> if (xen_mode != XEN_EMULATE) {
> batch_maps = 1;
> }
> @@ -1288,6 +1327,8 @@ static int blk_connect(struct XenDevice *xendev)
> blkdev->persistent_gnt_count = 0;
> }
>
> + blk_set_aio_context(blkdev->blk, blkdev->ctx);
> +
> xen_be_bind_evtchn(&blkdev->xendev);
>
> xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
> @@ -1301,13 +1342,20 @@ static void blk_disconnect(struct XenDevice *xendev)
> {
> struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
>
> + trace_xen_disk_disconnect(xendev->name);
> +
> + aio_context_acquire(blkdev->ctx);
> +
> if (blkdev->blk) {
> + blk_set_aio_context(blkdev->blk, qemu_get_aio_context());
> blk_detach_dev(blkdev->blk, blkdev);
> blk_unref(blkdev->blk);
> blkdev->blk = NULL;
> }
> xen_pv_unbind_evtchn(&blkdev->xendev);
>
> + aio_context_release(blkdev->ctx);
> +
> if (blkdev->sring) {
> xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
> blkdev->nr_ring_ref);
> @@ -1358,6 +1408,7 @@ static int blk_free(struct XenDevice *xendev)
> g_free(blkdev->dev);
> g_free(blkdev->devtype);
> qemu_bh_delete(blkdev->bh);
> + object_unparent(OBJECT(blkdev->iothread));
Shouldn't this be object_unref?
> return 0;
> }
* Re: [Qemu-devel] [Xen-devel] [PATCH v2 0/3] xen-disk: performance improvements
2017-06-21 12:52 ` Paul Durrant
@ 2017-06-27 22:07 ` Stefano Stabellini
-1 siblings, 0 replies; 24+ messages in thread
From: Stefano Stabellini @ 2017-06-27 22:07 UTC (permalink / raw)
To: Paul Durrant; +Cc: xen-devel, qemu-devel, qemu-block
On Wed, 21 Jun 2017, Paul Durrant wrote:
> Paul Durrant (3):
> xen-disk: only advertize feature-persistent if grant copy is not
> available
> xen-disk: add support for multi-page shared rings
> xen-disk: use an IOThread per instance
>
> hw/block/trace-events | 7 ++
> hw/block/xen_disk.c | 228 +++++++++++++++++++++++++++++++++++++++-----------
> 2 files changed, 188 insertions(+), 47 deletions(-)
While waiting for an answer on patch #3, I sent a pull request for the
first 2 patches.
* Re: [Qemu-devel] [Xen-devel] [PATCH v2 0/3] xen-disk: performance improvements
2017-06-27 22:07 ` Stefano Stabellini
@ 2017-06-28 12:52 ` Paul Durrant
-1 siblings, 0 replies; 24+ messages in thread
From: Paul Durrant @ 2017-06-28 12:52 UTC (permalink / raw)
To: 'Stefano Stabellini'; +Cc: xen-devel, qemu-devel, qemu-block
> -----Original Message-----
> From: Stefano Stabellini [mailto:sstabellini@kernel.org]
> Sent: 27 June 2017 23:07
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; qemu-devel@nongnu.org; qemu-
> block@nongnu.org
> Subject: Re: [Xen-devel] [PATCH v2 0/3] xen-disk: performance
> improvements
>
> On Wed, 21 Jun 2017, Paul Durrant wrote:
> > Paul Durrant (3):
> > xen-disk: only advertize feature-persistent if grant copy is not
> > available
> > xen-disk: add support for multi-page shared rings
> > xen-disk: use an IOThread per instance
> >
> > hw/block/trace-events | 7 ++
> > hw/block/xen_disk.c | 228
> +++++++++++++++++++++++++++++++++++++++-----------
> > 2 files changed, 188 insertions(+), 47 deletions(-)
>
> While waiting for an answer on patch #3, I sent a pull request for the
> first 2 patches
Cool. Thanks. Hopefully we won't have to wait too long for review on patch #3.
Cheers,
Paul
* Re: [Qemu-devel] [PATCH v2 3/3] xen-disk: use an IOThread per instance
2017-06-22 22:14 ` [Qemu-devel] " Stefano Stabellini
@ 2017-07-07 8:20 ` Paul Durrant
2017-07-07 22:06 ` Stefano Stabellini
2017-07-07 22:06 ` [Qemu-devel] " Stefano Stabellini
2017-07-07 8:20 ` Paul Durrant
1 sibling, 2 replies; 24+ messages in thread
From: Paul Durrant @ 2017-07-07 8:20 UTC (permalink / raw)
To: 'Stefano Stabellini'
Cc: xen-devel, qemu-devel, qemu-block, Anthony Perard, Kevin Wolf,
Max Reitz, afaerber
> -----Original Message-----
> From: Stefano Stabellini [mailto:sstabellini@kernel.org]
> Sent: 22 June 2017 23:15
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; qemu-devel@nongnu.org; qemu-
> block@nongnu.org; Stefano Stabellini <sstabellini@kernel.org>; Anthony
> Perard <anthony.perard@citrix.com>; Kevin Wolf <kwolf@redhat.com>;
> Max Reitz <mreitz@redhat.com>; afaerber@suse.de
> Subject: Re: [PATCH v2 3/3] xen-disk: use an IOThread per instance
>
> CC'ing Andreas Färber. Could you please give a quick look below at the
> way the iothread object is instantiated and destroyed? I am no object
> model expert and would appreciate a second opinion.
>
I have not seen any response so far.
>
> On Wed, 21 Jun 2017, Paul Durrant wrote:
> > This patch allocates an IOThread object for each xen_disk instance and
> > sets the AIO context appropriately on connect. This allows processing
> > of I/O to proceed in parallel.
> >
> > The patch also adds tracepoints into xen_disk to make it possible to
> > follow the state transitions of an instance in the log.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > ---
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Anthony Perard <anthony.perard@citrix.com>
> > Cc: Kevin Wolf <kwolf@redhat.com>
> > Cc: Max Reitz <mreitz@redhat.com>
> >
> > v2:
> > - explicitly acquire and release AIO context in qemu_aio_complete() and
> > blk_bh()
> > ---
> > hw/block/trace-events | 7 ++++++
> > hw/block/xen_disk.c | 69
> ++++++++++++++++++++++++++++++++++++++++++++-------
> > 2 files changed, 67 insertions(+), 9 deletions(-)
> >
> > diff --git a/hw/block/trace-events b/hw/block/trace-events
> > index 65e83dc258..608b24ba66 100644
> > --- a/hw/block/trace-events
> > +++ b/hw/block/trace-events
> > @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int
> num_reqs, uint64_t offset,
> > # hw/block/hd-geometry.c
> > hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p
> LCHS %d %d %d"
> > hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t
> secs, int trans) "blk %p CHS %u %u %u trans %d"
> > +
> > +# hw/block/xen_disk.c
> > +xen_disk_alloc(char *name) "%s"
> > +xen_disk_init(char *name) "%s"
> > +xen_disk_connect(char *name) "%s"
> > +xen_disk_disconnect(char *name) "%s"
> > +xen_disk_free(char *name) "%s"
> > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> > index 0e6513708e..8548195195 100644
> > --- a/hw/block/xen_disk.c
> > +++ b/hw/block/xen_disk.c
> > @@ -27,10 +27,13 @@
> > #include "hw/xen/xen_backend.h"
> > #include "xen_blkif.h"
> > #include "sysemu/blockdev.h"
> > +#include "sysemu/iothread.h"
> > #include "sysemu/block-backend.h"
> > #include "qapi/error.h"
> > #include "qapi/qmp/qdict.h"
> > #include "qapi/qmp/qstring.h"
> > +#include "qom/object_interfaces.h"
> > +#include "trace.h"
> >
> > /* ------------------------------------------------------------- */
> >
> > @@ -128,6 +131,9 @@ struct XenBlkDev {
> > DriveInfo *dinfo;
> > BlockBackend *blk;
> > QEMUBH *bh;
> > +
> > + IOThread *iothread;
> > + AioContext *ctx;
> > };
> >
> > /* ------------------------------------------------------------- */
> > @@ -599,9 +605,12 @@ static int ioreq_runio_qemu_aio(struct ioreq
> *ioreq);
> > static void qemu_aio_complete(void *opaque, int ret)
> > {
> > struct ioreq *ioreq = opaque;
> > + struct XenBlkDev *blkdev = ioreq->blkdev;
> > +
> > + aio_context_acquire(blkdev->ctx);
>
> I think that Paolo was right that we need an aio_context_acquire here,
> however the issue is that with the current code:
>
> blk_handle_requests -> ioreq_runio_qemu_aio -> qemu_aio_complete
>
> leading to aio_context_acquire being called twice on the same lock,
> which I don't think is allowed?
It resolves to a qemu_rec_mutex_lock() which I believe is a recursive lock, so I think that's ok.
>
> I think we need to get rid of the qemu_aio_complete call from
> ioreq_runio_qemu_aio, but to do that we need to be careful with the
> accounting of aio_inflight (today it's incremented unconditionally at
> the beginning of ioreq_runio_qemu_aio, I think we would have to change
> that to increment it only if presync).
>
If the lock is indeed recursive then I think we can avoid this complication.
>
> > if (ret != 0) {
> > - xen_pv_printf(&ioreq->blkdev->xendev, 0, "%s I/O error\n",
> > + xen_pv_printf(&blkdev->xendev, 0, "%s I/O error\n",
> > ioreq->req.operation == BLKIF_OP_READ ? "read" : "write");
> > ioreq->aio_errors++;
> > }
> > @@ -610,13 +619,13 @@ static void qemu_aio_complete(void *opaque, int
> ret)
> > if (ioreq->presync) {
> > ioreq->presync = 0;
> > ioreq_runio_qemu_aio(ioreq);
> > - return;
> > + goto done;
> > }
> > if (ioreq->aio_inflight > 0) {
> > - return;
> > + goto done;
> > }
> >
> > - if (ioreq->blkdev->feature_grant_copy) {
> > + if (blkdev->feature_grant_copy) {
> > switch (ioreq->req.operation) {
> > case BLKIF_OP_READ:
> > /* in case of failure ioreq->aio_errors is increased */
> > @@ -638,7 +647,7 @@ static void qemu_aio_complete(void *opaque, int
> ret)
> > }
> >
> > ioreq->status = ioreq->aio_errors ? BLKIF_RSP_ERROR :
> BLKIF_RSP_OKAY;
> > - if (!ioreq->blkdev->feature_grant_copy) {
> > + if (!blkdev->feature_grant_copy) {
> > ioreq_unmap(ioreq);
> > }
> > ioreq_finish(ioreq);
> > @@ -650,16 +659,19 @@ static void qemu_aio_complete(void *opaque, int
> ret)
> > }
> > case BLKIF_OP_READ:
> > if (ioreq->status == BLKIF_RSP_OKAY) {
> > - block_acct_done(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct);
> > + block_acct_done(blk_get_stats(blkdev->blk), &ioreq->acct);
> > } else {
> > - block_acct_failed(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct);
> > + block_acct_failed(blk_get_stats(blkdev->blk), &ioreq->acct);
> > }
> > break;
> > case BLKIF_OP_DISCARD:
> > default:
> > break;
> > }
> > - qemu_bh_schedule(ioreq->blkdev->bh);
> > + qemu_bh_schedule(blkdev->bh);
> > +
> > +done:
> > + aio_context_release(blkdev->ctx);
> > }
> >
> > static bool blk_split_discard(struct ioreq *ioreq, blkif_sector_t
> sector_number,
> > @@ -917,17 +929,40 @@ static void blk_handle_requests(struct XenBlkDev
> *blkdev)
> > static void blk_bh(void *opaque)
> > {
> > struct XenBlkDev *blkdev = opaque;
> > +
> > + aio_context_acquire(blkdev->ctx);
> > blk_handle_requests(blkdev);
> > + aio_context_release(blkdev->ctx);
> > }
> >
> > static void blk_alloc(struct XenDevice *xendev)
> > {
> > struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev,
> xendev);
> > + Object *obj;
> > + char *name;
> > + Error *err = NULL;
> > +
> > + trace_xen_disk_alloc(xendev->name);
> >
> > QLIST_INIT(&blkdev->inflight);
> > QLIST_INIT(&blkdev->finished);
> > QLIST_INIT(&blkdev->freelist);
> > - blkdev->bh = qemu_bh_new(blk_bh, blkdev);
> > +
> > + obj = object_new(TYPE_IOTHREAD);
> > + name = g_strdup_printf("iothread-%s", xendev->name);
> > +
> > + object_property_add_child(object_get_objects_root(), name, obj,
> &err);
> > + assert(!err);
>
> Would it be enough to call object_ref?
>
You mean to avoid the assert? I guess so, but I think any failure here would be indicative of a larger problem.
>
> > + g_free(name);
> > +
> > + user_creatable_complete(obj, &err);
>
> Why do we need to call this?
>
I'm not entirely sure, but looking around the object code it seemed to be a necessary part of instantiation. Maybe it is not required for iothread objects, but I could not figure that out from looking at the code, and the comments in the header suggest it is harmless if it is not required.
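For reference, user_creatable_complete() is a thin dispatcher to the object's completion hook, and for an IOThread that hook is what actually creates the AioContext and starts the event-loop thread, so the call is indeed needed here. A sketch of the dispatch, paraphrased from qom/object_interfaces.c of this era (treat the details as approximate):

#include "qemu/osdep.h"
#include "qom/object_interfaces.h"

/* Roughly what user_creatable_complete(obj, &err) does. */
static void user_creatable_complete_sketch(Object *obj, Error **errp)
{
    UserCreatableClass *ucc;
    UserCreatable *uc =
        (UserCreatable *)object_dynamic_cast(obj, TYPE_USER_CREATABLE);

    if (!uc) {
        return;    /* not user-creatable: the call is a harmless no-op */
    }

    ucc = USER_CREATABLE_GET_CLASS(uc);
    if (ucc->complete) {
        /* For TYPE_IOTHREAD this lands in the iothread's complete hook,
         * which creates the AioContext and starts the event-loop thread,
         * so skipping the call would leave blkdev->ctx unusable. */
        ucc->complete(uc, errp);
    }
}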
>
> > + assert(!err);
> > +
> > + blkdev->iothread = (IOThread *)object_dynamic_cast(obj,
> TYPE_IOTHREAD);
> > + blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
> > + blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
> > +
> > if (xen_mode != XEN_EMULATE) {
> > batch_maps = 1;
> > }
> > @@ -1288,6 +1327,8 @@ static int blk_connect(struct XenDevice *xendev)
> > blkdev->persistent_gnt_count = 0;
> > }
> >
> > + blk_set_aio_context(blkdev->blk, blkdev->ctx);
> > +
> > xen_be_bind_evtchn(&blkdev->xendev);
> >
> > xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
> > @@ -1301,13 +1342,20 @@ static void blk_disconnect(struct XenDevice
> *xendev)
> > {
> > struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev,
> xendev);
> >
> > + trace_xen_disk_disconnect(xendev->name);
> > +
> > + aio_context_acquire(blkdev->ctx);
> > +
> > if (blkdev->blk) {
> > + blk_set_aio_context(blkdev->blk, qemu_get_aio_context());
> > blk_detach_dev(blkdev->blk, blkdev);
> > blk_unref(blkdev->blk);
> > blkdev->blk = NULL;
> > }
> > xen_pv_unbind_evtchn(&blkdev->xendev);
> >
> > + aio_context_release(blkdev->ctx);
> > +
> > if (blkdev->sring) {
> > xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
> > blkdev->nr_ring_ref);
> > @@ -1358,6 +1408,7 @@ static int blk_free(struct XenDevice *xendev)
> > g_free(blkdev->dev);
> > g_free(blkdev->devtype);
> > qemu_bh_delete(blkdev->bh);
> > + object_unparent(OBJECT(blkdev->iothread));
>
> Shouldn't this be object_unref?
>
I don't think so. I think this is required to undo what was done by calling object_property_add_child() on the root object. Looking at other code such as object_new_with_propv(), it looks like the right thing to do is to call object_unref() after calling object_property_add_child() to drop the implicit ref taken by object_new(), so I'd need to add the call in blk_alloc().
Paul
>
> > return 0;
> > }
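To make the ownership rules Paul describes concrete, here is a sketch of the create/teardown pairing with the missing object_unref() he mentions folded in. The helper names are invented for illustration, and &error_abort stands in for the patch's assert(!err) pattern:

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qom/object_interfaces.h"
#include "sysemu/iothread.h"

/* Creation side (blk_alloc): the child property takes a reference of
 * its own, so the implicit reference from object_new() can be dropped
 * straight away. */
static IOThread *xen_disk_iothread_new(const char *devname)
{
    Object *obj = object_new(TYPE_IOTHREAD);              /* ref = 1 */
    char *name = g_strdup_printf("iothread-%s", devname);

    object_property_add_child(object_get_objects_root(),
                              name, obj, &error_abort);   /* ref = 2 */
    object_unref(obj);             /* ref = 1: the parent now owns it */
    g_free(name);

    user_creatable_complete(obj, &error_abort);
    return IOTHREAD(obj);
}

/* Teardown side (blk_free): deleting the child property drops the
 * parent's reference, which is now the last one, so the iothread is
 * finalized here. */
static void xen_disk_iothread_free(IOThread *iothread)
{
    object_unparent(OBJECT(iothread));
}

With the counts balanced this way, object_unparent() in blk_free() releases the last reference and the iothread is destroyed, which is the behaviour the patch wants.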
* Re: [Qemu-devel] [PATCH v2 3/3] xen-disk: use an IOThread per instance
2017-07-07 8:20 ` Paul Durrant
2017-07-07 22:06 ` Stefano Stabellini
@ 2017-07-07 22:06 ` Stefano Stabellini
2017-07-10 12:11 ` Paul Durrant
2017-07-10 12:11 ` [Qemu-devel] " Paul Durrant
1 sibling, 2 replies; 24+ messages in thread
From: Stefano Stabellini @ 2017-07-07 22:06 UTC (permalink / raw)
To: Paul Durrant
Cc: 'Stefano Stabellini',
xen-devel, qemu-devel, qemu-block, Anthony Perard, Kevin Wolf,
Max Reitz, afaerber, armbru
On Fri, 7 Jul 2017, Paul Durrant wrote:
> > -----Original Message-----
> > From: Stefano Stabellini [mailto:sstabellini@kernel.org]
> > Sent: 22 June 2017 23:15
> > To: Paul Durrant <Paul.Durrant@citrix.com>
> > Cc: xen-devel@lists.xenproject.org; qemu-devel@nongnu.org; qemu-
> > block@nongnu.org; Stefano Stabellini <sstabellini@kernel.org>; Anthony
> > Perard <anthony.perard@citrix.com>; Kevin Wolf <kwolf@redhat.com>;
> > Max Reitz <mreitz@redhat.com>; afaerber@suse.de
> > Subject: Re: [PATCH v2 3/3] xen-disk: use an IOThread per instance
> >
> > CC'ing Andreas Färber. Could you please give a quick look below at the
> > way the iothread object is instantiated and destroyed? I am no object
> > model expert and would appreciate a second opinion.
> >
>
> I have not seen any response so far.
>
> >
> > On Wed, 21 Jun 2017, Paul Durrant wrote:
> > > This patch allocates an IOThread object for each xen_disk instance and
> > > sets the AIO context appropriately on connect. This allows processing
> > > of I/O to proceed in parallel.
> > >
> > > The patch also adds tracepoints into xen_disk to make it possible to
> > > follow the state transitions of an instance in the log.
> > >
> > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > > ---
> > > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > > Cc: Anthony Perard <anthony.perard@citrix.com>
> > > Cc: Kevin Wolf <kwolf@redhat.com>
> > > Cc: Max Reitz <mreitz@redhat.com>
> > >
> > > v2:
> > > - explicitly acquire and release AIO context in qemu_aio_complete() and
> > > blk_bh()
> > > ---
> > > hw/block/trace-events | 7 ++++++
> > > hw/block/xen_disk.c | 69
> > ++++++++++++++++++++++++++++++++++++++++++++-------
> > > 2 files changed, 67 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/hw/block/trace-events b/hw/block/trace-events
> > > index 65e83dc258..608b24ba66 100644
> > > --- a/hw/block/trace-events
> > > +++ b/hw/block/trace-events
> > > @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int
> > num_reqs, uint64_t offset,
> > > # hw/block/hd-geometry.c
> > > hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p
> > LCHS %d %d %d"
> > > hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t
> > secs, int trans) "blk %p CHS %u %u %u trans %d"
> > > +
> > > +# hw/block/xen_disk.c
> > > +xen_disk_alloc(char *name) "%s"
> > > +xen_disk_init(char *name) "%s"
> > > +xen_disk_connect(char *name) "%s"
> > > +xen_disk_disconnect(char *name) "%s"
> > > +xen_disk_free(char *name) "%s"
> > > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> > > index 0e6513708e..8548195195 100644
> > > --- a/hw/block/xen_disk.c
> > > +++ b/hw/block/xen_disk.c
> > > @@ -27,10 +27,13 @@
> > > #include "hw/xen/xen_backend.h"
> > > #include "xen_blkif.h"
> > > #include "sysemu/blockdev.h"
> > > +#include "sysemu/iothread.h"
> > > #include "sysemu/block-backend.h"
> > > #include "qapi/error.h"
> > > #include "qapi/qmp/qdict.h"
> > > #include "qapi/qmp/qstring.h"
> > > +#include "qom/object_interfaces.h"
> > > +#include "trace.h"
> > >
> > > /* ------------------------------------------------------------- */
> > >
> > > @@ -128,6 +131,9 @@ struct XenBlkDev {
> > > DriveInfo *dinfo;
> > > BlockBackend *blk;
> > > QEMUBH *bh;
> > > +
> > > + IOThread *iothread;
> > > + AioContext *ctx;
> > > };
> > >
> > > /* ------------------------------------------------------------- */
> > > @@ -599,9 +605,12 @@ static int ioreq_runio_qemu_aio(struct ioreq
> > *ioreq);
> > > static void qemu_aio_complete(void *opaque, int ret)
> > > {
> > > struct ioreq *ioreq = opaque;
> > > + struct XenBlkDev *blkdev = ioreq->blkdev;
> > > +
> > > + aio_context_acquire(blkdev->ctx);
> >
> > I think that Paolo was right that we need an aio_context_acquire here,
> > however the issue is that with the current code:
> >
> > blk_handle_requests -> ioreq_runio_qemu_aio -> qemu_aio_complete
> >
> > leading to aio_context_acquire being called twice on the same lock,
> > which I don't think is allowed?
>
> It resolves to a qemu_rec_mutex_lock() which I believe is a recursive lock, so I think that's ok.
On Linux it becomes pthread_mutex_lock. The lock is created by
qemu_rec_mutex_init which specifies PTHREAD_MUTEX_RECURSIVE, so yes, it
should be recursive. Good.
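Concretely, QEMU's wrapper in util/qemu-thread-posix.c amounts to little more than the attribute Stefano mentions; a sketch with QEMU's error handling elided:

#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
} RecMutexSketch;

/* Roughly what qemu_rec_mutex_init() does on POSIX hosts. */
static void rec_mutex_init_sketch(RecMutexSketch *mutex)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&mutex->lock, &attr);
    pthread_mutexattr_destroy(&attr);
}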
> >
> > I think we need to get rid of the qemu_aio_complete call from
> > ioreq_runio_qemu_aio, but to do that we need to be careful with the
> > accounting of aio_inflight (today it's incremented unconditionally at
> > the beginning of ioreq_runio_qemu_aio, I think we would have to change
> > that to increment it only if presync).
> >
>
> If the lock is indeed recursive then I think we can avoid this complication.
OK
> >
> > > if (ret != 0) {
> > > - xen_pv_printf(&ioreq->blkdev->xendev, 0, "%s I/O error\n",
> > > + xen_pv_printf(&blkdev->xendev, 0, "%s I/O error\n",
> > > ioreq->req.operation == BLKIF_OP_READ ? "read" : "write");
> > > ioreq->aio_errors++;
> > > }
> > > @@ -610,13 +619,13 @@ static void qemu_aio_complete(void *opaque, int
> > ret)
> > > if (ioreq->presync) {
> > > ioreq->presync = 0;
> > > ioreq_runio_qemu_aio(ioreq);
> > > - return;
> > > + goto done;
> > > }
> > > if (ioreq->aio_inflight > 0) {
> > > - return;
> > > + goto done;
> > > }
> > >
> > > - if (ioreq->blkdev->feature_grant_copy) {
> > > + if (blkdev->feature_grant_copy) {
> > > switch (ioreq->req.operation) {
> > > case BLKIF_OP_READ:
> > > /* in case of failure ioreq->aio_errors is increased */
> > > @@ -638,7 +647,7 @@ static void qemu_aio_complete(void *opaque, int
> > ret)
> > > }
> > >
> > > ioreq->status = ioreq->aio_errors ? BLKIF_RSP_ERROR :
> > BLKIF_RSP_OKAY;
> > > - if (!ioreq->blkdev->feature_grant_copy) {
> > > + if (!blkdev->feature_grant_copy) {
> > > ioreq_unmap(ioreq);
> > > }
> > > ioreq_finish(ioreq);
> > > @@ -650,16 +659,19 @@ static void qemu_aio_complete(void *opaque, int
> > ret)
> > > }
> > > case BLKIF_OP_READ:
> > > if (ioreq->status == BLKIF_RSP_OKAY) {
> > > - block_acct_done(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct);
> > > + block_acct_done(blk_get_stats(blkdev->blk), &ioreq->acct);
> > > } else {
> > > - block_acct_failed(blk_get_stats(ioreq->blkdev->blk), &ioreq->acct);
> > > + block_acct_failed(blk_get_stats(blkdev->blk), &ioreq->acct);
> > > }
> > > break;
> > > case BLKIF_OP_DISCARD:
> > > default:
> > > break;
> > > }
> > > - qemu_bh_schedule(ioreq->blkdev->bh);
> > > + qemu_bh_schedule(blkdev->bh);
> > > +
> > > +done:
> > > + aio_context_release(blkdev->ctx);
> > > }
> > >
> > > static bool blk_split_discard(struct ioreq *ioreq, blkif_sector_t
> > sector_number,
> > > @@ -917,17 +929,40 @@ static void blk_handle_requests(struct XenBlkDev
> > *blkdev)
> > > static void blk_bh(void *opaque)
> > > {
> > > struct XenBlkDev *blkdev = opaque;
> > > +
> > > + aio_context_acquire(blkdev->ctx);
> > > blk_handle_requests(blkdev);
> > > + aio_context_release(blkdev->ctx);
> > > }
> > >
> > > static void blk_alloc(struct XenDevice *xendev)
> > > {
> > > struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev,
> > xendev);
> > > + Object *obj;
> > > + char *name;
> > > + Error *err = NULL;
> > > +
> > > + trace_xen_disk_alloc(xendev->name);
> > >
> > > QLIST_INIT(&blkdev->inflight);
> > > QLIST_INIT(&blkdev->finished);
> > > QLIST_INIT(&blkdev->freelist);
> > > - blkdev->bh = qemu_bh_new(blk_bh, blkdev);
> > > +
> > > + obj = object_new(TYPE_IOTHREAD);
> > > + name = g_strdup_printf("iothread-%s", xendev->name);
> > > +
> > > + object_property_add_child(object_get_objects_root(), name, obj,
> > &err);
> > > + assert(!err);
> >
> > Would it be enough to call object_ref?
> >
>
> You mean to avoid the assert? I guess so, but I think any failure here would be indicative of a larger problem.
No, I meant calling object_ref instead of object_property_add_child.
> >
> > > + g_free(name);
> > > +
> > > + user_creatable_complete(obj, &err);
> >
> > Why do we need to call this?
> >
>
> I'm not entirely sure, but looking around the object code it seemed to be a necessary part of instantiation. Maybe it is not required for iothread objects, but I could not figure that out from looking at the code, and the comments in the header suggest it is harmless if it is not required.
>
> > > + assert(!err);
> > > +
> > > + blkdev->iothread = (IOThread *)object_dynamic_cast(obj,
> > TYPE_IOTHREAD);
> > > + blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
> > > + blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
> > > +
> > > if (xen_mode != XEN_EMULATE) {
> > > batch_maps = 1;
> > > }
> > > @@ -1288,6 +1327,8 @@ static int blk_connect(struct XenDevice *xendev)
> > > blkdev->persistent_gnt_count = 0;
> > > }
> > >
> > > + blk_set_aio_context(blkdev->blk, blkdev->ctx);
> > > +
> > > xen_be_bind_evtchn(&blkdev->xendev);
> > >
> > > xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
> > > @@ -1301,13 +1342,20 @@ static void blk_disconnect(struct XenDevice
> > *xendev)
> > > {
> > > struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev,
> > xendev);
> > >
> > > + trace_xen_disk_disconnect(xendev->name);
> > > +
> > > + aio_context_acquire(blkdev->ctx);
> > > +
> > > if (blkdev->blk) {
> > > + blk_set_aio_context(blkdev->blk, qemu_get_aio_context());
> > > blk_detach_dev(blkdev->blk, blkdev);
> > > blk_unref(blkdev->blk);
> > > blkdev->blk = NULL;
> > > }
> > > xen_pv_unbind_evtchn(&blkdev->xendev);
> > >
> > > + aio_context_release(blkdev->ctx);
> > > +
> > > if (blkdev->sring) {
> > > xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
> > > blkdev->nr_ring_ref);
> > > @@ -1358,6 +1408,7 @@ static int blk_free(struct XenDevice *xendev)
> > > g_free(blkdev->dev);
> > > g_free(blkdev->devtype);
> > > qemu_bh_delete(blkdev->bh);
> > > + object_unparent(OBJECT(blkdev->iothread));
> >
> > Shouldn't this be object_unref?
> >
>
> I don't think so. I think this is required to undo what was done by calling object_property_add_child() on the root object.
Right, so if object_property_add_child is not actually required, then
you might be able to turn object_unparent into object_unref.
Unfortunately I don't know enough about QOM to be able to tell which is
the right way of doing things, but looking at
hw/block/dataplane/virtio-blk.c, it would seem that only object_ref and
object_unref are required?
> Looking at other code such as object_new_with_propv(), it looks like the right thing to do is to call object_unref() after calling object_property_add_child() to drop the implicit ref taken by object_new(), so I'd need to add the call in blk_alloc().
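To make that concrete, here is a minimal sketch of the allocation sequence with the extra object_unref() added. It assumes object_property_add_child() takes a reference of its own and that, for an IOThread, user_creatable_complete() is what actually creates the AioContext and starts the thread; the helper name blk_alloc_iothread() is illustrative only, not part of the patch:

    /* Sketch only: create a named IOThread owned by its child property. */
    static void blk_alloc_iothread(struct XenBlkDev *blkdev,
                                   struct XenDevice *xendev)
    {
        Object *obj = object_new(TYPE_IOTHREAD);       /* refcount == 1 */
        char *name = g_strdup_printf("iothread-%s", xendev->name);
        Error *err = NULL;

        /* The child property takes its own reference (refcount == 2). */
        object_property_add_child(object_get_objects_root(), name, obj, &err);
        assert(!err);
        g_free(name);

        /* For TYPE_IOTHREAD this runs the complete hook, which creates
         * the AioContext and spawns the thread. */
        user_creatable_complete(obj, &err);
        assert(!err);

        /* Drop the reference implicitly taken by object_new(), leaving
         * the child property as the sole owner (refcount == 1). */
        object_unref(obj);

        blkdev->iothread = (IOThread *)object_dynamic_cast(obj, TYPE_IOTHREAD);
        blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
        blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
    }

With this, the object_unparent() in blk_free() stays as it is: removing the child property drops the last remaining reference and finalizes the IOThread.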
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/3] xen-disk: use an IOThread per instance
2017-07-07 22:06 ` [Qemu-devel] " Stefano Stabellini
2017-07-10 12:11 ` Paul Durrant
@ 2017-07-10 12:11 ` Paul Durrant
1 sibling, 0 replies; 24+ messages in thread
From: Paul Durrant @ 2017-07-10 12:11 UTC (permalink / raw)
To: 'Stefano Stabellini'
Cc: xen-devel, qemu-devel, qemu-block, Anthony Perard, Kevin Wolf,
Max Reitz, afaerber, armbru
> -----Original Message-----
[snip]
> > > > + object_unparent(OBJECT(blkdev->iothread));
> > >
> > > Shouldn't this be object_unref?
> > >
> >
> > I don't think so. I think this is required to undo what was done by calling
> > object_property_add_child() on the root object.
>
> Right, so if object_property_add_child is not actually required, then
> you might be able to turn object_unparent into object_unref.
>
> Unfortunately I don't know enough about QOM to be able to tell which is
> the right way of doing things, but looking at
> hw/block/dataplane/virtio-blk.c, it would seem that only object_ref and
> object_unref are required?
>
I guess I can give it a try. I was working on the assumption that all objects were required to have a parent, but maybe that's not true. Can someone more familiar with QOM comment?
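For reference, my rough reading of the virtio-blk dataplane pattern you point at would be something like the following. It is a sketch only: hw/block/dataplane/virtio-blk.c takes object_ref()/object_unref() on an IOThread handed to it from the command line, whereas for an internally created object the reference returned by object_new() may already be enough, so no explicit object_ref() appears here.

    /* Sketch only: rely on plain reference counting, no child property.
     * Caveat: without a parent the object has no canonical QOM path, so
     * it is not visible via qom-list and cannot be removed by object-del. */
    Object *obj = object_new(TYPE_IOTHREAD);           /* refcount == 1 */
    Error *err = NULL;

    user_creatable_complete(obj, &err);                /* start the thread */
    assert(!err);

    blkdev->iothread = (IOThread *)object_dynamic_cast(obj, TYPE_IOTHREAD);

    /* ...and blk_free() would then simply drop that reference: */
    object_unref(OBJECT(blkdev->iothread));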
Cheers,
Paul
>
> > Looking at other code such as object_new_with_propv(), it looks like the
> > right thing to do is to call object_unref() after calling
> > object_property_add_child() to drop the implicit ref taken by object_new(),
> > so I'd need to add the call in blk_alloc().
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2017-07-10 12:11 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-21 12:52 [Qemu-devel] [PATCH v2 0/3] xen-disk: performance improvements Paul Durrant
2017-06-21 12:52 ` Paul Durrant
2017-06-21 12:52 ` [Qemu-devel] [PATCH v2 1/3] xen-disk: only advertize feature-persistent if grant copy is not available Paul Durrant
2017-06-21 12:52 ` Paul Durrant
2017-06-22 0:40 ` Stefano Stabellini
2017-06-22 0:40 ` [Qemu-devel] " Stefano Stabellini
2017-06-21 12:52 ` [Qemu-devel] [PATCH v2 2/3] xen-disk: add support for multi-page shared rings Paul Durrant
2017-06-21 12:52 ` Paul Durrant
2017-06-22 0:39 ` [Qemu-devel] " Stefano Stabellini
2017-06-22 0:39 ` Stefano Stabellini
2017-06-21 12:52 ` [Qemu-devel] [PATCH v2 3/3] xen-disk: use an IOThread per instance Paul Durrant
2017-06-21 12:52 ` Paul Durrant
2017-06-22 22:14 ` [Qemu-devel] " Stefano Stabellini
2017-07-07 8:20 ` Paul Durrant
2017-07-07 22:06 ` Stefano Stabellini
2017-07-07 22:06 ` [Qemu-devel] " Stefano Stabellini
2017-07-10 12:11 ` Paul Durrant
2017-07-10 12:11 ` [Qemu-devel] " Paul Durrant
2017-07-07 8:20 ` Paul Durrant
2017-06-22 22:14 ` Stefano Stabellini
2017-06-27 22:07 ` [Qemu-devel] [Xen-devel] [PATCH v2 0/3] xen-disk: performance improvements Stefano Stabellini
2017-06-27 22:07 ` Stefano Stabellini
2017-06-28 12:52 ` [Qemu-devel] [Xen-devel] " Paul Durrant
2017-06-28 12:52 ` Paul Durrant