* [PATCH v8 0/4] The intro of QEMU block I/O throttling
@ 2011-09-08 10:11 ` Zhi Yong Wu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-08 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, aliguori, stefanha, kvm, mtosatti, Zhi Yong Wu, pair,
	zwu.kernel, ryanh

The main goal of this patch series is to effectively cap the disk I/O rate (bytes/s) or request count (iops) of a single VM. It is still a draft, so it unavoidably has some drawbacks; if you catch any, please let me know.

The series introduces a block I/O throttling algorithm, plus one timer and one block queue for each drive that has I/O limits enabled.

When a block request comes in, the throttling algorithm checks whether the drive's I/O rate or request count would exceed its limits; if so, the request is enqueued on the block queue, and the timer later dispatches the queued requests.
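The dispatch decision can be sketched in isolation as follows. This is my own simplified model, not the code in these patches: the function name, the pro-rated per-slice budget, and the wait-time formula are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000.0   /* mirrors NANOSECONDS_PER_SECOND */

/*
 * Illustrative sketch, not QEMU's actual implementation.
 * Decide whether a request of request_bytes fits under a bps limit.
 * bytes_disp is what the drive already dispatched since slice_start_ns.
 * Returns true if the request may run now; otherwise *wait_ns is how
 * long the timer should sleep before retrying the queued request.
 */
static bool throttle_check_bps(uint64_t bps_limit, uint64_t bytes_disp,
                               uint64_t request_bytes,
                               int64_t now_ns, int64_t slice_start_ns,
                               int64_t *wait_ns)
{
    double budget;

    *wait_ns = 0;
    if (bps_limit == 0) {
        return true;              /* no bps limit configured */
    }

    /* Bytes allowed so far in this slice, pro-rated by elapsed time */
    budget = bps_limit * ((now_ns - slice_start_ns) / NS_PER_SEC);
    if (bytes_disp + request_bytes <= budget) {
        return true;
    }

    /* Time at which enough budget will have accrued for this request */
    *wait_ns = (int64_t)((bytes_disp + request_bytes) / (double)bps_limit
                         * NS_PER_SEC) - (now_ns - slice_start_ns);
    return false;
}
```

A request that fails such a check is enqueued, and the block timer is armed for roughly *wait_ns so the queue is flushed once the budget has accrued.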

The following features are available:
(1) global bps limit
   -drive bps=xxx            in bytes/s
(2) read-only bps limit
   -drive bps_rd=xxx         in bytes/s
(3) write-only bps limit
   -drive bps_wr=xxx         in bytes/s
(4) global iops limit
   -drive iops=xxx           in ios/s
(5) read-only iops limit
   -drive iops_rd=xxx        in ios/s
(6) write-only iops limit
   -drive iops_wr=xxx        in ios/s
(7) combinations of the above limits
   -drive bps=xxx,iops=xxx
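For instance, a total bandwidth cap and a total request-rate cap can be combined on one drive (the image path and limit values below are only illustrative):

```shell
# Cap one drive at 1 MiB/s and 100 requests/s in total
qemu-system-x86_64 \
    -drive file=disk.img,if=virtio,bps=1048576,iops=100
```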

Known Limitations:
(1) #1 cannot coexist with #2 or #3.
(2) #4 cannot coexist with #5 or #6.
(3) When a bps/iops limit is set to a very small value such as 511 bytes/s, the VM hangs: a single 512-byte sector request can never fit within that budget, so it waits forever. We are considering how to handle this scenario.

Changes since v7:
  Fixed the per-patch build based on Stefan's comments.

Zhi Yong Wu (4):
  block: add the command line support
  block: add the block queue support
  block: add block timer and throttling algorithm
  qmp/hmp: add block_set_io_throttle

 v7: Mainly simplified the block queue.
     Adjusted the code based on Stefan's comments.

 v6: Mainly fixed the aio callback issue for the block queue.
     Adjusted the code based on Ram Pai's comments.

 v5: Added qmp/hmp support.
     Adjusted the code based on Stefan's comments.
     qmp/hmp: add block_set_io_throttle

 v4: Fixed a memory leak based on Ryan's feedback.

 v3: Added code for extending the slice time, and changed how the timer's wait time is computed.

 v2: Second revision of the QEMU disk I/O limits code.
     Modified mainly based on Stefan's comments.

 v1: First submission of the QEMU disk I/O limits code.
     Only a code draft.


 Makefile.objs     |    2 +-
 block.c           |  344 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 block.h           |    6 +-
 block/blk-queue.c |  201 +++++++++++++++++++++++++++++++
 block/blk-queue.h |   59 +++++++++
 block_int.h       |   30 +++++
 blockdev.c        |   98 +++++++++++++++
 blockdev.h        |    2 +
 hmp-commands.hx   |   15 +++
 qemu-config.c     |   24 ++++
 qemu-options.hx   |    1 +
 qerror.c          |    4 +
 qerror.h          |    3 +
 qmp-commands.hx   |   52 ++++++++-
 14 files changed, 825 insertions(+), 16 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

-- 
1.7.6



* [PATCH v8 1/4] block: add the block queue support
  2011-09-08 10:11 ` [Qemu-devel] " Zhi Yong Wu
@ 2011-09-08 10:11   ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-08 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, stefanha, mtosatti, aliguori, ryanh, zwu.kernel, kwolf,
	pair, Zhi Yong Wu

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 Makefile.objs     |    2 +-
 block/blk-queue.c |  201 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/blk-queue.h |   59 ++++++++++++++++
 block_int.h       |   27 +++++++
 4 files changed, 288 insertions(+), 1 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

diff --git a/Makefile.objs b/Makefile.objs
index 26b885b..5dcf456 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -33,7 +33,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
-block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
+block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o blk-queue.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block/blk-queue.c b/block/blk-queue.c
new file mode 100644
index 0000000..adef497
--- /dev/null
+++ b/block/blk-queue.c
@@ -0,0 +1,201 @@
+/*
+ * QEMU System Emulator queue definition for block layer
+ *
+ * Copyright (c) IBM, Corp. 2011
+ *
+ * Authors:
+ *  Zhi Yong Wu  <wuzhy@linux.vnet.ibm.com>
+ *  Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "block_int.h"
+#include "block/blk-queue.h"
+#include "qemu-common.h"
+
+/* The APIs for block request queue on qemu block layer.
+ */
+
+struct BlockQueueAIOCB {
+    BlockDriverAIOCB common;
+    QTAILQ_ENTRY(BlockQueueAIOCB) entry;
+    BlockRequestHandler *handler;
+    BlockDriverAIOCB *real_acb;
+
+    int64_t sector_num;
+    QEMUIOVector *qiov;
+    int nb_sectors;
+};
+
+typedef struct BlockQueueAIOCB BlockQueueAIOCB;
+
+struct BlockQueue {
+    QTAILQ_HEAD(requests, BlockQueueAIOCB) requests;
+    bool req_failed;
+    bool flushing;
+};
+
+static void qemu_block_queue_dequeue(BlockQueue *queue,
+                                     BlockQueueAIOCB *request)
+{
+    BlockQueueAIOCB *req;
+
+    assert(queue);
+    /* Walk the queue and remove the matching request, if present;
+     * examining only the head would spin forever on a non-match. */
+    QTAILQ_FOREACH(req, &queue->requests, entry) {
+        if (req == request) {
+            QTAILQ_REMOVE(&queue->requests, req, entry);
+            break;
+        }
+    }
+}
+
+static void qemu_block_queue_cancel(BlockDriverAIOCB *acb)
+{
+    BlockQueueAIOCB *request = container_of(acb, BlockQueueAIOCB, common);
+    if (request->real_acb) {
+        bdrv_aio_cancel(request->real_acb);
+    } else {
+        assert(request->common.bs->block_queue);
+        qemu_block_queue_dequeue(request->common.bs->block_queue,
+                                 request);
+    }
+
+    qemu_aio_release(request);
+}
+
+static AIOPool block_queue_pool = {
+    .aiocb_size         = sizeof(struct BlockQueueAIOCB),
+    .cancel             = qemu_block_queue_cancel,
+};
+
+static void qemu_block_queue_callback(void *opaque, int ret)
+{
+    BlockQueueAIOCB *acb = opaque;
+
+    if (acb->common.cb) {
+        acb->common.cb(acb->common.opaque, ret);
+    }
+
+    qemu_aio_release(acb);
+}
+
+BlockQueue *qemu_new_block_queue(void)
+{
+    BlockQueue *queue;
+
+    queue = g_malloc0(sizeof(BlockQueue));
+
+    QTAILQ_INIT(&queue->requests);
+
+    queue->req_failed = true;
+    queue->flushing   = false;
+
+    return queue;
+}
+
+void qemu_del_block_queue(BlockQueue *queue)
+{
+    BlockQueueAIOCB *request, *next;
+
+    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
+        QTAILQ_REMOVE(&queue->requests, request, entry);
+        qemu_aio_release(request);
+    }
+
+    g_free(queue);
+}
+
+BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
+                        BlockDriverState *bs,
+                        BlockRequestHandler *handler,
+                        int64_t sector_num,
+                        QEMUIOVector *qiov,
+                        int nb_sectors,
+                        BlockDriverCompletionFunc *cb,
+                        void *opaque)
+{
+    BlockDriverAIOCB *acb;
+    BlockQueueAIOCB *request;
+
+    if (queue->flushing) {
+        queue->req_failed = false;
+        return NULL;
+    } else {
+        acb = qemu_aio_get(&block_queue_pool, bs,
+                           cb, opaque);
+        request = container_of(acb, BlockQueueAIOCB, common);
+        request->handler       = handler;
+        request->sector_num    = sector_num;
+        request->qiov          = qiov;
+        request->nb_sectors    = nb_sectors;
+        request->real_acb      = NULL;
+        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
+    }
+
+    return acb;
+}
+
+static int qemu_block_queue_handler(BlockQueueAIOCB *request)
+{
+    int ret;
+    BlockDriverAIOCB *res;
+
+    res = request->handler(request->common.bs, request->sector_num,
+                           request->qiov, request->nb_sectors,
+                           qemu_block_queue_callback, request);
+    if (res) {
+        request->real_acb = res;
+    }
+
+    ret = (res == NULL) ? 0 : 1;
+
+    return ret;
+}
+
+void qemu_block_queue_flush(BlockQueue *queue)
+{
+    queue->flushing = true;
+    while (!QTAILQ_EMPTY(&queue->requests)) {
+        BlockQueueAIOCB *request = NULL;
+        int ret = 0;
+
+        request = QTAILQ_FIRST(&queue->requests);
+        QTAILQ_REMOVE(&queue->requests, request, entry);
+
+        queue->req_failed = true;
+        ret = qemu_block_queue_handler(request);
+        if (ret == 0) {
+            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
+            if (queue->req_failed) {
+                qemu_block_queue_callback(request, -EIO);
+                break;
+            }
+        }
+    }
+
+    queue->req_failed = true;
+    queue->flushing   = false;
+}
+
+bool qemu_block_queue_has_pending(BlockQueue *queue)
+{
+    return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
+}
diff --git a/block/blk-queue.h b/block/blk-queue.h
new file mode 100644
index 0000000..c1529f7
--- /dev/null
+++ b/block/blk-queue.h
@@ -0,0 +1,59 @@
+/*
+ * QEMU System Emulator queue declaration for block layer
+ *
+ * Copyright (c) IBM, Corp. 2011
+ *
+ * Authors:
+ *  Zhi Yong Wu  <wuzhy@linux.vnet.ibm.com>
+ *  Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef QEMU_BLOCK_QUEUE_H
+#define QEMU_BLOCK_QUEUE_H
+
+#include "block.h"
+#include "qemu-queue.h"
+
+typedef BlockDriverAIOCB* (BlockRequestHandler) (BlockDriverState *bs,
+                                int64_t sector_num, QEMUIOVector *qiov,
+                                int nb_sectors, BlockDriverCompletionFunc *cb,
+                                void *opaque);
+
+typedef struct BlockQueue BlockQueue;
+
+BlockQueue *qemu_new_block_queue(void);
+
+void qemu_del_block_queue(BlockQueue *queue);
+
+BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
+                        BlockDriverState *bs,
+                        BlockRequestHandler *handler,
+                        int64_t sector_num,
+                        QEMUIOVector *qiov,
+                        int nb_sectors,
+                        BlockDriverCompletionFunc *cb,
+                        void *opaque);
+
+void qemu_block_queue_flush(BlockQueue *queue);
+
+bool qemu_block_queue_has_pending(BlockQueue *queue);
+
+#endif /* QEMU_BLOCK_QUEUE_H */
diff --git a/block_int.h b/block_int.h
index 8a72b80..201e635 100644
--- a/block_int.h
+++ b/block_int.h
@@ -29,10 +29,18 @@
 #include "qemu-queue.h"
 #include "qemu-coroutine.h"
 #include "qemu-timer.h"
+#include "block/blk-queue.h"
 
 #define BLOCK_FLAG_ENCRYPT	1
 #define BLOCK_FLAG_COMPAT6	4
 
+#define BLOCK_IO_LIMIT_READ     0
+#define BLOCK_IO_LIMIT_WRITE    1
+#define BLOCK_IO_LIMIT_TOTAL    2
+
+#define BLOCK_IO_SLICE_TIME     100000000
+#define NANOSECONDS_PER_SECOND  1000000000.0
+
 #define BLOCK_OPT_SIZE          "size"
 #define BLOCK_OPT_ENCRYPT       "encryption"
 #define BLOCK_OPT_COMPAT6       "compat6"
@@ -49,6 +57,16 @@ typedef struct AIOPool {
     BlockDriverAIOCB *free_aiocb;
 } AIOPool;
 
+typedef struct BlockIOLimit {
+    uint64_t bps[3];
+    uint64_t iops[3];
+} BlockIOLimit;
+
+typedef struct BlockIODisp {
+    uint64_t bytes[2];
+    uint64_t ios[2];
+} BlockIODisp;
+
 struct BlockDriver {
     const char *format_name;
     int instance_size;
@@ -184,6 +202,15 @@ struct BlockDriverState {
 
     void *sync_aiocb;
 
+    /* the time for latest disk I/O */
+    int64_t slice_start;
+    int64_t slice_end;
+    BlockIOLimit io_limits;
+    BlockIODisp  io_disps;
+    BlockQueue   *block_queue;
+    QEMUTimer    *block_timer;
+    bool         io_limits_enabled;
+
     /* I/O stats (display with "info blockstats"). */
     uint64_t nr_bytes[BDRV_MAX_IOTYPE];
     uint64_t nr_ops[BDRV_MAX_IOTYPE];
-- 
1.7.6




* [PATCH v8 2/4] block: add the command line support
  2011-09-08 10:11 ` [Qemu-devel] " Zhi Yong Wu
@ 2011-09-08 10:11   ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-08 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, stefanha, mtosatti, aliguori, ryanh, zwu.kernel, kwolf,
	pair, Zhi Yong Wu

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 block.c         |   59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 block.h         |    5 ++++
 block_int.h     |    3 ++
 blockdev.c      |   29 +++++++++++++++++++++++++++
 qemu-config.c   |   24 ++++++++++++++++++++++
 qemu-options.hx |    1 +
 6 files changed, 121 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 43742b7..cd75183 100644
--- a/block.c
+++ b/block.c
@@ -104,6 +104,57 @@ int is_windows_drive(const char *filename)
 }
 #endif
 
+/* throttling disk I/O limits */
+void bdrv_io_limits_disable(BlockDriverState *bs)
+{
+    bs->io_limits_enabled = false;
+
+    if (bs->block_queue) {
+        qemu_block_queue_flush(bs->block_queue);
+        qemu_del_block_queue(bs->block_queue);
+        bs->block_queue = NULL;
+    }
+
+    if (bs->block_timer) {
+        qemu_del_timer(bs->block_timer);
+        qemu_free_timer(bs->block_timer);
+        bs->block_timer     = NULL;
+    }
+
+    bs->slice_start = 0;
+
+    bs->slice_end   = 0;
+}
+
+static void bdrv_block_timer(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BlockQueue *queue    = bs->block_queue;
+
+    qemu_block_queue_flush(queue);
+}
+
+void bdrv_io_limits_enable(BlockDriverState *bs)
+{
+    bs->block_queue = qemu_new_block_queue();
+    bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
+
+    bs->slice_start = qemu_get_clock_ns(vm_clock);
+
+    bs->slice_end   = bs->slice_start + BLOCK_IO_SLICE_TIME;
+}
+
+bool bdrv_io_limits_enabled(BlockDriverState *bs)
+{
+    BlockIOLimit *io_limits = &bs->io_limits;
+    return io_limits->bps[BLOCK_IO_LIMIT_READ]
+         || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
+         || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
+         || io_limits->iops[BLOCK_IO_LIMIT_READ]
+         || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
+         || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
+}
+
 /* check if the path starts with "<protocol>:" */
 static int path_has_protocol(const char *path)
 {
@@ -1453,6 +1504,14 @@ void bdrv_get_geometry_hint(BlockDriverState *bs,
     *psecs = bs->secs;
 }
 
+/* throttling disk I/O limits */
+void bdrv_set_io_limits(BlockDriverState *bs,
+                            BlockIOLimit *io_limits)
+{
+    bs->io_limits = *io_limits;
+    bs->io_limits_enabled = bdrv_io_limits_enabled(bs);
+}
+
 /* Recognize floppy formats */
 typedef struct FDFormat {
     FDriveType drive;
diff --git a/block.h b/block.h
index 3ac0b94..a3e69db 100644
--- a/block.h
+++ b/block.h
@@ -58,6 +58,11 @@ void bdrv_info(Monitor *mon, QObject **ret_data);
 void bdrv_stats_print(Monitor *mon, const QObject *data);
 void bdrv_info_stats(Monitor *mon, QObject **ret_data);
 
+/* disk I/O throttling */
+void bdrv_io_limits_enable(BlockDriverState *bs);
+void bdrv_io_limits_disable(BlockDriverState *bs);
+bool bdrv_io_limits_enabled(BlockDriverState *bs);
+
 void bdrv_init(void);
 void bdrv_init_with_whitelist(void);
 BlockDriver *bdrv_find_protocol(const char *filename);
diff --git a/block_int.h b/block_int.h
index 201e635..368c776 100644
--- a/block_int.h
+++ b/block_int.h
@@ -257,6 +257,9 @@ void qemu_aio_release(void *p);
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
+void bdrv_set_io_limits(BlockDriverState *bs,
+                            BlockIOLimit *io_limits);
+
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
 #endif
diff --git a/blockdev.c b/blockdev.c
index 2602591..619ae9f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -236,6 +236,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
     int on_read_error, on_write_error;
     const char *devaddr;
     DriveInfo *dinfo;
+    BlockIOLimit io_limits;
     int snapshot = 0;
     int ret;
 
@@ -354,6 +355,31 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
         }
     }
 
+    /* disk I/O throttling */
+    io_limits.bps[BLOCK_IO_LIMIT_TOTAL]  =
+                           qemu_opt_get_number(opts, "bps", 0);
+    io_limits.bps[BLOCK_IO_LIMIT_READ]   =
+                           qemu_opt_get_number(opts, "bps_rd", 0);
+    io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
+                           qemu_opt_get_number(opts, "bps_wr", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
+                           qemu_opt_get_number(opts, "iops", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_READ]  =
+                           qemu_opt_get_number(opts, "iops_rd", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
+                           qemu_opt_get_number(opts, "iops_wr", 0);
+
+    if (((io_limits.bps[BLOCK_IO_LIMIT_TOTAL] != 0)
+            && ((io_limits.bps[BLOCK_IO_LIMIT_READ] != 0)
+            || (io_limits.bps[BLOCK_IO_LIMIT_WRITE] != 0)))
+            || ((io_limits.iops[BLOCK_IO_LIMIT_TOTAL] != 0)
+            && ((io_limits.iops[BLOCK_IO_LIMIT_READ] != 0)
+            || (io_limits.iops[BLOCK_IO_LIMIT_WRITE] != 0)))) {
+        error_report("bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr) "
+                     "cannot be used at the same time");
+        return NULL;
+    }
+
     on_write_error = BLOCK_ERR_STOP_ENOSPC;
     if ((buf = qemu_opt_get(opts, "werror")) != NULL) {
         if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != IF_NONE) {
@@ -461,6 +487,9 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 
     bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error);
 
+    /* disk I/O throttling */
+    bdrv_set_io_limits(dinfo->bdrv, &io_limits);
+
     switch(type) {
     case IF_IDE:
     case IF_SCSI:
diff --git a/qemu-config.c b/qemu-config.c
index 7a7854f..405e587 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -85,6 +85,30 @@ static QemuOptsList qemu_drive_opts = {
             .name = "readonly",
             .type = QEMU_OPT_BOOL,
             .help = "open drive file as read-only",
+        },{
+            .name = "iops",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit total I/O operations per second",
+        },{
+            .name = "iops_rd",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit read operations per second",
+        },{
+            .name = "iops_wr",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit write operations per second",
+        },{
+            .name = "bps",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit total bytes per second",
+        },{
+            .name = "bps_rd",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit read bytes per second",
+        },{
+            .name = "bps_wr",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit write bytes per second",
         },
         { /* end of list */ }
     },
diff --git a/qemu-options.hx b/qemu-options.hx
index 659ecb2..2e42c5c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -136,6 +136,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
     "       [,readonly=on|off]\n"
+    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]]\n"
     "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
-- 
1.7.6




* [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-09-08 10:11 ` [Qemu-devel] " Zhi Yong Wu
@ 2011-09-08 10:11   ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-08 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, stefanha, mtosatti, aliguori, ryanh, zwu.kernel, kwolf,
	pair, Zhi Yong Wu

Note:
     1.) When bps/iops limits are set to a small value such as 511 bytes/s, the VM will hang. We are considering how to handle this scenario.
     2.) When the "dd" command is issued in the guest with its bs option set to a large value such as "bs=1024K", the resulting speed will be slightly higher than the limits.

If you have any thoughts on these problems, please let us know. :)

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 block.c |  259 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 block.h |    1 -
 2 files changed, 248 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index cd75183..c08fde8 100644
--- a/block.c
+++ b/block.c
@@ -30,6 +30,9 @@
 #include "qemu-objects.h"
 #include "qemu-coroutine.h"
 
+#include "qemu-timer.h"
+#include "block/blk-queue.h"
+
 #ifdef CONFIG_BSD
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -72,6 +75,13 @@ static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
                                          QEMUIOVector *iov);
 static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs);
 
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+        bool is_write, double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
+        double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+        bool is_write, int64_t *wait);
+
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
     QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
@@ -745,6 +755,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
 
+    /* throttling disk I/O limits */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_enable(bs);
+    }
+
     return 0;
 
 unlink_and_fail:
@@ -783,6 +798,18 @@ void bdrv_close(BlockDriverState *bs)
         if (bs->change_cb)
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
+
+    /* throttling disk I/O limits */
+    if (bs->block_queue) {
+        qemu_del_block_queue(bs->block_queue);
+        bs->block_queue = NULL;
+    }
+
+    if (bs->block_timer) {
+        qemu_del_timer(bs->block_timer);
+        qemu_free_timer(bs->block_timer);
+        bs->block_timer = NULL;
+    }
 }
 
 void bdrv_close_all(void)
@@ -2341,16 +2368,48 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
                                  BlockDriverCompletionFunc *cb, void *opaque)
 {
     BlockDriver *drv = bs->drv;
-
+    BlockDriverAIOCB *ret;
+    int64_t wait_time = -1;
+printf("sector_num=%ld, nb_sectors=%d\n", sector_num, nb_sectors);
     trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
 
-    if (!drv)
-        return NULL;
-    if (bdrv_check_request(bs, sector_num, nb_sectors))
+    if (!drv || bdrv_check_request(bs, sector_num, nb_sectors)) {
         return NULL;
+    }
+
+    /* throttling disk read I/O */
+    if (bs->io_limits_enabled) {
+        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
+            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
+                           sector_num, qiov, nb_sectors, cb, opaque);
+            printf("wait_time=%ld\n", wait_time);
+            if (wait_time != -1) {
+                printf("reset block timer\n");
+                qemu_mod_timer(bs->block_timer,
+                               wait_time + qemu_get_clock_ns(vm_clock));
+            }
+
+            if (ret) {
+                printf("ori ret is not null\n");
+            } else {
+                printf("ori ret is null\n");
+            }
+
+            return ret;
+        }
+    }
 
-    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
+    ret =  drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
                                cb, opaque);
+    if (ret) {
+        if (bs->io_limits_enabled) {
+            bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
+                              (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+            bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
+        }
+    }
+
+    return ret;
 }
 
 typedef struct BlockCompleteData {
@@ -2396,15 +2455,14 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
     BlockDriver *drv = bs->drv;
     BlockDriverAIOCB *ret;
     BlockCompleteData *blk_cb_data;
+    int64_t wait_time = -1;
 
     trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
 
-    if (!drv)
-        return NULL;
-    if (bs->read_only)
-        return NULL;
-    if (bdrv_check_request(bs, sector_num, nb_sectors))
+    if (!drv || bs->read_only
+        || bdrv_check_request(bs, sector_num, nb_sectors)) {
         return NULL;
+    }
 
     if (bs->dirty_bitmap) {
         blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
@@ -2413,13 +2471,32 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
         opaque = blk_cb_data;
     }
 
+    /* throttling disk write I/O */
+    if (bs->io_limits_enabled) {
+        if (bdrv_exceed_io_limits(bs, nb_sectors, true, &wait_time)) {
+            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_writev,
+                                  sector_num, qiov, nb_sectors, cb, opaque);
+            if (wait_time != -1) {
+                qemu_mod_timer(bs->block_timer,
+                               wait_time + qemu_get_clock_ns(vm_clock));
+            }
+
+            return ret;
+        }
+    }
+
     ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
                                cb, opaque);
-
     if (ret) {
         if (bs->wr_highest_sector < sector_num + nb_sectors - 1) {
             bs->wr_highest_sector = sector_num + nb_sectors - 1;
         }
+
+        if (bs->io_limits_enabled) {
+            bs->io_disps.bytes[BLOCK_IO_LIMIT_WRITE] +=
+                               (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+            bs->io_disps.ios[BLOCK_IO_LIMIT_WRITE]++;
+        }
     }
 
     return ret;
@@ -2684,6 +2761,166 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
     acb->pool->cancel(acb);
 }
 
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+                 bool is_write, double elapsed_time, uint64_t *wait) {
+    uint64_t bps_limit = 0;
+    double   bytes_limit, bytes_disp, bytes_res;
+    double   slice_time, wait_time;
+
+    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+        bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
+    } else if (bs->io_limits.bps[is_write]) {
+        bps_limit = bs->io_limits.bps[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    slice_time = bs->slice_end - bs->slice_start;
+    slice_time /= (NANOSECONDS_PER_SECOND);
+    bytes_limit = bps_limit * slice_time;
+    bytes_disp  = bs->io_disps.bytes[is_write];
+    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+        bytes_disp += bs->io_disps.bytes[!is_write];
+    }
+
+    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+
+    if (bytes_disp + bytes_res <= bytes_limit) {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (bytes_disp + bytes_res) / bps_limit - elapsed_time;
+
+    if (wait) {
+        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    }
+
+    printf("1 wait=%ld\n", *wait);
+    return true;
+}
+
+static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
+                             double elapsed_time, uint64_t *wait) {
+    uint64_t iops_limit = 0;
+    double   ios_limit, ios_disp;
+    double   slice_time, wait_time;
+
+    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+        iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
+    } else if (bs->io_limits.iops[is_write]) {
+        iops_limit = bs->io_limits.iops[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    slice_time = bs->slice_end - bs->slice_start;
+    slice_time /= (NANOSECONDS_PER_SECOND);
+    ios_limit  = iops_limit * slice_time;
+    ios_disp   = bs->io_disps.ios[is_write];
+    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+        ios_disp += bs->io_disps.ios[!is_write];
+    }
+
+    if (ios_disp + 1 <= ios_limit) {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (ios_disp + 1) / iops_limit;
+    if (wait_time > elapsed_time) {
+        wait_time = wait_time - elapsed_time;
+    } else {
+        wait_time = 0;
+    }
+
+    if (wait) {
+        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    }
+
+    return true;
+}
+
+static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+                           bool is_write, int64_t *wait) {
+    int64_t  now, max_wait;
+    uint64_t bps_wait = 0, iops_wait = 0;
+    double   elapsed_time;
+    int      bps_ret, iops_ret;
+
+    now = qemu_get_clock_ns(vm_clock);
+    if ((bs->slice_start < now)
+        && (bs->slice_end > now)) {
+        bs->slice_end = now + BLOCK_IO_SLICE_TIME;
+    } else {
+        bs->slice_start = now;
+        bs->slice_end   = now + BLOCK_IO_SLICE_TIME;
+
+        bs->io_disps.bytes[is_write]  = 0;
+        bs->io_disps.bytes[!is_write] = 0;
+
+        bs->io_disps.ios[is_write]    = 0;
+        bs->io_disps.ios[!is_write]   = 0;
+    }
+
+    /* If a limit was exceeded, immediately queue this request */
+    if (qemu_block_queue_has_pending(bs->block_queue)) {
+        if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]
+            || bs->io_limits.bps[is_write] || bs->io_limits.iops[is_write]
+            || bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+            if (wait) {
+                *wait = -1;
+            }
+
+            return true;
+        }
+    }
+
+    elapsed_time  = now - bs->slice_start;
+    elapsed_time  /= (NANOSECONDS_PER_SECOND);
+
+    bps_ret  = bdrv_exceed_bps_limits(bs, nb_sectors,
+                                      is_write, elapsed_time, &bps_wait);
+    iops_ret = bdrv_exceed_iops_limits(bs, is_write,
+                                      elapsed_time, &iops_wait);
+    if (bps_ret || iops_ret) {
+        max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
+        if (wait) {
+            *wait = max_wait;
+        }
+
+        now = qemu_get_clock_ns(vm_clock);
+        if (bs->slice_end < now + max_wait) {
+            bs->slice_end = now + max_wait;
+        }
+
+        printf("end wait=%ld\n", *wait);
+
+        return true;
+    }
+
+    if (wait) {
+        *wait = 0;
+    }
+
+    return false;
+}
 
 /**************************************************************/
 /* async block device emulation */
diff --git a/block.h b/block.h
index a3e69db..10d2828 100644
--- a/block.h
+++ b/block.h
@@ -107,7 +107,6 @@ int bdrv_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
 
-
 typedef struct BdrvCheckResult {
     int corruptions;
     int leaks;
-- 
1.7.6



+        bs->slice_end = now + BLOCK_IO_SLICE_TIME;
+    } else {
+        bs->slice_start = now;
+        bs->slice_end   = now + BLOCK_IO_SLICE_TIME;
+
+        bs->io_disps.bytes[is_write]  = 0;
+        bs->io_disps.bytes[!is_write] = 0;
+
+        bs->io_disps.ios[is_write]    = 0;
+        bs->io_disps.ios[!is_write]   = 0;
+    }
+
+    /* If a limit was exceeded, immediately queue this request */
+    if (qemu_block_queue_has_pending(bs->block_queue)) {
+        if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]
+            || bs->io_limits.bps[is_write] || bs->io_limits.iops[is_write]
+            || bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+            if (wait) {
+                *wait = -1;
+            }
+
+            return true;
+        }
+    }
+
+    elapsed_time  = now - bs->slice_start;
+    elapsed_time  /= (NANOSECONDS_PER_SECOND);
+
+    bps_ret  = bdrv_exceed_bps_limits(bs, nb_sectors,
+                                      is_write, elapsed_time, &bps_wait);
+    iops_ret = bdrv_exceed_iops_limits(bs, is_write,
+                                      elapsed_time, &iops_wait);
+    if (bps_ret || iops_ret) {
+        max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
+        if (wait) {
+            *wait = max_wait;
+        }
+
+        now = qemu_get_clock_ns(vm_clock);
+        if (bs->slice_end < now + max_wait) {
+            bs->slice_end = now + max_wait;
+        }
+
+        printf("end wait=%ld\n", *wait);
+
+        return true;
+    }
+
+    if (wait) {
+        *wait = 0;
+    }
+
+    return false;
+}
 
 /**************************************************************/
 /* async block device emulation */
diff --git a/block.h b/block.h
index a3e69db..10d2828 100644
--- a/block.h
+++ b/block.h
@@ -107,7 +107,6 @@ int bdrv_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
 
-
 typedef struct BdrvCheckResult {
     int corruptions;
     int leaks;
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 68+ messages in thread
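Stripped of the QEMU plumbing, the dispatch test in bdrv_exceed_bps_limits above reduces to the following self-contained sketch. The standalone form and names are illustrative, not QEMU API; times are in seconds rather than nanoseconds for clarity:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the bps check: a request may dispatch if the bytes already
 * dispatched in this slice plus the request's bytes stay within the
 * byte budget of the slice (bps_limit * slice length). Otherwise the
 * caller should wait until enough of the budget has elapsed. */
static bool exceed_bps(double bps_limit, double slice_secs,
                       double bytes_dispatched, double bytes_request,
                       double elapsed_secs, double *wait_secs)
{
    double bytes_limit = bps_limit * slice_secs;

    if (bytes_dispatched + bytes_request <= bytes_limit) {
        *wait_secs = 0;          /* fits in the current slice budget */
        return false;
    }

    /* Approximate time at which the budget will cover this request,
     * minus the time already elapsed in the slice. */
    *wait_secs = (bytes_dispatched + bytes_request) / bps_limit
                 - elapsed_secs;
    return true;
}
```

With a 1 MB/s limit and a 100 ms slice, a 4 KiB request early in an empty slice dispatches immediately, while the same request near the end of a nearly exhausted budget is told to wait.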

* [PATCH v8 4/4] qmp/hmp: add block_set_io_throttle
  2011-09-08 10:11 ` [Qemu-devel] " Zhi Yong Wu
@ 2011-09-08 10:11   ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-08 10:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, stefanha, mtosatti, aliguori, ryanh, zwu.kernel, kwolf,
	pair, Zhi Yong Wu

This patch introduces one new command, block_set_io_throttle. If you have a better idea for its usage syntax, please let me know.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 block.c         |   26 +++++++++++++++++++-
 blockdev.c      |   69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 blockdev.h      |    2 +
 hmp-commands.hx |   15 ++++++++++++
 qerror.c        |    4 +++
 qerror.h        |    3 ++
 qmp-commands.hx |   52 ++++++++++++++++++++++++++++++++++++++++-
 7 files changed, 168 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index c08fde8..1d3f067 100644
--- a/block.c
+++ b/block.c
@@ -1938,6 +1938,16 @@ static void bdrv_print_dict(QObject *obj, void *opaque)
                             qdict_get_bool(qdict, "ro"),
                             qdict_get_str(qdict, "drv"),
                             qdict_get_bool(qdict, "encrypted"));
+
+        monitor_printf(mon, " bps=%" PRId64 " bps_rd=%" PRId64
+                            " bps_wr=%" PRId64 " iops=%" PRId64
+                            " iops_rd=%" PRId64 " iops_wr=%" PRId64,
+                            qdict_get_int(qdict, "bps"),
+                            qdict_get_int(qdict, "bps_rd"),
+                            qdict_get_int(qdict, "bps_wr"),
+                            qdict_get_int(qdict, "iops"),
+                            qdict_get_int(qdict, "iops_rd"),
+                            qdict_get_int(qdict, "iops_wr"));
     } else {
         monitor_printf(mon, " [not inserted]");
     }
@@ -1970,10 +1980,22 @@ void bdrv_info(Monitor *mon, QObject **ret_data)
             QDict *bs_dict = qobject_to_qdict(bs_obj);
 
             obj = qobject_from_jsonf("{ 'file': %s, 'ro': %i, 'drv': %s, "
-                                     "'encrypted': %i }",
+                                     "'encrypted': %i, "
+                                     "'bps': %" PRId64 ","
+                                     "'bps_rd': %" PRId64 ","
+                                     "'bps_wr': %" PRId64 ","
+                                     "'iops': %" PRId64 ","
+                                     "'iops_rd': %" PRId64 ","
+                                     "'iops_wr': %" PRId64 "}",
                                      bs->filename, bs->read_only,
                                      bs->drv->format_name,
-                                     bdrv_is_encrypted(bs));
+                                     bdrv_is_encrypted(bs),
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL],
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_READ],
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_READ],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE]);
             if (bs->backing_file[0] != '\0') {
                 QDict *qdict = qobject_to_qdict(obj);
                 qdict_put(qdict, "backing_file",
diff --git a/blockdev.c b/blockdev.c
index 619ae9f..7f5c4df 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -747,6 +747,75 @@ int do_change_block(Monitor *mon, const char *device,
     return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
 
+/* throttling disk I/O limits */
+int do_block_set_io_throttle(Monitor *mon,
+                       const QDict *qdict, QObject **ret_data)
+{
+    const char *devname = qdict_get_str(qdict, "device");
+    uint64_t bps        = qdict_get_try_int(qdict, "bps", -1);
+    uint64_t bps_rd     = qdict_get_try_int(qdict, "bps_rd", -1);
+    uint64_t bps_wr     = qdict_get_try_int(qdict, "bps_wr", -1);
+    uint64_t iops       = qdict_get_try_int(qdict, "iops", -1);
+    uint64_t iops_rd    = qdict_get_try_int(qdict, "iops_rd", -1);
+    uint64_t iops_wr    = qdict_get_try_int(qdict, "iops_wr", -1);
+    BlockDriverState *bs;
+
+    bs = bdrv_find(devname);
+    if (!bs) {
+        qerror_report(QERR_DEVICE_NOT_FOUND, devname);
+        return -1;
+    }
+
+    if ((bps == -1) && (bps_rd == -1) && (bps_wr == -1)
+        && (iops == -1) && (iops_rd == -1) && (iops_wr == -1)) {
+        qerror_report(QERR_MISSING_PARAMETER,
+                      "bps/bps_rd/bps_wr/iops/iops_rd/iops_wr");
+        return -1;
+    }
+
+    if (((bps != -1) && ((bps_rd != -1) || (bps_wr != -1)))
+        || ((iops != -1) && ((iops_rd != -1) || (iops_wr != -1)))) {
+        qerror_report(QERR_INVALID_PARAMETER_COMBINATION);
+        return -1;
+    }
+
+    if (bps != -1) {
+        bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL] = bps;
+        bs->io_limits.bps[BLOCK_IO_LIMIT_READ]  = 0;
+        bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE] = 0;
+    }
+
+    if ((bps_rd != -1) || (bps_wr != -1)) {
+        bs->io_limits.bps[BLOCK_IO_LIMIT_READ]   =
+           (bps_rd == -1) ? bs->io_limits.bps[BLOCK_IO_LIMIT_READ] : bps_rd;
+        bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
+           (bps_wr == -1) ? bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE] : bps_wr;
+        bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]  = 0;
+    }
+
+    if (iops != -1) {
+        bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL] = iops;
+        bs->io_limits.iops[BLOCK_IO_LIMIT_READ]  = 0;
+        bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE] = 0;
+    }
+
+    if ((iops_rd != -1) || (iops_wr != -1)) {
+        bs->io_limits.iops[BLOCK_IO_LIMIT_READ]  =
+           (iops_rd == -1) ? bs->io_limits.iops[BLOCK_IO_LIMIT_READ] : iops_rd;
+        bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
+           (iops_wr == -1) ? bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE] : iops_wr;
+        bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL] = 0;
+    }
+
+    if (!bs->io_limits_enabled && bdrv_io_limits_enabled(bs)) {
+        bdrv_io_limits_enable(bs);
+    } else if (bs->io_limits_enabled && !bdrv_io_limits_enabled(bs)) {
+        bdrv_io_limits_disable(bs);
+    }
+
+    return 0;
+}
+
 int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
     const char *id = qdict_get_str(qdict, "id");
diff --git a/blockdev.h b/blockdev.h
index 3587786..c2f44c6 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -63,6 +63,8 @@ int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
                     const char *filename, const char *fmt);
 int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_block_set_io_throttle(Monitor *mon,
+                    const QDict *qdict, QObject **ret_data);
 int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_block_resize(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 9e1cca8..a615427 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1210,6 +1210,21 @@ ETEXI
     },
 
 STEXI
+@item block_set_io_throttle @var{device} @var{bps} @var{bps_rd} @var{bps_wr} @var{iops} @var{iops_rd} @var{iops_wr}
+@findex block_set_io_throttle
+Change I/O throttle limits for a block drive to @var{bps} @var{bps_rd} @var{bps_wr} @var{iops} @var{iops_rd} @var{iops_wr}.
+ETEXI
+
+    {
+        .name       = "block_set_io_throttle",
+        .args_type  = "device:B,bps:i?,bps_rd:i?,bps_wr:i?,iops:i?,iops_rd:i?,iops_wr:i?",
+        .params     = "device [bps] [bps_rd] [bps_wr] [iops] [iops_rd] [iops_wr]",
+        .help       = "change I/O throttle limits for a block drive",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_block_set_io_throttle,
+    },
+
+STEXI
 @item block_passwd @var{device} @var{password}
 @findex block_passwd
 Set the encrypted device @var{device} password to @var{password}
diff --git a/qerror.c b/qerror.c
index 3d64b80..33f9fdd 100644
--- a/qerror.c
+++ b/qerror.c
@@ -230,6 +230,10 @@ static const QErrorStringTable qerror_table[] = {
         .error_fmt = QERR_QGA_COMMAND_FAILED,
         .desc      = "Guest agent command failed, error was '%(message)'",
     },
+    {
+        .error_fmt = QERR_INVALID_PARAMETER_COMBINATION,
+        .desc      = "Invalid parameter combination",
+    },
     {}
 };
 
diff --git a/qerror.h b/qerror.h
index 8058456..62c1df2 100644
--- a/qerror.h
+++ b/qerror.h
@@ -193,4 +193,7 @@ QError *qobject_to_qerror(const QObject *obj);
 #define QERR_QGA_COMMAND_FAILED \
     "{ 'class': 'QgaCommandFailed', 'data': { 'message': %s } }"
 
+#define QERR_INVALID_PARAMETER_COMBINATION \
+    "{ 'class': 'InvalidParameterCombination', 'data': {} }"
+
 #endif /* QERROR_H */
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 27cc66e..e848969 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -862,6 +862,44 @@ Example:
 EQMP
 
     {
+        .name       = "block_set_io_throttle",
+        .args_type  = "device:B,bps:i?,bps_rd:i?,bps_wr:i?,iops:i?,iops_rd:i?,iops_wr:i?",
+        .params     = "device [bps] [bps_rd] [bps_wr] [iops] [iops_rd] [iops_wr]",
+        .help       = "change I/O throttle limits for a block drive",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_block_set_io_throttle,
+    },
+
+SQMP
+block_set_io_throttle
+---------------------
+
+Change I/O throttle limits for a block drive.
+
+Arguments:
+
+- "device": device name (json-string)
+- "bps": total throughput limit in bytes per second (json-int, optional)
+- "bps_rd": read throughput limit in bytes per second (json-int, optional)
+- "bps_wr": write throughput limit in bytes per second (json-int, optional)
+- "iops": total I/O operations per second (json-int, optional)
+- "iops_rd": read I/O operations per second (json-int, optional)
+- "iops_wr": write I/O operations per second (json-int, optional)
+
+Example:
+
+-> { "execute": "block_set_io_throttle", "arguments": { "device": "virtio0",
+                                               "bps": 1000000,
+                                               "bps_rd": 0,
+                                               "bps_wr": 0,
+                                               "iops": 0,
+                                               "iops_rd": 0,
+                                               "iops_wr": 0 } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "set_password",
         .args_type  = "protocol:s,password:s,connected:s?",
         .params     = "protocol password action-if-connected",
@@ -1143,6 +1181,12 @@ Each json-object contain the following:
                                 "tftp", "vdi", "vmdk", "vpc", "vvfat"
          - "backing_file": backing file name (json-string, optional)
          - "encrypted": true if encrypted, false otherwise (json-bool)
+         - "bps": limit total bytes per second (json-int)
+         - "bps_rd": limit read bytes per second (json-int)
+         - "bps_wr": limit write bytes per second (json-int)
+         - "iops": limit total I/O operations per second (json-int)
+         - "iops_rd": limit read operations per second (json-int)
+         - "iops_wr": limit write operations per second (json-int)
 
 Example:
 
@@ -1157,7 +1201,13 @@ Example:
                "ro":false,
                "drv":"qcow2",
                "encrypted":false,
-               "file":"disks/test.img"
+               "file":"disks/vm.img",
+               "bps":1000000,
+               "bps_rd":0,
+               "bps_wr":0,
+               "iops":1000000,
+               "iops_rd":0,
+               "iops_wr":0
             },
             "type":"unknown"
          },
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-09-08 10:11   ` [Qemu-devel] " Zhi Yong Wu
@ 2011-09-09 14:44     ` Marcelo Tosatti
  -1 siblings, 0 replies; 68+ messages in thread
From: Marcelo Tosatti @ 2011-09-09 14:44 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: qemu-devel, kvm, stefanha, aliguori, ryanh, zwu.kernel, kwolf, pair

On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
> Note:
>      1.) When bps/iops limits are set to a small value such as 511 bytes/s, the VM will hang. We are considering how to handle this scenario.

You can increase the length of the slice, if the request is larger than
slice_time * bps_limit.
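The suggestion above can be sketched in a few lines of standalone C (the function name and the second-based units are mine, not QEMU's): keep the default slice length unless a single request is larger than the slice's byte budget, in which case stretch the slice just far enough for the request to fit.

```c
#include <assert.h>
#include <stdint.h>

#define NANOSECONDS_PER_SECOND 1000000000LL
#define BLOCK_IO_SLICE_TIME    100000000LL   /* default 100 ms slice, in ns */

/* Return the slice length (ns) to use for one request: the default,
 * or a longer slice if the request alone exceeds slice * bps_limit. */
static int64_t slice_for_request(uint64_t request_bytes, uint64_t bps_limit)
{
    int64_t slice_time = BLOCK_IO_SLICE_TIME;
    double bytes_per_slice = (double)bps_limit * slice_time /
                             NANOSECONDS_PER_SECOND;

    if ((double)request_bytes > bytes_per_slice) {
        /* extend the slice so this request can complete at all */
        slice_time = (int64_t)((double)request_bytes / bps_limit *
                               NANOSECONDS_PER_SECOND);
    }
    return slice_time;
}
```

With bps_limit=511 a 1 MiB request stretches the slice to roughly 2000 seconds instead of hanging; a 4 KiB request under a 1 MB/s limit keeps the default slice.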

>      2.) When the "dd" command is issued in the guest with a large block size such as "bs=1024K", the resulting speed will be slightly higher than the limits.

Why?

There is lots of debugging leftovers in the patch.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-09-09 14:44     ` [Qemu-devel] " Marcelo Tosatti
@ 2011-09-13  3:09       ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-13  3:09 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Zhi Yong Wu, qemu-devel, kvm, stefanha, aliguori, ryanh, kwolf, pair

On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
>> Note:
>>      1.) When bps/iops limits are set to a small value such as 511 bytes/s, the VM will hang. We are considering how to handle this scenario.
>
> You can increase the length of the slice, if the request is larger than
> slice_time * bps_limit.
Yeah, but how to increase it is a challenge. Do you have a good idea?

>
>>      2.) When the "dd" command is issued in the guest with a large block size such as "bs=1024K", the resulting speed will be slightly higher than the limits.
>
> Why?
This issue no longer exists; I will remove the note.
With drive bps=1000000, I ran some tests in the guest VM.
1.) bs=1024K
18+0 records in
18+0 records out
18874368 bytes (19 MB) copied, 26.6268 s, 709 kB/s
2.) bs=2048K
18+0 records in
18+0 records out
37748736 bytes (38 MB) copied, 46.5336 s, 811 kB/s

>
> There is lots of debugging leftovers in the patch.
Sorry, I forgot to remove them.
>
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-09-13  3:09       ` [Qemu-devel] " Zhi Yong Wu
@ 2011-09-14 10:50         ` Marcelo Tosatti
  -1 siblings, 0 replies; 68+ messages in thread
From: Marcelo Tosatti @ 2011-09-14 10:50 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: Zhi Yong Wu, qemu-devel, kvm, stefanha, aliguori, ryanh, kwolf, pair

On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
> On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
> >> Note:
> >>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this scenario.
> >
> > You can increase the length of the slice, if the request is larger than
> > slice_time * bps_limit.
> Yeah, but it is a challenge for how to increase it. Do you have some nice idea?

If the queue is empty, and the request being processed does not fit the
queue, increase the slice so that the request fits.

That is, make BLOCK_IO_SLICE_TIME dynamic and adjust it as described
above (if the bps or io limits change, reset it to the default
BLOCK_IO_SLICE_TIME).

> >>      2.) When "dd" command is issued in guest, if its option bs is set to a large value such as "bs=1024K", the resulting speed will be slightly bigger than the limit.
> >
> > Why?
> This issue has not existed. I will remove it.
> When drive bps=1000000, i did some testings on guest VM.
> 1.) bs=1024K
> 18+0 records in
> 18+0 records out
> 18874368 bytes (19 MB) copied, 26.6268 s, 709 kB/s
> 2.) bs=2048K
> 18+0 records in
> 18+0 records out
> 37748736 bytes (38 MB) copied, 46.5336 s, 811 kB/s
> 
> >
> > There is lots of debugging leftovers in the patch.
> sorry, i forgot to remove them.
> >
> >


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-09-14 10:50         ` [Qemu-devel] " Marcelo Tosatti
@ 2011-09-19  9:55           ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-19  9:55 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Zhi Yong Wu, qemu-devel, kvm, stefanha, aliguori, ryanh, kwolf, pair

On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
>> On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> > On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
>> >> Note:
>> >>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this scenario.
>> >
>> > You can increase the length of the slice, if the request is larger than
>> > slice_time * bps_limit.
>> Yeah, but it is a challenge for how to increase it. Do you have some nice idea?
>
> If the queue is empty, and the request being processed does not fit the
> queue, increase the slice so that the request fits.
Sorry for the late reply. Actually, do you think that this scenario is
meaningful for the user?
Even if we implement this, when the user limits bps below 512
bytes/second, the VM still cannot run its tasks properly.
Can you let us know why we need to make this effort?

>
> That is, make BLOCK_IO_SLICE_TIME dynamic and adjust it as described
> above (if the bps or io limits change, reset it to the default
> BLOCK_IO_SLICE_TIME).
>
>> >>      2.) When "dd" command is issued in guest, if its option bs is set to a large value such as "bs=1024K", the resulting speed will be slightly bigger than the limit.
>> >
>> > Why?
>> This issue has not existed. I will remove it.
>> When drive bps=1000000, i did some testings on guest VM.
>> 1.) bs=1024K
>> 18+0 records in
>> 18+0 records out
>> 18874368 bytes (19 MB) copied, 26.6268 s, 709 kB/s
>> 2.) bs=2048K
>> 18+0 records in
>> 18+0 records out
>> 37748736 bytes (38 MB) copied, 46.5336 s, 811 kB/s
>>
>> >
>> > There is lots of debugging leftovers in the patch.
>> sorry, i forgot to remove them.
>> >
>> >
>
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-09-19  9:55           ` [Qemu-devel] " Zhi Yong Wu
@ 2011-09-20 12:34             ` Marcelo Tosatti
  -1 siblings, 0 replies; 68+ messages in thread
From: Marcelo Tosatti @ 2011-09-20 12:34 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: Zhi Yong Wu, qemu-devel, kvm, stefanha, aliguori, ryanh, kwolf, pair

On Mon, Sep 19, 2011 at 05:55:41PM +0800, Zhi Yong Wu wrote:
> On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
> >> On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >> > On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
> >> >> Note:
> >> >>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this scenario.
> >> >
> >> > You can increase the length of the slice, if the request is larger than
> >> > slice_time * bps_limit.
> >> Yeah, but it is a challenge for how to increase it. Do you have some nice idea?
> >
> > If the queue is empty, and the request being processed does not fit the
> > queue, increase the slice so that the request fits.
> Sorry for late reply. actually, do you think that this scenario is
> meaningful for the user?
> Since we implement this, if the user limits the bps below 512
> bytes/second, the VM can also not run every task.
> Can you let us know why we need to make such effort?

It would be good to handle requests larger than the slice.

It is not strictly necessary, but if it is not handled, a minimum
limit should be in place that reflects the maximum known request size.
Being able to specify a setting that crashes the guest is not acceptable.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-09-20 12:34             ` [Qemu-devel] " Marcelo Tosatti
@ 2011-09-21  3:14               ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-21  3:14 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Zhi Yong Wu, qemu-devel, kvm, stefanha, aliguori, ryanh, kwolf, pair

On Tue, Sep 20, 2011 at 8:34 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Mon, Sep 19, 2011 at 05:55:41PM +0800, Zhi Yong Wu wrote:
>> On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> > On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
>> >> On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> >> > On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
>> >> >> Note:
>> >> >>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this scenario.
>> >> >
>> >> > You can increase the length of the slice, if the request is larger than
>> >> > slice_time * bps_limit.
>> >> Yeah, but it is a challenge for how to increase it. Do you have some nice idea?
>> >
>> > If the queue is empty, and the request being processed does not fit the
>> > queue, increase the slice so that the request fits.
>> Sorry for late reply. actually, do you think that this scenario is
>> meaningful for the user?
>> Since we implement this, if the user limits the bps below 512
>> bytes/second, the VM can also not run every task.
>> Can you let us know why we need to make such effort?
>
> It would be good to handle request larger than the slice.
OK, let me spend some time trying your approach.
>
> It is not strictly necessary, but in case its not handled, a minimum
> should be in place, to reflect maximum request size known. Being able to
In fact, slice_time is dynamic now, adjusted within a certain range.
> specify something which crashes is not acceptable.
Do you mean that a warning should be displayed if the specified
limit is smaller than the minimum capability?

>
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-09-21  3:14               ` [Qemu-devel] " Zhi Yong Wu
@ 2011-09-21  5:54                 ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-21  5:54 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Zhi Yong Wu, qemu-devel, kvm, stefanha, aliguori, ryanh, kwolf, pair

On Wed, Sep 21, 2011 at 11:14 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
> On Tue, Sep 20, 2011 at 8:34 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> On Mon, Sep 19, 2011 at 05:55:41PM +0800, Zhi Yong Wu wrote:
>>> On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>> > On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
>>> >> On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>> >> > On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
>>> >> >> Note:
>>> >> >>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this scenario.
>>> >> >
>>> >> > You can increase the length of the slice, if the request is larger than
>>> >> > slice_time * bps_limit.
>>> >> Yeah, but it is a challenge for how to increase it. Do you have some nice idea?
>>> >
>>> > If the queue is empty, and the request being processed does not fit the
>>> > queue, increase the slice so that the request fits.
>>> Sorry for late reply. actually, do you think that this scenario is
>>> meaningful for the user?
>>> Since we implement this, if the user limits the bps below 512
>>> bytes/second, the VM can also not run every task.
>>> Can you let us know why we need to make such effort?
>>
>> It would be good to handle request larger than the slice.
> OK. Let me spend some time on trying your way.
>>
>> It is not strictly necessary, but in case its not handled, a minimum
>> should be in place, to reflect maximum request size known. Being able to
> In fact, slice_time has been dynamic now, and adjusted in some range.
Sorry, I made a mistake; slice_time is currently a fixed value.
>> specify something which crashes is not acceptable.
> Do you mean that one warning should be displayed if the specified
> limit is smaller than the minimum capability?
>
>>
>>
>
>
>
> --
> Regards,
>
> Zhi Yong Wu
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-09-20 12:34             ` [Qemu-devel] " Marcelo Tosatti
@ 2011-09-21  7:03               ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-21  7:03 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Zhi Yong Wu, qemu-devel, kvm, stefanha, aliguori, ryanh, kwolf, pair

On Tue, Sep 20, 2011 at 8:34 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Mon, Sep 19, 2011 at 05:55:41PM +0800, Zhi Yong Wu wrote:
>> On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> > On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
>> >> On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> >> > On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
>> >> >> Note:
>> >> >>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this senario.
>> >> >
>> >> > You can increase the length of the slice, if the request is larger than
>> >> > slice_time * bps_limit.
>> >> Yeah, but it is a challenge for how to increase it. Do you have some nice idea?
>> >
>> > If the queue is empty, and the request being processed does not fit the
>> > queue, increase the slice so that the request fits.
>> Sorry for late reply. actually, do you think that this scenario is
>> meaningful for the user?
>> Since we implement this, if the user limits the bps below 512
>> bytes/second, the VM can also not run every task.
>> Can you let us know why we need to make such effort?
>
> It would be good to handle request larger than the slice.
Below are the code changes for your approach. I enabled simple tracing
and ran a dd test in the guest, and found that only the first read/write
request is handled; subsequent requests are enqueued. After several
minutes, the guest prints the following on its terminal:
INFO: task kdmflush:326 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

I am not sure whether this is correct. Do you have a better way to verify it?

>
> It is not strictly necessary, but in case its not handled, a minimum
> should be in place, to reflect maximum request size known. Being able to
> specify something which crashes is not acceptable.
>
>

diff --git a/block.c b/block.c
index af19784..f88c22a 100644
--- a/block.c
+++ b/block.c
@@ -132,9 +132,10 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
         bs->block_timer     = NULL;
     }

-    bs->slice_start = 0;
-
-    bs->slice_end   = 0;
+    bs->slice_time    = 0;
+    bs->slice_start   = 0;
+    bs->slice_end     = 0;
+    bs->first_time_rw = false;
 }

 static void bdrv_block_timer(void *opaque)
@@ -151,9 +152,10 @@ void bdrv_io_limits_enable(BlockDriverState *bs)
     bs->block_queue = qemu_new_block_queue();
     bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);

+    bs->slice_time  = BLOCK_IO_SLICE_TIME;
     bs->slice_start = qemu_get_clock_ns(vm_clock);
-
-    bs->slice_end   = bs->slice_start + BLOCK_IO_SLICE_TIME;
+    bs->slice_end   = bs->slice_start + bs->slice_time;
+    bs->first_time_rw = true;
 }

 bool bdrv_io_limits_enabled(BlockDriverState *bs)
@@ -2846,11 +2848,23 @@ static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
     /* Calc approx time to dispatch */
     wait_time = (bytes_disp + bytes_res) / bps_limit - elapsed_time;

-    if (wait) {
-        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
-    }
+    if (!bs->first_time_rw
+        || !qemu_block_queue_is_empty(bs->block_queue)) {
+        if (wait) {
+            *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+        }

-    return true;
+        return true;
+    } else {
+        bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
+        bs->slice_end += bs->slice_time - BLOCK_IO_SLICE_TIME;
+        if (wait) {
+            *wait = 0;
+        }
+
+        bs->first_time_rw = false;
+        return false;
+    }
 }

 static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
@@ -2895,11 +2909,23 @@ static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
         wait_time = 0;
     }

-    if (wait) {
-        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
-    }
+    if (!bs->first_time_rw
+        || !qemu_block_queue_is_empty(bs->block_queue)) {
+        if (wait) {
+            *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+        }

-    return true;
+        return true;
+    } else {
+        bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
+        bs->slice_end += bs->slice_time - BLOCK_IO_SLICE_TIME;
+        if (wait) {
+            *wait = 0;
+        }
+
+        bs->first_time_rw = false;
+        return false;
+    }
 }

 static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
@@ -2912,10 +2938,10 @@ static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
     now = qemu_get_clock_ns(vm_clock);
     if ((bs->slice_start < now)
         && (bs->slice_end > now)) {
-        bs->slice_end = now + BLOCK_IO_SLICE_TIME;
+        bs->slice_end = now + bs->slice_time;
     } else {
         bs->slice_start = now;
-        bs->slice_end   = now + BLOCK_IO_SLICE_TIME;
+        bs->slice_end   = now + bs->slice_time;

         bs->io_disps.bytes[is_write]  = 0;
         bs->io_disps.bytes[!is_write] = 0;
diff --git a/block/blk-queue.c b/block/blk-queue.c
index adef497..04e52ad 100644
--- a/block/blk-queue.c
+++ b/block/blk-queue.c
@@ -199,3 +199,8 @@ bool qemu_block_queue_has_pending(BlockQueue *queue)
 {
     return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
 }
+
+bool qemu_block_queue_is_empty(BlockQueue *queue)
+{
+    return QTAILQ_EMPTY(&queue->requests);
+}
diff --git a/block/blk-queue.h b/block/blk-queue.h
index c1529f7..d3b379b 100644
--- a/block/blk-queue.h
+++ b/block/blk-queue.h
@@ -56,4 +56,6 @@ void qemu_block_queue_flush(BlockQueue *queue);

 bool qemu_block_queue_has_pending(BlockQueue *queue);

+bool qemu_block_queue_is_empty(BlockQueue *queue);
+
 #endif /* QEMU_BLOCK_QUEUE_H */
diff --git a/block_int.h b/block_int.h
index 93c0d56..5eb007d 100644
--- a/block_int.h
+++ b/block_int.h
@@ -199,6 +199,7 @@ struct BlockDriverState {
     void *sync_aiocb;

     /* the time for latest disk I/O */
+    int64_t slice_time;
     int64_t slice_start;
     int64_t slice_end;
     BlockIOLimit io_limits;
@@ -206,6 +207,7 @@ struct BlockDriverState {
     BlockQueue   *block_queue;
     QEMUTimer    *block_timer;
     bool         io_limits_enabled;
+    bool         first_time_rw;

     /* I/O stats (display with "info blockstats"). */
     uint64_t nr_bytes[BDRV_MAX_IOTYPE];
diff --git a/blockdev.c b/blockdev.c
index 63bd2b5..67d5a50 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -782,6 +782,8 @@ int do_block_set_io_throttle(Monitor *mon,
     bs->io_limits.iops[BLOCK_IO_LIMIT_READ]  = iops_rd;
     bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE] = iops_wr;

+    bs->slice_time = BLOCK_IO_SLICE_TIME;
+
     if (!bs->io_limits_enabled && bdrv_io_limits_enabled(bs)) {
         bdrv_io_limits_enable(bs);
     } else if (bs->io_limits_enabled && !bdrv_io_limits_enabled(bs)) {





-- 
Regards,

Zhi Yong Wu

^ permalink raw reply related	[flat|nested] 68+ messages in thread

     now = qemu_get_clock_ns(vm_clock);
     if ((bs->slice_start < now)
         && (bs->slice_end > now)) {
-        bs->slice_end = now + BLOCK_IO_SLICE_TIME;
+        bs->slice_end = now + bs->slice_time;
     } else {
         bs->slice_start = now;
-        bs->slice_end   = now + BLOCK_IO_SLICE_TIME;
+        bs->slice_end   = now + bs->slice_time;

         bs->io_disps.bytes[is_write]  = 0;
         bs->io_disps.bytes[!is_write] = 0;
diff --git a/block/blk-queue.c b/block/blk-queue.c
index adef497..04e52ad 100644
--- a/block/blk-queue.c
+++ b/block/blk-queue.c
@@ -199,3 +199,8 @@ bool qemu_block_queue_has_pending(BlockQueue *queue)
 {
     return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
 }
+
+bool qemu_block_queue_is_empty(BlockQueue *queue)
+{
+    return QTAILQ_EMPTY(&queue->requests);
+}
diff --git a/block/blk-queue.h b/block/blk-queue.h
index c1529f7..d3b379b 100644
--- a/block/blk-queue.h
+++ b/block/blk-queue.h
@@ -56,4 +56,6 @@ void qemu_block_queue_flush(BlockQueue *queue);

 bool qemu_block_queue_has_pending(BlockQueue *queue);

+bool qemu_block_queue_is_empty(BlockQueue *queue);
+
 #endif /* QEMU_BLOCK_QUEUE_H */
diff --git a/block_int.h b/block_int.h
index 93c0d56..5eb007d 100644
--- a/block_int.h
+++ b/block_int.h
@@ -199,6 +199,7 @@ struct BlockDriverState {
     void *sync_aiocb;

     /* the time for latest disk I/O */
+    int64_t slice_time;
     int64_t slice_start;
     int64_t slice_end;
     BlockIOLimit io_limits;
@@ -206,6 +207,7 @@ struct BlockDriverState {
     BlockQueue   *block_queue;
     QEMUTimer    *block_timer;
     bool         io_limits_enabled;
+    bool         first_time_rw;

     /* I/O stats (display with "info blockstats"). */
     uint64_t nr_bytes[BDRV_MAX_IOTYPE];
diff --git a/blockdev.c b/blockdev.c
index 63bd2b5..67d5a50 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -782,6 +782,8 @@ int do_block_set_io_throttle(Monitor *mon,
     bs->io_limits.iops[BLOCK_IO_LIMIT_READ]  = iops_rd;
     bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE] = iops_wr;

+    bs->slice_time = BLOCK_IO_SLICE_TIME;
+
     if (!bs->io_limits_enabled && bdrv_io_limits_enabled(bs)) {
         bdrv_io_limits_enable(bs);
     } else if (bs->io_limits_enabled && !bdrv_io_limits_enabled(bs)) {





-- 
Regards,

Zhi Yong Wu

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v8 1/4] block: add the block queue support
  2011-09-08 10:11   ` [Qemu-devel] " Zhi Yong Wu
@ 2011-09-23 15:32     ` Kevin Wolf
  -1 siblings, 0 replies; 68+ messages in thread
From: Kevin Wolf @ 2011-09-23 15:32 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: aliguori, stefanha, kvm, mtosatti, qemu-devel, pair, zwu.kernel, ryanh

Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> ---
>  Makefile.objs     |    2 +-
>  block/blk-queue.c |  201 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/blk-queue.h |   59 ++++++++++++++++
>  block_int.h       |   27 +++++++
>  4 files changed, 288 insertions(+), 1 deletions(-)
>  create mode 100644 block/blk-queue.c
>  create mode 100644 block/blk-queue.h
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index 26b885b..5dcf456 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -33,7 +33,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv
>  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
>  block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-nested-y += qed-check.o
> -block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
> +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o blk-queue.o
>  block-nested-$(CONFIG_WIN32) += raw-win32.o
>  block-nested-$(CONFIG_POSIX) += raw-posix.o
>  block-nested-$(CONFIG_CURL) += curl.o
> diff --git a/block/blk-queue.c b/block/blk-queue.c
> new file mode 100644
> index 0000000..adef497
> --- /dev/null
> +++ b/block/blk-queue.c
> @@ -0,0 +1,201 @@
> +/*
> + * QEMU System Emulator queue definition for block layer
> + *
> + * Copyright (c) IBM, Corp. 2011
> + *
> + * Authors:
> + *  Zhi Yong Wu  <wuzhy@linux.vnet.ibm.com>
> + *  Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "block_int.h"
> +#include "block/blk-queue.h"
> +#include "qemu-common.h"
> +
> +/* The APIs for block request queue on qemu block layer.
> + */
> +
> +struct BlockQueueAIOCB {
> +    BlockDriverAIOCB common;
> +    QTAILQ_ENTRY(BlockQueueAIOCB) entry;
> +    BlockRequestHandler *handler;
> +    BlockDriverAIOCB *real_acb;
> +
> +    int64_t sector_num;
> +    QEMUIOVector *qiov;
> +    int nb_sectors;
> +};

The idea is that each request is first queued on the QTAILQ, and at some
point it's removed from the queue and gets a real_acb. But it never has
both at the same time. Correct?

Can we have the basic principle of operation spelled out as a comment
somewhere near the top of the file?

> +
> +typedef struct BlockQueueAIOCB BlockQueueAIOCB;
> +
> +struct BlockQueue {
> +    QTAILQ_HEAD(requests, BlockQueueAIOCB) requests;
> +    bool req_failed;
> +    bool flushing;
> +};

I find req_failed pretty confusing. Needs documentation at least, but
most probably also a better name.

> +
> +static void qemu_block_queue_dequeue(BlockQueue *queue,
> +                                     BlockQueueAIOCB *request)
> +{
> +    BlockQueueAIOCB *req;
> +
> +    assert(queue);
> +    while (!QTAILQ_EMPTY(&queue->requests)) {
> +        req = QTAILQ_FIRST(&queue->requests);
> +        if (req == request) {
> +            QTAILQ_REMOVE(&queue->requests, req, entry);
> +            break;
> +        }
> +    }
> +}

Is it just me or is this an endless loop if the request isn't the first
element in the list?

> +
> +static void qemu_block_queue_cancel(BlockDriverAIOCB *acb)
> +{
> +    BlockQueueAIOCB *request = container_of(acb, BlockQueueAIOCB, common);
> +    if (request->real_acb) {
> +        bdrv_aio_cancel(request->real_acb);
> +    } else {
> +        assert(request->common.bs->block_queue);
> +        qemu_block_queue_dequeue(request->common.bs->block_queue,
> +                                 request);
> +    }
> +
> +    qemu_aio_release(request);
> +}
> +
> +static AIOPool block_queue_pool = {
> +    .aiocb_size         = sizeof(struct BlockQueueAIOCB),
> +    .cancel             = qemu_block_queue_cancel,
> +};
> +
> +static void qemu_block_queue_callback(void *opaque, int ret)
> +{
> +    BlockQueueAIOCB *acb = opaque;
> +
> +    if (acb->common.cb) {
> +        acb->common.cb(acb->common.opaque, ret);
> +    }
> +
> +    qemu_aio_release(acb);
> +}
> +
> +BlockQueue *qemu_new_block_queue(void)
> +{
> +    BlockQueue *queue;
> +
> +    queue = g_malloc0(sizeof(BlockQueue));
> +
> +    QTAILQ_INIT(&queue->requests);
> +
> +    queue->req_failed = true;
> +    queue->flushing   = false;
> +
> +    return queue;
> +}
> +
> +void qemu_del_block_queue(BlockQueue *queue)
> +{
> +    BlockQueueAIOCB *request, *next;
> +
> +    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
> +        QTAILQ_REMOVE(&queue->requests, request, entry);
> +        qemu_aio_release(request);
> +    }
> +
> +    g_free(queue);
> +}

Can we be sure that no AIO requests are in flight that still use the now
released AIOCB?

> +
> +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
> +                        BlockDriverState *bs,
> +                        BlockRequestHandler *handler,
> +                        int64_t sector_num,
> +                        QEMUIOVector *qiov,
> +                        int nb_sectors,
> +                        BlockDriverCompletionFunc *cb,
> +                        void *opaque)
> +{
> +    BlockDriverAIOCB *acb;
> +    BlockQueueAIOCB *request;
> +
> +    if (queue->flushing) {
> +        queue->req_failed = false;
> +        return NULL;
> +    } else {
> +        acb = qemu_aio_get(&block_queue_pool, bs,
> +                           cb, opaque);
> +        request = container_of(acb, BlockQueueAIOCB, common);
> +        request->handler       = handler;
> +        request->sector_num    = sector_num;
> +        request->qiov          = qiov;
> +        request->nb_sectors    = nb_sectors;
> +        request->real_acb      = NULL;
> +        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
> +    }
> +
> +    return acb;
> +}
> +
> +static int qemu_block_queue_handler(BlockQueueAIOCB *request)
> +{
> +    int ret;
> +    BlockDriverAIOCB *res;
> +
> +    res = request->handler(request->common.bs, request->sector_num,
> +                           request->qiov, request->nb_sectors,
> +                           qemu_block_queue_callback, request);
> +    if (res) {
> +        request->real_acb = res;
> +    }
> +
> +    ret = (res == NULL) ? 0 : 1;
> +
> +    return ret;

You mean return (res != NULL); and want to have bool as the return value
of this function.

> +}
> +
> +void qemu_block_queue_flush(BlockQueue *queue)
> +{
> +    queue->flushing = true;
> +    while (!QTAILQ_EMPTY(&queue->requests)) {
> +        BlockQueueAIOCB *request = NULL;
> +        int ret = 0;
> +
> +        request = QTAILQ_FIRST(&queue->requests);
> +        QTAILQ_REMOVE(&queue->requests, request, entry);
> +
> +        queue->req_failed = true;
> +        ret = qemu_block_queue_handler(request);
> +        if (ret == 0) {
> +            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
> +            if (queue->req_failed) {
> +                qemu_block_queue_callback(request, -EIO);
> +                break;
> +            }
> +        }
> +    }
> +
> +    queue->req_failed = true;
> +    queue->flushing   = false;
> +}
> +
> +bool qemu_block_queue_has_pending(BlockQueue *queue)
> +{
> +    return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
> +}

Why doesn't the queue have pending requests in the middle of a flush
operation? (That is, the flush hasn't completed yet)

> diff --git a/block/blk-queue.h b/block/blk-queue.h
> new file mode 100644
> index 0000000..c1529f7
> --- /dev/null
> +++ b/block/blk-queue.h
> @@ -0,0 +1,59 @@
> +/*
> + * QEMU System Emulator queue declaration for block layer
> + *
> + * Copyright (c) IBM, Corp. 2011
> + *
> + * Authors:
> + *  Zhi Yong Wu  <wuzhy@linux.vnet.ibm.com>
> + *  Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#ifndef QEMU_BLOCK_QUEUE_H
> +#define QEMU_BLOCK_QUEUE_H
> +
> +#include "block.h"
> +#include "qemu-queue.h"
> +
> +typedef BlockDriverAIOCB* (BlockRequestHandler) (BlockDriverState *bs,
> +                                int64_t sector_num, QEMUIOVector *qiov,
> +                                int nb_sectors, BlockDriverCompletionFunc *cb,
> +                                void *opaque);
> +
> +typedef struct BlockQueue BlockQueue;
> +
> +BlockQueue *qemu_new_block_queue(void);
> +
> +void qemu_del_block_queue(BlockQueue *queue);
> +
> +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
> +                        BlockDriverState *bs,
> +                        BlockRequestHandler *handler,
> +                        int64_t sector_num,
> +                        QEMUIOVector *qiov,
> +                        int nb_sectors,
> +                        BlockDriverCompletionFunc *cb,
> +                        void *opaque);
> +
> +void qemu_block_queue_flush(BlockQueue *queue);
> +
> +bool qemu_block_queue_has_pending(BlockQueue *queue);
> +
> +#endif /* QEMU_BLOCK_QUEUE_H */
> diff --git a/block_int.h b/block_int.h
> index 8a72b80..201e635 100644
> --- a/block_int.h
> +++ b/block_int.h
> @@ -29,10 +29,18 @@
>  #include "qemu-queue.h"
>  #include "qemu-coroutine.h"
>  #include "qemu-timer.h"
> +#include "block/blk-queue.h"
>  
>  #define BLOCK_FLAG_ENCRYPT	1
>  #define BLOCK_FLAG_COMPAT6	4
>  
> +#define BLOCK_IO_LIMIT_READ     0
> +#define BLOCK_IO_LIMIT_WRITE    1
> +#define BLOCK_IO_LIMIT_TOTAL    2
> +
> +#define BLOCK_IO_SLICE_TIME     100000000
> +#define NANOSECONDS_PER_SECOND  1000000000.0
> +
>  #define BLOCK_OPT_SIZE          "size"
>  #define BLOCK_OPT_ENCRYPT       "encryption"
>  #define BLOCK_OPT_COMPAT6       "compat6"
> @@ -49,6 +57,16 @@ typedef struct AIOPool {
>      BlockDriverAIOCB *free_aiocb;
>  } AIOPool;
>  
> +typedef struct BlockIOLimit {
> +    uint64_t bps[3];
> +    uint64_t iops[3];
> +} BlockIOLimit;
> +
> +typedef struct BlockIODisp {
> +    uint64_t bytes[2];
> +    uint64_t ios[2];
> +} BlockIODisp;
> +
>  struct BlockDriver {
>      const char *format_name;
>      int instance_size;
> @@ -184,6 +202,15 @@ struct BlockDriverState {
>  
>      void *sync_aiocb;
>  
> +    /* the time for latest disk I/O */
> +    int64_t slice_start;
> +    int64_t slice_end;
> +    BlockIOLimit io_limits;
> +    BlockIODisp  io_disps;
> +    BlockQueue   *block_queue;
> +    QEMUTimer    *block_timer;
> +    bool         io_limits_enabled;
> +
>      /* I/O stats (display with "info blockstats"). */
>      uint64_t nr_bytes[BDRV_MAX_IOTYPE];
>      uint64_t nr_ops[BDRV_MAX_IOTYPE];

The changes to block_int.h look unrelated to this patch. Maybe they
should come later in the series.

Kevin

^ permalink raw reply	[flat|nested] 68+ messages in thread


* Re: [PATCH v8 2/4] block: add the command line support
  2011-09-08 10:11   ` [Qemu-devel] " Zhi Yong Wu
@ 2011-09-23 15:54     ` Kevin Wolf
  -1 siblings, 0 replies; 68+ messages in thread
From: Kevin Wolf @ 2011-09-23 15:54 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: aliguori, stefanha, kvm, mtosatti, qemu-devel, pair, zwu.kernel, ryanh

Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> ---
>  block.c         |   59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block.h         |    5 ++++
>  block_int.h     |    3 ++
>  blockdev.c      |   29 +++++++++++++++++++++++++++
>  qemu-config.c   |   24 ++++++++++++++++++++++
>  qemu-options.hx |    1 +
>  6 files changed, 121 insertions(+), 0 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 43742b7..cd75183 100644
> --- a/block.c
> +++ b/block.c
> @@ -104,6 +104,57 @@ int is_windows_drive(const char *filename)
>  }
>  #endif
>  
> +/* throttling disk I/O limits */
> +void bdrv_io_limits_disable(BlockDriverState *bs)
> +{
> +    bs->io_limits_enabled = false;
> +
> +    if (bs->block_queue) {
> +        qemu_block_queue_flush(bs->block_queue);
> +        qemu_del_block_queue(bs->block_queue);
> +        bs->block_queue = NULL;
> +    }
> +
> +    if (bs->block_timer) {
> +        qemu_del_timer(bs->block_timer);
> +        qemu_free_timer(bs->block_timer);
> +        bs->block_timer     = NULL;
> +    }
> +
> +    bs->slice_start = 0;
> +
> +    bs->slice_end   = 0;

Remove the empty line between slice_start and slice_end?

> +}
> +
> +static void bdrv_block_timer(void *opaque)
> +{
> +    BlockDriverState *bs = opaque;
> +    BlockQueue *queue    = bs->block_queue;
> +
> +    qemu_block_queue_flush(queue);

Hm, didn't really notice it while reading patch 1, but
qemu_block_queue_flush() is misleading. It's really something like
qemu_block_queue_submit().

> +}
> +
> +void bdrv_io_limits_enable(BlockDriverState *bs)
> +{
> +    bs->block_queue = qemu_new_block_queue();
> +    bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
> +
> +    bs->slice_start = qemu_get_clock_ns(vm_clock);
> +
> +    bs->slice_end   = bs->slice_start + BLOCK_IO_SLICE_TIME;
> +}

Same as above.

> +
> +bool bdrv_io_limits_enabled(BlockDriverState *bs)
> +{
> +    BlockIOLimit *io_limits = &bs->io_limits;
> +    return io_limits->bps[BLOCK_IO_LIMIT_READ]
> +         || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
> +         || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
> +         || io_limits->iops[BLOCK_IO_LIMIT_READ]
> +         || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
> +         || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
> +}
> +
>  /* check if the path starts with "<protocol>:" */
>  static int path_has_protocol(const char *path)
>  {
> @@ -1453,6 +1504,14 @@ void bdrv_get_geometry_hint(BlockDriverState *bs,
>      *psecs = bs->secs;
>  }
>  
> +/* throttling disk io limits */
> +void bdrv_set_io_limits(BlockDriverState *bs,
> +                            BlockIOLimit *io_limits)
> +{
> +    bs->io_limits = *io_limits;
> +    bs->io_limits_enabled = bdrv_io_limits_enabled(bs);
> +}
> +
>  /* Recognize floppy formats */
>  typedef struct FDFormat {
>      FDriveType drive;
> diff --git a/block.h b/block.h
> index 3ac0b94..a3e69db 100644
> --- a/block.h
> +++ b/block.h
> @@ -58,6 +58,11 @@ void bdrv_info(Monitor *mon, QObject **ret_data);
>  void bdrv_stats_print(Monitor *mon, const QObject *data);
>  void bdrv_info_stats(Monitor *mon, QObject **ret_data);
>  
> +/* disk I/O throttling */
> +void bdrv_io_limits_enable(BlockDriverState *bs);
> +void bdrv_io_limits_disable(BlockDriverState *bs);
> +bool bdrv_io_limits_enabled(BlockDriverState *bs);
> +
>  void bdrv_init(void);
>  void bdrv_init_with_whitelist(void);
>  BlockDriver *bdrv_find_protocol(const char *filename);
> diff --git a/block_int.h b/block_int.h
> index 201e635..368c776 100644
> --- a/block_int.h
> +++ b/block_int.h
> @@ -257,6 +257,9 @@ void qemu_aio_release(void *p);
>  
>  void *qemu_blockalign(BlockDriverState *bs, size_t size);
>  
> +void bdrv_set_io_limits(BlockDriverState *bs,
> +                            BlockIOLimit *io_limits);
> +
>  #ifdef _WIN32
>  int is_windows_drive(const char *filename);
>  #endif
> diff --git a/blockdev.c b/blockdev.c
> index 2602591..619ae9f 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -236,6 +236,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>      int on_read_error, on_write_error;
>      const char *devaddr;
>      DriveInfo *dinfo;
> +    BlockIOLimit io_limits;
>      int snapshot = 0;
>      int ret;
>  
> @@ -354,6 +355,31 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>          }
>      }
>  
> +    /* disk I/O throttling */
> +    io_limits.bps[BLOCK_IO_LIMIT_TOTAL]  =
> +                           qemu_opt_get_number(opts, "bps", 0);
> +    io_limits.bps[BLOCK_IO_LIMIT_READ]   =
> +                           qemu_opt_get_number(opts, "bps_rd", 0);
> +    io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
> +                           qemu_opt_get_number(opts, "bps_wr", 0);
> +    io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
> +                           qemu_opt_get_number(opts, "iops", 0);
> +    io_limits.iops[BLOCK_IO_LIMIT_READ]  =
> +                           qemu_opt_get_number(opts, "iops_rd", 0);
> +    io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
> +                           qemu_opt_get_number(opts, "iops_wr", 0);
> +
> +    if (((io_limits.bps[BLOCK_IO_LIMIT_TOTAL] != 0)
> +            && ((io_limits.bps[BLOCK_IO_LIMIT_READ] != 0)
> +            || (io_limits.bps[BLOCK_IO_LIMIT_WRITE] != 0)))
> +            || ((io_limits.iops[BLOCK_IO_LIMIT_TOTAL] != 0)
> +            && ((io_limits.iops[BLOCK_IO_LIMIT_READ] != 0)
> +            || (io_limits.iops[BLOCK_IO_LIMIT_WRITE] != 0)))) {

-EWRITEONLY

Seriously, break this up into some temporary bool variables if you want
it to be readable.

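To illustrate, the check could be factored into named booleans along these lines (untested sketch; the enum values and struct are stand-ins mirroring the patch, not the exact QEMU definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the enum values and struct in the patch. */
enum { BLOCK_IO_LIMIT_READ, BLOCK_IO_LIMIT_WRITE, BLOCK_IO_LIMIT_TOTAL,
       BLOCK_IO_LIMIT_MAX };

typedef struct {
    uint64_t bps[BLOCK_IO_LIMIT_MAX];
    uint64_t iops[BLOCK_IO_LIMIT_MAX];
} BlockIOLimit;

/* Returns true if a total limit is combined with a read/write limit,
 * which the patch forbids. */
static bool io_limits_conflict(const BlockIOLimit *l)
{
    bool bps_total  = l->bps[BLOCK_IO_LIMIT_TOTAL] != 0;
    bool bps_rw     = l->bps[BLOCK_IO_LIMIT_READ] != 0
                      || l->bps[BLOCK_IO_LIMIT_WRITE] != 0;
    bool iops_total = l->iops[BLOCK_IO_LIMIT_TOTAL] != 0;
    bool iops_rw    = l->iops[BLOCK_IO_LIMIT_READ] != 0
                      || l->iops[BLOCK_IO_LIMIT_WRITE] != 0;

    return (bps_total && bps_rw) || (iops_total && iops_rw);
}
```

Each condition then reads off the variable names instead of six parenthesized comparisons.
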
> +        error_report("bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr)"
> +                     "cannot be used at the same time");
> +        return NULL;
> +    }
> +
>      on_write_error = BLOCK_ERR_STOP_ENOSPC;
>      if ((buf = qemu_opt_get(opts, "werror")) != NULL) {
>          if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != IF_NONE) {
> @@ -461,6 +487,9 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>  
>      bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error);
>  
> +    /* disk I/O throttling */
> +    bdrv_set_io_limits(dinfo->bdrv, &io_limits);
> +
>      switch(type) {
>      case IF_IDE:
>      case IF_SCSI:
> diff --git a/qemu-config.c b/qemu-config.c
> index 7a7854f..405e587 100644
> --- a/qemu-config.c
> +++ b/qemu-config.c
> @@ -85,6 +85,30 @@ static QemuOptsList qemu_drive_opts = {
>              .name = "readonly",
>              .type = QEMU_OPT_BOOL,
>              .help = "open drive file as read-only",
> +        },{
> +            .name = "iops",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "limit total I/O operations per second",
> +        },{
> +            .name = "iops_rd",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "limit read operations per second",
> +        },{
> +            .name = "iops_wr",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "limit write operations per second",
> +        },{
> +            .name = "bps",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "limit total bytes per second",
> +        },{
> +            .name = "bps_rd",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "limit read bytes per second",
> +        },{
> +            .name = "bps_wr",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "limit write bytes per second",
>          },
>          { /* end of list */ }
>      },
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 659ecb2..2e42c5c 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -136,6 +136,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
>      "       [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
>      "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
>      "       [,readonly=on|off]\n"
> +    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n"
>      "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
>  STEXI
>  @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]

Kevin

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-09-08 10:11   ` [Qemu-devel] " Zhi Yong Wu
@ 2011-09-23 16:19     ` Kevin Wolf
  -1 siblings, 0 replies; 68+ messages in thread
From: Kevin Wolf @ 2011-09-23 16:19 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: qemu-devel, kvm, stefanha, mtosatti, aliguori, ryanh, zwu.kernel, pair

Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
> Note:
>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this scenario.
>      2.) When the "dd" command is issued in the guest with a large block size such as "bs=1024K", the resulting speed will be slightly higher than the limits.
> 
> For these problems, if you have nice thought, pls let us know.:)
> 
> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> ---
>  block.c |  259 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>  block.h |    1 -
>  2 files changed, 248 insertions(+), 12 deletions(-)

One general comment: What about synchronous and/or coroutine I/O
operations? Do you think they are just not important enough to consider
here or were they forgotten?

Also, do I understand correctly that you're always submitting the whole
queue at once? Does this effectively enforce the limit all the time or
will it lead to some peaks and then no requests at all for a while until
the average is right again?

Maybe some documentation on how it all works from a high level
perspective would be helpful.

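For reference, the slice-based check the patch implements can be pictured with a small self-contained model (my sketch, not the patch's exact code; units are nanoseconds to match vm_clock): dispatched bytes are compared against what the bps limit would allow over the elapsed slice time, and an over-limit request yields the wait time after which the slice average is back under the limit — which is exactly what feeds qemu_mod_timer().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NANOSECONDS_PER_SECOND 1000000000LL

/* Checks whether dispatching bytes_req now would push the average rate in
 * the current slice over bps_limit; if so, computes how long the request
 * must wait before the slice average drops back under the limit.
 * elapsed_ns is the time since slice_start. */
static bool exceed_bps(uint64_t bps_limit, uint64_t bytes_disp,
                       uint64_t bytes_req, int64_t elapsed_ns,
                       int64_t *wait_ns)
{
    double allowed = bps_limit * ((double)elapsed_ns / NANOSECONDS_PER_SECOND);

    if ((double)(bytes_disp + bytes_req) <= allowed) {
        *wait_ns = 0;
        return false;
    }

    /* Point in slice time at which the average would permit this request. */
    double need_s = (double)(bytes_disp + bytes_req) / bps_limit;
    *wait_ns = (int64_t)(need_s * NANOSECONDS_PER_SECOND) - elapsed_ns;
    return true;
}
```

This also makes the peak question concrete: nothing here spreads queued requests out once the timer fires, so a whole-queue flush can dispatch a burst and then stall until the average recovers.
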
> diff --git a/block.c b/block.c
> index cd75183..c08fde8 100644
> --- a/block.c
> +++ b/block.c
> @@ -30,6 +30,9 @@
>  #include "qemu-objects.h"
>  #include "qemu-coroutine.h"
>  
> +#include "qemu-timer.h"
> +#include "block/blk-queue.h"
> +
>  #ifdef CONFIG_BSD
>  #include <sys/types.h>
>  #include <sys/stat.h>
> @@ -72,6 +75,13 @@ static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
>                                           QEMUIOVector *iov);
>  static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs);
>  
> +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
> +        bool is_write, double elapsed_time, uint64_t *wait);
> +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
> +        double elapsed_time, uint64_t *wait);
> +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
> +        bool is_write, int64_t *wait);
> +
>  static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
>      QTAILQ_HEAD_INITIALIZER(bdrv_states);
>  
> @@ -745,6 +755,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>              bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
>      }
>  
> +    /* throttling disk I/O limits */
> +    if (bs->io_limits_enabled) {
> +        bdrv_io_limits_enable(bs);
> +    }
> +
>      return 0;
>  
>  unlink_and_fail:
> @@ -783,6 +798,18 @@ void bdrv_close(BlockDriverState *bs)
>          if (bs->change_cb)
>              bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
>      }
> +
> +    /* throttling disk I/O limits */
> +    if (bs->block_queue) {
> +        qemu_del_block_queue(bs->block_queue);
> +        bs->block_queue = NULL;
> +    }
> +
> +    if (bs->block_timer) {
> +        qemu_del_timer(bs->block_timer);
> +        qemu_free_timer(bs->block_timer);
> +        bs->block_timer = NULL;
> +    }

Why not io_limits_disable() instead of copying the code here?

>  }
>  
>  void bdrv_close_all(void)
> @@ -2341,16 +2368,48 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
>                                   BlockDriverCompletionFunc *cb, void *opaque)
>  {
>      BlockDriver *drv = bs->drv;
> -
> +    BlockDriverAIOCB *ret;
> +    int64_t wait_time = -1;
> +printf("sector_num=%ld, nb_sectors=%d\n", sector_num, nb_sectors);

Debugging leftover (more of them follow, won't comment on each one)

>      trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
>  
> -    if (!drv)
> -        return NULL;
> -    if (bdrv_check_request(bs, sector_num, nb_sectors))
> +    if (!drv || bdrv_check_request(bs, sector_num, nb_sectors)) {
>          return NULL;
> +    }

This part is unrelated.

> +
> +    /* throttling disk read I/O */
> +    if (bs->io_limits_enabled) {
> +        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
> +                           sector_num, qiov, nb_sectors, cb, opaque);
> +            printf("wait_time=%ld\n", wait_time);
> +            if (wait_time != -1) {
> +                printf("reset block timer\n");
> +                qemu_mod_timer(bs->block_timer,
> +                               wait_time + qemu_get_clock_ns(vm_clock));
> +            }
> +
> +            if (ret) {
> +                printf("ori ret is not null\n");
> +            } else {
> +                printf("ori ret is null\n");
> +            }
> +
> +            return ret;
> +        }
> +    }
>  
> -    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
> +    ret =  drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>                                 cb, opaque);
> +    if (ret) {
> +        if (bs->io_limits_enabled) {
> +            bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
> +                              (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
> +            bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
> +        }

I wonder if you can't reuse bs->nr_bytes/nr_ops instead of introducing a
second counting mechanism. Would have the advantage that numbers are
actually consistent (your metric counts slightly differently than the
existing info blockstats one).

> +    }
> +
> +    return ret;
>  }
>  
>  typedef struct BlockCompleteData {
> @@ -2396,15 +2455,14 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>      BlockDriver *drv = bs->drv;
>      BlockDriverAIOCB *ret;
>      BlockCompleteData *blk_cb_data;
> +    int64_t wait_time = -1;
>  
>      trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
>  
> -    if (!drv)
> -        return NULL;
> -    if (bs->read_only)
> -        return NULL;
> -    if (bdrv_check_request(bs, sector_num, nb_sectors))
> +    if (!drv || bs->read_only
> +        || bdrv_check_request(bs, sector_num, nb_sectors)) {
>          return NULL;
> +    }

Again, unrelated changes.

>  
>      if (bs->dirty_bitmap) {
>          blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
> @@ -2413,13 +2471,32 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>          opaque = blk_cb_data;
>      }
>  
> +    /* throttling disk write I/O */
> +    if (bs->io_limits_enabled) {
> +        if (bdrv_exceed_io_limits(bs, nb_sectors, true, &wait_time)) {
> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_writev,
> +                                  sector_num, qiov, nb_sectors, cb, opaque);
> +            if (wait_time != -1) {
> +                qemu_mod_timer(bs->block_timer,
> +                               wait_time + qemu_get_clock_ns(vm_clock));
> +            }
> +
> +            return ret;
> +        }
> +    }

This looks very similar to the code in bdrv_aio_readv. Can it be moved
into a common function?

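For instance, the shared step could look roughly like this (untested sketch with stand-in types and a stubbed limit check, just to show the shape; none of these names are the real QEMU API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the "queue if throttled" step duplicated in
 * bdrv_aio_readv/bdrv_aio_writev; all types here are stand-ins. */
typedef struct {
    bool    io_limits_enabled;
    int     queued;         /* stands in for the BlockQueue */
    int64_t timer_deadline; /* stands in for bs->block_timer */
} ThrottleState;

/* Stub for bdrv_exceed_io_limits(): always hits the limit when enabled. */
static bool exceed_limits_stub(ThrottleState *ts, int64_t *wait)
{
    *wait = 1000;
    return ts->io_limits_enabled;
}

/* Returns true if the request was queued and the timer re-armed; the
 * caller then skips direct submission.  Both the read and write paths
 * could call this instead of open-coding the block twice. */
static bool bdrv_throttle_or_queue(ThrottleState *ts, int64_t now)
{
    int64_t wait_time = -1;

    if (ts->io_limits_enabled && exceed_limits_stub(ts, &wait_time)) {
        ts->queued++;
        if (wait_time != -1) {
            ts->timer_deadline = now + wait_time;
        }
        return true;
    }
    return false;
}
```

The only real difference between the two call sites is the is_write flag and the function pointer passed to the queue, so both could become parameters of the helper.
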
Kevin

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v8 2/4] block: add the command line support
  2011-09-23 15:54     ` [Qemu-devel] " Kevin Wolf
@ 2011-09-26  6:15       ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-26  6:15 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Zhi Yong Wu, qemu-devel, kvm, stefanha, mtosatti, aliguori, ryanh, pair

On Fri, Sep 23, 2011 at 11:54 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>> ---
>>  block.c         |   59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  block.h         |    5 ++++
>>  block_int.h     |    3 ++
>>  blockdev.c      |   29 +++++++++++++++++++++++++++
>>  qemu-config.c   |   24 ++++++++++++++++++++++
>>  qemu-options.hx |    1 +
>>  6 files changed, 121 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 43742b7..cd75183 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -104,6 +104,57 @@ int is_windows_drive(const char *filename)
>>  }
>>  #endif
>>
>> +/* throttling disk I/O limits */
>> +void bdrv_io_limits_disable(BlockDriverState *bs)
>> +{
>> +    bs->io_limits_enabled = false;
>> +
>> +    if (bs->block_queue) {
>> +        qemu_block_queue_flush(bs->block_queue);
>> +        qemu_del_block_queue(bs->block_queue);
>> +        bs->block_queue = NULL;
>> +    }
>> +
>> +    if (bs->block_timer) {
>> +        qemu_del_timer(bs->block_timer);
>> +        qemu_free_timer(bs->block_timer);
>> +        bs->block_timer     = NULL;
>> +    }
>> +
>> +    bs->slice_start = 0;
>> +
>> +    bs->slice_end   = 0;
>
> Remove the empty line between slice_start and slice_end?
Yeah, thanks.
>
>> +}
>> +
>> +static void bdrv_block_timer(void *opaque)
>> +{
>> +    BlockDriverState *bs = opaque;
>> +    BlockQueue *queue    = bs->block_queue;
>> +
>> +    qemu_block_queue_flush(queue);
>
> Hm, didn't really notice it while reading patch 1, but
> qemu_block_queue_flush() is misleading. It's really something like
Why do you say this is misleading?
> qemu_block_queue_submit().
Right. It will resubmit all enqueued I/O requests.
>
>> +}
>> +
>> +void bdrv_io_limits_enable(BlockDriverState *bs)
>> +{
>> +    bs->block_queue = qemu_new_block_queue();
>> +    bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
>> +
>> +    bs->slice_start = qemu_get_clock_ns(vm_clock);
>> +
>> +    bs->slice_end   = bs->slice_start + BLOCK_IO_SLICE_TIME;
>> +}
>
> Same as above.
Got it. I will remove it, thanks.
>
>> +
>> +bool bdrv_io_limits_enabled(BlockDriverState *bs)
>> +{
>> +    BlockIOLimit *io_limits = &bs->io_limits;
>> +    return io_limits->bps[BLOCK_IO_LIMIT_READ]
>> +         || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
>> +         || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
>> +         || io_limits->iops[BLOCK_IO_LIMIT_READ]
>> +         || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
>> +         || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
>> +}
>> +
>>  /* check if the path starts with "<protocol>:" */
>>  static int path_has_protocol(const char *path)
>>  {
>> @@ -1453,6 +1504,14 @@ void bdrv_get_geometry_hint(BlockDriverState *bs,
>>      *psecs = bs->secs;
>>  }
>>
>> +/* throttling disk io limits */
>> +void bdrv_set_io_limits(BlockDriverState *bs,
>> +                            BlockIOLimit *io_limits)
>> +{
>> +    bs->io_limits = *io_limits;
>> +    bs->io_limits_enabled = bdrv_io_limits_enabled(bs);
>> +}
>> +
>>  /* Recognize floppy formats */
>>  typedef struct FDFormat {
>>      FDriveType drive;
>> diff --git a/block.h b/block.h
>> index 3ac0b94..a3e69db 100644
>> --- a/block.h
>> +++ b/block.h
>> @@ -58,6 +58,11 @@ void bdrv_info(Monitor *mon, QObject **ret_data);
>>  void bdrv_stats_print(Monitor *mon, const QObject *data);
>>  void bdrv_info_stats(Monitor *mon, QObject **ret_data);
>>
>> +/* disk I/O throttling */
>> +void bdrv_io_limits_enable(BlockDriverState *bs);
>> +void bdrv_io_limits_disable(BlockDriverState *bs);
>> +bool bdrv_io_limits_enabled(BlockDriverState *bs);
>> +
>>  void bdrv_init(void);
>>  void bdrv_init_with_whitelist(void);
>>  BlockDriver *bdrv_find_protocol(const char *filename);
>> diff --git a/block_int.h b/block_int.h
>> index 201e635..368c776 100644
>> --- a/block_int.h
>> +++ b/block_int.h
>> @@ -257,6 +257,9 @@ void qemu_aio_release(void *p);
>>
>>  void *qemu_blockalign(BlockDriverState *bs, size_t size);
>>
>> +void bdrv_set_io_limits(BlockDriverState *bs,
>> +                            BlockIOLimit *io_limits);
>> +
>>  #ifdef _WIN32
>>  int is_windows_drive(const char *filename);
>>  #endif
>> diff --git a/blockdev.c b/blockdev.c
>> index 2602591..619ae9f 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -236,6 +236,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>>      int on_read_error, on_write_error;
>>      const char *devaddr;
>>      DriveInfo *dinfo;
>> +    BlockIOLimit io_limits;
>>      int snapshot = 0;
>>      int ret;
>>
>> @@ -354,6 +355,31 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>>          }
>>      }
>>
>> +    /* disk I/O throttling */
>> +    io_limits.bps[BLOCK_IO_LIMIT_TOTAL]  =
>> +                           qemu_opt_get_number(opts, "bps", 0);
>> +    io_limits.bps[BLOCK_IO_LIMIT_READ]   =
>> +                           qemu_opt_get_number(opts, "bps_rd", 0);
>> +    io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
>> +                           qemu_opt_get_number(opts, "bps_wr", 0);
>> +    io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
>> +                           qemu_opt_get_number(opts, "iops", 0);
>> +    io_limits.iops[BLOCK_IO_LIMIT_READ]  =
>> +                           qemu_opt_get_number(opts, "iops_rd", 0);
>> +    io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
>> +                           qemu_opt_get_number(opts, "iops_wr", 0);
>> +
>> +    if (((io_limits.bps[BLOCK_IO_LIMIT_TOTAL] != 0)
>> +            && ((io_limits.bps[BLOCK_IO_LIMIT_READ] != 0)
>> +            || (io_limits.bps[BLOCK_IO_LIMIT_WRITE] != 0)))
>> +            || ((io_limits.iops[BLOCK_IO_LIMIT_TOTAL] != 0)
>> +            && ((io_limits.iops[BLOCK_IO_LIMIT_READ] != 0)
>> +            || (io_limits.iops[BLOCK_IO_LIMIT_WRITE] != 0)))) {
>
> -EWRITEONLY
Sorry, what does this mean?
>
> Seriously, break this up into some temporary bool variables if you want
> it to be readable.
OK, good advice; I will.
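Agreed; something along these lines should be clearer (a standalone sketch, not the actual patch: the enum index order and the helper name io_limits_conflict() are invented here for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed index order for this sketch; the real enum lives in block.h. */
enum { BLOCK_IO_LIMIT_READ, BLOCK_IO_LIMIT_WRITE, BLOCK_IO_LIMIT_TOTAL };

typedef struct BlockIOLimit {
    uint64_t bps[3];
    uint64_t iops[3];
} BlockIOLimit;

/* True if a total limit is combined with a read/write limit of the
 * same kind, which the -drive parser should reject. */
static bool io_limits_conflict(const BlockIOLimit *l)
{
    bool bps_total  = l->bps[BLOCK_IO_LIMIT_TOTAL] != 0;
    bool bps_rw     = l->bps[BLOCK_IO_LIMIT_READ] != 0 ||
                      l->bps[BLOCK_IO_LIMIT_WRITE] != 0;
    bool iops_total = l->iops[BLOCK_IO_LIMIT_TOTAL] != 0;
    bool iops_rw    = l->iops[BLOCK_IO_LIMIT_READ] != 0 ||
                      l->iops[BLOCK_IO_LIMIT_WRITE] != 0;

    return (bps_total && bps_rw) || (iops_total && iops_rw);
}
```

drive_init() could then call the helper once and emit error_report() on a true result, instead of one multi-line condition.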
>
>> +        error_report("bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr)"
>> +                     "cannot be used at the same time");
>> +        return NULL;
>> +    }
>> +
>>      on_write_error = BLOCK_ERR_STOP_ENOSPC;
>>      if ((buf = qemu_opt_get(opts, "werror")) != NULL) {
>>          if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != IF_NONE) {
>> @@ -461,6 +487,9 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>>
>>      bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error);
>>
>> +    /* disk I/O throttling */
>> +    bdrv_set_io_limits(dinfo->bdrv, &io_limits);
>> +
>>      switch(type) {
>>      case IF_IDE:
>>      case IF_SCSI:
>> diff --git a/qemu-config.c b/qemu-config.c
>> index 7a7854f..405e587 100644
>> --- a/qemu-config.c
>> +++ b/qemu-config.c
>> @@ -85,6 +85,30 @@ static QemuOptsList qemu_drive_opts = {
>>              .name = "readonly",
>>              .type = QEMU_OPT_BOOL,
>>              .help = "open drive file as read-only",
>> +        },{
>> +            .name = "iops",
>> +            .type = QEMU_OPT_NUMBER,
>> +            .help = "limit total I/O operations per second",
>> +        },{
>> +            .name = "iops_rd",
>> +            .type = QEMU_OPT_NUMBER,
>> +            .help = "limit read operations per second",
>> +        },{
>> +            .name = "iops_wr",
>> +            .type = QEMU_OPT_NUMBER,
>> +            .help = "limit write operations per second",
>> +        },{
>> +            .name = "bps",
>> +            .type = QEMU_OPT_NUMBER,
>> +            .help = "limit total bytes per second",
>> +        },{
>> +            .name = "bps_rd",
>> +            .type = QEMU_OPT_NUMBER,
>> +            .help = "limit read bytes per second",
>> +        },{
>> +            .name = "bps_wr",
>> +            .type = QEMU_OPT_NUMBER,
>> +            .help = "limit write bytes per second",
>>          },
>>          { /* end of list */ }
>>      },
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 659ecb2..2e42c5c 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -136,6 +136,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
>>      "       [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
>>      "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
>>      "       [,readonly=on|off]\n"
>> +    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n"
>>      "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
>>  STEXI
>>  @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
>
> Kevin
>



-- 
Regards,

Zhi Yong Wu


* Re: [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-09-23 16:19     ` [Qemu-devel] " Kevin Wolf
@ 2011-09-26  7:24       ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-26  7:24 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: aliguori, stefanha, kvm, mtosatti, qemu-devel, pair, ryanh, Zhi Yong Wu

On Sat, Sep 24, 2011 at 12:19 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
>> Note:
>>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this scenario.
>>      2.) When the "dd" command is issued in the guest, if its option bs is set to a large value such as "bs=1024K", the resulting speed will be slightly higher than the limits.
>>
>> For these problems, if you have nice thought, pls let us know.:)
>>
>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>> ---
>>  block.c |  259 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>  block.h |    1 -
>>  2 files changed, 248 insertions(+), 12 deletions(-)
>
> One general comment: What about synchronous and/or coroutine I/O
> operations? Do you think they are just not important enough to consider
> here or were they forgotten?
For sync ops, we assume that they will be converted to async mode at
some point in the future, right?
For coroutine I/O, it is introduced in the image driver layer, behind
bdrv_aio_readv/writev. I think that we need not consider it, right?

>
> Also, do I understand correctly that you're always submitting the whole
Right; when the block timer fires, it will flush the whole request queue.
> queue at once? Does this effectively enforce the limit all the time or
> will it lead to some peaks and then no requests at all for a while until
In fact, it only tries to submit the enqueued requests one by one. If a
request fails to pass the limit check, it will be enqueued again.
> the average is right again?
Yeah, it is possible. Do you have a better idea?
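For reference, the slice-based check behind this behaves roughly as follows (a simplified standalone model, not the patch code; exceed_bps_limit() and its parameters are invented for illustration). A request is dispatched only if the bytes already sent in the current slice, plus the request itself, stay under bps_limit scaled by the elapsed slice time; otherwise the model reports how long the request must wait:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NANOSECONDS_PER_SECOND 1000000000.0

/* Simplified model of the slice-based bps check: given the bytes already
 * dispatched in the current slice and the elapsed slice time in seconds,
 * decide whether a new request of req_bytes would exceed bps_limit and,
 * if so, how long (in ns) it must wait before the average is back under
 * the limit. */
static bool exceed_bps_limit(uint64_t bps_limit, uint64_t dispatched_bytes,
                             uint64_t req_bytes, double elapsed_time_s,
                             int64_t *wait_ns)
{
    double bytes_limit = bps_limit * elapsed_time_s;
    double total = (double)dispatched_bytes + (double)req_bytes;

    if (total <= bytes_limit) {
        *wait_ns = 0;
        return false;           /* under the limit: dispatch now */
    }

    /* Wait until enough slice time has passed to cover `total` bytes. */
    double need_s = total / bps_limit;
    *wait_ns = (int64_t)((need_s - elapsed_time_s) * NANOSECONDS_PER_SECOND);
    return true;
}
```

In this model a burst is queued rather than dropped, which is why the average recovers after the timer fires.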
>
> Maybe some documentation on how it all works from a high level
> perspective would be helpful.
>
>> diff --git a/block.c b/block.c
>> index cd75183..c08fde8 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -30,6 +30,9 @@
>>  #include "qemu-objects.h"
>>  #include "qemu-coroutine.h"
>>
>> +#include "qemu-timer.h"
>> +#include "block/blk-queue.h"
>> +
>>  #ifdef CONFIG_BSD
>>  #include <sys/types.h>
>>  #include <sys/stat.h>
>> @@ -72,6 +75,13 @@ static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
>>                                           QEMUIOVector *iov);
>>  static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs);
>>
>> +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
>> +        bool is_write, double elapsed_time, uint64_t *wait);
>> +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
>> +        double elapsed_time, uint64_t *wait);
>> +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
>> +        bool is_write, int64_t *wait);
>> +
>>  static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
>>      QTAILQ_HEAD_INITIALIZER(bdrv_states);
>>
>> @@ -745,6 +755,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>>              bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
>>      }
>>
>> +    /* throttling disk I/O limits */
>> +    if (bs->io_limits_enabled) {
>> +        bdrv_io_limits_enable(bs);
>> +    }
>> +
>>      return 0;
>>
>>  unlink_and_fail:
>> @@ -783,6 +798,18 @@ void bdrv_close(BlockDriverState *bs)
>>          if (bs->change_cb)
>>              bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
>>      }
>> +
>> +    /* throttling disk I/O limits */
>> +    if (bs->block_queue) {
>> +        qemu_del_block_queue(bs->block_queue);
>> +        bs->block_queue = NULL;
>> +    }
>> +
>> +    if (bs->block_timer) {
>> +        qemu_del_timer(bs->block_timer);
>> +        qemu_free_timer(bs->block_timer);
>> +        bs->block_timer = NULL;
>> +    }
>
> Why not io_limits_disable() instead of copying the code here?
Good point, thanks.
>
>>  }
>>
>>  void bdrv_close_all(void)
>> @@ -2341,16 +2368,48 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
>>                                   BlockDriverCompletionFunc *cb, void *opaque)
>>  {
>>      BlockDriver *drv = bs->drv;
>> -
>> +    BlockDriverAIOCB *ret;
>> +    int64_t wait_time = -1;
>> +printf("sector_num=%ld, nb_sectors=%d\n", sector_num, nb_sectors);
>
> Debugging leftover (more of them follow, won't comment on each one)
Removed.
>
>>      trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
>>
>> -    if (!drv)
>> -        return NULL;
>> -    if (bdrv_check_request(bs, sector_num, nb_sectors))
>> +    if (!drv || bdrv_check_request(bs, sector_num, nb_sectors)) {
>>          return NULL;
>> +    }
>
> This part is unrelated.
I have changed it back to the original.
>
>> +
>> +    /* throttling disk read I/O */
>> +    if (bs->io_limits_enabled) {
>> +        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
>> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
>> +                           sector_num, qiov, nb_sectors, cb, opaque);
>> +            printf("wait_time=%ld\n", wait_time);
>> +            if (wait_time != -1) {
>> +                printf("reset block timer\n");
>> +                qemu_mod_timer(bs->block_timer,
>> +                               wait_time + qemu_get_clock_ns(vm_clock));
>> +            }
>> +
>> +            if (ret) {
>> +                printf("ori ret is not null\n");
>> +            } else {
>> +                printf("ori ret is null\n");
>> +            }
>> +
>> +            return ret;
>> +        }
>> +    }
>>
>> -    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>> +    ret =  drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>                                 cb, opaque);
>> +    if (ret) {
>> +        if (bs->io_limits_enabled) {
>> +            bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
>> +                              (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
>> +            bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
>> +        }
>
> I wonder if you can't reuse bs->nr_bytes/nr_ops instead of introducing a
> second counting mechanism. Would have the advantage that numbers are
No, our counting variables will be reset to zero when the current slice
time (0.1ms) is used up.
> actually consistent (your metric counts slightly differently than the
> existing info blockstats one).
Yeah, I noticed this, and I don't think there is anything wrong with it. What about you?
>
>> +    }
>> +
>> +    return ret;
>>  }
>>
>>  typedef struct BlockCompleteData {
>> @@ -2396,15 +2455,14 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>>      BlockDriver *drv = bs->drv;
>>      BlockDriverAIOCB *ret;
>>      BlockCompleteData *blk_cb_data;
>> +    int64_t wait_time = -1;
>>
>>      trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
>>
>> -    if (!drv)
>> -        return NULL;
>> -    if (bs->read_only)
>> -        return NULL;
>> -    if (bdrv_check_request(bs, sector_num, nb_sectors))
>> +    if (!drv || bs->read_only
>> +        || bdrv_check_request(bs, sector_num, nb_sectors)) {
>>          return NULL;
>> +    }
>
> Again, unrelated changes.
I have changed it back to the original.
>
>>
>>      if (bs->dirty_bitmap) {
>>          blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
>> @@ -2413,13 +2471,32 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>>          opaque = blk_cb_data;
>>      }
>>
>> +    /* throttling disk write I/O */
>> +    if (bs->io_limits_enabled) {
>> +        if (bdrv_exceed_io_limits(bs, nb_sectors, true, &wait_time)) {
>> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_writev,
>> +                                  sector_num, qiov, nb_sectors, cb, opaque);
>> +            if (wait_time != -1) {
>> +                qemu_mod_timer(bs->block_timer,
>> +                               wait_time + qemu_get_clock_ns(vm_clock));
>> +            }
>> +
>> +            return ret;
>> +        }
>> +    }
>
> This looks very similar to the code in bdrv_aio_readv. Can it be moved
> into a common function?
Good advice, done. Thanks.
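The shared path can be modeled like this (a standalone sketch with invented names, not the actual refactoring; the comments map each step to the calls used in the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standalone model of the shared read/write throttling path: if the
 * request exceeds the limits it is queued and the timer is re-armed;
 * otherwise it is dispatched to the driver. */
typedef struct ThrottleState {
    bool    limits_exceeded;    /* stand-in for bdrv_exceed_io_limits() */
    int64_t wait_ns;            /* wait time suggested by the check, -1 if none */
    int     queued;             /* requests sitting in the block queue */
    int     dispatched;         /* requests passed to the driver */
    int64_t timer_deadline_ns;  /* stand-in for bs->block_timer */
} ThrottleState;

/* Returns true if the request was dispatched, false if it was queued. */
static bool submit_throttled(ThrottleState *ts, int64_t now_ns)
{
    if (ts->limits_exceeded) {
        ts->queued++;                       /* qemu_block_queue_enqueue() */
        if (ts->wait_ns != -1) {
            /* qemu_mod_timer(bs->block_timer, wait + now) */
            ts->timer_deadline_ns = now_ns + ts->wait_ns;
        }
        return false;                       /* caller returns the queued AIOCB */
    }
    ts->dispatched++;                       /* drv->bdrv_aio_readv/writev() */
    return true;
}
```

With this shape, bdrv_aio_readv() and bdrv_aio_writev() each reduce to one call into the helper plus their own accounting.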

>
> Kevin
>



-- 
Regards,

Zhi Yong Wu


* Re: [Qemu-devel] [PATCH v8 3/4] block: add block timer and throttling algorithm
@ 2011-09-26  7:24       ` Zhi Yong Wu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-26  7:24 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: aliguori, stefanha, kvm, mtosatti, qemu-devel, pair, ryanh, Zhi Yong Wu

On Sat, Sep 24, 2011 at 12:19 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
>> Note:
>>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this senario.
>>      2.) When "dd" command is issued in guest, if its option bs is set to a large value such as "bs=1024K", the result speed will slightly bigger than the limits.
>>
>> For these problems, if you have nice thought, pls let us know.:)
>>
>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>> ---
>>  block.c |  259 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>  block.h |    1 -
>>  2 files changed, 248 insertions(+), 12 deletions(-)
>
> One general comment: What about synchronous and/or coroutine I/O
> operations? Do you think they are just not important enough to consider
> here or were they forgotten?
For sync ops, we assume that it will be converse into async mode at
some point of future, right?
For coroutine I/O, it is introduced in image driver layer, and behind
bdrv_aio_readv/writev. I think that we need not consider them, right?

>
> Also, do I understand correctly that you're always submitting the whole
Right, when the block timer fire, it will flush whole request queue.
> queue at once? Does this effectively enforce the limit all the time or
> will it lead to some peaks and then no requests at all for a while until
In fact, it only try to submit those enqueued request one by one. If
fail to pass the limit, this request will be enqueued again.
> the average is right again?
Yeah, it is possible. Do you better idea?
>
> Maybe some documentation on how it all works from a high level
> perspective would be helpful.
>
>> diff --git a/block.c b/block.c
>> index cd75183..c08fde8 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -30,6 +30,9 @@
>>  #include "qemu-objects.h"
>>  #include "qemu-coroutine.h"
>>
>> +#include "qemu-timer.h"
>> +#include "block/blk-queue.h"
>> +
>>  #ifdef CONFIG_BSD
>>  #include <sys/types.h>
>>  #include <sys/stat.h>
>> @@ -72,6 +75,13 @@ static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
>>                                           QEMUIOVector *iov);
>>  static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs);
>>
>> +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
>> +        bool is_write, double elapsed_time, uint64_t *wait);
>> +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
>> +        double elapsed_time, uint64_t *wait);
>> +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
>> +        bool is_write, int64_t *wait);
>> +
>>  static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
>>      QTAILQ_HEAD_INITIALIZER(bdrv_states);
>>
>> @@ -745,6 +755,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>>              bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
>>      }
>>
>> +    /* throttling disk I/O limits */
>> +    if (bs->io_limits_enabled) {
>> +        bdrv_io_limits_enable(bs);
>> +    }
>> +
>>      return 0;
>>
>>  unlink_and_fail:
>> @@ -783,6 +798,18 @@ void bdrv_close(BlockDriverState *bs)
>>          if (bs->change_cb)
>>              bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
>>      }
>> +
>> +    /* throttling disk I/O limits */
>> +    if (bs->block_queue) {
>> +        qemu_del_block_queue(bs->block_queue);
>> +        bs->block_queue = NULL;
>> +    }
>> +
>> +    if (bs->block_timer) {
>> +        qemu_del_timer(bs->block_timer);
>> +        qemu_free_timer(bs->block_timer);
>> +        bs->block_timer = NULL;
>> +    }
>
> Why not io_limits_disable() instead of copying the code here?
Good point, thanks.
>
>>  }
>>
>>  void bdrv_close_all(void)
>> @@ -2341,16 +2368,48 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
>>                                   BlockDriverCompletionFunc *cb, void *opaque)
>>  {
>>      BlockDriver *drv = bs->drv;
>> -
>> +    BlockDriverAIOCB *ret;
>> +    int64_t wait_time = -1;
>> +printf("sector_num=%ld, nb_sectors=%d\n", sector_num, nb_sectors);
>
> Debugging leftover (more of them follow, won't comment on each one)
Removed.
>
>>      trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
>>
>> -    if (!drv)
>> -        return NULL;
>> -    if (bdrv_check_request(bs, sector_num, nb_sectors))
>> +    if (!drv || bdrv_check_request(bs, sector_num, nb_sectors)) {
>>          return NULL;
>> +    }
>
> This part is unrelated.
Have changed it to original.
>
>> +
>> +    /* throttling disk read I/O */
>> +    if (bs->io_limits_enabled) {
>> +        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
>> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
>> +                           sector_num, qiov, nb_sectors, cb, opaque);
>> +            printf("wait_time=%ld\n", wait_time);
>> +            if (wait_time != -1) {
>> +                printf("reset block timer\n");
>> +                qemu_mod_timer(bs->block_timer,
>> +                               wait_time + qemu_get_clock_ns(vm_clock));
>> +            }
>> +
>> +            if (ret) {
>> +                printf("ori ret is not null\n");
>> +            } else {
>> +                printf("ori ret is null\n");
>> +            }
>> +
>> +            return ret;
>> +        }
>> +    }
>>
>> -    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>> +    ret =  drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>                                 cb, opaque);
>> +    if (ret) {
>> +        if (bs->io_limits_enabled) {
>> +            bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
>> +                              (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
>> +            bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
>> +        }
>
> I wonder if you can't reuse bs->nr_bytes/nr_ops instead of introducing a
> second counting mechanism. Would have the advantage that numbers are
NO, our counting variables will be reset to ZERO if current slice
time(0.1ms) is used up.
> actually consistent (your metric counts slightly differently than the
> existing info blockstats one).
Yeah, i notice this, and don't think there's wrong with it. and you?
>
>> +    }
>> +
>> +    return ret;
>>  }
>>
>>  typedef struct BlockCompleteData {
>> @@ -2396,15 +2455,14 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>>      BlockDriver *drv = bs->drv;
>>      BlockDriverAIOCB *ret;
>>      BlockCompleteData *blk_cb_data;
>> +    int64_t wait_time = -1;
>>
>>      trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
>>
>> -    if (!drv)
>> -        return NULL;
>> -    if (bs->read_only)
>> -        return NULL;
>> -    if (bdrv_check_request(bs, sector_num, nb_sectors))
>> +    if (!drv || bs->read_only
>> +        || bdrv_check_request(bs, sector_num, nb_sectors)) {
>>          return NULL;
>> +    }
>
> Again, unrelated changes.
I have reverted it to the original.
>
>>
>>      if (bs->dirty_bitmap) {
>>          blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
>> @@ -2413,13 +2471,32 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>>          opaque = blk_cb_data;
>>      }
>>
>> +    /* throttling disk write I/O */
>> +    if (bs->io_limits_enabled) {
>> +        if (bdrv_exceed_io_limits(bs, nb_sectors, true, &wait_time)) {
>> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_writev,
>> +                                  sector_num, qiov, nb_sectors, cb, opaque);
>> +            if (wait_time != -1) {
>> +                qemu_mod_timer(bs->block_timer,
>> +                               wait_time + qemu_get_clock_ns(vm_clock));
>> +            }
>> +
>> +            return ret;
>> +        }
>> +    }
>
> This looks very similar to the code in bdrv_aio_readv. Can it be moved
> into a common function?
Good advice, done. Thanks.
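The duplicated pattern Kevin asks to factor out — check the limits, enqueue and arm the timer on excess, otherwise dispatch — can be modelled with a toy sketch. All types and names below are invented stand-ins for the QEMU structures, not the real refactoring:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A toy model of the throttle-then-dispatch logic that bdrv_aio_readv
 * and bdrv_aio_writev currently duplicate. */
typedef struct {
    bool io_limits_enabled;
    uint64_t bps_limit;          /* bytes allowed per slice */
    uint64_t bytes_this_slice;   /* bytes already dispatched this slice */
    int queued;                  /* requests sitting in the block queue */
    int dispatched;              /* requests handed to the driver */
} ToyBlockState;

/* Shared helper: returns true when the request was queued (throttled),
 * false when the caller should dispatch it immediately.  In QEMU this
 * is where qemu_block_queue_enqueue() and qemu_mod_timer() would go. */
static bool toy_throttle(ToyBlockState *bs, uint64_t bytes)
{
    if (bs->io_limits_enabled &&
        bs->bytes_this_slice + bytes > bs->bps_limit) {
        bs->queued++;
        return true;
    }
    return false;
}

static void toy_aio_rw(ToyBlockState *bs, uint64_t bytes)
{
    if (toy_throttle(bs, bytes)) {
        return;                  /* request waits in the queue */
    }
    bs->dispatched++;
    bs->bytes_this_slice += bytes;
}
```

Both the read and the write path then reduce to one call into the shared helper, which is the shape of the refactoring the review suggests.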

>
> Kevin
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support
  2011-09-23 15:32     ` [Qemu-devel] " Kevin Wolf
@ 2011-09-26  8:01       ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-26  8:01 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Zhi Yong Wu, aliguori, stefanha, kvm, mtosatti, qemu-devel, pair, ryanh

On Fri, Sep 23, 2011 at 11:32 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>> ---
>>  Makefile.objs     |    2 +-
>>  block/blk-queue.c |  201 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  block/blk-queue.h |   59 ++++++++++++++++
>>  block_int.h       |   27 +++++++
>>  4 files changed, 288 insertions(+), 1 deletions(-)
>>  create mode 100644 block/blk-queue.c
>>  create mode 100644 block/blk-queue.h
>>
>> diff --git a/Makefile.objs b/Makefile.objs
>> index 26b885b..5dcf456 100644
>> --- a/Makefile.objs
>> +++ b/Makefile.objs
>> @@ -33,7 +33,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv
>>  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
>>  block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>>  block-nested-y += qed-check.o
>> -block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>> +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o blk-queue.o
>>  block-nested-$(CONFIG_WIN32) += raw-win32.o
>>  block-nested-$(CONFIG_POSIX) += raw-posix.o
>>  block-nested-$(CONFIG_CURL) += curl.o
>> diff --git a/block/blk-queue.c b/block/blk-queue.c
>> new file mode 100644
>> index 0000000..adef497
>> --- /dev/null
>> +++ b/block/blk-queue.c
>> @@ -0,0 +1,201 @@
>> +/*
>> + * QEMU System Emulator queue definition for block layer
>> + *
>> + * Copyright (c) IBM, Corp. 2011
>> + *
>> + * Authors:
>> + *  Zhi Yong Wu  <wuzhy@linux.vnet.ibm.com>
>> + *  Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "block_int.h"
>> +#include "block/blk-queue.h"
>> +#include "qemu-common.h"
>> +
>> +/* The APIs for block request queue on qemu block layer.
>> + */
>> +
>> +struct BlockQueueAIOCB {
>> +    BlockDriverAIOCB common;
>> +    QTAILQ_ENTRY(BlockQueueAIOCB) entry;
>> +    BlockRequestHandler *handler;
>> +    BlockDriverAIOCB *real_acb;
>> +
>> +    int64_t sector_num;
>> +    QEMUIOVector *qiov;
>> +    int nb_sectors;
>> +};
>
> The idea is that each request is first queued on the QTAILQ, and at some
> point it's removed from the queue and gets a real_acb. But it never has
> both at the same time. Correct?
No. If block I/O throttling is enabled and the I/O rate at runtime exceeds
the limits, the request is enqueued.
The AIOCB represents the whole lifecycle of one enqueued request.

>
> Can we have the basic principle of operation spelled out as a comment
> somewhere near the top of the file?
OK, i will.
>
>> +
>> +typedef struct BlockQueueAIOCB BlockQueueAIOCB;
>> +
>> +struct BlockQueue {
>> +    QTAILQ_HEAD(requests, BlockQueueAIOCB) requests;
>> +    bool req_failed;
>> +    bool flushing;
>> +};
>
> I find req_failed pretty confusing. Needs documentation at least, but
> most probably also a better name.
OK. How about request_has_failed?
>
>> +
>> +static void qemu_block_queue_dequeue(BlockQueue *queue,
>> +                                     BlockQueueAIOCB *request)
>> +{
>> +    BlockQueueAIOCB *req;
>> +
>> +    assert(queue);
>> +    while (!QTAILQ_EMPTY(&queue->requests)) {
>> +        req = QTAILQ_FIRST(&queue->requests);
>> +        if (req == request) {
>> +            QTAILQ_REMOVE(&queue->requests, req, entry);
>> +            break;
>> +        }
>> +    }
>> +}
>
> Is it just me or is this an endless loop if the request isn't the first
> element in the list?
queue->requests is only used to store requests which exceed the limits.
Why would the request not be the first element?
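For reference, a traversal that cannot loop forever when the request is not at the head would advance through the list and stop at the end. Sketched here with the analogous BSD `sys/queue.h` TAILQ macros rather than QEMU's qemu-queue.h:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

typedef struct Req {
    int id;
    TAILQ_ENTRY(Req) entry;
} Req;

typedef TAILQ_HEAD(ReqHead, Req) ReqHead;

/* Unlike the loop under review, this visits every element, so a request
 * in the middle of the queue is found and removed instead of spinning
 * forever on an unchanging head; an absent request simply falls off the
 * end of the list.  Breaking right after the removal keeps the plain
 * (non-SAFE) FOREACH valid. */
static void queue_dequeue(ReqHead *queue, Req *request)
{
    Req *req;

    TAILQ_FOREACH(req, queue, entry) {
        if (req == request) {
            TAILQ_REMOVE(queue, req, entry);
            break;
        }
    }
}
```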
>
>> +
>> +static void qemu_block_queue_cancel(BlockDriverAIOCB *acb)
>> +{
>> +    BlockQueueAIOCB *request = container_of(acb, BlockQueueAIOCB, common);
>> +    if (request->real_acb) {
>> +        bdrv_aio_cancel(request->real_acb);
>> +    } else {
>> +        assert(request->common.bs->block_queue);
>> +        qemu_block_queue_dequeue(request->common.bs->block_queue,
>> +                                 request);
>> +    }
>> +
>> +    qemu_aio_release(request);
>> +}
>> +
>> +static AIOPool block_queue_pool = {
>> +    .aiocb_size         = sizeof(struct BlockQueueAIOCB),
>> +    .cancel             = qemu_block_queue_cancel,
>> +};
>> +
>> +static void qemu_block_queue_callback(void *opaque, int ret)
>> +{
>> +    BlockQueueAIOCB *acb = opaque;
>> +
>> +    if (acb->common.cb) {
>> +        acb->common.cb(acb->common.opaque, ret);
>> +    }
>> +
>> +    qemu_aio_release(acb);
>> +}
>> +
>> +BlockQueue *qemu_new_block_queue(void)
>> +{
>> +    BlockQueue *queue;
>> +
>> +    queue = g_malloc0(sizeof(BlockQueue));
>> +
>> +    QTAILQ_INIT(&queue->requests);
>> +
>> +    queue->req_failed = true;
>> +    queue->flushing   = false;
>> +
>> +    return queue;
>> +}
>> +
>> +void qemu_del_block_queue(BlockQueue *queue)
>> +{
>> +    BlockQueueAIOCB *request, *next;
>> +
>> +    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>> +        qemu_aio_release(request);
>> +    }
>> +
>> +    g_free(queue);
>> +}
>
> Can we be sure that no AIO requests are in flight that still use the now
> released AIOCB?
Yeah. Since the QEMU core code runs serially, I think that when
qemu_del_block_queue is performed, no requests are in flight. Right?

>
>> +
>> +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
>> +                        BlockDriverState *bs,
>> +                        BlockRequestHandler *handler,
>> +                        int64_t sector_num,
>> +                        QEMUIOVector *qiov,
>> +                        int nb_sectors,
>> +                        BlockDriverCompletionFunc *cb,
>> +                        void *opaque)
>> +{
>> +    BlockDriverAIOCB *acb;
>> +    BlockQueueAIOCB *request;
>> +
>> +    if (queue->flushing) {
>> +        queue->req_failed = false;
>> +        return NULL;
>> +    } else {
>> +        acb = qemu_aio_get(&block_queue_pool, bs,
>> +                           cb, opaque);
>> +        request = container_of(acb, BlockQueueAIOCB, common);
>> +        request->handler       = handler;
>> +        request->sector_num    = sector_num;
>> +        request->qiov          = qiov;
>> +        request->nb_sectors    = nb_sectors;
>> +        request->real_acb      = NULL;
>> +        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
>> +    }
>> +
>> +    return acb;
>> +}
>> +
>> +static int qemu_block_queue_handler(BlockQueueAIOCB *request)
>> +{
>> +    int ret;
>> +    BlockDriverAIOCB *res;
>> +
>> +    res = request->handler(request->common.bs, request->sector_num,
>> +                           request->qiov, request->nb_sectors,
>> +                           qemu_block_queue_callback, request);
>> +    if (res) {
>> +        request->real_acb = res;
>> +    }
>> +
>> +    ret = (res == NULL) ? 0 : 1;
>> +
>> +    return ret;
>
> You mean return (res != NULL); and want to have bool as the return value
> of this function.
Yeah, thanks. I will modify it as below:
ret = (res == NULL) ? false : true;
and
static bool qemu_block_queue_handler()

>
>> +}
>> +
>> +void qemu_block_queue_flush(BlockQueue *queue)
>> +{
>> +    queue->flushing = true;
>> +    while (!QTAILQ_EMPTY(&queue->requests)) {
>> +        BlockQueueAIOCB *request = NULL;
>> +        int ret = 0;
>> +
>> +        request = QTAILQ_FIRST(&queue->requests);
>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>> +
>> +        queue->req_failed = true;
>> +        ret = qemu_block_queue_handler(request);
>> +        if (ret == 0) {
>> +            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
>> +            if (queue->req_failed) {
>> +                qemu_block_queue_callback(request, -EIO);
>> +                break;
>> +            }
>> +        }
>> +    }
>> +
>> +    queue->req_failed = true;
>> +    queue->flushing   = false;
>> +}
>> +
>> +bool qemu_block_queue_has_pending(BlockQueue *queue)
>> +{
>> +    return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
>> +}
>
> Why doesn't the queue have pending requests in the middle of a flush
> operation? (That is, the flush hasn't completed yet)
It is possible for the queue to have pending requests in the middle of a flush. If so, what do you suggest?
>
>> diff --git a/block/blk-queue.h b/block/blk-queue.h
>> new file mode 100644
>> index 0000000..c1529f7
>> --- /dev/null
>> +++ b/block/blk-queue.h
>> @@ -0,0 +1,59 @@
>> +/*
>> + * QEMU System Emulator queue declaration for block layer
>> + *
>> + * Copyright (c) IBM, Corp. 2011
>> + *
>> + * Authors:
>> + *  Zhi Yong Wu  <wuzhy@linux.vnet.ibm.com>
>> + *  Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#ifndef QEMU_BLOCK_QUEUE_H
>> +#define QEMU_BLOCK_QUEUE_H
>> +
>> +#include "block.h"
>> +#include "qemu-queue.h"
>> +
>> +typedef BlockDriverAIOCB* (BlockRequestHandler) (BlockDriverState *bs,
>> +                                int64_t sector_num, QEMUIOVector *qiov,
>> +                                int nb_sectors, BlockDriverCompletionFunc *cb,
>> +                                void *opaque);
>> +
>> +typedef struct BlockQueue BlockQueue;
>> +
>> +BlockQueue *qemu_new_block_queue(void);
>> +
>> +void qemu_del_block_queue(BlockQueue *queue);
>> +
>> +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
>> +                        BlockDriverState *bs,
>> +                        BlockRequestHandler *handler,
>> +                        int64_t sector_num,
>> +                        QEMUIOVector *qiov,
>> +                        int nb_sectors,
>> +                        BlockDriverCompletionFunc *cb,
>> +                        void *opaque);
>> +
>> +void qemu_block_queue_flush(BlockQueue *queue);
>> +
>> +bool qemu_block_queue_has_pending(BlockQueue *queue);
>> +
>> +#endif /* QEMU_BLOCK_QUEUE_H */
>> diff --git a/block_int.h b/block_int.h
>> index 8a72b80..201e635 100644
>> --- a/block_int.h
>> +++ b/block_int.h
>> @@ -29,10 +29,18 @@
>>  #include "qemu-queue.h"
>>  #include "qemu-coroutine.h"
>>  #include "qemu-timer.h"
>> +#include "block/blk-queue.h"
>>
>>  #define BLOCK_FLAG_ENCRYPT   1
>>  #define BLOCK_FLAG_COMPAT6   4
>>
>> +#define BLOCK_IO_LIMIT_READ     0
>> +#define BLOCK_IO_LIMIT_WRITE    1
>> +#define BLOCK_IO_LIMIT_TOTAL    2
>> +
>> +#define BLOCK_IO_SLICE_TIME     100000000
>> +#define NANOSECONDS_PER_SECOND  1000000000.0
>> +
>>  #define BLOCK_OPT_SIZE          "size"
>>  #define BLOCK_OPT_ENCRYPT       "encryption"
>>  #define BLOCK_OPT_COMPAT6       "compat6"
>> @@ -49,6 +57,16 @@ typedef struct AIOPool {
>>      BlockDriverAIOCB *free_aiocb;
>>  } AIOPool;
>>
>> +typedef struct BlockIOLimit {
>> +    uint64_t bps[3];
>> +    uint64_t iops[3];
>> +} BlockIOLimit;
>> +
>> +typedef struct BlockIODisp {
>> +    uint64_t bytes[2];
>> +    uint64_t ios[2];
>> +} BlockIODisp;
>> +
>>  struct BlockDriver {
>>      const char *format_name;
>>      int instance_size;
>> @@ -184,6 +202,15 @@ struct BlockDriverState {
>>
>>      void *sync_aiocb;
>>
>> +    /* the time for latest disk I/O */
>> +    int64_t slice_start;
>> +    int64_t slice_end;
>> +    BlockIOLimit io_limits;
>> +    BlockIODisp  io_disps;
>> +    BlockQueue   *block_queue;
>> +    QEMUTimer    *block_timer;
>> +    bool         io_limits_enabled;
>> +
>>      /* I/O stats (display with "info blockstats"). */
>>      uint64_t nr_bytes[BDRV_MAX_IOTYPE];
>>      uint64_t nr_ops[BDRV_MAX_IOTYPE];
>
> The changes to block_int.h look unrelated to this patch. Maybe they
> should come later in the series.
OK, I will move them to the related patch in the series. Thanks.

>
> Kevin
>
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-09-20 12:34             ` [Qemu-devel] " Marcelo Tosatti
@ 2011-09-26  8:15               ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-26  8:15 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Zhi Yong Wu, qemu-devel, kvm, stefanha, aliguori, ryanh, kwolf, pair

On Tue, Sep 20, 2011 at 8:34 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Mon, Sep 19, 2011 at 05:55:41PM +0800, Zhi Yong Wu wrote:
>> On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> > On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
>> >> On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> >> > On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
>> >> >> Note:
>> >> >>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this scenario.
>> >> >
>> >> > You can increase the length of the slice, if the request is larger than
>> >> > slice_time * bps_limit.
>> >> Yeah, but it is a challenge for how to increase it. Do you have some nice idea?
>> >
>> > If the queue is empty, and the request being processed does not fit the
>> > queue, increase the slice so that the request fits.
>> Sorry for the late reply. Actually, do you think this scenario is
>> meaningful for the user?
>> Even with this implemented, if the user limits the bps to below 512
>> bytes/second, the VM still cannot run every task.
>> Can you let us know why we need to make such an effort?
>
> It would be good to handle request larger than the slice.
>
> It is not strictly necessary, but in case its not handled, a minimum
> should be in place, to reflect maximum request size known. Being able to
> specify something which crashes is not acceptable.
Hi Marcelo,

Any comments? I have posted the implementation based on your suggestions.
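Marcelo's suggestion — grow the slice when a single request cannot fit within `slice_time * bps_limit` — comes down to a small calculation. The sketch below uses invented names and is only illustrative arithmetic, not the posted implementation:

```c
#include <assert.h>
#include <stdint.h>

#define SLICE_TIME_NS  100000000LL     /* 0.1 s default slice */
#define NS_PER_SECOND  1000000000.0

/* If a single request is larger than what one slice allows at the given
 * bps limit, grow the slice just enough for the request to fit: a 4 KB
 * request at bps=511 then gets a multi-second slice instead of hanging
 * the VM forever. */
static int64_t required_slice_ns(uint64_t request_bytes, uint64_t bps_limit,
                                 int64_t default_slice_ns)
{
    int64_t needed = (int64_t)((request_bytes * NS_PER_SECOND) / bps_limit);
    return needed > default_slice_ns ? needed : default_slice_ns;
}
```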

>
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support
  2011-09-26  8:01       ` Zhi Yong Wu
@ 2011-10-17 10:17         ` Kevin Wolf
  -1 siblings, 0 replies; 68+ messages in thread
From: Kevin Wolf @ 2011-10-17 10:17 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: aliguori, stefanha, kvm, mtosatti, qemu-devel, pair, ryanh, Zhi Yong Wu

Am 26.09.2011 10:01, schrieb Zhi Yong Wu:
> On Fri, Sep 23, 2011 at 11:32 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
>>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>> ---
>>>  Makefile.objs     |    2 +-
>>>  block/blk-queue.c |  201 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  block/blk-queue.h |   59 ++++++++++++++++
>>>  block_int.h       |   27 +++++++
>>>  4 files changed, 288 insertions(+), 1 deletions(-)
>>>  create mode 100644 block/blk-queue.c
>>>  create mode 100644 block/blk-queue.h
>>>
>>> diff --git a/Makefile.objs b/Makefile.objs
>>> index 26b885b..5dcf456 100644
>>> --- a/Makefile.objs
>>> +++ b/Makefile.objs
>>> @@ -33,7 +33,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv
>>>  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
>>>  block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>>>  block-nested-y += qed-check.o
>>> -block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>>> +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o blk-queue.o
>>>  block-nested-$(CONFIG_WIN32) += raw-win32.o
>>>  block-nested-$(CONFIG_POSIX) += raw-posix.o
>>>  block-nested-$(CONFIG_CURL) += curl.o
>>> diff --git a/block/blk-queue.c b/block/blk-queue.c
>>> new file mode 100644
>>> index 0000000..adef497
>>> --- /dev/null
>>> +++ b/block/blk-queue.c
>>> @@ -0,0 +1,201 @@
>>> +/*
>>> + * QEMU System Emulator queue definition for block layer
>>> + *
>>> + * Copyright (c) IBM, Corp. 2011
>>> + *
>>> + * Authors:
>>> + *  Zhi Yong Wu  <wuzhy@linux.vnet.ibm.com>
>>> + *  Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>>> + * of this software and associated documentation files (the "Software"), to deal
>>> + * in the Software without restriction, including without limitation the rights
>>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>>> + * copies of the Software, and to permit persons to whom the Software is
>>> + * furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>>> + * THE SOFTWARE.
>>> + */
>>> +
>>> +#include "block_int.h"
>>> +#include "block/blk-queue.h"
>>> +#include "qemu-common.h"
>>> +
>>> +/* The APIs for block request queue on qemu block layer.
>>> + */
>>> +
>>> +struct BlockQueueAIOCB {
>>> +    BlockDriverAIOCB common;
>>> +    QTAILQ_ENTRY(BlockQueueAIOCB) entry;
>>> +    BlockRequestHandler *handler;
>>> +    BlockDriverAIOCB *real_acb;
>>> +
>>> +    int64_t sector_num;
>>> +    QEMUIOVector *qiov;
>>> +    int nb_sectors;
>>> +};
>>
>> The idea is that each request is first queued on the QTAILQ, and at some
>> point it's removed from the queue and gets a real_acb. But it never has
>> both at the same time. Correct?
> NO. If block I/O throttling is enabled and the I/O rate at runtime exceeds
> the limits, the request will be enqueued.
> It represents the whole lifecycle of one enqueued request.

What are the conditions under which the request will still be enqueued,
but has a real_acb at the same time?

>>> +
>>> +typedef struct BlockQueueAIOCB BlockQueueAIOCB;
>>> +
>>> +struct BlockQueue {
>>> +    QTAILQ_HEAD(requests, BlockQueueAIOCB) requests;
>>> +    bool req_failed;
>>> +    bool flushing;
>>> +};
>>
>> I find req_failed pretty confusing. Needs documentation at least, but
>> most probably also a better name.
> OK. request_has_failed?

No, that doesn't describe what it's really doing.

You set req_failed = true by default and then on some obscure condition
clear it or not. It's tracking something, but I'm not sure what meaning
it has during the whole process.

>>> +
>>> +static void qemu_block_queue_dequeue(BlockQueue *queue,
>>> +                                     BlockQueueAIOCB *request)
>>> +{
>>> +    BlockQueueAIOCB *req;
>>> +
>>> +    assert(queue);
>>> +    while (!QTAILQ_EMPTY(&queue->requests)) {
>>> +        req = QTAILQ_FIRST(&queue->requests);
>>> +        if (req == request) {
>>> +            QTAILQ_REMOVE(&queue->requests, req, entry);
>>> +            break;
>>> +        }
>>> +    }
>>> +}
>>
>> Is it just me or is this an endless loop if the request isn't the first
>> element in the list?
> queue->requests is only used to store requests which exceed the limits.
> Why is the request not the first element?

Why do you have a loop if it's always the first element?

>>> +void qemu_del_block_queue(BlockQueue *queue)
>>> +{
>>> +    BlockQueueAIOCB *request, *next;
>>> +
>>> +    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
>>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>>> +        qemu_aio_release(request);
>>> +    }
>>> +
>>> +    g_free(queue);
>>> +}
>>
>> Can we be sure that no AIO requests are in flight that still use the now
>> released AIOCB?
> Yeah, since qemu core code is serially performed, i think that when
> qemu_del_block_queue is performed, no requests are in flight. Right?

Patch 2 has this code:

+void bdrv_io_limits_disable(BlockDriverState *bs)
+{
+    bs->io_limits_enabled = false;
+
+    if (bs->block_queue) {
+        qemu_block_queue_flush(bs->block_queue);
+        qemu_del_block_queue(bs->block_queue);
+        bs->block_queue = NULL;
+    }

Does this mean that you can't disable I/O limits while the VM is running?

>>> +
>>> +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
>>> +                        BlockDriverState *bs,
>>> +                        BlockRequestHandler *handler,
>>> +                        int64_t sector_num,
>>> +                        QEMUIOVector *qiov,
>>> +                        int nb_sectors,
>>> +                        BlockDriverCompletionFunc *cb,
>>> +                        void *opaque)
>>> +{
>>> +    BlockDriverAIOCB *acb;
>>> +    BlockQueueAIOCB *request;
>>> +
>>> +    if (queue->flushing) {
>>> +        queue->req_failed = false;
>>> +        return NULL;
>>> +    } else {
>>> +        acb = qemu_aio_get(&block_queue_pool, bs,
>>> +                           cb, opaque);
>>> +        request = container_of(acb, BlockQueueAIOCB, common);
>>> +        request->handler       = handler;
>>> +        request->sector_num    = sector_num;
>>> +        request->qiov          = qiov;
>>> +        request->nb_sectors    = nb_sectors;
>>> +        request->real_acb      = NULL;
>>> +        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
>>> +    }
>>> +
>>> +    return acb;
>>> +}
>>> +
>>> +static int qemu_block_queue_handler(BlockQueueAIOCB *request)
>>> +{
>>> +    int ret;
>>> +    BlockDriverAIOCB *res;
>>> +
>>> +    res = request->handler(request->common.bs, request->sector_num,
>>> +                           request->qiov, request->nb_sectors,
>>> +                           qemu_block_queue_callback, request);
>>> +    if (res) {
>>> +        request->real_acb = res;
>>> +    }
>>> +
>>> +    ret = (res == NULL) ? 0 : 1;
>>> +
>>> +    return ret;
>>
>> You mean return (res != NULL); and want to have bool as the return value
>> of this function.
> Yeah, thanks. i will modify as below:
> ret = (res == NULL) ? false : true;

ret = (res != NULL) is really more readable.

> and
> static bool qemu_block_queue_handler()
> 
>>
>>> +}
>>> +
>>> +void qemu_block_queue_flush(BlockQueue *queue)
>>> +{
>>> +    queue->flushing = true;
>>> +    while (!QTAILQ_EMPTY(&queue->requests)) {
>>> +        BlockQueueAIOCB *request = NULL;
>>> +        int ret = 0;
>>> +
>>> +        request = QTAILQ_FIRST(&queue->requests);
>>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>>> +
>>> +        queue->req_failed = true;
>>> +        ret = qemu_block_queue_handler(request);
>>> +        if (ret == 0) {
>>> +            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
>>> +            if (queue->req_failed) {
>>> +                qemu_block_queue_callback(request, -EIO);
>>> +                break;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    queue->req_failed = true;
>>> +    queue->flushing   = false;
>>> +}
>>> +
>>> +bool qemu_block_queue_has_pending(BlockQueue *queue)
>>> +{
>>> +    return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
>>> +}
>>
>> Why doesn't the queue have pending requests in the middle of a flush
>> operation? (That is, the flush hasn't completed yet)
> It is possible for the queue to have pending requests. if yes, how about?

Sorry, can't parse this.

I don't understand why the !queue->flushing part is correct.

Kevin

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support
  2011-10-17 10:17         ` Kevin Wolf
  (?)
@ 2011-10-17 10:17         ` Paolo Bonzini
  2011-10-18  7:00             ` Zhi Yong Wu
  -1 siblings, 1 reply; 68+ messages in thread
From: Paolo Bonzini @ 2011-10-17 10:17 UTC (permalink / raw)
  To: kvm; +Cc: qemu-devel

On 10/17/2011 12:17 PM, Kevin Wolf wrote:
> > > >  +
> > > >  +static int qemu_block_queue_handler(BlockQueueAIOCB *request)
> > > >  +{
> > > >  +    int ret;
> > > >  +    BlockDriverAIOCB *res;
> > > >  +
> > > >  +    res = request->handler(request->common.bs, request->sector_num,
> > > >  +                           request->qiov, request->nb_sectors,
> > > >  +                           qemu_block_queue_callback, request);
> > > >  +    if (res) {
> > > >  +        request->real_acb = res;
> > > >  +    }
> > > >  +
> > > >  +    ret = (res == NULL) ? 0 : 1;
> > > >  +
> > > >  +    return ret;
> > >
> > >  You mean return (res != NULL); and want to have bool as the return value
> > >  of this function.
> >
> >  Yeah, thanks. i will modify as below:
> >  ret = (res == NULL) ? false : true;
>
> ret = (res != NULL) is really more readable.

"return (res != NULL);" is even nicer! :)

Paolo


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v8 2/4] block: add the command line support
  2011-09-26  6:15       ` [Qemu-devel] " Zhi Yong Wu
@ 2011-10-17 10:19         ` Kevin Wolf
  -1 siblings, 0 replies; 68+ messages in thread
From: Kevin Wolf @ 2011-10-17 10:19 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: aliguori, stefanha, kvm, mtosatti, qemu-devel, pair, ryanh, Zhi Yong Wu

Am 26.09.2011 08:15, schrieb Zhi Yong Wu:
> On Fri, Sep 23, 2011 at 11:54 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> +}
>>> +
>>> +static void bdrv_block_timer(void *opaque)
>>> +{
>>> +    BlockDriverState *bs = opaque;
>>> +    BlockQueue *queue    = bs->block_queue;
>>> +
>>> +    qemu_block_queue_flush(queue);
>>
>> Hm, didn't really notice it while reading patch 1, but
>> qemu_block_queue_flush() is misleading. It's really something like
> Why do you say this is misleading?
>> qemu_block_queue_submit().
> Right. It will resubmit all enqueued I/O requests.

For me, flush sounds as if it waits for completion of all requests.

Kevin

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-09-26  7:24       ` [Qemu-devel] " Zhi Yong Wu
@ 2011-10-17 10:26         ` Kevin Wolf
  -1 siblings, 0 replies; 68+ messages in thread
From: Kevin Wolf @ 2011-10-17 10:26 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: aliguori, stefanha, kvm, mtosatti, qemu-devel, pair, ryanh, Zhi Yong Wu

Am 26.09.2011 09:24, schrieb Zhi Yong Wu:
> On Sat, Sep 24, 2011 at 12:19 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
>>> Note:
>>>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this scenario.
>>>      2.) When "dd" command is issued in guest, if its option bs is set to a large value such as "bs=1024K", the resulting speed will be slightly bigger than the limit.
>>>
>>> For these problems, if you have nice thought, pls let us know.:)
>>>
>>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>> ---
>>>  block.c |  259 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>  block.h |    1 -
>>>  2 files changed, 248 insertions(+), 12 deletions(-)
>>
>> One general comment: What about synchronous and/or coroutine I/O
>> operations? Do you think they are just not important enough to consider
>> here or were they forgotten?
> For sync ops, we assume that they will be converted into async mode at
> some point in the future, right?
> For coroutine I/O, it is introduced in image driver layer, and behind
> bdrv_aio_readv/writev. I think that we need not consider them, right?

Meanwhile the block layer has been changed to handle all requests in
terms of coroutines. So you would best move your intercepting code into
the coroutine functions.

>> Also, do I understand correctly that you're always submitting the whole
> Right, when the block timer fire, it will flush whole request queue.
>> queue at once? Does this effectively enforce the limit all the time or
>> will it lead to some peaks and then no requests at all for a while until
> In fact, it only tries to submit the enqueued requests one by one. If
> one fails to pass the limit, it is enqueued again.

Right, I missed this. Makes sense.

>> the average is right again?
> Yeah, it is possible. Do you have a better idea?
>>
>> Maybe some documentation on how it all works from a high level
>> perspective would be helpful.
>>
>>> +    /* throttling disk read I/O */
>>> +    if (bs->io_limits_enabled) {
>>> +        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
>>> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
>>> +                           sector_num, qiov, nb_sectors, cb, opaque);
>>> +            printf("wait_time=%ld\n", wait_time);
>>> +            if (wait_time != -1) {
>>> +                printf("reset block timer\n");
>>> +                qemu_mod_timer(bs->block_timer,
>>> +                               wait_time + qemu_get_clock_ns(vm_clock));
>>> +            }
>>> +
>>> +            if (ret) {
>>> +                printf("ori ret is not null\n");
>>> +            } else {
>>> +                printf("ori ret is null\n");
>>> +            }
>>> +
>>> +            return ret;
>>> +        }
>>> +    }
>>>
>>> -    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>> +    ret =  drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>>                                 cb, opaque);
>>> +    if (ret) {
>>> +        if (bs->io_limits_enabled) {
>>> +            bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
>>> +                              (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
>>> +            bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
>>> +        }
>>
>> I wonder if you can't reuse bs->nr_bytes/nr_ops instead of introducing a
>> second counting mechanism. Would have the advantage that numbers are
> NO, our counting variables will be reset to ZERO if current slice
> time(0.1ms) is used up.

Instead of setting the counter to zero you could remember the base value
and calculate the difference when you need it. The advantage is that we
can share infrastructure instead of introducing several subtly different
ways of I/O accounting.

>> actually consistent (your metric counts slightly differently than the
>> existing info blockstats one).
> Yeah, I noticed this, and I don't think there's anything wrong with it. And you?

It's not really user friendly if a number that is called the same means
this in one place and in another place that.

Kevin

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v8 3/4] block: add block timer and throttling algorithm
@ 2011-10-17 10:26         ` Kevin Wolf
  0 siblings, 0 replies; 68+ messages in thread
From: Kevin Wolf @ 2011-10-17 10:26 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: pair, stefanha, kvm, mtosatti, Zhi Yong Wu, aliguori, qemu-devel, ryanh

Am 26.09.2011 09:24, schrieb Zhi Yong Wu:
> On Sat, Sep 24, 2011 at 12:19 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
>>> Note:
>>>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this senario.
>>>      2.) When "dd" command is issued in guest, if its option bs is set to a large value such as "bs=1024K", the result speed will slightly bigger than the limits.
>>>
>>> For these problems, if you have nice thought, pls let us know.:)
>>>
>>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>> ---
>>>  block.c |  259 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>  block.h |    1 -
>>>  2 files changed, 248 insertions(+), 12 deletions(-)
>>
>> One general comment: What about synchronous and/or coroutine I/O
>> operations? Do you think they are just not important enough to consider
>> here or were they forgotten?
> For sync ops, we assume that it will be converse into async mode at
> some point of future, right?
> For coroutine I/O, it is introduced in image driver layer, and behind
> bdrv_aio_readv/writev. I think that we need not consider them, right?

Meanwhile the block layer has been changed to handle all requests in
terms of coroutines. So you would best move your intercepting code into
the coroutine functions.

>> Also, do I understand correctly that you're always submitting the whole
> Right, when the block timer fire, it will flush whole request queue.
>> queue at once? Does this effectively enforce the limit all the time or
>> will it lead to some peaks and then no requests at all for a while until
> In fact, it only try to submit those enqueued request one by one. If
> fail to pass the limit, this request will be enqueued again.

Right, I missed this. Makes sense.

>> the average is right again?
> Yeah, it is possible. Do you better idea?
>>
>> Maybe some documentation on how it all works from a high level
>> perspective would be helpful.
>>
>>> +    /* throttling disk read I/O */
>>> +    if (bs->io_limits_enabled) {
>>> +        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
>>> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
>>> +                           sector_num, qiov, nb_sectors, cb, opaque);
>>> +            printf("wait_time=%ld\n", wait_time);
>>> +            if (wait_time != -1) {
>>> +                printf("reset block timer\n");
>>> +                qemu_mod_timer(bs->block_timer,
>>> +                               wait_time + qemu_get_clock_ns(vm_clock));
>>> +            }
>>> +
>>> +            if (ret) {
>>> +                printf("ori ret is not null\n");
>>> +            } else {
>>> +                printf("ori ret is null\n");
>>> +            }
>>> +
>>> +            return ret;
>>> +        }
>>> +    }
>>>
>>> -    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>> +    ret =  drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>>                                 cb, opaque);
>>> +    if (ret) {
>>> +        if (bs->io_limits_enabled) {
>>> +            bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
>>> +                              (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
>>> +            bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
>>> +        }
>>
>> I wonder if you can't reuse bs->nr_bytes/nr_ops instead of introducing a
>> second counting mechanism. Would have the advantage that numbers are
> No, our counting variables will be reset to zero when the current slice
> time (0.1 ms) is used up.

Instead of setting the counter to zero you could remember the base value
and calculate the difference when you need it. The advantage is that we
can share infrastructure instead of introducing several subtly different
ways of I/O accounting.
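The base-value idea can be sketched in a few lines of C. This is an illustration only; the struct and function names below are invented and are not the actual QEMU fields:

```c
#include <stdint.h>

/* Sketch: instead of zeroing a shared counter at every slice boundary,
 * snapshot its value and compute the per-slice amount as a difference.
 * The existing bs->nr_bytes-style counter keeps growing monotonically,
 * so it stays usable for "info blockstats" at the same time. */
typedef struct {
    uint64_t nr_bytes;    /* shared, monotonically increasing counter */
    uint64_t slice_base;  /* snapshot taken when the slice began */
} IoAccount;

static void slice_start(IoAccount *acct)
{
    acct->slice_base = acct->nr_bytes;  /* snapshot, no reset */
}

static uint64_t bytes_this_slice(const IoAccount *acct)
{
    return acct->nr_bytes - acct->slice_base;
}
```

This way the throttling code and the statistics code share one counter instead of two subtly different ones.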

>> actually consistent (your metric counts slightly differently than the
>> existing info blockstats one).
> Yeah, I noticed this, and I don't think there's anything wrong with it. And you?

It's not really user-friendly if a number with the same name means one
thing in one place and something else in another.

Kevin

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-10-17 10:26         ` Kevin Wolf
@ 2011-10-17 15:54           ` Stefan Hajnoczi
  -1 siblings, 0 replies; 68+ messages in thread
From: Stefan Hajnoczi @ 2011-10-17 15:54 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: Zhi Yong Wu, aliguori, stefanha, kvm, mtosatti, qemu-devel, pair,
	ryanh, Kevin Wolf

On Mon, Oct 17, 2011 at 11:26 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 26.09.2011 09:24, schrieb Zhi Yong Wu:
>> On Sat, Sep 24, 2011 at 12:19 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
>>>> Note:
>>>>      1.) When bps/iops limits are specified as a small value such as 511 bytes/s, the VM will hang. We are considering how to handle this scenario.
>>>>      2.) When a "dd" command is issued in the guest, if its bs option is set to a large value such as "bs=1024K", the resulting speed will be slightly higher than the limits.
>>>>
>>>> For these problems, if you have any good thoughts, please let us know. :)
>>>>
>>>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>> ---
>>>>  block.c |  259 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>>  block.h |    1 -
>>>>  2 files changed, 248 insertions(+), 12 deletions(-)
>>>
>>> One general comment: What about synchronous and/or coroutine I/O
>>> operations? Do you think they are just not important enough to consider
>>> here or were they forgotten?
>> For sync ops, we assume that it will be converse into async mode at
>> some point of future, right?
>> For coroutine I/O, it is introduced in image driver layer, and behind
>> bdrv_aio_readv/writev. I think that we need not consider them, right?
>
> Meanwhile the block layer has been changed to handle all requests in
> terms of coroutines. So you would best move your intercepting code into
> the coroutine functions.

Some additional info: the advantage of handling all requests in
coroutines is that there is now a single place where you can put I/O
throttling.  It will work for bdrv_read(), bdrv_co_readv(), and
bdrv_aio_readv().  There is no code duplication, just put the I/O
throttling logic in bdrv_co_do_readv().
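A rough illustration of the "single place" this describes — every public read entry point funnels into one common function, so a single throttle check covers them all. All names below are invented stand-ins, not the real QEMU API:

```c
#include <stdbool.h>

static int reads_throttled;  /* how many requests hit the throttle */

static bool exceeds_io_limits(int nb_sectors)
{
    return nb_sectors > 8;   /* stand-in for the real bps/iops check */
}

/* The one common path: in QEMU, bdrv_read(), bdrv_co_readv() and
 * bdrv_aio_readv() all end up in bdrv_co_do_readv(), so a check here
 * covers every caller without duplicating the throttling logic. */
static int common_do_readv(int nb_sectors)
{
    if (exceeds_io_limits(nb_sectors)) {
        reads_throttled++;
        return -1;           /* the real code would queue and yield */
    }
    return 0;                /* request allowed through */
}

/* Entry points are thin wrappers around the common path. */
static int do_read(int nb_sectors)      { return common_do_readv(nb_sectors); }
static int do_aio_readv(int nb_sectors) { return common_do_readv(nb_sectors); }
```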

Stefan

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support
  2011-10-17 10:17         ` Paolo Bonzini
@ 2011-10-18  7:00             ` Zhi Yong Wu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-10-18  7:00 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, qemu-devel

On Mon, Oct 17, 2011 at 6:17 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 10/17/2011 12:17 PM, Kevin Wolf wrote:
>>
>> > > >  +
>> > > >  +static int qemu_block_queue_handler(BlockQueueAIOCB *request)
>> > > >  +{
>> > > >  +    int ret;
>> > > >  +    BlockDriverAIOCB *res;
>> > > >  +
>> > > >  +    res = request->handler(request->common.bs,
>> > > > request->sector_num,
>> > > >  +                           request->qiov, request->nb_sectors,
>> > > >  +                           qemu_block_queue_callback, request);
>> > > >  +    if (res) {
>> > > >  +        request->real_acb = res;
>> > > >  +    }
>> > > >  +
>> > > >  +    ret = (res == NULL) ? 0 : 1;
>> > > >  +
>> > > >  +    return ret;
>> > >
>> > >  You mean return (res != NULL); and want to have bool as the return
>> > > value
>> > >  of this function.
>> >
>> >  Yeah, thanks. i will modify as below:
>> >  ret = (res == NULL) ? false : true;
>>
>> ret = (res != NULL) is really more readable.
>
> "return (res != NULL);" is even nicer! :)
Great, thanks
>
> Paolo
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v8 1/4] block: add the block queue support
  2011-10-17 10:17         ` Kevin Wolf
@ 2011-10-18  8:07           ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-10-18  8:07 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Zhi Yong Wu, stefanha, kvm, qemu-devel

On Mon, Oct 17, 2011 at 6:17 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 26.09.2011 10:01, schrieb Zhi Yong Wu:
>> On Fri, Sep 23, 2011 at 11:32 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
>>>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>> ---
>>>>  Makefile.objs     |    2 +-
>>>>  block/blk-queue.c |  201 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  block/blk-queue.h |   59 ++++++++++++++++
>>>>  block_int.h       |   27 +++++++
>>>>  4 files changed, 288 insertions(+), 1 deletions(-)
>>>>  create mode 100644 block/blk-queue.c
>>>>  create mode 100644 block/blk-queue.h
>>>>
>>>> diff --git a/Makefile.objs b/Makefile.objs
>>>> index 26b885b..5dcf456 100644
>>>> --- a/Makefile.objs
>>>> +++ b/Makefile.objs
>>>> @@ -33,7 +33,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv
>>>>  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
>>>>  block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>>>>  block-nested-y += qed-check.o
>>>> -block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>>>> +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o blk-queue.o
>>>>  block-nested-$(CONFIG_WIN32) += raw-win32.o
>>>>  block-nested-$(CONFIG_POSIX) += raw-posix.o
>>>>  block-nested-$(CONFIG_CURL) += curl.o
>>>> diff --git a/block/blk-queue.c b/block/blk-queue.c
>>>> new file mode 100644
>>>> index 0000000..adef497
>>>> --- /dev/null
>>>> +++ b/block/blk-queue.c
>>>> @@ -0,0 +1,201 @@
>>>> +/*
>>>> + * QEMU System Emulator queue definition for block layer
>>>> + *
>>>> + * Copyright (c) IBM, Corp. 2011
>>>> + *
>>>> + * Authors:
>>>> + *  Zhi Yong Wu  <wuzhy@linux.vnet.ibm.com>
>>>> + *  Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>>> + *
>>>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>>>> + * of this software and associated documentation files (the "Software"), to deal
>>>> + * in the Software without restriction, including without limitation the rights
>>>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>>>> + * copies of the Software, and to permit persons to whom the Software is
>>>> + * furnished to do so, subject to the following conditions:
>>>> + *
>>>> + * The above copyright notice and this permission notice shall be included in
>>>> + * all copies or substantial portions of the Software.
>>>> + *
>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>>>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>>>> + * THE SOFTWARE.
>>>> + */
>>>> +
>>>> +#include "block_int.h"
>>>> +#include "block/blk-queue.h"
>>>> +#include "qemu-common.h"
>>>> +
>>>> +/* The APIs for block request queue on qemu block layer.
>>>> + */
>>>> +
>>>> +struct BlockQueueAIOCB {
>>>> +    BlockDriverAIOCB common;
>>>> +    QTAILQ_ENTRY(BlockQueueAIOCB) entry;
>>>> +    BlockRequestHandler *handler;
>>>> +    BlockDriverAIOCB *real_acb;
>>>> +
>>>> +    int64_t sector_num;
>>>> +    QEMUIOVector *qiov;
>>>> +    int nb_sectors;
>>>> +};
>>>
>>> The idea is that each request is first queued on the QTAILQ, and at some
>>> point it's removed from the queue and gets a real_acb. But it never has
>>> both at the same time. Correct?
>> No. If block I/O throttling is enabled and the I/O rate at runtime exceeds
>> the limits, the request will be enqueued.
>> It represents the whole lifecycle of one enqueued request.
>
> What are the conditions under which the request will still be enqueued,
> but has a real_acb at the same time?
When the request has not been serviced and still needs to be enqueued, a
real_acb can exist at the same time.
Thanks for your good catch. :)
When a request is allocated for the first time, we should save
the returned acb to real_acb in
qemu_block_queue_enqueue():
              request->real_acb      = acb;

>
>>>> +
>>>> +typedef struct BlockQueueAIOCB BlockQueueAIOCB;
>>>> +
>>>> +struct BlockQueue {
>>>> +    QTAILQ_HEAD(requests, BlockQueueAIOCB) requests;
>>>> +    bool req_failed;
>>>> +    bool flushing;
>>>> +};
>>>
>>> I find req_failed pretty confusing. Needs documentation at least, but
>>> most probably also a better name.
>> OK. request_has_failed?
>
> No, that doesn't describe what it's really doing.
>
> You set req_failed = true by default and then on some obscure condition
> clear it or not. It's tracking something, but I'm not sure what meaning
> it has during the whole process.
In qemu_block_queue_flush():
when bdrv_aio_readv/writev returns NULL, if req_failed has been changed to
false, it indicates that the request exceeded the limits again; if
req_failed is still true, it indicates that an I/O error took place. At
that moment, qemu_block_queue_callback(request, -EIO) needs to be
called to report this to the upper layer.
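The protocol can be sketched as a standalone toy (all names below are invented for illustration) showing how the flag disambiguates the two reasons a handler may return NULL:

```c
#include <stdbool.h>
#include <stddef.h>

static bool req_failed;   /* set to true before each handler call */
static bool throttled;    /* toy switch: request exceeds the limits */
static bool io_error;     /* toy switch: the driver reports an error */

/* Enqueue path: when a request is refused during a flush because it
 * exceeds the limits, it clears req_failed before returning NULL. */
static void *toy_enqueue(void)
{
    req_failed = false;
    return NULL;
}

/* Handler: NULL means either "re-queued because throttled" or "real
 * I/O error" -- req_failed tells the two cases apart. */
static void *toy_handler(void)
{
    if (throttled) {
        return toy_enqueue();
    }
    if (io_error) {
        return NULL;          /* req_failed stays true: real error */
    }
    return (void *)1;         /* pretend acb for a submitted request */
}

/* Flush side: -5 stands in for -EIO, 0 for "retry later", 1 for
 * "submitted successfully". */
static int toy_flush_one(void)
{
    req_failed = true;
    if (toy_handler() == NULL) {
        return req_failed ? -5 : 0;
    }
    return 1;
}
```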

>
>>>> +
>>>> +static void qemu_block_queue_dequeue(BlockQueue *queue,
>>>> +                                     BlockQueueAIOCB *request)
>>>> +{
>>>> +    BlockQueueAIOCB *req;
>>>> +
>>>> +    assert(queue);
>>>> +    while (!QTAILQ_EMPTY(&queue->requests)) {
>>>> +        req = QTAILQ_FIRST(&queue->requests);
>>>> +        if (req == request) {
>>>> +            QTAILQ_REMOVE(&queue->requests, req, entry);
>>>> +            break;
>>>> +        }
>>>> +    }
>>>> +}
>>>
>>> Is it just me or is this an endless loop if the request isn't the first
>>> element in the list?
>> queue->requests is only used to store requests which exceed the limits.
>> Why would the request not be the first element?
>
> Why do you have a loop if it's always the first element?
Ah, it can cause a dead loop, and QTAILQ_FOREACH_SAFE should be adopted
here. Thanks.
    QTAILQ_FOREACH_SAFE(req, &queue->requests, entry, next) {
        if (req == request) {
            QTAILQ_REMOVE(&queue->requests, req, entry);
            break;
        }
    }

>
>>>> +void qemu_del_block_queue(BlockQueue *queue)
>>>> +{
>>>> +    BlockQueueAIOCB *request, *next;
>>>> +
>>>> +    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
>>>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>>>> +        qemu_aio_release(request);
>>>> +    }
>>>> +
>>>> +    g_free(queue);
>>>> +}
>>>
>>> Can we be sure that no AIO requests are in flight that still use the now
>>> released AIOCB?
>> Yeah, since the qemu core code is performed serially, I think that when
>> qemu_del_block_queue is performed, no requests are in flight. Right?
>
> Patch 2 has this code:
>
> +void bdrv_io_limits_disable(BlockDriverState *bs)
> +{
> +    bs->io_limits_enabled = false;
> +
> +    if (bs->block_queue) {
> +        qemu_block_queue_flush(bs->block_queue);
> +        qemu_del_block_queue(bs->block_queue);
> +        bs->block_queue = NULL;
> +    }
>
> Does this mean that you can't disable I/O limits while the VM is running?
No, you can disable I/O limits even while the VM is running.
>
>>>> +
>>>> +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
>>>> +                        BlockDriverState *bs,
>>>> +                        BlockRequestHandler *handler,
>>>> +                        int64_t sector_num,
>>>> +                        QEMUIOVector *qiov,
>>>> +                        int nb_sectors,
>>>> +                        BlockDriverCompletionFunc *cb,
>>>> +                        void *opaque)
>>>> +{
>>>> +    BlockDriverAIOCB *acb;
>>>> +    BlockQueueAIOCB *request;
>>>> +
>>>> +    if (queue->flushing) {
>>>> +        queue->req_failed = false;
>>>> +        return NULL;
>>>> +    } else {
>>>> +        acb = qemu_aio_get(&block_queue_pool, bs,
>>>> +                           cb, opaque);
>>>> +        request = container_of(acb, BlockQueueAIOCB, common);
>>>> +        request->handler       = handler;
>>>> +        request->sector_num    = sector_num;
>>>> +        request->qiov          = qiov;
>>>> +        request->nb_sectors    = nb_sectors;
>>>> +        request->real_acb      = NULL;
>>>> +        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
>>>> +    }
>>>> +
>>>> +    return acb;
>>>> +}
>>>> +
>>>> +static int qemu_block_queue_handler(BlockQueueAIOCB *request)
>>>> +{
>>>> +    int ret;
>>>> +    BlockDriverAIOCB *res;
>>>> +
>>>> +    res = request->handler(request->common.bs, request->sector_num,
>>>> +                           request->qiov, request->nb_sectors,
>>>> +                           qemu_block_queue_callback, request);
>>>> +    if (res) {
>>>> +        request->real_acb = res;
>>>> +    }
>>>> +
>>>> +    ret = (res == NULL) ? 0 : 1;
>>>> +
>>>> +    return ret;
>>>
>>> You mean return (res != NULL); and want to have bool as the return value
>>> of this function.
>> Yeah, thanks. i will modify as below:
>> ret = (res == NULL) ? false : true;
>
> ret = (res != NULL) is really more readable.
I have adopted Paolo's suggestion.

>
>> and
>> static bool qemu_block_queue_handler()
>>
>>>
>>>> +}
>>>> +
>>>> +void qemu_block_queue_flush(BlockQueue *queue)
>>>> +{
>>>> +    queue->flushing = true;
>>>> +    while (!QTAILQ_EMPTY(&queue->requests)) {
>>>> +        BlockQueueAIOCB *request = NULL;
>>>> +        int ret = 0;
>>>> +
>>>> +        request = QTAILQ_FIRST(&queue->requests);
>>>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>>>> +
>>>> +        queue->req_failed = true;
>>>> +        ret = qemu_block_queue_handler(request);
>>>> +        if (ret == 0) {
>>>> +            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
>>>> +            if (queue->req_failed) {
>>>> +                qemu_block_queue_callback(request, -EIO);
>>>> +                break;
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +
>>>> +    queue->req_failed = true;
>>>> +    queue->flushing   = false;
>>>> +}
>>>> +
>>>> +bool qemu_block_queue_has_pending(BlockQueue *queue)
>>>> +{
>>>> +    return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
>>>> +}
>>>
>>> Why doesn't the queue have pending requests in the middle of a flush
>>> operation? (That is, the flush hasn't completed yet)
>> It is possible for the queue to have pending requests. if yes, how about?
>
> Sorry, can't parse this.
>
> I don't understand why the !queue->flushing part is correct.
>
> Kevin
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v8 2/4] block: add the command line support
  2011-10-17 10:19         ` Kevin Wolf
@ 2011-10-18  8:17           ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-10-18  8:17 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: pair, stefanha, kvm, mtosatti, Zhi Yong Wu, aliguori, qemu-devel, ryanh

On Mon, Oct 17, 2011 at 6:19 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 26.09.2011 08:15, schrieb Zhi Yong Wu:
>> On Fri, Sep 23, 2011 at 11:54 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>> +}
>>>> +
>>>> +static void bdrv_block_timer(void *opaque)
>>>> +{
>>>> +    BlockDriverState *bs = opaque;
>>>> +    BlockQueue *queue    = bs->block_queue;
>>>> +
>>>> +    qemu_block_queue_flush(queue);
>>>
>>> Hm, didn't really notice it while reading patch 1, but
>>> qemu_block_queue_flush() is misleading. It's really something like
>> Why do you say this is misleading?
>>> qemu_block_queue_submit().
>> Right. It will resubmit all enqueued I/O requests.
>
> For me, flush sounds as if it waits for completion of all requests.
The current code only does that in the one I/O error case. But I think
we should not take that action, right? I'm also not sure whether we
need to keep all the enqueued requests in order.

>
> Kevin
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-10-17 15:54           ` Stefan Hajnoczi
@ 2011-10-18  8:29             ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-10-18  8:29 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Zhi Yong Wu, aliguori, stefanha, kvm, mtosatti, qemu-devel, pair,
	ryanh, Kevin Wolf

On Mon, Oct 17, 2011 at 11:54 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Oct 17, 2011 at 11:26 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 26.09.2011 09:24, schrieb Zhi Yong Wu:
>>> On Sat, Sep 24, 2011 at 12:19 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>> Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
>>>>> Note:
>>>>>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this senario.
>>>>>      2.) When "dd" command is issued in guest, if its option bs is set to a large value such as "bs=1024K", the result speed will slightly bigger than the limits.
>>>>>
>>>>> For these problems, if you have nice thought, pls let us know.:)
>>>>>
>>>>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>>> ---
>>>>>  block.c |  259 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>>>  block.h |    1 -
>>>>>  2 files changed, 248 insertions(+), 12 deletions(-)
>>>>
>>>> One general comment: What about synchronous and/or coroutine I/O
>>>> operations? Do you think they are just not important enough to consider
>>>> here or were they forgotten?
>>> For sync ops, we assume that it will be converse into async mode at
>>> some point of future, right?
>>> For coroutine I/O, it is introduced in image driver layer, and behind
>>> bdrv_aio_readv/writev. I think that we need not consider them, right?
>>
>> Meanwhile the block layer has been changed to handle all requests in
>> terms of coroutines. So you would best move your intercepting code into
>> the coroutine functions.
>
> Some additional info: the advantage of handling all requests in
> coroutines is that there is now a single place where you can put I/O
> throttling.  It will work for bdrv_read(), bdrv_co_readv(), and
> bdrv_aio_readv().  There is no code duplication, just put the I/O
> throttling logic in bdrv_co_do_readv().
Got it, thanks.
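The single interception point Stefan describes can be sketched as below. This is a stand-alone toy model, not QEMU code: the struct fields, the byte-based check, and the -EAGAIN-style return value are assumptions for illustration only.

```c
#include <assert.h>

/* Toy model of the single interception point: every read path funnels
 * through one function where the throttle check lives.  Field names and
 * return values are illustrative assumptions. */
typedef struct {
    int io_limits_enabled;
    long bytes_this_slice;
    long bps_limit;          /* bytes allowed per time slice */
    int throttled_requests;  /* requests held back by throttling */
} BlockDriverState;

static int bdrv_co_do_readv(BlockDriverState *bs, long bytes)
{
    if (bs->io_limits_enabled &&
        bs->bytes_this_slice + bytes > bs->bps_limit) {
        bs->throttled_requests++;  /* real code: enqueue + arm the timer */
        return -11;                /* try again in the next slice */
    }
    bs->bytes_this_slice += bytes;
    return 0;
}

/* The front ends just delegate, so the throttling logic exists once. */
static int bdrv_read(BlockDriverState *bs, long bytes)
{
    return bdrv_co_do_readv(bs, bytes);
}

static int bdrv_aio_readv(BlockDriverState *bs, long bytes)
{
    return bdrv_co_do_readv(bs, bytes);
}
```

Because both entry points delegate to one function, there is no duplicated accounting, which is the point Stefan makes above.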
>
> Stefan
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support
  2011-10-18  8:07           ` [Qemu-devel] " Zhi Yong Wu
@ 2011-10-18  8:36             ` Kevin Wolf
  -1 siblings, 0 replies; 68+ messages in thread
From: Kevin Wolf @ 2011-10-18  8:36 UTC (permalink / raw)
  To: Zhi Yong Wu; +Cc: stefanha, kvm, qemu-devel, Zhi Yong Wu

Am 18.10.2011 10:07, schrieb Zhi Yong Wu:
> On Mon, Oct 17, 2011 at 6:17 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>>> +
>>>>> +typedef struct BlockQueueAIOCB BlockQueueAIOCB;
>>>>> +
>>>>> +struct BlockQueue {
>>>>> +    QTAILQ_HEAD(requests, BlockQueueAIOCB) requests;
>>>>> +    bool req_failed;
>>>>> +    bool flushing;
>>>>> +};
>>>>
>>>> I find req_failed pretty confusing. Needs documentation at least, but
>>>> most probably also a better name.
>>> OK. request_has_failed?
>>
>> No, that doesn't describe what it's really doing.
>>
>> You set req_failed = true by default and then on some obscure condition
>> clear it or not. It's tracking something, but I'm not sure what meaning
>> it has during the whole process.
> In qemu_block_queue_flush,
> When bdrv_aio_readv/writev return NULL, if req_failed is changed to
> false, it indicates that the request exceeds the limits again; if
> req_failed is still true, it indicates that one I/O error takes place,
> at the monent, qemu_block_queue_callback(request, -EIO) need to be
> called to report this to upper layer.

Okay, this makes some more sense now.

How about reversing the logic and maintaining a limit_exceeded flag
instead of a req_failed? I think this would be clearer.
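The limit_exceeded inversion suggested here could look roughly like the sketch below. All types and bodies are a stand-alone illustrative model, not QEMU code; unlike the original patch, a genuinely failed request is completed with -EIO and dropped instead of being left queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the proposed limit_exceeded flag (logic reversed vs.
 * req_failed).  Names echo the patch but everything here is assumed. */
typedef struct Req {
    struct Req *next;
    bool will_exceed;    /* simulate the handler bouncing off the limit */
    bool io_error;       /* simulate a NULL acb caused by a real error */
    int completed_with;  /* code passed to the completion callback */
} Req;

typedef struct {
    Req *head;
    bool limit_exceeded; /* set only when a request hits the limit again */
    bool flushing;
} BlockQueue;

/* Returns true if the request was really submitted. */
static bool handler(BlockQueue *q, Req *r)
{
    if (r->will_exceed) {
        q->limit_exceeded = true;
        return false;
    }
    if (r->io_error) {
        return false;    /* NULL without re-enqueueing: real I/O error */
    }
    r->completed_with = 0;
    return true;
}

static void flush(BlockQueue *q)
{
    q->flushing = true;
    while (q->head) {
        Req *r = q->head;
        q->head = r->next;

        q->limit_exceeded = false;
        if (!handler(q, r)) {
            if (q->limit_exceeded) {
                /* limit hit again: requeue and stop for this slice */
                r->next = q->head;
                q->head = r;
                break;
            }
            r->completed_with = -5;  /* -EIO: complete, do not requeue */
        }
    }
    q->flushing = false;
}
```

With this inversion the flag is only true when the throttling code itself bounced the request, so a failed submission with the flag clear unambiguously means an I/O error.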

>>>>> +void qemu_del_block_queue(BlockQueue *queue)
>>>>> +{
>>>>> +    BlockQueueAIOCB *request, *next;
>>>>> +
>>>>> +    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
>>>>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>>>>> +        qemu_aio_release(request);
>>>>> +    }
>>>>> +
>>>>> +    g_free(queue);
>>>>> +}
>>>>
>>>> Can we be sure that no AIO requests are in flight that still use the now
>>>> released AIOCB?
>>> Yeah, since qemu core code is serially performed, i think that when
>>> qemu_del_block_queue is performed, no requests are in flight. Right?
>>
>> Patch 2 has this code:
>>
>> +void bdrv_io_limits_disable(BlockDriverState *bs)
>> +{
>> +    bs->io_limits_enabled = false;
>> +
>> +    if (bs->block_queue) {
>> +        qemu_block_queue_flush(bs->block_queue);
>> +        qemu_del_block_queue(bs->block_queue);
>> +        bs->block_queue = NULL;
>> +    }
>>
>> Does this mean that you can't disable I/O limits while the VM is running?
> NO, you can even though VM is running.

Okay, in general qemu_block_queue_flush() empties the queue so that
there are no requests left that qemu_del_block_queue() could drop from
the queue. So in the common case it doesn't even enter the FOREACH loop.

I think problems start when requests have failed or exceeded the limit
again, then you have requests queued even after
qemu_block_queue_flush(). You must be aware of this, otherwise the code
in qemu_del_block_queue() wouldn't exist.

But you can't release the ACBs without having called their callback,
otherwise the caller would still assume that its ACB pointer is valid.
Maybe calling the callback before releasing the ACB would be enough.

However, for failed requests see below.
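A minimal sketch of the fix suggested above: every request still queued at deletion time has its callback invoked (here with -EIO) before its ACB is released, so no caller is left holding a dangling pointer. Types and names are simplified assumptions, not the real QEMU ones.

```c
#include <assert.h>
#include <stdlib.h>

typedef void CompletionFunc(void *opaque, int ret);

typedef struct Acb {
    struct Acb *next;
    CompletionFunc *cb;
    void *opaque;
} Acb;

typedef struct { Acb *head; } BlockQueue;

/* test helpers: count completions and record the last return code */
static int completed;
static int last_ret;

static void count_cb(void *opaque, int ret)
{
    (void)opaque;
    completed++;
    last_ret = ret;
}

static void del_block_queue(BlockQueue *q)
{
    while (q->head) {           /* drain until the queue is empty */
        Acb *r = q->head;
        q->head = r->next;
        r->cb(r->opaque, -5);   /* -EIO: complete *before* releasing */
        free(r);                /* qemu_aio_release() in the real code */
    }
    free(q);
}
```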

>>
>>>>> +
>>>>> +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
>>>>> +                        BlockDriverState *bs,
>>>>> +                        BlockRequestHandler *handler,
>>>>> +                        int64_t sector_num,
>>>>> +                        QEMUIOVector *qiov,
>>>>> +                        int nb_sectors,
>>>>> +                        BlockDriverCompletionFunc *cb,
>>>>> +                        void *opaque)
>>>>> +{
>>>>> +    BlockDriverAIOCB *acb;
>>>>> +    BlockQueueAIOCB *request;
>>>>> +
>>>>> +    if (queue->flushing) {
>>>>> +        queue->req_failed = false;
>>>>> +        return NULL;
>>>>> +    } else {
>>>>> +        acb = qemu_aio_get(&block_queue_pool, bs,
>>>>> +                           cb, opaque);
>>>>> +        request = container_of(acb, BlockQueueAIOCB, common);
>>>>> +        request->handler       = handler;
>>>>> +        request->sector_num    = sector_num;
>>>>> +        request->qiov          = qiov;
>>>>> +        request->nb_sectors    = nb_sectors;
>>>>> +        request->real_acb      = NULL;
>>>>> +        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
>>>>> +    }
>>>>> +
>>>>> +    return acb;
>>>>> +}
>>>>> +
>>>>> +static int qemu_block_queue_handler(BlockQueueAIOCB *request)
>>>>> +{
>>>>> +    int ret;
>>>>> +    BlockDriverAIOCB *res;
>>>>> +
>>>>> +    res = request->handler(request->common.bs, request->sector_num,
>>>>> +                           request->qiov, request->nb_sectors,
>>>>> +                           qemu_block_queue_callback, request);
>>>>> +    if (res) {
>>>>> +        request->real_acb = res;
>>>>> +    }
>>>>> +
>>>>> +    ret = (res == NULL) ? 0 : 1;
>>>>> +
>>>>> +    return ret;
>>>>
>>>> You mean return (res != NULL); and want to have bool as the return value
>>>> of this function.
>>> Yeah, thanks. i will modify as below:
>>> ret = (res == NULL) ? false : true;
>>
>> ret = (res != NULL) is really more readable.
> I have adopted Paolo's suggestion.
> 
>>
>>> and
>>> static bool qemu_block_queue_handler()
>>>
>>>>
>>>>> +}
>>>>> +
>>>>> +void qemu_block_queue_flush(BlockQueue *queue)
>>>>> +{
>>>>> +    queue->flushing = true;
>>>>> +    while (!QTAILQ_EMPTY(&queue->requests)) {
>>>>> +        BlockQueueAIOCB *request = NULL;
>>>>> +        int ret = 0;
>>>>> +
>>>>> +        request = QTAILQ_FIRST(&queue->requests);
>>>>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>>>>> +
>>>>> +        queue->req_failed = true;
>>>>> +        ret = qemu_block_queue_handler(request);
>>>>> +        if (ret == 0) {
>>>>> +            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
>>>>> +            if (queue->req_failed) {
>>>>> +                qemu_block_queue_callback(request, -EIO);
>>>>> +                break;

When a request has failed, you call its callback, but still leave it
queued. I think this is wrong.

>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    queue->req_failed = true;
>>>>> +    queue->flushing   = false;
>>>>> +}
>>>>> +
>>>>> +bool qemu_block_queue_has_pending(BlockQueue *queue)
>>>>> +{
>>>>> +    return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
>>>>> +}
>>>>
>>>> Why doesn't the queue have pending requests in the middle of a flush
>>>> operation? (That is, the flush hasn't completed yet)
>>> It is possible for the queue to have pending requests. if yes, how about?
>>
>> Sorry, can't parse this.
>>
>> I don't understand why the !queue->flushing part is correct.

What about this?

Kevin

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v8 3/4] block: add block timer and throttling algorithm
  2011-10-17 10:26         ` Kevin Wolf
@ 2011-10-18  8:43           ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-10-18  8:43 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: aliguori, stefanha, kvm, qemu-devel, Zhi Yong Wu

On Mon, Oct 17, 2011 at 6:26 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 26.09.2011 09:24, schrieb Zhi Yong Wu:
>> On Sat, Sep 24, 2011 at 12:19 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
>>>> Note:
>>>>      1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this senario.
>>>>      2.) When "dd" command is issued in guest, if its option bs is set to a large value such as "bs=1024K", the result speed will slightly bigger than the limits.
>>>>
>>>> For these problems, if you have nice thought, pls let us know.:)
>>>>
>>>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>>> ---
>>>>  block.c |  259 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>>  block.h |    1 -
>>>>  2 files changed, 248 insertions(+), 12 deletions(-)
>>>
>>> One general comment: What about synchronous and/or coroutine I/O
>>> operations? Do you think they are just not important enough to consider
>>> here or were they forgotten?
>> For sync ops, we assume that it will be converse into async mode at
>> some point of future, right?
>> For coroutine I/O, it is introduced in image driver layer, and behind
>> bdrv_aio_readv/writev. I think that we need not consider them, right?
>
> Meanwhile the block layer has been changed to handle all requests in
> terms of coroutines. So you would best move your intercepting code into
> the coroutine functions.
OK. I will.
>
>>> Also, do I understand correctly that you're always submitting the whole
>> Right, when the block timer fire, it will flush whole request queue.
>>> queue at once? Does this effectively enforce the limit all the time or
>>> will it lead to some peaks and then no requests at all for a while until
>> In fact, it only try to submit those enqueued request one by one. If
>> fail to pass the limit, this request will be enqueued again.
>
> Right, I missed this. Makes sense.
>
>>> the average is right again?
>> Yeah, it is possible. Do you better idea?
>>>
>>> Maybe some documentation on how it all works from a high level
>>> perspective would be helpful.
>>>
>>>> +    /* throttling disk read I/O */
>>>> +    if (bs->io_limits_enabled) {
>>>> +        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
>>>> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
>>>> +                           sector_num, qiov, nb_sectors, cb, opaque);
>>>> +            printf("wait_time=%ld\n", wait_time);
>>>> +            if (wait_time != -1) {
>>>> +                printf("reset block timer\n");
>>>> +                qemu_mod_timer(bs->block_timer,
>>>> +                               wait_time + qemu_get_clock_ns(vm_clock));
>>>> +            }
>>>> +
>>>> +            if (ret) {
>>>> +                printf("ori ret is not null\n");
>>>> +            } else {
>>>> +                printf("ori ret is null\n");
>>>> +            }
>>>> +
>>>> +            return ret;
>>>> +        }
>>>> +    }
>>>>
>>>> -    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>>> +    ret =  drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>>>                                 cb, opaque);
>>>> +    if (ret) {
>>>> +        if (bs->io_limits_enabled) {
>>>> +            bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
>>>> +                              (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
>>>> +            bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
>>>> +        }
>>>
>>> I wonder if you can't reuse bs->nr_bytes/nr_ops instead of introducing a
>>> second counting mechanism. Would have the advantage that numbers are
>> NO, our counting variables will be reset to ZERO if current slice
>> time(0.1ms) is used up.
>
> Instead of setting the counter to zero you could remember the base value
> and calculate the difference when you need it. The advantage is that we
> can share infrastructure instead of introducing several subtly different
> ways of I/O accounting.
Good idea, let me try it.
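The base-value idea above can be sketched as follows: instead of a second counter that is reset to zero each slice, snapshot the existing monotonic counter when a slice begins and compute the difference on demand. This is a stand-alone sketch; the field names are assumptions loosely modeled on bs->nr_bytes.

```c
#include <assert.h>

typedef struct {
    unsigned long nr_bytes;          /* existing blockstats counter */
    unsigned long slice_start_bytes; /* snapshot taken at slice start */
} BlockDriverState;

static void start_new_slice(BlockDriverState *bs)
{
    /* the shared counter itself is never reset */
    bs->slice_start_bytes = bs->nr_bytes;
}

static unsigned long bytes_this_slice(const BlockDriverState *bs)
{
    return bs->nr_bytes - bs->slice_start_bytes;
}
```

This way the throttling code and `info blockstats` share one accounting mechanism and always report consistent numbers.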

>
>>> actually consistent (your metric counts slightly differently than the
>>> existing info blockstats one).
>> Yeah, i notice this, and don't think there's wrong with it. and you?
>
> It's not really user friendly if a number that is called the same means
> this in one place and in another place that.
OK
>
> Kevin
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support
  2011-10-18  8:36             ` Kevin Wolf
@ 2011-10-18  9:29               ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-10-18  9:29 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: stefanha, kvm, qemu-devel, Zhi Yong Wu

On Tue, Oct 18, 2011 at 4:36 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 18.10.2011 10:07, schrieb Zhi Yong Wu:
>> On Mon, Oct 17, 2011 at 6:17 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>>>> +
>>>>>> +typedef struct BlockQueueAIOCB BlockQueueAIOCB;
>>>>>> +
>>>>>> +struct BlockQueue {
>>>>>> +    QTAILQ_HEAD(requests, BlockQueueAIOCB) requests;
>>>>>> +    bool req_failed;
>>>>>> +    bool flushing;
>>>>>> +};
>>>>>
>>>>> I find req_failed pretty confusing. Needs documentation at least, but
>>>>> most probably also a better name.
>>>> OK. request_has_failed?
>>>
>>> No, that doesn't describe what it's really doing.
>>>
>>> You set req_failed = true by default and then on some obscure condition
>>> clear it or not. It's tracking something, but I'm not sure what meaning
>>> it has during the whole process.
>> In qemu_block_queue_flush,
>> When bdrv_aio_readv/writev return NULL, if req_failed is changed to
>> false, it indicates that the request exceeds the limits again; if
>> req_failed is still true, it indicates that one I/O error takes place,
>> at the monent, qemu_block_queue_callback(request, -EIO) need to be
>> called to report this to upper layer.
>
> Okay, this makes some more sense now.
>
> How about reversing the logic and maintaining a limit_exceeded flag
> instead of a req_failed? I think this would be clearer.
OK
>
>>>>>> +void qemu_del_block_queue(BlockQueue *queue)
>>>>>> +{
>>>>>> +    BlockQueueAIOCB *request, *next;
>>>>>> +
>>>>>> +    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
>>>>>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>>>>>> +        qemu_aio_release(request);
>>>>>> +    }
>>>>>> +
>>>>>> +    g_free(queue);
>>>>>> +}
>>>>>
>>>>> Can we be sure that no AIO requests are in flight that still use the now
>>>>> released AIOCB?
>>>> Yeah, since qemu core code is serially performed, i think that when
>>>> qemu_del_block_queue is performed, no requests are in flight. Right?
>>>
>>> Patch 2 has this code:
>>>
>>> +void bdrv_io_limits_disable(BlockDriverState *bs)
>>> +{
>>> +    bs->io_limits_enabled = false;
>>> +
>>> +    if (bs->block_queue) {
>>> +        qemu_block_queue_flush(bs->block_queue);
>>> +        qemu_del_block_queue(bs->block_queue);
>>> +        bs->block_queue = NULL;
>>> +    }
>>>
>>> Does this mean that you can't disable I/O limits while the VM is running?
>> No, you can, even while the VM is running.
>
> Okay, in general qemu_block_queue_flush() empties the queue so that
> there are no requests left that qemu_del_block_queue() could drop from
> the queue. So in the common case it doesn't even enter the FOREACH loop.
I think that we should use !QTAILQ_EMPTY(&queue->requests) rather than
QTAILQ_FOREACH_SAFE in qemu_del_block_queue(),
right?
>
> I think problems start when requests have failed or exceeded the limit
> again, then you have requests queued even after
> qemu_block_queue_flush(). You must be aware of this, otherwise the code
> in qemu_del_block_queue() wouldn't exist.
>
> But you can't release the ACBs without having called their callback,
> otherwise the caller would still assume that its ACB pointer is valid.
> Maybe calling the callback before releasing the ACB would be enough.
Good, thanks.
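Kevin's point can be sketched as follows (a standalone model; `Acb`, `del_queue`, and the `released` flag are stand-ins for QEMU's BlockQueueAIOCB and qemu_aio_release(), not the real API): every ACB still queued at deletion time is completed through its callback first, so the caller never ends up holding a dangling ACB pointer.

```c
#include <assert.h>
#include <stddef.h>

struct Acb {
    void (*cb)(struct Acb *acb, int error);  /* completion callback */
    int completed;                           /* set by the callback */
    int released;                            /* stand-in for qemu_aio_release() */
    struct Acb *next;
};

/* model callback: record that the request was completed with -EIO */
static void mark_done(struct Acb *a, int error) { a->completed = (error == -5); }

/* delete the queue: complete each remaining ACB before releasing it */
static void del_queue(struct Acb *head)
{
    while (head) {
        struct Acb *next = head->next;
        head->cb(head, -5 /* -EIO: complete before freeing */);
        head->released = 1;
        head = next;
    }
}
```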
>
> However, for failed requests see below.
>
>>>
>>>>>> +
>>>>>> +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
>>>>>> +                        BlockDriverState *bs,
>>>>>> +                        BlockRequestHandler *handler,
>>>>>> +                        int64_t sector_num,
>>>>>> +                        QEMUIOVector *qiov,
>>>>>> +                        int nb_sectors,
>>>>>> +                        BlockDriverCompletionFunc *cb,
>>>>>> +                        void *opaque)
>>>>>> +{
>>>>>> +    BlockDriverAIOCB *acb;
>>>>>> +    BlockQueueAIOCB *request;
>>>>>> +
>>>>>> +    if (queue->flushing) {
>>>>>> +        queue->req_failed = false;
>>>>>> +        return NULL;
>>>>>> +    } else {
>>>>>> +        acb = qemu_aio_get(&block_queue_pool, bs,
>>>>>> +                           cb, opaque);
>>>>>> +        request = container_of(acb, BlockQueueAIOCB, common);
>>>>>> +        request->handler       = handler;
>>>>>> +        request->sector_num    = sector_num;
>>>>>> +        request->qiov          = qiov;
>>>>>> +        request->nb_sectors    = nb_sectors;
>>>>>> +        request->real_acb      = NULL;
>>>>>> +        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
>>>>>> +    }
>>>>>> +
>>>>>> +    return acb;
>>>>>> +}
>>>>>> +
>>>>>> +static int qemu_block_queue_handler(BlockQueueAIOCB *request)
>>>>>> +{
>>>>>> +    int ret;
>>>>>> +    BlockDriverAIOCB *res;
>>>>>> +
>>>>>> +    res = request->handler(request->common.bs, request->sector_num,
>>>>>> +                           request->qiov, request->nb_sectors,
>>>>>> +                           qemu_block_queue_callback, request);
>>>>>> +    if (res) {
>>>>>> +        request->real_acb = res;
>>>>>> +    }
>>>>>> +
>>>>>> +    ret = (res == NULL) ? 0 : 1;
>>>>>> +
>>>>>> +    return ret;
>>>>>
>>>>> You mean return (res != NULL); and want to have bool as the return value
>>>>> of this function.
>>>> Yeah, thanks. i will modify as below:
>>>> ret = (res == NULL) ? false : true;
>>>
>>> ret = (res != NULL) is really more readable.
>> I have adopted Paolo's suggestion.
>>
>>>
>>>> and
>>>> static bool qemu_block_queue_handler()
>>>>
>>>>>
>>>>>> +}
>>>>>> +
>>>>>> +void qemu_block_queue_flush(BlockQueue *queue)
>>>>>> +{
>>>>>> +    queue->flushing = true;
>>>>>> +    while (!QTAILQ_EMPTY(&queue->requests)) {
>>>>>> +        BlockQueueAIOCB *request = NULL;
>>>>>> +        int ret = 0;
>>>>>> +
>>>>>> +        request = QTAILQ_FIRST(&queue->requests);
>>>>>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>>>>>> +
>>>>>> +        queue->req_failed = true;
>>>>>> +        ret = qemu_block_queue_handler(request);
>>>>>> +        if (ret == 0) {
>>>>>> +            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
>>>>>> +            if (queue->req_failed) {
>>>>>> +                qemu_block_queue_callback(request, -EIO);
>>>>>> +                break;
>
> When a request has failed, you call its callback, but still leave it
> queued. I think this is wrong.
Right, we should not insert the failed request back into the block queue.
>
>>>>>> +            }
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    queue->req_failed = true;
>>>>>> +    queue->flushing   = false;
>>>>>> +}
>>>>>> +
>>>>>> +bool qemu_block_queue_has_pending(BlockQueue *queue)
>>>>>> +{
>>>>>> +    return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
>>>>>> +}
>>>>>
>>>>> Why doesn't the queue have pending requests in the middle of a flush
>>>>> operation? (That is, the flush hasn't completed yet)
>>>> It is possible for the queue to have pending requests. if yes, how about?
>>>
>>> Sorry, can't parse this.
>>>
>>> I don't understand why the !queue->flushing part is correct.
>
> What about this?
When bdrv_aio_readv/writev handles one request, it determines whether
the block queue exists and is not currently being flushed; if so, it
assumes that this request is a new one from the upper layer, so it does
not check whether the runtime I/O rate has exceeded the limits, but
immediately inserts the request into the block queue.
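The ordering rule described above reduces to a tiny predicate (hypothetical names, not the patch's actual API): a new request bypasses the rate check and is queued directly only when the queue already holds pending requests and no flush is in progress, which preserves FIFO ordering behind the requests that are already waiting.

```c
#include <assert.h>
#include <stddef.h>

struct Queue {
    void *head;    /* first pending request, NULL if the queue is empty */
    int flushing;  /* true while qemu_block_queue_flush() is running */
};

/* Should a new request from the upper layer be queued without checking
 * the I/O limits?  Only if older requests are already pending and we are
 * not inside a flush (requests re-issued during a flush take the other
 * path and are rate-checked instead). */
static int should_queue_directly(const struct Queue *q)
{
    return !q->flushing && q->head != NULL;
}
```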

>
> Kevin
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support
  2011-10-18  9:29               ` Zhi Yong Wu
@ 2011-10-18  9:56                 ` Kevin Wolf
  -1 siblings, 0 replies; 68+ messages in thread
From: Kevin Wolf @ 2011-10-18  9:56 UTC (permalink / raw)
  To: Zhi Yong Wu; +Cc: stefanha, kvm, qemu-devel, Zhi Yong Wu

Am 18.10.2011 11:29, schrieb Zhi Yong Wu:
>>>>>>> +void qemu_del_block_queue(BlockQueue *queue)
>>>>>>> +{
>>>>>>> +    BlockQueueAIOCB *request, *next;
>>>>>>> +
>>>>>>> +    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
>>>>>>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>>>>>>> +        qemu_aio_release(request);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    g_free(queue);
>>>>>>> +}
>>>>>>
>>>>>> Can we be sure that no AIO requests are in flight that still use the now
>>>>>> released AIOCB?
>>>>> Yeah, since qemu core code is serially performed, i think that when
>>>>> qemu_del_block_queue is performed, no requests are in flight. Right?
>>>>
>>>> Patch 2 has this code:
>>>>
>>>> +void bdrv_io_limits_disable(BlockDriverState *bs)
>>>> +{
>>>> +    bs->io_limits_enabled = false;
>>>> +
>>>> +    if (bs->block_queue) {
>>>> +        qemu_block_queue_flush(bs->block_queue);
>>>> +        qemu_del_block_queue(bs->block_queue);
>>>> +        bs->block_queue = NULL;
>>>> +    }
>>>>
>>>> Does this mean that you can't disable I/O limits while the VM is running?
>>> NO, you can even though VM is running.
>>
>> Okay, in general qemu_block_queue_flush() empties the queue so that
>> there are no requests left that qemu_del_block_queue() could drop from
>> the queue. So in the common case it doesn't even enter the FOREACH loop.
> I think that we should adopt !QTAILQ_EMPTY(&queue->requests), not
> QTAILQ_FOREACH_SAFE in qemu_del_block_queue(),
> right?

I think QTAILQ_FOREACH_SAFE is fine.

>>
>> I think problems start when requests have failed or exceeded the limit
>> again, then you have requests queued even after
>> qemu_block_queue_flush(). You must be aware of this, otherwise the code
>> in qemu_del_block_queue() wouldn't exist.
>>
>> But you can't release the ACBs without having called their callback,
>> otherwise the caller would still assume that its ACB pointer is valid.
>> Maybe calling the callback before releasing the ACB would be enough.
> Good, thanks.
>>>>
>>>>>>> +            }
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    queue->req_failed = true;
>>>>>>> +    queue->flushing   = false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +bool qemu_block_queue_has_pending(BlockQueue *queue)
>>>>>>> +{
>>>>>>> +    return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
>>>>>>> +}
>>>>>>
>>>>>> Why doesn't the queue have pending requests in the middle of a flush
>>>>>> operation? (That is, the flush hasn't completed yet)
>>>>> It is possible for the queue to have pending requests. if yes, how about?
>>>>
>>>> Sorry, can't parse this.
>>>>
>>>> I don't understand why the !queue->flushing part is correct.
>>
>> What about this?
> When bdrv_aio_readv/writev handle one request, it will determine if
> block queue is not being flushed and isn't NULL; if yes, It assume
> that this request is one new request from upper layer, so it won't
> determine if the I/O rate at runtime has exceeded the limits, but
> immediately insert it into block queue.

Hm, I think I understand what you're saying, but only after looking at
patch 3. This is not really implementing a has_pending(), but
has_pending_and_caller_wasnt_called_during_flush(). I think it would be
better to handle the queue->flushing condition in the caller where its
use is more obvious.

Kevin

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support
  2011-10-18  9:56                 ` Kevin Wolf
@ 2011-10-18 13:29                   ` Zhi Yong Wu
  -1 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-10-18 13:29 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: stefanha, kvm, qemu-devel, Zhi Yong Wu

On Tue, Oct 18, 2011 at 5:56 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 18.10.2011 11:29, schrieb Zhi Yong Wu:
>>>>>>>> +void qemu_del_block_queue(BlockQueue *queue)
>>>>>>>> +{
>>>>>>>> +    BlockQueueAIOCB *request, *next;
>>>>>>>> +
>>>>>>>> +    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
>>>>>>>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>>>>>>>> +        qemu_aio_release(request);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    g_free(queue);
>>>>>>>> +}
>>>>>>>
>>>>>>> Can we be sure that no AIO requests are in flight that still use the now
>>>>>>> released AIOCB?
>>>>>> Yeah, since qemu core code is serially performed, i think that when
>>>>>> qemu_del_block_queue is performed, no requests are in flight. Right?
>>>>>
>>>>> Patch 2 has this code:
>>>>>
>>>>> +void bdrv_io_limits_disable(BlockDriverState *bs)
>>>>> +{
>>>>> +    bs->io_limits_enabled = false;
>>>>> +
>>>>> +    if (bs->block_queue) {
>>>>> +        qemu_block_queue_flush(bs->block_queue);
>>>>> +        qemu_del_block_queue(bs->block_queue);
>>>>> +        bs->block_queue = NULL;
>>>>> +    }
>>>>>
>>>>> Does this mean that you can't disable I/O limits while the VM is running?
>>>> NO, you can even though VM is running.
>>>
>>> Okay, in general qemu_block_queue_flush() empties the queue so that
>>> there are no requests left that qemu_del_block_queue() could drop from
>>> the queue. So in the common case it doesn't even enter the FOREACH loop.
>> I think that we should adopt !QTAILQ_EMPTY(&queue->requests), not
>> QTAILQ_FOREACH_SAFE in qemu_del_block_queue(),
>> right?
>
> I think QTAILQ_FOREACH_SAFE is fine.
>
>>>
>>> I think problems start when requests have failed or exceeded the limit
>>> again, then you have requests queued even after
>>> qemu_block_queue_flush(). You must be aware of this, otherwise the code
>>> in qemu_del_block_queue() wouldn't exist.
>>>
>>> But you can't release the ACBs without having called their callback,
>>> otherwise the caller would still assume that its ACB pointer is valid.
>>> Maybe calling the callback before releasing the ACB would be enough.
>> Good, thanks.
>>>>>
>>>>>>>> +            }
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    queue->req_failed = true;
>>>>>>>> +    queue->flushing   = false;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +bool qemu_block_queue_has_pending(BlockQueue *queue)
>>>>>>>> +{
>>>>>>>> +    return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
>>>>>>>> +}
>>>>>>>
>>>>>>> Why doesn't the queue have pending requests in the middle of a flush
>>>>>>> operation? (That is, the flush hasn't completed yet)
>>>>>> It is possible for the queue to have pending requests. if yes, how about?
>>>>>
>>>>> Sorry, can't parse this.
>>>>>
>>>>> I don't understand why the !queue->flushing part is correct.
>>>
>>> What about this?
>> When bdrv_aio_readv/writev handle one request, it will determine if
>> block queue is not being flushed and isn't NULL; if yes, It assume
>> that this request is one new request from upper layer, so it won't
>> determine if the I/O rate at runtime has exceeded the limits, but
>> immediately insert it into block queue.
>
> Hm, I think I understand what you're saying, but only after looking at
> patch 3. This is not really implementing a has_pending(), but
> has_pending_and_caller_wasnt_called_during_flush(). I think it would be
Correct.
> better to handle the queue->flushing condition in the caller where its
> use is more obvious.
OK. i will do as this.
>
> Kevin
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v8 3/4] block: add block timer and throttling algorithm
@ 2011-09-07 12:41 Zhi Yong Wu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhi Yong Wu @ 2011-09-07 12:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, aliguori, stefanha, kvm, mtosatti, Zhi Yong Wu, zwu.kernel, ryanh

Note:
     1.) When the bps/iops limits are set to a small value such as 511 bytes/s, the VM will hang up. We are considering how to handle this scenario.
     2.) When the "dd" command is issued in the guest with its bs option set to a large value such as "bs=1024K", the resulting speed will be slightly higher than the limits.

For these problems, if you have any good thoughts, please let us know. :)
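A rough model of why a 511 bytes/s limit hangs the VM (the 100 ms slice length and the wait formula are assumptions for illustration, not the patch's exact code): the smallest possible request, one 512-byte sector, needs more than a full second of bps budget, so its computed wait always overruns the throttling slice and it is re-queued forever.

```c
#include <assert.h>
#include <stdint.h>

#define SLICE_NS   100000000LL      /* assumed 100 ms throttling slice */
#define NS_PER_SEC 1000000000LL

/* nanoseconds a request of `bytes` must wait under a `bps` limit */
static int64_t wait_ns(int64_t bytes, int64_t bps)
{
    return bytes * NS_PER_SEC / bps;
}
```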

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 block.c |  246 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 block.h |    1 -
 2 files changed, 236 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index cd75183..8a82273 100644
--- a/block.c
+++ b/block.c
@@ -30,6 +30,9 @@
 #include "qemu-objects.h"
 #include "qemu-coroutine.h"
 
+#include "qemu-timer.h"
+#include "block/blk-queue.h"
+
 #ifdef CONFIG_BSD
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -72,6 +75,13 @@ static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
                                          QEMUIOVector *iov);
 static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs);
 
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+        bool is_write, double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
+        double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+        bool is_write, int64_t *wait);
+
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
     QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
@@ -745,6 +755,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
 
+    /* throttling disk I/O limits */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_enable(bs);
+    }
+
     return 0;
 
 unlink_and_fail:
@@ -783,6 +798,18 @@ void bdrv_close(BlockDriverState *bs)
         if (bs->change_cb)
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
+
+    /* throttling disk I/O limits */
+    if (bs->block_queue) {
+        qemu_del_block_queue(bs->block_queue);
+        bs->block_queue = NULL;
+    }
+
+    if (bs->block_timer) {
+        qemu_del_timer(bs->block_timer);
+        qemu_free_timer(bs->block_timer);
+        bs->block_timer = NULL;
+    }
 }
 
 void bdrv_close_all(void)
@@ -2341,16 +2368,40 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
                                  BlockDriverCompletionFunc *cb, void *opaque)
 {
     BlockDriver *drv = bs->drv;
+    BlockDriverAIOCB *ret;
+    int64_t wait_time = -1;
 
     trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
 
-    if (!drv)
-        return NULL;
-    if (bdrv_check_request(bs, sector_num, nb_sectors))
+    if (!drv || bdrv_check_request(bs, sector_num, nb_sectors)) {
         return NULL;
+    }
 
-    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
+    /* throttling disk read I/O */
+    if (bs->io_limits_enabled) {
+        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
+            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
+                           sector_num, qiov, nb_sectors, cb, opaque);
+            if (wait_time != -1) {
+                qemu_mod_timer(bs->block_timer,
+                               wait_time + qemu_get_clock_ns(vm_clock));
+            }
+
+            return ret;
+        }
+    }
+
+    ret = drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
                                cb, opaque);
+    if (ret) {
+        if (bs->io_limits_enabled) {
+            bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
+                              (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+            bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
+        }
+    }
+
+    return ret;
 }
 
 typedef struct BlockCompleteData {
@@ -2396,15 +2447,14 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
     BlockDriver *drv = bs->drv;
     BlockDriverAIOCB *ret;
     BlockCompleteData *blk_cb_data;
+    int64_t wait_time = -1;
 
     trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
 
-    if (!drv)
-        return NULL;
-    if (bs->read_only)
-        return NULL;
-    if (bdrv_check_request(bs, sector_num, nb_sectors))
+    if (!drv || bs->read_only
+        || bdrv_check_request(bs, sector_num, nb_sectors)) {
         return NULL;
+    }
 
     if (bs->dirty_bitmap) {
         blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
@@ -2413,13 +2463,32 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
         opaque = blk_cb_data;
     }
 
+    /* throttling disk write I/O */
+    if (bs->io_limits_enabled) {
+        if (bdrv_exceed_io_limits(bs, nb_sectors, true, &wait_time)) {
+            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_writev,
+                                  sector_num, qiov, nb_sectors, cb, opaque);
+            if (wait_time != -1) {
+                qemu_mod_timer(bs->block_timer,
+                               wait_time + qemu_get_clock_ns(vm_clock));
+            }
+
+            return ret;
+        }
+    }
+
     ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
                                cb, opaque);
-
     if (ret) {
         if (bs->wr_highest_sector < sector_num + nb_sectors - 1) {
             bs->wr_highest_sector = sector_num + nb_sectors - 1;
         }
+
+        if (bs->io_limits_enabled) {
+            bs->io_disps.bytes[BLOCK_IO_LIMIT_WRITE] +=
+                               (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+            bs->io_disps.ios[BLOCK_IO_LIMIT_WRITE]++;
+        }
     }
 
     return ret;
@@ -2684,6 +2753,163 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
     acb->pool->cancel(acb);
 }
 
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+                 bool is_write, double elapsed_time, uint64_t *wait)
+{
+    uint64_t bps_limit = 0;
+    double   bytes_limit, bytes_disp, bytes_res;
+    double   slice_time, wait_time;
+
+    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+        bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
+    } else if (bs->io_limits.bps[is_write]) {
+        bps_limit = bs->io_limits.bps[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    slice_time = bs->slice_end - bs->slice_start;
+    slice_time /= (NANOSECONDS_PER_SECOND);
+    bytes_limit = bps_limit * slice_time;
+    bytes_disp  = bs->io_disps.bytes[is_write];
+    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+        bytes_disp += bs->io_disps.bytes[!is_write];
+    }
+
+    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+
+    if (bytes_disp + bytes_res <= bytes_limit) {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (bytes_disp + bytes_res) / bps_limit - elapsed_time;
+
+    if (wait) {
+        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    }
+
+    return true;
+}
+
+static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
+                             double elapsed_time, uint64_t *wait)
+{
+    uint64_t iops_limit = 0;
+    double   ios_limit, ios_disp;
+    double   slice_time, wait_time;
+
+    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+        iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
+    } else if (bs->io_limits.iops[is_write]) {
+        iops_limit = bs->io_limits.iops[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    slice_time = bs->slice_end - bs->slice_start;
+    slice_time /= (NANOSECONDS_PER_SECOND);
+    ios_limit  = iops_limit * slice_time;
+    ios_disp   = bs->io_disps.ios[is_write];
+    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+        ios_disp += bs->io_disps.ios[!is_write];
+    }
+
+    if (ios_disp + 1 <= ios_limit) {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (ios_disp + 1) / iops_limit;
+    if (wait_time > elapsed_time) {
+        wait_time = wait_time - elapsed_time;
+    } else {
+        wait_time = 0;
+    }
+
+    if (wait) {
+        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    }
+
+    return true;
+}
+
+static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+                           bool is_write, int64_t *wait)
+{
+    int64_t  now, max_wait;
+    uint64_t bps_wait = 0, iops_wait = 0;
+    double   elapsed_time;
+    int      bps_ret, iops_ret;
+
+    now = qemu_get_clock_ns(vm_clock);
+    if ((bs->slice_start < now)
+        && (bs->slice_end > now)) {
+        bs->slice_end = now + BLOCK_IO_SLICE_TIME;
+    } else {
+        bs->slice_start = now;
+        bs->slice_end   = now + BLOCK_IO_SLICE_TIME;
+
+        bs->io_disps.bytes[is_write]  = 0;
+        bs->io_disps.bytes[!is_write] = 0;
+
+        bs->io_disps.ios[is_write]    = 0;
+        bs->io_disps.ios[!is_write]   = 0;
+    }
+
+    /* If requests are already queued, a limit was exceeded; queue this one too */
+    if (qemu_block_queue_has_pending(bs->block_queue)) {
+        if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]
+            || bs->io_limits.bps[is_write] || bs->io_limits.iops[is_write]
+            || bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+            if (wait) {
+                *wait = -1;
+            }
+
+            return true;
+        }
+    }
+
+    elapsed_time  = now - bs->slice_start;
+    elapsed_time  /= (NANOSECONDS_PER_SECOND);
+
+    bps_ret  = bdrv_exceed_bps_limits(bs, nb_sectors,
+                                      is_write, elapsed_time, &bps_wait);
+    iops_ret = bdrv_exceed_iops_limits(bs, is_write,
+                                      elapsed_time, &iops_wait);
+    if (bps_ret || iops_ret) {
+        max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
+        if (wait) {
+            *wait = max_wait;
+        }
+
+        now = qemu_get_clock_ns(vm_clock);
+        if (bs->slice_end < now + max_wait) {
+            bs->slice_end = now + max_wait;
+        }
+
+        return true;
+    }
+
+    if (wait) {
+        *wait = 0;
+    }
+
+    return false;
+}
 
 /**************************************************************/
 /* async block device emulation */
diff --git a/block.h b/block.h
index a3e69db..10d2828 100644
--- a/block.h
+++ b/block.h
@@ -107,7 +107,6 @@ int bdrv_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
 
-
 typedef struct BdrvCheckResult {
     int corruptions;
     int leaks;
-- 
1.7.6


end of thread, other threads:[~2011-10-18 13:29 UTC | newest]

Thread overview: 68+ messages
2011-09-08 10:11 [PATCH v8 0/4] The intro of QEMU block I/O throttling Zhi Yong Wu
2011-09-08 10:11 ` [PATCH v8 1/4] block: add the block queue support Zhi Yong Wu
2011-09-23 15:32   ` Kevin Wolf
2011-09-26  8:01     ` Zhi Yong Wu
2011-10-17 10:17       ` Kevin Wolf
2011-10-17 10:17         ` Paolo Bonzini
2011-10-18  7:00           ` Zhi Yong Wu
2011-10-18  8:07         ` Zhi Yong Wu
2011-10-18  8:36           ` Kevin Wolf
2011-10-18  9:29             ` Zhi Yong Wu
2011-10-18  9:56               ` Kevin Wolf
2011-10-18 13:29                 ` Zhi Yong Wu
2011-09-08 10:11 ` [PATCH v8 2/4] block: add the command line support Zhi Yong Wu
2011-09-23 15:54   ` Kevin Wolf
2011-09-26  6:15     ` Zhi Yong Wu
2011-10-17 10:19       ` Kevin Wolf
2011-10-18  8:17         ` Zhi Yong Wu
2011-09-08 10:11 ` [PATCH v8 3/4] block: add block timer and throttling algorithm Zhi Yong Wu
2011-09-09 14:44   ` Marcelo Tosatti
2011-09-13  3:09     ` Zhi Yong Wu
2011-09-14 10:50       ` Marcelo Tosatti
2011-09-19  9:55         ` Zhi Yong Wu
2011-09-20 12:34           ` Marcelo Tosatti
2011-09-21  3:14             ` Zhi Yong Wu
2011-09-21  5:54               ` Zhi Yong Wu
2011-09-21  7:03             ` Zhi Yong Wu
2011-09-26  8:15             ` Zhi Yong Wu
2011-09-23 16:19   ` Kevin Wolf
2011-09-26  7:24     ` Zhi Yong Wu
2011-10-17 10:26       ` Kevin Wolf
2011-10-17 15:54         ` Stefan Hajnoczi
2011-10-18  8:29           ` Zhi Yong Wu
2011-10-18  8:43         ` Zhi Yong Wu
2011-09-08 10:11 ` [PATCH v8 4/4] qmp/hmp: add block_set_io_throttle Zhi Yong Wu
  -- strict thread matches above, loose matches on Subject: below --
2011-09-07 12:41 [PATCH v8 3/4] block: add block timer and throttling algorithm Zhi Yong Wu
