All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/1] The draft of the codes for QEMU disk I/O limits
@ 2011-07-25  2:22 ` Zhi Yong Wu
  0 siblings, 0 replies; 5+ messages in thread
From: Zhi Yong Wu @ 2011-07-25  2:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, aliguori, stefanha, raharper, kwolf, vgoyal, zwu.kernel,
	luowenj, Zhi Yong Wu

The main goal of the patch is to effectively cap the disk I/O speed or counts of one single VM.
It is only one draft, so it unavoidably has some drawbacks, if you catch them, please let me know.

The patch will mainly introduce one global timer and one block queue for each I/O limits enabled drive.
When a block request is coming in, a block I/O throttling algorithm will check if its I/O rate or counts exceed the limits; if yes, then it will enqueue to the block queue; The timer will periodically handle the I/O requests in it.

Some available features follow as below:
(1) global bps limit.
    -drive bps=xxx            in bytes/s
(2) only read bps limit
    -drive bps_rd=xxx         in bytes/s
(3) only write bps limit
    -drive bps_wr=xxx         in bytes/s
(4) global iops limit
    -drive iops=xxx           in ios/s
(5) only read iops limit
    -drive iops_rd=xxx        in ios/s
(6) only write iops limit
    -drive iops_wr=xxx        in ios/s
(7) the combination of some limits.
    -drive bps=xxx,iops=xxx 

Known Limitations:
(1) #1 can not coexist with #2, #3 
(2) #4 can not coexist with #5, #6 

Zhi Yong Wu (1):
  Submit the codes for QEMU disk I/O limits.

 Makefile.objs     |    2 +-
 block.c           |  248 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 block.h           |    1 -
 block/blk-queue.c |   99 +++++++++++++++++++++
 block/blk-queue.h |   73 ++++++++++++++++
 block_int.h       |   21 +++++
 blockdev.c        |   20 +++++
 qemu-config.c     |   24 +++++
 qemu-option.c     |   17 ++++
 qemu-option.h     |    1 +
 qemu-options.hx   |    1 +
 11 files changed, 505 insertions(+), 2 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

-- 
1.7.2.3


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PATCH v1 0/1] The draft of the codes for QEMU disk I/O limits
@ 2011-07-25  2:22 ` Zhi Yong Wu
  0 siblings, 0 replies; 5+ messages in thread
From: Zhi Yong Wu @ 2011-07-25  2:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, aliguori, stefanha, kvm, zwu.kernel, Zhi Yong Wu, luowenj,
	raharper, vgoyal

The main goal of the patch is to effectively cap the disk I/O speed or counts of one single VM.
It is only one draft, so it unavoidably has some drawbacks, if you catch them, please let me know.

The patch will mainly introduce one global timer and one block queue for each I/O limits enabled drive.
When a block request is coming in, a block I/O throttling algorithm will check if its I/O rate or counts exceed the limits; if yes, then it will enqueue to the block queue; The timer will periodically handle the I/O requests in it.

Some available features follow as below:
(1) global bps limit.
    -drive bps=xxx            in bytes/s
(2) only read bps limit
    -drive bps_rd=xxx         in bytes/s
(3) only write bps limit
    -drive bps_wr=xxx         in bytes/s
(4) global iops limit
    -drive iops=xxx           in ios/s
(5) only read iops limit
    -drive iops_rd=xxx        in ios/s
(6) only write iops limit
    -drive iops_wr=xxx        in ios/s
(7) the combination of some limits.
    -drive bps=xxx,iops=xxx 

Known Limitations:
(1) #1 can not coexist with #2, #3 
(2) #4 can not coexist with #5, #6 

Zhi Yong Wu (1):
  Submit the codes for QEMU disk I/O limits.

 Makefile.objs     |    2 +-
 block.c           |  248 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 block.h           |    1 -
 block/blk-queue.c |   99 +++++++++++++++++++++
 block/blk-queue.h |   73 ++++++++++++++++
 block_int.h       |   21 +++++
 blockdev.c        |   20 +++++
 qemu-config.c     |   24 +++++
 qemu-option.c     |   17 ++++
 qemu-option.h     |    1 +
 qemu-options.hx   |    1 +
 11 files changed, 505 insertions(+), 2 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

-- 
1.7.2.3

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v1 1/1] Submit the codes for QEMU disk I/O limits.
  2011-07-25  2:22 ` [Qemu-devel] " Zhi Yong Wu
@ 2011-07-25  2:22   ` Zhi Yong Wu
  -1 siblings, 0 replies; 5+ messages in thread
From: Zhi Yong Wu @ 2011-07-25  2:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, aliguori, stefanha, raharper, kwolf, vgoyal, zwu.kernel,
	luowenj, Zhi Yong Wu

Signed-off-by: Zhi Yong Wu <zwu.kernel@gmail.com>
---
 Makefile.objs     |    2 +-
 block.c           |  248 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 block.h           |    1 -
 block/blk-queue.c |   99 +++++++++++++++++++++
 block/blk-queue.h |   73 ++++++++++++++++
 block_int.h       |   21 +++++
 blockdev.c        |   20 +++++
 qemu-config.c     |   24 +++++
 qemu-option.c     |   17 ++++
 qemu-option.h     |    1 +
 qemu-options.hx   |    1 +
 11 files changed, 505 insertions(+), 2 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

diff --git a/Makefile.objs b/Makefile.objs
index 9f99ed4..06f2033 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -23,7 +23,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
-block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
+block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o blk-queue.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block.c b/block.c
index 24a25d5..0b12f08 100644
--- a/block.c
+++ b/block.c
@@ -29,6 +29,9 @@
 #include "module.h"
 #include "qemu-objects.h"
 
+#include "qemu-timer.h"
+#include "block/blk-queue.h"
+
 #ifdef CONFIG_BSD
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -58,6 +61,10 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num,
 static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
                          const uint8_t *buf, int nb_sectors);
 
+bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors, bool is_write, uint64_t *wait);
+bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write, uint64_t *wait);
+bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors, bool is_write, uint64_t *wait);
+
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
     QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
@@ -167,6 +174,31 @@ void path_combine(char *dest, int dest_size,
     }
 }
 
+static void bdrv_block_timer(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BlockQueue *queue = bs->block_queue;
+    uint64_t intval = 1;
+
+    while (!QTAILQ_EMPTY(&queue->requests)) {
+        BlockIORequest *request;
+        int ret;
+
+        request = QTAILQ_FIRST(&queue->requests);
+        QTAILQ_REMOVE(&queue->requests, request, entry);
+
+        ret = queue->handler(request);
+        if (ret == 0) {
+            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
+            break;
+        }
+
+        qemu_free(request);
+    }
+
+    qemu_mod_timer(bs->block_timer, (intval * 1000000000) + qemu_get_clock_ns(vm_clock));
+}
+
 void bdrv_register(BlockDriver *bdrv)
 {
     if (!bdrv->bdrv_aio_readv) {
@@ -476,6 +508,11 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
         goto free_and_fail;
     }
 
+    /* throttling disk I/O limits */
+    if (bs->io_limits != NULL) {
+    	bs->block_queue = qemu_new_block_queue(qemu_block_queue_handler);
+    }
+
 #ifndef _WIN32
     if (bs->is_temporary) {
         unlink(filename);
@@ -642,6 +679,16 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
 
+    /* throttling disk I/O limits */
+    if (bs->io_limits != NULL) {
+	bs->io_disps    = qemu_mallocz(sizeof(BlockIODisp));
+    	bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
+    	qemu_mod_timer(bs->block_timer, qemu_get_clock_ns(vm_clock));
+
+	bs->slice_start[0] = qemu_get_clock_ns(vm_clock);
+	bs->slice_start[1] = qemu_get_clock_ns(vm_clock);
+    }
+
     return 0;
 
 unlink_and_fail:
@@ -680,6 +727,15 @@ void bdrv_close(BlockDriverState *bs)
         if (bs->change_cb)
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
+
+    /* throttling disk I/O limits */
+    if (bs->block_queue) {
+	qemu_del_block_queue(bs->block_queue);
+    }
+
+    if (bs->block_timer) {
+        qemu_free_timer(bs->block_timer);
+    }
 }
 
 void bdrv_close_all(void)
@@ -1312,6 +1368,13 @@ void bdrv_get_geometry_hint(BlockDriverState *bs,
     *psecs = bs->secs;
 }
 
+/* throttling disk io limits */
+void bdrv_set_io_limits(BlockDriverState *bs,
+			    BlockIOLimit *io_limits)
+{
+    bs->io_limits	      = io_limits;
+}
+
 /* Recognize floppy formats */
 typedef struct FDFormat {
     FDriveType drive;
@@ -2111,6 +2174,154 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn)
     return buf;
 }
 
+bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+                           bool is_write, uint64_t *wait) {
+    int64_t  real_time;
+    uint64_t bps_limit   = 0;
+    double   bytes_limit, bytes_disp, bytes_res, elapsed_time;
+    double   slice_time = 0.1, wait_time;
+
+    if (bs->io_limits->bps[2]) {
+        bps_limit = bs->io_limits->bps[2];
+    } else if (bs->io_limits->bps[is_write]) {
+        bps_limit = bs->io_limits->bps[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    real_time = qemu_get_clock_ns(vm_clock);
+    if (bs->slice_start[is_write] + 100000000 <= real_time) {
+        bs->slice_start[is_write] = real_time;
+
+        bs->io_disps->bytes[is_write] = 0;
+        if (bs->io_limits->bps[2]) {
+            bs->io_disps->bytes[!is_write] = 0;
+        }
+    }
+
+    elapsed_time  = (real_time - bs->slice_start[is_write]) / 1000000000.0;
+    fprintf(stderr, "real_time = %ld, slice_start = %ld, elapsed_time = %g\n", real_time, bs->slice_start[is_write], elapsed_time);
+
+    bytes_limit	= bps_limit * slice_time;
+    bytes_disp  = bs->io_disps->bytes[is_write];
+    if (bs->io_limits->bps[2]) {
+        bytes_disp += bs->io_disps->bytes[!is_write];
+    }
+
+    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+    fprintf(stderr, "bytes_limit = %g bytes_disp = %g, bytes_res = %g, elapsed_time = %g\n", bytes_limit, bytes_disp, bytes_res, elapsed_time);
+
+    if (bytes_disp + bytes_res <= bytes_limit) {
+        if (wait) {
+            *wait = 0;
+        }
+
+    	fprintf(stderr, "bytes_disp + bytes_res <= bytes_limit\n");
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (bytes_disp + bytes_res - bytes_limit) / bps_limit;
+    if (!wait_time) {
+        wait_time = 1;
+    }
+
+    wait_time = wait_time + (slice_time - elapsed_time);
+    if (wait) {
+	*wait = wait_time * 1000000000 + 1;
+    }
+
+    return true;
+}
+
+bool bdrv_exceed_iops_limits(BlockDriverState *bs,
+                           bool is_write, uint64_t *wait) {
+    int64_t  real_time;
+    uint64_t iops_limit   = 0;
+    double   ios_limit, ios_disp, elapsed_time;
+    double   slice_time = 0.1, wait_time;
+
+    if (bs->io_limits->iops[2]) {
+        iops_limit = bs->io_limits->iops[2];
+    } else if (bs->io_limits->iops[is_write]) {
+        iops_limit = bs->io_limits->iops[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    real_time = qemu_get_clock_ns(vm_clock);
+    if (bs->slice_start[is_write] + 100000000 <= real_time) {
+        bs->slice_start[is_write] = real_time;
+
+        bs->io_disps->ios[is_write] = 0;
+        if (bs->io_limits->iops[2]) {
+            bs->io_disps->ios[!is_write] = 0;
+        }
+    }
+
+    elapsed_time  = (real_time - bs->slice_start[is_write]) / 1000000000.0;
+    fprintf(stderr, "real_time = %ld, slice_start = %ld, elapsed_time = %g\n", real_time, bs->slice_start[is_write], elapsed_time);
+
+    ios_limit = iops_limit * slice_time;
+    ios_disp  = bs->io_disps->ios[is_write];
+    if (bs->io_limits->iops[2]) {
+        ios_disp += bs->io_disps->ios[!is_write];
+    }
+    fprintf(stderr, "iops_limit = %ld ios_disp = %g, elapsed_time = %g\n", iops_limit, ios_disp, elapsed_time);
+
+    if (ios_disp + 1 <= ios_limit) {
+	if (wait) {
+	    *wait = 0;
+	}
+
+        fprintf(stderr, "ios_disp + 1 <= ios_limit\n");
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (ios_disp + 1) / iops_limit;
+    if (wait_time > elapsed_time) {
+	wait_time = wait_time - elapsed_time;
+    } else {
+	wait_time = 0;
+    }
+
+    if (wait) {
+        *wait = wait_time * 1000000000 + 1;
+    }
+
+    return true;
+}
+
+bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+                           bool is_write, uint64_t *wait) {
+    uint64_t bps_wait = 0, iops_wait = 0, max_wait;
+
+    if (bdrv_exceed_bps_limits(bs, nb_sectors, is_write, &bps_wait)
+        || bdrv_exceed_iops_limits(bs, is_write, &iops_wait)) {
+        max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
+        if (wait) {
+            *wait = max_wait;
+        }
+
+        fprintf(stderr, "bdrv_exceed_io_limits: wait = %ld\n", *wait);
+        return true;
+    }
+
+    if (wait) {
+        *wait = 0;
+    }
+
+    return false;
+}
 
 /**************************************************************/
 /* async I/Os */
@@ -2121,6 +2332,7 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
 {
     BlockDriver *drv = bs->drv;
     BlockDriverAIOCB *ret;
+    uint64_t wait_time = 0;
 
     trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
 
@@ -2129,6 +2341,19 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
     if (bdrv_check_request(bs, sector_num, nb_sectors))
         return NULL;
 
+    /* throttling disk read I/O */
+    if (bs->io_limits != NULL) {
+    	if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
+            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
+				sector_num, qiov, nb_sectors, cb, opaque);
+    	    fprintf(stderr, "bdrv_aio_readv: wait_time = %ld, timer value = %ld\n", wait_time, wait_time + qemu_get_clock_ns(vm_clock));
+    	    qemu_mod_timer(bs->block_timer, wait_time + qemu_get_clock_ns(vm_clock));
+            
+	    return ret;
+	}
+    }
+
+
     ret = drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
                               cb, opaque);
 
@@ -2136,6 +2361,11 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
 	/* Update stats even though technically transfer has not happened. */
 	bs->rd_bytes += (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
 	bs->rd_ops ++;
+
+	if (bs->io_limits != NULL) {
+	    bs->io_disps->bytes[0] += (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+	    bs->io_disps->ios[0] ++;
+	}
     }
 
     return ret;
@@ -2184,6 +2414,7 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
     BlockDriver *drv = bs->drv;
     BlockDriverAIOCB *ret;
     BlockCompleteData *blk_cb_data;
+    uint64_t wait_time = 0;
 
     trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
 
@@ -2201,6 +2432,18 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
         opaque = blk_cb_data;
     }
 
+    /* throttling disk write I/O */
+    if (bs->io_limits != NULL) {
+    	if (bdrv_exceed_io_limits(bs, nb_sectors, true, &wait_time)) {
+	    ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_writev,
+				sector_num, qiov, nb_sectors, cb, opaque);
+    	    fprintf(stderr, "bdrv_aio_writev: wait_time = %ld, timer value = %ld\n", wait_time, wait_time + qemu_get_clock_ns(vm_clock));
+    	    qemu_mod_timer(bs->block_timer, wait_time + qemu_get_clock_ns(vm_clock));
+            return ret;
+	}
+    }
+
+
     ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
                                cb, opaque);
 
@@ -2211,6 +2454,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
         if (bs->wr_highest_sector < sector_num + nb_sectors - 1) {
             bs->wr_highest_sector = sector_num + nb_sectors - 1;
         }
+
+	if (bs->io_limits != NULL) {
+            bs->io_disps->bytes[1] += (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+            bs->io_disps->ios[1] ++;
+	}
     }
 
     return ret;
diff --git a/block.h b/block.h
index 859d1d9..f0dac62 100644
--- a/block.h
+++ b/block.h
@@ -97,7 +97,6 @@ int bdrv_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
 
-
 typedef struct BdrvCheckResult {
     int corruptions;
     int leaks;
diff --git a/block/blk-queue.c b/block/blk-queue.c
new file mode 100644
index 0000000..6cffb33
--- /dev/null
+++ b/block/blk-queue.c
@@ -0,0 +1,99 @@
+/*
+ * QEMU System Emulator queue for block layer
+ *
+ * Copyright (c) 2011 Zhi Yong Wu  <zwu.kernel@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "block_int.h"
+#include "qemu-queue.h"
+#include "block/blk-queue.h"
+
+/* The APIs for block request queue on qemu block layer.
+ */
+
+BlockQueue *qemu_new_block_queue(BlockQueueHandler *handler)
+{
+    BlockQueue *queue;
+
+    queue = qemu_mallocz(sizeof(BlockQueue));
+
+    queue->handler = handler;
+    QTAILQ_INIT(&queue->requests);
+
+    return queue;
+}
+
+void qemu_del_block_queue(BlockQueue *queue)
+{
+    BlockIORequest *request, *next;
+
+    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
+        QTAILQ_REMOVE(&queue->requests, request, entry);
+        qemu_free(request);
+    }
+
+    qemu_free(queue);
+}
+
+BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
+			BlockDriverState *bs,
+			BlockRequestHandler *handler,
+			int64_t sector_num,
+			QEMUIOVector *qiov,
+			int nb_sectors,
+			BlockDriverCompletionFunc *cb,
+			void *opaque)
+{
+    BlockIORequest *request;
+    BlockDriverAIOCB *acb;
+
+    request = qemu_malloc(sizeof(BlockIORequest));
+    request->bs = bs;
+    request->handler = handler;
+    request->sector_num = sector_num;
+    request->qiov = qiov;
+    request->nb_sectors = nb_sectors;
+    request->cb = cb;
+    request->opaque = opaque;
+
+    QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
+
+    acb = qemu_malloc(sizeof(*acb));
+    acb->bs = bs;
+    acb->cb = cb;
+    acb->opaque = opaque;
+
+    return acb;
+}
+
+int qemu_block_queue_handler(BlockIORequest *request)
+{
+    int ret;
+    BlockDriverAIOCB *res;
+
+    res = request->handler(request->bs, request->sector_num,
+        	                request->qiov, request->nb_sectors,
+				request->cb, request->opaque);
+
+    ret = (res == NULL) ? 0 : 1; 
+
+    return ret;
+}
diff --git a/block/blk-queue.h b/block/blk-queue.h
new file mode 100644
index 0000000..6c400c7
--- /dev/null
+++ b/block/blk-queue.h
@@ -0,0 +1,73 @@
+/*
+ * QEMU System Emulator queue definition for block layer
+ *
+ * Copyright (c) 2011 Zhi Yong Wu  <zwu.kernel@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef QEMU_BLOCK_QUEUE_H
+#define QEMU_BLOCK_QUEUE_H
+
+#include "block.h"
+#include "qemu-queue.h"
+#include "qemu-common.h"
+
+typedef BlockDriverAIOCB* (BlockRequestHandler) (BlockDriverState *bs,
+                                int64_t sector_num, QEMUIOVector *qiov,
+                                int nb_sectors, BlockDriverCompletionFunc *cb,
+                                void *opaque);
+
+struct BlockIORequest {
+    QTAILQ_ENTRY(BlockIORequest) entry;
+    BlockDriverState *bs;
+    BlockRequestHandler *handler;
+    int64_t sector_num;
+    QEMUIOVector *qiov;
+    int nb_sectors;
+    BlockDriverCompletionFunc *cb;
+    void *opaque;
+};
+
+typedef struct BlockIORequest BlockIORequest;
+
+typedef int (BlockQueueHandler) (BlockIORequest *request);
+
+struct BlockQueue {
+    BlockQueueHandler *handler;
+    QTAILQ_HEAD(requests, BlockIORequest) requests;
+};
+
+typedef struct BlockQueue BlockQueue;
+
+BlockQueue *qemu_new_block_queue(BlockQueueHandler *handler);
+
+void qemu_del_block_queue(BlockQueue *queue);
+
+BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
+                        BlockDriverState *bs,
+                        BlockRequestHandler *handler,
+                        int64_t sector_num,
+                        QEMUIOVector *qiov,
+                        int nb_sectors,
+                        BlockDriverCompletionFunc *cb,
+                        void *opaque);
+
+int qemu_block_queue_handler(BlockIORequest *request);
+#endif /* QEMU_BLOCK_QUEUE_H */
diff --git a/block_int.h b/block_int.h
index 1e265d2..a173e89 100644
--- a/block_int.h
+++ b/block_int.h
@@ -27,6 +27,7 @@
 #include "block.h"
 #include "qemu-option.h"
 #include "qemu-queue.h"
+#include "block/blk-queue.h"
 
 #define BLOCK_FLAG_ENCRYPT	1
 #define BLOCK_FLAG_COMPAT6	4
@@ -46,6 +47,16 @@ typedef struct AIOPool {
     BlockDriverAIOCB *free_aiocb;
 } AIOPool;
 
+typedef struct BlockIOLimit {
+    uint64_t bps[3];
+    uint64_t iops[3];
+} BlockIOLimit;
+
+typedef struct BlockIODisp {
+    uint64_t bytes[2];
+    uint64_t ios[2];
+} BlockIODisp;
+
 struct BlockDriver {
     const char *format_name;
     int instance_size;
@@ -175,6 +186,13 @@ struct BlockDriverState {
 
     void *sync_aiocb;
 
+    /* the time for latest disk I/O */
+    int64_t slice_start[2];
+    BlockIOLimit *io_limits;
+    BlockIODisp  *io_disps;
+    BlockQueue   *block_queue;
+    QEMUTimer    *block_timer;
+
     /* I/O stats (display with "info blockstats"). */
     uint64_t rd_bytes;
     uint64_t wr_bytes;
@@ -222,6 +240,9 @@ void qemu_aio_release(void *p);
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
+void bdrv_set_io_limits(BlockDriverState *bs,
+                            BlockIOLimit *io_limits);
+
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
 #endif
diff --git a/blockdev.c b/blockdev.c
index c263663..4d79e7a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -238,6 +238,9 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
     int on_read_error, on_write_error;
     const char *devaddr;
     DriveInfo *dinfo;
+    BlockIOLimit *io_limits = NULL;
+    bool iol_flag = false;
+    const char *iol_opts[7] = {"bps", "bps_rd", "bps_wr", "iops", "iops_rd", "iops_wr"};
     int is_extboot = 0;
     int snapshot = 0;
     int ret;
@@ -372,6 +375,18 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
         return NULL;
     }
 
+    /* disk io limits */
+    iol_flag = qemu_opt_io_limits_enable_flag(opts, iol_opts);
+    if (iol_flag) {
+    	io_limits		= qemu_mallocz(sizeof(*io_limits));
+    	io_limits->bps[2]  	= qemu_opt_get_number(opts, "bps", 0);
+    	io_limits->bps[0] 	= qemu_opt_get_number(opts, "bps_rd", 0);
+    	io_limits->bps[1]  	= qemu_opt_get_number(opts, "bps_wr", 0);
+        io_limits->iops[2]    	= qemu_opt_get_number(opts, "iops", 0);
+        io_limits->iops[0] 	= qemu_opt_get_number(opts, "iops_rd", 0);
+        io_limits->iops[1] 	= qemu_opt_get_number(opts, "iops_wr", 0);
+    }
+
     on_write_error = BLOCK_ERR_STOP_ENOSPC;
     if ((buf = qemu_opt_get(opts, "werror")) != NULL) {
         if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != IF_NONE) {
@@ -483,6 +498,11 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 
     bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error);
 
+    /* throttling disk io limits */
+    if (iol_flag) {
+    	bdrv_set_io_limits(dinfo->bdrv, io_limits);
+    }
+
     switch(type) {
     case IF_IDE:
     case IF_SCSI:
diff --git a/qemu-config.c b/qemu-config.c
index efa892c..ad5f2d9 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -82,6 +82,30 @@ static QemuOptsList qemu_drive_opts = {
             .name = "boot",
             .type = QEMU_OPT_BOOL,
             .help = "make this a boot drive",
+        },{
+            .name = "iops",
+            .type = QEMU_OPT_NUMBER,
+            .help = "cap its iops for this drive",
+        },{
+            .name = "iops_rd",
+            .type = QEMU_OPT_NUMBER,
+            .help = "cap its iops_rd for this drive",
+        },{
+            .name = "iops_wr",
+            .type = QEMU_OPT_NUMBER,
+            .help = "cap its iops_wr for this drive",
+        },{
+            .name = "bps",
+            .type = QEMU_OPT_NUMBER,
+            .help = "cap its bps for this drive",
+        },{
+            .name = "bps_rd",
+            .type = QEMU_OPT_NUMBER,
+            .help = "cap its bps_rd for this drive",
+        },{
+            .name = "bps_wr",
+            .type = QEMU_OPT_NUMBER,
+            .help = "cap its iops_wr for this drive",
         },
         { /* end of list */ }
     },
diff --git a/qemu-option.c b/qemu-option.c
index 65db542..9fe234d 100644
--- a/qemu-option.c
+++ b/qemu-option.c
@@ -562,6 +562,23 @@ uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval)
     return opt->value.uint;
 }
 
+bool qemu_opt_io_limits_enable_flag(QemuOpts *opts, const char **iol_opts)
+{
+     int i;
+     uint64_t opt_val   = 0;
+     bool iol_flag = false;
+
+     for (i = 0; iol_opts[i]; i++) {
+	 opt_val = qemu_opt_get_number(opts, iol_opts[i], 0);
+	 if (opt_val != 0) {
+	     iol_flag = true;
+	     break;
+	 }
+     }
+
+     return iol_flag;
+}
+
 uint64_t qemu_opt_get_size(QemuOpts *opts, const char *name, uint64_t defval)
 {
     QemuOpt *opt = qemu_opt_find(opts, name);
diff --git a/qemu-option.h b/qemu-option.h
index b515813..fc909f9 100644
--- a/qemu-option.h
+++ b/qemu-option.h
@@ -107,6 +107,7 @@ struct QemuOptsList {
 const char *qemu_opt_get(QemuOpts *opts, const char *name);
 int qemu_opt_get_bool(QemuOpts *opts, const char *name, int defval);
 uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval);
+bool qemu_opt_io_limits_enable_flag(QemuOpts *opts, const char **iol_opts);
 uint64_t qemu_opt_get_size(QemuOpts *opts, const char *name, uint64_t defval);
 int qemu_opt_set(QemuOpts *opts, const char *name, const char *value);
 typedef int (*qemu_opt_loopfunc)(const char *name, const char *value, void *opaque);
diff --git a/qemu-options.hx b/qemu-options.hx
index cb3347e..ae219f5 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -121,6 +121,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,cache=writethrough|writeback|none|unsafe][,format=f]\n"
     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
     "       [,readonly=on|off][,boot=on|off]\n"
+    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n"
     "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PATCH v1 1/1] Submit the codes for QEMU disk I/O limits.
@ 2011-07-25  2:22   ` Zhi Yong Wu
  0 siblings, 0 replies; 5+ messages in thread
From: Zhi Yong Wu @ 2011-07-25  2:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, aliguori, stefanha, kvm, zwu.kernel, Zhi Yong Wu, luowenj,
	raharper, vgoyal

Signed-off-by: Zhi Yong Wu <zwu.kernel@gmail.com>
---
 Makefile.objs     |    2 +-
 block.c           |  248 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 block.h           |    1 -
 block/blk-queue.c |   99 +++++++++++++++++++++
 block/blk-queue.h |   73 ++++++++++++++++
 block_int.h       |   21 +++++
 blockdev.c        |   20 +++++
 qemu-config.c     |   24 +++++
 qemu-option.c     |   17 ++++
 qemu-option.h     |    1 +
 qemu-options.hx   |    1 +
 11 files changed, 505 insertions(+), 2 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

diff --git a/Makefile.objs b/Makefile.objs
index 9f99ed4..06f2033 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -23,7 +23,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
-block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
+block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o blk-queue.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block.c b/block.c
index 24a25d5..0b12f08 100644
--- a/block.c
+++ b/block.c
@@ -29,6 +29,9 @@
 #include "module.h"
 #include "qemu-objects.h"
 
+#include "qemu-timer.h"
+#include "block/blk-queue.h"
+
 #ifdef CONFIG_BSD
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -58,6 +61,10 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num,
 static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
                          const uint8_t *buf, int nb_sectors);
 
+bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors, bool is_write, uint64_t *wait);
+bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write, uint64_t *wait);
+bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors, bool is_write, uint64_t *wait);
+
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
     QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
@@ -167,6 +174,31 @@ void path_combine(char *dest, int dest_size,
     }
 }
 
+static void bdrv_block_timer(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BlockQueue *queue = bs->block_queue;
+    uint64_t intval = 1;
+
+    while (!QTAILQ_EMPTY(&queue->requests)) {
+        BlockIORequest *request;
+        int ret;
+
+        request = QTAILQ_FIRST(&queue->requests);
+        QTAILQ_REMOVE(&queue->requests, request, entry);
+
+        ret = queue->handler(request);
+        if (ret == 0) {
+            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
+            break;
+        }
+
+        qemu_free(request);
+    }
+
+    qemu_mod_timer(bs->block_timer, (intval * 1000000000) + qemu_get_clock_ns(vm_clock));
+}
+
 void bdrv_register(BlockDriver *bdrv)
 {
     if (!bdrv->bdrv_aio_readv) {
@@ -476,6 +508,11 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
         goto free_and_fail;
     }
 
+    /* throttling disk I/O limits */
+    if (bs->io_limits != NULL) {
+    	bs->block_queue = qemu_new_block_queue(qemu_block_queue_handler);
+    }
+
 #ifndef _WIN32
     if (bs->is_temporary) {
         unlink(filename);
@@ -642,6 +679,16 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
 
+    /* throttling disk I/O limits */
+    if (bs->io_limits != NULL) {
+	bs->io_disps    = qemu_mallocz(sizeof(BlockIODisp));
+    	bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
+    	qemu_mod_timer(bs->block_timer, qemu_get_clock_ns(vm_clock));
+
+	bs->slice_start[0] = qemu_get_clock_ns(vm_clock);
+	bs->slice_start[1] = qemu_get_clock_ns(vm_clock);
+    }
+
     return 0;
 
 unlink_and_fail:
@@ -680,6 +727,15 @@ void bdrv_close(BlockDriverState *bs)
         if (bs->change_cb)
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
+
+    /* throttling disk I/O limits */
+    if (bs->block_queue) {
+	qemu_del_block_queue(bs->block_queue);
+    }
+
+    if (bs->block_timer) {
+        qemu_free_timer(bs->block_timer);
+    }
 }
 
 void bdrv_close_all(void)
@@ -1312,6 +1368,13 @@ void bdrv_get_geometry_hint(BlockDriverState *bs,
     *psecs = bs->secs;
 }
 
+/* throttling disk io limits */
+void bdrv_set_io_limits(BlockDriverState *bs,
+			    BlockIOLimit *io_limits)
+{
+    bs->io_limits	      = io_limits;
+}
+
 /* Recognize floppy formats */
 typedef struct FDFormat {
     FDriveType drive;
@@ -2111,6 +2174,154 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn)
     return buf;
 }
 
+bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+                           bool is_write, uint64_t *wait) {
+    int64_t  real_time;
+    uint64_t bps_limit   = 0;
+    double   bytes_limit, bytes_disp, bytes_res, elapsed_time;
+    double   slice_time = 0.1, wait_time;
+
+    if (bs->io_limits->bps[2]) {
+        bps_limit = bs->io_limits->bps[2];
+    } else if (bs->io_limits->bps[is_write]) {
+        bps_limit = bs->io_limits->bps[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    real_time = qemu_get_clock_ns(vm_clock);
+    if (bs->slice_start[is_write] + 100000000 <= real_time) {
+        bs->slice_start[is_write] = real_time;
+
+        bs->io_disps->bytes[is_write] = 0;
+        if (bs->io_limits->bps[2]) {
+            bs->io_disps->bytes[!is_write] = 0;
+        }
+    }
+
+    elapsed_time  = (real_time - bs->slice_start[is_write]) / 1000000000.0;
+    fprintf(stderr, "real_time = %ld, slice_start = %ld, elapsed_time = %g\n", real_time, bs->slice_start[is_write], elapsed_time);
+
+    bytes_limit	= bps_limit * slice_time;
+    bytes_disp  = bs->io_disps->bytes[is_write];
+    if (bs->io_limits->bps[2]) {
+        bytes_disp += bs->io_disps->bytes[!is_write];
+    }
+
+    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+    fprintf(stderr, "bytes_limit = %g bytes_disp = %g, bytes_res = %g, elapsed_time = %g\n", bytes_limit, bytes_disp, bytes_res, elapsed_time);
+
+    if (bytes_disp + bytes_res <= bytes_limit) {
+        if (wait) {
+            *wait = 0;
+        }
+
+    	fprintf(stderr, "bytes_disp + bytes_res <= bytes_limit\n");
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (bytes_disp + bytes_res - bytes_limit) / bps_limit;
+    if (!wait_time) {
+        wait_time = 1;
+    }
+
+    wait_time = wait_time + (slice_time - elapsed_time);
+    if (wait) {
+	*wait = wait_time * 1000000000 + 1;
+    }
+
+    return true;
+}
+
+bool bdrv_exceed_iops_limits(BlockDriverState *bs,
+                           bool is_write, uint64_t *wait) {
+    int64_t  real_time;
+    uint64_t iops_limit   = 0;
+    double   ios_limit, ios_disp, elapsed_time;
+    double   slice_time = 0.1, wait_time;
+
+    if (bs->io_limits->iops[2]) {
+        iops_limit = bs->io_limits->iops[2];
+    } else if (bs->io_limits->iops[is_write]) {
+        iops_limit = bs->io_limits->iops[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    real_time = qemu_get_clock_ns(vm_clock);
+    if (bs->slice_start[is_write] + 100000000 <= real_time) {
+        bs->slice_start[is_write] = real_time;
+
+        bs->io_disps->ios[is_write] = 0;
+        if (bs->io_limits->iops[2]) {
+            bs->io_disps->ios[!is_write] = 0;
+        }
+    }
+
+    elapsed_time  = (real_time - bs->slice_start[is_write]) / 1000000000.0;
+    fprintf(stderr, "real_time = %ld, slice_start = %ld, elapsed_time = %g\n", real_time, bs->slice_start[is_write], elapsed_time);
+
+    ios_limit = iops_limit * slice_time;
+    ios_disp  = bs->io_disps->ios[is_write];
+    if (bs->io_limits->iops[2]) {
+        ios_disp += bs->io_disps->ios[!is_write];
+    }
+    fprintf(stderr, "iops_limit = %ld ios_disp = %g, elapsed_time = %g\n", iops_limit, ios_disp, elapsed_time);
+
+    if (ios_disp + 1 <= ios_limit) {
+	if (wait) {
+	    *wait = 0;
+	}
+
+        fprintf(stderr, "ios_disp + 1 <= ios_limit\n");
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (ios_disp + 1) / iops_limit;
+    if (wait_time > elapsed_time) {
+	wait_time = wait_time - elapsed_time;
+    } else {
+	wait_time = 0;
+    }
+
+    if (wait) {
+        *wait = wait_time * 1000000000 + 1;
+    }
+
+    return true;
+}
+
+bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+                           bool is_write, uint64_t *wait) {
+    uint64_t bps_wait = 0, iops_wait = 0, max_wait;
+
+    if (bdrv_exceed_bps_limits(bs, nb_sectors, is_write, &bps_wait)
+        || bdrv_exceed_iops_limits(bs, is_write, &iops_wait)) {
+        max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
+        if (wait) {
+            *wait = max_wait;
+        }
+
+        fprintf(stderr, "bdrv_exceed_io_limits: wait = %ld\n", *wait);
+        return true;
+    }
+
+    if (wait) {
+        *wait = 0;
+    }
+
+    return false;
+}
 
 /**************************************************************/
 /* async I/Os */
@@ -2121,6 +2332,7 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
 {
     BlockDriver *drv = bs->drv;
     BlockDriverAIOCB *ret;
+    uint64_t wait_time = 0;
 
     trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
 
@@ -2129,6 +2341,19 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
     if (bdrv_check_request(bs, sector_num, nb_sectors))
         return NULL;
 
+    /* throttling disk read I/O */
+    if (bs->io_limits != NULL) {
+    	if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
+            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
+				sector_num, qiov, nb_sectors, cb, opaque);
+    	    fprintf(stderr, "bdrv_aio_readv: wait_time = %ld, timer value = %ld\n", wait_time, wait_time + qemu_get_clock_ns(vm_clock));
+    	    qemu_mod_timer(bs->block_timer, wait_time + qemu_get_clock_ns(vm_clock));
+            
+	    return ret;
+	}
+    }
+
+
     ret = drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
                               cb, opaque);
 
@@ -2136,6 +2361,11 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
 	/* Update stats even though technically transfer has not happened. */
 	bs->rd_bytes += (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
 	bs->rd_ops ++;
+
+	if (bs->io_limits != NULL) {
+	    bs->io_disps->bytes[0] += (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+	    bs->io_disps->ios[0] ++;
+	}
     }
 
     return ret;
@@ -2184,6 +2414,7 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
     BlockDriver *drv = bs->drv;
     BlockDriverAIOCB *ret;
     BlockCompleteData *blk_cb_data;
+    uint64_t wait_time = 0;
 
     trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
 
@@ -2201,6 +2432,18 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
         opaque = blk_cb_data;
     }
 
+    /* throttling disk write I/O */
+    if (bs->io_limits != NULL) {
+    	if (bdrv_exceed_io_limits(bs, nb_sectors, true, &wait_time)) {
+	    ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_writev,
+				sector_num, qiov, nb_sectors, cb, opaque);
+    	    fprintf(stderr, "bdrv_aio_writev: wait_time = %ld, timer value = %ld\n", wait_time, wait_time + qemu_get_clock_ns(vm_clock));
+    	    qemu_mod_timer(bs->block_timer, wait_time + qemu_get_clock_ns(vm_clock));
+            return ret;
+	}
+    }
+
+
     ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
                                cb, opaque);
 
@@ -2211,6 +2454,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
         if (bs->wr_highest_sector < sector_num + nb_sectors - 1) {
             bs->wr_highest_sector = sector_num + nb_sectors - 1;
         }
+
+	if (bs->io_limits != NULL) {
+            bs->io_disps->bytes[1] += (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+            bs->io_disps->ios[1] ++;
+	}
     }
 
     return ret;
diff --git a/block.h b/block.h
index 859d1d9..f0dac62 100644
--- a/block.h
+++ b/block.h
@@ -97,7 +97,6 @@ int bdrv_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
 
-
 typedef struct BdrvCheckResult {
     int corruptions;
     int leaks;
diff --git a/block/blk-queue.c b/block/blk-queue.c
new file mode 100644
index 0000000..6cffb33
--- /dev/null
+++ b/block/blk-queue.c
@@ -0,0 +1,99 @@
+/*
+ * QEMU System Emulator queue for block layer
+ *
+ * Copyright (c) 2011 Zhi Yong Wu  <zwu.kernel@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "block_int.h"
+#include "qemu-queue.h"
+#include "block/blk-queue.h"
+
+/* The APIs for block request queue on qemu block layer.
+ */
+
+BlockQueue *qemu_new_block_queue(BlockQueueHandler *handler)
+{
+    BlockQueue *queue;
+
+    queue = qemu_mallocz(sizeof(BlockQueue));
+
+    queue->handler = handler;
+    QTAILQ_INIT(&queue->requests);
+
+    return queue;
+}
+
+void qemu_del_block_queue(BlockQueue *queue)
+{
+    BlockIORequest *request, *next;
+
+    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
+        QTAILQ_REMOVE(&queue->requests, request, entry);
+        qemu_free(request);
+    }
+
+    qemu_free(queue);
+}
+
+BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
+			BlockDriverState *bs,
+			BlockRequestHandler *handler,
+			int64_t sector_num,
+			QEMUIOVector *qiov,
+			int nb_sectors,
+			BlockDriverCompletionFunc *cb,
+			void *opaque)
+{
+    BlockIORequest *request;
+    BlockDriverAIOCB *acb;
+
+    request = qemu_malloc(sizeof(BlockIORequest));
+    request->bs = bs;
+    request->handler = handler;
+    request->sector_num = sector_num;
+    request->qiov = qiov;
+    request->nb_sectors = nb_sectors;
+    request->cb = cb;
+    request->opaque = opaque;
+
+    QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
+
+    acb = qemu_malloc(sizeof(*acb));
+    acb->bs = bs;
+    acb->cb = cb;
+    acb->opaque = opaque;
+
+    return acb;
+}
+
+int qemu_block_queue_handler(BlockIORequest *request)
+{
+    int ret;
+    BlockDriverAIOCB *res;
+
+    res = request->handler(request->bs, request->sector_num,
+        	                request->qiov, request->nb_sectors,
+				request->cb, request->opaque);
+
+    ret = (res == NULL) ? 0 : 1; 
+
+    return ret;
+}
diff --git a/block/blk-queue.h b/block/blk-queue.h
new file mode 100644
index 0000000..6c400c7
--- /dev/null
+++ b/block/blk-queue.h
@@ -0,0 +1,73 @@
+/*
+ * QEMU System Emulator queue definition for block layer
+ *
+ * Copyright (c) 2011 Zhi Yong Wu  <zwu.kernel@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef QEMU_BLOCK_QUEUE_H
+#define QEMU_BLOCK_QUEUE_H
+
+#include "block.h"
+#include "qemu-queue.h"
+#include "qemu-common.h"
+
+typedef BlockDriverAIOCB* (BlockRequestHandler) (BlockDriverState *bs,
+                                int64_t sector_num, QEMUIOVector *qiov,
+                                int nb_sectors, BlockDriverCompletionFunc *cb,
+                                void *opaque);
+
+struct BlockIORequest {
+    QTAILQ_ENTRY(BlockIORequest) entry;
+    BlockDriverState *bs;
+    BlockRequestHandler *handler;
+    int64_t sector_num;
+    QEMUIOVector *qiov;
+    int nb_sectors;
+    BlockDriverCompletionFunc *cb;
+    void *opaque;
+};
+
+typedef struct BlockIORequest BlockIORequest;
+
+typedef int (BlockQueueHandler) (BlockIORequest *request);
+
+struct BlockQueue {
+    BlockQueueHandler *handler;
+    QTAILQ_HEAD(requests, BlockIORequest) requests;
+};
+
+typedef struct BlockQueue BlockQueue;
+
+BlockQueue *qemu_new_block_queue(BlockQueueHandler *handler);
+
+void qemu_del_block_queue(BlockQueue *queue);
+
+BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
+                        BlockDriverState *bs,
+                        BlockRequestHandler *handler,
+                        int64_t sector_num,
+                        QEMUIOVector *qiov,
+                        int nb_sectors,
+                        BlockDriverCompletionFunc *cb,
+                        void *opaque);
+
+int qemu_block_queue_handler(BlockIORequest *request);
+#endif /* QEMU_BLOCK_QUEUE_H */
diff --git a/block_int.h b/block_int.h
index 1e265d2..a173e89 100644
--- a/block_int.h
+++ b/block_int.h
@@ -27,6 +27,7 @@
 #include "block.h"
 #include "qemu-option.h"
 #include "qemu-queue.h"
+#include "block/blk-queue.h"
 
 #define BLOCK_FLAG_ENCRYPT	1
 #define BLOCK_FLAG_COMPAT6	4
@@ -46,6 +47,16 @@ typedef struct AIOPool {
     BlockDriverAIOCB *free_aiocb;
 } AIOPool;
 
+typedef struct BlockIOLimit {
+    uint64_t bps[3];
+    uint64_t iops[3];
+} BlockIOLimit;
+
+typedef struct BlockIODisp {
+    uint64_t bytes[2];
+    uint64_t ios[2];
+} BlockIODisp;
+
 struct BlockDriver {
     const char *format_name;
     int instance_size;
@@ -175,6 +186,13 @@ struct BlockDriverState {
 
     void *sync_aiocb;
 
+    /* the time for latest disk I/O */
+    int64_t slice_start[2];
+    BlockIOLimit *io_limits;
+    BlockIODisp  *io_disps;
+    BlockQueue   *block_queue;
+    QEMUTimer    *block_timer;
+
     /* I/O stats (display with "info blockstats"). */
     uint64_t rd_bytes;
     uint64_t wr_bytes;
@@ -222,6 +240,9 @@ void qemu_aio_release(void *p);
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
+void bdrv_set_io_limits(BlockDriverState *bs,
+                            BlockIOLimit *io_limits);
+
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
 #endif
diff --git a/blockdev.c b/blockdev.c
index c263663..4d79e7a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -238,6 +238,9 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
     int on_read_error, on_write_error;
     const char *devaddr;
     DriveInfo *dinfo;
+    BlockIOLimit *io_limits = NULL;
+    bool iol_flag = false;
+    const char *iol_opts[7] = {"bps", "bps_rd", "bps_wr", "iops", "iops_rd", "iops_wr"};
     int is_extboot = 0;
     int snapshot = 0;
     int ret;
@@ -372,6 +375,18 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
         return NULL;
     }
 
+    /* disk io limits */
+    iol_flag = qemu_opt_io_limits_enable_flag(opts, iol_opts);
+    if (iol_flag) {
+    	io_limits		= qemu_mallocz(sizeof(*io_limits));
+    	io_limits->bps[2]  	= qemu_opt_get_number(opts, "bps", 0);
+    	io_limits->bps[0] 	= qemu_opt_get_number(opts, "bps_rd", 0);
+    	io_limits->bps[1]  	= qemu_opt_get_number(opts, "bps_wr", 0);
+        io_limits->iops[2]    	= qemu_opt_get_number(opts, "iops", 0);
+        io_limits->iops[0] 	= qemu_opt_get_number(opts, "iops_rd", 0);
+        io_limits->iops[1] 	= qemu_opt_get_number(opts, "iops_wr", 0);
+    }
+
     on_write_error = BLOCK_ERR_STOP_ENOSPC;
     if ((buf = qemu_opt_get(opts, "werror")) != NULL) {
         if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != IF_NONE) {
@@ -483,6 +498,11 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 
     bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error);
 
+    /* throttling disk io limits */
+    if (iol_flag) {
+    	bdrv_set_io_limits(dinfo->bdrv, io_limits);
+    }
+
     switch(type) {
     case IF_IDE:
     case IF_SCSI:
diff --git a/qemu-config.c b/qemu-config.c
index efa892c..ad5f2d9 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -82,6 +82,30 @@ static QemuOptsList qemu_drive_opts = {
             .name = "boot",
             .type = QEMU_OPT_BOOL,
             .help = "make this a boot drive",
+        },{
+            .name = "iops",
+            .type = QEMU_OPT_NUMBER,
+            .help = "cap its iops for this drive",
+        },{
+            .name = "iops_rd",
+            .type = QEMU_OPT_NUMBER,
+            .help = "cap its iops_rd for this drive",
+        },{
+            .name = "iops_wr",
+            .type = QEMU_OPT_NUMBER,
+            .help = "cap its iops_wr for this drive",
+        },{
+            .name = "bps",
+            .type = QEMU_OPT_NUMBER,
+            .help = "cap its bps for this drive",
+        },{
+            .name = "bps_rd",
+            .type = QEMU_OPT_NUMBER,
+            .help = "cap its bps_rd for this drive",
+        },{
+            .name = "bps_wr",
+            .type = QEMU_OPT_NUMBER,
+            .help = "cap its iops_wr for this drive",
         },
         { /* end of list */ }
     },
diff --git a/qemu-option.c b/qemu-option.c
index 65db542..9fe234d 100644
--- a/qemu-option.c
+++ b/qemu-option.c
@@ -562,6 +562,23 @@ uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval)
     return opt->value.uint;
 }
 
+bool qemu_opt_io_limits_enable_flag(QemuOpts *opts, const char **iol_opts)
+{
+     int i;
+     uint64_t opt_val   = 0;
+     bool iol_flag = false;
+
+     for (i = 0; iol_opts[i]; i++) {
+	 opt_val = qemu_opt_get_number(opts, iol_opts[i], 0);
+	 if (opt_val != 0) {
+	     iol_flag = true;
+	     break;
+	 }
+     }
+
+     return iol_flag;
+}
+
 uint64_t qemu_opt_get_size(QemuOpts *opts, const char *name, uint64_t defval)
 {
     QemuOpt *opt = qemu_opt_find(opts, name);
diff --git a/qemu-option.h b/qemu-option.h
index b515813..fc909f9 100644
--- a/qemu-option.h
+++ b/qemu-option.h
@@ -107,6 +107,7 @@ struct QemuOptsList {
 const char *qemu_opt_get(QemuOpts *opts, const char *name);
 int qemu_opt_get_bool(QemuOpts *opts, const char *name, int defval);
 uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval);
+bool qemu_opt_io_limits_enable_flag(QemuOpts *opts, const char **iol_opts);
 uint64_t qemu_opt_get_size(QemuOpts *opts, const char *name, uint64_t defval);
 int qemu_opt_set(QemuOpts *opts, const char *name, const char *value);
 typedef int (*qemu_opt_loopfunc)(const char *name, const char *value, void *opaque);
diff --git a/qemu-options.hx b/qemu-options.hx
index cb3347e..ae219f5 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -121,6 +121,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,cache=writethrough|writeback|none|unsafe][,format=f]\n"
     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
     "       [,readonly=on|off][,boot=on|off]\n"
+    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n"
     "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v1 0/1] The draft of the codes for QEMU disk I/O limits
@ 2011-07-22  9:20 Zhi Yong Wu
  0 siblings, 0 replies; 5+ messages in thread
From: Zhi Yong Wu @ 2011-07-22  9:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, aliguori, stefanha, raharper, kwolf, vgoyal, zwu.kernel,
	luowenj, Zhi Yong Wu

The main goal of the patch is to effectively cap the disk I/O speed or counts of one single VM.
It is only one draft, so it unavoidably has some drawbacks, if you catch them, please let me know.

The patch will mainly introduce one global timer and one block queue for each I/O limits enabled drive.
When a block request is coming in, a block I/O throttling algorithm will check if its I/O rate or counts exceed the limits; if yes, then it will enqueue to the block queue; The timer will periodically handle the I/O requests in it.

Some available features follow as below:
(1) global bps limit.
    -drive bps=xxx            in bytes/s
(2) only read bps limit
    -drive bps_rd=xxx         in bytes/s
(3) only write bps limit
    -drive bps_wr=xxx         in bytes/s
(4) global iops limit
    -drive iops=xxx           in ios/s
(5) only read iops limit
    -drive iops_rd=xxx        in ios/s
(6) only write iops limit
    -drive iops_wr=xxx        in ios/s
(7) the combination of some limits.
    -drive bps=xxx,iops=xxx 

Known Limitations:
(1) #1 can not coexist with #2, #3 
(2) #4 can not coexist with #5, #6 

Zhi Yong Wu (1):
  Submit the codes for QEMU disk I/O limits.

 Makefile.objs     |    2 +-
 block.c           |  248 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 block.h           |    1 -
 block/blk-queue.c |   99 +++++++++++++++++++++
 block/blk-queue.h |   73 ++++++++++++++++
 block_int.h       |   21 +++++
 blockdev.c        |   20 +++++
 qemu-config.c     |   24 +++++
 qemu-option.c     |   17 ++++
 qemu-option.h     |    1 +
 qemu-options.hx   |    1 +
 11 files changed, 505 insertions(+), 2 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

-- 
1.7.2.3


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-07-25  2:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-25  2:22 [PATCH v1 0/1] The draft of the codes for QEMU disk I/O limits Zhi Yong Wu
2011-07-25  2:22 ` [Qemu-devel] " Zhi Yong Wu
2011-07-25  2:22 ` [PATCH v1 1/1] Submit " Zhi Yong Wu
2011-07-25  2:22   ` [Qemu-devel] " Zhi Yong Wu
  -- strict thread matches above, loose matches on Subject: below --
2011-07-22  9:20 [PATCH v1 0/1] The draft of " Zhi Yong Wu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.