* [PATCH v6 0/4] The intro of QEMU block I/O throttling
From: Zhi Yong Wu @ 2011-09-01 11:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, stefanha, mtosatti, aliguori, ryanh, zwu.kernel, kwolf,
	pair, Zhi Yong Wu

The main goal of this patch series is to effectively cap the disk I/O bandwidth and request rate of a single VM. It is still a draft, so it unavoidably has some drawbacks; if you spot any, please let me know.

The series introduces a block I/O throttling algorithm, together with one timer and one block queue for each drive that has I/O limits enabled.

When a block request comes in, the throttling algorithm checks whether the drive's I/O rate or request count exceeds its limits; if so, the request is enqueued on the block queue, and the timer later dispatches the queued requests.
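
In code terms, the per-request decision made in bdrv_aio_readv()/bdrv_aio_writev() looks roughly like the sketch below (simplified from patch 3/4; "handler", "is_write", "acb" and "wait_time" stand in for the read- or write-specific values declared there):

    if (bs->io_limits_enabled) {
        if (bdrv_exceed_io_limits(bs, nb_sectors, is_write, &wait_time)) {
            /* over the limit: park the request and (re)arm the block timer */
            acb = qemu_block_queue_enqueue(bs->block_queue, bs, handler,
                                           sector_num, qiov, nb_sectors,
                                           cb, opaque);
            qemu_mod_timer(bs->block_timer,
                           wait_time + qemu_get_clock_ns(vm_clock));
            return acb;
        }

        /* under the limit: account for the I/O and dispatch it right away */
        bs->io_disps.bytes[is_write] += (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
        bs->io_disps.ios[is_write]++;
    }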

The following limits are available:
(1) total bps limit
   -drive bps=xxx            in bytes/s
(2) bps limit for reads only
   -drive bps_rd=xxx         in bytes/s
(3) bps limit for writes only
   -drive bps_wr=xxx         in bytes/s
(4) total iops limit
   -drive iops=xxx           in ios/s
(5) iops limit for reads only
   -drive iops_rd=xxx        in ios/s
(6) iops limit for writes only
   -drive iops_wr=xxx        in ios/s
(7) combinations of the above limits
   -drive bps=xxx,iops=xxx
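
For example, a hypothetical invocation that caps one drive at 1 MB/s of total bandwidth and 100 total operations per second (the file name and values are only illustrative) would be:

   -drive file=disk.img,if=virtio,bps=1000000,iops=100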

Known limitations:
(1) #1 cannot coexist with #2 or #3.
(2) #4 cannot coexist with #5 or #6.
(3) When a bps/iops limit is set to a small value such as 511 bytes/s, the VM will hang. We are still considering how to handle this scenario.

Changes since v5:
  Mainly fix the AIO callback issue for the block queue.
  Adjust the code based on Ram Pai's comments.

Zhi Yong Wu (4):
  block: add the command line support
  block: add the block queue support
  block: add block timer and block throttling algorithm
  qmp/hmp: add block_set_io_throttle

 v5: add QMP/HMP support.
     Adjust the code based on Stefan's comments.
     qmp/hmp: add block_set_io_throttle

 v4: fix a memory leak based on Ryan's feedback.

 v3: add the code for extending the slice time, and modify the method used to compute the wait time for the timer.

 v2: the v2 code for QEMU disk I/O limits.
     Modify the code mainly based on Stefan's comments.

 v1: initial submission of the QEMU disk I/O limits code.
     Only a code draft.

 Makefile.objs     |    2 +-
 block.c           |  324 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 block.h           |    6 +-
 block/blk-queue.c |  226 +++++++++++++++++++++++++++++++++++++
 block/blk-queue.h |   63 ++++++++++
 block_int.h       |   30 +++++
 blockdev.c        |   98 ++++++++++++++++
 blockdev.h        |    2 +
 hmp-commands.hx   |   15 +++
 qemu-config.c     |   24 ++++
 qemu-options.hx   |    1 +
 qerror.c          |    4 +
 qerror.h          |    3 +
 qmp-commands.hx   |   52 +++++++++-
 14 files changed, 837 insertions(+), 13 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

-- 
1.7.6



* [PATCH v6 1/4] block: add the command line support
From: Zhi Yong Wu @ 2011-09-01 11:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, stefanha, mtosatti, aliguori, ryanh, zwu.kernel, kwolf,
	pair, Zhi Yong Wu

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 block.c         |    8 ++++++++
 block_int.h     |   21 +++++++++++++++++++++
 blockdev.c      |   29 +++++++++++++++++++++++++++++
 qemu-config.c   |   24 ++++++++++++++++++++++++
 qemu-options.hx |    1 +
 5 files changed, 83 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 03a21d8..17ee3df 100644
--- a/block.c
+++ b/block.c
@@ -1453,6 +1453,14 @@ void bdrv_get_geometry_hint(BlockDriverState *bs,
     *psecs = bs->secs;
 }
 
+/* throttling disk io limits */
+void bdrv_set_io_limits(BlockDriverState *bs,
+                            BlockIOLimit *io_limits)
+{
+    bs->io_limits = *io_limits;
+    bs->io_limits_enabled = bdrv_io_limits_enabled(bs);
+}
+
 /* Recognize floppy formats */
 typedef struct FDFormat {
     FDriveType drive;
diff --git a/block_int.h b/block_int.h
index 8a72b80..e7bda5b 100644
--- a/block_int.h
+++ b/block_int.h
@@ -29,10 +29,18 @@
 #include "qemu-queue.h"
 #include "qemu-coroutine.h"
 #include "qemu-timer.h"
+#include "block/blk-queue.h"
 
 #define BLOCK_FLAG_ENCRYPT	1
 #define BLOCK_FLAG_COMPAT6	4
 
+#define BLOCK_IO_LIMIT_READ     0
+#define BLOCK_IO_LIMIT_WRITE    1
+#define BLOCK_IO_LIMIT_TOTAL    2
+
+#define BLOCK_IO_SLICE_TIME     100000000
+#define NANOSECONDS_PER_SECOND  1000000000.0
+
 #define BLOCK_OPT_SIZE          "size"
 #define BLOCK_OPT_ENCRYPT       "encryption"
 #define BLOCK_OPT_COMPAT6       "compat6"
@@ -49,6 +57,16 @@ typedef struct AIOPool {
     BlockDriverAIOCB *free_aiocb;
 } AIOPool;
 
+typedef struct BlockIOLimit {
+    uint64_t bps[3];
+    uint64_t iops[3];
+} BlockIOLimit;
+
+typedef struct BlockIODisp {
+    uint64_t bytes[2];
+    uint64_t ios[2];
+} BlockIODisp;
+
 struct BlockDriver {
     const char *format_name;
     int instance_size;
@@ -230,6 +248,9 @@ void qemu_aio_release(void *p);
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
+void bdrv_set_io_limits(BlockDriverState *bs,
+                            BlockIOLimit *io_limits);
+
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
 #endif
diff --git a/blockdev.c b/blockdev.c
index 2602591..619ae9f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -236,6 +236,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
     int on_read_error, on_write_error;
     const char *devaddr;
     DriveInfo *dinfo;
+    BlockIOLimit io_limits;
     int snapshot = 0;
     int ret;
 
@@ -354,6 +355,31 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
         }
     }
 
+    /* disk I/O throttling */
+    io_limits.bps[BLOCK_IO_LIMIT_TOTAL]  =
+                           qemu_opt_get_number(opts, "bps", 0);
+    io_limits.bps[BLOCK_IO_LIMIT_READ]   =
+                           qemu_opt_get_number(opts, "bps_rd", 0);
+    io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
+                           qemu_opt_get_number(opts, "bps_wr", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
+                           qemu_opt_get_number(opts, "iops", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_READ]  =
+                           qemu_opt_get_number(opts, "iops_rd", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
+                           qemu_opt_get_number(opts, "iops_wr", 0);
+
+    if (((io_limits.bps[BLOCK_IO_LIMIT_TOTAL] != 0)
+            && ((io_limits.bps[BLOCK_IO_LIMIT_READ] != 0)
+            || (io_limits.bps[BLOCK_IO_LIMIT_WRITE] != 0)))
+            || ((io_limits.iops[BLOCK_IO_LIMIT_TOTAL] != 0)
+            && ((io_limits.iops[BLOCK_IO_LIMIT_READ] != 0)
+            || (io_limits.iops[BLOCK_IO_LIMIT_WRITE] != 0)))) {
+        error_report("bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr)"
+                     "cannot be used at the same time");
+        return NULL;
+    }
+
     on_write_error = BLOCK_ERR_STOP_ENOSPC;
     if ((buf = qemu_opt_get(opts, "werror")) != NULL) {
         if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != IF_NONE) {
@@ -461,6 +487,9 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 
     bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error);
 
+    /* disk I/O throttling */
+    bdrv_set_io_limits(dinfo->bdrv, &io_limits);
+
     switch(type) {
     case IF_IDE:
     case IF_SCSI:
diff --git a/qemu-config.c b/qemu-config.c
index 139e077..abcf476 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -85,6 +85,30 @@ static QemuOptsList qemu_drive_opts = {
             .name = "readonly",
             .type = QEMU_OPT_BOOL,
             .help = "open drive file as read-only",
+        },{
+            .name = "iops",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit total I/O operations per second",
+        },{
+            .name = "iops_rd",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit read operations per second",
+        },{
+            .name = "iops_wr",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit write operations per second",
+        },{
+            .name = "bps",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit total bytes per second",
+        },{
+            .name = "bps_rd",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit read bytes per second",
+        },{
+            .name = "bps_wr",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit write bytes per second",
         },
         { /* end of list */ }
     },
diff --git a/qemu-options.hx b/qemu-options.hx
index 35d95d1..bbd15dd 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -136,6 +136,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
     "       [,readonly=on|off]\n"
+    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n"
     "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
-- 
1.7.6
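
As an illustration of the option validation added to drive_init() above (file name and values are hypothetical), a total limit combined with a per-direction limit of the same type is rejected:

   -drive file=disk.img,if=virtio,bps=1000000,bps_wr=500000

fails with "bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr) cannot be used at the same time", whereas mixing different limit types, for example bps=... together with iops_rd=..., is accepted.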



* [PATCH v6 2/4] block: add the block queue support
From: Zhi Yong Wu @ 2011-09-01 11:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, stefanha, mtosatti, aliguori, ryanh, zwu.kernel, kwolf,
	pair, Zhi Yong Wu

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 Makefile.objs     |    2 +-
 block/blk-queue.c |  226 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/blk-queue.h |   63 +++++++++++++++
 3 files changed, 290 insertions(+), 1 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

diff --git a/Makefile.objs b/Makefile.objs
index d1f3e5d..96a7323 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -33,7 +33,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
-block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
+block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o blk-queue.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block/blk-queue.c b/block/blk-queue.c
new file mode 100644
index 0000000..eac824c
--- /dev/null
+++ b/block/blk-queue.c
@@ -0,0 +1,226 @@
+/*
+ * QEMU System Emulator queue definition for block layer
+ *
+ * Copyright (c) IBM, Corp. 2011
+ *
+ * Authors:
+ *  Zhi Yong Wu  <zwu.kernel@gmail.com>
+ *  Stefan Hajnoczi <stefanha@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "block_int.h"
+#include "block/blk-queue.h"
+#include "qemu-common.h"
+
+/* The API for the block request queue in the QEMU block layer.
+ */
+
+struct BlockIORequest {
+    QTAILQ_ENTRY(BlockIORequest) entry;
+    BlockDriverState *bs;
+    BlockRequestHandler *handler;
+    int64_t sector_num;
+    QEMUIOVector *qiov;
+    int nb_sectors;
+    BlockDriverCompletionFunc *cb;
+    void *opaque;
+    BlockDriverAIOCB *acb;
+};
+
+struct BlockQueue {
+    QTAILQ_HEAD(requests, BlockIORequest) requests;
+    BlockIORequest *request;
+    bool flushing;
+};
+
+struct BlockQueueAIOCB {
+    BlockDriverAIOCB common;
+    BlockDriverCompletionFunc *real_cb;
+    BlockDriverAIOCB *real_acb;
+    void *opaque;
+    BlockIORequest *request;
+};
+
+static void qemu_block_queue_dequeue(BlockQueue *queue, BlockIORequest *request)
+{
+    BlockIORequest *req;
+
+    /* remove the given request from the queue, if it is still queued */
+    QTAILQ_FOREACH(req, &queue->requests, entry) {
+        if (req == request) {
+            QTAILQ_REMOVE(&queue->requests, req, entry);
+            break;
+        }
+    }
+}
+
+static void qemu_block_queue_cancel(BlockDriverAIOCB *acb)
+{
+    BlockQueueAIOCB *blkacb = container_of(acb, BlockQueueAIOCB, common);
+    if (blkacb->real_acb) {
+        bdrv_aio_cancel(blkacb->real_acb);
+    } else {
+        assert(blkacb->common.bs->block_queue);
+        qemu_block_queue_dequeue(blkacb->common.bs->block_queue,
+                                 blkacb->request);
+    }
+
+    qemu_aio_release(blkacb);
+}
+
+static AIOPool block_queue_pool = {
+    .aiocb_size         = sizeof(struct BlockQueueAIOCB),
+    .cancel             = qemu_block_queue_cancel,
+};
+
+static void qemu_block_queue_callback(void *opaque, int ret)
+{
+    BlockQueueAIOCB *acb = opaque;
+
+    if (acb->real_cb) {
+        acb->real_cb(acb->opaque, ret);
+    }
+
+    qemu_aio_release(acb);
+}
+
+BlockQueue *qemu_new_block_queue(void)
+{
+    BlockQueue *queue;
+
+    queue = g_malloc0(sizeof(BlockQueue));
+
+    QTAILQ_INIT(&queue->requests);
+
+    queue->flushing = false;
+    queue->request  = NULL;
+
+    return queue;
+}
+
+void qemu_del_block_queue(BlockQueue *queue)
+{
+    BlockIORequest *request, *next;
+
+    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
+        QTAILQ_REMOVE(&queue->requests, request, entry);
+        g_free(request);
+    }
+
+    g_free(queue);
+}
+
+BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
+                        BlockDriverState *bs,
+                        BlockRequestHandler *handler,
+                        int64_t sector_num,
+                        QEMUIOVector *qiov,
+                        int nb_sectors,
+                        BlockDriverCompletionFunc *cb,
+                        void *opaque)
+{
+    BlockIORequest *request;
+    BlockDriverAIOCB *acb;
+    BlockQueueAIOCB *blkacb;
+
+    if (queue->request) {
+        request             = queue->request;
+        assert(request->acb);
+        acb                 = request->acb;
+        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
+        queue->request      = NULL;
+    } else {
+        request             = g_malloc0(sizeof(BlockIORequest));
+        request->bs         = bs;
+        request->handler    = handler;
+        request->sector_num = sector_num;
+        request->qiov       = qiov;
+        request->nb_sectors = nb_sectors;
+        request->cb         = qemu_block_queue_callback;
+        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
+
+        acb = qemu_aio_get(&block_queue_pool, bs,
+                           qemu_block_queue_callback, opaque);
+        blkacb = container_of(acb, BlockQueueAIOCB, common);
+        blkacb->real_cb     = cb;
+        blkacb->real_acb    = NULL;
+        blkacb->opaque      = opaque;
+        blkacb->request     = request;
+
+        request->acb        = acb;
+        request->opaque = (void *)blkacb;
+    }
+
+    return acb;
+}
+
+static int qemu_block_queue_handler(BlockIORequest *request)
+{
+    int ret;
+    BlockDriverAIOCB *res;
+
+    res = request->handler(request->bs, request->sector_num,
+                           request->qiov, request->nb_sectors,
+                           request->cb, request->opaque);
+    if (res) {
+        BlockQueueAIOCB *blkacb =
+                container_of(request->acb, BlockQueueAIOCB, common);
+        blkacb->real_acb = res;
+    }
+
+    ret = (res == NULL) ? 0 : 1;
+
+    return ret;
+}
+
+void qemu_block_queue_flush(BlockQueue *queue)
+{
+    queue->flushing = true;
+    while (!QTAILQ_EMPTY(&queue->requests)) {
+        BlockIORequest *request = NULL;
+        int ret = 0;
+
+        request = QTAILQ_FIRST(&queue->requests);
+        QTAILQ_REMOVE(&queue->requests, request, entry);
+
+        queue->request = request;
+        ret = qemu_block_queue_handler(request);
+
+        if (ret == 0) {
+            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
+            queue->request = NULL;
+            break;
+        }
+
+        if (queue->request) {
+            g_free(request);
+        }
+
+        queue->request = NULL;
+    }
+
+    queue->flushing = false;
+}
+
+bool qemu_block_queue_has_pending(BlockQueue *queue)
+{
+    return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
+}
diff --git a/block/blk-queue.h b/block/blk-queue.h
new file mode 100644
index 0000000..84c2407
--- /dev/null
+++ b/block/blk-queue.h
@@ -0,0 +1,63 @@
+/*
+ * QEMU System Emulator queue declaration for block layer
+ *
+ * Copyright (c) IBM, Corp. 2011
+ *
+ * Authors:
+ *  Zhi Yong Wu  <zwu.kernel@gmail.com>
+ *  Stefan Hajnoczi <stefanha@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef QEMU_BLOCK_QUEUE_H
+#define QEMU_BLOCK_QUEUE_H
+
+#include "block.h"
+#include "qemu-queue.h"
+
+typedef BlockDriverAIOCB* (BlockRequestHandler) (BlockDriverState *bs,
+                                int64_t sector_num, QEMUIOVector *qiov,
+                                int nb_sectors, BlockDriverCompletionFunc *cb,
+                                void *opaque);
+
+typedef struct BlockIORequest BlockIORequest;
+
+typedef struct BlockQueue BlockQueue;
+
+typedef struct BlockQueueAIOCB BlockQueueAIOCB;
+
+BlockQueue *qemu_new_block_queue(void);
+
+void qemu_del_block_queue(BlockQueue *queue);
+
+BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
+                        BlockDriverState *bs,
+                        BlockRequestHandler *handler,
+                        int64_t sector_num,
+                        QEMUIOVector *qiov,
+                        int nb_sectors,
+                        BlockDriverCompletionFunc *cb,
+                        void *opaque);
+
+void qemu_block_queue_flush(BlockQueue *queue);
+
+bool qemu_block_queue_has_pending(BlockQueue *queue);
+
+#endif /* QEMU_BLOCK_QUEUE_H */
-- 
1.7.6
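
As a usage illustration only (it mirrors how patch 3/4 drives this API from bdrv_aio_readv()/bdrv_aio_writev() and the block timer; the surrounding variables such as bs, qiov and cb are assumed to exist):

    /* created once per drive when I/O limits are enabled */
    BlockQueue *queue = qemu_new_block_queue();

    /* a request that exceeds the limits is parked instead of dispatched */
    BlockDriverAIOCB *acb = qemu_block_queue_enqueue(queue, bs, bdrv_aio_readv,
                                                     sector_num, qiov,
                                                     nb_sectors, cb, opaque);

    /* the block timer callback later retries everything on the queue;
     * qemu_block_queue_has_pending() reports whether anything is waiting */
    qemu_block_queue_flush(queue);

    /* when I/O limits are disabled or the device is closed, the queue and
     * any requests still on it are freed */
    qemu_del_block_queue(queue);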



* [PATCH v6 3/4] block: add block timer and block throttling algorithm
From: Zhi Yong Wu @ 2011-09-01 11:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, stefanha, mtosatti, aliguori, ryanh, zwu.kernel, kwolf,
	pair, Zhi Yong Wu

Note:
     1.) When a bps/iops limit is set to a small value such as 511 bytes/s, the VM will hang. We are still considering how to handle this scenario.
     2.) When the "dd" command is issued in the guest with a large block size such as "bs=1024K", the resulting speed will be slightly higher than the configured limits.

If you have good ideas about these problems, please let us know. :)
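
To make the bps accounting below concrete, here is a worked example with illustrative numbers: with bps=1,000,000 bytes/s and a fresh 100 ms slice, bytes_limit = 1,000,000 * 0.1 = 100,000 bytes. If 90,000 bytes have already been dispatched in this slice and a 128-sector (65,536-byte) request arrives 30 ms in, then 90,000 + 65,536 > 100,000, so the request is queued with wait_time = (90,000 + 65,536) / 1,000,000 - 0.03 ≈ 0.126 s; the timer is armed roughly 126 ms in the future (the BLOCK_IO_SLICE_TIME * 10 factor converts seconds to nanoseconds) and slice_end is pushed out by the same amount. The same arithmetic also suggests a cause for issue 1.) above: at 511 bytes/s even an extended slice never admits a single 512-byte sector, so such a request appears to be requeued indefinitely.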

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 block.c     |  290 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 block.h     |    5 +
 block_int.h |    9 ++
 3 files changed, 296 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index 17ee3df..680f1e7 100644
--- a/block.c
+++ b/block.c
@@ -30,6 +30,9 @@
 #include "qemu-objects.h"
 #include "qemu-coroutine.h"
 
+#include "qemu-timer.h"
+#include "block/blk-queue.h"
+
 #ifdef CONFIG_BSD
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -72,6 +75,13 @@ static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
                                          QEMUIOVector *iov);
 static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs);
 
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+        bool is_write, double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
+        double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+        bool is_write, uint64_t *wait);
+
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
     QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
@@ -104,6 +114,64 @@ int is_windows_drive(const char *filename)
 }
 #endif
 
+/* throttling disk I/O limits */
+void bdrv_io_limits_disable(BlockDriverState *bs)
+{
+    bs->io_limits_enabled = false;
+
+    if (bs->block_queue) {
+        qemu_block_queue_flush(bs->block_queue);
+        qemu_del_block_queue(bs->block_queue);
+        bs->block_queue = NULL;
+    }
+
+    if (bs->block_timer) {
+        qemu_del_timer(bs->block_timer);
+        qemu_free_timer(bs->block_timer);
+        bs->block_timer = NULL;
+    }
+
+    bs->slice_start[BLOCK_IO_LIMIT_READ]  = 0;
+    bs->slice_start[BLOCK_IO_LIMIT_WRITE] = 0;
+
+    bs->slice_end[BLOCK_IO_LIMIT_READ]    = 0;
+    bs->slice_end[BLOCK_IO_LIMIT_WRITE]   = 0;
+}
+
+static void bdrv_block_timer(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BlockQueue *queue = bs->block_queue;
+
+    qemu_block_queue_flush(queue);
+}
+
+void bdrv_io_limits_enable(BlockDriverState *bs)
+{
+    bs->block_queue    = qemu_new_block_queue();
+    bs->block_timer    = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
+
+    bs->slice_start[BLOCK_IO_LIMIT_READ]  = qemu_get_clock_ns(vm_clock);
+    bs->slice_start[BLOCK_IO_LIMIT_WRITE] =
+           bs->slice_start[BLOCK_IO_LIMIT_READ];
+
+    bs->slice_end[BLOCK_IO_LIMIT_READ]    =
+           bs->slice_start[BLOCK_IO_LIMIT_READ] + BLOCK_IO_SLICE_TIME;
+    bs->slice_end[BLOCK_IO_LIMIT_WRITE]   =
+           bs->slice_end[BLOCK_IO_LIMIT_READ];
+}
+
+bool bdrv_io_limits_enabled(BlockDriverState *bs)
+{
+    BlockIOLimit *io_limits = &bs->io_limits;
+    return io_limits->bps[BLOCK_IO_LIMIT_READ]
+         || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
+         || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
+         || io_limits->iops[BLOCK_IO_LIMIT_READ]
+         || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
+         || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
+}
+
 /* check if the path starts with "<protocol>:" */
 static int path_has_protocol(const char *path)
 {
@@ -694,6 +762,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
 
+    /* throttling disk I/O limits */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_enable(bs);
+    }
+
     return 0;
 
 unlink_and_fail:
@@ -732,6 +805,18 @@ void bdrv_close(BlockDriverState *bs)
         if (bs->change_cb)
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
+
+    /* throttling disk I/O limits */
+    if (bs->block_queue) {
+        qemu_del_block_queue(bs->block_queue);
+        bs->block_queue = NULL;
+    }
+
+    if (bs->block_timer) {
+        qemu_del_timer(bs->block_timer);
+        qemu_free_timer(bs->block_timer);
+        bs->block_timer = NULL;
+    }
 }
 
 void bdrv_close_all(void)
@@ -2290,13 +2375,29 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
                                  BlockDriverCompletionFunc *cb, void *opaque)
 {
     BlockDriver *drv = bs->drv;
+    uint64_t wait_time = 0;
+    BlockDriverAIOCB *ret;
 
     trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
 
-    if (!drv)
-        return NULL;
-    if (bdrv_check_request(bs, sector_num, nb_sectors))
+    if (!drv || bdrv_check_request(bs, sector_num, nb_sectors)) {
         return NULL;
+    }
+
+    /* throttling disk read I/O */
+    if (bs->io_limits_enabled) {
+        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
+            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
+                           sector_num, qiov, nb_sectors, cb, opaque);
+            qemu_mod_timer(bs->block_timer,
+                           wait_time + qemu_get_clock_ns(vm_clock));
+            return ret;
+        }
+
+        bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
+                           (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+        bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
+    }
 
     return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
                                cb, opaque);
@@ -2345,15 +2446,14 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
     BlockDriver *drv = bs->drv;
     BlockDriverAIOCB *ret;
     BlockCompleteData *blk_cb_data;
+    uint64_t wait_time = 0;
 
     trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
 
-    if (!drv)
-        return NULL;
-    if (bs->read_only)
-        return NULL;
-    if (bdrv_check_request(bs, sector_num, nb_sectors))
+    if (!drv || bs->read_only
+        || bdrv_check_request(bs, sector_num, nb_sectors)) {
         return NULL;
+    }
 
     if (bs->dirty_bitmap) {
         blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
@@ -2362,6 +2462,17 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
         opaque = blk_cb_data;
     }
 
+    /* throttling disk write I/O */
+    if (bs->io_limits_enabled) {
+        if (bdrv_exceed_io_limits(bs, nb_sectors, true, &wait_time)) {
+            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_writev,
+                                  sector_num, qiov, nb_sectors, cb, opaque);
+            qemu_mod_timer(bs->block_timer,
+                                  wait_time + qemu_get_clock_ns(vm_clock));
+            return ret;
+        }
+    }
+
     ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
                                cb, opaque);
 
@@ -2369,6 +2480,12 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
         if (bs->wr_highest_sector < sector_num + nb_sectors - 1) {
             bs->wr_highest_sector = sector_num + nb_sectors - 1;
         }
+
+        if (bs->io_limits_enabled) {
+            bs->io_disps.bytes[BLOCK_IO_LIMIT_WRITE] +=
+                               (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+            bs->io_disps.ios[BLOCK_IO_LIMIT_WRITE]++;
+        }
     }
 
     return ret;
@@ -2633,6 +2750,163 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
     acb->pool->cancel(acb);
 }
 
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+                 bool is_write, double elapsed_time, uint64_t *wait) {
+    uint64_t bps_limit = 0;
+    double   bytes_limit, bytes_disp, bytes_res;
+    double   slice_time, wait_time;
+
+    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+        bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
+    } else if (bs->io_limits.bps[is_write]) {
+        bps_limit = bs->io_limits.bps[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    slice_time = bs->slice_end[is_write] - bs->slice_start[is_write];
+    slice_time /= (NANOSECONDS_PER_SECOND);
+    bytes_limit = bps_limit * slice_time;
+    bytes_disp  = bs->io_disps.bytes[is_write];
+    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+        bytes_disp += bs->io_disps.bytes[!is_write];
+    }
+
+    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+
+    if (bytes_disp + bytes_res <= bytes_limit) {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (bytes_disp + bytes_res) / bps_limit - elapsed_time;
+
+    if (wait) {
+        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    }
+
+    return true;
+}
+
+static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
+                             double elapsed_time, uint64_t *wait) {
+    uint64_t iops_limit = 0;
+    double   ios_limit, ios_disp;
+    double   slice_time, wait_time;
+
+    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+        iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
+    } else if (bs->io_limits.iops[is_write]) {
+        iops_limit = bs->io_limits.iops[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    slice_time = bs->slice_end[is_write] - bs->slice_start[is_write];
+    slice_time /= (NANOSECONDS_PER_SECOND);
+    ios_limit  = iops_limit * slice_time;
+    ios_disp   = bs->io_disps.ios[is_write];
+    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+        ios_disp += bs->io_disps.ios[!is_write];
+    }
+
+    if (ios_disp + 1 <= ios_limit) {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (ios_disp + 1) / iops_limit;
+    if (wait_time > elapsed_time) {
+        wait_time = wait_time - elapsed_time;
+    } else {
+        wait_time = 0;
+    }
+
+    if (wait) {
+        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    }
+
+    return true;
+}
+
+static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+                           bool is_write, uint64_t *wait) {
+    int64_t  now;
+    uint64_t bps_wait = 0, iops_wait = 0, max_wait;
+    double   elapsed_time;
+    int      bps_ret, iops_ret;
+
+    now = qemu_get_clock_ns(vm_clock);
+    if ((bs->slice_start[is_write] < now)
+        && (bs->slice_end[is_write] > now)) {
+        bs->slice_end[is_write]   = now + BLOCK_IO_SLICE_TIME;
+    } else {
+        bs->slice_start[is_write]     = now;
+        bs->slice_end[is_write]       = now + BLOCK_IO_SLICE_TIME;
+
+        bs->io_disps.bytes[is_write]  = 0;
+        bs->io_disps.bytes[!is_write] = 0;
+
+        bs->io_disps.ios[is_write]    = 0;
+        bs->io_disps.ios[!is_write]   = 0;
+    }
+
+    /* If a limit was exceeded, immediately queue this request */
+    if (qemu_block_queue_has_pending(bs->block_queue)) {
+        if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]
+            || bs->io_limits.bps[is_write] || bs->io_limits.iops[is_write]
+            || bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+            if (wait) {
+                *wait = 0;
+            }
+
+            return true;
+        }
+    }
+
+    elapsed_time  = now - bs->slice_start[is_write];
+    elapsed_time  /= (NANOSECONDS_PER_SECOND);
+
+    bps_ret  = bdrv_exceed_bps_limits(bs, nb_sectors,
+                                      is_write, elapsed_time, &bps_wait);
+    iops_ret = bdrv_exceed_iops_limits(bs, is_write,
+                                      elapsed_time, &iops_wait);
+    if (bps_ret || iops_ret) {
+        max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
+        if (wait) {
+            *wait = max_wait;
+        }
+
+        now = qemu_get_clock_ns(vm_clock);
+        if (bs->slice_end[is_write] < now + max_wait) {
+            bs->slice_end[is_write] = now + max_wait;
+        }
+
+        return true;
+    }
+
+    if (wait) {
+        *wait = 0;
+    }
+
+    return false;
+}
 
 /**************************************************************/
 /* async block device emulation */
diff --git a/block.h b/block.h
index 3ac0b94..a3e69db 100644
--- a/block.h
+++ b/block.h
@@ -58,6 +58,11 @@ void bdrv_info(Monitor *mon, QObject **ret_data);
 void bdrv_stats_print(Monitor *mon, const QObject *data);
 void bdrv_info_stats(Monitor *mon, QObject **ret_data);
 
+/* disk I/O throttling */
+void bdrv_io_limits_enable(BlockDriverState *bs);
+void bdrv_io_limits_disable(BlockDriverState *bs);
+bool bdrv_io_limits_enabled(BlockDriverState *bs);
+
 void bdrv_init(void);
 void bdrv_init_with_whitelist(void);
 BlockDriver *bdrv_find_protocol(const char *filename);
diff --git a/block_int.h b/block_int.h
index e7bda5b..5d37277 100644
--- a/block_int.h
+++ b/block_int.h
@@ -202,6 +202,15 @@ struct BlockDriverState {
 
     void *sync_aiocb;
 
+    /* the time for latest disk I/O */
+    int64_t slice_start[2];
+    int64_t slice_end[2];
+    BlockIOLimit io_limits;
+    BlockIODisp  io_disps;
+    BlockQueue   *block_queue;
+    QEMUTimer    *block_timer;
+    bool         io_limits_enabled;
+
     /* I/O stats (display with "info blockstats"). */
     uint64_t nr_bytes[BDRV_MAX_IOTYPE];
     uint64_t nr_ops[BDRV_MAX_IOTYPE];
-- 
1.7.6


                                cb, opaque);
@@ -2345,15 +2446,14 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
     BlockDriver *drv = bs->drv;
     BlockDriverAIOCB *ret;
     BlockCompleteData *blk_cb_data;
+    uint64_t wait_time = 0;
 
     trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
 
-    if (!drv)
-        return NULL;
-    if (bs->read_only)
-        return NULL;
-    if (bdrv_check_request(bs, sector_num, nb_sectors))
+    if (!drv || bs->read_only
+        || bdrv_check_request(bs, sector_num, nb_sectors)) {
         return NULL;
+    }
 
     if (bs->dirty_bitmap) {
         blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
@@ -2362,6 +2462,17 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
         opaque = blk_cb_data;
     }
 
+    /* throttling disk write I/O */
+    if (bs->io_limits_enabled) {
+        if (bdrv_exceed_io_limits(bs, nb_sectors, true, &wait_time)) {
+            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_writev,
+                                  sector_num, qiov, nb_sectors, cb, opaque);
+            qemu_mod_timer(bs->block_timer,
+                                  wait_time + qemu_get_clock_ns(vm_clock));
+            return ret;
+        }
+    }
+
     ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
                                cb, opaque);
 
@@ -2369,6 +2480,12 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
         if (bs->wr_highest_sector < sector_num + nb_sectors - 1) {
             bs->wr_highest_sector = sector_num + nb_sectors - 1;
         }
+
+        if (bs->io_limits_enabled) {
+            bs->io_disps.bytes[BLOCK_IO_LIMIT_WRITE] +=
+                               (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+            bs->io_disps.ios[BLOCK_IO_LIMIT_WRITE]++;
+        }
     }
 
     return ret;
@@ -2633,6 +2750,163 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
     acb->pool->cancel(acb);
 }
 
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+                 bool is_write, double elapsed_time, uint64_t *wait) {
+    uint64_t bps_limit = 0;
+    double   bytes_limit, bytes_disp, bytes_res;
+    double   slice_time, wait_time;
+
+    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+        bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
+    } else if (bs->io_limits.bps[is_write]) {
+        bps_limit = bs->io_limits.bps[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    slice_time = bs->slice_end[is_write] - bs->slice_start[is_write];
+    slice_time /= (NANOSECONDS_PER_SECOND);
+    bytes_limit = bps_limit * slice_time;
+    bytes_disp  = bs->io_disps.bytes[is_write];
+    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+        bytes_disp += bs->io_disps.bytes[!is_write];
+    }
+
+    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+
+    if (bytes_disp + bytes_res <= bytes_limit) {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (bytes_disp + bytes_res) / bps_limit - elapsed_time;
+
+    if (wait) {
+        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    }
+
+    return true;
+}
+
+static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
+                             double elapsed_time, uint64_t *wait) {
+    uint64_t iops_limit = 0;
+    double   ios_limit, ios_disp;
+    double   slice_time, wait_time;
+
+    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+        iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
+    } else if (bs->io_limits.iops[is_write]) {
+        iops_limit = bs->io_limits.iops[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    slice_time = bs->slice_end[is_write] - bs->slice_start[is_write];
+    slice_time /= (NANOSECONDS_PER_SECOND);
+    ios_limit  = iops_limit * slice_time;
+    ios_disp   = bs->io_disps.ios[is_write];
+    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+        ios_disp += bs->io_disps.ios[!is_write];
+    }
+
+    if (ios_disp + 1 <= ios_limit) {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (ios_disp + 1) / iops_limit;
+    if (wait_time > elapsed_time) {
+        wait_time = wait_time - elapsed_time;
+    } else {
+        wait_time = 0;
+    }
+
+    if (wait) {
+        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    }
+
+    return true;
+}
+
+static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+                           bool is_write, uint64_t *wait) {
+    int64_t  now;
+    uint64_t bps_wait = 0, iops_wait = 0, max_wait;
+    double   elapsed_time;
+    int      bps_ret, iops_ret;
+
+    now = qemu_get_clock_ns(vm_clock);
+    if ((bs->slice_start[is_write] < now)
+        && (bs->slice_end[is_write] > now)) {
+        bs->slice_end[is_write]   = now + BLOCK_IO_SLICE_TIME;
+    } else {
+        bs->slice_start[is_write]     = now;
+        bs->slice_end[is_write]       = now + BLOCK_IO_SLICE_TIME;
+
+        bs->io_disps.bytes[is_write]  = 0;
+        bs->io_disps.bytes[!is_write] = 0;
+
+        bs->io_disps.ios[is_write]    = 0;
+        bs->io_disps.ios[!is_write]   = 0;
+    }
+
+    /* If a limit was exceeded, immediately queue this request */
+    if (qemu_block_queue_has_pending(bs->block_queue)) {
+        if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]
+            || bs->io_limits.bps[is_write] || bs->io_limits.iops[is_write]
+            || bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+            if (wait) {
+                *wait = 0;
+            }
+
+            return true;
+        }
+    }
+
+    elapsed_time  = now - bs->slice_start[is_write];
+    elapsed_time  /= (NANOSECONDS_PER_SECOND);
+
+    bps_ret  = bdrv_exceed_bps_limits(bs, nb_sectors,
+                                      is_write, elapsed_time, &bps_wait);
+    iops_ret = bdrv_exceed_iops_limits(bs, is_write,
+                                      elapsed_time, &iops_wait);
+    if (bps_ret || iops_ret) {
+        max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
+        if (wait) {
+            *wait = max_wait;
+        }
+
+        now = qemu_get_clock_ns(vm_clock);
+        if (bs->slice_end[is_write] < now + max_wait) {
+            bs->slice_end[is_write] = now + max_wait;
+        }
+
+        return true;
+    }
+
+    if (wait) {
+        *wait = 0;
+    }
+
+    return false;
+}
 
 /**************************************************************/
 /* async block device emulation */
diff --git a/block.h b/block.h
index 3ac0b94..a3e69db 100644
--- a/block.h
+++ b/block.h
@@ -58,6 +58,11 @@ void bdrv_info(Monitor *mon, QObject **ret_data);
 void bdrv_stats_print(Monitor *mon, const QObject *data);
 void bdrv_info_stats(Monitor *mon, QObject **ret_data);
 
+/* disk I/O throttling */
+void bdrv_io_limits_enable(BlockDriverState *bs);
+void bdrv_io_limits_disable(BlockDriverState *bs);
+bool bdrv_io_limits_enabled(BlockDriverState *bs);
+
 void bdrv_init(void);
 void bdrv_init_with_whitelist(void);
 BlockDriver *bdrv_find_protocol(const char *filename);
diff --git a/block_int.h b/block_int.h
index e7bda5b..5d37277 100644
--- a/block_int.h
+++ b/block_int.h
@@ -202,6 +202,15 @@ struct BlockDriverState {
 
     void *sync_aiocb;
 
+    /* the time for latest disk I/O */
+    int64_t slice_start[2];
+    int64_t slice_end[2];
+    BlockIOLimit io_limits;
+    BlockIODisp  io_disps;
+    BlockQueue   *block_queue;
+    QEMUTimer    *block_timer;
+    bool         io_limits_enabled;
+
     /* I/O stats (display with "info blockstats"). */
     uint64_t nr_bytes[BDRV_MAX_IOTYPE];
     uint64_t nr_ops[BDRV_MAX_IOTYPE];
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 4/4] qmp/hmp: add block_set_io_throttle
  2011-09-01 11:44 ` [Qemu-devel] " Zhi Yong Wu
@ 2011-09-01 11:44   ` Zhi Yong Wu
  -1 siblings, 0 replies; 17+ messages in thread
From: Zhi Yong Wu @ 2011-09-01 11:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, stefanha, mtosatti, aliguori, ryanh, zwu.kernel, kwolf,
	pair, Zhi Yong Wu

This patch introduces a new command, block_set_io_throttle. If you have a better idea for its usage syntax, please let me know.
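
For reference, the HMP form added below would be used like this (the device name is only an example):

(qemu) block_set_io_throttle ide0-hd0 1000000 0 0 0 0 0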

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 block.c         |   26 +++++++++++++++++++-
 block.h         |    1 -
 blockdev.c      |   69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 blockdev.h      |    2 +
 hmp-commands.hx |   15 ++++++++++++
 qerror.c        |    4 +++
 qerror.h        |    3 ++
 qmp-commands.hx |   52 ++++++++++++++++++++++++++++++++++++++++-
 8 files changed, 168 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 680f1e7..42f1ecc 100644
--- a/block.c
+++ b/block.c
@@ -1945,6 +1945,16 @@ static void bdrv_print_dict(QObject *obj, void *opaque)
                             qdict_get_bool(qdict, "ro"),
                             qdict_get_str(qdict, "drv"),
                             qdict_get_bool(qdict, "encrypted"));
+
+        monitor_printf(mon, " bps=%" PRId64 " bps_rd=%" PRId64
+                            " bps_wr=%" PRId64 " iops=%" PRId64
+                            " iops_rd=%" PRId64 " iops_wr=%" PRId64,
+                            qdict_get_int(qdict, "bps"),
+                            qdict_get_int(qdict, "bps_rd"),
+                            qdict_get_int(qdict, "bps_wr"),
+                            qdict_get_int(qdict, "iops"),
+                            qdict_get_int(qdict, "iops_rd"),
+                            qdict_get_int(qdict, "iops_wr"));
     } else {
         monitor_printf(mon, " [not inserted]");
     }
@@ -1977,10 +1987,22 @@ void bdrv_info(Monitor *mon, QObject **ret_data)
             QDict *bs_dict = qobject_to_qdict(bs_obj);
 
             obj = qobject_from_jsonf("{ 'file': %s, 'ro': %i, 'drv': %s, "
-                                     "'encrypted': %i }",
+                                     "'encrypted': %i, "
+                                     "'bps': %" PRId64 ","
+                                     "'bps_rd': %" PRId64 ","
+                                     "'bps_wr': %" PRId64 ","
+                                     "'iops': %" PRId64 ","
+                                     "'iops_rd': %" PRId64 ","
+                                     "'iops_wr': %" PRId64 "}",
                                      bs->filename, bs->read_only,
                                      bs->drv->format_name,
-                                     bdrv_is_encrypted(bs));
+                                     bdrv_is_encrypted(bs),
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL],
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_READ],
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_READ],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE]);
             if (bs->backing_file[0] != '\0') {
                 QDict *qdict = qobject_to_qdict(obj);
                 qdict_put(qdict, "backing_file",
diff --git a/block.h b/block.h
index a3e69db..10d2828 100644
--- a/block.h
+++ b/block.h
@@ -107,7 +107,6 @@ int bdrv_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
 
-
 typedef struct BdrvCheckResult {
     int corruptions;
     int leaks;
diff --git a/blockdev.c b/blockdev.c
index 619ae9f..7f5c4df 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -747,6 +747,75 @@ int do_change_block(Monitor *mon, const char *device,
     return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
 
+/* throttling disk I/O limits */
+int do_block_set_io_throttle(Monitor *mon,
+                       const QDict *qdict, QObject **ret_data)
+{
+    const char *devname = qdict_get_str(qdict, "device");
+    uint64_t bps        = qdict_get_try_int(qdict, "bps", -1);
+    uint64_t bps_rd     = qdict_get_try_int(qdict, "bps_rd", -1);
+    uint64_t bps_wr     = qdict_get_try_int(qdict, "bps_wr", -1);
+    uint64_t iops       = qdict_get_try_int(qdict, "iops", -1);
+    uint64_t iops_rd    = qdict_get_try_int(qdict, "iops_rd", -1);
+    uint64_t iops_wr    = qdict_get_try_int(qdict, "iops_wr", -1);
+    BlockDriverState *bs;
+
+    bs = bdrv_find(devname);
+    if (!bs) {
+        qerror_report(QERR_DEVICE_NOT_FOUND, devname);
+        return -1;
+    }
+
+    if ((bps == -1) && (bps_rd == -1) && (bps_wr == -1)
+        && (iops == -1) && (iops_rd == -1) && (iops_wr == -1)) {
+        qerror_report(QERR_MISSING_PARAMETER,
+                      "bps/bps_rd/bps_wr/iops/iops_rd/iops_wr");
+        return -1;
+    }
+
+    if (((bps != -1) && ((bps_rd != -1) || (bps_wr != -1)))
+        || ((iops != -1) && ((iops_rd != -1) || (iops_wr != -1)))) {
+        qerror_report(QERR_INVALID_PARAMETER_COMBINATION);
+        return -1;
+    }
+
+    if (bps != -1) {
+        bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL] = bps;
+        bs->io_limits.bps[BLOCK_IO_LIMIT_READ]  = 0;
+        bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE] = 0;
+    }
+
+    if ((bps_rd != -1) || (bps_wr != -1)) {
+        bs->io_limits.bps[BLOCK_IO_LIMIT_READ]   =
+           (bps_rd == -1) ? bs->io_limits.bps[BLOCK_IO_LIMIT_READ] : bps_rd;
+        bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
+           (bps_wr == -1) ? bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE] : bps_wr;
+        bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]  = 0;
+    }
+
+    if (iops != -1) {
+        bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL] = iops;
+        bs->io_limits.iops[BLOCK_IO_LIMIT_READ]  = 0;
+        bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE] = 0;
+    }
+
+    if ((iops_rd != -1) || (iops_wr != -1)) {
+        bs->io_limits.iops[BLOCK_IO_LIMIT_READ]  =
+           (iops_rd == -1) ? bs->io_limits.iops[BLOCK_IO_LIMIT_READ] : iops_rd;
+        bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
+           (iops_wr == -1) ? bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE] : iops_wr;
+        bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL] = 0;
+    }
+
+    if (!bs->io_limits_enabled && bdrv_io_limits_enabled(bs)) {
+        bdrv_io_limits_enable(bs);
+    } else if (bs->io_limits_enabled && !bdrv_io_limits_enabled(bs)) {
+        bdrv_io_limits_disable(bs);
+    }
+
+    return 0;
+}
+
 int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
     const char *id = qdict_get_str(qdict, "id");
diff --git a/blockdev.h b/blockdev.h
index 3587786..c2f44c6 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -63,6 +63,8 @@ int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
                     const char *filename, const char *fmt);
 int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_block_set_io_throttle(Monitor *mon,
+                    const QDict *qdict, QObject **ret_data);
 int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_block_resize(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 0ccfb28..147f9cf 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1210,6 +1210,21 @@ ETEXI
     },
 
 STEXI
+@item block_set_io_throttle @var{device} @var{bps} @var{bps_rd} @var{bps_wr} @var{iops} @var{iops_rd} @var{iops_wr}
+@findex block_set_io_throttle
+Change I/O throttle limits for a block drive to @var{bps} @var{bps_rd} @var{bps_wr} @var{iops} @var{iops_rd} @var{iops_wr}
+ETEXI
+
+    {
+        .name       = "block_set_io_throttle",
+        .args_type  = "device:B,bps:i?,bps_rd:i?,bps_wr:i?,iops:i?,iops_rd:i?,iops_wr:i?",
+        .params     = "device [bps] [bps_rd] [bps_wr] [iops] [iops_rd] [iops_wr]",
+        .help       = "change I/O throttle limits for a block drive",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_block_set_io_throttle,
+    },
+
+STEXI
 @item block_passwd @var{device} @var{password}
 @findex block_passwd
 Set the encrypted device @var{device} password to @var{password}
diff --git a/qerror.c b/qerror.c
index 3d64b80..33f9fdd 100644
--- a/qerror.c
+++ b/qerror.c
@@ -230,6 +230,10 @@ static const QErrorStringTable qerror_table[] = {
         .error_fmt = QERR_QGA_COMMAND_FAILED,
         .desc      = "Guest agent command failed, error was '%(message)'",
     },
+    {
+        .error_fmt = QERR_INVALID_PARAMETER_COMBINATION,
+        .desc      = "Invalid parameter combination",
+    },
     {}
 };
 
diff --git a/qerror.h b/qerror.h
index 8058456..62c1df2 100644
--- a/qerror.h
+++ b/qerror.h
@@ -193,4 +193,7 @@ QError *qobject_to_qerror(const QObject *obj);
 #define QERR_QGA_COMMAND_FAILED \
     "{ 'class': 'QgaCommandFailed', 'data': { 'message': %s } }"
 
+#define QERR_INVALID_PARAMETER_COMBINATION \
+    "{ 'class': 'InvalidParameterCombination', 'data': {} }"
+
 #endif /* QERROR_H */
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 27cc66e..e848969 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -862,6 +862,44 @@ Example:
 EQMP
 
     {
+        .name       = "block_set_io_throttle",
+        .args_type  = "device:B,bps:i?,bps_rd:i?,bps_wr:i?,iops:i?,iops_rd:i?,iops_wr:i?",
+        .params     = "device [bps] [bps_rd] [bps_wr] [iops] [iops_rd] [iops_wr]",
+        .help       = "change I/O throttle limits for a block drive",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_block_set_io_throttle,
+    },
+
+SQMP
+block_set_io_throttle
+---------------------
+
+Change I/O throttle limits for a block drive.
+
+Arguments:
+
+- "device": device name (json-string)
+- "bps":  total throughput limit in bytes per second(json-int, optional)
+- "bps_rd":  read throughput limit in bytes per second(json-int, optional)
+- "bps_wr":  read throughput limit in bytes per second(json-int, optional)
+- "iops":  total I/O operations per second(json-int, optional)
+- "iops_rd":  read I/O operations per second(json-int, optional)
+- "iops_wr":  write I/O operations per second(json-int, optional)
+
+Example:
+
+-> { "execute": "block_set_io_throttle", "arguments": { "device": "virtio0",
+                                               "bps": "1000000",
+                                               "bps_rd": "0",
+                                               "bps_wr": "0",
+                                               "iops": "0",
+                                               "iops_rd": "0",
+                                               "iops_wr": "0" } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "set_password",
         .args_type  = "protocol:s,password:s,connected:s?",
         .params     = "protocol password action-if-connected",
@@ -1143,6 +1181,12 @@ Each json-object contain the following:
                                 "tftp", "vdi", "vmdk", "vpc", "vvfat"
          - "backing_file": backing file name (json-string, optional)
          - "encrypted": true if encrypted, false otherwise (json-bool)
+         - "bps": limit total bytes per second (json-int)
+         - "bps_rd": limit read bytes per second (json-int)
+         - "bps_wr": limit write bytes per second (json-int)
+         - "iops": limit total I/O operations per second (json-int)
+         - "iops_rd": limit read operations per second (json-int)
+         - "iops_wr": limit write operations per second (json-int)
 
 Example:
 
@@ -1157,7 +1201,13 @@ Example:
                "ro":false,
                "drv":"qcow2",
                "encrypted":false,
-               "file":"disks/test.img"
+               "file":"disks/vm.img",
+               "bps":1000000,
+               "bps_rd":0,
+               "bps_wr":0,
+               "iops":1000000,
+               "iops_rd":0,
+               "iops_wr":0
             },
             "type":"unknown"
          },
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 2/4] block: add the block queue support
  2011-09-01 11:44   ` [Qemu-devel] " Zhi Yong Wu
@ 2011-09-01 13:02     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-09-01 13:02 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: qemu-devel, kvm, stefanha, mtosatti, aliguori, ryanh, zwu.kernel,
	kwolf, pair

On Thu, Sep 1, 2011 at 12:44 PM, Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> wrote:
> +struct BlockIORequest {
> +    QTAILQ_ENTRY(BlockIORequest) entry;
> +    BlockDriverState *bs;
> +    BlockRequestHandler *handler;
> +    int64_t sector_num;
> +    QEMUIOVector *qiov;
> +    int nb_sectors;
> +    BlockDriverCompletionFunc *cb;
> +    void *opaque;
> +    BlockDriverAIOCB *acb;
> +};
> +
> +struct BlockQueue {
> +    QTAILQ_HEAD(requests, BlockIORequest) requests;
> +    BlockIORequest *request;
> +    bool flushing;
> +};
> +
> +struct BlockQueueAIOCB {
> +    BlockDriverAIOCB common;
> +    BlockDriverCompletionFunc *real_cb;
> +    BlockDriverAIOCB *real_acb;
> +    void *opaque;
> +    BlockIORequest *request;
> +};

Now it's pretty easy to merge BlockIORequest and BlockQueueAIOCB into
one struct.  That way we don't need to duplicate fields or link
structures together:

typedef struct BlockQueueAIOCB {
    BlockDriverAIOCB common;
    QTAILQ_ENTRY(BlockQueueAIOCB) entry;          /* held requests */
    BlockRequestHandler *handler;                 /* dispatch function */
    BlockDriverAIOCB *real_acb;                   /* dispatched aiocb */

    /* Request parameters */
    int64_t sector_num;
    QEMUIOVector *qiov;
    int nb_sectors;
} BlockQueueAIOCB;

This struct is kept for the lifetime of a request (both during
queueing and after dispatch).

> +
> +static void qemu_block_queue_dequeue(BlockQueue *queue, BlockIORequest *request)
> +{
> +    BlockIORequest *req;
> +
> +    while (!QTAILQ_EMPTY(&queue->requests)) {
> +        req = QTAILQ_FIRST(&queue->requests);
> +        if (req == request) {
> +            QTAILQ_REMOVE(&queue->requests, req, entry);
> +            break;
> +        }
> +    }
> +}
> +
> +static void qemu_block_queue_cancel(BlockDriverAIOCB *acb)
> +{
> +    BlockQueueAIOCB *blkacb = container_of(acb, BlockQueueAIOCB, common);
> +    if (blkacb->real_acb) {
> +        bdrv_aio_cancel(blkacb->real_acb);
> +    } else {
> +        assert(blkacb->common.bs->block_queue);
> +        qemu_block_queue_dequeue(blkacb->common.bs->block_queue,
> +                                 blkacb->request);

Can we use QTAILQ_REMOVE() and eliminate qemu_block_queue_dequeue()?
If the aiocb exists and has not been dispatched yet (real_acb == NULL) then
it must still be on the queue.
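
For illustration, the cancel path could then look roughly like this (untested sketch, assuming the merged struct keeps its list linkage in a field named 'entry' and BlockQueue still exposes its 'requests' list inside blk-queue.c):

static void qemu_block_queue_cancel(BlockDriverAIOCB *acb)
{
    BlockQueueAIOCB *blkacb = container_of(acb, BlockQueueAIOCB, common);

    if (blkacb->real_acb) {
        bdrv_aio_cancel(blkacb->real_acb);
    } else {
        /* not dispatched yet, so it must still be on the queue */
        BlockQueue *queue = blkacb->common.bs->block_queue;
        QTAILQ_REMOVE(&queue->requests, blkacb, entry);
    }
    qemu_aio_release(blkacb);
}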

> +    }
> +
> +    qemu_aio_release(blkacb);
> +}
> +
> +static AIOPool block_queue_pool = {
> +    .aiocb_size         = sizeof(struct BlockQueueAIOCB),
> +    .cancel             = qemu_block_queue_cancel,
> +};
> +
> +static void qemu_block_queue_callback(void *opaque, int ret)
> +{
> +    BlockQueueAIOCB *acb = opaque;
> +
> +    if (acb->real_cb) {
> +        acb->real_cb(acb->opaque, ret);
> +    }
> +
> +    qemu_aio_release(acb);
> +}
> +
> +BlockQueue *qemu_new_block_queue(void)
> +{
> +    BlockQueue *queue;
> +
> +    queue = g_malloc0(sizeof(BlockQueue));
> +
> +    QTAILQ_INIT(&queue->requests);
> +
> +    queue->flushing = false;
> +    queue->request  = NULL;
> +
> +    return queue;
> +}
> +
> +void qemu_del_block_queue(BlockQueue *queue)
> +{
> +    BlockIORequest *request, *next;
> +
> +    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
> +        QTAILQ_REMOVE(&queue->requests, request, entry);
> +        g_free(request);

What about the acbs?  There needs to be a strategy for cleanly
shutting down completely.  In fact we should probably use cancellation
here or assert that the queue is already empty.
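
The assert variant could be as simple as this sketch (it assumes callers flush or cancel every request before deleting the queue):

void qemu_del_block_queue(BlockQueue *queue)
{
    /* every request must have been dispatched or cancelled by now */
    assert(QTAILQ_EMPTY(&queue->requests));
    g_free(queue);
}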

> +    }
> +
> +    g_free(queue);
> +}
> +
> +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
> +                        BlockDriverState *bs,
> +                        BlockRequestHandler *handler,
> +                        int64_t sector_num,
> +                        QEMUIOVector *qiov,
> +                        int nb_sectors,
> +                        BlockDriverCompletionFunc *cb,
> +                        void *opaque)
> +{
> +    BlockIORequest *request;
> +    BlockDriverAIOCB *acb;
> +    BlockQueueAIOCB *blkacb;
> +
> +    if (queue->request) {
> +        request             = queue->request;
> +        assert(request->acb);
> +        acb                 = request->acb;
> +        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
> +        queue->request      = NULL;

I don't think we need this behavior.  There is already requeuing logic
in the flush function: if request->handler() fails then the request is
put back onto the queue and we stop flushing.

So all we need here is:

if (queue->flushing) {
    return NULL;
}

This results in the request->handler() failing (returning NULL) and
the flush function then requeues this request.

In other words, during flushing we do not allow any new requests to be enqueued.

> +    } else {
> +        request             = g_malloc0(sizeof(BlockIORequest));
> +        request->bs         = bs;
> +        request->handler    = handler;
> +        request->sector_num = sector_num;
> +        request->qiov       = qiov;
> +        request->nb_sectors = nb_sectors;
> +        request->cb         = qemu_block_queue_callback;
> +        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
> +
> +        acb = qemu_aio_get(&block_queue_pool, bs,
> +                           qemu_block_queue_callback, opaque);
> +        blkacb = container_of(acb, BlockQueueAIOCB, common);
> +        blkacb->real_cb     = cb;
> +        blkacb->real_acb    = NULL;
> +        blkacb->opaque      = opaque;
> +        blkacb->request     = request;
> +
> +        request->acb        = acb;
> +        request->opaque = (void *)blkacb;

Here's what this initialization code looks like when BlockIORequest
and BlockQueueAIOCB are merged:

acb = qemu_aio_get(&block_queue_pool, bs, cb, opaque);
acb->handler    = handler;
acb->sector_num = sector_num;
acb->qiov       = qiov;
acb->nb_sectors = nb_sectors;
acb->real_acb   = NULL;
QTAILQ_INSERT_TAIL(&queue->requests, acb, entry);

> +    }
> +
> +    return acb;
> +}
> +
> +static int qemu_block_queue_handler(BlockIORequest *request)
> +{
> +    int ret;
> +    BlockDriverAIOCB *res;
> +
> +    res = request->handler(request->bs, request->sector_num,
> +                           request->qiov, request->nb_sectors,
> +                           request->cb, request->opaque);

Please pass qemu_block_queue_callback and request->acb directly here
instead of using request->cb and request->opaque.  Those fields aren't
needed and just add indirection.
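
I.e. something along these lines (sketch only; it relies on 'common' being the first member of BlockQueueAIOCB so the callback can cast the opaque pointer back):

res = request->handler(request->bs, request->sector_num,
                       request->qiov, request->nb_sectors,
                       qemu_block_queue_callback, request->acb);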

> +    if (res) {
> +        BlockQueueAIOCB *blkacb =
> +                container_of(request->acb, BlockQueueAIOCB, common);
> +        blkacb->real_acb = res;
> +    }
> +
> +    ret = (res == NULL) ? 0 : 1;
> +
> +    return ret;
> +}
> +
> +void qemu_block_queue_flush(BlockQueue *queue)
> +{
> +    queue->flushing = true;
> +    while (!QTAILQ_EMPTY(&queue->requests)) {
> +        BlockIORequest *request = NULL;
> +        int ret = 0;
> +
> +        request = QTAILQ_FIRST(&queue->requests);
> +        QTAILQ_REMOVE(&queue->requests, request, entry);
> +
> +        queue->request = request;
> +        ret = qemu_block_queue_handler(request);
> +
> +        if (ret == 0) {
> +            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
> +            queue->request = NULL;
> +            break;
> +        }
> +
> +        if (queue->request) {
> +            g_free(request);
> +        }
> +
> +        queue->request = NULL;
> +    }
> +
> +    queue->flushing = false;
> +}
> +
> +bool qemu_block_queue_has_pending(BlockQueue *queue)
> +{
> +    return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
> +}
> diff --git a/block/blk-queue.h b/block/blk-queue.h
> new file mode 100644
> index 0000000..84c2407
> --- /dev/null
> +++ b/block/blk-queue.h
> @@ -0,0 +1,63 @@
> +/*
> + * QEMU System Emulator queue declaration for block layer
> + *
> + * Copyright (c) IBM, Corp. 2011
> + *
> + * Authors:
> + *  Zhi Yong Wu  <zwu.kernel@gmail.com>
> + *  Stefan Hajnoczi <stefanha@gmail.com>

Please use linux.vnet.ibm.com addresses and not GMail.

> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#ifndef QEMU_BLOCK_QUEUE_H
> +#define QEMU_BLOCK_QUEUE_H
> +
> +#include "block.h"
> +#include "qemu-queue.h"
> +
> +typedef BlockDriverAIOCB* (BlockRequestHandler) (BlockDriverState *bs,
> +                                int64_t sector_num, QEMUIOVector *qiov,
> +                                int nb_sectors, BlockDriverCompletionFunc *cb,
> +                                void *opaque);
> +
> +typedef struct BlockIORequest BlockIORequest;

BlockIORequest does not need a forward declaration because only
blk-queue.c uses it.  It's a private type that the rest of QEMU should
not know about.

> +
> +typedef struct BlockQueue BlockQueue;
> +
> +typedef struct BlockQueueAIOCB BlockQueueAIOCB;

This is also a private type; callers only know about BlockDriverAIOCB.
Please move it to blk-queue.c.

Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 3/4] block: add block timer and block throttling algorithm
  2011-09-01 11:44   ` [Qemu-devel] " Zhi Yong Wu
@ 2011-09-01 13:36     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-09-01 13:36 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: kwolf, aliguori, stefanha, kvm, mtosatti, qemu-devel, pair,
	zwu.kernel, ryanh

On Thu, Sep 1, 2011 at 12:44 PM, Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> wrote:
> Note:
>     1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this scenario.
>     2.) When the "dd" command is issued in the guest, if its bs option is set to a large value such as "bs=1024K", the resulting speed will be slightly higher than the limits.
>
> For these problems, if you have any good thoughts, please let us know. :)
>
> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> ---
>  block.c     |  290 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  block.h     |    5 +
>  block_int.h |    9 ++
>  3 files changed, 296 insertions(+), 8 deletions(-)
>
> diff --git a/block.c b/block.c
> index 17ee3df..680f1e7 100644
> --- a/block.c
> +++ b/block.c
> @@ -30,6 +30,9 @@
>  #include "qemu-objects.h"
>  #include "qemu-coroutine.h"
>
> +#include "qemu-timer.h"
> +#include "block/blk-queue.h"
> +
>  #ifdef CONFIG_BSD
>  #include <sys/types.h>
>  #include <sys/stat.h>
> @@ -72,6 +75,13 @@ static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
>                                          QEMUIOVector *iov);
>  static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs);
>
> +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
> +        bool is_write, double elapsed_time, uint64_t *wait);
> +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
> +        double elapsed_time, uint64_t *wait);
> +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
> +        bool is_write, uint64_t *wait);
> +
>  static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
>     QTAILQ_HEAD_INITIALIZER(bdrv_states);
>
> @@ -104,6 +114,64 @@ int is_windows_drive(const char *filename)
>  }
>  #endif
>
> +/* throttling disk I/O limits */
> +void bdrv_io_limits_disable(BlockDriverState *bs)
> +{
> +    bs->io_limits_enabled = false;
> +
> +    if (bs->block_queue) {
> +        qemu_block_queue_flush(bs->block_queue);
> +        qemu_del_block_queue(bs->block_queue);
> +        bs->block_queue = NULL;
> +    }
> +
> +    if (bs->block_timer) {
> +        qemu_del_timer(bs->block_timer);
> +        qemu_free_timer(bs->block_timer);
> +        bs->block_timer = NULL;
> +    }
> +
> +    bs->slice_start[BLOCK_IO_LIMIT_READ]  = 0;
> +    bs->slice_start[BLOCK_IO_LIMIT_WRITE] = 0;
> +
> +    bs->slice_end[BLOCK_IO_LIMIT_READ]    = 0;
> +    bs->slice_end[BLOCK_IO_LIMIT_WRITE]   = 0;
> +}
> +
> +static void bdrv_block_timer(void *opaque)
> +{
> +    BlockDriverState *bs = opaque;
> +    BlockQueue *queue = bs->block_queue;
> +
> +    qemu_block_queue_flush(queue);
> +}
> +
> +void bdrv_io_limits_enable(BlockDriverState *bs)
> +{
> +    bs->block_queue    = qemu_new_block_queue();
> +    bs->block_timer    = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
> +
> +    bs->slice_start[BLOCK_IO_LIMIT_READ]  = qemu_get_clock_ns(vm_clock);
> +    bs->slice_start[BLOCK_IO_LIMIT_WRITE] =
> +           bs->slice_start[BLOCK_IO_LIMIT_READ];
> +
> +    bs->slice_end[BLOCK_IO_LIMIT_READ]    =
> +           bs->slice_start[BLOCK_IO_LIMIT_READ] + BLOCK_IO_SLICE_TIME;
> +    bs->slice_end[BLOCK_IO_LIMIT_WRITE]   =
> +           bs->slice_end[BLOCK_IO_LIMIT_READ];
> +}
> +
> +bool bdrv_io_limits_enabled(BlockDriverState *bs)
> +{
> +    BlockIOLimit *io_limits = &bs->io_limits;
> +    return io_limits->bps[BLOCK_IO_LIMIT_READ]
> +         || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
> +         || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
> +         || io_limits->iops[BLOCK_IO_LIMIT_READ]
> +         || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
> +         || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
> +}
> +
>  /* check if the path starts with "<protocol>:" */
>  static int path_has_protocol(const char *path)
>  {
> @@ -694,6 +762,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
>     }
>
> +    /* throttling disk I/O limits */
> +    if (bs->io_limits_enabled) {
> +        bdrv_io_limits_enable(bs);
> +    }
> +
>     return 0;
>
>  unlink_and_fail:
> @@ -732,6 +805,18 @@ void bdrv_close(BlockDriverState *bs)
>         if (bs->change_cb)
>             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
>     }
> +
> +    /* throttling disk I/O limits */
> +    if (bs->block_queue) {
> +        qemu_del_block_queue(bs->block_queue);
> +        bs->block_queue = NULL;
> +    }
> +
> +    if (bs->block_timer) {
> +        qemu_del_timer(bs->block_timer);
> +        qemu_free_timer(bs->block_timer);
> +        bs->block_timer = NULL;
> +    }
>  }
>
>  void bdrv_close_all(void)
> @@ -2290,13 +2375,29 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
>                                  BlockDriverCompletionFunc *cb, void *opaque)
>  {
>     BlockDriver *drv = bs->drv;
> +    uint64_t wait_time = 0;
> +    BlockDriverAIOCB *ret;
>
>     trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
>
> -    if (!drv)
> -        return NULL;
> -    if (bdrv_check_request(bs, sector_num, nb_sectors))
> +    if (!drv || bdrv_check_request(bs, sector_num, nb_sectors)) {
>         return NULL;
> +    }
> +
> +    /* throttling disk read I/O */
> +    if (bs->io_limits_enabled) {
> +        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
> +                           sector_num, qiov, nb_sectors, cb, opaque);
> +            qemu_mod_timer(bs->block_timer,
> +                           wait_time + qemu_get_clock_ns(vm_clock));
> +            return ret;
> +        }
> +
> +        bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
> +                           (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
> +        bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
> +    }
>
>     return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>                                cb, opaque);
> @@ -2345,15 +2446,14 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>     BlockDriver *drv = bs->drv;
>     BlockDriverAIOCB *ret;
>     BlockCompleteData *blk_cb_data;
> +    uint64_t wait_time = 0;
>
>     trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
>
> -    if (!drv)
> -        return NULL;
> -    if (bs->read_only)
> -        return NULL;
> -    if (bdrv_check_request(bs, sector_num, nb_sectors))
> +    if (!drv || bs->read_only
> +        || bdrv_check_request(bs, sector_num, nb_sectors)) {
>         return NULL;
> +    }
>
>     if (bs->dirty_bitmap) {
>         blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
> @@ -2362,6 +2462,17 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>         opaque = blk_cb_data;
>     }
>
> +    /* throttling disk write I/O */
> +    if (bs->io_limits_enabled) {
> +        if (bdrv_exceed_io_limits(bs, nb_sectors, true, &wait_time)) {
> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_writev,
> +                                  sector_num, qiov, nb_sectors, cb, opaque);
> +            qemu_mod_timer(bs->block_timer,
> +                                  wait_time + qemu_get_clock_ns(vm_clock));
> +            return ret;
> +        }
> +    }
> +
>     ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
>                                cb, opaque);
>
> @@ -2369,6 +2480,12 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>         if (bs->wr_highest_sector < sector_num + nb_sectors - 1) {
>             bs->wr_highest_sector = sector_num + nb_sectors - 1;
>         }
> +
> +        if (bs->io_limits_enabled) {
> +            bs->io_disps.bytes[BLOCK_IO_LIMIT_WRITE] +=
> +                               (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
> +            bs->io_disps.ios[BLOCK_IO_LIMIT_WRITE]++;
> +        }
>     }
>
>     return ret;
> @@ -2633,6 +2750,163 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
>     acb->pool->cancel(acb);
>  }
>
> +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
> +                 bool is_write, double elapsed_time, uint64_t *wait) {
> +    uint64_t bps_limit = 0;
> +    double   bytes_limit, bytes_disp, bytes_res;
> +    double   slice_time, wait_time;
> +
> +    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
> +        bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
> +    } else if (bs->io_limits.bps[is_write]) {
> +        bps_limit = bs->io_limits.bps[is_write];
> +    } else {
> +        if (wait) {
> +            *wait = 0;
> +        }
> +
> +        return false;
> +    }
> +
> +    slice_time = bs->slice_end[is_write] - bs->slice_start[is_write];
> +    slice_time /= (NANOSECONDS_PER_SECOND);
> +    bytes_limit = bps_limit * slice_time;
> +    bytes_disp  = bs->io_disps.bytes[is_write];
> +    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
> +        bytes_disp += bs->io_disps.bytes[!is_write];
> +    }
> +
> +    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
> +
> +    if (bytes_disp + bytes_res <= bytes_limit) {
> +        if (wait) {
> +            *wait = 0;
> +        }
> +
> +        return false;
> +    }
> +
> +    /* Calc approx time to dispatch */
> +    wait_time = (bytes_disp + bytes_res) / bps_limit - elapsed_time;
> +
> +    if (wait) {
> +        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
> +    }
> +
> +    return true;
> +}
> +
> +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
> +                             double elapsed_time, uint64_t *wait) {
> +    uint64_t iops_limit = 0;
> +    double   ios_limit, ios_disp;
> +    double   slice_time, wait_time;
> +
> +    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
> +        iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
> +    } else if (bs->io_limits.iops[is_write]) {
> +        iops_limit = bs->io_limits.iops[is_write];
> +    } else {
> +        if (wait) {
> +            *wait = 0;
> +        }
> +
> +        return false;
> +    }
> +
> +    slice_time = bs->slice_end[is_write] - bs->slice_start[is_write];
> +    slice_time /= (NANOSECONDS_PER_SECOND);
> +    ios_limit  = iops_limit * slice_time;
> +    ios_disp   = bs->io_disps.ios[is_write];
> +    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
> +        ios_disp += bs->io_disps.ios[!is_write];
> +    }
> +
> +    if (ios_disp + 1 <= ios_limit) {
> +        if (wait) {
> +            *wait = 0;
> +        }
> +
> +        return false;
> +    }
> +
> +    /* Calc approx time to dispatch */
> +    wait_time = (ios_disp + 1) / iops_limit;
> +    if (wait_time > elapsed_time) {
> +        wait_time = wait_time - elapsed_time;
> +    } else {
> +        wait_time = 0;
> +    }
> +
> +    if (wait) {
> +        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
> +    }
> +
> +    return true;
> +}
> +
> +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
> +                           bool is_write, uint64_t *wait) {
> +    int64_t  now;
> +    uint64_t bps_wait = 0, iops_wait = 0, max_wait;
> +    double   elapsed_time;
> +    int      bps_ret, iops_ret;
> +
> +    now = qemu_get_clock_ns(vm_clock);
> +    if ((bs->slice_start[is_write] < now)
> +        && (bs->slice_end[is_write] > now)) {
> +        bs->slice_end[is_write]   = now + BLOCK_IO_SLICE_TIME;
> +    } else {
> +        bs->slice_start[is_write]     = now;
> +        bs->slice_end[is_write]       = now + BLOCK_IO_SLICE_TIME;
> +
> +        bs->io_disps.bytes[is_write]  = 0;
> +        bs->io_disps.bytes[!is_write] = 0;
> +
> +        bs->io_disps.ios[is_write]    = 0;
> +        bs->io_disps.ios[!is_write]   = 0;

Does it make sense to keep separate slice_start/slice_end for read and
write since we reset the dispatched statistics to zero for both?
Perhaps we should use a scalar slice_start/slice_end and not two
separate values for read/write.
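
Something like this, for instance (sketch only; assumes slice_start and
slice_end become scalar fields):

    now = qemu_get_clock_ns(vm_clock);
    if (bs->slice_start < now && now < bs->slice_end) {
        bs->slice_end = now + BLOCK_IO_SLICE_TIME;
    } else {
        bs->slice_start = now;
        bs->slice_end   = now + BLOCK_IO_SLICE_TIME;
        memset(&bs->io_disps, 0, sizeof(bs->io_disps));
    }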

> +    }
> +
> +    /* If a limit was exceeded, immediately queue this request */
> +    if (qemu_block_queue_has_pending(bs->block_queue)) {
> +        if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]
> +            || bs->io_limits.bps[is_write] || bs->io_limits.iops[is_write]
> +            || bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
> +            if (wait) {
> +                *wait = 0;

This causes the queue to be flushed each time the guest enqueues an
I/O while there are queued requests.  Perhaps this is (part of) the
CPU overhead that Ryan's benchmarking discovered.

If we try to preserve request ordering then I don't think there is a
reason to modify the timer once it has been set.
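
For example, the callers could arm the timer only when it is not already
pending (sketch, assuming qemu_timer_pending() is usable here):

    if (!qemu_timer_pending(bs->block_timer)) {
        qemu_mod_timer(bs->block_timer,
                       wait_time + qemu_get_clock_ns(vm_clock));
    }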

Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v6 3/4] block: add block timer and block throttling algorithm
  2011-09-01 13:36     ` [Qemu-devel] " Stefan Hajnoczi
  (?)
@ 2011-09-05  7:10     ` Zhi Yong Wu
  -1 siblings, 0 replies; 17+ messages in thread
From: Zhi Yong Wu @ 2011-09-05  7:10 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Thu, Sep 01, 2011 at 02:36:41PM +0100, Stefan Hajnoczi wrote:
>On Thu, Sep 1, 2011 at 12:44 PM, Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> wrote:
>> Note:
>>     1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this scenario.
>>     2.) When the "dd" command is issued in the guest, if its bs option is set to a large value such as "bs=1024K", the resulting speed will be slightly higher than the limits.
>>
>> For these problems, if you have any good thoughts, please let us know. :)
>>
>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>> ---
>>  block.c     |  290 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>  block.h     |    5 +
>>  block_int.h |    9 ++
>>  3 files changed, 296 insertions(+), 8 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 17ee3df..680f1e7 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -30,6 +30,9 @@
>>  #include "qemu-objects.h"
>>  #include "qemu-coroutine.h"
>>
>> +#include "qemu-timer.h"
>> +#include "block/blk-queue.h"
>> +
>>  #ifdef CONFIG_BSD
>>  #include <sys/types.h>
>>  #include <sys/stat.h>
>> @@ -72,6 +75,13 @@ static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
>>                                          QEMUIOVector *iov);
>>  static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs);
>>
>> +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
>> +        bool is_write, double elapsed_time, uint64_t *wait);
>> +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
>> +        double elapsed_time, uint64_t *wait);
>> +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
>> +        bool is_write, uint64_t *wait);
>> +
>>  static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
>>     QTAILQ_HEAD_INITIALIZER(bdrv_states);
>>
>> @@ -104,6 +114,64 @@ int is_windows_drive(const char *filename)
>>  }
>>  #endif
>>
>> +/* throttling disk I/O limits */
>> +void bdrv_io_limits_disable(BlockDriverState *bs)
>> +{
>> +    bs->io_limits_enabled = false;
>> +
>> +    if (bs->block_queue) {
>> +        qemu_block_queue_flush(bs->block_queue);
>> +        qemu_del_block_queue(bs->block_queue);
>> +        bs->block_queue = NULL;
>> +    }
>> +
>> +    if (bs->block_timer) {
>> +        qemu_del_timer(bs->block_timer);
>> +        qemu_free_timer(bs->block_timer);
>> +        bs->block_timer = NULL;
>> +    }
>> +
>> +    bs->slice_start[BLOCK_IO_LIMIT_READ]  = 0;
>> +    bs->slice_start[BLOCK_IO_LIMIT_WRITE] = 0;
>> +
>> +    bs->slice_end[BLOCK_IO_LIMIT_READ]    = 0;
>> +    bs->slice_end[BLOCK_IO_LIMIT_WRITE]   = 0;
>> +}
>> +
>> +static void bdrv_block_timer(void *opaque)
>> +{
>> +    BlockDriverState *bs = opaque;
>> +    BlockQueue *queue = bs->block_queue;
>> +
>> +    qemu_block_queue_flush(queue);
>> +}
>> +
>> +void bdrv_io_limits_enable(BlockDriverState *bs)
>> +{
>> +    bs->block_queue    = qemu_new_block_queue();
>> +    bs->block_timer    = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
>> +
>> +    bs->slice_start[BLOCK_IO_LIMIT_READ]  = qemu_get_clock_ns(vm_clock);
>> +    bs->slice_start[BLOCK_IO_LIMIT_WRITE] =
>> +           bs->slice_start[BLOCK_IO_LIMIT_READ];
>> +
>> +    bs->slice_end[BLOCK_IO_LIMIT_READ]    =
>> +           bs->slice_start[BLOCK_IO_LIMIT_READ] + BLOCK_IO_SLICE_TIME;
>> +    bs->slice_end[BLOCK_IO_LIMIT_WRITE]   =
>> +           bs->slice_end[BLOCK_IO_LIMIT_READ];
>> +}
>> +
>> +bool bdrv_io_limits_enabled(BlockDriverState *bs)
>> +{
>> +    BlockIOLimit *io_limits = &bs->io_limits;
>> +    return io_limits->bps[BLOCK_IO_LIMIT_READ]
>> +         || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
>> +         || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
>> +         || io_limits->iops[BLOCK_IO_LIMIT_READ]
>> +         || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
>> +         || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
>> +}
>> +
>>  /* check if the path starts with "<protocol>:" */
>>  static int path_has_protocol(const char *path)
>>  {
>> @@ -694,6 +762,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>>             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
>>     }
>>
>> +    /* throttling disk I/O limits */
>> +    if (bs->io_limits_enabled) {
>> +        bdrv_io_limits_enable(bs);
>> +    }
>> +
>>     return 0;
>>
>>  unlink_and_fail:
>> @@ -732,6 +805,18 @@ void bdrv_close(BlockDriverState *bs)
>>         if (bs->change_cb)
>>             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
>>     }
>> +
>> +    /* throttling disk I/O limits */
>> +    if (bs->block_queue) {
>> +        qemu_del_block_queue(bs->block_queue);
>> +        bs->block_queue = NULL;
>> +    }
>> +
>> +    if (bs->block_timer) {
>> +        qemu_del_timer(bs->block_timer);
>> +        qemu_free_timer(bs->block_timer);
>> +        bs->block_timer = NULL;
>> +    }
>>  }
>>
>>  void bdrv_close_all(void)
>> @@ -2290,13 +2375,29 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
>>                                  BlockDriverCompletionFunc *cb, void *opaque)
>>  {
>>     BlockDriver *drv = bs->drv;
>> +    uint64_t wait_time = 0;
>> +    BlockDriverAIOCB *ret;
>>
>>     trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
>>
>> -    if (!drv)
>> -        return NULL;
>> -    if (bdrv_check_request(bs, sector_num, nb_sectors))
>> +    if (!drv || bdrv_check_request(bs, sector_num, nb_sectors)) {
>>         return NULL;
>> +    }
>> +
>> +    /* throttling disk read I/O */
>> +    if (bs->io_limits_enabled) {
>> +        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
>> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
>> +                           sector_num, qiov, nb_sectors, cb, opaque);
>> +            qemu_mod_timer(bs->block_timer,
>> +                           wait_time + qemu_get_clock_ns(vm_clock));
>> +            return ret;
>> +        }
>> +
>> +        bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
>> +                           (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
>> +        bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
>> +    }
>>
>>     return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
>>                                cb, opaque);
>> @@ -2345,15 +2446,14 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>>     BlockDriver *drv = bs->drv;
>>     BlockDriverAIOCB *ret;
>>     BlockCompleteData *blk_cb_data;
>> +    uint64_t wait_time = 0;
>>
>>     trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
>>
>> -    if (!drv)
>> -        return NULL;
>> -    if (bs->read_only)
>> -        return NULL;
>> -    if (bdrv_check_request(bs, sector_num, nb_sectors))
>> +    if (!drv || bs->read_only
>> +        || bdrv_check_request(bs, sector_num, nb_sectors)) {
>>         return NULL;
>> +    }
>>
>>     if (bs->dirty_bitmap) {
>>         blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
>> @@ -2362,6 +2462,17 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>>         opaque = blk_cb_data;
>>     }
>>
>> +    /* throttling disk write I/O */
>> +    if (bs->io_limits_enabled) {
>> +        if (bdrv_exceed_io_limits(bs, nb_sectors, true, &wait_time)) {
>> +            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_writev,
>> +                                  sector_num, qiov, nb_sectors, cb, opaque);
>> +            qemu_mod_timer(bs->block_timer,
>> +                                  wait_time + qemu_get_clock_ns(vm_clock));
>> +            return ret;
>> +        }
>> +    }
>> +
>>     ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
>>                                cb, opaque);
>>
>> @@ -2369,6 +2480,12 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>>         if (bs->wr_highest_sector < sector_num + nb_sectors - 1) {
>>             bs->wr_highest_sector = sector_num + nb_sectors - 1;
>>         }
>> +
>> +        if (bs->io_limits_enabled) {
>> +            bs->io_disps.bytes[BLOCK_IO_LIMIT_WRITE] +=
>> +                               (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
>> +            bs->io_disps.ios[BLOCK_IO_LIMIT_WRITE]++;
>> +        }
>>     }
>>
>>     return ret;
>> @@ -2633,6 +2750,163 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
>>     acb->pool->cancel(acb);
>>  }
>>
>> +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
>> +                 bool is_write, double elapsed_time, uint64_t *wait) {
>> +    uint64_t bps_limit = 0;
>> +    double   bytes_limit, bytes_disp, bytes_res;
>> +    double   slice_time, wait_time;
>> +
>> +    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
>> +        bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
>> +    } else if (bs->io_limits.bps[is_write]) {
>> +        bps_limit = bs->io_limits.bps[is_write];
>> +    } else {
>> +        if (wait) {
>> +            *wait = 0;
>> +        }
>> +
>> +        return false;
>> +    }
>> +
>> +    slice_time = bs->slice_end[is_write] - bs->slice_start[is_write];
>> +    slice_time /= (NANOSECONDS_PER_SECOND);
>> +    bytes_limit = bps_limit * slice_time;
>> +    bytes_disp  = bs->io_disps.bytes[is_write];
>> +    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
>> +        bytes_disp += bs->io_disps.bytes[!is_write];
>> +    }
>> +
>> +    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
>> +
>> +    if (bytes_disp + bytes_res <= bytes_limit) {
>> +        if (wait) {
>> +            *wait = 0;
>> +        }
>> +
>> +        return false;
>> +    }
>> +
>> +    /* Calc approx time to dispatch */
>> +    wait_time = (bytes_disp + bytes_res) / bps_limit - elapsed_time;
>> +
>> +    if (wait) {
>> +        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
>> +    }
>> +
>> +    return true;
>> +}
>> +
>> +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
>> +                             double elapsed_time, uint64_t *wait) {
>> +    uint64_t iops_limit = 0;
>> +    double   ios_limit, ios_disp;
>> +    double   slice_time, wait_time;
>> +
>> +    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
>> +        iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
>> +    } else if (bs->io_limits.iops[is_write]) {
>> +        iops_limit = bs->io_limits.iops[is_write];
>> +    } else {
>> +        if (wait) {
>> +            *wait = 0;
>> +        }
>> +
>> +        return false;
>> +    }
>> +
>> +    slice_time = bs->slice_end[is_write] - bs->slice_start[is_write];
>> +    slice_time /= (NANOSECONDS_PER_SECOND);
>> +    ios_limit  = iops_limit * slice_time;
>> +    ios_disp   = bs->io_disps.ios[is_write];
>> +    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
>> +        ios_disp += bs->io_disps.ios[!is_write];
>> +    }
>> +
>> +    if (ios_disp + 1 <= ios_limit) {
>> +        if (wait) {
>> +            *wait = 0;
>> +        }
>> +
>> +        return false;
>> +    }
>> +
>> +    /* Calc approx time to dispatch */
>> +    wait_time = (ios_disp + 1) / iops_limit;
>> +    if (wait_time > elapsed_time) {
>> +        wait_time = wait_time - elapsed_time;
>> +    } else {
>> +        wait_time = 0;
>> +    }
>> +
>> +    if (wait) {
>> +        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
>> +    }
>> +
>> +    return true;
>> +}
>> +
>> +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
>> +                           bool is_write, uint64_t *wait) {
>> +    int64_t  now;
>> +    uint64_t bps_wait = 0, iops_wait = 0, max_wait;
>> +    double   elapsed_time;
>> +    int      bps_ret, iops_ret;
>> +
>> +    now = qemu_get_clock_ns(vm_clock);
>> +    if ((bs->slice_start[is_write] < now)
>> +        && (bs->slice_end[is_write] > now)) {
>> +        bs->slice_end[is_write]   = now + BLOCK_IO_SLICE_TIME;
>> +    } else {
>> +        bs->slice_start[is_write]     = now;
>> +        bs->slice_end[is_write]       = now + BLOCK_IO_SLICE_TIME;
>> +
>> +        bs->io_disps.bytes[is_write]  = 0;
>> +        bs->io_disps.bytes[!is_write] = 0;
>> +
>> +        bs->io_disps.ios[is_write]    = 0;
>> +        bs->io_disps.ios[!is_write]   = 0;
>
>Does it make sense to keep separate slice_start/slice_end for read and
>write since we reset the dispatched statistics to zero for both?
>Perhaps we should use a scalar slice_start/slice_end and not two
>separate values for read/write.
Right, a scalar slice_start/slice_end should be adopted; thanks.
>
>> +    }
>> +
>> +    /* If a limit was exceeded, immediately queue this request */
>> +    if (qemu_block_queue_has_pending(bs->block_queue)) {
>> +        if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]
>> +            || bs->io_limits.bps[is_write] || bs->io_limits.iops[is_write]
>> +            || bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
>> +            if (wait) {
>> +                *wait = 0;
>
>This causes the queue to be flushed each time the guest enqueues an
>I/O while there are queued requests.  Perhaps this is (part of) the
>CPU overhead that Ryan's benchmarking discovered.
>
>If we try to preserve request ordering then I don't think there is a
>reason to modify the timer once it has been set.
Good catch; please check the latest changes on my public git branch.

>
>Stefan
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v6 2/4] block: add the block queue support
  2011-09-01 13:02     ` [Qemu-devel] " Stefan Hajnoczi
  (?)
@ 2011-09-05  8:34     ` Zhi Yong Wu
  2011-09-07 11:13       ` Stefan Hajnoczi
  -1 siblings, 1 reply; 17+ messages in thread
From: Zhi Yong Wu @ 2011-09-05  8:34 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Thu, Sep 01, 2011 at 02:02:41PM +0100, Stefan Hajnoczi wrote:
>On Thu, Sep 1, 2011 at 12:44 PM, Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> wrote:
>> +struct BlockIORequest {
>> +    QTAILQ_ENTRY(BlockIORequest) entry;
>> +    BlockDriverState *bs;
>> +    BlockRequestHandler *handler;
>> +    int64_t sector_num;
>> +    QEMUIOVector *qiov;
>> +    int nb_sectors;
>> +    BlockDriverCompletionFunc *cb;
>> +    void *opaque;
>> +    BlockDriverAIOCB *acb;
>> +};
>> +
>> +struct BlockQueue {
>> +    QTAILQ_HEAD(requests, BlockIORequest) requests;
>> +    BlockIORequest *request;
>> +    bool flushing;
>> +};
>> +
>> +struct BlockQueueAIOCB {
>> +    BlockDriverAIOCB common;
>> +    BlockDriverCompletionFunc *real_cb;
>> +    BlockDriverAIOCB *real_acb;
>> +    void *opaque;
>> +    BlockIORequest *request;
>> +};
>
>Now it's pretty easy to merge BlockIORequest and BlockQueueAIOCB into
>one struct.  That way we don't need to duplicate fields or link
>structures together:
Must we merge them? I think that would make the logic less clear than the current approach. It seems weird for BlockQueueAIOCBs to be linked onto the block queue directly, even though it reduces the LOC to some extent.
Moreover, those block-queue functions would need to be rewritten.

>
>typedef struct BlockQueueAIOCB {
>    BlockDriverAIOCB common;
>    QTAILQ_ENTRY(BlockQueueAIOCB) entry;          /* held requests */
>    BlockRequestHandler *handler;                 /* dispatch function */
>    BlockDriverAIOCB *real_acb;                   /* dispatched aiocb */
>
>    /* Request parameters */
>    int64_t sector_num;
>    QEMUIOVector *qiov;
>    int nb_sectors;
>} BlockQueueAIOCB;
>
>This struct is kept for the lifetime of a request (both during
>queueing and after dispatch).
ditto.
>
>> +
>> +static void qemu_block_queue_dequeue(BlockQueue *queue, BlockIORequest *request)
>> +{
>> +    BlockIORequest *req;
>> +
>> +    while (!QTAILQ_EMPTY(&queue->requests)) {
>> +        req = QTAILQ_FIRST(&queue->requests);
>> +        if (req == request) {
>> +            QTAILQ_REMOVE(&queue->requests, req, entry);
>> +            break;
>> +        }
>> +    }
>> +}
>> +
>> +static void qemu_block_queue_cancel(BlockDriverAIOCB *acb)
>> +{
>> +    BlockQueueAIOCB *blkacb = container_of(acb, BlockQueueAIOCB, common);
>> +    if (blkacb->real_acb) {
>> +        bdrv_aio_cancel(blkacb->real_acb);
>> +    } else {
>> +        assert(blkacb->common.bs->block_queue);
>> +        qemu_block_queue_dequeue(blkacb->common.bs->block_queue,
>> +                                 blkacb->request);
>
>Can we use QTAILQ_REMOVE() and eliminate qemu_block_queue_dequeue()?
Why do we need to replace the dequeue function with QTAILQ_REMOVE()?
>If the aiocb exists and is not dispatched (real_acb != NULL) then it
>must be queued.
Can you explain more clearly?
>
>> +    }
>> +
>> +    qemu_aio_release(blkacb);
>> +}
>> +
>> +static AIOPool block_queue_pool = {
>> +    .aiocb_size         = sizeof(struct BlockQueueAIOCB),
>> +    .cancel             = qemu_block_queue_cancel,
>> +};
>> +
>> +static void qemu_block_queue_callback(void *opaque, int ret)
>> +{
>> +    BlockQueueAIOCB *acb = opaque;
>> +
>> +    if (acb->real_cb) {
>> +        acb->real_cb(acb->opaque, ret);
>> +    }
>> +
>> +    qemu_aio_release(acb);
>> +}
>> +
>> +BlockQueue *qemu_new_block_queue(void)
>> +{
>> +    BlockQueue *queue;
>> +
>> +    queue = g_malloc0(sizeof(BlockQueue));
>> +
>> +    QTAILQ_INIT(&queue->requests);
>> +
>> +    queue->flushing = false;
>> +    queue->request  = NULL;
>> +
>> +    return queue;
>> +}
>> +
>> +void qemu_del_block_queue(BlockQueue *queue)
>> +{
>> +    BlockIORequest *request, *next;
>> +
>> +    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>> +        g_free(request);
>
>What about the acbs?  There needs to be a strategy for cleanly
what strategy?
>shutting down completely.  In fact we should probably use cancellation
>here or assert that the queue is already empty.
You mean that if the queue is already empty, we need to assert on it?
>
>> +    }
>> +
>> +    g_free(queue);
>> +}
>> +
>> +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
>> +                        BlockDriverState *bs,
>> +                        BlockRequestHandler *handler,
>> +                        int64_t sector_num,
>> +                        QEMUIOVector *qiov,
>> +                        int nb_sectors,
>> +                        BlockDriverCompletionFunc *cb,
>> +                        void *opaque)
>> +{
>> +    BlockIORequest *request;
>> +    BlockDriverAIOCB *acb;
>> +    BlockQueueAIOCB *blkacb;
>> +
>> +    if (queue->request) {
>> +        request             = queue->request;
>> +        assert(request->acb);
>> +        acb                 = request->acb;
>> +        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
>> +        queue->request      = NULL;
>
>I don't think we need this behavior.  There is already requeuing logic
>in the flush function: if request->handler() fails then the request is
>put back onto the queue and we stop flushing.
>
>So all we need here is:
>
>if (queue->flushing) {
>    return NULL;
>}
very nice, thanks.
>
>This results in the request->handler() failing (returning NULL) and
>the flush function then requeues this request.
>
>In other words, during flushing we do not allow any new requests to be enqueued.
>
>> +    } else {
>> +        request             = g_malloc0(sizeof(BlockIORequest));
>> +        request->bs         = bs;
>> +        request->handler    = handler;
>> +        request->sector_num = sector_num;
>> +        request->qiov       = qiov;
>> +        request->nb_sectors = nb_sectors;
>> +        request->cb         = qemu_block_queue_callback;
>> +        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
>> +
>> +        acb = qemu_aio_get(&block_queue_pool, bs,
>> +                           qemu_block_queue_callback, opaque);
>> +        blkacb = container_of(acb, BlockQueueAIOCB, common);
>> +        blkacb->real_cb     = cb;
>> +        blkacb->real_acb    = NULL;
>> +        blkacb->opaque      = opaque;
>> +        blkacb->request     = request;
>> +
>> +        request->acb        = acb;
>> +        request->opaque = (void *)blkacb;
>
>Here's what this initialization code looks like when BlockIORequest
>and BlockQueueAIOCB are merged:
>
>acb = qemu_aio_get(&block_queue_pool, bs, cb, opaque);
>acb->handler    = handler;
>acb->sector_num = sector_num;
>acb->qiov       = qiov;
>acb->nb_sectors = nb_sectors;
>acb->real_acb   = NULL;
>QTAILQ_INSERT_TAIL(&queue->requests, acb, entry);
Please see comment #1 above.
>
>> +    }
>> +
>> +    return acb;
>> +}
>> +
>> +static int qemu_block_queue_handler(BlockIORequest *request)
>> +{
>> +    int ret;
>> +    BlockDriverAIOCB *res;
>> +
>> +    res = request->handler(request->bs, request->sector_num,
>> +                           request->qiov, request->nb_sectors,
>> +                           request->cb, request->opaque);
>
>Please pass qemu_block_queue_callback and request->acb directly here
>instead of using request->cb and request->opaque.  Those fields aren't
>needed and just add indirection.
If someone later wants to reuse qemu_block_queue_handler with a different callback function, this function cannot be used as-is. Your approach lowers the function's reusability.
>
>> +    if (res) {
>> +        BlockQueueAIOCB *blkacb =
>> +                container_of(request->acb, BlockQueueAIOCB, common);
>> +        blkacb->real_acb = res;
>> +    }
>> +
>> +    ret = (res == NULL) ? 0 : 1;
>> +
>> +    return ret;
>> +}
>> +
>> +void qemu_block_queue_flush(BlockQueue *queue)
>> +{
>> +    queue->flushing = true;
>> +    while (!QTAILQ_EMPTY(&queue->requests)) {
>> +        BlockIORequest *request = NULL;
>> +        int ret = 0;
>> +
>> +        request = QTAILQ_FIRST(&queue->requests);
>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>> +
>> +        queue->request = request;
>> +        ret = qemu_block_queue_handler(request);
>> +
>> +        if (ret == 0) {
>> +            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
>> +            queue->request = NULL;
>> +            break;
>> +        }
>> +
>> +        if (queue->request) {
>> +            g_free(request);
>> +        }
>> +
>> +        queue->request = NULL;
>> +    }
>> +
>> +    queue->flushing = false;
>> +}
>> +
>> +bool qemu_block_queue_has_pending(BlockQueue *queue)
>> +{
>> +    return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
>> +}
>> diff --git a/block/blk-queue.h b/block/blk-queue.h
>> new file mode 100644
>> index 0000000..84c2407
>> --- /dev/null
>> +++ b/block/blk-queue.h
>> @@ -0,0 +1,63 @@
>> +/*
>> + * QEMU System Emulator queue declaration for block layer
>> + *
>> + * Copyright (c) IBM, Corp. 2011
>> + *
>> + * Authors:
>> + *  Zhi Yong Wu  <zwu.kernel@gmail.com>
>> + *  Stefan Hajnoczi <stefanha@gmail.com>
>
>Please use linux.vnet.ibm.com addresses and not GMail.
>
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#ifndef QEMU_BLOCK_QUEUE_H
>> +#define QEMU_BLOCK_QUEUE_H
>> +
>> +#include "block.h"
>> +#include "qemu-queue.h"
>> +
>> +typedef BlockDriverAIOCB* (BlockRequestHandler) (BlockDriverState *bs,
>> +                                int64_t sector_num, QEMUIOVector *qiov,
>> +                                int nb_sectors, BlockDriverCompletionFunc *cb,
>> +                                void *opaque);
>> +
>> +typedef struct BlockIORequest BlockIORequest;
>
>BlockIORequest does not need a forward declaration because only
>blk-queue.c uses it.  It's a private type that the rest of QEMU should
>not know about.
OK.
>
>> +
>> +typedef struct BlockQueue BlockQueue;
>> +
>> +typedef struct BlockQueueAIOCB BlockQueueAIOCB;
>
>This is also a private type, callers only know about BlockDriverAIOCB.
> Please move to blk-queue.c.
ditto
>
>Stefan
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v6 2/4] block: add the block queue support
  2011-09-05  8:34     ` Zhi Yong Wu
@ 2011-09-07 11:13       ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2011-09-07 11:13 UTC (permalink / raw)
  To: Zhi Yong Wu; +Cc: qemu-devel

On Mon, Sep 5, 2011 at 9:34 AM, Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> wrote:
> On Thu, Sep 01, 2011 at 02:02:41PM +0100, Stefan Hajnoczi wrote:
>>On Thu, Sep 1, 2011 at 12:44 PM, Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> wrote:
>>> +static void qemu_block_queue_dequeue(BlockQueue *queue, BlockIORequest *request)
>>> +{
>>> +    BlockIORequest *req;
>>> +
>>> +    while (!QTAILQ_EMPTY(&queue->requests)) {
>>> +        req = QTAILQ_FIRST(&queue->requests);
>>> +        if (req == request) {
>>> +            QTAILQ_REMOVE(&queue->requests, req, entry);
>>> +            break;
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +static void qemu_block_queue_cancel(BlockDriverAIOCB *acb)
>>> +{
>>> +    BlockQueueAIOCB *blkacb = container_of(acb, BlockQueueAIOCB, common);
>>> +    if (blkacb->real_acb) {
>>> +        bdrv_aio_cancel(blkacb->real_acb);
>>> +    } else {
>>> +        assert(blkacb->common.bs->block_queue);
>>> +        qemu_block_queue_dequeue(blkacb->common.bs->block_queue,
>>> +                                 blkacb->request);
>>
>>Can we use QTAILQ_REMOVE() and eliminate qemu_block_queue_dequeue()?
> Why do we need to replace the dequeue function with QTAILQ_REMOVE()?

Because the existing QTAILQ_REMOVE() macro already does what is needed.
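
I.e. the cancel path could look like this (untested sketch using only the
existing fields):

    static void qemu_block_queue_cancel(BlockDriverAIOCB *acb)
    {
        BlockQueueAIOCB *blkacb = container_of(acb, BlockQueueAIOCB, common);

        if (blkacb->real_acb) {
            bdrv_aio_cancel(blkacb->real_acb);
        } else {
            BlockQueue *queue = blkacb->common.bs->block_queue;

            /* Still queued, so just unlink it from the request list */
            QTAILQ_REMOVE(&queue->requests, blkacb->request, entry);
        }

        qemu_aio_release(blkacb);
    }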

>>> +void qemu_del_block_queue(BlockQueue *queue)
>>> +{
>>> +    BlockIORequest *request, *next;
>>> +
>>> +    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
>>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>>> +        g_free(request);
>>
>>What about the acbs?  There needs to be a strategy for cleanly
> what strategy?
>>shutting down completely.  In fact we should probably use cancellation
>>here or assert that the queue is already empty.
> You mean that if the queue is already empty, we need to assert on it?

This patch silently deletes queued requests, which leaks the
BlockQueueAIOCBs.  The queued requests will never be issued or
completed.

qemu_del_block_queue() must cleanly destroy the queue.  This function
could require the queue to be empty, then we do not need to worry
about queued requests.  The caller can do this by flushing it before
deleting the queue.
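
A minimal version along those lines (sketch only):

    void qemu_del_block_queue(BlockQueue *queue)
    {
        /* Callers must flush or cancel queued requests before this point */
        assert(QTAILQ_EMPTY(&queue->requests));
        g_free(queue);
    }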

>>> +static int qemu_block_queue_handler(BlockIORequest *request)
>>> +{
>>> +    int ret;
>>> +    BlockDriverAIOCB *res;
>>> +
>>> +    res = request->handler(request->bs, request->sector_num,
>>> +                           request->qiov, request->nb_sectors,
>>> +                           request->cb, request->opaque);
>>
>>Please pass qemu_block_queue_callback and request->acb directly here
>>instead of using request->cb and request->opaque.  Those fields aren't
>>needed and just add indirection.
> If someone later wants to reuse qemu_block_queue_handler with a different callback function, this function cannot be used as-is. Your approach lowers the function's reusability.

Code can be changed so it's best to do things the simple way first and
extend the code if necessary later.  Trying to think ahead results in
half-finished code where the reader has to guess the author's
intention.

Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2011-09-07 11:13 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-01 11:44 [PATCH v6 0/4] The intro of QEMU block I/O throttling Zhi Yong Wu
2011-09-01 11:44 ` [Qemu-devel] " Zhi Yong Wu
2011-09-01 11:44 ` [PATCH v6 1/4] block: add the command line support Zhi Yong Wu
2011-09-01 11:44   ` [Qemu-devel] " Zhi Yong Wu
2011-09-01 11:44 ` [PATCH v6 2/4] block: add the block queue support Zhi Yong Wu
2011-09-01 11:44   ` [Qemu-devel] " Zhi Yong Wu
2011-09-01 13:02   ` Stefan Hajnoczi
2011-09-01 13:02     ` [Qemu-devel] " Stefan Hajnoczi
2011-09-05  8:34     ` Zhi Yong Wu
2011-09-07 11:13       ` Stefan Hajnoczi
2011-09-01 11:44 ` [PATCH v6 3/4] block: add block timer and block throttling algorithm Zhi Yong Wu
2011-09-01 11:44   ` [Qemu-devel] " Zhi Yong Wu
2011-09-01 13:36   ` Stefan Hajnoczi
2011-09-01 13:36     ` [Qemu-devel] " Stefan Hajnoczi
2011-09-05  7:10     ` Zhi Yong Wu
2011-09-01 11:44 ` [PATCH v6 4/4] qmp/hmp: add block_set_io_throttle Zhi Yong Wu
2011-09-01 11:44   ` [Qemu-devel] " Zhi Yong Wu
