* [Qemu-devel] [PATCH V2 for-1.6 0/5] Leaky bucket throttling and features
@ 2013-07-23 12:21 Benoît Canet
  2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 1/5] block: Repair the throttling code Benoît Canet
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Benoît Canet @ 2013-07-23 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

The first patch fixes the throttling code, which was broken by a previous commit.

The second patch replaces the existing throttling algorithm with the
well-described leaky bucket algorithm.

The third patch implements bursting by adding *_threshold parameters to
qmp_block_set_io_throttle.

The fourth patch allows defining the maximum size of an I/O when throttling by
iops, via iops_sector_count, to prevent VM users from cheating on the iops limit.

The last patch adds a metric reflecting how much the I/Os are throttled.

since v1:
    Add throttling percentage metric [Benoît]

Benoît Canet (5):
  block: Repair the throttling code.
  block: Modify the throttling code to implement the leaky bucket
    algorithm.
  block: Add support for throttling burst threshold in QMP and the
    command line.
  block: Add iops_sector_count to do the iops accounting for a given io
    size.
  block: Add throttling percentage metrics.

 block.c                   |  439 ++++++++++++++++++++++++++-------------------
 block/qapi.c              |   32 ++++
 blockdev.c                |  174 ++++++++++++++++--
 hmp.c                     |   38 +++-
 include/block/block_int.h |   18 +-
 include/block/coroutine.h |    5 +
 qapi-schema.json          |   42 ++++-
 qemu-coroutine-lock.c     |   14 ++
 qemu-options.hx           |    2 +-
 qmp-commands.hx           |   34 +++-
 10 files changed, 586 insertions(+), 212 deletions(-)

-- 
1.7.10.4


* [Qemu-devel] [PATCH V2 for-1.6 1/5] block: Repair the throttling code.
  2013-07-23 12:21 [Qemu-devel] [PATCH V2 for-1.6 0/5] Leaky bucket throttling and features Benoît Canet
@ 2013-07-23 12:21 ` Benoît Canet
  2013-07-23 14:31   ` Stefan Hajnoczi
  2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm Benoît Canet
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Benoît Canet @ 2013-07-23 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

The throttling code has been segfaulting since commit
02ffb504485f0920cfc75a0982a602f824a9a4f4 because some qemu_co_queue_next callers
do not run in a coroutine, while qemu_co_queue_do_restart assumes that the
caller is a coroutine.
As suggested by Stefan, fix this by entering the coroutine directly.
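
The dequeue-and-enter pattern of the fix can be sketched on a plain singly
linked FIFO (illustrative names only; QEMU's CoQueue uses QTAILQ and real
coroutines entered via qemu_coroutine_enter):

```c
#include <assert.h>
#include <stddef.h>

typedef struct Waiter {
    struct Waiter *next;
    int entered;            /* stands in for qemu_coroutine_enter() */
} Waiter;

typedef struct {
    Waiter *head;
} Queue;

/* Pop the first waiter and "enter" it directly; returns 0 when the queue is
 * empty, so `while (enter_next(&q)) { }` drains the queue safely even from
 * non-coroutine context.
 */
static int enter_next(Queue *q)
{
    Waiter *next = q->head;
    if (!next) {
        return 0;
    }
    q->head = next->next;
    next->entered = 1;
    return 1;
}
```

The point of the pattern is that nothing here yields back into a current
coroutine, which is why it is safe to call from timer callbacks and other
plain-function contexts.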

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block.c                   |    8 ++++----
 include/block/coroutine.h |    5 +++++
 qemu-coroutine-lock.c     |   14 ++++++++++++++
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index b560241..dc72643 100644
--- a/block.c
+++ b/block.c
@@ -127,7 +127,8 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
 {
     bs->io_limits_enabled = false;
 
-    while (qemu_co_queue_next(&bs->throttled_reqs));
+    while (qemu_co_enter_next(&bs->throttled_reqs)) {
+    }
 
     if (bs->block_timer) {
         qemu_del_timer(bs->block_timer);
@@ -143,7 +144,7 @@ static void bdrv_block_timer(void *opaque)
 {
     BlockDriverState *bs = opaque;
 
-    qemu_co_queue_next(&bs->throttled_reqs);
+    qemu_co_enter_next(&bs->throttled_reqs);
 }
 
 void bdrv_io_limits_enable(BlockDriverState *bs)
@@ -1445,8 +1446,7 @@ void bdrv_drain_all(void)
          * a busy wait.
          */
         QTAILQ_FOREACH(bs, &bdrv_states, list) {
-            if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
-                qemu_co_queue_restart_all(&bs->throttled_reqs);
+            while (qemu_co_enter_next(&bs->throttled_reqs)) {
                 busy = true;
             }
         }
diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index 377805a..66e331c 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -138,6 +138,11 @@ bool qemu_co_queue_next(CoQueue *queue);
 void qemu_co_queue_restart_all(CoQueue *queue);
 
 /**
+ * Enter the next coroutine in the queue
+ */
+bool qemu_co_enter_next(CoQueue *queue);
+
+/**
  * Checks if the CoQueue is empty.
  */
 bool qemu_co_queue_empty(CoQueue *queue);
diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
index d9fea49..c61d300 100644
--- a/qemu-coroutine-lock.c
+++ b/qemu-coroutine-lock.c
@@ -98,6 +98,20 @@ void qemu_co_queue_restart_all(CoQueue *queue)
     qemu_co_queue_do_restart(queue, false);
 }
 
+bool qemu_co_enter_next(CoQueue *queue)
+{
+    Coroutine *next;
+
+    next = QTAILQ_FIRST(&queue->entries);
+    if (!next) {
+        return false;
+    }
+
+    QTAILQ_REMOVE(&queue->entries, next, co_queue_next);
+    qemu_coroutine_enter(next, NULL);
+    return true;
+}
+
 bool qemu_co_queue_empty(CoQueue *queue)
 {
     return (QTAILQ_FIRST(&queue->entries) == NULL);
-- 
1.7.10.4


* [Qemu-devel] [PATCH V2 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm.
  2013-07-23 12:21 [Qemu-devel] [PATCH V2 for-1.6 0/5] Leaky bucket throttling and features Benoît Canet
  2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 1/5] block: Repair the throttling code Benoît Canet
@ 2013-07-23 12:21 ` Benoît Canet
  2013-07-23 14:59   ` Stefan Hajnoczi
  2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line Benoît Canet
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Benoît Canet @ 2013-07-23 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

This patch replaces the previous algorithm with the well-described leaky bucket
algorithm: a bucket is filled by the incoming I/Os and a periodic timer
decrements the counter to make the bucket leak. When a given threshold is
reached the bucket is full and the I/Os are held.
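
In outline, the mechanism can be sketched with a simplified single-bucket
model (hypothetical names, not the actual code in this patch):

```c
#include <assert.h>
#include <stdint.h>

#define NANOSECONDS_PER_SECOND 1000000000LL

typedef struct {
    int64_t level;      /* bytes currently in the bucket */
    int64_t threshold;  /* above this level the bucket is full */
    int64_t bps;        /* leak rate in bytes per second */
} Bucket;

/* Account an I/O: returns 1 if the bucket is already full (the I/O must
 * wait), 0 if the I/O is authorized and accounted. Note the I/O is checked
 * before accounting, so a single I/O may push the level above the threshold;
 * the guest is then blocked on subsequent I/Os instead of deadlocking.
 */
static int bucket_add(Bucket *b, int64_t bytes)
{
    if (b->level > b->threshold) {
        return 1;               /* full: hold the I/O */
    }
    b->level += bytes;          /* authorized: do the accounting */
    return 0;
}

/* Periodic timer callback: leak proportionally to the elapsed time */
static void bucket_leak(Bucket *b, int64_t delta_ns)
{
    int64_t leak = b->bps * delta_ns / NANOSECONDS_PER_SECOND;
    b->level = b->level > leak ? b->level - leak : 0;
}
```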

In this patch the threshold is set to a default value to make the code behave
like the previous implementation.

In the next patch the threshold will be exposed in QMP to let the user control
the burstiness of the throttling.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block.c                   |  410 +++++++++++++++++++++++++--------------------
 blockdev.c                |   71 ++++++--
 include/block/block_int.h |   15 +-
 3 files changed, 299 insertions(+), 197 deletions(-)
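
When a single total limit covers both reads and writes, the
bdrv_make_bps_buckets_leak() and bdrv_make_iops_buckets_leak() helpers in this
patch split the leak between the two buckets; if one bucket holds less than
half, its unused share leaks from the other bucket so the full rate is always
used. That split can be sketched in isolation (split_leak is a hypothetical
standalone helper):

```c
#include <assert.h>
#include <stdint.h>

/* Distribute total_leak between the read and write bucket levels */
static void split_leak(int64_t total_leak,
                       int64_t *read_level, int64_t *write_level)
{
    int64_t remain = total_leak % 2;
    int64_t half = total_leak / 2;
    int64_t read_leak, write_leak;

    if (*read_level <= half) {
        /* the read bucket holds less than half: leak it entirely and give
         * the unused part of its share to the write bucket */
        read_leak = *read_level;
        write_leak = 2 * half + remain - *read_level;
    } else if (*write_level <= half) {
        /* symmetric case */
        write_leak = *write_level;
        read_leak = 2 * half + remain - *write_level;
    } else {
        /* both buckets are above half: each leaks half the total */
        write_leak = half;
        read_leak = half + remain;
    }

    *read_level = *read_level > read_leak ? *read_level - read_leak : 0;
    *write_level = *write_level > write_leak ? *write_level - write_leak : 0;
}
```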

diff --git a/block.c b/block.c
index dc72643..2d6e9b4 100644
--- a/block.c
+++ b/block.c
@@ -86,13 +86,6 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque);
 static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors);
 
-static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
-        bool is_write, double elapsed_time, uint64_t *wait);
-static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
-        double elapsed_time, uint64_t *wait);
-static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
-        bool is_write, int64_t *wait);
-
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
     QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
@@ -101,6 +94,8 @@ static QLIST_HEAD(, BlockDriver) bdrv_drivers =
 
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
+/* boolean used to inform the throttling code that a bdrv_drain_all is in progress */
+static bool draining;
 
 #ifdef _WIN32
 static int is_windows_drive_prefix(const char *filename)
@@ -135,15 +130,122 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
         qemu_free_timer(bs->block_timer);
         bs->block_timer = NULL;
     }
+}
 
-    bs->slice_start = 0;
-    bs->slice_end   = 0;
+static void bdrv_make_bps_buckets_leak(BlockDriverState *bs, int64_t delta)
+{
+    int64_t *bytes = bs->leaky_buckets.bytes;
+    int64_t read_leak, write_leak;
+
+    /* the limit applies to both reads and writes */
+    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+        /* compute half the total leak */
+        int64_t leak = ((bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL] * delta) /
+                       NANOSECONDS_PER_SECOND);
+        int remain = leak % 2;
+        leak /= 2;
+
+        /* the read bucket holds less than half the quantity to leak, so add
+         * the unused part of its share to the write leak
+         */
+        if (bytes[BLOCK_IO_LIMIT_READ] <= leak) {
+            read_leak = bytes[BLOCK_IO_LIMIT_READ];
+            write_leak = 2 * leak + remain - bytes[BLOCK_IO_LIMIT_READ];
+        /* symmetric case */
+        } else if (bytes[BLOCK_IO_LIMIT_WRITE] <= leak) {
+            write_leak = bytes[BLOCK_IO_LIMIT_WRITE];
+            read_leak = 2 * leak + remain - bytes[BLOCK_IO_LIMIT_WRITE];
+        /* both buckets are above the leak amount: each leaks half the total */
+        } else {
+            write_leak = leak;
+            read_leak = leak + remain;
+        }
+    /* otherwise the read and write limits are separate */
+    } else {
+        read_leak = (bs->io_limits.bps[BLOCK_IO_LIMIT_READ] * delta) /
+                    NANOSECONDS_PER_SECOND;
+        write_leak = (bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE] * delta) /
+                     NANOSECONDS_PER_SECOND;
+    }
+
+    /* make the buckets leak */
+    bytes[BLOCK_IO_LIMIT_READ]  = MAX(bytes[BLOCK_IO_LIMIT_READ] - read_leak,
+                                      0);
+    bytes[BLOCK_IO_LIMIT_WRITE] = MAX(bytes[BLOCK_IO_LIMIT_WRITE] - write_leak,
+                                      0);
 }
 
+static void bdrv_make_iops_buckets_leak(BlockDriverState *bs, int64_t delta)
+{
+    double *ios = bs->leaky_buckets.ios;
+    int64_t read_leak, write_leak;
+
+    /* the limit applies to both reads and writes */
+    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+        /* compute half the total leak */
+        int64_t leak = ((bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL] * delta) /
+                       NANOSECONDS_PER_SECOND);
+        int remain = leak % 2;
+        leak /= 2;
+
+        /* the read bucket holds less than half the quantity to leak, so add
+         * the unused part of its share to the write leak
+         */
+        if (ios[BLOCK_IO_LIMIT_READ] <= leak) {
+            read_leak = ios[BLOCK_IO_LIMIT_READ];
+            write_leak = 2 * leak + remain - ios[BLOCK_IO_LIMIT_READ];
+        /* symmetric case */
+        } else if (ios[BLOCK_IO_LIMIT_WRITE] <= leak) {
+            write_leak = ios[BLOCK_IO_LIMIT_WRITE];
+            read_leak = 2 * leak + remain - ios[BLOCK_IO_LIMIT_WRITE];
+        /* both buckets are above the leak amount: each leaks half the total */
+        } else {
+            write_leak = leak;
+            read_leak = leak + remain;
+        }
+    /* otherwise the read and write limits are separate */
+    } else {
+        read_leak = (bs->io_limits.iops[BLOCK_IO_LIMIT_READ] * delta) /
+                    NANOSECONDS_PER_SECOND;
+        write_leak = (bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE] * delta) /
+                     NANOSECONDS_PER_SECOND;
+    }
+
+    /* make the buckets leak */
+    ios[BLOCK_IO_LIMIT_READ]  = MAX(ios[BLOCK_IO_LIMIT_READ] - read_leak, 0);
+    ios[BLOCK_IO_LIMIT_WRITE] = MAX(ios[BLOCK_IO_LIMIT_WRITE] - write_leak, 0);
+}
+
+static void bdrv_leak_if_needed(BlockDriverState *bs)
+{
+    int64_t now;
+    int64_t delta;
+
+    if (!bs->must_leak) {
+        return;
+    }
+
+    bs->must_leak = false;
+
+    now = qemu_get_clock_ns(rt_clock);
+    delta = now - bs->previous_leak;
+    bs->previous_leak = now;
+
+    bdrv_make_bps_buckets_leak(bs, delta);
+    bdrv_make_iops_buckets_leak(bs, delta);
+}
+
+/* This callback is the timer in charge of making the leaky buckets leak */
 static void bdrv_block_timer(void *opaque)
 {
     BlockDriverState *bs = opaque;
 
+    /* rearm the timer */
+    qemu_mod_timer(bs->block_timer,
+                   qemu_get_clock_ns(vm_clock) +
+                   BLOCK_IO_THROTTLE_PERIOD);
+
+    bs->must_leak = true;
     qemu_co_enter_next(&bs->throttled_reqs);
 }
 
@@ -152,6 +254,10 @@ void bdrv_io_limits_enable(BlockDriverState *bs)
     qemu_co_queue_init(&bs->throttled_reqs);
     bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
     bs->io_limits_enabled = true;
+    bs->previous_leak = qemu_get_clock_ns(rt_clock);
+    qemu_mod_timer(bs->block_timer,
+                   qemu_get_clock_ns(vm_clock) +
+                   BLOCK_IO_THROTTLE_PERIOD);
 }
 
 bool bdrv_io_limits_enabled(BlockDriverState *bs)
@@ -165,15 +271,113 @@ bool bdrv_io_limits_enabled(BlockDriverState *bs)
          || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
 }
 
+/* This function checks whether the relevant bandwidth threshold has been exceeded
+ *
+ * @is_write: true if the current IO is a write, false if it's a read
+ * @ret:      true if threshold has been exceeded else false
+ */
+static bool bdrv_is_bps_threshold_exceeded(BlockDriverState *bs, bool is_write)
+{
+    /* limit is on total read + write bps : do the sum and compare with total
+     * threshold
+     */
+    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+        int64_t bytes = bs->leaky_buckets.bytes[0] +
+                        bs->leaky_buckets.bytes[1];
+        return bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] < bytes;
+    }
+
+    /* check whether the threshold corresponding to the current IO type (read
+     * or write) has been exceeded
+     */
+    if (bs->io_limits.bps[is_write]) {
+        return bs->io_limits.bps_threshold[is_write] <
+               bs->leaky_buckets.bytes[is_write];
+    }
+
+    /* no limit */
+    return false;
+}
+
+/* This function checks whether the relevant iops threshold has been exceeded
+ *
+ * @is_write: true if the current IO is a write, false if it's a read
+ * @ret:      true if threshold has been exceeded else false
+ */
+static bool bdrv_is_iops_threshold_exceeded(BlockDriverState *bs, bool is_write)
+{
+    /* limit is on total read + write iops : do the sum and compare with total
+     * threshold
+     */
+    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+        double ios = bs->leaky_buckets.ios[0] +
+                     bs->leaky_buckets.ios[1];
+        return bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] < ios;
+    }
+
+    /* check whether the threshold corresponding to the current IO type (read
+     * or write) has been exceeded
+     */
+    if (bs->io_limits.iops[is_write]) {
+        return bs->io_limits.iops_threshold[is_write] <
+               bs->leaky_buckets.ios[is_write];
+    }
+
+    /* no limit */
+    return false;
+}
+
+/* This function checks whether any bandwidth or iops threshold has been exceeded
+ *
+ * @nb_sectors: the number of sectors of the current IO
+ * @is_write:   true if the current IO is a write, false if it's a read
+ * @ret:        true if any threshold has been exceeded else false
+ */
+static bool bdrv_is_any_threshold_exceeded(BlockDriverState *bs, int nb_sectors,
+                                           bool is_write)
+{
+    bool bps_ret, iops_ret;
+
+    /* check if any bandwidth or iops threshold has been exceeded */
+    bps_ret = bdrv_is_bps_threshold_exceeded(bs, is_write);
+    iops_ret = bdrv_is_iops_threshold_exceeded(bs, is_write);
+
+    /* if so the IO will be blocked so do not account it and return true
+     * also return false if a bdrv_drain_all is in progress
+     */
+    if (!draining && (bps_ret || iops_ret)) {
+        return true;
+    }
+
+    /* NOTE: the counter can go above the threshold when authorizing an IO.
+     *       On the next call the code will punish the guest by blocking the
+     *       next IO until the counter has been decremented below the threshold.
+     *       This way, if a guest issues a jumbo IO bigger than the threshold,
+     *       it still has a chance to be authorized and will not result in a
+     *       guest IO deadlock.
+     */
+
+    /* the IO is authorized so do the accounting and return false */
+    bs->leaky_buckets.bytes[is_write] += (int64_t)nb_sectors *
+                                         BDRV_SECTOR_SIZE;
+    bs->leaky_buckets.ios[is_write]++;
+
+    return false;
+}
+
 static void bdrv_io_limits_intercept(BlockDriverState *bs,
                                      bool is_write, int nb_sectors)
 {
-    int64_t wait_time = -1;
-
+    bdrv_leak_if_needed(bs);
+    /* if some IOs are already queued because the bucket is full, put the
+     * current IO at the end of the queue (FIFO)
+     */
     if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
         qemu_co_queue_wait(&bs->throttled_reqs);
     }
 
+    bdrv_leak_if_needed(bs);
+
     /* In fact, we hope to keep each request's timing, in FIFO mode. The next
      * throttled requests will not be dequeued until the current request is
      * allowed to be serviced. So if the current request still exceeds the
@@ -181,13 +385,19 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
      * be still in throttled_reqs queue.
      */
 
-    while (bdrv_exceed_io_limits(bs, nb_sectors, is_write, &wait_time)) {
-        qemu_mod_timer(bs->block_timer,
-                       wait_time + qemu_get_clock_ns(vm_clock));
+    /* if a threshold is exceeded the leaky bucket is full, so the code puts
+     * the IO in the throttled_reqs queue until the bucket has leaked enough
+     * to no longer be full
+     */
+    while (bdrv_is_any_threshold_exceeded(bs, nb_sectors, is_write)) {
+        bdrv_leak_if_needed(bs);
         qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
+        bdrv_leak_if_needed(bs);
     }
 
+    bdrv_leak_if_needed(bs);
     qemu_co_queue_next(&bs->throttled_reqs);
+    bdrv_leak_if_needed(bs);
 }
 
 /* check if the path starts with "<protocol>:" */
@@ -1439,6 +1649,9 @@ void bdrv_drain_all(void)
     BlockDriverState *bs;
     bool busy;
 
+    /* tell the throttling code we are draining */
+    draining = true;
+
     do {
         busy = qemu_aio_wait();
 
@@ -1457,6 +1670,8 @@ void bdrv_drain_all(void)
         assert(QLIST_EMPTY(&bs->tracked_requests));
         assert(qemu_co_queue_empty(&bs->throttled_reqs));
     }
+
+    draining = false;
 }
 
 /* make a BlockDriverState anonymous by removing from bdrv_state list.
@@ -1492,9 +1707,7 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
     bs_dest->enable_write_cache = bs_src->enable_write_cache;
 
     /* i/o timing parameters */
-    bs_dest->slice_start        = bs_src->slice_start;
-    bs_dest->slice_end          = bs_src->slice_end;
-    bs_dest->slice_submitted    = bs_src->slice_submitted;
+    bs_dest->leaky_buckets      = bs_src->leaky_buckets;
     bs_dest->io_limits          = bs_src->io_limits;
     bs_dest->throttled_reqs     = bs_src->throttled_reqs;
     bs_dest->block_timer        = bs_src->block_timer;
@@ -3551,169 +3764,6 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
     acb->aiocb_info->cancel(acb);
 }
 
-/* block I/O throttling */
-static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
-                 bool is_write, double elapsed_time, uint64_t *wait)
-{
-    uint64_t bps_limit = 0;
-    uint64_t extension;
-    double   bytes_limit, bytes_base, bytes_res;
-    double   slice_time, wait_time;
-
-    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
-        bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
-    } else if (bs->io_limits.bps[is_write]) {
-        bps_limit = bs->io_limits.bps[is_write];
-    } else {
-        if (wait) {
-            *wait = 0;
-        }
-
-        return false;
-    }
-
-    slice_time = bs->slice_end - bs->slice_start;
-    slice_time /= (NANOSECONDS_PER_SECOND);
-    bytes_limit = bps_limit * slice_time;
-    bytes_base  = bs->slice_submitted.bytes[is_write];
-    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
-        bytes_base += bs->slice_submitted.bytes[!is_write];
-    }
-
-    /* bytes_base: the bytes of data which have been read/written; and
-     *             it is obtained from the history statistic info.
-     * bytes_res: the remaining bytes of data which need to be read/written.
-     * (bytes_base + bytes_res) / bps_limit: used to calcuate
-     *             the total time for completing reading/writting all data.
-     */
-    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
-
-    if (bytes_base + bytes_res <= bytes_limit) {
-        if (wait) {
-            *wait = 0;
-        }
-
-        return false;
-    }
-
-    /* Calc approx time to dispatch */
-    wait_time = (bytes_base + bytes_res) / bps_limit - elapsed_time;
-
-    /* When the I/O rate at runtime exceeds the limits,
-     * bs->slice_end need to be extended in order that the current statistic
-     * info can be kept until the timer fire, so it is increased and tuned
-     * based on the result of experiment.
-     */
-    extension = wait_time * NANOSECONDS_PER_SECOND;
-    extension = DIV_ROUND_UP(extension, BLOCK_IO_SLICE_TIME) *
-                BLOCK_IO_SLICE_TIME;
-    bs->slice_end += extension;
-    if (wait) {
-        *wait = wait_time * NANOSECONDS_PER_SECOND;
-    }
-
-    return true;
-}
-
-static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
-                             double elapsed_time, uint64_t *wait)
-{
-    uint64_t iops_limit = 0;
-    double   ios_limit, ios_base;
-    double   slice_time, wait_time;
-
-    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
-        iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
-    } else if (bs->io_limits.iops[is_write]) {
-        iops_limit = bs->io_limits.iops[is_write];
-    } else {
-        if (wait) {
-            *wait = 0;
-        }
-
-        return false;
-    }
-
-    slice_time = bs->slice_end - bs->slice_start;
-    slice_time /= (NANOSECONDS_PER_SECOND);
-    ios_limit  = iops_limit * slice_time;
-    ios_base   = bs->slice_submitted.ios[is_write];
-    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
-        ios_base += bs->slice_submitted.ios[!is_write];
-    }
-
-    if (ios_base + 1 <= ios_limit) {
-        if (wait) {
-            *wait = 0;
-        }
-
-        return false;
-    }
-
-    /* Calc approx time to dispatch, in seconds */
-    wait_time = (ios_base + 1) / iops_limit;
-    if (wait_time > elapsed_time) {
-        wait_time = wait_time - elapsed_time;
-    } else {
-        wait_time = 0;
-    }
-
-    /* Exceeded current slice, extend it by another slice time */
-    bs->slice_end += BLOCK_IO_SLICE_TIME;
-    if (wait) {
-        *wait = wait_time * NANOSECONDS_PER_SECOND;
-    }
-
-    return true;
-}
-
-static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
-                           bool is_write, int64_t *wait)
-{
-    int64_t  now, max_wait;
-    uint64_t bps_wait = 0, iops_wait = 0;
-    double   elapsed_time;
-    int      bps_ret, iops_ret;
-
-    now = qemu_get_clock_ns(vm_clock);
-    if (now > bs->slice_end) {
-        bs->slice_start = now;
-        bs->slice_end   = now + BLOCK_IO_SLICE_TIME;
-        memset(&bs->slice_submitted, 0, sizeof(bs->slice_submitted));
-    }
-
-    elapsed_time  = now - bs->slice_start;
-    elapsed_time  /= (NANOSECONDS_PER_SECOND);
-
-    bps_ret  = bdrv_exceed_bps_limits(bs, nb_sectors,
-                                      is_write, elapsed_time, &bps_wait);
-    iops_ret = bdrv_exceed_iops_limits(bs, is_write,
-                                      elapsed_time, &iops_wait);
-    if (bps_ret || iops_ret) {
-        max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
-        if (wait) {
-            *wait = max_wait;
-        }
-
-        now = qemu_get_clock_ns(vm_clock);
-        if (bs->slice_end < now + max_wait) {
-            bs->slice_end = now + max_wait;
-        }
-
-        return true;
-    }
-
-    if (wait) {
-        *wait = 0;
-    }
-
-    bs->slice_submitted.bytes[is_write] += (int64_t)nb_sectors *
-                                           BDRV_SECTOR_SIZE;
-    bs->slice_submitted.ios[is_write]++;
-
-    return false;
-}
-
 /**************************************************************/
 /* async block device emulation */
 
diff --git a/blockdev.c b/blockdev.c
index c5abd65..a78fba4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -280,10 +280,25 @@ static int parse_block_error_action(const char *buf, bool is_read)
     }
 }
 
+static bool invalid(int64_t limit)
+{
+    if (!limit) {
+        return false;
+    }
+
+    if (limit < (THROTTLE_HZ * 2)) {
+        return true;
+    }
+
+    return false;
+}
+
 static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
 {
     bool bps_flag;
     bool iops_flag;
+    bool bps_threshold_flag;
+    bool iops_threshold_flag;
 
     assert(io_limits);
 
@@ -299,13 +314,30 @@ static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
         return false;
     }
 
-    if (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] < 0 ||
-        io_limits->bps[BLOCK_IO_LIMIT_WRITE] < 0 ||
-        io_limits->bps[BLOCK_IO_LIMIT_READ] < 0 ||
-        io_limits->iops[BLOCK_IO_LIMIT_TOTAL] < 0 ||
-        io_limits->iops[BLOCK_IO_LIMIT_WRITE] < 0 ||
-        io_limits->iops[BLOCK_IO_LIMIT_READ] < 0) {
-        error_setg(errp, "bps and iops values must be 0 or greater");
+    bps_threshold_flag  =
+        (io_limits->bps_threshold[BLOCK_IO_LIMIT_TOTAL] != 0)
+         && ((io_limits->bps_threshold[BLOCK_IO_LIMIT_READ] != 0)
+         || (io_limits->bps_threshold[BLOCK_IO_LIMIT_WRITE] != 0));
+    iops_threshold_flag =
+        (io_limits->iops_threshold[BLOCK_IO_LIMIT_TOTAL] != 0)
+         && ((io_limits->iops_threshold[BLOCK_IO_LIMIT_READ] != 0)
+         || (io_limits->iops_threshold[BLOCK_IO_LIMIT_WRITE] != 0));
+    if (bps_threshold_flag || iops_threshold_flag) {
+        error_setg(errp, "bps_threshold(iops_threshold) and "
+            "bps_rd_threshold/bps_wr_threshold"
+            "(iops_rd_threshold/iops_wr_threshold) "
+            "cannot be used at the same time");
+        return false;
+    }
+
+    if (invalid(io_limits->bps[BLOCK_IO_LIMIT_TOTAL]) ||
+        invalid(io_limits->bps[BLOCK_IO_LIMIT_WRITE]) ||
+        invalid(io_limits->bps[BLOCK_IO_LIMIT_READ]) ||
+        invalid(io_limits->iops[BLOCK_IO_LIMIT_TOTAL]) ||
+        invalid(io_limits->iops[BLOCK_IO_LIMIT_WRITE]) ||
+        invalid(io_limits->iops[BLOCK_IO_LIMIT_READ])) {
+        error_setg(errp, "bps and iops values must be %i or greater",
+                   THROTTLE_HZ * 2);
         return false;
     }
 
@@ -497,6 +529,18 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
                            qemu_opt_get_number(opts, "iops_rd", 0);
     io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
                            qemu_opt_get_number(opts, "iops_wr", 0);
+    io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] =
+                           io_limits.bps[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ;
+    io_limits.bps_threshold[BLOCK_IO_LIMIT_READ]  =
+                           io_limits.bps[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ;
+    io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] =
+                           io_limits.bps[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ;
+    io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] =
+                           io_limits.iops[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ;
+    io_limits.iops_threshold[BLOCK_IO_LIMIT_READ]  =
+                           io_limits.iops[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ;
+    io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] =
+                           io_limits.iops[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ;
 
     if (!do_check_io_limits(&io_limits, &error)) {
         error_report("%s", error_get_pretty(error));
@@ -1198,6 +1242,12 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
     io_limits.iops[BLOCK_IO_LIMIT_TOTAL]= iops;
     io_limits.iops[BLOCK_IO_LIMIT_READ] = iops_rd;
     io_limits.iops[BLOCK_IO_LIMIT_WRITE]= iops_wr;
+    io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] = bps / THROTTLE_HZ;
+    io_limits.bps_threshold[BLOCK_IO_LIMIT_READ]  = bps_rd / THROTTLE_HZ;
+    io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] = bps_wr / THROTTLE_HZ;
+    io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] = iops / THROTTLE_HZ;
+    io_limits.iops_threshold[BLOCK_IO_LIMIT_READ]  = iops_rd / THROTTLE_HZ;
+    io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr / THROTTLE_HZ;
 
     if (!do_check_io_limits(&io_limits, errp)) {
         return;
@@ -1209,11 +1259,10 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
         bdrv_io_limits_enable(bs);
     } else if (bs->io_limits_enabled && !bdrv_io_limits_enabled(bs)) {
         bdrv_io_limits_disable(bs);
-    } else {
-        if (bs->block_timer) {
-            qemu_mod_timer(bs->block_timer, qemu_get_clock_ns(vm_clock));
-        }
     }
+
+    /* reset leaky bucket to get the system in a known state */
+    memset(&bs->leaky_buckets, 0, sizeof(bs->leaky_buckets));
 }
 
 int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c6ac871..e32ad1f 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -43,8 +43,9 @@
 #define BLOCK_IO_LIMIT_WRITE    1
 #define BLOCK_IO_LIMIT_TOTAL    2
 
-#define BLOCK_IO_SLICE_TIME     100000000
 #define NANOSECONDS_PER_SECOND  1000000000.0
+#define THROTTLE_HZ 1
+#define BLOCK_IO_THROTTLE_PERIOD (NANOSECONDS_PER_SECOND / THROTTLE_HZ)
 
 #define BLOCK_OPT_SIZE              "size"
 #define BLOCK_OPT_ENCRYPT           "encryption"
@@ -73,11 +74,13 @@ typedef struct BdrvTrackedRequest {
 typedef struct BlockIOLimit {
     int64_t bps[3];
     int64_t iops[3];
+    int64_t bps_threshold[3];
+    int64_t iops_threshold[3];
 } BlockIOLimit;
 
 typedef struct BlockIOBaseValue {
-    uint64_t bytes[2];
-    uint64_t ios[2];
+    int64_t bytes[2];
+    double  ios[2];
 } BlockIOBaseValue;
 
 struct BlockDriver {
@@ -264,10 +267,10 @@ struct BlockDriverState {
     unsigned int copy_on_read_in_flight;
 
     /* the time for latest disk I/O */
-    int64_t slice_start;
-    int64_t slice_end;
     BlockIOLimit io_limits;
-    BlockIOBaseValue slice_submitted;
+    BlockIOBaseValue leaky_buckets;
+    int64_t      previous_leak;
+    bool         must_leak;
     CoQueue      throttled_reqs;
     QEMUTimer    *block_timer;
     bool         io_limits_enabled;
-- 
1.7.10.4


* [Qemu-devel] [PATCH V2 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line.
  2013-07-23 12:21 [Qemu-devel] [PATCH V2 for-1.6 0/5] Leaky bucket throttling and features Benoît Canet
  2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 1/5] block: Repair the throttling code Benoît Canet
  2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm Benoît Canet
@ 2013-07-23 12:21 ` Benoît Canet
  2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 4/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
  2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 5/5] block: Add throttling percentage metrics Benoît Canet
  4 siblings, 0 replies; 10+ messages in thread
From: Benoît Canet @ 2013-07-23 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

The thresholds of the leaky bucket algorithm can be used to allow some
burstiness.
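
To illustrate how a threshold bounds burstiness: starting from an empty
bucket, roughly threshold / io_size I/Os pass before any request is held,
after which sustained traffic is paced by the leak rate. A sketch with
hypothetical names, not this patch's code:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int64_t level;      /* bytes currently in the bucket */
    int64_t threshold;  /* bucket capacity before I/Os are held */
} BurstBucket;

/* Count how many io_size-byte I/Os an idle (empty) bucket admits before
 * it fills up and the next I/O would be held.
 */
static int burst_capacity(BurstBucket *b, int64_t io_size)
{
    int n = 0;
    while (b->level <= b->threshold) {
        b->level += io_size;
        n++;
    }
    return n;
}
```

A larger *_threshold therefore admits a proportionally larger burst before
throttling kicks in, without changing the long-term rate.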

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qapi.c     |   24 +++++++++++++
 blockdev.c       |  105 +++++++++++++++++++++++++++++++++++++++++++++++-------
 hmp.c            |   32 +++++++++++++++--
 qapi-schema.json |   34 ++++++++++++++++--
 qemu-options.hx  |    2 +-
 qmp-commands.hx  |   30 ++++++++++++++--
 6 files changed, 205 insertions(+), 22 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index a4bc411..03f1604 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -235,6 +235,30 @@ void bdrv_query_info(BlockDriverState *bs,
                            bs->io_limits.iops[BLOCK_IO_LIMIT_READ];
             info->inserted->iops_wr =
                            bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
+            info->inserted->has_bps_threshold =
+                           bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL];
+            info->inserted->bps_threshold =
+                           bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL];
+            info->inserted->has_bps_rd_threshold =
+                           bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_READ];
+            info->inserted->bps_rd_threshold =
+                           bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_READ];
+            info->inserted->has_bps_wr_threshold =
+                           bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE];
+            info->inserted->bps_wr_threshold =
+                           bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE];
+            info->inserted->has_iops_threshold =
+                           bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL];
+            info->inserted->iops_threshold =
+                           bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL];
+            info->inserted->iops_rd_threshold =
+                           bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_READ];
+            info->inserted->has_iops_rd_threshold =
+                           bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_READ];
+            info->inserted->has_iops_wr_threshold =
+                           bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE];
+            info->inserted->iops_wr_threshold =
+                           bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE];
         }
 
         bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index a78fba4..9bda359 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -341,6 +341,17 @@ static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
         return false;
     }
 
+    if (io_limits->bps_threshold[BLOCK_IO_LIMIT_TOTAL] < 0 ||
+        io_limits->bps_threshold[BLOCK_IO_LIMIT_WRITE] < 0 ||
+        io_limits->bps_threshold[BLOCK_IO_LIMIT_READ] < 0 ||
+        io_limits->iops_threshold[BLOCK_IO_LIMIT_TOTAL] < 0 ||
+        io_limits->iops_threshold[BLOCK_IO_LIMIT_WRITE] < 0 ||
+        io_limits->iops_threshold[BLOCK_IO_LIMIT_READ] < 0) {
+        error_setg(errp,
+                   "threshold values must be 0 or greater");
+        return false;
+    }
+
     return true;
 }
 
@@ -523,24 +534,34 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
                            qemu_opt_get_number(opts, "bps_rd", 0);
     io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
                            qemu_opt_get_number(opts, "bps_wr", 0);
+    /* bps thresholds */
+    io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL]  =
+        qemu_opt_get_number(opts, "bps_threshold",
+                            io_limits.bps[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ);
+    io_limits.bps_threshold[BLOCK_IO_LIMIT_READ]  =
+        qemu_opt_get_number(opts, "bps_rd_threshold",
+                            io_limits.bps[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ);
+    io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE]  =
+        qemu_opt_get_number(opts, "bps_wr_threshold",
+                            io_limits.bps[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ);
+
     io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
                            qemu_opt_get_number(opts, "iops", 0);
     io_limits.iops[BLOCK_IO_LIMIT_READ]  =
                            qemu_opt_get_number(opts, "iops_rd", 0);
     io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
                            qemu_opt_get_number(opts, "iops_wr", 0);
-    io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] =
-                           io_limits.bps[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ;
-    io_limits.bps_threshold[BLOCK_IO_LIMIT_READ]  =
-                           io_limits.bps[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ;
-    io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] =
-                           io_limits.bps[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ;
-    io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] =
-                           io_limits.iops[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ;
+
+    /* iops thresholds */
+    io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL]  =
+        qemu_opt_get_number(opts, "iops_threshold",
+                            io_limits.iops[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ);
     io_limits.iops_threshold[BLOCK_IO_LIMIT_READ]  =
-                           io_limits.iops[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ;
-    io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] =
-                           io_limits.iops[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ;
+        qemu_opt_get_number(opts, "iops_rd_threshold",
+                            io_limits.iops[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ);
+    io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE]  =
+        qemu_opt_get_number(opts, "iops_wr_threshold",
+                            io_limits.iops[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ);
 
     if (!do_check_io_limits(&io_limits, &error)) {
         error_report("%s", error_get_pretty(error));
@@ -1224,8 +1245,22 @@ void qmp_change_blockdev(const char *device, const char *filename,
 
 /* throttling disk I/O limits */
 void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
-                               int64_t bps_wr, int64_t iops, int64_t iops_rd,
-                               int64_t iops_wr, Error **errp)
+                               int64_t bps_wr,
+                               int64_t iops,
+                               int64_t iops_rd,
+                               int64_t iops_wr,
+                               bool has_bps_threshold,
+                               int64_t bps_threshold,
+                               bool has_bps_rd_threshold,
+                               int64_t bps_rd_threshold,
+                               bool has_bps_wr_threshold,
+                               int64_t bps_wr_threshold,
+                               bool has_iops_threshold,
+                               int64_t iops_threshold,
+                               bool has_iops_rd_threshold,
+                               int64_t iops_rd_threshold,
+                               bool has_iops_wr_threshold,
+                               int64_t iops_wr_threshold, Error **errp)
 {
     BlockIOLimit io_limits;
     BlockDriverState *bs;
@@ -1249,6 +1284,26 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
     io_limits.iops_threshold[BLOCK_IO_LIMIT_READ]  = iops_rd / THROTTLE_HZ;
     io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr / THROTTLE_HZ;
 
+    /* override them with given values if present */
+    if (has_bps_threshold) {
+        io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] = bps_threshold;
+    }
+    if (has_bps_rd_threshold) {
+        io_limits.bps_threshold[BLOCK_IO_LIMIT_READ] = bps_rd_threshold;
+    }
+    if (has_bps_wr_threshold) {
+        io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] = bps_wr_threshold;
+    }
+    if (has_iops_threshold) {
+        io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] = iops_threshold;
+    }
+    if (has_iops_rd_threshold) {
+        io_limits.iops_threshold[BLOCK_IO_LIMIT_READ] = iops_rd_threshold;
+    }
+    if (has_iops_wr_threshold) {
+        io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr_threshold;
+    }
+
     if (!do_check_io_limits(&io_limits, errp)) {
         return;
     }
@@ -1916,6 +1971,18 @@ QemuOptsList qemu_common_drive_opts = {
             .type = QEMU_OPT_NUMBER,
             .help = "limit write operations per second",
         },{
+            .name = "iops_threshold",
+            .type = QEMU_OPT_NUMBER,
+            .help = "total I/O operations threshold",
+        },{
+            .name = "iops_rd_threshold",
+            .type = QEMU_OPT_NUMBER,
+            .help = "read operations threshold",
+        },{
+            .name = "iops_wr_threshold",
+            .type = QEMU_OPT_NUMBER,
+            .help = "write operations threshold",
+        },{
             .name = "bps",
             .type = QEMU_OPT_NUMBER,
             .help = "limit total bytes per second",
@@ -1928,6 +1995,18 @@ QemuOptsList qemu_common_drive_opts = {
             .type = QEMU_OPT_NUMBER,
             .help = "limit write bytes per second",
         },{
+            .name = "bps_threshold",
+            .type = QEMU_OPT_NUMBER,
+            .help = "total bytes threshold",
+        },{
+            .name = "bps_rd_threshold",
+            .type = QEMU_OPT_NUMBER,
+            .help = "read bytes threshold",
+        },{
+            .name = "bps_wr_threshold",
+            .type = QEMU_OPT_NUMBER,
+            .help = "write bytes threshold",
+        },{
             .name = "copy-on-read",
             .type = QEMU_OPT_BOOL,
             .help = "copy read data from backing file into image file",
diff --git a/hmp.c b/hmp.c
index dc4d8d4..d75aa99 100644
--- a/hmp.c
+++ b/hmp.c
@@ -340,14 +340,28 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
         {
             monitor_printf(mon, "    I/O throttling:   bps=%" PRId64
                             " bps_rd=%" PRId64  " bps_wr=%" PRId64
+                            " bps_threshold=%" PRId64
+                            " bps_rd_threshold=%" PRId64
+                            " bps_wr_threshold=%" PRId64
                             " iops=%" PRId64 " iops_rd=%" PRId64
-                            " iops_wr=%" PRId64 "\n",
+                            " iops_wr=%" PRId64
+                            " iops_threshold=%" PRId64
+                            " iops_rd_threshold=%" PRId64
+                            " iops_wr_threshold=%" PRId64 "\n",
                             info->value->inserted->bps,
                             info->value->inserted->bps_rd,
                             info->value->inserted->bps_wr,
+                            info->value->inserted->bps_threshold,
+                            info->value->inserted->bps_rd_threshold,
+                            info->value->inserted->bps_wr_threshold,
                             info->value->inserted->iops,
                             info->value->inserted->iops_rd,
-                            info->value->inserted->iops_wr);
+                            info->value->inserted->iops_wr,
+                            info->value->inserted->iops_threshold,
+                            info->value->inserted->iops_rd_threshold,
+                            info->value->inserted->iops_wr_threshold);
+        } else {
+            monitor_printf(mon, " [not inserted]");
         }
 
         if (verbose) {
@@ -1094,7 +1108,19 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
                               qdict_get_int(qdict, "bps_wr"),
                               qdict_get_int(qdict, "iops"),
                               qdict_get_int(qdict, "iops_rd"),
-                              qdict_get_int(qdict, "iops_wr"), &err);
+                              qdict_get_int(qdict, "iops_wr"),
+                              false, /* no burst threshold via QMP */
+                              0,
+                              false,
+                              0,
+                              false,
+                              0,
+                              false,
+                              0,
+                              false,
+                              0,
+                              false,
+                              0, &err);
     hmp_handle_error(mon, &err);
 }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 7b9fef1..a6f3792 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -769,6 +769,18 @@
 #
 # @image: the info of image used (since: 1.6)
 #
+# @bps_threshold: #optional total threshold in bytes (Since 1.6)
+#
+# @bps_rd_threshold: #optional read threshold in bytes (Since 1.6)
+#
+# @bps_wr_threshold: #optional write threshold in bytes (Since 1.6)
+#
+# @iops_threshold: #optional total I/O operations threshold (Since 1.6)
+#
+# @iops_rd_threshold: #optional read I/O operations threshold (Since 1.6)
+#
+# @iops_wr_threshold: #optional write I/O operations threshold (Since 1.6)
+#
 # Since: 0.14.0
 #
 # Notes: This interface is only found in @BlockInfo.
@@ -779,7 +791,10 @@
             'encrypted': 'bool', 'encryption_key_missing': 'bool',
             'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
             'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
-            'image': 'ImageInfo' } }
+            'image': 'ImageInfo',
+            '*bps_threshold': 'int', '*bps_rd_threshold': 'int',
+            '*bps_wr_threshold': 'int', '*iops_threshold': 'int',
+            '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int' }}
 
 ##
 # @BlockDeviceIoStatus:
@@ -2158,6 +2173,18 @@
 #
 # @iops_wr: write I/O operations per second
 #
+# @bps_threshold: #optional total threshold in bytes (Since 1.6)
+#
+# @bps_rd_threshold: #optional read threshold in bytes (Since 1.6)
+#
+# @bps_wr_threshold: #optional write threshold in bytes (Since 1.6)
+#
+# @iops_threshold: #optional total I/O operations threshold (Since 1.6)
+#
+# @iops_rd_threshold: #optional read I/O operations threshold (Since 1.6)
+#
+# @iops_wr_threshold: #optional write I/O operations threshold (Since 1.6)
+#
 # Returns: Nothing on success
 #          If @device is not a valid block device, DeviceNotFound
 #
@@ -2165,7 +2192,10 @@
 ##
 { 'command': 'block_set_io_throttle',
   'data': { 'device': 'str', 'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
-            'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int' } }
+            'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
+            '*bps_threshold': 'int', '*bps_rd_threshold': 'int',
+            '*bps_wr_threshold': 'int', '*iops_threshold': 'int',
+            '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int' }}
 
 ##
 # @block-stream:
diff --git a/qemu-options.hx b/qemu-options.hx
index 4e98b4f..4d5ee66 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -409,7 +409,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
     "       [,readonly=on|off][,copy-on-read=on|off]\n"
-    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n"
+    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w][,bps_threshold=bt]|[[,bps_rd_threshold=rt][,bps_wr_threshold=wt]]][[,iops_threshold=it]|[[,iops_rd_threshold=rt][,iops_wr_threshold=wt]]\n"
     "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
diff --git a/qmp-commands.hx b/qmp-commands.hx
index e075df4..11c638a 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1385,7 +1385,7 @@ EQMP
 
     {
         .name       = "block_set_io_throttle",
-        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l",
+        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_threshold:l?,bps_rd_threshold:l?,bps_wr_threshold:l?,iops_threshold:l?,iops_rd_threshold:l?,iops_wr_threshold:l?",
         .mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
     },
 
@@ -1400,10 +1400,16 @@ Arguments:
 - "device": device name (json-string)
 - "bps":  total throughput limit in bytes per second(json-int)
 - "bps_rd":  read throughput limit in bytes per second(json-int)
-- "bps_wr":  read throughput limit in bytes per second(json-int)
+- "bps_wr":  write throughput limit in bytes per second(json-int)
 - "iops":  total I/O operations per second(json-int)
 - "iops_rd":  read I/O operations per second(json-int)
 - "iops_wr":  write I/O operations per second(json-int)
+- "bps_threshold":  total threshold in bytes(json-int)
+- "bps_rd_threshold":  read threshold in bytes(json-int)
+- "bps_wr_threshold":  write threshold in bytes(json-int)
+- "iops_threshold":  total I/O operations threshold(json-int)
+- "iops_rd_threshold":  read I/O operations threshold(json-int)
+- "iops_wr_threshold":  write I/O operations threshold(json-int)
 
 Example:
 
@@ -1413,7 +1419,13 @@ Example:
                                                "bps_wr": "0",
                                                "iops": "0",
                                                "iops_rd": "0",
-                                               "iops_wr": "0" } }
+                                               "iops_wr": "0",
+                                               "bps_threshold": "8000000",
+                                               "bps_rd_threshold": "0",
+                                               "bps_wr_threshold": "0",
+                                               "iops_threshold": "0",
+                                               "iops_rd_threshold": "0",
+                                               "iops_wr_threshold": "0" } }
 <- { "return": {} }
 
 EQMP
@@ -1754,6 +1766,12 @@ Each json-object contain the following:
          - "iops": limit total I/O operations per second (json-int)
          - "iops_rd": limit read operations per second (json-int)
          - "iops_wr": limit write operations per second (json-int)
+         - "bps_threshold":  total threshold in bytes(json-int)
+         - "bps_rd_threshold":  read threshold in bytes(json-int)
+         - "bps_wr_threshold":  write threshold in bytes(json-int)
+         - "iops_threshold":  total I/O operations threshold(json-int)
+         - "iops_rd_threshold":  read I/O operations threshold(json-int)
+         - "iops_wr_threshold":  write I/O operations threshold(json-int)
          - "image": the detail of the image, it is a json-object containing
             the following:
              - "filename": image file name (json-string)
@@ -1823,6 +1841,12 @@ Example:
                "iops":1000000,
                "iops_rd":0,
                "iops_wr":0,
+               "bps_threshold": "8000000",
+               "bps_rd_threshold": "0",
+               "bps_wr_threshold": "0",
+               "iops_threshold": "0",
+               "iops_rd_threshold": "0",
+               "iops_wr_threshold": "0",
                "image":{
                   "filename":"disks/test.qcow2",
                   "format":"qcow2",
-- 
1.7.10.4


* [Qemu-devel] [PATCH V2 for-1.6 4/5] block: Add iops_sector_count to do the iops accounting for a given io size.
  2013-07-23 12:21 [Qemu-devel] [PATCH V2 for-1.6 0/5] Leaky bucket throttling and features Benoît Canet
                   ` (2 preceding siblings ...)
  2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line Benoît Canet
@ 2013-07-23 12:21 ` Benoît Canet
  2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 5/5] block: Add throttling percentage metrics Benoît Canet
  4 siblings, 0 replies; 10+ messages in thread
From: Benoît Canet @ 2013-07-23 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

This feature can be used in cases where users avoid the iops limit by issuing
jumbo I/Os that hammer the storage backend.
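The accounting change can be sketched as follows (a simplified standalone version of the logic in this patch; the helper name is illustrative): with iops_sector_count set, a request of nb_sectors is billed as nb_sectors / iops_sector_count operations, with a minimum of one, so a jumbo I/O can no longer count as a single operation.

```c
#include <stdint.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/*
 * Illustrative cost function: how many "operations" a request of
 * nb_sectors counts for when iops_sector_count is configured.
 * A value of 0 disables the feature and every request costs 1.
 */
static double iops_cost(int nb_sectors, int64_t iops_sector_count)
{
    double ios = 1.0;

    if (iops_sector_count) {
        ios = (double)nb_sectors / iops_sector_count;
        ios = MAX(ios, 1.0);
    }
    return ios;
}
```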

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block.c                   |    8 +++++++-
 block/qapi.c              |    4 ++++
 blockdev.c                |   22 +++++++++++++++++++++-
 hmp.c                     |    8 ++++++--
 include/block/block_int.h |    1 +
 qapi-schema.json          |   10 ++++++++--
 qemu-options.hx           |    2 +-
 qmp-commands.hx           |    8 ++++++--
 8 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index 2d6e9b4..cfa890d 100644
--- a/block.c
+++ b/block.c
@@ -337,6 +337,12 @@ static bool bdrv_is_any_threshold_exceeded(BlockDriverState *bs, int nb_sectors,
                                            bool is_write)
 {
     bool bps_ret, iops_ret;
+    double   ios = 1.0;
+
+    if (bs->io_limits.iops_sector_count) {
+        ios = ((double) nb_sectors) / bs->io_limits.iops_sector_count;
+        ios = MAX(ios, 1.0);
+    }
 
     /* check if any bandwidth or per IO threshold has been exceeded */
     bps_ret = bdrv_is_bps_threshold_exceeded(bs, is_write);
@@ -360,7 +366,7 @@ static bool bdrv_is_any_threshold_exceeded(BlockDriverState *bs, int nb_sectors,
     /* the IO is authorized so do the accounting and return false */
     bs->leaky_buckets.bytes[is_write] += (int64_t)nb_sectors *
                                          BDRV_SECTOR_SIZE;
-    bs->leaky_buckets.ios[is_write]++;
+    bs->leaky_buckets.ios[is_write] += ios;
 
     return false;
 }
diff --git a/block/qapi.c b/block/qapi.c
index 03f1604..f81081c 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -259,6 +259,10 @@ void bdrv_query_info(BlockDriverState *bs,
                            bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE];
             info->inserted->iops_wr_threshold =
                            bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE];
+            info->inserted->has_iops_sector_count =
+                           bs->io_limits.iops_sector_count;
+            info->inserted->iops_sector_count =
+                           bs->io_limits.iops_sector_count;
         }
 
         bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index 9bda359..8ddd710 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -352,6 +352,11 @@ static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
         return false;
     }
 
+    if (io_limits->iops_sector_count < 0) {
+        error_setg(errp, "iops_sector_count must be 0 or greater");
+        return false;
+    }
+
     return true;
 }
 
@@ -563,6 +568,10 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
         qemu_opt_get_number(opts, "iops_wr_threshold",
                             io_limits.iops[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ);
 
+    io_limits.iops_sector_count =
+                           qemu_opt_get_number(opts, "iops_sector_count", 0);
+
+
     if (!do_check_io_limits(&io_limits, &error)) {
         error_report("%s", error_get_pretty(error));
         error_free(error);
@@ -1260,7 +1269,9 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
                                bool has_iops_rd_threshold,
                                int64_t iops_rd_threshold,
                                bool has_iops_wr_threshold,
-                               int64_t iops_wr_threshold, Error **errp)
+                               int64_t iops_wr_threshold,
+                               bool has_iops_sector_count,
+                               int64_t iops_sector_count, Error **errp)
 {
     BlockIOLimit io_limits;
     BlockDriverState *bs;
@@ -1283,6 +1294,7 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
     io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] = iops / THROTTLE_HZ;
     io_limits.iops_threshold[BLOCK_IO_LIMIT_READ]  = iops_rd / THROTTLE_HZ;
     io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr / THROTTLE_HZ;
+    io_limits.iops_sector_count = 0;
 
     /* override them with given values if present */
     if (has_bps_threshold) {
@@ -1304,6 +1316,10 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
         io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr_threshold;
     }
 
+    if (has_iops_sector_count) {
+        io_limits.iops_sector_count = iops_sector_count;
+    }
+
     if (!do_check_io_limits(&io_limits, errp)) {
         return;
     }
@@ -2007,6 +2023,10 @@ QemuOptsList qemu_common_drive_opts = {
             .type = QEMU_OPT_NUMBER,
             .help = "write bytes threshold",
         },{
+            .name = "iops_sector_count",
+            .type = QEMU_OPT_NUMBER,
+            .help = "maximum size in sectors of an I/O when limiting by iops",
+        },{
             .name = "copy-on-read",
             .type = QEMU_OPT_BOOL,
             .help = "copy read data from backing file into image file",
diff --git a/hmp.c b/hmp.c
index d75aa99..3912305 100644
--- a/hmp.c
+++ b/hmp.c
@@ -347,7 +347,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
                             " iops_wr=%" PRId64
                             " iops_threshold=%" PRId64
                             " iops_rd_threshold=%" PRId64
-                            " iops_wr_threshold=%" PRId64 "\n",
+                            " iops_wr_threshold=%" PRId64
+                            " iops_sector_count=%" PRId64 "\n",
                             info->value->inserted->bps,
                             info->value->inserted->bps_rd,
                             info->value->inserted->bps_wr,
@@ -359,7 +360,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
                             info->value->inserted->iops_wr,
                             info->value->inserted->iops_threshold,
                             info->value->inserted->iops_rd_threshold,
-                            info->value->inserted->iops_wr_threshold);
+                            info->value->inserted->iops_wr_threshold,
+                            info->value->inserted->iops_sector_count);
         } else {
             monitor_printf(mon, " [not inserted]");
         }
@@ -1120,6 +1122,8 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
                               false,
                               0,
                               false,
+                              0,
+                              false, /* No default I/O size */
                               0, &err);
     hmp_handle_error(mon, &err);
 }
diff --git a/include/block/block_int.h b/include/block/block_int.h
index e32ad1f..74d7503 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -76,6 +76,7 @@ typedef struct BlockIOLimit {
     int64_t iops[3];
     int64_t bps_threshold[3];
     int64_t iops_threshold[3];
+    int64_t iops_sector_count;
 } BlockIOLimit;
 
 typedef struct BlockIOBaseValue {
diff --git a/qapi-schema.json b/qapi-schema.json
index a6f3792..d579fda 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -781,6 +781,8 @@
 #
 # @iops_wr_threshold: #optional write I/O operations threshold (Since 1.6)
 #
+# @iops_sector_count: #optional maximum size of an I/O in sectors (Since 1.6)
+#
 # Since: 0.14.0
 #
 # Notes: This interface is only found in @BlockInfo.
@@ -794,7 +796,8 @@
             'image': 'ImageInfo',
             '*bps_threshold': 'int', '*bps_rd_threshold': 'int',
             '*bps_wr_threshold': 'int', '*iops_threshold': 'int',
-            '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int' }}
+            '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int',
+            '*iops_sector_count': 'int' } }
 
 ##
 # @BlockDeviceIoStatus:
@@ -2185,6 +2188,8 @@
 #
 # @iops_wr_threshold: #optional write I/O operations threshold (Since 1.6)
 #
+# @iops_sector_count: #optional maximum size of an I/O in sectors (Since 1.6)
+#
 # Returns: Nothing on success
 #          If @device is not a valid block device, DeviceNotFound
 #
@@ -2195,7 +2200,8 @@
             'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
             '*bps_threshold': 'int', '*bps_rd_threshold': 'int',
             '*bps_wr_threshold': 'int', '*iops_threshold': 'int',
-            '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int' }}
+            '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int',
+            '*iops_sector_count': 'int' }}
 
 ##
 # @block-stream:
diff --git a/qemu-options.hx b/qemu-options.hx
index 4d5ee66..2f417b8 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -409,7 +409,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
     "       [,readonly=on|off][,copy-on-read=on|off]\n"
-    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w][,bps_threshold=bt]|[[,bps_rd_threshold=rt][,bps_wr_threshold=wt]]][[,iops_threshold=it]|[[,iops_rd_threshold=rt][,iops_wr_threshold=wt]]\n"
+    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w][,bps_threshold=bt]|[[,bps_rd_threshold=rt][,bps_wr_threshold=wt]]][[,iops_threshold=it]|[[,iops_rd_threshold=rt][,iops_wr_threshold=wt][,iops_sector_count=cnt]]\n"
     "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 11c638a..b40db2f 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1385,7 +1385,7 @@ EQMP
 
     {
         .name       = "block_set_io_throttle",
-        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_threshold:l?,bps_rd_threshold:l?,bps_wr_threshold:l?,iops_threshold:l?,iops_rd_threshold:l?,iops_wr_threshold:l?",
+        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_threshold:l?,bps_rd_threshold:l?,bps_wr_threshold:l?,iops_threshold:l?,iops_rd_threshold:l?,iops_wr_threshold:l?,iops_sector_count:l?",
         .mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
     },
 
@@ -1410,6 +1410,7 @@ Arguments:
 - "iops_threshold":  total I/O operations threshold(json-int)
 - "iops_rd_threshold":  read I/O operations threshold(json-int)
 - "iops_wr_threshold":  write I/O operations threshold(json-int)
+- "iops_sector_count":  maximum I/O size in sectors when limiting by iops(json-int)
 
 Example:
 
@@ -1425,7 +1426,8 @@ Example:
                                                "bps_wr_threshold": "0",
                                                "iops_threshold": "0",
                                                "iops_rd_threshold": "0",
-                                               "iops_wr_threshold": "0" } }
+                                               "iops_wr_threshold": "0",
+                                               "iops_sector_count": "0" } }
 <- { "return": {} }
 
 EQMP
@@ -1772,6 +1774,7 @@ Each json-object contain the following:
          - "iops_threshold":  total I/O operations threshold(json-int)
          - "iops_rd_threshold":  read I/O operations threshold(json-int)
          - "iops_wr_threshold":  write I/O operations threshold(json-int)
+         - "iops_sector_count":  maximum I/O size in sectors when limiting by iops(json-int)
          - "image": the detail of the image, it is a json-object containing
             the following:
              - "filename": image file name (json-string)
@@ -1847,6 +1850,7 @@ Example:
                "iops_threshold": "0",
                "iops_rd_threshold": "0",
                "iops_wr_threshold": "0",
+               "iops_sector_count": "0",
                "image":{
                   "filename":"disks/test.qcow2",
                   "format":"qcow2",
-- 
1.7.10.4


* [Qemu-devel] [PATCH V2 for-1.6 5/5] block: Add throttling percentage metrics.
  2013-07-23 12:21 [Qemu-devel] [PATCH V2 for-1.6 0/5] Leaky bucket throttling and features Benoît Canet
                   ` (3 preceding siblings ...)
  2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 4/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
@ 2013-07-23 12:21 ` Benoît Canet
  4 siblings, 0 replies; 10+ messages in thread
From: Benoît Canet @ 2013-07-23 12:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha

This metric shows the percentage of time during which I/Os are put on hold by the
throttling algorithm.
It could be used by system administrators or cloud stack developers
to decide when the throttling settings must be changed.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block.c                   |   17 ++++++++++++++++-
 block/qapi.c              |    4 ++++
 hmp.c                     |    6 ++++--
 include/block/block_int.h |    2 ++
 qapi-schema.json          |    4 +++-
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index cfa890d..749f276 100644
--- a/block.c
+++ b/block.c
@@ -130,6 +130,8 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
         qemu_free_timer(bs->block_timer);
         bs->block_timer = NULL;
     }
+    bs->throttling_percentage = 0;
+    bs->full_since = 0;
 }
 
 static void bdrv_make_bps_buckets_leak(BlockDriverState *bs, int64_t delta)
@@ -219,7 +221,8 @@ static void bdrv_make_iops_buckets_leak(BlockDriverState *bs, int64_t delta)
 static void bdrv_leak_if_needed(BlockDriverState *bs)
 {
     int64_t now;
-    int64_t delta;
+    int64_t delta; /* the delta that would be ideally the timer period */
+    int64_t delta_full; /* the delta where the bucket is full */
 
     if (!bs->must_leak) {
         return;
@@ -229,6 +232,11 @@ static void bdrv_leak_if_needed(BlockDriverState *bs)
 
     now = qemu_get_clock_ns(rt_clock);
     delta = now - bs->previous_leak;
+    /* compute the throttle percentage reflecting how long I/Os are held on average */
+    if (bs->full_since) {
+        delta_full = now - bs->full_since;
+        bs->throttling_percentage = (delta_full * 100) / delta;
+    }
     bs->previous_leak = now;
 
     bdrv_make_bps_buckets_leak(bs, delta);
@@ -255,6 +263,8 @@ void bdrv_io_limits_enable(BlockDriverState *bs)
     bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
     bs->io_limits_enabled = true;
     bs->previous_leak = qemu_get_clock_ns(rt_clock);
+    bs->throttling_percentage = 0;
+    bs->full_since = 0;
     qemu_mod_timer(bs->block_timer,
                    qemu_get_clock_ns(vm_clock) +
                    BLOCK_IO_THROTTLE_PERIOD);
@@ -396,6 +406,11 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
      * not full
      */
     while (bdrv_is_any_threshold_exceeded(bs, nb_sectors, is_write)) {
+        /* remember since when the code decided to block the first I/O */
+        if (qemu_co_queue_empty(&bs->throttled_reqs)) {
+            bs->full_since = qemu_get_clock_ns(rt_clock);
+        }
+
         bdrv_leak_if_needed(bs);
         qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
         bdrv_leak_if_needed(bs);
diff --git a/block/qapi.c b/block/qapi.c
index f81081c..bd1c6af 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -263,6 +263,10 @@ void bdrv_query_info(BlockDriverState *bs,
                            bs->io_limits.iops_sector_count;
             info->inserted->iops_sector_count =
                            bs->io_limits.iops_sector_count;
+            info->inserted->has_throttling_percentage =
+                           bs->throttling_percentage;
+            info->inserted->throttling_percentage =
+                           bs->throttling_percentage;
         }
 
         bs0 = bs;
diff --git a/hmp.c b/hmp.c
index 3912305..9dc4862 100644
--- a/hmp.c
+++ b/hmp.c
@@ -348,7 +348,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
                             " iops_threshold=%" PRId64
                             " iops_rd_threshold=%" PRId64
                             " iops_wr_threshold=%" PRId64
-                            " iops_sector_count=%" PRId64 "\n",
+                            " iops_sector_count=%" PRId64
+                            " throttling_percentage=%" PRId64 "\n",
                             info->value->inserted->bps,
                             info->value->inserted->bps_rd,
                             info->value->inserted->bps_wr,
@@ -361,7 +362,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
                             info->value->inserted->iops_threshold,
                             info->value->inserted->iops_rd_threshold,
                             info->value->inserted->iops_wr_threshold,
-                            info->value->inserted->iops_sector_count);
+                            info->value->inserted->iops_sector_count,
+                            info->value->inserted->throttling_percentage);
         } else {
             monitor_printf(mon, " [not inserted]");
         }
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 74d7503..4487cd9 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -271,6 +271,8 @@ struct BlockDriverState {
     BlockIOLimit io_limits;
     BlockIOBaseValue leaky_buckets;
     int64_t      previous_leak;
+    int64_t      full_since;
+    int          throttling_percentage;
     bool         must_leak;
     CoQueue      throttled_reqs;
     QEMUTimer    *block_timer;
diff --git a/qapi-schema.json b/qapi-schema.json
index d579fda..14a02e7 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -783,6 +783,8 @@
 #
 # @iops_sector_count: #optional an I/O size in sector (Since 1.6)
 #
+# @throttling_percentage: #optional reflect throttling activity (Since 1.6)
+#
 # Since: 0.14.0
 #
 # Notes: This interface is only found in @BlockInfo.
@@ -797,7 +799,7 @@
             '*bps_threshold': 'int', '*bps_rd_threshold': 'int',
             '*bps_wr_threshold': 'int', '*iops_threshold': 'int',
             '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int',
-            '*iops_sector_count': 'int' } }
+            '*iops_sector_count': 'int', '*throttling_percentage': 'int' } }
 
 ##
 # @BlockDeviceIoStatus:
-- 
1.7.10.4


* Re: [Qemu-devel] [PATCH V2 for-1.6 1/5] block: Repair the throttling code.
  2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 1/5] block: Repair the throttling code Benoît Canet
@ 2013-07-23 14:31   ` Stefan Hajnoczi
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2013-07-23 14:31 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha

On Tue, Jul 23, 2013 at 02:21:03PM +0200, Benoît Canet wrote:
> The throttling code was segfaulting since commit
> 02ffb504485f0920cfc75a0982a602f824a9a4f4 because some qemu_co_queue_next
> callers do not run in a coroutine.
> qemu_co_queue_do_restart assumes that the caller is a coroutine.
> As suggested by Stefan, fix this by entering the coroutine directly.

Please mark qemu_co_queue_next() and qemu_co_queue_restart_all()
coroutine_fn.  They may only be called from coroutine context.

Also please add assert(qemu_in_coroutine()) to these functions so an
assertion fires when they are used incorrectly.


* Re: [Qemu-devel] [PATCH V2 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm.
  2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm Benoît Canet
@ 2013-07-23 14:59   ` Stefan Hajnoczi
  2013-07-23 15:14     ` Benoît Canet
  2013-07-23 15:20     ` Benoît Canet
  0 siblings, 2 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2013-07-23 14:59 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha

On Tue, Jul 23, 2013 at 02:21:04PM +0200, Benoît Canet wrote:
> @@ -152,6 +254,10 @@ void bdrv_io_limits_enable(BlockDriverState *bs)
>      qemu_co_queue_init(&bs->throttled_reqs);
>      bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
>      bs->io_limits_enabled = true;
> +    bs->previous_leak = qemu_get_clock_ns(rt_clock);
> +    qemu_mod_timer(bs->block_timer,
> +                   qemu_get_clock_ns(vm_clock) +
> +                   BLOCK_IO_THROTTLE_PERIOD);
>  }

I don't fully understand the patch yet but:

1. Can we activate the timer only when requests are actually pending?
   Imagine a host with 1000 guests, even a 1 second timer becomes
   wasteful.

2. You don't vary the wait time, does this mean a throttled request must
   wait for max 1 second?  If yes, then it introduces a big variance on
   request latency.

> +/* This function check if the correct bandwith threshold has been exceeded
> + *
> + * @is_write: true if the current IO is a write, false if it's a read
> + * @ret:      true if threshold has been exceeded else false
> + */
> +static bool bdrv_is_bps_threshold_exceeded(BlockDriverState *bs, bool is_write)
> +{
> +    /* limit is on total read + write bps : do the sum and compare with total
> +     * threshold
> +     */
> +    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
> +        int64_t bytes = bs->leaky_buckets.bytes[0] +
> +                        bs->leaky_buckets.bytes[1];

BLOCK_IO_LIMIT_READ, BLOCK_IO_LIMIT_WRITE instead of 0/1?

> +        return bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] < bytes;
> +    }
> +
> +    /* check wether the threshold corresponding to the current io type (read,
> +     * write has been exceeded

s/write/write)/

> +     */
> +    if (bs->io_limits.bps[is_write]) {
> +        return bs->io_limits.bps_threshold[is_write] <
> +               bs->leaky_buckets.bytes[is_write];
> +    }
> +
> +    /* no limit */
> +    return false;
> +}
> +
> +/* This function check if the correct iops threshold has been exceeded
> + *
> + * @is_write: true if the current IO is a write, false if it's a read
> + * @ret:      true if threshold has been exceeded else false
> + */
> +static bool bdrv_is_iops_threshold_exceeded(BlockDriverState *bs, bool is_write)
> +{

Same comments apply as to bdrv_is_bps_threshold_exceeded().

> @@ -280,10 +280,25 @@ static int parse_block_error_action(const char *buf, bool is_read)
>      }
>  }
>  
> +static bool invalid(int64_t limit)

Please choose a more specific function name like check_io_limit().


* Re: [Qemu-devel] [PATCH V2 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm.
  2013-07-23 14:59   ` Stefan Hajnoczi
@ 2013-07-23 15:14     ` Benoît Canet
  2013-07-23 15:20     ` Benoît Canet
  1 sibling, 0 replies; 10+ messages in thread
From: Benoît Canet @ 2013-07-23 15:14 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel, stefanha

> 1. Can we activate the timer only when requests are actually pending?
>    Imagine a host with 1000 guests, even a 1 second timer becomes
>    wasteful.

I will try to do this.

> 2. You don't vary the wait time, does this mean a throttled request must
>    wait for max 1 second?  If yes, then it introduces a big variance on
>    request latency.

If iops or bps requests were completed in negligible time (very fast storage
backend), yes.
We could make the timer frequency higher to mitigate this, though.
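To make the latency concern concrete: with a fixed-period leak timer, how long a
throttled request waits depends on where in the period it arrives (illustrative
back-of-the-envelope sketch, not QEMU code):

```python
def wait_until_next_leak(timer_period_s, arrival_offset_s):
    """Time until the next leak for a request that fills the bucket
    arrival_offset_s seconds after the previous leak (fixed-period timer)."""
    return timer_period_s - (arrival_offset_s % timer_period_s)

# With a 1 s period, a request arriving just after a leak waits ~1 s,
# while one arriving just before the next leak waits ~1 ms: large variance.
print(wait_until_next_leak(1.0, 0.001))
print(wait_until_next_leak(1.0, 0.999))
```

Raising the timer frequency shrinks the worst case but also the best case stays
near zero, so the variance shrinks proportionally with the period.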

Best regards

Benoît


* Re: [Qemu-devel] [PATCH V2 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm.
  2013-07-23 14:59   ` Stefan Hajnoczi
  2013-07-23 15:14     ` Benoît Canet
@ 2013-07-23 15:20     ` Benoît Canet
  1 sibling, 0 replies; 10+ messages in thread
From: Benoît Canet @ 2013-07-23 15:20 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel, stefanha

> I don't fully understand the patch yet but:

The patch is a variant of the following algorithm.
http://en.wikipedia.org/wiki/Leaky_bucket
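For reference, the basic algorithm can be sketched in a few lines (illustrative
Python, not the QEMU implementation; the rate and threshold values are made up):

```python
import time

class LeakyBucket:
    """Minimal leaky-bucket sketch: each I/O pours into the bucket, the bucket
    leaks at the configured rate, and an I/O must wait while the level
    exceeds the threshold (the threshold is the allowed burst size)."""

    def __init__(self, leak_rate_per_s, threshold):
        self.leak_rate = leak_rate_per_s     # e.g. a bytes/s or iops limit
        self.threshold = threshold           # burst capacity before throttling
        self.level = 0.0
        self.previous_leak = time.monotonic()

    def _leak(self):
        now = time.monotonic()
        self.level = max(0.0,
                         self.level - (now - self.previous_leak) * self.leak_rate)
        self.previous_leak = now

    def account(self, amount):
        """Account an I/O of the given size; return True if it must be held."""
        self._leak()
        self.level += amount
        return self.level > self.threshold

bucket = LeakyBucket(leak_rate_per_s=100, threshold=150)
print(bucket.account(100))   # -> False, within the burst threshold
print(bucket.account(100))   # -> True, bucket now over threshold: throttle
```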

Best regards

Benoît


end of thread, other threads:[~2013-07-23 15:18 UTC | newest]

Thread overview: 10+ messages
2013-07-23 12:21 [Qemu-devel] [PATCH V2 for-1.6 0/5] Leaky bucket throttling and features Benoît Canet
2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 1/5] block: Repair the throttling code Benoît Canet
2013-07-23 14:31   ` Stefan Hajnoczi
2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm Benoît Canet
2013-07-23 14:59   ` Stefan Hajnoczi
2013-07-23 15:14     ` Benoît Canet
2013-07-23 15:20     ` Benoît Canet
2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line Benoît Canet
2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 4/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
2013-07-23 12:21 ` [Qemu-devel] [PATCH V2 for-1.6 5/5] block: Add throttling percentage metrics Benoît Canet
