* [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations
@ 2013-03-21 14:49 Stefan Hajnoczi
  2013-03-21 14:49 ` [Qemu-devel] [RFC 1/4] block: fix I/O throttling accounting blind spot Stefan Hajnoczi
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2013-03-21 14:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Zhi Yong Wu

Benoît Canet <benoit@irqsave.net> reported that QEMU I/O throttling can
oscillate under continuous I/O.  The test case runs 50 threads performing
random writes and a -drive iops=150 limit is used.

Since QEMU I/O throttling is implemented using 100 millisecond time slices,
we'd expect 150 +/- 15 IOPS.  Anything outside that range indicates a problem
with the I/O throttling algorithm.

It turned out that even a single thread performing sequential I/O continuously
is throttled outside the 150 +/- 15 IOPS range.  The continuous stream of I/O
slows down as time goes on but resets to 150 IOPS again when interrupted.  This
can be tested with:

  $ iostat -d 1 -x /dev/vdb &
  $ dd if=/dev/vdb of=/dev/null bs=4096 iflag=direct

This patch series addresses these problems as follows:

1. Account for I/O requests when they are submitted instead of completed.  This
   ensures that we do not exceed the budget for this slice.  Exceeding the
   budget leads to fluctuations since we have to make up for this later.

2. Use a constant 100 millisecond slice time.  Adjusting the slice time at
   run-time led to oscillations.  Since the reason for adjusting slice time is
   not clear, drop this behavior.

I have also included two code clean-up patches.

Benoît: Please let me know if this solves the problems you're seeing.

Stefan Hajnoczi (4):
  block: fix I/O throttling accounting blind spot
  block: keep I/O throttling slice time constant
  block: drop duplicated slice extension code
  block: clean up I/O throttling wait_time code

 block.c                   | 47 ++++++++++++++++++++---------------------------
 blockdev.c                |  1 -
 include/block/block_int.h |  3 +--
 3 files changed, 21 insertions(+), 30 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [RFC 1/4] block: fix I/O throttling accounting blind spot
  2013-03-21 14:49 [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations Stefan Hajnoczi
@ 2013-03-21 14:49 ` Stefan Hajnoczi
  2013-03-27  8:50   ` Zhi Yong Wu
  2013-03-21 14:49 ` [Qemu-devel] [RFC 2/4] block: keep I/O throttling slice time constant Stefan Hajnoczi
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Stefan Hajnoczi @ 2013-03-21 14:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Zhi Yong Wu

I/O throttling relies on bdrv_acct_done() which is called when a request
completes.  This leaves a blind spot since we only charge for completed
requests, not submitted requests.

For example, if there is 1 operation remaining in this time slice the
guest could submit 3 operations and they will all be submitted
successfully since they don't actually get accounted for until they
complete.

Originally we probably thought this was okay since the requests would be
accounted for when the time slice was extended.  In practice it causes
fluctuations since the guest can exceed its I/O limit and it will be
punished for this later on.

Account for I/O upon submission so that I/O limits are enforced
properly.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c                   | 19 ++++++++-----------
 include/block/block_int.h |  2 +-
 2 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index 0a062c9..31fb0b0 100644
--- a/block.c
+++ b/block.c
@@ -141,7 +141,6 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
     bs->slice_start = 0;
     bs->slice_end   = 0;
     bs->slice_time  = 0;
-    memset(&bs->io_base, 0, sizeof(bs->io_base));
 }
 
 static void bdrv_block_timer(void *opaque)
@@ -1329,8 +1328,8 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
     bs_dest->slice_time         = bs_src->slice_time;
     bs_dest->slice_start        = bs_src->slice_start;
     bs_dest->slice_end          = bs_src->slice_end;
+    bs_dest->slice_submitted    = bs_src->slice_submitted;
     bs_dest->io_limits          = bs_src->io_limits;
-    bs_dest->io_base            = bs_src->io_base;
     bs_dest->throttled_reqs     = bs_src->throttled_reqs;
     bs_dest->block_timer        = bs_src->block_timer;
     bs_dest->io_limits_enabled  = bs_src->io_limits_enabled;
@@ -3665,9 +3664,9 @@ static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
     slice_time = bs->slice_end - bs->slice_start;
     slice_time /= (NANOSECONDS_PER_SECOND);
     bytes_limit = bps_limit * slice_time;
-    bytes_base  = bs->nr_bytes[is_write] - bs->io_base.bytes[is_write];
+    bytes_base  = bs->slice_submitted.bytes[is_write];
     if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
-        bytes_base += bs->nr_bytes[!is_write] - bs->io_base.bytes[!is_write];
+        bytes_base += bs->slice_submitted.bytes[!is_write];
     }
 
     /* bytes_base: the bytes of data which have been read/written; and
@@ -3683,6 +3682,7 @@ static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
             *wait = 0;
         }
 
+        bs->slice_submitted.bytes[is_write] += bytes_res;
         return false;
     }
 
@@ -3725,9 +3725,9 @@ static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
     slice_time = bs->slice_end - bs->slice_start;
     slice_time /= (NANOSECONDS_PER_SECOND);
     ios_limit  = iops_limit * slice_time;
-    ios_base   = bs->nr_ops[is_write] - bs->io_base.ios[is_write];
+    ios_base   = bs->slice_submitted.ios[is_write];
     if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
-        ios_base += bs->nr_ops[!is_write] - bs->io_base.ios[!is_write];
+        ios_base += bs->slice_submitted.ios[!is_write];
     }
 
     if (ios_base + 1 <= ios_limit) {
@@ -3735,6 +3735,7 @@ static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
             *wait = 0;
         }
 
+        bs->slice_submitted.ios[is_write]++;
         return false;
     }
 
@@ -3772,11 +3773,7 @@ static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
         bs->slice_start = now;
         bs->slice_end   = now + bs->slice_time;
 
-        bs->io_base.bytes[is_write]  = bs->nr_bytes[is_write];
-        bs->io_base.bytes[!is_write] = bs->nr_bytes[!is_write];
-
-        bs->io_base.ios[is_write]    = bs->nr_ops[is_write];
-        bs->io_base.ios[!is_write]   = bs->nr_ops[!is_write];
+        memset(&bs->slice_submitted, 0, sizeof(bs->slice_submitted));
     }
 
     elapsed_time  = now - bs->slice_start;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index ce0aa26..b461764 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -251,7 +251,7 @@ struct BlockDriverState {
     int64_t slice_start;
     int64_t slice_end;
     BlockIOLimit io_limits;
-    BlockIOBaseValue  io_base;
+    BlockIOBaseValue slice_submitted;
     CoQueue      throttled_reqs;
     QEMUTimer    *block_timer;
     bool         io_limits_enabled;
-- 
1.8.1.4

* [Qemu-devel] [RFC 2/4] block: keep I/O throttling slice time constant
  2013-03-21 14:49 [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations Stefan Hajnoczi
  2013-03-21 14:49 ` [Qemu-devel] [RFC 1/4] block: fix I/O throttling accounting blind spot Stefan Hajnoczi
@ 2013-03-21 14:49 ` Stefan Hajnoczi
  2013-03-21 14:49 ` [Qemu-devel] [RFC 3/4] block: drop duplicated slice extension code Stefan Hajnoczi
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2013-03-21 14:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Zhi Yong Wu

It is not necessary to adjust the slice time at runtime.  We already
extend the current slice in order to carry over accounting into the next
slice.  Changing the actual slice time value introduces oscillations.

The guest may experience large changes in throughput or IOPS from one
moment to the next when slice times are adjusted.

Reported-by: Benoît Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c                   | 19 +++++++++----------
 blockdev.c                |  1 -
 include/block/block_int.h |  1 -
 3 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index 31fb0b0..712f544 100644
--- a/block.c
+++ b/block.c
@@ -140,7 +140,6 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
 
     bs->slice_start = 0;
     bs->slice_end   = 0;
-    bs->slice_time  = 0;
 }
 
 static void bdrv_block_timer(void *opaque)
@@ -1325,7 +1324,6 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
     bs_dest->enable_write_cache = bs_src->enable_write_cache;
 
     /* i/o timing parameters */
-    bs_dest->slice_time         = bs_src->slice_time;
     bs_dest->slice_start        = bs_src->slice_start;
     bs_dest->slice_end          = bs_src->slice_end;
     bs_dest->slice_submitted    = bs_src->slice_submitted;
@@ -3646,6 +3644,7 @@ static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
                  bool is_write, double elapsed_time, uint64_t *wait)
 {
     uint64_t bps_limit = 0;
+    uint64_t extension;
     double   bytes_limit, bytes_base, bytes_res;
     double   slice_time, wait_time;
 
@@ -3694,8 +3693,10 @@ static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
      * info can be kept until the timer fire, so it is increased and tuned
      * based on the result of experiment.
      */
-    bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
-    bs->slice_end += bs->slice_time - 3 * BLOCK_IO_SLICE_TIME;
+    extension = wait_time * NANOSECONDS_PER_SECOND;
+    extension = DIV_ROUND_UP(extension, BLOCK_IO_SLICE_TIME) *
+                BLOCK_IO_SLICE_TIME;
+    bs->slice_end += extension;
     if (wait) {
         *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
     }
@@ -3747,8 +3748,8 @@ static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
         wait_time = 0;
     }
 
-    bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
-    bs->slice_end += bs->slice_time - 3 * BLOCK_IO_SLICE_TIME;
+    /* Exceeded current slice, extend it by another slice time */
+    bs->slice_end += BLOCK_IO_SLICE_TIME;
     if (wait) {
         *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
     }
@@ -3767,12 +3768,10 @@ static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
     now = qemu_get_clock_ns(vm_clock);
     if ((bs->slice_start < now)
         && (bs->slice_end > now)) {
-        bs->slice_end = now + bs->slice_time;
+        bs->slice_end = now + BLOCK_IO_SLICE_TIME;
     } else {
-        bs->slice_time  =  5 * BLOCK_IO_SLICE_TIME;
         bs->slice_start = now;
-        bs->slice_end   = now + bs->slice_time;
-
+        bs->slice_end   = now + BLOCK_IO_SLICE_TIME;
         memset(&bs->slice_submitted, 0, sizeof(bs->slice_submitted));
     }
 
diff --git a/blockdev.c b/blockdev.c
index 09f76b7..e5ead1f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1065,7 +1065,6 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
     }
 
     bs->io_limits = io_limits;
-    bs->slice_time = BLOCK_IO_SLICE_TIME;
 
     if (!bs->io_limits_enabled && bdrv_io_limits_enabled(bs)) {
         bdrv_io_limits_enable(bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index b461764..ec75802 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -247,7 +247,6 @@ struct BlockDriverState {
     unsigned int copy_on_read_in_flight;
 
     /* the time for latest disk I/O */
-    int64_t slice_time;
     int64_t slice_start;
     int64_t slice_end;
     BlockIOLimit io_limits;
-- 
1.8.1.4

* [Qemu-devel] [RFC 3/4] block: drop duplicated slice extension code
  2013-03-21 14:49 [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations Stefan Hajnoczi
  2013-03-21 14:49 ` [Qemu-devel] [RFC 1/4] block: fix I/O throttling accounting blind spot Stefan Hajnoczi
  2013-03-21 14:49 ` [Qemu-devel] [RFC 2/4] block: keep I/O throttling slice time constant Stefan Hajnoczi
@ 2013-03-21 14:49 ` Stefan Hajnoczi
  2013-03-21 14:49 ` [Qemu-devel] [RFC 4/4] block: clean up I/O throttling wait_time code Stefan Hajnoczi
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2013-03-21 14:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Zhi Yong Wu

The current slice is extended when an I/O request exceeds the limit.
There is no need to extend the slice every time we check a request.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 712f544..fed3b4d 100644
--- a/block.c
+++ b/block.c
@@ -3766,10 +3766,7 @@ static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
     int      bps_ret, iops_ret;
 
     now = qemu_get_clock_ns(vm_clock);
-    if ((bs->slice_start < now)
-        && (bs->slice_end > now)) {
-        bs->slice_end = now + BLOCK_IO_SLICE_TIME;
-    } else {
+    if (now > bs->slice_end) {
         bs->slice_start = now;
         bs->slice_end   = now + BLOCK_IO_SLICE_TIME;
         memset(&bs->slice_submitted, 0, sizeof(bs->slice_submitted));
-- 
1.8.1.4

* [Qemu-devel] [RFC 4/4] block: clean up I/O throttling wait_time code
  2013-03-21 14:49 [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2013-03-21 14:49 ` [Qemu-devel] [RFC 3/4] block: drop duplicated slice extension code Stefan Hajnoczi
@ 2013-03-21 14:49 ` Stefan Hajnoczi
  2013-03-21 15:27 ` [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations Benoît Canet
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2013-03-21 14:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Benoît Canet, Stefan Hajnoczi, Zhi Yong Wu

The wait_time variable is in seconds.  Reflect this in a comment and use
NANOSECONDS_PER_SECOND instead of BLOCK_IO_SLICE_TIME * 10 (which
happens to have the right value).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index fed3b4d..ff862a0 100644
--- a/block.c
+++ b/block.c
@@ -3698,7 +3698,7 @@ static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
                 BLOCK_IO_SLICE_TIME;
     bs->slice_end += extension;
     if (wait) {
-        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+        *wait = wait_time * NANOSECONDS_PER_SECOND;
     }
 
     return true;
@@ -3740,7 +3740,7 @@ static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
         return false;
     }
 
-    /* Calc approx time to dispatch */
+    /* Calc approx time to dispatch, in seconds */
     wait_time = (ios_base + 1) / iops_limit;
     if (wait_time > elapsed_time) {
         wait_time = wait_time - elapsed_time;
@@ -3751,7 +3751,7 @@ static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
     /* Exceeded current slice, extend it by another slice time */
     bs->slice_end += BLOCK_IO_SLICE_TIME;
     if (wait) {
-        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+        *wait = wait_time * NANOSECONDS_PER_SECOND;
     }
 
     return true;
-- 
1.8.1.4

* Re: [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations
  2013-03-21 14:49 [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2013-03-21 14:49 ` [Qemu-devel] [RFC 4/4] block: clean up I/O throttling wait_time code Stefan Hajnoczi
@ 2013-03-21 15:27 ` Benoît Canet
  2013-03-21 16:34   ` Stefan Hajnoczi
  2013-03-27 12:55 ` Zhi Yong Wu
  2013-04-04 12:15 ` Benoît Canet
  6 siblings, 1 reply; 14+ messages in thread
From: Benoît Canet @ 2013-03-21 15:27 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel, Zhi Yong Wu

> I have also included two code clean-up patches.
> 
> Benoît: Please let me know if this solves the problems you're seeing.

Hello Stefan,

It seems super stable on my test setup.
I will deploy the patchset on the production setup to see if it solves
the problem that the user is seeing.
I should know whether it works by the end of next week.
If it works, this patchset will have a super strong Tested-By:

Best regards

Benoît

* Re: [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations
  2013-03-21 15:27 ` [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations Benoît Canet
@ 2013-03-21 16:34   ` Stefan Hajnoczi
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2013-03-21 16:34 UTC (permalink / raw)
  To: Benoît Canet; +Cc: Kevin Wolf, qemu-devel, Zhi Yong Wu

On Thu, Mar 21, 2013 at 04:27:29PM +0100, Benoît Canet wrote:
> > I have also included two code clean-up patches.
> > 
> > Benoît: Please let me know if this solves the problems you're seeing.
> 
> Hello Stefan,
> 
> It seems super stable on my test setup.
> I will deploy the patchset on the production setup to see if it solves
> the problem that the user is seeing.
> I should know whether it works by the end of next week.
> If it works, this patchset will have a super strong Tested-By:

Awesome, thanks!

Stefan

* Re: [Qemu-devel] [RFC 1/4] block: fix I/O throttling accounting blind spot
  2013-03-21 14:49 ` [Qemu-devel] [RFC 1/4] block: fix I/O throttling accounting blind spot Stefan Hajnoczi
@ 2013-03-27  8:50   ` Zhi Yong Wu
  2013-03-27  9:14     ` Stefan Hajnoczi
  0 siblings, 1 reply; 14+ messages in thread
From: Zhi Yong Wu @ 2013-03-27  8:50 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Zhi Yong Wu, qemu-devel, Benoît Canet

On Thu, Mar 21, 2013 at 10:49 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> I/O throttling relies on bdrv_acct_done() which is called when a request
> completes.  This leaves a blind spot since we only charge for completed
> requests, not submitted requests.
>
> For example, if there is 1 operation remaining in this time slice the
> guest could submit 3 operations and they will all be submitted
> successfully since they don't actually get accounted for until they
> complete.
>
> Originally we probably thought this is okay since the requests will be
> accounted when the time slice is extended.  In practice it causes
> fluctuations since the guest can exceed its I/O limit and it will be
> punished for this later on.
>
> Account for I/O upon submission so that I/O limits are enforced
> properly.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block.c                   | 19 ++++++++-----------
>  include/block/block_int.h |  2 +-
>  2 files changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/block.c b/block.c
> index 0a062c9..31fb0b0 100644
> --- a/block.c
> +++ b/block.c
> @@ -141,7 +141,6 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
>      bs->slice_start = 0;
>      bs->slice_end   = 0;
>      bs->slice_time  = 0;
> -    memset(&bs->io_base, 0, sizeof(bs->io_base));
What happens if we perform successive operations: enable I/O throttling,
then disable it, and then enable it again? I suspect that
bs->slice_submitted may no longer be correct.
>  }
>
>  static void bdrv_block_timer(void *opaque)
> @@ -1329,8 +1328,8 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
>      bs_dest->slice_time         = bs_src->slice_time;
>      bs_dest->slice_start        = bs_src->slice_start;
>      bs_dest->slice_end          = bs_src->slice_end;
> +    bs_dest->slice_submitted    = bs_src->slice_submitted;
>      bs_dest->io_limits          = bs_src->io_limits;
> -    bs_dest->io_base            = bs_src->io_base;
>      bs_dest->throttled_reqs     = bs_src->throttled_reqs;
>      bs_dest->block_timer        = bs_src->block_timer;
>      bs_dest->io_limits_enabled  = bs_src->io_limits_enabled;
> @@ -3665,9 +3664,9 @@ static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
>      slice_time = bs->slice_end - bs->slice_start;
>      slice_time /= (NANOSECONDS_PER_SECOND);
>      bytes_limit = bps_limit * slice_time;
> -    bytes_base  = bs->nr_bytes[is_write] - bs->io_base.bytes[is_write];
> +    bytes_base  = bs->slice_submitted.bytes[is_write];
>      if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
> -        bytes_base += bs->nr_bytes[!is_write] - bs->io_base.bytes[!is_write];
> +        bytes_base += bs->slice_submitted.bytes[!is_write];
>      }
>
>      /* bytes_base: the bytes of data which have been read/written; and
> @@ -3683,6 +3682,7 @@ static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
>              *wait = 0;
>          }
>
> +        bs->slice_submitted.bytes[is_write] += bytes_res;
>          return false;
>      }
>
> @@ -3725,9 +3725,9 @@ static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
>      slice_time = bs->slice_end - bs->slice_start;
>      slice_time /= (NANOSECONDS_PER_SECOND);
>      ios_limit  = iops_limit * slice_time;
> -    ios_base   = bs->nr_ops[is_write] - bs->io_base.ios[is_write];
> +    ios_base   = bs->slice_submitted.ios[is_write];
>      if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
> -        ios_base += bs->nr_ops[!is_write] - bs->io_base.ios[!is_write];
> +        ios_base += bs->slice_submitted.ios[!is_write];
>      }
>
>      if (ios_base + 1 <= ios_limit) {
> @@ -3735,6 +3735,7 @@ static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
>              *wait = 0;
>          }
>
> +        bs->slice_submitted.ios[is_write]++;
>          return false;
>      }
>
> @@ -3772,11 +3773,7 @@ static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
>          bs->slice_start = now;
>          bs->slice_end   = now + bs->slice_time;
>
> -        bs->io_base.bytes[is_write]  = bs->nr_bytes[is_write];
> -        bs->io_base.bytes[!is_write] = bs->nr_bytes[!is_write];
> -
> -        bs->io_base.ios[is_write]    = bs->nr_ops[is_write];
> -        bs->io_base.ios[!is_write]   = bs->nr_ops[!is_write];
> +        memset(&bs->slice_submitted, 0, sizeof(bs->slice_submitted));
>      }
>
>      elapsed_time  = now - bs->slice_start;
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index ce0aa26..b461764 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -251,7 +251,7 @@ struct BlockDriverState {
>      int64_t slice_start;
>      int64_t slice_end;
>      BlockIOLimit io_limits;
> -    BlockIOBaseValue  io_base;
> +    BlockIOBaseValue slice_submitted;
>      CoQueue      throttled_reqs;
>      QEMUTimer    *block_timer;
>      bool         io_limits_enabled;
> --
> 1.8.1.4
>
>



-- 
Regards,

Zhi Yong Wu

* Re: [Qemu-devel] [RFC 1/4] block: fix I/O throttling accounting blind spot
  2013-03-27  8:50   ` Zhi Yong Wu
@ 2013-03-27  9:14     ` Stefan Hajnoczi
  2013-03-27 12:49       ` Zhi Yong Wu
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Hajnoczi @ 2013-03-27  9:14 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: Kevin Wolf, Benoît Canet, qemu-devel, Stefan Hajnoczi, Zhi Yong Wu

On Wed, Mar 27, 2013 at 9:50 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
> On Thu, Mar 21, 2013 at 10:49 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>> I/O throttling relies on bdrv_acct_done() which is called when a request
>> completes.  This leaves a blind spot since we only charge for completed
>> requests, not submitted requests.
>>
>> For example, if there is 1 operation remaining in this time slice the
>> guest could submit 3 operations and they will all be submitted
>> successfully since they don't actually get accounted for until they
>> complete.
>>
>> Originally we probably thought this is okay since the requests will be
>> accounted when the time slice is extended.  In practice it causes
>> fluctuations since the guest can exceed its I/O limit and it will be
>> punished for this later on.
>>
>> Account for I/O upon submission so that I/O limits are enforced
>> properly.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>  block.c                   | 19 ++++++++-----------
>>  include/block/block_int.h |  2 +-
>>  2 files changed, 9 insertions(+), 12 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 0a062c9..31fb0b0 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -141,7 +141,6 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
>>      bs->slice_start = 0;
>>      bs->slice_end   = 0;
>>      bs->slice_time  = 0;
>> -    memset(&bs->io_base, 0, sizeof(bs->io_base));
> What happens if we perform successive operations: enable I/O throttling,
> then disable it, and then enable it again? I suspect that
> bs->slice_submitted may no longer be correct.

The memset() was moved...

>> @@ -3772,11 +3773,7 @@ static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
>>          bs->slice_start = now;
>>          bs->slice_end   = now + bs->slice_time;
>>
>> -        bs->io_base.bytes[is_write]  = bs->nr_bytes[is_write];
>> -        bs->io_base.bytes[!is_write] = bs->nr_bytes[!is_write];
>> -
>> -        bs->io_base.ios[is_write]    = bs->nr_ops[is_write];
>> -        bs->io_base.ios[!is_write]   = bs->nr_ops[!is_write];
>> +        memset(&bs->slice_submitted, 0, sizeof(bs->slice_submitted));
>>      }
>>
>>      elapsed_time  = now - bs->slice_start;

...here.

Since bs->slice_start = 0 when I/O throttling is disabled we will
start a new slice next time bdrv_exceed_io_limits() is called.

Therefore bs->slice_submitted is consistent across disable/enable.

Stefan

* Re: [Qemu-devel] [RFC 1/4] block: fix I/O throttling accounting blind spot
  2013-03-27  9:14     ` Stefan Hajnoczi
@ 2013-03-27 12:49       ` Zhi Yong Wu
  0 siblings, 0 replies; 14+ messages in thread
From: Zhi Yong Wu @ 2013-03-27 12:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Benoît Canet, qemu-devel, Stefan Hajnoczi, Zhi Yong Wu

On Wed, Mar 27, 2013 at 5:14 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Wed, Mar 27, 2013 at 9:50 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>> On Thu, Mar 21, 2013 at 10:49 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>> I/O throttling relies on bdrv_acct_done() which is called when a request
>>> completes.  This leaves a blind spot since we only charge for completed
>>> requests, not submitted requests.
>>>
>>> For example, if there is 1 operation remaining in this time slice the
>>> guest could submit 3 operations and they will all be submitted
>>> successfully since they don't actually get accounted for until they
>>> complete.
>>>
>>> Originally we probably thought this is okay since the requests will be
>>> accounted when the time slice is extended.  In practice it causes
>>> fluctuations since the guest can exceed its I/O limit and it will be
>>> punished for this later on.
>>>
>>> Account for I/O upon submission so that I/O limits are enforced
>>> properly.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>>  block.c                   | 19 ++++++++-----------
>>>  include/block/block_int.h |  2 +-
>>>  2 files changed, 9 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index 0a062c9..31fb0b0 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -141,7 +141,6 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
>>>      bs->slice_start = 0;
>>>      bs->slice_end   = 0;
>>>      bs->slice_time  = 0;
>>> -    memset(&bs->io_base, 0, sizeof(bs->io_base));
>> What happens if we perform successive operations: enable I/O throttling,
>> then disable it, and then enable it again? I suspect that
>> bs->slice_submitted may no longer be correct.
>
> The memset() was moved...
>
>>> @@ -3772,11 +3773,7 @@ static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
>>>          bs->slice_start = now;
>>>          bs->slice_end   = now + bs->slice_time;
>>>
>>> -        bs->io_base.bytes[is_write]  = bs->nr_bytes[is_write];
>>> -        bs->io_base.bytes[!is_write] = bs->nr_bytes[!is_write];
>>> -
>>> -        bs->io_base.ios[is_write]    = bs->nr_ops[is_write];
>>> -        bs->io_base.ios[!is_write]   = bs->nr_ops[!is_write];
>>> +        memset(&bs->slice_submitted, 0, sizeof(bs->slice_submitted));
>>>      }
>>>
>>>      elapsed_time  = now - bs->slice_start;
>
> ...here.
>
> Since bs->slice_start = 0 when I/O throttling is disabled we will
> start a new slice next time bdrv_exceed_io_limits() is called.
>
> Therefore bs->slice_submitted is consistent across disable/enable.
Yes, I also realized this on my way home on the subway.
>
> Stefan



-- 
Regards,

Zhi Yong Wu

* Re: [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations
  2013-03-21 14:49 [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2013-03-21 15:27 ` [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations Benoît Canet
@ 2013-03-27 12:55 ` Zhi Yong Wu
  2013-04-04 12:15 ` Benoît Canet
  6 siblings, 0 replies; 14+ messages in thread
From: Zhi Yong Wu @ 2013-03-27 12:55 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Zhi Yong Wu, qemu-devel, Benoît Canet

They look good to me, but I have not done any testing on my dev box.
Thanks for your fix.

On Thu, Mar 21, 2013 at 10:49 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> Benoît Canet <benoit@irqsave.net> reported that QEMU I/O throttling can
> oscillate under continuous I/O.  The test case runs 50 threads performing
> random writes and a -drive iops=150 limit is used.
>
> Since QEMU I/O throttling is implemented using 100 millisecond time slices,
> we'd expect 150 +/- 15 IOPS.  Anything outside that range indicates a problem
> with the I/O throttling algorithm.
>
> It turned out that even a single thread performing sequential I/O continuously
> is throttled outside the 150 +/- 15 IOPS range.  The continuous stream of I/O
> slows down as time goes on but resets to 150 IOPS again when interrupted.  This
> can be tested with:
>
>   $ iostat -d 1 -x /dev/vdb &
>   $ dd if=/dev/vdb of=/dev/null bs=4096 iflag=direct
>
> This patch series addresses these problems as follows:
>
> 1. Account for I/O requests when they are submitted instead of completed.  This
>    ensures that we do not exceed the budget for this slice.  Exceeding the
>    budget leads to fluctuations since we have to make up for this later.
>
> 2. Use constant 100 millisecond slice time.  Adjusting the slice time at
>    run-time led to oscillations.  Since the reason for adjusting slice time is
>    not clear, drop this behavior.
>
> I have also included two code clean-up patches.
>
> Benoît: Please let me know if this solves the problems you're seeing.
>
> Stefan Hajnoczi (4):
>   block: fix I/O throttling accounting blind spot
>   block: keep I/O throttling slice time constant
>   block: drop duplicated slice extension code
>   block: clean up I/O throttling wait_time code
>
>  block.c                   | 47 ++++++++++++++++++++---------------------------
>  blockdev.c                |  1 -
>  include/block/block_int.h |  3 +--
>  3 files changed, 21 insertions(+), 30 deletions(-)
>
> --
> 1.8.1.4
>
>



-- 
Regards,

Zhi Yong Wu
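The two fixes described in the cover letter above can be sketched as a minimal slice-based throttle. This is a hypothetical Python illustration of the approach (account for requests at submission time, constant 100 ms slice), not the actual QEMU block.c code; the `Throttle` class and its method names are invented for the sketch.

```python
import time

class Throttle:
    """Sketch of slice-based IOPS throttling (hypothetical, not QEMU code).

    Requests are accounted when submitted, not when they complete, so a
    slice's budget can never be exceeded and then "made up for" later --
    that deferred accounting is what caused the oscillations.
    """

    SLICE_TIME = 0.1  # constant 100 millisecond slice, never adjusted

    def __init__(self, iops_limit):
        self.iops_limit = iops_limit
        self.slice_start = time.monotonic()
        self.ios_in_slice = 0

    def wait_time(self):
        """Seconds to wait before the next request may be submitted."""
        now = time.monotonic()
        if now - self.slice_start >= self.SLICE_TIME:
            # The slice has ended: start a fresh one with a fresh budget.
            self.slice_start = now
            self.ios_in_slice = 0
        budget = self.iops_limit * self.SLICE_TIME
        if self.ios_in_slice < budget:
            # Account for the request at submission time.
            self.ios_in_slice += 1
            return 0.0
        # Budget exhausted: wait until the current slice ends.
        return self.slice_start + self.SLICE_TIME - now
```

With `iops_limit=150` and a 100 ms slice, each slice has a budget of 15 requests; a continuous submitter is released exactly 15 times per slice, giving a steady 150 IOPS instead of a drifting rate.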

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations
  2013-03-21 14:49 [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2013-03-27 12:55 ` Zhi Yong Wu
@ 2013-04-04 12:15 ` Benoît Canet
  2013-04-04 16:24   ` Kevin Wolf
  6 siblings, 1 reply; 14+ messages in thread
From: Benoît Canet @ 2013-04-04 12:15 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel, Zhi Yong Wu


Hi Stefan,

This solves the bug seen by the customer.
Whole patchset tested on behalf of the user.

Tested-By: Benoit Canet <benoit@irqsave.net>

Best regards

Benoît

On Thu, Mar 21, 2013 at 15:49:55 (+0100), Stefan Hajnoczi wrote:
> Benoît Canet <benoit@irqsave.net> reported that QEMU I/O throttling can
> oscillate under continuous I/O.  The test case runs 50 threads performing
> random writes and a -drive iops=150 limit is used.
> 
> Since QEMU I/O throttling is implemented using 100 millisecond time slices,
> we'd expect 150 +/- 15 IOPS.  Anything outside that range indicates a problem
> with the I/O throttling algorithm.
> 
> It turned out that even a single thread performing sequential I/O continuously
> is throttled outside the 150 +/- 15 IOPS range.  The continuous stream of I/O
> slows down as time goes on but resets to 150 IOPS again when interrupted.  This
> can be tested with:
> 
>   $ iostat -d 1 -x /dev/vdb &
>   $ dd if=/dev/vdb of=/dev/null bs=4096 iflag=direct
> 
> This patch series addresses these problems as follows:
> 
> 1. Account for I/O requests when they are submitted instead of completed.  This
>    ensures that we do not exceed the budget for this slice.  Exceeding the
>    budget leads to fluctuations since we have to make up for this later.
> 
> 2. Use constant 100 millisecond slice time.  Adjusting the slice time at
>    run-time led to oscillations.  Since the reason for adjusting slice time is
>    not clear, drop this behavior.
> 
> I have also included two code clean-up patches.
> 
> Benoît: Please let me know if this solves the problems you're seeing.
> 
> Stefan Hajnoczi (4):
>   block: fix I/O throttling accounting blind spot
>   block: keep I/O throttling slice time constant
>   block: drop duplicated slice extension code
>   block: clean up I/O throttling wait_time code
> 
>  block.c                   | 47 ++++++++++++++++++++---------------------------
>  blockdev.c                |  1 -
>  include/block/block_int.h |  3 +--
>  3 files changed, 21 insertions(+), 30 deletions(-)
> 
> -- 
> 1.8.1.4
> 


* Re: [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations
  2013-04-04 12:15 ` Benoît Canet
@ 2013-04-04 16:24   ` Kevin Wolf
  2013-04-04 16:30     ` Stefan Hajnoczi
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2013-04-04 16:24 UTC (permalink / raw)
  To: Benoît Canet; +Cc: qemu-devel, Stefan Hajnoczi, Zhi Yong Wu

On 04.04.2013 at 14:15, Benoît Canet wrote:
> 
> Hi Stefan,
> 
> This solves the bug seen by the customer.
> Whole patchset tested on behalf of the user.
> 
> Tested-By: Benoit Canet <benoit@irqsave.net>

Stefan, are you going to send a different version or should I just apply
this one despite being an RFC?

Kevin


* Re: [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations
  2013-04-04 16:24   ` Kevin Wolf
@ 2013-04-04 16:30     ` Stefan Hajnoczi
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2013-04-04 16:30 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Benoît Canet, qemu-devel, Zhi Yong Wu

On Thu, Apr 04, 2013 at 06:24:23PM +0200, Kevin Wolf wrote:
> On 04.04.2013 at 14:15, Benoît Canet wrote:
> > 
> > Hi Stefan,
> > 
> > This solves the bug seen by the customer.
> > Whole patchset tested on behalf of the user.
> > 
> > Tested-By: Benoit Canet <benoit@irqsave.net>
> 
> Stefan, are you going to send a different version or should I just apply
> this one despite being an RFC?

If it still applies cleanly then please take this series.

Otherwise let me know and I'll send out a rebased series.

Thanks,
Stefan


Thread overview: 14+ messages
2013-03-21 14:49 [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations Stefan Hajnoczi
2013-03-21 14:49 ` [Qemu-devel] [RFC 1/4] block: fix I/O throttling accounting blind spot Stefan Hajnoczi
2013-03-27  8:50   ` Zhi Yong Wu
2013-03-27  9:14     ` Stefan Hajnoczi
2013-03-27 12:49       ` Zhi Yong Wu
2013-03-21 14:49 ` [Qemu-devel] [RFC 2/4] block: keep I/O throttling slice time constant Stefan Hajnoczi
2013-03-21 14:49 ` [Qemu-devel] [RFC 3/4] block: drop duplicated slice extension code Stefan Hajnoczi
2013-03-21 14:49 ` [Qemu-devel] [RFC 4/4] block: clean up I/O throttling wait_time code Stefan Hajnoczi
2013-03-21 15:27 ` [Qemu-devel] [RFC 0/4] block: fix I/O throttling oscillations Benoît Canet
2013-03-21 16:34   ` Stefan Hajnoczi
2013-03-27 12:55 ` Zhi Yong Wu
2013-04-04 12:15 ` Benoît Canet
2013-04-04 16:24   ` Kevin Wolf
2013-04-04 16:30     ` Stefan Hajnoczi
