All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tests/migration: Add some slack to auto converge
@ 2020-02-10 19:57 Dr. David Alan Gilbert (git)
  2020-02-10 23:05 ` Peter Xu
  2020-02-11  8:34 ` Juan Quintela
  0 siblings, 2 replies; 3+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2020-02-10 19:57 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, thuth, lvivier; +Cc: philmd

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

There's an assert in autoconverge that checks that we quit the
iteration when we go below the expected threshold.  Philippe
saw a case where this assert fired with the measured value
slightly over the threshold. (about 3k out of a few million).

I can think of two reasons:
  a) Rounding errors
  b) That after we make the decision to quit iteration we do one
    more sync and that sees a few more dirty pages.

So add 1% slack to the assertion, that should cover a and
most cases of b, probably all we'll see for the test.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tests/qtest/migration-test.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index cf27ebbc9d..a78ac0c7da 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1237,7 +1237,8 @@ static void test_migrate_auto_converge(void)
     g_assert_cmpint(percentage, <=, max_pct);
 
     remaining = read_ram_property_int(from, "remaining");
-    g_assert_cmpint(remaining, <, expected_threshold);
+    g_assert_cmpint(remaining, <,
+                    (expected_threshold + expected_threshold / 100));
 
     migrate_continue(from, "pre-switchover");
 
-- 
2.24.1



^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] tests/migration: Add some slack to auto converge
  2020-02-10 19:57 [PATCH] tests/migration: Add some slack to auto converge Dr. David Alan Gilbert (git)
@ 2020-02-10 23:05 ` Peter Xu
  2020-02-11  8:34 ` Juan Quintela
  1 sibling, 0 replies; 3+ messages in thread
From: Peter Xu @ 2020-02-10 23:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: lvivier, thuth, philmd, qemu-devel, quintela

On Mon, Feb 10, 2020 at 07:57:31PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> There's an assert in autoconverge that checks that we quit the
> iteration when we go below the expected threshold.  Philippe
> saw a case where this assert fired with the measured value
> slightly over the threshold. (about 3k out of a few million).
> 
> I can think of two reasons:
>   a) Rounding errors

Yes, I feel like we're having approximate values here and there in
migration code. One simple example is that we calculate bandwidth
using sent bytes and timestamps, however our sent bytes should in most
cases increase with target page size only (4096 on x86), and then to
divide an ~100ms value which changes too from time to time, then
everything comes out of these inaccurate numbers could be inaccurate..

>   b) That after we make the decision to quit iteration we do one
>     more sync and that sees a few more dirty pages.

True..  I also didn't see how to guarantee the extra 1% will cover
this case either...

> 
> So add 1% slack to the assertion, that should cover a and
> most cases of b, probably all we'll see for the test.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

... But if what we see are always 3K or around then maybe we only need
to fixup that special value.  Assuming this is for sure going to
unbreak tests:

Reviewed-by: Peter Xu <peterx@redhat.com>

(PS: I would probably remove this check as a whole :)

> ---
>  tests/qtest/migration-test.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index cf27ebbc9d..a78ac0c7da 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -1237,7 +1237,8 @@ static void test_migrate_auto_converge(void)
>      g_assert_cmpint(percentage, <=, max_pct);
>  
>      remaining = read_ram_property_int(from, "remaining");
> -    g_assert_cmpint(remaining, <, expected_threshold);
> +    g_assert_cmpint(remaining, <,
> +                    (expected_threshold + expected_threshold / 100));
>  
>      migrate_continue(from, "pre-switchover");
>  
> -- 
> 2.24.1
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] tests/migration: Add some slack to auto converge
  2020-02-10 19:57 [PATCH] tests/migration: Add some slack to auto converge Dr. David Alan Gilbert (git)
  2020-02-10 23:05 ` Peter Xu
@ 2020-02-11  8:34 ` Juan Quintela
  1 sibling, 0 replies; 3+ messages in thread
From: Juan Quintela @ 2020-02-11  8:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: lvivier, thuth, philmd, qemu-devel, peterx

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> There's an assert in autoconverge that checks that we quit the
> iteration when we go below the expected threshold.  Philippe
> saw a case where this assert fired with the measured value
> slightly over the threshold. (about 3k out of a few million).
>
> I can think of two reasons:
>   a) Rounding errors
>   b) That after we make the decision to quit iteration we do one
>     more sync and that sees a few more dirty pages.
>
> So add 1% slack to the assertion, that should cover a and
> most cases of b, probably all we'll see for the test.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

It shouldn't matter really.  And if we are seeing that problem.



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-02-11  8:36 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-10 19:57 [PATCH] tests/migration: Add some slack to auto converge Dr. David Alan Gilbert (git)
2020-02-10 23:05 ` Peter Xu
2020-02-11  8:34 ` Juan Quintela

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.