* [PATCH] migration: Rate limit inside host pages
@ 2019-12-05 10:29 Dr. David Alan Gilbert (git)
  2019-12-05 13:54 ` Juan Quintela
  2019-12-05 13:55 ` Peter Xu
  0 siblings, 2 replies; 4+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-12-05 10:29 UTC (permalink / raw)
  To: qemu-devel, LMa, quintela, peterx

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When using hugepages, rate limiting is necessary within each huge
page, since a 1G huge page can take a significant time to send;
without it you end up with bursty behaviour.

Fixes: 4c011c37ecb3 ("postcopy: Send whole huge pages")
Reported-by: Lin Ma <LMa@suse.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
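A back-of-the-envelope sketch of the burstiness this fixes (standalone C,
for illustration only; the 10Gbps link and 1Gbps migration cap are assumed
numbers, not measurements):

    /*
     * If the rate limit is only checked between host pages, a whole 1G
     * huge page leaves at link speed and the sender then stalls to
     * repay the bandwidth debt.
     */
    #include <stdio.h>

    int main(void)
    {
        const double page_bytes = 1024.0 * 1024 * 1024; /* one 1G huge page */
        const double link_Bps   = 10e9 / 8;             /* 10Gbps wire      */
        const double limit_Bps  = 1e9 / 8;              /* 1Gbps cap        */

        printf("burst on the wire: %.1f s\n", page_bytes / link_Bps);
        printf("stall afterwards : %.1f s\n",
               page_bytes / limit_Bps - page_bytes / link_Bps);
        return 0;
    }

With the check inside ram_save_host_page() below, that wait is instead
spread across BUFFER_DELAY-sized windows, so the stream stays smooth.
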
 migration/migration.c  | 57 ++++++++++++++++++++++++------------------
 migration/migration.h  |  1 +
 migration/ram.c        |  2 ++
 migration/trace-events |  4 +--
 4 files changed, 37 insertions(+), 27 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 354ad072fa..27500d09a9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3224,6 +3224,37 @@ void migration_consume_urgent_request(void)
     qemu_sem_wait(&migrate_get_current()->rate_limit_sem);
 }
 
+/* Returns true if the rate limiting was broken by an urgent request */
+bool migration_rate_limit(void)
+{
+    int64_t now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    MigrationState *s = migrate_get_current();
+
+    bool urgent = false;
+    migration_update_counters(s, now);
+    if (qemu_file_rate_limit(s->to_dst_file)) {
+        /*
+         * Wait for a delay to do rate limiting OR
+         * something urgent to post the semaphore.
+         */
+        int ms = s->iteration_start_time + BUFFER_DELAY - now;
+        trace_migration_rate_limit_pre(ms);
+        if (qemu_sem_timedwait(&s->rate_limit_sem, ms) == 0) {
+            /*
+             * We were woken by one or more urgent things but
+             * the timedwait will have consumed one of them.
+             * The service routine for the urgent wake will dec
+             * the semaphore itself for each item it consumes,
+             * so put back the one we just consumed.
+             */
+            qemu_sem_post(&s->rate_limit_sem);
+            urgent = true;
+        }
+        trace_migration_rate_limit_post(urgent);
+    }
+    return urgent;
+}
+
 /*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
@@ -3290,8 +3321,6 @@ static void *migration_thread(void *opaque)
     trace_migration_thread_setup_complete();
 
     while (migration_is_active(s)) {
-        int64_t current_time;
-
         if (urgent || !qemu_file_rate_limit(s->to_dst_file)) {
             MigIterateState iter_state = migration_iteration_run(s);
             if (iter_state == MIG_ITERATE_SKIP) {
@@ -3318,29 +3347,7 @@ static void *migration_thread(void *opaque)
             update_iteration_initial_status(s);
         }
 
-        current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-
-        migration_update_counters(s, current_time);
-
-        urgent = false;
-        if (qemu_file_rate_limit(s->to_dst_file)) {
-            /* Wait for a delay to do rate limiting OR
-             * something urgent to post the semaphore.
-             */
-            int ms = s->iteration_start_time + BUFFER_DELAY - current_time;
-            trace_migration_thread_ratelimit_pre(ms);
-            if (qemu_sem_timedwait(&s->rate_limit_sem, ms) == 0) {
-                /* We were worken by one or more urgent things but
-                 * the timedwait will have consumed one of them.
-                 * The service routine for the urgent wake will dec
-                 * the semaphore itself for each item it consumes,
-                 * so add this one we just eat back.
-                 */
-                qemu_sem_post(&s->rate_limit_sem);
-                urgent = true;
-            }
-            trace_migration_thread_ratelimit_post(urgent);
-        }
+        urgent = migration_rate_limit();
     }
 
     trace_migration_thread_after_loop();
diff --git a/migration/migration.h b/migration/migration.h
index 79b3dda146..aa9ff6f27b 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -341,5 +341,6 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
 
 void migration_make_urgent_request(void);
 void migration_consume_urgent_request(void);
+bool migration_rate_limit(void);
 
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index a4ae3b3120..a9177c6a24 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2616,6 +2616,8 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
 
         pages += tmppages;
         pss->page++;
+        /* Allow rate limiting to happen in the middle of huge pages */
+        migration_rate_limit();
     } while ((pss->page & (pagesize_bits - 1)) &&
              offset_in_ramblock(pss->block, pss->page << TARGET_PAGE_BITS));
 
diff --git a/migration/trace-events b/migration/trace-events
index 6dee7b5389..2f9129e213 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -138,12 +138,12 @@ migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi6
 migration_completion_file_err(void) ""
 migration_completion_postcopy_end(void) ""
 migration_completion_postcopy_end_after_complete(void) ""
+migration_rate_limit_pre(int ms) "%d ms"
+migration_rate_limit_post(int urgent) "urgent: %d"
 migration_return_path_end_before(void) ""
 migration_return_path_end_after(int rp_error) "%d"
 migration_thread_after_loop(void) ""
 migration_thread_file_err(void) ""
-migration_thread_ratelimit_pre(int ms) "%d ms"
-migration_thread_ratelimit_post(int urgent) "urgent: %d"
 migration_thread_setup_complete(void) ""
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
-- 
2.23.0




* Re: [PATCH] migration: Rate limit inside host pages
  2019-12-05 10:29 [PATCH] migration: Rate limit inside host pages Dr. David Alan Gilbert (git)
@ 2019-12-05 13:54 ` Juan Quintela
  2019-12-05 14:30   ` Dr. David Alan Gilbert
  2019-12-05 13:55 ` Peter Xu
  1 sibling, 1 reply; 4+ messages in thread
From: Juan Quintela @ 2019-12-05 13:54 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, peterx, LMa

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> When using hugepages, rate limiting is necessary within each huge
> page, since a 1G huge page can take a significant time to send;
> without it you end up with bursty behaviour.
>
> Fixes: 4c011c37ecb3 ("postcopy: Send whole huge pages")
> Reported-by: Lin Ma <LMa@suse.com>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---

Reviewed-by: Juan Quintela <quintela@redhat.com>

I agree that rate limiting needs to be done for huge pages.

> diff --git a/migration/ram.c b/migration/ram.c
> index a4ae3b3120..a9177c6a24 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2616,6 +2616,8 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
>  
>          pages += tmppages;
>          pss->page++;
> +        /* Allow rate limiting to happen in the middle of huge pages */
> +        migration_rate_limit();
>      } while ((pss->page & (pagesize_bits - 1)) &&
>               offset_in_ramblock(pss->block, pss->page << TARGET_PAGE_BITS));
>  

But this is doing the rate limiting for each page, no?  Even when not
using huge pages.

Not that it should be a big issue (performance-wise).
Have you done any measurement?


Later, Juan.




* Re: [PATCH] migration: Rate limit inside host pages
  2019-12-05 10:29 [PATCH] migration: Rate limit inside host pages Dr. David Alan Gilbert (git)
  2019-12-05 13:54 ` Juan Quintela
@ 2019-12-05 13:55 ` Peter Xu
  1 sibling, 0 replies; 4+ messages in thread
From: Peter Xu @ 2019-12-05 13:55 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: quintela, qemu-devel, LMa

On Thu, Dec 05, 2019 at 10:29:18AM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> When using hugepages, rate limiting is necessary within each huge
> page, since a 1G huge page can take a significant time to send;
> without it you end up with bursty behaviour.
> 
> Fixes: 4c011c37ecb3 ("postcopy: Send whole huge pages")
> Reported-by: Lin Ma <LMa@suse.com>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  migration/migration.c  | 57 ++++++++++++++++++++++++------------------
>  migration/migration.h  |  1 +
>  migration/ram.c        |  2 ++
>  migration/trace-events |  4 +--
>  4 files changed, 37 insertions(+), 27 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 354ad072fa..27500d09a9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3224,6 +3224,37 @@ void migration_consume_urgent_request(void)
>      qemu_sem_wait(&migrate_get_current()->rate_limit_sem);
>  }
>  
> +/* Returns true if the rate limiting was broken by an urgent request */
> +bool migration_rate_limit(void)
> +{
> +    int64_t now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    MigrationState *s = migrate_get_current();
> +
> +    bool urgent = false;
> +    migration_update_counters(s, now);
> +    if (qemu_file_rate_limit(s->to_dst_file)) {
> +        /*
> +         * Wait for a delay to do rate limiting OR
> +         * something urgent to post the semaphore.
> +         */
> +        int ms = s->iteration_start_time + BUFFER_DELAY - now;
> +        trace_migration_rate_limit_pre(ms);
> +        if (qemu_sem_timedwait(&s->rate_limit_sem, ms) == 0) {
> +            /*
> +             * We were woken by one or more urgent things but
> +             * the timedwait will have consumed one of them.
> +             * The service routine for the urgent wake will dec
> +             * the semaphore itself for each item it consumes,
> +             * so put back the one we just consumed.
> +             */
> +            qemu_sem_post(&s->rate_limit_sem);

I remember commenting on this when it was first introduced, asking
whether we can avoid this post().  IMHO we can with something like an
eventfd: when we queue the page we write the eventfd to 1, and here we
poll() on the eventfd with the same timeout, then clear it after the
poll no matter what.  On unqueue, we can probably simply do nothing.
I'm not sure about Windows or other OSes, though..
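
A minimal sketch of that idea (untested and Linux-only; the names here
are illustrative, not QEMU APIs):

    #include <poll.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    static int urgent_fd;  /* set up once: urgent_fd = eventfd(0, EFD_NONBLOCK) */

    /* Queue side: note that an urgent page is pending. */
    static void urgent_notify(void)
    {
        uint64_t one = 1;
        (void)write(urgent_fd, &one, sizeof(one));
    }

    /* Rate-limit side: sleep up to @ms, waking early on urgency. */
    static bool urgent_wait(int ms)
    {
        struct pollfd pfd = { .fd = urgent_fd, .events = POLLIN };
        int r = poll(&pfd, 1, ms);
        uint64_t val;

        /* Clear it after the poll no matter what; EFD_NONBLOCK makes
         * the read harmless when nothing was posted. */
        (void)read(urgent_fd, &val, sizeof(val));
        return r > 0;  /* woken by an urgent request, not the timeout */
    }

The read resets the eventfd counter in one go, which is what would remove
the consume-and-repost dance on the semaphore.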

Anyway, this patch isn't changing that part, just fixing the huge
page issue, so that's another story for sure.

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks,

-- 
Peter Xu




* Re: [PATCH] migration: Rate limit inside host pages
  2019-12-05 13:54 ` Juan Quintela
@ 2019-12-05 14:30   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 4+ messages in thread
From: Dr. David Alan Gilbert @ 2019-12-05 14:30 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel, peterx, LMa

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > When using hugepages, rate limiting is necessary within each huge
> > page, since a 1G huge page can take a significant time to send;
> > without it you end up with bursty behaviour.
> >
> > Fixes: 4c011c37ecb3 ("postcopy: Send whole huge pages")
> > Reported-by: Lin Ma <LMa@suse.com>
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> I agree that rate limiting needs to be done for huge pages.
> 
> > diff --git a/migration/ram.c b/migration/ram.c
> > index a4ae3b3120..a9177c6a24 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2616,6 +2616,8 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
> >  
> >          pages += tmppages;
> >          pss->page++;
> > +        /* Allow rate limiting to happen in the middle of huge pages */
> > +        migration_rate_limit();
> >      } while ((pss->page & (pagesize_bits - 1)) &&
> >               offset_in_ramblock(pss->block, pss->page << TARGET_PAGE_BITS));
> >  
> 
> But this is doing the rate limiting for each page, no?  Even when not
> using huge pages.

Right.

> Not that it should be a big issue (performance-wise).
> Have you done any measurement?

I've just given it a quick run; it still seems to be hitting ~9.5Gbps on
my 10Gbps interface, so the extra check doesn't seem to be the limiting
factor there.

Dave

> 
> 
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


