All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/2] Postcopy fix and traces
@ 2017-04-26 18:37 Dr. David Alan Gilbert (git)
  2017-04-26 18:37 ` [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages Dr. David Alan Gilbert (git)
  2017-04-26 18:37 ` [Qemu-devel] [PATCH 2/2] migration: Extra tracing Dr. David Alan Gilbert (git)
  0 siblings, 2 replies; 18+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-04-26 18:37 UTC (permalink / raw)
  To: qemu-devel, borntraeger, quintela, lvivier, peterx

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This is a small postcopy fix for a bug that
Christian ran into, and a couple of extra traces that
would have made finding it easier.

Dave

Dr. David Alan Gilbert (2):
  Postcopy: Force allocation of all-zero precopy pages
  migration: Extra tracing

 include/migration/migration.h |  3 ++-
 migration/ram.c               | 14 ++++++++++----
 migration/rdma.c              |  2 +-
 migration/trace-events        |  2 ++
 4 files changed, 15 insertions(+), 6 deletions(-)

-- 
2.12.2

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages
  2017-04-26 18:37 [Qemu-devel] [PATCH 0/2] Postcopy fix and traces Dr. David Alan Gilbert (git)
@ 2017-04-26 18:37 ` Dr. David Alan Gilbert (git)
  2017-04-26 19:00   ` Christian Borntraeger
  2017-04-26 19:35   ` Juan Quintela
  2017-04-26 18:37 ` [Qemu-devel] [PATCH 2/2] migration: Extra tracing Dr. David Alan Gilbert (git)
  1 sibling, 2 replies; 18+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-04-26 18:37 UTC (permalink / raw)
  To: qemu-devel, borntraeger, quintela, lvivier, peterx

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When an all-zero page is received during the precopy
phase of a postcopy-enabled migration we must force
allocation otherwise accesses to the page will still
get blocked by userfault.

Symptom:
  a) If the page is accessed by a device during device-load
    then we get a deadlock as the source finishes sending
    all its pages but the destination device-load is still
    paused and so doesn't clean up.

  b) If the page is accessed later, then the thread will stay
    paused until the end of migration rather than carrying on
    running, until we release userfault at the end.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 include/migration/migration.h |  3 ++-
 migration/ram.c               | 12 ++++++++----
 migration/rdma.c              |  2 +-
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index ba1a16cbc1..b47904033c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -265,7 +265,8 @@ uint64_t xbzrle_mig_pages_overflow(void);
 uint64_t xbzrle_mig_pages_cache_miss(void);
 double xbzrle_mig_cache_miss_rate(void);
 
-void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+void ram_handle_compressed(void *host, uint8_t ch, uint64_t size,
+                           bool always_write);
 void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
 /* For outgoing discard bitmap */
 int ram_postcopy_send_discard_bitmap(MigrationState *ms);
diff --git a/migration/ram.c b/migration/ram.c
index f48664ec62..b4ed41c725 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2274,10 +2274,12 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
  * @host: host address for the zero page
  * @ch: what the page is filled from.  We only support zero
  * @size: size of the zero page
+ * @always_write: Always perform the memset even if it's zero
  */
-void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
+void ram_handle_compressed(void *host, uint8_t ch, uint64_t size,
+                           bool always_write)
 {
-    if (ch != 0 || !is_zero_range(host, size)) {
+    if (ch != 0 || always_write || !is_zero_range(host, size)) {
         memset(host, ch, size);
     }
 }
@@ -2514,7 +2516,8 @@ static int ram_load_postcopy(QEMUFile *f)
         switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
         case RAM_SAVE_FLAG_COMPRESS:
             ch = qemu_get_byte(f);
-            memset(page_buffer, ch, TARGET_PAGE_SIZE);
+            ram_handle_compressed(page_buffer, ch, TARGET_PAGE_SIZE,
+                                  true);
             if (ch) {
                 all_zero = false;
             }
@@ -2664,7 +2667,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
         case RAM_SAVE_FLAG_COMPRESS:
             ch = qemu_get_byte(f);
-            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE,
+                                  postcopy_advised);
             break;
 
         case RAM_SAVE_FLAG_PAGE:
diff --git a/migration/rdma.c b/migration/rdma.c
index fe0a4b5a83..07a9bd75d8 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3164,7 +3164,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
             host_addr = block->local_host_addr +
                             (comp->offset - block->offset);
 
-            ram_handle_compressed(host_addr, comp->value, comp->length);
+            ram_handle_compressed(host_addr, comp->value, comp->length, false);
             break;
 
         case RDMA_CONTROL_REGISTER_FINISHED:
-- 
2.12.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 2/2] migration: Extra tracing
  2017-04-26 18:37 [Qemu-devel] [PATCH 0/2] Postcopy fix and traces Dr. David Alan Gilbert (git)
  2017-04-26 18:37 ` [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages Dr. David Alan Gilbert (git)
@ 2017-04-26 18:37 ` Dr. David Alan Gilbert (git)
  2017-04-26 18:48   ` Christian Borntraeger
                     ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-04-26 18:37 UTC (permalink / raw)
  To: qemu-devel, borntraeger, quintela, lvivier, peterx

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

A couple more traces that would have made fixing that postcopy
bug a bit easier.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/ram.c        | 2 ++
 migration/trace-events | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index b4ed41c725..3ac41ccaba 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -812,6 +812,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
     ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
 
     p = block->host + offset;
+    trace_ram_save_page(block->idstr, (uint64_t)offset, p);
 
     /* In doubt sent page as normal */
     bytes_xmit = 0;
@@ -2614,6 +2615,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                 ret = -EINVAL;
                 break;
             }
+            trace_ram_load_loop(block->idstr, (uint64_t)addr, flags, host);
         }
 
         switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
diff --git a/migration/trace-events b/migration/trace-events
index b8f01a218c..5b8ccf301c 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -69,8 +69,10 @@ migration_bitmap_sync_start(void) ""
 migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
 migration_throttle(void) ""
 ram_discard_range(const char *rbname, uint64_t start, size_t len) "%s: start: %" PRIx64 " %zx"
+ram_load_loop(const char *rbname, uint64_t addr, int flags, void *host) "%s: addr: %" PRIx64 " flags: %x host: %p"
 ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
 ram_postcopy_send_discard_bitmap(void) ""
+ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: %" PRIx64 " host: %p"
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
 
 # migration/migration.c
-- 
2.12.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] migration: Extra tracing
  2017-04-26 18:37 ` [Qemu-devel] [PATCH 2/2] migration: Extra tracing Dr. David Alan Gilbert (git)
@ 2017-04-26 18:48   ` Christian Borntraeger
  2017-05-02 13:17   ` Philippe Mathieu-Daudé
  2017-05-04  8:40   ` Juan Quintela
  2 siblings, 0 replies; 18+ messages in thread
From: Christian Borntraeger @ 2017-04-26 18:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, quintela, lvivier, peterx

On 04/26/2017 08:37 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> A couple more traces that would have made fixing that postcopy
> bug a bit easier.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>

> ---
>  migration/ram.c        | 2 ++
>  migration/trace-events | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index b4ed41c725..3ac41ccaba 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -812,6 +812,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
>      ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
> 
>      p = block->host + offset;
> +    trace_ram_save_page(block->idstr, (uint64_t)offset, p);
> 
>      /* In doubt sent page as normal */
>      bytes_xmit = 0;
> @@ -2614,6 +2615,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                  ret = -EINVAL;
>                  break;
>              }
> +            trace_ram_load_loop(block->idstr, (uint64_t)addr, flags, host);
>          }
> 
>          switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
> diff --git a/migration/trace-events b/migration/trace-events
> index b8f01a218c..5b8ccf301c 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -69,8 +69,10 @@ migration_bitmap_sync_start(void) ""
>  migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
>  migration_throttle(void) ""
>  ram_discard_range(const char *rbname, uint64_t start, size_t len) "%s: start: %" PRIx64 " %zx"
> +ram_load_loop(const char *rbname, uint64_t addr, int flags, void *host) "%s: addr: %" PRIx64 " flags: %x host: %p"
>  ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
>  ram_postcopy_send_discard_bitmap(void) ""
> +ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: %" PRIx64 " host: %p"
>  ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
> 
>  # migration/migration.c
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages
  2017-04-26 18:37 ` [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages Dr. David Alan Gilbert (git)
@ 2017-04-26 19:00   ` Christian Borntraeger
  2017-04-26 19:04     ` Dr. David Alan Gilbert
  2017-04-26 19:35   ` Juan Quintela
  1 sibling, 1 reply; 18+ messages in thread
From: Christian Borntraeger @ 2017-04-26 19:00 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, quintela, lvivier, peterx

On 04/26/2017 08:37 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> When an all-zero page is received during the precopy
> phase of a postcopy-enabled migration we must force
> allocation otherwise accesses to the page will still
> get blocked by userfault.
> 
> Symptom:
>   a) If the page is accessed by a device during device-load
>     then we get a deadlock as the source finishes sending
>     all its pages but the destination device-load is still
>     paused and so doesn't clean up.
> 
>   b) If the page is accessed later, then the thread will stay
>     paused until the end of migration rather than carrying on
>     running, until we release userfault at the end.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>

CC stable? after all the guest hangs on both sides

Has survived 40 migrations (usually failed at the 2nd)
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

> ---
>  include/migration/migration.h |  3 ++-
>  migration/ram.c               | 12 ++++++++----
>  migration/rdma.c              |  2 +-
>  3 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index ba1a16cbc1..b47904033c 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -265,7 +265,8 @@ uint64_t xbzrle_mig_pages_overflow(void);
>  uint64_t xbzrle_mig_pages_cache_miss(void);
>  double xbzrle_mig_cache_miss_rate(void);
> 
> -void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
> +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size,
> +                           bool always_write);
>  void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
>  /* For outgoing discard bitmap */
>  int ram_postcopy_send_discard_bitmap(MigrationState *ms);
> diff --git a/migration/ram.c b/migration/ram.c
> index f48664ec62..b4ed41c725 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2274,10 +2274,12 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
>   * @host: host address for the zero page
>   * @ch: what the page is filled from.  We only support zero
>   * @size: size of the zero page
> + * @always_write: Always perform the memset even if it's zero
>   */
> -void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
> +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size,
> +                           bool always_write)
>  {
> -    if (ch != 0 || !is_zero_range(host, size)) {
> +    if (ch != 0 || always_write || !is_zero_range(host, size)) {
>          memset(host, ch, size);
>      }
>  }
> @@ -2514,7 +2516,8 @@ static int ram_load_postcopy(QEMUFile *f)
>          switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
>          case RAM_SAVE_FLAG_COMPRESS:
>              ch = qemu_get_byte(f);
> -            memset(page_buffer, ch, TARGET_PAGE_SIZE);
> +            ram_handle_compressed(page_buffer, ch, TARGET_PAGE_SIZE,
> +                                  true);
>              if (ch) {
>                  all_zero = false;
>              }
> @@ -2664,7 +2667,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> 
>          case RAM_SAVE_FLAG_COMPRESS:
>              ch = qemu_get_byte(f);
> -            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> +            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE,
> +                                  postcopy_advised);
>              break;
> 
>          case RAM_SAVE_FLAG_PAGE:
> diff --git a/migration/rdma.c b/migration/rdma.c
> index fe0a4b5a83..07a9bd75d8 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -3164,7 +3164,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>              host_addr = block->local_host_addr +
>                              (comp->offset - block->offset);
> 
> -            ram_handle_compressed(host_addr, comp->value, comp->length);
> +            ram_handle_compressed(host_addr, comp->value, comp->length, false);
>              break;
> 
>          case RDMA_CONTROL_REGISTER_FINISHED:
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages
  2017-04-26 19:00   ` Christian Borntraeger
@ 2017-04-26 19:04     ` Dr. David Alan Gilbert
  2017-04-26 19:37       ` Andrea Arcangeli
  2017-04-26 19:52       ` Christian Borntraeger
  0 siblings, 2 replies; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-26 19:04 UTC (permalink / raw)
  To: Christian Borntraeger, aarcange; +Cc: qemu-devel, quintela, lvivier, peterx

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> On 04/26/2017 08:37 PM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > When an all-zero page is received during the precopy
> > phase of a postcopy-enabled migration we must force
> > allocation otherwise accesses to the page will still
> > get blocked by userfault.
> > 
> > Symptom:
> >   a) If the page is accessed by a device during device-load
> >     then we get a deadlock as the source finishes sending
> >     all its pages but the destination device-load is still
> >     paused and so doesn't clean up.
> > 
> >   b) If the page is accessed later, then the thread will stay
> >     paused until the end of migration rather than carrying on
> >     running, until we release userfault at the end.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
> 
> CC stable? after all the guest hangs on both sides
> 
> Has survived 40 migrations (usually failed at the 2nd)
> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

Great...but.....
Andrea (added to the mail) says this shouldn't be necessary.
The read we were doing in the is_zero_range() should have been sufficient
to get the page mapped and that zero page should have survived.

So - I guess that's back a step, we need to figure out why the
page disapepars for you.

Dave

> 
> > ---
> >  include/migration/migration.h |  3 ++-
> >  migration/ram.c               | 12 ++++++++----
> >  migration/rdma.c              |  2 +-
> >  3 files changed, 11 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index ba1a16cbc1..b47904033c 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -265,7 +265,8 @@ uint64_t xbzrle_mig_pages_overflow(void);
> >  uint64_t xbzrle_mig_pages_cache_miss(void);
> >  double xbzrle_mig_cache_miss_rate(void);
> > 
> > -void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
> > +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size,
> > +                           bool always_write);
> >  void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
> >  /* For outgoing discard bitmap */
> >  int ram_postcopy_send_discard_bitmap(MigrationState *ms);
> > diff --git a/migration/ram.c b/migration/ram.c
> > index f48664ec62..b4ed41c725 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2274,10 +2274,12 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
> >   * @host: host address for the zero page
> >   * @ch: what the page is filled from.  We only support zero
> >   * @size: size of the zero page
> > + * @always_write: Always perform the memset even if it's zero
> >   */
> > -void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
> > +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size,
> > +                           bool always_write)
> >  {
> > -    if (ch != 0 || !is_zero_range(host, size)) {
> > +    if (ch != 0 || always_write || !is_zero_range(host, size)) {
> >          memset(host, ch, size);
> >      }
> >  }
> > @@ -2514,7 +2516,8 @@ static int ram_load_postcopy(QEMUFile *f)
> >          switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
> >          case RAM_SAVE_FLAG_COMPRESS:
> >              ch = qemu_get_byte(f);
> > -            memset(page_buffer, ch, TARGET_PAGE_SIZE);
> > +            ram_handle_compressed(page_buffer, ch, TARGET_PAGE_SIZE,
> > +                                  true);
> >              if (ch) {
> >                  all_zero = false;
> >              }
> > @@ -2664,7 +2667,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> > 
> >          case RAM_SAVE_FLAG_COMPRESS:
> >              ch = qemu_get_byte(f);
> > -            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> > +            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE,
> > +                                  postcopy_advised);
> >              break;
> > 
> >          case RAM_SAVE_FLAG_PAGE:
> > diff --git a/migration/rdma.c b/migration/rdma.c
> > index fe0a4b5a83..07a9bd75d8 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -3164,7 +3164,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
> >              host_addr = block->local_host_addr +
> >                              (comp->offset - block->offset);
> > 
> > -            ram_handle_compressed(host_addr, comp->value, comp->length);
> > +            ram_handle_compressed(host_addr, comp->value, comp->length, false);
> >              break;
> > 
> >          case RDMA_CONTROL_REGISTER_FINISHED:
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages
  2017-04-26 18:37 ` [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages Dr. David Alan Gilbert (git)
  2017-04-26 19:00   ` Christian Borntraeger
@ 2017-04-26 19:35   ` Juan Quintela
  2017-04-27  8:03     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 18+ messages in thread
From: Juan Quintela @ 2017-04-26 19:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, borntraeger, lvivier, peterx

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> When an all-zero page is received during the precopy
> phase of a postcopy-enabled migration we must force
> allocation otherwise accesses to the page will still
> get blocked by userfault.
>
> Symptom:
>   a) If the page is accessed by a device during device-load
>     then we get a deadlock as the source finishes sending
>     all its pages but the destination device-load is still
>     paused and so doesn't clean up.
>
>   b) If the page is accessed later, then the thread will stay
>     paused until the end of migration rather than carrying on
>     running, until we release userfault at the end.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  include/migration/migration.h |  3 ++-
>  migration/ram.c               | 12 ++++++++----
>  migration/rdma.c              |  2 +-
>  3 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index ba1a16cbc1..b47904033c 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -265,7 +265,8 @@ uint64_t xbzrle_mig_pages_overflow(void);
>  uint64_t xbzrle_mig_pages_cache_miss(void);
>  double xbzrle_mig_cache_miss_rate(void);
>  
> -void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
> +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size,
> +                           bool always_write);
>  void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
>  /* For outgoing discard bitmap */
>  int ram_postcopy_send_discard_bitmap(MigrationState *ms);
> diff --git a/migration/ram.c b/migration/ram.c
> index f48664ec62..b4ed41c725 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2274,10 +2274,12 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
>   * @host: host address for the zero page
>   * @ch: what the page is filled from.  We only support zero
>   * @size: size of the zero page
> + * @always_write: Always perform the memset even if it's zero
>   */
> -void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
> +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size,
> +                           bool always_write)
>  {
> -    if (ch != 0 || !is_zero_range(host, size)) {
> +    if (ch != 0 || always_write || !is_zero_range(host, size)) {
>          memset(host, ch, size);
>      }
>  }
> @@ -2514,7 +2516,8 @@ static int ram_load_postcopy(QEMUFile *f)
>          switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
>          case RAM_SAVE_FLAG_COMPRESS:
>              ch = qemu_get_byte(f);
> -            memset(page_buffer, ch, TARGET_PAGE_SIZE);
> +            ram_handle_compressed(page_buffer, ch, TARGET_PAGE_SIZE,
> +                                  true);

This is weird, if we are in postcopy, we were already doing the memset
unconditionally, so .... weird as it can be.

I agree with Andrea that if the is_zero_range() don't assure that the
page is there, it is really strange.

Can you look at utils/bufferiszero() and see what that function is doing
in power?  File is a mess of ifdefs to optimize for intel as far as I
can see.

Later, Juan.


>              if (ch) {
>                  all_zero = false;
>              }
> @@ -2664,7 +2667,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  
>          case RAM_SAVE_FLAG_COMPRESS:
>              ch = qemu_get_byte(f);
> -            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> +            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE,
> +                                  postcopy_advised);
>              break;
>  
>          case RAM_SAVE_FLAG_PAGE:
> diff --git a/migration/rdma.c b/migration/rdma.c
> index fe0a4b5a83..07a9bd75d8 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -3164,7 +3164,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>              host_addr = block->local_host_addr +
>                              (comp->offset - block->offset);
>  
> -            ram_handle_compressed(host_addr, comp->value, comp->length);
> +            ram_handle_compressed(host_addr, comp->value, comp->length, false);
>              break;
>  
>          case RDMA_CONTROL_REGISTER_FINISHED:

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages
  2017-04-26 19:04     ` Dr. David Alan Gilbert
@ 2017-04-26 19:37       ` Andrea Arcangeli
  2017-04-27  3:20         ` Peter Xu
  2017-04-26 19:52       ` Christian Borntraeger
  1 sibling, 1 reply; 18+ messages in thread
From: Andrea Arcangeli @ 2017-04-26 19:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Christian Borntraeger, qemu-devel, quintela, lvivier, peterx

Hello,

On Wed, Apr 26, 2017 at 08:04:43PM +0100, Dr. David Alan Gilbert wrote:
> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> > On 04/26/2017 08:37 PM, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > When an all-zero page is received during the precopy
> > > phase of a postcopy-enabled migration we must force
> > > allocation otherwise accesses to the page will still
> > > get blocked by userfault.
> > > 
> > > Symptom:
> > >   a) If the page is accessed by a device during device-load
> > >     then we get a deadlock as the source finishes sending
> > >     all its pages but the destination device-load is still
> > >     paused and so doesn't clean up.
> > > 
> > >   b) If the page is accessed later, then the thread will stay
> > >     paused until the end of migration rather than carrying on
> > >     running, until we release userfault at the end.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > 
> > CC stable? after all the guest hangs on both sides
> > 
> > Has survived 40 migrations (usually failed at the 2nd)
> > Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
> 
> Great...but.....
> Andrea (added to the mail) says this shouldn't be necessary.
> The read we were doing in the is_zero_range() should have been sufficient
> to get the page mapped and that zero page should have survived.
> 
> So - I guess that's back a step, we need to figure out why the
> page disapepars for you.

Yes reading during precopy is enough to fill the hole and prevent
userfault missing faults to trigger.

Somehow the pagetable must be mapped by a zeropage or a hugezeropage
or a regular page allocated during a previous precopy pass or a
pre-zeroed subpage part of a THP.

Even if the hugezeropage is splitted later by a MADV_DONTNEED with
postcopy starts, they will become 4k zeropages.

After a read succeeds, nothing (except MADV_DONTNEED or other explicit
syscalls which qemu would need to invoke explicitly between
is_zero_range and UFFDIO_REGISTER) should be able to bring the
pagetable back to its "pte_none/pmd_none" state that will then trigger
missing userfaults during postcopy later.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages
  2017-04-26 19:04     ` Dr. David Alan Gilbert
  2017-04-26 19:37       ` Andrea Arcangeli
@ 2017-04-26 19:52       ` Christian Borntraeger
  2017-04-26 19:54         ` Christian Borntraeger
  1 sibling, 1 reply; 18+ messages in thread
From: Christian Borntraeger @ 2017-04-26 19:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, aarcange; +Cc: qemu-devel, quintela, lvivier, peterx

On 04/26/2017 09:04 PM, Dr. David Alan Gilbert wrote:
> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>> On 04/26/2017 08:37 PM, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>
>>> When an all-zero page is received during the precopy
>>> phase of a postcopy-enabled migration we must force
>>> allocation otherwise accesses to the page will still
>>> get blocked by userfault.
>>>
>>> Symptom:
>>>   a) If the page is accessed by a device during device-load
>>>     then we get a deadlock as the source finishes sending
>>>     all its pages but the destination device-load is still
>>>     paused and so doesn't clean up.
>>>
>>>   b) If the page is accessed later, then the thread will stay
>>>     paused until the end of migration rather than carrying on
>>>     running, until we release userfault at the end.
>>>
>>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>
>> CC stable? after all the guest hangs on both sides
>>
>> Has survived 40 migrations (usually failed at the 2nd)
>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
> 
> Great...but.....
> Andrea (added to the mail) says this shouldn't be necessary.
> The read we were doing in the is_zero_range() should have been sufficient
> to get the page mapped and that zero page should have survived.

We do not do is_zero_range if ch==0 because of lazy evaluation, no?

> 
> So - I guess that's back a step, we need to figure out why the
> page disapepars for you.
> 
> Dave
> 
>>
>>> ---
>>>  include/migration/migration.h |  3 ++-
>>>  migration/ram.c               | 12 ++++++++----
>>>  migration/rdma.c              |  2 +-
>>>  3 files changed, 11 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>>> index ba1a16cbc1..b47904033c 100644
>>> --- a/include/migration/migration.h
>>> +++ b/include/migration/migration.h
>>> @@ -265,7 +265,8 @@ uint64_t xbzrle_mig_pages_overflow(void);
>>>  uint64_t xbzrle_mig_pages_cache_miss(void);
>>>  double xbzrle_mig_cache_miss_rate(void);
>>>
>>> -void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
>>> +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size,
>>> +                           bool always_write);
>>>  void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
>>>  /* For outgoing discard bitmap */
>>>  int ram_postcopy_send_discard_bitmap(MigrationState *ms);
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index f48664ec62..b4ed41c725 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -2274,10 +2274,12 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
>>>   * @host: host address for the zero page
>>>   * @ch: what the page is filled from.  We only support zero
>>>   * @size: size of the zero page
>>> + * @always_write: Always perform the memset even if it's zero
>>>   */
>>> -void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
>>> +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size,
>>> +                           bool always_write)
>>>  {
>>> -    if (ch != 0 || !is_zero_range(host, size)) {
>>> +    if (ch != 0 || always_write || !is_zero_range(host, size)) {
>>>          memset(host, ch, size);
>>>      }
>>>  }
>>> @@ -2514,7 +2516,8 @@ static int ram_load_postcopy(QEMUFile *f)
>>>          switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
>>>          case RAM_SAVE_FLAG_COMPRESS:
>>>              ch = qemu_get_byte(f);
>>> -            memset(page_buffer, ch, TARGET_PAGE_SIZE);
>>> +            ram_handle_compressed(page_buffer, ch, TARGET_PAGE_SIZE,
>>> +                                  true);
>>>              if (ch) {
>>>                  all_zero = false;
>>>              }
>>> @@ -2664,7 +2667,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>>
>>>          case RAM_SAVE_FLAG_COMPRESS:
>>>              ch = qemu_get_byte(f);
>>> -            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
>>> +            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE,
>>> +                                  postcopy_advised);
>>>              break;
>>>
>>>          case RAM_SAVE_FLAG_PAGE:
>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>> index fe0a4b5a83..07a9bd75d8 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -3164,7 +3164,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
>>>              host_addr = block->local_host_addr +
>>>                              (comp->offset - block->offset);
>>>
>>> -            ram_handle_compressed(host_addr, comp->value, comp->length);
>>> +            ram_handle_compressed(host_addr, comp->value, comp->length, false);
>>>              break;
>>>
>>>          case RDMA_CONTROL_REGISTER_FINISHED:
>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages
  2017-04-26 19:52       ` Christian Borntraeger
@ 2017-04-26 19:54         ` Christian Borntraeger
  0 siblings, 0 replies; 18+ messages in thread
From: Christian Borntraeger @ 2017-04-26 19:54 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, aarcange; +Cc: qemu-devel, quintela, lvivier, peterx

On 04/26/2017 09:52 PM, Christian Borntraeger wrote:
> On 04/26/2017 09:04 PM, Dr. David Alan Gilbert wrote:
>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>>> On 04/26/2017 08:37 PM, Dr. David Alan Gilbert (git) wrote:
>>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>>
>>>> When an all-zero page is received during the precopy
>>>> phase of a postcopy-enabled migration we must force
>>>> allocation otherwise accesses to the page will still
>>>> get blocked by userfault.
>>>>
>>>> Symptom:
>>>>   a) If the page is accessed by a device during device-load
>>>>     then we get a deadlock as the source finishes sending
>>>>     all its pages but the destination device-load is still
>>>>     paused and so doesn't clean up.
>>>>
>>>>   b) If the page is accessed later, then the thread will stay
>>>>     paused until the end of migration rather than carrying on
>>>>     running, until we release userfault at the end.
>>>>
>>>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>
>>> CC stable? after all the guest hangs on both sides
>>>
>>> Has survived 40 migrations (usually failed at the 2nd)
>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>
>> Great...but.....
>> Andrea (added to the mail) says this shouldn't be necessary.
>> The read we were doing in the is_zero_range() should have been sufficient
>> to get the page mapped and that zero page should have survived.
> 
> We do not do is_zero_range if ch==0 because of lazy evaluation, no?

Sorry misread that code. We do.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages
  2017-04-26 19:37       ` Andrea Arcangeli
@ 2017-04-27  3:20         ` Peter Xu
  2017-04-27  6:44           ` Christian Borntraeger
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2017-04-27  3:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Christian Borntraeger, qemu-devel, quintela, lvivier, Andrea Arcangeli

On Wed, Apr 26, 2017 at 09:37:43PM +0200, Andrea Arcangeli wrote:
> Hello,
> 
> On Wed, Apr 26, 2017 at 08:04:43PM +0100, Dr. David Alan Gilbert wrote:
> > * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> > > On 04/26/2017 08:37 PM, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > When an all-zero page is received during the precopy
> > > > phase of a postcopy-enabled migration we must force
> > > > allocation otherwise accesses to the page will still
> > > > get blocked by userfault.
> > > > 
> > > > Symptom:
> > > >   a) If the page is accessed by a device during device-load
> > > >     then we get a deadlock as the source finishes sending
> > > >     all its pages but the destination device-load is still
> > > >     paused and so doesn't clean up.
> > > > 
> > > >   b) If the page is accessed later, then the thread will stay
> > > >     paused until the end of migration rather than carrying on
> > > >     running, until we release userfault at the end.
> > > > 
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > > 
> > > CC stable? after all the guest hangs on both sides
> > > 
> > > Has survived 40 migrations (usually failed at the 2nd)
> > > Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > 
> > Great...but.....
> > Andrea (added to the mail) says this shouldn't be necessary.
> > The read we were doing in the is_zero_range() should have been sufficient
> > to get the page mapped and that zero page should have survived.
> > 
> > So - I guess that's back a step, we need to figure out why the
> > page disapepars for you.
> 
> Yes reading during precopy is enough to fill the hole and prevent
> userfault missing faults to trigger.
> 
> Somehow the pagetable must be mapped by a zeropage or a hugezeropage
> or a regular page allocated during a previous precopy pass or a
> pre-zeroed subpage part of a THP.
> 
> Even if the hugezeropage is splitted later by a MADV_DONTNEED with
> postcopy starts, they will become 4k zeropages.
> 
> After a read succeeds, nothing (except MADV_DONTNEED or other explicit
> syscalls which qemu would need to invoke explicitly between
> is_zero_range and UFFDIO_REGISTER) should be able to bring the
> pagetable back to its "pte_none/pmd_none" state that will then trigger
> missing userfaults during postcopy later.

No matter what finally the solution would be (after see Juan's
comment, I am curious about whether is_zero_page() behaves differently
in power now)... Dave, would it worth mentioning in
ram_handle_compressed() about this read side-effect? Otherwise imho it
might be hard for many people to quickly notice this.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages
  2017-04-27  3:20         ` Peter Xu
@ 2017-04-27  6:44           ` Christian Borntraeger
  2017-04-27 13:47             ` Andrea Arcangeli
  0 siblings, 1 reply; 18+ messages in thread
From: Christian Borntraeger @ 2017-04-27  6:44 UTC (permalink / raw)
  To: Peter Xu, Dr. David Alan Gilbert
  Cc: qemu-devel, quintela, lvivier, Andrea Arcangeli

On 04/27/2017 05:20 AM, Peter Xu wrote:
> On Wed, Apr 26, 2017 at 09:37:43PM +0200, Andrea Arcangeli wrote:
>> Hello,
>>
>> On Wed, Apr 26, 2017 at 08:04:43PM +0100, Dr. David Alan Gilbert wrote:
>>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>>>> On 04/26/2017 08:37 PM, Dr. David Alan Gilbert (git) wrote:
>>>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>>>
>>>>> When an all-zero page is received during the precopy
>>>>> phase of a postcopy-enabled migration we must force
>>>>> allocation otherwise accesses to the page will still
>>>>> get blocked by userfault.
>>>>>
>>>>> Symptom:
>>>>>   a) If the page is accessed by a device during device-load
>>>>>     then we get a deadlock as the source finishes sending
>>>>>     all its pages but the destination device-load is still
>>>>>     paused and so doesn't clean up.
>>>>>
>>>>>   b) If the page is accessed later, then the thread will stay
>>>>>     paused until the end of migration rather than carrying on
>>>>>     running, until we release userfault at the end.
>>>>>
>>>>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>>
>>>> CC stable? after all the guest hangs on both sides
>>>>
>>>> Has survived 40 migrations (usually failed at the 2nd)
>>>> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>
>>> Great...but.....
>>> Andrea (added to the mail) says this shouldn't be necessary.
>>> The read we were doing in the is_zero_range() should have been sufficient
>>> to get the page mapped and that zero page should have survived.
>>>
>>> So - I guess that's back a step, we need to figure out why the
>>> page disapepars for you.
>>
>> Yes reading during precopy is enough to fill the hole and prevent
>> userfault missing faults to trigger.
>>
>> Somehow the pagetable must be mapped by a zeropage or a hugezeropage
>> or a regular page allocated during a previous precopy pass or a
>> pre-zeroed subpage part of a THP.
>>
>> Even if the hugezeropage is splitted later by a MADV_DONTNEED with
>> postcopy starts, they will become 4k zeropages.
>>
>> After a read succeeds, nothing (except MADV_DONTNEED or other explicit
>> syscalls which qemu would need to invoke explicitly between
>> is_zero_range and UFFDIO_REGISTER) should be able to bring the
>> pagetable back to its "pte_none/pmd_none" state that will then trigger
>> missing userfaults during postcopy later.

I have started instrumenting the kernel. I can see a set_pte_at for this address
and I see an (to be understood) invalidation shortly after that which explains
why I get a fault.
> 
> No matter what finally the solution would be (after see Juan's
> comment, I am curious about whether is_zero_page() behaves differently
> in power now)... Dave, would it worth mentioning in

s390 is not power. :-)

And we fall back to the normal buffer_zero_int which reads.

> ram_handle_compressed() about this read side-effect? Otherwise imho it
> might be hard for many people to quickly notice this.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages
  2017-04-26 19:35   ` Juan Quintela
@ 2017-04-27  8:03     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-27  8:03 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel, borntraeger, lvivier, peterx

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > When an all-zero page is received during the precopy
> > phase of a postcopy-enabled migration we must force
> > allocation otherwise accesses to the page will still
> > get blocked by userfault.
> >
> > Symptom:
> >   a) If the page is accessed by a device during device-load
> >     then we get a deadlock as the source finishes sending
> >     all its pages but the destination device-load is still
> >     paused and so doesn't clean up.
> >
> >   b) If the page is accessed later, then the thread will stay
> >     paused until the end of migration rather than carrying on
> >     running, until we release userfault at the end.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > ---
> >  include/migration/migration.h |  3 ++-
> >  migration/ram.c               | 12 ++++++++----
> >  migration/rdma.c              |  2 +-
> >  3 files changed, 11 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index ba1a16cbc1..b47904033c 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -265,7 +265,8 @@ uint64_t xbzrle_mig_pages_overflow(void);
> >  uint64_t xbzrle_mig_pages_cache_miss(void);
> >  double xbzrle_mig_cache_miss_rate(void);
> >  
> > -void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
> > +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size,
> > +                           bool always_write);
> >  void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
> >  /* For outgoing discard bitmap */
> >  int ram_postcopy_send_discard_bitmap(MigrationState *ms);
> > diff --git a/migration/ram.c b/migration/ram.c
> > index f48664ec62..b4ed41c725 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2274,10 +2274,12 @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
> >   * @host: host address for the zero page
> >   * @ch: what the page is filled from.  We only support zero
> >   * @size: size of the zero page
> > + * @always_write: Always perform the memset even if it's zero
> >   */
> > -void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
> > +void ram_handle_compressed(void *host, uint8_t ch, uint64_t size,
> > +                           bool always_write)
> >  {
> > -    if (ch != 0 || !is_zero_range(host, size)) {
> > +    if (ch != 0 || always_write || !is_zero_range(host, size)) {
> >          memset(host, ch, size);
> >      }
> >  }
> > @@ -2514,7 +2516,8 @@ static int ram_load_postcopy(QEMUFile *f)
> >          switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
> >          case RAM_SAVE_FLAG_COMPRESS:
> >              ch = qemu_get_byte(f);
> > -            memset(page_buffer, ch, TARGET_PAGE_SIZE);
> > +            ram_handle_compressed(page_buffer, ch, TARGET_PAGE_SIZE,
> > +                                  true);
> 
> This is weird, if we are in postcopy, we were already doing the memset
> unconditionally, so .... weird as it can be.

No, we were already doing the memset once we had switched into postcopy mode,
this code does it while we're still in the precopy phase.

Dave

> I agree with Andrea that if the is_zero_range() don't assure that the
> page is there, it is really strange.
> 
> Can you look at utils/bufferiszero() and see what that function is doing
> in power?  File is a mess of ifdefs to optimize for intel as far as I
> can see.
> 
> Later, Juan.
> 
> 
> >              if (ch) {
> >                  all_zero = false;
> >              }
> > @@ -2664,7 +2667,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >  
> >          case RAM_SAVE_FLAG_COMPRESS:
> >              ch = qemu_get_byte(f);
> > -            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> > +            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE,
> > +                                  postcopy_advised);
> >              break;
> >  
> >          case RAM_SAVE_FLAG_PAGE:
> > diff --git a/migration/rdma.c b/migration/rdma.c
> > index fe0a4b5a83..07a9bd75d8 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -3164,7 +3164,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
> >              host_addr = block->local_host_addr +
> >                              (comp->offset - block->offset);
> >  
> > -            ram_handle_compressed(host_addr, comp->value, comp->length);
> > +            ram_handle_compressed(host_addr, comp->value, comp->length, false);
> >              break;
> >  
> >          case RDMA_CONTROL_REGISTER_FINISHED:
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages
  2017-04-27  6:44           ` Christian Borntraeger
@ 2017-04-27 13:47             ` Andrea Arcangeli
  2017-04-28 13:19               ` Christian Borntraeger
  0 siblings, 1 reply; 18+ messages in thread
From: Andrea Arcangeli @ 2017-04-27 13:47 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Peter Xu, Dr. David Alan Gilbert, qemu-devel, quintela, lvivier

On Thu, Apr 27, 2017 at 08:44:03AM +0200, Christian Borntraeger wrote:
> I have started instrumenting the kernel. I can see a set_pte_at for this address
> and I see an (to be understood) invalidation shortly after that which explains
> why I get a fault.

Sounds great that you can see an invalidation shortly after, that is
the real source of the problem. Can you get a stack trace of such
invalidation?

Thanks!
Andrea

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages
  2017-04-27 13:47             ` Andrea Arcangeli
@ 2017-04-28 13:19               ` Christian Borntraeger
  2017-04-28 14:24                 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 18+ messages in thread
From: Christian Borntraeger @ 2017-04-28 13:19 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Peter Xu, Dr. David Alan Gilbert, qemu-devel, quintela, lvivier,
	Martin Schwidefsky

On 04/27/2017 03:47 PM, Andrea Arcangeli wrote:
> On Thu, Apr 27, 2017 at 08:44:03AM +0200, Christian Borntraeger wrote:
>> I have started instrumenting the kernel. I can see a set_pte_at for this address
>> and I see an (to be understood) invalidation shortly after that which explains
>> why I get a fault.
> 
> Sounds great that you can see an invalidation shortly after, that is
> the real source of the problem. Can you get a stack trace of such
> invalidation?
> 
> Thanks!
> Andrea
> 

Finally got it. I had a test module in that guest, which triggered a storage key 
operation. Normally we no longer use the storage keys in Linux. Therefore KVM
disables storage key support and intercepts all storage key instructions to enable
the support for that lazily.This makes paging easier and faster to not worry about those.
When we enable storage keys, we must not use shared pages as the storage key
is a property of the physical page frame (and not of the virtual page).
Therefore, this enablement makes mm_forbids_zeropage return true and removes
all existing zero pages.
(see commit 2faee8ff9dc6f4bfe46f6d2d110add858140fb20
    s390/mm: prevent and break zero page mappings in case of storage keys)
In this case it was called while migrating the storage keys (via kvm ioctl)
resulting in zero page mappings going away. (see qemu hw/s390x/s390-skeys.c)


Apr 28 14:48:43 s38lp08 kernel: ([<000000000011218a>] show_trace+0x62/0x78)
Apr 28 14:48:43 s38lp08 kernel:  [<0000000000112278>] show_stack+0x68/0xe0 
Apr 28 14:48:43 s38lp08 kernel:  [<000000000066f82e>] dump_stack+0x7e/0xb0 
Apr 28 14:48:43 s38lp08 kernel:  [<0000000000123b2c>] ptep_xchg_direct+0x254/0x288 
Apr 28 14:48:43 s38lp08 kernel:  [<0000000000127cfe>] __s390_enable_skey+0x76/0xa0 
Apr 28 14:48:43 s38lp08 kernel:  [<00000000002e5278>] __walk_page_range+0x270/0x500 
Apr 28 14:48:43 s38lp08 kernel:  [<00000000002e5592>] walk_page_range+0x8a/0x148 
Apr 28 14:48:43 s38lp08 kernel:  [<0000000000127bc6>] s390_enable_skey+0x116/0x140 
Apr 28 14:48:43 s38lp08 kernel:  [<000000000013fd92>] kvm_arch_vm_ioctl+0x11ea/0x1c70 
Apr 28 14:48:43 s38lp08 kernel:  [<0000000000131aa2>] kvm_vm_ioctl+0xca/0x710 
Apr 28 14:48:43 s38lp08 kernel:  [<00000000003460e8>] do_vfs_ioctl+0xa8/0x608 
Apr 28 14:48:43 s38lp08 kernel:  [<00000000003466ec>] SyS_ioctl+0xa4/0xb8 
Apr 28 14:48:43 s38lp08 kernel:  [<0000000000923460>] system_call+0xc4/0x23c

As a result a userfault on this virtual address will indeed go back to QEMU
and asks again for that page. And then QEMU "oh I have that page already transferred"
(even if it was detected as zero page and just faulted in by reading from it)
So I will not write it again.

Several options:
- let postcopy not discard a page, even it if must already be there (patch from David)
- change s390-skeys to register_savevm_live and do the skey enablement
very early (but this will be impossible for incoming data from old versions)
- let kernel s390_enable_skey actually fault in (might show big memory consumption)
- let qemu hw/s390x/s390-skeys.c tell the migration code that pages might need
  retransmissions
....

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages
  2017-04-28 13:19               ` Christian Borntraeger
@ 2017-04-28 14:24                 ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-28 14:24 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Andrea Arcangeli, Peter Xu, qemu-devel, quintela, lvivier,
	Martin Schwidefsky

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> On 04/27/2017 03:47 PM, Andrea Arcangeli wrote:
> > On Thu, Apr 27, 2017 at 08:44:03AM +0200, Christian Borntraeger wrote:
> >> I have started instrumenting the kernel. I can see a set_pte_at for this address
> >> and I see an (to be understood) invalidation shortly after that which explains
> >> why I get a fault.
> > 
> > Sounds great that you can see an invalidation shortly after, that is
> > the real source of the problem. Can you get a stack trace of such
> > invalidation?
> > 
> > Thanks!
> > Andrea
> > 
> 
> Finally got it. I had a test module in that guest, which triggered a storage key 
> operation. Normally we no longer use the storage keys in Linux. Therefore KVM
> disables storage key support and intercepts all storage key instructions to enable
> the support for that lazily.This makes paging easier and faster to not worry about those.
> When we enable storage keys, we must not use shared pages as the storage key
> is a property of the physical page frame (and not of the virtual page).
> Therefore, this enablement makes mm_forbids_zeropage return true and removes
> all existing zero pages.
> (see commit 2faee8ff9dc6f4bfe46f6d2d110add858140fb20
>     s390/mm: prevent and break zero page mappings in case of storage keys)
> In this case it was called while migrating the storage keys (via kvm ioctl)
> resulting in zero page mappings going away. (see qemu hw/s390x/s390-skeys.c)
> 
> 
> Apr 28 14:48:43 s38lp08 kernel: ([<000000000011218a>] show_trace+0x62/0x78)
> Apr 28 14:48:43 s38lp08 kernel:  [<0000000000112278>] show_stack+0x68/0xe0 
> Apr 28 14:48:43 s38lp08 kernel:  [<000000000066f82e>] dump_stack+0x7e/0xb0 
> Apr 28 14:48:43 s38lp08 kernel:  [<0000000000123b2c>] ptep_xchg_direct+0x254/0x288 
> Apr 28 14:48:43 s38lp08 kernel:  [<0000000000127cfe>] __s390_enable_skey+0x76/0xa0 
> Apr 28 14:48:43 s38lp08 kernel:  [<00000000002e5278>] __walk_page_range+0x270/0x500 
> Apr 28 14:48:43 s38lp08 kernel:  [<00000000002e5592>] walk_page_range+0x8a/0x148 
> Apr 28 14:48:43 s38lp08 kernel:  [<0000000000127bc6>] s390_enable_skey+0x116/0x140 
> Apr 28 14:48:43 s38lp08 kernel:  [<000000000013fd92>] kvm_arch_vm_ioctl+0x11ea/0x1c70 
> Apr 28 14:48:43 s38lp08 kernel:  [<0000000000131aa2>] kvm_vm_ioctl+0xca/0x710 
> Apr 28 14:48:43 s38lp08 kernel:  [<00000000003460e8>] do_vfs_ioctl+0xa8/0x608 
> Apr 28 14:48:43 s38lp08 kernel:  [<00000000003466ec>] SyS_ioctl+0xa4/0xb8 
> Apr 28 14:48:43 s38lp08 kernel:  [<0000000000923460>] system_call+0xc4/0x23c
> 
> As a result a userfault on this virtual address will indeed go back to QEMU
> and asks again for that page. And then QEMU "oh I have that page already transferred"
> (even if it was detected as zero page and just faulted in by reading from it)
> So I will not write it again.
> 
> Several options:
> - let postcopy not discard a page, even it if must already be there (patch from David)

Yes so that patch was forcing the population of zero pages when received; now that might
cause a lot of memory consumption - so it's not a great idea if we can avoid it.

> - change s390-skeys to register_savevm_live and do the skey enablement
> very early (but this will be impossible for incoming data from old versions)
> - let kernel s390_enable_skey actually fault in (might show big memory consumption)
> - let qemu hw/s390x/s390-skeys.c tell the migration code that pages might need
>   retransmissions

Note that it would have to do that on the source side prior to/or in the discard phase of
postcopy entry; any later and the source will ignore those requests (which is not
easy to fix - because it's a safeguard against it overwriting new data on the destination
by old data in a race).

Dave

> ....
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] migration: Extra tracing
  2017-04-26 18:37 ` [Qemu-devel] [PATCH 2/2] migration: Extra tracing Dr. David Alan Gilbert (git)
  2017-04-26 18:48   ` Christian Borntraeger
@ 2017-05-02 13:17   ` Philippe Mathieu-Daudé
  2017-05-04  8:40   ` Juan Quintela
  2 siblings, 0 replies; 18+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-02 13:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git),
	qemu-devel, borntraeger, quintela, lvivier, peterx

On 04/26/2017 03:37 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> A couple more traces that would have made fixing that postcopy
> bug a bit easier.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> ---
>  migration/ram.c        | 2 ++
>  migration/trace-events | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index b4ed41c725..3ac41ccaba 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -812,6 +812,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
>      ram_addr_t offset = pss->page << TARGET_PAGE_BITS;
>
>      p = block->host + offset;
> +    trace_ram_save_page(block->idstr, (uint64_t)offset, p);
>
>      /* In doubt sent page as normal */
>      bytes_xmit = 0;
> @@ -2614,6 +2615,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                  ret = -EINVAL;
>                  break;
>              }
> +            trace_ram_load_loop(block->idstr, (uint64_t)addr, flags, host);
>          }
>
>          switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
> diff --git a/migration/trace-events b/migration/trace-events
> index b8f01a218c..5b8ccf301c 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -69,8 +69,10 @@ migration_bitmap_sync_start(void) ""
>  migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
>  migration_throttle(void) ""
>  ram_discard_range(const char *rbname, uint64_t start, size_t len) "%s: start: %" PRIx64 " %zx"
> +ram_load_loop(const char *rbname, uint64_t addr, int flags, void *host) "%s: addr: %" PRIx64 " flags: %x host: %p"
>  ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
>  ram_postcopy_send_discard_bitmap(void) ""
> +ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: %" PRIx64 " host: %p"
>  ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
>
>  # migration/migration.c
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] migration: Extra tracing
  2017-04-26 18:37 ` [Qemu-devel] [PATCH 2/2] migration: Extra tracing Dr. David Alan Gilbert (git)
  2017-04-26 18:48   ` Christian Borntraeger
  2017-05-02 13:17   ` Philippe Mathieu-Daudé
@ 2017-05-04  8:40   ` Juan Quintela
  2 siblings, 0 replies; 18+ messages in thread
From: Juan Quintela @ 2017-05-04  8:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, borntraeger, lvivier, peterx

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> A couple more traces that would have made fixing that postcopy
> bug a bit easier.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

Pulled.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-05-04  8:40 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-26 18:37 [Qemu-devel] [PATCH 0/2] Postcopy fix and traces Dr. David Alan Gilbert (git)
2017-04-26 18:37 ` [Qemu-devel] [PATCH 1/2] Postcopy: Force allocation of all-zero precopy pages Dr. David Alan Gilbert (git)
2017-04-26 19:00   ` Christian Borntraeger
2017-04-26 19:04     ` Dr. David Alan Gilbert
2017-04-26 19:37       ` Andrea Arcangeli
2017-04-27  3:20         ` Peter Xu
2017-04-27  6:44           ` Christian Borntraeger
2017-04-27 13:47             ` Andrea Arcangeli
2017-04-28 13:19               ` Christian Borntraeger
2017-04-28 14:24                 ` Dr. David Alan Gilbert
2017-04-26 19:52       ` Christian Borntraeger
2017-04-26 19:54         ` Christian Borntraeger
2017-04-26 19:35   ` Juan Quintela
2017-04-27  8:03     ` Dr. David Alan Gilbert
2017-04-26 18:37 ` [Qemu-devel] [PATCH 2/2] migration: Extra tracing Dr. David Alan Gilbert (git)
2017-04-26 18:48   ` Christian Borntraeger
2017-05-02 13:17   ` Philippe Mathieu-Daudé
2017-05-04  8:40   ` Juan Quintela

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.