* [PATCH v2 1/2] migration: Fix rdma migration failed
@ 2023-09-26 10:01 Li Zhijian
  2023-09-26 10:01 ` [PATCH v2 2/2] migration/rdma: zero out head.repeat to make the error more clear Li Zhijian
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Li Zhijian @ 2023-09-26 10:01 UTC (permalink / raw)
  To: quintela, peterx, leobras; +Cc: qemu-devel, Li Zhijian, Fabiano Rosas

Migration over RDMA has been failing since
commit 294e5a4034 ("multifd: Only flush once each full round of memory")
with errors like:
qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.

Migration over RDMA is different from TCP. RDMA has its own control
messages, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
RDMA_CONTROL_REGISTER_FINISHED must not be disturbed.

find_dirty_block() can be called between RDMA_CONTROL_REGISTER_REQUEST
and RDMA_CONTROL_REGISTER_FINISHED; it sends extra traffic
(RAM_SAVE_FLAG_MULTIFD_FLUSH) to the destination and causes the
migration to fail even though multifd is disabled.

This change makes migrate_multifd_flush_after_each_section() return true
when multifd is disabled, which also means RAM_SAVE_FLAG_MULTIFD_FLUSH
will no longer be sent to the destination when multifd is disabled.

Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
CC: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---

V2: put that check at the entry of migrate_multifd_flush_after_each_section() # Peter
---
 migration/options.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/options.c b/migration/options.c
index 1d1e1321b0..327bcf2fbe 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -368,7 +368,7 @@ bool migrate_multifd_flush_after_each_section(void)
 {
     MigrationState *s = migrate_get_current();
 
-    return s->multifd_flush_after_each_section;
+    return !migrate_multifd() || s->multifd_flush_after_each_section;
 }
 
 bool migrate_postcopy(void)
-- 
2.31.1
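
For readers reconstructing the failure, here is a minimal, self-contained
sketch of the sender-side decision this one-liner changes (plain C, not
QEMU code; the flag value, variable names and helper names are illustrative
stand-ins). With multifd disabled and the property at its non-legacy
default, the old predicate returned false, so the end-of-round path in
find_dirty_block() wrote RAM_SAVE_FLAG_MULTIFD_FLUSH onto the RDMA channel;
the patched predicate short-circuits to true and nothing extra is sent.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-ins; the real definitions live under migration/. */
#define RAM_SAVE_FLAG_MULTIFD_FLUSH 0x200

static bool multifd_enabled;          /* stand-in for migrate_multifd() */
static bool flush_after_each_section; /* the migration property         */

/* Old behaviour: simply report the property. */
static bool flush_after_each_section_old(void)
{
    return flush_after_each_section;
}

/* Patched behaviour: short-circuit to true when multifd is off. */
static bool flush_after_each_section_new(void)
{
    return !multifd_enabled || flush_after_each_section;
}

/* Model of the wrap-around branch in find_dirty_block(): the per-round
 * flush flag is only emitted when the predicate returns false. */
static void end_of_round(bool (*pred)(void), const char *tag)
{
    if (!pred()) {
        printf("%s: would emit RAM_SAVE_FLAG_MULTIFD_FLUSH (0x%x)\n",
               tag, (unsigned)RAM_SAVE_FLAG_MULTIFD_FLUSH);
    } else {
        printf("%s: no multifd flush emitted\n", tag);
    }
}

int main(void)
{
    /* The RDMA case from the bug report: multifd disabled. */
    multifd_enabled = false;
    flush_after_each_section = false;

    end_of_round(flush_after_each_section_old, "old"); /* stray flag on the wire */
    end_of_round(flush_after_each_section_new, "new"); /* nothing extra is sent  */
    return 0;
}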




* [PATCH v2 2/2] migration/rdma: zero out head.repeat to make the error more clear
  2023-09-26 10:01 [PATCH v2 1/2] migration: Fix rdma migration failed Li Zhijian
@ 2023-09-26 10:01 ` Li Zhijian
  2023-10-03 18:57   ` Juan Quintela
  2023-09-26 17:04 ` [PATCH v2 1/2] migration: Fix rdma migration failed Peter Xu
  2023-10-03 18:57 ` Juan Quintela
  2 siblings, 1 reply; 10+ messages in thread
From: Li Zhijian @ 2023-09-26 10:01 UTC (permalink / raw)
  To: quintela, peterx, leobras; +Cc: qemu-devel, Li Zhijian, Fabiano Rosas

Previously, we got a confusing error complaining about
RDMAControlHeader.repeat:
qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.

Actually, it's caused by an unexpected RDMAControlHeader.type.
After this patch, the error becomes:
qemu-system-x86_64: Unknown control message QEMU FILE

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

---
V2: add reviewed-by tags
---
 migration/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index a2a3db35b1..3073d9953c 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         size_t remaining = iov[i].iov_len;
         uint8_t * data = (void *)iov[i].iov_base;
         while (remaining) {
-            RDMAControlHeader head;
+            RDMAControlHeader head = {};
 
             len = MIN(remaining, RDMA_SEND_INCREMENT);
             remaining -= len;
-- 
2.31.1
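
To see why zero-initialising the header changes the diagnostic, here is a
stand-alone model of the destination-side validation order (a simplified
sketch, not the code in migration/rdma.c; the struct layout, enum values and
MAX_COMMANDS limit are assumptions). The repeat count is bounds-checked
before the message type is dispatched, so stack garbage in repeat surfaces
as the nonsensical "Too many requests" number, while a zeroed header falls
through to the unknown-type report.

#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Simplified control header with the fields discussed in the thread. */
typedef struct {
    uint32_t len;
    uint32_t type;
    uint32_t repeat;
} ControlHeader;

enum { CTRL_REGISTER_REQUEST = 1, CTRL_QEMU_FILE = 2 };
#define MAX_COMMANDS 4096   /* illustrative bound, not the real constant */

/* Validation order modelled on the destination's registration handler:
 * 'repeat' is checked before 'type' is dispatched. */
static void handle(const ControlHeader *head)
{
    if (head->repeat > MAX_COMMANDS) {
        printf("rdma: Too many requests in this message (%" PRIu32 ").Bailing.\n",
               head->repeat);
        return;
    }
    switch (head->type) {
    case CTRL_REGISTER_REQUEST:
        printf("register request handled\n");
        break;
    default:
        printf("Unknown control message (type %" PRIu32 ")\n", head->type);
        break;
    }
}

int main(void)
{
    /* Before the patch: stack garbage in 'repeat' dominates the error. */
    ControlHeader garbage;
    memset(&garbage, 0xd8, sizeof(garbage)); /* simulate an uninitialized stack */
    garbage.type = CTRL_QEMU_FILE;           /* the sender does set the type    */
    handle(&garbage);

    /* After the patch: 'head = {}' zeroes 'repeat', so the real problem,
     * an unexpected message type, is what gets reported. */
    ControlHeader zeroed = {0};
    zeroed.type = CTRL_QEMU_FILE;
    handle(&zeroed);
    return 0;
}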




* Re: [PATCH v2 1/2] migration: Fix rdma migration failed
  2023-09-26 10:01 [PATCH v2 1/2] migration: Fix rdma migration failed Li Zhijian
  2023-09-26 10:01 ` [PATCH v2 2/2] migration/rdma: zero out head.repeat to make the error more clear Li Zhijian
@ 2023-09-26 17:04 ` Peter Xu
  2023-10-03 19:00   ` Juan Quintela
  2023-10-03 18:57 ` Juan Quintela
  2 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2023-09-26 17:04 UTC (permalink / raw)
  To: Li Zhijian; +Cc: quintela, leobras, qemu-devel, Fabiano Rosas

On Tue, Sep 26, 2023 at 06:01:02PM +0800, Li Zhijian wrote:
> Migration over RDMA has been failing since
> commit 294e5a4034 ("multifd: Only flush once each full round of memory")
> with errors like:
> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>
> Migration over RDMA is different from TCP. RDMA has its own control
> messages, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
> RDMA_CONTROL_REGISTER_FINISHED must not be disturbed.
>
> find_dirty_block() can be called between RDMA_CONTROL_REGISTER_REQUEST
> and RDMA_CONTROL_REGISTER_FINISHED; it sends extra traffic
> (RAM_SAVE_FLAG_MULTIFD_FLUSH) to the destination and causes the
> migration to fail even though multifd is disabled.
>
> This change makes migrate_multifd_flush_after_each_section() return true
> when multifd is disabled, which also means RAM_SAVE_FLAG_MULTIFD_FLUSH
> will no longer be sent to the destination when multifd is disabled.
> 
> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
> CC: Fabiano Rosas <farosas@suse.de>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
> 
> V2: put that check at the entry of migrate_multifd_flush_after_each_section() # Peter

When seeing this I notice my suggestion wasn't ideal either, as we rely on
both multifd_send_sync_main() and multifd_recv_sync_main() being no-ops when
!multifd.

For the long term, we should not call multifd functions at all if multifd
is not enabled..

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu
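
A minimal sketch of the two patterns being contrasted above (the names are
generic stand-ins, not the actual multifd_send_sync_main() or
multifd_recv_sync_main() implementations): today the generic path may call
into multifd code and relies on the helper being a no-op, whereas the
long-term suggestion is to guard the call site itself.

#include <stdbool.h>
#include <stdio.h>

static bool multifd_enabled;   /* stand-in for migrate_multifd() */

/* Pattern relied on today: the multifd helper itself degrades to a no-op
 * when multifd is off, so stray calls appear harmless. */
static int multifd_sync(void)
{
    if (!multifd_enabled) {
        return 0;              /* silently do nothing */
    }
    printf("syncing multifd channels\n");
    return 0;
}

/* Long-term suggestion: the generic migration path never enters multifd
 * code unless multifd is actually in use. */
static int end_of_memory_round(void)
{
    if (multifd_enabled) {     /* explicit guard at the call site */
        return multifd_sync();
    }
    return 0;
}

int main(void)
{
    multifd_enabled = false;
    return end_of_memory_round();   /* multifd code is never reached */
}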




* Re: [PATCH v2 1/2] migration: Fix rdma migration failed
  2023-09-26 10:01 [PATCH v2 1/2] migration: Fix rdma migration failed Li Zhijian
  2023-09-26 10:01 ` [PATCH v2 2/2] migration/rdma: zero out head.repeat to make the error more clear Li Zhijian
  2023-09-26 17:04 ` [PATCH v2 1/2] migration: Fix rdma migration failed Peter Xu
@ 2023-10-03 18:57 ` Juan Quintela
  2023-10-06 15:52   ` Peter Xu
  2023-10-07  6:03   ` Zhijian Li (Fujitsu)
  2 siblings, 2 replies; 10+ messages in thread
From: Juan Quintela @ 2023-10-03 18:57 UTC (permalink / raw)
  To: Li Zhijian; +Cc: peterx, leobras, qemu-devel, Fabiano Rosas

Li Zhijian <lizhijian@fujitsu.com> wrote:
> Migration over RDMA has been failing since
> commit 294e5a4034 ("multifd: Only flush once each full round of memory")
> with errors like:
> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>
> Migration over RDMA is different from TCP. RDMA has its own control
> messages, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
> RDMA_CONTROL_REGISTER_FINISHED must not be disturbed.
>
> find_dirty_block() can be called between RDMA_CONTROL_REGISTER_REQUEST
> and RDMA_CONTROL_REGISTER_FINISHED; it sends extra traffic
> (RAM_SAVE_FLAG_MULTIFD_FLUSH) to the destination and causes the
> migration to fail even though multifd is disabled.
>
> This change makes migrate_multifd_flush_after_each_section() return true
> when multifd is disabled, which also means RAM_SAVE_FLAG_MULTIFD_FLUSH
> will no longer be sent to the destination when multifd is disabled.
>
> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
> CC: Fabiano Rosas <farosas@suse.de>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

Ouch.

> index 1d1e1321b0..327bcf2fbe 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -368,7 +368,7 @@ bool migrate_multifd_flush_after_each_section(void)
>  {
>      MigrationState *s = migrate_get_current();
>  
> -    return s->multifd_flush_after_each_section;
> +    return !migrate_multifd() || s->multifd_flush_after_each_section;
>  }
>  
>  bool migrate_postcopy(void)

But I think this is ugly.

migrate_multifd_flush_after_each_section() returns true with multifd not
enabled?

And we are turning a "function" that just reads a property into one that
now does something else.

What about this?

I know that the change is bigger, but it makes it clear what is happening
here.

commit c638f66121ce30063fbf68c3eab4d7429cf2b209
Author: Juan Quintela <quintela@redhat.com>
Date:   Tue Oct 3 20:53:38 2023 +0200

    migration: Non multifd migration don't care about multifd flushes
    
    RDMA was having trouble because
    migrate_multifd_flush_after_each_section() can only be true or false,
    but we don't want to send any flush when we are not in multifd
    migration.
    
    CC: Fabiano Rosas <farosas@suse.de
    Reported-by: Li Zhijian <lizhijian@fujitsu.com>
    Signed-off-by: Juan Quintela <quintela@redhat.com>

diff --git a/migration/ram.c b/migration/ram.c
index e4bfd39f08..716cef6425 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1387,7 +1387,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
         pss->page = 0;
         pss->block = QLIST_NEXT_RCU(pss->block, next);
         if (!pss->block) {
-            if (!migrate_multifd_flush_after_each_section()) {
+            if (migrate_multifd() &&
+                !migrate_multifd_flush_after_each_section()) {
                 QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
                 int ret = multifd_send_sync_main(f);
                 if (ret < 0) {
@@ -3064,7 +3065,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
         return ret;
     }
 
-    if (!migrate_multifd_flush_after_each_section()) {
+    if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) {
         qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
     }
 
@@ -3176,7 +3177,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 out:
     if (ret >= 0
         && migration_is_setup_or_active(migrate_get_current()->state)) {
-        if (migrate_multifd_flush_after_each_section()) {
+        if (migrate_multifd() && migrate_multifd_flush_after_each_section()) {
             ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel);
             if (ret < 0) {
                 return ret;
@@ -3253,7 +3254,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         return ret;
     }
 
-    if (!migrate_multifd_flush_after_each_section()) {
+    if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) {
         qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
     }
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
@@ -3760,7 +3761,7 @@ int ram_load_postcopy(QEMUFile *f, int channel)
             break;
         case RAM_SAVE_FLAG_EOS:
             /* normal exit */
-            if (migrate_multifd_flush_after_each_section()) {
+            if (migrate_multifd() && migrate_multifd_flush_after_each_section()) {
                 multifd_recv_sync_main();
             }
             break;
@@ -4038,7 +4039,8 @@ static int ram_load_precopy(QEMUFile *f)
             break;
         case RAM_SAVE_FLAG_EOS:
             /* normal exit */
-            if (migrate_multifd_flush_after_each_section()) {
+            if (migrate_multifd() &&
+                migrate_multifd_flush_after_each_section()) {
                 multifd_recv_sync_main();
             }
             break;




* Re: [PATCH v2 2/2] migration/rdma: zero out head.repeat to make the error more clear
  2023-09-26 10:01 ` [PATCH v2 2/2] migration/rdma: zero out head.repeat to make the error more clear Li Zhijian
@ 2023-10-03 18:57   ` Juan Quintela
  0 siblings, 0 replies; 10+ messages in thread
From: Juan Quintela @ 2023-10-03 18:57 UTC (permalink / raw)
  To: Li Zhijian; +Cc: peterx, leobras, qemu-devel, Fabiano Rosas

Li Zhijian <lizhijian@fujitsu.com> wrote:
> Previously, we got a confusing error complaining about
> RDMAControlHeader.repeat:
> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>
> Actually, it's caused by an unexpected RDMAControlHeader.type.
> After this patch, the error becomes:
> qemu-system-x86_64: Unknown control message QEMU FILE
>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

queued.




* Re: [PATCH v2 1/2] migration: Fix rdma migration failed
  2023-09-26 17:04 ` [PATCH v2 1/2] migration: Fix rdma migration failed Peter Xu
@ 2023-10-03 19:00   ` Juan Quintela
  0 siblings, 0 replies; 10+ messages in thread
From: Juan Quintela @ 2023-10-03 19:00 UTC (permalink / raw)
  To: Peter Xu; +Cc: Li Zhijian, leobras, qemu-devel, Fabiano Rosas

Peter Xu <peterx@redhat.com> wrote:
> On Tue, Sep 26, 2023 at 06:01:02PM +0800, Li Zhijian wrote:
>> Migration over RDMA has been failing since
>> commit 294e5a4034 ("multifd: Only flush once each full round of memory")
>> with errors like:
>> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>>
>> Migration over RDMA is different from TCP. RDMA has its own control
>> messages, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
>> RDMA_CONTROL_REGISTER_FINISHED must not be disturbed.
>>
>> find_dirty_block() can be called between RDMA_CONTROL_REGISTER_REQUEST
>> and RDMA_CONTROL_REGISTER_FINISHED; it sends extra traffic
>> (RAM_SAVE_FLAG_MULTIFD_FLUSH) to the destination and causes the
>> migration to fail even though multifd is disabled.
>>
>> This change makes migrate_multifd_flush_after_each_section() return true
>> when multifd is disabled, which also means RAM_SAVE_FLAG_MULTIFD_FLUSH
>> will no longer be sent to the destination when multifd is disabled.
>> 
>> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
>> CC: Fabiano Rosas <farosas@suse.de>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> ---
>> 
>> V2: put that check at the entry of migrate_multifd_flush_after_each_section() # Peter
>
> When seeing this I notice my suggestion wasn't ideal either, as we rely on
> both multifd_send_sync_main() and multifd_recv_sync_main() be no-op when
> !multifd.
>
> For the long term, we should not call multifd functions at all, if multifd
> is not enabled..

Agreed.

Send a different patch that makes this clear.

> Reviewed-by: Peter Xu <peterx@redhat.com>




* Re: [PATCH v2 1/2] migration: Fix rdma migration failed
  2023-10-03 18:57 ` Juan Quintela
@ 2023-10-06 15:52   ` Peter Xu
  2023-10-06 17:15     ` Peter Xu
  2023-10-18 14:32     ` Juan Quintela
  2023-10-07  6:03   ` Zhijian Li (Fujitsu)
  1 sibling, 2 replies; 10+ messages in thread
From: Peter Xu @ 2023-10-06 15:52 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Li Zhijian, leobras, qemu-devel, Fabiano Rosas

On Tue, Oct 03, 2023 at 08:57:07PM +0200, Juan Quintela wrote:
> commit c638f66121ce30063fbf68c3eab4d7429cf2b209
> Author: Juan Quintela <quintela@redhat.com>
> Date:   Tue Oct 3 20:53:38 2023 +0200
> 
>     migration: Non multifd migration don't care about multifd flushes
>     
>     RDMA was having trouble because
>     migrate_multifd_flush_after_each_section() can only be true or false,
>     but we don't want to send any flush when we are not in multifd
>     migration.
>     
>     CC: Fabiano Rosas <farosas@suse.de
>     Reported-by: Li Zhijian <lizhijian@fujitsu.com>
>     Signed-off-by: Juan Quintela <quintela@redhat.com>
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index e4bfd39f08..716cef6425 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1387,7 +1387,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
>          pss->page = 0;
>          pss->block = QLIST_NEXT_RCU(pss->block, next);
>          if (!pss->block) {
> -            if (!migrate_multifd_flush_after_each_section()) {
> +            if (migrate_multifd() &&
> +                !migrate_multifd_flush_after_each_section()) {
>                  QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
>                  int ret = multifd_send_sync_main(f);
>                  if (ret < 0) {
> @@ -3064,7 +3065,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>          return ret;
>      }
>  
> -    if (!migrate_multifd_flush_after_each_section()) {
> +    if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) {
>          qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
>      }
>  
> @@ -3176,7 +3177,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>  out:
>      if (ret >= 0
>          && migration_is_setup_or_active(migrate_get_current()->state)) {
> -        if (migrate_multifd_flush_after_each_section()) {
> +        if (migrate_multifd() && migrate_multifd_flush_after_each_section()) {
>              ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel);
>              if (ret < 0) {
>                  return ret;
> @@ -3253,7 +3254,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>          return ret;
>      }
>  
> -    if (!migrate_multifd_flush_after_each_section()) {
> +    if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) {
>          qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
>      }
>      qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> @@ -3760,7 +3761,7 @@ int ram_load_postcopy(QEMUFile *f, int channel)
>              break;
>          case RAM_SAVE_FLAG_EOS:
>              /* normal exit */
> -            if (migrate_multifd_flush_after_each_section()) {
> +            if (migrate_multifd() && migrate_multifd_flush_after_each_section()) {
>                  multifd_recv_sync_main();
>              }
>              break;
> @@ -4038,7 +4039,8 @@ static int ram_load_precopy(QEMUFile *f)
>              break;
>          case RAM_SAVE_FLAG_EOS:
>              /* normal exit */
> -            if (migrate_multifd_flush_after_each_section()) {
> +            if (migrate_multifd() &&
> +                migrate_multifd_flush_after_each_section()) {
>                  multifd_recv_sync_main();
>              }
>              break;

Reviewed-by: Peter Xu <peterx@redhat.com>

Did you forget to send this out formally?  Even though f1de309792d6656e landed
(which, IMHO, it shouldn't have..), IIUC rdma is still broken..

Thanks,

-- 
Peter Xu




* Re: [PATCH v2 1/2] migration: Fix rdma migration failed
  2023-10-06 15:52   ` Peter Xu
@ 2023-10-06 17:15     ` Peter Xu
  2023-10-18 14:32     ` Juan Quintela
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Xu @ 2023-10-06 17:15 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Li Zhijian, leobras, qemu-devel, Fabiano Rosas

On Fri, Oct 06, 2023 at 11:52:10AM -0400, Peter Xu wrote:
> On Tue, Oct 03, 2023 at 08:57:07PM +0200, Juan Quintela wrote:
> > commit c638f66121ce30063fbf68c3eab4d7429cf2b209
> > Author: Juan Quintela <quintela@redhat.com>
> > Date:   Tue Oct 3 20:53:38 2023 +0200
> > 
> >     migration: Non multifd migration don't care about multifd flushes
> >     
> >     RDMA was having trouble because
> >     migrate_multifd_flush_after_each_section() can only be true or false,
> >     but we don't want to send any flush when we are not in multifd
> >     migration.
> >     
> >     CC: Fabiano Rosas <farosas@suse.de
> >     Reported-by: Li Zhijian <lizhijian@fujitsu.com>
> >     Signed-off-by: Juan Quintela <quintela@redhat.com>
> > 
> > diff --git a/migration/ram.c b/migration/ram.c
> > index e4bfd39f08..716cef6425 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1387,7 +1387,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
> >          pss->page = 0;
> >          pss->block = QLIST_NEXT_RCU(pss->block, next);
> >          if (!pss->block) {
> > -            if (!migrate_multifd_flush_after_each_section()) {
> > +            if (migrate_multifd() &&
> > +                !migrate_multifd_flush_after_each_section()) {
> >                  QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
> >                  int ret = multifd_send_sync_main(f);
> >                  if (ret < 0) {
> > @@ -3064,7 +3065,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >          return ret;
> >      }
> >  
> > -    if (!migrate_multifd_flush_after_each_section()) {
> > +    if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) {
> >          qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
> >      }
> >  
> > @@ -3176,7 +3177,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> >  out:
> >      if (ret >= 0
> >          && migration_is_setup_or_active(migrate_get_current()->state)) {
> > -        if (migrate_multifd_flush_after_each_section()) {
> > +        if (migrate_multifd() && migrate_multifd_flush_after_each_section()) {
> >              ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel);
> >              if (ret < 0) {
> >                  return ret;
> > @@ -3253,7 +3254,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
> >          return ret;
> >      }
> >  
> > -    if (!migrate_multifd_flush_after_each_section()) {
> > +    if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) {
> >          qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
> >      }
> >      qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> > @@ -3760,7 +3761,7 @@ int ram_load_postcopy(QEMUFile *f, int channel)
> >              break;
> >          case RAM_SAVE_FLAG_EOS:
> >              /* normal exit */
> > -            if (migrate_multifd_flush_after_each_section()) {
> > +            if (migrate_multifd() && migrate_multifd_flush_after_each_section()) {
> >                  multifd_recv_sync_main();
> >              }
> >              break;
> > @@ -4038,7 +4039,8 @@ static int ram_load_precopy(QEMUFile *f)
> >              break;
> >          case RAM_SAVE_FLAG_EOS:
> >              /* normal exit */
> > -            if (migrate_multifd_flush_after_each_section()) {
> > +            if (migrate_multifd() &&
> > +                migrate_multifd_flush_after_each_section()) {
> >                  multifd_recv_sync_main();
> >              }
> >              break;
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
> Did you forget to send this out formally?  Even though f1de309792d6656e landed
> (which, IMHO, it shouldn't have..), IIUC rdma is still broken..

Two more things to mention..

$ git tag --contains 294e5a4034e81b

It tells me v8.1 is also affected.. so we may want to copy stable too for
8.1, for whichever patch we want to merge (either yours or Zhijian's)..

Meanwhile, it also breaks migration as long as the user specifies the new
behavior.. for example, v8.1->v8.0 will break with this:

$ (echo "migrate exec:cat>out"; echo "quit") | ./qemu-v8.1.1 -M pc-q35-8.0 -global migration.multifd-flush-after-each-section=false -monitor stdio
QEMU 8.1.1 monitor - type 'help' for more information
VNC server running on ::1:5900
(qemu) migrate exec:cat>out
(qemu) quit

$ ./qemu-v8.0.5 -M pc-q35-8.0 -incoming "exec:cat<out"
VNC server running on ::1:5900
qemu-v8.0.5: Unknown combination of migration flags: 0x200
qemu-v8.0.5: error while loading state for instance 0x0 of device 'ram'
qemu-v8.0.5: load of migration failed: Invalid argument

IOW, besides rdma and the script, it can also break in other ways.

-- 
Peter Xu
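
For the v8.1 -> v8.0 breakage shown above, a simplified model of why the
older destination bails out on the bare 0x200 flag (this is not the actual
ram_load_precopy() code; the set of known flag bits and the 4 KiB page
assumption are illustrative):

#include <inttypes.h>
#include <stdio.h>

/* Flag bits an older (pre-MULTIFD_FLUSH) destination understands;
 * an illustrative subset, not the full list in migration/ram.c. */
#define RAM_SAVE_FLAG_ZERO      0x002
#define RAM_SAVE_FLAG_PAGE      0x008
#define RAM_SAVE_FLAG_EOS       0x010
#define RAM_SAVE_FLAG_CONTINUE  0x020

#define KNOWN_FLAGS (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE | \
                     RAM_SAVE_FLAG_EOS  | RAM_SAVE_FLAG_CONTINUE)

/* Each chunk header packs a page-aligned address in the high bits and the
 * flags in the low, sub-page bits. */
static int load_chunk(uint64_t header)
{
    uint64_t flags = header & 0xfff;   /* assume 4 KiB pages */

    if (flags & ~(uint64_t)KNOWN_FLAGS) {
        fprintf(stderr,
                "Unknown combination of migration flags: 0x%" PRIx64 "\n",
                flags);
        return -1;                     /* the load of migration fails */
    }
    /* ... dispatch on the known flags ... */
    return 0;
}

int main(void)
{
    /* An 8.1 source with the new default emits a bare 0x200
     * (RAM_SAVE_FLAG_MULTIFD_FLUSH), which this modelled 8.0-era loader
     * rejects, reproducing the error in the transcript above. */
    return load_chunk(0x200) ? 1 : 0;
}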




* Re: [PATCH v2 1/2] migration: Fix rdma migration failed
  2023-10-03 18:57 ` Juan Quintela
  2023-10-06 15:52   ` Peter Xu
@ 2023-10-07  6:03   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 10+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-10-07  6:03 UTC (permalink / raw)
  To: quintela; +Cc: peterx, leobras, qemu-devel, Fabiano Rosas



On 04/10/2023 02:57, Juan Quintela wrote:
> commit c638f66121ce30063fbf68c3eab4d7429cf2b209
> Author: Juan Quintela <quintela@redhat.com>
> Date:   Tue Oct 3 20:53:38 2023 +0200
> 
>      migration: Non multifd migration don't care about multifd flushes
>      
>      RDMA was having trouble because
>      migrate_multifd_flush_after_each_section() can only be true or false,
>      but we don't want to send any flush when we are not in multifd
>      migration.
>      
>      CC: Fabiano Rosas <farosas@suse.de
>      Reported-by: Li Zhijian <lizhijian@fujitsu.com>
>      Signed-off-by: Juan Quintela <quintela@redhat.com>

Looks good to me

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


* Re: [PATCH v2 1/2] migration: Fix rdma migration failed
  2023-10-06 15:52   ` Peter Xu
  2023-10-06 17:15     ` Peter Xu
@ 2023-10-18 14:32     ` Juan Quintela
  1 sibling, 0 replies; 10+ messages in thread
From: Juan Quintela @ 2023-10-18 14:32 UTC (permalink / raw)
  To: Peter Xu; +Cc: Li Zhijian, leobras, qemu-devel, Fabiano Rosas


I see it in upstream already:

(master)$ g branch --show-current 
master
(master)$ g branch --contains d4f34485ca8a077c98fc2303451e9bece9200dd7
* master
(master)$ 


commit d4f34485ca8a077c98fc2303451e9bece9200dd7
Author: Juan Quintela <quintela@redhat.com>
Date:   Wed Oct 11 22:55:48 2023 +0200

    migration: Non multifd migration don't care about multifd flushes
    
    RDMA was having trouble because
    migrate_multifd_flush_after_each_section() can only be true or false,
    but we don't want to send any flush when we are not in multifd
    migration.
    
    CC: Fabiano Rosas <farosas@suse.de
    Fixes: 294e5a4034e81 ("multifd: Only flush once each full round of memory")
    
    Reported-by: Li Zhijian <lizhijian@fujitsu.com>
    Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
    Reviewed-by: Peter Xu <peterx@redhat.com>
    Signed-off-by: Juan Quintela <quintela@redhat.com>
    Message-ID: <20231011205548.10571-2-quintela@redhat.com>

Or am I missing something?

Later, Juan.




Thread overview: 10+ messages
2023-09-26 10:01 [PATCH v2 1/2] migration: Fix rdma migration failed Li Zhijian
2023-09-26 10:01 ` [PATCH v2 2/2] migration/rdma: zero out head.repeat to make the error more clear Li Zhijian
2023-10-03 18:57   ` Juan Quintela
2023-09-26 17:04 ` [PATCH v2 1/2] migration: Fix rdma migration failed Peter Xu
2023-10-03 19:00   ` Juan Quintela
2023-10-03 18:57 ` Juan Quintela
2023-10-06 15:52   ` Peter Xu
2023-10-06 17:15     ` Peter Xu
2023-10-18 14:32     ` Juan Quintela
2023-10-07  6:03   ` Zhijian Li (Fujitsu)
