qemu-devel.nongnu.org archive mirror
* [PATCH 1/2] migration: Fix rdma migration failed
@ 2023-09-20  9:04 Li Zhijian
  2023-09-20  9:04 ` [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear Li Zhijian
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Li Zhijian @ 2023-09-20  9:04 UTC (permalink / raw)
  To: quintela, peterx, leobras; +Cc: qemu-devel, Li Zhijian

From: Li Zhijian <lizhijian@cn.fujitsu.com>

Destination will fail with:
qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.

Migration over RDMA is different from TCP: RDMA has its own control
messages, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
RDMA_CONTROL_REGISTER_FINISHED must not be disturbed.

find_dirty_block() can be called between RDMA_CONTROL_REGISTER_REQUEST
and RDMA_CONTROL_REGISTER_FINISHED; it sends extra traffic to the
destination and causes the migration to fail.

Since there's no existing helper to indicate whether the migration is
over RDMA or not, and RDMA is not compatible with multifd, we use
migrate_multifd() here.

Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 migration/ram.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 9040d66e61..89ae28e21a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1399,7 +1399,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
         pss->page = 0;
         pss->block = QLIST_NEXT_RCU(pss->block, next);
         if (!pss->block) {
-            if (!migrate_multifd_flush_after_each_section()) {
+            if (migrate_multifd() &&
+                !migrate_multifd_flush_after_each_section()) {
                 QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
                 int ret = multifd_send_sync_main(f);
                 if (ret < 0) {
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear
  2023-09-20  9:04 [PATCH 1/2] migration: Fix rdma migration failed Li Zhijian
@ 2023-09-20  9:04 ` Li Zhijian
  2023-09-20 13:01   ` Fabiano Rosas
  2023-09-22 15:44   ` Peter Xu
  2023-09-20 12:46 ` [PATCH 1/2] migration: Fix rdma migration failed Fabiano Rosas
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Li Zhijian @ 2023-09-20  9:04 UTC (permalink / raw)
  To: quintela, peterx, leobras; +Cc: qemu-devel, Li Zhijian

From: Li Zhijian <lizhijian@cn.fujitsu.com>

Previously, we got a confusing error complaining about
RDMAControlHeader.repeat:
qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.

Actually, it's caused by an unexpected RDMAControlHeader.type.
After this patch, the error becomes:
qemu-system-x86_64: Unknown control message QEMU FILE

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 migration/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index a2a3db35b1..3073d9953c 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         size_t remaining = iov[i].iov_len;
         uint8_t * data = (void *)iov[i].iov_base;
         while (remaining) {
-            RDMAControlHeader head;
+            RDMAControlHeader head = {};
 
             len = MIN(remaining, RDMA_SEND_INCREMENT);
             remaining -= len;
-- 
2.31.1




* Re: [PATCH 1/2] migration: Fix rdma migration failed
  2023-09-20  9:04 [PATCH 1/2] migration: Fix rdma migration failed Li Zhijian
  2023-09-20  9:04 ` [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear Li Zhijian
@ 2023-09-20 12:46 ` Fabiano Rosas
  2023-09-22  7:42   ` Zhijian Li (Fujitsu)
  2023-09-21  1:40 ` Zhijian Li (Fujitsu)
  2023-09-22 15:42 ` Peter Xu
  3 siblings, 1 reply; 13+ messages in thread
From: Fabiano Rosas @ 2023-09-20 12:46 UTC (permalink / raw)
  To: Li Zhijian, quintela, peterx, leobras; +Cc: qemu-devel, Li Zhijian

Li Zhijian <lizhijian@fujitsu.com> writes:

> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>
> Destination will fail with:
> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>
> migrate with RDMA is different from tcp. RDMA has its own control
> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.

Yeah, this is really fragile. We need a long-term solution to this. Any
other change to the multifd protocol, as well as any other change to the
migration RAM handling, might hit this issue again.

Perhaps commit 294e5a4034 ("multifd: Only flush once each full round of
memory") should simply not have touched the stream at that point, but we
don't have any explicit safeguards to avoid interleaving flags from
different layers like that (assuming multifd is at another logical layer
than the ram handling).

I don't have any good suggestions at this moment, so for now:

Reviewed-by: Fabiano Rosas <farosas@suse.de>



* Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear
  2023-09-20  9:04 ` [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear Li Zhijian
@ 2023-09-20 13:01   ` Fabiano Rosas
  2023-09-21  1:36     ` Zhijian Li (Fujitsu)
  2023-09-22 15:44   ` Peter Xu
  1 sibling, 1 reply; 13+ messages in thread
From: Fabiano Rosas @ 2023-09-20 13:01 UTC (permalink / raw)
  To: Li Zhijian, quintela, peterx, leobras; +Cc: qemu-devel, Li Zhijian

Li Zhijian <lizhijian@fujitsu.com> writes:

> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>
> Previously, we got a confusion error that complains
> the RDMAControlHeader.repeat:
> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>
> Actually, it's caused by an unexpected RDMAControlHeader.type.
> After this patch, error will become:
> qemu-system-x86_64: Unknown control message QEMU FILE
>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  migration/rdma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index a2a3db35b1..3073d9953c 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>          size_t remaining = iov[i].iov_len;
>          uint8_t * data = (void *)iov[i].iov_base;
>          while (remaining) {
> -            RDMAControlHeader head;
> +            RDMAControlHeader head = {};
>  
>              len = MIN(remaining, RDMA_SEND_INCREMENT);
>              remaining -= len;

I'm struggling to see how head is used before we set the type a couple
of lines below. Could you expand on it?

Also, a smoke test could have caught both issues early on. Is there any
reason for not having any?



* Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear
  2023-09-20 13:01   ` Fabiano Rosas
@ 2023-09-21  1:36     ` Zhijian Li (Fujitsu)
  2023-09-21 12:29       ` Fabiano Rosas
  0 siblings, 1 reply; 13+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-21  1:36 UTC (permalink / raw)
  To: Fabiano Rosas, quintela, peterx, leobras; +Cc: qemu-devel, Zhijian Li (Fujitsu)



On 20/09/2023 21:01, Fabiano Rosas wrote:
> Li Zhijian <lizhijian@fujitsu.com> writes:
> 
>> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>>
>> Previously, we got a confusion error that complains
>> the RDMAControlHeader.repeat:
>> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>>
>> Actually, it's caused by an unexpected RDMAControlHeader.type.
>> After this patch, error will become:
>> qemu-system-x86_64: Unknown control message QEMU FILE
>>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   migration/rdma.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index a2a3db35b1..3073d9953c 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>           size_t remaining = iov[i].iov_len;
>>           uint8_t * data = (void *)iov[i].iov_base;
>>           while (remaining) {
>> -            RDMAControlHeader head;
>> +            RDMAControlHeader head = {};
>>   
>>               len = MIN(remaining, RDMA_SEND_INCREMENT);
>>               remaining -= len;
> 

2815             RDMAControlHeader head = {};
2816
2817             len = MIN(remaining, RDMA_SEND_INCREMENT);
2818             remaining -= len;
2819
2820             head.len = len;
2821             head.type = RDMA_CONTROL_QEMU_FILE;
2822
2823             ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);

> I'm struggling to see how head is used before we set the type a couple
> of lines below. Could you expand on it?


IIUC, head is used for both the common migration control path and the RDMA-specific control path.

hook_stage(RAM_SAVE_FLAG_HOOK) {
    rdma_hook_process(qemu_rdma_registration_handle) {
       do {
           // This is RDMA's own control block; it should not be disturbed by the common migration control path.
           // head will be extracted and processed here.
           // qio_channel_rdma_writev() will send RDMA_CONTROL_QEMU_FILE, which is an unexpected message for this block.
           // head.repeat is examined before the type, so an uninitialized repeat will confuse us here.
       } while (!RDMA_CONTROL_REGISTER_FINISHED || !error)
    }
}


When qio_channel_rdma_writev() is used for the common migration control path, repeat is unused and will not be examined.

With this patch, we can quickly know the cause.


> 
> Also, a smoke test could have caught both issues early on. Is there any
> reason for not having any?

I have no idea yet :)


Thanks
Zhijian


* Re: [PATCH 1/2] migration: Fix rdma migration failed
  2023-09-20  9:04 [PATCH 1/2] migration: Fix rdma migration failed Li Zhijian
  2023-09-20  9:04 ` [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear Li Zhijian
  2023-09-20 12:46 ` [PATCH 1/2] migration: Fix rdma migration failed Fabiano Rosas
@ 2023-09-21  1:40 ` Zhijian Li (Fujitsu)
  2023-09-22 15:42 ` Peter Xu
  3 siblings, 0 replies; 13+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-21  1:40 UTC (permalink / raw)
  To: quintela, peterx, leobras; +Cc: qemu-devel, Zhijian Li (Fujitsu)

Sorry to all, I forgot to update my email address to lizhijian@fujitsu.com.

Corrected it.


On 20/09/2023 17:04, Li Zhijian wrote:
> From: Li Zhijian <lizhijian@cn.fujitsu.com>
> 
> Destination will fail with:
> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
> 
> migrate with RDMA is different from tcp. RDMA has its own control
> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
> 
> find_dirty_block() will be called during RDMA_CONTROL_REGISTER_REQUEST
> and RDMA_CONTROL_REGISTER_FINISHED, it will send a extra traffic to
> destination and cause migration to fail.
> 
> Since there's no existing subroutine to indicate whether it's migrated
> by RDMA or not, and RDMA is not compatible with multifd, we use
> migrate_multifd() here.
> 
> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>   migration/ram.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 9040d66e61..89ae28e21a 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1399,7 +1399,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
>           pss->page = 0;
>           pss->block = QLIST_NEXT_RCU(pss->block, next);
>           if (!pss->block) {
> -            if (!migrate_multifd_flush_after_each_section()) {
> +            if (migrate_multifd() &&
> +                !migrate_multifd_flush_after_each_section()) {
>                   QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
>                   int ret = multifd_send_sync_main(f);
>                   if (ret < 0) {


* Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear
  2023-09-21  1:36     ` Zhijian Li (Fujitsu)
@ 2023-09-21 12:29       ` Fabiano Rosas
  0 siblings, 0 replies; 13+ messages in thread
From: Fabiano Rosas @ 2023-09-21 12:29 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu), quintela, peterx, leobras
  Cc: qemu-devel, Zhijian Li (Fujitsu)

"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 20/09/2023 21:01, Fabiano Rosas wrote:
>> Li Zhijian <lizhijian@fujitsu.com> writes:
>> 
>>> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>>>
>>> Previously, we got a confusion error that complains
>>> the RDMAControlHeader.repeat:
>>> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>>>
>>> Actually, it's caused by an unexpected RDMAControlHeader.type.
>>> After this patch, error will become:
>>> qemu-system-x86_64: Unknown control message QEMU FILE
>>>
>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>> ---
>>>   migration/rdma.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>> index a2a3db35b1..3073d9953c 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>>           size_t remaining = iov[i].iov_len;
>>>           uint8_t * data = (void *)iov[i].iov_base;
>>>           while (remaining) {
>>> -            RDMAControlHeader head;
>>> +            RDMAControlHeader head = {};
>>>   
>>>               len = MIN(remaining, RDMA_SEND_INCREMENT);
>>>               remaining -= len;
>> 
>
> 2815             RDMAControlHeader head = {};
> 2816
> 2817             len = MIN(remaining, RDMA_SEND_INCREMENT);
> 2818             remaining -= len;
> 2819
> 2820             head.len = len;
> 2821             head.type = RDMA_CONTROL_QEMU_FILE;
> 2822
> 2823             ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
>
>> I'm struggling to see how head is used before we set the type a couple
>> of lines below. Could you expand on it?
>
>
> IIUC, head is used for both common migration control path and RDMA specific control path.
>
> hook_stage(RAM_SAVE_FLAG_HOOK) {
>     rdma_hook_process(qemu_rdma_registration_handle) {
>        do {
>            // this is a RDMA own control block, should not be disturbed by the common migration control path.
>            // head will be extracted and processed here.
>            // qio_channel_rdma_writev() will send RDMA_CONTROL_QEMU_FILE, which is an unexpected message for this block.
>            // head.repeat will be examined before the type, so an uninitialized repeat will confuse us here.
>        } while (!RDMA_CONTROL_REGISTER_FINISHED || !error)
>     }
> }
>
>
> when qio_channel_rdma_writev() is used for common migration control path, repeat is useless and will not be examined.
>
> With this patch, we can quickly know the cause.
>

Ah, right. Somehow I interpreted the commit message as meaning the
'type' field was bogus. But it's the 'repeat' field that causes the
issue. Thanks for the explanation.

Reviewed-by: Fabiano Rosas <farosas@suse.de>




* Re: [PATCH 1/2] migration: Fix rdma migration failed
  2023-09-20 12:46 ` [PATCH 1/2] migration: Fix rdma migration failed Fabiano Rosas
@ 2023-09-22  7:42   ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 13+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-22  7:42 UTC (permalink / raw)
  To: Fabiano Rosas, quintela, peterx, leobras; +Cc: qemu-devel, Zhijian Li (Fujitsu)



On 20/09/2023 20:46, Fabiano Rosas wrote:
> Li Zhijian <lizhijian@fujitsu.com> writes:
> 
>> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>>
>> Destination will fail with:
>> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>>
>> migrate with RDMA is different from tcp. RDMA has its own control
>> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
>> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
> 
> Yeah, this is really fragile. We need a long term solution to this. Any
> other change to multifd protocol as well as any other change to the
> migration ram handling might hit this issue again.

Yeah, it's a pain point.

Another option is to let the RDMA control handler recognize the
RAM_SAVE_FLAG_MULTIFD_FLUSH message and do nothing with it.


> 
> Perhaps commit 294e5a4034 ("multifd: Only flush once each full round of
> memory") should simply not have touched the stream at that point, but we
> don't have any explicit safeguards to avoid interleaving flags from
> different layers like that (assuming multifd is at another logical layer
> than the ram handling).
>
> I don't have any good suggestions at this moment, so for now:
> 
> Reviewed-by: Fabiano Rosas <farosas@suse.de>


* Re: [PATCH 1/2] migration: Fix rdma migration failed
  2023-09-20  9:04 [PATCH 1/2] migration: Fix rdma migration failed Li Zhijian
                   ` (2 preceding siblings ...)
  2023-09-21  1:40 ` Zhijian Li (Fujitsu)
@ 2023-09-22 15:42 ` Peter Xu
  2023-09-22 15:59   ` Fabiano Rosas
  2023-09-25  8:59   ` Zhijian Li (Fujitsu)
  3 siblings, 2 replies; 13+ messages in thread
From: Peter Xu @ 2023-09-22 15:42 UTC (permalink / raw)
  To: Li Zhijian; +Cc: quintela, leobras, qemu-devel, Li Zhijian

On Wed, Sep 20, 2023 at 05:04:11PM +0800, Li Zhijian wrote:
> From: Li Zhijian <lizhijian@cn.fujitsu.com>
> 
> Destination will fail with:
> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
> 
> migrate with RDMA is different from tcp. RDMA has its own control
> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
> 
> find_dirty_block() will be called during RDMA_CONTROL_REGISTER_REQUEST
> and RDMA_CONTROL_REGISTER_FINISHED, it will send a extra traffic to
> destination and cause migration to fail.
> 
> Since there's no existing subroutine to indicate whether it's migrated
> by RDMA or not, and RDMA is not compatible with multifd, we use
> migrate_multifd() here.
> 
> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  migration/ram.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 9040d66e61..89ae28e21a 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1399,7 +1399,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
>          pss->page = 0;
>          pss->block = QLIST_NEXT_RCU(pss->block, next);
>          if (!pss->block) {
> -            if (!migrate_multifd_flush_after_each_section()) {
> +            if (migrate_multifd() &&
> +                !migrate_multifd_flush_after_each_section()) {
>                  QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
>                  int ret = multifd_send_sync_main(f);
>                  if (ret < 0) {
> -- 
> 2.31.1
> 

Maybe better to put that check at the entry of
migrate_multifd_flush_after_each_section()?

I also hope that some day there's no multifd function called in generic
migration code paths..

-- 
Peter Xu




* Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear
  2023-09-20  9:04 ` [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear Li Zhijian
  2023-09-20 13:01   ` Fabiano Rosas
@ 2023-09-22 15:44   ` Peter Xu
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Xu @ 2023-09-22 15:44 UTC (permalink / raw)
  To: Li Zhijian; +Cc: quintela, leobras, qemu-devel, Li Zhijian

On Wed, Sep 20, 2023 at 05:04:12PM +0800, Li Zhijian wrote:
> From: Li Zhijian <lizhijian@cn.fujitsu.com>
> 
> Previously, we got a confusion error that complains
> the RDMAControlHeader.repeat:
> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
> 
> Actually, it's caused by an unexpected RDMAControlHeader.type.
> After this patch, error will become:
> qemu-system-x86_64: Unknown control message QEMU FILE
> 
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH 1/2] migration: Fix rdma migration failed
  2023-09-22 15:42 ` Peter Xu
@ 2023-09-22 15:59   ` Fabiano Rosas
  2023-09-22 16:09     ` Peter Xu
  2023-09-25  8:59   ` Zhijian Li (Fujitsu)
  1 sibling, 1 reply; 13+ messages in thread
From: Fabiano Rosas @ 2023-09-22 15:59 UTC (permalink / raw)
  To: Peter Xu, Li Zhijian; +Cc: quintela, leobras, qemu-devel, Li Zhijian

Peter Xu <peterx@redhat.com> writes:

> On Wed, Sep 20, 2023 at 05:04:11PM +0800, Li Zhijian wrote:
>> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>> 
>> Destination will fail with:
>> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>> 
>> migrate with RDMA is different from tcp. RDMA has its own control
>> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
>> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
>> 
>> find_dirty_block() will be called during RDMA_CONTROL_REGISTER_REQUEST
>> and RDMA_CONTROL_REGISTER_FINISHED, it will send a extra traffic to
>> destination and cause migration to fail.
>> 
>> Since there's no existing subroutine to indicate whether it's migrated
>> by RDMA or not, and RDMA is not compatible with multifd, we use
>> migrate_multifd() here.
>> 
>> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>  migration/ram.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 9040d66e61..89ae28e21a 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1399,7 +1399,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
>>          pss->page = 0;
>>          pss->block = QLIST_NEXT_RCU(pss->block, next);
>>          if (!pss->block) {
>> -            if (!migrate_multifd_flush_after_each_section()) {
>> +            if (migrate_multifd() &&
>> +                !migrate_multifd_flush_after_each_section()) {
>>                  QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
>>                  int ret = multifd_send_sync_main(f);
>>                  if (ret < 0) {
>> -- 
>> 2.31.1
>> 
>
> Maybe better to put that check at the entry of
> migrate_multifd_flush_after_each_section()?
>
> I also hope that some day there's no multifd function called in generic
> migration code paths..

I wonder what happened with that MigrationOps idea. We added the
ram_save_target_page pointer and nothing else. It seems like it could be
something in the direction of allowing different parts of the migration
code to provide different behavior without having to put these explicit
checks all over the place.

And although we're removing the QEMUFile hooks, those also looked like
they could help mitigate these cross-layer interactions. (I'm NOT
advocating bringing them back).



* Re: [PATCH 1/2] migration: Fix rdma migration failed
  2023-09-22 15:59   ` Fabiano Rosas
@ 2023-09-22 16:09     ` Peter Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2023-09-22 16:09 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: Li Zhijian, quintela, leobras, qemu-devel, Li Zhijian

On Fri, Sep 22, 2023 at 12:59:37PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Sep 20, 2023 at 05:04:11PM +0800, Li Zhijian wrote:
> >> From: Li Zhijian <lizhijian@cn.fujitsu.com>
> >> 
> >> Destination will fail with:
> >> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
> >> 
> >> migrate with RDMA is different from tcp. RDMA has its own control
> >> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
> >> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
> >> 
> >> find_dirty_block() will be called during RDMA_CONTROL_REGISTER_REQUEST
> >> and RDMA_CONTROL_REGISTER_FINISHED, it will send a extra traffic to
> >> destination and cause migration to fail.
> >> 
> >> Since there's no existing subroutine to indicate whether it's migrated
> >> by RDMA or not, and RDMA is not compatible with multifd, we use
> >> migrate_multifd() here.
> >> 
> >> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
> >> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> >> ---
> >>  migration/ram.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/migration/ram.c b/migration/ram.c
> >> index 9040d66e61..89ae28e21a 100644
> >> --- a/migration/ram.c
> >> +++ b/migration/ram.c
> >> @@ -1399,7 +1399,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
> >>          pss->page = 0;
> >>          pss->block = QLIST_NEXT_RCU(pss->block, next);
> >>          if (!pss->block) {
> >> -            if (!migrate_multifd_flush_after_each_section()) {
> >> +            if (migrate_multifd() &&
> >> +                !migrate_multifd_flush_after_each_section()) {
> >>                  QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
> >>                  int ret = multifd_send_sync_main(f);
> >>                  if (ret < 0) {
> >> -- 
> >> 2.31.1
> >> 
> >
> > Maybe better to put that check at the entry of
> > migrate_multifd_flush_after_each_section()?
> >
> > I also hope that some day there's no multifd function called in generic
> > migration code paths..
> 
> I wonder what happened with that MigrationOps idea. We added the
> ram_save_target_page pointer and nothing else. It seems like it could be
> something in the direction of allowing different parts of the migration
> code to provide different behavior without having to put these explicit
> checks all over the place.

Yeah..

https://lore.kernel.org/qemu-devel/20230130080956.3047-12-quintela@redhat.com/

Juan should know better.

Personally I think it's better to introduce a hook only when there's a
second user already.  I assume Juan merged it planning that the second
user would land soon, but it didn't.

-- 
Peter Xu




* Re: [PATCH 1/2] migration: Fix rdma migration failed
  2023-09-22 15:42 ` Peter Xu
  2023-09-22 15:59   ` Fabiano Rosas
@ 2023-09-25  8:59   ` Zhijian Li (Fujitsu)
  1 sibling, 0 replies; 13+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-09-25  8:59 UTC (permalink / raw)
  To: Peter Xu; +Cc: quintela, leobras, qemu-devel, Zhijian Li (Fujitsu)



On 22/09/2023 23:42, Peter Xu wrote:
> On Wed, Sep 20, 2023 at 05:04:11PM +0800, Li Zhijian wrote:
>> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>>
>> Destination will fail with:
>> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>>
>> migrate with RDMA is different from tcp. RDMA has its own control
>> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
>> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
>>
>> find_dirty_block() will be called during RDMA_CONTROL_REGISTER_REQUEST
>> and RDMA_CONTROL_REGISTER_FINISHED, it will send a extra traffic to
>> destination and cause migration to fail.
>>
>> Since there's no existing subroutine to indicate whether it's migrated
>> by RDMA or not, and RDMA is not compatible with multifd, we use
>> migrate_multifd() here.
>>
>> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   migration/ram.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 9040d66e61..89ae28e21a 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1399,7 +1399,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
>>           pss->page = 0;
>>           pss->block = QLIST_NEXT_RCU(pss->block, next);
>>           if (!pss->block) {
>> -            if (!migrate_multifd_flush_after_each_section()) {
>> +            if (migrate_multifd() &&
>> +                !migrate_multifd_flush_after_each_section()) {
>>                   QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
>>                   int ret = multifd_send_sync_main(f);
>>                   if (ret < 0) {
>> -- 
>> 2.31.1
>>
> 
> Maybe better to put that check at the entry of
> migrate_multifd_flush_after_each_section()?
> 

It sounds good to me:
diff --git a/migration/options.c b/migration/options.c
index 1d1e1321b0a..327bcf2fbe4 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -368,7 +368,7 @@ bool migrate_multifd_flush_after_each_section(void)
  {
      MigrationState *s = migrate_get_current();

-    return s->multifd_flush_after_each_section;
+    return !migrate_multifd() || s->multifd_flush_after_each_section;
  }

  bool migrate_postcopy(void)


That change makes migrate_multifd_flush_after_each_section() always return true when multifd is disabled.

Thanks



> I also hope that some day there's no multifd function called in generic
> migration code paths..
> 


