xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tools/libxenguest: Fix migration's debug option
@ 2021-07-02 19:03 Andrew Cooper
  2021-07-05  7:53 ` Olaf Hering
  2021-07-05  7:57 ` Jan Beulich
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Cooper @ 2021-07-02 19:03 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, Olaf Hering

The code has gone through many refactors, but the first refactor was the one
which broke it by inverting the check with respect to checkpointed streams.

Fixes: 7449fb36c6c8 ("migration/save: pass checkpointed_stream from libxl to libxc")
Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Ian Jackson <iwj@xenproject.org>
CC: Wei Liu <wl@xen.org>
CC: Olaf Hering <olaf@aepfle.de>

`xl migrate --debug` might not be perfect, but this at least brings it back to
mostly working.

I don't think dropping it is a sensible move.  In particular, it is invaluable
for testing the logdirty infrastructure when migrating a memtest VM.

If anyone has a clever idea to fix the grant problem, then we can.  It is
after all a debug option, without any specific prescribed behaviour.
---
 tools/libs/guest/xg_sr_save.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 2ba7c3200cd5..f0e2bd048d37 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -752,7 +752,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
     if ( rc )
         goto out;
 
-    if ( ctx->save.debug && ctx->stream_type != XC_STREAM_PLAIN )
+    if ( ctx->save.debug && ctx->stream_type == XC_STREAM_PLAIN )
     {
         rc = verify_frames(ctx);
         if ( rc )
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] tools/libxenguest: Fix migration's debug option
  2021-07-02 19:03 [PATCH] tools/libxenguest: Fix migration's debug option Andrew Cooper
@ 2021-07-05  7:53 ` Olaf Hering
  2021-07-05  7:57 ` Jan Beulich
  1 sibling, 0 replies; 11+ messages in thread
From: Olaf Hering @ 2021-07-05  7:53 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Ian Jackson, Wei Liu

[-- Attachment #1: Type: text/plain, Size: 495 bytes --]

Am Fri, 2 Jul 2021 20:03:42 +0100
schrieb Andrew Cooper <andrew.cooper3@citrix.com>:

> The code has gone through many refactors, but the first refactor was the one
> which broke it by inverting the check with respect to checkpointed streams.
> 
> Fixes: 7449fb36c6c8 ("migration/save: pass checkpointed_stream from libxl to libxc")
> Reported-by: Olaf Hering <olaf@aepfle.de>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Olaf Hering <olaf@aepfle.de>

Olaf

[-- Attachment #2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tools/libxenguest: Fix migration's debug option
  2021-07-02 19:03 [PATCH] tools/libxenguest: Fix migration's debug option Andrew Cooper
  2021-07-05  7:53 ` Olaf Hering
@ 2021-07-05  7:57 ` Jan Beulich
  2021-07-05  8:02   ` Olaf Hering
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2021-07-05  7:57 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Olaf Hering, Xen-devel

On 02.07.2021 21:03, Andrew Cooper wrote:
> The code has gone through many refactors, but the first refactor was the one
> which broke it by inverting the check with respect to checkpointed streams.
> 
> Fixes: 7449fb36c6c8 ("migration/save: pass checkpointed_stream from libxl to libxc")
> Reported-by: Olaf Hering <olaf@aepfle.de>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> ---
> CC: Ian Jackson <iwj@xenproject.org>
> CC: Wei Liu <wl@xen.org>
> CC: Olaf Hering <olaf@aepfle.de>
> 
> `xl migrate --debug` might not be perfect, but this at least brings it back to
> mostly working.
> 
> I don't think dropping it is a sensible move.  In particular, it is invaluable
> for testing the logdirty infrastructure when migrating a memtest VM.
> 
> If anyone has a clever idea to fix the grant problem, then we can.  It is
> after all a debug option, without any specific prescribed behaviour.

What is "the grant problem" referring to here? Neither anything above
nor the offending original commit has any reference to grants, or a
problem with them.

Jan



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tools/libxenguest: Fix migration's debug option
  2021-07-05  7:57 ` Jan Beulich
@ 2021-07-05  8:02   ` Olaf Hering
  2021-07-05  8:23     ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Olaf Hering @ 2021-07-05  8:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, Xen-devel

[-- Attachment #1: Type: text/plain, Size: 554 bytes --]

Am Mon, 5 Jul 2021 09:57:21 +0200
schrieb Jan Beulich <jbeulich@suse.com>:

> What is "the grant problem" referring to here? Neither anything above
> nor the offending original commit has any reference to grants, or a
> problem with them.

When the guest is paused during final transit, the backends will continue to write into domU memory. As a result the final additional iteration to verify memory on both sides will always see errors. The code has no way to know for which pfn such mismatches in page content can safely be ignored.


Olaf

[-- Attachment #2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tools/libxenguest: Fix migration's debug option
  2021-07-05  8:02   ` Olaf Hering
@ 2021-07-05  8:23     ` Jan Beulich
  2021-07-05  8:32       ` Olaf Hering
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2021-07-05  8:23 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, Xen-devel

On 05.07.2021 10:02, Olaf Hering wrote:
> Am Mon, 5 Jul 2021 09:57:21 +0200
> schrieb Jan Beulich <jbeulich@suse.com>:
> 
>> What is "the grant problem" referring to here? Neither anything above
>> nor the offending original commit has any reference to grants, or a
>> problem with them.
> 
> When the guest is paused during final transit, the backends will
> continue to write into domU memory. As a result the final additional
> iteration to verify memory on both sides will always see errors.

I see. A similar problem then exists with at least the FIFO event
channel per-vCPU control blocks?

> The code has no way to know for which pfn such mismatches in page
> content can safely be ignored.

Well, in principle this can be known, but it's expensive: For a
paused domain the grant table can't change anymore. Any pages
referenced by a valid non-r/o grant table entry could in principle
change.

Jan



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tools/libxenguest: Fix migration's debug option
  2021-07-05  8:23     ` Jan Beulich
@ 2021-07-05  8:32       ` Olaf Hering
  2021-07-05  9:19         ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Olaf Hering @ 2021-07-05  8:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, Xen-devel

[-- Attachment #1: Type: text/plain, Size: 450 bytes --]

Am Mon, 5 Jul 2021 10:23:00 +0200
schrieb Jan Beulich <jbeulich@suse.com>:

> I see. A similar problem then exists with at least the FIFO event
> channel per-vCPU control blocks?

I have not done any debugging how the pages differ and what they are actually used for.

My guess was that it might be activity from the backends, particularly netback. I found no API to query the usage of a page, so I declared the interface broken..


Olaf

[-- Attachment #2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tools/libxenguest: Fix migration's debug option
  2021-07-05  8:32       ` Olaf Hering
@ 2021-07-05  9:19         ` Jan Beulich
  2021-07-05  9:25           ` Olaf Hering
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2021-07-05  9:19 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, Xen-devel

On 05.07.2021 10:32, Olaf Hering wrote:
> Am Mon, 5 Jul 2021 10:23:00 +0200
> schrieb Jan Beulich <jbeulich@suse.com>:
> 
>> I see. A similar problem then exists with at least the FIFO event
>> channel per-vCPU control blocks?
> 
> I have not done any debugging how the pages differ and what they are actually used for.
> 
> My guess was that it might be activity from the backends, particularly
> netback. I found no API to query the usage of a page, so I declared the
> interface broken..

"The interface" being which one? The tool stack can map the guest's
grant table, so it is in the position to find out about all grants
without further hypervisor help. Afaict the same isn't true for the
FIFO event channel control blocks, though, and maybe there are
further ways by which pages may get modified for a paused guest.

Jan



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tools/libxenguest: Fix migration's debug option
  2021-07-05  9:19         ` Jan Beulich
@ 2021-07-05  9:25           ` Olaf Hering
  2021-07-05  9:31             ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Olaf Hering @ 2021-07-05  9:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, Xen-devel

[-- Attachment #1: Type: text/plain, Size: 428 bytes --]

Am Mon, 5 Jul 2021 11:19:59 +0200
schrieb Jan Beulich <jbeulich@suse.com>:

> "The interface" being which one? The tool stack can map the guest's
> grant table, so it is in the position to find out about all grants
> without further hypervisor help.

The interface means the code behind verify_frames.

If there are indeed ways to query which pages belong to grants, how would the toolstack need to do that?


Olaf

[-- Attachment #2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tools/libxenguest: Fix migration's debug option
  2021-07-05  9:25           ` Olaf Hering
@ 2021-07-05  9:31             ` Jan Beulich
  2021-07-05 10:06               ` Andrew Cooper
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2021-07-05  9:31 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, Xen-devel

On 05.07.2021 11:25, Olaf Hering wrote:
> Am Mon, 5 Jul 2021 11:19:59 +0200
> schrieb Jan Beulich <jbeulich@suse.com>:
> 
>> "The interface" being which one? The tool stack can map the guest's
>> grant table, so it is in the position to find out about all grants
>> without further hypervisor help.
> 
> The interface means the code behind verify_frames.
> 
> If there are indeed ways to query which pages belong to grants, how would the toolstack need to do that?

Map the grant table of the guest and walk it, recording any MFN for
which at least one valid r/w grant exists.

Jan



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tools/libxenguest: Fix migration's debug option
  2021-07-05  9:31             ` Jan Beulich
@ 2021-07-05 10:06               ` Andrew Cooper
  2021-07-05 10:36                 ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2021-07-05 10:06 UTC (permalink / raw)
  To: Jan Beulich, Olaf Hering; +Cc: Ian Jackson, Wei Liu, Xen-devel

On 05/07/2021 10:31, Jan Beulich wrote:
> On 05.07.2021 11:25, Olaf Hering wrote:
>> Am Mon, 5 Jul 2021 11:19:59 +0200
>> schrieb Jan Beulich <jbeulich@suse.com>:
>>
>>> "The interface" being which one? The tool stack can map the guest's
>>> grant table, so it is in the position to find out about all grants
>>> without further hypervisor help.
>> The interface means the code behind verify_frames.
>>
>> If there are indeed ways to query which pages belong to grants, how would the toolstack need to do that?
> Map the grant table of the guest and walk it, recording any MFN for
> which at least one valid r/w grant exists.

That doesn't help - Its still racy with in-flight IO.  Also with updates
from Xen such as the wallclocks.

The only way to fix the IO problem is to disconnect the blk/net rings
before doing the final sweep for frames, but that clobbers any ability
to restart the VM on the source side if things go wrong at the destination.

I don't have an answer at all for the vcpu info frames.

~Andrew



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tools/libxenguest: Fix migration's debug option
  2021-07-05 10:06               ` Andrew Cooper
@ 2021-07-05 10:36                 ` Jan Beulich
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2021-07-05 10:36 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Ian Jackson, Wei Liu, Xen-devel, Olaf Hering

On 05.07.2021 12:06, Andrew Cooper wrote:
> On 05/07/2021 10:31, Jan Beulich wrote:
>> On 05.07.2021 11:25, Olaf Hering wrote:
>>> Am Mon, 5 Jul 2021 11:19:59 +0200
>>> schrieb Jan Beulich <jbeulich@suse.com>:
>>>
>>>> "The interface" being which one? The tool stack can map the guest's
>>>> grant table, so it is in the position to find out about all grants
>>>> without further hypervisor help.
>>> The interface means the code behind verify_frames.
>>>
>>> If there are indeed ways to query which pages belong to grants, how would the toolstack need to do that?
>> Map the grant table of the guest and walk it, recording any MFN for
>> which at least one valid r/w grant exists.
> 
> That doesn't help - Its still racy with in-flight IO.

Well, I meant the recorded data to be used to simply not verify
those frames.

>  Also with updates from Xen such as the wallclocks.

This doesn't occur for a paused domain, does it?

> The only way to fix the IO problem is to disconnect the blk/net rings
> before doing the final sweep for frames, but that clobbers any ability
> to restart the VM on the source side if things go wrong at the destination.
> 
> I don't have an answer at all for the vcpu info frames.

Yeah, they fall in the same category as the FIFO control pages, as
they contain evtchn_{pending_sel,upcall_pending}.

Jan



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-07-05 10:36 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-02 19:03 [PATCH] tools/libxenguest: Fix migration's debug option Andrew Cooper
2021-07-05  7:53 ` Olaf Hering
2021-07-05  7:57 ` Jan Beulich
2021-07-05  8:02   ` Olaf Hering
2021-07-05  8:23     ` Jan Beulich
2021-07-05  8:32       ` Olaf Hering
2021-07-05  9:19         ` Jan Beulich
2021-07-05  9:25           ` Olaf Hering
2021-07-05  9:31             ` Jan Beulich
2021-07-05 10:06               ` Andrew Cooper
2021-07-05 10:36                 ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).