* question about migration
@ 2015-12-24  2:29 Wen Congyang
  2015-12-24 12:36 ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Wen Congyang @ 2015-12-24  2:29 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen devel

Hi Andrew Cooper:

I rebased the COLO code onto the newest upstream Xen and tested it. I found
a problem in the test, and I can reproduce this problem via a plain migration.

How to reproduce:
1. xl cr -p hvm_nopv
2. xl migrate hvm_nopv 192.168.3.1

The migration succeeds, but the VM doesn't run on the target machine.
You can get the reason from 'xl dmesg':
(XEN) HVM2 restore: VMCE_VCPU 1
(XEN) HVM2 restore: TSC_ADJUST 0
(XEN) HVM2 restore: TSC_ADJUST 1
(d2) HVM Loader
(d2) Detected Xen v4.7-unstable
(d2) Get guest memory maps[128] failed. (-38)
(d2) *** HVMLoader bug at e820.c:39
(d2) *** HVMLoader crashed.

The reason is that:
We don't call xc_domain_set_memory_map() on the target machine.
When we create an HVM domain, the call path is:
libxl__domain_build()
    libxl__build_hvm()
        libxl__arch_domain_construct_memmap()
            xc_domain_set_memory_map()

Should we migrate the guest memory map from the source machine to the target machine?
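
For illustration, a minimal dom0-side sketch of the call that is missing on
the target, assuming the usual libxc interface (the helper name and the
two-entry layout below are made up for the example, and the exact header
providing struct e820entry varies between Xen versions):

#include <stdint.h>
#include <xenctrl.h>

/* Hypothetical helper: hand a trivial two-entry E820 to Xen for 'domid'. */
static int set_example_memmap(xc_interface *xch, uint32_t domid,
                              uint64_t lowmem_bytes)
{
    struct e820entry map[2] = {
        /* RAM below the PCI hole (type 1 == E820_RAM). */
        { .addr = 0x0,        .size = lowmem_bytes, .type = 1 },
        /* A reserved region (type 2 == E820_RESERVED); placement illustrative. */
        { .addr = 0xfc000000, .size = 0x04000000,   .type = 2 },
    };

    return xc_domain_set_memory_map(xch, domid, map, 2);
}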

Thanks
Wen Congyang


* Re: question about migration
  2015-12-24  2:29 question about migration Wen Congyang
@ 2015-12-24 12:36 ` Andrew Cooper
  2015-12-25  0:55   ` Wen Congyang
  2015-12-25  1:45   ` Wen Congyang
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Cooper @ 2015-12-24 12:36 UTC (permalink / raw)
  To: Wen Congyang; +Cc: xen devel

On 24/12/15 02:29, Wen Congyang wrote:
> Hi Andrew Cooper:
>
> I rebase the COLO codes to the newest upstream xen, and test it. I found
> a problem in the test, and I can reproduce this problem via the migration.
>
> How to reproduce:
> 1. xl cr -p hvm_nopv
> 2. xl migrate hvm_nopv 192.168.3.1

You are the very first person to try a usecase like this.

It works as much as it does because of your changes to the uncooperative 
HVM domain logic.  I have said repeatedly during review, this is not 
necessarily a safe change to make without an in-depth analysis of the 
knock-on effects; it looks as if you have found the first knock-on effect.

>
> The migration successes, but the vm doesn't run in the target machine.
> You can get the reason from 'xl dmesg':
> (XEN) HVM2 restore: VMCE_VCPU 1
> (XEN) HVM2 restore: TSC_ADJUST 0
> (XEN) HVM2 restore: TSC_ADJUST 1
> (d2) HVM Loader
> (d2) Detected Xen v4.7-unstable
> (d2) Get guest memory maps[128] failed. (-38)
> (d2) *** HVMLoader bug at e820.c:39
> (d2) *** HVMLoader crashed.
>
> The reason is that:
> We don't call xc_domain_set_memory_map() in the target machine.
> When we create a hvm domain:
> libxl__domain_build()
>      libxl__build_hvm()
>          libxl__arch_domain_construct_memmap()
>              xc_domain_set_memory_map()
>
> Should we migrate the guest memory from source machine to target machine?

This bug specifically is because HVMLoader is expected to have run and 
turned the hypercall information into an E820 table in the guest before a 
migration occurs.
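
For context, -38 is -ENOSYS, which (as far as I recall) is what the
XENMEM_memory_map hypercall hands back when no E820 has been stored for the
domain.  A paraphrased sketch of the guest-side check, with hypothetical
helper names rather than the literal hvmloader source:

#include <stdint.h>
#include <stdio.h>

struct e820entry { uint64_t addr, size; uint32_t type; };

/* Hypothetical stand-ins for hvmloader internals (not the real names). */
extern int get_guest_memory_map(struct e820entry *map, int max); /* XENMEM_memory_map */
extern void hvmloader_bug(void);         /* prints "HVMLoader bug at ..." and stops */

static void build_guest_e820(void)
{
    struct e820entry map[128];
    int rc = get_guest_memory_map(map, 128);

    if ( rc < 0 )
    {
        /* -38 is -ENOSYS: Xen has no stored memory map for this domain. */
        printf("Get guest memory maps[128] failed. (%d)\n", rc);
        hvmloader_bug();
    }

    /* ...otherwise the entries become the E820 handed to the guest OS... */
}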

Unfortunately, the current codebase is riddled with such assumptions and 
expectations (e.g. the HVM save code assumes that FPU context is valid 
when it is saving register state), which is a direct side effect of how 
it was developed.


Having said all of the above, I agree that your example is a usecase 
which should work.  It is the ultimate test of whether the migration 
stream contains enough information to faithfully reproduce the domain on 
the far side.  Clearly at the moment, this is not the case.

I have an upcoming project to work on the domain memory layout logic, 
because it is unsuitable for a number of XenServer usecases. Part of 
that will require moving it in the migration stream.

~Andrew


* Re: question about migration
  2015-12-24 12:36 ` Andrew Cooper
@ 2015-12-25  0:55   ` Wen Congyang
  2015-12-29 10:57     ` Andrew Cooper
  2015-12-25  1:45   ` Wen Congyang
  1 sibling, 1 reply; 21+ messages in thread
From: Wen Congyang @ 2015-12-25  0:55 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen devel

On 12/24/2015 08:36 PM, Andrew Cooper wrote:
> On 24/12/15 02:29, Wen Congyang wrote:
>> Hi Andrew Cooper:
>>
>> I rebase the COLO codes to the newest upstream xen, and test it. I found
>> a problem in the test, and I can reproduce this problem via the migration.
>>
>> How to reproduce:
>> 1. xl cr -p hvm_nopv
>> 2. xl migrate hvm_nopv 192.168.3.1
> 
> You are the very first person to try a usecase like this.
> 
> It works as much as it does because of your changes to the uncooperative HVM domain logic.  I have said repeatedly during review, this is not necessarily a safe change to make without an in-depth analysis of the knock-on effects; it looks as if you have found the first knock-on effect.
> 
>>
>> The migration successes, but the vm doesn't run in the target machine.
>> You can get the reason from 'xl dmesg':
>> (XEN) HVM2 restore: VMCE_VCPU 1
>> (XEN) HVM2 restore: TSC_ADJUST 0
>> (XEN) HVM2 restore: TSC_ADJUST 1
>> (d2) HVM Loader
>> (d2) Detected Xen v4.7-unstable
>> (d2) Get guest memory maps[128] failed. (-38)
>> (d2) *** HVMLoader bug at e820.c:39
>> (d2) *** HVMLoader crashed.
>>
>> The reason is that:
>> We don't call xc_domain_set_memory_map() in the target machine.
>> When we create a hvm domain:
>> libxl__domain_build()
>>      libxl__build_hvm()
>>          libxl__arch_domain_construct_memmap()
>>              xc_domain_set_memory_map()
>>
>> Should we migrate the guest memory from source machine to target machine?
> 
> This bug specifically is because HVMLoader is expected to have run and turned the hypercall information in an E820 table in the guest before a migration occurs.
> 
> Unfortunately, the current codebase is riddled with such assumption and expectations (e.g. the HVM save code assumed that FPU context is valid when it is saving register state) which is a direct side effect of how it was developed.

Does the FPU context have a similar problem?
IIRC, I tested COLO before 4.6 was released, and it worked. In my test, I always
use the option '-p' to start the HVM guest.

> 
> 
> Having said all of the above, I agree that your example is a usecase which should work.  It is the ultimate test of whether the migration stream contains enough information to faithfully reproduce the domain on the far side.  Clearly at the moment, this is not the case.

I think it should work. But users don't use migration like this,
so it is not a serious problem.

Thanks
Wen Congyang

> 
> I have an upcoming project to work on the domain memory layout logic, because it is unsuitable for a number of XenServer usecases. Part of that will require moving it in the migration stream.
> 
> ~Andrew
> 


* Re: question about migration
  2015-12-24 12:36 ` Andrew Cooper
  2015-12-25  0:55   ` Wen Congyang
@ 2015-12-25  1:45   ` Wen Congyang
  2015-12-25  3:06     ` Wen Congyang
  2015-12-29 11:24     ` Andrew Cooper
  1 sibling, 2 replies; 21+ messages in thread
From: Wen Congyang @ 2015-12-25  1:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen devel

On 12/24/2015 08:36 PM, Andrew Cooper wrote:
> On 24/12/15 02:29, Wen Congyang wrote:
>> Hi Andrew Cooper:
>>
>> I rebase the COLO codes to the newest upstream xen, and test it. I found
>> a problem in the test, and I can reproduce this problem via the migration.
>>
>> How to reproduce:
>> 1. xl cr -p hvm_nopv
>> 2. xl migrate hvm_nopv 192.168.3.1
> 
> You are the very first person to try a usecase like this.
> 
> It works as much as it does because of your changes to the uncooperative HVM domain logic.  I have said repeatedly during review, this is not necessarily a safe change to make without an in-depth analysis of the knock-on effects; it looks as if you have found the first knock-on effect.
> 
>>
>> The migration successes, but the vm doesn't run in the target machine.
>> You can get the reason from 'xl dmesg':
>> (XEN) HVM2 restore: VMCE_VCPU 1
>> (XEN) HVM2 restore: TSC_ADJUST 0
>> (XEN) HVM2 restore: TSC_ADJUST 1
>> (d2) HVM Loader
>> (d2) Detected Xen v4.7-unstable
>> (d2) Get guest memory maps[128] failed. (-38)
>> (d2) *** HVMLoader bug at e820.c:39
>> (d2) *** HVMLoader crashed.
>>
>> The reason is that:
>> We don't call xc_domain_set_memory_map() in the target machine.
>> When we create a hvm domain:
>> libxl__domain_build()
>>      libxl__build_hvm()
>>          libxl__arch_domain_construct_memmap()
>>              xc_domain_set_memory_map()
>>
>> Should we migrate the guest memory from source machine to target machine?
> 
> This bug specifically is because HVMLoader is expected to have run and turned the hypercall information in an E820 table in the guest before a migration occurs.
> 
> Unfortunately, the current codebase is riddled with such assumption and expectations (e.g. the HVM save code assumed that FPU context is valid when it is saving register state) which is a direct side effect of how it was developed.
> 
> 
> Having said all of the above, I agree that your example is a usecase which should work.  It is the ultimate test of whether the migration stream contains enough information to faithfully reproduce the domain on the far side.  Clearly at the moment, this is not the case.
> 
> I have an upcoming project to work on the domain memory layout logic, because it is unsuitable for a number of XenServer usecases. Part of that will require moving it in the migration stream.

I found another migration problem in the test:
If the migration fails, we will resume it on the source side.
But the HVM guest doesn't respond any more.

In my test environment, the migration always succeeds, so I
use a hack to reproduce it:
1. modify the target xen tools:

diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 258dec4..da95606 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
         goto err;
     }
 
+    rc = ERROR_FAIL;
+
  err:
     check_all_finished(egc, stream, rc);
 
2. xl cr hvm_nopv, and wait some time (you can log in to the guest)
3. xl migrate hvm_nopv 192.168.3.1

The reason is that:
We create a default ioreq server when we get the HVM param HVM_PARAM_IOREQ_PFN.
It means that the problem occurs only when the migration fails after we get
the HVM param HVM_PARAM_IOREQ_PFN.

In the function hvm_select_ioreq_server():
If the I/O will be handled by a non-default ioreq server, we will return the
non-default ioreq server; in this case, it is handled by qemu.
If the I/O will not be handled by a non-default ioreq server, we will return
the default ioreq server. Before migration, we return NULL, and after migration
it is not NULL.
See the caller, hvmemul_do_io():
    case X86EMUL_UNHANDLEABLE:
    {
        struct hvm_ioreq_server *s =
            hvm_select_ioreq_server(curr->domain, &p);

        /* If there is no suitable backing DM, just ignore accesses */
        if ( !s )
        {
            rc = hvm_process_io_intercept(&null_handler, &p);
            vio->io_req.state = STATE_IOREQ_NONE;
        }
        else
        {
            rc = hvm_send_ioreq(s, &p, 0);
            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
                vio->io_req.state = STATE_IOREQ_NONE;
            else if ( data_is_addr )
                rc = X86EMUL_OKAY;
        }
        break;

We send the I/O request to the default ioreq server, but no backing
DM handles it. We will wait for the I/O forever...
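
To make the read side effect concrete, a hypothetical illustration (invented
type and helper names, not Xen's hvm code): reading the param lazily creates
the default server, so a failed migration attempt that merely read the param
leaves one behind on the source, and hvm_select_ioreq_server() then stops
returning NULL for unclaimed I/O:

#include <stdint.h>

/* Hypothetical per-domain state, for illustration only. */
struct dom_ioreq_state {
    void     *default_server;   /* NULL until something creates it */
    uint64_t  ioreq_pfn;
};

extern void *create_default_server(struct dom_ioreq_state *d);

static uint64_t get_param_ioreq_pfn(struct dom_ioreq_state *d)
{
    if ( d->default_server == NULL )                 /* the read side effect */
        d->default_server = create_default_server(d);

    return d->ioreq_pfn;
}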

Thanks
Wen Congyang

> 
> ~Andrew
> 


* Re: question about migration
  2015-12-25  1:45   ` Wen Congyang
@ 2015-12-25  3:06     ` Wen Congyang
  2015-12-29 12:46       ` Andrew Cooper
  2015-12-29 11:24     ` Andrew Cooper
  1 sibling, 1 reply; 21+ messages in thread
From: Wen Congyang @ 2015-12-25  3:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen devel

On 12/25/2015 09:45 AM, Wen Congyang wrote:
> On 12/24/2015 08:36 PM, Andrew Cooper wrote:
>> On 24/12/15 02:29, Wen Congyang wrote:
>>> Hi Andrew Cooper:
>>>
>>> I rebase the COLO codes to the newest upstream xen, and test it. I found
>>> a problem in the test, and I can reproduce this problem via the migration.
>>>
>>> How to reproduce:
>>> 1. xl cr -p hvm_nopv
>>> 2. xl migrate hvm_nopv 192.168.3.1
>>
>> You are the very first person to try a usecase like this.
>>
>> It works as much as it does because of your changes to the uncooperative HVM domain logic.  I have said repeatedly during review, this is not necessarily a safe change to make without an in-depth analysis of the knock-on effects; it looks as if you have found the first knock-on effect.
>>
>>>
>>> The migration successes, but the vm doesn't run in the target machine.
>>> You can get the reason from 'xl dmesg':
>>> (XEN) HVM2 restore: VMCE_VCPU 1
>>> (XEN) HVM2 restore: TSC_ADJUST 0
>>> (XEN) HVM2 restore: TSC_ADJUST 1
>>> (d2) HVM Loader
>>> (d2) Detected Xen v4.7-unstable
>>> (d2) Get guest memory maps[128] failed. (-38)
>>> (d2) *** HVMLoader bug at e820.c:39
>>> (d2) *** HVMLoader crashed.
>>>
>>> The reason is that:
>>> We don't call xc_domain_set_memory_map() in the target machine.
>>> When we create a hvm domain:
>>> libxl__domain_build()
>>>      libxl__build_hvm()
>>>          libxl__arch_domain_construct_memmap()
>>>              xc_domain_set_memory_map()
>>>
>>> Should we migrate the guest memory from source machine to target machine?
>>
>> This bug specifically is because HVMLoader is expected to have run and turned the hypercall information in an E820 table in the guest before a migration occurs.
>>
>> Unfortunately, the current codebase is riddled with such assumption and expectations (e.g. the HVM save code assumed that FPU context is valid when it is saving register state) which is a direct side effect of how it was developed.
>>
>>
>> Having said all of the above, I agree that your example is a usecase which should work.  It is the ultimate test of whether the migration stream contains enough information to faithfully reproduce the domain on the far side.  Clearly at the moment, this is not the case.
>>
>> I have an upcoming project to work on the domain memory layout logic, because it is unsuitable for a number of XenServer usecases. Part of that will require moving it in the migration stream.
> 
> I found another migration problem in the test:
> If the migration fails, we will resume it in the source side.
> But the hvm guest doesn't response any more.
> 
> In my test envirionment, the migration always successses, so I
> use a hack way to reproduce it:
> 1. modify the target xen tools:
> 
> diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
> index 258dec4..da95606 100644
> --- a/tools/libxl/libxl_stream_read.c
> +++ b/tools/libxl/libxl_stream_read.c
> @@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
>          goto err;
>      }
>  
> +    rc = ERROR_FAIL;
> +
>   err:
>      check_all_finished(egc, stream, rc);
>  
> 2. xl cr hvm_nopv, and wait some time(You can login to the guest)
> 3. xl migrate hvm_nopv 192.168.3.1

Another problem:
If migration fails after the guest is suspended, we will resume it on the source.
In this case, we cannot shut it down, because no process handles the shutdown event.
The log in /var/log/xen/xl-hvm_nopv.log:
Waiting for domain hvm_nopv (domid 1) to die [pid 5508]
Domain 1 has shut down, reason code 2 0x2
Domain has suspended.
Done. Exiting now

The xl has exited...

Thanks
Wen Congyang
> 
> The reason it that:
> We create a default ioreq server when we get the hvm param HVM_PARAM_IOREQ_PFN.
> It means that: the problem occurs only when the migration fails after we get
> the hvm param HVM_PARAM_IOREQ_PFN.
> 
> In the function hvm_select_ioreq_server()
> If the I/O will be handed by non-default ioreq server, we will return the
> non-default ioreq server. In this case, it is handed by qemu.
> If the I/O will not be handed by non-default ioreq server, we will return
> the default ioreq server. Before migration, we return NULL, and after migration
> it is not NULL. 
> See the caller is hvmemul_do_io():
>     case X86EMUL_UNHANDLEABLE:
>     {
>         struct hvm_ioreq_server *s =
>             hvm_select_ioreq_server(curr->domain, &p);
> 
>         /* If there is no suitable backing DM, just ignore accesses */
>         if ( !s )
>         {
>             rc = hvm_process_io_intercept(&null_handler, &p);
>             vio->io_req.state = STATE_IOREQ_NONE;
>         }
>         else
>         {
>             rc = hvm_send_ioreq(s, &p, 0);
>             if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
>                 vio->io_req.state = STATE_IOREQ_NONE;
>             else if ( data_is_addr )
>                 rc = X86EMUL_OKAY;
>         }
>         break;
> 
> We send the I/O request to the default I/O request server, but no backing
> DM hands it. We wil wait the I/O forever......
> 
> Thanks
> Wen Congyang
> 
>>
>> ~Andrew
>>


* Re: question about migration
  2015-12-25  0:55   ` Wen Congyang
@ 2015-12-29 10:57     ` Andrew Cooper
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Cooper @ 2015-12-29 10:57 UTC (permalink / raw)
  To: Wen Congyang; +Cc: xen devel



On 25/12/2015 00:55, Wen Congyang wrote:
> On 12/24/2015 08:36 PM, Andrew Cooper wrote:
>> On 24/12/15 02:29, Wen Congyang wrote:
>>> Hi Andrew Cooper:
>>>
>>> I rebase the COLO codes to the newest upstream xen, and test it. I found
>>> a problem in the test, and I can reproduce this problem via the migration.
>>>
>>> How to reproduce:
>>> 1. xl cr -p hvm_nopv
>>> 2. xl migrate hvm_nopv 192.168.3.1
>> You are the very first person to try a usecase like this.
>>
>> It works as much as it does because of your changes to the uncooperative HVM domain logic.  I have said repeatedly during review, this is not necessarily a safe change to make without an in-depth analysis of the knock-on effects; it looks as if you have found the first knock-on effect.
>>
>>> The migration successes, but the vm doesn't run in the target machine.
>>> You can get the reason from 'xl dmesg':
>>> (XEN) HVM2 restore: VMCE_VCPU 1
>>> (XEN) HVM2 restore: TSC_ADJUST 0
>>> (XEN) HVM2 restore: TSC_ADJUST 1
>>> (d2) HVM Loader
>>> (d2) Detected Xen v4.7-unstable
>>> (d2) Get guest memory maps[128] failed. (-38)
>>> (d2) *** HVMLoader bug at e820.c:39
>>> (d2) *** HVMLoader crashed.
>>>
>>> The reason is that:
>>> We don't call xc_domain_set_memory_map() in the target machine.
>>> When we create a hvm domain:
>>> libxl__domain_build()
>>>       libxl__build_hvm()
>>>           libxl__arch_domain_construct_memmap()
>>>               xc_domain_set_memory_map()
>>>
>>> Should we migrate the guest memory from source machine to target machine?
>> This bug specifically is because HVMLoader is expected to have run and turned the hypercall information in an E820 table in the guest before a migration occurs.
>>
>> Unfortunately, the current codebase is riddled with such assumption and expectations (e.g. the HVM save code assumed that FPU context is valid when it is saving register state) which is a direct side effect of how it was developed.
> Does FPU context have the similar problem?

Yes, although it is far harder to spot, and no software will likely 
crash as a result.

> IIRC, I have tested colo befroe 4.6 is released. It works. In my test, I always
> use the option '-p' to start the HVM guest.

If the FPU wasn't initialised, the save code memset()'s the x87 register 
block to 0.  On the restore side, this is taken and loaded back.

The problem is that a block of zeroes is valid for the x87, but is not the 
default which the vcpu would expect to observe, given that it has not reset 
the FPU itself.  However, the first thing any real software will do is reset 
the values properly.
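
For illustration only, the control-register values involved (architectural
defaults from the x86 manuals, not anything taken from Xen's code):

#include <stdint.h>

struct fpu_ctrl {
    uint16_t fcw;      /* x87 control word */
    uint16_t fsw;      /* x87 status word  */
    uint8_t  ftw;      /* abridged tag word (FXSAVE layout) */
    uint32_t mxcsr;
};

/* What a never-touched FPU looks like after reset/FNINIT... */
static const struct fpu_ctrl reset_default = { 0x037f, 0x0000, 0x00, 0x1f80 };

/* ...versus the all-zero block the save path emitted.  FCW/MXCSR of zero
 * unmask every exception, so the two images are both "valid" register
 * states but behave differently. */
static const struct fpu_ctrl zeroed_block  = { 0x0000, 0x0000, 0x00, 0x0000 };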

>
>>
>> Having said all of the above, I agree that your example is a usecase which should work.  It is the ultimate test of whether the migration stream contains enough information to faithfully reproduce the domain on the far side.  Clearly at the moment, this is not the case.
> I think it should work. But the user doesn't use the migration like this.
> So it is not a serious problem.

It is still worth identifying as an issue.

~Andrew


* Re: question about migration
  2015-12-25  1:45   ` Wen Congyang
  2015-12-25  3:06     ` Wen Congyang
@ 2015-12-29 11:24     ` Andrew Cooper
  2016-01-04 10:28       ` Paul Durrant
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2015-12-29 11:24 UTC (permalink / raw)
  To: Wen Congyang; +Cc: Paul Durrant, xen devel

On 25/12/2015 01:45, Wen Congyang wrote:
> On 12/24/2015 08:36 PM, Andrew Cooper wrote:
>> On 24/12/15 02:29, Wen Congyang wrote:
>>> Hi Andrew Cooper:
>>>
>>> I rebase the COLO codes to the newest upstream xen, and test it. I found
>>> a problem in the test, and I can reproduce this problem via the migration.
>>>
>>> How to reproduce:
>>> 1. xl cr -p hvm_nopv
>>> 2. xl migrate hvm_nopv 192.168.3.1
>> You are the very first person to try a usecase like this.
>>
>> It works as much as it does because of your changes to the uncooperative HVM domain logic.  I have said repeatedly during review, this is not necessarily a safe change to make without an in-depth analysis of the knock-on effects; it looks as if you have found the first knock-on effect.
>>
>>> The migration successes, but the vm doesn't run in the target machine.
>>> You can get the reason from 'xl dmesg':
>>> (XEN) HVM2 restore: VMCE_VCPU 1
>>> (XEN) HVM2 restore: TSC_ADJUST 0
>>> (XEN) HVM2 restore: TSC_ADJUST 1
>>> (d2) HVM Loader
>>> (d2) Detected Xen v4.7-unstable
>>> (d2) Get guest memory maps[128] failed. (-38)
>>> (d2) *** HVMLoader bug at e820.c:39
>>> (d2) *** HVMLoader crashed.
>>>
>>> The reason is that:
>>> We don't call xc_domain_set_memory_map() in the target machine.
>>> When we create a hvm domain:
>>> libxl__domain_build()
>>>       libxl__build_hvm()
>>>           libxl__arch_domain_construct_memmap()
>>>               xc_domain_set_memory_map()
>>>
>>> Should we migrate the guest memory from source machine to target machine?
>> This bug specifically is because HVMLoader is expected to have run and turned the hypercall information in an E820 table in the guest before a migration occurs.
>>
>> Unfortunately, the current codebase is riddled with such assumption and expectations (e.g. the HVM save code assumed that FPU context is valid when it is saving register state) which is a direct side effect of how it was developed.
>>
>>
>> Having said all of the above, I agree that your example is a usecase which should work.  It is the ultimate test of whether the migration stream contains enough information to faithfully reproduce the domain on the far side.  Clearly at the moment, this is not the case.
>>
>> I have an upcoming project to work on the domain memory layout logic, because it is unsuitable for a number of XenServer usecases. Part of that will require moving it in the migration stream.
> I found another migration problem in the test:
> If the migration fails, we will resume it in the source side.
> But the hvm guest doesn't response any more.
>
> In my test envirionment, the migration always successses, so I

"succeeds"

> use a hack way to reproduce it:
> 1. modify the target xen tools:
>
> diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
> index 258dec4..da95606 100644
> --- a/tools/libxl/libxl_stream_read.c
> +++ b/tools/libxl/libxl_stream_read.c
> @@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
>           goto err;
>       }
>   
> +    rc = ERROR_FAIL;
> +
>    err:
>       check_all_finished(egc, stream, rc);
>   
> 2. xl cr hvm_nopv, and wait some time(You can login to the guest)
> 3. xl migrate hvm_nopv 192.168.3.1
>
> The reason it that:
> We create a default ioreq server when we get the hvm param HVM_PARAM_IOREQ_PFN.
> It means that: the problem occurs only when the migration fails after we get
> the hvm param HVM_PARAM_IOREQ_PFN.
>
> In the function hvm_select_ioreq_server()
> If the I/O will be handed by non-default ioreq server, we will return the
> non-default ioreq server. In this case, it is handed by qemu.
> If the I/O will not be handed by non-default ioreq server, we will return
> the default ioreq server. Before migration, we return NULL, and after migration
> it is not NULL.
> See the caller is hvmemul_do_io():
>      case X86EMUL_UNHANDLEABLE:
>      {
>          struct hvm_ioreq_server *s =
>              hvm_select_ioreq_server(curr->domain, &p);
>
>          /* If there is no suitable backing DM, just ignore accesses */
>          if ( !s )
>          {
>              rc = hvm_process_io_intercept(&null_handler, &p);
>              vio->io_req.state = STATE_IOREQ_NONE;
>          }
>          else
>          {
>              rc = hvm_send_ioreq(s, &p, 0);
>              if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
>                  vio->io_req.state = STATE_IOREQ_NONE;
>              else if ( data_is_addr )
>                  rc = X86EMUL_OKAY;
>          }
>          break;
>
> We send the I/O request to the default I/O request server, but no backing
> DM hands it. We wil wait the I/O forever......

Hmm yes.  This needs fixing.

CC'ing Paul who did the ioreq server work.

This bug is caused by the read side effects of HVM_PARAM_IOREQ_PFN. The 
migration code needs a way of being able to query whether a default 
ioreq server exists, without creating one.

Can you remember what the justification for the read side effects was?  
ISTR that it was only for qemu compatibility until the ioreq server work 
got in upstream.  If that was the case, can we drop the read side 
effects now and mandate that all qemus explicitly create their ioreq 
servers (even if this involves creating a default ioreq server for 
qemu-trad)?

~Andrew


* Re: question about migration
  2015-12-25  3:06     ` Wen Congyang
@ 2015-12-29 12:46       ` Andrew Cooper
  2016-01-04 15:31         ` Ian Jackson
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2015-12-29 12:46 UTC (permalink / raw)
  To: Wen Congyang; +Cc: Wei Liu, Ian Jackson, Ian.Campbell, xen devel

On 25/12/2015 03:06, Wen Congyang wrote:
> On 12/25/2015 09:45 AM, Wen Congyang wrote:
>> On 12/24/2015 08:36 PM, Andrew Cooper wrote:
>>> On 24/12/15 02:29, Wen Congyang wrote:
>>>> Hi Andrew Cooper:
>>>>
>>>> I rebase the COLO codes to the newest upstream xen, and test it. I found
>>>> a problem in the test, and I can reproduce this problem via the migration.
>>>>
>>>> How to reproduce:
>>>> 1. xl cr -p hvm_nopv
>>>> 2. xl migrate hvm_nopv 192.168.3.1
>>> You are the very first person to try a usecase like this.
>>>
>>> It works as much as it does because of your changes to the uncooperative HVM domain logic.  I have said repeatedly during review, this is not necessarily a safe change to make without an in-depth analysis of the knock-on effects; it looks as if you have found the first knock-on effect.
>>>
>>>> The migration successes, but the vm doesn't run in the target machine.
>>>> You can get the reason from 'xl dmesg':
>>>> (XEN) HVM2 restore: VMCE_VCPU 1
>>>> (XEN) HVM2 restore: TSC_ADJUST 0
>>>> (XEN) HVM2 restore: TSC_ADJUST 1
>>>> (d2) HVM Loader
>>>> (d2) Detected Xen v4.7-unstable
>>>> (d2) Get guest memory maps[128] failed. (-38)
>>>> (d2) *** HVMLoader bug at e820.c:39
>>>> (d2) *** HVMLoader crashed.
>>>>
>>>> The reason is that:
>>>> We don't call xc_domain_set_memory_map() in the target machine.
>>>> When we create a hvm domain:
>>>> libxl__domain_build()
>>>>       libxl__build_hvm()
>>>>           libxl__arch_domain_construct_memmap()
>>>>               xc_domain_set_memory_map()
>>>>
>>>> Should we migrate the guest memory from source machine to target machine?
>>> This bug specifically is because HVMLoader is expected to have run and turned the hypercall information in an E820 table in the guest before a migration occurs.
>>>
>>> Unfortunately, the current codebase is riddled with such assumption and expectations (e.g. the HVM save code assumed that FPU context is valid when it is saving register state) which is a direct side effect of how it was developed.
>>>
>>>
>>> Having said all of the above, I agree that your example is a usecase which should work.  It is the ultimate test of whether the migration stream contains enough information to faithfully reproduce the domain on the far side.  Clearly at the moment, this is not the case.
>>>
>>> I have an upcoming project to work on the domain memory layout logic, because it is unsuitable for a number of XenServer usecases. Part of that will require moving it in the migration stream.
>> I found another migration problem in the test:
>> If the migration fails, we will resume it in the source side.
>> But the hvm guest doesn't response any more.
>>
>> In my test envirionment, the migration always successses, so I
>> use a hack way to reproduce it:
>> 1. modify the target xen tools:
>>
>> diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
>> index 258dec4..da95606 100644
>> --- a/tools/libxl/libxl_stream_read.c
>> +++ b/tools/libxl/libxl_stream_read.c
>> @@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
>>           goto err;
>>       }
>>   
>> +    rc = ERROR_FAIL;
>> +
>>    err:
>>       check_all_finished(egc, stream, rc);
>>   
>> 2. xl cr hvm_nopv, and wait some time(You can login to the guest)
>> 3. xl migrate hvm_nopv 192.168.3.1
> Another problem:
> If migration fails after the guest is suspended, we will resume it in the source.
> In this case, we cannot shutdown it. because no process hanlds the shutdown event.
> The log in /var/log/xen/xl-hvm_nopv.log:
> Waiting for domain hvm_nopv (domid 1) to die [pid 5508]
> Domain 1 has shut down, reason code 2 0x2
> Domain has suspended.
> Done. Exiting now
>
> The xl has exited...
>
> Thanks
> Wen Congyang

Hmm yes.  This is a libxl bug in libxl_evenable_domain_death(). CC'ing 
the toolstack maintainers.

It waits for the @releasedomain watch, but doesn't interpret the results 
correctly.  In particular, if it can still make successful hypercalls 
with the provided domid, that domain was not the subject of 
@releasedomain.  (I also observe that domain_death_xswatch_callback() is 
very inefficient.  It only needs to make a single hypercall, not query 
the entire state of all domains.)

~Andrew

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: question about migration
  2015-12-29 11:24     ` Andrew Cooper
@ 2016-01-04 10:28       ` Paul Durrant
  2016-01-04 10:36         ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Paul Durrant @ 2016-01-04 10:28 UTC (permalink / raw)
  To: Andrew Cooper, Wen Congyang; +Cc: xen devel

> -----Original Message-----
[snip]
> > We send the I/O request to the default I/O request server, but no backing
> > DM hands it. We wil wait the I/O forever......
> 
> Hmm yes.  This needs fixing.
> 
> CC'ing Paul who did the ioreq server work.
> 
> This bug is caused by the read side effects of HVM_PARAM_IOREQ_PFN. The
> migration code needs a way of being able to query whether a default
> ioreq server exists, without creating one.
> 
> Can you remember what the justification for the read side effects were?
> ISTR that it was only for qemu compatibility until the ioreq server work
> got in upstream.  If that was the case, can we drop the read side
> effects now and mandate that all qemus explicitly create their ioreq
> servers (even if this involves creating a default ioreq server for
> qemu-trad)?
> 

Yes, you're correct that the read side-effect was the only way of keeping compatibility with existing QEMU at the time. It would be nice to remove that hackery but it would indeed require a patch to qemu-trad and would clearly break compatibility with distro qemu packages built prior to my modification.
An alternative solution to avoid breaking compatibility (albeit a bit icky) would be to turn off the side effects when HVMOP_create_ioreq_server is handled. There should be no case where a non-default server needs to be created in advance of a default server.
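
A rough sketch of that alternative, assuming invented field and helper names
rather than a real Xen patch: remember that the new API has been used and
suppress the lazy creation from then on:

#include <stdbool.h>

struct dom_ioreq_state {
    bool  explicit_servers_used;   /* set once HVMOP_create_ioreq_server is seen */
    void *default_server;          /* lazily created for legacy qemu only */
};

extern void *create_default_server(struct dom_ioreq_state *d);

/* Called from the HVMOP_create_ioreq_server handler. */
static void note_explicit_server(struct dom_ioreq_state *d)
{
    d->explicit_servers_used = true;
}

/* Called from the HVM_PARAM_IOREQ_PFN get-param path. */
static void maybe_create_default_server(struct dom_ioreq_state *d)
{
    if ( d->default_server == NULL && !d->explicit_servers_used )
        d->default_server = create_default_server(d);
}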

  Paul

> ~Andrew


* Re: question about migration
  2016-01-04 10:28       ` Paul Durrant
@ 2016-01-04 10:36         ` Andrew Cooper
  2016-01-04 11:08           ` Paul Durrant
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2016-01-04 10:36 UTC (permalink / raw)
  To: Paul Durrant, Wen Congyang; +Cc: Wei Liu, Ian Jackson, Ian Campbell, xen devel

On 04/01/16 10:28, Paul Durrant wrote:
>> -----Original Message-----
> [snip]
>>> We send the I/O request to the default I/O request server, but no backing
>>> DM hands it. We wil wait the I/O forever......
>> Hmm yes.  This needs fixing.
>>
>> CC'ing Paul who did the ioreq server work.
>>
>> This bug is caused by the read side effects of HVM_PARAM_IOREQ_PFN. The
>> migration code needs a way of being able to query whether a default
>> ioreq server exists, without creating one.
>>
>> Can you remember what the justification for the read side effects were?
>> ISTR that it was only for qemu compatibility until the ioreq server work
>> got in upstream.  If that was the case, can we drop the read side
>> effects now and mandate that all qemus explicitly create their ioreq
>> servers (even if this involves creating a default ioreq server for
>> qemu-trad)?
>>
> Yes, you're correct that the read side-effect was the only way of keeping compatibility with existing QEMU at the time. It would be nice to remove that hackery but it would indeed require a patch to qemu-trad and would clearly break compatibility with distro qemu packages built prior to my modification.
> An alternative solution to avoid breaking compatibility (albeit a bit icky) would be to turn off the side effects when HVMOP_create_ioreq_server is handled. There should be no case where a non-default server needs to be created in advance of a default server.

I was under the impression that qemu-trad (like libxc) was expected to
exactly match.  There is deliberately no facility to use a distro
packaged qemu-trad in the Xen build system.  CC'ing toolstack
maintainers for their input.

If this is indeed the case, then the former option is the better course
of action.

~Andrew


* Re: question about migration
  2016-01-04 10:36         ` Andrew Cooper
@ 2016-01-04 11:08           ` Paul Durrant
  0 siblings, 0 replies; 21+ messages in thread
From: Paul Durrant @ 2016-01-04 11:08 UTC (permalink / raw)
  To: Andrew Cooper, Wen Congyang; +Cc: Ian Jackson, Wei Liu, Ian Campbell, xen devel

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 04 January 2016 10:36
> To: Paul Durrant; Wen Congyang
> Cc: xen devel; Ian Campbell; Ian Jackson; Wei Liu
> Subject: Re: [Xen-devel] question about migration
> 
> On 04/01/16 10:28, Paul Durrant wrote:
> >> -----Original Message-----
> > [snip]
> >>> We send the I/O request to the default I/O request server, but no
> backing
> >>> DM hands it. We wil wait the I/O forever......
> >> Hmm yes.  This needs fixing.
> >>
> >> CC'ing Paul who did the ioreq server work.
> >>
> >> This bug is caused by the read side effects of HVM_PARAM_IOREQ_PFN.
> The
> >> migration code needs a way of being able to query whether a default
> >> ioreq server exists, without creating one.
> >>
> >> Can you remember what the justification for the read side effects were?
> >> ISTR that it was only for qemu compatibility until the ioreq server work
> >> got in upstream.  If that was the case, can we drop the read side
> >> effects now and mandate that all qemus explicitly create their ioreq
> >> servers (even if this involves creating a default ioreq server for
> >> qemu-trad)?
> >>
> > Yes, you're correct that the read side-effect was the only way of keeping
> compatibility with existing QEMU at the time. It would be nice to remove that
> hackery but it would indeed require a patch to qemu-trad and would clearly
> break compatibility with distro qemu packages built prior to my modification.
> > An alternative solution to avoid breaking compatibility (albeit a bit icky)
> would be to turn off the side effects when HVMOP_create_ioreq_server is
> handled. There should be no case where a non-default server needs to be
> created in advance of a default server.
> 
> I was under the impression that qemu-trad (like libxc) was expected to
> exactly match.  There is deliberately no facility to use a distro
> packaged qemu-trad in the Xen build system.  CC'ing toolstack
> maintainers for their input.
> 

It wasn't clear, but I meant that compatibility with *upstream* QEMU builds prior to my patch would be broken. It depends on whether we care about this or not.

  Paul

> If this is indeed the case, then the former option is the better course
> of action.
> 
> ~Andrew


* Re: question about migration
  2015-12-29 12:46       ` Andrew Cooper
@ 2016-01-04 15:31         ` Ian Jackson
  2016-01-04 15:44           ` Ian Campbell
                             ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Ian Jackson @ 2016-01-04 15:31 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Ian.Campbell, Wen Congyang, xen devel

Andrew Cooper writes ("Re: [Xen-devel] question about migration"):
> On 25/12/2015 03:06, Wen Congyang wrote:
> > Another problem:
> > If migration fails after the guest is suspended, we will resume it in the source.
> > In this case, we cannot shutdown it. because no process hanlds the shutdown event.
> > The log in /var/log/xen/xl-hvm_nopv.log:
> > Waiting for domain hvm_nopv (domid 1) to die [pid 5508]
> > Domain 1 has shut down, reason code 2 0x2
> > Domain has suspended.
> > Done. Exiting now
> >
> > The xl has exited...
...
> Hmm yes.  This is a libxl bug in libxl_evenable_domain_death(). CC'ing 
> the toolstack maintainers.

AIUI this is a response to Wen's comments above.

> It waits for the @releasedomain watch, but doesn't interpret the results 
> correctly.  In particular, if it can still make successful hypercalls 
> with the provided domid, that domain was not the subject of 
> @releasedomain.  (I also observe that domain_death_xswatch_callback() is 
> very inefficient.  It only needs to make a single hypercall, not query 
> the entire state of all domains.)

I don't understand precisely what you allege this bug to be, but:

* libxl_evenable_domain_death may generate two events, a
  DOMAIN_SHUTDOWN and a DOMAIN_DEATH, or only one, a DOMAIN_DEATH.
  This is documented in libxl.h (although it refers to DESTROY rather
  than DEATH - see patch below to fix the doc).

* @releaseDomain usually triggers twice for each domain: once when it
  goes to SHUTDOWN and once when it is actually destroyed.  (This is
  obviously necessary to implement the above.)

* @releaseDomain does not have a specific domain which is the "subject
  of @releaseDomain".  Arguably this is unhelpful, but it is not
  libxl's fault.  It arises from the VIRQ generated by Xen.  Note that
  xenstored needs to search its own list of active domains to see what
  has happened; it generates the @releaseDomain event and throws away
  the domid.

* domain_death_xswatch_callback handles the reporting for all
  domains that this instance of libxl is interested in.

* That libxl searches its own smaller list of domains it is interested
  in is I think not a huge performance problem in a context where
  xenstored has to search all the domains on the system.  In the worst
  case where each libxl process is interested in one domain it makes
  O(number of domains) hypercalls requesting info for 200 domains
  each.  If this `200' is a problem a better heuristic would be quite
  easy (a minimal sketch of a single-domain query follows after this list).

* Wen is using xl.  The domain death events in xl are handled in
  xl_cmdimpl.c in the big while loop near the end of create_domain.
  The message in the log comes from this (in `create_domain'):

        case LIBXL_EVENT_TYPE_DOMAIN_SHUTDOWN:
            LOG("Domain %d has shut down, reason code %d 0x%x", domid,
                event->u.domain_shutdown.shutdown_reason,
                event->u.domain_shutdown.shutdown_reason);

  and then this (in `handle_domain_death')

        case LIBXL_SHUTDOWN_REASON_SUSPEND:
            LOG("Domain has suspended.");
            return 0;

  This is anomalous coding style because handle_domain_death should
  return one of the DOMAIN_RESTART values.  This anomaly was
  introduced in 1a0e1789 "xl: support on_{poweroff,reboot,crash}
  domain configuration options", which introduced ACTION_* but left
  the suspend case to `return 0'.  0 now means in this context
  DOMAIN_RESTART_NONE which means that the daemonic xl should simply
  exit.

  But this stylistic anomaly is of no functional import: even before
  that (eg in 0e89a4ce~), shutdown with the reason code suspend causes
  xl to exit.

  The daemonic xl exits in this situation because it expects the
  migration machinery to tidy up the domain.

* It is not possible to resume the domain in the source after it has
  suspended.  The domain is, in the hypervisor, in the lifecycle state
  SHUTDOWN.  For this reason in general, it is a bad idea to suspend
  the guest until the migration is known to be likely to succeed.

  After a late migration failure the domain should probably be treated
  as crashed.  This is slightly awkward to implement because the
  daemonic xl would have to hang around to see what happened next and
  perhaps be told what to do.  Or maybe the migrating xl would have to
  know what the on_crash configuration was.

  In the meantime the failure is rather less graceful: debris is
  simply left for the user.  This is not ideal but tolerable I think.
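
The single-domain query mentioned above might look roughly like this
(illustrative only; libxc field names can vary slightly between versions):

#include <stdbool.h>
#include <stdint.h>
#include <xenctrl.h>

/* Returns true if 'domid' still exists; *shut_down reports whether it has
 * shut down (e.g. suspended).  xc_domain_getinfo() returns the first domain
 * with an id >= the one asked for, so the returned domid must be checked. */
static bool query_one_domain(xc_interface *xch, uint32_t domid, bool *shut_down)
{
    xc_dominfo_t info;

    if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 || info.domid != domid )
        return false;                 /* the domain has been destroyed */

    *shut_down = !!info.shutdown;
    return true;
}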

Ian.


From 00c4fc6495078026d68f3fdd88bfd29a8ad483d7 Mon Sep 17 00:00:00 2001
From: Ian Jackson <ian.jackson@eu.citrix.com>
Date: Mon, 4 Jan 2016 15:13:14 +0000
Subject: [PATCH] libxl: Fix doc comment ref to DOMAIN_DEATH

The doc comment for libxl_evdisable_domain_death mistakenly referred
to DOMAIN_DESTROY but the event type name is actually DOMAIN_DEATH.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxl/libxl_event.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
index fad4c14..1ea789e 100644
--- a/tools/libxl/libxl_event.h
+++ b/tools/libxl/libxl_event.h
@@ -179,9 +179,9 @@ typedef struct libxl__evgen_domain_death libxl_evgen_domain_death;
 int libxl_evenable_domain_death(libxl_ctx *ctx, uint32_t domid,
                          libxl_ev_user, libxl_evgen_domain_death **evgen_out);
 void libxl_evdisable_domain_death(libxl_ctx *ctx, libxl_evgen_domain_death*);
-  /* Arranges for the generation of DOMAIN_SHUTDOWN and DOMAIN_DESTROY
+  /* Arranges for the generation of DOMAIN_SHUTDOWN and DOMAIN_DEATH
    * events.  A domain which is destroyed before it shuts down
-   * may generate only a DESTROY event.
+   * may generate only a DEATH event.
    */
 
 typedef struct libxl__evgen_disk_eject libxl_evgen_disk_eject;
-- 
1.7.10.4


* Re: question about migration
  2016-01-04 15:31         ` Ian Jackson
@ 2016-01-04 15:44           ` Ian Campbell
  2016-01-04 15:48           ` Ian Campbell
  2016-01-04 16:38           ` Andrew Cooper
  2 siblings, 0 replies; 21+ messages in thread
From: Ian Campbell @ 2016-01-04 15:44 UTC (permalink / raw)
  To: Ian Jackson, Andrew Cooper; +Cc: Wei Liu, Wen Congyang, xen devel

On Mon, 2016-01-04 at 15:31 +0000, Ian Jackson wrote:
> 
> From 00c4fc6495078026d68f3fdd88bfd29a8ad483d7 Mon Sep 17 00:00:00 2001
> From: Ian Jackson <ian.jackson@eu.citrix.com>
> Date: Mon, 4 Jan 2016 15:13:14 +0000
> Subject: [PATCH] libxl: Fix doc comment ref to DOMAIN_DEATH
> 
> The doc comment for libxl_evdisable_domain_death mistakenly referred
> to DOMAIN_DESTROY but the event type name is actually DOMAIN_DEATH.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

> ---
>  tools/libxl/libxl_event.h |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
> index fad4c14..1ea789e 100644
> --- a/tools/libxl/libxl_event.h
> +++ b/tools/libxl/libxl_event.h
> @@ -179,9 +179,9 @@ typedef struct libxl__evgen_domain_death
> libxl_evgen_domain_death;
>  int libxl_evenable_domain_death(libxl_ctx *ctx, uint32_t domid,
>                           libxl_ev_user, libxl_evgen_domain_death
> **evgen_out);
>  void libxl_evdisable_domain_death(libxl_ctx *ctx,
> libxl_evgen_domain_death*);
> -  /* Arranges for the generation of DOMAIN_SHUTDOWN and DOMAIN_DESTROY
> +  /* Arranges for the generation of DOMAIN_SHUTDOWN and DOMAIN_DEATH
>     * events.  A domain which is destroyed before it shuts down
> -   * may generate only a DESTROY event.
> +   * may generate only a DEATH event.
>     */
>  
>  typedef struct libxl__evgen_disk_eject libxl_evgen_disk_eject;



* Re: question about migration
  2016-01-04 15:31         ` Ian Jackson
  2016-01-04 15:44           ` Ian Campbell
@ 2016-01-04 15:48           ` Ian Campbell
  2016-01-04 16:38           ` Andrew Cooper
  2 siblings, 0 replies; 21+ messages in thread
From: Ian Campbell @ 2016-01-04 15:48 UTC (permalink / raw)
  To: Ian Jackson, Andrew Cooper; +Cc: Wei Liu, Wen Congyang, xen devel

On Mon, 2016-01-04 at 15:31 +0000, Ian Jackson wrote:
> [...]

>   The daemonic xl exits in this situation because it expects the
>   migration machinery to tidy up the domain.
> 
> * It is not possible to resume the domain in the source after it has
>   suspended.

Isn't it supposed to be possible?  Isn't that why SCHEDOP_suspend returns a
flag to indicate whether it was cancelled or not, so the guest can be resumed
on the source if "something" went wrong?  (And isn't such a resume the same as
what happens after a checkpoint?)

Unless you are referring not to the general case but to some more specific
scenario/window where this is no longer possible?

>   The domain is, in the hypervisor, in the lifecycle state
>   SHUTDOWN.  For this reason in general, it is a bad idea to suspend
>   the guest until the migration is known to be likely to succeed.
> 
>   After a late migration failure the domain should probably be treated
>   as crashed.  This is slightly awkward to implement because the
>   daemonic xl would have to hang around to see what happened next and
>   perhaps be told what to do.  Or maybe the migrating xl would have to
>   know what the on_crash configuration was.
> 
>   In the meantime the failure is rather less graceful: debris is
>   simply left for the user.  This is not ideal but tolerable I think.
> 
> Ian.
> 
> 
> From 00c4fc6495078026d68f3fdd88bfd29a8ad483d7 Mon Sep 17 00:00:00 2001
> From: Ian Jackson <ian.jackson@eu.citrix.com>
> Date: Mon, 4 Jan 2016 15:13:14 +0000
> Subject: [PATCH] libxl: Fix doc comment ref to DOMAIN_DEATH
> 
> The doc comment for libxl_evdisable_domain_death mistakenly referred
> to DOMAIN_DESTROY but the event type name is actually DOMAIN_DEATH.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  tools/libxl/libxl_event.h |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
> index fad4c14..1ea789e 100644
> --- a/tools/libxl/libxl_event.h
> +++ b/tools/libxl/libxl_event.h
> @@ -179,9 +179,9 @@ typedef struct libxl__evgen_domain_death
> libxl_evgen_domain_death;
>  int libxl_evenable_domain_death(libxl_ctx *ctx, uint32_t domid,
>                           libxl_ev_user, libxl_evgen_domain_death
> **evgen_out);
>  void libxl_evdisable_domain_death(libxl_ctx *ctx,
> libxl_evgen_domain_death*);
> -  /* Arranges for the generation of DOMAIN_SHUTDOWN and DOMAIN_DESTROY
> +  /* Arranges for the generation of DOMAIN_SHUTDOWN and DOMAIN_DEATH
>     * events.  A domain which is destroyed before it shuts down
> -   * may generate only a DESTROY event.
> +   * may generate only a DEATH event.
>     */
>  
>  typedef struct libxl__evgen_disk_eject libxl_evgen_disk_eject;



* Re: question about migration
  2016-01-04 15:31         ` Ian Jackson
  2016-01-04 15:44           ` Ian Campbell
  2016-01-04 15:48           ` Ian Campbell
@ 2016-01-04 16:38           ` Andrew Cooper
  2016-01-04 17:46             ` Ian Jackson
  2 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2016-01-04 16:38 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Wei Liu, Ian.Campbell, Wen Congyang, xen devel

On 04/01/16 15:31, Ian Jackson wrote:
> Andrew Cooper writes ("Re: [Xen-devel] question about migration"):
>> On 25/12/2015 03:06, Wen Congyang wrote:
>>> Another problem:
>>> If migration fails after the guest is suspended, we will resume it in the source.
>>> In this case, we cannot shutdown it. because no process hanlds the shutdown event.
>>> The log in /var/log/xen/xl-hvm_nopv.log:
>>> Waiting for domain hvm_nopv (domid 1) to die [pid 5508]
>>> Domain 1 has shut down, reason code 2 0x2
>>> Domain has suspended.
>>> Done. Exiting now
>>>
>>> The xl has exited...
> ...
>> Hmm yes.  This is a libxl bug in libxl_evenable_domain_death(). CC'ing 
>> the toolstack maintainers.
> AIUI this is a response to Wen's comments above.
>
>> It waits for the @releasedomain watch, but doesn't interpret the results 
>> correctly.  In particular, if it can still make successful hypercalls 
>> with the provided domid, that domain was not the subject of 
>> @releasedomain.  (I also observe that domain_death_xswatch_callback() is 
>> very inefficient.  It only needs to make a single hypercall, not query 
>> the entire state of all domains.)
> I don't understand precisely what you allege this bug to be, but:
>
> * libxl_evenable_domain_death may generate two events, a
>   DOMAIN_SHUTDOWN and a DOMAIN_DEATH, or only one, a DOMAIN_DEATH.
>   This is documented in libxl.h (although it refers to DESTROY rather
>   than DEATH - see patch below to fix the doc).
>
> * @releaseDomain usually triggers twice for each domain: once when it
>   goes to SHUTDOWN and once when it is actually destroyed.  (This is
>   obviously necessary to implement the above.)

So it does.  I clearly had an accident with `git grep` when I came to the
opposite conclusion.  Apologies for the noise generated from this.

>
> * @releaseDomain does not have a specific domain which is the "subject
>   of @releaseDomain".  Arguably this is unhelpful, but it is not
>   libxl's fault.  It arises from the VIRQ generated by Xen.  Note that
>   xenstored needs to search its own list of active domains to see what
>   has happened; it generates the @releaseDomain event and throws away
>   the domid.

The semantics of @releaseDomain are quite mad, but this is how it has
always been.

The current semantics are a scalability limitation which someone in
XenServer will likely get around to fixing in due course (we support 1000 VMs
per host).

> * It is not possible to resume the domain in the source after it has
>   suspended.

This functionality exists and is already used in several circumstances,
both by libxl, and other toolstacks.

xl has an added split-brain problem here that plain demonic toolstacks
don't have; specifically that there are two completely independent
processes playing with the domain state at the same time.

The daemonic xl needs to ignore DOMAIN_SHUTDOWN and tidy up only after
DOMAIN_DEATH.  Under these circumstances, a failed migrate which resumes
the domain won't result in qemu being cleaned up.
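
A sketch of that behaviour for the daemonic xl's event loop (not the current
xl code; only the libxl event types and fields already referenced in this
thread are used, and the surrounding loop is omitted):

#include <libxl.h>

enum xl_action { XL_KEEP_WAITING, XL_TIDY_UP };

static enum xl_action handle_domain_event(const libxl_event *event)
{
    switch (event->type) {
    case LIBXL_EVENT_TYPE_DOMAIN_SHUTDOWN:
        if (event->u.domain_shutdown.shutdown_reason ==
            LIBXL_SHUTDOWN_REASON_SUSPEND)
            return XL_KEEP_WAITING;   /* a failed migration may yet resume it */
        return XL_TIDY_UP;            /* normal shutdown/reboot/crash handling */
    case LIBXL_EVENT_TYPE_DOMAIN_DEATH:
        return XL_TIDY_UP;            /* really gone: clean up qemu etc. */
    default:
        return XL_KEEP_WAITING;
    }
}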

~Andrew


* Re: question about migration
  2016-01-04 16:38           ` Andrew Cooper
@ 2016-01-04 17:46             ` Ian Jackson
  2016-01-04 18:05               ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Jackson @ 2016-01-04 17:46 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Ian.Campbell, Wen Congyang, xen devel

Andrew Cooper writes ("Re: [Xen-devel] question about migration"):
> On 04/01/16 15:31, Ian Jackson wrote:
> > * It is not possible to resume the domain in the source after it has
> >   suspended.
> 
> This functionality exists and is already used in several circumstances,
> both by libxl, and other toolstacks.

Oh!

> xl has an added split-brain problem here that plain demonic toolstacks
> don't have; specifically that there are two completely independent
> processes playing with the domain state at the same time.
> 
> The daemonic xl needs to ignore DOMAIN_SHUTDOWN and tidy up only after
> DOMAIN_DEATH.  Under these circumstances, a failed migrate which resumes
> the domain won't result in qemu being cleaned up.

I think there is some kind of further underlying problem here.

Suppose a domain goes into SHUTDOWN with reason code SUSPEND.  Then
later it is resumed again (perhaps the migration failed).  And later
it shuts down cleanly.

Will this generate a second @releaseDomain ?  How is a toolstack
supposed to discover this situation ?

Should libxl simply treat domains in state SHUTDOWN/SUSPEND as if they
were running, and not issue a notification to anyone ?


> > * @releaseDomain does not have a specific domain which is the "subject
> >   of @releaseDomain".  Arguably this is unhelpful, but it is not
> >   libxl's fault.  It arises from the VIRQ generated by Xen.  Note that
> >   xenstored needs to search its own list of active domains to see what
> >   has happened; it generates the @releaseDomain event and throws away
> >   the domid.
> 
> The semantics of @releaseDomain are quite mad, but this is have it has
> always been.

Yes.

> The current semantics are a scalability limitation which someone in
> XenServer will likely get around to in due course (we support 1000 VMs
> per host).

Jolly good :-).


Ian.


* Re: question about migration
  2016-01-04 17:46             ` Ian Jackson
@ 2016-01-04 18:05               ` Andrew Cooper
  2016-01-05 15:40                 ` Ian Jackson
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2016-01-04 18:05 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Wei Liu, Ian.Campbell, Wen Congyang, xen devel

On 04/01/16 17:46, Ian Jackson wrote:
> Andrew Cooper writes ("Re: [Xen-devel] question about migration"):
>> On 04/01/16 15:31, Ian Jackson wrote:
>>> * It is not possible to resume the domain in the source after it has
>>>   suspended.
>> This functionality exists and is already used in several circumstances,
>> both by libxl, and other toolstacks.
> Oh!
>
>> xl has an added split-brain problem here that plain demonic toolstacks
>> don't have; specifically that there are two completely independent
>> processes playing with the domain state at the same time.
>>
>> The daemonic xl needs to ignore DOMAIN_SHUTDOWN and tidy up only after
>> DOMAIN_DEATH.  Under these circumstances, a failed migrate which resumes
>> the domain won't result in qemu being cleaned up.
> I think there is some kind of further underlying problem here.
>
> Suppose a domain goes into SHUTDOWN with reason code SUSPEND.  Then
> later it is resumed again (perhaps the migration failed).  And later
> it shuts down cleanly.
>
> Will this generate a second @releaseDomain ?  How is a toolstack
> supposed to discover this situation ?

By the looks of it in Xen, domain_resume() sets d->is_shut_down = 0,
which means a subsequent domain_shutdown() will generate a new VIRQ.

Therefore, a new @releaseDomain will arrive at a later point.

(Actually, there are 3 VIRQs on a standard shutdown path, but the final
two are very close together.  It is exceedingly likely that they are
amortised in the time it takes dom0 to respond to the evtchn for the
penultimate notification.)
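
To spell out the re-arming effect, a tiny standalone model of the behaviour
described above (deliberately not the Xen source; the struct and function
names are invented for illustration):

/* Toy model: is_shut_down gates the notification, resume clears it. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

struct dom_model {
    bool    is_shut_down;
    uint8_t shutdown_code;
};

/* Stands in for send_global_virq(VIRQ_DOM_EXC), which xenstored turns
 * into an @releaseDomain event. */
static void raise_dom_exc(void)
{
    printf("VIRQ_DOM_EXC -> @releaseDomain\n");
}

static void model_shutdown(struct dom_model *d, uint8_t reason)
{
    if (!d->is_shut_down) {
        d->is_shut_down = true;
        d->shutdown_code = reason;
        raise_dom_exc();
    }
}

static void model_resume(struct dom_model *d)
{
    d->is_shut_down = false;       /* re-arms the notification */
}

int main(void)
{
    struct dom_model d = { 0 };

    model_shutdown(&d, 2 /* suspend */);    /* first @releaseDomain  */
    model_resume(&d);                       /* e.g. failed migration */
    model_shutdown(&d, 0 /* poweroff */);   /* second @releaseDomain */
    return 0;
}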

>
> Should libxl simply treat domains in state SHUTDOWN/SUSPEND as if they
> were running, and not issue a notification to anyone ?

As far as libxl_evenable_domain_death() is concerned, yes.

In general however, no.  Consider a libxl user (such as a snapshotting
utility) which asks a domain to suspend to quiesce the disks and then
intends to resume the domain.

~Andrew

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: question about migration
  2016-01-04 18:05               ` Andrew Cooper
@ 2016-01-05 15:40                 ` Ian Jackson
  2016-01-05 17:39                   ` Andrew Cooper
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Jackson @ 2016-01-05 15:40 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Ian.Campbell, Wen Congyang, xen devel

Andrew Cooper writes ("Re: [Xen-devel] question about migration"):
> On 04/01/16 17:46, Ian Jackson wrote:
> > Suppose a domain goes into SHUTDOWN with reason code SUSPEND.  Then
> > later it is resumed again (perhaps the migration failed).  And later
> > it shuts down cleanly.
> >
> > Will this generate a second @releaseDomain ?  How is a toolstack
> > supposed to discover this situation ?
> 
> By the looks of it in Xen, domain_resume() sets d->is_shut_down = 0,
> which means a subsequent domain_shutdown() will generate a new VIRQ.
> 
> Therefore, a new @releaseDomain will arrive at a later point.

Well, jolly good.

> > Should libxl simply treat domains in state SHUTDOWN/SUSPEND as if they
> > were running, and not issue a notification to anyone ?
> 
> As far as libxl_evenable_domain_death() is concerned, yes.

Would you like to prepare a patch ?

> In general however, no.  Consider a libxl user (such as a snapshotting
> utility) which asks a domain to suspend to quiesce the disks and then
> intends to resume the domain.

Currently such a utility would have to use
libxl_evenable_domain_death to discover when the domain suspends, but
that would cause breakage as discussed in this thread.  So I think
such a utility would need a new libxl evenable interface.  Do you
agree ?

Ian.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: question about migration
  2016-01-05 15:40                 ` Ian Jackson
@ 2016-01-05 17:39                   ` Andrew Cooper
  2016-01-05 18:17                     ` Ian Jackson
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2016-01-05 17:39 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Wei Liu, Ian.Campbell, Wen Congyang, xen devel

On 05/01/16 15:40, Ian Jackson wrote:
> Andrew Cooper writes ("Re: [Xen-devel] question about migration"):
>> On 04/01/16 17:46, Ian Jackson wrote:
>>> Suppose a domain goes into SHUTDOWN with reason code SUSPEND.  Then
>>> later it is resumed again (perhaps the migration failed).  And later
>>> it shuts down cleanly.
>>>
>>> Will this generate a second @releaseDomain ?  How is a toolstack
>>> supposed to discover this situation ?
>> By the looks of it in Xen, domain_resume() sets d->is_shut_down = 0,
>> which means a subsequent domain_shutdown() will generate a new VIRQ.
>>
>> Therefore, a new @releaseDomain will arrive at a later point.
> Well, jolly good.
>
>>> Should libxl simply treat domains in state SHUTDOWN/SUSPEND as if they
>>> were running, and not issue a notification to anyone ?
>> As far as libxl_evenable_domain_death() is concerned, yes.
> Would you like to prepare a patch ?

I don't have a repro of the issue.  This thread was merely me triaging
an issue reported by Wen, given some `git grep`.  (There are actually 3
different bugs on different subthreads of this thread.)

>
>> In general however, no.  Consider a libxl user (such as a snapshotting
>> utility) which asks a domain to suspend to quiesce the disks and then
>> intends to resume the domain.
> Currently such a utility would have to use
> libxl_evenable_domain_death to discover when the domain suspends, but
> that would cause breakage as discussed in this thread.  So I think
> such a utility would need a new libxl evenable interface.  Do you
> agree ?

This looks like a yak.

Changing the behaviour of libxl_evenable_domain_death() will break the
API and also break in-tree users.  It also doesn't return the shutdown
code, so the caller can't loop while shutdown && suspend.

I can't think of a solution which doesn't involve making a brand new
function written from scratch.

~Andrew

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: question about migration
  2016-01-05 17:39                   ` Andrew Cooper
@ 2016-01-05 18:17                     ` Ian Jackson
  2016-01-06 10:21                       ` Ian Campbell
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Jackson @ 2016-01-05 18:17 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Ian.Campbell, Wen Congyang, xen devel

Andrew Cooper writes ("Re: [Xen-devel] question about migration"):
> On 05/01/16 15:40, Ian Jackson wrote:
> > Would you like to prepare a patch ?
> 
> I don't have a repro of the issue.  This thread was merely me triaging
> an issue reported by Wen, given some `git grep`.  (There are actually 3
> different bugs on different subthreads of this thread.)

I will do so then.

> >> In general however, no.  Consider a libxl user (such as a snapshotting
> >> utility) which asks a domain to suspend to quiesce the disks and then
> >> intends to resume the domain.
> >
> > Currently such a utility would have to use
> > libxl_evenable_domain_death to discover when the domain suspends, but
> > that would cause breakage as discussed in this thread.  So I think
> > such a utility would need a new libxl evenable interface.  Do you
> > agree ?
> 
> This looks like a yak.

It's a yak that someone would have to shave when they wanted to write
such a utility.

> Changing the behaviour of libxl_evenable_domain_death() will break the
> API and also break in-guest users.  It also doesn't return the shutdown
> code, so the caller can't loop while shutdown && suspend.

I don't understand what you mean.

Currently libxl_evenable_domain_death reports DOMAIN_SHUTDOWN and
provides the reason code.  But it only does so once, so when a domain
is migrated it triggers during the suspend-for-migrate and then
generates no further SHUTDOWN event.

So any code trying to use this for your snapshotting case is already
broken and cannot be fixed without introducing a new API (probably one
which generates separate suspend/unsuspend events).

(We can't have libxl_evenable_domain_death generate new unsuspend
events because existing callers won't understand.)
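
To illustrate the current semantics concretely, a sketch of a caller of the
existing event API (error handling trimmed; the loop structure is
illustrative, the libxl calls themselves are the real ones):

/* Sketch: what a caller of the existing API observes today.  The
 * DOMAIN_SHUTDOWN event fires once, including for reason SUSPEND, and
 * does not fire again if the domain is resumed and later shuts down
 * for real. */
#include <stdio.h>
#include <libxl.h>
#include <libxl_event.h>

static void watch_domain(libxl_ctx *ctx, uint32_t domid)
{
    libxl_evgen_domain_death *deathw = NULL;
    libxl_event *event;

    if (libxl_evenable_domain_death(ctx, domid, 0, &deathw))
        return;

    for (;;) {
        if (libxl_event_wait(ctx, &event, LIBXL_EVENTMASK_ALL, NULL, NULL))
            break;

        if (event->type == LIBXL_EVENT_TYPE_DOMAIN_SHUTDOWN) {
            /* Reason SUSPEND lands here too; treating it as "the domain
             * has shut down" is exactly the breakage discussed above. */
            printf("shutdown, reason %u\n",
                   (unsigned)event->u.domain_shutdown.shutdown_reason);
        } else if (event->type == LIBXL_EVENT_TYPE_DOMAIN_DEATH) {
            printf("domain destroyed\n");
            libxl_event_free(ctx, event);
            break;
        }
        libxl_event_free(ctx, event);
    }

    libxl_evdisable_domain_death(ctx, deathw);
}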


I therefore propose that:

libxl_evenable_domain_death should never generate DOMAIN_SHUTDOWN with
reason code suspended.  Instead, it should hope that the domain will
go back to running.  If the domain ends up shut down for some other
reason, or is destroyed, it should report those things.

In the future we introduce libxl_evenable_domain_state which generates
the existing events from libxl_evenable_domain_death but also two new
events DOMAIN_SUSPENDED or DOMAIN_RUNNING.
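
Purely to make the shape of such an interface concrete -- nothing below
exists, the names merely follow the proposal, and the state_mask parameter
is just one possible way to keep it extensible:

/* Hypothetical sketch only: no part of this is in libxl today. */
#include <stdint.h>
#include <libxl.h>   /* libxl_ctx, libxl_ev_user */

typedef struct hypothetical_evgen_domain_state hypothetical_evgen_domain_state;

/* Event kinds the new generator might report, in addition to the
 * existing DOMAIN_SHUTDOWN / DOMAIN_DEATH ones. */
enum {
    HYPOTHETICAL_DOMAIN_SUSPENDED,   /* SHUTDOWN with reason SUSPEND */
    HYPOTHETICAL_DOMAIN_RUNNING,     /* resumed after such a suspend */
};

/*
 * state_mask says which states the caller cares about, so further states
 * can be added later; callers would also be documented to ignore event
 * types they do not recognise.
 */
int hypothetical_evenable_domain_state(libxl_ctx *ctx, uint32_t domid,
                                       uint64_t state_mask,
                                       libxl_ev_user user,
                                       hypothetical_evgen_domain_state **out);

void hypothetical_evdisable_domain_state(libxl_ctx *ctx,
                                         hypothetical_evgen_domain_state *gen);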

Ian.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: question about migration
  2016-01-05 18:17                     ` Ian Jackson
@ 2016-01-06 10:21                       ` Ian Campbell
  0 siblings, 0 replies; 21+ messages in thread
From: Ian Campbell @ 2016-01-06 10:21 UTC (permalink / raw)
  To: Ian Jackson, Andrew Cooper; +Cc: Wei Liu, Wen Congyang, xen devel

On Tue, 2016-01-05 at 18:17 +0000, Ian Jackson wrote:

> So any code trying to use this for your snapshotting case is already
> broken and cannot be fixed without introducing a new API (probably one
> which generates separate suspend/unsuspend events).

Remus does seem to work, at least to the extent that it does not appear
to be hitting this case? Or at least I assume so, since people have
reported various cases as working. Or maybe no one ever did "xl shutdown"
after checkpointing, so this went unnoticed.

I'm trying to decide if this is "it's completely knackered" or "fails in
some corner cases".

> (We can't have libxl_evenable_domain_death generate new unsuspend
> events because existing callers won't understand.)

Can we make it so that the new API can be extended in this way, even if we
just document "ignore unknown events"?

> I therefore propose that:
> 
> libxl_evenable_domain_death should never generate DOMAIN_SHUTDOWN with
> reason code suspended.

FWIW libvirt ignores these (and as we know xl incorrectly exits). I guess
we think it unlikely that anything could be relying on the current
behaviour?

OOI would calling libxl_evdisable_domain_death+libxl_evenable_domain_death
reset the one-shot nature of this event?
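
(For reference, the pattern being asked about would be something like the
sketch below; whether it actually resets the one-shot behaviour is
precisely the open question:)

/* Sketch of the disable-then-re-enable pattern asked about above. */
#include <libxl.h>

static int rearm_domain_death(libxl_ctx *ctx, uint32_t domid,
                              libxl_evgen_domain_death **deathw)
{
    if (*deathw) {
        libxl_evdisable_domain_death(ctx, *deathw);
        *deathw = NULL;
    }
    /* Re-enable; does this deliver a fresh DOMAIN_SHUTDOWN next time? */
    return libxl_evenable_domain_death(ctx, domid, 0, deathw);
}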

>   Instead, it should hope that the domain will
> go back to running.  If the domain ends up shut down for some other
> reason, or is destroyed, it should report those things.
> 
> In the future we introduce libxl_evenable_domain_state which generates
> the existing events from libxl_evenable_domain_death but also two new
> events DOMAIN_SUSPENDED or DOMAIN_RUNNING.

I infer that this new function will take a mask of domain states which the
caller is interested in (and hence is extensible)?

Ian.


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread

Thread overview: 21+ messages
2015-12-24  2:29 question about migration Wen Congyang
2015-12-24 12:36 ` Andrew Cooper
2015-12-25  0:55   ` Wen Congyang
2015-12-29 10:57     ` Andrew Cooper
2015-12-25  1:45   ` Wen Congyang
2015-12-25  3:06     ` Wen Congyang
2015-12-29 12:46       ` Andrew Cooper
2016-01-04 15:31         ` Ian Jackson
2016-01-04 15:44           ` Ian Campbell
2016-01-04 15:48           ` Ian Campbell
2016-01-04 16:38           ` Andrew Cooper
2016-01-04 17:46             ` Ian Jackson
2016-01-04 18:05               ` Andrew Cooper
2016-01-05 15:40                 ` Ian Jackson
2016-01-05 17:39                   ` Andrew Cooper
2016-01-05 18:17                     ` Ian Jackson
2016-01-06 10:21                       ` Ian Campbell
2015-12-29 11:24     ` Andrew Cooper
2016-01-04 10:28       ` Paul Durrant
2016-01-04 10:36         ` Andrew Cooper
2016-01-04 11:08           ` Paul Durrant
