xen-devel.lists.xenproject.org archive mirror
* xl migrate regression in staging
@ 2016-04-14 13:03 Olaf Hering
  2016-04-14 13:08 ` Andrew Cooper
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Olaf Hering @ 2016-04-14 13:03 UTC (permalink / raw)
  To: xen-devel

Migration from staging-4.5.3f802a5 to staging-4-7.3dac42f fails with a HVM guest:


root@anonymi:~ # xl migrate domU host
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x1/0x0/2677)
xc: error: error polling suspend notification channel: -1: Internal error
Loading new save file <incoming migration stream> (new xl fmt info 0x1/0x0/2677)
 Savefile contains xl domain config in JSON format
Parsing config from <saved>
libxl: error: libxl_stream_read.c:327:stream_header_done: Invalid ident: expected 0x4c6962786c466d74, got 0x01f00f0000000000
libxl: error: libxl_utils.c:430:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 1 save/restore helper stdout pipe
libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: domain 1 save/restore helper [-1] died due to fatal signal Broken pipe
libxl: error: libxl_dom.c:2036:remus_teardown_done: Remus: failed to teardown device for guest with domid 1, rc -3
migration sender: libxl_domain_suspend failed (rc=-3)
libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration transport process [2000] exited with error status 255
Migration failed, resuming at sender.
xc: error: Dom 1 not suspended: (shutdown 0, reason 255): Internal error
libxl: error: libxl.c:515:libxl__domain_resume: xc_domain_resume failed for domain 1: Invalid argument


I think this regression was introduced in the last 3-4 weeks.

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: xl migrate regression in staging
  2016-04-14 13:03 xl migrate regression in staging Olaf Hering
@ 2016-04-14 13:08 ` Andrew Cooper
  2016-04-14 13:52   ` Olaf Hering
  2016-04-14 13:32 ` Olaf Hering
  2016-04-14 15:31 ` Wei Liu
  2 siblings, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2016-04-14 13:08 UTC (permalink / raw)
  To: Olaf Hering, xen-devel

On 14/04/16 14:03, Olaf Hering wrote:
> Migration from staging-4.5.3f802a5 to staging-4-7.3dac42f fails with a HVM guest:
>
>
> root@anonymi:~ # xl migrate domU host
> migration target: Ready to receive domain.
> Saving to migration stream new xl format (info 0x1/0x0/2677)
> xc: error: error polling suspend notification channel: -1: Internal error
> Loading new save file <incoming migration stream> (new xl fmt info 0x1/0x0/2677)
>  Savefile contains xl domain config in JSON format
> Parsing config from <saved>
> libxl: error: libxl_stream_read.c:327:stream_header_done: Invalid ident: expected 0x4c6962786c466d74, got 0x01f00f0000000000
> libxl: error: libxl_utils.c:430:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 1 save/restore helper stdout pipe
> libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: domain 1 save/restore helper [-1] died due to fatal signal Broken pipe
> libxl: error: libxl_dom.c:2036:remus_teardown_done: Remus: failed to teardown device for guest with domid 1, rc -3
> migration sender: libxl_domain_suspend failed (rc=-3)
> libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration transport process [2000] exited with error status 255
> Migration failed, resuming at sender.
> xc: error: Dom 1 not suspended: (shutdown 0, reason 255): Internal error
> libxl: error: libxl.c:515:libxl__domain_resume: xc_domain_resume failed for domain 1: Invalid argument
>
>
> I think this regression was introduced in the last 3-4 weeks.

Over that time, the COLO support has been added to libxl, which seems
like a likely candidate.

This looks like it isn't kicking off the legacy conversion script on the
destination side.

Can you do an `xl save` to a file on the source side, and on the
destination side manually invoke convert-legacy-stream and
verify-stream-v2 (both can be run by hand on the command line as well
as being invoked automatically)?  This will identify whether it is a
"content of the stream" error or a plumbing error.

~Andrew


* Re: xl migrate regression in staging
  2016-04-14 13:03 xl migrate regression in staging Olaf Hering
  2016-04-14 13:08 ` Andrew Cooper
@ 2016-04-14 13:32 ` Olaf Hering
  2016-04-14 15:31 ` Wei Liu
  2 siblings, 0 replies; 13+ messages in thread
From: Olaf Hering @ 2016-04-14 13:32 UTC (permalink / raw)
  To: xen-devel

On Thu, Apr 14, Olaf Hering wrote:

> Migration from staging-4.5.3f802a5 to staging-4-7.3dac42f fails with a HVM guest:
> root@anonymi:~ # xl migrate domU host

Related to that:

The domU--incoming on "host" is not destroyed. Which part of 'xl
migrate' would be responsible for doing the 'xl destroy domU--incoming'?

Olaf


* Re: xl migrate regression in staging
  2016-04-14 13:08 ` Andrew Cooper
@ 2016-04-14 13:52   ` Olaf Hering
  2016-04-14 13:54     ` Olaf Hering
  2016-04-14 14:04     ` Andrew Cooper
  0 siblings, 2 replies; 13+ messages in thread
From: Olaf Hering @ 2016-04-14 13:52 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Thu, Apr 14, Andrew Cooper wrote:

> Can you do an `xl save` to file on the source side, and on the
> destination side manually invoke convert-legacy-stream and
> verify-stream-v2 (both of which are also specifically usable on the
> command line as well as automatically) ?  This will identify if it is a
> "content of the stream" error, or a plumbing error.

root@macintyre-old:~ # xl restore -V /share/save.img
Loading new save file /share/save.img (new xl fmt info 0x1/0x0/2677)
 Savefile contains xl domain config in JSON format
Parsing config from <saved>
libxl: error: libxl_stream_read.c:327:stream_header_done: Invalid ident: expected 0x4c6962786c466d74, got 0x01f00f0000000000
Segmentation fault (core dumped)
root@macintyre-old:~ # Stream Error:
Traceback (most recent call last):
  File "/usr/lib/xen/bin/convert-legacy-stream", line 582, in read_legacy_stream
    write_libxl_hdr()
  File "/usr/lib/xen/bin/convert-legacy-stream", line 101, in write_libxl_hdr
    libxl.HDR_OPT_LEGACY # Little Endian and Legacy
  File "/usr/lib/xen/bin/convert-legacy-stream", line 33, in stream_write
    return fout.write(_)
IOError: [Errno 32] Broken pipe


Migration from staging-4.6 works at first glance, but the domU crashes as
soon as it is accessed in the VNC window. I haven't checked yet what
happens to it. I will send another mail with details.

Olaf


* Re: xl migrate regression in staging
  2016-04-14 13:52   ` Olaf Hering
@ 2016-04-14 13:54     ` Olaf Hering
  2016-04-14 14:04     ` Andrew Cooper
  1 sibling, 0 replies; 13+ messages in thread
From: Olaf Hering @ 2016-04-14 13:54 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Thu, Apr 14, Olaf Hering wrote:

> On Thu, Apr 14, Andrew Cooper wrote:
> 
> > Can you do an `xl save` to file on the source side, and on the
> > destination side manually invoke convert-legacy-stream and
> > verify-stream-v2 (both of which are also specifically usable on the
> > command line as well as automatically) ?  This will identify if it is a
> > "content of the stream" error, or a plumbing error.
> 
> root@macintyre-old:~ # xl restore -V /share/save.img
> Loading new save file /share/save.img (new xl fmt info 0x1/0x0/2677)
>  Savefile contains xl domain config in JSON format
> Parsing config from <saved>
> libxl: error: libxl_stream_read.c:327:stream_header_done: Invalid ident: expected 0x4c6962786c466d74, got 0x01f00f0000000000
> Segmentation fault (core dumped)

And also this leaves a 'domU' running in paused state.

Olaf


* Re: xl migrate regression in staging
  2016-04-14 13:52   ` Olaf Hering
  2016-04-14 13:54     ` Olaf Hering
@ 2016-04-14 14:04     ` Andrew Cooper
  2016-04-14 14:15       ` Olaf Hering
  1 sibling, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2016-04-14 14:04 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 14/04/16 14:52, Olaf Hering wrote:
> On Thu, Apr 14, Andrew Cooper wrote:
>
>> Can you do an `xl save` to file on the source side, and on the
>> destination side manually invoke convert-legacy-stream and
>> verify-stream-v2 (both of which are also specifically usable on the
>> command line as well as automatically) ?  This will identify if it is a
>> "content of the stream" error, or a plumbing error.
> root@macintyre-old:~ # xl restore -V /share/save.img
> Loading new save file /share/save.img (new xl fmt info 0x1/0x0/2677)
>  Savefile contains xl domain config in JSON format
> Parsing config from <saved>
> libxl: error: libxl_stream_read.c:327:stream_header_done: Invalid ident: expected 0x4c6962786c466d74, got 0x01f00f0000000000
> Segmentation fault (core dumped)
> root@macintyre-old:~ # Stream Error:
> Traceback (most recent call last):
>   File "/usr/lib/xen/bin/convert-legacy-stream", line 582, in read_legacy_stream
>     write_libxl_hdr()
>   File "/usr/lib/xen/bin/convert-legacy-stream", line 101, in write_libxl_hdr
>     libxl.HDR_OPT_LEGACY # Little Endian and Legacy
>   File "/usr/lib/xen/bin/convert-legacy-stream", line 33, in stream_write
>     return fout.write(_)
> IOError: [Errno 32] Broken pipe

Ok - so the conversion script was running.  I still can't explain "got
0x01f00f0000000000" via any of the failure modes I encountered while
working on migration v2 in the first place.

How large is save.img?

~Andrew


* Re: xl migrate regression in staging
  2016-04-14 14:04     ` Andrew Cooper
@ 2016-04-14 14:15       ` Olaf Hering
  2016-04-14 15:00         ` Andrew Cooper
  0 siblings, 1 reply; 13+ messages in thread
From: Olaf Hering @ 2016-04-14 14:15 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Thu, Apr 14, Andrew Cooper wrote:

> How large is save.img?

domU.cfg has memory=1024, save.img has 1076188625.
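As a rough sanity check on those numbers, here is a minimal Python sketch. It assumes memory=1024 means 1024 MiB, as in xl domain configs, and treats everything beyond the guest RAM as stream overhead; the variable names are illustrative only.

```python
# Compare the configured guest RAM with the size of the save image.
MIB = 1024 * 1024

configured_mib = 1024        # memory=1024 from domU.cfg (assumed to be MiB)
image_bytes = 1076188625     # size of save.img as reported above

guest_ram_bytes = configured_mib * MIB
overhead_bytes = image_bytes - guest_ram_bytes

print(f"guest RAM : {guest_ram_bytes} bytes")
print(f"save.img  : {image_bytes} bytes")
print(f"overhead  : {overhead_bytes} bytes ({overhead_bytes / MIB:.2f} MiB)")
```

The image is slightly larger than the configured RAM, so it is at least big enough to hold every guest page. That does not show whether the stream contents are laid out correctly.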

Olaf


* Re: xl migrate regression in staging
  2016-04-14 14:15       ` Olaf Hering
@ 2016-04-14 15:00         ` Andrew Cooper
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Cooper @ 2016-04-14 15:00 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 14/04/16 15:15, Olaf Hering wrote:
> On Thu, Apr 14, Andrew Cooper wrote:
>
>> How large is save.img?
> domU.cfg has memory=1024, save.img has 1076188625.
>
> Olaf

Can you repro with a smaller domain?  Does it compress well?  Can you
post it somewhere I can get my hands on it?

~Andrew


* Re: xl migrate regression in staging
  2016-04-14 13:03 xl migrate regression in staging Olaf Hering
  2016-04-14 13:08 ` Andrew Cooper
  2016-04-14 13:32 ` Olaf Hering
@ 2016-04-14 15:31 ` Wei Liu
  2016-04-14 15:36   ` Olaf Hering
  2 siblings, 1 reply; 13+ messages in thread
From: Wei Liu @ 2016-04-14 15:31 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Wei Liu, xen-devel

On Thu, Apr 14, 2016 at 03:03:01PM +0200, Olaf Hering wrote:
> Migration from staging-4.5.3f802a5 to staging-4-7.3dac42f fails with a HVM guest:
> 
> 
> root@anonymi:~ # xl migrate domU host
> migration target: Ready to receive domain.
> Saving to migration stream new xl format (info 0x1/0x0/2677)
> xc: error: error polling suspend notification channel: -1: Internal error

Is there anything in xl dmesg regarding this?

> Loading new save file <incoming migration stream> (new xl fmt info 0x1/0x0/2677)
>  Savefile contains xl domain config in JSON format
> Parsing config from <saved>
> libxl: error: libxl_stream_read.c:327:stream_header_done: Invalid ident: expected 0x4c6962786c466d74, got 0x01f00f0000000000

This means libxl expected a libxl stream header (0x4c... is the header
signature) but got something else.
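For reference, the expected ident is simply eight ASCII characters. A minimal Python sketch decodes both values from the error message above; the helper name `describe_ident` is made up for illustration and is not part of libxl.

```python
def describe_ident(value: int) -> str:
    """Render a 64-bit ident as ASCII text if possible, else as hex bytes."""
    # Lay the value out as 8 bytes, most significant first, which matches
    # the order of the hex digits printed in the log line.
    raw = value.to_bytes(8, byteorder="big")
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        text = ""
    if text and text.isprintable():
        return text
    return " ".join(f"{b:02x}" for b in raw)


expected = 0x4c6962786c466d74
got = 0x01f00f0000000000

print("expected:", describe_ident(expected))
print("got     :", describe_ident(got))
```

The expected value reads as "LibxlFmt", while the received bytes are not printable text at all, so the reader was most likely looking at a different part of the stream rather than at a slightly corrupted header.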

> libxl: error: libxl_utils.c:430:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 1 save/restore helper stdout pipe
> libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: domain 1 save/restore helper [-1] died due to fatal signal Broken pipe
> libxl: error: libxl_dom.c:2036:remus_teardown_done: Remus: failed to teardown device for guest with domid 1, rc -3

I am quite confused as to why Remus is involved.

> migration sender: libxl_domain_suspend failed (rc=-3)
> libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration transport process [2000] exited with error status 255
> Migration failed, resuming at sender.
> xc: error: Dom 1 not suspended: (shutdown 0, reason 255): Internal error
> libxl: error: libxl.c:515:libxl__domain_resume: xc_domain_resume failed for domain 1: Invalid argument
> 
> 
> I think this regression was introduced in the last 3-4 weeks.
> 

Maybe go back to 96ae556569b8eaedc0bb242932842c3277b515d8 and try again?
Then 5cf46a66883ad7a56c5bdee97696373473f80974 and try? So that I can
know if it is related to COLO series. No, don't try to bisect that
because it's broken in the middle.

Also as Andrew suggested if you can reproduce it with a smaller domain
or share the guest image with us, that would be helpful.

Wei.

> Olaf
> 


* Re: xl migrate regression in staging
  2016-04-14 15:31 ` Wei Liu
@ 2016-04-14 15:36   ` Olaf Hering
  2016-06-02 15:18     ` annie li
  0 siblings, 1 reply; 13+ messages in thread
From: Olaf Hering @ 2016-04-14 15:36 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel

On Thu, Apr 14, Wei Liu wrote:

> Maybe go back to 96ae556569b8eaedc0bb242932842c3277b515d8 and try again?
> Then 5cf46a66883ad7a56c5bdee97696373473f80974 and try? So that I can
> know if it is related to COLO series. No, don't try to bisect that
> because it's broken in the middle.

I think I took enough snapshots for my rpm packages to get close to the
above commits. It will take some time to cycle through them.


> Also as Andrew suggested if you can reproduce it with a smaller domain
> or share the guest image with us, that would be helpful.

I have already sent him the link.

Olaf


* Re: xl migrate regression in staging
  2016-04-14 15:36   ` Olaf Hering
@ 2016-06-02 15:18     ` annie li
  2016-06-02 15:21       ` Wei Liu
  0 siblings, 1 reply; 13+ messages in thread
From: annie li @ 2016-06-02 15:18 UTC (permalink / raw)
  To: Olaf Hering, Wei Liu, andrew.cooper3; +Cc: xen-devel


On 4/14/2016 11:36 AM, Olaf Hering wrote:
> On Thu, Apr 14, Wei Liu wrote:
>
>> Maybe go back to 96ae556569b8eaedc0bb242932842c3277b515d8 and try again?
>> Then 5cf46a66883ad7a56c5bdee97696373473f80974 and try? So that I can
>> know if it is related to COLO series. No, don't try to bisect that
>> because it's broken in the middle.
> I think I took enough snapshots for my rpm packages to get close to the
> above commits. It will take some time to cycle through them.
>
>
>> Also as Andrew suggested if you can reproduce it with a smaller domain
>> or share the guest image with us, that would be helpful.
> I have already sent him the link.
Any update, guys?
I hit a similar problem recently too.

Thanks
Annie


* Re: xl migrate regression in staging
  2016-06-02 15:18     ` annie li
@ 2016-06-02 15:21       ` Wei Liu
  2016-06-02 16:18         ` annie li
  0 siblings, 1 reply; 13+ messages in thread
From: Wei Liu @ 2016-06-02 15:21 UTC (permalink / raw)
  To: annie li; +Cc: andrew.cooper3, Olaf Hering, Wei Liu, xen-devel

On Thu, Jun 02, 2016 at 11:18:36AM -0400, annie li wrote:
> 
> On 4/14/2016 11:36 AM, Olaf Hering wrote:
> >On Thu, Apr 14, Wei Liu wrote:
> >
> >>Maybe go back to 96ae556569b8eaedc0bb242932842c3277b515d8 and try again?
> >>Then 5cf46a66883ad7a56c5bdee97696373473f80974 and try? So that I can
> >>know if it is related to COLO series. No, don't try to bisect that
> >>because it's broken in the middle.
> >I think I took enough snapshots for my rpm packages to get close to the
> >above commits. It will take some time to cycle through them.
> >
> >
> >>Also as Andrew suggested if you can reproduce it with a smaller domain
> >>or share the guest image with us, that would be helpful.
> >I have already sent him the link.
> Any update, guys?
> I hit similar problem recently too.
> 

Yes, this has been fixed in master branch.

See de28b189bd329dd20a245a318be8feea2c4cc60a.

Wei.

> Thanks
> Annie


* Re: xl migrate regression in staging
  2016-06-02 15:21       ` Wei Liu
@ 2016-06-02 16:18         ` annie li
  0 siblings, 0 replies; 13+ messages in thread
From: annie li @ 2016-06-02 16:18 UTC (permalink / raw)
  To: Wei Liu; +Cc: andrew.cooper3, Olaf Hering, xen-devel


On 6/2/2016 11:21 AM, Wei Liu wrote:
> On Thu, Jun 02, 2016 at 11:18:36AM -0400, annie li wrote:
>> On 4/14/2016 11:36 AM, Olaf Hering wrote:
>>> On Thu, Apr 14, Wei Liu wrote:
>>>
>>>> Maybe go back to 96ae556569b8eaedc0bb242932842c3277b515d8 and try again?
>>>> Then 5cf46a66883ad7a56c5bdee97696373473f80974 and try? So that I can
>>>> know if it is related to COLO series. No, don't try to bisect that
>>>> because it's broken in the middle.
>>> I think I took enough snapshots for my rpm packages to get close to the
>>> above commits. It will take some time to cycle through them.
>>>
>>>
>>>> Also as Andrew suggested if you can reproduce it with a smaller domain
>>>> or share the guest image with us, that would be helpful.
>>> I have already sent him the link.
>> Any update, guys?
>> I hit similar problem recently too.
>>
> Yes, this has been fixed in master branch.
>
> See de28b189bd329dd20a245a318be8feea2c4cc60a.
Nice! I tested a newer version with this patch, and it works.

Thanks
Annie
>
> Wei.
>
>> Thanks
>> Annie


