* xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
@ 2011-08-19 17:56 Andreas Olsowski
  2011-08-22  7:32 ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Andreas Olsowski @ 2011-08-19 17:56 UTC (permalink / raw)
  To: xen-devel



I have 2 servers, both installed with Debian 6.0.2 stable (squeeze).

I took the xen-4.1.1.tar.gz and the very latest xen/stable-2.6.32.x from
Jeremy's git.

For the dom0 .config I used one derived from the configs suggested on
the pvops wiki page. It has worked fine before.

For domU I use 3 different kernels: a 2.6.39 one that is running fine
on ~80 paravirtualized guests in my production environment, plus the
latest 3.0.3 and 3.1-rc2 from the kernel.org git.

The config was updated for them via make oldconfig at different times.
3.0.3 explicitly has DEBUG symbols in it; the others don't.


I made damn sure my two test servers were as close to identical as they
can possibly get.
Everything installed by make install-xen and make install-tools is
binary identical.
The kernels have been copied over via scp (scp /boot/*2.6.32.45* ...).


It all boils down to this:
BUG: unable to handle kernel paging request at ...

This happens when I migrate one of my 3 test virtual machines
(testvm-2.6, testvm-3.0 and testvm-3.1) from host1 to host2.
host1 is called xenturio1, host2 is called tarballerina.

config-2.6.32.45-xen0:
http://pastebin.com/DLC3BcCF

config-2.6.39-xenU:
http://pastebin.com/r5KBpumE

config-3.0.3-xenU:
http://pastebin.com/DDjrYANv

config-3.1-rc2-xenU+:
http://pastebin.com/tWbt16yR

system information on host1 and host2:
http://pastebin.com/zs89a1rQ
(cpuinfo, xl info, uname -a, md5sums of xen and kernel)



Here come the logs:

testvm-2.6@host1 to host2:
xl console:
http://pastebin.com/mUKugaYu
vm-state after migration "r-----"
xenctx:
http://pastebin.com/viQzfwT1

testvm-3.0@host1 to host2:
xl console:
http://pastebin.com/iswQFN2a
vm-state after migration "r-----"
xenctx:
http://pastebin.com/8VdSUrYd

testvm-3.1@host1 to host2:
xl console:
did not produce any output
vm-state after migration "---sc-"
xenctx: http://pastebin.com/ymT0Rxhz

xl-testvm.*.log output after killing them:
http://pastebin.com/0L4905ft




testvm-2.6@host2 to host1:
xl console:
http://pastebin.com/nNqUeJNR
vm-state after migration "-b----"
xenctx:
http://pastebin.com/gfAVWe2v

testvm-3.0@host2 to host1:
xl console:
did not produce any output
vm-state after migration "-b----"
xenctx:
http://pastebin.com/nPiTTLEz

testvm-3.1@host2 to host1:
xl console:
http://pastebin.com/3tqB4Zet
vm-state after migration "-b----"
xenctx:
http://pastebin.com/bBtxePmr

xl-testvm.*.log output after killing them:
http://pastebin.com/3i4XzFsv

Local migration works (migrate to localhost).
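
For reference, the migrations themselves are nothing fancy; roughly this,
sketched from memory with the names used above:

  xl migrate testvm-3.0 tarballerina    # host1 -> host2, where it blows up
  xl migrate testvm-3.0 localhost       # local migration, which works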


I first encountered this on servers running 4.2, where one of 3 hosts
could not migrate machines that had been created on it.



As usual: input is greatly appreciated.

If you want me to try any other kernel .config entries or want some
different output, tell me exactly what you would like to see and I will
provide it a.s.a.p.

If you are running xen-4.1.1 with 2.6.32-jeremy kernels and you don't
experience this problem, I would like to have your dom0 and domU .config
files so I can test them.



With best regards
-- 
Andreas Olsowski



* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
  2011-08-19 17:56 xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2 Andreas Olsowski
@ 2011-08-22  7:32 ` Jan Beulich
  2011-08-22 13:56   ` Andreas Olsowski
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2011-08-22  7:32 UTC (permalink / raw)
  To: Andreas Olsowski; +Cc: xen-devel

>>> On 19.08.11 at 19:56, Andreas Olsowski <andreas.olsowski@leuphana.de> wrote:
> I have 2 servers, both were installed with Debian 6.0.2 stable(squeeze).
> 
> I took the xen-4.1.1.tar.gz and the very latest xen/stable-2.6.32.x from 
> jeremys git.
> 
> For dom0 .config i used one that was derived from the ones suggested on 
> the pvops wiki page. It has worked fine before.
> 
> For domU i use 3 different kernels, a 2.6.39 one, that is running fine 
> on ~80 paravirtualized guests in my production envrionment.
> Also the lastest 3.0.3 and 3.1-rc2 from the kernel.org git.
> 
> The config was updated for them via make oldconfig at different times.
> 3.0.3 has explicitly has DEBUG symbols in it, the others dont.
> 
> 
> I made damn sure my two test servers where as close to identical as they 
> can possbily get.
> Everything installed by make install-xen and make install-tools is 
> binary identical.
> The kernels have been copied over via scp. (scp /boot/*2.6.32.45* ...)
> 
> 
> It all boils down to this:
> BUG: unable to handle kernel paging request at ...
> 
> This happens when i migrate one of my 3 test virtual machines 
> (testvm-2.6 testvm-3.0 and testvm-3.1) from host1 to host2.
> host1 is called xenturio1, host2 is called tarballerina.
> 
> config-2.6.32.45-xen0:
> http://pastebin.com/DLC3BcCF 
> 
> config-2.6.39-xenU:
> http://pastebin.com/r5KBpumE 
> 
> config-3.0.3-xenU:
> http://pastebin.com/DDjrYANv 
> 
> config-3.1-rc2-xenU+:
> http://pastebin.com/tWbt16yR 
> 
> sytem information on host1 and host2:
> http://pastebin.com/zs89a1rQ 
> (cpuinfo, xl info, uname -a, md5sums of xen and kernel)

Does it also fail the other way round (host2 -> host1)? If not, your
issue is likely fixed with 23102:1c7b601b1b35 on 4.1-testing (and
with you posting on xen-devel rather than xen-users I would really
have expected that you would have looked for similar reports or
eventual fixes before complaining).

Jan


* Re: xen/stable-2.6.32.x xen-4.1.1 live migration  fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
  2011-08-22  7:32 ` Jan Beulich
@ 2011-08-22 13:56   ` Andreas Olsowski
  2011-08-24 20:34     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 18+ messages in thread
From: Andreas Olsowski @ 2011-08-22 13:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel



> Does it also fail the other way round (host2 ->  host1)? If not, your
> issue is likely fixed with 23102:1c7b601b1b35 on 4.1-testing (and
> with you posting on xen-devel rather than xen-users I would really
> have expected that you would have looked for similar reports or
> eventual fixes before complaining).

It did happen host2->host1 and host1->host2 with xen-4.1.1.

I did set up 2 servers with identical hardware now, and in fact I don't
have any problems migrating machines between them.

I went on to upgrade all 3 servers (2x 32GB, 1x 96GB) to xen-4.1-testing.

Now I can migrate 32GB host -> 32GB host and 32GB host -> 96GB host, but
96GB host -> 32GB host still fails.

BUG: unable to handle kernel paging request at fffffffffffffff8
with the 2.6.39 and 3.1 guest kernels; 3.0 didn't produce any output on its
tty0 anymore.


This issue may be a little more than a memory size mismatch, since I
have 3 servers running xen-4.2 with 96GB RAM (two Dell R610s and one R710),
where the R610s can migrate guests between each other just fine.
They can also migrate to the R710 and back.
But a guest created on the R710 can't be migrated to an R610.

The exact same thing happens with 4.1-testing.
A guest created on a 32GB host can be migrated to the 96GB host and back
to any 32GB host.
But a guest created on the 96GB host cannot be migrated to a 32GB host.

Here is my server list:
host1: Dell PE2950, 32GB RAM, 4.1.1/4.1-testing/4.2 available for testing
host2: Dell PE2950, 32GB RAM, 4.1.1/4.1-testing/4.2 available for testing
host3: Dell R710, 96GB RAM, 4.1.1/4.1-testing/4.2 available for testing
host4: Dell R610, 96GB RAM, xen-4.2
host5: Dell R610, 96GB RAM, xen-4.2



-- 
Andreas Olsowski



* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
  2011-08-22 13:56   ` Andreas Olsowski
@ 2011-08-24 20:34     ` Konrad Rzeszutek Wilk
  2011-08-25  7:15       ` Andreas Olsowski
  0 siblings, 1 reply; 18+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-08-24 20:34 UTC (permalink / raw)
  To: Andreas Olsowski; +Cc: xen-devel, Jan Beulich

On Mon, Aug 22, 2011 at 03:56:43PM +0200, Andreas Olsowski wrote:
> >Does it also fail the other way round (host2 ->  host1)? If not, your
> >issue is likely fixed with 23102:1c7b601b1b35 on 4.1-testing (and
> >with you posting on xen-devel rather than xen-users I would really
> >have expected that you would have looked for similar reports or
> >eventual fixes before complaining).
> 
> It did happen host2->host1 and host1->host2 with xen4.1.1-
> 
> I did set up 2 servers with identical hardware now and in fact i
> dont have any problems with them migrating machines.
> 
> I went on to upgrade all 3 servers (2x 32gb 1x96gb) to xen-4.1-testing.

Did you check that your xen-4.1-testing had the patch above?
> 
> Now i can migrate 32gbhost->32gbhost and 32gbhost->96gbhost but
> 96gbhost->32gbhost still fails.
> 
> BUG: unable to handle kernel paging request at fffffffffffffff8
> with 2.6.39 and 3.1 guest kernels, 3.0 didnt produce any output on
> its tty0 anymore.
> 
> 
> This issue may be a little more then a memory size mismatch, since i
> have 3 servers running xen4.2 with 96gb ram two Dell R610s and one
> R710, where the R610s can migrate guests between each other just
> fine.
> They can also migrate to the R710 and back.
> But a host created on the R710 cant be migrated to a R610.
> 
> The same exact thing happens with 4.1-testing.
> A guest created on a 32gb host can be migrated to the 96gb host and
> back to any 32gb host.
> But a guest created on the 96gb host can not be migrated to a 32gb host.

Which sounds like something the patch above should have fixed. Again, did you
check your binary and source tree to see if you have the mentioned
patch?


* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
  2011-08-24 20:34     ` Konrad Rzeszutek Wilk
@ 2011-08-25  7:15       ` Andreas Olsowski
  2011-08-26 15:00         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 18+ messages in thread
From: Andreas Olsowski @ 2011-08-25  7:15 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel




 > Which sounds like the patch above should have fixed. Again, did you
 > check your binary and source tree to see if you have the mentioned
 > patch?

Yes, at the time I tested it the patch was in 4.1-testing and 4.2, so I
do have the patch applied. It had the desired effect: I can migrate TO
the host with more RAM, but I still cannot migrate guests that have been
created on it FROM that host.

I can, however, create the guest somewhere else, migrate it there and then
migrate it back.
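
The workaround, sketched with made-up names (bighost being the 96GB box,
smallhost a 32GB one):

  xl create /etc/xen/testvm.cfg      # create the guest on smallhost
  xl migrate testvm bighost          # push it to the 96GB host
  xl migrate testvm smallhost        # from there it migrates back just fine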

-- 
Andreas Olsowski



* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
  2011-08-25  7:15       ` Andreas Olsowski
@ 2011-08-26 15:00         ` Konrad Rzeszutek Wilk
  2011-08-26 17:26           ` Andreas Olsowski
  0 siblings, 1 reply; 18+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-08-26 15:00 UTC (permalink / raw)
  To: Andreas Olsowski; +Cc: xen-devel

On Thu, Aug 25, 2011 at 09:15:14AM +0200, Andreas Olsowski wrote:
> 
> > Which sounds like the patch above should have fixed. Again, did you
> > check your binary and source tree to see if you have the mentioned
> > patch?
> 
> Yes at the time i tested it the patch was in 4.1-testing and 4.2, so
> i do have the patch applied. It had the desired effect, i can
> migrate TO the host with more RAM, but i still cannot migrate guests
> that have been created on it FROM that host.

Ok, so you do have a workaround for that right now - and we kind of
know that something is still amiss with the MFN calculations when
migrating.

Sadly my todo list is not getting any shorter, so I'm not sure when I will
get to try this out. But let me do that when I get my 32GB machine
working again.

> 
> I can however create the guest somewhere else, migrate it there and
> then migrate it back.

Yeah, that really points to either the tools or the hypervisor not liking
the MFN being too high. Or the save/resume path in the
Linux kernel is failing silently and sticking in invalid MFNs
because it can't deal with higher MFNs.

In other words - I need to run this to figure it out.

Unless you are up for helping out by debugging the code a bit and
seeing if you can come up with a fix?


* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
  2011-08-26 15:00         ` Konrad Rzeszutek Wilk
@ 2011-08-26 17:26           ` Andreas Olsowski
  2011-08-29 19:49             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 18+ messages in thread
From: Andreas Olsowski @ 2011-08-26 17:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel



> My todo list is not getting any shorter sadly so not sure when I will
> get to try this out. But let me do that when I get my 32GB machine
> working again.
It would certainly be interesting to know whether you experience the same
thing on your platforms. This may or may not have something to do with the
hardware in play.


>
> Yeah, that really points to either the tools not liking the
> MFN being too high or the hypervisor. Or the save/resume path in the
> Linux kernel is failing silently and sticking in invalid MFNs
> as it can't deal with higher MFNs.
>
> In other words - need to run this to figure out.
>
> Unless you are up for helping out by debugging the code a bit and
> seeing if you can come with a fix?

Although I am willing, I probably won't be able to, since I lack the
necessary understanding of the low-level workings of Xen and I am not
very experienced at debugging C code/programs.


However, I did some additional testing, this time with xen-4.2, and things
have gotten worse:

The two servers involved BOTH have 96GB RAM and are both running the
latest xen-4.2, but they are different hardware (R710 and R610):

And this is what happens when I throw a 32GB server (PE2950) into the mix:
http://pastebin.com/7X8t022R

So with 4.2 there are still migration errors, but what's worse, now I
can't migrate anything anywhere anymore when the platform is different.

Within the same platform everything works fine (2x R610):
http://pastebin.com/ZWByjjY5

What is going on here?


Could this be an xl toolstack problem after all? And why does half of it
work with 4.1 and not with 4.2??
Stuff like:
xc: error: Failed to pin batch of 493 page tables (22 = Invalid argument): Internal error
and
xc: error: Couldn't set eXtended States for vcpu0 (22 = Invalid argument): Internal error

look like errors that should be simple to debug.


There is still one more thing left to test: xen-4.1-testing on an R610.
For that I have to migrate the guests away to the other R610.
I will probably get around to it this weekend or on Monday at the latest.
I'll just reply to this email with my findings once I have them.

It would seem you are overloaded with too many different things; I hope
you still find some time to relax, and I am sorry for adding more stuff
to your list.


I will focus my future testing solely on 4.1-testing; I just thought
checking out 4.2 might help me understand ... instead I am even more confused.

Have a nice weekend.

With best regards

Andreas



* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
  2011-08-26 17:26           ` Andreas Olsowski
@ 2011-08-29 19:49             ` Konrad Rzeszutek Wilk
  2011-08-31 13:07               ` Andreas Olsowski
  0 siblings, 1 reply; 18+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-08-29 19:49 UTC (permalink / raw)
  To: Andreas Olsowski; +Cc: xen-devel

On Fri, Aug 26, 2011 at 07:26:29PM +0200, Andreas Olsowski wrote:
> >My todo list is not getting any shorter sadly so not sure when I will
> >get to try this out. But let me do that when I get my 32GB machine
> >working again.
> It would certainly be interesting to know if you experience the same
> thing on your platforms. This may or may not have sth to do with the
> hardware in play.

OK, got my box online.

Getting closer to trying to reproduce the problem.

> 
> 
> >
> >Yeah, that really points to either the tools not liking the
> >MFN being too high or the hypervisor. Or the save/resume path in the
> >Linux kernel is failing silently and sticking in invalid MFNs
> >as it can't deal with higher MFNs.
> >
> >In other words - need to run this to figure out.
> >
> >Unless you are up for helping out by debugging the code a bit and
> >seeing if you can come with a fix?
> 
> Allthough i am willing, i probably wont be able to, since i lack the
> neccessary understanding of the low level workings of Xen and i am
> not very experienced at debugging C code/programs.

OK.
> 
> 
> However i did some additional testing, this time with xen4.2 and
> things have gotten worse:

Yeah, xen-unstable past c/s 23379 is doing a lot of weird stuff for me.

> 
> The two servers involved do BOTH have 96GB ram and are both running
> the latest xen 4.2 but are of different hardware (R710 and R610):
> http://pastebin.com/AaSpWZdg
> 
> And this is happens when i throw a 32GB server (PE2950) in the mix:
> http://pastebin.com/7X8t022R
> 
> So with 4.2 there are still migration errors, but whats worse, now i
> cant migrate anything anywhere anymore when the platform is
> different.
> 
> Within the same platform everything works fine (2x R610):
> http://pastebin.com/ZWByjjY5
> 
> What is going on here?

<sigh> Development - and not all developers test everything in the
mix.


* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
  2011-08-29 19:49             ` Konrad Rzeszutek Wilk
@ 2011-08-31 13:07               ` Andreas Olsowski
  2011-09-07 13:50                 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 18+ messages in thread
From: Andreas Olsowski @ 2011-08-31 13:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel



A little update: I now have all machines running on xen-4.1-testing with
xen/stable-2.6.32.x.
That gave me the possibility of additional tests.

(I also tested xm/xend in addition to xl/libxl, to make sure it's not an
xl/libxl problem.)

I took the liberty of creating a new test result matrix that should
provide a better overview (in case someone else wants to get the whole
picture):

####################################################################
##### xen 4.1 live migration fails between different platforms #####
####################################################################
XEN: xen-4.1-testing.hg
dom0: xen/stable-2.6.32.x
domU: linux-2.6.39 vanilla (also 3.0.3 and 3.1)

toolstack: xl/libxl
(at least FAIL type1 also occurs with xm/xend)

# create means the guest has been created by this host
# received means the guest has been migrate-received by this host


# Dell PE 2950 and Dell PE 2950
create pe2950-1 -> pe2950-2  OK
received pe2950-2 -> pe2950-1 OK
create pe2950-2 -> pe2950-1  OK
received pe2950-1 -> pe2950-2 OK

# Dell PE 2950 and Dell R710
create pe2950-1 -> r710  OK
received r710 -> pe2950-1 OK
create r710 -> pe2950-1 FAIL (type 1): http://pastebin.com/iUeNPQyY

# Dell PE 2950 and Dell R610
create pe2950-1 -> r610-1 FAIL (type 2): http://pastebin.com/fzMkuS5s
create r610-1 -> pe2950-1 FAIL (type 1): http://pastebin.com/Lq6SGVPj

# Dell R610 and Dell R610
create r610-1 -> r610-2 OK
received r610-2 -> r610-1 OK

create r610-2 -> r610-1 OK
received r610-1 -> r610-2 OK

# Dell R610 and Dell R710
create r610-1 -> r710 OK
received r710 -> r610-1 OK

create r710 -> r610-1 FAIL (type 2): http://pastebin.com/eff5Yx0C

# Dell PE 2950 and Dell R710 and Dell R610
create pe2950-2 -> r710 OK
received r710 -> r610 FAIL (type 2): http://pastebin.com/it7QPsJk

create r610 -> r710 OK
received r710 -> pe2950-2 FAIL (type 1 derived?): 
http://pastebin.com/R6pXSJpU

#EOF

with best regards

Andreas



* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
  2011-08-31 13:07               ` Andreas Olsowski
@ 2011-09-07 13:50                 ` Konrad Rzeszutek Wilk
  2011-09-08 17:32                   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 18+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-07 13:50 UTC (permalink / raw)
  To: Andreas Olsowski; +Cc: xen-devel

On Wed, Aug 31, 2011 at 03:07:22PM +0200, Andreas Olsowski wrote:
> A little update, i now have all machines running on xen-4.1-testing
> with xen/stable-2.6.32.x
> That gave me the possiblity for additional tests.
> 
> (I also tested xm/xend in addtion to xl/libxl, to make sure its not
> a xl/libxl problem.)
> 
> I took the liberty to create a new test result matrix that should
> provide a better overview (in case someone else wants to get the
> whole picture):

So.. I don't think the issue I am seeing is exactly the same. This is
what 'xl' gives me:

 :~/
> xl migrate 3 tst010
root@tst010's password:
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/326)
Loading new save file incoming migration stream (new xl fmt info 0x0/0x0/326)
 Savefile contains xl domain config
xc: Saving memory: iter 0 (last sent 0 skipped 0): 262400/262400  100%
xc: Saving memory: iter 2 (last sent 1105 skipped 23): 262400/262400  100%
xc: Saving memory: iter 3 (last sent 74 skipped 0): 262400/262400  100%
xc: Saving memory: iter 4 (last sent 0 skipped 0): 262400/262400  100%
xc: error: unexpected PFN mapping failure pfn 19d0 map_mfn 4e7e04 p2m_mfn 4e7e04: Internal error
libxl: error: libxl_dom.c:363:libxl__domain_restore_common: restoring domain: Resource temporarily unavailable
libxl: error: libxl_create.c:483:do_domain_create: cannot (re-)build domain: -3
libxl: error: libxl.c:733:libxl_domain_destroy: non-existant domain 4
migration target: Domain creation failed (code -3).
libxl: error: libxl_utils.c:410:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
libxl: info: libxl_exec.c:125:libxl_report_child_exitstatus: migration target process [5810] exited with error status 3
Migration failed, resuming at sender.


And on the receiving side (tst010) I get a monster of output:

(XEN) mm.c:945:d0 Error getting mfn 4e7e04 (pfn ffffffffffffffff) from L1 entry 80000004e7e04627 for l1e_owner=0, pg_owner=4
XEN) mm.c:945:d0 Error getting mfn 36fd19 (pfn ffffffffffffffff) from L1 entry 800000036fd19627 for l1e_owner=0, pg_owner=4
(XEN) mm.c:945:d0 Error getting mfn 36f583 (pfn ffffffffffffffff) from L1 entry 800000036f583627 for l1e_owner=0, pg_owner=4
..
(XEN) mm.c:945:d0 Error getting mfn 4e7d09 (pfn ffffffffffffffff) from L1 entry 80000004e7d09627 for l1e_owner=0, pg_owner=4
(XEN) event_channel.c:250:d3 EVTCHNOP failure: error -17


The migration is from a 4GB box to a 32GB box (worked), then back to the 4GB (worked)
and then back to the 32GB (boom!).

Anyhow, let me try this with the 4.1-testing branch. Running on the bleeding
edge might not be the best idea sometimes.

> 
> ####################################################################
> ##### xen 4.1 live migration fails between different platforms #####
> ####################################################################
> XEN: xen-4.1-testing.hg
> dom0: xen/stable-2.6.32.x
> domU: linux-2.6.39 vanilla (also 3.0.3 and 3.1)
> 
> toolstack: xl/libxl
> (at least FAIL type1 also occurs with xm/xend)
> 
> # create means the guest has been created by this host
> # received means the guest has been migrate-received by this host
> 
> XEN: xen-4.1-testing.hg
> dom0: xen/stable-2.6.32.x
> domU: linux-2.6.39 vanilla (also 3.0.3 and 3.1)
> 
> toolstack: xl/libxl
> (at least FAIL type1 also occurs with xm/xend)
> 
> 
> # Dell PE 2950 and Dell PE 2950
> create pe2950-1 -> pe2950-2  OK
> received pe2950-2 -> pe2950-1 OK
> create pe2950-2 -> pe2950-1  OK
> received pe2950-1 -> pe2950-2 OK
> 
> # Dell PE 2950 and Dell R710
> create pe2950-1 -> r710  OK
> received r710 -> pe2950-1 OK
> create r710 -> pe2950-1 FAIL (type 1): http://pastebin.com/iUeNPQyY
> 
> # Dell PE 2950 and Dell R610
> create pe2950-1 -> r610-1 FAIL (type 2): http://pastebin.com/fzMkuS5s
> create r610-1 -> pe2950-1 FAIL (type 1): http://pastebin.com/Lq6SGVPj
> 
> # Dell R610 and Dell R610
> create r610-1 -> r610-2 OK
> received r610-2 -> r610-1 OK
> 
> create r610-2 -> r610-1 OK
> received r610-1 -> r610-2 OK
> 
> # Dell R610 and Dell R710
> create r610-1 -> r710 OK
> received r710 -> r610-1 OK
> 
> create r710 -> r610-1 FAIL (type 2): http://pastebin.com/eff5Yx0C
> 
> # Dell PE 2950 and Dell R710 and Dell R610
> create pe2950-2 -> r710 OK
> received r710 -> r610 FAIL (type 2): http://pastebin.com/it7QPsJk
> 
> create r610 -> r710 OK
> received r710 -> pe2950-2 FAIL (type 1 derived?):
> http://pastebin.com/R6pXSJpU
> 
> #EOF
> 
> with best regards
> 
> Andreas
> 




* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
  2011-09-07 13:50                 ` Konrad Rzeszutek Wilk
@ 2011-09-08 17:32                   ` Konrad Rzeszutek Wilk
  2011-09-08 18:12                     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 18+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-08 17:32 UTC (permalink / raw)
  To: Andreas Olsowski; +Cc: xen-devel

On Wed, Sep 07, 2011 at 09:50:47AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Aug 31, 2011 at 03:07:22PM +0200, Andreas Olsowski wrote:
> > A little update, i now have all machines running on xen-4.1-testing
> > with xen/stable-2.6.32.x
> > That gave me the possiblity for additional tests.
> > 
> > (I also tested xm/xend in addtion to xl/libxl, to make sure its not
> > a xl/libxl problem.)
> > 
> > I took the liberty to create a new test result matrix that should
> > provide a better overview (in case someone else wants to get the
> > whole picture):
> 
> So.. I don't think the issue I am seeing is exactly the same. This is
> what 'xl' gives me:

Scratch that. I am seeing the error below if I:

1) Create guest on 4GB machine
2) Migrate it to the 32GB box (guest still works)
3) Migrate it to the 4GB box (guest dies - error below shows up and
guest is dead).

With 3.1-rc5 virgin - both Dom0 and DomU. Also Xen 4.1-testing on top of this.

I tried just creating a guest on the 32GB and migrating it - and while
it did migrate it was stuck in a hypercall_page call or crashed later on.

Andreas,

Thanks for reporting this.
> 
>  :~/
> > xl migrate 3 tst010
> root@tst010's password:
> migration target: Ready to receive domain.
> Saving to migration stream new xl format (info 0x0/0x0/326)
> Loading new save file incoming migration stream (new xl fmt info 0x0/0x0/326)
>  Savefile contains xl domain config
> xc: Saving memory: iter 0 (last sent 0 skipped 0): 262400/262400  100%
> xc: Saving memory: iter 2 (last sent 1105 skipped 23): 262400/262400  100%
> xc: Saving memory: iter 3 (last sent 74 skipped 0): 262400/262400  100%
> xc: Saving memory: iter 4 (last sent 0 skipped 0): 262400/262400  100%
> xc: error: unexpected PFN mapping failure pfn 19d0 map_mfn 4e7e04 p2m_mfn 4e7e04: Internal error
> libxl: error: libxl_dom.c:363:libxl__domain_restore_common: restoring domain: Resource temporarily unavailable
> libxl: error: libxl_create.c:483:do_domain_create: cannot (re-)build domain: -3
> libxl: error: libxl.c:733:libxl_domain_destroy: non-existant domain 4
> migration target: Domain creation failed (code -3).
> libxl: error: libxl_utils.c:410:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
> libxl: info: libxl_exec.c:125:libxl_report_child_exitstatus: migration target process [5810] exited with error status 3
> Migration failed, resuming at sender.
> 
> 
> And on the receiving side (tst010) I get a monster off:
> 
> (XEN) mm.c:945:d0 Error getting mfn 4e7e04 (pfn ffffffffffffffff) from L1 entry 80000004e7e04627 for l1e_owner=0, pg_owner=4
> XEN) mm.c:945:d0 Error getting mfn 36fd19 (pfn ffffffffffffffff) from L1 entry 800000036fd19627 for l1e_owner=0, pg_owner=4
> (XEN) mm.c:945:d0 Error getting mfn 36f583 (pfn ffffffffffffffff) from L1 entry 800000036f583627 for l1e_owner=0, pg_owner=4
> ..
> (XEN) mm.c:945:d0 Error getting mfn 4e7d09 (pfn ffffffffffffffff) from L1 entry 80000004e7d09627 for l1e_owner=0, pg_owner=4
> (XEN) event_channel.c:250:d3 EVTCHNOP failure: error -17
> 
> 
> The migration is from a 4GB box to a 32GB box (worked), then back to the 4GB( worked)
> and then back to the 32GB (boom!).
> 
> anyhow, let me try this with 4.1-testing branch. Running on the bleeding
> edge might not be the best idea sometimes.


* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
  2011-09-08 17:32                   ` Konrad Rzeszutek Wilk
@ 2011-09-08 18:12                     ` Konrad Rzeszutek Wilk
  2011-09-08 19:50                       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 18+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-08 18:12 UTC (permalink / raw)
  To: Andreas Olsowski; +Cc: xen-devel

On Thu, Sep 08, 2011 at 01:32:12PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Sep 07, 2011 at 09:50:47AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Aug 31, 2011 at 03:07:22PM +0200, Andreas Olsowski wrote:
> > > A little update, i now have all machines running on xen-4.1-testing
> > > with xen/stable-2.6.32.x
> > > That gave me the possiblity for additional tests.
> > > 
> > > (I also tested xm/xend in addtion to xl/libxl, to make sure its not
> > > a xl/libxl problem.)
> > > 
> > > I took the liberty to create a new test result matrix that should
> > > provide a better overview (in case someone else wants to get the
> > > whole picture):
> > 
> > So.. I don't think the issue I am seeing is exactly the same. This is
> > what 'xl' gives me:
> 
> Scratch that. I am seeing the error below if I:
> 
> 1) Create guest on 4GB machine
> 2) Migrate it to the 32GB box (guest still works)
> 3) Migrate it to the 4GB box (guest dies - error below shows up and
> guest is dead).
> 
> With 3.1-rc5 virgin - both Dom0 and DomU. Also Xen 4.1-testing on top of this.
> 
> I tried just creating a guest on the 32GB and migrating it - and while
> it did migrate it was stuck in a hypercall_page call or crashed later on.
> 
> Andreas,
> 
> Thanks for reporting this.

Oh wait. At some point you said that 2.6.32.43 worked for you. Is that still
the case?


* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
  2011-09-08 18:12                     ` Konrad Rzeszutek Wilk
@ 2011-09-08 19:50                       ` Konrad Rzeszutek Wilk
  2011-09-09  5:59                         ` Andreas Olsowski
       [not found]                         ` <19825_1315548082_p8961G08009635_4E69AB53.5010702@leuphana.de>
  0 siblings, 2 replies; 18+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-08 19:50 UTC (permalink / raw)
  To: Andreas Olsowski; +Cc: xen-devel

On Thu, Sep 08, 2011 at 02:12:27PM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 08, 2011 at 01:32:12PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Sep 07, 2011 at 09:50:47AM -0400, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Aug 31, 2011 at 03:07:22PM +0200, Andreas Olsowski wrote:
> > > > A little update, i now have all machines running on xen-4.1-testing
> > > > with xen/stable-2.6.32.x
> > > > That gave me the possiblity for additional tests.
> > > > 
> > > > (I also tested xm/xend in addtion to xl/libxl, to make sure its not
> > > > a xl/libxl problem.)
> > > > 
> > > > I took the liberty to create a new test result matrix that should
> > > > provide a better overview (in case someone else wants to get the
> > > > whole picture):
> > > 
> > > So.. I don't think the issue I am seeing is exactly the same. This is
> > > what 'xl' gives me:
> > 
> > Scratch that. I am seeing the error below if I:
> > 
> > 1) Create guest on 4GB machine
> > 2) Migrate it to the 32GB box (guest still works)
> > 3) Migrate it to the 4GB box (guest dies - error below shows up and
> > guest is dead).
> > 
> > With 3.1-rc5 virgin - both Dom0 and DomU. Also Xen 4.1-testing on top of this.
> > 
> > I tried just creating a guest on the 32GB and migrating it - and while
> > it did migrate it was stuck in a hypercall_page call or crashed later on.
> > 
> > Andreas,
> > 
> > Thanks for reporting this.
> 
> Oh wait. At some point you said that 2.6.32.43 worked for you.. Is that still
> the case?

Can you please try one thing for me - can you make sure the boxes have the exact same
amount of memory? You can do 'mem=X' on the Xen hypervisor command line to set that.
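
Something along these lines in the grub entry, for example (just a sketch -
paths and the 8192M figure are placeholders; the relevant bit is the mem=
argument on the xen.gz line):

  multiboot /boot/xen.gz mem=8192M dom0_mem=8192M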

I think the problem you are running into is that you are migrating between
different CPU families... Is the /proc/cpuinfo drastically different between
the boxes?


* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
  2011-09-08 19:50                       ` Konrad Rzeszutek Wilk
@ 2011-09-09  5:59                         ` Andreas Olsowski
  2011-09-12 16:47                           ` xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2.. between different physical machines and CPUs Konrad Rzeszutek Wilk
       [not found]                         ` <19825_1315548082_p8961G08009635_4E69AB53.5010702@leuphana.de>
  1 sibling, 1 reply; 18+ messages in thread
From: Andreas Olsowski @ 2011-09-09  5:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel



On 09/08/2011 09:50 PM, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 08, 2011 at 02:12:27PM -0400, Konrad Rzeszutek Wilk wrote:
>> On Thu, Sep 08, 2011 at 01:32:12PM -0400, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Sep 07, 2011 at 09:50:47AM -0400, Konrad Rzeszutek Wilk wrote:
>>>> On Wed, Aug 31, 2011 at 03:07:22PM +0200, Andreas Olsowski wrote:
>>>>> A little update, i now have all machines running on xen-4.1-testing
>>>>> with xen/stable-2.6.32.x
>>>>> That gave me the possiblity for additional tests.
>>>>>
>>>>> (I also tested xm/xend in addtion to xl/libxl, to make sure its not
>>>>> a xl/libxl problem.)
>>>>>
>>>>> I took the liberty to create a new test result matrix that should
>>>>> provide a better overview (in case someone else wants to get the
>>>>> whole picture):
>>>>
>>>> So.. I don't think the issue I am seeing is exactly the same. This is
>>>> what 'xl' gives me:
>>>
>>> Scratch that. I am seeing the error below if I:
>>>
>>> 1) Create guest on 4GB machine
>>> 2) Migrate it to the 32GB box (guest still works)
>>> 3) Migrate it to the 4GB box (guest dies - error below shows up and
>>> guest is dead).
>>>
>>> With 3.1-rc5 virgin - both Dom0 and DomU. Also Xen 4.1-testing on top of this.
>>>
>>> I tried just creating a guest on the 32GB and migrating it - and while
>>> it did migrate it was stuck in a hypercall_page call or crashed later on.
>>>
>>> Andreas,
>>>
>>> Thanks for reporting this.
>>
>> Oh wait. At some point you said that 2.6.32.43 worked for you.. Is that still
>> the case?
 >
(Ignore the e-mail from a few minutes ago; I accidentally did not reply-all.)

Did I? I will have to check my sent emails, but I'm pretty sure that if I
had found a way that works I would normally be using it.

But I can try an older version later today.

Btw, although you get the same error as I do, the circumstances are
slightly different.

This does not necessarily have something to do with the amount of memory.
I see this on hosts that both have the same amount of RAM but are a
different hardware platform.

>
> Can you please try one thing for me - can you make sure the boxes have exact same
> amount of memory? You can do 'mem=X' on the Xen hypervisor line to set that.
Running mem=8G, and dom0 ballooning is turned off.

	multiboot	/boot/xen.gz placeholder dom0_mem=8192M
	module	/boot/vmlinuz-2.6.32.45-xen0 placeholder root=UUID=216ff902-b505-45c4-9bcb-9d63b4cb8992 ro mem=8G nomodeset console=tty0 console=ttyS1,57600 earlyprintk=xen


For some reason though, the two R610s show:
root@netcatarina:~# cat /proc/meminfo
MemTotal:        8378236 kB
root@netcatarina:~# xl list |grep Domain-0
Domain-0                                     0  7445     8     r-----  124304.7

root@memoryana:~# cat /proc/meminfo
MemTotal:        8378236 kB
root@memoryana:~# xl list |grep Domain-0
Domain-0                                     0  7445     8     r-----  132125.0

whereas the R710 shows:
root@tarballerina:~# cat /proc/meminfo
MemTotal:        7886716 kB
root@tarballerina:~# xl list |grep Domain-0
Domain-0                                     0  7221     8     r-----  64497.0

On a sidenote:

root@tarballerina:~# xl mem-set Domain-0 8192
libxl: error: libxl.c:2119:libxl_set_memory_target cannot get memory info from /local/domain/0/memory/static-max: No such file or directory

The two R610s can set their memory via xl mem-set just fine.

>
> I think the problem you are running into is that you are migrating between
> different CPU families... Is the /proc/cpuinfo drastically different between
> the boxes?
diff:
< model		: 26
< model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
< stepping	: 5
< cpu MHz		: 2261.074
< cache size	: 8192 KB
---
> model		: 44
> model name	: Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz
> stepping	: 2
> cpu MHz		: 2660.050
> cache size	: 12288 KB
13,14c13,14
< flags		: fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nonstop_tsc aperfmperf pni est ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm ida
< bogomips	: 4522.14
---
> flags		: fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall lm constant_tsc rep_good nonstop_tsc aperfmperf pni pclmulqdq est ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat
> bogomips	: 5320.10

The differing flags are: nx and aes.

And that is the R610 vs the R710. The CPU in the 2950 is older: a completely
different platform, different chipset, no on-chip memory controller.

-- 
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg

Tel: ++49 4131 677 1309



* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
       [not found]                         ` <19825_1315548082_p8961G08009635_4E69AB53.5010702@leuphana.de>
@ 2011-09-09  9:18                           ` Andreas Olsowski
  0 siblings, 0 replies; 18+ messages in thread
From: Andreas Olsowski @ 2011-09-09  9:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel



>>> On Thu, Sep 08, 2011 at 01:32:12PM -0400, Konrad Rzeszutek Wilk wrote:
>>> Oh wait. At some point you said that 2.6.32.43 worked for you.. Is
>>> that still
>>> the case?
I tested 2.6.32.43 and 2.6.32.40 (to be sure) again; they don't work either.


>> Can you please try one thing for me - can you make sure the boxes have
>> exact same
>> amount of memory? You can do 'mem=X' on the Xen hypervisor line to set
>> that.
> Running mem=8g and have turned balooning dom0 off.
>
> multiboot /boot/xen.gz placeholder dom0_mem=8192M
> module /boot/vmlinuz-2.6.32.45-xen0 placeholder
> root=UUID=216ff902-b505-45c4-9bcb-9d63b4cb8992 ro mem=8G nomodeset
> console=tty0 console=ttyS1,57600 earlyprintk=xen
>
>
> For some reason though, the two r610s show:
> root@netcatarina:~# cat /proc/meminfo
> MemTotal: 8378236 kB
> root@netcatarina:~# xl list |grep Domain-0
> Domain-0 0 7445 8 r----- 124304.7
>
> root@memoryana:~# cat /proc/meminfo
> MemTotal: 8378236 kB
> root@memoryana:~# xl list |grep Domain-0
> Domain-0 0 7445 8 r----- 132125.0
>
> wheras the r710:
> root@tarballerina:~# cat /proc/meminfo
> MemTotal: 7886716 kB
> root@tarballerina:~# xl list |grep Domain-0
> Domain-0 0 7221 8 r----- 64497.0
After a reboot it went back up to 8378236 kB.
I don't understand why the dom0 memory sometimes changes.

The two 32GB PE2950s show 8378236 kB after boot and then drop to something
like 6575996 kB.

The R610s stay at 8378236 kB, always.





-- 
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg

Tel: ++49 4131 677 1309



* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2.. between different physical machines and CPUs.
  2011-09-09  5:59                         ` Andreas Olsowski
@ 2011-09-12 16:47                           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 18+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-12 16:47 UTC (permalink / raw)
  To: Andreas Olsowski; +Cc: xen-devel

> This does not neccessarily have sth to todo with the amount of memory.
> I do see this on hosts where both have the same amount of ram but
> are a different hardware platform.

<nods> Let me modify the subject a bit to reflect this.

> >I think the problem you are running into is that you are migrating between
> >different CPU families... Is the /proc/cpuinfo drastically different between
> >the boxes?
> diff:
> < model		: 26
> < model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
> < stepping	: 5
> < cpu MHz		: 2261.074
> < cache size	: 8192 KB
> ---
> > model		: 44
> > model name	: Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz
> > stepping	: 2
> > cpu MHz		: 2660.050
> > cache size	: 12288 KB
> 13,14c13,14
> < flags		: fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat
> clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
> rep_good nonstop_tsc aperfmperf pni est ssse3 cx16 sse4_1 sse4_2
> popcnt hypervisor lahf_lm ida
> < bogomips	: 4522.14
> ---
> > flags		: fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat
> clflush acpi mmx fxsr sse sse2 ss ht syscall lm constant_tsc
> rep_good nonstop_tsc aperfmperf pni pclmulqdq est ssse3 cx16 sse4_1
> sse4_2 popcnt aes hypervisor lahf_lm ida arat
> > bogomips	: 5320.10
> 
> diffrent flags are: nx and aes

On the Linux command line, try using 'noexec=off' - that should
take care of the 'nx' bit.

The aes: the 'xl' command has a somewhat easier syntax for setting the CPUID:

cpuid='host,family=15,model=26,stepping=5,aes=s'

That ought to take care of that. I don't really understand how
the old 'cpuid=['...']' syntax worked (the one that 'xm' used).
It looks quite arcane - so I think doing some Google search is the
only way to figure that out.
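
To be clear about where that line goes: it is just another entry in the xl
guest config file, something like this (file name hypothetical, values as in
the example above):

  # /etc/xen/testvm.cfg - the cpuid= line sits next to the usual guest
  # options (name, memory, disk, vif, ...)
  cpuid = 'host,family=15,model=26,stepping=5,aes=s'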

But co-workers of mine remind me that the CPUID instruction is trapped by
the hypervisor (both HVM and PV - PV via a special opcode; look in
arch/x86/include/asm/xen/interface.h for details) for the kernel _only_.
There is no such guarantee for applications. Meaning that if an application
uses 'cpuid' to figure out whether 'aes' is available, instead of reading
/proc/cpuinfo, it _will_ see 'aes' on one machine.

This problem of an application using CPUID and not getting the filtered
value does not exist with HVM guests - there the CPUID instruction is
trapped regardless of whether it is running in the kernel or in user-land.


* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
       [not found] ` <1676_1313808015_p7K2e91M024610_4E4F1DFC.2000909@leuphana.de>
@ 2011-08-20  3:49   ` Andreas Olsowski
  0 siblings, 0 replies; 18+ messages in thread
From: Andreas Olsowski @ 2011-08-20  3:49 UTC (permalink / raw)
  To: xen-devel



On 20.08.2011 04:37, Andreas Olsowski wrote:

> Next i will check xen4.2, maybe the results are different.

No they are not.


But while trying to find the last 2.6.32.x kernel that boots bare-metal, I
found out that this migration problem does NOT exist in 2.6.32.43!

I will try 2.6.32.44 and 2.6.33.42 tomorrow; too tired now.



* Re: xen/stable-2.6.32.x xen-4.1.1 live migration fails with kernels 2.6.39, 3.0.3 and 3.1-rc2
       [not found] <1676_1313776768_p7JHxMh3027661_4E4EA3E2.2040809@leuphana.de>
@ 2011-08-20  2:37 ` Andreas Olsowski
       [not found] ` <1676_1313808015_p7K2e91M024610_4E4F1DFC.2000909@leuphana.de>
  1 sibling, 0 replies; 18+ messages in thread
From: Andreas Olsowski @ 2011-08-20  2:37 UTC (permalink / raw)
  To: xen-devel



I have tested Linux 3.0.3 as the dom0 kernel now and it has the same problem.

Migration of HVM guests also does not work, and the kernel of the HVM guest
shows the same output as my PV domUs.

I took another look at my dom0 kernel .config after make oldconfig'ing
it for 3.0.3.
I now have every possible XEN flag set in the kernel:
http://pastebin.com/YxB8mkSU
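
A quick way to see the relevant bits without opening the pastebin (just a
sketch; run it in the kernel source tree):

  grep '^CONFIG_XEN' .config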


Next I will check xen-4.2; maybe the results are different.

-- 
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg

Tel: ++49 4131 677 1309


