migration of pv guest fails from small to large host

* migration of pv guest fails from small to large host
@ 2011-07-01 10:41 Olaf Hering
  2011-07-01 16:20 ` Olaf Hering
  2011-07-12 16:43 ` [PATCH] xen: update machine_to_phys_order on resume Olaf Hering
  0 siblings, 2 replies; 12+ messages in thread
From: Olaf Hering @ 2011-07-01 10:41 UTC (permalink / raw)
  To: xen-devel

This issue was initially reported to happen on differently sized HP ProLiant
systems running SLES11SP1 on dom0 and domU.

Migration of pv guests fails: the guest crashes on the target host once it
is unpaused after transit. It happens when the guest is started on a
small system, then migrated from that small system to a large system.
If the guest is started on a large system, then migrated to a small system and
back to the large system, the migration will be successful.

The symptoms on the target host differ with the systems I have access to,
which are listed below. It is not possible to take a core dump.
The pv guest has one vcpu and 256MB, one network interface and a disk.

I currently have no idea what to look for. The xenctx patch for dumping
pagetables showed no differences between src/dst guest after transit to the
target host (I have to verify this on my hosts).

involved hardware:

bolen: ProLiant DL580 G7, 32GB, CPU E7540 @ 2.00GHz
falla: ProLiant DL360 G6, 8GB, CPU E5540 @ 2.53GHz 
drnek: ProLiant DL170h G6, 6GB, CPU E5504 @ 2.00GHz
gubaidulina: Intel SDV S3E37, 192GB, CPU 000 @ 2.40GHz (unknown cpu 0x206f1)

(Other target hosts from different vendors with large amounts of memory were reported to fail as well.)
I am still trying to test a non-HP system as source host.

involved software:
host: sles11sp1, xen 4.0. Also xen-unstable 4.2 hg rev23640
pv guest: sles11sp1

migration with this command on bolen, falla, drnek:
"xm migrate sles11sp1_para_1 gubaidulina" fails on gubaidulina:

[2011-06-30 21:21:32 21858] WARNING (XendDomainInfo:2061) Domain has crashed: name=sles11sp1_para_1 id=1.
[2011-06-30 21:21:32 21858] ERROR (XendDomainInfo:2318) core dump failed: id = 1 name = sles11sp1_para_1: (1, 'Internal error', "Couldn't map p2m_frame_list_list (errno 1) (1 = Operation not permitted)")
[2011-06-30 21:21:32 21858] DEBUG (XendDomainInfo:3084) XendDomainInfo.destroy: domid=1
[2011-06-30 21:21:32 21858] DEBUG (XendDomainInfo:2403) Destroying device model
[2011-06-30 21:21:32 21858] INFO (image:702) sles11sp1_para_1 device model terminated

xm dmesg shows no errors.

notes from a "bisect" with limiting Xen memory:
gubaidulina booted with mem=64G, migration from bolen succeeds.
gubaidulina booted with mem=96G, migration from bolen fails.
gubaidulina booted with mem=80G, migration from bolen fails.
gubaidulina booted with mem=72G, migration from bolen fails.
gubaidulina booted with mem=68G, migration from bolen fails.
gubaidulina booted with mem=65G, migration from bolen succeeds.
now testing more after migration:
gubaidulina booted with mem=66G, migration from bolen fails, no coredump message, no coredump
gubaidulina booted with mem=66G, second migration from bolen succeeds. xm shutdown crashes guest, no coredump
gubaidulina booted with mem=65G, migration from bolen succeeds. xm shutdown succeeds
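
The runs above can be tabulated to narrow down where the behaviour changes. This is only a minimal sketch for summarizing the results, not a diagnosis. The names RUNS, summarize and frames are made up for illustration, and it assumes that the mem= suffix G means GiB and that the hosts use 4 KiB pages.

```python
# Bisect runs from the list above: (Xen mem= limit in GiB, migration succeeded?)
RUNS = [
    (64, True),
    (96, False),
    (80, False),
    (72, False),
    (68, False),
    (65, True),
    (66, False),  # first attempt
    (66, True),   # second attempt; the later xm shutdown crashed the guest
    (65, True),
]

PAGE_SIZE = 4096  # assumption: 4 KiB x86 pages


def summarize(runs):
    """Return (largest limit with only successes, smallest limit with any failure)."""
    failed = {mem for mem, ok in runs if not ok}
    clean = {mem for mem, ok in runs if ok} - failed
    return max(clean), min(failed)


def frames(mem_gib):
    """Number of machine page frames covered by a mem= limit given in GiB."""
    return mem_gib * 2**30 // PAGE_SIZE


if __name__ == "__main__":
    good, bad = summarize(RUNS)
    print(f"largest clean limit:    {good}G ({frames(good)} frames)")
    print(f"smallest failing limit: {bad}G ({frames(bad)} frames)")
```

With the data above, the change in behaviour sits between mem=65G and mem=66G. Under the stated assumptions, a 64G limit corresponds to exactly 2^24 machine frames, so the failing configurations are all somewhat above that count.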

Olaf
