All of lore.kernel.org
 help / color / mirror / Atom feed
* live migration fails (assert in shadow_hash_delete)
@ 2010-02-23  8:57 Ashish Bijlani
  2010-02-23  9:25 ` Tim Deegan
  0 siblings, 1 reply; 22+ messages in thread
From: Ashish Bijlani @ 2010-02-23  8:57 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 3670 bytes --]

Hi,

I'm working on a project that requires live migration of a 64-bit PV
VM (on a 64-bit platform). "xm save"  and "xm restore" work fine.
However, live migration fails with the following err msg:

mapping kernel into physical memory
about to get started...
(XEN) traps.c:2306:d3 Domain attempted WRMSR 000000000000008b from
00000a07:00000000 to 00000000:000000.
(XEN) Assertion 'x' failed at common.c:2139
(XEN) ----[ Xen-4.0.0-rc3-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c4801c8a08>] shadow_hash_delete+0x12e/0x18c
(XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
(XEN) rax: ffff8300040e2770   rbx: ffff830223ce0000   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff82f60443c8a0
(XEN) rbp: ffff82c4802efb48   rsp: ffff82c4802efb18   r8:  ffff82f600000000
(XEN) r9:  0000000000000000   r10: ffff830223ce0000   r11: 00000000000041c5
(XEN) r12: 0000000000221e45   r13: 00000000000000ec   r14: ffff82f600000000
(XEN) r15: ffff8300cfaea000   cr0: 0000000080050033   cr4: 00000000000006f0
(XEN) cr3: 0000000210154000   cr2: ffff8801dd5508c8
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802efb18:
(XEN)    3333333333333333 00000000000041c5 ffff82c4802efb32 000000000000000d
(XEN)    00000000000041c5 ffff8300cfaea000 ffff82c4802efba8 ffff82c4801e5d2e
(XEN)    0000005600ed79a0 0000000800000000 0000000100000000 0000000000221e45
(XEN)    0000000000000008 0000000000000000 ffff82f60443c8a0 ffff82f60404ab00
(XEN)    ffff82f600000000 ffff8300cfaea000 ffff82c4802efbd8 ffff82c4801c766d
(XEN)    ffff82c4802efc18 0000000000000282 0000000000000281 0000000000221e45
(XEN)    ffff82c4802efc28 ffff82c4801cb18f 000000000f69d1d0 ffff830223ce0000
(XEN)    ffff82c4802efc18 ffff830223ce0000 ffff82c4802eff28 ffff830223ce0e28
(XEN)    0000000000000002 ffff8300040de000 ffff82c4802efc58 ffff82c4801cba8d
(XEN)    0000000000000282 ffff82c4802efe58 0000000000010000 0000000000008000
(XEN)    ffff82c4802efce8 ffff82c4801bb394 0000000100000000 ffff8302236e8000
(XEN)    ffff830223ce0f08 ffff8300040e1000 00000001802efd48 ffff82c48031f640
(XEN)    ffff830223ce0000 0000000100000001 ffff82c4802eff28 ffff8300040e0000
(XEN)    ffff82c4802efce8 ffff830223ce0000 ffff82c4802efe58 00007fff0f69d1d0
(XEN)    ffff82c4802efe48 0000000000000000 ffff82c4802efd08 ffff82c4801bb56a
(XEN)    fffffffffffffff3 0000000000f71000 ffff82c4802efdc8 ffff82c48014796c
(XEN)    ffff82c4802efd28 ffff82c48016b0d4 ffff82c4802efd48 ffff82c48011dce7
(XEN)    0000000000000008 ffff82c480163d8c ffff82c4802efd68 ffff82c480118755
(XEN)    0000000000000008 ffff8300cfafa000 ffff82c4802efdc8 0000000000000286
(XEN)    ffff82c4802efd98 0000000000000286 ffff82c4802eff28 ffff82c4802eff28
(XEN) Xen call trace:
(XEN)    [<ffff82c4801c8a08>] shadow_hash_delete+0x12e/0x18c
(XEN)    [<ffff82c4801e5d2e>] sh_destroy_l4_shadow__guest_4+0xb5/0x371
(XEN)    [<ffff82c4801c766d>] sh_destroy_shadow+0x17d/0x1ad
(XEN)    [<ffff82c4801cb18f>] shadow_blow_tables+0x20b/0x302
(XEN)    [<ffff82c4801cba8d>] shadow_clean_dirty_bitmap+0xba/0x10a
(XEN)    [<ffff82c4801bb394>] paging_log_dirty_op+0x506/0x58c
(XEN)    [<ffff82c4801bb56a>] paging_domctl+0x150/0x181
(XEN)    [<ffff82c48014796c>] arch_do_domctl+0x5c/0x1f64
(XEN)    [<ffff82c4801053b3>] do_domctl+0x1169/0x11e6
(XEN)    [<ffff82c4801f11bf>] syscall_enter+0xef/0x149
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'x' failed at common.c:2139
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

Any ideas what could be wrong here.

Thanks,
Ashish

[-- Attachment #2: xend-config.sxp1 --]
[-- Type: application/octet-stream, Size: 10155 bytes --]

# -*- sh -*-

#
# Xend configuration file.
#

# This example configuration is appropriate for an installation that 
# utilizes a bridged network configuration. Access to xend via http
# is disabled.  

# Commented out entries show the default for that entry, unless otherwise
# specified.

#(logfile /var/log/xen/xend.log)
#(loglevel DEBUG)

# Uncomment the line below.  Set the value to flask, acm, or dummy to 
# select a security module.

#(xsm_module_name dummy)

# The Xen-API server configuration.
#
# This value configures the ports, interfaces, and access controls for the
# Xen-API server.  Each entry in the list starts with either unix, a port
# number, or an address:port pair.  If this is "unix", then a UDP socket is
# opened, and this entry applies to that.  If it is a port, then Xend will
# listen on all interfaces on that TCP port, and if it is an address:port
# pair, then Xend will listen on the specified port, using the interface with
# the specified address.
#
# The subsequent string configures the user-based access control for the
# listener in question.  This can be one of "none" or "pam", indicating either
# that users should be allowed access unconditionally, or that the local
# Pluggable Authentication Modules configuration should be used.  If this
# string is missing or empty, then "pam" is used.
#
# The final string gives the host-based access control for that listener. If
# this is missing or empty, then all connections are accepted.  Otherwise,
# this should be a space-separated sequence of regular expressions; any host
# with a fully-qualified domain name or an IP address that matches one of
# these regular expressions will be accepted.
#
# Example: listen on TCP port 9363 on all interfaces, accepting connections
# only from machines in example.com or localhost, and allow access through
# the unix domain socket unconditionally:
#
#   (xen-api-server ((9363 pam '^localhost$ example\\.com$')
#                    (unix none)))
#
# Optionally, the TCP Xen-API server can use SSL by specifying the private
# key and certificate location:
#
#                    (9367 pam '' xen-api.key xen-api.crt)
#
# Default:
#   (xen-api-server ((unix)))


(xend-http-server yes)
#(xend-unix-server yes)
#(xend-tcp-xmlrpc-server yes)
#(xend-unix-xmlrpc-server yes)
(xend-relocation-server yes)
#(xend-relocation-ssl-server no)
#(xend-udev-event-server no)

#(xend-unix-path /var/lib/xend/xend-socket)


# Address and port xend should use for the legacy TCP XMLRPC interface, 
# if xend-tcp-xmlrpc-server is set.
#(xend-tcp-xmlrpc-server-address 'localhost')
#(xend-tcp-xmlrpc-server-port 8006)

# SSL key and certificate to use for the legacy TCP XMLRPC interface.
# Setting these will mean that this port serves only SSL connections as
# opposed to plaintext ones.
#(xend-tcp-xmlrpc-server-ssl-key-file  xmlrpc.key)
#(xend-tcp-xmlrpc-server-ssl-cert-file xmlrpc.crt)


# Port xend should use for the HTTP interface, if xend-http-server is set.
(xend-port            8000)

# Port xend should use for the relocation interface, if xend-relocation-server
# is set.
(xend-relocation-port 8002)

# Port xend should use for the ssl relocation interface, if
# xend-relocation-ssl-server is set.
#(xend-relocation-ssl-port 8003)

# SSL key and certificate to use for the ssl relocation interface, if
# xend-relocation-ssl-server is set.
#(xend-relocation-server-ssl-key-file   xmlrpc.key)
#(xend-relocation-server-ssl-cert-file  xmlrpc.crt)

# Whether to use ssl as default when relocating.
#(xend-relocation-ssl no)

# Address xend should listen on for HTTP connections, if xend-http-server is
# set.
# Specifying 'localhost' prevents remote connections.
# Specifying the empty string '' (the default) allows all connections.
(xend-address '')
#(xend-address localhost)

# Address xend should listen on for relocation-socket connections, if
# xend-relocation-server is set.
# Meaning and default as for xend-address above.
(xend-relocation-address '')

# The hosts allowed to talk to the relocation port.  If this is empty (the
# default), then all connections are allowed (assuming that the connection
# arrives on a port and interface on which we are listening; see
# xend-relocation-port and xend-relocation-address above).  Otherwise, this
# should be a space-separated sequence of regular expressions.  Any host with
# a fully-qualified domain name or an IP address that matches one of these
# regular expressions will be accepted.
#
# For example:
#  (xend-relocation-hosts-allow '^localhost$ ^.*\\.example\\.org$')
#
(xend-relocation-hosts-allow '')
#(xend-relocation-hosts-allow '^localhost$ ^localhost\\.localdomain$')

# The limit (in kilobytes) on the size of the console buffer
#(console-limit 1024)

##
# To bridge network traffic, like this:
#
# dom0: ----------------- bridge -> real eth0 -> the network
#                            |
# domU: fake eth0 -> vifN.0 -+
#
# use
#
# (network-script network-bridge)
#
# Your default ethernet device is used as the outgoing interface, by default. 
# To use a different one (e.g. eth1) use
#
# (network-script 'network-bridge netdev=eth1')
#
# The bridge is named xenbr0, by default.  To rename the bridge, use
#
# (network-script 'network-bridge bridge=<name>')
#
# It is possible to use the network-bridge script in more complicated
# scenarios, such as having two outgoing interfaces, with two bridges, and
# two fake interfaces per guest domain.  To do things like this, write
# yourself a wrapper script, and call network-bridge from it, as appropriate.
#
(network-script network-bridge)

# The script used to control virtual interfaces.  This can be overridden on a
# per-vif basis when creating a domain or a configuring a new vif.  The
# vif-bridge script is designed for use with the network-bridge script, or
# similar configurations.
#
# If you have overridden the bridge name using
# (network-script 'network-bridge bridge=<name>') then you may wish to do the
# same here.  The bridge name can also be set when creating a domain or
# configuring a new vif, but a value specified here would act as a default.
#
# If you are using only one bridge, the vif-bridge script will discover that,
# so there is no need to specify it explicitly.
#
(vif-script vif-bridge)


## Use the following if network traffic is routed, as an alternative to the
# settings for bridged networking given above.
#(network-script network-route)
#(vif-script     vif-route)


## Use the following if network traffic is routed with NAT, as an alternative
# to the settings for bridged networking given above.
#(network-script network-nat)
#(vif-script     vif-nat)

# dom0-min-mem is the lowest permissible memory level (in MB) for dom0.
# This is a minimum both for auto-ballooning (as enabled by
# enable-dom0-ballooning below) and for xm mem-set when applied to dom0.
(dom0-min-mem 196)

# Whether to enable auto-ballooning of dom0 to allow domUs to be created.
# If enable-dom0-ballooning = no, dom0 will never balloon out.
(enable-dom0-ballooning yes)

# 32-bit paravirtual domains can only consume physical
# memory below 168GB. On systems with memory beyond that address,
# they'll be confined to memory below 128GB.
# Using total_available_memory (in GB) to specify the amount of memory reserved
# in the memory pool exclusively for 32-bit paravirtual domains.
# Additionally you should use dom0_mem = <-Value> as a parameter in 
# xen kernel to reserve the memory for 32-bit paravirtual domains, default 
# is "0" (0GB).  
(total_available_memory 0) 

# In SMP system, dom0 will use dom0-cpus # of CPUS
# If dom0-cpus = 0, dom0 will take all cpus available
(dom0-cpus 0)

# Whether to enable core-dumps when domains crash.
#(enable-dump no)

# The tool used for initiating virtual TPM migration
#(external-migration-tool '')

# The interface for VNC servers to listen on. Defaults
# to 127.0.0.1  To restore old 'listen everywhere' behaviour
# set this to 0.0.0.0
(vnc-listen '0.0.0.0')

# The default password for VNC console on HVM domain.
# Empty string is no authentication.
(vncpasswd '')

# The VNC server can be told to negotiate a TLS session
# to encryption all traffic, and provide x509 cert to
# clients enabling them to verify server identity. The
# GTK-VNC widget, virt-viewer, virt-manager and VeNCrypt
# all support the VNC extension for TLS used in QEMU. The
# TightVNC/RealVNC/UltraVNC clients do not.
#
# To enable this create x509 certificates / keys in the
# directory ${XEN_CONFIG_DIR} + vnc
#
#  ca-cert.pem       - The CA certificate
#  server-cert.pem   - The Server certificate signed by the CA
#  server-key.pem    - The server private key
#
# and then uncomment this next line
# (vnc-tls 1)

# The certificate dir can be pointed elsewhere..
#
# (vnc-x509-cert-dir vnc)

# The server can be told to request & validate an x509
# certificate from the client. Only clients with a cert
# signed by the trusted CA will be able to connect. This
# is more secure the password auth alone. Passwd auth can
# used at the same time if desired. To enable client cert
# checking uncomment this:
#
# (vnc-x509-verify 1)

# The default keymap to use for the VM's virtual keyboard
# when not specififed in VM's configuration
#(keymap 'en-us')

# Script to run when the label of a resource has changed.
#(resource-label-change-script '')

# Rotation count of qemu-dm log file.
#(qemu-dm-logrotate-count 10)

# Path where persistent domain configuration is stored.
# Default is /var/lib/xend/domains/
#(xend-domains-path /var/lib/xend/domains)

# Number of seconds xend will wait for device creation and
# destruction
#(device-create-timeout 100)
#(device-destroy-timeout 100)

# When assigning device to HVM guest, we use the strict check for HVM guest by
# default. (For PV guest, we use loose check automatically if necessary.)
# When we assign device to HVM guest, if we meet with the co-assignment
# issues or the ACS issue, we could try changing the option to 'no' -- however,
# we have to realize this may incur security issue and we can't make sure the
# device assignment could really work properly even after we do this.
#(pci-passthrough-strict-check yes)

[-- Attachment #3: xend-config.sxp2 --]
[-- Type: application/octet-stream, Size: 10158 bytes --]

# -*- sh -*-

#
# Xend configuration file.
#

# This example configuration is appropriate for an installation that 
# utilizes a bridged network configuration. Access to xend via http
# is disabled.  

# Commented out entries show the default for that entry, unless otherwise
# specified.

#(logfile /var/log/xen/xend.log)
#(loglevel DEBUG)

# Uncomment the line below.  Set the value to flask, acm, or dummy to 
# select a security module.

#(xsm_module_name dummy)

# The Xen-API server configuration.
#
# This value configures the ports, interfaces, and access controls for the
# Xen-API server.  Each entry in the list starts with either unix, a port
# number, or an address:port pair.  If this is "unix", then a UDP socket is
# opened, and this entry applies to that.  If it is a port, then Xend will
# listen on all interfaces on that TCP port, and if it is an address:port
# pair, then Xend will listen on the specified port, using the interface with
# the specified address.
#
# The subsequent string configures the user-based access control for the
# listener in question.  This can be one of "none" or "pam", indicating either
# that users should be allowed access unconditionally, or that the local
# Pluggable Authentication Modules configuration should be used.  If this
# string is missing or empty, then "pam" is used.
#
# The final string gives the host-based access control for that listener. If
# this is missing or empty, then all connections are accepted.  Otherwise,
# this should be a space-separated sequence of regular expressions; any host
# with a fully-qualified domain name or an IP address that matches one of
# these regular expressions will be accepted.
#
# Example: listen on TCP port 9363 on all interfaces, accepting connections
# only from machines in example.com or localhost, and allow access through
# the unix domain socket unconditionally:
#
#   (xen-api-server ((9363 pam '^localhost$ example\\.com$')
#                    (unix none)))
#
# Optionally, the TCP Xen-API server can use SSL by specifying the private
# key and certificate location:
#
#                    (9367 pam '' xen-api.key xen-api.crt)
#
# Default:
#   (xen-api-server ((unix)))


(xend-http-server yes)
#(xend-unix-server yes)
#(xend-tcp-xmlrpc-server yes)
#(xend-unix-xmlrpc-server yes)
(xend-relocation-server yes)
#(xend-relocation-ssl-server no)
#(xend-udev-event-server no)

#(xend-unix-path /var/lib/xend/xend-socket)


# Address and port xend should use for the legacy TCP XMLRPC interface, 
# if xend-tcp-xmlrpc-server is set.
#(xend-tcp-xmlrpc-server-address 'localhost')
#(xend-tcp-xmlrpc-server-port 8006)

# SSL key and certificate to use for the legacy TCP XMLRPC interface.
# Setting these will mean that this port serves only SSL connections as
# opposed to plaintext ones.
#(xend-tcp-xmlrpc-server-ssl-key-file  xmlrpc.key)
#(xend-tcp-xmlrpc-server-ssl-cert-file xmlrpc.crt)


# Port xend should use for the HTTP interface, if xend-http-server is set.
(xend-port            8000)

# Port xend should use for the relocation interface, if xend-relocation-server
# is set.
(xend-relocation-port 8002)

# Port xend should use for the ssl relocation interface, if
# xend-relocation-ssl-server is set.
#(xend-relocation-ssl-port 8003)

# SSL key and certificate to use for the ssl relocation interface, if
# xend-relocation-ssl-server is set.
#(xend-relocation-server-ssl-key-file   xmlrpc.key)
#(xend-relocation-server-ssl-cert-file  xmlrpc.crt)

# Whether to use ssl as default when relocating.
#(xend-relocation-ssl no)

# Address xend should listen on for HTTP connections, if xend-http-server is
# set.
# Specifying 'localhost' prevents remote connections.
# Specifying the empty string '' (the default) allows all connections.
(xend-address '')
#(xend-address localhost)

# Address xend should listen on for relocation-socket connections, if
# xend-relocation-server is set.
# Meaning and default as for xend-address above.
(xend-relocation-address '')

# The hosts allowed to talk to the relocation port.  If this is empty (the
# default), then all connections are allowed (assuming that the connection
# arrives on a port and interface on which we are listening; see
# xend-relocation-port and xend-relocation-address above).  Otherwise, this
# should be a space-separated sequence of regular expressions.  Any host with
# a fully-qualified domain name or an IP address that matches one of these
# regular expressions will be accepted.
#
# For example:
#  (xend-relocation-hosts-allow '^localhost$ ^.*\\.example\\.org$')
#
(xend-relocation-hosts-allow '')
#(xend-relocation-hosts-allow '^localhost$ ^localhost\\.localdomain$')

# The limit (in kilobytes) on the size of the console buffer
#(console-limit 1024)

##
# To bridge network traffic, like this:
#
# dom0: ----------------- bridge -> real eth0 -> the network
#                            |
# domU: fake eth0 -> vifN.0 -+
#
# use
#
# (network-script network-bridge)
#
# Your default ethernet device is used as the outgoing interface, by default. 
# To use a different one (e.g. eth1) use
#
# (network-script 'network-bridge netdev=eth1')
#
# The bridge is named xenbr0, by default.  To rename the bridge, use
#
# (network-script 'network-bridge bridge=<name>')
#
# It is possible to use the network-bridge script in more complicated
# scenarios, such as having two outgoing interfaces, with two bridges, and
# two fake interfaces per guest domain.  To do things like this, write
# yourself a wrapper script, and call network-bridge from it, as appropriate.
#
(network-script network-bridge)

# The script used to control virtual interfaces.  This can be overridden on a
# per-vif basis when creating a domain or a configuring a new vif.  The
# vif-bridge script is designed for use with the network-bridge script, or
# similar configurations.
#
# If you have overridden the bridge name using
# (network-script 'network-bridge bridge=<name>') then you may wish to do the
# same here.  The bridge name can also be set when creating a domain or
# configuring a new vif, but a value specified here would act as a default.
#
# If you are using only one bridge, the vif-bridge script will discover that,
# so there is no need to specify it explicitly.
#
(vif-script vif-bridge)


## Use the following if network traffic is routed, as an alternative to the
# settings for bridged networking given above.
#(network-script network-route)
#(vif-script     vif-route)


## Use the following if network traffic is routed with NAT, as an alternative
# to the settings for bridged networking given above.
#(network-script network-nat)
#(vif-script     vif-nat)

# dom0-min-mem is the lowest permissible memory level (in MB) for dom0.
# This is a minimum both for auto-ballooning (as enabled by
# enable-dom0-ballooning below) and for xm mem-set when applied to dom0.
(dom0-min-mem 196)

# Whether to enable auto-ballooning of dom0 to allow domUs to be created.
# If enable-dom0-ballooning = no, dom0 will never balloon out.
(enable-dom0-ballooning yes)

# 32-bit paravirtual domains can only consume physical
# memory below 168GB. On systems with memory beyond that address,
# they'll be confined to memory below 128GB.
# Using total_available_memory (in GB) to specify the amount of memory reserved
# in the memory pool exclusively for 32-bit paravirtual domains.
# Additionally you should use dom0_mem = <-Value> as a parameter in 
# xen kernel to reserve the memory for 32-bit paravirtual domains, default 
# is "0" (0GB).  
(total_available_memory 0) 

# In SMP system, dom0 will use dom0-cpus # of CPUS
# If dom0-cpus = 0, dom0 will take all cpus available
(dom0-cpus 0)

# Whether to enable core-dumps when domains crash.
#(enable-dump no)

# The tool used for initiating virtual TPM migration
#(external-migration-tool '')

# The interface for VNC servers to listen on. Defaults
# to 127.0.0.1  To restore old 'listen everywhere' behaviour
# set this to 0.0.0.0
#(vnc-listen '127.0.0.1')

# The default password for VNC console on HVM domain.
# Empty string is no authentication.
(vncpasswd '')

# The VNC server can be told to negotiate a TLS session
# to encryption all traffic, and provide x509 cert to
# clients enabling them to verify server identity. The
# GTK-VNC widget, virt-viewer, virt-manager and VeNCrypt
# all support the VNC extension for TLS used in QEMU. The
# TightVNC/RealVNC/UltraVNC clients do not.
#
# To enable this create x509 certificates / keys in the
# directory ${XEN_CONFIG_DIR} + vnc
#
#  ca-cert.pem       - The CA certificate
#  server-cert.pem   - The Server certificate signed by the CA
#  server-key.pem    - The server private key
#
# and then uncomment this next line
# (vnc-tls 1)

# The certificate dir can be pointed elsewhere..
#
# (vnc-x509-cert-dir vnc)

# The server can be told to request & validate an x509
# certificate from the client. Only clients with a cert
# signed by the trusted CA will be able to connect. This
# is more secure the password auth alone. Passwd auth can
# used at the same time if desired. To enable client cert
# checking uncomment this:
#
# (vnc-x509-verify 1)

# The default keymap to use for the VM's virtual keyboard
# when not specififed in VM's configuration
#(keymap 'en-us')

# Script to run when the label of a resource has changed.
#(resource-label-change-script '')

# Rotation count of qemu-dm log file.
#(qemu-dm-logrotate-count 10)

# Path where persistent domain configuration is stored.
# Default is /var/lib/xend/domains/
#(xend-domains-path /var/lib/xend/domains)

# Number of seconds xend will wait for device creation and
# destruction
#(device-create-timeout 100)
#(device-destroy-timeout 100)

# When assigning device to HVM guest, we use the strict check for HVM guest by
# default. (For PV guest, we use loose check automatically if necessary.)
# When we assign device to HVM guest, if we meet with the co-assignment
# issues or the ACS issue, we could try changing the option to 'no' -- however,
# we have to realize this may incur security issue and we can't make sure the
# device assignment could really work properly even after we do this.
#(pci-passthrough-strict-check yes)

[-- Attachment #4: xmexample1 --]
[-- Type: application/octet-stream, Size: 7523 bytes --]

#  -*- mode: python; -*-
#============================================================================
# Python configuration setup for 'xm create'.
# This script sets the parameters used when a domain is created using 'xm create'.
# You use a separate script for each domain you want to create, or 
# you can set the parameters for the domain on the xm command line.
#============================================================================

#----------------------------------------------------------------------------
# Kernel image file.
kernel = "/nfs/vmlinuz-2.6.31.6"

# Optional ramdisk.
#ramdisk = "/boot/initrd.gz"

# The domain build function. Default is 'linux'.
#builder='linux'

# Initial memory allocation (in megabytes) for the new domain.
#
# WARNING: Creating a domain with insufficient memory may cause out of
#          memory errors. The domain needs enough memory to boot kernel
#          and modules. Allocating less than 32MBs is not recommended.
memory = 64

# A name for your domain. All domains must have different names.
name = "ExampleDomain"

# 128-bit UUID for the domain.  The default behavior is to generate a new UUID
# on each call to 'xm create'.
#uuid = "06ed00fe-1162-4fc4-b5d8-11993ee4a8b9"

# List of which CPUS this domain is allowed to use, default Xen picks
#cpus = ""         # leave to Xen to pick
#cpus = "0"        # all vcpus run on CPU0
#cpus = "0-3,5,^1" # all vcpus run on cpus 0,2,3,5
#cpus = ["2", "3"] # VCPU0 runs on CPU2, VCPU1 runs on CPU3

# Number of Virtual CPUS to use, default is 1
#vcpus = 1

#----------------------------------------------------------------------------
# Define network interfaces.

# By default, no network interfaces are configured.  You may have one created
# with sensible defaults using an empty vif clause:
#
# vif = [ '' ]
#
# or optionally override backend, bridge, ip, mac, script, type, or vifname:
#
# vif = [ 'mac=00:16:3e:00:00:11, bridge=xenbr0' ]
#
# or more than one interface may be configured:
#
# vif = [ '', 'bridge=xenbr1' ]

vif = [ '' ]

#----------------------------------------------------------------------------
# Define the disk devices you want the domain to have access to, and
# what you want them accessible as.
# Each disk entry is of the form phy:UNAME,DEV,MODE
# where UNAME is the device, DEV is the device name the domain will see,
# and MODE is r for read-only, w for read-write.

disk = [ 'tap:aio:/nfs/disk.img,xvda,w' ]

#----------------------------------------------------------------------------
# Define frame buffer device.
#
# By default, no frame buffer device is configured.
#
# To create one using the SDL backend and sensible defaults:
#
# vfb = [ 'sdl=1' ]
#
# This uses environment variables XAUTHORITY and DISPLAY.  You
# can override that:
#
#vfb = [ 'sdl=1,xauthority=/home/ashish/.Xauthority,display=:0' ]
#
# To create one using the VNC backend and sensible defaults:
#
# vfb = [ 'vnc=1' ]
#
# The backend listens on 127.0.0.1 port 5900+N by default, where N is
# the domain ID.  You can override both address and N:
#
# vfb = [ 'vnc=1,vnclisten=127.0.0.1,vncdisplay=1' ]
#
# Or you can bind the first unused port above 5900:
#
# vfb = [ 'vnc=1,vnclisten=0.0.0.0,vncunused=1' ]
#
# You can override the password:
#
# vfb = [ 'vnc=1,vncpasswd=MYPASSWD' ]
#
# Empty password disables authentication.  Defaults to the vncpasswd
# configured in xend-config.sxp.

#----------------------------------------------------------------------------
# Define to which TPM instance the user domain should communicate.
# The vtpm entry is of the form 'instance=INSTANCE,backend=DOM'
# where INSTANCE indicates the instance number of the TPM the VM
# should be talking to and DOM provides the domain where the backend
# is located.
# Note that no two virtual machines should try to connect to the same
# TPM instance. The handling of all TPM instances does require
# some management effort in so far that VM configration files (and thus
# a VM) should be associated with a TPM instance throughout the lifetime
# of the VM / VM configuration file. The instance number must be
# greater or equal to 1.
#vtpm = [ 'instance=1,backend=0' ]

#----------------------------------------------------------------------------
# Set the kernel command line for the new domain.
# You only need to define the IP parameters and hostname if the domain's
# IP config doesn't, e.g. in ifcfg-eth0 or via DHCP.
# You can use 'extra' to set the runlevel and custom environment
# variables used by custom rc scripts (e.g. VMID=, usr= ).

# Set if you want dhcp to allocate the IP address.
#dhcp="dhcp"
# Set netmask.
#netmask=
# Set default gateway.
#gateway=
# Set the hostname.
#hostname= "vm%d" % vmid

# Set root device.
root = "/dev/xvda1"

# Root device for nfs.
#root = "/dev/nfs"
# The nfs server.
#nfs_server = '192.0.2.1'  
# Root directory on the nfs server.
#nfs_root   = '/full/path/to/root/directory'

# Sets runlevel 4.
# extra = "4"

#----------------------------------------------------------------------------
# Configure the behaviour when a domain exits.  There are three 'reasons'
# for a domain to stop: poweroff, reboot, and crash.  For each of these you
# may specify:
#
#   "destroy",        meaning that the domain is cleaned up as normal;
#   "restart",        meaning that a new domain is started in place of the old
#                     one;
#   "preserve",       meaning that no clean-up is done until the domain is
#                     manually destroyed (using xm destroy, for example); or
#   "rename-restart", meaning that the old domain is not cleaned up, but is
#                     renamed and a new domain started in its place.
#
# In the event a domain stops due to a crash, you have the additional options:
#
#   "coredump-destroy", meaning dump the crashed domain's core and then destroy;
#   "coredump-restart', meaning dump the crashed domain's core and the restart.
#
# The default is
#
#   on_poweroff = 'destroy'
#   on_reboot   = 'restart'
#   on_crash    = 'restart'
#
# For backwards compatibility we also support the deprecated option restart
#
# restart = 'onreboot' means on_poweroff = 'destroy'
#                            on_reboot   = 'restart'
#                            on_crash    = 'destroy'
#
# restart = 'always'   means on_poweroff = 'restart'
#                            on_reboot   = 'restart'
#                            on_crash    = 'restart'
#
# restart = 'never'    means on_poweroff = 'destroy'
#                            on_reboot   = 'destroy'
#                            on_crash    = 'destroy'

#on_poweroff = 'destroy'
#on_reboot   = 'restart'
#on_crash    = 'restart'

#-----------------------------------------------------------------------------
#   Configure PVSCSI devices:
#
#vscsi=[ 'PDEV, VDEV' ]
#
#   PDEV   gives physical SCSI device to be attached to specified guest
#          domain by one of the following identifier format.
#          - XX:XX:XX:XX (4-tuples with decimal notation which shows
#                          "host:channel:target:lun")
#          - /dev/sdxx or sdx
#          - /dev/stxx or stx
#          - /dev/sgxx or sgx
#          - result of 'scsi_id -gu -s'.
#            ex. # scsi_id -gu -s /block/sdb
#                  36000b5d0006a0000006a0257004c0000
#
#   VDEV   gives virtual SCSI device by 4-tuples (XX:XX:XX:XX) as 
#          which the specified guest domain recognize.
#

#vscsi = [ '/dev/sdx, 0:0:0:0' ]

#============================================================================


[-- Attachment #5: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-23  8:57 live migration fails (assert in shadow_hash_delete) Ashish Bijlani
@ 2010-02-23  9:25 ` Tim Deegan
  2010-02-23 10:19   ` Devdutt Patnaik
  0 siblings, 1 reply; 22+ messages in thread
From: Tim Deegan @ 2010-02-23  9:25 UTC (permalink / raw)
  To: Ashish Bijlani; +Cc: xen-devel

Hi, 

At 08:57 +0000 on 23 Feb (1266915448), Ashish Bijlani wrote:
> I'm working on a project that requires live migration of a 64-bit PV
> VM (on a 64-bit platform). "xm save"  and "xm restore" work fine.
> However, live migration fails with the following err msg:

Oh dear.  I take it this is on the sending machine. What version of Xen
are you using?

Does it happen every time or only intermittently?

Does it happen only with one particular guest or all 64bit guests?

Have you made any modifications to Xen?

It looks like the shadow pagetable code has got very confused - a page
is marked as shadowed but isn't in the hash-table of shadowed pages. 

Cheers,

Tim.

> mapping kernel into physical memory
> about to get started...
> (XEN) traps.c:2306:d3 Domain attempted WRMSR 000000000000008b from
> 00000a07:00000000 to 00000000:000000.
> (XEN) Assertion 'x' failed at common.c:2139
> (XEN) ----[ Xen-4.0.0-rc3-pre  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82c4801c8a08>] shadow_hash_delete+0x12e/0x18c
> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
> (XEN) rax: ffff8300040e2770   rbx: ffff830223ce0000   rcx: 0000000000000000
> (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff82f60443c8a0
> (XEN) rbp: ffff82c4802efb48   rsp: ffff82c4802efb18   r8:  ffff82f600000000
> (XEN) r9:  0000000000000000   r10: ffff830223ce0000   r11: 00000000000041c5
> (XEN) r12: 0000000000221e45   r13: 00000000000000ec   r14: ffff82f600000000
> (XEN) r15: ffff8300cfaea000   cr0: 0000000080050033   cr4: 00000000000006f0
> (XEN) cr3: 0000000210154000   cr2: ffff8801dd5508c8
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff82c4802efb18:
> (XEN)    3333333333333333 00000000000041c5 ffff82c4802efb32 000000000000000d
> (XEN)    00000000000041c5 ffff8300cfaea000 ffff82c4802efba8 ffff82c4801e5d2e
> (XEN)    0000005600ed79a0 0000000800000000 0000000100000000 0000000000221e45
> (XEN)    0000000000000008 0000000000000000 ffff82f60443c8a0 ffff82f60404ab00
> (XEN)    ffff82f600000000 ffff8300cfaea000 ffff82c4802efbd8 ffff82c4801c766d
> (XEN)    ffff82c4802efc18 0000000000000282 0000000000000281 0000000000221e45
> (XEN)    ffff82c4802efc28 ffff82c4801cb18f 000000000f69d1d0 ffff830223ce0000
> (XEN)    ffff82c4802efc18 ffff830223ce0000 ffff82c4802eff28 ffff830223ce0e28
> (XEN)    0000000000000002 ffff8300040de000 ffff82c4802efc58 ffff82c4801cba8d
> (XEN)    0000000000000282 ffff82c4802efe58 0000000000010000 0000000000008000
> (XEN)    ffff82c4802efce8 ffff82c4801bb394 0000000100000000 ffff8302236e8000
> (XEN)    ffff830223ce0f08 ffff8300040e1000 00000001802efd48 ffff82c48031f640
> (XEN)    ffff830223ce0000 0000000100000001 ffff82c4802eff28 ffff8300040e0000
> (XEN)    ffff82c4802efce8 ffff830223ce0000 ffff82c4802efe58 00007fff0f69d1d0
> (XEN)    ffff82c4802efe48 0000000000000000 ffff82c4802efd08 ffff82c4801bb56a
> (XEN)    fffffffffffffff3 0000000000f71000 ffff82c4802efdc8 ffff82c48014796c
> (XEN)    ffff82c4802efd28 ffff82c48016b0d4 ffff82c4802efd48 ffff82c48011dce7
> (XEN)    0000000000000008 ffff82c480163d8c ffff82c4802efd68 ffff82c480118755
> (XEN)    0000000000000008 ffff8300cfafa000 ffff82c4802efdc8 0000000000000286
> (XEN)    ffff82c4802efd98 0000000000000286 ffff82c4802eff28 ffff82c4802eff28
> (XEN) Xen call trace:
> (XEN)    [<ffff82c4801c8a08>] shadow_hash_delete+0x12e/0x18c
> (XEN)    [<ffff82c4801e5d2e>] sh_destroy_l4_shadow__guest_4+0xb5/0x371
> (XEN)    [<ffff82c4801c766d>] sh_destroy_shadow+0x17d/0x1ad
> (XEN)    [<ffff82c4801cb18f>] shadow_blow_tables+0x20b/0x302
> (XEN)    [<ffff82c4801cba8d>] shadow_clean_dirty_bitmap+0xba/0x10a
> (XEN)    [<ffff82c4801bb394>] paging_log_dirty_op+0x506/0x58c
> (XEN)    [<ffff82c4801bb56a>] paging_domctl+0x150/0x181
> (XEN)    [<ffff82c48014796c>] arch_do_domctl+0x5c/0x1f64
> (XEN)    [<ffff82c4801053b3>] do_domctl+0x1169/0x11e6
> (XEN)    [<ffff82c4801f11bf>] syscall_enter+0xef/0x149
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion 'x' failed at common.c:2139
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
> 
> Any ideas what could be wrong here.
> 
> Thanks,
> Ashish




Content-Description: ATT00001.txt
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel


-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-23  9:25 ` Tim Deegan
@ 2010-02-23 10:19   ` Devdutt Patnaik
  2010-02-23 10:25     ` Devdutt Patnaik
                       ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Devdutt Patnaik @ 2010-02-23 10:19 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Ashish Bijlani, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 5526 bytes --]

Tim,

Yes, this is happening on the sending machine.

We just used the xen-unstable version from 2 weeks ago, and haven't really
modified it.
We tried this with 64-bit versions of 2.6.31.6 and 2.6.32.8 DomU kernels.

Any suggestions on what might be a better bet in terms of xen, Dom0 and DomU
kernel versions.
We wish to use 64-bit PV VMs for our experiments.

We have only been able to do a successful migration 3 times, out of maybe 30
odd attempts.

Thanks,
Devdutt.

On Tue, Feb 23, 2010 at 1:25 AM, Tim Deegan <Tim.Deegan@citrix.com> wrote:

> Hi,
>
> At 08:57 +0000 on 23 Feb (1266915448), Ashish Bijlani wrote:
> > I'm working on a project that requires live migration of a 64-bit PV
> > VM (on a 64-bit platform). "xm save"  and "xm restore" work fine.
> > However, live migration fails with the following err msg:
>
> Oh dear.  I take it this is on the sending machine. What version of Xen
> are you using?
>
> Does it happen every time or only intermittently?
>
> Does it happen only with one particular guest or all 64bit guests?
>
> Have you made any modifications to Xen?
>
> It looks like the shadow pagetable code has got very confused - a page
> is marked as shadowed but isn't in the hash-table of shadowed pages.
>
> Cheers,
>
> Tim.
>
> > mapping kernel into physical memory
> > about to get started...
> > (XEN) traps.c:2306:d3 Domain attempted WRMSR 000000000000008b from
> > 00000a07:00000000 to 00000000:000000.
> > (XEN) Assertion 'x' failed at common.c:2139
> > (XEN) ----[ Xen-4.0.0-rc3-pre  x86_64  debug=y  Not tainted ]----
> > (XEN) CPU:    0
> > (XEN) RIP:    e008:[<ffff82c4801c8a08>] shadow_hash_delete+0x12e/0x18c
> > (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
> > (XEN) rax: ffff8300040e2770   rbx: ffff830223ce0000   rcx:
> 0000000000000000
> > (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi:
> ffff82f60443c8a0
> > (XEN) rbp: ffff82c4802efb48   rsp: ffff82c4802efb18   r8:
>  ffff82f600000000
> > (XEN) r9:  0000000000000000   r10: ffff830223ce0000   r11:
> 00000000000041c5
> > (XEN) r12: 0000000000221e45   r13: 00000000000000ec   r14:
> ffff82f600000000
> > (XEN) r15: ffff8300cfaea000   cr0: 0000000080050033   cr4:
> 00000000000006f0
> > (XEN) cr3: 0000000210154000   cr2: ffff8801dd5508c8
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> > (XEN) Xen stack trace from rsp=ffff82c4802efb18:
> > (XEN)    3333333333333333 00000000000041c5 ffff82c4802efb32
> 000000000000000d
> > (XEN)    00000000000041c5 ffff8300cfaea000 ffff82c4802efba8
> ffff82c4801e5d2e
> > (XEN)    0000005600ed79a0 0000000800000000 0000000100000000
> 0000000000221e45
> > (XEN)    0000000000000008 0000000000000000 ffff82f60443c8a0
> ffff82f60404ab00
> > (XEN)    ffff82f600000000 ffff8300cfaea000 ffff82c4802efbd8
> ffff82c4801c766d
> > (XEN)    ffff82c4802efc18 0000000000000282 0000000000000281
> 0000000000221e45
> > (XEN)    ffff82c4802efc28 ffff82c4801cb18f 000000000f69d1d0
> ffff830223ce0000
> > (XEN)    ffff82c4802efc18 ffff830223ce0000 ffff82c4802eff28
> ffff830223ce0e28
> > (XEN)    0000000000000002 ffff8300040de000 ffff82c4802efc58
> ffff82c4801cba8d
> > (XEN)    0000000000000282 ffff82c4802efe58 0000000000010000
> 0000000000008000
> > (XEN)    ffff82c4802efce8 ffff82c4801bb394 0000000100000000
> ffff8302236e8000
> > (XEN)    ffff830223ce0f08 ffff8300040e1000 00000001802efd48
> ffff82c48031f640
> > (XEN)    ffff830223ce0000 0000000100000001 ffff82c4802eff28
> ffff8300040e0000
> > (XEN)    ffff82c4802efce8 ffff830223ce0000 ffff82c4802efe58
> 00007fff0f69d1d0
> > (XEN)    ffff82c4802efe48 0000000000000000 ffff82c4802efd08
> ffff82c4801bb56a
> > (XEN)    fffffffffffffff3 0000000000f71000 ffff82c4802efdc8
> ffff82c48014796c
> > (XEN)    ffff82c4802efd28 ffff82c48016b0d4 ffff82c4802efd48
> ffff82c48011dce7
> > (XEN)    0000000000000008 ffff82c480163d8c ffff82c4802efd68
> ffff82c480118755
> > (XEN)    0000000000000008 ffff8300cfafa000 ffff82c4802efdc8
> 0000000000000286
> > (XEN)    ffff82c4802efd98 0000000000000286 ffff82c4802eff28
> ffff82c4802eff28
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82c4801c8a08>] shadow_hash_delete+0x12e/0x18c
> > (XEN)    [<ffff82c4801e5d2e>] sh_destroy_l4_shadow__guest_4+0xb5/0x371
> > (XEN)    [<ffff82c4801c766d>] sh_destroy_shadow+0x17d/0x1ad
> > (XEN)    [<ffff82c4801cb18f>] shadow_blow_tables+0x20b/0x302
> > (XEN)    [<ffff82c4801cba8d>] shadow_clean_dirty_bitmap+0xba/0x10a
> > (XEN)    [<ffff82c4801bb394>] paging_log_dirty_op+0x506/0x58c
> > (XEN)    [<ffff82c4801bb56a>] paging_domctl+0x150/0x181
> > (XEN)    [<ffff82c48014796c>] arch_do_domctl+0x5c/0x1f64
> > (XEN)    [<ffff82c4801053b3>] do_domctl+0x1169/0x11e6
> > (XEN)    [<ffff82c4801f11bf>] syscall_enter+0xef/0x149
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) Assertion 'x' failed at common.c:2139
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Reboot in five seconds...
> >
> > Any ideas what could be wrong here.
> >
> > Thanks,
> > Ashish
>
>
>
>
> Content-Description: ATT00001.txt
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
>
>
> --
> Tim Deegan <Tim.Deegan@citrix.com>
> Principal Software Engineer, XenServer Engineering
> Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>

[-- Attachment #1.2: Type: text/html, Size: 6780 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-23 10:19   ` Devdutt Patnaik
@ 2010-02-23 10:25     ` Devdutt Patnaik
  2010-02-23 10:46     ` Tim Deegan
  2010-02-23 10:54     ` Jan Beulich
  2 siblings, 0 replies; 22+ messages in thread
From: Devdutt Patnaik @ 2010-02-23 10:25 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Ashish Bijlani, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 5865 bytes --]

Forgot to specifically mention thats its Xen 4.0. Its the rc3 version.

Thanks,
Devdutt.

On Tue, Feb 23, 2010 at 2:19 AM, Devdutt Patnaik <xendevid@gmail.com> wrote:

> Tim,
>
> Yes, this is happening on the sending machine.
>
> We just used the xen-unstable version from 2 weeks ago, and haven't really
> modified it.
> We tried this with 64-bit versions of 2.6.31.6 and 2.6.32.8 DomU kernels.
>
> Any suggestions on what might be a better bet in terms of xen, Dom0 and
> DomU kernel versions.
> We wish to use 64-bit PV VMs for our experiments.
>
> We have only been able to do a successful migration 3 times, out of maybe
> 30 odd attempts.
>
> Thanks,
> Devdutt.
>
>
> On Tue, Feb 23, 2010 at 1:25 AM, Tim Deegan <Tim.Deegan@citrix.com> wrote:
>
>> Hi,
>>
>> At 08:57 +0000 on 23 Feb (1266915448), Ashish Bijlani wrote:
>> > I'm working on a project that requires live migration of a 64-bit PV
>> > VM (on a 64-bit platform). "xm save"  and "xm restore" work fine.
>> > However, live migration fails with the following err msg:
>>
>> Oh dear.  I take it this is on the sending machine. What version of Xen
>> are you using?
>>
>> Does it happen every time or only intermittently?
>>
>> Does it happen only with one particular guest or all 64bit guests?
>>
>> Have you made any modifications to Xen?
>>
>> It looks like the shadow pagetable code has got very confused - a page
>> is marked as shadowed but isn't in the hash-table of shadowed pages.
>>
>> Cheers,
>>
>> Tim.
>>
>> > mapping kernel into physical memory
>> > about to get started...
>> > (XEN) traps.c:2306:d3 Domain attempted WRMSR 000000000000008b from
>> > 00000a07:00000000 to 00000000:000000.
>> > (XEN) Assertion 'x' failed at common.c:2139
>> > (XEN) ----[ Xen-4.0.0-rc3-pre  x86_64  debug=y  Not tainted ]----
>> > (XEN) CPU:    0
>> > (XEN) RIP:    e008:[<ffff82c4801c8a08>] shadow_hash_delete+0x12e/0x18c
>> > (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
>> > (XEN) rax: ffff8300040e2770   rbx: ffff830223ce0000   rcx:
>> 0000000000000000
>> > (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi:
>> ffff82f60443c8a0
>> > (XEN) rbp: ffff82c4802efb48   rsp: ffff82c4802efb18   r8:
>>  ffff82f600000000
>> > (XEN) r9:  0000000000000000   r10: ffff830223ce0000   r11:
>> 00000000000041c5
>> > (XEN) r12: 0000000000221e45   r13: 00000000000000ec   r14:
>> ffff82f600000000
>> > (XEN) r15: ffff8300cfaea000   cr0: 0000000080050033   cr4:
>> 00000000000006f0
>> > (XEN) cr3: 0000000210154000   cr2: ffff8801dd5508c8
>> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>> > (XEN) Xen stack trace from rsp=ffff82c4802efb18:
>> > (XEN)    3333333333333333 00000000000041c5 ffff82c4802efb32
>> 000000000000000d
>> > (XEN)    00000000000041c5 ffff8300cfaea000 ffff82c4802efba8
>> ffff82c4801e5d2e
>> > (XEN)    0000005600ed79a0 0000000800000000 0000000100000000
>> 0000000000221e45
>> > (XEN)    0000000000000008 0000000000000000 ffff82f60443c8a0
>> ffff82f60404ab00
>> > (XEN)    ffff82f600000000 ffff8300cfaea000 ffff82c4802efbd8
>> ffff82c4801c766d
>> > (XEN)    ffff82c4802efc18 0000000000000282 0000000000000281
>> 0000000000221e45
>> > (XEN)    ffff82c4802efc28 ffff82c4801cb18f 000000000f69d1d0
>> ffff830223ce0000
>> > (XEN)    ffff82c4802efc18 ffff830223ce0000 ffff82c4802eff28
>> ffff830223ce0e28
>> > (XEN)    0000000000000002 ffff8300040de000 ffff82c4802efc58
>> ffff82c4801cba8d
>> > (XEN)    0000000000000282 ffff82c4802efe58 0000000000010000
>> 0000000000008000
>> > (XEN)    ffff82c4802efce8 ffff82c4801bb394 0000000100000000
>> ffff8302236e8000
>> > (XEN)    ffff830223ce0f08 ffff8300040e1000 00000001802efd48
>> ffff82c48031f640
>> > (XEN)    ffff830223ce0000 0000000100000001 ffff82c4802eff28
>> ffff8300040e0000
>> > (XEN)    ffff82c4802efce8 ffff830223ce0000 ffff82c4802efe58
>> 00007fff0f69d1d0
>> > (XEN)    ffff82c4802efe48 0000000000000000 ffff82c4802efd08
>> ffff82c4801bb56a
>> > (XEN)    fffffffffffffff3 0000000000f71000 ffff82c4802efdc8
>> ffff82c48014796c
>> > (XEN)    ffff82c4802efd28 ffff82c48016b0d4 ffff82c4802efd48
>> ffff82c48011dce7
>> > (XEN)    0000000000000008 ffff82c480163d8c ffff82c4802efd68
>> ffff82c480118755
>> > (XEN)    0000000000000008 ffff8300cfafa000 ffff82c4802efdc8
>> 0000000000000286
>> > (XEN)    ffff82c4802efd98 0000000000000286 ffff82c4802eff28
>> ffff82c4802eff28
>> > (XEN) Xen call trace:
>> > (XEN)    [<ffff82c4801c8a08>] shadow_hash_delete+0x12e/0x18c
>> > (XEN)    [<ffff82c4801e5d2e>] sh_destroy_l4_shadow__guest_4+0xb5/0x371
>> > (XEN)    [<ffff82c4801c766d>] sh_destroy_shadow+0x17d/0x1ad
>> > (XEN)    [<ffff82c4801cb18f>] shadow_blow_tables+0x20b/0x302
>> > (XEN)    [<ffff82c4801cba8d>] shadow_clean_dirty_bitmap+0xba/0x10a
>> > (XEN)    [<ffff82c4801bb394>] paging_log_dirty_op+0x506/0x58c
>> > (XEN)    [<ffff82c4801bb56a>] paging_domctl+0x150/0x181
>> > (XEN)    [<ffff82c48014796c>] arch_do_domctl+0x5c/0x1f64
>> > (XEN)    [<ffff82c4801053b3>] do_domctl+0x1169/0x11e6
>> > (XEN)    [<ffff82c4801f11bf>] syscall_enter+0xef/0x149
>> > (XEN)
>> > (XEN)
>> > (XEN) ****************************************
>> > (XEN) Panic on CPU 0:
>> > (XEN) Assertion 'x' failed at common.c:2139
>> > (XEN) ****************************************
>> > (XEN)
>> > (XEN) Reboot in five seconds...
>> >
>> > Any ideas what could be wrong here.
>> >
>> > Thanks,
>> > Ashish
>>
>>
>>
>>
>> Content-Description: ATT00001.txt
>> > _______________________________________________
>> > Xen-devel mailing list
>> > Xen-devel@lists.xensource.com
>> > http://lists.xensource.com/xen-devel
>>
>>
>> --
>> Tim Deegan <Tim.Deegan@citrix.com>
>> Principal Software Engineer, XenServer Engineering
>> Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
>>
>
>

[-- Attachment #1.2: Type: text/html, Size: 7336 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-23 10:19   ` Devdutt Patnaik
  2010-02-23 10:25     ` Devdutt Patnaik
@ 2010-02-23 10:46     ` Tim Deegan
  2010-02-23 10:51       ` Devdutt Patnaik
  2010-02-23 10:59       ` Keir Fraser
  2010-02-23 10:54     ` Jan Beulich
  2 siblings, 2 replies; 22+ messages in thread
From: Tim Deegan @ 2010-02-23 10:46 UTC (permalink / raw)
  To: Devdutt Patnaik; +Cc: Ashish Bijlani, xen-devel

At 10:19 +0000 on 23 Feb (1266920353), Devdutt Patnaik wrote:
> We just used the xen-unstable version from 2 weeks ago, and haven't really modified it.
> We tried this with 64-bit versions of 2.6.31.6 and 2.6.32.8 DomU kernels.

OK.  This really needs to be fixed to the 4.0 release.  Keir, have we
had any other testing on 64-bit PV live migrations?

By "haven't really modified it" do you mean you have modified it or not?

> Any suggestions on what might be a better bet in terms of xen, Dom0 and DomU kernel versions.
> We wish to use 64-bit PV VMs for our experiments.

Xen 3.4.x should be stabler if you need to carry on immediately.

Cheers,

Tim.

> We have only been able to do a successful migration 3 times, out of maybe 30 odd attempts.
> 
> Thanks,
> Devdutt.
> 
> On Tue, Feb 23, 2010 at 1:25 AM, Tim Deegan <Tim.Deegan@citrix.com<mailto:Tim.Deegan@citrix.com>> wrote:
> Hi,
> 
> At 08:57 +0000 on 23 Feb (1266915448), Ashish Bijlani wrote:
> > I'm working on a project that requires live migration of a 64-bit PV
> > VM (on a 64-bit platform). "xm save"  and "xm restore" work fine.
> > However, live migration fails with the following err msg:
> 
> Oh dear.  I take it this is on the sending machine. What version of Xen
> are you using?
> 
> Does it happen every time or only intermittently?
> 
> Does it happen only with one particular guest or all 64bit guests?
> 
> Have you made any modifications to Xen?
> 
> It looks like the shadow pagetable code has got very confused - a page
> is marked as shadowed but isn't in the hash-table of shadowed pages.
> 
> Cheers,
> 
> Tim.
> 
> > mapping kernel into physical memory
> > about to get started...
> > (XEN) traps.c:2306:d3 Domain attempted WRMSR 000000000000008b from
> > 00000a07:00000000 to 00000000:000000.
> > (XEN) Assertion 'x' failed at common.c:2139
> > (XEN) ----[ Xen-4.0.0-rc3-pre  x86_64  debug=y  Not tainted ]----
> > (XEN) CPU:    0
> > (XEN) RIP:    e008:[<ffff82c4801c8a08>] shadow_hash_delete+0x12e/0x18c
> > (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
> > (XEN) rax: ffff8300040e2770   rbx: ffff830223ce0000   rcx: 0000000000000000
> > (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff82f60443c8a0
> > (XEN) rbp: ffff82c4802efb48   rsp: ffff82c4802efb18   r8:  ffff82f600000000
> > (XEN) r9:  0000000000000000   r10: ffff830223ce0000   r11: 00000000000041c5
> > (XEN) r12: 0000000000221e45   r13: 00000000000000ec   r14: ffff82f600000000
> > (XEN) r15: ffff8300cfaea000   cr0: 0000000080050033   cr4: 00000000000006f0
> > (XEN) cr3: 0000000210154000   cr2: ffff8801dd5508c8
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> > (XEN) Xen stack trace from rsp=ffff82c4802efb18:
> > (XEN)    3333333333333333 00000000000041c5 ffff82c4802efb32 000000000000000d
> > (XEN)    00000000000041c5 ffff8300cfaea000 ffff82c4802efba8 ffff82c4801e5d2e
> > (XEN)    0000005600ed79a0 0000000800000000 0000000100000000 0000000000221e45
> > (XEN)    0000000000000008 0000000000000000 ffff82f60443c8a0 ffff82f60404ab00
> > (XEN)    ffff82f600000000 ffff8300cfaea000 ffff82c4802efbd8 ffff82c4801c766d
> > (XEN)    ffff82c4802efc18 0000000000000282 0000000000000281 0000000000221e45
> > (XEN)    ffff82c4802efc28 ffff82c4801cb18f 000000000f69d1d0 ffff830223ce0000
> > (XEN)    ffff82c4802efc18 ffff830223ce0000 ffff82c4802eff28 ffff830223ce0e28
> > (XEN)    0000000000000002 ffff8300040de000 ffff82c4802efc58 ffff82c4801cba8d
> > (XEN)    0000000000000282 ffff82c4802efe58 0000000000010000 0000000000008000
> > (XEN)    ffff82c4802efce8 ffff82c4801bb394 0000000100000000 ffff8302236e8000
> > (XEN)    ffff830223ce0f08 ffff8300040e1000 00000001802efd48 ffff82c48031f640
> > (XEN)    ffff830223ce0000 0000000100000001 ffff82c4802eff28 ffff8300040e0000
> > (XEN)    ffff82c4802efce8 ffff830223ce0000 ffff82c4802efe58 00007fff0f69d1d0
> > (XEN)    ffff82c4802efe48 0000000000000000 ffff82c4802efd08 ffff82c4801bb56a
> > (XEN)    fffffffffffffff3 0000000000f71000 ffff82c4802efdc8 ffff82c48014796c
> > (XEN)    ffff82c4802efd28 ffff82c48016b0d4 ffff82c4802efd48 ffff82c48011dce7
> > (XEN)    0000000000000008 ffff82c480163d8c ffff82c4802efd68 ffff82c480118755
> > (XEN)    0000000000000008 ffff8300cfafa000 ffff82c4802efdc8 0000000000000286
> > (XEN)    ffff82c4802efd98 0000000000000286 ffff82c4802eff28 ffff82c4802eff28
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82c4801c8a08>] shadow_hash_delete+0x12e/0x18c
> > (XEN)    [<ffff82c4801e5d2e>] sh_destroy_l4_shadow__guest_4+0xb5/0x371
> > (XEN)    [<ffff82c4801c766d>] sh_destroy_shadow+0x17d/0x1ad
> > (XEN)    [<ffff82c4801cb18f>] shadow_blow_tables+0x20b/0x302
> > (XEN)    [<ffff82c4801cba8d>] shadow_clean_dirty_bitmap+0xba/0x10a
> > (XEN)    [<ffff82c4801bb394>] paging_log_dirty_op+0x506/0x58c
> > (XEN)    [<ffff82c4801bb56a>] paging_domctl+0x150/0x181
> > (XEN)    [<ffff82c48014796c>] arch_do_domctl+0x5c/0x1f64
> > (XEN)    [<ffff82c4801053b3>] do_domctl+0x1169/0x11e6
> > (XEN)    [<ffff82c4801f11bf>] syscall_enter+0xef/0x149
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) Assertion 'x' failed at common.c:2139
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Reboot in five seconds...
> >
> > Any ideas what could be wrong here.
> >
> > Thanks,
> > Ashish
> 
> 
> 
> 
> Content-Description: ATT00001.txt
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com<mailto:Xen-devel@lists.xensource.com>
> > http://lists.xensource.com/xen-devel
> 
> 
> --
> Tim Deegan <Tim.Deegan@citrix.com<mailto:Tim.Deegan@citrix.com>>
> Principal Software Engineer, XenServer Engineering
> Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com<mailto:Xen-devel@lists.xensource.com>
> http://lists.xensource.com/xen-devel
> 

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-23 10:46     ` Tim Deegan
@ 2010-02-23 10:51       ` Devdutt Patnaik
  2010-02-23 10:59       ` Keir Fraser
  1 sibling, 0 replies; 22+ messages in thread
From: Devdutt Patnaik @ 2010-02-23 10:51 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Ashish Bijlani, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 6649 bytes --]

Tim,

Its just the stock xen-unstable (Xen4.0-rc3 ) unmodified. Has this feature
been evaluated/tested on the latest xen-unstable ?

Alright, we will give Xen 3.4.x a shot.

Thanks,
Devdutt.

On Tue, Feb 23, 2010 at 2:46 AM, Tim Deegan <Tim.Deegan@citrix.com> wrote:

> At 10:19 +0000 on 23 Feb (1266920353), Devdutt Patnaik wrote:
> > We just used the xen-unstable version from 2 weeks ago, and haven't
> really modified it.
> > We tried this with 64-bit versions of 2.6.31.6 and 2.6.32.8 DomU kernels.
>
> OK.  This really needs to be fixed to the 4.0 release.  Keir, have we
> had any other testing on 64-bit PV live migrations?
>
> By "haven't really modified it" do you mean you have modified it or not?
>
> > Any suggestions on what might be a better bet in terms of xen, Dom0 and
> DomU kernel versions.
> > We wish to use 64-bit PV VMs for our experiments.
>
> Xen 3.4.x should be stabler if you need to carry on immediately.
>
> Cheers,
>
> Tim.
>
> > We have only been able to do a successful migration 3 times, out of maybe
> 30 odd attempts.
> >
> > Thanks,
> > Devdutt.
> >
> > On Tue, Feb 23, 2010 at 1:25 AM, Tim Deegan <Tim.Deegan@citrix.com
> <mailto:Tim.Deegan@citrix.com>> wrote:
> > Hi,
> >
> > At 08:57 +0000 on 23 Feb (1266915448), Ashish Bijlani wrote:
> > > I'm working on a project that requires live migration of a 64-bit PV
> > > VM (on a 64-bit platform). "xm save"  and "xm restore" work fine.
> > > However, live migration fails with the following err msg:
> >
> > Oh dear.  I take it this is on the sending machine. What version of Xen
> > are you using?
> >
> > Does it happen every time or only intermittently?
> >
> > Does it happen only with one particular guest or all 64bit guests?
> >
> > Have you made any modifications to Xen?
> >
> > It looks like the shadow pagetable code has got very confused - a page
> > is marked as shadowed but isn't in the hash-table of shadowed pages.
> >
> > Cheers,
> >
> > Tim.
> >
> > > mapping kernel into physical memory
> > > about to get started...
> > > (XEN) traps.c:2306:d3 Domain attempted WRMSR 000000000000008b from
> > > 00000a07:00000000 to 00000000:000000.
> > > (XEN) Assertion 'x' failed at common.c:2139
> > > (XEN) ----[ Xen-4.0.0-rc3-pre  x86_64  debug=y  Not tainted ]----
> > > (XEN) CPU:    0
> > > (XEN) RIP:    e008:[<ffff82c4801c8a08>] shadow_hash_delete+0x12e/0x18c
> > > (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
> > > (XEN) rax: ffff8300040e2770   rbx: ffff830223ce0000   rcx:
> 0000000000000000
> > > (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi:
> ffff82f60443c8a0
> > > (XEN) rbp: ffff82c4802efb48   rsp: ffff82c4802efb18   r8:
>  ffff82f600000000
> > > (XEN) r9:  0000000000000000   r10: ffff830223ce0000   r11:
> 00000000000041c5
> > > (XEN) r12: 0000000000221e45   r13: 00000000000000ec   r14:
> ffff82f600000000
> > > (XEN) r15: ffff8300cfaea000   cr0: 0000000080050033   cr4:
> 00000000000006f0
> > > (XEN) cr3: 0000000210154000   cr2: ffff8801dd5508c8
> > > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> > > (XEN) Xen stack trace from rsp=ffff82c4802efb18:
> > > (XEN)    3333333333333333 00000000000041c5 ffff82c4802efb32
> 000000000000000d
> > > (XEN)    00000000000041c5 ffff8300cfaea000 ffff82c4802efba8
> ffff82c4801e5d2e
> > > (XEN)    0000005600ed79a0 0000000800000000 0000000100000000
> 0000000000221e45
> > > (XEN)    0000000000000008 0000000000000000 ffff82f60443c8a0
> ffff82f60404ab00
> > > (XEN)    ffff82f600000000 ffff8300cfaea000 ffff82c4802efbd8
> ffff82c4801c766d
> > > (XEN)    ffff82c4802efc18 0000000000000282 0000000000000281
> 0000000000221e45
> > > (XEN)    ffff82c4802efc28 ffff82c4801cb18f 000000000f69d1d0
> ffff830223ce0000
> > > (XEN)    ffff82c4802efc18 ffff830223ce0000 ffff82c4802eff28
> ffff830223ce0e28
> > > (XEN)    0000000000000002 ffff8300040de000 ffff82c4802efc58
> ffff82c4801cba8d
> > > (XEN)    0000000000000282 ffff82c4802efe58 0000000000010000
> 0000000000008000
> > > (XEN)    ffff82c4802efce8 ffff82c4801bb394 0000000100000000
> ffff8302236e8000
> > > (XEN)    ffff830223ce0f08 ffff8300040e1000 00000001802efd48
> ffff82c48031f640
> > > (XEN)    ffff830223ce0000 0000000100000001 ffff82c4802eff28
> ffff8300040e0000
> > > (XEN)    ffff82c4802efce8 ffff830223ce0000 ffff82c4802efe58
> 00007fff0f69d1d0
> > > (XEN)    ffff82c4802efe48 0000000000000000 ffff82c4802efd08
> ffff82c4801bb56a
> > > (XEN)    fffffffffffffff3 0000000000f71000 ffff82c4802efdc8
> ffff82c48014796c
> > > (XEN)    ffff82c4802efd28 ffff82c48016b0d4 ffff82c4802efd48
> ffff82c48011dce7
> > > (XEN)    0000000000000008 ffff82c480163d8c ffff82c4802efd68
> ffff82c480118755
> > > (XEN)    0000000000000008 ffff8300cfafa000 ffff82c4802efdc8
> 0000000000000286
> > > (XEN)    ffff82c4802efd98 0000000000000286 ffff82c4802eff28
> ffff82c4802eff28
> > > (XEN) Xen call trace:
> > > (XEN)    [<ffff82c4801c8a08>] shadow_hash_delete+0x12e/0x18c
> > > (XEN)    [<ffff82c4801e5d2e>] sh_destroy_l4_shadow__guest_4+0xb5/0x371
> > > (XEN)    [<ffff82c4801c766d>] sh_destroy_shadow+0x17d/0x1ad
> > > (XEN)    [<ffff82c4801cb18f>] shadow_blow_tables+0x20b/0x302
> > > (XEN)    [<ffff82c4801cba8d>] shadow_clean_dirty_bitmap+0xba/0x10a
> > > (XEN)    [<ffff82c4801bb394>] paging_log_dirty_op+0x506/0x58c
> > > (XEN)    [<ffff82c4801bb56a>] paging_domctl+0x150/0x181
> > > (XEN)    [<ffff82c48014796c>] arch_do_domctl+0x5c/0x1f64
> > > (XEN)    [<ffff82c4801053b3>] do_domctl+0x1169/0x11e6
> > > (XEN)    [<ffff82c4801f11bf>] syscall_enter+0xef/0x149
> > > (XEN)
> > > (XEN)
> > > (XEN) ****************************************
> > > (XEN) Panic on CPU 0:
> > > (XEN) Assertion 'x' failed at common.c:2139
> > > (XEN) ****************************************
> > > (XEN)
> > > (XEN) Reboot in five seconds...
> > >
> > > Any ideas what could be wrong here.
> > >
> > > Thanks,
> > > Ashish
> >
> >
> >
> >
> > Content-Description: ATT00001.txt
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@lists.xensource.com<mailto:Xen-devel@lists.xensource.com>
> > > http://lists.xensource.com/xen-devel
> >
> >
> > --
> > Tim Deegan <Tim.Deegan@citrix.com<mailto:Tim.Deegan@citrix.com>>
> > Principal Software Engineer, XenServer Engineering
> > Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com<mailto:Xen-devel@lists.xensource.com>
> > http://lists.xensource.com/xen-devel
> >
>
> --
> Tim Deegan <Tim.Deegan@citrix.com>
> Principal Software Engineer, XenServer Engineering
> Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
>

[-- Attachment #1.2: Type: text/html, Size: 8761 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-23 10:19   ` Devdutt Patnaik
  2010-02-23 10:25     ` Devdutt Patnaik
  2010-02-23 10:46     ` Tim Deegan
@ 2010-02-23 10:54     ` Jan Beulich
  2 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2010-02-23 10:54 UTC (permalink / raw)
  To: Devdutt Patnaik; +Cc: Ashish Bijlani, xen-devel, Tim Deegan

>>> Devdutt Patnaik <xendevid@gmail.com> 23.02.10 11:19 >>>
>We have only been able to do a successful migration 3 times, out of maybe 30
>odd attempts.

And is it always a very similar (or identical) register/stack dump you get?

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-23 10:46     ` Tim Deegan
  2010-02-23 10:51       ` Devdutt Patnaik
@ 2010-02-23 10:59       ` Keir Fraser
  2010-02-23 11:05         ` Devdutt Patnaik
                           ` (3 more replies)
  1 sibling, 4 replies; 22+ messages in thread
From: Keir Fraser @ 2010-02-23 10:59 UTC (permalink / raw)
  To: Tim Deegan, Devdutt Patnaik, Ian Jackson; +Cc: Ashish Bijlani, xen-devel

On 23/02/2010 10:46, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:

> At 10:19 +0000 on 23 Feb (1266920353), Devdutt Patnaik wrote:
>> We just used the xen-unstable version from 2 weeks ago, and haven't really
>> modified it.
>> We tried this with 64-bit versions of 2.6.31.6 and 2.6.32.8 DomU kernels.
> 
> OK.  This really needs to be fixed to the 4.0 release.  Keir, have we
> had any other testing on 64-bit PV live migrations?

Localhost migrations were just added to the automated tests. But I think
maybe they are trivially failing due to trying to do them via the 'xl'
interface, which doesn't support it(!). Ian?

In short, there's probably been little or no testing of live migration in
the recent past, as I don't think Intel tests it either.

 -- Keir

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-23 10:59       ` Keir Fraser
@ 2010-02-23 11:05         ` Devdutt Patnaik
  2010-02-23 11:10         ` Keir Fraser
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Devdutt Patnaik @ 2010-02-23 11:05 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Ashish Bijlani, xen-devel, Ian Jackson, Tim Deegan


[-- Attachment #1.1: Type: text/plain, Size: 968 bytes --]

We are trying remote migrations and use the "xm migrate --live" command.

-Devdutt.

On Tue, Feb 23, 2010 at 2:59 AM, Keir Fraser <keir.fraser@eu.citrix.com>wrote:

> On 23/02/2010 10:46, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
>
> > At 10:19 +0000 on 23 Feb (1266920353), Devdutt Patnaik wrote:
> >> We just used the xen-unstable version from 2 weeks ago, and haven't
> really
> >> modified it.
> >> We tried this with 64-bit versions of 2.6.31.6 and 2.6.32.8 DomU
> kernels.
> >
> > OK.  This really needs to be fixed to the 4.0 release.  Keir, have we
> > had any other testing on 64-bit PV live migrations?
>
> Localhost migrations were just added to the automated tests. But I think
> maybe they are trivially failing due to trying to do them via the 'xl'
> interface, which doesn't support it(!). Ian?
>
> In short, there's probably been little or no testing of live migration in
> the recent past, as I don't think Intel tests it either.
>
>  -- Keir
>
>
>

[-- Attachment #1.2: Type: text/html, Size: 1476 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-23 10:59       ` Keir Fraser
  2010-02-23 11:05         ` Devdutt Patnaik
@ 2010-02-23 11:10         ` Keir Fraser
  2010-02-23 17:05         ` Ian Jackson
  2010-02-26  6:12         ` Xu, Jiajun
  3 siblings, 0 replies; 22+ messages in thread
From: Keir Fraser @ 2010-02-23 11:10 UTC (permalink / raw)
  To: Tim Deegan, Devdutt Patnaik, Ian Jackson; +Cc: Ashish Bijlani, xen-devel

On 23/02/2010 10:59, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> Localhost migrations were just added to the automated tests. But I think
> maybe they are trivially failing due to trying to do them via the 'xl'
> interface, which doesn't support it(!). Ian?
> 
> In short, there's probably been little or no testing of live migration in
> the recent past, as I don't think Intel tests it either.

A quick manual test indicates it's very easy to get Xen to blow up. I got
the following on my first localhost live migration attempt, which is a
different looking crash in the shadow code. This is with 2.6.18 dom0 and
domU by the way, so it's not pv_ops tickling the hypervisor in an unexpected
way...

(XEN) sh error: sh_page_fault__guest_4(): Recursive shadow fault: lock was
taken by sh_page_fault__guest_4
(XEN) ----[ Xen-4.0.0-rc4  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff82c4801c7984>] shadow_hash_lookup+0x11f/0x268
(XEN) RFLAGS: 0000000000010206   CONTEXT: hypervisor
(XEN) rax: 00000000c0000000   rbx: 0000000000085111   rcx: 0000000000000000
(XEN) rdx: 000000007339c000   rsi: 0000000000000000   rdi: ffff82f600000000
(XEN) rbp: ffff8300bfcdfc88   rsp: ffff8300bfcdfc18   r8:  ffffffffffffffff
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000001
(XEN) r12: 0000000000000008   r13: ffff8300bfce0000   r14: 0000000000000000
(XEN) r15: 00000000c0000000   cr0: 000000008005003b   cr4: 00000000000026f4
(XEN) cr3: 0000000082b46000   cr2: 00000000c0000010
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff8300bfcdfc18:
(XEN)    ffff8300bc42e000 ffff8300bfce0000 ffff8300bfcdfc88 ffff82c4801e56dd
(XEN)    ffff8300bfcdfc88 0000000000000000 0000000082b44067 0000000000085111
(XEN)    ffff8300bfcdff28 ffff8300bfce0000 ffff8300bfcdff28 ffff8300bc42e000
(XEN)    0000000000082b45 0000000000000001 ffff8300bfcdfed8 ffff82c4801e7d7a
(XEN)    ffff82c48016ae3e 0000000000000260 00000000000001f0 0000000000000d20
(XEN)    0000000000083037 ffff8300bfce0218 ffff8300bfcdff28 ffff8300bfcdff28
(XEN)    0000000000083037 ffff8300bfcdff28 ffff8300bfcdff28 0000000000083037
(XEN)    00000000000000d8 ffff82c480265ce0 ffff8300bfcdff28 00000002ae907c4c
(XEN)    00000000bc42e000 ffff81c0e0655d20 ffff8300bfce0e28 0000000000082b44
(XEN)    ffff81c0caba41f0 0000000000083037 00002ae907c4c0ff 00000002bfce0000
(XEN)    0000000082b44067 ffff8300bfcdfd78 ffff82c48011e433 ffff8300bc42e000
(XEN)    ffff8300bfcdfe18 00000001801ca140 ffff8300bfcdfdb8 ffff8300bfcdfde0
(XEN)    ffff8300bfcdff28 ffff8300bfcdff28 ffff8300bfcdff28 ffff8300bfcdff28
(XEN)    ffff8300bfcdfe18 00000001801e021e ffff8300bfcdfe18 0000000100000100
(XEN)    ffffffff8020d84d ffff8300bc42e000 ffff8300bfce0000 ffff8300bc42e000
(XEN)    ffff8300bc42fa38 0000000000082b67 0000000000000001 ffff82f601056ce0
(XEN)    ffff8300bfcdfe68 ffff82c4801e9ae5 ffff8300bfce0000 ffff82f600000001
(XEN)    ffff8300bfcdff08 ffff8300bc42e000 ffff8300bc42e000 0000000000583440
(XEN)    00002ae907c4c0ff 0000000084a81067 0000000084cfa067 0000000085111067
(XEN)    0000000083037125 000000000008550a 0000000000084a81 0000000000084cfa
(XEN) Xen call trace:
(XEN)    [<ffff82c4801c7984>] shadow_hash_lookup+0x11f/0x268
(XEN)    [<ffff82c4801e7d7a>] sh_page_fault__guest_4+0xf4f/0x1fee
(XEN)    [<ffff82c48017735e>] do_page_fault+0x3b2/0x4f0
(XEN)    
(XEN) Pagetable walk from 00000000c0000010:
(XEN)  L4[0x000] = 0000000000000000 ffffffffffffffff
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 00000000c0000010
(XEN) ****************************************

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-23 10:59       ` Keir Fraser
  2010-02-23 11:05         ` Devdutt Patnaik
  2010-02-23 11:10         ` Keir Fraser
@ 2010-02-23 17:05         ` Ian Jackson
  2010-02-24  9:31           ` Ashish Bijlani
  2010-02-26  6:12         ` Xu, Jiajun
  3 siblings, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2010-02-23 17:05 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Tim Deegan, Ashish Bijlani, xen-devel, Devdutt Patnaik

Keir Fraser writes ("Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)"):
> Localhost migrations were just added to the automated tests. But I think
> maybe they are trivially failing due to trying to do them via the 'xl'
> interface, which doesn't support it(!). Ian?

Localhost migration does work in most combinations in our tests.  It
was only recently added and there are a few teething troubles with it
so I don't have a full slate of results.

It doesn't work at all with libxl because it's not implemented.

Keir:
> A quick manual test indicates it's very easy to get Xen to blow up. I got
> the following on my first localhost live migration attempt, which is a
> different looking crash in the shadow code. This is with 2.6.18 dom0 and
> domU by the way, so it's not pv_ops tickling the hypervisor in an unexpected
> way...

2.6.18 doesn't boot on my test hardware so I'm just building it, not
running it.  So I haven't reproduced your test, which explains the
different results.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-23 17:05         ` Ian Jackson
@ 2010-02-24  9:31           ` Ashish Bijlani
  2010-02-24 11:01             ` Keir Fraser
  0 siblings, 1 reply; 22+ messages in thread
From: Ashish Bijlani @ 2010-02-24  9:31 UTC (permalink / raw)
  To: xen-devel

xen barfs while live migrating a 32-bit VM (on 64-bit platform):

(XEN) Assertion '__mfn_valid(mfn_x(smfn))' failed at multi.c:2561
(XEN) ----[ Xen-4.0.0-rc4  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff82c4801e0639>]
sh_map_and_validate_gl4e__guest_4+0x6d/0x1d4
(XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff830213187000   rcx: 00000000000000d3
(XEN) rdx: 0000000049ba6ceb   rsi: 00000000000000d3   rdi: ffffffffffffffff
(XEN) rbp: ffff83022ff2fb68   rsp: ffff83022ff2faf8   r8:  0000000000213187
(XEN) r9:  007fffffffffffff   r10: ffff82c480207e90   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000213187   r14: 0000000000000008
(XEN) r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000006f4
(XEN) cr3: 000000020ee46000   cr2: 00000000c1829248
(XEN) ds: 007b   es: 007b   fs: 00d8   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff83022ff2faf8:
(XEN)    ffff83022ff2fb68 ffff82c4801bbe32 ffff830004060000 ffffffffffffffff
(XEN)    ffff83022ff2ff28 ffff83022ff2ff28 ffff8301f5330000 0000000000000000
(XEN)    ffff83022ff2fb78 ffff82f6042630e0 0000000000000000 0000000000213187
(XEN)    ffff830004060000 0000000000000008 ffff83022ff2fbb8 ffff82c4801c785b
(XEN)    ffff83022ff2fbc8 ffff830213187000 3000000000000000 ffff830004060000
(XEN)    00000001dc092027 ffff83022ff2fc60 0000000000000000 ffff830213187000
(XEN)    ffff83022ff2fc08 ffff82c4801c79a8 0000000000213187 00000001dbaa5027
(XEN)    ffff830004060000 00000001dc092027 ffff830004060000 00000001dc092027
(XEN)    0000000000000000 00000001dbaa5027 ffff83022ff2fc98 ffff82c480163091
(XEN)    ffff83022ff2fc88 ffff82c4801e1180 ffff830100000000 ffff83022ff2fc60
(XEN)    0000000000213187 00000001dbaa5027 00000001dbaa5027 ffff830213187000
(XEN)    ffff83022ff2ff28 00000001dc092027 ffff83022ff2ff28 ffff830004060000
(XEN)    ffff8301f5330000 00000000001dbaa5 0000000000000005 ffff83022ff2ff28
(XEN)    ffff83022ff2fcc8 ffff82c480163242 ffff8301f5330000 0000000000000000
(XEN)    ffff83022ff24000 0000000000000005 ffff83022ff2fdc8 ffff82c480163ba7
(XEN)    ffff8301f5330018 00007ff0d8c3c148 0000000000000000 ffff82c480265db0
(XEN)    ffff82c480265db8 ffff83022ff2ff28 ffff83022ff2ff28 ffff8301f5330218
(XEN)    000000200000007b ffff81800060c148 ffff830004060000 ffff8301f5330000
(XEN)    ffff818000000000 00000001001d8462 0000000000000000 00000006cfd24000
(XEN)    80000001d9582021 ffff830000000001 ffff83022ff2fd78 0000000004060060
(XEN) Xen call trace:
(XEN)    [<ffff82c4801e0639>] sh_map_and_validate_gl4e__guest_4+0x6d/0x1d4
(XEN)    [<ffff82c4801c785b>] sh_validate_guest_entry+0x17e/0x1c6
(XEN)    [<ffff82c4801c79a8>] shadow_cmpxchg_guest_entry+0x105/0x189
(XEN)    [<ffff82c480163091>] mod_l4_entry+0x2fd/0x3e3
(XEN)    [<ffff82c480163242>] new_guest_cr3+0xcb/0x269
(XEN)    [<ffff82c480163ba7>] do_mmuext_op+0x7c7/0x14b8
(XEN)    [<ffff82c4801f2248>] compat_mmuext_op+0x217/0x3a9
(XEN)    [<ffff82c4801309b9>] compat_multicall+0x269/0x404
(XEN)    [<ffff82c4801ff580>] compat_hypercall+0xc0/0x119
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) Assertion '__mfn_valid(mfn_x(smfn))' failed at multi.c:2561
(XEN) ****************************************
(XEN)

Any ideas how to fix this prob?

Is live migration not stable enough with xen-4.0 (rc4) yet?

Thanks,
Ashish

On Tue, Feb 23, 2010 at 12:05 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Keir Fraser writes ("Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)"):
>> Localhost migrations were just added to the automated tests. But I think
>> maybe they are trivially failing due to trying to do them via the 'xl'
>> interface, which doesn't support it(!). Ian?
>
> Localhost migration does work in most combinations in our tests.  It
> was only recently added and there are a few teething troubles with it
> so I don't have a full slate of results.
>
> It doesn't work at all with libxl because it's not implemented.
>
> Keir:
>> A quick manual test indicates it's very easy to get Xen to blow up. I got
>> the following on my first localhost live migration attempt, which is a
>> different looking crash in the shadow code. This is with 2.6.18 dom0 and
>> domU by the way, so it's not pv_ops tickling the hypervisor in an unexpected
>> way...
>
> 2.6.18 doesn't boot on my test hardware so I'm just building it, not
> running it.  So I haven't reproduced your test, which explains the
> different results.
>
> Ian.
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-24  9:31           ` Ashish Bijlani
@ 2010-02-24 11:01             ` Keir Fraser
  2010-02-26  9:52               ` Tim Deegan
  0 siblings, 1 reply; 22+ messages in thread
From: Keir Fraser @ 2010-02-24 11:01 UTC (permalink / raw)
  To: Ashish Bijlani, xen-devel

On 24/02/2010 09:31, "Ashish Bijlani" <ashish.bijlani@gmail.com> wrote:

> Any ideas how to fix this prob?
> 
> Is live migration not stable enough with xen-4.0 (rc4) yet?

Tim Deegan's kindly offered to investigate this ahead of rc5.

 -- Keir

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: live migration fails (assert in shadow_hash_delete)
  2010-02-23 10:59       ` Keir Fraser
                           ` (2 preceding siblings ...)
  2010-02-23 17:05         ` Ian Jackson
@ 2010-02-26  6:12         ` Xu, Jiajun
  2010-02-26  8:38           ` Pasi Kärkkäinen
  2010-02-26  8:39           ` Jan Beulich
  3 siblings, 2 replies; 22+ messages in thread
From: Xu, Jiajun @ 2010-02-26  6:12 UTC (permalink / raw)
  To: Keir Fraser, Tim Deegan, Devdutt Patnaik, Ian Jackson
  Cc: Ashish Bijlani, xen-devel

Our normal testing covers local live migration testing for HVM with Pv_ops. These cases can pass in Xen-4.0.0 RCx testing. 
And I just now tried HVM live migration between two machines with xen c/s 20964 and Pv_ops, it can work.

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
> Sent: Tuesday, February 23, 2010 7:00 PM
> To: Tim Deegan; Devdutt Patnaik; Ian Jackson
> Cc: Ashish Bijlani; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
> 
> On 23/02/2010 10:46, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
> 
> > At 10:19 +0000 on 23 Feb (1266920353), Devdutt Patnaik wrote:
> >> We just used the xen-unstable version from 2 weeks ago, and haven't really
> >> modified it.
> >> We tried this with 64-bit versions of 2.6.31.6 and 2.6.32.8 DomU kernels.
> >
> > OK.  This really needs to be fixed to the 4.0 release.  Keir, have we
> > had any other testing on 64-bit PV live migrations?
> 
> Localhost migrations were just added to the automated tests. But I think
> maybe they are trivially failing due to trying to do them via the 'xl'
> interface, which doesn't support it(!). Ian?
> 
> In short, there's probably been little or no testing of live migration in
> the recent past, as I don't think Intel tests it either.
> 
>  -- Keir
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-26  6:12         ` Xu, Jiajun
@ 2010-02-26  8:38           ` Pasi Kärkkäinen
  2010-02-26  8:39           ` Jan Beulich
  1 sibling, 0 replies; 22+ messages in thread
From: Pasi Kärkkäinen @ 2010-02-26  8:38 UTC (permalink / raw)
  To: Xu, Jiajun
  Cc: xen-devel, Devdutt Patnaik, Tim Deegan, Ashish Bijlani,
	Ian Jackson, Keir Fraser

On Fri, Feb 26, 2010 at 02:12:22PM +0800, Xu, Jiajun wrote:
> Our normal testing covers local live migration testing for HVM with Pv_ops. These cases can pass in Xen-4.0.0 RCx testing. 
> And I just now tried HVM live migration between two machines with xen c/s 20964 and Pv_ops, it can work.
> 

I guess the problem here was PV live migration.. do you test that aswell? 

-- Pasi

> > -----Original Message-----
> > From: xen-devel-bounces@lists.xensource.com
> > [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
> > Sent: Tuesday, February 23, 2010 7:00 PM
> > To: Tim Deegan; Devdutt Patnaik; Ian Jackson
> > Cc: Ashish Bijlani; xen-devel@lists.xensource.com
> > Subject: Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
> > 
> > On 23/02/2010 10:46, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
> > 
> > > At 10:19 +0000 on 23 Feb (1266920353), Devdutt Patnaik wrote:
> > >> We just used the xen-unstable version from 2 weeks ago, and haven't really
> > >> modified it.
> > >> We tried this with 64-bit versions of 2.6.31.6 and 2.6.32.8 DomU kernels.
> > >
> > > OK.  This really needs to be fixed to the 4.0 release.  Keir, have we
> > > had any other testing on 64-bit PV live migrations?
> > 
> > Localhost migrations were just added to the automated tests. But I think
> > maybe they are trivially failing due to trying to do them via the 'xl'
> > interface, which doesn't support it(!). Ian?
> > 
> > In short, there's probably been little or no testing of live migration in
> > the recent past, as I don't think Intel tests it either.
> > 
> >  -- Keir
> > 
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: live migration fails (assert in shadow_hash_delete)
  2010-02-26  6:12         ` Xu, Jiajun
  2010-02-26  8:38           ` Pasi Kärkkäinen
@ 2010-02-26  8:39           ` Jan Beulich
  1 sibling, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2010-02-26  8:39 UTC (permalink / raw)
  To: Jiajun Xu
  Cc: xen-devel, Devdutt Patnaik, Tim Deegan, Ashish Bijlani,
	Ian Jackson, Keir Fraser

HVM with pv-ops? Seems irrelevant whether the kernel used in a hvm
guest has pv-ops. The point is that HVM live migration appears to work
fine (also according to our internal testing), just pv seems to be broken
(and unfortunately with no consistent crash pattern).

Jan

>>> "Xu, Jiajun" <jiajun.xu@intel.com> 26.02.10 07:12 >>>
Our normal testing covers local live migration testing for HVM with Pv_ops. These cases can pass in Xen-4.0.0 RCx testing. 
And I just now tried HVM live migration between two machines with xen c/s 20964 and Pv_ops, it can work.

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com 
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
> Sent: Tuesday, February 23, 2010 7:00 PM
> To: Tim Deegan; Devdutt Patnaik; Ian Jackson
> Cc: Ashish Bijlani; xen-devel@lists.xensource.com 
> Subject: Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
> 
> On 23/02/2010 10:46, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
> 
> > At 10:19 +0000 on 23 Feb (1266920353), Devdutt Patnaik wrote:
> >> We just used the xen-unstable version from 2 weeks ago, and haven't really
> >> modified it.
> >> We tried this with 64-bit versions of 2.6.31.6 and 2.6.32.8 DomU kernels.
> >
> > OK.  This really needs to be fixed to the 4.0 release.  Keir, have we
> > had any other testing on 64-bit PV live migrations?
> 
> Localhost migrations were just added to the automated tests. But I think
> maybe they are trivially failing due to trying to do them via the 'xl'
> interface, which doesn't support it(!). Ian?
> 
> In short, there's probably been little or no testing of live migration in
> the recent past, as I don't think Intel tests it either.
> 
>  -- Keir
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com 
> http://lists.xensource.com/xen-devel 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com 
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-24 11:01             ` Keir Fraser
@ 2010-02-26  9:52               ` Tim Deegan
  2010-02-26 10:27                 ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Tim Deegan @ 2010-02-26  9:52 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Ashish Bijlani, xen-devel

At 11:01 +0000 on 24 Feb (1267009267), Keir Fraser wrote:
> On 24/02/2010 09:31, "Ashish Bijlani" <ashish.bijlani@gmail.com> wrote:
> 
> > Any ideas how to fix this prob?
> > 
> > Is live migration not stable enough with xen-4.0 (rc4) yet?
> 
> Tim Deegan's kindly offered to investigate this ahead of rc5.

For the curious:

The bug seems to have come in between 4.0.0 rc1 (20789) and 20822.
Bisecting between those is more fun because PV domain creation and migration
were broken in libxc then.  Reverting 20808 (the only cset there that
touches the shadow code) doesn't fix the problem.

Selective backporting yesterday seemed to blame 20812 ("xend: NUMA: fix
division by zero on unpopulated nodes"), which seems unlikely.  I'll dig
further.

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-26  9:52               ` Tim Deegan
@ 2010-02-26 10:27                 ` Jan Beulich
  2010-02-26 14:22                   ` Tim Deegan
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2010-02-26 10:27 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Ashish Bijlani, xen-devel, Keir Fraser

>>> Tim Deegan <Tim.Deegan@citrix.com> 26.02.10 10:52 >>>
>The bug seems to have come in between 4.0.0 rc1 (20789) and 20822.
>Bisecting between those is more fun because PV domain creation and migration
>were broken in libxc then.  Reverting 20808 (the only cset there that
>touches the shadow code) doesn't fix the problem.
>
>Selective backporting yesterday seemed to blame 20812 ("xend: NUMA: fix
>division by zero on unpopulated nodes"), which seems unlikely.  I'll dig
>further.

I'd think 20792 is a good candidate - a copy-and-paste mistake would
cause the page subsequent to the one allocated to be overwritten.
Will send a patch in a minute.

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-26 10:27                 ` Jan Beulich
@ 2010-02-26 14:22                   ` Tim Deegan
  2010-02-26 14:48                     ` Keir Fraser
  2010-02-26 14:49                     ` Jan Beulich
  0 siblings, 2 replies; 22+ messages in thread
From: Tim Deegan @ 2010-02-26 14:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ashish Bijlani, xen-devel, Keir Fraser

At 10:27 +0000 on 26 Feb (1267180053), Jan Beulich wrote:
> >>> Tim Deegan <Tim.Deegan@citrix.com> 26.02.10 10:52 >>>
> >The bug seems to have come in between 4.0.0 rc1 (20789) and 20822.
> >Bisecting between those is more fun because PV domain creation and migration
> >were broken in libxc then.  Reverting 20808 (the only cset there that
> >touches the shadow code) doesn't fix the problem.
> >
> >Selective backporting yesterday seemed to blame 20812 ("xend: NUMA: fix
> >division by zero on unpopulated nodes"), which seems unlikely.  I'll dig
> >further.
> 
> I'd think 20792 is a good candidate - a copy-and-paste mistake would
> cause the page subsequent to the one allocated to be overwritten.
> Will send a patch in a minute.

Thanks for that.

Keir, I'm still seeing (different) crashes on unstable tip even with
Jan's fix; the proximate cause is c/s 20954, which changes the paths
taken when log-dirty mode is turned off after the live migration.

Reverting c/s 20954 fixes migration for me and is probably the best
thing to get the 4.0 release schedule going again.   I'll try to find
the actual bug at some later date. 

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-26 14:22                   ` Tim Deegan
@ 2010-02-26 14:48                     ` Keir Fraser
  2010-02-26 14:49                     ` Jan Beulich
  1 sibling, 0 replies; 22+ messages in thread
From: Keir Fraser @ 2010-02-26 14:48 UTC (permalink / raw)
  To: Tim Deegan, Jan Beulich; +Cc: Ashish Bijlani, xen-devel

On 26/02/2010 14:22, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:

>> I'd think 20792 is a good candidate - a copy-and-paste mistake would
>> cause the page subsequent to the one allocated to be overwritten.
>> Will send a patch in a minute.
> 
> Thanks for that.
> 
> Keir, I'm still seeing (different) crashes on unstable tip even with
> Jan's fix; the proximate cause is c/s 20954, which changes the paths
> taken when log-dirty mode is turned off after the live migration.
> 
> Reverting c/s 20954 fixes migration for me and is probably the best
> thing to get the 4.0 release schedule going again.   I'll try to find
> the actual bug at some later date.

Hm, yes, it looks like properly performing XEN_DOMCTL_SHADOW_OP_OFF on a PV
domain doesn't work. Removing the break stmts stops {shadow,hap}_domctl()
ever being called for the OFF operation -- so logdirty gets disabled but
nothing else -- and then I guess that we get teardown right for domain
destruction.

 -- Keir

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in shadow_hash_delete)
  2010-02-26 14:22                   ` Tim Deegan
  2010-02-26 14:48                     ` Keir Fraser
@ 2010-02-26 14:49                     ` Jan Beulich
  2010-02-26 15:29                       ` Keir Fraser
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2010-02-26 14:49 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Ashish Bijlani, xen-devel, Keir Fraser

>>> Tim Deegan <Tim.Deegan@citrix.com> 26.02.10 15:22 >>>
>Keir, I'm still seeing (different) crashes on unstable tip even with
>Jan's fix; the proximate cause is c/s 20954, which changes the paths
>taken when log-dirty mode is turned off after the live migration.
>
>Reverting c/s 20954 fixes migration for me and is probably the best
>thing to get the 4.0 release schedule going again.   I'll try to find
>the actual bug at some later date. 

So perhaps the fall-through there was really intended? I had pointed
out that these missing break statements looked suspicious, so maybe
it's simply that those two places should be annotated accordingly?

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: live migration fails (assert in  shadow_hash_delete)
  2010-02-26 14:49                     ` Jan Beulich
@ 2010-02-26 15:29                       ` Keir Fraser
  0 siblings, 0 replies; 22+ messages in thread
From: Keir Fraser @ 2010-02-26 15:29 UTC (permalink / raw)
  To: Jan Beulich, Tim Deegan; +Cc: Ashish Bijlani, xen-devel

On 26/02/2010 14:49, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> Tim Deegan <Tim.Deegan@citrix.com> 26.02.10 15:22 >>>
>> Keir, I'm still seeing (different) crashes on unstable tip even with
>> Jan's fix; the proximate cause is c/s 20954, which changes the paths
>> taken when log-dirty mode is turned off after the live migration.
>> 
>> Reverting c/s 20954 fixes migration for me and is probably the best
>> thing to get the 4.0 release schedule going again.   I'll try to find
>> the actual bug at some later date.
> 
> So perhaps the fall-through there was really intended? I had pointed
> out that these missing break statements looked suspicious, so maybe
> it's simply that those two places should be annotated accordingly?

Mmmm.. No. :-)

 -- Keir

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2010-02-26 15:29 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-02-23  8:57 live migration fails (assert in shadow_hash_delete) Ashish Bijlani
2010-02-23  9:25 ` Tim Deegan
2010-02-23 10:19   ` Devdutt Patnaik
2010-02-23 10:25     ` Devdutt Patnaik
2010-02-23 10:46     ` Tim Deegan
2010-02-23 10:51       ` Devdutt Patnaik
2010-02-23 10:59       ` Keir Fraser
2010-02-23 11:05         ` Devdutt Patnaik
2010-02-23 11:10         ` Keir Fraser
2010-02-23 17:05         ` Ian Jackson
2010-02-24  9:31           ` Ashish Bijlani
2010-02-24 11:01             ` Keir Fraser
2010-02-26  9:52               ` Tim Deegan
2010-02-26 10:27                 ` Jan Beulich
2010-02-26 14:22                   ` Tim Deegan
2010-02-26 14:48                     ` Keir Fraser
2010-02-26 14:49                     ` Jan Beulich
2010-02-26 15:29                       ` Keir Fraser
2010-02-26  6:12         ` Xu, Jiajun
2010-02-26  8:38           ` Pasi Kärkkäinen
2010-02-26  8:39           ` Jan Beulich
2010-02-23 10:54     ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.