* Xen unstability on HP Moonshot m400
@ 2015-03-21 12:34 Christoffer Dall
From: Christoffer Dall @ 2015-03-21 12:34 UTC
  To: Ian Campbell, Stefano Stabellini, Hull, Jim
  Cc: Marc Zyngier, Robert Ricci, xen-devel, Pranavkumar Sawargaonkar


Hi,

I have been experiencing a problematic crash running Xen on m400 over the
last few days.  I already spoke to Ian and Stefano about this, but thought
I'd summarize what I've seen so far and loop in a wider audience.

The basic setup is this:
 - Two m400 nodes, one running Linux bare-metal, the other running Xen.
 - The Xen node runs Dom0 and 1 DomU
 - The m400 has a Mellanox ConnectX-3 PCIe 10G ethernet card with two ports
on it
 - Dom0 uses NAT forwarding from Dom0's eth0 (which is connected to the
internet) and regular bridging to eth1 which is connected to a private VLAN
to the bare-metal node
 - Dom0 and DomU are each configured with 14 GB of RAM and 4 CPUs
 - DomU runs apache2 serving the GCC manual (see
https://github.com/chazy/kvmperf/blob/master/cmdline_tests/apache_install.sh
)

The bare-metal node runs apache bench, like this: "ab -n 100000 -c 100
http://10.10.1.120/gcc/index.html"

(10.10.1.120 is the DomU IP address of the bridged interface to eth1)
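
For anyone trying to reproduce this, the load step above could be wrapped in a
small loop that stops as soon as the target stops responding. This is only a
rough sketch, not the script actually used: the helper names (ab_command,
host_alive, run_until_failure), the ping-based liveness check, and the round
limit are all assumptions made for illustration. Only the ab arguments and the
URL come from the command quoted above.

```python
import subprocess


def ab_command(url, requests=100000, concurrency=100):
    """Build the ApacheBench argv quoted in the report (ab -n ... -c ... URL)."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


def host_alive(host, timeout_s=2):
    """Return True if a single ICMP echo succeeds.

    Assumption: Linux iputils ping, where -c is the count and -W the
    reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_until_failure(url, host, max_rounds=10):
    """Run ab repeatedly against url.

    Returns the round number in which ab exited non-zero or the host
    stopped answering ping, or None if every round completed.
    """
    for round_no in range(1, max_rounds + 1):
        ab = subprocess.run(ab_command(url))
        if ab.returncode != 0 or not host_alive(host):
            return round_no
    return None
```

Run from the bare-metal node, run_until_failure("http://10.10.1.120/gcc/index.html",
"10.10.1.120") would report the round in which the Xen node stopped responding,
which may help correlate the failure with the kernel log on the Xen side.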

What happens now is that the entire Xen node goes down.  I see various
errors in the kernel log, some examples:
http://pastebin.ubuntu.com/10642148/
http://pastebin.ubuntu.com/10642177/
http://pastebin.ubuntu.com/10642181/
http://pastebin.ubuntu.com/10635573/

All Linux kernels are 3.18 plus some tweaks for the m400 cartridge:
https://github.com/columbia/linux-kvm-arm/tree/columbia-armvirt-3.18
config: columbia_armvirt_defconfig (from the same tree:
https://github.com/columbia/linux-kvm-arm/blob/columbia-armvirt-3.18/arch/arm64/configs/columbia_armvirt_defconfig
)

I have also tried applying a set of swiotlb fixes provided by Stefano to
both the Dom0 and DomU kernel, like this:
https://github.com/columbia/linux-kvm-arm/commits/columbia-armvirt-3.18-with-xen-fixes

With these patches I sometimes also saw this error in the kernel log (but
not always):
http://pastebin.ubuntu.com/10635062/

Other data points of interest:
 - Bare-metal serving apache doesn't exhibit this behavior
 - KVM guests with bridged networking on identical hardware/setup with the
same kernels also don't exhibit this behavior
 - Other physically identical nodes exhibit the same behavior
 - Just running Dom0 serving apache without running DomU doesn't appear to
exhibit this behavior
 - Running apache on Dom0 and benchmarking the system using Dom0's ip
address but running DomU idle in the background causes this behavior (
http://pastebin.ubuntu.com/10642311/), but the system seems to stay alive
(at least for much longer)!

Stefano suggested that this could be related to DMA cache coherency, but I'm
not sure how to investigate that further.

This is a somewhat urgent issue for us at Columbia so I would appreciate
any feedback and/or ideas and will be happy to try out any debugging steps
to get to the bottom of this.

Thanks,
-Christoffer

