From: Ingo Jürgensmann
Subject: Re: Kernel panic on Xen virtualisation in Debian
Date: Sun, 10 Jul 2016 15:18:37 +0200
Message-ID: <17804F34-7019-4595-91D5-42FC9866A4C2@2013.bluespice.org>
In-Reply-To: <133d4197-8155-1219-46af-bb51e1092245@andreas-ziegler.de>
To: xen-devel@lists.xen.org

On 10.07.2016 at 00:29, Andreas Ziegler wrote:

> In May, Ingo Jürgensmann also started experiencing this problem and
> blogged about it:
> https://blog.windfluechter.net/content/blog/2016/03/23/1721-xen-randomly-crashing-server
> https://blog.windfluechter.net/content/blog/2016/05/12/1723-xen-randomly-crashing-server-part-2

Actually, I have been suffering from this problem since April 2013.
Here’s my story… ;)

Everything was working smoothly while I was still using a root server at
Hetzner. The setup there was somewhat non-standard, as I needed eth0 as
the outgoing interface without it being part of the Xen bridge. So I
used a mixture of bridging and routing in xend-config.sxp. This setup
worked for years without problems.
However, when Hetzner started to bill for every single IPv4 address, I
moved to my new provider, where I could get the same address space (/26)
without being forced to pay for every IPv4 address. The server back then
was a Cisco C200 M2.

Since I got my own VLAN at the new location, I was able to drop the
mixed routing and bridging setup and use bridging only, with eth0 now
being part of the Xen bridge. The whole setup consists of two bridges:
one for the external IP addresses (xenbr0) and one for internal traffic
(xenbr1). It was already that way at Hetzner.

However, shortly after I moved to the new provider, the issues started:
random crashes of the host. Together with the new provider, who was and
still is very helpful, we swapped out the memory, for example. The
provider also reported that other Cisco C200 servers running Ubuntu LTS
didn’t show this issue.

Over time a pattern emerged that might be causing the frequent crashes
(sometimes several in a row, let’s say 2-10 times a day!):

My setup is this:

Debian stable with the packaged Xen hypervisor and these VMs:
1) Mail, Database, Nameserver, OpenVPN
2) Webserver, Squid3
3) Login server
4) … some more servers (10 in total), e.g. Tor Relay…

IPv4 /26 network, IPv6 /48 network

From my workplace I need to log in to 3) and have a tunnel to the Squid
on 2) via the internal addresses on xenbr1. Of course Squid queries the
nameserver on 1), so there is some internal traffic going back and forth
on the internal bridge as well as traffic originating from the external
bridge (xenbr0). Using Squid I access my Roundcube on my small homebrew
server, which is connected to 1) via OpenVPN. Of course the webserver on
2) queries the database on 1).

So most crashes happen while I’m using the SSH tunnel from my workplace.
If a crash happens, it’s most likely that at least two in a row will
happen within a short time frame (1-2 hours), sometimes even within 10
minutes after the server has come back. From time to time my impression
was that the server crashed a second time the instant I tried to access
my Roundcube at home.

Furthermore, I switched from the Cisco C200 server to my own server with
a Supermicro X9SRi-F mainboard and a Xeon E5-2630L v2, still at the same
provider, and the issue remained: the new server crashes the same way
the Cisco server did. With the new server we replaced the memory as
well: from 32G to 128G. So over time we have switched memory twice and
hardware once. Since then I no longer assume that this is
hardware-related.

In the meantime I switched from using Squid on 2) to tinyproxy running
on 2), as well as to tinyproxy running on a third-party VPS. The crashes
still happen, regardless of whether I use Squid on 2) or not.

In May the server again crashed several times a week, sometimes several
times a day. Really, really annoying!

So together with my provider we set up a netconsole to catch more
information about the crash than just the few lines from the IPMI
console.

Trying linux-image 4.4 from backports didn’t help either. I also
switched from PV to PVHVM some months ago.

> He is pretty sure, that the problem went away after disabling IPv6.

Indeed. Since I disabled IPv6 for all of my VMs (it’s still active on
dom0, but not routed to the domUs anymore), not a single crash has
happened.

> But: we can't say for sure, because on our server it sometimes happened
> often in a short period of time, but then it didn't for months.
> and: disabling IPv6 is no option for me at all.

I won’t claim to have an exact way of reproducing the crashes, but they
happen fairly often when I do what is described above.
What I can offer is:
- activate IPv6 again
- install a kernel with debugging symbols (*-dbg)
- try to provoke another crash
- send the netconsole output if a crash happens

What I cannot do:
- interpret the debug symbols
- access the IPMI console from my workplace (firewalled)

I’m with Andreas that disabling IPv6 cannot be an option.

-- 
Ciao...          //        http://blog.windfluechter.net
      Ingo     \X/     XMPP: ij@jabber.windfluechter.net

gpg pubkey:  http://www.juergensmann.de/ij_public_key.asc