From mboxrd@z Thu Jan 1 00:00:00 1970
From: Steven Haigh
Subject: Re: 4.4: INFO: rcu_sched self-detected stall on CPU
Date: Tue, 29 Mar 2016 19:56:22 +1100
Message-ID: <56FA4336.2030301__15742.7484788515$1459241883$gmane$org@crc.id.au>
References: <56F4A816.3050505@crc.id.au> <56F52DBF.5080006@oracle.com>
 <56F545B1.8080609@crc.id.au> <56F54EE0.6030004@oracle.com>
 <56F56172.9020805@crc.id.au> <56F5653B.1090700@oracle.com>
 <56F5A87A.8000903@crc.id.au>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============2469772727787349246=="
Return-path:
Received: from mail6.bemta6.messagelabs.com ([85.158.143.247])
 by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from )
 id 1akpSB-0001au-OL for xen-devel@lists.xenproject.org;
 Tue, 29 Mar 2016 08:56:48 +0000
In-Reply-To: <56F5A87A.8000903@crc.id.au>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel"
To: Boris Ostrovsky , xen-devel , linux-kernel@vger.kernel.org
Cc: "gregkh@linuxfoundation.org"
List-Id: xen-devel@lists.xenproject.org

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--===============2469772727787349246==
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature";
 boundary="6km3IDSkXRNwUHW1se4dCfKVd9N7GGunP"

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--6km3IDSkXRNwUHW1se4dCfKVd9N7GGunP
Content-Type: multipart/mixed; boundary="SRDoF1Tq9gntdeureP17489DGQ4b9E1Lc"
From: Steven Haigh
To: Boris Ostrovsky , xen-devel , linux-kernel@vger.kernel.org
Cc: "gregkh@linuxfoundation.org"
Message-ID: <56FA4336.2030301@crc.id.au>
Subject: Re: 4.4: INFO: rcu_sched self-detected stall on CPU
References: <56F4A816.3050505@crc.id.au> <56F52DBF.5080006@oracle.com>
 <56F545B1.8080609@crc.id.au> <56F54EE0.6030004@oracle.com>
 <56F56172.9020805@crc.id.au> <56F5653B.1090700@oracle.com>
 <56F5A87A.8000903@crc.id.au>
In-Reply-To: <56F5A87A.8000903@crc.id.au>

--SRDoF1Tq9gntdeureP17489DGQ4b9E1Lc
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On 26/03/2016 8:07 AM, Steven Haigh wrote:
> On 26/03/2016 3:20 AM, Boris Ostrovsky wrote:
>> On 03/25/2016 12:04 PM, Steven Haigh wrote:
>>> It may not actually be the full logs. Once the system gets really upset,
>>> you can't run anything - as such, grabbing anything from dmesg is not
>>> possible.
>>>
>>> The logs provided above is all that gets spat out to the syslog server.
>>>
>>> I'll try tinkering with a few things to see if I can get more output -
>>> but right now, that's all I've been able to achieve. So far, my only
>>> ideas are to remove the 'quiet' options from the kernel command line -
>>> but I'm not sure how much that would help.
>>>
>>> Suggestions gladly accepted on this front.
>>
>> You probably want to run connected to guest serial console ("
>> serial='pty' " in guest config file and something like 'loglevel=7
>> console=tty0 console=ttyS0,38400n8' on guest kernel commandline). And
>> start the guest with 'xl create -c ' or connect later with 'xl
>> console '.
>
> Ok thanks, I've booted the DomU with:
>
> $ cat /proc/cmdline
> root=UUID=63ade949-ee67-4afb-8fe7-ecd96faa15e2 ro enforcemodulesig=1
> selinux=0 fsck.repair=yes loglevel=7 console=tty0 console=ttyS0,38400n8
>
> I've left a screen session attached to the console (via xl console) and
> I'll see if that turns anything up. As this seems to be rather
> unpredictable when it happens, it may take a day or two to get anything.
> I just hope its more than the syslog output :)

Interestingly enough, this just happened again - but on a different
virtual machine. I'm starting to wonder if this may have something to do
with the uptime of the machine - as the system that this seems to happen
to is always different.

Destroying it and monitoring it again has so far come up blank.
I've thrown the latest lot of kernel messages here:
http://paste.fedoraproject.org/346802/59241532

Interestingly, around the same time, /var/log/messages on the remote
syslog server shows:

Mar 29 17:00:01 zeus systemd: Created slice user-0.slice.
Mar 29 17:00:01 zeus systemd: Starting user-0.slice.
Mar 29 17:00:01 zeus systemd: Started Session 1567 of user root.
Mar 29 17:00:01 zeus systemd: Starting Session 1567 of user root.
Mar 29 17:00:01 zeus systemd: Removed slice user-0.slice.
Mar 29 17:00:01 zeus systemd: Stopping user-0.slice.
Mar 29 17:01:01 zeus systemd: Created slice user-0.slice.
Mar 29 17:01:01 zeus systemd: Starting user-0.slice.
Mar 29 17:01:01 zeus systemd: Started Session 1568 of user root.
Mar 29 17:01:01 zeus systemd: Starting Session 1568 of user root.
Mar 29 17:08:34 zeus ntpdate[18569]: adjust time server 203.56.246.94 offset -0.002247 sec
Mar 29 17:08:34 zeus systemd: Removed slice user-0.slice.
Mar 29 17:08:34 zeus systemd: Stopping user-0.slice.
Mar 29 17:10:01 zeus systemd: Created slice user-0.slice.
Mar 29 17:10:01 zeus systemd: Starting user-0.slice.
Mar 29 17:10:01 zeus systemd: Started Session 1569 of user root.
Mar 29 17:10:01 zeus systemd: Starting Session 1569 of user root.
Mar 29 17:10:01 zeus systemd: Removed slice user-0.slice.
Mar 29 17:10:01 zeus systemd: Stopping user-0.slice.
Mar 29 17:20:01 zeus systemd: Created slice user-0.slice.
Mar 29 17:20:01 zeus systemd: Starting user-0.slice.
Mar 29 17:20:01 zeus systemd: Started Session 1570 of user root.
Mar 29 17:20:01 zeus systemd: Starting Session 1570 of user root.
Mar 29 17:20:01 zeus systemd: Removed slice user-0.slice.
Mar 29 17:20:01 zeus systemd: Stopping user-0.slice.
Mar 29 17:30:55 zeus systemd: systemd-logind.service watchdog timeout (limit 1min)!
Mar 29 17:32:25 zeus systemd: systemd-logind.service stop-sigabrt timed out. Terminating.
Mar 29 17:33:56 zeus systemd: systemd-logind.service stop-sigterm timed out. Killing.
Mar 29 17:35:26 zeus systemd: systemd-logind.service still around after SIGKILL. Ignoring.
Mar 29 17:36:56 zeus systemd: systemd-logind.service stop-final-sigterm timed out. Killing.
Mar 29 17:38:26 zeus systemd: systemd-logind.service still around after final SIGKILL. Entering failed mode.
Mar 29 17:38:26 zeus systemd: Unit systemd-logind.service entered failed state.
Mar 29 17:38:26 zeus systemd: systemd-logind.service failed.

-- 
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

--SRDoF1Tq9gntdeureP17489DGQ4b9E1Lc--

--6km3IDSkXRNwUHW1se4dCfKVd9N7GGunP
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJW+kNBAAoJEEGvNdV6fTHcw1wP/1WEBiRzEcwqh2jjMtHkfW7x
pD6N+bD+8yP1H6jqroQKHS+86xrTfYCzn+YfVTTHs4KMI0grQVQVF65wCtyADxTP
pfvZ9Azn38acPlYwyx667UU4L1MtxddiWNUbFN8eRjGsCE0qmCI9l2TSk3nIg/5H
NwyHAgfswqTouwq5DkDs4NMnxgGib98SEWnzFPE2bkzHfKWRlEtuZ6r2owVk6a+9
qPA237uK9hTgNWRXX0DOEkuKKzRIwyzYO2AtdW5FH/aOJc9PgSZmSU7cOjzRWMKm
wG6Kc4jowEwDsqqWDGFM2uXskGAGQdc3T3kUc0ng9sT+qNNn6FHvzsv3rH7LYh4m
fjlIYG4TGRz6U6SNZYj2IkemvQSefoWCZ8RIAbxN1zoAjSgCIHdIup1/JQWv2DfQ
05puRTQrAFycWgpDD548eZzmjqrhybHLEVcjjdmgZzE81JSj4R5q82UXhnc42rMc
3H4ZqSDj3AYAMHKY5p2gU88GWgU6tZ9yKbyMnEL0dnndyXZLhj2IuaaX17RuSbQd
WRA7MbSUnFGQvsnFYLa+VnkVNZZdlFHtoUxLCSLjOqXhSqY4ST9UklvmqH3r1MoX
nZzSSYxXyiHBFk5/T/LU9yQQ9hvx6yG7XiviFRx3X0clTnL/fxp0Kvwrg3ZbFtHV
gMU78vfFVBy0OuDr+EtF
=3TKq
-----END PGP SIGNATURE-----

--6km3IDSkXRNwUHW1se4dCfKVd9N7GGunP--

--===============2469772727787349246==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: inline

X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs
IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwOi8vbGlzdHMueGVuLm9y
Zy94ZW4tZGV2ZWwK

--===============2469772727787349246==--