xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Steven Haigh <netwiz@crc.id.au>
To: Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	linux-kernel@vger.kernel.org
Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Subject: Re: 4.4: INFO: rcu_sched self-detected stall on CPU
Date: Wed, 30 Mar 2016 04:44:55 +1100	[thread overview]
Message-ID: <56FABF17.7090608__22969.6305389099$1459273600$gmane$org@crc.id.au> (raw)
In-Reply-To: <56FA8DDD.7070406@oracle.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 5229 bytes --]

On 30/03/2016 1:14 AM, Boris Ostrovsky wrote:
> On 03/29/2016 04:56 AM, Steven Haigh wrote:
>>
>> Interestingly enough, this just happened again - but on a different
>> virtual machine. I'm starting to wonder if this may have something to do
>> with the uptime of the machine - as the system that this seems to happen
>> to is always different.
>>
>> Destroying it and monitoring it again has so far come up blank.
>>
>> I've thrown the latest lot of kernel messages here:
>>      http://paste.fedoraproject.org/346802/59241532
> 
> Would be good to see full console log. The one that you posted starts
> with an error so I wonder what was before that.

Agreed. It started off with me observing this on one VM - but since
trying to get details on that VM - others have started showing issues as
well. It frustrating as it seems I've been playing whack-a-mole to get
more debug on what is going on.

So, I've changed the kernel command line to the following on ALL VMs on
this system:
enforcemodulesig=1 selinux=0 fsck.repair=yes loglevel=7 console=tty0
console=ttyS0,38400n8

In the Dom0 (which runs the same kernel package), I've started a screen
sessions with a screen for each of the DomUs running attached to the
console via 'xl console blah' - so hopefully the next one that goes down
(whichever one that is) will get caught in the console.

> Have you tried this on bare metal, BTW? And you said this is only
> observed on 4.4, not 4.5, right?

I use the same kernel package as the Dom0 kernel - and so far haven't
seen any issues running this as the Dom0. I haven't used it on baremetal
as a non-xen kernel as yet.

The kernel package I'm currently running is for CentOS / Scientific
Linux / RHEL at:
    http://au1.mirror.crc.id.au/repo/el7-testing/x86_64/

I'm using 4.4.6-3 at the moment - which has CONFIG_PREEMPT_VOLUNTARY set
- which *MAY* have increased the time between this happening - or may
have no effect at all. I'm not convinced either way as yet.

With respect to 4.5, I have had reports from another user of my packages
that they haven't seen the same crash using the same Xen packages but
with kernel 4.5. I have not verified this myself as yet as I haven't
gone down the path of making 4.5 packages for testing. As such, I
wouldn't treat this as a conclusive test case as yet.

I'm hoping that the steps I've taken above may give some more
information in which we can drill down into exactly what is going on -
or at least give more pointers into the root cause.

>>
>> Interestingly, around the same time, /var/log/messages on the remote
>> syslog server shows:
>> Mar 29 17:00:01 zeus systemd: Created slice user-0.slice.
>> Mar 29 17:00:01 zeus systemd: Starting user-0.slice.
>> Mar 29 17:00:01 zeus systemd: Started Session 1567 of user root.
>> Mar 29 17:00:01 zeus systemd: Starting Session 1567 of user root.
>> Mar 29 17:00:01 zeus systemd: Removed slice user-0.slice.
>> Mar 29 17:00:01 zeus systemd: Stopping user-0.slice.
>> Mar 29 17:01:01 zeus systemd: Created slice user-0.slice.
>> Mar 29 17:01:01 zeus systemd: Starting user-0.slice.
>> Mar 29 17:01:01 zeus systemd: Started Session 1568 of user root.
>> Mar 29 17:01:01 zeus systemd: Starting Session 1568 of user root.
>> Mar 29 17:08:34 zeus ntpdate[18569]: adjust time server 203.56.246.94
>> offset -0.002247 sec
>> Mar 29 17:08:34 zeus systemd: Removed slice user-0.slice.
>> Mar 29 17:08:34 zeus systemd: Stopping user-0.slice.
>> Mar 29 17:10:01 zeus systemd: Created slice user-0.slice.
>> Mar 29 17:10:01 zeus systemd: Starting user-0.slice.
>> Mar 29 17:10:01 zeus systemd: Started Session 1569 of user root.
>> Mar 29 17:10:01 zeus systemd: Starting Session 1569 of user root.
>> Mar 29 17:10:01 zeus systemd: Removed slice user-0.slice.
>> Mar 29 17:10:01 zeus systemd: Stopping user-0.slice.
>> Mar 29 17:20:01 zeus systemd: Created slice user-0.slice.
>> Mar 29 17:20:01 zeus systemd: Starting user-0.slice.
>> Mar 29 17:20:01 zeus systemd: Started Session 1570 of user root.
>> Mar 29 17:20:01 zeus systemd: Starting Session 1570 of user root.
>> Mar 29 17:20:01 zeus systemd: Removed slice user-0.slice.
>> Mar 29 17:20:01 zeus systemd: Stopping user-0.slice.
>> Mar 29 17:30:55 zeus systemd: systemd-logind.service watchdog timeout
>> (limit 1min)!
>> Mar 29 17:32:25 zeus systemd: systemd-logind.service stop-sigabrt timed
>> out. Terminating.
>> Mar 29 17:33:56 zeus systemd: systemd-logind.service stop-sigterm timed
>> out. Killing.
>> Mar 29 17:35:26 zeus systemd: systemd-logind.service still around after
>> SIGKILL. Ignoring.
>> Mar 29 17:36:56 zeus systemd: systemd-logind.service stop-final-sigterm
>> timed out. Killing.
>> Mar 29 17:38:26 zeus systemd: systemd-logind.service still around after
>> final SIGKILL. Entering failed mode.
>> Mar 29 17:38:26 zeus systemd: Unit systemd-logind.service entered failed
>> state.
>> Mar 29 17:38:26 zeus systemd: systemd-logind.service failed.
> 
> 
> These may be result of your system not feeling well, which is not
> surprising.
> 
> -boris

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  parent reply	other threads:[~2016-03-29 17:45 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-25  2:53 4.4: INFO: rcu_sched self-detected stall on CPU Steven Haigh
2016-03-25 12:23 ` Boris Ostrovsky
     [not found] ` <56F52DBF.5080006@oracle.com>
2016-03-25 14:05   ` Steven Haigh
     [not found]   ` <56F545B1.8080609@crc.id.au>
2016-03-25 14:44     ` Boris Ostrovsky
     [not found]     ` <56F54EE0.6030004@oracle.com>
2016-03-25 16:04       ` Steven Haigh
     [not found]       ` <56F56172.9020805@crc.id.au>
2016-03-25 16:20         ` Boris Ostrovsky
     [not found]         ` <56F5653B.1090700@oracle.com>
2016-03-25 21:07           ` Steven Haigh
     [not found]           ` <56F5A87A.8000903@crc.id.au>
2016-03-29  8:56             ` Steven Haigh
     [not found]             ` <56FA4336.2030301@crc.id.au>
2016-03-29 14:14               ` Boris Ostrovsky
     [not found]               ` <56FA8DDD.7070406@oracle.com>
2016-03-29 17:44                 ` Steven Haigh [this message]
     [not found]                 ` <56FABF17.7090608@crc.id.au>
2016-03-29 18:04                   ` Steven Haigh
     [not found]                   ` <56FAC3AC.9050802@crc.id.au>
2016-03-30 13:44                     ` Boris Ostrovsky
2016-05-02 20:54                     ` gregkh
2016-04-02  1:50                 ` Steven Haigh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='56FABF17.7090608__22969.6305389099$1459273600$gmane$org@crc.id.au' \
    --to=netwiz@crc.id.au \
    --cc=boris.ostrovsky@oracle.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).