From: Jan Kiszka <jan.kiszka@siemens.com>
To: Niels Wellens <niels.wellens@triphase.com>,
	"xenomai@xenomai.org" <xenomai@xenomai.org>
Subject: Re: [Xenomai] xeno3_rc3 - Watchdog detected hard LOCKUP
Date: Thu, 12 Mar 2015 21:52:25 +0100
Message-ID: <5501FC89.2040205@siemens.com>
In-Reply-To: <54FDB495.3060303@triphase.com>

On 2015-03-09 at 15:56, Niels Wellens wrote:
> Hi,
> 
> We have a few updates on the lockups that we observed.
> 
> Jeroen did a dohell test on his unpatched 3.14.28 kernel and he didn't
> experience any problems; the system was still working as expected after
> more than 100 hours of operation.
> 
> In the meantime, I did some further tests on my 3.16.0 ipipe kernel. I
> disabled some services (gdm3, rtkit-daemon, smbd and nmbd) and after 90
> hours of operation (latency + dohell) everything was still working
> flawlessly. Afterwards I enabled the gdm3 and rtkit-daemon services again
> and the lockup didn't occur for another 25 hours (test stopped due to a
> kernel panic while porting one of my RTDM drivers to xeno 3 ;-) ).
> Then I continued my test where it stopped (only smbd and nmbd services
> disabled, latency + dohell running) and it ran perfectly for 114
> hours. Then I enabled smbd and nmbd again and after 3 hours the hard
> lockup occurred again:
> 
> Mar  4 16:35:47 dev-x10sae kernel: [    0.000000] Initializing cgroup
> subsys cpuset
> Mar  4 16:35:47 dev-x10sae kernel: [    0.000000] Initializing cgroup
> subsys cpu
> Mar  4 16:35:47 dev-x10sae kernel: [    0.000000] Initializing cgroup
> subsys cpuacct
> Mar  4 16:35:47 dev-x10sae kernel: [    0.000000] Linux version
> 3.16.0-ipipe-v0+ (triphase@dev-x10sae) (gcc version 4.9.1 (Debian
> 4.9.1-19) ) #1 SMP Thu Feb 26 12:15:32 CET 2015
> Mar  4 16:35:47 dev-x10sae kernel: [    0.000000] Command line:
> BOOT_IMAGE=/boot/vmlinuz-3.16.0-ipipe-v0+
> root=UUID=fc8ecefa-fc73-487f-a045-cffa99c38a11 ro quiet
> ...
> Mar  9 07:35:02 dev-x10sae anacron[26338]: Job `cron.daily' terminated
> Mar  9 07:35:02 dev-x10sae anacron[26338]: Normal exit (1 job run)
> Mar  9 08:17:01 dev-x10sae CRON[25670]: (root) CMD (   cd / && run-parts
> --report /etc/cron.hourly)
> Mar  9 08:30:17 dev-x10sae gnome-session[2611]:
> (gnome-settings-daemon:2675): GLib-CRITICAL **: Source ID 4961 was not
> found when attempting to remove it
> Mar  9 09:17:01 dev-x10sae CRON[20303]: (root) CMD (   cd / && run-parts
> --report /etc/cron.hourly)
> Mar  9 09:30:17 dev-x10sae gnome-session[2611]:
> (gnome-settings-daemon:2675): GLib-CRITICAL **: Source ID 4987 was not
> found when attempting to remove it
> Mar  9 10:17:01 dev-x10sae CRON[14576]: (root) CMD (   cd / && run-parts
> --report /etc/cron.hourly)
> Mar  9 10:30:17 dev-x10sae gnome-session[2611]:
> (gnome-settings-daemon:2675): GLib-CRITICAL **: Source ID 5017 was not
> found when attempting to remove it
> Mar  9 11:17:01 dev-x10sae CRON[30596]: (root) CMD (   cd / && run-parts
> --report /etc/cron.hourly)
> Mar  9 11:20:51 dev-x10sae smbd[11478]: Starting SMB/CIFS daemon: smbd.
> Mar  9 11:20:56 dev-x10sae nmbd[24483]: Starting NetBIOS name server: nmbd.
> Mar  9 11:30:17 dev-x10sae gnome-session[2611]:
> (gnome-settings-daemon:2675): GLib-CRITICAL **: Source ID 5043 was not
> found when attempting to remove it
> Mar  9 12:17:01 dev-x10sae CRON[6674]: (root) CMD (   cd / && run-parts
> --report /etc/cron.hourly)
> Mar  9 12:30:17 dev-x10sae gnome-session[2611]:
> (gnome-settings-daemon:2675): GLib-CRITICAL **: Source ID 5075 was not
> found when attempting to remove it
> Mar  9 13:17:01 dev-x10sae CRON[6801]: (root) CMD (   cd / && run-parts
> --report /etc/cron.hourly)
> Mar  9 13:30:17 dev-x10sae gnome-session[2611]:
> (gnome-settings-daemon:2675): GLib-CRITICAL **: Source ID 5464 was not
> found when attempting to remove it
> Mar  9 14:02:54 dev-x10sae kernel: [422579.748685] Watchdog detected
> hard LOCKUP on cpu 5
> Mar  9 14:02:54 dev-x10sae kernel: [422583.196923] INFO: rcu_sched
> self-detected stall on CPUINFO: rcu_sched self-detected stall on
> CPUINFO: rcu_sched self-detected stall on CPU {
> Mar  9 14:02:54 dev-x10sae kernel: [422583.196927]  {
> Mar  9 14:02:54 dev-x10sae kernel: [422583.196928]  2
> Mar  9 14:02:54 dev-x10sae kernel: [422583.196928]  1
> Mar  9 14:02:54 dev-x10sae kernel: [422583.196929] }
> Mar  9 14:02:54 dev-x10sae kernel: [422583.196930] }
> Mar  9 14:02:54 dev-x10sae kernel: [422583.196930]  (t=5250 jiffies
> g=21756356 c=21756355 q=15258)
> Mar  9 14:02:54 dev-x10sae kernel: [422583.196931]  (t=5250 jiffies
> g=21756356 c=21756355 q=15258)
> Mar  9 14:02:54 dev-x10sae kernel: [422583.196932] sending NMI to all CPUs:
> Mar  9 14:02:54 dev-x10sae kernel: [422583.197098]  { 6}  (t=5250
> jiffies g=21756356 c=21756355 q=15258)
> 
> Is it possible that the kernel part of Samba (CIFS?) is holding the page
> allocation spinlock that Jan has mentioned?

Well, we need to see the backtraces to know more. But even then the
question would be what could cause this: whether it is some issue in
I-pipe or Xenomai, or a generic issue that we would also see after a
while with an unpatched kernel.
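
To put a number on "after 3 hours", the interval between the smbd start
and the watchdog message can be read off the syslog timestamps. A minimal
Python sketch, assuming unwrapped syslog lines in the layout quoted above,
an English locale for the month names, and the year 2015 (syslog omits
it); stamp and time_to_lockup are hypothetical helper names:

```python
from datetime import datetime

# Assumption: lines are unwrapped copies of the syslog entries quoted above.
SAMPLE = [
    "Mar  9 11:20:51 dev-x10sae smbd[11478]: Starting SMB/CIFS daemon: smbd.",
    "Mar  9 11:20:56 dev-x10sae nmbd[24483]: Starting NetBIOS name server: nmbd.",
    "Mar  9 14:02:54 dev-x10sae kernel: [422579.748685] Watchdog detected hard LOCKUP on cpu 5",
]


def stamp(line, year=2015):
    """Parse the leading syslog timestamp; the year is an assumption."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def time_to_lockup(lines,
                   start_marker="Starting SMB/CIFS daemon",
                   lockup_marker="hard LOCKUP"):
    """Return the time from the last matching service start to the first
    lockup report after it, or None if no such pair is found."""
    started = None
    for line in lines:
        if start_marker in line:
            started = stamp(line)
        elif lockup_marker in line and started is not None:
            return stamp(line) - started
    return None


print(time_to_lockup(SAMPLE))
```

For the three lines above this prints 2:42:03, so the lockup was reported
a bit under three hours after smbd was started.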

> 
> For now I will enable CONFIG_FRAME_POINTER and connect a serial header
> (just arrived) in order to have a serial terminal; hopefully this gives
> some more debugging information.

I finally started some tests here as well with your config, but I don't
expect results soon (if at all), given your long times to reproduce
things. I will also make some new patches available soon that target
very specific corner cases in kernel exception handling. However, these
patches will apply to both 3.16 and 3.14, so nothing that could easily
explain your issues.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

