From: Prarit Bhargava <prarit@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>,
	athorlton@sgi.com, CAI Qian <caiqian@redhat.com>
Subject: Re: BUG: tick device NULL pointer during system initialization and shutdown
Date: Thu, 27 Jun 2013 13:04:11 -0400
Message-ID: <51CC708B.7040605@redhat.com>
In-Reply-To: <alpine.DEB.2.02.1306261303260.4013@ionos.tec.linutronix.de>



On 06/26/2013 07:05 AM, Thomas Gleixner wrote:
> On Tue, 25 Jun 2013, Prarit Bhargava wrote:
>> On 06/24/2013 09:57 AM, Thomas Gleixner wrote:
>>> Does the patch below fix it?
>>>
>>
>> Thomas,
>>
>> Thanks for the patch.
>>
>> The reproducibility appears to be quite low.  I'm seeing this roughly 1 time
>> every six hours of continuous system reboots.  I'm testing right now with your
>> patch.  I'll update the thread in a couple of days...
> 
> I have a proper version of that patch now along with an explanation of
> the failure.
> 
> -------------------->
> 
> Subject: tick: Make oneshot broadcast robust vs. CPU offlining
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed, 26 Jun 2013 12:17:32 +0200
> 
> In periodic mode we remove offline cpus from the broadcast propagation
> mask. In oneshot mode we fail to do so. This was not a problem so far,
> but the recent changes to the broadcast propagation introduced a
> constellation which can result in a NULL pointer dereference.
> 
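
To make the changelog above concrete, here is a minimal userspace sketch of the
bookkeeping it describes. It is a toy model under stated assumptions, not the
kernel code and not the patch itself: the type cpu_mask_t, the struct
broadcast_state, and all function names are hypothetical, and the masks are
plain 64-bit integers rather than the kernel's cpumask type. The first offline
helper models the behaviour the changelog calls out (only the periodic mask is
cleaned up); the second models the intended behaviour (both masks are cleaned).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, valid for CPU numbers 0..63 only. */
typedef uint64_t cpu_mask_t;

struct broadcast_state {
	cpu_mask_t periodic_mask; /* CPUs receiving periodic broadcast ticks */
	cpu_mask_t oneshot_mask;  /* CPUs waiting for a oneshot broadcast */
};

static void mask_set(cpu_mask_t *mask, unsigned int cpu)
{
	*mask |= (cpu_mask_t)1 << cpu;
}

static void mask_clear(cpu_mask_t *mask, unsigned int cpu)
{
	*mask &= ~((cpu_mask_t)1 << cpu);
}

static bool mask_test(cpu_mask_t mask, unsigned int cpu)
{
	return (mask >> cpu) & 1;
}

/* Models the problem: the offlined CPU stays set in the oneshot mask. */
static void cpu_offline_periodic_only(struct broadcast_state *s,
				      unsigned int cpu)
{
	mask_clear(&s->periodic_mask, cpu);
}

/* Models the intended behaviour: the offlined CPU leaves both masks. */
static void cpu_offline_both(struct broadcast_state *s, unsigned int cpu)
{
	mask_clear(&s->periodic_mask, cpu);
	mask_clear(&s->oneshot_mask, cpu);
}
```

In this model, a stale bit left in oneshot_mask means a later broadcast would
still target a CPU whose per-CPU tick device is already gone, which is the kind
of situation the changelog says can end in a NULL pointer dereference.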

Unfortunately, with this patch applied the NMI watchdog fires during system
shutdown.  Most of the CPUs are in start_secondary+0x254/0x256.

CPU 0, however, is here:

[  270.579581] NMI backtrace for cpu 0
[  270.583480] CPU: 0 PID: 595 Comm: kworker/0:2 Not tainted 3.10.0-rc4+ #2
[  270.590954] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.T030.072620111404 07/26/2011
[  270.601345] task: ffff880851c50000 ti: ffff880851c72000 task.ti: ffff880851c72000
[  270.609691] RIP: 0010:[<ffffffff8109a8c0>]  [<ffffffff8109a8c0>] update_cfs_shares+0xf0/0xf0
[  270.619126] RSP: 0018:ffff880851c73d78  EFLAGS: 00000086
[  270.625049] RAX: ffffffff81626180 RBX: ffff880851c50048 RCX: 0000000000000000
[  270.633007] RDX: 0000000000000001 RSI: ffff880851c50048 RDI: ffff88085f414670
[  270.640965] RBP: ffff880851c73dc0 R08: 0000003effcc9cfd R09: 0000000000000000
[  270.648923] R10: 0000000000000000 R11: 0000000000000005 R12: ffff88085f414670
[  270.656881] R13: ffff88085f414600 R14: 0000000000000001 R15: 0000000000000001
[  270.664841] FS:  0000000000000000(0000) GS:ffff88085f400000(0000) knlGS:0000000000000000
[  270.673865] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  270.680272] CR2: 00000000000000b8 CR3: 00000000018f8000 CR4: 00000000000007f0
[  270.688229] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  270.696188] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  270.704146] Stack:
[  270.706388]  ffffffff8109b019 ffff88085f414600 ffff88085f414600 0000000000000000
[  270.714684]  ffff88085f414600 ffff88085f414600 0000000000000000 ffff880851c50000
[  270.722981]  ffff8808521ec700 ffff880851c73de8 ffffffff8108ed39 0000000168d36c00
[  270.731276] Call Trace:
[  270.734007]  [<ffffffff8109b019>] ? dequeue_task_fair+0x59/0x640
[  270.740713]  [<ffffffff8108ed39>] dequeue_task+0x79/0xa0
[  270.746638]  [<ffffffff81091be3>] deactivate_task+0x23/0x30
[  270.752857]  [<ffffffff816023f9>] __schedule+0x589/0x7d0
[  270.758782]  [<ffffffff81602669>] schedule+0x29/0x70
[  270.764323]  [<ffffffff8107de03>] worker_thread+0x1c3/0x3a0
[  270.770541]  [<ffffffff8107dc40>] ? rescuer_thread+0x350/0x350
[  270.777041]  [<ffffffff81084300>] kthread+0xc0/0xd0
[  270.782474]  [<ffffffff81084240>] ? insert_kthread_work+0x40/0x40
[  270.789272]  [<ffffffff8160c56c>] ret_from_fork+0x7c/0xb0
[  270.795295]  [<ffffffff81084240>] ? insert_kthread_work+0x40/0x40

and CPU 63 is the one doing the backtrace:

[  272.655049] CPU: 63 PID: 0 Comm: swapper/63 Not tainted 3.10.0-rc4+ #2
[  272.662331] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.T030.072620111404 07/26/2011
[  272.672714] task: ffff880854df4de0 ti: ffff880854e02000 task.ti: ffff880854e02000
[  272.681062] RIP: 0010:[<ffffffff812f3c82>]  [<ffffffff812f3c82>] delay_tsc+0x32/0x80
[  272.689720] RSP: 0018:ffff88106f3c3dd0  EFLAGS: 00000083
[  272.695647] RAX: 000000000000009e RBX: 00000000cea08f3d RCX: 0000000000000001
[  272.703607] RDX: 00000000cea08fdb RSI: 0000000000000050 RDI: 00000000001e7000
[  272.711569] RBP: ffff88106f3c3de8 R08: ffffffff81a02928 R09: 000000000000070e
[  272.719529] R10: 0000000000000000 R11: ffff88106f3c3b46 R12: 00000000001e7000
[  272.727491] R13: 000000000000003f R14: ffff88106f3cec80 R15: ffffffff81949480
[  272.735452] FS:  0000000000000000(0000) GS:ffff88106f3c0000(0000) knlGS:0000000000000000
[  272.744470] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  272.750879] CR2: 00007f114a8f7920 CR3: 0000000c61b5f000 CR4: 00000000000007e0
[  272.758841] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  272.766801] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  272.774759] Stack:
[  272.777001]  0000000000002710 ffffffff81949300 ffffffff81949000 ffff88106f3c3df8
[  272.785303]  ffffffff812f3be8 ffff88106f3c3e10 ffffffff81036faa ffffffff81a02ba0
[  272.793605]  ffff88106f3c3e70 ffffffff810f8060 0000000354df4de0 0000000000000242
[  272.801908] Call Trace:
[  272.804634]  <IRQ>
[  272.806782]  [<ffffffff812f3be8>] __const_udelay+0x28/0x30
[  272.813122]  [<ffffffff81036faa>] arch_trigger_all_cpu_backtrace+0x7a/0xa0
[  272.820799]  [<ffffffff810f8060>] rcu_check_callbacks+0x5b0/0x600
[  272.827603]  [<ffffffff81070217>] update_process_times+0x47/0x80
[  272.834313]  [<ffffffff810b94f5>] tick_sched_handle.isra.15+0x25/0x60
[  272.841500]  [<ffffffff810b9571>] tick_sched_timer+0x41/0x60
[  272.847821]  [<ffffffff81087c74>] __run_hrtimer+0x74/0x1d0
[  272.853943]  [<ffffffff810b9530>] ? tick_sched_handle.isra.15+0x60/0x60
[  272.861325]  [<ffffffff81088457>] hrtimer_interrupt+0xf7/0x240
[  272.867841]  [<ffffffff8160e429>] smp_apic_timer_interrupt+0x69/0x9c
[  272.874933]  [<ffffffff8160d29d>] apic_timer_interrupt+0x6d/0x80
[  272.881634]  <EOI>
[  272.883781]  [<ffffffff810b0432>] ? cpu_startup_entry+0x132/0x230
[  272.890803]  [<ffffffff810b0400>] ? cpu_startup_entry+0x100/0x230
[  272.897605]  [<ffffffff815ed4e8>] start_secondary+0x254/0x256
[  272.904014] Code: 89 e5 41 55 41 54 41 89 fc 53 65 44 8b 2c 25 1c b0 00 00 66 66 90 0f ae e8 e8 5b 46 d2 ff 66 90 89 c3 eb 14 0f 1f 44 00 00 f3 90 <65> 8b 04 25 1c b0 00 00 41 39 c5 75 1d 66 66 90 0f ae e8 e8 36

P.

