All of lore.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jet Chen <jet.chen@intel.com>, Su Tao <tao.su@intel.com>,
	Yuanhan Liu <yuanhan.liu@intel.com>, LKP <lkp@01.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [nohz] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/common.c:1430 warn_pre_alternatives()
Date: Fri, 12 Sep 2014 14:35:32 +0200	[thread overview]
Message-ID: <20140912123530.GA5824@lerouge> (raw)
In-Reply-To: <20140911211802.GA18626@localhost>

On Fri, Sep 12, 2014 at 05:18:02AM +0800, Fengguang Wu wrote:
> Hi Frederic,
> 
> On Thu, Sep 11, 2014 at 05:53:19PM +0200, Frederic Weisbecker wrote:
> > Hi,
> > 
> > On Thu, Sep 11, 2014 at 03:17:04PM +0800, Fengguang Wu wrote:
> > > Greetings,
> > > 
> > > 0day kernel testing robot got the below dmesg and the first bad commit is
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git nohz/fixes-v2
> > > 
> > > commit 664c05b9ab2dd92aad807bd7f9aa273955949abe
> > > Author:     Frederic Weisbecker <fweisbec@gmail.com>
> > > AuthorDate: Sun Aug 17 22:02:55 2014 +0200
> > > Commit:     Frederic Weisbecker <fweisbec@gmail.com>
> > > CommitDate: Wed Sep 10 18:52:45 2014 +0200
> > > 
> > >     nohz: Consolidate nohz full init code
> > >     
> > >     The supports for CONFIG_NO_HZ_FULL_ALL=y and the nohz_full= kernel
> > >     parameter both have their own way to do the same thing: allocate
> > >     full dynticks cpumasks, fill them and initialize some state variables.
> > >     
> > >     Lets consolidate that all in the same place.
> > >     
> > >     While at it, convert some regular printk message to warnings when
> > >     fundamental allocations fail.
> > >     
> > >     Cc: Ingo Molnar <mingo@kernel.org>
> > >     Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > >     Cc: Peter Zijlstra <peterz@infradead.org>
> > >     Cc: Thomas Gleixner <tglx@linutronix.de>
> > >     Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > 
> > > +-----------------------------------------------------------------+------------+------------+------------+
> > > |                                                                 | ed1fb75de3 | 664c05b9ab | dd99eb9cdf |
> > > +-----------------------------------------------------------------+------------+------------+------------+
> > > | boot_successes                                                  | 60         | 0          | 0          |
> > > | boot_failures                                                   | 0          | 20         | 23         |
> > 
> > So you tested ed1fb75de3 and 664c05b9ab with the same config that is in attachement, right?
> 
> Yes.
> 
> > I can't find what dd99eb9cdf refers to though.
> 
> It's an internal merge-and-test branch head.
> 
> > One more thing below:
> > 
> > > | WARNING:at_arch/x86/kernel/cpu/common.c:warn_pre_alternatives() | 0          | 20         | 22         |
> > > | BUG:kernel_boot_hang                                            | 0          | 20         | 22         |
> > > | backtrace:setup_default_timer_irq                               | 0          | 20         | 22         |
> > > | backtrace:hpet_time_init                                        | 0          | 20         | 22         |
> > > | backtrace:x86_late_time_init                                    | 0          | 20         | 22         |
> > > | WARNING:CPU:PID:at/kbuild/                                      | 0          | 1          |            |
> > > | WARNING:CPU:PID:at_arch/x86/k                                   | 0          | 1          |            |
> > > | WARNING:CPU:PID:at_arch/x86/kernel/cpu/common.c:warn_pre_al     | 0          | 0          | 1          |
> > > | BUG:kernel_boot_crashed                                         | 0          | 0          | 1          |
> > > +-----------------------------------------------------------------+------------+------------+------------+
> > > 
> > > [    0.000000] ----------------------------------------------------
> > > [    0.000000] hpet clockevent registered
> > > [    0.000000] ------------[ cut here ]------------
> > > [    0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/common.c:1430 warn_pre_alternatives+0x1e/0x20()
> > > [    0.000000] You're using static_cpu_has before alternatives have run!
> > > [    0.000000] Modules linked in:
> > > [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0-rc3-00007-g664c05b #840
> > > [    0.000000]  0000000000000009 ffff880012603d80 ffffffff89dfe8c6 ffff880012603dc8
> > > [    0.000000]  ffff880012603db8 ffffffff89a81df3 ffffffff8a414580 0000000000000000
> > > [    0.000000]  0000000000000000 0000000000000000 0000000000000000 ffff880012603e18
> > > [    0.000000] Call Trace:
> > > [    0.000000]  <IRQ>  [<ffffffff89dfe8c6>] dump_stack+0x4d/0x66
> > > [    0.000000]  [<ffffffff89a81df3>] warn_slowpath_common+0x73/0x90
> > > [    0.000000]  [<ffffffff89a81e57>] warn_slowpath_fmt+0x47/0x50
> > > [    0.000000]  [<ffffffff89acaadf>] ? rcu_check_callbacks+0x30f/0x7c0
> > > [    0.000000]  [<ffffffff89a1396e>] warn_pre_alternatives+0x1e/0x20
> > > [    0.000000]  [<ffffffff89aeddd5>] irq_work_tick+0x15/0x50
> > > [    0.000000]  [<ffffffff89acfae6>] update_process_times+0x56/0x70
> > > [    0.000000]  [<ffffffff89adae2b>] tick_periodic+0x2b/0xc0
> > > [    0.000000]  [<ffffffff89adb0d1>] ? tick_handle_periodic+0x21/0x60
> > > [    0.000000]  [<ffffffff89adb0d1>] tick_handle_periodic+0x21/0x60
> > > [    0.000000]  [<ffffffff89a05500>] timer_interrupt+0x10/0x20
> > > [    0.000000]  [<ffffffff89ac1ef4>] handle_irq_event_percpu+0x64/0x110
> > > [    0.000000]  [<ffffffff89ac1fd8>] handle_irq_event+0x38/0x60
> > > [    0.000000]  [<ffffffff89ac4c24>] handle_level_irq+0x74/0x110
> > > [    0.000000]  [<ffffffff89a04cf8>] handle_irq+0x58/0x150
> > > [    0.000000]  [<ffffffff89aa8e89>] ? vtime_account_idle+0x9/0x50
> > > [    0.000000]  [<ffffffff89aa8f0d>] ? vtime_common_account_irq_enter+0x3d/0x50
> > > [    0.000000]  [<ffffffff89a04598>] do_IRQ+0x48/0xe0
> > > [    0.000000]  [<ffffffff89e063ed>] common_interrupt+0x6d/0x6d
> > > [    0.000000]  <EOI>  [<ffffffff89a2f0a6>] ? native_restore_fl+0x6/0x10
> > > [    0.000000]  [<ffffffff89e04ea1>] _raw_spin_unlock_irqrestore+0x31/0x40
> > > [    0.000000]  [<ffffffff89ac3845>] __setup_irq+0x255/0x580
> > > [    0.000000]  [<ffffffff89ac3bac>] setup_irq+0x3c/0xa0
> > > [    0.000000]  [<ffffffff8a529998>] ? hpet_enable+0x29a/0x2cb
> > > [    0.000000]  [<ffffffff8a51cb9b>] setup_default_timer_irq+0x1e/0x20
> > > [    0.000000]  [<ffffffff8a51cbb4>] hpet_time_init+0x17/0x19
> > > [    0.000000]  [<ffffffff8a51cb76>] x86_late_time_init+0xa/0x11
> > > [    0.000000]  [<ffffffff8a51ad8c>] start_kernel+0x383/0x3f8
> > > [    0.000000]  [<ffffffff8a51a120>] ? early_idt_handlers+0x120/0x120
> > > [    0.000000]  [<ffffffff8a51a432>] x86_64_start_reservations+0x2a/0x2c
> > > [    0.000000]  [<ffffffff8a51a50b>] x86_64_start_kernel+0xd7/0xe4
> > 
> > So the issue here seems to be that something enabled irqs on boot way too early.
> > But I tried hard to reproduce that crash without success. So I wanted to ask you,
> > do you mind if I send you a patch to debug this?
> 
> Never mind. Feel free to send me the patch.

Ah wait, I can finally reproduce. I just forgot to enable CONFIG_X86_DEBUG_STATIC_CPU_HAS.

I feel really relieved because otherwise debugging by email is often hard and long.

Thanks a lot Wu! 0-day is, as usual, extremely useful. All these years, Ingo
has sent me a lot of bug report right after he pulled my branches. But now 0-day
catches all the bugs before I do my pull requests. It spares a lot of time for everyone.
I can only hope that intel will continue to support you and your work for a long
while.

WARNING: multiple messages have this Message-ID (diff)
From: Frederic Weisbecker <fweisbec@gmail.com>
To: lkp@lists.01.org
Subject: Re: [nohz] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/common.c:1430 warn_pre_alternatives()
Date: Fri, 12 Sep 2014 14:35:32 +0200	[thread overview]
Message-ID: <20140912123530.GA5824@lerouge> (raw)
In-Reply-To: <20140911211802.GA18626@localhost>

[-- Attachment #1: Type: text/plain, Size: 7409 bytes --]

On Fri, Sep 12, 2014 at 05:18:02AM +0800, Fengguang Wu wrote:
> Hi Frederic,
> 
> On Thu, Sep 11, 2014 at 05:53:19PM +0200, Frederic Weisbecker wrote:
> > Hi,
> > 
> > On Thu, Sep 11, 2014 at 03:17:04PM +0800, Fengguang Wu wrote:
> > > Greetings,
> > > 
> > > 0day kernel testing robot got the below dmesg and the first bad commit is
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git nohz/fixes-v2
> > > 
> > > commit 664c05b9ab2dd92aad807bd7f9aa273955949abe
> > > Author:     Frederic Weisbecker <fweisbec@gmail.com>
> > > AuthorDate: Sun Aug 17 22:02:55 2014 +0200
> > > Commit:     Frederic Weisbecker <fweisbec@gmail.com>
> > > CommitDate: Wed Sep 10 18:52:45 2014 +0200
> > > 
> > >     nohz: Consolidate nohz full init code
> > >     
> > >     The supports for CONFIG_NO_HZ_FULL_ALL=y and the nohz_full= kernel
> > >     parameter both have their own way to do the same thing: allocate
> > >     full dynticks cpumasks, fill them and initialize some state variables.
> > >     
> > >     Lets consolidate that all in the same place.
> > >     
> > >     While at it, convert some regular printk message to warnings when
> > >     fundamental allocations fail.
> > >     
> > >     Cc: Ingo Molnar <mingo@kernel.org>
> > >     Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > >     Cc: Peter Zijlstra <peterz@infradead.org>
> > >     Cc: Thomas Gleixner <tglx@linutronix.de>
> > >     Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > 
> > > +-----------------------------------------------------------------+------------+------------+------------+
> > > |                                                                 | ed1fb75de3 | 664c05b9ab | dd99eb9cdf |
> > > +-----------------------------------------------------------------+------------+------------+------------+
> > > | boot_successes                                                  | 60         | 0          | 0          |
> > > | boot_failures                                                   | 0          | 20         | 23         |
> > 
> > So you tested ed1fb75de3 and 664c05b9ab with the same config that is in attachement, right?
> 
> Yes.
> 
> > I can't find what dd99eb9cdf refers to though.
> 
> It's an internal merge-and-test branch head.
> 
> > One more thing below:
> > 
> > > | WARNING:at_arch/x86/kernel/cpu/common.c:warn_pre_alternatives() | 0          | 20         | 22         |
> > > | BUG:kernel_boot_hang                                            | 0          | 20         | 22         |
> > > | backtrace:setup_default_timer_irq                               | 0          | 20         | 22         |
> > > | backtrace:hpet_time_init                                        | 0          | 20         | 22         |
> > > | backtrace:x86_late_time_init                                    | 0          | 20         | 22         |
> > > | WARNING:CPU:PID:at/kbuild/                                      | 0          | 1          |            |
> > > | WARNING:CPU:PID:at_arch/x86/k                                   | 0          | 1          |            |
> > > | WARNING:CPU:PID:at_arch/x86/kernel/cpu/common.c:warn_pre_al     | 0          | 0          | 1          |
> > > | BUG:kernel_boot_crashed                                         | 0          | 0          | 1          |
> > > +-----------------------------------------------------------------+------------+------------+------------+
> > > 
> > > [    0.000000] ----------------------------------------------------
> > > [    0.000000] hpet clockevent registered
> > > [    0.000000] ------------[ cut here ]------------
> > > [    0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/common.c:1430 warn_pre_alternatives+0x1e/0x20()
> > > [    0.000000] You're using static_cpu_has before alternatives have run!
> > > [    0.000000] Modules linked in:
> > > [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0-rc3-00007-g664c05b #840
> > > [    0.000000]  0000000000000009 ffff880012603d80 ffffffff89dfe8c6 ffff880012603dc8
> > > [    0.000000]  ffff880012603db8 ffffffff89a81df3 ffffffff8a414580 0000000000000000
> > > [    0.000000]  0000000000000000 0000000000000000 0000000000000000 ffff880012603e18
> > > [    0.000000] Call Trace:
> > > [    0.000000]  <IRQ>  [<ffffffff89dfe8c6>] dump_stack+0x4d/0x66
> > > [    0.000000]  [<ffffffff89a81df3>] warn_slowpath_common+0x73/0x90
> > > [    0.000000]  [<ffffffff89a81e57>] warn_slowpath_fmt+0x47/0x50
> > > [    0.000000]  [<ffffffff89acaadf>] ? rcu_check_callbacks+0x30f/0x7c0
> > > [    0.000000]  [<ffffffff89a1396e>] warn_pre_alternatives+0x1e/0x20
> > > [    0.000000]  [<ffffffff89aeddd5>] irq_work_tick+0x15/0x50
> > > [    0.000000]  [<ffffffff89acfae6>] update_process_times+0x56/0x70
> > > [    0.000000]  [<ffffffff89adae2b>] tick_periodic+0x2b/0xc0
> > > [    0.000000]  [<ffffffff89adb0d1>] ? tick_handle_periodic+0x21/0x60
> > > [    0.000000]  [<ffffffff89adb0d1>] tick_handle_periodic+0x21/0x60
> > > [    0.000000]  [<ffffffff89a05500>] timer_interrupt+0x10/0x20
> > > [    0.000000]  [<ffffffff89ac1ef4>] handle_irq_event_percpu+0x64/0x110
> > > [    0.000000]  [<ffffffff89ac1fd8>] handle_irq_event+0x38/0x60
> > > [    0.000000]  [<ffffffff89ac4c24>] handle_level_irq+0x74/0x110
> > > [    0.000000]  [<ffffffff89a04cf8>] handle_irq+0x58/0x150
> > > [    0.000000]  [<ffffffff89aa8e89>] ? vtime_account_idle+0x9/0x50
> > > [    0.000000]  [<ffffffff89aa8f0d>] ? vtime_common_account_irq_enter+0x3d/0x50
> > > [    0.000000]  [<ffffffff89a04598>] do_IRQ+0x48/0xe0
> > > [    0.000000]  [<ffffffff89e063ed>] common_interrupt+0x6d/0x6d
> > > [    0.000000]  <EOI>  [<ffffffff89a2f0a6>] ? native_restore_fl+0x6/0x10
> > > [    0.000000]  [<ffffffff89e04ea1>] _raw_spin_unlock_irqrestore+0x31/0x40
> > > [    0.000000]  [<ffffffff89ac3845>] __setup_irq+0x255/0x580
> > > [    0.000000]  [<ffffffff89ac3bac>] setup_irq+0x3c/0xa0
> > > [    0.000000]  [<ffffffff8a529998>] ? hpet_enable+0x29a/0x2cb
> > > [    0.000000]  [<ffffffff8a51cb9b>] setup_default_timer_irq+0x1e/0x20
> > > [    0.000000]  [<ffffffff8a51cbb4>] hpet_time_init+0x17/0x19
> > > [    0.000000]  [<ffffffff8a51cb76>] x86_late_time_init+0xa/0x11
> > > [    0.000000]  [<ffffffff8a51ad8c>] start_kernel+0x383/0x3f8
> > > [    0.000000]  [<ffffffff8a51a120>] ? early_idt_handlers+0x120/0x120
> > > [    0.000000]  [<ffffffff8a51a432>] x86_64_start_reservations+0x2a/0x2c
> > > [    0.000000]  [<ffffffff8a51a50b>] x86_64_start_kernel+0xd7/0xe4
> > 
> > So the issue here seems to be that something enabled irqs on boot way too early.
> > But I tried hard to reproduce that crash without success. So I wanted to ask you,
> > do you mind if I send you a patch to debug this?
> 
> Never mind. Feel free to send me the patch.

Ah wait, I can finally reproduce. I just forgot to enable CONFIG_X86_DEBUG_STATIC_CPU_HAS.

I feel really relieved because otherwise debugging by email is often hard and long.

Thanks a lot Wu! 0-day is, as usual, extremely useful. All these years, Ingo
has sent me a lot of bug report right after he pulled my branches. But now 0-day
catches all the bugs before I do my pull requests. It spares a lot of time for everyone.
I can only hope that intel will continue to support you and your work for a long
while.

  reply	other threads:[~2014-09-12 12:35 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-11  7:17 [nohz] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/common.c:1430 warn_pre_alternatives() Fengguang Wu
2014-09-11 15:53 ` Frederic Weisbecker
2014-09-11 15:53   ` Frederic Weisbecker
2014-09-11 21:18   ` Fengguang Wu
2014-09-11 21:18     ` Fengguang Wu
2014-09-12 12:35     ` Frederic Weisbecker [this message]
2014-09-12 12:35       ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140912123530.GA5824@lerouge \
    --to=fweisbec@gmail.com \
    --cc=fengguang.wu@intel.com \
    --cc=jet.chen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@01.org \
    --cc=tao.su@intel.com \
    --cc=yuanhan.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.