From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757196AbaIKPxk (ORCPT ); Thu, 11 Sep 2014 11:53:40 -0400 Received: from mail-we0-f169.google.com ([74.125.82.169]:64266 "EHLO mail-we0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751146AbaIKPxh (ORCPT ); Thu, 11 Sep 2014 11:53:37 -0400 Date: Thu, 11 Sep 2014 17:53:19 +0200 From: Frederic Weisbecker To: Fengguang Wu Cc: Jet Chen , Su Tao , Yuanhan Liu , LKP , linux-kernel@vger.kernel.org Subject: Re: [nohz] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/common.c:1430 warn_pre_alternatives() Message-ID: <20140911155316.GA2524@lerouge> References: <20140911071704.GA4026@localhost> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140911071704.GA4026@localhost> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, On Thu, Sep 11, 2014 at 03:17:04PM +0800, Fengguang Wu wrote: > Greetings, > > 0day kernel testing robot got the below dmesg and the first bad commit is > > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git nohz/fixes-v2 > > commit 664c05b9ab2dd92aad807bd7f9aa273955949abe > Author: Frederic Weisbecker > AuthorDate: Sun Aug 17 22:02:55 2014 +0200 > Commit: Frederic Weisbecker > CommitDate: Wed Sep 10 18:52:45 2014 +0200 > > nohz: Consolidate nohz full init code > > The supports for CONFIG_NO_HZ_FULL_ALL=y and the nohz_full= kernel > parameter both have their own way to do the same thing: allocate > full dynticks cpumasks, fill them and initialize some state variables. > > Lets consolidate that all in the same place. > > While at it, convert some regular printk message to warnings when > fundamental allocations fail. > > Cc: Ingo Molnar > Cc: Paul E. McKenney > Cc: Peter Zijlstra > Cc: Thomas Gleixner > Signed-off-by: Frederic Weisbecker > > +-----------------------------------------------------------------+------------+------------+------------+ > | | ed1fb75de3 | 664c05b9ab | dd99eb9cdf | > +-----------------------------------------------------------------+------------+------------+------------+ > | boot_successes | 60 | 0 | 0 | > | boot_failures | 0 | 20 | 23 | So you tested ed1fb75de3 and 664c05b9ab with the same config that is in attachement, right? I can't find what dd99eb9cdf refers to though. One more thing below: > | WARNING:at_arch/x86/kernel/cpu/common.c:warn_pre_alternatives() | 0 | 20 | 22 | > | BUG:kernel_boot_hang | 0 | 20 | 22 | > | backtrace:setup_default_timer_irq | 0 | 20 | 22 | > | backtrace:hpet_time_init | 0 | 20 | 22 | > | backtrace:x86_late_time_init | 0 | 20 | 22 | > | WARNING:CPU:PID:at/kbuild/ | 0 | 1 | | > | WARNING:CPU:PID:at_arch/x86/k | 0 | 1 | | > | WARNING:CPU:PID:at_arch/x86/kernel/cpu/common.c:warn_pre_al | 0 | 0 | 1 | > | BUG:kernel_boot_crashed | 0 | 0 | 1 | > +-----------------------------------------------------------------+------------+------------+------------+ > > [ 0.000000] ---------------------------------------------------- > [ 0.000000] hpet clockevent registered > [ 0.000000] ------------[ cut here ]------------ > [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/common.c:1430 warn_pre_alternatives+0x1e/0x20() > [ 0.000000] You're using static_cpu_has before alternatives have run! > [ 0.000000] Modules linked in: > [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0-rc3-00007-g664c05b #840 > [ 0.000000] 0000000000000009 ffff880012603d80 ffffffff89dfe8c6 ffff880012603dc8 > [ 0.000000] ffff880012603db8 ffffffff89a81df3 ffffffff8a414580 0000000000000000 > [ 0.000000] 0000000000000000 0000000000000000 0000000000000000 ffff880012603e18 > [ 0.000000] Call Trace: > [ 0.000000] [] dump_stack+0x4d/0x66 > [ 0.000000] [] warn_slowpath_common+0x73/0x90 > [ 0.000000] [] warn_slowpath_fmt+0x47/0x50 > [ 0.000000] [] ? rcu_check_callbacks+0x30f/0x7c0 > [ 0.000000] [] warn_pre_alternatives+0x1e/0x20 > [ 0.000000] [] irq_work_tick+0x15/0x50 > [ 0.000000] [] update_process_times+0x56/0x70 > [ 0.000000] [] tick_periodic+0x2b/0xc0 > [ 0.000000] [] ? tick_handle_periodic+0x21/0x60 > [ 0.000000] [] tick_handle_periodic+0x21/0x60 > [ 0.000000] [] timer_interrupt+0x10/0x20 > [ 0.000000] [] handle_irq_event_percpu+0x64/0x110 > [ 0.000000] [] handle_irq_event+0x38/0x60 > [ 0.000000] [] handle_level_irq+0x74/0x110 > [ 0.000000] [] handle_irq+0x58/0x150 > [ 0.000000] [] ? vtime_account_idle+0x9/0x50 > [ 0.000000] [] ? vtime_common_account_irq_enter+0x3d/0x50 > [ 0.000000] [] do_IRQ+0x48/0xe0 > [ 0.000000] [] common_interrupt+0x6d/0x6d > [ 0.000000] [] ? native_restore_fl+0x6/0x10 > [ 0.000000] [] _raw_spin_unlock_irqrestore+0x31/0x40 > [ 0.000000] [] __setup_irq+0x255/0x580 > [ 0.000000] [] setup_irq+0x3c/0xa0 > [ 0.000000] [] ? hpet_enable+0x29a/0x2cb > [ 0.000000] [] setup_default_timer_irq+0x1e/0x20 > [ 0.000000] [] hpet_time_init+0x17/0x19 > [ 0.000000] [] x86_late_time_init+0xa/0x11 > [ 0.000000] [] start_kernel+0x383/0x3f8 > [ 0.000000] [] ? early_idt_handlers+0x120/0x120 > [ 0.000000] [] x86_64_start_reservations+0x2a/0x2c > [ 0.000000] [] x86_64_start_kernel+0xd7/0xe4 So the issue here seems to be that something enabled irqs on boot way too early. But I tried hard to reproduce that crash without success. So I wanted to ask you, do you mind if I send you a patch to debug this? From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/mixed; boundary="===============1826083132995818521==" MIME-Version: 1.0 From: Frederic Weisbecker To: lkp@lists.01.org Subject: Re: [nohz] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/common.c:1430 warn_pre_alternatives() Date: Thu, 11 Sep 2014 17:53:19 +0200 Message-ID: <20140911155316.GA2524@lerouge> In-Reply-To: <20140911071704.GA4026@localhost> List-Id: --===============1826083132995818521== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Hi, On Thu, Sep 11, 2014 at 03:17:04PM +0800, Fengguang Wu wrote: > Greetings, > = > 0day kernel testing robot got the below dmesg and the first bad commit is > = > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git= nohz/fixes-v2 > = > commit 664c05b9ab2dd92aad807bd7f9aa273955949abe > Author: Frederic Weisbecker > AuthorDate: Sun Aug 17 22:02:55 2014 +0200 > Commit: Frederic Weisbecker > CommitDate: Wed Sep 10 18:52:45 2014 +0200 > = > nohz: Consolidate nohz full init code > = > The supports for CONFIG_NO_HZ_FULL_ALL=3Dy and the nohz_full=3D kernel > parameter both have their own way to do the same thing: allocate > full dynticks cpumasks, fill them and initialize some state variables. > = > Lets consolidate that all in the same place. > = > While at it, convert some regular printk message to warnings when > fundamental allocations fail. > = > Cc: Ingo Molnar > Cc: Paul E. McKenney > Cc: Peter Zijlstra > Cc: Thomas Gleixner > Signed-off-by: Frederic Weisbecker > = > +-----------------------------------------------------------------+------= ------+------------+------------+ > | | ed1fb= 75de3 | 664c05b9ab | dd99eb9cdf | > +-----------------------------------------------------------------+------= ------+------------+------------+ > | boot_successes | 60 = | 0 | 0 | > | boot_failures | 0 = | 20 | 23 | So you tested ed1fb75de3 and 664c05b9ab with the same config that is in att= achement, right? I can't find what dd99eb9cdf refers to though. One more thing below: > | WARNING:at_arch/x86/kernel/cpu/common.c:warn_pre_alternatives() | 0 = | 20 | 22 | > | BUG:kernel_boot_hang | 0 = | 20 | 22 | > | backtrace:setup_default_timer_irq | 0 = | 20 | 22 | > | backtrace:hpet_time_init | 0 = | 20 | 22 | > | backtrace:x86_late_time_init | 0 = | 20 | 22 | > | WARNING:CPU:PID:at/kbuild/ | 0 = | 1 | | > | WARNING:CPU:PID:at_arch/x86/k | 0 = | 1 | | > | WARNING:CPU:PID:at_arch/x86/kernel/cpu/common.c:warn_pre_al | 0 = | 0 | 1 | > | BUG:kernel_boot_crashed | 0 = | 0 | 1 | > +-----------------------------------------------------------------+------= ------+------------+------------+ > = > [ 0.000000] ---------------------------------------------------- > [ 0.000000] hpet clockevent registered > [ 0.000000] ------------[ cut here ]------------ > [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/common.c:143= 0 warn_pre_alternatives+0x1e/0x20() > [ 0.000000] You're using static_cpu_has before alternatives have run! > [ 0.000000] Modules linked in: > [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0-rc3-00007= -g664c05b #840 > [ 0.000000] 0000000000000009 ffff880012603d80 ffffffff89dfe8c6 ffff88= 0012603dc8 > [ 0.000000] ffff880012603db8 ffffffff89a81df3 ffffffff8a414580 000000= 0000000000 > [ 0.000000] 0000000000000000 0000000000000000 0000000000000000 ffff88= 0012603e18 > [ 0.000000] Call Trace: > [ 0.000000] [] dump_stack+0x4d/0x66 > [ 0.000000] [] warn_slowpath_common+0x73/0x90 > [ 0.000000] [] warn_slowpath_fmt+0x47/0x50 > [ 0.000000] [] ? rcu_check_callbacks+0x30f/0x7c0 > [ 0.000000] [] warn_pre_alternatives+0x1e/0x20 > [ 0.000000] [] irq_work_tick+0x15/0x50 > [ 0.000000] [] update_process_times+0x56/0x70 > [ 0.000000] [] tick_periodic+0x2b/0xc0 > [ 0.000000] [] ? tick_handle_periodic+0x21/0x60 > [ 0.000000] [] tick_handle_periodic+0x21/0x60 > [ 0.000000] [] timer_interrupt+0x10/0x20 > [ 0.000000] [] handle_irq_event_percpu+0x64/0x110 > [ 0.000000] [] handle_irq_event+0x38/0x60 > [ 0.000000] [] handle_level_irq+0x74/0x110 > [ 0.000000] [] handle_irq+0x58/0x150 > [ 0.000000] [] ? vtime_account_idle+0x9/0x50 > [ 0.000000] [] ? vtime_common_account_irq_enter+0x3= d/0x50 > [ 0.000000] [] do_IRQ+0x48/0xe0 > [ 0.000000] [] common_interrupt+0x6d/0x6d > [ 0.000000] [] ? native_restore_fl+0x6/0x10 > [ 0.000000] [] _raw_spin_unlock_irqrestore+0x31/0x40 > [ 0.000000] [] __setup_irq+0x255/0x580 > [ 0.000000] [] setup_irq+0x3c/0xa0 > [ 0.000000] [] ? hpet_enable+0x29a/0x2cb > [ 0.000000] [] setup_default_timer_irq+0x1e/0x20 > [ 0.000000] [] hpet_time_init+0x17/0x19 > [ 0.000000] [] x86_late_time_init+0xa/0x11 > [ 0.000000] [] start_kernel+0x383/0x3f8 > [ 0.000000] [] ? early_idt_handlers+0x120/0x120 > [ 0.000000] [] x86_64_start_reservations+0x2a/0x2c > [ 0.000000] [] x86_64_start_kernel+0xd7/0xe4 So the issue here seems to be that something enabled irqs on boot way too e= arly. But I tried hard to reproduce that crash without success. So I wanted to as= k you, do you mind if I send you a patch to debug this? --===============1826083132995818521==--