3.[34].x Reproducable [Firmware bug] message upon warm boot only

linux-kernel.vger.kernel.org archive mirror

* 3.[34].x Reproducable [Firmware bug] message upon warm boot only
       [not found] <CAKgfjTktMLc5bfqhCOKy0evssxggofNQqLT-gQqBkHcBvOgF0g@mail.gmail.com>
@ 2012-06-07 13:40 ` Rus
  2012-06-07 14:24   ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Rus @ 2012-06-07 13:40 UTC (permalink / raw)
  To: linux-kernel

Hello,

I get the following reproducible messages on every warm boot of 3.4.1:

....
[Firmware Bug]: cpu 0, invalid threshold interrupt offset 0 for bank 4, block 0 (MSR00000413=0xc010000001000000)
[Firmware Bug]: cpu 0, invalid threshold interrupt offset 0 for bank 4, block 1 (MSRC0000408=0xc010000001000000)
[Firmware Bug]: cpu 0, invalid threshold interrupt offset 0 for bank 4, block 2 (MSRC0000409=0xc01001c001000000)
.....
[Firmware Bug]: cpu 7, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu
[Firmware Bug]: cpu 7, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100)
Failed to setup IBS, -22
.....

A cold boot does not show these messages. The hardware is an Asus M5A97 PRO
(latest BIOS, 1208) with an FX-8150 CPU and 24GB RAM; the only system on
this box is Linux.

Is this message harmless, or does it point to a problem with the
hardware/BIOS/Linux?

I can supply any additional info.

TIA, Rus

--
SfinxSoft
http://sfinxsoft.com


* Re: 3.[34].x Reproducable [Firmware bug] message upon warm boot only
  2012-06-07 13:40 ` 3.[34].x Reproducable [Firmware bug] message upon warm boot only Rus
@ 2012-06-07 14:24   ` Borislav Petkov
  2012-06-07 14:51     ` Rus
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2012-06-07 14:24 UTC (permalink / raw)
  To: Rus; +Cc: linux-kernel

On Thu, Jun 07, 2012 at 04:40:01PM +0300, Rus wrote:
> Hello,
> 
> I get the following reproducible messages on every warm boot of 3.4.1:
> 
> ....
> [Firmware Bug]: cpu 0, invalid threshold interrupt offset 0 for bank 4, block 0 (MSR00000413=0xc010000001000000)
> [Firmware Bug]: cpu 0, invalid threshold interrupt offset 0 for bank 4, block 1 (MSRC0000408=0xc010000001000000)
> [Firmware Bug]: cpu 0, invalid threshold interrupt offset 0 for bank 4, block 2 (MSRC0000409=0xc01001c001000000)
> .....
> [Firmware Bug]: cpu 7, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu
> [Firmware Bug]: cpu 7, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100)
> Failed to setup IBS, -22
> .....
> 
> A cold boot does not show these messages. The hardware is an Asus M5A97 PRO
> (latest BIOS, 1208) with an FX-8150 CPU and 24GB RAM; the only system on
> this box is Linux.

Can you send /proc/cpuinfo from that box?

> Is this message harmless, or does it point to a problem with the
> hardware/BIOS/Linux?

Mostly harmless, fix is already upstream: f227d306cf3 and also on its
way to stable. You can try 3.5-rc1 on the box (it should be pretty
stable on AMD) or wait for the stable backport.
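
For anyone reading the log lines above, here is a rough sketch of how the
MSR values in the messages can be picked apart. The masks are assumptions
modeled on the threshold and IBS setup code in the kernel tree (LVT offset
in bits 23:20 of the high word of a threshold block MSR, and a valid bit
plus a 4-bit offset in IBSCTL); the function names are made up for
illustration and are not kernel API.

```c
#include <stdint.h>

/* Assumed field layout; treat these constants as assumptions. */
#define THR_VALID_HI      0x80000000u  /* threshold block valid */
#define THR_LVTOFF_HI     0x00F00000u  /* LVT offset, bits 23:20 of the high word */
#define IBSCTL_LVT_VALID  (1ull << 8)  /* IBS LVT offset valid */
#define IBSCTL_LVT_MASK   0x0Full      /* IBS LVT offset, bits 3:0 */

/* LVT offset stored in a threshold block MSR value. */
unsigned int thr_lvt_offset(uint64_t msr)
{
    uint32_t hi = (uint32_t)(msr >> 32);

    return (hi & THR_LVTOFF_HI) >> 20;
}

/* Non-zero when the valid bit of a threshold block MSR value is set. */
int thr_block_valid(uint64_t msr)
{
    uint32_t hi = (uint32_t)(msr >> 32);

    return (hi & THR_VALID_HI) != 0;
}

/* IBS LVT offset from an IBSCTL value, or -1 when the valid bit is clear. */
int ibs_lvt_offset(uint64_t ibsctl)
{
    if (!(ibsctl & IBSCTL_LVT_VALID))
        return -1;
    return (int)(ibsctl & IBSCTL_LVT_MASK);
}
```

Under these assumptions, thr_lvt_offset(0xc010000001000000) gives 1 while
the message reports offset 0, and ibs_lvt_offset(0x100) gives 0, i.e. the
same LVT offset that the APIC500 message says is already taken by vector
0xf9. That would be consistent with firmware leaving different offsets
behind on a warm boot, but it is only a reading of the numbers.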

Thanks.

-- 
Regards/Gruss,
Boris.


* Re: 3.[34].x Reproducable [Firmware bug] message upon warm boot only
  2012-06-07 14:24   ` Borislav Petkov
@ 2012-06-07 14:51     ` Rus
  2012-06-07 15:13       ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Rus @ 2012-06-07 14:51 UTC (permalink / raw)
  To: Borislav Petkov, linux-kernel

Hello,

>> A cold boot does not show these messages. The hardware is an Asus M5A97 PRO
>> (latest BIOS, 1208) with an FX-8150 CPU and 24GB RAM; the only system on
>> this box is Linux.
>
> Can you send /proc/cpuinfo from that box?
>

The /proc/cpuinfo :

.........
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 1
model name	: AMD FX(tm)-8150 Eight-Core Processor
stepping	: 2
microcode	: 0x6000626
cpu MHz		: 3600.000
cache size	: 2048 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 16
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid
aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes
xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr
topoext perfctr_core arat cpb hw_pstate npt lbrv svm_lock nrip_save
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bogomips	: 7224.00
TLB size	: 1536 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb
.....

>> Is this message harmless, or does it point to a problem with the
>> hardware/BIOS/Linux?
>
> Mostly harmless, fix is already upstream: f227d306cf3 and also on its
> way to stable. You can try 3.5-rc1 on the box (it should be pretty
> stable on AMD) or wait for the stable backport.

Ok, compiling ... will report soon. The reason I asked: this
particular brand-new box freezes once every 2-3 days without any
output - no oops, no sysrq, nothing. So I'm trying to find out what
the problem is. The previous kernels (3.3.x) had a serious problem
with the IOMMU - the onboard USB3.0 and Ethernet did not work at all
on them. 3.4.x is better, but it still seems to freeze.

Thanks

>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.

-- 
SfinxSoft
http://sfinxsoft.com


* Re: 3.[34].x Reproducable [Firmware bug] message upon warm boot only
  2012-06-07 14:51     ` Rus
@ 2012-06-07 15:13       ` Borislav Petkov
  2012-06-07 19:02         ` Rus
  2012-06-07 21:00         ` 3.[34].x Reproducable [Firmware bug] message upon warm boot only Rus
  0 siblings, 2 replies; 12+ messages in thread
From: Borislav Petkov @ 2012-06-07 15:13 UTC (permalink / raw)
  To: Rus; +Cc: Andreas Herrmann, linux-kernel

On Thu, Jun 07, 2012 at 05:51:51PM +0300, Rus wrote:
> .........
> processor	: 0
> vendor_id	: AuthenticAMD
> cpu family	: 21
> model		: 1
> model name	: AMD FX(tm)-8150 Eight-Core Processor
> stepping	: 2

Ok, family F15h, as expected.
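
As a side note on how "cpu family : 21" maps to F15h: a minimal sketch of
the usual CPUID family/model/stepping arithmetic. The sample EAX value
0x00600F12 is an assumption chosen to match the cpuinfo fields quoted
above (family 21, model 1, stepping 2), not something read from this box,
and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Display family: the extended family is added only when the base
 * family field is 0xF. */
unsigned int cpuid_family(uint32_t eax)
{
    unsigned int base = (eax >> 8) & 0xF;
    unsigned int ext  = (eax >> 20) & 0xFF;

    return base == 0xF ? base + ext : base;
}

/* Display model: for base family 0xF the extended model supplies the
 * upper four bits (other vendor-specific cases are ignored here). */
unsigned int cpuid_model(uint32_t eax)
{
    unsigned int base_family = (eax >> 8) & 0xF;
    unsigned int base        = (eax >> 4) & 0xF;
    unsigned int ext         = (eax >> 16) & 0xF;

    return base_family == 0xF ? (ext << 4) | base : base;
}

/* Stepping is the low four bits. */
unsigned int cpuid_stepping(uint32_t eax)
{
    return eax & 0xF;
}
```

With the assumed value, cpuid_family(0x00600F12) is 0xF + 0x6 = 0x15,
which is 21 in decimal.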

> >> Is this message harmless, or does it point to a problem with the
> >> hardware/BIOS/Linux?
> >
> > Mostly harmless, fix is already upstream: f227d306cf3 and also on its
> > way to stable. You can try 3.5-rc1 on the box (it should be pretty
> > stable on AMD) or wait for the stable backport.
> 
> Ok, compiling ... will report soon. The reason I asked: this
> particular brand-new box freezes once every 2-3 days without any
> output - no oops, no sysrq, nothing. So I'm trying to find out what
> the problem is. The previous kernels (3.3.x) had a serious problem
> with the IOMMU - the onboard USB3.0 and Ethernet did not work at all
> on them. 3.4.x is better, but it still seems to freeze.

Hmm, that doesn't sound good. What exactly do you do when it freezes,
any repeatable usage patterns? Does the freeze happen if you disable
IOMMU in the BIOS?

Do you have serial console attached to it?

Do you have CONFIG_DETECT_HUNG_TASK enabled? Also CONFIG_PROVE_LOCKING,
CONFIG_KMEMCHECK, CONFIG_DEBUG_PREEMPT. These are just a couple of debug
options to enable right now which could tell us more.

Anything else that I'm forgetting?

Ah, and send me dmesg privately pls, from 3.4 and 3.5-rc1.

That's all I can think of right now.

-- 
Regards/Gruss,
Boris.


* Re: 3.[34].x Reproducable [Firmware bug] message upon warm boot only
  2012-06-07 15:13       ` Borislav Petkov
@ 2012-06-07 19:02         ` Rus
  2012-06-07 19:33           ` lockdep and kmemcheck Borislav Petkov
  2012-06-07 21:00         ` 3.[34].x Reproducable [Firmware bug] message upon warm boot only Rus
  1 sibling, 1 reply; 12+ messages in thread
From: Rus @ 2012-06-07 19:02 UTC (permalink / raw)
  To: Borislav Petkov, linux-kernel

> Do you have CONFIG_DETECT_HUNG_TASK enabled? Also CONFIG_PROVE_LOCKING,
> CONFIG_KMEMCHECK, CONFIG_DEBUG_PREEMPT. These are just a couple of debug
> options to enable right now which could tell us more.

Enabling kmemcheck prevented 3.5-rc1 from booting, with the
following messages:

kmemcheck: Limiting number of CPUs to 1.
kmemcheck: Initialized
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2739 lockdep_trace_alloc+0xcd/0xd0()
Hardware name: To be filled by O.E.M.
Modules linked in:
Pid: 1, comm: swapper/0 Not tainted 3.5.0-rc1 #3
Call Trace:
 [<ffffffff8104123a>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff81041285>] warn_slowpath_null+0x15/0x20
 [<ffffffff81096f2d>] lockdep_trace_alloc+0xcd/0xd0
 [<ffffffff810ee48e>] __alloc_pages_nodemask+0x7e/0x890
 [<ffffffff810ee599>] ? __alloc_pages_nodemask+0x189/0x890
 [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff815c07d0>] ? error_exit+0x30/0xb0
 [<ffffffff811260c9>] kmemcheck_alloc_shadow+0x29/0xb0
 [<ffffffff8112409a>] new_slab+0x1fa/0x2e0
 [<ffffffff815b70ec>] __slab_alloc.isra.51.constprop.55+0x3e8/0x40e
 [<ffffffff815c07d0>] ? error_exit+0x30/0xb0
 [<ffffffff811251f7>] kmem_cache_alloc+0x87/0xb0
 [<ffffffff8126d7c0>] idr_pre_get+0x60/0x90
 [<ffffffff8126dd8b>] ida_pre_get+0x1b/0x90
 [<ffffffff810593b2>] create_worker+0x42/0x170
 [<ffffffff81ae441d>] init_workqueues+0x1f2/0x393
 [<ffffffff81ae422b>] ? usermodehelper_init+0x36/0x36
 [<ffffffff81ae422b>] ? usermodehelper_init+0x36/0x36
 [<ffffffff81002122>] do_one_initcall+0x122/0x180
 [<ffffffff81acbc7a>] kernel_init+0x9b/0x1f6
 [<ffffffff815c1f74>] kernel_thread_helper+0x4/0x10
 [<ffffffff815c0274>] ? retint_restore_args+0x13/0x13
 [<ffffffff81acbbdf>] ? start_kernel+0x3d2/0x3d2
 [<ffffffff815c1f70>] ? gs_change+0x13/0x13
---[ end trace 6d450e935ee1897c ]---
MCE: In-kernel MCE decoding enabled.
NMI watchdog: enabled, takes one hw-pmu counter.
Brought up 1 CPUs
----------------
| NMI testsuite:
--------------------
  remote IPI:  ok  |
   local IPI:
------------[ cut here ]------------
------------[ cut here ]------------
WARNING: at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xb1/0xc0()
Hardware name: To be filled by O.E.M.
Modules linked in:
Pid: 1, comm: swapper/0 Tainted: G        W    3.5.0-rc1 #3
Call Trace:
 <NMI>  [<ffffffff8104123a>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff81041285>] warn_slowpath_null+0x15/0x20
 [<ffffffff8103a1f1>] kmemcheck_fault+0xb1/0xc0
 [<ffffffff81033d48>] do_page_fault+0x3f8/0x480
 [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff815c0525>] page_fault+0x25/0x30
 [<ffffffff8131fe2d>] ? vt_console_print+0xad/0x3b0
 [<ffffffff8131fdea>] ? vt_console_print+0x6a/0x3b0
 [<ffffffff8127ebbd>] ? do_raw_spin_unlock+0x5d/0xb0
 [<ffffffff81042484>] console_unlock+0x174/0x280
 [<ffffffff810427fc>] vprintk_emit+0x16c/0x580
 [<ffffffff8103a1f1>] ? kmemcheck_fault+0xb1/0xc0
 [<ffffffff815b451b>] printk+0x5c/0x5e
 [<ffffffff8103a1f1>] ? kmemcheck_fault+0xb1/0xc0
 [<ffffffff810411f9>] warn_slowpath_common+0x39/0xb0
 [<ffffffff81041285>] warn_slowpath_null+0x15/0x20
 [<ffffffff8103a1f1>] kmemcheck_fault+0xb1/0xc0
 [<ffffffff81033d48>] do_page_fault+0x3f8/0x480
 [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff815c0525>] page_fault+0x25/0x30
 [<ffffffff8101ca8a>] ? x86_perf_event_update+0x2a/0xb0
 [<ffffffff8101dc66>] x86_pmu_handle_irq+0x96/0x130
 [<ffffffff8101c45d>] perf_event_nmi_handler+0x1d/0x20
 [<ffffffff81010891>] nmi_handle.isra.0+0x81/0xd0
 [<ffffffff81010810>] ? __register_nmi_handler+0x190/0x190
 [<ffffffff810109e8>] do_nmi+0x108/0x380
 [<ffffffff815c090c>] end_repeat_nmi+0x1a/0x1e
 [<ffffffff81278879>] ? delay_tsc+0x29/0xf0
 [<ffffffff81278879>] ? delay_tsc+0x29/0xf0
 [<ffffffff81278879>] ? delay_tsc+0x29/0xf0
 <<EOE>>  [<ffffffff812787ba>] __delay+0xa/0x10
 [<ffffffff812787eb>] __const_udelay+0x2b/0x30
 [<ffffffff81adf398>] test_nmi_ipi.constprop.2+0x51/0x84
 [<ffffffff81adf41e>] local_ipi+0x21/0x23
 [<ffffffff81adf2d2>] dotest.constprop.1+0x6/0x7b
 [<ffffffff81adf4af>] nmi_selftest+0x8f/0x185
 [<ffffffff81ada69b>] native_smp_cpus_done+0x2d/0x11e
 [<ffffffff81ae6199>] smp_init+0x97/0x9f
 [<ffffffff81acbc9e>] kernel_init+0xbf/0x1f6
 [<ffffffff815c1f74>] kernel_thread_helper+0x4/0x10
 [<ffffffff815c0274>] ? retint_restore_args+0x13/0x13
 [<ffffffff81acbbdf>] ? start_kernel+0x3d2/0x3d2
 [<ffffffff815c1f70>] ? gs_change+0x13/0x13
---[ end trace 6d450e935ee1897d ]---
WARNING: at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xb1/0xc0()
Hardware name: To be filled by O.E.M.
Modules linked in:
Pid: 1, comm: swapper/0 Tainted: G        W    3.5.0-rc1 #3
Call Trace:
 <NMI>  [<ffffffff8104123a>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff81041285>] warn_slowpath_null+0x15/0x20
 [<ffffffff8103a1f1>] kmemcheck_fault+0xb1/0xc0
 [<ffffffff81033d48>] do_page_fault+0x3f8/0x480
 [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff815c0525>] page_fault+0x25/0x30
 [<ffffffff8101ca8a>] ? x86_perf_event_update+0x2a/0xb0
 [<ffffffff8101dc66>] x86_pmu_handle_irq+0x96/0x130
 [<ffffffff8101c45d>] perf_event_nmi_handler+0x1d/0x20
 [<ffffffff81010891>] nmi_handle.isra.0+0x81/0xd0
 [<ffffffff81010810>] ? __register_nmi_handler+0x190/0x190
 [<ffffffff810109e8>] do_nmi+0x108/0x380
 [<ffffffff815c090c>] end_repeat_nmi+0x1a/0x1e
 [<ffffffff81278879>] ? delay_tsc+0x29/0xf0
 [<ffffffff81278879>] ? delay_tsc+0x29/0xf0
 [<ffffffff81278879>] ? delay_tsc+0x29/0xf0
 <<EOE>>  [<ffffffff812787ba>] __delay+0xa/0x10
 [<ffffffff812787eb>] __const_udelay+0x2b/0x30
 [<ffffffff81adf398>] test_nmi_ipi.constprop.2+0x51/0x84
 [<ffffffff81adf41e>] local_ipi+0x21/0x23
 [<ffffffff81adf2d2>] dotest.constprop.1+0x6/0x7b
 [<ffffffff81adf4af>] nmi_selftest+0x8f/0x185
 [<ffffffff81ada69b>] native_smp_cpus_done+0x2d/0x11e
 [<ffffffff81ae6199>] smp_init+0x97/0x9f
 [<ffffffff81acbc9e>] kernel_init+0xbf/0x1f6
 [<ffffffff815c1f74>] kernel_thread_helper+0x4/0x10
 [<ffffffff815c0274>] ? retint_restore_args+0x13/0x13
 [<ffffffff81acbbdf>] ? start_kernel+0x3d2/0x3d2
 [<ffffffff815c1f70>] ? gs_change+0x13/0x13
---[ end trace 6d450e935ee1897e ]---
  ok  |
--------------------
Good, all   2 testcases passed! |
....................
Freeing unused kernel memory: 228k freed
Freeing unused kernel memory: 1388k freed
init (1) used greatest stack depth: 3256 bytes left
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000005

Pid: 1, comm: init Tainted: G        W    3.5.0-rc1 #3
Call Trace:
 [<ffffffff815b43ae>] panic+0xb5/0x1c6
 [<ffffffff810464d9>] ? do_exit+0x749/0x920
 [<ffffffff81046581>] do_exit+0x7f1/0x920
 [<ffffffff81046944>] do_group_exit+0x44/0xb0
 [<ffffffff81053211>] get_signal_to_deliver+0x1e1/0x5f0
 [<ffffffff8105168f>] ? __send_signal+0x16f/0x2f0
 [<ffffffff8100c26a>] do_signal+0x3a/0x920
 [<ffffffff815bfd65>] ? _raw_spin_unlock_irqrestore+0x45/0x80
 [<ffffffff8105214c>] ? force_sig_info+0xdc/0x100
 [<ffffffff81019f82>] ? syscall_trace_leave+0x122/0x130
 [<ffffffff8100cbdd>] do_notify_resume+0x6d/0xa0
 [<ffffffff81278efe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff815c0e7a>] int_signal+0x12/0x17
 [<ffffffff815b290b>] ? run_init_process+0x1e/0x20
 [<ffffffff815b2952>] ? init_post+0x45/0xbe
 [<ffffffff81acbdd5>] ? kernel_init+0x1f6/0x1f6
 [<ffffffff81acb5ae>] ? do_early_param+0x8c/0x8c
 [<ffffffff815c1f74>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff815c0274>] ? retint_restore_args+0x13/0x13
 [<ffffffff81acbbdf>] ? start_kernel+0x3d2/0x3d2
 [<ffffffff815c1f70>] ? gs_change+0x13/0x13
.........

After disabling kmemcheck the box is able to boot again.

Rus

-- 
SfinxSoft
http://sfinxsoft.com


* lockdep and kmemcheck
  2012-06-07 19:02         ` Rus
@ 2012-06-07 19:33           ` Borislav Petkov
  2012-06-07 19:45             ` Borislav Petkov
  2012-06-08  8:02             ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Borislav Petkov @ 2012-06-07 19:33 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Rus, linux-kernel

Peter, does it ring a bell?

This is LOCKDEP with CONFIG_KMEMCHECK.

On Thu, Jun 07, 2012 at 10:02:33PM +0300, Rus wrote:
> > Do you have CONFIG_DETECT_HUNG_TASK enabled? Also CONFIG_PROVE_LOCKING,
> > CONFIG_KMEMCHECK, CONFIG_DEBUG_PREEMPT. These are just a couple of debug
> > options to enable right now which could tell us more.
> 
> Enabling kmemcheck prevented 3.5-rc1 from booting, with the
> following messages:
> 
> kmemcheck: Limiting number of CPUs to 1.
> kmemcheck: Initialized
> ------------[ cut here ]------------
> WARNING: at kernel/lockdep.c:2739 lockdep_trace_alloc+0xcd/0xd0()
> Hardware name: To be filled by O.E.M.
> Modules linked in:
> Pid: 1, comm: swapper/0 Not tainted 3.5.0-rc1 #3
> Call Trace:
>  [<ffffffff8104123a>] warn_slowpath_common+0x7a/0xb0
>  [<ffffffff81041285>] warn_slowpath_null+0x15/0x20
>  [<ffffffff81096f2d>] lockdep_trace_alloc+0xcd/0xd0
>  [<ffffffff810ee48e>] __alloc_pages_nodemask+0x7e/0x890
>  [<ffffffff810ee599>] ? __alloc_pages_nodemask+0x189/0x890
>  [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>  [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>  [<ffffffff815c07d0>] ? error_exit+0x30/0xb0
>  [<ffffffff811260c9>] kmemcheck_alloc_shadow+0x29/0xb0
>  [<ffffffff8112409a>] new_slab+0x1fa/0x2e0
>  [<ffffffff815b70ec>] __slab_alloc.isra.51.constprop.55+0x3e8/0x40e
>  [<ffffffff815c07d0>] ? error_exit+0x30/0xb0
>  [<ffffffff811251f7>] kmem_cache_alloc+0x87/0xb0
>  [<ffffffff8126d7c0>] idr_pre_get+0x60/0x90
>  [<ffffffff8126dd8b>] ida_pre_get+0x1b/0x90
>  [<ffffffff810593b2>] create_worker+0x42/0x170
>  [<ffffffff81ae441d>] init_workqueues+0x1f2/0x393
>  [<ffffffff81ae422b>] ? usermodehelper_init+0x36/0x36
>  [<ffffffff81ae422b>] ? usermodehelper_init+0x36/0x36
>  [<ffffffff81002122>] do_one_initcall+0x122/0x180
>  [<ffffffff81acbc7a>] kernel_init+0x9b/0x1f6
>  [<ffffffff815c1f74>] kernel_thread_helper+0x4/0x10
>  [<ffffffff815c0274>] ? retint_restore_args+0x13/0x13
>  [<ffffffff81acbbdf>] ? start_kernel+0x3d2/0x3d2
>  [<ffffffff815c1f70>] ? gs_change+0x13/0x13
> ---[ end trace 6d450e935ee1897c ]---
> MCE: In-kernel MCE decoding enabled.
> NMI watchdog: enabled, takes one hw-pmu counter.
> Brought up 1 CPUs
> ----------------
> | NMI testsuite:
> --------------------
>   remote IPI:  ok  |
>    local IPI:
> ------------[ cut here ]------------
> ------------[ cut here ]------------
> WARNING: at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xb1/0xc0()
> Hardware name: To be filled by O.E.M.
> Modules linked in:
> Pid: 1, comm: swapper/0 Tainted: G        W    3.5.0-rc1 #3
> Call Trace:
>  <NMI>  [<ffffffff8104123a>] warn_slowpath_common+0x7a/0xb0
>  [<ffffffff81041285>] warn_slowpath_null+0x15/0x20
>  [<ffffffff8103a1f1>] kmemcheck_fault+0xb1/0xc0
>  [<ffffffff81033d48>] do_page_fault+0x3f8/0x480
>  [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>  [<ffffffff815c0525>] page_fault+0x25/0x30
>  [<ffffffff8131fe2d>] ? vt_console_print+0xad/0x3b0
>  [<ffffffff8131fdea>] ? vt_console_print+0x6a/0x3b0
>  [<ffffffff8127ebbd>] ? do_raw_spin_unlock+0x5d/0xb0
>  [<ffffffff81042484>] console_unlock+0x174/0x280
>  [<ffffffff810427fc>] vprintk_emit+0x16c/0x580
>  [<ffffffff8103a1f1>] ? kmemcheck_fault+0xb1/0xc0
>  [<ffffffff815b451b>] printk+0x5c/0x5e
>  [<ffffffff8103a1f1>] ? kmemcheck_fault+0xb1/0xc0
>  [<ffffffff810411f9>] warn_slowpath_common+0x39/0xb0
>  [<ffffffff81041285>] warn_slowpath_null+0x15/0x20
>  [<ffffffff8103a1f1>] kmemcheck_fault+0xb1/0xc0
>  [<ffffffff81033d48>] do_page_fault+0x3f8/0x480
>  [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>  [<ffffffff815c0525>] page_fault+0x25/0x30
>  [<ffffffff8101ca8a>] ? x86_perf_event_update+0x2a/0xb0
>  [<ffffffff8101dc66>] x86_pmu_handle_irq+0x96/0x130
>  [<ffffffff8101c45d>] perf_event_nmi_handler+0x1d/0x20
>  [<ffffffff81010891>] nmi_handle.isra.0+0x81/0xd0
>  [<ffffffff81010810>] ? __register_nmi_handler+0x190/0x190
>  [<ffffffff810109e8>] do_nmi+0x108/0x380
>  [<ffffffff815c090c>] end_repeat_nmi+0x1a/0x1e
>  [<ffffffff81278879>] ? delay_tsc+0x29/0xf0
>  [<ffffffff81278879>] ? delay_tsc+0x29/0xf0
>  [<ffffffff81278879>] ? delay_tsc+0x29/0xf0
>  <<EOE>>  [<ffffffff812787ba>] __delay+0xa/0x10
>  [<ffffffff812787eb>] __const_udelay+0x2b/0x30
>  [<ffffffff81adf398>] test_nmi_ipi.constprop.2+0x51/0x84
>  [<ffffffff81adf41e>] local_ipi+0x21/0x23
>  [<ffffffff81adf2d2>] dotest.constprop.1+0x6/0x7b
>  [<ffffffff81adf4af>] nmi_selftest+0x8f/0x185
>  [<ffffffff81ada69b>] native_smp_cpus_done+0x2d/0x11e
>  [<ffffffff81ae6199>] smp_init+0x97/0x9f
>  [<ffffffff81acbc9e>] kernel_init+0xbf/0x1f6
>  [<ffffffff815c1f74>] kernel_thread_helper+0x4/0x10
>  [<ffffffff815c0274>] ? retint_restore_args+0x13/0x13
>  [<ffffffff81acbbdf>] ? start_kernel+0x3d2/0x3d2
>  [<ffffffff815c1f70>] ? gs_change+0x13/0x13
> ---[ end trace 6d450e935ee1897d ]---
> WARNING: at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xb1/0xc0()
> Hardware name: To be filled by O.E.M.
> Modules linked in:
> Pid: 1, comm: swapper/0 Tainted: G        W    3.5.0-rc1 #3
> Call Trace:
>  <NMI>  [<ffffffff8104123a>] warn_slowpath_common+0x7a/0xb0
>  [<ffffffff81041285>] warn_slowpath_null+0x15/0x20
>  [<ffffffff8103a1f1>] kmemcheck_fault+0xb1/0xc0
>  [<ffffffff81033d48>] do_page_fault+0x3f8/0x480
>  [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>  [<ffffffff815c0525>] page_fault+0x25/0x30
>  [<ffffffff8101ca8a>] ? x86_perf_event_update+0x2a/0xb0
>  [<ffffffff8101dc66>] x86_pmu_handle_irq+0x96/0x130
>  [<ffffffff8101c45d>] perf_event_nmi_handler+0x1d/0x20
>  [<ffffffff81010891>] nmi_handle.isra.0+0x81/0xd0
>  [<ffffffff81010810>] ? __register_nmi_handler+0x190/0x190
>  [<ffffffff810109e8>] do_nmi+0x108/0x380
>  [<ffffffff815c090c>] end_repeat_nmi+0x1a/0x1e
>  [<ffffffff81278879>] ? delay_tsc+0x29/0xf0
>  [<ffffffff81278879>] ? delay_tsc+0x29/0xf0
>  [<ffffffff81278879>] ? delay_tsc+0x29/0xf0
>  <<EOE>>  [<ffffffff812787ba>] __delay+0xa/0x10
>  [<ffffffff812787eb>] __const_udelay+0x2b/0x30
>  [<ffffffff81adf398>] test_nmi_ipi.constprop.2+0x51/0x84
>  [<ffffffff81adf41e>] local_ipi+0x21/0x23
>  [<ffffffff81adf2d2>] dotest.constprop.1+0x6/0x7b
>  [<ffffffff81adf4af>] nmi_selftest+0x8f/0x185
>  [<ffffffff81ada69b>] native_smp_cpus_done+0x2d/0x11e
>  [<ffffffff81ae6199>] smp_init+0x97/0x9f
>  [<ffffffff81acbc9e>] kernel_init+0xbf/0x1f6
>  [<ffffffff815c1f74>] kernel_thread_helper+0x4/0x10
>  [<ffffffff815c0274>] ? retint_restore_args+0x13/0x13
>  [<ffffffff81acbbdf>] ? start_kernel+0x3d2/0x3d2
>  [<ffffffff815c1f70>] ? gs_change+0x13/0x13
> ---[ end trace 6d450e935ee1897e ]---
>   ok  |
> --------------------
> Good, all   2 testcases passed! |
> ....................
> Freeing unused kernel memory: 228k freed
> Freeing unused kernel memory: 1388k freed
> init (1) used greatest stack depth: 3256 bytes left
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000005
> 
> Pid: 1, comm: init Tainted: G        W    3.5.0-rc1 #3
> Call Trace:
>  [<ffffffff815b43ae>] panic+0xb5/0x1c6
>  [<ffffffff810464d9>] ? do_exit+0x749/0x920
>  [<ffffffff81046581>] do_exit+0x7f1/0x920
>  [<ffffffff81046944>] do_group_exit+0x44/0xb0
>  [<ffffffff81053211>] get_signal_to_deliver+0x1e1/0x5f0
>  [<ffffffff8105168f>] ? __send_signal+0x16f/0x2f0
>  [<ffffffff8100c26a>] do_signal+0x3a/0x920
>  [<ffffffff815bfd65>] ? _raw_spin_unlock_irqrestore+0x45/0x80
>  [<ffffffff8105214c>] ? force_sig_info+0xdc/0x100
>  [<ffffffff81019f82>] ? syscall_trace_leave+0x122/0x130
>  [<ffffffff8100cbdd>] do_notify_resume+0x6d/0xa0
>  [<ffffffff81278efe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff815c0e7a>] int_signal+0x12/0x17
>  [<ffffffff815b290b>] ? run_init_process+0x1e/0x20
>  [<ffffffff815b2952>] ? init_post+0x45/0xbe
>  [<ffffffff81acbdd5>] ? kernel_init+0x1f6/0x1f6
>  [<ffffffff81acb5ae>] ? do_early_param+0x8c/0x8c
>  [<ffffffff815c1f74>] ? kernel_thread_helper+0x4/0x10
>  [<ffffffff815c0274>] ? retint_restore_args+0x13/0x13
>  [<ffffffff81acbbdf>] ? start_kernel+0x3d2/0x3d2
>  [<ffffffff815c1f70>] ? gs_change+0x13/0x13
> .........
> 
> After disabling kmemcheck the box is able to boot again.
> 
> Rus
> 
> -- 
> SfinxSoft
> http://sfinxsoft.com
> 

-- 
Regards/Gruss,
    Boris.


* Re: lockdep and kmemcheck
  2012-06-07 19:33           ` lockdep and kmemcheck Borislav Petkov
@ 2012-06-07 19:45             ` Borislav Petkov
  2012-06-08  6:10               ` Tejun Heo
  2012-06-08  8:02             ` Peter Zijlstra
  1 sibling, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2012-06-07 19:45 UTC (permalink / raw)
  To: Peter Zijlstra, Rus, linux-kernel; +Cc: Tejun Heo

On Thu, Jun 07, 2012 at 09:33:31PM +0200, Borislav Petkov wrote:
> Peter, does it ring a bell?
> 
> This is LOCKDEP with CONFIG_KMEMCHECK.
> 
> On Thu, Jun 07, 2012 at 10:02:33PM +0300, Rus wrote:
> > > Do you have CONFIG_DETECT_HUNG_TASK enabled? Also CONFIG_PROVE_LOCKING,
> > > CONFIG_KMEMCHECK, CONFIG_DEBUG_PREEMPT. These are just a couple of debug
> > > options to enable right now which could tell us more.
> > 
> > Enabling kmemcheck prevented 3.5-rc1 from booting, with the
> > following messages:
> > 
> > kmemcheck: Limiting number of CPUs to 1.
> > kmemcheck: Initialized
> > ------------[ cut here ]------------
> > WARNING: at kernel/lockdep.c:2739 lockdep_trace_alloc+0xcd/0xd0()

This is

        /*
         * Oi! Can't be having __GFP_FS allocations with IRQs disabled.
         */
        if (DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)))
                return;


> > Hardware name: To be filled by O.E.M.
> > Modules linked in:
> > Pid: 1, comm: swapper/0 Not tainted 3.5.0-rc1 #3
> > Call Trace:
> >  [<ffffffff8104123a>] warn_slowpath_common+0x7a/0xb0
> >  [<ffffffff81041285>] warn_slowpath_null+0x15/0x20
> >  [<ffffffff81096f2d>] lockdep_trace_alloc+0xcd/0xd0
> >  [<ffffffff810ee48e>] __alloc_pages_nodemask+0x7e/0x890
> >  [<ffffffff810ee599>] ? __alloc_pages_nodemask+0x189/0x890
> >  [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> >  [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> >  [<ffffffff815c07d0>] ? error_exit+0x30/0xb0
> >  [<ffffffff811260c9>] kmemcheck_alloc_shadow+0x29/0xb0
> >  [<ffffffff8112409a>] new_slab+0x1fa/0x2e0
> >  [<ffffffff815b70ec>] __slab_alloc.isra.51.constprop.55+0x3e8/0x40e
> >  [<ffffffff815c07d0>] ? error_exit+0x30/0xb0
> >  [<ffffffff811251f7>] kmem_cache_alloc+0x87/0xb0
> >  [<ffffffff8126d7c0>] idr_pre_get+0x60/0x90
> >  [<ffffffff8126dd8b>] ida_pre_get+0x1b/0x90
> >  [<ffffffff810593b2>] create_worker+0x42/0x170

This has to be

static struct worker *create_worker(struct global_cwq *gcwq, bool bind)
{
        bool on_unbound_cpu = gcwq->cpu == WORK_CPU_UNBOUND;
        struct worker *worker = NULL;
        int id = -1;

        spin_lock_irq(&gcwq->lock);
        while (ida_get_new(&gcwq->worker_ida, &id)) {
                spin_unlock_irq(&gcwq->lock);
                if (!ida_pre_get(&gcwq->worker_ida, GFP_KERNEL))

and GFP_KERNEL has __GFP_FS.
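
To spell out why that combination trips the warning, here is a small
user-space model of the check, as a sketch only. The MODEL_* flag values
are illustrative stand-ins for the definitions in include/linux/gfp.h, the
function name is hypothetical, and the real lockdep code has further
early-outs (debug_locks, PF_MEMALLOC) that are left out here.

```c
#include <stdbool.h>

/* Illustrative flag values; assumptions, not copied from a kernel tree. */
#define MODEL_GFP_WAIT   0x10u  /* allocation may sleep */
#define MODEL_GFP_HIGH   0x20u  /* may use emergency reserves */
#define MODEL_GFP_IO     0x40u  /* may start physical I/O */
#define MODEL_GFP_FS     0x80u  /* may call into the filesystem */

#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_ATOMIC (MODEL_GFP_HIGH)

/* Returns true when the modeled check would emit the warning. */
bool model_trace_alloc_warns(unsigned int gfp_mask, bool irqs_disabled)
{
    /* Allocations that cannot wait never enter reclaim. */
    if (!(gfp_mask & MODEL_GFP_WAIT))
        return false;

    /* Only __GFP_FS allocations are of interest. */
    if (!(gfp_mask & MODEL_GFP_FS))
        return false;

    /* A __GFP_FS allocation with IRQs disabled is what gets flagged. */
    return irqs_disabled;
}
```

In this model, model_trace_alloc_warns(MODEL_GFP_KERNEL, true) is true,
while the same mask with IRQs enabled, or an atomic mask with IRQs
disabled, does not warn.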

> >  [<ffffffff81ae441d>] init_workqueues+0x1f2/0x393
> >  [<ffffffff81ae422b>] ? usermodehelper_init+0x36/0x36
> >  [<ffffffff81ae422b>] ? usermodehelper_init+0x36/0x36
> >  [<ffffffff81002122>] do_one_initcall+0x122/0x180
> >  [<ffffffff81acbc7a>] kernel_init+0x9b/0x1f6
> >  [<ffffffff815c1f74>] kernel_thread_helper+0x4/0x10
> >  [<ffffffff815c0274>] ? retint_restore_args+0x13/0x13
> >  [<ffffffff81acbbdf>] ? start_kernel+0x3d2/0x3d2
> >  [<ffffffff815c1f70>] ? gs_change+0x13/0x13
> > ---[ end trace 6d450e935ee1897c ]---
> > MCE: In-kernel MCE decoding enabled.
> > NMI watchdog: enabled, takes one hw-pmu counter.

Let's add some more people to CC.

Tejun, this create_worker() uses ida_pre_get() with GFP_KERNEL mask
but lockdep complains about __GFP_FS allocations with IRQs off in
__lockdep_trace_alloc. What's up?

Hmmm...

-- 
Regards/Gruss,
    Boris.


* Re: 3.[34].x Reproducable [Firmware bug] message upon warm boot only
  2012-06-07 15:13       ` Borislav Petkov
  2012-06-07 19:02         ` Rus
@ 2012-06-07 21:00         ` Rus
  1 sibling, 0 replies; 12+ messages in thread
From: Rus @ 2012-06-07 21:00 UTC (permalink / raw)
  To: Borislav Petkov, linux-kernel

More info:

I tried switching off the IOMMU in the BIOS:

3.4.1 boots, but the onboard Ethernet does not work, with the previously
posted net/sched/sch_generic.c:256 warning.
3.5-rc1 fails to boot due to AHCI timeouts.

If the USB3.0 controller is disabled in the BIOS at the same time, the
USB keyboard does not work with either kernel (3.5-rc1 can't load the
USB modules, though), even in USB2.0 ports. If USB3.0 is enabled in the
BIOS, the USB keyboard works only in USB3.0 ports; the USB2.0 ports fail
with device descriptor read errors. By the way: people with older
kernels and older BIOSes struggle with the same problem on the same
m/b -
http://www.spinics.net/lists/linux-usb/msg54761.html

After re-enabling the IOMMU, the keyboard and Ethernet started working
again. This is somewhat strange, as a week ago I observed exactly the
opposite behaviour - Ethernet worked only with the IOMMU switched off.
USB was buggy as described - it worked only in USB3.0 ports. It seems
that either some random bug hits the IOMMU subsystem or this motherboard
needs to be thrown away.

As usual, I can supply any additional info.

Rus

-- 
SfinxSoft
http://sfinxsoft.com


* Re: lockdep and kmemcheck
  2012-06-07 19:45             ` Borislav Petkov
@ 2012-06-08  6:10               ` Tejun Heo
  2012-06-08  7:55                 ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2012-06-08  6:10 UTC (permalink / raw)
  To: Borislav Petkov, Peter Zijlstra, Rus, linux-kernel

On Thu, Jun 07, 2012 at 09:45:16PM +0200, Borislav Petkov wrote:
> static struct worker *create_worker(struct global_cwq *gcwq, bool bind)
> {
>         bool on_unbound_cpu = gcwq->cpu == WORK_CPU_UNBOUND;
>         struct worker *worker = NULL;
>         int id = -1;
> 
>         spin_lock_irq(&gcwq->lock);
>         while (ida_get_new(&gcwq->worker_ida, &id)) {
>                 spin_unlock_irq(&gcwq->lock);
>                 if (!ida_pre_get(&gcwq->worker_ida, GFP_KERNEL))
> 
> and GFP_KERNEL has __GFP_FS.
> 
> > >  [<ffffffff81ae441d>] init_workqueues+0x1f2/0x393
> > >  [<ffffffff81ae422b>] ? usermodehelper_init+0x36/0x36
> > >  [<ffffffff81ae422b>] ? usermodehelper_init+0x36/0x36
> > >  [<ffffffff81002122>] do_one_initcall+0x122/0x180
> > >  [<ffffffff81acbc7a>] kernel_init+0x9b/0x1f6
> > >  [<ffffffff815c1f74>] kernel_thread_helper+0x4/0x10
> > >  [<ffffffff815c0274>] ? retint_restore_args+0x13/0x13
> > >  [<ffffffff81acbbdf>] ? start_kernel+0x3d2/0x3d2
> > >  [<ffffffff815c1f70>] ? gs_change+0x13/0x13
> > > ---[ end trace 6d450e935ee1897c ]---
> > > MCE: In-kernel MCE decoding enabled.
> > > NMI watchdog: enabled, takes one hw-pmu counter.
> 
> Let's add some more people to CC.
> 
> Tejun, this create_worker() uses ida_pre_get() with GFP_KERNEL mask
> but lockdep complains about __GFP_FS allocations with IRQs off in
> __lockdep_trace_alloc. What's up?

The GFP_KERNEL allocation is right after spin_unlock_irq().  I suppose
this is from early boot before IRQs are brought online, right?  My
memory is very fuzzy now but ISTR irq debug code and lockdep having
workarounds for early boot.  Peter?

Thanks.

-- 
tejun


* Re: lockdep and kmemcheck
  2012-06-08  6:10               ` Tejun Heo
@ 2012-06-08  7:55                 ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2012-06-08  7:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Borislav Petkov, Rus, linux-kernel

On Fri, 2012-06-08 at 15:10 +0900, Tejun Heo wrote:
> On Thu, Jun 07, 2012 at 09:45:16PM +0200, Borislav Petkov wrote:
> > static struct worker *create_worker(struct global_cwq *gcwq, bool bind)
> > {
> >         bool on_unbound_cpu = gcwq->cpu == WORK_CPU_UNBOUND;
> >         struct worker *worker = NULL;
> >         int id = -1;
> > 
> >         spin_lock_irq(&gcwq->lock);
> >         while (ida_get_new(&gcwq->worker_ida, &id)) {
> >                 spin_unlock_irq(&gcwq->lock);
> >                 if (!ida_pre_get(&gcwq->worker_ida, GFP_KERNEL))
> > 
> > and GFP_KERNEL has __GFP_FS.
> > 
> > > >  [<ffffffff81ae441d>] init_workqueues+0x1f2/0x393
> > > >  [<ffffffff81ae422b>] ? usermodehelper_init+0x36/0x36
> > > >  [<ffffffff81ae422b>] ? usermodehelper_init+0x36/0x36
> > > >  [<ffffffff81002122>] do_one_initcall+0x122/0x180
> > > >  [<ffffffff81acbc7a>] kernel_init+0x9b/0x1f6
> > > >  [<ffffffff815c1f74>] kernel_thread_helper+0x4/0x10
> > > >  [<ffffffff815c0274>] ? retint_restore_args+0x13/0x13
> > > >  [<ffffffff81acbbdf>] ? start_kernel+0x3d2/0x3d2
> > > >  [<ffffffff815c1f70>] ? gs_change+0x13/0x13
> > > > ---[ end trace 6d450e935ee1897c ]---
> > > > MCE: In-kernel MCE decoding enabled.
> > > > NMI watchdog: enabled, takes one hw-pmu counter.
> > 
> > Let's add some more people to CC.
> > 
> > Tejun, this create_worker() uses ida_pre_get() with GFP_KERNEL mask
> > but lockdep complains about __GFP_FS allocations with IRQs off in
> > __lockdep_trace_alloc. What's up?
> 
> The GFP_KERNEL allocation is right after spin_unlock_irq().  I suppose
> this is from early boot before IRQs are brought online, right?  My
> memory is very fuzzy now but ISTR irq debug code and lockdep having
> workarounds for early boot.  Peter?

That workaround is early_boot_irqs_disabled, but kernel_init() is way
past that; by then we've forked the first thread and are fully
scheduling already.

* Re: lockdep and kmemcheck
  2012-06-07 19:33           ` lockdep and kmemcheck Borislav Petkov
  2012-06-07 19:45             ` Borislav Petkov
@ 2012-06-08  8:02             ` Peter Zijlstra
  2012-06-08  8:06               ` Borislav Petkov
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2012-06-08  8:02 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Rus, linux-kernel, Pekka Enberg, Christoph Lameter

On Thu, 2012-06-07 at 21:33 +0200, Borislav Petkov wrote:
> > WARNING: at kernel/lockdep.c:2739 lockdep_trace_alloc+0xcd/0xd0()
> > Hardware name: To be filled by O.E.M.
> > Modules linked in:
> > Pid: 1, comm: swapper/0 Not tainted 3.5.0-rc1 #3
> > Call Trace:
> >  [<ffffffff8104123a>] warn_slowpath_common+0x7a/0xb0
> >  [<ffffffff81041285>] warn_slowpath_null+0x15/0x20
> >  [<ffffffff81096f2d>] lockdep_trace_alloc+0xcd/0xd0
> >  [<ffffffff810ee48e>] __alloc_pages_nodemask+0x7e/0x890
> >  [<ffffffff810ee599>] ? __alloc_pages_nodemask+0x189/0x890
> >  [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> >  [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> >  [<ffffffff815c07d0>] ? error_exit+0x30/0xb0
> >  [<ffffffff811260c9>] kmemcheck_alloc_shadow+0x29/0xb0
> >  [<ffffffff8112409a>] new_slab+0x1fa/0x2e0
> >  [<ffffffff815b70ec>] __slab_alloc.isra.51.constprop.55+0x3e8/0x40e
> >  [<ffffffff815c07d0>] ? error_exit+0x30/0xb0
> >  [<ffffffff811251f7>] kmem_cache_alloc+0x87/0xb0
> >  [<ffffffff8126d7c0>] idr_pre_get+0x60/0x90
> >  [<ffffffff8126dd8b>] ida_pre_get+0x1b/0x90
> >  [<ffffffff810593b2>] create_worker+0x42/0x170
> >  [<ffffffff81ae441d>] init_workqueues+0x1f2/0x393
> >  [<ffffffff81ae422b>] ? usermodehelper_init+0x36/0x36
> >  [<ffffffff81ae422b>] ? usermodehelper_init+0x36/0x36
> >  [<ffffffff81002122>] do_one_initcall+0x122/0x180
> >  [<ffffffff81acbc7a>] kernel_init+0x9b/0x1f6
> >  [<ffffffff815c1f74>] kernel_thread_helper+0x4/0x10
> >  [<ffffffff815c0274>] ? retint_restore_args+0x13/0x13
> >  [<ffffffff81acbbdf>] ? start_kernel+0x3d2/0x3d2
> >  [<ffffffff815c1f70>] ? gs_change+0x13/0x13 

Using SLUB are you?

Looks like SLUB is buggy here.. does this fix it?

---
 mm/slub.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index fb2ef09..6b9e3f6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1314,13 +1314,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 			stat(s, ORDER_FALLBACK);
 	}
 
-	if (flags & __GFP_WAIT)
-		local_irq_disable();
-
-	if (!page)
-		return NULL;
-
-	if (kmemcheck_enabled
+	if (page && kmemcheck_enabled
 		&& !(s->flags & (SLAB_NOTRACK | DEBUG_DEFAULT_FLAGS))) {
 		int pages = 1 << oo_order(oo);
 
@@ -1336,6 +1330,12 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 			kmemcheck_mark_unallocated_pages(page, pages);
 	}
 
+	if (flags & __GFP_WAIT)
+		local_irq_disable();
+
+	if (!page)
+		return NULL;
+
 	page->objects = oo_objects(oo);
 	mod_zone_page_state(page_zone(page),
 		(s->flags & SLAB_RECLAIM_ACCOUNT) ?


* Re: lockdep and kmemcheck
  2012-06-08  8:02             ` Peter Zijlstra
@ 2012-06-08  8:06               ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2012-06-08  8:06 UTC (permalink / raw)
  To: Peter Zijlstra, Rus; +Cc: linux-kernel, Pekka Enberg, Christoph Lameter

On Fri, Jun 08, 2012 at 10:02:16AM +0200, Peter Zijlstra wrote:
> On Thu, 2012-06-07 at 21:33 +0200, Borislav Petkov wrote:
> > > WARNING: at kernel/lockdep.c:2739 lockdep_trace_alloc+0xcd/0xd0()
> > > Hardware name: To be filled by O.E.M.
> > > Modules linked in:
> > > Pid: 1, comm: swapper/0 Not tainted 3.5.0-rc1 #3
> > > Call Trace:
> > >  [<ffffffff8104123a>] warn_slowpath_common+0x7a/0xb0
> > >  [<ffffffff81041285>] warn_slowpath_null+0x15/0x20
> > >  [<ffffffff81096f2d>] lockdep_trace_alloc+0xcd/0xd0
> > >  [<ffffffff810ee48e>] __alloc_pages_nodemask+0x7e/0x890
> > >  [<ffffffff810ee599>] ? __alloc_pages_nodemask+0x189/0x890
> > >  [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> > >  [<ffffffff81278f3d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> > >  [<ffffffff815c07d0>] ? error_exit+0x30/0xb0
> > >  [<ffffffff811260c9>] kmemcheck_alloc_shadow+0x29/0xb0
> > >  [<ffffffff8112409a>] new_slab+0x1fa/0x2e0
> > >  [<ffffffff815b70ec>] __slab_alloc.isra.51.constprop.55+0x3e8/0x40e
> > >  [<ffffffff815c07d0>] ? error_exit+0x30/0xb0
> > >  [<ffffffff811251f7>] kmem_cache_alloc+0x87/0xb0
> > >  [<ffffffff8126d7c0>] idr_pre_get+0x60/0x90
> > >  [<ffffffff8126dd8b>] ida_pre_get+0x1b/0x90
> > >  [<ffffffff810593b2>] create_worker+0x42/0x170
> > >  [<ffffffff81ae441d>] init_workqueues+0x1f2/0x393
> > >  [<ffffffff81ae422b>] ? usermodehelper_init+0x36/0x36
> > >  [<ffffffff81ae422b>] ? usermodehelper_init+0x36/0x36
> > >  [<ffffffff81002122>] do_one_initcall+0x122/0x180
> > >  [<ffffffff81acbc7a>] kernel_init+0x9b/0x1f6
> > >  [<ffffffff815c1f74>] kernel_thread_helper+0x4/0x10
> > >  [<ffffffff815c0274>] ? retint_restore_args+0x13/0x13
> > >  [<ffffffff81acbbdf>] ? start_kernel+0x3d2/0x3d2
> > >  [<ffffffff815c1f70>] ? gs_change+0x13/0x13 
> 
> Using SLUB are you?

Who, mee? Nah :-)

Rus reported this WARN while testing -rc1.

> Looks like SLUB is buggy here.. does this fix it?

@Rus: can you apply the patch below on top of -rc1 and retest with the
same BIOS settings and kernel .config you used to trigger the above?

Don't hesitate to ask questions if you don't know how to apply patches,
build kernels, etc.

> ---
>  mm/slub.c |   14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index fb2ef09..6b9e3f6 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1314,13 +1314,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  			stat(s, ORDER_FALLBACK);
>  	}
>  
> -	if (flags & __GFP_WAIT)
> -		local_irq_disable();
> -
> -	if (!page)
> -		return NULL;
> -
> -	if (kmemcheck_enabled
> +	if (page && kmemcheck_enabled
>  		&& !(s->flags & (SLAB_NOTRACK | DEBUG_DEFAULT_FLAGS))) {
>  		int pages = 1 << oo_order(oo);
>  
> @@ -1336,6 +1330,12 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  			kmemcheck_mark_unallocated_pages(page, pages);
>  	}
>  
> +	if (flags & __GFP_WAIT)
> +		local_irq_disable();
> +
> +	if (!page)
> +		return NULL;
> +
>  	page->objects = oo_objects(oo);
>  	mod_zone_page_state(page_zone(page),
>  		(s->flags & SLAB_RECLAIM_ACCOUNT) ?
> 
> 

Thanks.

-- 
Regards/Gruss,
    Boris.

end of thread, other threads:[~2012-06-08  8:07 UTC | newest]

Thread overview: 12+ messages
     [not found] <CAKgfjTktMLc5bfqhCOKy0evssxggofNQqLT-gQqBkHcBvOgF0g@mail.gmail.com>
2012-06-07 13:40 ` 3.[34].x Reproducable [Firmware bug] message upon warm boot only Rus
2012-06-07 14:24   ` Borislav Petkov
2012-06-07 14:51     ` Rus
2012-06-07 15:13       ` Borislav Petkov
2012-06-07 19:02         ` Rus
2012-06-07 19:33           ` lockdep and kmemcheck Borislav Petkov
2012-06-07 19:45             ` Borislav Petkov
2012-06-08  6:10               ` Tejun Heo
2012-06-08  7:55                 ` Peter Zijlstra
2012-06-08  8:02             ` Peter Zijlstra
2012-06-08  8:06               ` Borislav Petkov
2012-06-07 21:00         ` 3.[34].x Reproducable [Firmware bug] message upon warm boot only Rus
