[BUG] Kernel splat when taking CPUs offline

From: Steven Rostedt @ 2015-07-08 19:24 UTC
  To: LKML
  Cc: Linus Torvalds, Andrew Morton, Viresh Kumar, Rafael J. Wysocki,
	Saravana Kannan

My tests for ftrace include testing the mmiotracer, which requires
taking all but one CPU offline in order to run. This test crashed
every so often, and I was able to bisect it down to this commit:

commit 87549141d516 ("cpufreq: Stop migrating sysfs files on hotplug")

To make sure the mmiotracer itself wasn't causing the issue, I
triggered the same bug by simply doing the following:

(on a 4 cpu machine)

 # echo 0 > /sys/devices/system/cpu/cpu1/online 
 # echo 0 > /sys/devices/system/cpu/cpu2/online 
 # echo 0 > /sys/devices/system/cpu/cpu3/online 
 # echo 1 > /sys/devices/system/cpu/cpu1/online 
 # echo 1 > /sys/devices/system/cpu/cpu2/online 
 # echo 1 > /sys/devices/system/cpu/cpu3/online 
 # echo 0 > /sys/devices/system/cpu/cpu1/online 
 # echo 0 > /sys/devices/system/cpu/cpu2/online 
 # echo 0 > /sys/devices/system/cpu/cpu2/online 
 # echo 0 > /sys/devices/system/cpu/cpu3/online 
 # echo 1 > /sys/devices/system/cpu/cpu1/online 
 # echo 1 > /sys/devices/system/cpu/cpu2/online 
 # echo 1 > /sys/devices/system/cpu/cpu3/online 

It usually takes two or three rounds (taking all but one CPU offline,
then bringing them back online) before it triggers.
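
The sequence above can be scripted so that the rounds repeat without
retyping. This is only a minimal sketch, assuming a 4 CPU machine where
cpu0 stays online. The CPU_SYSFS variable and the set_cpus and
cycle_cpus function names are made up for this example and are not part
of any kernel tooling.

```shell
#!/bin/sh
# Sketch: repeat the offline/online cycle from the report.
# CPU_SYSFS, set_cpus and cycle_cpus are hypothetical names for this example.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# Write the given state (0 = offline, 1 = online) to cpu1..cpu3.
# cpu0 is left alone, matching the manual steps above.
set_cpus() {
    state="$1"
    for n in 1 2 3; do
        echo "$state" > "$CPU_SYSFS/cpu$n/online" || return 1
    done
}

# Run the offline-then-online round the given number of times (default 3).
cycle_cpus() {
    rounds="${1:-3}"
    i=0
    while [ "$i" -lt "$rounds" ]; do
        set_cpus 0 || return 1
        set_cpus 1 || return 1
        i=$((i + 1))
    done
}
```

Run as root, calling "cycle_cpus 3" after sourcing the file would perform
three rounds. Checking the kernel log afterwards shows whether the
warning fired.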

Here's the splat:

Initializing CPU#1
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1609 at /home/rostedt/work/git/linux-trace.git/drivers/cpufreq/cpufreq.c:2350 cpufreq_update_policy+0xc8/0x139()
Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 ppdev parport_pc r8169 parport microcode
CPU: 0 PID: 1609 Comm: bash Tainted: G        W       4.2.0-rc1-test #26
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
 00000000 00000000 ee47db9c c0cd04e6 c10d4463 ee47dbcc c0440fbe c1010460
 00000000 00000649 c10d4463 0000092e c0a6dd28 c0a6dd28 f13fd600 00000000
 ee47dda8 ee47dbdc c0440ff7 00000009 00000000 ee47ddb8 c0a6dd28 efb01bc0
Call Trace:
 [<c0cd04e6>] dump_stack+0x41/0x52
 [<c0440fbe>] warn_slowpath_common+0x9d/0xb4
 [<c0a6dd28>] ? cpufreq_update_policy+0xc8/0x139
 [<c0a6dd28>] ? cpufreq_update_policy+0xc8/0x139
 [<c0440ff7>] warn_slowpath_null+0x22/0x24
 [<c0a6dd28>] cpufreq_update_policy+0xc8/0x139
 [<c0a6dd99>] ? cpufreq_update_policy+0x139/0x139
 [<c0a6dc9b>] ? cpufreq_update_policy+0x3b/0x139
 [<c0a6bef7>] ? cpufreq_freq_transition_begin+0x97/0xd9
 [<c046ea90>] ? __wake_up+0x1a/0x47
 [<c0772682>] acpi_processor_ppc_has_changed+0x54/0x5d
 [<c076f6b9>] acpi_cpu_soft_notify+0xb0/0xf1
 [<c06d2859>] ? compute_batch_value+0xd/0x22
 [<c06d2a38>] ? percpu_counter_hotcpu_callback+0x11/0x80
 [<c0458c35>] notifier_call_chain+0x68/0x91
 [<c047007b>] ? sched_debug_header+0x15c/0x58e
 [<c0458c7c>] __raw_notifier_call_chain+0x1e/0x23
 [<c04410c2>] __cpu_notify+0x24/0x39
 [<c04414d9>] _cpu_up+0xef/0x105
 [<c044153d>] cpu_up+0x4e/0x5f
 [<c0ccb642>] cpu_subsys_online+0x13/0x15
 [<c09134b4>] device_online+0x45/0x6e
 [<c091350f>] online_store+0x32/0x4f
 [<c09134dd>] ? device_online+0x6e/0x6e
 [<c0911570>] dev_attr_store+0x24/0x29
 [<c0587f31>] sysfs_kf_write+0x3a/0x41
 [<c0587ef7>] ? sysfs_file_ops+0x48/0x48
 [<c0587244>] kernfs_fop_write+0xe2/0x11f
 [<c0587162>] ? kernfs_vma_page_mkwrite+0x6c/0x6c
 [<c0532e3a>] __vfs_write+0x24/0x9b
 [<c0532d25>] ? file_start_write+0x27/0x29
 [<c0533355>] ? rw_verify_area+0xce/0xef
 [<c0533843>] vfs_write+0x7a/0xc4
 [<c0533a09>] SyS_write+0x54/0x7f
 [<c0cdae58>] sysenter_do_call+0x12/0x12
---[ end trace e2c32eead4f4e541 ]---

I'll dig into it more, but I wanted to give people a heads-up.

-- Steve