Date: Wed, 8 Jul 2015 15:24:56 -0400
From: Steven Rostedt
To: LKML
Cc: Linus Torvalds, Andrew Morton, Viresh Kumar, "Rafael J. Wysocki", Saravana Kannan
Subject: [BUG] Kernel splat when taking CPUs offline
Message-ID: <20150708152456.4438d60f@gandalf.local.home>

My tests for ftrace include testing the mmiotracer, which requires taking all CPUs but one offline in order to run.
This test crashed every so often, and I was able to bisect down to this commit:

  commit 87549141d516 ("cpufreq: Stop migrating sysfs files on hotplug")

Just to make sure this wasn't just the mmiotracer causing the issue, I was able to trigger this same bug by simply doing the following:

 (on a 4 cpu machine)

 # echo 0 > /sys/devices/system/cpu/cpu1/online
 # echo 0 > /sys/devices/system/cpu/cpu2/online
 # echo 0 > /sys/devices/system/cpu/cpu3/online
 # echo 1 > /sys/devices/system/cpu/cpu1/online
 # echo 1 > /sys/devices/system/cpu/cpu2/online
 # echo 1 > /sys/devices/system/cpu/cpu3/online
 # echo 0 > /sys/devices/system/cpu/cpu1/online
 # echo 0 > /sys/devices/system/cpu/cpu2/online
 # echo 0 > /sys/devices/system/cpu/cpu2/online
 # echo 0 > /sys/devices/system/cpu/cpu3/online
 # echo 1 > /sys/devices/system/cpu/cpu1/online
 # echo 1 > /sys/devices/system/cpu/cpu2/online
 # echo 1 > /sys/devices/system/cpu/cpu3/online

It usually takes two or three tries (shutting down all but one CPU, and starting them again) before it triggers. Here's the splat:

 Initializing CPU#1
 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 1609 at /home/rostedt/work/git/linux-trace.git/drivers/cpufreq/cpufreq.c:2350 cpufreq_update_policy+0xc8/0x139()
 Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 ppdev parport_pc r8169 parport microcode
 CPU: 0 PID: 1609 Comm: bash Tainted: G W 4.2.0-rc1-test #26
 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
  00000000 00000000 ee47db9c c0cd04e6 c10d4463 ee47dbcc c0440fbe c1010460
  00000000 00000649 c10d4463 0000092e c0a6dd28 c0a6dd28 f13fd600 00000000
  ee47dda8 ee47dbdc c0440ff7 00000009 00000000 ee47ddb8 c0a6dd28 efb01bc0
 Call Trace:
  [] dump_stack+0x41/0x52
  [] warn_slowpath_common+0x9d/0xb4
  [] ? cpufreq_update_policy+0xc8/0x139
  [] ? cpufreq_update_policy+0xc8/0x139
  [] warn_slowpath_null+0x22/0x24
  [] cpufreq_update_policy+0xc8/0x139
  [] ? cpufreq_update_policy+0x139/0x139
  [] ? cpufreq_update_policy+0x3b/0x139
  [] ? cpufreq_freq_transition_begin+0x97/0xd9
  [] ? __wake_up+0x1a/0x47
  [] acpi_processor_ppc_has_changed+0x54/0x5d
  [] acpi_cpu_soft_notify+0xb0/0xf1
  [] ? compute_batch_value+0xd/0x22
  [] ? percpu_counter_hotcpu_callback+0x11/0x80
  [] notifier_call_chain+0x68/0x91
  [] ? sched_debug_header+0x15c/0x58e
  [] __raw_notifier_call_chain+0x1e/0x23
  [] __cpu_notify+0x24/0x39
  [] _cpu_up+0xef/0x105
  [] cpu_up+0x4e/0x5f
  [] cpu_subsys_online+0x13/0x15
  [] device_online+0x45/0x6e
  [] online_store+0x32/0x4f
  [] ? device_online+0x6e/0x6e
  [] dev_attr_store+0x24/0x29
  [] sysfs_kf_write+0x3a/0x41
  [] ? sysfs_file_ops+0x48/0x48
  [] kernfs_fop_write+0xe2/0x11f
  [] ? kernfs_vma_page_mkwrite+0x6c/0x6c
  [] __vfs_write+0x24/0x9b
  [] ? file_start_write+0x27/0x29
  [] ? rw_verify_area+0xce/0xef
  [] vfs_write+0x7a/0xc4
  [] SyS_write+0x54/0x7f
  [] sysenter_do_call+0x12/0x12
 ---[ end trace e2c32eead4f4e541 ]---

I'll dig more into it, but wanted to give people a heads up.

-- Steve
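
For anyone who wants to retry the offline/online cycle from the report without typing each echo by hand, it could be scripted roughly as below. This is only a sketch and not part of the original report: the CPU_SYSFS and ROUNDS variables and the set_online and cycle_cpus helper names are made up for illustration, and it assumes every CPU except cpu0 exposes a writable "online" file under /sys/devices/system/cpu.

```shell
#!/bin/sh
# Sketch: take every CPU except cpu0 offline, bring them back, and repeat.
# CPU_SYSFS and ROUNDS are hypothetical knobs, not from the original report.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"
ROUNDS="${ROUNDS:-3}"

# set_online <0|1>: write the value to cpuN/online for every N >= 1.
set_online() {
    for f in "$CPU_SYSFS"/cpu[1-9]*/online; do
        [ -e "$f" ] || continue    # glob matched nothing, or no online file
        echo "$1" > "$f"
    done
}

# cycle_cpus: run ROUNDS full offline/online passes.
cycle_cpus() {
    i=0
    while [ "$i" -lt "$ROUNDS" ]; do
        set_online 0
        set_online 1
        i=$((i + 1))
    done
}
```

To use it, source the file as root, call cycle_cpus, and then look at dmesg for the WARNING in cpufreq_update_policy. The default of three rounds follows the "two or three tries" mentioned above; it may take more.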