Date: Thu, 9 Mar 2017 18:43:58 +0100 (CET)
From: Thomas Gleixner
To: Bart Van Assche
Cc: torvalds@linux-foundation.org, mingo@kernel.org, linux-kernel@vger.kernel.org, hpa@zytor.com, akpm@linux-foundation.org
Subject: Re: [GIT pull] CPU hotplug updates for 4.9

On Thu, 9 Mar 2017, Bart Van Assche wrote:
> On Thu, 2017-03-09 at 11:22 +0100, Thomas Gleixner wrote:
> > > In contrast with previous tests this morning I have been able to reproduce
> > > this hang with kernel v4.10. So it's not a kernel v4.10 regression. But the
> > > hang did not occur in a test with kernel v4.9.7. I assume this means that
> > > the regression got introduced between the v4.9 and v4.10 kernels.
> >
> > Is it always the x86_pkg_thermal init which locks up?
>
> Hello Thomas,
>
> Apparently not. Here are a few other call traces that appeared in the system
> log:
>
> INFO: task systemd-udevd:748 blocked for more than 480 seconds.
>       Tainted: G IO 4.11.0-rc1-dbg+ #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> systemd-udevd D 0 748 518 0x00000104
> Call Trace:
>  __schedule+0x302/0xc30
>  schedule+0x38/0x90
>  schedule_timeout+0x255/0x490
>  wait_for_completion+0x103/0x170
>  cpuhp_issue_call+0xb9/0xe0
>  __cpuhp_setup_state+0xf6/0x180
>  coretemp_init+0x8d/0x1000 [coretemp]
>  do_one_initcall+0x3e/0x170
>  do_init_module+0x5a/0x1ed
>  load_module+0x2339/0x2a40
>  SYSC_finit_module+0xbc/0xf0
>  SyS_finit_module+0x9/0x10
>  do_syscall_64+0x57/0x140
>  entry_SYSCALL64_slow_path+0x25/0x25
> Showing all locks held in the system:
> 2 locks held by khungtaskd/91:
>  #0: (rcu_read_lock){......}, at: [] watchdog+0xa0/0x5d0
>  #1: (tasklist_lock){.+.?..}, at: [] debug_show_all_locks+0x3d/0x1a0
> 1 lock held by systemd-udevd/748:
>  #0: (cpu_hotplug.dep_map){++++++}, at: [] get_online_cpus+0x2d/0x80

Ok, so it's random. Now it would be interesting to know what the rest of the
system is doing when this happens.

I still have no idea why that IOAT setting has any influence.

Thanks,

	tglx
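For readers following the trace: systemd-udevd is parked in wait_for_completion()
called from cpuhp_issue_call(), and lockdep shows it still holds the hotplug lock
taken in get_online_cpus(). The caller makes no further progress until some other
context signals that completion. The sketch below is a minimal user-space model of
that pattern, not kernel code. It assumes, as a simplification, that the completion
is signalled by a separate hotplug thread after it runs the state callback.
setup_state(), hotplug_lock, hotplug_thread_runs and the timeout parameter are
hypothetical stand-ins; the timeout only plays the role of hung_task_timeout_secs.

```python
import threading

# Hypothetical stand-in for the cpu_hotplug lock that get_online_cpus()
# takes in the trace; it stays held for the whole wait below.
hotplug_lock = threading.Lock()


class Completion:
    """Simplified model of a kernel struct completion."""

    def __init__(self):
        self._event = threading.Event()

    def complete(self):
        self._event.set()

    def wait(self, timeout):
        # The kernel's wait_for_completion() has no timeout; here the
        # timeout only models the point where khungtaskd would report
        # the task. Returns False if complete() was not called in time.
        return self._event.wait(timeout)


def setup_state(callback, hotplug_thread_runs, timeout=0.5):
    """Hypothetical model of __cpuhp_setup_state() -> cpuhp_issue_call().

    Returns (finished, results); finished is False when the caller
    would have been reported as a hung task.
    """
    done = Completion()
    results = []

    def hotplug_thread():
        # Run the state callback, then signal the waiting caller.
        results.append(callback())
        done.complete()

    with hotplug_lock:  # models get_online_cpus(), held across the wait
        if hotplug_thread_runs:
            threading.Thread(target=hotplug_thread).start()
        finished = done.wait(timeout)
    return finished, results


if __name__ == "__main__":
    print(setup_state(lambda: 0, hotplug_thread_runs=True))
    print(setup_state(lambda: 0, hotplug_thread_runs=False, timeout=0.1))
```

When the helper thread runs, the call returns (True, [0]). When it never runs, the
call returns (False, []) after the timeout with hotplug_lock held the whole time,
which matches the shape of the systemd-udevd report. The model says nothing about
why the hotplug thread would fail to run on the real system.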