From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755475AbdEKKB1 (ORCPT ); Thu, 11 May 2017 06:01:27 -0400
Received: from Galois.linutronix.de ([146.0.238.70]:48380 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754981AbdEKKB0 (ORCPT ); Thu, 11 May 2017 06:01:26 -0400
Date: Thu, 11 May 2017 12:01:21 +0200 (CEST)
From: Thomas Gleixner
To: Mark Rutland
cc: LAK , LKML , will.deacon@arm.com, catalin.marinas@arm.com, Sebastian Sewior , jbaron@akamai.com, Peter Zijlstra , Steven Rostedt , suzuki.poulose@arm.com
Subject: Re: [PATCHv3 0/2] arm64: fix hotplug rwsem boot fallout
In-Reply-To: <20170511093721.GB14766@leverpostej>
Message-ID:
References: <1493377266-2205-1-git-send-email-mark.rutland@arm.com> <20170510180928.GA7102@leverpostej> <20170511093721.GB14766@leverpostej>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 11 May 2017, Mark Rutland wrote:
> On Thu, May 11, 2017 at 10:30:39AM +0200, Thomas Gleixner wrote:
> > On Wed, 10 May 2017, Thomas Gleixner wrote:
> > > On Wed, 10 May 2017, Mark Rutland wrote:
> > > > [    0.182133] [] lockdep_assert_hotplug_held+0x78/0x98
> > > > [    0.182161] [] __static_key_slow_inc+0x174/0x2e0
> > > > [    0.182188] [] static_key_enable_cpuslocked+0x64/0xb0
> > > > [    0.182215] [] update_cpu_capabilities+0x178/0x2d8
> > > > [    0.182243] [] update_cpu_errata_workarounds_cpuslocked+0x1c/0x28
> > > > [    0.182270] [] check_local_cpu_capabilities+0x1a0/0x248
> > > > [    0.182295] [] secondary_start_kernel+0x1e8/0x478
> > > > [    0.182317] [<000000008219a1b4>] 0x8219a1b4
> > > > [    0.182337] CPU features: enabling workaround for ARM erratum 834220
> > > > [    0.182362] ------------[ cut here ]------------
> > > >
> > > > The problem is that the secondary CPU doesn't hold the rwsem when it
> > > > calls __static_key_slow_inc() in its boot path. It cannot take the
> > > > rwsem, since the primary CPU holds this for the duration of onlining
> > > > the secondary CPU.
> >
> > Looking deeper into that:
> >
> > secondary_start_kernel()
> >   check_local_cpu_capabilities()
> >     update_cpu_errata_workarounds()
> >       update_cpu_capabilities()
> >         static_key_enable()
> >           __static_key_slow_inc()
> >             jump_label_lock()
> >               mutex_lock(&jump_label_mutex);
> >
> > How is that supposed to work?
> >
> > That call path is the low level CPU bringup, running in the context of the
> > idle task of that CPU with interrupts and preemption disabled. Taking a
> > mutex in that context, even if in that case the mutex is uncontended, is a
> > NONO.
>
> Urgh; good point. Thanks for taking a look.
>
> I think I can solve both issues by deferring poking the keys, so I'll
> give that a go.
>
> As an aside, do we have anything that should detect the broken mutex
> usage? I've been testing kernels with LOCKDEP, PROVE_LOCKING,
> DEBUG_ATOMIC_SLEEP, and friends, and nothing has complained so far.

Peter and I were wondering about that already. No idea why that doesn't
yell at you.
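A minimal userspace model may make the two patterns concrete. It is a sketch under stated assumptions, not kernel code and not the eventual arm64 fix: the thread-local flag `in_atomic_ctx` stands in for "interrupts and preemption disabled", a pthread mutex stands in for `jump_label_mutex`, the counter bumped in `sleeping_lock()` plays the role of a DEBUG_ATOMIC_SLEEP-style might_sleep() check, and every function name here is hypothetical. `enable_key_now()` mirrors the problematic call chain above, while `record_cap()` plus `apply_pending_caps()` mirror deferring the key poking to a context that is allowed to sleep.

```c
/*
 * Hypothetical userspace model (NOT kernel code) of deferring static key
 * updates out of an atomic CPU bringup path. All names are illustrative.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CAPS 8

/* Models "interrupts and preemption disabled" on the current thread. */
static _Thread_local bool in_atomic_ctx;
/* Counts sleeping-lock acquisitions attempted from atomic context. */
static atomic_int atomic_sleep_bugs;

/* Stand-in for jump_label_mutex; pthread_mutex_lock() may block. */
static pthread_mutex_t key_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool key_enabled[NR_CAPS];

/* Capabilities detected during bringup but not yet applied. */
static atomic_uint pending_caps;

/* Rough analogue of the might_sleep() check done by a sleeping lock. */
static void sleeping_lock(pthread_mutex_t *m)
{
	if (in_atomic_ctx)
		atomic_fetch_add(&atomic_sleep_bugs, 1);
	pthread_mutex_lock(m);
}

/* Broken pattern: enable the key directly from the bringup path. */
static void enable_key_now(unsigned int cap)
{
	sleeping_lock(&key_mutex);
	key_enabled[cap] = true;
	pthread_mutex_unlock(&key_mutex);
}

/* Deferred pattern, bringup side: a single atomic OR, never sleeps. */
static void record_cap(unsigned int cap)
{
	atomic_fetch_or(&pending_caps, 1u << cap);
}

/* Deferred pattern, process-context side: apply what was recorded. */
static void apply_pending_caps(void)
{
	unsigned int caps = atomic_exchange(&pending_caps, 0);

	sleeping_lock(&key_mutex);
	for (unsigned int cap = 0; cap < NR_CAPS; cap++)
		if (caps & (1u << cap))
			key_enabled[cap] = true;
	pthread_mutex_unlock(&key_mutex);
}

/* Simulated secondary CPU bringup with "IRQs and preemption off". */
static void secondary_bringup(unsigned int cap, bool deferred)
{
	in_atomic_ctx = true;
	if (deferred)
		record_cap(cap);
	else
		enable_key_now(cap);
	in_atomic_ctx = false;
}
```

In this model, `secondary_bringup(cap, false)` bumps `atomic_sleep_bugs` because it takes the sleeping lock with the atomic flag set, whereas `secondary_bringup(cap, true)` leaves the counter untouched and the key only turns on once `apply_pending_caps()` runs outside the atomic section.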

Thanks,

tglx