LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Mark Rutland <mark.rutland@arm.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Sebastian Siewior <bigeasy@linutronix.de>,
	catalin.marinas@arm.com, will.deacon@arm.com,
	suzuki.poulose@arm.com, linux-arm-kernel@lists.infradead.org
Subject: Re: [patch V2 00/24] cpu/hotplug: Convert get_online_cpus() to a percpu_rwsem
Date: Tue, 25 Apr 2017 17:10:37 +0100
Message-ID: <20170425161037.GA27156@leverpostej> (raw)
In-Reply-To: <20170418170442.665445272@linutronix.de>

Hi,

This series appears to break boot on some arm64 platforms, seen with
next-20170424. More info below.

On Tue, Apr 18, 2017 at 07:04:42PM +0200, Thomas Gleixner wrote:
> get_online_cpus() is used in hot pathes in mainline and even more so in
> RT. That can show up badly under certain conditions because every locker
> contends on a global mutex. RT has it's own homebrewn mitigation which is
> an (badly done) open coded implementation of percpu_rwsems with recursion
> support.
> 
> The proper replacement for that are percpu_rwsems, but that requires to
> remove recursion support.
> 
> The conversion unearthed real locking issues which were previously not
> visible because the get_online_cpus() lockdep annotation was implemented
> with recursion support which prevents lockdep from tracking full dependency
> chains. These potential deadlocks are not related to recursive calls, they
> trigger on the first invocation because lockdep now has the full dependency
> chains available.

Catalin spotted next-20170424 wouldn't boot on a Juno system, where we see the
following splat (repeated forever) when we try to bring up the first secondary
CPU:

[    0.213406] smp: Bringing up secondary CPUs ...
[    0.250326] CPU features: enabling workaround for ARM erratum 832075
[    0.250334] BUG: scheduling while atomic: swapper/1/0/0x00000002
[    0.250337] Modules linked in:
[    0.250346] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc7-next-20170424 #2
[    0.250349] Hardware name: ARM Juno development board (r1) (DT)
[    0.250353] Call trace:
[    0.250365] [<ffff000008088510>] dump_backtrace+0x0/0x238
[    0.250371] [<ffff00000808880c>] show_stack+0x14/0x20
[    0.250377] [<ffff00000839d854>] dump_stack+0x9c/0xc0
[    0.250384] [<ffff0000080e3540>] __schedule_bug+0x50/0x70
[    0.250391] [<ffff000008932ecc>] __schedule+0x52c/0x5a8
[    0.250395] [<ffff000008932f80>] schedule+0x38/0xa0
[    0.250400] [<ffff000008935e8c>] rwsem_down_read_failed+0xc4/0x108
[    0.250407] [<ffff0000080fe8e0>] __percpu_down_read+0x100/0x118
[    0.250414] [<ffff0000080c0b60>] get_online_cpus+0x70/0x78
[    0.250420] [<ffff0000081749e8>] static_key_enable+0x28/0x48
[    0.250425] [<ffff00000808de90>] update_cpu_capabilities+0x78/0xf8
[    0.250430] [<ffff00000808d14c>] update_cpu_errata_workarounds+0x1c/0x28
[    0.250435] [<ffff00000808e004>] check_local_cpu_capabilities+0xf4/0x128
[    0.250440] [<ffff00000808e894>] secondary_start_kernel+0x8c/0x118
[    0.250444] [<000000008093d1b4>] 0x8093d1b4

I can reproduce this with the current head of the linux-tip smp/hotplug
branch (commit 77c60400c82bd993), with arm64 defconfig on a Juno R1
system.

When we bring the secondary CPU online, we detect an erratum that wasn't
present on the boot CPU, and try to enable a static branch we use to
track the erratum. The call to static_branch_enable() blows up as above.

I see that we now have static_branch_disable_cpuslocked(), but we don't
have an equivalent for enable. I'm not sure what we should be doing
here.

Thanks,
Mark.

> The following patch series addresses this by
> 
>  - Cleaning up places which call get_online_cpus() nested
> 
>  - Replacing a few instances with cpu_hotplug_disable() to prevent circular
>    locking dependencies.
> 
> The series depends on
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core
>   plus
>     Linus tree merged in to avoid conflicts
> 
> It's available in git from
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.hotplug
> 
> Changes since V1:
> 
>   - Fixed fallout reported by kbuild bot
>   - Repaired the recursive call in perf
>   - Repaired the interaction with jumplabels (Peter Zijlstra)
>   - Renamed _locked to _cpuslocked
>   - Picked up Acked-bys
> 
> Thanks,
> 
> 	tglx
> 
> -------
>  arch/arm/kernel/hw_breakpoint.c               |    5 
>  arch/mips/kernel/jump_label.c                 |    2 
>  arch/powerpc/kvm/book3s_hv.c                  |    8 -
>  arch/powerpc/platforms/powernv/subcore.c      |    3 
>  arch/s390/kernel/time.c                       |    2 
>  arch/x86/events/core.c                        |    1 
>  arch/x86/events/intel/cqm.c                   |   12 -
>  arch/x86/kernel/cpu/mtrr/main.c               |    2 
>  b/arch/sparc/kernel/jump_label.c              |    2 
>  b/arch/tile/kernel/jump_label.c               |    2 
>  b/arch/x86/events/intel/core.c                |    4 
>  b/arch/x86/kernel/jump_label.c                |    2 
>  b/kernel/jump_label.c                         |   31 ++++-
>  drivers/acpi/processor_driver.c               |    4 
>  drivers/cpufreq/cpufreq.c                     |    9 -
>  drivers/hwtracing/coresight/coresight-etm3x.c |   12 -
>  drivers/hwtracing/coresight/coresight-etm4x.c |   12 -
>  drivers/pci/pci-driver.c                      |   47 ++++---
>  include/linux/cpu.h                           |    2 
>  include/linux/cpuhotplug.h                    |   29 ++++
>  include/linux/jump_label.h                    |    3 
>  include/linux/padata.h                        |    3 
>  include/linux/pci.h                           |    1 
>  include/linux/stop_machine.h                  |   26 +++-
>  kernel/cpu.c                                  |  157 ++++++++------------------
>  kernel/events/core.c                          |    9 -
>  kernel/padata.c                               |   39 +++---
>  kernel/stop_machine.c                         |    7 -
>  28 files changed, 228 insertions(+), 208 deletions(-)
> 
> 
> 

  parent reply index

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-18 17:04 Thomas Gleixner
2017-04-18 17:04 ` [patch V2 01/24] cpu/hotplug: Provide cpuhp_setup/remove_state[_nocalls]_cpuslocked() Thomas Gleixner
2017-04-20 11:18   ` [tip:smp/hotplug] " tip-bot for Sebastian Andrzej Siewior
2017-04-18 17:04 ` [patch V2 02/24] stop_machine: Provide stop_machine_cpuslocked() Thomas Gleixner
2017-04-20 11:19   ` [tip:smp/hotplug] " tip-bot for Sebastian Andrzej Siewior
2017-04-18 17:04 ` [patch V2 03/24] padata: Make padata_alloc() static Thomas Gleixner
2017-04-20 11:19   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
2017-04-18 17:04 ` [patch V2 04/24] padata: Avoid nested calls to get_online_cpus() in pcrypt_init_padata() Thomas Gleixner
2017-04-20 11:20   ` [tip:smp/hotplug] " tip-bot for Sebastian Andrzej Siewior
2017-04-18 17:04 ` [patch V2 05/24] x86/mtrr: Remove get_online_cpus() from mtrr_save_state() Thomas Gleixner
2017-04-20 11:20   ` [tip:smp/hotplug] " tip-bot for Sebastian Andrzej Siewior
2017-04-18 17:04 ` [patch V2 06/24] cpufreq: Use cpuhp_setup_state_nocalls_cpuslocked() Thomas Gleixner
2017-04-20 11:21   ` [tip:smp/hotplug] " tip-bot for Sebastian Andrzej Siewior
2017-04-18 17:04 ` [patch V2 07/24] KVM/PPC/Book3S HV: " Thomas Gleixner
2017-04-20 11:21   ` [tip:smp/hotplug] " tip-bot for Sebastian Andrzej Siewior
2017-04-18 17:04 ` [patch V2 08/24] hwtracing/coresight-etm3x: " Thomas Gleixner
2017-04-20 11:22   ` [tip:smp/hotplug] " tip-bot for Sebastian Andrzej Siewior
2017-04-20 15:14   ` [patch V2 08/24] " Mathieu Poirier
2017-04-20 15:32   ` Mathieu Poirier
2017-04-18 17:04 ` [patch V2 09/24] hwtracing/coresight-etm4x: " Thomas Gleixner
2017-04-20 11:22   ` [tip:smp/hotplug] " tip-bot for Sebastian Andrzej Siewior
2017-04-18 17:04 ` [patch V2 10/24] perf/x86/intel/cqm: Use cpuhp_setup_state_cpuslocked() Thomas Gleixner
2017-04-20 11:23   ` [tip:smp/hotplug] " tip-bot for Sebastian Andrzej Siewior
2017-04-18 17:04 ` [patch V2 11/24] ARM/hw_breakpoint: " Thomas Gleixner
2017-04-19 17:54   ` Mark Rutland
2017-04-19 18:20     ` Thomas Gleixner
2017-04-20 11:23   ` [tip:smp/hotplug] " tip-bot for Sebastian Andrzej Siewior
2017-04-18 17:04 ` [patch V2 12/24] s390/kernel: Use stop_machine_cpuslocked() Thomas Gleixner
2017-04-20 11:24   ` [tip:smp/hotplug] " tip-bot for Sebastian Andrzej Siewior
2017-04-18 17:04 ` [patch V2 13/24] powerpc/powernv: " Thomas Gleixner
2017-04-20 11:24   ` [tip:smp/hotplug] " tip-bot for Sebastian Andrzej Siewior
2017-04-18 17:04 ` [patch V2 14/24] cpu/hotplug: Use stop_machine_cpuslocked() in takedown_cpu() Thomas Gleixner
2017-04-20 11:25   ` [tip:smp/hotplug] " tip-bot for Sebastian Andrzej Siewior
2017-04-18 17:04 ` [patch V2 15/24] x86/perf: Drop EXPORT of perf_check_microcode Thomas Gleixner
2017-04-20 11:25   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
2017-04-18 17:04 ` [patch V2 16/24] perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode() Thomas Gleixner
2017-04-20 11:26   ` [tip:smp/hotplug] " tip-bot for Sebastian Andrzej Siewior
2017-04-18 17:04 ` [patch V2 17/24] PCI: Use cpu_hotplug_disable() instead of get_online_cpus() Thomas Gleixner
2017-04-20 11:27   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
2017-04-18 17:05 ` [patch V2 18/24] PCI: Replace the racy recursion prevention Thomas Gleixner
2017-04-20 11:27   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
2017-04-18 17:05 ` [patch V2 19/24] ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus() Thomas Gleixner
2017-04-20 11:28   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
2017-04-18 17:05 ` [patch V2 20/24] perf/core: Remove redundant get_online_cpus() Thomas Gleixner
2017-04-20 11:28   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
2017-04-18 17:05 ` [patch V2 21/24] jump_label: Pull get_online_cpus() into generic code Thomas Gleixner
2017-04-18 17:05 ` [patch V2 22/24] jump_label: Provide static_key_slow_inc_cpuslocked() Thomas Gleixner
2017-04-18 17:05 ` [patch V2 23/24] perf: Avoid cpu_hotplug_lock r-r recursion Thomas Gleixner
2017-04-18 17:05 ` [patch V2 24/24] cpu/hotplug: Convert hotplug locking to percpu rwsem Thomas Gleixner
2017-04-20 11:30   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
2017-05-10  4:59   ` [patch V2 24/24] " Michael Ellerman
2017-05-10  8:49     ` Thomas Gleixner
2017-05-10 16:30       ` Steven Rostedt
2017-05-10 17:15         ` Steven Rostedt
2017-05-11  5:49       ` Michael Ellerman
2017-04-25 16:10 ` Mark Rutland [this message]
2017-04-25 17:28   ` [patch V2 00/24] cpu/hotplug: Convert get_online_cpus() to a percpu_rwsem Sebastian Siewior
2017-04-26  8:59     ` Mark Rutland
2017-04-26  9:40       ` Suzuki K Poulose
2017-04-26 10:32         ` Mark Rutland
2017-04-27  8:27           ` Sebastian Siewior
2017-04-27  9:57             ` Mark Rutland
2017-04-27 10:01               ` Thomas Gleixner
2017-04-27 12:30                 ` Mark Rutland
2017-04-27 15:48                   ` [PATCH] arm64: cpufeature: use static_branch_enable_cpuslocked() (was: Re: [patch V2 00/24] cpu/hotplug: Convert get_online_cpus() to a percpu_rwsem) Mark Rutland
2017-04-27 16:35                     ` Suzuki K Poulose
2017-04-27 17:03                       ` [PATCH] arm64: cpufeature: use static_branch_enable_cpuslocked() Suzuki K Poulose
2017-04-27 17:17                         ` Mark Rutland
2017-04-28 14:24 ` [RFC PATCH] trace/perf: cure locking issue in perf_event_open() error path Sebastian Siewior
2017-04-28 14:27   ` Sebastian Siewior
2017-05-01 12:57   ` [tip:smp/hotplug] perf: Reorder cpu hotplug rwsem against cred_guard_mutex tip-bot for Thomas Gleixner
2017-05-01 12:58   ` [tip:smp/hotplug] perf: Push hotplug protection down to callers tip-bot for Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170425161037.GA27156@leverpostej \
    --to=mark.rutland@arm.com \
    --cc=bigeasy@linutronix.de \
    --cc=catalin.marinas@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=suzuki.poulose@arm.com \
    --cc=tglx@linutronix.de \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git