get_online_cpus() is used in hot paths in mainline and even more so in
RT. That can show up badly under certain conditions because every locker
contends on a global mutex. RT has its own homebrew mitigation, which is
a (badly done) open-coded implementation of percpu_rwsems with recursion
support.

The proper replacement for that is percpu_rwsems, but that requires
removing recursion support.

The conversion unearthed real locking issues which were previously not
visible because the get_online_cpus() lockdep annotation was implemented
with recursion support, which prevents lockdep from tracking full
dependency chains. These potential deadlocks are not related to
recursive calls; they trigger on the first invocation because lockdep
now has the full dependency chains available.

The following patch series addresses this by:

 - Cleaning up places which call get_online_cpus() nested

 - Replacing a few instances with cpu_hotplug_disable() to prevent
   circular locking dependencies

The series depends on

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core

plus Linus' tree merged in to avoid conflicts.

It's available in git from

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.hotplug

Changes since V1:

 - Fixed fallout reported by the kbuild bot
 - Repaired the recursive call in perf
 - Repaired the interaction with jump labels (Peter Zijlstra)
 - Renamed _locked to _cpuslocked
 - Picked up Acked-bys

Thanks,

	tglx

-------
 arch/arm/kernel/hw_breakpoint.c               |    5 
 arch/mips/kernel/jump_label.c                 |    2 
 arch/powerpc/kvm/book3s_hv.c                  |    8 -
 arch/powerpc/platforms/powernv/subcore.c      |    3 
 arch/s390/kernel/time.c                       |    2 
 arch/x86/events/core.c                        |    1 
 arch/x86/events/intel/cqm.c                   |   12 -
 arch/x86/kernel/cpu/mtrr/main.c               |    2 
 b/arch/sparc/kernel/jump_label.c              |    2 
 b/arch/tile/kernel/jump_label.c               |    2 
 b/arch/x86/events/intel/core.c                |    4 
 b/arch/x86/kernel/jump_label.c                |    2 
 b/kernel/jump_label.c                         |   31 ++++-
 drivers/acpi/processor_driver.c               |    4 
 drivers/cpufreq/cpufreq.c                     |    9 -
 drivers/hwtracing/coresight/coresight-etm3x.c |   12 -
 drivers/hwtracing/coresight/coresight-etm4x.c |   12 -
 drivers/pci/pci-driver.c                      |   47 ++++--
 include/linux/cpu.h                           |    2 
 include/linux/cpuhotplug.h                    |   29 ++++
 include/linux/jump_label.h                    |    3 
 include/linux/padata.h                        |    3 
 include/linux/pci.h                           |    1 
 include/linux/stop_machine.h                  |   26 +++-
 kernel/cpu.c                                  |  157 ++++++++------------------
 kernel/events/core.c                          |    9 -
 kernel/padata.c                               |   39 +++---
 kernel/stop_machine.c                         |    7 -
 28 files changed, 228 insertions(+), 208 deletions(-)
[-- Attachment #1: cpuhotplug_Provide_cpuhp_setup_state_locked.patch --] [-- Type: text/plain, Size: 6846 bytes --] From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Some call sites of cpuhp_setup/remove_state[_nocalls]() are within a get_online_cpus() protected region. cpuhp_setup/remove_state[_nocalls]() call get_online_cpus() as well, which is possible in the current implementation but prevents converting the hotplug locking to a percpu rwsem. Provide locked versions of the interfaces to avoid nested calls to get_online_cpus(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- include/linux/cpuhotplug.h | 29 +++++++++++++++++++++++++++++ kernel/cpu.c | 45 +++++++++++++++++++++++++++++++++------------ 2 files changed, 62 insertions(+), 12 deletions(-) --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -151,6 +151,11 @@ int __cpuhp_setup_state(enum cpuhp_state int (*startup)(unsigned int cpu), int (*teardown)(unsigned int cpu), bool multi_instance); +int __cpuhp_setup_state_cpuslocked(enum cpuhp_state state, const char *name, + bool invoke, + int (*startup)(unsigned int cpu), + int (*teardown)(unsigned int cpu), + bool multi_instance); /** * cpuhp_setup_state - Setup hotplug state callbacks with calling the callbacks * @state: The state for which the calls are installed @@ -169,6 +174,15 @@ static inline int cpuhp_setup_state(enum return __cpuhp_setup_state(state, name, true, startup, teardown, false); } +static inline int cpuhp_setup_state_cpuslocked(enum cpuhp_state state, + const char *name, + int (*startup)(unsigned int cpu), + int (*teardown)(unsigned int cpu)) +{ + return __cpuhp_setup_state_cpuslocked(state, name, true, startup, + teardown, false); +} + /** * cpuhp_setup_state_nocalls - Setup hotplug state callbacks without calling the * callbacks @@ -189,6 +203,15 @@ static inline int cpuhp_setup_state_noca false); } +static inline int 
cpuhp_setup_state_nocalls_cpuslocked(enum cpuhp_state state, + const char *name, + int (*startup)(unsigned int cpu), + int (*teardown)(unsigned int cpu)) +{ + return __cpuhp_setup_state_cpuslocked(state, name, false, startup, + teardown, false); +} + /** * cpuhp_setup_state_multi - Add callbacks for multi state * @state: The state for which the calls are installed @@ -248,6 +271,7 @@ static inline int cpuhp_state_add_instan } void __cpuhp_remove_state(enum cpuhp_state state, bool invoke); +void __cpuhp_remove_state_cpuslocked(enum cpuhp_state state, bool invoke); /** * cpuhp_remove_state - Remove hotplug state callbacks and invoke the teardown @@ -271,6 +295,11 @@ static inline void cpuhp_remove_state_no __cpuhp_remove_state(state, false); } +static inline void cpuhp_remove_state_nocalls_cpuslocked(enum cpuhp_state state) +{ + __cpuhp_remove_state_cpuslocked(state, false); +} + /** * cpuhp_remove_multi_state - Remove hotplug multi state callback * @state: The state for which the calls are removed --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1457,7 +1457,7 @@ int __cpuhp_state_add_instance(enum cpuh EXPORT_SYMBOL_GPL(__cpuhp_state_add_instance); /** - * __cpuhp_setup_state - Setup the callbacks for an hotplug machine state + * __cpuhp_setup_state_cpuslocked - Setup the callbacks for an hotplug machine state * @state: The state to setup * @invoke: If true, the startup function is invoked for cpus where * cpu state >= @state @@ -1466,17 +1466,18 @@ EXPORT_SYMBOL_GPL(__cpuhp_state_add_inst * @multi_instance: State is set up for multiple instances which get * added afterwards. * + * The caller needs to hold get_online_cpus() while calling this function. 
* Returns: * On success: * Positive state number if @state is CPUHP_AP_ONLINE_DYN * 0 for all other states * On failure: proper (negative) error code */ -int __cpuhp_setup_state(enum cpuhp_state state, - const char *name, bool invoke, - int (*startup)(unsigned int cpu), - int (*teardown)(unsigned int cpu), - bool multi_instance) +int __cpuhp_setup_state_cpuslocked(enum cpuhp_state state, + const char *name, bool invoke, + int (*startup)(unsigned int cpu), + int (*teardown)(unsigned int cpu), + bool multi_instance) { int cpu, ret = 0; bool dynstate; @@ -1484,7 +1485,6 @@ int __cpuhp_setup_state(enum cpuhp_state if (cpuhp_cb_check(state) || !name) return -EINVAL; - get_online_cpus(); mutex_lock(&cpuhp_state_mutex); ret = cpuhp_store_callbacks(state, name, startup, teardown, @@ -1520,7 +1520,6 @@ int __cpuhp_setup_state(enum cpuhp_state } out: mutex_unlock(&cpuhp_state_mutex); - put_online_cpus(); /* * If the requested state is CPUHP_AP_ONLINE_DYN, return the * dynamically allocated state in case of success. 
@@ -1529,6 +1528,22 @@ int __cpuhp_setup_state(enum cpuhp_state return state; return ret; } +EXPORT_SYMBOL(__cpuhp_setup_state_cpuslocked); + +int __cpuhp_setup_state(enum cpuhp_state state, + const char *name, bool invoke, + int (*startup)(unsigned int cpu), + int (*teardown)(unsigned int cpu), + bool multi_instance) +{ + int ret; + + get_online_cpus(); + ret = __cpuhp_setup_state_cpuslocked(state, name, invoke, startup, + teardown, multi_instance); + put_online_cpus(); + return ret; +} EXPORT_SYMBOL(__cpuhp_setup_state); int __cpuhp_state_remove_instance(enum cpuhp_state state, @@ -1570,23 +1585,22 @@ int __cpuhp_state_remove_instance(enum c EXPORT_SYMBOL_GPL(__cpuhp_state_remove_instance); /** - * __cpuhp_remove_state - Remove the callbacks for an hotplug machine state + * __cpuhp_remove_state_cpuslocked - Remove the callbacks for an hotplug machine state * @state: The state to remove * @invoke: If true, the teardown function is invoked for cpus where * cpu state >= @state * + * The caller needs to hold get_online_cpus() while calling this function. * The teardown callback is currently not allowed to fail. Think * about module removal! */ -void __cpuhp_remove_state(enum cpuhp_state state, bool invoke) +void __cpuhp_remove_state_cpuslocked(enum cpuhp_state state, bool invoke) { struct cpuhp_step *sp = cpuhp_get_step(state); int cpu; BUG_ON(cpuhp_cb_check(state)); - get_online_cpus(); - mutex_lock(&cpuhp_state_mutex); if (sp->multi_instance) { WARN(!hlist_empty(&sp->list), @@ -1613,6 +1627,13 @@ void __cpuhp_remove_state(enum cpuhp_sta remove: cpuhp_store_callbacks(state, NULL, NULL, NULL, false); mutex_unlock(&cpuhp_state_mutex); +} +EXPORT_SYMBOL(__cpuhp_remove_state_cpuslocked); + +void __cpuhp_remove_state(enum cpuhp_state state, bool invoke) +{ + get_online_cpus(); + __cpuhp_remove_state_cpuslocked(state, invoke); put_online_cpus(); } EXPORT_SYMBOL(__cpuhp_remove_state);
[-- Attachment #1: kernel-stopmachine-provide-stop-machine-locked.patch --]
[-- Type: text/plain, Size: 2936 bytes --]

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Some call sites of stop_machine() are within a get_online_cpus()
protected region.

stop_machine() calls get_online_cpus() as well, which is possible in the
current implementation but prevents converting the hotplug locking to a
percpu rwsem.

Provide stop_machine_cpuslocked() to avoid nested calls to
get_online_cpus().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/stop_machine.h | 26 +++++++++++++++++++++++---
 kernel/stop_machine.c        |  5 +++--
 2 files changed, 26 insertions(+), 5 deletions(-)

--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -116,15 +116,29 @@ static inline int try_stop_cpus(const st
  * @fn() runs.
  *
  * This can be thought of as a very heavy write lock, equivalent to
- * grabbing every spinlock in the kernel. */
+ * grabbing every spinlock in the kernel.
+ *
+ * Protects against CPU hotplug.
+ */
 int stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus);

+/**
+ * stop_machine_cpuslocked: freeze the machine on all CPUs and run this function
+ * @fn: the function to run
+ * @data: the data ptr for the @fn()
+ * @cpus: the cpus to run the @fn() on (NULL = any online cpu)
+ *
+ * Same as above. Must be called from within a get_online_cpus() protected
+ * region. Avoids nested calls to get_online_cpus().
+ */
+int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus);
+
 int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data,
 				   const struct cpumask *cpus);
 #else	/* CONFIG_SMP || CONFIG_HOTPLUG_CPU */

-static inline int stop_machine(cpu_stop_fn_t fn, void *data,
-			       const struct cpumask *cpus)
+static inline int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data,
+					  const struct cpumask *cpus)
 {
 	unsigned long flags;
 	int ret;
@@ -134,6 +148,12 @@ static inline int stop_machine(cpu_stop_
 	return ret;
 }

+static inline int stop_machine(cpu_stop_fn_t fn, void *data,
+			       const struct cpumask *cpus)
+{
+	return stop_machine_cpuslocked(fn, data, cpus);
+}
+
 static inline int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data,
 						 const struct cpumask *cpus)
 {
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -552,7 +552,8 @@ static int __init cpu_stop_init(void)
 }
 early_initcall(cpu_stop_init);

-static int __stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus)
+int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data,
+			    const struct cpumask *cpus)
 {
 	struct multi_stop_data msdata = {
 		.fn = fn,
@@ -591,7 +592,7 @@ int stop_machine(cpu_stop_fn_t fn, void

 	/* No CPUs can come up or down during this. */
 	get_online_cpus();
-	ret = __stop_machine(fn, data, cpus);
+	ret = stop_machine_cpuslocked(fn, data, cpus);
 	put_online_cpus();
 	return ret;
 }
[-- Attachment #1: padata--Make-padata_alloc---static.patch --] [-- Type: text/plain, Size: 2737 bytes --] No users outside of padata.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-crypto@vger.kernel.org --- include/linux/padata.h | 3 --- kernel/padata.c | 32 ++++++++++++++++---------------- 2 files changed, 16 insertions(+), 19 deletions(-) --- a/include/linux/padata.h +++ b/include/linux/padata.h @@ -166,9 +166,6 @@ struct padata_instance { extern struct padata_instance *padata_alloc_possible( struct workqueue_struct *wq); -extern struct padata_instance *padata_alloc(struct workqueue_struct *wq, - const struct cpumask *pcpumask, - const struct cpumask *cbcpumask); extern void padata_free(struct padata_instance *pinst); extern int padata_do_parallel(struct padata_instance *pinst, struct padata_priv *padata, int cb_cpu); --- a/kernel/padata.c +++ b/kernel/padata.c @@ -939,19 +939,6 @@ static struct kobj_type padata_attr_type }; /** - * padata_alloc_possible - Allocate and initialize padata instance. - * Use the cpu_possible_mask for serial and - * parallel workers. - * - * @wq: workqueue to use for the allocated padata instance - */ -struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq) -{ - return padata_alloc(wq, cpu_possible_mask, cpu_possible_mask); -} -EXPORT_SYMBOL(padata_alloc_possible); - -/** * padata_alloc - allocate and initialize a padata instance and specify * cpumasks for serial and parallel workers. 
* @@ -959,9 +946,9 @@ EXPORT_SYMBOL(padata_alloc_possible); * @pcpumask: cpumask that will be used for padata parallelization * @cbcpumask: cpumask that will be used for padata serialization */ -struct padata_instance *padata_alloc(struct workqueue_struct *wq, - const struct cpumask *pcpumask, - const struct cpumask *cbcpumask) +static struct padata_instance *padata_alloc(struct workqueue_struct *wq, + const struct cpumask *pcpumask, + const struct cpumask *cbcpumask) { struct padata_instance *pinst; struct parallel_data *pd = NULL; @@ -1016,6 +1003,19 @@ struct padata_instance *padata_alloc(str } /** + * padata_alloc_possible - Allocate and initialize padata instance. + * Use the cpu_possible_mask for serial and + * parallel workers. + * + * @wq: workqueue to use for the allocated padata instance + */ +struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq) +{ + return padata_alloc(wq, cpu_possible_mask, cpu_possible_mask); +} +EXPORT_SYMBOL(padata_alloc_possible); + +/** * padata_free - free a padata instance * * @padata_inst: padata instance to free
[-- Attachment #1: kernelpadata_Avoid_get_online_cpus_recursion_via_pcrypt_init_padata.patch --] [-- Type: text/plain, Size: 2051 bytes --] From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> pcrypt_init_padata() get_online_cpus() padata_alloc_possible() padata_alloc() get_online_cpus() The nested call to get_online_cpus() works with the current implementation, but prevents the conversion to a percpu rwsem. The other caller of padata_alloc_possible() is pcrypt_init_padata() which calls from a get_online_cpus() protected region as well. Remove the get_online_cpus() call in padata_alloc() and document the calling convention. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-crypto@vger.kernel.org --- kernel/padata.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) --- a/kernel/padata.c +++ b/kernel/padata.c @@ -945,6 +945,8 @@ static struct kobj_type padata_attr_type * @wq: workqueue to use for the allocated padata instance * @pcpumask: cpumask that will be used for padata parallelization * @cbcpumask: cpumask that will be used for padata serialization + * + * Must be called from a get_online_cpus() protected region */ static struct padata_instance *padata_alloc(struct workqueue_struct *wq, const struct cpumask *pcpumask, @@ -957,7 +959,6 @@ static struct padata_instance *padata_al if (!pinst) goto err; - get_online_cpus(); if (!alloc_cpumask_var(&pinst->cpumask.pcpu, GFP_KERNEL)) goto err_free_inst; if (!alloc_cpumask_var(&pinst->cpumask.cbcpu, GFP_KERNEL)) { @@ -997,7 +998,6 @@ static struct padata_instance *padata_al free_cpumask_var(pinst->cpumask.cbcpu); err_free_inst: kfree(pinst); - put_online_cpus(); err: return NULL; } @@ -1008,6 +1008,8 @@ static struct padata_instance *padata_al * parallel workers. 
* * @wq: workqueue to use for the allocated padata instance + * + * Must be called from a get_online_cpus() protected region */ struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq) {
[-- Attachment #1: x86mtrr_Remove_get_online_cpus_from_mtrr_save_state.patch --]
[-- Type: text/plain, Size: 936 bytes --]

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

mtrr_save_state() is invoked from native_cpu_up() which is in the
context of a CPU hotplug operation and therefore calling
get_online_cpus() is pointless.

While this works in the current get_online_cpus() implementation, it
prevents converting the hotplug locking to percpu rwsems.

Remove it.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
---
 arch/x86/kernel/cpu/mtrr/main.c | 2 --
 1 file changed, 2 deletions(-)

--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -807,10 +807,8 @@ void mtrr_save_state(void)
 	if (!mtrr_enabled())
 		return;

-	get_online_cpus();
 	first_cpu = cpumask_first(cpu_online_mask);
 	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
-	put_online_cpus();
 }

 void set_mtrr_aps_delayed_init(void)
[-- Attachment #1: cpufreq-Use-cpuhp_setup_state_nocalls_locked.patch --] [-- Type: text/plain, Size: 1630 bytes --] From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> cpufreq holds get_online_cpus() while invoking cpuhp_setup_state_nocalls() to make subsys_interface_register() and the registration of hotplug calls atomic versus cpu hotplug. cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use cpuhp_setup/remove_state_nocalls_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linux-pm@vger.kernel.org --- drivers/cpufreq/cpufreq.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2473,9 +2473,10 @@ int cpufreq_register_driver(struct cpufr goto err_if_unreg; } - ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "cpufreq:online", - cpuhp_cpufreq_online, - cpuhp_cpufreq_offline); + ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN, + "cpufreq:online", + cpuhp_cpufreq_online, + cpuhp_cpufreq_offline); if (ret < 0) goto err_if_unreg; hp_online = ret; @@ -2519,7 +2520,7 @@ int cpufreq_unregister_driver(struct cpu get_online_cpus(); subsys_interface_unregister(&cpufreq_interface); remove_boost_sysfs_file(); - cpuhp_remove_state_nocalls(hp_online); + cpuhp_remove_state_nocalls_cpuslocked(hp_online); write_lock_irqsave(&cpufreq_driver_lock, flags);
[-- Attachment #1: KVMPPCBook3S_HV_Use_cpuhp_setup_state_nocalls_locked.patch --] [-- Type: text/plain, Size: 1310 bytes --] From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> kvmppc_alloc_host_rm_ops() holds get_online_cpus() while invoking cpuhp_setup_state_nocalls(). cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use cpuhp_setup_state_nocalls_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Graf <agraf@suse.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: kvm@vger.kernel.org Cc: kvm-ppc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/kvm/book3s_hv.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3336,10 +3336,10 @@ void kvmppc_alloc_host_rm_ops(void) return; } - cpuhp_setup_state_nocalls(CPUHP_KVM_PPC_BOOK3S_PREPARE, - "ppc/kvm_book3s:prepare", - kvmppc_set_host_core, - kvmppc_clear_host_core); + cpuhp_setup_state_nocalls_cpuslocked(CPUHP_KVM_PPC_BOOK3S_PREPARE, + "ppc/kvm_book3s:prepare", + kvmppc_set_host_core, + kvmppc_clear_host_core); put_online_cpus(); }
[-- Attachment #1: wtracingcoresight-etm3x_Use_cpuhp_setup_state_nocalls_locked.patch --] [-- Type: text/plain, Size: 1534 bytes --] From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> etm_probe() holds get_online_cpus() while invoking cpuhp_setup_state_nocalls(). cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use cpuhp_setup_state_nocalls_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: linux-arm-kernel@lists.infradead.org --- drivers/hwtracing/coresight/coresight-etm3x.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) --- a/drivers/hwtracing/coresight/coresight-etm3x.c +++ b/drivers/hwtracing/coresight/coresight-etm3x.c @@ -803,12 +803,12 @@ static int etm_probe(struct amba_device dev_err(dev, "ETM arch init failed\n"); if (!etm_count++) { - cpuhp_setup_state_nocalls(CPUHP_AP_ARM_CORESIGHT_STARTING, - "arm/coresight:starting", - etm_starting_cpu, etm_dying_cpu); - ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, - "arm/coresight:online", - etm_online_cpu, NULL); + cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ARM_CORESIGHT_STARTING, + "arm/coresight:starting", + etm_starting_cpu, etm_dying_cpu); + ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN, + "arm/coresight:online", + etm_online_cpu, NULL); if (ret < 0) goto err_arch_supported; hp_online = ret;
[-- Attachment #1: wtracingcoresight-etm4x_Use_cpuhp_setup_state_nocalls_locked.patch --] [-- Type: text/plain, Size: 1547 bytes --] From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> etm_probe4() holds get_online_cpus() while invoking cpuhp_setup_state_nocalls(). cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use cpuhp_setup_state_nocalls_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: linux-arm-kernel@lists.infradead.org --- drivers/hwtracing/coresight/coresight-etm4x.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -990,12 +990,12 @@ static int etm4_probe(struct amba_device dev_err(dev, "ETM arch init failed\n"); if (!etm4_count++) { - cpuhp_setup_state_nocalls(CPUHP_AP_ARM_CORESIGHT_STARTING, - "arm/coresight4:starting", - etm4_starting_cpu, etm4_dying_cpu); - ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, - "arm/coresight4:online", - etm4_online_cpu, NULL); + cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ARM_CORESIGHT_STARTING, + "arm/coresight4:starting", + etm4_starting_cpu, etm4_dying_cpu); + ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN, + "arm/coresight4:online", + etm4_online_cpu, NULL); if (ret < 0) goto err_arch_supported; hp_online = ret;
[-- Attachment #1: perfx86intelcqm_Use_cpuhp_setup_state_locked.patch --]
[-- Type: text/plain, Size: 1423 bytes --]

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

intel_cqm_init() holds get_online_cpus() while registering the hotplug
callbacks.

cpuhp_setup_state() invokes get_online_cpus() as well. This is correct,
but prevents the conversion of the hotplug locking to a percpu rwsem.

Use cpuhp_setup_state_cpuslocked() to avoid the nested call.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/events/intel/cqm.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -1746,12 +1746,12 @@ static int __init intel_cqm_init(void)
 	 * Setup the hot cpu notifier once we are sure cqm
 	 * is enabled to avoid notifier leak.
 	 */
-	cpuhp_setup_state(CPUHP_AP_PERF_X86_CQM_STARTING,
-			  "perf/x86/cqm:starting",
-			  intel_cqm_cpu_starting, NULL);
-	cpuhp_setup_state(CPUHP_AP_PERF_X86_CQM_ONLINE, "perf/x86/cqm:online",
-			  NULL, intel_cqm_cpu_exit);
-
+	cpuhp_setup_state_cpuslocked(CPUHP_AP_PERF_X86_CQM_STARTING,
+				     "perf/x86/cqm:starting",
+				     intel_cqm_cpu_starting, NULL);
+	cpuhp_setup_state_cpuslocked(CPUHP_AP_PERF_X86_CQM_ONLINE,
+				     "perf/x86/cqm:online",
+				     NULL, intel_cqm_cpu_exit);
 out:
 	put_online_cpus();
[-- Attachment #1: ARMhw_breakpoint_Use_cpuhp_setup_state_locked.patch --]
[-- Type: text/plain, Size: 1323 bytes --]

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

arch_hw_breakpoint_init() holds get_online_cpus() while registering the
hotplug callbacks.

cpuhp_setup_state() invokes get_online_cpus() as well. This is correct,
but prevents the conversion of the hotplug locking to a percpu rwsem.

Use cpuhp_setup_state_cpuslocked() to avoid the nested call.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
 arch/arm/kernel/hw_breakpoint.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/arch/arm/kernel/hw_breakpoint.c
+++ b/arch/arm/kernel/hw_breakpoint.c
@@ -1098,8 +1098,9 @@ static int __init arch_hw_breakpoint_ini
 	 * assume that a halting debugger will leave the world in a nice state
 	 * for us.
 	 */
-	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm/hw_breakpoint:online",
-				dbg_reset_online, NULL);
+	ret = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN,
+					   "arm/hw_breakpoint:online",
+					   dbg_reset_online, NULL);
 	unregister_undef_hook(&debug_reg_hook);
 	if (WARN_ON(ret < 0) || !cpumask_empty(&debug_err_mask)) {
 		core_num_brps = 0;
[-- Attachment #1: s390-kernel-Use_stop_machine_locked.patch --] [-- Type: text/plain, Size: 1062 bytes --] stp_work_fn() holds get_online_cpus() while invoking stop_machine(). stop_machine() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use stop_machine_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: linux-s390@vger.kernel.org --- arch/s390/kernel/time.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/s390/kernel/time.c +++ b/arch/s390/kernel/time.c @@ -636,7 +636,7 @@ static void stp_work_fn(struct work_stru memset(&stp_sync, 0, sizeof(stp_sync)); get_online_cpus(); atomic_set(&stp_sync.cpus, num_online_cpus() - 1); - stop_machine(stp_sync_clock, &stp_sync, cpu_online_mask); + stop_machine_cpuslocked(stp_sync_clock, &stp_sync, cpu_online_mask); put_online_cpus(); if (!check_sync_clock())
[-- Attachment #1: powerpcpowernv_Use_stop_machine_locked.patch --] [-- Type: text/plain, Size: 1034 bytes --] set_subcores_per_core() holds get_online_cpus() while invoking stop_machine(). stop_machine() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use stop_machine_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/platforms/powernv/subcore.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/arch/powerpc/platforms/powernv/subcore.c +++ b/arch/powerpc/platforms/powernv/subcore.c @@ -356,7 +356,8 @@ static int set_subcores_per_core(int new /* Ensure state is consistent before we call the other cpus */ mb(); - stop_machine(cpu_update_split_mode, &new_mode, cpu_online_mask); + stop_machine_cpuslocked(cpu_update_split_mode, &new_mode, + cpu_online_mask); put_online_cpus();
[-- Attachment #1: cpu-hotplug-Use-stop-machine-locked.patch --] [-- Type: text/plain, Size: 941 bytes --] From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> takedown_cpu() is a cpu hotplug function invoking stop_machine(). The cpu hotplug machinery holds the hotplug lock for write. stop_machine() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use stop_machine_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- kernel/cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -701,7 +701,7 @@ static int takedown_cpu(unsigned int cpu /* * So now all preempt/rcu users must observe !cpu_active(). */ - err = stop_machine(take_cpu_down, NULL, cpumask_of(cpu)); + err = stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu)); if (err) { /* CPU refused to die */ irq_unlock_sparse();
[-- Attachment #1: x86-perf--Drop-EXPORT-of-perf_check_microcode.patch --] [-- Type: text/plain, Size: 621 bytes --] The only caller is the microcode update, which cannot be modular. Drop the export. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org --- arch/x86/events/core.c | 1 - 1 file changed, 1 deletion(-) --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -2224,7 +2224,6 @@ void perf_check_microcode(void) if (x86_pmu.check_microcode) x86_pmu.check_microcode(); } -EXPORT_SYMBOL_GPL(perf_check_microcode); static struct pmu pmu = { .pmu_enable = x86_pmu_enable,
[-- Attachment #1: perfx86intel_Drop_get_online_cpus_in_intel_snb_check_microcode.patch --] [-- Type: text/plain, Size: 1593 bytes --] From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> If intel_snb_check_microcode() is invoked via microcode_init -> perf_check_microcode -> intel_snb_check_microcode then get_online_cpus() is invoked nested. This works with the current implementation of get_online_cpus() but prevents converting it to a percpu rwsem. intel_snb_check_microcode() is also invoked from intel_sandybridge_quirk() unprotected. Drop get_online_cpus() from intel_snb_check_microcode() and add it to intel_sandybridge_quirk() so both call sites are protected. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org --- arch/x86/events/intel/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -3389,12 +3389,10 @@ static void intel_snb_check_microcode(vo int pebs_broken = 0; int cpu; - get_online_cpus(); for_each_online_cpu(cpu) { if ((pebs_broken = intel_snb_pebs_broken(cpu))) break; } - put_online_cpus(); if (pebs_broken == x86_pmu.pebs_broken) return; @@ -3467,7 +3465,9 @@ static bool check_msr(unsigned long msr, static __init void intel_sandybridge_quirk(void) { x86_pmu.check_microcode = intel_snb_check_microcode; + get_online_cpus(); intel_snb_check_microcode(); + put_online_cpus(); } static const struct { int id; char *name; } intel_arch_events_map[] __initconst = {
[-- Attachment #1: PCI--Use-cpu_hotplug_disable-instead-of-get_online_cpus.patch --] [-- Type: text/plain, Size: 2312 bytes --] Converting the hotplug locking, i.e. get_online_cpus(), to a percpu rwsem unearthed a circular lock dependency which was hidden from lockdep due to the lockdep annotation of get_online_cpus() which prevents lockdep from creating full dependency chains. There are several variants of this. An example is: Chain exists of: cpu_hotplug_lock.rw_sem --> drm_global_mutex --> &item->mutex CPU0 CPU1 ---- ---- lock(&item->mutex); lock(drm_global_mutex); lock(&item->mutex); lock(cpu_hotplug_lock.rw_sem); because there are dependencies through workqueues. The call chain is: get_online_cpus apply_workqueue_attrs __alloc_workqueue_key ttm_mem_global_init ast_ttm_mem_global_init drm_global_item_ref ast_mm_init ast_driver_load drm_dev_register drm_get_pci_dev ast_pci_probe local_pci_probe work_for_cpu_fn process_one_work worker_thread This is not a problem of get_online_cpus() recursion, it's a possible deadlock undetected by lockdep so far. The cure is to use cpu_hotplug_disable() instead of get_online_cpus() to protect the PCI probing. There is a side effect to this: cpu_hotplug_disable() makes a concurrent cpu hotplug attempt via the sysfs interfaces fail with -EBUSY, but PCI probing usually happens during the boot process where no interaction is possible. Any later invocations are infrequent enough and concurrent hotplug attempts are so unlikely that the danger of user space visible regressions is very close to zero. Anyway, that's preferable over a real deadlock. 
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org --- drivers/pci/pci-driver.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -349,13 +349,13 @@ static int pci_call_probe(struct pci_dri if (node >= 0 && node != numa_node_id()) { int cpu; - get_online_cpus(); + cpu_hotplug_disable(); cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask); if (cpu < nr_cpu_ids) error = work_on_cpu(cpu, local_pci_probe, &ddi); else error = local_pci_probe(&ddi); - put_online_cpus(); + cpu_hotplug_enable(); } else error = local_pci_probe(&ddi);
[-- Attachment #1: PCI--Replace-the-racy-recursion-prevention.patch --] [-- Type: text/plain, Size: 3926 bytes --] pci_call_probe() can be called recursively when a physical function is probed and the probing creates virtual functions, which are populated via pci_bus_add_device() which in turn can end up calling pci_call_probe() again. The code has an interesting way to prevent recursing into the workqueue code. That's accomplished by a check whether the current task runs already on the numa node which is associated with the device. While that works to prevent the recursion into the workqueue code, it's racy versus normal execution as there is no guarantee that the node does not vanish after the check. There is another issue with this code. It dereferences cpumask_of_node() unconditionally without checking whether the node is available. Make the detection reliable by: - Mark a probed device as 'is_probed' in pci_call_probe() - Check in pci_call_probe for a virtual function. If it's a virtual function and the associated physical function device is marked 'is_probed' then this is a recursive call, so the call can be invoked in the calling context. - Add a check whether the node is online before dereferencing it. 
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org --- drivers/pci/pci-driver.c | 47 +++++++++++++++++++++++++---------------------- include/linux/pci.h | 1 + 2 files changed, 26 insertions(+), 22 deletions(-) --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -320,10 +320,19 @@ static long local_pci_probe(void *_ddi) return 0; } +static bool pci_physfn_is_probed(struct pci_dev *dev) +{ +#ifdef CONFIG_ATS + return dev->is_virtfn && dev->physfn->is_probed; +#else + return false; +#endif +} + static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, const struct pci_device_id *id) { - int error, node; + int error, node, cpu; struct drv_dev_and_id ddi = { drv, dev, id }; /* @@ -332,33 +341,27 @@ static int pci_call_probe(struct pci_dri * on the right node. */ node = dev_to_node(&dev->dev); + dev->is_probed = 1; + + cpu_hotplug_disable(); /* - * On NUMA systems, we are likely to call a PF probe function using - * work_on_cpu(). If that probe calls pci_enable_sriov() (which - * adds the VF devices via pci_bus_add_device()), we may re-enter - * this function to call the VF probe function. Calling - * work_on_cpu() again will cause a lockdep warning. Since VFs are - * always on the same node as the PF, we can work around this by - * avoiding work_on_cpu() when we're already on the correct node. - * - * Preemption is enabled, so it's theoretically unsafe to use - * numa_node_id(), but even if we run the probe function on the - * wrong node, it should be functionally correct. + * Prevent nesting work_on_cpu() for the case where a Virtual Function + * device is probed from work_on_cpu() of the Physical device. 
*/ - if (node >= 0 && node != numa_node_id()) { - int cpu; - - cpu_hotplug_disable(); + if (node < 0 || node >= MAX_NUMNODES || !node_online(node) || + pci_physfn_is_probed(dev)) + cpu = nr_cpu_ids; + else cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask); - if (cpu < nr_cpu_ids) - error = work_on_cpu(cpu, local_pci_probe, &ddi); - else - error = local_pci_probe(&ddi); - cpu_hotplug_enable(); - } else + + if (cpu < nr_cpu_ids) + error = work_on_cpu(cpu, local_pci_probe, &ddi); + else error = local_pci_probe(&ddi); + dev->is_probed = 0; + cpu_hotplug_enable(); return error; } --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -365,6 +365,7 @@ struct pci_dev { unsigned int irq_managed:1; unsigned int has_secondary_link:1; unsigned int non_compliant_bars:1; /* broken BARs; ignore them */ + unsigned int is_probed:1; /* device probing in progress */ pci_dev_flags_t dev_flags; atomic_t enable_cnt; /* pci_enable_device has been called */
[-- Attachment #1: ACPI-processor--Use-cpu_hotplug_disable-instead-of-get_online_cpus.patch --] [-- Type: text/plain, Size: 1931 bytes --] Converting the hotplug locking, i.e. get_online_cpus(), to a percpu rwsem unearthed a circular lock dependency which was hidden from lockdep due to the lockdep annotation of get_online_cpus() which prevents lockdep from creating full dependency chains. CPU0 CPU1 ---- ---- lock((&wfc.work)); lock(cpu_hotplug_lock.rw_sem); lock((&wfc.work)); lock(cpu_hotplug_lock.rw_sem); This dependency is established via acpi_processor_start() which calls into the work queue code. And the work queue code establishes the reverse dependency. This is not a problem of get_online_cpus() recursion, it's a possible deadlock undetected by lockdep so far. The cure is to use cpu_hotplug_disable() instead of get_online_cpus() to protect the probing from acpi_processor_start(). There is a side effect to this: cpu_hotplug_disable() makes a concurrent cpu hotplug attempt via the sysfs interfaces fail with -EBUSY, but that probing usually happens during the boot process where no interaction is possible. Any later invocations are infrequent enough and concurrent hotplug attempts are so unlikely that the danger of user space visible regressions is very close to zero. Anyway, that's preferable over a real deadlock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Len Brown <lenb@kernel.org> Cc: linux-acpi@vger.kernel.org --- drivers/acpi/processor_driver.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/drivers/acpi/processor_driver.c +++ b/drivers/acpi/processor_driver.c @@ -268,9 +268,9 @@ static int acpi_processor_start(struct d return -ENODEV; /* Protect against concurrent CPU hotplug operations */ - get_online_cpus(); + cpu_hotplug_disable(); ret = __acpi_processor_start(device); - put_online_cpus(); + cpu_hotplug_enable(); return ret; }
[-- Attachment #1: perf--Remove-redundant-get_online_cpus--.patch --] [-- Type: text/plain, Size: 958 bytes --] SyS_perf_event_open() calls get_online_cpus() and eventually invokes swevent_hlist_get() which does it again. All callchains leading to swevent_hlist_get() originate from SyS_perf_event_open() so the extra protection is redundant. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> --- kernel/events/core.c | 5 ----- 1 file changed, 5 deletions(-) --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7592,7 +7592,6 @@ static int swevent_hlist_get(void) { int err, cpu, failed_cpu; - get_online_cpus(); for_each_possible_cpu(cpu) { err = swevent_hlist_get_cpu(cpu); if (err) { @@ -7600,8 +7599,6 @@ static int swevent_hlist_get(void) goto fail; } } - put_online_cpus(); - return 0; fail: for_each_possible_cpu(cpu) { @@ -7609,8 +7606,6 @@ static int swevent_hlist_get(void) break; swevent_hlist_put_cpu(cpu); } - - put_online_cpus(); return err; }
[-- Attachment #1: jump_label_Pull_get_online_cpus_into_generic_code.patch --] [-- Type: text/plain, Size: 4160 bytes --] This change does two things: - it moves the get_online_cpus() call into generic code, with the aim of later providing some static_key ops that avoid it. - as a side effect it inverts the lock order between cpu_hotplug_lock and jump_label_mutex. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: jbaron@akamai.com Cc: bigeasy@linutronix.de Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20170418103422.590118425@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- --- arch/mips/kernel/jump_label.c | 2 -- arch/sparc/kernel/jump_label.c | 2 -- arch/tile/kernel/jump_label.c | 2 -- arch/x86/kernel/jump_label.c | 2 -- kernel/jump_label.c | 14 ++++++++++++++ 5 files changed, 14 insertions(+), 8 deletions(-) --- a/arch/mips/kernel/jump_label.c +++ b/arch/mips/kernel/jump_label.c @@ -58,7 +58,6 @@ void arch_jump_label_transform(struct ju insn.word = 0; /* nop */ } - get_online_cpus(); mutex_lock(&text_mutex); if (IS_ENABLED(CONFIG_CPU_MICROMIPS)) { insn_p->halfword[0] = insn.word >> 16; @@ -70,7 +69,6 @@ void arch_jump_label_transform(struct ju (unsigned long)insn_p + sizeof(*insn_p)); mutex_unlock(&text_mutex); - put_online_cpus(); } #endif /* HAVE_JUMP_LABEL */ --- a/arch/sparc/kernel/jump_label.c +++ b/arch/sparc/kernel/jump_label.c @@ -41,12 +41,10 @@ void arch_jump_label_transform(struct ju val = 0x01000000; } - get_online_cpus(); mutex_lock(&text_mutex); *insn = val; flushi(insn); mutex_unlock(&text_mutex); - put_online_cpus(); } #endif --- a/arch/tile/kernel/jump_label.c +++ b/arch/tile/kernel/jump_label.c @@ -45,14 +45,12 @@ static void __jump_label_transform(struc void arch_jump_label_transform(struct jump_entry *e, enum jump_label_type type) { - get_online_cpus(); mutex_lock(&text_mutex); __jump_label_transform(e, type); flush_icache_range(e->code, e->code + sizeof(tilegx_bundle_bits)); 
mutex_unlock(&text_mutex); - put_online_cpus(); } __init_or_module void arch_jump_label_transform_static(struct jump_entry *e, --- a/arch/x86/kernel/jump_label.c +++ b/arch/x86/kernel/jump_label.c @@ -105,11 +105,9 @@ static void __jump_label_transform(struc void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type) { - get_online_cpus(); mutex_lock(&text_mutex); __jump_label_transform(entry, type, NULL, 0); mutex_unlock(&text_mutex); - put_online_cpus(); } static enum { --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -15,6 +15,7 @@ #include <linux/static_key.h> #include <linux/jump_label_ratelimit.h> #include <linux/bug.h> +#include <linux/cpu.h> #ifdef HAVE_JUMP_LABEL @@ -124,6 +125,12 @@ void static_key_slow_inc(struct static_k return; } + /* + * A number of architectures need to synchronize I$ across + * the all CPUs, for that to be serialized against CPU hot-plug + * we need to avoid CPUs coming online. + */ + get_online_cpus(); jump_label_lock(); if (atomic_read(&key->enabled) == 0) { atomic_set(&key->enabled, -1); @@ -133,6 +140,7 @@ void static_key_slow_inc(struct static_k atomic_inc(&key->enabled); } jump_label_unlock(); + put_online_cpus(); } EXPORT_SYMBOL_GPL(static_key_slow_inc); @@ -146,6 +154,7 @@ static void __static_key_slow_dec(struct * returns is unbalanced, because all other static_key_slow_inc() * instances block while the update is in progress. */ + get_online_cpus(); if (!atomic_dec_and_mutex_lock(&key->enabled, &jump_label_mutex)) { WARN(atomic_read(&key->enabled) < 0, "jump label: negative count!\n"); @@ -159,6 +168,7 @@ static void __static_key_slow_dec(struct jump_label_update(key); } jump_label_unlock(); + put_online_cpus(); } static void jump_label_update_timeout(struct work_struct *work) @@ -592,6 +602,10 @@ jump_label_module_notify(struct notifier switch (val) { case MODULE_STATE_COMING: + /* + * XXX do we need get_online_cpus() ? 
the module isn't + * executable yet, so nothing should be looking at our code. + */ jump_label_lock(); ret = jump_label_add_module(mod); if (ret) {
[-- Attachment #1: jump_label_Provide_static_key_slow_inc_nohp.patch --] [-- Type: text/plain, Size: 3376 bytes --] Provide static_key_slow_inc_cpuslocked(), a variant that doesn't take cpu_hotplug_lock(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: jbaron@akamai.com Cc: bigeasy@linutronix.de Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20170418103422.636958338@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- include/linux/jump_label.h | 3 +++ kernel/jump_label.c | 21 +++++++++++++++++---- 2 files changed, 20 insertions(+), 4 deletions(-) --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -158,6 +158,7 @@ extern void arch_jump_label_transform_st enum jump_label_type type); extern int jump_label_text_reserved(void *start, void *end); extern void static_key_slow_inc(struct static_key *key); +extern void static_key_slow_inc_cpuslocked(struct static_key *key); extern void static_key_slow_dec(struct static_key *key); extern void jump_label_apply_nops(struct module *mod); extern int static_key_count(struct static_key *key); @@ -213,6 +214,8 @@ static inline void static_key_slow_inc(s atomic_inc(&key->enabled); } +#define static_key_slow_inc_cpuslocked static_key_slow_inc + static inline void static_key_slow_dec(struct static_key *key) { STATIC_KEY_CHECK_USE(); --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -101,7 +101,7 @@ void static_key_disable(struct static_ke } EXPORT_SYMBOL_GPL(static_key_disable); -void static_key_slow_inc(struct static_key *key) +void __static_key_slow_inc(struct static_key *key) { int v, v1; @@ -130,7 +130,6 @@ void static_key_slow_inc(struct static_k * the all CPUs, for that to be serialized against CPU hot-plug * we need to avoid CPUs coming online. 
*/ - get_online_cpus(); jump_label_lock(); if (atomic_read(&key->enabled) == 0) { atomic_set(&key->enabled, -1); @@ -140,10 +139,22 @@ void static_key_slow_inc(struct static_k atomic_inc(&key->enabled); } jump_label_unlock(); +} + +void static_key_slow_inc(struct static_key *key) +{ + get_online_cpus(); + __static_key_slow_inc(key); put_online_cpus(); } EXPORT_SYMBOL_GPL(static_key_slow_inc); +void static_key_slow_inc_cpuslocked(struct static_key *key) +{ + __static_key_slow_inc(key); +} +EXPORT_SYMBOL_GPL(static_key_slow_inc_cpuslocked); + static void __static_key_slow_dec(struct static_key *key, unsigned long rate_limit, struct delayed_work *work) { @@ -154,7 +165,6 @@ static void __static_key_slow_dec(struct * returns is unbalanced, because all other static_key_slow_inc() * instances block while the update is in progress. */ - get_online_cpus(); if (!atomic_dec_and_mutex_lock(&key->enabled, &jump_label_mutex)) { WARN(atomic_read(&key->enabled) < 0, "jump label: negative count!\n"); @@ -168,20 +178,23 @@ static void __static_key_slow_dec(struct jump_label_update(key); } jump_label_unlock(); - put_online_cpus(); } static void jump_label_update_timeout(struct work_struct *work) { struct static_key_deferred *key = container_of(work, struct static_key_deferred, work.work); + get_online_cpus(); __static_key_slow_dec(&key->key, 0, NULL); + put_online_cpus(); } void static_key_slow_dec(struct static_key *key) { STATIC_KEY_CHECK_USE(); + get_online_cpus(); __static_key_slow_dec(key, 0, NULL); + put_online_cpus(); } EXPORT_SYMBOL_GPL(static_key_slow_dec);
[-- Attachment #1: perf_Avoid_cpu_hotplug_lock_r-r_recursion.patch --] [-- Type: text/plain, Size: 1190 bytes --] There are two call-sites where using static_key results in recursing on the cpu_hotplug_lock. Use the hotplug locked version of static_key_slow_inc(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: jbaron@akamai.com Cc: bigeasy@linutronix.de Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20170418103422.687248115@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- kernel/events/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7653,7 +7653,7 @@ static int perf_swevent_init(struct perf if (err) return err; - static_key_slow_inc(&perf_swevent_enabled[event_id]); + static_key_slow_inc_cpuslocked(&perf_swevent_enabled[event_id]); event->destroy = sw_perf_event_destroy; } @@ -9160,7 +9160,7 @@ static void account_event(struct perf_ev mutex_lock(&perf_sched_mutex); if (!atomic_read(&perf_sched_count)) { - static_branch_enable(&perf_sched_events); + static_key_slow_inc_cpuslocked(&perf_sched_events.key); /* * Guarantee that all CPUs observe they key change and * call the perf scheduling hooks before proceeding to
[-- Attachment #1: cpu-hotplug--Convert-hotplug-locking-to-percpu-rwsem.patch --] [-- Type: text/plain, Size: 7514 bytes --] There are no more (known) nested calls to get_online_cpus() so it's possible to remove the nested call magic and convert the mutex to a percpu-rwsem, which speeds up get/put_online_cpus() significantly for the uncontended case. The contended case (write locked for hotplug operations) is slow anyway, so the slightly more expensive down_write of the percpu rwsem does not matter. [ peterz: Add lockdep assertions ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- include/linux/cpu.h | 2 kernel/cpu.c | 110 +++++++------------------------------------------- kernel/jump_label.c | 2 kernel/padata.c | 1 kernel/stop_machine.c | 2 5 files changed, 23 insertions(+), 94 deletions(-) --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -105,6 +105,7 @@ extern void cpu_hotplug_begin(void); extern void cpu_hotplug_done(void); extern void get_online_cpus(void); extern void put_online_cpus(void); +extern void lockdep_assert_hotplug_held(void); extern void cpu_hotplug_disable(void); extern void cpu_hotplug_enable(void); void clear_tasks_mm_cpumask(int cpu); @@ -118,6 +119,7 @@ static inline void cpu_hotplug_done(void #define put_online_cpus() do { } while (0) #define cpu_hotplug_disable() do { } while (0) #define cpu_hotplug_enable() do { } while (0) +static inline void lockdep_assert_hotplug_held(void) {} #endif /* CONFIG_HOTPLUG_CPU */ #ifdef CONFIG_PM_SLEEP_SMP --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -27,6 +27,7 @@ #include <linux/smpboot.h> #include <linux/relay.h> #include <linux/slab.h> +#include <linux/percpu-rwsem.h> #include <trace/events/power.h> #define CREATE_TRACE_POINTS @@ -196,121 +197,41 @@ void cpu_maps_update_done(void) mutex_unlock(&cpu_add_remove_lock); } -/* If set, cpu_up and cpu_down will return -EBUSY and do nothing. +/* + * If set, cpu_up and cpu_down will return -EBUSY and do nothing. 
* Should always be manipulated under cpu_add_remove_lock */ static int cpu_hotplug_disabled; #ifdef CONFIG_HOTPLUG_CPU -static struct { - struct task_struct *active_writer; - /* wait queue to wake up the active_writer */ - wait_queue_head_t wq; - /* verifies that no writer will get active while readers are active */ - struct mutex lock; - /* - * Also blocks the new readers during - * an ongoing cpu hotplug operation. - */ - atomic_t refcount; - -#ifdef CONFIG_DEBUG_LOCK_ALLOC - struct lockdep_map dep_map; -#endif -} cpu_hotplug = { - .active_writer = NULL, - .wq = __WAIT_QUEUE_HEAD_INITIALIZER(cpu_hotplug.wq), - .lock = __MUTEX_INITIALIZER(cpu_hotplug.lock), -#ifdef CONFIG_DEBUG_LOCK_ALLOC - .dep_map = STATIC_LOCKDEP_MAP_INIT("cpu_hotplug.dep_map", &cpu_hotplug.dep_map), -#endif -}; - -/* Lockdep annotations for get/put_online_cpus() and cpu_hotplug_begin/end() */ -#define cpuhp_lock_acquire_read() lock_map_acquire_read(&cpu_hotplug.dep_map) -#define cpuhp_lock_acquire_tryread() \ - lock_map_acquire_tryread(&cpu_hotplug.dep_map) -#define cpuhp_lock_acquire() lock_map_acquire(&cpu_hotplug.dep_map) -#define cpuhp_lock_release() lock_map_release(&cpu_hotplug.dep_map) - +DEFINE_STATIC_PERCPU_RWSEM(cpu_hotplug_lock); void get_online_cpus(void) { - might_sleep(); - if (cpu_hotplug.active_writer == current) - return; - cpuhp_lock_acquire_read(); - mutex_lock(&cpu_hotplug.lock); - atomic_inc(&cpu_hotplug.refcount); - mutex_unlock(&cpu_hotplug.lock); + percpu_down_read(&cpu_hotplug_lock); } EXPORT_SYMBOL_GPL(get_online_cpus); void put_online_cpus(void) { - int refcount; - - if (cpu_hotplug.active_writer == current) - return; - - refcount = atomic_dec_return(&cpu_hotplug.refcount); - if (WARN_ON(refcount < 0)) /* try to fix things up */ - atomic_inc(&cpu_hotplug.refcount); - - if (refcount <= 0 && waitqueue_active(&cpu_hotplug.wq)) - wake_up(&cpu_hotplug.wq); - - cpuhp_lock_release(); - + percpu_up_read(&cpu_hotplug_lock); } EXPORT_SYMBOL_GPL(put_online_cpus); -/* - * This 
ensures that the hotplug operation can begin only when the - * refcount goes to zero. - * - * Note that during a cpu-hotplug operation, the new readers, if any, - * will be blocked by the cpu_hotplug.lock - * - * Since cpu_hotplug_begin() is always called after invoking - * cpu_maps_update_begin(), we can be sure that only one writer is active. - * - * Note that theoretically, there is a possibility of a livelock: - * - Refcount goes to zero, last reader wakes up the sleeping - * writer. - * - Last reader unlocks the cpu_hotplug.lock. - * - A new reader arrives at this moment, bumps up the refcount. - * - The writer acquires the cpu_hotplug.lock finds the refcount - * non zero and goes to sleep again. - * - * However, this is very difficult to achieve in practice since - * get_online_cpus() not an api which is called all that often. - * - */ void cpu_hotplug_begin(void) { - DEFINE_WAIT(wait); - - cpu_hotplug.active_writer = current; - cpuhp_lock_acquire(); - - for (;;) { - mutex_lock(&cpu_hotplug.lock); - prepare_to_wait(&cpu_hotplug.wq, &wait, TASK_UNINTERRUPTIBLE); - if (likely(!atomic_read(&cpu_hotplug.refcount))) - break; - mutex_unlock(&cpu_hotplug.lock); - schedule(); - } - finish_wait(&cpu_hotplug.wq, &wait); + percpu_down_write(&cpu_hotplug_lock); } void cpu_hotplug_done(void) { - cpu_hotplug.active_writer = NULL; - mutex_unlock(&cpu_hotplug.lock); - cpuhp_lock_release(); + percpu_up_write(&cpu_hotplug_lock); +} + +void lockdep_assert_hotplug_held(void) +{ + percpu_rwsem_assert_held(&cpu_hotplug_lock); } /* @@ -344,8 +265,6 @@ void cpu_hotplug_enable(void) EXPORT_SYMBOL_GPL(cpu_hotplug_enable); #endif /* CONFIG_HOTPLUG_CPU */ -/* Notifier wrappers for transitioning to state machine */ - static int bringup_wait_for_ap(unsigned int cpu) { struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu); @@ -1482,6 +1401,8 @@ int __cpuhp_setup_state_cpuslocked(enum int cpu, ret = 0; bool dynstate; + lockdep_assert_hotplug_held(); + if (cpuhp_cb_check(state) || 
!name) return -EINVAL; @@ -1600,6 +1521,7 @@ void __cpuhp_remove_state_cpuslocked(enu int cpu; BUG_ON(cpuhp_cb_check(state)); + lockdep_assert_hotplug_held(); mutex_lock(&cpuhp_state_mutex); if (sp->multi_instance) { --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -130,6 +130,7 @@ void __static_key_slow_inc(struct static * the all CPUs, for that to be serialized against CPU hot-plug * we need to avoid CPUs coming online. */ + lockdep_assert_hotplug_held(); jump_label_lock(); if (atomic_read(&key->enabled) == 0) { atomic_set(&key->enabled, -1); @@ -158,6 +159,7 @@ EXPORT_SYMBOL_GPL(static_key_slow_inc_cp static void __static_key_slow_dec(struct static_key *key, unsigned long rate_limit, struct delayed_work *work) { + lockdep_assert_hotplug_held(); /* * The negative count check is valid even when a negative * key->enabled is in use by static_key_slow_inc(); a --- a/kernel/padata.c +++ b/kernel/padata.c @@ -1013,6 +1013,7 @@ static struct padata_instance *padata_al */ struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq) { + lockdep_assert_hotplug_held(); return padata_alloc(wq, cpu_possible_mask, cpu_possible_mask); } EXPORT_SYMBOL(padata_alloc_possible); --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -562,6 +562,8 @@ int stop_machine_cpuslocked(cpu_stop_fn_ .active_cpus = cpus, }; + lockdep_assert_hotplug_held(); + if (!stop_machine_initialized) { /* * Handle the case where stop_machine() is called
Hi,
On Tue, Apr 18, 2017 at 07:04:53PM +0200, Thomas Gleixner wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> arch_hw_breakpoint_init() holds get_online_cpus() while registering the
> hotplug callbacks.
>
> cpuhp_setup_state() invokes get_online_cpus() as well. This is correct, but
> prevents the conversion of the hotplug locking to a percpu rwsem.
>
> Use cpuhp_setup_state_cpuslocked() to avoid the nested call.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: linux-arm-kernel@lists.infradead.org
>
> ---
> arch/arm/kernel/hw_breakpoint.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> --- a/arch/arm/kernel/hw_breakpoint.c
> +++ b/arch/arm/kernel/hw_breakpoint.c
> @@ -1098,8 +1098,9 @@ static int __init arch_hw_breakpoint_ini
> * assume that a halting debugger will leave the world in a nice state
> * for us.
> */
> - ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm/hw_breakpoint:online",
> - dbg_reset_online, NULL);
> + ret = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN,
> + "arm/hw_breakpoint:online",
> + dbg_reset_online, NULL);
Given the callsite, this particular change looks ok to me. So FWIW:
Acked-by: Mark Rutland <mark.rutland@arm.com>
However, as a more general note, the changes make the API feel odd. Per
their current names, {get,put}_online_cpus() sound/feel like refcounting
ops, which should be able to nest.
Is there any chance these could get a better name, e.g.
{lock,unlock}_online_cpus(), so as to align with _cpuslocked?
Thanks,
Mark.
On Wed, 19 Apr 2017, Mark Rutland wrote:
> Hi,
>
> On Tue, Apr 18, 2017 at 07:04:53PM +0200, Thomas Gleixner wrote:
> > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> >
> > arch_hw_breakpoint_init() holds get_online_cpus() while registering the
> > hotplug callbacks.
> >
> > cpuhp_setup_state() invokes get_online_cpus() as well. This is correct, but
> > prevents the conversion of the hotplug locking to a percpu rwsem.
> >
> > Use cpuhp_setup_state_cpuslocked() to avoid the nested call.
> >
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Russell King <linux@armlinux.org.uk>
> > Cc: linux-arm-kernel@lists.infradead.org
> >
> > ---
> > arch/arm/kernel/hw_breakpoint.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > --- a/arch/arm/kernel/hw_breakpoint.c
> > +++ b/arch/arm/kernel/hw_breakpoint.c
> > @@ -1098,8 +1098,9 @@ static int __init arch_hw_breakpoint_ini
> > * assume that a halting debugger will leave the world in a nice state
> > * for us.
> > */
> > - ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm/hw_breakpoint:online",
> > - dbg_reset_online, NULL);
> > + ret = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN,
> > + "arm/hw_breakpoint:online",
> > + dbg_reset_online, NULL);
>
> Given the callsite, this particular change looks ok to me. So FWIW:
>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
>
> However, as a more general note, the changes make the API feel odd. Per
> their current names, {get,put}_online_cpus() sound/feel like refcounting
> ops, which should be able to nest.
>
> Is there any chance these could get a better name, e.g.
> {lock,unlock}_online_cpus(), so as to align with _cpuslocked?
Yes, that's a follow-up cleanup patch, done treewide once this hits Linus' tree.
Thanks,
tglx
Commit-ID: 6fd1f038ff919b765d1edc877ce14b9f20a703f7 Gitweb: http://git.kernel.org/tip/6fd1f038ff919b765d1edc877ce14b9f20a703f7 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:43 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:50 +0200 cpu/hotplug: Provide cpuhp_setup/remove_state[_nocalls]_cpuslocked() Some call sites of cpuhp_setup/remove_state[_nocalls]() are within a get_online_cpus() protected region. cpuhp_setup/remove_state[_nocalls]() call get_online_cpus() as well, which is possible in the current implementation but prevents converting the hotplug locking to a percpu rwsem. Provide locked versions of the interfaces to avoid nested calls to get_online_cpus(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170418170552.448563593@linutronix.de --- include/linux/cpuhotplug.h | 29 +++++++++++++++++++++++++++++ kernel/cpu.c | 45 +++++++++++++++++++++++++++++++++------------ 2 files changed, 62 insertions(+), 12 deletions(-) diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 62d240e..5b33837 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -151,6 +151,11 @@ int __cpuhp_setup_state(enum cpuhp_state state, const char *name, bool invoke, int (*startup)(unsigned int cpu), int (*teardown)(unsigned int cpu), bool multi_instance); +int __cpuhp_setup_state_cpuslocked(enum cpuhp_state state, const char *name, + bool invoke, + int (*startup)(unsigned int cpu), + int (*teardown)(unsigned int cpu), + bool multi_instance); /** * cpuhp_setup_state - Setup hotplug state callbacks with calling the callbacks * @state: The state for which the calls are installed @@ -169,6 +174,15 @@ static inline int cpuhp_setup_state(enum cpuhp_state state, return 
__cpuhp_setup_state(state, name, true, startup, teardown, false); } +static inline int cpuhp_setup_state_cpuslocked(enum cpuhp_state state, + const char *name, + int (*startup)(unsigned int cpu), + int (*teardown)(unsigned int cpu)) +{ + return __cpuhp_setup_state_cpuslocked(state, name, true, startup, + teardown, false); +} + /** * cpuhp_setup_state_nocalls - Setup hotplug state callbacks without calling the * callbacks @@ -189,6 +203,15 @@ static inline int cpuhp_setup_state_nocalls(enum cpuhp_state state, false); } +static inline int cpuhp_setup_state_nocalls_cpuslocked(enum cpuhp_state state, + const char *name, + int (*startup)(unsigned int cpu), + int (*teardown)(unsigned int cpu)) +{ + return __cpuhp_setup_state_cpuslocked(state, name, false, startup, + teardown, false); +} + /** * cpuhp_setup_state_multi - Add callbacks for multi state * @state: The state for which the calls are installed @@ -248,6 +271,7 @@ static inline int cpuhp_state_add_instance_nocalls(enum cpuhp_state state, } void __cpuhp_remove_state(enum cpuhp_state state, bool invoke); +void __cpuhp_remove_state_cpuslocked(enum cpuhp_state state, bool invoke); /** * cpuhp_remove_state - Remove hotplug state callbacks and invoke the teardown @@ -271,6 +295,11 @@ static inline void cpuhp_remove_state_nocalls(enum cpuhp_state state) __cpuhp_remove_state(state, false); } +static inline void cpuhp_remove_state_nocalls_cpuslocked(enum cpuhp_state state) +{ + __cpuhp_remove_state_cpuslocked(state, false); +} + /** * cpuhp_remove_multi_state - Remove hotplug multi state callback * @state: The state for which the calls are removed diff --git a/kernel/cpu.c b/kernel/cpu.c index 37b223e..92e5a91 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1457,7 +1457,7 @@ unlock: EXPORT_SYMBOL_GPL(__cpuhp_state_add_instance); /** - * __cpuhp_setup_state - Setup the callbacks for an hotplug machine state + * __cpuhp_setup_state_cpuslocked - Setup the callbacks for an hotplug machine state * @state: The state to setup * 
@invoke: If true, the startup function is invoked for cpus where * cpu state >= @state @@ -1466,17 +1466,18 @@ EXPORT_SYMBOL_GPL(__cpuhp_state_add_instance); * @multi_instance: State is set up for multiple instances which get * added afterwards. * + * The caller needs to hold get_online_cpus() while calling this function. * Returns: * On success: * Positive state number if @state is CPUHP_AP_ONLINE_DYN * 0 for all other states * On failure: proper (negative) error code */ -int __cpuhp_setup_state(enum cpuhp_state state, - const char *name, bool invoke, - int (*startup)(unsigned int cpu), - int (*teardown)(unsigned int cpu), - bool multi_instance) +int __cpuhp_setup_state_cpuslocked(enum cpuhp_state state, + const char *name, bool invoke, + int (*startup)(unsigned int cpu), + int (*teardown)(unsigned int cpu), + bool multi_instance) { int cpu, ret = 0; bool dynstate; @@ -1484,7 +1485,6 @@ int __cpuhp_setup_state(enum cpuhp_state state, if (cpuhp_cb_check(state) || !name) return -EINVAL; - get_online_cpus(); mutex_lock(&cpuhp_state_mutex); ret = cpuhp_store_callbacks(state, name, startup, teardown, @@ -1520,7 +1520,6 @@ int __cpuhp_setup_state(enum cpuhp_state state, } out: mutex_unlock(&cpuhp_state_mutex); - put_online_cpus(); /* * If the requested state is CPUHP_AP_ONLINE_DYN, return the * dynamically allocated state in case of success. 
@@ -1529,6 +1528,22 @@ out: return state; return ret; } +EXPORT_SYMBOL(__cpuhp_setup_state_cpuslocked); + +int __cpuhp_setup_state(enum cpuhp_state state, + const char *name, bool invoke, + int (*startup)(unsigned int cpu), + int (*teardown)(unsigned int cpu), + bool multi_instance) +{ + int ret; + + get_online_cpus(); + ret = __cpuhp_setup_state_cpuslocked(state, name, invoke, startup, + teardown, multi_instance); + put_online_cpus(); + return ret; +} EXPORT_SYMBOL(__cpuhp_setup_state); int __cpuhp_state_remove_instance(enum cpuhp_state state, @@ -1570,23 +1585,22 @@ remove: EXPORT_SYMBOL_GPL(__cpuhp_state_remove_instance); /** - * __cpuhp_remove_state - Remove the callbacks for an hotplug machine state + * __cpuhp_remove_state_cpuslocked - Remove the callbacks for an hotplug machine state * @state: The state to remove * @invoke: If true, the teardown function is invoked for cpus where * cpu state >= @state * + * The caller needs to hold get_online_cpus() while calling this function. * The teardown callback is currently not allowed to fail. Think * about module removal! */ -void __cpuhp_remove_state(enum cpuhp_state state, bool invoke) +void __cpuhp_remove_state_cpuslocked(enum cpuhp_state state, bool invoke) { struct cpuhp_step *sp = cpuhp_get_step(state); int cpu; BUG_ON(cpuhp_cb_check(state)); - get_online_cpus(); - mutex_lock(&cpuhp_state_mutex); if (sp->multi_instance) { WARN(!hlist_empty(&sp->list), @@ -1613,6 +1627,13 @@ void __cpuhp_remove_state(enum cpuhp_state state, bool invoke) remove: cpuhp_store_callbacks(state, NULL, NULL, NULL, false); mutex_unlock(&cpuhp_state_mutex); +} +EXPORT_SYMBOL(__cpuhp_remove_state_cpuslocked); + +void __cpuhp_remove_state(enum cpuhp_state state, bool invoke) +{ + get_online_cpus(); + __cpuhp_remove_state_cpuslocked(state, invoke); put_online_cpus(); } EXPORT_SYMBOL(__cpuhp_remove_state);
Commit-ID: e878ce0f69caa5c37f3183af711291c0c6a38d6a Gitweb: http://git.kernel.org/tip/e878ce0f69caa5c37f3183af711291c0c6a38d6a Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:44 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:50 +0200 stop_machine: Provide stop_machine_cpuslocked() Some call sites of stop_machine() are within a get_online_cpus() protected region. stop_machine() calls get_online_cpus() as well, which is possible in the current implementation but prevents converting the hotplug locking to a percpu rwsem. Provide stop_machine_cpuslocked() to avoid nested calls to get_online_cpus(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170418170552.526243990@linutronix.de --- include/linux/stop_machine.h | 26 +++++++++++++++++++++++--- kernel/stop_machine.c | 5 +++-- 2 files changed, 26 insertions(+), 5 deletions(-) diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h index 3cc9632..8a068bd 100644 --- a/include/linux/stop_machine.h +++ b/include/linux/stop_machine.h @@ -116,15 +116,29 @@ static inline int try_stop_cpus(const struct cpumask *cpumask, * @fn() runs. * * This can be thought of as a very heavy write lock, equivalent to - * grabbing every spinlock in the kernel. */ + * grabbing every spinlock in the kernel. + * + * Protects against CPU hotplug. + */ int stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus); +/** + * stop_machine_cpuslocked: freeze the machine on all CPUs and run this function + * @fn: the function to run + * @data: the data ptr for the @fn() + * @cpus: the cpus to run the @fn() on (NULL = any online cpu) + * + * Same as above. Must be called from with in a get_online_cpus() protected + * region. 
Avoids nested calls to get_online_cpus(). + */ +int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus); + int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus); #else /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */ -static inline int stop_machine(cpu_stop_fn_t fn, void *data, - const struct cpumask *cpus) +static inline int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data, + const struct cpumask *cpus) { unsigned long flags; int ret; @@ -134,6 +148,12 @@ static inline int stop_machine(cpu_stop_fn_t fn, void *data, return ret; } +static inline int stop_machine(cpu_stop_fn_t fn, void *data, + const struct cpumask *cpus) +{ + return stop_machine_cpuslocked(fn, data, cpus); +} + static inline int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus) { diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index 1eb8266..a8c5636 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -552,7 +552,8 @@ static int __init cpu_stop_init(void) } early_initcall(cpu_stop_init); -static int __stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus) +int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data, + const struct cpumask *cpus) { struct multi_stop_data msdata = { .fn = fn, @@ -591,7 +592,7 @@ int stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus) /* No CPUs can come up or down during this. */ get_online_cpus(); - ret = __stop_machine(fn, data, cpus); + ret = stop_machine_cpuslocked(fn, data, cpus); put_online_cpus(); return ret; }
Commit-ID: a792e10c4cb42b8508364c7e187caac2409e6166 Gitweb: http://git.kernel.org/tip/a792e10c4cb42b8508364c7e187caac2409e6166 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:45 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:51 +0200 padata: Make padata_alloc() static No users outside of padata.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-crypto@vger.kernel.org Link: http://lkml.kernel.org/r/20170418170552.619297328@linutronix.de --- include/linux/padata.h | 3 --- kernel/padata.c | 32 ++++++++++++++++---------------- 2 files changed, 16 insertions(+), 19 deletions(-) diff --git a/include/linux/padata.h b/include/linux/padata.h index 0f9e567..2f9c1f9 100644 --- a/include/linux/padata.h +++ b/include/linux/padata.h @@ -166,9 +166,6 @@ struct padata_instance { extern struct padata_instance *padata_alloc_possible( struct workqueue_struct *wq); -extern struct padata_instance *padata_alloc(struct workqueue_struct *wq, - const struct cpumask *pcpumask, - const struct cpumask *cbcpumask); extern void padata_free(struct padata_instance *pinst); extern int padata_do_parallel(struct padata_instance *pinst, struct padata_priv *padata, int cb_cpu); diff --git a/kernel/padata.c b/kernel/padata.c index 3202aa1..18992bf 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -939,19 +939,6 @@ static struct kobj_type padata_attr_type = { }; /** - * padata_alloc_possible - Allocate and initialize padata instance. - * Use the cpu_possible_mask for serial and - * parallel workers. 
- * - * @wq: workqueue to use for the allocated padata instance - */ -struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq) -{ - return padata_alloc(wq, cpu_possible_mask, cpu_possible_mask); -} -EXPORT_SYMBOL(padata_alloc_possible); - -/** * padata_alloc - allocate and initialize a padata instance and specify * cpumasks for serial and parallel workers. * @@ -959,9 +946,9 @@ EXPORT_SYMBOL(padata_alloc_possible); * @pcpumask: cpumask that will be used for padata parallelization * @cbcpumask: cpumask that will be used for padata serialization */ -struct padata_instance *padata_alloc(struct workqueue_struct *wq, - const struct cpumask *pcpumask, - const struct cpumask *cbcpumask) +static struct padata_instance *padata_alloc(struct workqueue_struct *wq, + const struct cpumask *pcpumask, + const struct cpumask *cbcpumask) { struct padata_instance *pinst; struct parallel_data *pd = NULL; @@ -1016,6 +1003,19 @@ err: } /** + * padata_alloc_possible - Allocate and initialize padata instance. + * Use the cpu_possible_mask for serial and + * parallel workers. + * + * @wq: workqueue to use for the allocated padata instance + */ +struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq) +{ + return padata_alloc(wq, cpu_possible_mask, cpu_possible_mask); +} +EXPORT_SYMBOL(padata_alloc_possible); + +/** * padata_free - free a padata instance * * @padata_inst: padata instance to free
Commit-ID: bbd8797975ce0191ac72d575f7e34033b5a5f620 Gitweb: http://git.kernel.org/tip/bbd8797975ce0191ac72d575f7e34033b5a5f620 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:46 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:51 +0200 padata: Avoid nested calls to get_online_cpus() in pcrypt_init_padata()

pcrypt_init_padata()
  get_online_cpus()
  padata_alloc_possible()
    padata_alloc()
      get_online_cpus()

The nested call to get_online_cpus() works with the current implementation, but prevents the conversion to a percpu rwsem. The other caller of padata_alloc_possible() is pcrypt_init_padata(), which calls it from a get_online_cpus() protected region as well. Remove the get_online_cpus() call in padata_alloc() and document the calling convention. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-crypto@vger.kernel.org Link: http://lkml.kernel.org/r/20170418170552.697398655@linutronix.de --- kernel/padata.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/padata.c b/kernel/padata.c index 18992bf..e5ff511 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -945,6 +945,8 @@ static struct kobj_type padata_attr_type = { * @wq: workqueue to use for the allocated padata instance * @pcpumask: cpumask that will be used for padata parallelization * @cbcpumask: cpumask that will be used for padata serialization + * + * Must be called from a get_online_cpus() protected region */ static struct padata_instance *padata_alloc(struct workqueue_struct *wq, const struct cpumask *pcpumask, @@ -957,7 +959,6 @@ static struct padata_instance *padata_alloc(struct workqueue_struct *wq, if (!pinst) goto err; - get_online_cpus(); if (!alloc_cpumask_var(&pinst->cpumask.pcpu,
GFP_KERNEL)) goto err_free_inst; if (!alloc_cpumask_var(&pinst->cpumask.cbcpu, GFP_KERNEL)) { @@ -997,7 +998,6 @@ err_free_masks: free_cpumask_var(pinst->cpumask.cbcpu); err_free_inst: kfree(pinst); - put_online_cpus(); err: return NULL; } @@ -1008,6 +1008,8 @@ err: * parallel workers. * * @wq: workqueue to use for the allocated padata instance + * + * Must be called from a get_online_cpus() protected region */ struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq) {
Commit-ID: 9ef0b62e7059e3ea85e2e5c4894bc2cb78b04eed Gitweb: http://git.kernel.org/tip/9ef0b62e7059e3ea85e2e5c4894bc2cb78b04eed Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:47 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:51 +0200 x86/mtrr: Remove get_online_cpus() from mtrr_save_state() mtrr_save_state() is invoked from native_cpu_up() which is in the context of a CPU hotplug operation and therefore calling get_online_cpus() is pointless. While this works in the current get_online_cpus() implementation it prevents converting the hotplug locking to percpu rwsems. Remove it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170418170552.775914295@linutronix.de --- arch/x86/kernel/cpu/mtrr/main.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c index 24e87e7..8fd1f8d 100644 --- a/arch/x86/kernel/cpu/mtrr/main.c +++ b/arch/x86/kernel/cpu/mtrr/main.c @@ -807,10 +807,8 @@ void mtrr_save_state(void) if (!mtrr_enabled()) return; - get_online_cpus(); first_cpu = cpumask_first(cpu_online_mask); smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1); - put_online_cpus(); } void set_mtrr_aps_delayed_init(void)
Commit-ID: 6d7d2f1f9e6e0c2d6fb9de82ad5fbc3aed115829 Gitweb: http://git.kernel.org/tip/6d7d2f1f9e6e0c2d6fb9de82ad5fbc3aed115829 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:48 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:52 +0200 cpufreq: Use cpuhp_setup_state_nocalls_cpuslocked() cpufreq holds get_online_cpus() while invoking cpuhp_setup_state_nocalls() to make subsys_interface_register() and the registration of hotplug calls atomic versus cpu hotplug. cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use cpuhp_setup/remove_state_nocalls_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linux-pm@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170418170552.865966021@linutronix.de --- drivers/cpufreq/cpufreq.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 0e3f649..67ef7ce 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2473,9 +2473,10 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data) goto err_if_unreg; } - ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "cpufreq:online", - cpuhp_cpufreq_online, - cpuhp_cpufreq_offline); + ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN, + "cpufreq:online", + cpuhp_cpufreq_online, + cpuhp_cpufreq_offline); if (ret < 0) goto err_if_unreg; hp_online = ret; @@ -2519,7 +2520,7 @@ int cpufreq_unregister_driver(struct cpufreq_driver *driver) get_online_cpus(); 
subsys_interface_unregister(&cpufreq_interface); remove_boost_sysfs_file(); - cpuhp_remove_state_nocalls(hp_online); + cpuhp_remove_state_nocalls_cpuslocked(hp_online); write_lock_irqsave(&cpufreq_driver_lock, flags);
Commit-ID: 2fd231725b835f38e70fbda0842de57e330cd147 Gitweb: http://git.kernel.org/tip/2fd231725b835f38e70fbda0842de57e330cd147 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:49 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:52 +0200 KVM/PPC/Book3S HV: Use cpuhp_setup_state_nocalls_cpuslocked() kvmppc_alloc_host_rm_ops() holds get_online_cpus() while invoking cpuhp_setup_state_nocalls(). cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use cpuhp_setup_state_nocalls_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: kvm-ppc@vger.kernel.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: Alexander Graf <agraf@suse.com> Link: http://lkml.kernel.org/r/20170418170552.946155801@linutronix.de --- arch/powerpc/kvm/book3s_hv.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 1ec86d9..df7117c 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3336,10 +3336,10 @@ void kvmppc_alloc_host_rm_ops(void) return; } - cpuhp_setup_state_nocalls(CPUHP_KVM_PPC_BOOK3S_PREPARE, - "ppc/kvm_book3s:prepare", - kvmppc_set_host_core, - kvmppc_clear_host_core); + cpuhp_setup_state_nocalls_cpuslocked(CPUHP_KVM_PPC_BOOK3S_PREPARE, + "ppc/kvm_book3s:prepare", + kvmppc_set_host_core, + kvmppc_clear_host_core); put_online_cpus(); }
Commit-ID: c8fbca5abedcbf4212a34b7d3aef9c2adc44d36e Gitweb: http://git.kernel.org/tip/c8fbca5abedcbf4212a34b7d3aef9c2adc44d36e Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:50 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:52 +0200 hwtracing/coresight-etm3x: Use cpuhp_setup_state_nocalls_cpuslocked() etm_probe() holds get_online_cpus() while invoking cpuhp_setup_state_nocalls(). cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use cpuhp_setup_state_nocalls_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20170418170553.026838125@linutronix.de --- drivers/hwtracing/coresight/coresight-etm3x.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/hwtracing/coresight/coresight-etm3x.c b/drivers/hwtracing/coresight/coresight-etm3x.c index a51b6b6..c9a0a75 100644 --- a/drivers/hwtracing/coresight/coresight-etm3x.c +++ b/drivers/hwtracing/coresight/coresight-etm3x.c @@ -803,12 +803,12 @@ static int etm_probe(struct amba_device *adev, const struct amba_id *id) dev_err(dev, "ETM arch init failed\n"); if (!etm_count++) { - cpuhp_setup_state_nocalls(CPUHP_AP_ARM_CORESIGHT_STARTING, - "arm/coresight:starting", - etm_starting_cpu, etm_dying_cpu); - ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, - "arm/coresight:online", - etm_online_cpu, NULL); + cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ARM_CORESIGHT_STARTING, + "arm/coresight:starting", + etm_starting_cpu, etm_dying_cpu); + ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN, + 
"arm/coresight:online", + etm_online_cpu, NULL); if (ret < 0) goto err_arch_supported; hp_online = ret;
Commit-ID: 91e555edde960481085a8a69ac32726a9f6df0c9 Gitweb: http://git.kernel.org/tip/91e555edde960481085a8a69ac32726a9f6df0c9 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:51 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:53 +0200 hwtracing/coresight-etm4x: Use cpuhp_setup_state_nocalls_cpuslocked() etm4_probe() holds get_online_cpus() while invoking cpuhp_setup_state_nocalls(). cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use cpuhp_setup_state_nocalls_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20170418170553.108514162@linutronix.de --- drivers/hwtracing/coresight/coresight-etm4x.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index d1340fb..bd3046c 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -990,12 +990,12 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id) dev_err(dev, "ETM arch init failed\n"); if (!etm4_count++) { - cpuhp_setup_state_nocalls(CPUHP_AP_ARM_CORESIGHT_STARTING, - "arm/coresight4:starting", - etm4_starting_cpu, etm4_dying_cpu); - ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, - "arm/coresight4:online", - etm4_online_cpu, NULL); + cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ARM_CORESIGHT_STARTING, + "arm/coresight4:starting", + etm4_starting_cpu, etm4_dying_cpu); + ret =
cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN, + "arm/coresight4:online", + etm4_online_cpu, NULL); if (ret < 0) goto err_arch_supported; hp_online = ret;
Commit-ID: 88840592a3483951be52f3ed35f4c4348dc9eed9 Gitweb: http://git.kernel.org/tip/88840592a3483951be52f3ed35f4c4348dc9eed9 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:52 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:53 +0200 perf/x86/intel/cqm: Use cpuhp_setup_state_cpuslocked() intel_cqm_init() holds get_online_cpus() while registerring the hotplug callbacks. cpuhp_setup_state() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use cpuhp_setup_state_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170418170553.191578891@linutronix.de --- arch/x86/events/intel/cqm.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c index 8c00dc0..2d4a842 100644 --- a/arch/x86/events/intel/cqm.c +++ b/arch/x86/events/intel/cqm.c @@ -1746,12 +1746,12 @@ static int __init intel_cqm_init(void) * Setup the hot cpu notifier once we are sure cqm * is enabled to avoid notifier leak. */ - cpuhp_setup_state(CPUHP_AP_PERF_X86_CQM_STARTING, - "perf/x86/cqm:starting", - intel_cqm_cpu_starting, NULL); - cpuhp_setup_state(CPUHP_AP_PERF_X86_CQM_ONLINE, "perf/x86/cqm:online", - NULL, intel_cqm_cpu_exit); - + cpuhp_setup_state_cpuslocked(CPUHP_AP_PERF_X86_CQM_STARTING, + "perf/x86/cqm:starting", + intel_cqm_cpu_starting, NULL); + cpuhp_setup_state_cpuslocked(CPUHP_AP_PERF_X86_CQM_ONLINE, + "perf/x86/cqm:online", + NULL, intel_cqm_cpu_exit); out: put_online_cpus();
Commit-ID: f29f3fd01356bd8ac1f95ab098e9139ae6c945bb Gitweb: http://git.kernel.org/tip/f29f3fd01356bd8ac1f95ab098e9139ae6c945bb Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:53 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:53 +0200 ARM/hw_breakpoint: Use cpuhp_setup_state_cpuslocked() arch_hw_breakpoint_init() holds get_online_cpus() while registering the hotplug callbacks. cpuhp_setup_state() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use cpuhp_setup_state_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <linux@armlinux.org.uk> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20170418170553.274229815@linutronix.de --- arch/arm/kernel/hw_breakpoint.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c index be3b3fb..2320780 100644 --- a/arch/arm/kernel/hw_breakpoint.c +++ b/arch/arm/kernel/hw_breakpoint.c @@ -1098,8 +1098,9 @@ static int __init arch_hw_breakpoint_init(void) * assume that a halting debugger will leave the world in a nice state * for us. */ - ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm/hw_breakpoint:online", - dbg_reset_online, NULL); + ret = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN, + "arm/hw_breakpoint:online", + dbg_reset_online, NULL); unregister_undef_hook(&debug_reg_hook); if (WARN_ON(ret < 0) || !cpumask_empty(&debug_err_mask)) { core_num_brps = 0;
Commit-ID: 35cc1ac5c24cb497b3b15a9383ff098e4a8bf03c Gitweb: http://git.kernel.org/tip/35cc1ac5c24cb497b3b15a9383ff098e4a8bf03c Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:54 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:54 +0200 s390/kernel: Use stop_machine_cpuslocked() stp_work_fn() holds get_online_cpus() while invoking stop_machine(). stop_machine() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use stop_machine_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-s390@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20170418170553.363283884@linutronix.de --- arch/s390/kernel/time.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c index c31da46..3aa8acb 100644 --- a/arch/s390/kernel/time.c +++ b/arch/s390/kernel/time.c @@ -636,7 +636,7 @@ static void stp_work_fn(struct work_struct *work) memset(&stp_sync, 0, sizeof(stp_sync)); get_online_cpus(); atomic_set(&stp_sync.cpus, num_online_cpus() - 1); - stop_machine(stp_sync_clock, &stp_sync, cpu_online_mask); + stop_machine_cpuslocked(stp_sync_clock, &stp_sync, cpu_online_mask); put_online_cpus(); if (!check_sync_clock())
Commit-ID: 6e9e61778f976b52fbdeb315850ee6eb4695a1e5 Gitweb: http://git.kernel.org/tip/6e9e61778f976b52fbdeb315850ee6eb4695a1e5 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:55 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:54 +0200 powerpc/powernv: Use stop_machine_cpuslocked() set_subcores_per_core() holds get_online_cpus() while invoking stop_machine(). stop_machine() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use stop_machine_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20170418170553.447314339@linutronix.de --- arch/powerpc/platforms/powernv/subcore.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/subcore.c b/arch/powerpc/platforms/powernv/subcore.c index 0babef1..67cbf45 100644 --- a/arch/powerpc/platforms/powernv/subcore.c +++ b/arch/powerpc/platforms/powernv/subcore.c @@ -356,7 +356,8 @@ static int set_subcores_per_core(int new_mode) /* Ensure state is consistent before we call the other cpus */ mb(); - stop_machine(cpu_update_split_mode, &new_mode, cpu_online_mask); + stop_machine_cpuslocked(cpu_update_split_mode, &new_mode, + cpu_online_mask); put_online_cpus();
Commit-ID: cdcb496b285dc8e9c544107d42bd60e5bc2feffa Gitweb: http://git.kernel.org/tip/cdcb496b285dc8e9c544107d42bd60e5bc2feffa Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:56 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:55 +0200 cpu/hotplug: Use stop_machine_cpuslocked() in takedown_cpu() takedown_cpu() is a cpu hotplug function invoking stop_machine(). The cpu hotplug machinery holds the hotplug lock for write. stop_machine() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use stop_machine_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170418170553.531067939@linutronix.de --- kernel/cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index 92e5a91..f932e68 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -701,7 +701,7 @@ static int takedown_cpu(unsigned int cpu) /* * So now all preempt/rcu users must observe !cpu_active(). */ - err = stop_machine(take_cpu_down, NULL, cpumask_of(cpu)); + err = stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu)); if (err) { /* CPU refused to die */ irq_unlock_sparse();
Commit-ID: 1516643d0831ac6807aee360206cf0f0691c5da0 Gitweb: http://git.kernel.org/tip/1516643d0831ac6807aee360206cf0f0691c5da0 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:57 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:55 +0200 x86/perf: Drop EXPORT of perf_check_microcode The only caller is the microcode update, which cannot be modular. Drop the export. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20170418170553.620260279@linutronix.de --- arch/x86/events/core.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 580b60f..ac650d5 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -2224,7 +2224,6 @@ void perf_check_microcode(void) if (x86_pmu.check_microcode) x86_pmu.check_microcode(); } -EXPORT_SYMBOL_GPL(perf_check_microcode); static struct pmu pmu = { .pmu_enable = x86_pmu_enable,
Commit-ID: 494d8679788930b29e1262d92db4c1912bdf9119 Gitweb: http://git.kernel.org/tip/494d8679788930b29e1262d92db4c1912bdf9119 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:04:58 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:55 +0200 perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode() If intel_snb_check_microcode() is invoked via microcode_init -> perf_check_microcode -> intel_snb_check_microcode then get_online_cpus() is invoked nested. This works with the current implementation of get_online_cpus() but prevents converting it to a percpu rwsem. intel_snb_check_microcode() is also invoked from intel_sandybridge_quirk() unprotected. Drop get_online_cpus() from intel_snb_check_microcode() and add it to intel_sandybridge_quirk() so both call sites are protected. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20170418170553.713742599@linutronix.de --- arch/x86/events/intel/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index eb1484c..152b7e4 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -3389,12 +3389,10 @@ static void intel_snb_check_microcode(void) int pebs_broken = 0; int cpu; - get_online_cpus(); for_each_online_cpu(cpu) { if ((pebs_broken = intel_snb_pebs_broken(cpu))) break; } - put_online_cpus(); if (pebs_broken == x86_pmu.pebs_broken) return; @@ -3467,7 +3465,9 @@ static bool check_msr(unsigned long msr, u64 mask) static __init void intel_sandybridge_quirk(void) { x86_pmu.check_microcode = intel_snb_check_microcode; + get_online_cpus(); intel_snb_check_microcode(); + 
put_online_cpus(); } static const struct { int id; char *name; } intel_arch_events_map[] __initconst = {
Commit-ID: b4d1673371196dd9aebdd2f61d946165c777b931
Gitweb: http://git.kernel.org/tip/b4d1673371196dd9aebdd2f61d946165c777b931
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Tue, 18 Apr 2017 19:04:59 +0200
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 20 Apr 2017 13:08:55 +0200

PCI: Use cpu_hotplug_disable() instead of get_online_cpus()

Converting the hotplug locking, i.e. get_online_cpus(), to a percpu rwsem unearthed a circular lock dependency which was hidden from lockdep due to the lockdep annotation of get_online_cpus() which prevents lockdep from creating full dependency chains. There are several variants of this. An example is:

Chain exists of:

cpu_hotplug_lock.rw_sem --> drm_global_mutex --> &item->mutex

 CPU0                    CPU1
 ----                    ----
lock(&item->mutex);
                         lock(drm_global_mutex);
                         lock(&item->mutex);
lock(cpu_hotplug_lock.rw_sem);

because there are dependencies through workqueues. The call chain is:

get_online_cpus
apply_workqueue_attrs
__alloc_workqueue_key
ttm_mem_global_init
ast_ttm_mem_global_init
drm_global_item_ref
ast_mm_init
ast_driver_load
drm_dev_register
drm_get_pci_dev
ast_pci_probe
local_pci_probe
work_for_cpu_fn
process_one_work
worker_thread

This is not a problem of get_online_cpus() recursion, it's a possible deadlock undetected by lockdep so far.

The cure is to use cpu_hotplug_disable() instead of get_online_cpus() to protect the PCI probing.

There is a side effect to this: cpu_hotplug_disable() makes a concurrent cpu hotplug attempt via the sysfs interfaces fail with -EBUSY, but PCI probing usually happens during the boot process where no interaction is possible. Any later invocations are infrequent enough and concurrent hotplug attempts are so unlikely that the danger of user space visible regressions is very close to zero. Anyway, that's preferable over a real deadlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-pci@vger.kernel.org Link: http://lkml.kernel.org/r/20170418170553.806707929@linutronix.de --- drivers/pci/pci-driver.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index afa7271..f00e4d9 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -349,13 +349,13 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, if (node >= 0 && node != numa_node_id()) { int cpu; - get_online_cpus(); + cpu_hotplug_disable(); cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask); if (cpu < nr_cpu_ids) error = work_on_cpu(cpu, local_pci_probe, &ddi); else error = local_pci_probe(&ddi); - put_online_cpus(); + cpu_hotplug_enable(); } else error = local_pci_probe(&ddi);
Commit-ID: 81edd60135c58f893bcc07f8e633ea8efe72084d
Gitweb: http://git.kernel.org/tip/81edd60135c58f893bcc07f8e633ea8efe72084d
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Tue, 18 Apr 2017 19:05:00 +0200
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 20 Apr 2017 13:08:56 +0200

PCI: Replace the racy recursion prevention

pci_call_probe() can be called recursively when a physical function is probed and the probing creates virtual functions, which are populated via pci_bus_add_device(), which in turn can end up calling pci_call_probe() again.

The code has an interesting way to prevent recursing into the workqueue code. That's accomplished by a check whether the current task runs already on the numa node which is associated with the device. While that works to prevent the recursion into the workqueue code, it's racy versus normal execution as there is no guarantee that the node does not vanish after the check.

There is another issue with this code. It dereferences cpumask_of_node() unconditionally without checking whether the node is available.

Make the detection reliable by:

- Marking a probed device as 'is_probed' in pci_call_probe()

- Checking in pci_call_probe() for a virtual function. If it's a virtual function and the associated physical function device is marked 'is_probed' then this is a recursive call, so the call can be invoked in the calling context.

- Adding a check whether the node is online before dereferencing it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-pci@vger.kernel.org Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170418170553.885818800@linutronix.de --- drivers/pci/pci-driver.c | 47 +++++++++++++++++++++++++---------------------- include/linux/pci.h | 1 + 2 files changed, 26 insertions(+), 22 deletions(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index f00e4d9..f84d2a8 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -320,10 +320,19 @@ static long local_pci_probe(void *_ddi) return 0; } +static bool pci_physfn_is_probed(struct pci_dev *dev) +{ +#ifdef CONFIG_PCI_IOV + return dev->is_virtfn && dev->physfn->is_probed; +#else + return false; +#endif +} + static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, const struct pci_device_id *id) { - int error, node; + int error, node, cpu; struct drv_dev_and_id ddi = { drv, dev, id }; /* @@ -332,33 +341,27 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, * on the right node. */ node = dev_to_node(&dev->dev); + dev->is_probed = 1; + + cpu_hotplug_disable(); /* - * On NUMA systems, we are likely to call a PF probe function using - * work_on_cpu(). If that probe calls pci_enable_sriov() (which - * adds the VF devices via pci_bus_add_device()), we may re-enter - * this function to call the VF probe function. Calling - * work_on_cpu() again will cause a lockdep warning. Since VFs are - * always on the same node as the PF, we can work around this by - * avoiding work_on_cpu() when we're already on the correct node. - * - * Preemption is enabled, so it's theoretically unsafe to use - * numa_node_id(), but even if we run the probe function on the - * wrong node, it should be functionally correct. 
+ * Prevent nesting work_on_cpu() for the case where a Virtual Function + * device is probed from work_on_cpu() of the Physical device. */ - if (node >= 0 && node != numa_node_id()) { - int cpu; - - cpu_hotplug_disable(); + if (node < 0 || node >= MAX_NUMNODES || !node_online(node) || + pci_physfn_is_probed(dev)) + cpu = nr_cpu_ids; + else cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask); - if (cpu < nr_cpu_ids) - error = work_on_cpu(cpu, local_pci_probe, &ddi); - else - error = local_pci_probe(&ddi); - cpu_hotplug_enable(); - } else + + if (cpu < nr_cpu_ids) + error = work_on_cpu(cpu, local_pci_probe, &ddi); + else error = local_pci_probe(&ddi); + dev->is_probed = 0; + cpu_hotplug_enable(); return error; } diff --git a/include/linux/pci.h b/include/linux/pci.h index eb3da1a..3efe145 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -365,6 +365,7 @@ struct pci_dev { unsigned int irq_managed:1; unsigned int has_secondary_link:1; unsigned int non_compliant_bars:1; /* broken BARs; ignore them */ + unsigned int is_probed:1; /* device probing in progress */ pci_dev_flags_t dev_flags; atomic_t enable_cnt; /* pci_enable_device has been called */
Commit-ID: 16ab1d27c5040ae369c4652e6ce0a9a82bdbb6c5
Gitweb: http://git.kernel.org/tip/16ab1d27c5040ae369c4652e6ce0a9a82bdbb6c5
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Tue, 18 Apr 2017 19:05:01 +0200
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 20 Apr 2017 13:08:56 +0200

ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus()

Converting the hotplug locking, i.e. get_online_cpus(), to a percpu rwsem unearthed a circular lock dependency which was hidden from lockdep due to the lockdep annotation of get_online_cpus() which prevents lockdep from creating full dependency chains.

 CPU0                    CPU1
 ----                    ----
lock((&wfc.work));
                         lock(cpu_hotplug_lock.rw_sem);
                         lock((&wfc.work));
lock(cpu_hotplug_lock.rw_sem);

This dependency is established via acpi_processor_start() which calls into the work queue code. And the work queue code establishes the reverse dependency.

This is not a problem of get_online_cpus() recursion, it's a possible deadlock undetected by lockdep so far.

The cure is to use cpu_hotplug_disable() instead of get_online_cpus() to protect the probing from acpi_processor_start().

There is a side effect to this: cpu_hotplug_disable() makes a concurrent cpu hotplug attempt via the sysfs interfaces fail with -EBUSY, but that probing usually happens during the boot process where no interaction is possible. Any later invocations are infrequent enough and concurrent hotplug attempts are so unlikely that the danger of user space visible regressions is very close to zero. Anyway, that's preferable over a real deadlock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-acpi@vger.kernel.org
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/20170418170553.964555153@linutronix.de
---
 drivers/acpi/processor_driver.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
index 8697a82..591d1dd 100644
--- a/drivers/acpi/processor_driver.c
+++ b/drivers/acpi/processor_driver.c
@@ -268,9 +268,9 @@ static int acpi_processor_start(struct device *dev)
 		return -ENODEV;
 
 	/* Protect against concurrent CPU hotplug operations */
-	get_online_cpus();
+	cpu_hotplug_disable();
 	ret = __acpi_processor_start(device);
-	put_online_cpus();
+	cpu_hotplug_enable();
 	return ret;
 }
Commit-ID: 3e27bdd5a4fc7954af4027f1a77e9556deed653d Gitweb: http://git.kernel.org/tip/3e27bdd5a4fc7954af4027f1a77e9556deed653d Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:05:02 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:56 +0200 perf/core: Remove redundant get_online_cpus() SyS_perf_event_open() calls get_online_cpus() and eventually invokes swevent_hlist_get() which does it again. All callchains leading to swevent_hlist_get() originate from SyS_perf_event_open() so the extra protection is redundant. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170418170554.043759892@linutronix.de --- kernel/events/core.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index ff01cba..634dd95 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7592,7 +7592,6 @@ static int swevent_hlist_get(void) { int err, cpu, failed_cpu; - get_online_cpus(); for_each_possible_cpu(cpu) { err = swevent_hlist_get_cpu(cpu); if (err) { @@ -7600,8 +7599,6 @@ static int swevent_hlist_get(void) goto fail; } } - put_online_cpus(); - return 0; fail: for_each_possible_cpu(cpu) { @@ -7609,8 +7606,6 @@ fail: break; swevent_hlist_put_cpu(cpu); } - - put_online_cpus(); return err; }
Commit-ID: d215aab82d81974f438bfbc80aa437132f3c37c3 Gitweb: http://git.kernel.org/tip/d215aab82d81974f438bfbc80aa437132f3c37c3 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Tue, 18 Apr 2017 19:05:06 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 20 Apr 2017 13:08:58 +0200 cpu/hotplug: Convert hotplug locking to percpu rwsem There are no more (known) nested calls to get_online_cpus() so it's possible to remove the nested call magic and convert the mutex to a percpu-rwsem, which speeds up get/put_online_cpus() significantly for the uncontended case. The contended case (write locked for hotplug operations) is slow anyway, so the slightly more expensive down_write of the percpu rwsem does not matter. [ peterz: Add lockdep assertions ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170418170554.382344438@linutronix.de --- include/linux/cpu.h | 2 + kernel/cpu.c | 110 ++++++++------------------------------------------ kernel/jump_label.c | 2 + kernel/padata.c | 1 + kernel/stop_machine.c | 2 + 5 files changed, 23 insertions(+), 94 deletions(-) diff --git a/include/linux/cpu.h b/include/linux/cpu.h index f920812..83010c3 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -105,6 +105,7 @@ extern void cpu_hotplug_begin(void); extern void cpu_hotplug_done(void); extern void get_online_cpus(void); extern void put_online_cpus(void); +extern void lockdep_assert_hotplug_held(void); extern void cpu_hotplug_disable(void); extern void cpu_hotplug_enable(void); void clear_tasks_mm_cpumask(int cpu); @@ -118,6 +119,7 @@ static inline void cpu_hotplug_done(void) {} #define put_online_cpus() do { } while (0) #define cpu_hotplug_disable() do { } while (0) #define cpu_hotplug_enable() do { } while (0) +static inline void lockdep_assert_hotplug_held(void) {} #endif /* 
CONFIG_HOTPLUG_CPU */ #ifdef CONFIG_PM_SLEEP_SMP diff --git a/kernel/cpu.c b/kernel/cpu.c index f932e68..05341f7 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -27,6 +27,7 @@ #include <linux/smpboot.h> #include <linux/relay.h> #include <linux/slab.h> +#include <linux/percpu-rwsem.h> #include <trace/events/power.h> #define CREATE_TRACE_POINTS @@ -196,121 +197,41 @@ void cpu_maps_update_done(void) mutex_unlock(&cpu_add_remove_lock); } -/* If set, cpu_up and cpu_down will return -EBUSY and do nothing. +/* + * If set, cpu_up and cpu_down will return -EBUSY and do nothing. * Should always be manipulated under cpu_add_remove_lock */ static int cpu_hotplug_disabled; #ifdef CONFIG_HOTPLUG_CPU -static struct { - struct task_struct *active_writer; - /* wait queue to wake up the active_writer */ - wait_queue_head_t wq; - /* verifies that no writer will get active while readers are active */ - struct mutex lock; - /* - * Also blocks the new readers during - * an ongoing cpu hotplug operation. - */ - atomic_t refcount; - -#ifdef CONFIG_DEBUG_LOCK_ALLOC - struct lockdep_map dep_map; -#endif -} cpu_hotplug = { - .active_writer = NULL, - .wq = __WAIT_QUEUE_HEAD_INITIALIZER(cpu_hotplug.wq), - .lock = __MUTEX_INITIALIZER(cpu_hotplug.lock), -#ifdef CONFIG_DEBUG_LOCK_ALLOC - .dep_map = STATIC_LOCKDEP_MAP_INIT("cpu_hotplug.dep_map", &cpu_hotplug.dep_map), -#endif -}; - -/* Lockdep annotations for get/put_online_cpus() and cpu_hotplug_begin/end() */ -#define cpuhp_lock_acquire_read() lock_map_acquire_read(&cpu_hotplug.dep_map) -#define cpuhp_lock_acquire_tryread() \ - lock_map_acquire_tryread(&cpu_hotplug.dep_map) -#define cpuhp_lock_acquire() lock_map_acquire(&cpu_hotplug.dep_map) -#define cpuhp_lock_release() lock_map_release(&cpu_hotplug.dep_map) - +DEFINE_STATIC_PERCPU_RWSEM(cpu_hotplug_lock); void get_online_cpus(void) { - might_sleep(); - if (cpu_hotplug.active_writer == current) - return; - cpuhp_lock_acquire_read(); - mutex_lock(&cpu_hotplug.lock); - 
atomic_inc(&cpu_hotplug.refcount); - mutex_unlock(&cpu_hotplug.lock); + percpu_down_read(&cpu_hotplug_lock); } EXPORT_SYMBOL_GPL(get_online_cpus); void put_online_cpus(void) { - int refcount; - - if (cpu_hotplug.active_writer == current) - return; - - refcount = atomic_dec_return(&cpu_hotplug.refcount); - if (WARN_ON(refcount < 0)) /* try to fix things up */ - atomic_inc(&cpu_hotplug.refcount); - - if (refcount <= 0 && waitqueue_active(&cpu_hotplug.wq)) - wake_up(&cpu_hotplug.wq); - - cpuhp_lock_release(); - + percpu_up_read(&cpu_hotplug_lock); } EXPORT_SYMBOL_GPL(put_online_cpus); -/* - * This ensures that the hotplug operation can begin only when the - * refcount goes to zero. - * - * Note that during a cpu-hotplug operation, the new readers, if any, - * will be blocked by the cpu_hotplug.lock - * - * Since cpu_hotplug_begin() is always called after invoking - * cpu_maps_update_begin(), we can be sure that only one writer is active. - * - * Note that theoretically, there is a possibility of a livelock: - * - Refcount goes to zero, last reader wakes up the sleeping - * writer. - * - Last reader unlocks the cpu_hotplug.lock. - * - A new reader arrives at this moment, bumps up the refcount. - * - The writer acquires the cpu_hotplug.lock finds the refcount - * non zero and goes to sleep again. - * - * However, this is very difficult to achieve in practice since - * get_online_cpus() not an api which is called all that often. 
- * - */ void cpu_hotplug_begin(void) { - DEFINE_WAIT(wait); - - cpu_hotplug.active_writer = current; - cpuhp_lock_acquire(); - - for (;;) { - mutex_lock(&cpu_hotplug.lock); - prepare_to_wait(&cpu_hotplug.wq, &wait, TASK_UNINTERRUPTIBLE); - if (likely(!atomic_read(&cpu_hotplug.refcount))) - break; - mutex_unlock(&cpu_hotplug.lock); - schedule(); - } - finish_wait(&cpu_hotplug.wq, &wait); + percpu_down_write(&cpu_hotplug_lock); } void cpu_hotplug_done(void) { - cpu_hotplug.active_writer = NULL; - mutex_unlock(&cpu_hotplug.lock); - cpuhp_lock_release(); + percpu_up_write(&cpu_hotplug_lock); +} + +void lockdep_assert_hotplug_held(void) +{ + percpu_rwsem_assert_held(&cpu_hotplug_lock); } /* @@ -344,8 +265,6 @@ void cpu_hotplug_enable(void) EXPORT_SYMBOL_GPL(cpu_hotplug_enable); #endif /* CONFIG_HOTPLUG_CPU */ -/* Notifier wrappers for transitioning to state machine */ - static int bringup_wait_for_ap(unsigned int cpu) { struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu); @@ -1482,6 +1401,8 @@ int __cpuhp_setup_state_cpuslocked(enum cpuhp_state state, int cpu, ret = 0; bool dynstate; + lockdep_assert_hotplug_held(); + if (cpuhp_cb_check(state) || !name) return -EINVAL; @@ -1600,6 +1521,7 @@ void __cpuhp_remove_state_cpuslocked(enum cpuhp_state state, bool invoke) int cpu; BUG_ON(cpuhp_cb_check(state)); + lockdep_assert_hotplug_held(); mutex_lock(&cpuhp_state_mutex); if (sp->multi_instance) { diff --git a/kernel/jump_label.c b/kernel/jump_label.c index 308b12e..9455d2b 100644 --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -130,6 +130,7 @@ void __static_key_slow_inc(struct static_key *key) * the all CPUs, for that to be serialized against CPU hot-plug * we need to avoid CPUs coming online. 
*/ + lockdep_assert_hotplug_held(); jump_label_lock(); if (atomic_read(&key->enabled) == 0) { atomic_set(&key->enabled, -1); @@ -158,6 +159,7 @@ EXPORT_SYMBOL_GPL(static_key_slow_inc_cpuslocked); static void __static_key_slow_dec(struct static_key *key, unsigned long rate_limit, struct delayed_work *work) { + lockdep_assert_hotplug_held(); /* * The negative count check is valid even when a negative * key->enabled is in use by static_key_slow_inc(); a diff --git a/kernel/padata.c b/kernel/padata.c index e5ff511..dc74ddba 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -1013,6 +1013,7 @@ err: */ struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq) { + lockdep_assert_hotplug_held(); return padata_alloc(wq, cpu_possible_mask, cpu_possible_mask); } EXPORT_SYMBOL(padata_alloc_possible); diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index a8c5636..2390340 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -562,6 +562,8 @@ int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data, .active_cpus = cpus, }; + lockdep_assert_hotplug_held(); + if (!stop_machine_initialized) { /* * Handle the case where stop_machine() is called
On 18 April 2017 at 11:04, Thomas Gleixner <tglx@linutronix.de> wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> etm_probe() holds get_online_cpus() while invoking
> cpuhp_setup_state_nocalls().
>
> cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is
> correct, but prevents the conversion of the hotplug locking to a percpu
> rwsem.
>
> Use cpuhp_setup_state_nocalls_cpuslocked() to avoid the nested call.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: linux-arm-kernel@lists.infradead.org
>
> ---
> drivers/hwtracing/coresight/coresight-etm3x.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> --- a/drivers/hwtracing/coresight/coresight-etm3x.c
> +++ b/drivers/hwtracing/coresight/coresight-etm3x.c
> @@ -803,12 +803,12 @@ static int etm_probe(struct amba_device
> dev_err(dev, "ETM arch init failed\n");
>
> if (!etm_count++) {
> - cpuhp_setup_state_nocalls(CPUHP_AP_ARM_CORESIGHT_STARTING,
> - "arm/coresight:starting",
> - etm_starting_cpu, etm_dying_cpu);
> - ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> - "arm/coresight:online",
> - etm_online_cpu, NULL);
> + cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ARM_CORESIGHT_STARTING,
> + "arm/coresight:starting",
> + etm_starting_cpu, etm_dying_cpu);
> + ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN,
> + "arm/coresight:online",
> + etm_online_cpu, NULL);
> if (ret < 0)
> goto err_arch_supported;
> hp_online = ret;
>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
On Tue, Apr 18, 2017 at 07:04:50PM +0200, Thomas Gleixner wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> etm_probe() holds get_online_cpus() while invoking
> cpuhp_setup_state_nocalls().
>
> cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is
> correct, but prevents the conversion of the hotplug locking to a percpu
> rwsem.
>
> Use cpuhp_setup_state_nocalls_cpuslocked() to avoid the nested call.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: linux-arm-kernel@lists.infradead.org
>
> ---
> drivers/hwtracing/coresight/coresight-etm3x.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> --- a/drivers/hwtracing/coresight/coresight-etm3x.c
> +++ b/drivers/hwtracing/coresight/coresight-etm3x.c
> @@ -803,12 +803,12 @@ static int etm_probe(struct amba_device
> dev_err(dev, "ETM arch init failed\n");
>
> if (!etm_count++) {
> - cpuhp_setup_state_nocalls(CPUHP_AP_ARM_CORESIGHT_STARTING,
> - "arm/coresight:starting",
> - etm_starting_cpu, etm_dying_cpu);
> - ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> - "arm/coresight:online",
> - etm_online_cpu, NULL);
> + cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ARM_CORESIGHT_STARTING,
> + "arm/coresight:starting",
> + etm_starting_cpu, etm_dying_cpu);
> + ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN,
> + "arm/coresight:online",
> + etm_online_cpu, NULL);
Checkpatch complains about lines being over 80 characters long. Is it better to
keep the above, or to do as follows and make checkpatch happy?
cpuhp_setup_state_nocalls_cpuslocked(
CPUHP_AP_ARM_CORESIGHT_STARTING,
"arm/coresight:starting",
etm_starting_cpu, etm_dying_cpu);
ret = cpuhp_setup_state_nocalls_cpuslocked(
CPUHP_AP_ONLINE_DYN,
"arm/coresight:online",
etm_online_cpu, NULL);
Same for coresight-etm4x.c
Thanks,
Mathieu
Hi, This series appears to break boot on some arm64 platforms, seen with next-20170424. More info below. On Tue, Apr 18, 2017 at 07:04:42PM +0200, Thomas Gleixner wrote: > get_online_cpus() is used in hot pathes in mainline and even more so in > RT. That can show up badly under certain conditions because every locker > contends on a global mutex. RT has it's own homebrewn mitigation which is > an (badly done) open coded implementation of percpu_rwsems with recursion > support. > > The proper replacement for that are percpu_rwsems, but that requires to > remove recursion support. > > The conversion unearthed real locking issues which were previously not > visible because the get_online_cpus() lockdep annotation was implemented > with recursion support which prevents lockdep from tracking full dependency > chains. These potential deadlocks are not related to recursive calls, they > trigger on the first invocation because lockdep now has the full dependency > chains available. Catalin spotted next-20170424 wouldn't boot on a Juno system, where we see the following splat (repeated forever) when we try to bring up the first secondary CPU: [ 0.213406] smp: Bringing up secondary CPUs ... 
[ 0.250326] CPU features: enabling workaround for ARM erratum 832075 [ 0.250334] BUG: scheduling while atomic: swapper/1/0/0x00000002 [ 0.250337] Modules linked in: [ 0.250346] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc7-next-20170424 #2 [ 0.250349] Hardware name: ARM Juno development board (r1) (DT) [ 0.250353] Call trace: [ 0.250365] [<ffff000008088510>] dump_backtrace+0x0/0x238 [ 0.250371] [<ffff00000808880c>] show_stack+0x14/0x20 [ 0.250377] [<ffff00000839d854>] dump_stack+0x9c/0xc0 [ 0.250384] [<ffff0000080e3540>] __schedule_bug+0x50/0x70 [ 0.250391] [<ffff000008932ecc>] __schedule+0x52c/0x5a8 [ 0.250395] [<ffff000008932f80>] schedule+0x38/0xa0 [ 0.250400] [<ffff000008935e8c>] rwsem_down_read_failed+0xc4/0x108 [ 0.250407] [<ffff0000080fe8e0>] __percpu_down_read+0x100/0x118 [ 0.250414] [<ffff0000080c0b60>] get_online_cpus+0x70/0x78 [ 0.250420] [<ffff0000081749e8>] static_key_enable+0x28/0x48 [ 0.250425] [<ffff00000808de90>] update_cpu_capabilities+0x78/0xf8 [ 0.250430] [<ffff00000808d14c>] update_cpu_errata_workarounds+0x1c/0x28 [ 0.250435] [<ffff00000808e004>] check_local_cpu_capabilities+0xf4/0x128 [ 0.250440] [<ffff00000808e894>] secondary_start_kernel+0x8c/0x118 [ 0.250444] [<000000008093d1b4>] 0x8093d1b4 I can reproduce this with the current head of the linux-tip smp/hotplug branch (commit 77c60400c82bd993), with arm64 defconfig on a Juno R1 system. When we bring the secondary CPU online, we detect an erratum that wasn't present on the boot CPU, and try to enable a static branch we use to track the erratum. The call to static_branch_enable() blows up as above. I see that we now have static_branch_disable_cpuslocked(), but we don't have an equivalent for enable. I'm not sure what we should be doing here. Thanks, Mark. > The following patch series addresses this by > > - Cleaning up places which call get_online_cpus() nested > > - Replacing a few instances with cpu_hotplug_disable() to prevent circular > locking dependencies. 
> > The series depends on > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core > plus > Linus tree merged in to avoid conflicts > > It's available in git from > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.hotplug > > Changes since V1: > > - Fixed fallout reported by kbuild bot > - Repaired the recursive call in perf > - Repaired the interaction with jumplabels (Peter Zijlstra) > - Renamed _locked to _cpuslocked > - Picked up Acked-bys > > Thanks, > > tglx > > ------- > arch/arm/kernel/hw_breakpoint.c | 5 > arch/mips/kernel/jump_label.c | 2 > arch/powerpc/kvm/book3s_hv.c | 8 - > arch/powerpc/platforms/powernv/subcore.c | 3 > arch/s390/kernel/time.c | 2 > arch/x86/events/core.c | 1 > arch/x86/events/intel/cqm.c | 12 - > arch/x86/kernel/cpu/mtrr/main.c | 2 > b/arch/sparc/kernel/jump_label.c | 2 > b/arch/tile/kernel/jump_label.c | 2 > b/arch/x86/events/intel/core.c | 4 > b/arch/x86/kernel/jump_label.c | 2 > b/kernel/jump_label.c | 31 ++++- > drivers/acpi/processor_driver.c | 4 > drivers/cpufreq/cpufreq.c | 9 - > drivers/hwtracing/coresight/coresight-etm3x.c | 12 - > drivers/hwtracing/coresight/coresight-etm4x.c | 12 - > drivers/pci/pci-driver.c | 47 ++++--- > include/linux/cpu.h | 2 > include/linux/cpuhotplug.h | 29 ++++ > include/linux/jump_label.h | 3 > include/linux/padata.h | 3 > include/linux/pci.h | 1 > include/linux/stop_machine.h | 26 +++- > kernel/cpu.c | 157 ++++++++------------------ > kernel/events/core.c | 9 - > kernel/padata.c | 39 +++--- > kernel/stop_machine.c | 7 - > 28 files changed, 228 insertions(+), 208 deletions(-) > > >
On 2017-04-25 17:10:37 [+0100], Mark Rutland wrote:
> Hi,

Hi,

> When we bring the secondary CPU online, we detect an erratum that wasn't
> present on the boot CPU, and try to enable a static branch we use to
> track the erratum. The call to static_branch_enable() blows up as above.

this (cpus_set_cap()) seems only to be used in the CPU up path.

> I see that we now have static_branch_disable_cpuslocked(), but we don't
> have an equivalent for enable. I'm not sure what we should be doing
> here.

We should introduce static_branch_enable_cpuslocked(). Does this work for you (after s/static_branch_enable/static_branch_enable_cpuslocked/ in cpus_set_cap()) ?:

>From fe004351e4649da037d5165247e89ab976fe4a26 Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Tue, 25 Apr 2017 18:43:00 +0200
Subject: [PATCH] jump_label: Provide static_key_[enable|slow_inc]_cpuslocked()

Provide static_key_[enable|slow_inc]_cpuslocked() variants that don't take cpu_hotplug_lock().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- include/linux/jump_label.h | 7 +++++++ kernel/jump_label.c | 10 ++++++++++ 2 files changed, 17 insertions(+) diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index d7b17d1ab875..c80d8b1279b5 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -164,6 +164,7 @@ extern void static_key_slow_dec_cpuslocked(struct static_key *key); extern void jump_label_apply_nops(struct module *mod); extern int static_key_count(struct static_key *key); extern void static_key_enable(struct static_key *key); +extern void static_key_enable_cpuslocked(struct static_key *key); extern void static_key_disable(struct static_key *key); extern void static_key_disable_cpuslocked(struct static_key *key); @@ -252,6 +253,11 @@ static inline void static_key_enable(struct static_key *key) static_key_slow_inc(key); } +static inline void static_key_enable_cpuslocked(struct static_key *key) +{ + static_key_enable(key); +} + static inline void static_key_disable(struct static_key *key) { int count = static_key_count(key); @@ -429,6 +435,7 @@ extern bool ____wrong_branch_error(void); */ #define static_branch_enable(x) static_key_enable(&(x)->key) +#define static_branch_enable_cpuslocked(x) static_key_enable_cpuslocked(&(x)->key) #define static_branch_disable(x) static_key_disable(&(x)->key) #define static_branch_disable_cpuslocked(x) static_key_disable_cpuslocked(&(x)->key) diff --git a/kernel/jump_label.c b/kernel/jump_label.c index d71124ee3b14..6343f4c7e27f 100644 --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -90,6 +90,16 @@ void static_key_enable(struct static_key *key) } EXPORT_SYMBOL_GPL(static_key_enable); +void static_key_enable_cpuslocked(struct static_key *key) +{ + int count = static_key_count(key); + + WARN_ON_ONCE(count < 0 || count > 1); + + if (!count) + static_key_slow_inc_cpuslocked(key); +} + void static_key_disable(struct static_key *key) { int count = 
static_key_count(key); -- 2.11.0 > Thanks, > Mark. Sebastian
On Tue, Apr 25, 2017 at 07:28:38PM +0200, Sebastian Siewior wrote:
> On 2017-04-25 17:10:37 [+0100], Mark Rutland wrote:
> > When we bring the secondary CPU online, we detect an erratum that wasn't
> > present on the boot CPU, and try to enable a static branch we use to
> > track the erratum. The call to static_branch_enable() blows up as above.
>
> this (cpus_set_cap()) seems only to be used in the CPU up path.
>
> > I see that we now have static_branch_disable_cpuslocked(), but we don't
> > have an equivalent for enable. I'm not sure what we should be doing
> > here.
>
> We should introduce static_branch_enable_cpuslocked(). Does this work
> for you (after s/static_branch_enable/static_branch_enable_cpuslocked/
> in cpus_set_cap()) ?:

The patch you linked worked for me, given the below patch for arm64 to make use of static_branch_enable_cpuslocked().

Catalin/Will, are you happy for this to go via the tip tree with the other hotplug locking changes?

Thanks,
Mark.

---->8----
>From 03c98baf0a9fcdfb87145bfb3f108d49a721dad6 Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland@arm.com>
Date: Wed, 26 Apr 2017 09:46:47 +0100
Subject: [PATCH] arm64: cpufeature: use static_branch_enable_cpuslocked()

Recently, the hotplug locking was converted to use a percpu rwsem. Unlike the existing {get,put}_online_cpus() logic, this can't nest. Unfortunately, in arm64's secondary boot path we can end up nesting via static_branch_enable() in cpus_set_cap() when we detect an erratum. This leads to a stream of messages as below, where the secondary attempts to schedule before it has been fully onlined. As the CPU orchestrating the onlining holds the rwsem, this hangs the system.
[ 0.250334] BUG: scheduling while atomic: swapper/1/0/0x00000002 [ 0.250337] Modules linked in: [ 0.250346] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc7-next-20170424 #2 [ 0.250349] Hardware name: ARM Juno development board (r1) (DT) [ 0.250353] Call trace: [ 0.250365] [<ffff000008088510>] dump_backtrace+0x0/0x238 [ 0.250371] [<ffff00000808880c>] show_stack+0x14/0x20 [ 0.250377] [<ffff00000839d854>] dump_stack+0x9c/0xc0 [ 0.250384] [<ffff0000080e3540>] __schedule_bug+0x50/0x70 [ 0.250391] [<ffff000008932ecc>] __schedule+0x52c/0x5a8 [ 0.250395] [<ffff000008932f80>] schedule+0x38/0xa0 [ 0.250400] [<ffff000008935e8c>] rwsem_down_read_failed+0xc4/0x108 [ 0.250407] [<ffff0000080fe8e0>] __percpu_down_read+0x100/0x118 [ 0.250414] [<ffff0000080c0b60>] get_online_cpus+0x70/0x78 [ 0.250420] [<ffff0000081749e8>] static_key_enable+0x28/0x48 [ 0.250425] [<ffff00000808de90>] update_cpu_capabilities+0x78/0xf8 [ 0.250430] [<ffff00000808d14c>] update_cpu_errata_workarounds+0x1c/0x28 [ 0.250435] [<ffff00000808e004>] check_local_cpu_capabilities+0xf4/0x128 [ 0.250440] [<ffff00000808e894>] secondary_start_kernel+0x8c/0x118 [ 0.250444] [<000000008093d1b4>] 0x8093d1b4 We only call cpus_set_cap() in the secondary boot path, where we know that the rwsem is held by the thread orchestrating the onlining. Thus, we can use the new static_branch_enable_cpuslocked() in cpus_set_cap(), avoiding the above. 
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Catalin Marinas <catalin.marinas@arm.com> Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Suzuki Poulose <suzuki,poulose@arm.com> --- arch/arm64/include/asm/cpufeature.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index f31c48d..349b5cd 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -145,7 +145,7 @@ static inline void cpus_set_cap(unsigned int num) num, ARM64_NCAPS); } else { __set_bit(num, cpu_hwcaps); - static_branch_enable(&cpu_hwcap_keys[num]); + static_branch_enable_cpuslocked(&cpu_hwcap_keys[num]); } } -- 1.9.1
On 26/04/17 09:59, Mark Rutland wrote:
> On Tue, Apr 25, 2017 at 07:28:38PM +0200, Sebastian Siewior wrote:
>> On 2017-04-25 17:10:37 [+0100], Mark Rutland wrote:
>>> When we bring the secondary CPU online, we detect an erratum that wasn't
>>> present on the boot CPU, and try to enable a static branch we use to
>>> track the erratum. The call to static_branch_enable() blows up as above.
>>
>> this (cpus_set_cap()) seems only to be used in CPU up part.
>>
>>> I see that we now have static_branch_disable_cpuslocked(), but we don't
>>> have an equivalent for enable. I'm not sure what we should be doing
>>> here.
>>
>> We should introduce static_branch_enable_cpuslocked(). Does this work
>> for you (after s/static_branch_enable/static_branch_enable_cpuslocked/
>> in cpus_set_cap()) ?:
>
> The patch you linked worked for me, given the below patch for arm64 to
> make use of static_branch_enable_cpuslocked().
>
> Catalin/Will, are you happy for this to go via the tip tree with the
> other hotplug locking changes?
>
> Thanks,
> Mark.
>
> ---->8----
> From 03c98baf0a9fcdfb87145bfb3f108d49a721dad6 Mon Sep 17 00:00:00 2001
> From: Mark Rutland <mark.rutland@arm.com>
> Date: Wed, 26 Apr 2017 09:46:47 +0100
> Subject: [PATCH] arm64: cpufeature: use static_branch_enable_cpuslocked()
>
> Recently, the hotplug locking was conveted to use a percpu rwsem. Unlike
> the existing {get,put}_online_cpus() logic, this can't nest.
> Unfortunately, in arm64's secondary boot path we can end up nesting via
> static_branch_enable() in cpus_set_cap() when we detect an erratum.
>
> This leads to a stream of messages as below, where the secondary
> attempts to schedule befroe it has been fully onlined. As the CPU

nit: s/befroe/before/

> orchsetrating the onlining holds the rswem, this hangs the system.

s/orchsetrating/orchestrating/

>
> [    0.250334] BUG: scheduling while atomic: swapper/1/0/0x00000002
> [    0.250337] Modules linked in:
> [    0.250346] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc7-next-20170424 #2
> [    0.250349] Hardware name: ARM Juno development board (r1) (DT)
> [    0.250353] Call trace:
> [    0.250365] [<ffff000008088510>] dump_backtrace+0x0/0x238
> [    0.250371] [<ffff00000808880c>] show_stack+0x14/0x20
> [    0.250377] [<ffff00000839d854>] dump_stack+0x9c/0xc0
> [    0.250384] [<ffff0000080e3540>] __schedule_bug+0x50/0x70
> [    0.250391] [<ffff000008932ecc>] __schedule+0x52c/0x5a8
> [    0.250395] [<ffff000008932f80>] schedule+0x38/0xa0
> [    0.250400] [<ffff000008935e8c>] rwsem_down_read_failed+0xc4/0x108
> [    0.250407] [<ffff0000080fe8e0>] __percpu_down_read+0x100/0x118
> [    0.250414] [<ffff0000080c0b60>] get_online_cpus+0x70/0x78
> [    0.250420] [<ffff0000081749e8>] static_key_enable+0x28/0x48
> [    0.250425] [<ffff00000808de90>] update_cpu_capabilities+0x78/0xf8
> [    0.250430] [<ffff00000808d14c>] update_cpu_errata_workarounds+0x1c/0x28
> [    0.250435] [<ffff00000808e004>] check_local_cpu_capabilities+0xf4/0x128
> [    0.250440] [<ffff00000808e894>] secondary_start_kernel+0x8c/0x118
> [    0.250444] [<000000008093d1b4>] 0x8093d1b4
>
> We only call cpus_set_cap() in the secondary boot path, where we know
> that the rwsem is held by the thread orchestrating the onlining. Thus,
> we can use the new static_branch_enable_cpuslocked() in cpus_set_cap(),
> avoiding the above.

Correction, we could call cpus_set_cap() from setup_cpu_features() for
updating the system global capabilities, where we may not hold the
cpu_hotplug_lock. So we could end up calling
static_branch_enable_cpuslocked() without actually holding the lock.
Should we do a cpu_hotplug_begin/done in setup_cpu_feature_capabilities?
I agree it doesn't look that nice. Thoughts?

Suzuki

>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Reported-by: Catalin Marinas <catalin.marinas@arm.com>
> Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Suzuki Poulose <suzuki,poulose@arm.com>
> ---
>  arch/arm64/include/asm/cpufeature.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index f31c48d..349b5cd 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -145,7 +145,7 @@ static inline void cpus_set_cap(unsigned int num)
>  			num, ARM64_NCAPS);
>  	} else {
>  		__set_bit(num, cpu_hwcaps);
> -		static_branch_enable(&cpu_hwcap_keys[num]);
> +		static_branch_enable_cpuslocked(&cpu_hwcap_keys[num]);
>  	}
>  }
>
On Wed, Apr 26, 2017 at 10:40:47AM +0100, Suzuki K Poulose wrote:
> On 26/04/17 09:59, Mark Rutland wrote:
> > We only call cpus_set_cap() in the secondary boot path, where we know
> > that the rwsem is held by the thread orchestrating the onlining. Thus,
> > we can use the new static_branch_enable_cpuslocked() in cpus_set_cap(),
> > avoiding the above.
>
> Correction, we could call cpus_set_cap() from setup_cpu_features()
> for updating the system global capabilities, where we may not hold the
> cpu_hotplug_lock.

Argh, yes, I missed that when scanning.

> So we could end up calling static_branch_enable_cpuslocked()
> without actually holding the lock. Should we do a cpu_hotplug_begin/done in
> setup_cpu_feature_capabilities ? I agree it doesn't look that nice. Thoughts ?

I agree that's hideous, but it looks like the only choice given the
hotplug rwsem changes. :/

I can spin a v2 with that and the typo fixes.

Thanks,
Mark.

> Suzuki
>
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Reported-by: Catalin Marinas <catalin.marinas@arm.com>
> > Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Suzuki Poulose <suzuki,poulose@arm.com>
> > ---
> >  arch/arm64/include/asm/cpufeature.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> > index f31c48d..349b5cd 100644
> > --- a/arch/arm64/include/asm/cpufeature.h
> > +++ b/arch/arm64/include/asm/cpufeature.h
> > @@ -145,7 +145,7 @@ static inline void cpus_set_cap(unsigned int num)
> >  			num, ARM64_NCAPS);
> >  	} else {
> >  		__set_bit(num, cpu_hwcaps);
> > -		static_branch_enable(&cpu_hwcap_keys[num]);
> > +		static_branch_enable_cpuslocked(&cpu_hwcap_keys[num]);
> >  	}
> >  }
On 2017-04-26 11:32:36 [+0100], Mark Rutland wrote:
> > So we could end up calling static_branch_enable_cpuslocked()
> > without actually holding the lock. Should we do a cpu_hotplug_begin/done in
> > setup_cpu_feature_capabilities ? I agree it doesn't look that nice. Thoughts ?
>
> I agree that's hideous, but it looks like the only choice given the
> hotplug rwsem changes. :/

would it work for you to provide a locked and unlocked version?

> I can spin a v2 with that and the typo fixes.
>
> Thanks,
> Mark.

Sebastian
On Thu, Apr 27, 2017 at 10:27:20AM +0200, Sebastian Siewior wrote:
> On 2017-04-26 11:32:36 [+0100], Mark Rutland wrote:
> > > So we could end up calling static_branch_enable_cpuslocked()
> > > without actually holding the lock. Should we do a cpu_hotplug_begin/done in
> > > setup_cpu_feature_capabilities ? I agree it doesn't look that nice. Thoughts ?
> >
> > I agree that's hideous, but it looks like the only choice given the
> > hotplug rwsem changes. :/
>
> would it work for you to provide a locked and unlocked version?

Maybe. Today we have:

// rwsem unlocked
start_kernel()
->smp_prepare_boot_cpu()
-->update_cpu_errata_workarounds()
--->update_cpu_capabilities()

// rwsem locked (by other CPU)
secondary_start_kernel()
->check_local_cpu_capabilities()
-->update_cpu_errata_workarounds()
--->update_cpu_capabilities()

With the common chain:

update_cpu_capabilities()
->cpus_set_cap()
-->static_branch_enable()

... so we could add a update_cpu_capabilities{,_cpuslocked}(), and say
that cpus_set_cap() expects the hotplug rwsem to be locked, as per the
below diff.

Thoughts?

Mark.
---->8----
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index f31c48d..7341579 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -145,7 +145,7 @@ static inline void cpus_set_cap(unsigned int num)
 			num, ARM64_NCAPS);
 	} else {
 		__set_bit(num, cpu_hwcaps);
-		static_branch_enable(&cpu_hwcap_keys[num]);
+		static_branch_enable_cpuslocked(&cpu_hwcap_keys[num]);
 	}
 }
 
@@ -217,8 +217,22 @@ static inline bool id_aa64pfr0_32bit_el0(u64 pfr0)
 
 void __init setup_cpu_features(void);
 
-void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
-			     const char *info);
+void __update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
+			       const char *info, bool cpuslocked);
+static inline void
+update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
+			const char *info)
+{
+	__update_cpu_capabilities(caps, info, false);
+}
+
+static inline void
+update_cpu_capabilities_cpuslocked(const struct arm64_cpu_capabilities *caps,
+				   const char *info)
+{
+	__update_cpu_capabilities(caps, info, true);
+}
+
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
 void check_local_cpu_capabilities(void);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index abda8e8..ae8ddc1 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -956,8 +956,8 @@ static void __init setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps)
 		cap_set_elf_hwcap(hwcaps);
 }
 
-void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
-			     const char *info)
+void __update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
+			       const char *info, bool cpuslocked)
 {
 	for (; caps->matches; caps++) {
 		if (!caps->matches(caps, caps->def_scope))
@@ -965,7 +965,14 @@ void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
 
 		if (!cpus_have_cap(caps->capability) && caps->desc)
 			pr_info("%s %s\n", info, caps->desc);
-		cpus_set_cap(caps->capability);
+
+		if (cpuslocked) {
+			cpus_set_cap(caps->capability);
+		} else {
+			get_online_cpus();
+			cpus_set_cap(caps->capability);
+			put_online_cpus();
+		}
 	}
 }
On Thu, 27 Apr 2017, Mark Rutland wrote:
> On Thu, Apr 27, 2017 at 10:27:20AM +0200, Sebastian Siewior wrote:
> > On 2017-04-26 11:32:36 [+0100], Mark Rutland wrote:
> > > > So we could end up calling static_branch_enable_cpuslocked()
> > > > without actually holding the lock. Should we do a cpu_hotplug_begin/done in
> > > > setup_cpu_feature_capabilities ? I agree it doesn't look that nice. Thoughts ?
> > >
> > > I agree that's hideous, but it looks like the only choice given the
> > > hotplug rwsem changes. :/
> >
> > would it work for you to provide a locked and unlocked version?
>
> Maybe. Today we have:
>
> // rwsem unlocked
> start_kernel()
> ->smp_prepare_boot_cpu()
> -->update_cpu_errata_workarounds()
> --->update_cpu_capabilities()
>
> // rwsem locked (by other CPU)
> secondary_start_kernel()
> ->check_local_cpu_capabilities()
> -->update_cpu_errata_workarounds()
> --->update_cpu_capabilities()
>
> With the common chain:
>
> update_cpu_capabilities()
> ->cpus_set_cap()
> -->static_branch_enable()
>
> ... so we could add a update_cpu_capabilities{,_cpuslocked}(), and say
> that cpus_set_cap() expects the hotplug rwsem to be locked, as per the
> below diff.
You can just take the rwsem in smp_prepare_boot_cpu(), so you don't need
that conditional thingy at all. Hmm?
Thanks,
tglx
On Thu, Apr 27, 2017 at 12:01:57PM +0200, Thomas Gleixner wrote:
> On Thu, 27 Apr 2017, Mark Rutland wrote:
>
> > On Thu, Apr 27, 2017 at 10:27:20AM +0200, Sebastian Siewior wrote:
> > > On 2017-04-26 11:32:36 [+0100], Mark Rutland wrote:
> > > > > So we could end up calling static_branch_enable_cpuslocked()
> > > > > without actually holding the lock. Should we do a cpu_hotplug_begin/done in
> > > > > setup_cpu_feature_capabilities ? I agree it doesn't look that nice. Thoughts ?
> > > >
> > > > I agree that's hideous, but it looks like the only choice given the
> > > hotplug rwsem changes. :/
> > >
> > > would it work for you to provide a locked and unlocked version?
> >
> > Maybe. Today we have:
> >
> > // rwsem unlocked
> > start_kernel()
> > ->smp_prepare_boot_cpu()
> > -->update_cpu_errata_workarounds()
> > --->update_cpu_capabilities()
> >
> > // rwsem locked (by other CPU)
> > secondary_start_kernel()
> > ->check_local_cpu_capabilities()
> > -->update_cpu_errata_workarounds()
> > --->update_cpu_capabilities()
> >
> > With the common chain:
> >
> > update_cpu_capabilities()
> > ->cpus_set_cap()
> > -->static_branch_enable()
> >
> > ... so we could add a update_cpu_capabilities{,_cpuslocked}(), and say
> > that cpus_set_cap() expects the hotplug rwsem to be locked, as per the
> > below diff.
>
> You can just take the rwsem in smp_prepare_boot_cpu(), so you don't need
> that conditional thingy at all. Hmm?
True.
Given it's a bit further up the callchain, it's probably worth a
comment, but it will work.
I'll spin a v3 to that effect shortly.
Thanks,
Mark.
Hi Catalin/Will,

The below addresses a boot failure Catalin spotted in next-20170424,
based on Sebastian's patch [1]. I've given it a spin on Juno R1, where I
can reproduce the issue prior to applying this patch.

I believe this would need to go via tip, as the issue is a result of a
change in the tip smp/hotplug branch [2], and the fix depends on
infrastructure introduced there.

Are you happy with the fix, and for it to go via the tip tree?

Thanks,
Mark.

[1] https://lkml.kernel.org/r/20170425172838.mr3kyccsdteyjso5@linutronix.de
[2] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=smp/hotplug

---->8----
From 6cdb503b060f74743769c9f601c35f985d3c58eb Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland@arm.com>
Date: Wed, 26 Apr 2017 09:46:47 +0100
Subject: [PATCH] arm64: cpufeature: use static_branch_enable_cpuslocked()

Recently, the hotplug locking was converted to use a percpu rwsem.
Unlike the existing {get,put}_online_cpus() logic, this can't nest.
Unfortunately, in arm64's secondary boot path we can end up nesting via
static_branch_enable() in cpus_set_cap() when we detect an erratum.

This leads to a stream of messages as below, where the secondary
attempts to schedule before it has been fully onlined. As the CPU
orchestrating the onlining holds the rwsem, this hangs the system.

[    0.250334] BUG: scheduling while atomic: swapper/1/0/0x00000002
[    0.250337] Modules linked in:
[    0.250346] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc7-next-20170424 #2
[    0.250349] Hardware name: ARM Juno development board (r1) (DT)
[    0.250353] Call trace:
[    0.250365] [<ffff000008088510>] dump_backtrace+0x0/0x238
[    0.250371] [<ffff00000808880c>] show_stack+0x14/0x20
[    0.250377] [<ffff00000839d854>] dump_stack+0x9c/0xc0
[    0.250384] [<ffff0000080e3540>] __schedule_bug+0x50/0x70
[    0.250391] [<ffff000008932ecc>] __schedule+0x52c/0x5a8
[    0.250395] [<ffff000008932f80>] schedule+0x38/0xa0
[    0.250400] [<ffff000008935e8c>] rwsem_down_read_failed+0xc4/0x108
[    0.250407] [<ffff0000080fe8e0>] __percpu_down_read+0x100/0x118
[    0.250414] [<ffff0000080c0b60>] get_online_cpus+0x70/0x78
[    0.250420] [<ffff0000081749e8>] static_key_enable+0x28/0x48
[    0.250425] [<ffff00000808de90>] update_cpu_capabilities+0x78/0xf8
[    0.250430] [<ffff00000808d14c>] update_cpu_errata_workarounds+0x1c/0x28
[    0.250435] [<ffff00000808e004>] check_local_cpu_capabilities+0xf4/0x128
[    0.250440] [<ffff00000808e894>] secondary_start_kernel+0x8c/0x118
[    0.250444] [<000000008093d1b4>] 0x8093d1b4

We call cpus_set_cap() from update_cpu_capabilities(), which is called
from the secondary boot path (where the CPU orchestrating the onlining
holds the hotplug rwsem), and in the primary boot path, where this is
not held.

This patch makes cpus_set_cap() use static_branch_enable_cpuslocked(),
and updates the primary CPU boot path to hold the rwsem so as to keep
the *_cpuslocked() code happy.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Suzuki Poulose <suzuki,poulose@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 2 +-
 arch/arm64/kernel/smp.c             | 8 +++++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index f31c48d..349b5cd 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -145,7 +145,7 @@ static inline void cpus_set_cap(unsigned int num)
 			num, ARM64_NCAPS);
 	} else {
 		__set_bit(num, cpu_hwcaps);
-		static_branch_enable(&cpu_hwcap_keys[num]);
+		static_branch_enable_cpuslocked(&cpu_hwcap_keys[num]);
 	}
 }
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 9b10365..c2ce9aa 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -447,11 +447,13 @@ void __init smp_prepare_boot_cpu(void)
 	cpuinfo_store_boot_cpu();
 	save_boot_cpu_run_el();
 	/*
-	 * Run the errata work around checks on the boot CPU, once we have
-	 * initialised the cpu feature infrastructure from
-	 * cpuinfo_store_boot_cpu() above.
+	 * Run the errata work around checks on the boot CPU, now that
+	 * cpuinfo_store_boot_cpu() has set things up. We hold the percpu rwsem
+	 * to keep the workaround setup code happy.
 	 */
+	get_online_cpus();
 	update_cpu_errata_workarounds();
+	put_online_cpus();
 }
 
 static u64 __init of_get_cpu_mpidr(struct device_node *dn)
-- 
1.9.1
On Thu, Apr 27, 2017 at 04:48:06PM +0100, Mark Rutland wrote:
> Hi Catalin/Will,
>
> The below addresses a boot failure Catalin spotted in next-20170424,
> based on Sebastian's patch [1]. I've given it a spin on Juno R1, where I
> can reproduce the issue prior to applying this patch.
>
> I believe this would need to go via tip, as the issue is a result of a
> change in the tip smp/hotplug branch [2], and the fix depends on
> infrastructure introduced there.
>
> Are you happy with the fix, and for it to go via the tip tree?
>
> Thanks,
> Mark.
>
> [1] https://lkml.kernel.org/r/20170425172838.mr3kyccsdteyjso5@linutronix.de
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=smp/hotplug
>
> ---->8----
> From 6cdb503b060f74743769c9f601c35f985d3c58eb Mon Sep 17 00:00:00 2001
> From: Mark Rutland <mark.rutland@arm.com>
> Date: Wed, 26 Apr 2017 09:46:47 +0100
> Subject: [PATCH] arm64: cpufeature: use static_branch_enable_cpuslocked()
>
> Recently, the hotplug locking was converted to use a percpu rwsem.
> Unlike the existing {get,put}_online_cpus() logic, this can't nest.
> Unfortunately, in arm64's secondary boot path we can end up nesting via
> static_branch_enable() in cpus_set_cap() when we detect an erratum.
>
> This leads to a stream of messages as below, where the secondary
> attempts to schedule before it has been fully onlined. As the CPU
> orchestrating the onlining holds the rwsem, this hangs the system.
>
> [    0.250334] BUG: scheduling while atomic: swapper/1/0/0x00000002
> [    0.250337] Modules linked in:
> [    0.250346] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc7-next-20170424 #2
> [    0.250349] Hardware name: ARM Juno development board (r1) (DT)
> [    0.250353] Call trace:
> [    0.250365] [<ffff000008088510>] dump_backtrace+0x0/0x238
> [    0.250371] [<ffff00000808880c>] show_stack+0x14/0x20
> [    0.250377] [<ffff00000839d854>] dump_stack+0x9c/0xc0
> [    0.250384] [<ffff0000080e3540>] __schedule_bug+0x50/0x70
> [    0.250391] [<ffff000008932ecc>] __schedule+0x52c/0x5a8
> [    0.250395] [<ffff000008932f80>] schedule+0x38/0xa0
> [    0.250400] [<ffff000008935e8c>] rwsem_down_read_failed+0xc4/0x108
> [    0.250407] [<ffff0000080fe8e0>] __percpu_down_read+0x100/0x118
> [    0.250414] [<ffff0000080c0b60>] get_online_cpus+0x70/0x78
> [    0.250420] [<ffff0000081749e8>] static_key_enable+0x28/0x48
> [    0.250425] [<ffff00000808de90>] update_cpu_capabilities+0x78/0xf8
> [    0.250430] [<ffff00000808d14c>] update_cpu_errata_workarounds+0x1c/0x28
> [    0.250435] [<ffff00000808e004>] check_local_cpu_capabilities+0xf4/0x128
> [    0.250440] [<ffff00000808e894>] secondary_start_kernel+0x8c/0x118
> [    0.250444] [<000000008093d1b4>] 0x8093d1b4
>
> We call cpus_set_cap() from update_cpu_capabilities(), which is called
> from the secondary boot path (where the CPU orchestrating the onlining
> holds the hotplug rwsem), and in the primary boot path, where this is
> not held.
>
> This patch makes cpus_set_cap() use static_branch_enable_cpuslocked(),
> and updates the primary CPU boot path to hold the rwsem so as to keep
> the *_cpuslocked() code happy.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Reported-by: Catalin Marinas <catalin.marinas@arm.com>
> Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Suzuki Poulose <suzuki,poulose@arm.com>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> ---
>  arch/arm64/include/asm/cpufeature.h | 2 +-
>  arch/arm64/kernel/smp.c             | 8 +++++---
>  2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index f31c48d..349b5cd 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -145,7 +145,7 @@ static inline void cpus_set_cap(unsigned int num)
>  			num, ARM64_NCAPS);
>  	} else {
>  		__set_bit(num, cpu_hwcaps);
> -		static_branch_enable(&cpu_hwcap_keys[num]);
> +		static_branch_enable_cpuslocked(&cpu_hwcap_keys[num]);
>  	}
>  }
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 9b10365..c2ce9aa 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -447,11 +447,13 @@ void __init smp_prepare_boot_cpu(void)
>  	cpuinfo_store_boot_cpu();
>  	save_boot_cpu_run_el();
>  	/*
> -	 * Run the errata work around checks on the boot CPU, once we have
> -	 * initialised the cpu feature infrastructure from
> -	 * cpuinfo_store_boot_cpu() above.
> +	 * Run the errata work around checks on the boot CPU, now that
> +	 * cpuinfo_store_boot_cpu() has set things up. We hold the percpu rwsem
> +	 * to keep the workaround setup code happy.
>  	 */
> +	get_online_cpus();
>  	update_cpu_errata_workarounds();
> +	put_online_cpus();

We need similar locking for update_cpu_capabilities() called from
setup_cpu_feature_capabilities(), i.e., the following fixup is required.
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 94b8f7f..62bdab4 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -966,6 +966,7 @@ static void __init setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps)
 		cap_set_elf_hwcap(hwcaps);
 }
 
+/* Should be called with CPU hotplug lock held */
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
 			     const char *info)
 {
@@ -1092,7 +1093,9 @@ void check_local_cpu_capabilities(void)
 
 static void __init setup_feature_capabilities(void)
 {
-	update_cpu_capabilities(arm64_features, "detected feature:");
+	get_online_cpus();
+	update_cpu_capabilities(arm6_features, "detected feature:");
+	put_online_cpus();
 	enable_cpu_capabilities(arm64_features);
 }
--

---

Also, I think having update_cpu_errata_workarounds() called with and
without the CPU hotplug lock makes it a bit confusing without proper
explanation. How about the following patch which makes things a bit more
reader friendly?

---8>---
From f3b0809224e4915197d3ae4a38ebe7f210e74abf Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland@arm.com>
Date: Thu, 27 Apr 2017 16:48:06 +0100
Subject: [PATCH] arm64: cpufeature: use static_branch_enable_cpuslocked()

Recently, the hotplug locking was converted to use a percpu rwsem.
Unlike the existing {get,put}_online_cpus() logic, this can't nest.
Unfortunately, in arm64's secondary boot path we can end up nesting via
static_branch_enable() in cpus_set_cap() when we detect an erratum.

This leads to a stream of messages as below, where the secondary
attempts to schedule before it has been fully onlined. As the CPU
orchestrating the onlining holds the rwsem, this hangs the system.

[    0.250334] BUG: scheduling while atomic: swapper/1/0/0x00000002
[    0.250337] Modules linked in:
[    0.250346] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc7-next-20170424 #2
[    0.250349] Hardware name: ARM Juno development board (r1) (DT)
[    0.250353] Call trace:
[    0.250365] [<ffff000008088510>] dump_backtrace+0x0/0x238
[    0.250371] [<ffff00000808880c>] show_stack+0x14/0x20
[    0.250377] [<ffff00000839d854>] dump_stack+0x9c/0xc0
[    0.250384] [<ffff0000080e3540>] __schedule_bug+0x50/0x70
[    0.250391] [<ffff000008932ecc>] __schedule+0x52c/0x5a8
[    0.250395] [<ffff000008932f80>] schedule+0x38/0xa0
[    0.250400] [<ffff000008935e8c>] rwsem_down_read_failed+0xc4/0x108
[    0.250407] [<ffff0000080fe8e0>] __percpu_down_read+0x100/0x118
[    0.250414] [<ffff0000080c0b60>] get_online_cpus+0x70/0x78
[    0.250420] [<ffff0000081749e8>] static_key_enable+0x28/0x48
[    0.250425] [<ffff00000808de90>] update_cpu_capabilities+0x78/0xf8
[    0.250430] [<ffff00000808d14c>] update_cpu_errata_workarounds+0x1c/0x28
[    0.250435] [<ffff00000808e004>] check_local_cpu_capabilities+0xf4/0x128
[    0.250440] [<ffff00000808e894>] secondary_start_kernel+0x8c/0x118
[    0.250444] [<000000008093d1b4>] 0x8093d1b4

We call cpus_set_cap() from update_cpu_capabilities(), which is called
from the secondary boot path (where the CPU orchestrating the onlining
holds the hotplug rwsem), and in the primary boot path, where this is
not held.

This patch makes cpus_set_cap() use static_branch_enable_cpuslocked(),
and updates all the callers of update_cpu_capabilities() consistent with
the change.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Suzuki Poulose <suzuki,poulose@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/cpufeature.h |  5 +++--
 arch/arm64/kernel/cpu_errata.c      | 13 ++++++++++++-
 arch/arm64/kernel/cpufeature.c      |  5 ++++-
 arch/arm64/kernel/smp.c             |  7 +++----
 4 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index e7f84a7..2a832c6 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -145,7 +145,7 @@ static inline void cpus_set_cap(unsigned int num)
 			num, ARM64_NCAPS);
 	} else {
 		__set_bit(num, cpu_hwcaps);
-		static_branch_enable(&cpu_hwcap_keys[num]);
+		static_branch_enable_cpuslocked(&cpu_hwcap_keys[num]);
 	}
 }
 
@@ -222,7 +222,8 @@ void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
 void enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps);
 void check_local_cpu_capabilities(void);
 
-void update_cpu_errata_workarounds(void);
+void update_secondary_cpu_errata_workarounds(void);
+void update_boot_cpu_errata_workarounds(void);
 void __init enable_errata_workarounds(void);
 void verify_local_cpu_errata_workarounds(void);
 
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 2ed2a76..f2d889b 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -190,9 +190,20 @@ void verify_local_cpu_errata_workarounds(void)
 	}
 }
 
-void update_cpu_errata_workarounds(void)
+/*
+ * Secondary CPUs are booted with the waker holding the
+ * CPU hotplug lock, hence we don't need to lock it here again.
+ */
+void update_secondary_cpu_errata_workarounds(void)
+{
+	update_cpu_capabilities(arm64_errata, "enabling workaround for");
+}
+
+void update_boot_cpu_errata_workarounds(void)
 {
+	get_online_cpus();
 	update_cpu_capabilities(arm64_errata, "enabling workaround for");
+	put_online_cpus();
 }
 
 void __init enable_errata_workarounds(void)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 94b8f7f..62bdab4 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -966,6 +966,7 @@ static void __init setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps)
 		cap_set_elf_hwcap(hwcaps);
 }
 
+/* Should be called with CPU hotplug lock held */
 void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
 			     const char *info)
 {
@@ -1092,7 +1093,9 @@ void check_local_cpu_capabilities(void)
 
 static void __init setup_feature_capabilities(void)
 {
-	update_cpu_capabilities(arm64_features, "detected feature:");
+	get_online_cpus();
+	update_cpu_capabilities(arm6_features, "detected feature:");
+	put_online_cpus();
 	enable_cpu_capabilities(arm64_features);
 }
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 6e0e16a..51ba91f 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -450,11 +450,10 @@ void __init smp_prepare_boot_cpu(void)
 	cpuinfo_store_boot_cpu();
 	save_boot_cpu_run_el();
 	/*
-	 * Run the errata work around checks on the boot CPU, once we have
-	 * initialised the cpu feature infrastructure from
-	 * cpuinfo_store_boot_cpu() above.
+	 * Run the errata work around checks on the boot CPU, now that
+	 * cpuinfo_store_boot_cpu() has set things up.
 	 */
-	update_cpu_errata_workarounds();
+	update_boot_cpu_errata_workarounds();
 }
 
 static u64 __init of_get_cpu_mpidr(struct device_node *dn)
-- 
2.7.4
On 27/04/17 17:35, Suzuki K Poulose wrote:
> From f3b0809224e4915197d3ae4a38ebe7f210e74abf Mon Sep 17 00:00:00 2001
> From: Mark Rutland <mark.rutland@arm.com>
> Date: Thu, 27 Apr 2017 16:48:06 +0100
> Subject: [PATCH] arm64: cpufeature: use static_branch_enable_cpuslocked()
>

Build break alert. There are some issues with the patch below.

>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Reported-by: Catalin Marinas <catalin.marinas@arm.com>
> Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Suzuki Poulose <suzuki,poulose@arm.com>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  arch/arm64/include/asm/cpufeature.h |  5 +++--
>  arch/arm64/kernel/cpu_errata.c      | 13 ++++++++++++-
>  arch/arm64/kernel/cpufeature.c      |  5 ++++-
>  arch/arm64/kernel/smp.c             |  7 +++----
>  4 files changed, 22 insertions(+), 8 deletions(-)
>
>  void __init enable_errata_workarounds(void)
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 94b8f7f..62bdab4 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -966,6 +966,7 @@ static void __init setup_elf_hwcaps(const struct arm64_cpu_capabilities *hwcaps)
>  		cap_set_elf_hwcap(hwcaps);
>  }
>
> +/* Should be called with CPU hotplug lock held */
>  void update_cpu_capabilities(const struct arm64_cpu_capabilities *caps,
>  			     const char *info)
>  {
> @@ -1092,7 +1093,9 @@ void check_local_cpu_capabilities(void)
>
>  static void __init setup_feature_capabilities(void)
>  {
> -	update_cpu_capabilities(arm64_features, "detected feature:");
> +	get_online_cpus();
> +	update_cpu_capabilities(arm6_features, "detected feature:");

s/arm6_features/arm64_features

And we need the following hunk:

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 62bdab4..19c359a 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1086,7 +1086,7 @@ void check_local_cpu_capabilities(void)
 	 * advertised capabilities.
 	 */
 	if (!sys_caps_initialised)
-		update_cpu_errata_workarounds();
+		update_secondary_cpu_errata_workarounds();
 	else
 		verify_local_cpu_capabilities();
 }

Sorry about that.

Suzuki
On Thu, Apr 27, 2017 at 06:03:35PM +0100, Suzuki K Poulose wrote:
> On 27/04/17 17:35, Suzuki K Poulose wrote:
> >@@ -1092,7 +1093,9 @@ void check_local_cpu_capabilities(void)
> >
> > static void __init setup_feature_capabilities(void)
> > {
> >-	update_cpu_capabilities(arm64_features, "detected feature:");
> >+	get_online_cpus();
> >+	update_cpu_capabilities(arm6_features, "detected feature:");
>
> s/arm6_features/arm64_features
>
> And we need the following hunk:
>
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 62bdab4..19c359a 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1086,7 +1086,7 @@ void check_local_cpu_capabilities(void)
>  	 * advertised capabilities.
>  	 */
>  	if (!sys_caps_initialised)
> -		update_cpu_errata_workarounds();
> +		update_secondary_cpu_errata_workarounds();
>  	else
>  		verify_local_cpu_capabilities();
>  }
>
> Sorry about that.

No worries; thanks for the fixups. With those this is working for me, so
I'll send this and Sebastian's patch (with Ccs) as a new series.

>
> Suzuki
As trinity figured out, there is a recursive get_online_cpus() in
perf_event_open()'s error path:

| Call Trace:
|  dump_stack+0x86/0xce
|  __lock_acquire+0x2520/0x2cd0
|  lock_acquire+0x27c/0x2f0
|  get_online_cpus+0x3d/0x80
|  static_key_slow_dec+0x5a/0x70
|  sw_perf_event_destroy+0x8e/0x100
|  _free_event+0x61b/0x800
|  free_event+0x68/0x70
|  SyS_perf_event_open+0x19db/0x1d80

To cure this, I am moving free_event() after the put_online_cpus()
block.

Besides that one, there is also the error path in perf_event_alloc()
which invokes event->destroy. Here I delay the destroy work via
schedule_work().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
I am not quite happy with the schedule_work() part.

 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 52 +++++++++++++++++++++++++++++++++-------------
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 24a635887f28..d6a874dbbd21 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -718,6 +718,7 @@ struct perf_event {
 #endif
 
 	struct list_head		sb_list;
+	struct work_struct		destroy_work;
 #endif /* CONFIG_PERF_EVENTS */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7aed78b516fc..3358889609f8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9320,6 +9320,16 @@ static void account_event(struct perf_event *event)
 	account_pmu_sb_event(event);
 }
 
+static void perf_alloc_destroy_ev(struct work_struct *work)
+{
+	struct perf_event *event;
+
+	event = container_of(work, struct perf_event, destroy_work);
+	event->destroy(event);
+	module_put(event->pmu->module);
+	kfree(event);
+}
+
 /*
  * Allocate and initialize a event structure
  */
@@ -9334,6 +9344,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	struct pmu *pmu;
 	struct perf_event *event;
 	struct hw_perf_event *hwc;
+	bool delay_destroy = false;
 	long err = -EINVAL;
 
 	if ((unsigned)cpu >= nr_cpu_ids) {
@@ -9497,15 +9508,22 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	exclusive_event_destroy(event);
 
 err_pmu:
-	if (event->destroy)
-		event->destroy(event);
-	module_put(pmu->module);
+	if (event->destroy) {
+		/* delay ->destroy due to nested get_online_cpus() */
+		INIT_WORK(&event->destroy_work, perf_alloc_destroy_ev);
+		delay_destroy = true;
+	} else {
+		module_put(pmu->module);
+	}
 err_ns:
 	if (is_cgroup_event(event))
 		perf_detach_cgroup(event);
 	if (event->ns)
 		put_pid_ns(event->ns);
-	kfree(event);
+	if (delay_destroy)
+		schedule_work(&event->destroy_work);
+	else
+		kfree(event);
 
 	return ERR_PTR(err);
 }
@@ -9798,7 +9816,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)
 {
 	struct perf_event *group_leader = NULL, *output_event = NULL;
-	struct perf_event *event, *sibling;
+	struct perf_event *event = NULL, *sibling;
 	struct perf_event_attr attr;
 	struct perf_event_context *ctx, *uninitialized_var(gctx);
 	struct file *event_file = NULL;
@@ -9908,13 +9926,14 @@ SYSCALL_DEFINE5(perf_event_open,
 				 NULL, NULL, cgroup_fd);
 	if (IS_ERR(event)) {
 		err = PTR_ERR(event);
+		event = NULL;
 		goto err_cred;
 	}
 
 	if (is_sampling_event(event)) {
 		if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
 			err = -EOPNOTSUPP;
-			goto err_alloc;
+			goto err_cred;
 		}
 	}
 
@@ -9927,7 +9946,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (attr.use_clockid) {
 		err = perf_event_set_clock(event, attr.clockid);
 		if (err)
-			goto err_alloc;
+			goto err_cred;
 	}
 
 	if (pmu->task_ctx_nr == perf_sw_context)
@@ -9962,7 +9981,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	ctx = find_get_context(pmu, task, event);
 	if (IS_ERR(ctx)) {
 		err = PTR_ERR(ctx);
-		goto err_alloc;
+		goto err_cred;
 	}
 
 	if ((pmu->capabilities & PERF_PMU_CAP_EXCLUSIVE) && group_leader) {
@@ -10186,18 +10205,21 @@ SYSCALL_DEFINE5(perf_event_open,
 err_context:
 	perf_unpin_context(ctx);
 	put_ctx(ctx);
-err_alloc:
-	/*
-	 * If event_file is set, the fput() above will have called ->release()
-	 * and that will take care of freeing the event.
-	 */
-	if (!event_file)
-		free_event(event);
 err_cred:
 	if (task)
 		mutex_unlock(&task->signal->cred_guard_mutex);
 err_cpus:
 	put_online_cpus();
+	/*
+	 * The event cleanup should happen earlier (as per cleanup in reverse
+	 * allocation order). It is delayed after the put_online_cpus() section
+	 * so we don't invoke event->destroy in it and risk recursive invocation
+	 * of it via static_key_slow_dec().
+	 * If event_file is set, the fput() above will have called ->release()
+	 * and that will take care of freeing the event.
+	 */
+	if (event && !event_file)
+		free_event(event);
 err_task:
 	if (task)
 		put_task_struct(task);
-- 
2.11.0
With the last patch on-top I trigger this now and then:

======================================================
WARNING: possible circular locking dependency detected
4.11.0-rc8-00894-g8bd462ee4aac-dirty #84 Not tainted
------------------------------------------------------
trinity-subchil/4966 is trying to acquire lock:
 (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812fcdad>] tp_perf_event_destroy+0xd/0x20

but task is already holding lock:
 (&ctx->mutex){+.+.+.}, at: [<ffffffff81316ede>] perf_event_exit_task+0x2ae/0x8d0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&ctx->mutex){+.+.+.}:
       __lock_acquire+0x2534/0x2cd0
       lock_acquire+0x27c/0x2f0
       __mutex_lock+0xef/0x1280
       mutex_lock_nested+0x16/0x20
       SyS_perf_event_open+0x11ab/0x1e80
       entry_SYSCALL_64_fastpath+0x23/0xc2

-> #1 (&sig->cred_guard_mutex){+.+.+.}:
       __lock_acquire+0x2534/0x2cd0
       lock_acquire+0x27c/0x2f0
       __mutex_lock+0xef/0x1280
       mutex_lock_interruptible_nested+0x16/0x20
       SyS_perf_event_open+0x1bd6/0x1e80
       entry_SYSCALL_64_fastpath+0x23/0xc2

-> #0 (cpu_hotplug_lock.rw_sem){++++++}:
       check_prevs_add+0x544/0x18f0
       __lock_acquire+0x2534/0x2cd0
       lock_acquire+0x27c/0x2f0
       get_online_cpus+0x3d/0x80
       tp_perf_event_destroy+0xd/0x20
       _free_event+0x61b/0x800
       free_event+0x68/0x70
       perf_event_exit_task+0x816/0x8d0
       do_exit+0xc90/0x2a90
       do_group_exit+0x1aa/0x2b0
       SyS_exit_group+0x18/0x20
       entry_SYSCALL_64_fastpath+0x23/0xc2

other info that might help us debug this:

Chain exists of:
  cpu_hotplug_lock.rw_sem --> &sig->cred_guard_mutex --> &ctx->mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ctx->mutex);
                               lock(&sig->cred_guard_mutex);
                               lock(&ctx->mutex);
  lock(cpu_hotplug_lock.rw_sem);

 *** DEADLOCK ***

1 lock held by trinity-subchil/4966:
 #0:  (&ctx->mutex){+.+.+.}, at: [<ffffffff81316ede>] perf_event_exit_task+0x2ae/0x8d0

stack backtrace:
CPU: 3 PID: 4966 Comm: trinity-subchil Not tainted 4.11.0-rc8-00894-g8bd462ee4aac-dirty #84
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x86/0xce
 print_circular_bug+0x5c3/0x620
 ? lockdep_on+0x50/0x50
 check_prevs_add+0x544/0x18f0
 ? check_irq_usage+0x150/0x150
 ? lock_acquire+0x27c/0x2f0
 __lock_acquire+0x2534/0x2cd0
 ? __lock_acquire+0x2534/0x2cd0
 ? find_held_lock+0x36/0x1c0
 lock_acquire+0x27c/0x2f0
 ? tp_perf_event_destroy+0xd/0x20
 get_online_cpus+0x3d/0x80
 ? tp_perf_event_destroy+0xd/0x20
 tp_perf_event_destroy+0xd/0x20
 _free_event+0x61b/0x800
 free_event+0x68/0x70
 perf_event_exit_task+0x816/0x8d0
 do_exit+0xc90/0x2a90
 ? mm_update_next_owner+0x550/0x550
 ? getname_flags+0xde/0x370
 ? getname_flags+0xde/0x370
 ? rcu_read_lock_sched_held+0x14a/0x180
 ? prepare_bprm_creds+0xf0/0xf0
 ? kmem_cache_free+0x250/0x2c0
 ? getname_flags+0xde/0x370
 ? entry_SYSCALL_64_fastpath+0x5/0xc2
 do_group_exit+0x1aa/0x2b0
 SyS_exit_group+0x18/0x20
 entry_SYSCALL_64_fastpath+0x23/0xc2

and I am not sure what to do here…

Sebastian
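The splat above is a lock-order inversion: perf_event_open() establishes hotplug → cred_guard → ctx, while the exit path takes ctx → hotplug, closing the cycle. The bookkeeping lockdep does to catch this — record which locks were held when each lock was taken, and complain when a new acquisition contradicts a recorded ordering — can be sketched as a toy in a few lines of C. This is an illustrative model only; real lockdep tracks full transitive chains, IRQ state, and lock classes.

```c
#include <assert.h>

/* Toy lockdep: record "a was held when b was taken" pairs and flag an
 * acquisition that contradicts an already-recorded ordering.
 * Indices: 0 = cpu_hotplug_lock, 1 = cred_guard_mutex, 2 = ctx->mutex. */
enum { NLOCKS = 3 };

static int before[NLOCKS][NLOCKS];	/* before[a][b]: a taken before b */
static int held[NLOCKS];
static int inversion_seen;

static void acquire(int l)
{
	for (int h = 0; h < NLOCKS; h++) {
		if (!held[h])
			continue;
		if (before[l][h])	/* l -> h known, now h -> l: ABBA */
			inversion_seen = 1;
		before[h][l] = 1;
	}
	held[l] = 1;
}

static void release(int l)
{
	held[l] = 0;
}

static int demo(void)
{
	/* sys_perf_event_open(): hotplug -> cred_guard -> ctx */
	acquire(0); acquire(1); acquire(2);
	release(2); release(1); release(0);

	/* exit path: ctx -> hotplug, contradicting the order above */
	acquire(2); acquire(0);
	release(0); release(2);

	return inversion_seen;
}
```

Note that, as the cover letter says, the report fires on the first run of the inverted path — no actual concurrent deadlock has to occur, only the contradictory ordering.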
Commit-ID:  1526eee294dd52b70804aa377579682cc4dcd9ad
Gitweb:     http://git.kernel.org/tip/1526eee294dd52b70804aa377579682cc4dcd9ad
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 1 May 2017 14:35:45 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 1 May 2017 14:54:40 +0200

perf: Reorder cpu hotplug rwsem against cred_guard_mutex

sys_perf_event_open() takes the hotplug rwsem before the cred_guard_mutex.
The exit() path has the reverse lock order. The hotplug protection in
sys_perf_event_open() is not required before taking the cred_guard_mutex,
so it can be reordered there.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/20170428142456.5xh44ef3fv7w2kkh@linutronix.de

---
 kernel/events/core.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 997123c..71d8c74 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9742,12 +9742,10 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_task;
 	}
 
-	get_online_cpus();
-
 	if (task) {
 		err = mutex_lock_interruptible(&task->signal->cred_guard_mutex);
 		if (err)
-			goto err_cpus;
+			goto err_task;
 
 		/*
 		 * Reuse ptrace permission checks for now.
@@ -9765,11 +9763,13 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (flags & PERF_FLAG_PID_CGROUP)
 		cgroup_fd = pid;
 
+	get_online_cpus();
+
 	event = perf_event_alloc(&attr, cpu, task, group_leader, NULL,
 				 NULL, NULL, cgroup_fd);
 	if (IS_ERR(event)) {
 		err = PTR_ERR(event);
-		goto err_cred;
+		goto err_cpus;
 	}
 
 	if (is_sampling_event(event)) {
@@ -10017,13 +10017,13 @@ SYSCALL_DEFINE5(perf_event_open,
 		perf_event_ctx_unlock(group_leader, gctx);
 	mutex_unlock(&ctx->mutex);
 
+	put_online_cpus();
+
 	if (task) {
 		mutex_unlock(&task->signal->cred_guard_mutex);
 		put_task_struct(task);
 	}
 
-	put_online_cpus();
-
 	mutex_lock(&current->perf_event_mutex);
 	list_add_tail(&event->owner_entry, &current->perf_event_list);
 	mutex_unlock(&current->perf_event_mutex);
@@ -10054,11 +10054,11 @@ err_alloc:
 	 */
 	if (!event_file)
 		free_event(event);
+err_cpus:
+	put_online_cpus();
 err_cred:
 	if (task)
 		mutex_unlock(&task->signal->cred_guard_mutex);
-err_cpus:
-	put_online_cpus();
 err_task:
 	if (task)
 		put_task_struct(task);
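The commit above does two things at once: it shrinks the hotplug-locked region so cred_guard_mutex is consistently taken first, and it moves the err_cpus label so the error path releases the locks in strict reverse order of acquisition. That goto-unwind idiom can be sketched standalone; all names below are invented for the illustration, and the two locks are reduced to flags so the acquisition/release order can be logged and checked.

```c
#include <assert.h>
#include <string.h>

/* Goto-unwind sketch: the cred_guard "lock" is taken first, the
 * hotplug "lock" second and only around the region that needs it;
 * on error each label releases exactly what was acquired, in
 * reverse order. */
static int cred_held, cpus_held;
static char order[8];	/* acquisition/release log, for checking */
static int n;

static void note(char c)
{
	order[n++] = c;
}

static int open_path(int fail_at_alloc)
{
	int err = 0;

	cred_held = 1; note('C');	/* cred_guard_mutex first */
	cpus_held = 1; note('H');	/* hotplug lock second */

	if (fail_at_alloc) {		/* perf_event_alloc() failed */
		err = -1;
		goto err_cpus;
	}

	cpus_held = 0; note('h');	/* success: unlock in reverse */
	cred_held = 0; note('c');
	return 0;

err_cpus:
	cpus_held = 0; note('h');
/* err_cred: */
	cred_held = 0; note('c');
	return err;
}
```

Because every exit, success or failure, releases in the same reverse order, the dependency graph only ever records cred_guard → hotplug, and the #1/#2 edges from the earlier splat cannot form a cycle with it.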
Commit-ID:  74b3980ba87e6bdb78694600e234f91fef592dbd
Gitweb:     http://git.kernel.org/tip/74b3980ba87e6bdb78694600e234f91fef592dbd
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 1 May 2017 09:57:10 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 1 May 2017 14:54:41 +0200

perf: Push hotplug protection down to callers

There are various code paths invoked from event init/release which take
the hotplug rwsem. That's either nesting in a region which holds the
hotplug rwsem already or creates reverse lock ordering.

Push the hotplug protection down to the core call sites and remove the
hotplug locking in the various init/destroy functions.

There is a subtle problem with the error exit path in
sys_perf_event_open() where fput() calls the release function from the
hotplug protected region. Solve this by manually releasing the event and
preventing the release function from doing so.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/20170428142456.5xh44ef3fv7w2kkh@linutronix.de

---
 arch/x86/events/intel/ds.c |  7 ++-----
 kernel/events/core.c       | 29 +++++++++++++++++++++++------
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 9dfeeec..f42db0c 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -391,7 +391,7 @@ void release_ds_buffers(void)
 	if (!x86_pmu.bts && !x86_pmu.pebs)
 		return;
 
-	get_online_cpus();
+	lockdep_assert_hotplug_held();
 
 	for_each_online_cpu(cpu)
 		fini_debug_store_on_cpu(cpu);
@@ -400,7 +400,6 @@ void release_ds_buffers(void)
 		release_bts_buffer(cpu);
 		release_ds_buffer(cpu);
 	}
-	put_online_cpus();
 }
 
 void reserve_ds_buffers(void)
@@ -420,7 +419,7 @@ void reserve_ds_buffers(void)
 	if (!x86_pmu.pebs)
 		pebs_err = 1;
 
-	get_online_cpus();
+	lockdep_assert_hotplug_held();
 
 	for_each_possible_cpu(cpu) {
 		if (alloc_ds_buffer(cpu)) {
@@ -461,8 +460,6 @@ void reserve_ds_buffers(void)
 		for_each_online_cpu(cpu)
 			init_debug_store_on_cpu(cpu);
 	}
-
-	put_online_cpus();
 }
 
 /*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 71d8c74..fc39c22 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4337,7 +4337,15 @@ EXPORT_SYMBOL_GPL(perf_event_release_kernel);
  */
static int perf_release(struct inode *inode, struct file *file)
 {
-	perf_event_release_kernel(file->private_data);
+	/*
+	 * The error exit path of sys_perf_event_open() might have released
+	 * the event already and cleared file->private_data.
+	 */
+	if (file->private_data) {
+		get_online_cpus();
+		perf_event_release_kernel(file->private_data);
+		put_online_cpus();
+	}
 	return 0;
 }
 
@@ -7619,7 +7627,7 @@ static void sw_perf_event_destroy(struct perf_event *event)
 
 	WARN_ON(event->parent);
 
-	static_key_slow_dec(&perf_swevent_enabled[event_id]);
+	static_key_slow_dec_cpuslocked(&perf_swevent_enabled[event_id]);
 	swevent_hlist_put();
 }
 
@@ -7783,9 +7791,7 @@ EXPORT_SYMBOL_GPL(perf_tp_event);
 
 static void tp_perf_event_destroy(struct perf_event *event)
 {
-	get_online_cpus();
 	perf_trace_destroy(event);
-	put_online_cpus();
 }
 
 static int perf_tp_event_init(struct perf_event *event)
@@ -10043,6 +10049,12 @@ err_locked:
 		perf_event_ctx_unlock(group_leader, gctx);
 	mutex_unlock(&ctx->mutex);
 /* err_file: */
+	/*
+	 * Release the event manually to avoid hotplug lock recursion in
+	 * perf_release().
+	 */
+	event_file->private_data = NULL;
+	perf_event_release_kernel(event);
 	fput(event_file);
 err_context:
 	perf_unpin_context(ctx);
@@ -10086,10 +10098,10 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 	struct perf_event *event;
 	int err;
 
+	get_online_cpus();
 	/*
 	 * Get the target context (task or percpu):
 	 */
-
 	event = perf_event_alloc(attr, cpu, task, NULL, NULL,
 				 overflow_handler, context, -1);
 	if (IS_ERR(event)) {
@@ -10121,7 +10133,7 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 	perf_install_in_context(ctx, event, cpu);
 	perf_unpin_context(ctx);
 	mutex_unlock(&ctx->mutex);
-
+	put_online_cpus();
 	return event;
 
 err_unlock:
@@ -10131,6 +10143,7 @@ err_unlock:
 err_free:
 	free_event(event);
 err:
+	put_online_cpus();
 	return ERR_PTR(err);
 }
 EXPORT_SYMBOL_GPL(perf_event_create_kernel_counter);
@@ -10364,8 +10377,10 @@ void perf_event_exit_task(struct task_struct *child)
 	}
 	mutex_unlock(&child->perf_event_mutex);
 
+	get_online_cpus();
 	for_each_task_context_nr(ctxn)
 		perf_event_exit_task_context(child, ctxn);
+	put_online_cpus();
 
 	/*
 	 * The perf_event_exit_task_context calls perf_event_task
@@ -10410,6 +10425,7 @@ void perf_event_free_task(struct task_struct *task)
 	struct perf_event *event, *tmp;
 	int ctxn;
 
+	get_online_cpus();
 	for_each_task_context_nr(ctxn) {
 		ctx = task->perf_event_ctxp[ctxn];
 		if (!ctx)
@@ -10434,6 +10450,7 @@ void perf_event_free_task(struct task_struct *task)
 		mutex_unlock(&ctx->mutex);
 		put_ctx(ctx);
 	}
+	put_online_cpus();
}
 
 void perf_event_delayed_put(struct task_struct *task)
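The shift in the commit above is from "callee takes the lock" to "caller must already hold it, callee only asserts" — that is what replacing get_online_cpus() with lockdep_assert_hotplug_held() in release_ds_buffers() expresses. A userspace approximation of that contract, using a thread-local owner flag, is sketched below; all function names are invented for the sketch, and where the kernel helper WARNs on violation this version just returns a bool so it can be checked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace model of lockdep_assert_hotplug_held(): a thread-local
 * flag records whether this thread currently holds the lock, so a
 * callee can verify its caller provided the protection. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static __thread bool hotplug_held;

static void hotplug_acquire(void)
{
	pthread_mutex_lock(&hotplug_lock);
	hotplug_held = true;
}

static void hotplug_release(void)
{
	hotplug_held = false;
	pthread_mutex_unlock(&hotplug_lock);
}

static bool assert_hotplug_held(void)
{
	return hotplug_held;	/* kernel version WARNs when false */
}

/* Callee no longer locks for itself, it only asserts -- the shape
 * release_ds_buffers() takes after the patch. */
static bool release_buffers(void)
{
	return assert_hotplug_held();
}

/* The caller provides the protection, the way perf_event_exit_task()
 * does after the patch. */
static bool caller(void)
{
	bool ok;

	hotplug_acquire();
	ok = release_buffers();
	hotplug_release();
	return ok;
}
```

Pushing the lock to the caller removes both the nesting (callee re-locking inside a locked region) and the ordering hazard (callee taking the lock late, after other locks are held) — at the cost that every call site must now be audited, which is exactly what the assertion automates.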
Thomas Gleixner <tglx@linutronix.de> writes:

> @@ -130,6 +130,7 @@ void __static_key_slow_inc(struct static
>  	 * the all CPUs, for that to be serialized against CPU hot-plug
>  	 * we need to avoid CPUs coming online.
>  	 */
> +	lockdep_assert_hotplug_held();
>  	jump_label_lock();
>  	if (atomic_read(&key->enabled) == 0) {
>  		atomic_set(&key->enabled, -1);

I seem to be hitting this assert from the ftrace event selftests,
enabled at boot with CONFIG_FTRACE_STARTUP_TEST=y, using next-20170509
(on powerpc).

[  842.691191] Testing event rpc_call_status:
[  842.691209] ------------[ cut here ]------------
[  842.691399] WARNING: CPU: 6 PID: 1 at ../kernel/cpu.c:234 lockdep_assert_hotplug_held+0x5c/0x70
[  842.691575] Modules linked in:
[  842.691675] CPU: 6 PID: 1 Comm: swapper/0 Tainted: G        W  4.11.0-gcc-5.4.1-next-20170509 #218
[  842.691865] task: c0000001fe780000 task.stack: c0000001fe800000
[  842.692003] NIP: c0000000000ff3dc LR: c0000000000ff3d0 CTR: c000000000218650
[  842.692166] REGS: c0000001fe8036e0 TRAP: 0700   Tainted: G        W   (4.11.0-gcc-5.4.1-next-20170509)
[  842.692343] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
[  842.692491]   CR: 28002222  XER: 20000000
[  842.692689] CFAR: c000000000171530 SOFTE: 1
GPR00: c0000000000ff3d0 c0000001fe803960 c0000000012b7600 0000000000000000
GPR04: ffffffffffffffff 0000000000000000 c0000000fc10c0e8 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 c0000000f8180008
GPR12: 0000000000002200 c00000000fd42100 c00000000000e218 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 c0000000f9341610
GPR24: c00000000127ee48 c000000000aa49d0 000000000000000a c0000000fc3c0000
GPR28: c00000000117b148 c000000001264230 0000000000000000 c00000000127ee48
[  842.694287] NIP [c0000000000ff3dc] lockdep_assert_hotplug_held+0x5c/0x70
[  842.694434] LR [c0000000000ff3d0] lockdep_assert_hotplug_held+0x50/0x70
[  842.694577] Call Trace:
[  842.694658] [c0000001fe803960] [c0000000000ff3d0] lockdep_assert_hotplug_held+0x50/0x70 (unreliable)
[  842.694876] [c0000001fe803980] [c0000000002a3754] __static_key_slow_inc+0x104/0x170
[  842.695054] [c0000001fe8039f0] [c0000000002176ac] tracepoint_probe_register_prio+0x2dc/0x390
[  842.695258] [c0000001fe803a60] [c00000000024cf50] trace_event_reg+0xe0/0x130
[  842.695434] [c0000001fe803a80] [c00000000024d5f0] __ftrace_event_enable_disable+0x270/0x3e0
[  842.695601] [c0000001fe803b10] [c000000000e20328] event_trace_self_tests+0x14c/0x350
[  842.695778] [c0000001fe803bc0] [c000000000e20774] event_trace_self_tests_init+0xc8/0xf4
[  842.695944] [c0000001fe803c30] [c00000000000d87c] do_one_initcall+0x6c/0x1d0
[  842.696113] [c0000001fe803cf0] [c000000000df462c] kernel_init_freeable+0x304/0x3e4
[  842.696282] [c0000001fe803dc0] [c00000000000e23c] kernel_init+0x2c/0x170
[  842.696460] [c0000001fe803e30] [c00000000000bdec] ret_from_kernel_thread+0x5c/0x70
[  842.696662] Instruction dump:
[  842.696763] 409e0014 38210020 e8010010 7c0803a6 4e800020 3c62ffe6 3880ffff 38634808
[  842.697009] 480720ed 60000000 2fa30000 409effd8 <0fe00000> 38210020 e8010010 7c0803a6
[  842.697271] ---[ end trace f68728a0d30544a1 ]---

The stupidly obvious (or perhaps obviously stupid) patch below fixes it:

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index daefdee9411a..5531f7ce8fa6 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -3241,9 +3241,19 @@ static __init void event_trace_self_tests(void)
 			continue;
 		}
 
+		get_online_cpus();
+		mutex_lock(&event_mutex);
 		ftrace_event_enable_disable(file, 1);
+		mutex_unlock(&event_mutex);
+		put_online_cpus();
+
 		event_test_stuff();
+
+		get_online_cpus();
+		mutex_lock(&event_mutex);
 		ftrace_event_enable_disable(file, 0);
+		mutex_unlock(&event_mutex);
+		put_online_cpus();
 
 		pr_cont("OK\n");
 	}

cheers
On Wed, 10 May 2017, Michael Ellerman wrote:

> Thomas Gleixner <tglx@linutronix.de> writes:
>
> > @@ -130,6 +130,7 @@ void __static_key_slow_inc(struct static
> >  	 * the all CPUs, for that to be serialized against CPU hot-plug
> >  	 * we need to avoid CPUs coming online.
> >  	 */
> > +	lockdep_assert_hotplug_held();
> >  	jump_label_lock();
> >  	if (atomic_read(&key->enabled) == 0) {
> >  		atomic_set(&key->enabled, -1);
>
> I seem to be hitting this assert from the ftrace event selftests,
> enabled at boot with CONFIG_FTRACE_STARTUP_TEST=y, using next-20170509
> (on powerpc).
>
> The stupidly obvious (or perhaps obviously stupid) patch below fixes it:

Kinda. There is more horror in that area lurking and I'm still trying to
figure out all the convoluted call paths.

Thanks,

	tglx

> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index daefdee9411a..5531f7ce8fa6 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -3241,9 +3241,19 @@ static __init void event_trace_self_tests(void)
>  			continue;
>  		}
> 
> +		get_online_cpus();
> +		mutex_lock(&event_mutex);
>  		ftrace_event_enable_disable(file, 1);
> +		mutex_unlock(&event_mutex);
> +		put_online_cpus();
> +
>  		event_test_stuff();
> +
> +		get_online_cpus();
> +		mutex_lock(&event_mutex);
>  		ftrace_event_enable_disable(file, 0);
> +		mutex_unlock(&event_mutex);
> +		put_online_cpus();
> 
>  		pr_cont("OK\n");
>  	}
> 
> cheers
>
On Wed, 10 May 2017 10:49:09 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Wed, 10 May 2017, Michael Ellerman wrote:
>
> > Thomas Gleixner <tglx@linutronix.de> writes:
> >
> > > @@ -130,6 +130,7 @@ void __static_key_slow_inc(struct static
> > >  	 * the all CPUs, for that to be serialized against CPU hot-plug
> > >  	 * we need to avoid CPUs coming online.
> > >  	 */
> > > +	lockdep_assert_hotplug_held();
> > >  	jump_label_lock();
> > >  	if (atomic_read(&key->enabled) == 0) {
> > >  		atomic_set(&key->enabled, -1);
> >
> > I seem to be hitting this assert from the ftrace event selftests,
> > enabled at boot with CONFIG_FTRACE_STARTUP_TEST=y, using next-20170509
> > (on powerpc).
> >
> > The stupidly obvious (or perhaps obviously stupid) patch below fixes it:
>
> Kinda. There is more horror in that area lurking and I'm still trying to
> figure out all the convoluted call paths.

I finally got some time to look at this. I'm looking at your commit:

commit b53e5129c4c7ab47ec4f709fd8f5784ca45fb46d
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sun Apr 23 12:17:13 2017 +0200

    trace/perf: Cure hotplug lock ordering issues

What were the circular locking dependencies that were uncovered?

event_mutex could possibly be broken up, if that helps. It sorta became
a catch all for various modifications to tracing.

-- Steve

>
> Thanks,
>
> 	tglx
>
> > diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> > index daefdee9411a..5531f7ce8fa6 100644
> > --- a/kernel/trace/trace_events.c
> > +++ b/kernel/trace/trace_events.c
> > @@ -3241,9 +3241,19 @@ static __init void event_trace_self_tests(void)
> >  			continue;
> >  		}
> > 
> > +		get_online_cpus();
> > +		mutex_lock(&event_mutex);
> >  		ftrace_event_enable_disable(file, 1);
> > +		mutex_unlock(&event_mutex);
> > +		put_online_cpus();
> > +
> >  		event_test_stuff();
> > +
> > +		get_online_cpus();
> > +		mutex_lock(&event_mutex);
> >  		ftrace_event_enable_disable(file, 0);
> > +		mutex_unlock(&event_mutex);
> > +		put_online_cpus();
> > 
> >  		pr_cont("OK\n");
> >  	}
> > 
> > cheers
> >
On Wed, 10 May 2017 12:30:57 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> I finally got some time to look at this. I'm looking at your commit:
>
> commit b53e5129c4c7ab47ec4f709fd8f5784ca45fb46d
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date: Sun Apr 23 12:17:13 2017 +0200
>
> trace/perf: Cure hotplug lock ordering issues
>
> What were the circular locking dependencies that were uncovered?
>
> event_mutex could possibly be broken up, if that helps. It sorta became
> a catch all for various modifications to tracing.
>
I checked out the commit before this and started playing with it. I see
some of the issues now. I'll look more into it.
-- Steve
Thomas Gleixner <tglx@linutronix.de> writes:
> On Wed, 10 May 2017, Michael Ellerman wrote:
>
>> Thomas Gleixner <tglx@linutronix.de> writes:
>>
>> > @@ -130,6 +130,7 @@ void __static_key_slow_inc(struct static
>> > * the all CPUs, for that to be serialized against CPU hot-plug
>> > * we need to avoid CPUs coming online.
>> > */
>> > + lockdep_assert_hotplug_held();
>> > jump_label_lock();
>> > if (atomic_read(&key->enabled) == 0) {
>> > atomic_set(&key->enabled, -1);
>>
>> I seem to be hitting this assert from the ftrace event selftests,
>> enabled at boot with CONFIG_FTRACE_STARTUP_TEST=y, using next-20170509
>> (on powerpc).
>>
>> The stupidly obvious (or perhaps obviously stupid) patch below fixes it:
>
> Kinda. There is more horror in that area lurking and I'm still trying to
> figure out all the convoluted call pathes.
OK thanks.
cheers