From: Peter Zijlstra
To: "Paul E. McKenney"
Cc: Steven Rostedt, linux-kernel@vger.kernel.org, Thomas Gleixner,
	Ingo Molnar, Mathieu Desnoyers, Masami Hiramatsu
Subject: Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
Date: Wed, 17 May 2017 12:40:10 +0200
Message-ID: <20170517104010.5dfz7qqeigwbzb2u@hirez.programming.kicks-ass.net>
In-Reply-To: <20170516142742.GA17599@linux.vnet.ibm.com>

On Tue, May 16, 2017 at 07:27:42AM -0700, Paul E. McKenney wrote:
> On Tue, May 16, 2017 at 05:46:06AM -0700, Paul E. McKenney wrote:
> > Something like this, yes.  Maybe even exactly like this.  ;-)
>
> Ah, one thing I forgot...  If you are avoiding use of get_online_cpus(),
> you usually also have to be very careful about how you use things like
> cpu_online() and cpu_is_offline().

OK, so I think I got it wrong there. The hunk below should close any race
between perf_pmu_register() and hotplug by tracking a global state,
protected by pmus_lock, thereby ensuring the cpuctx->online state gets
initialized correctly.
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8993,6 +8993,8 @@ static int pmu_dev_alloc(struct pmu *pmu
 static struct lock_class_key cpuctx_mutex;
 static struct lock_class_key cpuctx_lock;
 
+static cpumask_var_t perf_online_mask;
+
 int perf_pmu_register(struct pmu *pmu, const char *name, int type)
 {
 	int cpu, ret;
@@ -9056,7 +9058,7 @@ int perf_pmu_register(struct pmu *pmu, c
 		lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex);
 		lockdep_set_class(&cpuctx->ctx.lock, &cpuctx_lock);
 		cpuctx->ctx.pmu = pmu;
-		cpuctx->online = cpu_online(cpu);
+		cpuctx->online = cpumask_test_cpu(cpu, perf_online_mask);
 
 		__perf_mux_hrtimer_init(cpuctx, cpu);
 	}
@@ -10952,6 +10954,8 @@ static void __init perf_event_init_all_c
 	struct swevent_htable *swhash;
 	int cpu;
 
+	zalloc_cpumask_var(&perf_online_mask, GFP_KERNEL);
+
 	for_each_possible_cpu(cpu) {
 		swhash = &per_cpu(swevent_htable, cpu);
 		mutex_init(&swhash->hlist_mutex);
@@ -11011,6 +11015,7 @@ static void perf_event_exit_cpu_context(
 		cpuctx->online = 0;
 		mutex_unlock(&ctx->mutex);
 	}
+	cpumask_clear_cpu(cpu, perf_online_mask);
 	mutex_unlock(&pmus_lock);
 }
 #else
@@ -11028,6 +11033,7 @@ int perf_event_init_cpu(unsigned int cpu
 	perf_swevent_init_cpu(cpu);
 
 	mutex_lock(&pmus_lock);
+	cpumask_set_cpu(cpu, perf_online_mask);
 	list_for_each_entry(pmu, &pmus, entry) {
 		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
 		ctx = &cpuctx->ctx;

> > > ---
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -8997,7 +8997,6 @@ int perf_pmu_register(struct pmu *pmu, c
> > >  {
> > >  	int cpu, ret;
> > >  
> > > -	get_online_cpus();
> > >  	mutex_lock(&pmus_lock);
> > >  	ret = -ENOMEM;
> > >  	pmu->pmu_disable_count = alloc_percpu(int);
> > 
> > There is usually also some state check in here somewhere for the CPU
> > being offline from a perf perspective.  Such a check might already exist,
> > but I must plead ignorance of perf.

This just allocates per-CPU storage, which by definition covers the
possible mask and is unrelated to the online mask.

> > > @@ -9093,7 +9092,6 @@ int perf_pmu_register(struct pmu *pmu, c
> > >  	ret = 0;
> > > unlock:
> > >  	mutex_unlock(&pmus_lock);
> > > -	put_online_cpus();
> > >  
> > >  	return ret;
> > >  
> > > @@ -11002,10 +11000,9 @@ static void perf_event_exit_cpu_context(
> > >  	struct perf_cpu_context *cpuctx;
> > >  	struct perf_event_context *ctx;
> > >  	struct pmu *pmu;
> > > -	int idx;
> > >  
> > > -	idx = srcu_read_lock(&pmus_srcu);
> > > -	list_for_each_entry_rcu(pmu, &pmus, entry) {
> > > +	mutex_lock(&pmus_lock);
> > 
> > If the state change checked for by perf_pmu_register() needs to be also
> > guarded by ctx->mutex, this looks right to me.

Right, so we have two locks: pmus_lock, which serializes hotplug against
perf_pmu_register(), and ctx->mutex, which serializes find_get_context()
against hotplug. Together they ensure that if a PMU is observed, that
PMU's cpuctxs have the correct ->online state.

> > Just for completeness, the other style is to maintain separate per-CPU
> > state, in which case you would instead acquire pmus_lock, mark this
> > CPU off-limits to more perf_pmu_register() usage, release pmus_lock,
> > then clean up any old usage.

I'm not immediately seeing the other style, but per the above, I need
that too: the previous version could race against hotplug if
perf_pmu_register() observed cpu_online() as still set after
perf_event_exit_cpu() had already run, or some such. With the above
change any chance of such a race is gone and we don't need to worry
about hotplug ordering too much.
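Purely as illustration (not part of the patch above; check_cpuctx_online()
is a made-up helper name, the real check would sit in the cpu-bound event
path of perf_event_open()/find_get_context()), the other half of the
scheme would then amount to something like:

/*
 * Sketch only: cpuctx->online is only looked at while holding ctx->mutex,
 * which perf_event_exit_cpu_context() also takes before clearing ->online,
 * so the check cannot race with the CPU going away.
 */
static int check_cpuctx_online(struct perf_event_context *ctx,
			       struct perf_cpu_context *cpuctx)
{
	lockdep_assert_held(&ctx->mutex);

	if (!cpuctx->online)
		return -ENODEV;	/* CPU is (going) offline */

	return 0;
}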
Now I just need to do something about the swevent hash, because that's
got a hole in it now.