linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
@ 2017-05-12 17:15 Steven Rostedt
  2017-05-12 17:15 ` [RFC][PATCH 1/5] tracing: Make sure RCU is watching before calling a stack trace Steven Rostedt
                   ` (6 more replies)
  0 siblings, 7 replies; 34+ messages in thread
From: Steven Rostedt @ 2017-05-12 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Mathieu Desnoyers,
	Paul E. McKenney, Masami Hiramatsu

NOTE: This was quickly written. The change logs and patches probably need
some loving. This is for discussion. These may become legitimate patches,
but for now, I'm seeing if this is an acceptable solution.

Also note, I checked out the last branch that I had Linus pull, and then
merged with tip's commit b53e5129~1, which is the commit just before
"trace/perf: Cure hotplug lock ordering issues" in tip/smp/hotplug
that placed get_online_cpus() ahead of event_mutex. The issue is that
event_mutex is a high-level mutex, and trying to make sure that
get_online_cpus() is held whenever event_mutex is held is going to be a
nightmare. So I started ahead of that commit, but I needed changes
in my own repo to do full testing. I renamed this temp branch to
"tip/cpuhotplug", and you can pull it as well. But this is not going
to be a candidate for pushing to Linus.

The 5 patches here are:

 1) fix a lockdep splat when taking a stack trace in irqsoff tracing
    (one of my tests triggered this)

 2) Allow for get_online_cpus() to nest

 3) Have kprobes take get_online_cpus() before taking jump label lock.

 4) Have tracepoints always take get_online_cpus()

 5) Have perf take event_mutex before taking get_online_cpus()

Thoughts?

-- Steve


  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/cpuhotplug

Head SHA1: ddb06e08fa10ed5030285267a9b1e25d40c337c8


Steven Rostedt (VMware) (5):
      tracing: Make sure RCU is watching before calling a stack trace
      cpu-hotplug: Allow get_online_cpus() to nest
      kprobes: Take get_online_cpus() before taking jump_label_lock()
      tracepoints: Grab get_online_cpus() before taking tracepoints_mutex
      perf: Grab event_mutex before taking get_online_cpus()

----
 include/linux/sched.h           | 1 +
 include/linux/trace_events.h    | 2 ++
 kernel/cpu.c                    | 9 +++++++++
 kernel/events/core.c            | 9 +++++++++
 kernel/fork.c                   | 1 +
 kernel/kprobes.c                | 6 +++---
 kernel/trace/trace.c            | 5 +++++
 kernel/trace/trace.h            | 1 -
 kernel/trace/trace_event_perf.c | 8 ++++----
 kernel/tracepoint.c             | 5 +++++
 10 files changed, 39 insertions(+), 8 deletions(-)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC][PATCH 1/5] tracing: Make sure RCU is watching before calling a stack trace
  2017-05-12 17:15 [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order Steven Rostedt
@ 2017-05-12 17:15 ` Steven Rostedt
  2017-05-12 18:25   ` Paul E. McKenney
  2017-05-12 17:15 ` [RFC][PATCH 2/5] cpu-hotplug: Allow get_online_cpus() to nest Steven Rostedt
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Steven Rostedt @ 2017-05-12 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Mathieu Desnoyers,
	Paul E. McKenney, Masami Hiramatsu

[-- Attachment #1: 0001-tracing-Make-sure-RCU-is-watching-before-calling-a-s.patch --]
[-- Type: text/plain, Size: 801 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

As stack tracing now requires "rcu watching", force RCU to be watching when
recording a stack trace.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/trace.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c4536c449021..a4208cebb42b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2568,7 +2568,12 @@ static inline void ftrace_trace_stack(struct trace_array *tr,
 void __trace_stack(struct trace_array *tr, unsigned long flags, int skip,
 		   int pc)
 {
+	if (unlikely(rcu_irq_enter_disabled()))
+		return;
+
+	rcu_irq_enter();
 	__ftrace_trace_stack(tr->trace_buffer.buffer, flags, skip, pc, NULL);
+	rcu_irq_exit();
 }
 
 /**
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC][PATCH 2/5] cpu-hotplug: Allow get_online_cpus() to nest
  2017-05-12 17:15 [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order Steven Rostedt
  2017-05-12 17:15 ` [RFC][PATCH 1/5] tracing: Make sure RCU is watching before calling a stack trace Steven Rostedt
@ 2017-05-12 17:15 ` Steven Rostedt
  2017-05-12 18:35   ` Paul E. McKenney
  2017-05-12 22:15   ` Thomas Gleixner
  2017-05-12 17:15 ` [RFC][PATCH 3/5] kprobes: Take get_online_cpus() before taking jump_label_lock() Steven Rostedt
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 34+ messages in thread
From: Steven Rostedt @ 2017-05-12 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Mathieu Desnoyers,
	Paul E. McKenney, Masami Hiramatsu

[-- Attachment #1: 0002-cpu-hotplug-Allow-get_online_cpus-to-nest.patch --]
[-- Type: text/plain, Size: 1890 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Allow get_online_cpus() to be recursive. If a lock is only ever taken
while under get_online_cpus(), then get_online_cpus() may also be called
with that lock held; what must not exist is a path that takes the lock
without get_online_cpus() and then calls get_online_cpus() under it.

   GOC() -> Lock(X) -> GOC()

is OK, as long as

   Lock(X) -> GOC()

does not exist.
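
For concreteness, a minimal caller sketch of the allowed ordering (the
functions and the lock below are hypothetical, invented only to
illustrate the rule; they are not part of this patch):

	#include <linux/cpu.h>
	#include <linux/mutex.h>

	static DEFINE_MUTEX(x_mutex);		/* the "Lock(X)" above */

	static void low_level_helper(void)
	{
		get_online_cpus();		/* nested: only bumps the depth count */
		/* ... operate on per-CPU state ... */
		put_online_cpus();
	}

	static void high_level_caller(void)
	{
		get_online_cpus();		/* outermost: takes cpu_hotplug_lock for read */
		mutex_lock(&x_mutex);
		low_level_helper();		/* GOC() -> Lock(X) -> GOC(): allowed */
		mutex_unlock(&x_mutex);
		put_online_cpus();
	}

	/*
	 * Calling low_level_helper() with x_mutex held, but without the
	 * outer get_online_cpus(), would be the forbidden Lock(X) -> GOC()
	 * ordering.
	 */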

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/sched.h | 1 +
 kernel/cpu.c          | 9 +++++++++
 kernel/fork.c         | 1 +
 3 files changed, 11 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 993e7e25a3a5..8f272ab57685 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -813,6 +813,7 @@ struct task_struct {
 	unsigned int			lockdep_recursion;
 	struct held_lock		held_locks[MAX_LOCK_DEPTH];
 	gfp_t				lockdep_reclaim_gfp;
+	int				goc_depth;
 #endif
 
 #ifdef CONFIG_UBSAN
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 983163ef36ee..0dbdf1e69715 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -209,12 +209,21 @@ DEFINE_STATIC_PERCPU_RWSEM(cpu_hotplug_lock);
 
 void get_online_cpus(void)
 {
+#ifdef CONFIG_LOCKDEP
+	if (current->goc_depth++)
+		return;
+#endif
 	percpu_down_read(&cpu_hotplug_lock);
 }
 EXPORT_SYMBOL_GPL(get_online_cpus);
 
 void put_online_cpus(void)
 {
+#ifdef CONFIG_LOCKDEP
+	WARN_ON_ONCE(current->goc_depth < 1);
+	if (--current->goc_depth)
+		return;
+#endif
 	percpu_up_read(&cpu_hotplug_lock);
 }
 EXPORT_SYMBOL_GPL(put_online_cpus);
diff --git a/kernel/fork.c b/kernel/fork.c
index dd5a371c392a..8aedcd011ccc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1658,6 +1658,7 @@ static __latent_entropy struct task_struct *copy_process(
 	p->lockdep_depth = 0; /* no locks held yet */
 	p->curr_chain_key = 0;
 	p->lockdep_recursion = 0;
+	p->goc_depth = 0;
 #endif
 
 #ifdef CONFIG_DEBUG_MUTEXES
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC][PATCH 3/5] kprobes: Take get_online_cpus() before taking jump_label_lock()
  2017-05-12 17:15 [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order Steven Rostedt
  2017-05-12 17:15 ` [RFC][PATCH 1/5] tracing: Make sure RCU is watching before calling a stack trace Steven Rostedt
  2017-05-12 17:15 ` [RFC][PATCH 2/5] cpu-hotplug: Allow get_online_cpus() to nest Steven Rostedt
@ 2017-05-12 17:15 ` Steven Rostedt
  2017-05-12 18:39   ` Paul E. McKenney
  2017-05-17 17:50   ` Masami Hiramatsu
  2017-05-12 17:15 ` [RFC][PATCH 4/5] tracepoints: Grab get_online_cpus() before taking tracepoints_mutex Steven Rostedt
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 34+ messages in thread
From: Steven Rostedt @ 2017-05-12 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Mathieu Desnoyers,
	Paul E. McKenney, Masami Hiramatsu

[-- Attachment #1: 0003-kprobes-Take-get_online_cpus-before-taking-jump_labe.patch --]
[-- Type: text/plain, Size: 1299 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

jump_label_lock() is taken under get_online_cpus(). Make sure that kprobes
follows suit.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/kprobes.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index d733479a10ee..57cf73aef488 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1294,13 +1294,13 @@ static int register_aggr_kprobe(struct kprobe *orig_p, struct kprobe *p)
 	int ret = 0;
 	struct kprobe *ap = orig_p;
 
-	/* For preparing optimization, jump_label_text_reserved() is called */
-	jump_label_lock();
 	/*
 	 * Get online CPUs to avoid text_mutex deadlock.with stop machine,
 	 * which is invoked by unoptimize_kprobe() in add_new_kprobe()
 	 */
 	get_online_cpus();
+	/* For preparing optimization, jump_label_text_reserved() is called */
+	jump_label_lock();
 	mutex_lock(&text_mutex);
 
 	if (!kprobe_aggrprobe(orig_p)) {
@@ -1348,8 +1348,8 @@ static int register_aggr_kprobe(struct kprobe *orig_p, struct kprobe *p)
 
 out:
 	mutex_unlock(&text_mutex);
-	put_online_cpus();
 	jump_label_unlock();
+	put_online_cpus();
 
 	if (ret == 0 && kprobe_disabled(ap) && !kprobe_disabled(p)) {
 		ap->flags &= ~KPROBE_FLAG_DISABLED;
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC][PATCH 4/5] tracepoints: Grab get_online_cpus() before taking tracepoints_mutex
  2017-05-12 17:15 [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order Steven Rostedt
                   ` (2 preceding siblings ...)
  2017-05-12 17:15 ` [RFC][PATCH 3/5] kprobes: Take get_online_cpus() before taking jump_label_lock() Steven Rostedt
@ 2017-05-12 17:15 ` Steven Rostedt
  2017-05-12 17:15 ` [RFC][PATCH 5/5] perf: Grab event_mutex before taking get_online_cpus() Steven Rostedt
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 34+ messages in thread
From: Steven Rostedt @ 2017-05-12 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Mathieu Desnoyers,
	Paul E. McKenney, Masami Hiramatsu

[-- Attachment #1: 0004-tracepoints-Grab-get_online_cpus-before-taking-trace.patch --]
[-- Type: text/plain, Size: 1653 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

There are places that take tracepoints_mutex while holding get_online_cpus(),
and since tracepoints call jump_label code, which also takes
get_online_cpus(), make sure that the tracepoints_mutex is always taken
under get_online_cpus().

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/tracepoint.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 685c50ae6300..e41eab51b435 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -23,6 +23,7 @@
 #include <linux/rcupdate.h>
 #include <linux/tracepoint.h>
 #include <linux/err.h>
+#include <linux/cpu.h>
 #include <linux/slab.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/task.h>
@@ -276,12 +277,14 @@ int tracepoint_probe_register_prio(struct tracepoint *tp, void *probe,
 	struct tracepoint_func tp_func;
 	int ret;
 
+	get_online_cpus();
 	mutex_lock(&tracepoints_mutex);
 	tp_func.func = probe;
 	tp_func.data = data;
 	tp_func.prio = prio;
 	ret = tracepoint_add_func(tp, &tp_func, prio);
 	mutex_unlock(&tracepoints_mutex);
+	put_online_cpus();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(tracepoint_probe_register_prio);
@@ -318,11 +321,13 @@ int tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data)
 	struct tracepoint_func tp_func;
 	int ret;
 
+	get_online_cpus();
 	mutex_lock(&tracepoints_mutex);
 	tp_func.func = probe;
 	tp_func.data = data;
 	ret = tracepoint_remove_func(tp, &tp_func);
 	mutex_unlock(&tracepoints_mutex);
+	put_online_cpus();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(tracepoint_probe_unregister);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC][PATCH 5/5] perf: Grab event_mutex before taking get_online_cpus()
  2017-05-12 17:15 [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order Steven Rostedt
                   ` (3 preceding siblings ...)
  2017-05-12 17:15 ` [RFC][PATCH 4/5] tracepoints: Grab get_online_cpus() before taking tracepoints_mutex Steven Rostedt
@ 2017-05-12 17:15 ` Steven Rostedt
  2017-05-12 18:13 ` [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order Paul E. McKenney
  2017-05-12 19:49 ` Peter Zijlstra
  6 siblings, 0 replies; 34+ messages in thread
From: Steven Rostedt @ 2017-05-12 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Mathieu Desnoyers,
	Paul E. McKenney, Masami Hiramatsu

[-- Attachment #1: 0005-perf-Grab-event_mutex-before-taking-get_online_cpus.patch --]
[-- Type: text/plain, Size: 4358 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

The event_mutex is a high-level lock. It should never be taken while
get_online_cpus() is held. Perf is the only user that does so. Move the
taking of event_mutex outside of get_online_cpus(), which should resolve
the locking order.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/trace_events.h    | 2 ++
 kernel/events/core.c            | 9 +++++++++
 kernel/trace/trace.h            | 1 -
 kernel/trace/trace_event_perf.c | 8 ++++----
 4 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index a556805eff8a..0a30d2b5df51 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -50,6 +50,8 @@ struct trace_event;
 int trace_raw_output_prep(struct trace_iterator *iter,
 			  struct trace_event *event);
 
+extern struct mutex event_mutex;
+
 /*
  * The trace entry - the most basic unit of tracing. This is what
  * is printed in the end as a single line in the trace output, such as:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dba870ccda63..c65c3b5a92f4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4021,12 +4021,14 @@ static void unaccount_event(struct perf_event *event)
 
 static void perf_sched_delayed(struct work_struct *work)
 {
+	mutex_lock(&event_mutex);
 	get_online_cpus();
 	mutex_lock(&perf_sched_mutex);
 	if (atomic_dec_and_test(&perf_sched_count))
 		static_branch_disable_cpuslocked(&perf_sched_events);
 	mutex_unlock(&perf_sched_mutex);
 	put_online_cpus();
+	mutex_unlock(&event_mutex);
 }
 
 /*
@@ -4231,7 +4233,9 @@ static void put_event(struct perf_event *event)
 	if (!atomic_long_dec_and_test(&event->refcount))
 		return;
 
+	mutex_lock(&event_mutex);
 	_free_event(event);
+	mutex_unlock(&event_mutex);
 }
 
 /*
@@ -8917,6 +8921,7 @@ perf_event_mux_interval_ms_store(struct device *dev,
 	pmu->hrtimer_interval_ms = timer;
 
 	/* update all cpuctx for this PMU */
+	mutex_lock(&event_mutex);
 	get_online_cpus();
 	for_each_online_cpu(cpu) {
 		struct perf_cpu_context *cpuctx;
@@ -8927,6 +8932,7 @@ perf_event_mux_interval_ms_store(struct device *dev,
 			(remote_function_f)perf_mux_hrtimer_restart, cpuctx);
 	}
 	put_online_cpus();
+	mutex_unlock(&event_mutex);
 	mutex_unlock(&mux_interval_mutex);
 
 	return count;
@@ -9879,6 +9885,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_task;
 	}
 
+	mutex_lock(&event_mutex);
 	get_online_cpus();
 
 	if (task) {
@@ -10160,6 +10167,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	put_online_cpus();
+	mutex_unlock(&event_mutex);
 
 	mutex_lock(&current->perf_event_mutex);
 	list_add_tail(&event->owner_entry, &current->perf_event_list);
@@ -10196,6 +10204,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		mutex_unlock(&task->signal->cred_guard_mutex);
 err_cpus:
 	put_online_cpus();
+	mutex_unlock(&event_mutex);
 err_task:
 	if (task)
 		put_task_struct(task);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 291a1bca5748..4df0a8e9d000 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1430,7 +1430,6 @@ static inline void *event_file_data(struct file *filp)
 	return ACCESS_ONCE(file_inode(filp)->i_private);
 }
 
-extern struct mutex event_mutex;
 extern struct list_head ftrace_events;
 
 extern const struct file_operations event_trigger_fops;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 562fa69df5d3..b8c90a024e99 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -213,7 +213,8 @@ int perf_trace_init(struct perf_event *p_event)
 	u64 event_id = p_event->attr.config;
 	int ret = -EINVAL;
 
-	mutex_lock(&event_mutex);
+	lockdep_assert_held(&event_mutex);
+
 	list_for_each_entry(tp_event, &ftrace_events, list) {
 		if (tp_event->event.type == event_id &&
 		    tp_event->class && tp_event->class->reg &&
@@ -224,17 +225,16 @@ int perf_trace_init(struct perf_event *p_event)
 			break;
 		}
 	}
-	mutex_unlock(&event_mutex);
 
 	return ret;
 }
 
 void perf_trace_destroy(struct perf_event *p_event)
 {
-	mutex_lock(&event_mutex);
+	lockdep_assert_held(&event_mutex);
+
 	perf_trace_event_close(p_event);
 	perf_trace_event_unreg(p_event);
-	mutex_unlock(&event_mutex);
 }
 
 int perf_trace_add(struct perf_event *p_event, int flags)
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
  2017-05-12 17:15 [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order Steven Rostedt
                   ` (4 preceding siblings ...)
  2017-05-12 17:15 ` [RFC][PATCH 5/5] perf: Grab event_mutex before taking get_online_cpus() Steven Rostedt
@ 2017-05-12 18:13 ` Paul E. McKenney
  2017-05-12 19:49 ` Peter Zijlstra
  6 siblings, 0 replies; 34+ messages in thread
From: Paul E. McKenney @ 2017-05-12 18:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Fri, May 12, 2017 at 01:15:44PM -0400, Steven Rostedt wrote:
> NOTE: This was quickly written. The change logs and patches probably need
> some loving. This is for discussion. These may become legitimate patches,
> but for now, I'm seeing if this is an acceptable solution.
> 
> Also note, I checked out the last branch that I had Linus pull, and then
> merged with tip's commit b53e5129~1, which is the commit just before
> "trace/perf: Cure hotplug lock ordering issues" in tip/smp/hotplug
> that placed get_online_cpus() ahead of event_mutex. The issue is that
> event_mutex is a high level mutex, and trying to make sure that
> get_online_cpus() is held whenever it is held and is going to be a
> nightmare. So I started ahead of that commit, but I needed changes
> in my own repo to do full testing. I renamed this temp branch to
> "tip/cpuhotplug", and you can pull it as well. But this is not going
> to be a candidate for pushing to Linus.
> 
> The 5 patches here are:
> 
>  1) fix a lockdep splat when taking stack trace in irqsoff tracing
>     (one of my tests triggered this)
> 
>  2) Allow for get_online_cpus() to nest
> 
>  3) Have kprobes take get_online_cpus() before taking jump label lock.
> 
>  4) Have tracepoints always take get_online_cpus
> 
>  5) Have perf take event_mutex before taking get_online_cpus()
> 
> Thoughts?

First a thought on the use of get_online_cpus()...

Your mileage may vary, but from what I have seen, life is very good
if your piece of the kernel uses get_online_cpus(), but refrains from
using the CPU-hotplug notifiers.  Life is also very good if your piece
of the kernel uses CPU-hotplug notifiers, but refrains from using
get_online_cpus().  If your code uses both get_online_cpus() -and-
CPU-hotplug notifiers for the same data structures, life can be quite
complex and difficult.
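
To make that concrete, here is a sketch of the two styles on a
hypothetical per-CPU counter (the names below are invented for
illustration; each style is simple on its own, it is mixing them for the
same data that blows up the state space):

	#include <linux/cpu.h>
	#include <linux/cpuhotplug.h>
	#include <linux/percpu.h>

	static DEFINE_PER_CPU(int, my_counter);

	/* Style 1: protect readers with get_online_cpus() only. */
	static int sum_counters(void)
	{
		int cpu, sum = 0;

		get_online_cpus();	/* CPUs cannot come or go here */
		for_each_online_cpu(cpu)
			sum += per_cpu(my_counter, cpu);
		put_online_cpus();
		return sum;
	}

	/* Style 2: hotplug callbacks only, no get_online_cpus() anywhere. */
	static int my_cpu_online(unsigned int cpu)
	{
		per_cpu(my_counter, cpu) = 0;	/* (re)initialize on online */
		return 0;
	}

	static int __init my_init(void)
	{
		int ret;

		ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mydrv:online",
					my_cpu_online, NULL);
		/* dynamic states return the allocated state number on success */
		return ret < 0 ? ret : 0;
	}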

For RCU, I cannot avoid using notifiers, so I have been removing uses
of get_online_cpus().  Removing each of these has made large numbers of
race conditions with CPU hotplug simply disappear -- not much reduction
in lines of code, but big reductions in complexity and in state space.
I have one more to go in _rcu_barrier().

Again, your mileage may vary.

							Thanx, Paul

> -- Steve
> 
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> tip/cpuhotplug
> 
> Head SHA1: ddb06e08fa10ed5030285267a9b1e25d40c337c8
> 
> 
> Steven Rostedt (VMware) (5):
>       tracing: Make sure RCU is watching before calling a stack trace
>       cpu-hotplug: Allow get_online_cpus() to nest
>       kprobes: Take get_online_cpus() before taking jump_label_lock()
>       tracepoints: Grab get_online_cpus() before taking tracepoints_mutex
>       perf: Grab event_mutex before taking get_online_cpus()
> 
> ----
>  include/linux/sched.h           | 1 +
>  include/linux/trace_events.h    | 2 ++
>  kernel/cpu.c                    | 9 +++++++++
>  kernel/events/core.c            | 9 +++++++++
>  kernel/fork.c                   | 1 +
>  kernel/kprobes.c                | 6 +++---
>  kernel/trace/trace.c            | 5 +++++
>  kernel/trace/trace.h            | 1 -
>  kernel/trace/trace_event_perf.c | 8 ++++----
>  kernel/tracepoint.c             | 5 +++++
>  10 files changed, 39 insertions(+), 8 deletions(-)
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/5] tracing: Make sure RCU is watching before calling a stack trace
  2017-05-12 17:15 ` [RFC][PATCH 1/5] tracing: Make sure RCU is watching before calling a stack trace Steven Rostedt
@ 2017-05-12 18:25   ` Paul E. McKenney
  2017-05-12 18:36     ` Steven Rostedt
  0 siblings, 1 reply; 34+ messages in thread
From: Paul E. McKenney @ 2017-05-12 18:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Fri, May 12, 2017 at 01:15:45PM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> As stack tracing now requires "rcu watching", force RCU to be watching when
> recording a stack trace.
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Assuming that you never get to __trace_stack() if in an NMI handler,
this looks good to me!

In contrast, if __trace_stack() is ever called from an NMI handler,
invoking rcu_irq_enter() can be fatal.

							Thanx, Paul

> ---
>  kernel/trace/trace.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index c4536c449021..a4208cebb42b 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2568,7 +2568,12 @@ static inline void ftrace_trace_stack(struct trace_array *tr,
>  void __trace_stack(struct trace_array *tr, unsigned long flags, int skip,
>  		   int pc)
>  {
> +	if (unlikely(rcu_irq_enter_disabled()))
> +		return;
> +
> +	rcu_irq_enter();
>  	__ftrace_trace_stack(tr->trace_buffer.buffer, flags, skip, pc, NULL);
> +	rcu_irq_exit();
>  }
> 
>  /**
> -- 
> 2.10.2
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/5] cpu-hotplug: Allow get_online_cpus() to nest
  2017-05-12 17:15 ` [RFC][PATCH 2/5] cpu-hotplug: Allow get_online_cpus() to nest Steven Rostedt
@ 2017-05-12 18:35   ` Paul E. McKenney
  2017-05-12 18:40     ` Steven Rostedt
  2017-05-12 22:15   ` Thomas Gleixner
  1 sibling, 1 reply; 34+ messages in thread
From: Paul E. McKenney @ 2017-05-12 18:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Fri, May 12, 2017 at 01:15:46PM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Allow get_online_cpus() to be recursive. If a lock is taken while under
> "get_online_cpus()", it can call get_online_cpus() as well, just as long as
> it is never held without being under get_online_cpus(), but then calling it.
> 
>    GOC() -> Lock(X) -> GOC()
> 
> is OK, as long as
> 
>    Lock(X) -> GOC()
> 
> does not exist.
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Does ->goc_depth also need to be initialized in include/linux/init_task.h?

It seems like C-language initialization-to-zero would cover this,
but there is a lot of initialization to zero values in init_task.h
(including but not limited to some RCU stuff).

Other than that, this does look like it might ease use of get_online_cpus()
in combination with CPU-hotplug notifiers, although it would not have
made any difference in any of the RCU use cases.

							Thanx, Paul

> ---
>  include/linux/sched.h | 1 +
>  kernel/cpu.c          | 9 +++++++++
>  kernel/fork.c         | 1 +
>  3 files changed, 11 insertions(+)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 993e7e25a3a5..8f272ab57685 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -813,6 +813,7 @@ struct task_struct {
>  	unsigned int			lockdep_recursion;
>  	struct held_lock		held_locks[MAX_LOCK_DEPTH];
>  	gfp_t				lockdep_reclaim_gfp;
> +	int				goc_depth;
>  #endif
> 
>  #ifdef CONFIG_UBSAN
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 983163ef36ee..0dbdf1e69715 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -209,12 +209,21 @@ DEFINE_STATIC_PERCPU_RWSEM(cpu_hotplug_lock);
> 
>  void get_online_cpus(void)
>  {
> +#ifdef CONFIG_LOCKDEP
> +	if (current->goc_depth++)
> +		return;
> +#endif
>  	percpu_down_read(&cpu_hotplug_lock);
>  }
>  EXPORT_SYMBOL_GPL(get_online_cpus);
> 
>  void put_online_cpus(void)
>  {
> +#ifdef CONFIG_LOCKDEP
> +	WARN_ON_ONCE(current->goc_depth < 1);
> +	if (--current->goc_depth)
> +		return;
> +#endif
>  	percpu_up_read(&cpu_hotplug_lock);
>  }
>  EXPORT_SYMBOL_GPL(put_online_cpus);
> diff --git a/kernel/fork.c b/kernel/fork.c
> index dd5a371c392a..8aedcd011ccc 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1658,6 +1658,7 @@ static __latent_entropy struct task_struct *copy_process(
>  	p->lockdep_depth = 0; /* no locks held yet */
>  	p->curr_chain_key = 0;
>  	p->lockdep_recursion = 0;
> +	p->goc_depth = 0;
>  #endif
> 
>  #ifdef CONFIG_DEBUG_MUTEXES
> -- 
> 2.10.2
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/5] tracing: Make sure RCU is watching before calling a stack trace
  2017-05-12 18:25   ` Paul E. McKenney
@ 2017-05-12 18:36     ` Steven Rostedt
  2017-05-12 18:50       ` Paul E. McKenney
  0 siblings, 1 reply; 34+ messages in thread
From: Steven Rostedt @ 2017-05-12 18:36 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Fri, 12 May 2017 11:25:35 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Fri, May 12, 2017 at 01:15:45PM -0400, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > 
> > As stack tracing now requires "rcu watching", force RCU to be watching when
> > recording a stack trace.
> > 
> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>  
> 
> Assuming that you never get to __trace_stack() if in an NMI handler,
> this looks good to me!
> 
> In contrast, if if __trace_stack() ever is called from an NMI handler,
> invoking rcu_irq_enter() can be fatal.

Then someone may die.

OK, what's the case of running this in nmi? How does perf do it?

Do we just skip the check if it is in an nmi?

	if (!in_nmi()) {
		if (unlikely(rcu_irq_enter_disabled()))
			return;
		rcu_irq_enter();
	}

	__ftrace_trace_stack();

	if (!in_nmi())
		rcu_irq_exit();

?

-- Steve

		

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 3/5] kprobes: Take get_online_cpus() before taking jump_label_lock()
  2017-05-12 17:15 ` [RFC][PATCH 3/5] kprobes: Take get_online_cpus() before taking jump_label_lock() Steven Rostedt
@ 2017-05-12 18:39   ` Paul E. McKenney
  2017-05-12 18:44     ` Steven Rostedt
  2017-05-17 17:50   ` Masami Hiramatsu
  1 sibling, 1 reply; 34+ messages in thread
From: Paul E. McKenney @ 2017-05-12 18:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Fri, May 12, 2017 at 01:15:47PM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> jump_label_lock() is taken under get_online_cpus(). Make sure that kprobes
> follows suit.
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

The remaining three (3/5 through 5/5) look straightforward.  #4 appears
to be the one needing the recursive get_online_cpus().

							Thanx, Paul

> ---
>  kernel/kprobes.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index d733479a10ee..57cf73aef488 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1294,13 +1294,13 @@ static int register_aggr_kprobe(struct kprobe *orig_p, struct kprobe *p)
>  	int ret = 0;
>  	struct kprobe *ap = orig_p;
> 
> -	/* For preparing optimization, jump_label_text_reserved() is called */
> -	jump_label_lock();
>  	/*
>  	 * Get online CPUs to avoid text_mutex deadlock.with stop machine,
>  	 * which is invoked by unoptimize_kprobe() in add_new_kprobe()
>  	 */
>  	get_online_cpus();
> +	/* For preparing optimization, jump_label_text_reserved() is called */
> +	jump_label_lock();
>  	mutex_lock(&text_mutex);
> 
>  	if (!kprobe_aggrprobe(orig_p)) {
> @@ -1348,8 +1348,8 @@ static int register_aggr_kprobe(struct kprobe *orig_p, struct kprobe *p)
> 
>  out:
>  	mutex_unlock(&text_mutex);
> -	put_online_cpus();
>  	jump_label_unlock();
> +	put_online_cpus();
> 
>  	if (ret == 0 && kprobe_disabled(ap) && !kprobe_disabled(p)) {
>  		ap->flags &= ~KPROBE_FLAG_DISABLED;
> -- 
> 2.10.2
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/5] cpu-hotplug: Allow get_online_cpus() to nest
  2017-05-12 18:35   ` Paul E. McKenney
@ 2017-05-12 18:40     ` Steven Rostedt
  2017-05-12 18:52       ` Paul E. McKenney
  0 siblings, 1 reply; 34+ messages in thread
From: Steven Rostedt @ 2017-05-12 18:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Fri, 12 May 2017 11:35:59 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Fri, May 12, 2017 at 01:15:46PM -0400, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > 
> > Allow get_online_cpus() to be recursive. If a lock is taken while under
> > "get_online_cpus()", it can call get_online_cpus() as well, just as long as
> > it is never held without being under get_online_cpus(), but then calling it.
> > 
> >    GOC() -> Lock(X) -> GOC()
> > 
> > is OK, as long as
> > 
> >    Lock(X) -> GOC()
> > 
> > does not exist.
> > 
> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>  
> 
> Does ->goc_depth also need to be initialized in include/linux/init_task.h?
> 
> It seems like C-language initialization-to-zero would cover this,
> but there is a lot of initialization to zero values in init_task.h
> (including but not limited to some RCU stuff).

I assumed that it would just initialize it to zero.

OK, I need to add this:

-- Steve

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index fffe49f..be7f71b 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -375,7 +375,7 @@ extern struct pin_cookie lock_pin_lock(struct lockdep_map *lock);
 extern void lock_repin_lock(struct lockdep_map *lock, struct pin_cookie);
 extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie);
 
-# define INIT_LOCKDEP				.lockdep_recursion = 0, .lockdep_reclaim_gfp = 0,
+# define INIT_LOCKDEP				.lockdep_recursion = 0, .lockdep_reclaim_gfp = 0, .goc_depth = 0,
 
 #define lockdep_depth(tsk)	(debug_locks ? (tsk)->lockdep_depth : 0)
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 3/5] kprobes: Take get_online_cpus() before taking jump_label_lock()
  2017-05-12 18:39   ` Paul E. McKenney
@ 2017-05-12 18:44     ` Steven Rostedt
  0 siblings, 0 replies; 34+ messages in thread
From: Steven Rostedt @ 2017-05-12 18:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Fri, 12 May 2017 11:39:11 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Fri, May 12, 2017 at 01:15:47PM -0400, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > 
> > jump_label_lock() is taken under get_online_cpus(). Make sure that kprobes
> > follows suit.
> > 
> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>  
> 
> The remaining three (3/5 through 5/5) look straightforward.  #4 appears
> to be the one needing the recursive get_online_cpus().

Indeed it is.

Thanks for looking at these.

-- Steve

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/5] tracing: Make sure RCU is watching before calling a stack trace
  2017-05-12 18:36     ` Steven Rostedt
@ 2017-05-12 18:50       ` Paul E. McKenney
  2017-05-12 20:05         ` Steven Rostedt
  0 siblings, 1 reply; 34+ messages in thread
From: Paul E. McKenney @ 2017-05-12 18:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Fri, May 12, 2017 at 02:36:19PM -0400, Steven Rostedt wrote:
> On Fri, 12 May 2017 11:25:35 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Fri, May 12, 2017 at 01:15:45PM -0400, Steven Rostedt wrote:
> > > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > > 
> > > As stack tracing now requires "rcu watching", force RCU to be watching when
> > > recording a stack trace.
> > > 
> > > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>  
> > 
> > Assuming that you never get to __trace_stack() if in an NMI handler,
> > this looks good to me!
> > 
> > In contrast, if if __trace_stack() ever is called from an NMI handler,
> > invoking rcu_irq_enter() can be fatal.
> 
> Then someone may die.
> 
> OK, what's the case of running this in nmi? How does perf do it?

I have no idea.  If it cannot happen, then it cannot happen and all
is well, RCU is happy, and I am happy.  ;-)

> Do we just skip the check if it is in an nmi?
> 
> 	if (!in_nmi()) {
> 		if (unlikely(rcu_irq_enter_disabled()))
> 			return;
> 		rcu_irq_enter();
> 	}
> 
> 	__ftrace_trace_stack();
> 
> 	if (!in_nmi())
> 		rcu_irq_exit();
> 
> ?

If it -can- happen, bail out of the function without doing the
__ftrace_trace_stack()?  Or does that just cause other problems further
down the road?  Or BUG_ON(in_nmi())?

But again if it cannot happen, no problem and no need for extra code.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/5] cpu-hotplug: Allow get_online_cpus() to nest
  2017-05-12 18:40     ` Steven Rostedt
@ 2017-05-12 18:52       ` Paul E. McKenney
  0 siblings, 0 replies; 34+ messages in thread
From: Paul E. McKenney @ 2017-05-12 18:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Fri, May 12, 2017 at 02:40:27PM -0400, Steven Rostedt wrote:
> On Fri, 12 May 2017 11:35:59 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Fri, May 12, 2017 at 01:15:46PM -0400, Steven Rostedt wrote:
> > > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > > 
> > > Allow get_online_cpus() to be recursive. If a lock is taken while under
> > > "get_online_cpus()", it can call get_online_cpus() as well, just as long as
> > > it is never held without being under get_online_cpus(), but then calling it.
> > > 
> > >    GOC() -> Lock(X) -> GOC()
> > > 
> > > is OK, as long as
> > > 
> > >    Lock(X) -> GOC()
> > > 
> > > does not exist.
> > > 
> > > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>  
> > 
> > Does ->goc_depth also need to be initialized in include/linux/init_task.h?
> > 
> > It seems like C-language initialization-to-zero would cover this,
> > but there is a lot of initialization to zero values in init_task.h
> > (including but not limited to some RCU stuff).
> 
> I assumed that it would just initialize it to zero.
> 
> OK, I need to add this:
> 
> -- Steve
> 
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index fffe49f..be7f71b 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -375,7 +375,7 @@ extern struct pin_cookie lock_pin_lock(struct lockdep_map *lock);
>  extern void lock_repin_lock(struct lockdep_map *lock, struct pin_cookie);
>  extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie);
> 
> -# define INIT_LOCKDEP				.lockdep_recursion = 0, .lockdep_reclaim_gfp = 0,
> +# define INIT_LOCKDEP				.lockdep_recursion = 0, .lockdep_reclaim_gfp = 0, .goc_depth = 0,
> 
>  #define lockdep_depth(tsk)	(debug_locks ? (tsk)->lockdep_depth : 0)

Or maybe we should remove a bunch of zero-initialization from that file.
But if it is needed, then that addition to the patch looks good to me.
Given how much zero-initialization there is, I suspect that it is needed
for some strange boot-up reason.  Hard to believe that someone would not
have gotten rid of it otherwise.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
  2017-05-12 17:15 [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order Steven Rostedt
                   ` (5 preceding siblings ...)
  2017-05-12 18:13 ` [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order Paul E. McKenney
@ 2017-05-12 19:49 ` Peter Zijlstra
  2017-05-12 20:14   ` Steven Rostedt
                     ` (2 more replies)
  6 siblings, 3 replies; 34+ messages in thread
From: Peter Zijlstra @ 2017-05-12 19:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Mathieu Desnoyers,
	Paul E. McKenney, Masami Hiramatsu

On Fri, May 12, 2017 at 01:15:44PM -0400, Steven Rostedt wrote:
>  2) Allow for get_online_cpus() to nest

So Thomas and me have been avoiding doing this.

In general we avoid nested locking in the kernel. Nested locking makes
an absolute mockery of locking rules and what all gets protected.

Yes, it's much easier.. But we managed to kill the BKL, so surely we can
fix the hotplug lock too, right ;-)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/5] tracing: Make sure RCU is watching before calling a stack trace
  2017-05-12 18:50       ` Paul E. McKenney
@ 2017-05-12 20:05         ` Steven Rostedt
  2017-05-12 20:31           ` Paul E. McKenney
  0 siblings, 1 reply; 34+ messages in thread
From: Steven Rostedt @ 2017-05-12 20:05 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Fri, 12 May 2017 11:50:03 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Fri, May 12, 2017 at 02:36:19PM -0400, Steven Rostedt wrote:
> > On Fri, 12 May 2017 11:25:35 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >   
> > > On Fri, May 12, 2017 at 01:15:45PM -0400, Steven Rostedt wrote:  
> > > > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > > > 
> > > > As stack tracing now requires "rcu watching", force RCU to be watching when
> > > > recording a stack trace.
> > > > 
> > > > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>    
> > > 
> > > Assuming that you never get to __trace_stack() if in an NMI handler,
> > > this looks good to me!
> > > 
> > > In contrast, if if __trace_stack() ever is called from an NMI handler,
> > > invoking rcu_irq_enter() can be fatal.  
> > 
> > Then someone may die.
> > 
> > OK, what's the case of running this in nmi? How does perf do it?  
> 
> I have no idea.  If it cannot happen, then it cannot happen and all
> is well, RCU is happy, and I am happy.  ;-)
> 
> > Do we just skip the check if it is in an nmi?
> > 
> > 	if (!in_nmi()) {
> > 		if (unlikely(rcu_irq_enter_disabled()))
> > 			return;
> > 		rcu_irq_enter();
> > 	}
> > 
> > 	__ftrace_trace_stack();
> > 
> > 	if (!in_nmi())
> > 		rcu_irq_exit();
> > 
> > ?  
> 
> If it -can- happen, bail out of the function without doing the

Why?

> __ftrace_trace_stack()?  Or does that just cause other problems further
> down the road?  Or BUG_ON(in_nmi())?

Why?

> 
> But again if it cannot happen, no problem and no need for extra code.
> 

We can't take a stack trace from an NMI anymore? It calls rcu_read_lock(),
which is why we need to make sure RCU is watching, otherwise lockdep
complains.

-- Steve

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
  2017-05-12 19:49 ` Peter Zijlstra
@ 2017-05-12 20:14   ` Steven Rostedt
  2017-05-12 21:34   ` Steven Rostedt
  2017-05-15 19:06   ` Steven Rostedt
  2 siblings, 0 replies; 34+ messages in thread
From: Steven Rostedt @ 2017-05-12 20:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Mathieu Desnoyers,
	Paul E. McKenney, Masami Hiramatsu

On Fri, 12 May 2017 21:49:56 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, May 12, 2017 at 01:15:44PM -0400, Steven Rostedt wrote:
> >  2) Allow for get_online_cpus() to nest  
> 
> So Thomas and me have been avoiding doing this.
> 
> In general we avoid nested locking in the kernel. Nested locking makes
> an absolute mockery of locking rules and what all gets protected.
> 
> Yes, its much easier.. But we managed to kill the BKL, so surely we can
> fix the hotplug lock too, right ;-)

Well, is it really a lock in that sense? Or more like a
preempt_disable()? Which, one can argue is a BKL in its own right.

get_online_cpus() is more like a preempt_disable() than a lock, as it
is preventing something from happening and not really protecting data.
preempt_disable() prevents a schedule from happening. get_online_cpus()
prevents CPUs from going offline or coming online.

Can you imagine the mess it would be if we prevented preempt_disable() from
nesting? get_online_cpus() is similar, but maybe not so horrific.

The problem I see with going the route of not letting get_online_cpus()
nest is that we are going to have to start encapsulating large
areas where get_online_cpus() must be taken. Any time a low level
function needs to take get_online_cpus(), and a higher level function
that calls it holds a lock that also must be taken under
get_online_cpus() (take tracepoints_mutex for example), we need to
remove the get_online_cpus() from the lower level function and make it
a requirement that it be held before calling that lower level function
everywhere. That moves the get_online_cpus() away from what really needs
the protection, and makes it more into a global lock like the BKL.
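
Roughly, with invented names (not code from any of these patches), that
refactoring looks like this:

	#include <linux/cpu.h>
	#include <linux/mutex.h>

	/* Today: the low level function protects itself. */
	static void low_level_update(void)
	{
		get_online_cpus();
		/* ... walk/modify per-CPU state ... */
		put_online_cpus();
	}

	/*
	 * Without nesting, every caller that already holds a lock which must
	 * nest inside get_online_cpus() forces the protection up a level:
	 */
	static DEFINE_MUTEX(some_subsystem_mutex);

	static void low_level_update__cpus_held(void)
	{
		/* caller must hold get_online_cpus() */
		/* ... walk/modify per-CPU state ... */
	}

	static void high_level_op(void)
	{
		get_online_cpus();	/* now required here, far from the per-CPU data */
		mutex_lock(&some_subsystem_mutex);
		low_level_update__cpus_held();
		mutex_unlock(&some_subsystem_mutex);
		put_online_cpus();
	}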

Look at all the places that needed get_online_cpus() in your patches
where the function itself really didn't care about CPUs going online or
offline.

-- Steve

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/5] tracing: Make sure RCU is watching before calling a stack trace
  2017-05-12 20:05         ` Steven Rostedt
@ 2017-05-12 20:31           ` Paul E. McKenney
  2017-05-17 16:46             ` Steven Rostedt
  0 siblings, 1 reply; 34+ messages in thread
From: Paul E. McKenney @ 2017-05-12 20:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Fri, May 12, 2017 at 04:05:32PM -0400, Steven Rostedt wrote:
> On Fri, 12 May 2017 11:50:03 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Fri, May 12, 2017 at 02:36:19PM -0400, Steven Rostedt wrote:
> > > On Fri, 12 May 2017 11:25:35 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > >   
> > > > On Fri, May 12, 2017 at 01:15:45PM -0400, Steven Rostedt wrote:  
> > > > > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > > > > 
> > > > > As stack tracing now requires "rcu watching", force RCU to be watching when
> > > > > recording a stack trace.
> > > > > 
> > > > > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>    
> > > > 
> > > > Assuming that you never get to __trace_stack() if in an NMI handler,
> > > > this looks good to me!
> > > > 
> > > > In contrast, if if __trace_stack() ever is called from an NMI handler,
> > > > invoking rcu_irq_enter() can be fatal.  
> > > 
> > > Then someone may die.
> > > 
> > > OK, what's the case of running this in nmi? How does perf do it?  
> > 
> > I have no idea.  If it cannot happen, then it cannot happen and all
> > is well, RCU is happy, and I am happy.  ;-)
> > 
> > > Do we just skip the check if it is in an nmi?
> > > 
> > > 	if (!in_nmi()) {
> > > 		if (unlikely(rcu_irq_enter_disabled()))
> > > 			return;
> > > 		rcu_irq_enter();
> > > 	}
> > > 
> > > 	__ftrace_trace_stack();
> > > 
> > > 	if (!in_nmi())
> > > 		rcu_irq_exit();
> > > 
> > > ?  
> > 
> > If it -can- happen, bail out of the function without doing the
> 
> Why?
> 
> > __ftrace_trace_stack()?  Or does that just cause other problems further
> > down the road?  Or BUG_ON(in_nmi())?
> 
> Why?
> 
> > But again if it cannot happen, no problem and no need for extra code.
> 
> We can't call stack trace from nmi anymore? It calls rcu_read_lock()
> which is why we need to make sure rcu is watching, otherwise lockdep
> complains.

Ah, finally got it!  If we are in_nmi(), you are relying on the
NMI handler's call to rcu_nmi_enter(), which works.  The piece I was
forgetting was that you also recently said in an unrelated LKML thread
that all the functions called at the very beginings and ends of NMI
handlers (which can see !in_nmi()) are marked notrace, so that should
be covered as well.

So never mind!  (And thank you for the explanation.)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
  2017-05-12 19:49 ` Peter Zijlstra
  2017-05-12 20:14   ` Steven Rostedt
@ 2017-05-12 21:34   ` Steven Rostedt
  2017-05-13 13:40     ` Paul E. McKenney
  2017-05-15 19:06   ` Steven Rostedt
  2 siblings, 1 reply; 34+ messages in thread
From: Steven Rostedt @ 2017-05-12 21:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Mathieu Desnoyers,
	Paul E. McKenney, Masami Hiramatsu

On Fri, 12 May 2017 21:49:56 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, May 12, 2017 at 01:15:44PM -0400, Steven Rostedt wrote:
> >  2) Allow for get_online_cpus() to nest  
> 
> So Thomas and me have been avoiding doing this.
> 
> In general we avoid nested locking in the kernel. Nested locking makes
> an absolute mockery of locking rules and what all gets protected.
> 
> Yes, its much easier.. But we managed to kill the BKL, so surely we can
> fix the hotplug lock too, right ;-)

Actually, after talking this over with tglx, we may be able to simplify
this a lot without the goc nesting, by removing the requirement of taking
get_online_cpus() before text_mutex. Reading the comment in kprobes.c:

	/*
	 * The optimization/unoptimization refers online_cpus via
	 * stop_machine() and cpu-hotplug modifies online_cpus.
	 * And same time, text_mutex will be held in cpu-hotplug and here.
	 * This combination can cause a deadlock (cpu-hotplug try to lock
	 * text_mutex but stop_machine can not be done because online_cpus
	 * has been changed)
	 * To avoid this deadlock, we need to call get_online_cpus()
	 * for preventing cpu-hotplug outside of text_mutex locking.
	 */
	get_online_cpus();
	mutex_lock(&text_mutex);

Back when this was added, when we offlined all but one CPU, we would
call into alternatives to switch the system into "uniprocessor" mode.
Turning spinlocks into nops and such. And when we onlined a CPU again,
it would do the same thing in reverse (converting those nops back to
spinlocks). Paul showed Rusty that it was causing huge overheads in
onlining and offlining CPUs and it seemed a bit ridiculous to do so
(although I have to say it was a cool feature). And finally we stopped
doing that.

This means that text_mutex, which was taken by the alternative code, no
longer is taken in cpu hotplug code. That means there's no longer a
deadlock scenario, as we don't have anyplace(*) that grabs
get_online_cpus() and takes the text_mutex. Removing that will
simplify things tremendously!

(*) with one exception: perf.

Currently perf does a get_online_cpus() at a high level. Will it be
possible to move that down, such that we don't have it taken when we do
any software events?

-- Steve

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/5] cpu-hotplug: Allow get_online_cpus() to nest
  2017-05-12 17:15 ` [RFC][PATCH 2/5] cpu-hotplug: Allow get_online_cpus() to nest Steven Rostedt
  2017-05-12 18:35   ` Paul E. McKenney
@ 2017-05-12 22:15   ` Thomas Gleixner
  2017-05-13  0:23     ` Steven Rostedt
  1 sibling, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2017-05-12 22:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Mathieu Desnoyers,
	Paul E. McKenney, Masami Hiramatsu

On Fri, 12 May 2017, Steven Rostedt wrote:
>  void get_online_cpus(void)
>  {
> +#ifdef CONFIG_LOCKDEP
> +	if (current->goc_depth++)
> +		return;

This must be unconditional and not depend on lockdep. The percpu rwsem is
going to deadlock silently otherwise when a writer is waiting ....
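
For clarity, the unconditional variant of the kernel/cpu.c functions
would look roughly like this (a sketch, not a posted patch; goc_depth
would then also need to live outside the CONFIG_LOCKDEP-only section of
task_struct):

	void get_online_cpus(void)
	{
		if (current->goc_depth++)	/* already inside a GOC section */
			return;
		percpu_down_read(&cpu_hotplug_lock);
	}

	void put_online_cpus(void)
	{
		WARN_ON_ONCE(current->goc_depth < 1);
		if (--current->goc_depth)	/* still nested, keep it held */
			return;
		percpu_up_read(&cpu_hotplug_lock);
	}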

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/5] cpu-hotplug: Allow get_online_cpus() to nest
  2017-05-12 22:15   ` Thomas Gleixner
@ 2017-05-13  0:23     ` Steven Rostedt
  0 siblings, 0 replies; 34+ messages in thread
From: Steven Rostedt @ 2017-05-13  0:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Mathieu Desnoyers,
	Paul E. McKenney, Masami Hiramatsu

On Sat, 13 May 2017 00:15:03 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Fri, 12 May 2017, Steven Rostedt wrote:
> >  void get_online_cpus(void)
> >  {
> > +#ifdef CONFIG_LOCKDEP
> > +	if (current->goc_depth++)
> > +		return;  
> 
> This must be unconditional and not depend on lockdep. The percpu rwsem is
> going to deadlock silently otherwise when a writer is waiting ....
> 

After I sent the updated changes to Paul, I realized this. But I didn't
update it, as I turned my attention to seeing if we can still get it
done without it.

But I'm not convinced that we can yet. I'll reply to Peter's email
again later.

-- Steve

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
  2017-05-12 21:34   ` Steven Rostedt
@ 2017-05-13 13:40     ` Paul E. McKenney
  2017-05-15  9:03       ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Paul E. McKenney @ 2017-05-13 13:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Fri, May 12, 2017 at 05:34:48PM -0400, Steven Rostedt wrote:
> On Fri, 12 May 2017 21:49:56 +0200

[ . . . ]

> This means that text_mutex, which was taken by the alternative code, no
> longer is taken in cpu hotplug code. That means there's no longer a
> deadlock scenario, as we don't have anyplace(*) that grabs
> get_online_cpus() and takes the text_mutex. Removing that will
> simplify things tremendously!
> 
> (*) with one exception: perf.
> 
> Currently perf does a get_online_cpu() at a high level. Will it be
> possible to move that down, such that we don't have it taken when we do
> any software events?

Can perf get rid of get_online_cpus(), perhaps using the mutexes acquired
by perf_event_init_cpu() or by perf_event_exit_cpu()?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
  2017-05-13 13:40     ` Paul E. McKenney
@ 2017-05-15  9:03       ` Peter Zijlstra
  2017-05-15 18:40         ` Paul E. McKenney
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2017-05-15  9:03 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Sat, May 13, 2017 at 06:40:03AM -0700, Paul E. McKenney wrote:
> On Fri, May 12, 2017 at 05:34:48PM -0400, Steven Rostedt wrote:
> > On Fri, 12 May 2017 21:49:56 +0200
> 
> [ . . . ]
> 
> > This means that text_mutex, which was taken by the alternative code, no
> > longer is taken in cpu hotplug code. That means there's no longer a
> > deadlock scenario, as we don't have anyplace(*) that grabs
> > get_online_cpus() and takes the text_mutex. Removing that will
> > simplify things tremendously!
> > 
> > (*) with one exception: perf.
> > 
> > Currently perf does a get_online_cpu() at a high level. Will it be
> > possible to move that down, such that we don't have it taken when we do
> > any software events?
> 
> Can perf get rid of get_online_cpus(), perhaps using the mutexes acquired
> by perf_event_init_cpu() or by perf_event_exit_cpu()?


Best I could come up with is something like the patch below, which moves
the get_online_cpus() to a possibly less onerous place.

Compile tested only..

---
 include/linux/perf_event.h |  2 ++
 kernel/events/core.c       | 85 ++++++++++++++++++++++++++++++++++------------
 2 files changed, 66 insertions(+), 21 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 24a635887f28..7d6aa29094b2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -801,6 +801,8 @@ struct perf_cpu_context {
 
 	struct list_head		sched_cb_entry;
 	int				sched_cb_usage;
+
+	int				online;
 };
 
 struct perf_output_handle {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 13f5b942580b..f9a595641d32 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3812,14 +3812,6 @@ find_get_context(struct pmu *pmu, struct task_struct *task,
 		if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
 			return ERR_PTR(-EACCES);
 
-		/*
-		 * We could be clever and allow to attach a event to an
-		 * offline CPU and activate it when the CPU comes up, but
-		 * that's for later.
-		 */
-		if (!cpu_online(cpu))
-			return ERR_PTR(-ENODEV);
-
 		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
 		ctx = &cpuctx->ctx;
 		get_ctx(ctx);
@@ -9005,6 +8997,7 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
 {
 	int cpu, ret;
 
+	get_online_cpus();
 	mutex_lock(&pmus_lock);
 	ret = -ENOMEM;
 	pmu->pmu_disable_count = alloc_percpu(int);
@@ -9064,6 +9057,7 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
 		lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex);
 		lockdep_set_class(&cpuctx->ctx.lock, &cpuctx_lock);
 		cpuctx->ctx.pmu = pmu;
+		cpuctx->online = cpu_online(cpu);
 
 		__perf_mux_hrtimer_init(cpuctx, cpu);
 	}
@@ -9099,6 +9093,7 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
 	ret = 0;
 unlock:
 	mutex_unlock(&pmus_lock);
+	put_online_cpus();
 
 	return ret;
 
@@ -9908,13 +9903,11 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (flags & PERF_FLAG_PID_CGROUP)
 		cgroup_fd = pid;
 
-	get_online_cpus();
-
 	event = perf_event_alloc(&attr, cpu, task, group_leader, NULL,
 				 NULL, NULL, cgroup_fd);
 	if (IS_ERR(event)) {
 		err = PTR_ERR(event);
-		goto err_cpus;
+		goto err_cred;
 	}
 
 	if (is_sampling_event(event)) {
@@ -10078,6 +10071,23 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_locked;
 	}
 
+	if (!task) {
+		/*
+		 * Check if the @cpu we're creating an event for is online.
+		 *
+		 * We use the perf_cpu_context::ctx::mutex to serialize against
+		 * the hotplug notifiers. See perf_event_{init,exit}_cpu().
+		 */
+		struct perf_cpu_context *cpuctx =
+			container_of(ctx, struct perf_cpu_context, ctx);
+
+		if (!cpuctx->online) {
+			err = -ENODEV;
+			goto err_locked;
+		}
+	}
+
+
 	/*
 	 * Must be under the same ctx::mutex as perf_install_in_context(),
 	 * because we need to serialize with concurrent event creation.
@@ -10162,8 +10172,6 @@ SYSCALL_DEFINE5(perf_event_open,
 		perf_event_ctx_unlock(group_leader, gctx);
 	mutex_unlock(&ctx->mutex);
 
-	put_online_cpus();
-
 	if (task) {
 		mutex_unlock(&task->signal->cred_guard_mutex);
 		put_task_struct(task);
@@ -10205,8 +10213,6 @@ SYSCALL_DEFINE5(perf_event_open,
 	 */
 	if (!event_file)
 		free_event(event);
-err_cpus:
-	put_online_cpus();
 err_cred:
 	if (task)
 		mutex_unlock(&task->signal->cred_guard_mutex);
@@ -10237,7 +10243,6 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 	struct perf_event *event;
 	int err;
 
-	get_online_cpus();
 	/*
 	 * Get the target context (task or percpu):
 	 */
@@ -10264,6 +10269,21 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 		goto err_unlock;
 	}
 
+	if (!task) {
+		/*
+		 * Check if the @cpu we're creating an event for is online.
+		 *
+		 * We use the perf_cpu_context::ctx::mutex to serialize against
+		 * the hotplug notifiers. See perf_event_{init,exit}_cpu().
+		 */
+		struct perf_cpu_context *cpuctx =
+			container_of(ctx, struct perf_cpu_context, ctx);
+		if (!cpuctx->online) {
+			err = -ENODEV;
+			goto err_unlock;
+		}
+	}
+
 	if (!exclusive_event_installable(event, ctx)) {
 		err = -EBUSY;
 		goto err_unlock;
@@ -10272,7 +10292,6 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 	perf_install_in_context(ctx, event, cpu);
 	perf_unpin_context(ctx);
 	mutex_unlock(&ctx->mutex);
-	put_online_cpus();
 	return event;
 
 err_unlock:
@@ -10282,7 +10301,6 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 err_free:
 	free_event(event);
 err:
-	put_online_cpus();
 	return ERR_PTR(err);
 }
 EXPORT_SYMBOL_GPL(perf_event_create_kernel_counter);
@@ -10951,7 +10969,7 @@ static void __init perf_event_init_all_cpus(void)
 	}
 }
 
-int perf_event_init_cpu(unsigned int cpu)
+void perf_swevent_init_cpu(unsigned int cpu)
 {
 	struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
 
@@ -10964,7 +10982,6 @@ int perf_event_init_cpu(unsigned int cpu)
 		rcu_assign_pointer(swhash->swevent_hlist, hlist);
 	}
 	mutex_unlock(&swhash->hlist_mutex);
-	return 0;
 }
 
 #if defined CONFIG_HOTPLUG_CPU || defined CONFIG_KEXEC_CORE
@@ -10982,16 +10999,19 @@ static void __perf_event_exit_context(void *__info)
 
 static void perf_event_exit_cpu_context(int cpu)
 {
+	struct perf_cpu_context *cpuctx;
 	struct perf_event_context *ctx;
 	struct pmu *pmu;
 	int idx;
 
 	idx = srcu_read_lock(&pmus_srcu);
 	list_for_each_entry_rcu(pmu, &pmus, entry) {
-		ctx = &per_cpu_ptr(pmu->pmu_cpu_context, cpu)->ctx;
+		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+		ctx = &cpuctx->ctx;
 
 		mutex_lock(&ctx->mutex);
 		smp_call_function_single(cpu, __perf_event_exit_context, ctx, 1);
+		cpuctx->online = 0;
 		mutex_unlock(&ctx->mutex);
 	}
 	srcu_read_unlock(&pmus_srcu, idx);
@@ -11002,6 +11022,29 @@ static void perf_event_exit_cpu_context(int cpu) { }
 
 #endif
 
+int perf_event_init_cpu(unsigned int cpu)
+{
+	struct perf_cpu_context *cpuctx;
+	struct perf_event_context *ctx;
+	struct pmu *pmu;
+	int idx;
+
+	perf_swevent_init_cpu(cpu);
+
+	idx = srcu_read_lock(&pmus_srcu);
+	list_for_each_entry_rcu(pmu, &pmus, entry) {
+		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+		ctx = &cpuctx->ctx;
+
+		mutex_lock(&ctx->mutex);
+		cpuctx->online = 1;
+		mutex_unlock(&ctx->mutex);
+	}
+	srcu_read_unlock(&pmus_srcu, idx);
+
+	return 0;
+}
+
 int perf_event_exit_cpu(unsigned int cpu)
 {
 	perf_event_exit_cpu_context(cpu);

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
  2017-05-15  9:03       ` Peter Zijlstra
@ 2017-05-15 18:40         ` Paul E. McKenney
  2017-05-16  8:19           ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Paul E. McKenney @ 2017-05-15 18:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Mon, May 15, 2017 at 11:03:18AM +0200, Peter Zijlstra wrote:
> On Sat, May 13, 2017 at 06:40:03AM -0700, Paul E. McKenney wrote:
> > On Fri, May 12, 2017 at 05:34:48PM -0400, Steven Rostedt wrote:
> > > On Fri, 12 May 2017 21:49:56 +0200
> > 
> > [ . . . ]
> > 
> > > This means that text_mutex, which was taken by the alternative code, no
> > > longer is taken in cpu hotplug code. That means there's no longer a
> > > deadlock scenario, as we don't have anyplace(*) that grabs
> > > get_online_cpus() and takes the text_mutex. Removing that will
> > > simplify things tremendously!
> > > 
> > > (*) with one exception: perf.
> > > 
> > > Currently perf does a get_online_cpus() at a high level. Will it be
> > > possible to move that down, such that we don't have it taken when we do
> > > any software events?
> > 
> > Can perf get rid of get_online_cpus(), perhaps using the mutexes acquired
> > by perf_event_init_cpu() or by perf_event_exit_cpu()?
> 
> 
> Best I could come up with is something like that below that moves the
> get_online_cpus() to a possibly less onerous place.
> 
> Compile tested only..

I freely confess that I don't fully understand this code, but...

Given that you acquire the global pmus_lock when doing the
get_online_cpus(), and given that CPU hotplug is rare, is it possible
to momentarily acquire the global pmus_lock in perf_event_init_cpu()
and perf_event_exit_cpu() and interact directly with that?  Then perf
would presumably leave alone any outgoing CPU that had already executed
perf_event_exit_cpu(), and also any incoming CPU that had not already
executed perf_event_init_cpu().

What prevents this approach from working?

							Thanx, Paul

> ---
>  include/linux/perf_event.h |  2 ++
>  kernel/events/core.c       | 85 ++++++++++++++++++++++++++++++++++------------
>  2 files changed, 66 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 24a635887f28..7d6aa29094b2 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -801,6 +801,8 @@ struct perf_cpu_context {
> 
>  	struct list_head		sched_cb_entry;
>  	int				sched_cb_usage;
> +
> +	int				online;
>  };
> 
>  struct perf_output_handle {
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 13f5b942580b..f9a595641d32 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3812,14 +3812,6 @@ find_get_context(struct pmu *pmu, struct task_struct *task,
>  		if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
>  			return ERR_PTR(-EACCES);
> 
> -		/*
> -		 * We could be clever and allow to attach a event to an
> -		 * offline CPU and activate it when the CPU comes up, but
> -		 * that's for later.
> -		 */
> -		if (!cpu_online(cpu))
> -			return ERR_PTR(-ENODEV);
> -
>  		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
>  		ctx = &cpuctx->ctx;
>  		get_ctx(ctx);
> @@ -9005,6 +8997,7 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
>  {
>  	int cpu, ret;
> 
> +	get_online_cpus();
>  	mutex_lock(&pmus_lock);
>  	ret = -ENOMEM;
>  	pmu->pmu_disable_count = alloc_percpu(int);
> @@ -9064,6 +9057,7 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
>  		lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex);
>  		lockdep_set_class(&cpuctx->ctx.lock, &cpuctx_lock);
>  		cpuctx->ctx.pmu = pmu;
> +		cpuctx->online = cpu_online(cpu);
> 
>  		__perf_mux_hrtimer_init(cpuctx, cpu);
>  	}
> @@ -9099,6 +9093,7 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
>  	ret = 0;
>  unlock:
>  	mutex_unlock(&pmus_lock);
> +	put_online_cpus();
> 
>  	return ret;
> 
> @@ -9908,13 +9903,11 @@ SYSCALL_DEFINE5(perf_event_open,
>  	if (flags & PERF_FLAG_PID_CGROUP)
>  		cgroup_fd = pid;
> 
> -	get_online_cpus();
> -
>  	event = perf_event_alloc(&attr, cpu, task, group_leader, NULL,
>  				 NULL, NULL, cgroup_fd);
>  	if (IS_ERR(event)) {
>  		err = PTR_ERR(event);
> -		goto err_cpus;
> +		goto err_cred;
>  	}
> 
>  	if (is_sampling_event(event)) {
> @@ -10078,6 +10071,23 @@ SYSCALL_DEFINE5(perf_event_open,
>  		goto err_locked;
>  	}
> 
> +	if (!task) {
> +		/*
> +		 * Check if the @cpu we're creating an event for is online.
> +		 *
> +		 * We use the perf_cpu_context::ctx::mutex to serialize against
> +		 * the hotplug notifiers. See perf_event_{init,exit}_cpu().
> +		 */
> +		struct perf_cpu_context *cpuctx =
> +			container_of(ctx, struct perf_cpu_context, ctx);
> +
> +		if (!cpuctx->online) {
> +			err = -ENODEV;
> +			goto err_locked;
> +		}
> +	}
> +
> +
>  	/*
>  	 * Must be under the same ctx::mutex as perf_install_in_context(),
>  	 * because we need to serialize with concurrent event creation.
> @@ -10162,8 +10172,6 @@ SYSCALL_DEFINE5(perf_event_open,
>  		perf_event_ctx_unlock(group_leader, gctx);
>  	mutex_unlock(&ctx->mutex);
> 
> -	put_online_cpus();
> -
>  	if (task) {
>  		mutex_unlock(&task->signal->cred_guard_mutex);
>  		put_task_struct(task);
> @@ -10205,8 +10213,6 @@ SYSCALL_DEFINE5(perf_event_open,
>  	 */
>  	if (!event_file)
>  		free_event(event);
> -err_cpus:
> -	put_online_cpus();
>  err_cred:
>  	if (task)
>  		mutex_unlock(&task->signal->cred_guard_mutex);
> @@ -10237,7 +10243,6 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
>  	struct perf_event *event;
>  	int err;
> 
> -	get_online_cpus();
>  	/*
>  	 * Get the target context (task or percpu):
>  	 */
> @@ -10264,6 +10269,21 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
>  		goto err_unlock;
>  	}
> 
> +	if (!task) {
> +		/*
> +		 * Check if the @cpu we're creating an event for is online.
> +		 *
> +		 * We use the perf_cpu_context::ctx::mutex to serialize against
> +		 * the hotplug notifiers. See perf_event_{init,exit}_cpu().
> +		 */
> +		struct perf_cpu_context *cpuctx =
> +			container_of(ctx, struct perf_cpu_context, ctx);
> +		if (!cpuctx->online) {
> +			err = -ENODEV;
> +			goto err_unlock;
> +		}
> +	}
> +
>  	if (!exclusive_event_installable(event, ctx)) {
>  		err = -EBUSY;
>  		goto err_unlock;
> @@ -10272,7 +10292,6 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
>  	perf_install_in_context(ctx, event, cpu);
>  	perf_unpin_context(ctx);
>  	mutex_unlock(&ctx->mutex);
> -	put_online_cpus();
>  	return event;
> 
>  err_unlock:
> @@ -10282,7 +10301,6 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
>  err_free:
>  	free_event(event);
>  err:
> -	put_online_cpus();
>  	return ERR_PTR(err);
>  }
>  EXPORT_SYMBOL_GPL(perf_event_create_kernel_counter);
> @@ -10951,7 +10969,7 @@ static void __init perf_event_init_all_cpus(void)
>  	}
>  }
> 
> -int perf_event_init_cpu(unsigned int cpu)
> +void perf_swevent_init_cpu(unsigned int cpu)
>  {
>  	struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
> 
> @@ -10964,7 +10982,6 @@ int perf_event_init_cpu(unsigned int cpu)
>  		rcu_assign_pointer(swhash->swevent_hlist, hlist);
>  	}
>  	mutex_unlock(&swhash->hlist_mutex);
> -	return 0;
>  }
> 
>  #if defined CONFIG_HOTPLUG_CPU || defined CONFIG_KEXEC_CORE
> @@ -10982,16 +10999,19 @@ static void __perf_event_exit_context(void *__info)
> 
>  static void perf_event_exit_cpu_context(int cpu)
>  {
> +	struct perf_cpu_context *cpuctx;
>  	struct perf_event_context *ctx;
>  	struct pmu *pmu;
>  	int idx;
> 
>  	idx = srcu_read_lock(&pmus_srcu);
>  	list_for_each_entry_rcu(pmu, &pmus, entry) {
> -		ctx = &per_cpu_ptr(pmu->pmu_cpu_context, cpu)->ctx;
> +		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
> +		ctx = &cpuctx->ctx;
> 
>  		mutex_lock(&ctx->mutex);
>  		smp_call_function_single(cpu, __perf_event_exit_context, ctx, 1);
> +		cpuctx->online = 0;
>  		mutex_unlock(&ctx->mutex);
>  	}
>  	srcu_read_unlock(&pmus_srcu, idx);
> @@ -11002,6 +11022,29 @@ static void perf_event_exit_cpu_context(int cpu) { }
> 
>  #endif
> 
> +int perf_event_init_cpu(unsigned int cpu)
> +{
> +	struct perf_cpu_context *cpuctx;
> +	struct perf_event_context *ctx;
> +	struct pmu *pmu;
> +	int idx;
> +
> +	perf_swevent_init_cpu(cpu);
> +
> +	idx = srcu_read_lock(&pmus_srcu);
> +	list_for_each_entry_rcu(pmu, &pmus, entry) {
> +		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
> +		ctx = &cpuctx->ctx;
> +
> +		mutex_lock(&ctx->mutex);
> +		cpuctx->online = 1;
> +		mutex_unlock(&ctx->mutex);
> +	}
> +	srcu_read_unlock(&pmus_srcu, idx);
> +
> +	return 0;
> +}
> +
>  int perf_event_exit_cpu(unsigned int cpu)
>  {
>  	perf_event_exit_cpu_context(cpu);
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
  2017-05-12 19:49 ` Peter Zijlstra
  2017-05-12 20:14   ` Steven Rostedt
  2017-05-12 21:34   ` Steven Rostedt
@ 2017-05-15 19:06   ` Steven Rostedt
  2 siblings, 0 replies; 34+ messages in thread
From: Steven Rostedt @ 2017-05-15 19:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Mathieu Desnoyers,
	Paul E. McKenney, Masami Hiramatsu

On Fri, 12 May 2017 21:49:56 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> In general we avoid nested locking in the kernel. Nested locking makes
> an absolute mockery of locking rules and what all gets protected.

I'm not against the goal of having get_online_cpus() not be nested,
but I don't agree with the above comment.

If we look at locking in a more abstract view, lock(A) is just
something to protect critical section A. Lock(B) is just something to
protect critical section B. To prevent deadlocks, if one path enters
critical section A and then enters critical section B before leaving
critical section A, then no other path may enter critical section B and
then enter critical section A while still inside critical section B.
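
For illustration, in the same shorthand (purely illustrative, not any
particular kernel locks):

	/* path 1 (fine): takes A, then B */
	lock(A); ...; lock(B); ...; unlock(B); ...; unlock(A);

	/* path 2 (broken): takes B, then A, and can deadlock against path 1 */
	lock(B); ...; lock(A); ...; unlock(A); ...; unlock(B);

If path 1 holds A and waits for B while path 2 holds B and waits for A,
neither can ever make progress.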

But for nested locks, the nested lock is not a new critical section. If
we have lock(A); ...; lock(A); ...; unlock(A); ...; unlock(A); that
just shows a single entry into critical section A. Even if we have:

lock(A);...; lock(B); ...; lock(A); ...; unlock(A); ...;
unlock(B); ...; unlock(A);

That is still fine, as long as there never exists a lock(B); ...;
lock(A); sequence without lock(A) having been taken first.

That's because the rules only matter when entering a critical section,
and the taking of the second lock(A) is basically just a nop.

We do this all the time in the kernel, but we just don't do it with
nesting locks. We usually do it with __func()s.


void func(void) {
	lock(A);
	__func();
	unlock(A);
}

and later we could even have:

void foo(void) {
	lock(A);
	...;
	lock(B);
	...;
	__func();
	...;
	unlock(B);
	...;
	unlock(A);
}

If lock(A) was able to nest, then __func() wouldn't be needed. We could
always just call func() which would take lock(A); lockdep will still
work just fine, as it would only care when the lock is first taken, not
its nesting.

One thing we need to do with the current approach is

void __func(void) {

	lockdep_assert_held(A);

	...;
}

to make sure that A is held when calling __func().

Again, I'm not against the lofty goal of having no nesting of
get_online_cpus(), but I just don't agree that nesting locks make a
mockery out of the locking rules.

-- Steve

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
  2017-05-15 18:40         ` Paul E. McKenney
@ 2017-05-16  8:19           ` Peter Zijlstra
  2017-05-16 12:46             ` Paul E. McKenney
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2017-05-16  8:19 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Mon, May 15, 2017 at 11:40:43AM -0700, Paul E. McKenney wrote:

> Given that you acquire the global pmus_lock when doing the
> get_online_cpus(), and given that CPU hotplug is rare, is it possible
> to momentarily acquire the global pmus_lock in perf_event_init_cpu()
> and perf_event_exit_cpu() and interact directly with that?  Then perf
> would presumably leave alone any outgoing CPU that had already executed
> perf_event_exit_cpu(), and also any incoming CPU that had not already
> executed perf_event_init_cpu().
> 
> What prevents this approach from working?

Lack of sleep probably ;-)

I'd blame the kids, but those have actually been very good lately.

You're suggesting the below on top, right? I'll run it with lockdep
enabled after I chase some regression..

---
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8997,7 +8997,6 @@ int perf_pmu_register(struct pmu *pmu, c
 {
 	int cpu, ret;
 
-	get_online_cpus();
 	mutex_lock(&pmus_lock);
 	ret = -ENOMEM;
 	pmu->pmu_disable_count = alloc_percpu(int);
@@ -9093,7 +9092,6 @@ int perf_pmu_register(struct pmu *pmu, c
 	ret = 0;
 unlock:
 	mutex_unlock(&pmus_lock);
-	put_online_cpus();
 
 	return ret;
 
@@ -11002,10 +11000,9 @@ static void perf_event_exit_cpu_context(
 	struct perf_cpu_context *cpuctx;
 	struct perf_event_context *ctx;
 	struct pmu *pmu;
-	int idx;
 
-	idx = srcu_read_lock(&pmus_srcu);
-	list_for_each_entry_rcu(pmu, &pmus, entry) {
+	mutex_lock(&pmus_lock);
+	list_for_each_entry(pmu, &pmus, entry) {
 		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
 		ctx = &cpuctx->ctx;
 
@@ -11014,7 +11011,7 @@ static void perf_event_exit_cpu_context(
 		cpuctx->online = 0;
 		mutex_unlock(&ctx->mutex);
 	}
-	srcu_read_unlock(&pmus_srcu, idx);
+	mutex_unlock(&pmus_lock);
 }
 #else
 
@@ -11027,12 +11024,11 @@ int perf_event_init_cpu(unsigned int cpu
 	struct perf_cpu_context *cpuctx;
 	struct perf_event_context *ctx;
 	struct pmu *pmu;
-	int idx;
 
 	perf_swevent_init_cpu(cpu);
 
-	idx = srcu_read_lock(&pmus_srcu);
-	list_for_each_entry_rcu(pmu, &pmus, entry) {
+	mutex_lock(&pmus_lock);
+	list_for_each_entry(pmu, &pmus, entry) {
 		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
 		ctx = &cpuctx->ctx;
 
@@ -11040,7 +11036,7 @@ int perf_event_init_cpu(unsigned int cpu
 		cpuctx->online = 1;
 		mutex_unlock(&ctx->mutex);
 	}
-	srcu_read_unlock(&pmus_srcu, idx);
+	mutex_unlock(&pmus_lock);
 
 	return 0;
 }

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
  2017-05-16  8:19           ` Peter Zijlstra
@ 2017-05-16 12:46             ` Paul E. McKenney
  2017-05-16 14:27               ` Paul E. McKenney
  0 siblings, 1 reply; 34+ messages in thread
From: Paul E. McKenney @ 2017-05-16 12:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Tue, May 16, 2017 at 10:19:23AM +0200, Peter Zijlstra wrote:
> On Mon, May 15, 2017 at 11:40:43AM -0700, Paul E. McKenney wrote:
> 
> > Given that you acquire the global pmus_lock when doing the
> > get_online_cpus(), and given that CPU hotplug is rare, is it possible
> > to momentarily acquire the global pmus_lock in perf_event_init_cpu()
> > and perf_event_exit_cpu() and interact directly with that?  Then perf
> > would presumably leave alone any outgoing CPU that had already executed
> > perf_event_exit_cpu(), and also any incoming CPU that had not already
> > executed perf_event_init_cpu().
> > 
> > What prevents this approach from working?
> 
> Lack of sleep probably ;-)

I know that feeling...

> I'd blame the kids, but those have actually been very good lately.

I don't get that excuse anymore, all are on their own.  So I need
to come up with some fresh excuses.  ;-)

> You're suggesting the below on top, right? I'll run it with lockdep
> enabled after I chase some regression..

Something like this, yes.  Maybe even exactly like this.  ;-)

> ---
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8997,7 +8997,6 @@ int perf_pmu_register(struct pmu *pmu, c
>  {
>  	int cpu, ret;
> 
> -	get_online_cpus();
>  	mutex_lock(&pmus_lock);
>  	ret = -ENOMEM;
>  	pmu->pmu_disable_count = alloc_percpu(int);

There is usually also some state check in here somewhere for the CPU
being offline from a perf perspective.  Such a check might already exist,
but I must plead ignorance of perf.

> @@ -9093,7 +9092,6 @@ int perf_pmu_register(struct pmu *pmu, c
>  	ret = 0;
>  unlock:
>  	mutex_unlock(&pmus_lock);
> -	put_online_cpus();
> 
>  	return ret;
> 
> @@ -11002,10 +11000,9 @@ static void perf_event_exit_cpu_context(
>  	struct perf_cpu_context *cpuctx;
>  	struct perf_event_context *ctx;
>  	struct pmu *pmu;
> -	int idx;
> 
> -	idx = srcu_read_lock(&pmus_srcu);
> -	list_for_each_entry_rcu(pmu, &pmus, entry) {
> +	mutex_lock(&pmus_lock);

If the state change checked for by perf_pmu_register() needs to be also
guarded by ctx->mutex, this looks right to me.

Just for completeness, the other style is to maintain separate per-CPU
state, in which case you would instead acquire pmus_lock, mark this
CPU off-limits to more perf_pmu_register() usage, release pmus_lock,
then clean up any old usage.

The approach you have here seems to work best when the cleanup
and initialization naturally mark the CPU as off limits and ready,
respectively.  The other style seems to work best when you need a separate
indication of which CPUs are off limits and usable.
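
Just as a rough sketch of that other style (all names and the hotplug
callbacks below are made up for illustration, not taken from perf or
any other subsystem, and none of it is compile-tested):

	/* Allocated with zalloc_cpumask_var() at init time. */
	static cpumask_var_t foo_online_mask;
	static DEFINE_MUTEX(foo_lock);

	static int foo_cpu_online(unsigned int cpu)
	{
		mutex_lock(&foo_lock);
		cpumask_set_cpu(cpu, foo_online_mask);
		mutex_unlock(&foo_lock);
		return 0;
	}

	static int foo_cpu_down_prep(unsigned int cpu)
	{
		mutex_lock(&foo_lock);
		/* New users must stop picking this CPU from here on. */
		cpumask_clear_cpu(cpu, foo_online_mask);
		mutex_unlock(&foo_lock);

		/* Then clean up any old usage, outside the lock. */
		foo_cleanup_cpu(cpu);		/* hypothetical helper */
		return 0;
	}

Anything that wants to use a CPU then tests foo_online_mask under
foo_lock instead of calling cpu_online().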

RCU is an example of the other style, with the rcu_node structure's
->qsmaskinitnext mask serving to mark which CPUs are usable.  One reason
that the other style works so well for RCU is that a CPU coming online
has no effect on the current grace period, so rcu_cpu_starting() just
sets the CPU's bit in ->qsmaskinitnext, which takes effect only once
the next grace period starts.

It is quite possible that many of the other use cases instead need to
use something like what you have here.  I suspect that the common case
is that a CPU appearing or disappearing must have some immediate effect.

> +	list_for_each_entry(pmu, &pmus, entry) {
>  		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
>  		ctx = &cpuctx->ctx;
> 
> @@ -11014,7 +11011,7 @@ static void perf_event_exit_cpu_context(
>  		cpuctx->online = 0;
>  		mutex_unlock(&ctx->mutex);
>  	}
> -	srcu_read_unlock(&pmus_srcu, idx);
> +	mutex_unlock(&pmus_lock);
>  }
>  #else
> 
> @@ -11027,12 +11024,11 @@ int perf_event_init_cpu(unsigned int cpu
>  	struct perf_cpu_context *cpuctx;
>  	struct perf_event_context *ctx;
>  	struct pmu *pmu;
> -	int idx;
> 
>  	perf_swevent_init_cpu(cpu);
> 
> -	idx = srcu_read_lock(&pmus_srcu);
> -	list_for_each_entry_rcu(pmu, &pmus, entry) {
> +	mutex_lock(&pmus_lock);
> +	list_for_each_entry(pmu, &pmus, entry) {
>  		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
>  		ctx = &cpuctx->ctx;
> 
> @@ -11040,7 +11036,7 @@ int perf_event_init_cpu(unsigned int cpu
>  		cpuctx->online = 1;
>  		mutex_unlock(&ctx->mutex);
>  	}
> -	srcu_read_unlock(&pmus_srcu, idx);
> +	mutex_unlock(&pmus_lock);

And same here.

Again for completeness, the other style would be to mark this CPU
as ready for perf usage at the very end, protected by pmus_lock.

							Thanx, Paul

>  	return 0;
>  }
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
  2017-05-16 12:46             ` Paul E. McKenney
@ 2017-05-16 14:27               ` Paul E. McKenney
  2017-05-17 10:40                 ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Paul E. McKenney @ 2017-05-16 14:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Tue, May 16, 2017 at 05:46:06AM -0700, Paul E. McKenney wrote:
> On Tue, May 16, 2017 at 10:19:23AM +0200, Peter Zijlstra wrote:
> > On Mon, May 15, 2017 at 11:40:43AM -0700, Paul E. McKenney wrote:
> > 
> > > Given that you acquire the global pmus_lock when doing the
> > > get_online_cpus(), and given that CPU hotplug is rare, is it possible
> > > to momentarily acquire the global pmus_lock in perf_event_init_cpu()
> > > and perf_event_exit_cpu() and interact directly with that?  Then perf
> > > would presumably leave alone any outgoing CPU that had already executed
> > > perf_event_exit_cpu(), and also any incoming CPU that had not already
> > > executed perf_event_init_cpu().
> > > 
> > > What prevents this approach from working?
> > 
> > Lack of sleep probably ;-)
> 
> I know that feeling...
> 
> > I'd blame the kids, but those have actually been very good lately.
> 
> I don't get that excuse anymore, all are on their own.  So I need
> to come up with some fresh excuses.  ;-)
> 
> > You're suggesting the below on top, right? I'll run it with lockdep
> > enabled after I chase some regression..
> 
> Something like this, yes.  Maybe even exactly like this.  ;-)

Ah, one thing I forgot...  If you are avoiding use of get_online_cpus(),
you usually also have to be very careful about how you use things like
cpu_online() and cpu_is_offline.

							Thanx, Paul

> > ---
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -8997,7 +8997,6 @@ int perf_pmu_register(struct pmu *pmu, c
> >  {
> >  	int cpu, ret;
> > 
> > -	get_online_cpus();
> >  	mutex_lock(&pmus_lock);
> >  	ret = -ENOMEM;
> >  	pmu->pmu_disable_count = alloc_percpu(int);
> 
> There is usually also some state check in here somewhere for the CPU
> being offline from a perf perspective.  Such a check might already exist,
> but I must plead ignorance of perf.
> 
> > @@ -9093,7 +9092,6 @@ int perf_pmu_register(struct pmu *pmu, c
> >  	ret = 0;
> >  unlock:
> >  	mutex_unlock(&pmus_lock);
> > -	put_online_cpus();
> > 
> >  	return ret;
> > 
> > @@ -11002,10 +11000,9 @@ static void perf_event_exit_cpu_context(
> >  	struct perf_cpu_context *cpuctx;
> >  	struct perf_event_context *ctx;
> >  	struct pmu *pmu;
> > -	int idx;
> > 
> > -	idx = srcu_read_lock(&pmus_srcu);
> > -	list_for_each_entry_rcu(pmu, &pmus, entry) {
> > +	mutex_lock(&pmus_lock);
> 
> If the state change checked for by perf_pmu_register() needs to be also
> guarded by ctx->mutex, this looks right to me.
> 
> Just for completeness, the other style is to maintain separate per-CPU
> state, in which case you would instead acquire pmus_lock, mark this
> CPU off-limits to more perf_pmu_register() usage, release pmus_lock,
> then clean up any old usage.
> 
> The approach you have here seems to work best when the cleanup
> and initialization naturally mark the CPU as off limits and ready,
> respectively.  The other style seems to work best when you need a separate
> indication of which CPUs are off limits and usable.
> 
> RCU is an example of the other style, with the rcu_node structure's
> > ->qsmaskinitnext mask serving to mark which CPUs are usable.  One reason
> that the other style works so well for RCU is that a CPU coming online
> has no effect on the current grace period, so rcu_cpu_starting() just
> sets the CPU's bit in ->qsmaskinitnext, which takes effect only once
> > the next grace period starts.
> 
> It is quite possible that many of the other use cases instead need to
> use something like what you have here.  I suspect that the common case
> is that a CPU appearing or disappearing must have some immediate effect.
> 
> > +	list_for_each_entry(pmu, &pmus, entry) {
> >  		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
> >  		ctx = &cpuctx->ctx;
> > 
> > @@ -11014,7 +11011,7 @@ static void perf_event_exit_cpu_context(
> >  		cpuctx->online = 0;
> >  		mutex_unlock(&ctx->mutex);
> >  	}
> > -	srcu_read_unlock(&pmus_srcu, idx);
> > +	mutex_unlock(&pmus_lock);
> >  }
> >  #else
> > 
> > @@ -11027,12 +11024,11 @@ int perf_event_init_cpu(unsigned int cpu
> >  	struct perf_cpu_context *cpuctx;
> >  	struct perf_event_context *ctx;
> >  	struct pmu *pmu;
> > -	int idx;
> > 
> >  	perf_swevent_init_cpu(cpu);
> > 
> > -	idx = srcu_read_lock(&pmus_srcu);
> > -	list_for_each_entry_rcu(pmu, &pmus, entry) {
> > +	mutex_lock(&pmus_lock);
> > +	list_for_each_entry(pmu, &pmus, entry) {
> >  		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
> >  		ctx = &cpuctx->ctx;
> > 
> > @@ -11040,7 +11036,7 @@ int perf_event_init_cpu(unsigned int cpu
> >  		cpuctx->online = 1;
> >  		mutex_unlock(&ctx->mutex);
> >  	}
> > -	srcu_read_unlock(&pmus_srcu, idx);
> > +	mutex_unlock(&pmus_lock);
> 
> And same here.
> 
> Again for completeness, the other style would be to mark this CPU
> as ready for perf usage at the very end, protected by pmus_lock.
> 
> 							Thanx, Paul
> 
> >  	return 0;
> >  }
> > 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
  2017-05-16 14:27               ` Paul E. McKenney
@ 2017-05-17 10:40                 ` Peter Zijlstra
  2017-05-17 14:55                   ` Paul E. McKenney
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2017-05-17 10:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Tue, May 16, 2017 at 07:27:42AM -0700, Paul E. McKenney wrote:
> On Tue, May 16, 2017 at 05:46:06AM -0700, Paul E. McKenney wrote:

> > Something like this, yes.  Maybe even exactly like this.  ;-)
> 
> Ah, one thing I forgot...  If you are avoiding use of get_online_cpus(),
> you usually also have to be very careful about how you use things like
> cpu_online() and cpu_is_offline.

OK, so I think I got it wrong there. This hunk should close any race
between perf_pmu_register() and hotplug by tracking a global state
protected by pmus_lock, thereby ensuring the cpuctx->online state gets
initialized correctly.

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8993,6 +8993,8 @@ static int pmu_dev_alloc(struct pmu *pmu
 static struct lock_class_key cpuctx_mutex;
 static struct lock_class_key cpuctx_lock;
 
+static cpumask_var_t perf_online_mask;
+
 int perf_pmu_register(struct pmu *pmu, const char *name, int type)
 {
 	int cpu, ret;
@@ -9056,7 +9058,7 @@ int perf_pmu_register(struct pmu *pmu, c
 		lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex);
 		lockdep_set_class(&cpuctx->ctx.lock, &cpuctx_lock);
 		cpuctx->ctx.pmu = pmu;
-		cpuctx->online = cpu_online(cpu);
+		cpuctx->online = cpumask_test_cpu(cpu, perf_online_mask);
 
 		__perf_mux_hrtimer_init(cpuctx, cpu);
 	}
@@ -10952,6 +10954,8 @@ static void __init perf_event_init_all_c
 	struct swevent_htable *swhash;
 	int cpu;
 
+	zalloc_cpumask_var(&perf_online_mask, GFP_KERNEL);
+
 	for_each_possible_cpu(cpu) {
 		swhash = &per_cpu(swevent_htable, cpu);
 		mutex_init(&swhash->hlist_mutex);
@@ -11011,6 +11015,7 @@ static void perf_event_exit_cpu_context(
 		cpuctx->online = 0;
 		mutex_unlock(&ctx->mutex);
 	}
+	cpumask_clear_cpu(cpu, perf_online_mask);
 	mutex_unlock(&pmus_lock);
 }
 #else
@@ -11028,6 +11033,7 @@ int perf_event_init_cpu(unsigned int cpu
 	perf_swevent_init_cpu(cpu);
 
 	mutex_lock(&pmus_lock);
+	cpumask_set_cpu(cpu, perf_online_mask);
 	list_for_each_entry(pmu, &pmus, entry) {
 		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
 		ctx = &cpuctx->ctx;

> > > ---
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -8997,7 +8997,6 @@ int perf_pmu_register(struct pmu *pmu, c
> > >  {
> > >  	int cpu, ret;
> > > 
> > > -	get_online_cpus();
> > >  	mutex_lock(&pmus_lock);
> > >  	ret = -ENOMEM;
> > >  	pmu->pmu_disable_count = alloc_percpu(int);
> > 
> > There is usually also some state check in here somewhere for the CPU
> > being offline from a perf perspective.  Such a check might already exist,
> > but I must plead ignorance of perf.

This just allocates per-cpu storage, which is by definition on the
possible mask and unrelated to the online mask.

> > > @@ -9093,7 +9092,6 @@ int perf_pmu_register(struct pmu *pmu, c
> > >  	ret = 0;
> > >  unlock:
> > >  	mutex_unlock(&pmus_lock);
> > > -	put_online_cpus();
> > > 
> > >  	return ret;
> > > 
> > > @@ -11002,10 +11000,9 @@ static void perf_event_exit_cpu_context(
> > >  	struct perf_cpu_context *cpuctx;
> > >  	struct perf_event_context *ctx;
> > >  	struct pmu *pmu;
> > > -	int idx;
> > > 
> > > -	idx = srcu_read_lock(&pmus_srcu);
> > > -	list_for_each_entry_rcu(pmu, &pmus, entry) {
> > > +	mutex_lock(&pmus_lock);
> > 
> > If the state change checked for by perf_pmu_register() needs to be also
> > guarded by ctx->mutex, this looks right to me.

Right, so we have two locks, pmus_lock that serializes hotplug vs
perf_pmu_register and ctx->lock that serializes find_get_context() vs
hotplug.

Together they ensure that if a PMU is observed, the PMU's cpuctx's have
the correct ->online state.

> > Just for completeness, the other style is to maintain separate per-CPU
> > state, in which case you would instead acquire pmus_lock, mark this
> > CPU off-limits to more perf_pmu_register() usage, release pmus_lock,
> > then clean up any old usage.

I'm not immediately seeing the other style, but per the above, I need
that too, because the previous version could race against hotplug if
perf_pmu_register() observed cpu_online() as set after
perf_event_exit_cpu() or something.

With the above change any chance of a race is gone and we don't need to
worry about hotplug ordering too much.


Now I just need to do something about the swevent hash, because that's
got a hole in it now.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
  2017-05-17 10:40                 ` Peter Zijlstra
@ 2017-05-17 14:55                   ` Paul E. McKenney
  2017-05-18  3:58                     ` Paul E. McKenney
  0 siblings, 1 reply; 34+ messages in thread
From: Paul E. McKenney @ 2017-05-17 14:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Wed, May 17, 2017 at 12:40:10PM +0200, Peter Zijlstra wrote:
> On Tue, May 16, 2017 at 07:27:42AM -0700, Paul E. McKenney wrote:
> > On Tue, May 16, 2017 at 05:46:06AM -0700, Paul E. McKenney wrote:
> 
> > > Something like this, yes.  Maybe even exactly like this.  ;-)
> > 
> > Ah, one thing I forgot...  If you are avoiding use of get_online_cpus(),
> > you usually also have to be very careful about how you use things like
> > cpu_online() and cpu_is_offline.
> 
> OK, so I think I got it wrong there. This hunk should close any race
> between perf_pmu_register() and hotplug by tracking a global state
> protected by pmus_lock, thereby ensuring the cpuctx->online state gets
> initialized correctly.

Looks much better!

> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8993,6 +8993,8 @@ static int pmu_dev_alloc(struct pmu *pmu
>  static struct lock_class_key cpuctx_mutex;
>  static struct lock_class_key cpuctx_lock;
> 
> +static cpumask_var_t perf_online_mask;
> +
>  int perf_pmu_register(struct pmu *pmu, const char *name, int type)
>  {
>  	int cpu, ret;
> @@ -9056,7 +9058,7 @@ int perf_pmu_register(struct pmu *pmu, c
>  		lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex);
>  		lockdep_set_class(&cpuctx->ctx.lock, &cpuctx_lock);
>  		cpuctx->ctx.pmu = pmu;
> -		cpuctx->online = cpu_online(cpu);
> +		cpuctx->online = cpumask_test_cpu(cpu, perf_online_mask);
> 
>  		__perf_mux_hrtimer_init(cpuctx, cpu);
>  	}
> @@ -10952,6 +10954,8 @@ static void __init perf_event_init_all_c
>  	struct swevent_htable *swhash;
>  	int cpu;
> 
> +	zalloc_cpumask_var(&perf_online_mask, GFP_KERNEL);
> +
>  	for_each_possible_cpu(cpu) {
>  		swhash = &per_cpu(swevent_htable, cpu);
>  		mutex_init(&swhash->hlist_mutex);
> @@ -11011,6 +11015,7 @@ static void perf_event_exit_cpu_context(
>  		cpuctx->online = 0;
>  		mutex_unlock(&ctx->mutex);
>  	}
> +	cpumask_clear_cpu(cpu, perf_online_mask);
>  	mutex_unlock(&pmus_lock);
>  }
>  #else
> @@ -11028,6 +11033,7 @@ int perf_event_init_cpu(unsigned int cpu
>  	perf_swevent_init_cpu(cpu);
> 
>  	mutex_lock(&pmus_lock);
> +	cpumask_set_cpu(cpu, perf_online_mask);
>  	list_for_each_entry(pmu, &pmus, entry) {
>  		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
>  		ctx = &cpuctx->ctx;
> 
> > > > ---
> > > > --- a/kernel/events/core.c
> > > > +++ b/kernel/events/core.c
> > > > @@ -8997,7 +8997,6 @@ int perf_pmu_register(struct pmu *pmu, c
> > > >  {
> > > >  	int cpu, ret;
> > > > 
> > > > -	get_online_cpus();
> > > >  	mutex_lock(&pmus_lock);
> > > >  	ret = -ENOMEM;
> > > >  	pmu->pmu_disable_count = alloc_percpu(int);
> > > 
> > > There is usually also some state check in here somewhere for the CPU
> > > being offline from a perf perspective.  Such a check might already exist,
> > > but I must plead ignorance of perf.
> 
> This just allocates per-cpu storage, which is by definition on the
> possible mask and unrelated to the online mask.

Got it!

> > > > @@ -9093,7 +9092,6 @@ int perf_pmu_register(struct pmu *pmu, c
> > > >  	ret = 0;
> > > >  unlock:
> > > >  	mutex_unlock(&pmus_lock);
> > > > -	put_online_cpus();
> > > > 
> > > >  	return ret;
> > > > 
> > > > @@ -11002,10 +11000,9 @@ static void perf_event_exit_cpu_context(
> > > >  	struct perf_cpu_context *cpuctx;
> > > >  	struct perf_event_context *ctx;
> > > >  	struct pmu *pmu;
> > > > -	int idx;
> > > > 
> > > > -	idx = srcu_read_lock(&pmus_srcu);
> > > > -	list_for_each_entry_rcu(pmu, &pmus, entry) {
> > > > +	mutex_lock(&pmus_lock);
> > > 
> > > If the state change checked for by perf_pmu_register() needs to be also
> > > guarded by ctx->mutex, this looks right to me.
> 
> Right, so we have two locks, pmus_lock that serializes hotplug vs
> perf_pmu_register and ctx->lock that serializes find_get_context() vs
> hotplug.
> 
> Together they ensure that if a PMU is observed, the PMU's cpuctx's have
> the correct ->online state.

Sounds good, and now the added pmus_lock allows dropping get_online_cpus().

> > > Just for completeness, the other style is to maintain separate per-CPU
> > > state, in which case you would instead acquire pmus_lock, mark this
> > > CPU off-limits to more perf_pmu_register() usage, release pmus_lock,
> > > then clean up any old usage.
> 
> I'm not immediately seeing the other style, but per the above, I need
> that too, because the previous version could race against hotplug if
> perf_pmu_register() observed cpu_online() as set after
> perf_event_exit_cpu() or something.
> 
> With the above change any chance of a race is gone and we don't need to
> worry about hotplug ordering too much.

Nice!!!

> Now I just need to do something about the swevent hash, because that's
> got a hole in it now.

On the styles, here is a more coherent list:

1.	Use get_online_cpus(), but no CPU-hotplug notifiers.  You can
	use cpu_online() and cpu_is_offline() straightforwardly while
	get_online_cpus() is in effect, but these two may only be used
	for statistical/heuristic purposes otherwise.

2a.	Use CPU-hotplug notifiers, but not get_online_cpus().
	The notifiers maintain some sort of bitmask tracking which CPUs
	are present from the viewpoint of this subsystem.  This bitmask
	provides exact CPU presence/absence indications, at least
	assuming the appropriate lock is held.	The cpu_online() and
	cpu_is_offline() macros should be avoided, except possibly for
	statistical/heuristic purposes.

	For your perf patch, the bitmask is a cpumask_var_t.  For RCU,
	it is the combination of the ->qsmaskinitnext fields of the leaf
	rcu_node structures.

2b.	Like 2a, except that instead of a bitmask, the CPU online/offline
	information is implicit in other data structures.  For example,
	per-CPU structures might be allocated only when the corresponding
	CPU is online, so that a given per-CPU pointer is non-NULL
	iff the corresponding CPU is online.

So I was (confusingly) initially using "style" to distinguish #1 on the
one hand from #2a and #2b on the other.  Later on, I was using "style"
to distinguish #2a from #2b.

At this point, I am not sure that it makes all that much sense to
distinguish 2a from 2b.  Or you could argue that use of a cpumask_var_t
is its own substyle, with hand-crafted bitmasks (such as RCU's) being
separate substyles.  Can't say that I care all that much about those
fine-grained gradations.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/5] tracing: Make sure RCU is watching before calling a stack trace
  2017-05-12 20:31           ` Paul E. McKenney
@ 2017-05-17 16:46             ` Steven Rostedt
  0 siblings, 0 replies; 34+ messages in thread
From: Steven Rostedt @ 2017-05-17 16:46 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Fri, 12 May 2017 13:31:45 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Fri, May 12, 2017 at 04:05:32PM -0400, Steven Rostedt wrote:
> > On Fri, 12 May 2017 11:50:03 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >   
> > > On Fri, May 12, 2017 at 02:36:19PM -0400, Steven Rostedt wrote:  
> > > > On Fri, 12 May 2017 11:25:35 -0700
> > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > >     
> > > > > On Fri, May 12, 2017 at 01:15:45PM -0400, Steven Rostedt wrote:    
> > > > > > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > > > > > 
> > > > > > As stack tracing now requires "rcu watching", force RCU to be watching when
> > > > > > recording a stack trace.
> > > > > > 
> > > > > > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>      
> > > > > 
> > > > > Assuming that you never get to __trace_stack() if in an NMI handler,
> > > > > this looks good to me!
> > > > > 
> > > > > In contrast, if __trace_stack() ever is called from an NMI handler,
> > > > > invoking rcu_irq_enter() can be fatal.    
> > > > 
> > > > Then someone may die.
> > > > 
> > > > OK, what's the case of running this in nmi? How does perf do it?    
> > > 
> > > I have no idea.  If it cannot happen, then it cannot happen and all
> > > is well, RCU is happy, and I am happy.  ;-)
> > >   
> > > > Do we just skip the check if it is in an nmi?
> > > > 
> > > > 	if (!in_nmi()) {
> > > > 		if (unlikely(rcu_irq_enter_disabled()))
> > > > 			return;
> > > > 		rcu_irq_enter();
> > > > 	}
> > > > 
> > > > 	__ftrace_trace_stack();
> > > > 
> > > > 	if (!in_nmi())
> > > > 		rcu_irq_exit();
> > > > 
> > > > ?    
> > > 
> > > If it -can- happen, bail out of the function without doing the  
> > 
> > Why?
> >   
> > > __ftrace_trace_stack()?  Or does that just cause other problems further
> > > down the road?  Or BUG_ON(in_nmi())?  
> > 
> > Why?
> >   
> > > But again if it cannot happen, no problem and no need for extra code.  
> > 
> > We can't call stack trace from nmi anymore? It calls rcu_read_lock()
> > which is why we need to make sure rcu is watching, otherwise lockdep
> > complains.  
> 
> Ah, finally got it!  If we are in_nmi(), you are relying on the
> NMI handler's call to rcu_nmi_enter(), which works.  The piece I was
> forgetting was that you also recently said in an unrelated LKML thread
> that all the functions called at the very beginnings and ends of NMI
> handlers (which can see !in_nmi()) are marked notrace, so that should
> be covered as well.
> 
> So never mind!  (And thank you for the explanation.)

Is this an Acked-by?

-- Steve

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 3/5] kprobes: Take get_online_cpus() before taking jump_label_lock()
  2017-05-12 17:15 ` [RFC][PATCH 3/5] kprobes: Take get_online_cpus() before taking jump_label_lock() Steven Rostedt
  2017-05-12 18:39   ` Paul E. McKenney
@ 2017-05-17 17:50   ` Masami Hiramatsu
  1 sibling, 0 replies; 34+ messages in thread
From: Masami Hiramatsu @ 2017-05-17 17:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Mathieu Desnoyers, Paul E. McKenney, Masami Hiramatsu

On Fri, 12 May 2017 13:15:47 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> jump_label_lock() is taken under get_online_cpus(). Make sure that kprobes
> follows suit.

BTW, register_aggr_kprobe() is called with kprobe_mutex held.
Is that OK?

Thank you,


> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  kernel/kprobes.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index d733479a10ee..57cf73aef488 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1294,13 +1294,13 @@ static int register_aggr_kprobe(struct kprobe *orig_p, struct kprobe *p)
>  	int ret = 0;
>  	struct kprobe *ap = orig_p;
>  
> -	/* For preparing optimization, jump_label_text_reserved() is called */
> -	jump_label_lock();
>  	/*
>  	 * Get online CPUs to avoid text_mutex deadlock.with stop machine,
>  	 * which is invoked by unoptimize_kprobe() in add_new_kprobe()
>  	 */
>  	get_online_cpus();
> +	/* For preparing optimization, jump_label_text_reserved() is called */
> +	jump_label_lock();
>  	mutex_lock(&text_mutex);
>  
>  	if (!kprobe_aggrprobe(orig_p)) {
> @@ -1348,8 +1348,8 @@ static int register_aggr_kprobe(struct kprobe *orig_p, struct kprobe *p)
>  
>  out:
>  	mutex_unlock(&text_mutex);
> -	put_online_cpus();
>  	jump_label_unlock();
> +	put_online_cpus();
>  
>  	if (ret == 0 && kprobe_disabled(ap) && !kprobe_disabled(p)) {
>  		ap->flags &= ~KPROBE_FLAG_DISABLED;
> -- 
> 2.10.2
> 
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
  2017-05-17 14:55                   ` Paul E. McKenney
@ 2017-05-18  3:58                     ` Paul E. McKenney
  0 siblings, 0 replies; 34+ messages in thread
From: Paul E. McKenney @ 2017-05-18  3:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Mathieu Desnoyers, Masami Hiramatsu

On Wed, May 17, 2017 at 07:55:20AM -0700, Paul E. McKenney wrote:
> On Wed, May 17, 2017 at 12:40:10PM +0200, Peter Zijlstra wrote:
> > On Tue, May 16, 2017 at 07:27:42AM -0700, Paul E. McKenney wrote:
> > > On Tue, May 16, 2017 at 05:46:06AM -0700, Paul E. McKenney wrote:

[ . . . ]

> > 
> > With the above change any chance of a race is gone and we don't need to
> > worry about hotplug ordering too much.
> 
> Nice!!!
> 
> > Now I just need to do something about the swevent hash, because that's
> > got a hole in it now.
> 
> On the styles, here is a more coherent list:
> 
> 1.	Use get_online_cpus(), but no CPU-hotplug notifiers.  You can
> 	use cpu_online() and cpu_is_offline() straightforwardly while
> 	get_online_cpus() is in effect, but these two may only be used
> 	for statistical/heuristic purposes otherwise.

And I forgot about the disable-preemption trick...

Of course you can also use cpu_online() and cpu_is_offline() more or less
straightforwardly when preemption is disabled, but if your code is making
use of subsystems that disable themselves early on outgoing CPUs or enable
themselves late on incoming CPUs, life can be difficult.  These subsystems
can make things easier; for example, smp_call_function_single() will
give a failure indication if it cannot help you because the CPU isn't
all there.  (RCU does OK, but could use a bit more improvement.)
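
For example, as a sketch only (the work function and the fallback
helper are hypothetical, not from any existing code):

	static void bar_do_work(void *info)
	{
		/* Runs on the target CPU via IPI; hypothetical work. */
	}

	static void bar_poke_cpu(int cpu)
	{
		int err = smp_call_function_single(cpu, bar_do_work, NULL, 1);

		if (err)	/* e.g. -ENXIO: the CPU isn't there */
			bar_do_work_elsewhere(cpu);	/* hypothetical fallback */
	}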

Also, you can use the disable-preemption trick to prevent a CPU from
leaving (if you get there early enough in the offlining), but it won't
necessarily prevent a CPU from showing up.  Yes, there seem to be a
fair number of notifiers that wait for grace periods (perhaps too many),
but you cannot necessarily count on them being configured into the
kernel.

> 2a.	Use CPU-hotplug notifiers, but not get_online_cpus().
> 	The notifiers maintain some sort of bitmask tracking which CPUs
> 	are present from the viewpoint of this subsystem.  This bitmask
> 	provides exact CPU presence/absence indications, at least
> 	assuming the appropriate lock is held.	The cpu_online() and
> 	cpu_is_offline() macros should be avoided, except possibly for
> 	statistical/heuristic purposes.
> 
> 	For your perf patch, the bitmask is a cpumask_var_t.  For RCU,
> 	it is the combination of the ->qsmaskinitnext fields of the leaf
> 	rcu_node structures.
> 
> 2b.	Like 2a, except that instead of a bitmask, the CPU online/offline
> 	information is implicit in other data structures.  For example,
> 	per-CPU structures might be allocated only when the corresponding
> 	CPU is online, so that a given per-CPU pointer is non-NULL
> 	iff the corresponding CPU is online.

Here, disabling preemption will still prevent any CPU from responding
to the stop-CPUs mechanism, but it cannot prevent your own notifier from
executing.  So care is required using cpu_online() and cpu_is_offline()
even when preemption is disabled.
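
For reference, a minimal sketch of what such 2b-style state might look
like (names made up, not taken from perf, RCU, or any other existing
subsystem):

	struct baz_data {
		/* whatever per-CPU state the subsystem needs */
		int dummy;
	};

	/*
	 * Per-CPU state that exists only while this subsystem considers
	 * the CPU online; a NULL pointer means "offline" as far as this
	 * subsystem is concerned.
	 */
	static DEFINE_PER_CPU(struct baz_data *, baz_data);
	static DEFINE_MUTEX(baz_lock);

	static int baz_cpu_online(unsigned int cpu)
	{
		struct baz_data *bd = kzalloc(sizeof(*bd), GFP_KERNEL);

		if (!bd)
			return -ENOMEM;
		mutex_lock(&baz_lock);
		per_cpu(baz_data, cpu) = bd;
		mutex_unlock(&baz_lock);
		return 0;
	}

	static int baz_cpu_down_prep(unsigned int cpu)
	{
		struct baz_data *bd;

		mutex_lock(&baz_lock);
		bd = per_cpu(baz_data, cpu);
		per_cpu(baz_data, cpu) = NULL;
		mutex_unlock(&baz_lock);
		kfree(bd);
		return 0;
	}

Readers take baz_lock (or otherwise synchronize with these callbacks)
and treat a NULL pointer as "this CPU is offline".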

							Thanx, Paul

> So I was (confusingly) initially using "style" to distinguish #1 on the
> one hand from #2a and #2b on the other.  Later on, I was using "style"
> to distinguish #2a from #2b.
> 
> At this point, I am not sure that it makes all that much sense to
> distinguish 2a from 2b.  Or you could argue that use of a cpumask_var_t
> is its own substyle, with hand-crafted bitmasks (such as RCU's) being
> separate substyles.  Can't say that I care all that much about those
> fine-grained gradations.  ;-)
> 
> 							Thanx, Paul

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2017-05-18  3:58 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-12 17:15 [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order Steven Rostedt
2017-05-12 17:15 ` [RFC][PATCH 1/5] tracing: Make sure RCU is watching before calling a stack trace Steven Rostedt
2017-05-12 18:25   ` Paul E. McKenney
2017-05-12 18:36     ` Steven Rostedt
2017-05-12 18:50       ` Paul E. McKenney
2017-05-12 20:05         ` Steven Rostedt
2017-05-12 20:31           ` Paul E. McKenney
2017-05-17 16:46             ` Steven Rostedt
2017-05-12 17:15 ` [RFC][PATCH 2/5] cpu-hotplug: Allow get_online_cpus() to nest Steven Rostedt
2017-05-12 18:35   ` Paul E. McKenney
2017-05-12 18:40     ` Steven Rostedt
2017-05-12 18:52       ` Paul E. McKenney
2017-05-12 22:15   ` Thomas Gleixner
2017-05-13  0:23     ` Steven Rostedt
2017-05-12 17:15 ` [RFC][PATCH 3/5] kprobes: Take get_online_cpus() before taking jump_label_lock() Steven Rostedt
2017-05-12 18:39   ` Paul E. McKenney
2017-05-12 18:44     ` Steven Rostedt
2017-05-17 17:50   ` Masami Hiramatsu
2017-05-12 17:15 ` [RFC][PATCH 4/5] tracepoints: Grab get_online_cpus() before taking tracepoints_mutex Steven Rostedt
2017-05-12 17:15 ` [RFC][PATCH 5/5] perf: Grab event_mutex before taking get_online_cpus() Steven Rostedt
2017-05-12 18:13 ` [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order Paul E. McKenney
2017-05-12 19:49 ` Peter Zijlstra
2017-05-12 20:14   ` Steven Rostedt
2017-05-12 21:34   ` Steven Rostedt
2017-05-13 13:40     ` Paul E. McKenney
2017-05-15  9:03       ` Peter Zijlstra
2017-05-15 18:40         ` Paul E. McKenney
2017-05-16  8:19           ` Peter Zijlstra
2017-05-16 12:46             ` Paul E. McKenney
2017-05-16 14:27               ` Paul E. McKenney
2017-05-17 10:40                 ` Peter Zijlstra
2017-05-17 14:55                   ` Paul E. McKenney
2017-05-18  3:58                     ` Paul E. McKenney
2017-05-15 19:06   ` Steven Rostedt
