* [PATCH 0/8] perf: Finer grained full dynticks handling
@ 2013-07-23  0:30 Frederic Weisbecker
  2013-07-23  0:30 ` [PATCH 1/8] perf: Fix branch stack refcount leak on callchain init failure Frederic Weisbecker
                   ` (8 more replies)
  0 siblings, 9 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2013-07-23  0:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Don Zickus,
	Srivatsa S. Bhat, Anish Singh

Hi,

This patchset is inspired by Peterz's patch in http://marc.info/?l=linux-kernel&m=137155182121860
and makes perf retain the timer tick less often, i.e. only when there are freq events
or when an event throttles.

The main purpose is to make the lockup detector work with full dynticks, as I suspect
that distros want to enable both...
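
For readers less familiar with the distinction, here is a minimal userspace
sketch of the two sampling modes (hypothetical values, not part of this
series): a freq event needs the tick so perf can periodically re-adjust the
hardware period, while a fixed-period event does not:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Illustrative helper: the attribute fields are the real perf ABI,
 * the numeric values are arbitrary. */
static int open_cycle_counter(int cpu, int use_freq)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;

	if (use_freq) {
		/* freq event: ask for ~1000 samples/sec, perf re-tunes the
		 * period from the tick, so the tick must stay alive. */
		attr.freq = 1;
		attr.sample_freq = 1000;
	} else {
		/* fixed period: no periodic re-adjustment needed, the tick
		 * can be stopped. */
		attr.sample_period = 100000;
	}

	return syscall(__NR_perf_event_open, &attr, -1 /* pid */, cpu,
		       -1 /* group_fd */, 0 /* flags */);
}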


You can fetch from:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	perf/nohz

Thanks,
	Frederic
---

Frederic Weisbecker (8):
      perf: Fix branch stack refcount leak on callchain init failure
      perf: Sanitize get_callchain_buffer()
      perf: Gather event accounting code
      perf: Split per cpu event accounting code
      perf: Migrate per cpu event accounting
      perf: Account freq events per cpu
      perf: Finer grained full dynticks kick
      watchdog: Remove hack to make full dynticks working


 kernel/events/callchain.c |    2 +
 kernel/events/core.c      |  203 ++++++++++++++++++++++++++++-----------------
 kernel/watchdog.c         |    8 --
 3 files changed, 128 insertions(+), 85 deletions(-)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 1/8] perf: Fix branch stack refcount leak on callchain init failure
  2013-07-23  0:30 [PATCH 0/8] perf: Finer grained full dynticks handling Frederic Weisbecker
@ 2013-07-23  0:30 ` Frederic Weisbecker
  2013-07-31  8:55   ` [tip:perf/core] " tip-bot for Frederic Weisbecker
  2013-07-23  0:31 ` [PATCH 2/8] perf: Sanitize get_callchain_buffer() Frederic Weisbecker
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2013-07-23  0:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On callchain buffer allocation failure, free_event() is
called and all the accounting performed in perf_event_alloc()
for that event is cancelled.

But if the event uses branch stack sampling, free_event() also
decrements the branch stack sampling events refcounts.

This is a bug because that accounting is only performed after the
callchain buffer allocation, so it never happened on this failure
path. As a result, the branch stack sampling events refcount can
become negative.

To fix this, move the branch stack event accounting before the
callchain buffer allocation.
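
To illustrate the failure mode with a trimmed-down standalone sketch (not
the kernel code): the cleanup path unconditionally reverses accounting
that, on the failing path, was never performed, so the counter goes
negative:

#include <stdio.h>

static int branch_stack_refcount;

/* Simplified model of the pre-fix ordering: the fallible step (callchain
 * buffer allocation) happens before the branch stack accounting, but the
 * error path runs the full free_event()-style cleanup anyway. */
static int alloc_event(int callchain_alloc_fails)
{
	if (callchain_alloc_fails) {
		branch_stack_refcount--;   /* undoes accounting never done */
		return -1;
	}
	branch_stack_refcount++;           /* accounting happens only here */
	return 0;
}

int main(void)
{
	alloc_event(1);
	printf("refcount = %d\n", branch_stack_refcount);   /* prints -1 */
	return 0;
}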

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
---
 kernel/events/core.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5e2bce9..7ffb81e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6566,6 +6566,12 @@ done:
 			atomic_inc(&nr_comm_events);
 		if (event->attr.task)
 			atomic_inc(&nr_task_events);
+		if (has_branch_stack(event)) {
+			static_key_slow_inc(&perf_sched_events.key);
+			if (!(event->attach_state & PERF_ATTACH_TASK))
+				atomic_inc(&per_cpu(perf_branch_stack_events,
+						    event->cpu));
+		}
 		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
 			err = get_callchain_buffers();
 			if (err) {
@@ -6573,12 +6579,6 @@ done:
 				return ERR_PTR(err);
 			}
 		}
-		if (has_branch_stack(event)) {
-			static_key_slow_inc(&perf_sched_events.key);
-			if (!(event->attach_state & PERF_ATTACH_TASK))
-				atomic_inc(&per_cpu(perf_branch_stack_events,
-						    event->cpu));
-		}
 	}
 
 	return event;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 2/8] perf: Sanitize get_callchain_buffer()
  2013-07-23  0:30 [PATCH 0/8] perf: Finer grained full dynticks handling Frederic Weisbecker
  2013-07-23  0:30 ` [PATCH 1/8] perf: Fix branch stack refcount leak on callchain init failure Frederic Weisbecker
@ 2013-07-23  0:31 ` Frederic Weisbecker
  2013-07-31  8:56   ` [tip:perf/core] " tip-bot for Frederic Weisbecker
                     ` (2 more replies)
  2013-07-23  0:31 ` [PATCH 3/8] perf: Gather event accounting code Frederic Weisbecker
                   ` (6 subsequent siblings)
  8 siblings, 3 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2013-07-23  0:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

In case of allocation failure, get_callchain_buffers() keeps the
refcount incremented for the current event.

As a result, when get_callchain_buffers() returns an error,
we must clean up what it did by cancelling its last refcount
with a call to put_callchain_buffers().

This is a hack in order to be able to call free_event()
after that failure.

The original purpose of that was to simplify the failure
path. But this error handling is actually counter-intuitive,
ugly and not very easy to follow, because one expects the
resources used to perform a service to be cleaned up by the
callee in case of failure, not by the caller.

So let's clean this up by cancelling the refcount from
get_callchain_buffers() itself in case of failure, and free
the event accordingly in perf_event_alloc().
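
The convention being restored here, shown as a generic standalone sketch
(not the kernel code): a function that fails should undo its own side
effects before returning, so callers can simply bail out without knowing
its internals:

#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

static atomic_int nr_users;

/* Callee-cleans-up style: if the allocation fails, drop the reference we
 * just took instead of leaving that to the caller. */
static int get_buffers(void **buf)
{
	atomic_fetch_add(&nr_users, 1);

	*buf = malloc(4096);
	if (!*buf) {
		atomic_fetch_sub(&nr_users, 1);  /* undo our own accounting */
		return -ENOMEM;
	}
	return 0;
}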

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
---
 kernel/events/callchain.c |    2 ++
 kernel/events/core.c      |   41 +++++++++++++++++++++--------------------
 2 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index c772061..76a8bc5 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -117,6 +117,8 @@ int get_callchain_buffers(void)
 	err = alloc_callchain_buffers();
 exit:
 	mutex_unlock(&callchain_mutex);
+	if (err)
+		atomic_dec(&nr_callchain_events);
 
 	return err;
 }
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7ffb81e..c5f435f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6456,7 +6456,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	struct pmu *pmu;
 	struct perf_event *event;
 	struct hw_perf_event *hwc;
-	long err;
+	long err = -EINVAL;
 
 	if ((unsigned)cpu >= nr_cpu_ids) {
 		if (!task || cpu != -1)
@@ -6539,25 +6539,23 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	 * we currently do not support PERF_FORMAT_GROUP on inherited events
 	 */
 	if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
-		goto done;
+		goto err_ns;
 
 	pmu = perf_init_event(event);
-
-done:
-	err = 0;
 	if (!pmu)
-		err = -EINVAL;
-	else if (IS_ERR(pmu))
+		goto err_ns;
+	else if (IS_ERR(pmu)) {
 		err = PTR_ERR(pmu);
-
-	if (err) {
-		if (event->ns)
-			put_pid_ns(event->ns);
-		kfree(event);
-		return ERR_PTR(err);
+		goto err_ns;
 	}
 
 	if (!event->parent) {
+		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
+			err = get_callchain_buffers();
+			if (err)
+				goto err_pmu;
+		}
+
 		if (event->attach_state & PERF_ATTACH_TASK)
 			static_key_slow_inc(&perf_sched_events.key);
 		if (event->attr.mmap || event->attr.mmap_data)
@@ -6572,16 +6570,19 @@ done:
 				atomic_inc(&per_cpu(perf_branch_stack_events,
 						    event->cpu));
 		}
-		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
-			err = get_callchain_buffers();
-			if (err) {
-				free_event(event);
-				return ERR_PTR(err);
-			}
-		}
 	}
 
 	return event;
+
+err_pmu:
+	if (event->destroy)
+		event->destroy(event);
+err_ns:
+	if (event->ns)
+		put_pid_ns(event->ns);
+	kfree(event);
+
+	return ERR_PTR(err);
 }
 
 static int perf_copy_attr(struct perf_event_attr __user *uattr,
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 3/8] perf: Gather event accounting code
  2013-07-23  0:30 [PATCH 0/8] perf: Finer grained full dynticks handling Frederic Weisbecker
  2013-07-23  0:30 ` [PATCH 1/8] perf: Fix branch stack refcount leak on callchain init failure Frederic Weisbecker
  2013-07-23  0:31 ` [PATCH 2/8] perf: Sanitize get_callchain_buffer() Frederic Weisbecker
@ 2013-07-23  0:31 ` Frederic Weisbecker
  2013-07-31  8:56   ` [tip:perf/core] perf: Factor out event accounting code to account_event()/__free_event() tip-bot for Frederic Weisbecker
  2013-08-01 13:13   ` [PATCH 3/8] perf: Gather event accounting code Jiri Olsa
  2013-07-23  0:31 ` [PATCH 4/8] perf: Split per cpu " Frederic Weisbecker
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2013-07-23  0:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

Gather all the event accounting code into a single place,
executed once all the prerequisites are completed. This
simplifies the refcounting.

Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
---
 kernel/events/core.c |   79 +++++++++++++++++++++++++++++--------------------
 1 files changed, 47 insertions(+), 32 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index c5f435f..3bb73af 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3128,6 +3128,21 @@ static void free_event_rcu(struct rcu_head *head)
 static void ring_buffer_put(struct ring_buffer *rb);
 static void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb);
 
+static void __free_event(struct perf_event *event)
+{
+	if (!event->parent) {
+		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+			put_callchain_buffers();
+	}
+
+	if (event->destroy)
+		event->destroy(event);
+
+	if (event->ctx)
+		put_ctx(event->ctx);
+
+	call_rcu(&event->rcu_head, free_event_rcu);
+}
 static void free_event(struct perf_event *event)
 {
 	irq_work_sync(&event->pending);
@@ -3141,8 +3156,6 @@ static void free_event(struct perf_event *event)
 			atomic_dec(&nr_comm_events);
 		if (event->attr.task)
 			atomic_dec(&nr_task_events);
-		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
-			put_callchain_buffers();
 		if (is_cgroup_event(event)) {
 			atomic_dec(&per_cpu(perf_cgroup_events, event->cpu));
 			static_key_slow_dec_deferred(&perf_sched_events);
@@ -3180,13 +3193,8 @@ static void free_event(struct perf_event *event)
 	if (is_cgroup_event(event))
 		perf_detach_cgroup(event);
 
-	if (event->destroy)
-		event->destroy(event);
-
-	if (event->ctx)
-		put_ctx(event->ctx);
 
-	call_rcu(&event->rcu_head, free_event_rcu);
+	__free_event(event);
 }
 
 int perf_event_release_kernel(struct perf_event *event)
@@ -6442,6 +6450,29 @@ unlock:
 	return pmu;
 }
 
+static void account_event(struct perf_event *event)
+{
+	if (event->attach_state & PERF_ATTACH_TASK)
+		static_key_slow_inc(&perf_sched_events.key);
+	if (event->attr.mmap || event->attr.mmap_data)
+		atomic_inc(&nr_mmap_events);
+	if (event->attr.comm)
+		atomic_inc(&nr_comm_events);
+	if (event->attr.task)
+		atomic_inc(&nr_task_events);
+	if (has_branch_stack(event)) {
+		static_key_slow_inc(&perf_sched_events.key);
+		if (!(event->attach_state & PERF_ATTACH_TASK))
+			atomic_inc(&per_cpu(perf_branch_stack_events,
+					    event->cpu));
+	}
+
+	if (is_cgroup_event(event)) {
+		atomic_inc(&per_cpu(perf_cgroup_events, event->cpu));
+		static_key_slow_inc(&perf_sched_events.key);
+	}
+}
+
 /*
  * Allocate and initialize a event structure
  */
@@ -6555,21 +6586,6 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 			if (err)
 				goto err_pmu;
 		}
-
-		if (event->attach_state & PERF_ATTACH_TASK)
-			static_key_slow_inc(&perf_sched_events.key);
-		if (event->attr.mmap || event->attr.mmap_data)
-			atomic_inc(&nr_mmap_events);
-		if (event->attr.comm)
-			atomic_inc(&nr_comm_events);
-		if (event->attr.task)
-			atomic_inc(&nr_task_events);
-		if (has_branch_stack(event)) {
-			static_key_slow_inc(&perf_sched_events.key);
-			if (!(event->attach_state & PERF_ATTACH_TASK))
-				atomic_inc(&per_cpu(perf_branch_stack_events,
-						    event->cpu));
-		}
 	}
 
 	return event;
@@ -6864,17 +6880,14 @@ SYSCALL_DEFINE5(perf_event_open,
 
 	if (flags & PERF_FLAG_PID_CGROUP) {
 		err = perf_cgroup_connect(pid, event, &attr, group_leader);
-		if (err)
-			goto err_alloc;
-		/*
-		 * one more event:
-		 * - that has cgroup constraint on event->cpu
-		 * - that may need work on context switch
-		 */
-		atomic_inc(&per_cpu(perf_cgroup_events, event->cpu));
-		static_key_slow_inc(&perf_sched_events.key);
+		if (err) {
+			__free_event(event);
+			goto err_task;
+		}
 	}
 
+	account_event(event);
+
 	/*
 	 * Special case software events and allow them to be part of
 	 * any hardware group.
@@ -7070,6 +7083,8 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 		goto err;
 	}
 
+	account_event(event);
+
 	ctx = find_get_context(event->pmu, task, cpu);
 	if (IS_ERR(ctx)) {
 		err = PTR_ERR(ctx);
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 4/8] perf: Split per cpu event accounting code
  2013-07-23  0:30 [PATCH 0/8] perf: Finer grained full dynticks handling Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2013-07-23  0:31 ` [PATCH 3/8] perf: Gather event accounting code Frederic Weisbecker
@ 2013-07-23  0:31 ` Frederic Weisbecker
  2013-07-31  8:56   ` [tip:perf/core] perf: Split the per-cpu accounting part of the " tip-bot for Frederic Weisbecker
  2013-07-23  0:31 ` [PATCH 5/8] perf: Migrate per cpu event accounting Frederic Weisbecker
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2013-07-23  0:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

Split out the per-cpu accounting part of the event
accounting code.

This way we can use the per-cpu handling separately.
This is going to be used to fix the event migration
code accounting.

Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
---
 kernel/events/core.c |   87 +++++++++++++++++++++++++++++++------------------
 1 files changed, 55 insertions(+), 32 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3bb73af..fd820a4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3128,6 +3128,40 @@ static void free_event_rcu(struct rcu_head *head)
 static void ring_buffer_put(struct ring_buffer *rb);
 static void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb);
 
+static void unaccount_event_cpu(struct perf_event *event, int cpu)
+{
+	if (event->parent)
+		return;
+
+	if (has_branch_stack(event)) {
+		if (!(event->attach_state & PERF_ATTACH_TASK))
+			atomic_dec(&per_cpu(perf_branch_stack_events, cpu));
+	}
+	if (is_cgroup_event(event))
+		atomic_dec(&per_cpu(perf_cgroup_events, cpu));
+}
+
+static void unaccount_event(struct perf_event *event)
+{
+	if (event->parent)
+		return;
+
+	if (event->attach_state & PERF_ATTACH_TASK)
+		static_key_slow_dec_deferred(&perf_sched_events);
+	if (event->attr.mmap || event->attr.mmap_data)
+		atomic_dec(&nr_mmap_events);
+	if (event->attr.comm)
+		atomic_dec(&nr_comm_events);
+	if (event->attr.task)
+		atomic_dec(&nr_task_events);
+	if (is_cgroup_event(event))
+		static_key_slow_dec_deferred(&perf_sched_events);
+	if (has_branch_stack(event))
+		static_key_slow_dec_deferred(&perf_sched_events);
+
+	unaccount_event_cpu(event, event->cpu);
+}
+
 static void __free_event(struct perf_event *event)
 {
 	if (!event->parent) {
@@ -3147,29 +3181,7 @@ static void free_event(struct perf_event *event)
 {
 	irq_work_sync(&event->pending);
 
-	if (!event->parent) {
-		if (event->attach_state & PERF_ATTACH_TASK)
-			static_key_slow_dec_deferred(&perf_sched_events);
-		if (event->attr.mmap || event->attr.mmap_data)
-			atomic_dec(&nr_mmap_events);
-		if (event->attr.comm)
-			atomic_dec(&nr_comm_events);
-		if (event->attr.task)
-			atomic_dec(&nr_task_events);
-		if (is_cgroup_event(event)) {
-			atomic_dec(&per_cpu(perf_cgroup_events, event->cpu));
-			static_key_slow_dec_deferred(&perf_sched_events);
-		}
-
-		if (has_branch_stack(event)) {
-			static_key_slow_dec_deferred(&perf_sched_events);
-			/* is system-wide event */
-			if (!(event->attach_state & PERF_ATTACH_TASK)) {
-				atomic_dec(&per_cpu(perf_branch_stack_events,
-						    event->cpu));
-			}
-		}
-	}
+	unaccount_event(event);
 
 	if (event->rb) {
 		struct ring_buffer *rb;
@@ -6450,8 +6462,24 @@ unlock:
 	return pmu;
 }
 
+static void account_event_cpu(struct perf_event *event, int cpu)
+{
+	if (event->parent)
+		return;
+
+	if (has_branch_stack(event)) {
+		if (!(event->attach_state & PERF_ATTACH_TASK))
+			atomic_inc(&per_cpu(perf_branch_stack_events, cpu));
+	}
+	if (is_cgroup_event(event))
+		atomic_inc(&per_cpu(perf_cgroup_events, cpu));
+}
+
 static void account_event(struct perf_event *event)
 {
+	if (event->parent)
+		return;
+
 	if (event->attach_state & PERF_ATTACH_TASK)
 		static_key_slow_inc(&perf_sched_events.key);
 	if (event->attr.mmap || event->attr.mmap_data)
@@ -6460,17 +6488,12 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_comm_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
-	if (has_branch_stack(event)) {
+	if (has_branch_stack(event))
 		static_key_slow_inc(&perf_sched_events.key);
-		if (!(event->attach_state & PERF_ATTACH_TASK))
-			atomic_inc(&per_cpu(perf_branch_stack_events,
-					    event->cpu));
-	}
-
-	if (is_cgroup_event(event)) {
-		atomic_inc(&per_cpu(perf_cgroup_events, event->cpu));
+	if (is_cgroup_event(event))
 		static_key_slow_inc(&perf_sched_events.key);
-	}
+
+	account_event_cpu(event, event->cpu);
 }
 
 /*
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 5/8] perf: Migrate per cpu event accounting
  2013-07-23  0:30 [PATCH 0/8] perf: Finer grained full dynticks handling Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2013-07-23  0:31 ` [PATCH 4/8] perf: Split per cpu " Frederic Weisbecker
@ 2013-07-23  0:31 ` Frederic Weisbecker
  2013-07-31  8:56   ` [tip:perf/core] " tip-bot for Frederic Weisbecker
  2013-07-23  0:31 ` [PATCH 6/8] perf: Account freq events per cpu Frederic Weisbecker
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2013-07-23  0:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

When an event is migrated, move the event per-cpu
accounting accordingly so that branch stack and cgroup
events work correctly on the new CPU.

Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
---
 kernel/events/core.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index fd820a4..b40c3db 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7144,6 +7144,7 @@ void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu)
 	list_for_each_entry_safe(event, tmp, &src_ctx->event_list,
 				 event_entry) {
 		perf_remove_from_context(event);
+		unaccount_event_cpu(event, src_cpu);
 		put_ctx(src_ctx);
 		list_add(&event->event_entry, &events);
 	}
@@ -7156,6 +7157,7 @@ void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu)
 		list_del(&event->event_entry);
 		if (event->state >= PERF_EVENT_STATE_OFF)
 			event->state = PERF_EVENT_STATE_INACTIVE;
+		account_event_cpu(event, dst_cpu);
 		perf_install_in_context(dst_ctx, event, dst_cpu);
 		get_ctx(dst_ctx);
 	}
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 6/8] perf: Account freq events per cpu
  2013-07-23  0:30 [PATCH 0/8] perf: Finer grained full dynticks handling Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2013-07-23  0:31 ` [PATCH 5/8] perf: Migrate per cpu event accounting Frederic Weisbecker
@ 2013-07-23  0:31 ` Frederic Weisbecker
  2013-07-31  8:56   ` [tip:perf/core] " tip-bot for Frederic Weisbecker
  2013-08-01 12:46   ` [PATCH 6/8] " Jiri Olsa
  2013-07-23  0:31 ` [PATCH 7/8] perf: Finer grained full dynticks kick Frederic Weisbecker
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2013-07-23  0:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

This is going to be used by the full dynticks subsystem
as finer-grained information to know when to keep the tick
and when to stop it.

Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
---
 kernel/events/core.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b40c3db..f9bd39b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -141,6 +141,7 @@ enum event_type_t {
 struct static_key_deferred perf_sched_events __read_mostly;
 static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
 static DEFINE_PER_CPU(atomic_t, perf_branch_stack_events);
+static DEFINE_PER_CPU(atomic_t, perf_freq_events);
 
 static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
@@ -3139,6 +3140,9 @@ static void unaccount_event_cpu(struct perf_event *event, int cpu)
 	}
 	if (is_cgroup_event(event))
 		atomic_dec(&per_cpu(perf_cgroup_events, cpu));
+
+	if (event->attr.freq)
+		atomic_dec(&per_cpu(perf_freq_events, cpu));
 }
 
 static void unaccount_event(struct perf_event *event)
@@ -6473,6 +6477,9 @@ static void account_event_cpu(struct perf_event *event, int cpu)
 	}
 	if (is_cgroup_event(event))
 		atomic_inc(&per_cpu(perf_cgroup_events, cpu));
+
+	if (event->attr.freq)
+		atomic_inc(&per_cpu(perf_freq_events, cpu));
 }
 
 static void account_event(struct perf_event *event)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 7/8] perf: Finer grained full dynticks kick
  2013-07-23  0:30 [PATCH 0/8] perf: Finer grained full dynticks handling Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2013-07-23  0:31 ` [PATCH 6/8] perf: Account freq events per cpu Frederic Weisbecker
@ 2013-07-23  0:31 ` Frederic Weisbecker
  2013-07-31  8:56   ` [tip:perf/core] perf: Implement finer " tip-bot for Frederic Weisbecker
  2013-07-23  0:31 ` [PATCH 8/8] watchdog: Remove hack to make full dynticks working Frederic Weisbecker
  2013-07-25  9:59 ` [PATCH 0/8] perf: Finer grained full dynticks handling Peter Zijlstra
  8 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2013-07-23  0:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

Currently the full dynticks subsystem keeps the
tick alive as long as there are perf events running.

This prevents the tick from being stopped as long as features
such as the lockup detector are running. As a temporary fix,
the lockup detector is disabled by default when full dynticks
is built, but this is not a viable long term solution.

To fix this, only keep the tick alive when an event configured
with a frequency rather than a period is running on the CPU,
or when an event throttles on the CPU.

These are the only purposes of the perf tick, especially now that
the rotation of flexible events is handled by a separate hrtimer.
The tick can be shut down the rest of the time.

Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
---
 kernel/events/core.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f9bd39b..61d70b7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -870,12 +870,8 @@ static void perf_pmu_rotate_start(struct pmu *pmu)
 
 	WARN_ON(!irqs_disabled());
 
-	if (list_empty(&cpuctx->rotation_list)) {
-		int was_empty = list_empty(head);
+	if (list_empty(&cpuctx->rotation_list))
 		list_add(&cpuctx->rotation_list, head);
-		if (was_empty)
-			tick_nohz_full_kick();
-	}
 }
 
 static void get_ctx(struct perf_event_context *ctx)
@@ -1875,6 +1871,9 @@ static int  __perf_install_in_context(void *info)
 	perf_pmu_enable(cpuctx->ctx.pmu);
 	perf_ctx_unlock(cpuctx, task_ctx);
 
+	if (atomic_read(&__get_cpu_var(perf_freq_events)))
+		tick_nohz_full_kick();
+
 	return 0;
 }
 
@@ -2812,10 +2811,11 @@ done:
 #ifdef CONFIG_NO_HZ_FULL
 bool perf_event_can_stop_tick(void)
 {
-	if (list_empty(&__get_cpu_var(rotation_list)))
-		return true;
-	else
+	if (atomic_read(&__get_cpu_var(perf_freq_events)) ||
+	    __this_cpu_read(perf_throttled_count))
 		return false;
+	else
+		return true;
 }
 #endif
 
@@ -5201,6 +5201,7 @@ static int __perf_event_overflow(struct perf_event *event,
 			__this_cpu_inc(perf_throttled_count);
 			hwc->interrupts = MAX_INTERRUPTS;
 			perf_log_throttle(event, 0);
+			tick_nohz_full_kick();
 			ret = 1;
 		}
 	}
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 8/8] watchdog: Remove hack to make full dynticks working
  2013-07-23  0:30 [PATCH 0/8] perf: Finer grained full dynticks handling Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2013-07-23  0:31 ` [PATCH 7/8] perf: Finer grained full dynticks kick Frederic Weisbecker
@ 2013-07-23  0:31 ` Frederic Weisbecker
  2013-07-23 12:33   ` Don Zickus
  2013-07-31  8:57   ` [tip:perf/core] watchdog: Make it work under full dynticks tip-bot for Frederic Weisbecker
  2013-07-25  9:59 ` [PATCH 0/8] perf: Finer grained full dynticks handling Peter Zijlstra
  8 siblings, 2 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2013-07-23  0:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Don Zickus,
	Srivatsa S. Bhat, Anish Singh

A perf event can be used without forcing the tick to
stay alive if it uses a sample period rather than a
frequency and if it doesn't throttle (raise a storm of events).

Since the lockup detector neither uses a perf event frequency
nor should ever throttle due to its high period, it can now
run concurrently with the full dynticks feature.
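
For reference, a simplified sketch of how the watchdog configures its
per-CPU event (based on kernel/watchdog.c of this era; the helper and
callback names should be treated as approximations, not verbatim code):
a plain sample period, no freq bit, so nothing here needs the tick anymore:

/* Sketch only, not a verbatim copy of kernel/watchdog.c. */
static struct perf_event_attr wd_hw_attr = {
	.type		= PERF_TYPE_HARDWARE,
	.config		= PERF_COUNT_HW_CPU_CYCLES,
	.size		= sizeof(struct perf_event_attr),
	.pinned		= 1,
	.disabled	= 1,
	/* note: no .freq / .sample_freq, so this is not a freq event */
};

static void watchdog_setup(int cpu)
{
	struct perf_event *event;

	/* Several seconds worth of cycles: far too long to ever throttle,
	 * and being a period (not a frequency) perf does not need the tick
	 * to re-adjust it. */
	wd_hw_attr.sample_period = hw_nmi_get_sample_period(watchdog_thresh);

	event = perf_event_create_kernel_counter(&wd_hw_attr, cpu, NULL,
						 watchdog_overflow_callback,
						 NULL);
	/* error handling elided */
	(void)event;
}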

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Anish Singh <anish198519851985@gmail.com>
---
 kernel/watchdog.c |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 1241d8c..51c4f34 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -553,14 +553,6 @@ void __init lockup_detector_init(void)
 {
 	set_sample_period();
 
-#ifdef CONFIG_NO_HZ_FULL
-	if (watchdog_user_enabled) {
-		watchdog_user_enabled = 0;
-		pr_warning("Disabled lockup detectors by default for full dynticks\n");
-		pr_warning("You can reactivate it with 'sysctl -w kernel.watchdog=1'\n");
-	}
-#endif
-
 	if (watchdog_user_enabled)
 		watchdog_enable_all_cpus();
 }
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 8/8] watchdog: Remove hack to make full dynticks working
  2013-07-23  0:31 ` [PATCH 8/8] watchdog: Remove hack to make full dynticks working Frederic Weisbecker
@ 2013-07-23 12:33   ` Don Zickus
  2013-07-23 12:44     ` Frederic Weisbecker
  2013-07-23 12:45     ` Peter Zijlstra
  2013-07-31  8:57   ` [tip:perf/core] watchdog: Make it work under full dynticks tip-bot for Frederic Weisbecker
  1 sibling, 2 replies; 49+ messages in thread
From: Don Zickus @ 2013-07-23 12:33 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Srivatsa S. Bhat,
	Anish Singh

On Tue, Jul 23, 2013 at 02:31:06AM +0200, Frederic Weisbecker wrote:
> A perf event can be used without forcing the tick to
> stay alive if it uses a sample period rather than a
> frequency and if it doesn't throttle (raise a storm of events).
> 
> Since the lockup detector neither uses a perf event frequency
> nor should ever throttle due to its high period, it can now
> run concurrently with the full dynticks feature.

Thanks.  Dumb question, I keep wondering if the lockup detector would be
better or worse off if it used the perf event frequency as opposed to
using a sample period?  The idea is it could follow the varying cpu
frequencies better (and probably simplify some of the code too).

Acked-by: Don Zickus <dzickus@redhat.com>


> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Stephane Eranian <eranian@google.com>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Cc: Anish Singh <anish198519851985@gmail.com>
> ---
>  kernel/watchdog.c |    8 --------
>  1 files changed, 0 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 1241d8c..51c4f34 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -553,14 +553,6 @@ void __init lockup_detector_init(void)
>  {
>  	set_sample_period();
>  
> -#ifdef CONFIG_NO_HZ_FULL
> -	if (watchdog_user_enabled) {
> -		watchdog_user_enabled = 0;
> -		pr_warning("Disabled lockup detectors by default for full dynticks\n");
> -		pr_warning("You can reactivate it with 'sysctl -w kernel.watchdog=1'\n");
> -	}
> -#endif
> -
>  	if (watchdog_user_enabled)
>  		watchdog_enable_all_cpus();
>  }
> -- 
> 1.7.5.4
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 8/8] watchdog: Remove hack to make full dynticks working
  2013-07-23 12:33   ` Don Zickus
@ 2013-07-23 12:44     ` Frederic Weisbecker
  2013-07-23 12:45     ` Peter Zijlstra
  1 sibling, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2013-07-23 12:44 UTC (permalink / raw)
  To: Don Zickus
  Cc: Peter Zijlstra, LKML, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Srivatsa S. Bhat,
	Anish Singh

On Tue, Jul 23, 2013 at 08:33:31AM -0400, Don Zickus wrote:
> On Tue, Jul 23, 2013 at 02:31:06AM +0200, Frederic Weisbecker wrote:
> > A perf event can be used without forcing the tick to
> > stay alive if it uses a sample period rather than a
> > frequency and if it doesn't throttle (raise a storm of events).
> > 
> > Since the lockup detector neither uses a perf event frequency
> > nor should ever throttle due to its high period, it can now
> > run concurrently with the full dynticks feature.
> 
> Thanks.  Dumb question, I keep wondering if the lockup detector would be
> better or worse off if it used the perf event frequency as opposed to
> using a sample period?  The idea is it could follow the varying cpu
> frequencies better (and probably simplify some of the code too).
> 
> Acked-by: Don Zickus <dzickus@redhat.com>

Thanks!

IIRC we tried that and I believe the issue was that perf did not
support frequency below 1. So probably the limitation was that it had to
fire at least once per sec.

Also if you do that, we fall into the full dynticks problem again.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 8/8] watchdog: Remove hack to make full dynticks working
  2013-07-23 12:33   ` Don Zickus
  2013-07-23 12:44     ` Frederic Weisbecker
@ 2013-07-23 12:45     ` Peter Zijlstra
  1 sibling, 0 replies; 49+ messages in thread
From: Peter Zijlstra @ 2013-07-23 12:45 UTC (permalink / raw)
  To: Don Zickus
  Cc: Frederic Weisbecker, LKML, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Srivatsa S. Bhat,
	Anish Singh

On Tue, Jul 23, 2013 at 08:33:31AM -0400, Don Zickus wrote:
> On Tue, Jul 23, 2013 at 02:31:06AM +0200, Frederic Weisbecker wrote:
> > A perf event can be used without forcing the tick to
> > stay alive if it uses a sample period rather than a
> > frequency and if it doesn't throttle (raise a storm of events).
> > 
> > Since the lockup detector neither uses a perf event frequency
> > nor should ever throttle due to its high period, it can now
> > run concurrently with the full dynticks feature.
> 
> Thanks.  Dumb question, I keep wondering if the lockup detector would be
> better or worse off if it used the perf event frequency as opposed to
> using a sample period?  The idea is it could follow the varying cpu
> frequencies better (and probably simplify some of the code too).

Right, trouble is that someone didn't consider fractional frequencies
when writing the interface :/ Lowest we can go is 1 Hz while we'd want
something like 0.1 Hz or smaller.

Also, like the above says, that would interfere with the nohz efforts as
perf needs the tick to re-compute those frequency thingies.
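
To make the limitation concrete (hypothetical numbers): sample_freq shares
a u64 union with sample_period and is expressed in whole Hz, so the slowest
a freq event can fire is once per second, while a watchdog-style event only
wants to fire once every several seconds:

#include <linux/perf_event.h>
#include <string.h>

/* Illustration only: real ABI fields, made-up values. */
static void setup_attrs(void)
{
	struct perf_event_attr freq_attr, period_attr;

	memset(&freq_attr, 0, sizeof(freq_attr));
	freq_attr.freq = 1;
	freq_attr.sample_freq = 1;	/* 1 Hz is the floor: 0.1 Hz (one
					 * sample per 10s) cannot be encoded
					 * in an integer frequency */

	memset(&period_attr, 0, sizeof(period_attr));
	/* A fixed period has no such floor: pick enough cycles that the
	 * event fires only once every ~10s at a hypothetical 2.4 GHz. */
	period_attr.sample_period = 10ULL * 2400000000ULL;
}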



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/8] perf: Finer grained full dynticks handling
  2013-07-23  0:30 [PATCH 0/8] perf: Finer grained full dynticks handling Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2013-07-23  0:31 ` [PATCH 8/8] watchdog: Remove hack to make full dynticks working Frederic Weisbecker
@ 2013-07-25  9:59 ` Peter Zijlstra
  2013-07-25 14:02   ` Frederic Weisbecker
  8 siblings, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2013-07-25  9:59 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Don Zickus,
	Srivatsa S. Bhat, Anish Singh

On Tue, Jul 23, 2013 at 02:30:58AM +0200, Frederic Weisbecker wrote:
> Hi,
> 
> This patchset is inspired by Peterz's patch in http://marc.info/?l=linux-kernel&m=137155182121860
> and makes perf retain the timer tick less often, i.e. only when there are freq events
> or when an event throttles.
> 
> The main purpose is to make the lockup detector work with full dynticks, as I suspect
> that distros want to enable both...
> 
> 
> You can fetch from:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	perf/nohz
> 
> Thanks,
> 	Frederic

Awesome, looks good, thanks Frederic!

Acked-by: Peter Zijlstra <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/8] perf: Finer grained full dynticks handling
  2013-07-25  9:59 ` [PATCH 0/8] perf: Finer grained full dynticks handling Peter Zijlstra
@ 2013-07-25 14:02   ` Frederic Weisbecker
  2013-07-25 16:29     ` Peter Zijlstra
  0 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2013-07-25 14:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Don Zickus,
	Srivatsa S. Bhat, Anish Singh

On Thu, Jul 25, 2013 at 11:59:38AM +0200, Peter Zijlstra wrote:
> On Tue, Jul 23, 2013 at 02:30:58AM +0200, Frederic Weisbecker wrote:
> > Hi,
> > 
> > This patchset is inspired by Peterz's patch in http://marc.info/?l=linux-kernel&m=137155182121860
> > and makes perf retain the timer tick less often, i.e. only when there are freq events
> > or when an event throttles.
> > 
> > The main purpose is to make the lockup detector work with full dynticks, as I suspect
> > that distros want to enable both...
> > 
> > 
> > You can fetch from:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > 	perf/nohz
> > 
> > Thanks,
> > 	Frederic
> 
> Awesome, looks good, thanks Frederic!
> 
> Acked-by: Peter Zijlstra <peterz@infradead.org>

Great! What do you prefer? Do you want to take these or should I do a pull request
to Ingo directly?

Thanks!

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/8] perf: Finer grained full dynticks handling
  2013-07-25 14:02   ` Frederic Weisbecker
@ 2013-07-25 16:29     ` Peter Zijlstra
  2013-07-25 20:07       ` Frederic Weisbecker
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2013-07-25 16:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Don Zickus,
	Srivatsa S. Bhat, Anish Singh

On Thu, Jul 25, 2013 at 04:02:17PM +0200, Frederic Weisbecker wrote:
> > Awesome, looks good, thanks Frederic!
> > 
> > Acked-by: Peter Zijlstra <peterz@infradead.org>
> 
> Great! What do you prefer? Do you want to take these or should I do a pull request
> to Ingo directly?

I had intended Ingo to pick these up, but then I remembered he's playing
road warrior atm. So I stuck them in my queue so as not to lose them.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 0/8] perf: Finer grained full dynticks handling
  2013-07-25 16:29     ` Peter Zijlstra
@ 2013-07-25 20:07       ` Frederic Weisbecker
  0 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2013-07-25 20:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Jiri Olsa, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Don Zickus,
	Srivatsa S. Bhat, Anish Singh

On Thu, Jul 25, 2013 at 06:29:39PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 25, 2013 at 04:02:17PM +0200, Frederic Weisbecker wrote:
> > > Awesome, looks good, thanks Frederic!
> > > 
> > > Acked-by: Peter Zijlstra <peterz@infradead.org>
> > 
> > Great! What do you prefer? Do you want to take these or should I do a pull request
> > to Ingo directly?
> 
> I had intended Ingo to pick these up, but then I remembered he's playing
> road warrior atm. So I stuck them in my queue so as not to lose them.

Thanks!

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [tip:perf/core] perf: Fix branch stack refcount leak on callchain init failure
  2013-07-23  0:30 ` [PATCH 1/8] perf: Fix branch stack refcount leak on callchain init failure Frederic Weisbecker
@ 2013-07-31  8:55   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2013-07-31  8:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, eranian, acme, hpa, mingo, peterz, namhyung, jolsa,
	fweisbec, tglx

Commit-ID:  6050cb0b0b366092d1383bc23d7b16cd26db00f0
Gitweb:     http://git.kernel.org/tip/6050cb0b0b366092d1383bc23d7b16cd26db00f0
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Tue, 23 Jul 2013 02:30:59 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 30 Jul 2013 22:22:58 +0200

perf: Fix branch stack refcount leak on callchain init failure

On callchain buffer allocation failure, free_event() is
called and all the accounting performed in perf_event_alloc()
for that event is cancelled.

But if the event uses branch stack sampling, free_event() also
decrements the branch stack sampling events refcounts.

This is a bug because that accounting is only performed after the
callchain buffer allocation, so it never happened on this failure
path. As a result, the branch stack sampling events refcount can
become negative.

To fix this, move the branch stack event accounting before the
callchain buffer allocation.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1274114..f35aa7e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6567,6 +6567,12 @@ done:
 			atomic_inc(&nr_comm_events);
 		if (event->attr.task)
 			atomic_inc(&nr_task_events);
+		if (has_branch_stack(event)) {
+			static_key_slow_inc(&perf_sched_events.key);
+			if (!(event->attach_state & PERF_ATTACH_TASK))
+				atomic_inc(&per_cpu(perf_branch_stack_events,
+						    event->cpu));
+		}
 		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
 			err = get_callchain_buffers();
 			if (err) {
@@ -6574,12 +6580,6 @@ done:
 				return ERR_PTR(err);
 			}
 		}
-		if (has_branch_stack(event)) {
-			static_key_slow_inc(&perf_sched_events.key);
-			if (!(event->attach_state & PERF_ATTACH_TASK))
-				atomic_inc(&per_cpu(perf_branch_stack_events,
-						    event->cpu));
-		}
 	}
 
 	return event;

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:perf/core] perf: Sanitize get_callchain_buffer()
  2013-07-23  0:31 ` [PATCH 2/8] perf: Sanitize get_callchain_buffer() Frederic Weisbecker
@ 2013-07-31  8:56   ` tip-bot for Frederic Weisbecker
  2013-08-01 13:01   ` [PATCH 2/8] " Jiri Olsa
  2013-08-01 13:29   ` Jiri Olsa
  2 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2013-07-31  8:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, eranian, acme, hpa, mingo, peterz, namhyung, jolsa,
	fweisbec, tglx

Commit-ID:  90983b16078ab0fdc58f0dab3e8e3da79c9579a2
Gitweb:     http://git.kernel.org/tip/90983b16078ab0fdc58f0dab3e8e3da79c9579a2
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Tue, 23 Jul 2013 02:31:00 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 30 Jul 2013 22:29:12 +0200

perf: Sanitize get_callchain_buffer()

In case of allocation failure, get_callchain_buffers() keeps the
refcount incremented for the current event.

As a result, when get_callchain_buffers() returns an error,
we must clean up what it did by cancelling its last refcount
with a call to put_callchain_buffers().

This is a hack in order to be able to call free_event()
after that failure.

The original purpose of that was to simplify the failure
path. But this error handling is actually counter-intuitive,
ugly and not very easy to follow, because one expects the
resources used to perform a service to be cleaned up by the
callee in case of failure, not by the caller.

So let's clean this up by cancelling the refcount from
get_callchain_buffers() itself in case of failure, and free
the event accordingly in perf_event_alloc().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/callchain.c |  2 ++
 kernel/events/core.c      | 41 +++++++++++++++++++++--------------------
 2 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index c772061..76a8bc5 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -117,6 +117,8 @@ int get_callchain_buffers(void)
 	err = alloc_callchain_buffers();
 exit:
 	mutex_unlock(&callchain_mutex);
+	if (err)
+		atomic_dec(&nr_callchain_events);
 
 	return err;
 }
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f35aa7e..3b99862 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6457,7 +6457,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	struct pmu *pmu;
 	struct perf_event *event;
 	struct hw_perf_event *hwc;
-	long err;
+	long err = -EINVAL;
 
 	if ((unsigned)cpu >= nr_cpu_ids) {
 		if (!task || cpu != -1)
@@ -6540,25 +6540,23 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	 * we currently do not support PERF_FORMAT_GROUP on inherited events
 	 */
 	if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
-		goto done;
+		goto err_ns;
 
 	pmu = perf_init_event(event);
-
-done:
-	err = 0;
 	if (!pmu)
-		err = -EINVAL;
-	else if (IS_ERR(pmu))
+		goto err_ns;
+	else if (IS_ERR(pmu)) {
 		err = PTR_ERR(pmu);
-
-	if (err) {
-		if (event->ns)
-			put_pid_ns(event->ns);
-		kfree(event);
-		return ERR_PTR(err);
+		goto err_ns;
 	}
 
 	if (!event->parent) {
+		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
+			err = get_callchain_buffers();
+			if (err)
+				goto err_pmu;
+		}
+
 		if (event->attach_state & PERF_ATTACH_TASK)
 			static_key_slow_inc(&perf_sched_events.key);
 		if (event->attr.mmap || event->attr.mmap_data)
@@ -6573,16 +6571,19 @@ done:
 				atomic_inc(&per_cpu(perf_branch_stack_events,
 						    event->cpu));
 		}
-		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
-			err = get_callchain_buffers();
-			if (err) {
-				free_event(event);
-				return ERR_PTR(err);
-			}
-		}
 	}
 
 	return event;
+
+err_pmu:
+	if (event->destroy)
+		event->destroy(event);
+err_ns:
+	if (event->ns)
+		put_pid_ns(event->ns);
+	kfree(event);
+
+	return ERR_PTR(err);
 }
 
 static int perf_copy_attr(struct perf_event_attr __user *uattr,

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:perf/core] perf: Factor out event accounting code to account_event()/__free_event()
  2013-07-23  0:31 ` [PATCH 3/8] perf: Gather event accounting code Frederic Weisbecker
@ 2013-07-31  8:56   ` tip-bot for Frederic Weisbecker
  2013-08-01 13:13   ` [PATCH 3/8] perf: Gather event accounting code Jiri Olsa
  1 sibling, 0 replies; 49+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2013-07-31  8:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, eranian, acme, hpa, mingo, peterz, namhyung, jolsa,
	fweisbec, tglx

Commit-ID:  766d6c076928191d75ad5b0d0f58f52b1e7682d8
Gitweb:     http://git.kernel.org/tip/766d6c076928191d75ad5b0d0f58f52b1e7682d8
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Tue, 23 Jul 2013 02:31:01 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 30 Jul 2013 22:29:12 +0200

perf: Factor out event accounting code to account_event()/__free_event()

Gather all the event accounting code into a single place,
executed once all the prerequisites are completed. This
simplifies the refcounting.

Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 79 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 47 insertions(+), 32 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3b99862..158fd57 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3128,6 +3128,21 @@ static void free_event_rcu(struct rcu_head *head)
 static void ring_buffer_put(struct ring_buffer *rb);
 static void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb);
 
+static void __free_event(struct perf_event *event)
+{
+	if (!event->parent) {
+		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+			put_callchain_buffers();
+	}
+
+	if (event->destroy)
+		event->destroy(event);
+
+	if (event->ctx)
+		put_ctx(event->ctx);
+
+	call_rcu(&event->rcu_head, free_event_rcu);
+}
 static void free_event(struct perf_event *event)
 {
 	irq_work_sync(&event->pending);
@@ -3141,8 +3156,6 @@ static void free_event(struct perf_event *event)
 			atomic_dec(&nr_comm_events);
 		if (event->attr.task)
 			atomic_dec(&nr_task_events);
-		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
-			put_callchain_buffers();
 		if (is_cgroup_event(event)) {
 			atomic_dec(&per_cpu(perf_cgroup_events, event->cpu));
 			static_key_slow_dec_deferred(&perf_sched_events);
@@ -3180,13 +3193,8 @@ static void free_event(struct perf_event *event)
 	if (is_cgroup_event(event))
 		perf_detach_cgroup(event);
 
-	if (event->destroy)
-		event->destroy(event);
-
-	if (event->ctx)
-		put_ctx(event->ctx);
 
-	call_rcu(&event->rcu_head, free_event_rcu);
+	__free_event(event);
 }
 
 int perf_event_release_kernel(struct perf_event *event)
@@ -6443,6 +6451,29 @@ unlock:
 	return pmu;
 }
 
+static void account_event(struct perf_event *event)
+{
+	if (event->attach_state & PERF_ATTACH_TASK)
+		static_key_slow_inc(&perf_sched_events.key);
+	if (event->attr.mmap || event->attr.mmap_data)
+		atomic_inc(&nr_mmap_events);
+	if (event->attr.comm)
+		atomic_inc(&nr_comm_events);
+	if (event->attr.task)
+		atomic_inc(&nr_task_events);
+	if (has_branch_stack(event)) {
+		static_key_slow_inc(&perf_sched_events.key);
+		if (!(event->attach_state & PERF_ATTACH_TASK))
+			atomic_inc(&per_cpu(perf_branch_stack_events,
+					    event->cpu));
+	}
+
+	if (is_cgroup_event(event)) {
+		atomic_inc(&per_cpu(perf_cgroup_events, event->cpu));
+		static_key_slow_inc(&perf_sched_events.key);
+	}
+}
+
 /*
  * Allocate and initialize a event structure
  */
@@ -6556,21 +6587,6 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 			if (err)
 				goto err_pmu;
 		}
-
-		if (event->attach_state & PERF_ATTACH_TASK)
-			static_key_slow_inc(&perf_sched_events.key);
-		if (event->attr.mmap || event->attr.mmap_data)
-			atomic_inc(&nr_mmap_events);
-		if (event->attr.comm)
-			atomic_inc(&nr_comm_events);
-		if (event->attr.task)
-			atomic_inc(&nr_task_events);
-		if (has_branch_stack(event)) {
-			static_key_slow_inc(&perf_sched_events.key);
-			if (!(event->attach_state & PERF_ATTACH_TASK))
-				atomic_inc(&per_cpu(perf_branch_stack_events,
-						    event->cpu));
-		}
 	}
 
 	return event;
@@ -6865,17 +6881,14 @@ SYSCALL_DEFINE5(perf_event_open,
 
 	if (flags & PERF_FLAG_PID_CGROUP) {
 		err = perf_cgroup_connect(pid, event, &attr, group_leader);
-		if (err)
-			goto err_alloc;
-		/*
-		 * one more event:
-		 * - that has cgroup constraint on event->cpu
-		 * - that may need work on context switch
-		 */
-		atomic_inc(&per_cpu(perf_cgroup_events, event->cpu));
-		static_key_slow_inc(&perf_sched_events.key);
+		if (err) {
+			__free_event(event);
+			goto err_task;
+		}
 	}
 
+	account_event(event);
+
 	/*
 	 * Special case software events and allow them to be part of
 	 * any hardware group.
@@ -7071,6 +7084,8 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 		goto err;
 	}
 
+	account_event(event);
+
 	ctx = find_get_context(event->pmu, task, cpu);
 	if (IS_ERR(ctx)) {
 		err = PTR_ERR(ctx);

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:perf/core] perf: Split the per-cpu accounting part of the event accounting code
  2013-07-23  0:31 ` [PATCH 4/8] perf: Split per cpu " Frederic Weisbecker
@ 2013-07-31  8:56   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2013-07-31  8:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, eranian, acme, hpa, mingo, peterz, namhyung, jolsa,
	fweisbec, tglx

Commit-ID:  4beb31f3657348a8b702dd014d01c520e522012f
Gitweb:     http://git.kernel.org/tip/4beb31f3657348a8b702dd014d01c520e522012f
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Tue, 23 Jul 2013 02:31:02 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 30 Jul 2013 22:29:13 +0200

perf: Split the per-cpu accounting part of the event accounting code

This way we can use the per-cpu handling separately.
This is going to be used to fix the event migration
code accounting.

Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-5-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 87 +++++++++++++++++++++++++++++++++-------------------
 1 file changed, 55 insertions(+), 32 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 158fd57..3a4b73a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3128,6 +3128,40 @@ static void free_event_rcu(struct rcu_head *head)
 static void ring_buffer_put(struct ring_buffer *rb);
 static void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb);
 
+static void unaccount_event_cpu(struct perf_event *event, int cpu)
+{
+	if (event->parent)
+		return;
+
+	if (has_branch_stack(event)) {
+		if (!(event->attach_state & PERF_ATTACH_TASK))
+			atomic_dec(&per_cpu(perf_branch_stack_events, cpu));
+	}
+	if (is_cgroup_event(event))
+		atomic_dec(&per_cpu(perf_cgroup_events, cpu));
+}
+
+static void unaccount_event(struct perf_event *event)
+{
+	if (event->parent)
+		return;
+
+	if (event->attach_state & PERF_ATTACH_TASK)
+		static_key_slow_dec_deferred(&perf_sched_events);
+	if (event->attr.mmap || event->attr.mmap_data)
+		atomic_dec(&nr_mmap_events);
+	if (event->attr.comm)
+		atomic_dec(&nr_comm_events);
+	if (event->attr.task)
+		atomic_dec(&nr_task_events);
+	if (is_cgroup_event(event))
+		static_key_slow_dec_deferred(&perf_sched_events);
+	if (has_branch_stack(event))
+		static_key_slow_dec_deferred(&perf_sched_events);
+
+	unaccount_event_cpu(event, event->cpu);
+}
+
 static void __free_event(struct perf_event *event)
 {
 	if (!event->parent) {
@@ -3147,29 +3181,7 @@ static void free_event(struct perf_event *event)
 {
 	irq_work_sync(&event->pending);
 
-	if (!event->parent) {
-		if (event->attach_state & PERF_ATTACH_TASK)
-			static_key_slow_dec_deferred(&perf_sched_events);
-		if (event->attr.mmap || event->attr.mmap_data)
-			atomic_dec(&nr_mmap_events);
-		if (event->attr.comm)
-			atomic_dec(&nr_comm_events);
-		if (event->attr.task)
-			atomic_dec(&nr_task_events);
-		if (is_cgroup_event(event)) {
-			atomic_dec(&per_cpu(perf_cgroup_events, event->cpu));
-			static_key_slow_dec_deferred(&perf_sched_events);
-		}
-
-		if (has_branch_stack(event)) {
-			static_key_slow_dec_deferred(&perf_sched_events);
-			/* is system-wide event */
-			if (!(event->attach_state & PERF_ATTACH_TASK)) {
-				atomic_dec(&per_cpu(perf_branch_stack_events,
-						    event->cpu));
-			}
-		}
-	}
+	unaccount_event(event);
 
 	if (event->rb) {
 		struct ring_buffer *rb;
@@ -6451,8 +6463,24 @@ unlock:
 	return pmu;
 }
 
+static void account_event_cpu(struct perf_event *event, int cpu)
+{
+	if (event->parent)
+		return;
+
+	if (has_branch_stack(event)) {
+		if (!(event->attach_state & PERF_ATTACH_TASK))
+			atomic_inc(&per_cpu(perf_branch_stack_events, cpu));
+	}
+	if (is_cgroup_event(event))
+		atomic_inc(&per_cpu(perf_cgroup_events, cpu));
+}
+
 static void account_event(struct perf_event *event)
 {
+	if (event->parent)
+		return;
+
 	if (event->attach_state & PERF_ATTACH_TASK)
 		static_key_slow_inc(&perf_sched_events.key);
 	if (event->attr.mmap || event->attr.mmap_data)
@@ -6461,17 +6489,12 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_comm_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
-	if (has_branch_stack(event)) {
+	if (has_branch_stack(event))
 		static_key_slow_inc(&perf_sched_events.key);
-		if (!(event->attach_state & PERF_ATTACH_TASK))
-			atomic_inc(&per_cpu(perf_branch_stack_events,
-					    event->cpu));
-	}
-
-	if (is_cgroup_event(event)) {
-		atomic_inc(&per_cpu(perf_cgroup_events, event->cpu));
+	if (is_cgroup_event(event))
 		static_key_slow_inc(&perf_sched_events.key);
-	}
+
+	account_event_cpu(event, event->cpu);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:perf/core] perf: Migrate per cpu event accounting
  2013-07-23  0:31 ` [PATCH 5/8] perf: Migrate per cpu event accounting Frederic Weisbecker
@ 2013-07-31  8:56   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2013-07-31  8:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, eranian, acme, hpa, mingo, peterz, namhyung, jolsa,
	fweisbec, tglx

Commit-ID:  9a545de019b536771feefb76f85e5038b65c2190
Gitweb:     http://git.kernel.org/tip/9a545de019b536771feefb76f85e5038b65c2190
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Tue, 23 Jul 2013 02:31:03 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 30 Jul 2013 22:29:14 +0200

perf: Migrate per cpu event accounting

When an event is migrated, move the event per-cpu
accounting accordingly so that branch stack and cgroup
events work correctly on the new CPU.

Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-6-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3a4b73a..63bdec9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7145,6 +7145,7 @@ void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu)
 	list_for_each_entry_safe(event, tmp, &src_ctx->event_list,
 				 event_entry) {
 		perf_remove_from_context(event);
+		unaccount_event_cpu(event, src_cpu);
 		put_ctx(src_ctx);
 		list_add(&event->event_entry, &events);
 	}
@@ -7157,6 +7158,7 @@ void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu)
 		list_del(&event->event_entry);
 		if (event->state >= PERF_EVENT_STATE_OFF)
 			event->state = PERF_EVENT_STATE_INACTIVE;
+		account_event_cpu(event, dst_cpu);
 		perf_install_in_context(dst_ctx, event, dst_cpu);
 		get_ctx(dst_ctx);
 	}

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:perf/core] perf: Account freq events per cpu
  2013-07-23  0:31 ` [PATCH 6/8] perf: Account freq events per cpu Frederic Weisbecker
@ 2013-07-31  8:56   ` tip-bot for Frederic Weisbecker
  2013-08-01 12:46   ` [PATCH 6/8] " Jiri Olsa
  1 sibling, 0 replies; 49+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2013-07-31  8:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, eranian, acme, hpa, mingo, peterz, namhyung, jolsa,
	fweisbec, tglx

Commit-ID:  ba8a75c16e292c0a3a87406a77508cbbc6cf4ee2
Gitweb:     http://git.kernel.org/tip/ba8a75c16e292c0a3a87406a77508cbbc6cf4ee2
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Tue, 23 Jul 2013 02:31:04 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 30 Jul 2013 22:29:14 +0200

perf: Account freq events per cpu

This is going to be used by the full dynticks subsystem
as finer-grained information to know when to keep and
when to stop the tick.

Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-7-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 63bdec9..3fe385a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -141,6 +141,7 @@ enum event_type_t {
 struct static_key_deferred perf_sched_events __read_mostly;
 static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
 static DEFINE_PER_CPU(atomic_t, perf_branch_stack_events);
+static DEFINE_PER_CPU(atomic_t, perf_freq_events);
 
 static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
@@ -3139,6 +3140,9 @@ static void unaccount_event_cpu(struct perf_event *event, int cpu)
 	}
 	if (is_cgroup_event(event))
 		atomic_dec(&per_cpu(perf_cgroup_events, cpu));
+
+	if (event->attr.freq)
+		atomic_dec(&per_cpu(perf_freq_events, cpu));
 }
 
 static void unaccount_event(struct perf_event *event)
@@ -6474,6 +6478,9 @@ static void account_event_cpu(struct perf_event *event, int cpu)
 	}
 	if (is_cgroup_event(event))
 		atomic_inc(&per_cpu(perf_cgroup_events, cpu));
+
+	if (event->attr.freq)
+		atomic_inc(&per_cpu(perf_freq_events, cpu));
 }
 
 static void account_event(struct perf_event *event)

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:perf/core] perf: Implement finer grained full dynticks kick
  2013-07-23  0:31 ` [PATCH 7/8] perf: Finer grained full dynticks kick Frederic Weisbecker
@ 2013-07-31  8:56   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2013-07-31  8:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, eranian, acme, hpa, mingo, peterz, namhyung, jolsa,
	fweisbec, tglx

Commit-ID:  d84153d6c96f61aa06429586284639f32debf03e
Gitweb:     http://git.kernel.org/tip/d84153d6c96f61aa06429586284639f32debf03e
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Tue, 23 Jul 2013 02:31:05 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 30 Jul 2013 22:29:15 +0200

perf: Implement finer grained full dynticks kick

Currently the full dynticks subsystem keeps the
tick alive as long as there are perf events running.

This prevents the tick from being stopped as long as features
such as the lockup detector are running. As a temporary fix,
the lockup detector is disabled by default when full dynticks
is built, but this is not a viable long term solution.

To fix this, only keep the tick alive when an event configured
with a frequency rather than a period is running on the CPU,
or when an event throttles on the CPU.

These are the only purposes of the perf tick, especially now that
the rotation of flexible events is handled from a separate hrtimer.
The tick can be shut down the rest of the time.

Original-patch-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-8-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3fe385a..916cf1f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -870,12 +870,8 @@ static void perf_pmu_rotate_start(struct pmu *pmu)
 
 	WARN_ON(!irqs_disabled());
 
-	if (list_empty(&cpuctx->rotation_list)) {
-		int was_empty = list_empty(head);
+	if (list_empty(&cpuctx->rotation_list))
 		list_add(&cpuctx->rotation_list, head);
-		if (was_empty)
-			tick_nohz_full_kick();
-	}
 }
 
 static void get_ctx(struct perf_event_context *ctx)
@@ -1875,6 +1871,9 @@ static int  __perf_install_in_context(void *info)
 	perf_pmu_enable(cpuctx->ctx.pmu);
 	perf_ctx_unlock(cpuctx, task_ctx);
 
+	if (atomic_read(&__get_cpu_var(perf_freq_events)))
+		tick_nohz_full_kick();
+
 	return 0;
 }
 
@@ -2812,10 +2811,11 @@ done:
 #ifdef CONFIG_NO_HZ_FULL
 bool perf_event_can_stop_tick(void)
 {
-	if (list_empty(&__get_cpu_var(rotation_list)))
-		return true;
-	else
+	if (atomic_read(&__get_cpu_var(perf_freq_events)) ||
+	    __this_cpu_read(perf_throttled_count))
 		return false;
+	else
+		return true;
 }
 #endif
 
@@ -5202,6 +5202,7 @@ static int __perf_event_overflow(struct perf_event *event,
 			__this_cpu_inc(perf_throttled_count);
 			hwc->interrupts = MAX_INTERRUPTS;
 			perf_log_throttle(event, 0);
+			tick_nohz_full_kick();
 			ret = 1;
 		}
 	}

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:perf/core] watchdog: Make it work under full dynticks
  2013-07-23  0:31 ` [PATCH 8/8] watchdog: Remove hack to make full dynticks working Frederic Weisbecker
  2013-07-23 12:33   ` Don Zickus
@ 2013-07-31  8:57   ` tip-bot for Frederic Weisbecker
  1 sibling, 0 replies; 49+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2013-07-31  8:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, eranian, acme, hpa, mingo, anish198519851985,
	peterz, srivatsa.bhat, namhyung, jolsa, fweisbec, tglx, dzickus

Commit-ID:  93786a5f6aeb9c032c1c240246c5aabcf457b38f
Gitweb:     http://git.kernel.org/tip/93786a5f6aeb9c032c1c240246c5aabcf457b38f
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Tue, 23 Jul 2013 02:31:06 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 30 Jul 2013 22:29:15 +0200

watchdog: Make it work under full dynticks

A perf event can be used without forcing the tick to
stay alive if it uses a sample period rather than a
frequency and if it doesn't throttle (raise a storm of events).

Since the lockup detector neither uses a perf event frequency
nor should ever throttle due to its high period, it can now
run concurrently with the full dynticks feature.

So remove the hack that disabled the watchdog.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Anish Singh <anish198519851985@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374539466-4799-9-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/watchdog.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 1241d8c..51c4f34 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -553,14 +553,6 @@ void __init lockup_detector_init(void)
 {
 	set_sample_period();
 
-#ifdef CONFIG_NO_HZ_FULL
-	if (watchdog_user_enabled) {
-		watchdog_user_enabled = 0;
-		pr_warning("Disabled lockup detectors by default for full dynticks\n");
-		pr_warning("You can reactivate it with 'sysctl -w kernel.watchdog=1'\n");
-	}
-#endif
-
 	if (watchdog_user_enabled)
 		watchdog_enable_all_cpus();
 }

^ permalink raw reply related	[flat|nested] 49+ messages in thread
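
The distinction the changelog above relies on is how the event is programmed
in perf_event_attr: a fixed sample period (attr.freq == 0) needs no tick to
keep re-computing anything, while frequency mode (attr.freq == 1) needs the
tick to adjust the period. A minimal sketch of the two modes follows; the
values are made up and this is not the watchdog's actual attribute setup:

	#include <linux/perf_event.h>

	/* Fixed period: what period-based users such as the watchdog rely on. */
	struct perf_event_attr period_attr = {
		.type		= PERF_TYPE_HARDWARE,
		.config		= PERF_COUNT_HW_CPU_CYCLES,
		.size		= sizeof(struct perf_event_attr),
		.sample_period	= 100000,	/* attr.freq stays 0 */
	};

	/* Frequency mode: requires the tick for period adjustment. */
	struct perf_event_attr freq_attr = {
		.type		= PERF_TYPE_HARDWARE,
		.config		= PERF_COUNT_HW_CPU_CYCLES,
		.size		= sizeof(struct perf_event_attr),
		.freq		= 1,
		.sample_freq	= 1000,		/* target samples per second */
	};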

* Re: [PATCH 6/8] perf: Account freq events per cpu
  2013-07-23  0:31 ` [PATCH 6/8] perf: Account freq events per cpu Frederic Weisbecker
  2013-07-31  8:56   ` [tip:perf/core] " tip-bot for Frederic Weisbecker
@ 2013-08-01 12:46   ` Jiri Olsa
  2013-08-01 12:48     ` Jiri Olsa
  2013-08-01 13:31     ` Peter Zijlstra
  1 sibling, 2 replies; 49+ messages in thread
From: Jiri Olsa @ 2013-08-01 12:46 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Tue, Jul 23, 2013 at 02:31:04AM +0200, Frederic Weisbecker wrote:
> This is going to be used by the full dynticks subsystem
> as a finer-grained information to know when to keep and
> when to stop the tick.
> 
> Original-patch-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Stephane Eranian <eranian@google.com>
> ---
>  kernel/events/core.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index b40c3db..f9bd39b 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -141,6 +141,7 @@ enum event_type_t {
>  struct static_key_deferred perf_sched_events __read_mostly;
>  static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
>  static DEFINE_PER_CPU(atomic_t, perf_branch_stack_events);
> +static DEFINE_PER_CPU(atomic_t, perf_freq_events);
>  
>  static atomic_t nr_mmap_events __read_mostly;
>  static atomic_t nr_comm_events __read_mostly;
> @@ -3139,6 +3140,9 @@ static void unaccount_event_cpu(struct perf_event *event, int cpu)
>  	}
>  	if (is_cgroup_event(event))
>  		atomic_dec(&per_cpu(perf_cgroup_events, cpu));
> +
> +	if (event->attr.freq)
> +		atomic_dec(&per_cpu(perf_freq_events, cpu));
>  }
>  
>  static void unaccount_event(struct perf_event *event)
> @@ -6473,6 +6477,9 @@ static void account_event_cpu(struct perf_event *event, int cpu)
>  	}
>  	if (is_cgroup_event(event))
>  		atomic_inc(&per_cpu(perf_cgroup_events, cpu));
> +
> +	if (event->attr.freq)
> +		atomic_inc(&per_cpu(perf_freq_events, cpu));

cpu could be -1 in here.. getting:

[  178.901881] BUG: unable to handle kernel paging request at 0000000f001cf45a
[  178.901989] IP: [<ffffffff8110c32b>] account_event_cpu+0xbb/0xd0
[  178.901989] PGD 751a5067 PUD 0 
[  178.901989] Oops: 0002 [#2] PREEMPT SMP 
[  178.901989] Modules linked in:
[  178.901989] CPU: 0 PID: 1309 Comm: perf Tainted: G      D W    3.11.0-rc3+ #281
[  178.901989] Hardware name: Intel Corporation Montevina platform/To be filled by O.E.M., BIOS AMVACRB1.86C.0066.B00.0805070703 05/07/2008
[  178.901989] task: ffff88007519e5c0 ti: ffff880074c08000 task.ti: ffff880074c08000
[  178.901989] RIP: 0010:[<ffffffff8110c32b>]  [<ffffffff8110c32b>] account_event_cpu+0xbb/0xd0
[  178.901989] RSP: 0018:ffff880074c09e38  EFLAGS: 00010216
[  178.901989] RAX: 0000000f001cf45a RBX: ffff880074c1a800 RCX: 0000000000100000
[  178.901989] RDX: 000000000000002c RSI: 000000000000002c RDI: ffff880000120400
[  178.901989] RBP: ffff880074c09e48 R08: 0000000000000000 R09: 00000000ffffffff
[  178.901989] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffffffffff
[  178.901989] R13: ffff880074c0e640 R14: 00000000ffffffff R15: 0000000000000000
[  178.901989] FS:  00007fc670738980(0000) GS:ffff88007a600000(0000) knlGS:0000000000000000
[  178.901989] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  178.901989] CR2: 0000000f001cf45a CR3: 00000000751a8000 CR4: 00000000000407f0
[  178.901989] Stack:
[  178.901989]  ffff880074c1a800 ffff880074c09ee8 ffff880074c09e68 ffffffff8110c3b6
[  178.901989]  00000000ffffffff ffff880074c1a800 ffff880074c09f78 ffffffff81113ae8
[  178.901989]  0000000000000000 0000000000000001 ffff880074c09eb8 ffffffff00000060
[  178.901989] Call Trace:
[  178.901989]  [<ffffffff8110c3b6>] account_event.part.66+0x76/0xa0
[  178.901989]  [<ffffffff81113ae8>] SyS_perf_event_open+0x678/0xdf0
[  178.901989]  [<ffffffff8100ed42>] ? syscall_trace_leave+0xb2/0x240
[  178.901989]  [<ffffffff81641512>] tracesys+0xd0/0xd5
[  178.901989] Code: d4 48 c7 c0 28 f4 1c 00 48 03 04 d5 80 ad ce 81 f0 ff 00 f6 83 c9 00 00 00 04 74 a7 48 c7 c0 4c f4 1c 00 4a 03 04 e5 80 ad ce 81 <f0> ff 00 5b 41 5c 5d c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 
[  178.901989] RIP  [<ffffffff8110c32b>] account_event_cpu+0xbb/0xd0
[  178.901989]  RSP <ffff880074c09e38>
[  178.901989] CR2: 0000000f001cf45a
[  179.122749] ---[ end trace 6f2f4f69b01368fb ]---


jirka

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/8] perf: Account freq events per cpu
  2013-08-01 12:46   ` [PATCH 6/8] " Jiri Olsa
@ 2013-08-01 12:48     ` Jiri Olsa
  2013-08-01 13:31     ` Peter Zijlstra
  1 sibling, 0 replies; 49+ messages in thread
From: Jiri Olsa @ 2013-08-01 12:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 02:46:58PM +0200, Jiri Olsa wrote:
> On Tue, Jul 23, 2013 at 02:31:04AM +0200, Frederic Weisbecker wrote:
> > This is going to be used by the full dynticks subsystem
> > as a finer-grained information to know when to keep and
> > when to stop the tick.
> > 
> > Original-patch-by: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Jiri Olsa <jolsa@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Stephane Eranian <eranian@google.com>
> > ---
> >  kernel/events/core.c |    7 +++++++
> >  1 files changed, 7 insertions(+), 0 deletions(-)
> > 
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index b40c3db..f9bd39b 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -141,6 +141,7 @@ enum event_type_t {
> >  struct static_key_deferred perf_sched_events __read_mostly;
> >  static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
> >  static DEFINE_PER_CPU(atomic_t, perf_branch_stack_events);
> > +static DEFINE_PER_CPU(atomic_t, perf_freq_events);
> >  
> >  static atomic_t nr_mmap_events __read_mostly;
> >  static atomic_t nr_comm_events __read_mostly;
> > @@ -3139,6 +3140,9 @@ static void unaccount_event_cpu(struct perf_event *event, int cpu)
> >  	}
> >  	if (is_cgroup_event(event))
> >  		atomic_dec(&per_cpu(perf_cgroup_events, cpu));
> > +
> > +	if (event->attr.freq)
> > +		atomic_dec(&per_cpu(perf_freq_events, cpu));
> >  }
> >  
> >  static void unaccount_event(struct perf_event *event)
> > @@ -6473,6 +6477,9 @@ static void account_event_cpu(struct perf_event *event, int cpu)
> >  	}
> >  	if (is_cgroup_event(event))
> >  		atomic_inc(&per_cpu(perf_cgroup_events, cpu));
> > +
> > +	if (event->attr.freq)
> > +		atomic_inc(&per_cpu(perf_freq_events, cpu));
> 
> cpu could be -1 in here.. getting:
> 
> [  178.901881] BUG: unable to handle kernel paging request at 0000000f001cf45a
> [  178.901989] IP: [<ffffffff8110c32b>] account_event_cpu+0xbb/0xd0
> [  178.901989] PGD 751a5067 PUD 0 
> [  178.901989] Oops: 0002 [#2] PREEMPT SMP 
> [  178.901989] Modules linked in:
> [  178.901989] CPU: 0 PID: 1309 Comm: perf Tainted: G      D W    3.11.0-rc3+ #281
> [  178.901989] Hardware name: Intel Corporation Montevina platform/To be filled by O.E.M., BIOS AMVACRB1.86C.0066.B00.0805070703 05/07/2008
> [  178.901989] task: ffff88007519e5c0 ti: ffff880074c08000 task.ti: ffff880074c08000
> [  178.901989] RIP: 0010:[<ffffffff8110c32b>]  [<ffffffff8110c32b>] account_event_cpu+0xbb/0xd0
> [  178.901989] RSP: 0018:ffff880074c09e38  EFLAGS: 00010216
> [  178.901989] RAX: 0000000f001cf45a RBX: ffff880074c1a800 RCX: 0000000000100000
> [  178.901989] RDX: 000000000000002c RSI: 000000000000002c RDI: ffff880000120400
> [  178.901989] RBP: ffff880074c09e48 R08: 0000000000000000 R09: 00000000ffffffff
> [  178.901989] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffffffffff
> [  178.901989] R13: ffff880074c0e640 R14: 00000000ffffffff R15: 0000000000000000
> [  178.901989] FS:  00007fc670738980(0000) GS:ffff88007a600000(0000) knlGS:0000000000000000
> [  178.901989] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  178.901989] CR2: 0000000f001cf45a CR3: 00000000751a8000 CR4: 00000000000407f0
> [  178.901989] Stack:
> [  178.901989]  ffff880074c1a800 ffff880074c09ee8 ffff880074c09e68 ffffffff8110c3b6
> [  178.901989]  00000000ffffffff ffff880074c1a800 ffff880074c09f78 ffffffff81113ae8
> [  178.901989]  0000000000000000 0000000000000001 ffff880074c09eb8 ffffffff00000060
> [  178.901989] Call Trace:
> [  178.901989]  [<ffffffff8110c3b6>] account_event.part.66+0x76/0xa0
> [  178.901989]  [<ffffffff81113ae8>] SyS_perf_event_open+0x678/0xdf0
> [  178.901989]  [<ffffffff8100ed42>] ? syscall_trace_leave+0xb2/0x240
> [  178.901989]  [<ffffffff81641512>] tracesys+0xd0/0xd5
> [  178.901989] Code: d4 48 c7 c0 28 f4 1c 00 48 03 04 d5 80 ad ce 81 f0 ff 00 f6 83 c9 00 00 00 04 74 a7 48 c7 c0 4c f4 1c 00 4a 03 04 e5 80 ad ce 81 <f0> ff 00 5b 41 5c 5d c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 
> [  178.901989] RIP  [<ffffffff8110c32b>] account_event_cpu+0xbb/0xd0
> [  178.901989]  RSP <ffff880074c09e38>
> [  178.901989] CR2: 0000000f001cf45a
> [  179.122749] ---[ end trace 6f2f4f69b01368fb ]---

the testcase was:

[root@dhcp-26-214 tracing]# cat
...
[root@dhcp-26-214 perf]# ./perf record -p `pgrep cat`
Killed

jirka

^ permalink raw reply	[flat|nested] 49+ messages in thread
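
For context on why cpu ends up being -1 above: a per-task event such as the
one created by "perf record -p <pid>" is opened with the syscall's cpu
argument set to -1 (count on any CPU the task runs on), and that -1 is what
lands in event->cpu and then reaches account_event_cpu(). A hedged userspace
sketch, with made-up attribute values:

	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/perf_event.h>

	/* Open a frequency-based event bound to a task, roughly what
	 * "perf record -p <pid>" does.  cpu == -1 means "any CPU", so
	 * event->cpu is -1 and per_cpu(perf_freq_events, -1) in
	 * account_event_cpu() computes a bogus address. */
	static int open_task_event(pid_t pid)
	{
		struct perf_event_attr attr = {
			.type		= PERF_TYPE_HARDWARE,
			.config		= PERF_COUNT_HW_CPU_CYCLES,
			.size		= sizeof(attr),
			.freq		= 1,
			.sample_freq	= 4000,
		};

		return syscall(__NR_perf_event_open, &attr, pid,
			       -1 /* cpu */, -1 /* group_fd */, 0 /* flags */);
	}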

* Re: [PATCH 2/8] perf: Sanitize get_callchain_buffer()
  2013-07-23  0:31 ` [PATCH 2/8] perf: Sanitize get_callchain_buffer() Frederic Weisbecker
  2013-07-31  8:56   ` [tip:perf/core] " tip-bot for Frederic Weisbecker
@ 2013-08-01 13:01   ` Jiri Olsa
  2013-08-01 13:28     ` Frederic Weisbecker
  2013-08-01 13:29   ` Jiri Olsa
  2 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2013-08-01 13:01 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Tue, Jul 23, 2013 at 02:31:00AM +0200, Frederic Weisbecker wrote:
> In case of allocation failure, get_callchain_buffer() keeps the
> refcount incremented for the current event.
> 
> As a result, when get_callchain_buffers() returns an error,
> we must cleanup what it did by cancelling its last refcount
> with a call to put_callchain_buffers().
> 
> This is a hack in order to be able to call free_event()
> after that failure.
> 
> The original purpose of that was to simplify the failure
> path. But this error handling is actually counter intuitive,
> ugly and not very easy to follow because one expect to
> see the resources used to perform a service to be cleaned
> by the callee if case of failure, not by the caller.
> 
> So lets clean this up by cancelling the refcount from
> get_callchain_buffer() in case of failure. And correctly free
> the event accordingly in perf_event_alloc().
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Stephane Eranian <eranian@google.com>
> ---
>  kernel/events/callchain.c |    2 ++
>  kernel/events/core.c      |   41 +++++++++++++++++++++--------------------
>  2 files changed, 23 insertions(+), 20 deletions(-)
> 
> diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
> index c772061..76a8bc5 100644
> --- a/kernel/events/callchain.c
> +++ b/kernel/events/callchain.c
> @@ -117,6 +117,8 @@ int get_callchain_buffers(void)
>  	err = alloc_callchain_buffers();
>  exit:
>  	mutex_unlock(&callchain_mutex);
> +	if (err)
> +		atomic_dec(&nr_callchain_events);

shouldn't we touch this under the above lock?

also, that hunk above decrements nr_callchain_events
for the following case as well:

        count = atomic_inc_return(&nr_callchain_events);
        if (WARN_ON_ONCE(count < 1)) {
                err = -EINVAL;
                goto exit;
        }

seems like it screws the count

jirka

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/8] perf: Gather event accounting code
  2013-07-23  0:31 ` [PATCH 3/8] perf: Gather event accounting code Frederic Weisbecker
  2013-07-31  8:56   ` [tip:perf/core] perf: Factor out event accounting code to account_event()/__free_event() tip-bot for Frederic Weisbecker
@ 2013-08-01 13:13   ` Jiri Olsa
  2013-08-01 13:30     ` Frederic Weisbecker
  1 sibling, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2013-08-01 13:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Tue, Jul 23, 2013 at 02:31:01AM +0200, Frederic Weisbecker wrote:
> Gather all the event accounting code to a single place,
> once all the prerequisites are completed. This simplifies
> the refcounting.
> 
> Original-patch-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Stephane Eranian <eranian@google.com>
> ---
>  kernel/events/core.c |   79 +++++++++++++++++++++++++++++--------------------
>  1 files changed, 47 insertions(+), 32 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c5f435f..3bb73af 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3128,6 +3128,21 @@ static void free_event_rcu(struct rcu_head *head)
>  static void ring_buffer_put(struct ring_buffer *rb);
>  static void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb);
>  
> +static void __free_event(struct perf_event *event)
> +{
> +	if (!event->parent) {
> +		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
> +			put_callchain_buffers();
> +	}
> +
> +	if (event->destroy)
> +		event->destroy(event);
> +
> +	if (event->ctx)
> +		put_ctx(event->ctx);
> +
> +	call_rcu(&event->rcu_head, free_event_rcu);
> +}
>  static void free_event(struct perf_event *event)
>  {

nitpick, missing nl between functions

>  	irq_work_sync(&event->pending);
> @@ -3141,8 +3156,6 @@ static void free_event(struct perf_event *event)
>  			atomic_dec(&nr_comm_events);
>  		if (event->attr.task)
>  			atomic_dec(&nr_task_events);
> -		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
> -			put_callchain_buffers();
>  		if (is_cgroup_event(event)) {
>  			atomic_dec(&per_cpu(perf_cgroup_events, event->cpu));
>  			static_key_slow_dec_deferred(&perf_sched_events);
> @@ -3180,13 +3193,8 @@ static void free_event(struct perf_event *event)
>  	if (is_cgroup_event(event))
>  		perf_detach_cgroup(event);
>  
> -	if (event->destroy)
> -		event->destroy(event);
> -
> -	if (event->ctx)
> -		put_ctx(event->ctx);
>  
> -	call_rcu(&event->rcu_head, free_event_rcu);
> +	__free_event(event);

nitpick, extra nl above

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/8] perf: Sanitize get_callchain_buffer()
  2013-08-01 13:01   ` [PATCH 2/8] " Jiri Olsa
@ 2013-08-01 13:28     ` Frederic Weisbecker
  2013-08-01 13:32       ` Jiri Olsa
  0 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2013-08-01 13:28 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 03:01:46PM +0200, Jiri Olsa wrote:
> On Tue, Jul 23, 2013 at 02:31:00AM +0200, Frederic Weisbecker wrote:
> > In case of allocation failure, get_callchain_buffer() keeps the
> > refcount incremented for the current event.
> > 
> > As a result, when get_callchain_buffers() returns an error,
> > we must cleanup what it did by cancelling its last refcount
> > with a call to put_callchain_buffers().
> > 
> > This is a hack in order to be able to call free_event()
> > after that failure.
> > 
> > The original purpose of that was to simplify the failure
> > path. But this error handling is actually counter intuitive,
> > ugly and not very easy to follow because one expect to
> > see the resources used to perform a service to be cleaned
> > by the callee if case of failure, not by the caller.
> > 
> > So lets clean this up by cancelling the refcount from
> > get_callchain_buffer() in case of failure. And correctly free
> > the event accordingly in perf_event_alloc().
> > 
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Jiri Olsa <jolsa@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Stephane Eranian <eranian@google.com>
> > ---
> >  kernel/events/callchain.c |    2 ++
> >  kernel/events/core.c      |   41 +++++++++++++++++++++--------------------
> >  2 files changed, 23 insertions(+), 20 deletions(-)
> > 
> > diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
> > index c772061..76a8bc5 100644
> > --- a/kernel/events/callchain.c
> > +++ b/kernel/events/callchain.c
> > @@ -117,6 +117,8 @@ int get_callchain_buffers(void)
> >  	err = alloc_callchain_buffers();
> >  exit:
> >  	mutex_unlock(&callchain_mutex);
> > +	if (err)
> > +		atomic_dec(&nr_callchain_events);
> 
> shouldn't we touch this under above lock?

Right, we should move that under the lock, or another user of the callchains may see that we failed
the allocation and simply give up instead of retrying.

> 
> also that above hunk decrements nr_callchain_events
> also for following case:
> 
>         count = atomic_inc_return(&nr_callchain_events);
>         if (WARN_ON_ONCE(count < 1)) {
>                 err = -EINVAL;
>                 goto exit;
>         }
> 
> seems like it screws the count

I'm not sure what you mean here. You mean that it could be negative when the dec is done
outside the lock?

> 
> jirka


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/8] perf: Sanitize get_callchain_buffer()
  2013-07-23  0:31 ` [PATCH 2/8] perf: Sanitize get_callchain_buffer() Frederic Weisbecker
  2013-07-31  8:56   ` [tip:perf/core] " tip-bot for Frederic Weisbecker
  2013-08-01 13:01   ` [PATCH 2/8] " Jiri Olsa
@ 2013-08-01 13:29   ` Jiri Olsa
  2013-08-01 13:42     ` Frederic Weisbecker
  2 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2013-08-01 13:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Tue, Jul 23, 2013 at 02:31:00AM +0200, Frederic Weisbecker wrote:
SNIP

>  		if (event->attach_state & PERF_ATTACH_TASK)
>  			static_key_slow_inc(&perf_sched_events.key);
>  		if (event->attr.mmap || event->attr.mmap_data)
> @@ -6572,16 +6570,19 @@ done:
>  				atomic_inc(&per_cpu(perf_branch_stack_events,
>  						    event->cpu));
>  		}
> -		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
> -			err = get_callchain_buffers();
> -			if (err) {
> -				free_event(event);
> -				return ERR_PTR(err);
> -			}
> -		}
>  	}
>  
>  	return event;
> +
> +err_pmu:
> +	if (event->destroy)
> +		event->destroy(event);
> +err_ns:
> +	if (event->ns)
> +		put_pid_ns(event->ns);
> +	kfree(event);
> +
> +	return ERR_PTR(err);

could we call __free_filter(event) here?

jirka

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/8] perf: Gather event accounting code
  2013-08-01 13:13   ` [PATCH 3/8] perf: Gather event accounting code Jiri Olsa
@ 2013-08-01 13:30     ` Frederic Weisbecker
  0 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2013-08-01 13:30 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 03:13:30PM +0200, Jiri Olsa wrote:
> On Tue, Jul 23, 2013 at 02:31:01AM +0200, Frederic Weisbecker wrote:
> > Gather all the event accounting code to a single place,
> > once all the prerequisites are completed. This simplifies
> > the refcounting.
> > 
> > Original-patch-by: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Jiri Olsa <jolsa@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Stephane Eranian <eranian@google.com>
> > ---
> >  kernel/events/core.c |   79 +++++++++++++++++++++++++++++--------------------
> >  1 files changed, 47 insertions(+), 32 deletions(-)
> > 
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index c5f435f..3bb73af 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -3128,6 +3128,21 @@ static void free_event_rcu(struct rcu_head *head)
> >  static void ring_buffer_put(struct ring_buffer *rb);
> >  static void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb);
> >  
> > +static void __free_event(struct perf_event *event)
> > +{
> > +	if (!event->parent) {
> > +		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
> > +			put_callchain_buffers();
> > +	}
> > +
> > +	if (event->destroy)
> > +		event->destroy(event);
> > +
> > +	if (event->ctx)
> > +		put_ctx(event->ctx);
> > +
> > +	call_rcu(&event->rcu_head, free_event_rcu);
> > +}
> >  static void free_event(struct perf_event *event)
> >  {
> 
> nitpick, missing nl between functions
> 
> >  	irq_work_sync(&event->pending);
> > @@ -3141,8 +3156,6 @@ static void free_event(struct perf_event *event)
> >  			atomic_dec(&nr_comm_events);
> >  		if (event->attr.task)
> >  			atomic_dec(&nr_task_events);
> > -		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
> > -			put_callchain_buffers();
> >  		if (is_cgroup_event(event)) {
> >  			atomic_dec(&per_cpu(perf_cgroup_events, event->cpu));
> >  			static_key_slow_dec_deferred(&perf_sched_events);
> > @@ -3180,13 +3193,8 @@ static void free_event(struct perf_event *event)
> >  	if (is_cgroup_event(event))
> >  		perf_detach_cgroup(event);
> >  
> > -	if (event->destroy)
> > -		event->destroy(event);
> > -
> > -	if (event->ctx)
> > -		put_ctx(event->ctx);
> >  
> > -	call_rcu(&event->rcu_head, free_event_rcu);
> > +	__free_event(event);
> 
> nitpick, extra nl above


Ok, I'll fix, thanks!

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/8] perf: Account freq events per cpu
  2013-08-01 12:46   ` [PATCH 6/8] " Jiri Olsa
  2013-08-01 12:48     ` Jiri Olsa
@ 2013-08-01 13:31     ` Peter Zijlstra
  2013-08-01 13:35       ` Peter Zijlstra
                         ` (2 more replies)
  1 sibling, 3 replies; 49+ messages in thread
From: Peter Zijlstra @ 2013-08-01 13:31 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Frederic Weisbecker, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 02:46:58PM +0200, Jiri Olsa wrote:
> On Tue, Jul 23, 2013 at 02:31:04AM +0200, Frederic Weisbecker wrote:
> > This is going to be used by the full dynticks subsystem
> > as a finer-grained information to know when to keep and
> > when to stop the tick.
> > 
> > Original-patch-by: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Jiri Olsa <jolsa@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Stephane Eranian <eranian@google.com>
> > ---
> >  kernel/events/core.c |    7 +++++++
> >  1 files changed, 7 insertions(+), 0 deletions(-)
> > 
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index b40c3db..f9bd39b 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -141,6 +141,7 @@ enum event_type_t {
> >  struct static_key_deferred perf_sched_events __read_mostly;
> >  static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
> >  static DEFINE_PER_CPU(atomic_t, perf_branch_stack_events);
> > +static DEFINE_PER_CPU(atomic_t, perf_freq_events);
> >  
> >  static atomic_t nr_mmap_events __read_mostly;
> >  static atomic_t nr_comm_events __read_mostly;
> > @@ -3139,6 +3140,9 @@ static void unaccount_event_cpu(struct perf_event *event, int cpu)
> >  	}
> >  	if (is_cgroup_event(event))
> >  		atomic_dec(&per_cpu(perf_cgroup_events, cpu));
> > +
> > +	if (event->attr.freq)
> > +		atomic_dec(&per_cpu(perf_freq_events, cpu));
> >  }
> >  
> >  static void unaccount_event(struct perf_event *event)
> > @@ -6473,6 +6477,9 @@ static void account_event_cpu(struct perf_event *event, int cpu)
> >  	}
> >  	if (is_cgroup_event(event))
> >  		atomic_inc(&per_cpu(perf_cgroup_events, cpu));
> > +
> > +	if (event->attr.freq)
> > +		atomic_inc(&per_cpu(perf_freq_events, cpu));
> 
> cpu could be -1 in here.. getting:

Ho humm, right you are. 

So we have:

static void account_event_cpu(struct perf_event *event, int cpu)
{
	if (event->parent)
		return;

	if (has_branch_stack(event)) {
		if (!(event->attach_state & PERF_ATTACH_TASK))
			atomic_inc(&per_cpu(perf_branch_stack_events, cpu));
	}
	if (is_cgroup_event(event))
		atomic_inc(&per_cpu(perf_cgroup_events, cpu));

	if (event->attr.freq)
		atomic_inc(&per_cpu(perf_freq_events, cpu));
}

Where the freq thing is new and shiny, but we already had the other
two. Of those, cgroup events must be per cpu so that should be good;
the branch_stack thing tests ATTACH_TASK, which should also be good, but
leaves me wondering wth they do for those that are attached to tasks.

But yes, the frequency thing is borken.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/8] perf: Sanitize get_callchain_buffer()
  2013-08-01 13:28     ` Frederic Weisbecker
@ 2013-08-01 13:32       ` Jiri Olsa
  2013-08-01 13:49         ` Frederic Weisbecker
  0 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2013-08-01 13:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 03:28:34PM +0200, Frederic Weisbecker wrote:

SNIP

> > also for following case:
> > 
> >         count = atomic_inc_return(&nr_callchain_events);
> >         if (WARN_ON_ONCE(count < 1)) {
> >                 err = -EINVAL;
> >                 goto exit;
> >         }
> > 
> > seems like it screws the count
> 
> I'm not sure what you mean here. You mean that it could be negative when the dec is done
> outside the lock?

yes


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/8] perf: Account freq events per cpu
  2013-08-01 13:31     ` Peter Zijlstra
@ 2013-08-01 13:35       ` Peter Zijlstra
  2013-08-01 13:39       ` Jiri Olsa
  2013-08-01 13:55       ` Frederic Weisbecker
  2 siblings, 0 replies; 49+ messages in thread
From: Peter Zijlstra @ 2013-08-01 13:35 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Frederic Weisbecker, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 03:31:55PM +0200, Peter Zijlstra wrote:
> the branch_stack thing tests ATTACH_TASK, which should also be good, but
> leaves me wonder wth they do for those that are attached to tasks.

Ah, found it: in intel_pmu_lbr_enable() we test cpuc->lbr_context against
event->ctx and wipe the LBR if they differ. That should indeed avoid the
leak but isn't sufficient to guarantee sane data, as the context swap for
inherited counters can foil this.



^ permalink raw reply	[flat|nested] 49+ messages in thread
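
Rendered as code, the check Peter describes looks roughly like the sketch
below (paraphrased from his description, not a verbatim copy of
intel_pmu_lbr_enable(); cpuc is this CPU's struct cpu_hw_events):

	if (cpuc->lbr_context != event->ctx) {
		/* Wipe the LBR stack so stale branch records don't
		 * leak into the incoming context. */
		intel_pmu_lbr_reset();
		cpuc->lbr_context = event->ctx;
	}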

* Re: [PATCH 6/8] perf: Account freq events per cpu
  2013-08-01 13:31     ` Peter Zijlstra
  2013-08-01 13:35       ` Peter Zijlstra
@ 2013-08-01 13:39       ` Jiri Olsa
  2013-08-01 13:56         ` Peter Zijlstra
  2013-08-01 13:55       ` Frederic Weisbecker
  2 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2013-08-01 13:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 03:31:55PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 01, 2013 at 02:46:58PM +0200, Jiri Olsa wrote:
> > On Tue, Jul 23, 2013 at 02:31:04AM +0200, Frederic Weisbecker wrote:
> > > This is going to be used by the full dynticks subsystem
> > > as a finer-grained information to know when to keep and
> > > when to stop the tick.
> > > 
> > > Original-patch-by: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > Cc: Jiri Olsa <jolsa@redhat.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Namhyung Kim <namhyung@kernel.org>
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > Cc: Stephane Eranian <eranian@google.com>
> > > ---
> > >  kernel/events/core.c |    7 +++++++
> > >  1 files changed, 7 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index b40c3db..f9bd39b 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -141,6 +141,7 @@ enum event_type_t {
> > >  struct static_key_deferred perf_sched_events __read_mostly;
> > >  static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
> > >  static DEFINE_PER_CPU(atomic_t, perf_branch_stack_events);
> > > +static DEFINE_PER_CPU(atomic_t, perf_freq_events);
> > >  
> > >  static atomic_t nr_mmap_events __read_mostly;
> > >  static atomic_t nr_comm_events __read_mostly;
> > > @@ -3139,6 +3140,9 @@ static void unaccount_event_cpu(struct perf_event *event, int cpu)
> > >  	}
> > >  	if (is_cgroup_event(event))
> > >  		atomic_dec(&per_cpu(perf_cgroup_events, cpu));
> > > +
> > > +	if (event->attr.freq)
> > > +		atomic_dec(&per_cpu(perf_freq_events, cpu));
> > >  }
> > >  
> > >  static void unaccount_event(struct perf_event *event)
> > > @@ -6473,6 +6477,9 @@ static void account_event_cpu(struct perf_event *event, int cpu)
> > >  	}
> > >  	if (is_cgroup_event(event))
> > >  		atomic_inc(&per_cpu(perf_cgroup_events, cpu));
> > > +
> > > +	if (event->attr.freq)
> > > +		atomic_inc(&per_cpu(perf_freq_events, cpu));
> > 
> > cpu could be -1 in here.. getting:
> 
> Ho humm, right you are. 
> 
> So we have:
> 
> static void account_event_cpu(struct perf_event *event, int cpu)
> {
> 	if (event->parent)
> 		return;
> 
> 	if (has_branch_stack(event)) {
> 		if (!(event->attach_state & PERF_ATTACH_TASK))
> 			atomic_inc(&per_cpu(perf_branch_stack_events, cpu));
> 	}
> 	if (is_cgroup_event(event))
> 		atomic_inc(&per_cpu(perf_cgroup_events, cpu));
> 
> 	if (event->attr.freq)
> 		atomic_inc(&per_cpu(perf_freq_events, cpu));
> }
> 
> Where the freq thing is new and shiney, but we already had the other
> two. Of those, cgroup events must be per cpu so that should be good,
> the branch_stack thing tests ATTACH_TASK, which should also be good, but
> leaves me wonder wth they do for those that are attached to tasks.

cgroup is cpu only:

SYSCALL(..
        if ((flags & PERF_FLAG_PID_CGROUP) && (pid == -1 || cpu == -1))
                return -EINVAL;

jirka

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/8] perf: Sanitize get_callchain_buffer()
  2013-08-01 13:29   ` Jiri Olsa
@ 2013-08-01 13:42     ` Frederic Weisbecker
  2013-08-01 13:51       ` Jiri Olsa
  0 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2013-08-01 13:42 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 03:29:34PM +0200, Jiri Olsa wrote:
> On Tue, Jul 23, 2013 at 02:31:00AM +0200, Frederic Weisbecker wrote:
> SNIP
> 
> >  		if (event->attach_state & PERF_ATTACH_TASK)
> >  			static_key_slow_inc(&perf_sched_events.key);
> >  		if (event->attr.mmap || event->attr.mmap_data)
> > @@ -6572,16 +6570,19 @@ done:
> >  				atomic_inc(&per_cpu(perf_branch_stack_events,
> >  						    event->cpu));
> >  		}
> > -		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
> > -			err = get_callchain_buffers();
> > -			if (err) {
> > -				free_event(event);
> > -				return ERR_PTR(err);
> > -			}
> > -		}
> >  	}
> >  
> >  	return event;
> > +
> > +err_pmu:
> > +	if (event->destroy)
> > +		event->destroy(event);
> > +err_ns:
> > +	if (event->ns)
> > +		put_pid_ns(event->ns);
> > +	kfree(event);
> > +
> > +	return ERR_PTR(err);
> 
> could we call __free_filter(event) here?

Hmm, the filters are installed at ioctl time so there shouldn't be any yet. But there may be
an exception with inherited events. I can't find where the filter is inherited though. Do
we actually inherit those?

thanks.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/8] perf: Sanitize get_callchain_buffer()
  2013-08-01 13:32       ` Jiri Olsa
@ 2013-08-01 13:49         ` Frederic Weisbecker
  2013-08-01 13:54           ` Jiri Olsa
  0 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2013-08-01 13:49 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 03:32:17PM +0200, Jiri Olsa wrote:
> On Thu, Aug 01, 2013 at 03:28:34PM +0200, Frederic Weisbecker wrote:
> 
> SNIP
> 
> > > also for following case:
> > > 
> > >         count = atomic_inc_return(&nr_callchain_events);
> > >         if (WARN_ON_ONCE(count < 1)) {
> > >                 err = -EINVAL;
> > >                 goto exit;
> > >         }
> > > 
> > > seems like it screws the count
> > 
> > I'm not sure what you mean here. You mean that it could be negative when the dec is done
> > outside the lock?
> 
> yes
> 

I don't understand why.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/8] perf: Sanitize get_callchain_buffer()
  2013-08-01 13:42     ` Frederic Weisbecker
@ 2013-08-01 13:51       ` Jiri Olsa
  2013-08-01 14:30         ` Frederic Weisbecker
  0 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2013-08-01 13:51 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 03:42:28PM +0200, Frederic Weisbecker wrote:
> On Thu, Aug 01, 2013 at 03:29:34PM +0200, Jiri Olsa wrote:
> > On Tue, Jul 23, 2013 at 02:31:00AM +0200, Frederic Weisbecker wrote:
> > SNIP
> > 
> > >  		if (event->attach_state & PERF_ATTACH_TASK)
> > >  			static_key_slow_inc(&perf_sched_events.key);
> > >  		if (event->attr.mmap || event->attr.mmap_data)
> > > @@ -6572,16 +6570,19 @@ done:
> > >  				atomic_inc(&per_cpu(perf_branch_stack_events,
> > >  						    event->cpu));
> > >  		}
> > > -		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
> > > -			err = get_callchain_buffers();
> > > -			if (err) {
> > > -				free_event(event);
> > > -				return ERR_PTR(err);
> > > -			}
> > > -		}
> > >  	}
> > >  
> > >  	return event;
> > > +
> > > +err_pmu:
> > > +	if (event->destroy)
> > > +		event->destroy(event);
> > > +err_ns:
> > > +	if (event->ns)
> > > +		put_pid_ns(event->ns);
> > > +	kfree(event);
> > > +
> > > +	return ERR_PTR(err);
> > 
> > could we call __free_filter(event) here?
> 
> Hmm, the filters are installed from ioctl time so there shouldn't be any yet. But there should be
> an exception with inherited events. I fail to find where the filter is inherited though. Do
> we actually inherit those?

ouch.. last I checked was freeing filter before writing this... :)

what I meant was the __free_event(event)

sorry

jirka

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/8] perf: Sanitize get_callchain_buffer()
  2013-08-01 13:49         ` Frederic Weisbecker
@ 2013-08-01 13:54           ` Jiri Olsa
  2013-08-01 13:57             ` Frederic Weisbecker
  0 siblings, 1 reply; 49+ messages in thread
From: Jiri Olsa @ 2013-08-01 13:54 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 03:49:36PM +0200, Frederic Weisbecker wrote:
> On Thu, Aug 01, 2013 at 03:32:17PM +0200, Jiri Olsa wrote:
> > On Thu, Aug 01, 2013 at 03:28:34PM +0200, Frederic Weisbecker wrote:
> > 
> > SNIP
> > 
> > > > also for following case:
> > > > 
> > > >         count = atomic_inc_return(&nr_callchain_events);
> > > >         if (WARN_ON_ONCE(count < 1)) {
> > > >                 err = -EINVAL;
> > > >                 goto exit;
> > > >         }
> > > > 
> > > > seems like it screws the count
> > > 
> > > I'm not sure what you mean here. You mean that it could be negative when the dec is done
> > > outside the lock?
> > 
> > yes
> > 
> 
> I don't understand why.

ah ok, with the count being 0 the nr_callchain_events still
increments right? then it's ok..

jirka

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/8] perf: Account freq events per cpu
  2013-08-01 13:31     ` Peter Zijlstra
  2013-08-01 13:35       ` Peter Zijlstra
  2013-08-01 13:39       ` Jiri Olsa
@ 2013-08-01 13:55       ` Frederic Weisbecker
  2013-08-01 14:03         ` Peter Zijlstra
  2 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2013-08-01 13:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 03:31:55PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 01, 2013 at 02:46:58PM +0200, Jiri Olsa wrote:
> > On Tue, Jul 23, 2013 at 02:31:04AM +0200, Frederic Weisbecker wrote:
> > > This is going to be used by the full dynticks subsystem
> > > as a finer-grained information to know when to keep and
> > > when to stop the tick.
> > > 
> > > Original-patch-by: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > Cc: Jiri Olsa <jolsa@redhat.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Namhyung Kim <namhyung@kernel.org>
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > Cc: Stephane Eranian <eranian@google.com>
> > > ---
> > >  kernel/events/core.c |    7 +++++++
> > >  1 files changed, 7 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index b40c3db..f9bd39b 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -141,6 +141,7 @@ enum event_type_t {
> > >  struct static_key_deferred perf_sched_events __read_mostly;
> > >  static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
> > >  static DEFINE_PER_CPU(atomic_t, perf_branch_stack_events);
> > > +static DEFINE_PER_CPU(atomic_t, perf_freq_events);
> > >  
> > >  static atomic_t nr_mmap_events __read_mostly;
> > >  static atomic_t nr_comm_events __read_mostly;
> > > @@ -3139,6 +3140,9 @@ static void unaccount_event_cpu(struct perf_event *event, int cpu)
> > >  	}
> > >  	if (is_cgroup_event(event))
> > >  		atomic_dec(&per_cpu(perf_cgroup_events, cpu));
> > > +
> > > +	if (event->attr.freq)
> > > +		atomic_dec(&per_cpu(perf_freq_events, cpu));
> > >  }
> > >  
> > >  static void unaccount_event(struct perf_event *event)
> > > @@ -6473,6 +6477,9 @@ static void account_event_cpu(struct perf_event *event, int cpu)
> > >  	}
> > >  	if (is_cgroup_event(event))
> > >  		atomic_inc(&per_cpu(perf_cgroup_events, cpu));
> > > +
> > > +	if (event->attr.freq)
> > > +		atomic_inc(&per_cpu(perf_freq_events, cpu));
> > 
> > cpu could be -1 in here.. getting:
> 
> Ho humm, right you are. 
> 
> So we have:
> 
> static void account_event_cpu(struct perf_event *event, int cpu)
> {
> 	if (event->parent)
> 		return;
> 
> 	if (has_branch_stack(event)) {
> 		if (!(event->attach_state & PERF_ATTACH_TASK))
> 			atomic_inc(&per_cpu(perf_branch_stack_events, cpu));
> 	}
> 	if (is_cgroup_event(event))
> 		atomic_inc(&per_cpu(perf_cgroup_events, cpu));
> 
> 	if (event->attr.freq)
> 		atomic_inc(&per_cpu(perf_freq_events, cpu));
> }
> 
> Where the freq thing is new and shiney, but we already had the other
> two. Of those, cgroup events must be per cpu so that should be good,
> the branch_stack thing tests ATTACH_TASK, which should also be good, but
> leaves me wonder wth they do for those that are attached to tasks.
> 
> But yes, the frequency thing is borken.

Aie, so for the freq thing, I can either account it to all CPUs (inc on all and send an IPI to all), or
account it when the event scheds in/out. Probably we should do the former to avoid sending an IPI at every context switch.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/8] perf: Account freq events per cpu
  2013-08-01 13:39       ` Jiri Olsa
@ 2013-08-01 13:56         ` Peter Zijlstra
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Zijlstra @ 2013-08-01 13:56 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Frederic Weisbecker, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 03:39:21PM +0200, Jiri Olsa wrote:
> On Thu, Aug 01, 2013 at 03:31:55PM +0200, Peter Zijlstra wrote:
> > two. Of those, cgroup events must be per cpu so that should be good,

> cgroup is cpu only:

jah! :-)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/8] perf: Sanitize get_callchain_buffer()
  2013-08-01 13:54           ` Jiri Olsa
@ 2013-08-01 13:57             ` Frederic Weisbecker
  0 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2013-08-01 13:57 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 03:54:01PM +0200, Jiri Olsa wrote:
> On Thu, Aug 01, 2013 at 03:49:36PM +0200, Frederic Weisbecker wrote:
> > On Thu, Aug 01, 2013 at 03:32:17PM +0200, Jiri Olsa wrote:
> > > On Thu, Aug 01, 2013 at 03:28:34PM +0200, Frederic Weisbecker wrote:
> > > 
> > > SNIP
> > > 
> > > > > also for following case:
> > > > > 
> > > > >         count = atomic_inc_return(&nr_callchain_events);
> > > > >         if (WARN_ON_ONCE(count < 1)) {
> > > > >                 err = -EINVAL;
> > > > >                 goto exit;
> > > > >         }
> > > > > 
> > > > > seems like it screws the count
> > > > 
> > > > I'm not sure what you mean here. You mean that it could be negative when the dec is done
> > > > outside the lock?
> > > 
> > > yes
> > > 
> > 
> > I don't understand why.
> 
> ah ok, with the count being 0 the nr_callchain_events still
> increments right? then it's ok..

Yep. But I still need to move the dec into the lock though.

Thanks for catching this!

^ permalink raw reply	[flat|nested] 49+ messages in thread
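
What "moving the dec into the lock" could look like: a sketch of
get_callchain_buffers() with the failure-path decrement performed before
callchain_mutex is released (illustrative only, not the final patch):

	int get_callchain_buffers(void)
	{
		int err = 0;
		int count;

		mutex_lock(&callchain_mutex);

		count = atomic_inc_return(&nr_callchain_events);
		if (WARN_ON_ONCE(count < 1)) {
			err = -EINVAL;
			goto exit;
		}

		if (count > 1) {
			/* If the allocation failed, give up */
			if (!callchain_cpus_entries)
				err = -ENOMEM;
			goto exit;
		}

		err = alloc_callchain_buffers();
	exit:
		/* Undo the refcount while still holding the mutex, so a
		 * concurrent user can't observe a transient failure. */
		if (err)
			atomic_dec(&nr_callchain_events);

		mutex_unlock(&callchain_mutex);

		return err;
	}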

* Re: [PATCH 6/8] perf: Account freq events per cpu
  2013-08-01 13:55       ` Frederic Weisbecker
@ 2013-08-01 14:03         ` Peter Zijlstra
  2013-08-01 14:06           ` Peter Zijlstra
  2013-08-01 14:19           ` Frederic Weisbecker
  0 siblings, 2 replies; 49+ messages in thread
From: Peter Zijlstra @ 2013-08-01 14:03 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jiri Olsa, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 03:55:27PM +0200, Frederic Weisbecker wrote:
> On Thu, Aug 01, 2013 at 03:31:55PM +0200, Peter Zijlstra wrote:
> > 
> > Where the freq thing is new and shiney, but we already had the other
> > two. Of those, cgroup events must be per cpu so that should be good,
> > the branch_stack thing tests ATTACH_TASK, which should also be good, but
> > leaves me wonder wth they do for those that are attached to tasks.
> > 
> > But yes, the frequency thing is borken.
> 
> Aie, so the freq thing,  I can either account to all CPUs (inc to all
> and send an IPI to all), or when the event scheds in/out. Probably we
> should do the former to avoid sending an IPI at all context switches.

Yeah, just go with a single global state for now..

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/8] perf: Account freq events per cpu
  2013-08-01 14:03         ` Peter Zijlstra
@ 2013-08-01 14:06           ` Peter Zijlstra
  2013-08-01 14:21             ` Frederic Weisbecker
  2013-08-01 14:19           ` Frederic Weisbecker
  1 sibling, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2013-08-01 14:06 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jiri Olsa, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 04:03:52PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 01, 2013 at 03:55:27PM +0200, Frederic Weisbecker wrote:
> > On Thu, Aug 01, 2013 at 03:31:55PM +0200, Peter Zijlstra wrote:
> > > 
> > > Where the freq thing is new and shiney, but we already had the other
> > > two. Of those, cgroup events must be per cpu so that should be good,
> > > the branch_stack thing tests ATTACH_TASK, which should also be good, but
> > > leaves me wonder wth they do for those that are attached to tasks.
> > > 
> > > But yes, the frequency thing is borken.
> > 
> > Aie, so the freq thing,  I can either account to all CPUs (inc to all
> > and send an IPI to all), or when the event scheds in/out. Probably we
> > should do the former to avoid sending an IPI at all context switches.
> 
> Yeah, just go with a single global state for now..

The perf default is to create inherited counters, which are per cpu
anyway. So we'll not lose much.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/8] perf: Account freq events per cpu
  2013-08-01 14:03         ` Peter Zijlstra
  2013-08-01 14:06           ` Peter Zijlstra
@ 2013-08-01 14:19           ` Frederic Weisbecker
  1 sibling, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2013-08-01 14:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 04:03:52PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 01, 2013 at 03:55:27PM +0200, Frederic Weisbecker wrote:
> > On Thu, Aug 01, 2013 at 03:31:55PM +0200, Peter Zijlstra wrote:
> > > 
> > > Where the freq thing is new and shiney, but we already had the other
> > > two. Of those, cgroup events must be per cpu so that should be good,
> > > the branch_stack thing tests ATTACH_TASK, which should also be good, but
> > > leaves me wonder wth they do for those that are attached to tasks.
> > > 
> > > But yes, the frequency thing is borken.
> > 
> > Aie, so the freq thing,  I can either account to all CPUs (inc to all
> > and send an IPI to all), or when the event scheds in/out. Probably we
> > should do the former to avoid sending an IPI at all context switches.
> 
> Yeah, just go with a single global state for now..

A single global counter? Ok sounds good.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/8] perf: Account freq events per cpu
  2013-08-01 14:06           ` Peter Zijlstra
@ 2013-08-01 14:21             ` Frederic Weisbecker
  2013-08-01 14:40               ` Peter Zijlstra
  0 siblings, 1 reply; 49+ messages in thread
From: Frederic Weisbecker @ 2013-08-01 14:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 04:06:15PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 01, 2013 at 04:03:52PM +0200, Peter Zijlstra wrote:
> > On Thu, Aug 01, 2013 at 03:55:27PM +0200, Frederic Weisbecker wrote:
> > > On Thu, Aug 01, 2013 at 03:31:55PM +0200, Peter Zijlstra wrote:
> > > > 
> > > > Where the freq thing is new and shiney, but we already had the other
> > > > two. Of those, cgroup events must be per cpu so that should be good,
> > > > the branch_stack thing tests ATTACH_TASK, which should also be good, but
> > > > leaves me wonder wth they do for those that are attached to tasks.
> > > > 
> > > > But yes, the frequency thing is borken.
> > > 
> > > Aie, so the freq thing,  I can either account to all CPUs (inc to all
> > > and send an IPI to all), or when the event scheds in/out. Probably we
> > > should do the former to avoid sending an IPI at all context switches.
> > 
> > Yeah, just go with a single global state for now..
> 
> The perf default is to create inherited counter, which are per cpu
> anyway. So we'll not loose much.

So you mean that I keep the per cpu state when event->cpu != -1 and also have
a global counter for the others. Right?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2/8] perf: Sanitize get_callchain_buffer()
  2013-08-01 13:51       ` Jiri Olsa
@ 2013-08-01 14:30         ` Frederic Weisbecker
  0 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2013-08-01 14:30 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 03:51:02PM +0200, Jiri Olsa wrote:
> On Thu, Aug 01, 2013 at 03:42:28PM +0200, Frederic Weisbecker wrote:
> > On Thu, Aug 01, 2013 at 03:29:34PM +0200, Jiri Olsa wrote:
> > > On Tue, Jul 23, 2013 at 02:31:00AM +0200, Frederic Weisbecker wrote:
> > > SNIP
> > > 
> > > >  		if (event->attach_state & PERF_ATTACH_TASK)
> > > >  			static_key_slow_inc(&perf_sched_events.key);
> > > >  		if (event->attr.mmap || event->attr.mmap_data)
> > > > @@ -6572,16 +6570,19 @@ done:
> > > >  				atomic_inc(&per_cpu(perf_branch_stack_events,
> > > >  						    event->cpu));
> > > >  		}
> > > > -		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
> > > > -			err = get_callchain_buffers();
> > > > -			if (err) {
> > > > -				free_event(event);
> > > > -				return ERR_PTR(err);
> > > > -			}
> > > > -		}
> > > >  	}
> > > >  
> > > >  	return event;
> > > > +
> > > > +err_pmu:
> > > > +	if (event->destroy)
> > > > +		event->destroy(event);
> > > > +err_ns:
> > > > +	if (event->ns)
> > > > +		put_pid_ns(event->ns);
> > > > +	kfree(event);
> > > > +
> > > > +	return ERR_PTR(err);
> > > 
> > > could we call __free_filter(event) here?
> > 
> > Hmm, the filters are installed at ioctl time so there shouldn't be any yet. But inherited
> > events might be an exception. I fail to find where the filter is inherited though. Do
> > we actually inherit those?
> 
> ouch.. last I checked was freeing filter before writing this... :)
> 
> what I meant was the __free_event(event)

free_event() doesn't work either because we want several levels of rollback depending
on where the error triggered:

   +err_pmu:
           if (event->destroy)
                 event->destroy(event);
   +err_ns:
           if (event->ns)
                 put_pid_ns(event->ns);
           kfree(event);

           return ERR_PTR(err)

If we fail after the pmu init, we want to call destroy, free the pid ns and free the event.
If we fail before the pmu init, we only want to free the pid ns and the event, ...

__free_event() does the whole thing in any case, which is not what we want.

But...

OTOH it might work due to the if (event->destroy) and if (event->ns) checks before freeing the
associated resource.

So maybe I can replace the labels with a single call to __free_event() after all, as it
checks what needs to be freed. What do you think?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/8] perf: Account freq events per cpu
  2013-08-01 14:21             ` Frederic Weisbecker
@ 2013-08-01 14:40               ` Peter Zijlstra
  2013-08-02 16:25                 ` Frederic Weisbecker
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2013-08-01 14:40 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jiri Olsa, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 04:21:11PM +0200, Frederic Weisbecker wrote:
> > > Yeah, just go with a single global state for now..
> > 
> > The perf default is to create inherited counter, which are per cpu
> > anyway. So we'll not loose much.
> 
> So you mean that I keep the per cpu state when event->cpu != -1 and also have
> a global counter for the others. Right?

No, only a single global counter. The fact that perf, by default,
creates a counter per cpu, means that there's little effective
difference between a single global counter and per-cpu counters.
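
A minimal sketch of that single global state, assuming a nr_freq_events counter consulted from
the tick-stop path (the names and exact hook points are assumptions about the final form):

	static atomic_t nr_freq_events __read_mostly;

	static void account_event(struct perf_event *event)
	{
		/* ... other per-event accounting elided ... */
		if (event->attr.freq) {
			/* The first freq event anywhere forces the tick back on everywhere */
			if (atomic_inc_return(&nr_freq_events) == 1)
				tick_nohz_full_kick_all();
		}
	}

	bool perf_event_can_stop_tick(void)
	{
		/* Keep the tick while any freq event exists or this CPU is throttled */
		if (atomic_read(&nr_freq_events) ||
		    __this_cpu_read(perf_throttled_count))
			return false;

		return true;
	}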

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 6/8] perf: Account freq events per cpu
  2013-08-01 14:40               ` Peter Zijlstra
@ 2013-08-02 16:25                 ` Frederic Weisbecker
  0 siblings, 0 replies; 49+ messages in thread
From: Frederic Weisbecker @ 2013-08-02 16:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, LKML, Namhyung Kim, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 01, 2013 at 04:40:05PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 01, 2013 at 04:21:11PM +0200, Frederic Weisbecker wrote:
> > > > Yeah, just go with a single global state for now..
> > > 
> > > The perf default is to create inherited counter, which are per cpu
> > > anyway. So we'll not loose much.
> > 
> > So you mean that I keep the per cpu state when event->cpu != -1 and also have
> > a global counter for the others. Right?
> 
> No, only a single global counter. The fact that perf, by default,
> creates a counter per cpu, means that there's little effective
> difference between a single global counter and per-cpu counters.

Good point! I just included that in the changelog.

Thanks.

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2013-08-02 16:26 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-23  0:30 [PATCH 0/8] perf: Finer grained full dynticks handling Frederic Weisbecker
2013-07-23  0:30 ` [PATCH 1/8] perf: Fix branch stack refcount leak on callchain init failure Frederic Weisbecker
2013-07-31  8:55   ` [tip:perf/core] " tip-bot for Frederic Weisbecker
2013-07-23  0:31 ` [PATCH 2/8] perf: Sanitize get_callchain_buffer() Frederic Weisbecker
2013-07-31  8:56   ` [tip:perf/core] " tip-bot for Frederic Weisbecker
2013-08-01 13:01   ` [PATCH 2/8] " Jiri Olsa
2013-08-01 13:28     ` Frederic Weisbecker
2013-08-01 13:32       ` Jiri Olsa
2013-08-01 13:49         ` Frederic Weisbecker
2013-08-01 13:54           ` Jiri Olsa
2013-08-01 13:57             ` Frederic Weisbecker
2013-08-01 13:29   ` Jiri Olsa
2013-08-01 13:42     ` Frederic Weisbecker
2013-08-01 13:51       ` Jiri Olsa
2013-08-01 14:30         ` Frederic Weisbecker
2013-07-23  0:31 ` [PATCH 3/8] perf: Gather event accounting code Frederic Weisbecker
2013-07-31  8:56   ` [tip:perf/core] perf: Factor out event accounting code to account_event()/__free_event() tip-bot for Frederic Weisbecker
2013-08-01 13:13   ` [PATCH 3/8] perf: Gather event accounting code Jiri Olsa
2013-08-01 13:30     ` Frederic Weisbecker
2013-07-23  0:31 ` [PATCH 4/8] perf: Split per cpu " Frederic Weisbecker
2013-07-31  8:56   ` [tip:perf/core] perf: Split the per-cpu accounting part of the " tip-bot for Frederic Weisbecker
2013-07-23  0:31 ` [PATCH 5/8] perf: Migrate per cpu event accounting Frederic Weisbecker
2013-07-31  8:56   ` [tip:perf/core] " tip-bot for Frederic Weisbecker
2013-07-23  0:31 ` [PATCH 6/8] perf: Account freq events per cpu Frederic Weisbecker
2013-07-31  8:56   ` [tip:perf/core] " tip-bot for Frederic Weisbecker
2013-08-01 12:46   ` [PATCH 6/8] " Jiri Olsa
2013-08-01 12:48     ` Jiri Olsa
2013-08-01 13:31     ` Peter Zijlstra
2013-08-01 13:35       ` Peter Zijlstra
2013-08-01 13:39       ` Jiri Olsa
2013-08-01 13:56         ` Peter Zijlstra
2013-08-01 13:55       ` Frederic Weisbecker
2013-08-01 14:03         ` Peter Zijlstra
2013-08-01 14:06           ` Peter Zijlstra
2013-08-01 14:21             ` Frederic Weisbecker
2013-08-01 14:40               ` Peter Zijlstra
2013-08-02 16:25                 ` Frederic Weisbecker
2013-08-01 14:19           ` Frederic Weisbecker
2013-07-23  0:31 ` [PATCH 7/8] perf: Finer grained full dynticks kick Frederic Weisbecker
2013-07-31  8:56   ` [tip:perf/core] perf: Implement finer " tip-bot for Frederic Weisbecker
2013-07-23  0:31 ` [PATCH 8/8] watchdog: Remove hack to make full dynticks working Frederic Weisbecker
2013-07-23 12:33   ` Don Zickus
2013-07-23 12:44     ` Frederic Weisbecker
2013-07-23 12:45     ` Peter Zijlstra
2013-07-31  8:57   ` [tip:perf/core] watchdog: Make it work under full dynticks tip-bot for Frederic Weisbecker
2013-07-25  9:59 ` [PATCH 0/8] perf: Finer grained full dynticks handling Peter Zijlstra
2013-07-25 14:02   ` Frederic Weisbecker
2013-07-25 16:29     ` Peter Zijlstra
2013-07-25 20:07       ` Frederic Weisbecker
