linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] perf_counter: Prevent oopses from per-cpu software counters
@ 2009-02-05  4:52 Paul Mackerras
  2009-02-05 14:22 ` Ingo Molnar
  0 siblings, 1 reply; 6+ messages in thread
From: Paul Mackerras @ 2009-02-05  4:52 UTC (permalink / raw)
  To: Ingo Molnar, Zhang, Yanmin; +Cc: linux-kernel

Impact: oops fix

Yanmin Zhang reported that using a PERF_COUNT_TASK_CLOCK software
counter as a per-cpu counter would reliably crash the system, because
it calls __task_delta_exec with a null pointer.  And indeed, a "task
clock" counter only makes sense as a per-task counter.  Similarly,
counting page faults, context switches or cpu migrations only makes
sense for a per-task counter.

This fixes the problem by disallowing the use of the task clock,
page fault, context switch and cpu migration software counters as
per-cpu counters, since they all require a task context to obtain their
data.  The only software counter that can be used as a per-cpu counter
is the cpu clock counter (PERF_COUNT_CPU_CYCLES).

In order for sw_perf_counter_init to be able to tell whether we are
setting up a per-task or a per-cpu counter, this arranges for counter->ctx
to be initialized earlier, in perf_counter_alloc.

The other minor change this makes is to ensure that if sw_perf_counter_init
fails, we don't try to initialize the counter as a hardware counter.
Since the user has passed a negative event type (and it isn't raw), they
clearly don't intend it to be interpreted as a hardware event.  This
matters now that sw_perf_counter_init can fail for valid software event
types (because of the check that the counter is a per-task counter).

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
This is in my perfcounters.git tree master branch at:

git://git.kernel.org/pub/scm/linux/kernel/git/paulus/perfcounters.git master

Ingo - pull if you agree with it.  The check in sw_perf_counter_init
may need to be modified if we add more software counters, of course,
but that can happen later.

diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index f27a7e9..47d91da 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -502,7 +502,6 @@ perf_install_in_context(struct perf_counter_context *ctx,
 {
 	struct task_struct *task = ctx->task;
 
-	counter->ctx = ctx;
 	if (!task) {
 		/*
 		 * Per cpu counters are installed via an smp call and
@@ -1564,6 +1563,15 @@ sw_perf_counter_init(struct perf_counter *counter)
 {
 	const struct hw_perf_counter_ops *hw_ops = NULL;
 
+	/*
+	 * PERF_COUNT_CPU_CLOCK is the only type that is valid
+	 * for a per-cpu counter.  All the others require a task
+	 * context and thus can only be used as per-task counters.
+	 */
+	if (!counter->ctx->task &&
+	    counter->hw_event.type != PERF_COUNT_CPU_CLOCK)
+		return NULL;
+
 	switch (counter->hw_event.type) {
 	case PERF_COUNT_CPU_CLOCK:
 		hw_ops = &perf_ops_cpu_clock;
@@ -1592,6 +1600,7 @@ sw_perf_counter_init(struct perf_counter *counter)
 static struct perf_counter *
 perf_counter_alloc(struct perf_counter_hw_event *hw_event,
 		   int cpu,
+		   struct perf_counter_context *ctx,
 		   struct perf_counter *group_leader,
 		   gfp_t gfpflags)
 {
@@ -1623,6 +1632,7 @@ perf_counter_alloc(struct perf_counter_hw_event *hw_event,
 	counter->wakeup_pending		= 0;
 	counter->group_leader		= group_leader;
 	counter->hw_ops			= NULL;
+	counter->ctx			= ctx;
 
 	counter->state = PERF_COUNTER_STATE_INACTIVE;
 	if (hw_event->disabled)
@@ -1631,7 +1641,7 @@ perf_counter_alloc(struct perf_counter_hw_event *hw_event,
 	hw_ops = NULL;
 	if (!hw_event->raw && hw_event->type < 0)
 		hw_ops = sw_perf_counter_init(counter);
-	if (!hw_ops)
+	else
 		hw_ops = hw_perf_counter_init(counter);
 
 	if (!hw_ops) {
@@ -1707,7 +1717,8 @@ sys_perf_counter_open(struct perf_counter_hw_event *hw_event_uptr __user,
 	}
 
 	ret = -EINVAL;
-	counter = perf_counter_alloc(&hw_event, cpu, group_leader, GFP_KERNEL);
+	counter = perf_counter_alloc(&hw_event, cpu, ctx, group_leader,
+				     GFP_KERNEL);
 	if (!counter)
 		goto err_put_context;
 
@@ -1777,15 +1788,14 @@ inherit_counter(struct perf_counter *parent_counter,
 		parent_counter = parent_counter->parent;
 
 	child_counter = perf_counter_alloc(&parent_counter->hw_event,
-					    parent_counter->cpu, group_leader,
-					    GFP_KERNEL);
+					   parent_counter->cpu, child_ctx,
+					   group_leader, GFP_KERNEL);
 	if (!child_counter)
 		return NULL;
 
 	/*
 	 * Link it up in the child's context:
 	 */
-	child_counter->ctx = child_ctx;
 	child_counter->task = child;
 	list_add_counter(child_counter, child_ctx);
 	child_ctx->nr_counters++;


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf_counter: Prevent oopses from per-cpu software counters
  2009-02-05  4:52 [PATCH] perf_counter: Prevent oopses from per-cpu software counters Paul Mackerras
@ 2009-02-05 14:22 ` Ingo Molnar
  2009-02-06  2:40   ` Paul Mackerras
  2009-02-06  4:34   ` Paul Mackerras
  0 siblings, 2 replies; 6+ messages in thread
From: Ingo Molnar @ 2009-02-05 14:22 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Zhang, Yanmin, linux-kernel


* Paul Mackerras <paulus@samba.org> wrote:

> Impact: oops fix
> 
> Yanmin Zhang reported that using a PERF_COUNT_TASK_CLOCK software
> counter as a per-cpu counter would reliably crash the system, because
> it calls __task_delta_exec with a null pointer.  And indeed, a "task
> clock" counter only makes sense as a per-task counter.  Similarly,
> counting page faults, context switches or cpu migrations only makes
> sense for a per-task counter.
> 
> This fixes the problem by disallowing the use of the task clock,
> page fault, context switch and cpu migration software counters as
> per-cpu counters, since they all require a task context to obtain their
> data.  The only software counter that can be used as a per-cpu counter
> is the cpu clock counter (PERF_COUNT_CPU_CYCLES).
> 
> In order for sw_perf_counter_init to be able to tell whether we are
> setting up a per-task or a per-cpu counter, this arranges for counter->ctx
> to be initialized earlier, in perf_counter_alloc.
> 
> The other minor change this makes is to ensure that if sw_perf_counter_init
> fails, we don't try to initialize the counter as a hardware counter.
> Since the user has passed a negative event type (and it isn't raw), they
> clearly don't intend it to be interpreted as a hardware event.  This
> matters now that sw_perf_counter_init can fail for valid software event
> types (because of the check that the counter is a per-task counter).

Hm, i dont really think that the notion that it should not be possible to 
use sw counters on a per CPU basis is valid.

You are right that "pagefaults" and "context switches" do get generated by 
tasks - but there is a per cpu and system wide notion of 'number of 
pagefaults', and people might be interested in monitoring that.

The existence and widespread use of "vmstat", and its display of system-wide 
count of "context switches" (and administrator's reliance on judging a 
workload based on those counts) is i think ample proof that it makes sense 
to have those counters on a per CPU basis too.

So how about fixing these sw counts to properly work as percpu counters too? 
Or am i misssing something subtle that makes that impossible?

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf_counter: Prevent oopses from per-cpu software counters
  2009-02-05 14:22 ` Ingo Molnar
@ 2009-02-06  2:40   ` Paul Mackerras
  2009-02-06  4:34   ` Paul Mackerras
  1 sibling, 0 replies; 6+ messages in thread
From: Paul Mackerras @ 2009-02-06  2:40 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Zhang, Yanmin, linux-kernel

Ingo Molnar writes:

> You are right that "pagefaults" and "context switches" do get generated by 
> tasks - but there is a per cpu and system wide notion of 'number of 
> pagefaults', and people might be interested in monitoring that.

Sure.  What I was trying to say was that the existing code for the
page fault, context switch and task migration software counters uses
"current", so as it is, they can't be used as per-cpu counters.

> The existence and widespread use of "vmstat", and its display of system-wide 
> count of "context switches" (and administrator's reliance on judging a 
> workload based on those counts) is i think ample proof that it makes sense 
> to have those counters on a per CPU basis too.
> 
> So how about fixing these sw counts to properly work as percpu counters too? 
> Or am i misssing something subtle that makes that impossible?

I'll do that, but I think it can be a second patch on top of the one I
posted.  My existing patch doesn't disable anything that previously
worked, it just stops you doing things that would cause an oops.  So I
think it is strictly an improvement on the current situation.

Paul.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf_counter: Prevent oopses from per-cpu software counters
  2009-02-05 14:22 ` Ingo Molnar
  2009-02-06  2:40   ` Paul Mackerras
@ 2009-02-06  4:34   ` Paul Mackerras
  2009-02-06  6:35     ` Zhang, Yanmin
  2009-02-06 15:31     ` Ingo Molnar
  1 sibling, 2 replies; 6+ messages in thread
From: Paul Mackerras @ 2009-02-06  4:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Zhang, Yanmin, linux-kernel

Ingo Molnar writes:

> So how about fixing these sw counts to properly work as percpu counters too? 

OK, so for page faults it looks like I want to look at

	get_cpu_var(vm_event_states).event[PGFAULT]

to get the per-cpu page fault count, as long as
CONFIG_VM_EVENT_COUNTERS is set.

It looks like the scheduler doesn't keep per-cpu counts of context
switches or task migrations, or if it does I couldn't find them.  We
could do stuff in perf_counter_task_sched_in/out to implement per-cpu
context switch and migration counters by adding up the delta values
for each task that gets scheduled onto the cpu.  Or we could add
explicit per-cpu counters for these things in the scheduler.

What do you think?

Paul.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf_counter: Prevent oopses from per-cpu software counters
  2009-02-06  4:34   ` Paul Mackerras
@ 2009-02-06  6:35     ` Zhang, Yanmin
  2009-02-06 15:31     ` Ingo Molnar
  1 sibling, 0 replies; 6+ messages in thread
From: Zhang, Yanmin @ 2009-02-06  6:35 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Ingo Molnar, linux-kernel

On Fri, 2009-02-06 at 15:34 +1100, Paul Mackerras wrote:
> Ingo Molnar writes:
> 
> > So how about fixing these sw counts to properly work as percpu counters too? 
> 
> OK, so for page faults it looks like I want to look at
> 
> 	get_cpu_var(vm_event_states).event[PGFAULT]
> 
> to get the per-cpu page fault count, as long as
> CONFIG_VM_EVENT_COUNTERS is set.
> 
> It looks like the scheduler doesn't keep per-cpu counts of context
> switches
rq->sched_switch or sched_count? sched_switch is defined, but not used. They
all depends on CONFIG_SCHEDSTATS=y.

>  or task migrations, or if it does I couldn't find them.  We
> could do stuff in perf_counter_task_sched_in/out to implement per-cpu
> context switch and migration counters by adding up the delta values
> for each task that gets scheduled onto the cpu.  Or we could add
> explicit per-cpu counters for these things in the scheduler.
> 
> What do you think?
> 
> Paul.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf_counter: Prevent oopses from per-cpu software counters
  2009-02-06  4:34   ` Paul Mackerras
  2009-02-06  6:35     ` Zhang, Yanmin
@ 2009-02-06 15:31     ` Ingo Molnar
  1 sibling, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2009-02-06 15:31 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Zhang, Yanmin, linux-kernel


* Paul Mackerras <paulus@samba.org> wrote:

> Ingo Molnar writes:
> 
> > So how about fixing these sw counts to properly work as percpu counters too? 
> 
> OK, so for page faults it looks like I want to look at
> 
> 	get_cpu_var(vm_event_states).event[PGFAULT]
> 
> to get the per-cpu page fault count, as long as
> CONFIG_VM_EVENT_COUNTERS is set.

Yeah - i'd suggest that. Note hat VM_EVENT_COUNTERS is default enabled on 
99.99% of kernels:

 config VM_EVENT_COUNTERS
        default y
        bool "Enable VM event counters for /proc/vmstat" if EMBEDDED

> It looks like the scheduler doesn't keep per-cpu counts of context 
> switches or task migrations, or if it does I couldn't find them.  We could 
> do stuff in perf_counter_task_sched_in/out to implement per-cpu context 
> switch and migration counters by adding up the delta values for each task 
> that gets scheduled onto the cpu.  Or we could add explicit per-cpu 
> counters for these things in the scheduler.
> 
> What do you think?

For per-cpu counts of context switches we already have rq->nr_switches. 

We dont have per-cpu counts of migrations - but could add them.

We should do it this way becaue it would be nice to make per-cpu counters 
work just fine even if they are never switched in and out during context 
switches. That turns per-cpu counters into even lower-overhead ways of 
monitoring those values.

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2009-02-06 15:32 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-05  4:52 [PATCH] perf_counter: Prevent oopses from per-cpu software counters Paul Mackerras
2009-02-05 14:22 ` Ingo Molnar
2009-02-06  2:40   ` Paul Mackerras
2009-02-06  4:34   ` Paul Mackerras
2009-02-06  6:35     ` Zhang, Yanmin
2009-02-06 15:31     ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).