linux-kernel.vger.kernel.org archive mirror
* perf code using smp_processor_id() in preemptible [00000000] code
@ 2013-11-15  3:29 Dave Jones
  2013-11-15 10:02 ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Jones @ 2013-11-15  3:29 UTC (permalink / raw)
  To: Linux Kernel; +Cc: Ingo Molnar, Peter Zijlstra

BUG: using smp_processor_id() in preemptible [00000000] code: trinity-main/890
caller is p4_pmu_schedule_events+0x25/0x4c0
CPU: 0 PID: 890 Comm: trinity-main Not tainted 3.12.0+ #3 
Hardware name: Dell Inc.                 Precision WorkStation 470    /0P7996, BIOS A05 05/18/2005
 ffff88001e675668 ffff880025e79d00 ffffffff8171f332 0000000000000000
 ffff880025e79d18 ffffffff8133132a 0000000000000002 ffff880025e79dc0
 ffffffff81018e55 ffff88001e675668 00000000000080d0 ffff88001e675668
Call Trace:
 [<ffffffff8171f332>] dump_stack+0x4e/0x7a
 [<ffffffff8133132a>] debug_smp_processor_id+0xca/0xe0
 [<ffffffff81018e55>] p4_pmu_schedule_events+0x25/0x4c0
 [<ffffffff811a5c9f>] ? kmem_cache_alloc_trace+0x12f/0x2d0
 [<ffffffff810160ff>] ? allocate_fake_cpuc+0x2f/0x90
 [<ffffffff81016443>] x86_pmu_event_init+0x193/0x440
 [<ffffffff81146850>] perf_init_event+0x250/0x330
 [<ffffffff81146600>] ? perf_pmu_unregister+0x160/0x160
 [<ffffffff81146ca8>] perf_event_alloc+0x378/0x420
 [<ffffffff81147335>] SYSC_perf_event_open+0x5e5/0xa80
 [<ffffffff81147b89>] SyS_perf_event_open+0x9/0x10
 [<ffffffff81731ee4>] tracesys+0xdd/0xe2



* Re: perf code using smp_processor_id() in preemptible [00000000] code
  2013-11-15  3:29 perf code using smp_processor_id() in preemptible [00000000] code Dave Jones
@ 2013-11-15 10:02 ` Peter Zijlstra
  2013-11-15 10:19   ` Cyrill Gorcunov
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2013-11-15 10:02 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, Ingo Molnar; +Cc: gorcunov

On Thu, Nov 14, 2013 at 10:29:07PM -0500, Dave Jones wrote:
> BUG: using smp_processor_id() in preemptible [00000000] code: trinity-main/890
> caller is p4_pmu_schedule_events+0x25/0x4c0

Whee, wherever did you dig up that dinosaur ;-)

> CPU: 0 PID: 890 Comm: trinity-main Not tainted 3.12.0+ #3 
> Hardware name: Dell Inc.                 Precision WorkStation 470    /0P7996, BIOS A05 05/18/2005
>  ffff88001e675668 ffff880025e79d00 ffffffff8171f332 0000000000000000
>  ffff880025e79d18 ffffffff8133132a 0000000000000002 ffff880025e79dc0
>  ffffffff81018e55 ffff88001e675668 00000000000080d0 ffff88001e675668
> Call Trace:
>  [<ffffffff8171f332>] dump_stack+0x4e/0x7a
>  [<ffffffff8133132a>] debug_smp_processor_id+0xca/0xe0
>  [<ffffffff81018e55>] p4_pmu_schedule_events+0x25/0x4c0
>  [<ffffffff811a5c9f>] ? kmem_cache_alloc_trace+0x12f/0x2d0
>  [<ffffffff810160ff>] ? allocate_fake_cpuc+0x2f/0x90
>  [<ffffffff81016443>] x86_pmu_event_init+0x193/0x440
>  [<ffffffff81146850>] perf_init_event+0x250/0x330
>  [<ffffffff81146600>] ? perf_pmu_unregister+0x160/0x160
>  [<ffffffff81146ca8>] perf_event_alloc+0x378/0x420
>  [<ffffffff81147335>] SYSC_perf_event_open+0x5e5/0xa80
>  [<ffffffff81147b89>] SyS_perf_event_open+0x9/0x10
>  [<ffffffff81731ee4>] tracesys+0xdd/0xe2

Hrmm, I'm not entirely sure what we should do here; Cyrill, do you have a
clue?


* Re: perf code using smp_processor_id() in preemptible [00000000] code
  2013-11-15 10:02 ` Peter Zijlstra
@ 2013-11-15 10:19   ` Cyrill Gorcunov
  2013-11-15 11:30     ` Cyrill Gorcunov
  0 siblings, 1 reply; 8+ messages in thread
From: Cyrill Gorcunov @ 2013-11-15 10:19 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Dave Jones, Linux Kernel, Ingo Molnar

On Fri, Nov 15, 2013 at 11:02:54AM +0100, Peter Zijlstra wrote:
> On Thu, Nov 14, 2013 at 10:29:07PM -0500, Dave Jones wrote:
> > BUG: using smp_processor_id() in preemptible [00000000] code: trinity-main/890
> > caller is p4_pmu_schedule_events+0x25/0x4c0
> 
> Whee, wherever did you dig up that dinosaur ;-)

rofl ;)
> 
> Hrmm, I'm not entirely sure what we should do here; Cyrill, do you have a clue?

We need to figure out which cpu we're scheduled on to properly choose the thread.
I have a vague memory that earlier I used raw_smp_processor_id() here but then
someone pointed out that we need smp_processor_id() instead. Gimme some time,
I'll cook a patch today.

Thanks for the report, Dave!


* Re: perf code using smp_processor_id() in preemptible [00000000] code
  2013-11-15 10:19   ` Cyrill Gorcunov
@ 2013-11-15 11:30     ` Cyrill Gorcunov
  2013-11-15 11:51       ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Cyrill Gorcunov @ 2013-11-15 11:30 UTC (permalink / raw)
  To: Peter Zijlstra, Dave Jones; +Cc: Linux Kernel, Ingo Molnar

On Fri, Nov 15, 2013 at 02:19:46PM +0400, Cyrill Gorcunov wrote:
> 
> Thanks for the report, Dave!

Dave, could you please give the patch a shot?
---
From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: x86, perf: P4 PMU -- protect p4_pmu_schedule_events from preemption

Dave reported

 | BUG: using smp_processor_id() in preemptible [00000000] code: trinity-main/890
 | caller is p4_pmu_schedule_events+0x25/0x4c0
 | CPU: 0 PID: 890 Comm: trinity-main Not tainted 3.12.0+ #3
 | Hardware name: Dell Inc.                 Precision WorkStation 470    /0P7996, BIOS A05 05/18/2005
 | ffff88001e675668 ffff880025e79d00 ffffffff8171f332 0000000000000000
 | ffff880025e79d18 ffffffff8133132a 0000000000000002 ffff880025e79dc0
 | ffffffff81018e55 ffff88001e675668 00000000000080d0 ffff88001e675668
 |Call Trace:
 | [<ffffffff8171f332>] dump_stack+0x4e/0x7a
 | [<ffffffff8133132a>] debug_smp_processor_id+0xca/0xe0
 | [<ffffffff81018e55>] p4_pmu_schedule_events+0x25/0x4c0
 | [<ffffffff811a5c9f>] ? kmem_cache_alloc_trace+0x12f/0x2d0
 | [<ffffffff810160ff>] ? allocate_fake_cpuc+0x2f/0x90
 | [<ffffffff81016443>] x86_pmu_event_init+0x193/0x440
 | [<ffffffff81146850>] perf_init_event+0x250/0x330
 | [<ffffffff81146600>] ? perf_pmu_unregister+0x160/0x160
 | [<ffffffff81146ca8>] perf_event_alloc+0x378/0x420
 | [<ffffffff81147335>] SYSC_perf_event_open+0x5e5/0xa80
 | [<ffffffff81147b89>] SyS_perf_event_open+0x9/0x10
 | [<ffffffff81731ee4>] tracesys+0xdd/0xe2

When we schedule events we need to disable preemption. The
problem is the same as the one fixed in commit 137351e0feeb (I simply
managed to miss this routine in the first place).

Reported-by: Dave Jones <davej@redhat.com>
CC: Ingo Molnar <mingo@kernel.org>
CC: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 arch/x86/kernel/cpu/perf_event_p4.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
@@ -1207,7 +1207,7 @@ static int p4_pmu_schedule_events(struct
 {
 	unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	unsigned long escr_mask[BITS_TO_LONGS(P4_ESCR_MSR_TABLE_SIZE)];
-	int cpu = smp_processor_id();
+	int cpu = get_cpu();
 	struct hw_perf_event *hwc;
 	struct p4_event_bind *bind;
 	unsigned int i, thread, num;
@@ -1267,6 +1267,7 @@ reserve:
 	}
 
 done:
+	put_cpu();
 	return num ? -EINVAL : 0;
 }
 


* Re: perf code using smp_processor_id() in preemptible [00000000] code
  2013-11-15 11:30     ` Cyrill Gorcunov
@ 2013-11-15 11:51       ` Peter Zijlstra
  2013-11-15 12:10         ` Cyrill Gorcunov
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2013-11-15 11:51 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Dave Jones, Linux Kernel, Ingo Molnar

On Fri, Nov 15, 2013 at 03:30:46PM +0400, Cyrill Gorcunov wrote:

>  arch/x86/kernel/cpu/perf_event_p4.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
> ===================================================================
> --- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_p4.c
> +++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
> @@ -1207,7 +1207,7 @@ static int p4_pmu_schedule_events(struct
>  {
>  	unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>  	unsigned long escr_mask[BITS_TO_LONGS(P4_ESCR_MSR_TABLE_SIZE)];
> -	int cpu = smp_processor_id();
> +	int cpu = get_cpu();
>  	struct hw_perf_event *hwc;
>  	struct p4_event_bind *bind;
>  	unsigned int i, thread, num;
> @@ -1267,6 +1267,7 @@ reserve:
>  	}
>  
>  done:
> +	put_cpu();
>  	return num ? -EINVAL : 0;
>  }


ok, this will make the error go away, but what about the semantics of
the case? Does it really matter for the grouping on which cpu we compute
it? That is, can we end up with a different group for one cpu than for
another?

Or do we simply need a coherent single cpu to do the computation with?
In which case raw_smp_processor_id() would also suffice.


If we can indeed get a different result depending on which cpu we do the
computation, then things are broken, because it might be a task group
we're building which has to be able to migrate around with the task.

In that case we must compute the maximal group that can be scheduled on
all cpus.

This wasn't at all clear.


* Re: perf code using smp_processor_id() in preemptible [00000000] code
  2013-11-15 11:51       ` Peter Zijlstra
@ 2013-11-15 12:10         ` Cyrill Gorcunov
  2013-11-15 12:33           ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Cyrill Gorcunov @ 2013-11-15 12:10 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Dave Jones, Linux Kernel, Ingo Molnar

On Fri, Nov 15, 2013 at 12:51:50PM +0100, Peter Zijlstra wrote:
> ok, this will make the error go away, but what about the semantics of
> the case? Does it really matter for the grouping on which cpu we compute
> it? That is, can we end up with a different group for one cpu than for
> another?
> 
> Or do we simply need a coherent single cpu to do the computation with?
> In which case raw_smp_processor_id() would also suffice.
> 
> If we can indeed get a different result depending on which cpu we do the
> computation, then things are broken, because it might be a task group
> we're building which has to be able to migrate around with the task.

The events are sensitive to which cpu they're scheduled to execute on
(if HT is turned on, we need to set up the thread bit in the register).
As far as I understand, once events are assigned to cpu_hw_events
they execute on this cpu; when tasks are migrated to another
cpu, they're re-scheduled. Or am I missing something obvious here?

> 
> In that case we must compute the maximal group that can be scheduled on
> all cpus.
> 
> This wasn't at all clear.


* Re: perf code using smp_processor_id() in preemptible [00000000] code
  2013-11-15 12:10         ` Cyrill Gorcunov
@ 2013-11-15 12:33           ` Peter Zijlstra
  2013-11-15 12:42             ` Cyrill Gorcunov
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2013-11-15 12:33 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Dave Jones, Linux Kernel, Ingo Molnar

On Fri, Nov 15, 2013 at 04:10:51PM +0400, Cyrill Gorcunov wrote:
> On Fri, Nov 15, 2013 at 12:51:50PM +0100, Peter Zijlstra wrote:
> > ok, this will make the error go away, but what about the semantics of
> > the case? Does it really matter for the grouping on which cpu we compute
> > it? That is, can we end up with a different group for one cpu than for
> > another?
> > 
> > Or do we simply need a coherent single cpu to do the computation with?
> > In which case raw_smp_processor_id() would also suffice.
> > 
> > If we can indeed get a different result depending on which cpu we do the
> > computation, then things are broken, because it might be a task group
> > we're building which has to be able to migrate around with the task.
> 
> The events are sensitive to which cpu they're scheduled to execute on
> (if HT is turned on, we need to set up the thread bit in the register).
> As far as I understand, once events are assigned to cpu_hw_events
> they execute on this cpu; when tasks are migrated to another
> cpu, they're re-scheduled. Or am I missing something obvious here?

No this is correct, but that is simply about event encoding, right?

The situation we should be avoiding is:

 {x, y, z}

being a valid event group on ht0 but an invalid group for ht1.

So the whole fake_cpuc / validate_{event,group} code that triggered this
isn't actually scheduling them, it's testing to see if all the provided
events could possibly be scheduled together -- and we would want to
avoid giving a sibling-dependent answer here.


* Re: perf code using smp_processor_id() in preemptible [00000000] code
  2013-11-15 12:33           ` Peter Zijlstra
@ 2013-11-15 12:42             ` Cyrill Gorcunov
  0 siblings, 0 replies; 8+ messages in thread
From: Cyrill Gorcunov @ 2013-11-15 12:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Dave Jones, Linux Kernel, Ingo Molnar

On Fri, Nov 15, 2013 at 01:33:37PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 15, 2013 at 04:10:51PM +0400, Cyrill Gorcunov wrote:
> > On Fri, Nov 15, 2013 at 12:51:50PM +0100, Peter Zijlstra wrote:
> > > ok, this will make the error go away, but what about the semantics of
> > > the case? Does it really matter for the grouping on which cpu we compute
> > > it? That is, can we end up with a different group for one cpu than for
> > > another?
> > > 
> > > Or do we simply need a coherent single cpu to do the computation with?
> > > In which case raw_smp_processor_id() would also suffice.
> > > 
> > > If we can indeed get a different result depending on which cpu we do the
> > > computation, then things are broken, because it might be a task group
> > > we're building which has to be able to migrate around with the task.
> > 
> > The events are sensitive to which cpu they're scheduled to execute on
> > (if HT is turned on, we need to set up the thread bit in the register).
> > As far as I understand, once events are assigned to cpu_hw_events
> > they execute on this cpu; when tasks are migrated to another
> > cpu, they're re-scheduled. Or am I missing something obvious here?
> 
> No this is correct, but that is simply about event encoding, right?

Yes, sorry for not mentioning it earlier.

> 
> The situation we should be avoiding is:
> 
>  {x, y, z}
> 
> being a valid event group on ht0 but an invalid group for ht1.

I see. No, this can't happen. (The idea of using cpu here is to
split the whole set of perf registers available on a core [which
are shared between HT threads] into two sets, one half used only for
thread 1 and the second half used only for thread 2).

> 
> So the whole fake_cpuc / validate_{event,group} code that triggered this
> isn't actually scheduling them, it's testing to see if all the provided
> events could possibly be scheduled together -- and we would want to
> avoid giving a sibling-dependent answer here.

Yes, I looked into fake_cpuc, and with the @cpu variable used in
p4_pmu_schedule_events we will simply either get the answer "ok, there
are enough registers to carry all requested events", or have the events
declined if there is no space left.
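
A rough userspace sketch of the argument above, under loud assumptions: the
counters of one core are split into two equally sized slices, one per HT
sibling, and validation only asks whether the requested events fit into the
slice of the thread doing the check. The table sizes and every model_* name
are hypothetical, and the sketch pretends any event can use any counter in
its slice, which the real P4 driver does not assume.

```c
#include <assert.h>

/* Hypothetical sizes for the sketch, not the P4's real counter layout. */
#define MODEL_NUM_THREADS       2
#define MODEL_CNTRS_PER_THREAD  3

/* Each HT sibling owns its own, equally sized slice of the counters. */
static const int model_cntr_table[MODEL_NUM_THREADS][MODEL_CNTRS_PER_THREAD] = {
	{ 0, 1, 2 },	/* counters reserved for thread 0 */
	{ 3, 4, 5 },	/* counters reserved for thread 1 */
};

/* Return 0 if nr_events fit into the slice owned by thread, -1 otherwise. */
static int model_validate_group(int thread, int nr_events)
{
	unsigned int used_mask = 0;
	int placed = 0;

	for (int e = 0; e < nr_events; e++) {
		/* Take the first counter of this thread's slice still free. */
		for (int i = 0; i < MODEL_CNTRS_PER_THREAD; i++) {
			int cntr = model_cntr_table[thread][i];

			if (!(used_mask & (1u << cntr))) {
				used_mask |= 1u << cntr;
				placed++;
				break;
			}
		}
	}
	return placed == nr_events ? 0 : -1;
}
```

Because both slices have the same size in this model, model_validate_group()
returns the same answer for thread 0 and thread 1 for any event count; only
the counter indices that get picked differ.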

