* [BUG] perf: multiplexing and hotplug CPU problem
@ 2012-09-12 14:40 Stephane Eranian
From: Stephane Eranian @ 2012-09-12 14:40 UTC
  To: LKML; +Cc: Peter Zijlstra, mingo, ak, Robert Richter

Hi,

As I was debugging my hrtimer patch, I ran a few tests
with CPU hotplug. In other words, I offlined a CPU while
there was an active monitoring session that causes multiplexing.

When the CPU goes down, all is well. But when it comes back,
things go wrong. The kernel does not crash, but the results are wrong
and multiplexing no longer works.

I investigated this some more and found out there is an issue
on re-activation.

During shutdown, system-wide events are scheduled out AND removed
from the event lists. Consequently, ctx->nr_events and ctx->nr_active
go to zero.

When the CPU is brought back online and tools do start/stop on the events,
they can be scheduled back in, and therefore increment ctx->nr_active.
Because list_add_event() is not called again, you may end up with
ctx->nr_events < ctx->nr_active, which is wrong. Events may then not
be on any list, so they cannot be multiplexed again.
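
The sequence above can be replayed with a small user-space toy model.
This is only a sketch and not kernel code: the toy_* names, the two
structs and the per-event flags are hypothetical and keep just the two
counters discussed here. It assumes that nr_events counts events
attached to the context lists and nr_active counts events currently
scheduled in.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: only what is needed to show the accounting. */
struct toy_event {
	bool on_list;	/* attached to the context event list */
	bool active;	/* currently scheduled in */
};

struct toy_ctx {
	int nr_events;	/* events attached to the lists */
	int nr_active;	/* events currently scheduled in */
};

static void toy_list_add_event(struct toy_ctx *ctx, struct toy_event *ev)
{
	if (!ev->on_list) {
		ev->on_list = true;
		ctx->nr_events++;
	}
}

static void toy_list_del_event(struct toy_ctx *ctx, struct toy_event *ev)
{
	if (ev->on_list) {
		ev->on_list = false;
		ctx->nr_events--;
	}
}

static void toy_event_sched_in(struct toy_ctx *ctx, struct toy_event *ev)
{
	if (!ev->active) {
		ev->active = true;
		ctx->nr_active++;
	}
}

static void toy_event_sched_out(struct toy_ctx *ctx, struct toy_event *ev)
{
	if (ev->active) {
		ev->active = false;
		ctx->nr_active--;
	}
}

/* The invariant that the hotplug cycle breaks. */
static bool toy_ctx_consistent(const struct toy_ctx *ctx)
{
	return ctx->nr_active <= ctx->nr_events;
}

/* Replays offline + online on a fresh context; returns the invariant. */
static bool toy_replay_hotplug(struct toy_ctx *ctx, struct toy_event *ev)
{
	assert(ctx->nr_events == 0 && ctx->nr_active == 0);

	/* Monitoring session starts: event is attached and scheduled in. */
	toy_list_add_event(ctx, ev);
	toy_event_sched_in(ctx, ev);

	/* CPU goes down: event is scheduled out AND removed from the list. */
	toy_event_sched_out(ctx, ev);
	toy_list_del_event(ctx, ev);

	/* CPU comes back, tool does start/stop: sched in, no list add. */
	toy_event_sched_in(ctx, ev);

	return toy_ctx_consistent(ctx);
}
```

Calling toy_replay_hotplug() on a zeroed context and event returns
false, with nr_events at 0 and nr_active at 1, which is the
nr_events < nr_active state described above.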

It is not clear to me why we need to remove the events from any
list (list_del_event) when the CPU goes down.

Why isn't calling event_sched_out() enough?
If events are kept on lists, why not try to schedule them back in when
the CPU is brought back online?


* Re: [BUG] perf: multiplexing and hotplug CPU problem
From: Peter Zijlstra @ 2012-09-12 14:47 UTC
  To: Stephane Eranian; +Cc: LKML, mingo, ak, Robert Richter

On Wed, 2012-09-12 at 16:40 +0200, Stephane Eranian wrote:
> Why isn't calling event_sched_out() enough?
> If events are kept on lists, why not try to schedule them back in when
> the CPU is brought back online?

That might be never. I think we should put the events in the error state
instead of disabling them; that should avoid the re-activation and
provide a stronger hint to userspace that something went funny.


* Re: [BUG] perf: multiplexing and hotplug CPU problem
From: Stephane Eranian @ 2012-09-12 14:50 UTC
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Robert Richter

On Wed, Sep 12, 2012 at 4:47 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2012-09-12 at 16:40 +0200, Stephane Eranian wrote:
>> Why isn't calling event_sched_out() enough?
>> If events are kept on lists, why not try to schedule them back in when
>> the CPU is brought back online?
>
> This might be never, I think we should put the events in error state
> instead of disabling them, that should avoid the re-activation and
> provide a stronger hint to userspace that something went funny.

Yeah, that's probably a better approach. Counters should be put
in error state such that IOC_ENABLE or read() return errors.
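
A rough user-space sketch of that behavior, for illustration only. The
toy_* names, the state enum and the choice of -EIO as the error code
are all assumptions made for this example; nothing here is the actual
kernel interface or the value the kernel would return.

```c
#include <errno.h>

/* Hypothetical counter states for this sketch only. */
enum toy_state {
	TOY_STATE_ERROR,	/* CPU went away under the counter */
	TOY_STATE_OFF,		/* disabled by the user */
	TOY_STATE_INACTIVE,	/* enabled, not currently counting */
};

struct toy_counter {
	enum toy_state state;
	unsigned long long count;
};

/* CPU offline path: mark the counter as errored instead of disabled. */
static void toy_cpu_offline(struct toy_counter *c)
{
	c->state = TOY_STATE_ERROR;
}

/* Stand-in for the enable ioctl: refuses to revive an errored counter. */
static int toy_enable(struct toy_counter *c)
{
	if (c->state == TOY_STATE_ERROR)
		return -EIO;	/* placeholder error code, an assumption */
	if (c->state == TOY_STATE_OFF)
		c->state = TOY_STATE_INACTIVE;
	return 0;
}

/* Stand-in for read(): reports the error instead of a stale count. */
static int toy_read(const struct toy_counter *c, unsigned long long *value)
{
	if (c->state == TOY_STATE_ERROR)
		return -EIO;	/* placeholder error code, an assumption */
	*value = c->count;
	return 0;
}
```

With this shape, a tool that does start/stop after the CPU comes back
gets an explicit error rather than a counter that silently fails to
multiplex.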

