linux-kernel.vger.kernel.org archive mirror
* perf: race with automatic rdpmc() disabling
@ 2017-03-13 13:58 Vince Weaver
  2017-03-13 16:44 ` Andy Lutomirski
  0 siblings, 1 reply; 8+ messages in thread
From: Vince Weaver @ 2017-03-13 13:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: luto, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo

Hello

I've been trying to track this issue down for a few days and haven't been 
able to isolate it.  So maybe someone who understands low-level perf mmap
reference counting can help here.

As you might recall, 7911d3f7af14a614617e38245fedf98a724e46a9
introduced automatic disabling of userspace rdpmc when no perf_events 
were running.

I've run into a problem with PAPI when using rdpmc.  If you have PAPI
measuring events in multiple pthread threads, sometimes (but not always)
the program will GPF because CR4/rdpmc gets turned off while events are 
still active.
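
For readers unfamiliar with the fast path involved: userspace reads a
counter through the mmap'd perf page with a seqlock-style retry loop,
and executes RDPMC only when the page says it may.  The sketch below
mocks that loop so it runs anywhere (the mock_* names and hard-coded
counter value are mine, not kernel or PAPI code); when cap_user_rdpmc
is clear, a correct reader falls back to read(2) instead of faulting:

```c
#include <stdint.h>

/* Minimal mock of the perf_event_mmap_page fields the rdpmc fast path
 * uses; the real layout lives in <linux/perf_event.h>. */
struct mock_mmap_page {
    uint32_t lock;           /* seqlock generation counter */
    uint32_t index;          /* hardware counter index + 1; 0 = unusable */
    int64_t  offset;         /* kernel-maintained base to add to the PMC */
    uint8_t  cap_user_rdpmc; /* set when userspace rdpmc is allowed */
};

/* Stand-in for the RDPMC instruction so the sketch runs anywhere. */
static uint64_t mock_rdpmc(uint32_t idx) { (void)idx; return 100; }

/* Lock-free read loop as documented for the perf mmap page: retry if
 * the kernel updated the page mid-read.  Returns -1 when rdpmc is not
 * usable and the caller must fall back to read(2). */
static int64_t mock_read_counter(volatile struct mock_mmap_page *pg)
{
    uint32_t seq, idx;
    int64_t count;

    do {
        seq = pg->lock;
        __sync_synchronize();           /* rmb() */
        idx = pg->index;
        count = pg->offset;
        if (!pg->cap_user_rdpmc || !idx)
            return -1;
        count += (int64_t)mock_rdpmc(idx - 1);
        __sync_synchronize();           /* rmb() */
    } while (pg->lock != seq);
    return count;
}
```

The crash described above corresponds to executing RDPMC while CR4.PCE
is clear on that CPU, which the reader cannot detect as long as the
mmap page still advertises cap_user_rdpmc.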

I've been trying to put together a reproducible test case but haven't 
been able to manage it.  I have a PAPI test that shows the problem about 
50% of the time, but I can't seem to isolate the cause.

Any ideas?

If you really want to try to reproduce it, get the current git version of 
PAPI
	git clone https://bitbucket.org/icl/papi.git
edit src/components/perf_event/perf_event.c 
	so that #define PERF_USE_RDPMC 1
in src, run ./configure and then make.
Then run the ./ctests/zero_pthreads test a few times.  It will GPF, and I'm
relatively (though not entirely) sure it's not a PAPI issue.  
The problem does go away if you set /sys/devices/cpu/rdpmc to 2

Vince

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: perf: race with automatic rdpmc() disabling
  2017-03-13 13:58 perf: race with automatic rdpmc() disabling Vince Weaver
@ 2017-03-13 16:44 ` Andy Lutomirski
  2017-03-13 16:55   ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2017-03-13 16:44 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, Mar 13, 2017 at 6:58 AM, Vince Weaver <vincent.weaver@maine.edu> wrote:
> Hello
>
> I've been trying to track this issue down for a few days and haven't been
> able to isolate it.  So maybe someone who understands low-level perf mmap
> reference counting can help here.
>
> As you might recall, 7911d3f7af14a614617e38245fedf98a724e46a9
> introduced automatic disabling of userspace rdpmc when no perf_events
> were running.
>
> I've run into a problem with PAPI when using rdpmc.  If you have PAPI
> measuring events in multiple pthread threads, sometimes (but not always)
> the program will GPF because CR4/rdpmc gets turned off while events are
> still active.
>
> I've been trying to put together a reproducible test case but haven't
> been able to manage it.  I have a PAPI test that shows the problem about
> 50% of the time, but I can't seem to isolate the cause.
>
> Any ideas?
>
> If you really want to try to reproduce it, get the current git version of
> PAPI
>         git clone https://bitbucket.org/icl/papi.git
> edit src/components/perf_event/perf_event.c
>         so that #define PERF_USE_RDPMC 1
> in src, run ./configure and then make.
> Then run the ./ctests/zero_pthreads test a few times.  It will GPF, and I'm
> relatively (though not entirely) sure it's not a PAPI issue.
> The problem does go away if you set /sys/devices/cpu/rdpmc to 2

Hmm

static void x86_pmu_event_mapped(struct perf_event *event)
{
    if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
        return;

    if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)

<-- thread 1 stalls here

        on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
}

Suppose you start with perf_rdpmc_allowed == 0.  Thread 1 runs
x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
marked.  Then thread 2 runs the whole function, does *not* update CR4,
returns to userspace, and GPFs.

The big hammer solution is to stick a per-mm mutex around it.  Let me
ponder whether a smaller hammer is available.
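
The window can be made concrete with a userspace toy model (all names
are mine, and the usleep stands in for the preemption described above):
thread 1 performs the 0 -> 1 increment but stalls before the broadcast,
thread 2 increments to 2 and skips the broadcast, then observes the
"CR4" flag still clear.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

/* Toy model of the suspected window.  perf_rdpmc_allowed models the
 * per-mm count; cr4_pce models the per-CPU CR4.PCE bit, set only by
 * the thread that sees the 0 -> 1 transition. */
static atomic_int perf_rdpmc_allowed;
static atomic_int cr4_pce;

static void mock_event_mapped(int stall)
{
    if (atomic_fetch_add(&perf_rdpmc_allowed, 1) + 1 == 1) {
        if (stall)
            usleep(100 * 1000);     /* thread 1 "preempted" here */
        atomic_store(&cr4_pce, 1);  /* models on_each_cpu_mask() */
    }
    /* else: count already nonzero, broadcast skipped -- even though
     * it may not have happened yet. */
}

static void *mapper(void *arg)
{
    (void)arg;
    mock_event_mapped(1);
    return NULL;
}

/* Returns the CR4.PCE value thread 2 would rely on when it executes
 * RDPMC right after mapping: 0 here is the reported fault. */
static int run_race_demo(void)
{
    atomic_store(&perf_rdpmc_allowed, 0);
    atomic_store(&cr4_pce, 0);

    pthread_t t1;
    pthread_create(&t1, NULL, mapper, NULL);
    while (atomic_load(&perf_rdpmc_allowed) == 0)
        ;                           /* wait for thread 1's increment */
    mock_event_mapped(0);           /* thread 2: sees 2, skips the IPI */
    int seen = atomic_load(&cr4_pce);
    pthread_join(t1, NULL);
    return seen;
}
```

Compiled with -pthread, run_race_demo() returns 0 under the injected
stall, i.e. thread 2 would execute RDPMC with the bit still clear.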


* Re: perf: race with automatic rdpmc() disabling
  2017-03-13 16:44 ` Andy Lutomirski
@ 2017-03-13 16:55   ` Peter Zijlstra
  2017-03-13 21:05     ` Andy Lutomirski
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2017-03-13 16:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Vince Weaver, linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, Mar 13, 2017 at 09:44:02AM -0700, Andy Lutomirski wrote:
> static void x86_pmu_event_mapped(struct perf_event *event)
> {
>     if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>         return;
> 
>     if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
> 
> <-- thread 1 stalls here
> 
>         on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
> }
> 
> Suppose you start with perf_rdpmc_allowed == 0.  Thread 1 runs
> x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
> marked.  Then thread 2 runs the whole function, does *not* update CR4,
> returns to userspace, and GPFs.
> 
> The big hammer solution is to stick a per-mm mutex around it.  Let me
> ponder whether a smaller hammer is available.

Reminds me a bit of what we ended up with in kernel/jump_label.c:static_key_slow_inc().
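
For comparison, a sketch of that static_key_slow_inc() shape applied to
this counter (illustrative only, not the actual kernel code): the
0 -> 1 transition and its broadcast happen under a mutex, and the
lock-free path only increments a count that is already nonzero, so no
caller can return before the broadcast has completed.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int rdpmc_count;
static atomic_int cr4_pce;
static pthread_mutex_t rdpmc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Userspace stand-in for the kernel's atomic_inc_not_zero(). */
static int atomic_inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return 1;
    }
    return 0;
}

static void mapped_fixed(void)
{
    /* Fast path: a nonzero count implies the broadcast already ran,
     * because 0 -> 1 is only published after the broadcast below. */
    if (atomic_inc_not_zero(&rdpmc_count))
        return;

    pthread_mutex_lock(&rdpmc_lock);
    if (atomic_load(&rdpmc_count) == 0) {
        atomic_store(&cr4_pce, 1);      /* models on_each_cpu_mask() */
        atomic_store(&rdpmc_count, 1);  /* publish after the broadcast */
    } else {
        atomic_fetch_add(&rdpmc_count, 1);
    }
    pthread_mutex_unlock(&rdpmc_lock);
}
```

A second thread either sees the count nonzero (broadcast done) or
serializes behind the mutex holder, closing the window above.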


* Re: perf: race with automatic rdpmc() disabling
  2017-03-13 16:55   ` Peter Zijlstra
@ 2017-03-13 21:05     ` Andy Lutomirski
  2017-03-14 15:58       ` Andy Lutomirski
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2017-03-13 21:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Mon, Mar 13, 2017 at 9:55 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Mar 13, 2017 at 09:44:02AM -0700, Andy Lutomirski wrote:
>> static void x86_pmu_event_mapped(struct perf_event *event)
>> {
>>     if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>>         return;
>>
>>     if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
>>
>> <-- thread 1 stalls here
>>
>>         on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
>> }
>>
>> Suppose you start with perf_rdpmc_allowed == 0.  Thread 1 runs
>> x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
>> marked.  Then thread 2 runs the whole function, does *not* update CR4,
>> returns to userspace, and GPFs.
>>
>> The big hammer solution is to stick a per-mm mutex around it.  Let me
>> ponder whether a smaller hammer is available.
>
> Reminds me a bit of what we ended up with in kernel/jump_label.c:static_key_slow_inc().
>
>

One thing I don't get: isn't mmap_sem held for write the whole time?


* Re: perf: race with automatic rdpmc() disabling
  2017-03-13 21:05     ` Andy Lutomirski
@ 2017-03-14 15:58       ` Andy Lutomirski
  2017-03-14 16:45         ` Vince Weaver
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2017-03-14 15:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Mon, Mar 13, 2017 at 2:05 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Mon, Mar 13, 2017 at 9:55 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Mon, Mar 13, 2017 at 09:44:02AM -0700, Andy Lutomirski wrote:
>>> static void x86_pmu_event_mapped(struct perf_event *event)
>>> {
>>>     if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>>>         return;
>>>
>>>     if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
>>>
>>> <-- thread 1 stalls here
>>>
>>>         on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
>>> }
>>>
>>> Suppose you start with perf_rdpmc_allowed == 0.  Thread 1 runs
>>> x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
>>> marked.  Then thread 2 runs the whole function, does *not* update CR4,
>>> returns to userspace, and GPFs.
>>>
>>> The big hammer solution is to stick a per-mm mutex around it.  Let me
>>> ponder whether a smaller hammer is available.
>>
>> Reminds me a bit of what we ended up with in kernel/jump_label.c:static_key_slow_inc().
>>
>>
>
> One thing I don't get: isn't mmap_sem held for write the whole time?

mmap_sem is indeed held, so my theory is wrong.  I can reproduce it,
but I don't see the bug yet...

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: perf: race with automatic rdpmc() disabling
  2017-03-14 15:58       ` Andy Lutomirski
@ 2017-03-14 16:45         ` Vince Weaver
  2017-03-15 20:16           ` Andy Lutomirski
  0 siblings, 1 reply; 8+ messages in thread
From: Vince Weaver @ 2017-03-14 16:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Tue, 14 Mar 2017, Andy Lutomirski wrote:

> On Mon, Mar 13, 2017 at 2:05 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > On Mon, Mar 13, 2017 at 9:55 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> On Mon, Mar 13, 2017 at 09:44:02AM -0700, Andy Lutomirski wrote:
> >>> static void x86_pmu_event_mapped(struct perf_event *event)
> >>> {
> >>>     if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
> >>>         return;
> >>>
> >>>     if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
> >>>
> >>> <-- thread 1 stalls here
> >>>
> >>>         on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
> >>> }
> >>>
> >>> Suppose you start with perf_rdpmc_allowed == 0.  Thread 1 runs
> >>> x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
> >>> marked.  Then thread 2 runs the whole function, does *not* update CR4,
> >>> returns to userspace, and GPFs.
> >>>
> >>> The big hammer solution is to stick a per-mm mutex around it.  Let me
> >>> ponder whether a smaller hammer is available.
> >>
> >> Reminds me a bit of what we ended up with in kernel/jump_label.c:static_key_slow_inc().
> >>
> >>
> >
> > One thing I don't get: isn't mmap_sem held for write the whole time?
> 
> mmap_sem is indeed held, so my theory is wrong.  I can reproduce it,
> but I don't see the bug yet...

It could still be a PAPI bug, as I'm having absolutely no luck trying to 
come up with a plain perf_event reproducer.

Let me dig through the PAPI code again and make sure I'm not missing 
something.

Vince


* Re: perf: race with automatic rdpmc() disabling
  2017-03-14 16:45         ` Vince Weaver
@ 2017-03-15 20:16           ` Andy Lutomirski
  2017-03-16  5:35             ` Vince Weaver
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2017-03-15 20:16 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Andy Lutomirski, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Tue, Mar 14, 2017 at 9:45 AM, Vince Weaver <vincent.weaver@maine.edu> wrote:
> On Tue, 14 Mar 2017, Andy Lutomirski wrote:
>
>> On Mon, Mar 13, 2017 at 2:05 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> > On Mon, Mar 13, 2017 at 9:55 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> >> On Mon, Mar 13, 2017 at 09:44:02AM -0700, Andy Lutomirski wrote:
>> >>> static void x86_pmu_event_mapped(struct perf_event *event)
>> >>> {
>> >>>     if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>> >>>         return;
>> >>>
>> >>>     if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
>> >>>
>> >>> <-- thread 1 stalls here
>> >>>
>> >>>         on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
>> >>> }
>> >>>
>> >>> Suppose you start with perf_rdpmc_allowed == 0.  Thread 1 runs
>> >>> x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
>> >>> marked.  Then thread 2 runs the whole function, does *not* update CR4,
>> >>> returns to userspace, and GPFs.
>> >>>
>> >>> The big hammer solution is to stick a per-mm mutex around it.  Let me
>> >>> ponder whether a smaller hammer is available.
>> >>
>> >> Reminds me a bit of what we ended up with in kernel/jump_label.c:static_key_slow_inc().
>> >>
>> >>
>> >
>> > One thing I don't get: isn't mmap_sem held for write the whole time?
>>
>> mmap_sem is indeed held, so my theory is wrong.  I can reproduce it,
>> but I don't see the bug yet...
>
> It could still be a PAPI bug, as I'm having absolutely no luck trying to
> come up with a plain perf_event reproducer.
>
> Let me dig through the PAPI code again and make sure I'm not missing
> something.

Can you give this a try:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=9edb8154863ba1a7f6f1f15ffe6aecf3cf32bf21

(The link doesn't work yet but it should in a minute or two.)

--Andy


* Re: perf: race with automatic rdpmc() disabling
  2017-03-15 20:16           ` Andy Lutomirski
@ 2017-03-16  5:35             ` Vince Weaver
  0 siblings, 0 replies; 8+ messages in thread
From: Vince Weaver @ 2017-03-16  5:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Wed, 15 Mar 2017, Andy Lutomirski wrote:

> Can you give this a try:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=9edb8154863ba1a7f6f1f15ffe6aecf3cf32bf21
> 
> (The link doesn't work yet but it should in a minute or two.)

I've tested it and I am unable to reproduce the problem with the patch 
applied.

Vince

