* WARN_ON_ONCE() hit at kernel/events/core.c:330
@ 2019-04-03  9:47 Thomas-Mich Richter
  2019-04-03 10:41 ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas-Mich Richter @ 2019-04-03  9:47 UTC (permalink / raw)
  To: peterz; +Cc: acme, Linux Kernel Mailing List

I use linux 5.1.0-rc3 on s390 and got this WARN_ON_ONCE message:

WARNING: CPU: 15 PID: 0 at kernel/events/core.c:330
                event_function_local.constprop.79+0xe2/0xe8

which was introduced with
   commit cca2094605ef ("perf/core: Fix event_function_local()").

This is the WARN_ON_ONCE message, which sometimes shows up in the kernel
log:
 [ 4598.316519] WARNING: CPU: 15 PID: 0 at kernel/events/core.c:330
                event_function_local.constprop.79+0xe2/0xe8
 [ 4598.316524] Kernel panic - not syncing: panic_on_warn set ...
 [ 4598.316527] CPU: 15 PID: 0 Comm: swapper/15 Tainted: G           OE
                5.1.0-20190402.rc3.git0.5e7a8ca31926.300.fc29.s390x+git #1
 [ 4598.316529] Hardware name: IBM 2964 NC9 712 (LPAR)
 [ 4598.316531] Call Trace:
 [ 4598.316534] ([<0000000000112eb8>] show_stack+0x58/0x70)
 [ 4598.316538]  [<0000000000a820aa>] dump_stack+0x7a/0xa8
 [ 4598.316541]  [<0000000000143b52>] panic+0x11a/0x2d0
 [ 4598.316543]  [<00000000001439b0>] __warn+0xf8/0x118
 [ 4598.316545]  [<0000000000a811a8>] report_bug+0xd8/0x150
 [ 4598.316547]  [<00000000001014ac>] do_report_trap+0xc4/0xe0
 [ 4598.316549]  [<0000000000101680>] illegal_op+0x138/0x150
 [ 4598.316552]  [<0000000000aa233c>] pgm_check_handler+0x1cc/0x220
 [ 4598.316554]  [<00000000002a6d02>] event_function_local.constprop.79+
                                                0xe2/0xe8
 [ 4598.316556] ([<00000000002a6c7a>] event_function_local.constprop.79+
                                                0x5a/0xe8)
 [ 4598.316559]  [<00000000002aa400>] perf_pending_event+0x88/0xb0
 [ 4598.316561]  [<0000000000271a8c>] irq_work_run_list+0x8c/0xb8
 [ 4598.316563]  [<0000000000271d78>] irq_work_tick+0x48/0x68
 [ 4598.316566]  [<00000000001d6058>] update_process_times+0x68/0x80
 [ 4598.316568]  [<00000000001e6c10>] tick_sched_handle.isra.6+0x50/0x60
 [ 4598.316570]  [<00000000001e6c7e>] tick_sched_timer+0x5e/0xb8
 [ 4598.316573]  [<00000000001d6b6a>] __hrtimer_run_queues+0x10a/0x2c0
 [ 4598.316575]  [<00000000001d7a88>] hrtimer_interrupt+0x138/0x2a8
 [ 4598.316577]  [<000000000010c3e4>] do_IRQ+0xac/0xb0
 [ 4598.316597]  [<0000000000aa2744>] ext_int_handler+0x128/0x12c
 [ 4598.316600]  [<00000000001034f6>] enabled_wait+0x46/0xd0
 [ 4598.316602] ([<000003e000d2fe10>] 0x3e000d2fe10)
 [ 4598.316604]  [<0000000000103842>] arch_cpu_idle+0x3a/0x50
 [ 4598.316606]  [<000000000017b808>] do_idle+0x180/0x1b0
 [ 4598.316608]  [<000000000017ba06>] cpu_startup_entry+0x36/0x40
 [ 4598.316611]  [<0000000000115d16>] smp_init_secondary+0xd6/0xf0
 [ 4598.316613]  [<000000000011521e>] smp_start_secondary+0x86/0x98

Any ideas or hints on how to avoid/fix this warning?

I ran a user space program which uses

  1. perf_event_open() system call with type HARDWARE
     and PERF_COUNT_HW_CPU_CYCLES in sampling mode.
  2. Assigns signal SIGIO to the file descriptor returned from
     perf_event_open() using:
       rc = fcntl(fd, F_SETFL, O_RDWR | O_NONBLOCK | O_ASYNC);
       rc |= fcntl(fd, F_SETSIG, signo);
       rc |= fcntl(fd, F_SETOWN, getpid());

  3. The signal handler increments some variables and issues
       ioctl(fd, PERF_EVENT_IOC_REFRESH, 1);
     to restart signal delivery.

I can send you the test program if needed.

Thanks a lot for your help.


-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Chairman of the Supervisory Board: Matthias Hartmann
Management: Dirk Wittkopp
Registered office: Böblingen / Court of registration: Amtsgericht Stuttgart, HRB 243294


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-03  9:47 WARN_ON_ONCE() hit at kernel/events/core.c:330 Thomas-Mich Richter
@ 2019-04-03 10:41 ` Peter Zijlstra
  2019-04-03 11:26   ` Thomas-Mich Richter
  2019-04-04  9:15   ` Thomas-Mich Richter
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2019-04-03 10:41 UTC (permalink / raw)
  To: Thomas-Mich Richter; +Cc: acme, Linux Kernel Mailing List

On Wed, Apr 03, 2019 at 11:47:00AM +0200, Thomas-Mich Richter wrote:
> I use linux 5.1.0-rc3 on s390 and got this WARN_ON_ONCE message:
> 
> WARNING: CPU: 15 PID: 0 at kernel/events/core.c:330
>                 event_function_local.constprop.79+0xe2/0xe8
> 
> which was introduced with
>    commit cca2094605ef ("perf/core: Fix event_function_local()").
> 
> This is the WARN_ON_ONCE message, which sometimes shows up in the kernel
> log:
>  [ 4598.316519] WARNING: CPU: 15 PID: 0 at kernel/events/core.c:330
>                 event_function_local.constprop.79+0xe2/0xe8
>  [ 4598.316524] Kernel panic - not syncing: panic_on_warn set ...
>  [ 4598.316527] CPU: 15 PID: 0 Comm: swapper/15 Tainted: G           OE
>                 5.1.0-20190402.rc3.git0.5e7a8ca31926.300.fc29.s390x+git #1
>  [ 4598.316529] Hardware name: IBM 2964 NC9 712 (LPAR)
>  [ 4598.316531] Call Trace:
>  [ 4598.316534] ([<0000000000112eb8>] show_stack+0x58/0x70)
>  [ 4598.316538]  [<0000000000a820aa>] dump_stack+0x7a/0xa8
>  [ 4598.316541]  [<0000000000143b52>] panic+0x11a/0x2d0
>  [ 4598.316543]  [<00000000001439b0>] __warn+0xf8/0x118
>  [ 4598.316545]  [<0000000000a811a8>] report_bug+0xd8/0x150
>  [ 4598.316547]  [<00000000001014ac>] do_report_trap+0xc4/0xe0
>  [ 4598.316549]  [<0000000000101680>] illegal_op+0x138/0x150
>  [ 4598.316552]  [<0000000000aa233c>] pgm_check_handler+0x1cc/0x220
>  [ 4598.316554]  [<00000000002a6d02>] event_function_local.constprop.79+
>                                                 0xe2/0xe8
>  [ 4598.316556] ([<00000000002a6c7a>] event_function_local.constprop.79+
>                                                 0x5a/0xe8)
>  [ 4598.316559]  [<00000000002aa400>] perf_pending_event+0x88/0xb0
>  [ 4598.316561]  [<0000000000271a8c>] irq_work_run_list+0x8c/0xb8
>  [ 4598.316563]  [<0000000000271d78>] irq_work_tick+0x48/0x68
>  [ 4598.316566]  [<00000000001d6058>] update_process_times+0x68/0x80
>  [ 4598.316568]  [<00000000001e6c10>] tick_sched_handle.isra.6+0x50/0x60
>  [ 4598.316570]  [<00000000001e6c7e>] tick_sched_timer+0x5e/0xb8
>  [ 4598.316573]  [<00000000001d6b6a>] __hrtimer_run_queues+0x10a/0x2c0
>  [ 4598.316575]  [<00000000001d7a88>] hrtimer_interrupt+0x138/0x2a8
>  [ 4598.316577]  [<000000000010c3e4>] do_IRQ+0xac/0xb0
>  [ 4598.316597]  [<0000000000aa2744>] ext_int_handler+0x128/0x12c
>  [ 4598.316600]  [<00000000001034f6>] enabled_wait+0x46/0xd0
>  [ 4598.316602] ([<000003e000d2fe10>] 0x3e000d2fe10)
>  [ 4598.316604]  [<0000000000103842>] arch_cpu_idle+0x3a/0x50
>  [ 4598.316606]  [<000000000017b808>] do_idle+0x180/0x1b0
>  [ 4598.316608]  [<000000000017ba06>] cpu_startup_entry+0x36/0x40
>  [ 4598.316611]  [<0000000000115d16>] smp_init_secondary+0xd6/0xf0
>  [ 4598.316613]  [<000000000011521e>] smp_start_secondary+0x86/0x98
> 
> Any ideas or hints on how to avoid/fix this warning?

Some thoughts here:

  https://lkml.kernel.org/r/20190213101644.GN32534@hirez.programming.kicks-ass.net

tl;dr, I've no frigging clue.

> I ran a user space program which uses
> 
>   1. perf_event_open() system call with type HARDWARE
>      and PERF_COUNT_HW_CPU_CYCLES in sampling mode.
>   2. Assigns signal SIGIO to the file descriptor returned from
>      perf_event_open() using:
>        rc = fcntl(fd, F_SETFL, O_RDWR | O_NONBLOCK | O_ASYNC);
>        rc |= fcntl(fd, F_SETSIG, signo);
>        rc |= fcntl(fd, F_SETOWN, getpid());
> 
>   3. The signal handler increments some variables and issues
>        ioctl(fd, PERF_EVENT_IOC_REFRESH, 1);
>      to restart signal delivery.
> 
> I can send you the test program if needed.

Does it reproduce on x86 without virt on? I don't have a PPC LPAR to
test things on.


* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-03 10:41 ` Peter Zijlstra
@ 2019-04-03 11:26   ` Thomas-Mich Richter
  2019-04-04  9:15   ` Thomas-Mich Richter
  1 sibling, 0 replies; 22+ messages in thread
From: Thomas-Mich Richter @ 2019-04-03 11:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, Linux Kernel Mailing List, Hendrik Brueckner, Heiko Carstens

On 4/3/19 12:41 PM, Peter Zijlstra wrote:
> On Wed, Apr 03, 2019 at 11:47:00AM +0200, Thomas-Mich Richter wrote:
>> I use linux 5.1.0-rc3 on s390 and got this WARN_ON_ONCE message:
>>
>> WARNING: CPU: 15 PID: 0 at kernel/events/core.c:330
>>                 event_function_local.constprop.79+0xe2/0xe8
>>
>> which was introduced with
>>    commit cca2094605ef ("perf/core: Fix event_function_local()").
>>
>> This is the WARN_ON_ONCE message, which sometimes shows up in the kernel
>> log:
>>  [ 4598.316519] WARNING: CPU: 15 PID: 0 at kernel/events/core.c:330
>>                 event_function_local.constprop.79+0xe2/0xe8
>>  [ 4598.316524] Kernel panic - not syncing: panic_on_warn set ...
>>  [ 4598.316527] CPU: 15 PID: 0 Comm: swapper/15 Tainted: G           OE
>>                 5.1.0-20190402.rc3.git0.5e7a8ca31926.300.fc29.s390x+git #1
>>  [ 4598.316529] Hardware name: IBM 2964 NC9 712 (LPAR)
>>  [ 4598.316531] Call Trace:
>>  [ 4598.316534] ([<0000000000112eb8>] show_stack+0x58/0x70)
>>  [ 4598.316538]  [<0000000000a820aa>] dump_stack+0x7a/0xa8
>>  [ 4598.316541]  [<0000000000143b52>] panic+0x11a/0x2d0
>>  [ 4598.316543]  [<00000000001439b0>] __warn+0xf8/0x118
>>  [ 4598.316545]  [<0000000000a811a8>] report_bug+0xd8/0x150
>>  [ 4598.316547]  [<00000000001014ac>] do_report_trap+0xc4/0xe0
>>  [ 4598.316549]  [<0000000000101680>] illegal_op+0x138/0x150
>>  [ 4598.316552]  [<0000000000aa233c>] pgm_check_handler+0x1cc/0x220
>>  [ 4598.316554]  [<00000000002a6d02>] event_function_local.constprop.79+
>>                                                 0xe2/0xe8
>>  [ 4598.316556] ([<00000000002a6c7a>] event_function_local.constprop.79+
>>                                                 0x5a/0xe8)
>>  [ 4598.316559]  [<00000000002aa400>] perf_pending_event+0x88/0xb0
>>  [ 4598.316561]  [<0000000000271a8c>] irq_work_run_list+0x8c/0xb8
>>  [ 4598.316563]  [<0000000000271d78>] irq_work_tick+0x48/0x68
>>  [ 4598.316566]  [<00000000001d6058>] update_process_times+0x68/0x80
>>  [ 4598.316568]  [<00000000001e6c10>] tick_sched_handle.isra.6+0x50/0x60
>>  [ 4598.316570]  [<00000000001e6c7e>] tick_sched_timer+0x5e/0xb8
>>  [ 4598.316573]  [<00000000001d6b6a>] __hrtimer_run_queues+0x10a/0x2c0
>>  [ 4598.316575]  [<00000000001d7a88>] hrtimer_interrupt+0x138/0x2a8
>>  [ 4598.316577]  [<000000000010c3e4>] do_IRQ+0xac/0xb0
>>  [ 4598.316597]  [<0000000000aa2744>] ext_int_handler+0x128/0x12c
>>  [ 4598.316600]  [<00000000001034f6>] enabled_wait+0x46/0xd0
>>  [ 4598.316602] ([<000003e000d2fe10>] 0x3e000d2fe10)
>>  [ 4598.316604]  [<0000000000103842>] arch_cpu_idle+0x3a/0x50
>>  [ 4598.316606]  [<000000000017b808>] do_idle+0x180/0x1b0
>>  [ 4598.316608]  [<000000000017ba06>] cpu_startup_entry+0x36/0x40
>>  [ 4598.316611]  [<0000000000115d16>] smp_init_secondary+0xd6/0xf0
>>  [ 4598.316613]  [<000000000011521e>] smp_start_secondary+0x86/0x98
>>
>> Any ideas or hints on how to avoid/fix this warning?
> 
> Some thoughts here:
> 
>   https://lkml.kernel.org/r/20190213101644.GN32534@hirez.programming.kicks-ass.net
> 
> tl;dr, I've no frigging clue.
> 
>> I ran a user space program which uses
>>
>>   1. perf_event_open() system call with type HARDWARE
>>      and PERF_COUNT_HW_CPU_CYCLES in sampling mode.
>>   2. Assigns signal SIGIO to the file descriptor returned from
>>      perf_event_open() using:
>>        rc = fcntl(fd, F_SETFL, O_RDWR | O_NONBLOCK | O_ASYNC);
>>        rc |= fcntl(fd, F_SETSIG, signo);
>>        rc |= fcntl(fd, F_SETOWN, getpid());
>>
>>   3. The signal handler increments some variables and issues
>>        ioctl(fd, PERF_EVENT_IOC_REFRESH, 1);
>>      to restart signal delivery.
>>
>> I can send you the test program if needed.
> 
> Does it reproduce on x86 without virt on? I don't have a PPC LPAR to
> test things on.
> 

It happens on an s390 mainframe (not PowerPC) without KVM or z/VM virtualisation.
I'll try to repro it on x86.

PS: Put my team members Heiko and Hendrik on cc.



* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-03 10:41 ` Peter Zijlstra
  2019-04-03 11:26   ` Thomas-Mich Richter
@ 2019-04-04  9:15   ` Thomas-Mich Richter
  2019-04-04 11:09     ` Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Thomas-Mich Richter @ 2019-04-04  9:15 UTC (permalink / raw)
  To: Peter Zijlstra, Kees Cook
  Cc: acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky

On 4/3/19 12:41 PM, Peter Zijlstra wrote:
> On Wed, Apr 03, 2019 at 11:47:00AM +0200, Thomas-Mich Richter wrote:
>> I use linux 5.1.0-rc3 on s390 and got this WARN_ON_ONCE message:
>>
>> WARNING: CPU: 15 PID: 0 at kernel/events/core.c:330
>>                 event_function_local.constprop.79+0xe2/0xe8
>>
>> which was introduced with
>>    commit cca2094605ef ("perf/core: Fix event_function_local()").
>> ..snip.... 
>>
>> Any ideas or hints on how to avoid/fix this warning?
> 
> Some thoughts here:
> 
>   https://lkml.kernel.org/r/20190213101644.GN32534@hirez.programming.kicks-ass.net
> 
> tl;dr, I've no frigging clue.
> 

I have read this thread and at the end you mentioned:

    Humm, but in that case:

   context_switch()
    prepare_task_switch()
      perf_event_task_sched_out()
        __perf_event_task_sched_out()
	  perf_event_context_sched_out()
	    task_ctx_sched_out()
	      ctx_sched_out()
	        group_sched_out()
		  event_sched_out()
		    if (event->pending_disable)

   Would have already cleared the pending_disable state, so the IPI would
   not have ran perf_event_disable_local() in the first place.

Our test system is configured to panic on WARN_ON_ONCE() (panic_on_warn is
set). I looked at the dump. The event that triggered the WARN_ON_ONCE:

crash> struct perf_event.oncpu 0x1f9b24800
  oncpu = 0
crash> struct perf_event.state 0x1f9b24800
  state = PERF_EVENT_STATE_ACTIVE
crash> 

This means the code in 
static void event_sched_out(....)
{
        ....
        event->pmu->del(event, 0);
        event->oncpu = -1;

        if (event->pending_disable) {
                event->pending_disable = 0;
                state = PERF_EVENT_STATE_OFF;
        }
        perf_event_set_state(event, state);
        ...
}

had not yet finished and returned from this function. So the task had not been
completely context-switched out of CPU 0 while the interrupt handler was
executing on CPU 15:

static void perf_pending_event(...)
{
        ....
        if (event->pending_disable) {
                event->pending_disable = 0;
                perf_event_disable_local(event);  <--- Causes the WARN_ON_ONCE()
        }
        .....
}

I think there is a race, especially when the interrupt handler on CPU 15
was invoked via timer interrupt and runs on a different CPU.

> 
> Does it reproduce on x86 without virt on? I don't have a PPC LPAR to
> test things on.
> 

s390 LPARs always run under hypervisor control; there is no way to run an OS without it.




* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-04  9:15   ` Thomas-Mich Richter
@ 2019-04-04 11:09     ` Peter Zijlstra
  2019-04-04 12:02       ` Peter Zijlstra
  2019-04-04 13:03       ` Peter Zijlstra
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2019-04-04 11:09 UTC (permalink / raw)
  To: Thomas-Mich Richter
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky

On Thu, Apr 04, 2019 at 11:15:39AM +0200, Thomas-Mich Richter wrote:
> On 4/3/19 12:41 PM, Peter Zijlstra wrote:
> > On Wed, Apr 03, 2019 at 11:47:00AM +0200, Thomas-Mich Richter wrote:
> >> I use linux 5.1.0-rc3 on s390 and got this WARN_ON_ONCE message:
> >>
> >> WARNING: CPU: 15 PID: 0 at kernel/events/core.c:330
> >>                 event_function_local.constprop.79+0xe2/0xe8
> >>
> >> which was introduced with
> >>    commit cca2094605ef ("perf/core: Fix event_function_local()").
> >> ..snip.... 
> >>
> >> Any ideas or hints on how to avoid/fix this warning?
> > 
> > Some thoughts here:
> > 
> >   https://lkml.kernel.org/r/20190213101644.GN32534@hirez.programming.kicks-ass.net
> > 
> > tl;dr, I've no frigging clue.
> > 
> 
> I have read this thread and at the end you mentioned:
> 
>     Humm, but in that case:
> 
>    context_switch()
>     prepare_task_switch()
>       perf_event_task_sched_out()
>         __perf_event_task_sched_out()
> 	  perf_event_context_sched_out()
> 	    task_ctx_sched_out()
> 	      ctx_sched_out()
> 	        group_sched_out()
> 		  event_sched_out()
> 		    if (event->pending_disable)
> 
>    Would have already cleared the pending_disable state, so the IPI would
>    not have ran perf_event_disable_local() in the first place.
> 
> Our test system is configured to panic in WARN_ON_ONCE(). I looked
> at the dump. The event triggering WARN_ON_ONCE:
> 
> crash> struct perf_event.oncpu 0x1f9b24800
>   oncpu = 0
> crash> struct perf_event.state 0x1f9b24800
>   state = PERF_EVENT_STATE_ACTIVE
> crash> 
> 
> This means the code in 
> static void event_sched_out(....)
> {
>         ....
>         event->pmu->del(event, 0);
>         event->oncpu = -1;
> 
>         if (event->pending_disable) {
>                 event->pending_disable = 0;
>                 state = PERF_EVENT_STATE_OFF;
>         }
>         perf_event_set_state(event, state);
>         ...
> }
> 
> has not finished and returned from this function. So the task was not completely context-switched
> out from CPU 0 while the interrupt handler was executing on CPU 15:
> 
> static void perf_pending_event(...)
> {
>         ....
>         if (event->pending_disable) {
>                 event->pending_disable = 0;
>                 perf_event_disable_local(event);  <--- Causes the WARN_ON_ONCE()
>         }
>         .....
> }
> 
> I think there is a race, especially when the interrupt handler on CPU 15
> was invoked via timer interrupt and runs on a different CPU.

That is not entirely the scenario I talked about, but *groan*.

So what I meant was:

	CPU-0							CPU-n

	__schedule()
	  local_irq_disable()

	  ...
	    deactivate_task(prev);

								try_to_wake_up(@p)
								  ...
								  smp_cond_load_acquire(&p->on_cpu, !VAL);

	  <PMI>
	    ..
	    perf_event_disable_inatomic()
	      event->pending_disable = 1;
	      irq_work_queue() /* self-IPI */
	  </PMI>

	  context_switch()
	    prepare_task_switch()
	      perf_event_task_sched_out()
	        // the above chain that clears pending_disable

	    finish_task_switch()
	      finish_task()
	        smp_store_release(prev->on_cpu, 0);
								  /* finally.... */
								// take woken
								// context_switch to @p
	      finish_lock_switch()
	        raw_spin_unlock_irq()
		/* w00t, IRQs enabled, self-IPI time */
	        <self-IPI>
		  perf_pending_event()
		    // event->pending_disable == 0
		</self-IPI>


What you're suggesting, is that the time between:

  smp_store_release(prev->on_cpu, 0);

and

  <self-IPI>

on CPU-0 is sufficient for CPU-n to context switch to the task, enable
the event there, trigger a PMI that calls perf_event_disable_inatomic()
_again_ (this would mean irq_work_queue() failing, which we don't check)
(and schedule out again, although that's not required).

This being virt that might actually be possible if (v)CPU-0 takes a nap
I suppose.

Let me think about this a little more...

> > Does it reproduce on x86 without virt on? I don't have a PPC LPAR to
> > test things on.
> > 
> 
> s390 LPARs run under hypervisor control, no chance to run any OS without it.

Yah, I know.. Same difference though; I also don't have an s390.


* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-04 11:09     ` Peter Zijlstra
@ 2019-04-04 12:02       ` Peter Zijlstra
  2019-04-04 12:13         ` Peter Zijlstra
  2019-04-04 13:03       ` Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2019-04-04 12:02 UTC (permalink / raw)
  To: Thomas-Mich Richter
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky

On Thu, Apr 04, 2019 at 01:09:09PM +0200, Peter Zijlstra wrote:
> That is not entirely the scenario I talked about, but *groan*.
> 
> So what I meant was:
> 
> 	CPU-0							CPU-n
> 
> 	__schedule()
> 	  local_irq_disable()
> 
> 	  ...
> 	    deactivate_task(prev);
> 
> 								try_to_wake_up(@p)
> 								  ...
> 								  smp_cond_load_acquire(&p->on_cpu, !VAL);
> 
> 	  <PMI>
> 	    ..
> 	    perf_event_disable_inatomic()
> 	      event->pending_disable = 1;
> 	      irq_work_queue() /* self-IPI */
> 	  </PMI>
> 
> 	  context_switch()
> 	    prepare_task_switch()
> 	      perf_event_task_sched_out()
> 	        // the above chain that clears pending_disable
> 
> 	    finish_task_switch()
> 	      finish_task()
> 	        smp_store_release(prev->on_cpu, 0);
> 								  /* finally.... */
> 								// take woken
> 								// context_switch to @p
> 	      finish_lock_switch()
> 	        raw_spin_unlock_irq()
> 		/* w00t, IRQs enabled, self-IPI time */
> 	        <self-IPI>
> 		  perf_pending_event()
> 		    // event->pending_disable == 0
> 		</self-IPI>
> 
> 
> What you're suggesting, is that the time between:
> 
>   smp_store_release(prev->on_cpu, 0);
> 
> and
> 
>   <self-IPI>
> 
> on CPU-0 is sufficient for CPU-n to context switch to the task, enable
> the event there, trigger a PMI that calls perf_event_disable_inatomic()
> _again_ (this would mean irq_work_queue() failing, which we don't check)
> (and schedule out again, although that's not required).
> 
> This being virt that might actually be possible if (v)CPU-0 takes a nap
> I suppose.
> 
> Let me think about this a little more...

Arghh... s390 doesn't implement arch_irq_work_raise(), which makes it
far far worse.

I have a hack that might've cured it, were it not for that. Let me think
more still..


* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-04 12:02       ` Peter Zijlstra
@ 2019-04-04 12:13         ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2019-04-04 12:13 UTC (permalink / raw)
  To: Thomas-Mich Richter
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky

On Thu, Apr 04, 2019 at 02:02:14PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 04, 2019 at 01:09:09PM +0200, Peter Zijlstra wrote:
> > That is not entirely the scenario I talked about, but *groan*.
> > 
> > So what I meant was:
> > 
> > 	CPU-0							CPU-n
> > 
> > 	__schedule()
> > 	  local_irq_disable()
> > 
> > 	  ...
> > 	    deactivate_task(prev);
> > 
> > 								try_to_wake_up(@p)
> > 								  ...
> > 								  smp_cond_load_acquire(&p->on_cpu, !VAL);
> > 
> > 	  <PMI>
> > 	    ..
> > 	    perf_event_disable_inatomic()
> > 	      event->pending_disable = 1;
> > 	      irq_work_queue() /* self-IPI */
> > 	  </PMI>
> > 
> > 	  context_switch()
> > 	    prepare_task_switch()
> > 	      perf_event_task_sched_out()
> > 	        // the above chain that clears pending_disable
> > 
> > 	    finish_task_switch()
> > 	      finish_task()
> > 	        smp_store_release(prev->on_cpu, 0);
> > 								  /* finally.... */
> > 								// take woken
> > 								// context_switch to @p
> > 	      finish_lock_switch()
> > 	        raw_spin_unlock_irq()
> > 		/* w00t, IRQs enabled, self-IPI time */
> > 	        <self-IPI>
> > 		  perf_pending_event()
> > 		    // event->pending_disable == 0
> > 		</self-IPI>
> > 
> > 
> > What you're suggesting, is that the time between:
> > 
> >   smp_store_release(prev->on_cpu, 0);
> > 
> > and
> > 
> >   <self-IPI>
> > 
> > on CPU-0 is sufficient for CPU-n to context switch to the task, enable
> > the event there, trigger a PMI that calls perf_event_disable_inatomic()
> > _again_ (this would mean irq_work_queue() failing, which we don't check)
> > (and schedule out again, although that's not required).
> > 
> > This being virt that might actually be possible if (v)CPU-0 takes a nap
> > I suppose.
> > 
> > Let me think about this a little more...
> 
> Arghh... s390 doesn't implement arch_irq_work_raise(), which makes it
> far far worse.
> 
> I have a hack that might've cured it, were it not for that. Let me think
> more still..

Could you educate me on the s390 PMU? AFAICT only the SF one has a
sampling interrupt (cpumf_measurement_alert); is that NMI-like or a
regular IRQ?


* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-04 11:09     ` Peter Zijlstra
  2019-04-04 12:02       ` Peter Zijlstra
@ 2019-04-04 13:03       ` Peter Zijlstra
  2019-04-04 13:21         ` Thomas-Mich Richter
                           ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Peter Zijlstra @ 2019-04-04 13:03 UTC (permalink / raw)
  To: Thomas-Mich Richter
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky

On Thu, Apr 04, 2019 at 01:09:09PM +0200, Peter Zijlstra wrote:

> That is not entirely the scenario I talked about, but *groan*.
> 
> So what I meant was:
> 
> 	CPU-0							CPU-n
> 
> 	__schedule()
> 	  local_irq_disable()
> 
> 	  ...
> 	    deactivate_task(prev);
> 
> 								try_to_wake_up(@p)
> 								  ...
> 								  smp_cond_load_acquire(&p->on_cpu, !VAL);
> 
> 	  <PMI>
> 	    ..
> 	    perf_event_disable_inatomic()
> 	      event->pending_disable = 1;
> 	      irq_work_queue() /* self-IPI */
> 	  </PMI>
> 
> 	  context_switch()
> 	    prepare_task_switch()
> 	      perf_event_task_sched_out()
> 	        // the above chain that clears pending_disable
> 
> 	    finish_task_switch()
> 	      finish_task()
> 	        smp_store_release(prev->on_cpu, 0);
> 								  /* finally.... */
> 								// take woken
> 								// context_switch to @p
> 	      finish_lock_switch()
> 	        raw_spin_unlock_irq()
> 		/* w00t, IRQs enabled, self-IPI time */
> 	        <self-IPI>
> 		  perf_pending_event()
> 		    // event->pending_disable == 0
> 		</self-IPI>
> 
> 
> What you're suggesting, is that the time between:
> 
>   smp_store_release(prev->on_cpu, 0);
> 
> and
> 
>   <self-IPI>
> 
> on CPU-0 is sufficient for CPU-n to context switch to the task, enable
> the event there, trigger a PMI that calls perf_event_disable_inatomic()
> _again_ (this would mean irq_work_queue() failing, which we don't check)
> (and schedule out again, although that's not required).
> 
> This being virt that might actually be possible if (v)CPU-0 takes a nap
> I suppose.
> 
> Let me think about this a little more...

Does the below cure things? It's not exactly pretty, but it could just
do the trick.

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dfc4bab0b02b..d496e6911442 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2009,8 +2009,8 @@ event_sched_out(struct perf_event *event,
 	event->pmu->del(event, 0);
 	event->oncpu = -1;
 
-	if (event->pending_disable) {
-		event->pending_disable = 0;
+	if (event->pending_disable == smp_processor_id()) {
+		event->pending_disable = -1;
 		state = PERF_EVENT_STATE_OFF;
 	}
 	perf_event_set_state(event, state);
@@ -2198,7 +2198,7 @@ EXPORT_SYMBOL_GPL(perf_event_disable);
 
 void perf_event_disable_inatomic(struct perf_event *event)
 {
-	event->pending_disable = 1;
+	event->pending_disable = smp_processor_id();
 	irq_work_queue(&event->pending);
 }
 
@@ -5822,8 +5822,8 @@ static void perf_pending_event(struct irq_work *entry)
 	 * and we won't recurse 'further'.
 	 */
 
-	if (event->pending_disable) {
-		event->pending_disable = 0;
+	if (event->pending_disable == smp_processor_id()) {
+		event->pending_disable = -1;
 		perf_event_disable_local(event);
 	}
 
@@ -10236,6 +10236,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 
 
 	init_waitqueue_head(&event->waitq);
+	event->pending_disable = -1;
 	init_irq_work(&event->pending, perf_pending_event);
 
 	mutex_init(&event->mmap_mutex);


* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-04 13:03       ` Peter Zijlstra
@ 2019-04-04 13:21         ` Thomas-Mich Richter
  2019-04-05 10:18         ` Thomas-Mich Richter
  2019-04-08  7:12         ` Thomas-Mich Richter
  2 siblings, 0 replies; 22+ messages in thread
From: Thomas-Mich Richter @ 2019-04-04 13:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky

On 4/4/19 3:03 PM, Peter Zijlstra wrote:
> On Thu, Apr 04, 2019 at 01:09:09PM +0200, Peter Zijlstra wrote:
> 
>> That is not entirely the scenario I talked about, but *groan*.
>>
>> So what I meant was:
>>
>> 	CPU-0							CPU-n
>>
>> 	__schedule()
>> 	  local_irq_disable()
>>
>> 	  ...
>> 	    deactivate_task(prev);
>>
>> 								try_to_wake_up(@p)
>> 								  ...
>> 								  smp_cond_load_acquire(&p->on_cpu, !VAL);
>>
>> 	  <PMI>
>> 	    ..
>> 	    perf_event_disable_inatomic()
>> 	      event->pending_disable = 1;
>> 	      irq_work_queue() /* self-IPI */
>> 	  </PMI>
>>
>> 	  context_switch()
>> 	    prepare_task_switch()
>> 	      perf_event_task_sched_out()
>> 	        // the above chain that clears pending_disable
>>
>> 	    finish_task_switch()
>> 	      finish_task()
>> 	        smp_store_release(prev->on_cpu, 0);
>> 								  /* finally.... */
>> 								// take woken
>> 								// context_switch to @p
>> 	      finish_lock_switch()
>> 	        raw_spin_unlock_irq()
>> 		/* w00t, IRQs enabled, self-IPI time */
>> 	        <self-IPI>
>> 		  perf_pending_event()
>> 		    // event->pending_disable == 0
>> 		</self-IPI>
>>
>>
>> What you're suggesting, is that the time between:
>>
>>   smp_store_release(prev->on_cpu, 0);
>>
>> and
>>
>>   <self-IPI>
>>
>> on CPU-0 is sufficient for CPU-n to context switch to the task, enable
>> the event there, trigger a PMI that calls perf_event_disable_inatomic()
>> _again_ (this would mean irq_work_queue() failing, which we don't check)
>> (and schedule out again, although that's not required).
>>
>> This being virt that might actually be possible if (v)CPU-0 takes a nap
>> I suppose.
>>
>> Let me think about this a little more...
> 
> Does the below cure things? It's not exactly pretty, but it could just
> do the trick.
> 
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index dfc4bab0b02b..d496e6911442 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2009,8 +2009,8 @@ event_sched_out(struct perf_event *event,
>  	event->pmu->del(event, 0);
>  	event->oncpu = -1;
>  
> -	if (event->pending_disable) {
> -		event->pending_disable = 0;
> +	if (event->pending_disable == smp_processor_id()) {
> +		event->pending_disable = -1;
>  		state = PERF_EVENT_STATE_OFF;
>  	}
>  	perf_event_set_state(event, state);
> @@ -2198,7 +2198,7 @@ EXPORT_SYMBOL_GPL(perf_event_disable);
>  
>  void perf_event_disable_inatomic(struct perf_event *event)
>  {
> -	event->pending_disable = 1;
> +	event->pending_disable = smp_processor_id();
>  	irq_work_queue(&event->pending);
>  }
>  
> @@ -5822,8 +5822,8 @@ static void perf_pending_event(struct irq_work *entry)
>  	 * and we won't recurse 'further'.
>  	 */
>  
> -	if (event->pending_disable) {
> -		event->pending_disable = 0;
> +	if (event->pending_disable == smp_processor_id()) {
> +		event->pending_disable = -1;
>  		perf_event_disable_local(event);
>  	}
>  
> @@ -10236,6 +10236,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
>  
>  
>  	init_waitqueue_head(&event->waitq);
> +	event->pending_disable = -1;
>  	init_irq_work(&event->pending, perf_pending_event);
>  
>  	mutex_init(&event->mmap_mutex);
> 

Thanks for the quick reply, will give it a try...


-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294



* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-04 13:03       ` Peter Zijlstra
  2019-04-04 13:21         ` Thomas-Mich Richter
@ 2019-04-05 10:18         ` Thomas-Mich Richter
  2019-04-05 11:46           ` Peter Zijlstra
  2019-04-08  7:12         ` Thomas-Mich Richter
  2 siblings, 1 reply; 22+ messages in thread
From: Thomas-Mich Richter @ 2019-04-05 10:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky

On 4/4/19 3:03 PM, Peter Zijlstra wrote:
> On Thu, Apr 04, 2019 at 01:09:09PM +0200, Peter Zijlstra wrote:
> 
.....
> 
> Does the below cure things? It's not exactly pretty, but it could just
> do the trick.
> 

Thanks a lot for the patch; I have built a new kernel and will let it run over the weekend.

s390 does not have a PMI; all interrupts (including the measurement interrupts from
the PMU) are normal, maskable interrupts.


-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany

* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-05 10:18         ` Thomas-Mich Richter
@ 2019-04-05 11:46           ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2019-04-05 11:46 UTC (permalink / raw)
  To: Thomas-Mich Richter
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky

On Fri, Apr 05, 2019 at 12:18:54PM +0200, Thomas-Mich Richter wrote:
> On 4/4/19 3:03 PM, Peter Zijlstra wrote:
> > On Thu, Apr 04, 2019 at 01:09:09PM +0200, Peter Zijlstra wrote:
> > 
.....
> > Does the below cure things? It's not exactly pretty, but it could just
> > do the trick.
> > 
> 
> Thanks a lot for the patch; I have built a new kernel and will let it run over the weekend.
> 
> s390 does not have a PMI, all interrupts (including the measurement interrupts from 
> the PMU) are normal, maskable interrupts.

Please implement arch_irq_work_raise() for s390 though; traditionally
that was only required if you had NMI-like PMIs, and IRQ-based PMIs
would have to run irq_work_run() at the end of their handler.

The s390-sf handler does not do this, but even if it were to do that, it
wouldn't be sufficient, since you also drain the buffer from
pmu::stop().

Not doing this will get you some 'weird' behaviour. As in, your signals
could be delivered very late, etc.


* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-04 13:03       ` Peter Zijlstra
  2019-04-04 13:21         ` Thomas-Mich Richter
  2019-04-05 10:18         ` Thomas-Mich Richter
@ 2019-04-08  7:12         ` Thomas-Mich Richter
  2019-04-08  8:22           ` Peter Zijlstra
  2 siblings, 1 reply; 22+ messages in thread
From: Thomas-Mich Richter @ 2019-04-08  7:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky

On 4/4/19 3:03 PM, Peter Zijlstra wrote:
> On Thu, Apr 04, 2019 at 01:09:09PM +0200, Peter Zijlstra wrote:
> 
.....
> 
> Does the below cure things? It's not exactly pretty, but it could just
> do the trick.
> 
.....

Peter,

very good news, your fix ran over the weekend without any hit!!!

Thanks very much for your help. Do you submit this patch to the kernel mailing list?

-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany

* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-08  7:12         ` Thomas-Mich Richter
@ 2019-04-08  8:22           ` Peter Zijlstra
  2019-04-08  8:47             ` Thomas-Mich Richter
  2019-04-08  9:50             ` Peter Zijlstra
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2019-04-08  8:22 UTC (permalink / raw)
  To: Thomas-Mich Richter
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky

On Mon, Apr 08, 2019 at 09:12:28AM +0200, Thomas-Mich Richter wrote:
> > Does the below cure things? It's not exactly pretty, but it could just
> > do the trick.
> > 
.....
> 
> Peter,
> 
> very good news, your fix ran over the weekend without any hit!!!
> 
> Thanks very much for your help. Do you submit this patch to the kernel mailing list?

Most excellent, let me go write a Changelog.

Could I convince you to implement arch_irq_work_raise() for s390?


* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-08  8:22           ` Peter Zijlstra
@ 2019-04-08  8:47             ` Thomas-Mich Richter
  2019-04-08  9:50             ` Peter Zijlstra
  1 sibling, 0 replies; 22+ messages in thread
From: Thomas-Mich Richter @ 2019-04-08  8:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky

On 4/8/19 10:22 AM, Peter Zijlstra wrote:
> On Mon, Apr 08, 2019 at 09:12:28AM +0200, Thomas-Mich Richter wrote:
>>> Does the below cure things? It's not exactly pretty, but it could just
>>> do the trick.
>>>
.....
>>
>> Peter,
>>
>> very good news, your fix ran over the weekend without any hit!!!
>>
>> Thanks very much for your help. Do you submit this patch to the kernel mailing list?
> 
> Most excellent, let me go write a Changelog.
> 
> Could I convince you to implement arch_irq_work_raise() for s390?
> 

Yes, I am convinced; however, I need to discuss this with the s390 maintainers,
Martin Schwidefsky and Heiko Carstens, first.

-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany

* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-08  8:22           ` Peter Zijlstra
  2019-04-08  8:47             ` Thomas-Mich Richter
@ 2019-04-08  9:50             ` Peter Zijlstra
  2019-04-08 13:28               ` Thomas-Mich Richter
                                 ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Peter Zijlstra @ 2019-04-08  9:50 UTC (permalink / raw)
  To: Thomas-Mich Richter
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky, mark.rutland, jolsa

On Mon, Apr 08, 2019 at 10:22:29AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 08, 2019 at 09:12:28AM +0200, Thomas-Mich Richter wrote:

> > very good news, your fix ran over the weekend without any hit!!!
> > 
> > Thanks very much for your help. Do you submit this patch to the kernel mailing list?
> 
> Most excellent, let me go write a Changelog.

Hi Thomas, find below.

Sadly, while writing the Changelog I ended up with a 'completely'
different patch again, could I bother you to test this one too?

---
Subject: perf: Fix perf_event_disable_inatomic()
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu, 4 Apr 2019 15:03:00 +0200

Thomas-Mich Richter reported he triggered a WARN from event_function_local()
on his s390. The problem boils down to:

  CPU-A				CPU-B

  perf_event_overflow()
    perf_event_disable_inatomic()
      @pending_disable = 1
      irq_work_queue();

  sched-out
    event_sched_out()
      @pending_disable = 0

				sched-in
				perf_event_overflow()
				  perf_event_disable_inatomic()
				    @pending_disable = 1;
				    irq_work_queue(); // FAILS

  irq_work_run()
    perf_pending_event()
      if (@pending_disable)
        perf_event_disable_local(); // WHOOPS

The problem exists in generic code, but s390 is particularly sensitive
because it doesn't implement arch_irq_work_raise(), nor does it call
irq_work_run() from its PMU interrupt handler (nor would that be
sufficient in this case, because s390 also generates
perf_event_overflow() from pmu::stop). Add to that the fact that s390
is a virtual architecture and (virtual) CPU-A can stall long enough
for the above race to happen, even if it would self-IPI.

Adding an irq_work_syn() to event_sched_in() would work for all hardware
PMUs that properly use irq_work_run() but fails for software PMUs.

Instead encode the CPU number in @pending_disable, such that we can
tell which CPU requested the disable. This then allows us to detect
the above scenario and even redirect the IPI to make up for the failed
queue.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: acme@redhat.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Reported-by: Thomas-Mich Richter <tmricht@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |   52 ++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 43 insertions(+), 9 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2009,8 +2009,8 @@ event_sched_out(struct perf_event *event
 	event->pmu->del(event, 0);
 	event->oncpu = -1;
 
-	if (event->pending_disable) {
-		event->pending_disable = 0;
+	if (READ_ONCE(event->pending_disable) >= 0) {
+		WRITE_ONCE(event->pending_disable, -1);
 		state = PERF_EVENT_STATE_OFF;
 	}
 	perf_event_set_state(event, state);
@@ -2198,7 +2198,8 @@ EXPORT_SYMBOL_GPL(perf_event_disable);
 
 void perf_event_disable_inatomic(struct perf_event *event)
 {
-	event->pending_disable = 1;
+	WRITE_ONCE(event->pending_disable, smp_processor_id());
+	/* can fail, see perf_pending_event_disable() */
 	irq_work_queue(&event->pending);
 }
 
@@ -5810,10 +5811,45 @@ void perf_event_wakeup(struct perf_event
 	}
 }
 
+static void perf_pending_event_disable(struct perf_event *event)
+{
+	int cpu = READ_ONCE(event->pending_disable);
+
+	if (cpu < 0)
+		return;
+
+	if (cpu == smp_processor_id()) {
+		WRITE_ONCE(event->pending_disable, -1);
+		perf_event_disable_local(event);
+		return;
+	}
+
+	/*
+	 *  CPU-A			CPU-B
+	 *
+	 *  perf_event_disable_inatomic()
+	 *    @pending_disable = CPU-A;
+	 *    irq_work_queue();
+	 *
+	 *  sched-out
+	 *    @pending_disable = -1;
+	 *
+	 *				sched-in
+	 *				perf_event_disable_inatomic()
+	 *				  @pending_disable = CPU-B;
+	 *				  irq_work_queue(); // FAILS
+	 *
+	 *  irq_work_run()
+	 *    perf_pending_event()
+	 *
+	 * But the event runs on CPU-B and wants disabling there.
+	 */
+	irq_work_queue_on(&event->pending, cpu);
+}
+
 static void perf_pending_event(struct irq_work *entry)
 {
-	struct perf_event *event = container_of(entry,
-			struct perf_event, pending);
+	struct perf_event *event = container_of(entry, struct perf_event, pending);
 	int rctx;
 
 	rctx = perf_swevent_get_recursion_context();
@@ -5822,10 +5858,7 @@ static void perf_pending_event(struct ir
 	 * and we won't recurse 'further'.
 	 */
 
-	if (event->pending_disable) {
-		event->pending_disable = 0;
-		perf_event_disable_local(event);
-	}
+	perf_pending_event_disable(event);
 
 	if (event->pending_wakeup) {
 		event->pending_wakeup = 0;
@@ -10236,6 +10269,7 @@ perf_event_alloc(struct perf_event_attr
 
 
 	init_waitqueue_head(&event->waitq);
+	event->pending_disable = -1;
 	init_irq_work(&event->pending, perf_pending_event);
 
 	mutex_init(&event->mmap_mutex);


* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-08  9:50             ` Peter Zijlstra
@ 2019-04-08 13:28               ` Thomas-Mich Richter
  2019-04-09  6:07               ` Thomas-Mich Richter
  2019-04-09  8:53               ` Mark Rutland
  2 siblings, 0 replies; 22+ messages in thread
From: Thomas-Mich Richter @ 2019-04-08 13:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky, mark.rutland, jolsa

On 4/8/19 11:50 AM, Peter Zijlstra wrote:
> On Mon, Apr 08, 2019 at 10:22:29AM +0200, Peter Zijlstra wrote:
>> On Mon, Apr 08, 2019 at 09:12:28AM +0200, Thomas-Mich Richter wrote:
> 
>>> very good news, your fix ran over the weekend without any hit!!!
>>>
>>> Thanks very much for your help. Do you submit this patch to the kernel mailing list?
>>
>> Most excellent, let me go write a Changelog.
> 
> Hi Thomas, find below.
> 
> Sadly, while writing the Changelog I ended up with a 'completely'
> different patch again, could I bother you to test this one too?
> 


Hi Peter, no problem, the test is now running overnight.
I'll let you know the outcome tomorrow...

-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany

* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-08  9:50             ` Peter Zijlstra
  2019-04-08 13:28               ` Thomas-Mich Richter
@ 2019-04-09  6:07               ` Thomas-Mich Richter
  2019-04-09  8:29                 ` Peter Zijlstra
  2019-04-09  8:53               ` Mark Rutland
  2 siblings, 1 reply; 22+ messages in thread
From: Thomas-Mich Richter @ 2019-04-09  6:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky, mark.rutland, jolsa

On 4/8/19 11:50 AM, Peter Zijlstra wrote:
> On Mon, Apr 08, 2019 at 10:22:29AM +0200, Peter Zijlstra wrote:
>> On Mon, Apr 08, 2019 at 09:12:28AM +0200, Thomas-Mich Richter wrote:
> 
>>> very good news, your fix ran over the weekend without any hit!!!
>>>
>>> Thanks very much for your help. Do you submit this patch to the kernel mailing list?
>>
>> Most excellent, let me go write a Changelog.
> 
> Hi Thomas, find below.
> 
> Sadly, while writing the Changelog I ended up with a 'completely'
> different patch again, could I bother you to test this one too?
> 

Hi Peter,

I verified your second patch; it ran overnight without any message in the
kernel log. Thanks again.

Tested-by: Thomas Richter <tmricht@linux.ibm.com>

-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany

* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-09  6:07               ` Thomas-Mich Richter
@ 2019-04-09  8:29                 ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2019-04-09  8:29 UTC (permalink / raw)
  To: Thomas-Mich Richter
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky, mark.rutland, jolsa

On Tue, Apr 09, 2019 at 08:07:49AM +0200, Thomas-Mich Richter wrote:
> On 4/8/19 11:50 AM, Peter Zijlstra wrote:
> > On Mon, Apr 08, 2019 at 10:22:29AM +0200, Peter Zijlstra wrote:
> >> On Mon, Apr 08, 2019 at 09:12:28AM +0200, Thomas-Mich Richter wrote:
> > 
> >>> very good news, your fix ran over the weekend without any hit!!!
> >>>
> >>> Thanks very much for your help. Do you submit this patch to the kernel mailing list?
> >>
> >> Most excellent, let me go write a Changelog.
> > 
> > Hi Thomas, find below.
> > 
> > Sadly, while writing the Changelog I ended up with a 'completely'
> > different patch again, could I bother you to test this one too?
> > 
> 
> Hi Peter,
> 
> I verified your second patch, it ran overnight without any message in the
> kernel log. Thanks again.
> 
> Tested-by: Thomas Richter <tmricht@linux.ibm.com>

Most excellent, thanks for that!



* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-08  9:50             ` Peter Zijlstra
  2019-04-08 13:28               ` Thomas-Mich Richter
  2019-04-09  6:07               ` Thomas-Mich Richter
@ 2019-04-09  8:53               ` Mark Rutland
  2019-04-10 13:51                 ` Thomas-Mich Richter
  2 siblings, 1 reply; 22+ messages in thread
From: Mark Rutland @ 2019-04-09  8:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas-Mich Richter, Kees Cook, acme, Linux Kernel Mailing List,
	Heiko Carstens, Hendrik Brueckner, Martin Schwidefsky, jolsa

On Mon, Apr 08, 2019 at 11:50:31AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 08, 2019 at 10:22:29AM +0200, Peter Zijlstra wrote:
> > On Mon, Apr 08, 2019 at 09:12:28AM +0200, Thomas-Mich Richter wrote:
> 
> > > very good news, your fix ran over the weekend without any hit!!!
> > > 
> > > Thanks very much for your help. Do you submit this patch to the kernel mailing list?
> > 
> > Most excellent, let me go write a Changelog.
> 
> Hi Thomas, find below.
> 
> Sadly, while writing the Changelog I ended up with a 'completely'
> different patch again, could I bother you to test this one too?
> 
> ---
> Subject: perf: Fix perf_event_disable_inatomic()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Thu, 4 Apr 2019 15:03:00 +0200
> 
> Thomas-Mich Richter reported he triggered a WARN from event_function_local()
> on his s390. The problem boils down to:
> 
>   CPU-A				CPU-B
> 
>   perf_event_overflow()
>     perf_event_disable_inatomic()
>       @pending_disable = 1
>       irq_work_queue();
> 
>   sched-out
>     event_sched_out()
>       @pending_disable = 0
> 
> 				sched-in
> 				perf_event_overflow()
> 				  perf_event_disable_inatomic()
> 				    @pending_disable = 1;
> 				    irq_work_queue(); // FAILS
> 
>   irq_work_run()
>     perf_pending_event()
>       if (@pending_disable)
>         perf_event_disable_local(); // WHOOPS
> 
> The problem exists in generic code, but s390 is particularly sensitive
> because it doesn't implement arch_irq_work_raise(), nor does it call
> irq_work_run() from its PMU interrupt handler (nor would that be
> sufficient in this case, because s390 also generates
> perf_event_overflow() from pmu::stop). Add to that the fact that s390
> is a virtual architecture and (virtual) CPU-A can stall long enough
> for the above race to happen, even if it would self-IPI.
> 
> Adding an irq_work_syn() to event_sched_in() would work for all hardware
> PMUs that properly use irq_work_run() but fails for software PMUs.

Typo: s/syn/sync/

> 
> Instead encode the CPU number in @pending_disable, such that we can
> tell which CPU requested the disable. This then allows us to detect
> the above scenario and even redirect the IPI to make up for the failed
> queue.
> 
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
> Cc: acme@redhat.com
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Reported-by: Thomas-Mich Richter <tmricht@linux.ibm.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

I can't think of a nicer way of handling this, so FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  kernel/events/core.c |   52 ++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 43 insertions(+), 9 deletions(-)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2009,8 +2009,8 @@ event_sched_out(struct perf_event *event
>  	event->pmu->del(event, 0);
>  	event->oncpu = -1;
>  
> -	if (event->pending_disable) {
> -		event->pending_disable = 0;
> +	if (READ_ONCE(event->pending_disable) >= 0) {
> +		WRITE_ONCE(event->pending_disable, -1);
>  		state = PERF_EVENT_STATE_OFF;
>  	}
>  	perf_event_set_state(event, state);
> @@ -2198,7 +2198,8 @@ EXPORT_SYMBOL_GPL(perf_event_disable);
>  
>  void perf_event_disable_inatomic(struct perf_event *event)
>  {
> -	event->pending_disable = 1;
> +	WRITE_ONCE(event->pending_disable, smp_processor_id());
> +	/* can fail, see perf_pending_event_disable() */
>  	irq_work_queue(&event->pending);
>  }
>  
> @@ -5810,10 +5811,45 @@ void perf_event_wakeup(struct perf_event
>  	}
>  }
>  
> +static void perf_pending_event_disable(struct perf_event *event)
> +{
> +	int cpu = READ_ONCE(event->pending_disable);
> +
> +	if (cpu < 0)
> +		return;
> +
> +	if (cpu == smp_processor_id()) {
> +		WRITE_ONCE(event->pending_disable, -1);
> +		perf_event_disable_local(event);
> +		return;
> +	}
> +
> +	/*
> +	 *  CPU-A			CPU-B
> +	 *
> +	 *  perf_event_disable_inatomic()
> +	 *    @pending_disable = CPU-A;
> +	 *    irq_work_queue();
> +	 *
> +	 *  sched-out
> +	 *    @pending_disable = -1;
> +	 *
> +	 *				sched-in
> +	 *				perf_event_disable_inatomic()
> +	 *				  @pending_disable = CPU-B;
> +	 *				  irq_work_queue(); // FAILS
> +	 *
> +	 *  irq_work_run()
> +	 *    perf_pending_event()
> +	 *
> +	 * But the event runs on CPU-B and wants disabling there.
> +	 */
> +	irq_work_queue_on(&event->pending, cpu);
> +}
> +
>  static void perf_pending_event(struct irq_work *entry)
>  {
> -	struct perf_event *event = container_of(entry,
> -			struct perf_event, pending);
> +	struct perf_event *event = container_of(entry, struct perf_event, pending);
>  	int rctx;
>  
>  	rctx = perf_swevent_get_recursion_context();
> @@ -5822,10 +5858,7 @@ static void perf_pending_event(struct ir
>  	 * and we won't recurse 'further'.
>  	 */
>  
> -	if (event->pending_disable) {
> -		event->pending_disable = 0;
> -		perf_event_disable_local(event);
> -	}
> +	perf_pending_event_disable(event);
>  
>  	if (event->pending_wakeup) {
>  		event->pending_wakeup = 0;
> @@ -10236,6 +10269,7 @@ perf_event_alloc(struct perf_event_attr
>  
>  
>  	init_waitqueue_head(&event->waitq);
> +	event->pending_disable = -1;
>  	init_irq_work(&event->pending, perf_pending_event);
>  
>  	mutex_init(&event->mmap_mutex);


* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-09  8:53               ` Mark Rutland
@ 2019-04-10 13:51                 ` Thomas-Mich Richter
  2019-04-10 14:33                   ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas-Mich Richter @ 2019-04-10 13:51 UTC (permalink / raw)
  To: Mark Rutland, Peter Zijlstra
  Cc: Kees Cook, acme, Linux Kernel Mailing List, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky, jolsa

On 4/9/19 10:53 AM, Mark Rutland wrote:
> On Mon, Apr 08, 2019 at 11:50:31AM +0200, Peter Zijlstra wrote:
>> On Mon, Apr 08, 2019 at 10:22:29AM +0200, Peter Zijlstra wrote:
>>> On Mon, Apr 08, 2019 at 09:12:28AM +0200, Thomas-Mich Richter wrote:
>>

.....


>>
>> Instead encode the CPU number in @pending_disable, such that we can
>> tell which CPU requested the disable. This then allows us to detect
>> the above scenario and even redirect the IPI to make up for the failed
>> queue.
>>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
>> Cc: acme@redhat.com
>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Jiri Olsa <jolsa@redhat.com>
>> Reported-by: Thomas-Mich Richter <tmricht@linux.ibm.com>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> I can't think of a nicer way of handling this, so FWIW:
> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> 
> Mark.
> 
>> ---

Thanks for the fix with commit id 86071b11317550d994b55ce5e31aa06bcad783b5.

However doing an fgrep on the pending_disable member of struct perf_event
reveals two more hits in file kernel/events/ring_buffer.c when events
have auxiliary data.

Not sure if this needs to be addressed too, just wanted to let you know.

-- 
Thomas Richter, Dept 3252, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294



* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-10 13:51                 ` Thomas-Mich Richter
@ 2019-04-10 14:33                   ` Peter Zijlstra
  2019-04-11 12:06                     ` Alexander Shishkin
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2019-04-10 14:33 UTC (permalink / raw)
  To: Thomas-Mich Richter
  Cc: Mark Rutland, Kees Cook, acme, Linux Kernel Mailing List,
	Heiko Carstens, Hendrik Brueckner, Martin Schwidefsky, jolsa,
	Alexander Shishkin

On Wed, Apr 10, 2019 at 03:51:24PM +0200, Thomas-Mich Richter wrote:
> Thanks for the fix with commit id 86071b11317550d994b55ce5e31aa06bcad783b5.
> 
> However doing an fgrep on the pending_disable member of struct perf_event
> reveals two more hits in file kernel/events/ring_buffer.c when events
> have auxiliary data.
> 
> Not sure if this needs to be addressed too, just wanted to let you know.

*groan* indeed, and yes that would need fixing too. I completely missed
the AUX stuff using that too.

I think we can simply do the below on top, Alexander?

---
 kernel/events/ring_buffer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 678ccec60d8f..ab2b7b38adc5 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -392,7 +392,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 		 * store that will be enabled on successful return
 		 */
 		if (!handle->size) { /* A, matches D */
-			event->pending_disable = 1;
+			event->pending_disable = smp_processor_id();
 			perf_output_wakeup(handle);
 			local_set(&rb->aux_nest, 0);
 			goto err_put;
@@ -480,7 +480,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)

 	if (wakeup) {
 		if (handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)
-			handle->event->pending_disable = 1;
+			handle->event->pending_disable = smp_processor_id();
 		perf_output_wakeup(handle);
 	}




* Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
  2019-04-10 14:33                   ` Peter Zijlstra
@ 2019-04-11 12:06                     ` Alexander Shishkin
  0 siblings, 0 replies; 22+ messages in thread
From: Alexander Shishkin @ 2019-04-11 12:06 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas-Mich Richter
  Cc: Mark Rutland, Kees Cook, acme, Linux Kernel Mailing List,
	Heiko Carstens, Hendrik Brueckner, Martin Schwidefsky, jolsa,
	alexander.shishkin

Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, Apr 10, 2019 at 03:51:24PM +0200, Thomas-Mich Richter wrote:
>> Thanks for the fix with commit id 86071b11317550d994b55ce5e31aa06bcad783b5.
>> 
>> However doing an fgrep on the pending_disable member of struct perf_event
>> reveals two more hits in file kernel/events/ring_buffer.c when events
>> have auxiliary data.
>> 
>> Not sure if this needs to be addressed too, just wanted to let you know.
>
> *groan* indeed, and yes that would need fixing too. I completely missed
> the AUX stuff using that too.
>
> I think we can simply do the below on top, Alexander?

Looks good. FWIW, renaming pending_disable in the first place to match
the fact that it's a cpu# now would probably have caught all these.

> ---
>  kernel/events/ring_buffer.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 678ccec60d8f..ab2b7b38adc5 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -392,7 +392,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
>  		 * store that will be enabled on successful return
>  		 */
>  		if (!handle->size) { /* A, matches D */
> -			event->pending_disable = 1;
> +			event->pending_disable = smp_processor_id();
>  			perf_output_wakeup(handle);
>  			local_set(&rb->aux_nest, 0);
>  			goto err_put;
> @@ -480,7 +480,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
>
>  	if (wakeup) {
>  		if (handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)
> -			handle->event->pending_disable = 1;
> +			handle->event->pending_disable = smp_processor_id();
>  		perf_output_wakeup(handle);
>  	}

Thanks,
--
Alex

