linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] perf/x86: Fix a warning on x86_pmu_stop()
@ 2020-11-21  2:50 Namhyung Kim
  2020-11-23 14:23 ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Namhyung Kim @ 2020-11-21  2:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Borislav Petkov, Thomas Gleixner, H. Peter Anvin, x86, LKML,
	Stephane Eranian, Kan Liang, John Sperbeck, Lendacky, Thomas

When large PEBS is enabled, the below warning is triggered:

  [6070379.453697] WARNING: CPU: 23 PID: 42379 at arch/x86/events/core.c:1466 x86_pmu_stop+0x95/0xa0
  ...
  [6070379.453831] Call Trace:
  [6070379.453840]  x86_pmu_del+0x50/0x150
  [6070379.453845]  event_sched_out.isra.0+0x95/0x200
  [6070379.453848]  group_sched_out.part.0+0x53/0xd0
  [6070379.453851]  __perf_event_disable+0xee/0x1e0
  [6070379.453854]  event_function+0x89/0xd0
  [6070379.453859]  remote_function+0x3e/0x50
  [6070379.453866]  generic_exec_single+0x91/0xd0
  [6070379.453870]  smp_call_function_single+0xd1/0x110
  [6070379.453874]  event_function_call+0x11c/0x130
  [6070379.453877]  ? task_ctx_sched_out+0x20/0x20
  [6070379.453880]  ? perf_mux_hrtimer_handler+0x370/0x370
  [6070379.453882]  ? event_function_call+0x130/0x130
  [6070379.453886]  perf_event_for_each_child+0x34/0x80
  [6070379.453889]  ? event_function_call+0x130/0x130
  [6070379.453891]  _perf_ioctl+0x24b/0x6a0
  [6070379.453898]  ? sched_setaffinity+0x1ad/0x2a0
  [6070379.453904]  ? _cond_resched+0x15/0x30
  [6070379.453906]  perf_ioctl+0x3d/0x60
  [6070379.453912]  ksys_ioctl+0x87/0xc0
  [6070379.453917]  __x64_sys_ioctl+0x16/0x20
  [6070379.453923]  do_syscall_64+0x52/0x180
  [6070379.453928]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

The commit 3966c3feca3f ("x86/perf/amd: Remove need to check "running"
bit in NMI handler") introduced this.  It seems x86_pmu_stop can be
called recursively (like when it losts some samples) like below:

  x86_pmu_stop
    intel_pmu_disable_event  (x86_pmu_disable)
      intel_pmu_pebs_disable
        intel_pmu_drain_pebs_buffer
          x86_pmu_stop

It seems the change is only needed for AMD.  So I added a new bit to
check when it should clear the active mask.

Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
Reported-by: John Sperbeck <jsperbeck@google.com>
Cc: "Lendacky, Thomas" <Thomas.Lendacky@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 arch/x86/events/amd/core.c   | 1 +
 arch/x86/events/core.c       | 9 +++++++--
 arch/x86/events/perf_event.h | 3 ++-
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 39eb276d0277..c545fbd423df 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -927,6 +927,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 	.max_period		= (1ULL << 47) - 1,
 	.get_event_constraints	= amd_get_event_constraints,
 	.put_event_constraints	= amd_put_event_constraints,
+	.late_nmi		= 1,
 
 	.format_attrs		= amd_format_attr,
 	.events_sysfs_show	= amd_event_sysfs_show,
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 7b802a778014..a6c12bd71e66 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1514,8 +1514,13 @@ void x86_pmu_stop(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 
 	if (test_bit(hwc->idx, cpuc->active_mask)) {
-		x86_pmu.disable(event);
-		__clear_bit(hwc->idx, cpuc->active_mask);
+		if (x86_pmu.late_nmi) {
+			x86_pmu.disable(event);
+			__clear_bit(hwc->idx, cpuc->active_mask);
+		} else {
+			__clear_bit(hwc->idx, cpuc->active_mask);
+			x86_pmu.disable(event);
+		}
 		cpuc->events[hwc->idx] = NULL;
 		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
 		hwc->state |= PERF_HES_STOPPED;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 10032f023fcc..1ffaa0fcd521 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -682,7 +682,8 @@ struct x86_pmu {
 	/* PMI handler bits */
 	unsigned int	late_ack		:1,
 			enabled_ack		:1,
-			counter_freezing	:1;
+			counter_freezing	:1,
+			late_nmi		:1;
 	/*
 	 * sysfs attrs
 	 */
-- 
2.29.2.454.gaff20da3a2-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC] perf/x86: Fix a warning on x86_pmu_stop()
  2020-11-21  2:50 [RFC] perf/x86: Fix a warning on x86_pmu_stop() Namhyung Kim
@ 2020-11-23 14:23 ` Peter Zijlstra
  2020-11-24  5:01   ` Namhyung Kim
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2020-11-23 14:23 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Borislav Petkov, Thomas Gleixner, H. Peter Anvin,
	x86, LKML, Stephane Eranian, Kan Liang, John Sperbeck, Lendacky,
	Thomas

On Sat, Nov 21, 2020 at 11:50:11AM +0900, Namhyung Kim wrote:
> When large PEBS is enabled, the below warning is triggered:
> 
>   [6070379.453697] WARNING: CPU: 23 PID: 42379 at arch/x86/events/core.c:1466 x86_pmu_stop+0x95/0xa0
>   ...
>   [6070379.453831] Call Trace:
>   [6070379.453840]  x86_pmu_del+0x50/0x150
>   [6070379.453845]  event_sched_out.isra.0+0x95/0x200
>   [6070379.453848]  group_sched_out.part.0+0x53/0xd0
>   [6070379.453851]  __perf_event_disable+0xee/0x1e0
>   [6070379.453854]  event_function+0x89/0xd0
>   [6070379.453859]  remote_function+0x3e/0x50
>   [6070379.453866]  generic_exec_single+0x91/0xd0
>   [6070379.453870]  smp_call_function_single+0xd1/0x110
>   [6070379.453874]  event_function_call+0x11c/0x130
>   [6070379.453877]  ? task_ctx_sched_out+0x20/0x20
>   [6070379.453880]  ? perf_mux_hrtimer_handler+0x370/0x370
>   [6070379.453882]  ? event_function_call+0x130/0x130
>   [6070379.453886]  perf_event_for_each_child+0x34/0x80
>   [6070379.453889]  ? event_function_call+0x130/0x130
>   [6070379.453891]  _perf_ioctl+0x24b/0x6a0
>   [6070379.453898]  ? sched_setaffinity+0x1ad/0x2a0
>   [6070379.453904]  ? _cond_resched+0x15/0x30
>   [6070379.453906]  perf_ioctl+0x3d/0x60
>   [6070379.453912]  ksys_ioctl+0x87/0xc0
>   [6070379.453917]  __x64_sys_ioctl+0x16/0x20
>   [6070379.453923]  do_syscall_64+0x52/0x180
>   [6070379.453928]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> The commit 3966c3feca3f ("x86/perf/amd: Remove need to check "running"
> bit in NMI handler") introduced this.  It seems x86_pmu_stop can be
> called recursively (like when it losts some samples) like below:
> 
>   x86_pmu_stop
>     intel_pmu_disable_event  (x86_pmu_disable)
>       intel_pmu_pebs_disable
>         intel_pmu_drain_pebs_buffer
>           x86_pmu_stop
> 

This shouldn't be possible; intel_pmu_drain_pebs_buffer() calls
drain_pebs(.iregs=NULL), which means that __intel_pmu_pebs_event()
should not end up x86_pmu_stop().

Are you running some old kernel?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] perf/x86: Fix a warning on x86_pmu_stop()
  2020-11-23 14:23 ` Peter Zijlstra
@ 2020-11-24  5:01   ` Namhyung Kim
  2020-11-24  8:09     ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Namhyung Kim @ 2020-11-24  5:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, Thomas Gleixner, H. Peter Anvin,
	x86, LKML, Stephane Eranian, Kan Liang, John Sperbeck, Lendacky,
	Thomas

Hi Peter,

On Mon, Nov 23, 2020 at 11:23 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Sat, Nov 21, 2020 at 11:50:11AM +0900, Namhyung Kim wrote:
> > When large PEBS is enabled, the below warning is triggered:
> >
> >   [6070379.453697] WARNING: CPU: 23 PID: 42379 at arch/x86/events/core.c:1466 x86_pmu_stop+0x95/0xa0
> >   ...
> >   [6070379.453831] Call Trace:
> >   [6070379.453840]  x86_pmu_del+0x50/0x150
> >   [6070379.453845]  event_sched_out.isra.0+0x95/0x200
> >   [6070379.453848]  group_sched_out.part.0+0x53/0xd0
> >   [6070379.453851]  __perf_event_disable+0xee/0x1e0
> >   [6070379.453854]  event_function+0x89/0xd0
> >   [6070379.453859]  remote_function+0x3e/0x50
> >   [6070379.453866]  generic_exec_single+0x91/0xd0
> >   [6070379.453870]  smp_call_function_single+0xd1/0x110
> >   [6070379.453874]  event_function_call+0x11c/0x130
> >   [6070379.453877]  ? task_ctx_sched_out+0x20/0x20
> >   [6070379.453880]  ? perf_mux_hrtimer_handler+0x370/0x370
> >   [6070379.453882]  ? event_function_call+0x130/0x130
> >   [6070379.453886]  perf_event_for_each_child+0x34/0x80
> >   [6070379.453889]  ? event_function_call+0x130/0x130
> >   [6070379.453891]  _perf_ioctl+0x24b/0x6a0
> >   [6070379.453898]  ? sched_setaffinity+0x1ad/0x2a0
> >   [6070379.453904]  ? _cond_resched+0x15/0x30
> >   [6070379.453906]  perf_ioctl+0x3d/0x60
> >   [6070379.453912]  ksys_ioctl+0x87/0xc0
> >   [6070379.453917]  __x64_sys_ioctl+0x16/0x20
> >   [6070379.453923]  do_syscall_64+0x52/0x180
> >   [6070379.453928]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > The commit 3966c3feca3f ("x86/perf/amd: Remove need to check "running"
> > bit in NMI handler") introduced this.  It seems x86_pmu_stop can be
> > called recursively (like when it losts some samples) like below:
> >
> >   x86_pmu_stop
> >     intel_pmu_disable_event  (x86_pmu_disable)
> >       intel_pmu_pebs_disable
> >         intel_pmu_drain_pebs_buffer
> >           x86_pmu_stop
> >
>
> This shouldn't be possible; intel_pmu_drain_pebs_buffer() calls
> drain_pebs(.iregs=NULL), which means that __intel_pmu_pebs_event()
> should not end up x86_pmu_stop().
>
> Are you running some old kernel?

Well, it's actually 5.7.17 but I think the latest version has the same problem.

Yes, it's not about __intel_pmu_pebs_event().  I'm looking at
intel_pmu_drain_pebs_nhm() specifically.  There's code like

        /* log dropped samples number */
        if (error[bit]) {
            perf_log_lost_samples(event, error[bit]);

            if (perf_event_account_interrupt(event))
                x86_pmu_stop(event, 0);
        }

        if (counts[bit]) {
            __intel_pmu_pebs_event(event, iregs, base,
                           top, bit, counts[bit],
                           setup_pebs_fixed_sample_data);
        }

There's a path to x86_pmu_stop() when an error bit is on.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] perf/x86: Fix a warning on x86_pmu_stop()
  2020-11-24  5:01   ` Namhyung Kim
@ 2020-11-24  8:09     ` Peter Zijlstra
  2020-11-24  8:19       ` Stephane Eranian
  2020-11-25  7:22       ` Namhyung Kim
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Zijlstra @ 2020-11-24  8:09 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Borislav Petkov, Thomas Gleixner, H. Peter Anvin,
	x86, LKML, Stephane Eranian, Kan Liang, John Sperbeck, Lendacky,
	Thomas

On Tue, Nov 24, 2020 at 02:01:39PM +0900, Namhyung Kim wrote:

> Yes, it's not about __intel_pmu_pebs_event().  I'm looking at
> intel_pmu_drain_pebs_nhm() specifically.  There's code like
> 
>         /* log dropped samples number */
>         if (error[bit]) {
>             perf_log_lost_samples(event, error[bit]);
> 
>             if (perf_event_account_interrupt(event))
>                 x86_pmu_stop(event, 0);
>         }
> 
>         if (counts[bit]) {
>             __intel_pmu_pebs_event(event, iregs, base,
>                            top, bit, counts[bit],
>                            setup_pebs_fixed_sample_data);
>         }
> 
> There's a path to x86_pmu_stop() when an error bit is on.

That would seem to suggest you try something like this:

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 31b9e58b03fe..8c6ee8be8b6e 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1945,7 +1945,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
 		if (error[bit]) {
 			perf_log_lost_samples(event, error[bit]);
 
-			if (perf_event_account_interrupt(event))
+			if (iregs && perf_event_account_interrupt(event))
 				x86_pmu_stop(event, 0);
 		}
 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC] perf/x86: Fix a warning on x86_pmu_stop()
  2020-11-24  8:09     ` Peter Zijlstra
@ 2020-11-24  8:19       ` Stephane Eranian
  2020-11-25  7:36         ` Peter Zijlstra
  2020-11-25  7:22       ` Namhyung Kim
  1 sibling, 1 reply; 7+ messages in thread
From: Stephane Eranian @ 2020-11-24  8:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Namhyung Kim, Ingo Molnar, Borislav Petkov, Thomas Gleixner,
	H. Peter Anvin, x86, LKML, Kan Liang, John Sperbeck, Lendacky,
	Thomas, Andi Kleen

Hi,

Another remark on the PEBS drainage code, it seems to me like a test
is not quite correct:
intel_pmu_drain_pebs_nhm()
{
...
               if (p->status != (1ULL << bit)) {
                        for_each_set_bit(i, (unsigned long *)&pebs_status, size)
                                error[i]++;
                        continue;
                }

The kernel cannot disambiguate when 2+ PEBS counters overflow at the
same time. This is what the comment for this code suggests.
However, I see the comparison is done with the unfiltered p->status
which is a copy of  IA32_PERF_GLOBAL_STATUS at the time of
the sample. This register contains more than the PEBS counter overflow
bits. It also includes many other bits which could also be set.

Shouldn't this test use pebs_status instead (which covers only the
PEBS counters)?

          if (pebs_status != (1ULL << bit)) {
          }

Or am I missing something?
Thanks.


On Tue, Nov 24, 2020 at 12:09 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Nov 24, 2020 at 02:01:39PM +0900, Namhyung Kim wrote:
>
> > Yes, it's not about __intel_pmu_pebs_event().  I'm looking at
> > intel_pmu_drain_pebs_nhm() specifically.  There's code like
> >
> >         /* log dropped samples number */
> >         if (error[bit]) {
> >             perf_log_lost_samples(event, error[bit]);
> >
> >             if (perf_event_account_interrupt(event))
> >                 x86_pmu_stop(event, 0);
> >         }
> >
> >         if (counts[bit]) {
> >             __intel_pmu_pebs_event(event, iregs, base,
> >                            top, bit, counts[bit],
> >                            setup_pebs_fixed_sample_data);
> >         }
> >
> > There's a path to x86_pmu_stop() when an error bit is on.
>
> That would seem to suggest you try something like this:
>
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 31b9e58b03fe..8c6ee8be8b6e 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1945,7 +1945,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
>                 if (error[bit]) {
>                         perf_log_lost_samples(event, error[bit]);
>
> -                       if (perf_event_account_interrupt(event))
> +                       if (iregs && perf_event_account_interrupt(event))
>                                 x86_pmu_stop(event, 0);
>                 }
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] perf/x86: Fix a warning on x86_pmu_stop()
  2020-11-24  8:09     ` Peter Zijlstra
  2020-11-24  8:19       ` Stephane Eranian
@ 2020-11-25  7:22       ` Namhyung Kim
  1 sibling, 0 replies; 7+ messages in thread
From: Namhyung Kim @ 2020-11-25  7:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, Thomas Gleixner, H. Peter Anvin,
	x86, LKML, Stephane Eranian, Kan Liang, John Sperbeck, Lendacky,
	Thomas

On Tue, Nov 24, 2020 at 5:10 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Nov 24, 2020 at 02:01:39PM +0900, Namhyung Kim wrote:
>
> > Yes, it's not about __intel_pmu_pebs_event().  I'm looking at
> > intel_pmu_drain_pebs_nhm() specifically.  There's code like
> >
> >         /* log dropped samples number */
> >         if (error[bit]) {
> >             perf_log_lost_samples(event, error[bit]);
> >
> >             if (perf_event_account_interrupt(event))
> >                 x86_pmu_stop(event, 0);
> >         }
> >
> >         if (counts[bit]) {
> >             __intel_pmu_pebs_event(event, iregs, base,
> >                            top, bit, counts[bit],
> >                            setup_pebs_fixed_sample_data);
> >         }
> >
> > There's a path to x86_pmu_stop() when an error bit is on.
>
> That would seem to suggest you try something like this:
>
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 31b9e58b03fe..8c6ee8be8b6e 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1945,7 +1945,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
>                 if (error[bit]) {
>                         perf_log_lost_samples(event, error[bit]);
>
> -                       if (perf_event_account_interrupt(event))
> +                       if (iregs && perf_event_account_interrupt(event))
>                                 x86_pmu_stop(event, 0);
>                 }
>

That would work too and much simpler!

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] perf/x86: Fix a warning on x86_pmu_stop()
  2020-11-24  8:19       ` Stephane Eranian
@ 2020-11-25  7:36         ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2020-11-25  7:36 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Namhyung Kim, Ingo Molnar, Borislav Petkov, Thomas Gleixner,
	H. Peter Anvin, x86, LKML, Kan Liang, John Sperbeck, Lendacky,
	Thomas, Andi Kleen

On Tue, Nov 24, 2020 at 12:19:34AM -0800, Stephane Eranian wrote:
> Hi,
> 
> Another remark on the PEBS drainage code, it seems to me like a test
> is not quite correct:
> intel_pmu_drain_pebs_nhm()
> {
> ...
>                if (p->status != (1ULL << bit)) {
>                         for_each_set_bit(i, (unsigned long *)&pebs_status, size)
>                                 error[i]++;
>                         continue;
>                 }
> 
> The kernel cannot disambiguate when 2+ PEBS counters overflow at the
> same time. This is what the comment for this code suggests.
> However, I see the comparison is done with the unfiltered p->status
> which is a copy of  IA32_PERF_GLOBAL_STATUS at the time of
> the sample. This register contains more than the PEBS counter overflow
> bits. It also includes many other bits which could also be set.
> 
> Shouldn't this test use pebs_status instead (which covers only the
> PEBS counters)?

Hmm, yes, think so.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-11-25  7:37 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-21  2:50 [RFC] perf/x86: Fix a warning on x86_pmu_stop() Namhyung Kim
2020-11-23 14:23 ` Peter Zijlstra
2020-11-24  5:01   ` Namhyung Kim
2020-11-24  8:09     ` Peter Zijlstra
2020-11-24  8:19       ` Stephane Eranian
2020-11-25  7:36         ` Peter Zijlstra
2020-11-25  7:22       ` Namhyung Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).