* [PATCH 1/2] perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS
@ 2020-11-26 11:09 Namhyung Kim
  2020-11-26 11:09 ` [PATCH 2/2] perf/x86/intel: Check PEBS status correctly Namhyung Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Namhyung Kim @ 2020-11-26 11:09 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Alexander Shishkin, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, LKML, x86, Stephane Eranian, John Sperbeck,
	Lendacky, Thomas, Kan Liang

Commit 3966c3feca3f ("x86/perf/amd: Remove need to check "running"
bit in NMI handler") introduced this warning.  It seems x86_pmu_stop()
can be called recursively (for example, when it loses some samples),
as shown below:

  x86_pmu_stop
    intel_pmu_disable_event  (x86_pmu_disable)
      intel_pmu_pebs_disable
        intel_pmu_drain_pebs_nhm  (x86_pmu_drain_pebs_buffer)
          x86_pmu_stop

While commit 35d1ce6bec13 ("perf/x86/intel/ds: Fix x86_pmu_stop
warning for large PEBS") fixed it for the normal cases, there's
another path to call x86_pmu_stop() recursively when a PEBS error was
detected (like two or more counters overflowed at the same time).

As in Kan's previous fix, we can skip the interrupt accounting for
large PEBS, so check iregs, which is set only for a PMI.

Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
Reported-by: John Sperbeck <jsperbeck@google.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: "Lendacky, Thomas" <Thomas.Lendacky@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 arch/x86/events/intel/ds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index b47cc4226934..89dba588636e 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1940,7 +1940,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
 		if (error[bit]) {
 			perf_log_lost_samples(event, error[bit]);
 
-			if (perf_event_account_interrupt(event))
+			if (iregs && perf_event_account_interrupt(event))
 				x86_pmu_stop(event, 0);
 		}
 
-- 
2.29.2.454.gaff20da3a2-goog
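
The recursion described in the patch above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every name here (toy_event, toy_stop, toy_drain, toy_account_interrupt, the guard flag) is hypothetical and none of it is kernel code. It assumes the warning fires because the nested stop marks the event as stopped before the outer stop reaches the same point, and it uses the guard flag to switch between the pre-patch and post-patch condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's real types. */
struct toy_regs { int unused; };

struct toy_event {
	bool active;         /* event is still scheduled on the (toy) PMU */
	bool stopped;        /* set once a stop has completed */
	bool guard;          /* true: apply the "iregs &&" check from the patch */
	int  lost;           /* pending lost-sample errors to report */
	int  interrupts;     /* interrupts accounted so far */
	int  max_interrupts; /* throttle threshold */
	int  warnings;       /* how often stop found the event already stopped */
};

static void toy_stop(struct toy_event *ev);

/* Returns true when the event should be throttled. */
static bool toy_account_interrupt(struct toy_event *ev)
{
	return ++ev->interrupts >= ev->max_interrupts;
}

/* iregs is non-NULL only when called from the (toy) interrupt handler. */
static void toy_drain(struct toy_event *ev, const struct toy_regs *iregs)
{
	if (!ev->lost)
		return;
	ev->lost = 0;

	/* With the guard, accounting is skipped outside interrupt context. */
	if ((!ev->guard || iregs) && toy_account_interrupt(ev))
		toy_stop(ev);
}

static void toy_stop(struct toy_event *ev)
{
	if (!ev->active)
		return;

	/* Disabling the event flushes buffered records without an interrupt. */
	toy_drain(ev, NULL);

	ev->active = false;
	if (ev->stopped)
		ev->warnings++;  /* models the warning from the nested stop */
	ev->stopped = true;
}
```

With guard set to false, a single toy_stop() call on an event that has a pending lost-sample error re-enters toy_stop() through toy_drain() and records one warning. With guard set to true, the drain path skips the accounting because iregs is NULL, so toy_stop() runs exactly once.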



* [PATCH 2/2] perf/x86/intel: Check PEBS status correctly
  2020-11-26 11:09 [PATCH 1/2] perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS Namhyung Kim
@ 2020-11-26 11:09 ` Namhyung Kim
  2020-12-03  9:07   ` [tip: perf/urgent] " tip-bot2 for Stephane Eranian
  2020-11-26 12:13 ` [PATCH 1/2] perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS Peter Zijlstra
  2020-12-03  9:07 ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
  2 siblings, 1 reply; 5+ messages in thread
From: Namhyung Kim @ 2020-11-26 11:09 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Alexander Shishkin, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, LKML, x86, Stephane Eranian, Kan Liang

From: Stephane Eranian <eranian@google.com>

The kernel cannot disambiguate when 2+ PEBS counters overflow at the
same time. This is what the comment for this code suggests.  However,
the comparison is done with the unfiltered p->status, which is a copy
of IA32_PERF_GLOBAL_STATUS at the time of the sample. That register
contains more than the PEBS counter overflow bits; other bits may be
set as well. Compare against the filtered pebs_status instead.

Cc: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 arch/x86/events/intel/ds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 89dba588636e..485c5066f8b8 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1916,7 +1916,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
 		 * that caused the PEBS record. It's called collision.
 		 * If collision happened, the record will be dropped.
 		 */
-		if (p->status != (1ULL << bit)) {
+		if (pebs_status != (1ULL << bit)) {
 			for_each_set_bit(i, (unsigned long *)&pebs_status, size)
 				error[i]++;
 			continue;
-- 
2.29.2.454.gaff20da3a2-goog
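
The difference between comparing the raw status and the filtered status can be sketched as follows. This is a minimal user-space illustration, not the kernel's code: the helper names, the four-counter mask, and the choice of bit 62 as an example of a non-counter status bit are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: bits 0..3 are the PEBS-capable counters. */
#define TOY_PEBS_COUNTER_MASK 0xfULL

/* True if 'status' has exactly the one bit 'bit' set and nothing else. */
static bool toy_is_single_counter(uint64_t status, unsigned int bit)
{
	assert(bit < 64);
	return status == (1ULL << bit);
}

/* Keep only the bits of enabled PEBS counters; drop unrelated status bits. */
static uint64_t toy_filter_status(uint64_t raw_status, uint64_t pebs_enabled)
{
	return raw_status & pebs_enabled & TOY_PEBS_COUNTER_MASK;
}
```

For a raw status of (1ULL << 0) | (1ULL << 62) with only counter 0 enabled, toy_is_single_counter() on the raw value reports a collision even though only one PEBS counter overflowed, while the filtered value compares equal to 1ULL << 0. A genuine collision, such as a filtered value of 0x3, is still detected.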



* Re: [PATCH 1/2] perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS
  2020-11-26 11:09 [PATCH 1/2] perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS Namhyung Kim
  2020-11-26 11:09 ` [PATCH 2/2] perf/x86/intel: Check PEBS status correctly Namhyung Kim
@ 2020-11-26 12:13 ` Peter Zijlstra
  2020-12-03  9:07 ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
  2 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2020-11-26 12:13 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Alexander Shishkin, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, LKML, x86, Stephane Eranian,
	John Sperbeck, Lendacky, Thomas, Kan Liang

On Thu, Nov 26, 2020 at 08:09:21PM +0900, Namhyung Kim wrote:
> Commit 3966c3feca3f ("x86/perf/amd: Remove need to check "running"
> bit in NMI handler") introduced this warning.  It seems x86_pmu_stop()
> can be called recursively (for example, when it loses some samples),
> as shown below:
> 
>   x86_pmu_stop
>     intel_pmu_disable_event  (x86_pmu_disable)
>       intel_pmu_pebs_disable
>         intel_pmu_drain_pebs_nhm  (x86_pmu_drain_pebs_buffer)
>           x86_pmu_stop
> 
> While commit 35d1ce6bec13 ("perf/x86/intel/ds: Fix x86_pmu_stop
> warning for large PEBS") fixed it for the normal cases, there's
> another path to call x86_pmu_stop() recursively when a PEBS error was
> detected (like two or more counters overflowed at the same time).
> 
> As in Kan's previous fix, we can skip the interrupt accounting for
> large PEBS, so check iregs, which is set only for a PMI.
> 
> Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
> Reported-by: John Sperbeck <jsperbeck@google.com>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Cc: "Lendacky, Thomas" <Thomas.Lendacky@amd.com>
> Cc: Kan Liang <kan.liang@linux.intel.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Thanks for both!


* [tip: perf/urgent] perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS
  2020-11-26 11:09 [PATCH 1/2] perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS Namhyung Kim
  2020-11-26 11:09 ` [PATCH 2/2] perf/x86/intel: Check PEBS status correctly Namhyung Kim
  2020-11-26 12:13 ` [PATCH 1/2] perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS Peter Zijlstra
@ 2020-12-03  9:07 ` tip-bot2 for Namhyung Kim
  2 siblings, 0 replies; 5+ messages in thread
From: tip-bot2 for Namhyung Kim @ 2020-12-03  9:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: John Sperbeck, Peter Zijlstra, Namhyung Kim, x86, linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     5debf02131227d39988e44adf5090fb796fa8466
Gitweb:        https://git.kernel.org/tip/5debf02131227d39988e44adf5090fb796fa8466
Author:        Namhyung Kim <namhyung@kernel.org>
AuthorDate:    Thu, 26 Nov 2020 20:09:21 +09:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 03 Dec 2020 10:00:26 +01:00

perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS

Commit 3966c3feca3f ("x86/perf/amd: Remove need to check "running"
bit in NMI handler") introduced this warning.  It seems x86_pmu_stop()
can be called recursively (for example, when it loses some samples),
as shown below:

  x86_pmu_stop
    intel_pmu_disable_event  (x86_pmu_disable)
      intel_pmu_pebs_disable
        intel_pmu_drain_pebs_nhm  (x86_pmu_drain_pebs_buffer)
          x86_pmu_stop

While commit 35d1ce6bec13 ("perf/x86/intel/ds: Fix x86_pmu_stop
warning for large PEBS") fixed it for the normal cases, there's
another path to call x86_pmu_stop() recursively when a PEBS error was
detected (like two or more counters overflowed at the same time).

As in Kan's previous fix, we can skip the interrupt accounting for
large PEBS, so check iregs, which is set only for a PMI.

Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
Reported-by: John Sperbeck <jsperbeck@google.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201126110922.317681-1-namhyung@kernel.org
---
 arch/x86/events/intel/ds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index b47cc42..89dba58 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1940,7 +1940,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
 		if (error[bit]) {
 			perf_log_lost_samples(event, error[bit]);
 
-			if (perf_event_account_interrupt(event))
+			if (iregs && perf_event_account_interrupt(event))
 				x86_pmu_stop(event, 0);
 		}
 


* [tip: perf/urgent] perf/x86/intel: Check PEBS status correctly
  2020-11-26 11:09 ` [PATCH 2/2] perf/x86/intel: Check PEBS status correctly Namhyung Kim
@ 2020-12-03  9:07   ` tip-bot2 for Stephane Eranian
  0 siblings, 0 replies; 5+ messages in thread
From: tip-bot2 for Stephane Eranian @ 2020-12-03  9:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Namhyung Kim, Stephane Eranian, Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     fc17db8aa4c53cbd2d5469bb0521ea0f0a6dbb27
Gitweb:        https://git.kernel.org/tip/fc17db8aa4c53cbd2d5469bb0521ea0f0a6dbb27
Author:        Stephane Eranian <eranian@google.com>
AuthorDate:    Thu, 26 Nov 2020 20:09:22 +09:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 03 Dec 2020 10:00:26 +01:00

perf/x86/intel: Check PEBS status correctly

The kernel cannot disambiguate when 2+ PEBS counters overflow at the
same time. This is what the comment for this code suggests.  However,
the comparison is done with the unfiltered p->status, which is a copy
of IA32_PERF_GLOBAL_STATUS at the time of the sample. That register
contains more than the PEBS counter overflow bits; other bits may be
set as well. Compare against the filtered pebs_status instead.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201126110922.317681-2-namhyung@kernel.org
---
 arch/x86/events/intel/ds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 89dba58..485c506 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1916,7 +1916,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
 		 * that caused the PEBS record. It's called collision.
 		 * If collision happened, the record will be dropped.
 		 */
-		if (p->status != (1ULL << bit)) {
+		if (pebs_status != (1ULL << bit)) {
 			for_each_set_bit(i, (unsigned long *)&pebs_status, size)
 				error[i]++;
 			continue;

