All of lore.kernel.org
 help / color / mirror / Atom feed
* [RESEND][PATCH] perf/x86/intel/cqm: Return cached counter value from IRQ context
@ 2015-07-21 14:55 Matt Fleming
  2015-07-26  8:25 ` [tip:perf/urgent] " tip-bot for Matt Fleming
  2015-10-06  7:16 ` [tip:perf/core] perf tests: Add Intel CQM test tip-bot for Matt Fleming
  0 siblings, 2 replies; 3+ messages in thread
From: Matt Fleming @ 2015-07-21 14:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Will Auld, Matt Fleming, Thomas Gleixner,
	Vikas Shivappa, Kanaka Juvva, stable

From: Matt Fleming <matt.fleming@intel.com>

Peter reported the following potential crash which I was able to
reproduce with his test program,

[  148.765788] ------------[ cut here ]------------
[  148.765796] WARNING: CPU: 34 PID: 2840 at kernel/smp.c:417 smp_call_function_many+0xb6/0x260()
[  148.765797] Modules linked in:
[  148.765800] CPU: 34 PID: 2840 Comm: perf Not tainted 4.2.0-rc1+ #4
[  148.765803]  ffffffff81cdc398 ffff88085f105950 ffffffff818bdfd5 0000000000000007
[  148.765805]  0000000000000000 ffff88085f105990 ffffffff810e413a 0000000000000000
[  148.765807]  ffffffff82301080 0000000000000022 ffffffff8107f640 ffffffff8107f640
[  148.765809] Call Trace:
[  148.765810]  <NMI>  [<ffffffff818bdfd5>] dump_stack+0x45/0x57
[  148.765818]  [<ffffffff810e413a>] warn_slowpath_common+0x8a/0xc0
[  148.765822]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[  148.765824]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[  148.765825]  [<ffffffff810e422a>] warn_slowpath_null+0x1a/0x20
[  148.765827]  [<ffffffff811613f6>] smp_call_function_many+0xb6/0x260
[  148.765829]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[  148.765831]  [<ffffffff81161748>] on_each_cpu_mask+0x28/0x60
[  148.765832]  [<ffffffff8107f6ef>] intel_cqm_event_count+0x7f/0xe0
[  148.765836]  [<ffffffff811cdd35>] perf_output_read+0x2a5/0x400
[  148.765839]  [<ffffffff811d2e5a>] perf_output_sample+0x31a/0x590
[  148.765840]  [<ffffffff811d333d>] ? perf_prepare_sample+0x26d/0x380
[  148.765841]  [<ffffffff811d3497>] perf_event_output+0x47/0x60
[  148.765843]  [<ffffffff811d36c5>] __perf_event_overflow+0x215/0x240
[  148.765844]  [<ffffffff811d4124>] perf_event_overflow+0x14/0x20
[  148.765847]  [<ffffffff8107e7f4>] intel_pmu_handle_irq+0x1d4/0x440
[  148.765849]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765853]  [<ffffffff81219bad>] ? vunmap_page_range+0x19d/0x2f0
[  148.765854]  [<ffffffff81219d11>] ? unmap_kernel_range_noflush+0x11/0x20
[  148.765859]  [<ffffffff814ce6fe>] ? ghes_copy_tofrom_phys+0x11e/0x2a0
[  148.765863]  [<ffffffff8109e5db>] ? native_apic_msr_write+0x2b/0x30
[  148.765865]  [<ffffffff8109e44d>] ? x2apic_send_IPI_self+0x1d/0x20
[  148.765869]  [<ffffffff81065135>] ? arch_irq_work_raise+0x35/0x40
[  148.765872]  [<ffffffff811c8d86>] ? irq_work_queue+0x66/0x80
[  148.765875]  [<ffffffff81075306>] perf_event_nmi_handler+0x26/0x40
[  148.765877]  [<ffffffff81063ed9>] nmi_handle+0x79/0x100
[  148.765879]  [<ffffffff81064422>] default_do_nmi+0x42/0x100
[  148.765880]  [<ffffffff81064563>] do_nmi+0x83/0xb0
[  148.765884]  [<ffffffff818c7c0f>] end_repeat_nmi+0x1e/0x2e
[  148.765886]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765888]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765890]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765891]  <<EOE>>  [<ffffffff8110ab66>] finish_task_switch+0x156/0x210
[  148.765898]  [<ffffffff818c1671>] __schedule+0x341/0x920
[  148.765899]  [<ffffffff818c1c87>] schedule+0x37/0x80
[  148.765903]  [<ffffffff810ae1af>] ? do_page_fault+0x2f/0x80
[  148.765905]  [<ffffffff818c1f4a>] schedule_user+0x1a/0x50
[  148.765907]  [<ffffffff818c666c>] retint_careful+0x14/0x32
[  148.765908] ---[ end trace e33ff2be78e14901 ]---

The CQM task events are not safe to be called from within interrupt
context because they require performing an IPI to read the counter value
on all sockets. And performing IPIs from within IRQ context is a
"no-no".

Make do with the last read counter value currently event in
event->count when we're invoked in this context.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
---

Apologies for the spam, I fat-fingered Will's email address.

 arch/x86/kernel/cpu/perf_event_intel_cqm.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
index 188076161c1b..63eb68b73589 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -952,6 +952,14 @@ static u64 intel_cqm_event_count(struct perf_event *event)
 		return 0;
 
 	/*
+	 * Getting up-to-date values requires an SMP IPI which is not
+	 * possible if we're being called in interrupt context. Return
+	 * the cached values instead.
+	 */
+	if (unlikely(in_interrupt()))
+		goto out;
+
+	/*
 	 * Notice that we don't perform the reading of an RMID
 	 * atomically, because we can't hold a spin lock across the
 	 * IPIs.
-- 
2.1.0


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [tip:perf/urgent] perf/x86/intel/cqm: Return cached counter value from IRQ context
  2015-07-21 14:55 [RESEND][PATCH] perf/x86/intel/cqm: Return cached counter value from IRQ context Matt Fleming
@ 2015-07-26  8:25 ` tip-bot for Matt Fleming
  2015-10-06  7:16 ` [tip:perf/core] perf tests: Add Intel CQM test tip-bot for Matt Fleming
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Matt Fleming @ 2015-07-26  8:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, kanaka.d.juvva, stable, matt.fleming, linux-kernel,
	vikas.shivappa, will.auld, peterz, tglx, mingo

Commit-ID:  2c534c0da0a68418693e10ce1c4146e085f39518
Gitweb:     http://git.kernel.org/tip/2c534c0da0a68418693e10ce1c4146e085f39518
Author:     Matt Fleming <matt.fleming@intel.com>
AuthorDate: Tue, 21 Jul 2015 15:55:09 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 26 Jul 2015 10:22:29 +0200

perf/x86/intel/cqm: Return cached counter value from IRQ context

Peter reported the following potential crash which I was able to
reproduce with his test program,

[  148.765788] ------------[ cut here ]------------
[  148.765796] WARNING: CPU: 34 PID: 2840 at kernel/smp.c:417 smp_call_function_many+0xb6/0x260()
[  148.765797] Modules linked in:
[  148.765800] CPU: 34 PID: 2840 Comm: perf Not tainted 4.2.0-rc1+ #4
[  148.765803]  ffffffff81cdc398 ffff88085f105950 ffffffff818bdfd5 0000000000000007
[  148.765805]  0000000000000000 ffff88085f105990 ffffffff810e413a 0000000000000000
[  148.765807]  ffffffff82301080 0000000000000022 ffffffff8107f640 ffffffff8107f640
[  148.765809] Call Trace:
[  148.765810]  <NMI>  [<ffffffff818bdfd5>] dump_stack+0x45/0x57
[  148.765818]  [<ffffffff810e413a>] warn_slowpath_common+0x8a/0xc0
[  148.765822]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[  148.765824]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[  148.765825]  [<ffffffff810e422a>] warn_slowpath_null+0x1a/0x20
[  148.765827]  [<ffffffff811613f6>] smp_call_function_many+0xb6/0x260
[  148.765829]  [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60
[  148.765831]  [<ffffffff81161748>] on_each_cpu_mask+0x28/0x60
[  148.765832]  [<ffffffff8107f6ef>] intel_cqm_event_count+0x7f/0xe0
[  148.765836]  [<ffffffff811cdd35>] perf_output_read+0x2a5/0x400
[  148.765839]  [<ffffffff811d2e5a>] perf_output_sample+0x31a/0x590
[  148.765840]  [<ffffffff811d333d>] ? perf_prepare_sample+0x26d/0x380
[  148.765841]  [<ffffffff811d3497>] perf_event_output+0x47/0x60
[  148.765843]  [<ffffffff811d36c5>] __perf_event_overflow+0x215/0x240
[  148.765844]  [<ffffffff811d4124>] perf_event_overflow+0x14/0x20
[  148.765847]  [<ffffffff8107e7f4>] intel_pmu_handle_irq+0x1d4/0x440
[  148.765849]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765853]  [<ffffffff81219bad>] ? vunmap_page_range+0x19d/0x2f0
[  148.765854]  [<ffffffff81219d11>] ? unmap_kernel_range_noflush+0x11/0x20
[  148.765859]  [<ffffffff814ce6fe>] ? ghes_copy_tofrom_phys+0x11e/0x2a0
[  148.765863]  [<ffffffff8109e5db>] ? native_apic_msr_write+0x2b/0x30
[  148.765865]  [<ffffffff8109e44d>] ? x2apic_send_IPI_self+0x1d/0x20
[  148.765869]  [<ffffffff81065135>] ? arch_irq_work_raise+0x35/0x40
[  148.765872]  [<ffffffff811c8d86>] ? irq_work_queue+0x66/0x80
[  148.765875]  [<ffffffff81075306>] perf_event_nmi_handler+0x26/0x40
[  148.765877]  [<ffffffff81063ed9>] nmi_handle+0x79/0x100
[  148.765879]  [<ffffffff81064422>] default_do_nmi+0x42/0x100
[  148.765880]  [<ffffffff81064563>] do_nmi+0x83/0xb0
[  148.765884]  [<ffffffff818c7c0f>] end_repeat_nmi+0x1e/0x2e
[  148.765886]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765888]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765890]  [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0
[  148.765891]  <<EOE>>  [<ffffffff8110ab66>] finish_task_switch+0x156/0x210
[  148.765898]  [<ffffffff818c1671>] __schedule+0x341/0x920
[  148.765899]  [<ffffffff818c1c87>] schedule+0x37/0x80
[  148.765903]  [<ffffffff810ae1af>] ? do_page_fault+0x2f/0x80
[  148.765905]  [<ffffffff818c1f4a>] schedule_user+0x1a/0x50
[  148.765907]  [<ffffffff818c666c>] retint_careful+0x14/0x32
[  148.765908] ---[ end trace e33ff2be78e14901 ]---

The CQM task events are not safe to be called from within interrupt
context because they require performing an IPI to read the counter value
on all sockets. And performing IPIs from within IRQ context is a
"no-no".

Make do with the last read counter value currently event in
event->count when we're invoked in this context.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Will Auld <will.auld@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1437490509-15373-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/perf_event_intel_cqm.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
index 1880761..63eb68b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -952,6 +952,14 @@ static u64 intel_cqm_event_count(struct perf_event *event)
 		return 0;
 
 	/*
+	 * Getting up-to-date values requires an SMP IPI which is not
+	 * possible if we're being called in interrupt context. Return
+	 * the cached values instead.
+	 */
+	if (unlikely(in_interrupt()))
+		goto out;
+
+	/*
 	 * Notice that we don't perform the reading of an RMID
 	 * atomically, because we can't hold a spin lock across the
 	 * IPIs.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [tip:perf/core] perf tests: Add Intel CQM test
  2015-07-21 14:55 [RESEND][PATCH] perf/x86/intel/cqm: Return cached counter value from IRQ context Matt Fleming
  2015-07-26  8:25 ` [tip:perf/urgent] " tip-bot for Matt Fleming
@ 2015-10-06  7:16 ` tip-bot for Matt Fleming
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Matt Fleming @ 2015-10-06  7:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: fenghua.yu, vikas.shivappa, mingo, jolsa, acme, adrian.hunter,
	matt.fleming, peterz, ak, hpa, vince, kanaka.d.juvva,
	linux-kernel, tglx

Commit-ID:  035827e9f2bd71a280f4eb58c65811d377ab2217
Gitweb:     http://git.kernel.org/tip/035827e9f2bd71a280f4eb58c65811d377ab2217
Author:     Matt Fleming <matt.fleming@intel.com>
AuthorDate: Mon, 5 Oct 2015 15:40:21 +0100
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 5 Oct 2015 16:56:07 -0300

perf tests: Add Intel CQM test

Peter reports that it's possible to trigger a WARN_ON_ONCE() in the
Intel CQM code by combining a hardware event and an Intel CQM
(software) event into a group. Unfortunately, the perf tools are not
able to create this bundle and we need to manually construct a test
case.

For posterity, record Peter's proof of concept test case in tools/perf
so that it presents a model for how we can perform architecture
specific tests, or "arch tests", in perf in the future.

The particular issue triggered in the test case is that when the
counter for the hardware event overflows and triggers a PMI we'll read
both the hardware event and the software event counters.
Unfortunately, for CQM that involves performing an IPI to read the CQM
event counters on all sockets, which in NMI context triggers the
WARN_ON_ONCE().

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Vince Weaver <vince@deater.net>
Link: http://lkml.kernel.org/r/1437490509-15373-1-git-send-email-matt@codeblueprint.co.uk
Link: http://lkml.kernel.org/n/tip-3p4ra0u8vzm7m289a1m799kf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/x86/include/arch-tests.h |   1 +
 tools/perf/arch/x86/tests/Build          |   1 +
 tools/perf/arch/x86/tests/arch-tests.c   |   4 +
 tools/perf/arch/x86/tests/intel-cqm.c    | 124 +++++++++++++++++++++++++++++++
 4 files changed, 130 insertions(+)

diff --git a/tools/perf/arch/x86/include/arch-tests.h b/tools/perf/arch/x86/include/arch-tests.h
index 5927cf2..7ed00f4 100644
--- a/tools/perf/arch/x86/include/arch-tests.h
+++ b/tools/perf/arch/x86/include/arch-tests.h
@@ -5,6 +5,7 @@
 int test__rdpmc(void);
 int test__perf_time_to_tsc(void);
 int test__insn_x86(void);
+int test__intel_cqm_count_nmi_context(void);
 
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
 struct thread;
diff --git a/tools/perf/arch/x86/tests/Build b/tools/perf/arch/x86/tests/Build
index 8e2c5a3..cbb7e97 100644
--- a/tools/perf/arch/x86/tests/Build
+++ b/tools/perf/arch/x86/tests/Build
@@ -5,3 +5,4 @@ libperf-y += arch-tests.o
 libperf-y += rdpmc.o
 libperf-y += perf-time-to-tsc.o
 libperf-$(CONFIG_AUXTRACE) += insn-x86.o
+libperf-y += intel-cqm.o
diff --git a/tools/perf/arch/x86/tests/arch-tests.c b/tools/perf/arch/x86/tests/arch-tests.c
index d116c21..2218cb6 100644
--- a/tools/perf/arch/x86/tests/arch-tests.c
+++ b/tools/perf/arch/x86/tests/arch-tests.c
@@ -24,6 +24,10 @@ struct test arch_tests[] = {
 	},
 #endif
 	{
+		.desc = "Test intel cqm nmi context read",
+		.func = test__intel_cqm_count_nmi_context,
+	},
+	{
 		.func = NULL,
 	},
 
diff --git a/tools/perf/arch/x86/tests/intel-cqm.c b/tools/perf/arch/x86/tests/intel-cqm.c
new file mode 100644
index 0000000..d28c1b6
--- /dev/null
+++ b/tools/perf/arch/x86/tests/intel-cqm.c
@@ -0,0 +1,124 @@
+#include "tests/tests.h"
+#include "perf.h"
+#include "cloexec.h"
+#include "debug.h"
+#include "evlist.h"
+#include "evsel.h"
+#include "arch-tests.h"
+
+#include <sys/mman.h>
+#include <string.h>
+
+static pid_t spawn(void)
+{
+	pid_t pid;
+
+	pid = fork();
+	if (pid)
+		return pid;
+
+	while(1);
+		sleep(5);
+	return 0;
+}
+
+/*
+ * Create an event group that contains both a sampled hardware
+ * (cpu-cycles) and software (intel_cqm/llc_occupancy/) event. We then
+ * wait for the hardware perf counter to overflow and generate a PMI,
+ * which triggers an event read for both of the events in the group.
+ *
+ * Since reading Intel CQM event counters requires sending SMP IPIs, the
+ * CQM pmu needs to handle the above situation gracefully, and return
+ * the last read counter value to avoid triggering a WARN_ON_ONCE() in
+ * smp_call_function_many() caused by sending IPIs from NMI context.
+ */
+int test__intel_cqm_count_nmi_context(void)
+{
+	struct perf_evlist *evlist = NULL;
+	struct perf_evsel *evsel = NULL;
+	struct perf_event_attr pe;
+	int i, fd[2], flag, ret;
+	size_t mmap_len;
+	void *event;
+	pid_t pid;
+	int err = TEST_FAIL;
+
+	flag = perf_event_open_cloexec_flag();
+
+	evlist = perf_evlist__new();
+	if (!evlist) {
+		pr_debug("perf_evlist__new failed\n");
+		return TEST_FAIL;
+	}
+
+	ret = parse_events(evlist, "intel_cqm/llc_occupancy/", NULL);
+	if (ret) {
+		pr_debug("parse_events failed\n");
+		err = TEST_SKIP;
+		goto out;
+	}
+
+	evsel = perf_evlist__first(evlist);
+	if (!evsel) {
+		pr_debug("perf_evlist__first failed\n");
+		goto out;
+	}
+
+	memset(&pe, 0, sizeof(pe));
+	pe.size = sizeof(pe);
+
+	pe.type = PERF_TYPE_HARDWARE;
+	pe.config = PERF_COUNT_HW_CPU_CYCLES;
+	pe.read_format = PERF_FORMAT_GROUP;
+
+	pe.sample_period = 128;
+	pe.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_READ;
+
+	pid = spawn();
+
+	fd[0] = sys_perf_event_open(&pe, pid, -1, -1, flag);
+	if (fd[0] < 0) {
+		pr_debug("failed to open event\n");
+		goto out;
+	}
+
+	memset(&pe, 0, sizeof(pe));
+	pe.size = sizeof(pe);
+
+	pe.type = evsel->attr.type;
+	pe.config = evsel->attr.config;
+
+	fd[1] = sys_perf_event_open(&pe, pid, -1, fd[0], flag);
+	if (fd[1] < 0) {
+		pr_debug("failed to open event\n");
+		goto out;
+	}
+
+	/*
+	 * Pick a power-of-two number of pages + 1 for the meta-data
+	 * page (struct perf_event_mmap_page). See tools/perf/design.txt.
+	 */
+	mmap_len = page_size * 65;
+
+	event = mmap(NULL, mmap_len, PROT_READ, MAP_SHARED, fd[0], 0);
+	if (event == (void *)(-1)) {
+		pr_debug("failed to mmap %d\n", errno);
+		goto out;
+	}
+
+	sleep(1);
+
+	err = TEST_OK;
+
+	munmap(event, mmap_len);
+
+	for (i = 0; i < 2; i++)
+		close(fd[i]);
+
+	kill(pid, SIGKILL);
+	wait(NULL);
+out:
+	perf_evlist__delete(evlist);
+	return err;
+}

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-10-06  7:17 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-21 14:55 [RESEND][PATCH] perf/x86/intel/cqm: Return cached counter value from IRQ context Matt Fleming
2015-07-26  8:25 ` [tip:perf/urgent] " tip-bot for Matt Fleming
2015-10-06  7:16 ` [tip:perf/core] perf tests: Add Intel CQM test tip-bot for Matt Fleming

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.