linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/4] x86: Intel Processor Trace Logger
@ 2015-09-08  5:49 Takao Indoh
  2015-09-08  5:49 ` [PATCH v2 1/4] perf/trace: Add function to find event type by name Takao Indoh
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Takao Indoh @ 2015-09-08  5:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Vivek Goyal,
	Steven Rostedt
  Cc: linux-kernel, x86

Hi all,

This patch series provides a logging feature for Intel Processor Trace
(Intel PT).

Intel PT is a new feature of the Intel "Broadwell" CPU that captures
information about program execution flow. Here is an article about
Intel PT:
https://software.intel.com/en-us/blogs/2013/09/18/processor-tracing

Once Intel PT is enabled, events that change program flow, such as
branch instructions, exceptions, interrupts, traps and so on, are
logged in memory. This is very useful for debugging because it shows
the detailed behavior of the software.

This series creates a log buffer for Intel PT and enables logging at
boot time. When a kernel panic occurs, the log buffer can be retrieved
from the crash dump file captured by kdump and used to reconstruct the
flow that led to the panic.

changelog:
v2:
- Reimplement using perf_event_create_kernel_counter

v1:
https://lkml.org/lkml/2015/7/29/6

Takao Indoh (4):
  perf/trace: Add function to find event type by name
  perf: Add function to enable perf events in kernel with ring buffer
  perf/x86/intel/pt: Add Intel PT logger
  x86: Stop Intel PT and save its registers when panic occurs

 arch/x86/Kconfig                          |   16 +++
 arch/x86/include/asm/intel_pt_log.h       |   13 ++
 arch/x86/kernel/cpu/Makefile              |    2 +
 arch/x86/kernel/cpu/intel_pt_log.c        |  178 +++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event_intel_pt.c |    6 +
 arch/x86/kernel/crash.c                   |    9 ++
 include/linux/perf_event.h                |   10 ++
 include/linux/trace_events.h              |    2 +
 kernel/events/core.c                      |   70 +++++++++++-
 kernel/trace/trace_event_perf.c           |   22 ++++
 10 files changed, 323 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/include/asm/intel_pt_log.h
 create mode 100644 arch/x86/kernel/cpu/intel_pt_log.c




* [PATCH v2 1/4] perf/trace: Add function to find event type by name
  2015-09-08  5:49 [PATCH v2 0/4] x86: Intel Processor Trace Logger Takao Indoh
@ 2015-09-08  5:49 ` Takao Indoh
  2015-09-08  5:49 ` [PATCH v2 2/4] perf: Add function to enable perf events in kernel with ring buffer Takao Indoh
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Takao Indoh @ 2015-09-08  5:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Vivek Goyal,
	Steven Rostedt
  Cc: linux-kernel, x86

This patch adds a function that looks up a trace event by system and
event name (for example "sched" and "sched_switch") and returns its type
so that the Intel PT logger can enable the trace event in the kernel.
The Intel PT logger needs this because it relies on sched_switch tracing
to collect side-band data.
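
The lookup convention described above (match on both system and event
name, return 0 when nothing matches) can be sketched in userspace C as
follows. This is only an illustration: struct demo_event, the table
contents, the type values and demo_get_type_by_name() are all made up
for the sketch and are not the kernel's trace_event_call structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a registered trace event; not a kernel type. */
struct demo_event {
	const char *system;	/* subsystem, e.g. "sched" */
	const char *name;	/* event name, e.g. "sched_switch" */
	int type;		/* nonzero id (values invented for the sketch) */
};

/* Illustrative table standing in for the kernel's event list. */
static const struct demo_event demo_events[] = {
	{ "sched", "sched_switch",      301 },
	{ "sched", "sched_wakeup",      302 },
	{ "irq",   "irq_handler_entry", 120 },
};

/* Return the type of system:name, or 0 (treated as invalid) if absent. */
int demo_get_type_by_name(const char *system, const char *name)
{
	size_t i;

	assert(system && name);
	for (i = 0; i < sizeof(demo_events) / sizeof(demo_events[0]); i++) {
		if (!strcmp(demo_events[i].system, system) &&
		    !strcmp(demo_events[i].name, name))
			return demo_events[i].type;
	}
	return 0;
}
```

A caller treats a zero return as "event not found", which is how the
logger in patch 3/4 decides whether it can go on to set up the
sched_switch event.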

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
---
 include/linux/trace_events.h    |    2 ++
 kernel/trace/trace_event_perf.c |   22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index ed27917..d3cae4b 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -616,6 +616,8 @@ perf_trace_buf_submit(void *raw_data, int size, int rctx, u64 addr,
 {
 	perf_tp_event(addr, count, raw_data, size, regs, head, rctx, task);
 }
+
+int perf_trace_event_get_type_by_name(char *system, char *name);
 #endif
 
 #endif /* _LINUX_TRACE_EVENT_H */
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index abfc903..1a851d5 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -21,6 +21,28 @@ typedef typeof(unsigned long [PERF_MAX_TRACE_SIZE / sizeof(unsigned long)])
 /* Count the events in use (per event id, not per instance) */
 static int	total_ref_count;
 
+int perf_trace_event_get_type_by_name(char *system, char *name)
+{
+	struct trace_event_call *tp_event;
+	int ret = 0;
+	/*
+	 * All types are larger than __TRACE_LAST_TYPE + 1. Therefore return
+	 * zero as an invalid type if not found.
+	 */
+
+	mutex_lock(&event_mutex);
+	list_for_each_entry(tp_event, &ftrace_events, list) {
+		if (!strcmp(tp_event->class->system, system) &&
+		    !strcmp(trace_event_name(tp_event), name)) {
+			ret = tp_event->event.type;
+			break;
+		}
+	}
+	mutex_unlock(&event_mutex);
+
+	return ret;
+}
+
 static int perf_trace_event_perm(struct trace_event_call *tp_event,
 				 struct perf_event *p_event)
 {
-- 
1.7.1




* [PATCH v2 2/4] perf: Add function to enable perf events in kernel with ring buffer
  2015-09-08  5:49 [PATCH v2 0/4] x86: Intel Processor Trace Logger Takao Indoh
  2015-09-08  5:49 ` [PATCH v2 1/4] perf/trace: Add function to find event type by name Takao Indoh
@ 2015-09-08  5:49 ` Takao Indoh
  2015-09-08  9:32   ` Alexander Shishkin
  2015-09-08  5:49 ` [PATCH v2 3/4] perf/x86/intel/pt: Add Intel PT logger Takao Indoh
  2015-09-08  5:49 ` [PATCH v2 4/4] x86: Stop Intel PT and save its registers when panic occurs Takao Indoh
  3 siblings, 1 reply; 11+ messages in thread
From: Takao Indoh @ 2015-09-08  5:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Vivek Goyal,
	Steven Rostedt
  Cc: linux-kernel, x86

perf_event_create_kernel_counter is used to enable perf events in the
kernel without a buffer for logging their events. This patch adds a new
function that enables perf events with a ring buffer. The Intel PT logger
uses this to enable Intel PT and some associated events with its log
buffer.
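
The three cases the new function distinguishes (share another event's
buffer, allocate a fresh one, or run without a buffer) can be modelled
in userspace C roughly as below. All demo_* names are hypothetical
stand-ins, calloc() replaces the kernel allocators, and returning
-EINVAL for an output event that has no buffer is an assumption of this
sketch rather than something the patch states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the perf ring buffer and event. */
struct demo_rb {
	int nr_pages;		/* data pages */
	int nr_pages_aux;	/* AUX pages, 0 if none */
};

struct demo_event {
	struct demo_rb *rb;	/* NULL when the event has no buffer */
	int owns_rb;		/* 1 if this event allocated rb itself */
};

void demo_free(struct demo_event *ev)
{
	if (!ev)
		return;
	if (ev->owns_rb)
		free(ev->rb);
	free(ev);
}

/* Returns 0 and stores the event in *out, or a negative errno value. */
int demo_create_with_buffer(int nr_pages, int nr_pages_aux,
			    struct demo_event *output_event,
			    struct demo_event **out)
{
	struct demo_event *ev;

	assert(out);
	ev = calloc(1, sizeof(*ev));
	if (!ev)
		return -ENOMEM;

	if (output_event) {
		/* Redirect output into the other event's buffer. */
		if (!output_event->rb) {
			demo_free(ev);
			return -EINVAL;
		}
		ev->rb = output_event->rb;
	} else if (nr_pages) {
		/* Allocate a private buffer, optionally with an AUX area. */
		ev->rb = calloc(1, sizeof(*ev->rb));
		if (!ev->rb) {
			demo_free(ev);
			return -ENOMEM;
		}
		ev->owns_rb = 1;
		ev->rb->nr_pages = nr_pages;
		ev->rb->nr_pages_aux = nr_pages_aux;
	}

	*out = ev;
	return 0;
}

/* The old entry point becomes a thin wrapper with "no buffer" defaults. */
int demo_create(struct demo_event **out)
{
	return demo_create_with_buffer(0, 0, NULL, out);
}
```

Keeping the old entry point as a wrapper means existing callers are
unaffected, while a caller such as the logger can create one event that
owns the buffer and attach the side-band events to it.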

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
---
 include/linux/perf_event.h |   10 ++++++
 kernel/events/core.c       |   70 ++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2027809..34ada8c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -657,6 +657,16 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr,
 				struct task_struct *task,
 				perf_overflow_handler_t callback,
 				void *context);
+extern struct perf_event *
+perf_event_create_kernel_counter_with_buffer(struct perf_event_attr *attr,
+					int cpu,
+					struct task_struct *task,
+					perf_overflow_handler_t callback,
+					void *context,
+					int flags,
+					int nr_pages,
+					int nr_pages_aux,
+					struct perf_event *output_event);
 extern void perf_pmu_migrate_context(struct pmu *pmu,
 				int src_cpu, int dst_cpu);
 extern u64 perf_event_read_value(struct perf_event *event,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ae16867..c9d8a59 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8356,21 +8356,33 @@ err_fd:
 }
 
 /**
- * perf_event_create_kernel_counter
+ * perf_event_create_kernel_counter_with_buffer
  *
  * @attr: attributes of the counter to create
  * @cpu: cpu in which the counter is bound
  * @task: task to profile (NULL for percpu)
+ * @overflow_handler: handler for overflow event
+ * @context: target context
+ * @flags: flags of ring buffer
+ * @nr_pages: size (number of pages) of buffer
+ * @nr_pages_aux: size (number of pages) of aux buffer
+ * @output_event: event to be attached
  */
 struct perf_event *
-perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
-				 struct task_struct *task,
-				 perf_overflow_handler_t overflow_handler,
-				 void *context)
+perf_event_create_kernel_counter_with_buffer(struct perf_event_attr *attr,
+				int cpu,
+				struct task_struct *task,
+				perf_overflow_handler_t overflow_handler,
+				void *context,
+				int flags,
+				int nr_pages,
+				int nr_pages_aux,
+				struct perf_event *output_event)
 {
 	struct perf_event_context *ctx;
 	struct perf_event *event;
 	int err;
+	struct ring_buffer *rb = NULL;
 
 	/*
 	 * Get the target context (task or percpu):
@@ -8383,6 +8395,31 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 		goto err;
 	}
 
+	if (output_event) {
+		err = perf_event_set_output(event, output_event);
+		if (err)
+			goto err_free;
+	} else if (nr_pages) {
+		rb = rb_alloc(nr_pages,
+		      event->attr.watermark ? event->attr.wakeup_watermark : 0,
+		      event->cpu, flags);
+
+		if (!rb) {
+			err = -ENOMEM;
+			goto err_free;
+		}
+
+		ring_buffer_attach(event, rb);
+
+		if (nr_pages_aux) {
+			err = rb_alloc_aux(rb, event, 0, nr_pages_aux,
+					   event->attr.aux_watermark, flags);
+
+			if (err)
+				goto err_free;
+		}
+	}
+
 	/* Mark owner so we could distinguish it from user events. */
 	event->owner = EVENT_OWNER_KERNEL;
 
@@ -8411,10 +8448,33 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 	return event;
 
 err_free:
+	if (rb && rb->aux_pages)
+		rb_free_aux(rb);
+	if (rb)
+		rb_free(rb);
 	free_event(event);
 err:
 	return ERR_PTR(err);
 }
+EXPORT_SYMBOL_GPL(perf_event_create_kernel_counter_with_buffer);
+
+/**
+ * perf_event_create_kernel_counter
+ *
+ * @attr: attributes of the counter to create
+ * @cpu: cpu in which the counter is bound
+ * @task: task to profile (NULL for percpu)
+ */
+struct perf_event *
+perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
+				 struct task_struct *task,
+				 perf_overflow_handler_t overflow_handler,
+				 void *context)
+{
+	return perf_event_create_kernel_counter_with_buffer(attr, cpu, task,
+							overflow_handler,
+							context, 0, 0, 0, NULL);
+}
 EXPORT_SYMBOL_GPL(perf_event_create_kernel_counter);
 
 void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu)
-- 
1.7.1




* [PATCH v2 3/4] perf/x86/intel/pt: Add Intel PT logger
  2015-09-08  5:49 [PATCH v2 0/4] x86: Intel Processor Trace Logger Takao Indoh
  2015-09-08  5:49 ` [PATCH v2 1/4] perf/trace: Add function to find event type by name Takao Indoh
  2015-09-08  5:49 ` [PATCH v2 2/4] perf: Add function to enable perf events in kernel with ring buffer Takao Indoh
@ 2015-09-08  5:49 ` Takao Indoh
  2015-09-08  9:48   ` Alexander Shishkin
  2015-09-08  5:49 ` [PATCH v2 4/4] x86: Stop Intel PT and save its registers when panic occurs Takao Indoh
  3 siblings, 1 reply; 11+ messages in thread
From: Takao Indoh @ 2015-09-08  5:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Vivek Goyal,
	Steven Rostedt
  Cc: linux-kernel, x86

This patch provides the Intel PT logging feature. When the system boots
with the parameter "intel_pt_log", log buffers for Intel PT are allocated
and logging starts; the hardware then writes processor flow information
to the log buffer like a flight recorder. This is very helpful for
investigating the cause of a kernel panic.

The log buffer size is specified by the parameter
"intel_pt_log_buf_len=<size>". The buffer is used as a circular buffer,
so old events are overwritten by new events.
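
As a rough illustration of the flight-recorder behavior, here is a
minimal userspace sketch of a byte buffer that overwrites its oldest
contents once full. struct demo_ring and the demo_ring_* helpers are
hypothetical; the real Intel PT output is written by hardware and does
not use this layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flight-recorder buffer; not the Intel PT or perf layout. */
struct demo_ring {
	unsigned char *data;	/* caller-provided storage */
	size_t size;		/* capacity in bytes */
	size_t head;		/* next write offset, always < size */
	int wrapped;		/* set once old data has been overwritten */
};

void demo_ring_init(struct demo_ring *r, unsigned char *storage, size_t size)
{
	assert(r && storage && size > 0);
	r->data = storage;
	r->size = size;
	r->head = 0;
	r->wrapped = 0;
}

/* Append bytes, overwriting the oldest data when the buffer is full. */
void demo_ring_write(struct demo_ring *r, const unsigned char *src, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		r->data[r->head] = src[i];
		if (++r->head == r->size) {
			r->head = 0;
			r->wrapped = 1;
		}
	}
}

/* Copy the surviving bytes, oldest first, into out; return the count. */
size_t demo_ring_read(const struct demo_ring *r, unsigned char *out)
{
	size_t n = r->wrapped ? r->size : r->head;
	size_t start = r->wrapped ? r->head : 0;
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = r->data[(start + i) % r->size];
	return n;
}
```

Reading the data back in order requires knowing where the writer
stopped (head in this sketch), which is why the position of the last
write has to be preserved when the system panics.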

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
---
 arch/x86/Kconfig                          |   16 +++
 arch/x86/include/asm/intel_pt_log.h       |   13 ++
 arch/x86/kernel/cpu/Makefile              |    2 +
 arch/x86/kernel/cpu/intel_pt_log.c        |  178 +++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event_intel_pt.c |    6 +
 5 files changed, 215 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/intel_pt_log.h
 create mode 100644 arch/x86/kernel/cpu/intel_pt_log.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f37010f..2b99ba2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1722,6 +1722,22 @@ config X86_INTEL_MPX
 
 	  If unsure, say N.
 
+config X86_INTEL_PT_LOG
+	prompt "Intel PT logger"
+	def_bool n
+	depends on PERF_EVENTS && CPU_SUP_INTEL
+	---help---
+	  Intel PT is a hardware feature that captures information
+	  about program execution flow. Once Intel PT is enabled,
+	  events that change program flow, such as branch instructions,
+	  exceptions, interrupts, traps and so on, are logged in
+	  memory.
+
+	  This option enables starting Intel PT logging at boot
+	  time. When a kernel panic occurs, the Intel PT log buffer can
+	  be retrieved from the crash dump file and used to reconstruct
+	  the detailed flow that led to the panic.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/include/asm/intel_pt_log.h b/arch/x86/include/asm/intel_pt_log.h
new file mode 100644
index 0000000..cef63f7
--- /dev/null
+++ b/arch/x86/include/asm/intel_pt_log.h
@@ -0,0 +1,13 @@
+#ifndef __INTEL_PT_LOG_H__
+#define __INTEL_PT_LOG_H__
+
+#if defined(CONFIG_X86_INTEL_PT_LOG)
+
+#include <linux/perf_event.h>
+
+void pt_log_start(struct pmu *pmu);
+void save_intel_pt_registers(void);
+
+#endif
+
+#endif /* __INTEL_PT_LOG_H__ */
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 4eb065c..67c17f0 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -48,6 +48,8 @@ obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)	+= perf_event_intel_uncore.o \
 					   perf_event_intel_uncore_nhmex.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_msr.o
 obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_msr.o
+
+obj-$(CONFIG_X86_INTEL_PT_LOG)		+= intel_pt_log.o
 endif
 
 
diff --git a/arch/x86/kernel/cpu/intel_pt_log.c b/arch/x86/kernel/cpu/intel_pt_log.c
new file mode 100644
index 0000000..eb345fd
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_pt_log.c
@@ -0,0 +1,178 @@
+/*
+ * Intel Processor Trace Logger
+ *
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/trace_events.h>
+#include <asm/intel_pt_log.h>
+
+#define SAMPLE_TYPE_BASE \
+	(PERF_SAMPLE_IP|PERF_SAMPLE_TID|PERF_SAMPLE_TIME|PERF_SAMPLE_IDENTIFIER)
+#define SAMPLE_TYPE_PT \
+	(SAMPLE_TYPE_BASE|PERF_SAMPLE_CPU|PERF_SAMPLE_RAW)
+#define SAMPLE_TYPE_SCHED \
+	(SAMPLE_TYPE_BASE|PERF_SAMPLE_CPU|PERF_SAMPLE_PERIOD|PERF_SAMPLE_RAW)
+#define SAMPLE_TYPE_DUMMY \
+	(SAMPLE_TYPE_BASE)
+
+/* intel_pt */
+static struct perf_event_attr pt_attr_pt = {
+	.config		= 0x400, /* bit10: TSCEn */
+	.size		= sizeof(struct perf_event_attr),
+	.sample_type	= SAMPLE_TYPE_PT,
+	.read_format	= PERF_FORMAT_ID,
+	.inherit	= 1,
+	.pinned		= 1,
+	.sample_id_all	= 1,
+	.exclude_guest	= 1
+};
+
+/* sched:sched_switch */
+static struct perf_event_attr pt_attr_sched = {
+	.type		= PERF_TYPE_TRACEPOINT,
+	.size		= sizeof(struct perf_event_attr),
+	.sample_type	= SAMPLE_TYPE_SCHED,
+	.read_format	= PERF_FORMAT_ID,
+	.inherit	= 1,
+	.sample_id_all	= 1,
+	.exclude_guest	= 1
+};
+
+/* dummy:u */
+static struct perf_event_attr pt_attr_dummy = {
+	.type		= PERF_TYPE_SOFTWARE,
+	.config		= PERF_COUNT_SW_DUMMY,
+	.size		= sizeof(struct perf_event_attr),
+	.sample_type	= SAMPLE_TYPE_DUMMY,
+	.read_format	= PERF_FORMAT_ID,
+	.inherit	= 1,
+	.exclude_kernel = 1,
+	.exclude_hv     = 1,
+	.comm		= 1,
+	.task		= 1,
+	.sample_id_all	= 1,
+	.comm_exec	= 1
+};
+
+static int pt_log_enabled;
+static int pt_log_buf_nr_pages = 128; /* number of pages for log buffer */
+static struct cpumask pt_log_cpu_mask;
+
+static DEFINE_PER_CPU(struct perf_event *, pt_perf_event_pt);
+static DEFINE_PER_CPU(struct perf_event *, pt_perf_event_sched);
+static DEFINE_PER_CPU(struct perf_event *, pt_perf_event_dummy);
+
+/* Saved registers on panic */
+static DEFINE_PER_CPU(u64, saved_msr_ctl);
+static DEFINE_PER_CPU(u64, saved_msr_status);
+static DEFINE_PER_CPU(u64, saved_msr_output_base);
+static DEFINE_PER_CPU(u64, saved_msr_output_mask);
+
+void save_intel_pt_registers(void)
+{
+	int cpu = smp_processor_id();
+	u64 ctl;
+
+	if (!cpumask_test_cpu(cpu, &pt_log_cpu_mask))
+		return;
+
+	/* Save RTIT_CTL register */
+	rdmsrl(MSR_IA32_RTIT_CTL, ctl);
+	per_cpu(saved_msr_ctl, cpu) = ctl;
+
+	/* Stop tracing */
+	ctl &= ~RTIT_CTL_TRACEEN;
+	wrmsrl(MSR_IA32_RTIT_CTL, ctl);
+
+	/* Save other registers */
+	rdmsrl(MSR_IA32_RTIT_STATUS, per_cpu(saved_msr_status, cpu));
+	rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, per_cpu(saved_msr_output_base, cpu));
+	rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, per_cpu(saved_msr_output_mask, cpu));
+}
+
+static int pt_enable_kernel_counter(int cpu)
+{
+	struct perf_event *event = NULL;
+
+	/* Create counter for intel_pt */
+	event = perf_event_create_kernel_counter_with_buffer(&pt_attr_pt,
+			cpu, NULL, NULL, NULL, 0,
+			pt_log_buf_nr_pages, pt_log_buf_nr_pages, event);
+
+	if (IS_ERR(event)) {
+		pr_err("failed to create counter for pt: cpu=%d, err=%ld\n",
+			cpu, PTR_ERR(event));
+		return -1;
+	}
+	per_cpu(pt_perf_event_pt, cpu) = event;
+
+	/* Create counter for side-band data (sched:sched_switch) */
+	event = perf_event_create_kernel_counter_with_buffer(&pt_attr_sched,
+			cpu, NULL, NULL, NULL, 0, 0, 0, event);
+
+	if (IS_ERR(event))
+		pr_warn("failed to create counter for sched: cpu=%d, err=%ld\n",
+			cpu, PTR_ERR(event));
+	else
+		per_cpu(pt_perf_event_sched, cpu) = event;
+
+	/* Create counter for side-band data (dummy:u) */
+	event = perf_event_create_kernel_counter_with_buffer(&pt_attr_dummy,
+			cpu, NULL, NULL, NULL, 0, 0, 0, event);
+
+	if (IS_ERR(event))
+		pr_warn("failed to create counter for dummy: cpu=%d, err=%ld\n",
+			cpu, PTR_ERR(event));
+	else
+		per_cpu(pt_perf_event_dummy, cpu) = event;
+
+	return 0;
+}
+
+static __init int pt_log_buf_setup(char *str)
+{
+	int len;
+
+	if (get_option(&str, &len))
+		pt_log_buf_nr_pages = len >> PAGE_SHIFT;
+
+	return 1;
+}
+__setup("intel_pt_log_buf_len=", pt_log_buf_setup);
+
+static __init int pt_log_setup(char *str)
+{
+	pt_log_enabled = 1;
+	return 1;
+}
+__setup("intel_pt_log", pt_log_setup);
+
+__init void pt_log_start(struct pmu *pmu)
+{
+	int cpu, type;
+
+	cpumask_clear(&pt_log_cpu_mask);
+
+	if (!pt_log_enabled)
+		return;
+
+	type = perf_trace_event_get_type_by_name("sched", "sched_switch");
+	if (!type) {
+		pr_err("Cannot find sched:sched_switch event\n");
+		return;
+	}
+
+	pt_attr_sched.config = type;
+	pt_attr_sched.sample_period = 1;
+	pt_attr_pt.type = pmu->type;
+
+	get_online_cpus();
+	for_each_online_cpu(cpu) {
+		if (!pt_enable_kernel_counter(cpu))
+			cpumask_set_cpu(cpu, &pt_log_cpu_mask);
+	}
+	put_online_cpus();
+}
+
diff --git a/arch/x86/kernel/cpu/perf_event_intel_pt.c b/arch/x86/kernel/cpu/perf_event_intel_pt.c
index 4216928..5154670 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_pt.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_pt.c
@@ -27,6 +27,7 @@
 #include <asm/perf_event.h>
 #include <asm/insn.h>
 #include <asm/io.h>
+#include <asm/intel_pt_log.h>
 
 #include "perf_event.h"
 #include "intel_pt.h"
@@ -1173,6 +1174,11 @@ static __init int pt_init(void)
 	pt_pmu.pmu.free_aux	= pt_buffer_free_aux;
 	ret = perf_pmu_register(&pt_pmu.pmu, "intel_pt", -1);
 
+#ifdef CONFIG_X86_INTEL_PT_LOG
+	if (!ret)
+		pt_log_start(&pt_pmu.pmu);
+#endif
+
 	return ret;
 }
 arch_initcall(pt_init);
-- 
1.7.1




* [PATCH v2 4/4] x86: Stop Intel PT and save its registers when panic occurs
  2015-09-08  5:49 [PATCH v2 0/4] x86: Intel Processor Trace Logger Takao Indoh
                   ` (2 preceding siblings ...)
  2015-09-08  5:49 ` [PATCH v2 3/4] perf/x86/intel/pt: Add Intel PT logger Takao Indoh
@ 2015-09-08  5:49 ` Takao Indoh
  3 siblings, 0 replies; 11+ messages in thread
From: Takao Indoh @ 2015-09-08  5:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Vivek Goyal,
	Steven Rostedt
  Cc: linux-kernel, x86

When a panic occurs, Intel PT logging is stopped to prevent it from
overwriting its log buffer. The Intel PT registers are saved in memory
on panic because a debugger needs them to find the last position where
Intel PT wrote data.
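
As a sketch of how a debugger might use the saved output-mask register,
the code below decodes a 64-bit value under the assumption that tracing
used ToPA output, where bits 31:7 hold the index of the current ToPA
entry and bits 63:32 hold the byte offset within the current output
region. The demo_* names are hypothetical, and treating every region as
the same size is a simplification made only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of a saved RTIT output-mask register value. */
struct demo_pt_pos {
	uint32_t topa_index;	/* bits 31:7: current entry in the ToPA table */
	uint32_t output_off;	/* bits 63:32: byte offset in current region */
};

struct demo_pt_pos demo_decode_output_mask(uint64_t mask_ptrs)
{
	struct demo_pt_pos pos;

	pos.topa_index = (uint32_t)((mask_ptrs & 0xffffff80ULL) >> 7);
	pos.output_off = (uint32_t)(mask_ptrs >> 32);
	return pos;
}

/* Linear write offset, assuming every output region has the same size. */
uint64_t demo_write_offset(struct demo_pt_pos pos, uint64_t region_size)
{
	assert(pos.output_off < region_size);
	return (uint64_t)pos.topa_index * region_size + pos.output_off;
}
```

With the write offset known, a crash-analysis tool can treat the bytes
just before it as the most recent trace data and the bytes at and after
it as the oldest surviving data.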

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
---
 arch/x86/kernel/crash.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e068d66..78deceb 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -35,6 +35,7 @@
 #include <asm/cpu.h>
 #include <asm/reboot.h>
 #include <asm/virtext.h>
+#include <asm/intel_pt_log.h>
 
 /* Alignment required for elf header segment */
 #define ELF_CORE_HEADER_ALIGN   4096
@@ -127,6 +128,10 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
 	cpu_emergency_vmxoff();
 	cpu_emergency_svm_disable();
 
+#ifdef CONFIG_X86_INTEL_PT_LOG
+	save_intel_pt_registers();
+#endif
+
 	disable_local_APIC();
 }
 
@@ -172,6 +177,10 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 	cpu_emergency_vmxoff();
 	cpu_emergency_svm_disable();
 
+#ifdef CONFIG_X86_INTEL_PT_LOG
+	save_intel_pt_registers();
+#endif
+
 #ifdef CONFIG_X86_IO_APIC
 	/* Prevent crash_kexec() from deadlocking on ioapic_lock. */
 	ioapic_zap_locks();
-- 
1.7.1




* Re: [PATCH v2 2/4] perf: Add function to enable perf events in kernel with ring buffer
  2015-09-08  5:49 ` [PATCH v2 2/4] perf: Add function to enable perf events in kernel with ring buffer Takao Indoh
@ 2015-09-08  9:32   ` Alexander Shishkin
  2015-09-09  2:10     ` Takao Indoh
  0 siblings, 1 reply; 11+ messages in thread
From: Alexander Shishkin @ 2015-09-08  9:32 UTC (permalink / raw)
  To: Takao Indoh, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Vivek Goyal,
	Steven Rostedt
  Cc: linux-kernel, x86

Takao Indoh <indou.takao@jp.fujitsu.com> writes:

> perf_event_create_kernel_counter is used to enable perf events in the
> kernel without a buffer for logging their events. This patch adds a new
> function that enables perf events with a ring buffer. The Intel PT logger
> uses this to enable Intel PT and some associated events with its log
> buffer.

Have you seen [1] and related patches? I haven't gotten around to
updating them yet, but hopefully it's going to happen soon.

The problem is that for such api to work, this memory needs to be
accounted, especially when you start handling event inheritance. For
system crash dump it doesn't really matter, but I also need a similar
api for per-task core dumps, for example.

[1] https://lkml.org/lkml/2014/10/13/290

Thanks,
--
Alex


* Re: [PATCH v2 3/4] perf/x86/intel/pt: Add Intel PT logger
  2015-09-08  5:49 ` [PATCH v2 3/4] perf/x86/intel/pt: Add Intel PT logger Takao Indoh
@ 2015-09-08  9:48   ` Alexander Shishkin
  2015-09-09  2:40     ` Takao Indoh
  0 siblings, 1 reply; 11+ messages in thread
From: Alexander Shishkin @ 2015-09-08  9:48 UTC (permalink / raw)
  To: Takao Indoh, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Vivek Goyal,
	Steven Rostedt
  Cc: linux-kernel, x86

Takao Indoh <indou.takao@jp.fujitsu.com> writes:

> +/* intel_pt */
> +static struct perf_event_attr pt_attr_pt = {
> +	.config		= 0x400, /* bit10: TSCEn */

Doesn't it make sense to make these things configurable via sysfs or
whatnot?

> +static int pt_log_buf_nr_pages = 128; /* number of pages for log buffer */

Same here.

> +static struct cpumask pt_log_cpu_mask;
> +
> +static DEFINE_PER_CPU(struct perf_event *, pt_perf_event_pt);
> +static DEFINE_PER_CPU(struct perf_event *, pt_perf_event_sched);
> +static DEFINE_PER_CPU(struct perf_event *, pt_perf_event_dummy);
> +
> +/* Saved registers on panic */
> +static DEFINE_PER_CPU(u64, saved_msr_ctl);
> +static DEFINE_PER_CPU(u64, saved_msr_status);
> +static DEFINE_PER_CPU(u64, saved_msr_output_base);
> +static DEFINE_PER_CPU(u64, saved_msr_output_mask);
> +
> +void save_intel_pt_registers(void)
> +{
> +	int cpu = smp_processor_id();
> +	u64 ctl;
> +
> +	if (!cpumask_test_cpu(cpu, &pt_log_cpu_mask))
> +		return;
> +
> +	/* Save RTIT_CTL register */
> +	rdmsrl(MSR_IA32_RTIT_CTL, ctl);
> +	per_cpu(saved_msr_ctl, cpu) = ctl;
> +
> +	/* Stop tracing */
> +	ctl &= ~RTIT_CTL_TRACEEN;
> +	wrmsrl(MSR_IA32_RTIT_CTL, ctl);
> +
> +	/* Save other registers */
> +	rdmsrl(MSR_IA32_RTIT_STATUS, per_cpu(saved_msr_status, cpu));
> +	rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, per_cpu(saved_msr_output_base, cpu));
> +	rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, per_cpu(saved_msr_output_mask, cpu));

I'd really like to keep the PT msr accesses confined to the intel_pt
driver. Maybe have a similar function there? That way you could also use
pt_config_start() instead of clearing TraceEn by hand.

Do you need these saved msr values for the crash tool? I'm guessing
you'd need the write pointer to figure out where the most recent data
is. But then again, if you go the perf_event_disable() path, it'll all
happen automatically in the driver. Or rather __perf_event_disable()
type of thing since this is strictly cpu-local. Or even
event::pmu::stop() would do the trick. The buffer's write head would
then be in this_cpu_ptr(&pt_ctx)->handle.head.

Thanks,
--
Alex


* Re: [PATCH v2 2/4] perf: Add function to enable perf events in kernel with ring buffer
  2015-09-08  9:32   ` Alexander Shishkin
@ 2015-09-09  2:10     ` Takao Indoh
  2015-09-15 12:00       ` Alexander Shishkin
  0 siblings, 1 reply; 11+ messages in thread
From: Takao Indoh @ 2015-09-09  2:10 UTC (permalink / raw)
  To: alexander.shishkin, tglx, mingo, hpa, a.p.zijlstra, acme, vgoyal,
	rostedt
  Cc: linux-kernel, x86

On 2015/09/08 18:32, Alexander Shishkin wrote:
> Takao Indoh <indou.takao@jp.fujitsu.com> writes:
> 
>> perf_event_create_kernel_counter is used to enable perf events in the
>> kernel without a buffer for logging their events. This patch adds a new
>> function that enables perf events with a ring buffer. The Intel PT logger
>> uses this to enable Intel PT and some associated events with its log
>> buffer.
> 
> Have you seen [1] and related patches? I haven't gotten around to
> updating them yet, but hopefully it's going to happen soon.
> 
> The problem is that for such api to work, this memory needs to be
> accounted, especially when you start handling event inheritance. For
> system crash dump it doesn't really matter, but I also need a similar
> api for per-task core dumps, for example.

I have not seen this; I'll check it. Are you or someone else working on
an API for process core dumps?

Thanks,
Takao Indoh

> 
> [1] https://lkml.org/lkml/2014/10/13/290
> 
> Thanks,
> --
> Alex
> 




* Re: [PATCH v2 3/4] perf/x86/intel/pt: Add Intel PT logger
  2015-09-08  9:48   ` Alexander Shishkin
@ 2015-09-09  2:40     ` Takao Indoh
  2015-09-15 12:01       ` Alexander Shishkin
  0 siblings, 1 reply; 11+ messages in thread
From: Takao Indoh @ 2015-09-09  2:40 UTC (permalink / raw)
  To: alexander.shishkin, tglx, mingo, hpa, a.p.zijlstra, acme, vgoyal,
	rostedt
  Cc: linux-kernel, x86

On 2015/09/08 18:48, Alexander Shishkin wrote:
> Takao Indoh <indou.takao@jp.fujitsu.com> writes:
> 
>> +/* intel_pt */
>> +static struct perf_event_attr pt_attr_pt = {
>> +	.config		= 0x400, /* bit10: TSCEn */
> 
> Doesn't it make sense to make these things configurable via sysfs or
> whatnot?

That makes sense, will do.

> 
>> +static int pt_log_buf_nr_pages = 128; /* number of pages for log buffer */
> 
> Same here.
> 
>> +static struct cpumask pt_log_cpu_mask;
>> +
>> +static DEFINE_PER_CPU(struct perf_event *, pt_perf_event_pt);
>> +static DEFINE_PER_CPU(struct perf_event *, pt_perf_event_sched);
>> +static DEFINE_PER_CPU(struct perf_event *, pt_perf_event_dummy);
>> +
>> +/* Saved registers on panic */
>> +static DEFINE_PER_CPU(u64, saved_msr_ctl);
>> +static DEFINE_PER_CPU(u64, saved_msr_status);
>> +static DEFINE_PER_CPU(u64, saved_msr_output_base);
>> +static DEFINE_PER_CPU(u64, saved_msr_output_mask);
>> +
>> +void save_intel_pt_registers(void)
>> +{
>> +	int cpu = smp_processor_id();
>> +	u64 ctl;
>> +
>> +	if (!cpumask_test_cpu(cpu, &pt_log_cpu_mask))
>> +		return;
>> +
>> +	/* Save RTIT_CTL register */
>> +	rdmsrl(MSR_IA32_RTIT_CTL, ctl);
>> +	per_cpu(saved_msr_ctl, cpu) = ctl;
>> +
>> +	/* Stop tracing */
>> +	ctl &= ~RTIT_CTL_TRACEEN;
>> +	wrmsrl(MSR_IA32_RTIT_CTL, ctl);
>> +
>> +	/* Save other registers */
>> +	rdmsrl(MSR_IA32_RTIT_STATUS, per_cpu(saved_msr_status, cpu));
>> +	rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, per_cpu(saved_msr_output_base, cpu));
>> +	rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, per_cpu(saved_msr_output_mask, cpu));
> 
> I'd really like to keep the PT msr accesses confined to the intel_pt
> driver. Maybe have a similar function there? That way you could also use
> pt_config_start() instead of clearing TraceEn by hand.
> 
> Do you need these saved msr values for the crash tool? I'm guessing
> you'd need the write pointer to figure out where the most recent data
> is. But then again, if you go the perf_event_disable() path, it'll all
> happen automatically in the driver. Or rather __perf_event_disable()
> type of thing since this is strictly cpu-local. Or even
> event::pmu::stop() would do the trick. The buffer's write head would
> then be in this_cpu_ptr(&pt_ctx)->handle.head.

Yes, what I need is the last position where the Intel PT hardware wrote
data. Once a kernel panic occurs, we should basically minimize access
to kernel data and functions because they may be broken. That is why I
touch the MSRs directly in this patch. But I agree that MSR access
should be confined to the intel_pt driver. Using pmu.stop() or
pt_event_stop() looks good to me.

Thanks,
Takao Indoh


> 
> Thanks,
> --
> Alex
> 




* Re: [PATCH v2 2/4] perf: Add function to enable perf events in kernel with ring buffer
  2015-09-09  2:10     ` Takao Indoh
@ 2015-09-15 12:00       ` Alexander Shishkin
  0 siblings, 0 replies; 11+ messages in thread
From: Alexander Shishkin @ 2015-09-15 12:00 UTC (permalink / raw)
  To: Takao Indoh, tglx, mingo, hpa, a.p.zijlstra, acme, vgoyal, rostedt
  Cc: linux-kernel, x86

Takao Indoh <indou.takao@jp.fujitsu.com> writes:

> On 2015/09/08 18:32, Alexander Shishkin wrote:
>> Takao Indoh <indou.takao@jp.fujitsu.com> writes:
>> 
>>> perf_event_create_kernel_counter is used to enable perf events in the
>>> kernel without a buffer for logging their events. This patch adds a new
>>> function that enables perf events with a ring buffer. The Intel PT logger
>>> uses this to enable Intel PT and some associated events with its log
>>> buffer.
>> 
>> Have you seen [1] and related patches? I haven't gotten around to
>> updating them yet, but hopefully it's going to happen soon.
>> 
>> The problem is that for such api to work, this memory needs to be
>> accounted, especially when you start handling event inheritance. For
>> system crash dump it doesn't really matter, but I also need a similar
>> api for per-task core dumps, for example.
>
> I have not seen this; I'll check it. Are you or someone else working on
> an API for process core dumps?

Yes, I am. I'll make sure to include you in the next round of patches.

Regards,
--
Alex


* Re: [PATCH v2 3/4] perf/x86/intel/pt: Add Intel PT logger
  2015-09-09  2:40     ` Takao Indoh
@ 2015-09-15 12:01       ` Alexander Shishkin
  0 siblings, 0 replies; 11+ messages in thread
From: Alexander Shishkin @ 2015-09-15 12:01 UTC (permalink / raw)
  To: Takao Indoh, tglx, mingo, hpa, a.p.zijlstra, acme, vgoyal, rostedt
  Cc: linux-kernel, x86

Takao Indoh <indou.takao@jp.fujitsu.com> writes:

> On 2015/09/08 18:48, Alexander Shishkin wrote:
>> Takao Indoh <indou.takao@jp.fujitsu.com> writes:
>> 
>>> +/* intel_pt */
>>> +static struct perf_event_attr pt_attr_pt = {
>>> +	.config		= 0x400, /* bit10: TSCEn */
>> 
>> Doesn't it make sense to make these things configurable via sysfs or
>> whatnot?
>
> That makes sense, will do.
>
>> 
>>> +static int pt_log_buf_nr_pages = 128; /* number of pages for log buffer */
>> 
>> Same here.
>> 
>>> +static struct cpumask pt_log_cpu_mask;
>>> +
>>> +static DEFINE_PER_CPU(struct perf_event *, pt_perf_event_pt);
>>> +static DEFINE_PER_CPU(struct perf_event *, pt_perf_event_sched);
>>> +static DEFINE_PER_CPU(struct perf_event *, pt_perf_event_dummy);
>>> +
>>> +/* Saved registers on panic */
>>> +static DEFINE_PER_CPU(u64, saved_msr_ctl);
>>> +static DEFINE_PER_CPU(u64, saved_msr_status);
>>> +static DEFINE_PER_CPU(u64, saved_msr_output_base);
>>> +static DEFINE_PER_CPU(u64, saved_msr_output_mask);
>>> +
>>> +void save_intel_pt_registers(void)
>>> +{
>>> +	int cpu = smp_processor_id();
>>> +	u64 ctl;
>>> +
>>> +	if (!cpumask_test_cpu(cpu, &pt_log_cpu_mask))
>>> +		return;
>>> +
>>> +	/* Save RTIT_CTL register */
>>> +	rdmsrl(MSR_IA32_RTIT_CTL, ctl);
>>> +	per_cpu(saved_msr_ctl, cpu) = ctl;
>>> +
>>> +	/* Stop tracing */
>>> +	ctl &= ~RTIT_CTL_TRACEEN;
>>> +	wrmsrl(MSR_IA32_RTIT_CTL, ctl);
>>> +
>>> +	/* Save other registers */
>>> +	rdmsrl(MSR_IA32_RTIT_STATUS, per_cpu(saved_msr_status, cpu));
>>> +	rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, per_cpu(saved_msr_output_base, cpu));
>>> +	rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, per_cpu(saved_msr_output_mask, cpu));
>> 
>> I'd really like to keep the PT msr accesses confined to the intel_pt
>> driver. Maybe have a similar function there? That way you could also use
>> pt_config_start() instead of clearing TraceEn by hand.
>> 
>> Do you need these saved msr values for the crash tool? I'm guessing
>> you'd need the write pointer to figure out where the most recent data
>> is. But then again, if you go the perf_event_disable() path, it'll all
>> happen automatically in the driver. Or rather __perf_event_disable()
>> type of thing since this is strictly cpu-local. Or even
>> event::pmu::stop() would do the trick. The buffer's write head would
>> then be in this_cpu_ptr(&pt_ctx)->handle.head.
>
> Yes, what I need is the last position where the Intel PT hardware wrote
> data. Once a kernel panic occurs, we should basically minimize access
> to kernel data and functions because they may be broken. That is why I
> touch the MSRs directly in this patch. But I agree that MSR access
> should be confined to the intel_pt driver. Using pmu.stop() or
> pt_event_stop() looks good to me.

Ok, thanks!

Regards,
--
Alex
