* [PATCH V3 0/6] export perf overheads information (kernel)
@ 2016-12-08 21:27 kan.liang
  2016-12-08 21:27 ` [PATCH V3 1/6] perf/core: Introduce PERF_RECORD_OVERHEAD kan.liang
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: kan.liang @ 2016-12-08 21:27 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, tglx, namhyung, jolsa, adrian.hunter,
	wangnan0, mark.rutland, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

This series includes only the kernel-side patches.

Profiling brings additional overhead. High overhead may change the
behavior of the profiled workload, reduce the accuracy of the profiling
result, and even hang the system.
Currently, perf has a dynamic interrupt throttling mechanism to lower
the sample rate and the overhead, but it has limitations.
 - The mechanism focuses only on the sampling overhead. However, other
   parts, e.g. multiplexing, also bring significant overhead.
 - The hint from the mechanism does not work with a fixed period.
 - The system changes caused by the mechanism are not recorded in
   perf.data, so users have no idea about the overhead and its impact.
Any passive approach like dynamic interrupt throttling is only
palliative. A better way is to export the overhead information, provide
more hints, and help users construct a more appropriate perf command.

In the kernel, three parts bring noticeable overhead:
  - sample overhead: for x86, the time spent in the perf NMI handler.
  - multiplexing overhead: the time spent rotating contexts.
  - side-band event overhead: the time spent iterating over all events
    that need to receive side-band events.
The time cost of these parts is stored in the pmu's per-cpu cpuctx.
The tool can call PERF_EVENT_IOC_STAT when it is 'done'; the kernel
then generates the overhead record PERF_RECORD_OVERHEAD.

Users can use the overhead information to refine their perf command and
get a more accurate profiling result. For example, on a high-overhead
warning, a user may reduce the number of events, increase the period,
or decrease the frequency.
Developers can also use the overhead information to find bugs.
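
To make this concrete, below is a minimal user-space sketch of the
intended flow (illustration only, not part of this series; the real
tool-side changes are in the separate user-space patches). Ring-buffer
setup/parsing and error handling are omitted, and the attr.overhead
bit, PERF_EVENT_IOC_STAT, and the PERF_IOC_FLAG_STAT_* flags exist only
with these patches applied.

#include <linux/perf_event.h>
#include <asm/unistd.h>
#include <sys/ioctl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_period = 100000;
	attr.overhead = 1;	/* new bit: ask the kernel to account overhead */

	/* monitor CPU 0, all tasks (needs the usual privileges) */
	fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
	if (fd < 0)
		return 1;

	/* a real tool would mmap() fd here to receive the records */

	ioctl(fd, PERF_EVENT_IOC_STAT, PERF_IOC_FLAG_STAT_START); /* reset counters */
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	sleep(1);					/* workload runs here */
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	ioctl(fd, PERF_EVENT_IOC_STAT, PERF_IOC_FLAG_STAT_DONE);  /* log PERF_RECORD_OVERHEAD */

	close(fd);
	return 0;
}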

Changes since V2:
 - Separate kernel patches from the previous version
 - Add PERF_EVENT_IOC_STAT to control overhead statistics
 - Collect per pmu overhead information
 - Store the overhead information in pmu's cpuctx
 - Add CPU information in overhead record

Changes since V1:
 - fix u32 holes and remove duplicate CPU information
 - configurable overhead logging
 - Introduce the concept of PMU specific overhead and common core
   overhead. Rename NMI overhead to PMU sample overhead
 - Add log_overhead in perf_event_context to indicate the logging of
   overhead
 - Refine the output of overhead information
 - Use perf CPU time to replace perf write data overhead
 - Refine the formula of overhead evaluation
 - Refine perf script

Kan Liang (6):
  perf/core: Introduce PERF_RECORD_OVERHEAD
  perf/core: Add PERF_EVENT_IOC_STAT to control overhead statistics
  perf/x86: implement overhead stat for x86 pmu
  perf/core: calculate multiplexing overhead
  perf/core: calculate side-band events overhead
  perf/x86: calculate sampling overhead

 arch/x86/events/core.c          | 45 +++++++++++++++++++++++-
 include/linux/perf_event.h      | 12 +++++++
 include/uapi/linux/perf_event.h | 44 ++++++++++++++++++++++-
 kernel/events/core.c            | 77 ++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 175 insertions(+), 3 deletions(-)

-- 
2.4.3


* [PATCH V3 1/6] perf/core: Introduce PERF_RECORD_OVERHEAD
  2016-12-08 21:27 [PATCH V3 0/6] export perf overheads information (kernel) kan.liang
@ 2016-12-08 21:27 ` kan.liang
  2016-12-08 21:27 ` [PATCH V3 2/6] perf/core: Add PERF_EVENT_IOC_STAT to control overhead statistics kan.liang
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: kan.liang @ 2016-12-08 21:27 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, tglx, namhyung, jolsa, adrian.hunter,
	wangnan0, mark.rutland, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

A new perf record type is introduced to export perf overhead
information to user space, so that the user can measure the sampling
overhead directly.
If the user does not want this feature, it can be switched off by
configuring the user-space tool.

The total perf overhead is the sum of the overhead of each active pmu.
A pmu's events may run on different CPUs, so calculating the perf
overhead requires collecting per-pmu, per-CPU overhead information.
Each pmu has its own per-cpu cpuctx, which is a good place to store
that information.

The overhead information is output through the existing event logging
mechanism. Note that this is per-pmu overhead, not per-event overhead.
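
For reference, a ring-buffer consumer sees the new record roughly as
sketched below (illustration only, not part of the patch). The trailing
sample_id block is optional and its layout depends on attr.sample_type
and attr.sample_id_all, as for the other side-band records. A tool can,
for example, divide entry.time by the wall-clock measurement time to
estimate the overhead ratio on entry.cpu.

/* user-space view of a PERF_RECORD_OVERHEAD record (sketch) */
struct overhead_record {
	struct perf_event_header	header;	/* header.type == PERF_RECORD_OVERHEAD */
	__u64				type;	/* enum perf_record_overhead_type */
	struct perf_overhead_entry	entry;	/* cpu, nr, total time in ns */
	/* followed by the optional struct sample_id */
};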

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/perf_event.h      |  6 ++++++
 include/uapi/linux/perf_event.h | 38 +++++++++++++++++++++++++++++++++-
 kernel/events/core.c            | 46 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 4741ecd..946e8d8 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -792,6 +792,8 @@ struct perf_cpu_context {
 
 	struct list_head		sched_cb_entry;
 	int				sched_cb_usage;
+
+	struct perf_overhead_entry	overhead[PERF_OVERHEAD_MAX];
 };
 
 struct perf_output_handle {
@@ -998,6 +1000,10 @@ perf_event__output_id_sample(struct perf_event *event,
 extern void
 perf_log_lost_samples(struct perf_event *event, u64 lost);
 
+extern void
+perf_log_overhead(struct perf_event *event, u64 type,
+		  u32 cpu, u32 nr, u64 time);
+
 static inline bool is_sampling_event(struct perf_event *event)
 {
 	return event->attr.sample_period != 0;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485..101f8b3 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -344,7 +344,8 @@ struct perf_event_attr {
 				use_clockid    :  1, /* use @clockid for time fields */
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
-				__reserved_1   : 36;
+				overhead       :  1, /* Log overhead information */
+				__reserved_1   : 35;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -862,6 +863,17 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
 
+	/*
+	 * Records perf overhead
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				type;
+	 *	struct perf_overhead_entry	entry;
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_OVERHEAD			= 16,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
@@ -980,4 +992,28 @@ struct perf_branch_entry {
 		reserved:44;
 };
 
+/*
+ * The overhead could be common overhead (in core code) or
+ * PMU-specific overhead (in pmu-specific code).
+ */
+enum perf_record_overhead_type {
+	/* common overhead */
+	/* PMU specific */
+	PERF_OVERHEAD_MAX,
+};
+
+/*
+ * single overhead record layout:
+ *
+ *	 cpu: CPU id
+ *	  nr: number of times the overhead occurred
+ *	      (e.g. for NMI, the number of times the NMI handler was called)
+ *	time: total overhead cost, in ns
+ */
+struct perf_overhead_entry {
+	__u32	cpu;
+	__u32	nr;
+	__u64	time;
+};
+
 #endif /* _UAPI_LINUX_PERF_EVENT_H */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 02c8421..1420139 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7034,6 +7034,52 @@ static void perf_log_itrace_start(struct perf_event *event)
 	perf_output_end(&handle);
 }
 
+
+/*
+ * Record overhead information
+ *
+ * The overhead logged here is the overhead of the event's pmu, not the
+ * per-event overhead. This function only takes advantage of the existing
+ * event log mechanism to log the overhead information.
+ *
+ */
+void perf_log_overhead(struct perf_event *event, u64 type,
+		       u32 cpu, u32 nr, u64 time)
+{
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	int ret;
+
+	struct {
+		struct perf_event_header	header;
+		u64				type;
+		struct perf_overhead_entry	overhead;
+	} overhead_event = {
+		.header = {
+			.type = PERF_RECORD_OVERHEAD,
+			.misc = 0,
+			.size = sizeof(overhead_event),
+		},
+		.type = type,
+		.overhead = {
+			.cpu = cpu,
+			.nr = nr,
+			.time = time,
+		},
+	};
+
+	perf_event_header__init_id(&overhead_event.header, &sample, event);
+	ret = perf_output_begin(&handle, event, overhead_event.header.size);
+
+	if (ret)
+		return;
+
+	perf_output_put(&handle, overhead_event);
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+}
+
 /*
  * Generic event overflow handling, sampling.
  */
-- 
2.4.3


* [PATCH V3 2/6] perf/core: Add PERF_EVENT_IOC_STAT to control overhead statistics
  2016-12-08 21:27 [PATCH V3 0/6] export perf overheads information (kernel) kan.liang
  2016-12-08 21:27 ` [PATCH V3 1/6] perf/core: Introduce PERF_RECORD_OVERHEAD kan.liang
@ 2016-12-08 21:27 ` kan.liang
  2016-12-08 21:27 ` [PATCH V3 3/6] perf/x86: implement overhead stat for x86 pmu kan.liang
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: kan.liang @ 2016-12-08 21:27 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, tglx, namhyung, jolsa, adrian.hunter,
	wangnan0, mark.rutland, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

The overhead statistics only need to be generated once, when perf is
done, but there is no good place in the kernel to do that.

A new ioctl, PERF_EVENT_IOC_STAT, is introduced to notify the kernel
when the tool 'starts' and when it is 'done'.
On 'start', the kernel resets the overhead counters.
On 'done', the kernel generates the pmu's overhead records.

It also needs a pmu-specific int (*stat) callback to do the real work.
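
The expected shape of such a callback, condensed from the x86
implementation in patch 3/6, is sketched below (the pmu name is made
up; the real code is in the next patch).

/* condensed sketch of a pmu->stat() callback; see patch 3/6 */
static int foo_pmu_stat(struct perf_event *event, u32 flag)
{
	struct perf_cpu_context *cpuctx;
	int cpu, i;

	if (!(flag & (PERF_IOC_FLAG_STAT_START | PERF_IOC_FLAG_STAT_DONE)))
		return -EINVAL;
	if (!event->attr.overhead)
		return -EINVAL;

	for_each_possible_cpu(cpu) {
		cpuctx = per_cpu_ptr(event->pmu->pmu_cpu_context, cpu);

		if (flag & PERF_IOC_FLAG_STAT_DONE) {
			for (i = 0; i < PERF_OVERHEAD_MAX; i++) {
				if (!cpuctx->overhead[i].nr)
					continue;
				perf_log_overhead(event, i, cpu,
						  cpuctx->overhead[i].nr,
						  cpuctx->overhead[i].time);
			}
		}
		/* both STAT_START and STAT_DONE reset the counters */
		memset(cpuctx->overhead, 0, sizeof(cpuctx->overhead));
	}

	return 0;
}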

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/perf_event.h      | 6 ++++++
 include/uapi/linux/perf_event.h | 3 +++
 kernel/events/core.c            | 5 +++++
 3 files changed, 14 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 946e8d8..a34f9a2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -403,6 +403,12 @@ struct pmu {
 	 */
 	void (*sched_task)		(struct perf_event_context *ctx,
 					bool sched_in);
+
+	/*
+	 * overhead statistics
+	 */
+	int (*stat)			(struct perf_event *event, u32 flag);
+
 	/*
 	 * PMU specific data size
 	 */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 101f8b3..23b7963 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -408,9 +408,12 @@ struct perf_event_attr {
 #define PERF_EVENT_IOC_ID		_IOR('$', 7, __u64 *)
 #define PERF_EVENT_IOC_SET_BPF		_IOW('$', 8, __u32)
 #define PERF_EVENT_IOC_PAUSE_OUTPUT	_IOW('$', 9, __u32)
+#define PERF_EVENT_IOC_STAT		_IOW('$', 10, __u32)
 
 enum perf_event_ioc_flags {
 	PERF_IOC_FLAG_GROUP		= 1U << 0,
+	PERF_IOC_FLAG_STAT_START	= 1U << 1,
+	PERF_IOC_FLAG_STAT_DONE		= 1U << 2,
 };
 
 /*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1420139..dbde193 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4637,6 +4637,11 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 		rcu_read_unlock();
 		return 0;
 	}
+	case PERF_EVENT_IOC_STAT: {
+		if (event->pmu->stat)
+			return event->pmu->stat(event, flags);
+		return 0;
+	}
 	default:
 		return -ENOTTY;
 	}
-- 
2.4.3


* [PATCH V3 3/6] perf/x86: implement overhead stat for x86 pmu
  2016-12-08 21:27 [PATCH V3 0/6] export perf overheads information (kernel) kan.liang
  2016-12-08 21:27 ` [PATCH V3 1/6] perf/core: Introduce PERF_RECORD_OVERHEAD kan.liang
  2016-12-08 21:27 ` [PATCH V3 2/6] perf/core: Add PERF_EVENT_IOC_STAT to control overhead statistics kan.liang
@ 2016-12-08 21:27 ` kan.liang
  2016-12-08 21:27 ` [PATCH V3 4/6] perf/core: calculate multiplexing overhead kan.liang
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: kan.liang @ 2016-12-08 21:27 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, tglx, namhyung, jolsa, adrian.hunter,
	wangnan0, mark.rutland, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

On STAT_START, reset the overhead counters in each possible cpuctx of
the event's pmu.
On STAT_DONE, generate the overhead information for each possible
cpuctx and reset the overhead counters.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 arch/x86/events/core.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 6e395c9..09ab36a 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2198,6 +2198,40 @@ static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
 		x86_pmu.sched_task(ctx, sched_in);
 }
 
+static int x86_pmu_stat(struct perf_event *event, u32 flag)
+{
+	struct perf_cpu_context *cpuctx;
+	struct pmu *pmu = event->pmu;
+	int cpu, i;
+
+	if (!(flag & (PERF_IOC_FLAG_STAT_START | PERF_IOC_FLAG_STAT_DONE)))
+		return -EINVAL;
+
+	if (!event->attr.overhead)
+		return -EINVAL;
+
+	if (flag & PERF_IOC_FLAG_STAT_DONE) {
+		for_each_possible_cpu(cpu) {
+			cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+
+			for (i = 0; i < PERF_OVERHEAD_MAX; i++) {
+				if (!cpuctx->overhead[i].nr)
+					continue;
+				perf_log_overhead(event, i, cpu,
+						  cpuctx->overhead[i].nr,
+						  cpuctx->overhead[i].time);
+			}
+		}
+	}
+
+	for_each_possible_cpu(cpu) {
+		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+		memset(cpuctx->overhead, 0, PERF_OVERHEAD_MAX * sizeof(struct perf_overhead_entry));
+	}
+
+	return 0;
+}
+
 void perf_check_microcode(void)
 {
 	if (x86_pmu.check_microcode)
@@ -2228,6 +2262,9 @@ static struct pmu pmu = {
 
 	.event_idx		= x86_pmu_event_idx,
 	.sched_task		= x86_pmu_sched_task,
+
+	.stat			= x86_pmu_stat,
+
 	.task_ctx_size          = sizeof(struct x86_perf_task_context),
 };
 
-- 
2.4.3


* [PATCH V3 4/6] perf/core: calculate multiplexing overhead
  2016-12-08 21:27 [PATCH V3 0/6] export perf overheads information (kernel) kan.liang
                   ` (2 preceding siblings ...)
  2016-12-08 21:27 ` [PATCH V3 3/6] perf/x86: implement overhead stat for x86 pmu kan.liang
@ 2016-12-08 21:27 ` kan.liang
  2016-12-08 21:27 ` [PATCH V3 5/6] perf/core: calculate side-band events overhead kan.liang
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: kan.liang @ 2016-12-08 21:27 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, tglx, namhyung, jolsa, adrian.hunter,
	wangnan0, mark.rutland, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Multiplexing overhead is one of the key overheads when there are more
events than available counters.

The multiplexing overhead, PERF_CORE_MUX_OVERHEAD, is a common overhead
type.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/uapi/linux/perf_event.h | 1 +
 kernel/events/core.c            | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 23b7963..c488336 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1001,6 +1001,7 @@ struct perf_branch_entry {
  */
 enum perf_record_overhead_type {
 	/* common overhead */
+	PERF_CORE_MUX_OVERHEAD	= 0,
 	/* PMU specific */
 	PERF_OVERHEAD_MAX,
 };
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dbde193..28468ae 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3326,6 +3326,7 @@ static void rotate_ctx(struct perf_event_context *ctx)
 static int perf_rotate_context(struct perf_cpu_context *cpuctx)
 {
 	struct perf_event_context *ctx = NULL;
+	u64 start_clock, end_clock;
 	int rotate = 0;
 
 	if (cpuctx->ctx.nr_events) {
@@ -3342,6 +3343,7 @@ static int perf_rotate_context(struct perf_cpu_context *cpuctx)
 	if (!rotate)
 		goto done;
 
+	start_clock = perf_clock();
 	perf_ctx_lock(cpuctx, cpuctx->task_ctx);
 	perf_pmu_disable(cpuctx->ctx.pmu);
 
@@ -3357,6 +3359,13 @@ static int perf_rotate_context(struct perf_cpu_context *cpuctx)
 
 	perf_pmu_enable(cpuctx->ctx.pmu);
 	perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
+
+	/* calculate multiplexing overhead */
+	if (cpuctx->ctx.pmu->stat) {
+		end_clock = perf_clock();
+		cpuctx->overhead[PERF_CORE_MUX_OVERHEAD].nr++;
+		cpuctx->overhead[PERF_CORE_MUX_OVERHEAD].time += end_clock - start_clock;
+	}
 done:
 
 	return rotate;
-- 
2.4.3


* [PATCH V3 5/6] perf/core: calculate side-band events overhead
  2016-12-08 21:27 [PATCH V3 0/6] export perf overheads information (kernel) kan.liang
                   ` (3 preceding siblings ...)
  2016-12-08 21:27 ` [PATCH V3 4/6] perf/core: calculate multiplexing overhead kan.liang
@ 2016-12-08 21:27 ` kan.liang
  2016-12-08 21:27 ` [PATCH V3 6/6] perf/x86: calculate sampling overhead kan.liang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: kan.liang @ 2016-12-08 21:27 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, tglx, namhyung, jolsa, adrian.hunter,
	wangnan0, mark.rutland, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Iterating over all events which need to receive side-band events also
brings some overhead.

The side-band event overhead, PERF_CORE_SB_OVERHEAD, is a common
overhead type.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/uapi/linux/perf_event.h |  1 +
 kernel/events/core.c            | 17 ++++++++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c488336..7ba6d30 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1002,6 +1002,7 @@ struct perf_branch_entry {
 enum perf_record_overhead_type {
 	/* common overhead */
 	PERF_CORE_MUX_OVERHEAD	= 0,
+	PERF_CORE_SB_OVERHEAD,
 	/* PMU specific */
 	PERF_OVERHEAD_MAX,
 };
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 28468ae..335b1e2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6134,9 +6134,13 @@ static void
 perf_iterate_sb(perf_iterate_f output, void *data,
 	       struct perf_event_context *task_ctx)
 {
+	struct perf_event_context *overhead_ctx = task_ctx;
+	struct perf_cpu_context *cpuctx;
 	struct perf_event_context *ctx;
+	u64 start_clock, end_clock;
 	int ctxn;
 
+	start_clock = perf_clock();
 	rcu_read_lock();
 	preempt_disable();
 
@@ -6154,12 +6158,23 @@ perf_iterate_sb(perf_iterate_f output, void *data,
 
 	for_each_task_context_nr(ctxn) {
 		ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
-		if (ctx)
+		if (ctx) {
 			perf_iterate_ctx(ctx, output, data, false);
+			if (!overhead_ctx)
+				overhead_ctx = ctx;
+		}
 	}
 done:
 	preempt_enable();
 	rcu_read_unlock();
+
+	/* calculate side-band event overhead */
+	end_clock = perf_clock();
+	if (overhead_ctx && overhead_ctx->pmu && overhead_ctx->pmu->stat) {
+		cpuctx = this_cpu_ptr(overhead_ctx->pmu->pmu_cpu_context);
+		cpuctx->overhead[PERF_CORE_SB_OVERHEAD].nr++;
+		cpuctx->overhead[PERF_CORE_SB_OVERHEAD].time += end_clock - start_clock;
+	}
 }
 
 /*
-- 
2.4.3


* [PATCH V3 6/6] perf/x86: calculate sampling overhead
  2016-12-08 21:27 [PATCH V3 0/6] export perf overheads information (kernel) kan.liang
                   ` (4 preceding siblings ...)
  2016-12-08 21:27 ` [PATCH V3 5/6] perf/core: calculate side-band events overhead kan.liang
@ 2016-12-08 21:27 ` kan.liang
  2016-12-16 15:08 ` [PATCH V3 0/6] export perf overheads information (kernel) Liang, Kan
  2017-01-06 15:25 ` Liang, Kan
  7 siblings, 0 replies; 9+ messages in thread
From: kan.liang @ 2016-12-08 21:27 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, tglx, namhyung, jolsa, adrian.hunter,
	wangnan0, mark.rutland, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

On x86, the NMI handler is the main source of sampling overhead. Add a
pmu-specific overhead type, PERF_PMU_SAMPLE_OVERHEAD, for it.

Other architectures, which may not have an NMI, can reuse this overhead
type.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 arch/x86/events/core.c          | 8 +++++++-
 include/uapi/linux/perf_event.h | 1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 09ab36a..1e57ccf 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1478,8 +1478,10 @@ void perf_events_lapic_init(void)
 static int
 perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 {
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu.pmu_cpu_context);
 	u64 start_clock;
 	u64 finish_clock;
+	u64 clock;
 	int ret;
 
 	/*
@@ -1492,8 +1494,12 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 	start_clock = sched_clock();
 	ret = x86_pmu.handle_irq(regs);
 	finish_clock = sched_clock();
+	clock = finish_clock - start_clock;
+	perf_sample_event_took(clock);
 
-	perf_sample_event_took(finish_clock - start_clock);
+	/* calculate NMI overhead */
+	cpuctx->overhead[PERF_PMU_SAMPLE_OVERHEAD].nr++;
+	cpuctx->overhead[PERF_PMU_SAMPLE_OVERHEAD].time += clock;
 
 	return ret;
 }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 7ba6d30..954b116 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1004,6 +1004,7 @@ enum perf_record_overhead_type {
 	PERF_CORE_MUX_OVERHEAD	= 0,
 	PERF_CORE_SB_OVERHEAD,
 	/* PMU specific */
+	PERF_PMU_SAMPLE_OVERHEAD,
 	PERF_OVERHEAD_MAX,
 };
 
-- 
2.4.3


* RE: [PATCH V3 0/6] export perf overheads information (kernel)
  2016-12-08 21:27 [PATCH V3 0/6] export perf overheads information (kernel) kan.liang
                   ` (5 preceding siblings ...)
  2016-12-08 21:27 ` [PATCH V3 6/6] perf/x86: calculate sampling overhead kan.liang
@ 2016-12-16 15:08 ` Liang, Kan
  2017-01-06 15:25 ` Liang, Kan
  7 siblings, 0 replies; 9+ messages in thread
From: Liang, Kan @ 2016-12-16 15:08 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, tglx, namhyung, jolsa, Hunter, Adrian,
	wangnan0, mark.rutland, andi


Ping.
Any comments on the series?

Thanks,
Kan

> Subject: [PATCH V3 0/6] export perf overheads information (kernel)


* RE: [PATCH V3 0/6] export perf overheads information (kernel)
  2016-12-08 21:27 [PATCH V3 0/6] export perf overheads information (kernel) kan.liang
                   ` (6 preceding siblings ...)
  2016-12-16 15:08 ` [PATCH V3 0/6] export perf overheads information (kernel) Liang, Kan
@ 2017-01-06 15:25 ` Liang, Kan
  7 siblings, 0 replies; 9+ messages in thread
From: Liang, Kan @ 2017-01-06 15:25 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, tglx, namhyung, jolsa, Hunter, Adrian,
	wangnan0, mark.rutland, andi

Hi Peter,

Any comments?

Thanks,
Kan

> Subject: RE: [PATCH V3 0/6] export perf overheads information (kernel)
> 
> 
> Ping.
> Any comments on the series?
> 
> Thanks,
> Kan
> 
> > Subject: [PATCH V3 0/6] export perf overheads information (kernel)

