linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] perf: use hrtimer for event multiplexing
@ 2012-09-07 14:29 Stephane Eranian
  2012-09-07 14:29 ` [PATCH 1/3] " Stephane Eranian
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Stephane Eranian @ 2012-09-07 14:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, zheng.z.yan, robert.richter

The current scheme of using the timer tick was fine
for per-thread events. However, it was causing
bias issues in system-wide mode (including for
uncore PMUs). Event groups would not get their
fair share of runtime on the PMU. With tickless
kernels, if a core is idle there is no timer tick,
and thus no event rotation (multiplexing). However,
there are events (especially uncore events) which do
count even though cores are asleep.

This patch changes the timer source for multiplexing.
It introduces a per-cpu hrtimer. The advantage is that
even when the core goes idle, it will come back to
service the hrtimer, thus multiplexing on system-wide
events works much better.

In order to minimize the impact of the hrtimer, it
is turned on and off on demand. When the PMU on
a CPU is overcommitted, the hrtimer is activated.
It is stopped when the PMU is not overcommitted.

In order for this to work properly with HOTPLUG_CPU,
we had to change the order of initialization in
start_kernel() such that hrtimer_init() is run
before perf_event_init().

The second patch provides a sysctl control
to adjust the multiplexing interval. The
unit is milliseconds.

Here is a simple before/after example with
two event groups which require multiplexing.
This is done in system-wide mode on an idle
system. What matters here is the scaling factor
shown in [], not the total counts.

Before:

# perf stat -a -e ref-cycles,ref-cycles sleep 10
 Performance counter stats for 'sleep 10':
 34,319,545 ref-cycles  [56.51%]
 31,917,229 ref-cycles  [43.50%]

 10.000827569 seconds time elapsed

After:
# perf stat -a -e ref-cycles,ref-cycles sleep 10
 Performance counter stats for 'sleep 10':
 11,144,822,193 ref-cycles [50.00%]
 11,103,760,513 ref-cycles [50.00%]

 10.000672946 seconds time elapsed

Signed-off-by: Stephane Eranian <eranian@google.com>
---

Stephane Eranian (3):
  perf: use hrtimer for event multiplexing
  perf: add sysctl control to adjust multiplexing interval
  perf: remove jiffies_interval

 include/linux/perf_event.h |    6 ++-
 init/main.c                |    2 +-
 kernel/events/core.c       |  155 +++++++++++++++++++++++++++++++++++++++++---
 kernel/sysctl.c            |    8 ++
 4 files changed, 160 insertions(+), 11 deletions(-)

-- 
1.7.5.4



* [PATCH 1/3] perf: use hrtimer for event multiplexing
  2012-09-07 14:29 [PATCH 0/3] perf: use hrtimer for event multiplexing Stephane Eranian
@ 2012-09-07 14:29 ` Stephane Eranian
  2012-09-07 15:39   ` Peter Zijlstra
  2012-09-07 15:42   ` Peter Zijlstra
  2012-09-07 14:29 ` [PATCH 2/3] perf: add sysctl control to adjust multiplexing interval Stephane Eranian
  2012-09-07 14:29 ` [PATCH 3/3] perf: remove jiffies_interval Stephane Eranian
  2 siblings, 2 replies; 11+ messages in thread
From: Stephane Eranian @ 2012-09-07 14:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, zheng.z.yan, robert.richter

The current scheme of using the timer tick was fine
for per-thread events. However, it was causing
bias issues in system-wide mode (including for
uncore PMUs). Event groups would not get their
fair share of runtime on the PMU. With tickless
kernels, if a core is idle there is no timer tick,
and thus no event rotation (multiplexing). However,
there are events (especially uncore events) which do
count even though cores are asleep.

This patch changes the timer source for multiplexing.
It introduces a per-cpu hrtimer. The advantage is that
even when the core goes idle, it will come back to
service the hrtimer, thus multiplexing on system-wide
events works much better.

In order to minimize the impact of the hrtimer, it
is turned on and off on demand. When the PMU on
a CPU is overcommitted, the hrtimer is activated.
It is stopped when the PMU is not overcommitted.

In order for this to work properly with HOTPLUG_CPU,
we had to change the order of initialization in
start_kernel() such that hrtimer_init() is run
before perf_event_init().

The default hrtimer interval is set to one
timer tick, just like the old code. A sysctl
to tune it is provided in a later patch.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 init/main.c          |    2 +-
 kernel/events/core.c |  125 ++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 118 insertions(+), 9 deletions(-)

diff --git a/init/main.c b/init/main.c
index b286730..8ffe441 100644
--- a/init/main.c
+++ b/init/main.c
@@ -541,7 +541,6 @@ asmlinkage void __init start_kernel(void)
 		local_irq_disable();
 	}
 	idr_init_cache();
-	perf_event_init();
 	rcu_init();
 	radix_tree_init();
 	/* init some links before init_ISA_irqs() */
@@ -553,6 +552,7 @@ asmlinkage void __init start_kernel(void)
 	softirq_init();
 	timekeeping_init();
 	time_init();
+	perf_event_init();
 	profile_init();
 	call_function_init();
 	if (!irqs_disabled())
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 40f42b8..ab4ef10 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -148,6 +148,15 @@ static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
 static struct srcu_struct pmus_srcu;
 
+struct perf_cpu_hrtimer {
+	struct hrtimer hrtimer;
+	int active;
+};
+
+static DEFINE_PER_CPU(struct list_head, rotation_list);
+
+static DEFINE_PER_CPU(struct perf_cpu_hrtimer, perf_cpu_hrtimer);
+
 /*
  * perf event paranoia level:
  *  -1 - not paranoid at all
@@ -168,6 +177,8 @@ int sysctl_perf_event_sample_rate __read_mostly = DEFAULT_MAX_SAMPLE_RATE;
 static int max_samples_per_tick __read_mostly =
 	DIV_ROUND_UP(DEFAULT_MAX_SAMPLE_RATE, HZ);
 
+static int perf_rotate_context(struct perf_cpu_context *cpuctx);
+
 int perf_proc_update_handler(struct ctl_table *table, int write,
 		void __user *buffer, size_t *lenp,
 		loff_t *ppos)
@@ -625,6 +636,95 @@ perf_cgroup_mark_enabled(struct perf_event *event,
 }
 #endif
 
+/*
+ * set default to be dependent on timer tick just
+ * like original code
+ */
+#define PERF_CPU_HRTIMER (1000 / HZ)
+static enum hrtimer_restart perf_cpu_hrtimer_handler(struct hrtimer *hr)
+{
+	struct perf_cpu_hrtimer *h = &__get_cpu_var(perf_cpu_hrtimer);
+	struct list_head *head = &__get_cpu_var(rotation_list);
+	struct perf_cpu_context *cpuctx, *tmp;
+	enum hrtimer_restart ret = HRTIMER_NORESTART;
+	unsigned long flags;
+	int rotations = 0;
+
+	/* sanity check */
+	if (WARN_ON_ONCE(hr != &h->hrtimer))
+		goto end;
+
+	local_irq_save(flags);
+
+	if (h->active)
+		list_for_each_entry_safe(cpuctx, tmp, head, rotation_list)
+			rotations += perf_rotate_context(cpuctx);
+
+	/*
+	 * if no rotations done, then we can stop timer
+	 * will be reactivated in group_sched_in()
+	 */
+	if (!rotations)
+		h->active = 0;
+
+	local_irq_restore(flags);
+
+	/*
+	 * arm timer if needed
+	 */
+	if (rotations) {
+		hrtimer_forward_now(hr, ns_to_ktime(PERF_CPU_HRTIMER));
+		ret = HRTIMER_RESTART;
+	}
+end:
+	return ret;
+}
+
+void perf_cpu_hrtimer_init(int cpu)
+{
+	struct perf_cpu_hrtimer *h = &__get_cpu_var(perf_cpu_hrtimer);
+	struct hrtimer *hr = &h->hrtimer;
+
+	if (WARN_ON(cpu != smp_processor_id()))
+		return;
+
+	if (WARN_ON(h->active))
+		return;
+
+	hrtimer_init(hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hr->function = perf_cpu_hrtimer_handler;
+	h->active = 0;
+}
+
+void perf_cpu_hrtimer_cancel(int cpu)
+{
+	struct perf_cpu_hrtimer *h = &__get_cpu_var(perf_cpu_hrtimer);
+	struct hrtimer *hr = &h->hrtimer;
+
+	if (WARN_ON(cpu != smp_processor_id()))
+		return;
+
+	if (h->active) {
+		hrtimer_cancel(hr);
+		h->active = 0;
+	}
+}
+
+static void perf_cpu_hrtimer_restart(void)
+{
+	struct perf_cpu_hrtimer *h = &__get_cpu_var(perf_cpu_hrtimer);
+	struct hrtimer *hr = &h->hrtimer;
+
+	if (h->active)
+		return;
+
+	h->active = 1;
+
+	if (!hrtimer_callback_running(hr))
+		__hrtimer_start_range_ns(hr, ns_to_ktime(PERF_CPU_HRTIMER),
+					 0, HRTIMER_MODE_REL_PINNED, 0);
+}
+
 void perf_pmu_disable(struct pmu *pmu)
 {
 	int *count = this_cpu_ptr(pmu->pmu_disable_count);
@@ -639,8 +739,6 @@ void perf_pmu_enable(struct pmu *pmu)
 		pmu->pmu_enable(pmu);
 }
 
-static DEFINE_PER_CPU(struct list_head, rotation_list);
-
 /*
  * perf_pmu_rotate_start() and perf_rotate_context() are fully serialized
  * because they're strictly cpu affine and rotate_start is called with IRQs
@@ -1458,6 +1556,7 @@ group_sched_in(struct perf_event *group_event,
 
 	if (event_sched_in(group_event, cpuctx, ctx)) {
 		pmu->cancel_txn(pmu);
+		perf_cpu_hrtimer_restart();
 		return -EAGAIN;
 	}
 
@@ -1504,6 +1603,8 @@ group_sched_in(struct perf_event *group_event,
 
 	pmu->cancel_txn(pmu);
 
+	perf_cpu_hrtimer_restart();
+
 	return -EAGAIN;
 }
 
@@ -1759,8 +1860,10 @@ static int __perf_event_enable(void *info)
 		 * If this event can't go on and it's part of a
 		 * group, then the whole group has to come off.
 		 */
-		if (leader != event)
+		if (leader != event) {
 			group_sched_out(leader, cpuctx, ctx);
+			perf_cpu_hrtimer_restart();
+		}
 		if (leader->attr.pinned) {
 			update_group_times(leader);
 			leader->state = PERF_EVENT_STATE_ERROR;
@@ -2507,7 +2610,7 @@ static void rotate_ctx(struct perf_event_context *ctx)
  * because they're strictly cpu affine and rotate_start is called with IRQs
  * disabled, while rotate_context is called from IRQ context.
  */
-static void perf_rotate_context(struct perf_cpu_context *cpuctx)
+static int perf_rotate_context(struct perf_cpu_context *cpuctx)
 {
 	struct perf_event_context *ctx = NULL;
 	int rotate = 0, remove = 1;
@@ -2546,6 +2649,8 @@ static void perf_rotate_context(struct perf_cpu_context *cpuctx)
 done:
 	if (remove)
 		list_del_init(&cpuctx->rotation_list);
+
+	return rotate;
 }
 
 void perf_event_task_tick(void)
@@ -2567,10 +2672,6 @@ void perf_event_task_tick(void)
 		ctx = cpuctx->task_ctx;
 		if (ctx)
 			perf_adjust_freq_unthr_context(ctx, throttled);
-
-		if (cpuctx->jiffies_interval == 1 ||
-				!(jiffies % cpuctx->jiffies_interval))
-			perf_rotate_context(cpuctx);
 	}
 }
 
@@ -7379,6 +7480,14 @@ perf_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
 	case CPU_DOWN_PREPARE:
 		perf_event_exit_cpu(cpu);
 		break;
+	case CPU_DYING:
+		/* must be run on actual cpu */
+		perf_cpu_hrtimer_cancel(cpu);
+		break;
+	case CPU_STARTING:
+		/* must be run on actual cpu */
+		perf_cpu_hrtimer_init(cpu);
+		break;
 
 	default:
 		break;
-- 
1.7.5.4



* [PATCH 2/3] perf: add sysctl control to adjust multiplexing interval
  2012-09-07 14:29 [PATCH 0/3] perf: use hrtimer for event multiplexing Stephane Eranian
  2012-09-07 14:29 ` [PATCH 1/3] " Stephane Eranian
@ 2012-09-07 14:29 ` Stephane Eranian
  2012-09-07 14:29 ` [PATCH 3/3] perf: remove jiffies_interval Stephane Eranian
  2 siblings, 0 replies; 11+ messages in thread
From: Stephane Eranian @ 2012-09-07 14:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, zheng.z.yan, robert.richter

Add a /proc/sys/kernel/perf_event_mux_interval_ms
entry to adjust the multiplexing interval.

The unit is milliseconds.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 include/linux/perf_event.h |    5 +++++
 kernel/events/core.c       |   43 ++++++++++++++++++++++++++++++++++++-------
 kernel/sysctl.c            |    8 ++++++++
 3 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index cc5e2cd..21c3bb0 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1297,11 +1297,16 @@ static inline void perf_callchain_store(struct perf_callchain_entry *entry, u64
 extern int sysctl_perf_event_paranoid;
 extern int sysctl_perf_event_mlock;
 extern int sysctl_perf_event_sample_rate;
+extern int sysctl_perf_event_mux_interval_ms;
 
 extern int perf_proc_update_handler(struct ctl_table *table, int write,
 		void __user *buffer, size_t *lenp,
 		loff_t *ppos);
 
+extern int perf_proc_mux_interval_ms_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *lenp,
+		loff_t *ppos);
+
 static inline bool perf_paranoid_tracepoint_raw(void)
 {
 	return sysctl_perf_event_paranoid > -1;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ab4ef10..fd53509 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -144,6 +144,13 @@ static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
 static atomic_t nr_task_events __read_mostly;
 
+/*
+ * set default to be dependent on timer tick just
+ * like original code
+ */
+#define DEFAULT_MUX_INTERVAL_MS (1000 / HZ)
+static ktime_t perf_cpu_hrtimer_interval; /* mux interval in ktime_t */
+
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
 static struct srcu_struct pmus_srcu;
@@ -174,6 +181,10 @@ int sysctl_perf_event_mlock __read_mostly = 512 + (PAGE_SIZE / 1024); /* 'free'
  */
 #define DEFAULT_MAX_SAMPLE_RATE 100000
 int sysctl_perf_event_sample_rate __read_mostly = DEFAULT_MAX_SAMPLE_RATE;
+int sysctl_perf_event_mux_interval_ms __read_mostly = DEFAULT_MUX_INTERVAL_MS;
+
+static DEFINE_PER_CPU(struct list_head, rotation_list);
+
 static int max_samples_per_tick __read_mostly =
 	DIV_ROUND_UP(DEFAULT_MAX_SAMPLE_RATE, HZ);
 
@@ -193,6 +204,25 @@ int perf_proc_update_handler(struct ctl_table *table, int write,
 	return 0;
 }
 
+int perf_proc_mux_interval_ms_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *lenp,
+		loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_dointvec(table, write, buffer, lenp, ppos);
+	if (ret || !write)
+		return ret;
+
+	if (sysctl_perf_event_mux_interval_ms < 1)
+		return -EINVAL;
+
+	perf_cpu_hrtimer_interval =
+		ns_to_ktime(sysctl_perf_event_mux_interval_ms * NSEC_PER_MSEC);
+
+	return 0;
+}
+
 static atomic64_t perf_event_id;
 
 static void cpu_ctx_sched_out(struct perf_cpu_context *cpuctx,
@@ -636,11 +666,6 @@ perf_cgroup_mark_enabled(struct perf_event *event,
 }
 #endif
 
-/*
- * set default to be dependent on timer tick just
- * like original code
- */
-#define PERF_CPU_HRTIMER (1000 / HZ)
 static enum hrtimer_restart perf_cpu_hrtimer_handler(struct hrtimer *hr)
 {
 	struct perf_cpu_hrtimer *h = &__get_cpu_var(perf_cpu_hrtimer);
@@ -673,7 +698,7 @@ static enum hrtimer_restart perf_cpu_hrtimer_handler(struct hrtimer *hr)
 	 * arm timer if needed
 	 */
 	if (rotations) {
-		hrtimer_forward_now(hr, ns_to_ktime(PERF_CPU_HRTIMER));
+		hrtimer_forward_now(hr, perf_cpu_hrtimer_interval);
 		ret = HRTIMER_RESTART;
 	}
 end:
@@ -721,7 +746,7 @@ static void perf_cpu_hrtimer_restart(void)
 	h->active = 1;
 
 	if (!hrtimer_callback_running(hr))
-		__hrtimer_start_range_ns(hr, ns_to_ktime(PERF_CPU_HRTIMER),
+		__hrtimer_start_range_ns(hr, perf_cpu_hrtimer_interval,
 					 0, HRTIMER_MODE_REL_PINNED, 0);
 }
 
@@ -7517,6 +7542,10 @@ void __init perf_event_init(void)
 	/* do not patch jump label more than once per second */
 	jump_label_rate_limit(&perf_sched_events, HZ);
 
+	/* default multiplexing interval */
+	perf_cpu_hrtimer_interval =
+		ns_to_ktime(DEFAULT_MUX_INTERVAL_MS * NSEC_PER_MSEC);
+
 	/*
 	 * Build time assertion that we keep the data_head at the intended
 	 * location.  IOW, validation we got the __reserved[] size right.
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 81c7b1a..0d7b2c8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -981,6 +981,14 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= perf_proc_update_handler,
 	},
+	{
+		.procname	= "perf_event_mux_interval_ms",
+		.data		= &sysctl_perf_event_mux_interval_ms,
+		.maxlen		= sizeof(sysctl_perf_event_mux_interval_ms),
+		.mode		= 0644,
+		.proc_handler	= perf_proc_mux_interval_ms_handler,
+	},
+
 #endif
 #ifdef CONFIG_KMEMCHECK
 	{
-- 
1.7.5.4



* [PATCH 3/3] perf: remove jiffies_interval
  2012-09-07 14:29 [PATCH 0/3] perf: use hrtimer for event multiplexing Stephane Eranian
  2012-09-07 14:29 ` [PATCH 1/3] " Stephane Eranian
  2012-09-07 14:29 ` [PATCH 2/3] perf: add sysctl control to adjust multiplexing interval Stephane Eranian
@ 2012-09-07 14:29 ` Stephane Eranian
  2012-09-07 15:40   ` Peter Zijlstra
  2 siblings, 1 reply; 11+ messages in thread
From: Stephane Eranian @ 2012-09-07 14:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, zheng.z.yan, robert.richter

This field is obsolete now that it has been
superseded by hrtimer-based multiplexing.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 include/linux/perf_event.h |    1 -
 kernel/events/core.c       |    1 -
 2 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 21c3bb0..b0cb209 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1107,7 +1107,6 @@ struct perf_cpu_context {
 	int				active_oncpu;
 	int				exclusive;
 	struct list_head		rotation_list;
-	int				jiffies_interval;
 	struct pmu			*active_pmu;
 	struct perf_cgroup		*cgrp;
 };
diff --git a/kernel/events/core.c b/kernel/events/core.c
index fd53509..e0db140 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6126,7 +6126,6 @@ int perf_pmu_register(struct pmu *pmu, char *name, int type)
 		lockdep_set_class(&cpuctx->ctx.lock, &cpuctx_lock);
 		cpuctx->ctx.type = cpu_context;
 		cpuctx->ctx.pmu = pmu;
-		cpuctx->jiffies_interval = 1;
 		INIT_LIST_HEAD(&cpuctx->rotation_list);
 		cpuctx->active_pmu = pmu;
 	}
-- 
1.7.5.4



* Re: [PATCH 1/3] perf: use hrtimer for event multiplexing
  2012-09-07 14:29 ` [PATCH 1/3] " Stephane Eranian
@ 2012-09-07 15:39   ` Peter Zijlstra
  2012-09-07 17:03     ` Stephane Eranian
  2012-09-07 15:42   ` Peter Zijlstra
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2012-09-07 15:39 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, mingo, ak, zheng.z.yan, robert.richter

On Fri, 2012-09-07 at 16:29 +0200, Stephane Eranian wrote:
> @@ -148,6 +148,15 @@ static LIST_HEAD(pmus);
>  static DEFINE_MUTEX(pmus_lock);
>  static struct srcu_struct pmus_srcu;
>  
> +struct perf_cpu_hrtimer {
> +       struct hrtimer hrtimer;
> +       int active;
> +};
> +
> +static DEFINE_PER_CPU(struct list_head, rotation_list);
> +
> +static DEFINE_PER_CPU(struct perf_cpu_hrtimer, perf_cpu_hrtimer);


How about sticking the hrtimer in perf_cpu_context so you can have a
different rotation interval per PMU ?

Sorta like e9d2b064149ff7ef4acbc65a1b9374ac8b218d3e removed. Stopping
the timer when the PMU isn't overcommitted should solve the NOHZ
problem I think.


* Re: [PATCH 3/3] perf: remove jiffies_interval
  2012-09-07 14:29 ` [PATCH 3/3] perf: remove jiffies_interval Stephane Eranian
@ 2012-09-07 15:40   ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2012-09-07 15:40 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, mingo, ak, zheng.z.yan, robert.richter

On Fri, 2012-09-07 at 16:29 +0200, Stephane Eranian wrote:
> Obsolete because superseded by hrtimer based
> multiplexing.

Not entirely, the jiffies_interval allows different PMUs to have
different rotation speeds. Your code doesn't allow this.



* Re: [PATCH 1/3] perf: use hrtimer for event multiplexing
  2012-09-07 14:29 ` [PATCH 1/3] " Stephane Eranian
  2012-09-07 15:39   ` Peter Zijlstra
@ 2012-09-07 15:42   ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2012-09-07 15:42 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, mingo, ak, zheng.z.yan, robert.richter

On Fri, 2012-09-07 at 16:29 +0200, Stephane Eranian wrote:

Style nit:

> +	if (h->active)
> +		list_for_each_entry_safe(cpuctx, tmp, head, rotation_list)
> +			rotations += perf_rotate_context(cpuctx);


> +	if (!hrtimer_callback_running(hr))
> +		__hrtimer_start_range_ns(hr, ns_to_ktime(PERF_CPU_HRTIMER),
> +					 0, HRTIMER_MODE_REL_PINNED, 0);


All multi-line statements should be a stmt block, even if not strictly
required.


* Re: [PATCH 1/3] perf: use hrtimer for event multiplexing
  2012-09-07 15:39   ` Peter Zijlstra
@ 2012-09-07 17:03     ` Stephane Eranian
  2012-09-07 19:08       ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Stephane Eranian @ 2012-09-07 17:03 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, mingo, ak, zheng.z.yan, robert.richter

On Fri, Sep 7, 2012 at 5:39 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2012-09-07 at 16:29 +0200, Stephane Eranian wrote:
>> @@ -148,6 +148,15 @@ static LIST_HEAD(pmus);
>>  static DEFINE_MUTEX(pmus_lock);
>>  static struct srcu_struct pmus_srcu;
>>
>> +struct perf_cpu_hrtimer {
>> +       struct hrtimer hrtimer;
>> +       int active;
>> +};
>> +
>> +static DEFINE_PER_CPU(struct list_head, rotation_list);
>> +
>> +static DEFINE_PER_CPU(struct perf_cpu_hrtimer, perf_cpu_hrtimer);
>
>
> How about sticking the hrtimer in perf_cpu_context so you can have a
> different rotation interval per PMU ?
>
I think having different intervals would be a good thing, especially for uncore.
But now, I am wondering how this could work without too much overhead.
Looks like you're suggesting arming multiple hrtimers if multiple PMUs are
overcommitted. Is that right? As opposed to having a PMU multiplier off of a
single per-cpu hrtimer.

> Sorta like e9d2b064149ff7ef4acbc65a1b9374ac8b218d3e removed. Stopping
> the timer when the PMU isn't over committed should solve the NOHZ
> problem I think.

Yeah, my patch does solve this. That's what I show in my example in the intro
msg.


* Re: [PATCH 1/3] perf: use hrtimer for event multiplexing
  2012-09-07 17:03     ` Stephane Eranian
@ 2012-09-07 19:08       ` Peter Zijlstra
  2012-09-07 19:10         ` Stephane Eranian
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2012-09-07 19:08 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, mingo, ak, zheng.z.yan, robert.richter

On Fri, 2012-09-07 at 19:03 +0200, Stephane Eranian wrote:
> I think having different intervals would be a good thing, especially for uncore.
> But now, I am wondering how this could work without too much overhead.
> Looks like you're suggesting arming multiple hrtimers if multiple PMU are
> overcommitted. Is that right?

Right, we shouldn't have too many PMUs anyway, let alone overcommitted
ones, so a timer per cpu per pmu should be fine.

>  As opposed to having a PMU multiplier off of a
> single per-cpu hrtimer.


* Re: [PATCH 1/3] perf: use hrtimer for event multiplexing
  2012-09-07 19:08       ` Peter Zijlstra
@ 2012-09-07 19:10         ` Stephane Eranian
  2012-09-07 19:14           ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Stephane Eranian @ 2012-09-07 19:10 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, mingo, ak, zheng.z.yan, robert.richter

On Fri, Sep 7, 2012 at 9:08 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2012-09-07 at 19:03 +0200, Stephane Eranian wrote:
>> I think having different intervals would be a good thing, especially for uncore.
>> But now, I am wondering how this could work without too much overhead.
>> Looks like you're suggesting arming multiple hrtimers if multiple PMU are
>> overcommitted. Is that right?
>
> Right, we shouldn't have too many PMUs anyway, let alone overcommitted
> ones, so a timer per cpu per pmu should be fine.
>
>>  As opposed to having a PMU multiplier off of a
>> single per-cpu hrtimer.

That's true. I started modifying my code to implement your suggestion.
We'll see how it goes. Then we would have to export that mux interval
via sysfs for each PMU.


* Re: [PATCH 1/3] perf: use hrtimer for event multiplexing
  2012-09-07 19:10         ` Stephane Eranian
@ 2012-09-07 19:14           ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2012-09-07 19:14 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, mingo, ak, zheng.z.yan, robert.richter

On Fri, 2012-09-07 at 21:10 +0200, Stephane Eranian wrote:
> 
> That's true. I started modifying my code to implement your suggestion.
> We'll see how it goes. Then we would have to export that mux interval
> via sysfs for each PMU. 

Indeed. Thanks!
