All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/2] perf: use hrtimer for event multiplexing
@ 2013-03-22 10:51 Stephane Eranian
  2013-03-22 10:51 ` [PATCH v5 1/2] " Stephane Eranian
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Stephane Eranian @ 2013-03-22 10:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

The current scheme of using the timer tick was fine
for per-thread events. However, it was causing
bias issues in system-wide mode (including for
uncore PMUs). Event groups would not get their
fair share of runtime on the PMU. With tickless
kernels, if a core is idle there is no timer tick,
and thus no event rotation (multiplexing). However,
there are events (especially uncore events) which do
count even though cores are asleep.

This patch changes the timer source for multiplexing.
It introduces a per-cpu hrtimer. The advantage is that
even when the core goes idle, it will come back to
service the hrtimer, thus multiplexing on system-wide
events works much better.

In order to minimize the impact of the hrtimer, it
is turned on and off on demand. When the PMU on
a CPU is overcommitted, the hrtimer is activated.
It is stopped when the PMU is not overcommitted.

In order for this to work properly with HOTPLUG_CPU,
we had to change the order of initialization in
start_kernel() such that hrtimer_init() is run
before perf_event_init().

The second patch provide a sysctl control to
adjust the multiplexing interval. Unit is
milliseconds.

Here is a simple before/after example with
two event groups which do require multiplexing.
This is done in system-wide mode on an idle
system. What matters here is the scaling factor
in [] in not the total counts.

Before:

# perf stat -a -e ref-cycles,ref-cycles sleep 10
 Performance counter stats for 'sleep 10':
 34,319,545 ref-cycles  [56.51%]
 31,917,229 ref-cycles  [43.50%]

 10.000827569 seconds time elapsed

After:
# perf stat -a -e ref-cycles,ref-cycles sleep 10
 Performance counter stats for 'sleep 10':
 11,144,822,193 ref-cycles [50.00%]
 11,103,760,513 ref-cycles [50.00%]

 10.000672946 seconds time elapsed

What matters here is the 50% not the actual
count. Ref-cycles runs only on one fixed counter.
With two instances, each should get 50% of the PMU
which is now true. This helps mitigate the error
introduced by the scaling.

In this second version of the patchset, we now
have the hrtimer_interval per PMU instance. The
tunable is in /sys/devices/XXX/mux_interval_ms,
where XXX is the name of the PMU instance. Due
to initialization changes of each hrtimer, we
had to introduce hrtimer_init_cpu() to initialize
a hrtimer from another CPU.

In the 3rd version, we simplify the code a bit
by using hrtimer_active(). We stopped using
the rotation_list for perf_cpu_hrtimer_cancel().
We also fix an intialization problem.

In the 4th version, we rebase to 3.8.0-rc7 and
we kept SW event on the rotation list which is
now used only for unthrottling. We also renamed
the sysfs tunable to perf_event_mux_interval_ms
to be more consistent with the existing sysctl
entries.

In the 5th version, we modified the code such
that a new hrtimer interval is applied immediately
to any active hrtimer as suggested by Jiri Olsa.
Also got rid of the CPU notifier for hrtimer, it
was useless and unreliable. The code is rebased to
3.9.0-rc3.

Signed-off-by: Stephane Eranian <eranian@google.com>
---

Stephane Eranian (2):
  perf: use hrtimer for event multiplexing
  perf: add sysfs entry to adjust multiplexing interval per PMU

 include/linux/perf_event.h |    4 +-
 init/main.c                |    2 +-
 kernel/events/core.c       |  173 +++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 167 insertions(+), 12 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 1/2] perf: use hrtimer for event multiplexing
  2013-03-22 10:51 [PATCH v5 0/2] perf: use hrtimer for event multiplexing Stephane Eranian
@ 2013-03-22 10:51 ` Stephane Eranian
  2013-03-25 11:13   ` Peter Zijlstra
  2013-03-22 10:51 ` [PATCH v5 2/2] perf: add sysfs entry to adjust multiplexing interval per PMU Stephane Eranian
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Stephane Eranian @ 2013-03-22 10:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

The current scheme of using the timer tick was fine
for per-thread events. However, it was causing
bias issues in system-wide mode (including for
uncore PMUs). Event groups would not get their
fair share of runtime on the PMU. With tickless
kernels, if a core is idle there is no timer tick,
and thus no event rotation (multiplexing). However,
there are events (especially uncore events) which do
count even though cores are asleep.

This patch changes the timer source for multiplexing.
It introduces a per-PMU per-cpu hrtimer. The advantage
is that even when a core goes idle, it will come back
to service the hrtimer, thus multiplexing on system-wide
events works much better.

The per-PMU implementation (suggested by PeterZ) enables
adjusting the multiplexing interval per PMU. The preferred
interval is stashed into the struct pmu. If not set, it
will be forced to the default interval value.

In order to minimize the impact of the hrtimer, it
is turned on and off on demand. When the PMU on
a CPU is overcommited, the hrtimer is activated.
It is stopped when the PMU is not overcommitted.

In order for this to work properly, we had to change
the order of initialization in start_kernel() such
that hrtimer_init() is run before perf_event_init().

The default interval in milliseconds is set to a
timer tick just like with the old code. We will
provide a sysctl to tune this in another patch.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 include/linux/perf_event.h |    3 +-
 init/main.c                |    2 +-
 kernel/events/core.c       |  114 ++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 109 insertions(+), 10 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 12a1aa2..e2e6c7e 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -500,8 +500,9 @@ struct perf_cpu_context {
 	struct perf_event_context	*task_ctx;
 	int				active_oncpu;
 	int				exclusive;
+	struct hrtimer			hrtimer;
+	ktime_t				hrtimer_interval;
 	struct list_head		rotation_list;
-	int				jiffies_interval;
 	struct pmu			*unique_pmu;
 	struct perf_cgroup		*cgrp;
 };
diff --git a/init/main.c b/init/main.c
index b3e0614..1542b38 100644
--- a/init/main.c
+++ b/init/main.c
@@ -544,7 +544,6 @@ asmlinkage void __init start_kernel(void)
 		local_irq_disable();
 	}
 	idr_init_cache();
-	perf_event_init();
 	rcu_init();
 	radix_tree_init();
 	/* init some links before init_ISA_irqs() */
@@ -556,6 +555,7 @@ asmlinkage void __init start_kernel(void)
 	softirq_init();
 	timekeeping_init();
 	time_init();
+	perf_event_init();
 	profile_init();
 	call_function_init();
 	if (!irqs_disabled())
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7b4a55d..92d34ad 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -169,6 +169,8 @@ int sysctl_perf_event_sample_rate __read_mostly = DEFAULT_MAX_SAMPLE_RATE;
 static int max_samples_per_tick __read_mostly =
 	DIV_ROUND_UP(DEFAULT_MAX_SAMPLE_RATE, HZ);
 
+static int perf_rotate_context(struct perf_cpu_context *cpuctx);
+
 int perf_proc_update_handler(struct ctl_table *table, int write,
 		void __user *buffer, size_t *lenp,
 		loff_t *ppos)
@@ -642,6 +644,98 @@ perf_cgroup_mark_enabled(struct perf_event *event,
 }
 #endif
 
+/*
+ * set default to be dependent on timer tick just
+ * like original code
+ */
+#define PERF_CPU_HRTIMER (1000 / HZ)
+static enum hrtimer_restart perf_cpu_hrtimer_handler(struct hrtimer *hr)
+{
+	struct perf_cpu_context *cpuctx;
+	enum hrtimer_restart ret = HRTIMER_NORESTART;
+	unsigned long flags;
+	int rotations = 0;
+
+	cpuctx = container_of(hr, struct perf_cpu_context, hrtimer);
+
+	local_irq_save(flags);
+
+	rotations = perf_rotate_context(cpuctx);
+
+	local_irq_restore(flags);
+
+	/*
+	 * arm timer if needed
+	 */
+	if (rotations) {
+		hrtimer_forward_now(hr, cpuctx->hrtimer_interval);
+		ret = HRTIMER_RESTART;
+	}
+
+	return ret;
+}
+
+/* CPU is going down */
+void perf_cpu_hrtimer_cancel(int cpu)
+{
+	struct perf_cpu_context *cpuctx;
+	struct pmu *pmu;
+	unsigned long flags;
+
+	if (WARN_ON(cpu != smp_processor_id()))
+		return;
+
+	local_irq_save(flags);
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(pmu, &pmus, entry) {
+		cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+
+		if (pmu->task_ctx_nr == perf_sw_context)
+			continue;
+
+		hrtimer_cancel(&cpuctx->hrtimer);
+	}
+
+	rcu_read_unlock();
+
+	local_irq_restore(flags);
+}
+
+static void __perf_cpu_hrtimer_init(struct perf_cpu_context *cpuctx, int cpu)
+{
+	struct hrtimer *hr = &cpuctx->hrtimer;
+	struct pmu *pmu = cpuctx->ctx.pmu;
+
+	/* no multiplexing needed for SW PMU */
+	if (pmu->task_ctx_nr == perf_sw_context)
+		return;
+
+	cpuctx->hrtimer_interval =
+		ns_to_ktime(NSEC_PER_MSEC * PERF_CPU_HRTIMER);
+
+	hrtimer_init(hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
+	hr->function = perf_cpu_hrtimer_handler;
+}
+
+static void perf_cpu_hrtimer_restart(struct perf_cpu_context *cpuctx)
+{
+	struct hrtimer *hr = &cpuctx->hrtimer;
+	struct pmu *pmu = cpuctx->ctx.pmu;
+
+	/* not for SW PMU */
+	if (pmu->task_ctx_nr == perf_sw_context)
+		return;
+
+	if (hrtimer_active(hr))
+		return;
+
+	if (!hrtimer_callback_running(hr))
+		__hrtimer_start_range_ns(hr, cpuctx->hrtimer_interval,
+					 0, HRTIMER_MODE_REL_PINNED, 0);
+}
+
 void perf_pmu_disable(struct pmu *pmu)
 {
 	int *count = this_cpu_ptr(pmu->pmu_disable_count);
@@ -1480,6 +1574,7 @@ group_sched_in(struct perf_event *group_event,
 
 	if (event_sched_in(group_event, cpuctx, ctx)) {
 		pmu->cancel_txn(pmu);
+		perf_cpu_hrtimer_restart(cpuctx);
 		return -EAGAIN;
 	}
 
@@ -1526,6 +1621,8 @@ group_sched_in(struct perf_event *group_event,
 
 	pmu->cancel_txn(pmu);
 
+	perf_cpu_hrtimer_restart(cpuctx);
+
 	return -EAGAIN;
 }
 
@@ -1781,8 +1878,10 @@ static int __perf_event_enable(void *info)
 		 * If this event can't go on and it's part of a
 		 * group, then the whole group has to come off.
 		 */
-		if (leader != event)
+		if (leader != event) {
 			group_sched_out(leader, cpuctx, ctx);
+			perf_cpu_hrtimer_restart(cpuctx);
+		}
 		if (leader->attr.pinned) {
 			update_group_times(leader);
 			leader->state = PERF_EVENT_STATE_ERROR;
@@ -2529,7 +2628,7 @@ static void rotate_ctx(struct perf_event_context *ctx)
  * because they're strictly cpu affine and rotate_start is called with IRQs
  * disabled, while rotate_context is called from IRQ context.
  */
-static void perf_rotate_context(struct perf_cpu_context *cpuctx)
+static int perf_rotate_context(struct perf_cpu_context *cpuctx)
 {
 	struct perf_event_context *ctx = NULL;
 	int rotate = 0, remove = 1;
@@ -2568,6 +2667,8 @@ static void perf_rotate_context(struct perf_cpu_context *cpuctx)
 done:
 	if (remove)
 		list_del_init(&cpuctx->rotation_list);
+
+	return rotate;
 }
 
 void perf_event_task_tick(void)
@@ -2589,10 +2690,6 @@ void perf_event_task_tick(void)
 		ctx = cpuctx->task_ctx;
 		if (ctx)
 			perf_adjust_freq_unthr_context(ctx, throttled);
-
-		if (cpuctx->jiffies_interval == 1 ||
-				!(jiffies % cpuctx->jiffies_interval))
-			perf_rotate_context(cpuctx);
 	}
 }
 
@@ -6014,7 +6111,9 @@ int perf_pmu_register(struct pmu *pmu, char *name, int type)
 		lockdep_set_class(&cpuctx->ctx.lock, &cpuctx_lock);
 		cpuctx->ctx.type = cpu_context;
 		cpuctx->ctx.pmu = pmu;
-		cpuctx->jiffies_interval = 1;
+
+		__perf_cpu_hrtimer_init(cpuctx, cpu);
+
 		INIT_LIST_HEAD(&cpuctx->rotation_list);
 		cpuctx->unique_pmu = pmu;
 	}
@@ -7400,7 +7499,6 @@ perf_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
 	case CPU_DOWN_PREPARE:
 		perf_event_exit_cpu(cpu);
 		break;
-
 	default:
 		break;
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 2/2] perf: add sysfs entry to adjust multiplexing interval per PMU
  2013-03-22 10:51 [PATCH v5 0/2] perf: use hrtimer for event multiplexing Stephane Eranian
  2013-03-22 10:51 ` [PATCH v5 1/2] " Stephane Eranian
@ 2013-03-22 10:51 ` Stephane Eranian
  2013-03-25 11:43   ` Peter Zijlstra
  2013-03-22 13:54 ` [PATCH v5 0/2] perf: use hrtimer for event multiplexing Frederic Weisbecker
  2013-03-22 17:33 ` Andi Kleen
  3 siblings, 1 reply; 12+ messages in thread
From: Stephane Eranian @ 2013-03-22 10:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

This patch adds /sys/device/xxx/perf_event_mux_interval_ms to ajust the
multiplexing interval per PMU. The unit is milliseconds. Value
has to be >= 1.

In the 4th version, we renamed the sysfs file to be more
consistent with the other /proc/sys/kernel entries for
perf_events.

In the 5th version, we handle the reprogramming of the
hrtimer using hrtimer_forward_now(). That way, we sync
up to new timer value quickly (suggested by Jiri Olsa).

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 include/linux/perf_event.h |    1 +
 kernel/events/core.c       |   63 +++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e2e6c7e..e057b06 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -193,6 +193,7 @@ struct pmu {
 	int * __percpu			pmu_disable_count;
 	struct perf_cpu_context * __percpu pmu_cpu_context;
 	int				task_ctx_nr;
+	int				hrtimer_interval_ms;
 
 	/*
 	 * Fully disable/enable this PMU, can be used to protect from the PMI
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 92d34ad..d8d6421 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -707,13 +707,21 @@ static void __perf_cpu_hrtimer_init(struct perf_cpu_context *cpuctx, int cpu)
 {
 	struct hrtimer *hr = &cpuctx->hrtimer;
 	struct pmu *pmu = cpuctx->ctx.pmu;
+	int timer;
 
 	/* no multiplexing needed for SW PMU */
 	if (pmu->task_ctx_nr == perf_sw_context)
 		return;
 
-	cpuctx->hrtimer_interval =
-		ns_to_ktime(NSEC_PER_MSEC * PERF_CPU_HRTIMER);
+	/*
+	 * check default is sane, if not set then force to
+	 * default interval (1/tick)
+	 */
+	timer = pmu->hrtimer_interval_ms;
+	if (timer < 1)
+		timer = pmu->hrtimer_interval_ms = PERF_CPU_HRTIMER;
+
+	cpuctx->hrtimer_interval = ns_to_ktime(NSEC_PER_MSEC * timer);
 
 	hrtimer_init(hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
 	hr->function = perf_cpu_hrtimer_handler;
@@ -6015,9 +6023,56 @@ type_show(struct device *dev, struct device_attribute *attr, char *page)
 	return snprintf(page, PAGE_SIZE-1, "%d\n", pmu->type);
 }
 
+static ssize_t
+perf_event_mux_interval_ms_show(struct device *dev,
+				struct device_attribute *attr,
+				char *page)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+
+	return snprintf(page, PAGE_SIZE-1, "%d\n", pmu->hrtimer_interval_ms);
+}
+
+static ssize_t
+perf_event_mux_interval_ms_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t count)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	int timer, cpu, ret;
+
+	ret = kstrtoint(buf, 0, &timer);
+	if (ret)
+		return ret;
+
+	if (timer < 1)
+		return -EINVAL;
+
+	/* same value, noting to do */
+	if (timer == pmu->hrtimer_interval_ms)
+		return count;
+
+	pmu->hrtimer_interval_ms = timer;
+
+	/* update all cpuctx for this PMU */
+	for_each_possible_cpu(cpu) {
+		struct perf_cpu_context *cpuctx;
+		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+		cpuctx->hrtimer_interval = ns_to_ktime(NSEC_PER_MSEC * timer);
+
+		if (hrtimer_active(&cpuctx->hrtimer))
+			hrtimer_forward_now(&cpuctx->hrtimer, cpuctx->hrtimer_interval);
+	}
+
+	return count;
+}
+
+#define __ATTR_RW(attr) __ATTR(attr, 0644, attr##_show, attr##_store)
+
 static struct device_attribute pmu_dev_attrs[] = {
-       __ATTR_RO(type),
-       __ATTR_NULL,
+	__ATTR_RO(type),
+	__ATTR_RW(perf_event_mux_interval_ms),
+	__ATTR_NULL,
 };
 
 static int pmu_bus_running;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 0/2] perf: use hrtimer for event multiplexing
  2013-03-22 10:51 [PATCH v5 0/2] perf: use hrtimer for event multiplexing Stephane Eranian
  2013-03-22 10:51 ` [PATCH v5 1/2] " Stephane Eranian
  2013-03-22 10:51 ` [PATCH v5 2/2] perf: add sysfs entry to adjust multiplexing interval per PMU Stephane Eranian
@ 2013-03-22 13:54 ` Frederic Weisbecker
  2013-03-25 12:08   ` Peter Zijlstra
  2013-03-22 17:33 ` Andi Kleen
  3 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2013-03-22 13:54 UTC (permalink / raw)
  To: Stephane Eranian, Steven Rostedt, Paul McKenney
  Cc: linux-kernel, peterz, mingo, ak, acme, jolsa, namhyung.kim

2013/3/22 Stephane Eranian <eranian@google.com>:
> The current scheme of using the timer tick was fine
> for per-thread events. However, it was causing
> bias issues in system-wide mode (including for
> uncore PMUs). Event groups would not get their
> fair share of runtime on the PMU. With tickless
> kernels, if a core is idle there is no timer tick,
> and thus no event rotation (multiplexing). However,
> there are events (especially uncore events) which do
> count even though cores are asleep.
>
> This patch changes the timer source for multiplexing.
> It introduces a per-cpu hrtimer. The advantage is that
> even when the core goes idle, it will come back to
> service the hrtimer, thus multiplexing on system-wide
> events works much better.
>
> In order to minimize the impact of the hrtimer, it
> is turned on and off on demand. When the PMU on
> a CPU is overcommitted, the hrtimer is activated.
> It is stopped when the PMU is not overcommitted.
>
> In order for this to work properly with HOTPLUG_CPU,
> we had to change the order of initialization in
> start_kernel() such that hrtimer_init() is run
> before perf_event_init().
>
> The second patch provide a sysctl control to
> adjust the multiplexing interval. Unit is
> milliseconds.
>
> Here is a simple before/after example with
> two event groups which do require multiplexing.
> This is done in system-wide mode on an idle
> system. What matters here is the scaling factor
> in [] in not the total counts.
>
> Before:
>
> # perf stat -a -e ref-cycles,ref-cycles sleep 10
>  Performance counter stats for 'sleep 10':
>  34,319,545 ref-cycles  [56.51%]
>  31,917,229 ref-cycles  [43.50%]
>
>  10.000827569 seconds time elapsed
>
> After:
> # perf stat -a -e ref-cycles,ref-cycles sleep 10
>  Performance counter stats for 'sleep 10':
>  11,144,822,193 ref-cycles [50.00%]
>  11,103,760,513 ref-cycles [50.00%]
>
>  10.000672946 seconds time elapsed
>
> What matters here is the 50% not the actual
> count. Ref-cycles runs only on one fixed counter.
> With two instances, each should get 50% of the PMU
> which is now true. This helps mitigate the error
> introduced by the scaling.
>
> In this second version of the patchset, we now
> have the hrtimer_interval per PMU instance. The
> tunable is in /sys/devices/XXX/mux_interval_ms,
> where XXX is the name of the PMU instance. Due
> to initialization changes of each hrtimer, we
> had to introduce hrtimer_init_cpu() to initialize
> a hrtimer from another CPU.
>
> In the 3rd version, we simplify the code a bit
> by using hrtimer_active(). We stopped using
> the rotation_list for perf_cpu_hrtimer_cancel().
> We also fix an intialization problem.
>
> In the 4th version, we rebase to 3.8.0-rc7 and
> we kept SW event on the rotation list which is
> now used only for unthrottling. We also renamed
> the sysfs tunable to perf_event_mux_interval_ms
> to be more consistent with the existing sysctl
> entries.
>
> In the 5th version, we modified the code such
> that a new hrtimer interval is applied immediately
> to any active hrtimer as suggested by Jiri Olsa.
> Also got rid of the CPU notifier for hrtimer, it
> was useless and unreliable. The code is rebased to
> 3.9.0-rc3.
>
> Signed-off-by: Stephane Eranian <eranian@google.com>

And I have to say this patch is going to be very useful for the full
dynticks tree. We are happy to get rid of that tick hook.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 0/2] perf: use hrtimer for event multiplexing
  2013-03-22 10:51 [PATCH v5 0/2] perf: use hrtimer for event multiplexing Stephane Eranian
                   ` (2 preceding siblings ...)
  2013-03-22 13:54 ` [PATCH v5 0/2] perf: use hrtimer for event multiplexing Frederic Weisbecker
@ 2013-03-22 17:33 ` Andi Kleen
  2013-03-22 17:36   ` Stephane Eranian
  3 siblings, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2013-03-22 17:33 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, peterz, mingo, acme, jolsa, namhyung.kim

On Fri, Mar 22, 2013 at 11:51:37AM +0100, Stephane Eranian wrote:
> The current scheme of using the timer tick was fine
> for per-thread events. However, it was causing
> bias issues in system-wide mode (including for
> uncore PMUs). Event groups would not get their
> fair share of runtime on the PMU. With tickless
> kernels, if a core is idle there is no timer tick,
> and thus no event rotation (multiplexing). However,
> there are events (especially uncore events) which do
> count even though cores are asleep.

Would it be possible to only do this when uncore events
are active? Otherwise it may have a large power cost
and actually change results, as the core counters
will tick more.

-Andi

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 0/2] perf: use hrtimer for event multiplexing
  2013-03-22 17:33 ` Andi Kleen
@ 2013-03-22 17:36   ` Stephane Eranian
  0 siblings, 0 replies; 12+ messages in thread
From: Stephane Eranian @ 2013-03-22 17:36 UTC (permalink / raw)
  To: Andi Kleen
  Cc: LKML, Peter Zijlstra, mingo, Arnaldo Carvalho de Melo, Jiri Olsa,
	Namhyung Kim

On Fri, Mar 22, 2013 at 6:33 PM, Andi Kleen <ak@linux.intel.com> wrote:
>
> On Fri, Mar 22, 2013 at 11:51:37AM +0100, Stephane Eranian wrote:
> > The current scheme of using the timer tick was fine
> > for per-thread events. However, it was causing
> > bias issues in system-wide mode (including for
> > uncore PMUs). Event groups would not get their
> > fair share of runtime on the PMU. With tickless
> > kernels, if a core is idle there is no timer tick,
> > and thus no event rotation (multiplexing). However,
> > there are events (especially uncore events) which do
> > count even though cores are asleep.
>
> Would it be possible to only do this when uncore events
> are active? Otherwise it may have a large power cost
> and actually change results, as the core counters
> will tick more.
>
the hrtimer is activated only when multiplexing is needed
and that's for any PMU.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/2] perf: use hrtimer for event multiplexing
  2013-03-22 10:51 ` [PATCH v5 1/2] " Stephane Eranian
@ 2013-03-25 11:13   ` Peter Zijlstra
  2013-03-25 17:58     ` Stephane Eranian
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2013-03-25 11:13 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, mingo, ak, acme, jolsa, namhyung.kim

On Fri, 2013-03-22 at 11:51 +0100, Stephane Eranian wrote:
> +#define PERF_CPU_HRTIMER (1000 / HZ)
> +static enum hrtimer_restart perf_cpu_hrtimer_handler(struct hrtimer
> *hr)
> +{
> +       struct perf_cpu_context *cpuctx;
> +       enum hrtimer_restart ret = HRTIMER_NORESTART;
> +       unsigned long flags;
> +       int rotations = 0;
> +
> +       cpuctx = container_of(hr, struct perf_cpu_context, hrtimer);
> +
> +       local_irq_save(flags);
> +
> +       rotations = perf_rotate_context(cpuctx);
> +
> +       local_irq_restore(flags);

IIRC hrtimer callbacks are always ran with interrupts disabled.. not a
big deal though. I'll continue reading.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/2] perf: add sysfs entry to adjust multiplexing interval per PMU
  2013-03-22 10:51 ` [PATCH v5 2/2] perf: add sysfs entry to adjust multiplexing interval per PMU Stephane Eranian
@ 2013-03-25 11:43   ` Peter Zijlstra
  2013-03-25 12:15     ` Stephane Eranian
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2013-03-25 11:43 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, mingo, ak, acme, jolsa, namhyung.kim

On Fri, 2013-03-22 at 11:51 +0100, Stephane Eranian wrote:
> In the 4th version, we renamed the sysfs file to be more
> consistent with the other /proc/sys/kernel entries for
> perf_events.

>  static struct device_attribute pmu_dev_attrs[] = {
> -       __ATTR_RO(type),
> -       __ATTR_NULL,
> +       __ATTR_RO(type),
> +       __ATTR_RW(perf_event_mux_interval_ms),
> +       __ATTR_NULL,
>  };

but but but, the file doesn't live in /proc/sys/kernel/, its in
/sys/bus/event_source/devices/*/$foo and none of the files there are
perf_event_* prefixed.

Other than that, the patches look fine.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 0/2] perf: use hrtimer for event multiplexing
  2013-03-22 13:54 ` [PATCH v5 0/2] perf: use hrtimer for event multiplexing Frederic Weisbecker
@ 2013-03-25 12:08   ` Peter Zijlstra
  2013-03-25 12:21     ` Frederic Weisbecker
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2013-03-25 12:08 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Stephane Eranian, Steven Rostedt, Paul McKenney, linux-kernel,
	mingo, ak, acme, jolsa, namhyung.kim

On Fri, 2013-03-22 at 14:54 +0100, Frederic Weisbecker wrote:
> And I have to say this patch is going to be very useful for the full
> dynticks tree. We are happy to get rid of that tick hook.

I'm sorry to have to disappoint, but the tick hook is still there ;-),
its just no longer used for counter rotation. Of course we have
multiple uses for the tick, what did you think :-)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/2] perf: add sysfs entry to adjust multiplexing interval per PMU
  2013-03-25 11:43   ` Peter Zijlstra
@ 2013-03-25 12:15     ` Stephane Eranian
  0 siblings, 0 replies; 12+ messages in thread
From: Stephane Eranian @ 2013-03-25 12:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, mingo, ak, Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim

On Mon, Mar 25, 2013 at 12:43 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, 2013-03-22 at 11:51 +0100, Stephane Eranian wrote:
> > In the 4th version, we renamed the sysfs file to be more
> > consistent with the other /proc/sys/kernel entries for
> > perf_events.
>
> >  static struct device_attribute pmu_dev_attrs[] = {
> > -       __ATTR_RO(type),
> > -       __ATTR_NULL,
> > +       __ATTR_RO(type),
> > +       __ATTR_RW(perf_event_mux_interval_ms),
> > +       __ATTR_NULL,
> >  };
>
> but but but, the file doesn't live in /proc/sys/kernel/, its in
> /sys/bus/event_source/devices/*/$foo and none of the files there are
> perf_event_* prefixed.
>
True. Some leftover from the earlier patch. The sysfs entry is now
per pmu instance. So yes, it is in /sys/devices/*/

>
> Other than that, the patches look fine.
>
Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 0/2] perf: use hrtimer for event multiplexing
  2013-03-25 12:08   ` Peter Zijlstra
@ 2013-03-25 12:21     ` Frederic Weisbecker
  0 siblings, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2013-03-25 12:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, Steven Rostedt, Paul McKenney, linux-kernel,
	mingo, ak, acme, jolsa, namhyung.kim

2013/3/25 Peter Zijlstra <peterz@infradead.org>:
> On Fri, 2013-03-22 at 14:54 +0100, Frederic Weisbecker wrote:
>> And I have to say this patch is going to be very useful for the full
>> dynticks tree. We are happy to get rid of that tick hook.
>
> I'm sorry to have to disappoint, but the tick hook is still there ;-),
> its just no longer used for counter rotation. Of course we have
> multiple uses for the tick, what did you think :-)
>

Haha!
Bah, at least some significant part of the work is pulled outside the hook :)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/2] perf: use hrtimer for event multiplexing
  2013-03-25 11:13   ` Peter Zijlstra
@ 2013-03-25 17:58     ` Stephane Eranian
  0 siblings, 0 replies; 12+ messages in thread
From: Stephane Eranian @ 2013-03-25 17:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, mingo, ak, Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim

On Mon, Mar 25, 2013 at 12:13 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2013-03-22 at 11:51 +0100, Stephane Eranian wrote:
>> +#define PERF_CPU_HRTIMER (1000 / HZ)
>> +static enum hrtimer_restart perf_cpu_hrtimer_handler(struct hrtimer
>> *hr)
>> +{
>> +       struct perf_cpu_context *cpuctx;
>> +       enum hrtimer_restart ret = HRTIMER_NORESTART;
>> +       unsigned long flags;
>> +       int rotations = 0;
>> +
>> +       cpuctx = container_of(hr, struct perf_cpu_context, hrtimer);
>> +
>> +       local_irq_save(flags);
>> +
>> +       rotations = perf_rotate_context(cpuctx);
>> +
>> +       local_irq_restore(flags);
>
> IIRC hrtimer callbacks are always ran with interrupts disabled.. not a
> big deal though. I'll continue reading.

I can create a V6 to remove this if necessary.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-03-25 17:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-22 10:51 [PATCH v5 0/2] perf: use hrtimer for event multiplexing Stephane Eranian
2013-03-22 10:51 ` [PATCH v5 1/2] " Stephane Eranian
2013-03-25 11:13   ` Peter Zijlstra
2013-03-25 17:58     ` Stephane Eranian
2013-03-22 10:51 ` [PATCH v5 2/2] perf: add sysfs entry to adjust multiplexing interval per PMU Stephane Eranian
2013-03-25 11:43   ` Peter Zijlstra
2013-03-25 12:15     ` Stephane Eranian
2013-03-22 13:54 ` [PATCH v5 0/2] perf: use hrtimer for event multiplexing Frederic Weisbecker
2013-03-25 12:08   ` Peter Zijlstra
2013-03-25 12:21     ` Frederic Weisbecker
2013-03-22 17:33 ` Andi Kleen
2013-03-22 17:36   ` Stephane Eranian

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.