linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] perf: use hrtimer for event multiplexing
@ 2012-09-12 14:13 Stephane Eranian
  2012-09-12 14:13 ` [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu() Stephane Eranian
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Stephane Eranian @ 2012-09-12 14:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, zheng.z.yan, robert.richter

The current scheme of using the timer tick was fine
for per-thread events. However, it was causing
bias issues in system-wide mode (including for
uncore PMUs). Event groups would not get their
fair share of runtime on the PMU. With tickless
kernels, if a core is idle there is no timer tick,
and thus no event rotation (multiplexing). However,
there are events (especially uncore events) which do
count even though cores are asleep.

This patch changes the timer source for multiplexing.
It introduces a per-cpu hrtimer. The advantage is that
even when the core goes idle, it will come back to
service the hrtimer, thus multiplexing on system-wide
events works much better.

In order to minimize the impact of the hrtimer, it
is turned on and off on demand. When the PMU on
a CPU is overcommited, the hrtimer is activated.
It is stopped when the PMU is not overcommitted.

In order for this to work properly with HOTPLUG_CPU,
we had to change the order of initialization in
start_kernel() such that hrtimer_init() is run
before perf_event_init().

The second patch provide a sysctl control to
adjust the multiplexing interval. Unit is
milliseconds.

Here is a simple before/after example with
two event groups which do require multiplexing.
This is done in system-wide mode on an idle
system. What matters here is the scaling factor
in [] in not the total counts.

Before:

# perf stat -a -e ref-cycles,ref-cycles sleep 10
 Performance counter stats for 'sleep 10':
 34,319,545 ref-cycles  [56.51%]
 31,917,229 ref-cycles  [43.50%]

 10.000827569 seconds time elapsed

After:
# perf stat -a -e ref-cycles,ref-cycles sleep 10
 Performance counter stats for 'sleep 10':
 11,144,822,193 ref-cycles [50.00%]
 11,103,760,513 ref-cycles [50.00%]

 10.000672946 seconds time elapsed

In this second version of the patchset, we now
have the hrtimer_interval per PMU instance. The
tunable is in /sys/devices/XXX/mux_interval_ms,
where XXX is the name of the PMU instance. Due
to initialization changes of each hrtimer, we
had to introduce hrtimer_init_cpu() to initialize
a hrtimer from another CPU.

Signed-off-by: Stephane Eranian <eranian@google.com>
---

Stephane Eranian (3):
  hrtimer: add hrtimer_init_cpu()
  perf: use hrtimer for event multiplexing
  perf: add sysfs entry to adjust multiplexing interval per PMU

 include/linux/hrtimer.h    |    2 +
 include/linux/perf_event.h |    5 +-
 init/main.c                |    2 +-
 kernel/events/core.c       |  166 +++++++++++++++++++++++++++++++++++++++++---
 kernel/hrtimer.c           |   17 +++--
 5 files changed, 176 insertions(+), 16 deletions(-)

-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu()
  2012-09-12 14:13 [PATCH v2 0/3] perf: use hrtimer for event multiplexing Stephane Eranian
@ 2012-09-12 14:13 ` Stephane Eranian
  2012-09-12 14:22   ` Eric Dumazet
  2012-09-12 14:26   ` Peter Zijlstra
  2012-09-12 14:13 ` [PATCH v2 2/3] perf: use hrtimer for event multiplexing Stephane Eranian
  2012-09-12 14:13 ` [PATCH v2 3/3] perf: add sysfs entry to adjust multiplexing interval per PMU Stephane Eranian
  2 siblings, 2 replies; 23+ messages in thread
From: Stephane Eranian @ 2012-09-12 14:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, zheng.z.yan, robert.richter

hrtimer_init() assumes it is called for the current CPU
as it accesses per-cpu variables (hrtimer_bases).

However, there can be cases where a hrtimer is initialized
from a different CPU, so add an entry point to make this
more explicit.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 include/linux/hrtimer.h |    3 +++
 kernel/hrtimer.c        |   17 ++++++++++++-----
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index cc07d27..9d07e93 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -337,6 +337,9 @@ DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 /* Initialize timers: */
 extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock,
 			 enum hrtimer_mode mode);
+extern void hrtimer_init_cpu(int cpu, struct hrtimer *timer,
+			     clockid_t which_clock,
+			     enum hrtimer_mode mode);
 
 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
 extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock,
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 6db7a5e..cde7fc4 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -440,14 +440,14 @@ static inline void debug_hrtimer_free(struct hrtimer *timer)
 	debug_object_free(timer, &hrtimer_debug_descr);
 }
 
-static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
+static void __hrtimer_init(int cpu, struct hrtimer *timer, clockid_t clock_id,
 			   enum hrtimer_mode mode);
 
 void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t clock_id,
 			   enum hrtimer_mode mode)
 {
 	debug_object_init_on_stack(timer, &hrtimer_debug_descr);
-	__hrtimer_init(timer, clock_id, mode);
+	__hrtimer_init(smp_processor_id(), timer, clock_id, mode);
 }
 EXPORT_SYMBOL_GPL(hrtimer_init_on_stack);
 
@@ -1146,7 +1146,7 @@ ktime_t hrtimer_get_next_event(void)
 }
 #endif
 
-static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
+static void __hrtimer_init(int cpu, struct hrtimer *timer, clockid_t clock_id,
 			   enum hrtimer_mode mode)
 {
 	struct hrtimer_cpu_base *cpu_base;
@@ -1154,7 +1154,7 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
 
 	memset(timer, 0, sizeof(struct hrtimer));
 
-	cpu_base = &__raw_get_cpu_var(hrtimer_bases);
+	cpu_base = &per_cpu(hrtimer_bases, cpu);
 
 	if (clock_id == CLOCK_REALTIME && mode != HRTIMER_MODE_ABS)
 		clock_id = CLOCK_MONOTONIC;
@@ -1180,10 +1180,17 @@ void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
 		  enum hrtimer_mode mode)
 {
 	debug_init(timer, clock_id, mode);
-	__hrtimer_init(timer, clock_id, mode);
+	__hrtimer_init(smp_processor_id(), timer, clock_id, mode);
 }
 EXPORT_SYMBOL_GPL(hrtimer_init);
 
+void hrtimer_init_cpu(int cpu, struct hrtimer *timer, clockid_t clock_id,
+		  enum hrtimer_mode mode)
+{
+	debug_init(timer, clock_id, mode);
+	__hrtimer_init(cpu, timer, clock_id, mode);
+}
+
 /**
  * hrtimer_get_res - get the timer resolution for a clock
  * @which_clock: which clock to query
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 2/3] perf: use hrtimer for event multiplexing
  2012-09-12 14:13 [PATCH v2 0/3] perf: use hrtimer for event multiplexing Stephane Eranian
  2012-09-12 14:13 ` [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu() Stephane Eranian
@ 2012-09-12 14:13 ` Stephane Eranian
  2012-09-12 14:22   ` Peter Zijlstra
  2012-09-12 14:13 ` [PATCH v2 3/3] perf: add sysfs entry to adjust multiplexing interval per PMU Stephane Eranian
  2 siblings, 1 reply; 23+ messages in thread
From: Stephane Eranian @ 2012-09-12 14:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, zheng.z.yan, robert.richter

The current scheme of using the timer tick was fine
for per-thread events. However, it was causing
bias issues in system-wide mode (including for
uncore PMUs). Event groups would not get their
fair share of runtime on the PMU. With tickless
kernels, if a core is idle there is no timer tick,
and thus no event rotation (multiplexing). However,
there are events (especially uncore events) which do
count even though cores are asleep.

This patch changes the timer source for multiplexing.
It introduces a per-PMU per-cpu hrtimer. The advantage
is that even when a core goes idle, it will come back
to service the hrtimer, thus multiplexing on system-wide
events works much better.

The per-PMU implementation (suggested by PeterZ) enables
adjusting the multiplexing interval per PMU. The preferred
interval is stashed into the struct pmu. If not set, it
will be forced to the default interval value.

In order to minimize the impact of the hrtimer, it
is turned on and off on demand. When the PMU on
a CPU is overcommited, the hrtimer is activated.
It is stopped when the PMU is not overcommitted.

In order for this to work properly, we had to change
the order of initialization in start_kernel() such
that hrtimer_init() is run before perf_event_init().

The default interval in milliseconds is set to a
timer tick just like with the old code. We will
provide a sysctl to tune this in another patch.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 include/linux/perf_event.h |    4 +-
 init/main.c                |    2 +-
 kernel/events/core.c       |  123 ++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 118 insertions(+), 11 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index cc5e2cd..f299fc8 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1106,8 +1106,10 @@ struct perf_cpu_context {
 	struct perf_event_context	*task_ctx;
 	int				active_oncpu;
 	int				exclusive;
+	struct hrtimer			hrtimer;
 	struct list_head		rotation_list;
-	int				jiffies_interval;
+	ktime_t				hrtimer_interval;
+	int				hrtimer_active;
 	struct pmu			*active_pmu;
 	struct perf_cgroup		*cgrp;
 };
diff --git a/init/main.c b/init/main.c
index b286730..8ffe441 100644
--- a/init/main.c
+++ b/init/main.c
@@ -541,7 +541,6 @@ asmlinkage void __init start_kernel(void)
 		local_irq_disable();
 	}
 	idr_init_cache();
-	perf_event_init();
 	rcu_init();
 	radix_tree_init();
 	/* init some links before init_ISA_irqs() */
@@ -553,6 +552,7 @@ asmlinkage void __init start_kernel(void)
 	softirq_init();
 	timekeeping_init();
 	time_init();
+	perf_event_init();
 	profile_init();
 	call_function_init();
 	if (!irqs_disabled())
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 40f42b8..cf1aec4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -148,6 +148,8 @@ static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
 static struct srcu_struct pmus_srcu;
 
+static DEFINE_PER_CPU(struct list_head, rotation_list);
+
 /*
  * perf event paranoia level:
  *  -1 - not paranoid at all
@@ -168,6 +170,8 @@ int sysctl_perf_event_sample_rate __read_mostly = DEFAULT_MAX_SAMPLE_RATE;
 static int max_samples_per_tick __read_mostly =
 	DIV_ROUND_UP(DEFAULT_MAX_SAMPLE_RATE, HZ);
 
+static int perf_rotate_context(struct perf_cpu_context *cpuctx);
+
 int perf_proc_update_handler(struct ctl_table *table, int write,
 		void __user *buffer, size_t *lenp,
 		loff_t *ppos)
@@ -625,6 +629,100 @@ perf_cgroup_mark_enabled(struct perf_event *event,
 }
 #endif
 
+/*
+ * set default to be dependent on timer tick just
+ * like original code
+ */
+#define PERF_CPU_HRTIMER (1000 / HZ)
+static enum hrtimer_restart perf_cpu_hrtimer_handler(struct hrtimer *hr)
+{
+	struct perf_cpu_context *cpuctx;
+	enum hrtimer_restart ret = HRTIMER_NORESTART;
+	unsigned long flags;
+	int rotations = 0;
+
+	cpuctx = container_of(hr, struct perf_cpu_context, hrtimer);
+
+	local_irq_save(flags);
+
+	if (cpuctx->hrtimer_active) {
+		rotations = perf_rotate_context(cpuctx);
+		if (!rotations)
+			cpuctx->hrtimer_active = 0;
+	}
+
+	local_irq_restore(flags);
+
+	/*
+	 * arm timer if needed
+	 */
+	if (rotations) {
+		hrtimer_forward_now(hr, cpuctx->hrtimer_interval);
+		ret = HRTIMER_RESTART;
+	}
+
+	return ret;
+}
+
+/* CPU is going down */
+void perf_cpu_hrtimer_cancel(int cpu)
+{
+	struct list_head *head = &__get_cpu_var(rotation_list);
+	struct perf_cpu_context *cpuctx, *tmp;
+	unsigned long flags;
+
+	if (WARN_ON(cpu != smp_processor_id()))
+		return;
+
+	local_irq_save(flags);
+
+	list_for_each_entry_safe(cpuctx, tmp, head, rotation_list) {
+		if (cpuctx->hrtimer_active) {
+			hrtimer_cancel(&cpuctx->hrtimer);
+			cpuctx->hrtimer_active = 0;
+		}
+	}
+
+	local_irq_restore(flags);
+}
+
+static void __perf_cpu_hrtimer_init(struct perf_cpu_context *cpuctx, int cpu)
+{
+	struct hrtimer *hr = &cpuctx->hrtimer;
+	struct pmu *pmu = cpuctx->ctx.pmu;
+
+	cpuctx->hrtimer_active = 0;
+
+	/* no multiplexing needed for SW PMU */
+	if (pmu->task_ctx_nr == perf_sw_context)
+		return;
+
+	cpuctx->hrtimer_interval =
+		ns_to_ktime(NSEC_PER_MSEC * PERF_CPU_HRTIMER);
+
+	hrtimer_init_cpu(cpu, hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hr->function = perf_cpu_hrtimer_handler;
+}
+
+static void perf_cpu_hrtimer_restart(struct perf_cpu_context *cpuctx)
+{
+	struct hrtimer *hr = &cpuctx->hrtimer;
+	struct pmu *pmu = cpuctx->ctx.pmu;
+
+	/* not for SW PMU */
+	if (pmu->task_ctx_nr == perf_sw_context)
+		return;
+
+	if (cpuctx->hrtimer_active)
+		return;
+
+	cpuctx->hrtimer_active = 1;
+
+	if (!hrtimer_callback_running(hr))
+		__hrtimer_start_range_ns(hr, cpuctx->hrtimer_interval,
+					 0, HRTIMER_MODE_REL_PINNED, 0);
+}
+
 void perf_pmu_disable(struct pmu *pmu)
 {
 	int *count = this_cpu_ptr(pmu->pmu_disable_count);
@@ -639,8 +737,6 @@ void perf_pmu_enable(struct pmu *pmu)
 		pmu->pmu_enable(pmu);
 }
 
-static DEFINE_PER_CPU(struct list_head, rotation_list);
-
 /*
  * perf_pmu_rotate_start() and perf_rotate_context() are fully serialized
  * because they're strictly cpu affine and rotate_start is called with IRQs
@@ -1458,6 +1554,7 @@ group_sched_in(struct perf_event *group_event,
 
 	if (event_sched_in(group_event, cpuctx, ctx)) {
 		pmu->cancel_txn(pmu);
+		perf_cpu_hrtimer_restart(cpuctx);
 		return -EAGAIN;
 	}
 
@@ -1504,6 +1601,8 @@ group_sched_in(struct perf_event *group_event,
 
 	pmu->cancel_txn(pmu);
 
+	perf_cpu_hrtimer_restart(cpuctx);
+
 	return -EAGAIN;
 }
 
@@ -1759,8 +1858,10 @@ static int __perf_event_enable(void *info)
 		 * If this event can't go on and it's part of a
 		 * group, then the whole group has to come off.
 		 */
-		if (leader != event)
+		if (leader != event) {
 			group_sched_out(leader, cpuctx, ctx);
+			perf_cpu_hrtimer_restart(cpuctx);
+		}
 		if (leader->attr.pinned) {
 			update_group_times(leader);
 			leader->state = PERF_EVENT_STATE_ERROR;
@@ -2507,7 +2608,7 @@ static void rotate_ctx(struct perf_event_context *ctx)
  * because they're strictly cpu affine and rotate_start is called with IRQs
  * disabled, while rotate_context is called from IRQ context.
  */
-static void perf_rotate_context(struct perf_cpu_context *cpuctx)
+static int perf_rotate_context(struct perf_cpu_context *cpuctx)
 {
 	struct perf_event_context *ctx = NULL;
 	int rotate = 0, remove = 1;
@@ -2546,6 +2647,8 @@ static void perf_rotate_context(struct perf_cpu_context *cpuctx)
 done:
 	if (remove)
 		list_del_init(&cpuctx->rotation_list);
+
+	return rotate;
 }
 
 void perf_event_task_tick(void)
@@ -2567,10 +2670,6 @@ void perf_event_task_tick(void)
 		ctx = cpuctx->task_ctx;
 		if (ctx)
 			perf_adjust_freq_unthr_context(ctx, throttled);
-
-		if (cpuctx->jiffies_interval == 1 ||
-				!(jiffies % cpuctx->jiffies_interval))
-			perf_rotate_context(cpuctx);
 	}
 }
 
@@ -6000,7 +6099,9 @@ int perf_pmu_register(struct pmu *pmu, char *name, int type)
 		lockdep_set_class(&cpuctx->ctx.lock, &cpuctx_lock);
 		cpuctx->ctx.type = cpu_context;
 		cpuctx->ctx.pmu = pmu;
-		cpuctx->jiffies_interval = 1;
+
+		__perf_cpu_hrtimer_init(cpuctx, cpu);
+
 		INIT_LIST_HEAD(&cpuctx->rotation_list);
 		cpuctx->active_pmu = pmu;
 	}
@@ -7379,6 +7480,10 @@ perf_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
 	case CPU_DOWN_PREPARE:
 		perf_event_exit_cpu(cpu);
 		break;
+	case CPU_DYING:
+		/* must be run on actual cpu */
+		perf_cpu_hrtimer_cancel(cpu);
+		break;
 
 	default:
 		break;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 3/3] perf: add sysfs entry to adjust multiplexing interval per PMU
  2012-09-12 14:13 [PATCH v2 0/3] perf: use hrtimer for event multiplexing Stephane Eranian
  2012-09-12 14:13 ` [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu() Stephane Eranian
  2012-09-12 14:13 ` [PATCH v2 2/3] perf: use hrtimer for event multiplexing Stephane Eranian
@ 2012-09-12 14:13 ` Stephane Eranian
  2 siblings, 0 replies; 23+ messages in thread
From: Stephane Eranian @ 2012-09-12 14:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, zheng.z.yan, robert.richter

This patch adds /sys/device/xxx/mux_interval_ms to ajust the
multiplexing interval per PMU. The unit is milliseconds. Value
has to be >= 1.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 include/linux/perf_event.h |    1 +
 kernel/events/core.c       |   54 ++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f299fc8..e439034 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -784,6 +784,7 @@ struct pmu {
 	int * __percpu			pmu_disable_count;
 	struct perf_cpu_context * __percpu pmu_cpu_context;
 	int				task_ctx_nr;
+	int				hrtimer_interval_ms;
 
 	/*
 	 * Fully disable/enable this PMU, can be used to protect from the PMI
diff --git a/kernel/events/core.c b/kernel/events/core.c
index cf1aec4..681eca2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -690,15 +690,22 @@ static void __perf_cpu_hrtimer_init(struct perf_cpu_context *cpuctx, int cpu)
 {
 	struct hrtimer *hr = &cpuctx->hrtimer;
 	struct pmu *pmu = cpuctx->ctx.pmu;
+	int timer;
 
 	cpuctx->hrtimer_active = 0;
 
 	/* no multiplexing needed for SW PMU */
 	if (pmu->task_ctx_nr == perf_sw_context)
 		return;
+	/*
+	 * check default is sane, if not set then force to
+	 * default interval (1/tick)
+	 */
+	timer = pmu->hrtimer_interval_ms;
+	if (timer < 1)
+		timer = pmu->hrtimer_interval_ms = PERF_CPU_HRTIMER;
 
-	cpuctx->hrtimer_interval =
-		ns_to_ktime(NSEC_PER_MSEC * PERF_CPU_HRTIMER);
+	cpuctx->hrtimer_interval = ns_to_ktime(NSEC_PER_MSEC * timer);
 
 	hrtimer_init_cpu(cpu, hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	hr->function = perf_cpu_hrtimer_handler;
@@ -5999,9 +6006,48 @@ type_show(struct device *dev, struct device_attribute *attr, char *page)
 	return snprintf(page, PAGE_SIZE-1, "%d\n", pmu->type);
 }
 
+static ssize_t
+mux_interval_ms_show(struct device *dev, struct device_attribute *attr,
+		     char *page)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+
+	return snprintf(page, PAGE_SIZE-1, "%d\n", pmu->hrtimer_interval_ms);
+}
+
+static ssize_t
+mux_interval_ms_store(struct device *dev, struct device_attribute *attr,
+		      const char *buf, size_t count)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	char *end = NULL;
+	int timer, cpu;
+
+	timer = kstrtoul(buf, &end, 0);
+	if (end && *end != '\n' && *end != '\0')
+		return -EINVAL;
+
+	if (timer < 1)
+		return -EINVAL;
+
+	pmu->hrtimer_interval_ms = timer;
+
+	/* update all cpuctx for this PMU */
+	for_each_possible_cpu(cpu) {
+		struct perf_cpu_context *cpuctx;
+		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+		cpuctx->hrtimer_interval = ns_to_ktime(NSEC_PER_MSEC * timer);
+	}
+
+	return count;
+}
+
+#define __ATTR_RW(attr) __ATTR(attr, 0644, attr##_show, attr##_store)
+
 static struct device_attribute pmu_dev_attrs[] = {
-       __ATTR_RO(type),
-       __ATTR_NULL,
+	__ATTR_RO(type),
+	__ATTR_RW(mux_interval_ms),
+	__ATTR_NULL,
 };
 
 static int pmu_bus_running;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu()
  2012-09-12 14:13 ` [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu() Stephane Eranian
@ 2012-09-12 14:22   ` Eric Dumazet
  2012-09-12 14:26   ` Peter Zijlstra
  1 sibling, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2012-09-12 14:22 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, peterz, mingo, ak, zheng.z.yan, robert.richter

On Wed, 2012-09-12 at 16:13 +0200, Stephane Eranian wrote:

>  void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t clock_id,
>  			   enum hrtimer_mode mode)
>  {
>  	debug_object_init_on_stack(timer, &hrtimer_debug_descr);
> -	__hrtimer_init(timer, clock_id, mode);
> +	__hrtimer_init(smp_processor_id(), timer, clock_id, mode);
>  }

This assumes caller is in non preemptable context.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] perf: use hrtimer for event multiplexing
  2012-09-12 14:13 ` [PATCH v2 2/3] perf: use hrtimer for event multiplexing Stephane Eranian
@ 2012-09-12 14:22   ` Peter Zijlstra
  2012-09-12 14:43     ` Stephane Eranian
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2012-09-12 14:22 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, mingo, ak, zheng.z.yan, robert.richter

On Wed, 2012-09-12 at 16:13 +0200, Stephane Eranian wrote:
> +static DEFINE_PER_CPU(struct list_head, rotation_list);

Why do you keep the rotation list? The only use seems to be:


> +void perf_cpu_hrtimer_cancel(int cpu)
> +{
> +       struct list_head *head = &__get_cpu_var(rotation_list);
> +       struct perf_cpu_context *cpuctx, *tmp;
> +       unsigned long flags;
> +
> +       if (WARN_ON(cpu != smp_processor_id()))
> +               return;
> +
> +       local_irq_save(flags);
> +
> +       list_for_each_entry_safe(cpuctx, tmp, head, rotation_list) {
> +               if (cpuctx->hrtimer_active) {
> +                       hrtimer_cancel(&cpuctx->hrtimer);
> +                       cpuctx->hrtimer_active = 0;
> +               }
> +       }
> +
> +       local_irq_restore(flags);
> +}

Which is weird, why not use the existing for-each-pmu loop in
perf_event_exit_cpu_context() ? Or something similar to iterate all
extant PMUs and thus their cpuctxs?

Also, you can do away with hrtimer_active, you can hrtimer_cancel() on
an inactive hrtimer just fine, it will DTRT.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu()
  2012-09-12 14:13 ` [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu() Stephane Eranian
  2012-09-12 14:22   ` Eric Dumazet
@ 2012-09-12 14:26   ` Peter Zijlstra
  2012-09-12 14:33     ` Stephane Eranian
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2012-09-12 14:26 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, mingo, ak, zheng.z.yan, robert.richter, Thomas Gleixner

I'm rather sure Thomas would want to know about this..


On Wed, 2012-09-12 at 16:13 +0200, Stephane Eranian wrote:
> hrtimer_init() assumes it is called for the current CPU
> as it accesses per-cpu variables (hrtimer_bases).
> 
> However, there can be cases where a hrtimer is initialized
> from a different CPU, so add an entry point to make this
> more explicit.
> 
> Signed-off-by: Stephane Eranian <eranian@google.com>
> ---
>  include/linux/hrtimer.h |    3 +++
>  kernel/hrtimer.c        |   17 ++++++++++++-----
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 

> +static void __hrtimer_init(int cpu, struct hrtimer *timer, clockid_t clock_id,
>  			   enum hrtimer_mode mode)
>  {
>  	struct hrtimer_cpu_base *cpu_base;
> @@ -1154,7 +1154,7 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
>  
>  	memset(timer, 0, sizeof(struct hrtimer));
>  
> -	cpu_base = &__raw_get_cpu_var(hrtimer_bases);
> +	cpu_base = &per_cpu(hrtimer_bases, cpu);
>  
>  	if (clock_id == CLOCK_REALTIME && mode != HRTIMER_MODE_ABS)
>  		clock_id = CLOCK_MONOTONIC;


I don't see the point, one of the first things
__hrtimer_start_range_ns() does is switch_hrtimer_base() to swizzle it
to the calling CPUs base.

And since all the perf event rotation muck is strictly per cpu that all
should work out just fine, no?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu()
  2012-09-12 14:26   ` Peter Zijlstra
@ 2012-09-12 14:33     ` Stephane Eranian
  2012-09-12 14:38       ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Stephane Eranian @ 2012-09-12 14:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter, Thomas Gleixner

On Wed, Sep 12, 2012 at 4:26 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> I'm rather sure Thomas would want to know about this..
>
>
> On Wed, 2012-09-12 at 16:13 +0200, Stephane Eranian wrote:
>> hrtimer_init() assumes it is called for the current CPU
>> as it accesses per-cpu variables (hrtimer_bases).
>>
>> However, there can be cases where a hrtimer is initialized
>> from a different CPU, so add an entry point to make this
>> more explicit.
>>
>> Signed-off-by: Stephane Eranian <eranian@google.com>
>> ---
>>  include/linux/hrtimer.h |    3 +++
>>  kernel/hrtimer.c        |   17 ++++++++++++-----
>>  2 files changed, 15 insertions(+), 5 deletions(-)
>>
>
>> +static void __hrtimer_init(int cpu, struct hrtimer *timer, clockid_t clock_id,
>>                          enum hrtimer_mode mode)
>>  {
>>       struct hrtimer_cpu_base *cpu_base;
>> @@ -1154,7 +1154,7 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
>>
>>       memset(timer, 0, sizeof(struct hrtimer));
>>
>> -     cpu_base = &__raw_get_cpu_var(hrtimer_bases);
>> +     cpu_base = &per_cpu(hrtimer_bases, cpu);
>>
>>       if (clock_id == CLOCK_REALTIME && mode != HRTIMER_MODE_ABS)
>>               clock_id = CLOCK_MONOTONIC;
>
>
> I don't see the point, one of the first things
> __hrtimer_start_range_ns() does is switch_hrtimer_base() to swizzle it
> to the calling CPUs base.
>
> And since all the perf event rotation muck is strictly per cpu that all
> should work out just fine, no?


If I do:
  for_each_possible_cpu(cpu) {
       cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
       hr = &cpuctx->hrtimer;
       hrtimer_init(hr)
}
I don't understand why I would have to refer to per-cpu data
(hrtimer_bases) from
a CPU that is not equal to "cpu" here. Unless you're telling me it's
read-only data.
But still if it's per-cpu why not initialize with the correct CPU from
the start?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu()
  2012-09-12 14:33     ` Stephane Eranian
@ 2012-09-12 14:38       ` Peter Zijlstra
  2012-09-12 14:46         ` Stephane Eranian
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2012-09-12 14:38 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter, Thomas Gleixner,
	Eric Dumazet

On Wed, 2012-09-12 at 16:33 +0200, Stephane Eranian wrote:

> 
> If I do:
>   for_each_possible_cpu(cpu) {
>        cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
>        hr = &cpuctx->hrtimer;
>        hrtimer_init(hr)
> }
> I don't understand why I would have to refer to per-cpu data
> (hrtimer_bases) from
> a CPU that is not equal to "cpu" here. Unless you're telling me it's
> read-only data.

No its not read only, but it is unused until you do *hrtimer_start*(),
which will test and fix.

> But still if it's per-cpu why not initialize with the correct CPU from
> the start?

To keep the interface simpler I guess. There's no great harm in your
proposal, but it is strictly speaking superfluous. I'm not sure the max
one time avoidance of a base swizzle is worth the extra interface, I'll
leave that up to Thomas.

Also, what Eric said ;-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] perf: use hrtimer for event multiplexing
  2012-09-12 14:22   ` Peter Zijlstra
@ 2012-09-12 14:43     ` Stephane Eranian
  2012-09-12 14:44       ` Peter Zijlstra
  2012-09-12 15:37       ` Stephane Eranian
  0 siblings, 2 replies; 23+ messages in thread
From: Stephane Eranian @ 2012-09-12 14:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter

On Wed, Sep 12, 2012 at 4:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2012-09-12 at 16:13 +0200, Stephane Eranian wrote:
>> +static DEFINE_PER_CPU(struct list_head, rotation_list);
>
> Why do you keep the rotation list? The only use seems to be:
>
>
>> +void perf_cpu_hrtimer_cancel(int cpu)
>> +{
>> +       struct list_head *head = &__get_cpu_var(rotation_list);
>> +       struct perf_cpu_context *cpuctx, *tmp;
>> +       unsigned long flags;
>> +
>> +       if (WARN_ON(cpu != smp_processor_id()))
>> +               return;
>> +
>> +       local_irq_save(flags);
>> +
>> +       list_for_each_entry_safe(cpuctx, tmp, head, rotation_list) {
>> +               if (cpuctx->hrtimer_active) {
>> +                       hrtimer_cancel(&cpuctx->hrtimer);
>> +                       cpuctx->hrtimer_active = 0;
>> +               }
>> +       }
>> +
>> +       local_irq_restore(flags);
>> +}
>
> Which is weird, why not use the existing for-each-pmu loop in
> perf_event_exit_cpu_context() ? Or something similar to iterate all
> extant PMUs and thus their cpuctxs?
>
True. That would probably work too.

> Also, you can do away with hrtimer_active, you can hrtimer_cancel() on
> an inactive hrtimer just fine, it will DTRT.

The hrtimer_active is used to prevent activating the timer multiple times
in a row.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] perf: use hrtimer for event multiplexing
  2012-09-12 14:43     ` Stephane Eranian
@ 2012-09-12 14:44       ` Peter Zijlstra
  2012-09-12 14:48         ` Stephane Eranian
  2012-09-12 15:37       ` Stephane Eranian
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2012-09-12 14:44 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter

On Wed, 2012-09-12 at 16:43 +0200, Stephane Eranian wrote:
> The hrtimer_active is used to prevent activating the timer multiple times
> in a row. 

see hrtimer_active(), this should do what you want I think.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu()
  2012-09-12 14:38       ` Peter Zijlstra
@ 2012-09-12 14:46         ` Stephane Eranian
  2012-09-12 14:49           ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Stephane Eranian @ 2012-09-12 14:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter, Thomas Gleixner,
	Eric Dumazet

On Wed, Sep 12, 2012 at 4:38 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2012-09-12 at 16:33 +0200, Stephane Eranian wrote:
>
>>
>> If I do:
>>   for_each_possible_cpu(cpu) {
>>        cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
>>        hr = &cpuctx->hrtimer;
>>        hrtimer_init(hr)
>> }
>> I don't understand why I would have to refer to per-cpu data
>> (hrtimer_bases) from
>> a CPU that is not equal to "cpu" here. Unless you're telling me it's
>> read-only data.
>
> No its not read only, but it is unused until you do *hrtimer_start*(),
> which will test and fix.
>
>> But still if it's per-cpu why not initialize with the correct CPU from
>> the start?
>
> To keep the interface simpler I guess. There's no great harm in your
> proposal, but it is strictly speaking superfluous. I'm not sure the max
> one time avoidance of a base swizzle is worth the extra interface, I'll
> leave that up to Thomas.
>
> Also, what Eric said ;-)

I am fine with dropping this patch. I just found it odd there was a per-cpu
data reference embedded deep into the call. I wanted things to be more
explicit. I know it works without the proposed change.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] perf: use hrtimer for event multiplexing
  2012-09-12 14:44       ` Peter Zijlstra
@ 2012-09-12 14:48         ` Stephane Eranian
  2012-09-12 14:50           ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Stephane Eranian @ 2012-09-12 14:48 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter

On Wed, Sep 12, 2012 at 4:44 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2012-09-12 at 16:43 +0200, Stephane Eranian wrote:
>> The hrtimer_active is used to prevent activating the timer multiple times
>> in a row.
>
> see hrtimer_active(), this should do what you want I think.

I need something that is true even when the hrtimer is not executing
the callback handler. I guess that may be the different between
hrtimer_active() vs. hrtimer_running()?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu()
  2012-09-12 14:46         ` Stephane Eranian
@ 2012-09-12 14:49           ` Peter Zijlstra
  2012-09-12 14:51             ` Stephane Eranian
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2012-09-12 14:49 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter, Thomas Gleixner,
	Eric Dumazet

On Wed, 2012-09-12 at 16:46 +0200, Stephane Eranian wrote:

> I am fine with dropping this patch. I just found it odd there was a per-cpu
> data reference embedded deep into the call. I wanted things to be more
> explicit. I know it works without the proposed change.

Ah the reason its there is to make sure the base pointer is a valid base
pointer, this avoids an extra conditional in a number of places. Note
that it uses raw_get_cpu_var(), so it doesn't even actually know which
specific pointer goes in.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] perf: use hrtimer for event multiplexing
  2012-09-12 14:48         ` Stephane Eranian
@ 2012-09-12 14:50           ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2012-09-12 14:50 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter

On Wed, 2012-09-12 at 16:48 +0200, Stephane Eranian wrote:
> On Wed, Sep 12, 2012 at 4:44 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, 2012-09-12 at 16:43 +0200, Stephane Eranian wrote:
> >> The hrtimer_active is used to prevent activating the timer multiple times
> >> in a row.
> >
> > see hrtimer_active(), this should do what you want I think.
> 
> I need something that is true even when the hrtimer is not executing
> the callback handler. I guess that may be the different between
> hrtimer_active() vs. hrtimer_running()?

As the comment states, hrtimer_active() returns true if the timer is
either in the tree (enqueued) or running.

hrtimer_callback_running() is true iff the callback is currently
executing.

hrtimer_queued() is true iff the timer is queued.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu()
  2012-09-12 14:49           ` Peter Zijlstra
@ 2012-09-12 14:51             ` Stephane Eranian
  0 siblings, 0 replies; 23+ messages in thread
From: Stephane Eranian @ 2012-09-12 14:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter, Thomas Gleixner,
	Eric Dumazet

On Wed, Sep 12, 2012 at 4:49 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2012-09-12 at 16:46 +0200, Stephane Eranian wrote:
>
>> I am fine with dropping this patch. I just found it odd there was a per-cpu
>> data reference embedded deep into the call. I wanted things to be more
>> explicit. I know it works without the proposed change.
>
> Ah the reason its there is to make sure the base pointer is a valid base
> pointer, this avoids an extra conditional in a number of places. Note
> that it uses raw_get_cpu_var(), so it doesn't even actually know which
> specific pointer goes in.
>
>
Ok, fine. Then let's drop this patch then.
Thanks for the explanations.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] perf: use hrtimer for event multiplexing
  2012-09-12 14:43     ` Stephane Eranian
  2012-09-12 14:44       ` Peter Zijlstra
@ 2012-09-12 15:37       ` Stephane Eranian
  2012-09-12 15:49         ` Stephane Eranian
  2012-09-13 12:16         ` Peter Zijlstra
  1 sibling, 2 replies; 23+ messages in thread
From: Stephane Eranian @ 2012-09-12 15:37 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter

On Wed, Sep 12, 2012 at 4:43 PM, Stephane Eranian <eranian@google.com> wrote:
> On Wed, Sep 12, 2012 at 4:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Wed, 2012-09-12 at 16:13 +0200, Stephane Eranian wrote:
>>> +static DEFINE_PER_CPU(struct list_head, rotation_list);
>>
>> Why do you keep the rotation list? The only use seems to be:
>>
>>
>>> +void perf_cpu_hrtimer_cancel(int cpu)
>>> +{
>>> +       struct list_head *head = &__get_cpu_var(rotation_list);
>>> +       struct perf_cpu_context *cpuctx, *tmp;
>>> +       unsigned long flags;
>>> +
>>> +       if (WARN_ON(cpu != smp_processor_id()))
>>> +               return;
>>> +
>>> +       local_irq_save(flags);
>>> +
>>> +       list_for_each_entry_safe(cpuctx, tmp, head, rotation_list) {
>>> +               if (cpuctx->hrtimer_active) {
>>> +                       hrtimer_cancel(&cpuctx->hrtimer);
>>> +                       cpuctx->hrtimer_active = 0;
>>> +               }
>>> +       }
>>> +
>>> +       local_irq_restore(flags);
>>> +}
>>
>> Which is weird, why not use the existing for-each-pmu loop in
>> perf_event_exit_cpu_context() ? Or something similar to iterate all
>> extant PMUs and thus their cpuctxs?
>>
> True. That would probably work too.
>
Note however that the rotation_list is still used in perf_event_task_tick()
to iterate over the ctx which needs unthrottling. We would have to switch
that loop over to a for-each-pmu() which would necessary incur more
iterations as it would include all the SW PMUs.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] perf: use hrtimer for event multiplexing
  2012-09-12 15:37       ` Stephane Eranian
@ 2012-09-12 15:49         ` Stephane Eranian
  2012-09-13 12:16         ` Peter Zijlstra
  1 sibling, 0 replies; 23+ messages in thread
From: Stephane Eranian @ 2012-09-12 15:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter

On Wed, Sep 12, 2012 at 5:37 PM, Stephane Eranian <eranian@google.com> wrote:
> On Wed, Sep 12, 2012 at 4:43 PM, Stephane Eranian <eranian@google.com> wrote:
>> On Wed, Sep 12, 2012 at 4:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> On Wed, 2012-09-12 at 16:13 +0200, Stephane Eranian wrote:
>>>> +static DEFINE_PER_CPU(struct list_head, rotation_list);
>>>
>>> Why do you keep the rotation list? The only use seems to be:
>>>
>>>
>>>> +void perf_cpu_hrtimer_cancel(int cpu)
>>>> +{
>>>> +       struct list_head *head = &__get_cpu_var(rotation_list);
>>>> +       struct perf_cpu_context *cpuctx, *tmp;
>>>> +       unsigned long flags;
>>>> +
>>>> +       if (WARN_ON(cpu != smp_processor_id()))
>>>> +               return;
>>>> +
>>>> +       local_irq_save(flags);
>>>> +
>>>> +       list_for_each_entry_safe(cpuctx, tmp, head, rotation_list) {
>>>> +               if (cpuctx->hrtimer_active) {
>>>> +                       hrtimer_cancel(&cpuctx->hrtimer);
>>>> +                       cpuctx->hrtimer_active = 0;
>>>> +               }
>>>> +       }
>>>> +
>>>> +       local_irq_restore(flags);
>>>> +}
>>>
>>> Which is weird, why not use the existing for-each-pmu loop in
>>> perf_event_exit_cpu_context() ? Or something similar to iterate all
>>> extant PMUs and thus their cpuctxs?
>>>
>> True. That would probably work too.
>>
> Note however that the rotation_list is still used in perf_event_task_tick()
> to iterate over the ctx which needs unthrottling. We would have to switch
> that loop over to a for-each-pmu() which would necessary incur more
> iterations as it would include all the SW PMUs.

That reminds me that dropping sw context from rotation_list causes an
issue in perf_event_task_tick() because that means the sw PMU are
not considered anymore for interrupt unthrottling but they should. So
I think switching to for-each-pmu() in perf_event_task_tick() will solve
that problem too.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] perf: use hrtimer for event multiplexing
  2012-09-12 15:37       ` Stephane Eranian
  2012-09-12 15:49         ` Stephane Eranian
@ 2012-09-13 12:16         ` Peter Zijlstra
  2012-09-13 12:20           ` Stephane Eranian
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2012-09-13 12:16 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter

On Wed, 2012-09-12 at 17:37 +0200, Stephane Eranian wrote:

> Note however that the rotation_list is still used in perf_event_task_tick()
> to iterate over the ctx which needs unthrottling. We would have to switch
> that loop over to a for-each-pmu() which would necessary incur more
> iterations as it would include all the SW PMUs.

Oh urgh, right. I think that was one of the reasons I bailed on the
hrtimer thing, the frequency and throttle stuff.

ctx->nr_freq and needs_unthr could help, but yeah.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] perf: use hrtimer for event multiplexing
  2012-09-13 12:16         ` Peter Zijlstra
@ 2012-09-13 12:20           ` Stephane Eranian
  2012-09-13 12:26             ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Stephane Eranian @ 2012-09-13 12:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter

On Thu, Sep 13, 2012 at 2:16 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2012-09-12 at 17:37 +0200, Stephane Eranian wrote:
>
>> Note however that the rotation_list is still used in perf_event_task_tick()
>> to iterate over the ctx which needs unthrottling. We would have to switch
>> that loop over to a for-each-pmu() which would necessary incur more
>> iterations as it would include all the SW PMUs.
>
> Oh urgh, right. I think that was one of the reasons I bailed on the
> hrtimer thing, the frequency and throttle stuff.
>
> ctx->nr_freq and needs_unthr could help, but yeah.

I think for now, we could keep rotation list for the unthrottling.
Multiplexing won't use rotation_list anymore.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] perf: use hrtimer for event multiplexing
  2012-09-13 12:20           ` Stephane Eranian
@ 2012-09-13 12:26             ` Peter Zijlstra
  2012-09-13 12:27               ` Stephane Eranian
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2012-09-13 12:26 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter

On Thu, 2012-09-13 at 14:20 +0200, Stephane Eranian wrote:
> On Thu, Sep 13, 2012 at 2:16 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, 2012-09-12 at 17:37 +0200, Stephane Eranian wrote:
> >
> >> Note however that the rotation_list is still used in perf_event_task_tick()
> >> to iterate over the ctx which needs unthrottling. We would have to switch
> >> that loop over to a for-each-pmu() which would necessary incur more
> >> iterations as it would include all the SW PMUs.
> >
> > Oh urgh, right. I think that was one of the reasons I bailed on the
> > hrtimer thing, the frequency and throttle stuff.
> >
> > ctx->nr_freq and needs_unthr could help, but yeah.
> 
> I think for now, we could keep rotation list for the unthrottling.
> Multiplexing won't use rotation_list anymore.

wouldn't that still have the problem where we take the sw-pmus off of
the rotation-list?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] perf: use hrtimer for event multiplexing
  2012-09-13 12:26             ` Peter Zijlstra
@ 2012-09-13 12:27               ` Stephane Eranian
  2012-09-13 12:29                 ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Stephane Eranian @ 2012-09-13 12:27 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter

On Thu, Sep 13, 2012 at 2:26 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2012-09-13 at 14:20 +0200, Stephane Eranian wrote:
>> On Thu, Sep 13, 2012 at 2:16 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Wed, 2012-09-12 at 17:37 +0200, Stephane Eranian wrote:
>> >
>> >> Note however that the rotation_list is still used in perf_event_task_tick()
>> >> to iterate over the ctx which needs unthrottling. We would have to switch
>> >> that loop over to a for-each-pmu() which would necessary incur more
>> >> iterations as it would include all the SW PMUs.
>> >
>> > Oh urgh, right. I think that was one of the reasons I bailed on the
>> > hrtimer thing, the frequency and throttle stuff.
>> >
>> > ctx->nr_freq and needs_unthr could help, but yeah.
>>
>> I think for now, we could keep rotation list for the unthrottling.
>> Multiplexing won't use rotation_list anymore.
>
> wouldn't that still have the problem where we take the sw-pmus off of
> the rotation-list?

No because we should not use the patch I posted last week. So rotation_start()
would still enqueue SW pmus.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] perf: use hrtimer for event multiplexing
  2012-09-13 12:27               ` Stephane Eranian
@ 2012-09-13 12:29                 ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2012-09-13 12:29 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: LKML, mingo, ak, Yan, Zheng, Robert Richter

On Thu, 2012-09-13 at 14:27 +0200, Stephane Eranian wrote:

> No because we should not use the patch I posted last week. So rotation_start()
> would still enqueue SW pmus.

Hrmm. I just send it to Ingo.. let me see if I can still recall that.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2012-09-13 12:29 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-12 14:13 [PATCH v2 0/3] perf: use hrtimer for event multiplexing Stephane Eranian
2012-09-12 14:13 ` [PATCH v2 1/3] hrtimer: add hrtimer_init_cpu() Stephane Eranian
2012-09-12 14:22   ` Eric Dumazet
2012-09-12 14:26   ` Peter Zijlstra
2012-09-12 14:33     ` Stephane Eranian
2012-09-12 14:38       ` Peter Zijlstra
2012-09-12 14:46         ` Stephane Eranian
2012-09-12 14:49           ` Peter Zijlstra
2012-09-12 14:51             ` Stephane Eranian
2012-09-12 14:13 ` [PATCH v2 2/3] perf: use hrtimer for event multiplexing Stephane Eranian
2012-09-12 14:22   ` Peter Zijlstra
2012-09-12 14:43     ` Stephane Eranian
2012-09-12 14:44       ` Peter Zijlstra
2012-09-12 14:48         ` Stephane Eranian
2012-09-12 14:50           ` Peter Zijlstra
2012-09-12 15:37       ` Stephane Eranian
2012-09-12 15:49         ` Stephane Eranian
2012-09-13 12:16         ` Peter Zijlstra
2012-09-13 12:20           ` Stephane Eranian
2012-09-13 12:26             ` Peter Zijlstra
2012-09-13 12:27               ` Stephane Eranian
2012-09-13 12:29                 ` Peter Zijlstra
2012-09-12 14:13 ` [PATCH v2 3/3] perf: add sysfs entry to adjust multiplexing interval per PMU Stephane Eranian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).