All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Preparatory perf patches for KVM PMU support
@ 2011-06-29 15:42 Avi Kivity
  2011-06-29 15:42 ` [PATCH 1/3] perf: add context field to perf_event Avi Kivity
                   ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Avi Kivity @ 2011-06-29 15:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Ingo Molnar, a.p.zijlstra, acme, Frederic Weisbecker

The following three patches pave the way for KVM in-guest performance
monitoring.  One is a perf API improvement, another fixes the constraints
for the version 1 architectural PMU (which we will emulate), and the third
adds an export that KVM will use.

Please consider for merging; this will make further work on the KVM PMU
easier.

Avi Kivity (3):
  perf: add context field to perf_event
  x86, perf: add constraints for architectural PMU v1
  perf: export perf_event_refresh() to modules

 arch/arm/kernel/ptrace.c                |    3 ++-
 arch/powerpc/kernel/ptrace.c            |    2 +-
 arch/sh/kernel/ptrace_32.c              |    3 ++-
 arch/x86/kernel/cpu/perf_event_intel.c  |   23 ++++++++++++++++++-----
 arch/x86/kernel/kgdb.c                  |    2 +-
 arch/x86/kernel/ptrace.c                |    3 ++-
 drivers/oprofile/oprofile_perf.c        |    2 +-
 include/linux/hw_breakpoint.h           |   10 ++++++++--
 include/linux/perf_event.h              |    9 ++++++++-
 kernel/events/core.c                    |   24 +++++++++++++++++-------
 kernel/events/hw_breakpoint.c           |   10 +++++++---
 kernel/watchdog.c                       |    2 +-
 samples/hw_breakpoint/data_breakpoint.c |    2 +-
 13 files changed, 69 insertions(+), 26 deletions(-)

-- 
1.7.5.3


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/3] perf: add context field to perf_event
  2011-06-29 15:42 [PATCH 0/3] Preparatory perf patches for KVM PMU support Avi Kivity
@ 2011-06-29 15:42 ` Avi Kivity
  2011-06-29 16:08   ` Frederic Weisbecker
  2011-07-01 15:24   ` [tip:perf/core] perf: Add " tip-bot for Avi Kivity
  2011-06-29 15:42 ` [PATCH 2/3] x86, perf: add constraints for architectural PMU v1 Avi Kivity
  2011-06-29 15:42 ` [PATCH 3/3] perf: export perf_event_refresh() to modules Avi Kivity
  2 siblings, 2 replies; 39+ messages in thread
From: Avi Kivity @ 2011-06-29 15:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Ingo Molnar, a.p.zijlstra, acme, Frederic Weisbecker

The perf_event overflow handler does not receive any caller-derived
argument, so many callers need to resort to looking up the perf_event
in their local data structure.  This is ugly and doesn't scale if a
single callback services many perf_events.

Fix by adding a context parameter to perf_event_create_kernel_counter()
(and derived hardware breakpoints APIs) and storing it in the perf_event.
The field can be accessed from the callback as event->overflow_handler_context.
All callers are updated.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/arm/kernel/ptrace.c                |    3 ++-
 arch/powerpc/kernel/ptrace.c            |    2 +-
 arch/sh/kernel/ptrace_32.c              |    3 ++-
 arch/x86/kernel/kgdb.c                  |    2 +-
 arch/x86/kernel/ptrace.c                |    3 ++-
 drivers/oprofile/oprofile_perf.c        |    2 +-
 include/linux/hw_breakpoint.h           |   10 ++++++++--
 include/linux/perf_event.h              |    4 +++-
 kernel/events/core.c                    |   21 +++++++++++++++------
 kernel/events/hw_breakpoint.c           |   10 +++++++---
 kernel/watchdog.c                       |    2 +-
 samples/hw_breakpoint/data_breakpoint.c |    2 +-
 12 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 9726006..4911c94 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -479,7 +479,8 @@ static struct perf_event *ptrace_hbp_create(struct task_struct *tsk, int type)
 	attr.bp_type	= type;
 	attr.disabled	= 1;
 
-	return register_user_hw_breakpoint(&attr, ptrace_hbptriggered, tsk);
+	return register_user_hw_breakpoint(&attr, ptrace_hbptriggered, NULL,
+					   tsk);
 }
 
 static int ptrace_gethbpregs(struct task_struct *tsk, long num,
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index cb22024..5249308 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -973,7 +973,7 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
 								&attr.bp_type);
 
 	thread->ptrace_bps[0] = bp = register_user_hw_breakpoint(&attr,
-							ptrace_triggered, task);
+					       ptrace_triggered, NULL, task);
 	if (IS_ERR(bp)) {
 		thread->ptrace_bps[0] = NULL;
 		ptrace_put_breakpoints(task);
diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
index 3d7b209..930312f 100644
--- a/arch/sh/kernel/ptrace_32.c
+++ b/arch/sh/kernel/ptrace_32.c
@@ -91,7 +91,8 @@ static int set_single_step(struct task_struct *tsk, unsigned long addr)
 		attr.bp_len = HW_BREAKPOINT_LEN_2;
 		attr.bp_type = HW_BREAKPOINT_R;
 
-		bp = register_user_hw_breakpoint(&attr, ptrace_triggered, tsk);
+		bp = register_user_hw_breakpoint(&attr, ptrace_triggered,
+						 NULL, tsk);
 		if (IS_ERR(bp))
 			return PTR_ERR(bp);
 
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 5f9ecff..473ab53 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -638,7 +638,7 @@ void kgdb_arch_late(void)
 	for (i = 0; i < HBP_NUM; i++) {
 		if (breakinfo[i].pev)
 			continue;
-		breakinfo[i].pev = register_wide_hw_breakpoint(&attr, NULL);
+		breakinfo[i].pev = register_wide_hw_breakpoint(&attr, NULL, NULL);
 		if (IS_ERR((void * __force)breakinfo[i].pev)) {
 			printk(KERN_ERR "kgdb: Could not allocate hw"
 			       "breakpoints\nDisabling the kernel debugger\n");
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 807c2a2..28092ae 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -715,7 +715,8 @@ static int ptrace_set_breakpoint_addr(struct task_struct *tsk, int nr,
 		attr.bp_type = HW_BREAKPOINT_W;
 		attr.disabled = 1;
 
-		bp = register_user_hw_breakpoint(&attr, ptrace_triggered, tsk);
+		bp = register_user_hw_breakpoint(&attr, ptrace_triggered,
+						 NULL, tsk);
 
 		/*
 		 * CHECKME: the previous code returned -EIO if the addr wasn't
diff --git a/drivers/oprofile/oprofile_perf.c b/drivers/oprofile/oprofile_perf.c
index 9046f7b..59acf9e 100644
--- a/drivers/oprofile/oprofile_perf.c
+++ b/drivers/oprofile/oprofile_perf.c
@@ -79,7 +79,7 @@ static int op_create_counter(int cpu, int event)
 
 	pevent = perf_event_create_kernel_counter(&counter_config[event].attr,
 						  cpu, NULL,
-						  op_overflow_handler);
+						  op_overflow_handler, NULL);
 
 	if (IS_ERR(pevent))
 		return PTR_ERR(pevent);
diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index d1e55fe..6ae9c63 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -73,6 +73,7 @@ static inline unsigned long hw_breakpoint_len(struct perf_event *bp)
 extern struct perf_event *
 register_user_hw_breakpoint(struct perf_event_attr *attr,
 			    perf_overflow_handler_t triggered,
+			    void *context,
 			    struct task_struct *tsk);
 
 /* FIXME: only change from the attr, and don't unregister */
@@ -85,11 +86,13 @@ modify_user_hw_breakpoint(struct perf_event *bp, struct perf_event_attr *attr);
 extern struct perf_event *
 register_wide_hw_breakpoint_cpu(struct perf_event_attr *attr,
 				perf_overflow_handler_t	triggered,
+				void *context,
 				int cpu);
 
 extern struct perf_event * __percpu *
 register_wide_hw_breakpoint(struct perf_event_attr *attr,
-			    perf_overflow_handler_t triggered);
+			    perf_overflow_handler_t triggered,
+			    void *context);
 
 extern int register_perf_hw_breakpoint(struct perf_event *bp);
 extern int __register_perf_hw_breakpoint(struct perf_event *bp);
@@ -115,6 +118,7 @@ static inline int __init init_hw_breakpoint(void) { return 0; }
 static inline struct perf_event *
 register_user_hw_breakpoint(struct perf_event_attr *attr,
 			    perf_overflow_handler_t triggered,
+			    void *context,
 			    struct task_struct *tsk)	{ return NULL; }
 static inline int
 modify_user_hw_breakpoint(struct perf_event *bp,
@@ -122,10 +126,12 @@ modify_user_hw_breakpoint(struct perf_event *bp,
 static inline struct perf_event *
 register_wide_hw_breakpoint_cpu(struct perf_event_attr *attr,
 				perf_overflow_handler_t	 triggered,
+				void *context,
 				int cpu)		{ return NULL; }
 static inline struct perf_event * __percpu *
 register_wide_hw_breakpoint(struct perf_event_attr *attr,
-			    perf_overflow_handler_t triggered)	{ return NULL; }
+			    perf_overflow_handler_t triggered,
+			    void *context)		{ return NULL; }
 static inline int
 register_perf_hw_breakpoint(struct perf_event *bp)	{ return -ENOSYS; }
 static inline int
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e0786e3..40264b5 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -855,6 +855,7 @@ struct perf_event {
 	u64				id;
 
 	perf_overflow_handler_t		overflow_handler;
+	void				*overflow_handler_context;
 
 #ifdef CONFIG_EVENT_TRACING
 	struct ftrace_event_call	*tp_event;
@@ -978,7 +979,8 @@ extern struct perf_event *
 perf_event_create_kernel_counter(struct perf_event_attr *attr,
 				int cpu,
 				struct task_struct *task,
-				perf_overflow_handler_t callback);
+				perf_overflow_handler_t callback,
+				void *context);
 extern u64 perf_event_read_value(struct perf_event *event,
 				 u64 *enabled, u64 *running);
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9efe710..6dd4819 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6150,7 +6150,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		 struct task_struct *task,
 		 struct perf_event *group_leader,
 		 struct perf_event *parent_event,
-		 perf_overflow_handler_t overflow_handler)
+		 perf_overflow_handler_t overflow_handler,
+		 void *context)
 {
 	struct pmu *pmu;
 	struct perf_event *event;
@@ -6208,10 +6209,13 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 #endif
 	}
 
-	if (!overflow_handler && parent_event)
+	if (!overflow_handler && parent_event) {
 		overflow_handler = parent_event->overflow_handler;
+		context = parent_event->overflow_handler_context;
+	}
 
 	event->overflow_handler	= overflow_handler;
+	event->overflow_handler_context = context;
 
 	if (attr->disabled)
 		event->state = PERF_EVENT_STATE_OFF;
@@ -6478,7 +6482,8 @@ SYSCALL_DEFINE5(perf_event_open,
 		}
 	}
 
-	event = perf_event_alloc(&attr, cpu, task, group_leader, NULL, NULL);
+	event = perf_event_alloc(&attr, cpu, task, group_leader, NULL,
+				 NULL, NULL);
 	if (IS_ERR(event)) {
 		err = PTR_ERR(event);
 		goto err_task;
@@ -6663,7 +6668,8 @@ err_fd:
 struct perf_event *
 perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 				 struct task_struct *task,
-				 perf_overflow_handler_t overflow_handler)
+				 perf_overflow_handler_t overflow_handler,
+				 void *context)
 {
 	struct perf_event_context *ctx;
 	struct perf_event *event;
@@ -6673,7 +6679,8 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 	 * Get the target context (task or percpu):
 	 */
 
-	event = perf_event_alloc(attr, cpu, task, NULL, NULL, overflow_handler);
+	event = perf_event_alloc(attr, cpu, task, NULL, NULL,
+				 overflow_handler, context);
 	if (IS_ERR(event)) {
 		err = PTR_ERR(event);
 		goto err;
@@ -6957,7 +6964,7 @@ inherit_event(struct perf_event *parent_event,
 					   parent_event->cpu,
 					   child,
 					   group_leader, parent_event,
-					   NULL);
+				           NULL, NULL);
 	if (IS_ERR(child_event))
 		return child_event;
 	get_ctx(child_ctx);
@@ -6984,6 +6991,8 @@ inherit_event(struct perf_event *parent_event,
 
 	child_event->ctx = child_ctx;
 	child_event->overflow_handler = parent_event->overflow_handler;
+	child_event->overflow_handler_context
+		= parent_event->overflow_handler_context;
 
 	/*
 	 * Precalculate sample_data sizes
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 086adf2..b7971d6 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -431,9 +431,11 @@ int register_perf_hw_breakpoint(struct perf_event *bp)
 struct perf_event *
 register_user_hw_breakpoint(struct perf_event_attr *attr,
 			    perf_overflow_handler_t triggered,
+			    void *context,
 			    struct task_struct *tsk)
 {
-	return perf_event_create_kernel_counter(attr, -1, tsk, triggered);
+	return perf_event_create_kernel_counter(attr, -1, tsk, triggered,
+						context);
 }
 EXPORT_SYMBOL_GPL(register_user_hw_breakpoint);
 
@@ -502,7 +504,8 @@ EXPORT_SYMBOL_GPL(unregister_hw_breakpoint);
  */
 struct perf_event * __percpu *
 register_wide_hw_breakpoint(struct perf_event_attr *attr,
-			    perf_overflow_handler_t triggered)
+			    perf_overflow_handler_t triggered,
+			    void *context)
 {
 	struct perf_event * __percpu *cpu_events, **pevent, *bp;
 	long err;
@@ -515,7 +518,8 @@ register_wide_hw_breakpoint(struct perf_event_attr *attr,
 	get_online_cpus();
 	for_each_online_cpu(cpu) {
 		pevent = per_cpu_ptr(cpu_events, cpu);
-		bp = perf_event_create_kernel_counter(attr, cpu, NULL, triggered);
+		bp = perf_event_create_kernel_counter(attr, cpu, NULL,
+						      triggered, context);
 
 		*pevent = bp;
 
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 3d0c56a..0aaa41f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -371,7 +371,7 @@ static int watchdog_nmi_enable(int cpu)
 	/* Try to register using hardware perf events */
 	wd_attr = &wd_hw_attr;
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
-	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback);
+	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
 	if (!IS_ERR(event)) {
 		printk(KERN_INFO "NMI watchdog enabled, takes one hw-pmu counter.\n");
 		goto out_save;
diff --git a/samples/hw_breakpoint/data_breakpoint.c b/samples/hw_breakpoint/data_breakpoint.c
index 0636539..2b85831 100644
--- a/samples/hw_breakpoint/data_breakpoint.c
+++ b/samples/hw_breakpoint/data_breakpoint.c
@@ -60,7 +60,7 @@ static int __init hw_break_module_init(void)
 	attr.bp_len = HW_BREAKPOINT_LEN_4;
 	attr.bp_type = HW_BREAKPOINT_W | HW_BREAKPOINT_R;
 
-	sample_hbp = register_wide_hw_breakpoint(&attr, sample_hbp_handler);
+	sample_hbp = register_wide_hw_breakpoint(&attr, sample_hbp_handler, NULL);
 	if (IS_ERR((void __force *)sample_hbp)) {
 		ret = PTR_ERR((void __force *)sample_hbp);
 		goto fail;
-- 
1.7.5.3


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/3] x86, perf: add constraints for architectural PMU v1
  2011-06-29 15:42 [PATCH 0/3] Preparatory perf patches for KVM PMU support Avi Kivity
  2011-06-29 15:42 ` [PATCH 1/3] perf: add context field to perf_event Avi Kivity
@ 2011-06-29 15:42 ` Avi Kivity
  2011-07-01 15:24   ` [tip:perf/core] x86, perf: Add constraints for architectural PMU tip-bot for Avi Kivity
  2011-06-29 15:42 ` [PATCH 3/3] perf: export perf_event_refresh() to modules Avi Kivity
  2 siblings, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2011-06-29 15:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Ingo Molnar, a.p.zijlstra, acme, Frederic Weisbecker

The v1 PMU does not have any fixed counters.  Using the v2 constraints,
which do have fixed counters, causes an additional choice to be present
in the weight calculation, but not when actually scheduling the event,
leading to an event being not scheduled at all.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |   23 ++++++++++++++++++-----
 1 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 41178c8..b46b70e 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -137,6 +137,11 @@ static struct event_constraint intel_westmere_percore_constraints[] __read_mostl
 	EVENT_CONSTRAINT_END
 };
 
+static struct event_constraint intel_v1_event_constraints[] __read_mostly =
+{
+	EVENT_CONSTRAINT_END
+};
+
 static struct event_constraint intel_gen_event_constraints[] __read_mostly =
 {
 	FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
@@ -1512,11 +1517,19 @@ static __init int intel_pmu_init(void)
 		break;
 
 	default:
-		/*
-		 * default constraints for v2 and up
-		 */
-		x86_pmu.event_constraints = intel_gen_event_constraints;
-		pr_cont("generic architected perfmon, ");
+		switch (x86_pmu.version) {
+		case 1:
+			x86_pmu.event_constraints = intel_v1_event_constraints;
+			pr_cont("generic architected perfmon v1, ");
+			break;
+		default:
+			/*
+			 * default constraints for v2 and up
+			 */
+			x86_pmu.event_constraints = intel_gen_event_constraints;
+			pr_cont("generic architected perfmon, ");
+			break;
+		}
 	}
 	return 0;
 }
-- 
1.7.5.3


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/3] perf: export perf_event_refresh() to modules
  2011-06-29 15:42 [PATCH 0/3] Preparatory perf patches for KVM PMU support Avi Kivity
  2011-06-29 15:42 ` [PATCH 1/3] perf: add context field to perf_event Avi Kivity
  2011-06-29 15:42 ` [PATCH 2/3] x86, perf: add constraints for architectural PMU v1 Avi Kivity
@ 2011-06-29 15:42 ` Avi Kivity
  2011-07-01 15:25   ` [tip:perf/core] " tip-bot for Avi Kivity
  2 siblings, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2011-06-29 15:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Ingo Molnar, a.p.zijlstra, acme, Frederic Weisbecker

KVM needs one-shot samples, since a PMC programmed to -X will fire after X
events and then again after 2^40 events (i.e. variable period).

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 include/linux/perf_event.h |    5 +++++
 kernel/events/core.c       |    3 ++-
 2 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 40264b5..91342ac 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -973,6 +973,7 @@ extern void perf_pmu_disable(struct pmu *pmu);
 extern void perf_pmu_enable(struct pmu *pmu);
 extern int perf_event_task_disable(void);
 extern int perf_event_task_enable(void);
+extern int perf_event_refresh(struct perf_event *event, int refresh);
 extern void perf_event_update_userpage(struct perf_event *event);
 extern int perf_event_release_kernel(struct perf_event *event);
 extern struct perf_event *
@@ -1168,6 +1169,10 @@ static inline void perf_event_delayed_put(struct task_struct *task)	{ }
 static inline void perf_event_print_debug(void)				{ }
 static inline int perf_event_task_disable(void)				{ return -EINVAL; }
 static inline int perf_event_task_enable(void)				{ return -EINVAL; }
+static inline int perf_event_refresh(struct perf_event *event, int refresh)
+{
+	return -EINVAL;
+}
 
 static inline void
 perf_sw_event(u32 event_id, u64 nr, int nmi,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6dd4819..f69cc9f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1739,7 +1739,7 @@ out:
 	raw_spin_unlock_irq(&ctx->lock);
 }
 
-static int perf_event_refresh(struct perf_event *event, int refresh)
+int perf_event_refresh(struct perf_event *event, int refresh)
 {
 	/*
 	 * not supported on inherited events
@@ -1752,6 +1752,7 @@ static int perf_event_refresh(struct perf_event *event, int refresh)
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(perf_event_refresh);
 
 static void ctx_sched_out(struct perf_event_context *ctx,
 			  struct perf_cpu_context *cpuctx,
-- 
1.7.5.3


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-06-29 15:42 ` [PATCH 1/3] perf: add context field to perf_event Avi Kivity
@ 2011-06-29 16:08   ` Frederic Weisbecker
  2011-06-29 16:25     ` Avi Kivity
  2011-06-29 16:27     ` Will Deacon
  2011-07-01 15:24   ` [tip:perf/core] perf: Add " tip-bot for Avi Kivity
  1 sibling, 2 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2011-06-29 16:08 UTC (permalink / raw)
  To: Avi Kivity
  Cc: linux-kernel, kvm, Ingo Molnar, a.p.zijlstra, acme, Will Deacon,
	Jason Wessel

On Wed, Jun 29, 2011 at 06:42:35PM +0300, Avi Kivity wrote:
> The perf_event overflow handler does not receive any caller-derived
> argument, so many callers need to resort to looking up the perf_event
> in their local data structure.  This is ugly and doesn't scale if a
> single callback services many perf_events.
> 
> Fix by adding a context parameter to perf_event_create_kernel_counter()
> (and derived hardware breakpoints APIs) and storing it in the perf_event.
> The field can be accessed from the callback as event->overflow_handler_context.
> All callers are updated.
> 
> Signed-off-by: Avi Kivity <avi@redhat.com>

I believe it can micro-optimize ptrace through register_user_hw_breakpoint() because
we could store the index of the breakpoint that way, instead of iterating through 4 slots.

Perhaps it can help in arm too, adding Will in Cc.

But for register_wide_hw_breakpoint, I'm not sure. kgdb is the main user, may be Jason
could find some use of it.

> ---
>  arch/arm/kernel/ptrace.c                |    3 ++-
>  arch/powerpc/kernel/ptrace.c            |    2 +-
>  arch/sh/kernel/ptrace_32.c              |    3 ++-
>  arch/x86/kernel/kgdb.c                  |    2 +-
>  arch/x86/kernel/ptrace.c                |    3 ++-
>  drivers/oprofile/oprofile_perf.c        |    2 +-
>  include/linux/hw_breakpoint.h           |   10 ++++++++--
>  include/linux/perf_event.h              |    4 +++-
>  kernel/events/core.c                    |   21 +++++++++++++++------
>  kernel/events/hw_breakpoint.c           |   10 +++++++---
>  kernel/watchdog.c                       |    2 +-
>  samples/hw_breakpoint/data_breakpoint.c |    2 +-
>  12 files changed, 44 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
> index 9726006..4911c94 100644
> --- a/arch/arm/kernel/ptrace.c
> +++ b/arch/arm/kernel/ptrace.c
> @@ -479,7 +479,8 @@ static struct perf_event *ptrace_hbp_create(struct task_struct *tsk, int type)
>  	attr.bp_type	= type;
>  	attr.disabled	= 1;
>  
> -	return register_user_hw_breakpoint(&attr, ptrace_hbptriggered, tsk);
> +	return register_user_hw_breakpoint(&attr, ptrace_hbptriggered, NULL,
> +					   tsk);
>  }
>  
>  static int ptrace_gethbpregs(struct task_struct *tsk, long num,
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index cb22024..5249308 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -973,7 +973,7 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
>  								&attr.bp_type);
>  
>  	thread->ptrace_bps[0] = bp = register_user_hw_breakpoint(&attr,
> -							ptrace_triggered, task);
> +					       ptrace_triggered, NULL, task);
>  	if (IS_ERR(bp)) {
>  		thread->ptrace_bps[0] = NULL;
>  		ptrace_put_breakpoints(task);
> diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
> index 3d7b209..930312f 100644
> --- a/arch/sh/kernel/ptrace_32.c
> +++ b/arch/sh/kernel/ptrace_32.c
> @@ -91,7 +91,8 @@ static int set_single_step(struct task_struct *tsk, unsigned long addr)
>  		attr.bp_len = HW_BREAKPOINT_LEN_2;
>  		attr.bp_type = HW_BREAKPOINT_R;
>  
> -		bp = register_user_hw_breakpoint(&attr, ptrace_triggered, tsk);
> +		bp = register_user_hw_breakpoint(&attr, ptrace_triggered,
> +						 NULL, tsk);
>  		if (IS_ERR(bp))
>  			return PTR_ERR(bp);
>  
> diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
> index 5f9ecff..473ab53 100644
> --- a/arch/x86/kernel/kgdb.c
> +++ b/arch/x86/kernel/kgdb.c
> @@ -638,7 +638,7 @@ void kgdb_arch_late(void)
>  	for (i = 0; i < HBP_NUM; i++) {
>  		if (breakinfo[i].pev)
>  			continue;
> -		breakinfo[i].pev = register_wide_hw_breakpoint(&attr, NULL);
> +		breakinfo[i].pev = register_wide_hw_breakpoint(&attr, NULL, NULL);
>  		if (IS_ERR((void * __force)breakinfo[i].pev)) {
>  			printk(KERN_ERR "kgdb: Could not allocate hw"
>  			       "breakpoints\nDisabling the kernel debugger\n");
> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
> index 807c2a2..28092ae 100644
> --- a/arch/x86/kernel/ptrace.c
> +++ b/arch/x86/kernel/ptrace.c
> @@ -715,7 +715,8 @@ static int ptrace_set_breakpoint_addr(struct task_struct *tsk, int nr,
>  		attr.bp_type = HW_BREAKPOINT_W;
>  		attr.disabled = 1;
>  
> -		bp = register_user_hw_breakpoint(&attr, ptrace_triggered, tsk);
> +		bp = register_user_hw_breakpoint(&attr, ptrace_triggered,
> +						 NULL, tsk);
>  
>  		/*
>  		 * CHECKME: the previous code returned -EIO if the addr wasn't
> diff --git a/drivers/oprofile/oprofile_perf.c b/drivers/oprofile/oprofile_perf.c
> index 9046f7b..59acf9e 100644
> --- a/drivers/oprofile/oprofile_perf.c
> +++ b/drivers/oprofile/oprofile_perf.c
> @@ -79,7 +79,7 @@ static int op_create_counter(int cpu, int event)
>  
>  	pevent = perf_event_create_kernel_counter(&counter_config[event].attr,
>  						  cpu, NULL,
> -						  op_overflow_handler);
> +						  op_overflow_handler, NULL);
>  
>  	if (IS_ERR(pevent))
>  		return PTR_ERR(pevent);
> diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
> index d1e55fe..6ae9c63 100644
> --- a/include/linux/hw_breakpoint.h
> +++ b/include/linux/hw_breakpoint.h
> @@ -73,6 +73,7 @@ static inline unsigned long hw_breakpoint_len(struct perf_event *bp)
>  extern struct perf_event *
>  register_user_hw_breakpoint(struct perf_event_attr *attr,
>  			    perf_overflow_handler_t triggered,
> +			    void *context,
>  			    struct task_struct *tsk);
>  
>  /* FIXME: only change from the attr, and don't unregister */
> @@ -85,11 +86,13 @@ modify_user_hw_breakpoint(struct perf_event *bp, struct perf_event_attr *attr);
>  extern struct perf_event *
>  register_wide_hw_breakpoint_cpu(struct perf_event_attr *attr,
>  				perf_overflow_handler_t	triggered,
> +				void *context,
>  				int cpu);
>  
>  extern struct perf_event * __percpu *
>  register_wide_hw_breakpoint(struct perf_event_attr *attr,
> -			    perf_overflow_handler_t triggered);
> +			    perf_overflow_handler_t triggered,
> +			    void *context);
>  
>  extern int register_perf_hw_breakpoint(struct perf_event *bp);
>  extern int __register_perf_hw_breakpoint(struct perf_event *bp);
> @@ -115,6 +118,7 @@ static inline int __init init_hw_breakpoint(void) { return 0; }
>  static inline struct perf_event *
>  register_user_hw_breakpoint(struct perf_event_attr *attr,
>  			    perf_overflow_handler_t triggered,
> +			    void *context,
>  			    struct task_struct *tsk)	{ return NULL; }
>  static inline int
>  modify_user_hw_breakpoint(struct perf_event *bp,
> @@ -122,10 +126,12 @@ modify_user_hw_breakpoint(struct perf_event *bp,
>  static inline struct perf_event *
>  register_wide_hw_breakpoint_cpu(struct perf_event_attr *attr,
>  				perf_overflow_handler_t	 triggered,
> +				void *context,
>  				int cpu)		{ return NULL; }
>  static inline struct perf_event * __percpu *
>  register_wide_hw_breakpoint(struct perf_event_attr *attr,
> -			    perf_overflow_handler_t triggered)	{ return NULL; }
> +			    perf_overflow_handler_t triggered,
> +			    void *context)		{ return NULL; }
>  static inline int
>  register_perf_hw_breakpoint(struct perf_event *bp)	{ return -ENOSYS; }
>  static inline int
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index e0786e3..40264b5 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -855,6 +855,7 @@ struct perf_event {
>  	u64				id;
>  
>  	perf_overflow_handler_t		overflow_handler;
> +	void				*overflow_handler_context;
>  
>  #ifdef CONFIG_EVENT_TRACING
>  	struct ftrace_event_call	*tp_event;
> @@ -978,7 +979,8 @@ extern struct perf_event *
>  perf_event_create_kernel_counter(struct perf_event_attr *attr,
>  				int cpu,
>  				struct task_struct *task,
> -				perf_overflow_handler_t callback);
> +				perf_overflow_handler_t callback,
> +				void *context);
>  extern u64 perf_event_read_value(struct perf_event *event,
>  				 u64 *enabled, u64 *running);
>  
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 9efe710..6dd4819 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6150,7 +6150,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
>  		 struct task_struct *task,
>  		 struct perf_event *group_leader,
>  		 struct perf_event *parent_event,
> -		 perf_overflow_handler_t overflow_handler)
> +		 perf_overflow_handler_t overflow_handler,
> +		 void *context)
>  {
>  	struct pmu *pmu;
>  	struct perf_event *event;
> @@ -6208,10 +6209,13 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
>  #endif
>  	}
>  
> -	if (!overflow_handler && parent_event)
> +	if (!overflow_handler && parent_event) {
>  		overflow_handler = parent_event->overflow_handler;
> +		context = parent_event->overflow_handler_context;
> +	}
>  
>  	event->overflow_handler	= overflow_handler;
> +	event->overflow_handler_context = context;
>  
>  	if (attr->disabled)
>  		event->state = PERF_EVENT_STATE_OFF;
> @@ -6478,7 +6482,8 @@ SYSCALL_DEFINE5(perf_event_open,
>  		}
>  	}
>  
> -	event = perf_event_alloc(&attr, cpu, task, group_leader, NULL, NULL);
> +	event = perf_event_alloc(&attr, cpu, task, group_leader, NULL,
> +				 NULL, NULL);
>  	if (IS_ERR(event)) {
>  		err = PTR_ERR(event);
>  		goto err_task;
> @@ -6663,7 +6668,8 @@ err_fd:
>  struct perf_event *
>  perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
>  				 struct task_struct *task,
> -				 perf_overflow_handler_t overflow_handler)
> +				 perf_overflow_handler_t overflow_handler,
> +				 void *context)
>  {
>  	struct perf_event_context *ctx;
>  	struct perf_event *event;
> @@ -6673,7 +6679,8 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
>  	 * Get the target context (task or percpu):
>  	 */
>  
> -	event = perf_event_alloc(attr, cpu, task, NULL, NULL, overflow_handler);
> +	event = perf_event_alloc(attr, cpu, task, NULL, NULL,
> +				 overflow_handler, context);
>  	if (IS_ERR(event)) {
>  		err = PTR_ERR(event);
>  		goto err;
> @@ -6957,7 +6964,7 @@ inherit_event(struct perf_event *parent_event,
>  					   parent_event->cpu,
>  					   child,
>  					   group_leader, parent_event,
> -					   NULL);
> +				           NULL, NULL);
>  	if (IS_ERR(child_event))
>  		return child_event;
>  	get_ctx(child_ctx);
> @@ -6984,6 +6991,8 @@ inherit_event(struct perf_event *parent_event,
>  
>  	child_event->ctx = child_ctx;
>  	child_event->overflow_handler = parent_event->overflow_handler;
> +	child_event->overflow_handler_context
> +		= parent_event->overflow_handler_context;
>  
>  	/*
>  	 * Precalculate sample_data sizes
> diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
> index 086adf2..b7971d6 100644
> --- a/kernel/events/hw_breakpoint.c
> +++ b/kernel/events/hw_breakpoint.c
> @@ -431,9 +431,11 @@ int register_perf_hw_breakpoint(struct perf_event *bp)
>  struct perf_event *
>  register_user_hw_breakpoint(struct perf_event_attr *attr,
>  			    perf_overflow_handler_t triggered,
> +			    void *context,
>  			    struct task_struct *tsk)
>  {
> -	return perf_event_create_kernel_counter(attr, -1, tsk, triggered);
> +	return perf_event_create_kernel_counter(attr, -1, tsk, triggered,
> +						context);
>  }
>  EXPORT_SYMBOL_GPL(register_user_hw_breakpoint);
>  
> @@ -502,7 +504,8 @@ EXPORT_SYMBOL_GPL(unregister_hw_breakpoint);
>   */
>  struct perf_event * __percpu *
>  register_wide_hw_breakpoint(struct perf_event_attr *attr,
> -			    perf_overflow_handler_t triggered)
> +			    perf_overflow_handler_t triggered,
> +			    void *context)
>  {
>  	struct perf_event * __percpu *cpu_events, **pevent, *bp;
>  	long err;
> @@ -515,7 +518,8 @@ register_wide_hw_breakpoint(struct perf_event_attr *attr,
>  	get_online_cpus();
>  	for_each_online_cpu(cpu) {
>  		pevent = per_cpu_ptr(cpu_events, cpu);
> -		bp = perf_event_create_kernel_counter(attr, cpu, NULL, triggered);
> +		bp = perf_event_create_kernel_counter(attr, cpu, NULL,
> +						      triggered, context);
>  
>  		*pevent = bp;
>  
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 3d0c56a..0aaa41f 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -371,7 +371,7 @@ static int watchdog_nmi_enable(int cpu)
>  	/* Try to register using hardware perf events */
>  	wd_attr = &wd_hw_attr;
>  	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
> -	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback);
> +	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
>  	if (!IS_ERR(event)) {
>  		printk(KERN_INFO "NMI watchdog enabled, takes one hw-pmu counter.\n");
>  		goto out_save;
> diff --git a/samples/hw_breakpoint/data_breakpoint.c b/samples/hw_breakpoint/data_breakpoint.c
> index 0636539..2b85831 100644
> --- a/samples/hw_breakpoint/data_breakpoint.c
> +++ b/samples/hw_breakpoint/data_breakpoint.c
> @@ -60,7 +60,7 @@ static int __init hw_break_module_init(void)
>  	attr.bp_len = HW_BREAKPOINT_LEN_4;
>  	attr.bp_type = HW_BREAKPOINT_W | HW_BREAKPOINT_R;
>  
> -	sample_hbp = register_wide_hw_breakpoint(&attr, sample_hbp_handler);
> +	sample_hbp = register_wide_hw_breakpoint(&attr, sample_hbp_handler, NULL);
>  	if (IS_ERR((void __force *)sample_hbp)) {
>  		ret = PTR_ERR((void __force *)sample_hbp);
>  		goto fail;
> -- 
> 1.7.5.3
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-06-29 16:08   ` Frederic Weisbecker
@ 2011-06-29 16:25     ` Avi Kivity
  2011-06-29 16:27     ` Will Deacon
  1 sibling, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2011-06-29 16:25 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, kvm, Ingo Molnar, a.p.zijlstra, acme, Will Deacon,
	Jason Wessel

On 06/29/2011 07:08 PM, Frederic Weisbecker wrote:
> On Wed, Jun 29, 2011 at 06:42:35PM +0300, Avi Kivity wrote:
> >  The perf_event overflow handler does not receive any caller-derived
> >  argument, so many callers need to resort to looking up the perf_event
> >  in their local data structure.  This is ugly and doesn't scale if a
> >  single callback services many perf_events.
> >
> >  Fix by adding a context parameter to perf_event_create_kernel_counter()
> >  (and derived hardware breakpoints APIs) and storing it in the perf_event.
> >  The field can be accessed from the callback as event->overflow_handler_context.
> >  All callers are updated.
> >
> >  Signed-off-by: Avi Kivity<avi@redhat.com>
>
> I believe it can micro-optimize ptrace through register_user_hw_breakpoint() because
> we could store the index of the breakpoint that way, instead of iterating through 4 slots.
>

Right, I noticed that while writing the patch.

> Perhaps it can help in arm too, adding Will in Cc.
>
> But for register_wide_hw_breakpoint, I'm not sure. kgdb is the main user, may be Jason
> could find some use of it.

I think an API should not require its users to iterate in their 
callbacks, even if it doesn't affect current users for some reason.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-06-29 16:08   ` Frederic Weisbecker
  2011-06-29 16:25     ` Avi Kivity
@ 2011-06-29 16:27     ` Will Deacon
  2011-07-04 13:58       ` Frederic Weisbecker
  1 sibling, 1 reply; 39+ messages in thread
From: Will Deacon @ 2011-06-29 16:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, linux-kernel, kvm, Ingo Molnar, a.p.zijlstra, acme,
	Jason Wessel

Hi Frederic,

Thanks for including me on CC.

On Wed, Jun 29, 2011 at 05:08:45PM +0100, Frederic Weisbecker wrote:
> On Wed, Jun 29, 2011 at 06:42:35PM +0300, Avi Kivity wrote:
> > The perf_event overflow handler does not receive any caller-derived
> > argument, so many callers need to resort to looking up the perf_event
> > in their local data structure.  This is ugly and doesn't scale if a
> > single callback services many perf_events.
> >
> > Fix by adding a context parameter to perf_event_create_kernel_counter()
> > (and derived hardware breakpoints APIs) and storing it in the perf_event.
> > The field can be accessed from the callback as event->overflow_handler_context.
> > All callers are updated.
> >
> > Signed-off-by: Avi Kivity <avi@redhat.com>
> 
> I believe it can micro-optimize ptrace through register_user_hw_breakpoint() because
> we could store the index of the breakpoint that way, instead of iterating through 4 slots.
> 
> Perhaps it can help in arm too, adding Will in Cc.

Yes, we could store the breakpoint index in there and it would save us
walking over the breakpoints when one fires. Not sure this helps us for
anything else though. My main gripe with the ptrace interface to
hw_breakpoints is that we have to convert all the breakpoint information
from ARM_BREAKPOINT_* to HW_BREAKPOINT_* and then convert it all back again
in the hw_breakpoint code. Yuck!

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [tip:perf/core] perf: Add context field to perf_event
  2011-06-29 15:42 ` [PATCH 1/3] perf: add context field to perf_event Avi Kivity
  2011-06-29 16:08   ` Frederic Weisbecker
@ 2011-07-01 15:24   ` tip-bot for Avi Kivity
  1 sibling, 0 replies; 39+ messages in thread
From: tip-bot for Avi Kivity @ 2011-07-01 15:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, tglx, mingo, avi

Commit-ID:  4dc0da86967d5463708631d02a70cfed5b104884
Gitweb:     http://git.kernel.org/tip/4dc0da86967d5463708631d02a70cfed5b104884
Author:     Avi Kivity <avi@redhat.com>
AuthorDate: Wed, 29 Jun 2011 18:42:35 +0300
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 1 Jul 2011 11:06:38 +0200

perf: Add context field to perf_event

The perf_event overflow handler does not receive any caller-derived
argument, so many callers need to resort to looking up the perf_event
in their local data structure.  This is ugly and doesn't scale if a
single callback services many perf_events.

Fix by adding a context parameter to perf_event_create_kernel_counter()
(and derived hardware breakpoints APIs) and storing it in the perf_event.
The field can be accessed from the callback as event->overflow_handler_context.
All callers are updated.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/arm/kernel/ptrace.c                |    3 ++-
 arch/powerpc/kernel/ptrace.c            |    2 +-
 arch/sh/kernel/ptrace_32.c              |    3 ++-
 arch/x86/kernel/kgdb.c                  |    2 +-
 arch/x86/kernel/ptrace.c                |    3 ++-
 drivers/oprofile/oprofile_perf.c        |    2 +-
 include/linux/hw_breakpoint.h           |   10 ++++++++--
 include/linux/perf_event.h              |    4 +++-
 kernel/events/core.c                    |   21 +++++++++++++++------
 kernel/events/hw_breakpoint.c           |   10 +++++++---
 kernel/watchdog.c                       |    2 +-
 samples/hw_breakpoint/data_breakpoint.c |    2 +-
 12 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 0c9b105..5c19961 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -479,7 +479,8 @@ static struct perf_event *ptrace_hbp_create(struct task_struct *tsk, int type)
 	attr.bp_type	= type;
 	attr.disabled	= 1;
 
-	return register_user_hw_breakpoint(&attr, ptrace_hbptriggered, tsk);
+	return register_user_hw_breakpoint(&attr, ptrace_hbptriggered, NULL,
+					   tsk);
 }
 
 static int ptrace_gethbpregs(struct task_struct *tsk, long num,
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 3177617..05b7dd2 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -973,7 +973,7 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
 								&attr.bp_type);
 
 	thread->ptrace_bps[0] = bp = register_user_hw_breakpoint(&attr,
-							ptrace_triggered, task);
+					       ptrace_triggered, NULL, task);
 	if (IS_ERR(bp)) {
 		thread->ptrace_bps[0] = NULL;
 		ptrace_put_breakpoints(task);
diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
index 8051976..92b3c27 100644
--- a/arch/sh/kernel/ptrace_32.c
+++ b/arch/sh/kernel/ptrace_32.c
@@ -91,7 +91,8 @@ static int set_single_step(struct task_struct *tsk, unsigned long addr)
 		attr.bp_len = HW_BREAKPOINT_LEN_2;
 		attr.bp_type = HW_BREAKPOINT_R;
 
-		bp = register_user_hw_breakpoint(&attr, ptrace_triggered, tsk);
+		bp = register_user_hw_breakpoint(&attr, ptrace_triggered,
+						 NULL, tsk);
 		if (IS_ERR(bp))
 			return PTR_ERR(bp);
 
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 98da6a7..00354d4 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -638,7 +638,7 @@ void kgdb_arch_late(void)
 	for (i = 0; i < HBP_NUM; i++) {
 		if (breakinfo[i].pev)
 			continue;
-		breakinfo[i].pev = register_wide_hw_breakpoint(&attr, NULL);
+		breakinfo[i].pev = register_wide_hw_breakpoint(&attr, NULL, NULL);
 		if (IS_ERR((void * __force)breakinfo[i].pev)) {
 			printk(KERN_ERR "kgdb: Could not allocate hw"
 			       "breakpoints\nDisabling the kernel debugger\n");
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 11db2e9..8252879 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -715,7 +715,8 @@ static int ptrace_set_breakpoint_addr(struct task_struct *tsk, int nr,
 		attr.bp_type = HW_BREAKPOINT_W;
 		attr.disabled = 1;
 
-		bp = register_user_hw_breakpoint(&attr, ptrace_triggered, tsk);
+		bp = register_user_hw_breakpoint(&attr, ptrace_triggered,
+						 NULL, tsk);
 
 		/*
 		 * CHECKME: the previous code returned -EIO if the addr wasn't
diff --git a/drivers/oprofile/oprofile_perf.c b/drivers/oprofile/oprofile_perf.c
index 9046f7b..59acf9e 100644
--- a/drivers/oprofile/oprofile_perf.c
+++ b/drivers/oprofile/oprofile_perf.c
@@ -79,7 +79,7 @@ static int op_create_counter(int cpu, int event)
 
 	pevent = perf_event_create_kernel_counter(&counter_config[event].attr,
 						  cpu, NULL,
-						  op_overflow_handler);
+						  op_overflow_handler, NULL);
 
 	if (IS_ERR(pevent))
 		return PTR_ERR(pevent);
diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index d1e55fe..6ae9c63 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -73,6 +73,7 @@ static inline unsigned long hw_breakpoint_len(struct perf_event *bp)
 extern struct perf_event *
 register_user_hw_breakpoint(struct perf_event_attr *attr,
 			    perf_overflow_handler_t triggered,
+			    void *context,
 			    struct task_struct *tsk);
 
 /* FIXME: only change from the attr, and don't unregister */
@@ -85,11 +86,13 @@ modify_user_hw_breakpoint(struct perf_event *bp, struct perf_event_attr *attr);
 extern struct perf_event *
 register_wide_hw_breakpoint_cpu(struct perf_event_attr *attr,
 				perf_overflow_handler_t	triggered,
+				void *context,
 				int cpu);
 
 extern struct perf_event * __percpu *
 register_wide_hw_breakpoint(struct perf_event_attr *attr,
-			    perf_overflow_handler_t triggered);
+			    perf_overflow_handler_t triggered,
+			    void *context);
 
 extern int register_perf_hw_breakpoint(struct perf_event *bp);
 extern int __register_perf_hw_breakpoint(struct perf_event *bp);
@@ -115,6 +118,7 @@ static inline int __init init_hw_breakpoint(void) { return 0; }
 static inline struct perf_event *
 register_user_hw_breakpoint(struct perf_event_attr *attr,
 			    perf_overflow_handler_t triggered,
+			    void *context,
 			    struct task_struct *tsk)	{ return NULL; }
 static inline int
 modify_user_hw_breakpoint(struct perf_event *bp,
@@ -122,10 +126,12 @@ modify_user_hw_breakpoint(struct perf_event *bp,
 static inline struct perf_event *
 register_wide_hw_breakpoint_cpu(struct perf_event_attr *attr,
 				perf_overflow_handler_t	 triggered,
+				void *context,
 				int cpu)		{ return NULL; }
 static inline struct perf_event * __percpu *
 register_wide_hw_breakpoint(struct perf_event_attr *attr,
-			    perf_overflow_handler_t triggered)	{ return NULL; }
+			    perf_overflow_handler_t triggered,
+			    void *context)		{ return NULL; }
 static inline int
 register_perf_hw_breakpoint(struct perf_event *bp)	{ return -ENOSYS; }
 static inline int
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a5f54b9..2a08cac 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -839,6 +839,7 @@ struct perf_event {
 	u64				id;
 
 	perf_overflow_handler_t		overflow_handler;
+	void				*overflow_handler_context;
 
 #ifdef CONFIG_EVENT_TRACING
 	struct ftrace_event_call	*tp_event;
@@ -960,7 +961,8 @@ extern struct perf_event *
 perf_event_create_kernel_counter(struct perf_event_attr *attr,
 				int cpu,
 				struct task_struct *task,
-				perf_overflow_handler_t callback);
+				perf_overflow_handler_t callback,
+				void *context);
 extern u64 perf_event_read_value(struct perf_event *event,
 				 u64 *enabled, u64 *running);
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 81de28d..ba8e0f4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5745,7 +5745,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		 struct task_struct *task,
 		 struct perf_event *group_leader,
 		 struct perf_event *parent_event,
-		 perf_overflow_handler_t overflow_handler)
+		 perf_overflow_handler_t overflow_handler,
+		 void *context)
 {
 	struct pmu *pmu;
 	struct perf_event *event;
@@ -5803,10 +5804,13 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 #endif
 	}
 
-	if (!overflow_handler && parent_event)
+	if (!overflow_handler && parent_event) {
 		overflow_handler = parent_event->overflow_handler;
+		context = parent_event->overflow_handler_context;
+	}
 
 	event->overflow_handler	= overflow_handler;
+	event->overflow_handler_context = context;
 
 	if (attr->disabled)
 		event->state = PERF_EVENT_STATE_OFF;
@@ -6073,7 +6077,8 @@ SYSCALL_DEFINE5(perf_event_open,
 		}
 	}
 
-	event = perf_event_alloc(&attr, cpu, task, group_leader, NULL, NULL);
+	event = perf_event_alloc(&attr, cpu, task, group_leader, NULL,
+				 NULL, NULL);
 	if (IS_ERR(event)) {
 		err = PTR_ERR(event);
 		goto err_task;
@@ -6258,7 +6263,8 @@ err_fd:
 struct perf_event *
 perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 				 struct task_struct *task,
-				 perf_overflow_handler_t overflow_handler)
+				 perf_overflow_handler_t overflow_handler,
+				 void *context)
 {
 	struct perf_event_context *ctx;
 	struct perf_event *event;
@@ -6268,7 +6274,8 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 	 * Get the target context (task or percpu):
 	 */
 
-	event = perf_event_alloc(attr, cpu, task, NULL, NULL, overflow_handler);
+	event = perf_event_alloc(attr, cpu, task, NULL, NULL,
+				 overflow_handler, context);
 	if (IS_ERR(event)) {
 		err = PTR_ERR(event);
 		goto err;
@@ -6552,7 +6559,7 @@ inherit_event(struct perf_event *parent_event,
 					   parent_event->cpu,
 					   child,
 					   group_leader, parent_event,
-					   NULL);
+				           NULL, NULL);
 	if (IS_ERR(child_event))
 		return child_event;
 	get_ctx(child_ctx);
@@ -6579,6 +6586,8 @@ inherit_event(struct perf_event *parent_event,
 
 	child_event->ctx = child_ctx;
 	child_event->overflow_handler = parent_event->overflow_handler;
+	child_event->overflow_handler_context
+		= parent_event->overflow_handler_context;
 
 	/*
 	 * Precalculate sample_data sizes
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 086adf2..b7971d6 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -431,9 +431,11 @@ int register_perf_hw_breakpoint(struct perf_event *bp)
 struct perf_event *
 register_user_hw_breakpoint(struct perf_event_attr *attr,
 			    perf_overflow_handler_t triggered,
+			    void *context,
 			    struct task_struct *tsk)
 {
-	return perf_event_create_kernel_counter(attr, -1, tsk, triggered);
+	return perf_event_create_kernel_counter(attr, -1, tsk, triggered,
+						context);
 }
 EXPORT_SYMBOL_GPL(register_user_hw_breakpoint);
 
@@ -502,7 +504,8 @@ EXPORT_SYMBOL_GPL(unregister_hw_breakpoint);
  */
 struct perf_event * __percpu *
 register_wide_hw_breakpoint(struct perf_event_attr *attr,
-			    perf_overflow_handler_t triggered)
+			    perf_overflow_handler_t triggered,
+			    void *context)
 {
 	struct perf_event * __percpu *cpu_events, **pevent, *bp;
 	long err;
@@ -515,7 +518,8 @@ register_wide_hw_breakpoint(struct perf_event_attr *attr,
 	get_online_cpus();
 	for_each_online_cpu(cpu) {
 		pevent = per_cpu_ptr(cpu_events, cpu);
-		bp = perf_event_create_kernel_counter(attr, cpu, NULL, triggered);
+		bp = perf_event_create_kernel_counter(attr, cpu, NULL,
+						      triggered, context);
 
 		*pevent = bp;
 
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index a6708e6..a933e3a 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -375,7 +375,7 @@ static int watchdog_nmi_enable(int cpu)
 	hw_nmi_watchdog_set_attr(wd_attr);
 
 	/* Try to register using hardware perf events */
-	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback);
+	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
 	if (!IS_ERR(event)) {
 		printk(KERN_INFO "NMI watchdog enabled, takes one hw-pmu counter.\n");
 		goto out_save;
diff --git a/samples/hw_breakpoint/data_breakpoint.c b/samples/hw_breakpoint/data_breakpoint.c
index 7b164d3..ef7f322 100644
--- a/samples/hw_breakpoint/data_breakpoint.c
+++ b/samples/hw_breakpoint/data_breakpoint.c
@@ -60,7 +60,7 @@ static int __init hw_break_module_init(void)
 	attr.bp_len = HW_BREAKPOINT_LEN_4;
 	attr.bp_type = HW_BREAKPOINT_W | HW_BREAKPOINT_R;
 
-	sample_hbp = register_wide_hw_breakpoint(&attr, sample_hbp_handler);
+	sample_hbp = register_wide_hw_breakpoint(&attr, sample_hbp_handler, NULL);
 	if (IS_ERR((void __force *)sample_hbp)) {
 		ret = PTR_ERR((void __force *)sample_hbp);
 		goto fail;

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [tip:perf/core] x86, perf: Add constraints for architectural PMU
  2011-06-29 15:42 ` [PATCH 2/3] x86, perf: add constraints for architectural PMU v1 Avi Kivity
@ 2011-07-01 15:24   ` tip-bot for Avi Kivity
  0 siblings, 0 replies; 39+ messages in thread
From: tip-bot for Avi Kivity @ 2011-07-01 15:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, tglx, mingo, avi

Commit-ID:  0af3ac1fdb9d5c297b4b07c9e0172531d42b6716
Gitweb:     http://git.kernel.org/tip/0af3ac1fdb9d5c297b4b07c9e0172531d42b6716
Author:     Avi Kivity <avi@redhat.com>
AuthorDate: Wed, 29 Jun 2011 18:42:36 +0300
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 1 Jul 2011 11:06:39 +0200

x86, perf: Add constraints for architectural PMU

The v1 PMU does not have any fixed counters.  Using the v2 constraints,
which do have fixed counters, causes an additional choice to be present
in the weight calculation, but not when actually scheduling the event,
leading to an event being not scheduled at all.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309362157-6596-3-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/cpu/perf_event_intel.c |   23 ++++++++++++++++++-----
 1 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index bf6f92f..45fbb8f 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -112,6 +112,11 @@ static struct extra_reg intel_westmere_extra_regs[] __read_mostly =
 	EVENT_EXTRA_END
 };
 
+static struct event_constraint intel_v1_event_constraints[] __read_mostly =
+{
+	EVENT_CONSTRAINT_END
+};
+
 static struct event_constraint intel_gen_event_constraints[] __read_mostly =
 {
 	FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
@@ -1606,11 +1611,19 @@ static __init int intel_pmu_init(void)
 		break;
 
 	default:
-		/*
-		 * default constraints for v2 and up
-		 */
-		x86_pmu.event_constraints = intel_gen_event_constraints;
-		pr_cont("generic architected perfmon, ");
+		switch (x86_pmu.version) {
+		case 1:
+			x86_pmu.event_constraints = intel_v1_event_constraints;
+			pr_cont("generic architected perfmon v1, ");
+			break;
+		default:
+			/*
+			 * default constraints for v2 and up
+			 */
+			x86_pmu.event_constraints = intel_gen_event_constraints;
+			pr_cont("generic architected perfmon, ");
+			break;
+		}
 	}
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [tip:perf/core] perf: export perf_event_refresh() to modules
  2011-06-29 15:42 ` [PATCH 3/3] perf: export perf_event_refresh() to modules Avi Kivity
@ 2011-07-01 15:25   ` tip-bot for Avi Kivity
  0 siblings, 0 replies; 39+ messages in thread
From: tip-bot for Avi Kivity @ 2011-07-01 15:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, tglx, mingo, avi

Commit-ID:  26ca5c11fb45ae2b2ac7e3574b8db6b3a3c7d350
Gitweb:     http://git.kernel.org/tip/26ca5c11fb45ae2b2ac7e3574b8db6b3a3c7d350
Author:     Avi Kivity <avi@redhat.com>
AuthorDate: Wed, 29 Jun 2011 18:42:37 +0300
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 1 Jul 2011 11:06:40 +0200

perf: export perf_event_refresh() to modules

KVM needs one-shot samples, since a PMC programmed to -X will fire after X
events and then again after 2^40 events (i.e. variable period).

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309362157-6596-4-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/perf_event.h |    5 +++++
 kernel/events/core.c       |    3 ++-
 2 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2a08cac..3f2711c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -955,6 +955,7 @@ extern void perf_pmu_disable(struct pmu *pmu);
 extern void perf_pmu_enable(struct pmu *pmu);
 extern int perf_event_task_disable(void);
 extern int perf_event_task_enable(void);
+extern int perf_event_refresh(struct perf_event *event, int refresh);
 extern void perf_event_update_userpage(struct perf_event *event);
 extern int perf_event_release_kernel(struct perf_event *event);
 extern struct perf_event *
@@ -1149,6 +1150,10 @@ static inline void perf_event_delayed_put(struct task_struct *task)	{ }
 static inline void perf_event_print_debug(void)				{ }
 static inline int perf_event_task_disable(void)				{ return -EINVAL; }
 static inline int perf_event_task_enable(void)				{ return -EINVAL; }
+static inline int perf_event_refresh(struct perf_event *event, int refresh)
+{
+	return -EINVAL;
+}
 
 static inline void
 perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr)	{ }
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ba8e0f4..0567e32 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1764,7 +1764,7 @@ out:
 	raw_spin_unlock_irq(&ctx->lock);
 }
 
-static int perf_event_refresh(struct perf_event *event, int refresh)
+int perf_event_refresh(struct perf_event *event, int refresh)
 {
 	/*
 	 * not supported on inherited events
@@ -1777,6 +1777,7 @@ static int perf_event_refresh(struct perf_event *event, int refresh)
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(perf_event_refresh);
 
 static void ctx_sched_out(struct perf_event_context *ctx,
 			  struct perf_cpu_context *cpuctx,

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-06-29 16:27     ` Will Deacon
@ 2011-07-04 13:58       ` Frederic Weisbecker
  2011-07-04 14:10         ` Avi Kivity
  2011-07-05 14:30         ` Will Deacon
  0 siblings, 2 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2011-07-04 13:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: Avi Kivity, linux-kernel, kvm, Ingo Molnar, a.p.zijlstra, acme,
	Jason Wessel

On Wed, Jun 29, 2011 at 05:27:25PM +0100, Will Deacon wrote:
> Hi Frederic,
> 
> Thanks for including me on CC.
> 
> On Wed, Jun 29, 2011 at 05:08:45PM +0100, Frederic Weisbecker wrote:
> > On Wed, Jun 29, 2011 at 06:42:35PM +0300, Avi Kivity wrote:
> > > The perf_event overflow handler does not receive any caller-derived
> > > argument, so many callers need to resort to looking up the perf_event
> > > in their local data structure.  This is ugly and doesn't scale if a
> > > single callback services many perf_events.
> > >
> > > Fix by adding a context parameter to perf_event_create_kernel_counter()
> > > (and derived hardware breakpoints APIs) and storing it in the perf_event.
> > > The field can be accessed from the callback as event->overflow_handler_context.
> > > All callers are updated.
> > >
> > > Signed-off-by: Avi Kivity <avi@redhat.com>
> > 
> > I believe it can micro-optimize ptrace through register_user_hw_breakpoint() because
> > we could store the index of the breakpoint that way, instead of iterating through 4 slots.
> > 
> > Perhaps it can help in arm too, adding Will in Cc.
> 
> Yes, we could store the breakpoint index in there and it would save us
> walking over the breakpoints when one fires. Not sure this helps us for
> anything else though. My main gripe with the ptrace interface to
> hw_breakpoints is that we have to convert all the breakpoint information
> from ARM_BREAKPOINT_* to HW_BREAKPOINT_* and then convert it all back again
> in the hw_breakpoint code. Yuck!

Agreed, I don't like that either.

Would you like to improve that? We probably need to be able to pass some arch data
through the whole call of breakpoint creation, including perf_event_create_kernel_counter().

There can be a transition step where we can either take generic attr or arch datas, until
every archs are converted. So that you can handle the arm part and other arch developers
can relay.


Another thing I would like to do in the even longer term is to not use perf anymore
for ptrace breakpoints, because that involves a heavy dependency and few people are
happy with that. Instead we should just have a generic hook into the sched_switch()
and handle pure ptrace breakpoints there. The central breakpoint API would still be
there to reserve/schedule breakpoint resources between ptrace and perf.

But beeing able to create ptrace breakpoints without converting to generic perf attr
is a necessary first step in order to achieve this.

Thanks. 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-04 13:58       ` Frederic Weisbecker
@ 2011-07-04 14:10         ` Avi Kivity
  2011-07-04 14:36           ` Frederic Weisbecker
  2011-07-05 14:30         ` Will Deacon
  1 sibling, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2011-07-04 14:10 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Will Deacon, linux-kernel, kvm, Ingo Molnar, a.p.zijlstra, acme,
	Jason Wessel

On 07/04/2011 04:58 PM, Frederic Weisbecker wrote:
> Another thing I would like to do in the even longer term is to not use perf anymore
> for ptrace breakpoints, because that involves a heavy dependency and few people are
> happy with that. Instead we should just have a generic hook into the sched_switch()
> and handle pure ptrace breakpoints there. The central breakpoint API would still be
> there to reserve/schedule breakpoint resources between ptrace and perf.
>

'struct preempt_notifier' may be the hook you're looking for.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-04 14:10         ` Avi Kivity
@ 2011-07-04 14:36           ` Frederic Weisbecker
  2011-07-11 21:07             ` Will Deacon
  0 siblings, 1 reply; 39+ messages in thread
From: Frederic Weisbecker @ 2011-07-04 14:36 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Will Deacon, linux-kernel, kvm, Ingo Molnar, a.p.zijlstra, acme,
	Jason Wessel

On Mon, Jul 04, 2011 at 05:10:20PM +0300, Avi Kivity wrote:
> On 07/04/2011 04:58 PM, Frederic Weisbecker wrote:
> >Another thing I would like to do in the even longer term is to not use perf anymore
> >for ptrace breakpoints, because that involves a heavy dependency and few people are
> >happy with that. Instead we should just have a generic hook into the sched_switch()
> >and handle pure ptrace breakpoints there. The central breakpoint API would still be
> >there to reserve/schedule breakpoint resources between ptrace and perf.
> >
> 
> 'struct preempt_notifier' may be the hook you're looking for.

Yeah looks like a perfect fit as it's per task.

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-04 13:58       ` Frederic Weisbecker
  2011-07-04 14:10         ` Avi Kivity
@ 2011-07-05 14:30         ` Will Deacon
  2011-07-05 14:34           ` Frederic Weisbecker
  1 sibling, 1 reply; 39+ messages in thread
From: Will Deacon @ 2011-07-05 14:30 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, linux-kernel, kvm, Ingo Molnar, a.p.zijlstra, acme,
	Jason Wessel

Hi Frederic,

On Mon, Jul 04, 2011 at 02:58:24PM +0100, Frederic Weisbecker wrote:
> On Wed, Jun 29, 2011 at 05:27:25PM +0100, Will Deacon wrote:
> > Hi Frederic,
> > 
> > Thanks for including me on CC.
> > 
> > On Wed, Jun 29, 2011 at 05:08:45PM +0100, Frederic Weisbecker wrote:
> > > On Wed, Jun 29, 2011 at 06:42:35PM +0300, Avi Kivity wrote:
> > > > The perf_event overflow handler does not receive any caller-derived
> > > > argument, so many callers need to resort to looking up the perf_event
> > > > in their local data structure.  This is ugly and doesn't scale if a
> > > > single callback services many perf_events.
> > > >
> > > > Fix by adding a context parameter to perf_event_create_kernel_counter()
> > > > (and derived hardware breakpoints APIs) and storing it in the perf_event.
> > > > The field can be accessed from the callback as event->overflow_handler_context.
> > > > All callers are updated.
> > > >
> > > > Signed-off-by: Avi Kivity <avi@redhat.com>
> > > 
> > > I believe it can micro-optimize ptrace through register_user_hw_breakpoint() because
> > > we could store the index of the breakpoint that way, instead of iterating through 4 slots.
> > > 
> > > Perhaps it can help in arm too, adding Will in Cc.
> > 
> > Yes, we could store the breakpoint index in there and it would save us
> > walking over the breakpoints when one fires. Not sure this helps us for
> > anything else though. My main gripe with the ptrace interface to
> > hw_breakpoints is that we have to convert all the breakpoint information
> > from ARM_BREAKPOINT_* to HW_BREAKPOINT_* and then convert it all back again
> > in the hw_breakpoint code. Yuck!
> 
> Agreed, I don't like that either.
> 
> Would you like to improve that? We probably need to be able to pass some arch data
> through the whole call of breakpoint creation, including perf_event_create_kernel_counter().

Sure, I'll make some time to look at this and try and get an RFC out in the
next few weeks.

> There can be a transition step where we can either take generic attr or arch datas, until
> every archs are converted. So that you can handle the arm part and other arch developers
> can relay.

Yup.

> 
> Another thing I would like to do in the even longer term is to not use perf anymore
> for ptrace breakpoints, because that involves a heavy dependency and few people are
> happy with that. Instead we should just have a generic hook into the sched_switch()
> and handle pure ptrace breakpoints there. The central breakpoint API would still be
> there to reserve/schedule breakpoint resources between ptrace and perf.
> 
> But beeing able to create ptrace breakpoints without converting to generic perf attr
> is a necessary first step in order to achieve this.

Agreed, but I'll bear that in mind so I don't make it any more difficult
than it already is!

Cheers,

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-05 14:30         ` Will Deacon
@ 2011-07-05 14:34           ` Frederic Weisbecker
  0 siblings, 0 replies; 39+ messages in thread
From: Frederic Weisbecker @ 2011-07-05 14:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: Avi Kivity, linux-kernel, kvm, Ingo Molnar, a.p.zijlstra, acme,
	Jason Wessel

On Tue, Jul 05, 2011 at 03:30:26PM +0100, Will Deacon wrote:
> Hi Frederic,
> 
> On Mon, Jul 04, 2011 at 02:58:24PM +0100, Frederic Weisbecker wrote:
> > On Wed, Jun 29, 2011 at 05:27:25PM +0100, Will Deacon wrote:
> > > Hi Frederic,
> > > 
> > > Thanks for including me on CC.
> > > 
> > > On Wed, Jun 29, 2011 at 05:08:45PM +0100, Frederic Weisbecker wrote:
> > > > On Wed, Jun 29, 2011 at 06:42:35PM +0300, Avi Kivity wrote:
> > > > > The perf_event overflow handler does not receive any caller-derived
> > > > > argument, so many callers need to resort to looking up the perf_event
> > > > > in their local data structure.  This is ugly and doesn't scale if a
> > > > > single callback services many perf_events.
> > > > >
> > > > > Fix by adding a context parameter to perf_event_create_kernel_counter()
> > > > > (and derived hardware breakpoints APIs) and storing it in the perf_event.
> > > > > The field can be accessed from the callback as event->overflow_handler_context.
> > > > > All callers are updated.
> > > > >
> > > > > Signed-off-by: Avi Kivity <avi@redhat.com>
> > > > 
> > > > I believe it can micro-optimize ptrace through register_user_hw_breakpoint() because
> > > > we could store the index of the breakpoint that way, instead of iterating through 4 slots.
> > > > 
> > > > Perhaps it can help in arm too, adding Will in Cc.
> > > 
> > > Yes, we could store the breakpoint index in there and it would save us
> > > walking over the breakpoints when one fires. Not sure this helps us for
> > > anything else though. My main gripe with the ptrace interface to
> > > hw_breakpoints is that we have to convert all the breakpoint information
> > > from ARM_BREAKPOINT_* to HW_BREAKPOINT_* and then convert it all back again
> > > in the hw_breakpoint code. Yuck!
> > 
> > Agreed, I don't like that either.
> > 
> > Would you like to improve that? We probably need to be able to pass some arch data
> > through the whole call of breakpoint creation, including perf_event_create_kernel_counter().
> 
> Sure, I'll make some time to look at this and try and get an RFC out in the
> next few weeks.

Great! Thanks a lot!

> 
> > There can be a transition step where we can either take generic attr or arch datas, until
> > every archs are converted. So that you can handle the arm part and other arch developers
> > can relay.
> 
> Yup.
> 
> > 
> > Another thing I would like to do in the even longer term is to not use perf anymore
> > for ptrace breakpoints, because that involves a heavy dependency and few people are
> > happy with that. Instead we should just have a generic hook into the sched_switch()
> > and handle pure ptrace breakpoints there. The central breakpoint API would still be
> > there to reserve/schedule breakpoint resources between ptrace and perf.
> > 
> > But beeing able to create ptrace breakpoints without converting to generic perf attr
> > is a necessary first step in order to achieve this.
> 
> Agreed, but I'll bear that in mind so I don't make it any more difficult
> than it already is!
> 
> Cheers,
> 
> Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-04 14:36           ` Frederic Weisbecker
@ 2011-07-11 21:07             ` Will Deacon
  2011-07-12  7:20               ` Avi Kivity
  0 siblings, 1 reply; 39+ messages in thread
From: Will Deacon @ 2011-07-11 21:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, linux-kernel, kvm, Ingo Molnar, a.p.zijlstra, acme,
	Jason Wessel

On Mon, Jul 04, 2011 at 03:36:57PM +0100, Frederic Weisbecker wrote:
> On Mon, Jul 04, 2011 at 05:10:20PM +0300, Avi Kivity wrote:
> > On 07/04/2011 04:58 PM, Frederic Weisbecker wrote:
> > >Another thing I would like to do in the even longer term is to not use perf anymore
> > >for ptrace breakpoints, because that involves a heavy dependency and few people are
> > >happy with that. Instead we should just have a generic hook into the sched_switch()
> > >and handle pure ptrace breakpoints there. The central breakpoint API would still be
> > >there to reserve/schedule breakpoint resources between ptrace and perf.
> > >
> > 
> > 'struct preempt_notifier' may be the hook you're looking for.
> 
> Yeah looks like a perfect fit as it's per task.

I had a quick look at this and I think the preempt_notifier stuff needs
slightly extending so that we can register a notifier for a task other than
current [e.g. the child of current on which we are installing breakpoints].

If the task in question is running, it looks like this will introduce a race
condition between notifier registration and rescheduling. For the purposes
of ptrace this shouldn't be a problem as the child will be stopped, but
others might also want to make use of the new functionality.

Any ideas on how this could be achieved, or am I better off just restricting
this to children that are being traced?

Cheers,

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-11 21:07             ` Will Deacon
@ 2011-07-12  7:20               ` Avi Kivity
  2011-07-12  8:38                 ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2011-07-12  7:20 UTC (permalink / raw)
  To: Will Deacon
  Cc: Frederic Weisbecker, linux-kernel, kvm, Ingo Molnar,
	a.p.zijlstra, acme, Jason Wessel

On 07/12/2011 12:07 AM, Will Deacon wrote:
> On Mon, Jul 04, 2011 at 03:36:57PM +0100, Frederic Weisbecker wrote:
> >  On Mon, Jul 04, 2011 at 05:10:20PM +0300, Avi Kivity wrote:
> >  >  On 07/04/2011 04:58 PM, Frederic Weisbecker wrote:
> >  >  >Another thing I would like to do in the even longer term is to not use perf anymore
> >  >  >for ptrace breakpoints, because that involves a heavy dependency and few people are
> >  >  >happy with that. Instead we should just have a generic hook into the sched_switch()
> >  >  >and handle pure ptrace breakpoints there. The central breakpoint API would still be
> >  >  >there to reserve/schedule breakpoint resources between ptrace and perf.
> >  >  >
> >  >
> >  >  'struct preempt_notifier' may be the hook you're looking for.
> >
> >  Yeah looks like a perfect fit as it's per task.
>
> I had a quick look at this and I think the preempt_notifier stuff needs
> slightly extending so that we can register a notifier for a task other than
> current [e.g. the child of current on which we are installing breakpoints].
>
> If the task in question is running, it looks like this will introduce a race
> condition between notifier registration and rescheduling. For the purposes
> of ptrace this shouldn't be a problem as the child will be stopped, but
> others might also want to make use of the new functionality.
>
> Any ideas on how this could be achieved, or am I better off just restricting
> this to children that are being traced?

Maybe we need a generic "run this function in this task's context" 
mechanism instead.  Like an IPI, but targeting tasks instead of cpus.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  7:20               ` Avi Kivity
@ 2011-07-12  8:38                 ` Peter Zijlstra
  2011-07-12  9:08                   ` Avi Kivity
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2011-07-12  8:38 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Will Deacon, Frederic Weisbecker, linux-kernel, kvm, Ingo Molnar,
	acme, Jason Wessel

On Tue, 2011-07-12 at 10:20 +0300, Avi Kivity wrote:

> Maybe we need a generic "run this function in this task's context" 
> mechanism instead.  Like an IPI, but targeting tasks instead of cpus.
> 

kernel/event/core.c:task_function_call() like?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  8:38                 ` Peter Zijlstra
@ 2011-07-12  9:08                   ` Avi Kivity
  2011-07-12  9:14                     ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2011-07-12  9:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Frederic Weisbecker, linux-kernel, kvm, Ingo Molnar,
	acme, Jason Wessel

On 07/12/2011 11:38 AM, Peter Zijlstra wrote:
> On Tue, 2011-07-12 at 10:20 +0300, Avi Kivity wrote:
>
> >  Maybe we need a generic "run this function in this task's context"
> >  mechanism instead.  Like an IPI, but targeting tasks instead of cpus.
> >
>
> kernel/event/core.c:task_function_call() like?

Similar, but with stronger guarantees: when the function is called, 
current == p, and the task was either sleeping or in userspace.

This can be used to reduce synchronization requirements between tasks.  
For example, you can set a task-local bit flag with this, without 
atomics.  If this is rare enough, it's a net win compared to adding atomics.

aio completions might also make use of this.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  9:08                   ` Avi Kivity
@ 2011-07-12  9:14                     ` Peter Zijlstra
  2011-07-12  9:16                       ` Avi Kivity
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2011-07-12  9:14 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Will Deacon, Frederic Weisbecker, linux-kernel, kvm, Ingo Molnar,
	acme, Jason Wessel

On Tue, 2011-07-12 at 12:08 +0300, Avi Kivity wrote:
> Similar, but with stronger guarantees: when the function is called, 
> current == p, and the task was either sleeping or in userspace. 

If the task is sleeping, current can never be p.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  9:14                     ` Peter Zijlstra
@ 2011-07-12  9:16                       ` Avi Kivity
  2011-07-12  9:18                         ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2011-07-12  9:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Frederic Weisbecker, linux-kernel, kvm, Ingo Molnar,
	acme, Jason Wessel

On 07/12/2011 12:14 PM, Peter Zijlstra wrote:
> On Tue, 2011-07-12 at 12:08 +0300, Avi Kivity wrote:
> >  Similar, but with stronger guarantees: when the function is called,
> >  current == p, and the task was either sleeping or in userspace.
>
> If the task is sleeping, current can never be p.

The guarantee is that the task was sleeping just before the function is 
called.  Of course it's woken up to run the function.

The idea is that you run the function in a known safe point to avoid 
extra synchronization.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  9:16                       ` Avi Kivity
@ 2011-07-12  9:18                         ` Peter Zijlstra
  2011-07-12  9:27                           ` Avi Kivity
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2011-07-12  9:18 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Will Deacon, Frederic Weisbecker, linux-kernel, kvm, Ingo Molnar,
	acme, Jason Wessel

On Tue, 2011-07-12 at 12:16 +0300, Avi Kivity wrote:
> On 07/12/2011 12:14 PM, Peter Zijlstra wrote:
> > On Tue, 2011-07-12 at 12:08 +0300, Avi Kivity wrote:
> > >  Similar, but with stronger guarantees: when the function is called,
> > >  current == p, and the task was either sleeping or in userspace.
> >
> > If the task is sleeping, current can never be p.
> 
> The guarantee is that the task was sleeping just before the function is 
> called.  Of course it's woken up to run the function.
> 
> The idea is that you run the function in a known safe point to avoid 
> extra synchronization.
> 

I'd much rather we didn't wake the task and let it sleep, that's usually
a very safe place for tasks to be. All you'd need is a guarantee it
won't be woken up while you're doing your thing.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  9:18                         ` Peter Zijlstra
@ 2011-07-12  9:27                           ` Avi Kivity
  2011-07-12  9:31                             ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2011-07-12  9:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Frederic Weisbecker, linux-kernel, kvm, Ingo Molnar,
	acme, Jason Wessel

On 07/12/2011 12:18 PM, Peter Zijlstra wrote:
> >
> >  The guarantee is that the task was sleeping just before the function is
> >  called.  Of course it's woken up to run the function.
> >
> >  The idea is that you run the function in a known safe point to avoid
> >  extra synchronization.
> >
>
> I'd much rather we didn't wake the task and let it sleep, that's usually
> a very safe place for tasks to be. All you'd need is a guarantee it
> won't be woken up while you're doing your thing.

But it means that 'current' is not set to the right value.  If the 
function depends on it, then it will misbehave.  And in fact 
preempt_notifier_register(), which is the function we want to call here, 
does depend on current.

Of course we need to find more users for this, but I have a feeling this 
will be generally useful.  The alternative is to keep adding bits to 
thread_info::flags.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  9:27                           ` Avi Kivity
@ 2011-07-12  9:31                             ` Peter Zijlstra
  2011-07-12  9:36                               ` Avi Kivity
                                                 ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Peter Zijlstra @ 2011-07-12  9:31 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Will Deacon, Frederic Weisbecker, linux-kernel, kvm, Ingo Molnar,
	acme, Jason Wessel

On Tue, 2011-07-12 at 12:27 +0300, Avi Kivity wrote:
> On 07/12/2011 12:18 PM, Peter Zijlstra wrote:
> > >
> > >  The guarantee is that the task was sleeping just before the function is
> > >  called.  Of course it's woken up to run the function.
> > >
> > >  The idea is that you run the function in a known safe point to avoid
> > >  extra synchronization.
> > >
> >
> > I'd much rather we didn't wake the task and let it sleep, that's usually
> > a very safe place for tasks to be. All you'd need is a guarantee it
> > won't be woken up while you're doing your thing.
> 
> But it means that 'current' is not set to the right value.  If the 
> function depends on it, then it will misbehave.  And in fact 
> preempt_notifier_register(), which is the function we want to call here, 
> does depend on current.
> 
> Of course we need to find more users for this, but I have a feeling this 
> will be generally useful.  The alternative is to keep adding bits to 
> thread_info::flags.

Using TIF_bits sounds like a much better solution for this, wakeups are
really rather expensive and its best to avoid extra if at all possible.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  9:31                             ` Peter Zijlstra
@ 2011-07-12  9:36                               ` Avi Kivity
  2011-07-12  9:42                                 ` Will Deacon
  2011-07-12  9:41                               ` Joerg Roedel
  2011-07-21 15:32                               ` Will Deacon
  2 siblings, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2011-07-12  9:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Frederic Weisbecker, linux-kernel, kvm, Ingo Molnar,
	acme, Jason Wessel

On 07/12/2011 12:31 PM, Peter Zijlstra wrote:
> >
> >  But it means that 'current' is not set to the right value.  If the
> >  function depends on it, then it will misbehave.  And in fact
> >  preempt_notifier_register(), which is the function we want to call here,
> >  does depend on current.
> >
> >  Of course we need to find more users for this, but I have a feeling this
> >  will be generally useful.  The alternative is to keep adding bits to
> >  thread_info::flags.
>
> Using TIF_bits sounds like a much better solution for this, wakeups are
> really rather expensive and its best to avoid extra if at all possible.
>

We do need a way to make the task look at them.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  9:31                             ` Peter Zijlstra
  2011-07-12  9:36                               ` Avi Kivity
@ 2011-07-12  9:41                               ` Joerg Roedel
  2011-07-12  9:44                                 ` Avi Kivity
  2011-07-21 15:32                               ` Will Deacon
  2 siblings, 1 reply; 39+ messages in thread
From: Joerg Roedel @ 2011-07-12  9:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Avi Kivity, Will Deacon, Frederic Weisbecker, linux-kernel, kvm,
	Ingo Molnar, acme, Jason Wessel

On Tue, Jul 12, 2011 at 11:31:00AM +0200, Peter Zijlstra wrote:
> On Tue, 2011-07-12 at 12:27 +0300, Avi Kivity wrote:

> > But it means that 'current' is not set to the right value.  If the 
> > function depends on it, then it will misbehave.  And in fact 
> > preempt_notifier_register(), which is the function we want to call here, 
> > does depend on current.
> > 
> > Of course we need to find more users for this, but I have a feeling this 
> > will be generally useful.  The alternative is to keep adding bits to 
> > thread_info::flags.
> 
> Using TIF_bits sounds like a much better solution for this, wakeups are
> really rather expensive and its best to avoid extra if at all possible.

I would rather vote for Avi's solution too. Such a functionality is very
helpful for LWP-perf integration as well (because we need a way to call
do_mmap for a task != current).

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  9:36                               ` Avi Kivity
@ 2011-07-12  9:42                                 ` Will Deacon
  0 siblings, 0 replies; 39+ messages in thread
From: Will Deacon @ 2011-07-12  9:42 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Zijlstra, Frederic Weisbecker, linux-kernel, kvm,
	Ingo Molnar, acme, Jason Wessel

On Tue, Jul 12, 2011 at 10:36:05AM +0100, Avi Kivity wrote:
> On 07/12/2011 12:31 PM, Peter Zijlstra wrote:
> > >
> > >  But it means that 'current' is not set to the right value.  If the
> > >  function depends on it, then it will misbehave.  And in fact
> > >  preempt_notifier_register(), which is the function we want to call here,
> > >  does depend on current.
> > >
> > >  Of course we need to find more users for this, but I have a feeling this
> > >  will be generally useful.  The alternative is to keep adding bits to
> > >  thread_info::flags.
> >
> > Using TIF_bits sounds like a much better solution for this, wakeups are
> > really rather expensive and its best to avoid extra if at all possible.
> >
> 
> We do need a way to make the task look at them.

... especially without pushing this down into the arch code where I don't
think it belongs.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  9:41                               ` Joerg Roedel
@ 2011-07-12  9:44                                 ` Avi Kivity
  2011-07-12  9:48                                   ` Joerg Roedel
  0 siblings, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2011-07-12  9:44 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Peter Zijlstra, Will Deacon, Frederic Weisbecker, linux-kernel,
	kvm, Ingo Molnar, acme, Jason Wessel

On 07/12/2011 12:41 PM, Joerg Roedel wrote:
> >  Using TIF_bits sounds like a much better solution for this, wakeups are
> >  really rather expensive and its best to avoid extra if at all possible.
>
> I would rather vote for Avi's solution too. Such a functionality is very
> helpful for LWP-perf integration as well (because we need a way to call
> do_mmap for a task != current).
>

It's not needed for that.  See use_mm() (caller must be a kernel thread).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  9:44                                 ` Avi Kivity
@ 2011-07-12  9:48                                   ` Joerg Roedel
  2011-07-12  9:55                                     ` Avi Kivity
  0 siblings, 1 reply; 39+ messages in thread
From: Joerg Roedel @ 2011-07-12  9:48 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Zijlstra, Will Deacon, Frederic Weisbecker, linux-kernel,
	kvm, Ingo Molnar, acme, Jason Wessel

On Tue, Jul 12, 2011 at 12:44:17PM +0300, Avi Kivity wrote:
> On 07/12/2011 12:41 PM, Joerg Roedel wrote:
>> >  Using TIF_bits sounds like a much better solution for this, wakeups are
>> >  really rather expensive and its best to avoid extra if at all possible.
>>
>> I would rather vote for Avi's solution too. Such a functionality is very
>> helpful for LWP-perf integration as well (because we need a way to call
>> do_mmap for a task != current).
>>
>
> It's not needed for that.  See use_mm() (caller must be a kernel thread).

But in our case the caller would be the process that wants to count, not
a kernel thread.

	Joerg


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  9:48                                   ` Joerg Roedel
@ 2011-07-12  9:55                                     ` Avi Kivity
  2011-07-12 10:03                                       ` Joerg Roedel
  0 siblings, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2011-07-12  9:55 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Peter Zijlstra, Will Deacon, Frederic Weisbecker, linux-kernel,
	kvm, Ingo Molnar, acme, Jason Wessel

On 07/12/2011 12:48 PM, Joerg Roedel wrote:
> On Tue, Jul 12, 2011 at 12:44:17PM +0300, Avi Kivity wrote:
> >  On 07/12/2011 12:41 PM, Joerg Roedel wrote:
> >>  >   Using TIF_bits sounds like a much better solution for this, wakeups are
> >>  >   really rather expensive and its best to avoid extra if at all possible.
> >>
> >>  I would rather vote for Avi's solution too. Such a functionality is very
> >>  helpful for LWP-perf integration as well (because we need a way to call
> >>  do_mmap for a task != current).
> >>
> >
> >  It's not needed for that.  See use_mm() (caller must be a kernel thread).
>
> But in our case the caller would be the process that wants to count, not
> a kernel thread.
>

Have a helper kernel thread do it for you.  Or extend use_mm() to return 
the old mm (without dropping its refcount) and add a way to restore it.

Regarding LWP - I thought the intent was self-profiling by the process 
for jits and the like?  If you also use it for perf, won't it be 
unusable for that?  Also, can't the process interfere, from userspace, 
by executing the unprivileged LWP instructions?

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  9:55                                     ` Avi Kivity
@ 2011-07-12 10:03                                       ` Joerg Roedel
  2011-07-12 10:07                                         ` Avi Kivity
  0 siblings, 1 reply; 39+ messages in thread
From: Joerg Roedel @ 2011-07-12 10:03 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Zijlstra, Will Deacon, Frederic Weisbecker, linux-kernel,
	kvm, Ingo Molnar, acme, Jason Wessel

On Tue, Jul 12, 2011 at 12:55:25PM +0300, Avi Kivity wrote:

> Have a helper kernel thread do it for you.  Or extend use_mm() to return
> the old mm (without dropping its refcount) and add a way to restore it.

Making use_mm usable for regular tasks too sounds like a good idea.
Thanks.

> Regarding LWP - I thought the intent was self-profiling by the process  
> for jits and the like?  If you also use it for perf, won't it be  
> unusable for that?  Also, can't the process interfere, from userspace,  
> by executing the unprivileged LWP instructions?

Ingo made perf-integration a merge-requirement for LWP. It is not really
well-suited for being integrated into perf because the design goal was
easy and efficient self-profiling of tasks (like you stated). So
integrating it into perf causes some pain. But lets see how it works
out.

	Joerg


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12 10:03                                       ` Joerg Roedel
@ 2011-07-12 10:07                                         ` Avi Kivity
  2011-07-12 10:24                                           ` Joerg Roedel
  0 siblings, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2011-07-12 10:07 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Peter Zijlstra, Will Deacon, Frederic Weisbecker, linux-kernel,
	kvm, Ingo Molnar, acme, Jason Wessel

On 07/12/2011 01:03 PM, Joerg Roedel wrote:
> >  Regarding LWP - I thought the intent was self-profiling by the process
> >  for jits and the like?  If you also use it for perf, won't it be
> >  unusable for that?  Also, can't the process interfere, from userspace,
> >  by executing the unprivileged LWP instructions?
>
> Ingo made perf-integration a merge-requirement for LWP. It is not really
> well-suited for being integrated into perf because the design goal was
> easy and efficient self-profiling of tasks (like you stated). So
> integrating it into perf causes some pain. But lets see how it works
> out.

I don't think it's workable.  Having do_mmap() called in the task's 
context can change how it works.  And the task being able to kill/modify 
the profile, and not able to use LWP for itself, is a show stopper IMO.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12 10:07                                         ` Avi Kivity
@ 2011-07-12 10:24                                           ` Joerg Roedel
  2011-07-12 10:36                                             ` Avi Kivity
  0 siblings, 1 reply; 39+ messages in thread
From: Joerg Roedel @ 2011-07-12 10:24 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Zijlstra, Will Deacon, Frederic Weisbecker, linux-kernel,
	kvm, Ingo Molnar, acme, Jason Wessel

On Tue, Jul 12, 2011 at 01:07:25PM +0300, Avi Kivity wrote:
> On 07/12/2011 01:03 PM, Joerg Roedel wrote:

>> Ingo made perf-integration a merge-requirement for LWP. It is not really
>> well-suited for being integrated into perf because the design goal was
>> easy and efficient self-profiling of tasks (like you stated). So
>> integrating it into perf causes some pain. But lets see how it works
>> out.
>
> I don't think it's workable.  Having do_mmap() called in the task's  
> context can change how it works.

According to Ingo that is already done at other places in the kernel and
should not be an issue.

> And the task being able to kill/modify  the profile, and not able to
> use LWP for itself, is a show stopper IMO.

I agree, that is a problem we have no solution for so far.

	Joerg


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12 10:24                                           ` Joerg Roedel
@ 2011-07-12 10:36                                             ` Avi Kivity
  0 siblings, 0 replies; 39+ messages in thread
From: Avi Kivity @ 2011-07-12 10:36 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Peter Zijlstra, Will Deacon, Frederic Weisbecker, linux-kernel,
	kvm, Ingo Molnar, acme, Jason Wessel

On 07/12/2011 01:24 PM, Joerg Roedel wrote:
> On Tue, Jul 12, 2011 at 01:07:25PM +0300, Avi Kivity wrote:
> >  On 07/12/2011 01:03 PM, Joerg Roedel wrote:
>
> >>  Ingo made perf-integration a merge-requirement for LWP. It is not really
> >>  well-suited for being integrated into perf because the design goal was
> >>  easy and efficient self-profiling of tasks (like you stated). So
> >>  integrating it into perf causes some pain. But lets see how it works
> >>  out.
> >
> >  I don't think it's workable.  Having do_mmap() called in the task's
> >  context can change how it works.
>
> According to Ingo that is already done at other places in the kernel and
> should not be an issue.

kvm is one of those places, unfortunately.


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-12  9:31                             ` Peter Zijlstra
  2011-07-12  9:36                               ` Avi Kivity
  2011-07-12  9:41                               ` Joerg Roedel
@ 2011-07-21 15:32                               ` Will Deacon
  2011-07-21 15:36                                 ` Avi Kivity
  2 siblings, 1 reply; 39+ messages in thread
From: Will Deacon @ 2011-07-21 15:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Avi Kivity, Frederic Weisbecker, linux-kernel, kvm, Ingo Molnar,
	acme, Jason Wessel

On Tue, Jul 12, 2011 at 10:31:00AM +0100, Peter Zijlstra wrote:
> On Tue, 2011-07-12 at 12:27 +0300, Avi Kivity wrote:
> > On 07/12/2011 12:18 PM, Peter Zijlstra wrote:
> > > >
> > > >  The guarantee is that the task was sleeping just before the function is
> > > >  called.  Of course it's woken up to run the function.
> > > >
> > > >  The idea is that you run the function in a known safe point to avoid
> > > >  extra synchronization.
> > > >
> > >
> > > I'd much rather we didn't wake the task and let it sleep, that's usually
> > > a very safe place for tasks to be. All you'd need is a guarantee it
> > > won't be woken up while you're doing your thing.
> > 
> > But it means that 'current' is not set to the right value.  If the 
> > function depends on it, then it will misbehave.  And in fact 
> > preempt_notifier_register(), which is the function we want to call here, 
> > does depend on current.
> > 
> > Of course we need to find more users for this, but I have a feeling this 
> > will be generally useful.  The alternative is to keep adding bits to 
> > thread_info::flags.
> 
> Using TIF_bits sounds like a much better solution for this, wakeups are
> really rather expensive and its best to avoid extra if at all possible.

The problem with using a TIF bit to tell a task that it needs to perform
some preempt_notifier registrations is that you end up with something that
looks a lot like preempt notifiers! You also don't escape the concurrent
read/write to thelist of pending registrations.

One thing I tried was simply using an RCU protected hlist for the preempt
notifiers so that we don't have to worry about atomicity when reading the
notifiers in finish_task_switch. It's a bit odd, since we know we only ever
have a single reader, but I've included it below anyway.

If anybody has any better ideas, I'm all ears.

Will


diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 2e681d9..2e21ffe 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -132,6 +132,11 @@ struct preempt_notifier {
 void preempt_notifier_register(struct preempt_notifier *notifier);
 void preempt_notifier_unregister(struct preempt_notifier *notifier);
 
+void preempt_notifier_register_task(struct preempt_notifier *notifier,
+				    struct task_struct *tsk);
+void preempt_notifier_unregister_task(struct preempt_notifier *notifier,
+				      struct task_struct *tsk);
+
 static inline void preempt_notifier_init(struct preempt_notifier *notifier,
 				     struct preempt_ops *ops)
 {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 496770a..5530d91 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1233,6 +1233,7 @@ struct task_struct {
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 	/* list of struct preempt_notifier: */
 	struct hlist_head preempt_notifiers;
+	struct mutex preempt_notifiers_mutex;
 #endif
 
 	/*
diff --git a/kernel/sched.c b/kernel/sched.c
index 9769c75..d3c46ca 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2784,6 +2784,7 @@ static void __sched_fork(struct task_struct *p)
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 	INIT_HLIST_HEAD(&p->preempt_notifiers);
+	mutex_init(&p->preempt_notifiers_mutex);
 #endif
 }
 
@@ -2901,13 +2902,31 @@ void wake_up_new_task(struct task_struct *p)
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 
+void preempt_notifier_register_task(struct preempt_notifier *notifier,
+				    struct task_struct *tsk)
+{
+	mutex_lock(&tsk->preempt_notifiers_mutex);
+	hlist_add_head_rcu(&notifier->link, &tsk->preempt_notifiers);
+	mutex_unlock(&tsk->preempt_notifiers_mutex);
+}
+EXPORT_SYMBOL_GPL(preempt_notifier_register_task);
+
+void preempt_notifier_unregister_task(struct preempt_notifier *notifier,
+				      struct task_struct *tsk)
+{
+	mutex_lock(&tsk->preempt_notifiers_mutex);
+	hlist_del_rcu(&notifier->link);
+	mutex_unlock(&tsk->preempt_notifiers_mutex);
+}
+EXPORT_SYMBOL_GPL(preempt_notifier_unregister_task);
+
 /**
  * preempt_notifier_register - tell me when current is being preempted & rescheduled
  * @notifier: notifier struct to register
  */
 void preempt_notifier_register(struct preempt_notifier *notifier)
 {
-	hlist_add_head(&notifier->link, &current->preempt_notifiers);
+	preempt_notifier_register_task(notifier, current);
 }
 EXPORT_SYMBOL_GPL(preempt_notifier_register);
 
@@ -2919,7 +2938,7 @@ EXPORT_SYMBOL_GPL(preempt_notifier_register);
  */
 void preempt_notifier_unregister(struct preempt_notifier *notifier)
 {
-	hlist_del(&notifier->link);
+	preempt_notifier_unregister_task(notifier, current);
 }
 EXPORT_SYMBOL_GPL(preempt_notifier_unregister);
 
@@ -2928,8 +2947,12 @@ static void fire_sched_in_preempt_notifiers(struct task_struct *curr)
 	struct preempt_notifier *notifier;
 	struct hlist_node *node;
 
-	hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
+	rcu_read_lock();
+
+	hlist_for_each_entry_rcu(notifier, node, &curr->preempt_notifiers, link)
 		notifier->ops->sched_in(notifier, raw_smp_processor_id());
+
+	rcu_read_unlock();
 }
 
 static void
@@ -2939,8 +2962,12 @@ fire_sched_out_preempt_notifiers(struct task_struct *curr,
 	struct preempt_notifier *notifier;
 	struct hlist_node *node;
 
-	hlist_for_each_entry(notifier, node, &curr->preempt_notifiers, link)
+	rcu_read_lock();
+
+	hlist_for_each_entry_rcu(notifier, node, &curr->preempt_notifiers, link)
 		notifier->ops->sched_out(notifier, next);
+
+	rcu_read_unlock();
 }
 
 #else /* !CONFIG_PREEMPT_NOTIFIERS */
@@ -7979,6 +8006,7 @@ void __init sched_init(void)
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 	INIT_HLIST_HEAD(&init_task.preempt_notifiers);
+	mutex_init(&init_task.preempt_notifiers_mutex);
 #endif
 
 #ifdef CONFIG_SMP


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-21 15:32                               ` Will Deacon
@ 2011-07-21 15:36                                 ` Avi Kivity
  2011-07-21 15:46                                   ` Will Deacon
  0 siblings, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2011-07-21 15:36 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Frederic Weisbecker, linux-kernel, kvm,
	Ingo Molnar, acme, Jason Wessel

On 07/21/2011 06:32 PM, Will Deacon wrote:
> >
> >  Using TIF_bits sounds like a much better solution for this, wakeups are
> >  really rather expensive and its best to avoid extra if at all possible.
>
> The problem with using a TIF bit to tell a task that it needs to perform
> some preempt_notifier registrations is that you end up with something that
> looks a lot like preempt notifiers! You also don't escape the concurrent
> read/write to thelist of pending registrations.
>
> One thing I tried was simply using an RCU protected hlist for the preempt
> notifiers so that we don't have to worry about atomicity when reading the
> notifiers in finish_task_switch. It's a bit odd, since we know we only ever
> have a single reader, but I've included it below anyway.
>
> If anybody has any better ideas, I'm all ears.

> +void preempt_notifier_register_task(struct preempt_notifier *notifier,
> +				    struct task_struct *tsk)
> +{
> +	mutex_lock(&tsk->preempt_notifiers_mutex);
> +	hlist_add_head_rcu(&notifier->link,&tsk->preempt_notifiers);
> +	mutex_unlock(&tsk->preempt_notifiers_mutex);
> +}
> +EXPORT_SYMBOL_GPL(preempt_notifier_register_task);
> +
> +void preempt_notifier_unregister_task(struct preempt_notifier *notifier,
> +				      struct task_struct *tsk)
> +{
> +	mutex_lock(&tsk->preempt_notifiers_mutex);
> +	hlist_del_rcu(&notifier->link);
> +	mutex_unlock(&tsk->preempt_notifiers_mutex);
> +}
> +EXPORT_SYMBOL_GPL(preempt_notifier_unregister_task);
> +
>   /**
>    * preempt_notifier_register - tell me when current is being preempted&  rescheduled
>    * @notifier: notifier struct to register
>    */
>   void preempt_notifier_register(struct preempt_notifier *notifier)
>   {
> -	hlist_add_head(&notifier->link,&current->preempt_notifiers);
> +	preempt_notifier_register_task(notifier, current);
>   }
>   EXPORT_SYMBOL_GPL(preempt_notifier_register);

This is (and must be) called from a preempt disabled context, no mutexes 
around here.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-21 15:36                                 ` Avi Kivity
@ 2011-07-21 15:46                                   ` Will Deacon
  2011-07-21 15:59                                     ` Avi Kivity
  0 siblings, 1 reply; 39+ messages in thread
From: Will Deacon @ 2011-07-21 15:46 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Zijlstra, Frederic Weisbecker, linux-kernel, kvm,
	Ingo Molnar, acme, Jason Wessel

On Thu, Jul 21, 2011 at 04:36:43PM +0100, Avi Kivity wrote:
> On 07/21/2011 06:32 PM, Will Deacon wrote:
> > >
> > >  Using TIF_bits sounds like a much better solution for this, wakeups are
> > >  really rather expensive and its best to avoid extra if at all possible.
> >
> > The problem with using a TIF bit to tell a task that it needs to perform
> > some preempt_notifier registrations is that you end up with something that
> > looks a lot like preempt notifiers! You also don't escape the concurrent
> > read/write to thelist of pending registrations.
> >
> > One thing I tried was simply using an RCU protected hlist for the preempt
> > notifiers so that we don't have to worry about atomicity when reading the
> > notifiers in finish_task_switch. It's a bit odd, since we know we only ever
> > have a single reader, but I've included it below anyway.
> >
> > If anybody has any better ideas, I'm all ears.
> 
> > +void preempt_notifier_register_task(struct preempt_notifier *notifier,
> > +				    struct task_struct *tsk)
> > +{
> > +	mutex_lock(&tsk->preempt_notifiers_mutex);
> > +	hlist_add_head_rcu(&notifier->link,&tsk->preempt_notifiers);
> > +	mutex_unlock(&tsk->preempt_notifiers_mutex);
> > +}
> > +EXPORT_SYMBOL_GPL(preempt_notifier_register_task);
> > +
> > +void preempt_notifier_unregister_task(struct preempt_notifier *notifier,
> > +				      struct task_struct *tsk)
> > +{
> > +	mutex_lock(&tsk->preempt_notifiers_mutex);
> > +	hlist_del_rcu(&notifier->link);
> > +	mutex_unlock(&tsk->preempt_notifiers_mutex);
> > +}
> > +EXPORT_SYMBOL_GPL(preempt_notifier_unregister_task);
> > +
> >   /**
> >    * preempt_notifier_register - tell me when current is being preempted&  rescheduled
> >    * @notifier: notifier struct to register
> >    */
> >   void preempt_notifier_register(struct preempt_notifier *notifier)
> >   {
> > -	hlist_add_head(&notifier->link,&current->preempt_notifiers);
> > +	preempt_notifier_register_task(notifier, current);
> >   }
> >   EXPORT_SYMBOL_GPL(preempt_notifier_register);
> 
> This is (and must be) called from a preempt disabled context, no mutexes 
> around here.

Bah, yes, that is essential if you're dealing with current. Maybe use a
spinlock instead?

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-21 15:46                                   ` Will Deacon
@ 2011-07-21 15:59                                     ` Avi Kivity
  2011-07-21 16:37                                       ` Will Deacon
  0 siblings, 1 reply; 39+ messages in thread
From: Avi Kivity @ 2011-07-21 15:59 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Frederic Weisbecker, linux-kernel, kvm,
	Ingo Molnar, acme, Jason Wessel

On 07/21/2011 06:46 PM, Will Deacon wrote:
> >
> >  This is (and must be) called from a preempt disabled context, no mutexes
> >  around here.
>
> Bah, yes, that is essential if you're dealing with current. Maybe use a
> spinlock instead?

Could work.  Not thrilled about adding it to the kvm hot path, but I 
can't say it will make a measurable impact.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] perf: add context field to perf_event
  2011-07-21 15:59                                     ` Avi Kivity
@ 2011-07-21 16:37                                       ` Will Deacon
  0 siblings, 0 replies; 39+ messages in thread
From: Will Deacon @ 2011-07-21 16:37 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Zijlstra, Frederic Weisbecker, linux-kernel, kvm,
	Ingo Molnar, acme, Jason Wessel

On Thu, Jul 21, 2011 at 04:59:00PM +0100, Avi Kivity wrote:
> On 07/21/2011 06:46 PM, Will Deacon wrote:
> > >
> > >  This is (and must be) called from a preempt disabled context, no mutexes
> > >  around here.
> >
> > Bah, yes, that is essential if you're dealing with current. Maybe use a
> > spinlock instead?
> 
> Could work.  Not thrilled about adding it to the kvm hot path, but I 
> can't say it will make a measurable impact.

Understood, but at least it doesn't contribute to finish_task_switch. I also
wouldn't expect to have multiple preempt registrations in parallel for the
same task so that lock should rarely (if ever) be contended.

Will

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2011-07-21 16:38 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-29 15:42 [PATCH 0/3] Preparatory perf patches for KVM PMU support Avi Kivity
2011-06-29 15:42 ` [PATCH 1/3] perf: add context field to perf_event Avi Kivity
2011-06-29 16:08   ` Frederic Weisbecker
2011-06-29 16:25     ` Avi Kivity
2011-06-29 16:27     ` Will Deacon
2011-07-04 13:58       ` Frederic Weisbecker
2011-07-04 14:10         ` Avi Kivity
2011-07-04 14:36           ` Frederic Weisbecker
2011-07-11 21:07             ` Will Deacon
2011-07-12  7:20               ` Avi Kivity
2011-07-12  8:38                 ` Peter Zijlstra
2011-07-12  9:08                   ` Avi Kivity
2011-07-12  9:14                     ` Peter Zijlstra
2011-07-12  9:16                       ` Avi Kivity
2011-07-12  9:18                         ` Peter Zijlstra
2011-07-12  9:27                           ` Avi Kivity
2011-07-12  9:31                             ` Peter Zijlstra
2011-07-12  9:36                               ` Avi Kivity
2011-07-12  9:42                                 ` Will Deacon
2011-07-12  9:41                               ` Joerg Roedel
2011-07-12  9:44                                 ` Avi Kivity
2011-07-12  9:48                                   ` Joerg Roedel
2011-07-12  9:55                                     ` Avi Kivity
2011-07-12 10:03                                       ` Joerg Roedel
2011-07-12 10:07                                         ` Avi Kivity
2011-07-12 10:24                                           ` Joerg Roedel
2011-07-12 10:36                                             ` Avi Kivity
2011-07-21 15:32                               ` Will Deacon
2011-07-21 15:36                                 ` Avi Kivity
2011-07-21 15:46                                   ` Will Deacon
2011-07-21 15:59                                     ` Avi Kivity
2011-07-21 16:37                                       ` Will Deacon
2011-07-05 14:30         ` Will Deacon
2011-07-05 14:34           ` Frederic Weisbecker
2011-07-01 15:24   ` [tip:perf/core] perf: Add " tip-bot for Avi Kivity
2011-06-29 15:42 ` [PATCH 2/3] x86, perf: add constraints for architectural PMU v1 Avi Kivity
2011-07-01 15:24   ` [tip:perf/core] x86, perf: Add constraints for architectural PMU tip-bot for Avi Kivity
2011-06-29 15:42 ` [PATCH 3/3] perf: export perf_event_refresh() to modules Avi Kivity
2011-07-01 15:25   ` [tip:perf/core] " tip-bot for Avi Kivity

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.