* [PATCH RFC v2 0/4] perf: IRQ-bound performance events
@ 2014-01-04 18:22 Alexander Gordeev
  2014-01-04 18:22 ` [PATCH RFC v2 1/4] perf/core: " Alexander Gordeev
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Alexander Gordeev @ 2014-01-04 18:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Gordeev, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Frederic Weisbecker, Peter Zijlstra, Andi Kleen

Hello,

This is version 2 of the RFC "perf: IRQ-bound performance events", which
introduces IRQ-bound performance events - ones that count only in the
context of a hardware interrupt handler. Ingo suggested extending this
functionality to softirq and threaded handlers as well:

[quote]

Looks useful.

I think the main challenges are:

 - Creating a proper ABI for all this:

   - IRQ numbers alone are probably not specific enough: we'd also want to 
     be more specific to match on handler names - or handler numbers if
     the handler name is not unique.

   - another useful variant would be where IRQ numbers are too specific:
     something like 'perf top irq' would be a natural thing to do, to see 
     only overhead in hardirq execution - without limiting it to a
     specific handler. An 'all irq contexts' wildcard concept?

 - Covering softirqs as well. If we handle both hardirqs and softirqs,
   then we are pretty much feature complete: all major context types that 
   the Linux kernel cares about are covered in instrumentation. For things
   like networking the softirq overhead is obviously very important, and 
   for example on routers it will do most of the execution.

 - Covering threaded IRQs as well, in a similar model. So if someone types
   'perf top irq', and some IRQ handlers are running threaded, those
   should probably be included as well.

 - Making the tooling friendlier: 'perf top irq' would be useful, and
   accepting handler names would be useful as well.

The runtime overhead of your patches seems to be pretty low: when no IRQ 
contexts are instrumented then it's a single 'is the list empty' check at 
context scheduling time. That looks acceptable.

Regarding the ABI and IRQ/softirq context enumeration you are breaking 
lots of new ground here, because unlike tasks, cgroups and CPUs the IRQ 
execution contexts do not have a good programmatically accessible 
namespace (yet). So it has to be thought out pretty well I think, but once 
we have it, it will be a lovely feature IMO.

Thanks,

	Ingo

[/quote]

This RFC version addresses only the "Creating a proper ABI for all this"
suggestion, and only on the kernel side. Each hardware interrupt context
performance event is assigned a bitmask in which each bit indicates whether
the action with that bit's number should be measured. The task of converting
handler name(s), wildcards etc. into bitmasks is off-loaded to user level
and is not yet supported.
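
For illustration, below is a minimal user-level sketch of the proposed ABI
(not part of the patchset; the IRQ numbers, action masks and the
open_hardirq_counter() helper are made up, and error handling is mostly
omitted). The event is opened per-CPU with the new attr.hardirq bit set,
and PERF_EVENT_IOC_SET_HARDIRQ then attaches a dispatch describing which
IRQs and actions to count on:

#include <linux/perf_event.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical user-level usage of the proposed ABI; the IRQ numbers and
 * action masks below are examples only. */
static int open_hardirq_counter(int cpu)
{
	struct perf_event_attr attr;
	struct perf_hardirq_event_disp *disp;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size    = sizeof(attr);
	attr.type    = PERF_TYPE_HARDWARE;
	attr.config  = PERF_COUNT_HW_CPU_CYCLES;
	attr.hardirq = 1;			/* new attribute bit */

	/* system-wide counter on one CPU: pid == -1, explicit cpu */
	fd = syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
	if (fd < 0)
		return -1;

	/* count on action #0 of IRQ 8 and on all actions of IRQ 19 */
	disp = calloc(1, sizeof(*disp) + 2 * sizeof(disp->disp[0]));
	disp->nr_disp         = 2;
	disp->disp[0].irq_nr  = 8;
	disp->disp[0].actions = 1ULL << 0;
	disp->disp[1].irq_nr  = 19;
	disp->disp[1].actions = -1ULL;		/* -1 means all actions */

	if (ioctl(fd, PERF_EVENT_IOC_SET_HARDIRQ, disp) ||
	    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0)) {
		close(fd);
		fd = -1;
	}

	free(disp);
	return fd;
}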

The kernel side implementation revolves around the need to make enabling
and disabling performance counters in hardware interrupt context as fast as
possible. For this reason a new ioctl command, PERF_EVENT_IOC_SET_HARDIRQ,
pre-allocates and initializes a per-CPU array of performance events destined
for the IRQ before the event is started. Once an action (aka ISR) is called,
another pre-allocated per-CPU array is initialized with the events for this
action and submitted to the relevant PMUs using a new PMU callback:

	void (*start_hardirq)(struct perf_event *events[], int count);
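
For reference, in the hardirq path this ends up bracketing every action
invocation - a condensed view of the kernel/irq/handle.c change in patch 1/4:

	do {
		trace_irq_handler_entry(irq, action);
		perf_start_hardirq_events(desc, action_nr);

		res = action->handler(irq, action->dev_id);

		perf_stop_hardirq_events(desc, action_nr);
		trace_irq_handler_exit(irq, action, res);

		/* ... existing result handling ... */

		action = action->next;
		action_nr++;
	} while (action);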

Since the performance events are expected to be known to their PMU, the PMU
should be able to enable the counters in a performance-aware manner. E.g. in
the sample patch for the Intel PMU this goal is achieved with a single pass
through the 'events' array and a single WRMSR instruction.
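
As a condensed sketch of what the Intel patch (3/4) does on the enable side:
the counter bits are collected in one pass over 'events' and written out
with a single WRMSR (plus one RDMSR to fetch the current control value):

static void intel_pmu_enable_hardirq(struct perf_event *events[], int count)
{
	u64 control, mask = 0;
	int i;

	rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, control);

	/* single pass: collect the counters backing the passed events */
	for (i = 0; i < count; i++)
		if (!events[i]->hw.state)
			mask |= 1ull << events[i]->hw.idx;

	/* single WRMSR enables them all at once */
	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, control | mask);
}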

In contrast with version 1 of this RFC, per-CPU lists are replaced with
per-CPU arrays whenever possible. Under the assumption that normally no more
than a dozen events are counted at a time, this is expected to improve the
cache hit rate when events are enabled or disabled from hardware interrupt
context.

Besides the original purpose, the design accommodates the ability to run
the same performance counter for any combination of actions and IRQs, which
allows an unlimited level of flexibility. This feature is not yet supported
by the perf tool, though.

Although the whole idea seems simple, I am not sure whether it fits into
the current perf design without breaking some fundamental assumptions. The
very purpose of this RFC is to ensure the approach taken is correct.

This RFC overlaps with the toggling events work introduced some time ago.
While it addresses a similar problem, it does not appear that toggling
events can count on a per-action basis, nor provide the flexibility this
RFC assumes. Performance is also a major concern. Perhaps the two designs
could be merged, but at the moment I do not see how. Suggestions are very
welcome.


The perf tool update is for now just a hack to make kernel-side testing
possible. Here is a sample session against IRQ #8, the 'rtc0' device:

# ./tools/perf/perf stat -a -e L1-dcache-load-misses:k --hardirq=8 sleep 1

 Performance counter stats for 'system wide':

                 0      L1-dcache-load-misses                                       

       1.001190052 seconds time elapsed

# ./tools/perf/perf stat -a -e L1-dcache-load-misses:k --hardirq=8 hwclock --test
Sat 04 Jan 2014 12:16:36 EST  -0.484913 seconds

 Performance counter stats for 'system wide':

               374      L1-dcache-load-misses                                       

       0.485939068 seconds time elapsed


The patchset is against the "perf/core" branch of Arnaldo's repo.

The tree can be found in the "pci-next-msi-v5" branch of the repo:
https://github.com/a-gordeev/linux.git

Thanks!

Alexander Gordeev (4):
  perf/core: IRQ-bound performance events
  perf/x86: IRQ-bound performance events
  perf/x86/Intel: IRQ-bound performance events
  perf/tool: IRQ-bound performance events

 arch/x86/kernel/cpu/perf_event.c       |   55 +++++-
 arch/x86/kernel/cpu/perf_event.h       |   15 ++
 arch/x86/kernel/cpu/perf_event_amd.c   |    2 +
 arch/x86/kernel/cpu/perf_event_intel.c |   72 ++++++-
 arch/x86/kernel/cpu/perf_event_knc.c   |    2 +
 arch/x86/kernel/cpu/perf_event_p4.c    |    2 +
 arch/x86/kernel/cpu/perf_event_p6.c    |    2 +
 include/linux/irq.h                    |   10 +
 include/linux/irqdesc.h                |    4 +
 include/linux/perf_event.h             |   24 ++
 include/uapi/linux/perf_event.h        |   14 ++-
 kernel/events/Makefile                 |    2 +-
 kernel/events/core.c                   |  142 ++++++++++++-
 kernel/events/hardirq.c                |  370 ++++++++++++++++++++++++++++++++
 kernel/irq/handle.c                    |    7 +-
 kernel/irq/irqdesc.c                   |   15 ++
 tools/perf/builtin-stat.c              |    9 +
 tools/perf/util/evlist.c               |   38 ++++
 tools/perf/util/evlist.h               |    3 +
 tools/perf/util/evsel.c                |    8 +
 tools/perf/util/evsel.h                |    3 +
 tools/perf/util/parse-events.c         |   24 ++
 tools/perf/util/parse-events.h         |    1 +
 23 files changed, 811 insertions(+), 13 deletions(-)
 create mode 100644 kernel/events/hardirq.c

-- 
1.7.7.6



* [PATCH RFC v2 1/4] perf/core: IRQ-bound performance events
  2014-01-04 18:22 [PATCH RFC v2 0/4] perf: IRQ-bound performance events Alexander Gordeev
@ 2014-01-04 18:22 ` Alexander Gordeev
  2014-01-04 18:22 ` [PATCH RFC v2 2/4] perf/x86: " Alexander Gordeev
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Alexander Gordeev @ 2014-01-04 18:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Gordeev, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Frederic Weisbecker, Peter Zijlstra, Andi Kleen

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 include/linux/irq.h             |   10 +
 include/linux/irqdesc.h         |    4 +
 include/linux/perf_event.h      |   24 +++
 include/uapi/linux/perf_event.h |   15 ++-
 kernel/events/Makefile          |    2 +-
 kernel/events/core.c            |  176 +++++++++++++++++--
 kernel/events/hardirq.c         |  370 +++++++++++++++++++++++++++++++++++++++
 kernel/irq/handle.c             |    7 +-
 kernel/irq/irqdesc.c            |   15 ++
 9 files changed, 609 insertions(+), 14 deletions(-)
 create mode 100644 kernel/events/hardirq.c

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 7dc1003..c79bbbd 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -632,6 +632,16 @@ static inline int irq_reserve_irq(unsigned int irq)
 # define irq_reg_readl(addr)		readl(addr)
 #endif
 
+#ifdef CONFIG_PERF_EVENTS
+extern void perf_start_hardirq_events(struct irq_desc *desc, int action_nr);
+extern void perf_stop_hardirq_events(struct irq_desc *desc, int action_nr);
+#else
+static inline void
+perf_start_hardirq_events(struct irq_desc *desc, int action_nr)	{ }
+static inline void
+perf_stop_hardirq_events(struct irq_desc *desc, int action_nr)	{ }
+#endif
+
 /**
  * struct irq_chip_regs - register offsets for struct irq_gci
  * @enable:	Enable register offset to reg_base
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index 56fb646..00a2759 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -12,6 +12,7 @@ struct irq_affinity_notify;
 struct proc_dir_entry;
 struct module;
 struct irq_desc;
+struct hardirq_events;
 
 /**
  * struct irq_desc - interrupt descriptor
@@ -68,6 +69,9 @@ struct irq_desc {
 	struct proc_dir_entry	*dir;
 #endif
 	int			parent_irq;
+#ifdef CONFIG_PERF_EVENTS
+	struct hardirq_events __percpu **events;
+#endif
 	struct module		*owner;
 	const char		*name;
 } ____cacheline_internodealigned_in_smp;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8f4a70f..8bd7860 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -215,6 +215,12 @@ struct pmu {
 	void (*stop)			(struct perf_event *event, int flags);
 
 	/*
+	 * Start/Stop hardware interrupt context counters present on the PMU.
+	 */
+	void (*start_hardirq)		(struct perf_event *events[], int count); /* optional */
+	void (*stop_hardirq)		(struct perf_event *events[], int count); /* optional */
+
+	/*
 	 * Updates the counter value of the event.
 	 */
 	void (*read)			(struct perf_event *event);
@@ -313,6 +319,11 @@ struct perf_event {
 	struct list_head		sibling_list;
 
 	/*
+	 * List of hardware interrupt context numbers and actions
+	 */
+	struct list_head		hardirq_list;
+
+	/*
 	 * We need storage to track the entries in perf_pmu_migrate_context; we
 	 * cannot use the event_entry because of RCU and we want to keep the
 	 * group in tact which avoids us using the other two entries.
@@ -528,6 +539,12 @@ struct perf_output_handle {
 	int				page;
 };
 
+struct perf_hardirq_param {
+	struct list_head	list;
+	int			irq;
+	unsigned long		mask;
+};
+
 #ifdef CONFIG_PERF_EVENTS
 
 extern int perf_pmu_register(struct pmu *pmu, const char *name, int type);
@@ -635,6 +652,11 @@ static inline int is_software_event(struct perf_event *event)
 	return event->pmu->task_ctx_nr == perf_sw_context;
 }
 
+static inline bool is_hardirq_event(struct perf_event *event)
+{
+	return event->attr.hardirq != 0;
+}
+
 extern struct static_key perf_swevent_enabled[PERF_COUNT_SW_MAX];
 
 extern void __perf_sw_event(u32, u64, struct pt_regs *, u64);
@@ -772,6 +794,8 @@ extern void perf_event_enable(struct perf_event *event);
 extern void perf_event_disable(struct perf_event *event);
 extern int __perf_event_disable(void *info);
 extern void perf_event_task_tick(void);
+extern int perf_event_init_hardirq(void *info);
+extern int perf_event_term_hardirq(void *info);
 #else
 static inline void
 perf_event_task_sched_in(struct task_struct *prev,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index e1802d6..a033014 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -301,8 +301,9 @@ struct perf_event_attr {
 				exclude_callchain_kernel : 1, /* exclude kernel callchains */
 				exclude_callchain_user   : 1, /* exclude user callchains */
 				mmap2          :  1, /* include mmap with inode data     */
+				hardirq        :  1,
 
-				__reserved_1   : 40;
+				__reserved_1   : 39;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -348,6 +349,7 @@ struct perf_event_attr {
 #define PERF_EVENT_IOC_SET_OUTPUT	_IO ('$', 5)
 #define PERF_EVENT_IOC_SET_FILTER	_IOW('$', 6, char *)
 #define PERF_EVENT_IOC_ID		_IOR('$', 7, __u64 *)
+#define PERF_EVENT_IOC_SET_HARDIRQ	_IOR('$', 8, __u64 *)
 
 enum perf_event_ioc_flags {
 	PERF_IOC_FLAG_GROUP		= 1U << 0,
@@ -724,6 +726,7 @@ enum perf_callchain_context {
 #define PERF_FLAG_FD_NO_GROUP		(1U << 0)
 #define PERF_FLAG_FD_OUTPUT		(1U << 1)
 #define PERF_FLAG_PID_CGROUP		(1U << 2) /* pid=cgroup id, per-cpu mode only */
+#define PERF_FLAG_PID_HARDIRQ		(1U << 3) /* pid=irq number */
 
 union perf_mem_data_src {
 	__u64 val;
@@ -812,4 +815,14 @@ struct perf_branch_entry {
 		reserved:60;
 };
 
+struct perf_hardirq_disp {
+	__s32				irq_nr;
+	__u64				actions;
+};
+
+struct perf_hardirq_event_disp {
+	__s32				nr_disp;	/* everything if <0 */
+	struct perf_hardirq_disp	disp[0];
+};
+
 #endif /* _UAPI_LINUX_PERF_EVENT_H */
diff --git a/kernel/events/Makefile b/kernel/events/Makefile
index 103f5d1..8b94980 100644
--- a/kernel/events/Makefile
+++ b/kernel/events/Makefile
@@ -2,7 +2,7 @@ ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_core.o = -pg
 endif
 
-obj-y := core.o ring_buffer.o callchain.o
+obj-y := core.o ring_buffer.o callchain.o hardirq.o
 
 obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
 obj-$(CONFIG_UPROBES) += uprobes.o
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 89d34f9..465ce681 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -118,8 +118,9 @@ static int cpu_function_call(int cpu, int (*func) (void *info), void *info)
 }
 
 #define PERF_FLAG_ALL (PERF_FLAG_FD_NO_GROUP |\
-		       PERF_FLAG_FD_OUTPUT  |\
-		       PERF_FLAG_PID_CGROUP)
+		       PERF_FLAG_FD_OUTPUT |\
+		       PERF_FLAG_PID_CGROUP |\
+		       PERF_FLAG_PID_HARDIRQ)
 
 /*
  * branch priv levels that need permission checks
@@ -3213,10 +3214,46 @@ static void __free_event(struct perf_event *event)
 
 	call_rcu(&event->rcu_head, free_event_rcu);
 }
+
+static int __perf_hardirq_add_disp(struct perf_event *event,
+				   struct perf_hardirq_disp *disp)
+{
+	struct perf_hardirq_param *param = kmalloc_node(sizeof(*param),
+		GFP_KERNEL, cpu_to_node(event->cpu));
+	if (!param)
+		return -ENOMEM;
+
+	param->irq = disp->irq_nr;
+
+	if (disp->actions == (typeof(disp->actions))-1)
+		param->mask = -1;
+	else
+		param->mask = disp->actions;
+
+	list_add(&param->list, &event->hardirq_list);
+
+	return 0;
+}
+
+static void __perf_hardirq_del_disps(struct perf_event *event)
+{
+	struct perf_hardirq_param *param;
+	struct list_head *pos, *next;
+
+	list_for_each_safe(pos, next, &event->hardirq_list) {
+		param = list_entry(pos, typeof(*param), list);
+		list_del(pos);
+		kfree(param);
+	}
+}
+
 static void free_event(struct perf_event *event)
 {
 	irq_work_sync(&event->pending);
 
+	cpu_function_call(event->cpu, perf_event_term_hardirq, event);
+	__perf_hardirq_del_disps(event);
+
 	unaccount_event(event);
 
 	if (event->rb) {
@@ -3590,6 +3627,7 @@ static inline int perf_fget_light(int fd, struct fd *p)
 static int perf_event_set_output(struct perf_event *event,
 				 struct perf_event *output_event);
 static int perf_event_set_filter(struct perf_event *event, void __user *arg);
+static int perf_event_set_hardirq(struct perf_event *event, void __user *arg);
 
 static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
@@ -3644,6 +3682,9 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	case PERF_EVENT_IOC_SET_FILTER:
 		return perf_event_set_filter(event, (void __user *)arg);
 
+	case PERF_EVENT_IOC_SET_HARDIRQ:
+		return perf_event_set_hardirq(event, (void __user *)arg);
+
 	default:
 		return -ENOTTY;
 	}
@@ -6248,6 +6289,10 @@ static void perf_pmu_nop_void(struct pmu *pmu)
 {
 }
 
+static void perf_pmu_nop_void_arg1_arg2(struct perf_event *events[], int count)
+{
+}
+
 static int perf_pmu_nop_int(struct pmu *pmu)
 {
 	return 0;
@@ -6511,6 +6556,11 @@ got_cpu_context:
 		pmu->pmu_disable = perf_pmu_nop_void;
 	}
 
+	if (!pmu->start_hardirq) {
+		pmu->start_hardirq = perf_pmu_nop_void_arg1_arg2;
+		pmu->stop_hardirq = perf_pmu_nop_void_arg1_arg2;
+	}
+
 	if (!pmu->event_idx)
 		pmu->event_idx = perf_event_idx_default;
 
@@ -6668,6 +6718,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	INIT_LIST_HEAD(&event->group_entry);
 	INIT_LIST_HEAD(&event->event_entry);
 	INIT_LIST_HEAD(&event->sibling_list);
+	INIT_LIST_HEAD(&event->hardirq_list);
 	INIT_LIST_HEAD(&event->rb_entry);
 	INIT_LIST_HEAD(&event->active_entry);
 
@@ -6977,6 +7028,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	struct fd group = {NULL, 0};
 	struct task_struct *task = NULL;
 	struct pmu *pmu;
+	int hardirq = -1;
 	int event_fd;
 	int move_group = 0;
 	int err;
@@ -6985,6 +7037,27 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (flags & ~PERF_FLAG_ALL)
 		return -EINVAL;
 
+	if ((flags & (PERF_FLAG_PID_CGROUP | PERF_FLAG_PID_HARDIRQ)) ==
+	    (PERF_FLAG_PID_CGROUP | PERF_FLAG_PID_HARDIRQ))
+		return -EINVAL;
+
+	/*
+	 * In irq mode, the pid argument is used to pass irq number.
+	 */
+	if (flags & PERF_FLAG_PID_HARDIRQ) {
+		hardirq = pid;
+		pid = -1;
+	}
+
+	/*
+	 * In cgroup mode, the pid argument is used to pass the fd
+	 * opened to the cgroup directory in cgroupfs. The cpu argument
+	 * designates the cpu on which to monitor threads from that
+	 * cgroup.
+	 */
+	if ((flags & PERF_FLAG_PID_CGROUP) && (pid == -1 || cpu == -1))
+		return -EINVAL;
+
 	err = perf_copy_attr(attr_uptr, &attr);
 	if (err)
 		return err;
@@ -6999,15 +7072,6 @@ SYSCALL_DEFINE5(perf_event_open,
 			return -EINVAL;
 	}
 
-	/*
-	 * In cgroup mode, the pid argument is used to pass the fd
-	 * opened to the cgroup directory in cgroupfs. The cpu argument
-	 * designates the cpu on which to monitor threads from that
-	 * cgroup.
-	 */
-	if ((flags & PERF_FLAG_PID_CGROUP) && (pid == -1 || cpu == -1))
-		return -EINVAL;
-
 	event_fd = get_unused_fd();
 	if (event_fd < 0)
 		return event_fd;
@@ -7874,6 +7938,96 @@ static void perf_event_exit_cpu(int cpu)
 static inline void perf_event_exit_cpu(int cpu) { }
 #endif
 
+static int __perf_hardirq_check_disp(struct perf_hardirq_disp *disp)
+{
+	struct irq_desc *desc = irq_to_desc(disp->irq_nr);
+	struct irqaction *action;
+	int nr_actions = 0;
+	unsigned long flags;
+
+	if (!desc)
+		return -ENOENT;
+
+	if (!disp->actions)
+		return -EINVAL;
+
+	/*
+	 * -1 means all actions
+	 */
+	if (disp->actions == (typeof(disp->actions))-1)
+		return 0;
+
+	/*
+	 * Check actions existence
+	 */
+	raw_spin_lock_irqsave(&desc->lock, flags);
+	for (action = desc->action; action; action = action->next)
+		nr_actions++;
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+	if (!nr_actions)
+		return -ENOENT;
+
+	if (__fls(disp->actions) + 1 > nr_actions)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int perf_event_set_hardirq(struct perf_event *event, void __user *arg)
+{
+	struct perf_hardirq_event_disp edisp;
+	struct perf_hardirq_disp idisp;
+	struct perf_hardirq_disp __user *user;
+	struct perf_hardirq_param *param;
+	int ret = 0;
+	int i;
+
+	if (copy_from_user(&edisp, arg, sizeof(edisp.nr_disp)))
+		return -EFAULT;
+
+	/*
+	 * TODO Run counters for all actions on all IRQs
+	 */
+	if (edisp.nr_disp == (typeof(edisp.nr_disp))-1)
+		return -EINVAL;
+
+	user = arg + offsetof(typeof(edisp), disp);
+	for (i = 0; i < edisp.nr_disp; i++) {
+		if (copy_from_user(&idisp, &user[i], sizeof(idisp))) {
+			ret = -EFAULT;
+			goto err;
+		}
+
+		/*
+		 * Multiple entries against one IRQ are not allowed
+		 */
+		list_for_each_entry(param, &event->hardirq_list, list) {
+			if (param->irq == idisp.irq_nr)
+				return -EINVAL;
+		}
+
+		ret = __perf_hardirq_check_disp(&idisp);
+		if (ret)
+			goto err;
+
+		ret = __perf_hardirq_add_disp(event, &idisp);
+		if (ret)
+			goto err;
+	}
+
+	ret = cpu_function_call(event->cpu, perf_event_init_hardirq, event);
+	if (ret)
+		goto err;
+
+	return 0;
+
+err:
+	__perf_hardirq_del_disps(event);
+
+	return ret;
+}
+
 static int
 perf_reboot(struct notifier_block *notifier, unsigned long val, void *v)
 {
diff --git a/kernel/events/hardirq.c b/kernel/events/hardirq.c
new file mode 100644
index 0000000..f857be3
--- /dev/null
+++ b/kernel/events/hardirq.c
@@ -0,0 +1,370 @@
+/*
+ * linux/kernel/events/hardirq.c
+ *
+ * Copyright (C) 2012-2014 Red Hat, Inc., Alexander Gordeev
+ *
+ * This file contains the code for h/w interrupt context performance counters
+ */
+
+#include <linux/perf_event.h>
+#include <linux/irq.h>
+#include <linux/percpu.h>
+#include <linux/bitops.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+
+struct hardirq_event {
+	unsigned long		mask;		/* action numbers to count on */
+	struct perf_event	*event;		/* event to count */
+};
+
+struct hardirq_events {
+	int			nr_events;	/* number of events in array */
+	struct hardirq_event	events[0];	/* array of events to count */
+};
+
+struct active_events {
+	int			nr_events;	/* number of allocated events */
+	int			nr_active;	/* number of events to count */
+	struct perf_event	*events[0];	/* array of events to count */
+};
+
+DEFINE_PER_CPU(struct active_events *, active_events);
+DEFINE_PER_CPU(int, total_events);
+
+static struct hardirq_events *alloc_desc_events(int cpu, int count)
+{
+	struct hardirq_events *events;
+	size_t size;
+
+	size = offsetof(typeof(*events), events) +
+	       count * sizeof(events->events[0]);
+	events = kmalloc_node(size, GFP_KERNEL, cpu_to_node(cpu));
+	if (unlikely(!events))
+		return NULL;
+
+	events->nr_events = count;
+
+	return events;
+}
+
+static void free_desc_events(struct hardirq_events *events)
+{
+	kfree(events);
+}
+
+static struct active_events *alloc_active_events(int cpu, int count)
+{
+	struct active_events *active;
+	size_t size;
+
+	size = offsetof(typeof(*active), events) +
+	       count * sizeof(active->events[0]);
+	active = kmalloc_node(size, GFP_KERNEL, cpu_to_node(cpu));
+	if (unlikely(!active))
+		return NULL;
+
+	active->nr_events = count;
+
+	return active;
+}
+
+static void free_active_events(struct active_events *active)
+{
+	kfree(active);
+}
+
+static int compare_pmus(const void *event1, const void *event2)
+{
+	return strcmp(((const struct hardirq_event *)event1)->event->pmu->name,
+		      ((const struct hardirq_event *)event2)->event->pmu->name);
+}
+
+static int max_active_events(struct hardirq_events *events)
+{
+	/*
+	 * TODO Count number of events per action and return the maximum
+	 */
+	return events->nr_events;
+}
+
+static int add_event(struct perf_event *event, int irq, unsigned long mask)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+	struct hardirq_events __percpu **events_ptr;
+	struct hardirq_events *events, *events_tmp = NULL;
+	struct active_events __percpu *active;
+	struct active_events *active_tmp = NULL;
+	int cpu, max_active, nr_events;
+	unsigned long flags;
+	int ret = 0;
+
+	if (!desc)
+		return -ENOENT;
+
+	cpu = get_cpu();
+	BUG_ON(cpu != event->cpu);
+
+	events_ptr = this_cpu_ptr(desc->events);
+	events = *events_ptr;
+
+	nr_events = events ? events->nr_events : 0;
+	events_tmp = alloc_desc_events(cpu, nr_events + 1);
+	if (!events_tmp) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	memmove(events_tmp->events, events->events,
+		nr_events * sizeof(events_tmp->events[0]));
+
+	events_tmp->events[nr_events].event = event;
+	events_tmp->events[nr_events].mask = mask;
+
+	/*
+	 * Group events that belong to same PMUs in contiguous sub-arrays
+	 */
+	sort(events_tmp->events, events_tmp->nr_events,
+	     sizeof(events_tmp->events[0]), compare_pmus, NULL);
+
+	max_active = max_active_events(events_tmp);
+	active = this_cpu_read(active_events);
+
+	if (!active || max_active > active->nr_active) {
+		active_tmp = alloc_active_events(cpu, max_active);
+		if (!active_tmp) {
+			ret = -ENOMEM;
+			goto err;
+		}
+	}
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+
+	swap(events, events_tmp);
+	*events_ptr = events;
+
+	if (active_tmp) {
+		swap(active, active_tmp);
+		this_cpu_write(active_events, active);
+	}
+
+	__this_cpu_inc(total_events);
+
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+err:
+	put_cpu();
+
+	free_active_events(active_tmp);
+	free_desc_events(events_tmp);
+
+	return ret;
+}
+
+static int del_event(struct perf_event *event, int irq)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+	struct hardirq_events __percpu **events_ptr;
+	struct hardirq_events *events, *events_tmp = NULL;
+	struct active_events __percpu *active;
+	struct active_events *active_tmp = NULL;
+	int cpu, i, nr_events;
+	unsigned long flags;
+	int ret = 0;
+
+	if (!desc)
+		return -ENOENT;
+
+	cpu = get_cpu();
+	BUG_ON(cpu != event->cpu);
+
+	events_ptr = this_cpu_ptr(desc->events);
+	events = *events_ptr;
+
+	nr_events = events->nr_events;
+	for (i = 0; i < nr_events; i++) {
+		if (events->events[i].event == event)
+			break;
+	}
+
+	if (i >= nr_events) {
+		ret = -ENOENT;
+		goto err;
+	}
+
+	if (nr_events > 1) {
+		events_tmp = alloc_desc_events(cpu, nr_events - 1);
+		if (!events_tmp) {
+			ret = -ENOMEM;
+			goto err;
+		}
+
+		memmove(&events_tmp->events[0], &events->events[0],
+			i * sizeof(events->events[0]));
+		memmove(&events_tmp->events[i], &events->events[i + 1],
+			(nr_events - i - 1) * sizeof(events->events[0]));
+	}
+
+	active = this_cpu_read(active_events);
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+
+	if (!__this_cpu_dec_return(total_events)) {
+		swap(active, active_tmp);
+		this_cpu_write(active_events, active);
+	}
+
+	swap(events, events_tmp);
+	*events_ptr = events;
+
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+err:
+	put_cpu();
+
+	free_active_events(active_tmp);
+	free_desc_events(events_tmp);
+
+	return ret;
+}
+
+int perf_event_init_hardirq(void *info)
+{
+	struct perf_event *event = info;
+	struct perf_hardirq_param *param, *param_tmp;
+	int ret = 0;
+
+	list_for_each_entry(param, &event->hardirq_list, list) {
+		ret = add_event(event, param->irq, param->mask);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		list_for_each_entry(param_tmp, &event->hardirq_list, list) {
+			if (param == param_tmp)
+				break;
+			del_event(event, param_tmp->irq);
+		}
+	}
+
+	WARN_ON(ret);
+	return ret;
+}
+
+int perf_event_term_hardirq(void *info)
+{
+	struct perf_event *event = info;
+	struct perf_hardirq_param *param;
+	int ret_tmp, ret = 0;
+
+	list_for_each_entry(param, &event->hardirq_list, list) {
+		ret_tmp = del_event(event, param->irq);
+		if (!ret)
+			ret = ret_tmp;
+	}
+
+	WARN_ON(ret);
+	return ret;
+}
+
+static void update_active_events(struct active_events *active,
+				 struct hardirq_events *events,
+				 int action_nr)
+{
+	int i, nr_active = 0;
+
+	for (i = 0; i < events->nr_events; i++) {
+		struct hardirq_event *event = &events->events[i];
+
+		if (test_bit(action_nr, &event->mask)) {
+			active->events[nr_active] = event->event;
+			nr_active++;
+		}
+	}
+
+	active->nr_active = nr_active;
+}
+
+int perf_alloc_hardirq_events(struct irq_desc *desc)
+{
+	desc->events = alloc_percpu(struct hardirq_events*);
+	if (!desc->events)
+		return -ENOMEM;
+	return 0;
+}
+
+void perf_free_hardirq_events(struct irq_desc *desc)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		BUG_ON(*per_cpu_ptr(desc->events, cpu));
+
+	free_percpu(desc->events);
+}
+
+static void start_stop_events(struct perf_event *events[], int count, bool start)
+{
+	/*
+	 * All events in the list must belong to the same PMU
+	 */
+	struct pmu *pmu = events[0]->pmu;
+
+	if (start)
+		pmu->start_hardirq(events, count);
+	else
+		pmu->stop_hardirq(events, count);
+}
+
+static void start_stop_active(struct active_events *active, bool start)
+{
+	struct perf_event **first, **last;
+	int i;
+
+	first = last = active->events;
+
+	for (i = 0; i < active->nr_active; i++) {
+		if ((*last)->pmu != (*first)->pmu) {
+			start_stop_events(first, last - first, start);
+			first = last;
+		}
+		last++;
+	}
+
+	start_stop_events(first, last - first, start);
+}
+
+static void start_stop_desc(struct irq_desc *desc, int action_nr, bool start)
+{
+	struct hardirq_events __percpu *events;
+	struct active_events __percpu *active;
+
+	events = *__this_cpu_ptr(desc->events);
+	if (likely(!events))
+		return;
+
+	active = __this_cpu_read(active_events);
+
+	/*
+	 * Assume events to run do not change between start and stop,
+	 * thus no reason to update active events when stopping.
+	 */
+	if (start)
+		update_active_events(active, events, action_nr);
+
+	if (!active->nr_active)
+		return;
+
+	start_stop_active(active, start);
+}
+
+void perf_start_hardirq_events(struct irq_desc *desc, int action_nr)
+{
+	start_stop_desc(desc, action_nr, true);
+}
+
+void perf_stop_hardirq_events(struct irq_desc *desc, int action_nr)
+{
+	start_stop_desc(desc, action_nr, false);
+}
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 131ca17..7feab55 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -133,13 +133,17 @@ irqreturn_t
 handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
 {
 	irqreturn_t retval = IRQ_NONE;
-	unsigned int flags = 0, irq = desc->irq_data.irq;
+	unsigned int flags = 0, irq = desc->irq_data.irq, action_nr = 0;
 
 	do {
 		irqreturn_t res;
 
 		trace_irq_handler_entry(irq, action);
+		perf_start_hardirq_events(desc, action_nr);
+
 		res = action->handler(irq, action->dev_id);
+
+		perf_stop_hardirq_events(desc, action_nr);
 		trace_irq_handler_exit(irq, action, res);
 
 		if (WARN_ONCE(!irqs_disabled(),"irq %u handler %pF enabled interrupts\n",
@@ -170,6 +174,7 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
 
 		retval |= res;
 		action = action->next;
+		action_nr++;
 	} while (action);
 
 	add_interrupt_randomness(irq, flags);
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 192a302..cd02b29 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -131,6 +131,14 @@ static void free_masks(struct irq_desc *desc)
 static inline void free_masks(struct irq_desc *desc) { }
 #endif
 
+#ifdef CONFIG_PERF_EVENTS
+extern int perf_alloc_hardirq_events(struct irq_desc *desc);
+extern void perf_free_hardirq_events(struct irq_desc *desc);
+#else
+static inline int perf_alloc_hardirq_events(struct irq_desc *desc) { return 0; }
+static inline void perf_free_hardirq_events(struct irq_desc *desc) { }
+#endif
+
 static struct irq_desc *alloc_desc(int irq, int node, struct module *owner)
 {
 	struct irq_desc *desc;
@@ -147,6 +155,10 @@ static struct irq_desc *alloc_desc(int irq, int node, struct module *owner)
 	if (alloc_masks(desc, gfp, node))
 		goto err_kstat;
 
+	if (perf_alloc_hardirq_events(desc))
+		goto err_masks;
+
+
 	raw_spin_lock_init(&desc->lock);
 	lockdep_set_class(&desc->lock, &irq_desc_lock_class);
 
@@ -154,6 +166,8 @@ static struct irq_desc *alloc_desc(int irq, int node, struct module *owner)
 
 	return desc;
 
+err_masks:
+	free_masks(desc);
 err_kstat:
 	free_percpu(desc->kstat_irqs);
 err_desc:
@@ -171,6 +185,7 @@ static void free_desc(unsigned int irq)
 	delete_irq_desc(irq);
 	mutex_unlock(&sparse_irq_lock);
 
+	perf_free_hardirq_events(desc);
 	free_masks(desc);
 	free_percpu(desc->kstat_irqs);
 	kfree(desc);
-- 
1.7.7.6



* [PATCH RFC v2 2/4] perf/x86: IRQ-bound performance events
  2014-01-04 18:22 [PATCH RFC v2 0/4] perf: IRQ-bound performance events Alexander Gordeev
  2014-01-04 18:22 ` [PATCH RFC v2 1/4] perf/core: " Alexander Gordeev
@ 2014-01-04 18:22 ` Alexander Gordeev
  2014-01-04 18:22 ` [PATCH RFC v2 3/4] perf/x86/Intel: " Alexander Gordeev
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Alexander Gordeev @ 2014-01-04 18:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Gordeev, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Frederic Weisbecker, Peter Zijlstra, Andi Kleen

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 arch/x86/kernel/cpu/perf_event.c       |   55 +++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/perf_event.h       |   10 ++++++
 arch/x86/kernel/cpu/perf_event_amd.c   |    2 +
 arch/x86/kernel/cpu/perf_event_intel.c |    4 ++
 arch/x86/kernel/cpu/perf_event_knc.c   |    2 +
 arch/x86/kernel/cpu/perf_event_p4.c    |    2 +
 arch/x86/kernel/cpu/perf_event_p6.c    |    2 +
 include/uapi/linux/perf_event.h        |    1 -
 kernel/events/core.c                   |   34 ++++++--------------
 9 files changed, 86 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 8e13293..3a925e2 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -532,15 +532,66 @@ void x86_pmu_enable_all(int added)
 	int idx;
 
 	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
-		struct hw_perf_event *hwc = &cpuc->events[idx]->hw;
+		struct perf_event *event = cpuc->events[idx];
+		struct hw_perf_event *hwc = &event->hw;
 
 		if (!test_bit(idx, cpuc->active_mask))
 			continue;
+		if (is_hardirq_event(event))
+			continue;
+
+		__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
+	}
+}
+
+void x86_pmu_enable_hardirq(struct perf_event *events[], int count)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	int idx;
+
+	for (idx = 0; idx < count; idx++) {
+		struct perf_event *event = cpuc->events[idx];
+		struct hw_perf_event *hwc = &event->hw;
+
+		BUG_ON(!test_bit(idx, cpuc->active_mask));
+		BUG_ON(!is_hardirq_event(event));
+
+		if (event->hw.state)
+			continue;
 
 		__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
 	}
 }
 
+void x86_pmu_disable_hardirq(struct perf_event *events[], int count)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	int idx;
+
+	for (idx = 0; idx < count; idx++) {
+		struct perf_event *event = events[idx];
+
+		BUG_ON(!test_bit(idx, cpuc->active_mask));
+		BUG_ON(!is_hardirq_event(event));
+
+		x86_pmu_disable_event(event);
+	}
+}
+
+void x86_pmu_nop_hardirq(struct perf_event *events[], int count)
+{
+}
+
+static void x86_pmu_start_hardirq(struct perf_event *events[], int count)
+{
+	x86_pmu.enable_hardirq(events, count);
+}
+
+static void x86_pmu_stop_hardirq(struct perf_event *events[], int count)
+{
+	x86_pmu.disable_hardirq(events, count);
+}
+
 static struct pmu pmu;
 
 static inline int is_x86_event(struct perf_event *event)
@@ -1871,6 +1922,8 @@ static struct pmu pmu = {
 	.del			= x86_pmu_del,
 	.start			= x86_pmu_start,
 	.stop			= x86_pmu_stop,
+	.start_hardirq		= x86_pmu_start_hardirq,
+	.stop_hardirq		= x86_pmu_stop_hardirq,
 	.read			= x86_pmu_read,
 
 	.start_txn		= x86_pmu_start_txn,
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index fd00bb2..03c9595 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -367,6 +367,8 @@ struct x86_pmu {
 	void		(*enable_all)(int added);
 	void		(*enable)(struct perf_event *);
 	void		(*disable)(struct perf_event *);
+	void		(*enable_hardirq)(struct perf_event *[], int);
+	void		(*disable_hardirq)(struct perf_event *[], int);
 	int		(*hw_config)(struct perf_event *event);
 	int		(*schedule_events)(struct cpu_hw_events *cpuc, int n, int *assign);
 	unsigned	eventsel;
@@ -538,6 +540,8 @@ int x86_pmu_hw_config(struct perf_event *event);
 
 void x86_pmu_disable_all(void);
 
+void x86_pmu_disable_hardirq(struct perf_event *events[], int count);
+
 static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
 					  u64 enable_mask)
 {
@@ -550,6 +554,12 @@ static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
 
 void x86_pmu_enable_all(int added);
 
+void x86_pmu_enable_hardirq(struct perf_event *events[], int count);
+
+void x86_pmu_nop_hardirq(struct perf_event *events[], int count);
+
+void x86_pmu_nop_hardirq_void_int(int irq);
+
 int perf_assign_events(struct perf_event **events, int n,
 			int wmin, int wmax, int *assign);
 int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign);
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index beeb7cc..fa51cae 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -621,6 +621,8 @@ static __initconst const struct x86_pmu amd_pmu = {
 	.handle_irq		= x86_pmu_handle_irq,
 	.disable_all		= x86_pmu_disable_all,
 	.enable_all		= x86_pmu_enable_all,
+	.disable_hardirq	= x86_pmu_nop_hardirq,
+	.enable_hardirq		= x86_pmu_nop_hardirq,
 	.enable			= x86_pmu_enable_event,
 	.disable		= x86_pmu_disable_event,
 	.hw_config		= amd_pmu_hw_config,
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 0fa4f24..c656997 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1931,6 +1931,8 @@ static __initconst const struct x86_pmu core_pmu = {
 	.handle_irq		= x86_pmu_handle_irq,
 	.disable_all		= x86_pmu_disable_all,
 	.enable_all		= core_pmu_enable_all,
+	.disable_hardirq	= x86_pmu_nop_hardirq,
+	.enable_hardirq		= x86_pmu_nop_hardirq,
 	.enable			= core_pmu_enable_event,
 	.disable		= x86_pmu_disable_event,
 	.hw_config		= x86_pmu_hw_config,
@@ -2076,6 +2078,8 @@ static __initconst const struct x86_pmu intel_pmu = {
 	.disable_all		= intel_pmu_disable_all,
 	.enable_all		= intel_pmu_enable_all,
 	.enable			= intel_pmu_enable_event,
+	.disable_hardirq	= x86_pmu_nop_hardirq,
+	.enable_hardirq		= x86_pmu_nop_hardirq,
 	.disable		= intel_pmu_disable_event,
 	.hw_config		= intel_pmu_hw_config,
 	.schedule_events	= x86_schedule_events,
diff --git a/arch/x86/kernel/cpu/perf_event_knc.c b/arch/x86/kernel/cpu/perf_event_knc.c
index 838fa87..3adffae 100644
--- a/arch/x86/kernel/cpu/perf_event_knc.c
+++ b/arch/x86/kernel/cpu/perf_event_knc.c
@@ -289,6 +289,8 @@ static const struct x86_pmu knc_pmu __initconst = {
 	.handle_irq		= knc_pmu_handle_irq,
 	.disable_all		= knc_pmu_disable_all,
 	.enable_all		= knc_pmu_enable_all,
+	.disable_hardirq	= x86_pmu_nop_hardirq,
+	.enable_hardirq		= x86_pmu_nop_hardirq,
 	.enable			= knc_pmu_enable_event,
 	.disable		= knc_pmu_disable_event,
 	.hw_config		= x86_pmu_hw_config,
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 3486e66..377edc3 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -1286,6 +1286,8 @@ static __initconst const struct x86_pmu p4_pmu = {
 	.handle_irq		= p4_pmu_handle_irq,
 	.disable_all		= p4_pmu_disable_all,
 	.enable_all		= p4_pmu_enable_all,
+	.disable_hardirq	= x86_pmu_nop_hardirq,
+	.enable_hardirq		= x86_pmu_nop_hardirq,
 	.enable			= p4_pmu_enable_event,
 	.disable		= p4_pmu_disable_event,
 	.eventsel		= MSR_P4_BPU_CCCR0,
diff --git a/arch/x86/kernel/cpu/perf_event_p6.c b/arch/x86/kernel/cpu/perf_event_p6.c
index b1e2fe1..94755bf 100644
--- a/arch/x86/kernel/cpu/perf_event_p6.c
+++ b/arch/x86/kernel/cpu/perf_event_p6.c
@@ -202,6 +202,8 @@ static __initconst const struct x86_pmu p6_pmu = {
 	.handle_irq		= x86_pmu_handle_irq,
 	.disable_all		= p6_pmu_disable_all,
 	.enable_all		= p6_pmu_enable_all,
+	.disable_hardirq	= x86_pmu_nop_hardirq,
+	.enable_hardirq		= x86_pmu_nop_hardirq,
 	.enable			= p6_pmu_enable_event,
 	.disable		= p6_pmu_disable_event,
 	.hw_config		= x86_pmu_hw_config,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index a033014..066b53c 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -726,7 +726,6 @@ enum perf_callchain_context {
 #define PERF_FLAG_FD_NO_GROUP		(1U << 0)
 #define PERF_FLAG_FD_OUTPUT		(1U << 1)
 #define PERF_FLAG_PID_CGROUP		(1U << 2) /* pid=cgroup id, per-cpu mode only */
-#define PERF_FLAG_PID_HARDIRQ		(1U << 3) /* pid=irq number */
 
 union perf_mem_data_src {
 	__u64 val;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 465ce681..ec1dfac 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -119,8 +119,7 @@ static int cpu_function_call(int cpu, int (*func) (void *info), void *info)
 
 #define PERF_FLAG_ALL (PERF_FLAG_FD_NO_GROUP |\
 		       PERF_FLAG_FD_OUTPUT |\
-		       PERF_FLAG_PID_CGROUP |\
-		       PERF_FLAG_PID_HARDIRQ)
+		       PERF_FLAG_PID_CGROUP)
 
 /*
  * branch priv levels that need permission checks
@@ -7028,7 +7027,6 @@ SYSCALL_DEFINE5(perf_event_open,
 	struct fd group = {NULL, 0};
 	struct task_struct *task = NULL;
 	struct pmu *pmu;
-	int hardirq = -1;
 	int event_fd;
 	int move_group = 0;
 	int err;
@@ -7037,27 +7035,6 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (flags & ~PERF_FLAG_ALL)
 		return -EINVAL;
 
-	if ((flags & (PERF_FLAG_PID_CGROUP | PERF_FLAG_PID_HARDIRQ)) ==
-	    (PERF_FLAG_PID_CGROUP | PERF_FLAG_PID_HARDIRQ))
-		return -EINVAL;
-
-	/*
-	 * In irq mode, the pid argument is used to pass irq number.
-	 */
-	if (flags & PERF_FLAG_PID_HARDIRQ) {
-		hardirq = pid;
-		pid = -1;
-	}
-
-	/*
-	 * In cgroup mode, the pid argument is used to pass the fd
-	 * opened to the cgroup directory in cgroupfs. The cpu argument
-	 * designates the cpu on which to monitor threads from that
-	 * cgroup.
-	 */
-	if ((flags & PERF_FLAG_PID_CGROUP) && (pid == -1 || cpu == -1))
-		return -EINVAL;
-
 	err = perf_copy_attr(attr_uptr, &attr);
 	if (err)
 		return err;
@@ -7072,6 +7049,15 @@ SYSCALL_DEFINE5(perf_event_open,
 			return -EINVAL;
 	}
 
+	/*
+	 * In cgroup mode, the pid argument is used to pass the fd
+	 * opened to the cgroup directory in cgroupfs. The cpu argument
+	 * designates the cpu on which to monitor threads from that
+	 * cgroup.
+	 */
+	if ((flags & PERF_FLAG_PID_CGROUP) && (pid == -1 || cpu == -1))
+		return -EINVAL;
+
 	event_fd = get_unused_fd();
 	if (event_fd < 0)
 		return event_fd;
-- 
1.7.7.6



* [PATCH RFC v2 3/4] perf/x86/Intel: IRQ-bound performance events
  2014-01-04 18:22 [PATCH RFC v2 0/4] perf: IRQ-bound performance events Alexander Gordeev
  2014-01-04 18:22 ` [PATCH RFC v2 1/4] perf/core: " Alexander Gordeev
  2014-01-04 18:22 ` [PATCH RFC v2 2/4] perf/x86: " Alexander Gordeev
@ 2014-01-04 18:22 ` Alexander Gordeev
  2014-01-04 18:22 ` [PATCH RFC v2 4/4] perf/tool: " Alexander Gordeev
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Alexander Gordeev @ 2014-01-04 18:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Gordeev, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Frederic Weisbecker, Peter Zijlstra, Andi Kleen

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 arch/x86/kernel/cpu/perf_event.h       |    5 ++
 arch/x86/kernel/cpu/perf_event_intel.c |   72 +++++++++++++++++++++++++++----
 2 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 03c9595..78aed95 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -164,6 +164,11 @@ struct cpu_hw_events {
 	struct perf_guest_switch_msr	guest_switch_msrs[X86_PMC_IDX_MAX];
 
 	/*
+	 * Intel hardware interrupt context exclude bits
+	 */
+	u64				intel_ctrl_hardirq_mask;
+
+	/*
 	 * Intel checkpoint mask
 	 */
 	u64				intel_cp_status;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index c656997..98a4f38 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1057,14 +1057,11 @@ static void intel_pmu_disable_all(void)
 	intel_pmu_lbr_disable_all();
 }
 
-static void intel_pmu_enable_all(int added)
+static void __intel_pmu_enable(struct cpu_hw_events *cpuc, u64 control)
 {
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-
 	intel_pmu_pebs_enable_all();
 	intel_pmu_lbr_enable_all();
-	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
-			x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask);
+	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, control);
 
 	if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask)) {
 		struct perf_event *event =
@@ -1077,6 +1074,56 @@ static void intel_pmu_enable_all(int added)
 	}
 }
 
+static void intel_pmu_enable_all(int added)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	u64 disable_mask = cpuc->intel_ctrl_guest_mask |
+			   cpuc->intel_ctrl_hardirq_mask;
+	__intel_pmu_enable(cpuc, x86_pmu.intel_ctrl & ~disable_mask);
+}
+
+u64 __get_intel_ctrl_hardirq_mask(struct perf_event *events[], int count)
+{
+	u64 ret = 0;
+	int i;
+
+	BUG_ON(count > x86_pmu.num_counters);
+
+	for (i = 0; i < count; i++) {
+		struct perf_event *event = events[i];
+		BUG_ON(!is_hardirq_event(event));
+		if (!event->hw.state)
+			ret |= (1ull << event->hw.idx);
+	}
+
+	return ret;
+}
+
+static inline u64 intel_pmu_get_control(void)
+{
+	u64 control;
+
+	rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, control);
+
+	return control;
+}
+
+static void intel_pmu_disable_hardirq(struct perf_event *events[], int count)
+{
+	u64 control = intel_pmu_get_control();
+	u64 mask = __get_intel_ctrl_hardirq_mask(events, count);
+
+	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, control & ~mask);
+}
+
+static void intel_pmu_enable_hardirq(struct perf_event *events[], int count)
+{
+	u64 control = intel_pmu_get_control();
+	u64 mask = __get_intel_ctrl_hardirq_mask(events, count);
+
+	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, control | mask);
+}
+
 /*
  * Workaround for:
  *   Intel Errata AAK100 (model 26)
@@ -1200,6 +1247,8 @@ static void intel_pmu_disable_event(struct perf_event *event)
 		return;
 	}
 
+	if (is_hardirq_event(event))
+		cpuc->intel_ctrl_hardirq_mask &= ~(1ull << hwc->idx);
 	cpuc->intel_ctrl_guest_mask &= ~(1ull << hwc->idx);
 	cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->idx);
 	cpuc->intel_cp_status &= ~(1ull << hwc->idx);
@@ -1276,6 +1325,8 @@ static void intel_pmu_enable_event(struct perf_event *event)
 		cpuc->intel_ctrl_guest_mask |= (1ull << hwc->idx);
 	if (event->attr.exclude_guest)
 		cpuc->intel_ctrl_host_mask |= (1ull << hwc->idx);
+	if (is_hardirq_event(event))
+		cpuc->intel_ctrl_hardirq_mask |= (1ull << hwc->idx);
 
 	if (unlikely(event_is_checkpointed(event)))
 		cpuc->intel_cp_status |= (1ull << hwc->idx);
@@ -1347,7 +1398,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	struct perf_sample_data data;
 	struct cpu_hw_events *cpuc;
 	int bit, loops;
-	u64 status;
+	u64 control, status;
 	int handled;
 
 	cpuc = &__get_cpu_var(cpu_hw_events);
@@ -1358,11 +1409,12 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	 */
 	if (!x86_pmu.late_ack)
 		apic_write(APIC_LVTPC, APIC_DM_NMI);
+	control = intel_pmu_get_control();
 	intel_pmu_disable_all();
 	handled = intel_pmu_drain_bts_buffer();
 	status = intel_pmu_get_status();
 	if (!status) {
-		intel_pmu_enable_all(0);
+		__intel_pmu_enable(cpuc, control);
 		return handled;
 	}
 
@@ -1427,7 +1479,7 @@ again:
 		goto again;
 
 done:
-	intel_pmu_enable_all(0);
+	__intel_pmu_enable(cpuc, control);
 	/*
 	 * Only unmask the NMI after the overflow counters
 	 * have been reset. This avoids spurious NMIs on
@@ -2078,8 +2130,8 @@ static __initconst const struct x86_pmu intel_pmu = {
 	.disable_all		= intel_pmu_disable_all,
 	.enable_all		= intel_pmu_enable_all,
 	.enable			= intel_pmu_enable_event,
-	.disable_hardirq	= x86_pmu_nop_hardirq,
-	.enable_hardirq		= x86_pmu_nop_hardirq,
+	.disable_hardirq	= intel_pmu_disable_hardirq,
+	.enable_hardirq		= intel_pmu_enable_hardirq,
 	.disable		= intel_pmu_disable_event,
 	.hw_config		= intel_pmu_hw_config,
 	.schedule_events	= x86_schedule_events,
-- 
1.7.7.6



* [PATCH RFC v2 4/4] perf/tool: IRQ-bound performance events
  2014-01-04 18:22 [PATCH RFC v2 0/4] perf: IRQ-bound performance events Alexander Gordeev
                   ` (2 preceding siblings ...)
  2014-01-04 18:22 ` [PATCH RFC v2 3/4] perf/x86/Intel: " Alexander Gordeev
@ 2014-01-04 18:22 ` Alexander Gordeev
  2014-01-05 17:59 ` [PATCH RFC v2 0/4] perf: " Andi Kleen
  2014-01-13 15:50 ` Frederic Weisbecker
  5 siblings, 0 replies; 10+ messages in thread
From: Alexander Gordeev @ 2014-01-04 18:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Gordeev, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Frederic Weisbecker, Peter Zijlstra, Andi Kleen

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 tools/perf/builtin-stat.c      |    9 +++++++++
 tools/perf/util/evlist.c       |   38 ++++++++++++++++++++++++++++++++++++++
 tools/perf/util/evlist.h       |    3 +++
 tools/perf/util/evsel.c        |    8 ++++++++
 tools/perf/util/evsel.h        |    3 +++
 tools/perf/util/parse-events.c |   24 ++++++++++++++++++++++++
 tools/perf/util/parse-events.h |    1 +
 7 files changed, 86 insertions(+), 0 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index b27b264..48c9d90 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -292,6 +292,7 @@ static int create_perf_stat_counter(struct perf_evsel *evsel)
 				    PERF_FORMAT_TOTAL_TIME_RUNNING;
 
 	attr->inherit = !no_inherit;
+	attr->hardirq = 1;
 
 	if (target__has_cpu(&target))
 		return perf_evsel__open_per_cpu(evsel, perf_evsel__cpus(evsel));
@@ -590,6 +591,12 @@ static int __run_perf_stat(int argc, const char **argv)
 		return -1;
 	}
 
+	if (perf_evlist__apply_hardirq(evsel_list)) {
+		error("failed to set hardirq with %d (%s)\n", errno,
+			strerror(errno));
+		return -1;
+	}
+
 	/*
 	 * Enable counters and exec the command:
 	 */
@@ -1613,6 +1620,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 		     parse_events_option),
 	OPT_CALLBACK(0, "filter", &evsel_list, "filter",
 		     "event filter", parse_filter),
+	OPT_CALLBACK('h', "hardirq", &evsel_list, "hardirq",
+		     "stat events on existing hardware IRQ", parse_hardirq),
 	OPT_BOOLEAN('i', "no-inherit", &no_inherit,
 		    "child tasks do not inherit counters"),
 	OPT_STRING('p', "pid", &target.pid, "pid",
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 0810f5c..5104232 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -861,6 +861,27 @@ int perf_evlist__apply_filters(struct perf_evlist *evlist)
 	return err;
 }
 
+int perf_evlist__apply_hardirq(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+	int err = 0;
+	const int ncpus = cpu_map__nr(evlist->cpus),
+		  nthreads = thread_map__nr(evlist->threads);
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		if (evsel->hardirq == NULL)
+			continue;
+
+		err = perf_evsel__set_hardirq(evsel, ncpus, nthreads, evsel->hardirq);
+		if (err)
+			break;
+
+		evsel->attr.hardirq = 1;
+	}
+
+	return err;
+}
+
 int perf_evlist__set_filter(struct perf_evlist *evlist, const char *filter)
 {
 	struct perf_evsel *evsel;
@@ -877,6 +898,23 @@ int perf_evlist__set_filter(struct perf_evlist *evlist, const char *filter)
 	return err;
 }
 
+int perf_evlist__set_hardirq(struct perf_evlist *evlist,
+			     const struct perf_hardirq_event_disp *hardirq)
+{
+	struct perf_evsel *evsel;
+	int err = 0;
+	const int ncpus = cpu_map__nr(evlist->cpus),
+		  nthreads = thread_map__nr(evlist->threads);
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		err = perf_evsel__set_hardirq(evsel, ncpus, nthreads, hardirq);
+		if (err)
+			break;
+	}
+
+	return err;
+}
+
 bool perf_evlist__valid_sample_type(struct perf_evlist *evlist)
 {
 	struct perf_evsel *pos;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 518e521..12dc36d 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -71,6 +71,8 @@ int perf_evlist__add_newtp(struct perf_evlist *evlist,
 			   const char *sys, const char *name, void *handler);
 
 int perf_evlist__set_filter(struct perf_evlist *evlist, const char *filter);
+int perf_evlist__set_hardirq(struct perf_evlist *evlist,
+			     const struct perf_hardirq_event_disp *hardirq);
 
 struct perf_evsel *
 perf_evlist__find_tracepoint_by_id(struct perf_evlist *evlist, int id);
@@ -136,6 +138,7 @@ static inline void perf_evlist__set_maps(struct perf_evlist *evlist,
 
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target);
 int perf_evlist__apply_filters(struct perf_evlist *evlist);
+int perf_evlist__apply_hardirq(struct perf_evlist *evlist);
 
 void __perf_evlist__set_leader(struct list_head *list);
 void perf_evlist__set_leader(struct perf_evlist *evlist);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ade8d9c..23d52a3 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -704,6 +704,14 @@ int perf_evsel__set_filter(struct perf_evsel *evsel, int ncpus, int nthreads,
 				     (void *)filter);
 }
 
+int perf_evsel__set_hardirq(struct perf_evsel *evsel, int ncpus, int nthreads,
+			    const struct perf_hardirq_event_disp *disp)
+{
+	return perf_evsel__run_ioctl(evsel, ncpus, nthreads,
+				     PERF_EVENT_IOC_SET_HARDIRQ,
+				     (void *)disp);
+}
+
 int perf_evsel__enable(struct perf_evsel *evsel, int ncpus, int nthreads)
 {
 	return perf_evsel__run_ioctl(evsel, ncpus, nthreads,
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index f1b3256..f07221b 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -59,6 +59,7 @@ struct perf_evsel {
 	struct list_head	node;
 	struct perf_event_attr	attr;
 	char			*filter;
+	struct perf_hardirq_event_disp *hardirq;
 	struct xyarray		*fd;
 	struct xyarray		*sample_id;
 	u64			*id;
@@ -169,6 +170,8 @@ void perf_evsel__set_sample_id(struct perf_evsel *evsel,
 
 int perf_evsel__set_filter(struct perf_evsel *evsel, int ncpus, int nthreads,
 			   const char *filter);
+int perf_evsel__set_hardirq(struct perf_evsel *evsel, int ncpus, int nthreads,
+			    const struct perf_hardirq_event_disp *hardirq);
 int perf_evsel__enable(struct perf_evsel *evsel, int ncpus, int nthreads);
 
 int perf_evsel__open_per_cpu(struct perf_evsel *evsel,
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 0153435..7a5114d 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -987,6 +987,30 @@ int parse_filter(const struct option *opt, const char *str,
 	return 0;
 }
 
+int parse_hardirq(const struct option *opt, const char *str,
+		  int unset __maybe_unused)
+{
+	struct perf_evlist *evlist = *(struct perf_evlist **)opt->value;
+	struct perf_evsel *evsel;
+	struct perf_hardirq_event_disp *event_disp;
+
+	event_disp = malloc(offsetof(typeof(*event_disp), disp) +
+			    sizeof(event_disp->disp[0]));
+	if (!event_disp) {
+		fprintf(stderr, "not enough memory to hold hardirq disp\n");
+		return -1;
+	}
+
+	event_disp->nr_disp		= 1;
+	event_disp->disp[0].irq_nr	= atoi(str);
+	event_disp->disp[0].actions	= -1;
+
+	list_for_each_entry(evsel, &evlist->entries, node)
+		evsel->hardirq = event_disp;
+
+	return 0;
+}
+
 static const char * const event_type_descriptors[] = {
 	"Hardware event",
 	"Software event",
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index f1cb4c4..a6927d6 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -33,6 +33,7 @@ extern int parse_events_option(const struct option *opt, const char *str,
 extern int parse_events(struct perf_evlist *evlist, const char *str);
 extern int parse_events_terms(struct list_head *terms, const char *str);
 extern int parse_filter(const struct option *opt, const char *str, int unset);
+extern int parse_hardirq(const struct option *opt, const char *str, int unset);
 
 #define EVENTS_HELP_MAX (128*1024)
 
-- 
1.7.7.6



* Re: [PATCH RFC v2 0/4] perf: IRQ-bound performance events
  2014-01-04 18:22 [PATCH RFC v2 0/4] perf: IRQ-bound performance events Alexander Gordeev
                   ` (3 preceding siblings ...)
  2014-01-04 18:22 ` [PATCH RFC v2 4/4] perf/tool: " Alexander Gordeev
@ 2014-01-05 17:59 ` Andi Kleen
  2014-01-13 13:23   ` Alexander Gordeev
  2014-01-13 15:50 ` Frederic Weisbecker
  5 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2014-01-05 17:59 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Frederic Weisbecker, Peter Zijlstra, Andi Kleen

On Sat, Jan 04, 2014 at 07:22:32PM +0100, Alexander Gordeev wrote:
> Hello,
> 
> This is version 2 of RFC "perf: IRQ-bound performance events". That is an
> introduction of IRQ-bound performance events - ones that only count in a
> context of a hardware interrupt handler. Ingo suggested to extend this
> functionality to softirq and threaded handlers as well:

Did you measure the overhead in workloads that do a lot of interrupts?
I assume two WRMSR could be a significant part of the cost of small interrupts.

For counting at least it would likely be a lot cheaper to just RDPMC
and subtract manually.
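
For illustration, roughly along these lines (just a sketch, not from the
patches; read_pmc(), the counter index and the irq_pmc_delta bookkeeping
are all made up here):

	static inline u64 read_pmc(int idx)
	{
		u32 lo, hi;

		/* RDPMC: counter index in ECX, result in EDX:EAX */
		asm volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (idx));
		return ((u64)hi << 32) | lo;
	}

	...
		/* accrue the delta around the action handler instead of
		 * reprogramming the counter with WRMSRs */
		u64 before = read_pmc(idx);

		res = action->handler(irq, action->dev_id);
		this_cpu_add(irq_pmc_delta[irq], read_pmc(idx) - before);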

The cache miss example below is certainly misleading, as cache misses
caused by interrupts are often a "debt", that is, they are forced on whoever
is interrupted. I don't think that is a good use of this.

I guess it can be useful for cycles.

-Andi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC v2 0/4] perf: IRQ-bound performance events
  2014-01-05 17:59 ` [PATCH RFC v2 0/4] perf: " Andi Kleen
@ 2014-01-13 13:23   ` Alexander Gordeev
  0 siblings, 0 replies; 10+ messages in thread
From: Alexander Gordeev @ 2014-01-13 13:23 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Frederic Weisbecker, Peter Zijlstra, Andi Kleen

On Sun, Jan 05, 2014 at 09:59:49AM -0800, Andi Kleen wrote:
> > This is version 2 of RFC "perf: IRQ-bound performance events". That is an
> > introduction of IRQ-bound performance events - ones that only count in a
> > context of a hardware interrupt handler. Ingo suggested to extend this
> > functionality to softirq and threaded handlers as well:
> 
> Did you measure the overhead in workloads that do a lot of interrupts?
> I assume two WRMSR could be a significant part of the cost of small interrupts.

No, that would be the next step. I hoped first to ensure the way I am
intruding into the current perf design is correct.

> For counting at least it would likely be a lot cheaper to just RDPMC
> and subtract manually.

Sigh, that seems like quite a rework for the Intel PMU.

> The cache miss example below is certainly misleading, as cache misses
> caused by interrupts are often a "debt", that is, they are forced on whoever
> is interrupted. I don't think that is a good use of this.

Maybe useless rather than misleading? :) Actually, cache and power use
are exactly the data I thought would be useful if one wants to check the
dependency on the interrupt affinity mask. There was some discussion
on this topic some time ago:

On Mon, May 21, 2012 at 08:36:09AM -0700, Linus Torvalds wrote:
"So it may well make perfect sense to allow a mask of CPU's for
interrupt delivery, but just make sure that the mask all points to
CPU's on the same socket. That would give the hardware some leeway in
choosing the actual core - it's very possible that hardware could
avoid cores that are running with irq's disabled (possibly improving
latency) or even more likely - avoid cores that are in deeper
powersaving modes."

So this RFC is a kind of follow-up to come up with the necessary tooling.

> I guess it can be useful for cycles.
> 
> -Andi

Thanks, Andi!

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC v2 0/4] perf: IRQ-bound performance events
  2014-01-04 18:22 [PATCH RFC v2 0/4] perf: IRQ-bound performance events Alexander Gordeev
                   ` (4 preceding siblings ...)
  2014-01-05 17:59 ` [PATCH RFC v2 0/4] perf: " Andi Kleen
@ 2014-01-13 15:50 ` Frederic Weisbecker
  2014-01-14 16:07   ` Alexander Gordeev
  5 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2014-01-13 15:50 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Peter Zijlstra, Andi Kleen

On Sat, Jan 04, 2014 at 07:22:32PM +0100, Alexander Gordeev wrote:
> Hello,
> 
> This is version 2 of RFC "perf: IRQ-bound performance events". That is an
> introduction of IRQ-bound performance events - ones that only count in a
> context of a hardware interrupt handler. Ingo suggested to extend this
> functionality to softirq and threaded handlers as well:

Hi Alexander,

I still strongly think we should use toggle events to achieve that:
https://lkml.org/lkml/2013/9/25/227

This will let us count not just IRQs (-e 'cycles,irq_entry/on=cycles/,irq_exit/off=cycles/') but much more. 

The patchset still needs some polishing but I think that's a better direction.

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC v2 0/4] perf: IRQ-bound performance events
  2014-01-13 15:50 ` Frederic Weisbecker
@ 2014-01-14 16:07   ` Alexander Gordeev
  2014-01-14 17:09     ` Frederic Weisbecker
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Gordeev @ 2014-01-14 16:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Peter Zijlstra, Andi Kleen

On Mon, Jan 13, 2014 at 04:50:37PM +0100, Frederic Weisbecker wrote:
> On Sat, Jan 04, 2014 at 07:22:32PM +0100, Alexander Gordeev wrote:
> > Hello,
> > 
> > This is version 2 of RFC "perf: IRQ-bound performance events". That is an
> > introduction of IRQ-bound performance events - ones that only count in a
> > context of a hardware interrupt handler. Ingo suggested to extend this
> > functionality to softirq and threaded handlers as well:
> 
> Hi Alexander,
> 
> I still strongly think we should use toggle events to achieve that:
> https://lkml.org/lkml/2013/9/25/227

Hi Frederic,

The toggle events would not allow counting per-action in hardware interrupt
context. The design I propose provides for any possible combination of
actions/IRQs.

I.e. if we had a few drivers on IRQn and a few drivers on IRQm, we could
assign an event to, let's say, ISR0 and ISR2 on IRQn and ISR1 on IRQm.
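
For illustration, with the disposition structure from patch 4/4 such a
combination could be encoded roughly as follows (a sketch only; IRQn/IRQm
are placeholders and bit k of 'actions' stands for the action with number k,
as described in the RFC):

	struct perf_hardirq_event_disp *d;

	d = malloc(offsetof(typeof(*d), disp) + 2 * sizeof(d->disp[0]));
	if (!d)
		return -1;

	d->nr_disp = 2;
	d->disp[0].irq_nr  = IRQn;			/* ISR0 and ISR2 on IRQn */
	d->disp[0].actions = (1 << 0) | (1 << 2);
	d->disp[1].irq_nr  = IRQm;			/* ISR1 on IRQm */
	d->disp[1].actions = (1 << 1);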

Moreover, given that hardware context handlers run with local interrupts
disabled, an IRQ-bound event can only be enabled/disabled from a single
handler at a time - so we just need to allocate a single hardware counter
for any possible combination.

Unless I am missing something, neither the kernel nor the user level side
of toggle events can provide the features described above.

I think it would be ideal if the two approaches could be converged somehow,
but I just do not know how at the moment. I believe the next step is to
measure the overhead Andi mentioned. That well might be a showstopper for
either or both approaches.

In contrast with the hardware context, the toggle events do seem able to
monitor softirqs in their current form.

As for the threaded context handlers, I have not come up with an idea of
how to do it yet, but it does not seem the toggle events are able to either.

> This will let us count not just IRQs (-e 'cycles,irq_entry/on=cycles/,irq_exit/off=cycles/') but much more. 
> 
> The patchset still needs some polishing but I think that's a better direction.

Thanks, Frederic.

> Thanks.

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC v2 0/4] perf: IRQ-bound performance events
  2014-01-14 16:07   ` Alexander Gordeev
@ 2014-01-14 17:09     ` Frederic Weisbecker
  0 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2014-01-14 17:09 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Peter Zijlstra, Andi Kleen

On Tue, Jan 14, 2014 at 05:07:52PM +0100, Alexander Gordeev wrote:
> On Mon, Jan 13, 2014 at 04:50:37PM +0100, Frederic Weisbecker wrote:
> > On Sat, Jan 04, 2014 at 07:22:32PM +0100, Alexander Gordeev wrote:
> > > Hello,
> > > 
> > > This is version 2 of RFC "perf: IRQ-bound performance events". That is an
> > > introduction of IRQ-bound performance events - ones that only count in a
> > > context of a hardware interrupt handler. Ingo suggested to extend this
> > > functionality to softirq and threaded handlers as well:
> > 
> > Hi Alexander,
> > 
> > I still strongly think we should use toggle events to achieve that:
> > https://lkml.org/lkml/2013/9/25/227
> 
> Hi Frederic,
> 
> The toggle events would not allow counting per-action in hardware interrupt
> context. The design I propose provides for any possible combination of
> actions/IRQs.

I think we could define one event per handler by using tracepoint filters.
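
For example, roughly (a sketch only; the fd is assumed to come from
perf_event_open() on the irq_handler_entry tracepoint, and the irq number /
handler name are just placeholders):

	#include <sys/ioctl.h>
	#include <linux/perf_event.h>

	static int restrict_to_one_handler(int fd)
	{
		/* count only while the "eth0" handler of IRQ 19 runs */
		return ioctl(fd, PERF_EVENT_IOC_SET_FILTER,
			     "irq == 19 && name == \"eth0\"");
	}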

> 
> I.e. if we had a few drivers on IRQn and a few drivers on IRQm, we could
> assign an event to, let's say, ISR0 and ISR2 on IRQn and ISR1 on IRQm.

Yeah that should be possible with tracepoints as well.

> Moreover, given that hardware context handlers run with local interrupts
> disabled, an IRQ-bound event can only be enabled/disabled from a single
> handler at a time - so we just need to allocate a single hardware counter
> for any possible combination.

Hmm, I don't get what you mean here. Why wouldn't tracepoint-defined events work in this scenario?

> 
> I think it would be ideal if the two approaches could be converged somehow,
> but I just do not know how at the moment. I believe the next step is to
> measure the overhead Andi mentioned. That well might be a showstopper for
> either or both approaches.
> 
> In contrast with the hardware context, the toggle events do seem able to
> monitor softirqs in their current form.
> 
> As for the threaded context handlers, I have not come up with an idea of
> how to do it yet, but it does not seem the toggle events are able to either.

A per-task event should do the trick for profiling threaded irqs.
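
For illustration, roughly (a sketch only; the pid of the irq/NN-* kernel
thread is assumed to be looked up separately, e.g. from /proc):

	#include <string.h>
	#include <unistd.h>
	#include <sys/types.h>
	#include <sys/syscall.h>
	#include <linux/perf_event.h>

	static int open_cycles_on_irq_thread(pid_t irq_thread_pid)
	{
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size   = sizeof(attr);
		attr.type   = PERF_TYPE_HARDWARE;
		attr.config = PERF_COUNT_HW_CPU_CYCLES;

		/* per-task, any CPU: counts only while that thread runs */
		return syscall(__NR_perf_event_open, &attr, irq_thread_pid,
			       -1, -1, 0);
	}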

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-01-14 17:09 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-04 18:22 [PATCH RFC v2 0/4] perf: IRQ-bound performance events Alexander Gordeev
2014-01-04 18:22 ` [PATCH RFC v2 1/4] perf/core: " Alexander Gordeev
2014-01-04 18:22 ` [PATCH RFC v2 2/4] perf/x86: " Alexander Gordeev
2014-01-04 18:22 ` [PATCH RFC v2 3/4] perf/x86/Intel: " Alexander Gordeev
2014-01-04 18:22 ` [PATCH RFC v2 4/4] perf/tool: " Alexander Gordeev
2014-01-05 17:59 ` [PATCH RFC v2 0/4] perf: " Andi Kleen
2014-01-13 13:23   ` Alexander Gordeev
2014-01-13 15:50 ` Frederic Weisbecker
2014-01-14 16:07   ` Alexander Gordeev
2014-01-14 17:09     ` Frederic Weisbecker
