linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH v2 0/5] perf: Implement event group read using txn interface
@ 2015-04-08  0:34 Sukadev Bhattiprolu
  2015-04-08  0:34 ` [PATCH v2 1/5] perf: Add a flags parameter to pmu txn interfaces Sukadev Bhattiprolu
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Sukadev Bhattiprolu @ 2015-04-08  0:34 UTC (permalink / raw)
  To: Paul Mackerras, Arnaldo Carvalho de Melo, mingo, peterz,
	Michael Ellerman
  Cc: linuxppc-dev, dev, linux-kernel

Unlike normal hardware PMCs, the 24x7 counters[1] in Power8 are stored in
memory and accessed via a hypervisor call (HCALL).  A major aspect of the
HCALL is that it allows retrieving _SEVERAL_ counters at once (unlike
regular PMCs, which are read one at a time). By reading several counters
at once, we can get a more consistent snapshot of the system.

This patchset extends the transaction interface so that several events can
be submitted to the PMU and read all at once. Users are expected to submit
the set of events they want to read as an "event group".
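
For reference, user space creates such a group by opening sibling events
against a group leader and reading the leader with PERF_FORMAT_GROUP. A
minimal sketch (the generic hardware events here are placeholders; 24x7
events would use the hv_24x7 PMU's dynamic event type):

	#include <linux/perf_event.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main(void)
	{
		struct perf_event_attr attr = { 0 };
		__u64 buf[16];	/* { nr, value[0], value[1], ... } */
		int leader_fd;

		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_HARDWARE;
		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
		attr.read_format = PERF_FORMAT_GROUP;

		/* group_fd == -1 makes this event a group leader */
		leader_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);

		/* siblings pass the leader's fd as group_fd */
		attr.config = PERF_COUNT_HW_CPU_CYCLES;
		syscall(__NR_perf_event_open, &attr, 0, -1, leader_fd, 0);

		/* one read() of the leader returns the whole group's counts */
		read(leader_fd, buf, sizeof(buf));
		return 0;
	}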

In the kernel, we submit each event to the PMU using the following logic
(from Peter Zijlstra).
	
	pmu->start_txn(pmu, PMU_TXN_READ);

	leader->read();
	for_each_sibling()
		sibling->read();
	pmu->commit_txn();

where:
	- the ->read()s queue events to be submitted to the hypervisor, and,
	- the ->commit_txn() issues the HCALL, retrieves the result and
	  updates the event count.

Architectures/PMUs that don't need or implement the PMU_TXN_READ transaction
type simply ignore the ->start_txn() and ->commit_txn() calls and continue
to read the counters one at a time in the ->read() call.
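
For example, such a PMU can simply bail out of any transaction type it
doesn't handle (a sketch; foo_pmu is a placeholder):

	static void foo_pmu_start_txn(struct pmu *pmu, int flags)
	{
		/* foo_pmu only implements ADD transactions; ignore the rest */
		if (flags & ~PERF_PMU_TXN_ADD)
			return;

		perf_pmu_disable(pmu);
		/* ... normal ADD-transaction setup ... */
	}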

Compile/touch tested on x86. Need help testing on s390 and Sparc.

Thanks to Peter Zijlstra for his input.

Changelog [v2]
	- Use the transaction interface unconditionally to avoid special-case
	  code. Architectures/PMUs that don't need the READ transaction type
	  simply ignore the ->start_txn() and ->commit_txn() calls.

Sukadev Bhattiprolu (5):
  perf: Add a flags parameter to pmu txn interfaces
  perf: Split perf_event_read() and perf_event_count()
  perf: Rename perf_event_read_value to perf_event_compute_values
  perf: Define PMU_TXN_READ interface
  powerpc/perf/hv-24x7: Use PMU_TXN_READ interface

 arch/powerpc/perf/core-book3s.c  |   16 +++-
 arch/powerpc/perf/hv-24x7.c      |  165 ++++++++++++++++++++++++++++++++++++++
 arch/s390/kernel/perf_cpum_cf.c  |   15 +++-
 arch/sparc/kernel/perf_event.c   |   15 +++-
 arch/x86/kernel/cpu/perf_event.c |   15 +++-
 arch/x86/kvm/pmu.c               |    6 +-
 include/linux/perf_event.h       |   17 +++-
 kernel/events/core.c             |   83 ++++++++++++++-----
 8 files changed, 292 insertions(+), 40 deletions(-)

-- 
1.7.9.5


* [PATCH v2 1/5] perf: Add a flags parameter to pmu txn interfaces
  2015-04-08  0:34 [PATCH v2 0/5] perf: Implement event group read using txn interface Sukadev Bhattiprolu
@ 2015-04-08  0:34 ` Sukadev Bhattiprolu
  2015-04-08 12:19   ` Peter Zijlstra
  2015-04-08  0:34 ` [PATCH v2 2/5] perf: Split perf_event_read() and perf_event_count() Sukadev Bhattiprolu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Sukadev Bhattiprolu @ 2015-04-08  0:34 UTC (permalink / raw)
  To: Paul Mackerras, Arnaldo Carvalho de Melo, mingo, peterz,
	Michael Ellerman
  Cc: linuxppc-dev, dev, linux-kernel

Currently, the PMU interface allows reading only one counter at a time.
But some PMUs, like the 24x7 counters in Power, support reading several
counters at once. To leverage this functionality, extend the transaction
interface to support a "transaction type".

The first type, PERF_PMU_TXN_ADD, refers to the existing transactions,
i.e. those used to _schedule_ all the events on the PMU as a group.

A second transaction type, PERF_PMU_TXN_READ, will be used in a follow-on
patch by the 24x7 counters to read several counters at once.

For now, extend the transaction interfaces to the PMU to accept a
'flags' parameter and use this parameter to ignore any transactions
that are not of type PERF_PMU_TXN_ADD.

Note:	For now we add the 'flags' parameter to all three txn functions
	(start, commit, cancel). We could add the parameter to only the
	->start interface and have the PMUs cache the transaction type.
	But that would need slightly more intrusive changes in all PMUs
	to support a second transaction type.
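
A sketch of that more intrusive alternative (the txn_flags field is
hypothetical):

	static void foo_pmu_start_txn(struct pmu *pmu, int flags)
	{
		struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);

		cpuhw->txn_flags = flags;	/* cache the txn type */
		if (flags & ~PERF_PMU_TXN_ADD)
			return;

		perf_pmu_disable(pmu);
	}

	static int foo_pmu_commit_txn(struct pmu *pmu)	/* no flags needed */
	{
		struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);

		if (cpuhw->txn_flags & ~PERF_PMU_TXN_ADD)
			return 0;	/* nothing to commit for a READ txn */

		/* ... perform the schedulability test ... */
		return 0;
	}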

Thanks to Peter Zijlstra for his input.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 arch/powerpc/perf/core-book3s.c  |   16 +++++++++++++---
 arch/s390/kernel/perf_cpum_cf.c  |   15 ++++++++++++---
 arch/sparc/kernel/perf_event.c   |   15 ++++++++++++---
 arch/x86/kernel/cpu/perf_event.c |   15 ++++++++++++---
 include/linux/perf_event.h       |   13 ++++++++++---
 kernel/events/core.c             |   26 +++++++++++++++-----------
 6 files changed, 74 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 7c4f669..9cb8008 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1573,10 +1573,13 @@ static void power_pmu_stop(struct perf_event *event, int ef_flags)
  * Set the flag to make pmu::enable() not perform the
  * schedulability test, it will be performed at commit time
  */
-static void power_pmu_start_txn(struct pmu *pmu)
+static void power_pmu_start_txn(struct pmu *pmu, int flags)
 {
 	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
 
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	perf_pmu_disable(pmu);
 	cpuhw->group_flag |= PERF_EVENT_TXN;
 	cpuhw->n_txn_start = cpuhw->n_events;
@@ -1587,10 +1590,13 @@ static void power_pmu_start_txn(struct pmu *pmu)
  * Clear the flag and pmu::enable() will perform the
  * schedulability test.
  */
-static void power_pmu_cancel_txn(struct pmu *pmu)
+static void power_pmu_cancel_txn(struct pmu *pmu, int flags)
 {
 	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
 
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
 	perf_pmu_enable(pmu);
 }
@@ -1600,13 +1606,17 @@ static void power_pmu_cancel_txn(struct pmu *pmu)
  * Perform the group schedulability test as a whole
  * Return 0 if success
  */
-static int power_pmu_commit_txn(struct pmu *pmu)
+static int power_pmu_commit_txn(struct pmu *pmu, int flags)
 {
 	struct cpu_hw_events *cpuhw;
 	long i, n;
 
 	if (!ppmu)
 		return -EAGAIN;
+
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return -EINVAL;
+
 	cpuhw = this_cpu_ptr(&cpu_hw_events);
 	n = cpuhw->n_events;
 	if (check_excludes(cpuhw->event, cpuhw->flags, 0, n))
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index 56fdad4..7fa742e 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -573,10 +573,13 @@ static void cpumf_pmu_del(struct perf_event *event, int flags)
  * Start group events scheduling transaction.
  * Set flags to perform a single test at commit time.
  */
-static void cpumf_pmu_start_txn(struct pmu *pmu)
+static void cpumf_pmu_start_txn(struct pmu *pmu, int flags)
 {
 	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
 
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	perf_pmu_disable(pmu);
 	cpuhw->flags |= PERF_EVENT_TXN;
 	cpuhw->tx_state = cpuhw->state;
@@ -587,12 +590,15 @@ static void cpumf_pmu_start_txn(struct pmu *pmu)
  * Assumes cpumf_pmu_del() is called for each successful added
  * cpumf_pmu_add() during the transaction.
  */
-static void cpumf_pmu_cancel_txn(struct pmu *pmu)
+static void cpumf_pmu_cancel_txn(struct pmu *pmu, int flags)
 {
 	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
 
 	WARN_ON(cpuhw->tx_state != cpuhw->state);
 
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	cpuhw->flags &= ~PERF_EVENT_TXN;
 	perf_pmu_enable(pmu);
 }
@@ -602,11 +608,14 @@ static void cpumf_pmu_cancel_txn(struct pmu *pmu)
  * transaction is closed.   On error, the transaction is kept open
  * until cpumf_pmu_cancel_txn() is called.
  */
-static int cpumf_pmu_commit_txn(struct pmu *pmu)
+static int cpumf_pmu_commit_txn(struct pmu *pmu, int flags)
 {
 	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
 	u64 state;
 
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return 0;
+
 	/* check if the updated state can be scheduled */
 	state = cpuhw->state & ~((1 << CPUMF_LCCTL_ENABLE_SHIFT) - 1);
 	state >>= CPUMF_LCCTL_ENABLE_SHIFT;
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index 86eebfa..b4b558e 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1523,10 +1523,13 @@ static int sparc_pmu_event_init(struct perf_event *event)
  * Set the flag to make pmu::enable() not perform the
  * schedulability test, it will be performed at commit time
  */
-static void sparc_pmu_start_txn(struct pmu *pmu)
+static void sparc_pmu_start_txn(struct pmu *pmu, int flags)
 {
 	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
 
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	perf_pmu_disable(pmu);
 	cpuhw->group_flag |= PERF_EVENT_TXN;
 }
@@ -1536,10 +1539,13 @@ static void sparc_pmu_start_txn(struct pmu *pmu)
  * Clear the flag and pmu::enable() will perform the
  * schedulability test.
  */
-static void sparc_pmu_cancel_txn(struct pmu *pmu)
+static void sparc_pmu_cancel_txn(struct pmu *pmu, int flags)
 {
 	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
 
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
 	perf_pmu_enable(pmu);
 }
@@ -1549,7 +1555,7 @@ static void sparc_pmu_cancel_txn(struct pmu *pmu)
  * Perform the group schedulability test as a whole
  * Return 0 if success
  */
-static int sparc_pmu_commit_txn(struct pmu *pmu)
+static int sparc_pmu_commit_txn(struct pmu *pmu, int flags)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	int n;
@@ -1557,6 +1563,9 @@ static int sparc_pmu_commit_txn(struct pmu *pmu)
 	if (!sparc_pmu)
 		return -EINVAL;
 
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return 0;
+
 	cpuc = this_cpu_ptr(&cpu_hw_events);
 	n = cpuc->n_events;
 	if (check_excludes(cpuc->event, 0, n))
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index b71a7f8..a59aab5 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1607,8 +1607,11 @@ static inline void x86_pmu_read(struct perf_event *event)
  * Set the flag to make pmu::enable() not perform the
  * schedulability test, it will be performed at commit time
  */
-static void x86_pmu_start_txn(struct pmu *pmu)
+static void x86_pmu_start_txn(struct pmu *pmu, int flags)
 {
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	perf_pmu_disable(pmu);
 	__this_cpu_or(cpu_hw_events.group_flag, PERF_EVENT_TXN);
 	__this_cpu_write(cpu_hw_events.n_txn, 0);
@@ -1619,8 +1622,11 @@ static void x86_pmu_start_txn(struct pmu *pmu)
  * Clear the flag and pmu::enable() will perform the
  * schedulability test.
  */
-static void x86_pmu_cancel_txn(struct pmu *pmu)
+static void x86_pmu_cancel_txn(struct pmu *pmu, int flags)
 {
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	__this_cpu_and(cpu_hw_events.group_flag, ~PERF_EVENT_TXN);
 	/*
 	 * Truncate collected array by the number of events added in this
@@ -1638,12 +1644,15 @@ static void x86_pmu_cancel_txn(struct pmu *pmu)
  *
  * Does not cancel the transaction on failure; expects the caller to do this.
  */
-static int x86_pmu_commit_txn(struct pmu *pmu)
+static int x86_pmu_commit_txn(struct pmu *pmu, int flags)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	int assign[X86_PMC_IDX_MAX];
 	int n, ret;
 
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return 0;
+
 	n = cpuc->n_events;
 
 	if (!x86_pmu_initialized())
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2b62198..4dc3d70 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -240,20 +240,27 @@ struct pmu {
 	 *
 	 * Start the transaction, after this ->add() doesn't need to
 	 * do schedulability tests.
+	 *
+	 * Optional.
 	 */
-	void (*start_txn)		(struct pmu *pmu); /* optional */
+#define PERF_PMU_TXN_ADD  0x1		/* txn to add/schedule event on PMU */
+	void (*start_txn)		(struct pmu *pmu, int flags);
 	/*
 	 * If ->start_txn() disabled the ->add() schedulability test
 	 * then ->commit_txn() is required to perform one. On success
 	 * the transaction is closed. On error the transaction is kept
 	 * open until ->cancel_txn() is called.
+	 *
+	 * Optional.
 	 */
-	int  (*commit_txn)		(struct pmu *pmu); /* optional */
+	int  (*commit_txn)		(struct pmu *pmu, int flags);
 	/*
 	 * Will cancel the transaction, assumes ->del() is called
 	 * for each successful ->add() during the transaction.
+	 *
+	 * Optional.
 	 */
-	void (*cancel_txn)		(struct pmu *pmu); /* optional */
+	void (*cancel_txn)		(struct pmu *pmu, int flags);
 
 	/*
 	 * Will return the value for perf_event_mmap_page::index for this event,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2fabc06..97516d3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1924,10 +1924,10 @@ group_sched_in(struct perf_event *group_event,
 	if (group_event->state == PERF_EVENT_STATE_OFF)
 		return 0;
 
-	pmu->start_txn(pmu);
+	pmu->start_txn(pmu, PERF_PMU_TXN_ADD);
 
 	if (event_sched_in(group_event, cpuctx, ctx)) {
-		pmu->cancel_txn(pmu);
+		pmu->cancel_txn(pmu, PERF_PMU_TXN_ADD);
 		perf_cpu_hrtimer_restart(cpuctx);
 		return -EAGAIN;
 	}
@@ -1942,7 +1942,7 @@ group_sched_in(struct perf_event *group_event,
 		}
 	}
 
-	if (!pmu->commit_txn(pmu))
+	if (!pmu->commit_txn(pmu, PERF_PMU_TXN_ADD))
 		return 0;
 
 group_error:
@@ -1973,7 +1973,7 @@ group_error:
 	}
 	event_sched_out(group_event, cpuctx, ctx);
 
-	pmu->cancel_txn(pmu);
+	pmu->cancel_txn(pmu, PERF_PMU_TXN_ADD);
 
 	perf_cpu_hrtimer_restart(cpuctx);
 
@@ -6728,23 +6728,27 @@ static void perf_pmu_nop_void(struct pmu *pmu)
 {
 }
 
-static int perf_pmu_nop_int(struct pmu *pmu)
+static void perf_pmu_nop_txn(struct pmu *pmu, int flags)
+{
+}
+
+static int perf_pmu_nop_txn_int(struct pmu *pmu, int flags)
 {
 	return 0;
 }
 
-static void perf_pmu_start_txn(struct pmu *pmu)
+static void perf_pmu_start_txn(struct pmu *pmu, int flags)
 {
 	perf_pmu_disable(pmu);
 }
 
-static int perf_pmu_commit_txn(struct pmu *pmu)
+static int perf_pmu_commit_txn(struct pmu *pmu, int flags)
 {
 	perf_pmu_enable(pmu);
 	return 0;
 }
 
-static void perf_pmu_cancel_txn(struct pmu *pmu)
+static void perf_pmu_cancel_txn(struct pmu *pmu, int flags)
 {
 	perf_pmu_enable(pmu);
 }
@@ -6978,9 +6982,9 @@ got_cpu_context:
 			pmu->commit_txn = perf_pmu_commit_txn;
 			pmu->cancel_txn = perf_pmu_cancel_txn;
 		} else {
-			pmu->start_txn  = perf_pmu_nop_void;
-			pmu->commit_txn = perf_pmu_nop_int;
-			pmu->cancel_txn = perf_pmu_nop_void;
+			pmu->start_txn  = perf_pmu_nop_txn;
+			pmu->commit_txn = perf_pmu_nop_txn_int;
+			pmu->cancel_txn = perf_pmu_nop_txn;
 		}
 	}
 
-- 
1.7.9.5


* [PATCH v2 2/5] perf: Split perf_event_read() and perf_event_count()
  2015-04-08  0:34 [PATCH v2 0/5] perf: Implement event group read using txn interface Sukadev Bhattiprolu
  2015-04-08  0:34 ` [PATCH v2 1/5] perf: Add a flags parameter to pmu txn interfaces Sukadev Bhattiprolu
@ 2015-04-08  0:34 ` Sukadev Bhattiprolu
  2015-04-08  0:34 ` [PATCH v2 3/5] perf: Rename perf_event_read_value Sukadev Bhattiprolu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Sukadev Bhattiprolu @ 2015-04-08  0:34 UTC (permalink / raw)
  To: Paul Mackerras, Arnaldo Carvalho de Melo, mingo, peterz,
	Michael Ellerman
  Cc: linuxppc-dev, dev, linux-kernel

perf_event_read() does two things:

	- call the PMU to read/update the counter value
	- and compute the total count of the event and its children

perf_event_reset() needs the first piece but doesn't need the second.

Similarly, when we implement the ability to read a group of events
using the transaction interface, we would sometimes need one but
not both.

Break up perf_event_read() so that it only reads/updates the counter,
and have the callers compute the total count when necessary.
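
Callers that need the total then do (as in the hunks below):

	perf_event_read(event);			/* PMU updates event->count */
	total += perf_event_count(event);	/* count plus children's counts */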

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 kernel/events/core.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 97516d3..0a3d7c1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3223,7 +3223,7 @@ static inline u64 perf_event_count(struct perf_event *event)
 	return local64_read(&event->count) + atomic64_read(&event->child_count);
 }
 
-static u64 perf_event_read(struct perf_event *event)
+static void perf_event_read(struct perf_event *event)
 {
 	/*
 	 * If event is enabled and currently active on a CPU, update the
@@ -3249,8 +3249,6 @@ static u64 perf_event_read(struct perf_event *event)
 		update_event_times(event);
 		raw_spin_unlock_irqrestore(&ctx->lock, flags);
 	}
-
-	return perf_event_count(event);
 }
 
 /*
@@ -3654,14 +3652,18 @@ u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
 	*running = 0;
 
 	mutex_lock(&event->child_mutex);
-	total += perf_event_read(event);
+
+	perf_event_read(event);
+	total += perf_event_count(event);
+
 	*enabled += event->total_time_enabled +
 			atomic64_read(&event->child_total_time_enabled);
 	*running += event->total_time_running +
 			atomic64_read(&event->child_total_time_running);
 
 	list_for_each_entry(child, &event->child_list, child_list) {
-		total += perf_event_read(child);
+		perf_event_read(child);
+		total += perf_event_count(child);
 		*enabled += child->total_time_enabled;
 		*running += child->total_time_running;
 	}
@@ -3821,7 +3823,7 @@ static unsigned int perf_poll(struct file *file, poll_table *wait)
 
 static void _perf_event_reset(struct perf_event *event)
 {
-	(void)perf_event_read(event);
+	perf_event_read(event);
 	local64_set(&event->count, 0);
 	perf_event_update_userpage(event);
 }
-- 
1.7.9.5


* [PATCH v2 3/5] perf: Rename perf_event_read_value
  2015-04-08  0:34 [PATCH v2 0/5] perf: Implement event group read using txn interface Sukadev Bhattiprolu
  2015-04-08  0:34 ` [PATCH v2 1/5] perf: Add a flags parameter to pmu txn interfaces Sukadev Bhattiprolu
  2015-04-08  0:34 ` [PATCH v2 2/5] perf: Split perf_event_read() and perf_event_count() Sukadev Bhattiprolu
@ 2015-04-08  0:34 ` Sukadev Bhattiprolu
  2015-04-08 12:24   ` Peter Zijlstra
  2015-04-08  0:34 ` [PATCH v2 4/5] perf: Define PMU_TXN_READ interface Sukadev Bhattiprolu
  2015-04-08  0:34 ` [PATCH v2 5/5] powerpc/perf/hv-24x7: Use " Sukadev Bhattiprolu
  4 siblings, 1 reply; 12+ messages in thread
From: Sukadev Bhattiprolu @ 2015-04-08  0:34 UTC (permalink / raw)
  To: Paul Mackerras, Arnaldo Carvalho de Melo, mingo, peterz,
	Michael Ellerman
  Cc: linuxppc-dev, dev, linux-kernel

perf_event_read_value() mostly computes the event count and the enabled/
running times. Move the perf_event_read() call into the callers and rename
perf_event_read_value() to perf_event_compute_values().

Changelog[v2]
	Export symbol perf_event_read() since x86/kvm needs it now.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 arch/x86/kvm/pmu.c         |    6 ++++--
 include/linux/perf_event.h |    3 ++-
 kernel/events/core.c       |   18 +++++++++++-------
 3 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 8e6b7d8..5896cb1 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -144,9 +144,11 @@ static u64 read_pmc(struct kvm_pmc *pmc)
 
 	counter = pmc->counter;
 
-	if (pmc->perf_event)
-		counter += perf_event_read_value(pmc->perf_event,
+	if (pmc->perf_event) {
+		perf_event_read(pmc->perf_event);
+		counter += perf_event_compute_values(pmc->perf_event,
 						 &enabled, &running);
+	}
 
 	/* FIXME: Scaling needed? */
 
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 4dc3d70..e684c6b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -578,8 +578,9 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr,
 				void *context);
 extern void perf_pmu_migrate_context(struct pmu *pmu,
 				int src_cpu, int dst_cpu);
-extern u64 perf_event_read_value(struct perf_event *event,
+extern u64 perf_event_compute_values(struct perf_event *event,
 				 u64 *enabled, u64 *running);
+extern void perf_event_read(struct perf_event *event);
 
 
 struct perf_sample_data {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0a3d7c1..1ac99d1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3223,7 +3223,7 @@ static inline u64 perf_event_count(struct perf_event *event)
 	return local64_read(&event->count) + atomic64_read(&event->child_count);
 }
 
-static void perf_event_read(struct perf_event *event)
+void perf_event_read(struct perf_event *event)
 {
 	/*
 	 * If event is enabled and currently active on a CPU, update the
@@ -3250,6 +3250,7 @@ static void perf_event_read(struct perf_event *event)
 		raw_spin_unlock_irqrestore(&ctx->lock, flags);
 	}
 }
+EXPORT_SYMBOL_GPL(perf_event_read);
 
 /*
  * Initialize the perf_event context in a task_struct:
@@ -3643,7 +3644,8 @@ static void orphans_remove_work(struct work_struct *work)
 	put_ctx(ctx);
 }
 
-u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
+u64 perf_event_compute_values(struct perf_event *event, u64 *enabled,
+				u64 *running)
 {
 	struct perf_event *child;
 	u64 total = 0;
@@ -3653,7 +3655,6 @@ u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
 
 	mutex_lock(&event->child_mutex);
 
-	perf_event_read(event);
 	total += perf_event_count(event);
 
 	*enabled += event->total_time_enabled +
@@ -3671,7 +3672,7 @@ u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
 
 	return total;
 }
-EXPORT_SYMBOL_GPL(perf_event_read_value);
+EXPORT_SYMBOL_GPL(perf_event_compute_values);
 
 static int perf_event_read_group(struct perf_event *event,
 				   u64 read_format, char __user *buf)
@@ -3684,7 +3685,8 @@ static int perf_event_read_group(struct perf_event *event,
 
 	lockdep_assert_held(&ctx->mutex);
 
-	count = perf_event_read_value(leader, &enabled, &running);
+	perf_event_read(leader);
+	count = perf_event_compute_values(leader, &enabled, &running);
 
 	values[n++] = 1 + leader->nr_siblings;
 	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
@@ -3705,7 +3707,8 @@ static int perf_event_read_group(struct perf_event *event,
 	list_for_each_entry(sub, &leader->sibling_list, group_entry) {
 		n = 0;
 
-		values[n++] = perf_event_read_value(sub, &enabled, &running);
+		perf_event_read(sub);
+		values[n++] = perf_event_compute_values(sub, &enabled, &running);
 		if (read_format & PERF_FORMAT_ID)
 			values[n++] = primary_event_id(sub);
 
@@ -3728,7 +3731,8 @@ static int perf_event_read_one(struct perf_event *event,
 	u64 values[4];
 	int n = 0;
 
-	values[n++] = perf_event_read_value(event, &enabled, &running);
+	perf_event_read(event);
+	values[n++] = perf_event_compute_values(event, &enabled, &running);
 	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
 		values[n++] = enabled;
 	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
-- 
1.7.9.5


* [PATCH v2 4/5] perf: Define PMU_TXN_READ interface
  2015-04-08  0:34 [PATCH v2 0/5] perf: Implement event group read using txn interface Sukadev Bhattiprolu
                   ` (2 preceding siblings ...)
  2015-04-08  0:34 ` [PATCH v2 3/5] perf: Rename perf_event_read_value Sukadev Bhattiprolu
@ 2015-04-08  0:34 ` Sukadev Bhattiprolu
  2015-04-08 12:45   ` Peter Zijlstra
  2015-04-08  0:34 ` [PATCH v2 5/5] powerpc/perf/hv-24x7: Use " Sukadev Bhattiprolu
  4 siblings, 1 reply; 12+ messages in thread
From: Sukadev Bhattiprolu @ 2015-04-08  0:34 UTC (permalink / raw)
  To: Paul Mackerras, Arnaldo Carvalho de Melo, mingo, peterz,
	Michael Ellerman
  Cc: linuxppc-dev, dev, linux-kernel

Define a new PERF_PMU_TXN_READ interface to read a group of counters
at once. Note that we use this interface with all PMUs.

PMUs that implement this interface will queue the counters to be read
in ->read() and read them all at once in ->commit_txn().

PMUs that don't implement PERF_PMU_TXN_READ will continue to read one
counter at a time and simply ignore the ->start_txn() and ->commit_txn()
calls.

Thanks to input from Peter Zijlstra.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 include/linux/perf_event.h |    1 +
 kernel/events/core.c       |   33 +++++++++++++++++++++++++++++++--
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e684c6b..3e46a07 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -244,6 +244,7 @@ struct pmu {
 	 * Optional.
 	 */
 #define PERF_PMU_TXN_ADD  0x1		/* txn to add/schedule event on PMU */
+#define PERF_PMU_TXN_READ 0x2		/* txn to read event group from PMU */
 	void (*start_txn)		(struct pmu *pmu, int flags);
 	/*
 	 * If ->start_txn() disabled the ->add() schedulability test
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1ac99d1..a001582 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3644,6 +3644,33 @@ static void orphans_remove_work(struct work_struct *work)
 	put_ctx(ctx);
 }
 
+/*
+ * Use the transaction interface to read the group of events in @leader.
+ * PMUs like the 24x7 counters in Power can use this to queue the events
+ * in the ->read() operation and perform the actual read in ->commit_txn.
+ *
+ * Other PMUs can ignore the ->start_txn and ->commit_txn and read each
+ * counter directly in the ->read() operation.
+ */
+static int perf_event_read_txn(struct perf_event *leader)
+{
+	int ret;
+	struct perf_event *sub;
+	struct pmu *pmu;
+
+	pmu = leader->pmu;
+
+	pmu->start_txn(pmu, PERF_PMU_TXN_READ);
+
+	perf_event_read(leader);
+	list_for_each_entry(sub, &leader->sibling_list, group_entry)
+		perf_event_read(sub);
+
+	ret = pmu->commit_txn(pmu, PERF_PMU_TXN_READ);
+
+	return ret;
+}
+
 u64 perf_event_compute_values(struct perf_event *event, u64 *enabled,
 				u64 *running)
 {
@@ -3685,7 +3712,10 @@ static int perf_event_read_group(struct perf_event *event,
 
 	lockdep_assert_held(&ctx->mutex);
 
-	perf_event_read(leader);
+	ret = perf_event_read_txn(leader);
+	if (ret)
+		return ret;
+
 	count = perf_event_compute_values(leader, &enabled, &running);
 
 	values[n++] = 1 + leader->nr_siblings;
@@ -3707,7 +3737,6 @@ static int perf_event_read_group(struct perf_event *event,
 	list_for_each_entry(sub, &leader->sibling_list, group_entry) {
 		n = 0;
 
-		perf_event_read(sub);
 		values[n++] = perf_event_compute_values(sub, &enabled, &running);
 		if (read_format & PERF_FORMAT_ID)
 			values[n++] = primary_event_id(sub);
-- 
1.7.9.5


* [PATCH v2 5/5] powerpc/perf/hv-24x7: Use PMU_TXN_READ interface
  2015-04-08  0:34 [PATCH v2 0/5] perf: Implement event group read using txn interface Sukadev Bhattiprolu
                   ` (3 preceding siblings ...)
  2015-04-08  0:34 ` [PATCH v2 4/5] perf: Define PMU_TXN_READ interface Sukadev Bhattiprolu
@ 2015-04-08  0:34 ` Sukadev Bhattiprolu
  2015-04-08 12:51   ` Peter Zijlstra
  4 siblings, 1 reply; 12+ messages in thread
From: Sukadev Bhattiprolu @ 2015-04-08  0:34 UTC (permalink / raw)
  To: Paul Mackerras, Arnaldo Carvalho de Melo, mingo, peterz,
	Michael Ellerman
  Cc: linuxppc-dev, dev, linux-kernel

The 24x7 counters in Power allow monitoring a large number of events
simultaneously. They also allow reading several counters in a single
HCALL, so we can get a more consistent snapshot of the system.

Use the PMU's transaction interface to monitor and read several event
counters at once. The idea is that users can group several 24x7 events
into a single group of events. We use the following logic to submit
the group of events to the PMU and read the values:

	pmu->start_txn()		// Initialize before first event

	for each event in group
		pmu->read(event);	// Queue each event to be read

	pmu->commit_txn()		// Read/update all queued counters

The ->commit_txn() also updates event counts in the respective perf_event
objects.  The perf subsystem can then directly get the event counts from
the perf_event and can avoid submitting a new ->read() request to the PMU.
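
Schematically, the bookkeeping looks like this (condensed from the patch
below; count_from_result() is shorthand for pulling element_data[0] out
of the result buffer):

	/* ->read(): queue the event, remembering its request slot */
	i = request_buffer->num_requests - 1;
	h24x7hw->events[i] = event;

	/* ->commit_txn(): one HCALL, then update every queued event */
	for (i = 0; i < request_buffer->num_requests; i++)
		update_event_count(h24x7hw->events[i], count_from_result(i));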

Thanks to input from Peter Zijlstra.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-24x7.c |  165 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 165 insertions(+)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index b107efd..6d4ff82 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -142,6 +142,13 @@ static struct attribute_group event_long_desc_group = {
 
 static struct kmem_cache *hv_page_cache;
 
+struct h_24x7_hw {
+	int flags;
+	int in_txn;
+	int txn_err;
+	struct perf_event *events[255];
+};
+
 /*
  * request_buffer and result_buffer are not required to be 4k aligned,
  * but are not allowed to cross any 4k boundary. Aligning them to 4k is
@@ -150,6 +157,7 @@ static struct kmem_cache *hv_page_cache;
 #define H24x7_DATA_BUFFER_SIZE	4096
 DEFINE_PER_CPU(char, hv_24x7_reqb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
 DEFINE_PER_CPU(char, hv_24x7_resb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
+DEFINE_PER_CPU(struct h_24x7_hw, h_24x7_hw);
 
 static char *event_name(struct hv_24x7_event_data *ev, int *len)
 {
@@ -1213,9 +1221,47 @@ static void update_event_count(struct perf_event *event, u64 now)
 static void h_24x7_event_read(struct perf_event *event)
 {
 	u64 now;
+	struct h_24x7_hw *h24x7hw;
+	struct hv_24x7_request_buffer *request_buffer;
+
+	/*
+	 * If in a READ transaction, add this counter to the list of
+	 * counters to read during the next HCALL (i.e. commit_txn()).
+	 * If not in a READ transaction, go ahead and make the HCALL
+	 * to read this counter by itself.
+	 */
+	h24x7hw = &get_cpu_var(h_24x7_hw);
+	if (h24x7hw->txn_err)
+		goto out;
+
+	if (h24x7hw->in_txn) {
+		int i;
+		int ret;
+
+		request_buffer = (void *)get_cpu_var(hv_24x7_reqb);
+
+		ret = add_event_to_24x7_request(event, request_buffer);
+		if (ret) {
+			h24x7hw->txn_err = ret;
+		} else {
+			/*
+			 * Associate the event with the HCALL request index,
+			 * so we can quickly find/update the count in
+			 * ->commit_txn().
+			 */
+			i = request_buffer->num_requests - 1;
+			h24x7hw->events[i] = event;
+		}
+
+		put_cpu_var(hv_24x7_reqb);
+		goto out;
+	}
 
 	now = h_24x7_get_value(event);
 	update_event_count(event, now);
+
+out:
+	put_cpu_var(h_24x7_hw);
 }
 
 static void h_24x7_event_start(struct perf_event *event, int flags)
@@ -1237,6 +1283,122 @@ static int h_24x7_event_add(struct perf_event *event, int flags)
 	return 0;
 }
 
+static void h_24x7_event_start_txn(struct pmu *pmu, int flags)
+{
+	struct hv_24x7_request_buffer *request_buffer;
+	struct hv_24x7_data_result_buffer *result_buffer;
+	struct h_24x7_hw *h24x7hw;
+
+	/*
+	 * 24x7 counters only support READ transactions. They are
+	 * always counting and don't need/support ADD transactions.
+	 */
+	if (flags & ~PERF_PMU_TXN_READ)
+		return;
+
+	h24x7hw = &get_cpu_var(h_24x7_hw);
+	request_buffer = (void *)get_cpu_var(hv_24x7_reqb);
+	result_buffer = (void *)get_cpu_var(hv_24x7_resb);
+
+	/* We should not be called if we are already in a txn */
+	WARN_ON_ONCE(h24x7hw->in_txn);
+
+	init_24x7_request(request_buffer, result_buffer);
+	h24x7hw->in_txn = 1;
+
+	put_cpu_var(hv_24x7_resb);
+	put_cpu_var(hv_24x7_reqb);
+	put_cpu_var(h_24x7_hw);
+}
+
+static void reset_txn(struct h_24x7_hw *h24x7hw)
+{
+	/* Clean up transaction */
+	h24x7hw->in_txn = 0;
+	h24x7hw->txn_err = 0;
+	h24x7hw->flags = 0;
+
+	/*
+	 * request_buffer and result_buffer will be initialized
+	 * during the next read/txn.
+	 */
+}
+
+static int h_24x7_event_commit_txn(struct pmu *pmu, int flags)
+{
+	struct hv_24x7_request_buffer *request_buffer;
+	struct hv_24x7_data_result_buffer *result_buffer;
+	struct h_24x7_hw *h24x7hw;
+	struct hv_24x7_result *resb;
+	struct perf_event *event;
+	u64 count;
+	int i, ret;
+
+	/*
+	 * 24x7 counters only support READ transactions. They are
+	 * always counting and don't need/support ADD transactions.
+	 */
+	if (flags & ~PERF_PMU_TXN_READ)
+		return 0;
+
+	h24x7hw = &get_cpu_var(h_24x7_hw);
+	if (h24x7hw->txn_err) {
+		ret = h24x7hw->txn_err;
+		goto out;
+	}
+
+	ret = -EINVAL;
+	if (!h24x7hw->in_txn) {
+		WARN_ON_ONCE(1);
+		goto out;
+	}
+
+	request_buffer = (void *)get_cpu_var(hv_24x7_reqb);
+	result_buffer = (void *)get_cpu_var(hv_24x7_resb);
+
+	ret = make_24x7_request(request_buffer, result_buffer);
+	if (ret) {
+		log_24x7_hcall(request_buffer, result_buffer, ret);
+		goto put_reqb;
+	}
+
+	/* Update event counts from hcall */
+	for (i = 0; i < request_buffer->num_requests; i++) {
+		resb = &result_buffer->results[i];
+		count = be64_to_cpu(resb->elements[0].element_data[0]);
+		event = h24x7hw->events[i];
+		h24x7hw->events[i] = NULL;
+		update_event_count(event, count);
+	}
+
+put_reqb:
+	put_cpu_var(hv_24x7_resb);
+	put_cpu_var(hv_24x7_reqb);
+out:
+	reset_txn(h24x7hw);
+	put_cpu_var(h_24x7_hw);
+	return ret;
+}
+
+static void h_24x7_event_cancel_txn(struct pmu *pmu, int flags)
+{
+	struct h_24x7_hw *h24x7hw;
+
+	/*
+	 * 24x7 counters only support READ transactions. They are
+	 * always counting and don't need/support ADD transactions.
+	 */
+	if (flags & ~PERF_PMU_TXN_READ)
+		return;
+
+	h24x7hw = &get_cpu_var(h_24x7_hw);
+
+	WARN_ON_ONCE(!h24x7hw->in_txn);
+	reset_txn(h24x7hw);
+
+	put_cpu_var(h_24x7_hw);
+}
+
 static struct pmu h_24x7_pmu = {
 	.task_ctx_nr = perf_invalid_context,
 
@@ -1248,6 +1410,9 @@ static struct pmu h_24x7_pmu = {
 	.start       = h_24x7_event_start,
 	.stop        = h_24x7_event_stop,
 	.read        = h_24x7_event_read,
+	.start_txn   = h_24x7_event_start_txn,
+	.commit_txn  = h_24x7_event_commit_txn,
+	.cancel_txn  = h_24x7_event_cancel_txn,
 };
 
 static int hv_24x7_init(void)
-- 
1.7.9.5


* Re: [PATCH v2 1/5] perf: Add a flags parameter to pmu txn interfaces
  2015-04-08  0:34 ` [PATCH v2 1/5] perf: Add a flags parameter to pmu txn interfaces Sukadev Bhattiprolu
@ 2015-04-08 12:19   ` Peter Zijlstra
  2015-04-08 15:40     ` Sukadev Bhattiprolu
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2015-04-08 12:19 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: linux-kernel, Arnaldo Carvalho de Melo, mingo, Paul Mackerras,
	dev, linuxppc-dev

On Tue, Apr 07, 2015 at 05:34:55PM -0700, Sukadev Bhattiprolu wrote:
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 2b62198..4dc3d70 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -240,20 +240,27 @@ struct pmu {
>  	 *
>  	 * Start the transaction, after this ->add() doesn't need to
>  	 * do schedulability tests.
> +	 *
> +	 * Optional.
>  	 */
> -	void (*start_txn)		(struct pmu *pmu); /* optional */
> +#define PERF_PMU_TXN_ADD  0x1		/* txn to add/schedule event on PMU */

Please do not interleave these flags in the structure. Maybe place it
near PERF_EVENT_TXN.

> +	void (*start_txn)		(struct pmu *pmu, int flags);
>  	/*
>  	 * If ->start_txn() disabled the ->add() schedulability test
>  	 * then ->commit_txn() is required to perform one. On success
>  	 * the transaction is closed. On error the transaction is kept
>  	 * open until ->cancel_txn() is called.
> +	 *
> +	 * Optional.
>  	 */
> -	int  (*commit_txn)		(struct pmu *pmu); /* optional */
> +	int  (*commit_txn)		(struct pmu *pmu, int flags);
>  	/*
>  	 * Will cancel the transaction, assumes ->del() is called
>  	 * for each successful ->add() during the transaction.
> +	 *
> +	 * Optional.
>  	 */
> -	void (*cancel_txn)		(struct pmu *pmu); /* optional */
> +	void (*cancel_txn)		(struct pmu *pmu, int flags);
>  
>  	/*
>  	 * Will return the value for perf_event_mmap_page::index for this event,

So (again), why also pass the flags to cancel/commit ?

The transaction state _has_ to already record it, otherwise it doesn't
know if add/read are part of one.


* Re: [PATCH v2 3/5] perf: Rename perf_event_read_value
  2015-04-08  0:34 ` [PATCH v2 3/5] perf: Rename perf_event_read_value Sukadev Bhattiprolu
@ 2015-04-08 12:24   ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2015-04-08 12:24 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: linux-kernel, Arnaldo Carvalho de Melo, mingo, Paul Mackerras,
	dev, linuxppc-dev

On Tue, Apr 07, 2015 at 05:34:57PM -0700, Sukadev Bhattiprolu wrote:
> perf_event_read_value() is mostly computing the event count and enabled/
> running times. Move the perf_event_read() into caller and rename
> perf_event_read_value() to perf_event_compute_values().
> 
> Changelog[v2]
> 	Export symbol perf_event_read() since x86/kvm needs it now.

FWIW, these changed-since-last-version notes go below the --- since
they're pointless for the permanent record.

> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/pmu.c         |    6 ++++--
>  include/linux/perf_event.h |    3 ++-
>  kernel/events/core.c       |   18 +++++++++++-------
>  3 files changed, 17 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 8e6b7d8..5896cb1 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -144,9 +144,11 @@ static u64 read_pmc(struct kvm_pmc *pmc)
>  
>  	counter = pmc->counter;
>  
> -	if (pmc->perf_event)
> -		counter += perf_event_read_value(pmc->perf_event,
> +	if (pmc->perf_event) {
> +		perf_event_read(pmc->perf_event);
> +		counter += perf_event_compute_values(pmc->perf_event,
>  						 &enabled, &running);
> +	}

That's more than a rename, that's a functionality split just like the
previous patch.


* Re: [PATCH v2 4/5] perf: Define PMU_TXN_READ interface
  2015-04-08  0:34 ` [PATCH v2 4/5] perf: Define PMU_TXN_READ interface Sukadev Bhattiprolu
@ 2015-04-08 12:45   ` Peter Zijlstra
  2015-04-09 19:10     ` Sukadev Bhattiprolu
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2015-04-08 12:45 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: linux-kernel, Arnaldo Carvalho de Melo, mingo, Paul Mackerras,
	dev, linuxppc-dev

On Tue, Apr 07, 2015 at 05:34:58PM -0700, Sukadev Bhattiprolu wrote:
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 1ac99d1..a001582 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3644,6 +3644,33 @@ static void orphans_remove_work(struct work_struct *work)
>  	put_ctx(ctx);
>  }
>  
> +/*
> + * Use the transaction interface to read the group of events in @leader.
> + * PMUs like the 24x7 counters in Power can use this to queue the events
> + * in the ->read() operation and perform the actual read in ->commit_txn.
> + *
> + * Other PMUs can ignore the ->start_txn and ->commit_txn and read each
> + * counter directly in the ->read() operation.
> + */
> +static int perf_event_read_txn(struct perf_event *leader)

perf_event_read_group() might be a better name. Ah, I see that's already
taken. Bugger.

See the below patch.

> +{
> +	int ret;
> +	struct perf_event *sub;
> +	struct pmu *pmu;
> +
> +	pmu = leader->pmu;
> +
> +	pmu->start_txn(pmu, PERF_PMU_TXN_READ);
> +
> +	perf_event_read(leader);
> +	list_for_each_entry(sub, &leader->sibling_list, group_entry)
> +		perf_event_read(sub);
> +
> +	ret = pmu->commit_txn(pmu, PERF_PMU_TXN_READ);
> +
> +	return ret;
> +}

And while we're here, should we change the NOP txn implementation to not
call perf_pmu_disable for TXN_READ?

That seems entirely unneeded in this case.

---
Subject: perf: Rename perf_event_read_{one,group}, perf_read_hw

In order to free up the perf_event_read_group() name:

 s/perf_event_read_\(one\|group\)/perf_read_\1/g
 s/perf_read_hw/__perf_read/g

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 06917d537302..869f6accb4f4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3677,7 +3677,7 @@ static void put_event(struct perf_event *event)
 	 *     see the comment there.
 	 *
 	 *  2) there is a lock-inversion with mmap_sem through
-	 *     perf_event_read_group(), which takes faults while
+	 *     perf_read_group(), which takes faults while
 	 *     holding ctx->mutex, however this is called after
 	 *     the last filedesc died, so there is no possibility
 	 *     to trigger the AB-BA case.
@@ -3765,7 +3765,7 @@ u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
 }
 EXPORT_SYMBOL_GPL(perf_event_read_value);
 
-static int perf_event_read_group(struct perf_event *event,
+static int perf_read_group(struct perf_event *event,
 				   u64 read_format, char __user *buf)
 {
 	struct perf_event *leader = event->group_leader, *sub;
@@ -3813,7 +3813,7 @@ static int perf_event_read_group(struct perf_event *event,
 	return ret;
 }
 
-static int perf_event_read_one(struct perf_event *event,
+static int perf_read_one(struct perf_event *event,
 				 u64 read_format, char __user *buf)
 {
 	u64 enabled, running;
@@ -3851,7 +3851,7 @@ static bool is_event_hup(struct perf_event *event)
  * Read the performance event - simple non blocking version for now
  */
 static ssize_t
-perf_read_hw(struct perf_event *event, char __user *buf, size_t count)
+__perf_read(struct perf_event *event, char __user *buf, size_t count)
 {
 	u64 read_format = event->attr.read_format;
 	int ret;
@@ -3869,9 +3869,9 @@ perf_read_hw(struct perf_event *event, char __user *buf, size_t count)
 
 	WARN_ON_ONCE(event->ctx->parent_ctx);
 	if (read_format & PERF_FORMAT_GROUP)
-		ret = perf_event_read_group(event, read_format, buf);
+		ret = perf_read_group(event, read_format, buf);
 	else
-		ret = perf_event_read_one(event, read_format, buf);
+		ret = perf_read_one(event, read_format, buf);
 
 	return ret;
 }
@@ -3884,7 +3884,7 @@ perf_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 	int ret;
 
 	ctx = perf_event_ctx_lock(event);
-	ret = perf_read_hw(event, buf, count);
+	ret = __perf_read(event, buf, count);
 	perf_event_ctx_unlock(event, ctx);
 
 	return ret;


* Re: [PATCH v2 5/5] powerpc/perf/hv-24x7: Use PMU_TXN_READ interface
  2015-04-08  0:34 ` [PATCH v2 5/5] powerpc/perf/hv-24x7: Use " Sukadev Bhattiprolu
@ 2015-04-08 12:51   ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2015-04-08 12:51 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: linux-kernel, Arnaldo Carvalho de Melo, mingo, Paul Mackerras,
	dev, linuxppc-dev

On Tue, Apr 07, 2015 at 05:34:59PM -0700, Sukadev Bhattiprolu wrote:
> @@ -1213,9 +1221,47 @@ static void update_event_count(struct perf_event *event, u64 now)
>  static void h_24x7_event_read(struct perf_event *event)
>  {
>  	u64 now;
> +	struct h_24x7_hw *h24x7hw;
> +	struct hv_24x7_request_buffer *request_buffer;
> +
> +	/*
> +	 * If in a READ transaction, add this counter to the list of
> +	 * counters to read during the next HCALL (i.e. commit_txn()).
> +	 * If not in a READ transaction, go ahead and make the HCALL
> +	 * to read this counter by itself.
> +	 */
> +	h24x7hw = &get_cpu_var(h_24x7_hw);
> +	if (h24x7hw->txn_err)
> +		goto out;
> +
> +	if (h24x7hw->in_txn) {
> +		int i;
> +		int ret;
> +
> +		request_buffer = (void *)get_cpu_var(hv_24x7_reqb);
> +
> +		ret = add_event_to_24x7_request(event, request_buffer);
> +		if (ret) {
> +			h24x7hw->txn_err = ret;
> +		} else {
> +			/*
> +			 * Associate the event with the HCALL request index,
> +			 * so we can quickly find/update the count in
> +			 * ->commit_txn().
> +			 */
> +			i = request_buffer->num_requests - 1;
> +			h24x7hw->events[i] = event;
> +		}
> +
> +		put_cpu_var(hv_24x7_reqb);
> +		goto out;
> +	}
>  
>  	now = h_24x7_get_value(event);
>  	update_event_count(event, now);
> +
> +out:
> +	put_cpu_var(h_24x7_hw);
>  }

Most of the pmu ops are called with IRQs disabled, no need to use
get_cpu_var()/put_cpu_var() which disable/enable preemption.
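
That is, the accesses could be as simple as (a sketch):

	/* pmu ops run with IRQs disabled; plain per-cpu access suffices */
	struct h_24x7_hw *h24x7hw = this_cpu_ptr(&h_24x7_hw);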


* Re: [PATCH v2 1/5] perf: Add a flags parameter to pmu txn interfaces
  2015-04-08 12:19   ` Peter Zijlstra
@ 2015-04-08 15:40     ` Sukadev Bhattiprolu
  0 siblings, 0 replies; 12+ messages in thread
From: Sukadev Bhattiprolu @ 2015-04-08 15:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Arnaldo Carvalho de Melo, mingo, Paul Mackerras,
	dev, linuxppc-dev

Peter Zijlstra [peterz@infradead.org] wrote:
| On Tue, Apr 07, 2015 at 05:34:55PM -0700, Sukadev Bhattiprolu wrote:
| > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
| > index 2b62198..4dc3d70 100644
| > --- a/include/linux/perf_event.h
| > +++ b/include/linux/perf_event.h
| > @@ -240,20 +240,27 @@ struct pmu {
| >  	 *
| >  	 * Start the transaction, after this ->add() doesn't need to
| >  	 * do schedulability tests.
| > +	 *
| > +	 * Optional.
| >  	 */
| > -	void (*start_txn)		(struct pmu *pmu); /* optional */
| > +#define PERF_PMU_TXN_ADD  0x1		/* txn to add/schedule event on PMU */
| 
| Please do not interleave these flags in the structure. Maybe place it
| near PERF_EVENT_TXN.

Ok.

| 
| > +	void (*start_txn)		(struct pmu *pmu, int flags);
| >  	/*
| >  	 * If ->start_txn() disabled the ->add() schedulability test
| >  	 * then ->commit_txn() is required to perform one. On success
| >  	 * the transaction is closed. On error the transaction is kept
| >  	 * open until ->cancel_txn() is called.
| > +	 *
| > +	 * Optional.
| >  	 */
| > -	int  (*commit_txn)		(struct pmu *pmu); /* optional */
| > +	int  (*commit_txn)		(struct pmu *pmu, int flags);
| >  	/*
| >  	 * Will cancel the transaction, assumes ->del() is called
| >  	 * for each successful ->add() during the transaction.
| > +	 *
| > +	 * Optional.
| >  	 */
| > -	void (*cancel_txn)		(struct pmu *pmu); /* optional */
| > +	void (*cancel_txn)		(struct pmu *pmu, int flags);
| >  
| >  	/*
| >  	 * Will return the value for perf_event_mmap_page::index for this event,
| 
| So (again), why also pass the flags to cancel/commit ?
| 
| The transaction state _has_ to already record it, otherwise it doesn't
| know if add/read are part of one.

I had replied to your earlier mail and added a Note to this patch
description. I was hesitant to make more intrusive changes to PMUs
that don't care about the new txn type.

But, I agree, it is better to go ahead and update all PMUs to save
the state and check. Will do that in the next version.

Sukadev


* Re: [PATCH v2 4/5] perf: Define PMU_TXN_READ interface
  2015-04-08 12:45   ` Peter Zijlstra
@ 2015-04-09 19:10     ` Sukadev Bhattiprolu
  0 siblings, 0 replies; 12+ messages in thread
From: Sukadev Bhattiprolu @ 2015-04-09 19:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Arnaldo Carvalho de Melo, mingo, Paul Mackerras,
	dev, linuxppc-dev

Peter Zijlstra [peterz@infradead.org] wrote:
| On Tue, Apr 07, 2015 at 05:34:58PM -0700, Sukadev Bhattiprolu wrote:
| > diff --git a/kernel/events/core.c b/kernel/events/core.c
| > index 1ac99d1..a001582 100644
| > --- a/kernel/events/core.c
| > +++ b/kernel/events/core.c
| > @@ -3644,6 +3644,33 @@ static void orphans_remove_work(struct work_struct *work)
| >  	put_ctx(ctx);
| >  }
| >  
| > +/*
| > + * Use the transaction interface to read the group of events in @leader.
| > + * PMUs like the 24x7 counters in Power can use this to queue the events
| > + * in the ->read() operation and perform the actual read in ->commit_txn.
| > + *
| > + * Other PMUs can ignore the ->start_txn and ->commit_txn and read each
| > + * counter directly in the ->read() operation.
| > + */
| > +static int perf_event_read_txn(struct perf_event *leader)
| 
| perf_event_read_group() might be a better name. Ah, I see that's already
| taken. Bugger.
| 
| See the below patch.

Sure, will include your patch in the next version and use
perf_event_read_group().

| 
| > +{
| > +	int ret;
| > +	struct perf_event *sub;
| > +	struct pmu *pmu;
| > +
| > +	pmu = leader->pmu;
| > +
| > +	pmu->start_txn(pmu, PERF_PMU_TXN_READ);
| > +
| > +	perf_event_read(leader);
| > +	list_for_each_entry(sub, &leader->sibling_list, group_entry)
| > +		perf_event_read(sub);
| > +
| > +	ret = pmu->commit_txn(pmu, PERF_PMU_TXN_READ);
| > +
| > +	return ret;
| > +}
| 
| And while were here, should we change the NOP txn implementation to not
| call perf_pmu_disable for TXN_READ ?
| 
| That seems entirely unneeded in this case.

Yes. Should we use a per-cpu, per-pmu variable to save/check the
transaction type like this? (I am using the cpuhw->group_flag in
x86, Power and other PMUs)


From 2f3658b0b131739dc08e0d6d579e58864d1777bc Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Thu, 9 Apr 2015 13:47:50 -0400
Subject: [PATCH 1/1] perf: Have NOP txn interface ignore non ADD txns

The NOP txn interface should ignore non-TXN_ADD transactions and
avoid disabling/enabling the PMU.

Use a per-cpu, per-PMU flag to store/check the type of transaction
in progress.

Thanks to Peter Zijlstra for the input.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 9e869b2..9466864 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -160,7 +160,10 @@ struct perf_event;
 /*
  * Common implementation detail of pmu::{start,commit,cancel}_txn
  */
-#define PERF_EVENT_TXN 0x1
+#define PERF_EVENT_TXN_ADD      0x1    /* txn to add/schedule event on PMU */
+#define PERF_EVENT_TXN_READ     0x2    /* txn to read event group from PMU */
+
+#define PERF_EVENT_TXN_MASK     (PERF_EVENT_TXN_ADD|PERF_EVENT_TXN_READ)
 
 /**
  * pmu::capabilities flags
@@ -240,8 +243,10 @@ struct pmu {
         *
         * Start the transaction, after this ->add() doesn't need to
         * do schedulability tests.
+        *
+        * Optional.
         */
-       void (*start_txn)               (struct pmu *pmu); /* optional */
+       void (*start_txn)               (struct pmu *pmu, int flags);
        /*
         * If ->start_txn() disabled the ->add() schedulability test
         * then ->commit_txn() is required to perform one. On success
@@ -534,6 +534,7 @@ struct perf_cpu_context {
 	ktime_t				hrtimer_interval;
 	struct pmu			*unique_pmu;
 	struct perf_cgroup		*cgrp;
+	int				group_flag;
 };
 
 struct perf_output_handle {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0ebc468..08d0c3e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6746,18 +6746,35 @@ static int perf_pmu_nop_int(struct pmu *pmu)
 
 static void perf_pmu_start_txn(struct pmu *pmu, int flags)
 {
-	perf_pmu_disable(pmu);
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+
+	BUG_ON(cpuctx->group_flag);
+
+	cpuctx->group_flag = flags;
+
+	if (flags & PERF_EVENT_TXN_ADD)
+		perf_pmu_disable(pmu);
 }
 
 static int perf_pmu_commit_txn(struct pmu *pmu)
 {
-	perf_pmu_enable(pmu);
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+
+	if (cpuctx->group_flag & PERF_EVENT_TXN_ADD)
+		perf_pmu_enable(pmu);
+
+	cpuctx->group_flag &= ~PERF_EVENT_TXN_MASK;
 	return 0;
 }
 
 static void perf_pmu_cancel_txn(struct pmu *pmu)
 {
-	perf_pmu_enable(pmu);
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+
+	if (cpuctx->group_flag & PERF_EVENT_TXN_ADD)
+		perf_pmu_enable(pmu);
+
+	cpuctx->group_flag &= ~PERF_EVENT_TXN_MASK;
 }
 
 static int perf_event_idx_default(struct perf_event *event)

