* [PATCH v3 0/8] Implement group-read of events using txn interface
@ 2015-07-15  3:01 Sukadev Bhattiprolu
  2015-07-15  3:01 ` [PATCH v3 1/8] powerpc/perf/hv-24x7: Whitespace - fix parameter alignment Sukadev Bhattiprolu
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-15  3:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, linux-s390, sparclinux

Unlike normal hardware PMCs, the 24x7 counters in Power8 are stored in
memory and accessed via a hypervisor call (HCALL).  A major aspect of the
HCALL is that it allows retrieving _several_ counters at once (unlike
regular PMCs, which are read one at a time). By reading several counters
at once, we can get a more consistent snapshot of the system.

This patchset extends the transaction interface so that several events
can be submitted to the PMU and read all at once. Users are expected to
submit the set of events they want to read as an "event group".

In the kernel, we submit each event in the group to the PMU using the
following logic (from Peter Zijlstra):

        pmu->start_txn(pmu, PMU_TXN_READ);

        leader->read();
        for_each_sibling()
                sibling->read();
        pmu->commit_txn();

where:
        - the ->read() calls queue the events to be submitted to the
          hypervisor, and
        - the ->commit_txn() issues the HCALL, retrieves the results and
          updates the event counts.

Architectures/PMUs that don't need or implement the PMU_TXN_READ type of
transaction simply ignore the ->start_txn() and ->commit_txn() and
continue to read the counters one at a time in the ->read() call.
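
For reference, creating and reading such an event group from userspace
looks roughly like the sketch below. This is only an illustration, with
generic hardware events standing in for 24x7 events (which would use the
hv_24x7 PMU's dynamic type and event codes); error handling is omitted:

	#include <linux/perf_event.h>
	#include <sys/types.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <string.h>
	#include <stdint.h>
	#include <stdio.h>

	static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
				   int cpu, int group_fd, unsigned long flags)
	{
		return syscall(__NR_perf_event_open, attr, pid, cpu,
			       group_fd, flags);
	}

	int main(void)
	{
		struct perf_event_attr attr;
		uint64_t buf[3];	/* nr followed by one value per event */
		int leader, sibling;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_HARDWARE;
		attr.config = PERF_COUNT_HW_CPU_CYCLES;
		/* One read() of the leader returns all counts in the group */
		attr.read_format = PERF_FORMAT_GROUP;

		leader = perf_event_open(&attr, 0, -1, -1, 0);

		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
		sibling = perf_event_open(&attr, 0, -1, leader, 0);

		/* With this patchset, a PMU that implements PERF_PMU_TXN_READ
		 * can satisfy this group read in a single HCALL. */
		read(leader, buf, sizeof(buf));
		printf("nr=%llu cycles=%llu instructions=%llu\n",
		       (unsigned long long)buf[0],
		       (unsigned long long)buf[1],
		       (unsigned long long)buf[2]);

		close(sibling);
		close(leader);
		return 0;
	}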

Compile/touch tested on x86. Need help testing on s390 and Sparc.

Thanks to Peter Zijlstra for his input.

Changelog [v3]
	- Simple changes/reorg of patchset to split/rename functions
	- [Peter Zijlstra] Save the transaction flags in ->start_txn() and
	  drop the flags parameter from ->commit_txn() and ->cancel_txn().
	- [Peter Zijlstra] The nop txn interfaces don't need to disable/enable
	  PMU for PERF_PMU_TXN_READ transactions.

Changelog [v2]
        - Use the transaction interface unconditionally to avoid special-case
          code. Architectures/PMUs that don't need the READ transaction types
          simply ignore the ->start_txn() and ->commit_txn() calls.

Peter Zijlstra (Intel) (1):
  perf: Rename perf_event_read_{one,group}, perf_read_hw

Sukadev Bhattiprolu (7):
  powerpc/perf/hv-24x7: Whitespace - fix parameter alignment
  powerpc/perf/hv-24x7: Simplify extracting counter from result buffer
  perf: Add a flags parameter to pmu txn interfaces
  perf: Split perf_event_read() and perf_event_count()
  perf: Split perf_event_read_value()
  perf: Define PMU_TXN_READ interface
  powerpc/perf/hv-24x7: Use PMU_TXN_READ interface

 arch/powerpc/perf/core-book3s.c  |   25 ++++-
 arch/powerpc/perf/hv-24x7.c      |  186 ++++++++++++++++++++++++++++++++++----
 arch/s390/kernel/perf_cpum_cf.c  |   24 ++++-
 arch/sparc/kernel/perf_event.c   |   19 +++-
 arch/x86/kernel/cpu/perf_event.c |   27 +++++-
 arch/x86/kernel/cpu/perf_event.h |    1 +
 include/linux/perf_event.h       |   15 ++-
 kernel/events/core.c             |  143 +++++++++++++++++++++++------
 8 files changed, 389 insertions(+), 51 deletions(-)

-- 
1.7.9.5


* [PATCH v3 1/8] powerpc/perf/hv-24x7: Whitespace - fix parameter alignment
  2015-07-15  3:01 [PATCH v3 0/8] Implement group-read of events using txn interface Sukadev Bhattiprolu
@ 2015-07-15  3:01 ` Sukadev Bhattiprolu
  2015-08-03  1:35   ` [v3,1/8] " Michael Ellerman
  2015-07-15  3:01 ` [PATCH v3 2/8] powerpc/perf/hv-24x7: Simplify extracting counter from result buffer Sukadev Bhattiprolu
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-15  3:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, linux-s390, sparclinux

Fix parameter alignment to be consistent with coding style.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-24x7.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index df95629..9d73c69 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -416,7 +416,7 @@ out_val:
 }
 
 static struct attribute *event_to_desc_attr(struct hv_24x7_event_data *event,
-				int nonce)
+					    int nonce)
 {
 	int nl, dl;
 	char *name = event_name(event, &nl);
@@ -444,7 +444,7 @@ event_to_long_desc_attr(struct hv_24x7_event_data *event, int nonce)
 }
 
 static ssize_t event_data_to_attrs(unsigned ix, struct attribute **attrs,
-		struct hv_24x7_event_data *event, int nonce)
+				   struct hv_24x7_event_data *event, int nonce)
 {
 	unsigned i;
 
@@ -512,7 +512,7 @@ static int memord(const void *d1, size_t s1, const void *d2, size_t s2)
 }
 
 static int ev_uniq_ord(const void *v1, size_t s1, unsigned d1, const void *v2,
-					size_t s2, unsigned d2)
+		       size_t s2, unsigned d2)
 {
 	int r = memord(v1, s1, v2, s2);
 
@@ -526,7 +526,7 @@ static int ev_uniq_ord(const void *v1, size_t s1, unsigned d1, const void *v2,
 }
 
 static int event_uniq_add(struct rb_root *root, const char *name, int nl,
-				unsigned domain)
+			  unsigned domain)
 {
 	struct rb_node **new = &(root->rb_node), *parent = NULL;
 	struct event_uniq *data;
@@ -650,8 +650,8 @@ static ssize_t catalog_event_len_validate(struct hv_24x7_event_data *event,
 #define MAX_4K (SIZE_MAX / 4096)
 
 static int create_events_from_catalog(struct attribute ***events_,
-		struct attribute ***event_descs_,
-		struct attribute ***event_long_descs_)
+				      struct attribute ***event_descs_,
+				      struct attribute ***event_long_descs_)
 {
 	unsigned long hret;
 	size_t catalog_len, catalog_page_len, event_entry_count,
@@ -1008,8 +1008,8 @@ static const struct attribute_group *attr_groups[] = {
 };
 
 static void log_24x7_hcall(struct hv_24x7_request_buffer *request_buffer,
-			struct hv_24x7_data_result_buffer *result_buffer,
-			unsigned long ret)
+			   struct hv_24x7_data_result_buffer *result_buffer,
+			   unsigned long ret)
 {
 	struct hv_24x7_request *req;
 
@@ -1026,7 +1026,7 @@ static void log_24x7_hcall(struct hv_24x7_request_buffer *request_buffer,
  * Start the process for a new H_GET_24x7_DATA hcall.
  */
 static void init_24x7_request(struct hv_24x7_request_buffer *request_buffer,
-			struct hv_24x7_data_result_buffer *result_buffer)
+			      struct hv_24x7_data_result_buffer *result_buffer)
 {
 
 	memset(request_buffer, 0, 4096);
@@ -1041,7 +1041,7 @@ static void init_24x7_request(struct hv_24x7_request_buffer *request_buffer,
  * by 'init_24x7_request()' and 'add_event_to_24x7_request()'.
  */
 static int make_24x7_request(struct hv_24x7_request_buffer *request_buffer,
-			struct hv_24x7_data_result_buffer *result_buffer)
+			     struct hv_24x7_data_result_buffer *result_buffer)
 {
 	unsigned long ret;
 
-- 
1.7.9.5


* [PATCH v3 2/8] powerpc/perf/hv-24x7: Simplify extracting counter from result buffer
  2015-07-15  3:01 [PATCH v3 0/8] Implement group-read of events using txn interface Sukadev Bhattiprolu
  2015-07-15  3:01 ` [PATCH v3 1/8] powerpc/perf/hv-24x7: Whitespace - fix parameter alignment Sukadev Bhattiprolu
@ 2015-07-15  3:01 ` Sukadev Bhattiprolu
  2015-08-03  1:35   ` [v3, " Michael Ellerman
  2015-07-15  3:01 ` [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces Sukadev Bhattiprolu
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-15  3:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, linux-s390, sparclinux

Simplify code that extracts a 24x7 counter from the HCALL's result buffer.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-24x7.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 9d73c69..4d1a8d1 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1104,7 +1104,7 @@ static unsigned long single_24x7_request(struct perf_event *event, u64 *count)
 	unsigned long ret;
 	struct hv_24x7_request_buffer *request_buffer;
 	struct hv_24x7_data_result_buffer *result_buffer;
-	struct hv_24x7_result *resb;
+	__be64 val;
 
 	BUILD_BUG_ON(sizeof(*request_buffer) > 4096);
 	BUILD_BUG_ON(sizeof(*result_buffer) > 4096);
@@ -1125,8 +1125,8 @@ static unsigned long single_24x7_request(struct perf_event *event, u64 *count)
 	}
 
 	/* process result from hcall */
-	resb = &result_buffer->results[0];
-	*count = be64_to_cpu(resb->elements[0].element_data[0]);
+	val = result_buffer->results[0].elements[0].element_data[0];
+	*count = be64_to_cpu(val);
 
 out:
 	put_cpu_var(hv_24x7_reqb);
-- 
1.7.9.5


* [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces
  2015-07-15  3:01 [PATCH v3 0/8] Implement group-read of events using txn interface Sukadev Bhattiprolu
  2015-07-15  3:01 ` [PATCH v3 1/8] powerpc/perf/hv-24x7: Whitespace - fix parameter alignment Sukadev Bhattiprolu
  2015-07-15  3:01 ` [PATCH v3 2/8] powerpc/perf/hv-24x7: Simplify extracting counter from result buffer Sukadev Bhattiprolu
@ 2015-07-15  3:01 ` Sukadev Bhattiprolu
  2015-07-16 20:17   ` Peter Zijlstra
  2015-07-16 20:48   ` Peter Zijlstra
  2015-07-15  3:01 ` [PATCH v3 4/8] perf: Split perf_event_read() and perf_event_count() Sukadev Bhattiprolu
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-15  3:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, linux-s390, sparclinux

Currently, the PMU interface allows reading only one counter at a time.
But some PMUs, like the 24x7 counters in Power, support reading several
counters at once. To leverage this functionality, extend the transaction
interface to support a "transaction type".

The first type, PERF_PMU_TXN_ADD, refers to the existing transactions,
i.e. those used to _schedule_ all the events on the PMU as a group. A
second transaction type, PERF_PMU_TXN_READ, will be used in a follow-on
patch by the 24x7 counters to read several counters at once.

Extend the transaction interfaces to the PMU to accept a 'txn_flags'
parameter and use this parameter to ignore any transactions that are
not of type PERF_PMU_TXN_ADD.
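
Condensed, the idiom that the diff below applies to each architecture
looks like this (a sketch only; the cpu_hw_events details vary per
arch, and ->cancel_txn() clears the saved flags the same way):

	static void arch_pmu_start_txn(struct pmu *pmu, int txn_flags)
	{
		struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);

		cpuhw->txn_flags = txn_flags;	/* remember the txn type */
		if (txn_flags & ~PERF_PMU_TXN_ADD)
			return;			/* nothing to do for non-ADD */

		perf_pmu_disable(pmu);
		/* ... existing ADD-transaction setup ... */
	}

	static int arch_pmu_commit_txn(struct pmu *pmu)
	{
		struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
		int txn_flags = cpuhw->txn_flags;

		cpuhw->txn_flags = 0;		/* always clear the saved type */
		if (txn_flags & ~PERF_PMU_TXN_ADD)
			return 0;

		/* ... existing schedulability test ... */
		return 0;
	}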

Thanks to Peter Zijlstra for his input.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

Changelog[v3]
	- [Peter Zijlstra] Ensure the nop_txn interfaces disable/enable
	  PMU only for TXN_ADD transactions.
	- [Peter Zijlstra] Cache the flags parameter in ->start_txn() and
	  drop the flags parameter from ->commit_txn() and ->cancel_txn().
---
 arch/powerpc/perf/core-book3s.c  |   25 ++++++++++++++++++-
 arch/s390/kernel/perf_cpum_cf.c  |   24 +++++++++++++++++-
 arch/sparc/kernel/perf_event.c   |   19 +++++++++++++-
 arch/x86/kernel/cpu/perf_event.c |   27 ++++++++++++++++++--
 arch/x86/kernel/cpu/perf_event.h |    1 +
 include/linux/perf_event.h       |   14 ++++++++---
 kernel/events/core.c             |   51 +++++++++++++++++++++++++++++++++++---
 7 files changed, 150 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index d90893b..b92084b 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -50,6 +50,7 @@ struct cpu_hw_events {
 
 	unsigned int group_flag;
 	int n_txn_start;
+	int txn_flags;
 
 	/* BHRB bits */
 	u64				bhrb_filter;	/* BHRB HW branch filter */
@@ -1586,11 +1587,19 @@ static void power_pmu_stop(struct perf_event *event, int ef_flags)
  * Start group events scheduling transaction
  * Set the flag to make pmu::enable() not perform the
  * schedulability test, it will be performed at commit time
+ *
+ * We only support PERF_PMU_TXN_ADD transactions. Save the
+ * transaction flags but otherwise ignore non-PERF_PMU_TXN_ADD
+ * transactions.
  */
-static void power_pmu_start_txn(struct pmu *pmu)
+static void power_pmu_start_txn(struct pmu *pmu, int txn_flags)
 {
 	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
 
+	cpuhw->txn_flags = txn_flags;
+	if (txn_flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	perf_pmu_disable(pmu);
 	cpuhw->group_flag |= PERF_EVENT_TXN;
 	cpuhw->n_txn_start = cpuhw->n_events;
@@ -1604,6 +1613,12 @@ static void power_pmu_start_txn(struct pmu *pmu)
 static void power_pmu_cancel_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
+	int txn_flags;
+
+	txn_flags = cpuhw->txn_flags;
+	cpuhw->txn_flags = 0;
+	if (cpuhw->txn_flags & ~PERF_PMU_TXN_ADD)
+		return;
 
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
 	perf_pmu_enable(pmu);
@@ -1618,10 +1633,18 @@ static int power_pmu_commit_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
 	long i, n;
+	int txn_flags;
 
 	if (!ppmu)
 		return -EAGAIN;
+
 	cpuhw = this_cpu_ptr(&cpu_hw_events);
+
+	txn_flags = cpuhw->txn_flags;
+	cpuhw->txn_flags = 0;
+	if (cpuhw->txn_flags & ~PERF_PMU_TXN_ADD)
+		return 0;
+
 	n = cpuhw->n_events;
 	if (check_excludes(cpuhw->event, cpuhw->flags, 0, n))
 		return -EAGAIN;
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index 56fdad4..a6f9e7b 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -72,6 +72,7 @@ struct cpu_hw_events {
 	atomic_t		ctr_set[CPUMF_CTR_SET_MAX];
 	u64			state, tx_state;
 	unsigned int		flags;
+	int			txn_flags;
 };
 static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
 	.ctr_set = {
@@ -82,6 +83,7 @@ static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
 	},
 	.state = 0,
 	.flags = 0,
+	.txn_flags = 0,
 };
 
 static int get_counter_set(u64 event)
@@ -572,11 +574,19 @@ static void cpumf_pmu_del(struct perf_event *event, int flags)
 /*
  * Start group events scheduling transaction.
  * Set flags to perform a single test at commit time.
+ *
+ * We only support PERF_PMU_TXN_ADD transactions. Save the
+ * transaction flags but otherwise ignore non-PERF_PMU_TXN_ADD
+ * transactions.
  */
-static void cpumf_pmu_start_txn(struct pmu *pmu)
+static void cpumf_pmu_start_txn(struct pmu *pmu, int txn_flags)
 {
 	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
 
+	cpuhw->txn_flags = txn_flags;
+	if (txn_flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	perf_pmu_disable(pmu);
 	cpuhw->flags |= PERF_EVENT_TXN;
 	cpuhw->tx_state = cpuhw->state;
@@ -589,8 +599,14 @@ static void cpumf_pmu_start_txn(struct pmu *pmu)
  */
 static void cpumf_pmu_cancel_txn(struct pmu *pmu)
 {
+	int txn_flags;
 	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
 
+	txn_flags = cpuhw->txn_flags;
+	cpuhw->txn_flags = 0;
+	if (txn_flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	WARN_ON(cpuhw->tx_state != cpuhw->state);
 
 	cpuhw->flags &= ~PERF_EVENT_TXN;
@@ -604,9 +620,15 @@ static void cpumf_pmu_cancel_txn(struct pmu *pmu)
  */
 static int cpumf_pmu_commit_txn(struct pmu *pmu)
 {
+	int txn_flags;
 	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
 	u64 state;
 
+	txn_flags = cpuhw->txn_flags;
+	cpuhw->txn_flags = 0;
+	if (txn_flags & ~PERF_PMU_TXN_ADD)
+		return 0;
+
 	/* check if the updated state can be scheduled */
 	state = cpuhw->state & ~((1 << CPUMF_LCCTL_ENABLE_SHIFT) - 1);
 	state >>= CPUMF_LCCTL_ENABLE_SHIFT;
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index 689db65..b5fcbce 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -109,6 +109,7 @@ struct cpu_hw_events {
 	int			enabled;
 
 	unsigned int		group_flag;
+	int			txn_flags;
 };
 static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = { .enabled = 1, };
 
@@ -1494,10 +1495,14 @@ static int sparc_pmu_event_init(struct perf_event *event)
  * Set the flag to make pmu::enable() not perform the
  * schedulability test, it will be performed at commit time
  */
-static void sparc_pmu_start_txn(struct pmu *pmu)
+static void sparc_pmu_start_txn(struct pmu *pmu, int txn_flags)
 {
 	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
 
+	cpuhw->txn_flags = txn_flags;
+	if (txn_flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	perf_pmu_disable(pmu);
 	cpuhw->group_flag |= PERF_EVENT_TXN;
 }
@@ -1510,6 +1515,12 @@ static void sparc_pmu_start_txn(struct pmu *pmu)
 static void sparc_pmu_cancel_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
+	int txn_flags;
+
+	txn_flags = cpuhw->txn_flags;
+	cpuhw->txn_flags = 0;
+	if (txn_flags & ~PERF_PMU_TXN_ADD)
+		return;
 
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
 	perf_pmu_enable(pmu);
@@ -1524,11 +1535,17 @@ static int sparc_pmu_commit_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	int n;
+	int txn_flags;
 
 	if (!sparc_pmu)
 		return -EINVAL;
 
 	cpuc = this_cpu_ptr(&cpu_hw_events);
+	txn_flags = cpuc->txn_flags;
+	cpuc->txn_flags = 0;
+	if (txn_flags & ~PERF_PMU_TXN_ADD)
+		return 0;
+
 	n = cpuc->n_events;
 	if (check_excludes(cpuc->event, 0, n))
 		return -EINVAL;
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 3658de4..3a7e08f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1748,9 +1748,19 @@ static inline void x86_pmu_read(struct perf_event *event)
  * Start group events scheduling transaction
  * Set the flag to make pmu::enable() not perform the
  * schedulability test, it will be performed at commit time
+ *
+ * We only support PERF_PMU_TXN_ADD transactions. Save the
+ * transaction flags but otherwise ignore non-PERF_PMU_TXN_ADD
+ * transactions.
  */
-static void x86_pmu_start_txn(struct pmu *pmu)
+static void x86_pmu_start_txn(struct pmu *pmu, int txn_flags)
 {
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	cpuc->txn_flags = txn_flags;
+	if (txn_flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	perf_pmu_disable(pmu);
 	__this_cpu_or(cpu_hw_events.group_flag, PERF_EVENT_TXN);
 	__this_cpu_write(cpu_hw_events.n_txn, 0);
@@ -1763,6 +1773,14 @@ static void x86_pmu_start_txn(struct pmu *pmu)
  */
 static void x86_pmu_cancel_txn(struct pmu *pmu)
 {
+	int txn_flags;
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	txn_flags = cpuc->txn_flags;
+	cpuc->txn_flags = 0;
+	if (txn_flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	__this_cpu_and(cpu_hw_events.group_flag, ~PERF_EVENT_TXN);
 	/*
 	 * Truncate collected array by the number of events added in this
@@ -1784,7 +1802,12 @@ static int x86_pmu_commit_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	int assign[X86_PMC_IDX_MAX];
-	int n, ret;
+	int n, ret, txn_flags;
+
+	txn_flags = cpuc->txn_flags;
+	cpuc->txn_flags = 0;
+	if (txn_flags & ~PERF_PMU_TXN_ADD)
+		return 0;
 
 	n = cpuc->n_events;
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3e7fd27..87b6bb7 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -197,6 +197,7 @@ struct cpu_hw_events {
 
 	unsigned int		group_flag;
 	int			is_fake;
+	int			txn_flags;
 
 	/*
 	 * Intel DebugStore bits
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2027809..44bf05f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -168,6 +168,8 @@ struct perf_event;
  */
 #define PERF_EVENT_TXN 0x1
 
+#define PERF_PMU_TXN_ADD  0x1		/* txn to add/schedule event on PMU */
+
 /**
  * pmu::capabilities flags
  */
@@ -252,20 +254,26 @@ struct pmu {
 	 *
 	 * Start the transaction, after this ->add() doesn't need to
 	 * do schedulability tests.
+	 *
+	 * Optional.
 	 */
-	void (*start_txn)		(struct pmu *pmu); /* optional */
+	void (*start_txn)		(struct pmu *pmu, int txn_flags);
 	/*
 	 * If ->start_txn() disabled the ->add() schedulability test
 	 * then ->commit_txn() is required to perform one. On success
 	 * the transaction is closed. On error the transaction is kept
 	 * open until ->cancel_txn() is called.
+	 *
+	 * Optional.
 	 */
-	int  (*commit_txn)		(struct pmu *pmu); /* optional */
+	int  (*commit_txn)		(struct pmu *pmu);
 	/*
 	 * Will cancel the transaction, assumes ->del() is called
 	 * for each successful ->add() during the transaction.
+	 *
+	 * Optional.
 	 */
-	void (*cancel_txn)		(struct pmu *pmu); /* optional */
+	void (*cancel_txn)		(struct pmu *pmu);
 
 	/*
 	 * Will return the value for perf_event_mmap_page::index for this event,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d3dae34..b79aad2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1913,7 +1913,7 @@ group_sched_in(struct perf_event *group_event,
 	if (group_event->state == PERF_EVENT_STATE_OFF)
 		return 0;
 
-	pmu->start_txn(pmu);
+	pmu->start_txn(pmu, PERF_PMU_TXN_ADD);
 
 	if (event_sched_in(group_event, cpuctx, ctx)) {
 		pmu->cancel_txn(pmu);
@@ -7074,24 +7074,69 @@ static void perf_pmu_nop_void(struct pmu *pmu)
 {
 }
 
+static void perf_pmu_nop_txn(struct pmu *pmu, int flags)
+{
+}
+
 static int perf_pmu_nop_int(struct pmu *pmu)
 {
 	return 0;
 }
 
-static void perf_pmu_start_txn(struct pmu *pmu)
+DEFINE_PER_CPU(int, nop_txn_flags);
+
+static int nop_txn_flags_get_and_clear(void)
+{
+	int *flagsp;
+	int flags;
+
+	flagsp = &get_cpu_var(nop_txn_flags);
+
+	flags = *flagsp;
+	*flagsp = 0;
+
+	put_cpu_var(nop_txn_flags);
+
+	return flags;
+}
+
+static void nop_txn_flags_set(int flags)
+{
+	int *flagsp;
+
+	flagsp = &get_cpu_var(nop_txn_flags);
+	*flagsp = flags;
+	put_cpu_var(nop_txn_flags);
+}
+
+static void perf_pmu_start_txn(struct pmu *pmu, int flags)
 {
+	nop_txn_flags_set(flags);
+
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	perf_pmu_disable(pmu);
 }
 
 static int perf_pmu_commit_txn(struct pmu *pmu)
 {
+	int flags = nop_txn_flags_get_and_clear();
+
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return 0;
+
 	perf_pmu_enable(pmu);
 	return 0;
 }
 
 static void perf_pmu_cancel_txn(struct pmu *pmu)
 {
+	int flags = nop_txn_flags_get_and_clear();
+
+	if (flags & ~PERF_PMU_TXN_ADD)
+		return;
+
 	perf_pmu_enable(pmu);
 }
 
@@ -7330,7 +7375,7 @@ got_cpu_context:
 			pmu->commit_txn = perf_pmu_commit_txn;
 			pmu->cancel_txn = perf_pmu_cancel_txn;
 		} else {
-			pmu->start_txn  = perf_pmu_nop_void;
+			pmu->start_txn  = perf_pmu_nop_txn;
 			pmu->commit_txn = perf_pmu_nop_int;
 			pmu->cancel_txn = perf_pmu_nop_void;
 		}
-- 
1.7.9.5


* [PATCH v3 4/8] perf: Split perf_event_read() and perf_event_count()
  2015-07-15  3:01 [PATCH v3 0/8] Implement group-read of events using txn interface Sukadev Bhattiprolu
                   ` (2 preceding siblings ...)
  2015-07-15  3:01 ` [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces Sukadev Bhattiprolu
@ 2015-07-15  3:01 ` Sukadev Bhattiprolu
  2015-07-15  3:01 ` [PATCH v3 5/8] perf: Split perf_event_read_value() Sukadev Bhattiprolu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-15  3:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, linux-s390, sparclinux

perf_event_read() does two things:

	- call the PMU to read/update the counter value, and
	- compute the total count of the event and its children

Not all callers need both. perf_event_reset(), for instance, needs the
first piece but not the second.  Similarly, when we implement the ability
to read a group of events using the transaction interface, we will need
the two pieces done independently.

Break up perf_event_read() and have it just read/update the counter
and have the callers compute the total count if necessary.
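
The resulting caller pattern, taken from the diff below:

	/* Before: read the PMU and return the total in one call */
	total += perf_event_read(event);

	/* After: read/update the counter, then total it separately */
	perf_event_read(event);
	total += perf_event_count(event);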

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 kernel/events/core.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b79aad2..44fb89d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3212,7 +3212,7 @@ static inline u64 perf_event_count(struct perf_event *event)
 	return __perf_event_count(event);
 }
 
-static u64 perf_event_read(struct perf_event *event)
+static void perf_event_read(struct perf_event *event)
 {
 	/*
 	 * If event is enabled and currently active on a CPU, update the
@@ -3238,8 +3238,6 @@ static u64 perf_event_read(struct perf_event *event)
 		update_event_times(event);
 		raw_spin_unlock_irqrestore(&ctx->lock, flags);
 	}
-
-	return perf_event_count(event);
 }
 
 /*
@@ -3751,14 +3749,18 @@ u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
 	*running = 0;
 
 	mutex_lock(&event->child_mutex);
-	total += perf_event_read(event);
+
+	perf_event_read(event);
+	total += perf_event_count(event);
+
 	*enabled += event->total_time_enabled +
 			atomic64_read(&event->child_total_time_enabled);
 	*running += event->total_time_running +
 			atomic64_read(&event->child_total_time_running);
 
 	list_for_each_entry(child, &event->child_list, child_list) {
-		total += perf_event_read(child);
+		perf_event_read(child);
+		total += perf_event_count(child);
 		*enabled += child->total_time_enabled;
 		*running += child->total_time_running;
 	}
@@ -3918,7 +3920,7 @@ static unsigned int perf_poll(struct file *file, poll_table *wait)
 
 static void _perf_event_reset(struct perf_event *event)
 {
-	(void)perf_event_read(event);
+	perf_event_read(event);
 	local64_set(&event->count, 0);
 	perf_event_update_userpage(event);
 }
-- 
1.7.9.5


* [PATCH v3 5/8] perf: Split perf_event_read_value()
  2015-07-15  3:01 [PATCH v3 0/8] Implement group-read of events using txn interface Sukadev Bhattiprolu
                   ` (3 preceding siblings ...)
  2015-07-15  3:01 ` [PATCH v3 4/8] perf: Split perf_event_read() and perf_event_count() Sukadev Bhattiprolu
@ 2015-07-15  3:01 ` Sukadev Bhattiprolu
  2015-07-16 21:12   ` Peter Zijlstra
  2015-07-23  7:45   ` Peter Zijlstra
  2015-07-15  3:01 ` [PATCH v3 6/8] perf: Rename perf_event_read_{one,group}, perf_read_hw Sukadev Bhattiprolu
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-15  3:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, linux-s390, sparclinux

Move the part of perf_event_read_value() that computes the event
counts and event times into a new function, perf_event_compute().

This would allow us to call perf_event_compute() independently.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

Changelog[v3]
	Rather than move perf_event_read() into callers and then
	rename, just move the computations into a separate function
	(redesign to address comment from Peter Zijlstra).
---
 kernel/events/core.c |   37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 44fb89d..b1e9a42 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3704,6 +3704,29 @@ static int perf_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static u64 perf_event_compute(struct perf_event *event, u64 *enabled,
+			      u64 *running)
+{
+	struct perf_event *child;
+	u64 total;
+
+	total = perf_event_count(event);
+
+	*enabled += event->total_time_enabled +
+			atomic64_read(&event->child_total_time_enabled);
+	*running += event->total_time_running +
+			atomic64_read(&event->child_total_time_running);
+
+	list_for_each_entry(child, &event->child_list, child_list) {
+		perf_event_read(child);
+		total += perf_event_count(child);
+		*enabled += child->total_time_enabled;
+		*running += child->total_time_running;
+	}
+
+	return total;
+}
+
 /*
  * Remove all orphanes events from the context.
  */
@@ -3742,7 +3765,6 @@ static void orphans_remove_work(struct work_struct *work)
 
 u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
 {
-	struct perf_event *child;
 	u64 total = 0;
 
 	*enabled = 0;
@@ -3751,19 +3773,8 @@ u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
 	mutex_lock(&event->child_mutex);
 
 	perf_event_read(event);
-	total += perf_event_count(event);
+	total = perf_event_compute(event, enabled, running);
 
-	*enabled += event->total_time_enabled +
-			atomic64_read(&event->child_total_time_enabled);
-	*running += event->total_time_running +
-			atomic64_read(&event->child_total_time_running);
-
-	list_for_each_entry(child, &event->child_list, child_list) {
-		perf_event_read(child);
-		total += perf_event_count(child);
-		*enabled += child->total_time_enabled;
-		*running += child->total_time_running;
-	}
 	mutex_unlock(&event->child_mutex);
 
 	return total;
-- 
1.7.9.5


* [PATCH v3 6/8] perf: Rename perf_event_read_{one,group}, perf_read_hw
  2015-07-15  3:01 [PATCH v3 0/8] Implement group-read of events using txn interface Sukadev Bhattiprolu
                   ` (4 preceding siblings ...)
  2015-07-15  3:01 ` [PATCH v3 5/8] perf: Split perf_event_read_value() Sukadev Bhattiprolu
@ 2015-07-15  3:01 ` Sukadev Bhattiprolu
  2015-07-15  3:01 ` [PATCH v3 7/8] perf: Define PMU_TXN_READ interface Sukadev Bhattiprolu
  2015-07-15  3:01 ` [PATCH v3 8/8] powerpc/perf/hv-24x7: Use PMU_TXN_READ interface Sukadev Bhattiprolu
  7 siblings, 0 replies; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-15  3:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, linux-s390, sparclinux

From: "Peter Zijlstra (Intel)" <peterz@infradead.org>

In order to free up the perf_event_read_group() name:

 s/perf_event_read_\(one\|group\)/perf_read_\1/g
 s/perf_read_hw/__perf_read/g

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b1e9a42..a83d45c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3675,7 +3675,7 @@ static void put_event(struct perf_event *event)
 	 *     see the comment there.
 	 *
 	 *  2) there is a lock-inversion with mmap_sem through
-	 *     perf_event_read_group(), which takes faults while
+	 *     perf_read_group(), which takes faults while
 	 *     holding ctx->mutex, however this is called after
 	 *     the last filedesc died, so there is no possibility
 	 *     to trigger the AB-BA case.
@@ -3781,7 +3781,7 @@ u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
 }
 EXPORT_SYMBOL_GPL(perf_event_read_value);
 
-static int perf_event_read_group(struct perf_event *event,
+static int perf_read_group(struct perf_event *event,
 				   u64 read_format, char __user *buf)
 {
 	struct perf_event *leader = event->group_leader, *sub;
@@ -3829,7 +3829,7 @@ static int perf_event_read_group(struct perf_event *event,
 	return ret;
 }
 
-static int perf_event_read_one(struct perf_event *event,
+static int perf_read_one(struct perf_event *event,
 				 u64 read_format, char __user *buf)
 {
 	u64 enabled, running;
@@ -3867,7 +3867,7 @@ static bool is_event_hup(struct perf_event *event)
  * Read the performance event - simple non blocking version for now
  */
 static ssize_t
-perf_read_hw(struct perf_event *event, char __user *buf, size_t count)
+__perf_read(struct perf_event *event, char __user *buf, size_t count)
 {
 	u64 read_format = event->attr.read_format;
 	int ret;
@@ -3885,9 +3885,9 @@ perf_read_hw(struct perf_event *event, char __user *buf, size_t count)
 
 	WARN_ON_ONCE(event->ctx->parent_ctx);
 	if (read_format & PERF_FORMAT_GROUP)
-		ret = perf_event_read_group(event, read_format, buf);
+		ret = perf_read_group(event, read_format, buf);
 	else
-		ret = perf_event_read_one(event, read_format, buf);
+		ret = perf_read_one(event, read_format, buf);
 
 	return ret;
 }
@@ -3900,7 +3900,7 @@ perf_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 	int ret;
 
 	ctx = perf_event_ctx_lock(event);
-	ret = perf_read_hw(event, buf, count);
+	ret = __perf_read(event, buf, count);
 	perf_event_ctx_unlock(event, ctx);
 
 	return ret;
-- 
1.7.9.5


* [PATCH v3 7/8] perf: Define PMU_TXN_READ interface
  2015-07-15  3:01 [PATCH v3 0/8] Implement group-read of events using txn interface Sukadev Bhattiprolu
                   ` (5 preceding siblings ...)
  2015-07-15  3:01 ` [PATCH v3 6/8] perf: Rename perf_event_read_{one,group}, perf_read_hw Sukadev Bhattiprolu
@ 2015-07-15  3:01 ` Sukadev Bhattiprolu
  2015-07-16 22:20   ` Peter Zijlstra
  2015-07-15  3:01 ` [PATCH v3 8/8] powerpc/perf/hv-24x7: Use PMU_TXN_READ interface Sukadev Bhattiprolu
  7 siblings, 1 reply; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-15  3:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, linux-s390, sparclinux

Define a new PERF_PMU_TXN_READ interface to read a group of counters
at once. Note that we use this interface with all PMUs.

PMUs that implement this interface use the ->read() operation to _queue_
the counters to be read and use ->commit_txn() to actually read all the
queued counters at once.

PMUs that don't implement PERF_PMU_TXN_READ ignore ->start_txn() and
->commit_txn() and continue to read counters one at a time.

Thanks to input from Peter Zijlstra.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 include/linux/perf_event.h |    1 +
 kernel/events/core.c       |   35 +++++++++++++++++++++++++++++++++--
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 44bf05f..da307ad 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -169,6 +169,7 @@ struct perf_event;
 #define PERF_EVENT_TXN 0x1
 
 #define PERF_PMU_TXN_ADD  0x1		/* txn to add/schedule event on PMU */
+#define PERF_PMU_TXN_READ 0x2		/* txn to read event group from PMU */
 
 /**
  * pmu::capabilities flags
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a83d45c..2ea06c4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3763,6 +3763,33 @@ static void orphans_remove_work(struct work_struct *work)
 	put_ctx(ctx);
 }
 
+/*
+ * Use the transaction interface to read the group of events in @leader.
+ * PMUs like the 24x7 counters in Power can use this to queue the events
+ * in the ->read() operation and perform the actual read in ->commit_txn.
+ *
+ * Other PMUs can ignore the ->start_txn and ->commit_txn and read each
+ * event directly in the ->read() operation.
+ */
+static int perf_event_read_group(struct perf_event *leader)
+{
+	int ret;
+	struct perf_event *sub;
+	struct pmu *pmu;
+
+	pmu = leader->pmu;
+
+	pmu->start_txn(pmu, PERF_PMU_TXN_READ);
+
+	perf_event_read(leader);
+	list_for_each_entry(sub, &leader->sibling_list, group_entry)
+		perf_event_read(sub);
+
+	ret = pmu->commit_txn(pmu);
+
+	return ret;
+}
+
 u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
 {
 	u64 total = 0;
@@ -3792,7 +3819,11 @@ static int perf_read_group(struct perf_event *event,
 
 	lockdep_assert_held(&ctx->mutex);
 
-	count = perf_event_read_value(leader, &enabled, &running);
+	ret = perf_event_read_group(leader);
+	if (ret)
+		return ret;
+
+	count = perf_event_compute(leader, &enabled, &running);
 
 	values[n++] = 1 + leader->nr_siblings;
 	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
@@ -3813,7 +3844,7 @@ static int perf_read_group(struct perf_event *event,
 	list_for_each_entry(sub, &leader->sibling_list, group_entry) {
 		n = 0;
 
-		values[n++] = perf_event_read_value(sub, &enabled, &running);
+		values[n++] = perf_event_compute(sub, &enabled, &running);
 		if (read_format & PERF_FORMAT_ID)
 			values[n++] = primary_event_id(sub);
 
-- 
1.7.9.5


* [PATCH v3 8/8] powerpc/perf/hv-24x7: Use PMU_TXN_READ interface
  2015-07-15  3:01 [PATCH v3 0/8] Implement group-read of events using txn interface Sukadev Bhattiprolu
                   ` (6 preceding siblings ...)
  2015-07-15  3:01 ` [PATCH v3 7/8] perf: Define PMU_TXN_READ interface Sukadev Bhattiprolu
@ 2015-07-15  3:01 ` Sukadev Bhattiprolu
  7 siblings, 0 replies; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-15  3:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, linux-s390, sparclinux

The 24x7 counters in Powerpc allow monitoring a large number of events
simultaneously. They also allow reading several counters in a single
HCALL, so we can get a more consistent snapshot of the system.

Use the PMU's transaction interface to monitor and read several event
counters at once. The idea is that users can group several 24x7 events
into a single group of events. We use the following logic to submit
the group of events to the PMU and read the values:

	pmu->start_txn()		// Initialize before first event

	for each event in group
		pmu->read(event);	// Queue each event to be read

	pmu->commit_txn()		// Read/update all queued counters

The ->commit_txn() also updates the event counts in the respective
perf_event objects.  The perf subsystem can then get the event counts
directly from the perf_event objects and avoid submitting a new
->read() request to the PMU.

Thanks to input from Peter Zijlstra.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

Changelog[v3]
	[Peter Zijlstra] Save the transaction state in ->start_txn() and
	drop the flags parameter from ->commit_txn() and ->cancel_txn().
	[Peter Zijlstra] The nop txn interfaces don't need to disable/enable
	PMU for PERF_PMU_TXN_READ transactions.
---
 arch/powerpc/perf/hv-24x7.c |  160 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 157 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 4d1a8d1..c28ef3f 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -142,8 +142,24 @@ static struct attribute_group event_long_desc_group = {
 
 static struct kmem_cache *hv_page_cache;
 
+struct h_24x7_hw {
+	int txn_err;
+	int txn_flags;
+	struct perf_event *events[255];
+} h24x7hw;
+
 /*
- * request_buffer and result_buffer are not required to be 4k aligned,
+ * The request and result buffers are also used in interrupt context
+ * (eg: we read/update the event counts in h_24x7_event_stop()). Rather
+ * than allocating buffers in interrupt context (i.e before each HCALL),
+ * pre-allocate per-CPU request and result buffers.
+ *
+ * However, for the transaction interface, the ->start_txn(), where the
+ * buffers are initialized and the ->read() operations (where the buffers
+ * are used) are not guaranteed to be on the same CPU. Hence, we cannot
+ * use the per-CPU buffers. Use PMU-wide request and result buffers instead.
+ *
+ * Note that request and result buffers are not required to be 4k aligned,
  * but are not allowed to cross any 4k boundary. Aligning them to 4k is
  * the simplest way to ensure that.
  */
@@ -151,6 +167,9 @@ static struct kmem_cache *hv_page_cache;
 DEFINE_PER_CPU(char, hv_24x7_reqb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
 DEFINE_PER_CPU(char, hv_24x7_resb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
 
+char hv_24x7_txn_reqb[H24x7_DATA_BUFFER_SIZE] __aligned(4096);
+char hv_24x7_txn_resb[H24x7_DATA_BUFFER_SIZE] __aligned(4096);
+
 static char *event_name(struct hv_24x7_event_data *ev, int *len)
 {
 	*len = be16_to_cpu(ev->event_name_len) - 2;
@@ -1233,9 +1252,42 @@ static void update_event_count(struct perf_event *event, u64 now)
 static void h_24x7_event_read(struct perf_event *event)
 {
 	u64 now;
+	struct hv_24x7_request_buffer *request_buffer;
+
+	/*
+	 * If in a READ transaction, add this counter to the list of
+	 * counters to read during the next HCALL (i.e. commit_txn()).
+	 * If not in a READ transaction, go ahead and make the HCALL
+	 * to read this counter by itself.
+	 */
+
+	if (h24x7hw.txn_flags & PERF_PMU_TXN_READ) {
+		int i;
+		int ret;
+
+		if (h24x7hw.txn_err)
+			return;
+
+		request_buffer = (void *)&hv_24x7_txn_reqb[0];
+
+		ret = add_event_to_24x7_request(event, request_buffer);
+		if (ret) {
+			h24x7hw.txn_err = ret;
+		} else {
+			/*
+			 * Associate the event with the HCALL request index,
+			 * so ->commit_txn() can quickly find/update count.
+			 */
+			i = request_buffer->num_requests - 1;
+			h24x7hw.events[i] = event;
+		}
+	} else {
+		now = h_24x7_get_value(event);
+		update_event_count(event, now);
+	}
 
-	now = h_24x7_get_value(event);
-	update_event_count(event, now);
 }
 
 static void h_24x7_event_start(struct perf_event *event, int flags)
@@ -1257,6 +1309,105 @@ static int h_24x7_event_add(struct perf_event *event, int flags)
 	return 0;
 }
 
+/*
+ * 24x7 counters only support READ transactions. They are
+ * always counting and don't need/support ADD transactions.
+ * Cache the flags, but otherwise ignore transactions that
+ * are not PERF_PMU_TXN_READ.
+ */
+static void h_24x7_event_start_txn(struct pmu *pmu, int flags)
+{
+	struct hv_24x7_request_buffer *request_buffer;
+	struct hv_24x7_data_result_buffer *result_buffer;
+
+	/* We should not be called if we are already in a txn */
+	WARN_ON_ONCE(h24x7hw.txn_flags);
+
+	h24x7hw.txn_flags = flags;
+	if (flags & ~PERF_PMU_TXN_READ)
+		return;
+
+	request_buffer = (void *)&hv_24x7_txn_reqb[0];
+	result_buffer = (void *)&hv_24x7_txn_resb[0];
+
+	init_24x7_request(request_buffer, result_buffer);
+}
+
+/*
+ * Clean up transaction state.
+ *
+ * NOTE: Ignore state of request and result buffers for now.
+ * 	 We will initialize them during the next read/txn.
+ */
+static void reset_txn(void)
+{
+	h24x7hw.txn_err = 0;
+	h24x7hw.txn_flags = 0;
+}
+
+/*
+ * 24x7 counters only support READ transactions. They are always counting
+ * and don't need/support ADD transactions. Clear ->txn_flags but otherwise
+ * ignore transactions that are not of type PERF_PMU_TXN_READ.
+ *
+ * For READ transactions, submit all pending 24x7 requests (i.e. requests
+ * that were queued by h_24x7_event_read()) to the hypervisor and update
+ * the event counts.
+ */
+static int h_24x7_event_commit_txn(struct pmu *pmu)
+{
+	struct hv_24x7_request_buffer *request_buffer;
+	struct hv_24x7_data_result_buffer *result_buffer;
+	struct hv_24x7_result *resb;
+	struct perf_event *event;
+	u64 count;
+	int i, ret;
+
+	WARN_ON_ONCE(!h24x7hw.txn_flags);
+
+	ret = 0;
+	if (h24x7hw.txn_flags & ~PERF_PMU_TXN_READ)
+		goto out;
+
+	ret = h24x7hw.txn_err;
+	if (h24x7hw.txn_err)
+		goto out;
+
+	request_buffer = (void *)&hv_24x7_txn_reqb[0];
+	result_buffer = (void *)&hv_24x7_txn_resb[0];
+
+	ret = make_24x7_request(request_buffer, result_buffer);
+	if (ret) {
+		log_24x7_hcall(request_buffer, result_buffer, ret);
+		goto out;
+	}
+
+	/* Update event counts from hcall */
+	for (i = 0; i < request_buffer->num_requests; i++) {
+		resb = &result_buffer->results[i];
+		count = be64_to_cpu(resb->elements[0].element_data[0]);
+		event = h24x7hw.events[i];
+		h24x7hw.events[i] = NULL;
+		update_event_count(event, count);
+	}
+
+out:
+	reset_txn();
+	return ret;
+}
+
+/*
+ * 24x7 counters only support READ transactions. They are always counting
+ * and don't need/support ADD transactions. However, regardless of the type
+ * of transaction, all we need to do is clean up, so we don't have to check
+ * the type of transaction.
+ */
+static void h_24x7_event_cancel_txn(struct pmu *pmu)
+{
+	WARN_ON_ONCE(!h24x7hw.txn_flags);
+	reset_txn();
+}
+
 static struct pmu h_24x7_pmu = {
 	.task_ctx_nr = perf_invalid_context,
 
@@ -1268,6 +1419,9 @@ static struct pmu h_24x7_pmu = {
 	.start       = h_24x7_event_start,
 	.stop        = h_24x7_event_stop,
 	.read        = h_24x7_event_read,
+	.start_txn   = h_24x7_event_start_txn,
+	.commit_txn  = h_24x7_event_commit_txn,
+	.cancel_txn  = h_24x7_event_cancel_txn,
 };
 
 static int hv_24x7_init(void)
-- 
1.7.9.5


* Re: [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces
  2015-07-15  3:01 ` [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces Sukadev Bhattiprolu
@ 2015-07-16 20:17   ` Peter Zijlstra
  2015-07-16 21:28     ` Sukadev Bhattiprolu
  2015-07-16 20:48   ` Peter Zijlstra
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2015-07-16 20:17 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman,
	linux-kernel, linuxppc-dev, linux-s390, sparclinux

On Tue, Jul 14, 2015 at 08:01:50PM -0700, Sukadev Bhattiprolu wrote:
> @@ -1604,6 +1613,12 @@ static void power_pmu_start_txn(struct pmu *pmu)
>  static void power_pmu_cancel_txn(struct pmu *pmu)
>  {
>  	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
> +	int txn_flags;
> +
> +	txn_flags = cpuhw->txn_flags;
> +	cpuhw->txn_flags = 0;
> +	if (cpuhw->txn_flags & ~PERF_PMU_TXN_ADD)
> +		return;

That seems, unintentional? ;-)

* Re: [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces
  2015-07-15  3:01 ` [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces Sukadev Bhattiprolu
  2015-07-16 20:17   ` Peter Zijlstra
@ 2015-07-16 20:48   ` Peter Zijlstra
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2015-07-16 20:48 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman,
	linux-kernel, linuxppc-dev, linux-s390, sparclinux

On Tue, Jul 14, 2015 at 08:01:50PM -0700, Sukadev Bhattiprolu wrote:
> +DEFINE_PER_CPU(int, nop_txn_flags);
> +
> +static int nop_txn_flags_get_and_clear(void)
> +{
> +	int *flagsp;
> +	int flags;
> +	
> +	flagsp = &get_cpu_var(nop_txn_flags);
> +
> +	flags = *flagsp;
> +	*flagsp = 0;
> +
> +	put_cpu_var(nop_txn_flags);
> +
> +	return flags;
> +}
> +
> +static void nop_txn_flags_set(int flags)
> +{
> +	int *flagsp;
> +	
> +	flagsp = &get_cpu_var(nop_txn_flags);
> +	*flagsp = flags;
> +	put_cpu_var(nop_txn_flags);
> +}

That's really horrible, see below:

> +static void perf_pmu_start_txn(struct pmu *pmu, int flags)
>  {
	__this_cpu_write(nop_txn_flags, flags);
> +
> +	if (flags & ~PERF_PMU_TXN_ADD)
> +		return;
> +
>  	perf_pmu_disable(pmu);
>  }
>  
>  static int perf_pmu_commit_txn(struct pmu *pmu)
>  {
	int flags = __this_cpu_read(nop_txn_flags);
	__this_cpu_write(nop_txn_flags, 0);
> +
> +	if (flags & ~PERF_PMU_TXN_ADD)
> +		return 0;
> +
>  	perf_pmu_enable(pmu);
>  	return 0;
>  }
>  
>  static void perf_pmu_cancel_txn(struct pmu *pmu)
>  {
	int flags = __this_cpu_read(nop_txn_flags);
	__this_cpu_write(nop_txn_flags, 0);
> +
> +	if (flags & ~PERF_PMU_TXN_ADD)
> +		return;
> +
>  	perf_pmu_enable(pmu);
>  }

* Re: [PATCH v3 5/8] perf: Split perf_event_read_value()
  2015-07-15  3:01 ` [PATCH v3 5/8] perf: Split perf_event_read_value() Sukadev Bhattiprolu
@ 2015-07-16 21:12   ` Peter Zijlstra
  2015-07-16 21:41     ` Sukadev Bhattiprolu
  2015-07-23  7:45   ` Peter Zijlstra
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2015-07-16 21:12 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman,
	linux-kernel, linuxppc-dev, linux-s390, sparclinux

On Tue, Jul 14, 2015 at 08:01:52PM -0700, Sukadev Bhattiprolu wrote:
> Move the part of perf_event_read_value() that computes the event
> counts and event times into a new function, perf_event_compute().
> 
> This would allow us to call perf_event_compute() independently.
> 
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> 
> Changelog[v3]
> 	Rather than move perf_event_read() into callers and then
> 	rename, just move the computations into a separate function
> 	(redesign to address comment from Peter Zijlstra).
> ---

Changelog[] bits go here, below the '---' where they get discarded.

>  kernel/events/core.c |   37 ++++++++++++++++++++++++-------------
>  1 file changed, 24 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 44fb89d..b1e9a42 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3704,6 +3704,29 @@ static int perf_release(struct inode *inode, struct file *file)
>  	return 0;
>  }
>  
> +static u64 perf_event_compute(struct perf_event *event, u64 *enabled,
> +			      u64 *running)

This is a horrible name, 'compute' what?

> +{
> +	struct perf_event *child;
> +	u64 total;
> +
> +	total = perf_event_count(event);
> +
> +	*enabled += event->total_time_enabled +
> +			atomic64_read(&event->child_total_time_enabled);
> +	*running += event->total_time_running +
> +			atomic64_read(&event->child_total_time_running);
> +

	lockdep_assert_held(&event->child_mutex);

> +	list_for_each_entry(child, &event->child_list, child_list) {
> +		perf_event_read(child);
> +		total += perf_event_count(child);
> +		*enabled += child->total_time_enabled;
> +		*running += child->total_time_running;
> +	}
> +
> +	return total;
> +}
> +
>  /*
>   * Remove all orphanes events from the context.
>   */

* Re: [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces
  2015-07-16 20:17   ` Peter Zijlstra
@ 2015-07-16 21:28     ` Sukadev Bhattiprolu
  0 siblings, 0 replies; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-16 21:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman,
	linux-kernel, linuxppc-dev, linux-s390, sparclinux

Peter Zijlstra [peterz@infradead.org] wrote:
| On Tue, Jul 14, 2015 at 08:01:50PM -0700, Sukadev Bhattiprolu wrote:
| > @@ -1604,6 +1613,12 @@ static void power_pmu_start_txn(struct pmu *pmu)
| >  static void power_pmu_cancel_txn(struct pmu *pmu)
| >  {
| >  	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
| > +	int txn_flags;
| > +
| > +	txn_flags = cpuhw->txn_flags;
| > +	cpuhw->txn_flags = 0;
| > +	if (cpuhw->txn_flags & ~PERF_PMU_TXN_ADD)
| > +		return;
| 
| That seems, unintentional? ;-)

Argh. Thanks for catching it.

Sukadev



* Re: [PATCH v3 5/8] perf: Split perf_event_read_value()
  2015-07-16 21:12   ` Peter Zijlstra
@ 2015-07-16 21:41     ` Sukadev Bhattiprolu
  0 siblings, 0 replies; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-16 21:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman,
	linux-kernel, linuxppc-dev, linux-s390, sparclinux

Peter Zijlstra [peterz@infradead.org] wrote:
| On Tue, Jul 14, 2015 at 08:01:52PM -0700, Sukadev Bhattiprolu wrote:
| > Move the part of perf_event_read_value() that computes the event
| > counts and event times into a new function, perf_event_compute().
| > 
| > This would allow us to call perf_event_compute() independently.
| > 
| > Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
| > 
| > Changelog[v3]
| > 	Rather than move perf_event_read() into callers and then
| > 	rename, just move the computations into a separate function
| > 	(redesign to address comment from Peter Zijlstra).
| > ---
| 
| Changelog[] bits go here, below the '---' where they get discarded.

Sorry. Will fix it.

| 
| >  kernel/events/core.c |   37 ++++++++++++++++++++++++-------------
| >  1 file changed, 24 insertions(+), 13 deletions(-)
| > 
| > diff --git a/kernel/events/core.c b/kernel/events/core.c
| > index 44fb89d..b1e9a42 100644
| > --- a/kernel/events/core.c
| > +++ b/kernel/events/core.c
| > @@ -3704,6 +3704,29 @@ static int perf_release(struct inode *inode, struct file *file)
| >  	return 0;
| >  }
| >  
| > +static u64 perf_event_compute(struct perf_event *event, u64 *enabled,
| > +			      u64 *running)
| 
| This is a horrible name, 'compute' what?

We are aggregating event counts and time for children.

Would perf_event_aggregate() or perf_event_aggregate_children()
be better?

| 
| > +{
| > +	struct perf_event *child;
| > +	u64 total;
| > +
| > +	total = perf_event_count(event);
| > +
| > +	*enabled += event->total_time_enabled +
| > +			atomic64_read(&event->child_total_time_enabled);
| > +	*running += event->total_time_running +
| > +			atomic64_read(&event->child_total_time_running);
| > +
| 
| 	lockdep_assert_held(&event->child_mutex);

OK. Thanks for the comments.

Sukadev


* Re: [PATCH v3 7/8] perf: Define PMU_TXN_READ interface
  2015-07-15  3:01 ` [PATCH v3 7/8] perf: Define PMU_TXN_READ interface Sukadev Bhattiprolu
@ 2015-07-16 22:20   ` Peter Zijlstra
  2015-07-22  1:50     ` Sukadev Bhattiprolu
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2015-07-16 22:20 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman,
	linux-kernel, linuxppc-dev, linux-s390, sparclinux

On Tue, Jul 14, 2015 at 08:01:54PM -0700, Sukadev Bhattiprolu wrote:
> +/*
> + * Use the transaction interface to read the group of events in @leader.
> + * PMUs like the 24x7 counters in Power, can use this to queue the events
> + * in the ->read() operation and perform the actual read in ->commit_txn.
> + *
> + * Other PMUs can ignore the ->start_txn and ->commit_txn and read each
> + * PMU directly in the ->read() operation.
> + */
> +static int perf_event_read_group(struct perf_event *leader)
> +{
> +	int ret;
> +	struct perf_event *sub;
> +	struct pmu *pmu;
> +
> +	pmu = leader->pmu;
> +
> +	pmu->start_txn(pmu, PERF_PMU_TXN_READ);
> +
> +	perf_event_read(leader);

There should be a lockdep assert with that list iteration.

> +	list_for_each_entry(sub, &leader->sibling_list, group_entry)
> +		perf_event_read(sub);
> +
> +	ret = pmu->commit_txn(pmu);
> +
> +	return ret;
> +}

* Re: [PATCH v3 7/8] perf: Define PMU_TXN_READ interface
  2015-07-16 22:20   ` Peter Zijlstra
@ 2015-07-22  1:50     ` Sukadev Bhattiprolu
  2015-07-22  5:55       ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-22  1:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman,
	linux-kernel, linuxppc-dev, linux-s390, sparclinux

Peter Zijlstra [peterz@infradead.org] wrote:
| On Tue, Jul 14, 2015 at 08:01:54PM -0700, Sukadev Bhattiprolu wrote:
| > +/*
| > + * Use the transaction interface to read the group of events in @leader.
| > + * PMUs like the 24x7 counters in Power, can use this to queue the events
| > + * in the ->read() operation and perform the actual read in ->commit_txn.
| > + *
| > + * Other PMUs can ignore the ->start_txn and ->commit_txn and read each
| > + * PMU directly in the ->read() operation.
| > + */
| > +static int perf_event_read_group(struct perf_event *leader)
| > +{
| > +	int ret;
| > +	struct perf_event *sub;
| > +	struct pmu *pmu;
| > +
| > +	pmu = leader->pmu;
| > +
| > +	pmu->start_txn(pmu, PERF_PMU_TXN_READ);
| > +
| > +	perf_event_read(leader);
| 
| There should be a lockdep assert with that list iteration.
| 
| > +	list_for_each_entry(sub, &leader->sibling_list, group_entry)
| > +		perf_event_read(sub);
| > +
| > +	ret = pmu->commit_txn(pmu);

Peter,

I have a situation :-)

We are trying to use the following interface:

	start_txn(pmu, PERF_PMU_TXN_READ);

	perf_event_read(leader);
	list_for_each(sibling, &leader->sibling_list, group_entry)
		perf_event_read(sibling)

	pmu->commit_txn(pmu);

with the idea that the PMU driver would save the type of transaction in
->start_txn() and use it in ->read() and ->commit_txn().

But since ->start_txn() and the ->read() operations could happen on different
CPUs (perf_event_read() uses event->oncpu to schedule the call), the PMU
driver cannot use a per-cpu variable to save the state in ->start_txn().
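
(To illustrate the problem, here is a minimal sketch with hypothetical
names, not the actual hv-24x7 code. The CPU that runs ->start_txn() need
not be the event->oncpu CPU that later runs ->read(), so the per-cpu flag
is useless:)

	static DEFINE_PER_CPU(unsigned int, txn_flags);

	static void my_pmu_start_txn(struct pmu *pmu, unsigned int flags)
	{
		/* Runs on whichever CPU entered the transaction. */
		__this_cpu_write(txn_flags, flags);
	}

	static void my_pmu_read(struct perf_event *event)
	{
		/*
		 * Runs on event->oncpu via smp_call_function_single(),
		 * which may be a different CPU, so this can observe a
		 * stale or unrelated value of txn_flags.
		 */
		if (__this_cpu_read(txn_flags) & PERF_PMU_TXN_READ)
			my_pmu_queue_event(event);	/* hypothetical helper */
	}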

I tried using a pmu-wide global, but that would also need us to hold a mutex
to serialize access to that global. The problem is that ->start_txn() can be
called from interrupt context for TXN_ADD transactions (I got the following
backtrace during testing):

	mutex_lock_nested+0x504/0x520 (unreliable)
	h_24x7_event_start_txn+0x3c/0xd0
	group_sched_in+0x70/0x230
	ctx_sched_in.isra.63+0x150/0x230
	__perf_install_in_context+0x1c8/0x1e0
	remote_function+0x7c/0xa0
	flush_smp_call_function_queue+0xb0/0x1d0
	smp_ipi_demux+0x88/0xf0
	icp_hv_ipi_action+0x54/0xc0
	handle_irq_event_percpu+0x98/0x2b0
	handle_percpu_irq+0x7c/0xc0
	generic_handle_irq+0x4c/0x80
	__do_irq+0x7c/0x190
	call_do_irq+0x14/0x24
	do_IRQ+0x8c/0x100
	hardware_interrupt_common+0x168/0x180
	--- interrupt: 501 at .plpar_hcall_norets+0x14/0x20

Basically, I am stuck trying to save the txn type in ->start_txn() and
retrieve it in ->read().

A couple of options I can think of are:

	- having ->start_txn() return a handle that would then be passed in
	  with ->read() (yuck) and ->commit_txn() (see the sketch after
	  this list).

	- serialize the READ transaction for the PMU in perf_event_read_group()
	  with a new pmu->txn_mutex:

		mutex_lock(&pmu->txn_mutex);

		pmu->start_txn()
		list_for_each_entry(sub, &leader->sibling_list, group_entry)
			perf_event_read(sub);

		ret = pmu->commit_txn(pmu);

		mutex_unlock(&pmu->txn_mutex);

	  such serialization would be OK with the 24x7 counters (they are
	  system-wide counters anyway). We could maybe skip the mutex for
	  PMUs that don't implement the TXN_READ interface.

Or is there a better way?
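
(To make the first option concrete, a rough sketch only; the opaque
handle type and these signatures are invented for illustration and are
not existing perf API:)

	struct pmu_txn;		/* opaque, allocated by the PMU driver */

	struct pmu_txn *(*start_txn)(struct pmu *pmu, unsigned int flags);
	void (*read)(struct perf_event *event, struct pmu_txn *txn);
	int  (*commit_txn)(struct pmu *pmu, struct pmu_txn *txn);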

Sukadev


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 7/8] perf: Define PMU_TXN_READ interface
  2015-07-22  1:50     ` Sukadev Bhattiprolu
@ 2015-07-22  5:55       ` Peter Zijlstra
  2015-07-22 23:19         ` Sukadev Bhattiprolu
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2015-07-22  5:55 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman,
	linux-kernel, linuxppc-dev, linux-s390, sparclinux

On Tue, Jul 21, 2015 at 06:50:45PM -0700, Sukadev Bhattiprolu wrote:
> We are trying to use the following interface:
> 
> 	pmu->start_txn(pmu, PERF_PMU_TXN_READ);
> 
> 	perf_event_read(leader);
> 	list_for_each_entry(sibling, &leader->sibling_list, group_entry)
> 		perf_event_read(sibling);
> 
> 	pmu->commit_txn(pmu);
> 
> with the idea that the PMU driver would save the type of transaction in
> ->start_txn() and use it in ->read() and ->commit_txn().
> 
> But since ->start_txn() and the ->read() operations could happen on different
> CPUs (perf_event_read() uses event->oncpu to schedule the call), the PMU
> driver cannot use a per-cpu variable to save the state in ->start_txn().

> or is there better way?


I've not woken up yet, and not actually fully read the email, but can
you stuff the entire above chunk inside the IPI?

I think you could then actually optimize __perf_event_read() as well,
because all these events should be on the same context, so no point in
calling update_*time*() for every event or so.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 7/8] perf: Define PMU_TXN_READ interface
  2015-07-22  5:55       ` Peter Zijlstra
@ 2015-07-22 23:19         ` Sukadev Bhattiprolu
  2015-07-23  8:04           ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-22 23:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman,
	linux-kernel, linuxppc-dev, linux-s390, sparclinux

Peter Zijlstra [peterz@infradead.org] wrote:
| I've not woken up yet, and not actually fully read the email, but can
| you stuff the entire above chunk inside the IPI?
| 
| I think you could then actually optimize __perf_event_read() as well,
| because all these events should be on the same context, so no point in
| calling update_*time*() for every event or so.
| 

Do you mean something like this (will move the rename to a separate
patch before posting):
--

From e8eddb5d3877ebdb3b71213a00aaa980f4010dd0 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Tue, 7 Jul 2015 21:45:23 -0400
Subject: [PATCH 1/1] perf: Define PMU_TXN_READ interface

Define a new PERF_PMU_TXN_READ interface to read a group of counters
at once. Note that we use this interface with all PMUs.

PMUs that implement this interface use the ->read() operation to _queue_
the counters to be read and use ->commit_txn() to actually read all the
queued counters at once.

PMUs that don't implement PERF_PMU_TXN_READ ignore ->start_txn() and
->commit_txn() and continue to read counters one at a time.

Thanks to input from Peter Zijlstra.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---

Changelog[v5]
	[Peter Zijlstra] Ensure the entire transaction happens on the same CPU.

Changelog[v4]
	[Peter Zijlstra] Add lockdep_assert_held() in perf_event_read_group()
---
 include/linux/perf_event.h |    1 +
 kernel/events/core.c       |   72 +++++++++++++++++++++++++++++++++++++-------
 2 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 44bf05f..da307ad 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -169,6 +169,7 @@ struct perf_event;
 #define PERF_EVENT_TXN 0x1
 
 #define PERF_PMU_TXN_ADD  0x1		/* txn to add/schedule event on PMU */
+#define PERF_PMU_TXN_READ 0x2		/* txn to read event group from PMU */
 
 /**
  * pmu::capabilities flags
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a6bd09d..7177dd8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3174,12 +3174,8 @@ void perf_event_exec(void)
 	rcu_read_unlock();
 }
 
-/*
- * Cross CPU call to read the hardware event
- */
-static void __perf_event_read(void *info)
+static void __perf_event_read(struct perf_event *event, int update_ctx)
 {
-	struct perf_event *event = info;
 	struct perf_event_context *ctx = event->ctx;
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 
@@ -3194,7 +3190,7 @@ static void __perf_event_read(void *info)
 		return;
 
 	raw_spin_lock(&ctx->lock);
-	if (ctx->is_active) {
+	if (ctx->is_active && update_ctx) {
 		update_context_time(ctx);
 		update_cgrp_time_from_event(event);
 	}
@@ -3204,6 +3200,16 @@ static void __perf_event_read(void *info)
 	raw_spin_unlock(&ctx->lock);
 }
 
+/*
+ * Cross CPU call to read the hardware event
+ */
+static void __perf_event_read_ipi(void *info)
+{
+	struct perf_event *event = info;
+
+	__perf_event_read(event, 1);
+}
+
 static inline u64 perf_event_count(struct perf_event *event)
 {
 	if (event->pmu->count)
@@ -3220,7 +3226,7 @@ static void perf_event_read(struct perf_event *event)
 	 */
 	if (event->state == PERF_EVENT_STATE_ACTIVE) {
 		smp_call_function_single(event->oncpu,
-					 __perf_event_read, event, 1);
+					 __perf_event_read_ipi, event, 1);
 	} else if (event->state == PERF_EVENT_STATE_INACTIVE) {
 		struct perf_event_context *ctx = event->ctx;
 		unsigned long flags;
@@ -3765,6 +3771,36 @@ static void orphans_remove_work(struct work_struct *work)
 	put_ctx(ctx);
 }
 
+/*
+ * Use the transaction interface to read the group of events in @leader.
+ * PMUs like the 24x7 counters in Power can use this to queue the events
+ * in the ->read() operation and perform the actual read in ->commit_txn.
+ *
+ * Other PMUs can ignore the ->start_txn and ->commit_txn and read each
+ * counter directly in the ->read() operation.
+ */
+static int perf_event_read_group(struct perf_event *leader)
+{
+	int ret;
+	struct perf_event *sub;
+	struct pmu *pmu;
+	struct perf_event_context *ctx = leader->ctx;
+
+	lockdep_assert_held(&ctx->mutex);
+
+	pmu = leader->pmu;
+
+	pmu->start_txn(pmu, PERF_PMU_TXN_READ);
+
+	__perf_event_read(leader, 1);
+	list_for_each_entry(sub, &leader->sibling_list, group_entry)
+		__perf_event_read(sub, 0);
+
+	ret = pmu->commit_txn(pmu);
+
+	return ret;
+}
+
 u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
 {
 	u64 total = 0;
@@ -3794,7 +3830,17 @@ static int perf_read_group(struct perf_event *event,
 
 	lockdep_assert_held(&ctx->mutex);
 
-	count = perf_event_read_value(leader, &enabled, &running);
+	mutex_lock(&leader->child_mutex);
+
+	ret = perf_event_read_group(leader);
+	if (ret) {
+		mutex_unlock(&leader->child_mutex);
+		return ret;
+	}
+
+	count = perf_event_aggregate(leader, &enabled, &running);
+
+	mutex_unlock(&leader->child_mutex);
 
 	values[n++] = 1 + leader->nr_siblings;
 	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
@@ -3815,15 +3861,19 @@ static int perf_read_group(struct perf_event *event,
 	list_for_each_entry(sub, &leader->sibling_list, group_entry) {
 		n = 0;
 
-		values[n++] = perf_event_read_value(sub, &enabled, &running);
+		mutex_lock(&sub->child_mutex);
+
+		values[n++] = perf_event_aggregate(sub, &enabled, &running);
+
+		mutex_unlock(&sub->child_mutex);
+
 		if (read_format & PERF_FORMAT_ID)
 			values[n++] = primary_event_id(sub);
 
 		size = n * sizeof(u64);
 
-		if (copy_to_user(buf + ret, values, size)) {
+		if (copy_to_user(buf + ret, values, size))
 			return -EFAULT;
-		}
 
 		ret += size;
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 5/8] perf: Split perf_event_read_value()
  2015-07-15  3:01 ` [PATCH v3 5/8] perf: Split perf_event_read_value() Sukadev Bhattiprolu
  2015-07-16 21:12   ` Peter Zijlstra
@ 2015-07-23  7:45   ` Peter Zijlstra
  2015-07-27  5:54     ` Sukadev Bhattiprolu
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2015-07-23  7:45 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman,
	linux-kernel, linuxppc-dev, linux-s390, sparclinux

On Tue, Jul 14, 2015 at 08:01:52PM -0700, Sukadev Bhattiprolu wrote:
> Move the part of perf_event_read_value() that computes the event
> counts and event times into a new function, perf_event_compute().
> 
> This would allow us to call perf_event_compute() independently.
> 
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> 
> Changelog[v3]
> 	Rather than move perf_event_read() into callers and then
> 	rename, just move the computations into a separate function
> 	(redesign to address comment from Peter Zijlstra).
> ---
>  kernel/events/core.c |   37 ++++++++++++++++++++++++-------------
>  1 file changed, 24 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 44fb89d..b1e9a42 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3704,6 +3704,29 @@ static int perf_release(struct inode *inode, struct file *file)
>  	return 0;
>  }
>  
> +static u64 perf_event_compute(struct perf_event *event, u64 *enabled,
> +			      u64 *running)
> +{
> +	struct perf_event *child;
> +	u64 total;
> +
> +	total = perf_event_count(event);
> +
> +	*enabled += event->total_time_enabled +
> +			atomic64_read(&event->child_total_time_enabled);
> +	*running += event->total_time_running +
> +			atomic64_read(&event->child_total_time_running);
> +
> +	list_for_each_entry(child, &event->child_list, child_list) {
> +		perf_event_read(child);

Sure we don't want that..

> +		total += perf_event_count(child);
> +		*enabled += child->total_time_enabled;
> +		*running += child->total_time_running;
> +	}
> +
> +	return total;
> +}

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 7/8] perf: Define PMU_TXN_READ interface
  2015-07-22 23:19         ` Sukadev Bhattiprolu
@ 2015-07-23  8:04           ` Peter Zijlstra
  2015-07-24  1:17             ` Sukadev Bhattiprolu
  2015-09-13 11:11             ` [tip:perf/core] perf/core: Add group reads to perf_event_read() tip-bot for Peter Zijlstra
  0 siblings, 2 replies; 25+ messages in thread
From: Peter Zijlstra @ 2015-07-23  8:04 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman,
	linux-kernel, linuxppc-dev, linux-s390, sparclinux

On Wed, Jul 22, 2015 at 04:19:16PM -0700, Sukadev Bhattiprolu wrote:
> Peter Zijlstra [peterz@infradead.org] wrote:
> | I've not woken up yet, and not actually fully read the email, but can
> | you stuff the entire above chunk inside the IPI?
> | 
> | I think you could then actually optimize __perf_event_read() as well,
> | because all these events should be on the same context, so no point in
> | calling update_*time*() for every event or so.
> | 
> 
> Do you mean something like this (will move the rename to a separate
> patch before posting):

More like so.. please double check, I've not even had tea yet.

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3174,14 +3174,22 @@ void perf_event_exec(void)
 	rcu_read_unlock();
 }
 
+struct perf_read_data {
+	struct perf_event *event;
+	bool group;
+	int ret;
+};
+
 /*
  * Cross CPU call to read the hardware event
  */
 static void __perf_event_read(void *info)
 {
-	struct perf_event *event = info;
+	struct perf_read_data *data = info;
+	struct perf_event *sub, *event = data->event;
 	struct perf_event_context *ctx = event->ctx;
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+	struct pmu *pmu = event->pmu;
 
 	/*
 	 * If this is a task context, we need to check whether it is
@@ -3199,8 +3207,23 @@ static void __perf_event_read(void *info
 		update_cgrp_time_from_event(event);
 	}
 	update_event_times(event);
-	if (event->state == PERF_EVENT_STATE_ACTIVE)
-		event->pmu->read(event);
+	if (event->state != PERF_EVENT_STATE_ACTIVE)
+		goto unlock;
+
+	if (!data->group) {
+		pmu->read(event);
+		goto unlock;
+	}
+
+	pmu->start_txn(pmu, PERF_PMU_TXN_READ);
+	pmu->read(event);
+	list_for_each_entry(sub, &event->sibling_list, group_entry) {
+		if (sub->state == PERF_EVENT_STATE_ACTIVE)
+			pmu->read(sub);
+	}
+	data->ret = pmu->commit_txn(pmu);
+
+unlock:
 	raw_spin_unlock(&ctx->lock);
 }
 
@@ -3212,15 +3235,23 @@ static inline u64 perf_event_count(struc
 	return __perf_event_count(event);
 }
 
-static void perf_event_read(struct perf_event *event)
+static int perf_event_read(struct perf_event *event, bool group)
 {
+	int ret = 0;
+
 	/*
 	 * If event is enabled and currently active on a CPU, update the
 	 * value in the event structure:
 	 */
 	if (event->state == PERF_EVENT_STATE_ACTIVE) {
+		struct perf_read_data data = {
+			.event = event,
+			.group = group,
+			.ret = 0,
+		};
 		smp_call_function_single(event->oncpu,
-					 __perf_event_read, event, 1);
+					 __perf_event_read, &data, 1);
+		ret = data.ret;
 	} else if (event->state == PERF_EVENT_STATE_INACTIVE) {
 		struct perf_event_context *ctx = event->ctx;
 		unsigned long flags;
@@ -3235,9 +3266,14 @@ static void perf_event_read(struct perf_
 			update_context_time(ctx);
 			update_cgrp_time_from_event(event);
 		}
-		update_event_times(event);
+		if (group)
+			update_group_times(event);
+		else
+			update_event_times(event);
 		raw_spin_unlock_irqrestore(&ctx->lock, flags);
 	}
+
+	return ret;
 }
 
 /*
@@ -3718,7 +3754,6 @@ static u64 perf_event_compute(struct per
 			atomic64_read(&event->child_total_time_running);
 
 	list_for_each_entry(child, &event->child_list, child_list) {
-		perf_event_read(child);
 		total += perf_event_count(child);
 		*enabled += child->total_time_enabled;
 		*running += child->total_time_running;
@@ -3772,7 +3807,7 @@ u64 perf_event_read_value(struct perf_ev
 
 	mutex_lock(&event->child_mutex);
 
-	perf_event_read(event);
+	perf_event_read(event, false);
 	total = perf_event_compute(event, enabled, running);
 
 	mutex_unlock(&event->child_mutex);
@@ -3792,7 +3827,11 @@ static int perf_read_group(struct perf_e
 
 	lockdep_assert_held(&ctx->mutex);
 
-	count = perf_event_read_value(leader, &enabled, &running);
+	ret = perf_event_read(leader, true);
+	if (ret)
+		return ret;
+
+	count = perf_event_compute(leader, &enabled, &running);
 
 	values[n++] = 1 + leader->nr_siblings;
 	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
@@ -3813,7 +3852,7 @@ static int perf_read_group(struct perf_e
 	list_for_each_entry(sub, &leader->sibling_list, group_entry) {
 		n = 0;
 
-		values[n++] = perf_event_read_value(sub, &enabled, &running);
+		values[n++] = perf_event_compute(sub, &enabled, &running);
 		if (read_format & PERF_FORMAT_ID)
 			values[n++] = primary_event_id(sub);
 
@@ -3931,7 +3970,7 @@ static unsigned int perf_poll(struct fil
 
 static void _perf_event_reset(struct perf_event *event)
 {
-	perf_event_read(event);
+	perf_event_read(event, false);
 	local64_set(&event->count, 0);
 	perf_event_update_userpage(event);
 }
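
(With this shape the entire READ transaction runs inside the IPI on
event->oncpu, so a PMU driver can keep its transaction state in per-cpu
data after all. A loose driver-side sketch follows; all names and
queueing helpers are hypothetical, not the actual hv-24x7 code from
patch 8/8:)

	static DEFINE_PER_CPU(unsigned int, my_txn_flags);

	static void my_pmu_start_txn(struct pmu *pmu, unsigned int flags)
	{
		__this_cpu_write(my_txn_flags, flags);

		if (flags & PERF_PMU_TXN_READ)
			my_pmu_reset_queue();		/* hypothetical */
	}

	static void my_pmu_read(struct perf_event *event)
	{
		/* In a READ txn, only queue the counter; no HCALL yet. */
		if (__this_cpu_read(my_txn_flags) & PERF_PMU_TXN_READ)
			my_pmu_queue_event(event);	/* hypothetical */
		else
			my_pmu_read_one(event);		/* hypothetical */
	}

	static int my_pmu_commit_txn(struct pmu *pmu)
	{
		unsigned int flags = __this_cpu_read(my_txn_flags);

		__this_cpu_write(my_txn_flags, 0);	/* txn is over */

		if (!(flags & PERF_PMU_TXN_READ))
			return 0;

		/* One HCALL for all queued counters; update each count. */
		return my_pmu_flush_queue();		/* hypothetical */
	}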

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 7/8] perf: Define PMU_TXN_READ interface
  2015-07-23  8:04           ` Peter Zijlstra
@ 2015-07-24  1:17             ` Sukadev Bhattiprolu
  2015-09-13 11:11             ` [tip:perf/core] perf/core: Add group reads to perf_event_read() tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-24  1:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman,
	linux-kernel, linuxppc-dev, linux-s390, sparclinux

Peter Zijlstra [peterz@infradead.org] wrote:
| On Wed, Jul 22, 2015 at 04:19:16PM -0700, Sukadev Bhattiprolu wrote:
| > Peter Zijlstra [peterz@infradead.org] wrote:
| > | I've not woken up yet, and not actually fully read the email, but can
| > | you stuff the entire above chunk inside the IPI?
| > | 
| > | I think you could then actually optimize __perf_event_read() as well,
| > | because all these events should be on the same context, so no point in
| > | calling update_*time*() for every event or so.
| > | 
| > 
| > Do you mean something like this (will move the rename to a separate
| > patch before posting):
| 
| More like so.. please double check, I've not even had tea yet.

Yeah, I realized I had ignored the 'event->cpu' spec.
Will try this out. Thanks,

Sukadev


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 5/8] perf: Split perf_event_read_value()
  2015-07-23  7:45   ` Peter Zijlstra
@ 2015-07-27  5:54     ` Sukadev Bhattiprolu
  0 siblings, 0 replies; 25+ messages in thread
From: Sukadev Bhattiprolu @ 2015-07-27  5:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Michael Ellerman,
	linux-kernel, linuxppc-dev, linux-s390, sparclinux

Peter Zijlstra [peterz@infradead.org] wrote:
| On Tue, Jul 14, 2015 at 08:01:52PM -0700, Sukadev Bhattiprolu wrote:
| > Move the part of perf_event_read_value() that computes the event
| > counts and event times into a new function, perf_event_compute().
| > 
| > This would allow us to call perf_event_compute() independently.
| > 
| > Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
| > 
| > Changelog[v3]
| > 	Rather than move perf_event_read() into callers and then
| > 	rename, just move the computations into a separate function
| > 	(redesign to address comment from Peter Zijlstra).
| > ---
| >  kernel/events/core.c |   37 ++++++++++++++++++++++++-------------
| >  1 file changed, 24 insertions(+), 13 deletions(-)
| > 
| > diff --git a/kernel/events/core.c b/kernel/events/core.c
| > index 44fb89d..b1e9a42 100644
| > --- a/kernel/events/core.c
| > +++ b/kernel/events/core.c
| > @@ -3704,6 +3704,29 @@ static int perf_release(struct inode *inode, struct file *file)
| >  	return 0;
| >  }
| >  
| > +static u64 perf_event_compute(struct perf_event *event, u64 *enabled,
| > +			      u64 *running)
| > +{
| > +	struct perf_event *child;
| > +	u64 total;
| > +
| > +	total = perf_event_count(event);
| > +
| > +	*enabled += event->total_time_enabled +
| > +			atomic64_read(&event->child_total_time_enabled);
| > +	*running += event->total_time_running +
| > +			atomic64_read(&event->child_total_time_running);
| > +
| > +	list_for_each_entry(child, &event->child_list, child_list) {
| > +		perf_event_read(child);
| 
| Sure we don't want that..

So if, say, x86 calls perf_event_read_value(), the current upstream code
performs the perf_event_read(child) call.

If we remove this, wouldn't it be a change in behavior?

I have commented it out and added a TODO in the latest patchset. Please
review and let me know if we should drop this read (and the TODO)
of the child event.

Sukadev


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [v3,1/8] powerpc/perf/hv-24x7: Whitespace - fix parameter alignment
  2015-07-15  3:01 ` [PATCH v3 1/8] powerpc/perf/hv-24x7: Whitespace - fix parameter alignment Sukadev Bhattiprolu
@ 2015-08-03  1:35   ` Michael Ellerman
  0 siblings, 0 replies; 25+ messages in thread
From: Michael Ellerman @ 2015-08-03  1:35 UTC (permalink / raw)
  To: Sukadev Bhattiprolu, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo
  Cc: sparclinux, linux-s390, linuxppc-dev, linux-kernel

On Wed, 2015-07-15 at 03:01:48 UTC, Sukadev Bhattiprolu wrote:
> Fix parameter alignment to be consistent with coding style.
> 
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/40386217cd7bc38908d6

cheers

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [v3, 2/8] powerpc/perf/hv-24x7: Simplify extracting counter from result buffer
  2015-07-15  3:01 ` [PATCH v3 2/8] powerpc/perf/hv-24x7: Simplify extracting counter from result buffer Sukadev Bhattiprolu
@ 2015-08-03  1:35   ` Michael Ellerman
  0 siblings, 0 replies; 25+ messages in thread
From: Michael Ellerman @ 2015-08-03  1:35 UTC (permalink / raw)
  To: Sukadev Bhattiprolu, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo
  Cc: sparclinux, linux-s390, linuxppc-dev, linux-kernel

On Wed, 2015-07-15 at 03:01:49 UTC, Sukadev Bhattiprolu wrote:
> Simplify code that extracts a 24x7 counter from the HCALL's result buffer.
> 
> Suggested-by: Joe Perches <joe@perches.com>
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/465345ca387ed491c467

cheers

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [tip:perf/core] perf/core: Add group reads to perf_event_read()
  2015-07-23  8:04           ` Peter Zijlstra
  2015-07-24  1:17             ` Sukadev Bhattiprolu
@ 2015-09-13 11:11             ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 25+ messages in thread
From: tip-bot for Peter Zijlstra @ 2015-09-13 11:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, linux-kernel, mingo, torvalds, acme, sukadev, hpa,
	vincent.weaver, acme, jolsa, mpe, eranian, tglx

Commit-ID:  0492d4c5b8c4dc3a7591bf6fa0e35d117812cc85
Gitweb:     http://git.kernel.org/tip/0492d4c5b8c4dc3a7591bf6fa0e35d117812cc85
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 3 Sep 2015 20:07:48 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 13 Sep 2015 11:27:27 +0200

perf/core: Add group reads to perf_event_read()

Enable perf_event_read() to update entire groups at once; this will be
useful for read transactions.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20150723080435.GE25159@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 39 ++++++++++++++++++++++++++++++++-------
 1 file changed, 32 insertions(+), 7 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 67b7dba..4d89866 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3184,12 +3184,18 @@ void perf_event_exec(void)
 	rcu_read_unlock();
 }
 
+struct perf_read_data {
+	struct perf_event *event;
+	bool group;
+};
+
 /*
  * Cross CPU call to read the hardware event
  */
 static void __perf_event_read(void *info)
 {
-	struct perf_event *event = info;
+	struct perf_read_data *data = info;
+	struct perf_event *sub, *event = data->event;
 	struct perf_event_context *ctx = event->ctx;
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 
@@ -3208,9 +3214,21 @@ static void __perf_event_read(void *info)
 		update_context_time(ctx);
 		update_cgrp_time_from_event(event);
 	}
+
 	update_event_times(event);
 	if (event->state == PERF_EVENT_STATE_ACTIVE)
 		event->pmu->read(event);
+
+	if (!data->group)
+		goto unlock;
+
+	list_for_each_entry(sub, &event->sibling_list, group_entry) {
+		update_event_times(sub);
+		if (sub->state == PERF_EVENT_STATE_ACTIVE)
+			sub->pmu->read(sub);
+	}
+
+unlock:
 	raw_spin_unlock(&ctx->lock);
 }
 
@@ -3275,15 +3293,19 @@ u64 perf_event_read_local(struct perf_event *event)
 	return val;
 }
 
-static void perf_event_read(struct perf_event *event)
+static void perf_event_read(struct perf_event *event, bool group)
 {
 	/*
 	 * If event is enabled and currently active on a CPU, update the
 	 * value in the event structure:
 	 */
 	if (event->state == PERF_EVENT_STATE_ACTIVE) {
+		struct perf_read_data data = {
+			.event = event,
+			.group = group,
+		};
 		smp_call_function_single(event->oncpu,
-					 __perf_event_read, event, 1);
+					 __perf_event_read, &data, 1);
 	} else if (event->state == PERF_EVENT_STATE_INACTIVE) {
 		struct perf_event_context *ctx = event->ctx;
 		unsigned long flags;
@@ -3298,7 +3320,10 @@ static void perf_event_read(struct perf_event *event)
 			update_context_time(ctx);
 			update_cgrp_time_from_event(event);
 		}
-		update_event_times(event);
+		if (group)
+			update_group_times(event);
+		else
+			update_event_times(event);
 		raw_spin_unlock_irqrestore(&ctx->lock, flags);
 	}
 }
@@ -3817,7 +3842,7 @@ u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
 
 	mutex_lock(&event->child_mutex);
 
-	perf_event_read(event);
+	perf_event_read(event, false);
 	total += perf_event_count(event);
 
 	*enabled += event->total_time_enabled +
@@ -3826,7 +3851,7 @@ u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
 			atomic64_read(&event->child_total_time_running);
 
 	list_for_each_entry(child, &event->child_list, child_list) {
-		perf_event_read(child);
+		perf_event_read(child, false);
 		total += perf_event_count(child);
 		*enabled += child->total_time_enabled;
 		*running += child->total_time_running;
@@ -3987,7 +4012,7 @@ static unsigned int perf_poll(struct file *file, poll_table *wait)
 
 static void _perf_event_reset(struct perf_event *event)
 {
-	perf_event_read(event);
+	perf_event_read(event, false);
 	local64_set(&event->count, 0);
 	perf_event_update_userpage(event);
 }

^ permalink raw reply related	[flat|nested] 25+ messages in thread

Thread overview: 25+ messages
2015-07-15  3:01 [PATCH v3 0/8] Implement group-read of events using txn interface Sukadev Bhattiprolu
2015-07-15  3:01 ` [PATCH v3 1/8] powerpc/perf/hv-24x7: Whitespace - fix parameter alignment Sukadev Bhattiprolu
2015-08-03  1:35   ` [v3,1/8] " Michael Ellerman
2015-07-15  3:01 ` [PATCH v3 2/8] powerpc/perf/hv-24x7: Simplify extracting counter from result buffer Sukadev Bhattiprolu
2015-08-03  1:35   ` [v3, " Michael Ellerman
2015-07-15  3:01 ` [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces Sukadev Bhattiprolu
2015-07-16 20:17   ` Peter Zijlstra
2015-07-16 21:28     ` Sukadev Bhattiprolu
2015-07-16 20:48   ` Peter Zijlstra
2015-07-15  3:01 ` [PATCH v3 4/8] perf: Split perf_event_read() and perf_event_count() Sukadev Bhattiprolu
2015-07-15  3:01 ` [PATCH v3 5/8] perf: Split perf_event_read_value() Sukadev Bhattiprolu
2015-07-16 21:12   ` Peter Zijlstra
2015-07-16 21:41     ` Sukadev Bhattiprolu
2015-07-23  7:45   ` Peter Zijlstra
2015-07-27  5:54     ` Sukadev Bhattiprolu
2015-07-15  3:01 ` [PATCH v3 6/8] perf: Rename perf_event_read_{one,group}, perf_read_hw Sukadev Bhattiprolu
2015-07-15  3:01 ` [PATCH v3 7/8] perf: Define PMU_TXN_READ interface Sukadev Bhattiprolu
2015-07-16 22:20   ` Peter Zijlstra
2015-07-22  1:50     ` Sukadev Bhattiprolu
2015-07-22  5:55       ` Peter Zijlstra
2015-07-22 23:19         ` Sukadev Bhattiprolu
2015-07-23  8:04           ` Peter Zijlstra
2015-07-24  1:17             ` Sukadev Bhattiprolu
2015-09-13 11:11             ` [tip:perf/core] perf/core: Add group reads to perf_event_read() tip-bot for Peter Zijlstra
2015-07-15  3:01 ` [PATCH v3 8/8] powerpc/perf/hv-24x7: Use PMU_TXN_READ interface Sukadev Bhattiprolu
