linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/4] perf: Support PERF_SAMPLE_READ with inherit_stat
@ 2024-02-08 13:10 Ben Gainey
From: Ben Gainey @ 2024-02-08 13:10 UTC (permalink / raw)
  To: linux-perf-users, linux-kernel
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	namhyung, irogers, adrian.hunter, james.clark, Ben Gainey

This change allows events to use PERF_SAMPLE_READ with inherit so long
as both inherit_stat and PERF_SAMPLE_TID are set.

Currently it is not possible to use PERF_SAMPLE_READ with inherit. This
restriction assumes the user is interested in collecting aggregate
statistics as per `perf stat`. It prevents a user from collecting
per-thread samples using counter groups from a multi-threaded or
multi-process application, as with `perf record -e '{....}:S'`. Instead
users must use system-wide mode, or forgo the ability to sample counter
groups. System-wide mode is often problematic as it requires specific
permissions (CAP_PERFMON or root access), or may lead to the capture of
significant amounts of extra data from other processes running on the
system.

Perf already supports the ability to collect per-thread counts with
`inherit` via the `inherit_stat` flag. This patch changes
`perf_event_alloc`, relaxing the restriction on combining `inherit` with
`PERF_SAMPLE_READ` so that the combination is allowed so long as
`inherit_stat` and `PERF_SAMPLE_TID` are enabled.
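
For illustration, an event in this newly permitted configuration could be
opened as follows (a minimal sketch; target_pid, the period and error
handling are application-specific):

	#include <linux/perf_event.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	struct perf_event_attr attr = {
		.type		= PERF_TYPE_HARDWARE,
		.config		= PERF_COUNT_HW_CPU_CYCLES,
		.size		= sizeof(struct perf_event_attr),
		.sample_period	= 100000,
		/* PERF_SAMPLE_TID is required for this combination */
		.sample_type	= PERF_SAMPLE_TID | PERF_SAMPLE_READ,
		.read_format	= PERF_FORMAT_GROUP | PERF_FORMAT_ID,
		.inherit	= 1,
		.inherit_stat	= 1,
	};

	/* Rejected by perf_event_alloc() before this series; allowed after. */
	int fd = syscall(__NR_perf_event_open, &attr, target_pid, -1, -1, 0);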

In this configuration stream ids (such as may appear in the read_format
field of a PERF_RECORD_SAMPLE) are no longer globally unique; instead
the pair of (stream id, tid) uniquely identifies each event. Tools that
rely on globally unique stream ids, for example to calculate a delta
between samples, would need updating to take this into account.
Previously valid event configurations (system-wide, no-inherit and so
on), where the stream id alone is the identifier, are unaffected.
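
On the tool side, an updated delta calculation keys its bookkeeping on the
(stream id, tid) pair rather than the stream id alone; a minimal sketch
(the map type and lookup helper are hypothetical):

	#include <stdint.h>

	struct period_key {
		uint64_t id;	/* stream id from the sample */
		uint32_t tid;	/* thread id from PERF_SAMPLE_TID */
	};

	/* Compute the period delta for one read_format value. */
	uint64_t sample_period_delta(struct period_map *map, uint64_t id,
				     uint32_t tid, uint64_t value)
	{
		struct period_key key = { .id = id, .tid = tid };
		/* hypothetical: returns a zero-initialised slot on first use */
		uint64_t *prev = period_map_lookup_or_create(map, &key);
		uint64_t delta = value - *prev;

		*prev = value;
		return delta;
	}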


Changes since v1:
 - Rebase on v6.8-rc1
 - Fixed the value written into the sample after a child exits.
 - Modified handling of switch-out so that contexts containing these events
   take the slow path, ensuring that the per-event/per-thread PMU state is
   correctly switched.
 - Modified perf tools to support this mode of operation.


Ben Gainey (4):
  perf: Support PERF_SAMPLE_READ with inherit_stat
  tools/perf: Track where perf_sample_ids need per-thread periods
  tools/perf: Correctly calculate sample period for inherited
    SAMPLE_READ values
  tools/perf: Allow inherit + inherit_stat + PERF_SAMPLE_READ when
    opening events

 include/linux/perf_event.h              |  1 +
 kernel/events/core.c                    | 53 +++++++++++++++++--------
 tools/lib/perf/evlist.c                 |  1 +
 tools/lib/perf/evsel.c                  | 48 ++++++++++++++++++++++
 tools/lib/perf/include/internal/evsel.h | 48 +++++++++++++++++++++-
 tools/perf/util/evsel.c                 | 15 ++++++-
 tools/perf/util/evsel.h                 |  1 +
 tools/perf/util/session.c               | 11 +++--
 8 files changed, 154 insertions(+), 24 deletions(-)

-- 
2.43.0



* [PATCH v2 1/4] perf: Support PERF_SAMPLE_READ with inherit_stat
From: Ben Gainey @ 2024-02-08 13:10 UTC (permalink / raw)
  To: linux-perf-users, linux-kernel
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	namhyung, irogers, adrian.hunter, james.clark, Ben Gainey

This change allows events to use PERF_SAMPLE_READ with inherit
so long as both inherit_stat and PERF_SAMPLE_TID are set.

In this configuration, and event will be inherited into any
child processes / threads, allowing convenient profiling of a
multiprocess or multithreaded application, whilst allowing
profiling tools to collect per-thread samples, in particular
of groups of counters.

Signed-off-by: Ben Gainey <ben.gainey@arm.com>
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 53 ++++++++++++++++++++++++++------------
 2 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d2a15c0c6f8a..7d405dff6694 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -932,6 +932,7 @@ struct perf_event_context {
 
 	int				nr_task_data;
 	int				nr_stat;
+	int				nr_stat_read;
 	int				nr_freq;
 	int				rotate_disable;
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f0f0f71213a1..dac7093b3608 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1795,8 +1795,11 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
 	ctx->nr_events++;
 	if (event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT)
 		ctx->nr_user++;
-	if (event->attr.inherit_stat)
+	if (event->attr.inherit_stat) {
 		ctx->nr_stat++;
+		if (event->attr.inherit && (event->attr.sample_type & PERF_SAMPLE_READ))
+			ctx->nr_stat_read++;
+	}
 
 	if (event->state > PERF_EVENT_STATE_OFF)
 		perf_cgroup_event_enable(event, ctx);
@@ -2019,8 +2022,11 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 	ctx->nr_events--;
 	if (event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT)
 		ctx->nr_user--;
-	if (event->attr.inherit_stat)
+	if (event->attr.inherit_stat) {
 		ctx->nr_stat--;
+		if (event->attr.inherit && (event->attr.sample_type & PERF_SAMPLE_READ))
+			ctx->nr_stat_read--;
+	}
 
 	list_del_rcu(&event->event_entry);
 
@@ -3529,11 +3535,17 @@ perf_event_context_sched_out(struct task_struct *task, struct task_struct *next)
 			perf_ctx_disable(ctx, false);
 
 			/* PMIs are disabled; ctx->nr_pending is stable. */
-			if (local_read(&ctx->nr_pending) ||
+			if (ctx->nr_stat_read ||
+			    next_ctx->nr_stat_read ||
+			    local_read(&ctx->nr_pending) ||
 			    local_read(&next_ctx->nr_pending)) {
 				/*
 				 * Must not swap out ctx when there's pending
 				 * events that rely on the ctx->task relation.
+				 *
+				 * Likewise, when a context contains inherit+inherit_stat+SAMPLE_READ
+				 * events they should be switched out using the slow path
+				 * so that they are treated as if they were distinct contexts.
 				 */
 				raw_spin_unlock(&next_ctx->lock);
 				rcu_read_unlock();
@@ -3545,6 +3557,7 @@ perf_event_context_sched_out(struct task_struct *task, struct task_struct *next)
 
 			perf_ctx_sched_task_cb(ctx, false);
 			perf_event_swap_task_ctx_data(ctx, next_ctx);
+			perf_event_sync_stat(ctx, next_ctx);
 
 			perf_ctx_enable(ctx, false);
 
@@ -3559,8 +3572,6 @@ perf_event_context_sched_out(struct task_struct *task, struct task_struct *next)
 			RCU_INIT_POINTER(next->perf_event_ctxp, ctx);
 
 			do_switch = 0;
-
-			perf_event_sync_stat(ctx, next_ctx);
 		}
 		raw_spin_unlock(&next_ctx->lock);
 		raw_spin_unlock(&ctx->lock);
@@ -4533,8 +4544,13 @@ static void __perf_event_read(void *info)
 	raw_spin_unlock(&ctx->lock);
 }
 
-static inline u64 perf_event_count(struct perf_event *event)
+static inline u64 perf_event_count(struct perf_event *event, bool from_sample)
 {
+	if (from_sample && event->attr.inherit &&
+       event->attr.inherit &&
+       (event->attr.sample_type & PERF_SAMPLE_TID)) {
+		return local64_read(&event->count);
+	}
 	return local64_read(&event->count) + atomic64_read(&event->child_count);
 }
 
@@ -5454,7 +5470,7 @@ static u64 __perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *
 	mutex_lock(&event->child_mutex);
 
 	(void)perf_event_read(event, false);
-	total += perf_event_count(event);
+	total += perf_event_count(event, false);
 
 	*enabled += event->total_time_enabled +
 			atomic64_read(&event->child_total_time_enabled);
@@ -5463,7 +5479,7 @@ static u64 __perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *
 
 	list_for_each_entry(child, &event->child_list, child_list) {
 		(void)perf_event_read(child, false);
-		total += perf_event_count(child);
+		total += perf_event_count(child, false);
 		*enabled += child->total_time_enabled;
 		*running += child->total_time_running;
 	}
@@ -5545,14 +5561,14 @@ static int __perf_read_group_add(struct perf_event *leader,
 	/*
 	 * Write {count,id} tuples for every sibling.
 	 */
-	values[n++] += perf_event_count(leader);
+	values[n++] += perf_event_count(leader, false);
 	if (read_format & PERF_FORMAT_ID)
 		values[n++] = primary_event_id(leader);
 	if (read_format & PERF_FORMAT_LOST)
 		values[n++] = atomic64_read(&leader->lost_samples);
 
 	for_each_sibling_event(sub, leader) {
-		values[n++] += perf_event_count(sub);
+		values[n++] += perf_event_count(sub, false);
 		if (read_format & PERF_FORMAT_ID)
 			values[n++] = primary_event_id(sub);
 		if (read_format & PERF_FORMAT_LOST)
@@ -6132,7 +6148,7 @@ void perf_event_update_userpage(struct perf_event *event)
 	++userpg->lock;
 	barrier();
 	userpg->index = perf_event_index(event);
-	userpg->offset = perf_event_count(event);
+	userpg->offset = perf_event_count(event, false);
 	if (userpg->index)
 		userpg->offset -= local64_read(&event->hw.prev_count);
 
@@ -7200,7 +7216,7 @@ static void perf_output_read_one(struct perf_output_handle *handle,
 	u64 values[5];
 	int n = 0;
 
-	values[n++] = perf_event_count(event);
+	values[n++] = perf_event_count(event, true);
 	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED) {
 		values[n++] = enabled +
 			atomic64_read(&event->child_total_time_enabled);
@@ -7245,7 +7261,7 @@ static void perf_output_read_group(struct perf_output_handle *handle,
 	    (leader->state == PERF_EVENT_STATE_ACTIVE))
 		leader->pmu->read(leader);
 
-	values[n++] = perf_event_count(leader);
+	values[n++] = perf_event_count(leader, true);
 	if (read_format & PERF_FORMAT_ID)
 		values[n++] = primary_event_id(leader);
 	if (read_format & PERF_FORMAT_LOST)
@@ -7260,7 +7276,7 @@ static void perf_output_read_group(struct perf_output_handle *handle,
 		    (sub->state == PERF_EVENT_STATE_ACTIVE))
 			sub->pmu->read(sub);
 
-		values[n++] = perf_event_count(sub);
+		values[n++] = perf_event_count(sub, false);
 		if (read_format & PERF_FORMAT_ID)
 			values[n++] = primary_event_id(sub);
 		if (read_format & PERF_FORMAT_LOST)
@@ -12010,10 +12026,13 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	local64_set(&hwc->period_left, hwc->sample_period);
 
 	/*
-	 * We currently do not support PERF_SAMPLE_READ on inherited events.
+	 * We do not support PERF_SAMPLE_READ on inherited events unless
+	 * inherit_stat and PERF_SAMPLE_TID are also selected, which allows
+	 * inherited events to collect per-thread samples.
 	 * See perf_output_read().
 	 */
-	if (attr->inherit && (attr->sample_type & PERF_SAMPLE_READ))
+	if (attr->inherit && (attr->sample_type & PERF_SAMPLE_READ)
+		&& !(attr->inherit_stat && (attr->sample_type & PERF_SAMPLE_TID)))
 		goto err_ns;
 
 	if (!has_branch_stack(event))
@@ -13037,7 +13056,7 @@ static void sync_child_event(struct perf_event *child_event)
 			perf_event_read_event(child_event, task);
 	}
 
-	child_val = perf_event_count(child_event);
+	child_val = perf_event_count(child_event, false);
 
 	/*
 	 * Add back the child's count to the parent's count:
-- 
2.43.0



* [PATCH v2 2/4] tools/perf: Track where perf_sample_ids need per-thread periods
From: Ben Gainey @ 2024-02-08 13:10 UTC (permalink / raw)
  To: linux-perf-users, linux-kernel
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	namhyung, irogers, adrian.hunter, james.clark, Ben Gainey

When PERF_SAMPLE_READ is used with inherit+inherit_stat, the perf_sample_id
is no longer globally unique; instead it is unique per inherited thread.

Track this fact in perf_sample_ids as it will be needed to correctly calculate the
period.
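
For orientation, the flag added here is consumed in the next patch roughly
as follows (a sketch; per_thread_slot() is a hypothetical stand-in for the
lookup that patch 3/4 introduces as perf_sample_id__get_period_storage()):

	u64 *storage = sid->period_per_thread
		? per_thread_slot(sid, sample->tid)	/* per (sample_id, tid) */
		: &sid->period;				/* single global period */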

Signed-off-by: Ben Gainey <ben.gainey@arm.com>
---
 tools/lib/perf/evlist.c                 | 1 +
 tools/lib/perf/evsel.c                  | 7 +++++++
 tools/lib/perf/include/internal/evsel.h | 7 +++++++
 3 files changed, 15 insertions(+)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index 058e3ff10f9b..c585c49491a5 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -255,6 +255,7 @@ static void perf_evlist__id_hash(struct perf_evlist *evlist,
 
 	sid->id = id;
 	sid->evsel = evsel;
+	sid->period_per_thread = perf_evsel__attr_has_per_thread_sample_period(evsel);
 	hash = hash_64(sid->id, PERF_EVLIST__HLIST_BITS);
 	hlist_add_head(&sid->node, &evlist->heads[hash]);
 }
diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
index c07160953224..dd60ee0557d8 100644
--- a/tools/lib/perf/evsel.c
+++ b/tools/lib/perf/evsel.c
@@ -537,6 +537,13 @@ void perf_evsel__free_id(struct perf_evsel *evsel)
 	evsel->ids = 0;
 }
 
+bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel)
+{
+	return (evsel->attr.sample_type & PERF_SAMPLE_READ)
+		&& evsel->attr.inherit
+		&& evsel->attr.inherit_stat;
+}
+
 void perf_counts_values__scale(struct perf_counts_values *count,
 			       bool scale, __s8 *pscaled)
 {
diff --git a/tools/lib/perf/include/internal/evsel.h b/tools/lib/perf/include/internal/evsel.h
index 5cd220a61962..97658f1c9ca3 100644
--- a/tools/lib/perf/include/internal/evsel.h
+++ b/tools/lib/perf/include/internal/evsel.h
@@ -36,6 +36,11 @@ struct perf_sample_id {
 
 	/* Holds total ID period value for PERF_SAMPLE_READ processing. */
 	u64			 period;
+
+	/* When inherit+inherit_stat is combined with PERF_SAMPLE_READ, the
+	 * period value is per (sample_id, thread) tuple, rather than per
+	 * sample_id. */
+	bool 			 period_per_thread;
 };
 
 struct perf_evsel {
@@ -88,4 +93,6 @@ int perf_evsel__apply_filter(struct perf_evsel *evsel, const char *filter);
 int perf_evsel__alloc_id(struct perf_evsel *evsel, int ncpus, int nthreads);
 void perf_evsel__free_id(struct perf_evsel *evsel);
 
+bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel);
+
 #endif /* __LIBPERF_INTERNAL_EVSEL_H */
-- 
2.43.0



* [PATCH v2 3/4] tools/perf: Correctly calculate sample period for inherited SAMPLE_READ values
From: Ben Gainey @ 2024-02-08 13:10 UTC (permalink / raw)
  To: linux-perf-users, linux-kernel
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	namhyung, irogers, adrian.hunter, james.clark, Ben Gainey

Calculate the per-thread period when using PERF_SAMPLE_READ with
inherit+inherit_stat.

Store a per-thread period per perf_sample_id, hashed by tid. For other
configurations, maintain a single global period per perf_sample_id.
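
As an illustrative (made-up) example: if thread 1001 reports cumulative
values of 100 and then 150 for a given stream id, while thread 1002
interleaves cumulative values of 70 and then 90 for the same stream id,
the per-thread storage yields periods of 50 and 20 respectively. A single
global period would instead produce meaningless cross-thread deltas such
as 70 - 150.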

Signed-off-by: Ben Gainey <ben.gainey@arm.com>
---
 tools/lib/perf/evsel.c                  | 41 +++++++++++++++++++++++++
 tools/lib/perf/include/internal/evsel.h | 41 +++++++++++++++++++++++--
 tools/perf/util/session.c               | 11 +++++--
 3 files changed, 88 insertions(+), 5 deletions(-)

diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
index dd60ee0557d8..4e173151e183 100644
--- a/tools/lib/perf/evsel.c
+++ b/tools/lib/perf/evsel.c
@@ -5,6 +5,7 @@
 #include <perf/evsel.h>
 #include <perf/cpumap.h>
 #include <perf/threadmap.h>
+#include <linux/hash.h>
 #include <linux/list.h>
 #include <internal/evsel.h>
 #include <linux/zalloc.h>
@@ -23,6 +24,7 @@ void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
 		      int idx)
 {
 	INIT_LIST_HEAD(&evsel->node);
+	INIT_LIST_HEAD(&evsel->period_per_thread_periods);
 	evsel->attr = *attr;
 	evsel->idx  = idx;
 	evsel->leader = evsel;
@@ -531,10 +533,17 @@ int perf_evsel__alloc_id(struct perf_evsel *evsel, int ncpus, int nthreads)
 
 void perf_evsel__free_id(struct perf_evsel *evsel)
 {
+	struct perf_sample_id_period *pos, *n;
+
 	xyarray__delete(evsel->sample_id);
 	evsel->sample_id = NULL;
 	zfree(&evsel->id);
 	evsel->ids = 0;
+
+	perf_evsel_for_each_per_thread_period_safe(evsel, n, pos) {
+		list_del_init(&pos->node);
+		zfree(pos);
+	}
 }
 
 bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel)
@@ -544,6 +553,38 @@ bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel)
 		&& evsel->attr.inherit_stat;
 }
 
+u64 * perf_sample_id__get_period_storage(struct perf_sample_id * sid, u32 tid)
+{
+	struct hlist_head *head;
+	struct perf_sample_id_period *res;
+	int hash;
+
+	if (!sid->period_per_thread)
+		return &sid->period;
+
+	hash = hash_32(tid, PERF_SAMPLE_ID__HLIST_BITS);
+	head = &sid->periods[hash];
+
+	hlist_for_each_entry(res, head, hnode)
+		if (res->tid == tid)
+			return &res->period;
+
+	if (sid->evsel == NULL)
+		return NULL;
+
+	res = zalloc(sizeof(struct perf_sample_id_period));
+	if (res == NULL)
+		return NULL;
+
+	INIT_LIST_HEAD(&res->node);
+	res->tid = tid;
+
+	list_add_tail(&res->node, &sid->evsel->period_per_thread_periods);
+	hlist_add_head(&res->hnode, &sid->periods[hash]);
+
+	return &res->period;
+}
+
 void perf_counts_values__scale(struct perf_counts_values *count,
 			       bool scale, __s8 *pscaled)
 {
diff --git a/tools/lib/perf/include/internal/evsel.h b/tools/lib/perf/include/internal/evsel.h
index 97658f1c9ca3..0fd8597c1340 100644
--- a/tools/lib/perf/include/internal/evsel.h
+++ b/tools/lib/perf/include/internal/evsel.h
@@ -11,6 +11,31 @@
 struct perf_thread_map;
 struct xyarray;
 
+/**
+ * The per-thread accumulated period storage node.
+ */
+struct perf_sample_id_period {
+	struct list_head	node;
+	struct hlist_node	hnode;
+	/* The thread that the values belongs to */
+	u32			tid;
+	/* Holds total ID period value for PERF_SAMPLE_READ processing. */
+	u64			period;
+};
+
+/**
+ * perf_evsel_for_each_per_thread_period_safe - safely iterate through all the period_per_thread_periods
+ * @evsel: perf_evsel instance to iterate
+ * @item: struct perf_sample_id_period iterator
+ * @tmp: struct perf_sample_id_period temp iterator
+ */
+#define perf_evsel_for_each_per_thread_period_safe(evsel, tmp, item) \
+	list_for_each_entry_safe(item, tmp, &(evsel)->period_per_thread_periods, node)
+
+
+#define PERF_SAMPLE_ID__HLIST_BITS 4
+#define PERF_SAMPLE_ID__HLIST_SIZE (1 << PERF_SAMPLE_ID__HLIST_BITS)
+
 /*
  * Per fd, to map back from PERF_SAMPLE_ID to evsel, only used when there are
  * more than one entry in the evlist.
@@ -19,6 +44,7 @@ struct perf_sample_id {
 	struct hlist_node	 node;
 	u64			 id;
 	struct perf_evsel	*evsel;
+
        /*
 	* 'idx' will be used for AUX area sampling. A sample will have AUX area
 	* data that will be queued for decoding, where there are separate
@@ -34,8 +60,14 @@ struct perf_sample_id {
 	pid_t			 machine_pid;
 	struct perf_cpu		 vcpu;
 
-	/* Holds total ID period value for PERF_SAMPLE_READ processing. */
-	u64			 period;
+	union {
+		/* Holds total ID period value for PERF_SAMPLE_READ processing (when period is not
+		 * per-thread). */
+		u64			period;
+		/* Holds total ID period value for PERF_SAMPLE_READ processing (when period is
+		 * per-thread). */
+		struct hlist_head	periods[PERF_SAMPLE_ID__HLIST_SIZE];
+	};
 
 	/* When inherit+inherit_stat is combined with PERF_SAMPLE_READ, the
 	 * period value is per (sample_id, thread) tuple, rather than per
@@ -63,6 +95,9 @@ struct perf_evsel {
 	u32			 ids;
 	struct perf_evsel	*leader;
 
+	/* Where period_per_thread is true, stores the per-thread values */
+	struct list_head	period_per_thread_periods;
+
 	/* parse modifier helper */
 	int			 nr_members;
 	/*
@@ -95,4 +130,6 @@ void perf_evsel__free_id(struct perf_evsel *evsel);
 
 bool perf_evsel__attr_has_per_thread_sample_period(struct perf_evsel *evsel);
 
+u64 * perf_sample_id__get_period_storage(struct perf_sample_id * sid, u32 tid);
+
 #endif /* __LIBPERF_INTERNAL_EVSEL_H */
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 199d3e8df315..22a8598ee849 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1478,14 +1478,19 @@ static int deliver_sample_value(struct evlist *evlist,
 {
 	struct perf_sample_id *sid = evlist__id2sid(evlist, v->id);
 	struct evsel *evsel;
+	u64 * storage = NULL;
 
 	if (sid) {
+		storage  = perf_sample_id__get_period_storage(sid, sample->tid);
+	}
+
+	if (storage) {
 		sample->id     = v->id;
-		sample->period = v->value - sid->period;
-		sid->period    = v->value;
+		sample->period = v->value - *storage;
+		*storage       = v->value;
 	}
 
-	if (!sid || sid->evsel == NULL) {
+	if (!storage || sid->evsel == NULL) {
 		++evlist->stats.nr_unknown_id;
 		return 0;
 	}
-- 
2.43.0



* [PATCH v2 4/4] tools/perf: Allow inherit + inherit_stat + PERF_SAMPLE_READ when opening events
From: Ben Gainey @ 2024-02-08 13:10 UTC (permalink / raw)
  To: linux-perf-users, linux-kernel
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	namhyung, irogers, adrian.hunter, james.clark, Ben Gainey

When PERF_SAMPLE_READ is used, enable inherit_stat whenever inherit is set.

Provide a fallback path that disables inherit when the kernel does not
support this combination, in line with the previous behaviour.
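
With this applied, an invocation along these lines collects per-thread
counter-group samples without system-wide mode (an illustrative example;
the event list and application name are placeholders):

	perf record -e '{cycles,instructions}:S' -- ./my_multithreaded_app

On older kernels the probe in evsel__detect_missing_features() reports the
missing feature and the tool falls back to no-inherit, as before.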

Signed-off-by: Ben Gainey <ben.gainey@arm.com>
---
 tools/perf/util/evsel.c | 15 +++++++++++++--
 tools/perf/util/evsel.h |  1 +
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 6d7c9c58a9bc..dc74b39a2254 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1156,7 +1156,11 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
 		 */
 		if (leader->core.nr_members > 1) {
 			attr->read_format |= PERF_FORMAT_GROUP;
-			attr->inherit = 0;
+		}
+
+		/* Inherit + READ requires inherit_stat */
+		if (attr->inherit) {
+			attr->inherit_stat = true;
 		}
 	}
 
@@ -1832,6 +1836,8 @@ static int __evsel__prepare_open(struct evsel *evsel, struct perf_cpu_map *cpus,
 
 static void evsel__disable_missing_features(struct evsel *evsel)
 {
+	if (perf_missing_features.inherit_sample_read)
+		evsel->core.attr.inherit = 0;
 	if (perf_missing_features.branch_counters)
 		evsel->core.attr.branch_sample_type &= ~PERF_SAMPLE_BRANCH_COUNTERS;
 	if (perf_missing_features.read_lost)
@@ -1887,7 +1893,12 @@ bool evsel__detect_missing_features(struct evsel *evsel)
 	 * Must probe features in the order they were added to the
 	 * perf_event_attr interface.
 	 */
-	if (!perf_missing_features.branch_counters &&
+	if (!perf_missing_features.inherit_sample_read &&
+	    evsel->core.attr.inherit && (evsel->core.attr.sample_type & PERF_SAMPLE_READ)) {
+		perf_missing_features.inherit_sample_read = true;
+		pr_debug2("Using PERF_SAMPLE_READ / :S modifier is not compatible with inherit, falling back to no-inherit.\n");
+		return true;
+	} else if (!perf_missing_features.branch_counters &&
 	    (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS)) {
 		perf_missing_features.branch_counters = true;
 		pr_debug2("switching off branch counters support\n");
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index efbb6e848287..11cc9b8bee27 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -192,6 +192,7 @@ struct perf_missing_features {
 	bool weight_struct;
 	bool read_lost;
 	bool branch_counters;
+	bool inherit_sample_read;
 };
 
 extern struct perf_missing_features perf_missing_features;
-- 
2.43.0



* Re: [PATCH v2 0/4] perf: Support PERF_SAMPLE_READ with inherit_stat
From: Ben Gainey @ 2024-03-10 13:06 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users
  Cc: alexander.shishkin, peterz, acme, mingo, Mark Rutland,
	adrian.hunter, irogers, jolsa, namhyung, James Clark

On Thu, 2024-02-08 at 13:10 +0000, Ben Gainey wrote:
> This change allows events to use PERF_SAMPLE_READ with inherit so
> long
> as both inherit_stat and PERF_SAMPLE_TID are set.
>
> Currently it is not possible to use PERF_SAMPLE_READ with inherit.
> This
> restriction assumes the user is interested in collecting aggregate
> statistics as per `perf stat`. It prevents a user from collecting
> per-thread samples using counter groups from a multi-threaded or
> multi-process application, as with `perf record -e '{....}:S'`.
> Instead
> users must use system-wide mode, or forgo the ability to sample
> counter
> groups. System-wide mode is often problematic as it requires specific
> permissions (CAP_PERFMON or root access), or may lead to capture of
> significant amounts of extra data from other processes running on the
> system.
>
> Perf already supports the ability to collect per-thread counts with
> `inherit` via the `inherit_stat` flag. This patch changes
> `perf_event_alloc` relaxing the restriction to combine `inherit` with
> `PERF_SAMPLE_READ` so that the combination will be allowed so long as
> `inherit_stat` and `PERF_SAMPLE_TID` are enabled.
>
> In this configuration stream ids (such as may appear in the
> read_format
> field of a PERF_RECORD_SAMPLE) are no longer globally unique, rather
> the pair of (stream id, tid) uniquely identify each event. Tools that
> rely on this, for example to calculate a delta between samples, would
> need updating to take this into account. Previously valid event
> configurations (system-wide, no-inherit and so on) where each stream
> id
> is the identifier are unaffected.
>
>
> Changes since v1:
>  - Rebase on v6.8-rc1
>  - Fixed the value written into the sample after a child exits.
>  - Modified handling of switch-out so that context with these events
> take the
>    slow path, so that the per-event/per-thread PMU state is correctly
> switched.
>  - Modified perf tools to support this mode of operation.
>
>
> Ben Gainey (4):
>   perf: Support PERF_SAMPLE_READ with inherit_stat
>   tools/perf: Track where perf_sample_ids need per-thread periods
>   tools/perf: Correctly calculate sample period for inherited
>     SAMPLE_READ values
>   tools/perf: Allow inherit + inherit_stat + PERF_SAMPLE_READ when
>     opening events
>
>  include/linux/perf_event.h              |  1 +
>  kernel/events/core.c                    | 53 +++++++++++++++++------
> --
>  tools/lib/perf/evlist.c                 |  1 +
>  tools/lib/perf/evsel.c                  | 48 ++++++++++++++++++++++
>  tools/lib/perf/include/internal/evsel.h | 48 +++++++++++++++++++++-
>  tools/perf/util/evsel.c                 | 15 ++++++-
>  tools/perf/util/evsel.h                 |  1 +
>  tools/perf/util/session.c               | 11 +++--
>  8 files changed, 154 insertions(+), 24 deletions(-)
>



Hello all, I appreciate everyone is busy. Is there any feedback on these?
I expect they will need rebasing, but before I do that, are you happy
with the approach?

Cheers
Ben


* Re: [PATCH v2 1/4] perf: Support PERF_SAMPLE_READ with inherit_stat
From: James Clark @ 2024-03-11 15:58 UTC (permalink / raw)
  To: Ben Gainey, linux-perf-users, linux-kernel
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	namhyung, irogers, adrian.hunter



On 08/02/2024 13:10, Ben Gainey wrote:
> This change allows events to use PERF_SAMPLE_READ with inherit
> so long as both inherit_stat and PERF_SAMPLE_TID are set.

I saw there was a discussion on V1 about adding a new flag because this
is an ABI break. Personally I'm not sure if it is required given that it
wasn't allowed before, so there wouldn't be any users to experience it.
I suppose there _could_ be a new user if they stumbled across this now,
but it's not like they would see that as a regression because it wasn't
allowed before anyway. Maybe it's cleaner to just use the existing flags
rather than adding a new one.

Perf even guarded the condition in userspace as far as I can see, so
nobody using Perf would hit this either.

> 
> In this configuration, and event will be inherited into any

and -> any/an?

> child processes / threads, allowing convenient profiling of a
> multiprocess or multithreaded application, whilst allowing
> profiling tools to collect per-thread samples, in particular
> of groups of counters.
> 
> Signed-off-by: Ben Gainey <ben.gainey@arm.com>
> ---
>  include/linux/perf_event.h |  1 +
>  kernel/events/core.c       | 53 ++++++++++++++++++++++++++------------
>  2 files changed, 37 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index d2a15c0c6f8a..7d405dff6694 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -932,6 +932,7 @@ struct perf_event_context {
>  
>  	int				nr_task_data;
>  	int				nr_stat;
> +	int				nr_stat_read;
>  	int				nr_freq;
>  	int				rotate_disable;
>  
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f0f0f71213a1..dac7093b3608 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -1795,8 +1795,11 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
>  	ctx->nr_events++;
>  	if (event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT)
>  		ctx->nr_user++;
> -	if (event->attr.inherit_stat)
> +	if (event->attr.inherit_stat) {
>  		ctx->nr_stat++;
> +		if (event->attr.inherit && (event->attr.sample_type & PERF_SAMPLE_READ))
> +			ctx->nr_stat_read++;
> +	}
>  
>  	if (event->state > PERF_EVENT_STATE_OFF)
>  		perf_cgroup_event_enable(event, ctx);
> @@ -2019,8 +2022,11 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
>  	ctx->nr_events--;
>  	if (event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT)
>  		ctx->nr_user--;
> -	if (event->attr.inherit_stat)
> +	if (event->attr.inherit_stat) {
>  		ctx->nr_stat--;
> +		if (event->attr.inherit && (event->attr.sample_type & PERF_SAMPLE_READ))

This condition is repeated a few times, maybe we could add a macro like
"has_sample_read_thread()" or something or whatever we're calling it
exactly.

It might have prevented the mistake in copying the condition below in
perf_event_count()...
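
For example, a minimal sketch of such a helper (name purely illustrative):

	/* Hypothetical helper for the repeated condition above. */
	static inline bool has_inherit_stat_sample_read(struct perf_event_attr *attr)
	{
		return attr->inherit && attr->inherit_stat &&
		       (attr->sample_type & PERF_SAMPLE_READ);
	}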

> +			ctx->nr_stat_read--;
> +	}
>  
>  	list_del_rcu(&event->event_entry);
>  
> @@ -3529,11 +3535,17 @@ perf_event_context_sched_out(struct task_struct *task, struct task_struct *next)
>  			perf_ctx_disable(ctx, false);
>  
>  			/* PMIs are disabled; ctx->nr_pending is stable. */
> -			if (local_read(&ctx->nr_pending) ||
> +			if (ctx->nr_stat_read ||
> +			    next_ctx->nr_stat_read ||
> +			    local_read(&ctx->nr_pending) ||
>  			    local_read(&next_ctx->nr_pending)) {
>  				/*
>  				 * Must not swap out ctx when there's pending
>  				 * events that rely on the ctx->task relation.
> +				 *
> +				 * Likewise, when a context contains inherit+inherit_stat+SAMPLE_READ
> +				 * events they should be switched out using the slow path
> +				 * so that they are treated as if they were distinct contexts.
>  				 */
>  				raw_spin_unlock(&next_ctx->lock);
>  				rcu_read_unlock();
> @@ -3545,6 +3557,7 @@ perf_event_context_sched_out(struct task_struct *task, struct task_struct *next)
>  
>  			perf_ctx_sched_task_cb(ctx, false);
>  			perf_event_swap_task_ctx_data(ctx, next_ctx);
> +			perf_event_sync_stat(ctx, next_ctx);
>  
>  			perf_ctx_enable(ctx, false);
>  
> @@ -3559,8 +3572,6 @@ perf_event_context_sched_out(struct task_struct *task, struct task_struct *next)
>  			RCU_INIT_POINTER(next->perf_event_ctxp, ctx);
>  
>  			do_switch = 0;
> -
> -			perf_event_sync_stat(ctx, next_ctx);

The commit message gives a very high level summary of the user visible
changes, but it doesn't say what changes have been made to the code to
accomplish it.

I wasn't sure what this move of perf_event_sync_stat() is for, because
it's actually skipped over when nr_stat_read != 0, which is exactly this
new case?

>  		}
>  		raw_spin_unlock(&next_ctx->lock);
>  		raw_spin_unlock(&ctx->lock);
> @@ -4533,8 +4544,13 @@ static void __perf_event_read(void *info)
>  	raw_spin_unlock(&ctx->lock);
>  }
>  
> -static inline u64 perf_event_count(struct perf_event *event)
> +static inline u64 perf_event_count(struct perf_event *event, bool from_sample)

There are only two trues, and 7 existing falses, so you could just add a
new function like perf_event_count_sample(). But I suppose that's a
style thing.
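
For example (a sketch of that alternative; the name is illustrative):

	/* Dedicated entry point for the sampling path, leaving the
	 * existing perf_event_count() signature unchanged. */
	static inline u64 perf_event_count_sample(struct perf_event *event)
	{
		if (event->attr.inherit && event->attr.inherit_stat &&
		    (event->attr.sample_type & PERF_SAMPLE_TID))
			return local64_read(&event->count);

		return perf_event_count(event);
	}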

>  {
> +	if (from_sample && event->attr.inherit &&
> +       event->attr.inherit &&
> +       (event->attr.sample_type & PERF_SAMPLE_TID)) {

There's something suspicious about this condition with the
event->attr.inherit twice. Should it be && inherit_stat? I don't know if
there's any test that could have caught this, if it's affecting the
existing inherit but not inherit_stat case?
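
i.e. presumably the intended condition was something like:

	if (from_sample && event->attr.inherit && event->attr.inherit_stat &&
	    (event->attr.sample_type & PERF_SAMPLE_TID)) {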

And it's also probably not very helpful at this stage, but if you run
checkpatch.pl on your patches it will point out some of the style issues.

