[PATCH v2 0/2] perf core: Sharing events with multiple cgroups

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/2] perf core: Sharing events with multiple cgroups
@ 2021-04-13  4:41 Namhyung Kim
  2021-04-13  4:41 ` [PATCH v2 1/2] perf/core: Share an event " Namhyung Kim
  2021-04-13  4:41 ` [PATCH v2 2/2] perf/core: Support reading group events with shared cgroups Namhyung Kim
  0 siblings, 2 replies; 7+ messages in thread
From: Namhyung Kim @ 2021-04-13  4:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
	Alexander Shishkin, LKML, Stephane Eranian, Andi Kleen,
	Ian Rogers, Song Liu

Hello,

This work is to make perf stat more scalable with a lot of cgroups.

Changes in v2)
 * use ____cacheline_aligned macro instead of the padding
 * enclose the cgroup node list initialization
 * add more comments
 * add Acked-by from Song Liu

Currently we need to open a separate perf_event to count an event in a
cgroup.  For a big machine, this requires lots of events like

  256 cpu x 8 events x 200 cgroups = 409600 events

This is very wasteful and not scalable.  In this case, the perf stat
actually counts exactly same events for each cgroup.  I think we can
just use a single event to measure all cgroups running on that cpu.

So I added new ioctl commands to add per-cgroup counters to an
existing perf_event and to read the per-cgroup counters from the
event.  The per-cgroup counters are updated during the context switch
if tasks' cgroups are different (and no need to change the HW PMU).
It keeps the counters in a hash table with cgroup id as a key.

With this change, average processing time of my internal test workload
which runs tasks in a different cgroup and communicates by pipes
dropped from 11.3 usec to 5.8 usec.

Thanks,
Namhyung

Namhyung Kim (2):
  perf/core: Share an event with multiple cgroups
  perf/core: Support reading group events with shared cgroups

 include/linux/perf_event.h      |  22 ++
 include/uapi/linux/perf_event.h |   2 +
 kernel/events/core.c            | 591 ++++++++++++++++++++++++++++++--
 3 files changed, 588 insertions(+), 27 deletions(-)

-- 
2.31.1.295.g9ea45b61b8-goog

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/2] perf/core: Share an event with multiple cgroups
  2021-04-13  4:41 [PATCH v2 0/2] perf core: Sharing events with multiple cgroups Namhyung Kim
@ 2021-04-13  4:41 ` Namhyung Kim
  2021-04-13  7:03   ` kernel test robot
                     ` (3 more replies)
  2021-04-13  4:41 ` [PATCH v2 2/2] perf/core: Support reading group events with shared cgroups Namhyung Kim
  1 sibling, 4 replies; 7+ messages in thread
From: Namhyung Kim @ 2021-04-13  4:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
	Alexander Shishkin, LKML, Stephane Eranian, Andi Kleen,
	Ian Rogers, Song Liu, Tejun Heo

As we can run many jobs (in container) on a big machine, we want to
measure each job's performance during the run.  To do that, the
perf_event can be associated to a cgroup to measure it only.

However such cgroup events need to be opened separately and it causes
significant overhead in event multiplexing during the context switch
as well as resource consumption like in file descriptors and memory
footprint.

As a cgroup event is basically a cpu event, we can share a single cpu
event for multiple cgroups.  All we need is a separate counter (and
two timing variables) for each cgroup.  I added a hash table to map
from cgroup id to the attached cgroups.

With this change, the cpu event needs to calculate a delta of event
counter values when the cgroups of current and the next task are
different.  And it attributes the delta to the current task's cgroup.

This patch adds two new ioctl commands to perf_event for light-weight
cgroup event counting (i.e. perf stat).

 * PERF_EVENT_IOC_ATTACH_CGROUP - it takes a buffer consists of a
     64-bit array to attach given cgroups.  The first element is a
     number of cgroups in the buffer, and the rest is a list of cgroup
     ids to add a cgroup info to the given event.

 * PERF_EVENT_IOC_READ_CGROUP - it takes a buffer consists of a 64-bit
     array to get the event counter values.  The first element is size
     of the array in byte, and the second element is a cgroup id to
     read.  The rest is to save the counter value and timings.

This attaches all cgroups in a single syscall and I didn't add the
DETACH command deliberately to make the implementation simple.  The
attached cgroup nodes would be deleted when the file descriptor of the
perf_event is closed.

Cc: Tejun Heo <tj@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/linux/perf_event.h      |  22 ++
 include/uapi/linux/perf_event.h |   2 +
 kernel/events/core.c            | 477 ++++++++++++++++++++++++++++++--
 3 files changed, 474 insertions(+), 27 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 3f7f89ea5e51..4b03cbadf4a0 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -771,6 +771,19 @@ struct perf_event {
 
 #ifdef CONFIG_CGROUP_PERF
 	struct perf_cgroup		*cgrp; /* cgroup event is attach to */
+
+	/* to share an event for multiple cgroups */
+	struct hlist_head		*cgrp_node_hash;
+	struct perf_cgroup_node		*cgrp_node_entries;
+	int				nr_cgrp_nodes;
+	int				cgrp_node_hash_bits;
+
+	struct list_head		cgrp_node_entry;
+
+	/* snapshot of previous reading (for perf_cgroup_node below) */
+	u64				cgrp_node_count;
+	u64				cgrp_node_time_enabled;
+	u64				cgrp_node_time_running;
 #endif
 
 #ifdef CONFIG_SECURITY
@@ -780,6 +793,13 @@ struct perf_event {
 #endif /* CONFIG_PERF_EVENTS */
 };
 
+struct perf_cgroup_node {
+	struct hlist_node		node;
+	u64				id;
+	u64				count;
+	u64				time_enabled;
+	u64				time_running;
+} ____cacheline_aligned;
 
 struct perf_event_groups {
 	struct rb_root	tree;
@@ -843,6 +863,8 @@ struct perf_event_context {
 	int				pin_count;
 #ifdef CONFIG_CGROUP_PERF
 	int				nr_cgroups;	 /* cgroup evts */
+	struct list_head		cgrp_node_list;
+	struct list_head		cgrp_ctx_entry;
 #endif
 	void				*task_ctx_data; /* pmu specific data */
 	struct rcu_head			rcu_head;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index ad15e40d7f5d..06bc7ab13616 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -479,6 +479,8 @@ struct perf_event_query_bpf {
 #define PERF_EVENT_IOC_PAUSE_OUTPUT		_IOW('$', 9, __u32)
 #define PERF_EVENT_IOC_QUERY_BPF		_IOWR('$', 10, struct perf_event_query_bpf *)
 #define PERF_EVENT_IOC_MODIFY_ATTRIBUTES	_IOW('$', 11, struct perf_event_attr *)
+#define PERF_EVENT_IOC_ATTACH_CGROUP		_IOW('$', 12, __u64 *)
+#define PERF_EVENT_IOC_READ_CGROUP		_IOWR('$', 13, __u64 *)
 
 enum perf_event_ioc_flags {
 	PERF_IOC_FLAG_GROUP		= 1U << 0,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f07943183041..0c6b3848a61f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -379,6 +379,7 @@ enum event_type_t {
  * perf_cgroup_events: >0 per-cpu cgroup events exist on this cpu
  */
 
+static void perf_sched_enable(void);
 static void perf_sched_delayed(struct work_struct *work);
 DEFINE_STATIC_KEY_FALSE(perf_sched_events);
 static DECLARE_DELAYED_WORK(perf_sched_work, perf_sched_delayed);
@@ -2124,6 +2125,322 @@ static int perf_get_aux_event(struct perf_event *event,
 	return 1;
 }
 
+#ifdef CONFIG_CGROUP_PERF
+static DEFINE_PER_CPU(struct list_head, cgroup_ctx_list);
+
+static bool event_can_attach_cgroup(struct perf_event *event)
+{
+	if (is_sampling_event(event))
+		return false;
+	if (event->attach_state & PERF_ATTACH_TASK)
+		return false;
+	if (is_cgroup_event(event))
+		return false;
+
+	return true;
+}
+
+static bool event_has_cgroup_node(struct perf_event *event)
+{
+	return event->nr_cgrp_nodes > 0;
+}
+
+static struct perf_cgroup_node *
+find_cgroup_node(struct perf_event *event, u64 cgrp_id)
+{
+	struct perf_cgroup_node *cgrp_node;
+	int key = hash_64(cgrp_id, event->cgrp_node_hash_bits);
+
+	hlist_for_each_entry(cgrp_node, &event->cgrp_node_hash[key], node) {
+		if (cgrp_node->id == cgrp_id)
+			return cgrp_node;
+	}
+
+	return NULL;
+}
+
+static void perf_update_cgroup_node(struct perf_event *event, struct cgroup *cgrp)
+{
+	u64 delta_count, delta_time_enabled, delta_time_running;
+	int i;
+
+	if (event->cgrp_node_count == 0)
+		goto out;
+
+	delta_count = local64_read(&event->count) - event->cgrp_node_count;
+	delta_time_enabled = event->total_time_enabled - event->cgrp_node_time_enabled;
+	delta_time_running = event->total_time_running - event->cgrp_node_time_running;
+
+	/* account delta to all ancestor cgroups */
+	for (i = 0; i <= cgrp->level; i++) {
+		struct perf_cgroup_node *node;
+
+		node = find_cgroup_node(event, cgrp->ancestor_ids[i]);
+		if (node) {
+			node->count += delta_count;
+			node->time_enabled += delta_time_enabled;
+			node->time_running += delta_time_running;
+		}
+	}
+
+out:
+	event->cgrp_node_count = local64_read(&event->count);
+	event->cgrp_node_time_enabled = event->total_time_enabled;
+	event->cgrp_node_time_running = event->total_time_running;
+}
+
+static void update_cgroup_node(struct perf_event *event, struct cgroup *cgrp)
+{
+	if (event->state == PERF_EVENT_STATE_ACTIVE)
+		event->pmu->read(event);
+
+	perf_event_update_time(event);
+	perf_update_cgroup_node(event, cgrp);
+}
+
+/* this is called from context switch */
+static void update_cgroup_node_events(struct perf_event_context *ctx,
+				      struct cgroup *cgrp)
+{
+	struct perf_event *event;
+
+	lockdep_assert_held(&ctx->lock);
+
+	if (ctx->is_active & EVENT_TIME)
+		update_context_time(ctx);
+
+	list_for_each_entry(event, &ctx->cgrp_node_list, cgrp_node_entry)
+		update_cgroup_node(event, cgrp);
+}
+
+static void cgroup_node_sched_out(struct task_struct *task)
+{
+	struct list_head *cgrp_ctx_list = this_cpu_ptr(&cgroup_ctx_list);
+	struct perf_cgroup *cgrp = perf_cgroup_from_task(task, NULL);
+	struct perf_event_context *ctx;
+
+	list_for_each_entry(ctx, cgrp_ctx_list, cgrp_ctx_entry) {
+		raw_spin_lock(&ctx->lock);
+		update_cgroup_node_events(ctx, cgrp->css.cgroup);
+		raw_spin_unlock(&ctx->lock);
+	}
+}
+
+/* these are called from the when an event is enabled/disabled */
+static void perf_add_cgrp_node_list(struct perf_event *event,
+				    struct perf_event_context *ctx)
+{
+	struct list_head *cgrp_ctx_list = this_cpu_ptr(&cgroup_ctx_list);
+	struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
+	bool is_first;
+
+	lockdep_assert_irqs_disabled();
+	lockdep_assert_held(&ctx->lock);
+
+	is_first = list_empty(&ctx->cgrp_node_list);
+	list_add_tail(&event->cgrp_node_entry, &ctx->cgrp_node_list);
+
+	if (is_first)
+		list_add_tail(&ctx->cgrp_ctx_entry, cgrp_ctx_list);
+
+	update_cgroup_node(event, cgrp->css.cgroup);
+}
+
+static void perf_del_cgrp_node_list(struct perf_event *event,
+				    struct perf_event_context *ctx)
+{
+	struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
+
+	lockdep_assert_irqs_disabled();
+	lockdep_assert_held(&ctx->lock);
+
+	update_cgroup_node(event, cgrp->css.cgroup);
+	/* to refresh delta when it's enabled */
+	event->cgrp_node_count = 0;
+
+	list_del(&event->cgrp_node_entry);
+
+	if (list_empty(&ctx->cgrp_node_list))
+		list_del(&ctx->cgrp_ctx_entry);
+}
+
+static void perf_attach_cgroup_node(struct perf_event *event,
+				    struct perf_cpu_context *cpuctx,
+				    struct perf_event_context *ctx,
+				    void *data)
+{
+	if (ctx->is_active & EVENT_TIME)
+		update_context_time(ctx);
+
+	perf_add_cgrp_node_list(event, ctx);
+}
+
+#define MIN_CGRP_NODE_HASH  4
+#define MAX_CGRP_NODE_HASH  (4 * 1024)
+
+/* this is called from ioctl */
+static int perf_event_attach_cgroup_node(struct perf_event *event, u64 nr_cgrps,
+					 u64 *cgroup_ids)
+{
+	struct perf_cgroup_node *cgrp_node;
+	struct perf_event_context *ctx = event->ctx;
+	struct hlist_head *cgrp_node_hash;
+	int node = (event->cpu >= 0) ? cpu_to_node(event->cpu) : -1;
+	unsigned long flags;
+	bool is_first = true;
+	bool enabled;
+	int i, nr_hash;
+	int hash_bits;
+
+	if (nr_cgrps < MIN_CGRP_NODE_HASH)
+		nr_hash = MIN_CGRP_NODE_HASH;
+	else
+		nr_hash = roundup_pow_of_two(nr_cgrps);
+	hash_bits = ilog2(nr_hash);
+
+	cgrp_node_hash = kcalloc_node(nr_hash, sizeof(*cgrp_node_hash),
+				      GFP_KERNEL, node);
+	if (cgrp_node_hash == NULL)
+		return -ENOMEM;
+
+	cgrp_node = kcalloc_node(nr_cgrps, sizeof(*cgrp_node), GFP_KERNEL, node);
+	if (cgrp_node == NULL) {
+		kfree(cgrp_node_hash);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < (int)nr_cgrps; i++) {
+		int key = hash_64(cgroup_ids[i], hash_bits);
+
+		cgrp_node[i].id = cgroup_ids[i];
+		hlist_add_head(&cgrp_node[i].node, &cgrp_node_hash[key]);
+	}
+
+	raw_spin_lock_irqsave(&ctx->lock, flags);
+
+	enabled = event->state >= PERF_EVENT_STATE_INACTIVE;
+
+	if (event->nr_cgrp_nodes != 0) {
+		kfree(event->cgrp_node_hash);
+		kfree(event->cgrp_node_entries);
+		is_first = false;
+	}
+
+	event->cgrp_node_hash = cgrp_node_hash;
+	event->cgrp_node_entries = cgrp_node;
+	event->cgrp_node_hash_bits = hash_bits;
+	event->nr_cgrp_nodes = nr_cgrps;
+
+	raw_spin_unlock_irqrestore(&ctx->lock, flags);
+
+	if (is_first && enabled)
+		event_function_call(event, perf_attach_cgroup_node, NULL);
+
+	return 0;
+}
+
+static void perf_event_destroy_cgroup_nodes(struct perf_event *event)
+{
+	struct perf_event_context *ctx = event->ctx;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&ctx->lock, flags);
+
+	if (event_has_cgroup_node(event)) {
+		if (!atomic_add_unless(&perf_sched_count, -1, 1))
+			schedule_delayed_work(&perf_sched_work, HZ);
+	}
+
+	kfree(event->cgrp_node_hash);
+	kfree(event->cgrp_node_entries);
+	event->nr_cgrp_nodes = 0;
+
+	raw_spin_unlock_irqrestore(&ctx->lock, flags);
+}
+
+static int perf_event_read(struct perf_event *event, bool group);
+
+static void __perf_read_cgroup_node(struct perf_event *event)
+{
+	struct perf_cgroup *cgrp;
+
+	if (event_has_cgroup_node(event)) {
+		cgrp = perf_cgroup_from_task(current, NULL);
+		perf_update_cgroup_node(event, cgrp->css.cgroup);
+	}
+}
+
+static int perf_event_read_cgroup_node(struct perf_event *event, u64 read_size,
+				       u64 cgrp_id, char __user *buf)
+{
+	struct perf_cgroup_node *cgrp;
+	struct perf_event_context *ctx = event->ctx;
+	unsigned long flags;
+	u64 read_format = event->attr.read_format;
+	u64 values[4];
+	int n = 0;
+
+	/* update event count and times (possibly run on other cpu) */
+	(void)perf_event_read(event, false);
+
+	raw_spin_lock_irqsave(&ctx->lock, flags);
+
+	cgrp = find_cgroup_node(event, cgrp_id);
+	if (cgrp == NULL) {
+		raw_spin_unlock_irqrestore(&ctx->lock, flags);
+		return -ENOENT;
+	}
+
+	values[n++] = cgrp->count;
+	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
+		values[n++] = cgrp->time_enabled;
+	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
+		values[n++] = cgrp->time_running;
+	if (read_format & PERF_FORMAT_ID)
+		values[n++] = primary_event_id(event);
+
+	raw_spin_unlock_irqrestore(&ctx->lock, flags);
+
+	if (copy_to_user(buf, values, n * sizeof(u64)))
+		return -EFAULT;
+
+	return n * sizeof(u64);
+}
+#else  /* !CONFIG_CGROUP_PERF */
+static inline bool event_can_attach_cgroup(struct perf_event *event)
+{
+	return false;
+}
+
+static inline bool event_has_cgroup_node(struct perf_event *event)
+{
+	return false;
+}
+
+static void cgroup_node_sched_out(struct task_struct *task) {}
+
+static inline void perf_add_cgrp_node_list(struct perf_event *event,
+					   struct perf_event_context *ctx) {}
+static inline void perf_del_cgrp_node_list(struct perf_event *event,
+					   struct perf_event_context *ctx) {}
+
+#define MAX_CGRP_NODE_HASH  1
+static inline int perf_event_attach_cgroup_node(struct perf_event *event,
+						u64 nr_cgrps, u64 *cgrp_ids)
+{
+	return -ENODEV;
+}
+
+static inline void perf_event_destroy_cgroup_nodes(struct perf_event *event) {}
+static inline  void __perf_read_cgroup_node(struct perf_event *event) {}
+
+static inline int perf_event_read_cgroup_node(struct perf_event *event,
+					      u64 cgrp_id, char __user *buf)
+{
+	return -EINVAL;
+}
+#endif  /* CONFIG_CGROUP_PERF */
+
 static inline struct list_head *get_event_list(struct perf_event *event)
 {
 	struct perf_event_context *ctx = event->ctx;
@@ -2407,6 +2724,7 @@ static void __perf_event_disable(struct perf_event *event,
 
 	perf_event_set_state(event, PERF_EVENT_STATE_OFF);
 	perf_cgroup_event_disable(event, ctx);
+	perf_del_cgrp_node_list(event, ctx);
 }
 
 /*
@@ -2946,6 +3264,7 @@ static void __perf_event_enable(struct perf_event *event,
 
 	perf_event_set_state(event, PERF_EVENT_STATE_INACTIVE);
 	perf_cgroup_event_enable(event, ctx);
+	perf_add_cgrp_node_list(event, ctx);
 
 	if (!ctx->is_active)
 		return;
@@ -3568,6 +3887,11 @@ void __perf_event_task_sched_out(struct task_struct *task,
 	 */
 	if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
 		perf_cgroup_sched_out(task, next);
+
+	if (!list_empty(this_cpu_ptr(&cgroup_ctx_list)) &&
+	    perf_cgroup_from_task(task, NULL) !=
+	    perf_cgroup_from_task(next, NULL))
+		cgroup_node_sched_out(task);
 }
 
 /*
@@ -4268,6 +4592,7 @@ static void __perf_event_read(void *info)
 
 	if (!data->group) {
 		pmu->read(event);
+		__perf_read_cgroup_node(event);
 		data->ret = 0;
 		goto unlock;
 	}
@@ -4283,6 +4608,7 @@ static void __perf_event_read(void *info)
 			 * sibling could be on different (eg: software) PMU.
 			 */
 			sub->pmu->read(sub);
+			__perf_read_cgroup_node(sub);
 		}
 	}
 
@@ -4462,6 +4788,10 @@ static void __perf_event_init_context(struct perf_event_context *ctx)
 	INIT_LIST_HEAD(&ctx->pinned_active);
 	INIT_LIST_HEAD(&ctx->flexible_active);
 	refcount_set(&ctx->refcount, 1);
+#ifdef CONFIG_CGROUP_PERF
+	INIT_LIST_HEAD(&ctx->cgrp_ctx_entry);
+	INIT_LIST_HEAD(&ctx->cgrp_node_list);
+#endif
 }
 
 static struct perf_event_context *
@@ -4851,6 +5181,8 @@ static void _free_event(struct perf_event *event)
 	if (is_cgroup_event(event))
 		perf_detach_cgroup(event);
 
+	perf_event_destroy_cgroup_nodes(event);
+
 	if (!event->parent) {
 		if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
 			put_callchain_buffers();
@@ -5571,6 +5903,58 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 
 		return perf_event_modify_attr(event,  &new_attr);
 	}
+
+	case PERF_EVENT_IOC_ATTACH_CGROUP: {
+		u64 nr_cgrps;
+		u64 *cgrp_buf;
+		size_t cgrp_bufsz;
+		int ret;
+
+		if (!event_can_attach_cgroup(event))
+			return -EINVAL;
+
+		if (copy_from_user(&nr_cgrps, (u64 __user *)arg,
+				   sizeof(nr_cgrps)))
+			return -EFAULT;
+
+		if (nr_cgrps == 0 || nr_cgrps > MAX_CGRP_NODE_HASH)
+			return -EINVAL;
+
+		cgrp_bufsz = nr_cgrps * sizeof(*cgrp_buf);
+
+		cgrp_buf = kmalloc(cgrp_bufsz, GFP_KERNEL);
+		if (cgrp_buf == NULL)
+			return -ENOMEM;
+
+		if (copy_from_user(cgrp_buf, (u64 __user *)(arg + 8),
+				   cgrp_bufsz)) {
+			kfree(cgrp_buf);
+			return -EFAULT;
+		}
+
+		ret = perf_event_attach_cgroup_node(event, nr_cgrps, cgrp_buf);
+
+		kfree(cgrp_buf);
+		return ret;
+	}
+
+	case PERF_EVENT_IOC_READ_CGROUP: {
+		u64 read_size, cgrp_id;
+
+		if (!event_can_attach_cgroup(event))
+			return -EINVAL;
+
+		if (copy_from_user(&read_size, (u64 __user *)arg,
+				   sizeof(read_size)))
+			return -EFAULT;
+		if (copy_from_user(&cgrp_id, (u64 __user *)(arg + 8),
+				   sizeof(cgrp_id)))
+			return -EFAULT;
+
+		return perf_event_read_cgroup_node(event, read_size, cgrp_id,
+						   (char __user *)(arg + 16));
+	}
+
 	default:
 		return -ENOTTY;
 	}
@@ -5583,10 +5967,39 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 	return 0;
 }
 
+static void perf_sched_enable(void)
+{
+	/*
+	 * We need the mutex here because static_branch_enable()
+	 * must complete *before* the perf_sched_count increment
+	 * becomes visible.
+	 */
+	if (atomic_inc_not_zero(&perf_sched_count))
+		return;
+
+	mutex_lock(&perf_sched_mutex);
+	if (!atomic_read(&perf_sched_count)) {
+		static_branch_enable(&perf_sched_events);
+		/*
+		 * Guarantee that all CPUs observe they key change and
+		 * call the perf scheduling hooks before proceeding to
+		 * install events that need them.
+		 */
+		synchronize_rcu();
+	}
+	/*
+	 * Now that we have waited for the sync_sched(), allow further
+	 * increments to by-pass the mutex.
+	 */
+	atomic_inc(&perf_sched_count);
+	mutex_unlock(&perf_sched_mutex);
+}
+
 static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct perf_event *event = file->private_data;
 	struct perf_event_context *ctx;
+	bool do_sched_enable = false;
 	long ret;
 
 	/* Treat ioctl like writes as it is likely a mutating operation. */
@@ -5595,9 +6008,19 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		return ret;
 
 	ctx = perf_event_ctx_lock(event);
+	/* ATTACH_CGROUP requires context switch callback */
+	if (cmd == PERF_EVENT_IOC_ATTACH_CGROUP && !event_has_cgroup_node(event))
+		do_sched_enable = true;
 	ret = _perf_ioctl(event, cmd, arg);
 	perf_event_ctx_unlock(event, ctx);
 
+	/*
+	 * Due to the circular lock dependency, it cannot call
+	 * static_branch_enable() under the ctx->mutex.
+	 */
+	if (do_sched_enable && ret >= 0)
+		perf_sched_enable();
+
 	return ret;
 }
 
@@ -11240,33 +11663,8 @@ static void account_event(struct perf_event *event)
 	if (event->attr.text_poke)
 		atomic_inc(&nr_text_poke_events);
 
-	if (inc) {
-		/*
-		 * We need the mutex here because static_branch_enable()
-		 * must complete *before* the perf_sched_count increment
-		 * becomes visible.
-		 */
-		if (atomic_inc_not_zero(&perf_sched_count))
-			goto enabled;
-
-		mutex_lock(&perf_sched_mutex);
-		if (!atomic_read(&perf_sched_count)) {
-			static_branch_enable(&perf_sched_events);
-			/*
-			 * Guarantee that all CPUs observe they key change and
-			 * call the perf scheduling hooks before proceeding to
-			 * install events that need them.
-			 */
-			synchronize_rcu();
-		}
-		/*
-		 * Now that we have waited for the sync_sched(), allow further
-		 * increments to by-pass the mutex.
-		 */
-		atomic_inc(&perf_sched_count);
-		mutex_unlock(&perf_sched_mutex);
-	}
-enabled:
+	if (inc)
+		perf_sched_enable();
 
 	account_event_cpu(event, event->cpu);
 
@@ -13008,6 +13406,7 @@ static void __init perf_event_init_all_cpus(void)
 
 #ifdef CONFIG_CGROUP_PERF
 		INIT_LIST_HEAD(&per_cpu(cgrp_cpuctx_list, cpu));
+		INIT_LIST_HEAD(&per_cpu(cgroup_ctx_list, cpu));
 #endif
 		INIT_LIST_HEAD(&per_cpu(sched_cb_list, cpu));
 	}
@@ -13218,6 +13617,29 @@ static int perf_cgroup_css_online(struct cgroup_subsys_state *css)
 	return 0;
 }
 
+static int __perf_cgroup_update_node(void *info)
+{
+	struct task_struct *task = info;
+
+	rcu_read_lock();
+	cgroup_node_sched_out(task);
+	rcu_read_unlock();
+
+	return 0;
+}
+
+/* update cgroup counter BEFORE task's cgroup is changed */
+static int perf_cgroup_can_attach(struct cgroup_taskset *tset)
+{
+	struct task_struct *task;
+	struct cgroup_subsys_state *css;
+
+	cgroup_taskset_for_each(task, css, tset)
+		task_function_call(task, __perf_cgroup_update_node, task);
+
+	return 0;
+}
+
 static int __perf_cgroup_move(void *info)
 {
 	struct task_struct *task = info;
@@ -13240,6 +13662,7 @@ struct cgroup_subsys perf_event_cgrp_subsys = {
 	.css_alloc	= perf_cgroup_css_alloc,
 	.css_free	= perf_cgroup_css_free,
 	.css_online	= perf_cgroup_css_online,
+	.can_attach	= perf_cgroup_can_attach,
 	.attach		= perf_cgroup_attach,
 	/*
 	 * Implicitly enable on dfl hierarchy so that perf events can
-- 
2.31.1.295.g9ea45b61b8-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 2/2] perf/core: Support reading group events with shared cgroups
  2021-04-13  4:41 [PATCH v2 0/2] perf core: Sharing events with multiple cgroups Namhyung Kim
  2021-04-13  4:41 ` [PATCH v2 1/2] perf/core: Share an event " Namhyung Kim
@ 2021-04-13  4:41 ` Namhyung Kim
  1 sibling, 0 replies; 7+ messages in thread
From: Namhyung Kim @ 2021-04-13  4:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
	Alexander Shishkin, LKML, Stephane Eranian, Andi Kleen,
	Ian Rogers, Song Liu

This enables reading event group's counter values together with a
PERF_EVENT_IOC_READ_CGROUP command like we do in the regular read().
Users should give a correct size of buffer to be read which includes
the total buffer size and the cgroup id.

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/events/core.c | 120 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 117 insertions(+), 3 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0c6b3848a61f..d483b4b42fe2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2232,13 +2232,24 @@ static void perf_add_cgrp_node_list(struct perf_event *event,
 {
 	struct list_head *cgrp_ctx_list = this_cpu_ptr(&cgroup_ctx_list);
 	struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
+	struct perf_event *sibling;
 	bool is_first;
 
 	lockdep_assert_irqs_disabled();
 	lockdep_assert_held(&ctx->lock);
 
+	/* only group leader can be added directly */
+	if (event->group_leader != event)
+		return;
+
+	if (!event_has_cgroup_node(event))
+		return;
+
 	is_first = list_empty(&ctx->cgrp_node_list);
+
 	list_add_tail(&event->cgrp_node_entry, &ctx->cgrp_node_list);
+	for_each_sibling_event(sibling, event)
+		list_add_tail(&sibling->cgrp_node_entry, &ctx->cgrp_node_list);
 
 	if (is_first)
 		list_add_tail(&ctx->cgrp_ctx_entry, cgrp_ctx_list);
@@ -2250,15 +2261,25 @@ static void perf_del_cgrp_node_list(struct perf_event *event,
 				    struct perf_event_context *ctx)
 {
 	struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
+	struct perf_event *sibling;
 
 	lockdep_assert_irqs_disabled();
 	lockdep_assert_held(&ctx->lock);
 
+	/* only group leader can be deleted directly */
+	if (event->group_leader != event)
+		return;
+
+	if (!event_has_cgroup_node(event))
+		return;
+
 	update_cgroup_node(event, cgrp->css.cgroup);
 	/* to refresh delta when it's enabled */
 	event->cgrp_node_count = 0;
 
 	list_del(&event->cgrp_node_entry);
+	for_each_sibling_event(sibling, event)
+		list_del(&sibling->cgrp_node_entry);
 
 	if (list_empty(&ctx->cgrp_node_list))
 		list_del(&ctx->cgrp_ctx_entry);
@@ -2333,7 +2354,7 @@ static int perf_event_attach_cgroup_node(struct perf_event *event, u64 nr_cgrps,
 
 	raw_spin_unlock_irqrestore(&ctx->lock, flags);
 
-	if (is_first && enabled)
+	if (is_first && enabled && event->group_leader == event)
 		event_function_call(event, perf_attach_cgroup_node, NULL);
 
 	return 0;
@@ -2370,8 +2391,8 @@ static void __perf_read_cgroup_node(struct perf_event *event)
 	}
 }
 
-static int perf_event_read_cgroup_node(struct perf_event *event, u64 read_size,
-				       u64 cgrp_id, char __user *buf)
+static int perf_event_read_cgrp_node_one(struct perf_event *event, u64 cgrp_id,
+					 char __user *buf)
 {
 	struct perf_cgroup_node *cgrp;
 	struct perf_event_context *ctx = event->ctx;
@@ -2406,6 +2427,92 @@ static int perf_event_read_cgroup_node(struct perf_event *event, u64 read_size,
 
 	return n * sizeof(u64);
 }
+
+static int perf_event_read_cgrp_node_sibling(struct perf_event *event,
+					     u64 read_format, u64 cgrp_id,
+					     u64 *values)
+{
+	struct perf_cgroup_node *cgrp;
+	int n = 0;
+
+	cgrp = find_cgroup_node(event, cgrp_id);
+	if (cgrp == NULL)
+		return (read_format & PERF_FORMAT_ID) ? 2 : 1;
+
+	values[n++] = cgrp->count;
+	if (read_format & PERF_FORMAT_ID)
+		values[n++] = primary_event_id(event);
+	return n;
+}
+
+static int perf_event_read_cgrp_node_group(struct perf_event *event, u64 cgrp_id,
+					   char __user *buf)
+{
+	struct perf_cgroup_node *cgrp;
+	struct perf_event_context *ctx = event->ctx;
+	struct perf_event *sibling;
+	u64 read_format = event->attr.read_format;
+	unsigned long flags;
+	u64 *values;
+	int n = 1;
+	int ret;
+
+	values = kzalloc(event->read_size, GFP_KERNEL);
+	if (!values)
+		return -ENOMEM;
+
+	values[0] = 1 + event->nr_siblings;
+
+	/* update event count and times (possibly run on other cpu) */
+	(void)perf_event_read(event, true);
+
+	raw_spin_lock_irqsave(&ctx->lock, flags);
+
+	cgrp = find_cgroup_node(event, cgrp_id);
+	if (cgrp == NULL) {
+		raw_spin_unlock_irqrestore(&ctx->lock, flags);
+		kfree(values);
+		return -ENOENT;
+	}
+
+	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
+		values[n++] = cgrp->time_enabled;
+	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
+		values[n++] = cgrp->time_running;
+
+	values[n++] = cgrp->count;
+	if (read_format & PERF_FORMAT_ID)
+		values[n++] = primary_event_id(event);
+
+	for_each_sibling_event(sibling, event) {
+		n += perf_event_read_cgrp_node_sibling(sibling, read_format,
+						       cgrp_id, &values[n]);
+	}
+
+	raw_spin_unlock_irqrestore(&ctx->lock, flags);
+
+	ret = copy_to_user(buf, values, n * sizeof(u64));
+	kfree(values);
+	if (ret)
+		return -EFAULT;
+
+	return n * sizeof(u64);
+}
+
+static int perf_event_read_cgroup_node(struct perf_event *event, u64 read_size,
+				       u64 cgrp_id, char __user *buf)
+{
+	u64 read_format = event->attr.read_format;
+
+	/* buf = bufsize + cgroup_id + read_buffer */
+	if (read_size < 2 * sizeof(u64) + event->read_size)
+		return -EINVAL;
+
+	if (read_format & PERF_FORMAT_GROUP)
+		return perf_event_read_cgrp_node_group(event, cgrp_id, buf);
+
+	return perf_event_read_cgrp_node_one(event, cgrp_id, buf);
+}
 #else  /* !CONFIG_CGROUP_PERF */
 static inline bool event_can_attach_cgroup(struct perf_event *event)
 {
@@ -2511,6 +2618,7 @@ static void perf_group_detach(struct perf_event *event)
 			if (sibling->state == PERF_EVENT_STATE_ACTIVE)
 				list_add_tail(&sibling->active_list, get_event_list(sibling));
 		}
+		perf_add_cgrp_node_list(sibling, event->ctx);
 
 		WARN_ON_ONCE(sibling->ctx != event->ctx);
 	}
@@ -2654,6 +2762,9 @@ __perf_remove_from_context(struct perf_event *event,
 		perf_group_detach(event);
 	list_del_event(event, ctx);
 
+	if (event->state > PERF_EVENT_STATE_OFF)
+		perf_del_cgrp_node_list(event, ctx);
+
 	if (!ctx->nr_events && ctx->is_active) {
 		ctx->is_active = 0;
 		ctx->rotate_necessary = 0;
@@ -3112,6 +3223,9 @@ static int  __perf_install_in_context(void *info)
 		reprogram = cgroup_is_descendant(cgrp->css.cgroup,
 					event->cgrp->css.cgroup);
 	}
+
+	if (event->state > PERF_EVENT_STATE_OFF)
+		perf_add_cgrp_node_list(event, ctx);
 #endif
 
 	if (reprogram) {
-- 
2.31.1.295.g9ea45b61b8-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/2] perf/core: Share an event with multiple cgroups
  2021-04-13  4:41 ` [PATCH v2 1/2] perf/core: Share an event " Namhyung Kim
@ 2021-04-13  7:03   ` kernel test robot
  2021-04-13  9:39   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2021-04-13  7:03 UTC (permalink / raw)
  To: Namhyung Kim, Peter Zijlstra
  Cc: kbuild-all, clang-built-linux, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
	Alexander Shishkin, LKML, Stephane Eranian, Andi Kleen,
	Ian Rogers

[-- Attachment #1: Type: text/plain, Size: 12109 bytes --]

Hi Namhyung,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/perf/core]
[also build test ERROR on tip/master linux/master linus/master v5.12-rc7 next-20210412]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Namhyung-Kim/perf-core-Sharing-events-with-multiple-cgroups/20210413-124251
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git cface0326a6c2ae5c8f47bd466f07624b3e348a7
config: arm64-randconfig-r026-20210413 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 9829f5e6b1bca9b61efc629770d28bb9014dec45)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/c604a61fb3cfd58be50992c8284b13e598312794
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Namhyung-Kim/perf-core-Sharing-events-with-multiple-cgroups/20210413-124251
        git checkout c604a61fb3cfd58be50992c8284b13e598312794
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> kernel/events/core.c:3891:32: error: use of undeclared identifier 'cgroup_ctx_list'; did you mean 'cgroup_exit'?
           if (!list_empty(this_cpu_ptr(&cgroup_ctx_list)) &&
                                         ^~~~~~~~~~~~~~~
                                         cgroup_exit
   include/linux/percpu-defs.h:252:39: note: expanded from macro 'this_cpu_ptr'
   #define this_cpu_ptr(ptr) raw_cpu_ptr(ptr)
                                         ^
   include/linux/percpu-defs.h:241:20: note: expanded from macro 'raw_cpu_ptr'
           __verify_pcpu_ptr(ptr);                                         \
                             ^
   include/linux/percpu-defs.h:219:47: note: expanded from macro '__verify_pcpu_ptr'
           const void __percpu *__vpp_verify = (typeof((ptr) + 0))NULL;    \
                                                        ^
   include/linux/cgroup.h:130:6: note: 'cgroup_exit' declared here
   void cgroup_exit(struct task_struct *p);
        ^
>> kernel/events/core.c:3891:32: error: use of undeclared identifier 'cgroup_ctx_list'; did you mean 'cgroup_exit'?
           if (!list_empty(this_cpu_ptr(&cgroup_ctx_list)) &&
                                         ^~~~~~~~~~~~~~~
                                         cgroup_exit
   include/linux/percpu-defs.h:252:39: note: expanded from macro 'this_cpu_ptr'
   #define this_cpu_ptr(ptr) raw_cpu_ptr(ptr)
                                         ^
   include/linux/percpu-defs.h:242:19: note: expanded from macro 'raw_cpu_ptr'
           arch_raw_cpu_ptr(ptr);                                          \
                            ^
   include/asm-generic/percpu.h:44:48: note: expanded from macro 'arch_raw_cpu_ptr'
   #define arch_raw_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, __my_cpu_offset)
                                                  ^
   include/linux/percpu-defs.h:231:23: note: expanded from macro 'SHIFT_PERCPU_PTR'
           RELOC_HIDE((typeof(*(__p)) __kernel __force *)(__p), (__offset))
                                ^
   include/linux/compiler.h:181:31: note: expanded from macro 'RELOC_HIDE'
        __ptr = (unsigned long) (ptr);                             \
                                 ^
   include/linux/cgroup.h:130:6: note: 'cgroup_exit' declared here
   void cgroup_exit(struct task_struct *p);
        ^
>> kernel/events/core.c:3891:32: error: use of undeclared identifier 'cgroup_ctx_list'; did you mean 'cgroup_exit'?
           if (!list_empty(this_cpu_ptr(&cgroup_ctx_list)) &&
                                         ^~~~~~~~~~~~~~~
                                         cgroup_exit
   include/linux/percpu-defs.h:252:39: note: expanded from macro 'this_cpu_ptr'
   #define this_cpu_ptr(ptr) raw_cpu_ptr(ptr)
                                         ^
   include/linux/percpu-defs.h:242:19: note: expanded from macro 'raw_cpu_ptr'
           arch_raw_cpu_ptr(ptr);                                          \
                            ^
   include/asm-generic/percpu.h:44:48: note: expanded from macro 'arch_raw_cpu_ptr'
   #define arch_raw_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, __my_cpu_offset)
                                                  ^
   include/linux/percpu-defs.h:231:49: note: expanded from macro 'SHIFT_PERCPU_PTR'
           RELOC_HIDE((typeof(*(__p)) __kernel __force *)(__p), (__offset))
                                                          ^
   include/linux/compiler.h:181:31: note: expanded from macro 'RELOC_HIDE'
        __ptr = (unsigned long) (ptr);                             \
                                 ^
   include/linux/cgroup.h:130:6: note: 'cgroup_exit' declared here
   void cgroup_exit(struct task_struct *p);
        ^
>> kernel/events/core.c:3891:32: error: use of undeclared identifier 'cgroup_ctx_list'; did you mean 'cgroup_exit'?
           if (!list_empty(this_cpu_ptr(&cgroup_ctx_list)) &&
                                         ^~~~~~~~~~~~~~~
                                         cgroup_exit
   include/linux/percpu-defs.h:252:39: note: expanded from macro 'this_cpu_ptr'
   #define this_cpu_ptr(ptr) raw_cpu_ptr(ptr)
                                         ^
   include/linux/percpu-defs.h:242:19: note: expanded from macro 'raw_cpu_ptr'
           arch_raw_cpu_ptr(ptr);                                          \
                            ^
   include/asm-generic/percpu.h:44:48: note: expanded from macro 'arch_raw_cpu_ptr'
   #define arch_raw_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, __my_cpu_offset)
                                                  ^
   include/linux/percpu-defs.h:231:23: note: expanded from macro 'SHIFT_PERCPU_PTR'
           RELOC_HIDE((typeof(*(__p)) __kernel __force *)(__p), (__offset))
                                ^
   include/linux/compiler.h:182:13: note: expanded from macro 'RELOC_HIDE'
       (typeof(ptr)) (__ptr + (off)); })
               ^
   include/linux/cgroup.h:130:6: note: 'cgroup_exit' declared here
   void cgroup_exit(struct task_struct *p);
        ^
>> kernel/events/core.c:3891:32: error: use of undeclared identifier 'cgroup_ctx_list'; did you mean 'cgroup_exit'?
           if (!list_empty(this_cpu_ptr(&cgroup_ctx_list)) &&
                                         ^~~~~~~~~~~~~~~
                                         cgroup_exit
   include/linux/percpu-defs.h:252:39: note: expanded from macro 'this_cpu_ptr'
   #define this_cpu_ptr(ptr) raw_cpu_ptr(ptr)
                                         ^
   include/linux/percpu-defs.h:242:19: note: expanded from macro 'raw_cpu_ptr'
           arch_raw_cpu_ptr(ptr);                                          \
                            ^
   include/asm-generic/percpu.h:44:48: note: expanded from macro 'arch_raw_cpu_ptr'
   #define arch_raw_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, __my_cpu_offset)
                                                  ^
   include/linux/percpu-defs.h:231:49: note: expanded from macro 'SHIFT_PERCPU_PTR'
           RELOC_HIDE((typeof(*(__p)) __kernel __force *)(__p), (__offset))
                                                          ^
   include/linux/compiler.h:182:13: note: expanded from macro 'RELOC_HIDE'
       (typeof(ptr)) (__ptr + (off)); })
               ^
   include/linux/cgroup.h:130:6: note: 'cgroup_exit' declared here
   void cgroup_exit(struct task_struct *p);
        ^
>> kernel/events/core.c:3891:18: error: incompatible pointer types passing 'typeof ((typeof (*(&cgroup_exit)) *)(&cgroup_exit))' (aka 'void (*)(struct task_struct *)') to parameter of type 'const struct list_head *' [-Werror,-Wincompatible-pointer-types]
           if (!list_empty(this_cpu_ptr(&cgroup_ctx_list)) &&
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:252:27: note: expanded from macro 'this_cpu_ptr'
   #define this_cpu_ptr(ptr) raw_cpu_ptr(ptr)
                             ^~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:239:31: note: expanded from macro 'raw_cpu_ptr'
   #define raw_cpu_ptr(ptr)                                                \
                                                                           ^
   include/linux/list.h:280:54: note: passing argument to parameter 'head' here
   static inline int list_empty(const struct list_head *head)
                                                        ^
>> kernel/events/core.c:3892:6: error: implicit declaration of function 'perf_cgroup_from_task' [-Werror,-Wimplicit-function-declaration]
               perf_cgroup_from_task(task, NULL) !=
               ^
   kernel/events/core.c:3892:6: note: did you mean 'perf_cgroup_match'?
   kernel/events/core.c:1057:1: note: 'perf_cgroup_match' declared here
   perf_cgroup_match(struct perf_event *event)
   ^
>> kernel/events/core.c:5955:10: error: too many arguments to function call, expected 3, have 4
                                                      (char __user *)(arg + 16));
                                                      ^~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/events/core.c:2437:19: note: 'perf_event_read_cgroup_node' declared here
   static inline int perf_event_read_cgroup_node(struct perf_event *event,
                     ^
   kernel/events/core.c:6981:6: warning: no previous prototype for function 'perf_pmu_snapshot_aux' [-Wmissing-prototypes]
   long perf_pmu_snapshot_aux(struct perf_buffer *rb,
        ^
   kernel/events/core.c:6981:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   long perf_pmu_snapshot_aux(struct perf_buffer *rb,
   ^
   static 
   1 warning and 8 errors generated.


vim +3891 kernel/events/core.c

  3851	
  3852	static void perf_event_switch(struct task_struct *task,
  3853				      struct task_struct *next_prev, bool sched_in);
  3854	
  3855	#define for_each_task_context_nr(ctxn)					\
  3856		for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
  3857	
  3858	/*
  3859	 * Called from scheduler to remove the events of the current task,
  3860	 * with interrupts disabled.
  3861	 *
  3862	 * We stop each event and update the event value in event->count.
  3863	 *
  3864	 * This does not protect us against NMI, but disable()
  3865	 * sets the disabled bit in the control field of event _before_
  3866	 * accessing the event control register. If a NMI hits, then it will
  3867	 * not restart the event.
  3868	 */
  3869	void __perf_event_task_sched_out(struct task_struct *task,
  3870					 struct task_struct *next)
  3871	{
  3872		int ctxn;
  3873	
  3874		if (__this_cpu_read(perf_sched_cb_usages))
  3875			perf_pmu_sched_task(task, next, false);
  3876	
  3877		if (atomic_read(&nr_switch_events))
  3878			perf_event_switch(task, next, false);
  3879	
  3880		for_each_task_context_nr(ctxn)
  3881			perf_event_context_sched_out(task, ctxn, next);
  3882	
  3883		/*
  3884		 * if cgroup events exist on this CPU, then we need
  3885		 * to check if we have to switch out PMU state.
  3886		 * cgroup event are system-wide mode only
  3887		 */
  3888		if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
  3889			perf_cgroup_sched_out(task, next);
  3890	
> 3891		if (!list_empty(this_cpu_ptr(&cgroup_ctx_list)) &&
> 3892		    perf_cgroup_from_task(task, NULL) !=
  3893		    perf_cgroup_from_task(next, NULL))
  3894			cgroup_node_sched_out(task);
  3895	}
  3896	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 35094 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/2] perf/core: Share an event with multiple cgroups
  2021-04-13  4:41 ` [PATCH v2 1/2] perf/core: Share an event " Namhyung Kim
  2021-04-13  7:03   ` kernel test robot
@ 2021-04-13  9:39   ` kernel test robot
  2021-04-14  5:15   ` kernel test robot
  2021-04-14  5:15   ` [PATCH] perf/core: fix memdup_user.cocci warnings kernel test robot
  3 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2021-04-13  9:39 UTC (permalink / raw)
  To: Namhyung Kim, Peter Zijlstra
  Cc: kbuild-all, clang-built-linux, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
	Alexander Shishkin, LKML, Stephane Eranian, Andi Kleen,
	Ian Rogers

[-- Attachment #1: Type: text/plain, Size: 9780 bytes --]

Hi Namhyung,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/perf/core]
[also build test ERROR on tip/master linux/master linus/master v5.12-rc7 next-20210412]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Namhyung-Kim/perf-core-Sharing-events-with-multiple-cgroups/20210413-124251
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git cface0326a6c2ae5c8f47bd466f07624b3e348a7
config: powerpc-randconfig-r025-20210413 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 9829f5e6b1bca9b61efc629770d28bb9014dec45)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # https://github.com/0day-ci/linux/commit/c604a61fb3cfd58be50992c8284b13e598312794
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Namhyung-Kim/perf-core-Sharing-events-with-multiple-cgroups/20210413-124251
        git checkout c604a61fb3cfd58be50992c8284b13e598312794
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> kernel/events/core.c:3891:32: error: use of undeclared identifier 'cgroup_ctx_list'; did you mean 'cgroup_exit'?
           if (!list_empty(this_cpu_ptr(&cgroup_ctx_list)) &&
                                         ^~~~~~~~~~~~~~~
                                         cgroup_exit
   include/linux/percpu-defs.h:265:39: note: expanded from macro 'this_cpu_ptr'
   #define this_cpu_ptr(ptr)       raw_cpu_ptr(ptr)
                                               ^
   include/linux/percpu-defs.h:264:38: note: expanded from macro 'raw_cpu_ptr'
   #define raw_cpu_ptr(ptr)        per_cpu_ptr(ptr, 0)
                                               ^
   include/linux/percpu-defs.h:263:65: note: expanded from macro 'per_cpu_ptr'
   #define per_cpu_ptr(ptr, cpu)   ({ (void)(cpu); VERIFY_PERCPU_PTR(ptr); })
                                                                     ^
   include/linux/percpu-defs.h:259:20: note: expanded from macro 'VERIFY_PERCPU_PTR'
           __verify_pcpu_ptr(__p);                                         \
                             ^
   include/linux/percpu-defs.h:219:47: note: expanded from macro '__verify_pcpu_ptr'
           const void __percpu *__vpp_verify = (typeof((ptr) + 0))NULL;    \
                                                        ^
   include/linux/cgroup.h:130:6: note: 'cgroup_exit' declared here
   void cgroup_exit(struct task_struct *p);
        ^
>> kernel/events/core.c:3891:32: error: use of undeclared identifier 'cgroup_ctx_list'; did you mean 'cgroup_exit'?
           if (!list_empty(this_cpu_ptr(&cgroup_ctx_list)) &&
                                         ^~~~~~~~~~~~~~~
                                         cgroup_exit
   include/linux/percpu-defs.h:265:39: note: expanded from macro 'this_cpu_ptr'
   #define this_cpu_ptr(ptr)       raw_cpu_ptr(ptr)
                                               ^
   include/linux/percpu-defs.h:264:38: note: expanded from macro 'raw_cpu_ptr'
   #define raw_cpu_ptr(ptr)        per_cpu_ptr(ptr, 0)
                                               ^
   include/linux/percpu-defs.h:263:65: note: expanded from macro 'per_cpu_ptr'
   #define per_cpu_ptr(ptr, cpu)   ({ (void)(cpu); VERIFY_PERCPU_PTR(ptr); })
                                                                     ^
   include/linux/percpu-defs.h:260:12: note: expanded from macro 'VERIFY_PERCPU_PTR'
           (typeof(*(__p)) __kernel __force *)(__p);                       \
                     ^
   include/linux/cgroup.h:130:6: note: 'cgroup_exit' declared here
   void cgroup_exit(struct task_struct *p);
        ^
>> kernel/events/core.c:3891:32: error: use of undeclared identifier 'cgroup_ctx_list'; did you mean 'cgroup_exit'?
           if (!list_empty(this_cpu_ptr(&cgroup_ctx_list)) &&
                                         ^~~~~~~~~~~~~~~
                                         cgroup_exit
   include/linux/percpu-defs.h:265:39: note: expanded from macro 'this_cpu_ptr'
   #define this_cpu_ptr(ptr)       raw_cpu_ptr(ptr)
                                               ^
   include/linux/percpu-defs.h:264:38: note: expanded from macro 'raw_cpu_ptr'
   #define raw_cpu_ptr(ptr)        per_cpu_ptr(ptr, 0)
                                               ^
   include/linux/percpu-defs.h:263:65: note: expanded from macro 'per_cpu_ptr'
   #define per_cpu_ptr(ptr, cpu)   ({ (void)(cpu); VERIFY_PERCPU_PTR(ptr); })
                                                                     ^
   include/linux/percpu-defs.h:260:38: note: expanded from macro 'VERIFY_PERCPU_PTR'
           (typeof(*(__p)) __kernel __force *)(__p);                       \
                                               ^
   include/linux/cgroup.h:130:6: note: 'cgroup_exit' declared here
   void cgroup_exit(struct task_struct *p);
        ^
>> kernel/events/core.c:3891:18: error: incompatible pointer types passing 'typeof (*(&cgroup_exit)) *' (aka 'void (*)(struct task_struct *)') to parameter of type 'const struct list_head *' [-Werror,-Wincompatible-pointer-types]
           if (!list_empty(this_cpu_ptr(&cgroup_ctx_list)) &&
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:265:27: note: expanded from macro 'this_cpu_ptr'
   #define this_cpu_ptr(ptr)       raw_cpu_ptr(ptr)
                                   ^~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:264:26: note: expanded from macro 'raw_cpu_ptr'
   #define raw_cpu_ptr(ptr)        per_cpu_ptr(ptr, 0)
                                   ^~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:263:31: note: expanded from macro 'per_cpu_ptr'
   #define per_cpu_ptr(ptr, cpu)   ({ (void)(cpu); VERIFY_PERCPU_PTR(ptr); })
                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/list.h:280:54: note: passing argument to parameter 'head' here
   static inline int list_empty(const struct list_head *head)
                                                        ^
>> kernel/events/core.c:3892:6: error: implicit declaration of function 'perf_cgroup_from_task' [-Werror,-Wimplicit-function-declaration]
               perf_cgroup_from_task(task, NULL) !=
               ^
   kernel/events/core.c:3892:6: note: did you mean 'perf_cgroup_match'?
   kernel/events/core.c:1057:1: note: 'perf_cgroup_match' declared here
   perf_cgroup_match(struct perf_event *event)
   ^
>> kernel/events/core.c:5955:10: error: too many arguments to function call, expected 3, have 4
                                                      (char __user *)(arg + 16));
                                                      ^~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/events/core.c:2437:19: note: 'perf_event_read_cgroup_node' declared here
   static inline int perf_event_read_cgroup_node(struct perf_event *event,
                     ^
   kernel/events/core.c:6981:6: warning: no previous prototype for function 'perf_pmu_snapshot_aux' [-Wmissing-prototypes]
   long perf_pmu_snapshot_aux(struct perf_buffer *rb,
        ^
   kernel/events/core.c:6981:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   long perf_pmu_snapshot_aux(struct perf_buffer *rb,
   ^
   static 
   1 warning and 6 errors generated.


vim +3891 kernel/events/core.c

  3851	
  3852	static void perf_event_switch(struct task_struct *task,
  3853				      struct task_struct *next_prev, bool sched_in);
  3854	
  3855	#define for_each_task_context_nr(ctxn)					\
  3856		for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
  3857	
  3858	/*
  3859	 * Called from scheduler to remove the events of the current task,
  3860	 * with interrupts disabled.
  3861	 *
  3862	 * We stop each event and update the event value in event->count.
  3863	 *
  3864	 * This does not protect us against NMI, but disable()
  3865	 * sets the disabled bit in the control field of event _before_
  3866	 * accessing the event control register. If a NMI hits, then it will
  3867	 * not restart the event.
  3868	 */
  3869	void __perf_event_task_sched_out(struct task_struct *task,
  3870					 struct task_struct *next)
  3871	{
  3872		int ctxn;
  3873	
  3874		if (__this_cpu_read(perf_sched_cb_usages))
  3875			perf_pmu_sched_task(task, next, false);
  3876	
  3877		if (atomic_read(&nr_switch_events))
  3878			perf_event_switch(task, next, false);
  3879	
  3880		for_each_task_context_nr(ctxn)
  3881			perf_event_context_sched_out(task, ctxn, next);
  3882	
  3883		/*
  3884		 * if cgroup events exist on this CPU, then we need
  3885		 * to check if we have to switch out PMU state.
  3886		 * cgroup event are system-wide mode only
  3887		 */
  3888		if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
  3889			perf_cgroup_sched_out(task, next);
  3890	
> 3891		if (!list_empty(this_cpu_ptr(&cgroup_ctx_list)) &&
> 3892		    perf_cgroup_from_task(task, NULL) !=
  3893		    perf_cgroup_from_task(next, NULL))
  3894			cgroup_node_sched_out(task);
  3895	}
  3896	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32803 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/2] perf/core: Share an event with multiple cgroups
  2021-04-13  4:41 ` [PATCH v2 1/2] perf/core: Share an event " Namhyung Kim
  2021-04-13  7:03   ` kernel test robot
  2021-04-13  9:39   ` kernel test robot
@ 2021-04-14  5:15   ` kernel test robot
  2021-04-14  5:15   ` [PATCH] perf/core: fix memdup_user.cocci warnings kernel test robot
  3 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2021-04-14  5:15 UTC (permalink / raw)
  To: Namhyung Kim, Peter Zijlstra
  Cc: kbuild-all, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	Mark Rutland, Alexander Shishkin, LKML, Stephane Eranian,
	Andi Kleen, Ian Rogers

[-- Attachment #1: Type: text/plain, Size: 1101 bytes --]

Hi Namhyung,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/perf/core]
[also build test WARNING on tip/master linux/master linus/master v5.12-rc7 next-20210413]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Namhyung-Kim/perf-core-Sharing-events-with-multiple-cgroups/20210413-124251
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git cface0326a6c2ae5c8f47bd466f07624b3e348a7
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


cocci warnings: (new ones prefixed by >>)
>> kernel/events/core.c:5925:13-20: WARNING opportunity for memdup_user

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 65454 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] perf/core: fix memdup_user.cocci warnings
  2021-04-13  4:41 ` [PATCH v2 1/2] perf/core: Share an event " Namhyung Kim
                     ` (2 preceding siblings ...)
  2021-04-14  5:15   ` kernel test robot
@ 2021-04-14  5:15   ` kernel test robot
  3 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2021-04-14  5:15 UTC (permalink / raw)
  To: Namhyung Kim, Peter Zijlstra
  Cc: kbuild-all, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	Mark Rutland, Alexander Shishkin, LKML, Stephane Eranian,
	Andi Kleen, Ian Rogers

From: kernel test robot <lkp@intel.com>

kernel/events/core.c:5925:13-20: WARNING opportunity for memdup_user

 Use memdup_user rather than duplicating its implementation
 This is a little bit restricted to reduce false positives

Generated by: scripts/coccinelle/api/memdup_user.cocci

CC: Namhyung Kim <namhyung@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
---

url:    https://github.com/0day-ci/linux/commits/Namhyung-Kim/perf-core-Sharing-events-with-multiple-cgroups/20210413-124251
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git cface0326a6c2ae5c8f47bd466f07624b3e348a7

 core.c |   12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5922,15 +5922,9 @@ static long _perf_ioctl(struct perf_even
 
 		cgrp_bufsz = nr_cgrps * sizeof(*cgrp_buf);
 
-		cgrp_buf = kmalloc(cgrp_bufsz, GFP_KERNEL);
-		if (cgrp_buf == NULL)
-			return -ENOMEM;
-
-		if (copy_from_user(cgrp_buf, (u64 __user *)(arg + 8),
-				   cgrp_bufsz)) {
-			kfree(cgrp_buf);
-			return -EFAULT;
-		}
+		cgrp_buf = memdup_user((u64 __user *)(arg + 8), cgrp_bufsz);
+		if (IS_ERR(cgrp_buf))
+			return PTR_ERR(cgrp_buf);
 
 		ret = perf_event_attach_cgroup_node(event, nr_cgrps, cgrp_buf);
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-04-14  5:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-13  4:41 [PATCH v2 0/2] perf core: Sharing events with multiple cgroups Namhyung Kim
2021-04-13  4:41 ` [PATCH v2 1/2] perf/core: Share an event " Namhyung Kim
2021-04-13  7:03   ` kernel test robot
2021-04-13  9:39   ` kernel test robot
2021-04-14  5:15   ` kernel test robot
2021-04-14  5:15   ` [PATCH] perf/core: fix memdup_user.cocci warnings kernel test robot
2021-04-13  4:41 ` [PATCH v2 2/2] perf/core: Support reading group events with shared cgroups Namhyung Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).