All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET 0/9] perf: Improve cgroup profiling (v4)
@ 2020-01-07 13:34 Namhyung Kim
  2020-01-07 13:34 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
                   ` (8 more replies)
  0 siblings, 9 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter

Hello,

This work is to improve cgroup profiling in perf.  Currently it only
supports profiling tasks in a specific cgroup and there's no way to
identify which cgroup the current sample belongs to.  So I added
PERF_SAMPLE_CGROUP to add cgroup id into each sample.  It's a 64-bit
integer having file handle of the cgroup.  And kernel also generates
PERF_RECORD_CGROUP event for new groups to correlate the cgroup id and
cgroup name (path in the cgroup filesystem).  The cgroup id can be
read from userspace by name_to_handle_at() system call so it can
synthesize the CGROUP event for existing groups.

So why do we want this?  Systems running a large number of jobs in
different cgroups want to profiling such jobs precisely. This includes
container hosting systems widely used today. Currently perf supports
namespace tracking but the systems may not use (cgroup) namespace for
their jobs. Also it'd be more intuitive to see cgroup names (as
they're given by user or sysadmin) rather than numeric
cgroup/namespace id even if they use the namespaces.

From Stephane Eranian:
> In data centers you care about attributing samples to a job not such
> much to a process.  A job may have multiple processes which may come
> and go. The cgroup on the other hand stays around for the entire
> lifetime of the job. It is much easier to map a cgroup name to a
> particular job than it is to map a pid back to a job name,
> especially for offline post-processing.

Note that this only works for "perf_event" cgroups (obviously) so if
users are still using cgroup-v1 interface, they need to have same
hierarchy for subsystem(s) want to profile with it.

 * Changes from v3:
  - staticize perf_event_cgroup()
  - fix sample parsing test

 * Changes from v2:
  - remove path_len from cgroup_event
  - bail out if kernel doesn't support cgroup sampling
  - add some description in the Kconfig

 * Changes from v1:
  - use new cgroup id (= file handle)

The code is available at perf/cgroup-v4 branch in

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


The testing result looks something like this:

  [root@qemu build]# ./perf record --all-cgroups ./cgtest
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.009 MB perf.data (150 samples) ]
  
  [root@qemu build]# ./perf report -s cgroup,comm --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 150  of event 'cpu-clock:pppH'
  # Event count (approx.): 37500000
  #
  # Overhead  cgroup      Command
  # ........  ..........  .......
  #
      32.00%  /sub/cgrp2  looper2
      28.00%  /sub/cgrp1  looper1
      25.33%  /sub        looper0
       4.00%  /           cgtest 
       4.00%  /sub        cgtest 
       3.33%  /sub/cgrp2  cgtest 
       2.67%  /sub/cgrp1  cgtest 
       0.67%  /           looper0


The test program (cgtest) follows.

Thanks,
Namhyung


Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>


-------8<-----------------------------------------8<----------------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <sys/prctl.h>
#include <sys/mount.h>

char cgbase[] = "/sys/fs/cgroup/perf_event";

void mkcgrp(char *name) {
  char buf[256];

  snprintf(buf, sizeof(buf), "%s%s", cgbase, name);
  if (mkdir(buf, 0755) < 0)
    perror("mkdir");
}

void rmcgrp(char *name) {
  char buf[256];

  snprintf(buf, sizeof(buf), "%s%s", cgbase, name);
  if (rmdir(buf) < 0)
    perror("rmdir");
}

void setcgrp(char *name) {
  char buf[256];
  int fd;

  snprintf(buf, sizeof(buf), "%s%s/tasks", cgbase, name);

  fd = open(buf, O_WRONLY);
  if (fd > 0) {
    if (write(fd, "0\n", 2) != 2)
      perror("write");
    close(fd);
  }
}

void create_sub_cgroup(int idx) {
  char buf[128];

  snprintf(buf, sizeof(buf), "/sub/cgrp%d", idx);
  mkcgrp(buf);
}

void remove_sub_cgroup(int idx) {
  char buf[128];

  snprintf(buf, sizeof(buf), "/sub/cgrp%d", idx);
  rmcgrp(buf);
}

void set_sub_cgroup(int idx) {
  char buf[128];

  snprintf(buf, sizeof(buf), "/sub/cgrp%d", idx);
  setcgrp(buf);
}

void set_task_name(int idx) {
  char buf[16];

  snprintf(buf, sizeof(buf), "looper%d", idx);
  prctl(PR_SET_NAME, buf, 0, 0, 0);
}

void loop(unsigned max) {
  volatile unsigned int count = 1;

  while (count++ != max) {
    asm volatile ("pause");
  }
}

void worker(int idx, unsigned cnt, int new_ns) {
  int oldns;

  create_sub_cgroup(idx);
  set_sub_cgroup(idx);

  if (new_ns) {
    if (unshare(CLONE_NEWCGROUP) < 0)
      perror("unshare");

#if 0  /* FIXME */
    if (unshare(CLONE_NEWNS) < 0)
      perror("unshare");

    if (mount("none", "/sys", NULL, MS_REMOUNT | MS_REC | MS_SLAVE, NULL) < 0)
      perror("mount --make-rslave");

    sleep(1);
    if (umount("/sys/fs/cgroup/perf_event") < 0)
      perror("umount");

    if (mount("cgroup", "/sys/fs/cgroup/perf_event", "cgroup",
              MS_NODEV | MS_NOEXEC | MS_NOSUID, "perf_event") < 0)
      perror("mount again");
#endif
  }

  if (fork() == 0) {
    set_task_name(idx);
    loop(cnt);
    exit(0);
  }
  wait(NULL);
}

int main(int argc, char *argv[])
{
  int i, nr = 2;
  int new_ns = 1;
  unsigned cnt = 1000000;
  int fd;

  if (argc > 1)
    nr = atoi(argv[1]);
  if (argc > 2)
    cnt = atoi(argv[2]);
  if (argc > 3)
    new_ns = atoi(argv[3]);

  mkcgrp("/sub");
  setcgrp("/sub");

  for (i = 0; i < nr; i++) {
    if (fork() == 0) {
      worker(i+1, cnt, new_ns);
      exit(0);
    }
  }

  set_task_name(0);
  loop(cnt);

  for (i = 0; i < nr; i++)
    wait(NULL);

  for (i = 0; i < nr; i++)
    remove_sub_cgroup(i+1);

  setcgrp("/");
  rmcgrp("/sub");

  return 0;
}
-------8<-----------------------------------------8<----------------


Namhyung Kim (9):
  perf/core: Add PERF_RECORD_CGROUP event
  perf/core: Add PERF_SAMPLE_CGROUP feature
  perf tools: Basic support for CGROUP event
  perf tools: Maintain cgroup hierarchy
  perf report: Add 'cgroup' sort key
  perf record: Support synthesizing cgroup events
  perf record: Add --all-cgroups option
  perf top: Add --all-cgroups option
  perf script: Add --show-cgroup-events option

 include/linux/perf_event.h                |   1 +
 include/uapi/linux/perf_event.h           |  16 ++-
 init/Kconfig                              |   3 +-
 kernel/events/core.c                      | 133 ++++++++++++++++++++++
 tools/include/uapi/linux/perf_event.h     |  16 ++-
 tools/lib/perf/include/perf/event.h       |   7 ++
 tools/perf/Documentation/perf-record.txt  |   5 +-
 tools/perf/Documentation/perf-report.txt  |   1 +
 tools/perf/Documentation/perf-script.txt  |   3 +
 tools/perf/Documentation/perf-top.txt     |   4 +
 tools/perf/builtin-diff.c                 |   1 +
 tools/perf/builtin-record.c               |  10 ++
 tools/perf/builtin-report.c               |   1 +
 tools/perf/builtin-script.c               |  41 +++++++
 tools/perf/builtin-top.c                  |   9 ++
 tools/perf/tests/sample-parsing.c         |   6 +-
 tools/perf/util/cgroup.c                  |  75 +++++++++++-
 tools/perf/util/cgroup.h                  |  16 ++-
 tools/perf/util/event.c                   |  19 ++++
 tools/perf/util/event.h                   |   6 +
 tools/perf/util/evsel.c                   |  17 ++-
 tools/perf/util/evsel.h                   |   1 +
 tools/perf/util/hist.c                    |   7 ++
 tools/perf/util/hist.h                    |   1 +
 tools/perf/util/machine.c                 |  19 ++++
 tools/perf/util/machine.h                 |   3 +
 tools/perf/util/perf_event_attr_fprintf.c |   2 +
 tools/perf/util/record.h                  |   1 +
 tools/perf/util/session.c                 |   8 ++
 tools/perf/util/sort.c                    |  31 +++++
 tools/perf/util/sort.h                    |   2 +
 tools/perf/util/synthetic-events.c        | 127 +++++++++++++++++++++
 tools/perf/util/synthetic-events.h        |   1 +
 tools/perf/util/tool.h                    |   2 +
 34 files changed, 581 insertions(+), 14 deletions(-)


base-commit: 6c4798d3f08b81c2c52936b10e0fa872590c96ae
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-07 13:34 ` [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature Namhyung Kim
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter, kbuild test robot

To support cgroup tracking, add CGROUP event to save a link between
cgroup path and id number.  This is needed since cgroups can go away
when userspace tries to read the cgroup info (from the id) later.
The attr.cgroup bit was also added to enable cgroup tracking from
userspace.

This event will be generated when a new cgroup becomes active.
Userspace might need to synthesize those events for existing cgroups.

Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
[staticize perf_event_cgroup function]
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/uapi/linux/perf_event.h |  13 +++-
 kernel/events/core.c            | 111 ++++++++++++++++++++++++++++++++
 2 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 377d794d3105..de2ab87ca92c 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -377,7 +377,8 @@ struct perf_event_attr {
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
 				aux_output     :  1, /* generate AUX records instead of events */
-				__reserved_1   : 32;
+				cgroup         :  1, /* include cgroup events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -1006,6 +1007,16 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				id;
+	 *	char				path[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CGROUP			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a1f8bde19b56..dbc46b884505 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -386,6 +386,7 @@ static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
 static atomic_t nr_ksymbol_events __read_mostly;
 static atomic_t nr_bpf_events __read_mostly;
+static atomic_t nr_cgroup_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -4455,6 +4456,8 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_dec(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_dec(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
@@ -7564,6 +7567,105 @@ void perf_event_namespaces(struct task_struct *task)
 			NULL);
 }
 
+/*
+ * cgroup tracking
+ */
+#ifdef CONFIG_CGROUPS
+
+struct perf_cgroup_event {
+	char				*path;
+	int				path_size;
+	struct {
+		struct perf_event_header	header;
+		u64				id;
+		char				path[];
+	} event_id;
+};
+
+static int perf_event_cgroup_match(struct perf_event *event)
+{
+	return event->attr.cgroup;
+}
+
+static void perf_event_cgroup_output(struct perf_event *event, void *data)
+{
+	struct perf_cgroup_event *cgroup_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	u16 header_size = cgroup_event->event_id.header.size;
+	int ret;
+
+	if (!perf_event_cgroup_match(event))
+		return;
+
+	perf_event_header__init_id(&cgroup_event->event_id.header,
+				   &sample, event);
+	ret = perf_output_begin(&handle, event,
+				cgroup_event->event_id.header.size);
+	if (ret)
+		goto out;
+
+	perf_output_put(&handle, cgroup_event->event_id);
+	__output_copy(&handle, cgroup_event->path, cgroup_event->path_size);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+out:
+	cgroup_event->event_id.header.size = header_size;
+}
+
+static void perf_event_cgroup(struct cgroup *cgrp)
+{
+	struct perf_cgroup_event cgroup_event;
+	char path_enomem[16] = "//enomem";
+	char *pathname;
+	size_t size;
+
+	if (!atomic_read(&nr_cgroup_events))
+		return;
+
+	cgroup_event = (struct perf_cgroup_event){
+		.event_id  = {
+			.header = {
+				.type = PERF_RECORD_CGROUP,
+				.misc = 0,
+				.size = sizeof(cgroup_event.event_id),
+			},
+			.id = cgroup_id(cgrp),
+		},
+	};
+
+	pathname = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (pathname == NULL) {
+		cgroup_event.path = path_enomem;
+	} else {
+		/* just to be sure to have enough space for alignment */
+		cgroup_path(cgrp, pathname, PATH_MAX - sizeof(u64));
+		cgroup_event.path = pathname;
+	}
+
+	/*
+	 * Since our buffer works in 8 byte units we need to align our string
+	 * size to a multiple of 8. However, we must guarantee the tail end is
+	 * zero'd out to avoid leaking random bits to userspace.
+	 */
+	size = strlen(cgroup_event.path) + 1;
+	while (!IS_ALIGNED(size, sizeof(u64)))
+		cgroup_event.path[size++] = '\0';
+
+	cgroup_event.event_id.header.size += size;
+	cgroup_event.path_size = size;
+
+	perf_iterate_sb(perf_event_cgroup_output,
+			&cgroup_event,
+			NULL);
+
+	kfree(pathname);
+}
+
+#endif
+
 /*
  * mmap tracking
  */
@@ -10607,6 +10709,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_inc(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_inc(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
 	if (event->attr.freq)
@@ -12581,6 +12685,12 @@ static void perf_cgroup_css_free(struct cgroup_subsys_state *css)
 	kfree(jc);
 }
 
+static int perf_cgroup_css_online(struct cgroup_subsys_state *css)
+{
+	perf_event_cgroup(css->cgroup);
+	return 0;
+}
+
 static int __perf_cgroup_move(void *info)
 {
 	struct task_struct *task = info;
@@ -12602,6 +12712,7 @@ static void perf_cgroup_attach(struct cgroup_taskset *tset)
 struct cgroup_subsys perf_event_cgrp_subsys = {
 	.css_alloc	= perf_cgroup_css_alloc,
 	.css_free	= perf_cgroup_css_free,
+	.css_online	= perf_cgroup_css_online,
 	.attach		= perf_cgroup_attach,
 	/*
 	 * Implicitly enable on dfl hierarchy so that perf events can
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
  2020-01-07 13:34 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-07 13:34 ` [PATCH 3/9] perf tools: Basic support for CGROUP event Namhyung Kim
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users, Tejun Heo, Li Zefan, Johannes Weiner

The PERF_SAMPLE_CGROUP bit is to save (perf_event) cgroup information
in the sample.  It will add a 64-bit id to identify current cgroup and
it's the file handle in the cgroup file system.  Userspace should use
this information with PERF_RECORD_CGROUP event to match which cgroup
it belongs.

I put it before PERF_SAMPLE_AUX for simplicity since it just needs a
64-bit word.  But if we want bigger samples, I can work on that
direction too.

Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/linux/perf_event.h      |  1 +
 include/uapi/linux/perf_event.h |  3 ++-
 init/Kconfig                    |  3 ++-
 kernel/events/core.c            | 22 ++++++++++++++++++++++
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6d4c22aee384..17b5bff045a6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1001,6 +1001,7 @@ struct perf_sample_data {
 	u64				stack_user_size;
 
 	u64				phys_addr;
+	u64				cgroup;
 } ____cacheline_aligned;
 
 /* default value for data source */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index de2ab87ca92c..3a81e9806cb1 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -142,8 +142,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_REGS_INTR			= 1U << 18,
 	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
 	PERF_SAMPLE_AUX				= 1U << 20,
+	PERF_SAMPLE_CGROUP			= 1U << 21,
 
-	PERF_SAMPLE_MAX = 1U << 21,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 22,		/* non-ABI */
 
 	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
 };
diff --git a/init/Kconfig b/init/Kconfig
index a34064a031a5..403ed2448b22 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1026,7 +1026,8 @@ config CGROUP_PERF
 	help
 	  This option extends the perf per-cpu mode to restrict monitoring
 	  to threads which belong to the cgroup specified and run on the
-	  designated cpu.
+	  designated cpu.  Or this can be used to have cgroup ID in samples
+	  so that it can monitor performance events among cgroups.
 
 	  Say N if unsure.
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dbc46b884505..773480527194 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1754,6 +1754,9 @@ static void __perf_event_header_size(struct perf_event *event, u64 sample_type)
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		size += sizeof(data->phys_addr);
 
+	if (sample_type & PERF_SAMPLE_CGROUP)
+		size += sizeof(data->cgroup);
+
 	event->header_size = size;
 }
 
@@ -6699,6 +6702,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		perf_output_put(handle, data->phys_addr);
 
+	if (sample_type & PERF_SAMPLE_CGROUP)
+		perf_output_put(handle, data->cgroup);
+
 	if (sample_type & PERF_SAMPLE_AUX) {
 		perf_output_put(handle, data->aux_size);
 
@@ -6895,6 +6901,16 @@ void perf_prepare_sample(struct perf_event_header *header,
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		data->phys_addr = perf_virt_to_phys(data->addr);
 
+#ifdef CONFIG_CGROUP_PERF
+	if (sample_type & PERF_SAMPLE_CGROUP) {
+		struct cgroup *cgrp;
+
+		/* protected by RCU */
+		cgrp = task_css_check(current, perf_event_cgrp_id, 1)->cgroup;
+		data->cgroup = cgroup_id(cgrp);
+	}
+#endif
+
 	if (sample_type & PERF_SAMPLE_AUX) {
 		u64 size;
 
@@ -11090,6 +11106,12 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
 
 	if (attr->sample_type & PERF_SAMPLE_REGS_INTR)
 		ret = perf_reg_validate(attr->sample_regs_intr);
+
+#ifndef CONFIG_CGROUP_PERF
+	if (attr->sample_type & PERF_SAMPLE_CGROUP)
+		return -EINVAL;
+#endif
+
 out:
 	return ret;
 
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 3/9] perf tools: Basic support for CGROUP event
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
  2020-01-07 13:34 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
  2020-01-07 13:34 ` [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-07 13:34 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users, Adrian Hunter, kernel test robot

Implement basic functionality to support cgroup tracking.  Each cgroup
can be identified by inode number which can be read from userspace
too.  The actual cgroup processing will come in the later patch.

Cc: Adrian Hunter <adrian.hunter@intel.com>
[fix perf test failure on sampling parsing]
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/include/uapi/linux/perf_event.h     | 16 ++++++++++++++--
 tools/lib/perf/include/perf/event.h       |  7 +++++++
 tools/perf/builtin-diff.c                 |  1 +
 tools/perf/builtin-report.c               |  1 +
 tools/perf/tests/sample-parsing.c         |  6 +++++-
 tools/perf/util/event.c                   | 18 ++++++++++++++++++
 tools/perf/util/event.h                   |  6 ++++++
 tools/perf/util/evsel.c                   |  6 ++++++
 tools/perf/util/machine.c                 | 12 ++++++++++++
 tools/perf/util/machine.h                 |  3 +++
 tools/perf/util/perf_event_attr_fprintf.c |  2 ++
 tools/perf/util/session.c                 |  4 ++++
 tools/perf/util/synthetic-events.c        |  8 ++++++++
 tools/perf/util/tool.h                    |  1 +
 14 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 377d794d3105..3a81e9806cb1 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -142,8 +142,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_REGS_INTR			= 1U << 18,
 	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
 	PERF_SAMPLE_AUX				= 1U << 20,
+	PERF_SAMPLE_CGROUP			= 1U << 21,
 
-	PERF_SAMPLE_MAX = 1U << 21,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 22,		/* non-ABI */
 
 	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
 };
@@ -377,7 +378,8 @@ struct perf_event_attr {
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
 				aux_output     :  1, /* generate AUX records instead of events */
-				__reserved_1   : 32;
+				cgroup         :  1, /* include cgroup events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -1006,6 +1008,16 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				id;
+	 *	char				path[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CGROUP			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index 18106899cb4e..69b44d2cc0f5 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -105,6 +105,12 @@ struct perf_record_bpf_event {
 	__u8			 tag[BPF_TAG_SIZE];  // prog tag
 };
 
+struct perf_record_cgroup {
+	struct perf_event_header header;
+	__u64			 id;
+	char			 path[PATH_MAX];
+};
+
 struct perf_record_sample {
 	struct perf_event_header header;
 	__u64			 array[];
@@ -352,6 +358,7 @@ union perf_event {
 	struct perf_record_mmap2		mmap2;
 	struct perf_record_comm			comm;
 	struct perf_record_namespaces		namespaces;
+	struct perf_record_cgroup		cgroup;
 	struct perf_record_fork			fork;
 	struct perf_record_lost			lost;
 	struct perf_record_lost_samples		lost_samples;
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index f8b6ae557d8b..83d4094bf152 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -455,6 +455,7 @@ static struct perf_diff pdiff = {
 		.fork	= perf_event__process_fork,
 		.lost	= perf_event__process_lost,
 		.namespaces = perf_event__process_namespaces,
+		.cgroup = perf_event__process_cgroup,
 		.ordered_events = true,
 		.ordering_requires_timestamps = true,
 	},
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index de988589d99b..4924b042ab00 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1094,6 +1094,7 @@ int cmd_report(int argc, const char **argv)
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
 			.namespaces	 = perf_event__process_namespaces,
+			.cgroup		 = perf_event__process_cgroup,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
 			.lost		 = perf_event__process_lost,
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index 2762e1155238..a8ff7d2df118 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -150,6 +150,9 @@ static bool samples_same(const struct perf_sample *s1,
 	if (type & PERF_SAMPLE_PHYS_ADDR)
 		COMP(phys_addr);
 
+	if (type & PERF_SAMPLE_CGROUP)
+		COMP(cgroup);
+
 	if (type & PERF_SAMPLE_AUX) {
 		COMP(aux_sample.size);
 		if (memcmp(s1->aux_sample.data, s2->aux_sample.data,
@@ -228,6 +231,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 			.regs	= regs,
 		},
 		.phys_addr	= 113,
+		.cgroup		= 114,
 		.aux_sample	= {
 			.size	= sizeof(aux_data),
 			.data	= (void *)aux_data,
@@ -331,7 +335,7 @@ int test__sample_parsing(struct test *test __maybe_unused, int subtest __maybe_u
 	 * were added.  Please actually update the test rather than just change
 	 * the condition below.
 	 */
-	if (PERF_SAMPLE_MAX > PERF_SAMPLE_AUX << 1) {
+	if (PERF_SAMPLE_MAX > PERF_SAMPLE_CGROUP << 1) {
 		pr_debug("sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating\n");
 		return -1;
 	}
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index c5447ff516a2..824c038e5c33 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -54,6 +54,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
 	[PERF_RECORD_KSYMBOL]			= "KSYMBOL",
 	[PERF_RECORD_BPF_EVENT]			= "BPF_EVENT",
+	[PERF_RECORD_CGROUP]			= "CGROUP",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -180,6 +181,12 @@ size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp)
 	return ret;
 }
 
+size_t perf_event__fprintf_cgroup(union perf_event *event, FILE *fp)
+{
+	return fprintf(fp, " cgroup: %" PRI_lu64 " %s\n",
+		       event->cgroup.id, event->cgroup.path);
+}
+
 int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -196,6 +203,14 @@ int perf_event__process_namespaces(struct perf_tool *tool __maybe_unused,
 	return machine__process_namespaces_event(machine, event, sample);
 }
 
+int perf_event__process_cgroup(struct perf_tool *tool __maybe_unused,
+			       union perf_event *event,
+			       struct perf_sample *sample,
+			       struct machine *machine)
+{
+	return machine__process_cgroup_event(machine, event, sample);
+}
+
 int perf_event__process_lost(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -417,6 +432,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_NAMESPACES:
 		ret += perf_event__fprintf_namespaces(event, fp);
 		break;
+	case PERF_RECORD_CGROUP:
+		ret += perf_event__fprintf_cgroup(event, fp);
+		break;
 	case PERF_RECORD_MMAP2:
 		ret += perf_event__fprintf_mmap2(event, fp);
 		break;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 85223159737c..0ad3ba22817d 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -135,6 +135,7 @@ struct perf_sample {
 	u32 raw_size;
 	u64 data_src;
 	u64 phys_addr;
+	u64 cgroup;
 	u32 flags;
 	u16 insn_len;
 	u8  cpumode;
@@ -321,6 +322,10 @@ int perf_event__process_namespaces(struct perf_tool *tool,
 				   union perf_event *event,
 				   struct perf_sample *sample,
 				   struct machine *machine);
+int perf_event__process_cgroup(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_sample *sample,
+			       struct machine *machine);
 int perf_event__process_mmap(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -376,6 +381,7 @@ size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_cgroup(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_ksymbol(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_bpf(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a69e64236120..0f5a67603139 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2250,6 +2250,12 @@ int perf_evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 		array++;
 	}
 
+	data->cgroup = 0;
+	if (type & PERF_SAMPLE_CGROUP) {
+		data->cgroup = *array;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		OVERFLOW_CHECK_u64(array);
 		sz = *array++;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c8c5410315e8..2c3223bec561 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -654,6 +654,16 @@ int machine__process_namespaces_event(struct machine *machine __maybe_unused,
 	return err;
 }
 
+int machine__process_cgroup_event(struct machine *machine __maybe_unused,
+				  union perf_event *event,
+				  struct perf_sample *sample __maybe_unused)
+{
+	if (dump_trace)
+		perf_event__fprintf_cgroup(event, stdout);
+
+	return 0;
+}
+
 int machine__process_lost_event(struct machine *machine __maybe_unused,
 				union perf_event *event, struct perf_sample *sample __maybe_unused)
 {
@@ -1880,6 +1890,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 		ret = machine__process_mmap_event(machine, event, sample); break;
 	case PERF_RECORD_NAMESPACES:
 		ret = machine__process_namespaces_event(machine, event, sample); break;
+	case PERF_RECORD_CGROUP:
+		ret = machine__process_cgroup_event(machine, event, sample); break;
 	case PERF_RECORD_MMAP2:
 		ret = machine__process_mmap2_event(machine, event, sample); break;
 	case PERF_RECORD_FORK:
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index be0a930eca89..fa1be9ea00fa 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -128,6 +128,9 @@ int machine__process_switch_event(struct machine *machine,
 int machine__process_namespaces_event(struct machine *machine,
 				      union perf_event *event,
 				      struct perf_sample *sample);
+int machine__process_cgroup_event(struct machine *machine,
+				  union perf_event *event,
+				  struct perf_sample *sample);
 int machine__process_mmap_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 651203126c71..a37a89c747d8 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -35,6 +35,7 @@ static void __p_sample_type(char *buf, size_t size, u64 value)
 		bit_name(BRANCH_STACK), bit_name(REGS_USER), bit_name(STACK_USER),
 		bit_name(IDENTIFIER), bit_name(REGS_INTR), bit_name(DATA_SRC),
 		bit_name(WEIGHT), bit_name(PHYS_ADDR), bit_name(AUX),
+		bit_name(CGROUP),
 		{ .name = NULL, }
 	};
 #undef bit_name
@@ -131,6 +132,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(ksymbol, p_unsigned);
 	PRINT_ATTRf(bpf_event, p_unsigned);
 	PRINT_ATTRf(aux_output, p_unsigned);
+	PRINT_ATTRf(cgroup, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index d0d7d25b23e3..6b4c12d48c3f 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -471,6 +471,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->comm = process_event_stub;
 	if (tool->namespaces == NULL)
 		tool->namespaces = process_event_stub;
+	if (tool->cgroup == NULL)
+		tool->cgroup = process_event_stub;
 	if (tool->fork == NULL)
 		tool->fork = process_event_stub;
 	if (tool->exit == NULL)
@@ -1434,6 +1436,8 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->comm(tool, event, sample, machine);
 	case PERF_RECORD_NAMESPACES:
 		return tool->namespaces(tool, event, sample, machine);
+	case PERF_RECORD_CGROUP:
+		return tool->cgroup(tool, event, sample, machine);
 	case PERF_RECORD_FORK:
 		return tool->fork(tool, event, sample, machine);
 	case PERF_RECORD_EXIT:
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index c423298fe62d..cd336eb8886b 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1228,6 +1228,9 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	if (type & PERF_SAMPLE_PHYS_ADDR)
 		result += sizeof(u64);
 
+	if (type & PERF_SAMPLE_CGROUP)
+		result += sizeof(u64);
+
 	if (type & PERF_SAMPLE_AUX) {
 		result += sizeof(u64);
 		result += sample->aux_sample.size;
@@ -1401,6 +1404,11 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 		array++;
 	}
 
+	if (type & PERF_SAMPLE_CGROUP) {
+		*array = sample->cgroup;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		sz = sample->aux_sample.size;
 		*array++ = sz;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 2abbf668b8de..472ef5eb4068 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -46,6 +46,7 @@ struct perf_tool {
 			mmap2,
 			comm,
 			namespaces,
+			cgroup,
 			fork,
 			exit,
 			lost,
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
                   ` (2 preceding siblings ...)
  2020-01-07 13:34 ` [PATCH 3/9] perf tools: Basic support for CGROUP event Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-08 21:52   ` Jiri Olsa
  2020-01-08 22:01   ` Jiri Olsa
  2020-01-07 13:34 ` [PATCH 5/9] perf report: Add 'cgroup' sort key Namhyung Kim
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

Each cgroup is kept in the global cgroup_tree sorted by the cgroup id.
Hist entries have cgroup id can compare it directly and later it can
be used to find a group name using this tree.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/cgroup.c  | 72 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/cgroup.h  | 15 +++++---
 tools/perf/util/machine.c |  7 ++++
 tools/perf/util/session.c |  4 +++
 4 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 4881d4af3381..4e8ef1db0c94 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -13,6 +13,8 @@
 
 int nr_cgroups;
 
+static struct rb_root cgroup_tree = RB_ROOT;
+
 static int
 cgroupfs_find_mountpoint(char *buf, size_t maxlen)
 {
@@ -250,3 +252,73 @@ int parse_cgroups(const struct option *opt, const char *str,
 	}
 	return 0;
 }
+
+struct cgroup *cgroup__findnew(uint64_t id, const char *path)
+{
+	struct rb_node **p = &cgroup_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct cgroup *cgrp;
+
+	while (*p != NULL) {
+		parent = *p;
+		cgrp = rb_entry(parent, struct cgroup, node);
+
+		if (cgrp->id == id)
+			return cgrp;
+
+		if (cgrp->id < id)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	cgrp = malloc(sizeof(*cgrp));
+	if (cgrp == NULL)
+		return NULL;
+
+	cgrp->name = strdup(path);
+	if (cgrp->name == NULL) {
+		free(cgrp);
+		return NULL;
+	}
+
+	cgrp->fd = -1;
+	cgrp->id = id;
+	refcount_set(&cgrp->refcnt, 1);
+
+	rb_link_node(&cgrp->node, parent, p);
+	rb_insert_color(&cgrp->node, &cgroup_tree);
+
+	return cgrp;
+}
+
+struct cgroup *cgroup__find_by_path(const char *path)
+{
+	struct rb_node *node;
+
+	node = rb_first(&cgroup_tree);
+	while (node) {
+		struct cgroup *cgrp = rb_entry(node, struct cgroup, node);
+
+		if (!strcmp(cgrp->name, path))
+			return cgrp;
+
+		node = rb_next(&cgrp->node);
+	}
+
+	return NULL;
+}
+
+void destroy_cgroups(void)
+{
+	struct rb_node *node;
+	struct cgroup *cgrp;
+
+	while (!RB_EMPTY_ROOT(&cgroup_tree)) {
+		node = rb_first(&cgroup_tree);
+		cgrp = rb_entry(node, struct cgroup, node);
+
+		rb_erase(node, &cgroup_tree);
+		cgroup__put(cgrp);
+	}
+}
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 2ec11f01090d..381583df27c7 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -3,16 +3,18 @@
 #define __CGROUP_H__
 
 #include <linux/refcount.h>
+#include <linux/rbtree.h>
 
 struct option;
 
 struct cgroup {
-	char *name;
-	int fd;
-	refcount_t refcnt;
+	struct rb_node		node;
+	u64			id;
+	char			*name;
+	int			fd;
+	refcount_t		refcnt;
 };
 
-
 extern int nr_cgroups; /* number of explicit cgroups defined */
 
 struct cgroup *cgroup__get(struct cgroup *cgroup);
@@ -26,4 +28,9 @@ void evlist__set_default_cgroup(struct evlist *evlist, struct cgroup *cgroup);
 
 int parse_cgroups(const struct option *opt, const char *str, int unset);
 
+struct cgroup *cgroup__findnew(uint64_t id, const char *path);
+struct cgroup *cgroup__find_by_path(const char *path);
+
+void destroy_cgroups(void);
+
 #endif /* __CGROUP_H__ */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 2c3223bec561..19d40a2016d7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -33,6 +33,7 @@
 #include "asm/bug.h"
 #include "bpf-event.h"
 #include <internal/lib.h> // page_size
+#include "cgroup.h"
 
 #include <linux/ctype.h>
 #include <symbol/kallsyms.h>
@@ -658,9 +659,15 @@ int machine__process_cgroup_event(struct machine *machine __maybe_unused,
 				  union perf_event *event,
 				  struct perf_sample *sample __maybe_unused)
 {
+	struct cgroup *cgrp;
+
 	if (dump_trace)
 		perf_event__fprintf_cgroup(event, stdout);
 
+	cgrp = cgroup__findnew(event->cgroup.id, event->cgroup.path);
+	if (cgrp == NULL)
+		return -ENOMEM;
+
 	return 0;
 }
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 6b4c12d48c3f..5e2c9f504184 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -289,6 +289,8 @@ static void perf_session__release_decomp_events(struct perf_session *session)
 	} while (1);
 }
 
+extern void destroy_cgroups(void);
+
 void perf_session__delete(struct perf_session *session)
 {
 	if (session == NULL)
@@ -303,6 +305,8 @@ void perf_session__delete(struct perf_session *session)
 	if (session->data)
 		perf_data__close(session->data);
 	free(session);
+
+	destroy_cgroups();
 }
 
 static int process_event_synth_tracing_data_stub(struct perf_session *session
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 5/9] perf report: Add 'cgroup' sort key
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
                   ` (3 preceding siblings ...)
  2020-01-07 13:34 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

The cgroup sort key is to show cgroup membership of each task.
Currently it shows full path in the cgroupfs (not relative to the root
of cgroup namespace) since it'd be more intuitive IMHO.  Otherwise
root cgroup in different namespaces will all show same name - "/".

The cgroup sort key should come before cgroup_id otherwise
sort_dimension__add() will match it to cgroup_id as it only matches
with the given substring.

For example it will look like following.  Note that record patch
adding --all-cgroups patch will come later.

  $ perf record -a --namespace --all-cgroups  cgtest
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ]

  $ perf report -s cgroup_id,cgroup,pid
  ...
  # Overhead  cgroup id (dev/inode)  cgroup          Pid:Command
  # ........  .....................  ..........  ...............
  #
      93.96%  0/0x0                  /                 0:swapper
       1.25%  3/0xeffffffb           /               278:looper0
       0.86%  3/0xf000015f           /sub/cgrp1      280:cgtest
       0.37%  3/0xf0000160           /sub/cgrp2      281:cgtest
       0.34%  3/0xf0000163           /sub/cgrp3      282:cgtest
       0.22%  3/0xeffffffb           /sub            278:looper0
       0.20%  3/0xeffffffb           /               280:cgtest
       0.15%  3/0xf0000163           /sub/cgrp3      285:looper3

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/util/hist.c                   |  7 ++++++
 tools/perf/util/hist.h                   |  1 +
 tools/perf/util/sort.c                   | 31 ++++++++++++++++++++++++
 tools/perf/util/sort.h                   |  2 ++
 5 files changed, 42 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 8dbe2119686a..5af43f3faf9f 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -95,6 +95,7 @@ OPTIONS
 	abort cost. This is the global weight.
 	- local_weight: Local weight version of the weight above.
 	- cgroup_id: ID derived from cgroup namespace device and inode numbers.
+	- cgroup: cgroup pathname in the cgroupfs.
 	- transaction: Transaction abort flags.
 	- overhead: Overhead percentage of sample
 	- overhead_sys: Overhead percentage of sample running in system mode
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index ca5a8f4d007e..9f28d2f487c1 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -10,6 +10,7 @@
 #include "mem-events.h"
 #include "session.h"
 #include "namespaces.h"
+#include "cgroup.h"
 #include "sort.h"
 #include "units.h"
 #include "evlist.h"
@@ -222,6 +223,11 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 
 	if (h->trace_output)
 		hists__new_col_len(hists, HISTC_TRACE, strlen(h->trace_output));
+
+	if (h->cgroup) {
+		struct cgroup *cgrp = cgroup__findnew(h->cgroup, "<unknown>");
+		hists__new_col_len(hists, HISTC_CGROUP, strlen(cgrp->name));
+	}
 }
 
 void hists__output_recalc_col_len(struct hists *hists, int max_rows)
@@ -691,6 +697,7 @@ __hists__add_entry(struct hists *hists,
 			.dev = ns ? ns->link_info[CGROUP_NS_INDEX].dev : 0,
 			.ino = ns ? ns->link_info[CGROUP_NS_INDEX].ino : 0,
 		},
+		.cgroup = sample->cgroup,
 		.ms = {
 			.maps	= al->maps,
 			.map	= al->map,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 0aa63aeb58ec..f7fb9c32dda5 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -38,6 +38,7 @@ enum hist_column {
 	HISTC_THREAD,
 	HISTC_COMM,
 	HISTC_CGROUP_ID,
+	HISTC_CGROUP,
 	HISTC_PARENT,
 	HISTC_CPU,
 	HISTC_SOCKET,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index ab0cfd790ad0..52073029e592 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -25,6 +25,7 @@
 #include "mem-events.h"
 #include "annotate.h"
 #include "time-utils.h"
+#include "cgroup.h"
 #include <linux/kernel.h>
 #include <linux/string.h>
 
@@ -634,6 +635,35 @@ struct sort_entry sort_cgroup_id = {
 	.se_width_idx	= HISTC_CGROUP_ID,
 };
 
+/* --sort cgroup */
+
+static int64_t
+sort__cgroup_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return right->cgroup - left->cgroup;
+}
+
+static int hist_entry__cgroup_snprintf(struct hist_entry *he,
+				       char *bf, size_t size,
+				       unsigned int width __maybe_unused)
+{
+	const char *cgrp_name = "N/A";
+
+	if (he->cgroup) {
+		struct cgroup *cgrp = cgroup__findnew(he->cgroup, cgrp_name);
+		cgrp_name = cgrp->name;
+	}
+
+	return repsep_snprintf(bf, size, "%s", cgrp_name);
+}
+
+struct sort_entry sort_cgroup = {
+	.se_header      = "cgroup",
+	.se_cmp	        = sort__cgroup_cmp,
+	.se_snprintf    = hist_entry__cgroup_snprintf,
+	.se_width_idx	= HISTC_CGROUP,
+};
+
 /* --sort socket */
 
 static int64_t
@@ -1658,6 +1688,7 @@ static struct sort_dimension common_sort_dimensions[] = {
 	DIM(SORT_TRACE, "trace", sort_trace),
 	DIM(SORT_SYM_SIZE, "symbol_size", sort_sym_size),
 	DIM(SORT_DSO_SIZE, "dso_size", sort_dso_size),
+	DIM(SORT_CGROUP, "cgroup", sort_cgroup),
 	DIM(SORT_CGROUP_ID, "cgroup_id", sort_cgroup_id),
 	DIM(SORT_SYM_IPC_NULL, "ipc_null", sort_sym_ipc_null),
 	DIM(SORT_TIME, "time", sort_time),
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 6c862d62d052..cfa6ac6f7d06 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -101,6 +101,7 @@ struct hist_entry {
 	struct thread		*thread;
 	struct comm		*comm;
 	struct namespace_id	cgroup_id;
+	u64			cgroup;
 	u64			ip;
 	u64			transaction;
 	s32			socket;
@@ -224,6 +225,7 @@ enum sort_type {
 	SORT_TRACE,
 	SORT_SYM_SIZE,
 	SORT_DSO_SIZE,
+	SORT_CGROUP,
 	SORT_CGROUP_ID,
 	SORT_SYM_IPC_NULL,
 	SORT_TIME,
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
                   ` (4 preceding siblings ...)
  2020-01-07 13:34 ` [PATCH 5/9] perf report: Add 'cgroup' sort key Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-08 22:08   ` Jiri Olsa
                     ` (2 more replies)
  2020-01-07 13:34 ` [PATCH 7/9] perf record: Add --all-cgroups option Namhyung Kim
                   ` (2 subsequent siblings)
  8 siblings, 3 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

Synthesize cgroup events by iterating cgroup filesystem directories.
The cgroup event only saves the portion of cgroup path after the mount
point and the cgroup id (which actually is a file handle).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-record.c        |   5 ++
 tools/perf/util/cgroup.c           |   3 +-
 tools/perf/util/cgroup.h           |   1 +
 tools/perf/util/event.c            |   1 +
 tools/perf/util/synthetic-events.c | 119 +++++++++++++++++++++++++++++
 tools/perf/util/synthetic-events.h |   1 +
 tools/perf/util/tool.h             |   1 +
 7 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c301466101b..2802de9538ff 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1397,6 +1397,11 @@ static int record__synthesize(struct record *rec, bool tail)
 	if (err < 0)
 		pr_warning("Couldn't synthesize bpf events.\n");
 
+	err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
+					     machine);
+	if (err < 0)
+		pr_warning("Couldn't synthesize cgroup events.\n");
+
 	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->core.threads,
 					    process_synthesized_event, opts->sample_address,
 					    1);
diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 4e8ef1db0c94..5147d22b3bda 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -15,8 +15,7 @@ int nr_cgroups;
 
 static struct rb_root cgroup_tree = RB_ROOT;
 
-static int
-cgroupfs_find_mountpoint(char *buf, size_t maxlen)
+int cgroupfs_find_mountpoint(char *buf, size_t maxlen)
 {
 	FILE *fp;
 	char mountpoint[PATH_MAX + 1], tokens[PATH_MAX + 1], type[PATH_MAX + 1];
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 381583df27c7..9a67060723fa 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -17,6 +17,7 @@ struct cgroup {
 
 extern int nr_cgroups; /* number of explicit cgroups defined */
 
+int cgroupfs_find_mountpoint(char *buf, size_t maxlen);
 struct cgroup *cgroup__get(struct cgroup *cgroup);
 void cgroup__put(struct cgroup *cgroup);
 
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 824c038e5c33..28801c867f39 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -33,6 +33,7 @@
 #include "bpf-event.h"
 #include "tool.h"
 #include "../perf.h"
+#include "cgroup.h"
 
 static const char *perf_event__names[] = {
 	[0]					= "TOTAL",
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index cd336eb8886b..36f122cac44f 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -16,6 +16,7 @@
 #include "util/synthetic-events.h"
 #include "util/target.h"
 #include "util/time-utils.h"
+#include "util/cgroup.h"
 #include <linux/bitops.h>
 #include <linux/kernel.h>
 #include <linux/string.h>
@@ -413,6 +414,124 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	return rc;
 }
 
+static int perf_event__synthesize_cgroup(struct perf_tool *tool,
+					 union perf_event *event,
+					 char *path, size_t mount_len,
+					 perf_event__handler_t process,
+					 struct machine *machine)
+{
+	size_t event_size = sizeof(event->cgroup) - sizeof(event->cgroup.path);
+	size_t path_len = strlen(path) - mount_len + 1;
+	struct {
+		struct file_handle fh;
+		uint64_t cgroup_id;
+	} handle;
+	int mount_id;
+
+	while (path_len % sizeof(u64))
+		path[mount_len + path_len++] = '\0';
+
+	memset(&event->cgroup, 0, event_size);
+
+	event->cgroup.header.type = PERF_RECORD_CGROUP;
+	event->cgroup.header.size = event_size + path_len + machine->id_hdr_size;
+
+	handle.fh.handle_bytes = sizeof(handle.cgroup_id);
+	if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
+		pr_debug("stat failed: %s\n", path);
+		return -1;
+	}
+
+	event->cgroup.id = handle.cgroup_id;
+	strncpy(event->cgroup.path, path + mount_len, path_len);
+	memset(event->cgroup.path + path_len, 0, machine->id_hdr_size);
+
+	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
+		pr_debug("process synth event failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
+					union perf_event *event,
+					char *path, size_t mount_len,
+					perf_event__handler_t process,
+					struct machine *machine)
+{
+	size_t pos = strlen(path);
+	DIR *d;
+	struct dirent *dent;
+	int ret = 0;
+
+	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
+					  process, machine) < 0)
+		return -1;
+
+	d = opendir(path);
+	if (d == NULL) {
+		pr_debug("failed to open directory: %s\n", path);
+		return -1;
+	}
+
+	while ((dent = readdir(d)) != NULL) {
+		if (dent->d_type != DT_DIR)
+			continue;
+		if (!strcmp(dent->d_name, ".") ||
+		    !strcmp(dent->d_name, ".."))
+			continue;
+
+		if (path[pos - 1] != '/')
+			strcat(path, "/");
+		strcat(path, dent->d_name);
+
+		ret = perf_event__walk_cgroup_tree(tool, event, path,
+						   mount_len, process, machine);
+		if (ret < 0)
+			break;
+
+		path[pos] = '\0';
+	}
+
+	closedir(d);
+	return ret;
+}
+
+int perf_event__synthesize_cgroups(struct perf_tool *tool,
+				   perf_event__handler_t process,
+				   struct machine *machine)
+{
+	union perf_event event;
+	char *cgrp_root;
+	size_t mount_len;  /* length of mount point in the path */
+	int ret = -1;
+
+	cgrp_root = malloc(PATH_MAX);
+	if (cgrp_root == NULL)
+		return -1;
+
+	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX) < 0) {
+		pr_debug("cannot find cgroup mount point\n");
+		goto out;
+	}
+
+	mount_len = strlen(cgrp_root);
+	/* make sure the path starts with a slash (after mount point) */
+	strcat(cgrp_root, "/");
+
+	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
+					 process, machine) < 0)
+		goto out;
+
+	ret = 0;
+
+out:
+	free(cgrp_root);
+
+	return ret;
+}
+
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
 				   struct machine *machine)
 {
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index baead0cdc381..e7a3e9589738 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -45,6 +45,7 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handl
 int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_namespaces(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
+int perf_event__synthesize_cgroups(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
 int perf_event__synthesize_stat_config(struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 472ef5eb4068..3fb67bd31e4a 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -79,6 +79,7 @@ struct perf_tool {
 	bool		ordered_events;
 	bool		ordering_requires_timestamps;
 	bool		namespace_events;
+	bool		cgroup_events;
 	bool		no_warn;
 	enum show_feature_header show_feat_hdr;
 };
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 7/9] perf record: Add --all-cgroups option
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
                   ` (5 preceding siblings ...)
  2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-07 13:35 ` [PATCH 8/9] perf top: " Namhyung Kim
  2020-01-07 13:35 ` [PATCH 9/9] perf script: Add --show-cgroup-events option Namhyung Kim
  8 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

The --all-cgroups option is to enable cgroup profiling support.  It
tells kernel to record CGROUP events in the ring buffer so that perf
report can identify task/cgroup association later.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-record.txt |  5 ++++-
 tools/perf/builtin-record.c              |  5 +++++
 tools/perf/util/evsel.c                  | 11 ++++++++++-
 tools/perf/util/evsel.h                  |  1 +
 tools/perf/util/record.h                 |  1 +
 5 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index b23a4012a606..6dd9d69d7dee 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -385,7 +385,10 @@ displayed with the weight and local_weight sort keys.  This currently works for
 abort events and some memory events in precise mode on modern Intel CPUs.
 
 --namespaces::
-Record events of type PERF_RECORD_NAMESPACES.
+Record events of type PERF_RECORD_NAMESPACES.  This enables 'cgroup_id' sort key.
+
+--all-cgroups::
+Record events of type PERF_RECORD_CGROUP.  This enables 'cgroup' sort key.
 
 --transaction::
 Record transaction flags for transaction related events.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 2802de9538ff..7d7912e121d6 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1433,6 +1433,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	if (rec->opts.record_namespaces)
 		tool->namespace_events = true;
 
+	if (rec->opts.record_cgroup)
+		tool->cgroup_events = true;
+
 	if (rec->opts.auxtrace_snapshot_mode || rec->switch_output.enabled) {
 		signal(SIGUSR2, snapshot_sig_handler);
 		if (rec->opts.auxtrace_snapshot_mode)
@@ -2363,6 +2366,8 @@ static struct option __record_options[] = {
 			"per thread proc mmap processing timeout in ms"),
 	OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces,
 		    "Record namespaces events"),
+	OPT_BOOLEAN(0, "all-cgroups", &record.opts.record_cgroup,
+		    "Record cgroup events"),
 	OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
 		    "Record context switch events"),
 	OPT_BOOLEAN_FLAG(0, "all-kernel", &record.opts.all_kernel,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0f5a67603139..ff6c6683a32b 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1102,6 +1102,11 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts,
 	if (opts->record_namespaces)
 		attr->namespaces  = track;
 
+	if (opts->record_cgroup) {
+		attr->cgroup = track && !perf_missing_features.cgroup;
+		perf_evsel__set_sample_bit(evsel, CGROUP);
+	}
+
 	if (opts->record_switch_events)
 		attr->context_switch = track;
 
@@ -1782,7 +1787,11 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
 	 * Must probe features in the order they were added to the
 	 * perf_event_attr interface.
 	 */
-	if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) {
+	if (!perf_missing_features.cgroup && evsel->core.attr.cgroup) {
+		perf_missing_features.cgroup = true;
+		pr_debug2_peo("Kernel has no cgroup sampling support, bailing out\n");
+		goto out_close;
+	} else if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) {
 		perf_missing_features.aux_output = true;
 		pr_debug2_peo("Kernel has no attr.aux_output support, bailing out\n");
 		goto out_close;
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index dc14f4a823cd..df71b530740b 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -119,6 +119,7 @@ struct perf_missing_features {
 	bool ksymbol;
 	bool bpf;
 	bool aux_output;
+	bool cgroup;
 };
 
 extern struct perf_missing_features perf_missing_features;
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 5421fd2ad383..24316458be20 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -34,6 +34,7 @@ struct record_opts {
 	bool	      auxtrace_snapshot_on_exit;
 	bool	      auxtrace_sample_mode;
 	bool	      record_namespaces;
+	bool	      record_cgroup;
 	bool	      record_switch_events;
 	bool	      all_kernel;
 	bool	      all_user;
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 8/9] perf top: Add --all-cgroups option
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
                   ` (6 preceding siblings ...)
  2020-01-07 13:34 ` [PATCH 7/9] perf record: Add --all-cgroups option Namhyung Kim
@ 2020-01-07 13:35 ` Namhyung Kim
  2020-01-08 22:24   ` Jiri Olsa
  2020-01-07 13:35 ` [PATCH 9/9] perf script: Add --show-cgroup-events option Namhyung Kim
  8 siblings, 1 reply; 48+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:35 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

The --all-cgroups option is to enable cgroup profiling support.  It
tells kernel to record CGROUP events in the ring buffer so that perf
report can identify task/cgroup association later.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-top.txt | 4 ++++
 tools/perf/builtin-top.c              | 9 +++++++++
 2 files changed, 13 insertions(+)

diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 5596129a71cf..c75507f50071 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -266,6 +266,10 @@ Default is to monitor all CPUS.
 	Record events of type PERF_RECORD_NAMESPACES and display it with the
 	'cgroup_id' sort key.
 
+--cgroup::
+	Record events of type PERF_RECORD_CGROUP and display it with the
+	'cgroup' sort key.
+
 --switch-on EVENT_NAME::
 	Only consider events after this event is found.
 
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 795e353de095..f6256c533b09 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1244,6 +1244,8 @@ static int __cmd_top(struct perf_top *top)
 
 	if (opts->record_namespaces)
 		top->tool.namespace_events = true;
+	if (opts->record_cgroup)
+		top->tool.cgroup_events = true;
 
 	ret = perf_event__synthesize_bpf_events(top->session, perf_event__process,
 						&top->session->machines.host,
@@ -1251,6 +1253,11 @@ static int __cmd_top(struct perf_top *top)
 	if (ret < 0)
 		pr_debug("Couldn't synthesize BPF events: Pre-existing BPF programs won't have symbols resolved.\n");
 
+	ret = perf_event__synthesize_cgroups(&top->tool, perf_event__process,
+					     &top->session->machines.host);
+	if (ret < 0)
+		pr_debug("Couldn't synthesize cgroup events.\n");
+
 	machine__synthesize_threads(&top->session->machines.host, &opts->target,
 				    top->evlist->core.threads, false,
 				    top->nr_threads_synthesize);
@@ -1539,6 +1546,8 @@ int cmd_top(int argc, const char **argv)
 			"number of thread to run event synthesize"),
 	OPT_BOOLEAN(0, "namespaces", &opts->record_namespaces,
 		    "Record namespaces events"),
+	OPT_BOOLEAN(0, "all-cgroups", &opts->record_cgroup,
+		    "Record cgroup events"),
 	OPTS_EVSWITCH(&top.evswitch),
 	OPT_END()
 	};
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 9/9] perf script: Add --show-cgroup-events option
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
                   ` (7 preceding siblings ...)
  2020-01-07 13:35 ` [PATCH 8/9] perf top: " Namhyung Kim
@ 2020-01-07 13:35 ` Namhyung Kim
  8 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:35 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

The --show-cgroup-events option is to print CGROUP events in the
output like others.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-script.txt |  3 ++
 tools/perf/builtin-script.c              | 41 ++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 2599b057e47b..3dd297600427 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -319,6 +319,9 @@ OPTIONS
 --show-bpf-events
 	Display bpf events i.e. events of type PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT.
 
+--show-cgroup-events
+	Display cgroup events i.e. events of type PERF_RECORD_CGROUP.
+
 --demangle::
 	Demangle symbol names to human readable form. It's enabled by default,
 	disable with --no-demangle.
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index e2406b291c1c..3db4afc29430 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1681,6 +1681,7 @@ struct perf_script {
 	bool			show_lost_events;
 	bool			show_round_events;
 	bool			show_bpf_events;
+	bool			show_cgroup_events;
 	bool			allocated;
 	bool			per_event_dump;
 	struct evswitch		evswitch;
@@ -2199,6 +2200,41 @@ static int process_namespaces_event(struct perf_tool *tool,
 	return ret;
 }
 
+static int process_cgroup_event(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct machine *machine)
+{
+	struct thread *thread;
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+	struct perf_session *session = script->session;
+	struct evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
+	int ret = -1;
+
+	thread = machine__findnew_thread(machine, sample->pid, sample->tid);
+	if (thread == NULL) {
+		pr_debug("problem processing CGROUP event, skipping it.\n");
+		return -1;
+	}
+
+	if (perf_event__process_cgroup(tool, event, sample, machine) < 0)
+		goto out;
+
+	if (!evsel->core.attr.sample_id_all) {
+		sample->cpu = 0;
+		sample->time = 0;
+	}
+	if (!filter_cpu(sample)) {
+		perf_sample__fprintf_start(sample, thread, evsel,
+					   PERF_RECORD_CGROUP, stdout);
+		perf_event__fprintf(event, stdout);
+	}
+	ret = 0;
+out:
+	thread__put(thread);
+	return ret;
+}
+
 static int process_fork_event(struct perf_tool *tool,
 			      union perf_event *event,
 			      struct perf_sample *sample,
@@ -2538,6 +2574,8 @@ static int __cmd_script(struct perf_script *script)
 		script->tool.context_switch = process_switch_event;
 	if (script->show_namespace_events)
 		script->tool.namespaces = process_namespaces_event;
+	if (script->show_cgroup_events)
+		script->tool.cgroup = process_cgroup_event;
 	if (script->show_lost_events)
 		script->tool.lost = process_lost_event;
 	if (script->show_round_events) {
@@ -3463,6 +3501,7 @@ int cmd_script(int argc, const char **argv)
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
 			.namespaces	 = perf_event__process_namespaces,
+			.cgroup		 = perf_event__process_cgroup,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
 			.attr		 = process_attr,
@@ -3563,6 +3602,8 @@ int cmd_script(int argc, const char **argv)
 		    "Show context switch events (if recorded)"),
 	OPT_BOOLEAN('\0', "show-namespace-events", &script.show_namespace_events,
 		    "Show namespace events (if recorded)"),
+	OPT_BOOLEAN('\0', "show-cgroup-events", &script.show_cgroup_events,
+		    "Show cgroup events (if recorded)"),
 	OPT_BOOLEAN('\0', "show-lost-events", &script.show_lost_events,
 		    "Show lost events (if recorded)"),
 	OPT_BOOLEAN('\0', "show-round-events", &script.show_round_events,
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-07 13:34 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
@ 2020-01-08 21:52   ` Jiri Olsa
  2020-01-09  0:51     ` Namhyung Kim
  2020-01-08 22:01   ` Jiri Olsa
  1 sibling, 1 reply; 48+ messages in thread
From: Jiri Olsa @ 2020-01-08 21:52 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:34:56PM +0900, Namhyung Kim wrote:
> Each cgroup is kept in the global cgroup_tree sorted by the cgroup id.
> Hist entries have cgroup id can compare it directly and later it can
> be used to find a group name using this tree.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/cgroup.c  | 72 +++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/cgroup.h  | 15 +++++---
>  tools/perf/util/machine.c |  7 ++++
>  tools/perf/util/session.c |  4 +++
>  4 files changed, 94 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> index 4881d4af3381..4e8ef1db0c94 100644
> --- a/tools/perf/util/cgroup.c
> +++ b/tools/perf/util/cgroup.c
> @@ -13,6 +13,8 @@
>  
>  int nr_cgroups;
>  
> +static struct rb_root cgroup_tree = RB_ROOT;

I think we shoud carry that in 'struct perf_env' 

jirka


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-07 13:34 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
  2020-01-08 21:52   ` Jiri Olsa
@ 2020-01-08 22:01   ` Jiri Olsa
  2020-01-09  0:55     ` Namhyung Kim
  1 sibling, 1 reply; 48+ messages in thread
From: Jiri Olsa @ 2020-01-08 22:01 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:34:56PM +0900, Namhyung Kim wrote:

SNIP

> +	while (*p != NULL) {
> +		parent = *p;
> +		cgrp = rb_entry(parent, struct cgroup, node);
> +
> +		if (cgrp->id == id)
> +			return cgrp;
> +
> +		if (cgrp->id < id)
> +			p = &(*p)->rb_left;
> +		else
> +			p = &(*p)->rb_right;
> +	}
> +
> +	cgrp = malloc(sizeof(*cgrp));
> +	if (cgrp == NULL)
> +		return NULL;
> +
> +	cgrp->name = strdup(path);
> +	if (cgrp->name == NULL) {
> +		free(cgrp);
> +		return NULL;
> +	}
> +
> +	cgrp->fd = -1;
> +	cgrp->id = id;
> +	refcount_set(&cgrp->refcnt, 1);
> +
> +	rb_link_node(&cgrp->node, parent, p);
> +	rb_insert_color(&cgrp->node, &cgroup_tree);
> +
> +	return cgrp;
> +}
> +
> +struct cgroup *cgroup__find_by_path(const char *path)
> +{
> +	struct rb_node *node;
> +
> +	node = rb_first(&cgroup_tree);
> +	while (node) {
> +		struct cgroup *cgrp = rb_entry(node, struct cgroup, node);
> +
> +		if (!strcmp(cgrp->name, path))
> +			return cgrp;

you have it sorted on ids, but only search by path,
why don't we sort it on path right away?

jirka


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
@ 2020-01-08 22:08   ` Jiri Olsa
  2020-01-09  5:19     ` Namhyung Kim
  2020-01-08 22:18   ` Jiri Olsa
  2020-01-08 22:21   ` Jiri Olsa
  2 siblings, 1 reply; 48+ messages in thread
From: Jiri Olsa @ 2020-01-08 22:08 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:
> Synthesize cgroup events by iterating cgroup filesystem directories.
> The cgroup event only saves the portion of cgroup path after the mount
> point and the cgroup id (which actually is a file handle).
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-record.c        |   5 ++
>  tools/perf/util/cgroup.c           |   3 +-
>  tools/perf/util/cgroup.h           |   1 +
>  tools/perf/util/event.c            |   1 +
>  tools/perf/util/synthetic-events.c | 119 +++++++++++++++++++++++++++++
>  tools/perf/util/synthetic-events.h |   1 +
>  tools/perf/util/tool.h             |   1 +
>  7 files changed, 129 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 4c301466101b..2802de9538ff 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1397,6 +1397,11 @@ static int record__synthesize(struct record *rec, bool tail)
>  	if (err < 0)
>  		pr_warning("Couldn't synthesize bpf events.\n");
>  
> +	err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
> +					     machine);
> +	if (err < 0)
> +		pr_warning("Couldn't synthesize cgroup events.\n");
> +
>  	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->core.threads,
>  					    process_synthesized_event, opts->sample_address,
>  					    1);
> diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> index 4e8ef1db0c94..5147d22b3bda 100644
> --- a/tools/perf/util/cgroup.c
> +++ b/tools/perf/util/cgroup.c
> @@ -15,8 +15,7 @@ int nr_cgroups;
>  
>  static struct rb_root cgroup_tree = RB_ROOT;
>  
> -static int
> -cgroupfs_find_mountpoint(char *buf, size_t maxlen)
> +int cgroupfs_find_mountpoint(char *buf, size_t maxlen)

out of scope of this change, but could this be added to api/fs/fs.c?
it might need more checks then is currently supported, but would be
nice to have it under same api as the rest

jirka


>  {
>  	FILE *fp;
>  	char mountpoint[PATH_MAX + 1], tokens[PATH_MAX + 1], type[PATH_MAX + 1];
> diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
> index 381583df27c7..9a67060723fa 100644
> --- a/tools/perf/util/cgroup.h
> +++ b/tools/perf/util/cgroup.h
> @@ -17,6 +17,7 @@ struct cgroup {
>  
>  extern int nr_cgroups; /* number of explicit cgroups defined */
>  
> +int cgroupfs_find_mountpoint(char *buf, size_t maxlen);
>  struct cgroup *cgroup__get(struct cgroup *cgroup);
>  void cgroup__put(struct cgroup *cgroup);
>  
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 824c038e5c33..28801c867f39 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -33,6 +33,7 @@
>  #include "bpf-event.h"
>  #include "tool.h"
>  #include "../perf.h"
> +#include "cgroup.h"
>  
>  static const char *perf_event__names[] = {
>  	[0]					= "TOTAL",
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index cd336eb8886b..36f122cac44f 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -16,6 +16,7 @@
>  #include "util/synthetic-events.h"
>  #include "util/target.h"
>  #include "util/time-utils.h"
> +#include "util/cgroup.h"
>  #include <linux/bitops.h>
>  #include <linux/kernel.h>
>  #include <linux/string.h>
> @@ -413,6 +414,124 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
>  	return rc;
>  }
>  
> +static int perf_event__synthesize_cgroup(struct perf_tool *tool,
> +					 union perf_event *event,
> +					 char *path, size_t mount_len,
> +					 perf_event__handler_t process,
> +					 struct machine *machine)
> +{
> +	size_t event_size = sizeof(event->cgroup) - sizeof(event->cgroup.path);
> +	size_t path_len = strlen(path) - mount_len + 1;
> +	struct {
> +		struct file_handle fh;
> +		uint64_t cgroup_id;
> +	} handle;
> +	int mount_id;
> +
> +	while (path_len % sizeof(u64))
> +		path[mount_len + path_len++] = '\0';
> +
> +	memset(&event->cgroup, 0, event_size);
> +
> +	event->cgroup.header.type = PERF_RECORD_CGROUP;
> +	event->cgroup.header.size = event_size + path_len + machine->id_hdr_size;
> +
> +	handle.fh.handle_bytes = sizeof(handle.cgroup_id);
> +	if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
> +		pr_debug("stat failed: %s\n", path);
> +		return -1;
> +	}
> +
> +	event->cgroup.id = handle.cgroup_id;
> +	strncpy(event->cgroup.path, path + mount_len, path_len);
> +	memset(event->cgroup.path + path_len, 0, machine->id_hdr_size);
> +
> +	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
> +		pr_debug("process synth event failed\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
> +					union perf_event *event,
> +					char *path, size_t mount_len,
> +					perf_event__handler_t process,
> +					struct machine *machine)
> +{
> +	size_t pos = strlen(path);
> +	DIR *d;
> +	struct dirent *dent;
> +	int ret = 0;
> +
> +	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
> +					  process, machine) < 0)
> +		return -1;
> +
> +	d = opendir(path);
> +	if (d == NULL) {
> +		pr_debug("failed to open directory: %s\n", path);
> +		return -1;
> +	}
> +
> +	while ((dent = readdir(d)) != NULL) {
> +		if (dent->d_type != DT_DIR)
> +			continue;
> +		if (!strcmp(dent->d_name, ".") ||
> +		    !strcmp(dent->d_name, ".."))
> +			continue;
> +
> +		if (path[pos - 1] != '/')
> +			strcat(path, "/");
> +		strcat(path, dent->d_name);
> +
> +		ret = perf_event__walk_cgroup_tree(tool, event, path,
> +						   mount_len, process, machine);
> +		if (ret < 0)
> +			break;
> +
> +		path[pos] = '\0';
> +	}
> +
> +	closedir(d);
> +	return ret;
> +}
> +
> +int perf_event__synthesize_cgroups(struct perf_tool *tool,
> +				   perf_event__handler_t process,
> +				   struct machine *machine)
> +{
> +	union perf_event event;
> +	char *cgrp_root;
> +	size_t mount_len;  /* length of mount point in the path */
> +	int ret = -1;
> +
> +	cgrp_root = malloc(PATH_MAX);
> +	if (cgrp_root == NULL)
> +		return -1;
> +
> +	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX) < 0) {
> +		pr_debug("cannot find cgroup mount point\n");
> +		goto out;
> +	}
> +
> +	mount_len = strlen(cgrp_root);
> +	/* make sure the path starts with a slash (after mount point) */
> +	strcat(cgrp_root, "/");
> +
> +	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
> +					 process, machine) < 0)
> +		goto out;
> +
> +	ret = 0;
> +
> +out:
> +	free(cgrp_root);
> +
> +	return ret;
> +}
> +
>  int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
>  				   struct machine *machine)
>  {
> diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
> index baead0cdc381..e7a3e9589738 100644
> --- a/tools/perf/util/synthetic-events.h
> +++ b/tools/perf/util/synthetic-events.h
> @@ -45,6 +45,7 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handl
>  int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
>  int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_namespaces(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
> +int perf_event__synthesize_cgroups(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
>  int perf_event__synthesize_stat_config(struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
> diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
> index 472ef5eb4068..3fb67bd31e4a 100644
> --- a/tools/perf/util/tool.h
> +++ b/tools/perf/util/tool.h
> @@ -79,6 +79,7 @@ struct perf_tool {
>  	bool		ordered_events;
>  	bool		ordering_requires_timestamps;
>  	bool		namespace_events;
> +	bool		cgroup_events;
>  	bool		no_warn;
>  	enum show_feature_header show_feat_hdr;
>  };
> -- 
> 2.24.1.735.g03f4e72817-goog
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
  2020-01-08 22:08   ` Jiri Olsa
@ 2020-01-08 22:18   ` Jiri Olsa
  2020-01-09  7:50     ` Namhyung Kim
  2020-01-08 22:21   ` Jiri Olsa
  2 siblings, 1 reply; 48+ messages in thread
From: Jiri Olsa @ 2020-01-08 22:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:

SNIP

> +	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
> +		pr_debug("process synth event failed\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
> +					union perf_event *event,
> +					char *path, size_t mount_len,
> +					perf_event__handler_t process,
> +					struct machine *machine)
> +{
> +	size_t pos = strlen(path);
> +	DIR *d;
> +	struct dirent *dent;
> +	int ret = 0;
> +
> +	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
> +					  process, machine) < 0)
> +		return -1;
> +
> +	d = opendir(path);
> +	if (d == NULL) {
> +		pr_debug("failed to open directory: %s\n", path);
> +		return -1;
> +	}
> +
> +	while ((dent = readdir(d)) != NULL) {
> +		if (dent->d_type != DT_DIR)
> +			continue;
> +		if (!strcmp(dent->d_name, ".") ||
> +		    !strcmp(dent->d_name, ".."))
> +			continue;
> +
> +		if (path[pos - 1] != '/')
> +			strcat(path, "/");
> +		strcat(path, dent->d_name);

shouldn't you also check for the max size of path in here?

jirka


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
  2020-01-08 22:08   ` Jiri Olsa
  2020-01-08 22:18   ` Jiri Olsa
@ 2020-01-08 22:21   ` Jiri Olsa
  2020-01-09  7:51     ` Namhyung Kim
  2 siblings, 1 reply; 48+ messages in thread
From: Jiri Olsa @ 2020-01-08 22:21 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:

SNIP

> +	closedir(d);
> +	return ret;
> +}
> +
> +int perf_event__synthesize_cgroups(struct perf_tool *tool,
> +				   perf_event__handler_t process,
> +				   struct machine *machine)
> +{
> +	union perf_event event;
> +	char *cgrp_root;
> +	size_t mount_len;  /* length of mount point in the path */
> +	int ret = -1;
> +
> +	cgrp_root = malloc(PATH_MAX);
> +	if (cgrp_root == NULL)
> +		return -1;
> +

hum, we normally use bufs with PATH_MAX size on stack..
is there some reason to use heap in here?

jirka


> +	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX) < 0) {
> +		pr_debug("cannot find cgroup mount point\n");
> +		goto out;
> +	}
> +
> +	mount_len = strlen(cgrp_root);
> +	/* make sure the path starts with a slash (after mount point) */
> +	strcat(cgrp_root, "/");
> +
> +	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
> +					 process, machine) < 0)
> +		goto out;
> +
> +	ret = 0;
> +
> +out:
> +	free(cgrp_root);
> +
> +	return ret;
> +}
> +

SNIP


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 8/9] perf top: Add --all-cgroups option
  2020-01-07 13:35 ` [PATCH 8/9] perf top: " Namhyung Kim
@ 2020-01-08 22:24   ` Jiri Olsa
  2020-01-09  7:55     ` Namhyung Kim
  0 siblings, 1 reply; 48+ messages in thread
From: Jiri Olsa @ 2020-01-08 22:24 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:35:00PM +0900, Namhyung Kim wrote:
> The --all-cgroups option is to enable cgroup profiling support.  It
> tells kernel to record CGROUP events in the ring buffer so that perf
> report can identify task/cgroup association later.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/Documentation/perf-top.txt | 4 ++++
>  tools/perf/builtin-top.c              | 9 +++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
> index 5596129a71cf..c75507f50071 100644
> --- a/tools/perf/Documentation/perf-top.txt
> +++ b/tools/perf/Documentation/perf-top.txt
> @@ -266,6 +266,10 @@ Default is to monitor all CPUS.
>  	Record events of type PERF_RECORD_NAMESPACES and display it with the
>  	'cgroup_id' sort key.
>  
> +--cgroup::
> +	Record events of type PERF_RECORD_CGROUP and display it with the
> +	'cgroup' sort key.

should be '--all-cgroups' ?

anyway, we dont have '--cgroups', why not use just this?

jirka

> +
>  --switch-on EVENT_NAME::
>  	Only consider events after this event is found.
>  
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 795e353de095..f6256c533b09 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1244,6 +1244,8 @@ static int __cmd_top(struct perf_top *top)
>  
>  	if (opts->record_namespaces)
>  		top->tool.namespace_events = true;
> +	if (opts->record_cgroup)
> +		top->tool.cgroup_events = true;
>  
>  	ret = perf_event__synthesize_bpf_events(top->session, perf_event__process,
>  						&top->session->machines.host,
> @@ -1251,6 +1253,11 @@ static int __cmd_top(struct perf_top *top)
>  	if (ret < 0)
>  		pr_debug("Couldn't synthesize BPF events: Pre-existing BPF programs won't have symbols resolved.\n");
>  
> +	ret = perf_event__synthesize_cgroups(&top->tool, perf_event__process,
> +					     &top->session->machines.host);
> +	if (ret < 0)
> +		pr_debug("Couldn't synthesize cgroup events.\n");
> +
>  	machine__synthesize_threads(&top->session->machines.host, &opts->target,
>  				    top->evlist->core.threads, false,
>  				    top->nr_threads_synthesize);
> @@ -1539,6 +1546,8 @@ int cmd_top(int argc, const char **argv)
>  			"number of thread to run event synthesize"),
>  	OPT_BOOLEAN(0, "namespaces", &opts->record_namespaces,
>  		    "Record namespaces events"),
> +	OPT_BOOLEAN(0, "all-cgroups", &opts->record_cgroup,
> +		    "Record cgroup events"),
>  	OPTS_EVSWITCH(&top.evswitch),
>  	OPT_END()
>  	};
> -- 
> 2.24.1.735.g03f4e72817-goog
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-08 21:52   ` Jiri Olsa
@ 2020-01-09  0:51     ` Namhyung Kim
  2020-01-20 13:57       ` Namhyung Kim
  0 siblings, 1 reply; 48+ messages in thread
From: Namhyung Kim @ 2020-01-09  0:51 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

Hi Jiri,

On Thu, Jan 9, 2020 at 6:52 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:34:56PM +0900, Namhyung Kim wrote:
> > Each cgroup is kept in the global cgroup_tree sorted by the cgroup id.
> > Hist entries have cgroup id can compare it directly and later it can
> > be used to find a group name using this tree.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/util/cgroup.c  | 72 +++++++++++++++++++++++++++++++++++++++
> >  tools/perf/util/cgroup.h  | 15 +++++---
> >  tools/perf/util/machine.c |  7 ++++
> >  tools/perf/util/session.c |  4 +++
> >  4 files changed, 94 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> > index 4881d4af3381..4e8ef1db0c94 100644
> > --- a/tools/perf/util/cgroup.c
> > +++ b/tools/perf/util/cgroup.c
> > @@ -13,6 +13,8 @@
> >
> >  int nr_cgroups;
> >
> > +static struct rb_root cgroup_tree = RB_ROOT;
>
> I think we shoud carry that in 'struct perf_env'

OK, will move.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-08 22:01   ` Jiri Olsa
@ 2020-01-09  0:55     ` Namhyung Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-01-09  0:55 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 9, 2020 at 7:01 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:34:56PM +0900, Namhyung Kim wrote:
>
> SNIP
>
> > +     while (*p != NULL) {
> > +             parent = *p;
> > +             cgrp = rb_entry(parent, struct cgroup, node);
> > +
> > +             if (cgrp->id == id)
> > +                     return cgrp;
> > +
> > +             if (cgrp->id < id)
> > +                     p = &(*p)->rb_left;
> > +             else
> > +                     p = &(*p)->rb_right;
> > +     }
> > +
> > +     cgrp = malloc(sizeof(*cgrp));
> > +     if (cgrp == NULL)
> > +             return NULL;
> > +
> > +     cgrp->name = strdup(path);
> > +     if (cgrp->name == NULL) {
> > +             free(cgrp);
> > +             return NULL;
> > +     }
> > +
> > +     cgrp->fd = -1;
> > +     cgrp->id = id;
> > +     refcount_set(&cgrp->refcnt, 1);
> > +
> > +     rb_link_node(&cgrp->node, parent, p);
> > +     rb_insert_color(&cgrp->node, &cgroup_tree);
> > +
> > +     return cgrp;
> > +}
> > +
> > +struct cgroup *cgroup__find_by_path(const char *path)
> > +{
> > +     struct rb_node *node;
> > +
> > +     node = rb_first(&cgroup_tree);
> > +     while (node) {
> > +             struct cgroup *cgrp = rb_entry(node, struct cgroup, node);
> > +
> > +             if (!strcmp(cgrp->name, path))
> > +                     return cgrp;
>
> you have it sorted on ids, but only search by path,
> why don't we sort it on path right away?

No, actually it only used cgroup__findnew() not __find_by_path().
I don't remember why I added this - will remove..

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-08 22:08   ` Jiri Olsa
@ 2020-01-09  5:19     ` Namhyung Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-01-09  5:19 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 9, 2020 at 7:09 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:
> > diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> > index 4e8ef1db0c94..5147d22b3bda 100644
> > --- a/tools/perf/util/cgroup.c
> > +++ b/tools/perf/util/cgroup.c
> > @@ -15,8 +15,7 @@ int nr_cgroups;
> >
> >  static struct rb_root cgroup_tree = RB_ROOT;
> >
> > -static int
> > -cgroupfs_find_mountpoint(char *buf, size_t maxlen)
> > +int cgroupfs_find_mountpoint(char *buf, size_t maxlen)
>
> out of scope of this change, but could this be added to api/fs/fs.c?
> it might need more checks then is currently supported, but would be
> nice to have it under same api as the rest

Makes sense, will try to move it.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-08 22:18   ` Jiri Olsa
@ 2020-01-09  7:50     ` Namhyung Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-01-09  7:50 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 9, 2020 at 7:18 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:
>
> SNIP
>
> > +     if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
> > +             pr_debug("process synth event failed\n");
> > +             return -1;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
> > +                                     union perf_event *event,
> > +                                     char *path, size_t mount_len,
> > +                                     perf_event__handler_t process,
> > +                                     struct machine *machine)
> > +{
> > +     size_t pos = strlen(path);
> > +     DIR *d;
> > +     struct dirent *dent;
> > +     int ret = 0;
> > +
> > +     if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
> > +                                       process, machine) < 0)
> > +             return -1;
> > +
> > +     d = opendir(path);
> > +     if (d == NULL) {
> > +             pr_debug("failed to open directory: %s\n", path);
> > +             return -1;
> > +     }
> > +
> > +     while ((dent = readdir(d)) != NULL) {
> > +             if (dent->d_type != DT_DIR)
> > +                     continue;
> > +             if (!strcmp(dent->d_name, ".") ||
> > +                 !strcmp(dent->d_name, ".."))
> > +                     continue;
> > +
> > +             if (path[pos - 1] != '/')
> > +                     strcat(path, "/");
> > +             strcat(path, dent->d_name);
>
> shouldn't you also check for the max size of path in here?

I guess a path should be shorter than PATH_MAX.
But yes, I can add a check if you want.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-08 22:21   ` Jiri Olsa
@ 2020-01-09  7:51     ` Namhyung Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-01-09  7:51 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 9, 2020 at 7:22 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:
>
> SNIP
>
> > +     closedir(d);
> > +     return ret;
> > +}
> > +
> > +int perf_event__synthesize_cgroups(struct perf_tool *tool,
> > +                                perf_event__handler_t process,
> > +                                struct machine *machine)
> > +{
> > +     union perf_event event;
> > +     char *cgrp_root;
> > +     size_t mount_len;  /* length of mount point in the path */
> > +     int ret = -1;
> > +
> > +     cgrp_root = malloc(PATH_MAX);
> > +     if (cgrp_root == NULL)
> > +             return -1;
> > +
>
> hum, we normally use bufs with PATH_MAX size on stack..
> is there some reason to use heap in here?

No specific reason, will change.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 8/9] perf top: Add --all-cgroups option
  2020-01-08 22:24   ` Jiri Olsa
@ 2020-01-09  7:55     ` Namhyung Kim
  2020-01-09  8:18       ` Jiri Olsa
  0 siblings, 1 reply; 48+ messages in thread
From: Namhyung Kim @ 2020-01-09  7:55 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 9, 2020 at 7:25 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:35:00PM +0900, Namhyung Kim wrote:
> > The --all-cgroups option is to enable cgroup profiling support.  It
> > tells kernel to record CGROUP events in the ring buffer so that perf
> > report can identify task/cgroup association later.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/Documentation/perf-top.txt | 4 ++++
> >  tools/perf/builtin-top.c              | 9 +++++++++
> >  2 files changed, 13 insertions(+)
> >
> > diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
> > index 5596129a71cf..c75507f50071 100644
> > --- a/tools/perf/Documentation/perf-top.txt
> > +++ b/tools/perf/Documentation/perf-top.txt
> > @@ -266,6 +266,10 @@ Default is to monitor all CPUS.
> >       Record events of type PERF_RECORD_NAMESPACES and display it with the
> >       'cgroup_id' sort key.
> >
> > +--cgroup::
> > +     Record events of type PERF_RECORD_CGROUP and display it with the
> > +     'cgroup' sort key.
>
> should be '--all-cgroups' ?

Oops, you're right.

>
> anyway, we dont have '--cgroups', why not use just this?

I chose it for consistency since perf record has '--cgroup' option.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 8/9] perf top: Add --all-cgroups option
  2020-01-09  7:55     ` Namhyung Kim
@ 2020-01-09  8:18       ` Jiri Olsa
  0 siblings, 0 replies; 48+ messages in thread
From: Jiri Olsa @ 2020-01-09  8:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 09, 2020 at 04:55:21PM +0900, Namhyung Kim wrote:
> On Thu, Jan 9, 2020 at 7:25 AM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Tue, Jan 07, 2020 at 10:35:00PM +0900, Namhyung Kim wrote:
> > > The --all-cgroups option is to enable cgroup profiling support.  It
> > > tells kernel to record CGROUP events in the ring buffer so that perf
> > > report can identify task/cgroup association later.
> > >
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > ---
> > >  tools/perf/Documentation/perf-top.txt | 4 ++++
> > >  tools/perf/builtin-top.c              | 9 +++++++++
> > >  2 files changed, 13 insertions(+)
> > >
> > > diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
> > > index 5596129a71cf..c75507f50071 100644
> > > --- a/tools/perf/Documentation/perf-top.txt
> > > +++ b/tools/perf/Documentation/perf-top.txt
> > > @@ -266,6 +266,10 @@ Default is to monitor all CPUS.
> > >       Record events of type PERF_RECORD_NAMESPACES and display it with the
> > >       'cgroup_id' sort key.
> > >
> > > +--cgroup::
> > > +     Record events of type PERF_RECORD_CGROUP and display it with the
> > > +     'cgroup' sort key.
> >
> > should be '--all-cgroups' ?
> 
> Oops, you're right.
> 
> >
> > anyway, we dont have '--cgroups', why not use just this?
> 
> I chose it for consistency since perf record has '--cgroup' option.

I see.. and we have that substring recognition so I suppose --cgroups
would screw the existing --cgroup option, ok

jirka


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-09  0:51     ` Namhyung Kim
@ 2020-01-20 13:57       ` Namhyung Kim
  2020-01-21  9:42         ` Jiri Olsa
  0 siblings, 1 reply; 48+ messages in thread
From: Namhyung Kim @ 2020-01-20 13:57 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

Hi Jiri,

On Thu, Jan 9, 2020 at 9:51 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Jiri,
>
> On Thu, Jan 9, 2020 at 6:52 AM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Tue, Jan 07, 2020 at 10:34:56PM +0900, Namhyung Kim wrote:
> > > Each cgroup is kept in the global cgroup_tree sorted by the cgroup id.
> > > Hist entries have cgroup id can compare it directly and later it can
> > > be used to find a group name using this tree.
> > >
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > ---
> > >  tools/perf/util/cgroup.c  | 72 +++++++++++++++++++++++++++++++++++++++
> > >  tools/perf/util/cgroup.h  | 15 +++++---
> > >  tools/perf/util/machine.c |  7 ++++
> > >  tools/perf/util/session.c |  4 +++
> > >  4 files changed, 94 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> > > index 4881d4af3381..4e8ef1db0c94 100644
> > > --- a/tools/perf/util/cgroup.c
> > > +++ b/tools/perf/util/cgroup.c
> > > @@ -13,6 +13,8 @@
> > >
> > >  int nr_cgroups;
> > >
> > > +static struct rb_root cgroup_tree = RB_ROOT;
> >
> > I think we shoud carry that in 'struct perf_env'
>
> OK, will move.

So I tried this but then realized that it's hard to get the perf_env later
when it needs to convert a cgroup id to name (ie. in sort_entry.se_snprintf).
I also checked maybe I can resolve it when a hist entry is added,
but it doesn't have the pointer there too.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-20 13:57       ` Namhyung Kim
@ 2020-01-21  9:42         ` Jiri Olsa
  2020-02-17 15:46           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 48+ messages in thread
From: Jiri Olsa @ 2020-01-21  9:42 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Mon, Jan 20, 2020 at 10:57:47PM +0900, Namhyung Kim wrote:
> Hi Jiri,
> 
> On Thu, Jan 9, 2020 at 9:51 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Jiri,
> >
> > On Thu, Jan 9, 2020 at 6:52 AM Jiri Olsa <jolsa@redhat.com> wrote:
> > >
> > > On Tue, Jan 07, 2020 at 10:34:56PM +0900, Namhyung Kim wrote:
> > > > Each cgroup is kept in the global cgroup_tree sorted by the cgroup id.
> > > > Hist entries have cgroup id can compare it directly and later it can
> > > > be used to find a group name using this tree.
> > > >
> > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > ---
> > > >  tools/perf/util/cgroup.c  | 72 +++++++++++++++++++++++++++++++++++++++
> > > >  tools/perf/util/cgroup.h  | 15 +++++---
> > > >  tools/perf/util/machine.c |  7 ++++
> > > >  tools/perf/util/session.c |  4 +++
> > > >  4 files changed, 94 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> > > > index 4881d4af3381..4e8ef1db0c94 100644
> > > > --- a/tools/perf/util/cgroup.c
> > > > +++ b/tools/perf/util/cgroup.c
> > > > @@ -13,6 +13,8 @@
> > > >
> > > >  int nr_cgroups;
> > > >
> > > > +static struct rb_root cgroup_tree = RB_ROOT;
> > >
> > > I think we shoud carry that in 'struct perf_env'
> >
> > OK, will move.
> 
> So I tried this but then realized that it's hard to get the perf_env later
> when it needs to convert a cgroup id to name (ie. in sort_entry.se_snprintf).
> I also checked maybe I can resolve it when a hist entry is added,
> but it doesn't have the pointer there too.

looks like there might be a path for standard report where hists
are part of evsel object:

  'struct hist_entry' via ->hists to  'struct hists'
  hists_to_evsel(hists) to 'struct evsel'
  'struct evsel' via ->evlist to 'struct evlist'
  and there you have evlist->env ;-)

however I was wondering if we could add 'machine' pointer
to the hist object, that would make that simpler ;-)

not sure about the way.. would be nice if that'd work for
both evsel hists and standalone ones like in c2c

but maybe just some init helper that sets the pointer early on
might do the job

jirka


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-21  9:42         ` Jiri Olsa
@ 2020-02-17 15:46           ` Arnaldo Carvalho de Melo
  2020-02-17 19:22             ` Namhyung Kim
  0 siblings, 1 reply; 48+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-02-17 15:46 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Namhyung Kim, Ingo Molnar, Peter Zijlstra, Alexander Shishkin,
	Mark Rutland, Stephane Eranian, LKML, linux-perf-users

Em Tue, Jan 21, 2020 at 10:42:46AM +0100, Jiri Olsa escreveu:
> On Mon, Jan 20, 2020 at 10:57:47PM +0900, Namhyung Kim wrote:
> > So I tried this but then realized that it's hard to get the perf_env later
> > when it needs to convert a cgroup id to name (ie. in sort_entry.se_snprintf).
> > I also checked maybe I can resolve it when a hist entry is added,
> > but it doesn't have the pointer there too.
 
> looks like there might be a path for standard report where hists
> are part of evsel object:
> 
>   'struct hist_entry' via ->hists to  'struct hists'
>   hists_to_evsel(hists) to 'struct evsel'
>   'struct evsel' via ->evlist to 'struct evlist'
>   and there you have evlist->env ;-)

Hey, recently I did work that ended up adding a 'struct maps' to 'struct
map_symbol' and then hist_entry::ms has it, enough?

i.e.:

hist_entry::ms->maps->machine

  he->ms->maps->machine
 
> however I was wondering if we could add 'machine' pointer
> to the hist object, that would make that simpler ;-)

Directly from one of its hist_entries? As above?
 
> not sure about the way.. would be nice if that'd work for
> both evsel hists and standalone ones like in c2c
 
> but maybe just some init helper that sets the pointer early on
> might do the job

Sorry for the delay in answering, was travelling/vacations, now taming a
huge mailbox :-\

- Arnaldo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-02-17 15:46           ` Arnaldo Carvalho de Melo
@ 2020-02-17 19:22             ` Namhyung Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-02-17 19:22 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, Alexander Shishkin,
	Mark Rutland, Stephane Eranian, LKML, linux-perf-users

Hi Arnaldo,

On Mon, Feb 17, 2020 at 7:46 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Tue, Jan 21, 2020 at 10:42:46AM +0100, Jiri Olsa escreveu:
> > On Mon, Jan 20, 2020 at 10:57:47PM +0900, Namhyung Kim wrote:
> > > So I tried this but then realized that it's hard to get the perf_env later
> > > when it needs to convert a cgroup id to name (ie. in sort_entry.se_snprintf).
> > > I also checked maybe I can resolve it when a hist entry is added,
> > > but it doesn't have the pointer there too.
>
> > looks like there might be a path for standard report where hists
> > are part of evsel object:
> >
> >   'struct hist_entry' via ->hists to  'struct hists'
> >   hists_to_evsel(hists) to 'struct evsel'
> >   'struct evsel' via ->evlist to 'struct evlist'
> >   and there you have evlist->env ;-)
>
> Hey, recently I did work that ended up adding a 'struct maps' to 'struct
> map_symbol' and then hist_entry::ms has it, enough?
>
> i.e.:
>
> hist_entry::ms->maps->machine
>
>   he->ms->maps->machine

Yep, that's enough for my use.

>
> > however I was wondering if we could add 'machine' pointer
> > to the hist object, that would make that simpler ;-)
>
> Directly from one of its hist_entries? As above?

Right, any path from hist_entry to perf_env would be ok.

>
> > not sure about the way.. would be nice if that'd work for
> > both evsel hists and standalone ones like in c2c
>
> > but maybe just some init helper that sets the pointer early on
> > might do the job
>
> Sorry for the delay in answering, was travelling/vacations, now taming a
> huge mailbox :-\

No problem, I'll send the next version soon.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
@ 2020-03-25 12:45 ` Namhyung Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-03-25 12:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin, Li Zefan, Johannes Weiner, Adrian Hunter,
	kbuild test robot, Tejun Heo, Peter Zijlstra

To support cgroup tracking, add CGROUP event to save a link between
cgroup path and id number.  This is needed since cgroups can go away
when userspace tries to read the cgroup info (from the id) later.
The attr.cgroup bit was also added to enable cgroup tracking from
userspace.

This event will be generated when a new cgroup becomes active.
Userspace might need to synthesize those events for existing cgroups.

Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
[staticize perf_event_cgroup function]
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/uapi/linux/perf_event.h |  13 +++-
 kernel/events/core.c            | 111 ++++++++++++++++++++++++++++++++
 2 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 397cfd65b3fe..de95f6c7b273 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -381,7 +381,8 @@ struct perf_event_attr {
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
 				aux_output     :  1, /* generate AUX records instead of events */
-				__reserved_1   : 32;
+				cgroup         :  1, /* include cgroup events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -1012,6 +1013,16 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				id;
+	 *	char				path[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CGROUP			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d22e4ba59dfa..994932d5e474 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -387,6 +387,7 @@ static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
 static atomic_t nr_ksymbol_events __read_mostly;
 static atomic_t nr_bpf_events __read_mostly;
+static atomic_t nr_cgroup_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -4608,6 +4609,8 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_dec(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_dec(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
@@ -7735,6 +7738,105 @@ void perf_event_namespaces(struct task_struct *task)
 			NULL);
 }
 
+/*
+ * cgroup tracking
+ */
+#ifdef CONFIG_CGROUP_PERF
+
+struct perf_cgroup_event {
+	char				*path;
+	int				path_size;
+	struct {
+		struct perf_event_header	header;
+		u64				id;
+		char				path[];
+	} event_id;
+};
+
+static int perf_event_cgroup_match(struct perf_event *event)
+{
+	return event->attr.cgroup;
+}
+
+static void perf_event_cgroup_output(struct perf_event *event, void *data)
+{
+	struct perf_cgroup_event *cgroup_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	u16 header_size = cgroup_event->event_id.header.size;
+	int ret;
+
+	if (!perf_event_cgroup_match(event))
+		return;
+
+	perf_event_header__init_id(&cgroup_event->event_id.header,
+				   &sample, event);
+	ret = perf_output_begin(&handle, event,
+				cgroup_event->event_id.header.size);
+	if (ret)
+		goto out;
+
+	perf_output_put(&handle, cgroup_event->event_id);
+	__output_copy(&handle, cgroup_event->path, cgroup_event->path_size);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+out:
+	cgroup_event->event_id.header.size = header_size;
+}
+
+static void perf_event_cgroup(struct cgroup *cgrp)
+{
+	struct perf_cgroup_event cgroup_event;
+	char path_enomem[16] = "//enomem";
+	char *pathname;
+	size_t size;
+
+	if (!atomic_read(&nr_cgroup_events))
+		return;
+
+	cgroup_event = (struct perf_cgroup_event){
+		.event_id  = {
+			.header = {
+				.type = PERF_RECORD_CGROUP,
+				.misc = 0,
+				.size = sizeof(cgroup_event.event_id),
+			},
+			.id = cgroup_id(cgrp),
+		},
+	};
+
+	pathname = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (pathname == NULL) {
+		cgroup_event.path = path_enomem;
+	} else {
+		/* just to be sure to have enough space for alignment */
+		cgroup_path(cgrp, pathname, PATH_MAX - sizeof(u64));
+		cgroup_event.path = pathname;
+	}
+
+	/*
+	 * Since our buffer works in 8 byte units we need to align our string
+	 * size to a multiple of 8. However, we must guarantee the tail end is
+	 * zero'd out to avoid leaking random bits to userspace.
+	 */
+	size = strlen(cgroup_event.path) + 1;
+	while (!IS_ALIGNED(size, sizeof(u64)))
+		cgroup_event.path[size++] = '\0';
+
+	cgroup_event.event_id.header.size += size;
+	cgroup_event.path_size = size;
+
+	perf_iterate_sb(perf_event_cgroup_output,
+			&cgroup_event,
+			NULL);
+
+	kfree(pathname);
+}
+
+#endif
+
 /*
  * mmap tracking
  */
@@ -10781,6 +10883,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_inc(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_inc(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
 	if (event->attr.freq)
@@ -12757,6 +12861,12 @@ static void perf_cgroup_css_free(struct cgroup_subsys_state *css)
 	kfree(jc);
 }
 
+static int perf_cgroup_css_online(struct cgroup_subsys_state *css)
+{
+	perf_event_cgroup(css->cgroup);
+	return 0;
+}
+
 static int __perf_cgroup_move(void *info)
 {
 	struct task_struct *task = info;
@@ -12778,6 +12888,7 @@ static void perf_cgroup_attach(struct cgroup_taskset *tset)
 struct cgroup_subsys perf_event_cgrp_subsys = {
 	.css_alloc	= perf_cgroup_css_alloc,
 	.css_free	= perf_cgroup_css_free,
+	.css_online	= perf_cgroup_css_online,
 	.attach		= perf_cgroup_attach,
 	/*
 	 * Implicitly enable on dfl hierarchy so that perf events can
-- 
2.25.1.696.g5e7596f4ac-goog


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2020-01-07 12:51   ` Peter Zijlstra
@ 2020-01-07 13:46     ` Namhyung Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter

Hi Peter,

Happy new year!

On Tue, Jan 7, 2020 at 9:51 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Dec 23, 2019 at 03:07:51PM +0900, Namhyung Kim wrote:
>
> > @@ -7564,6 +7567,105 @@ void perf_event_namespaces(struct task_struct *task)
> >                       NULL);
> >  }
> >
> > +/*
> > + * cgroup tracking
> > + */
> > +#ifdef CONFIG_CGROUPS
> > +
>
> <snip>
>
> > +
> > +#endif
> > +
> >  /*
> >   * mmap tracking
> >   */
>
> > @@ -12581,6 +12685,12 @@ static void perf_cgroup_css_free(struct cgroup_subsys_state *css)
> >       kfree(jc);
> >  }
> >
> > +static int perf_cgroup_css_online(struct cgroup_subsys_state *css)
> > +{
> > +     perf_event_cgroup(css->cgroup);
> > +     return 0;
> > +}
> > +
> >  static int __perf_cgroup_move(void *info)
> >  {
> >       struct task_struct *task = info;
> > @@ -12602,6 +12712,7 @@ static void perf_cgroup_attach(struct cgroup_taskset *tset)
> >  struct cgroup_subsys perf_event_cgrp_subsys = {
> >       .css_alloc      = perf_cgroup_css_alloc,
> >       .css_free       = perf_cgroup_css_free,
> > +     .css_online     = perf_cgroup_css_online,
> >       .attach         = perf_cgroup_attach,
> >       /*
> >        * Implicitly enable on dfl hierarchy so that perf events can
>
> CONFIG_CGROUPS vs CONFIG_CGROUP_PERF ?

Oh, I just saw this after sending v4 right before..
I think it should use CONFIG_CGROUP_PERF, will change.

>
> Other than that, I see nothing wrong here.

Thanks for the review!
Namhyung

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-12-23  6:07 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
  2019-12-27  4:07   ` kbuild test robot
@ 2020-01-07 12:51   ` Peter Zijlstra
  2020-01-07 13:46     ` Namhyung Kim
  1 sibling, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2020-01-07 12:51 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter

On Mon, Dec 23, 2019 at 03:07:51PM +0900, Namhyung Kim wrote:

> @@ -7564,6 +7567,105 @@ void perf_event_namespaces(struct task_struct *task)
>  			NULL);
>  }
>  
> +/*
> + * cgroup tracking
> + */
> +#ifdef CONFIG_CGROUPS
> +

<snip>

> +
> +#endif
> +
>  /*
>   * mmap tracking
>   */

> @@ -12581,6 +12685,12 @@ static void perf_cgroup_css_free(struct cgroup_subsys_state *css)
>  	kfree(jc);
>  }
>  
> +static int perf_cgroup_css_online(struct cgroup_subsys_state *css)
> +{
> +	perf_event_cgroup(css->cgroup);
> +	return 0;
> +}
> +
>  static int __perf_cgroup_move(void *info)
>  {
>  	struct task_struct *task = info;
> @@ -12602,6 +12712,7 @@ static void perf_cgroup_attach(struct cgroup_taskset *tset)
>  struct cgroup_subsys perf_event_cgrp_subsys = {
>  	.css_alloc	= perf_cgroup_css_alloc,
>  	.css_free	= perf_cgroup_css_free,
> +	.css_online	= perf_cgroup_css_online,
>  	.attach		= perf_cgroup_attach,
>  	/*
>  	 * Implicitly enable on dfl hierarchy so that perf events can

CONFIG_CGROUPS vs CONFIG_CGROUP_PERF ?

Other than that, I see nothing wrong here.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-12-23  6:07 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
@ 2019-12-27  4:07   ` kbuild test robot
  2020-01-07 12:51   ` Peter Zijlstra
  1 sibling, 0 replies; 48+ messages in thread
From: kbuild test robot @ 2019-12-27  4:07 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 10757 bytes --]

Hi Namhyung,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/auto-latest]
[also build test WARNING on linux/master linus/master v5.5-rc3 next-20191219]
[cannot apply to tip/perf/core]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Namhyung-Kim/perf-Improve-cgroup-profiling-v3/20191225-035231
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ec7b10f2d023bd79cf067c60c194f72a6d672319
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-129-g341daf20-dirty
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

   kernel/events/core.c:572:26: sparse: sparse: function 'perf_pmu_name' with external linkage has definition
   kernel/events/core.c:1385:15: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:1385:15: sparse:    struct perf_event_context [noderef] <asn:4> *
   kernel/events/core.c:1385:15: sparse:    struct perf_event_context *
   kernel/events/core.c:1398:28: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:1398:28: sparse:    struct perf_event_context [noderef] <asn:4> *
   kernel/events/core.c:1398:28: sparse:    struct perf_event_context *
   kernel/events/core.c:3222:18: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:3222:18: sparse:    struct perf_event_context [noderef] <asn:4> *
   kernel/events/core.c:3222:18: sparse:    struct perf_event_context *
   kernel/events/core.c:3223:23: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:3223:23: sparse:    struct perf_event_context [noderef] <asn:4> *
   kernel/events/core.c:3223:23: sparse:    struct perf_event_context *
   kernel/events/core.c:3265:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:3265:25: sparse:    struct perf_event_context [noderef] <asn:4> *
   kernel/events/core.c:3265:25: sparse:    struct perf_event_context *
   kernel/events/core.c:3266:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:3266:25: sparse:    struct perf_event_context [noderef] <asn:4> *
   kernel/events/core.c:3266:25: sparse:    struct perf_event_context *
   kernel/events/core.c:4341:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:4341:25: sparse:    struct perf_event_context [noderef] <asn:4> *
   kernel/events/core.c:4341:25: sparse:    struct perf_event_context *
   kernel/events/core.c:5599:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:5599:9: sparse:    struct ring_buffer [noderef] <asn:4> *
   kernel/events/core.c:5599:9: sparse:    struct ring_buffer *
   kernel/events/core.c:5075:24: sparse: sparse: incorrect type in assignment (different base types)
   kernel/events/core.c:5075:24: sparse:    expected restricted __poll_t [usertype] events
   kernel/events/core.c:5075:24: sparse:    got int
   kernel/events/core.c:5305:22: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:5305:22: sparse:    struct ring_buffer [noderef] <asn:4> *
   kernel/events/core.c:5305:22: sparse:    struct ring_buffer *
   kernel/events/core.c:5441:14: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:5441:14: sparse:    struct ring_buffer [noderef] <asn:4> *
   kernel/events/core.c:5441:14: sparse:    struct ring_buffer *
   kernel/events/core.c:5474:14: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:5474:14: sparse:    struct ring_buffer [noderef] <asn:4> *
   kernel/events/core.c:5474:14: sparse:    struct ring_buffer *
   kernel/events/core.c:5531:14: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:5531:14: sparse:    struct ring_buffer [noderef] <asn:4> *
   kernel/events/core.c:5531:14: sparse:    struct ring_buffer *
   kernel/events/core.c:5617:14: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:5617:14: sparse:    struct ring_buffer [noderef] <asn:4> *
   kernel/events/core.c:5617:14: sparse:    struct ring_buffer *
   kernel/events/core.c:5630:14: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:5630:14: sparse:    struct ring_buffer [noderef] <asn:4> *
   kernel/events/core.c:5630:14: sparse:    struct ring_buffer *
   kernel/events/internal.h:204:1: sparse: sparse: incorrect type in argument 2 (different address spaces)
   kernel/events/internal.h:204:1: sparse:    expected void const [noderef] <asn:1> *from
   kernel/events/internal.h:204:1: sparse:    got void const *buf
   kernel/events/core.c:6305:6: sparse: sparse: symbol 'perf_pmu_snapshot_aux' was not declared. Should it be static?
   kernel/events/core.c:7094:23: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:7094:23: sparse:    struct perf_event_context [noderef] <asn:4> *
   kernel/events/core.c:7094:23: sparse:    struct perf_event_context *
   kernel/events/core.c:7185:13: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:7185:13: sparse:    struct ring_buffer [noderef] <asn:4> *
   kernel/events/core.c:7185:13: sparse:    struct ring_buffer *
>> kernel/events/core.c:7618:6: sparse: sparse: symbol 'perf_event_cgroup' was not declared. Should it be static?
   kernel/events/core.c:7972:23: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:7972:23: sparse:    struct perf_event_context [noderef] <asn:4> *
   kernel/events/core.c:7972:23: sparse:    struct perf_event_context *
   kernel/events/core.c:8702:17: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:8702:17: sparse:    struct swevent_hlist [noderef] <asn:4> *
   kernel/events/core.c:8702:17: sparse:    struct swevent_hlist *
   kernel/events/core.c:8722:17: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:8722:17: sparse:    struct swevent_hlist [noderef] <asn:4> *
   kernel/events/core.c:8722:17: sparse:    struct swevent_hlist *
   kernel/events/core.c:8842:16: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:8842:16: sparse:    struct swevent_hlist [noderef] <asn:4> *
   kernel/events/core.c:8842:16: sparse:    struct swevent_hlist *
   kernel/events/core.c:8853:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:8853:9: sparse:    struct swevent_hlist [noderef] <asn:4> *
   kernel/events/core.c:8853:9: sparse:    struct swevent_hlist *
   kernel/events/core.c:8842:16: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:8842:16: sparse:    struct swevent_hlist [noderef] <asn:4> *
   kernel/events/core.c:8842:16: sparse:    struct swevent_hlist *
   kernel/events/core.c:8892:17: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:8892:17: sparse:    struct swevent_hlist [noderef] <asn:4> *
   kernel/events/core.c:8892:17: sparse:    struct swevent_hlist *
   kernel/events/core.c:9073:23: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:9073:23: sparse:    struct perf_event_context [noderef] <asn:4> *
   kernel/events/core.c:9073:23: sparse:    struct perf_event_context *
   kernel/events/core.c:10244:1: sparse: sparse: symbol 'dev_attr_nr_addr_filters' was not declared. Should it be static?
   kernel/events/core.c:11967:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:11967:9: sparse:    struct perf_event_context [noderef] <asn:4> *
   kernel/events/core.c:11967:9: sparse:    struct perf_event_context *
   kernel/events/core.c:12077:17: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:12077:17: sparse:    struct perf_event_context [noderef] <asn:4> *
   kernel/events/core.c:12077:17: sparse:    struct perf_event_context *
   kernel/events/core.c:8842:16: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:8842:16: sparse:    struct swevent_hlist [noderef] <asn:4> *
   kernel/events/core.c:8842:16: sparse:    struct swevent_hlist *
   kernel/events/core.c:12501:17: sparse: sparse: incompatible types in comparison expression (different address spaces):
   kernel/events/core.c:12501:17: sparse:    struct swevent_hlist [noderef] <asn:4> *
   kernel/events/core.c:12501:17: sparse:    struct swevent_hlist *
   kernel/events/core.c:155:9: sparse: sparse: context imbalance in 'perf_ctx_lock' - wrong count at exit
   kernel/events/core.c:163:17: sparse: sparse: context imbalance in 'perf_ctx_unlock' - unexpected unlock
   kernel/events/core.c:1405:17: sparse: sparse: context imbalance in 'perf_lock_task_context' - different lock contexts for basic block
   kernel/events/core.c:1432:17: sparse: sparse: context imbalance in 'perf_pin_task_context' - unexpected unlock
   kernel/events/core.c:2652:9: sparse: sparse: context imbalance in '__perf_install_in_context' - wrong count at exit
   kernel/events/core.c:4313:17: sparse: sparse: context imbalance in 'find_get_context' - unexpected unlock

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org Intel Corporation

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-12-23  6:07 [PATCHSET 0/9] perf: Improve cgroup profiling (v3) Namhyung Kim
@ 2019-12-23  6:07 ` Namhyung Kim
  2019-12-27  4:07   ` kbuild test robot
  2020-01-07 12:51   ` Peter Zijlstra
  0 siblings, 2 replies; 48+ messages in thread
From: Namhyung Kim @ 2019-12-23  6:07 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter

To support cgroup tracking, add CGROUP event to save a link between
cgroup path and id number.  This is needed since cgroups can go away
when userspace tries to read the cgroup info (from the id) later.
The attr.cgroup bit was also added to enable cgroup tracking from
userspace.

This event will be generated when a new cgroup becomes active.
Userspace might need to synthesize those events for existing cgroups.

Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/uapi/linux/perf_event.h |  13 +++-
 kernel/events/core.c            | 111 ++++++++++++++++++++++++++++++++
 2 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 377d794d3105..de2ab87ca92c 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -377,7 +377,8 @@ struct perf_event_attr {
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
 				aux_output     :  1, /* generate AUX records instead of events */
-				__reserved_1   : 32;
+				cgroup         :  1, /* include cgroup events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -1006,6 +1007,16 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				id;
+	 *	char				path[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CGROUP			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4ff86d57f9e5..b0aa1b921769 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -386,6 +386,7 @@ static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
 static atomic_t nr_ksymbol_events __read_mostly;
 static atomic_t nr_bpf_events __read_mostly;
+static atomic_t nr_cgroup_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -4455,6 +4456,8 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_dec(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_dec(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
@@ -7564,6 +7567,105 @@ void perf_event_namespaces(struct task_struct *task)
 			NULL);
 }
 
+/*
+ * cgroup tracking
+ */
+#ifdef CONFIG_CGROUPS
+
+struct perf_cgroup_event {
+	char				*path;
+	int				path_size;
+	struct {
+		struct perf_event_header	header;
+		u64				id;
+		char				path[];
+	} event_id;
+};
+
+static int perf_event_cgroup_match(struct perf_event *event)
+{
+	return event->attr.cgroup;
+}
+
+static void perf_event_cgroup_output(struct perf_event *event, void *data)
+{
+	struct perf_cgroup_event *cgroup_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	u16 header_size = cgroup_event->event_id.header.size;
+	int ret;
+
+	if (!perf_event_cgroup_match(event))
+		return;
+
+	perf_event_header__init_id(&cgroup_event->event_id.header,
+				   &sample, event);
+	ret = perf_output_begin(&handle, event,
+				cgroup_event->event_id.header.size);
+	if (ret)
+		goto out;
+
+	perf_output_put(&handle, cgroup_event->event_id);
+	__output_copy(&handle, cgroup_event->path, cgroup_event->path_size);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+out:
+	cgroup_event->event_id.header.size = header_size;
+}
+
+void perf_event_cgroup(struct cgroup *cgrp)
+{
+	struct perf_cgroup_event cgroup_event;
+	char path_enomem[16] = "//enomem";
+	char *pathname;
+	size_t size;
+
+	if (!atomic_read(&nr_cgroup_events))
+		return;
+
+	cgroup_event = (struct perf_cgroup_event){
+		.event_id  = {
+			.header = {
+				.type = PERF_RECORD_CGROUP,
+				.misc = 0,
+				.size = sizeof(cgroup_event.event_id),
+			},
+			.id = cgroup_id(cgrp),
+		},
+	};
+
+	pathname = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (pathname == NULL) {
+		cgroup_event.path = path_enomem;
+	} else {
+		/* just to be sure to have enough space for alignment */
+		cgroup_path(cgrp, pathname, PATH_MAX - sizeof(u64));
+		cgroup_event.path = pathname;
+	}
+
+	/*
+	 * Since our buffer works in 8 byte units we need to align our string
+	 * size to a multiple of 8. However, we must guarantee the tail end is
+	 * zero'd out to avoid leaking random bits to userspace.
+	 */
+	size = strlen(cgroup_event.path) + 1;
+	while (!IS_ALIGNED(size, sizeof(u64)))
+		cgroup_event.path[size++] = '\0';
+
+	cgroup_event.event_id.header.size += size;
+	cgroup_event.path_size = size;
+
+	perf_iterate_sb(perf_event_cgroup_output,
+			&cgroup_event,
+			NULL);
+
+	kfree(pathname);
+}
+
+#endif
+
 /*
  * mmap tracking
  */
@@ -10607,6 +10709,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_inc(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_inc(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
 	if (event->attr.freq)
@@ -12581,6 +12685,12 @@ static void perf_cgroup_css_free(struct cgroup_subsys_state *css)
 	kfree(jc);
 }
 
+static int perf_cgroup_css_online(struct cgroup_subsys_state *css)
+{
+	perf_event_cgroup(css->cgroup);
+	return 0;
+}
+
 static int __perf_cgroup_move(void *info)
 {
 	struct task_struct *task = info;
@@ -12602,6 +12712,7 @@ static void perf_cgroup_attach(struct cgroup_taskset *tset)
 struct cgroup_subsys perf_event_cgrp_subsys = {
 	.css_alloc	= perf_cgroup_css_alloc,
 	.css_free	= perf_cgroup_css_free,
+	.css_online	= perf_cgroup_css_online,
 	.attach		= perf_cgroup_attach,
 	/*
 	 * Implicitly enable on dfl hierarchy so that perf events can
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-12-20 15:16     ` Tejun Heo
@ 2019-12-21  8:49       ` Namhyung Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2019-12-21  8:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users, Li Zefan, Johannes Weiner, Adrian Hunter

Hi Tejun,

On Sat, Dec 21, 2019 at 12:16 AM Tejun Heo <tj@kernel.org> wrote:
>
> On Fri, Dec 20, 2019 at 10:33:35AM +0100, Peter Zijlstra wrote:
> > On Fri, Dec 20, 2019 at 01:32:45PM +0900, Namhyung Kim wrote:
> > > To support cgroup tracking, add CGROUP event to save a link between
> > > cgroup path and inode number.  The attr.cgroup bit was also added to
> > > enable cgroup tracking from userspace.
> > >
> > > This event will be generated when a new cgroup becomes active.
> > > Userspace might need to synthesize those events for existing cgroups.
> > >
> > > Cc: Tejun Heo <tj@kernel.org>
> > > Cc: Li Zefan <lizefan@huawei.com>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> >
> > TJ, is this the right thing to do? ISTR you had concerns on this topic
> > on the past.
>
> Yeah, cgroup->id is now the same as ino (on 64bit ino matchines) and
> fhandle and uniquely identifies a cgroup instance in that boot
> instance.  That said, id -> path mapping can be done from userspace by
> passing the cgroup id to open_by_handle_at(2) and then reading the
> symlink in /proc/self/fd, so this event isn't necessary per-se if the
> goal is mapping back ids to paths.

But we should support offline report even on a different machine.
Also cgroups might go away during the record. so I think we need it
anyway.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-12-20  9:33   ` Peter Zijlstra
  2019-12-20 11:28     ` Namhyung Kim
  2019-12-20 11:50     ` Namhyung Kim
@ 2019-12-20 15:16     ` Tejun Heo
  2019-12-21  8:49       ` Namhyung Kim
  2 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2019-12-20 15:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Namhyung Kim, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users, Li Zefan, Johannes Weiner, Adrian Hunter

On Fri, Dec 20, 2019 at 10:33:35AM +0100, Peter Zijlstra wrote:
> On Fri, Dec 20, 2019 at 01:32:45PM +0900, Namhyung Kim wrote:
> > To support cgroup tracking, add CGROUP event to save a link between
> > cgroup path and inode number.  The attr.cgroup bit was also added to
> > enable cgroup tracking from userspace.
> > 
> > This event will be generated when a new cgroup becomes active.
> > Userspace might need to synthesize those events for existing cgroups.
> > 
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Li Zefan <lizefan@huawei.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> 
> TJ, is this the right thing to do? ISTR you had concerns on this topic
> on the past.

Yeah, cgroup->id is now the same as ino (on 64bit ino matchines) and
fhandle and uniquely identifies a cgroup instance in that boot
instance.  That said, id -> path mapping can be done from userspace by
passing the cgroup id to open_by_handle_at(2) and then reading the
symlink in /proc/self/fd, so this event isn't necessary per-se if the
goal is mapping back ids to paths.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-12-20  9:33   ` Peter Zijlstra
  2019-12-20 11:28     ` Namhyung Kim
@ 2019-12-20 11:50     ` Namhyung Kim
  2019-12-20 15:16     ` Tejun Heo
  2 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2019-12-20 11:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter

On Fri, Dec 20, 2019 at 6:33 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > index 377d794d3105..7bae2d3380a6 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -377,7 +377,8 @@ struct perf_event_attr {
> >                               ksymbol        :  1, /* include ksymbol events */
> >                               bpf_event      :  1, /* include bpf events */
> >                               aux_output     :  1, /* generate AUX records instead of events */
> > -                             __reserved_1   : 32;
> > +                             cgroup         :  1, /* include cgroup events */
> > +                             __reserved_1   : 31;
> >
> >       union {
> >               __u32           wakeup_events;    /* wakeup every n events */
> > @@ -1006,6 +1007,17 @@ enum perf_event_type {
> >        */
> >       PERF_RECORD_BPF_EVENT                   = 18,
> >
> > +     /*
> > +      * struct {
> > +      *      struct perf_event_header        header;
> > +      *      u64                             id;
> > +      *      u64                             path_len;
>
> You can leave out path_len (also u64 for a length field is silly).

Right, will remove.

Thanks
Namhyung


>
> > +      *      char                            path[];
> > +      *      struct sample_id                sample_id;
> > +      * };
> > +      */
> > +     PERF_RECORD_CGROUP                      = 19,

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-12-20  9:33   ` Peter Zijlstra
@ 2019-12-20 11:28     ` Namhyung Kim
  2019-12-20 11:50     ` Namhyung Kim
  2019-12-20 15:16     ` Tejun Heo
  2 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2019-12-20 11:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter

Hi Peter,

On Fri, Dec 20, 2019 at 6:33 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Dec 20, 2019 at 01:32:45PM +0900, Namhyung Kim wrote:
> > To support cgroup tracking, add CGROUP event to save a link between
> > cgroup path and inode number.  The attr.cgroup bit was also added to
> > enable cgroup tracking from userspace.
> >
> > This event will be generated when a new cgroup becomes active.
> > Userspace might need to synthesize those events for existing cgroups.
> >
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Li Zefan <lizefan@huawei.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
>
> TJ, is this the right thing to do? ISTR you had concerns on this topic
> on the past.

Oh, I should've updated the changelog.  Now it uses 64bit cgroup id which
was converted by Tejun.

https://lkml.org/lkml/2019/11/4/1551

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-12-20  4:32 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
@ 2019-12-20  9:33   ` Peter Zijlstra
  2019-12-20 11:28     ` Namhyung Kim
                       ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Peter Zijlstra @ 2019-12-20  9:33 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter

On Fri, Dec 20, 2019 at 01:32:45PM +0900, Namhyung Kim wrote:
> To support cgroup tracking, add CGROUP event to save a link between
> cgroup path and inode number.  The attr.cgroup bit was also added to
> enable cgroup tracking from userspace.
> 
> This event will be generated when a new cgroup becomes active.
> Userspace might need to synthesize those events for existing cgroups.
> 
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Li Zefan <lizefan@huawei.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

TJ, is this the right thing to do? ISTR you had concerns on this topic
on the past.

> ---
>  include/uapi/linux/perf_event.h |  14 +++-
>  kernel/events/core.c            | 112 ++++++++++++++++++++++++++++++++
>  2 files changed, 125 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 377d794d3105..7bae2d3380a6 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -377,7 +377,8 @@ struct perf_event_attr {
>  				ksymbol        :  1, /* include ksymbol events */
>  				bpf_event      :  1, /* include bpf events */
>  				aux_output     :  1, /* generate AUX records instead of events */
> -				__reserved_1   : 32;
> +				cgroup         :  1, /* include cgroup events */
> +				__reserved_1   : 31;
>  
>  	union {
>  		__u32		wakeup_events;	  /* wakeup every n events */
> @@ -1006,6 +1007,17 @@ enum perf_event_type {
>  	 */
>  	PERF_RECORD_BPF_EVENT			= 18,
>  
> +	/*
> +	 * struct {
> +	 *	struct perf_event_header	header;
> +	 *	u64				id;
> +	 *	u64				path_len;

You can leave out path_len (also u64 for a length field is silly).

> +	 *	char				path[];
> +	 *	struct sample_id		sample_id;
> +	 * };
> +	 */
> +	PERF_RECORD_CGROUP			= 19,

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-12-20  4:32 [PATCHSET 0/9] perf: Improve cgroup profiling (v2) Namhyung Kim
@ 2019-12-20  4:32 ` Namhyung Kim
  2019-12-20  9:33   ` Peter Zijlstra
  0 siblings, 1 reply; 48+ messages in thread
From: Namhyung Kim @ 2019-12-20  4:32 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter

To support cgroup tracking, add CGROUP event to save a link between
cgroup path and inode number.  The attr.cgroup bit was also added to
enable cgroup tracking from userspace.

This event will be generated when a new cgroup becomes active.
Userspace might need to synthesize those events for existing cgroups.

Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/uapi/linux/perf_event.h |  14 +++-
 kernel/events/core.c            | 112 ++++++++++++++++++++++++++++++++
 2 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 377d794d3105..7bae2d3380a6 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -377,7 +377,8 @@ struct perf_event_attr {
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
 				aux_output     :  1, /* generate AUX records instead of events */
-				__reserved_1   : 32;
+				cgroup         :  1, /* include cgroup events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -1006,6 +1007,17 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				id;
+	 *	u64				path_len;
+	 *	char				path[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CGROUP			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4ff86d57f9e5..9bcb2b552acc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -386,6 +386,7 @@ static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
 static atomic_t nr_ksymbol_events __read_mostly;
 static atomic_t nr_bpf_events __read_mostly;
+static atomic_t nr_cgroup_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -4455,6 +4456,8 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_dec(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_dec(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
@@ -7564,6 +7567,106 @@ void perf_event_namespaces(struct task_struct *task)
 			NULL);
 }
 
+/*
+ * cgroup tracking
+ */
+#ifdef CONFIG_CGROUPS
+
+struct perf_cgroup_event {
+	char				*path;
+	struct {
+		struct perf_event_header	header;
+		u64				id;
+		u64				path_len;
+		char				path[];
+	} event_id;
+};
+
+static int perf_event_cgroup_match(struct perf_event *event)
+{
+	return event->attr.cgroup;
+}
+
+static void perf_event_cgroup_output(struct perf_event *event, void *data)
+{
+	struct perf_cgroup_event *cgroup_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	u16 header_size = cgroup_event->event_id.header.size;
+	int ret;
+
+	if (!perf_event_cgroup_match(event))
+		return;
+
+	perf_event_header__init_id(&cgroup_event->event_id.header,
+				   &sample, event);
+	ret = perf_output_begin(&handle, event,
+				cgroup_event->event_id.header.size);
+	if (ret)
+		goto out;
+
+	perf_output_put(&handle, cgroup_event->event_id);
+	__output_copy(&handle, cgroup_event->path,
+		      cgroup_event->event_id.path_len);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+out:
+	cgroup_event->event_id.header.size = header_size;
+}
+
+void perf_event_cgroup(struct cgroup *cgrp)
+{
+	struct perf_cgroup_event cgroup_event;
+	char path_enomem[16] = "//enomem";
+	char *pathname;
+	size_t size;
+
+	if (!atomic_read(&nr_cgroup_events))
+		return;
+
+	cgroup_event = (struct perf_cgroup_event){
+		.event_id  = {
+			.header = {
+				.type = PERF_RECORD_CGROUP,
+				.misc = 0,
+				.size = sizeof(cgroup_event.event_id),
+			},
+			.id = cgroup_id(cgrp),
+		},
+	};
+
+	pathname = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (pathname == NULL) {
+		cgroup_event.path = path_enomem;
+	} else {
+		/* just to be sure to have enough space for alignment */
+		cgroup_path(cgrp, pathname, PATH_MAX - sizeof(u64));
+		cgroup_event.path = pathname;
+	}
+
+	/*
+	 * Since our buffer works in 8 byte units we need to align our string
+	 * size to a multiple of 8. However, we must guarantee the tail end is
+	 * zero'd out to avoid leaking random bits to userspace.
+	 */
+	size = strlen(cgroup_event.path) + 1;
+	while (!IS_ALIGNED(size, sizeof(u64)))
+		cgroup_event.path[size++] = '\0';
+
+	cgroup_event.event_id.header.size += size;
+	cgroup_event.event_id.path_len = size;
+
+	perf_iterate_sb(perf_event_cgroup_output,
+			&cgroup_event,
+			NULL);
+
+	kfree(pathname);
+}
+
+#endif
+
 /*
  * mmap tracking
  */
@@ -10607,6 +10710,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_inc(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_inc(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
 	if (event->attr.freq)
@@ -12581,6 +12686,12 @@ static void perf_cgroup_css_free(struct cgroup_subsys_state *css)
 	kfree(jc);
 }
 
+static int perf_cgroup_css_online(struct cgroup_subsys_state *css)
+{
+	perf_event_cgroup(css->cgroup);
+	return 0;
+}
+
 static int __perf_cgroup_move(void *info)
 {
 	struct task_struct *task = info;
@@ -12602,6 +12713,7 @@ static void perf_cgroup_attach(struct cgroup_taskset *tset)
 struct cgroup_subsys perf_event_cgrp_subsys = {
 	.css_alloc	= perf_cgroup_css_alloc,
 	.css_free	= perf_cgroup_css_free,
+	.css_online	= perf_cgroup_css_online,
 	.attach		= perf_cgroup_attach,
 	/*
 	 * Implicitly enable on dfl hierarchy so that perf events can
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-08-30 22:49         ` Namhyung Kim
@ 2019-08-30 23:52           ` Stephane Eranian
  0 siblings, 0 replies; 48+ messages in thread
From: Stephane Eranian @ 2019-08-30 23:52 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML,
	Jiri Olsa, Alexander Shishkin, Tejun Heo, Li Zefan,
	Johannes Weiner, Adrian Hunter

On Fri, Aug 30, 2019 at 3:49 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Aug 30, 2019 at 4:35 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Aug 30, 2019 at 12:46:51PM +0900, Namhyung Kim wrote:
> > > Hi Peter,
> > >
> > > On Wed, Aug 28, 2019 at 6:45 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Wed, Aug 28, 2019 at 04:31:22PM +0900, Namhyung Kim wrote:
> > > > > To support cgroup tracking, add CGROUP event to save a link between
> > > > > cgroup path and inode number.  The attr.cgroup bit was also added to
> > > > > enable cgroup tracking from userspace.
> > > > >
> > > > > This event will be generated when a new cgroup becomes active.
> > > > > Userspace might need to synthesize those events for existing cgroups.
> > > > >
> > > > > As aux_output change is also going on, I just added the bit here as
> > > > > well to remove possible conflicts later.
> > > >
> > > > Why do we want this?
> > >
> > > I saw below [1] and thought you have the patch introduced aux_output
> > > and it's gonna to be merged soon.
> > > Also the tooling patches are already in the acme/perf/core
> > > so I just wanted to avoid conflicts.
> > >
> > > Anyway, I'm ok with changing it.  Will remove in v2.
> >
> > I seem to have confused both you and Arnaldo with this. This email
> > questions the Changelog as a whole, not just the aux thing (I send a
> > separate email for that).
> >
> > This Changelog utterly fails to explain to me _why_ we need/want cgroup
> > tracking. So why do I want to review and possibly merge this? Changelog
> > needs to answer this.
>
> OK.  How about this?
>
> Systems running a large number of jobs in different cgroups want to
> profiling such jobs precisely.  This includes container hosting systems
> widely used today.  Currently perf supports namespace tracking but
> the systems may not use (cgroup) namespace for their jobs.  Also
> it'd be more intuitive to see cgroup names (as they're given by user
> or sysadmin) rather than numeric cgroup/namespace id even if they
> use the namespaces.
>

In data centers you care about attributing samples to a job not such
much to a process.
A job may have multiple processes which may come and go. The cgroup on
the other hand
stays around for the entire lifetime of the job. It is much easier to
map a cgroup name to a particular
job than it is to map a pid back to a job name, especially for offline
post-processing.

Hope this clarifies why we would like this feature upstream.


>
> Thanks,
> Namhyung

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-08-30  7:34       ` Peter Zijlstra
@ 2019-08-30 22:49         ` Namhyung Kim
  2019-08-30 23:52           ` Stephane Eranian
  0 siblings, 1 reply; 48+ messages in thread
From: Namhyung Kim @ 2019-08-30 22:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, Jiri Olsa,
	Alexander Shishkin, Stephane Eranian, Tejun Heo, Li Zefan,
	Johannes Weiner, Adrian Hunter

On Fri, Aug 30, 2019 at 4:35 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Aug 30, 2019 at 12:46:51PM +0900, Namhyung Kim wrote:
> > Hi Peter,
> >
> > On Wed, Aug 28, 2019 at 6:45 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Aug 28, 2019 at 04:31:22PM +0900, Namhyung Kim wrote:
> > > > To support cgroup tracking, add CGROUP event to save a link between
> > > > cgroup path and inode number.  The attr.cgroup bit was also added to
> > > > enable cgroup tracking from userspace.
> > > >
> > > > This event will be generated when a new cgroup becomes active.
> > > > Userspace might need to synthesize those events for existing cgroups.
> > > >
> > > > As aux_output change is also going on, I just added the bit here as
> > > > well to remove possible conflicts later.
> > >
> > > Why do we want this?
> >
> > I saw below [1] and thought you have the patch introduced aux_output
> > and it's gonna to be merged soon.
> > Also the tooling patches are already in the acme/perf/core
> > so I just wanted to avoid conflicts.
> >
> > Anyway, I'm ok with changing it.  Will remove in v2.
>
> I seem to have confused both you and Arnaldo with this. This email
> questions the Changelog as a whole, not just the aux thing (I send a
> separate email for that).
>
> This Changelog utterly fails to explain to me _why_ we need/want cgroup
> tracking. So why do I want to review and possibly merge this? Changelog
> needs to answer this.

OK.  How about this?

Systems running a large number of jobs in different cgroups want to
profiling such jobs precisely.  This includes container hosting systems
widely used today.  Currently perf supports namespace tracking but
the systems may not use (cgroup) namespace for their jobs.  Also
it'd be more intuitive to see cgroup names (as they're given by user
or sysadmin) rather than numeric cgroup/namespace id even if they
use the namespaces.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-08-30  3:46     ` Namhyung Kim
@ 2019-08-30  7:34       ` Peter Zijlstra
  2019-08-30 22:49         ` Namhyung Kim
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2019-08-30  7:34 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, Jiri Olsa,
	Alexander Shishkin, Stephane Eranian, Tejun Heo, Li Zefan,
	Johannes Weiner, Adrian Hunter

On Fri, Aug 30, 2019 at 12:46:51PM +0900, Namhyung Kim wrote:
> Hi Peter,
> 
> On Wed, Aug 28, 2019 at 6:45 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Aug 28, 2019 at 04:31:22PM +0900, Namhyung Kim wrote:
> > > To support cgroup tracking, add CGROUP event to save a link between
> > > cgroup path and inode number.  The attr.cgroup bit was also added to
> > > enable cgroup tracking from userspace.
> > >
> > > This event will be generated when a new cgroup becomes active.
> > > Userspace might need to synthesize those events for existing cgroups.
> > >
> > > As aux_output change is also going on, I just added the bit here as
> > > well to remove possible conflicts later.
> >
> > Why do we want this?
> 
> I saw below [1] and thought you have the patch introduced aux_output
> and it's gonna to be merged soon.
> Also the tooling patches are already in the acme/perf/core
> so I just wanted to avoid conflicts.
> 
> Anyway, I'm ok with changing it.  Will remove in v2.

I seem to have confused both you and Arnaldo with this. This email
questions the Changelog as a whole, not just the aux thing (I send a
separate email for that).

This Changelog utterly fails to explain to me _why_ we need/want cgroup
tracking. So why do I want to review and possibly merge this? Changelog
needs to answer this.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-08-28 14:48   ` Tejun Heo
@ 2019-08-30  3:56     ` Namhyung Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Namhyung Kim @ 2019-08-30  3:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, LKML,
	Jiri Olsa, Alexander Shishkin, Stephane Eranian, Li Zefan,
	Johannes Weiner, Adrian Hunter

Hi Tejun,

On Wed, Aug 28, 2019 at 11:48 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello, Namhyung.
>
> On Wed, Aug 28, 2019 at 04:31:22PM +0900, Namhyung Kim wrote:
> > +      * struct {
> > +      *      struct perf_event_header        header;
> > +      *      u64                             ino;
> > +      *      u64                             path_len;
> > +      *      char                            path[];
>
> ino and path aren't great identifers for cgroups because both can get
> recycled pretty quickly.  Can you please take a look at
> KERNFS_ROOT_SUPPORT_EXPORTOP?  That's the fhandle that cgroup uses,
> currently the standard ino+gen which isn't ideal but good enough.
> Another benefit is that the path can also be resolved efficiently from
> userspace given the handle.

Thanks for the info, I'll take a look at it.
Namhyung

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-08-28  9:44   ` Peter Zijlstra
  2019-08-28 13:13     ` Arnaldo Carvalho de Melo
@ 2019-08-30  3:46     ` Namhyung Kim
  2019-08-30  7:34       ` Peter Zijlstra
  1 sibling, 1 reply; 48+ messages in thread
From: Namhyung Kim @ 2019-08-30  3:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, Jiri Olsa,
	Alexander Shishkin, Stephane Eranian, Tejun Heo, Li Zefan,
	Johannes Weiner, Adrian Hunter

Hi Peter,

On Wed, Aug 28, 2019 at 6:45 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Aug 28, 2019 at 04:31:22PM +0900, Namhyung Kim wrote:
> > To support cgroup tracking, add CGROUP event to save a link between
> > cgroup path and inode number.  The attr.cgroup bit was also added to
> > enable cgroup tracking from userspace.
> >
> > This event will be generated when a new cgroup becomes active.
> > Userspace might need to synthesize those events for existing cgroups.
> >
> > As aux_output change is also going on, I just added the bit here as
> > well to remove possible conflicts later.
>
> Why do we want this?

I saw below [1] and thought you have the patch introduced aux_output
and it's gonna to be merged soon.
Also the tooling patches are already in the acme/perf/core
so I just wanted to avoid conflicts.

Anyway, I'm ok with changing it.  Will remove in v2.

Thanks
Namhyung

[1] https://lkml.org/lkml/2019/8/6/586

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-08-28  7:31 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
  2019-08-28  9:44   ` Peter Zijlstra
  2019-08-28  9:44   ` Peter Zijlstra
@ 2019-08-28 14:48   ` Tejun Heo
  2019-08-30  3:56     ` Namhyung Kim
  2 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2019-08-28 14:48 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, LKML,
	Jiri Olsa, Alexander Shishkin, Stephane Eranian, Li Zefan,
	Johannes Weiner, Adrian Hunter

Hello, Namhyung.

On Wed, Aug 28, 2019 at 04:31:22PM +0900, Namhyung Kim wrote:
> +	 * struct {
> +	 *	struct perf_event_header	header;
> +	 *	u64				ino;
> +	 *	u64				path_len;
> +	 *	char				path[];

ino and path aren't great identifers for cgroups because both can get
recycled pretty quickly.  Can you please take a look at
KERNFS_ROOT_SUPPORT_EXPORTOP?  That's the fhandle that cgroup uses,
currently the standard ino+gen which isn't ideal but good enough.
Another benefit is that the path can also be resolved efficiently from
userspace given the handle.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-08-28  9:44   ` Peter Zijlstra
@ 2019-08-28 13:13     ` Arnaldo Carvalho de Melo
  2019-08-30  3:46     ` Namhyung Kim
  1 sibling, 0 replies; 48+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-08-28 13:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Namhyung Kim, Ingo Molnar, LKML, Jiri Olsa, Alexander Shishkin,
	Stephane Eranian, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter

Em Wed, Aug 28, 2019 at 11:44:59AM +0200, Peter Zijlstra escreveu:
> On Wed, Aug 28, 2019 at 04:31:22PM +0900, Namhyung Kim wrote:
> > To support cgroup tracking, add CGROUP event to save a link between
> > cgroup path and inode number.  The attr.cgroup bit was also added to
> > enable cgroup tracking from userspace.
> > 
> > This event will be generated when a new cgroup becomes active.
> > Userspace might need to synthesize those events for existing cgroups.
> > 
> > As aux_output change is also going on, I just added the bit here as
> > well to remove possible conflicts later.
> 
> Why do we want this?

Yeah, I'd resolve the conflict later, to avoid mixing things like this.

- Arnaldo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-08-28  7:31 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
  2019-08-28  9:44   ` Peter Zijlstra
@ 2019-08-28  9:44   ` Peter Zijlstra
  2019-08-28 13:13     ` Arnaldo Carvalho de Melo
  2019-08-30  3:46     ` Namhyung Kim
  2019-08-28 14:48   ` Tejun Heo
  2 siblings, 2 replies; 48+ messages in thread
From: Peter Zijlstra @ 2019-08-28  9:44 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, Jiri Olsa,
	Alexander Shishkin, Stephane Eranian, Tejun Heo, Li Zefan,
	Johannes Weiner, Adrian Hunter

On Wed, Aug 28, 2019 at 04:31:22PM +0900, Namhyung Kim wrote:
> To support cgroup tracking, add CGROUP event to save a link between
> cgroup path and inode number.  The attr.cgroup bit was also added to
> enable cgroup tracking from userspace.
> 
> This event will be generated when a new cgroup becomes active.
> Userspace might need to synthesize those events for existing cgroups.
> 
> As aux_output change is also going on, I just added the bit here as
> well to remove possible conflicts later.

Why do we want this?


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-08-28  7:31 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
@ 2019-08-28  9:44   ` Peter Zijlstra
  2019-08-28  9:44   ` Peter Zijlstra
  2019-08-28 14:48   ` Tejun Heo
  2 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2019-08-28  9:44 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, LKML, Jiri Olsa,
	Alexander Shishkin, Stephane Eranian, Tejun Heo, Li Zefan,
	Johannes Weiner, Adrian Hunter

On Wed, Aug 28, 2019 at 04:31:22PM +0900, Namhyung Kim wrote:
> To support cgroup tracking, add CGROUP event to save a link between
> cgroup path and inode number.  The attr.cgroup bit was also added to
> enable cgroup tracking from userspace.
> 
> This event will be generated when a new cgroup becomes active.
> Userspace might need to synthesize those events for existing cgroups.
> 
> As aux_output change is also going on, I just added the bit here as
> well to remove possible conflicts later.
> 
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Li Zefan <lizefan@huawei.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  include/uapi/linux/perf_event.h |  15 ++++-
>  kernel/events/core.c            | 112 ++++++++++++++++++++++++++++++++
>  2 files changed, 126 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 7198ddd0c6b1..cb07c24b715f 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -374,7 +374,9 @@ struct perf_event_attr {
>  				namespaces     :  1, /* include namespaces data */
>  				ksymbol        :  1, /* include ksymbol events */
>  				bpf_event      :  1, /* include bpf events */
> -				__reserved_1   : 33;
> +				aux_output     :  1, /* generate AUX records instead of events */
> +				cgroup         :  1, /* include cgroup events */
> +				__reserved_1   : 31;

That looks like a rebase fail, aux_output is not from these patches.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2019-08-28  7:31 [PATCHSET 0/9] perf: Improve cgroup profiling (v1) Namhyung Kim
@ 2019-08-28  7:31 ` Namhyung Kim
  2019-08-28  9:44   ` Peter Zijlstra
                     ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Namhyung Kim @ 2019-08-28  7:31 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: LKML, Jiri Olsa, Alexander Shishkin, Stephane Eranian, Tejun Heo,
	Li Zefan, Johannes Weiner, Adrian Hunter

To support cgroup tracking, add CGROUP event to save a link between
cgroup path and inode number.  The attr.cgroup bit was also added to
enable cgroup tracking from userspace.

This event will be generated when a new cgroup becomes active.
Userspace might need to synthesize those events for existing cgroups.

As aux_output change is also going on, I just added the bit here as
well to remove possible conflicts later.

Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/uapi/linux/perf_event.h |  15 ++++-
 kernel/events/core.c            | 112 ++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 7198ddd0c6b1..cb07c24b715f 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -374,7 +374,9 @@ struct perf_event_attr {
 				namespaces     :  1, /* include namespaces data */
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
-				__reserved_1   : 33;
+				aux_output     :  1, /* generate AUX records instead of events */
+				cgroup         :  1, /* include cgroup events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -999,6 +1001,17 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				ino;
+	 *	u64				path_len;
+	 *	char				path[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CGROUP			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0463c1151bae..d898263db4fc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -386,6 +386,7 @@ static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
 static atomic_t nr_ksymbol_events __read_mostly;
 static atomic_t nr_bpf_events __read_mostly;
+static atomic_t nr_cgroup_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -4314,6 +4315,8 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_dec(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_dec(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
@@ -7217,6 +7220,106 @@ void perf_event_namespaces(struct task_struct *task)
 			NULL);
 }
 
+/*
+ * cgroup tracking
+ */
+#ifdef CONFIG_CGROUPS
+
+struct perf_cgroup_event {
+	char				*path;
+	struct {
+		struct perf_event_header	header;
+		u64				ino;
+		u64				path_len;
+		char				path[];
+	} event_id;
+};
+
+static int perf_event_cgroup_match(struct perf_event *event)
+{
+	return event->attr.cgroup;
+}
+
+static void perf_event_cgroup_output(struct perf_event *event, void *data)
+{
+	struct perf_cgroup_event *cgroup_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	u16 header_size = cgroup_event->event_id.header.size;
+	int ret;
+
+	if (!perf_event_cgroup_match(event))
+		return;
+
+	perf_event_header__init_id(&cgroup_event->event_id.header,
+				   &sample, event);
+	ret = perf_output_begin(&handle, event,
+				cgroup_event->event_id.header.size);
+	if (ret)
+		goto out;
+
+	perf_output_put(&handle, cgroup_event->event_id);
+	__output_copy(&handle, cgroup_event->path,
+		      cgroup_event->event_id.path_len);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+out:
+	cgroup_event->event_id.header.size = header_size;
+}
+
+void perf_event_cgroup(struct cgroup *cgrp)
+{
+	struct perf_cgroup_event cgroup_event;
+	char path_enomem[16] = "//enomem";
+	char *pathname;
+	size_t size;
+
+	if (!atomic_read(&nr_cgroup_events))
+		return;
+
+	cgroup_event = (struct perf_cgroup_event){
+		.event_id  = {
+			.header = {
+				.type = PERF_RECORD_CGROUP,
+				.misc = 0,
+				.size = sizeof(cgroup_event.event_id),
+			},
+			.ino = cgrp->kn->id.ino,
+		},
+	};
+
+	pathname = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (pathname == NULL) {
+		cgroup_event.path = path_enomem;
+	} else {
+		/* just to be sure to have enough space for alignment */
+		cgroup_path(cgrp, pathname, PATH_MAX - sizeof(u64));
+		cgroup_event.path = pathname;
+	}
+
+	/*
+	 * Since our buffer works in 8 byte units we need to align our string
+	 * size to a multiple of 8. However, we must guarantee the tail end is
+	 * zero'd out to avoid leaking random bits to userspace.
+	 */
+	size = strlen(cgroup_event.path) + 1;
+	while (!IS_ALIGNED(size, sizeof(u64)))
+		cgroup_event.path[size++] = '\0';
+
+	cgroup_event.event_id.header.size += size;
+	cgroup_event.event_id.path_len = size;
+
+	perf_iterate_sb(perf_event_cgroup_output,
+			&cgroup_event,
+			NULL);
+
+	kfree(pathname);
+}
+
+#endif
+
 /*
  * mmap tracking
  */
@@ -10232,6 +10335,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_inc(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_inc(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
 	if (event->attr.freq)
@@ -12186,6 +12291,12 @@ static void perf_cgroup_css_free(struct cgroup_subsys_state *css)
 	kfree(jc);
 }
 
+static int perf_cgroup_css_online(struct cgroup_subsys_state *css)
+{
+	perf_event_cgroup(css->cgroup);
+	return 0;
+}
+
 static int __perf_cgroup_move(void *info)
 {
 	struct task_struct *task = info;
@@ -12207,6 +12318,7 @@ static void perf_cgroup_attach(struct cgroup_taskset *tset)
 struct cgroup_subsys perf_event_cgrp_subsys = {
 	.css_alloc	= perf_cgroup_css_alloc,
 	.css_free	= perf_cgroup_css_free,
+	.css_online	= perf_cgroup_css_online,
 	.attach		= perf_cgroup_attach,
 	/*
 	 * Implicitly enable on dfl hierarchy so that perf events can
-- 
2.23.0.187.g17f5b7556c-goog


^ permalink raw reply related	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2020-03-25 12:45 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
2020-01-07 13:34 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
2020-01-07 13:34 ` [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature Namhyung Kim
2020-01-07 13:34 ` [PATCH 3/9] perf tools: Basic support for CGROUP event Namhyung Kim
2020-01-07 13:34 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
2020-01-08 21:52   ` Jiri Olsa
2020-01-09  0:51     ` Namhyung Kim
2020-01-20 13:57       ` Namhyung Kim
2020-01-21  9:42         ` Jiri Olsa
2020-02-17 15:46           ` Arnaldo Carvalho de Melo
2020-02-17 19:22             ` Namhyung Kim
2020-01-08 22:01   ` Jiri Olsa
2020-01-09  0:55     ` Namhyung Kim
2020-01-07 13:34 ` [PATCH 5/9] perf report: Add 'cgroup' sort key Namhyung Kim
2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
2020-01-08 22:08   ` Jiri Olsa
2020-01-09  5:19     ` Namhyung Kim
2020-01-08 22:18   ` Jiri Olsa
2020-01-09  7:50     ` Namhyung Kim
2020-01-08 22:21   ` Jiri Olsa
2020-01-09  7:51     ` Namhyung Kim
2020-01-07 13:34 ` [PATCH 7/9] perf record: Add --all-cgroups option Namhyung Kim
2020-01-07 13:35 ` [PATCH 8/9] perf top: " Namhyung Kim
2020-01-08 22:24   ` Jiri Olsa
2020-01-09  7:55     ` Namhyung Kim
2020-01-09  8:18       ` Jiri Olsa
2020-01-07 13:35 ` [PATCH 9/9] perf script: Add --show-cgroup-events option Namhyung Kim
  -- strict thread matches above, loose matches on Subject: below --
2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
2020-03-25 12:45 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
2019-12-23  6:07 [PATCHSET 0/9] perf: Improve cgroup profiling (v3) Namhyung Kim
2019-12-23  6:07 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
2019-12-27  4:07   ` kbuild test robot
2020-01-07 12:51   ` Peter Zijlstra
2020-01-07 13:46     ` Namhyung Kim
2019-12-20  4:32 [PATCHSET 0/9] perf: Improve cgroup profiling (v2) Namhyung Kim
2019-12-20  4:32 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
2019-12-20  9:33   ` Peter Zijlstra
2019-12-20 11:28     ` Namhyung Kim
2019-12-20 11:50     ` Namhyung Kim
2019-12-20 15:16     ` Tejun Heo
2019-12-21  8:49       ` Namhyung Kim
2019-08-28  7:31 [PATCHSET 0/9] perf: Improve cgroup profiling (v1) Namhyung Kim
2019-08-28  7:31 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
2019-08-28  9:44   ` Peter Zijlstra
2019-08-28  9:44   ` Peter Zijlstra
2019-08-28 13:13     ` Arnaldo Carvalho de Melo
2019-08-30  3:46     ` Namhyung Kim
2019-08-30  7:34       ` Peter Zijlstra
2019-08-30 22:49         ` Namhyung Kim
2019-08-30 23:52           ` Stephane Eranian
2019-08-28 14:48   ` Tejun Heo
2019-08-30  3:56     ` Namhyung Kim

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.