All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET 0/9] perf: Improve cgroup profiling (v4)
@ 2020-01-07 13:34 Namhyung Kim
  2020-01-07 13:34 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
                   ` (8 more replies)
  0 siblings, 9 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter

Hello,

This work is to improve cgroup profiling in perf.  Currently it only
supports profiling tasks in a specific cgroup and there's no way to
identify which cgroup the current sample belongs to.  So I added
PERF_SAMPLE_CGROUP to add cgroup id into each sample.  It's a 64-bit
integer having file handle of the cgroup.  And kernel also generates
PERF_RECORD_CGROUP event for new groups to correlate the cgroup id and
cgroup name (path in the cgroup filesystem).  The cgroup id can be
read from userspace by name_to_handle_at() system call so it can
synthesize the CGROUP event for existing groups.

So why do we want this?  Systems running a large number of jobs in
different cgroups want to profiling such jobs precisely. This includes
container hosting systems widely used today. Currently perf supports
namespace tracking but the systems may not use (cgroup) namespace for
their jobs. Also it'd be more intuitive to see cgroup names (as
they're given by user or sysadmin) rather than numeric
cgroup/namespace id even if they use the namespaces.

From Stephane Eranian:
> In data centers you care about attributing samples to a job not such
> much to a process.  A job may have multiple processes which may come
> and go. The cgroup on the other hand stays around for the entire
> lifetime of the job. It is much easier to map a cgroup name to a
> particular job than it is to map a pid back to a job name,
> especially for offline post-processing.

Note that this only works for "perf_event" cgroups (obviously) so if
users are still using cgroup-v1 interface, they need to have same
hierarchy for subsystem(s) want to profile with it.

 * Changes from v3:
  - staticize perf_event_cgroup()
  - fix sample parsing test

 * Changes from v2:
  - remove path_len from cgroup_event
  - bail out if kernel doesn't support cgroup sampling
  - add some description in the Kconfig

 * Changes from v1:
  - use new cgroup id (= file handle)

The code is available at perf/cgroup-v4 branch in

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


The testing result looks something like this:

  [root@qemu build]# ./perf record --all-cgroups ./cgtest
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.009 MB perf.data (150 samples) ]
  
  [root@qemu build]# ./perf report -s cgroup,comm --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 150  of event 'cpu-clock:pppH'
  # Event count (approx.): 37500000
  #
  # Overhead  cgroup      Command
  # ........  ..........  .......
  #
      32.00%  /sub/cgrp2  looper2
      28.00%  /sub/cgrp1  looper1
      25.33%  /sub        looper0
       4.00%  /           cgtest 
       4.00%  /sub        cgtest 
       3.33%  /sub/cgrp2  cgtest 
       2.67%  /sub/cgrp1  cgtest 
       0.67%  /           looper0


The test program (cgtest) follows.

Thanks,
Namhyung


Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>


-------8<-----------------------------------------8<----------------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <sys/prctl.h>
#include <sys/mount.h>

char cgbase[] = "/sys/fs/cgroup/perf_event";

void mkcgrp(char *name) {
  char buf[256];

  snprintf(buf, sizeof(buf), "%s%s", cgbase, name);
  if (mkdir(buf, 0755) < 0)
    perror("mkdir");
}

void rmcgrp(char *name) {
  char buf[256];

  snprintf(buf, sizeof(buf), "%s%s", cgbase, name);
  if (rmdir(buf) < 0)
    perror("rmdir");
}

void setcgrp(char *name) {
  char buf[256];
  int fd;

  snprintf(buf, sizeof(buf), "%s%s/tasks", cgbase, name);

  fd = open(buf, O_WRONLY);
  if (fd > 0) {
    if (write(fd, "0\n", 2) != 2)
      perror("write");
    close(fd);
  }
}

void create_sub_cgroup(int idx) {
  char buf[128];

  snprintf(buf, sizeof(buf), "/sub/cgrp%d", idx);
  mkcgrp(buf);
}

void remove_sub_cgroup(int idx) {
  char buf[128];

  snprintf(buf, sizeof(buf), "/sub/cgrp%d", idx);
  rmcgrp(buf);
}

void set_sub_cgroup(int idx) {
  char buf[128];

  snprintf(buf, sizeof(buf), "/sub/cgrp%d", idx);
  setcgrp(buf);
}

void set_task_name(int idx) {
  char buf[16];

  snprintf(buf, sizeof(buf), "looper%d", idx);
  prctl(PR_SET_NAME, buf, 0, 0, 0);
}

void loop(unsigned max) {
  volatile unsigned int count = 1;

  while (count++ != max) {
    asm volatile ("pause");
  }
}

void worker(int idx, unsigned cnt, int new_ns) {
  int oldns;

  create_sub_cgroup(idx);
  set_sub_cgroup(idx);

  if (new_ns) {
    if (unshare(CLONE_NEWCGROUP) < 0)
      perror("unshare");

#if 0  /* FIXME */
    if (unshare(CLONE_NEWNS) < 0)
      perror("unshare");

    if (mount("none", "/sys", NULL, MS_REMOUNT | MS_REC | MS_SLAVE, NULL) < 0)
      perror("mount --make-rslave");

    sleep(1);
    if (umount("/sys/fs/cgroup/perf_event") < 0)
      perror("umount");

    if (mount("cgroup", "/sys/fs/cgroup/perf_event", "cgroup",
              MS_NODEV | MS_NOEXEC | MS_NOSUID, "perf_event") < 0)
      perror("mount again");
#endif
  }

  if (fork() == 0) {
    set_task_name(idx);
    loop(cnt);
    exit(0);
  }
  wait(NULL);
}

int main(int argc, char *argv[])
{
  int i, nr = 2;
  int new_ns = 1;
  unsigned cnt = 1000000;
  int fd;

  if (argc > 1)
    nr = atoi(argv[1]);
  if (argc > 2)
    cnt = atoi(argv[2]);
  if (argc > 3)
    new_ns = atoi(argv[3]);

  mkcgrp("/sub");
  setcgrp("/sub");

  for (i = 0; i < nr; i++) {
    if (fork() == 0) {
      worker(i+1, cnt, new_ns);
      exit(0);
    }
  }

  set_task_name(0);
  loop(cnt);

  for (i = 0; i < nr; i++)
    wait(NULL);

  for (i = 0; i < nr; i++)
    remove_sub_cgroup(i+1);

  setcgrp("/");
  rmcgrp("/sub");

  return 0;
}
-------8<-----------------------------------------8<----------------


Namhyung Kim (9):
  perf/core: Add PERF_RECORD_CGROUP event
  perf/core: Add PERF_SAMPLE_CGROUP feature
  perf tools: Basic support for CGROUP event
  perf tools: Maintain cgroup hierarchy
  perf report: Add 'cgroup' sort key
  perf record: Support synthesizing cgroup events
  perf record: Add --all-cgroups option
  perf top: Add --all-cgroups option
  perf script: Add --show-cgroup-events option

 include/linux/perf_event.h                |   1 +
 include/uapi/linux/perf_event.h           |  16 ++-
 init/Kconfig                              |   3 +-
 kernel/events/core.c                      | 133 ++++++++++++++++++++++
 tools/include/uapi/linux/perf_event.h     |  16 ++-
 tools/lib/perf/include/perf/event.h       |   7 ++
 tools/perf/Documentation/perf-record.txt  |   5 +-
 tools/perf/Documentation/perf-report.txt  |   1 +
 tools/perf/Documentation/perf-script.txt  |   3 +
 tools/perf/Documentation/perf-top.txt     |   4 +
 tools/perf/builtin-diff.c                 |   1 +
 tools/perf/builtin-record.c               |  10 ++
 tools/perf/builtin-report.c               |   1 +
 tools/perf/builtin-script.c               |  41 +++++++
 tools/perf/builtin-top.c                  |   9 ++
 tools/perf/tests/sample-parsing.c         |   6 +-
 tools/perf/util/cgroup.c                  |  75 +++++++++++-
 tools/perf/util/cgroup.h                  |  16 ++-
 tools/perf/util/event.c                   |  19 ++++
 tools/perf/util/event.h                   |   6 +
 tools/perf/util/evsel.c                   |  17 ++-
 tools/perf/util/evsel.h                   |   1 +
 tools/perf/util/hist.c                    |   7 ++
 tools/perf/util/hist.h                    |   1 +
 tools/perf/util/machine.c                 |  19 ++++
 tools/perf/util/machine.h                 |   3 +
 tools/perf/util/perf_event_attr_fprintf.c |   2 +
 tools/perf/util/record.h                  |   1 +
 tools/perf/util/session.c                 |   8 ++
 tools/perf/util/sort.c                    |  31 +++++
 tools/perf/util/sort.h                    |   2 +
 tools/perf/util/synthetic-events.c        | 127 +++++++++++++++++++++
 tools/perf/util/synthetic-events.h        |   1 +
 tools/perf/util/tool.h                    |   2 +
 34 files changed, 581 insertions(+), 14 deletions(-)


base-commit: 6c4798d3f08b81c2c52936b10e0fa872590c96ae
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-07 13:34 ` [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature Namhyung Kim
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter, kbuild test robot

To support cgroup tracking, add CGROUP event to save a link between
cgroup path and id number.  This is needed since cgroups can go away
when userspace tries to read the cgroup info (from the id) later.
The attr.cgroup bit was also added to enable cgroup tracking from
userspace.

This event will be generated when a new cgroup becomes active.
Userspace might need to synthesize those events for existing cgroups.

Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
[staticize perf_event_cgroup function]
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/uapi/linux/perf_event.h |  13 +++-
 kernel/events/core.c            | 111 ++++++++++++++++++++++++++++++++
 2 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 377d794d3105..de2ab87ca92c 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -377,7 +377,8 @@ struct perf_event_attr {
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
 				aux_output     :  1, /* generate AUX records instead of events */
-				__reserved_1   : 32;
+				cgroup         :  1, /* include cgroup events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -1006,6 +1007,16 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				id;
+	 *	char				path[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CGROUP			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a1f8bde19b56..dbc46b884505 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -386,6 +386,7 @@ static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
 static atomic_t nr_ksymbol_events __read_mostly;
 static atomic_t nr_bpf_events __read_mostly;
+static atomic_t nr_cgroup_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -4455,6 +4456,8 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_dec(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_dec(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
@@ -7564,6 +7567,105 @@ void perf_event_namespaces(struct task_struct *task)
 			NULL);
 }
 
+/*
+ * cgroup tracking
+ */
+#ifdef CONFIG_CGROUPS
+
+struct perf_cgroup_event {
+	char				*path;
+	int				path_size;
+	struct {
+		struct perf_event_header	header;
+		u64				id;
+		char				path[];
+	} event_id;
+};
+
+static int perf_event_cgroup_match(struct perf_event *event)
+{
+	return event->attr.cgroup;
+}
+
+static void perf_event_cgroup_output(struct perf_event *event, void *data)
+{
+	struct perf_cgroup_event *cgroup_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	u16 header_size = cgroup_event->event_id.header.size;
+	int ret;
+
+	if (!perf_event_cgroup_match(event))
+		return;
+
+	perf_event_header__init_id(&cgroup_event->event_id.header,
+				   &sample, event);
+	ret = perf_output_begin(&handle, event,
+				cgroup_event->event_id.header.size);
+	if (ret)
+		goto out;
+
+	perf_output_put(&handle, cgroup_event->event_id);
+	__output_copy(&handle, cgroup_event->path, cgroup_event->path_size);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+out:
+	cgroup_event->event_id.header.size = header_size;
+}
+
+static void perf_event_cgroup(struct cgroup *cgrp)
+{
+	struct perf_cgroup_event cgroup_event;
+	char path_enomem[16] = "//enomem";
+	char *pathname;
+	size_t size;
+
+	if (!atomic_read(&nr_cgroup_events))
+		return;
+
+	cgroup_event = (struct perf_cgroup_event){
+		.event_id  = {
+			.header = {
+				.type = PERF_RECORD_CGROUP,
+				.misc = 0,
+				.size = sizeof(cgroup_event.event_id),
+			},
+			.id = cgroup_id(cgrp),
+		},
+	};
+
+	pathname = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (pathname == NULL) {
+		cgroup_event.path = path_enomem;
+	} else {
+		/* just to be sure to have enough space for alignment */
+		cgroup_path(cgrp, pathname, PATH_MAX - sizeof(u64));
+		cgroup_event.path = pathname;
+	}
+
+	/*
+	 * Since our buffer works in 8 byte units we need to align our string
+	 * size to a multiple of 8. However, we must guarantee the tail end is
+	 * zero'd out to avoid leaking random bits to userspace.
+	 */
+	size = strlen(cgroup_event.path) + 1;
+	while (!IS_ALIGNED(size, sizeof(u64)))
+		cgroup_event.path[size++] = '\0';
+
+	cgroup_event.event_id.header.size += size;
+	cgroup_event.path_size = size;
+
+	perf_iterate_sb(perf_event_cgroup_output,
+			&cgroup_event,
+			NULL);
+
+	kfree(pathname);
+}
+
+#endif
+
 /*
  * mmap tracking
  */
@@ -10607,6 +10709,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_inc(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_inc(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
 	if (event->attr.freq)
@@ -12581,6 +12685,12 @@ static void perf_cgroup_css_free(struct cgroup_subsys_state *css)
 	kfree(jc);
 }
 
+static int perf_cgroup_css_online(struct cgroup_subsys_state *css)
+{
+	perf_event_cgroup(css->cgroup);
+	return 0;
+}
+
 static int __perf_cgroup_move(void *info)
 {
 	struct task_struct *task = info;
@@ -12602,6 +12712,7 @@ static void perf_cgroup_attach(struct cgroup_taskset *tset)
 struct cgroup_subsys perf_event_cgrp_subsys = {
 	.css_alloc	= perf_cgroup_css_alloc,
 	.css_free	= perf_cgroup_css_free,
+	.css_online	= perf_cgroup_css_online,
 	.attach		= perf_cgroup_attach,
 	/*
 	 * Implicitly enable on dfl hierarchy so that perf events can
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
  2020-01-07 13:34 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-07 13:34 ` [PATCH 3/9] perf tools: Basic support for CGROUP event Namhyung Kim
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users, Tejun Heo, Li Zefan, Johannes Weiner

The PERF_SAMPLE_CGROUP bit is to save (perf_event) cgroup information
in the sample.  It will add a 64-bit id to identify current cgroup and
it's the file handle in the cgroup file system.  Userspace should use
this information with PERF_RECORD_CGROUP event to match which cgroup
it belongs.

I put it before PERF_SAMPLE_AUX for simplicity since it just needs a
64-bit word.  But if we want bigger samples, I can work on that
direction too.

Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/linux/perf_event.h      |  1 +
 include/uapi/linux/perf_event.h |  3 ++-
 init/Kconfig                    |  3 ++-
 kernel/events/core.c            | 22 ++++++++++++++++++++++
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6d4c22aee384..17b5bff045a6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1001,6 +1001,7 @@ struct perf_sample_data {
 	u64				stack_user_size;
 
 	u64				phys_addr;
+	u64				cgroup;
 } ____cacheline_aligned;
 
 /* default value for data source */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index de2ab87ca92c..3a81e9806cb1 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -142,8 +142,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_REGS_INTR			= 1U << 18,
 	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
 	PERF_SAMPLE_AUX				= 1U << 20,
+	PERF_SAMPLE_CGROUP			= 1U << 21,
 
-	PERF_SAMPLE_MAX = 1U << 21,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 22,		/* non-ABI */
 
 	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
 };
diff --git a/init/Kconfig b/init/Kconfig
index a34064a031a5..403ed2448b22 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1026,7 +1026,8 @@ config CGROUP_PERF
 	help
 	  This option extends the perf per-cpu mode to restrict monitoring
 	  to threads which belong to the cgroup specified and run on the
-	  designated cpu.
+	  designated cpu.  Or this can be used to have cgroup ID in samples
+	  so that it can monitor performance events among cgroups.
 
 	  Say N if unsure.
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dbc46b884505..773480527194 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1754,6 +1754,9 @@ static void __perf_event_header_size(struct perf_event *event, u64 sample_type)
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		size += sizeof(data->phys_addr);
 
+	if (sample_type & PERF_SAMPLE_CGROUP)
+		size += sizeof(data->cgroup);
+
 	event->header_size = size;
 }
 
@@ -6699,6 +6702,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		perf_output_put(handle, data->phys_addr);
 
+	if (sample_type & PERF_SAMPLE_CGROUP)
+		perf_output_put(handle, data->cgroup);
+
 	if (sample_type & PERF_SAMPLE_AUX) {
 		perf_output_put(handle, data->aux_size);
 
@@ -6895,6 +6901,16 @@ void perf_prepare_sample(struct perf_event_header *header,
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		data->phys_addr = perf_virt_to_phys(data->addr);
 
+#ifdef CONFIG_CGROUP_PERF
+	if (sample_type & PERF_SAMPLE_CGROUP) {
+		struct cgroup *cgrp;
+
+		/* protected by RCU */
+		cgrp = task_css_check(current, perf_event_cgrp_id, 1)->cgroup;
+		data->cgroup = cgroup_id(cgrp);
+	}
+#endif
+
 	if (sample_type & PERF_SAMPLE_AUX) {
 		u64 size;
 
@@ -11090,6 +11106,12 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
 
 	if (attr->sample_type & PERF_SAMPLE_REGS_INTR)
 		ret = perf_reg_validate(attr->sample_regs_intr);
+
+#ifndef CONFIG_CGROUP_PERF
+	if (attr->sample_type & PERF_SAMPLE_CGROUP)
+		return -EINVAL;
+#endif
+
 out:
 	return ret;
 
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/9] perf tools: Basic support for CGROUP event
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
  2020-01-07 13:34 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
  2020-01-07 13:34 ` [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-07 13:34 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users, Adrian Hunter, kernel test robot

Implement basic functionality to support cgroup tracking.  Each cgroup
can be identified by inode number which can be read from userspace
too.  The actual cgroup processing will come in the later patch.

Cc: Adrian Hunter <adrian.hunter@intel.com>
[fix perf test failure on sampling parsing]
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/include/uapi/linux/perf_event.h     | 16 ++++++++++++++--
 tools/lib/perf/include/perf/event.h       |  7 +++++++
 tools/perf/builtin-diff.c                 |  1 +
 tools/perf/builtin-report.c               |  1 +
 tools/perf/tests/sample-parsing.c         |  6 +++++-
 tools/perf/util/event.c                   | 18 ++++++++++++++++++
 tools/perf/util/event.h                   |  6 ++++++
 tools/perf/util/evsel.c                   |  6 ++++++
 tools/perf/util/machine.c                 | 12 ++++++++++++
 tools/perf/util/machine.h                 |  3 +++
 tools/perf/util/perf_event_attr_fprintf.c |  2 ++
 tools/perf/util/session.c                 |  4 ++++
 tools/perf/util/synthetic-events.c        |  8 ++++++++
 tools/perf/util/tool.h                    |  1 +
 14 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 377d794d3105..3a81e9806cb1 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -142,8 +142,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_REGS_INTR			= 1U << 18,
 	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
 	PERF_SAMPLE_AUX				= 1U << 20,
+	PERF_SAMPLE_CGROUP			= 1U << 21,
 
-	PERF_SAMPLE_MAX = 1U << 21,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 22,		/* non-ABI */
 
 	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
 };
@@ -377,7 +378,8 @@ struct perf_event_attr {
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
 				aux_output     :  1, /* generate AUX records instead of events */
-				__reserved_1   : 32;
+				cgroup         :  1, /* include cgroup events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -1006,6 +1008,16 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				id;
+	 *	char				path[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CGROUP			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index 18106899cb4e..69b44d2cc0f5 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -105,6 +105,12 @@ struct perf_record_bpf_event {
 	__u8			 tag[BPF_TAG_SIZE];  // prog tag
 };
 
+struct perf_record_cgroup {
+	struct perf_event_header header;
+	__u64			 id;
+	char			 path[PATH_MAX];
+};
+
 struct perf_record_sample {
 	struct perf_event_header header;
 	__u64			 array[];
@@ -352,6 +358,7 @@ union perf_event {
 	struct perf_record_mmap2		mmap2;
 	struct perf_record_comm			comm;
 	struct perf_record_namespaces		namespaces;
+	struct perf_record_cgroup		cgroup;
 	struct perf_record_fork			fork;
 	struct perf_record_lost			lost;
 	struct perf_record_lost_samples		lost_samples;
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index f8b6ae557d8b..83d4094bf152 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -455,6 +455,7 @@ static struct perf_diff pdiff = {
 		.fork	= perf_event__process_fork,
 		.lost	= perf_event__process_lost,
 		.namespaces = perf_event__process_namespaces,
+		.cgroup = perf_event__process_cgroup,
 		.ordered_events = true,
 		.ordering_requires_timestamps = true,
 	},
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index de988589d99b..4924b042ab00 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1094,6 +1094,7 @@ int cmd_report(int argc, const char **argv)
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
 			.namespaces	 = perf_event__process_namespaces,
+			.cgroup		 = perf_event__process_cgroup,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
 			.lost		 = perf_event__process_lost,
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index 2762e1155238..a8ff7d2df118 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -150,6 +150,9 @@ static bool samples_same(const struct perf_sample *s1,
 	if (type & PERF_SAMPLE_PHYS_ADDR)
 		COMP(phys_addr);
 
+	if (type & PERF_SAMPLE_CGROUP)
+		COMP(cgroup);
+
 	if (type & PERF_SAMPLE_AUX) {
 		COMP(aux_sample.size);
 		if (memcmp(s1->aux_sample.data, s2->aux_sample.data,
@@ -228,6 +231,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 			.regs	= regs,
 		},
 		.phys_addr	= 113,
+		.cgroup		= 114,
 		.aux_sample	= {
 			.size	= sizeof(aux_data),
 			.data	= (void *)aux_data,
@@ -331,7 +335,7 @@ int test__sample_parsing(struct test *test __maybe_unused, int subtest __maybe_u
 	 * were added.  Please actually update the test rather than just change
 	 * the condition below.
 	 */
-	if (PERF_SAMPLE_MAX > PERF_SAMPLE_AUX << 1) {
+	if (PERF_SAMPLE_MAX > PERF_SAMPLE_CGROUP << 1) {
 		pr_debug("sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating\n");
 		return -1;
 	}
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index c5447ff516a2..824c038e5c33 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -54,6 +54,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
 	[PERF_RECORD_KSYMBOL]			= "KSYMBOL",
 	[PERF_RECORD_BPF_EVENT]			= "BPF_EVENT",
+	[PERF_RECORD_CGROUP]			= "CGROUP",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -180,6 +181,12 @@ size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp)
 	return ret;
 }
 
+size_t perf_event__fprintf_cgroup(union perf_event *event, FILE *fp)
+{
+	return fprintf(fp, " cgroup: %" PRI_lu64 " %s\n",
+		       event->cgroup.id, event->cgroup.path);
+}
+
 int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -196,6 +203,14 @@ int perf_event__process_namespaces(struct perf_tool *tool __maybe_unused,
 	return machine__process_namespaces_event(machine, event, sample);
 }
 
+int perf_event__process_cgroup(struct perf_tool *tool __maybe_unused,
+			       union perf_event *event,
+			       struct perf_sample *sample,
+			       struct machine *machine)
+{
+	return machine__process_cgroup_event(machine, event, sample);
+}
+
 int perf_event__process_lost(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -417,6 +432,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_NAMESPACES:
 		ret += perf_event__fprintf_namespaces(event, fp);
 		break;
+	case PERF_RECORD_CGROUP:
+		ret += perf_event__fprintf_cgroup(event, fp);
+		break;
 	case PERF_RECORD_MMAP2:
 		ret += perf_event__fprintf_mmap2(event, fp);
 		break;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 85223159737c..0ad3ba22817d 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -135,6 +135,7 @@ struct perf_sample {
 	u32 raw_size;
 	u64 data_src;
 	u64 phys_addr;
+	u64 cgroup;
 	u32 flags;
 	u16 insn_len;
 	u8  cpumode;
@@ -321,6 +322,10 @@ int perf_event__process_namespaces(struct perf_tool *tool,
 				   union perf_event *event,
 				   struct perf_sample *sample,
 				   struct machine *machine);
+int perf_event__process_cgroup(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_sample *sample,
+			       struct machine *machine);
 int perf_event__process_mmap(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -376,6 +381,7 @@ size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_cgroup(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_ksymbol(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_bpf(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a69e64236120..0f5a67603139 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2250,6 +2250,12 @@ int perf_evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 		array++;
 	}
 
+	data->cgroup = 0;
+	if (type & PERF_SAMPLE_CGROUP) {
+		data->cgroup = *array;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		OVERFLOW_CHECK_u64(array);
 		sz = *array++;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c8c5410315e8..2c3223bec561 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -654,6 +654,16 @@ int machine__process_namespaces_event(struct machine *machine __maybe_unused,
 	return err;
 }
 
+int machine__process_cgroup_event(struct machine *machine __maybe_unused,
+				  union perf_event *event,
+				  struct perf_sample *sample __maybe_unused)
+{
+	if (dump_trace)
+		perf_event__fprintf_cgroup(event, stdout);
+
+	return 0;
+}
+
 int machine__process_lost_event(struct machine *machine __maybe_unused,
 				union perf_event *event, struct perf_sample *sample __maybe_unused)
 {
@@ -1880,6 +1890,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 		ret = machine__process_mmap_event(machine, event, sample); break;
 	case PERF_RECORD_NAMESPACES:
 		ret = machine__process_namespaces_event(machine, event, sample); break;
+	case PERF_RECORD_CGROUP:
+		ret = machine__process_cgroup_event(machine, event, sample); break;
 	case PERF_RECORD_MMAP2:
 		ret = machine__process_mmap2_event(machine, event, sample); break;
 	case PERF_RECORD_FORK:
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index be0a930eca89..fa1be9ea00fa 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -128,6 +128,9 @@ int machine__process_switch_event(struct machine *machine,
 int machine__process_namespaces_event(struct machine *machine,
 				      union perf_event *event,
 				      struct perf_sample *sample);
+int machine__process_cgroup_event(struct machine *machine,
+				  union perf_event *event,
+				  struct perf_sample *sample);
 int machine__process_mmap_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 651203126c71..a37a89c747d8 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -35,6 +35,7 @@ static void __p_sample_type(char *buf, size_t size, u64 value)
 		bit_name(BRANCH_STACK), bit_name(REGS_USER), bit_name(STACK_USER),
 		bit_name(IDENTIFIER), bit_name(REGS_INTR), bit_name(DATA_SRC),
 		bit_name(WEIGHT), bit_name(PHYS_ADDR), bit_name(AUX),
+		bit_name(CGROUP),
 		{ .name = NULL, }
 	};
 #undef bit_name
@@ -131,6 +132,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(ksymbol, p_unsigned);
 	PRINT_ATTRf(bpf_event, p_unsigned);
 	PRINT_ATTRf(aux_output, p_unsigned);
+	PRINT_ATTRf(cgroup, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index d0d7d25b23e3..6b4c12d48c3f 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -471,6 +471,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->comm = process_event_stub;
 	if (tool->namespaces == NULL)
 		tool->namespaces = process_event_stub;
+	if (tool->cgroup == NULL)
+		tool->cgroup = process_event_stub;
 	if (tool->fork == NULL)
 		tool->fork = process_event_stub;
 	if (tool->exit == NULL)
@@ -1434,6 +1436,8 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->comm(tool, event, sample, machine);
 	case PERF_RECORD_NAMESPACES:
 		return tool->namespaces(tool, event, sample, machine);
+	case PERF_RECORD_CGROUP:
+		return tool->cgroup(tool, event, sample, machine);
 	case PERF_RECORD_FORK:
 		return tool->fork(tool, event, sample, machine);
 	case PERF_RECORD_EXIT:
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index c423298fe62d..cd336eb8886b 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1228,6 +1228,9 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	if (type & PERF_SAMPLE_PHYS_ADDR)
 		result += sizeof(u64);
 
+	if (type & PERF_SAMPLE_CGROUP)
+		result += sizeof(u64);
+
 	if (type & PERF_SAMPLE_AUX) {
 		result += sizeof(u64);
 		result += sample->aux_sample.size;
@@ -1401,6 +1404,11 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 		array++;
 	}
 
+	if (type & PERF_SAMPLE_CGROUP) {
+		*array = sample->cgroup;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		sz = sample->aux_sample.size;
 		*array++ = sz;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 2abbf668b8de..472ef5eb4068 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -46,6 +46,7 @@ struct perf_tool {
 			mmap2,
 			comm,
 			namespaces,
+			cgroup,
 			fork,
 			exit,
 			lost,
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
                   ` (2 preceding siblings ...)
  2020-01-07 13:34 ` [PATCH 3/9] perf tools: Basic support for CGROUP event Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-08 21:52   ` Jiri Olsa
  2020-01-08 22:01   ` Jiri Olsa
  2020-01-07 13:34 ` [PATCH 5/9] perf report: Add 'cgroup' sort key Namhyung Kim
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

Each cgroup is kept in the global cgroup_tree sorted by the cgroup id.
Hist entries have cgroup id can compare it directly and later it can
be used to find a group name using this tree.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/cgroup.c  | 72 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/cgroup.h  | 15 +++++---
 tools/perf/util/machine.c |  7 ++++
 tools/perf/util/session.c |  4 +++
 4 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 4881d4af3381..4e8ef1db0c94 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -13,6 +13,8 @@
 
 int nr_cgroups;
 
+static struct rb_root cgroup_tree = RB_ROOT;
+
 static int
 cgroupfs_find_mountpoint(char *buf, size_t maxlen)
 {
@@ -250,3 +252,73 @@ int parse_cgroups(const struct option *opt, const char *str,
 	}
 	return 0;
 }
+
+struct cgroup *cgroup__findnew(uint64_t id, const char *path)
+{
+	struct rb_node **p = &cgroup_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct cgroup *cgrp;
+
+	while (*p != NULL) {
+		parent = *p;
+		cgrp = rb_entry(parent, struct cgroup, node);
+
+		if (cgrp->id == id)
+			return cgrp;
+
+		if (cgrp->id < id)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	cgrp = malloc(sizeof(*cgrp));
+	if (cgrp == NULL)
+		return NULL;
+
+	cgrp->name = strdup(path);
+	if (cgrp->name == NULL) {
+		free(cgrp);
+		return NULL;
+	}
+
+	cgrp->fd = -1;
+	cgrp->id = id;
+	refcount_set(&cgrp->refcnt, 1);
+
+	rb_link_node(&cgrp->node, parent, p);
+	rb_insert_color(&cgrp->node, &cgroup_tree);
+
+	return cgrp;
+}
+
+struct cgroup *cgroup__find_by_path(const char *path)
+{
+	struct rb_node *node;
+
+	node = rb_first(&cgroup_tree);
+	while (node) {
+		struct cgroup *cgrp = rb_entry(node, struct cgroup, node);
+
+		if (!strcmp(cgrp->name, path))
+			return cgrp;
+
+		node = rb_next(&cgrp->node);
+	}
+
+	return NULL;
+}
+
+void destroy_cgroups(void)
+{
+	struct rb_node *node;
+	struct cgroup *cgrp;
+
+	while (!RB_EMPTY_ROOT(&cgroup_tree)) {
+		node = rb_first(&cgroup_tree);
+		cgrp = rb_entry(node, struct cgroup, node);
+
+		rb_erase(node, &cgroup_tree);
+		cgroup__put(cgrp);
+	}
+}
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 2ec11f01090d..381583df27c7 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -3,16 +3,18 @@
 #define __CGROUP_H__
 
 #include <linux/refcount.h>
+#include <linux/rbtree.h>
 
 struct option;
 
 struct cgroup {
-	char *name;
-	int fd;
-	refcount_t refcnt;
+	struct rb_node		node;
+	u64			id;
+	char			*name;
+	int			fd;
+	refcount_t		refcnt;
 };
 
-
 extern int nr_cgroups; /* number of explicit cgroups defined */
 
 struct cgroup *cgroup__get(struct cgroup *cgroup);
@@ -26,4 +28,9 @@ void evlist__set_default_cgroup(struct evlist *evlist, struct cgroup *cgroup);
 
 int parse_cgroups(const struct option *opt, const char *str, int unset);
 
+struct cgroup *cgroup__findnew(uint64_t id, const char *path);
+struct cgroup *cgroup__find_by_path(const char *path);
+
+void destroy_cgroups(void);
+
 #endif /* __CGROUP_H__ */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 2c3223bec561..19d40a2016d7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -33,6 +33,7 @@
 #include "asm/bug.h"
 #include "bpf-event.h"
 #include <internal/lib.h> // page_size
+#include "cgroup.h"
 
 #include <linux/ctype.h>
 #include <symbol/kallsyms.h>
@@ -658,9 +659,15 @@ int machine__process_cgroup_event(struct machine *machine __maybe_unused,
 				  union perf_event *event,
 				  struct perf_sample *sample __maybe_unused)
 {
+	struct cgroup *cgrp;
+
 	if (dump_trace)
 		perf_event__fprintf_cgroup(event, stdout);
 
+	cgrp = cgroup__findnew(event->cgroup.id, event->cgroup.path);
+	if (cgrp == NULL)
+		return -ENOMEM;
+
 	return 0;
 }
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 6b4c12d48c3f..5e2c9f504184 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -289,6 +289,8 @@ static void perf_session__release_decomp_events(struct perf_session *session)
 	} while (1);
 }
 
+extern void destroy_cgroups(void);
+
 void perf_session__delete(struct perf_session *session)
 {
 	if (session == NULL)
@@ -303,6 +305,8 @@ void perf_session__delete(struct perf_session *session)
 	if (session->data)
 		perf_data__close(session->data);
 	free(session);
+
+	destroy_cgroups();
 }
 
 static int process_event_synth_tracing_data_stub(struct perf_session *session
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 5/9] perf report: Add 'cgroup' sort key
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
                   ` (3 preceding siblings ...)
  2020-01-07 13:34 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

The cgroup sort key is to show cgroup membership of each task.
Currently it shows full path in the cgroupfs (not relative to the root
of cgroup namespace) since it'd be more intuitive IMHO.  Otherwise
root cgroup in different namespaces will all show same name - "/".

The cgroup sort key should come before cgroup_id otherwise
sort_dimension__add() will match it to cgroup_id as it only matches
with the given substring.

For example it will look like following.  Note that record patch
adding --all-cgroups patch will come later.

  $ perf record -a --namespace --all-cgroups  cgtest
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ]

  $ perf report -s cgroup_id,cgroup,pid
  ...
  # Overhead  cgroup id (dev/inode)  cgroup          Pid:Command
  # ........  .....................  ..........  ...............
  #
      93.96%  0/0x0                  /                 0:swapper
       1.25%  3/0xeffffffb           /               278:looper0
       0.86%  3/0xf000015f           /sub/cgrp1      280:cgtest
       0.37%  3/0xf0000160           /sub/cgrp2      281:cgtest
       0.34%  3/0xf0000163           /sub/cgrp3      282:cgtest
       0.22%  3/0xeffffffb           /sub            278:looper0
       0.20%  3/0xeffffffb           /               280:cgtest
       0.15%  3/0xf0000163           /sub/cgrp3      285:looper3

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/util/hist.c                   |  7 ++++++
 tools/perf/util/hist.h                   |  1 +
 tools/perf/util/sort.c                   | 31 ++++++++++++++++++++++++
 tools/perf/util/sort.h                   |  2 ++
 5 files changed, 42 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 8dbe2119686a..5af43f3faf9f 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -95,6 +95,7 @@ OPTIONS
 	abort cost. This is the global weight.
 	- local_weight: Local weight version of the weight above.
 	- cgroup_id: ID derived from cgroup namespace device and inode numbers.
+	- cgroup: cgroup pathname in the cgroupfs.
 	- transaction: Transaction abort flags.
 	- overhead: Overhead percentage of sample
 	- overhead_sys: Overhead percentage of sample running in system mode
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index ca5a8f4d007e..9f28d2f487c1 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -10,6 +10,7 @@
 #include "mem-events.h"
 #include "session.h"
 #include "namespaces.h"
+#include "cgroup.h"
 #include "sort.h"
 #include "units.h"
 #include "evlist.h"
@@ -222,6 +223,11 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 
 	if (h->trace_output)
 		hists__new_col_len(hists, HISTC_TRACE, strlen(h->trace_output));
+
+	if (h->cgroup) {
+		struct cgroup *cgrp = cgroup__findnew(h->cgroup, "<unknown>");
+		hists__new_col_len(hists, HISTC_CGROUP, strlen(cgrp->name));
+	}
 }
 
 void hists__output_recalc_col_len(struct hists *hists, int max_rows)
@@ -691,6 +697,7 @@ __hists__add_entry(struct hists *hists,
 			.dev = ns ? ns->link_info[CGROUP_NS_INDEX].dev : 0,
 			.ino = ns ? ns->link_info[CGROUP_NS_INDEX].ino : 0,
 		},
+		.cgroup = sample->cgroup,
 		.ms = {
 			.maps	= al->maps,
 			.map	= al->map,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 0aa63aeb58ec..f7fb9c32dda5 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -38,6 +38,7 @@ enum hist_column {
 	HISTC_THREAD,
 	HISTC_COMM,
 	HISTC_CGROUP_ID,
+	HISTC_CGROUP,
 	HISTC_PARENT,
 	HISTC_CPU,
 	HISTC_SOCKET,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index ab0cfd790ad0..52073029e592 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -25,6 +25,7 @@
 #include "mem-events.h"
 #include "annotate.h"
 #include "time-utils.h"
+#include "cgroup.h"
 #include <linux/kernel.h>
 #include <linux/string.h>
 
@@ -634,6 +635,35 @@ struct sort_entry sort_cgroup_id = {
 	.se_width_idx	= HISTC_CGROUP_ID,
 };
 
+/* --sort cgroup */
+
+static int64_t
+sort__cgroup_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return right->cgroup - left->cgroup;
+}
+
+static int hist_entry__cgroup_snprintf(struct hist_entry *he,
+				       char *bf, size_t size,
+				       unsigned int width __maybe_unused)
+{
+	const char *cgrp_name = "N/A";
+
+	if (he->cgroup) {
+		struct cgroup *cgrp = cgroup__findnew(he->cgroup, cgrp_name);
+		cgrp_name = cgrp->name;
+	}
+
+	return repsep_snprintf(bf, size, "%s", cgrp_name);
+}
+
+struct sort_entry sort_cgroup = {
+	.se_header      = "cgroup",
+	.se_cmp	        = sort__cgroup_cmp,
+	.se_snprintf    = hist_entry__cgroup_snprintf,
+	.se_width_idx	= HISTC_CGROUP,
+};
+
 /* --sort socket */
 
 static int64_t
@@ -1658,6 +1688,7 @@ static struct sort_dimension common_sort_dimensions[] = {
 	DIM(SORT_TRACE, "trace", sort_trace),
 	DIM(SORT_SYM_SIZE, "symbol_size", sort_sym_size),
 	DIM(SORT_DSO_SIZE, "dso_size", sort_dso_size),
+	DIM(SORT_CGROUP, "cgroup", sort_cgroup),
 	DIM(SORT_CGROUP_ID, "cgroup_id", sort_cgroup_id),
 	DIM(SORT_SYM_IPC_NULL, "ipc_null", sort_sym_ipc_null),
 	DIM(SORT_TIME, "time", sort_time),
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 6c862d62d052..cfa6ac6f7d06 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -101,6 +101,7 @@ struct hist_entry {
 	struct thread		*thread;
 	struct comm		*comm;
 	struct namespace_id	cgroup_id;
+	u64			cgroup;
 	u64			ip;
 	u64			transaction;
 	s32			socket;
@@ -224,6 +225,7 @@ enum sort_type {
 	SORT_TRACE,
 	SORT_SYM_SIZE,
 	SORT_DSO_SIZE,
+	SORT_CGROUP,
 	SORT_CGROUP_ID,
 	SORT_SYM_IPC_NULL,
 	SORT_TIME,
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
                   ` (4 preceding siblings ...)
  2020-01-07 13:34 ` [PATCH 5/9] perf report: Add 'cgroup' sort key Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-08 22:08   ` Jiri Olsa
                     ` (2 more replies)
  2020-01-07 13:34 ` [PATCH 7/9] perf record: Add --all-cgroups option Namhyung Kim
                   ` (2 subsequent siblings)
  8 siblings, 3 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

Synthesize cgroup events by iterating cgroup filesystem directories.
The cgroup event only saves the portion of cgroup path after the mount
point and the cgroup id (which actually is a file handle).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-record.c        |   5 ++
 tools/perf/util/cgroup.c           |   3 +-
 tools/perf/util/cgroup.h           |   1 +
 tools/perf/util/event.c            |   1 +
 tools/perf/util/synthetic-events.c | 119 +++++++++++++++++++++++++++++
 tools/perf/util/synthetic-events.h |   1 +
 tools/perf/util/tool.h             |   1 +
 7 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c301466101b..2802de9538ff 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1397,6 +1397,11 @@ static int record__synthesize(struct record *rec, bool tail)
 	if (err < 0)
 		pr_warning("Couldn't synthesize bpf events.\n");
 
+	err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
+					     machine);
+	if (err < 0)
+		pr_warning("Couldn't synthesize cgroup events.\n");
+
 	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->core.threads,
 					    process_synthesized_event, opts->sample_address,
 					    1);
diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 4e8ef1db0c94..5147d22b3bda 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -15,8 +15,7 @@ int nr_cgroups;
 
 static struct rb_root cgroup_tree = RB_ROOT;
 
-static int
-cgroupfs_find_mountpoint(char *buf, size_t maxlen)
+int cgroupfs_find_mountpoint(char *buf, size_t maxlen)
 {
 	FILE *fp;
 	char mountpoint[PATH_MAX + 1], tokens[PATH_MAX + 1], type[PATH_MAX + 1];
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 381583df27c7..9a67060723fa 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -17,6 +17,7 @@ struct cgroup {
 
 extern int nr_cgroups; /* number of explicit cgroups defined */
 
+int cgroupfs_find_mountpoint(char *buf, size_t maxlen);
 struct cgroup *cgroup__get(struct cgroup *cgroup);
 void cgroup__put(struct cgroup *cgroup);
 
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 824c038e5c33..28801c867f39 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -33,6 +33,7 @@
 #include "bpf-event.h"
 #include "tool.h"
 #include "../perf.h"
+#include "cgroup.h"
 
 static const char *perf_event__names[] = {
 	[0]					= "TOTAL",
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index cd336eb8886b..36f122cac44f 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -16,6 +16,7 @@
 #include "util/synthetic-events.h"
 #include "util/target.h"
 #include "util/time-utils.h"
+#include "util/cgroup.h"
 #include <linux/bitops.h>
 #include <linux/kernel.h>
 #include <linux/string.h>
@@ -413,6 +414,124 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	return rc;
 }
 
+static int perf_event__synthesize_cgroup(struct perf_tool *tool,
+					 union perf_event *event,
+					 char *path, size_t mount_len,
+					 perf_event__handler_t process,
+					 struct machine *machine)
+{
+	size_t event_size = sizeof(event->cgroup) - sizeof(event->cgroup.path);
+	size_t path_len = strlen(path) - mount_len + 1;
+	struct {
+		struct file_handle fh;
+		uint64_t cgroup_id;
+	} handle;
+	int mount_id;
+
+	while (path_len % sizeof(u64))
+		path[mount_len + path_len++] = '\0';
+
+	memset(&event->cgroup, 0, event_size);
+
+	event->cgroup.header.type = PERF_RECORD_CGROUP;
+	event->cgroup.header.size = event_size + path_len + machine->id_hdr_size;
+
+	handle.fh.handle_bytes = sizeof(handle.cgroup_id);
+	if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
+		pr_debug("stat failed: %s\n", path);
+		return -1;
+	}
+
+	event->cgroup.id = handle.cgroup_id;
+	strncpy(event->cgroup.path, path + mount_len, path_len);
+	memset(event->cgroup.path + path_len, 0, machine->id_hdr_size);
+
+	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
+		pr_debug("process synth event failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
+					union perf_event *event,
+					char *path, size_t mount_len,
+					perf_event__handler_t process,
+					struct machine *machine)
+{
+	size_t pos = strlen(path);
+	DIR *d;
+	struct dirent *dent;
+	int ret = 0;
+
+	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
+					  process, machine) < 0)
+		return -1;
+
+	d = opendir(path);
+	if (d == NULL) {
+		pr_debug("failed to open directory: %s\n", path);
+		return -1;
+	}
+
+	while ((dent = readdir(d)) != NULL) {
+		if (dent->d_type != DT_DIR)
+			continue;
+		if (!strcmp(dent->d_name, ".") ||
+		    !strcmp(dent->d_name, ".."))
+			continue;
+
+		if (path[pos - 1] != '/')
+			strcat(path, "/");
+		strcat(path, dent->d_name);
+
+		ret = perf_event__walk_cgroup_tree(tool, event, path,
+						   mount_len, process, machine);
+		if (ret < 0)
+			break;
+
+		path[pos] = '\0';
+	}
+
+	closedir(d);
+	return ret;
+}
+
+int perf_event__synthesize_cgroups(struct perf_tool *tool,
+				   perf_event__handler_t process,
+				   struct machine *machine)
+{
+	union perf_event event;
+	char *cgrp_root;
+	size_t mount_len;  /* length of mount point in the path */
+	int ret = -1;
+
+	cgrp_root = malloc(PATH_MAX);
+	if (cgrp_root == NULL)
+		return -1;
+
+	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX) < 0) {
+		pr_debug("cannot find cgroup mount point\n");
+		goto out;
+	}
+
+	mount_len = strlen(cgrp_root);
+	/* make sure the path starts with a slash (after mount point) */
+	strcat(cgrp_root, "/");
+
+	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
+					 process, machine) < 0)
+		goto out;
+
+	ret = 0;
+
+out:
+	free(cgrp_root);
+
+	return ret;
+}
+
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
 				   struct machine *machine)
 {
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index baead0cdc381..e7a3e9589738 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -45,6 +45,7 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handl
 int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_namespaces(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
+int perf_event__synthesize_cgroups(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
 int perf_event__synthesize_stat_config(struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 472ef5eb4068..3fb67bd31e4a 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -79,6 +79,7 @@ struct perf_tool {
 	bool		ordered_events;
 	bool		ordering_requires_timestamps;
 	bool		namespace_events;
+	bool		cgroup_events;
 	bool		no_warn;
 	enum show_feature_header show_feat_hdr;
 };
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 7/9] perf record: Add --all-cgroups option
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
                   ` (5 preceding siblings ...)
  2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-07 13:35 ` [PATCH 8/9] perf top: " Namhyung Kim
  2020-01-07 13:35 ` [PATCH 9/9] perf script: Add --show-cgroup-events option Namhyung Kim
  8 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

The --all-cgroups option is to enable cgroup profiling support.  It
tells kernel to record CGROUP events in the ring buffer so that perf
report can identify task/cgroup association later.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-record.txt |  5 ++++-
 tools/perf/builtin-record.c              |  5 +++++
 tools/perf/util/evsel.c                  | 11 ++++++++++-
 tools/perf/util/evsel.h                  |  1 +
 tools/perf/util/record.h                 |  1 +
 5 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index b23a4012a606..6dd9d69d7dee 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -385,7 +385,10 @@ displayed with the weight and local_weight sort keys.  This currently works for
 abort events and some memory events in precise mode on modern Intel CPUs.
 
 --namespaces::
-Record events of type PERF_RECORD_NAMESPACES.
+Record events of type PERF_RECORD_NAMESPACES.  This enables 'cgroup_id' sort key.
+
+--all-cgroups::
+Record events of type PERF_RECORD_CGROUP.  This enables 'cgroup' sort key.
 
 --transaction::
 Record transaction flags for transaction related events.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 2802de9538ff..7d7912e121d6 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1433,6 +1433,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	if (rec->opts.record_namespaces)
 		tool->namespace_events = true;
 
+	if (rec->opts.record_cgroup)
+		tool->cgroup_events = true;
+
 	if (rec->opts.auxtrace_snapshot_mode || rec->switch_output.enabled) {
 		signal(SIGUSR2, snapshot_sig_handler);
 		if (rec->opts.auxtrace_snapshot_mode)
@@ -2363,6 +2366,8 @@ static struct option __record_options[] = {
 			"per thread proc mmap processing timeout in ms"),
 	OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces,
 		    "Record namespaces events"),
+	OPT_BOOLEAN(0, "all-cgroups", &record.opts.record_cgroup,
+		    "Record cgroup events"),
 	OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
 		    "Record context switch events"),
 	OPT_BOOLEAN_FLAG(0, "all-kernel", &record.opts.all_kernel,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0f5a67603139..ff6c6683a32b 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1102,6 +1102,11 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts,
 	if (opts->record_namespaces)
 		attr->namespaces  = track;
 
+	if (opts->record_cgroup) {
+		attr->cgroup = track && !perf_missing_features.cgroup;
+		perf_evsel__set_sample_bit(evsel, CGROUP);
+	}
+
 	if (opts->record_switch_events)
 		attr->context_switch = track;
 
@@ -1782,7 +1787,11 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
 	 * Must probe features in the order they were added to the
 	 * perf_event_attr interface.
 	 */
-	if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) {
+	if (!perf_missing_features.cgroup && evsel->core.attr.cgroup) {
+		perf_missing_features.cgroup = true;
+		pr_debug2_peo("Kernel has no cgroup sampling support, bailing out\n");
+		goto out_close;
+	} else if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) {
 		perf_missing_features.aux_output = true;
 		pr_debug2_peo("Kernel has no attr.aux_output support, bailing out\n");
 		goto out_close;
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index dc14f4a823cd..df71b530740b 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -119,6 +119,7 @@ struct perf_missing_features {
 	bool ksymbol;
 	bool bpf;
 	bool aux_output;
+	bool cgroup;
 };
 
 extern struct perf_missing_features perf_missing_features;
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 5421fd2ad383..24316458be20 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -34,6 +34,7 @@ struct record_opts {
 	bool	      auxtrace_snapshot_on_exit;
 	bool	      auxtrace_sample_mode;
 	bool	      record_namespaces;
+	bool	      record_cgroup;
 	bool	      record_switch_events;
 	bool	      all_kernel;
 	bool	      all_user;
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 8/9] perf top: Add --all-cgroups option
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
                   ` (6 preceding siblings ...)
  2020-01-07 13:34 ` [PATCH 7/9] perf record: Add --all-cgroups option Namhyung Kim
@ 2020-01-07 13:35 ` Namhyung Kim
  2020-01-08 22:24   ` Jiri Olsa
  2020-01-07 13:35 ` [PATCH 9/9] perf script: Add --show-cgroup-events option Namhyung Kim
  8 siblings, 1 reply; 33+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:35 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

The --all-cgroups option is to enable cgroup profiling support.  It
tells kernel to record CGROUP events in the ring buffer so that perf
report can identify task/cgroup association later.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-top.txt | 4 ++++
 tools/perf/builtin-top.c              | 9 +++++++++
 2 files changed, 13 insertions(+)

diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 5596129a71cf..c75507f50071 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -266,6 +266,10 @@ Default is to monitor all CPUS.
 	Record events of type PERF_RECORD_NAMESPACES and display it with the
 	'cgroup_id' sort key.
 
+--cgroup::
+	Record events of type PERF_RECORD_CGROUP and display it with the
+	'cgroup' sort key.
+
 --switch-on EVENT_NAME::
 	Only consider events after this event is found.
 
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 795e353de095..f6256c533b09 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1244,6 +1244,8 @@ static int __cmd_top(struct perf_top *top)
 
 	if (opts->record_namespaces)
 		top->tool.namespace_events = true;
+	if (opts->record_cgroup)
+		top->tool.cgroup_events = true;
 
 	ret = perf_event__synthesize_bpf_events(top->session, perf_event__process,
 						&top->session->machines.host,
@@ -1251,6 +1253,11 @@ static int __cmd_top(struct perf_top *top)
 	if (ret < 0)
 		pr_debug("Couldn't synthesize BPF events: Pre-existing BPF programs won't have symbols resolved.\n");
 
+	ret = perf_event__synthesize_cgroups(&top->tool, perf_event__process,
+					     &top->session->machines.host);
+	if (ret < 0)
+		pr_debug("Couldn't synthesize cgroup events.\n");
+
 	machine__synthesize_threads(&top->session->machines.host, &opts->target,
 				    top->evlist->core.threads, false,
 				    top->nr_threads_synthesize);
@@ -1539,6 +1546,8 @@ int cmd_top(int argc, const char **argv)
 			"number of thread to run event synthesize"),
 	OPT_BOOLEAN(0, "namespaces", &opts->record_namespaces,
 		    "Record namespaces events"),
+	OPT_BOOLEAN(0, "all-cgroups", &opts->record_cgroup,
+		    "Record cgroup events"),
 	OPTS_EVSWITCH(&top.evswitch),
 	OPT_END()
 	};
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 9/9] perf script: Add --show-cgroup-events option
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
                   ` (7 preceding siblings ...)
  2020-01-07 13:35 ` [PATCH 8/9] perf top: " Namhyung Kim
@ 2020-01-07 13:35 ` Namhyung Kim
  8 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:35 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

The --show-cgroup-events option is to print CGROUP events in the
output like others.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-script.txt |  3 ++
 tools/perf/builtin-script.c              | 41 ++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 2599b057e47b..3dd297600427 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -319,6 +319,9 @@ OPTIONS
 --show-bpf-events
 	Display bpf events i.e. events of type PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT.
 
+--show-cgroup-events
+	Display cgroup events i.e. events of type PERF_RECORD_CGROUP.
+
 --demangle::
 	Demangle symbol names to human readable form. It's enabled by default,
 	disable with --no-demangle.
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index e2406b291c1c..3db4afc29430 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1681,6 +1681,7 @@ struct perf_script {
 	bool			show_lost_events;
 	bool			show_round_events;
 	bool			show_bpf_events;
+	bool			show_cgroup_events;
 	bool			allocated;
 	bool			per_event_dump;
 	struct evswitch		evswitch;
@@ -2199,6 +2200,41 @@ static int process_namespaces_event(struct perf_tool *tool,
 	return ret;
 }
 
+static int process_cgroup_event(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct machine *machine)
+{
+	struct thread *thread;
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+	struct perf_session *session = script->session;
+	struct evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
+	int ret = -1;
+
+	thread = machine__findnew_thread(machine, sample->pid, sample->tid);
+	if (thread == NULL) {
+		pr_debug("problem processing CGROUP event, skipping it.\n");
+		return -1;
+	}
+
+	if (perf_event__process_cgroup(tool, event, sample, machine) < 0)
+		goto out;
+
+	if (!evsel->core.attr.sample_id_all) {
+		sample->cpu = 0;
+		sample->time = 0;
+	}
+	if (!filter_cpu(sample)) {
+		perf_sample__fprintf_start(sample, thread, evsel,
+					   PERF_RECORD_CGROUP, stdout);
+		perf_event__fprintf(event, stdout);
+	}
+	ret = 0;
+out:
+	thread__put(thread);
+	return ret;
+}
+
 static int process_fork_event(struct perf_tool *tool,
 			      union perf_event *event,
 			      struct perf_sample *sample,
@@ -2538,6 +2574,8 @@ static int __cmd_script(struct perf_script *script)
 		script->tool.context_switch = process_switch_event;
 	if (script->show_namespace_events)
 		script->tool.namespaces = process_namespaces_event;
+	if (script->show_cgroup_events)
+		script->tool.cgroup = process_cgroup_event;
 	if (script->show_lost_events)
 		script->tool.lost = process_lost_event;
 	if (script->show_round_events) {
@@ -3463,6 +3501,7 @@ int cmd_script(int argc, const char **argv)
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
 			.namespaces	 = perf_event__process_namespaces,
+			.cgroup		 = perf_event__process_cgroup,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
 			.attr		 = process_attr,
@@ -3563,6 +3602,8 @@ int cmd_script(int argc, const char **argv)
 		    "Show context switch events (if recorded)"),
 	OPT_BOOLEAN('\0', "show-namespace-events", &script.show_namespace_events,
 		    "Show namespace events (if recorded)"),
+	OPT_BOOLEAN('\0', "show-cgroup-events", &script.show_cgroup_events,
+		    "Show cgroup events (if recorded)"),
 	OPT_BOOLEAN('\0', "show-lost-events", &script.show_lost_events,
 		    "Show lost events (if recorded)"),
 	OPT_BOOLEAN('\0', "show-round-events", &script.show_round_events,
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-07 13:34 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
@ 2020-01-08 21:52   ` Jiri Olsa
  2020-01-09  0:51     ` Namhyung Kim
  2020-01-08 22:01   ` Jiri Olsa
  1 sibling, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2020-01-08 21:52 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:34:56PM +0900, Namhyung Kim wrote:
> Each cgroup is kept in the global cgroup_tree sorted by the cgroup id.
> Hist entries have cgroup id can compare it directly and later it can
> be used to find a group name using this tree.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/cgroup.c  | 72 +++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/cgroup.h  | 15 +++++---
>  tools/perf/util/machine.c |  7 ++++
>  tools/perf/util/session.c |  4 +++
>  4 files changed, 94 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> index 4881d4af3381..4e8ef1db0c94 100644
> --- a/tools/perf/util/cgroup.c
> +++ b/tools/perf/util/cgroup.c
> @@ -13,6 +13,8 @@
>  
>  int nr_cgroups;
>  
> +static struct rb_root cgroup_tree = RB_ROOT;

I think we shoud carry that in 'struct perf_env' 

jirka


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-07 13:34 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
  2020-01-08 21:52   ` Jiri Olsa
@ 2020-01-08 22:01   ` Jiri Olsa
  2020-01-09  0:55     ` Namhyung Kim
  1 sibling, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2020-01-08 22:01 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:34:56PM +0900, Namhyung Kim wrote:

SNIP

> +	while (*p != NULL) {
> +		parent = *p;
> +		cgrp = rb_entry(parent, struct cgroup, node);
> +
> +		if (cgrp->id == id)
> +			return cgrp;
> +
> +		if (cgrp->id < id)
> +			p = &(*p)->rb_left;
> +		else
> +			p = &(*p)->rb_right;
> +	}
> +
> +	cgrp = malloc(sizeof(*cgrp));
> +	if (cgrp == NULL)
> +		return NULL;
> +
> +	cgrp->name = strdup(path);
> +	if (cgrp->name == NULL) {
> +		free(cgrp);
> +		return NULL;
> +	}
> +
> +	cgrp->fd = -1;
> +	cgrp->id = id;
> +	refcount_set(&cgrp->refcnt, 1);
> +
> +	rb_link_node(&cgrp->node, parent, p);
> +	rb_insert_color(&cgrp->node, &cgroup_tree);
> +
> +	return cgrp;
> +}
> +
> +struct cgroup *cgroup__find_by_path(const char *path)
> +{
> +	struct rb_node *node;
> +
> +	node = rb_first(&cgroup_tree);
> +	while (node) {
> +		struct cgroup *cgrp = rb_entry(node, struct cgroup, node);
> +
> +		if (!strcmp(cgrp->name, path))
> +			return cgrp;

you have it sorted on ids, but only search by path,
why don't we sort it on path right away?

jirka


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
@ 2020-01-08 22:08   ` Jiri Olsa
  2020-01-09  5:19     ` Namhyung Kim
  2020-01-08 22:18   ` Jiri Olsa
  2020-01-08 22:21   ` Jiri Olsa
  2 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2020-01-08 22:08 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:
> Synthesize cgroup events by iterating cgroup filesystem directories.
> The cgroup event only saves the portion of cgroup path after the mount
> point and the cgroup id (which actually is a file handle).
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-record.c        |   5 ++
>  tools/perf/util/cgroup.c           |   3 +-
>  tools/perf/util/cgroup.h           |   1 +
>  tools/perf/util/event.c            |   1 +
>  tools/perf/util/synthetic-events.c | 119 +++++++++++++++++++++++++++++
>  tools/perf/util/synthetic-events.h |   1 +
>  tools/perf/util/tool.h             |   1 +
>  7 files changed, 129 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 4c301466101b..2802de9538ff 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1397,6 +1397,11 @@ static int record__synthesize(struct record *rec, bool tail)
>  	if (err < 0)
>  		pr_warning("Couldn't synthesize bpf events.\n");
>  
> +	err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
> +					     machine);
> +	if (err < 0)
> +		pr_warning("Couldn't synthesize cgroup events.\n");
> +
>  	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->core.threads,
>  					    process_synthesized_event, opts->sample_address,
>  					    1);
> diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> index 4e8ef1db0c94..5147d22b3bda 100644
> --- a/tools/perf/util/cgroup.c
> +++ b/tools/perf/util/cgroup.c
> @@ -15,8 +15,7 @@ int nr_cgroups;
>  
>  static struct rb_root cgroup_tree = RB_ROOT;
>  
> -static int
> -cgroupfs_find_mountpoint(char *buf, size_t maxlen)
> +int cgroupfs_find_mountpoint(char *buf, size_t maxlen)

out of scope of this change, but could this be added to api/fs/fs.c?
it might need more checks then is currently supported, but would be
nice to have it under same api as the rest

jirka


>  {
>  	FILE *fp;
>  	char mountpoint[PATH_MAX + 1], tokens[PATH_MAX + 1], type[PATH_MAX + 1];
> diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
> index 381583df27c7..9a67060723fa 100644
> --- a/tools/perf/util/cgroup.h
> +++ b/tools/perf/util/cgroup.h
> @@ -17,6 +17,7 @@ struct cgroup {
>  
>  extern int nr_cgroups; /* number of explicit cgroups defined */
>  
> +int cgroupfs_find_mountpoint(char *buf, size_t maxlen);
>  struct cgroup *cgroup__get(struct cgroup *cgroup);
>  void cgroup__put(struct cgroup *cgroup);
>  
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 824c038e5c33..28801c867f39 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -33,6 +33,7 @@
>  #include "bpf-event.h"
>  #include "tool.h"
>  #include "../perf.h"
> +#include "cgroup.h"
>  
>  static const char *perf_event__names[] = {
>  	[0]					= "TOTAL",
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index cd336eb8886b..36f122cac44f 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -16,6 +16,7 @@
>  #include "util/synthetic-events.h"
>  #include "util/target.h"
>  #include "util/time-utils.h"
> +#include "util/cgroup.h"
>  #include <linux/bitops.h>
>  #include <linux/kernel.h>
>  #include <linux/string.h>
> @@ -413,6 +414,124 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
>  	return rc;
>  }
>  
> +static int perf_event__synthesize_cgroup(struct perf_tool *tool,
> +					 union perf_event *event,
> +					 char *path, size_t mount_len,
> +					 perf_event__handler_t process,
> +					 struct machine *machine)
> +{
> +	size_t event_size = sizeof(event->cgroup) - sizeof(event->cgroup.path);
> +	size_t path_len = strlen(path) - mount_len + 1;
> +	struct {
> +		struct file_handle fh;
> +		uint64_t cgroup_id;
> +	} handle;
> +	int mount_id;
> +
> +	while (path_len % sizeof(u64))
> +		path[mount_len + path_len++] = '\0';
> +
> +	memset(&event->cgroup, 0, event_size);
> +
> +	event->cgroup.header.type = PERF_RECORD_CGROUP;
> +	event->cgroup.header.size = event_size + path_len + machine->id_hdr_size;
> +
> +	handle.fh.handle_bytes = sizeof(handle.cgroup_id);
> +	if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
> +		pr_debug("stat failed: %s\n", path);
> +		return -1;
> +	}
> +
> +	event->cgroup.id = handle.cgroup_id;
> +	strncpy(event->cgroup.path, path + mount_len, path_len);
> +	memset(event->cgroup.path + path_len, 0, machine->id_hdr_size);
> +
> +	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
> +		pr_debug("process synth event failed\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
> +					union perf_event *event,
> +					char *path, size_t mount_len,
> +					perf_event__handler_t process,
> +					struct machine *machine)
> +{
> +	size_t pos = strlen(path);
> +	DIR *d;
> +	struct dirent *dent;
> +	int ret = 0;
> +
> +	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
> +					  process, machine) < 0)
> +		return -1;
> +
> +	d = opendir(path);
> +	if (d == NULL) {
> +		pr_debug("failed to open directory: %s\n", path);
> +		return -1;
> +	}
> +
> +	while ((dent = readdir(d)) != NULL) {
> +		if (dent->d_type != DT_DIR)
> +			continue;
> +		if (!strcmp(dent->d_name, ".") ||
> +		    !strcmp(dent->d_name, ".."))
> +			continue;
> +
> +		if (path[pos - 1] != '/')
> +			strcat(path, "/");
> +		strcat(path, dent->d_name);
> +
> +		ret = perf_event__walk_cgroup_tree(tool, event, path,
> +						   mount_len, process, machine);
> +		if (ret < 0)
> +			break;
> +
> +		path[pos] = '\0';
> +	}
> +
> +	closedir(d);
> +	return ret;
> +}
> +
> +int perf_event__synthesize_cgroups(struct perf_tool *tool,
> +				   perf_event__handler_t process,
> +				   struct machine *machine)
> +{
> +	union perf_event event;
> +	char *cgrp_root;
> +	size_t mount_len;  /* length of mount point in the path */
> +	int ret = -1;
> +
> +	cgrp_root = malloc(PATH_MAX);
> +	if (cgrp_root == NULL)
> +		return -1;
> +
> +	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX) < 0) {
> +		pr_debug("cannot find cgroup mount point\n");
> +		goto out;
> +	}
> +
> +	mount_len = strlen(cgrp_root);
> +	/* make sure the path starts with a slash (after mount point) */
> +	strcat(cgrp_root, "/");
> +
> +	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
> +					 process, machine) < 0)
> +		goto out;
> +
> +	ret = 0;
> +
> +out:
> +	free(cgrp_root);
> +
> +	return ret;
> +}
> +
>  int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
>  				   struct machine *machine)
>  {
> diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
> index baead0cdc381..e7a3e9589738 100644
> --- a/tools/perf/util/synthetic-events.h
> +++ b/tools/perf/util/synthetic-events.h
> @@ -45,6 +45,7 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handl
>  int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
>  int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_namespaces(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
> +int perf_event__synthesize_cgroups(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
>  int perf_event__synthesize_stat_config(struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
> diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
> index 472ef5eb4068..3fb67bd31e4a 100644
> --- a/tools/perf/util/tool.h
> +++ b/tools/perf/util/tool.h
> @@ -79,6 +79,7 @@ struct perf_tool {
>  	bool		ordered_events;
>  	bool		ordering_requires_timestamps;
>  	bool		namespace_events;
> +	bool		cgroup_events;
>  	bool		no_warn;
>  	enum show_feature_header show_feat_hdr;
>  };
> -- 
> 2.24.1.735.g03f4e72817-goog
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
  2020-01-08 22:08   ` Jiri Olsa
@ 2020-01-08 22:18   ` Jiri Olsa
  2020-01-09  7:50     ` Namhyung Kim
  2020-01-08 22:21   ` Jiri Olsa
  2 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2020-01-08 22:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:

SNIP

> +	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
> +		pr_debug("process synth event failed\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
> +					union perf_event *event,
> +					char *path, size_t mount_len,
> +					perf_event__handler_t process,
> +					struct machine *machine)
> +{
> +	size_t pos = strlen(path);
> +	DIR *d;
> +	struct dirent *dent;
> +	int ret = 0;
> +
> +	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
> +					  process, machine) < 0)
> +		return -1;
> +
> +	d = opendir(path);
> +	if (d == NULL) {
> +		pr_debug("failed to open directory: %s\n", path);
> +		return -1;
> +	}
> +
> +	while ((dent = readdir(d)) != NULL) {
> +		if (dent->d_type != DT_DIR)
> +			continue;
> +		if (!strcmp(dent->d_name, ".") ||
> +		    !strcmp(dent->d_name, ".."))
> +			continue;
> +
> +		if (path[pos - 1] != '/')
> +			strcat(path, "/");
> +		strcat(path, dent->d_name);

shouldn't you also check for the max size of path in here?

jirka


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
  2020-01-08 22:08   ` Jiri Olsa
  2020-01-08 22:18   ` Jiri Olsa
@ 2020-01-08 22:21   ` Jiri Olsa
  2020-01-09  7:51     ` Namhyung Kim
  2 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2020-01-08 22:21 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:

SNIP

> +	closedir(d);
> +	return ret;
> +}
> +
> +int perf_event__synthesize_cgroups(struct perf_tool *tool,
> +				   perf_event__handler_t process,
> +				   struct machine *machine)
> +{
> +	union perf_event event;
> +	char *cgrp_root;
> +	size_t mount_len;  /* length of mount point in the path */
> +	int ret = -1;
> +
> +	cgrp_root = malloc(PATH_MAX);
> +	if (cgrp_root == NULL)
> +		return -1;
> +

hum, we normally use bufs with PATH_MAX size on stack..
is there some reason to use heap in here?

jirka


> +	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX) < 0) {
> +		pr_debug("cannot find cgroup mount point\n");
> +		goto out;
> +	}
> +
> +	mount_len = strlen(cgrp_root);
> +	/* make sure the path starts with a slash (after mount point) */
> +	strcat(cgrp_root, "/");
> +
> +	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
> +					 process, machine) < 0)
> +		goto out;
> +
> +	ret = 0;
> +
> +out:
> +	free(cgrp_root);
> +
> +	return ret;
> +}
> +

SNIP


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 8/9] perf top: Add --all-cgroups option
  2020-01-07 13:35 ` [PATCH 8/9] perf top: " Namhyung Kim
@ 2020-01-08 22:24   ` Jiri Olsa
  2020-01-09  7:55     ` Namhyung Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2020-01-08 22:24 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:35:00PM +0900, Namhyung Kim wrote:
> The --all-cgroups option is to enable cgroup profiling support.  It
> tells kernel to record CGROUP events in the ring buffer so that perf
> report can identify task/cgroup association later.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/Documentation/perf-top.txt | 4 ++++
>  tools/perf/builtin-top.c              | 9 +++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
> index 5596129a71cf..c75507f50071 100644
> --- a/tools/perf/Documentation/perf-top.txt
> +++ b/tools/perf/Documentation/perf-top.txt
> @@ -266,6 +266,10 @@ Default is to monitor all CPUS.
>  	Record events of type PERF_RECORD_NAMESPACES and display it with the
>  	'cgroup_id' sort key.
>  
> +--cgroup::
> +	Record events of type PERF_RECORD_CGROUP and display it with the
> +	'cgroup' sort key.

should be '--all-cgroups' ?

anyway, we dont have '--cgroups', why not use just this?

jirka

> +
>  --switch-on EVENT_NAME::
>  	Only consider events after this event is found.
>  
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 795e353de095..f6256c533b09 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1244,6 +1244,8 @@ static int __cmd_top(struct perf_top *top)
>  
>  	if (opts->record_namespaces)
>  		top->tool.namespace_events = true;
> +	if (opts->record_cgroup)
> +		top->tool.cgroup_events = true;
>  
>  	ret = perf_event__synthesize_bpf_events(top->session, perf_event__process,
>  						&top->session->machines.host,
> @@ -1251,6 +1253,11 @@ static int __cmd_top(struct perf_top *top)
>  	if (ret < 0)
>  		pr_debug("Couldn't synthesize BPF events: Pre-existing BPF programs won't have symbols resolved.\n");
>  
> +	ret = perf_event__synthesize_cgroups(&top->tool, perf_event__process,
> +					     &top->session->machines.host);
> +	if (ret < 0)
> +		pr_debug("Couldn't synthesize cgroup events.\n");
> +
>  	machine__synthesize_threads(&top->session->machines.host, &opts->target,
>  				    top->evlist->core.threads, false,
>  				    top->nr_threads_synthesize);
> @@ -1539,6 +1546,8 @@ int cmd_top(int argc, const char **argv)
>  			"number of thread to run event synthesize"),
>  	OPT_BOOLEAN(0, "namespaces", &opts->record_namespaces,
>  		    "Record namespaces events"),
> +	OPT_BOOLEAN(0, "all-cgroups", &opts->record_cgroup,
> +		    "Record cgroup events"),
>  	OPTS_EVSWITCH(&top.evswitch),
>  	OPT_END()
>  	};
> -- 
> 2.24.1.735.g03f4e72817-goog
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-08 21:52   ` Jiri Olsa
@ 2020-01-09  0:51     ` Namhyung Kim
  2020-01-20 13:57       ` Namhyung Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Namhyung Kim @ 2020-01-09  0:51 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

Hi Jiri,

On Thu, Jan 9, 2020 at 6:52 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:34:56PM +0900, Namhyung Kim wrote:
> > Each cgroup is kept in the global cgroup_tree sorted by the cgroup id.
> > Hist entries have cgroup id can compare it directly and later it can
> > be used to find a group name using this tree.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/util/cgroup.c  | 72 +++++++++++++++++++++++++++++++++++++++
> >  tools/perf/util/cgroup.h  | 15 +++++---
> >  tools/perf/util/machine.c |  7 ++++
> >  tools/perf/util/session.c |  4 +++
> >  4 files changed, 94 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> > index 4881d4af3381..4e8ef1db0c94 100644
> > --- a/tools/perf/util/cgroup.c
> > +++ b/tools/perf/util/cgroup.c
> > @@ -13,6 +13,8 @@
> >
> >  int nr_cgroups;
> >
> > +static struct rb_root cgroup_tree = RB_ROOT;
>
> I think we shoud carry that in 'struct perf_env'

OK, will move.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-08 22:01   ` Jiri Olsa
@ 2020-01-09  0:55     ` Namhyung Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-01-09  0:55 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 9, 2020 at 7:01 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:34:56PM +0900, Namhyung Kim wrote:
>
> SNIP
>
> > +     while (*p != NULL) {
> > +             parent = *p;
> > +             cgrp = rb_entry(parent, struct cgroup, node);
> > +
> > +             if (cgrp->id == id)
> > +                     return cgrp;
> > +
> > +             if (cgrp->id < id)
> > +                     p = &(*p)->rb_left;
> > +             else
> > +                     p = &(*p)->rb_right;
> > +     }
> > +
> > +     cgrp = malloc(sizeof(*cgrp));
> > +     if (cgrp == NULL)
> > +             return NULL;
> > +
> > +     cgrp->name = strdup(path);
> > +     if (cgrp->name == NULL) {
> > +             free(cgrp);
> > +             return NULL;
> > +     }
> > +
> > +     cgrp->fd = -1;
> > +     cgrp->id = id;
> > +     refcount_set(&cgrp->refcnt, 1);
> > +
> > +     rb_link_node(&cgrp->node, parent, p);
> > +     rb_insert_color(&cgrp->node, &cgroup_tree);
> > +
> > +     return cgrp;
> > +}
> > +
> > +struct cgroup *cgroup__find_by_path(const char *path)
> > +{
> > +     struct rb_node *node;
> > +
> > +     node = rb_first(&cgroup_tree);
> > +     while (node) {
> > +             struct cgroup *cgrp = rb_entry(node, struct cgroup, node);
> > +
> > +             if (!strcmp(cgrp->name, path))
> > +                     return cgrp;
>
> you have it sorted on ids, but only search by path,
> why don't we sort it on path right away?

No, actually it only used cgroup__findnew() not __find_by_path().
I don't remember why I added this - will remove..

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-08 22:08   ` Jiri Olsa
@ 2020-01-09  5:19     ` Namhyung Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-01-09  5:19 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 9, 2020 at 7:09 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:
> > diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> > index 4e8ef1db0c94..5147d22b3bda 100644
> > --- a/tools/perf/util/cgroup.c
> > +++ b/tools/perf/util/cgroup.c
> > @@ -15,8 +15,7 @@ int nr_cgroups;
> >
> >  static struct rb_root cgroup_tree = RB_ROOT;
> >
> > -static int
> > -cgroupfs_find_mountpoint(char *buf, size_t maxlen)
> > +int cgroupfs_find_mountpoint(char *buf, size_t maxlen)
>
> out of scope of this change, but could this be added to api/fs/fs.c?
> it might need more checks then is currently supported, but would be
> nice to have it under same api as the rest

Makes sense, will try to move it.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-08 22:18   ` Jiri Olsa
@ 2020-01-09  7:50     ` Namhyung Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-01-09  7:50 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 9, 2020 at 7:18 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:
>
> SNIP
>
> > +     if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
> > +             pr_debug("process synth event failed\n");
> > +             return -1;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
> > +                                     union perf_event *event,
> > +                                     char *path, size_t mount_len,
> > +                                     perf_event__handler_t process,
> > +                                     struct machine *machine)
> > +{
> > +     size_t pos = strlen(path);
> > +     DIR *d;
> > +     struct dirent *dent;
> > +     int ret = 0;
> > +
> > +     if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
> > +                                       process, machine) < 0)
> > +             return -1;
> > +
> > +     d = opendir(path);
> > +     if (d == NULL) {
> > +             pr_debug("failed to open directory: %s\n", path);
> > +             return -1;
> > +     }
> > +
> > +     while ((dent = readdir(d)) != NULL) {
> > +             if (dent->d_type != DT_DIR)
> > +                     continue;
> > +             if (!strcmp(dent->d_name, ".") ||
> > +                 !strcmp(dent->d_name, ".."))
> > +                     continue;
> > +
> > +             if (path[pos - 1] != '/')
> > +                     strcat(path, "/");
> > +             strcat(path, dent->d_name);
>
> shouldn't you also check for the max size of path in here?

I guess a path should be shorter than PATH_MAX.
But yes, I can add a check if you want.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-08 22:21   ` Jiri Olsa
@ 2020-01-09  7:51     ` Namhyung Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-01-09  7:51 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 9, 2020 at 7:22 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:
>
> SNIP
>
> > +     closedir(d);
> > +     return ret;
> > +}
> > +
> > +int perf_event__synthesize_cgroups(struct perf_tool *tool,
> > +                                perf_event__handler_t process,
> > +                                struct machine *machine)
> > +{
> > +     union perf_event event;
> > +     char *cgrp_root;
> > +     size_t mount_len;  /* length of mount point in the path */
> > +     int ret = -1;
> > +
> > +     cgrp_root = malloc(PATH_MAX);
> > +     if (cgrp_root == NULL)
> > +             return -1;
> > +
>
> hum, we normally use bufs with PATH_MAX size on stack..
> is there some reason to use heap in here?

No specific reason, will change.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 8/9] perf top: Add --all-cgroups option
  2020-01-08 22:24   ` Jiri Olsa
@ 2020-01-09  7:55     ` Namhyung Kim
  2020-01-09  8:18       ` Jiri Olsa
  0 siblings, 1 reply; 33+ messages in thread
From: Namhyung Kim @ 2020-01-09  7:55 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 9, 2020 at 7:25 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:35:00PM +0900, Namhyung Kim wrote:
> > The --all-cgroups option is to enable cgroup profiling support.  It
> > tells kernel to record CGROUP events in the ring buffer so that perf
> > report can identify task/cgroup association later.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/Documentation/perf-top.txt | 4 ++++
> >  tools/perf/builtin-top.c              | 9 +++++++++
> >  2 files changed, 13 insertions(+)
> >
> > diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
> > index 5596129a71cf..c75507f50071 100644
> > --- a/tools/perf/Documentation/perf-top.txt
> > +++ b/tools/perf/Documentation/perf-top.txt
> > @@ -266,6 +266,10 @@ Default is to monitor all CPUS.
> >       Record events of type PERF_RECORD_NAMESPACES and display it with the
> >       'cgroup_id' sort key.
> >
> > +--cgroup::
> > +     Record events of type PERF_RECORD_CGROUP and display it with the
> > +     'cgroup' sort key.
>
> should be '--all-cgroups' ?

Oops, you're right.

>
> anyway, we dont have '--cgroups', why not use just this?

I chose it for consistency since perf record has '--cgroup' option.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 8/9] perf top: Add --all-cgroups option
  2020-01-09  7:55     ` Namhyung Kim
@ 2020-01-09  8:18       ` Jiri Olsa
  0 siblings, 0 replies; 33+ messages in thread
From: Jiri Olsa @ 2020-01-09  8:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 09, 2020 at 04:55:21PM +0900, Namhyung Kim wrote:
> On Thu, Jan 9, 2020 at 7:25 AM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Tue, Jan 07, 2020 at 10:35:00PM +0900, Namhyung Kim wrote:
> > > The --all-cgroups option is to enable cgroup profiling support.  It
> > > tells kernel to record CGROUP events in the ring buffer so that perf
> > > report can identify task/cgroup association later.
> > >
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > ---
> > >  tools/perf/Documentation/perf-top.txt | 4 ++++
> > >  tools/perf/builtin-top.c              | 9 +++++++++
> > >  2 files changed, 13 insertions(+)
> > >
> > > diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
> > > index 5596129a71cf..c75507f50071 100644
> > > --- a/tools/perf/Documentation/perf-top.txt
> > > +++ b/tools/perf/Documentation/perf-top.txt
> > > @@ -266,6 +266,10 @@ Default is to monitor all CPUS.
> > >       Record events of type PERF_RECORD_NAMESPACES and display it with the
> > >       'cgroup_id' sort key.
> > >
> > > +--cgroup::
> > > +     Record events of type PERF_RECORD_CGROUP and display it with the
> > > +     'cgroup' sort key.
> >
> > should be '--all-cgroups' ?
> 
> Oops, you're right.
> 
> >
> > anyway, we dont have '--cgroups', why not use just this?
> 
> I chose it for consistency since perf record has '--cgroup' option.

I see.. and we have that substring recognition so I suppose --cgroups
would screw the existing --cgroup option, ok

jirka


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-09  0:51     ` Namhyung Kim
@ 2020-01-20 13:57       ` Namhyung Kim
  2020-01-21  9:42         ` Jiri Olsa
  0 siblings, 1 reply; 33+ messages in thread
From: Namhyung Kim @ 2020-01-20 13:57 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

Hi Jiri,

On Thu, Jan 9, 2020 at 9:51 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Jiri,
>
> On Thu, Jan 9, 2020 at 6:52 AM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Tue, Jan 07, 2020 at 10:34:56PM +0900, Namhyung Kim wrote:
> > > Each cgroup is kept in the global cgroup_tree sorted by the cgroup id.
> > > Hist entries have cgroup id can compare it directly and later it can
> > > be used to find a group name using this tree.
> > >
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > ---
> > >  tools/perf/util/cgroup.c  | 72 +++++++++++++++++++++++++++++++++++++++
> > >  tools/perf/util/cgroup.h  | 15 +++++---
> > >  tools/perf/util/machine.c |  7 ++++
> > >  tools/perf/util/session.c |  4 +++
> > >  4 files changed, 94 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> > > index 4881d4af3381..4e8ef1db0c94 100644
> > > --- a/tools/perf/util/cgroup.c
> > > +++ b/tools/perf/util/cgroup.c
> > > @@ -13,6 +13,8 @@
> > >
> > >  int nr_cgroups;
> > >
> > > +static struct rb_root cgroup_tree = RB_ROOT;
> >
> > I think we shoud carry that in 'struct perf_env'
>
> OK, will move.

So I tried this but then realized that it's hard to get the perf_env later
when it needs to convert a cgroup id to name (ie. in sort_entry.se_snprintf).
I also checked maybe I can resolve it when a hist entry is added,
but it doesn't have the pointer there too.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-20 13:57       ` Namhyung Kim
@ 2020-01-21  9:42         ` Jiri Olsa
  2020-02-17 15:46           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 33+ messages in thread
From: Jiri Olsa @ 2020-01-21  9:42 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Mon, Jan 20, 2020 at 10:57:47PM +0900, Namhyung Kim wrote:
> Hi Jiri,
> 
> On Thu, Jan 9, 2020 at 9:51 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Jiri,
> >
> > On Thu, Jan 9, 2020 at 6:52 AM Jiri Olsa <jolsa@redhat.com> wrote:
> > >
> > > On Tue, Jan 07, 2020 at 10:34:56PM +0900, Namhyung Kim wrote:
> > > > Each cgroup is kept in the global cgroup_tree sorted by the cgroup id.
> > > > Hist entries have cgroup id can compare it directly and later it can
> > > > be used to find a group name using this tree.
> > > >
> > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > ---
> > > >  tools/perf/util/cgroup.c  | 72 +++++++++++++++++++++++++++++++++++++++
> > > >  tools/perf/util/cgroup.h  | 15 +++++---
> > > >  tools/perf/util/machine.c |  7 ++++
> > > >  tools/perf/util/session.c |  4 +++
> > > >  4 files changed, 94 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> > > > index 4881d4af3381..4e8ef1db0c94 100644
> > > > --- a/tools/perf/util/cgroup.c
> > > > +++ b/tools/perf/util/cgroup.c
> > > > @@ -13,6 +13,8 @@
> > > >
> > > >  int nr_cgroups;
> > > >
> > > > +static struct rb_root cgroup_tree = RB_ROOT;
> > >
> > > I think we shoud carry that in 'struct perf_env'
> >
> > OK, will move.
> 
> So I tried this but then realized that it's hard to get the perf_env later
> when it needs to convert a cgroup id to name (ie. in sort_entry.se_snprintf).
> I also checked maybe I can resolve it when a hist entry is added,
> but it doesn't have the pointer there too.

looks like there might be a path for standard report where hists
are part of evsel object:

  'struct hist_entry' via ->hists to  'struct hists'
  hists_to_evsel(hists) to 'struct evsel'
  'struct evsel' via ->evlist to 'struct evlist'
  and there you have evlist->env ;-)

however I was wondering if we could add 'machine' pointer
to the hist object, that would make that simpler ;-)

not sure about the way.. would be nice if that'd work for
both evsel hists and standalone ones like in c2c

but maybe just some init helper that sets the pointer early on
might do the job

jirka


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-01-21  9:42         ` Jiri Olsa
@ 2020-02-17 15:46           ` Arnaldo Carvalho de Melo
  2020-02-17 19:22             ` Namhyung Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-02-17 15:46 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Namhyung Kim, Ingo Molnar, Peter Zijlstra, Alexander Shishkin,
	Mark Rutland, Stephane Eranian, LKML, linux-perf-users

Em Tue, Jan 21, 2020 at 10:42:46AM +0100, Jiri Olsa escreveu:
> On Mon, Jan 20, 2020 at 10:57:47PM +0900, Namhyung Kim wrote:
> > So I tried this but then realized that it's hard to get the perf_env later
> > when it needs to convert a cgroup id to name (ie. in sort_entry.se_snprintf).
> > I also checked maybe I can resolve it when a hist entry is added,
> > but it doesn't have the pointer there too.
 
> looks like there might be a path for standard report where hists
> are part of evsel object:
> 
>   'struct hist_entry' via ->hists to  'struct hists'
>   hists_to_evsel(hists) to 'struct evsel'
>   'struct evsel' via ->evlist to 'struct evlist'
>   and there you have evlist->env ;-)

Hey, recently I did work that ended up adding a 'struct maps' to 'struct
map_symbol' and then hist_entry::ms has it, enough?

i.e.:

hist_entry::ms->maps->machine

  he->ms->maps->machine
 
> however I was wondering if we could add 'machine' pointer
> to the hist object, that would make that simpler ;-)

Directly from one of its hist_entries? As above?
 
> not sure about the way.. would be nice if that'd work for
> both evsel hists and standalone ones like in c2c
 
> but maybe just some init helper that sets the pointer early on
> might do the job

Sorry for the delay in answering, was travelling/vacations, now taming a
huge mailbox :-\

- Arnaldo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-02-17 15:46           ` Arnaldo Carvalho de Melo
@ 2020-02-17 19:22             ` Namhyung Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-02-17 19:22 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, Alexander Shishkin,
	Mark Rutland, Stephane Eranian, LKML, linux-perf-users

Hi Arnaldo,

On Mon, Feb 17, 2020 at 7:46 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Tue, Jan 21, 2020 at 10:42:46AM +0100, Jiri Olsa escreveu:
> > On Mon, Jan 20, 2020 at 10:57:47PM +0900, Namhyung Kim wrote:
> > > So I tried this but then realized that it's hard to get the perf_env later
> > > when it needs to convert a cgroup id to name (ie. in sort_entry.se_snprintf).
> > > I also checked maybe I can resolve it when a hist entry is added,
> > > but it doesn't have the pointer there too.
>
> > looks like there might be a path for standard report where hists
> > are part of evsel object:
> >
> >   'struct hist_entry' via ->hists to  'struct hists'
> >   hists_to_evsel(hists) to 'struct evsel'
> >   'struct evsel' via ->evlist to 'struct evlist'
> >   and there you have evlist->env ;-)
>
> Hey, recently I did work that ended up adding a 'struct maps' to 'struct
> map_symbol' and then hist_entry::ms has it, enough?
>
> i.e.:
>
> hist_entry::ms->maps->machine
>
>   he->ms->maps->machine

Yep, that's enough for my use.

>
> > however I was wondering if we could add 'machine' pointer
> > to the hist object, that would make that simpler ;-)
>
> Directly from one of its hist_entries? As above?

Right, any path from hist_entry to perf_env would be ok.

>
> > not sure about the way.. would be nice if that'd work for
> > both evsel hists and standalone ones like in c2c
>
> > but maybe just some init helper that sets the pointer early on
> > might do the job
>
> Sorry for the delay in answering, was travelling/vacations, now taming a
> huge mailbox :-\

No problem, I'll send the next version soon.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-04-03 12:36   ` Arnaldo Carvalho de Melo
@ 2020-04-04  1:29     ` Namhyung Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2020-04-04  1:29 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

Hi Arnaldo,

On Fri, Apr 3, 2020 at 9:36 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> Em Wed, Mar 25, 2020 at 09:45:31PM +0900, Namhyung Kim escreveu:
> > Each cgroup is kept in the perf_env's cgroup_tree sorted by the cgroup
> > id.  Hist entries have cgroup id can compare it directly and later it
> > can be used to find a group name using this tree.
>
> This one breaks the 'perf test python' test, I fixed it adding this
> patch before your series:

Thanks a lot for fixing this and taking care of the whole thing!
Namhyung


> From ea3c4ab73cb2ea2960bba6894560b1ef91e69737 Mon Sep 17 00:00:00 2001
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
> Date: Fri, 3 Apr 2020 09:29:52 -0300
> Subject: [PATCH 1/1] perf python: Include rwsem.c in the pythong biding
>
> We'll need it for the cgroup patches, and its better to have it in a
> separate patch in case we need to later revert the cgroup patches.
>
> I.e. without this we have:
>
>   [root@five ~]# perf test -v python
>   19: 'import perf' in python                               :
>   --- start ---
>   test child forked, pid 148447
>   Traceback (most recent call last):
>     File "<stdin>", line 1, in <module>
>   ImportError: /tmp/build/perf/python/perf.cpython-37m-x86_64-linux-gnu.so: undefined symbol: down_write
>   test child finished with -1
>   ---- end ----
>   'import perf' in python: FAILED!
>   [root@five ~]#
>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  tools/perf/util/python-ext-sources | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tools/perf/util/python-ext-sources b/tools/perf/util/python-ext-sources
> index e7279ea6043a..a9d9c142eb7c 100644
> --- a/tools/perf/util/python-ext-sources
> +++ b/tools/perf/util/python-ext-sources
> @@ -34,3 +34,4 @@ util/string.c
>  util/symbol_fprintf.c
>  util/units.c
>  util/affinity.c
> +util/rwsem.c
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-03-25 12:45 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
@ 2020-04-03 12:36   ` Arnaldo Carvalho de Melo
  2020-04-04  1:29     ` Namhyung Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-03 12:36 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

Em Wed, Mar 25, 2020 at 09:45:31PM +0900, Namhyung Kim escreveu:
> Each cgroup is kept in the perf_env's cgroup_tree sorted by the cgroup
> id.  Hist entries have cgroup id can compare it directly and later it
> can be used to find a group name using this tree.

This one breaks the 'perf test python' test, I fixed it adding this
patch before your series:


From ea3c4ab73cb2ea2960bba6894560b1ef91e69737 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Fri, 3 Apr 2020 09:29:52 -0300
Subject: [PATCH 1/1] perf python: Include rwsem.c in the pythong biding

We'll need it for the cgroup patches, and its better to have it in a
separate patch in case we need to later revert the cgroup patches.

I.e. without this we have:

  [root@five ~]# perf test -v python
  19: 'import perf' in python                               :
  --- start ---
  test child forked, pid 148447
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  ImportError: /tmp/build/perf/python/perf.cpython-37m-x86_64-linux-gnu.so: undefined symbol: down_write
  test child finished with -1
  ---- end ----
  'import perf' in python: FAILED!
  [root@five ~]#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/python-ext-sources | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/python-ext-sources b/tools/perf/util/python-ext-sources
index e7279ea6043a..a9d9c142eb7c 100644
--- a/tools/perf/util/python-ext-sources
+++ b/tools/perf/util/python-ext-sources
@@ -34,3 +34,4 @@ util/string.c
 util/symbol_fprintf.c
 util/units.c
 util/affinity.c
+util/rwsem.c
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
@ 2020-03-25 12:45 ` Namhyung Kim
  2020-04-03 12:36   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 33+ messages in thread
From: Namhyung Kim @ 2020-03-25 12:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

Each cgroup is kept in the perf_env's cgroup_tree sorted by the cgroup
id.  Hist entries have cgroup id can compare it directly and later it
can be used to find a group name using this tree.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/cgroup.c  | 80 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/cgroup.h  | 17 +++++++--
 tools/perf/util/env.c     |  2 +
 tools/perf/util/env.h     |  6 +++
 tools/perf/util/machine.c |  9 ++++-
 5 files changed, 109 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 5bc9d3b01bd9..b73fb7823048 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -191,3 +191,83 @@ int parse_cgroups(const struct option *opt, const char *str,
 	}
 	return 0;
 }
+
+static struct cgroup *__cgroup__findnew(struct rb_root *root, uint64_t id,
+					bool create, const char *path)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct cgroup *cgrp;
+
+	while (*p != NULL) {
+		parent = *p;
+		cgrp = rb_entry(parent, struct cgroup, node);
+
+		if (cgrp->id == id)
+			return cgrp;
+
+		if (cgrp->id < id)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	if (!create)
+		return NULL;
+
+	cgrp = malloc(sizeof(*cgrp));
+	if (cgrp == NULL)
+		return NULL;
+
+	cgrp->name = strdup(path);
+	if (cgrp->name == NULL) {
+		free(cgrp);
+		return NULL;
+	}
+
+	cgrp->fd = -1;
+	cgrp->id = id;
+	refcount_set(&cgrp->refcnt, 1);
+
+	rb_link_node(&cgrp->node, parent, p);
+	rb_insert_color(&cgrp->node, root);
+
+	return cgrp;
+}
+
+struct cgroup *cgroup__findnew(struct perf_env *env, uint64_t id,
+			       const char *path)
+{
+	struct cgroup *cgrp;
+
+	down_write(&env->cgroups.lock);
+	cgrp = __cgroup__findnew(&env->cgroups.tree, id, true, path);
+	up_write(&env->cgroups.lock);
+	return cgrp;
+}
+
+struct cgroup *cgroup__find(struct perf_env *env, uint64_t id)
+{
+	struct cgroup *cgrp;
+
+	down_read(&env->cgroups.lock);
+	cgrp = __cgroup__findnew(&env->cgroups.tree, id, false, NULL);
+	up_read(&env->cgroups.lock);
+	return cgrp;
+}
+
+void perf_env__purge_cgroups(struct perf_env *env)
+{
+	struct rb_node *node;
+	struct cgroup *cgrp;
+
+	down_write(&env->cgroups.lock);
+	while (!RB_EMPTY_ROOT(&env->cgroups.tree)) {
+		node = rb_first(&env->cgroups.tree);
+		cgrp = rb_entry(node, struct cgroup, node);
+
+		rb_erase(node, &env->cgroups.tree);
+		cgroup__put(cgrp);
+	}
+	up_write(&env->cgroups.lock);
+}
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 2ec11f01090d..e98d5975fe55 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -3,16 +3,19 @@
 #define __CGROUP_H__
 
 #include <linux/refcount.h>
+#include <linux/rbtree.h>
+#include "util/env.h"
 
 struct option;
 
 struct cgroup {
-	char *name;
-	int fd;
-	refcount_t refcnt;
+	struct rb_node		node;
+	u64			id;
+	char			*name;
+	int			fd;
+	refcount_t		refcnt;
 };
 
-
 extern int nr_cgroups; /* number of explicit cgroups defined */
 
 struct cgroup *cgroup__get(struct cgroup *cgroup);
@@ -26,4 +29,10 @@ void evlist__set_default_cgroup(struct evlist *evlist, struct cgroup *cgroup);
 
 int parse_cgroups(const struct option *opt, const char *str, int unset);
 
+struct cgroup *cgroup__findnew(struct perf_env *env, uint64_t id,
+			       const char *path);
+struct cgroup *cgroup__find(struct perf_env *env, uint64_t id);
+
+void perf_env__purge_cgroups(struct perf_env *env);
+
 #endif /* __CGROUP_H__ */
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 4154f944f474..fadc59708ece 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -6,6 +6,7 @@
 #include <linux/ctype.h>
 #include <linux/zalloc.h>
 #include "bpf-event.h"
+#include "cgroup.h"
 #include <errno.h>
 #include <sys/utsname.h>
 #include <bpf/libbpf.h>
@@ -168,6 +169,7 @@ void perf_env__exit(struct perf_env *env)
 	int i;
 
 	perf_env__purge_bpf(env);
+	perf_env__purge_cgroups(env);
 	zfree(&env->hostname);
 	zfree(&env->os_release);
 	zfree(&env->version);
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 11d05ae3606a..7632075a8792 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -88,6 +88,12 @@ struct perf_env {
 		u32			btfs_cnt;
 	} bpf_progs;
 
+	/* same reason as above (for perf-top) */
+	struct {
+		struct rw_semaphore	lock;
+		struct rb_root		tree;
+	} cgroups;
+
 	/* For fast cpu to numa node lookup via perf_env__numa_node */
 	int			*numa_map;
 	int			 nr_numa_map;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 399b4731b246..97142e9671be 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -33,6 +33,7 @@
 #include "asm/bug.h"
 #include "bpf-event.h"
 #include <internal/lib.h> // page_size
+#include "cgroup.h"
 
 #include <linux/ctype.h>
 #include <symbol/kallsyms.h>
@@ -654,13 +655,19 @@ int machine__process_namespaces_event(struct machine *machine __maybe_unused,
 	return err;
 }
 
-int machine__process_cgroup_event(struct machine *machine __maybe_unused,
+int machine__process_cgroup_event(struct machine *machine,
 				  union perf_event *event,
 				  struct perf_sample *sample __maybe_unused)
 {
+	struct cgroup *cgrp;
+
 	if (dump_trace)
 		perf_event__fprintf_cgroup(event, stdout);
 
+	cgrp = cgroup__findnew(machine->env, event->cgroup.id, event->cgroup.path);
+	if (cgrp == NULL)
+		return -ENOMEM;
+
 	return 0;
 }
 
-- 
2.25.1.696.g5e7596f4ac-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2019-12-23  6:07 [PATCHSET 0/9] perf: Improve cgroup profiling (v3) Namhyung Kim
@ 2019-12-23  6:07 ` Namhyung Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2019-12-23  6:07 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

Each cgroup is kept in the global cgroup_tree sorted by the cgroup id.
Hist entries have cgroup id can compare it directly and later it can
be used to find a group name using this tree.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/cgroup.c  | 72 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/cgroup.h  | 15 +++++---
 tools/perf/util/machine.c |  7 ++++
 tools/perf/util/session.c |  4 +++
 4 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 4881d4af3381..4e8ef1db0c94 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -13,6 +13,8 @@
 
 int nr_cgroups;
 
+static struct rb_root cgroup_tree = RB_ROOT;
+
 static int
 cgroupfs_find_mountpoint(char *buf, size_t maxlen)
 {
@@ -250,3 +252,73 @@ int parse_cgroups(const struct option *opt, const char *str,
 	}
 	return 0;
 }
+
+struct cgroup *cgroup__findnew(uint64_t id, const char *path)
+{
+	struct rb_node **p = &cgroup_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct cgroup *cgrp;
+
+	while (*p != NULL) {
+		parent = *p;
+		cgrp = rb_entry(parent, struct cgroup, node);
+
+		if (cgrp->id == id)
+			return cgrp;
+
+		if (cgrp->id < id)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	cgrp = malloc(sizeof(*cgrp));
+	if (cgrp == NULL)
+		return NULL;
+
+	cgrp->name = strdup(path);
+	if (cgrp->name == NULL) {
+		free(cgrp);
+		return NULL;
+	}
+
+	cgrp->fd = -1;
+	cgrp->id = id;
+	refcount_set(&cgrp->refcnt, 1);
+
+	rb_link_node(&cgrp->node, parent, p);
+	rb_insert_color(&cgrp->node, &cgroup_tree);
+
+	return cgrp;
+}
+
+struct cgroup *cgroup__find_by_path(const char *path)
+{
+	struct rb_node *node;
+
+	node = rb_first(&cgroup_tree);
+	while (node) {
+		struct cgroup *cgrp = rb_entry(node, struct cgroup, node);
+
+		if (!strcmp(cgrp->name, path))
+			return cgrp;
+
+		node = rb_next(&cgrp->node);
+	}
+
+	return NULL;
+}
+
+void destroy_cgroups(void)
+{
+	struct rb_node *node;
+	struct cgroup *cgrp;
+
+	while (!RB_EMPTY_ROOT(&cgroup_tree)) {
+		node = rb_first(&cgroup_tree);
+		cgrp = rb_entry(node, struct cgroup, node);
+
+		rb_erase(node, &cgroup_tree);
+		cgroup__put(cgrp);
+	}
+}
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 2ec11f01090d..381583df27c7 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -3,16 +3,18 @@
 #define __CGROUP_H__
 
 #include <linux/refcount.h>
+#include <linux/rbtree.h>
 
 struct option;
 
 struct cgroup {
-	char *name;
-	int fd;
-	refcount_t refcnt;
+	struct rb_node		node;
+	u64			id;
+	char			*name;
+	int			fd;
+	refcount_t		refcnt;
 };
 
-
 extern int nr_cgroups; /* number of explicit cgroups defined */
 
 struct cgroup *cgroup__get(struct cgroup *cgroup);
@@ -26,4 +28,9 @@ void evlist__set_default_cgroup(struct evlist *evlist, struct cgroup *cgroup);
 
 int parse_cgroups(const struct option *opt, const char *str, int unset);
 
+struct cgroup *cgroup__findnew(uint64_t id, const char *path);
+struct cgroup *cgroup__find_by_path(const char *path);
+
+void destroy_cgroups(void);
+
 #endif /* __CGROUP_H__ */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 2c3223bec561..19d40a2016d7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -33,6 +33,7 @@
 #include "asm/bug.h"
 #include "bpf-event.h"
 #include <internal/lib.h> // page_size
+#include "cgroup.h"
 
 #include <linux/ctype.h>
 #include <symbol/kallsyms.h>
@@ -658,9 +659,15 @@ int machine__process_cgroup_event(struct machine *machine __maybe_unused,
 				  union perf_event *event,
 				  struct perf_sample *sample __maybe_unused)
 {
+	struct cgroup *cgrp;
+
 	if (dump_trace)
 		perf_event__fprintf_cgroup(event, stdout);
 
+	cgrp = cgroup__findnew(event->cgroup.id, event->cgroup.path);
+	if (cgrp == NULL)
+		return -ENOMEM;
+
 	return 0;
 }
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 6b4c12d48c3f..5e2c9f504184 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -289,6 +289,8 @@ static void perf_session__release_decomp_events(struct perf_session *session)
 	} while (1);
 }
 
+extern void destroy_cgroups(void);
+
 void perf_session__delete(struct perf_session *session)
 {
 	if (session == NULL)
@@ -303,6 +305,8 @@ void perf_session__delete(struct perf_session *session)
 	if (session->data)
 		perf_data__close(session->data);
 	free(session);
+
+	destroy_cgroups();
 }
 
 static int process_event_synth_tracing_data_stub(struct perf_session *session
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2019-12-20  4:32 [PATCHSET 0/9] perf: Improve cgroup profiling (v2) Namhyung Kim
@ 2019-12-20  4:32 ` Namhyung Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2019-12-20  4:32 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

Each cgroup is kept in the global cgroup_tree sorted by the inode
number.  Hist entries have cgroup ino number can compare it directly
and later it can be used to find a group name using this tree.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/cgroup.c  | 72 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/cgroup.h  | 15 +++++---
 tools/perf/util/machine.c |  7 ++++
 tools/perf/util/session.c |  4 +++
 4 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 4881d4af3381..99d93f31732d 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -13,6 +13,8 @@
 
 int nr_cgroups;
 
+static struct rb_root cgroup_tree = RB_ROOT;
+
 static int
 cgroupfs_find_mountpoint(char *buf, size_t maxlen)
 {
@@ -250,3 +252,73 @@ int parse_cgroups(const struct option *opt, const char *str,
 	}
 	return 0;
 }
+
+struct cgroup *cgroup__findnew(uint64_t ino, const char *path)
+{
+	struct rb_node **p = &cgroup_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct cgroup *cgrp;
+
+	while (*p != NULL) {
+		parent = *p;
+		cgrp = rb_entry(parent, struct cgroup, node);
+
+		if (cgrp->ino == ino)
+			return cgrp;
+
+		if (cgrp->ino < ino)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	cgrp = malloc(sizeof(*cgrp));
+	if (cgrp == NULL)
+		return NULL;
+
+	cgrp->name = strdup(path);
+	if (cgrp->name == NULL) {
+		free(cgrp);
+		return NULL;
+	}
+
+	cgrp->fd = -1;
+	cgrp->ino = ino;
+	refcount_set(&cgrp->refcnt, 1);
+
+	rb_link_node(&cgrp->node, parent, p);
+	rb_insert_color(&cgrp->node, &cgroup_tree);
+
+	return cgrp;
+}
+
+struct cgroup *cgroup__find_by_path(const char *path)
+{
+	struct rb_node *node;
+
+	node = rb_first(&cgroup_tree);
+	while (node) {
+		struct cgroup *cgrp = rb_entry(node, struct cgroup, node);
+
+		if (!strcmp(cgrp->name, path))
+			return cgrp;
+
+		node = rb_next(&cgrp->node);
+	}
+
+	return NULL;
+}
+
+void destroy_cgroups(void)
+{
+	struct rb_node *node;
+	struct cgroup *cgrp;
+
+	while (!RB_EMPTY_ROOT(&cgroup_tree)) {
+		node = rb_first(&cgroup_tree);
+		cgrp = rb_entry(node, struct cgroup, node);
+
+		rb_erase(node, &cgroup_tree);
+		cgroup__put(cgrp);
+	}
+}
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 2ec11f01090d..11a8b187ec09 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -3,16 +3,18 @@
 #define __CGROUP_H__
 
 #include <linux/refcount.h>
+#include <linux/rbtree.h>
 
 struct option;
 
 struct cgroup {
-	char *name;
-	int fd;
-	refcount_t refcnt;
+	struct rb_node		node;
+	u64			ino;
+	char			*name;
+	int			fd;
+	refcount_t		refcnt;
 };
 
-
 extern int nr_cgroups; /* number of explicit cgroups defined */
 
 struct cgroup *cgroup__get(struct cgroup *cgroup);
@@ -26,4 +28,9 @@ void evlist__set_default_cgroup(struct evlist *evlist, struct cgroup *cgroup);
 
 int parse_cgroups(const struct option *opt, const char *str, int unset);
 
+struct cgroup *cgroup__findnew(uint64_t ino, const char *path);
+struct cgroup *cgroup__find_by_path(const char *path);
+
+void destroy_cgroups(void);
+
 #endif /* __CGROUP_H__ */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 2c3223bec561..19d40a2016d7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -33,6 +33,7 @@
 #include "asm/bug.h"
 #include "bpf-event.h"
 #include <internal/lib.h> // page_size
+#include "cgroup.h"
 
 #include <linux/ctype.h>
 #include <symbol/kallsyms.h>
@@ -658,9 +659,15 @@ int machine__process_cgroup_event(struct machine *machine __maybe_unused,
 				  union perf_event *event,
 				  struct perf_sample *sample __maybe_unused)
 {
+	struct cgroup *cgrp;
+
 	if (dump_trace)
 		perf_event__fprintf_cgroup(event, stdout);
 
+	cgrp = cgroup__findnew(event->cgroup.id, event->cgroup.path);
+	if (cgrp == NULL)
+		return -ENOMEM;
+
 	return 0;
 }
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 6b4c12d48c3f..5e2c9f504184 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -289,6 +289,8 @@ static void perf_session__release_decomp_events(struct perf_session *session)
 	} while (1);
 }
 
+extern void destroy_cgroups(void);
+
 void perf_session__delete(struct perf_session *session)
 {
 	if (session == NULL)
@@ -303,6 +305,8 @@ void perf_session__delete(struct perf_session *session)
 	if (session->data)
 		perf_data__close(session->data);
 	free(session);
+
+	destroy_cgroups();
 }
 
 static int process_event_synth_tracing_data_stub(struct perf_session *session
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2019-08-28  7:31 [PATCHSET 0/9] perf: Improve cgroup profiling (v1) Namhyung Kim
@ 2019-08-28  7:31 ` Namhyung Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Namhyung Kim @ 2019-08-28  7:31 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: LKML, Jiri Olsa, Alexander Shishkin, Stephane Eranian

Each cgroup is kept in the global cgroup_tree sorted by the inode
number.  Hist entries have cgroup ino number can compare it directly
and later it can be used to find a group name using this tree.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/cgroup.c  | 72 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/cgroup.h  | 15 +++++---
 tools/perf/util/machine.c |  7 ++++
 tools/perf/util/session.c |  4 +++
 4 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index f73599f271ff..8e4c26ea5078 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -12,6 +12,8 @@
 
 int nr_cgroups;
 
+static struct rb_root cgroup_tree = RB_ROOT;
+
 static int
 cgroupfs_find_mountpoint(char *buf, size_t maxlen)
 {
@@ -249,3 +251,73 @@ int parse_cgroups(const struct option *opt, const char *str,
 	}
 	return 0;
 }
+
+struct cgroup *cgroup__findnew(uint64_t ino, const char *path)
+{
+	struct rb_node **p = &cgroup_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct cgroup *cgrp;
+
+	while (*p != NULL) {
+		parent = *p;
+		cgrp = rb_entry(parent, struct cgroup, node);
+
+		if (cgrp->ino == ino)
+			return cgrp;
+
+		if (cgrp->ino < ino)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	cgrp = malloc(sizeof(*cgrp));
+	if (cgrp == NULL)
+		return NULL;
+
+	cgrp->name = strdup(path);
+	if (cgrp->name == NULL) {
+		free(cgrp);
+		return NULL;
+	}
+
+	cgrp->fd = -1;
+	cgrp->ino = ino;
+	refcount_set(&cgrp->refcnt, 1);
+
+	rb_link_node(&cgrp->node, parent, p);
+	rb_insert_color(&cgrp->node, &cgroup_tree);
+
+	return cgrp;
+}
+
+struct cgroup *cgroup__find_by_path(const char *path)
+{
+	struct rb_node *node;
+
+	node = rb_first(&cgroup_tree);
+	while (node) {
+		struct cgroup *cgrp = rb_entry(node, struct cgroup, node);
+
+		if (!strcmp(cgrp->name, path))
+			return cgrp;
+
+		node = rb_next(&cgrp->node);
+	}
+
+	return NULL;
+}
+
+void destroy_cgroups(void)
+{
+	struct rb_node *node;
+	struct cgroup *cgrp;
+
+	while (!RB_EMPTY_ROOT(&cgroup_tree)) {
+		node = rb_first(&cgroup_tree);
+		cgrp = rb_entry(node, struct cgroup, node);
+
+		rb_erase(node, &cgroup_tree);
+		cgroup__put(cgrp);
+	}
+}
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 2ec11f01090d..11a8b187ec09 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -3,16 +3,18 @@
 #define __CGROUP_H__
 
 #include <linux/refcount.h>
+#include <linux/rbtree.h>
 
 struct option;
 
 struct cgroup {
-	char *name;
-	int fd;
-	refcount_t refcnt;
+	struct rb_node		node;
+	u64			ino;
+	char			*name;
+	int			fd;
+	refcount_t		refcnt;
 };
 
-
 extern int nr_cgroups; /* number of explicit cgroups defined */
 
 struct cgroup *cgroup__get(struct cgroup *cgroup);
@@ -26,4 +28,9 @@ void evlist__set_default_cgroup(struct evlist *evlist, struct cgroup *cgroup);
 
 int parse_cgroups(const struct option *opt, const char *str, int unset);
 
+struct cgroup *cgroup__findnew(uint64_t ino, const char *path);
+struct cgroup *cgroup__find_by_path(const char *path);
+
+void destroy_cgroups(void);
+
 #endif /* __CGROUP_H__ */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 61c35eef616b..33554e745e6b 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -26,6 +26,7 @@
 #include "linux/hash.h"
 #include "asm/bug.h"
 #include "bpf-event.h"
+#include "cgroup.h"
 
 #include <linux/ctype.h>
 #include <symbol/kallsyms.h>
@@ -646,9 +647,15 @@ int machine__process_cgroup_event(struct machine *machine __maybe_unused,
 				  union perf_event *event,
 				  struct perf_sample *sample __maybe_unused)
 {
+	struct cgroup *cgrp;
+
 	if (dump_trace)
 		perf_event__fprintf_cgroup(event, stdout);
 
+	cgrp = cgroup__findnew(event->cgroup.ino, event->cgroup.path);
+	if (cgrp == NULL)
+		return -ENOMEM;
+
 	return 0;
 }
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 2cdce7ee228c..ffdd956d0a89 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -275,6 +275,8 @@ static void perf_session__release_decomp_events(struct perf_session *session)
 	} while (1);
 }
 
+extern void destroy_cgroups(void);
+
 void perf_session__delete(struct perf_session *session)
 {
 	if (session == NULL)
@@ -289,6 +291,8 @@ void perf_session__delete(struct perf_session *session)
 	if (session->data)
 		perf_data__close(session->data);
 	free(session);
+
+	destroy_cgroups();
 }
 
 static int process_event_synth_tracing_data_stub(struct perf_session *session
-- 
2.23.0.187.g17f5b7556c-goog


^ permalink raw reply related	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2020-04-04  1:29 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
2020-01-07 13:34 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
2020-01-07 13:34 ` [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature Namhyung Kim
2020-01-07 13:34 ` [PATCH 3/9] perf tools: Basic support for CGROUP event Namhyung Kim
2020-01-07 13:34 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
2020-01-08 21:52   ` Jiri Olsa
2020-01-09  0:51     ` Namhyung Kim
2020-01-20 13:57       ` Namhyung Kim
2020-01-21  9:42         ` Jiri Olsa
2020-02-17 15:46           ` Arnaldo Carvalho de Melo
2020-02-17 19:22             ` Namhyung Kim
2020-01-08 22:01   ` Jiri Olsa
2020-01-09  0:55     ` Namhyung Kim
2020-01-07 13:34 ` [PATCH 5/9] perf report: Add 'cgroup' sort key Namhyung Kim
2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
2020-01-08 22:08   ` Jiri Olsa
2020-01-09  5:19     ` Namhyung Kim
2020-01-08 22:18   ` Jiri Olsa
2020-01-09  7:50     ` Namhyung Kim
2020-01-08 22:21   ` Jiri Olsa
2020-01-09  7:51     ` Namhyung Kim
2020-01-07 13:34 ` [PATCH 7/9] perf record: Add --all-cgroups option Namhyung Kim
2020-01-07 13:35 ` [PATCH 8/9] perf top: " Namhyung Kim
2020-01-08 22:24   ` Jiri Olsa
2020-01-09  7:55     ` Namhyung Kim
2020-01-09  8:18       ` Jiri Olsa
2020-01-07 13:35 ` [PATCH 9/9] perf script: Add --show-cgroup-events option Namhyung Kim
  -- strict thread matches above, loose matches on Subject: below --
2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
2020-03-25 12:45 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
2020-04-03 12:36   ` Arnaldo Carvalho de Melo
2020-04-04  1:29     ` Namhyung Kim
2019-12-23  6:07 [PATCHSET 0/9] perf: Improve cgroup profiling (v3) Namhyung Kim
2019-12-23  6:07 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
2019-12-20  4:32 [PATCHSET 0/9] perf: Improve cgroup profiling (v2) Namhyung Kim
2019-12-20  4:32 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
2019-08-28  7:31 [PATCHSET 0/9] perf: Improve cgroup profiling (v1) Namhyung Kim
2019-08-28  7:31 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.