linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET 0/9] perf: Improve cgroup profiling (v6)
@ 2020-03-25 12:45 Namhyung Kim
  2020-03-25 12:45 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
                   ` (9 more replies)
  0 siblings, 10 replies; 46+ messages in thread
From: Namhyung Kim @ 2020-03-25 12:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter

Hello,

This work is to improve cgroup profiling in perf.  Currently it only
supports profiling tasks in a specific cgroup and there's no way to
identify which cgroup the current sample belongs to.  So I added
PERF_SAMPLE_CGROUP to add cgroup id into each sample.  It's a 64-bit
integer having file handle of the cgroup.  And kernel also generates
PERF_RECORD_CGROUP event for new groups to correlate the cgroup id and
cgroup name (path in the cgroup filesystem).  The cgroup id can be
read from userspace by name_to_handle_at() system call so it can
synthesize the CGROUP event for existing groups.

So why do we want this?  Systems running a large number of jobs in
different cgroups want to profiling such jobs precisely. This includes
container hosting systems widely used today. Currently perf supports
namespace tracking but the systems may not use (cgroup) namespace for
their jobs. Also it'd be more intuitive to see cgroup names (as
they're given by user or sysadmin) rather than numeric
cgroup/namespace id even if they use the namespaces.

From Stephane Eranian:
> In data centers you care about attributing samples to a job not such
> much to a process.  A job may have multiple processes which may come
> and go. The cgroup on the other hand stays around for the entire
> lifetime of the job. It is much easier to map a cgroup name to a
> particular job than it is to map a pid back to a job name,
> especially for offline post-processing.

Note that this only works for "perf_event" cgroups (obviously) so if
users are still using cgroup-v1 interface, they need to have same
hierarchy for subsystem(s) want to profile with it.

 * Changes from v5:
  - add ACKs for the kernel changes
  - fix sort column alignment when no cgroup is available

 * Changes from v4:
  - use CONFIG_CGROUP_PERF
  - move cgroup tree to perf_env
  - move cgroup fs utility function to tools/lib/api/fs
  - use a local buffer and check its size for cgroup systhesis

 * Changes from v3:
  - staticize perf_event_cgroup()
  - fix sample parsing test

 * Changes from v2:
  - remove path_len from cgroup_event
  - bail out if kernel doesn't support cgroup sampling
  - add some description in the Kconfig

 * Changes from v1:
  - use new cgroup id (= file handle)

The code is available at perf/cgroup-v6 branch in

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


The testing result looks something like this:

  [root@qemu build]# ./perf record --all-cgroups ./cgtest
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.009 MB perf.data (150 samples) ]
  
  [root@qemu build]# ./perf report -s cgroup,comm --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 150  of event 'cpu-clock:pppH'
  # Event count (approx.): 37500000
  #
  # Overhead  cgroup      Command
  # ........  ..........  .......
  #
      32.00%  /sub/cgrp2  looper2
      28.00%  /sub/cgrp1  looper1
      25.33%  /sub        looper0
       4.00%  /           cgtest 
       4.00%  /sub        cgtest 
       3.33%  /sub/cgrp2  cgtest 
       2.67%  /sub/cgrp1  cgtest 
       0.67%  /           looper0


The test program (cgtest) follows.

Thanks,
Namhyung


Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>


-------8<-----------------------------------------8<----------------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <sys/prctl.h>
#include <sys/mount.h>

char cgbase[] = "/sys/fs/cgroup/perf_event";

void mkcgrp(char *name) {
  char buf[256];

  snprintf(buf, sizeof(buf), "%s%s", cgbase, name);
  if (mkdir(buf, 0755) < 0)
    perror("mkdir");
}

void rmcgrp(char *name) {
  char buf[256];

  snprintf(buf, sizeof(buf), "%s%s", cgbase, name);
  if (rmdir(buf) < 0)
    perror("rmdir");
}

void setcgrp(char *name) {
  char buf[256];
  int fd;

  snprintf(buf, sizeof(buf), "%s%s/tasks", cgbase, name);

  fd = open(buf, O_WRONLY);
  if (fd > 0) {
    if (write(fd, "0\n", 2) != 2)
      perror("write");
    close(fd);
  }
}

void create_sub_cgroup(int idx) {
  char buf[128];

  snprintf(buf, sizeof(buf), "/sub/cgrp%d", idx);
  mkcgrp(buf);
}

void remove_sub_cgroup(int idx) {
  char buf[128];

  snprintf(buf, sizeof(buf), "/sub/cgrp%d", idx);
  rmcgrp(buf);
}

void set_sub_cgroup(int idx) {
  char buf[128];

  snprintf(buf, sizeof(buf), "/sub/cgrp%d", idx);
  setcgrp(buf);
}

void set_task_name(int idx) {
  char buf[16];

  snprintf(buf, sizeof(buf), "looper%d", idx);
  prctl(PR_SET_NAME, buf, 0, 0, 0);
}

void loop(unsigned max) {
  volatile unsigned int count = 1;

  while (count++ != max) {
    asm volatile ("pause");
  }
}

void worker(int idx, unsigned cnt, int new_ns) {
  int oldns;

  create_sub_cgroup(idx);
  set_sub_cgroup(idx);

  if (new_ns) {
    if (unshare(CLONE_NEWCGROUP) < 0)
      perror("unshare");

#if 0  /* FIXME */
    if (unshare(CLONE_NEWNS) < 0)
      perror("unshare");

    if (mount("none", "/sys", NULL, MS_REMOUNT | MS_REC | MS_SLAVE, NULL) < 0)
      perror("mount --make-rslave");

    sleep(1);
    if (umount("/sys/fs/cgroup/perf_event") < 0)
      perror("umount");

    if (mount("cgroup", "/sys/fs/cgroup/perf_event", "cgroup",
              MS_NODEV | MS_NOEXEC | MS_NOSUID, "perf_event") < 0)
      perror("mount again");
#endif
  }

  if (fork() == 0) {
    set_task_name(idx);
    loop(cnt);
    exit(0);
  }
  wait(NULL);
}

int main(int argc, char *argv[])
{
  int i, nr = 2;
  int new_ns = 1;
  unsigned cnt = 1000000;
  int fd;

  if (argc > 1)
    nr = atoi(argv[1]);
  if (argc > 2)
    cnt = atoi(argv[2]);
  if (argc > 3)
    new_ns = atoi(argv[3]);

  mkcgrp("/sub");
  setcgrp("/sub");

  for (i = 0; i < nr; i++) {
    if (fork() == 0) {
      worker(i+1, cnt, new_ns);
      exit(0);
    }
  }

  set_task_name(0);
  loop(cnt);

  for (i = 0; i < nr; i++)
    wait(NULL);

  for (i = 0; i < nr; i++)
    remove_sub_cgroup(i+1);

  setcgrp("/");
  rmcgrp("/sub");

  return 0;
}
-------8<-----------------------------------------8<----------------


Namhyung Kim (9):
  perf/core: Add PERF_RECORD_CGROUP event
  perf/core: Add PERF_SAMPLE_CGROUP feature
  perf tools: Basic support for CGROUP event
  perf tools: Maintain cgroup hierarchy
  perf report: Add 'cgroup' sort key
  perf record: Support synthesizing cgroup events
  perf record: Add --all-cgroups option
  perf top: Add --all-cgroups option
  perf script: Add --show-cgroup-events option

 include/linux/perf_event.h                |   1 +
 include/uapi/linux/perf_event.h           |  16 ++-
 init/Kconfig                              |   3 +-
 kernel/events/core.c                      | 133 ++++++++++++++++++++++
 tools/include/uapi/linux/perf_event.h     |  16 ++-
 tools/lib/perf/include/perf/event.h       |   7 ++
 tools/perf/Documentation/perf-record.txt  |   5 +-
 tools/perf/Documentation/perf-report.txt  |   1 +
 tools/perf/Documentation/perf-script.txt  |   3 +
 tools/perf/Documentation/perf-top.txt     |   4 +
 tools/perf/builtin-diff.c                 |   1 +
 tools/perf/builtin-record.c               |  10 ++
 tools/perf/builtin-report.c               |   1 +
 tools/perf/builtin-script.c               |  41 +++++++
 tools/perf/builtin-top.c                  |   9 ++
 tools/perf/tests/sample-parsing.c         |   6 +-
 tools/perf/util/cgroup.c                  |  80 +++++++++++++
 tools/perf/util/cgroup.h                  |  17 ++-
 tools/perf/util/env.c                     |   2 +
 tools/perf/util/env.h                     |   6 +
 tools/perf/util/event.c                   |  18 +++
 tools/perf/util/event.h                   |   6 +
 tools/perf/util/evsel.c                   |  17 ++-
 tools/perf/util/evsel.h                   |   1 +
 tools/perf/util/hist.c                    |  13 +++
 tools/perf/util/hist.h                    |   1 +
 tools/perf/util/machine.c                 |  19 ++++
 tools/perf/util/machine.h                 |   3 +
 tools/perf/util/perf_event_attr_fprintf.c |   2 +
 tools/perf/util/record.h                  |   1 +
 tools/perf/util/session.c                 |   4 +
 tools/perf/util/sort.c                    |  37 ++++++
 tools/perf/util/sort.h                    |   2 +
 tools/perf/util/synthetic-events.c        | 121 ++++++++++++++++++++
 tools/perf/util/synthetic-events.h        |   1 +
 tools/perf/util/tool.h                    |   2 +
 36 files changed, 598 insertions(+), 12 deletions(-)

-- 
2.25.1.696.g5e7596f4ac-goog


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event
  2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
@ 2020-03-25 12:45 ` Namhyung Kim
  2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
  2020-03-25 12:45 ` [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature Namhyung Kim
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 46+ messages in thread
From: Namhyung Kim @ 2020-03-25 12:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin, Li Zefan, Johannes Weiner, Adrian Hunter,
	kbuild test robot, Tejun Heo, Peter Zijlstra

To support cgroup tracking, add CGROUP event to save a link between
cgroup path and id number.  This is needed since cgroups can go away
when userspace tries to read the cgroup info (from the id) later.
The attr.cgroup bit was also added to enable cgroup tracking from
userspace.

This event will be generated when a new cgroup becomes active.
Userspace might need to synthesize those events for existing cgroups.

Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
[staticize perf_event_cgroup function]
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/uapi/linux/perf_event.h |  13 +++-
 kernel/events/core.c            | 111 ++++++++++++++++++++++++++++++++
 2 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 397cfd65b3fe..de95f6c7b273 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -381,7 +381,8 @@ struct perf_event_attr {
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
 				aux_output     :  1, /* generate AUX records instead of events */
-				__reserved_1   : 32;
+				cgroup         :  1, /* include cgroup events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -1012,6 +1013,16 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				id;
+	 *	char				path[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CGROUP			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d22e4ba59dfa..994932d5e474 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -387,6 +387,7 @@ static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
 static atomic_t nr_ksymbol_events __read_mostly;
 static atomic_t nr_bpf_events __read_mostly;
+static atomic_t nr_cgroup_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -4608,6 +4609,8 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_dec(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_dec(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
@@ -7735,6 +7738,105 @@ void perf_event_namespaces(struct task_struct *task)
 			NULL);
 }
 
+/*
+ * cgroup tracking
+ */
+#ifdef CONFIG_CGROUP_PERF
+
+struct perf_cgroup_event {
+	char				*path;
+	int				path_size;
+	struct {
+		struct perf_event_header	header;
+		u64				id;
+		char				path[];
+	} event_id;
+};
+
+static int perf_event_cgroup_match(struct perf_event *event)
+{
+	return event->attr.cgroup;
+}
+
+static void perf_event_cgroup_output(struct perf_event *event, void *data)
+{
+	struct perf_cgroup_event *cgroup_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	u16 header_size = cgroup_event->event_id.header.size;
+	int ret;
+
+	if (!perf_event_cgroup_match(event))
+		return;
+
+	perf_event_header__init_id(&cgroup_event->event_id.header,
+				   &sample, event);
+	ret = perf_output_begin(&handle, event,
+				cgroup_event->event_id.header.size);
+	if (ret)
+		goto out;
+
+	perf_output_put(&handle, cgroup_event->event_id);
+	__output_copy(&handle, cgroup_event->path, cgroup_event->path_size);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+out:
+	cgroup_event->event_id.header.size = header_size;
+}
+
+static void perf_event_cgroup(struct cgroup *cgrp)
+{
+	struct perf_cgroup_event cgroup_event;
+	char path_enomem[16] = "//enomem";
+	char *pathname;
+	size_t size;
+
+	if (!atomic_read(&nr_cgroup_events))
+		return;
+
+	cgroup_event = (struct perf_cgroup_event){
+		.event_id  = {
+			.header = {
+				.type = PERF_RECORD_CGROUP,
+				.misc = 0,
+				.size = sizeof(cgroup_event.event_id),
+			},
+			.id = cgroup_id(cgrp),
+		},
+	};
+
+	pathname = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (pathname == NULL) {
+		cgroup_event.path = path_enomem;
+	} else {
+		/* just to be sure to have enough space for alignment */
+		cgroup_path(cgrp, pathname, PATH_MAX - sizeof(u64));
+		cgroup_event.path = pathname;
+	}
+
+	/*
+	 * Since our buffer works in 8 byte units we need to align our string
+	 * size to a multiple of 8. However, we must guarantee the tail end is
+	 * zero'd out to avoid leaking random bits to userspace.
+	 */
+	size = strlen(cgroup_event.path) + 1;
+	while (!IS_ALIGNED(size, sizeof(u64)))
+		cgroup_event.path[size++] = '\0';
+
+	cgroup_event.event_id.header.size += size;
+	cgroup_event.path_size = size;
+
+	perf_iterate_sb(perf_event_cgroup_output,
+			&cgroup_event,
+			NULL);
+
+	kfree(pathname);
+}
+
+#endif
+
 /*
  * mmap tracking
  */
@@ -10781,6 +10883,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_inc(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_inc(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
 	if (event->attr.freq)
@@ -12757,6 +12861,12 @@ static void perf_cgroup_css_free(struct cgroup_subsys_state *css)
 	kfree(jc);
 }
 
+static int perf_cgroup_css_online(struct cgroup_subsys_state *css)
+{
+	perf_event_cgroup(css->cgroup);
+	return 0;
+}
+
 static int __perf_cgroup_move(void *info)
 {
 	struct task_struct *task = info;
@@ -12778,6 +12888,7 @@ static void perf_cgroup_attach(struct cgroup_taskset *tset)
 struct cgroup_subsys perf_event_cgrp_subsys = {
 	.css_alloc	= perf_cgroup_css_alloc,
 	.css_free	= perf_cgroup_css_free,
+	.css_online	= perf_cgroup_css_online,
 	.attach		= perf_cgroup_attach,
 	/*
 	 * Implicitly enable on dfl hierarchy so that perf events can
-- 
2.25.1.696.g5e7596f4ac-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature
  2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
  2020-03-25 12:45 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
@ 2020-03-25 12:45 ` Namhyung Kim
  2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
  2020-03-25 12:45 ` [PATCH 3/9] perf tools: Basic support for CGROUP event Namhyung Kim
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 46+ messages in thread
From: Namhyung Kim @ 2020-03-25 12:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin, Li Zefan, Johannes Weiner, Tejun Heo,
	Peter Zijlstra

The PERF_SAMPLE_CGROUP bit is to save (perf_event) cgroup information
in the sample.  It will add a 64-bit id to identify current cgroup and
it's the file handle in the cgroup file system.  Userspace should use
this information with PERF_RECORD_CGROUP event to match which cgroup
it belongs.

I put it before PERF_SAMPLE_AUX for simplicity since it just needs a
64-bit word.  But if we want bigger samples, I can work on that
direction too.

Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/linux/perf_event.h      |  1 +
 include/uapi/linux/perf_event.h |  3 ++-
 init/Kconfig                    |  3 ++-
 kernel/events/core.c            | 22 ++++++++++++++++++++++
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8768a39b5258..9c3e7619c929 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1020,6 +1020,7 @@ struct perf_sample_data {
 	u64				stack_user_size;
 
 	u64				phys_addr;
+	u64				cgroup;
 } ____cacheline_aligned;
 
 /* default value for data source */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index de95f6c7b273..7b2d6fc9e6ed 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -142,8 +142,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_REGS_INTR			= 1U << 18,
 	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
 	PERF_SAMPLE_AUX				= 1U << 20,
+	PERF_SAMPLE_CGROUP			= 1U << 21,
 
-	PERF_SAMPLE_MAX = 1U << 21,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 22,		/* non-ABI */
 
 	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
 };
diff --git a/init/Kconfig b/init/Kconfig
index 20a6ac33761c..7766b06a0038 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1027,7 +1027,8 @@ config CGROUP_PERF
 	help
 	  This option extends the perf per-cpu mode to restrict monitoring
 	  to threads which belong to the cgroup specified and run on the
-	  designated cpu.
+	  designated cpu.  Or this can be used to have cgroup ID in samples
+	  so that it can monitor performance events among cgroups.
 
 	  Say N if unsure.
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 994932d5e474..1569979c8912 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1862,6 +1862,9 @@ static void __perf_event_header_size(struct perf_event *event, u64 sample_type)
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		size += sizeof(data->phys_addr);
 
+	if (sample_type & PERF_SAMPLE_CGROUP)
+		size += sizeof(data->cgroup);
+
 	event->header_size = size;
 }
 
@@ -6867,6 +6870,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		perf_output_put(handle, data->phys_addr);
 
+	if (sample_type & PERF_SAMPLE_CGROUP)
+		perf_output_put(handle, data->cgroup);
+
 	if (sample_type & PERF_SAMPLE_AUX) {
 		perf_output_put(handle, data->aux_size);
 
@@ -7066,6 +7072,16 @@ void perf_prepare_sample(struct perf_event_header *header,
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		data->phys_addr = perf_virt_to_phys(data->addr);
 
+#ifdef CONFIG_CGROUP_PERF
+	if (sample_type & PERF_SAMPLE_CGROUP) {
+		struct cgroup *cgrp;
+
+		/* protected by RCU */
+		cgrp = task_css_check(current, perf_event_cgrp_id, 1)->cgroup;
+		data->cgroup = cgroup_id(cgrp);
+	}
+#endif
+
 	if (sample_type & PERF_SAMPLE_AUX) {
 		u64 size;
 
@@ -11264,6 +11280,12 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
 
 	if (attr->sample_type & PERF_SAMPLE_REGS_INTR)
 		ret = perf_reg_validate(attr->sample_regs_intr);
+
+#ifndef CONFIG_CGROUP_PERF
+	if (attr->sample_type & PERF_SAMPLE_CGROUP)
+		return -EINVAL;
+#endif
+
 out:
 	return ret;
 
-- 
2.25.1.696.g5e7596f4ac-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 3/9] perf tools: Basic support for CGROUP event
  2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
  2020-03-25 12:45 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
  2020-03-25 12:45 ` [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature Namhyung Kim
@ 2020-03-25 12:45 ` Namhyung Kim
  2020-03-27 14:06   ` Arnaldo Carvalho de Melo
  2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
  2020-03-25 12:45 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 46+ messages in thread
From: Namhyung Kim @ 2020-03-25 12:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin, Adrian Hunter, kernel test robot

Implement basic functionality to support cgroup tracking.  Each cgroup
can be identified by inode number which can be read from userspace
too.  The actual cgroup processing will come in the later patch.

Cc: Adrian Hunter <adrian.hunter@intel.com>
[fix perf test failure on sampling parsing]
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/include/uapi/linux/perf_event.h     | 16 ++++++++++++++--
 tools/lib/perf/include/perf/event.h       |  7 +++++++
 tools/perf/builtin-diff.c                 |  1 +
 tools/perf/builtin-report.c               |  1 +
 tools/perf/tests/sample-parsing.c         |  6 +++++-
 tools/perf/util/event.c                   | 18 ++++++++++++++++++
 tools/perf/util/event.h                   |  6 ++++++
 tools/perf/util/evsel.c                   |  6 ++++++
 tools/perf/util/machine.c                 | 12 ++++++++++++
 tools/perf/util/machine.h                 |  3 +++
 tools/perf/util/perf_event_attr_fprintf.c |  2 ++
 tools/perf/util/session.c                 |  4 ++++
 tools/perf/util/synthetic-events.c        |  8 ++++++++
 tools/perf/util/tool.h                    |  1 +
 14 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 397cfd65b3fe..7b2d6fc9e6ed 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -142,8 +142,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_REGS_INTR			= 1U << 18,
 	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
 	PERF_SAMPLE_AUX				= 1U << 20,
+	PERF_SAMPLE_CGROUP			= 1U << 21,
 
-	PERF_SAMPLE_MAX = 1U << 21,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 22,		/* non-ABI */
 
 	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
 };
@@ -381,7 +382,8 @@ struct perf_event_attr {
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
 				aux_output     :  1, /* generate AUX records instead of events */
-				__reserved_1   : 32;
+				cgroup         :  1, /* include cgroup events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -1012,6 +1014,16 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				id;
+	 *	char				path[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CGROUP			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index 18106899cb4e..69b44d2cc0f5 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -105,6 +105,12 @@ struct perf_record_bpf_event {
 	__u8			 tag[BPF_TAG_SIZE];  // prog tag
 };
 
+struct perf_record_cgroup {
+	struct perf_event_header header;
+	__u64			 id;
+	char			 path[PATH_MAX];
+};
+
 struct perf_record_sample {
 	struct perf_event_header header;
 	__u64			 array[];
@@ -352,6 +358,7 @@ union perf_event {
 	struct perf_record_mmap2		mmap2;
 	struct perf_record_comm			comm;
 	struct perf_record_namespaces		namespaces;
+	struct perf_record_cgroup		cgroup;
 	struct perf_record_fork			fork;
 	struct perf_record_lost			lost;
 	struct perf_record_lost_samples		lost_samples;
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 5e697cd2224a..c94a002f295e 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -455,6 +455,7 @@ static struct perf_diff pdiff = {
 		.fork	= perf_event__process_fork,
 		.lost	= perf_event__process_lost,
 		.namespaces = perf_event__process_namespaces,
+		.cgroup = perf_event__process_cgroup,
 		.ordered_events = true,
 		.ordering_requires_timestamps = true,
 	},
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index ea673b7eb3f4..26d8fc27e427 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1105,6 +1105,7 @@ int cmd_report(int argc, const char **argv)
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
 			.namespaces	 = perf_event__process_namespaces,
+			.cgroup		 = perf_event__process_cgroup,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
 			.lost		 = perf_event__process_lost,
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index 14239e472187..61865699c3f4 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -151,6 +151,9 @@ static bool samples_same(const struct perf_sample *s1,
 	if (type & PERF_SAMPLE_PHYS_ADDR)
 		COMP(phys_addr);
 
+	if (type & PERF_SAMPLE_CGROUP)
+		COMP(cgroup);
+
 	if (type & PERF_SAMPLE_AUX) {
 		COMP(aux_sample.size);
 		if (memcmp(s1->aux_sample.data, s2->aux_sample.data,
@@ -230,6 +233,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 			.regs	= regs,
 		},
 		.phys_addr	= 113,
+		.cgroup		= 114,
 		.aux_sample	= {
 			.size	= sizeof(aux_data),
 			.data	= (void *)aux_data,
@@ -336,7 +340,7 @@ int test__sample_parsing(struct test *test __maybe_unused, int subtest __maybe_u
 	 * were added.  Please actually update the test rather than just change
 	 * the condition below.
 	 */
-	if (PERF_SAMPLE_MAX > PERF_SAMPLE_AUX << 1) {
+	if (PERF_SAMPLE_MAX > PERF_SAMPLE_CGROUP << 1) {
 		pr_debug("sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating\n");
 		return -1;
 	}
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index c5447ff516a2..824c038e5c33 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -54,6 +54,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
 	[PERF_RECORD_KSYMBOL]			= "KSYMBOL",
 	[PERF_RECORD_BPF_EVENT]			= "BPF_EVENT",
+	[PERF_RECORD_CGROUP]			= "CGROUP",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -180,6 +181,12 @@ size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp)
 	return ret;
 }
 
+size_t perf_event__fprintf_cgroup(union perf_event *event, FILE *fp)
+{
+	return fprintf(fp, " cgroup: %" PRI_lu64 " %s\n",
+		       event->cgroup.id, event->cgroup.path);
+}
+
 int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -196,6 +203,14 @@ int perf_event__process_namespaces(struct perf_tool *tool __maybe_unused,
 	return machine__process_namespaces_event(machine, event, sample);
 }
 
+int perf_event__process_cgroup(struct perf_tool *tool __maybe_unused,
+			       union perf_event *event,
+			       struct perf_sample *sample,
+			       struct machine *machine)
+{
+	return machine__process_cgroup_event(machine, event, sample);
+}
+
 int perf_event__process_lost(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -417,6 +432,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_NAMESPACES:
 		ret += perf_event__fprintf_namespaces(event, fp);
 		break;
+	case PERF_RECORD_CGROUP:
+		ret += perf_event__fprintf_cgroup(event, fp);
+		break;
 	case PERF_RECORD_MMAP2:
 		ret += perf_event__fprintf_mmap2(event, fp);
 		break;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 3cda40a2fafc..b8289f160f07 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -135,6 +135,7 @@ struct perf_sample {
 	u32 raw_size;
 	u64 data_src;
 	u64 phys_addr;
+	u64 cgroup;
 	u32 flags;
 	u16 insn_len;
 	u8  cpumode;
@@ -322,6 +323,10 @@ int perf_event__process_namespaces(struct perf_tool *tool,
 				   union perf_event *event,
 				   struct perf_sample *sample,
 				   struct machine *machine);
+int perf_event__process_cgroup(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_sample *sample,
+			       struct machine *machine);
 int perf_event__process_mmap(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -377,6 +382,7 @@ size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_cgroup(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_ksymbol(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_bpf(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 15ccd193483f..b766eb608b97 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2267,6 +2267,12 @@ int perf_evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 		array++;
 	}
 
+	data->cgroup = 0;
+	if (type & PERF_SAMPLE_CGROUP) {
+		data->cgroup = *array;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		OVERFLOW_CHECK_u64(array);
 		sz = *array++;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index fd14f1489802..399b4731b246 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -654,6 +654,16 @@ int machine__process_namespaces_event(struct machine *machine __maybe_unused,
 	return err;
 }
 
+int machine__process_cgroup_event(struct machine *machine __maybe_unused,
+				  union perf_event *event,
+				  struct perf_sample *sample __maybe_unused)
+{
+	if (dump_trace)
+		perf_event__fprintf_cgroup(event, stdout);
+
+	return 0;
+}
+
 int machine__process_lost_event(struct machine *machine __maybe_unused,
 				union perf_event *event, struct perf_sample *sample __maybe_unused)
 {
@@ -1878,6 +1888,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 		ret = machine__process_mmap_event(machine, event, sample); break;
 	case PERF_RECORD_NAMESPACES:
 		ret = machine__process_namespaces_event(machine, event, sample); break;
+	case PERF_RECORD_CGROUP:
+		ret = machine__process_cgroup_event(machine, event, sample); break;
 	case PERF_RECORD_MMAP2:
 		ret = machine__process_mmap2_event(machine, event, sample); break;
 	case PERF_RECORD_FORK:
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index be0a930eca89..fa1be9ea00fa 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -128,6 +128,9 @@ int machine__process_switch_event(struct machine *machine,
 int machine__process_namespaces_event(struct machine *machine,
 				      union perf_event *event,
 				      struct perf_sample *sample);
+int machine__process_cgroup_event(struct machine *machine,
+				  union perf_event *event,
+				  struct perf_sample *sample);
 int machine__process_mmap_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 355d3458d4e6..b94fa07f5d32 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -35,6 +35,7 @@ static void __p_sample_type(char *buf, size_t size, u64 value)
 		bit_name(BRANCH_STACK), bit_name(REGS_USER), bit_name(STACK_USER),
 		bit_name(IDENTIFIER), bit_name(REGS_INTR), bit_name(DATA_SRC),
 		bit_name(WEIGHT), bit_name(PHYS_ADDR), bit_name(AUX),
+		bit_name(CGROUP),
 		{ .name = NULL, }
 	};
 #undef bit_name
@@ -132,6 +133,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(ksymbol, p_unsigned);
 	PRINT_ATTRf(bpf_event, p_unsigned);
 	PRINT_ATTRf(aux_output, p_unsigned);
+	PRINT_ATTRf(cgroup, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 055b00abd56d..0b0bfe5bef17 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -471,6 +471,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->comm = process_event_stub;
 	if (tool->namespaces == NULL)
 		tool->namespaces = process_event_stub;
+	if (tool->cgroup == NULL)
+		tool->cgroup = process_event_stub;
 	if (tool->fork == NULL)
 		tool->fork = process_event_stub;
 	if (tool->exit == NULL)
@@ -1436,6 +1438,8 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->comm(tool, event, sample, machine);
 	case PERF_RECORD_NAMESPACES:
 		return tool->namespaces(tool, event, sample, machine);
+	case PERF_RECORD_CGROUP:
+		return tool->cgroup(tool, event, sample, machine);
 	case PERF_RECORD_FORK:
 		return tool->fork(tool, event, sample, machine);
 	case PERF_RECORD_EXIT:
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 3f28af39f9c6..f72d80999506 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1230,6 +1230,9 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	if (type & PERF_SAMPLE_PHYS_ADDR)
 		result += sizeof(u64);
 
+	if (type & PERF_SAMPLE_CGROUP)
+		result += sizeof(u64);
+
 	if (type & PERF_SAMPLE_AUX) {
 		result += sizeof(u64);
 		result += sample->aux_sample.size;
@@ -1404,6 +1407,11 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 		array++;
 	}
 
+	if (type & PERF_SAMPLE_CGROUP) {
+		*array = sample->cgroup;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		sz = sample->aux_sample.size;
 		*array++ = sz;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 2abbf668b8de..472ef5eb4068 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -46,6 +46,7 @@ struct perf_tool {
 			mmap2,
 			comm,
 			namespaces,
+			cgroup,
 			fork,
 			exit,
 			lost,
-- 
2.25.1.696.g5e7596f4ac-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
                   ` (2 preceding siblings ...)
  2020-03-25 12:45 ` [PATCH 3/9] perf tools: Basic support for CGROUP event Namhyung Kim
@ 2020-03-25 12:45 ` Namhyung Kim
  2020-04-03 12:36   ` Arnaldo Carvalho de Melo
  2020-04-04  8:41   ` [tip: perf/urgent] perf cgroup: Maintain cgroup hierarchy tip-bot2 for Namhyung Kim
  2020-03-25 12:45 ` [PATCH 5/9] perf report: Add 'cgroup' sort key Namhyung Kim
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 46+ messages in thread
From: Namhyung Kim @ 2020-03-25 12:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

Each cgroup is kept in the perf_env's cgroup_tree sorted by the cgroup
id.  Hist entries have cgroup id can compare it directly and later it
can be used to find a group name using this tree.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/cgroup.c  | 80 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/cgroup.h  | 17 +++++++--
 tools/perf/util/env.c     |  2 +
 tools/perf/util/env.h     |  6 +++
 tools/perf/util/machine.c |  9 ++++-
 5 files changed, 109 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 5bc9d3b01bd9..b73fb7823048 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -191,3 +191,83 @@ int parse_cgroups(const struct option *opt, const char *str,
 	}
 	return 0;
 }
+
+static struct cgroup *__cgroup__findnew(struct rb_root *root, uint64_t id,
+					bool create, const char *path)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct cgroup *cgrp;
+
+	while (*p != NULL) {
+		parent = *p;
+		cgrp = rb_entry(parent, struct cgroup, node);
+
+		if (cgrp->id == id)
+			return cgrp;
+
+		if (cgrp->id < id)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	if (!create)
+		return NULL;
+
+	cgrp = malloc(sizeof(*cgrp));
+	if (cgrp == NULL)
+		return NULL;
+
+	cgrp->name = strdup(path);
+	if (cgrp->name == NULL) {
+		free(cgrp);
+		return NULL;
+	}
+
+	cgrp->fd = -1;
+	cgrp->id = id;
+	refcount_set(&cgrp->refcnt, 1);
+
+	rb_link_node(&cgrp->node, parent, p);
+	rb_insert_color(&cgrp->node, root);
+
+	return cgrp;
+}
+
+struct cgroup *cgroup__findnew(struct perf_env *env, uint64_t id,
+			       const char *path)
+{
+	struct cgroup *cgrp;
+
+	down_write(&env->cgroups.lock);
+	cgrp = __cgroup__findnew(&env->cgroups.tree, id, true, path);
+	up_write(&env->cgroups.lock);
+	return cgrp;
+}
+
+struct cgroup *cgroup__find(struct perf_env *env, uint64_t id)
+{
+	struct cgroup *cgrp;
+
+	down_read(&env->cgroups.lock);
+	cgrp = __cgroup__findnew(&env->cgroups.tree, id, false, NULL);
+	up_read(&env->cgroups.lock);
+	return cgrp;
+}
+
+void perf_env__purge_cgroups(struct perf_env *env)
+{
+	struct rb_node *node;
+	struct cgroup *cgrp;
+
+	down_write(&env->cgroups.lock);
+	while (!RB_EMPTY_ROOT(&env->cgroups.tree)) {
+		node = rb_first(&env->cgroups.tree);
+		cgrp = rb_entry(node, struct cgroup, node);
+
+		rb_erase(node, &env->cgroups.tree);
+		cgroup__put(cgrp);
+	}
+	up_write(&env->cgroups.lock);
+}
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 2ec11f01090d..e98d5975fe55 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -3,16 +3,19 @@
 #define __CGROUP_H__
 
 #include <linux/refcount.h>
+#include <linux/rbtree.h>
+#include "util/env.h"
 
 struct option;
 
 struct cgroup {
-	char *name;
-	int fd;
-	refcount_t refcnt;
+	struct rb_node		node;
+	u64			id;
+	char			*name;
+	int			fd;
+	refcount_t		refcnt;
 };
 
-
 extern int nr_cgroups; /* number of explicit cgroups defined */
 
 struct cgroup *cgroup__get(struct cgroup *cgroup);
@@ -26,4 +29,10 @@ void evlist__set_default_cgroup(struct evlist *evlist, struct cgroup *cgroup);
 
 int parse_cgroups(const struct option *opt, const char *str, int unset);
 
+struct cgroup *cgroup__findnew(struct perf_env *env, uint64_t id,
+			       const char *path);
+struct cgroup *cgroup__find(struct perf_env *env, uint64_t id);
+
+void perf_env__purge_cgroups(struct perf_env *env);
+
 #endif /* __CGROUP_H__ */
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 4154f944f474..fadc59708ece 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -6,6 +6,7 @@
 #include <linux/ctype.h>
 #include <linux/zalloc.h>
 #include "bpf-event.h"
+#include "cgroup.h"
 #include <errno.h>
 #include <sys/utsname.h>
 #include <bpf/libbpf.h>
@@ -168,6 +169,7 @@ void perf_env__exit(struct perf_env *env)
 	int i;
 
 	perf_env__purge_bpf(env);
+	perf_env__purge_cgroups(env);
 	zfree(&env->hostname);
 	zfree(&env->os_release);
 	zfree(&env->version);
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 11d05ae3606a..7632075a8792 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -88,6 +88,12 @@ struct perf_env {
 		u32			btfs_cnt;
 	} bpf_progs;
 
+	/* same reason as above (for perf-top) */
+	struct {
+		struct rw_semaphore	lock;
+		struct rb_root		tree;
+	} cgroups;
+
 	/* For fast cpu to numa node lookup via perf_env__numa_node */
 	int			*numa_map;
 	int			 nr_numa_map;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 399b4731b246..97142e9671be 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -33,6 +33,7 @@
 #include "asm/bug.h"
 #include "bpf-event.h"
 #include <internal/lib.h> // page_size
+#include "cgroup.h"
 
 #include <linux/ctype.h>
 #include <symbol/kallsyms.h>
@@ -654,13 +655,19 @@ int machine__process_namespaces_event(struct machine *machine __maybe_unused,
 	return err;
 }
 
-int machine__process_cgroup_event(struct machine *machine __maybe_unused,
+int machine__process_cgroup_event(struct machine *machine,
 				  union perf_event *event,
 				  struct perf_sample *sample __maybe_unused)
 {
+	struct cgroup *cgrp;
+
 	if (dump_trace)
 		perf_event__fprintf_cgroup(event, stdout);
 
+	cgrp = cgroup__findnew(machine->env, event->cgroup.id, event->cgroup.path);
+	if (cgrp == NULL)
+		return -ENOMEM;
+
 	return 0;
 }
 
-- 
2.25.1.696.g5e7596f4ac-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 5/9] perf report: Add 'cgroup' sort key
  2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
                   ` (3 preceding siblings ...)
  2020-03-25 12:45 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
@ 2020-03-25 12:45 ` Namhyung Kim
  2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
  2020-03-25 12:45 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 46+ messages in thread
From: Namhyung Kim @ 2020-03-25 12:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

The cgroup sort key is to show cgroup membership of each task.
Currently it shows full path in the cgroupfs (not relative to the root
of cgroup namespace) since it'd be more intuitive IMHO.  Otherwise
root cgroup in different namespaces will all show same name - "/".

The cgroup sort key should come before cgroup_id otherwise
sort_dimension__add() will match it to cgroup_id as it only matches
with the given substring.

For example it will look like following.  Note that record patch
adding --all-cgroups patch will come later.

  $ perf record -a --namespace --all-cgroups  cgtest
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ]

  $ perf report -s cgroup_id,cgroup,pid
  ...
  # Overhead  cgroup id (dev/inode)  Cgroup          Pid:Command
  # ........  .....................  ..........  ...............
  #
      93.96%  0/0x0                  /                 0:swapper
       1.25%  3/0xeffffffb           /               278:looper0
       0.86%  3/0xf000015f           /sub/cgrp1      280:cgtest
       0.37%  3/0xf0000160           /sub/cgrp2      281:cgtest
       0.34%  3/0xf0000163           /sub/cgrp3      282:cgtest
       0.22%  3/0xeffffffb           /sub            278:looper0
       0.20%  3/0xeffffffb           /               280:cgtest
       0.15%  3/0xf0000163           /sub/cgrp3      285:looper3

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/util/hist.c                   | 13 +++++++++
 tools/perf/util/hist.h                   |  1 +
 tools/perf/util/sort.c                   | 37 ++++++++++++++++++++++++
 tools/perf/util/sort.h                   |  2 ++
 5 files changed, 54 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index e56e3f1344a7..f569b9ea4002 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -95,6 +95,7 @@ OPTIONS
 	abort cost. This is the global weight.
 	- local_weight: Local weight version of the weight above.
 	- cgroup_id: ID derived from cgroup namespace device and inode numbers.
+	- cgroup: cgroup pathname in the cgroupfs.
 	- transaction: Transaction abort flags.
 	- overhead: Overhead percentage of sample
 	- overhead_sys: Overhead percentage of sample running in system mode
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index e74a5acf66d9..283a69ff6a3d 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -10,6 +10,7 @@
 #include "mem-events.h"
 #include "session.h"
 #include "namespaces.h"
+#include "cgroup.h"
 #include "sort.h"
 #include "units.h"
 #include "evlist.h"
@@ -194,6 +195,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 		hists__set_unres_dso_col_len(hists, HISTC_MEM_DADDR_DSO);
 	}
 
+	hists__new_col_len(hists, HISTC_CGROUP, 6);
 	hists__new_col_len(hists, HISTC_CGROUP_ID, 20);
 	hists__new_col_len(hists, HISTC_CPU, 3);
 	hists__new_col_len(hists, HISTC_SOCKET, 6);
@@ -222,6 +224,16 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 
 	if (h->trace_output)
 		hists__new_col_len(hists, HISTC_TRACE, strlen(h->trace_output));
+
+	if (h->cgroup) {
+		const char *cgrp_name = "unknown";
+		struct cgroup *cgrp = cgroup__find(h->ms.maps->machine->env,
+						   h->cgroup);
+		if (cgrp != NULL)
+			cgrp_name = cgrp->name;
+
+		hists__new_col_len(hists, HISTC_CGROUP, strlen(cgrp_name));
+	}
 }
 
 void hists__output_recalc_col_len(struct hists *hists, int max_rows)
@@ -691,6 +703,7 @@ __hists__add_entry(struct hists *hists,
 			.dev = ns ? ns->link_info[CGROUP_NS_INDEX].dev : 0,
 			.ino = ns ? ns->link_info[CGROUP_NS_INDEX].ino : 0,
 		},
+		.cgroup = sample->cgroup,
 		.ms = {
 			.maps	= al->maps,
 			.map	= al->map,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index bb994e030495..4141295a66fa 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -38,6 +38,7 @@ enum hist_column {
 	HISTC_THREAD,
 	HISTC_COMM,
 	HISTC_CGROUP_ID,
+	HISTC_CGROUP,
 	HISTC_PARENT,
 	HISTC_CPU,
 	HISTC_SOCKET,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index e860595576c2..f14cc728c358 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -12,6 +12,7 @@
 #include "cacheline.h"
 #include "comm.h"
 #include "map.h"
+#include "maps.h"
 #include "symbol.h"
 #include "map_symbol.h"
 #include "branch.h"
@@ -25,6 +26,8 @@
 #include "mem-events.h"
 #include "annotate.h"
 #include "time-utils.h"
+#include "cgroup.h"
+#include "machine.h"
 #include <linux/kernel.h>
 #include <linux/string.h>
 
@@ -634,6 +637,39 @@ struct sort_entry sort_cgroup_id = {
 	.se_width_idx	= HISTC_CGROUP_ID,
 };
 
+/* --sort cgroup */
+
+static int64_t
+sort__cgroup_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return right->cgroup - left->cgroup;
+}
+
+static int hist_entry__cgroup_snprintf(struct hist_entry *he,
+				       char *bf, size_t size,
+				       unsigned int width __maybe_unused)
+{
+	const char *cgrp_name = "N/A";
+
+	if (he->cgroup) {
+		struct cgroup *cgrp = cgroup__find(he->ms.maps->machine->env,
+						   he->cgroup);
+		if (cgrp != NULL)
+			cgrp_name = cgrp->name;
+		else
+			cgrp_name = "unknown";
+	}
+
+	return repsep_snprintf(bf, size, "%s", cgrp_name);
+}
+
+struct sort_entry sort_cgroup = {
+	.se_header      = "Cgroup",
+	.se_cmp	        = sort__cgroup_cmp,
+	.se_snprintf    = hist_entry__cgroup_snprintf,
+	.se_width_idx	= HISTC_CGROUP,
+};
+
 /* --sort socket */
 
 static int64_t
@@ -1660,6 +1696,7 @@ static struct sort_dimension common_sort_dimensions[] = {
 	DIM(SORT_TRACE, "trace", sort_trace),
 	DIM(SORT_SYM_SIZE, "symbol_size", sort_sym_size),
 	DIM(SORT_DSO_SIZE, "dso_size", sort_dso_size),
+	DIM(SORT_CGROUP, "cgroup", sort_cgroup),
 	DIM(SORT_CGROUP_ID, "cgroup_id", sort_cgroup_id),
 	DIM(SORT_SYM_IPC_NULL, "ipc_null", sort_sym_ipc_null),
 	DIM(SORT_TIME, "time", sort_time),
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 6c862d62d052..cfa6ac6f7d06 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -101,6 +101,7 @@ struct hist_entry {
 	struct thread		*thread;
 	struct comm		*comm;
 	struct namespace_id	cgroup_id;
+	u64			cgroup;
 	u64			ip;
 	u64			transaction;
 	s32			socket;
@@ -224,6 +225,7 @@ enum sort_type {
 	SORT_TRACE,
 	SORT_SYM_SIZE,
 	SORT_DSO_SIZE,
+	SORT_CGROUP,
 	SORT_CGROUP_ID,
 	SORT_SYM_IPC_NULL,
 	SORT_TIME,
-- 
2.25.1.696.g5e7596f4ac-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
                   ` (4 preceding siblings ...)
  2020-03-25 12:45 ` [PATCH 5/9] perf report: Add 'cgroup' sort key Namhyung Kim
@ 2020-03-25 12:45 ` Namhyung Kim
  2020-03-30 16:30   ` Arnaldo Carvalho de Melo
  2020-04-04  8:41   ` [tip: perf/urgent] perf record: Support synthesizing cgroup events tip-bot2 for Namhyung Kim
  2020-03-25 12:45 ` [PATCH 7/9] perf record: Add --all-cgroups option Namhyung Kim
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 46+ messages in thread
From: Namhyung Kim @ 2020-03-25 12:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

Synthesize cgroup events by iterating cgroup filesystem directories.
The cgroup event only saves the portion of cgroup path after the mount
point and the cgroup id (which actually is a file handle).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-record.c        |   5 ++
 tools/perf/util/synthetic-events.c | 113 +++++++++++++++++++++++++++++
 tools/perf/util/synthetic-events.h |   1 +
 tools/perf/util/tool.h             |   1 +
 4 files changed, 120 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c301466101b..2802de9538ff 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1397,6 +1397,11 @@ static int record__synthesize(struct record *rec, bool tail)
 	if (err < 0)
 		pr_warning("Couldn't synthesize bpf events.\n");
 
+	err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
+					     machine);
+	if (err < 0)
+		pr_warning("Couldn't synthesize cgroup events.\n");
+
 	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->core.threads,
 					    process_synthesized_event, opts->sample_address,
 					    1);
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index f72d80999506..24975470ed5c 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -16,6 +16,7 @@
 #include "util/synthetic-events.h"
 #include "util/target.h"
 #include "util/time-utils.h"
+#include "util/cgroup.h"
 #include <linux/bitops.h>
 #include <linux/kernel.h>
 #include <linux/string.h>
@@ -414,6 +415,118 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	return rc;
 }
 
+static int perf_event__synthesize_cgroup(struct perf_tool *tool,
+					 union perf_event *event,
+					 char *path, size_t mount_len,
+					 perf_event__handler_t process,
+					 struct machine *machine)
+{
+	size_t event_size = sizeof(event->cgroup) - sizeof(event->cgroup.path);
+	size_t path_len = strlen(path) - mount_len + 1;
+	struct {
+		struct file_handle fh;
+		uint64_t cgroup_id;
+	} handle;
+	int mount_id;
+
+	while (path_len % sizeof(u64))
+		path[mount_len + path_len++] = '\0';
+
+	memset(&event->cgroup, 0, event_size);
+
+	event->cgroup.header.type = PERF_RECORD_CGROUP;
+	event->cgroup.header.size = event_size + path_len + machine->id_hdr_size;
+
+	handle.fh.handle_bytes = sizeof(handle.cgroup_id);
+	if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
+		pr_debug("stat failed: %s\n", path);
+		return -1;
+	}
+
+	event->cgroup.id = handle.cgroup_id;
+	strncpy(event->cgroup.path, path + mount_len, path_len);
+	memset(event->cgroup.path + path_len, 0, machine->id_hdr_size);
+
+	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
+		pr_debug("process synth event failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
+					union perf_event *event,
+					char *path, size_t mount_len,
+					perf_event__handler_t process,
+					struct machine *machine)
+{
+	size_t pos = strlen(path);
+	DIR *d;
+	struct dirent *dent;
+	int ret = 0;
+
+	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
+					  process, machine) < 0)
+		return -1;
+
+	d = opendir(path);
+	if (d == NULL) {
+		pr_debug("failed to open directory: %s\n", path);
+		return -1;
+	}
+
+	while ((dent = readdir(d)) != NULL) {
+		if (dent->d_type != DT_DIR)
+			continue;
+		if (!strcmp(dent->d_name, ".") ||
+		    !strcmp(dent->d_name, ".."))
+			continue;
+
+		/* any sane path should be less than PATH_MAX */
+		if (strlen(path) + strlen(dent->d_name) + 1 >= PATH_MAX)
+			continue;
+
+		if (path[pos - 1] != '/')
+			strcat(path, "/");
+		strcat(path, dent->d_name);
+
+		ret = perf_event__walk_cgroup_tree(tool, event, path,
+						   mount_len, process, machine);
+		if (ret < 0)
+			break;
+
+		path[pos] = '\0';
+	}
+
+	closedir(d);
+	return ret;
+}
+
+int perf_event__synthesize_cgroups(struct perf_tool *tool,
+				   perf_event__handler_t process,
+				   struct machine *machine)
+{
+	union perf_event event;
+	char cgrp_root[PATH_MAX];
+	size_t mount_len;  /* length of mount point in the path */
+
+	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX, "perf_event") < 0) {
+		pr_debug("cannot find cgroup mount point\n");
+		return -1;
+	}
+
+	mount_len = strlen(cgrp_root);
+	/* make sure the path starts with a slash (after mount point) */
+	strcat(cgrp_root, "/");
+
+	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
+					 process, machine) < 0)
+		return -1;
+
+	return 0;
+}
+
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
 				   struct machine *machine)
 {
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index baead0cdc381..e7a3e9589738 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -45,6 +45,7 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handl
 int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_namespaces(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
+int perf_event__synthesize_cgroups(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
 int perf_event__synthesize_stat_config(struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 472ef5eb4068..3fb67bd31e4a 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -79,6 +79,7 @@ struct perf_tool {
 	bool		ordered_events;
 	bool		ordering_requires_timestamps;
 	bool		namespace_events;
+	bool		cgroup_events;
 	bool		no_warn;
 	enum show_feature_header show_feat_hdr;
 };
-- 
2.25.1.696.g5e7596f4ac-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 7/9] perf record: Add --all-cgroups option
  2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
                   ` (5 preceding siblings ...)
  2020-03-25 12:45 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
@ 2020-03-25 12:45 ` Namhyung Kim
  2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
  2020-03-25 12:45 ` [PATCH 8/9] perf top: " Namhyung Kim
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 46+ messages in thread
From: Namhyung Kim @ 2020-03-25 12:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

The --all-cgroups option is to enable cgroup profiling support.  It
tells kernel to record CGROUP events in the ring buffer so that perf
report can identify task/cgroup association later.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-record.txt |  5 ++++-
 tools/perf/builtin-record.c              |  5 +++++
 tools/perf/util/evsel.c                  | 11 ++++++++++-
 tools/perf/util/evsel.h                  |  1 +
 tools/perf/util/record.h                 |  1 +
 5 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 7f4db7592467..eba9beec80e6 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -385,7 +385,10 @@ displayed with the weight and local_weight sort keys.  This currently works for
 abort events and some memory events in precise mode on modern Intel CPUs.
 
 --namespaces::
-Record events of type PERF_RECORD_NAMESPACES.
+Record events of type PERF_RECORD_NAMESPACES.  This enables 'cgroup_id' sort key.
+
+--all-cgroups::
+Record events of type PERF_RECORD_CGROUP.  This enables 'cgroup' sort key.
 
 --transaction::
 Record transaction flags for transaction related events.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 2802de9538ff..7d7912e121d6 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1433,6 +1433,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	if (rec->opts.record_namespaces)
 		tool->namespace_events = true;
 
+	if (rec->opts.record_cgroup)
+		tool->cgroup_events = true;
+
 	if (rec->opts.auxtrace_snapshot_mode || rec->switch_output.enabled) {
 		signal(SIGUSR2, snapshot_sig_handler);
 		if (rec->opts.auxtrace_snapshot_mode)
@@ -2363,6 +2366,8 @@ static struct option __record_options[] = {
 			"per thread proc mmap processing timeout in ms"),
 	OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces,
 		    "Record namespaces events"),
+	OPT_BOOLEAN(0, "all-cgroups", &record.opts.record_cgroup,
+		    "Record cgroup events"),
 	OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
 		    "Record context switch events"),
 	OPT_BOOLEAN_FLAG(0, "all-kernel", &record.opts.all_kernel,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index b766eb608b97..eb880efbce16 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1104,6 +1104,11 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts,
 	if (opts->record_namespaces)
 		attr->namespaces  = track;
 
+	if (opts->record_cgroup) {
+		attr->cgroup = track && !perf_missing_features.cgroup;
+		perf_evsel__set_sample_bit(evsel, CGROUP);
+	}
+
 	if (opts->record_switch_events)
 		attr->context_switch = track;
 
@@ -1789,7 +1794,11 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
 	 * Must probe features in the order they were added to the
 	 * perf_event_attr interface.
 	 */
-	if (!perf_missing_features.branch_hw_idx &&
+        if (!perf_missing_features.cgroup && evsel->core.attr.cgroup) {
+		perf_missing_features.cgroup = true;
+		pr_debug2_peo("Kernel has no cgroup sampling support, bailing out\n");
+		goto out_close;
+        } else if (!perf_missing_features.branch_hw_idx &&
 	    (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)) {
 		perf_missing_features.branch_hw_idx = true;
 		pr_debug2("switching off branch HW index support\n");
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 33804740e2ca..53187c501ee8 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -120,6 +120,7 @@ struct perf_missing_features {
 	bool bpf;
 	bool aux_output;
 	bool branch_hw_idx;
+	bool cgroup;
 };
 
 extern struct perf_missing_features perf_missing_features;
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 5421fd2ad383..24316458be20 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -34,6 +34,7 @@ struct record_opts {
 	bool	      auxtrace_snapshot_on_exit;
 	bool	      auxtrace_sample_mode;
 	bool	      record_namespaces;
+	bool	      record_cgroup;
 	bool	      record_switch_events;
 	bool	      all_kernel;
 	bool	      all_user;
-- 
2.25.1.696.g5e7596f4ac-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 8/9] perf top: Add --all-cgroups option
  2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
                   ` (6 preceding siblings ...)
  2020-03-25 12:45 ` [PATCH 7/9] perf record: Add --all-cgroups option Namhyung Kim
@ 2020-03-25 12:45 ` Namhyung Kim
  2020-03-27 16:35   ` Arnaldo Carvalho de Melo
  2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
  2020-03-25 12:45 ` [PATCH 9/9] perf script: Add --show-cgroup-events option Namhyung Kim
  2020-03-25 13:05 ` [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Arnaldo Carvalho de Melo
  9 siblings, 2 replies; 46+ messages in thread
From: Namhyung Kim @ 2020-03-25 12:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

The --all-cgroups option is to enable cgroup profiling support.  It
tells kernel to record CGROUP events in the ring buffer so that perf
report can identify task/cgroup association later.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-top.txt | 4 ++++
 tools/perf/builtin-top.c              | 9 +++++++++
 2 files changed, 13 insertions(+)

diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 324b6b53c86b..ddab103af8c7 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -272,6 +272,10 @@ Default is to monitor all CPUS.
 	Record events of type PERF_RECORD_NAMESPACES and display it with the
 	'cgroup_id' sort key.
 
+--all-cgroups::
+	Record events of type PERF_RECORD_CGROUP and display it with the
+	'cgroup' sort key.
+
 --switch-on EVENT_NAME::
 	Only consider events after this event is found.
 
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d2539b793f9d..56b2dd0db88e 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1246,6 +1246,8 @@ static int __cmd_top(struct perf_top *top)
 
 	if (opts->record_namespaces)
 		top->tool.namespace_events = true;
+	if (opts->record_cgroup)
+		top->tool.cgroup_events = true;
 
 	ret = perf_event__synthesize_bpf_events(top->session, perf_event__process,
 						&top->session->machines.host,
@@ -1253,6 +1255,11 @@ static int __cmd_top(struct perf_top *top)
 	if (ret < 0)
 		pr_debug("Couldn't synthesize BPF events: Pre-existing BPF programs won't have symbols resolved.\n");
 
+	ret = perf_event__synthesize_cgroups(&top->tool, perf_event__process,
+					     &top->session->machines.host);
+	if (ret < 0)
+		pr_debug("Couldn't synthesize cgroup events.\n");
+
 	machine__synthesize_threads(&top->session->machines.host, &opts->target,
 				    top->evlist->core.threads, false,
 				    top->nr_threads_synthesize);
@@ -1545,6 +1552,8 @@ int cmd_top(int argc, const char **argv)
 			"number of thread to run event synthesize"),
 	OPT_BOOLEAN(0, "namespaces", &opts->record_namespaces,
 		    "Record namespaces events"),
+	OPT_BOOLEAN(0, "all-cgroups", &opts->record_cgroup,
+		    "Record cgroup events"),
 	OPTS_EVSWITCH(&top.evswitch),
 	OPT_END()
 	};
-- 
2.25.1.696.g5e7596f4ac-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 9/9] perf script: Add --show-cgroup-events option
  2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
                   ` (7 preceding siblings ...)
  2020-03-25 12:45 ` [PATCH 8/9] perf top: " Namhyung Kim
@ 2020-03-25 12:45 ` Namhyung Kim
  2020-03-27 16:39   ` Arnaldo Carvalho de Melo
  2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
  2020-03-25 13:05 ` [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Arnaldo Carvalho de Melo
  9 siblings, 2 replies; 46+ messages in thread
From: Namhyung Kim @ 2020-03-25 12:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

The --show-cgroup-events option is to print CGROUP events in the
output like others.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-script.txt |  3 ++
 tools/perf/builtin-script.c              | 41 ++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index db6a36aac47e..515ff9f6af30 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -319,6 +319,9 @@ OPTIONS
 --show-bpf-events
 	Display bpf events i.e. events of type PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT.
 
+--show-cgroup-events
+	Display cgroup events i.e. events of type PERF_RECORD_CGROUP.
+
 --demangle::
 	Demangle symbol names to human readable form. It's enabled by default,
 	disable with --no-demangle.
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 656b347f6dd8..6cc2d1da6ece 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1685,6 +1685,7 @@ struct perf_script {
 	bool			show_lost_events;
 	bool			show_round_events;
 	bool			show_bpf_events;
+	bool			show_cgroup_events;
 	bool			allocated;
 	bool			per_event_dump;
 	struct evswitch		evswitch;
@@ -2203,6 +2204,41 @@ static int process_namespaces_event(struct perf_tool *tool,
 	return ret;
 }
 
+static int process_cgroup_event(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct machine *machine)
+{
+	struct thread *thread;
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+	struct perf_session *session = script->session;
+	struct evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
+	int ret = -1;
+
+	thread = machine__findnew_thread(machine, sample->pid, sample->tid);
+	if (thread == NULL) {
+		pr_debug("problem processing CGROUP event, skipping it.\n");
+		return -1;
+	}
+
+	if (perf_event__process_cgroup(tool, event, sample, machine) < 0)
+		goto out;
+
+	if (!evsel->core.attr.sample_id_all) {
+		sample->cpu = 0;
+		sample->time = 0;
+	}
+	if (!filter_cpu(sample)) {
+		perf_sample__fprintf_start(sample, thread, evsel,
+					   PERF_RECORD_CGROUP, stdout);
+		perf_event__fprintf(event, stdout);
+	}
+	ret = 0;
+out:
+	thread__put(thread);
+	return ret;
+}
+
 static int process_fork_event(struct perf_tool *tool,
 			      union perf_event *event,
 			      struct perf_sample *sample,
@@ -2542,6 +2578,8 @@ static int __cmd_script(struct perf_script *script)
 		script->tool.context_switch = process_switch_event;
 	if (script->show_namespace_events)
 		script->tool.namespaces = process_namespaces_event;
+	if (script->show_cgroup_events)
+		script->tool.cgroup = process_cgroup_event;
 	if (script->show_lost_events)
 		script->tool.lost = process_lost_event;
 	if (script->show_round_events) {
@@ -3467,6 +3505,7 @@ int cmd_script(int argc, const char **argv)
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
 			.namespaces	 = perf_event__process_namespaces,
+			.cgroup		 = perf_event__process_cgroup,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
 			.attr		 = process_attr,
@@ -3567,6 +3606,8 @@ int cmd_script(int argc, const char **argv)
 		    "Show context switch events (if recorded)"),
 	OPT_BOOLEAN('\0', "show-namespace-events", &script.show_namespace_events,
 		    "Show namespace events (if recorded)"),
+	OPT_BOOLEAN('\0', "show-cgroup-events", &script.show_cgroup_events,
+		    "Show cgroup events (if recorded)"),
 	OPT_BOOLEAN('\0', "show-lost-events", &script.show_lost_events,
 		    "Show lost events (if recorded)"),
 	OPT_BOOLEAN('\0', "show-round-events", &script.show_round_events,
-- 
2.25.1.696.g5e7596f4ac-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCHSET 0/9] perf: Improve cgroup profiling (v6)
  2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
                   ` (8 preceding siblings ...)
  2020-03-25 12:45 ` [PATCH 9/9] perf script: Add --show-cgroup-events option Namhyung Kim
@ 2020-03-25 13:05 ` Arnaldo Carvalho de Melo
  2020-03-25 13:19   ` Namhyung Kim
  9 siblings, 1 reply; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-25 13:05 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter

Em Wed, Mar 25, 2020 at 09:45:27PM +0900, Namhyung Kim escreveu:
> Hello,
> 
> This work is to improve cgroup profiling in perf.  Currently it only
> supports profiling tasks in a specific cgroup and there's no way to
> identify which cgroup the current sample belongs to.  So I added
> PERF_SAMPLE_CGROUP to add cgroup id into each sample.  It's a 64-bit
> integer having file handle of the cgroup.  And kernel also generates
> PERF_RECORD_CGROUP event for new groups to correlate the cgroup id and
> cgroup name (path in the cgroup filesystem).  The cgroup id can be
> read from userspace by name_to_handle_at() system call so it can
> synthesize the CGROUP event for existing groups.

Thanks for the refresh, I'll test this today and all going well push to
Ingo soon,

- Arnaldo
 
> So why do we want this?  Systems running a large number of jobs in
> different cgroups want to profiling such jobs precisely. This includes
> container hosting systems widely used today. Currently perf supports
> namespace tracking but the systems may not use (cgroup) namespace for
> their jobs. Also it'd be more intuitive to see cgroup names (as
> they're given by user or sysadmin) rather than numeric
> cgroup/namespace id even if they use the namespaces.
> 
> From Stephane Eranian:
> > In data centers you care about attributing samples to a job not such
> > much to a process.  A job may have multiple processes which may come
> > and go. The cgroup on the other hand stays around for the entire
> > lifetime of the job. It is much easier to map a cgroup name to a
> > particular job than it is to map a pid back to a job name,
> > especially for offline post-processing.
> 
> Note that this only works for "perf_event" cgroups (obviously) so if
> users are still using cgroup-v1 interface, they need to have same
> hierarchy for subsystem(s) want to profile with it.
> 
>  * Changes from v5:
>   - add ACKs for the kernel changes
>   - fix sort column alignment when no cgroup is available
> 
>  * Changes from v4:
>   - use CONFIG_CGROUP_PERF
>   - move cgroup tree to perf_env
>   - move cgroup fs utility function to tools/lib/api/fs
>   - use a local buffer and check its size for cgroup systhesis
> 
>  * Changes from v3:
>   - staticize perf_event_cgroup()
>   - fix sample parsing test
> 
>  * Changes from v2:
>   - remove path_len from cgroup_event
>   - bail out if kernel doesn't support cgroup sampling
>   - add some description in the Kconfig
> 
>  * Changes from v1:
>   - use new cgroup id (= file handle)
> 
> The code is available at perf/cgroup-v6 branch in
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> 
> 
> The testing result looks something like this:
> 
>   [root@qemu build]# ./perf record --all-cgroups ./cgtest
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 0.009 MB perf.data (150 samples) ]
>   
>   [root@qemu build]# ./perf report -s cgroup,comm --stdio
>   # To display the perf.data header info, please use --header/--header-only options.
>   #
>   #
>   # Total Lost Samples: 0
>   #
>   # Samples: 150  of event 'cpu-clock:pppH'
>   # Event count (approx.): 37500000
>   #
>   # Overhead  cgroup      Command
>   # ........  ..........  .......
>   #
>       32.00%  /sub/cgrp2  looper2
>       28.00%  /sub/cgrp1  looper1
>       25.33%  /sub        looper0
>        4.00%  /           cgtest 
>        4.00%  /sub        cgtest 
>        3.33%  /sub/cgrp2  cgtest 
>        2.67%  /sub/cgrp1  cgtest 
>        0.67%  /           looper0
> 
> 
> The test program (cgtest) follows.
> 
> Thanks,
> Namhyung
> 
> 
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Li Zefan <lizefan@huawei.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> 
> 
> -------8<-----------------------------------------8<----------------
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <sched.h>
> #include <errno.h>
> #include <sys/stat.h>
> #include <sys/wait.h>
> #include <sys/prctl.h>
> #include <sys/mount.h>
> 
> char cgbase[] = "/sys/fs/cgroup/perf_event";
> 
> void mkcgrp(char *name) {
>   char buf[256];
> 
>   snprintf(buf, sizeof(buf), "%s%s", cgbase, name);
>   if (mkdir(buf, 0755) < 0)
>     perror("mkdir");
> }
> 
> void rmcgrp(char *name) {
>   char buf[256];
> 
>   snprintf(buf, sizeof(buf), "%s%s", cgbase, name);
>   if (rmdir(buf) < 0)
>     perror("rmdir");
> }
> 
> void setcgrp(char *name) {
>   char buf[256];
>   int fd;
> 
>   snprintf(buf, sizeof(buf), "%s%s/tasks", cgbase, name);
> 
>   fd = open(buf, O_WRONLY);
>   if (fd > 0) {
>     if (write(fd, "0\n", 2) != 2)
>       perror("write");
>     close(fd);
>   }
> }
> 
> void create_sub_cgroup(int idx) {
>   char buf[128];
> 
>   snprintf(buf, sizeof(buf), "/sub/cgrp%d", idx);
>   mkcgrp(buf);
> }
> 
> void remove_sub_cgroup(int idx) {
>   char buf[128];
> 
>   snprintf(buf, sizeof(buf), "/sub/cgrp%d", idx);
>   rmcgrp(buf);
> }
> 
> void set_sub_cgroup(int idx) {
>   char buf[128];
> 
>   snprintf(buf, sizeof(buf), "/sub/cgrp%d", idx);
>   setcgrp(buf);
> }
> 
> void set_task_name(int idx) {
>   char buf[16];
> 
>   snprintf(buf, sizeof(buf), "looper%d", idx);
>   prctl(PR_SET_NAME, buf, 0, 0, 0);
> }
> 
> void loop(unsigned max) {
>   volatile unsigned int count = 1;
> 
>   while (count++ != max) {
>     asm volatile ("pause");
>   }
> }
> 
> void worker(int idx, unsigned cnt, int new_ns) {
>   int oldns;
> 
>   create_sub_cgroup(idx);
>   set_sub_cgroup(idx);
> 
>   if (new_ns) {
>     if (unshare(CLONE_NEWCGROUP) < 0)
>       perror("unshare");
> 
> #if 0  /* FIXME */
>     if (unshare(CLONE_NEWNS) < 0)
>       perror("unshare");
> 
>     if (mount("none", "/sys", NULL, MS_REMOUNT | MS_REC | MS_SLAVE, NULL) < 0)
>       perror("mount --make-rslave");
> 
>     sleep(1);
>     if (umount("/sys/fs/cgroup/perf_event") < 0)
>       perror("umount");
> 
>     if (mount("cgroup", "/sys/fs/cgroup/perf_event", "cgroup",
>               MS_NODEV | MS_NOEXEC | MS_NOSUID, "perf_event") < 0)
>       perror("mount again");
> #endif
>   }
> 
>   if (fork() == 0) {
>     set_task_name(idx);
>     loop(cnt);
>     exit(0);
>   }
>   wait(NULL);
> }
> 
> int main(int argc, char *argv[])
> {
>   int i, nr = 2;
>   int new_ns = 1;
>   unsigned cnt = 1000000;
>   int fd;
> 
>   if (argc > 1)
>     nr = atoi(argv[1]);
>   if (argc > 2)
>     cnt = atoi(argv[2]);
>   if (argc > 3)
>     new_ns = atoi(argv[3]);
> 
>   mkcgrp("/sub");
>   setcgrp("/sub");
> 
>   for (i = 0; i < nr; i++) {
>     if (fork() == 0) {
>       worker(i+1, cnt, new_ns);
>       exit(0);
>     }
>   }
> 
>   set_task_name(0);
>   loop(cnt);
> 
>   for (i = 0; i < nr; i++)
>     wait(NULL);
> 
>   for (i = 0; i < nr; i++)
>     remove_sub_cgroup(i+1);
> 
>   setcgrp("/");
>   rmcgrp("/sub");
> 
>   return 0;
> }
> -------8<-----------------------------------------8<----------------
> 
> 
> Namhyung Kim (9):
>   perf/core: Add PERF_RECORD_CGROUP event
>   perf/core: Add PERF_SAMPLE_CGROUP feature
>   perf tools: Basic support for CGROUP event
>   perf tools: Maintain cgroup hierarchy
>   perf report: Add 'cgroup' sort key
>   perf record: Support synthesizing cgroup events
>   perf record: Add --all-cgroups option
>   perf top: Add --all-cgroups option
>   perf script: Add --show-cgroup-events option
> 
>  include/linux/perf_event.h                |   1 +
>  include/uapi/linux/perf_event.h           |  16 ++-
>  init/Kconfig                              |   3 +-
>  kernel/events/core.c                      | 133 ++++++++++++++++++++++
>  tools/include/uapi/linux/perf_event.h     |  16 ++-
>  tools/lib/perf/include/perf/event.h       |   7 ++
>  tools/perf/Documentation/perf-record.txt  |   5 +-
>  tools/perf/Documentation/perf-report.txt  |   1 +
>  tools/perf/Documentation/perf-script.txt  |   3 +
>  tools/perf/Documentation/perf-top.txt     |   4 +
>  tools/perf/builtin-diff.c                 |   1 +
>  tools/perf/builtin-record.c               |  10 ++
>  tools/perf/builtin-report.c               |   1 +
>  tools/perf/builtin-script.c               |  41 +++++++
>  tools/perf/builtin-top.c                  |   9 ++
>  tools/perf/tests/sample-parsing.c         |   6 +-
>  tools/perf/util/cgroup.c                  |  80 +++++++++++++
>  tools/perf/util/cgroup.h                  |  17 ++-
>  tools/perf/util/env.c                     |   2 +
>  tools/perf/util/env.h                     |   6 +
>  tools/perf/util/event.c                   |  18 +++
>  tools/perf/util/event.h                   |   6 +
>  tools/perf/util/evsel.c                   |  17 ++-
>  tools/perf/util/evsel.h                   |   1 +
>  tools/perf/util/hist.c                    |  13 +++
>  tools/perf/util/hist.h                    |   1 +
>  tools/perf/util/machine.c                 |  19 ++++
>  tools/perf/util/machine.h                 |   3 +
>  tools/perf/util/perf_event_attr_fprintf.c |   2 +
>  tools/perf/util/record.h                  |   1 +
>  tools/perf/util/session.c                 |   4 +
>  tools/perf/util/sort.c                    |  37 ++++++
>  tools/perf/util/sort.h                    |   2 +
>  tools/perf/util/synthetic-events.c        | 121 ++++++++++++++++++++
>  tools/perf/util/synthetic-events.h        |   1 +
>  tools/perf/util/tool.h                    |   2 +
>  36 files changed, 598 insertions(+), 12 deletions(-)
> 
> -- 
> 2.25.1.696.g5e7596f4ac-goog
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCHSET 0/9] perf: Improve cgroup profiling (v6)
  2020-03-25 13:05 ` [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Arnaldo Carvalho de Melo
@ 2020-03-25 13:19   ` Namhyung Kim
  0 siblings, 0 replies; 46+ messages in thread
From: Namhyung Kim @ 2020-03-25 13:19 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin, Tejun Heo, Li Zefan, Johannes Weiner,
	Adrian Hunter

Hi Arnaldo,

On Wed, Mar 25, 2020 at 10:05 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Wed, Mar 25, 2020 at 09:45:27PM +0900, Namhyung Kim escreveu:
> > Hello,
> >
> > This work is to improve cgroup profiling in perf.  Currently it only
> > supports profiling tasks in a specific cgroup and there's no way to
> > identify which cgroup the current sample belongs to.  So I added
> > PERF_SAMPLE_CGROUP to add cgroup id into each sample.  It's a 64-bit
> > integer having file handle of the cgroup.  And kernel also generates
> > PERF_RECORD_CGROUP event for new groups to correlate the cgroup id and
> > cgroup name (path in the cgroup filesystem).  The cgroup id can be
> > read from userspace by name_to_handle_at() system call so it can
> > synthesize the CGROUP event for existing groups.
>
> Thanks for the refresh, I'll test this today and all going well push to
> Ingo soon,

Thanks for doing this!
Namhyung

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 3/9] perf tools: Basic support for CGROUP event
  2020-03-25 12:45 ` [PATCH 3/9] perf tools: Basic support for CGROUP event Namhyung Kim
@ 2020-03-27 14:06   ` Arnaldo Carvalho de Melo
  2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
  1 sibling, 0 replies; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-27 14:06 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin, Adrian Hunter, kernel test robot

Em Wed, Mar 25, 2020 at 09:45:30PM +0900, Namhyung Kim escreveu:
> Implement basic functionality to support cgroup tracking.  Each cgroup
> can be identified by inode number which can be read from userspace
> too.  The actual cgroup processing will come in the later patch.
> 
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> [fix perf test failure on sampling parsing]
> Reported-by: kernel test robot <rong.a.chen@intel.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---

I'm just separating the UAPI header from the rest of this patch, as has
been usual,

- Arnaldo

>  tools/include/uapi/linux/perf_event.h     | 16 ++++++++++++++--
>  tools/lib/perf/include/perf/event.h       |  7 +++++++
>  tools/perf/builtin-diff.c                 |  1 +
>  tools/perf/builtin-report.c               |  1 +
>  tools/perf/tests/sample-parsing.c         |  6 +++++-
>  tools/perf/util/event.c                   | 18 ++++++++++++++++++
>  tools/perf/util/event.h                   |  6 ++++++
>  tools/perf/util/evsel.c                   |  6 ++++++
>  tools/perf/util/machine.c                 | 12 ++++++++++++
>  tools/perf/util/machine.h                 |  3 +++
>  tools/perf/util/perf_event_attr_fprintf.c |  2 ++
>  tools/perf/util/session.c                 |  4 ++++
>  tools/perf/util/synthetic-events.c        |  8 ++++++++
>  tools/perf/util/tool.h                    |  1 +
>  14 files changed, 88 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index 397cfd65b3fe..7b2d6fc9e6ed 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -142,8 +142,9 @@ enum perf_event_sample_format {
>  	PERF_SAMPLE_REGS_INTR			= 1U << 18,
>  	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
>  	PERF_SAMPLE_AUX				= 1U << 20,
> +	PERF_SAMPLE_CGROUP			= 1U << 21,
>  
> -	PERF_SAMPLE_MAX = 1U << 21,		/* non-ABI */
> +	PERF_SAMPLE_MAX = 1U << 22,		/* non-ABI */
>  
>  	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
>  };
> @@ -381,7 +382,8 @@ struct perf_event_attr {
>  				ksymbol        :  1, /* include ksymbol events */
>  				bpf_event      :  1, /* include bpf events */
>  				aux_output     :  1, /* generate AUX records instead of events */
> -				__reserved_1   : 32;
> +				cgroup         :  1, /* include cgroup events */
> +				__reserved_1   : 31;
>  
>  	union {
>  		__u32		wakeup_events;	  /* wakeup every n events */
> @@ -1012,6 +1014,16 @@ enum perf_event_type {
>  	 */
>  	PERF_RECORD_BPF_EVENT			= 18,
>  
> +	/*
> +	 * struct {
> +	 *	struct perf_event_header	header;
> +	 *	u64				id;
> +	 *	char				path[];
> +	 *	struct sample_id		sample_id;
> +	 * };
> +	 */
> +	PERF_RECORD_CGROUP			= 19,
> +
>  	PERF_RECORD_MAX,			/* non-ABI */
>  };
>  
> diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
> index 18106899cb4e..69b44d2cc0f5 100644
> --- a/tools/lib/perf/include/perf/event.h
> +++ b/tools/lib/perf/include/perf/event.h
> @@ -105,6 +105,12 @@ struct perf_record_bpf_event {
>  	__u8			 tag[BPF_TAG_SIZE];  // prog tag
>  };
>  
> +struct perf_record_cgroup {
> +	struct perf_event_header header;
> +	__u64			 id;
> +	char			 path[PATH_MAX];
> +};
> +
>  struct perf_record_sample {
>  	struct perf_event_header header;
>  	__u64			 array[];
> @@ -352,6 +358,7 @@ union perf_event {
>  	struct perf_record_mmap2		mmap2;
>  	struct perf_record_comm			comm;
>  	struct perf_record_namespaces		namespaces;
> +	struct perf_record_cgroup		cgroup;
>  	struct perf_record_fork			fork;
>  	struct perf_record_lost			lost;
>  	struct perf_record_lost_samples		lost_samples;
> diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
> index 5e697cd2224a..c94a002f295e 100644
> --- a/tools/perf/builtin-diff.c
> +++ b/tools/perf/builtin-diff.c
> @@ -455,6 +455,7 @@ static struct perf_diff pdiff = {
>  		.fork	= perf_event__process_fork,
>  		.lost	= perf_event__process_lost,
>  		.namespaces = perf_event__process_namespaces,
> +		.cgroup = perf_event__process_cgroup,
>  		.ordered_events = true,
>  		.ordering_requires_timestamps = true,
>  	},
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index ea673b7eb3f4..26d8fc27e427 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -1105,6 +1105,7 @@ int cmd_report(int argc, const char **argv)
>  			.mmap2		 = perf_event__process_mmap2,
>  			.comm		 = perf_event__process_comm,
>  			.namespaces	 = perf_event__process_namespaces,
> +			.cgroup		 = perf_event__process_cgroup,
>  			.exit		 = perf_event__process_exit,
>  			.fork		 = perf_event__process_fork,
>  			.lost		 = perf_event__process_lost,
> diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
> index 14239e472187..61865699c3f4 100644
> --- a/tools/perf/tests/sample-parsing.c
> +++ b/tools/perf/tests/sample-parsing.c
> @@ -151,6 +151,9 @@ static bool samples_same(const struct perf_sample *s1,
>  	if (type & PERF_SAMPLE_PHYS_ADDR)
>  		COMP(phys_addr);
>  
> +	if (type & PERF_SAMPLE_CGROUP)
> +		COMP(cgroup);
> +
>  	if (type & PERF_SAMPLE_AUX) {
>  		COMP(aux_sample.size);
>  		if (memcmp(s1->aux_sample.data, s2->aux_sample.data,
> @@ -230,6 +233,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
>  			.regs	= regs,
>  		},
>  		.phys_addr	= 113,
> +		.cgroup		= 114,
>  		.aux_sample	= {
>  			.size	= sizeof(aux_data),
>  			.data	= (void *)aux_data,
> @@ -336,7 +340,7 @@ int test__sample_parsing(struct test *test __maybe_unused, int subtest __maybe_u
>  	 * were added.  Please actually update the test rather than just change
>  	 * the condition below.
>  	 */
> -	if (PERF_SAMPLE_MAX > PERF_SAMPLE_AUX << 1) {
> +	if (PERF_SAMPLE_MAX > PERF_SAMPLE_CGROUP << 1) {
>  		pr_debug("sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating\n");
>  		return -1;
>  	}
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index c5447ff516a2..824c038e5c33 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -54,6 +54,7 @@ static const char *perf_event__names[] = {
>  	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
>  	[PERF_RECORD_KSYMBOL]			= "KSYMBOL",
>  	[PERF_RECORD_BPF_EVENT]			= "BPF_EVENT",
> +	[PERF_RECORD_CGROUP]			= "CGROUP",
>  	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
>  	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
>  	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
> @@ -180,6 +181,12 @@ size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp)
>  	return ret;
>  }
>  
> +size_t perf_event__fprintf_cgroup(union perf_event *event, FILE *fp)
> +{
> +	return fprintf(fp, " cgroup: %" PRI_lu64 " %s\n",
> +		       event->cgroup.id, event->cgroup.path);
> +}
> +
>  int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
>  			     union perf_event *event,
>  			     struct perf_sample *sample,
> @@ -196,6 +203,14 @@ int perf_event__process_namespaces(struct perf_tool *tool __maybe_unused,
>  	return machine__process_namespaces_event(machine, event, sample);
>  }
>  
> +int perf_event__process_cgroup(struct perf_tool *tool __maybe_unused,
> +			       union perf_event *event,
> +			       struct perf_sample *sample,
> +			       struct machine *machine)
> +{
> +	return machine__process_cgroup_event(machine, event, sample);
> +}
> +
>  int perf_event__process_lost(struct perf_tool *tool __maybe_unused,
>  			     union perf_event *event,
>  			     struct perf_sample *sample,
> @@ -417,6 +432,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
>  	case PERF_RECORD_NAMESPACES:
>  		ret += perf_event__fprintf_namespaces(event, fp);
>  		break;
> +	case PERF_RECORD_CGROUP:
> +		ret += perf_event__fprintf_cgroup(event, fp);
> +		break;
>  	case PERF_RECORD_MMAP2:
>  		ret += perf_event__fprintf_mmap2(event, fp);
>  		break;
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index 3cda40a2fafc..b8289f160f07 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -135,6 +135,7 @@ struct perf_sample {
>  	u32 raw_size;
>  	u64 data_src;
>  	u64 phys_addr;
> +	u64 cgroup;
>  	u32 flags;
>  	u16 insn_len;
>  	u8  cpumode;
> @@ -322,6 +323,10 @@ int perf_event__process_namespaces(struct perf_tool *tool,
>  				   union perf_event *event,
>  				   struct perf_sample *sample,
>  				   struct machine *machine);
> +int perf_event__process_cgroup(struct perf_tool *tool,
> +			       union perf_event *event,
> +			       struct perf_sample *sample,
> +			       struct machine *machine);
>  int perf_event__process_mmap(struct perf_tool *tool,
>  			     union perf_event *event,
>  			     struct perf_sample *sample,
> @@ -377,6 +382,7 @@ size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
> +size_t perf_event__fprintf_cgroup(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf_ksymbol(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf_bpf(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf(union perf_event *event, FILE *fp);
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 15ccd193483f..b766eb608b97 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2267,6 +2267,12 @@ int perf_evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>  		array++;
>  	}
>  
> +	data->cgroup = 0;
> +	if (type & PERF_SAMPLE_CGROUP) {
> +		data->cgroup = *array;
> +		array++;
> +	}
> +
>  	if (type & PERF_SAMPLE_AUX) {
>  		OVERFLOW_CHECK_u64(array);
>  		sz = *array++;
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index fd14f1489802..399b4731b246 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -654,6 +654,16 @@ int machine__process_namespaces_event(struct machine *machine __maybe_unused,
>  	return err;
>  }
>  
> +int machine__process_cgroup_event(struct machine *machine __maybe_unused,
> +				  union perf_event *event,
> +				  struct perf_sample *sample __maybe_unused)
> +{
> +	if (dump_trace)
> +		perf_event__fprintf_cgroup(event, stdout);
> +
> +	return 0;
> +}
> +
>  int machine__process_lost_event(struct machine *machine __maybe_unused,
>  				union perf_event *event, struct perf_sample *sample __maybe_unused)
>  {
> @@ -1878,6 +1888,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
>  		ret = machine__process_mmap_event(machine, event, sample); break;
>  	case PERF_RECORD_NAMESPACES:
>  		ret = machine__process_namespaces_event(machine, event, sample); break;
> +	case PERF_RECORD_CGROUP:
> +		ret = machine__process_cgroup_event(machine, event, sample); break;
>  	case PERF_RECORD_MMAP2:
>  		ret = machine__process_mmap2_event(machine, event, sample); break;
>  	case PERF_RECORD_FORK:
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index be0a930eca89..fa1be9ea00fa 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -128,6 +128,9 @@ int machine__process_switch_event(struct machine *machine,
>  int machine__process_namespaces_event(struct machine *machine,
>  				      union perf_event *event,
>  				      struct perf_sample *sample);
> +int machine__process_cgroup_event(struct machine *machine,
> +				  union perf_event *event,
> +				  struct perf_sample *sample);
>  int machine__process_mmap_event(struct machine *machine, union perf_event *event,
>  				struct perf_sample *sample);
>  int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
> diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
> index 355d3458d4e6..b94fa07f5d32 100644
> --- a/tools/perf/util/perf_event_attr_fprintf.c
> +++ b/tools/perf/util/perf_event_attr_fprintf.c
> @@ -35,6 +35,7 @@ static void __p_sample_type(char *buf, size_t size, u64 value)
>  		bit_name(BRANCH_STACK), bit_name(REGS_USER), bit_name(STACK_USER),
>  		bit_name(IDENTIFIER), bit_name(REGS_INTR), bit_name(DATA_SRC),
>  		bit_name(WEIGHT), bit_name(PHYS_ADDR), bit_name(AUX),
> +		bit_name(CGROUP),
>  		{ .name = NULL, }
>  	};
>  #undef bit_name
> @@ -132,6 +133,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
>  	PRINT_ATTRf(ksymbol, p_unsigned);
>  	PRINT_ATTRf(bpf_event, p_unsigned);
>  	PRINT_ATTRf(aux_output, p_unsigned);
> +	PRINT_ATTRf(cgroup, p_unsigned);
>  
>  	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
>  	PRINT_ATTRf(bp_type, p_unsigned);
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 055b00abd56d..0b0bfe5bef17 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -471,6 +471,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
>  		tool->comm = process_event_stub;
>  	if (tool->namespaces == NULL)
>  		tool->namespaces = process_event_stub;
> +	if (tool->cgroup == NULL)
> +		tool->cgroup = process_event_stub;
>  	if (tool->fork == NULL)
>  		tool->fork = process_event_stub;
>  	if (tool->exit == NULL)
> @@ -1436,6 +1438,8 @@ static int machines__deliver_event(struct machines *machines,
>  		return tool->comm(tool, event, sample, machine);
>  	case PERF_RECORD_NAMESPACES:
>  		return tool->namespaces(tool, event, sample, machine);
> +	case PERF_RECORD_CGROUP:
> +		return tool->cgroup(tool, event, sample, machine);
>  	case PERF_RECORD_FORK:
>  		return tool->fork(tool, event, sample, machine);
>  	case PERF_RECORD_EXIT:
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index 3f28af39f9c6..f72d80999506 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -1230,6 +1230,9 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
>  	if (type & PERF_SAMPLE_PHYS_ADDR)
>  		result += sizeof(u64);
>  
> +	if (type & PERF_SAMPLE_CGROUP)
> +		result += sizeof(u64);
> +
>  	if (type & PERF_SAMPLE_AUX) {
>  		result += sizeof(u64);
>  		result += sample->aux_sample.size;
> @@ -1404,6 +1407,11 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
>  		array++;
>  	}
>  
> +	if (type & PERF_SAMPLE_CGROUP) {
> +		*array = sample->cgroup;
> +		array++;
> +	}
> +
>  	if (type & PERF_SAMPLE_AUX) {
>  		sz = sample->aux_sample.size;
>  		*array++ = sz;
> diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
> index 2abbf668b8de..472ef5eb4068 100644
> --- a/tools/perf/util/tool.h
> +++ b/tools/perf/util/tool.h
> @@ -46,6 +46,7 @@ struct perf_tool {
>  			mmap2,
>  			comm,
>  			namespaces,
> +			cgroup,
>  			fork,
>  			exit,
>  			lost,
> -- 
> 2.25.1.696.g5e7596f4ac-goog
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 8/9] perf top: Add --all-cgroups option
  2020-03-25 12:45 ` [PATCH 8/9] perf top: " Namhyung Kim
@ 2020-03-27 16:35   ` Arnaldo Carvalho de Melo
  2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
  1 sibling, 0 replies; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-27 16:35 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

Em Wed, Mar 25, 2020 at 09:45:35PM +0900, Namhyung Kim escreveu:
> The --all-cgroups option is to enable cgroup profiling support.  It
> tells kernel to record CGROUP events in the ring buffer so that perf
> report can identify task/cgroup association later.
 s/report/top/g in this case, and I added:

     Committer testing:

    Use:

      # perf top --all-cgroups -s cgroup_id,cgroup,pid

- Arnaldo

 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/Documentation/perf-top.txt | 4 ++++
>  tools/perf/builtin-top.c              | 9 +++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
> index 324b6b53c86b..ddab103af8c7 100644
> --- a/tools/perf/Documentation/perf-top.txt
> +++ b/tools/perf/Documentation/perf-top.txt
> @@ -272,6 +272,10 @@ Default is to monitor all CPUS.
>  	Record events of type PERF_RECORD_NAMESPACES and display it with the
>  	'cgroup_id' sort key.
>  
> +--all-cgroups::
> +	Record events of type PERF_RECORD_CGROUP and display it with the
> +	'cgroup' sort key.
> +
>  --switch-on EVENT_NAME::
>  	Only consider events after this event is found.
>  
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index d2539b793f9d..56b2dd0db88e 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1246,6 +1246,8 @@ static int __cmd_top(struct perf_top *top)
>  
>  	if (opts->record_namespaces)
>  		top->tool.namespace_events = true;
> +	if (opts->record_cgroup)
> +		top->tool.cgroup_events = true;
>  
>  	ret = perf_event__synthesize_bpf_events(top->session, perf_event__process,
>  						&top->session->machines.host,
> @@ -1253,6 +1255,11 @@ static int __cmd_top(struct perf_top *top)
>  	if (ret < 0)
>  		pr_debug("Couldn't synthesize BPF events: Pre-existing BPF programs won't have symbols resolved.\n");
>  
> +	ret = perf_event__synthesize_cgroups(&top->tool, perf_event__process,
> +					     &top->session->machines.host);
> +	if (ret < 0)
> +		pr_debug("Couldn't synthesize cgroup events.\n");
> +
>  	machine__synthesize_threads(&top->session->machines.host, &opts->target,
>  				    top->evlist->core.threads, false,
>  				    top->nr_threads_synthesize);
> @@ -1545,6 +1552,8 @@ int cmd_top(int argc, const char **argv)
>  			"number of thread to run event synthesize"),
>  	OPT_BOOLEAN(0, "namespaces", &opts->record_namespaces,
>  		    "Record namespaces events"),
> +	OPT_BOOLEAN(0, "all-cgroups", &opts->record_cgroup,
> +		    "Record cgroup events"),
>  	OPTS_EVSWITCH(&top.evswitch),
>  	OPT_END()
>  	};
> -- 
> 2.25.1.696.g5e7596f4ac-goog
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 9/9] perf script: Add --show-cgroup-events option
  2020-03-25 12:45 ` [PATCH 9/9] perf script: Add --show-cgroup-events option Namhyung Kim
@ 2020-03-27 16:39   ` Arnaldo Carvalho de Melo
  2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
  1 sibling, 0 replies; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-27 16:39 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

Em Wed, Mar 25, 2020 at 09:45:36PM +0900, Namhyung Kim escreveu:
> The --show-cgroup-events option is to print CGROUP events in the
> output like others.

Added this to the comment:

Committer testing:

  [root@seventh ~]# perf record --all-cgroups --namespaces /wb/cgtest
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.039 MB perf.data (487 samples) ]
  [root@seventh ~]# perf script --show-cgroup-events | grep PERF_RECORD_CGROUP -B2 -A2
           swapper     0     0.000000: PERF_RECORD_CGROUP cgroup: 1 /
              perf 12145 11200.440730:          1 cycles:  ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
              perf 12145 11200.440733:          1 cycles:  ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
  --
            cgtest 12145 11200.440739:     193472 cycles:  ffffffffb90f6fbc commit_creds+0x1fc (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
            cgtest 12145 11200.440790:    2691608 cycles:      7fa2cb43019b _dl_sysdep_start+0x7cb (/usr/lib64/ld-2.29.so)
            cgtest 12145 11200.440962: PERF_RECORD_CGROUP cgroup: 83 /sub
            cgtest 12147 11200.441054:          1 cycles:  ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
            cgtest 12147 11200.441057:          1 cycles:  ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
  --
            cgtest 12148 11200.441103:      10227 cycles:  ffffffffb9a0153d end_repeat_nmi+0x48 (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
            cgtest 12148 11200.441106:     273295 cycles:  ffffffffb99ecbc7 copy_page+0x7 (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
            cgtest 12147 11200.441133: PERF_RECORD_CGROUP cgroup: 88 /sub/cgrp1
            cgtest 12147 11200.441143:    2788845 cycles:  ffffffffb94676c2 security_genfs_sid+0x102 (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
            cgtest 12148 11200.441162: PERF_RECORD_CGROUP cgroup: 93 /sub/cgrp2
            cgtest 12148 11200.441182:    2669546 cycles:            401020 _init+0x20 (/wb/cgtest)
            cgtest 12149 11200.441247:          1 cycles:  ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
  [root@seventh ~]#

Thanks, series applied and tested, seems to work :-)

Now to trampolines...

- Arnaldo
 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/Documentation/perf-script.txt |  3 ++
>  tools/perf/builtin-script.c              | 41 ++++++++++++++++++++++++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
> index db6a36aac47e..515ff9f6af30 100644
> --- a/tools/perf/Documentation/perf-script.txt
> +++ b/tools/perf/Documentation/perf-script.txt
> @@ -319,6 +319,9 @@ OPTIONS
>  --show-bpf-events
>  	Display bpf events i.e. events of type PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT.
>  
> +--show-cgroup-events
> +	Display cgroup events i.e. events of type PERF_RECORD_CGROUP.
> +
>  --demangle::
>  	Demangle symbol names to human readable form. It's enabled by default,
>  	disable with --no-demangle.
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 656b347f6dd8..6cc2d1da6ece 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -1685,6 +1685,7 @@ struct perf_script {
>  	bool			show_lost_events;
>  	bool			show_round_events;
>  	bool			show_bpf_events;
> +	bool			show_cgroup_events;
>  	bool			allocated;
>  	bool			per_event_dump;
>  	struct evswitch		evswitch;
> @@ -2203,6 +2204,41 @@ static int process_namespaces_event(struct perf_tool *tool,
>  	return ret;
>  }
>  
> +static int process_cgroup_event(struct perf_tool *tool,
> +				union perf_event *event,
> +				struct perf_sample *sample,
> +				struct machine *machine)
> +{
> +	struct thread *thread;
> +	struct perf_script *script = container_of(tool, struct perf_script, tool);
> +	struct perf_session *session = script->session;
> +	struct evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
> +	int ret = -1;
> +
> +	thread = machine__findnew_thread(machine, sample->pid, sample->tid);
> +	if (thread == NULL) {
> +		pr_debug("problem processing CGROUP event, skipping it.\n");
> +		return -1;
> +	}
> +
> +	if (perf_event__process_cgroup(tool, event, sample, machine) < 0)
> +		goto out;
> +
> +	if (!evsel->core.attr.sample_id_all) {
> +		sample->cpu = 0;
> +		sample->time = 0;
> +	}
> +	if (!filter_cpu(sample)) {
> +		perf_sample__fprintf_start(sample, thread, evsel,
> +					   PERF_RECORD_CGROUP, stdout);
> +		perf_event__fprintf(event, stdout);
> +	}
> +	ret = 0;
> +out:
> +	thread__put(thread);
> +	return ret;
> +}
> +
>  static int process_fork_event(struct perf_tool *tool,
>  			      union perf_event *event,
>  			      struct perf_sample *sample,
> @@ -2542,6 +2578,8 @@ static int __cmd_script(struct perf_script *script)
>  		script->tool.context_switch = process_switch_event;
>  	if (script->show_namespace_events)
>  		script->tool.namespaces = process_namespaces_event;
> +	if (script->show_cgroup_events)
> +		script->tool.cgroup = process_cgroup_event;
>  	if (script->show_lost_events)
>  		script->tool.lost = process_lost_event;
>  	if (script->show_round_events) {
> @@ -3467,6 +3505,7 @@ int cmd_script(int argc, const char **argv)
>  			.mmap2		 = perf_event__process_mmap2,
>  			.comm		 = perf_event__process_comm,
>  			.namespaces	 = perf_event__process_namespaces,
> +			.cgroup		 = perf_event__process_cgroup,
>  			.exit		 = perf_event__process_exit,
>  			.fork		 = perf_event__process_fork,
>  			.attr		 = process_attr,
> @@ -3567,6 +3606,8 @@ int cmd_script(int argc, const char **argv)
>  		    "Show context switch events (if recorded)"),
>  	OPT_BOOLEAN('\0', "show-namespace-events", &script.show_namespace_events,
>  		    "Show namespace events (if recorded)"),
> +	OPT_BOOLEAN('\0', "show-cgroup-events", &script.show_cgroup_events,
> +		    "Show cgroup events (if recorded)"),
>  	OPT_BOOLEAN('\0', "show-lost-events", &script.show_lost_events,
>  		    "Show lost events (if recorded)"),
>  	OPT_BOOLEAN('\0', "show-round-events", &script.show_round_events,
> -- 
> 2.25.1.696.g5e7596f4ac-goog
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-03-25 12:45 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
@ 2020-03-30 16:30   ` Arnaldo Carvalho de Melo
  2020-03-30 16:43     ` Arnaldo Carvalho de Melo
  2020-04-04  8:41   ` [tip: perf/urgent] perf record: Support synthesizing cgroup events tip-bot2 for Namhyung Kim
  1 sibling, 1 reply; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-30 16:30 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

Em Wed, Mar 25, 2020 at 09:45:33PM +0900, Namhyung Kim escreveu:
> Synthesize cgroup events by iterating cgroup filesystem directories.
> The cgroup event only saves the portion of cgroup path after the mount
> point and the cgroup id (which actually is a file handle).

Breaks the build on alpine linux (musl libc):

  CC       /tmp/build/perf/util/srccode.o
  CC       /tmp/build/perf/util/synthetic-events.o
util/synthetic-events.c: In function 'perf_event__synthesize_cgroup':
util/synthetic-events.c:427:22: error: field 'fh' has incomplete type
   struct file_handle fh;
                      ^
util/synthetic-events.c:441:6: error: implicit declaration of function 'name_to_handle_at' [-Werror=implicit-function-declaration]
  if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
      ^
util/synthetic-events.c:441:2: error: nested extern declaration of 'name_to_handle_at' [-Werror=nested-externs]
  if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
  ^
  CC       /tmp/build/perf/util/data.o
cc1: all warnings being treated as errors
mv: can't rename '/tmp/build/perf/util/.synthetic-events.o.tmp': No such file or directory


I'm trying to fix
 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-record.c        |   5 ++
>  tools/perf/util/synthetic-events.c | 113 +++++++++++++++++++++++++++++
>  tools/perf/util/synthetic-events.h |   1 +
>  tools/perf/util/tool.h             |   1 +
>  4 files changed, 120 insertions(+)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 4c301466101b..2802de9538ff 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1397,6 +1397,11 @@ static int record__synthesize(struct record *rec, bool tail)
>  	if (err < 0)
>  		pr_warning("Couldn't synthesize bpf events.\n");
>  
> +	err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
> +					     machine);
> +	if (err < 0)
> +		pr_warning("Couldn't synthesize cgroup events.\n");
> +
>  	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->core.threads,
>  					    process_synthesized_event, opts->sample_address,
>  					    1);
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index f72d80999506..24975470ed5c 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -16,6 +16,7 @@
>  #include "util/synthetic-events.h"
>  #include "util/target.h"
>  #include "util/time-utils.h"
> +#include "util/cgroup.h"
>  #include <linux/bitops.h>
>  #include <linux/kernel.h>
>  #include <linux/string.h>
> @@ -414,6 +415,118 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
>  	return rc;
>  }
>  
> +static int perf_event__synthesize_cgroup(struct perf_tool *tool,
> +					 union perf_event *event,
> +					 char *path, size_t mount_len,
> +					 perf_event__handler_t process,
> +					 struct machine *machine)
> +{
> +	size_t event_size = sizeof(event->cgroup) - sizeof(event->cgroup.path);
> +	size_t path_len = strlen(path) - mount_len + 1;
> +	struct {
> +		struct file_handle fh;
> +		uint64_t cgroup_id;
> +	} handle;
> +	int mount_id;
> +
> +	while (path_len % sizeof(u64))
> +		path[mount_len + path_len++] = '\0';
> +
> +	memset(&event->cgroup, 0, event_size);
> +
> +	event->cgroup.header.type = PERF_RECORD_CGROUP;
> +	event->cgroup.header.size = event_size + path_len + machine->id_hdr_size;
> +
> +	handle.fh.handle_bytes = sizeof(handle.cgroup_id);
> +	if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
> +		pr_debug("stat failed: %s\n", path);
> +		return -1;
> +	}
> +
> +	event->cgroup.id = handle.cgroup_id;
> +	strncpy(event->cgroup.path, path + mount_len, path_len);
> +	memset(event->cgroup.path + path_len, 0, machine->id_hdr_size);
> +
> +	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
> +		pr_debug("process synth event failed\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
> +					union perf_event *event,
> +					char *path, size_t mount_len,
> +					perf_event__handler_t process,
> +					struct machine *machine)
> +{
> +	size_t pos = strlen(path);
> +	DIR *d;
> +	struct dirent *dent;
> +	int ret = 0;
> +
> +	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
> +					  process, machine) < 0)
> +		return -1;
> +
> +	d = opendir(path);
> +	if (d == NULL) {
> +		pr_debug("failed to open directory: %s\n", path);
> +		return -1;
> +	}
> +
> +	while ((dent = readdir(d)) != NULL) {
> +		if (dent->d_type != DT_DIR)
> +			continue;
> +		if (!strcmp(dent->d_name, ".") ||
> +		    !strcmp(dent->d_name, ".."))
> +			continue;
> +
> +		/* any sane path should be less than PATH_MAX */
> +		if (strlen(path) + strlen(dent->d_name) + 1 >= PATH_MAX)
> +			continue;
> +
> +		if (path[pos - 1] != '/')
> +			strcat(path, "/");
> +		strcat(path, dent->d_name);
> +
> +		ret = perf_event__walk_cgroup_tree(tool, event, path,
> +						   mount_len, process, machine);
> +		if (ret < 0)
> +			break;
> +
> +		path[pos] = '\0';
> +	}
> +
> +	closedir(d);
> +	return ret;
> +}
> +
> +int perf_event__synthesize_cgroups(struct perf_tool *tool,
> +				   perf_event__handler_t process,
> +				   struct machine *machine)
> +{
> +	union perf_event event;
> +	char cgrp_root[PATH_MAX];
> +	size_t mount_len;  /* length of mount point in the path */
> +
> +	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX, "perf_event") < 0) {
> +		pr_debug("cannot find cgroup mount point\n");
> +		return -1;
> +	}
> +
> +	mount_len = strlen(cgrp_root);
> +	/* make sure the path starts with a slash (after mount point) */
> +	strcat(cgrp_root, "/");
> +
> +	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
> +					 process, machine) < 0)
> +		return -1;
> +
> +	return 0;
> +}
> +
>  int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
>  				   struct machine *machine)
>  {
> diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
> index baead0cdc381..e7a3e9589738 100644
> --- a/tools/perf/util/synthetic-events.h
> +++ b/tools/perf/util/synthetic-events.h
> @@ -45,6 +45,7 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handl
>  int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
>  int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_namespaces(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
> +int perf_event__synthesize_cgroups(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
>  int perf_event__synthesize_stat_config(struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
> diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
> index 472ef5eb4068..3fb67bd31e4a 100644
> --- a/tools/perf/util/tool.h
> +++ b/tools/perf/util/tool.h
> @@ -79,6 +79,7 @@ struct perf_tool {
>  	bool		ordered_events;
>  	bool		ordering_requires_timestamps;
>  	bool		namespace_events;
> +	bool		cgroup_events;
>  	bool		no_warn;
>  	enum show_feature_header show_feat_hdr;
>  };
> -- 
> 2.25.1.696.g5e7596f4ac-goog
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-03-30 16:30   ` Arnaldo Carvalho de Melo
@ 2020-03-30 16:43     ` Arnaldo Carvalho de Melo
  2020-03-31 16:04       ` Arnaldo Carvalho de Melo
  2020-04-01 23:22       ` Namhyung Kim
  0 siblings, 2 replies; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-30 16:43 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

Em Mon, Mar 30, 2020 at 01:30:58PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Mar 25, 2020 at 09:45:33PM +0900, Namhyung Kim escreveu:
> > Synthesize cgroup events by iterating cgroup filesystem directories.
> > The cgroup event only saves the portion of cgroup path after the mount
> > point and the cgroup id (which actually is a file handle).
> 
> Breaks the build on alpine linux (musl libc):
> 
>   CC       /tmp/build/perf/util/srccode.o
>   CC       /tmp/build/perf/util/synthetic-events.o
> util/synthetic-events.c: In function 'perf_event__synthesize_cgroup':
> util/synthetic-events.c:427:22: error: field 'fh' has incomplete type
>    struct file_handle fh;
>                       ^
> util/synthetic-events.c:441:6: error: implicit declaration of function 'name_to_handle_at' [-Werror=implicit-function-declaration]
>   if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
>       ^
> util/synthetic-events.c:441:2: error: nested extern declaration of 'name_to_handle_at' [-Werror=nested-externs]
>   if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
>   ^
>   CC       /tmp/build/perf/util/data.o
> cc1: all warnings being treated as errors
> mv: can't rename '/tmp/build/perf/util/.synthetic-events.o.tmp': No such file or directory
> 
> 
> I'm trying to fix

musl libc up to 1.2.21 (IIRC) lacks name_to_handle_at and its structs,
then from the one that is in alpine linux 3.10 the error changes to:

  CC       /tmp/build/perf/util/cloexec.o
util/synthetic-events.c:427:22: error: field 'fh' with variable sized type 'struct file_handle' not at the end of a struct or class is a GNU
      extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
                struct file_handle fh;
                                   ^
1 error generated.
mv: can't rename '/tmp/build/perf/util/.synthetic-events.o.tmp': No such file or directory
make[4]: *** [/git/linux/tools/build/Makefile.build:97: /tmp/build/perf/util/synthetic-events.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [/git/linux/tools/build/Makefile.build:139: util] Error 2
make[2]: *** [Makefile.perf:617: /tmp/build/perf/perf-in.o] Error 2
make[1]: *** [Makefile.perf:225: sub-make] Error 2
make: *** [Makefile:70: all] Error 2
make: Leaving directory '/git/linux/tools/perf'
+ exit 1
[root@quaco ~]#

So probably we'll need a feature test to catch this and do some
workaround or disable the feature on such systems, providing some
warning.

I left the container build tests running to see if some other system has
problems with this, perhaps the ones with uCLibc or some older glibc,
we'll see.

So far all the alpine versions failed with the above problems and ALT
Linux p8 and p9 built without problems.

- Arnaldo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-03-30 16:43     ` Arnaldo Carvalho de Melo
@ 2020-03-31 16:04       ` Arnaldo Carvalho de Melo
  2020-04-01 23:22       ` Namhyung Kim
  1 sibling, 0 replies; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-03-31 16:04 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

Em Mon, Mar 30, 2020 at 01:43:12PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Mar 30, 2020 at 01:30:58PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Wed, Mar 25, 2020 at 09:45:33PM +0900, Namhyung Kim escreveu:
> > > Synthesize cgroup events by iterating cgroup filesystem directories.
> > > The cgroup event only saves the portion of cgroup path after the mount
> > > point and the cgroup id (which actually is a file handle).
> > 
> > Breaks the build on alpine linux (musl libc):
> > 
> >   CC       /tmp/build/perf/util/srccode.o
> >   CC       /tmp/build/perf/util/synthetic-events.o
> > util/synthetic-events.c: In function 'perf_event__synthesize_cgroup':
> > util/synthetic-events.c:427:22: error: field 'fh' has incomplete type
> >    struct file_handle fh;
> >                       ^
> > util/synthetic-events.c:441:6: error: implicit declaration of function 'name_to_handle_at' [-Werror=implicit-function-declaration]
> >   if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
> >       ^
> > util/synthetic-events.c:441:2: error: nested extern declaration of 'name_to_handle_at' [-Werror=nested-externs]
> >   if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
> >   ^
> >   CC       /tmp/build/perf/util/data.o
> > cc1: all warnings being treated as errors
> > mv: can't rename '/tmp/build/perf/util/.synthetic-events.o.tmp': No such file or directory
> > 
> > 
> > I'm trying to fix
> 
> musl libc up to 1.2.21 (IIRC) lacks name_to_handle_at and its structs,
> then from the one that is in alpine linux 3.10 the error changes to:
> 
>   CC       /tmp/build/perf/util/cloexec.o
> util/synthetic-events.c:427:22: error: field 'fh' with variable sized type 'struct file_handle' not at the end of a struct or class is a GNU
>       extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
>                 struct file_handle fh;
>                                    ^
> 1 error generated.
> mv: can't rename '/tmp/build/perf/util/.synthetic-events.o.tmp': No such file or directory
> make[4]: *** [/git/linux/tools/build/Makefile.build:97: /tmp/build/perf/util/synthetic-events.o] Error 1
> make[4]: *** Waiting for unfinished jobs....
> make[3]: *** [/git/linux/tools/build/Makefile.build:139: util] Error 2
> make[2]: *** [Makefile.perf:617: /tmp/build/perf/perf-in.o] Error 2
> make[1]: *** [Makefile.perf:225: sub-make] Error 2
> make: *** [Makefile:70: all] Error 2
> make: Leaving directory '/git/linux/tools/perf'
> + exit 1
> [root@quaco ~]#
> 
> So probably we'll need a feature test to catch this and do some
> workaround or disable the feature on such systems, providing some
> warning.
> 
> I left the container build tests running to see if some other system has
> problems with this, perhaps the ones with uCLibc or some older glibc,
> we'll see.
> 
> So far all the alpine versions failed with the above problems and ALT
> Linux p8 and p9 built without problems.

  13   129.90 amazonlinux:1                 : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final)
  14   192.36 amazonlinux:2                 : Ok   gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2)
  15    16.02 android-ndk:r12b-arm          : FAIL arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  16    16.35 android-ndk:r15c-arm          : FAIL arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  17    13.42 centos:5                      : FAIL gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  18    18.39 centos:6                      : FAIL gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
  19    54.36 centos:7                      : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
  20   215.20 centos:8                      : Ok   gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), clang version 8.0.1 (Red Hat 8.0.1-1.module_el8.1.0+215+a01033fb)
  21    54.41 clearlinux:latest             : Ok   gcc (Clear Linux OS for Intel Architecture) 9.2.1 20200214 gcc_9_2_0_release-615-g7866f9ebf1, clang version 9.0.1 
  22   139.44 debian:8                      : Ok   gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
  23   153.99 debian:9                      : Ok   gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final)
  24   157.00 debian:10                     : Ok   gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final)
  25   170.15 debian:experimental           : Ok   gcc (Debian 9.2.1-28) 9.2.1 20200203, clang version 8.0.1-7 (tags/RELEASE_801/final)
  26    28.45 debian:experimental-x-arm64   : FAIL aarch64-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0

This one was something else, and I bet the other arm64 were the same
  thing:

  CC       /tmp/build/perf/tests/parse-events.o
arch/arm64/util/machine.c: In function 'arch__symbols__fixup_end':
arch/arm64/util/machine.c:19:7: error: implicit declaration of function 'strchr' [-Werror=implicit-function-declaration]
  if ((strchr(p->name, '[') && strchr(c->name, '[') == NULL) ||
       ^~~~~~
arch/arm64/util/machine.c:19:7: error: incompatible implicit declaration of built-in function 'strchr' [-Werror]
arch/arm64/util/machine.c:19:7: note: include '<string.h>' or provide a declaration of 'strchr'
arch/arm64/util/machine.c:6:1:
+#include <string.h>
 
arch/arm64/util/machine.c:19:7:
  if ((strchr(p->name, '[') && strchr(c->name, '[') == NULL) ||
       ^~~~~~
cc1: all warnings being treated as errors
mv: cannot stat '/tmp/build/perf/arch/arm64/util/.machine.o.tmp': No such file or directory
make[6]: *** [/git/linux/tools/build/Makefile.build:97: /tmp/build/perf/arch/arm64/util/machine.o] Error 1


  27    60.23 debian:experimental-x-mips    : Ok   mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
  28    54.38 debian:experimental-x-mips64  : Ok   mips64-linux-gnuabi64-gcc (Debian 9.2.1-24) 9.2.1 20200117
  29    60.29 debian:experimental-x-mipsel  : Ok   mipsel-linux-gnu-gcc (Debian 9.2.1-8) 9.2.1 20190909
  30    51.92 fedora:20                     : Ok   gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
  31    54.05 fedora:22                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final)
  32   142.15 fedora:23                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final)
  33   160.24 fedora:24                     : Ok   gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final)
  34    20.46 fedora:24-x-ARC-uClibc        : FAIL arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
  35   161.31 fedora:25                     : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final)
  36   182.38 fedora:26                     : Ok   gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final)
  37   178.68 fedora:27                     : Ok   gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final)
  38   199.12 fedora:28                     : Ok   gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final)
  39   209.86 fedora:29                     : Ok   gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29)
  40   213.17 fedora:30                     : Ok   gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 8.0.0 (Fedora 8.0.0-3.fc30)
  41    53.64 fedora:30-x-ARC-glibc         : Ok   arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225
  42    20.78 fedora:30-x-ARC-uClibc        : FAIL arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225

Same thing as with fedora:24-x-ARC-uClibc:

  CC       /tmp/build/perf/util/tsc.o
util/synthetic-events.c: In function 'perf_event__synthesize_cgroup':
util/synthetic-events.c:427:22: error: field 'fh' has incomplete type
   struct file_handle fh;
                      ^~
util/synthetic-events.c:441:6: error: implicit declaration of function 'name_to_handle_at' [-Werror=implicit-function-declaration]
  if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
      ^~~~~~~~~~~~~~~~~
util/synthetic-events.c:441:6: error: nested extern declaration of 'name_to_handle_at' [-Werror=nested-externs]
  CC       /tmp/build/perf/util/cloexec.o
  CC       /tmp/build/perf/util/call-path.o
cc1: all warnings being treated as errors


  43   215.94 fedora:31                     : Ok   gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), clang version 9.0.1 (Fedora 9.0.1-2.fc31)
  44   181.57 fedora:32                     : Ok   gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8), clang version 10.0.0 (Fedora 10.0.0-0.1.rc2.fc32)
  45   185.19 fedora:rawhide                : Ok   gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8), clang version 10.0.0 (Fedora 10.0.0-0.3.rc2.fc33)
  46    62.84 gentoo-stage3-amd64:latest    : Ok   gcc (Gentoo 9.2.0-r2 p3) 9.2.0
  47   139.89 mageia:5                      : Ok   gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final)
  48   165.42 mageia:6                      : Ok   gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final)
  49   217.63 mageia:7                      : Ok   gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7)
  50   221.42 manjaro:latest                : Ok   gcc (GCC) 9.2.0, clang version 9.0.0 (tags/RELEASE_900/final)
  51   219.95 openmandriva:cooker           : Ok   gcc (GCC) 10.0.0 20200216 (OpenMandriva), clang version 10.0.0 
  52   188.84 opensuse:15.0                 : Ok   gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548)
  53   206.42 opensuse:15.1                 : Ok   gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238)
  54   205.61 opensuse:15.2                 : Ok   gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238)
  55   183.74 opensuse:42.3                 : Ok   gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553)
  56   195.58 opensuse:tumbleweed           : Ok   gcc (SUSE Linux) 9.2.1 20200128 [revision 83f65674e78d97d27537361de1a9d74067ff228d], clang version 9.0.1 
  57    18.33 oraclelinux:6                 : FAIL gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)
  58    52.69 oraclelinux:7                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.3)
  59   218.36 oraclelinux:8                 : Ok   gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4.5.0.5), clang version 8.0.1 (Red Hat 8.0.1-1.0.1.module+el8.1.0+5428+345cee14)
  60    45.57 ubuntu:12.04                  : Ok   gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
  61    54.51 ubuntu:14.04                  : Ok   gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4
  62   151.39 ubuntu:16.04                  : Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
  63    47.30 ubuntu:16.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  64    24.91 ubuntu:16.04-x-arm64          : FAIL aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609

  Same thing as with debian experimental cross build, will fix this one


  CC       /tmp/build/perf/bench/mem-functions.o
  CC       /tmp/build/perf/tests/parse-events.o
arch/arm64/util/machine.c: In function 'arch__symbols__fixup_end':
arch/arm64/util/machine.c:19:7: error: implicit declaration of function 'strchr' [-Werror=implicit-function-declaration]
  if ((strchr(p->name, '[') && strchr(c->name, '[') == NULL) ||
       ^
arch/arm64/util/machine.c:19:7: error: incompatible implicit declaration of built-in function 'strchr' [-Werror]
arch/arm64/util/machine.c:19:7: note: include '<string.h>' or provide a declaration of 'strchr'
cc1: all warnings being treated as errors
  MKDIR    /tmp/build/perf/ui/
mv: cannot stat '/tmp/build/perf/arch/arm64/util/.machine.o.tmp': No such file or directory
/git/linux/tools/build/Makefile.build:96: recipe for target '/tmp/build/perf/arch/arm64/util/machine.o' failed
make[6]: *** [/tmp/build/perf/arch/arm64/util/machine.o] Error 1
  CC       /tmp/build/perf/ui/setup.o
/git/linux/tools/build/Makefile.build:139: recipe for target 'util' failed
make[5]: *** [util] Error 2
/git/linux/tools/build/Makefile.build:139: recipe for target 'arm64' failed
make[4]: *** [arm64] Error 2
/git/linux/tools/build/Makefile.build:139: recipe for target 'arch' failed
make[3]: *** [arch] Error 2
make[3]: *** Waiting for unfinished jobs....


  65    47.00 ubuntu:16.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  66    48.04 ubuntu:16.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  67    47.52 ubuntu:16.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  68    46.10 ubuntu:16.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  69   182.60 ubuntu:18.04                  : Ok   gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
  70    50.66 ubuntu:18.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0
  71    26.81 ubuntu:18.04-x-arm64          : FAIL aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0
  72    38.36 ubuntu:18.04-x-m68k           : Ok   m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  73    50.79 ubuntu:18.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  74    57.69 ubuntu:18.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  75    56.70 ubuntu:18.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  76   147.10 ubuntu:18.04-x-riscv64        : Ok   riscv64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  77    43.49 ubuntu:18.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  78    52.88 ubuntu:18.04-x-sh4            : Ok   sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  79    45.57 ubuntu:18.04-x-sparc64        : Ok   sparc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  80   168.41 ubuntu:18.10                  : Ok   gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final)
  81   177.55 ubuntu:19.04                  : Ok   gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final)
  82    45.12 ubuntu:19.04-x-alpha          : Ok   alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
  83    28.06 ubuntu:19.04-x-arm64          : FAIL aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0
  84    43.21 ubuntu:19.04-x-hppa           : Ok   hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
  85   147.65 ubuntu:19.10                  : FAIL gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 9.0.0-2 (tags/RELEASE_900/final)

This should be something else, will check.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-03-30 16:43     ` Arnaldo Carvalho de Melo
  2020-03-31 16:04       ` Arnaldo Carvalho de Melo
@ 2020-04-01 23:22       ` Namhyung Kim
  2020-04-02  1:52         ` [PATCH] perf tools: Add file-handle feature test Namhyung Kim
  1 sibling, 1 reply; 46+ messages in thread
From: Namhyung Kim @ 2020-04-01 23:22 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

Hi Arnaldo,

On Tue, Mar 31, 2020 at 1:43 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Mon, Mar 30, 2020 at 01:30:58PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Wed, Mar 25, 2020 at 09:45:33PM +0900, Namhyung Kim escreveu:
> > > Synthesize cgroup events by iterating cgroup filesystem directories.
> > > The cgroup event only saves the portion of cgroup path after the mount
> > > point and the cgroup id (which actually is a file handle).
> >
> > Breaks the build on alpine linux (musl libc):
> >
> >   CC       /tmp/build/perf/util/srccode.o
> >   CC       /tmp/build/perf/util/synthetic-events.o
> > util/synthetic-events.c: In function 'perf_event__synthesize_cgroup':
> > util/synthetic-events.c:427:22: error: field 'fh' has incomplete type
> >    struct file_handle fh;
> >                       ^
> > util/synthetic-events.c:441:6: error: implicit declaration of function 'name_to_handle_at' [-Werror=implicit-function-declaration]
> >   if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
> >       ^
> > util/synthetic-events.c:441:2: error: nested extern declaration of 'name_to_handle_at' [-Werror=nested-externs]
> >   if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
> >   ^
> >   CC       /tmp/build/perf/util/data.o
> > cc1: all warnings being treated as errors
> > mv: can't rename '/tmp/build/perf/util/.synthetic-events.o.tmp': No such file or directory
> >
> >
> > I'm trying to fix
>
> musl libc up to 1.2.21 (IIRC) lacks name_to_handle_at and its structs,
> then from the one that is in alpine linux 3.10 the error changes to:
>
>   CC       /tmp/build/perf/util/cloexec.o
> util/synthetic-events.c:427:22: error: field 'fh' with variable sized type 'struct file_handle' not at the end of a struct or class is a GNU
>       extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
>                 struct file_handle fh;
>                                    ^
> 1 error generated.
> mv: can't rename '/tmp/build/perf/util/.synthetic-events.o.tmp': No such file or directory
> make[4]: *** [/git/linux/tools/build/Makefile.build:97: /tmp/build/perf/util/synthetic-events.o] Error 1
> make[4]: *** Waiting for unfinished jobs....
> make[3]: *** [/git/linux/tools/build/Makefile.build:139: util] Error 2
> make[2]: *** [Makefile.perf:617: /tmp/build/perf/perf-in.o] Error 2
> make[1]: *** [Makefile.perf:225: sub-make] Error 2
> make: *** [Makefile:70: all] Error 2
> make: Leaving directory '/git/linux/tools/perf'
> + exit 1
> [root@quaco ~]#
>
> So probably we'll need a feature test to catch this and do some
> workaround or disable the feature on such systems, providing some
> warning.
>
> I left the container build tests running to see if some other system has
> problems with this, perhaps the ones with uCLibc or some older glibc,
> we'll see.
>
> So far all the alpine versions failed with the above problems and ALT
> Linux p8 and p9 built without problems.

Hmm.. right, the CONFIG_FHANDLE can be turned off or old systems
lack support.  I'll send a patch for the feature test soon.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] perf tools: Add file-handle feature test
  2020-04-01 23:22       ` Namhyung Kim
@ 2020-04-02  1:52         ` Namhyung Kim
  2020-04-02 15:37           ` Arnaldo Carvalho de Melo
  2020-04-04  8:41           ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
  0 siblings, 2 replies; 46+ messages in thread
From: Namhyung Kim @ 2020-04-02  1:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, Mark Rutland,
	Alexander Shishkin, LKML

The file handle (FHANDLE) support is configurable so some systems might
not have it.  So add a config feature item to check it on build time
and reject cgroup tracking based on that.

Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/build/Makefile.feature           |  3 ++-
 tools/build/feature/Makefile           |  6 +++++-
 tools/build/feature/test-file-handle.c | 14 ++++++++++++++
 tools/perf/Makefile.config             |  4 ++++
 tools/perf/builtin-record.c            |  8 +++++++-
 tools/perf/builtin-top.c               |  8 +++++++-
 tools/perf/util/synthetic-events.c     |  9 +++++++++
 7 files changed, 48 insertions(+), 4 deletions(-)
 create mode 100644 tools/build/feature/test-file-handle.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 574c2e0b9d20..3e0c019ef297 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -72,7 +72,8 @@ FEATURE_TESTS_BASIC :=                  \
         setns				\
         libaio				\
         libzstd				\
-        disassembler-four-args
+        disassembler-four-args		\
+        file-handle
 
 # FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list
 # of all feature tests
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index 7ac0d8088565..621f528f7822 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -67,7 +67,8 @@ FILES=                                          \
          test-llvm.bin				\
          test-llvm-version.bin			\
          test-libaio.bin			\
-         test-libzstd.bin
+         test-libzstd.bin			\
+         test-file-handle.bin
 
 FILES := $(addprefix $(OUTPUT),$(FILES))
 
@@ -321,6 +322,9 @@ FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) $(PERL_EMBED_LDOPTS)
 $(OUTPUT)test-libzstd.bin:
 	$(BUILD) -lzstd
 
+$(OUTPUT)test-file-handle.bin:
+	$(BUILD)
+
 ###############################
 
 clean:
diff --git a/tools/build/feature/test-file-handle.c b/tools/build/feature/test-file-handle.c
new file mode 100644
index 000000000000..5f8c16f8784f
--- /dev/null
+++ b/tools/build/feature/test-file-handle.c
@@ -0,0 +1,14 @@
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+int main(void)
+{
+	struct file_handle fh;
+	int mount_id;
+
+	name_to_handle_at(AT_FDCWD, "/", &fh, &mount_id, 0);
+	return 0;
+}
+
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 80e55e796be9..eb95c0c0a169 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -348,6 +348,10 @@ ifeq ($(feature-gettid), 1)
   CFLAGS += -DHAVE_GETTID
 endif
 
+ifeq ($(feature-file-handle), 1)
+  CFLAGS += -DHAVE_FILE_HANDLE
+endif
+
 ifdef NO_LIBELF
   NO_DWARF := 1
   NO_DEMANGLE := 1
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 7d7912e121d6..1ab349abe904 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1433,8 +1433,14 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	if (rec->opts.record_namespaces)
 		tool->namespace_events = true;
 
-	if (rec->opts.record_cgroup)
+	if (rec->opts.record_cgroup) {
+#ifdef HAVE_FILE_HANDLE
 		tool->cgroup_events = true;
+#else
+		pr_err("cgroup tracking is not supported\n");
+		return -1;
+#endif
+	}
 
 	if (rec->opts.auxtrace_snapshot_mode || rec->switch_output.enabled) {
 		signal(SIGUSR2, snapshot_sig_handler);
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 56b2dd0db88e..02ea2cf2a3d9 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1246,8 +1246,14 @@ static int __cmd_top(struct perf_top *top)
 
 	if (opts->record_namespaces)
 		top->tool.namespace_events = true;
-	if (opts->record_cgroup)
+	if (opts->record_cgroup) {
+#ifdef HAVE_FILE_HANDLE
 		top->tool.cgroup_events = true;
+#else
+		pr_err("cgroup tracking is not supported.\n");
+		return -1;
+#endif
+	}
 
 	ret = perf_event__synthesize_bpf_events(top->session, perf_event__process,
 						&top->session->machines.host,
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 24975470ed5c..f96e84956d84 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -415,6 +415,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	return rc;
 }
 
+#ifdef HAVE_FILE_HANDLE
 static int perf_event__synthesize_cgroup(struct perf_tool *tool,
 					 union perf_event *event,
 					 char *path, size_t mount_len,
@@ -526,6 +527,14 @@ int perf_event__synthesize_cgroups(struct perf_tool *tool,
 
 	return 0;
 }
+#else
+int perf_event__synthesize_cgroups(struct perf_tool *tool,
+				   perf_event__handler_t process,
+				   struct machine *machine)
+{
+	return -1;
+}
+#endif
 
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
 				   struct machine *machine)
-- 
2.26.0.rc2.310.g2932bb562d-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] perf tools: Add file-handle feature test
  2020-04-02  1:52         ` [PATCH] perf tools: Add file-handle feature test Namhyung Kim
@ 2020-04-02 15:37           ` Arnaldo Carvalho de Melo
  2020-04-02 15:46             ` Arnaldo Carvalho de Melo
  2020-04-02 18:37             ` Arnaldo Carvalho de Melo
  2020-04-04  8:41           ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
  1 sibling, 2 replies; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-02 15:37 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, Mark Rutland,
	Alexander Shishkin, LKML

Em Thu, Apr 02, 2020 at 10:52:49AM +0900, Namhyung Kim escreveu:
> The file handle (FHANDLE) support is configurable so some systems might
> not have it.  So add a config feature item to check it on build time
> and reject cgroup tracking based on that.

Ok, I'll break this patch in two, add the feature test first, then fold
the usage of HAVE_FILE_HANDLE with the patch that uses it, so that we
keep the codebase bisectable,

Thanks!

- Arnaldo
 
> Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/build/Makefile.feature           |  3 ++-
>  tools/build/feature/Makefile           |  6 +++++-
>  tools/build/feature/test-file-handle.c | 14 ++++++++++++++
>  tools/perf/Makefile.config             |  4 ++++
>  tools/perf/builtin-record.c            |  8 +++++++-
>  tools/perf/builtin-top.c               |  8 +++++++-
>  tools/perf/util/synthetic-events.c     |  9 +++++++++
>  7 files changed, 48 insertions(+), 4 deletions(-)
>  create mode 100644 tools/build/feature/test-file-handle.c
> 
> diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
> index 574c2e0b9d20..3e0c019ef297 100644
> --- a/tools/build/Makefile.feature
> +++ b/tools/build/Makefile.feature
> @@ -72,7 +72,8 @@ FEATURE_TESTS_BASIC :=                  \
>          setns				\
>          libaio				\
>          libzstd				\
> -        disassembler-four-args
> +        disassembler-four-args		\
> +        file-handle
>  
>  # FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list
>  # of all feature tests
> diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
> index 7ac0d8088565..621f528f7822 100644
> --- a/tools/build/feature/Makefile
> +++ b/tools/build/feature/Makefile
> @@ -67,7 +67,8 @@ FILES=                                          \
>           test-llvm.bin				\
>           test-llvm-version.bin			\
>           test-libaio.bin			\
> -         test-libzstd.bin
> +         test-libzstd.bin			\
> +         test-file-handle.bin
>  
>  FILES := $(addprefix $(OUTPUT),$(FILES))
>  
> @@ -321,6 +322,9 @@ FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) $(PERL_EMBED_LDOPTS)
>  $(OUTPUT)test-libzstd.bin:
>  	$(BUILD) -lzstd
>  
> +$(OUTPUT)test-file-handle.bin:
> +	$(BUILD)
> +
>  ###############################
>  
>  clean:
> diff --git a/tools/build/feature/test-file-handle.c b/tools/build/feature/test-file-handle.c
> new file mode 100644
> index 000000000000..5f8c16f8784f
> --- /dev/null
> +++ b/tools/build/feature/test-file-handle.c
> @@ -0,0 +1,14 @@
> +#define _GNU_SOURCE
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <fcntl.h>
> +
> +int main(void)
> +{
> +	struct file_handle fh;
> +	int mount_id;
> +
> +	name_to_handle_at(AT_FDCWD, "/", &fh, &mount_id, 0);
> +	return 0;
> +}
> +
> diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
> index 80e55e796be9..eb95c0c0a169 100644
> --- a/tools/perf/Makefile.config
> +++ b/tools/perf/Makefile.config
> @@ -348,6 +348,10 @@ ifeq ($(feature-gettid), 1)
>    CFLAGS += -DHAVE_GETTID
>  endif
>  
> +ifeq ($(feature-file-handle), 1)
> +  CFLAGS += -DHAVE_FILE_HANDLE
> +endif
> +
>  ifdef NO_LIBELF
>    NO_DWARF := 1
>    NO_DEMANGLE := 1
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 7d7912e121d6..1ab349abe904 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1433,8 +1433,14 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  	if (rec->opts.record_namespaces)
>  		tool->namespace_events = true;
>  
> -	if (rec->opts.record_cgroup)
> +	if (rec->opts.record_cgroup) {
> +#ifdef HAVE_FILE_HANDLE
>  		tool->cgroup_events = true;
> +#else
> +		pr_err("cgroup tracking is not supported\n");
> +		return -1;
> +#endif
> +	}
>  
>  	if (rec->opts.auxtrace_snapshot_mode || rec->switch_output.enabled) {
>  		signal(SIGUSR2, snapshot_sig_handler);
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 56b2dd0db88e..02ea2cf2a3d9 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1246,8 +1246,14 @@ static int __cmd_top(struct perf_top *top)
>  
>  	if (opts->record_namespaces)
>  		top->tool.namespace_events = true;
> -	if (opts->record_cgroup)
> +	if (opts->record_cgroup) {
> +#ifdef HAVE_FILE_HANDLE
>  		top->tool.cgroup_events = true;
> +#else
> +		pr_err("cgroup tracking is not supported.\n");
> +		return -1;
> +#endif
> +	}
>  
>  	ret = perf_event__synthesize_bpf_events(top->session, perf_event__process,
>  						&top->session->machines.host,
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index 24975470ed5c..f96e84956d84 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -415,6 +415,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
>  	return rc;
>  }
>  
> +#ifdef HAVE_FILE_HANDLE
>  static int perf_event__synthesize_cgroup(struct perf_tool *tool,
>  					 union perf_event *event,
>  					 char *path, size_t mount_len,
> @@ -526,6 +527,14 @@ int perf_event__synthesize_cgroups(struct perf_tool *tool,
>  
>  	return 0;
>  }
> +#else
> +int perf_event__synthesize_cgroups(struct perf_tool *tool,
> +				   perf_event__handler_t process,
> +				   struct machine *machine)
> +{
> +	return -1;
> +}
> +#endif
>  
>  int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
>  				   struct machine *machine)
> -- 
> 2.26.0.rc2.310.g2932bb562d-goog
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] perf tools: Add file-handle feature test
  2020-04-02 15:37           ` Arnaldo Carvalho de Melo
@ 2020-04-02 15:46             ` Arnaldo Carvalho de Melo
  2020-04-02 18:37             ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-02 15:46 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, Mark Rutland,
	Alexander Shishkin, LKML

Em Thu, Apr 02, 2020 at 12:37:48PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, Apr 02, 2020 at 10:52:49AM +0900, Namhyung Kim escreveu:
> > The file handle (FHANDLE) support is configurable so some systems might
> > not have it.  So add a config feature item to check it on build time
> > and reject cgroup tracking based on that.
> 
> Ok, I'll break this patch in two, add the feature test first, then fold
> the usage of HAVE_FILE_HANDLE with the patch that uses it, so that we
> keep the codebase bisectable,
> 
> Thanks!

Also had to do this here and in the other similar places:

diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index f96e84956d84..a661b122d9d8 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -528,9 +528,9 @@ int perf_event__synthesize_cgroups(struct perf_tool *tool,
 	return 0;
 }
 #else
-int perf_event__synthesize_cgroups(struct perf_tool *tool,
-				   perf_event__handler_t process,
-				   struct machine *machine)
+int perf_event__synthesize_cgroups(struct perf_tool *tool __maybe_unused,
+				   perf_event__handler_t process __maybe_unused,
+				   struct machine *machine __maybe_unused)
 {
 	return -1;
 }

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] perf tools: Add file-handle feature test
  2020-04-02 15:37           ` Arnaldo Carvalho de Melo
  2020-04-02 15:46             ` Arnaldo Carvalho de Melo
@ 2020-04-02 18:37             ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-02 18:37 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, Mark Rutland,
	Alexander Shishkin, LKML

Em Thu, Apr 02, 2020 at 12:37:48PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, Apr 02, 2020 at 10:52:49AM +0900, Namhyung Kim escreveu:
> > The file handle (FHANDLE) support is configurable so some systems might
> > not have it.  So add a config feature item to check it on build time
> > and reject cgroup tracking based on that.
> 
> Ok, I'll break this patch in two, add the feature test first, then fold
> the usage of HAVE_FILE_HANDLE with the patch that uses it, so that we
> keep the codebase bisectable,

> > +++ b/tools/build/feature/test-file-handle.c
> > @@ -0,0 +1,14 @@
> > +#define _GNU_SOURCE
> > +#include <sys/types.h>
> > +#include <sys/stat.h>
> > +#include <fcntl.h>
> > +
> > +int main(void)
> > +{
> > +	struct file_handle fh;
> > +	int mount_id;
> > +
> > +	name_to_handle_at(AT_FDCWD, "/", &fh, &mount_id, 0);
> > +	return 0;

Also had to do it just like you use in your patches, see patch below, to
cover this case as well:

  CC       /tmp/build/perf/util/synthetic-events.o
  CC       /tmp/build/perf/util/data.o
  CC       /tmp/build/perf/util/tsc.o
  CC       /tmp/build/perf/util/cloexec.o
util/synthetic-events.c:428:22: error: field 'fh' with   CC       /tmp/build/perf/util/call-path.o
variable sized type 'struct file_handle' not at the end of a struct or class is a GNU
      extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
                struct file_handle fh;
                                   ^
1 error generated.
mv: can't rename '/tmp/build/perf/util/.synthetic-events.o.tmp': No such file or directory
make[4]: *** [/git/linux/tools/build/Makefile.build:97: /tmp/build/perf/util/synthetic-events.o] Error 1
make[4]: *** Waiting for unfinished jobs....


This is on Alpine Linux 3.11 onwards, musl libc.

- Arnaldo


diff --git a/tools/build/feature/test-file-handle.c b/tools/build/feature/test-file-handle.c
index e325b2a060ed..bed871e3978d 100644
--- a/tools/build/feature/test-file-handle.c
+++ b/tools/build/feature/test-file-handle.c
@@ -5,9 +5,12 @@
 
 int main(void)
 {
-	struct file_handle fh;
+        struct {
+                struct file_handle fh;
+                uint64_t cgroup_id;
+        } handle;
 	int mount_id;
 
-	name_to_handle_at(AT_FDCWD, "/", &fh, &mount_id, 0);
+	name_to_handle_at(AT_FDCWD, "/", &handle.fh, &mount_id, 0);
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-03-25 12:45 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
@ 2020-04-03 12:36   ` Arnaldo Carvalho de Melo
  2020-04-04  1:29     ` Namhyung Kim
  2020-04-04  8:41     ` [tip: perf/urgent] perf python: Include rwsem.c in the pythong biding tip-bot2 for Arnaldo Carvalho de Melo
  2020-04-04  8:41   ` [tip: perf/urgent] perf cgroup: Maintain cgroup hierarchy tip-bot2 for Namhyung Kim
  1 sibling, 2 replies; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-03 12:36 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

Em Wed, Mar 25, 2020 at 09:45:31PM +0900, Namhyung Kim escreveu:
> Each cgroup is kept in the perf_env's cgroup_tree sorted by the cgroup
> id.  Hist entries have cgroup id can compare it directly and later it
> can be used to find a group name using this tree.

This one breaks the 'perf test python' test, I fixed it adding this
patch before your series:


From ea3c4ab73cb2ea2960bba6894560b1ef91e69737 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Fri, 3 Apr 2020 09:29:52 -0300
Subject: [PATCH 1/1] perf python: Include rwsem.c in the pythong biding

We'll need it for the cgroup patches, and its better to have it in a
separate patch in case we need to later revert the cgroup patches.

I.e. without this we have:

  [root@five ~]# perf test -v python
  19: 'import perf' in python                               :
  --- start ---
  test child forked, pid 148447
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  ImportError: /tmp/build/perf/python/perf.cpython-37m-x86_64-linux-gnu.so: undefined symbol: down_write
  test child finished with -1
  ---- end ----
  'import perf' in python: FAILED!
  [root@five ~]#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/python-ext-sources | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/python-ext-sources b/tools/perf/util/python-ext-sources
index e7279ea6043a..a9d9c142eb7c 100644
--- a/tools/perf/util/python-ext-sources
+++ b/tools/perf/util/python-ext-sources
@@ -34,3 +34,4 @@ util/string.c
 util/symbol_fprintf.c
 util/units.c
 util/affinity.c
+util/rwsem.c
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/9] perf tools: Maintain cgroup hierarchy
  2020-04-03 12:36   ` Arnaldo Carvalho de Melo
@ 2020-04-04  1:29     ` Namhyung Kim
  2020-04-04  8:41     ` [tip: perf/urgent] perf python: Include rwsem.c in the pythong biding tip-bot2 for Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 46+ messages in thread
From: Namhyung Kim @ 2020-04-04  1:29 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Mark Rutland,
	Alexander Shishkin

Hi Arnaldo,

On Fri, Apr 3, 2020 at 9:36 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> Em Wed, Mar 25, 2020 at 09:45:31PM +0900, Namhyung Kim escreveu:
> > Each cgroup is kept in the perf_env's cgroup_tree sorted by the cgroup
> > id.  Hist entries have cgroup id can compare it directly and later it
> > can be used to find a group name using this tree.
>
> This one breaks the 'perf test python' test, I fixed it adding this
> patch before your series:

Thanks a lot for fixing this and taking care of the whole thing!
Namhyung


> From ea3c4ab73cb2ea2960bba6894560b1ef91e69737 Mon Sep 17 00:00:00 2001
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
> Date: Fri, 3 Apr 2020 09:29:52 -0300
> Subject: [PATCH 1/1] perf python: Include rwsem.c in the pythong biding
>
> We'll need it for the cgroup patches, and its better to have it in a
> separate patch in case we need to later revert the cgroup patches.
>
> I.e. without this we have:
>
>   [root@five ~]# perf test -v python
>   19: 'import perf' in python                               :
>   --- start ---
>   test child forked, pid 148447
>   Traceback (most recent call last):
>     File "<stdin>", line 1, in <module>
>   ImportError: /tmp/build/perf/python/perf.cpython-37m-x86_64-linux-gnu.so: undefined symbol: down_write
>   test child finished with -1
>   ---- end ----
>   'import perf' in python: FAILED!
>   [root@five ~]#
>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  tools/perf/util/python-ext-sources | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tools/perf/util/python-ext-sources b/tools/perf/util/python-ext-sources
> index e7279ea6043a..a9d9c142eb7c 100644
> --- a/tools/perf/util/python-ext-sources
> +++ b/tools/perf/util/python-ext-sources
> @@ -34,3 +34,4 @@ util/string.c
>  util/symbol_fprintf.c
>  util/units.c
>  util/affinity.c
> +util/rwsem.c
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [tip: perf/urgent] perf script: Add --show-cgroup-events option
  2020-03-25 12:45 ` [PATCH 9/9] perf script: Add --show-cgroup-events option Namhyung Kim
  2020-03-27 16:39   ` Arnaldo Carvalho de Melo
@ 2020-04-04  8:41   ` tip-bot2 for Namhyung Kim
  1 sibling, 0 replies; 46+ messages in thread
From: tip-bot2 for Namhyung Kim @ 2020-04-04  8:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Mark Rutland, Peter Zijlstra, x86, LKML

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     160d4af97b831c87e680c607c639d55ae027087f
Gitweb:        https://git.kernel.org/tip/160d4af97b831c87e680c607c639d55ae027087f
Author:        Namhyung Kim <namhyung@kernel.org>
AuthorDate:    Wed, 25 Mar 2020 21:45:36 +09:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Fri, 03 Apr 2020 09:37:55 -03:00

perf script: Add --show-cgroup-events option

The --show-cgroup-events option is to print CGROUP events in the
output like others.

Committer testing:

  [root@seventh ~]# perf record --all-cgroups --namespaces /wb/cgtest
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.039 MB perf.data (487 samples) ]
  [root@seventh ~]# perf script --show-cgroup-events | grep PERF_RECORD_CGROUP -B2 -A2
           swapper     0     0.000000: PERF_RECORD_CGROUP cgroup: 1 /
              perf 12145 11200.440730:          1 cycles:  ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
              perf 12145 11200.440733:          1 cycles:  ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
  --
            cgtest 12145 11200.440739:     193472 cycles:  ffffffffb90f6fbc commit_creds+0x1fc (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
            cgtest 12145 11200.440790:    2691608 cycles:      7fa2cb43019b _dl_sysdep_start+0x7cb (/usr/lib64/ld-2.29.so)
            cgtest 12145 11200.440962: PERF_RECORD_CGROUP cgroup: 83 /sub
            cgtest 12147 11200.441054:          1 cycles:  ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
            cgtest 12147 11200.441057:          1 cycles:  ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
  --
            cgtest 12148 11200.441103:      10227 cycles:  ffffffffb9a0153d end_repeat_nmi+0x48 (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
            cgtest 12148 11200.441106:     273295 cycles:  ffffffffb99ecbc7 copy_page+0x7 (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
            cgtest 12147 11200.441133: PERF_RECORD_CGROUP cgroup: 88 /sub/cgrp1
            cgtest 12147 11200.441143:    2788845 cycles:  ffffffffb94676c2 security_genfs_sid+0x102 (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
            cgtest 12148 11200.441162: PERF_RECORD_CGROUP cgroup: 93 /sub/cgrp2
            cgtest 12148 11200.441182:    2669546 cycles:            401020 _init+0x20 (/wb/cgtest)
            cgtest 12149 11200.441247:          1 cycles:  ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux)
  [root@seventh ~]#

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-10-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-script.txt |  3 ++-
 tools/perf/builtin-script.c              | 41 +++++++++++++++++++++++-
 2 files changed, 44 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 4af255b..99a9853 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -319,6 +319,9 @@ OPTIONS
 --show-bpf-events
 	Display bpf events i.e. events of type PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT.
 
+--show-cgroup-events
+	Display cgroup events i.e. events of type PERF_RECORD_CGROUP.
+
 --demangle::
 	Demangle symbol names to human readable form. It's enabled by default,
 	disable with --no-demangle.
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index f63869a..186ebf8 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1694,6 +1694,7 @@ struct perf_script {
 	bool			show_lost_events;
 	bool			show_round_events;
 	bool			show_bpf_events;
+	bool			show_cgroup_events;
 	bool			allocated;
 	bool			per_event_dump;
 	struct evswitch		evswitch;
@@ -2212,6 +2213,41 @@ out:
 	return ret;
 }
 
+static int process_cgroup_event(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct machine *machine)
+{
+	struct thread *thread;
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+	struct perf_session *session = script->session;
+	struct evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
+	int ret = -1;
+
+	thread = machine__findnew_thread(machine, sample->pid, sample->tid);
+	if (thread == NULL) {
+		pr_debug("problem processing CGROUP event, skipping it.\n");
+		return -1;
+	}
+
+	if (perf_event__process_cgroup(tool, event, sample, machine) < 0)
+		goto out;
+
+	if (!evsel->core.attr.sample_id_all) {
+		sample->cpu = 0;
+		sample->time = 0;
+	}
+	if (!filter_cpu(sample)) {
+		perf_sample__fprintf_start(sample, thread, evsel,
+					   PERF_RECORD_CGROUP, stdout);
+		perf_event__fprintf(event, stdout);
+	}
+	ret = 0;
+out:
+	thread__put(thread);
+	return ret;
+}
+
 static int process_fork_event(struct perf_tool *tool,
 			      union perf_event *event,
 			      struct perf_sample *sample,
@@ -2551,6 +2587,8 @@ static int __cmd_script(struct perf_script *script)
 		script->tool.context_switch = process_switch_event;
 	if (script->show_namespace_events)
 		script->tool.namespaces = process_namespaces_event;
+	if (script->show_cgroup_events)
+		script->tool.cgroup = process_cgroup_event;
 	if (script->show_lost_events)
 		script->tool.lost = process_lost_event;
 	if (script->show_round_events) {
@@ -3476,6 +3514,7 @@ int cmd_script(int argc, const char **argv)
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
 			.namespaces	 = perf_event__process_namespaces,
+			.cgroup		 = perf_event__process_cgroup,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
 			.attr		 = process_attr,
@@ -3577,6 +3616,8 @@ int cmd_script(int argc, const char **argv)
 		    "Show context switch events (if recorded)"),
 	OPT_BOOLEAN('\0', "show-namespace-events", &script.show_namespace_events,
 		    "Show namespace events (if recorded)"),
+	OPT_BOOLEAN('\0', "show-cgroup-events", &script.show_cgroup_events,
+		    "Show cgroup events (if recorded)"),
 	OPT_BOOLEAN('\0', "show-lost-events", &script.show_lost_events,
 		    "Show lost events (if recorded)"),
 	OPT_BOOLEAN('\0', "show-round-events", &script.show_round_events,

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/urgent] perf top: Add --all-cgroups option
  2020-03-25 12:45 ` [PATCH 8/9] perf top: " Namhyung Kim
  2020-03-27 16:35   ` Arnaldo Carvalho de Melo
@ 2020-04-04  8:41   ` tip-bot2 for Namhyung Kim
  1 sibling, 0 replies; 46+ messages in thread
From: tip-bot2 for Namhyung Kim @ 2020-04-04  8:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Mark Rutland, Peter Zijlstra, x86, LKML

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     f382842fa0244ae1e2c28c8377732c85ec1fe7a9
Gitweb:        https://git.kernel.org/tip/f382842fa0244ae1e2c28c8377732c85ec1fe7a9
Author:        Namhyung Kim <namhyung@kernel.org>
AuthorDate:    Wed, 25 Mar 2020 21:45:35 +09:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Fri, 03 Apr 2020 09:37:55 -03:00

perf top: Add --all-cgroups option

The --all-cgroups option is to enable cgroup profiling support.  It
tells kernel to record CGROUP events in the ring buffer so that 'perf
top' can identify task/cgroup association later.

Committer testing:

Use:

  # perf top --all-cgroups -s cgroup_id,cgroup,pid

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-9-namhyung@kernel.org
Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org
[ Extracted the HAVE_FILE_HANDLE from the followup patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-top.txt |  4 ++++
 tools/perf/builtin-top.c              | 15 +++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 324b6b5..ddab103 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -272,6 +272,10 @@ Default is to monitor all CPUS.
 	Record events of type PERF_RECORD_NAMESPACES and display it with the
 	'cgroup_id' sort key.
 
+--all-cgroups::
+	Record events of type PERF_RECORD_CGROUP and display it with the
+	'cgroup' sort key.
+
 --switch-on EVENT_NAME::
 	Only consider events after this event is found.
 
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d2539b7..02ea2cf 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1246,6 +1246,14 @@ static int __cmd_top(struct perf_top *top)
 
 	if (opts->record_namespaces)
 		top->tool.namespace_events = true;
+	if (opts->record_cgroup) {
+#ifdef HAVE_FILE_HANDLE
+		top->tool.cgroup_events = true;
+#else
+		pr_err("cgroup tracking is not supported.\n");
+		return -1;
+#endif
+	}
 
 	ret = perf_event__synthesize_bpf_events(top->session, perf_event__process,
 						&top->session->machines.host,
@@ -1253,6 +1261,11 @@ static int __cmd_top(struct perf_top *top)
 	if (ret < 0)
 		pr_debug("Couldn't synthesize BPF events: Pre-existing BPF programs won't have symbols resolved.\n");
 
+	ret = perf_event__synthesize_cgroups(&top->tool, perf_event__process,
+					     &top->session->machines.host);
+	if (ret < 0)
+		pr_debug("Couldn't synthesize cgroup events.\n");
+
 	machine__synthesize_threads(&top->session->machines.host, &opts->target,
 				    top->evlist->core.threads, false,
 				    top->nr_threads_synthesize);
@@ -1545,6 +1558,8 @@ int cmd_top(int argc, const char **argv)
 			"number of thread to run event synthesize"),
 	OPT_BOOLEAN(0, "namespaces", &opts->record_namespaces,
 		    "Record namespaces events"),
+	OPT_BOOLEAN(0, "all-cgroups", &opts->record_cgroup,
+		    "Record cgroup events"),
 	OPTS_EVSWITCH(&top.evswitch),
 	OPT_END()
 	};

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/urgent] perf record: Support synthesizing cgroup events
  2020-03-25 12:45 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
  2020-03-30 16:30   ` Arnaldo Carvalho de Melo
@ 2020-04-04  8:41   ` tip-bot2 for Namhyung Kim
  1 sibling, 0 replies; 46+ messages in thread
From: tip-bot2 for Namhyung Kim @ 2020-04-04  8:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Mark Rutland, Peter Zijlstra, x86, LKML

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     ab64069f1a6604a43daec5fc4a4294b0b77a8b93
Gitweb:        https://git.kernel.org/tip/ab64069f1a6604a43daec5fc4a4294b0b77a8b93
Author:        Namhyung Kim <namhyung@kernel.org>
AuthorDate:    Wed, 25 Mar 2020 21:45:33 +09:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Fri, 03 Apr 2020 09:37:55 -03:00

perf record: Support synthesizing cgroup events

Synthesize cgroup events by iterating cgroup filesystem directories.
The cgroup event only saves the portion of cgroup path after the mount
point and the cgroup id (which actually is a file handle).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-7-namhyung@kernel.org
Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org
[ Extracted the HAVE_FILE_HANDLE from the followup patch, added missing __maybe_unused ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-record.c        |   5 +-
 tools/perf/util/synthetic-events.c | 122 ++++++++++++++++++++++++++++-
 tools/perf/util/synthetic-events.h |   1 +-
 tools/perf/util/tool.h             |   1 +-
 4 files changed, 129 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c30146..2802de9 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1397,6 +1397,11 @@ static int record__synthesize(struct record *rec, bool tail)
 	if (err < 0)
 		pr_warning("Couldn't synthesize bpf events.\n");
 
+	err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
+					     machine);
+	if (err < 0)
+		pr_warning("Couldn't synthesize cgroup events.\n");
+
 	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->core.threads,
 					    process_synthesized_event, opts->sample_address,
 					    1);
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index f72d809..a661b12 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -16,6 +16,7 @@
 #include "util/synthetic-events.h"
 #include "util/target.h"
 #include "util/time-utils.h"
+#include "util/cgroup.h"
 #include <linux/bitops.h>
 #include <linux/kernel.h>
 #include <linux/string.h>
@@ -414,6 +415,127 @@ out:
 	return rc;
 }
 
+#ifdef HAVE_FILE_HANDLE
+static int perf_event__synthesize_cgroup(struct perf_tool *tool,
+					 union perf_event *event,
+					 char *path, size_t mount_len,
+					 perf_event__handler_t process,
+					 struct machine *machine)
+{
+	size_t event_size = sizeof(event->cgroup) - sizeof(event->cgroup.path);
+	size_t path_len = strlen(path) - mount_len + 1;
+	struct {
+		struct file_handle fh;
+		uint64_t cgroup_id;
+	} handle;
+	int mount_id;
+
+	while (path_len % sizeof(u64))
+		path[mount_len + path_len++] = '\0';
+
+	memset(&event->cgroup, 0, event_size);
+
+	event->cgroup.header.type = PERF_RECORD_CGROUP;
+	event->cgroup.header.size = event_size + path_len + machine->id_hdr_size;
+
+	handle.fh.handle_bytes = sizeof(handle.cgroup_id);
+	if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
+		pr_debug("stat failed: %s\n", path);
+		return -1;
+	}
+
+	event->cgroup.id = handle.cgroup_id;
+	strncpy(event->cgroup.path, path + mount_len, path_len);
+	memset(event->cgroup.path + path_len, 0, machine->id_hdr_size);
+
+	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
+		pr_debug("process synth event failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
+					union perf_event *event,
+					char *path, size_t mount_len,
+					perf_event__handler_t process,
+					struct machine *machine)
+{
+	size_t pos = strlen(path);
+	DIR *d;
+	struct dirent *dent;
+	int ret = 0;
+
+	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
+					  process, machine) < 0)
+		return -1;
+
+	d = opendir(path);
+	if (d == NULL) {
+		pr_debug("failed to open directory: %s\n", path);
+		return -1;
+	}
+
+	while ((dent = readdir(d)) != NULL) {
+		if (dent->d_type != DT_DIR)
+			continue;
+		if (!strcmp(dent->d_name, ".") ||
+		    !strcmp(dent->d_name, ".."))
+			continue;
+
+		/* any sane path should be less than PATH_MAX */
+		if (strlen(path) + strlen(dent->d_name) + 1 >= PATH_MAX)
+			continue;
+
+		if (path[pos - 1] != '/')
+			strcat(path, "/");
+		strcat(path, dent->d_name);
+
+		ret = perf_event__walk_cgroup_tree(tool, event, path,
+						   mount_len, process, machine);
+		if (ret < 0)
+			break;
+
+		path[pos] = '\0';
+	}
+
+	closedir(d);
+	return ret;
+}
+
+int perf_event__synthesize_cgroups(struct perf_tool *tool,
+				   perf_event__handler_t process,
+				   struct machine *machine)
+{
+	union perf_event event;
+	char cgrp_root[PATH_MAX];
+	size_t mount_len;  /* length of mount point in the path */
+
+	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX, "perf_event") < 0) {
+		pr_debug("cannot find cgroup mount point\n");
+		return -1;
+	}
+
+	mount_len = strlen(cgrp_root);
+	/* make sure the path starts with a slash (after mount point) */
+	strcat(cgrp_root, "/");
+
+	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
+					 process, machine) < 0)
+		return -1;
+
+	return 0;
+}
+#else
+int perf_event__synthesize_cgroups(struct perf_tool *tool __maybe_unused,
+				   perf_event__handler_t process __maybe_unused,
+				   struct machine *machine __maybe_unused)
+{
+	return -1;
+}
+#endif
+
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
 				   struct machine *machine)
 {
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index baead0c..e7a3e95 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -45,6 +45,7 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handl
 int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_namespaces(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
+int perf_event__synthesize_cgroups(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
 int perf_event__synthesize_stat_config(struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 472ef5e..3fb67bd 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -79,6 +79,7 @@ struct perf_tool {
 	bool		ordered_events;
 	bool		ordering_requires_timestamps;
 	bool		namespace_events;
+	bool		cgroup_events;
 	bool		no_warn;
 	enum show_feature_header show_feat_hdr;
 };

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/urgent] perf record: Add --all-cgroups option
  2020-03-25 12:45 ` [PATCH 7/9] perf record: Add --all-cgroups option Namhyung Kim
@ 2020-04-04  8:41   ` tip-bot2 for Namhyung Kim
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Namhyung Kim @ 2020-04-04  8:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Mark Rutland, Peter Zijlstra, x86, LKML

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     8fb4b67939e169fca68174e9ac7be79fe9a04498
Gitweb:        https://git.kernel.org/tip/8fb4b67939e169fca68174e9ac7be79fe9a04498
Author:        Namhyung Kim <namhyung@kernel.org>
AuthorDate:    Wed, 25 Mar 2020 21:45:34 +09:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Fri, 03 Apr 2020 09:37:55 -03:00

perf record: Add --all-cgroups option

The --all-cgroups option is to enable cgroup profiling support.  It
tells kernel to record CGROUP events in the ring buffer so that perf
report can identify task/cgroup association later.

  [root@seventh ~]# perf record --all-cgroups --namespaces /wb/cgtest
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.042 MB perf.data (558 samples) ]
  [root@seventh ~]# perf report --stdio -s cgroup_id,cgroup,pid
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 558  of event 'cycles'
  # Event count (approx.): 458017341
  #
  # Overhead  cgroup id (dev/inode)  Cgroup          Pid:Command
  # ........  .....................  ..........  ...............
  #
      33.15%  4/0xeffffffb           /sub           9615:looper0
      32.83%  4/0xf00002f5           /sub/cgrp2     9620:looper2
      32.79%  4/0xf00002f4           /sub/cgrp1     9619:looper1
       0.35%  4/0xf00002f5           /sub/cgrp2     9618:cgtest
       0.34%  4/0xf00002f4           /sub/cgrp1     9617:cgtest
       0.32%  4/0xeffffffb           /              9615:looper0
       0.11%  4/0xeffffffb           /sub           9617:cgtest
       0.10%  4/0xeffffffb           /sub           9618:cgtest

  #
  # (Tip: Sample related events with: perf record -e '{cycles,instructions}:S')
  #
  [root@seventh ~]#

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-8-namhyung@kernel.org
Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org
[ Extracted the HAVE_FILE_HANDLE from the followup patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-record.txt |  5 ++++-
 tools/perf/builtin-record.c              | 11 +++++++++++
 tools/perf/util/evsel.c                  | 11 ++++++++++-
 tools/perf/util/evsel.h                  |  1 +
 tools/perf/util/record.h                 |  1 +
 5 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index b25e028..b3f3b3f 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -391,7 +391,10 @@ displayed with the weight and local_weight sort keys.  This currently works for 
 abort events and some memory events in precise mode on modern Intel CPUs.
 
 --namespaces::
-Record events of type PERF_RECORD_NAMESPACES.
+Record events of type PERF_RECORD_NAMESPACES.  This enables 'cgroup_id' sort key.
+
+--all-cgroups::
+Record events of type PERF_RECORD_CGROUP.  This enables 'cgroup' sort key.
 
 --transaction::
 Record transaction flags for transaction related events.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 2802de9..1ab349a 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1433,6 +1433,15 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	if (rec->opts.record_namespaces)
 		tool->namespace_events = true;
 
+	if (rec->opts.record_cgroup) {
+#ifdef HAVE_FILE_HANDLE
+		tool->cgroup_events = true;
+#else
+		pr_err("cgroup tracking is not supported\n");
+		return -1;
+#endif
+	}
+
 	if (rec->opts.auxtrace_snapshot_mode || rec->switch_output.enabled) {
 		signal(SIGUSR2, snapshot_sig_handler);
 		if (rec->opts.auxtrace_snapshot_mode)
@@ -2363,6 +2372,8 @@ static struct option __record_options[] = {
 			"per thread proc mmap processing timeout in ms"),
 	OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces,
 		    "Record namespaces events"),
+	OPT_BOOLEAN(0, "all-cgroups", &record.opts.record_cgroup,
+		    "Record cgroup events"),
 	OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
 		    "Record context switch events"),
 	OPT_BOOLEAN_FLAG(0, "all-kernel", &record.opts.all_kernel,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index b766eb6..eb880ef 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1104,6 +1104,11 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts,
 	if (opts->record_namespaces)
 		attr->namespaces  = track;
 
+	if (opts->record_cgroup) {
+		attr->cgroup = track && !perf_missing_features.cgroup;
+		perf_evsel__set_sample_bit(evsel, CGROUP);
+	}
+
 	if (opts->record_switch_events)
 		attr->context_switch = track;
 
@@ -1789,7 +1794,11 @@ try_fallback:
 	 * Must probe features in the order they were added to the
 	 * perf_event_attr interface.
 	 */
-	if (!perf_missing_features.branch_hw_idx &&
+        if (!perf_missing_features.cgroup && evsel->core.attr.cgroup) {
+		perf_missing_features.cgroup = true;
+		pr_debug2_peo("Kernel has no cgroup sampling support, bailing out\n");
+		goto out_close;
+        } else if (!perf_missing_features.branch_hw_idx &&
 	    (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)) {
 		perf_missing_features.branch_hw_idx = true;
 		pr_debug2("switching off branch HW index support\n");
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 3380474..53187c5 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -120,6 +120,7 @@ struct perf_missing_features {
 	bool bpf;
 	bool aux_output;
 	bool branch_hw_idx;
+	bool cgroup;
 };
 
 extern struct perf_missing_features perf_missing_features;
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 5421fd2..2431645 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -34,6 +34,7 @@ struct record_opts {
 	bool	      auxtrace_snapshot_on_exit;
 	bool	      auxtrace_sample_mode;
 	bool	      record_namespaces;
+	bool	      record_cgroup;
 	bool	      record_switch_events;
 	bool	      all_kernel;
 	bool	      all_user;

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/urgent] perf report: Add 'cgroup' sort key
  2020-03-25 12:45 ` [PATCH 5/9] perf report: Add 'cgroup' sort key Namhyung Kim
@ 2020-04-04  8:41   ` tip-bot2 for Namhyung Kim
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Namhyung Kim @ 2020-04-04  8:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Namhyung Kim, Alexander Shishkin, Jiri Olsa, Mark Rutland,
	Peter Zijlstra, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     b629f3e9d01b4e4ad4e84c8f76fad514967a83da
Gitweb:        https://git.kernel.org/tip/b629f3e9d01b4e4ad4e84c8f76fad514967a83da
Author:        Namhyung Kim <namhyung@kernel.org>
AuthorDate:    Wed, 25 Mar 2020 21:45:32 +09:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Fri, 03 Apr 2020 09:37:55 -03:00

perf report: Add 'cgroup' sort key

The cgroup sort key is to show cgroup membership of each task.
Currently it shows full path in the cgroupfs (not relative to the root
of cgroup namespace) since it'd be more intuitive IMHO.  Otherwise root
cgroup in different namespaces will all show same name - "/".

The cgroup sort key should come before cgroup_id otherwise
sort_dimension__add() will match it to cgroup_id as it only matches with
the given substring.

For example it will look like following.  Note that record patch adding
--all-cgroups patch will come later.

  $ perf record -a --namespace --all-cgroups  cgtest
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ]

  $ perf report -s cgroup_id,cgroup,pid
  ...
  # Overhead  cgroup id (dev/inode)  Cgroup          Pid:Command
  # ........  .....................  ..........  ...............
  #
      93.96%  0/0x0                  /                 0:swapper
       1.25%  3/0xeffffffb           /               278:looper0
       0.86%  3/0xf000015f           /sub/cgrp1      280:cgtest
       0.37%  3/0xf0000160           /sub/cgrp2      281:cgtest
       0.34%  3/0xf0000163           /sub/cgrp3      282:cgtest
       0.22%  3/0xeffffffb           /sub            278:looper0
       0.20%  3/0xeffffffb           /               280:cgtest
       0.15%  3/0xf0000163           /sub/cgrp3      285:looper3

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-report.txt |  1 +-
 tools/perf/util/hist.c                   | 13 ++++++++-
 tools/perf/util/hist.h                   |  1 +-
 tools/perf/util/sort.c                   | 37 +++++++++++++++++++++++-
 tools/perf/util/sort.h                   |  2 +-
 5 files changed, 54 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index e56e3f1..f569b9e 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -95,6 +95,7 @@ OPTIONS
 	abort cost. This is the global weight.
 	- local_weight: Local weight version of the weight above.
 	- cgroup_id: ID derived from cgroup namespace device and inode numbers.
+	- cgroup: cgroup pathname in the cgroupfs.
 	- transaction: Transaction abort flags.
 	- overhead: Overhead percentage of sample
 	- overhead_sys: Overhead percentage of sample running in system mode
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index e74a5ac..283a69f 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -10,6 +10,7 @@
 #include "mem-events.h"
 #include "session.h"
 #include "namespaces.h"
+#include "cgroup.h"
 #include "sort.h"
 #include "units.h"
 #include "evlist.h"
@@ -194,6 +195,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 		hists__set_unres_dso_col_len(hists, HISTC_MEM_DADDR_DSO);
 	}
 
+	hists__new_col_len(hists, HISTC_CGROUP, 6);
 	hists__new_col_len(hists, HISTC_CGROUP_ID, 20);
 	hists__new_col_len(hists, HISTC_CPU, 3);
 	hists__new_col_len(hists, HISTC_SOCKET, 6);
@@ -222,6 +224,16 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 
 	if (h->trace_output)
 		hists__new_col_len(hists, HISTC_TRACE, strlen(h->trace_output));
+
+	if (h->cgroup) {
+		const char *cgrp_name = "unknown";
+		struct cgroup *cgrp = cgroup__find(h->ms.maps->machine->env,
+						   h->cgroup);
+		if (cgrp != NULL)
+			cgrp_name = cgrp->name;
+
+		hists__new_col_len(hists, HISTC_CGROUP, strlen(cgrp_name));
+	}
 }
 
 void hists__output_recalc_col_len(struct hists *hists, int max_rows)
@@ -691,6 +703,7 @@ __hists__add_entry(struct hists *hists,
 			.dev = ns ? ns->link_info[CGROUP_NS_INDEX].dev : 0,
 			.ino = ns ? ns->link_info[CGROUP_NS_INDEX].ino : 0,
 		},
+		.cgroup = sample->cgroup,
 		.ms = {
 			.maps	= al->maps,
 			.map	= al->map,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index bb994e0..4141295 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -38,6 +38,7 @@ enum hist_column {
 	HISTC_THREAD,
 	HISTC_COMM,
 	HISTC_CGROUP_ID,
+	HISTC_CGROUP,
 	HISTC_PARENT,
 	HISTC_CPU,
 	HISTC_SOCKET,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index e860595..f14cc72 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -12,6 +12,7 @@
 #include "cacheline.h"
 #include "comm.h"
 #include "map.h"
+#include "maps.h"
 #include "symbol.h"
 #include "map_symbol.h"
 #include "branch.h"
@@ -25,6 +26,8 @@
 #include "mem-events.h"
 #include "annotate.h"
 #include "time-utils.h"
+#include "cgroup.h"
+#include "machine.h"
 #include <linux/kernel.h>
 #include <linux/string.h>
 
@@ -634,6 +637,39 @@ struct sort_entry sort_cgroup_id = {
 	.se_width_idx	= HISTC_CGROUP_ID,
 };
 
+/* --sort cgroup */
+
+static int64_t
+sort__cgroup_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return right->cgroup - left->cgroup;
+}
+
+static int hist_entry__cgroup_snprintf(struct hist_entry *he,
+				       char *bf, size_t size,
+				       unsigned int width __maybe_unused)
+{
+	const char *cgrp_name = "N/A";
+
+	if (he->cgroup) {
+		struct cgroup *cgrp = cgroup__find(he->ms.maps->machine->env,
+						   he->cgroup);
+		if (cgrp != NULL)
+			cgrp_name = cgrp->name;
+		else
+			cgrp_name = "unknown";
+	}
+
+	return repsep_snprintf(bf, size, "%s", cgrp_name);
+}
+
+struct sort_entry sort_cgroup = {
+	.se_header      = "Cgroup",
+	.se_cmp	        = sort__cgroup_cmp,
+	.se_snprintf    = hist_entry__cgroup_snprintf,
+	.se_width_idx	= HISTC_CGROUP,
+};
+
 /* --sort socket */
 
 static int64_t
@@ -1660,6 +1696,7 @@ static struct sort_dimension common_sort_dimensions[] = {
 	DIM(SORT_TRACE, "trace", sort_trace),
 	DIM(SORT_SYM_SIZE, "symbol_size", sort_sym_size),
 	DIM(SORT_DSO_SIZE, "dso_size", sort_dso_size),
+	DIM(SORT_CGROUP, "cgroup", sort_cgroup),
 	DIM(SORT_CGROUP_ID, "cgroup_id", sort_cgroup_id),
 	DIM(SORT_SYM_IPC_NULL, "ipc_null", sort_sym_ipc_null),
 	DIM(SORT_TIME, "time", sort_time),
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 6c862d6..cfa6ac6 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -101,6 +101,7 @@ struct hist_entry {
 	struct thread		*thread;
 	struct comm		*comm;
 	struct namespace_id	cgroup_id;
+	u64			cgroup;
 	u64			ip;
 	u64			transaction;
 	s32			socket;
@@ -224,6 +225,7 @@ enum sort_type {
 	SORT_TRACE,
 	SORT_SYM_SIZE,
 	SORT_DSO_SIZE,
+	SORT_CGROUP,
 	SORT_CGROUP_ID,
 	SORT_SYM_IPC_NULL,
 	SORT_TIME,

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/urgent] perf cgroup: Maintain cgroup hierarchy
  2020-03-25 12:45 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
  2020-04-03 12:36   ` Arnaldo Carvalho de Melo
@ 2020-04-04  8:41   ` tip-bot2 for Namhyung Kim
  1 sibling, 0 replies; 46+ messages in thread
From: tip-bot2 for Namhyung Kim @ 2020-04-04  8:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Namhyung Kim, Alexander Shishkin, Jiri Olsa, Mark Rutland,
	Peter Zijlstra, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     d1277aa36bff4bfc1a187a469fc6a6a1d17cf59c
Gitweb:        https://git.kernel.org/tip/d1277aa36bff4bfc1a187a469fc6a6a1d17cf59c
Author:        Namhyung Kim <namhyung@kernel.org>
AuthorDate:    Wed, 25 Mar 2020 21:45:31 +09:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Fri, 03 Apr 2020 09:37:55 -03:00

perf cgroup: Maintain cgroup hierarchy

Each cgroup is kept in the perf_env's cgroup_tree sorted by the cgroup
id.  Hist entries have cgroup id can compare it directly and later it
can be used to find a group name using this tree.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/cgroup.c  | 80 ++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/cgroup.h  | 17 ++++++--
 tools/perf/util/env.c     |  2 +-
 tools/perf/util/env.h     |  6 +++-
 tools/perf/util/machine.c |  9 +++-
 5 files changed, 109 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 5bc9d3b..b73fb78 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -191,3 +191,83 @@ int parse_cgroups(const struct option *opt, const char *str,
 	}
 	return 0;
 }
+
+static struct cgroup *__cgroup__findnew(struct rb_root *root, uint64_t id,
+					bool create, const char *path)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct cgroup *cgrp;
+
+	while (*p != NULL) {
+		parent = *p;
+		cgrp = rb_entry(parent, struct cgroup, node);
+
+		if (cgrp->id == id)
+			return cgrp;
+
+		if (cgrp->id < id)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	if (!create)
+		return NULL;
+
+	cgrp = malloc(sizeof(*cgrp));
+	if (cgrp == NULL)
+		return NULL;
+
+	cgrp->name = strdup(path);
+	if (cgrp->name == NULL) {
+		free(cgrp);
+		return NULL;
+	}
+
+	cgrp->fd = -1;
+	cgrp->id = id;
+	refcount_set(&cgrp->refcnt, 1);
+
+	rb_link_node(&cgrp->node, parent, p);
+	rb_insert_color(&cgrp->node, root);
+
+	return cgrp;
+}
+
+struct cgroup *cgroup__findnew(struct perf_env *env, uint64_t id,
+			       const char *path)
+{
+	struct cgroup *cgrp;
+
+	down_write(&env->cgroups.lock);
+	cgrp = __cgroup__findnew(&env->cgroups.tree, id, true, path);
+	up_write(&env->cgroups.lock);
+	return cgrp;
+}
+
+struct cgroup *cgroup__find(struct perf_env *env, uint64_t id)
+{
+	struct cgroup *cgrp;
+
+	down_read(&env->cgroups.lock);
+	cgrp = __cgroup__findnew(&env->cgroups.tree, id, false, NULL);
+	up_read(&env->cgroups.lock);
+	return cgrp;
+}
+
+void perf_env__purge_cgroups(struct perf_env *env)
+{
+	struct rb_node *node;
+	struct cgroup *cgrp;
+
+	down_write(&env->cgroups.lock);
+	while (!RB_EMPTY_ROOT(&env->cgroups.tree)) {
+		node = rb_first(&env->cgroups.tree);
+		cgrp = rb_entry(node, struct cgroup, node);
+
+		rb_erase(node, &env->cgroups.tree);
+		cgroup__put(cgrp);
+	}
+	up_write(&env->cgroups.lock);
+}
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 2ec11f0..e98d597 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -3,16 +3,19 @@
 #define __CGROUP_H__
 
 #include <linux/refcount.h>
+#include <linux/rbtree.h>
+#include "util/env.h"
 
 struct option;
 
 struct cgroup {
-	char *name;
-	int fd;
-	refcount_t refcnt;
+	struct rb_node		node;
+	u64			id;
+	char			*name;
+	int			fd;
+	refcount_t		refcnt;
 };
 
-
 extern int nr_cgroups; /* number of explicit cgroups defined */
 
 struct cgroup *cgroup__get(struct cgroup *cgroup);
@@ -26,4 +29,10 @@ void evlist__set_default_cgroup(struct evlist *evlist, struct cgroup *cgroup);
 
 int parse_cgroups(const struct option *opt, const char *str, int unset);
 
+struct cgroup *cgroup__findnew(struct perf_env *env, uint64_t id,
+			       const char *path);
+struct cgroup *cgroup__find(struct perf_env *env, uint64_t id);
+
+void perf_env__purge_cgroups(struct perf_env *env);
+
 #endif /* __CGROUP_H__ */
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 4154f94..fadc597 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -6,6 +6,7 @@
 #include <linux/ctype.h>
 #include <linux/zalloc.h>
 #include "bpf-event.h"
+#include "cgroup.h"
 #include <errno.h>
 #include <sys/utsname.h>
 #include <bpf/libbpf.h>
@@ -168,6 +169,7 @@ void perf_env__exit(struct perf_env *env)
 	int i;
 
 	perf_env__purge_bpf(env);
+	perf_env__purge_cgroups(env);
 	zfree(&env->hostname);
 	zfree(&env->os_release);
 	zfree(&env->version);
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 11d05ae..7632075 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -88,6 +88,12 @@ struct perf_env {
 		u32			btfs_cnt;
 	} bpf_progs;
 
+	/* same reason as above (for perf-top) */
+	struct {
+		struct rw_semaphore	lock;
+		struct rb_root		tree;
+	} cgroups;
+
 	/* For fast cpu to numa node lookup via perf_env__numa_node */
 	int			*numa_map;
 	int			 nr_numa_map;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 399b473..97142e9 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -33,6 +33,7 @@
 #include "asm/bug.h"
 #include "bpf-event.h"
 #include <internal/lib.h> // page_size
+#include "cgroup.h"
 
 #include <linux/ctype.h>
 #include <symbol/kallsyms.h>
@@ -654,13 +655,19 @@ int machine__process_namespaces_event(struct machine *machine __maybe_unused,
 	return err;
 }
 
-int machine__process_cgroup_event(struct machine *machine __maybe_unused,
+int machine__process_cgroup_event(struct machine *machine,
 				  union perf_event *event,
 				  struct perf_sample *sample __maybe_unused)
 {
+	struct cgroup *cgrp;
+
 	if (dump_trace)
 		perf_event__fprintf_cgroup(event, stdout);
 
+	cgrp = cgroup__findnew(machine->env, event->cgroup.id, event->cgroup.path);
+	if (cgrp == NULL)
+		return -ENOMEM;
+
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/urgent] perf tools: Basic support for CGROUP event
  2020-03-25 12:45 ` [PATCH 3/9] perf tools: Basic support for CGROUP event Namhyung Kim
  2020-03-27 14:06   ` Arnaldo Carvalho de Melo
@ 2020-04-04  8:41   ` tip-bot2 for Namhyung Kim
  1 sibling, 0 replies; 46+ messages in thread
From: tip-bot2 for Namhyung Kim @ 2020-04-04  8:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: kernel test robot, Namhyung Kim, Adrian Hunter,
	Alexander Shishkin, Jiri Olsa, Mark Rutland, Peter Zijlstra,
	Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     ba78c1c5461c2fc2f57b777e971b3a9ec0df5666
Gitweb:        https://git.kernel.org/tip/ba78c1c5461c2fc2f57b777e971b3a9ec0df5666
Author:        Namhyung Kim <namhyung@kernel.org>
AuthorDate:    Wed, 25 Mar 2020 21:45:30 +09:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Fri, 03 Apr 2020 09:37:55 -03:00

perf tools: Basic support for CGROUP event

Implement basic functionality to support cgroup tracking.  Each cgroup
can be identified by inode number which can be read from userspace too.
The actual cgroup processing will come in the later patch.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
[ fix perf test failure on sampling parsing ]
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/lib/perf/include/perf/event.h       |  7 +++++++
 tools/perf/builtin-diff.c                 |  1 +
 tools/perf/builtin-report.c               |  1 +
 tools/perf/tests/sample-parsing.c         |  6 +++++-
 tools/perf/util/event.c                   | 18 ++++++++++++++++++
 tools/perf/util/event.h                   |  6 ++++++
 tools/perf/util/evsel.c                   |  6 ++++++
 tools/perf/util/machine.c                 | 12 ++++++++++++
 tools/perf/util/machine.h                 |  3 +++
 tools/perf/util/perf_event_attr_fprintf.c |  2 ++
 tools/perf/util/session.c                 |  4 ++++
 tools/perf/util/synthetic-events.c        |  8 ++++++++
 tools/perf/util/tool.h                    |  1 +
 13 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index 1810689..69b44d2 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -105,6 +105,12 @@ struct perf_record_bpf_event {
 	__u8			 tag[BPF_TAG_SIZE];  // prog tag
 };
 
+struct perf_record_cgroup {
+	struct perf_event_header header;
+	__u64			 id;
+	char			 path[PATH_MAX];
+};
+
 struct perf_record_sample {
 	struct perf_event_header header;
 	__u64			 array[];
@@ -352,6 +358,7 @@ union perf_event {
 	struct perf_record_mmap2		mmap2;
 	struct perf_record_comm			comm;
 	struct perf_record_namespaces		namespaces;
+	struct perf_record_cgroup		cgroup;
 	struct perf_record_fork			fork;
 	struct perf_record_lost			lost;
 	struct perf_record_lost_samples		lost_samples;
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 5e697cd..c94a002 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -455,6 +455,7 @@ static struct perf_diff pdiff = {
 		.fork	= perf_event__process_fork,
 		.lost	= perf_event__process_lost,
 		.namespaces = perf_event__process_namespaces,
+		.cgroup = perf_event__process_cgroup,
 		.ordered_events = true,
 		.ordering_requires_timestamps = true,
 	},
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index ea673b7..26d8fc2 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1105,6 +1105,7 @@ int cmd_report(int argc, const char **argv)
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
 			.namespaces	 = perf_event__process_namespaces,
+			.cgroup		 = perf_event__process_cgroup,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
 			.lost		 = perf_event__process_lost,
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index 14239e4..6186569 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -151,6 +151,9 @@ static bool samples_same(const struct perf_sample *s1,
 	if (type & PERF_SAMPLE_PHYS_ADDR)
 		COMP(phys_addr);
 
+	if (type & PERF_SAMPLE_CGROUP)
+		COMP(cgroup);
+
 	if (type & PERF_SAMPLE_AUX) {
 		COMP(aux_sample.size);
 		if (memcmp(s1->aux_sample.data, s2->aux_sample.data,
@@ -230,6 +233,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 			.regs	= regs,
 		},
 		.phys_addr	= 113,
+		.cgroup		= 114,
 		.aux_sample	= {
 			.size	= sizeof(aux_data),
 			.data	= (void *)aux_data,
@@ -336,7 +340,7 @@ int test__sample_parsing(struct test *test __maybe_unused, int subtest __maybe_u
 	 * were added.  Please actually update the test rather than just change
 	 * the condition below.
 	 */
-	if (PERF_SAMPLE_MAX > PERF_SAMPLE_AUX << 1) {
+	if (PERF_SAMPLE_MAX > PERF_SAMPLE_CGROUP << 1) {
 		pr_debug("sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating\n");
 		return -1;
 	}
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index c5447ff..824c038 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -54,6 +54,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
 	[PERF_RECORD_KSYMBOL]			= "KSYMBOL",
 	[PERF_RECORD_BPF_EVENT]			= "BPF_EVENT",
+	[PERF_RECORD_CGROUP]			= "CGROUP",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -180,6 +181,12 @@ size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp)
 	return ret;
 }
 
+size_t perf_event__fprintf_cgroup(union perf_event *event, FILE *fp)
+{
+	return fprintf(fp, " cgroup: %" PRI_lu64 " %s\n",
+		       event->cgroup.id, event->cgroup.path);
+}
+
 int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -196,6 +203,14 @@ int perf_event__process_namespaces(struct perf_tool *tool __maybe_unused,
 	return machine__process_namespaces_event(machine, event, sample);
 }
 
+int perf_event__process_cgroup(struct perf_tool *tool __maybe_unused,
+			       union perf_event *event,
+			       struct perf_sample *sample,
+			       struct machine *machine)
+{
+	return machine__process_cgroup_event(machine, event, sample);
+}
+
 int perf_event__process_lost(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -417,6 +432,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_NAMESPACES:
 		ret += perf_event__fprintf_namespaces(event, fp);
 		break;
+	case PERF_RECORD_CGROUP:
+		ret += perf_event__fprintf_cgroup(event, fp);
+		break;
 	case PERF_RECORD_MMAP2:
 		ret += perf_event__fprintf_mmap2(event, fp);
 		break;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 3cda40a..b8289f1 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -135,6 +135,7 @@ struct perf_sample {
 	u32 raw_size;
 	u64 data_src;
 	u64 phys_addr;
+	u64 cgroup;
 	u32 flags;
 	u16 insn_len;
 	u8  cpumode;
@@ -322,6 +323,10 @@ int perf_event__process_namespaces(struct perf_tool *tool,
 				   union perf_event *event,
 				   struct perf_sample *sample,
 				   struct machine *machine);
+int perf_event__process_cgroup(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_sample *sample,
+			       struct machine *machine);
 int perf_event__process_mmap(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -377,6 +382,7 @@ size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_cgroup(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_ksymbol(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_bpf(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 15ccd19..b766eb6 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2267,6 +2267,12 @@ int perf_evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 		array++;
 	}
 
+	data->cgroup = 0;
+	if (type & PERF_SAMPLE_CGROUP) {
+		data->cgroup = *array;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		OVERFLOW_CHECK_u64(array);
 		sz = *array++;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index fd14f14..399b473 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -654,6 +654,16 @@ int machine__process_namespaces_event(struct machine *machine __maybe_unused,
 	return err;
 }
 
+int machine__process_cgroup_event(struct machine *machine __maybe_unused,
+				  union perf_event *event,
+				  struct perf_sample *sample __maybe_unused)
+{
+	if (dump_trace)
+		perf_event__fprintf_cgroup(event, stdout);
+
+	return 0;
+}
+
 int machine__process_lost_event(struct machine *machine __maybe_unused,
 				union perf_event *event, struct perf_sample *sample __maybe_unused)
 {
@@ -1878,6 +1888,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 		ret = machine__process_mmap_event(machine, event, sample); break;
 	case PERF_RECORD_NAMESPACES:
 		ret = machine__process_namespaces_event(machine, event, sample); break;
+	case PERF_RECORD_CGROUP:
+		ret = machine__process_cgroup_event(machine, event, sample); break;
 	case PERF_RECORD_MMAP2:
 		ret = machine__process_mmap2_event(machine, event, sample); break;
 	case PERF_RECORD_FORK:
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index be0a930..fa1be9e 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -128,6 +128,9 @@ int machine__process_switch_event(struct machine *machine,
 int machine__process_namespaces_event(struct machine *machine,
 				      union perf_event *event,
 				      struct perf_sample *sample);
+int machine__process_cgroup_event(struct machine *machine,
+				  union perf_event *event,
+				  struct perf_sample *sample);
 int machine__process_mmap_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 355d345..b94fa07 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -35,6 +35,7 @@ static void __p_sample_type(char *buf, size_t size, u64 value)
 		bit_name(BRANCH_STACK), bit_name(REGS_USER), bit_name(STACK_USER),
 		bit_name(IDENTIFIER), bit_name(REGS_INTR), bit_name(DATA_SRC),
 		bit_name(WEIGHT), bit_name(PHYS_ADDR), bit_name(AUX),
+		bit_name(CGROUP),
 		{ .name = NULL, }
 	};
 #undef bit_name
@@ -132,6 +133,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(ksymbol, p_unsigned);
 	PRINT_ATTRf(bpf_event, p_unsigned);
 	PRINT_ATTRf(aux_output, p_unsigned);
+	PRINT_ATTRf(cgroup, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 055b00a..0b0bfe5 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -471,6 +471,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->comm = process_event_stub;
 	if (tool->namespaces == NULL)
 		tool->namespaces = process_event_stub;
+	if (tool->cgroup == NULL)
+		tool->cgroup = process_event_stub;
 	if (tool->fork == NULL)
 		tool->fork = process_event_stub;
 	if (tool->exit == NULL)
@@ -1436,6 +1438,8 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->comm(tool, event, sample, machine);
 	case PERF_RECORD_NAMESPACES:
 		return tool->namespaces(tool, event, sample, machine);
+	case PERF_RECORD_CGROUP:
+		return tool->cgroup(tool, event, sample, machine);
 	case PERF_RECORD_FORK:
 		return tool->fork(tool, event, sample, machine);
 	case PERF_RECORD_EXIT:
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 3f28af3..f72d809 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1230,6 +1230,9 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	if (type & PERF_SAMPLE_PHYS_ADDR)
 		result += sizeof(u64);
 
+	if (type & PERF_SAMPLE_CGROUP)
+		result += sizeof(u64);
+
 	if (type & PERF_SAMPLE_AUX) {
 		result += sizeof(u64);
 		result += sample->aux_sample.size;
@@ -1404,6 +1407,11 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 		array++;
 	}
 
+	if (type & PERF_SAMPLE_CGROUP) {
+		*array = sample->cgroup;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		sz = sample->aux_sample.size;
 		*array++ = sz;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 2abbf66..472ef5e 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -46,6 +46,7 @@ struct perf_tool {
 			mmap2,
 			comm,
 			namespaces,
+			cgroup,
 			fork,
 			exit,
 			lost,

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/urgent] perf tools: Add file-handle feature test
  2020-04-02  1:52         ` [PATCH] perf tools: Add file-handle feature test Namhyung Kim
  2020-04-02 15:37           ` Arnaldo Carvalho de Melo
@ 2020-04-04  8:41           ` tip-bot2 for Namhyung Kim
  1 sibling, 0 replies; 46+ messages in thread
From: tip-bot2 for Namhyung Kim @ 2020-04-04  8:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Mark Rutland, Peter Zijlstra, x86,
	LKML

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     49f550ea87c73671ee4d5e08820e0976894dd610
Gitweb:        https://git.kernel.org/tip/49f550ea87c73671ee4d5e08820e0976894dd610
Author:        Namhyung Kim <namhyung@kernel.org>
AuthorDate:    Thu, 02 Apr 2020 10:52:49 +09:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Fri, 03 Apr 2020 09:37:55 -03:00

perf tools: Add file-handle feature test

The file handle (FHANDLE) support is configurable so some systems might not
have it.  So add a config feature item to check it on build time so that we
don't add the cgroup tracking feature based on that.

Committer notes:

Had to make the test use the same construct as its later use in
synthetic-events.c, in the next patch in this series. i.e. make it be:

	struct {
		struct file_handle fh;
		uint64_t cgroup_id;
	} handle;

To cope with:

    CC       /tmp/build/perf/util/cloexec.o
  util/synthetic-events.c:428:22: error: field 'fh' with   CC       /tmp/build/perf/util/call-path.o
  variable sized type 'struct file_handle' not at the end of a struct or class is a GNU
        extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
                  struct file_handle fh;
                                     ^
  1 error generated.

Deal with this at some point, i.e. investigate if the right thing is to
remove that -Wgnu-variable-sized-type-not-at-end from our CFLAGS, for
now do the test the same way as it is used looks more sensible.

Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org
[ split from a larger patch, removed blank line at EOF ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/build/Makefile.feature           |  3 ++-
 tools/build/feature/Makefile           |  6 +++++-
 tools/build/feature/test-file-handle.c | 17 +++++++++++++++++
 tools/perf/Makefile.config             |  4 ++++
 4 files changed, 28 insertions(+), 2 deletions(-)
 create mode 100644 tools/build/feature/test-file-handle.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 574c2e0..3e0c019 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -72,7 +72,8 @@ FEATURE_TESTS_BASIC :=                  \
         setns				\
         libaio				\
         libzstd				\
-        disassembler-four-args
+        disassembler-four-args		\
+        file-handle
 
 # FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list
 # of all feature tests
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index 7ac0d80..621f528 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -67,7 +67,8 @@ FILES=                                          \
          test-llvm.bin				\
          test-llvm-version.bin			\
          test-libaio.bin			\
-         test-libzstd.bin
+         test-libzstd.bin			\
+         test-file-handle.bin
 
 FILES := $(addprefix $(OUTPUT),$(FILES))
 
@@ -321,6 +322,9 @@ $(OUTPUT)test-libaio.bin:
 $(OUTPUT)test-libzstd.bin:
 	$(BUILD) -lzstd
 
+$(OUTPUT)test-file-handle.bin:
+	$(BUILD)
+
 ###############################
 
 clean:
diff --git a/tools/build/feature/test-file-handle.c b/tools/build/feature/test-file-handle.c
new file mode 100644
index 0000000..4d3b03b
--- /dev/null
+++ b/tools/build/feature/test-file-handle.c
@@ -0,0 +1,17 @@
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <inttypes.h>
+
+int main(void)
+{
+	struct {
+		struct file_handle fh;
+		uint64_t cgroup_id;
+	} handle;
+	int mount_id;
+
+	name_to_handle_at(AT_FDCWD, "/", &handle.fh, &mount_id, 0);
+	return 0;
+}
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 80e55e7..eb95c0c 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -348,6 +348,10 @@ ifeq ($(feature-gettid), 1)
   CFLAGS += -DHAVE_GETTID
 endif
 
+ifeq ($(feature-file-handle), 1)
+  CFLAGS += -DHAVE_FILE_HANDLE
+endif
+
 ifdef NO_LIBELF
   NO_DWARF := 1
   NO_DEMANGLE := 1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/urgent] perf python: Include rwsem.c in the pythong biding
  2020-04-03 12:36   ` Arnaldo Carvalho de Melo
  2020-04-04  1:29     ` Namhyung Kim
@ 2020-04-04  8:41     ` tip-bot2 for Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 46+ messages in thread
From: tip-bot2 for Arnaldo Carvalho de Melo @ 2020-04-04  8:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Adrian Hunter, Alexander Shishkin, Jiri Olsa, Mark Rutland,
	Namhyung Kim, Peter Zijlstra, Arnaldo Carvalho de Melo, x86,
	LKML

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     460c3ed999d700fa84c953f83141f580ef456b3c
Gitweb:        https://git.kernel.org/tip/460c3ed999d700fa84c953f83141f580ef456b3c
Author:        Arnaldo Carvalho de Melo <acme@redhat.com>
AuthorDate:    Fri, 03 Apr 2020 09:29:52 -03:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Fri, 03 Apr 2020 09:37:55 -03:00

perf python: Include rwsem.c in the pythong biding

We'll need it for the cgroup patches, and its better to have it in a
separate patch in case we need to later revert the cgroup patches.

I.e. without this we have:

  [root@five ~]# perf test -v python
  19: 'import perf' in python                               :
  --- start ---
  test child forked, pid 148447
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  ImportError: /tmp/build/perf/python/perf.cpython-37m-x86_64-linux-gnu.so: undefined symbol: down_write
  test child finished with -1
  ---- end ----
  'import perf' in python: FAILED!
  [root@five ~]#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200403123606.GC23243@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/python-ext-sources | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/python-ext-sources b/tools/perf/util/python-ext-sources
index e7279ea..a9d9c14 100644
--- a/tools/perf/util/python-ext-sources
+++ b/tools/perf/util/python-ext-sources
@@ -34,3 +34,4 @@ util/string.c
 util/symbol_fprintf.c
 util/units.c
 util/affinity.c
+util/rwsem.c

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/urgent] perf/core: Add PERF_SAMPLE_CGROUP feature
  2020-03-25 12:45 ` [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature Namhyung Kim
@ 2020-04-04  8:41   ` tip-bot2 for Namhyung Kim
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Namhyung Kim @ 2020-04-04  8:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Peter Zijlstra (Intel),
	Tejun Heo, Alexander Shishkin, Jiri Olsa, Johannes Weiner,
	Mark Rutland, Zefan Li, x86, LKML

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     6546b19f95acc986807de981402bbac6b3a94b0f
Gitweb:        https://git.kernel.org/tip/6546b19f95acc986807de981402bbac6b3a94b0f
Author:        Namhyung Kim <namhyung@kernel.org>
AuthorDate:    Wed, 25 Mar 2020 21:45:29 +09:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Fri, 27 Mar 2020 10:41:44 -03:00

perf/core: Add PERF_SAMPLE_CGROUP feature

The PERF_SAMPLE_CGROUP bit is to save (perf_event) cgroup information in
the sample.  It will add a 64-bit id to identify current cgroup and it's
the file handle in the cgroup file system.  Userspace should use this
information with PERF_RECORD_CGROUP event to match which cgroup it
belongs.

I put it before PERF_SAMPLE_AUX for simplicity since it just needs a
64-bit word.  But if we want bigger samples, I can work on that
direction too.

Committer testing:

  $ pahole perf_sample_data | grep -w cgroup -B5 -A5
  	/* --- cacheline 4 boundary (256 bytes) was 56 bytes ago --- */
  	struct perf_regs           regs_intr;            /*   312    16 */
  	/* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
  	u64                        stack_user_size;      /*   328     8 */
  	u64                        phys_addr;            /*   336     8 */
  	u64                        cgroup;               /*   344     8 */

  	/* size: 384, cachelines: 6, members: 22 */
  	/* padding: 32 */
  };
  $

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 include/linux/perf_event.h      |  1 +
 include/uapi/linux/perf_event.h |  3 ++-
 init/Kconfig                    |  3 ++-
 kernel/events/core.c            | 22 ++++++++++++++++++++++
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8768a39..9c3e761 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1020,6 +1020,7 @@ struct perf_sample_data {
 	u64				stack_user_size;
 
 	u64				phys_addr;
+	u64				cgroup;
 } ____cacheline_aligned;
 
 /* default value for data source */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index de95f6c..7b2d6fc 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -142,8 +142,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_REGS_INTR			= 1U << 18,
 	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
 	PERF_SAMPLE_AUX				= 1U << 20,
+	PERF_SAMPLE_CGROUP			= 1U << 21,
 
-	PERF_SAMPLE_MAX = 1U << 21,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 22,		/* non-ABI */
 
 	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
 };
diff --git a/init/Kconfig b/init/Kconfig
index 20a6ac3..7766b06 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1027,7 +1027,8 @@ config CGROUP_PERF
 	help
 	  This option extends the perf per-cpu mode to restrict monitoring
 	  to threads which belong to the cgroup specified and run on the
-	  designated cpu.
+	  designated cpu.  Or this can be used to have cgroup ID in samples
+	  so that it can monitor performance events among cgroups.
 
 	  Say N if unsure.
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 994932d..1569979 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1862,6 +1862,9 @@ static void __perf_event_header_size(struct perf_event *event, u64 sample_type)
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		size += sizeof(data->phys_addr);
 
+	if (sample_type & PERF_SAMPLE_CGROUP)
+		size += sizeof(data->cgroup);
+
 	event->header_size = size;
 }
 
@@ -6867,6 +6870,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		perf_output_put(handle, data->phys_addr);
 
+	if (sample_type & PERF_SAMPLE_CGROUP)
+		perf_output_put(handle, data->cgroup);
+
 	if (sample_type & PERF_SAMPLE_AUX) {
 		perf_output_put(handle, data->aux_size);
 
@@ -7066,6 +7072,16 @@ void perf_prepare_sample(struct perf_event_header *header,
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		data->phys_addr = perf_virt_to_phys(data->addr);
 
+#ifdef CONFIG_CGROUP_PERF
+	if (sample_type & PERF_SAMPLE_CGROUP) {
+		struct cgroup *cgrp;
+
+		/* protected by RCU */
+		cgrp = task_css_check(current, perf_event_cgrp_id, 1)->cgroup;
+		data->cgroup = cgroup_id(cgrp);
+	}
+#endif
+
 	if (sample_type & PERF_SAMPLE_AUX) {
 		u64 size;
 
@@ -11264,6 +11280,12 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
 
 	if (attr->sample_type & PERF_SAMPLE_REGS_INTR)
 		ret = perf_reg_validate(attr->sample_regs_intr);
+
+#ifndef CONFIG_CGROUP_PERF
+	if (attr->sample_type & PERF_SAMPLE_CGROUP)
+		return -EINVAL;
+#endif
+
 out:
 	return ret;
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/urgent] perf/core: Add PERF_RECORD_CGROUP event
  2020-03-25 12:45 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
@ 2020-04-04  8:41   ` tip-bot2 for Namhyung Kim
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Namhyung Kim @ 2020-04-04  8:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: kbuild test robot, Namhyung Kim, Arnaldo Carvalho de Melo,
	Peter Zijlstra (Intel),
	Tejun Heo, Adrian Hunter, Alexander Shishkin, Jiri Olsa,
	Johannes Weiner, Mark Rutland, Zefan Li, x86, LKML

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     96aaab686505c449e24d76e76507290dcc30e008
Gitweb:        https://git.kernel.org/tip/96aaab686505c449e24d76e76507290dcc30e008
Author:        Namhyung Kim <namhyung@kernel.org>
AuthorDate:    Wed, 25 Mar 2020 21:45:28 +09:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Fri, 27 Mar 2020 10:39:11 -03:00

perf/core: Add PERF_RECORD_CGROUP event

To support cgroup tracking, add CGROUP event to save a link between
cgroup path and id number.  This is needed since cgroups can go away
when userspace tries to read the cgroup info (from the id) later.

The attr.cgroup bit was also added to enable cgroup tracking from
userspace.

This event will be generated when a new cgroup becomes active.
Userspace might need to synthesize those events for existing cgroups.

Committer testing:

>From the resulting kernel, using /sys/kernel/btf/vmlinux:

  $ pahole perf_event_attr | grep -w cgroup -B5 -A1
  	__u64                      write_backward:1;     /*    40:27  8 */
  	__u64                      namespaces:1;         /*    40:28  8 */
  	__u64                      ksymbol:1;            /*    40:29  8 */
  	__u64                      bpf_event:1;          /*    40:30  8 */
  	__u64                      aux_output:1;         /*    40:31  8 */
  	__u64                      cgroup:1;             /*    40:32  8 */
  	__u64                      __reserved_1:31;      /*    40:33  8 */
  $

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
[staticize perf_event_cgroup function]
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 include/uapi/linux/perf_event.h |  13 +++-
 kernel/events/core.c            | 111 +++++++++++++++++++++++++++++++-
 2 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 397cfd6..de95f6c 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -381,7 +381,8 @@ struct perf_event_attr {
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
 				aux_output     :  1, /* generate AUX records instead of events */
-				__reserved_1   : 32;
+				cgroup         :  1, /* include cgroup events */
+				__reserved_1   : 31;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -1012,6 +1013,16 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_BPF_EVENT			= 18,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				id;
+	 *	char				path[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CGROUP			= 19,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d22e4ba..994932d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -387,6 +387,7 @@ static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
 static atomic_t nr_ksymbol_events __read_mostly;
 static atomic_t nr_bpf_events __read_mostly;
+static atomic_t nr_cgroup_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -4608,6 +4609,8 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_dec(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_dec(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
@@ -7736,6 +7739,105 @@ void perf_event_namespaces(struct task_struct *task)
 }
 
 /*
+ * cgroup tracking
+ */
+#ifdef CONFIG_CGROUP_PERF
+
+struct perf_cgroup_event {
+	char				*path;
+	int				path_size;
+	struct {
+		struct perf_event_header	header;
+		u64				id;
+		char				path[];
+	} event_id;
+};
+
+static int perf_event_cgroup_match(struct perf_event *event)
+{
+	return event->attr.cgroup;
+}
+
+static void perf_event_cgroup_output(struct perf_event *event, void *data)
+{
+	struct perf_cgroup_event *cgroup_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	u16 header_size = cgroup_event->event_id.header.size;
+	int ret;
+
+	if (!perf_event_cgroup_match(event))
+		return;
+
+	perf_event_header__init_id(&cgroup_event->event_id.header,
+				   &sample, event);
+	ret = perf_output_begin(&handle, event,
+				cgroup_event->event_id.header.size);
+	if (ret)
+		goto out;
+
+	perf_output_put(&handle, cgroup_event->event_id);
+	__output_copy(&handle, cgroup_event->path, cgroup_event->path_size);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+out:
+	cgroup_event->event_id.header.size = header_size;
+}
+
+static void perf_event_cgroup(struct cgroup *cgrp)
+{
+	struct perf_cgroup_event cgroup_event;
+	char path_enomem[16] = "//enomem";
+	char *pathname;
+	size_t size;
+
+	if (!atomic_read(&nr_cgroup_events))
+		return;
+
+	cgroup_event = (struct perf_cgroup_event){
+		.event_id  = {
+			.header = {
+				.type = PERF_RECORD_CGROUP,
+				.misc = 0,
+				.size = sizeof(cgroup_event.event_id),
+			},
+			.id = cgroup_id(cgrp),
+		},
+	};
+
+	pathname = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (pathname == NULL) {
+		cgroup_event.path = path_enomem;
+	} else {
+		/* just to be sure to have enough space for alignment */
+		cgroup_path(cgrp, pathname, PATH_MAX - sizeof(u64));
+		cgroup_event.path = pathname;
+	}
+
+	/*
+	 * Since our buffer works in 8 byte units we need to align our string
+	 * size to a multiple of 8. However, we must guarantee the tail end is
+	 * zero'd out to avoid leaking random bits to userspace.
+	 */
+	size = strlen(cgroup_event.path) + 1;
+	while (!IS_ALIGNED(size, sizeof(u64)))
+		cgroup_event.path[size++] = '\0';
+
+	cgroup_event.event_id.header.size += size;
+	cgroup_event.path_size = size;
+
+	perf_iterate_sb(perf_event_cgroup_output,
+			&cgroup_event,
+			NULL);
+
+	kfree(pathname);
+}
+
+#endif
+
+/*
  * mmap tracking
  */
 
@@ -10781,6 +10883,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_comm_events);
 	if (event->attr.namespaces)
 		atomic_inc(&nr_namespaces_events);
+	if (event->attr.cgroup)
+		atomic_inc(&nr_cgroup_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
 	if (event->attr.freq)
@@ -12757,6 +12861,12 @@ static void perf_cgroup_css_free(struct cgroup_subsys_state *css)
 	kfree(jc);
 }
 
+static int perf_cgroup_css_online(struct cgroup_subsys_state *css)
+{
+	perf_event_cgroup(css->cgroup);
+	return 0;
+}
+
 static int __perf_cgroup_move(void *info)
 {
 	struct task_struct *task = info;
@@ -12778,6 +12888,7 @@ static void perf_cgroup_attach(struct cgroup_taskset *tset)
 struct cgroup_subsys perf_event_cgrp_subsys = {
 	.css_alloc	= perf_cgroup_css_alloc,
 	.css_free	= perf_cgroup_css_free,
+	.css_online	= perf_cgroup_css_online,
 	.attach		= perf_cgroup_attach,
 	/*
 	 * Implicitly enable on dfl hierarchy so that perf events can

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-08 22:21   ` Jiri Olsa
@ 2020-01-09  7:51     ` Namhyung Kim
  0 siblings, 0 replies; 46+ messages in thread
From: Namhyung Kim @ 2020-01-09  7:51 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 9, 2020 at 7:22 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:
>
> SNIP
>
> > +     closedir(d);
> > +     return ret;
> > +}
> > +
> > +int perf_event__synthesize_cgroups(struct perf_tool *tool,
> > +                                perf_event__handler_t process,
> > +                                struct machine *machine)
> > +{
> > +     union perf_event event;
> > +     char *cgrp_root;
> > +     size_t mount_len;  /* length of mount point in the path */
> > +     int ret = -1;
> > +
> > +     cgrp_root = malloc(PATH_MAX);
> > +     if (cgrp_root == NULL)
> > +             return -1;
> > +
>
> hum, we normally use bufs with PATH_MAX size on stack..
> is there some reason to use heap in here?

No specific reason, will change.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-08 22:18   ` Jiri Olsa
@ 2020-01-09  7:50     ` Namhyung Kim
  0 siblings, 0 replies; 46+ messages in thread
From: Namhyung Kim @ 2020-01-09  7:50 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 9, 2020 at 7:18 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:
>
> SNIP
>
> > +     if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
> > +             pr_debug("process synth event failed\n");
> > +             return -1;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
> > +                                     union perf_event *event,
> > +                                     char *path, size_t mount_len,
> > +                                     perf_event__handler_t process,
> > +                                     struct machine *machine)
> > +{
> > +     size_t pos = strlen(path);
> > +     DIR *d;
> > +     struct dirent *dent;
> > +     int ret = 0;
> > +
> > +     if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
> > +                                       process, machine) < 0)
> > +             return -1;
> > +
> > +     d = opendir(path);
> > +     if (d == NULL) {
> > +             pr_debug("failed to open directory: %s\n", path);
> > +             return -1;
> > +     }
> > +
> > +     while ((dent = readdir(d)) != NULL) {
> > +             if (dent->d_type != DT_DIR)
> > +                     continue;
> > +             if (!strcmp(dent->d_name, ".") ||
> > +                 !strcmp(dent->d_name, ".."))
> > +                     continue;
> > +
> > +             if (path[pos - 1] != '/')
> > +                     strcat(path, "/");
> > +             strcat(path, dent->d_name);
>
> shouldn't you also check for the max size of path in here?

I guess a path should be shorter than PATH_MAX.
But yes, I can add a check if you want.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-08 22:08   ` Jiri Olsa
@ 2020-01-09  5:19     ` Namhyung Kim
  0 siblings, 0 replies; 46+ messages in thread
From: Namhyung Kim @ 2020-01-09  5:19 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Thu, Jan 9, 2020 at 7:09 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:
> > diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> > index 4e8ef1db0c94..5147d22b3bda 100644
> > --- a/tools/perf/util/cgroup.c
> > +++ b/tools/perf/util/cgroup.c
> > @@ -15,8 +15,7 @@ int nr_cgroups;
> >
> >  static struct rb_root cgroup_tree = RB_ROOT;
> >
> > -static int
> > -cgroupfs_find_mountpoint(char *buf, size_t maxlen)
> > +int cgroupfs_find_mountpoint(char *buf, size_t maxlen)
>
> out of scope of this change, but could this be added to api/fs/fs.c?
> it might need more checks then is currently supported, but would be
> nice to have it under same api as the rest

Makes sense, will try to move it.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
  2020-01-08 22:08   ` Jiri Olsa
  2020-01-08 22:18   ` Jiri Olsa
@ 2020-01-08 22:21   ` Jiri Olsa
  2020-01-09  7:51     ` Namhyung Kim
  2 siblings, 1 reply; 46+ messages in thread
From: Jiri Olsa @ 2020-01-08 22:21 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:

SNIP

> +	closedir(d);
> +	return ret;
> +}
> +
> +int perf_event__synthesize_cgroups(struct perf_tool *tool,
> +				   perf_event__handler_t process,
> +				   struct machine *machine)
> +{
> +	union perf_event event;
> +	char *cgrp_root;
> +	size_t mount_len;  /* length of mount point in the path */
> +	int ret = -1;
> +
> +	cgrp_root = malloc(PATH_MAX);
> +	if (cgrp_root == NULL)
> +		return -1;
> +

hum, we normally use bufs with PATH_MAX size on stack..
is there some reason to use heap in here?

jirka


> +	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX) < 0) {
> +		pr_debug("cannot find cgroup mount point\n");
> +		goto out;
> +	}
> +
> +	mount_len = strlen(cgrp_root);
> +	/* make sure the path starts with a slash (after mount point) */
> +	strcat(cgrp_root, "/");
> +
> +	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
> +					 process, machine) < 0)
> +		goto out;
> +
> +	ret = 0;
> +
> +out:
> +	free(cgrp_root);
> +
> +	return ret;
> +}
> +

SNIP


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
  2020-01-08 22:08   ` Jiri Olsa
@ 2020-01-08 22:18   ` Jiri Olsa
  2020-01-09  7:50     ` Namhyung Kim
  2020-01-08 22:21   ` Jiri Olsa
  2 siblings, 1 reply; 46+ messages in thread
From: Jiri Olsa @ 2020-01-08 22:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:

SNIP

> +	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
> +		pr_debug("process synth event failed\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
> +					union perf_event *event,
> +					char *path, size_t mount_len,
> +					perf_event__handler_t process,
> +					struct machine *machine)
> +{
> +	size_t pos = strlen(path);
> +	DIR *d;
> +	struct dirent *dent;
> +	int ret = 0;
> +
> +	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
> +					  process, machine) < 0)
> +		return -1;
> +
> +	d = opendir(path);
> +	if (d == NULL) {
> +		pr_debug("failed to open directory: %s\n", path);
> +		return -1;
> +	}
> +
> +	while ((dent = readdir(d)) != NULL) {
> +		if (dent->d_type != DT_DIR)
> +			continue;
> +		if (!strcmp(dent->d_name, ".") ||
> +		    !strcmp(dent->d_name, ".."))
> +			continue;
> +
> +		if (path[pos - 1] != '/')
> +			strcat(path, "/");
> +		strcat(path, dent->d_name);

shouldn't you also check for the max size of path in here?

jirka


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
@ 2020-01-08 22:08   ` Jiri Olsa
  2020-01-09  5:19     ` Namhyung Kim
  2020-01-08 22:18   ` Jiri Olsa
  2020-01-08 22:21   ` Jiri Olsa
  2 siblings, 1 reply; 46+ messages in thread
From: Jiri Olsa @ 2020-01-08 22:08 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Mark Rutland, Stephane Eranian, LKML,
	linux-perf-users

On Tue, Jan 07, 2020 at 10:34:58PM +0900, Namhyung Kim wrote:
> Synthesize cgroup events by iterating cgroup filesystem directories.
> The cgroup event only saves the portion of cgroup path after the mount
> point and the cgroup id (which actually is a file handle).
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-record.c        |   5 ++
>  tools/perf/util/cgroup.c           |   3 +-
>  tools/perf/util/cgroup.h           |   1 +
>  tools/perf/util/event.c            |   1 +
>  tools/perf/util/synthetic-events.c | 119 +++++++++++++++++++++++++++++
>  tools/perf/util/synthetic-events.h |   1 +
>  tools/perf/util/tool.h             |   1 +
>  7 files changed, 129 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 4c301466101b..2802de9538ff 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1397,6 +1397,11 @@ static int record__synthesize(struct record *rec, bool tail)
>  	if (err < 0)
>  		pr_warning("Couldn't synthesize bpf events.\n");
>  
> +	err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
> +					     machine);
> +	if (err < 0)
> +		pr_warning("Couldn't synthesize cgroup events.\n");
> +
>  	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->core.threads,
>  					    process_synthesized_event, opts->sample_address,
>  					    1);
> diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> index 4e8ef1db0c94..5147d22b3bda 100644
> --- a/tools/perf/util/cgroup.c
> +++ b/tools/perf/util/cgroup.c
> @@ -15,8 +15,7 @@ int nr_cgroups;
>  
>  static struct rb_root cgroup_tree = RB_ROOT;
>  
> -static int
> -cgroupfs_find_mountpoint(char *buf, size_t maxlen)
> +int cgroupfs_find_mountpoint(char *buf, size_t maxlen)

out of scope of this change, but could this be added to api/fs/fs.c?
it might need more checks then is currently supported, but would be
nice to have it under same api as the rest

jirka


>  {
>  	FILE *fp;
>  	char mountpoint[PATH_MAX + 1], tokens[PATH_MAX + 1], type[PATH_MAX + 1];
> diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
> index 381583df27c7..9a67060723fa 100644
> --- a/tools/perf/util/cgroup.h
> +++ b/tools/perf/util/cgroup.h
> @@ -17,6 +17,7 @@ struct cgroup {
>  
>  extern int nr_cgroups; /* number of explicit cgroups defined */
>  
> +int cgroupfs_find_mountpoint(char *buf, size_t maxlen);
>  struct cgroup *cgroup__get(struct cgroup *cgroup);
>  void cgroup__put(struct cgroup *cgroup);
>  
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 824c038e5c33..28801c867f39 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -33,6 +33,7 @@
>  #include "bpf-event.h"
>  #include "tool.h"
>  #include "../perf.h"
> +#include "cgroup.h"
>  
>  static const char *perf_event__names[] = {
>  	[0]					= "TOTAL",
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index cd336eb8886b..36f122cac44f 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -16,6 +16,7 @@
>  #include "util/synthetic-events.h"
>  #include "util/target.h"
>  #include "util/time-utils.h"
> +#include "util/cgroup.h"
>  #include <linux/bitops.h>
>  #include <linux/kernel.h>
>  #include <linux/string.h>
> @@ -413,6 +414,124 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
>  	return rc;
>  }
>  
> +static int perf_event__synthesize_cgroup(struct perf_tool *tool,
> +					 union perf_event *event,
> +					 char *path, size_t mount_len,
> +					 perf_event__handler_t process,
> +					 struct machine *machine)
> +{
> +	size_t event_size = sizeof(event->cgroup) - sizeof(event->cgroup.path);
> +	size_t path_len = strlen(path) - mount_len + 1;
> +	struct {
> +		struct file_handle fh;
> +		uint64_t cgroup_id;
> +	} handle;
> +	int mount_id;
> +
> +	while (path_len % sizeof(u64))
> +		path[mount_len + path_len++] = '\0';
> +
> +	memset(&event->cgroup, 0, event_size);
> +
> +	event->cgroup.header.type = PERF_RECORD_CGROUP;
> +	event->cgroup.header.size = event_size + path_len + machine->id_hdr_size;
> +
> +	handle.fh.handle_bytes = sizeof(handle.cgroup_id);
> +	if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
> +		pr_debug("stat failed: %s\n", path);
> +		return -1;
> +	}
> +
> +	event->cgroup.id = handle.cgroup_id;
> +	strncpy(event->cgroup.path, path + mount_len, path_len);
> +	memset(event->cgroup.path + path_len, 0, machine->id_hdr_size);
> +
> +	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
> +		pr_debug("process synth event failed\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
> +					union perf_event *event,
> +					char *path, size_t mount_len,
> +					perf_event__handler_t process,
> +					struct machine *machine)
> +{
> +	size_t pos = strlen(path);
> +	DIR *d;
> +	struct dirent *dent;
> +	int ret = 0;
> +
> +	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
> +					  process, machine) < 0)
> +		return -1;
> +
> +	d = opendir(path);
> +	if (d == NULL) {
> +		pr_debug("failed to open directory: %s\n", path);
> +		return -1;
> +	}
> +
> +	while ((dent = readdir(d)) != NULL) {
> +		if (dent->d_type != DT_DIR)
> +			continue;
> +		if (!strcmp(dent->d_name, ".") ||
> +		    !strcmp(dent->d_name, ".."))
> +			continue;
> +
> +		if (path[pos - 1] != '/')
> +			strcat(path, "/");
> +		strcat(path, dent->d_name);
> +
> +		ret = perf_event__walk_cgroup_tree(tool, event, path,
> +						   mount_len, process, machine);
> +		if (ret < 0)
> +			break;
> +
> +		path[pos] = '\0';
> +	}
> +
> +	closedir(d);
> +	return ret;
> +}
> +
> +int perf_event__synthesize_cgroups(struct perf_tool *tool,
> +				   perf_event__handler_t process,
> +				   struct machine *machine)
> +{
> +	union perf_event event;
> +	char *cgrp_root;
> +	size_t mount_len;  /* length of mount point in the path */
> +	int ret = -1;
> +
> +	cgrp_root = malloc(PATH_MAX);
> +	if (cgrp_root == NULL)
> +		return -1;
> +
> +	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX) < 0) {
> +		pr_debug("cannot find cgroup mount point\n");
> +		goto out;
> +	}
> +
> +	mount_len = strlen(cgrp_root);
> +	/* make sure the path starts with a slash (after mount point) */
> +	strcat(cgrp_root, "/");
> +
> +	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
> +					 process, machine) < 0)
> +		goto out;
> +
> +	ret = 0;
> +
> +out:
> +	free(cgrp_root);
> +
> +	return ret;
> +}
> +
>  int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
>  				   struct machine *machine)
>  {
> diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
> index baead0cdc381..e7a3e9589738 100644
> --- a/tools/perf/util/synthetic-events.h
> +++ b/tools/perf/util/synthetic-events.h
> @@ -45,6 +45,7 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handl
>  int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
>  int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_namespaces(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
> +int perf_event__synthesize_cgroups(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
>  int perf_event__synthesize_stat_config(struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
> diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
> index 472ef5eb4068..3fb67bd31e4a 100644
> --- a/tools/perf/util/tool.h
> +++ b/tools/perf/util/tool.h
> @@ -79,6 +79,7 @@ struct perf_tool {
>  	bool		ordered_events;
>  	bool		ordering_requires_timestamps;
>  	bool		namespace_events;
> +	bool		cgroup_events;
>  	bool		no_warn;
>  	enum show_feature_header show_feat_hdr;
>  };
> -- 
> 2.24.1.735.g03f4e72817-goog
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 6/9] perf record: Support synthesizing cgroup events
  2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
@ 2020-01-07 13:34 ` Namhyung Kim
  2020-01-08 22:08   ` Jiri Olsa
                     ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Namhyung Kim @ 2020-01-07 13:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

Synthesize cgroup events by iterating cgroup filesystem directories.
The cgroup event only saves the portion of cgroup path after the mount
point and the cgroup id (which actually is a file handle).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-record.c        |   5 ++
 tools/perf/util/cgroup.c           |   3 +-
 tools/perf/util/cgroup.h           |   1 +
 tools/perf/util/event.c            |   1 +
 tools/perf/util/synthetic-events.c | 119 +++++++++++++++++++++++++++++
 tools/perf/util/synthetic-events.h |   1 +
 tools/perf/util/tool.h             |   1 +
 7 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c301466101b..2802de9538ff 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1397,6 +1397,11 @@ static int record__synthesize(struct record *rec, bool tail)
 	if (err < 0)
 		pr_warning("Couldn't synthesize bpf events.\n");
 
+	err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
+					     machine);
+	if (err < 0)
+		pr_warning("Couldn't synthesize cgroup events.\n");
+
 	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->core.threads,
 					    process_synthesized_event, opts->sample_address,
 					    1);
diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 4e8ef1db0c94..5147d22b3bda 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -15,8 +15,7 @@ int nr_cgroups;
 
 static struct rb_root cgroup_tree = RB_ROOT;
 
-static int
-cgroupfs_find_mountpoint(char *buf, size_t maxlen)
+int cgroupfs_find_mountpoint(char *buf, size_t maxlen)
 {
 	FILE *fp;
 	char mountpoint[PATH_MAX + 1], tokens[PATH_MAX + 1], type[PATH_MAX + 1];
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 381583df27c7..9a67060723fa 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -17,6 +17,7 @@ struct cgroup {
 
 extern int nr_cgroups; /* number of explicit cgroups defined */
 
+int cgroupfs_find_mountpoint(char *buf, size_t maxlen);
 struct cgroup *cgroup__get(struct cgroup *cgroup);
 void cgroup__put(struct cgroup *cgroup);
 
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 824c038e5c33..28801c867f39 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -33,6 +33,7 @@
 #include "bpf-event.h"
 #include "tool.h"
 #include "../perf.h"
+#include "cgroup.h"
 
 static const char *perf_event__names[] = {
 	[0]					= "TOTAL",
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index cd336eb8886b..36f122cac44f 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -16,6 +16,7 @@
 #include "util/synthetic-events.h"
 #include "util/target.h"
 #include "util/time-utils.h"
+#include "util/cgroup.h"
 #include <linux/bitops.h>
 #include <linux/kernel.h>
 #include <linux/string.h>
@@ -413,6 +414,124 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	return rc;
 }
 
+static int perf_event__synthesize_cgroup(struct perf_tool *tool,
+					 union perf_event *event,
+					 char *path, size_t mount_len,
+					 perf_event__handler_t process,
+					 struct machine *machine)
+{
+	size_t event_size = sizeof(event->cgroup) - sizeof(event->cgroup.path);
+	size_t path_len = strlen(path) - mount_len + 1;
+	struct {
+		struct file_handle fh;
+		uint64_t cgroup_id;
+	} handle;
+	int mount_id;
+
+	while (path_len % sizeof(u64))
+		path[mount_len + path_len++] = '\0';
+
+	memset(&event->cgroup, 0, event_size);
+
+	event->cgroup.header.type = PERF_RECORD_CGROUP;
+	event->cgroup.header.size = event_size + path_len + machine->id_hdr_size;
+
+	handle.fh.handle_bytes = sizeof(handle.cgroup_id);
+	if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
+		pr_debug("stat failed: %s\n", path);
+		return -1;
+	}
+
+	event->cgroup.id = handle.cgroup_id;
+	strncpy(event->cgroup.path, path + mount_len, path_len);
+	memset(event->cgroup.path + path_len, 0, machine->id_hdr_size);
+
+	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
+		pr_debug("process synth event failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
+					union perf_event *event,
+					char *path, size_t mount_len,
+					perf_event__handler_t process,
+					struct machine *machine)
+{
+	size_t pos = strlen(path);
+	DIR *d;
+	struct dirent *dent;
+	int ret = 0;
+
+	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
+					  process, machine) < 0)
+		return -1;
+
+	d = opendir(path);
+	if (d == NULL) {
+		pr_debug("failed to open directory: %s\n", path);
+		return -1;
+	}
+
+	while ((dent = readdir(d)) != NULL) {
+		if (dent->d_type != DT_DIR)
+			continue;
+		if (!strcmp(dent->d_name, ".") ||
+		    !strcmp(dent->d_name, ".."))
+			continue;
+
+		if (path[pos - 1] != '/')
+			strcat(path, "/");
+		strcat(path, dent->d_name);
+
+		ret = perf_event__walk_cgroup_tree(tool, event, path,
+						   mount_len, process, machine);
+		if (ret < 0)
+			break;
+
+		path[pos] = '\0';
+	}
+
+	closedir(d);
+	return ret;
+}
+
+int perf_event__synthesize_cgroups(struct perf_tool *tool,
+				   perf_event__handler_t process,
+				   struct machine *machine)
+{
+	union perf_event event;
+	char *cgrp_root;
+	size_t mount_len;  /* length of mount point in the path */
+	int ret = -1;
+
+	cgrp_root = malloc(PATH_MAX);
+	if (cgrp_root == NULL)
+		return -1;
+
+	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX) < 0) {
+		pr_debug("cannot find cgroup mount point\n");
+		goto out;
+	}
+
+	mount_len = strlen(cgrp_root);
+	/* make sure the path starts with a slash (after mount point) */
+	strcat(cgrp_root, "/");
+
+	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
+					 process, machine) < 0)
+		goto out;
+
+	ret = 0;
+
+out:
+	free(cgrp_root);
+
+	return ret;
+}
+
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
 				   struct machine *machine)
 {
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index baead0cdc381..e7a3e9589738 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -45,6 +45,7 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handl
 int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_namespaces(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
+int perf_event__synthesize_cgroups(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
 int perf_event__synthesize_stat_config(struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 472ef5eb4068..3fb67bd31e4a 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -79,6 +79,7 @@ struct perf_tool {
 	bool		ordered_events;
 	bool		ordering_requires_timestamps;
 	bool		namespace_events;
+	bool		cgroup_events;
 	bool		no_warn;
 	enum show_feature_header show_feat_hdr;
 };
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 6/9] perf record: Support synthesizing cgroup events
  2019-12-23  6:07 [PATCHSET 0/9] perf: Improve cgroup profiling (v3) Namhyung Kim
@ 2019-12-23  6:07 ` Namhyung Kim
  0 siblings, 0 replies; 46+ messages in thread
From: Namhyung Kim @ 2019-12-23  6:07 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

Synthesize cgroup events by iterating cgroup filesystem directories.
The cgroup event only saves the portion of cgroup path after the mount
point and the cgroup id (which actually is a file handle).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-record.c        |   5 ++
 tools/perf/util/cgroup.c           |   3 +-
 tools/perf/util/cgroup.h           |   1 +
 tools/perf/util/event.c            |   1 +
 tools/perf/util/synthetic-events.c | 119 +++++++++++++++++++++++++++++
 tools/perf/util/synthetic-events.h |   1 +
 tools/perf/util/tool.h             |   1 +
 7 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c301466101b..2802de9538ff 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1397,6 +1397,11 @@ static int record__synthesize(struct record *rec, bool tail)
 	if (err < 0)
 		pr_warning("Couldn't synthesize bpf events.\n");
 
+	err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
+					     machine);
+	if (err < 0)
+		pr_warning("Couldn't synthesize cgroup events.\n");
+
 	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->core.threads,
 					    process_synthesized_event, opts->sample_address,
 					    1);
diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 4e8ef1db0c94..5147d22b3bda 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -15,8 +15,7 @@ int nr_cgroups;
 
 static struct rb_root cgroup_tree = RB_ROOT;
 
-static int
-cgroupfs_find_mountpoint(char *buf, size_t maxlen)
+int cgroupfs_find_mountpoint(char *buf, size_t maxlen)
 {
 	FILE *fp;
 	char mountpoint[PATH_MAX + 1], tokens[PATH_MAX + 1], type[PATH_MAX + 1];
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 381583df27c7..9a67060723fa 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -17,6 +17,7 @@ struct cgroup {
 
 extern int nr_cgroups; /* number of explicit cgroups defined */
 
+int cgroupfs_find_mountpoint(char *buf, size_t maxlen);
 struct cgroup *cgroup__get(struct cgroup *cgroup);
 void cgroup__put(struct cgroup *cgroup);
 
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 824c038e5c33..28801c867f39 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -33,6 +33,7 @@
 #include "bpf-event.h"
 #include "tool.h"
 #include "../perf.h"
+#include "cgroup.h"
 
 static const char *perf_event__names[] = {
 	[0]					= "TOTAL",
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index c423298fe62d..cfc850efa318 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -16,6 +16,7 @@
 #include "util/synthetic-events.h"
 #include "util/target.h"
 #include "util/time-utils.h"
+#include "util/cgroup.h"
 #include <linux/bitops.h>
 #include <linux/kernel.h>
 #include <linux/string.h>
@@ -413,6 +414,124 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	return rc;
 }
 
+static int perf_event__synthesize_cgroup(struct perf_tool *tool,
+					 union perf_event *event,
+					 char *path, size_t mount_len,
+					 perf_event__handler_t process,
+					 struct machine *machine)
+{
+	size_t event_size = sizeof(event->cgroup) - sizeof(event->cgroup.path);
+	size_t path_len = strlen(path) - mount_len + 1;
+	struct {
+		struct file_handle fh;
+		uint64_t cgroup_id;
+	} handle;
+	int mount_id;
+
+	while (path_len % sizeof(u64))
+		path[mount_len + path_len++] = '\0';
+
+	memset(&event->cgroup, 0, event_size);
+
+	event->cgroup.header.type = PERF_RECORD_CGROUP;
+	event->cgroup.header.size = event_size + path_len + machine->id_hdr_size;
+
+	handle.fh.handle_bytes = sizeof(handle.cgroup_id);
+	if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
+		pr_debug("stat failed: %s\n", path);
+		return -1;
+	}
+
+	event->cgroup.id = handle.cgroup_id;
+	strncpy(event->cgroup.path, path + mount_len, path_len);
+	memset(event->cgroup.path + path_len, 0, machine->id_hdr_size);
+
+	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
+		pr_debug("process synth event failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
+					union perf_event *event,
+					char *path, size_t mount_len,
+					perf_event__handler_t process,
+					struct machine *machine)
+{
+	size_t pos = strlen(path);
+	DIR *d;
+	struct dirent *dent;
+	int ret = 0;
+
+	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
+					  process, machine) < 0)
+		return -1;
+
+	d = opendir(path);
+	if (d == NULL) {
+		pr_debug("failed to open directory: %s\n", path);
+		return -1;
+	}
+
+	while ((dent = readdir(d)) != NULL) {
+		if (dent->d_type != DT_DIR)
+			continue;
+		if (!strcmp(dent->d_name, ".") ||
+		    !strcmp(dent->d_name, ".."))
+			continue;
+
+		if (path[pos - 1] != '/')
+			strcat(path, "/");
+		strcat(path, dent->d_name);
+
+		ret = perf_event__walk_cgroup_tree(tool, event, path,
+						   mount_len, process, machine);
+		if (ret < 0)
+			break;
+
+		path[pos] = '\0';
+	}
+
+	closedir(d);
+	return ret;
+}
+
+int perf_event__synthesize_cgroups(struct perf_tool *tool,
+				   perf_event__handler_t process,
+				   struct machine *machine)
+{
+	union perf_event event;
+	char *cgrp_root;
+	size_t mount_len;  /* length of mount point in the path */
+	int ret = -1;
+
+	cgrp_root = malloc(PATH_MAX);
+	if (cgrp_root == NULL)
+		return -1;
+
+	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX) < 0) {
+		pr_debug("cannot find cgroup mount point\n");
+		goto out;
+	}
+
+	mount_len = strlen(cgrp_root);
+	/* make sure the path starts with a slash (after mount point) */
+	strcat(cgrp_root, "/");
+
+	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
+					 process, machine) < 0)
+		goto out;
+
+	ret = 0;
+
+out:
+	free(cgrp_root);
+
+	return ret;
+}
+
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
 				   struct machine *machine)
 {
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index baead0cdc381..e7a3e9589738 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -45,6 +45,7 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handl
 int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_namespaces(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
+int perf_event__synthesize_cgroups(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
 int perf_event__synthesize_stat_config(struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 472ef5eb4068..3fb67bd31e4a 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -79,6 +79,7 @@ struct perf_tool {
 	bool		ordered_events;
 	bool		ordering_requires_timestamps;
 	bool		namespace_events;
+	bool		cgroup_events;
 	bool		no_warn;
 	enum show_feature_header show_feat_hdr;
 };
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 6/9] perf record: Support synthesizing cgroup events
  2019-12-20  4:32 [PATCHSET 0/9] perf: Improve cgroup profiling (v2) Namhyung Kim
@ 2019-12-20  4:32 ` Namhyung Kim
  0 siblings, 0 replies; 46+ messages in thread
From: Namhyung Kim @ 2019-12-20  4:32 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Alexander Shishkin, Mark Rutland, Stephane Eranian,
	LKML, linux-perf-users

Synthesize cgroup events by iterating cgroup filesystem directories.
The cgroup event only saves the portion of cgroup path after the mount
point and the inode number.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-record.c        |   5 ++
 tools/perf/util/cgroup.c           |   3 +-
 tools/perf/util/cgroup.h           |   1 +
 tools/perf/util/event.c            |   1 +
 tools/perf/util/synthetic-events.c | 120 +++++++++++++++++++++++++++++
 tools/perf/util/synthetic-events.h |   1 +
 tools/perf/util/tool.h             |   1 +
 7 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c301466101b..2802de9538ff 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1397,6 +1397,11 @@ static int record__synthesize(struct record *rec, bool tail)
 	if (err < 0)
 		pr_warning("Couldn't synthesize bpf events.\n");
 
+	err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
+					     machine);
+	if (err < 0)
+		pr_warning("Couldn't synthesize cgroup events.\n");
+
 	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->core.threads,
 					    process_synthesized_event, opts->sample_address,
 					    1);
diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 99d93f31732d..cf2a7fc24726 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -15,8 +15,7 @@ int nr_cgroups;
 
 static struct rb_root cgroup_tree = RB_ROOT;
 
-static int
-cgroupfs_find_mountpoint(char *buf, size_t maxlen)
+int cgroupfs_find_mountpoint(char *buf, size_t maxlen)
 {
 	FILE *fp;
 	char mountpoint[PATH_MAX + 1], tokens[PATH_MAX + 1], type[PATH_MAX + 1];
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 11a8b187ec09..755f9712eda4 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -17,6 +17,7 @@ struct cgroup {
 
 extern int nr_cgroups; /* number of explicit cgroups defined */
 
+int cgroupfs_find_mountpoint(char *buf, size_t maxlen);
 struct cgroup *cgroup__get(struct cgroup *cgroup);
 void cgroup__put(struct cgroup *cgroup);
 
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 824c038e5c33..28801c867f39 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -33,6 +33,7 @@
 #include "bpf-event.h"
 #include "tool.h"
 #include "../perf.h"
+#include "cgroup.h"
 
 static const char *perf_event__names[] = {
 	[0]					= "TOTAL",
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index c423298fe62d..7ed7c89f48e6 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -16,6 +16,7 @@
 #include "util/synthetic-events.h"
 #include "util/target.h"
 #include "util/time-utils.h"
+#include "util/cgroup.h"
 #include <linux/bitops.h>
 #include <linux/kernel.h>
 #include <linux/string.h>
@@ -413,6 +414,125 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	return rc;
 }
 
+static int perf_event__synthesize_cgroup(struct perf_tool *tool,
+					 union perf_event *event,
+					 char *path, size_t mount_len,
+					 perf_event__handler_t process,
+					 struct machine *machine)
+{
+	size_t event_size = sizeof(event->cgroup) - sizeof(event->cgroup.path);
+	size_t path_len = strlen(path) - mount_len + 1;
+	struct {
+		struct file_handle fh;
+		uint64_t cgroup_id;
+	} handle;
+	int mount_id;
+
+	while (path_len % sizeof(u64))
+		path[mount_len + path_len++] = '\0';
+
+	memset(&event->cgroup, 0, event_size);
+
+	event->cgroup.header.type = PERF_RECORD_CGROUP;
+	event->cgroup.header.size = event_size + path_len + machine->id_hdr_size;
+
+	handle.fh.handle_bytes = sizeof(handle.cgroup_id);
+	if (name_to_handle_at(AT_FDCWD, path, &handle.fh, &mount_id, 0) < 0) {
+		pr_debug("stat failed: %s\n", path);
+		return -1;
+	}
+
+	event->cgroup.id = handle.cgroup_id;
+	event->cgroup.path_len = path_len;
+	strncpy(event->cgroup.path, path + mount_len, path_len);
+	memset(event->cgroup.path + path_len, 0, machine->id_hdr_size);
+
+	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
+		pr_debug("process synth event failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
+					union perf_event *event,
+					char *path, size_t mount_len,
+					perf_event__handler_t process,
+					struct machine *machine)
+{
+	size_t pos = strlen(path);
+	DIR *d;
+	struct dirent *dent;
+	int ret = 0;
+
+	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
+					  process, machine) < 0)
+		return -1;
+
+	d = opendir(path);
+	if (d == NULL) {
+		pr_debug("failed to open directory: %s\n", path);
+		return -1;
+	}
+
+	while ((dent = readdir(d)) != NULL) {
+		if (dent->d_type != DT_DIR)
+			continue;
+		if (!strcmp(dent->d_name, ".") ||
+		    !strcmp(dent->d_name, ".."))
+			continue;
+
+		if (path[pos - 1] != '/')
+			strcat(path, "/");
+		strcat(path, dent->d_name);
+
+		ret = perf_event__walk_cgroup_tree(tool, event, path,
+						   mount_len, process, machine);
+		if (ret < 0)
+			break;
+
+		path[pos] = '\0';
+	}
+
+	closedir(d);
+	return ret;
+}
+
+int perf_event__synthesize_cgroups(struct perf_tool *tool,
+				   perf_event__handler_t process,
+				   struct machine *machine)
+{
+	union perf_event event;
+	char *cgrp_root;
+	size_t mount_len;  /* length of mount point in the path */
+	int ret = -1;
+
+	cgrp_root = malloc(PATH_MAX);
+	if (cgrp_root == NULL)
+		return -1;
+
+	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX) < 0) {
+		pr_debug("cannot find cgroup mount point\n");
+		goto out;
+	}
+
+	mount_len = strlen(cgrp_root);
+	/* make sure the path starts with a slash (after mount point) */
+	strcat(cgrp_root, "/");
+
+	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
+					 process, machine) < 0)
+		goto out;
+
+	ret = 0;
+
+out:
+	free(cgrp_root);
+
+	return ret;
+}
+
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
 				   struct machine *machine)
 {
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index baead0cdc381..e7a3e9589738 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -45,6 +45,7 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handl
 int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_namespaces(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
+int perf_event__synthesize_cgroups(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
 int perf_event__synthesize_stat_config(struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 472ef5eb4068..3fb67bd31e4a 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -79,6 +79,7 @@ struct perf_tool {
 	bool		ordered_events;
 	bool		ordering_requires_timestamps;
 	bool		namespace_events;
+	bool		cgroup_events;
 	bool		no_warn;
 	enum show_feature_header show_feat_hdr;
 };
-- 
2.24.1.735.g03f4e72817-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 6/9] perf record: Support synthesizing cgroup events
  2019-08-28  7:31 [PATCHSET 0/9] perf: Improve cgroup profiling (v1) Namhyung Kim
@ 2019-08-28  7:31 ` Namhyung Kim
  0 siblings, 0 replies; 46+ messages in thread
From: Namhyung Kim @ 2019-08-28  7:31 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: LKML, Jiri Olsa, Alexander Shishkin, Stephane Eranian

Synthesize cgroup events by iterating cgroup filesystem directories.
The cgroup event only saves the portion of cgroup path after the mount
point and the inode number.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-record.c |   5 ++
 tools/perf/util/cgroup.c    |   3 +-
 tools/perf/util/cgroup.h    |   1 +
 tools/perf/util/event.c     | 115 ++++++++++++++++++++++++++++++++++++
 tools/perf/util/event.h     |   4 ++
 tools/perf/util/tool.h      |   1 +
 6 files changed, 127 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 359bb8f33e57..a6e3c4413b39 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1318,6 +1318,11 @@ static int record__synthesize(struct record *rec, bool tail)
 	if (err < 0)
 		pr_warning("Couldn't synthesize bpf events.\n");
 
+	err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
+					     machine);
+	if (err < 0)
+		pr_warning("Couldn't synthesize cgroup events.\n");
+
 	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->core.threads,
 					    process_synthesized_event, opts->sample_address,
 					    1);
diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 8e4c26ea5078..274f0f29c72d 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -14,8 +14,7 @@ int nr_cgroups;
 
 static struct rb_root cgroup_tree = RB_ROOT;
 
-static int
-cgroupfs_find_mountpoint(char *buf, size_t maxlen)
+int cgroupfs_find_mountpoint(char *buf, size_t maxlen)
 {
 	FILE *fp;
 	char mountpoint[PATH_MAX + 1], tokens[PATH_MAX + 1], type[PATH_MAX + 1];
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 11a8b187ec09..755f9712eda4 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -17,6 +17,7 @@ struct cgroup {
 
 extern int nr_cgroups; /* number of explicit cgroups defined */
 
+int cgroupfs_find_mountpoint(char *buf, size_t maxlen);
 struct cgroup *cgroup__get(struct cgroup *cgroup);
 void cgroup__put(struct cgroup *cgroup);
 
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index c19b00c1fc26..9e71b9561f72 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -29,6 +29,7 @@
 #include "stat.h"
 #include "session.h"
 #include "bpf-event.h"
+#include "cgroup.h"
 
 #define DEFAULT_PROC_MAP_PARSE_TIMEOUT 500
 
@@ -296,6 +297,120 @@ int perf_event__synthesize_namespaces(struct perf_tool *tool,
 	return 0;
 }
 
+static int perf_event__synthesize_cgroup(struct perf_tool *tool,
+					 union perf_event *event,
+					 char *path, size_t mount_len,
+					 perf_event__handler_t process,
+					 struct machine *machine)
+{
+	size_t event_size = sizeof(event->cgroup) - sizeof(event->cgroup.path);
+	size_t path_len = strlen(path) - mount_len + 1;
+	struct stat64 stbuf;
+
+	while (path_len % sizeof(u64))
+		path[mount_len + path_len++] = '\0';
+
+	memset(&event->cgroup, 0, event_size);
+
+	event->cgroup.header.type = PERF_RECORD_CGROUP;
+	event->cgroup.header.size = event_size + path_len + machine->id_hdr_size;
+
+	if (stat64(path, &stbuf) < 0) {
+		pr_debug("stat failed: %s\n", path);
+		return -1;
+	}
+
+	event->cgroup.ino = stbuf.st_ino;
+	event->cgroup.path_len = path_len;
+	strncpy(event->cgroup.path, path + mount_len, path_len);
+	memset(event->cgroup.path + path_len, 0, machine->id_hdr_size);
+
+	if (perf_tool__process_synth_event(tool, event, machine, process) < 0) {
+		pr_debug("process synth event failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int perf_event__walk_cgroup_tree(struct perf_tool *tool,
+					union perf_event *event,
+					char *path, size_t mount_len,
+					perf_event__handler_t process,
+					struct machine *machine)
+{
+	size_t pos = strlen(path);
+	DIR *d;
+	struct dirent *dent;
+	int ret = 0;
+
+	if (perf_event__synthesize_cgroup(tool, event, path, mount_len,
+					  process, machine) < 0)
+		return -1;
+
+	d = opendir(path);
+	if (d == NULL) {
+		pr_debug("failed to open directory: %s\n", path);
+		return -1;
+	}
+
+	while ((dent = readdir(d)) != NULL) {
+		if (dent->d_type != DT_DIR)
+			continue;
+		if (!strcmp(dent->d_name, ".") ||
+		    !strcmp(dent->d_name, ".."))
+			continue;
+
+		if (path[pos - 1] != '/')
+			strcat(path, "/");
+		strcat(path, dent->d_name);
+
+		ret = perf_event__walk_cgroup_tree(tool, event, path,
+						   mount_len, process, machine);
+		if (ret < 0)
+			break;
+
+		path[pos] = '\0';
+	}
+
+	closedir(d);
+	return ret;
+}
+
+int perf_event__synthesize_cgroups(struct perf_tool *tool,
+				   perf_event__handler_t process,
+				   struct machine *machine)
+{
+	union perf_event event;
+	char *cgrp_root;
+	size_t mount_len;  /* length of mount point in the path */
+	int ret = -1;
+
+	cgrp_root = malloc(PATH_MAX);
+	if (cgrp_root == NULL)
+		return -1;
+
+	if (cgroupfs_find_mountpoint(cgrp_root, PATH_MAX) < 0) {
+		pr_debug("cannot find cgroup mount point\n");
+		goto out;
+	}
+
+	mount_len = strlen(cgrp_root);
+	/* make sure the path starts with a slash (after mount point) */
+	strcat(cgrp_root, "/");
+
+	if (perf_event__walk_cgroup_tree(tool, &event, cgrp_root, mount_len,
+					 process, machine) < 0)
+		goto out;
+
+	ret = 0;
+
+out:
+	free(cgrp_root);
+
+	return ret;
+}
+
 static int perf_event__synthesize_fork(struct perf_tool *tool,
 				       union perf_event *event,
 				       pid_t pid, pid_t tgid, pid_t ppid,
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 0170435fd1e8..b4c4da69a771 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -735,6 +735,10 @@ int perf_event__synthesize_namespaces(struct perf_tool *tool,
 				      perf_event__handler_t process,
 				      struct machine *machine);
 
+int perf_event__synthesize_cgroups(struct perf_tool *tool,
+				   perf_event__handler_t process,
+				   struct machine *machine);
+
 int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 				       union perf_event *event,
 				       pid_t pid, pid_t tgid,
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 472ef5eb4068..3fb67bd31e4a 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -79,6 +79,7 @@ struct perf_tool {
 	bool		ordered_events;
 	bool		ordering_requires_timestamps;
 	bool		namespace_events;
+	bool		cgroup_events;
 	bool		no_warn;
 	enum show_feature_header show_feat_hdr;
 };
-- 
2.23.0.187.g17f5b7556c-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2020-04-04  8:49 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-25 12:45 [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Namhyung Kim
2020-03-25 12:45 ` [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event Namhyung Kim
2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
2020-03-25 12:45 ` [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature Namhyung Kim
2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
2020-03-25 12:45 ` [PATCH 3/9] perf tools: Basic support for CGROUP event Namhyung Kim
2020-03-27 14:06   ` Arnaldo Carvalho de Melo
2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
2020-03-25 12:45 ` [PATCH 4/9] perf tools: Maintain cgroup hierarchy Namhyung Kim
2020-04-03 12:36   ` Arnaldo Carvalho de Melo
2020-04-04  1:29     ` Namhyung Kim
2020-04-04  8:41     ` [tip: perf/urgent] perf python: Include rwsem.c in the pythong biding tip-bot2 for Arnaldo Carvalho de Melo
2020-04-04  8:41   ` [tip: perf/urgent] perf cgroup: Maintain cgroup hierarchy tip-bot2 for Namhyung Kim
2020-03-25 12:45 ` [PATCH 5/9] perf report: Add 'cgroup' sort key Namhyung Kim
2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
2020-03-25 12:45 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
2020-03-30 16:30   ` Arnaldo Carvalho de Melo
2020-03-30 16:43     ` Arnaldo Carvalho de Melo
2020-03-31 16:04       ` Arnaldo Carvalho de Melo
2020-04-01 23:22       ` Namhyung Kim
2020-04-02  1:52         ` [PATCH] perf tools: Add file-handle feature test Namhyung Kim
2020-04-02 15:37           ` Arnaldo Carvalho de Melo
2020-04-02 15:46             ` Arnaldo Carvalho de Melo
2020-04-02 18:37             ` Arnaldo Carvalho de Melo
2020-04-04  8:41           ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
2020-04-04  8:41   ` [tip: perf/urgent] perf record: Support synthesizing cgroup events tip-bot2 for Namhyung Kim
2020-03-25 12:45 ` [PATCH 7/9] perf record: Add --all-cgroups option Namhyung Kim
2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
2020-03-25 12:45 ` [PATCH 8/9] perf top: " Namhyung Kim
2020-03-27 16:35   ` Arnaldo Carvalho de Melo
2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
2020-03-25 12:45 ` [PATCH 9/9] perf script: Add --show-cgroup-events option Namhyung Kim
2020-03-27 16:39   ` Arnaldo Carvalho de Melo
2020-04-04  8:41   ` [tip: perf/urgent] " tip-bot2 for Namhyung Kim
2020-03-25 13:05 ` [PATCHSET 0/9] perf: Improve cgroup profiling (v6) Arnaldo Carvalho de Melo
2020-03-25 13:19   ` Namhyung Kim
  -- strict thread matches above, loose matches on Subject: below --
2020-01-07 13:34 [PATCHSET 0/9] perf: Improve cgroup profiling (v4) Namhyung Kim
2020-01-07 13:34 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
2020-01-08 22:08   ` Jiri Olsa
2020-01-09  5:19     ` Namhyung Kim
2020-01-08 22:18   ` Jiri Olsa
2020-01-09  7:50     ` Namhyung Kim
2020-01-08 22:21   ` Jiri Olsa
2020-01-09  7:51     ` Namhyung Kim
2019-12-23  6:07 [PATCHSET 0/9] perf: Improve cgroup profiling (v3) Namhyung Kim
2019-12-23  6:07 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
2019-12-20  4:32 [PATCHSET 0/9] perf: Improve cgroup profiling (v2) Namhyung Kim
2019-12-20  4:32 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim
2019-08-28  7:31 [PATCHSET 0/9] perf: Improve cgroup profiling (v1) Namhyung Kim
2019-08-28  7:31 ` [PATCH 6/9] perf record: Support synthesizing cgroup events Namhyung Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).