linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/3] perf: add support for analyzing events for containers
@ 2016-12-12 18:19 Hari Bathini
  2016-12-12 18:19 ` [PATCH v3 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info Hari Bathini
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Hari Bathini @ 2016-12-12 18:19 UTC
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg

Currently, there is no straightforward way to analyze events per
container. perf -G can be used, but it does not filter events for
containers created after perf is invoked, which makes it difficult to
assess or analyze performance issues across multiple containers at once.

This patch-set overcomes this limitation by using the cgroup identifier
as a unique container identifier. A new PERF_RECORD_NAMESPACES event that
records namespace-related info is introduced, and the cgroup namespace's
inode number from that event is used as the cgroup identifier. This is
based on the assumption that each container is created with its own
cgroup namespace, which allows assessment/analysis of multiple containers
using the cgroup identifier.

The first patch introduces PERF_RECORD_NAMESPACES in the kernel, while
the second patch makes the corresponding changes in the perf tool to read
these PERF_RECORD_NAMESPACES events. The third patch adds a cgroup
identifier column to perf report, which is simply the cgroup namespace's
inode number.
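
For readers who want to see which value serves as the cgroup identifier,
here is a minimal user-space sketch. It is not part of the patch-set:
ns_inode() and self_cgroup_ns_inode() are hypothetical helper names, and
the sketch assumes procfs is mounted at /proc and that the kernel exposes
/proc/<pid>/ns/cgroup.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch-set): look up the inode number
 * behind /proc/<pid>/ns/<ns_name>, for example ns_name = "cgroup".
 * Returns 0 and stores the inode number on success, -1 on failure.
 */
int ns_inode(pid_t pid, const char *ns_name, unsigned long long *ino)
{
	char path[128];
	struct stat sb;
	int len;

	assert(ns_name != NULL && ino != NULL);

	len = snprintf(path, sizeof(path), "/proc/%d/ns/%s", (int)pid, ns_name);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;

	/* stat() follows the ns symlink, so st_ino is the namespace inode. */
	if (stat(path, &sb) != 0)
		return -1;

	*ino = (unsigned long long)sb.st_ino;
	return 0;
}

/* Convenience wrapper: cgroup namespace inode of the calling process. */
int self_cgroup_ns_inode(unsigned long long *ino)
{
	return ns_inode(getpid(), "cgroup", ino);
}
```

Two processes that report the same number here share a cgroup namespace,
which is the property the cover letter relies on when it treats that
number as a per-container identifier.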

---

Hari Bathini (3):
      perf: add PERF_RECORD_NAMESPACES to include namespaces related info
      perf tool: add PERF_RECORD_NAMESPACES to include namespaces related info
      perf tool: add cgroup identifier entry in perf report


 include/linux/perf_event.h            |    2 
 include/uapi/linux/perf_event.h       |   27 ++++++-
 kernel/events/core.c                  |  134 +++++++++++++++++++++++++++++++++
 kernel/fork.c                         |    3 +
 kernel/nsproxy.c                      |    5 +
 tools/include/uapi/linux/perf_event.h |   27 ++++++-
 tools/perf/builtin-annotate.c         |    1 
 tools/perf/builtin-diff.c             |    1 
 tools/perf/builtin-inject.c           |   14 +++
 tools/perf/builtin-kmem.c             |    1 
 tools/perf/builtin-kvm.c              |    2 
 tools/perf/builtin-lock.c             |    1 
 tools/perf/builtin-mem.c              |    1 
 tools/perf/builtin-record.c           |   33 +++++++-
 tools/perf/builtin-report.c           |    1 
 tools/perf/builtin-sched.c            |    1 
 tools/perf/builtin-script.c           |   41 ++++++++++
 tools/perf/builtin-trace.c            |    3 -
 tools/perf/perf.h                     |    1 
 tools/perf/util/Build                 |    1 
 tools/perf/util/data-convert-bt.c     |    2 
 tools/perf/util/event.c               |  127 ++++++++++++++++++++++++++++++-
 tools/perf/util/event.h               |   19 +++++
 tools/perf/util/evsel.c               |    3 +
 tools/perf/util/hist.c                |    4 +
 tools/perf/util/hist.h                |    1 
 tools/perf/util/machine.c             |   25 ++++++
 tools/perf/util/machine.h             |    3 +
 tools/perf/util/namespaces.c          |   28 +++++++
 tools/perf/util/namespaces.h          |   19 +++++
 tools/perf/util/session.c             |    7 ++
 tools/perf/util/sort.c                |   22 +++++
 tools/perf/util/sort.h                |    2 
 tools/perf/util/thread.c              |   44 ++++++++++-
 tools/perf/util/thread.h              |    6 +
 tools/perf/util/tool.h                |    2 
 36 files changed, 599 insertions(+), 15 deletions(-)
 create mode 100644 tools/perf/util/namespaces.c
 create mode 100644 tools/perf/util/namespaces.h


* [PATCH v3 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info
  2016-12-12 18:19 [PATCH v3 0/3] perf: add support for analyzing events for containers Hari Bathini
@ 2016-12-12 18:19 ` Hari Bathini
  2016-12-12 18:27   ` Eric W. Biederman
  2016-12-12 18:19 ` [PATCH v3 2/3] perf tool: " Hari Bathini
  2016-12-12 18:20 ` [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report Hari Bathini
  2 siblings, 1 reply; 14+ messages in thread
From: Hari Bathini @ 2016-12-12 18:19 UTC
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg

With the advent of container technologies like Docker, which depend
on namespaces for isolation, there is a need for tracing support for
namespaces. This patch introduces a new PERF_RECORD_NAMESPACES event
for tracing based on namespace-related info.
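
As a reading aid, the fixed part of the record that this version of the
patch documents in the uapi header comment can be mirrored in user space
roughly as follows. This is only a sketch of the v3 layout as posted: the
struct and function names (record_header, namespaces_record,
namespaces_record_size, namespaces_record_cgroup_id) are illustrative
assumptions, and the trailing sample_id portion is treated as an opaque
byte count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index values as listed in this patch's uapi enum. */
enum {
	NET_NS_INDEX	= 0,
	UTS_NS_INDEX	= 1,
	IPC_NS_INDEX	= 2,
	PID_NS_INDEX	= 3,
	USER_NS_INDEX	= 4,
	MNT_NS_INDEX	= 5,
	CGROUP_NS_INDEX	= 6,

	NAMESPACES_MAX,
};

/* Illustrative mirror of struct perf_event_header (u32 type, u16 misc, u16 size). */
struct record_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Illustrative mirror of the fixed part of the v3 NAMESPACES record. */
struct namespaces_record {
	struct record_header header;
	uint32_t pid, tid;
	uint64_t dev_num;
	uint64_t inode_num[NAMESPACES_MAX];
};

/* The layout has no padding: 8 + 4 + 4 + 8 + 7 * 8 = 80 bytes. */
static_assert(offsetof(struct namespaces_record, inode_num) == 24,
	      "inode_num starts right after dev_num");
static_assert(sizeof(struct namespaces_record) == 80,
	      "fixed part of the record is 80 bytes");

/* Total record size once the variable-length sample_id part is appended. */
size_t namespaces_record_size(size_t id_hdr_size)
{
	return sizeof(struct namespaces_record) + id_hdr_size;
}

/* The cgroup namespace inode is the value the series uses as container id. */
uint64_t namespaces_record_cgroup_id(const struct namespaces_record *rec)
{
	return rec->inode_num[CGROUP_NS_INDEX];
}
```

The sketch only reflects the layout in this posting; a consumer of real
perf data should rely on the definitions in the installed
linux/perf_event.h rather than on this mirror.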

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---

Changes from v2:
* Use time value from sample_id.


 include/linux/perf_event.h      |    2 +
 include/uapi/linux/perf_event.h |   27 ++++++++
 kernel/events/core.c            |  134 +++++++++++++++++++++++++++++++++++++++
 kernel/fork.c                   |    3 +
 kernel/nsproxy.c                |    5 +
 5 files changed, 170 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 4741ecd..42d8aa6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1110,6 +1110,7 @@ extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks
 
 extern void perf_event_exec(void);
 extern void perf_event_comm(struct task_struct *tsk, bool exec);
+extern void perf_event_namespaces(struct task_struct *tsk);
 extern void perf_event_fork(struct task_struct *tsk);
 
 /* Callchains */
@@ -1312,6 +1313,7 @@ static inline int perf_unregister_guest_info_callbacks
 static inline void perf_event_mmap(struct vm_area_struct *vma)		{ }
 static inline void perf_event_exec(void)				{ }
 static inline void perf_event_comm(struct task_struct *tsk, bool exec)	{ }
+static inline void perf_event_namespaces(struct task_struct *tsk)	{ }
 static inline void perf_event_fork(struct task_struct *tsk)		{ }
 static inline void perf_event_init(void)				{ }
 static inline int  perf_swevent_get_recursion_context(void)		{ return -1; }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485..2a48fc6 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -344,7 +344,8 @@ struct perf_event_attr {
 				use_clockid    :  1, /* use @clockid for time fields */
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
-				__reserved_1   : 36;
+				namespaces     :  1, /* include namespaces data */
+				__reserved_1   : 35;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -610,6 +611,18 @@ struct perf_event_header {
 	__u16	size;
 };
 
+enum {
+	NET_NS_INDEX		= 0,
+	UTS_NS_INDEX		= 1,
+	IPC_NS_INDEX		= 2,
+	PID_NS_INDEX		= 3,
+	USER_NS_INDEX		= 4,
+	MNT_NS_INDEX		= 5,
+	CGROUP_NS_INDEX		= 6,
+
+	NAMESPACES_MAX,		/* maximum available namespaces */
+};
+
 enum perf_event_type {
 
 	/*
@@ -862,6 +875,18 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *
+	 *	u32				pid, tid;
+	 *	u64				dev_num;
+	 *	u64				inode_num[NAMESPACES_MAX];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_NAMESPACES			= 16,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 02c8421..eb9c812 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -46,6 +46,10 @@
 #include <linux/filter.h>
 #include <linux/namei.h>
 #include <linux/parser.h>
+#include <linux/proc_ns.h>
+#include <linux/mount.h>
+#include <linux/ipc_namespace.h>
+#include <linux/utsname.h>
 
 #include "internal.h"
 
@@ -375,6 +379,7 @@ static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events);
 
 static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
+static atomic_t nr_namespaces_events __read_mostly;
 static atomic_t nr_task_events __read_mostly;
 static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
@@ -3882,6 +3887,8 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_mmap_events);
 	if (event->attr.comm)
 		atomic_dec(&nr_comm_events);
+	if (event->attr.namespaces)
+		atomic_dec(&nr_namespaces_events);
 	if (event->attr.task)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
@@ -6382,6 +6389,7 @@ static void perf_event_task(struct task_struct *task,
 void perf_event_fork(struct task_struct *task)
 {
 	perf_event_task(task, NULL, 1);
+	perf_event_namespaces(task);
 }
 
 /*
@@ -6484,6 +6492,125 @@ void perf_event_comm(struct task_struct *task, bool exec)
 }
 
 /*
+ * namespaces tracking
+ */
+
+struct namespaces_event_id {
+	struct perf_event_header	header;
+	u32				pid;
+	u32				tid;
+	u64				dev_num;
+	u64				inode_num[NAMESPACES_MAX];
+};
+
+struct perf_namespaces_event {
+	struct task_struct		*task;
+
+	struct namespaces_event_id	event_id;
+};
+
+static int perf_event_namespaces_match(struct perf_event *event)
+{
+	return event->attr.namespaces;
+}
+
+static void perf_event_namespaces_output(struct perf_event *event,
+					 void *data)
+{
+	struct perf_namespaces_event *namespaces_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	struct path ns_path;
+	struct task_struct *task = namespaces_event->task;
+	struct namespaces_event_id *ei = &namespaces_event->event_id;
+	struct ns_common *ns_common;
+	struct nsproxy *nsproxy;
+	void *error;
+	int ret;
+
+	if (!perf_event_namespaces_match(event))
+		return;
+
+	perf_event_header__init_id(&ei->header, &sample, event);
+	ret = perf_output_begin(&handle, event, ei->header.size);
+	if (ret)
+		return;
+
+	ei->pid = perf_event_pid(event, task);
+	ei->tid = perf_event_tid(event, task);
+
+	error = ns_get_path(&ns_path, task, &mntns_operations);
+	if (!error)
+		ei->dev_num = ns_path.mnt->mnt_sb->s_dev;
+
+	ns_common = mntns_operations.get(task);
+	ei->inode_num[MNT_NS_INDEX] = ns_common->inum;
+	mntns_operations.put(ns_common);
+
+#ifdef CONFIG_USER_NS
+	ei->inode_num[USER_NS_INDEX] = __task_cred(task)->user_ns->ns.inum;
+#endif
+
+	if (task != current)
+		task_lock(task);
+
+	nsproxy = task->nsproxy;
+	if (nsproxy != NULL) {
+#ifdef CONFIG_NET_NS
+		ei->inode_num[NET_NS_INDEX] = nsproxy->net_ns->ns.inum;
+#endif
+#ifdef CONFIG_UTS_NS
+		ei->inode_num[UTS_NS_INDEX] = nsproxy->uts_ns->ns.inum;
+#endif
+#ifdef CONFIG_IPC_NS
+		ei->inode_num[IPC_NS_INDEX] = nsproxy->ipc_ns->ns.inum;
+#endif
+#ifdef CONFIG_PID_NS
+		ei->inode_num[PID_NS_INDEX] = nsproxy->pid_ns_for_children->ns.inum;
+#endif
+#ifdef CONFIG_CGROUPS
+		ei->inode_num[CGROUP_NS_INDEX] = nsproxy->cgroup_ns->ns.inum;
+#endif
+	}
+
+	if (task != current)
+		task_unlock(task);
+
+	perf_output_put(&handle, namespaces_event->event_id);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+}
+
+void perf_event_namespaces(struct task_struct *task)
+{
+	struct perf_namespaces_event namespaces_event;
+
+	if (!atomic_read(&nr_namespaces_events))
+		return;
+
+	namespaces_event = (struct perf_namespaces_event){
+		.task	= task,
+		.event_id  = {
+			.header = {
+				.type = PERF_RECORD_NAMESPACES,
+				.misc = 0,
+				.size = sizeof(namespaces_event.event_id),
+			},
+			/* .pid */
+			/* .tid */
+			/* .dev_num */
+			/* .inode_num[NAMESPACES_MAX] */
+		},
+	};
+
+	perf_iterate_sb(perf_event_namespaces_output,
+			&namespaces_event,
+			NULL);
+}
+
+/*
  * mmap tracking
  */
 
@@ -9028,6 +9155,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_mmap_events);
 	if (event->attr.comm)
 		atomic_inc(&nr_comm_events);
+	if (event->attr.namespaces)
+		atomic_inc(&nr_namespaces_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
 	if (event->attr.freq)
@@ -9542,6 +9671,11 @@ SYSCALL_DEFINE5(perf_event_open,
 			return -EACCES;
 	}
 
+	if (attr.namespaces) {
+		if (!capable(CAP_SYS_ADMIN))
+			return -EACCES;
+	}
+
 	if (attr.freq) {
 		if (attr.sample_freq > sysctl_perf_event_sample_rate)
 			return -EINVAL;
diff --git a/kernel/fork.c b/kernel/fork.c
index 997ac1d..399a44b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2280,6 +2280,9 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
 		free_fs_struct(new_fs);
 
 bad_unshare_out:
+	if (!err)
+		perf_event_namespaces(current);
+
 	return err;
 }
 
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 782102e..4c25e6e 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -26,6 +26,7 @@
 #include <linux/file.h>
 #include <linux/syscalls.h>
 #include <linux/cgroup.h>
+#include <linux/perf_event.h>
 
 static struct kmem_cache *nsproxy_cachep;
 
@@ -264,6 +265,10 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype)
 	switch_task_namespaces(tsk, new_nsproxy);
 out:
 	fput(file);
+
+	if (!err)
+		perf_event_namespaces(tsk);
+
 	return err;
 }
 


* [PATCH v3 2/3] perf tool: add PERF_RECORD_NAMESPACES to include namespaces related info
  2016-12-12 18:19 [PATCH v3 0/3] perf: add support for analyzing events for containers Hari Bathini
  2016-12-12 18:19 ` [PATCH v3 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info Hari Bathini
@ 2016-12-12 18:19 ` Hari Bathini
  2016-12-12 21:51   ` Eric W. Biederman
  2016-12-12 18:20 ` [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report Hari Bathini
  2 siblings, 1 reply; 14+ messages in thread
From: Hari Bathini @ 2016-12-12 18:19 UTC
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg

This patch updates perf tool to examine PERF_RECORD_NAMESPACES events
emitted by the kernel when fork, clone, setns or unshare are invoked.
Also, it synthesizes PERF_RECORD_NAMESPACES events for processes that
were running prior to invocation of perf record, the data for which
is taken from /proc/$PID/ns. These changes make way for analyzing
events with regard to namespaces.
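
The synthesis step described above, reading namespace inode numbers from
/proc/$PID/ns for an already-running process, can be sketched as a small
table-driven loop. This is an illustration, not the code in the patch:
read_ns_inodes(), ns_names, NS_COUNT and NS_MNT_SLOT are hypothetical
names, and the slot order is assumed to follow the index enum added by
this series (net, uts, ipc, pid, user, mnt, cgroup).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>	/* getpid(), used when calling the helper on oneself */

#define NS_COUNT	7	/* assumed equal to NAMESPACES_MAX in this series */
#define NS_MNT_SLOT	5	/* assumed equal to MNT_NS_INDEX in this series */

/* Entry names under /proc/<pid>/ns, in the assumed index order. */
static const char *const ns_names[NS_COUNT] = {
	"net", "uts", "ipc", "pid", "user", "mnt", "cgroup",
};

/*
 * Hypothetical helper: fill inode_num[] from /proc/<pid>/ns/<name> and
 * take dev_num from the mnt entry. Entries that cannot be read stay 0.
 * Returns the number of namespaces that were read successfully.
 */
int read_ns_inodes(pid_t pid, uint64_t inode_num[NS_COUNT], uint64_t *dev_num)
{
	int found = 0;

	assert(inode_num != NULL && dev_num != NULL);

	memset(inode_num, 0, NS_COUNT * sizeof(inode_num[0]));
	*dev_num = 0;

	for (int i = 0; i < NS_COUNT; i++) {
		char path[128];
		struct stat sb;

		snprintf(path, sizeof(path), "/proc/%d/ns/%s",
			 (int)pid, ns_names[i]);
		if (stat(path, &sb) != 0)
			continue;	/* namespace type missing or not readable */

		inode_num[i] = (uint64_t)sb.st_ino;
		if (i == NS_MNT_SLOT)
			*dev_num = (uint64_t)sb.st_dev;
		found++;
	}

	return found;
}
```

Calling read_ns_inodes(getpid(), inodes, &dev) fills the array for the
current process. Leaving unreadable entries at zero matches the intent of
the description above, where a missing /proc entry simply yields no data
for that namespace.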

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---

Changes from v2:
* Don't synthesize namespace events when "--namespaces" is not
  specified with perf-record.


 tools/include/uapi/linux/perf_event.h |   27 +++++++
 tools/perf/builtin-annotate.c         |    1 
 tools/perf/builtin-diff.c             |    1 
 tools/perf/builtin-inject.c           |   14 ++++
 tools/perf/builtin-kmem.c             |    1 
 tools/perf/builtin-kvm.c              |    2 +
 tools/perf/builtin-lock.c             |    1 
 tools/perf/builtin-mem.c              |    1 
 tools/perf/builtin-record.c           |   33 ++++++++-
 tools/perf/builtin-report.c           |    1 
 tools/perf/builtin-sched.c            |    1 
 tools/perf/builtin-script.c           |   41 +++++++++++
 tools/perf/builtin-trace.c            |    3 +
 tools/perf/perf.h                     |    1 
 tools/perf/util/Build                 |    1 
 tools/perf/util/data-convert-bt.c     |    2 +
 tools/perf/util/event.c               |  127 +++++++++++++++++++++++++++++++--
 tools/perf/util/event.h               |   19 +++++
 tools/perf/util/evsel.c               |    3 +
 tools/perf/util/machine.c             |   25 ++++++
 tools/perf/util/machine.h             |    3 +
 tools/perf/util/namespaces.c          |   28 +++++++
 tools/perf/util/namespaces.h          |   19 +++++
 tools/perf/util/session.c             |    7 ++
 tools/perf/util/thread.c              |   44 +++++++++++
 tools/perf/util/thread.h              |    6 ++
 tools/perf/util/tool.h                |    2 +
 27 files changed, 400 insertions(+), 14 deletions(-)
 create mode 100644 tools/perf/util/namespaces.c
 create mode 100644 tools/perf/util/namespaces.h

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index c66a485..2a48fc6 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -344,7 +344,8 @@ struct perf_event_attr {
 				use_clockid    :  1, /* use @clockid for time fields */
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
-				__reserved_1   : 36;
+				namespaces     :  1, /* include namespaces data */
+				__reserved_1   : 35;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -610,6 +611,18 @@ struct perf_event_header {
 	__u16	size;
 };
 
+enum {
+	NET_NS_INDEX		= 0,
+	UTS_NS_INDEX		= 1,
+	IPC_NS_INDEX		= 2,
+	PID_NS_INDEX		= 3,
+	USER_NS_INDEX		= 4,
+	MNT_NS_INDEX		= 5,
+	CGROUP_NS_INDEX		= 6,
+
+	NAMESPACES_MAX,		/* maximum available namespaces */
+};
+
 enum perf_event_type {
 
 	/*
@@ -862,6 +875,18 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *
+	 *	u32				pid, tid;
+	 *	u64				dev_num;
+	 *	u64				inode_num[NAMESPACES_MAX];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_NAMESPACES			= 16,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index ebb6283..1b63dc4 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -393,6 +393,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
 			.comm	= perf_event__process_comm,
 			.exit	= perf_event__process_exit,
 			.fork	= perf_event__process_fork,
+			.namespaces = perf_event__process_namespaces,
 			.ordered_events = true,
 			.ordering_requires_timestamps = true,
 		},
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 9ff0db4..c52552f 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -354,6 +354,7 @@ static struct perf_tool tool = {
 	.exit	= perf_event__process_exit,
 	.fork	= perf_event__process_fork,
 	.lost	= perf_event__process_lost,
+	.namespaces = perf_event__process_namespaces,
 	.ordered_events = true,
 	.ordering_requires_timestamps = true,
 };
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index b9bc7e3..c5ddc73 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -333,6 +333,19 @@ static int perf_event__repipe_comm(struct perf_tool *tool,
 	return err;
 }
 
+static int perf_event__repipe_namespaces(struct perf_tool *tool,
+					 union perf_event *event,
+					 struct perf_sample *sample,
+					 struct machine *machine)
+{
+	int err;
+
+	err = perf_event__process_namespaces(tool, event, sample, machine);
+	perf_event__repipe(tool, event, sample, machine);
+
+	return err;
+}
+
 static int perf_event__repipe_exit(struct perf_tool *tool,
 				   union perf_event *event,
 				   struct perf_sample *sample,
@@ -660,6 +673,7 @@ static int __cmd_inject(struct perf_inject *inject)
 		session->itrace_synth_opts = &inject->itrace_synth_opts;
 		inject->itrace_synth_opts.inject = true;
 		inject->tool.comm	    = perf_event__repipe_comm;
+		inject->tool.namespaces	    = perf_event__repipe_namespaces;
 		inject->tool.exit	    = perf_event__repipe_exit;
 		inject->tool.id_index	    = perf_event__repipe_id_index;
 		inject->tool.auxtrace_info  = perf_event__process_auxtrace_info;
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index d426dcb..a60fab0 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -943,6 +943,7 @@ static struct perf_tool perf_kmem = {
 	.comm		 = perf_event__process_comm,
 	.mmap		 = perf_event__process_mmap,
 	.mmap2		 = perf_event__process_mmap2,
+	.namespaces	 = perf_event__process_namespaces,
 	.ordered_events	 = true,
 };
 
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 08fa88f..18e6c38 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1044,6 +1044,7 @@ static int read_events(struct perf_kvm_stat *kvm)
 	struct perf_tool eops = {
 		.sample			= process_sample_event,
 		.comm			= perf_event__process_comm,
+		.namespaces		= perf_event__process_namespaces,
 		.ordered_events		= true,
 	};
 	struct perf_data_file file = {
@@ -1348,6 +1349,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
 	kvm->tool.exit   = perf_event__process_exit;
 	kvm->tool.fork   = perf_event__process_fork;
 	kvm->tool.lost   = process_lost_event;
+	kvm->tool.namespaces  = perf_event__process_namespaces;
 	kvm->tool.ordered_events = true;
 	perf_tool__fill_defaults(&kvm->tool);
 
diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index ce3bfb4..d750cca 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -858,6 +858,7 @@ static int __cmd_report(bool display_info)
 	struct perf_tool eops = {
 		.sample		 = process_sample_event,
 		.comm		 = perf_event__process_comm,
+		.namespaces	 = perf_event__process_namespaces,
 		.ordered_events	 = true,
 	};
 	struct perf_data_file file = {
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index d1ce29b..da55056 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -342,6 +342,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused)
 			.lost		= perf_event__process_lost,
 			.fork		= perf_event__process_fork,
 			.build_id	= perf_event__process_build_id,
+			.namespaces	= perf_event__process_namespaces,
 			.ordered_events	= true,
 		},
 		.input_name		 = "perf.data",
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 67d2a90..b360e85 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -834,6 +834,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	signal(SIGINT, sig_handler);
 	signal(SIGTERM, sig_handler);
 
+	if (rec->opts.record_namespaces)
+		tool->namespace_events = true;
+
 	if (rec->opts.auxtrace_snapshot_mode || rec->switch_output) {
 		signal(SIGUSR2, snapshot_sig_handler);
 		if (rec->opts.auxtrace_snapshot_mode)
@@ -941,6 +944,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	 */
 	if (forks) {
 		union perf_event *event;
+		pid_t tgid;
 
 		event = malloc(sizeof(event->comm) + machine->id_hdr_size);
 		if (event == NULL) {
@@ -954,10 +958,28 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		 * cannot see a correct process name for those events.
 		 * Synthesize COMM event to prevent it.
 		 */
-		perf_event__synthesize_comm(tool, event,
-					    rec->evlist->workload.pid,
-					    process_synthesized_event,
-					    machine);
+		tgid = perf_event__synthesize_comm(tool, event,
+						   rec->evlist->workload.pid,
+						   process_synthesized_event,
+						   machine);
+		free(event);
+
+		if (tgid == -1)
+			goto out_child;
+
+		event = malloc(sizeof(event->namespaces) + machine->id_hdr_size);
+		if (event == NULL) {
+			err = -ENOMEM;
+			goto out_child;
+		}
+
+		/*
+		 * Synthesize NAMESPACES event for the command specified.
+		 */
+		perf_event__synthesize_namespaces(tool, event,
+						  rec->evlist->workload.pid,
+						  tgid, process_synthesized_event,
+						  machine);
 		free(event);
 
 		perf_evlist__start_workload(rec->evlist);
@@ -1376,6 +1398,7 @@ static struct record record = {
 		.fork		= perf_event__process_fork,
 		.exit		= perf_event__process_exit,
 		.comm		= perf_event__process_comm,
+		.namespaces	= perf_event__process_namespaces,
 		.mmap		= perf_event__process_mmap,
 		.mmap2		= perf_event__process_mmap2,
 		.ordered_events	= true,
@@ -1490,6 +1513,8 @@ struct option __record_options[] = {
 			  "opts", "AUX area tracing Snapshot Mode", ""),
 	OPT_UINTEGER(0, "proc-map-timeout", &record.opts.proc_map_timeout,
 			"per thread proc mmap processing timeout in ms"),
+	OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces,
+		    "Record namespaces events"),
 	OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
 		    "Record context switch events"),
 	OPT_BOOLEAN_FLAG(0, "all-kernel", &record.opts.all_kernel,
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 6e88460..420878f4 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -683,6 +683,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 			.mmap		 = perf_event__process_mmap,
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
+			.namespaces	 = perf_event__process_namespaces,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
 			.lost		 = perf_event__process_lost,
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index f5503ca..db67f55 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1939,6 +1939,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		.tool = {
 			.sample		 = perf_sched__process_tracepoint_sample,
 			.comm		 = perf_event__process_comm,
+			.namespaces	 = perf_event__process_namespaces,
 			.lost		 = perf_event__process_lost,
 			.fork		 = perf_sched__process_fork_event,
 			.ordered_events = true,
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 7228d14..77cc796 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -807,6 +807,7 @@ struct perf_script {
 	bool			show_task_events;
 	bool			show_mmap_events;
 	bool			show_switch_events;
+	bool			show_namespaces_events;
 	bool			allocated;
 	struct cpu_map		*cpus;
 	struct thread_map	*threads;
@@ -1090,6 +1091,41 @@ static int process_comm_event(struct perf_tool *tool,
 	return ret;
 }
 
+static int process_namespaces_event(struct perf_tool *tool,
+				    union perf_event *event,
+				    struct perf_sample *sample,
+				    struct machine *machine)
+{
+	struct thread *thread;
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+	struct perf_session *session = script->session;
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
+	int ret = -1;
+
+	thread = machine__findnew_thread(machine, event->namespaces.pid,
+					 event->namespaces.tid);
+	if (thread == NULL) {
+		pr_debug("problem processing NAMESPACES event, skipping it.\n");
+		return -1;
+	}
+
+	if (perf_event__process_namespaces(tool, event, sample, machine) < 0)
+		goto out;
+
+	if (!evsel->attr.sample_id_all) {
+		sample->cpu = 0;
+		sample->time = 0;
+		sample->tid = event->namespaces.tid;
+		sample->pid = event->namespaces.pid;
+	}
+	print_sample_start(sample, thread, evsel);
+	perf_event__fprintf(event, stdout);
+	ret = 0;
+out:
+	thread__put(thread);
+	return ret;
+}
+
 static int process_fork_event(struct perf_tool *tool,
 			      union perf_event *event,
 			      struct perf_sample *sample,
@@ -1265,6 +1301,8 @@ static int __cmd_script(struct perf_script *script)
 	}
 	if (script->show_switch_events)
 		script->tool.context_switch = process_switch_event;
+	if (script->show_namespaces_events)
+		script->tool.namespaces = process_namespaces_event;
 
 	ret = perf_session__process_events(script->session);
 
@@ -2069,6 +2107,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 			.mmap		 = perf_event__process_mmap,
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
+			.namespaces	 = perf_event__process_namespaces,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
 			.attr		 = process_attr,
@@ -2150,6 +2189,8 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "Show the mmap events"),
 	OPT_BOOLEAN('\0', "show-switch-events", &script.show_switch_events,
 		    "Show context switch events (if recorded)"),
+	OPT_BOOLEAN('\0', "show-namespaces-events", &script.show_namespaces_events,
+		    "Show namespaces events (if recorded)"),
 	OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
 	OPT_BOOLEAN(0, "ns", &nanosecs,
 		    "Use 9 decimal places when displaying time"),
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index c298bd3..8201a90 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2445,8 +2445,9 @@ static int trace__replay(struct trace *trace)
 	trace->tool.exit	  = perf_event__process_exit;
 	trace->tool.fork	  = perf_event__process_fork;
 	trace->tool.attr	  = perf_event__process_attr;
-	trace->tool.tracing_data = perf_event__process_tracing_data;
+	trace->tool.tracing_data  = perf_event__process_tracing_data;
 	trace->tool.build_id	  = perf_event__process_build_id;
+	trace->tool.namespaces	  = perf_event__process_namespaces;
 
 	trace->tool.ordered_events = true;
 	trace->tool.ordering_requires_timestamps = true;
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 9a0236a..867e732 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -50,6 +50,7 @@ struct record_opts {
 	bool	     running_time;
 	bool	     full_auxtrace;
 	bool	     auxtrace_snapshot_mode;
+	bool	     record_namespaces;
 	bool	     record_switch_events;
 	bool	     all_kernel;
 	bool	     all_user;
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index eb60e61..73f12ae 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -42,6 +42,7 @@ libperf-y += pstack.o
 libperf-y += session.o
 libperf-$(CONFIG_AUDIT) += syscalltbl.o
 libperf-y += ordered-events.o
+libperf-y += namespaces.o
 libperf-y += comm.o
 libperf-y += thread.o
 libperf-y += thread_map.o
diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 7123f4d..1fcacf1 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1468,6 +1468,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,
 			.lost            = perf_event__process_lost,
 			.tracing_data    = perf_event__process_tracing_data,
 			.build_id        = perf_event__process_build_id,
+			.namespaces      = perf_event__process_namespaces,
 			.ordered_events  = true,
 			.ordering_requires_timestamps = true,
 		},
@@ -1479,6 +1480,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,
 		c.tool.comm = process_comm_event;
 		c.tool.exit = process_exit_event;
 		c.tool.fork = process_fork_event;
+		c.tool.namespaces = process_namespaces_event;
 	}
 
 	perf_config(convert__config, &c);
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 8ab0d7d..1ea9598 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -31,6 +31,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_LOST_SAMPLES]		= "LOST_SAMPLES",
 	[PERF_RECORD_SWITCH]			= "SWITCH",
 	[PERF_RECORD_SWITCH_CPU_WIDE]		= "SWITCH_CPU_WIDE",
+	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -203,6 +204,65 @@ pid_t perf_event__synthesize_comm(struct perf_tool *tool,
 	return tgid;
 }
 
+int perf_event__synthesize_namespaces(struct perf_tool *tool,
+				      union perf_event *event,
+				      pid_t pid, pid_t tgid,
+				      perf_event__handler_t process,
+				      struct machine *machine)
+{
+	struct stat sb;
+	char proc_ns[128];
+
+	if (!tool->namespace_events)
+		return 0;
+
+	memset(&event->namespaces, 0,
+	       sizeof(event->namespaces) + machine->id_hdr_size);
+
+	event->namespaces.pid  = tgid;
+	event->namespaces.tid  = pid;
+
+	sprintf(proc_ns, "/proc/%u/ns/mnt", pid);
+	if (stat(proc_ns, &sb) == 0) {
+		event->namespaces.dev_num = sb.st_dev;
+		event->namespaces.inode_num[MNT_NS_INDEX] = sb.st_ino;
+	}
+
+	sprintf(proc_ns, "/proc/%u/ns/net", pid);
+	if (stat(proc_ns, &sb) == 0)
+		event->namespaces.inode_num[NET_NS_INDEX] = sb.st_ino;
+
+	sprintf(proc_ns, "/proc/%u/ns/uts", pid);
+	if (stat(proc_ns, &sb) == 0)
+		event->namespaces.inode_num[UTS_NS_INDEX] = sb.st_ino;
+
+	sprintf(proc_ns, "/proc/%u/ns/ipc", pid);
+	if (stat(proc_ns, &sb) == 0)
+		event->namespaces.inode_num[IPC_NS_INDEX] = sb.st_ino;
+
+	sprintf(proc_ns, "/proc/%u/ns/pid", pid);
+	if (stat(proc_ns, &sb) == 0)
+		event->namespaces.inode_num[PID_NS_INDEX] = sb.st_ino;
+
+	sprintf(proc_ns, "/proc/%u/ns/user", pid);
+	if (stat(proc_ns, &sb) == 0)
+		event->namespaces.inode_num[USER_NS_INDEX] = sb.st_ino;
+
+	sprintf(proc_ns, "/proc/%u/ns/cgroup", pid);
+	if (stat(proc_ns, &sb) == 0)
+		event->namespaces.inode_num[CGROUP_NS_INDEX] = sb.st_ino;
+
+	event->namespaces.header.type = PERF_RECORD_NAMESPACES;
+
+	event->namespaces.header.size = (sizeof(event->namespaces) +
+					 machine->id_hdr_size);
+
+	if (perf_tool__process_synth_event(tool, event, machine, process) != 0)
+		return -1;
+
+	return 0;
+}
+
 static int perf_event__synthesize_fork(struct perf_tool *tool,
 				       union perf_event *event,
 				       pid_t pid, pid_t tgid, pid_t ppid,
@@ -434,8 +494,9 @@ int perf_event__synthesize_modules(struct perf_tool *tool,
 static int __event__synthesize_thread(union perf_event *comm_event,
 				      union perf_event *mmap_event,
 				      union perf_event *fork_event,
+				      union perf_event *namespaces_event,
 				      pid_t pid, int full,
-					  perf_event__handler_t process,
+				      perf_event__handler_t process,
 				      struct perf_tool *tool,
 				      struct machine *machine,
 				      bool mmap_data,
@@ -455,6 +516,11 @@ static int __event__synthesize_thread(union perf_event *comm_event,
 		if (tgid == -1)
 			return -1;
 
+		if (perf_event__synthesize_namespaces(tool, namespaces_event, pid,
+						      tgid, process, machine) < 0)
+			return -1;
+
+
 		return perf_event__synthesize_mmap_events(tool, mmap_event, pid, tgid,
 							  process, machine, mmap_data,
 							  proc_map_timeout);
@@ -488,6 +554,11 @@ static int __event__synthesize_thread(union perf_event *comm_event,
 		if (perf_event__synthesize_fork(tool, fork_event, _pid, tgid,
 						ppid, process, machine) < 0)
 			break;
+
+		if (perf_event__synthesize_namespaces(tool, namespaces_event, _pid,
+						      tgid, process, machine) < 0)
+			break;
+
 		/*
 		 * Send the prepared comm event
 		 */
@@ -516,6 +587,7 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 				      unsigned int proc_map_timeout)
 {
 	union perf_event *comm_event, *mmap_event, *fork_event;
+	union perf_event *namespaces_event;
 	int err = -1, thread, j;
 
 	comm_event = malloc(sizeof(comm_event->comm) + machine->id_hdr_size);
@@ -530,10 +602,15 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 	if (fork_event == NULL)
 		goto out_free_mmap;
 
+	namespaces_event = malloc(sizeof(namespaces_event->namespaces) +
+				  machine->id_hdr_size);
+	if (namespaces_event == NULL)
+		goto out_free_fork;
+
 	err = 0;
 	for (thread = 0; thread < threads->nr; ++thread) {
 		if (__event__synthesize_thread(comm_event, mmap_event,
-					       fork_event,
+					       fork_event, namespaces_event,
 					       thread_map__pid(threads, thread), 0,
 					       process, tool, machine,
 					       mmap_data, proc_map_timeout)) {
@@ -559,7 +636,7 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 			/* if not, generate events for it */
 			if (need_leader &&
 			    __event__synthesize_thread(comm_event, mmap_event,
-						       fork_event,
+						       fork_event, namespaces_event,
 						       comm_event->comm.pid, 0,
 						       process, tool, machine,
 						       mmap_data, proc_map_timeout)) {
@@ -568,6 +645,8 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 			}
 		}
 	}
+	free(namespaces_event);
+out_free_fork:
 	free(fork_event);
 out_free_mmap:
 	free(mmap_event);
@@ -587,6 +666,7 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 	char proc_path[PATH_MAX];
 	struct dirent *dirent;
 	union perf_event *comm_event, *mmap_event, *fork_event;
+	union perf_event *namespaces_event;
 	int err = -1;
 
 	if (machine__is_default_guest(machine))
@@ -604,11 +684,16 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 	if (fork_event == NULL)
 		goto out_free_mmap;
 
+	namespaces_event = malloc(sizeof(namespaces_event->namespaces) +
+				  machine->id_hdr_size);
+	if (namespaces_event == NULL)
+		goto out_free_fork;
+
 	snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
 	proc = opendir(proc_path);
 
 	if (proc == NULL)
-		goto out_free_fork;
+		goto out_free_namespaces;
 
 	while ((dirent = readdir(proc)) != NULL) {
 		char *end;
@@ -620,13 +705,16 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
  		 * We may race with exiting thread, so don't stop just because
  		 * one thread couldn't be synthesized.
  		 */
-		__event__synthesize_thread(comm_event, mmap_event, fork_event, pid,
-					   1, process, tool, machine, mmap_data,
+		__event__synthesize_thread(comm_event, mmap_event, fork_event,
+					   namespaces_event, pid, 1, process,
+					   tool, machine, mmap_data,
 					   proc_map_timeout);
 	}
 
 	err = 0;
 	closedir(proc);
+out_free_namespaces:
+	free(namespaces_event);
 out_free_fork:
 	free(fork_event);
 out_free_mmap:
@@ -1008,6 +1096,22 @@ size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp)
 	return fprintf(fp, "%s: %s:%d/%d\n", s, event->comm.comm, event->comm.pid, event->comm.tid);
 }
 
+size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp)
+{
+	return fprintf(fp, " %d/%d - device: %lu [inodes - 0x%lx (cgroup),"
+			   " 0x%lx (ipc), 0x%lx (mnt), 0x%lx (net),"
+			   " 0x%lx (pid), 0x%lx (user), 0x%lx (uts)]\n\n",
+		       event->namespaces.pid, event->namespaces.tid,
+		       event->namespaces.dev_num,
+		       event->namespaces.inode_num[CGROUP_NS_INDEX],
+		       event->namespaces.inode_num[IPC_NS_INDEX],
+		       event->namespaces.inode_num[MNT_NS_INDEX],
+		       event->namespaces.inode_num[NET_NS_INDEX],
+		       event->namespaces.inode_num[PID_NS_INDEX],
+		       event->namespaces.inode_num[USER_NS_INDEX],
+		       event->namespaces.inode_num[UTS_NS_INDEX]);
+}
+
 int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -1016,6 +1120,14 @@ int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
 	return machine__process_comm_event(machine, event, sample);
 }
 
+int perf_event__process_namespaces(struct perf_tool *tool __maybe_unused,
+				   union perf_event *event,
+				   struct perf_sample *sample,
+				   struct machine *machine)
+{
+	return machine__process_namespaces_event(machine, event, sample);
+}
+
 int perf_event__process_lost(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -1196,6 +1308,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_MMAP:
 		ret += perf_event__fprintf_mmap(event, fp);
 		break;
+	case PERF_RECORD_NAMESPACES:
+		ret += perf_event__fprintf_namespaces(event, fp);
+		break;
 	case PERF_RECORD_MMAP2:
 		ret += perf_event__fprintf_mmap2(event, fp);
 		break;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 8d363d5..a73bc8e 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -39,6 +39,13 @@ struct comm_event {
 	char comm[16];
 };
 
+struct namespaces_event {
+	struct perf_event_header header;
+	u32 pid, tid;
+	u64 dev_num;
+	u64 inode_num[NAMESPACES_MAX];
+};
+
 struct fork_event {
 	struct perf_event_header header;
 	u32 pid, ppid;
@@ -482,6 +489,7 @@ union perf_event {
 	struct mmap_event		mmap;
 	struct mmap2_event		mmap2;
 	struct comm_event		comm;
+	struct namespaces_event		namespaces;
 	struct fork_event		fork;
 	struct lost_event		lost;
 	struct lost_samples_event	lost_samples;
@@ -584,6 +592,10 @@ int perf_event__process_switch(struct perf_tool *tool,
 			       union perf_event *event,
 			       struct perf_sample *sample,
 			       struct machine *machine);
+int perf_event__process_namespaces(struct perf_tool *tool,
+				   union perf_event *event,
+				   struct perf_sample *sample,
+				   struct machine *machine);
 int perf_event__process_mmap(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -633,6 +645,12 @@ pid_t perf_event__synthesize_comm(struct perf_tool *tool,
 				  perf_event__handler_t process,
 				  struct machine *machine);
 
+int perf_event__synthesize_namespaces(struct perf_tool *tool,
+				      union perf_event *event,
+				      pid_t pid, pid_t tgid,
+				      perf_event__handler_t process,
+				      struct machine *machine);
+
 int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 				       union perf_event *event,
 				       pid_t pid, pid_t tgid,
@@ -650,6 +668,7 @@ size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
 
 u64 kallsyms__get_function_start(const char *kallsyms_filename,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 8bc2711..91d19b2 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -923,6 +923,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
 	attr->mmap2 = track && !perf_missing_features.mmap2;
 	attr->comm  = track;
 
+	if (opts->record_namespaces)
+		attr->namespaces  = track;
+
 	if (opts->record_switch_events)
 		attr->context_switch = track;
 
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index df85b9e..f071294 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -482,6 +482,29 @@ int machine__process_comm_event(struct machine *machine, union perf_event *event
 	return err;
 }
 
+int machine__process_namespaces_event(struct machine *machine __maybe_unused,
+				      union perf_event *event,
+				      struct perf_sample *sample __maybe_unused)
+{
+	struct thread *thread = machine__findnew_thread(machine,
+							event->namespaces.pid,
+							event->namespaces.tid);
+	int err = 0;
+
+	if (dump_trace)
+		perf_event__fprintf_namespaces(event, stdout);
+
+	if (thread == NULL ||
+	    thread__set_namespaces(thread, sample->time, &event->namespaces)) {
+		dump_printf("problem processing PERF_RECORD_NAMESPACES, skipping event.\n");
+		err = -1;
+	}
+
+	thread__put(thread);
+
+	return err;
+}
+
 int machine__process_lost_event(struct machine *machine __maybe_unused,
 				union perf_event *event, struct perf_sample *sample __maybe_unused)
 {
@@ -1519,6 +1542,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 		ret = machine__process_comm_event(machine, event, sample); break;
 	case PERF_RECORD_MMAP:
 		ret = machine__process_mmap_event(machine, event, sample); break;
+	case PERF_RECORD_NAMESPACES:
+		ret = machine__process_namespaces_event(machine, event, sample); break;
 	case PERF_RECORD_MMAP2:
 		ret = machine__process_mmap2_event(machine, event, sample); break;
 	case PERF_RECORD_FORK:
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 354de6e..e494368 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -97,6 +97,9 @@ int machine__process_itrace_start_event(struct machine *machine,
 					union perf_event *event);
 int machine__process_switch_event(struct machine *machine,
 				  union perf_event *event);
+int machine__process_namespaces_event(struct machine *machine,
+				      union perf_event *event,
+				      struct perf_sample *sample);
 int machine__process_mmap_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
diff --git a/tools/perf/util/namespaces.c b/tools/perf/util/namespaces.c
new file mode 100644
index 0000000..dd481ed
--- /dev/null
+++ b/tools/perf/util/namespaces.c
@@ -0,0 +1,28 @@
+#include "namespaces.h"
+#include "util.h"
+#include "event.h"
+#include <stdlib.h>
+#include <stdio.h>
+
+struct namespaces *namespaces__new(struct namespaces_event *event)
+{
+	struct namespaces *namespaces = zalloc(sizeof(*namespaces));
+
+	if (!namespaces)
+		return NULL;
+
+	namespaces->end_time = -1;
+
+	if (event) {
+		namespaces->dev_num = event->dev_num;
+		memcpy(namespaces->inode_num, event->inode_num,
+		       sizeof(namespaces->inode_num));
+	}
+
+	return namespaces;
+}
+
+void namespaces__free(struct namespaces *namespaces)
+{
+	free(namespaces);
+}
diff --git a/tools/perf/util/namespaces.h b/tools/perf/util/namespaces.h
new file mode 100644
index 0000000..4acef9e
--- /dev/null
+++ b/tools/perf/util/namespaces.h
@@ -0,0 +1,19 @@
+#ifndef __PERF_NAMESPACES_H
+#define __PERF_NAMESPACES_H
+
+#include "../perf.h"
+#include <linux/list.h>
+
+struct namespaces_event;
+
+struct namespaces {
+	struct list_head list;
+	u64 end_time;
+	u64 dev_num;
+	u64 inode_num[NAMESPACES_MAX];
+};
+
+struct namespaces *namespaces__new(struct namespaces_event *event);
+void namespaces__free(struct namespaces *namespaces);
+
+#endif  /* __PERF_NAMESPACES_H */
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 5d61242..a9a1139 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1239,6 +1239,8 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->mmap2(tool, event, sample, machine);
 	case PERF_RECORD_COMM:
 		return tool->comm(tool, event, sample, machine);
+	case PERF_RECORD_NAMESPACES:
+		return tool->namespaces(tool, event, sample, machine);
 	case PERF_RECORD_FORK:
 		return tool->fork(tool, event, sample, machine);
 	case PERF_RECORD_EXIT:
@@ -1494,6 +1496,11 @@ int perf_session__register_idle_thread(struct perf_session *session)
 		err = -1;
 	}
 
+	if (thread == NULL || thread__set_namespaces(thread, 0, NULL)) {
+		pr_err("problem inserting idle task.\n");
+		err = -1;
+	}
+
 	/* machine__findnew_thread() got the thread, so put it */
 	thread__put(thread);
 	return err;
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index f5af87f..b9fe432 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -7,6 +7,7 @@
 #include "thread-stack.h"
 #include "util.h"
 #include "debug.h"
+#include "namespaces.h"
 #include "comm.h"
 #include "unwind.h"
 
@@ -40,6 +41,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		thread->tid = tid;
 		thread->ppid = -1;
 		thread->cpu = -1;
+		INIT_LIST_HEAD(&thread->namespaces_list);
 		INIT_LIST_HEAD(&thread->comm_list);
 
 		comm_str = malloc(32);
@@ -66,7 +68,8 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 
 void thread__delete(struct thread *thread)
 {
-	struct comm *comm, *tmp;
+	struct namespaces *namespaces, *tmp_namespaces;
+	struct comm *comm, *tmp_comm;
 
 	BUG_ON(!RB_EMPTY_NODE(&thread->rb_node));
 
@@ -76,7 +79,12 @@ void thread__delete(struct thread *thread)
 		map_groups__put(thread->mg);
 		thread->mg = NULL;
 	}
-	list_for_each_entry_safe(comm, tmp, &thread->comm_list, list) {
+	list_for_each_entry_safe(namespaces, tmp_namespaces,
+				 &thread->namespaces_list, list) {
+		list_del(&namespaces->list);
+		namespaces__free(namespaces);
+	}
+	list_for_each_entry_safe(comm, tmp_comm, &thread->comm_list, list) {
 		list_del(&comm->list);
 		comm__free(comm);
 	}
@@ -104,6 +112,38 @@ void thread__put(struct thread *thread)
 	}
 }
 
+struct namespaces *thread__namespaces(const struct thread *thread)
+{
+	if (list_empty(&thread->namespaces_list))
+		return NULL;
+
+	return list_first_entry(&thread->namespaces_list, struct namespaces, list);
+}
+
+int thread__set_namespaces(struct thread *thread, u64 timestamp,
+			   struct namespaces_event *event)
+{
+	struct namespaces *new, *curr = thread__namespaces(thread);
+
+	new = namespaces__new(event);
+	if (!new)
+		return -ENOMEM;
+
+	list_add(&new->list, &thread->namespaces_list);
+
+	if (timestamp && curr) {
+		/*
+		 * setns syscall must have changed some or all of the
+		 * namespaces of this thread. Update end time for the
+		 * namespaces previously used.
+		 */
+		curr = list_next_entry(new, list);
+		curr->end_time = timestamp;
+	}
+
+	return 0;
+}
+
 struct comm *thread__comm(const struct thread *thread)
 {
 	if (list_empty(&thread->comm_list))
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 99263cb..b18b5a2 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -28,6 +28,7 @@ struct thread {
 	bool			comm_set;
 	int			comm_len;
 	bool			dead; /* if set thread has exited */
+	struct list_head	namespaces_list;
 	struct list_head	comm_list;
 	u64			db_id;
 
@@ -40,6 +41,7 @@ struct thread {
 };
 
 struct machine;
+struct namespaces;
 struct comm;
 
 struct thread *thread__new(pid_t pid, pid_t tid);
@@ -62,6 +64,10 @@ static inline void thread__exited(struct thread *thread)
 	thread->dead = true;
 }
 
+struct namespaces *thread__namespaces(const struct thread *thread);
+int thread__set_namespaces(struct thread *thread, u64 timestamp,
+			   struct namespaces_event *event);
+
 int __thread__set_comm(struct thread *thread, const char *comm, u64 timestamp,
 		       bool exec);
 static inline int thread__set_comm(struct thread *thread, const char *comm,
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index ac2590a..829471a 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -40,6 +40,7 @@ struct perf_tool {
 	event_op	mmap,
 			mmap2,
 			comm,
+			namespaces,
 			fork,
 			exit,
 			lost,
@@ -66,6 +67,7 @@ struct perf_tool {
 	event_op3	auxtrace;
 	bool		ordered_events;
 	bool		ordering_requires_timestamps;
+	bool		namespace_events;
 };
 
 #endif /* __PERF_TOOL_H */


* [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report
  2016-12-12 18:19 [PATCH v3 0/3] perf: add support for analyzing events for containers Hari Bathini
  2016-12-12 18:19 ` [PATCH v3 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info Hari Bathini
  2016-12-12 18:19 ` [PATCH v3 2/3] perf tool: " Hari Bathini
@ 2016-12-12 18:20 ` Hari Bathini
  2016-12-12 22:06   ` Eric W. Biederman
  2 siblings, 1 reply; 14+ messages in thread
From: Hari Bathini @ 2016-12-12 18:20 UTC (permalink / raw)
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg

This patch introduces a cgroup identifier entry field in perf report to
identify or distinguish data of different cgroups. It uses the unique
inode number of cgroup namespace, included in perf data with the new
PERF_RECORD_NAMESPACES event, as cgroup identifier. With the assumption
that each container is created with its own cgroup namespace, this
allows assessment/analysis of multiple containers at once.

Shown below is the output of perf report, sorted by cgroup id, on a
system that was running three containers at the time of perf record.
It clearly shows one container's considerable use of kernel memory in
comparison with the others:


	$ perf report -s cgroup_id,sample --stdio
	#
	# Total Lost Samples: 0
	#
	# Samples: 1K of event 'kmem:kmalloc'
	# Event count (approx.): 1828
	#
	# Overhead  cgroup id        Samples
	# ........  ..........  ............
	#
	    84.74%  4026532048          1549
	     7.93%  4026531835           145
	     3.67%  4026532047            67
	     2.68%  4026532046            49
	     0.98%  0                     18

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---
 tools/perf/util/hist.c |    4 ++++
 tools/perf/util/hist.h |    1 +
 tools/perf/util/sort.c |   22 ++++++++++++++++++++++
 tools/perf/util/sort.h |    2 ++
 4 files changed, 29 insertions(+)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index a69f027..a6650d7 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2,6 +2,7 @@
 #include "build-id.h"
 #include "hist.h"
 #include "session.h"
+#include "namespaces.h"
 #include "sort.h"
 #include "evlist.h"
 #include "evsel.h"
@@ -168,6 +169,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 		hists__set_unres_dso_col_len(hists, HISTC_MEM_DADDR_DSO);
 	}
 
+	hists__new_col_len(hists, HISTC_CGROUP_ID, 10);
 	hists__new_col_len(hists, HISTC_CPU, 3);
 	hists__new_col_len(hists, HISTC_SOCKET, 6);
 	hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
@@ -573,9 +575,11 @@ __hists__add_entry(struct hists *hists,
 		   bool sample_self,
 		   struct hist_entry_ops *ops)
 {
+	struct namespaces *ns = thread__namespaces(al->thread);
 	struct hist_entry entry = {
 		.thread	= al->thread,
 		.comm = thread__comm(al->thread),
+		.cgroup_id = ns ? ns->inode_num[CGROUP_NS_INDEX] : 0,
 		.ms = {
 			.map	= al->map,
 			.sym	= al->sym,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 9928fed..894c95d 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -29,6 +29,7 @@ enum hist_column {
 	HISTC_DSO,
 	HISTC_THREAD,
 	HISTC_COMM,
+	HISTC_CGROUP_ID,
 	HISTC_PARENT,
 	HISTC_CPU,
 	HISTC_SOCKET,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 452e15a..b6152df 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -536,6 +536,27 @@ struct sort_entry sort_cpu = {
 	.se_width_idx	= HISTC_CPU,
 };
 
+/* --sort cgroup_id */
+
+static int64_t
+sort__cgroup_id_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return (int64_t)right->cgroup_id - (int64_t)left->cgroup_id;
+}
+
+static int hist_entry__cgroup_id_snprintf(struct hist_entry *he, char *bf,
+					  size_t size, unsigned int width)
+{
+	return repsep_snprintf(bf, size, "%-*u", width, he->cgroup_id);
+}
+
+struct sort_entry sort_cgroup_id = {
+	.se_header      = "cgroup id",
+	.se_cmp	        = sort__cgroup_id_cmp,
+	.se_snprintf    = hist_entry__cgroup_id_snprintf,
+	.se_width_idx	= HISTC_CGROUP_ID,
+};
+
 /* --sort socket */
 
 static int64_t
@@ -1418,6 +1439,7 @@ static struct sort_dimension common_sort_dimensions[] = {
 	DIM(SORT_GLOBAL_WEIGHT, "weight", sort_global_weight),
 	DIM(SORT_TRANSACTION, "transaction", sort_transaction),
 	DIM(SORT_TRACE, "trace", sort_trace),
+	DIM(SORT_CGROUP_ID, "cgroup_id", sort_cgroup_id),
 };
 
 #undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 099c975..e8058f6 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -95,6 +95,7 @@ struct hist_entry {
 	u64			transaction;
 	s32			socket;
 	s32			cpu;
+	u32			cgroup_id;
 	u8			cpumode;
 	u8			depth;
 
@@ -211,6 +212,7 @@ enum sort_type {
 	SORT_GLOBAL_WEIGHT,
 	SORT_TRANSACTION,
 	SORT_TRACE,
+	SORT_CGROUP_ID,
 
 	/* branch stack specific sort keys */
 	__SORT_BRANCH_STACK,


* Re: [PATCH v3 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info
  2016-12-12 18:19 ` [PATCH v3 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info Hari Bathini
@ 2016-12-12 18:27   ` Eric W. Biederman
  2016-12-13 18:47     ` Hari Bathini
  0 siblings, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2016-12-12 18:27 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, sargun, Aravinda Prasad,
	brendan.d.gregg

Hari Bathini <hbathini@linux.vnet.ibm.com> writes:

> With the advent of container technologies like docker, that depend
> on namespaces for isolation, there is a need for tracing support for
> namespaces. This patch introduces new PERF_RECORD_NAMESPACES event
> for tracing based on namespaces related info.

> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index c66a485..2a48fc6 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -344,7 +344,8 @@ struct perf_event_attr {
>  				use_clockid    :  1, /* use @clockid for time fields */
>  				context_switch :  1, /* context switch data */
>  				write_backward :  1, /* Write ring buffer from end to beginning */
> -				__reserved_1   : 36;
> +				namespaces     :  1, /* include namespaces data */
> +				__reserved_1   : 35;
>  
>  	union {
>  		__u32		wakeup_events;	  /* wakeup every n events */
> @@ -610,6 +611,18 @@ struct perf_event_header {
>  	__u16	size;
>  };
>  
> +enum {
> +	NET_NS_INDEX		= 0,
> +	UTS_NS_INDEX		= 1,
> +	IPC_NS_INDEX		= 2,
> +	PID_NS_INDEX		= 3,
> +	USER_NS_INDEX		= 4,
> +	MNT_NS_INDEX		= 5,
> +	CGROUP_NS_INDEX		= 6,
> +
> +	NAMESPACES_MAX,		/* maximum available namespaces */
> +};
> +
>  enum perf_event_type {
>  
>  	/*
> @@ -862,6 +875,18 @@ enum perf_event_type {
>  	 */
>  	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
>  
> +	/*
> +	 * struct {
> +	 *	struct perf_event_header	header;
> +	 *
> +	 *	u32				pid, tid;
> +	 *	u64				dev_num;
> +	 *	u64				inode_num[NAMESPACES_MAX];
There needs to be one device number per inode.  While it is true that
today the device number is always the same, that is not necessarily so.

I reserve the right to have the device number vary per namespace
so that I don't need to implement a namespace of namespaces.

These are st_dev and st_ino of the inode for the namespace.

> +	 *	struct sample_id		sample_id;
> +	 * };
> +	 */
> +	PERF_RECORD_NAMESPACES			= 16,
> +
>  	PERF_RECORD_MAX,			/* non-ABI */
>  };


* Re: [PATCH v3 2/3] perf tool: add PERF_RECORD_NAMESPACES to include namespaces related info
  2016-12-12 18:19 ` [PATCH v3 2/3] perf tool: " Hari Bathini
@ 2016-12-12 21:51   ` Eric W. Biederman
  0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2016-12-12 21:51 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, sargun, Aravinda Prasad,
	brendan.d.gregg

Hari Bathini <hbathini@linux.vnet.ibm.com> writes:

> This patch updates perf tool to examine PERF_RECORD_NAMESPACES events
> emitted by the kernel when fork, clone, setns or unshare are invoked.
> Also, it synthesizes PERF_RECORD_NAMESPACES events for processes that
> were running prior to invocation of perf record, the data for which
> is taken from /proc/$PID/ns. These changes make way for analyzing
> events with regard to namespaces.

> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index 8d363d5..a73bc8e 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -39,6 +39,13 @@ struct comm_event {
>  	char comm[16];
>  };
>  
> +struct namespaces_event {
> +	struct perf_event_header header;
> +	u32 pid, tid;
> +	u64 dev_num;
> +	u64 inode_num[NAMESPACES_MAX];
> +};

This suffers from the same issue I pointed out with the
kernel interface.  We need one device number per inode.
Today we only have one device number but that may change in
the future.  These are st_dev and st_ino from stat, and
you need both of them per namespace to be unique.

I do not want to get into a situation where I have to implement a
namespace of namespaces in the future.
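
To illustrate "both of them per namespace", here is a minimal user-space
sketch, assuming the namespace files under /proc/<pid>/ns can be read with
stat(2). The names struct ns_id and ns_identity() are hypothetical and are
not part of the posted patch.

```c
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical pair: one device number and one inode number per namespace. */
struct ns_id {
	uint64_t dev;	/* st_dev of the namespace inode */
	uint64_t ino;	/* st_ino of the namespace inode */
};

/*
 * Fill @id from the file at @path (e.g. "/proc/self/ns/cgroup").
 * Returns 0 on success, -1 if stat() fails.
 */
static int ns_identity(const char *path, struct ns_id *id)
{
	struct stat sb;

	if (stat(path, &sb) != 0)
		return -1;

	id->dev = (uint64_t)sb.st_dev;
	id->ino = (uint64_t)sb.st_ino;
	return 0;
}
```

Calling ns_identity() once per namespace file would give seven (dev, ino)
pairs rather than a single dev_num shared by seven inode numbers.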

Eric


* Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report
  2016-12-12 18:20 ` [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report Hari Bathini
@ 2016-12-12 22:06   ` Eric W. Biederman
  2016-12-13 19:07     ` Hari Bathini
  0 siblings, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2016-12-12 22:06 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, sargun, Aravinda Prasad,
	brendan.d.gregg

Hari Bathini <hbathini@linux.vnet.ibm.com> writes:

> This patch introduces a cgroup identifier entry field in perf report to
> identify or distinguish data of different cgroups. It uses the unique
> inode number of cgroup namespace, included in perf data with the new
> PERF_RECORD_NAMESPACES event, as cgroup identifier. With the assumption
> that each container is created with its own cgroup namespace, this
> allows assessment/analysis of multiple containers at once.

In the large this sounds reasonable.

The details are wrong.  The cgroup id needs to be device
number + inode number, not just inode number.
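
As a rough sketch of what sorting on "device number + inode number" might
look like, the comparison below orders by device first and inode second.
struct cgroup_ns_key and cgroup_ns_key_cmp() are hypothetical names chosen
for illustration; the patch itself only carries a single cgroup_id field.

```c
#include <stdint.h>

/* Hypothetical identifier: device number plus inode number. */
struct cgroup_ns_key {
	uint64_t dev;
	uint64_t ino;
};

/*
 * Order two keys by device number first, then by inode number.
 * Explicit comparisons are used instead of subtraction so that
 * large 64-bit values cannot wrap and flip the sign.
 * Returns a negative value, 0 or a positive value, like strcmp().
 */
static int cgroup_ns_key_cmp(const struct cgroup_ns_key *a,
			     const struct cgroup_ns_key *b)
{
	if (a->dev != b->dev)
		return a->dev < b->dev ? -1 : 1;
	if (a->ino != b->ino)
		return a->ino < b->ino ? -1 : 1;
	return 0;
}
```

With this ordering, two namespaces that share an inode number but live on
different devices compare as different.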

Eric

> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
> ---
>  tools/perf/util/hist.c |    4 ++++
>  tools/perf/util/hist.h |    1 +
>  tools/perf/util/sort.c |   22 ++++++++++++++++++++++
>  tools/perf/util/sort.h |    2 ++
>  4 files changed, 29 insertions(+)
>
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> @@ -573,9 +575,11 @@ __hists__add_entry(struct hists *hists,
>  		   bool sample_self,
>  		   struct hist_entry_ops *ops)
>  {
> +	struct namespaces *ns = thread__namespaces(al->thread);
>  	struct hist_entry entry = {
>  		.thread	= al->thread,
>  		.comm = thread__comm(al->thread),
> +		.cgroup_id = ns ? ns->inode_num[CGROUP_NS_INDEX] : 0,
>  		.ms = {
>  			.map	= al->map,
>  			.sym	= al->sym,

Eric


* Re: [PATCH v3 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info
  2016-12-12 18:27   ` Eric W. Biederman
@ 2016-12-13 18:47     ` Hari Bathini
  2016-12-13 19:58       ` Eric W. Biederman
  0 siblings, 1 reply; 14+ messages in thread
From: Hari Bathini @ 2016-12-13 18:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, sargun, Aravinda Prasad,
	brendan.d.gregg

Hi Eric,


On Monday 12 December 2016 11:57 PM, Eric W. Biederman wrote:
> Hari Bathini <hbathini@linux.vnet.ibm.com> writes:
>
>> With the advent of container technologies like docker, that depend
>> on namespaces for isolation, there is a need for tracing support for
>> namespaces. This patch introduces new PERF_RECORD_NAMESPACES event
>> for tracing based on namespaces related info.
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index c66a485..2a48fc6 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -344,7 +344,8 @@ struct perf_event_attr {
>>   				use_clockid    :  1, /* use @clockid for time fields */
>>   				context_switch :  1, /* context switch data */
>>   				write_backward :  1, /* Write ring buffer from end to beginning */
>> -				__reserved_1   : 36;
>> +				namespaces     :  1, /* include namespaces data */
>> +				__reserved_1   : 35;
>>   
>>   	union {
>>   		__u32		wakeup_events;	  /* wakeup every n events */
>> @@ -610,6 +611,18 @@ struct perf_event_header {
>>   	__u16	size;
>>   };
>>   
>> +enum {
>> +	NET_NS_INDEX		= 0,
>> +	UTS_NS_INDEX		= 1,
>> +	IPC_NS_INDEX		= 2,
>> +	PID_NS_INDEX		= 3,
>> +	USER_NS_INDEX		= 4,
>> +	MNT_NS_INDEX		= 5,
>> +	CGROUP_NS_INDEX		= 6,
>> +
>> +	NAMESPACES_MAX,		/* maximum available namespaces */
>> +};
>> +
>>   enum perf_event_type {
>>   
>>   	/*
>> @@ -862,6 +875,18 @@ enum perf_event_type {
>>   	 */
>>   	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
>>   
>> +	/*
>> +	 * struct {
>> +	 *	struct perf_event_header	header;
>> +	 *
>> +	 *	u32				pid, tid;
>> +	 *	u64				dev_num;
>> +	 *	u64				inode_num[NAMESPACES_MAX];
> There needs to be one device number per inode.  While it is true that
> today the device number is always the same, that is not necessarily so.
> I reserve the right to have the device number vary per namespace
> so that I don't need to implement a namespace of namespaces.
>
> These are st_dev and st_ino of the inode for the namespace.

Do you mean..

     st_dev = encode_dev(inode->i_sb->s_dev); ?
     st_ino = inode->i_ino; ?

Thanks
Hari


* Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report
  2016-12-12 22:06   ` Eric W. Biederman
@ 2016-12-13 19:07     ` Hari Bathini
  2016-12-13 19:56       ` Eric W. Biederman
  0 siblings, 1 reply; 14+ messages in thread
From: Hari Bathini @ 2016-12-13 19:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, sargun, Aravinda Prasad,
	brendan.d.gregg

Hi Eric,


On Tuesday 13 December 2016 03:36 AM, Eric W. Biederman wrote:
> Hari Bathini <hbathini@linux.vnet.ibm.com> writes:
>
>> This patch introduces a cgroup identifier entry field in perf report to
>> identify or distinguish data of different cgroups. It uses the unique
>> inode number of cgroup namespace, included in perf data with the new
>> PERF_RECORD_NAMESPACES event, as cgroup identifier. With the assumption
>> that each container is created with its own cgroup namespace, this
>> allows assessment/analysis of multiple containers at once.
> In the large this sounds reasonable.
>
> The details are wrong.  The cgroup id needs to be device
> number + inode number, not just inode number.
>

Since the assumption that the device number is the same for all
namespaces may not stand the test of time, the inode number alone is
not going to be unique enough to use as an identifier.

I am thinking of an identifier like the one below. This may be OK for
now, as dev_num & inode_num are 32 bits each.

     identifier = (dev_num << 32 | inode_num)

But this may leave us with identifiers that are not unique if dev_num
& inode_num are changed to 64 bits. Should that be of concern? Do
you have any alternate suggestions to come up with a unique identifier
in such a scenario too?
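
A minimal sketch of the packing proposed above, to make the collision
concern concrete. The helper name pack_ns_id is hypothetical and not part
of the patch. It takes both values as 64-bit so the shift is well defined
(in C, shifting a 32-bit value by 32 is undefined behaviour).

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): pack a device number and an
 * inode number into a single 64-bit identifier, following
 * "identifier = (dev_num << 32 | inode_num)".
 *
 * The result is only unique while both inputs fit in 32 bits.
 */
static uint64_t pack_ns_id(uint64_t dev_num, uint64_t inode_num)
{
	/* High 32 bits: device number. Low 32 bits: inode number. */
	return (dev_num << 32) | inode_num;
}
```

With 32-bit inputs the two halves never overlap. Once inode_num can use
more than 32 bits they do: pack_ns_id(1, 0) and pack_ns_id(0, 1ULL << 32)
yield the same identifier.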

Thanks
Hari


* Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report
  2016-12-13 19:07     ` Hari Bathini
@ 2016-12-13 19:56       ` Eric W. Biederman
  2016-12-14  8:24         ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2016-12-13 19:56 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, sargun, Aravinda Prasad,
	brendan.d.gregg

Hari Bathini <hbathini@linux.vnet.ibm.com> writes:

> Hi Eric,
>
>
> On Tuesday 13 December 2016 03:36 AM, Eric W. Biederman wrote:
>> Hari Bathini <hbathini@linux.vnet.ibm.com> writes:
>>
>>> This patch introduces a cgroup identifier entry field in perf report to
>>> identify or distinguish data of different cgroups. It uses the unique
>>> inode number of cgroup namespace, included in perf data with the new
>>> PERF_RECORD_NAMESPACES event, as cgroup identifier. With the assumption
>>> that each container is created with its own cgroup namespace, this
>>> allows assessment/analysis of multiple containers at once.
>> In the large this sounds reasonable.
>>
>> The details are wrong.  The cgroup id needs to be device
>> number + inode number, not just inode number.
>>
>
> Since the assumption that the device number is the same for all
> namespaces may not stand the test of time, the inode number alone is
> not going to be unique enough to use as an identifier.
>
> I am thinking of an identifier like the one below. This may be OK for
> now, as dev_num & inode_num are 32 bits each.
>
>     identifier = (dev_num << 32 | inode_num)
>
> But this may leave us with identifiers that are not unique if dev_num
> & inode_num are changed to 64 bits. Should that be of concern? Do
> you have any alternate suggestions to come up with a unique identifier
> in such a scenario too?

Inode numbers in general are 64bit.  The namespace inodes admittedly are
currently implemented as 32bit quantities but that is not something we
want to hard code into the userspace interface.

I would just make the identifier a structure containing the
device number and the inode number.  It didn't look like perf required
the identifier to be a simple integer.
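
A rough sketch of what such a structure could look like on the tool side.
The names ns_id, ns_id_equal and ns_id_cmp are hypothetical, and the
64-bit field widths are an assumption chosen so that neither value has to
be truncated; nothing here is taken from the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identifier: the thread does not define this layout. */
struct ns_id {
	uint64_t dev;	/* device number of the namespace inode (st_dev) */
	uint64_t ino;	/* inode number of the namespace inode (st_ino) */
};

/* Two identifiers match only if both the device and the inode match. */
static bool ns_id_equal(const struct ns_id *a, const struct ns_id *b)
{
	return a->dev == b->dev && a->ino == b->ino;
}

/* Three-way compare (device first, then inode), usable for sorting. */
static int ns_id_cmp(const struct ns_id *a, const struct ns_id *b)
{
	if (a->dev != b->dev)
		return a->dev < b->dev ? -1 : 1;
	if (a->ino != b->ino)
		return a->ino < b->ino ? -1 : 1;
	return 0;
}
```

Keeping the two fields separate means a pair such as {dev 0, ino 1 << 32}
and {dev 1, ino 0} stays distinct, which a single packed 64-bit value
cannot guarantee.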

Eric


* Re: [PATCH v3 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info
  2016-12-13 18:47     ` Hari Bathini
@ 2016-12-13 19:58       ` Eric W. Biederman
  0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2016-12-13 19:58 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, sargun, Aravinda Prasad,
	brendan.d.gregg

Hari Bathini <hbathini@linux.vnet.ibm.com> writes:

> Hi Eric,
>
>
> On Monday 12 December 2016 11:57 PM, Eric W. Biederman wrote:
>> Hari Bathini <hbathini@linux.vnet.ibm.com> writes:
>>
>>> With the advent of container technologies like Docker, that depend
>>> on namespaces for isolation, there is a need for tracing support for
>>> namespaces. This patch introduces new PERF_RECORD_NAMESPACES event
>>> for tracing based on namespaces related info.
>>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>>> index c66a485..2a48fc6 100644
>>> --- a/include/uapi/linux/perf_event.h
>>> +++ b/include/uapi/linux/perf_event.h
>>> @@ -344,7 +344,8 @@ struct perf_event_attr {
>>>   				use_clockid    :  1, /* use @clockid for time fields */
>>>   				context_switch :  1, /* context switch data */
>>>   				write_backward :  1, /* Write ring buffer from end to beginning */
>>> -				__reserved_1   : 36;
>>> +				namespaces     :  1, /* include namespaces data */
>>> +				__reserved_1   : 35;
>>>   
>>>   	union {
>>>   		__u32		wakeup_events;	  /* wakeup every n events */
>>> @@ -610,6 +611,18 @@ struct perf_event_header {
>>>   	__u16	size;
>>>   };
>>>   
>>> +enum {
>>> +	NET_NS_INDEX		= 0,
>>> +	UTS_NS_INDEX		= 1,
>>> +	IPC_NS_INDEX		= 2,
>>> +	PID_NS_INDEX		= 3,
>>> +	USER_NS_INDEX		= 4,
>>> +	MNT_NS_INDEX		= 5,
>>> +	CGROUP_NS_INDEX		= 6,
>>> +
>>> +	NAMESPACES_MAX,		/* maximum available namespaces */
>>> +};
>>> +
>>>   enum perf_event_type {
>>>   
>>>   	/*
>>> @@ -862,6 +875,18 @@ enum perf_event_type {
>>>   	 */
>>>   	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
>>>   
>>> +	/*
>>> +	 * struct {
>>> +	 *	struct perf_event_header	header;
>>> +	 *
>>> +	 *	u32				pid, tid;
>>> +	 *	u64				dev_num;
>>> +	 *	u64				inode_num[NAMESPACES_MAX];
>> There needs to be one device number per inode.  While it is true that
>> today the device number is always the same, that is not necessarily so.
>> I reserve the right to have the device number vary per namespace
>> so that I don't need to implement a namespace of namespaces.
>>
>> These are st_dev and st_ino of the inode for the namespace.
>
> Do you mean..
>
>     st_dev = encode_dev(inode->i_sb->s_dev); ?
>     st_ino = inode->i_ino; ?

Yes.  I believe that is how those values make it to user space
during a stat system call.
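
For illustration, a small userspace sketch of reading those two values
for one of the calling process's namespaces by calling stat() on its
/proc/self/ns/ entry. The helper name read_ns_dev_ino is hypothetical,
and the sketch assumes a Linux system with procfs mounted at /proc.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: look up st_dev and st_ino of the namespace entry
 * /proc/self/ns/<ns_name> (for example "cgroup", "pid" or "mnt").
 * stat() follows the link, so the values describe the namespace inode.
 * Returns 0 on success, -1 if the path is too long or stat() fails.
 */
static int read_ns_dev_ino(const char *ns_name, uint64_t *dev, uint64_t *ino)
{
	char path[64];
	struct stat st;
	int n;

	n = snprintf(path, sizeof(path), "/proc/self/ns/%s", ns_name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	if (stat(path, &st) != 0)
		return -1;

	/* Widen to 64 bits so neither value is truncated. */
	*dev = (uint64_t)st.st_dev;
	*ino = (uint64_t)st.st_ino;
	return 0;
}
```

Two processes in the same namespace should see the same (dev, ino) pair
for that entry, which is what makes the pair usable as an identifier.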

Eric


* Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report
  2016-12-13 19:56       ` Eric W. Biederman
@ 2016-12-14  8:24         ` Peter Zijlstra
  2016-12-14 15:52           ` Eric W. Biederman
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2016-12-14  8:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Hari Bathini, ast, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, sargun, Aravinda Prasad,
	brendan.d.gregg

On Wed, Dec 14, 2016 at 08:56:43AM +1300, Eric W. Biederman wrote:
> 
> I would just make the identifier a structure containing the
> device number and the inode number.  It didn't look like perf required
> the identifier to be a simple integer.

Right, perf doesn't care at all here, it's just a transport.


* Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report
  2016-12-14  8:24         ` Peter Zijlstra
@ 2016-12-14 15:52           ` Eric W. Biederman
  2016-12-14 17:03             ` Hari Bathini
  0 siblings, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2016-12-14 15:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hari Bathini, ast, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, sargun, Aravinda Prasad,
	brendan.d.gregg

Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, Dec 14, 2016 at 08:56:43AM +1300, Eric W. Biederman wrote:
>> 
>> I would just make the identifier a structure containing the
>> device number and the inode number.  It didn't look like perf required
>> the identifier to be a simple integer.
>
> Right, perf doesn't care at all here, it's just a transport.

perf report?  In that case I think perf cares enough to know there is
some identifier it is reporting things by.

Eric


* Re: [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report
  2016-12-14 15:52           ` Eric W. Biederman
@ 2016-12-14 17:03             ` Hari Bathini
  0 siblings, 0 replies; 14+ messages in thread
From: Hari Bathini @ 2016-12-14 17:03 UTC (permalink / raw)
  To: Eric W. Biederman, Peter Zijlstra
  Cc: ast, lkml, acme, alexander.shishkin, mingo, daniel, rostedt,
	Ananth N Mavinakayanahalli, sargun, Aravinda Prasad,
	brendan.d.gregg



On Wednesday 14 December 2016 09:22 PM, Eric W. Biederman wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
>
>> On Wed, Dec 14, 2016 at 08:56:43AM +1300, Eric W. Biederman wrote:
>>> I would just make the identifier a structure containing the
>>> device number and the inode number.  It didn't look like perf required
>>> the identifier to be a simple integer.
>> Right, perf doesn't care at all here, it's just a transport.
> perf report?  In that case I think perf cares enough to know there is
> some identifier it is reporting things by.

Let me post v4 with this change.

Thanks
Hari


end of thread, other threads:[~2016-12-14 17:20 UTC | newest]

Thread overview: 14+ messages
2016-12-12 18:19 [PATCH v3 0/3] perf: add support for analyzing events for containers Hari Bathini
2016-12-12 18:19 ` [PATCH v3 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info Hari Bathini
2016-12-12 18:27   ` Eric W. Biederman
2016-12-13 18:47     ` Hari Bathini
2016-12-13 19:58       ` Eric W. Biederman
2016-12-12 18:19 ` [PATCH v3 2/3] perf tool: " Hari Bathini
2016-12-12 21:51   ` Eric W. Biederman
2016-12-12 18:20 ` [PATCH v3 3/3] perf tool: add cgroup identifier entry in perf report Hari Bathini
2016-12-12 22:06   ` Eric W. Biederman
2016-12-13 19:07     ` Hari Bathini
2016-12-13 19:56       ` Eric W. Biederman
2016-12-14  8:24         ` Peter Zijlstra
2016-12-14 15:52           ` Eric W. Biederman
2016-12-14 17:03             ` Hari Bathini
