linux-kernel.vger.kernel.org archive mirror
* [PATCH v7 0/8] perf: add support for analyzing events for containers
@ 2017-02-21 14:01 Hari Bathini
  2017-02-21 14:01 ` [PATCH v7 1/8] perf: add PERF_RECORD_NAMESPACES to include namespaces related info Hari Bathini
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Hari Bathini @ 2017-02-21 14:01 UTC (permalink / raw)
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg, jolsa

Currently, there is no trivial mechanism to analyze events based on
containers. perf -G can be used, but it will not filter events for
containers created after perf is invoked, making it difficult to assess
or analyze performance issues of multiple containers at once.

This patch-set aims to address this limitation by introducing a new
PERF_RECORD_NAMESPACES event that records namespace-related info.
As containers are created with namespaces, the new data can be used in
the assessment/analysis of multiple containers.

The first patch introduces PERF_RECORD_NAMESPACES in the kernel, while
the subsequent patches make the corresponding changes in the perf tool
to read these PERF_RECORD_NAMESPACES events. The last patch demonstrates
analysis of containers with this data by adding a cgroup identifier
column in perf report, which contains the cgroup namespace's device and
inode numbers. This is based on the assumption that each container is
created with its own cgroup namespace. The last patch has scope for
improvement based on the conventions a container is attributed with,
going forward.
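
For reference, the device and inode numbers mentioned above can also be
inspected from userspace by calling stat() on the namespace links under
/proc. The following is only a minimal sketch, not code from this
patch-set: the sketch_ns_id struct and sketch_read_ns_id() helper are
hypothetical names, and it assumes procfs is mounted at /proc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical container for one namespace identity (not from the patch). */
struct sketch_ns_id {
	uint64_t dev;
	uint64_t ino;
};

/*
 * Read the device and inode numbers of /proc/<pid>/ns/<ns_name>.
 * "pid" is a string so that "self" can be passed as well.
 * Returns 0 on success, -1 if the path is too long or stat() fails.
 */
static int sketch_read_ns_id(const char *pid, const char *ns_name,
			     struct sketch_ns_id *id)
{
	char path[64];
	struct stat st;
	int n;

	assert(pid != NULL && ns_name != NULL && id != NULL);

	n = snprintf(path, sizeof(path), "/proc/%s/ns/%s", pid, ns_name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	/* stat() follows the link and reports the namespace's own inode */
	if (stat(path, &st) != 0)
		return -1;

	id->dev = (uint64_t)st.st_dev;
	id->ino = (uint64_t)st.st_ino;
	return 0;
}
```

Calling sketch_read_ns_id("self", "cgroup", &id) for two tasks and
comparing both fields is one way to check whether they share a cgroup
namespace, under the same one-cgroup-namespace-per-container assumption
stated above.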

Changes from v6:
* Updated changelog of patch 1
* Split patch 2 into smaller patches
* Updated record and script documentation
* Dropped name field from ns_link_info struct

Changes from v5:
* Updated changelogs of patches 1 & 3
* Rebased the patches on perf/core in tip

---

Hari Bathini (8):
      perf: add PERF_RECORD_NAMESPACES to include namespaces related info
      perf tool: add PERF_RECORD_NAMESPACES to include namespaces related info
      perf tool: update about the new option to record namespace events
      perf tool: synthesize namespace events for current processes
      perf tool: add print support for namespace events
      perf tool: add script print support for namespace events
      perf tool: update about the new option to show namespace events
      perf tool: add cgroup identifier entry in perf report


 include/linux/perf_event.h               |    2 
 include/uapi/linux/perf_event.h          |   32 ++++++-
 kernel/events/core.c                     |  139 +++++++++++++++++++++++++++++
 kernel/fork.c                            |    2 
 kernel/nsproxy.c                         |    3 +
 tools/include/uapi/linux/perf_event.h    |   32 ++++++-
 tools/perf/Documentation/perf-record.txt |    3 +
 tools/perf/Documentation/perf-script.txt |    3 +
 tools/perf/builtin-annotate.c            |    1 
 tools/perf/builtin-diff.c                |    1 
 tools/perf/builtin-inject.c              |   14 +++
 tools/perf/builtin-kmem.c                |    1 
 tools/perf/builtin-kvm.c                 |    2 
 tools/perf/builtin-lock.c                |    1 
 tools/perf/builtin-mem.c                 |    1 
 tools/perf/builtin-record.c              |   33 ++++++-
 tools/perf/builtin-report.c              |    1 
 tools/perf/builtin-sched.c               |    1 
 tools/perf/builtin-script.c              |   41 ++++++++
 tools/perf/builtin-trace.c               |    3 -
 tools/perf/perf.h                        |    1 
 tools/perf/util/Build                    |    1 
 tools/perf/util/data-convert-bt.c        |    1 
 tools/perf/util/event.c                  |  146 +++++++++++++++++++++++++++++-
 tools/perf/util/event.h                  |   21 ++++
 tools/perf/util/evsel.c                  |    3 +
 tools/perf/util/hist.c                   |    7 +
 tools/perf/util/hist.h                   |    1 
 tools/perf/util/machine.c                |   34 +++++++
 tools/perf/util/machine.h                |    3 +
 tools/perf/util/namespaces.c             |   35 +++++++
 tools/perf/util/namespaces.h             |   26 +++++
 tools/perf/util/session.c                |    7 +
 tools/perf/util/sort.c                   |   41 ++++++++
 tools/perf/util/sort.h                   |    7 +
 tools/perf/util/thread.c                 |   44 +++++++++
 tools/perf/util/thread.h                 |    6 +
 tools/perf/util/tool.h                   |    2 
 38 files changed, 687 insertions(+), 15 deletions(-)
 create mode 100644 tools/perf/util/namespaces.c
 create mode 100644 tools/perf/util/namespaces.h


* [PATCH v7 1/8] perf: add PERF_RECORD_NAMESPACES to include namespaces related info
  2017-02-21 14:01 [PATCH v7 0/8] perf: add support for analyzing events for containers Hari Bathini
@ 2017-02-21 14:01 ` Hari Bathini
  2017-02-24 12:14   ` Peter Zijlstra
  2017-02-21 14:01 ` [PATCH v7 2/8] perf tool: " Hari Bathini
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Hari Bathini @ 2017-02-21 14:01 UTC (permalink / raw)
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg, jolsa

With the advent of container technologies like Docker, which depend
on namespaces for isolation, there is a need for tracing support for
namespaces. This patch introduces a new PERF_RECORD_NAMESPACES event
for recording namespace-related info. By recording info for every
namespace, it is left to userspace to decide on the definition of a
container and to trace containers by updating the perf tool accordingly.

Each namespace has a combination of device and inode numbers. Though
every namespace has the same device number currently, that may change
in future to avoid the need for a namespace of namespaces. Considering
such possibility, record both device and inode numbers separately for
each namespace.
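
To illustrate how a consumer might pick one (dev, ino) pair out of a raw
record, here is a minimal sketch. It assumes the record layout proposed
in this patch (header, pid, tid, nr_namespaces, then the array of pairs)
and ignores the trailing sample_id. All sketch_* names are hypothetical
and exist only for this example; this is not the perf tool's parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirrors of the layout proposed in this patch. */
struct sketch_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

struct sketch_ns_link {
	uint64_t dev;
	uint64_t ino;
};

enum { SKETCH_CGROUP_NS_INDEX = 6 };	/* same value as CGROUP_NS_INDEX */

/*
 * Copy entry "idx" of the namespace array out of a raw record.
 * Returns 0 on success, -1 if the record is truncated or malformed,
 * or if the kernel recorded fewer than idx + 1 namespaces.
 */
static int sketch_ns_lookup(const unsigned char *rec, size_t len,
			    uint64_t idx, struct sketch_ns_link *out)
{
	struct sketch_header hdr;
	uint64_t nr;
	/* skip the header plus the u32 pid and u32 tid */
	size_t off = sizeof(hdr) + 2 * sizeof(uint32_t);

	assert(rec != NULL && out != NULL);

	if (len < off + sizeof(nr))
		return -1;

	memcpy(&hdr, rec, sizeof(hdr));
	if (hdr.size > len || hdr.size < off + sizeof(nr))
		return -1;

	memcpy(&nr, rec + off, sizeof(nr));
	off += sizeof(nr);

	/* the whole array must fit inside the record, and idx inside it */
	if (nr > (hdr.size - off) / sizeof(*out) || idx >= nr)
		return -1;

	memcpy(out, rec + off + idx * sizeof(*out), sizeof(*out));
	return 0;
}
```

Checking idx against nr_namespaces rather than a compile-time constant
lets such a reader cope with a kernel that records fewer or more
namespaces than it knows about.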

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---
 include/linux/perf_event.h      |    2 +
 include/uapi/linux/perf_event.h |   32 +++++++++
 kernel/events/core.c            |  139 +++++++++++++++++++++++++++++++++++++++
 kernel/fork.c                   |    2 +
 kernel/nsproxy.c                |    3 +
 5 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 000fdb2..f19a823 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1112,6 +1112,7 @@ extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks
 
 extern void perf_event_exec(void);
 extern void perf_event_comm(struct task_struct *tsk, bool exec);
+extern void perf_event_namespaces(struct task_struct *tsk);
 extern void perf_event_fork(struct task_struct *tsk);
 
 /* Callchains */
@@ -1315,6 +1316,7 @@ static inline int perf_unregister_guest_info_callbacks
 static inline void perf_event_mmap(struct vm_area_struct *vma)		{ }
 static inline void perf_event_exec(void)				{ }
 static inline void perf_event_comm(struct task_struct *tsk, bool exec)	{ }
+static inline void perf_event_namespaces(struct task_struct *tsk)	{ }
 static inline void perf_event_fork(struct task_struct *tsk)		{ }
 static inline void perf_event_init(void)				{ }
 static inline int  perf_swevent_get_recursion_context(void)		{ return -1; }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485..bec0aad 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -344,7 +344,8 @@ struct perf_event_attr {
 				use_clockid    :  1, /* use @clockid for time fields */
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
-				__reserved_1   : 36;
+				namespaces     :  1, /* include namespaces data */
+				__reserved_1   : 35;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -610,6 +611,23 @@ struct perf_event_header {
 	__u16	size;
 };
 
+struct perf_ns_link_info {
+	__u64	dev;
+	__u64	ino;
+};
+
+enum {
+	NET_NS_INDEX		= 0,
+	UTS_NS_INDEX		= 1,
+	IPC_NS_INDEX		= 2,
+	PID_NS_INDEX		= 3,
+	USER_NS_INDEX		= 4,
+	MNT_NS_INDEX		= 5,
+	CGROUP_NS_INDEX		= 6,
+
+	NR_NAMESPACES,		/* number of available namespaces */
+};
+
 enum perf_event_type {
 
 	/*
@@ -862,6 +880,18 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u32				pid;
+	 *	u32				tid;
+	 *	u64				nr_namespaces;
+	 *	{ u64				dev, inode; } [nr_namespaces];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_NAMESPACES			= 16,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 77a932b..7677eb5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -46,6 +46,8 @@
 #include <linux/filter.h>
 #include <linux/namei.h>
 #include <linux/parser.h>
+#include <linux/proc_ns.h>
+#include <linux/mount.h>
 
 #include "internal.h"
 
@@ -377,6 +379,7 @@ static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events);
 
 static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
+static atomic_t nr_namespaces_events __read_mostly;
 static atomic_t nr_task_events __read_mostly;
 static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
@@ -3987,6 +3990,8 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_mmap_events);
 	if (event->attr.comm)
 		atomic_dec(&nr_comm_events);
+	if (event->attr.namespaces)
+		atomic_dec(&nr_namespaces_events);
 	if (event->attr.task)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
@@ -6487,6 +6492,7 @@ static void perf_event_task(struct task_struct *task,
 void perf_event_fork(struct task_struct *task)
 {
 	perf_event_task(task, NULL, 1);
+	perf_event_namespaces(task);
 }
 
 /*
@@ -6589,6 +6595,132 @@ void perf_event_comm(struct task_struct *task, bool exec)
 }
 
 /*
+ * namespaces tracking
+ */
+
+struct perf_namespaces_event {
+	struct task_struct		*task;
+
+	struct {
+		struct perf_event_header	header;
+
+		u32				pid;
+		u32				tid;
+		u64				nr_namespaces;
+		struct perf_ns_link_info	link_info[NR_NAMESPACES];
+	} event_id;
+};
+
+static int perf_event_namespaces_match(struct perf_event *event)
+{
+	return event->attr.namespaces;
+}
+
+static void perf_event_namespaces_output(struct perf_event *event,
+					 void *data)
+{
+	struct perf_namespaces_event *namespaces_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	int ret;
+
+	if (!perf_event_namespaces_match(event))
+		return;
+
+	perf_event_header__init_id(&namespaces_event->event_id.header,
+				   &sample, event);
+	ret = perf_output_begin(&handle, event,
+				namespaces_event->event_id.header.size);
+	if (ret)
+		return;
+
+	namespaces_event->event_id.pid = perf_event_pid(event,
+							namespaces_event->task);
+	namespaces_event->event_id.tid = perf_event_tid(event,
+							namespaces_event->task);
+
+	perf_output_put(&handle, namespaces_event->event_id);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+}
+
+static void perf_fill_ns_link_info(struct perf_ns_link_info *ns_link_info,
+				   struct task_struct *task,
+				   const struct proc_ns_operations *ns_ops)
+{
+	struct path ns_path;
+	struct inode *ns_inode;
+	void *error;
+
+	error = ns_get_path(&ns_path, task, ns_ops);
+	if (!error) {
+		ns_inode = ns_path.dentry->d_inode;
+		ns_link_info->dev = new_encode_dev(ns_inode->i_sb->s_dev);
+		ns_link_info->ino = ns_inode->i_ino;
+	}
+}
+
+void perf_event_namespaces(struct task_struct *task)
+{
+	struct perf_namespaces_event namespaces_event;
+	struct perf_ns_link_info *ns_link_info;
+
+	if (!atomic_read(&nr_namespaces_events))
+		return;
+
+	namespaces_event = (struct perf_namespaces_event){
+		.task	= task,
+		.event_id  = {
+			.header = {
+				.type = PERF_RECORD_NAMESPACES,
+				.misc = 0,
+				.size = sizeof(namespaces_event.event_id),
+			},
+			/* .pid */
+			/* .tid */
+			.nr_namespaces = NR_NAMESPACES,
+			/* .link_info[NR_NAMESPACES] */
+		},
+	};
+
+	ns_link_info = namespaces_event.event_id.link_info;
+
+	perf_fill_ns_link_info(&ns_link_info[MNT_NS_INDEX],
+			       task, &mntns_operations);
+
+#ifdef CONFIG_USER_NS
+	perf_fill_ns_link_info(&ns_link_info[USER_NS_INDEX],
+			       task, &userns_operations);
+#endif
+#ifdef CONFIG_NET_NS
+	perf_fill_ns_link_info(&ns_link_info[NET_NS_INDEX],
+			       task, &netns_operations);
+#endif
+#ifdef CONFIG_UTS_NS
+	perf_fill_ns_link_info(&ns_link_info[UTS_NS_INDEX],
+			       task, &utsns_operations);
+#endif
+#ifdef CONFIG_IPC_NS
+	perf_fill_ns_link_info(&ns_link_info[IPC_NS_INDEX],
+			       task, &ipcns_operations);
+#endif
+#ifdef CONFIG_PID_NS
+	perf_fill_ns_link_info(&ns_link_info[PID_NS_INDEX],
+			       task, &pidns_operations);
+#endif
+#ifdef CONFIG_CGROUPS
+	perf_fill_ns_link_info(&ns_link_info[CGROUP_NS_INDEX],
+			       task, &cgroupns_operations);
+#endif
+
+	perf_iterate_sb(perf_event_namespaces_output,
+			&namespaces_event,
+			NULL);
+}
+
+/*
  * mmap tracking
  */
 
@@ -9142,6 +9274,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_mmap_events);
 	if (event->attr.comm)
 		atomic_inc(&nr_comm_events);
+	if (event->attr.namespaces)
+		atomic_inc(&nr_namespaces_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
 	if (event->attr.freq)
@@ -9687,6 +9821,11 @@ SYSCALL_DEFINE5(perf_event_open,
 			return -EACCES;
 	}
 
+	if (attr.namespaces) {
+		if (!capable(CAP_SYS_ADMIN))
+			return -EACCES;
+	}
+
 	if (attr.freq) {
 		if (attr.sample_freq > sysctl_perf_event_sample_rate)
 			return -EINVAL;
diff --git a/kernel/fork.c b/kernel/fork.c
index 11c5c8a..084755e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2277,6 +2277,8 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
 		}
 	}
 
+	perf_event_namespaces(current);
+
 bad_unshare_cleanup_cred:
 	if (new_cred)
 		put_cred(new_cred);
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 782102e..f6c5d33 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -26,6 +26,7 @@
 #include <linux/file.h>
 #include <linux/syscalls.h>
 #include <linux/cgroup.h>
+#include <linux/perf_event.h>
 
 static struct kmem_cache *nsproxy_cachep;
 
@@ -262,6 +263,8 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype)
 		goto out;
 	}
 	switch_task_namespaces(tsk, new_nsproxy);
+
+	perf_event_namespaces(tsk);
 out:
 	fput(file);
 	return err;


* [PATCH v7 2/8] perf tool: add PERF_RECORD_NAMESPACES to include namespaces related info
  2017-02-21 14:01 [PATCH v7 0/8] perf: add support for analyzing events for containers Hari Bathini
  2017-02-21 14:01 ` [PATCH v7 1/8] perf: add PERF_RECORD_NAMESPACES to include namespaces related info Hari Bathini
@ 2017-02-21 14:01 ` Hari Bathini
  2017-03-01 21:02   ` Arnaldo Carvalho de Melo
  2017-02-21 14:01 ` [PATCH v7 3/8] perf tool: update about the new option to record namespace events Hari Bathini
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Hari Bathini @ 2017-02-21 14:01 UTC (permalink / raw)
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg, jolsa

Update perf tool to examine PERF_RECORD_NAMESPACES events emitted by
the kernel when fork, clone, setns or unshare are invoked.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---
 tools/include/uapi/linux/perf_event.h |   32 +++++++++++++++++++++++-
 tools/perf/builtin-annotate.c         |    1 +
 tools/perf/builtin-diff.c             |    1 +
 tools/perf/builtin-inject.c           |   14 +++++++++++
 tools/perf/builtin-kmem.c             |    1 +
 tools/perf/builtin-kvm.c              |    2 ++
 tools/perf/builtin-lock.c             |    1 +
 tools/perf/builtin-mem.c              |    1 +
 tools/perf/builtin-record.c           |    6 +++++
 tools/perf/builtin-report.c           |    1 +
 tools/perf/builtin-sched.c            |    1 +
 tools/perf/builtin-script.c           |    1 +
 tools/perf/builtin-trace.c            |    3 ++
 tools/perf/perf.h                     |    1 +
 tools/perf/util/Build                 |    1 +
 tools/perf/util/data-convert-bt.c     |    1 +
 tools/perf/util/event.c               |    9 +++++++
 tools/perf/util/event.h               |   14 +++++++++++
 tools/perf/util/evsel.c               |    3 ++
 tools/perf/util/machine.c             |   31 +++++++++++++++++++++++
 tools/perf/util/machine.h             |    3 ++
 tools/perf/util/namespaces.c          |   35 ++++++++++++++++++++++++++
 tools/perf/util/namespaces.h          |   26 ++++++++++++++++++++
 tools/perf/util/session.c             |    7 +++++
 tools/perf/util/thread.c              |   44 ++++++++++++++++++++++++++++++++-
 tools/perf/util/thread.h              |    6 +++++
 tools/perf/util/tool.h                |    2 ++
 27 files changed, 244 insertions(+), 4 deletions(-)
 create mode 100644 tools/perf/util/namespaces.c
 create mode 100644 tools/perf/util/namespaces.h

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index c66a485..bec0aad 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -344,7 +344,8 @@ struct perf_event_attr {
 				use_clockid    :  1, /* use @clockid for time fields */
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
-				__reserved_1   : 36;
+				namespaces     :  1, /* include namespaces data */
+				__reserved_1   : 35;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -610,6 +611,23 @@ struct perf_event_header {
 	__u16	size;
 };
 
+struct perf_ns_link_info {
+	__u64	dev;
+	__u64	ino;
+};
+
+enum {
+	NET_NS_INDEX		= 0,
+	UTS_NS_INDEX		= 1,
+	IPC_NS_INDEX		= 2,
+	PID_NS_INDEX		= 3,
+	USER_NS_INDEX		= 4,
+	MNT_NS_INDEX		= 5,
+	CGROUP_NS_INDEX		= 6,
+
+	NR_NAMESPACES,		/* number of available namespaces */
+};
+
 enum perf_event_type {
 
 	/*
@@ -862,6 +880,18 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u32				pid;
+	 *	u32				tid;
+	 *	u64				nr_namespaces;
+	 *	{ u64				dev, inode; } [nr_namespaces];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_NAMESPACES			= 16,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index ebb6283..1b63dc4 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -393,6 +393,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
 			.comm	= perf_event__process_comm,
 			.exit	= perf_event__process_exit,
 			.fork	= perf_event__process_fork,
+			.namespaces = perf_event__process_namespaces,
 			.ordered_events = true,
 			.ordering_requires_timestamps = true,
 		},
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 70a2893..4b821cf 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -364,6 +364,7 @@ static struct perf_tool tool = {
 	.exit	= perf_event__process_exit,
 	.fork	= perf_event__process_fork,
 	.lost	= perf_event__process_lost,
+	.namespaces = perf_event__process_namespaces,
 	.ordered_events = true,
 	.ordering_requires_timestamps = true,
 };
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index b9bc7e3..c5ddc73 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -333,6 +333,19 @@ static int perf_event__repipe_comm(struct perf_tool *tool,
 	return err;
 }
 
+static int perf_event__repipe_namespaces(struct perf_tool *tool,
+					 union perf_event *event,
+					 struct perf_sample *sample,
+					 struct machine *machine)
+{
+	int err;
+
+	err = perf_event__process_namespaces(tool, event, sample, machine);
+	perf_event__repipe(tool, event, sample, machine);
+
+	return err;
+}
+
 static int perf_event__repipe_exit(struct perf_tool *tool,
 				   union perf_event *event,
 				   struct perf_sample *sample,
@@ -660,6 +673,7 @@ static int __cmd_inject(struct perf_inject *inject)
 		session->itrace_synth_opts = &inject->itrace_synth_opts;
 		inject->itrace_synth_opts.inject = true;
 		inject->tool.comm	    = perf_event__repipe_comm;
+		inject->tool.namespaces	    = perf_event__repipe_namespaces;
 		inject->tool.exit	    = perf_event__repipe_exit;
 		inject->tool.id_index	    = perf_event__repipe_id_index;
 		inject->tool.auxtrace_info  = perf_event__process_auxtrace_info;
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 6da8d08..d509e74 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -964,6 +964,7 @@ static struct perf_tool perf_kmem = {
 	.comm		 = perf_event__process_comm,
 	.mmap		 = perf_event__process_mmap,
 	.mmap2		 = perf_event__process_mmap2,
+	.namespaces	 = perf_event__process_namespaces,
 	.ordered_events	 = true,
 };
 
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 08fa88f..18e6c38 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1044,6 +1044,7 @@ static int read_events(struct perf_kvm_stat *kvm)
 	struct perf_tool eops = {
 		.sample			= process_sample_event,
 		.comm			= perf_event__process_comm,
+		.namespaces		= perf_event__process_namespaces,
 		.ordered_events		= true,
 	};
 	struct perf_data_file file = {
@@ -1348,6 +1349,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
 	kvm->tool.exit   = perf_event__process_exit;
 	kvm->tool.fork   = perf_event__process_fork;
 	kvm->tool.lost   = process_lost_event;
+	kvm->tool.namespaces  = perf_event__process_namespaces;
 	kvm->tool.ordered_events = true;
 	perf_tool__fill_defaults(&kvm->tool);
 
diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index ce3bfb4..d750cca 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -858,6 +858,7 @@ static int __cmd_report(bool display_info)
 	struct perf_tool eops = {
 		.sample		 = process_sample_event,
 		.comm		 = perf_event__process_comm,
+		.namespaces	 = perf_event__process_namespaces,
 		.ordered_events	 = true,
 	};
 	struct perf_data_file file = {
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index cd7bc4d..430656c 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -342,6 +342,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused)
 			.lost		= perf_event__process_lost,
 			.fork		= perf_event__process_fork,
 			.build_id	= perf_event__process_build_id,
+			.namespaces	= perf_event__process_namespaces,
 			.ordered_events	= true,
 		},
 		.input_name		 = "perf.data",
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 6cd6776..a8b9a78 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -876,6 +876,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	signal(SIGTERM, sig_handler);
 	signal(SIGSEGV, sigsegv_handler);
 
+	if (rec->opts.record_namespaces)
+		tool->namespace_events = true;
+
 	if (rec->opts.auxtrace_snapshot_mode || rec->switch_output.enabled) {
 		signal(SIGUSR2, snapshot_sig_handler);
 		if (rec->opts.auxtrace_snapshot_mode)
@@ -1497,6 +1500,7 @@ static struct record record = {
 		.fork		= perf_event__process_fork,
 		.exit		= perf_event__process_exit,
 		.comm		= perf_event__process_comm,
+		.namespaces	= perf_event__process_namespaces,
 		.mmap		= perf_event__process_mmap,
 		.mmap2		= perf_event__process_mmap2,
 		.ordered_events	= true,
@@ -1611,6 +1615,8 @@ static struct option __record_options[] = {
 			  "opts", "AUX area tracing Snapshot Mode", ""),
 	OPT_UINTEGER(0, "proc-map-timeout", &record.opts.proc_map_timeout,
 			"per thread proc mmap processing timeout in ms"),
+	OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces,
+		    "Record namespaces events"),
 	OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
 		    "Record context switch events"),
 	OPT_BOOLEAN_FLAG(0, "all-kernel", &record.opts.all_kernel,
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index dbd7fa0..5c92c75 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -694,6 +694,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 			.mmap		 = perf_event__process_mmap,
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
+			.namespaces	 = perf_event__process_namespaces,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
 			.lost		 = perf_event__process_lost,
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 270eb2d..e0ddd04 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -3272,6 +3272,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		.tool = {
 			.sample		 = perf_sched__process_tracepoint_sample,
 			.comm		 = perf_event__process_comm,
+			.namespaces	 = perf_event__process_namespaces,
 			.lost		 = perf_event__process_lost,
 			.fork		 = perf_sched__process_fork_event,
 			.ordered_events = true,
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index c0783b4..f1ce806 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2097,6 +2097,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 			.mmap		 = perf_event__process_mmap,
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
+			.namespaces	 = perf_event__process_namespaces,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
 			.attr		 = process_attr,
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 40ef9b2..0bcd32f 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2415,8 +2415,9 @@ static int trace__replay(struct trace *trace)
 	trace->tool.exit	  = perf_event__process_exit;
 	trace->tool.fork	  = perf_event__process_fork;
 	trace->tool.attr	  = perf_event__process_attr;
-	trace->tool.tracing_data = perf_event__process_tracing_data;
+	trace->tool.tracing_data  = perf_event__process_tracing_data;
 	trace->tool.build_id	  = perf_event__process_build_id;
+	trace->tool.namespaces	  = perf_event__process_namespaces;
 
 	trace->tool.ordered_events = true;
 	trace->tool.ordering_requires_timestamps = true;
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 1c27d94..806c216 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -50,6 +50,7 @@ struct record_opts {
 	bool	     running_time;
 	bool	     full_auxtrace;
 	bool	     auxtrace_snapshot_mode;
+	bool	     record_namespaces;
 	bool	     record_switch_events;
 	bool	     all_kernel;
 	bool	     all_user;
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 5da376b..2ea5ee1 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -42,6 +42,7 @@ libperf-y += pstack.o
 libperf-y += session.o
 libperf-$(CONFIG_AUDIT) += syscalltbl.o
 libperf-y += ordered-events.o
+libperf-y += namespaces.o
 libperf-y += comm.o
 libperf-y += thread.o
 libperf-y += thread_map.o
diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 4e6cbc9..89ece24 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1468,6 +1468,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,
 			.lost            = perf_event__process_lost,
 			.tracing_data    = perf_event__process_tracing_data,
 			.build_id        = perf_event__process_build_id,
+			.namespaces      = perf_event__process_namespaces,
 			.ordered_events  = true,
 			.ordering_requires_timestamps = true,
 		},
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 4ea7ce7..f118eac 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -31,6 +31,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_LOST_SAMPLES]		= "LOST_SAMPLES",
 	[PERF_RECORD_SWITCH]			= "SWITCH",
 	[PERF_RECORD_SWITCH_CPU_WIDE]		= "SWITCH_CPU_WIDE",
+	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -1016,6 +1017,14 @@ int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
 	return machine__process_comm_event(machine, event, sample);
 }
 
+int perf_event__process_namespaces(struct perf_tool *tool __maybe_unused,
+				   union perf_event *event,
+				   struct perf_sample *sample,
+				   struct machine *machine)
+{
+	return machine__process_namespaces_event(machine, event, sample);
+}
+
 int perf_event__process_lost(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
 			     struct perf_sample *sample,
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index c735c53..4e90b09 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -39,6 +39,15 @@ struct comm_event {
 	char comm[16];
 };
 
+#define NAMESPACES_MAX			12
+
+struct namespaces_event {
+	struct perf_event_header header;
+	u32 pid, tid;
+	u64 nr_namespaces;
+	struct perf_ns_link_info link_info[NAMESPACES_MAX];
+};
+
 struct fork_event {
 	struct perf_event_header header;
 	u32 pid, ppid;
@@ -485,6 +494,7 @@ union perf_event {
 	struct mmap_event		mmap;
 	struct mmap2_event		mmap2;
 	struct comm_event		comm;
+	struct namespaces_event		namespaces;
 	struct fork_event		fork;
 	struct lost_event		lost;
 	struct lost_samples_event	lost_samples;
@@ -587,6 +597,10 @@ int perf_event__process_switch(struct perf_tool *tool,
 			       union perf_event *event,
 			       struct perf_sample *sample,
 			       struct machine *machine);
+int perf_event__process_namespaces(struct perf_tool *tool,
+				   union perf_event *event,
+				   struct perf_sample *sample,
+				   struct machine *machine);
 int perf_event__process_mmap(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ac59710..175dc23 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -932,6 +932,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
 	attr->mmap2 = track && !perf_missing_features.mmap2;
 	attr->comm  = track;
 
+	if (opts->record_namespaces)
+		attr->namespaces  = track;
+
 	if (opts->record_switch_events)
 		attr->context_switch = track;
 
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 71c9720..060fabb 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -13,6 +13,7 @@
 #include <symbol/kallsyms.h>
 #include "unwind.h"
 #include "linux/hash.h"
+#include "asm/bug.h"
 
 static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);
 
@@ -501,6 +502,34 @@ int machine__process_comm_event(struct machine *machine, union perf_event *event
 	return err;
 }
 
+int machine__process_namespaces_event(struct machine *machine __maybe_unused,
+				      union perf_event *event,
+				      struct perf_sample *sample __maybe_unused)
+{
+	struct thread *thread = machine__findnew_thread(machine,
+							event->namespaces.pid,
+							event->namespaces.tid);
+	int err = 0;
+
+	WARN_ONCE(event->namespaces.nr_namespaces > NR_NAMESPACES,
+		  "\nWARNING: kernel seems to support more namespaces than perf"
+		  " tool.\nTry updating the perf tool..\n\n");
+
+	WARN_ONCE(event->namespaces.nr_namespaces < NR_NAMESPACES,
+		  "\nWARNING: perf tool seems to support more namespaces than"
+		  " the kernel.\nTry updating the kernel..\n\n");
+
+	if (thread == NULL ||
+	    thread__set_namespaces(thread, sample->time, &event->namespaces)) {
+		dump_printf("problem processing PERF_RECORD_NAMESPACES, skipping event.\n");
+		err = -1;
+	}
+
+	thread__put(thread);
+
+	return err;
+}
+
 int machine__process_lost_event(struct machine *machine __maybe_unused,
 				union perf_event *event, struct perf_sample *sample __maybe_unused)
 {
@@ -1538,6 +1567,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 		ret = machine__process_comm_event(machine, event, sample); break;
 	case PERF_RECORD_MMAP:
 		ret = machine__process_mmap_event(machine, event, sample); break;
+	case PERF_RECORD_NAMESPACES:
+		ret = machine__process_namespaces_event(machine, event, sample); break;
 	case PERF_RECORD_MMAP2:
 		ret = machine__process_mmap2_event(machine, event, sample); break;
 	case PERF_RECORD_FORK:
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index a283050..3cdb134 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -97,6 +97,9 @@ int machine__process_itrace_start_event(struct machine *machine,
 					union perf_event *event);
 int machine__process_switch_event(struct machine *machine,
 				  union perf_event *event);
+int machine__process_namespaces_event(struct machine *machine,
+				      union perf_event *event,
+				      struct perf_sample *sample);
 int machine__process_mmap_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
diff --git a/tools/perf/util/namespaces.c b/tools/perf/util/namespaces.c
new file mode 100644
index 0000000..3134c00
--- /dev/null
+++ b/tools/perf/util/namespaces.c
@@ -0,0 +1,35 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * Copyright (C) 2017 Hari Bathini, IBM Corporation
+ */
+
+#include "namespaces.h"
+#include "util.h"
+#include "event.h"
+#include <stdlib.h>
+#include <stdio.h>
+
+struct namespaces *namespaces__new(struct namespaces_event *event)
+{
+	struct namespaces *namespaces = zalloc(sizeof(*namespaces));
+
+	if (!namespaces)
+		return NULL;
+
+	namespaces->end_time = -1;
+
+	if (event) {
+		memcpy(namespaces->link_info, event->link_info,
+		       sizeof(namespaces->link_info));
+	}
+
+	return namespaces;
+}
+
+void namespaces__free(struct namespaces *namespaces)
+{
+	free(namespaces);
+}
diff --git a/tools/perf/util/namespaces.h b/tools/perf/util/namespaces.h
new file mode 100644
index 0000000..45d9ffd
--- /dev/null
+++ b/tools/perf/util/namespaces.h
@@ -0,0 +1,26 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * Copyright (C) 2017 Hari Bathini, IBM Corporation
+ */
+
+#ifndef __PERF_NAMESPACES_H
+#define __PERF_NAMESPACES_H
+
+#include "../perf.h"
+#include <linux/list.h>
+
+struct namespaces_event;
+
+struct namespaces {
+	struct list_head list;
+	u64 end_time;
+	struct perf_ns_link_info link_info[NR_NAMESPACES];
+};
+
+struct namespaces *namespaces__new(struct namespaces_event *event);
+void namespaces__free(struct namespaces *namespaces);
+
+#endif  /* __PERF_NAMESPACES_H */
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4cdbc8f..0b782a3 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1239,6 +1239,8 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->mmap2(tool, event, sample, machine);
 	case PERF_RECORD_COMM:
 		return tool->comm(tool, event, sample, machine);
+	case PERF_RECORD_NAMESPACES:
+		return tool->namespaces(tool, event, sample, machine);
 	case PERF_RECORD_FORK:
 		return tool->fork(tool, event, sample, machine);
 	case PERF_RECORD_EXIT:
@@ -1494,6 +1496,11 @@ int perf_session__register_idle_thread(struct perf_session *session)
 		err = -1;
 	}
 
+	if (thread == NULL || thread__set_namespaces(thread, 0, NULL)) {
+		pr_err("problem inserting idle task.\n");
+		err = -1;
+	}
+
 	/* machine__findnew_thread() got the thread, so put it */
 	thread__put(thread);
 	return err;
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index f5af87f..b9fe432 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -7,6 +7,7 @@
 #include "thread-stack.h"
 #include "util.h"
 #include "debug.h"
+#include "namespaces.h"
 #include "comm.h"
 #include "unwind.h"
 
@@ -40,6 +41,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		thread->tid = tid;
 		thread->ppid = -1;
 		thread->cpu = -1;
+		INIT_LIST_HEAD(&thread->namespaces_list);
 		INIT_LIST_HEAD(&thread->comm_list);
 
 		comm_str = malloc(32);
@@ -66,7 +68,8 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 
 void thread__delete(struct thread *thread)
 {
-	struct comm *comm, *tmp;
+	struct namespaces *namespaces, *tmp_namespaces;
+	struct comm *comm, *tmp_comm;
 
 	BUG_ON(!RB_EMPTY_NODE(&thread->rb_node));
 
@@ -76,7 +79,12 @@ void thread__delete(struct thread *thread)
 		map_groups__put(thread->mg);
 		thread->mg = NULL;
 	}
-	list_for_each_entry_safe(comm, tmp, &thread->comm_list, list) {
+	list_for_each_entry_safe(namespaces, tmp_namespaces,
+				 &thread->namespaces_list, list) {
+		list_del(&namespaces->list);
+		namespaces__free(namespaces);
+	}
+	list_for_each_entry_safe(comm, tmp_comm, &thread->comm_list, list) {
 		list_del(&comm->list);
 		comm__free(comm);
 	}
@@ -104,6 +112,38 @@ void thread__put(struct thread *thread)
 	}
 }
 
+struct namespaces *thread__namespaces(const struct thread *thread)
+{
+	if (list_empty(&thread->namespaces_list))
+		return NULL;
+
+	return list_first_entry(&thread->namespaces_list, struct namespaces, list);
+}
+
+int thread__set_namespaces(struct thread *thread, u64 timestamp,
+			   struct namespaces_event *event)
+{
+	struct namespaces *new, *curr = thread__namespaces(thread);
+
+	new = namespaces__new(event);
+	if (!new)
+		return -ENOMEM;
+
+	list_add(&new->list, &thread->namespaces_list);
+
+	if (timestamp && curr) {
+		/*
+		 * setns syscall must have changed few or all the namespaces
+		 * of this thread. Update end time for the namespaces
+		 * previously used.
+		 */
+		curr = list_next_entry(new, list);
+		curr->end_time = timestamp;
+	}
+
+	return 0;
+}
+
 struct comm *thread__comm(const struct thread *thread)
 {
 	if (list_empty(&thread->comm_list))
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 99263cb..b18b5a2 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -28,6 +28,7 @@ struct thread {
 	bool			comm_set;
 	int			comm_len;
 	bool			dead; /* if set thread has exited */
+	struct list_head	namespaces_list;
 	struct list_head	comm_list;
 	u64			db_id;
 
@@ -40,6 +41,7 @@ struct thread {
 };
 
 struct machine;
+struct namespaces;
 struct comm;
 
 struct thread *thread__new(pid_t pid, pid_t tid);
@@ -62,6 +64,10 @@ static inline void thread__exited(struct thread *thread)
 	thread->dead = true;
 }
 
+struct namespaces *thread__namespaces(const struct thread *thread);
+int thread__set_namespaces(struct thread *thread, u64 timestamp,
+			   struct namespaces_event *event);
+
 int __thread__set_comm(struct thread *thread, const char *comm, u64 timestamp,
 		       bool exec);
 static inline int thread__set_comm(struct thread *thread, const char *comm,
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index ac2590a..829471a 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -40,6 +40,7 @@ struct perf_tool {
 	event_op	mmap,
 			mmap2,
 			comm,
+			namespaces,
 			fork,
 			exit,
 			lost,
@@ -66,6 +67,7 @@ struct perf_tool {
 	event_op3	auxtrace;
 	bool		ordered_events;
 	bool		ordering_requires_timestamps;
+	bool		namespace_events;
 };
 
 #endif /* __PERF_TOOL_H */

* [PATCH v7 3/8] perf tool: update about the new option to record namespace events
  2017-02-21 14:01 [PATCH v7 0/8] perf: add support for analyzing events for containers Hari Bathini
  2017-02-21 14:01 ` [PATCH v7 1/8] perf: add PERF_RECORD_NAMESPACES to include namespaces related info Hari Bathini
  2017-02-21 14:01 ` [PATCH v7 2/8] perf tool: " Hari Bathini
@ 2017-02-21 14:01 ` Hari Bathini
  2017-03-01 21:03   ` Arnaldo Carvalho de Melo
  2017-02-21 14:01 ` [PATCH v7 4/8] perf tool: synthesize namespace events for current processes Hari Bathini
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Hari Bathini @ 2017-02-21 14:01 UTC (permalink / raw)
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg, jolsa

Now that we have a new option to record namespace events, update
the perf-record documentation accordingly.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---
 tools/perf/Documentation/perf-record.txt |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 27256bc..9c85a65 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -347,6 +347,9 @@ Enable weightened sampling. An additional weight is recorded per sample and can
 displayed with the weight and local_weight sort keys.  This currently works for TSX
 abort events and some memory events in precise mode on modern Intel CPUs.
 
+--namespaces::
+Record events of type PERF_RECORD_NAMESPACES.
+
 --transaction::
 Record transaction flags for transaction related events.
 

* [PATCH v7 4/8] perf tool: synthesize namespace events for current processes
  2017-02-21 14:01 [PATCH v7 0/8] perf: add support for analyzing events for containers Hari Bathini
                   ` (2 preceding siblings ...)
  2017-02-21 14:01 ` [PATCH v7 3/8] perf tool: update about the new option to record namespace events Hari Bathini
@ 2017-02-21 14:01 ` Hari Bathini
  2017-03-01 21:05   ` Arnaldo Carvalho de Melo
  2017-02-21 14:01 ` [PATCH v7 5/8] perf tool: add print support for namespace events Hari Bathini
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Hari Bathini @ 2017-02-21 14:01 UTC (permalink / raw)
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg, jolsa

Synthesize PERF_RECORD_NAMESPACES events for processes that were
already running when perf record was invoked, taking the data from
/proc/$PID/ns. This makes it possible to analyze events from those
processes with regard to namespaces.
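
The lookup described above can be sketched in isolation as below. This is
only an illustration, not the code in this patch: struct ns_link and
ns_link_read() are hypothetical names standing in for struct
perf_ns_link_info and the helper added to util/event.c, and the error
handling is an assumption of the sketch.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical stand-in for the patch's struct perf_ns_link_info. */
struct ns_link {
	unsigned long long dev;
	unsigned long long ino;
};

/*
 * Fill *out with the device and inode numbers of /proc/<pid>/ns/<name>.
 * Returns 0 on success, -1 if the path is too long or stat() fails
 * (for example because the task has exited or the kernel does not
 * provide that namespace type). *out is zeroed on failure.
 */
static int ns_link_read(pid_t pid, const char *name, struct ns_link *out)
{
	char path[64];
	struct stat st;
	int len;

	memset(out, 0, sizeof(*out));

	len = snprintf(path, sizeof(path), "/proc/%d/ns/%s", (int)pid, name);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;

	/* stat() follows the symlink, so st describes the namespace itself. */
	if (stat(path, &st) != 0)
		return -1;

	out->dev = (unsigned long long)st.st_dev;
	out->ino = (unsigned long long)st.st_ino;
	return 0;
}
```

Calling ns_link_read(pid, "cgroup", &info) for each task of interest would
give the pair that the later patches treat as a cgroup identifier. Leaving
the entry zeroed on failure matches what the synthesized event ends up
with when a namespace link cannot be read.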

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---
 tools/perf/builtin-record.c |   27 +++++++++--
 tools/perf/util/event.c     |  107 +++++++++++++++++++++++++++++++++++++++++--
 tools/perf/util/event.h     |    6 ++
 3 files changed, 130 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index a8b9a78..f4bf6a6 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -986,6 +986,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	 */
 	if (forks) {
 		union perf_event *event;
+		pid_t tgid;
 
 		event = malloc(sizeof(event->comm) + machine->id_hdr_size);
 		if (event == NULL) {
@@ -999,10 +1000,28 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		 * cannot see a correct process name for those events.
 		 * Synthesize COMM event to prevent it.
 		 */
-		perf_event__synthesize_comm(tool, event,
-					    rec->evlist->workload.pid,
-					    process_synthesized_event,
-					    machine);
+		tgid = perf_event__synthesize_comm(tool, event,
+						   rec->evlist->workload.pid,
+						   process_synthesized_event,
+						   machine);
+		free(event);
+
+		if (tgid == -1)
+			goto out_child;
+
+		event = malloc(sizeof(event->namespaces) + machine->id_hdr_size);
+		if (event == NULL) {
+			err = -ENOMEM;
+			goto out_child;
+		}
+
+		/*
+		 * Synthesize NAMESPACES event for the command specified.
+		 */
+		perf_event__synthesize_namespaces(tool, event,
+						  rec->evlist->workload.pid,
+						  tgid, process_synthesized_event,
+						  machine);
 		free(event);
 
 		perf_evlist__start_workload(rec->evlist);
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index f118eac..c8c112a 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -50,6 +50,16 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_TIME_CONV]			= "TIME_CONV",
 };
 
+static const char *perf_ns__names[] = {
+	[NET_NS_INDEX]		= "net",
+	[UTS_NS_INDEX]		= "uts",
+	[IPC_NS_INDEX]		= "ipc",
+	[PID_NS_INDEX]		= "pid",
+	[USER_NS_INDEX]		= "user",
+	[MNT_NS_INDEX]		= "mnt",
+	[CGROUP_NS_INDEX]	= "cgroup",
+};
+
 const char *perf_event__name(unsigned int id)
 {
 	if (id >= ARRAY_SIZE(perf_event__names))
@@ -59,6 +69,13 @@ const char *perf_event__name(unsigned int id)
 	return perf_event__names[id];
 }
 
+static const char *perf_ns__name(unsigned int id)
+{
+	if (id >= ARRAY_SIZE(perf_ns__names))
+		return "UNKNOWN";
+	return perf_ns__names[id];
+}
+
 static int perf_tool__process_synth_event(struct perf_tool *tool,
 					  union perf_event *event,
 					  struct machine *machine,
@@ -204,6 +221,56 @@ pid_t perf_event__synthesize_comm(struct perf_tool *tool,
 	return tgid;
 }
 
+static void perf_event__get_ns_link_info(pid_t pid, const char *ns,
+					 struct perf_ns_link_info *ns_link_info)
+{
+	struct stat64 st;
+	char proc_ns[128];
+
+	sprintf(proc_ns, "/proc/%u/ns/%s", pid, ns);
+	if (stat64(proc_ns, &st) == 0) {
+		ns_link_info->dev = st.st_dev;
+		ns_link_info->ino = st.st_ino;
+	}
+}
+
+int perf_event__synthesize_namespaces(struct perf_tool *tool,
+				      union perf_event *event,
+				      pid_t pid, pid_t tgid,
+				      perf_event__handler_t process,
+				      struct machine *machine)
+{
+	u32 idx;
+	struct perf_ns_link_info *ns_link_info;
+
+	if (!tool->namespace_events)
+		return 0;
+
+	memset(&event->namespaces, 0,
+	       sizeof(event->namespaces) + machine->id_hdr_size);
+
+	event->namespaces.pid = tgid;
+	event->namespaces.tid = pid;
+
+	event->namespaces.nr_namespaces = NR_NAMESPACES;
+
+	ns_link_info = event->namespaces.link_info;
+
+	for (idx = 0; idx < event->namespaces.nr_namespaces; idx++)
+		perf_event__get_ns_link_info(pid, perf_ns__name(idx),
+					     &ns_link_info[idx]);
+
+	event->namespaces.header.type = PERF_RECORD_NAMESPACES;
+
+	event->namespaces.header.size = (sizeof(event->namespaces) +
+					 machine->id_hdr_size);
+
+	if (perf_tool__process_synth_event(tool, event, machine, process) != 0)
+		return -1;
+
+	return 0;
+}
+
 static int perf_event__synthesize_fork(struct perf_tool *tool,
 				       union perf_event *event,
 				       pid_t pid, pid_t tgid, pid_t ppid,
@@ -435,8 +502,9 @@ int perf_event__synthesize_modules(struct perf_tool *tool,
 static int __event__synthesize_thread(union perf_event *comm_event,
 				      union perf_event *mmap_event,
 				      union perf_event *fork_event,
+				      union perf_event *namespaces_event,
 				      pid_t pid, int full,
-					  perf_event__handler_t process,
+				      perf_event__handler_t process,
 				      struct perf_tool *tool,
 				      struct machine *machine,
 				      bool mmap_data,
@@ -456,6 +524,11 @@ static int __event__synthesize_thread(union perf_event *comm_event,
 		if (tgid == -1)
 			return -1;
 
+		if (perf_event__synthesize_namespaces(tool, namespaces_event, pid,
+						      tgid, process, machine) < 0)
+			return -1;
+
+
 		return perf_event__synthesize_mmap_events(tool, mmap_event, pid, tgid,
 							  process, machine, mmap_data,
 							  proc_map_timeout);
@@ -489,6 +562,11 @@ static int __event__synthesize_thread(union perf_event *comm_event,
 		if (perf_event__synthesize_fork(tool, fork_event, _pid, tgid,
 						ppid, process, machine) < 0)
 			break;
+
+		if (perf_event__synthesize_namespaces(tool, namespaces_event, _pid,
+						      tgid, process, machine) < 0)
+			break;
+
 		/*
 		 * Send the prepared comm event
 		 */
@@ -517,6 +595,7 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 				      unsigned int proc_map_timeout)
 {
 	union perf_event *comm_event, *mmap_event, *fork_event;
+	union perf_event *namespaces_event;
 	int err = -1, thread, j;
 
 	comm_event = malloc(sizeof(comm_event->comm) + machine->id_hdr_size);
@@ -531,10 +610,15 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 	if (fork_event == NULL)
 		goto out_free_mmap;
 
+	namespaces_event = malloc(sizeof(namespaces_event->namespaces) +
+				  machine->id_hdr_size);
+	if (namespaces_event == NULL)
+		goto out_free_fork;
+
 	err = 0;
 	for (thread = 0; thread < threads->nr; ++thread) {
 		if (__event__synthesize_thread(comm_event, mmap_event,
-					       fork_event,
+					       fork_event, namespaces_event,
 					       thread_map__pid(threads, thread), 0,
 					       process, tool, machine,
 					       mmap_data, proc_map_timeout)) {
@@ -560,7 +644,7 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 			/* if not, generate events for it */
 			if (need_leader &&
 			    __event__synthesize_thread(comm_event, mmap_event,
-						       fork_event,
+						       fork_event, namespaces_event,
 						       comm_event->comm.pid, 0,
 						       process, tool, machine,
 						       mmap_data, proc_map_timeout)) {
@@ -569,6 +653,8 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 			}
 		}
 	}
+	free(namespaces_event);
+out_free_fork:
 	free(fork_event);
 out_free_mmap:
 	free(mmap_event);
@@ -588,6 +674,7 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 	char proc_path[PATH_MAX];
 	struct dirent *dirent;
 	union perf_event *comm_event, *mmap_event, *fork_event;
+	union perf_event *namespaces_event;
 	int err = -1;
 
 	if (machine__is_default_guest(machine))
@@ -605,11 +692,16 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 	if (fork_event == NULL)
 		goto out_free_mmap;
 
+	namespaces_event = malloc(sizeof(namespaces_event->namespaces) +
+				  machine->id_hdr_size);
+	if (namespaces_event == NULL)
+		goto out_free_fork;
+
 	snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
 	proc = opendir(proc_path);
 
 	if (proc == NULL)
-		goto out_free_fork;
+		goto out_free_namespaces;
 
 	while ((dirent = readdir(proc)) != NULL) {
 		char *end;
@@ -621,13 +713,16 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
  		 * We may race with exiting thread, so don't stop just because
  		 * one thread couldn't be synthesized.
  		 */
-		__event__synthesize_thread(comm_event, mmap_event, fork_event, pid,
-					   1, process, tool, machine, mmap_data,
+		__event__synthesize_thread(comm_event, mmap_event, fork_event,
+					   namespaces_event, pid, 1, process,
+					   tool, machine, mmap_data,
 					   proc_map_timeout);
 	}
 
 	err = 0;
 	closedir(proc);
+out_free_namespaces:
+	free(namespaces_event);
 out_free_fork:
 	free(fork_event);
 out_free_mmap:
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 4e90b09..c73ad47 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -650,6 +650,12 @@ pid_t perf_event__synthesize_comm(struct perf_tool *tool,
 				  perf_event__handler_t process,
 				  struct machine *machine);
 
+int perf_event__synthesize_namespaces(struct perf_tool *tool,
+				      union perf_event *event,
+				      pid_t pid, pid_t tgid,
+				      perf_event__handler_t process,
+				      struct machine *machine);
+
 int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 				       union perf_event *event,
 				       pid_t pid, pid_t tgid,

* [PATCH v7 5/8] perf tool: add print support for namespace events
  2017-02-21 14:01 [PATCH v7 0/8] perf: add support for analyzing events for containers Hari Bathini
                   ` (3 preceding siblings ...)
  2017-02-21 14:01 ` [PATCH v7 4/8] perf tool: synthesize namespace events for current processes Hari Bathini
@ 2017-02-21 14:01 ` Hari Bathini
  2017-03-01 21:06   ` Arnaldo Carvalho de Melo
  2017-02-21 14:02 ` [PATCH v7 6/8] perf tool: add script " Hari Bathini
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Hari Bathini @ 2017-02-21 14:01 UTC (permalink / raw)
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg, jolsa

Add print support for events of type PERF_RECORD_NAMESPACES.
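
As a rough illustration of the per-namespace output, the sketch below
formats one entry as "<idx>/<name>: <dev>/0x<ino>". It is not the function
added by this patch: ns_entry_format() is a hypothetical helper, and it
uses the <inttypes.h> macros so that 64-bit values are printed correctly
regardless of the width of long.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Format one namespace entry as "<idx>/<name>: <dev>/0x<ino>" into buf.
 * Returns the snprintf() result, i.e. the length the full string needs
 * (excluding the terminating NUL), or a negative value on error.
 */
static int ns_entry_format(char *buf, size_t len, unsigned int idx,
			   const char *name, uint64_t dev, uint64_t ino)
{
	return snprintf(buf, len, "%u/%s: %" PRIu64 "/0x%" PRIx64,
			idx, name, dev, ino);
}
```

For instance, formatting index 6, name "cgroup", device 3 and inode
0xf00000d0 (values picked purely for illustration) yields
"6/cgroup: 3/0xf00000d0".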

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---
 tools/perf/util/event.c   |   30 ++++++++++++++++++++++++++++++
 tools/perf/util/event.h   |    1 +
 tools/perf/util/machine.c |    3 +++
 3 files changed, 34 insertions(+)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index c8c112a..43b6a8f 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1104,6 +1104,33 @@ size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp)
 	return fprintf(fp, "%s: %s:%d/%d\n", s, event->comm.comm, event->comm.pid, event->comm.tid);
 }
 
+size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp)
+{
+	size_t ret = 0;
+	struct perf_ns_link_info *ns_link_info;
+	u32 nr_namespaces, idx;
+
+	ns_link_info = event->namespaces.link_info;
+	nr_namespaces = event->namespaces.nr_namespaces;
+
+	ret += fprintf(fp, " %d/%d - nr_namespaces: %u\n\t[",
+		       event->namespaces.pid,
+		       event->namespaces.tid,
+		       nr_namespaces);
+
+	for (idx = 0; idx < nr_namespaces; idx++) {
+		if (idx && (idx % 4 == 0))
+			ret += fprintf(fp, "\n\t ");
+
+		ret += fprintf(fp, "%u/%s: %" PRIu64 "/0x%" PRIx64 "%s", idx,
+				perf_ns__name(idx), (u64)ns_link_info[idx].dev,
+				(u64)ns_link_info[idx].ino,
+				((idx + 1) != nr_namespaces) ? ", " : "]\n\n");
+	}
+
+	return ret;
+}
+
 int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -1300,6 +1327,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_MMAP:
 		ret += perf_event__fprintf_mmap(event, fp);
 		break;
+	case PERF_RECORD_NAMESPACES:
+		ret += perf_event__fprintf_namespaces(event, fp);
+		break;
 	case PERF_RECORD_MMAP2:
 		ret += perf_event__fprintf_mmap2(event, fp);
 		break;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index c73ad47..8eb470b 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -673,6 +673,7 @@ size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
 
 u64 kallsyms__get_function_start(const char *kallsyms_filename,
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 060fabb..5f46ad0 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -519,6 +519,9 @@ int machine__process_namespaces_event(struct machine *machine __maybe_unused,
 		  "\nWARNING: perf tool seems to support more namespaces than"
 		  " the kernel.\nTry updating the kernel..\n\n");
 
+	if (dump_trace)
+		perf_event__fprintf_namespaces(event, stdout);
+
 	if (thread == NULL ||
 	    thread__set_namespaces(thread, sample->time, &event->namespaces)) {
 		dump_printf("problem processing PERF_RECORD_NAMESPACES, skipping event.\n");

* [PATCH v7 6/8] perf tool: add script print support for namespace events
  2017-02-21 14:01 [PATCH v7 0/8] perf: add support for analyzing events for containers Hari Bathini
                   ` (4 preceding siblings ...)
  2017-02-21 14:01 ` [PATCH v7 5/8] perf tool: add print support for namespace events Hari Bathini
@ 2017-02-21 14:02 ` Hari Bathini
  2017-03-01 21:08   ` Arnaldo Carvalho de Melo
  2017-02-21 14:02 ` [PATCH v7 7/8] perf tool: update about the new option to show " Hari Bathini
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Hari Bathini @ 2017-02-21 14:02 UTC (permalink / raw)
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg, jolsa

Add script print support for events of type PERF_RECORD_NAMESPACES.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---
 tools/perf/builtin-script.c |   40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index f1ce806..66d62c9 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -830,6 +830,7 @@ struct perf_script {
 	bool			show_task_events;
 	bool			show_mmap_events;
 	bool			show_switch_events;
+	bool			show_namespace_events;
 	bool			allocated;
 	struct cpu_map		*cpus;
 	struct thread_map	*threads;
@@ -1118,6 +1119,41 @@ static int process_comm_event(struct perf_tool *tool,
 	return ret;
 }
 
+static int process_namespaces_event(struct perf_tool *tool,
+				    union perf_event *event,
+				    struct perf_sample *sample,
+				    struct machine *machine)
+{
+	struct thread *thread;
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+	struct perf_session *session = script->session;
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
+	int ret = -1;
+
+	thread = machine__findnew_thread(machine, event->namespaces.pid,
+					 event->namespaces.tid);
+	if (thread == NULL) {
+		pr_debug("problem processing NAMESPACES event, skipping it.\n");
+		return -1;
+	}
+
+	if (perf_event__process_namespaces(tool, event, sample, machine) < 0)
+		goto out;
+
+	if (!evsel->attr.sample_id_all) {
+		sample->cpu = 0;
+		sample->time = 0;
+		sample->tid = event->namespaces.tid;
+		sample->pid = event->namespaces.pid;
+	}
+	print_sample_start(sample, thread, evsel);
+	perf_event__fprintf(event, stdout);
+	ret = 0;
+out:
+	thread__put(thread);
+	return ret;
+}
+
 static int process_fork_event(struct perf_tool *tool,
 			      union perf_event *event,
 			      struct perf_sample *sample,
@@ -1293,6 +1329,8 @@ static int __cmd_script(struct perf_script *script)
 	}
 	if (script->show_switch_events)
 		script->tool.context_switch = process_switch_event;
+	if (script->show_namespace_events)
+		script->tool.namespaces = process_namespaces_event;
 
 	ret = perf_session__process_events(script->session);
 
@@ -2181,6 +2219,8 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "Show the mmap events"),
 	OPT_BOOLEAN('\0', "show-switch-events", &script.show_switch_events,
 		    "Show context switch events (if recorded)"),
+	OPT_BOOLEAN('\0', "show-namespace-events", &script.show_namespace_events,
+		    "Show namespace events (if recorded)"),
 	OPT_BOOLEAN('f', "force", &symbol_conf.force, "don't complain, do it"),
 	OPT_BOOLEAN(0, "ns", &nanosecs,
 		    "Use 9 decimal places when displaying time"),

* [PATCH v7 7/8] perf tool: update about the new option to show namespace events
  2017-02-21 14:01 [PATCH v7 0/8] perf: add support for analyzing events for containers Hari Bathini
                   ` (5 preceding siblings ...)
  2017-02-21 14:02 ` [PATCH v7 6/8] perf tool: add script " Hari Bathini
@ 2017-02-21 14:02 ` Hari Bathini
  2017-02-21 14:03 ` [PATCH v7 8/8] perf tool: add cgroup identifier entry in perf report Hari Bathini
  2017-02-22 11:11 ` [PATCH v7 0/8] perf: add support for analyzing events for containers Jiri Olsa
  8 siblings, 0 replies; 24+ messages in thread
From: Hari Bathini @ 2017-02-21 14:02 UTC (permalink / raw)
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg, jolsa

Now that we have a new option to show namespace events, update
the perf-script documentation accordingly.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---
 tools/perf/Documentation/perf-script.txt |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 4ed5f23..62c9b0c 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -248,6 +248,9 @@ OPTIONS
 --show-mmap-events
 	Display mmap related events (e.g. MMAP, MMAP2).
 
+--show-namespace-events
+	Display namespace events i.e. events of type PERF_RECORD_NAMESPACES.
+
 --show-switch-events
 	Display context switch events i.e. events of type PERF_RECORD_SWITCH or
 	PERF_RECORD_SWITCH_CPU_WIDE.

* [PATCH v7 8/8] perf tool: add cgroup identifier entry in perf report
  2017-02-21 14:01 [PATCH v7 0/8] perf: add support for analyzing events for containers Hari Bathini
                   ` (6 preceding siblings ...)
  2017-02-21 14:02 ` [PATCH v7 7/8] perf tool: update about the new option to show " Hari Bathini
@ 2017-02-21 14:03 ` Hari Bathini
  2017-02-22 16:48   ` Jiri Olsa
  2017-03-01 21:16   ` Arnaldo Carvalho de Melo
  2017-02-22 11:11 ` [PATCH v7 0/8] perf: add support for analyzing events for containers Jiri Olsa
  8 siblings, 2 replies; 24+ messages in thread
From: Hari Bathini @ 2017-02-21 14:03 UTC (permalink / raw)
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg, jolsa

This patch introduces a cgroup identifier entry field in perf report to
identify or distinguish data of different cgroups. It uses the device
number and inode number of cgroup namespace, included in perf data with
the new PERF_RECORD_NAMESPACES event, as cgroup identifier. With the
assumption that each container is created with its own cgroup namespace,
this allows assessment/analysis of multiple containers at once.

Shown below is the output of perf report, sorted by cgroup id, on a
system that was running three containers at the time of perf record.
It clearly shows one container's considerable use of kernel memory in
comparison with the others:


	$ perf report -s cgroup_id,sample --stdio
	#
	# Total Lost Samples: 0
	#
	# Samples: 16K of event 'kmem:kmalloc'
	# Event count (approx.): 16043
	#
	# Overhead  cgroup id (dev/inode)       Samples
	# ........  .....................  ............
	#
	    96.33%  3/0xf00000d0                  15454
	     3.02%  3/0xeffffffb                    485
	     0.31%  3/0xf00000ce                     49
	     0.29%  3/0xf00000cf                     47
	     0.05%  0/0x0                             8

While this is a start, there is scope for further improvement. For
example, instead of only the cgroup namespace's device and inode numbers,
the device and inode numbers of some or all namespaces may be used to
distinguish which processes are running in a given container context.
Also, scripts that map device and inode info to containers sound
plausible for better tracing of containers.
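
To make the notion of a (dev, inode) identifier concrete, here is a small
sketch of how two such identifiers could be ordered. It is not the
comparison code in this patch: struct cgroup_id, cmp_u64() and
cgroup_id_cmp() are hypothetical names, and the sketch deliberately
compares the unsigned values directly rather than subtracting them.

```c
#include <stdint.h>

/* Hypothetical identifier: device and inode number of a cgroup namespace. */
struct cgroup_id {
	uint64_t dev;
	uint64_t ino;
};

/* Three-way compare of two unsigned values without using subtraction. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Order by device number first, then by inode number. Returns <0, 0 or >0. */
static int cgroup_id_cmp(const struct cgroup_id *l, const struct cgroup_id *r)
{
	int ret = cmp_u64(l->dev, r->dev);

	return ret ? ret : cmp_u64(l->ino, r->ino);
}
```

With the values from the report above, 3/0xf00000d0 and 3/0xeffffffb share
a device number and are told apart by inode alone, while 0/0x0 (samples
with no namespace info) sorts before both.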

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---
 tools/perf/util/hist.c |    7 +++++++
 tools/perf/util/hist.h |    1 +
 tools/perf/util/sort.c |   41 +++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/sort.h |    7 +++++++
 4 files changed, 56 insertions(+)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 32c6a93..559ea27 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -3,6 +3,7 @@
 #include "hist.h"
 #include "map.h"
 #include "session.h"
+#include "namespaces.h"
 #include "sort.h"
 #include "evlist.h"
 #include "evsel.h"
@@ -169,6 +170,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 		hists__set_unres_dso_col_len(hists, HISTC_MEM_DADDR_DSO);
 	}
 
+	hists__new_col_len(hists, HISTC_CGROUP_ID, 20);
 	hists__new_col_len(hists, HISTC_CPU, 3);
 	hists__new_col_len(hists, HISTC_SOCKET, 6);
 	hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
@@ -574,9 +576,14 @@ __hists__add_entry(struct hists *hists,
 		   bool sample_self,
 		   struct hist_entry_ops *ops)
 {
+	struct namespaces *ns = thread__namespaces(al->thread);
 	struct hist_entry entry = {
 		.thread	= al->thread,
 		.comm = thread__comm(al->thread),
+		.cgroup_id = {
+			.dev = ns ? ns->link_info[CGROUP_NS_INDEX].dev : 0,
+			.ino = ns ? ns->link_info[CGROUP_NS_INDEX].ino : 0,
+		},
 		.ms = {
 			.map	= al->map,
 			.sym	= al->sym,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 28c216e..4c1da48 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -30,6 +30,7 @@ enum hist_column {
 	HISTC_DSO,
 	HISTC_THREAD,
 	HISTC_COMM,
+	HISTC_CGROUP_ID,
 	HISTC_PARENT,
 	HISTC_CPU,
 	HISTC_SOCKET,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index df622f4..9f5f404 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -536,6 +536,46 @@ struct sort_entry sort_cpu = {
 	.se_width_idx	= HISTC_CPU,
 };
 
+/* --sort cgroup_id */
+
+static int64_t _sort__cgroup_dev_cmp(u64 left_dev, u64 right_dev)
+{
+	return (int64_t)(right_dev - left_dev);
+}
+
+static int64_t _sort__cgroup_inode_cmp(u64 left_ino, u64 right_ino)
+{
+	return (int64_t)(right_ino - left_ino);
+}
+
+static int64_t
+sort__cgroup_id_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	int64_t ret;
+
+	ret = _sort__cgroup_dev_cmp(right->cgroup_id.dev, left->cgroup_id.dev);
+	if (ret != 0)
+		return ret;
+
+	return _sort__cgroup_inode_cmp(right->cgroup_id.ino,
+				       left->cgroup_id.ino);
+}
+
+static int hist_entry__cgroup_id_snprintf(struct hist_entry *he,
+					  char *bf, size_t size,
+					  unsigned int width __maybe_unused)
+{
+	return repsep_snprintf(bf, size, "%lu/0x%lx", he->cgroup_id.dev,
+			       he->cgroup_id.ino);
+}
+
+struct sort_entry sort_cgroup_id = {
+	.se_header      = "cgroup id (dev/inode)",
+	.se_cmp	        = sort__cgroup_id_cmp,
+	.se_snprintf    = hist_entry__cgroup_id_snprintf,
+	.se_width_idx	= HISTC_CGROUP_ID,
+};
+
 /* --sort socket */
 
 static int64_t
@@ -1418,6 +1458,7 @@ static struct sort_dimension common_sort_dimensions[] = {
 	DIM(SORT_GLOBAL_WEIGHT, "weight", sort_global_weight),
 	DIM(SORT_TRANSACTION, "transaction", sort_transaction),
 	DIM(SORT_TRACE, "trace", sort_trace),
+	DIM(SORT_CGROUP_ID, "cgroup_id", sort_cgroup_id),
 };
 
 #undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 7aff317..68a5abb 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -54,6 +54,11 @@ struct he_stat {
 	u32			nr_events;
 };
 
+struct namespace_id {
+	u64			dev;
+	u64			ino;
+};
+
 struct hist_entry_diff {
 	bool	computed;
 	union {
@@ -91,6 +96,7 @@ struct hist_entry {
 	struct map_symbol	ms;
 	struct thread		*thread;
 	struct comm		*comm;
+	struct namespace_id	cgroup_id;
 	u64			ip;
 	u64			transaction;
 	s32			socket;
@@ -211,6 +217,7 @@ enum sort_type {
 	SORT_GLOBAL_WEIGHT,
 	SORT_TRANSACTION,
 	SORT_TRACE,
+	SORT_CGROUP_ID,
 
 	/* branch stack specific sort keys */
 	__SORT_BRANCH_STACK,

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 0/8] perf: add support for analyzing events for containers
  2017-02-21 14:01 [PATCH v7 0/8] perf: add support for analyzing events for containers Hari Bathini
                   ` (7 preceding siblings ...)
  2017-02-21 14:03 ` [PATCH v7 8/8] perf tool: add cgroup identifier entry in perf report Hari Bathini
@ 2017-02-22 11:11 ` Jiri Olsa
  2017-02-22 12:40   ` Hari Bathini
  8 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2017-02-22 11:11 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg

On Tue, Feb 21, 2017 at 07:31:11PM +0530, Hari Bathini wrote:
> Currently, there is no trivial mechanism to analyze events based on
> containers. perf -G can be used, but it will not filter events for the
> containers created after perf is invoked, making it difficult to assess/
> analyze performance issues of multiple containers at once.
> 
> This patch-set is aimed at addressing this limitation by introducing a
> new PERF_RECORD_NAMESPACES event that records namespaces related info.
> As containers are created with namespaces, the new data can be used to
> in assessment/analysis of multiple containers.
> 
> The first patch introduces PERF_RECORD_NAMESPACES in kernel while the
> second patch makes the corresponding changes in perf tool to read this
> PERF_RECORD_NAMESPACES events. The third patch demonstrates analysis
> of containers with this data by adding a cgroup identifier column in
> perf report, which contains the cgroup namespace's device and inode
> numbers. This is based on the assumption that each container is created
> with it's own cgroup namespace. The third patch has scope for improvement
> based on the conventions a container is attributed with, going forward.
> 
> Changes from v6:
> * Updated changelog of patch 1
> * Split patch 2 into smaller patches
> * Updated record and script documenatation
> * Dropped name field from ns_link_info struct

what's this version based on? I can't cleanly apply it on
tip's perf/core, master, or Arnaldo's perf/core

thanks,
jirka

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 0/8] perf: add support for analyzing events for containers
  2017-02-22 11:11 ` [PATCH v7 0/8] perf: add support for analyzing events for containers Jiri Olsa
@ 2017-02-22 12:40   ` Hari Bathini
  2017-02-22 13:52     ` Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: Hari Bathini @ 2017-02-22 12:40 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg

Hi Jirka,


On Wednesday 22 February 2017 04:41 PM, Jiri Olsa wrote:
> On Tue, Feb 21, 2017 at 07:31:11PM +0530, Hari Bathini wrote:
>> Currently, there is no trivial mechanism to analyze events based on
>> containers. perf -G can be used, but it will not filter events for the
>> containers created after perf is invoked, making it difficult to assess/
>> analyze performance issues of multiple containers at once.
>>
>> This patch-set is aimed at addressing this limitation by introducing a
>> new PERF_RECORD_NAMESPACES event that records namespaces related info.
>> As containers are created with namespaces, the new data can be used to
>> in assessment/analysis of multiple containers.
>>
>> The first patch introduces PERF_RECORD_NAMESPACES in kernel while the
>> second patch makes the corresponding changes in perf tool to read this
>> PERF_RECORD_NAMESPACES events. The third patch demonstrates analysis
>> of containers with this data by adding a cgroup identifier column in
>> perf report, which contains the cgroup namespace's device and inode
>> numbers. This is based on the assumption that each container is created
>> with it's own cgroup namespace. The third patch has scope for improvement
>> based on the conventions a container is attributed with, going forward.
>>
>> Changes from v6:
>> * Updated changelog of patch 1
>> * Split patch 2 into smaller patches
>> * Updated record and script documenatation
>> * Dropped name field from ns_link_info struct
> what's this version based on? I can't cleanly apply it on
> tip's perf/core, master, or Arnaldo's perf/core

That's odd. I based my patches on tip's perf/core.
To be precise, the patches apply cleanly on top of commit
0c8967c9df230d2c4dde6649f410b62e01806c22.

Thanks
Hari

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 0/8] perf: add support for analyzing events for containers
  2017-02-22 12:40   ` Hari Bathini
@ 2017-02-22 13:52     ` Jiri Olsa
  0 siblings, 0 replies; 24+ messages in thread
From: Jiri Olsa @ 2017-02-22 13:52 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg

On Wed, Feb 22, 2017 at 06:10:28PM +0530, Hari Bathini wrote:
> Hi Jirka,
> 
> 
> On Wednesday 22 February 2017 04:41 PM, Jiri Olsa wrote:
> > On Tue, Feb 21, 2017 at 07:31:11PM +0530, Hari Bathini wrote:
> > > Currently, there is no trivial mechanism to analyze events based on
> > > containers. perf -G can be used, but it will not filter events for the
> > > containers created after perf is invoked, making it difficult to assess/
> > > analyze performance issues of multiple containers at once.
> > > 
> > > This patch-set is aimed at addressing this limitation by introducing a
> > > new PERF_RECORD_NAMESPACES event that records namespaces related info.
> > > As containers are created with namespaces, the new data can be used to
> > > in assessment/analysis of multiple containers.
> > > 
> > > The first patch introduces PERF_RECORD_NAMESPACES in kernel while the
> > > second patch makes the corresponding changes in perf tool to read this
> > > PERF_RECORD_NAMESPACES events. The third patch demonstrates analysis
> > > of containers with this data by adding a cgroup identifier column in
> > > perf report, which contains the cgroup namespace's device and inode
> > > numbers. This is based on the assumption that each container is created
> > > with it's own cgroup namespace. The third patch has scope for improvement
> > > based on the conventions a container is attributed with, going forward.
> > > 
> > > Changes from v6:
> > > * Updated changelog of patch 1
> > > * Split patch 2 into smaller patches
> > > * Updated record and script documenatation
> > > * Dropped name field from ns_link_info struct
> > what's this version based on? I can't cleanly apply it on
> > tip's perf/core, master, or Arnaldo's perf/core
> 
> That's odd. I based my patches on tip's perf/core.
> To be precise, the patches apply cleanly on top of commit
> 0c8967c9df230d2c4dde6649f410b62e01806c22.

got it now, thanks

jirka

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 8/8] perf tool: add cgroup identifier entry in perf report
  2017-02-21 14:03 ` [PATCH v7 8/8] perf tool: add cgroup identifier entry in perf report Hari Bathini
@ 2017-02-22 16:48   ` Jiri Olsa
  2017-03-01 21:16   ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 24+ messages in thread
From: Jiri Olsa @ 2017-02-22 16:48 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg

On Tue, Feb 21, 2017 at 07:33:13PM +0530, Hari Bathini wrote:
> This patch introduces a cgroup identifier entry field in perf report to
> identify or distinguish data of different cgroups. It uses the device
> number and inode number of cgroup namespace, included in perf data with
> the new PERF_RECORD_NAMESPACES event, as cgroup identifier. With the
> assumption that each container is created with its own cgroup namespace,
> this allows assessment/analysis of multiple containers at once.
> 
> Shown below is the output of perf report, sorted based on cgroup id, on
> a system that was running three containers at the time of perf record
> and clearly showing one of the containers' considerable use of kernel
> memory in comparison with others:
> 
> 
> 	$ perf report -s cgroup_id,sample --stdio
> 	#
> 	# Total Lost Samples: 0
> 	#
> 	# Samples: 16K of event 'kmem:kmalloc'
> 	# Event count (approx.): 16043
> 	#
> 	# Overhead  cgroup id (dev/inode)       Samples
> 	# ........  .....................  ............
> 	#
> 	    96.33%  3/0xf00000d0                  15454
> 	     3.02%  3/0xeffffffb                    485
> 	     0.31%  3/0xf00000ce                     49
> 	     0.29%  3/0xf00000cf                     47
> 	     0.05%  0/0x0                             8
> 
> While this is a start, there is further scope for improving this. For
> example, instead of cgroup namespace's device and inode numbers, dev
> and inode numbers of some or all namespaces may be used to distinguish
> which processes are running in a given container context. Also, scripts
> to map device and inode info to containers sound plausible for better
> tracing of containers.
> 
> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
> ---
>  tools/perf/util/hist.c |    7 +++++++
>  tools/perf/util/hist.h |    1 +
>  tools/perf/util/sort.c |   41 +++++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/sort.h |    7 +++++++
>  4 files changed, 56 insertions(+)

missing documentation update with new sorting field...

other than that the rest looks ok to me, for the patchset:

Acked-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/8] perf: add PERF_RECORD_NAMESPACES to include namespaces related info
  2017-02-21 14:01 ` [PATCH v7 1/8] perf: add PERF_RECORD_NAMESPACES to include namespaces related info Hari Bathini
@ 2017-02-24 12:14   ` Peter Zijlstra
  2017-03-01 20:45     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2017-02-24 12:14 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, lkml, acme, alexander.shishkin, mingo, daniel, rostedt,
	Ananth N Mavinakayanahalli, ebiederm, sargun, Aravinda Prasad,
	brendan.d.gregg, jolsa

On Tue, Feb 21, 2017 at 07:31:23PM +0530, Hari Bathini wrote:
> With the advert of container technologies like docker, that depend
> on namespaces for isolation, there is a need for tracing support for
> namespaces. This patch introduces new PERF_RECORD_NAMESPACES event
> for recording namespaces related info. By recording info for every
> namespace, it is left to userspace to take a call on the definition
> of a container and trace containers by updating perf tool accordingly.
> 
> Each namespace has a combination of device and inode numbers. Though
> every namespace has the same device number currently, that may change
> in future to avoid the need for a namespace of namespaces. Considering
> such possibility, record both device and inode numbers separately for
> each namespace.
> 
> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
> ---
>  include/linux/perf_event.h      |    2 +
>  include/uapi/linux/perf_event.h |   32 +++++++++
>  kernel/events/core.c            |  139 +++++++++++++++++++++++++++++++++++++++
>  kernel/fork.c                   |    2 +
>  kernel/nsproxy.c                |    3 +
>  5 files changed, 177 insertions(+), 1 deletion(-)

Arnaldo, seeing that most of this patch set is tool stuff, could you
take the lot through the tool tree?

This patch:

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 1/8] perf: add PERF_RECORD_NAMESPACES to include namespaces related info
  2017-02-24 12:14   ` Peter Zijlstra
@ 2017-03-01 20:45     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-01 20:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hari Bathini, ast, lkml, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg, jolsa

Em Fri, Feb 24, 2017 at 01:14:18PM +0100, Peter Zijlstra escreveu:
> On Tue, Feb 21, 2017 at 07:31:23PM +0530, Hari Bathini wrote:
> > With the advert of container technologies like docker, that depend
> > on namespaces for isolation, there is a need for tracing support for
> > namespaces. This patch introduces new PERF_RECORD_NAMESPACES event
> > for recording namespaces related info. By recording info for every
> > namespace, it is left to userspace to take a call on the definition
> > of a container and trace containers by updating perf tool accordingly.
> > 
> > Each namespace has a combination of device and inode numbers. Though
> > every namespace has the same device number currently, that may change
> > in future to avoid the need for a namespace of namespaces. Considering
> > such possibility, record both device and inode numbers separately for
> > each namespace.
> > 
> > Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
> > ---
> >  include/linux/perf_event.h      |    2 +
> >  include/uapi/linux/perf_event.h |   32 +++++++++
> >  kernel/events/core.c            |  139 +++++++++++++++++++++++++++++++++++++++
> >  kernel/fork.c                   |    2 +
> >  kernel/nsproxy.c                |    3 +
> >  5 files changed, 177 insertions(+), 1 deletion(-)
> 
> Arnaldo, seeing that most of this patch set is tool stuff, could you
> take the lot through the tool tree?
> 
> This patch:
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Ok, looking at it now.

- Arnaldo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7 2/8] perf tool: add PERF_RECORD_NAMESPACES to include namespaces related info
  2017-02-21 14:01 ` [PATCH v7 2/8] perf tool: " Hari Bathini
@ 2017-03-01 21:02   ` Arnaldo Carvalho de Melo
  2017-03-03  8:54     ` Hari Bathini
  0 siblings, 1 reply; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-01 21:02 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, alexander.shishkin, mingo, daniel, rostedt,
	Ananth N Mavinakayanahalli, ebiederm, sargun, Aravinda Prasad,
	brendan.d.gregg, jolsa

Em Tue, Feb 21, 2017 at 07:31:30PM +0530, Hari Bathini escreveu:
> Update perf tool to examine PERF_RECORD_NAMESPACES events emitted by
> the kernel when fork, clone, setns or unshare are invoked.

You forgot to update tools/perf/Documentation/ for all the options you
added, see more comments below.
 
> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
> ---
>  tools/include/uapi/linux/perf_event.h |   32 +++++++++++++++++++++++-
>  tools/perf/builtin-annotate.c         |    1 +
>  tools/perf/builtin-diff.c             |    1 +
>  tools/perf/builtin-inject.c           |   14 +++++++++++
>  tools/perf/builtin-kmem.c             |    1 +
>  tools/perf/builtin-kvm.c              |    2 ++
>  tools/perf/builtin-lock.c             |    1 +
>  tools/perf/builtin-mem.c              |    1 +
>  tools/perf/builtin-record.c           |    6 +++++
>  tools/perf/builtin-report.c           |    1 +
>  tools/perf/builtin-sched.c            |    1 +
>  tools/perf/builtin-script.c           |    1 +
>  tools/perf/builtin-trace.c            |    3 ++
>  tools/perf/perf.h                     |    1 +
>  tools/perf/util/Build                 |    1 +
>  tools/perf/util/data-convert-bt.c     |    1 +
>  tools/perf/util/event.c               |    9 +++++++
>  tools/perf/util/event.h               |   14 +++++++++++
>  tools/perf/util/evsel.c               |    3 ++
>  tools/perf/util/machine.c             |   31 +++++++++++++++++++++++
>  tools/perf/util/machine.h             |    3 ++
>  tools/perf/util/namespaces.c          |   35 ++++++++++++++++++++++++++
>  tools/perf/util/namespaces.h          |   26 ++++++++++++++++++++
>  tools/perf/util/session.c             |    7 +++++
>  tools/perf/util/thread.c              |   44 ++++++++++++++++++++++++++++++++-
>  tools/perf/util/thread.h              |    6 +++++
>  tools/perf/util/tool.h                |    2 ++
>  27 files changed, 244 insertions(+), 4 deletions(-)
>  create mode 100644 tools/perf/util/namespaces.c
>  create mode 100644 tools/perf/util/namespaces.h
> 
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index c66a485..bec0aad 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -344,7 +344,8 @@ struct perf_event_attr {
>  				use_clockid    :  1, /* use @clockid for time fields */
>  				context_switch :  1, /* context switch data */
>  				write_backward :  1, /* Write ring buffer from end to beginning */
> -				__reserved_1   : 36;
> +				namespaces     :  1, /* include namespaces data */
> +				__reserved_1   : 35;
>  
>  	union {
>  		__u32		wakeup_events;	  /* wakeup every n events */
> @@ -610,6 +611,23 @@ struct perf_event_header {
>  	__u16	size;
>  };
>  
> +struct perf_ns_link_info {
> +	__u64	dev;
> +	__u64	ino;
> +};
> +
> +enum {
> +	NET_NS_INDEX		= 0,
> +	UTS_NS_INDEX		= 1,
> +	IPC_NS_INDEX		= 2,
> +	PID_NS_INDEX		= 3,
> +	USER_NS_INDEX		= 4,
> +	MNT_NS_INDEX		= 5,
> +	CGROUP_NS_INDEX		= 6,
> +
> +	NR_NAMESPACES,		/* number of available namespaces */
> +};
> +
>  enum perf_event_type {
>  
>  	/*
> @@ -862,6 +880,18 @@ enum perf_event_type {
>  	 */
>  	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
>  
> +	/*
> +	 * struct {
> +	 *	struct perf_event_header	header;
> +	 *	u32				pid;
> +	 *	u32				tid;
> +	 *	u64				nr_namespaces;
> +	 *	{ u64				dev, inode; } [nr_namespaces];
> +	 *	struct sample_id		sample_id;
> +	 * };
> +	 */
> +	PERF_RECORD_NAMESPACES			= 16,
> +
>  	PERF_RECORD_MAX,			/* non-ABI */
>  };
>  
> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> index ebb6283..1b63dc4 100644
> --- a/tools/perf/builtin-annotate.c
> +++ b/tools/perf/builtin-annotate.c
> @@ -393,6 +393,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
>  			.comm	= perf_event__process_comm,
>  			.exit	= perf_event__process_exit,
>  			.fork	= perf_event__process_fork,
> +			.namespaces = perf_event__process_namespaces,
>  			.ordered_events = true,
>  			.ordering_requires_timestamps = true,
>  		},
> diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
> index 70a2893..4b821cf 100644
> --- a/tools/perf/builtin-diff.c
> +++ b/tools/perf/builtin-diff.c
> @@ -364,6 +364,7 @@ static struct perf_tool tool = {
>  	.exit	= perf_event__process_exit,
>  	.fork	= perf_event__process_fork,
>  	.lost	= perf_event__process_lost,
> +	.namespaces = perf_event__process_namespaces,
>  	.ordered_events = true,
>  	.ordering_requires_timestamps = true,
>  };
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index b9bc7e3..c5ddc73 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -333,6 +333,19 @@ static int perf_event__repipe_comm(struct perf_tool *tool,
>  	return err;
>  }
>  
> +static int perf_event__repipe_namespaces(struct perf_tool *tool,
> +					 union perf_event *event,
> +					 struct perf_sample *sample,
> +					 struct machine *machine)
> +{
> +	int err;
> +
> +	err = perf_event__process_namespaces(tool, event, sample, machine);

Minor, but since changes are needed anyway: combine the previous three
lines into one.

> +	perf_event__repipe(tool, event, sample, machine);
> +
> +	return err;
> +}
> +
>  static int perf_event__repipe_exit(struct perf_tool *tool,
>  				   union perf_event *event,
>  				   struct perf_sample *sample,
> @@ -660,6 +673,7 @@ static int __cmd_inject(struct perf_inject *inject)
>  		session->itrace_synth_opts = &inject->itrace_synth_opts;
>  		inject->itrace_synth_opts.inject = true;
>  		inject->tool.comm	    = perf_event__repipe_comm;
> +		inject->tool.namespaces	    = perf_event__repipe_namespaces;
>  		inject->tool.exit	    = perf_event__repipe_exit;
>  		inject->tool.id_index	    = perf_event__repipe_id_index;
>  		inject->tool.auxtrace_info  = perf_event__process_auxtrace_info;
> diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
> index 6da8d08..d509e74 100644
> --- a/tools/perf/builtin-kmem.c
> +++ b/tools/perf/builtin-kmem.c
> @@ -964,6 +964,7 @@ static struct perf_tool perf_kmem = {
>  	.comm		 = perf_event__process_comm,
>  	.mmap		 = perf_event__process_mmap,
>  	.mmap2		 = perf_event__process_mmap2,
> +	.namespaces	 = perf_event__process_namespaces,
>  	.ordered_events	 = true,
>  };
>  
> diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
> index 08fa88f..18e6c38 100644
> --- a/tools/perf/builtin-kvm.c
> +++ b/tools/perf/builtin-kvm.c
> @@ -1044,6 +1044,7 @@ static int read_events(struct perf_kvm_stat *kvm)
>  	struct perf_tool eops = {
>  		.sample			= process_sample_event,
>  		.comm			= perf_event__process_comm,
> +		.namespaces		= perf_event__process_namespaces,
>  		.ordered_events		= true,
>  	};
>  	struct perf_data_file file = {
> @@ -1348,6 +1349,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
>  	kvm->tool.exit   = perf_event__process_exit;
>  	kvm->tool.fork   = perf_event__process_fork;
>  	kvm->tool.lost   = process_lost_event;
> +	kvm->tool.namespaces  = perf_event__process_namespaces;
>  	kvm->tool.ordered_events = true;
>  	perf_tool__fill_defaults(&kvm->tool);
>  
> diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
> index ce3bfb4..d750cca 100644
> --- a/tools/perf/builtin-lock.c
> +++ b/tools/perf/builtin-lock.c
> @@ -858,6 +858,7 @@ static int __cmd_report(bool display_info)
>  	struct perf_tool eops = {
>  		.sample		 = process_sample_event,
>  		.comm		 = perf_event__process_comm,
> +		.namespaces	 = perf_event__process_namespaces,
>  		.ordered_events	 = true,
>  	};
>  	struct perf_data_file file = {
> diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
> index cd7bc4d..430656c 100644
> --- a/tools/perf/builtin-mem.c
> +++ b/tools/perf/builtin-mem.c
> @@ -342,6 +342,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused)
>  			.lost		= perf_event__process_lost,
>  			.fork		= perf_event__process_fork,
>  			.build_id	= perf_event__process_build_id,
> +			.namespaces	= perf_event__process_namespaces,
>  			.ordered_events	= true,
>  		},
>  		.input_name		 = "perf.data",
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 6cd6776..a8b9a78 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -876,6 +876,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  	signal(SIGTERM, sig_handler);
>  	signal(SIGSEGV, sigsegv_handler);
>  
> +	if (rec->opts.record_namespaces)
> +		tool->namespace_events = true;
> +
>  	if (rec->opts.auxtrace_snapshot_mode || rec->switch_output.enabled) {
>  		signal(SIGUSR2, snapshot_sig_handler);
>  		if (rec->opts.auxtrace_snapshot_mode)
> @@ -1497,6 +1500,7 @@ static struct record record = {
>  		.fork		= perf_event__process_fork,
>  		.exit		= perf_event__process_exit,
>  		.comm		= perf_event__process_comm,
> +		.namespaces	= perf_event__process_namespaces,
>  		.mmap		= perf_event__process_mmap,
>  		.mmap2		= perf_event__process_mmap2,
>  		.ordered_events	= true,
> @@ -1611,6 +1615,8 @@ static struct option __record_options[] = {
>  			  "opts", "AUX area tracing Snapshot Mode", ""),
>  	OPT_UINTEGER(0, "proc-map-timeout", &record.opts.proc_map_timeout,
>  			"per thread proc mmap processing timeout in ms"),
> +	OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces,
> +		    "Record namespaces events"),
>  	OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
>  		    "Record context switch events"),
>  	OPT_BOOLEAN_FLAG(0, "all-kernel", &record.opts.all_kernel,
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index dbd7fa0..5c92c75 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -694,6 +694,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
>  			.mmap		 = perf_event__process_mmap,
>  			.mmap2		 = perf_event__process_mmap2,
>  			.comm		 = perf_event__process_comm,
> +			.namespaces	 = perf_event__process_namespaces,
>  			.exit		 = perf_event__process_exit,
>  			.fork		 = perf_event__process_fork,
>  			.lost		 = perf_event__process_lost,
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 270eb2d..e0ddd04 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -3272,6 +3272,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
>  		.tool = {
>  			.sample		 = perf_sched__process_tracepoint_sample,
>  			.comm		 = perf_event__process_comm,
> +			.namespaces	 = perf_event__process_namespaces,
>  			.lost		 = perf_event__process_lost,
>  			.fork		 = perf_sched__process_fork_event,
>  			.ordered_events = true,
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index c0783b4..f1ce806 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -2097,6 +2097,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
>  			.mmap		 = perf_event__process_mmap,
>  			.mmap2		 = perf_event__process_mmap2,
>  			.comm		 = perf_event__process_comm,
> +			.namespaces	 = perf_event__process_namespaces,
>  			.exit		 = perf_event__process_exit,
>  			.fork		 = perf_event__process_fork,
>  			.attr		 = process_attr,
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index 40ef9b2..0bcd32f 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -2415,8 +2415,9 @@ static int trace__replay(struct trace *trace)
>  	trace->tool.exit	  = perf_event__process_exit;
>  	trace->tool.fork	  = perf_event__process_fork;
>  	trace->tool.attr	  = perf_event__process_attr;
> -	trace->tool.tracing_data = perf_event__process_tracing_data;
> +	trace->tool.tracing_data  = perf_event__process_tracing_data;
>  	trace->tool.build_id	  = perf_event__process_build_id;
> +	trace->tool.namespaces	  = perf_event__process_namespaces;
>  
>  	trace->tool.ordered_events = true;
>  	trace->tool.ordering_requires_timestamps = true;
> diff --git a/tools/perf/perf.h b/tools/perf/perf.h
> index 1c27d94..806c216 100644
> --- a/tools/perf/perf.h
> +++ b/tools/perf/perf.h
> @@ -50,6 +50,7 @@ struct record_opts {
>  	bool	     running_time;
>  	bool	     full_auxtrace;
>  	bool	     auxtrace_snapshot_mode;
> +	bool	     record_namespaces;
>  	bool	     record_switch_events;
>  	bool	     all_kernel;
>  	bool	     all_user;
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index 5da376b..2ea5ee1 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -42,6 +42,7 @@ libperf-y += pstack.o
>  libperf-y += session.o
>  libperf-$(CONFIG_AUDIT) += syscalltbl.o
>  libperf-y += ordered-events.o
> +libperf-y += namespaces.o
>  libperf-y += comm.o
>  libperf-y += thread.o
>  libperf-y += thread_map.o
> diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
> index 4e6cbc9..89ece24 100644
> --- a/tools/perf/util/data-convert-bt.c
> +++ b/tools/perf/util/data-convert-bt.c
> @@ -1468,6 +1468,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,
>  			.lost            = perf_event__process_lost,
>  			.tracing_data    = perf_event__process_tracing_data,
>  			.build_id        = perf_event__process_build_id,
> +			.namespaces      = perf_event__process_namespaces,
>  			.ordered_events  = true,
>  			.ordering_requires_timestamps = true,
>  		},
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 4ea7ce7..f118eac 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -31,6 +31,7 @@ static const char *perf_event__names[] = {
>  	[PERF_RECORD_LOST_SAMPLES]		= "LOST_SAMPLES",
>  	[PERF_RECORD_SWITCH]			= "SWITCH",
>  	[PERF_RECORD_SWITCH_CPU_WIDE]		= "SWITCH_CPU_WIDE",
> +	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
>  	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
>  	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
>  	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
> @@ -1016,6 +1017,14 @@ int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
>  	return machine__process_comm_event(machine, event, sample);
>  }
>  
> +int perf_event__process_namespaces(struct perf_tool *tool __maybe_unused,
> +				   union perf_event *event,
> +				   struct perf_sample *sample,
> +				   struct machine *machine)
> +{
> +	return machine__process_namespaces_event(machine, event, sample);
> +}
> +
>  int perf_event__process_lost(struct perf_tool *tool __maybe_unused,
>  			     union perf_event *event,
>  			     struct perf_sample *sample,
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index c735c53..4e90b09 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -39,6 +39,15 @@ struct comm_event {
>  	char comm[16];
>  };
>  
> +#define NAMESPACES_MAX			12

Why have this limitation? Does the kernel have it as well? Just read the
header, then allocate enough space; this way you don't need those checks
about the tool being incompatible with a kernel that supports more
namespaces than the tool.
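
A minimal sketch of that idea, under stated assumptions: struct ns_record
and ns_record__dup() are hypothetical stand-ins and not the perf tool's
own types. The entry count is read from the record and checked against
the record length, so no compile-time maximum is involved.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the perf structures; illustration only. */
struct ns_link_info {
	uint64_t dev;
	uint64_t ino;
};

struct ns_record {
	uint32_t pid, tid;
	uint64_t nr_namespaces;
	struct ns_link_info link_info[];	/* nr_namespaces entries follow */
};

/*
 * Duplicate a record held in 'buf' ('len' bytes). The entry count is taken
 * from the record itself; a count that does not fit in 'len' is rejected
 * rather than silently truncated.
 */
struct ns_record *ns_record__dup(const void *buf, size_t len)
{
	struct ns_record hdr;
	struct ns_record *rec;
	size_t need;

	if (len < sizeof(hdr))
		return NULL;

	/* Read only the fixed part first to learn how many entries follow. */
	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.nr_namespaces > (len - sizeof(hdr)) / sizeof(hdr.link_info[0]))
		return NULL;

	need = sizeof(hdr) + (size_t)hdr.nr_namespaces * sizeof(hdr.link_info[0]);
	rec = malloc(need);
	if (rec)
		memcpy(rec, buf, need);
	return rec;
}
```

With this shape, a kernel that reports more (or fewer) namespaces than
the tool knows by name is handled by the same code path.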

> +
> +struct namespaces_event {
> +	struct perf_event_header header;
> +	u32 pid, tid;
> +	u64 nr_namespaces;
> +	struct perf_ns_link_info link_info[NAMESPACES_MAX];
> +};
> +
>  struct fork_event {
>  	struct perf_event_header header;
>  	u32 pid, ppid;
> @@ -485,6 +494,7 @@ union perf_event {
>  	struct mmap_event		mmap;
>  	struct mmap2_event		mmap2;
>  	struct comm_event		comm;
> +	struct namespaces_event		namespaces;
>  	struct fork_event		fork;
>  	struct lost_event		lost;
>  	struct lost_samples_event	lost_samples;
> @@ -587,6 +597,10 @@ int perf_event__process_switch(struct perf_tool *tool,
>  			       union perf_event *event,
>  			       struct perf_sample *sample,
>  			       struct machine *machine);
> +int perf_event__process_namespaces(struct perf_tool *tool,
> +				   union perf_event *event,
> +				   struct perf_sample *sample,
> +				   struct machine *machine);
>  int perf_event__process_mmap(struct perf_tool *tool,
>  			     union perf_event *event,
>  			     struct perf_sample *sample,
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index ac59710..175dc23 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -932,6 +932,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
>  	attr->mmap2 = track && !perf_missing_features.mmap2;
>  	attr->comm  = track;
>  
> +	if (opts->record_namespaces)
> +		attr->namespaces  = track;
> +
>  	if (opts->record_switch_events)
>  		attr->context_switch = track;
>  
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 71c9720..060fabb 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -13,6 +13,7 @@
>  #include <symbol/kallsyms.h>
>  #include "unwind.h"
>  #include "linux/hash.h"
> +#include "asm/bug.h"
>  
>  static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);
>  
> @@ -501,6 +502,34 @@ int machine__process_comm_event(struct machine *machine, union perf_event *event
>  	return err;
>  }
>  
> +int machine__process_namespaces_event(struct machine *machine __maybe_unused,
> +				      union perf_event *event,
> +				      struct perf_sample *sample __maybe_unused)
> +{
> +	struct thread *thread = machine__findnew_thread(machine,
> +							event->namespaces.pid,
> +							event->namespaces.tid);
> +	int err = 0;
> +
> +	WARN_ONCE(event->namespaces.nr_namespaces > NR_NAMESPACES,
> +		  "\nWARNING: kernel seems to support more namespaces than perf"
> +		  " tool.\nTry updating the perf tool..\n\n");
> +
> +	WARN_ONCE(event->namespaces.nr_namespaces < NR_NAMESPACES,
> +		  "\nWARNING: perf tool seems to support more namespaces than"
> +		  " the kernel.\nTry updating the kernel..\n\n");

And then how can this take place, i.e. are you truncating the extra
namespaces coming from the kernel but continuing anyway, just after
warning the user once about it?

Wouldn't this message get lost in the logs, leaving the user wondering
why the namespaces they expected to be there have vanished?
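
One possible way to address this, sketched here as an assumption rather
than something proposed in the thread: keep the count that came with the
record and make a missing namespace visible at the point of use, instead
of relying on a one-time warning. struct ns_set and ns_set__snprintf()
are hypothetical names.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct ns_link_info {
	uint64_t dev;
	uint64_t ino;
};

/* Hypothetical view of what one record actually carried. */
struct ns_set {
	size_t nr;				/* entries reported by the kernel */
	const struct ns_link_info *link_info;
};

/*
 * Format the identifier of namespace 'idx' as "dev/0xino". An index the
 * record did not carry is printed as "N/A", so the gap shows up in the
 * output itself.
 */
int ns_set__snprintf(const struct ns_set *set, size_t idx,
		     char *buf, size_t size)
{
	if (idx >= set->nr)
		return snprintf(buf, size, "N/A");

	return snprintf(buf, size, "%" PRIu64 "/0x%" PRIx64,
			set->link_info[idx].dev, set->link_info[idx].ino);
}
```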

> +	if (thread == NULL ||
> +	    thread__set_namespaces(thread, sample->time, &event->namespaces)) {
> +		dump_printf("problem processing PERF_RECORD_NAMESPACES, skipping event.\n");
> +		err = -1;
> +	}
> +
> +	thread__put(thread);
> +
> +	return err;
> +}
> +
>  int machine__process_lost_event(struct machine *machine __maybe_unused,
>  				union perf_event *event, struct perf_sample *sample __maybe_unused)
>  {
> @@ -1538,6 +1567,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
>  		ret = machine__process_comm_event(machine, event, sample); break;
>  	case PERF_RECORD_MMAP:
>  		ret = machine__process_mmap_event(machine, event, sample); break;
> +	case PERF_RECORD_NAMESPACES:
> +		ret = machine__process_namespaces_event(machine, event, sample); break;
>  	case PERF_RECORD_MMAP2:
>  		ret = machine__process_mmap2_event(machine, event, sample); break;
>  	case PERF_RECORD_FORK:
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index a283050..3cdb134 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -97,6 +97,9 @@ int machine__process_itrace_start_event(struct machine *machine,
>  					union perf_event *event);
>  int machine__process_switch_event(struct machine *machine,
>  				  union perf_event *event);
> +int machine__process_namespaces_event(struct machine *machine,
> +				      union perf_event *event,
> +				      struct perf_sample *sample);
>  int machine__process_mmap_event(struct machine *machine, union perf_event *event,
>  				struct perf_sample *sample);
>  int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
> diff --git a/tools/perf/util/namespaces.c b/tools/perf/util/namespaces.c
> new file mode 100644
> index 0000000..3134c00
> --- /dev/null
> +++ b/tools/perf/util/namespaces.c
> @@ -0,0 +1,35 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * Copyright (C) 2017 Hari Bathini, IBM Corporation
> + */
> +
> +#include "namespaces.h"
> +#include "util.h"
> +#include "event.h"
> +#include <stdlib.h>
> +#include <stdio.h>
> +
> +struct namespaces *namespaces__new(struct namespaces_event *event)
> +{
> +	struct namespaces *namespaces = zalloc(sizeof(*namespaces));
> +
> +	if (!namespaces)
> +		return NULL;
> +
> +	namespaces->end_time = -1;
> +
> +	if (event) {
> +		memcpy(namespaces->link_info, event->link_info,
> +		       sizeof(namespaces->link_info));
> +	}

Please allocate just what came from the kernel, be it less or more than
that magic number (12).

> +
> +	return namespaces;
> +}
> +
> +void namespaces__free(struct namespaces *namespaces)
> +{
> +	free(namespaces);
> +}
> diff --git a/tools/perf/util/namespaces.h b/tools/perf/util/namespaces.h
> new file mode 100644
> index 0000000..45d9ffd
> --- /dev/null
> +++ b/tools/perf/util/namespaces.h
> @@ -0,0 +1,26 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * Copyright (C) 2017 Hari Bathini, IBM Corporation
> + */
> +
> +#ifndef __PERF_NAMESPACES_H
> +#define __PERF_NAMESPACES_H
> +
> +#include "../perf.h"
> +#include <linux/list.h>
> +
> +struct namespaces_event;
> +
> +struct namespaces {
> +	struct list_head list;
> +	u64 end_time;
> +	struct perf_ns_link_info link_info[NR_NAMESPACES];

Here you could have it as a zero-sized array and allocate it according
to the number of namespaces that came from the kernel.
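
A rough sketch of such a layout, using a C99 flexible array member.
struct ns_snapshot and ns_snapshot__new() are hypothetical names chosen
for illustration; only the end_time = -1 convention is taken from the
patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct ns_entry {
	uint64_t dev;
	uint64_t ino;
};

/* Hypothetical per-thread snapshot sized by the kernel-provided count. */
struct ns_snapshot {
	uint64_t end_time;
	uint64_t nr;			/* number of valid entries below */
	struct ns_entry link_info[];	/* allocated together with the struct */
};

/*
 * Allocate a snapshot with room for exactly 'nr' entries. 'src' may be
 * NULL (for example, a task with no recorded namespaces), in which case
 * the entries stay zeroed.
 */
struct ns_snapshot *ns_snapshot__new(const struct ns_entry *src, uint64_t nr)
{
	struct ns_snapshot *snap;

	/* Guard the size computation against overflow. */
	if (nr > (SIZE_MAX - sizeof(*snap)) / sizeof(snap->link_info[0]))
		return NULL;

	snap = calloc(1, sizeof(*snap) + (size_t)nr * sizeof(snap->link_info[0]));
	if (!snap)
		return NULL;

	snap->end_time = UINT64_MAX;	/* same value as end_time = -1 */
	snap->nr = nr;
	if (src && nr)
		memcpy(snap->link_info, src,
		       (size_t)nr * sizeof(snap->link_info[0]));

	return snap;
}
```

A single free() releases the whole object, so a free helper like the
patch's namespaces__free() would not need to change.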

> +};
> +
> +struct namespaces *namespaces__new(struct namespaces_event *event);
> +void namespaces__free(struct namespaces *namespaces);
> +
> +#endif  /* __PERF_NAMESPACES_H */
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 4cdbc8f..0b782a3 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1239,6 +1239,8 @@ static int machines__deliver_event(struct machines *machines,
>  		return tool->mmap2(tool, event, sample, machine);
>  	case PERF_RECORD_COMM:
>  		return tool->comm(tool, event, sample, machine);
> +	case PERF_RECORD_NAMESPACES:
> +		return tool->namespaces(tool, event, sample, machine);
>  	case PERF_RECORD_FORK:
>  		return tool->fork(tool, event, sample, machine);
>  	case PERF_RECORD_EXIT:
> @@ -1494,6 +1496,11 @@ int perf_session__register_idle_thread(struct perf_session *session)
>  		err = -1;
>  	}
>  
> +	if (thread == NULL || thread__set_namespaces(thread, 0, NULL)) {
> +		pr_err("problem inserting idle task.\n");
> +		err = -1;
> +	}
> +
>  	/* machine__findnew_thread() got the thread, so put it */
>  	thread__put(thread);
>  	return err;
> diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
> index f5af87f..b9fe432 100644
> --- a/tools/perf/util/thread.c
> +++ b/tools/perf/util/thread.c
> @@ -7,6 +7,7 @@
>  #include "thread-stack.h"
>  #include "util.h"
>  #include "debug.h"
> +#include "namespaces.h"
>  #include "comm.h"
>  #include "unwind.h"
>  
> @@ -40,6 +41,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
>  		thread->tid = tid;
>  		thread->ppid = -1;
>  		thread->cpu = -1;
> +		INIT_LIST_HEAD(&thread->namespaces_list);
>  		INIT_LIST_HEAD(&thread->comm_list);
>  
>  		comm_str = malloc(32);
> @@ -66,7 +68,8 @@ struct thread *thread__new(pid_t pid, pid_t tid)
>  
>  void thread__delete(struct thread *thread)
>  {
> -	struct comm *comm, *tmp;
> +	struct namespaces *namespaces, *tmp_namespaces;
> +	struct comm *comm, *tmp_comm;
>  
>  	BUG_ON(!RB_EMPTY_NODE(&thread->rb_node));
>  
> @@ -76,7 +79,12 @@ void thread__delete(struct thread *thread)
>  		map_groups__put(thread->mg);
>  		thread->mg = NULL;
>  	}
> -	list_for_each_entry_safe(comm, tmp, &thread->comm_list, list) {
> +	list_for_each_entry_safe(namespaces, tmp_namespaces,
> +				 &thread->namespaces_list, list) {
> +		list_del(&namespaces->list);
> +		namespaces__free(namespaces);
> +	}
> +	list_for_each_entry_safe(comm, tmp_comm, &thread->comm_list, list) {
>  		list_del(&comm->list);
>  		comm__free(comm);
>  	}
> @@ -104,6 +112,38 @@ void thread__put(struct thread *thread)
>  	}
>  }
>  
> +struct namespaces *thread__namespaces(const struct thread *thread)
> +{
> +	if (list_empty(&thread->namespaces_list))
> +		return NULL;
> +
> +	return list_first_entry(&thread->namespaces_list, struct namespaces, list);
> +}
> +
> +int thread__set_namespaces(struct thread *thread, u64 timestamp,
> +			   struct namespaces_event *event)
> +{
> +	struct namespaces *new, *curr = thread__namespaces(thread);
> +
> +	new = namespaces__new(event);
> +	if (!new)
> +		return -ENOMEM;
> +
> +	list_add(&new->list, &thread->namespaces_list);
> +
> +	if (timestamp && curr) {
> +		/*
> +		 * setns syscall must have changed few or all the namespaces
> +		 * of this thread. Update end time for the namespaces
> +		 * previously used.
> +		 */
> +		curr = list_next_entry(new, list);
> +		curr->end_time = timestamp;
> +	}
> +
> +	return 0;
> +}
> +
>  struct comm *thread__comm(const struct thread *thread)
>  {
>  	if (list_empty(&thread->comm_list))
> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> index 99263cb..b18b5a2 100644
> --- a/tools/perf/util/thread.h
> +++ b/tools/perf/util/thread.h
> @@ -28,6 +28,7 @@ struct thread {
>  	bool			comm_set;
>  	int			comm_len;
>  	bool			dead; /* if set thread has exited */
> +	struct list_head	namespaces_list;
>  	struct list_head	comm_list;
>  	u64			db_id;
>  
> @@ -40,6 +41,7 @@ struct thread {
>  };
>  
>  struct machine;
> +struct namespaces;
>  struct comm;
>  
>  struct thread *thread__new(pid_t pid, pid_t tid);
> @@ -62,6 +64,10 @@ static inline void thread__exited(struct thread *thread)
>  	thread->dead = true;
>  }
>  
> +struct namespaces *thread__namespaces(const struct thread *thread);
> +int thread__set_namespaces(struct thread *thread, u64 timestamp,
> +			   struct namespaces_event *event);
> +
>  int __thread__set_comm(struct thread *thread, const char *comm, u64 timestamp,
>  		       bool exec);
>  static inline int thread__set_comm(struct thread *thread, const char *comm,
> diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
> index ac2590a..829471a 100644
> --- a/tools/perf/util/tool.h
> +++ b/tools/perf/util/tool.h
> @@ -40,6 +40,7 @@ struct perf_tool {
>  	event_op	mmap,
>  			mmap2,
>  			comm,
> +			namespaces,
>  			fork,
>  			exit,
>  			lost,
> @@ -66,6 +67,7 @@ struct perf_tool {
>  	event_op3	auxtrace;
>  	bool		ordered_events;
>  	bool		ordering_requires_timestamps;
> +	bool		namespace_events;
>  };
>  
>  #endif /* __PERF_TOOL_H */

* Re: [PATCH v7 3/8] perf tool: update about the new option to record namespace events
  2017-02-21 14:01 ` [PATCH v7 3/8] perf tool: update about the new option to record namespace events Hari Bathini
@ 2017-03-01 21:03   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-01 21:03 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, alexander.shishkin, mingo, daniel, rostedt,
	Ananth N Mavinakayanahalli, ebiederm, sargun, Aravinda Prasad,
	brendan.d.gregg, jolsa

Em Tue, Feb 21, 2017 at 07:31:37PM +0530, Hari Bathini escreveu:
> Now that we have a new option to record namespace events, update
> the perf-record documentation accordingly.
> 
> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
> ---
>  tools/perf/Documentation/perf-record.txt |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index 27256bc..9c85a65 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -347,6 +347,9 @@ Enable weightened sampling. An additional weight is recorded per sample and can
>  displayed with the weight and local_weight sort keys.  This currently works for TSX
>  abort events and some memory events in precise mode on modern Intel CPUs.
>  
> +--namespaces::
> +Record events of type PERF_RECORD_NAMESPACES.
> +
>  --transaction::
>  Record transaction flags for transaction related events.

So, here it is, please update the documentation together with the patch
that introduces the option.

- Arnaldo

* Re: [PATCH v7 4/8] perf tool: synthesize namespace events for current processes
  2017-02-21 14:01 ` [PATCH v7 4/8] perf tool: synthesize namespace events for current processes Hari Bathini
@ 2017-03-01 21:05   ` Arnaldo Carvalho de Melo
  2017-03-03  8:57     ` Hari Bathini
  0 siblings, 1 reply; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-01 21:05 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, alexander.shishkin, mingo, daniel, rostedt,
	Ananth N Mavinakayanahalli, ebiederm, sargun, Aravinda Prasad,
	brendan.d.gregg, jolsa

Em Tue, Feb 21, 2017 at 07:31:44PM +0530, Hari Bathini escreveu:
> Synthesize PERF_RECORD_NAMESPACES events for processes that were
> running prior to invocation of perf record, the data for which is
> taken from /proc/$PID/ns. These changes make way for analyzing
> events with regard to namespaces.
> 
> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
> ---
>  tools/perf/builtin-record.c |   27 +++++++++--
>  tools/perf/util/event.c     |  107 +++++++++++++++++++++++++++++++++++++++++--
>  tools/perf/util/event.h     |    6 ++
>  3 files changed, 130 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index a8b9a78..f4bf6a6 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -986,6 +986,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  	 */
>  	if (forks) {
>  		union perf_event *event;
> +		pid_t tgid;
>  
>  		event = malloc(sizeof(event->comm) + machine->id_hdr_size);
>  		if (event == NULL) {
> @@ -999,10 +1000,28 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  		 * cannot see a correct process name for those events.
>  		 * Synthesize COMM event to prevent it.
>  		 */
> -		perf_event__synthesize_comm(tool, event,
> -					    rec->evlist->workload.pid,
> -					    process_synthesized_event,
> -					    machine);
> +		tgid = perf_event__synthesize_comm(tool, event,
> +						   rec->evlist->workload.pid,
> +						   process_synthesized_event,
> +						   machine);
> +		free(event);
> +
> +		if (tgid == -1)
> +			goto out_child;
> +
> +		event = malloc(sizeof(event->namespaces) + machine->id_hdr_size);
> +		if (event == NULL) {
> +			err = -ENOMEM;
> +			goto out_child;
> +		}
> +
> +		/*
> +		 * Synthesize NAMESPACES event for the command specified.
> +		 */
> +		perf_event__synthesize_namespaces(tool, event,
> +						  rec->evlist->workload.pid,
> +						  tgid, process_synthesized_event,
> +						  machine);
>  		free(event);
>  
>  		perf_evlist__start_workload(rec->evlist);
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index f118eac..c8c112a 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -50,6 +50,16 @@ static const char *perf_event__names[] = {
>  	[PERF_RECORD_TIME_CONV]			= "TIME_CONV",
>  };
>  
> +static const char *perf_ns__names[] = {
> +	[NET_NS_INDEX]		= "net",
> +	[UTS_NS_INDEX]		= "uts",
> +	[IPC_NS_INDEX]		= "ipc",
> +	[PID_NS_INDEX]		= "pid",
> +	[USER_NS_INDEX]		= "user",
> +	[MNT_NS_INDEX]		= "mnt",
> +	[CGROUP_NS_INDEX]	= "cgroup",
> +};
> +
>  const char *perf_event__name(unsigned int id)
>  {
>  	if (id >= ARRAY_SIZE(perf_event__names))
> @@ -59,6 +69,13 @@ const char *perf_event__name(unsigned int id)
>  	return perf_event__names[id];
>  }
>  
> +static const char *perf_ns__name(unsigned int id)
> +{
> +	if (id >= ARRAY_SIZE(perf_ns__names))
> +		return "UNKNOWN";
> +	return perf_ns__names[id];
> +}
> +
>  static int perf_tool__process_synth_event(struct perf_tool *tool,
>  					  union perf_event *event,
>  					  struct machine *machine,
> @@ -204,6 +221,56 @@ pid_t perf_event__synthesize_comm(struct perf_tool *tool,
>  	return tgid;
>  }
>  
> +static void perf_event__get_ns_link_info(pid_t pid, const char *ns,
> +					 struct perf_ns_link_info *ns_link_info)
> +{
> +	struct stat64 st;
> +	char proc_ns[128];
> +
> +	sprintf(proc_ns, "/proc/%u/ns/%s", pid, ns);
> +	if (stat64(proc_ns, &st) == 0) {
> +		ns_link_info->dev = st.st_dev;
> +		ns_link_info->ino = st.st_ino;
> +	}
> +}
> +
> +int perf_event__synthesize_namespaces(struct perf_tool *tool,
> +				      union perf_event *event,
> +				      pid_t pid, pid_t tgid,
> +				      perf_event__handler_t process,
> +				      struct machine *machine)
> +{
> +	u32 idx;
> +	struct perf_ns_link_info *ns_link_info;
> +
> +	if (!tool->namespace_events)
> +		return 0;
> +
> +	memset(&event->namespaces, 0,
> +	       sizeof(event->namespaces) + machine->id_hdr_size);
> +
> +	event->namespaces.pid = tgid;
> +	event->namespaces.tid = pid;
> +
> +	event->namespaces.nr_namespaces = NR_NAMESPACES;

Huh? Don't you have to first figure out how many namespaces a process is
in to then set this field? 
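
A sketch of how the count could be derived per process, assuming the
same /proc/<pid>/ns/<name> layout the patch already reads. The name list
mirrors perf_ns__names[] from the patch; ns_probe() and struct ns_id are
hypothetical. Only entries that actually resolve are counted, so the
result reflects what the running kernel exposes for that process.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

static const char *const ns_names[] = {
	"net", "uts", "ipc", "pid", "user", "mnt", "cgroup",
};

#define NS_NAMES_NR (sizeof(ns_names) / sizeof(ns_names[0]))

/* Hypothetical result slot for one namespace link. */
struct ns_id {
	unsigned long long dev;
	unsigned long long ino;
	int valid;		/* non-zero if the link could be stat'ed */
};

/*
 * Fill 'out' (NS_NAMES_NR slots) for 'pid' and return how many namespace
 * links were found under <proc_root>/<pid>/ns.
 */
size_t ns_probe(const char *proc_root, pid_t pid, struct ns_id *out)
{
	char path[256];
	struct stat st;
	size_t i, found = 0;

	for (i = 0; i < NS_NAMES_NR; i++) {
		out[i].dev = 0;
		out[i].ino = 0;
		out[i].valid = 0;

		snprintf(path, sizeof(path), "%s/%d/ns/%s",
			 proc_root, (int)pid, ns_names[i]);
		if (stat(path, &st) == 0) {
			out[i].dev = (unsigned long long)st.st_dev;
			out[i].ino = (unsigned long long)st.st_ino;
			out[i].valid = 1;
			found++;
		}
	}

	return found;
}
```

The caller can then decide whether to emit only the entries that were
found or to keep fixed slots and mark the missing ones.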

> +	ns_link_info = event->namespaces.link_info;
> +
> +	for (idx = 0; idx < event->namespaces.nr_namespaces; idx++)
> +		perf_event__get_ns_link_info(pid, perf_ns__name(idx),
> +					     &ns_link_info[idx]);



> +
> +	event->namespaces.header.type = PERF_RECORD_NAMESPACES;
> +
> +	event->namespaces.header.size = (sizeof(event->namespaces) +
> +					 machine->id_hdr_size);
> +
> +	if (perf_tool__process_synth_event(tool, event, machine, process) != 0)
> +		return -1;
> +
> +	return 0;
> +}
> +
>  static int perf_event__synthesize_fork(struct perf_tool *tool,
>  				       union perf_event *event,
>  				       pid_t pid, pid_t tgid, pid_t ppid,
> @@ -435,8 +502,9 @@ int perf_event__synthesize_modules(struct perf_tool *tool,
>  static int __event__synthesize_thread(union perf_event *comm_event,
>  				      union perf_event *mmap_event,
>  				      union perf_event *fork_event,
> +				      union perf_event *namespaces_event,
>  				      pid_t pid, int full,
> -					  perf_event__handler_t process,
> +				      perf_event__handler_t process,
>  				      struct perf_tool *tool,
>  				      struct machine *machine,
>  				      bool mmap_data,
> @@ -456,6 +524,11 @@ static int __event__synthesize_thread(union perf_event *comm_event,
>  		if (tgid == -1)
>  			return -1;
>  
> +		if (perf_event__synthesize_namespaces(tool, namespaces_event, pid,
> +						      tgid, process, machine) < 0)
> +			return -1;
> +
> +
>  		return perf_event__synthesize_mmap_events(tool, mmap_event, pid, tgid,
>  							  process, machine, mmap_data,
>  							  proc_map_timeout);
> @@ -489,6 +562,11 @@ static int __event__synthesize_thread(union perf_event *comm_event,
>  		if (perf_event__synthesize_fork(tool, fork_event, _pid, tgid,
>  						ppid, process, machine) < 0)
>  			break;
> +
> +		if (perf_event__synthesize_namespaces(tool, namespaces_event, _pid,
> +						      tgid, process, machine) < 0)
> +			break;
> +
>  		/*
>  		 * Send the prepared comm event
>  		 */
> @@ -517,6 +595,7 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
>  				      unsigned int proc_map_timeout)
>  {
>  	union perf_event *comm_event, *mmap_event, *fork_event;
> +	union perf_event *namespaces_event;
>  	int err = -1, thread, j;
>  
>  	comm_event = malloc(sizeof(comm_event->comm) + machine->id_hdr_size);
> @@ -531,10 +610,15 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
>  	if (fork_event == NULL)
>  		goto out_free_mmap;
>  
> +	namespaces_event = malloc(sizeof(namespaces_event->namespaces) +
> +				  machine->id_hdr_size);
> +	if (namespaces_event == NULL)
> +		goto out_free_fork;
> +
>  	err = 0;
>  	for (thread = 0; thread < threads->nr; ++thread) {
>  		if (__event__synthesize_thread(comm_event, mmap_event,
> -					       fork_event,
> +					       fork_event, namespaces_event,
>  					       thread_map__pid(threads, thread), 0,
>  					       process, tool, machine,
>  					       mmap_data, proc_map_timeout)) {
> @@ -560,7 +644,7 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
>  			/* if not, generate events for it */
>  			if (need_leader &&
>  			    __event__synthesize_thread(comm_event, mmap_event,
> -						       fork_event,
> +						       fork_event, namespaces_event,
>  						       comm_event->comm.pid, 0,
>  						       process, tool, machine,
>  						       mmap_data, proc_map_timeout)) {
> @@ -569,6 +653,8 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
>  			}
>  		}
>  	}
> +	free(namespaces_event);
> +out_free_fork:
>  	free(fork_event);
>  out_free_mmap:
>  	free(mmap_event);
> @@ -588,6 +674,7 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
>  	char proc_path[PATH_MAX];
>  	struct dirent *dirent;
>  	union perf_event *comm_event, *mmap_event, *fork_event;
> +	union perf_event *namespaces_event;
>  	int err = -1;
>  
>  	if (machine__is_default_guest(machine))
> @@ -605,11 +692,16 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
>  	if (fork_event == NULL)
>  		goto out_free_mmap;
>  
> +	namespaces_event = malloc(sizeof(namespaces_event->namespaces) +
> +				  machine->id_hdr_size);
> +	if (namespaces_event == NULL)
> +		goto out_free_fork;
> +
>  	snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
>  	proc = opendir(proc_path);
>  
>  	if (proc == NULL)
> -		goto out_free_fork;
> +		goto out_free_namespaces;
>  
>  	while ((dirent = readdir(proc)) != NULL) {
>  		char *end;
> @@ -621,13 +713,16 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
>   		 * We may race with exiting thread, so don't stop just because
>   		 * one thread couldn't be synthesized.
>   		 */
> -		__event__synthesize_thread(comm_event, mmap_event, fork_event, pid,
> -					   1, process, tool, machine, mmap_data,
> +		__event__synthesize_thread(comm_event, mmap_event, fork_event,
> +					   namespaces_event, pid, 1, process,
> +					   tool, machine, mmap_data,
>  					   proc_map_timeout);
>  	}
>  
>  	err = 0;
>  	closedir(proc);
> +out_free_namespaces:
> +	free(namespaces_event);
>  out_free_fork:
>  	free(fork_event);
>  out_free_mmap:
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index 4e90b09..c73ad47 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -650,6 +650,12 @@ pid_t perf_event__synthesize_comm(struct perf_tool *tool,
>  				  perf_event__handler_t process,
>  				  struct machine *machine);
>  
> +int perf_event__synthesize_namespaces(struct perf_tool *tool,
> +				      union perf_event *event,
> +				      pid_t pid, pid_t tgid,
> +				      perf_event__handler_t process,
> +				      struct machine *machine);
> +
>  int perf_event__synthesize_mmap_events(struct perf_tool *tool,
>  				       union perf_event *event,
>  				       pid_t pid, pid_t tgid,

* Re: [PATCH v7 5/8] perf tool: add print support for namespace events
  2017-02-21 14:01 ` [PATCH v7 5/8] perf tool: add print support for namespace events Hari Bathini
@ 2017-03-01 21:06   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-01 21:06 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, alexander.shishkin, mingo, daniel, rostedt,
	Ananth N Mavinakayanahalli, ebiederm, sargun, Aravinda Prasad,
	brendan.d.gregg, jolsa

Em Tue, Feb 21, 2017 at 07:31:52PM +0530, Hari Bathini escreveu:
> +++ b/tools/perf/util/event.c
> @@ -1104,6 +1104,33 @@ size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp)
>  	return fprintf(fp, "%s: %s:%d/%d\n", s, event->comm.comm, event->comm.pid, event->comm.tid);
>  }
>  
> +size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp)
> +{
> +	size_t ret = 0;
> +	struct perf_ns_link_info *ns_link_info;
> +	u32 nr_namespaces, idx;
> +
> +	ns_link_info = event->namespaces.link_info;
> +	nr_namespaces = event->namespaces.nr_namespaces;

Perfect, no magic numbers here. :-)

- Arnaldo

> +	ret += fprintf(fp, " %d/%d - nr_namespaces: %u\n\t[",
> +		       event->namespaces.pid,
> +		       event->namespaces.tid,
> +		       nr_namespaces);
> +
> +	for (idx = 0; idx < nr_namespaces; idx++) {
> +		if (idx && (idx % 4 == 0))
> +			ret += fprintf(fp, "\n\t ");
> +
> +		ret  += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx,
> +				perf_ns__name(idx), (u64)ns_link_info[idx].dev,
> +				(u64)ns_link_info[idx].ino,
> +				((idx + 1) != nr_namespaces) ? ", " : "]\n\n");
> +	}
> +
> +	return ret;
> +}
> +
>  int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
>  			     union perf_event *event,
>  			     struct perf_sample *sample,
> @@ -1300,6 +1327,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
>  	case PERF_RECORD_MMAP:
>  		ret += perf_event__fprintf_mmap(event, fp);
>  		break;
> +	case PERF_RECORD_NAMESPACES:
> +		ret += perf_event__fprintf_namespaces(event, fp);
> +		break;
>  	case PERF_RECORD_MMAP2:
>  		ret += perf_event__fprintf_mmap2(event, fp);
>  		break;
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index c73ad47..8eb470b 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -673,6 +673,7 @@ size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
> +size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf(union perf_event *event, FILE *fp);
>  
>  u64 kallsyms__get_function_start(const char *kallsyms_filename,
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 060fabb..5f46ad0 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -519,6 +519,9 @@ int machine__process_namespaces_event(struct machine *machine __maybe_unused,
>  		  "\nWARNING: perf tool seems to support more namespaces than"
>  		  " the kernel.\nTry updating the kernel..\n\n");
>  
> +	if (dump_trace)
> +		perf_event__fprintf_namespaces(event, stdout);
> +
>  	if (thread == NULL ||
>  	    thread__set_namespaces(thread, sample->time, &event->namespaces)) {
>  		dump_printf("problem processing PERF_RECORD_NAMESPACES, skipping event.\n");

* Re: [PATCH v7 6/8] perf tool: add script print support for namespace events
  2017-02-21 14:02 ` [PATCH v7 6/8] perf tool: add script " Hari Bathini
@ 2017-03-01 21:08   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-01 21:08 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, alexander.shishkin, mingo, daniel, rostedt,
	Ananth N Mavinakayanahalli, ebiederm, sargun, Aravinda Prasad,
	brendan.d.gregg, jolsa

Em Tue, Feb 21, 2017 at 07:32:19PM +0530, Hari Bathini escreveu:
> Add script print support for events of type PERF_RECORD_NAMESPACES.
 

Please combine the documentation update (next patch) with this one.

And please add the resulting output as an example; I will then try to
reproduce your tests. To that end, please put as much information as
you can about how to reproduce your tests and results in the changeset
log, so that anyone else can see what it looks like and try to
reproduce your steps.

- Arnaldo

> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
> ---
>  tools/perf/builtin-script.c |   40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index f1ce806..66d62c9 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -830,6 +830,7 @@ struct perf_script {
>  	bool			show_task_events;
>  	bool			show_mmap_events;
>  	bool			show_switch_events;
> +	bool			show_namespace_events;
>  	bool			allocated;
>  	struct cpu_map		*cpus;
>  	struct thread_map	*threads;
> @@ -1118,6 +1119,41 @@ static int process_comm_event(struct perf_tool *tool,
>  	return ret;
>  }
>  
> +static int process_namespaces_event(struct perf_tool *tool,
> +				    union perf_event *event,
> +				    struct perf_sample *sample,
> +				    struct machine *machine)
> +{
> +	struct thread *thread;
> +	struct perf_script *script = container_of(tool, struct perf_script, tool);
> +	struct perf_session *session = script->session;
> +	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
> +	int ret = -1;
> +
> +	thread = machine__findnew_thread(machine, event->namespaces.pid,
> +					 event->namespaces.tid);
> +	if (thread == NULL) {
> +		pr_debug("problem processing NAMESPACES event, skipping it.\n");
> +		return -1;
> +	}
> +
> +	if (perf_event__process_namespaces(tool, event, sample, machine) < 0)
> +		goto out;
> +
> +	if (!evsel->attr.sample_id_all) {
> +		sample->cpu = 0;
> +		sample->time = 0;
> +		sample->tid = event->namespaces.tid;
> +		sample->pid = event->namespaces.pid;
> +	}
> +	print_sample_start(sample, thread, evsel);
> +	perf_event__fprintf(event, stdout);
> +	ret = 0;
> +out:
> +	thread__put(thread);
> +	return ret;
> +}
> +
>  static int process_fork_event(struct perf_tool *tool,
>  			      union perf_event *event,
>  			      struct perf_sample *sample,
> @@ -1293,6 +1329,8 @@ static int __cmd_script(struct perf_script *script)
>  	}
>  	if (script->show_switch_events)
>  		script->tool.context_switch = process_switch_event;
> +	if (script->show_namespace_events)
> +		script->tool.namespaces = process_namespaces_event;
>  
>  	ret = perf_session__process_events(script->session);
>  
> @@ -2181,6 +2219,8 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
>  		    "Show the mmap events"),
>  	OPT_BOOLEAN('\0', "show-switch-events", &script.show_switch_events,
>  		    "Show context switch events (if recorded)"),
> +	OPT_BOOLEAN('\0', "show-namespace-events", &script.show_namespace_events,
> +		    "Show namespace events (if recorded)"),
>  	OPT_BOOLEAN('f', "force", &symbol_conf.force, "don't complain, do it"),
>  	OPT_BOOLEAN(0, "ns", &nanosecs,
>  		    "Use 9 decimal places when displaying time"),


* Re: [PATCH v7 8/8] perf tool: add cgroup identifier entry in perf report
  2017-02-21 14:03 ` [PATCH v7 8/8] perf tool: add cgroup identifier entry in perf report Hari Bathini
  2017-02-22 16:48   ` Jiri Olsa
@ 2017-03-01 21:16   ` Arnaldo Carvalho de Melo
  2017-03-03  8:59     ` Hari Bathini
  1 sibling, 1 reply; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-01 21:16 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, alexander.shishkin, Ingo Molnar, daniel,
	rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg, jolsa

Em Tue, Feb 21, 2017 at 07:33:13PM +0530, Hari Bathini escreveu:
> This patch introduces a cgroup identifier entry field in perf report to
> identify or distinguish data of different cgroups. It uses the device
> number and inode number of cgroup namespace, included in perf data with
> the new PERF_RECORD_NAMESPACES event, as cgroup identifier. With the
> assumption that each container is created with its own cgroup namespace,
> this allows assessment/analysis of multiple containers at once.

Could you try to do this with a real-world example? E.g. explain that
systemd creates some cgroups and then show how to map the perf report
output you show below to what systemd puts in place.

Doing it with docker would be handy as well, no?
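
For such a mapping, a minimal sketch in C (an illustration only, not part
of this patch set): it stats /proc/<pid>/ns/cgroup, the same kind of lookup
patch 4/8 does when synthesizing events, and formats dev/inode in the
layout of the cgroup id column quoted below. The helper name
cgroup_ns_id() is hypothetical, and whether the st_dev value seen from
userspace matches what the kernel records is an assumption to verify.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: format the cgroup namespace identifier of @pid as
 * "dev/0xinode", the layout of the cgroup id column in the quoted output.
 * Returns 0 on success, -1 if the namespace link cannot be stat'ed
 * (no such process, or a kernel without cgroup namespaces).
 */
static int cgroup_ns_id(pid_t pid, char *buf, size_t size)
{
	char path[64];
	struct stat st;

	snprintf(path, sizeof(path), "/proc/%d/ns/cgroup", (int)pid);
	if (stat(path, &st) != 0)
		return -1;

	snprintf(buf, size, "%" PRIu64 "/0x%" PRIx64,
		 (uint64_t)st.st_dev, (uint64_t)st.st_ino);
	return 0;
}
```

Calling it with the PID of a container's init process (for docker, the
State.Pid field reported by docker inspect) would give the value to look
for in the cgroup id column.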

You also forgot to update the documentation with this new sort key,
please add it in the --sort part of
tools/perf/Documentation/perf-report.txt.

Ah, and just another minor request: please format the changeset
summaries as:

 perf tools: Add cgroup identifier sort order keyword


First the component, which in this case is the generic one, as it affects
multiple tools (top, report), and then start with a capital letter after
the colon.

A git log --oneline tools/perf/ -20

will show you the pattern used.

Thanks!

- Arnaldo
 
> Shown below is the output of perf report, sorted based on cgroup id, on
> a system that was running three containers at the time of perf record
> and clearly showing one of the containers' considerable use of kernel
> memory in comparison with others:
> 
> 
> 	$ perf report -s cgroup_id,sample --stdio
> 	#
> 	# Total Lost Samples: 0
> 	#
> 	# Samples: 16K of event 'kmem:kmalloc'
> 	# Event count (approx.): 16043
> 	#
> 	# Overhead  cgroup id (dev/inode)       Samples
> 	# ........  .....................  ............
> 	#
> 	    96.33%  3/0xf00000d0                  15454
> 	     3.02%  3/0xeffffffb                    485
> 	     0.31%  3/0xf00000ce                     49
> 	     0.29%  3/0xf00000cf                     47
> 	     0.05%  0/0x0                             8
> 
> While this is a start, there is scope for further improvement. For
> example, instead of the cgroup namespace's device and inode numbers, the
> device and inode numbers of some or all namespaces may be used to
> distinguish which processes are running in a given container context.
> Also, scripts to map device and inode info to containers sound plausible
> for better tracing of containers.
> 
> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
> ---
>  tools/perf/util/hist.c |    7 +++++++
>  tools/perf/util/hist.h |    1 +
>  tools/perf/util/sort.c |   41 +++++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/sort.h |    7 +++++++
>  4 files changed, 56 insertions(+)
> 
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index 32c6a93..559ea27 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -3,6 +3,7 @@
>  #include "hist.h"
>  #include "map.h"
>  #include "session.h"
> +#include "namespaces.h"
>  #include "sort.h"
>  #include "evlist.h"
>  #include "evsel.h"
> @@ -169,6 +170,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
>  		hists__set_unres_dso_col_len(hists, HISTC_MEM_DADDR_DSO);
>  	}
>  
> +	hists__new_col_len(hists, HISTC_CGROUP_ID, 20);
>  	hists__new_col_len(hists, HISTC_CPU, 3);
>  	hists__new_col_len(hists, HISTC_SOCKET, 6);
>  	hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
> @@ -574,9 +576,14 @@ __hists__add_entry(struct hists *hists,
>  		   bool sample_self,
>  		   struct hist_entry_ops *ops)
>  {
> +	struct namespaces *ns = thread__namespaces(al->thread);
>  	struct hist_entry entry = {
>  		.thread	= al->thread,
>  		.comm = thread__comm(al->thread),
> +		.cgroup_id = {
> +			.dev = ns ? ns->link_info[CGROUP_NS_INDEX].dev : 0,
> +			.ino = ns ? ns->link_info[CGROUP_NS_INDEX].ino : 0,
> +		},
>  		.ms = {
>  			.map	= al->map,
>  			.sym	= al->sym,
> diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
> index 28c216e..4c1da48 100644
> --- a/tools/perf/util/hist.h
> +++ b/tools/perf/util/hist.h
> @@ -30,6 +30,7 @@ enum hist_column {
>  	HISTC_DSO,
>  	HISTC_THREAD,
>  	HISTC_COMM,
> +	HISTC_CGROUP_ID,
>  	HISTC_PARENT,
>  	HISTC_CPU,
>  	HISTC_SOCKET,
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index df622f4..9f5f404 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -536,6 +536,46 @@ struct sort_entry sort_cpu = {
>  	.se_width_idx	= HISTC_CPU,
>  };
>  
> +/* --sort cgroup_id */
> +
> +static int64_t _sort__cgroup_dev_cmp(u64 left_dev, u64 right_dev)
> +{
> +	return (int64_t)(right_dev - left_dev);
> +}
> +
> +static int64_t _sort__cgroup_inode_cmp(u64 left_ino, u64 right_ino)
> +{
> +	return (int64_t)(right_ino - left_ino);
> +}
> +
> +static int64_t
> +sort__cgroup_id_cmp(struct hist_entry *left, struct hist_entry *right)
> +{
> +	int64_t ret;
> +
> +	ret = _sort__cgroup_dev_cmp(right->cgroup_id.dev, left->cgroup_id.dev);
> +	if (ret != 0)
> +		return ret;
> +
> +	return _sort__cgroup_inode_cmp(right->cgroup_id.ino,
> +				       left->cgroup_id.ino);
> +}
> +
> +static int hist_entry__cgroup_id_snprintf(struct hist_entry *he,
> +					  char *bf, size_t size,
> +					  unsigned int width __maybe_unused)
> +{
> +	return repsep_snprintf(bf, size, "%lu/0x%lx", he->cgroup_id.dev,
> +			       he->cgroup_id.ino);
> +}
> +
> +struct sort_entry sort_cgroup_id = {
> +	.se_header      = "cgroup id (dev/inode)",
> +	.se_cmp	        = sort__cgroup_id_cmp,
> +	.se_snprintf    = hist_entry__cgroup_id_snprintf,
> +	.se_width_idx	= HISTC_CGROUP_ID,
> +};
> +
>  /* --sort socket */
>  
>  static int64_t
> @@ -1418,6 +1458,7 @@ static struct sort_dimension common_sort_dimensions[] = {
>  	DIM(SORT_GLOBAL_WEIGHT, "weight", sort_global_weight),
>  	DIM(SORT_TRANSACTION, "transaction", sort_transaction),
>  	DIM(SORT_TRACE, "trace", sort_trace),
> +	DIM(SORT_CGROUP_ID, "cgroup_id", sort_cgroup_id),
>  };
>  
>  #undef DIM
> diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
> index 7aff317..68a5abb 100644
> --- a/tools/perf/util/sort.h
> +++ b/tools/perf/util/sort.h
> @@ -54,6 +54,11 @@ struct he_stat {
>  	u32			nr_events;
>  };
>  
> +struct namespace_id {
> +	u64			dev;
> +	u64			ino;
> +};
> +
>  struct hist_entry_diff {
>  	bool	computed;
>  	union {
> @@ -91,6 +96,7 @@ struct hist_entry {
>  	struct map_symbol	ms;
>  	struct thread		*thread;
>  	struct comm		*comm;
> +	struct namespace_id	cgroup_id;
>  	u64			ip;
>  	u64			transaction;
>  	s32			socket;
> @@ -211,6 +217,7 @@ enum sort_type {
>  	SORT_GLOBAL_WEIGHT,
>  	SORT_TRANSACTION,
>  	SORT_TRACE,
> +	SORT_CGROUP_ID,
>  
>  	/* branch stack specific sort keys */
>  	__SORT_BRANCH_STACK,


* Re: [PATCH v7 2/8] perf tool: add PERF_RECORD_NAMESPACES to include namespaces related info
  2017-03-01 21:02   ` Arnaldo Carvalho de Melo
@ 2017-03-03  8:54     ` Hari Bathini
  0 siblings, 0 replies; 24+ messages in thread
From: Hari Bathini @ 2017-03-03  8:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: ast, peterz, lkml, alexander.shishkin, mingo, daniel, rostedt,
	Ananth N Mavinakayanahalli, ebiederm, sargun, Aravinda Prasad,
	brendan.d.gregg, jolsa

Hi Arnaldo,


Thanks for the review.


On Thursday 02 March 2017 02:32 AM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Feb 21, 2017 at 07:31:30PM +0530, Hari Bathini escreveu:
>> Update perf tool to examine PERF_RECORD_NAMESPACES events emitted by
>> the kernel when fork, clone, setns or unshare are invoked.
> You forgot to update tools/perf/Documentation/ for all the options you
> added, see more comments below.

Will fold the next patch into this one.

>> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
>> ---
>>   tools/include/uapi/linux/perf_event.h |   32 +++++++++++++++++++++++-
>>   tools/perf/builtin-annotate.c         |    1 +
>>   tools/perf/builtin-diff.c             |    1 +
>>   tools/perf/builtin-inject.c           |   14 +++++++++++
>>   tools/perf/builtin-kmem.c             |    1 +
>>   tools/perf/builtin-kvm.c              |    2 ++
>>   tools/perf/builtin-lock.c             |    1 +
>>   tools/perf/builtin-mem.c              |    1 +
>>   tools/perf/builtin-record.c           |    6 +++++
>>   tools/perf/builtin-report.c           |    1 +
>>   tools/perf/builtin-sched.c            |    1 +
>>   tools/perf/builtin-script.c           |    1 +
>>   tools/perf/builtin-trace.c            |    3 ++
>>   tools/perf/perf.h                     |    1 +
>>   tools/perf/util/Build                 |    1 +
>>   tools/perf/util/data-convert-bt.c     |    1 +
>>   tools/perf/util/event.c               |    9 +++++++
>>   tools/perf/util/event.h               |   14 +++++++++++
>>   tools/perf/util/evsel.c               |    3 ++
>>   tools/perf/util/machine.c             |   31 +++++++++++++++++++++++
>>   tools/perf/util/machine.h             |    3 ++
>>   tools/perf/util/namespaces.c          |   35 ++++++++++++++++++++++++++
>>   tools/perf/util/namespaces.h          |   26 ++++++++++++++++++++
>>   tools/perf/util/session.c             |    7 +++++
>>   tools/perf/util/thread.c              |   44 ++++++++++++++++++++++++++++++++-
>>   tools/perf/util/thread.h              |    6 +++++
>>   tools/perf/util/tool.h                |    2 ++
>>   27 files changed, 244 insertions(+), 4 deletions(-)
>>   create mode 100644 tools/perf/util/namespaces.c
>>   create mode 100644 tools/perf/util/namespaces.h
>>
>> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
>> index c66a485..bec0aad 100644
>> --- a/tools/include/uapi/linux/perf_event.h
>> +++ b/tools/include/uapi/linux/perf_event.h
>> @@ -344,7 +344,8 @@ struct perf_event_attr {
>>   				use_clockid    :  1, /* use @clockid for time fields */
>>   				context_switch :  1, /* context switch data */
>>   				write_backward :  1, /* Write ring buffer from end to beginning */
>> -				__reserved_1   : 36;
>> +				namespaces     :  1, /* include namespaces data */
>> +				__reserved_1   : 35;
>>   
>>   	union {
>>   		__u32		wakeup_events;	  /* wakeup every n events */
>> @@ -610,6 +611,23 @@ struct perf_event_header {
>>   	__u16	size;
>>   };
>>   
>> +struct perf_ns_link_info {
>> +	__u64	dev;
>> +	__u64	ino;
>> +};
>> +
>> +enum {
>> +	NET_NS_INDEX		= 0,
>> +	UTS_NS_INDEX		= 1,
>> +	IPC_NS_INDEX		= 2,
>> +	PID_NS_INDEX		= 3,
>> +	USER_NS_INDEX		= 4,
>> +	MNT_NS_INDEX		= 5,
>> +	CGROUP_NS_INDEX		= 6,
>> +
>> +	NR_NAMESPACES,		/* number of available namespaces */
>> +};
>> +
>>   enum perf_event_type {
>>   
>>   	/*
>> @@ -862,6 +880,18 @@ enum perf_event_type {
>>   	 */
>>   	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
>>   
>> +	/*
>> +	 * struct {
>> +	 *	struct perf_event_header	header;
>> +	 *	u32				pid;
>> +	 *	u32				tid;
>> +	 *	u64				nr_namespaces;
>> +	 *	{ u64				dev, inode; } [nr_namespaces];
>> +	 *	struct sample_id		sample_id;
>> +	 * };
>> +	 */
>> +	PERF_RECORD_NAMESPACES			= 16,
>> +
>>   	PERF_RECORD_MAX,			/* non-ABI */
>>   };
>>   
>> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
>> index ebb6283..1b63dc4 100644
>> --- a/tools/perf/builtin-annotate.c
>> +++ b/tools/perf/builtin-annotate.c
>> @@ -393,6 +393,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
>>   			.comm	= perf_event__process_comm,
>>   			.exit	= perf_event__process_exit,
>>   			.fork	= perf_event__process_fork,
>> +			.namespaces = perf_event__process_namespaces,
>>   			.ordered_events = true,
>>   			.ordering_requires_timestamps = true,
>>   		},
>> diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
>> index 70a2893..4b821cf 100644
>> --- a/tools/perf/builtin-diff.c
>> +++ b/tools/perf/builtin-diff.c
>> @@ -364,6 +364,7 @@ static struct perf_tool tool = {
>>   	.exit	= perf_event__process_exit,
>>   	.fork	= perf_event__process_fork,
>>   	.lost	= perf_event__process_lost,
>> +	.namespaces = perf_event__process_namespaces,
>>   	.ordered_events = true,
>>   	.ordering_requires_timestamps = true,
>>   };
>> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
>> index b9bc7e3..c5ddc73 100644
>> --- a/tools/perf/builtin-inject.c
>> +++ b/tools/perf/builtin-inject.c
>> @@ -333,6 +333,19 @@ static int perf_event__repipe_comm(struct perf_tool *tool,
>>   	return err;
>>   }
>>   
>> +static int perf_event__repipe_namespaces(struct perf_tool *tool,
>> +					 union perf_event *event,
>> +					 struct perf_sample *sample,
>> +					 struct machine *machine)
>> +{
>> +	int err;
>> +
>> +	err = perf_event__process_namespaces(tool, event, sample, machine);
> Minor, but since changes are needed anyway: combine the previous three
> lines into one.

Sure.

>> +	perf_event__repipe(tool, event, sample, machine);
>> +
>> +	return err;
>> +}
>> +
>>   static int perf_event__repipe_exit(struct perf_tool *tool,
>>   				   union perf_event *event,
>>   				   struct perf_sample *sample,
>> @@ -660,6 +673,7 @@ static int __cmd_inject(struct perf_inject *inject)
>>   		session->itrace_synth_opts = &inject->itrace_synth_opts;
>>   		inject->itrace_synth_opts.inject = true;
>>   		inject->tool.comm	    = perf_event__repipe_comm;
>> +		inject->tool.namespaces	    = perf_event__repipe_namespaces;
>>   		inject->tool.exit	    = perf_event__repipe_exit;
>>   		inject->tool.id_index	    = perf_event__repipe_id_index;
>>   		inject->tool.auxtrace_info  = perf_event__process_auxtrace_info;
>> diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
>> index 6da8d08..d509e74 100644
>> --- a/tools/perf/builtin-kmem.c
>> +++ b/tools/perf/builtin-kmem.c
>> @@ -964,6 +964,7 @@ static struct perf_tool perf_kmem = {
>>   	.comm		 = perf_event__process_comm,
>>   	.mmap		 = perf_event__process_mmap,
>>   	.mmap2		 = perf_event__process_mmap2,
>> +	.namespaces	 = perf_event__process_namespaces,
>>   	.ordered_events	 = true,
>>   };
>>   
>> diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
>> index 08fa88f..18e6c38 100644
>> --- a/tools/perf/builtin-kvm.c
>> +++ b/tools/perf/builtin-kvm.c
>> @@ -1044,6 +1044,7 @@ static int read_events(struct perf_kvm_stat *kvm)
>>   	struct perf_tool eops = {
>>   		.sample			= process_sample_event,
>>   		.comm			= perf_event__process_comm,
>> +		.namespaces		= perf_event__process_namespaces,
>>   		.ordered_events		= true,
>>   	};
>>   	struct perf_data_file file = {
>> @@ -1348,6 +1349,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
>>   	kvm->tool.exit   = perf_event__process_exit;
>>   	kvm->tool.fork   = perf_event__process_fork;
>>   	kvm->tool.lost   = process_lost_event;
>> +	kvm->tool.namespaces  = perf_event__process_namespaces;
>>   	kvm->tool.ordered_events = true;
>>   	perf_tool__fill_defaults(&kvm->tool);
>>   
>> diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
>> index ce3bfb4..d750cca 100644
>> --- a/tools/perf/builtin-lock.c
>> +++ b/tools/perf/builtin-lock.c
>> @@ -858,6 +858,7 @@ static int __cmd_report(bool display_info)
>>   	struct perf_tool eops = {
>>   		.sample		 = process_sample_event,
>>   		.comm		 = perf_event__process_comm,
>> +		.namespaces	 = perf_event__process_namespaces,
>>   		.ordered_events	 = true,
>>   	};
>>   	struct perf_data_file file = {
>> diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
>> index cd7bc4d..430656c 100644
>> --- a/tools/perf/builtin-mem.c
>> +++ b/tools/perf/builtin-mem.c
>> @@ -342,6 +342,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused)
>>   			.lost		= perf_event__process_lost,
>>   			.fork		= perf_event__process_fork,
>>   			.build_id	= perf_event__process_build_id,
>> +			.namespaces	= perf_event__process_namespaces,
>>   			.ordered_events	= true,
>>   		},
>>   		.input_name		 = "perf.data",
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index 6cd6776..a8b9a78 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -876,6 +876,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>   	signal(SIGTERM, sig_handler);
>>   	signal(SIGSEGV, sigsegv_handler);
>>   
>> +	if (rec->opts.record_namespaces)
>> +		tool->namespace_events = true;
>> +
>>   	if (rec->opts.auxtrace_snapshot_mode || rec->switch_output.enabled) {
>>   		signal(SIGUSR2, snapshot_sig_handler);
>>   		if (rec->opts.auxtrace_snapshot_mode)
>> @@ -1497,6 +1500,7 @@ static struct record record = {
>>   		.fork		= perf_event__process_fork,
>>   		.exit		= perf_event__process_exit,
>>   		.comm		= perf_event__process_comm,
>> +		.namespaces	= perf_event__process_namespaces,
>>   		.mmap		= perf_event__process_mmap,
>>   		.mmap2		= perf_event__process_mmap2,
>>   		.ordered_events	= true,
>> @@ -1611,6 +1615,8 @@ static struct option __record_options[] = {
>>   			  "opts", "AUX area tracing Snapshot Mode", ""),
>>   	OPT_UINTEGER(0, "proc-map-timeout", &record.opts.proc_map_timeout,
>>   			"per thread proc mmap processing timeout in ms"),
>> +	OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces,
>> +		    "Record namespaces events"),
>>   	OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
>>   		    "Record context switch events"),
>>   	OPT_BOOLEAN_FLAG(0, "all-kernel", &record.opts.all_kernel,
>> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
>> index dbd7fa0..5c92c75 100644
>> --- a/tools/perf/builtin-report.c
>> +++ b/tools/perf/builtin-report.c
>> @@ -694,6 +694,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
>>   			.mmap		 = perf_event__process_mmap,
>>   			.mmap2		 = perf_event__process_mmap2,
>>   			.comm		 = perf_event__process_comm,
>> +			.namespaces	 = perf_event__process_namespaces,
>>   			.exit		 = perf_event__process_exit,
>>   			.fork		 = perf_event__process_fork,
>>   			.lost		 = perf_event__process_lost,
>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
>> index 270eb2d..e0ddd04 100644
>> --- a/tools/perf/builtin-sched.c
>> +++ b/tools/perf/builtin-sched.c
>> @@ -3272,6 +3272,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
>>   		.tool = {
>>   			.sample		 = perf_sched__process_tracepoint_sample,
>>   			.comm		 = perf_event__process_comm,
>> +			.namespaces	 = perf_event__process_namespaces,
>>   			.lost		 = perf_event__process_lost,
>>   			.fork		 = perf_sched__process_fork_event,
>>   			.ordered_events = true,
>> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
>> index c0783b4..f1ce806 100644
>> --- a/tools/perf/builtin-script.c
>> +++ b/tools/perf/builtin-script.c
>> @@ -2097,6 +2097,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
>>   			.mmap		 = perf_event__process_mmap,
>>   			.mmap2		 = perf_event__process_mmap2,
>>   			.comm		 = perf_event__process_comm,
>> +			.namespaces	 = perf_event__process_namespaces,
>>   			.exit		 = perf_event__process_exit,
>>   			.fork		 = perf_event__process_fork,
>>   			.attr		 = process_attr,
>> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
>> index 40ef9b2..0bcd32f 100644
>> --- a/tools/perf/builtin-trace.c
>> +++ b/tools/perf/builtin-trace.c
>> @@ -2415,8 +2415,9 @@ static int trace__replay(struct trace *trace)
>>   	trace->tool.exit	  = perf_event__process_exit;
>>   	trace->tool.fork	  = perf_event__process_fork;
>>   	trace->tool.attr	  = perf_event__process_attr;
>> -	trace->tool.tracing_data = perf_event__process_tracing_data;
>> +	trace->tool.tracing_data  = perf_event__process_tracing_data;
>>   	trace->tool.build_id	  = perf_event__process_build_id;
>> +	trace->tool.namespaces	  = perf_event__process_namespaces;
>>   
>>   	trace->tool.ordered_events = true;
>>   	trace->tool.ordering_requires_timestamps = true;
>> diff --git a/tools/perf/perf.h b/tools/perf/perf.h
>> index 1c27d94..806c216 100644
>> --- a/tools/perf/perf.h
>> +++ b/tools/perf/perf.h
>> @@ -50,6 +50,7 @@ struct record_opts {
>>   	bool	     running_time;
>>   	bool	     full_auxtrace;
>>   	bool	     auxtrace_snapshot_mode;
>> +	bool	     record_namespaces;
>>   	bool	     record_switch_events;
>>   	bool	     all_kernel;
>>   	bool	     all_user;
>> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
>> index 5da376b..2ea5ee1 100644
>> --- a/tools/perf/util/Build
>> +++ b/tools/perf/util/Build
>> @@ -42,6 +42,7 @@ libperf-y += pstack.o
>>   libperf-y += session.o
>>   libperf-$(CONFIG_AUDIT) += syscalltbl.o
>>   libperf-y += ordered-events.o
>> +libperf-y += namespaces.o
>>   libperf-y += comm.o
>>   libperf-y += thread.o
>>   libperf-y += thread_map.o
>> diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
>> index 4e6cbc9..89ece24 100644
>> --- a/tools/perf/util/data-convert-bt.c
>> +++ b/tools/perf/util/data-convert-bt.c
>> @@ -1468,6 +1468,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,
>>   			.lost            = perf_event__process_lost,
>>   			.tracing_data    = perf_event__process_tracing_data,
>>   			.build_id        = perf_event__process_build_id,
>> +			.namespaces      = perf_event__process_namespaces,
>>   			.ordered_events  = true,
>>   			.ordering_requires_timestamps = true,
>>   		},
>> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
>> index 4ea7ce7..f118eac 100644
>> --- a/tools/perf/util/event.c
>> +++ b/tools/perf/util/event.c
>> @@ -31,6 +31,7 @@ static const char *perf_event__names[] = {
>>   	[PERF_RECORD_LOST_SAMPLES]		= "LOST_SAMPLES",
>>   	[PERF_RECORD_SWITCH]			= "SWITCH",
>>   	[PERF_RECORD_SWITCH_CPU_WIDE]		= "SWITCH_CPU_WIDE",
>> +	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
>>   	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
>>   	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
>>   	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
>> @@ -1016,6 +1017,14 @@ int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
>>   	return machine__process_comm_event(machine, event, sample);
>>   }
>>   
>> +int perf_event__process_namespaces(struct perf_tool *tool __maybe_unused,
>> +				   union perf_event *event,
>> +				   struct perf_sample *sample,
>> +				   struct machine *machine)
>> +{
>> +	return machine__process_namespaces_event(machine, event, sample);
>> +}
>> +
>>   int perf_event__process_lost(struct perf_tool *tool __maybe_unused,
>>   			     union perf_event *event,
>>   			     struct perf_sample *sample,
>> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
>> index c735c53..4e90b09 100644
>> --- a/tools/perf/util/event.h
>> +++ b/tools/perf/util/event.h
>> @@ -39,6 +39,15 @@ struct comm_event {
>>   	char comm[16];
>>   };
>>   
>> +#define NAMESPACES_MAX			12
> Why have this limitation, does the kernel have it as well? Just read the

Right. A max limit here seems unnecessary. Will get rid of this.

> header, then allocate enough space, this way you don't need to have
> those checks about tools being incompatible with a kernel that supports
> more namespaces than the tool.
> And then how can this take place, i.e. are you truncating the extra
> namespaces coming from the kernel but continuing anyway, just after
> warning the user once about it?

Currently, NR_NAMESPACES is 7 & NAMESPACES_MAX is 12. I don't think we
would go beyond 12 namespaces in the kernel anytime soon. The magic
number 12 is based on that assumption. So, truncating the extra
namespaces coming from the kernel wasn't something I was concerned about.
I will get rid of the NAMESPACES_MAX magic number as you suggested, thus
not having to rely on the assumption that the kernel may never have more
than 12 namespaces.

But I think there is still a need to warn the user about a different
NR_NAMESPACES value in kernel and perf, because the warning was aimed at
a scenario where NR_NAMESPACES in the kernel is greater/less than the
NR_NAMESPACES value in perf.
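
A rough sketch of such a warn-once check, only to illustrate the idea.
The helper name warn_ns_count_mismatch_once() and the TOOL_NR_NAMESPACES
macro are hypothetical; the value 7 is the NR_NAMESPACES from the patch,
and the message wording is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Namespaces this build of the tool knows about (7 in the quoted patch). */
#define TOOL_NR_NAMESPACES 7

/*
 * Hypothetical helper: warn on the first event whose namespace count
 * differs from the tool's, then stay quiet for the rest of the run.
 * Returns true only when a warning was actually printed.
 */
static bool warn_ns_count_mismatch_once(uint64_t nr_from_event, FILE *out)
{
	static bool warned;

	if (warned || nr_from_event == TOOL_NR_NAMESPACES)
		return false;

	fprintf(out, "kernel reports %llu namespaces, perf knows %d; "
		"unknown or missing entries cannot be shown by name\n",
		(unsigned long long)nr_from_event, TOOL_NR_NAMESPACES);
	warned = true;
	return true;
}
```

Keeping the state in a function-local static keeps the sketch short; a
real implementation would more likely keep it per session.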

> Wouldn't this message get lost in the logs and the user be left
> wondering why the namespaces they expect to be there have vanished?

Hmmm.. Probably, while running perf-report or perf-script but not when
data is captured with perf-record. And that should be enough for the
user to take a call on where to post process or discard the data?

>> +	if (thread == NULL ||
>> +	    thread__set_namespaces(thread, sample->time, &event->namespaces)) {
>> +		dump_printf("problem processing PERF_RECORD_NAMESPACES, skipping event.\n");
>> +		err = -1;
>> +	}
>> +
>> +	thread__put(thread);
>> +
>> +	return err;
>> +}
>> +
>>   int machine__process_lost_event(struct machine *machine __maybe_unused,
>>   				union perf_event *event, struct perf_sample *sample __maybe_unused)
>>   {
>> @@ -1538,6 +1567,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
>>   		ret = machine__process_comm_event(machine, event, sample); break;
>>   	case PERF_RECORD_MMAP:
>>   		ret = machine__process_mmap_event(machine, event, sample); break;
>> +	case PERF_RECORD_NAMESPACES:
>> +		ret = machine__process_namespaces_event(machine, event, sample); break;
>>   	case PERF_RECORD_MMAP2:
>>   		ret = machine__process_mmap2_event(machine, event, sample); break;
>>   	case PERF_RECORD_FORK:
>> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
>> index a283050..3cdb134 100644
>> --- a/tools/perf/util/machine.h
>> +++ b/tools/perf/util/machine.h
>> @@ -97,6 +97,9 @@ int machine__process_itrace_start_event(struct machine *machine,
>>   					union perf_event *event);
>>   int machine__process_switch_event(struct machine *machine,
>>   				  union perf_event *event);
>> +int machine__process_namespaces_event(struct machine *machine,
>> +				      union perf_event *event,
>> +				      struct perf_sample *sample);
>>   int machine__process_mmap_event(struct machine *machine, union perf_event *event,
>>   				struct perf_sample *sample);
>>   int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
>> diff --git a/tools/perf/util/namespaces.c b/tools/perf/util/namespaces.c
>> new file mode 100644
>> index 0000000..3134c00
>> --- /dev/null
>> +++ b/tools/perf/util/namespaces.c
>> @@ -0,0 +1,35 @@
>> +/*
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License, version 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * Copyright (C) 2017 Hari Bathini, IBM Corporation
>> + */
>> +
>> +#include "namespaces.h"
>> +#include "util.h"
>> +#include "event.h"
>> +#include <stdlib.h>
>> +#include <stdio.h>
>> +
>> +struct namespaces *namespaces__new(struct namespaces_event *event)
>> +{
>> +	struct namespaces *namespaces = zalloc(sizeof(*namespaces));
>> +
>> +	if (!namespaces)
>> +		return NULL;
>> +
>> +	namespaces->end_time = -1;
>> +
>> +	if (event) {
>> +		memcpy(namespaces->link_info, event->link_info,
>> +		       sizeof(namespaces->link_info));
>> +	}
> Please allocate just what came from the kernel, be it less or more than
> that magic number (12).

>> +struct namespaces_event;
>> +
>> +struct namespaces {
>> +	struct list_head list;
>> +	u64 end_time;
>> +	struct perf_ns_link_info link_info[NR_NAMESPACES];
> Here you could have it as a zero sized array and allocate it according
> to the number of namespaces that came from the kernel

Sure. Will change that.
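
A minimal sketch of that direction, assuming a flexible array member
sized from the nr_namespaces field of the event. The names struct
ns_link_info, struct namespaces_sketch and namespaces_sketch__new() are
hypothetical stand-ins, not the code that ends up in perf:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct perf_ns_link_info from the quoted patch. */
struct ns_link_info {
	uint64_t dev;
	uint64_t ino;
};

/* Hypothetical variant of struct namespaces with a flexible array member. */
struct namespaces_sketch {
	uint64_t end_time;
	uint64_t nr_namespaces;
	struct ns_link_info link_info[];
};

/* Allocate exactly as many entries as the event carried. */
static struct namespaces_sketch *
namespaces_sketch__new(const struct ns_link_info *info, uint64_t nr)
{
	struct namespaces_sketch *ns;

	/* Reject counts whose allocation size would overflow size_t. */
	if (nr > (SIZE_MAX - sizeof(*ns)) / sizeof(ns->link_info[0]))
		return NULL;

	ns = calloc(1, sizeof(*ns) + nr * sizeof(ns->link_info[0]));
	if (!ns)
		return NULL;

	ns->end_time = UINT64_MAX;	/* mirrors end_time = -1 in the patch */
	ns->nr_namespaces = nr;
	if (info && nr)
		memcpy(ns->link_info, info, nr * sizeof(ns->link_info[0]));

	return ns;
}
```

With the count stored next to the array, a reader can bound its loops by
nr_namespaces instead of a compile-time maximum.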

Thanks
Hari


* Re: [PATCH v7 4/8] perf tool: synthesize namespace events for current processes
  2017-03-01 21:05   ` Arnaldo Carvalho de Melo
@ 2017-03-03  8:57     ` Hari Bathini
  0 siblings, 0 replies; 24+ messages in thread
From: Hari Bathini @ 2017-03-03  8:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: ast, peterz, lkml, alexander.shishkin, mingo, daniel, rostedt,
	Ananth N Mavinakayanahalli, ebiederm, sargun, Aravinda Prasad,
	brendan.d.gregg, jolsa



On Thursday 02 March 2017 02:35 AM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Feb 21, 2017 at 07:31:44PM +0530, Hari Bathini escreveu:
>> Synthesize PERF_RECORD_NAMESPACES events for processes that were
>> running prior to invocation of perf record, the data for which is
>> taken from /proc/$PID/ns. These changes make way for analyzing
>> events with regard to namespaces.
>>
>> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
>> ---
>>   tools/perf/builtin-record.c |   27 +++++++++--
>>   tools/perf/util/event.c     |  107 +++++++++++++++++++++++++++++++++++++++++--
>>   tools/perf/util/event.h     |    6 ++
>>   3 files changed, 130 insertions(+), 10 deletions(-)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index a8b9a78..f4bf6a6 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -986,6 +986,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>   	 */
>>   	if (forks) {
>>   		union perf_event *event;
>> +		pid_t tgid;
>>   
>>   		event = malloc(sizeof(event->comm) + machine->id_hdr_size);
>>   		if (event == NULL) {
>> @@ -999,10 +1000,28 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>   		 * cannot see a correct process name for those events.
>>   		 * Synthesize COMM event to prevent it.
>>   		 */
>> -		perf_event__synthesize_comm(tool, event,
>> -					    rec->evlist->workload.pid,
>> -					    process_synthesized_event,
>> -					    machine);
>> +		tgid = perf_event__synthesize_comm(tool, event,
>> +						   rec->evlist->workload.pid,
>> +						   process_synthesized_event,
>> +						   machine);
>> +		free(event);
>> +
>> +		if (tgid == -1)
>> +			goto out_child;
>> +
>> +		event = malloc(sizeof(event->namespaces) + machine->id_hdr_size);
>> +		if (event == NULL) {
>> +			err = -ENOMEM;
>> +			goto out_child;
>> +		}
>> +
>> +		/*
>> +		 * Synthesize NAMESPACES event for the command specified.
>> +		 */
>> +		perf_event__synthesize_namespaces(tool, event,
>> +						  rec->evlist->workload.pid,
>> +						  tgid, process_synthesized_event,
>> +						  machine);
>>   		free(event);
>>   
>>   		perf_evlist__start_workload(rec->evlist);
>> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
>> index f118eac..c8c112a 100644
>> --- a/tools/perf/util/event.c
>> +++ b/tools/perf/util/event.c
>> @@ -50,6 +50,16 @@ static const char *perf_event__names[] = {
>>   	[PERF_RECORD_TIME_CONV]			= "TIME_CONV",
>>   };
>>   
>> +static const char *perf_ns__names[] = {
>> +	[NET_NS_INDEX]		= "net",
>> +	[UTS_NS_INDEX]		= "uts",
>> +	[IPC_NS_INDEX]		= "ipc",
>> +	[PID_NS_INDEX]		= "pid",
>> +	[USER_NS_INDEX]		= "user",
>> +	[MNT_NS_INDEX]		= "mnt",
>> +	[CGROUP_NS_INDEX]	= "cgroup",
>> +};
>> +
>>   const char *perf_event__name(unsigned int id)
>>   {
>>   	if (id >= ARRAY_SIZE(perf_event__names))
>> @@ -59,6 +69,13 @@ const char *perf_event__name(unsigned int id)
>>   	return perf_event__names[id];
>>   }
>>   
>> +static const char *perf_ns__name(unsigned int id)
>> +{
>> +	if (id >= ARRAY_SIZE(perf_ns__names))
>> +		return "UNKNOWN";
>> +	return perf_ns__names[id];
>> +}
>> +
>>   static int perf_tool__process_synth_event(struct perf_tool *tool,
>>   					  union perf_event *event,
>>   					  struct machine *machine,
>> @@ -204,6 +221,56 @@ pid_t perf_event__synthesize_comm(struct perf_tool *tool,
>>   	return tgid;
>>   }
>>   
>> +static void perf_event__get_ns_link_info(pid_t pid, const char *ns,
>> +					 struct perf_ns_link_info *ns_link_info)
>> +{
>> +	struct stat64 st;
>> +	char proc_ns[128];
>> +
>> +	sprintf(proc_ns, "/proc/%u/ns/%s", pid, ns);
>> +	if (stat64(proc_ns, &st) == 0) {
>> +		ns_link_info->dev = st.st_dev;
>> +		ns_link_info->ino = st.st_ino;
>> +	}
>> +}
>> +
>> +int perf_event__synthesize_namespaces(struct perf_tool *tool,
>> +				      union perf_event *event,
>> +				      pid_t pid, pid_t tgid,
>> +				      perf_event__handler_t process,
>> +				      struct machine *machine)
>> +{
>> +	u32 idx;
>> +	struct perf_ns_link_info *ns_link_info;
>> +
>> +	if (!tool->namespace_events)
>> +		return 0;
>> +
>> +	memset(&event->namespaces, 0,
>> +	       sizeof(event->namespaces) + machine->id_hdr_size);
>> +
>> +	event->namespaces.pid = tgid;
>> +	event->namespaces.tid = pid;
>> +
>> +	event->namespaces.nr_namespaces = NR_NAMESPACES;
> Huh? Don't you have to first figure out how many namespaces a process is
> in to then set this field?
>

NR_NAMESPACES is the total number of namespaces. For synthesized namespace
events, the data is obtained from the /proc/<pid>/ns/ directory. Looking at
this directory alone, it is difficult to arrive at the total number of
namespaces in the kernel, as some namespaces can be compiled in or out of
the kernel. NR_NAMESPACES is used instead. Its value is most likely the
same for the kernel and the perf tool unless a new namespace is introduced;
the warning in the previous patch is intended to alert the user about such
a scenario.
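
To illustrate the lookup described above, here is a minimal sketch of
reading the device and inode numbers from /proc/<pid>/ns/. It is not the
code from the patch: struct ns_link_info, read_ns_link_info() and
print_own_namespaces() are hypothetical names used only for this example,
and the fixed-size name table stands in for NR_NAMESPACES. Links that the
running kernel does not expose simply leave their slot zeroed.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the patch's struct perf_ns_link_info. */
struct ns_link_info {
	unsigned long long dev;
	unsigned long long ino;
};

/* Names as listed in perf_ns__names[]; the order here is illustrative. */
static const char *const ns_names[] = {
	"net", "uts", "ipc", "pid", "user", "mnt", "cgroup",
};

#define NS_COUNT (sizeof(ns_names) / sizeof(ns_names[0]))

/*
 * Stat each /proc/<pid>/ns/<name> link and record its device and inode.
 * Slots for namespaces the kernel does not expose stay zeroed.
 * Returns the number of links that could be read.
 */
static int read_ns_link_info(pid_t pid, struct ns_link_info info[])
{
	int found = 0;

	for (size_t i = 0; i < NS_COUNT; i++) {
		char path[64];
		struct stat st;

		info[i].dev = 0;
		info[i].ino = 0;

		snprintf(path, sizeof(path), "/proc/%d/ns/%s",
			 (int)pid, ns_names[i]);
		if (stat(path, &st) == 0) {
			info[i].dev = (unsigned long long)st.st_dev;
			info[i].ino = (unsigned long long)st.st_ino;
			found++;
		}
	}
	return found;
}

/* Print the namespace identifiers of the calling process. */
static void print_own_namespaces(void)
{
	struct ns_link_info info[NS_COUNT];
	int found = read_ns_link_info(getpid(), info);

	for (size_t i = 0; i < NS_COUNT; i++)
		printf("%-6s dev=%llu ino=%llu\n",
		       ns_names[i], info[i].dev, info[i].ino);
	printf("%d of %zu namespace links readable\n",
	       found, (size_t)NS_COUNT);
}
```

The count returned by read_ns_link_info() can be lower than NS_COUNT on a
kernel built without some namespaces, which is why the table size, rather
than the directory contents, fixes the number of slots in this sketch.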

Thanks
Hari


* Re: [PATCH v7 8/8] perf tool: add cgroup identifier entry in perf report
  2017-03-01 21:16   ` Arnaldo Carvalho de Melo
@ 2017-03-03  8:59     ` Hari Bathini
  0 siblings, 0 replies; 24+ messages in thread
From: Hari Bathini @ 2017-03-03  8:59 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: ast, peterz, lkml, alexander.shishkin, Ingo Molnar, daniel,
	rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg, jolsa



On Thursday 02 March 2017 02:46 AM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Feb 21, 2017 at 07:33:13PM +0530, Hari Bathini escreveu:
>> This patch introduces a cgroup identifier entry field in perf report to
>> identify or distinguish data of different cgroups. It uses the device
>> number and inode number of cgroup namespace, included in perf data with
>> the new PERF_RECORD_NAMESPACES event, as cgroup identifier. With the
>> assumption that each container is created with its own cgroup namespace,
>> this allows assessment/analysis of multiple containers at once.
> Could you try to do this with some real world example? I.e. telling that
> systemd creates some cgroups and then how to map the perf report output
> you show below with what systemd puts in place.
>
> Doing it with docker would be handy as well, no?

Cgroup namespace is a relatively new thing in the kernel (v4.9).
So, I am not sure if there is a real world example for this yet.
A simple test in the meanwhile would be to clone() multiple
processes, passing the CLONE_NEWCGROUP and SIGCHLD flags for
each one, executing a shell in each and running a different
workload on each of them while tracing.
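
A rough sketch of such a test helper, assuming a Linux system with glibc:
it starts one program in a new cgroup namespace via clone() with
CLONE_NEWCGROUP | SIGCHLD and waits for it. The names child_main(),
spawn_in_new_cgroup_ns() and run_in_new_cgroup_ns() are made up for this
example. Creating a cgroup namespace needs CAP_SYS_ADMIN, so the call is
expected to fail with EPERM for an unprivileged user.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000	/* value from the kernel's uapi sched.h */
#endif

#define CHILD_STACK_SIZE (1024 * 1024)

/* Runs in the child: replace it with the requested program. */
static int child_main(void *arg)
{
	char *const *argv = arg;

	execvp(argv[0], argv);
	_exit(127);	/* only reached if execvp() failed */
}

/*
 * Start argv[0] in a new cgroup namespace.
 * Returns the child's pid, or -1 with errno set (typically EPERM
 * without CAP_SYS_ADMIN, EINVAL on kernels lacking cgroup namespaces).
 */
static pid_t spawn_in_new_cgroup_ns(char *const argv[])
{
	char *stack = malloc(CHILD_STACK_SIZE);
	int saved_errno;
	pid_t pid;

	if (stack == NULL)
		return -1;

	/* The stack grows down on most architectures, so pass its top. */
	pid = clone(child_main, stack + CHILD_STACK_SIZE,
		    CLONE_NEWCGROUP | SIGCHLD, (void *)argv);
	saved_errno = errno;

	/* Without CLONE_VM the child has its own copy of the stack. */
	free(stack);
	errno = saved_errno;
	return pid;
}

/* Spawn argv[0] as above and wait; returns its exit status or -1. */
static int run_in_new_cgroup_ns(char *const argv[])
{
	int status;
	pid_t pid = spawn_in_new_cgroup_ns(argv);

	if (pid == -1 || waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_in_new_cgroup_ns() several times with different shell
commands, while tracing system-wide, would give each workload its own
cgroup namespace inode to tell them apart in the report.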

> You also forgot to update the documentation with this new sort key,
> please add it in the --sort part of
> tools/perf/Documentation/perf-report.txt.

Sure. Will update.

> Ah, and just another minor request: please format the changeset
> summaries as:
>
>   perf tools: Add cgroup identifier sort order keyword
>
>
> First the component, that in this case is the generic one, as it affects
> multiple tools (top, report), and then start with a capital letter after
> the colon.
>

Sorry. I will take care of this in the next spin.

Thanks
Hari


end of thread, other threads:[~2017-03-03 11:25 UTC | newest]

Thread overview: 24+ messages
2017-02-21 14:01 [PATCH v7 0/8] perf: add support for analyzing events for containers Hari Bathini
2017-02-21 14:01 ` [PATCH v7 1/8] perf: add PERF_RECORD_NAMESPACES to include namespaces related info Hari Bathini
2017-02-24 12:14   ` Peter Zijlstra
2017-03-01 20:45     ` Arnaldo Carvalho de Melo
2017-02-21 14:01 ` [PATCH v7 2/8] perf tool: " Hari Bathini
2017-03-01 21:02   ` Arnaldo Carvalho de Melo
2017-03-03  8:54     ` Hari Bathini
2017-02-21 14:01 ` [PATCH v7 3/8] perf tool: update about the new option to record namespace events Hari Bathini
2017-03-01 21:03   ` Arnaldo Carvalho de Melo
2017-02-21 14:01 ` [PATCH v7 4/8] perf tool: synthesize namespace events for current processes Hari Bathini
2017-03-01 21:05   ` Arnaldo Carvalho de Melo
2017-03-03  8:57     ` Hari Bathini
2017-02-21 14:01 ` [PATCH v7 5/8] perf tool: add print support for namespace events Hari Bathini
2017-03-01 21:06   ` Arnaldo Carvalho de Melo
2017-02-21 14:02 ` [PATCH v7 6/8] perf tool: add script " Hari Bathini
2017-03-01 21:08   ` Arnaldo Carvalho de Melo
2017-02-21 14:02 ` [PATCH v7 7/8] perf tool: update about the new option to show " Hari Bathini
2017-02-21 14:03 ` [PATCH v7 8/8] perf tool: add cgroup identifier entry in perf report Hari Bathini
2017-02-22 16:48   ` Jiri Olsa
2017-03-01 21:16   ` Arnaldo Carvalho de Melo
2017-03-03  8:59     ` Hari Bathini
2017-02-22 11:11 ` [PATCH v7 0/8] perf: add support for analyzing events for containers Jiri Olsa
2017-02-22 12:40   ` Hari Bathini
2017-02-22 13:52     ` Jiri Olsa
