* [PATCH v2 1/2] perf: add container identifier entry in perf sample data
@ 2016-08-30 16:27 Hari Bathini
  2016-08-30 16:27 ` [PATCH v2 2/2] perf tool: add container identifier entry related changes Hari Bathini
  2016-09-01  9:09 ` [PATCH v2 1/2] perf: add container identifier entry in perf sample data Peter Zijlstra
  0 siblings, 2 replies; 6+ messages in thread
From: Hari Bathini @ 2016-08-30 16:27 UTC (permalink / raw)
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg

Currently, there is no mechanism to filter events based on containers.
perf -G can be used, but it will not filter events for containers
created after perf is invoked, making it difficult to analyze
performance issues of multiple containers at once. This limitation can
be overcome if there is a standard kernel identifier for containers.

This patch introduces a container identifier entry field in perf sample
data to distinguish the sample data of different containers. It uses
the cgroup namespace inode number of a given task as its container
identifier (cid). Alternatively, the inode number of the pid namespace
could be used as the cid. This patch assumes each container is created
with its own cgroup namespace.
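
For illustration only (not part of the patch): the same inode number can
be observed from userspace by calling stat() on /proc/<pid>/ns/cgroup.
The sketch below assumes a kernel with cgroup namespace support and a
mounted /proc; the helper name read_cgroup_ns_inum() is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: look up the inode number behind
 * /proc/<pid>/ns/cgroup. Returns 0 and stores the inode in *cid on
 * success, -1 on failure (no such task, no cgroup namespace support,
 * or insufficient permission). *cid is left untouched on failure.
 */
static int read_cgroup_ns_inum(pid_t pid, uint64_t *cid)
{
	char path[64];
	struct stat st;
	int n;

	n = snprintf(path, sizeof(path), "/proc/%ld/ns/cgroup", (long)pid);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	/* stat() follows the namespace link and reports its inode. */
	if (stat(path, &st) != 0)
		return -1;

	*cid = (uint64_t)st.st_ino;
	return 0;
}
```

Calling read_cgroup_ns_inum(getpid(), &cid) from inside two different
containers should yield two different values, provided each container
was created with its own cgroup namespace as assumed above.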

Suggested-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---

Changes from v1:
  1. Updated PERF_RECORD_SAMPLE comment.
  2. Fixed compile issue with CONFIG_CGROUPS=n

Will post the manpage update once this gets in.


 include/linux/perf_event.h      |    4 ++++
 include/uapi/linux/perf_event.h |    4 +++-
 kernel/events/core.c            |   23 +++++++++++++++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2b6b43c..4d553ee 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -908,6 +908,10 @@ struct perf_sample_data {
 
 	struct perf_regs		regs_intr;
 	u64				stack_user_size;
+	struct {
+		u32	cid;
+		u32	reserved;
+	}				cid_entry;
 } ____cacheline_aligned;
 
 /* default value for data source */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485..826b799 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -139,8 +139,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_IDENTIFIER			= 1U << 16,
 	PERF_SAMPLE_TRANSACTION			= 1U << 17,
 	PERF_SAMPLE_REGS_INTR			= 1U << 18,
+	PERF_SAMPLE_CID				= 1U << 19,
 
-	PERF_SAMPLE_MAX = 1U << 19,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 20,		/* non-ABI */
 };
 
 /*
@@ -773,6 +774,7 @@ enum perf_event_type {
 	 *	{ u64			transaction; } && PERF_SAMPLE_TRANSACTION
 	 *	{ u64			abi; # enum perf_sample_regs_abi
 	 *	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
+	 *	{ u32			cid, res; } && PERF_SAMPLE_CID
 	 * };
 	 */
 	PERF_RECORD_SAMPLE			= 9,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3cfabdf..465febd 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5776,6 +5776,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 		}
 	}
 
+	if (sample_type & PERF_SAMPLE_CID)
+		perf_output_put(handle, data->cid_entry);
+
 	if (!event->attr.watermark) {
 		int wakeup_events = event->attr.wakeup_events;
 
@@ -5909,6 +5912,26 @@ void perf_prepare_sample(struct perf_event_header *header,
 
 		header->size += size;
 	}
+
+	if (sample_type & PERF_SAMPLE_CID) {
+		int size = sizeof(u64);
+
+		/* Container identifier for a given task */
+#ifdef CONFIG_CGROUPS
+		/*
+		 * Use the task's cgroup namespace inode number.
+		 */
+		data->cid_entry.cid = current->nsproxy->cgroup_ns->ns.inum;
+#else
+		/*
+		 * If cgroup namespace is not enabled,
+		 * all tasks have the same cid.
+		 */
+		data->cid_entry.cid = 0xffffffffUL;
+#endif
+		data->cid_entry.reserved = 0;
+		header->size += size;
+	}
 }
 
 static void __always_inline


* [PATCH v2 2/2] perf tool: add container identifier entry related changes
  2016-08-30 16:27 [PATCH v2 1/2] perf: add container identifier entry in perf sample data Hari Bathini
@ 2016-08-30 16:27 ` Hari Bathini
  2016-09-01  9:09 ` [PATCH v2 1/2] perf: add container identifier entry in perf sample data Peter Zijlstra
  1 sibling, 0 replies; 6+ messages in thread
From: Hari Bathini @ 2016-08-30 16:27 UTC (permalink / raw)
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg

With the introduction of a container identifier entry in sample data,
perf sample data can now be analyzed per container. This patch adds
cid entry support to the perf tool.

Shown below is the output of perf report, sorted by cid, on a system
that was running three containers at the time of perf record. It
clearly shows one container's considerable use of kernel memory
compared with the others:

	$ perf report -s cid -n --stdio
	#
	# Total Lost Samples: 0
	#
	# Samples: 2K of event 'kmem:kmalloc'
	# Event count (approx.): 2171
	#
	# Overhead       Samples  Container ID
	# ........  ............  .............
	#
	    91.20%          1980  4026532048
	     3.55%            77  4026532105
	     2.67%            58  4026532162
	     2.58%            56  4026531835
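
As a cross-check of the table above: the four sample counts add up to
2171, which equals the reported event count, so each Overhead value is
consistent with samples / total. The sketch below assumes every sample
carries a period of 1; the helper name overhead_centipercent() is made
up for this illustration and is not a perf function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: share of `samples` in `total`, in hundredths of
 * a percent, rounded to nearest (9120 means 91.20%). Returns 0 when
 * total is 0 to avoid dividing by zero.
 */
static uint64_t overhead_centipercent(uint64_t samples, uint64_t total)
{
	if (total == 0)
		return 0;
	return (samples * 10000 + total / 2) / total;
}
```

With total = 1980 + 77 + 58 + 56 = 2171, the helper gives 9120, 355,
267 and 258 for the four rows, matching the 91.20%, 3.55%, 2.67% and
2.58% shown in the report.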

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---

Changes from v1:
  Added "--sample-cid" option in perf-record to optionally
  enable PERF_SAMPLE_CID


 tools/include/uapi/linux/perf_event.h    |    4 +++
 tools/perf/Documentation/perf-report.txt |    3 ++-
 tools/perf/Documentation/perf-script.txt |    4 ++-
 tools/perf/builtin-record.c              |    1 +
 tools/perf/builtin-script.c              |   14 ++++++++++--
 tools/perf/perf.h                        |    1 +
 tools/perf/util/event.h                  |    1 +
 tools/perf/util/evsel.c                  |   35 ++++++++++++++++++++++++++++++
 tools/perf/util/hist.c                   |    2 ++
 tools/perf/util/hist.h                   |    1 +
 tools/perf/util/session.c                |    3 +++
 tools/perf/util/sort.c                   |   22 +++++++++++++++++++
 tools/perf/util/sort.h                   |    2 ++
 13 files changed, 87 insertions(+), 6 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index c66a485..826b799 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -139,8 +139,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_IDENTIFIER			= 1U << 16,
 	PERF_SAMPLE_TRANSACTION			= 1U << 17,
 	PERF_SAMPLE_REGS_INTR			= 1U << 18,
+	PERF_SAMPLE_CID				= 1U << 19,
 
-	PERF_SAMPLE_MAX = 1U << 19,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 20,		/* non-ABI */
 };
 
 /*
@@ -773,6 +774,7 @@ enum perf_event_type {
 	 *	{ u64			transaction; } && PERF_SAMPLE_TRANSACTION
 	 *	{ u64			abi; # enum perf_sample_regs_abi
 	 *	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
+	 *	{ u32			cid, res; } && PERF_SAMPLE_CID
 	 * };
 	 */
 	PERF_RECORD_SAMPLE			= 9,
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 2d17462..b081aef 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -68,12 +68,13 @@ OPTIONS
 --sort=::
 	Sort histogram entries by given key(s) - multiple keys can be specified
 	in CSV format.  Following sort keys are available:
-	pid, comm, dso, symbol, parent, cpu, socket, srcline, weight, local_weight.
+	pid, comm, cid, dso, symbol, parent, cpu, socket, srcline, weight, local_weight.
 
 	Each key has following meaning:
 
 	- comm: command (name) of the task which can be read via /proc/<pid>/comm
 	- pid: command and tid of the task
+	- cid: container id of the task
 	- dso: name of library or module executed at the time of sample
 	- symbol: name of function executed at the time of sample
 	- parent: name of function matched to the parent regex filter. Unmatched
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 053bbbd..f87ebe7 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -115,8 +115,8 @@ OPTIONS
 -F::
 --fields::
         Comma separated list of fields to print. Options are:
-        comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
-        srcline, period, iregs, brstack, brstacksym, flags, bpf-output,
+        comm, tid, pid, cid, time, cpu, event, trace, ip, sym, dso, addr,
+        symoff, srcline, period, iregs, brstack, brstacksym, flags, bpf-output,
         callindent. Field list can be prepended with the type, trace, sw or hw,
         to indicate to which event type the field list applies.
         e.g., -F sw:comm,tid,time,ip,sym  and -F trace:time,cpu,trace
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 6355902..3f6172e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1466,6 +1466,7 @@ struct option __record_options[] = {
 		    "sample by weight (on special events only)"),
 	OPT_BOOLEAN(0, "transaction", &record.opts.sample_transaction,
 		    "sample transaction flags (special events only)"),
+	OPT_BOOLEAN(0, "sample-cid", &record.opts.sample_cid, "Record the sample container id"),
 	OPT_BOOLEAN(0, "per-thread", &record.opts.target.per_thread,
 		    "use per-thread mmaps"),
 	OPT_CALLBACK_OPTARG('I', "intr-regs", &record.opts.sample_intr_regs, NULL, "any register",
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index c859e59..99f8404 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -65,6 +65,7 @@ enum perf_output_field {
 	PERF_OUTPUT_WEIGHT	    = 1U << 18,
 	PERF_OUTPUT_BPF_OUTPUT	    = 1U << 19,
 	PERF_OUTPUT_CALLINDENT	    = 1U << 20,
+	PERF_OUTPUT_CID             = 1U << 21,
 };
 
 struct output_option {
@@ -92,6 +93,7 @@ struct output_option {
 	{.str = "weight",   .field = PERF_OUTPUT_WEIGHT},
 	{.str = "bpf-output",   .field = PERF_OUTPUT_BPF_OUTPUT},
 	{.str = "callindent", .field = PERF_OUTPUT_CALLINDENT},
+	{.str = "cid",   .field = PERF_OUTPUT_CID},
 };
 
 /* default set to maintain compatibility with current format */
@@ -312,6 +314,11 @@ static int perf_evsel__check_attr(struct perf_evsel *evsel,
 					PERF_OUTPUT_IREGS))
 		return -EINVAL;
 
+	if (PRINT_FIELD(CID) &&
+		perf_evsel__check_stype(evsel, PERF_SAMPLE_CID, "CID",
+					PERF_OUTPUT_CID))
+		return -EINVAL;
+
 	return 0;
 }
 
@@ -911,6 +918,9 @@ static void process_event(struct perf_script *script,
 	if (perf_evsel__is_bpf_output(evsel) && PRINT_FIELD(BPF_OUTPUT))
 		print_sample_bpf_output(sample);
 
+	if (PRINT_FIELD(CID))
+		printf("%10u ", sample->cid);
+
 	printf("\n");
 }
 
@@ -2121,8 +2131,8 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_CALLBACK('F', "fields", NULL, "str",
 		     "comma separated output fields prepend with 'type:'. "
 		     "Valid types: hw,sw,trace,raw. "
-		     "Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,"
-		     "addr,symoff,period,iregs,brstack,brstacksym,flags,"
+		     "Fields: comm,tid,pid,cid,time,cpu,event,trace,ip,sym,"
+		     "dso,addr,symoff,period,iregs,brstack,brstacksym,flags,"
 		     "bpf-output,callindent", parse_output_fields),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 		    "system-wide collection from all CPUs"),
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index cb0f135..434659c 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -73,6 +73,7 @@ struct record_opts {
 	size_t	     auxtrace_snapshot_size;
 	const char   *auxtrace_snapshot_opts;
 	bool	     sample_transaction;
+	bool	     sample_cid;
 	unsigned     initial_delay;
 	bool         use_clockid;
 	clockid_t    clockid;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 8d363d5..c35b1c5 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -191,6 +191,7 @@ struct perf_sample {
 	u32 raw_size;
 	u64 data_src;
 	u32 flags;
+	u32 cid;
 	u16 insn_len;
 	u8  cpumode;
 	void *raw_data;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 21fd573..9e4b749 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -929,6 +929,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
 	if (opts->sample_transaction)
 		perf_evsel__set_sample_bit(evsel, TRANSACTION);
 
+	if (opts->sample_cid)
+		perf_evsel__set_sample_bit(evsel, CID);
+
 	if (opts->running_time) {
 		evsel->attr.read_format |=
 			PERF_FORMAT_TOTAL_TIME_ENABLED |
@@ -1973,6 +1976,20 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 		}
 	}
 
+	data->cid = 0;
+	if (type & PERF_SAMPLE_CID) {
+		u.val64 = *array;
+
+		if (swapped) {
+			/* undo swap of u64, then swap on individual u32s */
+			u.val64 = bswap_64(u.val64);
+			u.val32[0] = bswap_32(u.val32[0]);
+		}
+
+		data->cid = u.val32[0];
+		array++;
+	}
+
 	return 0;
 }
 
@@ -2078,6 +2095,9 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 		}
 	}
 
+	if (type & PERF_SAMPLE_CID)
+		result += sizeof(u64);
+
 	return result;
 }
 
@@ -2267,6 +2287,21 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type,
 		}
 	}
 
+	if (type & PERF_SAMPLE_CID) {
+		u.val32[0] = sample->cid;
+
+		if (swapped) {
+			/*
+			 * Inverse of what is done in perf_evsel__parse_sample
+			 */
+			u.val32[0] = bswap_32(u.val32[0]);
+			u.val64 = bswap_64(u.val64);
+		}
+
+		*array = u.val64;
+		array++;
+	}
+
 	return 0;
 }
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index de15dbc..93d5054 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -168,6 +168,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 		hists__set_unres_dso_col_len(hists, HISTC_MEM_DADDR_DSO);
 	}
 
+	hists__new_col_len(hists, HISTC_CID, 10);
 	hists__new_col_len(hists, HISTC_CPU, 3);
 	hists__new_col_len(hists, HISTC_SOCKET, 6);
 	hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
@@ -591,6 +592,7 @@ __hists__add_entry(struct hists *hists,
 		.hists	= hists,
 		.branch_info = bi,
 		.mem_info = mi,
+		.cid = sample->cid,
 		.transaction = sample->transaction,
 		.raw_data = sample->raw_data,
 		.raw_size = sample->raw_size,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 0a1edf1..1ad1d91 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -28,6 +28,7 @@ enum hist_column {
 	HISTC_SYMBOL,
 	HISTC_DSO,
 	HISTC_THREAD,
+	HISTC_CID,
 	HISTC_COMM,
 	HISTC_PARENT,
 	HISTC_CPU,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 5d61242..d40f9c7 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1108,6 +1108,9 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
 
 	if (sample_type & PERF_SAMPLE_READ)
 		sample_read__printf(sample, evsel->attr.read_format);
+
+	if (sample_type & PERF_SAMPLE_CID)
+		printf("... cid: %u\n", sample->cid);
 }
 
 static struct machine *machines__find_for_cpumode(struct machines *machines,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 3d3cb83..c7cf2a7 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1396,6 +1396,27 @@ struct sort_entry sort_transaction = {
 	.se_width_idx	= HISTC_TRANSACTION,
 };
 
+/* --sort cid */
+
+static int64_t
+sort__cid_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return (int64_t)right->cid - (int64_t)left->cid;
+}
+
+static int hist_entry__cid_snprintf(struct hist_entry *he, char *bf,
+				       size_t size, unsigned int width)
+{
+	return repsep_snprintf(bf, size, "%-*u", width, he->cid);
+}
+
+struct sort_entry sort_cid = {
+	.se_header	= "Container ID ",
+	.se_cmp		= sort__cid_cmp,
+	.se_snprintf	= hist_entry__cid_snprintf,
+	.se_width_idx	= HISTC_CID,
+};
+
 struct sort_dimension {
 	const char		*name;
 	struct sort_entry	*entry;
@@ -1418,6 +1439,7 @@ static struct sort_dimension common_sort_dimensions[] = {
 	DIM(SORT_GLOBAL_WEIGHT, "weight", sort_global_weight),
 	DIM(SORT_TRANSACTION, "transaction", sort_transaction),
 	DIM(SORT_TRACE, "trace", sort_trace),
+	DIM(SORT_CID, "cid", sort_cid),
 };
 
 #undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 7ca37ea..eea40e5 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -94,6 +94,7 @@ struct hist_entry {
 	u64			transaction;
 	s32			socket;
 	s32			cpu;
+	u32			cid;
 	u8			cpumode;
 	u8			depth;
 
@@ -210,6 +211,7 @@ enum sort_type {
 	SORT_GLOBAL_WEIGHT,
 	SORT_TRANSACTION,
 	SORT_TRACE,
+	SORT_CID,
 
 	/* branch stack specific sort keys */
 	__SORT_BRANCH_STACK,


* Re: [PATCH v2 1/2] perf: add container identifier entry in perf sample data
  2016-08-30 16:27 [PATCH v2 1/2] perf: add container identifier entry in perf sample data Hari Bathini
  2016-08-30 16:27 ` [PATCH v2 2/2] perf tool: add container identifier entry related changes Hari Bathini
@ 2016-09-01  9:09 ` Peter Zijlstra
  2016-09-02 13:55   ` Hari Bathini
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2016-09-01  9:09 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, lkml, acme, alexander.shishkin, mingo, daniel, rostedt,
	Ananth N Mavinakayanahalli, ebiederm, sargun, Aravinda Prasad,
	brendan.d.gregg

On Tue, Aug 30, 2016 at 09:57:02PM +0530, Hari Bathini wrote:
> Currently, there is no mechanism to filter events based on containers.
> perf -G can be used, but it will not filter events for the containers
> created after perf is invoked, making it difficult to assess/analyze
> performance issues of multiple containers at once. This limitation can
> be overcome, if there is a standard kernel identifier for containers.
> 
> This patch introduces a container identifier entry field in perf sample
> data to identify or distinguish sample data of different containers. It
> uses the cgroup namespace inode number of a given task as its container
> identifier (cid). Alternatively, inode number of pid namespace can also
> be used as cid. This patch assumes each container is created with its
> own cgroup namespace.

I'm thinking this value is mostly the same for tasks, just like COMM and
MMAP. Could we therefore not emit (sideband) events whenever a task
changes namespace and get the same information but with tons less data?

That also gives the possibility of recording all namespaces, not just
the one.
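
A rough userspace sketch of the sideband idea, under stated assumptions:
struct ns_record is a hypothetical record layout, not an existing perf
ABI, and the fixed-size table is only for illustration. The tool keeps
the last known namespace per task and resolves each sample through its
pid, so samples no longer need to carry the identifier themselves.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 1024

/* Hypothetical sideband record, emitted only when a task's namespace changes. */
struct ns_record {
	uint32_t pid;
	uint32_t cgroup_ns_inum;
};

struct ns_table {
	struct ns_record entries[MAX_TASKS];
	size_t count;
};

/* Apply a sideband record: update the task's entry or append a new one. */
static int ns_table_update(struct ns_table *t, const struct ns_record *rec)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->entries[i].pid == rec->pid) {
			t->entries[i].cgroup_ns_inum = rec->cgroup_ns_inum;
			return 0;
		}
	}
	if (t->count == MAX_TASKS)
		return -1;	/* table full */
	t->entries[t->count++] = *rec;
	return 0;
}

/* Resolve a sample's pid to its last known namespace inode; 0 if unknown. */
static uint32_t ns_table_lookup(const struct ns_table *t, uint32_t pid)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->entries[i].pid == pid)
			return t->entries[i].cgroup_ns_inum;
	}
	return 0;
}
```

A real tool would more likely hang this state off its existing
per-thread bookkeeping than use a linear table; the point is only that
one record per namespace change replaces 8 bytes in every sample.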


* Re: [PATCH v2 1/2] perf: add container identifier entry in perf sample data
  2016-09-01  9:09 ` [PATCH v2 1/2] perf: add container identifier entry in perf sample data Peter Zijlstra
@ 2016-09-02 13:55   ` Hari Bathini
  2016-09-02 13:59     ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Hari Bathini @ 2016-09-02 13:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: ast, lkml, acme, alexander.shishkin, mingo, daniel, rostedt,
	Ananth N Mavinakayanahalli, ebiederm, sargun, Aravinda Prasad,
	brendan.d.gregg



On Thursday 01 September 2016 02:39 PM, Peter Zijlstra wrote:
> On Tue, Aug 30, 2016 at 09:57:02PM +0530, Hari Bathini wrote:
>> Currently, there is no mechanism to filter events based on containers.
>> perf -G can be used, but it will not filter events for the containers
>> created after perf is invoked, making it difficult to assess/analyze
>> performance issues of multiple containers at once. This limitation can
>> be overcome, if there is a standard kernel identifier for containers.
>>
>> This patch introduces a container identifier entry field in perf sample
>> data to identify or distinguish sample data of different containers. It
>> uses the cgroup namespace inode number of a given task as its container
>> identifier (cid). Alternatively, inode number of pid namespace can also
>> be used as cid. This patch assumes each container is created with its
>> own cgroup namespace.

Hi Peter,

> I'm thinking this value is mostly the same for tasks, just like COMM and

I think so, too. Namespaces aren't changed that often for tasks...

> MMAP. Could we therefore not emit (sideband) events whenever a task
> changes namespace and get the same information but with tons less data?

You mean, something like PERF_RECORD_NAMESPACE that
emits events on fork, clone, setns..?
  

> That also gives the possibility of recording all namespaces, not just
> the one.

True. If we record all namespaces, container identifier interpretation
can be left to the userspace to decide, which is much more flexible...

Thanks
Hari


* Re: [PATCH v2 1/2] perf: add container identifier entry in perf sample data
  2016-09-02 13:55   ` Hari Bathini
@ 2016-09-02 13:59     ` Peter Zijlstra
  2016-09-02 16:55       ` Hari Bathini
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2016-09-02 13:59 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, lkml, acme, alexander.shishkin, mingo, daniel, rostedt,
	Ananth N Mavinakayanahalli, ebiederm, sargun, Aravinda Prasad,
	brendan.d.gregg

On Fri, Sep 02, 2016 at 07:25:31PM +0530, Hari Bathini wrote:
> >I'm thinking this value is mostly the same for tasks, just like COMM and
> 
> I think so, too. Namespaces aren't changed that often for tasks...
> 
> >MMAP. Could we therefore not emit (sideband) events whenever a task
> >changes namespace and get the same information but with tons less data?
> 
> You mean, something like PERF_RECORD_NAMESPACE that
> emits events on fork, clone, setns..?

Yep.

> 
> >That also gives the possibility of recording all namespaces, not just
> >the one.
> 
> True. If we record all namespaces, container identifier interpretation
> can be left to the userspace to decide, which is much more flexible...

The only complication is initial state, on record start you'd have to
trawl /proc and generate 'fake' namespace records for all (relevant)
tasks.

We do the same with MMAP records, we parse /proc/$pid/maps for that.

Is this namespace stuff available in /proc somewhere?


* Re: [PATCH v2 1/2] perf: add container identifier entry in perf sample data
  2016-09-02 13:59     ` Peter Zijlstra
@ 2016-09-02 16:55       ` Hari Bathini
  0 siblings, 0 replies; 6+ messages in thread
From: Hari Bathini @ 2016-09-02 16:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: ast, lkml, acme, alexander.shishkin, mingo, daniel, rostedt,
	Ananth N Mavinakayanahalli, ebiederm, sargun, Aravinda Prasad,
	brendan.d.gregg



On Friday 02 September 2016 07:29 PM, Peter Zijlstra wrote:
> On Fri, Sep 02, 2016 at 07:25:31PM +0530, Hari Bathini wrote:
>>> I'm thinking this value is mostly the same for tasks, just like COMM and
>> I think so, too. Namespaces aren't changed that often for tasks...
>>
>>> MMAP. Could we therefore not emit (sideband) events whenever a task
>>> changes namespace and get the same information but with tons less data?
>> You mean, something like PERF_RECORD_NAMESPACE that
>> emits events on fork, clone, setns..?
> Yep.

Ok. Thanks!

>
>>> That also gives the possibility of recording all namespaces, not just
>>> the one.
>> True. If we record all namespaces, container identifier interpretation
>> can be left to the userspace to decide, which is much more flexible...
> The only complication is initial state, on record start you'd have to
> trawl /proc and generate 'fake' namespace records for all (relevant)
> tasks.
>
> We do the same with MMAP records, we parse /proc/$pid/maps for that.
>
> Is this namespace stuff available in /proc somewhere?
>

Yes, Peter. /proc/$pid/ns
Will work on this and respin...
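
For reference, a minimal sketch of reading that initial state for one
task. Each entry under /proc/$pid/ns is a symbolic link whose target
has the form "name:[inode]", e.g. "cgroup:[4026531835]". The helper
names parse_ns_link() and read_ns_link() are hypothetical and this is
not code from the respin.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Split a link target such as "cgroup:[4026531835]" into the namespace
 * name and its inode number. Returns 0 on success, -1 if the text does
 * not have the "name:[inode]" form. `name` must hold at least 32 bytes.
 */
static int parse_ns_link(const char *target, char name[32], uint64_t *inum)
{
	if (sscanf(target, "%31[^:]:[%" SCNu64 "]", name, inum) != 2)
		return -1;
	return 0;
}

/* Read and parse /proc/<pid>/ns/<ns>; returns -1 if it cannot be read. */
static int read_ns_link(pid_t pid, const char *ns, char name[32],
			uint64_t *inum)
{
	char path[128], target[64];
	ssize_t len;
	int n;

	n = snprintf(path, sizeof(path), "/proc/%ld/ns/%s", (long)pid, ns);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	len = readlink(path, target, sizeof(target) - 1);
	if (len < 0)
		return -1;
	target[len] = '\0';	/* readlink() does not NUL-terminate */

	return parse_ns_link(target, name, inum);
}
```

Walking /proc and calling read_ns_link() for every entry under each
task's ns directory would produce the synthesized records for tasks
that already exist when recording starts.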

Thanks
Hari
