linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH] perf: Container-aware tracing support
@ 2015-07-15  9:08 Aravinda Prasad
  2015-07-15 12:47 ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Aravinda Prasad @ 2015-07-15  9:08 UTC (permalink / raw)
  To: a.p.zijlstra, linux-kernel, rostedt, mingo, paulus, acme; +Cc: hbathini, ananth

Current tracing infrastructure, such as perf and ftrace, reports
system-wide data when invoked inside a container. When such tools are
invoked inside a container, the reported events should be restricted
to that container's context.

This RFC patch filters container-specific events when the perf
utility is invoked within a container, without any change to the user
interface; the same support still needs to be extended to ftrace. The
patch assumes that debugfs is available within the container and that
all the processes running inside a container are grouped into a single
perf_event cgroup. It piggybacks on the existing support for tracing
with cgroups [1] by setting the cgrp member of the event structure to
the cgroup of the context from which the perf tool is invoked.
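
As a reading aid, the order in which the patched perf_cgroup_connect()
decides where event->cgrp comes from can be modelled in userspace C.
This is only a sketch of the branch order in the diff below; the enum,
the struct and pick_cgrp_source() are hypothetical names that do not
exist in the kernel.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the branch order; not kernel code. */
enum cgrp_source {
	CGRP_FROM_FD,      /* caller passed a cgroup fd: use that cgroup */
	CGRP_FROM_CURRENT, /* inside a container: use current's perf_event cgroup */
	CGRP_NONE,         /* leave event->cgrp unset: no cgroup filtering */
};

struct connect_ctx {
	int  cgroup_fd;        /* -1 when no cgroup was requested */
	bool attached_to_task; /* event targets a specific PID */
	bool in_init_pid_ns;   /* caller runs in the initial PID namespace */
	bool in_root_cgroup;   /* caller sits in the root perf_event cgroup */
};

static enum cgrp_source pick_cgrp_source(const struct connect_ctx *c)
{
	if (c->cgroup_fd != -1)
		return CGRP_FROM_FD;
	if (c->attached_to_task)
		return CGRP_NONE;
	if (!c->in_init_pid_ns)
		return c->in_root_cgroup ? CGRP_NONE : CGRP_FROM_CURRENT;
	return CGRP_NONE; /* global context: include all events */
}
```

Note that a caller in a non-init PID namespace that still sits in the
root perf_event cgroup ends up with CGRP_NONE, i.e. system-wide events.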

However, this patch is not complete and requires more work to fully
support tracing inside a container. It is intended to initiate the
discussion on having container-aware tracing support. What is
supported and which issues are still pending are explained in detail
below.

Suggestions, feedback, flames are welcome.

[1] https://lkml.org/lkml/2011/2/14/40

--------------------------------------------------------------------
Details:

With this patch, perf-stat, perf-record (tracepoints, [ku]probes) and
perf-top, when executed within a container, report only the events
that are triggered in that container's context. However, there are a
couple of limitations on how this works for kprobes/uprobes and, in
general, for the ftrace infrastructure.

The problem arises due to the use of the files
/sys/kernel/debug/tracing/[uk]probe_events. The perf utility inserts a
probe by writing into the [uk]probe_events file, which is parsed by
the kernel to register an event. When debugfs is mounted inside
containers, the contents of these files are visible to all containers.
This implies that a user within a container can list/delete probes
registered by other containers, leading to security issues and/or
denial of service (e.g., by deleting a probe from another container
every time it is registered). This could be undesirable depending on
the way containers are used (e.g., in a multi-tenant setup with each
user assigned a container).
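
To illustrate why the shared file is a problem, here is a small
userspace sketch that splits one line of a [uk]probe_events listing
into its group and event names. It assumes the usual listing shape
"TYPE:GROUP/EVENT ..." (for example "p:kprobes/myprobe do_sys_open");
parse_probe_line() is a hypothetical helper and not part of perf. The
point is that the group and event names are all the identification an
entry carries: nothing in it ties the probe to the container that
registered it.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split a listing line such as "p:kprobes/myprobe do_sys_open" into
 * group ("kprobes") and event ("myprobe"). Returns 0 on success and
 * -1 if the line lacks the TYPE:GROUP/EVENT shape or a buffer is too
 * small.
 */
static int parse_probe_line(const char *line, char *group, size_t glen,
			    char *event, size_t elen)
{
	const char *name = strchr(line, ':');
	const char *name_end, *slash;
	size_t g, e;

	if (!name)
		return -1;
	name++;
	/* GROUP/EVENT ends at the first whitespace after the colon. */
	name_end = name + strcspn(name, " \t\n");
	slash = memchr(name, '/', (size_t)(name_end - name));
	if (!slash)
		return -1;

	g = (size_t)(slash - name);
	e = (size_t)(name_end - slash - 1);
	if (g == 0 || e == 0 || g >= glen || e >= elen)
		return -1;

	memcpy(group, name, g);
	group[g] = '\0';
	memcpy(event, slash + 1, e);
	event[e] = '\0';
	return 0;
}
```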

The issues mentioned above exist for any tracing infrastructure that
uses the ftrace interface. One approach is to have a container-specific
view of these files under /sys/kernel/debug/tracing. At this moment,
this seems to require a significant rework of ftrace.

We are looking for feedback on our assumption that the processes
running inside a container are grouped into a single perf_event
cgroup, and also for any thoughts on extending such support to ftrace.

Regards,
Aravinda

Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 kernel/events/core.c |   49 +++++++++++++++++++++++++++++++++++--------------
 1 file changed, 35 insertions(+), 14 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 81aa3a4..f6a1f89 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -589,17 +589,38 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
 {
 	struct perf_cgroup *cgrp;
 	struct cgroup_subsys_state *css;
-	struct fd f = fdget(fd);
+	struct fd f;
 	int ret = 0;
 
-	if (!f.file)
-		return -EBADF;
+	if (fd != -1) {
+		f = fdget(fd);
+		if (!f.file)
+			return -EBADF;
 
-	css = css_tryget_online_from_dir(f.file->f_path.dentry,
+		css = css_tryget_online_from_dir(f.file->f_path.dentry,
 					 &perf_event_cgrp_subsys);
-	if (IS_ERR(css)) {
-		ret = PTR_ERR(css);
-		goto out;
+		if (IS_ERR(css)) {
+			ret = PTR_ERR(css);
+			fdput(f);
+			return ret;
+		}
+	} else if (event->attach_state == PERF_ATTACH_TASK) {
+		/* Tracing on a PID. No need to set event->cgrp */
+		return ret;
+	} else if (task_active_pid_ns(current) != &init_pid_ns) {
+		/* Don't set event->cgrp if task belongs to root cgroup */
+		if (task_css_is_root(current, perf_event_cgrp_id))
+			return ret;
+
+		css = task_css(current, perf_event_cgrp_id);
+		if (!css || !css_tryget_online(css))
+			return -ENOENT;
+	} else {
+		/*
+		 * perf invoked from global context and hence don't set
+		 * event->cgrp as all the events should be included
+		 */
+		return ret;
 	}
 
 	cgrp = container_of(css, struct perf_cgroup, css);
@@ -614,8 +635,10 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
 		perf_detach_cgroup(event);
 		ret = -EINVAL;
 	}
-out:
-	fdput(f);
+
+	if (fd != -1)
+		fdput(f);
+
 	return ret;
 }
 
@@ -7554,11 +7577,9 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	if (!has_branch_stack(event))
 		event->attr.branch_sample_type = 0;
 
-	if (cgroup_fd != -1) {
-		err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
-		if (err)
-			goto err_ns;
-	}
+	err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
+	if (err)
+		goto err_ns;
 
 	pmu = perf_init_event(event);
 	if (!pmu)



* Re: [RFC PATCH] perf: Container-aware tracing support
  2015-07-15  9:08 [RFC PATCH] perf: Container-aware tracing support Aravinda Prasad
@ 2015-07-15 12:47 ` Peter Zijlstra
  2015-07-15 16:21   ` Aravinda Prasad
  2015-07-17 12:26   ` Ingo Molnar
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Zijlstra @ 2015-07-15 12:47 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: linux-kernel, rostedt, mingo, paulus, acme, hbathini, ananth

On Wed, Jul 15, 2015 at 02:38:36PM +0530, Aravinda Prasad wrote:
> Current tracing infrastructure such as perf and ftrace reports system
> wide data when invoked inside a container. It is required to restrict
> events specific to a container context when such tools are invoked
> inside a container.
> 
> This RFC patch supports filtering container specific events, without
> any change in the user interface, when invoked within a container for
> the perf utility; such support needs to be extended to ftrace. This
> patch assumes that the debugfs is available within the container and
> all the processes running inside a container are grouped into a single
> perf_event subsystem of cgroups. This patch piggybacks on the existing
> support available for tracing with cgroups [1] by setting the cgrp
> member of the event structure to the cgroup of the context perf tool
> is invoked from.
> 
> However, this patch is not complete and requires more work to fully
> support tracing inside a container. This patch is intended to initiate
> the discussion on having container-aware tracing support. A detailed
> explanation on what is supported and pending issues are mentioned
> below.

tracing is outside the scope of perf; I suspect you want tracefs to be
sensitive to filesystem namespaces and all that that entails.

> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> ---
>  kernel/events/core.c |   49 +++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 35 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 81aa3a4..f6a1f89 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -589,17 +589,38 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
>  {
>  	struct perf_cgroup *cgrp;
>  	struct cgroup_subsys_state *css;
> -	struct fd f = fdget(fd);
> +	struct fd f;
>  	int ret = 0;
>  
> -	if (!f.file)
> -		return -EBADF;
> +	if (fd != -1) {
> +		f = fdget(fd);
> +		if (!f.file)
> +			return -EBADF;
>  
> -	css = css_tryget_online_from_dir(f.file->f_path.dentry,
> +		css = css_tryget_online_from_dir(f.file->f_path.dentry,
>  					 &perf_event_cgrp_subsys);
> -	if (IS_ERR(css)) {
> -		ret = PTR_ERR(css);
> -		goto out;
> +		if (IS_ERR(css)) {
> +			ret = PTR_ERR(css);
> +			fdput(f);
> +			return ret;
> +		}
> +	} else if (event->attach_state == PERF_ATTACH_TASK) {
> +		/* Tracing on a PID. No need to set event->cgrp */
> +		return ret;
> +	} else if (task_active_pid_ns(current) != &init_pid_ns) {

Why the pid namespace?

> +		/* Don't set event->cgrp if task belongs to root cgroup */
> +		if (task_css_is_root(current, perf_event_cgrp_id))
> +			return ret;

So if you have the root perf_cgroup inside your container you can
escape?

> +
> +		css = task_css(current, perf_event_cgrp_id);
> +		if (!css || !css_tryget_online(css))
> +			return -ENOENT;
> +	} else {
> +		/*
> +		 * perf invoked from global context and hence don't set
> +		 * event->cgrp as all the events should be included
> +		 */
> +		return ret;
>  	}
>  
>  	cgrp = container_of(css, struct perf_cgroup, css);


* Re: [RFC PATCH] perf: Container-aware tracing support
  2015-07-15 12:47 ` Peter Zijlstra
@ 2015-07-15 16:21   ` Aravinda Prasad
  2015-07-17 10:19     ` Peter Zijlstra
  2015-07-17 12:26   ` Ingo Molnar
  1 sibling, 1 reply; 7+ messages in thread
From: Aravinda Prasad @ 2015-07-15 16:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, rostedt, mingo, paulus, acme, hbathini, ananth



On Wednesday 15 July 2015 06:17 PM, Peter Zijlstra wrote:
> On Wed, Jul 15, 2015 at 02:38:36PM +0530, Aravinda Prasad wrote:
>> Current tracing infrastructure such as perf and ftrace reports system
>> wide data when invoked inside a container. It is required to restrict
>> events specific to a container context when such tools are invoked
>> inside a container.
>>
>> This RFC patch supports filtering container specific events, without
>> any change in the user interface, when invoked within a container for
>> the perf utility; such support needs to be extended to ftrace. This
>> patch assumes that the debugfs is available within the container and
>> all the processes running inside a container are grouped into a single
>> perf_event subsystem of cgroups. This patch piggybacks on the existing
>> support available for tracing with cgroups [1] by setting the cgrp
>> member of the event structure to the cgroup of the context perf tool
>> is invoked from.
>>
>> However, this patch is not complete and requires more work to fully
>> support tracing inside a container. This patch is intended to initiate
>> the discussion on having container-aware tracing support. A detailed
>> explanation on what is supported and pending issues are mentioned
>> below.
> 
> tracing is outside the scope of perf; I suspect you want tracefs to be
> sensitive to filesystem namespaces and all that that entails.

Yes, tracefs needs to be sensitive to filesystem namespaces. I wanted to
put together the points required for supporting perf/trace inside
containers.

> 
>> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  kernel/events/core.c |   49 +++++++++++++++++++++++++++++++++++--------------
>>  1 file changed, 35 insertions(+), 14 deletions(-)
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 81aa3a4..f6a1f89 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -589,17 +589,38 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
>>  {
>>  	struct perf_cgroup *cgrp;
>>  	struct cgroup_subsys_state *css;
>> -	struct fd f = fdget(fd);
>> +	struct fd f;
>>  	int ret = 0;
>>  
>> -	if (!f.file)
>> -		return -EBADF;
>> +	if (fd != -1) {
>> +		f = fdget(fd);
>> +		if (!f.file)
>> +			return -EBADF;
>>  
>> -	css = css_tryget_online_from_dir(f.file->f_path.dentry,
>> +		css = css_tryget_online_from_dir(f.file->f_path.dentry,
>>  					 &perf_event_cgrp_subsys);
>> -	if (IS_ERR(css)) {
>> -		ret = PTR_ERR(css);
>> -		goto out;
>> +		if (IS_ERR(css)) {
>> +			ret = PTR_ERR(css);
>> +			fdput(f);
>> +			return ret;
>> +		}
>> +	} else if (event->attach_state == PERF_ATTACH_TASK) {
>> +		/* Tracing on a PID. No need to set event->cgrp */
>> +		return ret;
>> +	} else if (task_active_pid_ns(current) != &init_pid_ns) {
> 
> Why the pid namespace?

This comes from my understanding of a container: at least a separate
PID namespace, with the processes inside the container grouped into a
single perf_event cgroup.

I know there are other ways to define a container; however, I thought I
would start with the one above.
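
For reference, the same "am I outside the initial PID namespace?" test
can be approximated from userspace by looking at the /proc/self/ns/pid
symlink, whose target has the form "pid:[INODE]". The sketch below
assumes that the initial PID namespace keeps the inode number
0xEFFFFFFC (PROC_PID_INIT_INO in include/linux/proc_ns.h); that is a
kernel implementation detail, and both function names are
hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

/* Assumption: inode number of the initial PID namespace. */
#define INIT_PID_NS_INO 0xEFFFFFFCUL

/* Returns 1 for the initial PID namespace, 0 for any other, -1 on parse error. */
static int pid_ns_link_is_init(const char *link)
{
	unsigned long ino;

	if (sscanf(link, "pid:[%lu]", &ino) != 1)
		return -1;
	return ino == INIT_PID_NS_INO;
}

/* Same answer for the calling process, read from /proc/self/ns/pid. */
static int self_in_init_pid_ns(void)
{
	char buf[64];
	ssize_t n = readlink("/proc/self/ns/pid", buf, sizeof(buf) - 1);

	if (n < 0)
		return -1;
	buf[n] = '\0';
	return pid_ns_link_is_init(buf);
}
```

A result of 0 from self_in_init_pid_ns() corresponds to the
task_active_pid_ns(current) != &init_pid_ns branch in the patch.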

> 
>> +		/* Don't set event->cgrp if task belongs to root cgroup */
>> +		if (task_css_is_root(current, perf_event_cgrp_id))
>> +			return ret;
> 
> So if you have the root perf_cgroup inside your container you can
> escape?

If we have the root perf_cgroup inside the container, then even if we
set event->cgrp we will be including all processes in the system.

Regards,
Aravinda

> 
>> +
>> +		css = task_css(current, perf_event_cgrp_id);
>> +		if (!css || !css_tryget_online(css))
>> +			return -ENOENT;
>> +	} else {
>> +		/*
>> +		 * perf invoked from global context and hence don't set
>> +		 * event->cgrp as all the events should be included
>> +		 */
>> +		return ret;
>>  	}
>>  
>>  	cgrp = container_of(css, struct perf_cgroup, css);
> 

-- 
Regards,
Aravinda



* Re: [RFC PATCH] perf: Container-aware tracing support
  2015-07-15 16:21   ` Aravinda Prasad
@ 2015-07-17 10:19     ` Peter Zijlstra
  2015-07-17 12:20       ` Aravinda Prasad
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2015-07-17 10:19 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: linux-kernel, rostedt, mingo, paulus, acme, hbathini, ananth

On Wed, Jul 15, 2015 at 09:51:52PM +0530, Aravinda Prasad wrote:
> >> +	} else if (task_active_pid_ns(current) != &init_pid_ns) {
> > 
> > Why the pid namespace?
> 
> This comes from my understanding of container -- having at least a
> separate PID namespace with processes inside a container grouped into a
> single perf_event cgroups subsystem.
> 
> I know there are other ways to define a container, however, I thought I
> start with the above one.

Right, but you should at least mention this, preferably in a comment.

> > 
> >> +		/* Don't set event->cgrp if task belongs to root cgroup */
> >> +		if (task_css_is_root(current, perf_event_cgrp_id))
> >> +			return ret;
> > 
> > So if you have the root perf_cgroup inside your container you can
> > escape?
> 
> If we have root perf_cgroup inside the container then even if we set
> event->cgrp we will be including all processes in the system.

Yes, that's what I said. Why does that make sense?


* Re: [RFC PATCH] perf: Container-aware tracing support
  2015-07-17 10:19     ` Peter Zijlstra
@ 2015-07-17 12:20       ` Aravinda Prasad
  0 siblings, 0 replies; 7+ messages in thread
From: Aravinda Prasad @ 2015-07-17 12:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, rostedt, mingo, paulus, acme, hbathini, ananth



On Friday 17 July 2015 03:49 PM, Peter Zijlstra wrote:
> On Wed, Jul 15, 2015 at 09:51:52PM +0530, Aravinda Prasad wrote:
>>>> +	} else if (task_active_pid_ns(current) != &init_pid_ns) {
>>>
>>> Why the pid namespace?
>>
>> This comes from my understanding of container -- having at least a
>> separate PID namespace with processes inside a container grouped into a
>> single perf_event cgroups subsystem.
>>
>> I know there are other ways to define a container, however, I thought I
>> start with the above one.
> 
> Right, but you should at least mention this, preferably in a comment.

Yes. I should have done that.

> 
>>>
>>>> +		/* Don't set event->cgrp if task belongs to root cgroup */
>>>> +		if (task_css_is_root(current, perf_event_cgrp_id))
>>>> +			return ret;
>>>
>>> So if you have the root perf_cgroup inside your container you can
>>> escape?
>>
>> If we have root perf_cgroup inside the container then even if we set
>> event->cgrp we will be including all processes in the system.
> 
> Yes, that's what I said. Why does that make sense?

We assume that the processes of a container are grouped into a single
perf_event cgroup. Under that assumption, being in the root perf_cgroup
implies that we were not invoked from a container context. However, I
am not sure whether this assumption is right.
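
To check this assumption on a given setup from userspace, one can look
at the perf_event entry in /proc/self/cgroup. The sketch below assumes
the cgroup v1 line format "ID:controller-list:path";
perf_event_line_is_root() is a hypothetical helper. Note that inside a
cgroup namespace the path is shown relative to that namespace's root,
so "/" there does not necessarily mean the real root cgroup.

```c
#include <stddef.h>
#include <string.h>

/*
 * For one line of /proc/<pid>/cgroup:
 *   returns  1 if perf_event is listed and the path is "/",
 *            0 if perf_event is listed with any other path,
 *           -1 if the line is malformed or is for other controllers.
 */
static int perf_event_line_is_root(const char *line)
{
	const char *ctrl = strchr(line, ':');
	const char *path, *p;
	size_t len = 0;

	if (!ctrl)
		return -1;
	ctrl++;
	path = strchr(ctrl, ':');
	if (!path)
		return -1;

	/* Walk the comma-separated controller list looking for perf_event. */
	for (p = ctrl; p < path; p += len + 1) {
		len = strcspn(p, ",:");
		if (len == strlen("perf_event") && !strncmp(p, "perf_event", len))
			break;
	}
	if (p >= path)
		return -1;

	path++;
	len = strcspn(path, "\n");
	return len == 1 && path[0] == '/';
}
```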

> 

-- 
Regards,
Aravinda



* Re: [RFC PATCH] perf: Container-aware tracing support
  2015-07-15 12:47 ` Peter Zijlstra
  2015-07-15 16:21   ` Aravinda Prasad
@ 2015-07-17 12:26   ` Ingo Molnar
  1 sibling, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2015-07-17 12:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Aravinda Prasad, linux-kernel, rostedt, mingo, paulus, acme,
	hbathini, ananth


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Jul 15, 2015 at 02:38:36PM +0530, Aravinda Prasad wrote:
> > Current tracing infrastructure such as perf and ftrace reports system
> > wide data when invoked inside a container. It is required to restrict
> > events specific to a container context when such tools are invoked
> > inside a container.
> > 
> > This RFC patch supports filtering container specific events, without
> > any change in the user interface, when invoked within a container for
> > the perf utility; such support needs to be extended to ftrace. This
> > patch assumes that the debugfs is available within the container and
> > all the processes running inside a container are grouped into a single
> > perf_event subsystem of cgroups. This patch piggybacks on the existing
> > support available for tracing with cgroups [1] by setting the cgrp
> > member of the event structure to the cgroup of the context perf tool
> > is invoked from.
> > 
> > However, this patch is not complete and requires more work to fully
> > support tracing inside a container. This patch is intended to initiate
> > the discussion on having container-aware tracing support. A detailed
> > explanation on what is supported and pending issues are mentioned
> > below.
> 
> tracing is outside the scope of perf; I suspect you want tracefs to be
> sensitive to filesystem namespaces and all that that entails.

I'd correct that to:

  > ftrace is outside the scope of perf; I suspect you want tracefs to be 
  > sensitive to filesystem namespaces and all that that entails.

because perf very much does tracing as well, we have 'perf trace' for example, and 
obviously the whole ring-buffer is a trace buffer and perf.data is a trace dump of 
that.

Thanks,

	Ingo


* [RFC PATCH] perf: Container-aware tracing support
@ 2017-01-12 12:11 Aravinda Prasad
  0 siblings, 0 replies; 7+ messages in thread
From: Aravinda Prasad @ 2017-01-12 12:11 UTC (permalink / raw)
  To: a.p.zijlstra, linux-kernel, rostedt, mingo, paulus, acme, ebiederm
  Cc: hbathini, ananth

This RFC patch supports filtering container-specific events
when the perf tool is executed inside a container.

Unlike previous approaches, this approach lets the user
decide what a container is through a set of kernel configs.
The main reason for such an approach is the lack of a
container-unique identifier in the kernel and of a clear
definition of what constitutes a container; any combination
of the namespaces can be considered a container.

Previous approaches mandated that at least a PID namespace, a
cgroup namespace or a perf namespace (newly introduced to
support container-aware tracing) be part of a container.
However, based on the discussions on LKML, mandating that a
particular namespace be part of a container is not acceptable.
Hence, this patch lets the user define a container through a
set of kernel configs.
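
The rule that the is_container() helper in the diff below implements
can be summarised as: a task is in a container only if every namespace
type selected in Kconfig differs from the initial one, and at least
one type is selected. A userspace sketch of that rule follows; the
NS_* bits and is_container_model() are hypothetical names used only
for illustration.

```c
#include <stdbool.h>

/* Hypothetical bits, one per namespace type offered in the Kconfig menu. */
enum ns_bit {
	NS_PID    = 1u << 0,
	NS_UTS    = 1u << 1,
	NS_IPC    = 1u << 2,
	NS_MNT    = 1u << 3,
	NS_NET    = 1u << 4,
	NS_CGROUP = 1u << 5,
};

/*
 * required: namespace types selected at build time.
 * non_init: namespace types in which the task is NOT in the initial namespace.
 */
static bool is_container_model(unsigned int required, unsigned int non_init)
{
	if (required == 0)
		return false; /* nothing selected: never a container */
	return (non_init & required) == required;
}
```

With the defaults in the patch (PID and cgroup namespaces selected), a
task that has only a private PID namespace is not treated as being in
a container.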

This patch restricts the filtering of events to perf hardware
events with sample type set to PERF_SAMPLE_IDENTIFIER.
Further, this patch piggybacks on the cgroups support, i.e.,
the patch expects processes inside a container to be grouped
into a single perf_event cgroup.
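
The restriction above amounts to a gate in perf_event_open(): inside a
container, anything other than a hardware event whose sample_type is
exactly PERF_SAMPLE_IDENTIFIER is refused with -EACCES. A userspace
sketch of that predicate follows. The MODEL_* constants copy the
values from the perf UAPI header (linux/perf_event.h) so that the
example stays self-contained; event_allowed() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from linux/perf_event.h for a self-contained example. */
#define MODEL_PERF_TYPE_HARDWARE     0u
#define MODEL_PERF_TYPE_SOFTWARE     1u
#define MODEL_PERF_SAMPLE_IP         (1ull << 0)
#define MODEL_PERF_SAMPLE_IDENTIFIER (1ull << 16)

/* true: the event may be opened; false: the patch would return -EACCES. */
static bool event_allowed(bool in_container, uint32_t type,
			  uint64_t sample_type)
{
	if (!in_container)
		return true;
	/* Equality, not a bit test: extra sample bits are rejected too. */
	return type == MODEL_PERF_TYPE_HARDWARE &&
	       sample_type == MODEL_PERF_SAMPLE_IDENTIFIER;
}
```

Because the comparison is an equality, a hardware event that requests
PERF_SAMPLE_IDENTIFIER together with, say, PERF_SAMPLE_IP is also
refused inside a container.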

However, if the approach of letting the user decide what a
container is proves acceptable, then the filtering will be
extended to other events and will also be decoupled from
grouping the processes into a perf_event cgroup.

Limitation:
  - Two different definitions of a container cannot co-exist.

Links to earlier approaches:
  - https://lwn.net/Articles/695601/
  - https://lwn.net/Articles/691298/
  - https://lkml.org/lkml/2015/7/15/192

The patch is based on the 4.8 kernel.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 init/Kconfig         |   64 ++++++++++++++++++++++++++++++++
 kernel/events/core.c |   99 ++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 148 insertions(+), 15 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index cac3f09..48568f0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1720,6 +1720,70 @@ config DEBUG_PERF_USE_VMALLOC
 
 	 Say N if unsure.
 
+config PERF_NS_TRACE
+	default n
+	bool "Container-aware tracing support"
+	depends on CGROUPS && NAMESPACES
+	help
+	 Enable tracing support inside a container.
+
+	 This allows filtering of container-specific events, without
+	 any change in the user interface, when perf is invoked
+	 within a container.
+
+	 As the kernel has no concept of a container, the user should
+	 select from the choices below to let the kernel identify one.
+
+	 Say N if unsure.
+
+if PERF_NS_TRACE
+
+menu "Select the namespaces with which containers are created"
+
+config UTS_NS_TRACE
+	bool "UTS namespace"
+	depends on UTS_NS
+	default n
+	help
+	 Select if containers are created with UTS namespace
+
+config IPC_NS_TRACE
+	bool "IPC namespace"
+	depends on IPC_NS
+	default n
+	help
+	 Select if containers are created with IPC namespace
+
+config MNT_NS_TRACE
+	bool "Mount namespace"
+	default n
+	help
+	 Select if containers are created with mount namespace
+
+config PID_NS_TRACE
+	bool "PID Namespaces"
+	default y
+	depends on PID_NS
+	help
+	 Select if containers are created with PID namespace
+
+config NET_NS_TRACE
+	bool "Network namespace"
+	depends on NET_NS
+	default n
+	help
+	 Select if containers are created with NET namespace
+
+config CGROUPS_NS_TRACE
+	bool "Cgroup namespace"
+	default y
+	help
+	 Select if containers are created with cgroup namespace
+
+endmenu
+
+endif #PERF_NS_TRACE
+
 endmenu
 
 config VM_EVENT_COUNTERS
diff --git a/kernel/events/core.c b/kernel/events/core.c
index fc9bb22..5920c9c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -802,23 +802,86 @@ static inline void perf_cgroup_sched_in(struct task_struct *prev,
 	rcu_read_unlock();
 }
 
+#ifdef CONFIG_PERF_NS_TRACE
+static inline bool is_container(void)
+{
+	bool flag = 0;
+#ifdef CONFIG_PID_NS_TRACE
+	if (task_active_pid_ns(current) == &init_pid_ns)
+		return 0;
+	else
+		flag = 1;
+#endif
+#ifdef CONFIG_UTS_NS_TRACE
+	if (current->nsproxy->uts_ns == &init_uts_ns)
+		return 0;
+	else
+		flag = 1;
+#endif
+#ifdef CONFIG_IPC_NS_TRACE
+	if (current->nsproxy->ipc_ns == &init_ipc_ns)
+		return 0;
+	else
+		flag = 1;
+#endif
+#ifdef CONFIG_MNT_NS_TRACE
+	if (current->nsproxy->mnt_ns == init_task.nsproxy->mnt_ns)
+		return 0;
+	else
+		flag = 1;
+#endif
+#ifdef CONFIG_NET_NS_TRACE
+	if (current->nsproxy->net_ns == &init_net)
+		return 0;
+	else
+		flag = 1;
+#endif
+#ifdef CONFIG_CGROUPS_NS_TRACE
+	if (current->nsproxy->cgroup_ns == &init_cgroup_ns)
+		return 0;
+	else
+		flag = 1;
+#endif
+	return flag;
+}
+#endif /* #ifdef CONFIG_PERF_NS_TRACE */
+
 static inline int perf_cgroup_connect(int fd, struct perf_event *event,
 				      struct perf_event_attr *attr,
 				      struct perf_event *group_leader)
 {
 	struct perf_cgroup *cgrp;
 	struct cgroup_subsys_state *css;
-	struct fd f = fdget(fd);
+	struct fd f;
 	int ret = 0;
 
-	if (!f.file)
-		return -EBADF;
+	if (fd != -1) {
+		f = fdget(fd);
+		if (!f.file)
+			return -EBADF;
 
-	css = css_tryget_online_from_dir(f.file->f_path.dentry,
-					 &perf_event_cgrp_subsys);
-	if (IS_ERR(css)) {
-		ret = PTR_ERR(css);
-		goto out;
+		css = css_tryget_online_from_dir(f.file->f_path.dentry,
+						 &perf_event_cgrp_subsys);
+		if (IS_ERR(css)) {
+			ret = PTR_ERR(css);
+			fdput(f);
+			return ret;
+		}
+#ifdef CONFIG_PERF_NS_TRACE
+	} else if (event->attach_state == PERF_ATTACH_TASK) {
+		/* Tracing on a PID. No need to set event->cgrp */
+		return ret;
+	} else if (is_container()) {
+		css = task_css(current, perf_event_cgrp_id);
+		if (!css || !css_tryget_online(css))
+			return -ENOENT;
+	} else {
+		/*
+		 * perf invoked from global context and hence don't set
+		 * event->cgrp as all the events should be included
+		 */
+		return ret;
+#endif /* #ifdef CONFIG_PERF_NS_TRACE */
 	}
 
 	cgrp = container_of(css, struct perf_cgroup, css);
@@ -833,8 +896,9 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
 		perf_detach_cgroup(event);
 		ret = -EINVAL;
 	}
-out:
-	fdput(f);
+	if (fd != -1)
+		fdput(f);
+
 	return ret;
 }
 
@@ -9059,11 +9123,9 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	if (!has_branch_stack(event))
 		event->attr.branch_sample_type = 0;
 
-	if (cgroup_fd != -1) {
-		err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
-		if (err)
-			goto err_ns;
-	}
+	err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
+	if (err)
+		goto err_ns;
 
 	pmu = perf_init_event(event);
 	if (!pmu)
@@ -9404,6 +9466,13 @@ SYSCALL_DEFINE5(perf_event_open,
 			return -EACCES;
 	}
 
+#ifdef CONFIG_PERF_NS_TRACE
+	if (is_container() && !(attr.type == PERF_TYPE_HARDWARE &&
+			attr.sample_type == PERF_SAMPLE_IDENTIFIER)) {
+		return -EACCES;
+	}
+#endif
+
 	if (attr.freq) {
 		if (attr.sample_freq > sysctl_perf_event_sample_rate)
 			return -EINVAL;

