linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v4 0/3] perf: add support for analyzing events for containers
@ 2016-12-16 12:14 Alban Crequy
  2016-12-16 18:26 ` Hari Bathini
  0 siblings, 1 reply; 9+ messages in thread
From: Alban Crequy @ 2016-12-16 12:14 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ananth, daniel, Linux Containers, linux-kernel, acme,
	alexander.shishkin, mingo, paulus, rostedt, aravinda, kernel,
	Alexander Viro, Eric W. Biederman

Hi,

> Currently, there is no trivial mechanism to analyze events based on
> containers. perf -G can be used, but it will not filter events for the
> containers created after perf is invoked, making it difficult to assess/
> analyze performance issues of multiple containers at once.
>
> This patch-set overcomes this limitation by using cgroup identifier as
> container unique identifier. A new PERF_RECORD_NAMESPACES event that
> records namespaces related info is introduced, from which the cgroup
> namespace's device & inode numbers are used as cgroup identifier. This
> is based on the assumption that each container is created with it's own
> cgroup namespace allowing assessment/analysis of multiple containers
> using cgroup identifier.
>
> The first patch introduces PERF_RECORD_NAMESPACES in kernel while the
> second patch makes the corresponding changes in perf tool to read this
> PERF_RECORD_NAMESPACES events. The third patch adds a cgroup identifier
> column in perf report, which contains the cgroup namespace's device and
> inode numbers.

I have a question for the pid namespace: does the new perf event gives
the pid namespace of the task, or the pid_ns_for_children from the
nsproxy? From my limited understanding, v4 seems to do the former, as
opposed to v3.

When synthesizing events from /proc/$PID/ns/pid, it cannot take the
pid_ns_for_children, so I wanted to make sure it is consistent.

Cheers,
Alban

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/3] perf: add support for analyzing events for containers
  2016-12-16 12:14 [PATCH v4 0/3] perf: add support for analyzing events for containers Alban Crequy
@ 2016-12-16 18:26 ` Hari Bathini
  0 siblings, 0 replies; 9+ messages in thread
From: Hari Bathini @ 2016-12-16 18:26 UTC (permalink / raw)
  To: Alban Crequy
  Cc: ananth, daniel, Linux Containers, linux-kernel, acme,
	alexander.shishkin, mingo, paulus, rostedt, aravinda, kernel,
	Alexander Viro, Eric W. Biederman

Hi Alban,


On Friday 16 December 2016 05:44 PM, Alban Crequy wrote:
> Hi,
>
>> Currently, there is no trivial mechanism to analyze events based on
>> containers. perf -G can be used, but it will not filter events for the
>> containers created after perf is invoked, making it difficult to assess/
>> analyze performance issues of multiple containers at once.
>>
>> This patch-set overcomes this limitation by using cgroup identifier as
>> container unique identifier. A new PERF_RECORD_NAMESPACES event that
>> records namespaces related info is introduced, from which the cgroup
>> namespace's device & inode numbers are used as cgroup identifier. This
>> is based on the assumption that each container is created with it's own
>> cgroup namespace allowing assessment/analysis of multiple containers
>> using cgroup identifier.
>>
>> The first patch introduces PERF_RECORD_NAMESPACES in kernel while the
>> second patch makes the corresponding changes in perf tool to read this
>> PERF_RECORD_NAMESPACES events. The third patch adds a cgroup identifier
>> column in perf report, which contains the cgroup namespace's device and
>> inode numbers.
> I have a question for the pid namespace: does the new perf event gives
> the pid namespace of the task, or the pid_ns_for_children from the
> nsproxy? From my limited understanding, v4 seems to do the former, as
> opposed to v3.

Ah! How did I miss that?!

> When synthesizing events from /proc/$PID/ns/pid, it cannot take the
> pid_ns_for_children, so I wanted to make sure it is consistent.
>

So, eventually this version sounds like the right way of doing it..?

Thanks
Hari

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/3] perf: add support for analyzing events for containers
  2017-01-11 11:16       ` Aravinda Prasad
@ 2017-01-11 14:45         ` Eric W. Biederman
  0 siblings, 0 replies; 9+ messages in thread
From: Eric W. Biederman @ 2017-01-11 14:45 UTC (permalink / raw)
  To: Aravinda Prasad
  Cc: Krister Johansen, Hari Bathini, ast, peterz, lkml, acme,
	alexander.shishkin, mingo, daniel, rostedt,
	Ananth N Mavinakayanahalli, sargun, brendan.d.gregg, rgb,
	Linux-audit

Aravinda Prasad <aravinda@linux.vnet.ibm.com> writes:

> On Wednesday 04 January 2017 02:34 PM, Krister Johansen wrote:
>> On Tue, Jan 03, 2017 at 04:57:54PM +0530, Hari Bathini wrote:
>>> On Thursday 29 December 2016 07:11 AM, Krister Johansen wrote:
>>>> On Fri, Dec 16, 2016 at 12:06:55AM +0530, Hari Bathini wrote:
>>>>> This patch-set overcomes this limitation by using cgroup identifier as
>>>>> container unique identifier. A new PERF_RECORD_NAMESPACES event that
>>>>> records namespaces related info is introduced, from which the cgroup
>>>>> namespace's device & inode numbers are used as cgroup identifier. This
>>>>> is based on the assumption that each container is created with it's own
>>>>> cgroup namespace allowing assessment/analysis of multiple containers
>>>>> using cgroup identifier.
>>>> Why choose cgroups when the kernel dispenses namespace-unique
>>>> identifiers. Cgroup membership can be arbitrary.  Moreover, cgroup and
>>>
>>> Agreed. But doesn't that hold for any other namespace or a combination
>>> of namespaces as well?
>> 
>> I guess that's part of my concern.  There is no container-unique
>> identifier on the system, since the notion of containers is a construct
>> of higer-level software.  
>
> I wish we had a container-unique identifier. A container-unique
> identifier will make things a lot more better, not just for
> container-aware tracing but for audit subsystem as well.
>
> https://lwn.net/Articles/699819/#Comments

Something like the audit login id might be useful for some things, but
don't expect it to cover all containers, or all usecases so something
like that would need a more specific name than container id.

Any such identifier needs to handle the case of nested containers, and
of container migration from one system to another.

The issue generally appears to be that we have plent of ids that can
serve that purpose but there is insufficient agreement on what
constitues a container.

So at this point just no.  You quoted my technical description of why
there does not exist such a thing as a container id, and have completely
ignored the technical reasons.

Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/3] perf: add support for analyzing events for containers
  2017-01-04  9:04     ` Krister Johansen
  2017-01-04 11:45       ` Hari Bathini
@ 2017-01-11 11:16       ` Aravinda Prasad
  2017-01-11 14:45         ` Eric W. Biederman
  1 sibling, 1 reply; 9+ messages in thread
From: Aravinda Prasad @ 2017-01-11 11:16 UTC (permalink / raw)
  To: Krister Johansen, Hari Bathini
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	brendan.d.gregg, rgb, Linux-audit



On Wednesday 04 January 2017 02:34 PM, Krister Johansen wrote:
> On Tue, Jan 03, 2017 at 04:57:54PM +0530, Hari Bathini wrote:
>> On Thursday 29 December 2016 07:11 AM, Krister Johansen wrote:
>>> On Fri, Dec 16, 2016 at 12:06:55AM +0530, Hari Bathini wrote:
>>>> This patch-set overcomes this limitation by using cgroup identifier as
>>>> container unique identifier. A new PERF_RECORD_NAMESPACES event that
>>>> records namespaces related info is introduced, from which the cgroup
>>>> namespace's device & inode numbers are used as cgroup identifier. This
>>>> is based on the assumption that each container is created with it's own
>>>> cgroup namespace allowing assessment/analysis of multiple containers
>>>> using cgroup identifier.
>>> Why choose cgroups when the kernel dispenses namespace-unique
>>> identifiers. Cgroup membership can be arbitrary.  Moreover, cgroup and
>>
>> Agreed. But doesn't that hold for any other namespace or a combination
>> of namespaces as well?
> 
> I guess that's part of my concern.  There is no container-unique
> identifier on the system, since the notion of containers is a construct
> of higer-level software.  

I wish we had a container-unique identifier. A container-unique
identifier will make things a lot more better, not just for
container-aware tracing but for audit subsystem as well.

https://lwn.net/Articles/699819/#Comments

-- 
Regards,
Aravinda

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/3] perf: add support for analyzing events for containers
  2017-01-04  9:04     ` Krister Johansen
@ 2017-01-04 11:45       ` Hari Bathini
  2017-01-11 11:16       ` Aravinda Prasad
  1 sibling, 0 replies; 9+ messages in thread
From: Hari Bathini @ 2017-01-04 11:45 UTC (permalink / raw)
  To: Krister Johansen
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg



On Wednesday 04 January 2017 02:34 PM, Krister Johansen wrote:
> On Tue, Jan 03, 2017 at 04:57:54PM +0530, Hari Bathini wrote:
>> On Thursday 29 December 2016 07:11 AM, Krister Johansen wrote:
>>> On Fri, Dec 16, 2016 at 12:06:55AM +0530, Hari Bathini wrote:
>>>> This patch-set overcomes this limitation by using cgroup identifier as
>>>> container unique identifier. A new PERF_RECORD_NAMESPACES event that
>>>> records namespaces related info is introduced, from which the cgroup
>>>> namespace's device & inode numbers are used as cgroup identifier. This
>>>> is based on the assumption that each container is created with it's own
>>>> cgroup namespace allowing assessment/analysis of multiple containers
>>>> using cgroup identifier.
>>> Why choose cgroups when the kernel dispenses namespace-unique
>>> identifiers. Cgroup membership can be arbitrary.  Moreover, cgroup and
>> Agreed. But doesn't that hold for any other namespace or a combination
>> of namespaces as well?
> I guess that's part of my concern.  There is no container-unique
> identifier on the system, since the notion of containers is a construct
> of higer-level software.  You're depending on the fact that some popular
> container software packages put their processes in separate cgroups.
> Some of the stranger problems I've debugged with containers involve
> abuses of nsenter(1) and shared subtrees.  In cases like that, if you
> filter by cgroup you may miss other interfering processes that are in
> one or more of the namespaces associated with the container, but not its
> cgroup.  It's possible I misunderstood.  Is the cgroup id being used to
> filter events, or just for display purposes?

All namespaces info for all threads is captured in perf.data with
PERF_RECORD_NAMESPACES events, which can be used
for post processing. We used cgroup namespace's id in
patch 3/3 for reporting, which can be improved later..

Thanks
Hari

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/3] perf: add support for analyzing events for containers
  2017-01-03 11:27   ` Hari Bathini
@ 2017-01-04  9:04     ` Krister Johansen
  2017-01-04 11:45       ` Hari Bathini
  2017-01-11 11:16       ` Aravinda Prasad
  0 siblings, 2 replies; 9+ messages in thread
From: Krister Johansen @ 2017-01-04  9:04 UTC (permalink / raw)
  To: Hari Bathini
  Cc: Krister Johansen, ast, peterz, lkml, acme, alexander.shishkin,
	mingo, daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm,
	sargun, Aravinda Prasad, brendan.d.gregg

On Tue, Jan 03, 2017 at 04:57:54PM +0530, Hari Bathini wrote:
> On Thursday 29 December 2016 07:11 AM, Krister Johansen wrote:
> >On Fri, Dec 16, 2016 at 12:06:55AM +0530, Hari Bathini wrote:
> >>This patch-set overcomes this limitation by using cgroup identifier as
> >>container unique identifier. A new PERF_RECORD_NAMESPACES event that
> >>records namespaces related info is introduced, from which the cgroup
> >>namespace's device & inode numbers are used as cgroup identifier. This
> >>is based on the assumption that each container is created with it's own
> >>cgroup namespace allowing assessment/analysis of multiple containers
> >>using cgroup identifier.
> >Why choose cgroups when the kernel dispenses namespace-unique
> >identifiers. Cgroup membership can be arbitrary.  Moreover, cgroup and
> 
> Agreed. But doesn't that hold for any other namespace or a combination
> of namespaces as well?

I guess that's part of my concern.  There is no container-unique
identifier on the system, since the notion of containers is a construct
of higer-level software.  You're depending on the fact that some popular
container software packages put their processes in separate cgroups.
Some of the stranger problems I've debugged with containers involve
abuses of nsenter(1) and shared subtrees.  In cases like that, if you
filter by cgroup you may miss other interfering processes that are in
one or more of the namespaces associated with the container, but not its
cgroup.  It's possible I misunderstood.  Is the cgroup id being used to
filter events, or just for display purposes?

-K

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/3] perf: add support for analyzing events for containers
  2016-12-29  1:41 ` Krister Johansen
@ 2017-01-03 11:27   ` Hari Bathini
  2017-01-04  9:04     ` Krister Johansen
  0 siblings, 1 reply; 9+ messages in thread
From: Hari Bathini @ 2017-01-03 11:27 UTC (permalink / raw)
  To: Krister Johansen
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg

Hi Krister,

Thanks for reviewing..


On Thursday 29 December 2016 07:11 AM, Krister Johansen wrote:
> On Fri, Dec 16, 2016 at 12:06:55AM +0530, Hari Bathini wrote:
>> This patch-set overcomes this limitation by using cgroup identifier as
>> container unique identifier. A new PERF_RECORD_NAMESPACES event that
>> records namespaces related info is introduced, from which the cgroup
>> namespace's device & inode numbers are used as cgroup identifier. This
>> is based on the assumption that each container is created with it's own
>> cgroup namespace allowing assessment/analysis of multiple containers
>> using cgroup identifier.
> Why choose cgroups when the kernel dispenses namespace-unique
> identifiers. Cgroup membership can be arbitrary.  Moreover, cgroup and

Agreed. But doesn't that hold for any other namespace or a combination
of namespaces as well?

> namespace destruction are handled by separate subsystems.  It's possible
> to have a cgroup notifier run prior to network namespace teardown
> occurring.
> If it were me, I'd re-use existing convention to identify the namespaces
> you want to monitor.  The code in nsenter(1) can take a namespace that's
> been bind mount'd on a file, or extract the ns information from a task
> in /procfs.

As PERF_RECORD_NAMESPACES gets all namespaces info, I assume your 
reservation
is with how perf report (patch 3/3) is processed, which could be 
improved to consider
other namespaces as well, once PERF_RECORD_NAMESPACES is in.

> My biggest concern is how the sample data is handled after it has been
> collected.  Both namespaces and cgroups don't survive reboots.  Will the
> records will contain all the persistent state needed to run a report or
> script command at a later date?

Yes. Sideband events are emitted when a new thread is created or an existing
thread's namespaces are changed, which can be post processed for reporting.

> Does this code attempt to enter alternate namespaces in order to record
> stack/symbol information for a '-g' style trace?  If so, how are you
> holding on to that information?  There's no guarantee that a particular
> container will be alive or have its filesystems reachable from the host
> if the trace data is evaluated at a later time.

I am not altering the existing behavior of -g option with these patches..

Thanks
Hari

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/3] perf: add support for analyzing events for containers
  2016-12-15 18:36 Hari Bathini
@ 2016-12-29  1:41 ` Krister Johansen
  2017-01-03 11:27   ` Hari Bathini
  0 siblings, 1 reply; 9+ messages in thread
From: Krister Johansen @ 2016-12-29  1:41 UTC (permalink / raw)
  To: Hari Bathini
  Cc: ast, peterz, lkml, acme, alexander.shishkin, mingo, daniel,
	rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg

On Fri, Dec 16, 2016 at 12:06:55AM +0530, Hari Bathini wrote:
> This patch-set overcomes this limitation by using cgroup identifier as
> container unique identifier. A new PERF_RECORD_NAMESPACES event that
> records namespaces related info is introduced, from which the cgroup
> namespace's device & inode numbers are used as cgroup identifier. This
> is based on the assumption that each container is created with it's own
> cgroup namespace allowing assessment/analysis of multiple containers
> using cgroup identifier.

Why choose cgroups when the kernel dispenses namespace-unique
identifiers. Cgroup membership can be arbitrary.  Moreover, cgroup and
namespace destruction are handled by separate subsystems.  It's possible
to have a cgroup notifier run prior to network namespace teardown
occurring.

If it were me, I'd re-use existing convention to identify the namespaces
you want to monitor.  The code in nsenter(1) can take a namespace that's
been bind mount'd on a file, or extract the ns information from a task
in /procfs.

My biggest concern is how the sample data is handled after it has been
collected.  Both namespaces and cgroups don't survive reboots.  Will the
records will contain all the persistent state needed to run a report or
script command at a later date?

Does this code attempt to enter alternate namespaces in order to record
stack/symbol information for a '-g' style trace?  If so, how are you
holding on to that information?  There's no guarantee that a particular
container will be alive or have its filesystems reachable from the host
if the trace data is evaluated at a later time.

-K

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 0/3] perf: add support for analyzing events for containers
@ 2016-12-15 18:36 Hari Bathini
  2016-12-29  1:41 ` Krister Johansen
  0 siblings, 1 reply; 9+ messages in thread
From: Hari Bathini @ 2016-12-15 18:36 UTC (permalink / raw)
  To: ast, peterz, lkml, acme, alexander.shishkin, mingo
  Cc: daniel, rostedt, Ananth N Mavinakayanahalli, ebiederm, sargun,
	Aravinda Prasad, brendan.d.gregg

Currently, there is no trivial mechanism to analyze events based on
containers. perf -G can be used, but it will not filter events for the
containers created after perf is invoked, making it difficult to assess/
analyze performance issues of multiple containers at once.

This patch-set overcomes this limitation by using cgroup identifier as
container unique identifier. A new PERF_RECORD_NAMESPACES event that
records namespaces related info is introduced, from which the cgroup
namespace's device & inode numbers are used as cgroup identifier. This
is based on the assumption that each container is created with it's own
cgroup namespace allowing assessment/analysis of multiple containers
using cgroup identifier.

The first patch introduces PERF_RECORD_NAMESPACES in kernel while the
second patch makes the corresponding changes in perf tool to read this
PERF_RECORD_NAMESPACES events. The third patch adds a cgroup identifier
column in perf report, which contains the cgroup namespace's device and
inode numbers.

Changes from v3:
* Saving device number for each inode.
* cgroup identifier includes device number along with inode number.

---

Hari Bathini (3):
      perf: add PERF_RECORD_NAMESPACES to include namespaces related info
      perf tool: add PERF_RECORD_NAMESPACES to include namespaces related info
      perf tool: add cgroup identifier entry in perf report


 include/linux/perf_event.h            |    2 
 include/uapi/linux/perf_event.h       |   31 +++++++
 kernel/events/core.c                  |  135 ++++++++++++++++++++++++++++++++
 kernel/fork.c                         |    3 +
 kernel/nsproxy.c                      |    5 +
 tools/include/uapi/linux/perf_event.h |   31 +++++++
 tools/perf/builtin-annotate.c         |    1 
 tools/perf/builtin-diff.c             |    1 
 tools/perf/builtin-inject.c           |   14 +++
 tools/perf/builtin-kmem.c             |    1 
 tools/perf/builtin-kvm.c              |    2 
 tools/perf/builtin-lock.c             |    1 
 tools/perf/builtin-mem.c              |    1 
 tools/perf/builtin-record.c           |   33 +++++++-
 tools/perf/builtin-report.c           |    1 
 tools/perf/builtin-sched.c            |    1 
 tools/perf/builtin-script.c           |   41 ++++++++++
 tools/perf/builtin-trace.c            |    3 -
 tools/perf/perf.h                     |    1 
 tools/perf/util/Build                 |    1 
 tools/perf/util/data-convert-bt.c     |    2 
 tools/perf/util/event.c               |  138 ++++++++++++++++++++++++++++++++-
 tools/perf/util/event.h               |   18 ++++
 tools/perf/util/evsel.c               |    3 +
 tools/perf/util/hist.c                |    7 ++
 tools/perf/util/hist.h                |    1 
 tools/perf/util/machine.c             |   25 ++++++
 tools/perf/util/machine.h             |    3 +
 tools/perf/util/namespaces.c          |   27 ++++++
 tools/perf/util/namespaces.h          |   18 ++++
 tools/perf/util/session.c             |    7 ++
 tools/perf/util/sort.c                |   41 ++++++++++
 tools/perf/util/sort.h                |    7 ++
 tools/perf/util/thread.c              |   44 ++++++++++-
 tools/perf/util/thread.h              |    6 +
 tools/perf/util/tool.h                |    2 
 36 files changed, 643 insertions(+), 15 deletions(-)
 create mode 100644 tools/perf/util/namespaces.c
 create mode 100644 tools/perf/util/namespaces.h

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-01-11 14:49 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-16 12:14 [PATCH v4 0/3] perf: add support for analyzing events for containers Alban Crequy
2016-12-16 18:26 ` Hari Bathini
  -- strict thread matches above, loose matches on Subject: below --
2016-12-15 18:36 Hari Bathini
2016-12-29  1:41 ` Krister Johansen
2017-01-03 11:27   ` Hari Bathini
2017-01-04  9:04     ` Krister Johansen
2017-01-04 11:45       ` Hari Bathini
2017-01-11 11:16       ` Aravinda Prasad
2017-01-11 14:45         ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).