* [PATCH 0/15] task_diag: add a new interface to get information about processes (v3)
@ 2016-04-11 23:35 Andrey Vagin
  2016-04-11 23:35 ` [PATCH 01/15] proc: pick out a function to iterate task children Andrey Vagin
                   ` (14 more replies)
  0 siblings, 15 replies; 21+ messages in thread
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

The current interface is a bunch of files in /proc/PID. While this appears to be
simple, there are a number of problems with it.

* Lots of syscalls

  At least three syscalls are required per PID: open(), read(), and
  close().

* Variety of formats

  There are many different formats used by files in the /proc/PID/ hierarchy.
  Therefore, a parser has to be written for each such format.

* Non-extendable formats

  Some formats in /proc/PID are non-extendable. For example, the last column
  (file name) of /proc/PID/maps is optional, so there is no way to add more
  columns without breaking the format.

* Slow read due to extra info

  Sometimes getting information is slow due to extra attributes that are not
  always needed. For example, /proc/PID/smaps contains the VmFlags field (which
  can't be added to /proc/PID/maps, see the previous item), but it also contains
  page stats that take a long time to generate.

	$ time cat /proc/*/maps > /dev/null
	real	0m0.061s
	user	0m0.002s
	sys	0m0.059s


	$ time cat /proc/*/smaps > /dev/null
	real	0m0.253s
	user	0m0.004s
	sys	0m0.247s
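
The per-PID syscall cost from the first item can be sketched as follows. This
is only an illustration: read_proc_stat() is a hypothetical helper, not part
of this series, and it assumes procfs is mounted on /proc.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read /proc/<pid>/stat into buf as a NUL-terminated
 * string. Returns the number of bytes read, or -1 on error.
 * Every call costs three syscalls, which is the overhead described above.
 */
static ssize_t read_proc_stat(pid_t pid, char *buf, size_t len)
{
	char path[64];
	ssize_t n;
	int fd;

	assert(buf != NULL && len > 0);
	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);

	fd = open(path, O_RDONLY);	/* syscall 1 */
	if (fd < 0)
		return -1;
	n = read(fd, buf, len - 1);	/* syscall 2 */
	close(fd);			/* syscall 3 */
	if (n < 0)
		return -1;

	buf[n] = '\0';
	return n;
}
```

Calling this once per task is what produces the roughly 30K open(), read()
and close() calls in the perf trace summary for task_proc_all further down.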

Proposed solution
-----------------

The proposed solution is the /proc/task-diag file, which operates based on the
following principles:

* Transactional: write request, read response
* Netlink message format (same as used by sock_diag; binary and extendable)
* Ability to specify a set of processes to get info about
* Optimal grouping of attributes
  No attribute in a group should noticeably slow down the response
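
The write-then-read transaction can be sketched in user space as follows.
This is a sketch under assumptions, not code from this series:
task_diag_query() is a hypothetical helper, the caller is assumed to pass the
path of the transaction file and an already-built request, and the 16 KB
buffer simply mirrors the cap used by task_diag_read() in patch 04.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one request and drain the response.
 * Returns the total number of response bytes, or -1 on error.
 */
static long task_diag_query(const char *path, const void *req, size_t req_len)
{
	char buf[16384];
	long total = 0;
	ssize_t n;
	int fd;

	assert(path != NULL && req != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	/* One write() carries the whole request. */
	if (write(fd, req, req_len) != (ssize_t)req_len) {
		close(fd);
		return -1;
	}

	/* Read until the kernel reports the end of the response with 0. */
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += n;	/* a real consumer would parse buf here */

	close(fd);
	return n < 0 ? -1 : total;
}
```

One descriptor serves the whole dump, so the number of syscalls depends on
the size of the response rather than on the number of tasks.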

The user-kernel interface is encapsulated in include/uapi/linux/task_diag.h

A request is described by the task_diag_pid structure:

struct task_diag_pid {
       __u64   show_flags;	/* specify which information is required */
       __u64   dump_strategy;   /* specify a group of processes */

       __u32   pid;
};

dump_strategy specifies a group of processes:
/* system wide strategies (the pid field is ignored) */
TASK_DIAG_DUMP_ALL	  - all processes
TASK_DIAG_DUMP_ALL_THREAD - all threads
/* per-process strategies */
TASK_DIAG_DUMP_CHILDREN	 - all children
TASK_DIAG_DUMP_THREAD	 - all threads
TASK_DIAG_DUMP_ONE	 - one process

show_flags specifies which information is required.  If we set the
TASK_DIAG_SHOW_BASE flag, the response message will contain the TASK_DIAG_BASE
attribute which is described by the task_diag_base structure.

struct task_diag_base {
	__u32	tgid;
	__u32	pid;
	__u32	ppid;
	__u32	tpid;
	__u32	sid;
	__u32	pgid;
	__u8	state;
	char	comm[TASK_DIAG_COMM_LEN];
};

In the future, it can be extended with optional attributes. The request
describes which task properties are required and for which processes.
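
Building such a request might look like the sketch below. The constants and
the structure are copied from the uapi header added in patch 04, because that
header is not available on a stock system. build_request() is a hypothetical
helper, and the values chosen for nlmsg_type, nlmsg_flags and nlmsg_seq are
assumptions: the kernel code in patch 04 only checks the message length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/types.h>
#include <linux/netlink.h>

/* Copied from include/uapi/linux/task_diag.h as added by patch 04. */
#define TASK_DIAG_CMD_GET	0xd101U
#define TASK_DIAG_SHOW_BASE	(1ULL << 0)
#define TASK_DIAG_DUMP_ALL	0
#define TASK_DIAG_DUMP_ONE	1

struct task_diag_pid {
	__u64	show_flags;
	__u64	dump_strategy;
	__u32	pid;
};

/*
 * Hypothetical helper: place a netlink header followed by a task_diag_pid
 * payload into buf, which must be suitably aligned for struct nlmsghdr.
 * Returns the message length, or -1 if buf is too small.
 */
static int build_request(void *buf, size_t buflen, __u64 strategy, __u32 pid)
{
	struct nlmsghdr *nlh = buf;
	struct task_diag_pid req;

	assert(buf != NULL);
	if (buflen < NLMSG_SPACE(sizeof(req)))
		return -1;

	memset(buf, 0, NLMSG_SPACE(sizeof(req)));
	memset(&req, 0, sizeof(req));
	req.show_flags = TASK_DIAG_SHOW_BASE;	/* ask for task_diag_base */
	req.dump_strategy = strategy;
	req.pid = pid;				/* ignored by DUMP_ALL */

	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(req));
	nlh->nlmsg_type = TASK_DIAG_CMD_GET;	/* assumption, see above */
	nlh->nlmsg_flags = NLM_F_REQUEST;	/* assumption, see above */
	nlh->nlmsg_seq = 1;			/* arbitrary sequence number */
	memcpy(NLMSG_DATA(nlh), &req, sizeof(req));

	return (int)nlh->nlmsg_len;
}
```

The resulting buffer is what would be passed to write() on the transaction
file.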

A response can be divided into a few netlink packets. Each task is described
by a netlink message. If all information about a process doesn't fit into a
message, the TASK_DIAG_FLAG_CONT flag will be set and the next message will
continue describing the same process.
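
Walking the messages in one chunk of a response could be sketched like this,
using the standard NLMSG_OK()/NLMSG_NEXT() macros from <linux/netlink.h>.
count_messages() is a hypothetical helper. It does not look at
TASK_DIAG_FLAG_CONT, so a task whose description spans several messages is
counted once per message.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: count the complete netlink messages stored in the
 * first len bytes of buf. A truncated trailing message is not counted.
 */
static int count_messages(const void *buf, int len)
{
	const struct nlmsghdr *nlh = buf;
	int count = 0;

	assert(buf != NULL);

	/* NLMSG_NEXT() also subtracts the consumed bytes from len. */
	for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len))
		count++;

	return count;
}
```

A real consumer would decode each message inside the loop instead of only
counting it.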

task_diag is much faster than the proc file system. We don't need to create
a new file descriptor for each task; we only need to send a request and get a
response. Information about several tasks can be fetched in one
request-response iteration.

As for security, task_diag always behaves like procfs mounted with hidepid=2
(the highest level of security).

I have compared performance of procfs and task-diag for the
"ps ax -o pid,ppid" command.

ps uses /proc/PID/* files:
$ time ./ps/pscommand ax | wc -l
50089

real    0m1.596s
user    0m0.475s
sys     0m1.126s

ps uses the task_diag interface
$ time ./ps/pscommand ax | wc -l
50089

real    0m0.148s
user    0m0.069s
sys     0m0.086s

Read /proc/PID/stat for 30K tasks:
$ time ./task_proc_all > /dev/null

real	0m0.258s
user	0m0.019s
sys	0m0.232s

Get the same information via task_diag:
$ time ./task_diag_all > /dev/null

real	0m0.052s
user	0m0.013s
sys	0m0.036s

And here are statistics on syscalls which were called by each
command.

$ perf trace -s -o log -- ./task_proc_all > /dev/null

 Summary of events:

 task_proc_all (30781), 180785 events, 100.0%, 0.000 msec

   syscall            calls      min       avg       max      stddev
                               (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- ---------     ------
   read               30111     0.000     0.013     0.107      0.21%
   write                  1     0.008     0.008     0.008      0.00%
   open               30111     0.007     0.012     0.145      0.24%
   close              30112     0.004     0.011     0.110      0.20%
   fstat                  3     0.009     0.013     0.016     16.15%
   mmap                   8     0.011     0.020     0.027     11.24%
   mprotect               4     0.019     0.023     0.028      8.33%
   munmap                 1     0.026     0.026     0.026      0.00%
   brk                    8     0.007     0.015     0.024     11.94%
   ioctl                  1     0.007     0.007     0.007      0.00%
   access                 1     0.019     0.019     0.019      0.00%
   execve                 1     0.000     0.000     0.000      0.00%
   getdents              29     0.008     1.010     2.215      8.88%
   arch_prctl             1     0.016     0.016     0.016      0.00%
   openat                 1     0.021     0.021     0.021      0.00%


$ perf trace -s -o log -- ./task_diag_all > /dev/null
 Summary of events:

 task_diag_all (30762), 717 events, 98.9%, 0.000 msec

   syscall            calls      min       avg       max      stddev
                               (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- ---------     ------
   read                   2     0.000     0.008     0.016    100.00%
   write                197     0.008     0.019     0.041      3.00%
   open                   2     0.023     0.029     0.036     22.45%
   close                  3     0.010     0.012     0.014     11.34%
   fstat                  3     0.012     0.044     0.106     70.52%
   mmap                   8     0.014     0.031     0.054     18.88%
   mprotect               4     0.016     0.023     0.027     10.93%
   munmap                 1     0.022     0.022     0.022      0.00%
   brk                    1     0.040     0.040     0.040      0.00%
   ioctl                  1     0.011     0.011     0.011      0.00%
   access                 1     0.032     0.032     0.032      0.00%
   getpid                 1     0.012     0.012     0.012      0.00%
   socket                 1     0.032     0.032     0.032      0.00%
   sendto                 2     0.032     0.095     0.157     65.77%
   recvfrom             129     0.009     0.235     0.418      2.45%
   bind                   1     0.018     0.018     0.018      0.00%
   execve                 1     0.000     0.000     0.000      0.00%
   arch_prctl             1     0.012     0.012     0.012      0.00%

You can find the test programs from this experiment in tools/test/selftest/task_diag.

The idea of this functionality was suggested by Pavel Emelyanov (xemul@),
when he found that operations on /proc form a significant part
of checkpointing time.

Ten years ago there was an attempt to add a netlink interface to access /proc
information:
http://lwn.net/Articles/99600/

Links
-----

kernel: https://github.com/avagin/linux-task-diag
procps: https://github.com/avagin/procps-task-diag
wiki: https://criu.org/Task-diag

Changes from the first version:
-------------------------------

David Ahern implemented all required functionality to use task_diag in
perf.

Below are his results showing how it affects performance.
> Using the fork test command:
>    10,000 processes; 10k proc with 5 threads = 50,000 tasks
>    reading /proc: 11.3 sec
>    task_diag:      2.2 sec
>
> @7,440 tasks, reading /proc is at 0.77 sec and task_diag at 0.096
>
> 128 instances of specjbb, 80,000+ tasks:
>     reading /proc: 32.1 sec
>     task_diag:      3.9 sec
>
> So overall much snappier startup times.

Many thanks to David Ahern for the help with improving task_diag.

Changes from the second version:
--------------------------------

Use a proc transaction file instead of the netlink interface.
Andy Lutomirski pointed out security problems related to netlink sockets:

> Slightly off-topic, but this netlink is really rather bad as an
> example of how fds can be used as capabilities (in the real capability
> sense, not the Linux capabilities sense).  You call socket and get a
> socket.  That socket captures f_cred.  Then you drop privs, and you
> assume that the socket you're holding on to retains the right to do
> certain things.
>
> This breaks pretty badly when, through things such as this patch set,
> existing code that creates netlink sockets suddenly starts capturing
> brand-new rights that didn't exist as part of a netlink socket before.

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Pavel Odintsov <pavel.odintsov@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
--
2.1.0

* [PATCH 01/15] proc: pick out a function to iterate task children
  2016-04-11 23:35 [PATCH 0/15] task_diag: add a new interface to get information about processes (v3) Andrey Vagin
@ 2016-04-11 23:35 ` Andrey Vagin
  2016-04-11 23:35 ` [PATCH 02/15] proc: export task_first_tid() and task_next_tid() Andrey Vagin
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

This function will be used in task_diag.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/proc/array.c    | 53 +++++++++++++++++++++++++++++++++--------------------
 fs/proc/internal.h |  3 +++
 2 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index b6c00ce..3eceab1 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -593,31 +593,25 @@ int proc_pid_statm(struct seq_file *m, struct pid_namespace *ns,
 }
 
 #ifdef CONFIG_PROC_CHILDREN
-static struct pid *
-get_children_pid(struct inode *inode, struct pid *pid_prev, loff_t pos)
+struct task_struct *
+task_next_child(struct task_struct *parent, struct task_struct *prev, unsigned int pos)
 {
-	struct task_struct *start, *task;
-	struct pid *pid = NULL;
-
-	read_lock(&tasklist_lock);
-
-	start = pid_task(proc_pid(inode), PIDTYPE_PID);
-	if (!start)
-		goto out;
+	struct task_struct *task;
 
 	/*
 	 * Lets try to continue searching first, this gives
 	 * us significant speedup on children-rich processes.
 	 */
-	if (pid_prev) {
-		task = pid_task(pid_prev, PIDTYPE_PID);
-		if (task && task->real_parent == start &&
+	if (prev) {
+		task = prev;
+		if (task && task->real_parent == parent &&
 		    !(list_empty(&task->sibling))) {
-			if (list_is_last(&task->sibling, &start->children))
+			if (list_is_last(&task->sibling, &parent->children)) {
+				task = NULL;
 				goto out;
+			}
 			task = list_first_entry(&task->sibling,
 						struct task_struct, sibling);
-			pid = get_pid(task_pid(task));
 			goto out;
 		}
 	}
@@ -637,12 +631,31 @@ get_children_pid(struct inode *inode, struct pid *pid_prev, loff_t pos)
 	 * So one need to stop or freeze the leader and all
 	 * its children to get a precise result.
 	 */
-	list_for_each_entry(task, &start->children, sibling) {
-		if (pos-- == 0) {
-			pid = get_pid(task_pid(task));
-			break;
-		}
+	list_for_each_entry(task, &parent->children, sibling) {
+		if (pos-- == 0)
+			goto out;
 	}
+	task = NULL;
+out:
+	return task;
+}
+
+static struct pid *
+get_children_pid(struct inode *inode, struct pid *prev_pid, loff_t pos)
+{
+	struct task_struct *start, *task, *prev;
+	struct pid *pid = NULL;
+
+	read_lock(&tasklist_lock);
+	start = pid_task(proc_pid(inode), PIDTYPE_PID);
+	if (!start)
+		goto out;
+
+	prev = prev_pid ? pid_task(prev_pid, PIDTYPE_PID) : NULL;
+
+	task = task_next_child(start, prev, pos);
+	if (task)
+		pid = get_pid(task_pid(task));
 
 out:
 	read_unlock(&tasklist_lock);
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index aa27810..969e05b 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -303,3 +303,6 @@ extern unsigned long task_statm(struct mm_struct *,
 				unsigned long *, unsigned long *,
 				unsigned long *, unsigned long *);
 extern void task_mem(struct seq_file *, struct mm_struct *);
+
+struct task_struct *
+task_next_child(struct task_struct *parent, struct task_struct *prev, unsigned int pos);
-- 
2.5.5

* [PATCH 02/15] proc: export task_first_tid() and task_next_tid()
  2016-04-11 23:35 [PATCH 0/15] task_diag: add a new interface to get information about processes (v3) Andrey Vagin
  2016-04-11 23:35 ` [PATCH 01/15] proc: pick out a function to iterate task children Andrey Vagin
@ 2016-04-11 23:35 ` Andrey Vagin
  2016-04-11 23:35 ` [PATCH 03/15] proc: export next_tgid() Andrey Vagin
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

These functions are exported because they will be used in
task_diag.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/proc/base.c     | 8 ++++----
 fs/proc/internal.h | 4 ++++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index b1755b2..614f1d0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3341,7 +3341,7 @@ out_no_task:
  * In the case of a seek we start with the leader and walk nr
  * threads past it.
  */
-static struct task_struct *first_tid(struct pid *pid, int tid, loff_t f_pos,
+struct task_struct *task_first_tid(struct pid *pid, int tid, loff_t f_pos,
 					struct pid_namespace *ns)
 {
 	struct task_struct *pos, *task;
@@ -3390,7 +3390,7 @@ out:
  *
  * The reference to the input task_struct is released.
  */
-static struct task_struct *next_tid(struct task_struct *start)
+struct task_struct *task_next_tid(struct task_struct *start)
 {
 	struct task_struct *pos = NULL;
 	rcu_read_lock();
@@ -3426,9 +3426,9 @@ static int proc_task_readdir(struct file *file, struct dir_context *ctx)
 	ns = inode->i_sb->s_fs_info;
 	tid = (int)file->f_version;
 	file->f_version = 0;
-	for (task = first_tid(proc_pid(inode), tid, ctx->pos - 2, ns);
+	for (task = task_first_tid(proc_pid(inode), tid, ctx->pos - 2, ns);
 	     task;
-	     task = next_tid(task), ctx->pos++) {
+	     task = task_next_tid(task), ctx->pos++) {
 		char name[PROC_NUMBUF];
 		int len;
 		tid = task_pid_nr_ns(task, ns);
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 969e05b..49145e2 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -306,3 +306,7 @@ extern void task_mem(struct seq_file *, struct mm_struct *);
 
 struct task_struct *
 task_next_child(struct task_struct *parent, struct task_struct *prev, unsigned int pos);
+
+struct task_struct *task_first_tid(struct pid *pid, int tid, loff_t f_pos,
+					struct pid_namespace *ns);
+struct task_struct *task_next_tid(struct task_struct *start);
-- 
2.5.5

* [PATCH 03/15] proc: export next_tgid()
  2016-04-11 23:35 [PATCH 0/15] task_diag: add a new interface to get information about processes (v3) Andrey Vagin
  2016-04-11 23:35 ` [PATCH 01/15] proc: pick out a function to iterate task children Andrey Vagin
  2016-04-11 23:35 ` [PATCH 02/15] proc: export task_first_tid() and task_next_tid() Andrey Vagin
@ 2016-04-11 23:35 ` Andrey Vagin
  2016-04-11 23:35 ` [PATCH 04/15] task_diag: add a new interface to get information about tasks (v4) Andrey Vagin
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

It's going to be used in task_diag

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/proc/base.c     | 6 +-----
 fs/proc/internal.h | 6 ++++++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 614f1d0..9e5fd1c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3071,11 +3071,7 @@ out:
  * Find the first task with tgid >= tgid
  *
  */
-struct tgid_iter {
-	unsigned int tgid;
-	struct task_struct *task;
-};
-static struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter iter)
+struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter iter)
 {
 	struct pid *pid;
 
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 49145e2..2a2b1e6 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -304,6 +304,12 @@ extern unsigned long task_statm(struct mm_struct *,
 				unsigned long *, unsigned long *);
 extern void task_mem(struct seq_file *, struct mm_struct *);
 
+struct tgid_iter {
+	unsigned int tgid;
+	struct task_struct *task;
+};
+struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter iter);
+
 struct task_struct *
 task_next_child(struct task_struct *parent, struct task_struct *prev, unsigned int pos);
 
-- 
2.5.5

* [PATCH 04/15] task_diag: add a new interface to get information about tasks (v4)
  2016-04-11 23:35 [PATCH 0/15] task_diag: add a new interface to get information about processes (v3) Andrey Vagin
                   ` (2 preceding siblings ...)
  2016-04-11 23:35 ` [PATCH 03/15] proc: export next_tgid() Andrey Vagin
@ 2016-04-11 23:35 ` Andrey Vagin
  2016-04-12  1:03   ` kbuild test robot
  2016-04-12  7:08   ` Cyrill Gorcunov
  2016-04-11 23:35 ` [PATCH 05/15] task_diag: add a new group to get process credentials Andrey Vagin
                   ` (10 subsequent siblings)
  14 siblings, 2 replies; 21+ messages in thread
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

The task-diag interface allows getting information about running
processes (roughly the same info that is now available from /proc/PID/*
files). Compared to /proc/PID/*, it is faster, more flexible and
provides data in a binary format.

Task-diag was created using the basic idea of sock_diag.

Here is the /proc/task-diag file, which operates based on the following
principles:

* Transactional: write request, read response
* Netlink message format (same as used by sock_diag; binary and extendable)

A request message is described by the task_diag_pid structure:
struct task_diag_pid {
	__u64   show_flags;
	__u64   dump_strategy;

	__u32   pid;
};

A response is a set of netlink messages. Each message describes one task.
All task properties are divided into groups. A message contains the
TASK_DIAG_PID group, and other groups if they have been requested in
show_flags. For example, if show_flags contains TASK_DIAG_SHOW_BASE, a
response will contain the TASK_DIAG_BASE group which is described by the
task_diag_base structure.

struct task_diag_base {
	__u32   tgid;
	__u32   pid;
	__u32   ppid;
	__u32   tpid;
	__u32   sid;
	__u32   pgid;
	__u8    state;
	char    comm[TASK_DIAG_COMM_LEN];
};

The dump_strategy field will be used in the following patches to request
information for a group of processes.
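
Locating the base group inside one response message could be sketched as
below. The layout is an assumption derived from how task_diag_fill() builds
a message in this patch: a task_diag_msg payload written by nlmsg_put(),
followed by standard netlink attributes written by nla_reserve(). The
definitions are copied from the uapi header in this patch, and find_base()
is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/types.h>
#include <linux/netlink.h>

/* Copied from include/uapi/linux/task_diag.h in this patch. */
#define TASK_DIAG_BASE		0
#define TASK_DIAG_COMM_LEN	16

struct task_diag_msg {
	__u32 pid;
	__u32 tgid;
	__u32 flags;
};

struct task_diag_base {
	__u32	tgid;
	__u32	pid;
	__u32	ppid;
	__u32	tpid;
	__u32	sid;
	__u32	pgid;
	__u8	state;
	char	comm[TASK_DIAG_COMM_LEN];
};

/* Hypothetical helper: return the TASK_DIAG_BASE payload, or NULL. */
static const struct task_diag_base *find_base(const struct nlmsghdr *nlh)
{
	size_t hdr = NLMSG_HDRLEN + NLMSG_ALIGN(sizeof(struct task_diag_msg));
	const char *p, *end;

	assert(nlh != NULL);
	if (nlh->nlmsg_len < hdr)
		return NULL;

	p = (const char *)nlh + hdr;		/* first attribute */
	end = (const char *)nlh + nlh->nlmsg_len;

	while (p + NLA_HDRLEN <= end) {
		const struct nlattr *nla = (const struct nlattr *)p;

		/* Stop on a malformed or truncated attribute. */
		if (nla->nla_len < NLA_HDRLEN || p + nla->nla_len > end)
			break;
		if ((nla->nla_type & NLA_TYPE_MASK) == TASK_DIAG_BASE &&
		    nla->nla_len >= NLA_HDRLEN + sizeof(struct task_diag_base))
			return (const struct task_diag_base *)(p + NLA_HDRLEN);
		p += NLA_ALIGN(nla->nla_len);
	}
	return NULL;
}
```

Groups added by later patches would presumably be found the same way under
their own attribute types.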

v2: A few changes from David Ahern
    Use a consistent name
    Add max attr enum
    task diag: Send pid as u32
    Change _MSG/msg references to base
    Fix 8-byte alignment

v3: Take the pid namespace from the scm credentials, which contain the pid of
the process that sent the request. To get information from another
namespace, set the pid in scm to that of a process from that namespace.

v4: use a transaction file instead of netlink

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/proc/Kconfig                |  13 ++
 fs/proc/Makefile               |   3 +
 fs/proc/task_diag.c            | 424 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/task_diag.h |  66 +++++++
 4 files changed, 506 insertions(+)
 create mode 100644 fs/proc/task_diag.c
 create mode 100644 include/uapi/linux/task_diag.h

diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index 1ade120..ca223f5 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -81,3 +81,16 @@ config PROC_CHILDREN
 
 	  Say Y if you are running any user-space software which takes benefit from
 	  this interface. For example, rkt is such a piece of software.
+
+config TASK_DIAG
+	bool "Task-diag support (/proc/task-diag)"
+	depends on NET
+	default n
+	help
+	  Export selected properties for tasks/processes through the /proc/task-diag
+	  transaction file. Unlike the proc file system, task_diag returns
+	  information in a binary format (netlink) and allows to specify which
+	  properties are required.
+
+	  Say N if unsure.
+
diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index 7151ea4..94965b9 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -30,3 +30,6 @@ proc-$(CONFIG_PROC_KCORE)	+= kcore.o
 proc-$(CONFIG_PROC_VMCORE)	+= vmcore.o
 proc-$(CONFIG_PRINTK)	+= kmsg.o
 proc-$(CONFIG_PROC_PAGE_MONITOR)	+= page.o
+
+obj-$(CONFIG_TASK_DIAG) += task_diag.o
+
diff --git a/fs/proc/task_diag.c b/fs/proc/task_diag.c
new file mode 100644
index 0000000..3c2127e
--- /dev/null
+++ b/fs/proc/task_diag.c
@@ -0,0 +1,424 @@
+#include <linux/kernel.h>
+#include <linux/task_diag.h>
+#include <linux/pid_namespace.h>
+#include <linux/ptrace.h>
+#include <linux/proc_fs.h>
+#include <linux/sched.h>
+#include <linux/taskstats.h>
+#include <net/sock.h>
+
+struct task_diag_cb {
+	struct sk_buff		*req;
+	struct sk_buff		*resp;
+	const struct nlmsghdr	*nlh;
+	pid_t			pid;
+	int			pos;
+	int			attr;
+};
+
+/*
+ * The task state array is a strange "bitmap" of
+ * reasons to sleep. Thus "running" is zero, and
+ * you can test for combinations of others with
+ * simple bit tests.
+ */
+static const __u8 task_state_array[] = {
+	TASK_DIAG_RUNNING,
+	TASK_DIAG_INTERRUPTIBLE,
+	TASK_DIAG_UNINTERRUPTIBLE,
+	TASK_DIAG_STOPPED,
+	TASK_DIAG_TRACE_STOP,
+	TASK_DIAG_DEAD,
+	TASK_DIAG_ZOMBIE,
+};
+
+static inline const __u8 get_task_state(struct task_struct *tsk)
+{
+	unsigned int state = (tsk->state | tsk->exit_state) & TASK_REPORT;
+
+	BUILD_BUG_ON(1 + ilog2(TASK_REPORT) != ARRAY_SIZE(task_state_array)-1);
+
+	return task_state_array[fls(state)];
+}
+
+static int fill_task_base(struct task_struct *p,
+			  struct sk_buff *skb, struct pid_namespace *ns)
+{
+	struct task_diag_base *base;
+	struct nlattr *attr;
+	char tcomm[sizeof(p->comm)];
+	struct task_struct *tracer;
+
+	attr = nla_reserve(skb, TASK_DIAG_BASE, sizeof(struct task_diag_base));
+	if (!attr)
+		return -EMSGSIZE;
+
+	base = nla_data(attr);
+
+	rcu_read_lock();
+	base->ppid = pid_alive(p) ?
+		task_tgid_nr_ns(rcu_dereference(p->real_parent), ns) : 0;
+
+	base->tpid = 0;
+	tracer = ptrace_parent(p);
+	if (tracer)
+		base->tpid = task_pid_nr_ns(tracer, ns);
+
+	base->tgid = task_tgid_nr_ns(p, ns);
+	base->pid  = task_pid_nr_ns(p, ns);
+	base->sid  = task_session_nr_ns(p, ns);
+	base->pgid = task_pgrp_nr_ns(p, ns);
+
+	rcu_read_unlock();
+
+	get_task_comm(tcomm, p);
+	memset(base->comm, 0, TASK_DIAG_COMM_LEN);
+	strncpy(base->comm, tcomm, TASK_DIAG_COMM_LEN);
+
+	base->state = get_task_state(p);
+
+	return 0;
+}
+
+static int task_diag_fill(struct task_struct *tsk, struct sk_buff *skb,
+			  struct task_diag_pid *req,
+			  struct task_diag_cb *cb, struct pid_namespace *pidns,
+			  struct user_namespace *userns)
+{
+	u64 show_flags = req->show_flags;
+	struct nlmsghdr *nlh;
+	struct task_diag_msg *msg;
+	int err = 0, i = 0, n = 0;
+	int flags = 0;
+
+	if (cb) {
+		n = cb->attr;
+		flags |= NLM_F_MULTI;
+	}
+
+	nlh = nlmsg_put(skb, 0, cb->nlh->nlmsg_seq,
+			TASK_DIAG_CMD_GET, sizeof(*msg), flags);
+	if (nlh == NULL)
+		return -EMSGSIZE;
+
+	msg = nlmsg_data(nlh);
+	msg->pid  = task_pid_nr_ns(tsk, pidns);
+	msg->tgid = task_tgid_nr_ns(tsk, pidns);
+
+	if (show_flags & TASK_DIAG_SHOW_BASE) {
+		if (i >= n)
+			err = fill_task_base(tsk, skb, pidns);
+		if (err)
+			goto err;
+		i++;
+	}
+
+	nlmsg_end(skb, nlh);
+	if (cb)
+		cb->attr = 0;
+
+	return 0;
+err:
+	if (err == -EMSGSIZE && (i > n)) {
+		if (cb)
+			cb->attr = i;
+		nlmsg_end(skb, nlh);
+	} else
+		nlmsg_cancel(skb, nlh);
+
+	return err;
+}
+
+struct task_iter {
+	struct task_diag_pid	req;
+	struct pid_namespace	*ns;
+	struct task_struct	*parent;
+
+	struct task_diag_cb	*cb;
+
+	struct tgid_iter	tgid;
+	unsigned int		pos;
+	struct task_struct	*task;
+};
+
+static void iter_stop(struct task_iter *iter)
+{
+	struct task_struct *task;
+
+	if (iter->parent)
+		put_task_struct(iter->parent);
+
+	switch (iter->req.dump_strategy) {
+	case TASK_DIAG_DUMP_ALL:
+		task = iter->tgid.task;
+		break;
+	default:
+		task = iter->task;
+	}
+	if (task)
+		put_task_struct(task);
+}
+
+static struct task_struct *iter_start(struct task_iter *iter)
+{
+	if (iter->req.pid > 0) {
+		rcu_read_lock();
+		iter->parent = find_task_by_pid_ns(iter->req.pid, iter->ns);
+		if (iter->parent)
+			get_task_struct(iter->parent);
+		rcu_read_unlock();
+	}
+
+	switch (iter->req.dump_strategy) {
+	case TASK_DIAG_DUMP_ONE:
+		if (iter->parent == NULL)
+			return ERR_PTR(-ESRCH);
+		iter->pos = iter->cb->pos;
+		if (iter->pos == 0) {
+			iter->task = iter->parent;
+			iter->parent = NULL;
+		} else
+			iter->task = NULL;
+		return iter->task;
+
+	case TASK_DIAG_DUMP_ALL:
+		iter->tgid.tgid = iter->cb->pid;
+		iter->tgid.task = NULL;
+		iter->tgid = next_tgid(iter->ns, iter->tgid);
+		return iter->tgid.task;
+	}
+
+	return ERR_PTR(-EINVAL);
+}
+
+static struct task_struct *iter_next(struct task_iter *iter)
+{
+	switch (iter->req.dump_strategy) {
+	case TASK_DIAG_DUMP_ONE:
+		iter->pos++;
+		iter->cb->pos = iter->pos;
+		if (iter->task)
+			put_task_struct(iter->task);
+		iter->task = NULL;
+		return NULL;
+
+	case TASK_DIAG_DUMP_ALL:
+		iter->tgid.tgid += 1;
+		iter->tgid = next_tgid(iter->ns, iter->tgid);
+		iter->cb->pid = iter->tgid.tgid;
+		return iter->tgid.task;
+	}
+
+	return NULL;
+}
+
+static int __taskdiag_dumpit(struct task_iter *iter,
+			     struct task_diag_cb *cb, struct task_struct **start)
+{
+	struct user_namespace *userns = current_user_ns();
+	struct task_struct *task = *start;
+	int rc;
+
+	for (; task; task = iter_next(iter)) {
+		if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
+			continue;
+
+		rc = task_diag_fill(task, cb->resp, &iter->req,
+				cb, iter->ns, userns);
+		if (rc < 0) {
+			if (rc != -EMSGSIZE)
+				return rc;
+			break;
+		}
+	}
+	*start = task;
+
+	return 0;
+}
+
+static int taskdiag_dumpit(struct task_diag_cb *cb,
+				struct pid_namespace *pidns,
+				struct msghdr *msg, size_t len)
+{
+	struct sk_buff *skb = cb->resp;
+	struct task_struct *task;
+	struct task_iter iter;
+	struct nlattr *na;
+	size_t copied;
+	int err;
+
+	if (nlmsg_len(cb->nlh) < sizeof(iter.req))
+		return -EINVAL;
+
+	na = nlmsg_data(cb->nlh);
+	if (na->nla_type < 0)
+		return -EINVAL;
+
+	memcpy(&iter.req, na, sizeof(iter.req));
+
+	iter.ns     = pidns;
+	iter.cb     = cb;
+	iter.parent = NULL;
+	iter.pos    = 0;
+	iter.task   = NULL;
+
+	task = iter_start(&iter);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+
+	copied = 0;
+	while (1) {
+		err = __taskdiag_dumpit(&iter, cb, &task);
+		if (err < 0)
+			goto err;
+		if (skb->len == 0)
+			break;
+
+		err = skb_copy_datagram_msg(skb, 0, msg, skb->len);
+		if (err < 0)
+			goto err;
+
+		copied += skb->len;
+
+		skb_trim(skb, 0);
+		if (skb_tailroom(skb) + copied > len)
+			break;
+
+		if (signal_pending(current))
+			break;
+	}
+
+	iter_stop(&iter);
+	return copied;
+err:
+	iter_stop(&iter);
+	return err;
+}
+
+static ssize_t task_diag_write(struct file *f, const char __user *buf,
+						size_t len, loff_t *off)
+{
+	struct task_diag_cb *cb = f->private_data;
+	struct sk_buff *skb;
+	struct msghdr msg;
+	struct iovec iov;
+	int err;
+
+	if (cb->req)
+		return -EBUSY;
+	if (len < nlmsg_total_size(0))
+		return -EINVAL;
+
+	err = import_single_range(WRITE, (void __user *) buf, len,
+						&iov, &msg.msg_iter);
+	if (unlikely(err))
+		return err;
+
+	msg.msg_name = NULL;
+	msg.msg_control = NULL;
+	msg.msg_controllen = 0;
+	msg.msg_namelen = 0;
+	msg.msg_flags = 0;
+
+	skb = nlmsg_new(len, GFP_KERNEL);
+	if (skb == NULL)
+		return -ENOMEM;
+
+	if (memcpy_from_msg(skb_put(skb, len), &msg, len)) {
+		kfree_skb(skb);
+		return -EFAULT;
+	}
+
+	memset(cb, 0, sizeof(*cb));
+	cb->req = skb;
+	cb->nlh = nlmsg_hdr(skb);
+
+	return len;
+}
+
+static ssize_t task_diag_read(struct file *file, char __user *ubuf,
+						size_t len, loff_t *off)
+{
+	struct pid_namespace *ns = file_inode(file)->i_sb->s_fs_info;
+	struct task_diag_cb *cb = file->private_data;
+	struct iovec iov;
+	struct msghdr msg;
+	int size, err;
+
+	if (cb->req == NULL)
+		return 0;
+
+	err = import_single_range(READ, ubuf, len, &iov, &msg.msg_iter);
+	if (unlikely(err))
+		goto err;
+	msg.msg_control = NULL;
+	msg.msg_controllen = 0;
+	msg.msg_name = NULL;
+	msg.msg_namelen = 0;
+
+	if (!cb->resp) {
+		size = min_t(size_t, len, 16384);
+		cb->resp = alloc_skb(size, GFP_KERNEL);
+		if (cb->resp == NULL) {
+			err = -ENOMEM;
+			goto err;
+		}
+		/* Trim skb to allocated size. */
+		skb_reserve(cb->resp, skb_tailroom(cb->resp) - size);
+	}
+
+	err = taskdiag_dumpit(cb, ns, &msg, len);
+
+err:
+	skb_trim(cb->resp, 0);
+	if (err <= 0) {
+		kfree_skb(cb->req);
+		cb->req = NULL;
+	}
+
+	return err;
+}
+
+static int task_diag_open (struct inode *inode, struct file *f)
+{
+	f->private_data = kzalloc(sizeof(struct task_diag_cb), GFP_KERNEL);
+	if (f->private_data == NULL)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int task_diag_release(struct inode *inode, struct file *f)
+{
+	struct task_diag_cb *cb = f->private_data;
+
+	kfree_skb(cb->req);
+	kfree_skb(cb->resp);
+
+	kfree(f->private_data);
+	return 0;
+}
+
+static const struct file_operations task_diag_fops = {
+	.owner		= THIS_MODULE,
+	.open		= task_diag_open,
+	.release	= task_diag_release,
+	.write		= task_diag_write,
+	.read		= task_diag_read,
+};
+
+static __init int task_diag_init(void)
+{
+	if (!proc_create("task-diag", S_IRUGO | S_IWUGO, NULL, &task_diag_fops))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static __exit void task_diag_exit(void)
+{
+	remove_proc_entry("task-diag", NULL);
+}
+
+module_init(task_diag_init);
+module_exit(task_diag_exit);
diff --git a/include/uapi/linux/task_diag.h b/include/uapi/linux/task_diag.h
new file mode 100644
index 0000000..ba0f71a
--- /dev/null
+++ b/include/uapi/linux/task_diag.h
@@ -0,0 +1,66 @@
+#ifndef _LINUX_TASK_DIAG_H
+#define _LINUX_TASK_DIAG_H
+
+#include <linux/types.h>
+#include <linux/netlink.h>
+#include <linux/capability.h>
+
+#define TASK_DIAG_CMD_GET 0xd101U
+
+struct task_diag_msg {
+	__u32 pid;
+	__u32 tgid;
+	__u32 flags;
+};
+
+enum {
+	TASK_DIAG_BASE	= 0,
+
+	__TASK_DIAG_ATTR_MAX
+#define TASK_DIAG_ATTR_MAX (__TASK_DIAG_ATTR_MAX - 1)
+};
+
+#define TASK_DIAG_SHOW_BASE	(1ULL << TASK_DIAG_BASE)
+
+enum {
+	TASK_DIAG_RUNNING,
+	TASK_DIAG_INTERRUPTIBLE,
+	TASK_DIAG_UNINTERRUPTIBLE,
+	TASK_DIAG_STOPPED,
+	TASK_DIAG_TRACE_STOP,
+	TASK_DIAG_DEAD,
+	TASK_DIAG_ZOMBIE,
+};
+
+#define TASK_DIAG_COMM_LEN 16
+
+struct task_diag_base {
+	__u32	tgid;
+	__u32	pid;
+	__u32	ppid;
+	__u32	tpid;
+	__u32	sid;
+	__u32	pgid;
+	__u8	state;
+	char	comm[TASK_DIAG_COMM_LEN];
+};
+
+#define TASK_DIAG_DUMP_ALL	0
+#define TASK_DIAG_DUMP_ONE	1
+
+struct task_diag_pid {
+	__u64	show_flags;
+	__u64	dump_strategy;
+
+	__u32	pid;
+};
+
+enum {
+	TASK_DIAG_CMD_ATTR_UNSPEC = 0,
+	TASK_DIAG_CMD_ATTR_GET,
+	__TASK_DIAG_CMD_ATTR_MAX,
+};
+
+#define TASK_DIAG_CMD_ATTR_MAX (__TASK_DIAG_CMD_ATTR_MAX - 1)
+
+#endif /* _LINUX_TASK_DIAG_H */
-- 
2.5.5

* [PATCH 05/15] task_diag: add a new group to get process credentials
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

A response is represented by the task_diag_creds structure:

struct task_diag_creds {
   struct task_diag_caps cap_inheritable;
   struct task_diag_caps cap_permitted;
   struct task_diag_caps cap_effective;
   struct task_diag_caps cap_bset;

   __u32 uid;
   __u32 euid;
   __u32 suid;
   __u32 fsuid;
   __u32 gid;
   __u32 egid;
   __u32 sgid;
   __u32 fsgid;
};

This group is optional; it is filled in only if show_flags contains
TASK_DIAG_SHOW_CRED.
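
As an illustration of how a consumer might read this group, here is a
minimal userspace sketch. It assumes the TASK_DIAG_CRED payload has already
been located in a response. The structures are redefined locally to keep the
sketch self-contained, and diag_cap_raised() and diag_ids_differ() are
hypothetical helper names, not part of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local mirror of the structures added by this patch. Assumption: real
 * code would include <linux/task_diag.h> instead of redefining them.
 */
#define DIAG_CAP_U32S 2	/* value of _LINUX_CAPABILITY_U32S_3 */

struct task_diag_caps {
	uint32_t cap[DIAG_CAP_U32S];
};

struct task_diag_creds {
	struct task_diag_caps cap_inheritable;
	struct task_diag_caps cap_permitted;
	struct task_diag_caps cap_effective;
	struct task_diag_caps cap_bset;

	uint32_t uid, euid, suid, fsuid;
	uint32_t gid, egid, sgid, fsgid;
};

/* Hypothetical helper: is capability number nr raised in this set? */
static bool diag_cap_raised(const struct task_diag_caps *caps, unsigned int nr)
{
	assert(caps != NULL);
	if (nr >= 32 * DIAG_CAP_U32S)
		return false;
	/* Each __u32 word carries 32 capability bits, lowest number first. */
	return (caps->cap[nr / 32] >> (nr % 32)) & 1U;
}

/* Hypothetical helper: do the real and effective IDs differ? */
static bool diag_ids_differ(const struct task_diag_creds *c)
{
	assert(c != NULL);
	return c->uid != c->euid || c->gid != c->egid;
}
```

The IDs are already translated into the reader's user namespace by the
kernel side (from_kuid_munged() and from_kgid_munged() in fill_creds), so a
consumer can compare them directly.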

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/proc/task_diag.c            | 50 ++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/task_diag.h | 21 ++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/fs/proc/task_diag.c b/fs/proc/task_diag.c
index 3c2127e..fc31771 100644
--- a/fs/proc/task_diag.c
+++ b/fs/proc/task_diag.c
@@ -80,6 +80,48 @@ static int fill_task_base(struct task_struct *p,
 	return 0;
 }
 
+static inline void caps2diag(struct task_diag_caps *diag, const kernel_cap_t *cap)
+{
+	int i;
+
+	for (i = 0; i < _LINUX_CAPABILITY_U32S_3; i++)
+		diag->cap[i] = cap->cap[i];
+}
+
+static int fill_creds(struct task_struct *p, struct sk_buff *skb,
+					struct user_namespace *user_ns)
+{
+	struct task_diag_creds *diag_cred;
+	const struct cred *cred;
+	struct nlattr *attr;
+
+	attr = nla_reserve(skb, TASK_DIAG_CRED, sizeof(struct task_diag_creds));
+	if (!attr)
+		return -EMSGSIZE;
+
+	diag_cred = nla_data(attr);
+
+	cred = get_task_cred(p);
+
+	caps2diag(&diag_cred->cap_inheritable, &cred->cap_inheritable);
+	caps2diag(&diag_cred->cap_permitted, &cred->cap_permitted);
+	caps2diag(&diag_cred->cap_effective, &cred->cap_effective);
+	caps2diag(&diag_cred->cap_bset, &cred->cap_bset);
+
+	diag_cred->uid   = from_kuid_munged(user_ns, cred->uid);
+	diag_cred->euid  = from_kuid_munged(user_ns, cred->euid);
+	diag_cred->suid  = from_kuid_munged(user_ns, cred->suid);
+	diag_cred->fsuid = from_kuid_munged(user_ns, cred->fsuid);
+	diag_cred->gid   = from_kgid_munged(user_ns, cred->gid);
+	diag_cred->egid  = from_kgid_munged(user_ns, cred->egid);
+	diag_cred->sgid  = from_kgid_munged(user_ns, cred->sgid);
+	diag_cred->fsgid = from_kgid_munged(user_ns, cred->fsgid);
+
+	put_cred(cred);
+
+	return 0;
+}
+
 static int task_diag_fill(struct task_struct *tsk, struct sk_buff *skb,
 			  struct task_diag_pid *req,
 			  struct task_diag_cb *cb, struct pid_namespace *pidns,
@@ -113,6 +155,14 @@ static int task_diag_fill(struct task_struct *tsk, struct sk_buff *skb,
 		i++;
 	}
 
+	if (show_flags & TASK_DIAG_SHOW_CRED) {
+		if (i >= n)
+			err = fill_creds(tsk, skb, userns);
+		if (err)
+			goto err;
+		i++;
+	}
+
 	nlmsg_end(skb, nlh);
 	if (cb)
 		cb->attr = 0;
diff --git a/include/uapi/linux/task_diag.h b/include/uapi/linux/task_diag.h
index ba0f71a..ea500c6 100644
--- a/include/uapi/linux/task_diag.h
+++ b/include/uapi/linux/task_diag.h
@@ -15,12 +15,14 @@ struct task_diag_msg {
 
 enum {
 	TASK_DIAG_BASE	= 0,
+	TASK_DIAG_CRED,
 
 	__TASK_DIAG_ATTR_MAX
 #define TASK_DIAG_ATTR_MAX (__TASK_DIAG_ATTR_MAX - 1)
 };
 
 #define TASK_DIAG_SHOW_BASE	(1ULL << TASK_DIAG_BASE)
+#define TASK_DIAG_SHOW_CRED	(1ULL << TASK_DIAG_CRED)
 
 enum {
 	TASK_DIAG_RUNNING,
@@ -45,6 +47,25 @@ struct task_diag_base {
 	char	comm[TASK_DIAG_COMM_LEN];
 };
 
+struct task_diag_caps {
+	__u32 cap[_LINUX_CAPABILITY_U32S_3];
+};
+
+struct task_diag_creds {
+	struct task_diag_caps cap_inheritable;
+	struct task_diag_caps cap_permitted;
+	struct task_diag_caps cap_effective;
+	struct task_diag_caps cap_bset;
+
+	__u32 uid;
+	__u32 euid;
+	__u32 suid;
+	__u32 fsuid;
+	__u32 gid;
+	__u32 egid;
+	__u32 sgid;
+	__u32 fsgid;
+};
 #define TASK_DIAG_DUMP_ALL	0
 #define TASK_DIAG_DUMP_ONE	1
 
-- 
2.5.5

* [PATCH 06/15] task_diag: add a new group to get tasks memory mappings (v2)
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

v2: Fixes from David Ahern
* Fix 8-byte alignment
* Change the implementation of the TASK_DIAG_VMA attribute:

This patch stores the filename together with each task_diag_vma struct and
converts the TASK_DIAG_VMA attribute into a series of task_diag_vma records.
There is now a single TASK_DIAG_VMA attribute that is parsed
as:

| struct task_diag_vma | filename | ...
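
A minimal userspace sketch of walking such a payload is shown here. It
assumes the attribute payload has already been extracted into a byte buffer.
The struct is redefined locally for self-containment, and the diag_vma_*
helpers are hypothetical names. Headers are copied out with memcpy() so the
walk does not depend on the buffer's alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct task_diag_vma from this patch (assumption: real
 * code would take it from <linux/task_diag.h>). */
struct task_diag_vma {
	uint64_t start, end;
	uint64_t vm_flags;
	uint64_t pgoff;
	uint32_t major;
	uint32_t minor;
	uint64_t inode;
	uint32_t generation;
	uint16_t vma_len;
	uint16_t name_off;
	uint16_t name_len;
	uint16_t stat_off;
	uint16_t stat_len;
};

/* Copy the record header at off into hdr. Returns 1 on success, 0 at the
 * end of the payload and -1 if the record is malformed. */
static int diag_vma_read(const unsigned char *payload, size_t len,
			 size_t off, struct task_diag_vma *hdr)
{
	assert(payload != NULL && off <= len);
	if (off == len)
		return 0;
	if (len - off < sizeof(*hdr))
		return -1;
	memcpy(hdr, payload + off, sizeof(*hdr));
	if (hdr->vma_len < sizeof(*hdr) || hdr->vma_len > len - off)
		return -1;
	/* A name, if present, must lie inside the record and end in NUL. */
	if (hdr->name_len &&
	    ((size_t)hdr->name_off + hdr->name_len > hdr->vma_len ||
	     payload[off + hdr->name_off + hdr->name_len - 1] != '\0'))
		return -1;
	return 1;
}

/* Number of records in the payload, or -1 if it is malformed. */
static int diag_vma_count(const unsigned char *payload, size_t len)
{
	struct task_diag_vma hdr;
	size_t off = 0;
	int n = 0, rc;

	while ((rc = diag_vma_read(payload, len, off, &hdr)) == 1) {
		n++;
		off += hdr.vma_len;	/* vma_len covers header, stat, name */
	}
	return rc < 0 ? -1 : n;
}

/* Name of the mapping containing addr, or NULL if none or unnamed. */
static const char *diag_vma_name_at(const unsigned char *payload, size_t len,
				    uint64_t addr)
{
	struct task_diag_vma hdr;
	size_t off = 0;

	while (diag_vma_read(payload, len, off, &hdr) == 1) {
		if (addr >= hdr.start && addr < hdr.end)
			return hdr.name_len ?
				(const char *)payload + off + hdr.name_off : NULL;
		off += hdr.vma_len;
	}
	return NULL;
}
```

The bounds checks are a design choice of this sketch; the
task_diag_for_each_vma macro in the header trusts vma_len as written by the
kernel.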

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/proc/internal.h             |  21 ++++
 fs/proc/task_diag.c            | 279 ++++++++++++++++++++++++++++++++++++++++-
 fs/proc/task_mmu.c             |  18 +--
 include/uapi/linux/task_diag.h |  85 +++++++++++++
 4 files changed, 385 insertions(+), 18 deletions(-)

diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 2a2b1e6..75b57a3 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -316,3 +316,24 @@ task_next_child(struct task_struct *parent, struct task_struct *prev, unsigned i
 struct task_struct *task_first_tid(struct pid *pid, int tid, loff_t f_pos,
 					struct pid_namespace *ns);
 struct task_struct *task_next_tid(struct task_struct *start);
+
+struct mem_size_stats {
+	unsigned long resident;
+	unsigned long shared_clean;
+	unsigned long shared_dirty;
+	unsigned long private_clean;
+	unsigned long private_dirty;
+	unsigned long referenced;
+	unsigned long anonymous;
+	unsigned long anonymous_thp;
+	unsigned long swap;
+	unsigned long shared_hugetlb;
+	unsigned long private_hugetlb;
+	u64 pss;
+	u64 swap_pss;
+	bool check_shmem_swap;
+};
+
+struct mm_walk;
+int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+			   struct mm_walk *walk);
diff --git a/fs/proc/task_diag.c b/fs/proc/task_diag.c
index fc31771..9c1ed45 100644
--- a/fs/proc/task_diag.c
+++ b/fs/proc/task_diag.c
@@ -7,6 +7,8 @@
 #include <linux/taskstats.h>
 #include <net/sock.h>
 
+#include "internal.h"
+
 struct task_diag_cb {
 	struct sk_buff		*req;
 	struct sk_buff		*resp;
@@ -14,6 +16,11 @@ struct task_diag_cb {
 	pid_t			pid;
 	int			pos;
 	int			attr;
+	union { /* per-attribute */
+		struct {
+			unsigned long mark;
+		} vma;
+	};
 };
 
 /*
@@ -122,6 +129,267 @@ static int fill_creds(struct task_struct *p, struct sk_buff *skb,
 	return 0;
 }
 
+static u64 get_vma_flags(struct vm_area_struct *vma)
+{
+	u64 flags = 0;
+
+	static const u64 mnemonics[BITS_PER_LONG] = {
+		/*
+		 * In case if we meet a flag we don't know about.
+		 */
+		[0 ... (BITS_PER_LONG-1)] = 0,
+
+		[ilog2(VM_READ)]	= TASK_DIAG_VMA_F_READ,
+		[ilog2(VM_WRITE)]	= TASK_DIAG_VMA_F_WRITE,
+		[ilog2(VM_EXEC)]	= TASK_DIAG_VMA_F_EXEC,
+		[ilog2(VM_SHARED)]	= TASK_DIAG_VMA_F_SHARED,
+		[ilog2(VM_MAYREAD)]	= TASK_DIAG_VMA_F_MAYREAD,
+		[ilog2(VM_MAYWRITE)]	= TASK_DIAG_VMA_F_MAYWRITE,
+		[ilog2(VM_MAYEXEC)]	= TASK_DIAG_VMA_F_MAYEXEC,
+		[ilog2(VM_MAYSHARE)]	= TASK_DIAG_VMA_F_MAYSHARE,
+		[ilog2(VM_GROWSDOWN)]	= TASK_DIAG_VMA_F_GROWSDOWN,
+		[ilog2(VM_PFNMAP)]	= TASK_DIAG_VMA_F_PFNMAP,
+		[ilog2(VM_DENYWRITE)]	= TASK_DIAG_VMA_F_DENYWRITE,
+#ifdef CONFIG_X86_INTEL_MPX
+		[ilog2(VM_MPX)]		= TASK_DIAG_VMA_F_MPX,
+#endif
+		[ilog2(VM_LOCKED)]	= TASK_DIAG_VMA_F_LOCKED,
+		[ilog2(VM_IO)]		= TASK_DIAG_VMA_F_IO,
+		[ilog2(VM_SEQ_READ)]	= TASK_DIAG_VMA_F_SEQ_READ,
+		[ilog2(VM_RAND_READ)]	= TASK_DIAG_VMA_F_RAND_READ,
+		[ilog2(VM_DONTCOPY)]	= TASK_DIAG_VMA_F_DONTCOPY,
+		[ilog2(VM_DONTEXPAND)]	= TASK_DIAG_VMA_F_DONTEXPAND,
+		[ilog2(VM_ACCOUNT)]	= TASK_DIAG_VMA_F_ACCOUNT,
+		[ilog2(VM_NORESERVE)]	= TASK_DIAG_VMA_F_NORESERVE,
+		[ilog2(VM_HUGETLB)]	= TASK_DIAG_VMA_F_HUGETLB,
+		[ilog2(VM_ARCH_1)]	= TASK_DIAG_VMA_F_ARCH_1,
+		[ilog2(VM_DONTDUMP)]	= TASK_DIAG_VMA_F_DONTDUMP,
+#ifdef CONFIG_MEM_SOFT_DIRTY
+		[ilog2(VM_SOFTDIRTY)]	= TASK_DIAG_VMA_F_SOFTDIRTY,
+#endif
+		[ilog2(VM_MIXEDMAP)]	= TASK_DIAG_VMA_F_MIXEDMAP,
+		[ilog2(VM_HUGEPAGE)]	= TASK_DIAG_VMA_F_HUGEPAGE,
+		[ilog2(VM_NOHUGEPAGE)]	= TASK_DIAG_VMA_F_NOHUGEPAGE,
+		[ilog2(VM_MERGEABLE)]	= TASK_DIAG_VMA_F_MERGEABLE,
+	};
+	size_t i;
+
+	for (i = 0; i < BITS_PER_LONG; i++) {
+		if (vma->vm_flags & (1UL << i))
+			flags |= mnemonics[i];
+	}
+
+	return flags;
+}
+
+/*
+ * use a tmp variable and copy to input arg to deal with
+ * alignment issues. diag_vma contains u64 elements which
+ * means extended load operations can be used and those can
+ * require 8-byte alignment (e.g., sparc)
+ */
+static void fill_diag_vma(struct vm_area_struct *vma,
+			  struct task_diag_vma *diag_vma)
+{
+	struct task_diag_vma tmp;
+
+	/* We don't show the stack guard page in /proc/maps */
+	tmp.start = vma->vm_start;
+	if (stack_guard_page_start(vma, tmp.start))
+		tmp.start += PAGE_SIZE;
+
+	tmp.end = vma->vm_end;
+	if (stack_guard_page_end(vma, tmp.end))
+		tmp.end -= PAGE_SIZE;
+	tmp.vm_flags = get_vma_flags(vma);
+
+	if (vma->vm_file) {
+		struct inode *inode = file_inode(vma->vm_file);
+		dev_t dev;
+
+		dev = inode->i_sb->s_dev;
+		tmp.major = MAJOR(dev);
+		tmp.minor = MINOR(dev);
+		tmp.inode = inode->i_ino;
+		tmp.generation = inode->i_generation;
+		tmp.pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
+	} else {
+		tmp.major = 0;
+		tmp.minor = 0;
+		tmp.inode = 0;
+		tmp.generation = 0;
+		tmp.pgoff = 0;
+	}
+
+	memcpy(diag_vma, &tmp, sizeof(*diag_vma));
+}
+
+static const char *get_vma_name(struct vm_area_struct *vma, char *page)
+{
+	const char *name = NULL;
+
+	if (vma->vm_file) {
+		name = d_path(&vma->vm_file->f_path, page, PAGE_SIZE);
+		goto out;
+	}
+
+	if (vma->vm_ops && vma->vm_ops->name) {
+		name = vma->vm_ops->name(vma);
+		if (name)
+			goto out;
+	}
+
+	name = arch_vma_name(vma);
+
+out:
+	return name;
+}
+
+static void fill_diag_vma_stat(struct vm_area_struct *vma,
+				struct task_diag_vma_stat *stat)
+{
+	struct task_diag_vma_stat tmp;
+	struct mem_size_stats mss;
+	struct mm_walk smaps_walk = {
+		.pmd_entry = smaps_pte_range,
+		.mm = vma->vm_mm,
+		.private = &mss,
+	};
+
+	memset(&mss, 0, sizeof(mss));
+	memset(&tmp, 0, sizeof(tmp));
+
+	/* mmap_sem is held in m_start */
+	walk_page_vma(vma, &smaps_walk);
+
+	tmp.resident		= mss.resident;
+	tmp.pss			= mss.pss;
+	tmp.shared_clean	= mss.shared_clean;
+	tmp.private_clean	= mss.private_clean;
+	tmp.private_dirty	= mss.private_dirty;
+	tmp.referenced		= mss.referenced;
+	tmp.anonymous		= mss.anonymous;
+	tmp.anonymous_thp	= mss.anonymous_thp;
+	tmp.swap		= mss.swap;
+
+	memcpy(stat, &tmp, sizeof(*stat));
+}
+
+static int fill_vma(struct task_struct *p, struct sk_buff *skb,
+		    struct task_diag_cb *cb, bool *progress, u64 show_flags)
+{
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	struct nlattr *attr = NULL;
+	struct task_diag_vma *diag_vma;
+	unsigned long mark = 0;
+	char *page;
+	int i, rc = -EMSGSIZE, size;
+
+	if (cb)
+		mark = cb->vma.mark;
+
+	mm = p->mm;
+	if (!mm || !atomic_inc_not_zero(&mm->mm_users))
+		return 0;
+
+	page = (char *)__get_free_page(GFP_TEMPORARY);
+	if (!page) {
+		mmput(mm);
+		return -ENOMEM;
+	}
+
+	size = NLA_ALIGN(sizeof(struct task_diag_vma));
+	if (show_flags & TASK_DIAG_SHOW_VMA_STAT)
+		size += NLA_ALIGN(sizeof(struct task_diag_vma_stat));
+
+	down_read(&mm->mmap_sem);
+	for (vma = mm->mmap; vma; vma = vma->vm_next, i++) {
+		unsigned char *b = skb_tail_pointer(skb);
+		const char *name;
+		void *pfile;
+
+
+		if (mark >= vma->vm_start)
+			continue;
+
+		/* setup pointer for next map */
+		if (attr == NULL) {
+			attr = nla_reserve(skb, TASK_DIAG_VMA, size);
+			if (!attr)
+				goto err;
+
+			diag_vma = nla_data(attr);
+		} else {
+			diag_vma = nla_reserve_nohdr(skb, size);
+
+			if (diag_vma == NULL) {
+				nlmsg_trim(skb, b);
+				goto out;
+			}
+		}
+
+		fill_diag_vma(vma, diag_vma);
+
+		if (show_flags & TASK_DIAG_SHOW_VMA_STAT) {
+			struct task_diag_vma_stat *stat;
+
+			stat = (void *) diag_vma + NLA_ALIGN(sizeof(*diag_vma));
+
+			fill_diag_vma_stat(vma, stat);
+			diag_vma->stat_len = sizeof(struct task_diag_vma_stat);
+			diag_vma->stat_off = (void *) stat - (void *)diag_vma;
+		} else {
+			diag_vma->stat_len = 0;
+			diag_vma->stat_off = 0;
+		}
+
+		name = get_vma_name(vma, page);
+		if (IS_ERR(name)) {
+			nlmsg_trim(skb, b);
+			rc = PTR_ERR(name);
+			goto out;
+		}
+
+		if (name) {
+			diag_vma->name_len = strlen(name) + 1;
+
+			/* reserves NLA_ALIGN(len) */
+			pfile = nla_reserve_nohdr(skb, diag_vma->name_len);
+			if (pfile == NULL) {
+				nlmsg_trim(skb, b);
+				goto out;
+			}
+			diag_vma->name_off = pfile - (void *) diag_vma;
+			memcpy(pfile, name, diag_vma->name_len);
+		} else {
+			diag_vma->name_len = 0;
+			diag_vma->name_off = 0;
+		}
+
+		mark = vma->vm_start;
+
+		diag_vma->vma_len = skb_tail_pointer(skb) - (unsigned char *) diag_vma;
+
+		*progress = true;
+	}
+
+	rc = 0;
+	mark = 0;
+out:
+	if (*progress)
+		attr->nla_len = skb_tail_pointer(skb) - (unsigned char *) attr;
+
+err:
+	up_read(&mm->mmap_sem);
+	mmput(mm);
+	free_page((unsigned long) page);
+	if (cb)
+		cb->vma.mark = mark;
+
+	return rc;
+}
+
 static int task_diag_fill(struct task_struct *tsk, struct sk_buff *skb,
 			  struct task_diag_pid *req,
 			  struct task_diag_cb *cb, struct pid_namespace *pidns,
@@ -131,6 +399,7 @@ static int task_diag_fill(struct task_struct *tsk, struct sk_buff *skb,
 	struct nlmsghdr *nlh;
 	struct task_diag_msg *msg;
 	int err = 0, i = 0, n = 0;
+	bool progress = false;
 	int flags = 0;
 
 	if (cb) {
@@ -163,13 +432,21 @@ static int task_diag_fill(struct task_struct *tsk, struct sk_buff *skb,
 		i++;
 	}
 
+	if (show_flags & TASK_DIAG_SHOW_VMA) {
+		if (i >= n)
+			err = fill_vma(tsk, skb, cb, &progress, show_flags);
+		if (err)
+			goto err;
+		i++;
+	}
+
 	nlmsg_end(skb, nlh);
 	if (cb)
 		cb->attr = 0;
 
 	return 0;
 err:
-	if (err == -EMSGSIZE && (i > n)) {
+	if (err == -EMSGSIZE && (i > n || progress)) {
 		if (cb)
 			cb->attr = i;
 		nlmsg_end(skb, nlh);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 229cb54..211147e 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -439,22 +439,6 @@ const struct file_operations proc_tid_maps_operations = {
 #define PSS_SHIFT 12
 
 #ifdef CONFIG_PROC_PAGE_MONITOR
-struct mem_size_stats {
-	unsigned long resident;
-	unsigned long shared_clean;
-	unsigned long shared_dirty;
-	unsigned long private_clean;
-	unsigned long private_dirty;
-	unsigned long referenced;
-	unsigned long anonymous;
-	unsigned long anonymous_thp;
-	unsigned long swap;
-	unsigned long shared_hugetlb;
-	unsigned long private_hugetlb;
-	u64 pss;
-	u64 swap_pss;
-	bool check_shmem_swap;
-};
 
 static void smaps_account(struct mem_size_stats *mss, struct page *page,
 		bool compound, bool young, bool dirty)
@@ -586,7 +570,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
 }
 #endif
 
-static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			   struct mm_walk *walk)
 {
 	struct vm_area_struct *vma = walk->vma;
diff --git a/include/uapi/linux/task_diag.h b/include/uapi/linux/task_diag.h
index ea500c6..3486f2f 100644
--- a/include/uapi/linux/task_diag.h
+++ b/include/uapi/linux/task_diag.h
@@ -16,6 +16,8 @@ struct task_diag_msg {
 enum {
 	TASK_DIAG_BASE	= 0,
 	TASK_DIAG_CRED,
+	TASK_DIAG_VMA,
+	TASK_DIAG_VMA_STAT,
 
 	__TASK_DIAG_ATTR_MAX
 #define TASK_DIAG_ATTR_MAX (__TASK_DIAG_ATTR_MAX - 1)
@@ -23,6 +25,8 @@ enum {
 
 #define TASK_DIAG_SHOW_BASE	(1ULL << TASK_DIAG_BASE)
 #define TASK_DIAG_SHOW_CRED	(1ULL << TASK_DIAG_CRED)
+#define TASK_DIAG_SHOW_VMA	(1ULL << TASK_DIAG_VMA)
+#define TASK_DIAG_SHOW_VMA_STAT	(1ULL << TASK_DIAG_VMA_STAT)
 
 enum {
 	TASK_DIAG_RUNNING,
@@ -66,6 +70,87 @@ struct task_diag_creds {
 	__u32 sgid;
 	__u32 fsgid;
 };
+
+#define TASK_DIAG_VMA_F_READ		(1ULL <<  0)
+#define TASK_DIAG_VMA_F_WRITE		(1ULL <<  1)
+#define TASK_DIAG_VMA_F_EXEC		(1ULL <<  2)
+#define TASK_DIAG_VMA_F_SHARED		(1ULL <<  3)
+#define TASK_DIAG_VMA_F_MAYREAD		(1ULL <<  4)
+#define TASK_DIAG_VMA_F_MAYWRITE	(1ULL <<  5)
+#define TASK_DIAG_VMA_F_MAYEXEC		(1ULL <<  6)
+#define TASK_DIAG_VMA_F_MAYSHARE	(1ULL <<  7)
+#define TASK_DIAG_VMA_F_GROWSDOWN	(1ULL <<  8)
+#define TASK_DIAG_VMA_F_PFNMAP		(1ULL <<  9)
+#define TASK_DIAG_VMA_F_DENYWRITE	(1ULL << 10)
+#define TASK_DIAG_VMA_F_MPX		(1ULL << 11)
+#define TASK_DIAG_VMA_F_LOCKED		(1ULL << 12)
+#define TASK_DIAG_VMA_F_IO		(1ULL << 13)
+#define TASK_DIAG_VMA_F_SEQ_READ	(1ULL << 14)
+#define TASK_DIAG_VMA_F_RAND_READ	(1ULL << 15)
+#define TASK_DIAG_VMA_F_DONTCOPY	(1ULL << 16)
+#define TASK_DIAG_VMA_F_DONTEXPAND	(1ULL << 17)
+#define TASK_DIAG_VMA_F_ACCOUNT		(1ULL << 18)
+#define TASK_DIAG_VMA_F_NORESERVE	(1ULL << 19)
+#define TASK_DIAG_VMA_F_HUGETLB		(1ULL << 20)
+#define TASK_DIAG_VMA_F_ARCH_1		(1ULL << 21)
+#define TASK_DIAG_VMA_F_DONTDUMP	(1ULL << 22)
+#define TASK_DIAG_VMA_F_SOFTDIRTY	(1ULL << 23)
+#define TASK_DIAG_VMA_F_MIXEDMAP	(1ULL << 24)
+#define TASK_DIAG_VMA_F_HUGEPAGE	(1ULL << 25)
+#define TASK_DIAG_VMA_F_NOHUGEPAGE	(1ULL << 26)
+#define TASK_DIAG_VMA_F_MERGEABLE	(1ULL << 27)
+
+struct task_diag_vma_stat {
+	__u64 resident;
+	__u64 shared_clean;
+	__u64 shared_dirty;
+	__u64 private_clean;
+	__u64 private_dirty;
+	__u64 referenced;
+	__u64 anonymous;
+	__u64 anonymous_thp;
+	__u64 swap;
+	__u64 pss;
+} __attribute__((__aligned__(NLA_ALIGNTO)));
+
+/* task_diag_vma must be NLA_ALIGN'ed */
+struct task_diag_vma {
+	__u64 start, end;
+	__u64 vm_flags;
+	__u64 pgoff;
+	__u32 major;
+	__u32 minor;
+	__u64 inode;
+	__u32 generation;
+	__u16 vma_len;
+	__u16 name_off;
+	__u16 name_len;
+	__u16 stat_off;
+	__u16 stat_len;
+} __attribute__((__aligned__(NLA_ALIGNTO)));
+
+static inline char *task_diag_vma_name(struct task_diag_vma *vma)
+{
+	if (!vma->name_len)
+		return NULL;
+
+	return ((char *)vma) + vma->name_off;
+}
+
+static inline
+struct task_diag_vma_stat *task_diag_vma_stat(struct task_diag_vma *vma)
+{
+	if (!vma->stat_len)
+		return NULL;
+
+	return ((void *)vma) + vma->stat_off;
+}
+
+#define task_diag_for_each_vma(vma, attr)			\
+	for (vma = nla_data(attr);				\
+		(void *) vma < nla_data(attr) + nla_len(attr);	\
+		vma = (void *) vma + vma->vma_len)
+
 #define TASK_DIAG_DUMP_ALL	0
 #define TASK_DIAG_DUMP_ONE	1
 
-- 
2.5.5

* [PATCH 07/15] task_diag: add ability to dump children and threads
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

Now we can dump all tasks, or the children or threads of a specified task.
This is an example of how the interface can be extended for different
use cases.
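
To make the strategies concrete, here is a minimal userspace sketch that
fills in a struct task_diag_pid request. It only prepares the structure;
wrapping it in a netlink message and writing it to the proc file is not
shown. The struct and constants are redefined locally, diag_fill_request()
is a hypothetical helper, and the rule that the per-task strategies need a
non-zero pid is an assumption drawn from the -ESRCH checks in iter_start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the uapi definitions (assumption: real code would take
 * them from <linux/task_diag.h>). */
#define TASK_DIAG_DUMP_ALL		0
#define TASK_DIAG_DUMP_ONE		1
#define TASK_DIAG_DUMP_ALL_THREAD	2
#define TASK_DIAG_DUMP_CHILDREN		3
#define TASK_DIAG_DUMP_THREAD		4

#define TASK_DIAG_SHOW_BASE	(1ULL << 0)
#define TASK_DIAG_SHOW_CRED	(1ULL << 1)

struct task_diag_pid {
	uint64_t show_flags;
	uint64_t dump_strategy;
	uint32_t pid;
};

/*
 * Hypothetical helper: prepare a request for the given strategy.
 * Returns 0 on success and -1 for an unknown strategy or a missing pid.
 */
static int diag_fill_request(struct task_diag_pid *req, uint64_t strategy,
			     uint32_t pid, uint64_t show_flags)
{
	assert(req != NULL);

	switch (strategy) {
	case TASK_DIAG_DUMP_ALL:
	case TASK_DIAG_DUMP_ALL_THREAD:
		pid = 0;	/* system-wide dumps do not name a task */
		break;
	case TASK_DIAG_DUMP_ONE:
	case TASK_DIAG_DUMP_CHILDREN:
	case TASK_DIAG_DUMP_THREAD:
		if (pid == 0)
			return -1;	/* these need a target task */
		break;
	default:
		return -1;
	}

	memset(req, 0, sizeof(*req));
	req->show_flags = show_flags;
	req->dump_strategy = strategy;
	req->pid = pid;
	return 0;
}
```

For example, diag_fill_request(&req, TASK_DIAG_DUMP_CHILDREN, 1,
TASK_DIAG_SHOW_BASE) would describe a request for the base group of every
child of pid 1.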

v2: Fixes from David Ahern
* Add missing break in iter_stop
* Fix 8-byte alignment issues

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/proc/task_diag.c            | 93 ++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/task_diag.h |  7 +++-
 2 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_diag.c b/fs/proc/task_diag.c
index 9c1ed45..e0f0b03 100644
--- a/fs/proc/task_diag.c
+++ b/fs/proc/task_diag.c
@@ -479,6 +479,12 @@ static void iter_stop(struct task_iter *iter)
 	case TASK_DIAG_DUMP_ALL:
 		task = iter->tgid.task;
 		break;
+	case TASK_DIAG_DUMP_ALL_THREAD:
+		/* release both tgid task and thread task */
+		if (iter->task)
+			put_task_struct(iter->task);
+		task = iter->tgid.task;
+		break;
 	default:
 		task = iter->task;
 	}
@@ -486,6 +492,23 @@ static void iter_stop(struct task_iter *iter)
 		put_task_struct(task);
 }
 
+static struct task_struct *
+task_diag_next_child(struct task_struct *parent,
+			struct task_struct *prev, unsigned int pos)
+{
+	struct task_struct *task;
+
+	read_lock(&tasklist_lock);
+	task = task_next_child(parent, prev, pos);
+	if (prev)
+		put_task_struct(prev);
+	if (task)
+		get_task_struct(task);
+	read_unlock(&tasklist_lock);
+
+	return task;
+}
+
 static struct task_struct *iter_start(struct task_iter *iter)
 {
 	if (iter->req.pid > 0) {
@@ -508,11 +531,48 @@ static struct task_struct *iter_start(struct task_iter *iter)
 			iter->task = NULL;
 		return iter->task;
 
+	case TASK_DIAG_DUMP_THREAD:
+		if (iter->parent == NULL)
+			return ERR_PTR(-ESRCH);
+
+		iter->pos = iter->cb->pos;
+		iter->task = task_first_tid(task_pid(iter->parent),
+					    iter->cb->pid,iter->pos, iter->ns);
+		return iter->task;
+
+	case TASK_DIAG_DUMP_CHILDREN:
+		if (iter->parent == NULL)
+			return ERR_PTR(-ESRCH);
+
+		iter->pos = iter->cb->pos;
+		iter->task = task_diag_next_child(iter->parent, NULL, iter->pos);
+		return iter->task;
+
 	case TASK_DIAG_DUMP_ALL:
 		iter->tgid.tgid = iter->cb->pid;
 		iter->tgid.task = NULL;
 		iter->tgid = next_tgid(iter->ns, iter->tgid);
 		return iter->tgid.task;
+
+	case TASK_DIAG_DUMP_ALL_THREAD:
+		iter->pos = iter->cb->pos;
+		iter->tgid.tgid = iter->cb->pid;
+		iter->tgid.task = NULL;
+		iter->tgid = next_tgid(iter->ns, iter->tgid);
+		if (!iter->tgid.task)
+			return NULL;
+
+		iter->task = task_first_tid(task_pid(iter->tgid.task),
+						0, iter->pos, iter->ns);
+		if (!iter->task) {
+			iter->pos = 0;
+			iter->tgid.tgid += 1;
+			iter->tgid = next_tgid(iter->ns, iter->tgid);
+			iter->task = iter->tgid.task;
+			if (iter->task)
+				get_task_struct(iter->task);
+		}
+		return iter->task;
 	}
 
 	return ERR_PTR(-EINVAL);
@@ -529,11 +589,44 @@ static struct task_struct *iter_next(struct task_iter *iter)
 		iter->task = NULL;
 		return NULL;
 
+	case TASK_DIAG_DUMP_THREAD:
+		iter->pos++;
+		iter->task = task_next_tid(iter->task);
+		iter->cb->pos = iter->pos;
+		if (iter->task)
+			iter->cb->pid = task_pid_nr_ns(iter->task, iter->ns);
+		else
+			iter->cb->pid = -1;
+		return iter->task;
+	case TASK_DIAG_DUMP_CHILDREN:
+		iter->pos++;
+		iter->task = task_diag_next_child(iter->parent, iter->task, iter->pos);
+		iter->cb->pos = iter->pos;
+		return iter->task;
+
 	case TASK_DIAG_DUMP_ALL:
 		iter->tgid.tgid += 1;
 		iter->tgid = next_tgid(iter->ns, iter->tgid);
 		iter->cb->pid = iter->tgid.tgid;
 		return iter->tgid.task;
+
+	case TASK_DIAG_DUMP_ALL_THREAD:
+		iter->pos++;
+		iter->task = task_next_tid(iter->task);
+		if (!iter->task) {
+			iter->pos = 0;
+			iter->tgid.tgid += 1;
+			iter->tgid = next_tgid(iter->ns, iter->tgid);
+			iter->task = iter->tgid.task;
+			if (iter->task)
+				get_task_struct(iter->task);
+		}
+
+		/* save current position */
+		iter->cb->pid = iter->tgid.tgid;
+		iter->cb->pos = iter->pos;
+
+		return iter->task;
 	}
 
 	return NULL;
diff --git a/include/uapi/linux/task_diag.h b/include/uapi/linux/task_diag.h
index 3486f2f..8bccd02 100644
--- a/include/uapi/linux/task_diag.h
+++ b/include/uapi/linux/task_diag.h
@@ -151,8 +151,11 @@ struct task_diag_vma_stat *task_diag_vma_stat(struct task_diag_vma *vma)
 		(void *) vma < nla_data(attr) + nla_len(attr);	\
 		vma = (void *) vma + vma->vma_len)
 
-#define TASK_DIAG_DUMP_ALL	0
-#define TASK_DIAG_DUMP_ONE	1
+#define TASK_DIAG_DUMP_ALL		0
+#define TASK_DIAG_DUMP_ONE		1
+#define TASK_DIAG_DUMP_ALL_THREAD	2
+#define TASK_DIAG_DUMP_CHILDREN		3
+#define TASK_DIAG_DUMP_THREAD		4
 
 struct task_diag_pid {
 	__u64	show_flags;
-- 
2.5.5

* [PATCH 08/15] task_diag: Only add VMAs for thread_group leader
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

From: David Ahern <dsahern@gmail.com>

Threads of a process share the same VMAs, so when dumping all threads
of all processes, only emit VMA data for the group leader.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/proc/task_diag.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/proc/task_diag.c b/fs/proc/task_diag.c
index e0f0b03..00db32d 100644
--- a/fs/proc/task_diag.c
+++ b/fs/proc/task_diag.c
@@ -433,7 +433,17 @@ static int task_diag_fill(struct task_struct *tsk, struct sk_buff *skb,
 	}
 
 	if (show_flags & TASK_DIAG_SHOW_VMA) {
-		if (i >= n)
+		bool dump_vma = true;
+
+		/* if the request is to dump all threads of all processes
+		 * only show VMAs for group leader.
+		 */
+		if ((req->dump_strategy == TASK_DIAG_DUMP_ALL_THREAD ||
+		     req->dump_strategy == TASK_DIAG_DUMP_THREAD) &&
+		    !thread_group_leader(tsk))
+			dump_vma = false;
+
+		if (dump_vma && i >= n)
 			err = fill_vma(tsk, skb, cb, &progress, show_flags);
 		if (err)
 			goto err;
-- 
2.5.5

* [PATCH 09/15] task_diag: add a flag to mark incomplete messages
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

If all the information about a process doesn't fit in one message, the
message is marked with the TASK_DIAG_FLAG_CONT flag and the next message
describes the same process.
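
A reader therefore has to treat a run of messages as one logical record.
Below is a minimal userspace sketch of that grouping, assuming the
task_diag_msg headers have already been pulled out of the response in order.
The struct is redefined locally and diag_count_tasks() is a hypothetical
helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the uapi definitions (assumption: real code would take
 * them from <linux/task_diag.h>). */
#define TASK_DIAG_FLAG_CONT 0x00000001

struct task_diag_msg {
	uint32_t pid;
	uint32_t tgid;
	uint32_t flags;
};

/*
 * Hypothetical helper: count complete task records in a message sequence.
 * A message with TASK_DIAG_FLAG_CONT set is continued by the next one,
 * which must carry the same pid. Returns the number of complete records,
 * or -1 if a continuation names a different task. *truncated is set to 1
 * when the sequence ends in the middle of a record.
 */
static int diag_count_tasks(const struct task_diag_msg *msgs, size_t n,
			    int *truncated)
{
	int count = 0, pending = 0;
	uint32_t pending_pid = 0;
	size_t i;

	assert(truncated != NULL);
	assert(msgs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (pending && msgs[i].pid != pending_pid)
			return -1;

		if (msgs[i].flags & TASK_DIAG_FLAG_CONT) {
			/* More data for this task follows. */
			pending = 1;
			pending_pid = msgs[i].pid;
		} else {
			/* Last (or only) message for this task. */
			pending = 0;
			count++;
		}
	}

	*truncated = pending;
	return count;
}
```

A truncated result would suggest issuing another read() before treating the
last task's data as complete.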

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/proc/task_diag.c            | 3 +++
 include/uapi/linux/task_diag.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/fs/proc/task_diag.c b/fs/proc/task_diag.c
index 00db32d..cd10374 100644
--- a/fs/proc/task_diag.c
+++ b/fs/proc/task_diag.c
@@ -415,6 +415,7 @@ static int task_diag_fill(struct task_struct *tsk, struct sk_buff *skb,
 	msg = nlmsg_data(nlh);
 	msg->pid  = task_pid_nr_ns(tsk, pidns);
 	msg->tgid = task_tgid_nr_ns(tsk, pidns);
+	msg->flags |= TASK_DIAG_FLAG_CONT;
 
 	if (show_flags & TASK_DIAG_SHOW_BASE) {
 		if (i >= n)
@@ -450,6 +451,8 @@ static int task_diag_fill(struct task_struct *tsk, struct sk_buff *skb,
 		i++;
 	}
 
+	msg->flags &= ~TASK_DIAG_FLAG_CONT;
+
 	nlmsg_end(skb, nlh);
 	if (cb)
 		cb->attr = 0;
diff --git a/include/uapi/linux/task_diag.h b/include/uapi/linux/task_diag.h
index 8bccd02..e967c5b 100644
--- a/include/uapi/linux/task_diag.h
+++ b/include/uapi/linux/task_diag.h
@@ -13,6 +13,8 @@ struct task_diag_msg {
 	__u32 flags;
 };
 
+#define TASK_DIAG_FLAG_CONT 0x00000001
+
 enum {
 	TASK_DIAG_BASE	= 0,
 	TASK_DIAG_CRED,
-- 
2.5.5

* [PATCH 10/15] task_diag: add a new group to get resource usage
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/proc/task_diag.c            | 92 ++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/task_diag.h | 15 +++++++
 2 files changed, 107 insertions(+)

diff --git a/fs/proc/task_diag.c b/fs/proc/task_diag.c
index cd10374..c8499f2 100644
--- a/fs/proc/task_diag.c
+++ b/fs/proc/task_diag.c
@@ -390,6 +390,84 @@ err:
 	return rc;
 }
 
+static int fill_task_stat(struct task_struct *task, struct sk_buff *skb, int whole)
+{
+	struct task_diag_stat *st;
+	struct nlattr *attr;
+
+	int priority, nice;
+	int num_threads = 0;
+	unsigned long cmin_flt = 0, cmaj_flt = 0;
+	unsigned long  min_flt = 0,  maj_flt = 0;
+	cputime_t cutime, cstime, utime, stime;
+	cputime_t cgtime, gtime;
+	unsigned long flags;
+
+	attr = nla_reserve(skb, TASK_DIAG_STAT, sizeof(struct task_diag_stat));
+	if (!attr)
+		return -EMSGSIZE;
+
+	st = nla_data(attr);
+
+	cutime = cstime = utime = stime = 0;
+	cgtime = gtime = 0;
+	if (lock_task_sighand(task, &flags)) {
+		struct signal_struct *sig = task->signal;
+
+		num_threads = get_nr_threads(task);
+
+		cmin_flt = sig->cmin_flt;
+		cmaj_flt = sig->cmaj_flt;
+		cutime = sig->cutime;
+		cstime = sig->cstime;
+		cgtime = sig->cgtime;
+
+		/* add up live thread stats at the group level */
+		if (whole) {
+			struct task_struct *t = task;
+
+			do {
+				min_flt += t->min_flt;
+				maj_flt += t->maj_flt;
+				gtime += task_gtime(t);
+			} while_each_thread(task, t);
+
+			min_flt += sig->min_flt;
+			maj_flt += sig->maj_flt;
+			thread_group_cputime_adjusted(task, &utime, &stime);
+			gtime += sig->gtime;
+		}
+
+		unlock_task_sighand(task, &flags);
+	}
+
+	if (!whole) {
+		min_flt = task->min_flt;
+		maj_flt = task->maj_flt;
+		task_cputime_adjusted(task, &utime, &stime);
+		gtime = task_gtime(task);
+	}
+
+	/* scale priority and nice values from timeslices to -20..20 */
+	/* to make it look like a "normal" Unix priority/nice value  */
+	priority = task_prio(task);
+	nice = task_nice(task);
+
+
+	st->minflt	= min_flt;
+	st->cminflt	= cmin_flt;
+	st->majflt	= maj_flt;
+	st->cmajflt	= cmaj_flt;
+	st->utime	= cputime_to_clock_t(utime);
+	st->stime	= cputime_to_clock_t(stime);
+	st->cutime	= cputime_to_clock_t(cutime);
+	st->cstime	= cputime_to_clock_t(cstime);
+
+	st->threads	= num_threads;
+
+	return 0;
+}
+
 static int task_diag_fill(struct task_struct *tsk, struct sk_buff *skb,
 			  struct task_diag_pid *req,
 			  struct task_diag_cb *cb, struct pid_namespace *pidns,
@@ -451,6 +529,20 @@ static int task_diag_fill(struct task_struct *tsk, struct sk_buff *skb,
 		i++;
 	}
 
+	if (show_flags & TASK_DIAG_SHOW_STAT) {
+		int whole = 1;
+
+		if (req->dump_strategy == TASK_DIAG_DUMP_ALL_THREAD ||
+		    req->dump_strategy == TASK_DIAG_DUMP_THREAD)
+			whole = 0;
+
+		if (i >= n)
+			err = fill_task_stat(tsk, skb, whole);
+		if (err)
+			goto err;
+		i++;
+	}
+
 	msg->flags &= ~TASK_DIAG_FLAG_CONT;
 
 	nlmsg_end(skb, nlh);
diff --git a/include/uapi/linux/task_diag.h b/include/uapi/linux/task_diag.h
index e967c5b..551d4fa 100644
--- a/include/uapi/linux/task_diag.h
+++ b/include/uapi/linux/task_diag.h
@@ -20,6 +20,7 @@ enum {
 	TASK_DIAG_CRED,
 	TASK_DIAG_VMA,
 	TASK_DIAG_VMA_STAT,
+	TASK_DIAG_STAT,
 
 	__TASK_DIAG_ATTR_MAX
 #define TASK_DIAG_ATTR_MAX (__TASK_DIAG_ATTR_MAX - 1)
@@ -29,6 +30,7 @@ enum {
 #define TASK_DIAG_SHOW_CRED	(1ULL << TASK_DIAG_CRED)
 #define TASK_DIAG_SHOW_VMA	(1ULL << TASK_DIAG_VMA)
 #define TASK_DIAG_SHOW_VMA_STAT	(1ULL << TASK_DIAG_VMA_STAT)
+#define TASK_DIAG_SHOW_STAT	(1ULL << TASK_DIAG_STAT)
 
 enum {
 	TASK_DIAG_RUNNING,
@@ -153,6 +155,19 @@ struct task_diag_vma_stat *task_diag_vma_stat(struct task_diag_vma *vma)
 		(void *) vma < nla_data(attr) + nla_len(attr);	\
 		vma = (void *) vma + vma->vma_len)
 
+struct task_diag_stat {
+	__u64 minflt;
+	__u64 cminflt;
+	__u64 majflt;
+	__u64 cmajflt;
+	__u64 utime;
+	__u64 stime;
+	__u64 cutime;
+	__u64 cstime;
+
+	__u32 threads;
+};
+
 #define TASK_DIAG_DUMP_ALL		0
 #define TASK_DIAG_DUMP_ONE		1
 #define TASK_DIAG_DUMP_ALL_THREAD	2
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 11/15] task_diag: add a new group to get memory usage
  2016-04-11 23:35 [PATCH 0/15] task_diag: add a new interface to get information about processes (v3) Andrey Vagin
                   ` (9 preceding siblings ...)
  2016-04-11 23:35 ` [PATCH 10/15] task_diag: add a new group to get resource usage Andrey Vagin
@ 2016-04-11 23:35 ` Andrey Vagin
  2016-04-11 23:35 ` [PATCH 12/15] Documentation: add documentation for task_diag Andrey Vagin
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
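Note for userspace readers (not part of the commit): as far as the code
below shows, the counters taken from get_mm_counter() and from mm->total_vm,
data_vm, stack_vm, locked_vm and pinned_vm are in pages, while ptes and pmds
are already in bytes. Below is a minimal sketch of turning the page counters
into kB, assuming the page size comes from sysconf(_SC_PAGESIZE). The helper
names pages_to_kb and statm_rss_kb are hypothetical.

```c
#include <stdint.h>

/* Convert a page count to kB; page_size is sysconf(_SC_PAGESIZE). */
uint64_t pages_to_kb(uint64_t pages, long page_size)
{
	if (page_size < 1024)
		return 0;
	return pages * ((uint64_t)page_size / 1024);
}

/*
 * Resident set size in kB, computed the same way as total_rss in
 * fill_task_statm(): anon + file + shmem pages.
 */
uint64_t statm_rss_kb(uint64_t anon, uint64_t file, uint64_t shmem,
		      long page_size)
{
	return pages_to_kb(anon + file + shmem, page_size);
}
```

The same pages_to_kb() helper would apply to hiwater_vm and hiwater_rss,
which are reported in the same unit as total_vm and total_rss.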
 fs/proc/task_diag.c            | 79 ++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/task_diag.h | 21 +++++++++++
 2 files changed, 100 insertions(+)

diff --git a/fs/proc/task_diag.c b/fs/proc/task_diag.c
index c8499f2..3dc3617 100644
--- a/fs/proc/task_diag.c
+++ b/fs/proc/task_diag.c
@@ -468,6 +468,77 @@ static int fill_task_stat(struct task_struct *task, struct sk_buff *skb, int who
 	return 0;
 }
 
+static int fill_task_statm(struct task_struct *task, struct sk_buff *skb, int whole)
+{
+	struct task_diag_statm *st;
+	struct nlattr *attr;
+
+	unsigned long text, lib, swap, ptes, pmds, anon, file, shmem;
+	unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss;
+	unsigned long stack_vm, data_vm, locked_vm, pinned_vm;
+	struct mm_struct *mm;
+
+	mm = get_task_mm(task);
+	if (!mm)
+		return 0;
+
+	anon = get_mm_counter(mm, MM_ANONPAGES);
+	file = get_mm_counter(mm, MM_FILEPAGES);
+	shmem = get_mm_counter(mm, MM_SHMEMPAGES);
+
+	/*
+	 * Note: to minimize their overhead, mm maintains hiwater_vm and
+	 * hiwater_rss only when about to *lower* total_vm or rss.  Any
+	 * collector of these hiwater stats must therefore get total_vm
+	 * and rss too, which will usually be the higher.  Barriers? not
+	 * worth the effort, such snapshots can always be inconsistent.
+	 */
+	hiwater_vm = total_vm = mm->total_vm;
+	if (hiwater_vm < mm->hiwater_vm)
+		hiwater_vm = mm->hiwater_vm;
+	hiwater_rss = total_rss = anon + file + shmem;
+	if (hiwater_rss < mm->hiwater_rss)
+		hiwater_rss = mm->hiwater_rss;
+
+	text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) >> PAGE_SHIFT;
+	lib = mm->exec_vm - text;
+	swap = get_mm_counter(mm, MM_SWAPENTS);
+	ptes = PTRS_PER_PTE * sizeof(pte_t) * atomic_long_read(&mm->nr_ptes);
+	pmds = PTRS_PER_PMD * sizeof(pmd_t) * mm_nr_pmds(mm);
+
+	data_vm   = mm->data_vm;
+	stack_vm  = mm->stack_vm;
+	locked_vm = mm->locked_vm;
+	pinned_vm = mm->pinned_vm;
+
+	mmput(mm);
+
+	attr = nla_reserve(skb, TASK_DIAG_STATM, sizeof(*st));
+	if (!attr)
+		return -EMSGSIZE;
+
+	st = nla_data(attr);
+
+	st->anon	= anon;
+	st->file	= file;
+	st->shmem	= shmem;
+	st->hiwater_vm	= hiwater_vm;
+	st->hiwater_rss	= hiwater_rss;
+	st->text	= text;
+	st->lib		= lib;
+	st->swap	= swap;
+	st->ptes	= ptes;
+	st->pmds	= pmds;
+	st->total_rss	= total_rss;
+	st->total_vm	= total_vm;
+	st->data_vm	= data_vm;
+	st->stack_vm	= stack_vm;
+	st->locked_vm	= locked_vm;
+	st->pinned_vm	= pinned_vm;
+
+	return 0;
+}
+
 static int task_diag_fill(struct task_struct *tsk, struct sk_buff *skb,
 			  struct task_diag_pid *req,
 			  struct task_diag_cb *cb, struct pid_namespace *pidns,
@@ -543,6 +614,14 @@ static int task_diag_fill(struct task_struct *tsk, struct sk_buff *skb,
 		i++;
 	}
 
+	if (show_flags & TASK_DIAG_SHOW_STATM) {
+		if (i >= n)
+			err = fill_task_statm(tsk, skb, 1);
+		if (err)
+			goto err;
+		i++;
+	}
+
 	msg->flags &= ~TASK_DIAG_FLAG_CONT;
 
 	nlmsg_end(skb, nlh);
diff --git a/include/uapi/linux/task_diag.h b/include/uapi/linux/task_diag.h
index 551d4fa..9ab96f1 100644
--- a/include/uapi/linux/task_diag.h
+++ b/include/uapi/linux/task_diag.h
@@ -21,6 +21,7 @@ enum {
 	TASK_DIAG_VMA,
 	TASK_DIAG_VMA_STAT,
 	TASK_DIAG_STAT,
+	TASK_DIAG_STATM,
 
 	__TASK_DIAG_ATTR_MAX
 #define TASK_DIAG_ATTR_MAX (__TASK_DIAG_ATTR_MAX - 1)
@@ -31,6 +32,7 @@ enum {
 #define TASK_DIAG_SHOW_VMA	(1ULL << TASK_DIAG_VMA)
 #define TASK_DIAG_SHOW_VMA_STAT	(1ULL << TASK_DIAG_VMA_STAT)
 #define TASK_DIAG_SHOW_STAT	(1ULL << TASK_DIAG_STAT)
+#define TASK_DIAG_SHOW_STATM	(1ULL << TASK_DIAG_STATM)
 
 enum {
 	TASK_DIAG_RUNNING,
@@ -168,6 +170,25 @@ struct task_diag_stat {
 	__u32 threads;
 };
 
+struct task_diag_statm {
+	__u64 anon;
+	__u64 file;
+	__u64 shmem;
+	__u64 total_vm;
+	__u64 total_rss;
+	__u64 hiwater_vm;
+	__u64 hiwater_rss;
+	__u64 text;
+	__u64 lib;
+	__u64 swap;
+	__u64 ptes;
+	__u64 pmds;
+	__u64 locked_vm;
+	__u64 pinned_vm;
+	__u64 data_vm;
+	__u64 stack_vm;
+};
+
 #define TASK_DIAG_DUMP_ALL		0
 #define TASK_DIAG_DUMP_ONE		1
 #define TASK_DIAG_DUMP_ALL_THREAD	2
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 12/15] Documentation: add documentation for task_diag
  2016-04-11 23:35 [PATCH 0/15] task_diag: add a new interface to get information about processes (v3) Andrey Vagin
                   ` (10 preceding siblings ...)
  2016-04-11 23:35 ` [PATCH 11/15] task_diag: add a new group to get memory usage Andrey Vagin
@ 2016-04-11 23:35 ` Andrey Vagin
  2016-04-11 23:35 ` [PATCH 13/15] selftest: check the task_diag functionality Andrey Vagin
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 21+ messages in thread
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
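Note for readers (not part of the commit): the request described in this
document is a netlink header followed by struct task_diag_pid. Below is a
minimal sketch of building one in a caller-supplied buffer. The struct is a
local mirror of task_diag_pid, the DIAG_* values are copied from the uapi
header in this series (DIAG_SHOW_BASE assumes TASK_DIAG_SHOW_BASE is
1ULL << TASK_DIAG_BASE), and diag_build_request is a hypothetical helper.
Whether the kernel accepts exactly this message length is also an
assumption; the selftest computes the length with libnl helpers instead.

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

#define DIAG_DUMP_ALL	0
#define DIAG_DUMP_ONE	1
#define DIAG_SHOW_BASE	(1ULL << 0)	/* assumption, see lead-in */

/* Local mirror of struct task_diag_pid. */
struct diag_pid_mirror {
	uint64_t show_flags;
	uint64_t dump_strategy;
	uint32_t pid;
};

/* Returns the number of bytes to write, or 0 if buf is too small. */
size_t diag_build_request(void *buf, size_t buflen, uint64_t show_flags,
			  uint64_t strategy, uint32_t pid)
{
	struct nlmsghdr hdr;
	struct diag_pid_mirror req;
	size_t total = NLMSG_LENGTH(sizeof(req));

	if (buflen < total)
		return 0;

	/* Zero everything, including struct padding, before filling. */
	memset(&hdr, 0, sizeof(hdr));
	memset(&req, 0, sizeof(req));

	hdr.nlmsg_len = total;
	req.show_flags = show_flags;
	req.dump_strategy = strategy;
	req.pid = pid;

	memcpy(buf, &hdr, sizeof(hdr));
	memcpy((char *)buf + NLMSG_HDRLEN, &req, sizeof(req));
	return total;
}
```

The returned length is what a caller would pass to write() on the
/proc/task-diag file descriptor before reading the response.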
 Documentation/accounting/task_diag.txt | 57 ++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)
 create mode 100644 Documentation/accounting/task_diag.txt

diff --git a/Documentation/accounting/task_diag.txt b/Documentation/accounting/task_diag.txt
new file mode 100644
index 0000000..ff486b9
--- /dev/null
+++ b/Documentation/accounting/task_diag.txt
@@ -0,0 +1,57 @@
+The task-diag interface provides information about running processes
+(roughly the same info that is now available from /proc/PID/* files). Compared
+to /proc/PID/* files, it is faster, more flexible and provides data in a binary
+format. Task-diag is modelled on the basic idea of sock_diag.
+
+Interface
+---------
+
+The interface is the /proc/task-diag file, which operates based on the
+following principles:
+
+* Transactional: write request, read response
+* Netlink message format (same as used by sock_diag; binary and extendable)
+
+The user-kernel interface is encapsulated in include/uapi/linux/task_diag.h
+
+Request
+-------
+
+A request is described by the task_diag_pid structure.
+
+struct task_diag_pid {
+	__u64	show_flags;	/* TASK_DIAG_SHOW_* */
+	__u64	dump_strategy;	/* TASK_DIAG_DUMP_* */
+
+	__u32	pid;
+};
+
+dump_strategy specifies a group of processes:
+/* per-process strategies */
+TASK_DIAG_DUMP_CHILDREN	- all children of the specified process
+TASK_DIAG_DUMP_THREAD	- all threads of the specified process
+TASK_DIAG_DUMP_ONE	- one process
+/* system-wide strategies (the pid field is ignored) */
+TASK_DIAG_DUMP_ALL	  - all processes
+TASK_DIAG_DUMP_ALL_THREAD - all threads
+
+show_flags specifies which information is required.  If we set the
+TASK_DIAG_SHOW_BASE flag, the response message will contain the TASK_DIAG_BASE
+attribute which is described by the task_diag_base structure.
+
+In the future, it can be extended with optional attributes. The request
+describes which task properties are required and which processes they are
+required for.
+
+Response
+--------
+
+A response can be divided into a few packets. Each task is described by a
+netlink message. If all information about a process doesn't fit into a message,
+the TASK_DIAG_FLAG_CONT flag will be set and the next message will continue
+describing the same process.
+
+Examples
+--------
+
+A few examples can be found in tools/testing/selftests/task_diag/
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 13/15] selftest: check the task_diag functionality
  2016-04-11 23:35 [PATCH 0/15] task_diag: add a new interface to get information about processes (v3) Andrey Vagin
                   ` (11 preceding siblings ...)
  2016-04-11 23:35 ` [PATCH 12/15] Documentation: add documentation for task_diag Andrey Vagin
@ 2016-04-11 23:35 ` Andrey Vagin
  2016-04-11 23:35 ` [PATCH 14/15] task_diag: Enhance fork tool to spawn threads Andrey Vagin
  2016-04-11 23:35 ` [PATCH 15/15] test: check that task_diag can dump all threads of one process Andrey Vagin
  14 siblings, 0 replies; 21+ messages in thread
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

Here are two test (example) programs.

task_diag - request information for two processes.
task_diag_all - request information about all processes

v2: Fixes from David Ahern:
    * task_diag: Fix 8-byte alignment for vma and vma_stats

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
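Note for readers (not part of the commit): show_task() below advances
through the attributes by trusting each nla_len. Below is a minimal sketch
of a more defensive walk, assuming the payload is a flat sequence of struct
nlattr entries as in the selftest. count_attrs is a hypothetical helper, and
it only validates lengths without interpreting the attribute data.

```c
#include <linux/netlink.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the attributes in buf[0..len). Returns -1 if an attribute
 * header is truncated or claims more bytes than remain.
 */
int count_attrs(const unsigned char *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	while (off < len) {
		struct nlattr na;

		if (len - off < (size_t)NLA_HDRLEN)
			return -1;

		/* memcpy avoids assuming buf is suitably aligned */
		memcpy(&na, buf + off, sizeof(na));
		if (na.nla_len < NLA_HDRLEN || na.nla_len > len - off)
			return -1;

		n++;
		/* attributes are padded to a 4-byte boundary */
		off += NLA_ALIGN(na.nla_len);
	}

	return n;
}
```

The last attribute may omit its trailing padding, which is why the loop
checks nla_len rather than the aligned length against the remaining bytes.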
 tools/testing/selftests/Makefile                   |   1 +
 tools/testing/selftests/task_diag/.gitignore       |   4 +
 tools/testing/selftests/task_diag/Makefile         |  16 ++
 tools/testing/selftests/task_diag/_run.sh          |  21 +++
 tools/testing/selftests/task_diag/fork.c           |  30 ++++
 tools/testing/selftests/task_diag/run.sh           |   1 +
 tools/testing/selftests/task_diag/task_diag.h      |   1 +
 tools/testing/selftests/task_diag/task_diag_all.c  | 150 ++++++++++++++++++
 tools/testing/selftests/task_diag/task_diag_comm.c | 172 +++++++++++++++++++++
 tools/testing/selftests/task_diag/task_diag_comm.h |  34 ++++
 tools/testing/selftests/task_diag/task_proc_all.c  |  35 +++++
 11 files changed, 465 insertions(+)
 create mode 100644 tools/testing/selftests/task_diag/.gitignore
 create mode 100644 tools/testing/selftests/task_diag/Makefile
 create mode 100755 tools/testing/selftests/task_diag/_run.sh
 create mode 100644 tools/testing/selftests/task_diag/fork.c
 create mode 100755 tools/testing/selftests/task_diag/run.sh
 create mode 120000 tools/testing/selftests/task_diag/task_diag.h
 create mode 100644 tools/testing/selftests/task_diag/task_diag_all.c
 create mode 100644 tools/testing/selftests/task_diag/task_diag_comm.c
 create mode 100644 tools/testing/selftests/task_diag/task_diag_comm.h
 create mode 100644 tools/testing/selftests/task_diag/task_proc_all.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index b04afc3..b399c8b 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -29,6 +29,7 @@ TARGETS += user
 TARGETS += vm
 TARGETS += x86
 TARGETS += zram
+TARGETS += task_diag
 #Please keep the TARGETS list alphabetically sorted
 # Run "make quicktest=1 run_tests" or
 # "make quicktest=1 kselftest from top level Makefile
diff --git a/tools/testing/selftests/task_diag/.gitignore b/tools/testing/selftests/task_diag/.gitignore
new file mode 100644
index 0000000..f963a1f
--- /dev/null
+++ b/tools/testing/selftests/task_diag/.gitignore
@@ -0,0 +1,4 @@
+task_diag
+task_diag_all
+task_proc_all
+fork
diff --git a/tools/testing/selftests/task_diag/Makefile b/tools/testing/selftests/task_diag/Makefile
new file mode 100644
index 0000000..69b7934
--- /dev/null
+++ b/tools/testing/selftests/task_diag/Makefile
@@ -0,0 +1,16 @@
+all: task_diag_all fork task_proc_all
+
+CFLAGS += -g -Wall -O2 -I/usr/include/libnl3
+LDFLAGS += -lnl-3
+TEST_PROGS := run.sh
+include ../lib.mk
+
+task_diag_all.o: task_diag_all.c task_diag_comm.h
+task_diag_comm.o: task_diag_comm.c task_diag_comm.h
+
+task_diag_all: task_diag_all.o task_diag_comm.o
+fork: fork.c
+task_proc_all: task_proc_all.c
+
+clean:
+	rm -rf task_diag task_diag_all task_diag_comm.o task_diag_all.o task_diag.o fork task_proc_all
diff --git a/tools/testing/selftests/task_diag/_run.sh b/tools/testing/selftests/task_diag/_run.sh
new file mode 100755
index 0000000..3f541fe
--- /dev/null
+++ b/tools/testing/selftests/task_diag/_run.sh
@@ -0,0 +1,21 @@
+#!/bin/bash
+set -o pipefail
+set -e -x
+
+./fork 1000
+
+nprocesses=`./task_diag_all all --maps | grep 'pid.*tgid.*ppid.*comm fork$' | wc -l`
+nthreads=`./task_diag_all All --smaps --cred | grep 'pid.*tgid.*ppid.*comm fork$' | wc -l`
+nchildren=`./task_diag_all children --pid 1 | grep 'pid.*tgid.*ppid.*comm fork$' | wc -l`
+
+./task_diag_all one --pid 1 --cred
+
+killall -9 fork
+
+[ "$nthreads"     -eq 1000 ] &&
+[ "$nprocesses"   -eq 1000  ] &&
+[ "$nchildren"    -eq 1000  ] &&
+true ||  {
+	echo "Unexpected number of tasks $nthreads:$nprocesses" 1>&2
+	exit 1
+}
diff --git a/tools/testing/selftests/task_diag/fork.c b/tools/testing/selftests/task_diag/fork.c
new file mode 100644
index 0000000..c6e17d1
--- /dev/null
+++ b/tools/testing/selftests/task_diag/fork.c
@@ -0,0 +1,30 @@
+#include <unistd.h>
+#include <string.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+int main(int argc, char **argv)
+{
+	int i, n;
+
+	if (argc < 2)
+		return 1;
+
+	n = atoi(argv[1]);
+	for (i = 0; i < n; i++) {
+		pid_t pid;
+
+		pid = fork();
+		if (pid < 0) {
+			printf("Unable to fork: %m\n");
+			return 1;
+		}
+		if (pid == 0) {
+			while (1)
+				sleep(1000);
+			return 0;
+		}
+	}
+
+	return 0;
+}
diff --git a/tools/testing/selftests/task_diag/run.sh b/tools/testing/selftests/task_diag/run.sh
new file mode 100755
index 0000000..28a8550
--- /dev/null
+++ b/tools/testing/selftests/task_diag/run.sh
@@ -0,0 +1 @@
+unshare -p -f -m --mount-proc ./_run.sh && { echo PASS; exit 0; } || { echo FAIL; exit 1; }
diff --git a/tools/testing/selftests/task_diag/task_diag.h b/tools/testing/selftests/task_diag/task_diag.h
new file mode 120000
index 0000000..d20a38c
--- /dev/null
+++ b/tools/testing/selftests/task_diag/task_diag.h
@@ -0,0 +1 @@
+../../../../include/uapi/linux/task_diag.h
\ No newline at end of file
diff --git a/tools/testing/selftests/task_diag/task_diag_all.c b/tools/testing/selftests/task_diag/task_diag_all.c
new file mode 100644
index 0000000..d865207
--- /dev/null
+++ b/tools/testing/selftests/task_diag/task_diag_all.c
@@ -0,0 +1,150 @@
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <getopt.h>
+
+#include <linux/netlink.h>
+#include <netlink/msg.h>
+
+#include "task_diag.h"
+#include "task_diag_comm.h"
+
+#ifndef SOL_NETLINK
+#define SOL_NETLINK	270
+#endif
+
+#ifndef NETLINK_SCM_PID
+#define NETLINK_SCM_PID	11
+#endif
+
+static void usage(char *name)
+{
+	pr_err("Usage: %s command [options]", name);
+	pr_err(
+"Commands:\n"
+"\tall         - dump all processes\n"
+"\tAll         - dump all threads\n"
+"\tthreads     - dump all threads of the specified process\n"
+"\tchildren    - dump all children of the specified process\n"
+"\tone         - dump the specified process\n"
+"Options:\n"
+"\t-p|--pid    - PID of the required process\n"
+"\t-m|--maps   - dump memory regions\n"
+"\t-s|--smaps  - dump statistics for memory regions\n"
+"\t-c|--cred   - dump credentials"
+);
+}
+int main(int argc, char *argv[])
+{
+	int exit_status = 1, fd;
+	struct task_diag_pid *req;
+	char nl_req[4096];
+	struct nlmsghdr *hdr = (void *)nl_req;
+	int last_pid = 0;
+	int opt, idx;
+	int err, size = 0;
+	static const char short_opts[] = "p:cms";
+	static struct option long_opts[] = {
+		{ "pid",	required_argument, 0, 'p' },
+		{ "maps",	no_argument, 0, 'm' },
+		{ "smaps",	no_argument, 0, 's' },
+		{ "cred",	no_argument, 0, 'c' },
+		{},
+	};
+
+	hdr->nlmsg_len = nlmsg_total_size(0);
+
+	req = nlmsg_data(hdr);
+	size += nla_total_size(sizeof(*req));
+
+	hdr->nlmsg_len += size;
+
+
+	req->show_flags = TASK_DIAG_SHOW_BASE;
+
+	if (argc < 2) {
+		usage(argv[0]);
+		return 1;
+	}
+
+	req->pid = 0; /* dump all tasks by default */
+
+	switch (argv[1][0]) {
+	case 'c':
+		req->dump_strategy = TASK_DIAG_DUMP_CHILDREN;
+		break;
+	case 't':
+		req->dump_strategy = TASK_DIAG_DUMP_THREAD;
+		break;
+	case 'o':
+		req->dump_strategy = TASK_DIAG_DUMP_ONE;
+		break;
+	case 'a':
+		req->dump_strategy = TASK_DIAG_DUMP_ALL;
+		req->pid = 0;
+		break;
+	case 'A':
+		req->dump_strategy = TASK_DIAG_DUMP_ALL_THREAD;
+		req->pid = 0;
+		break;
+	default:
+		usage(argv[0]);
+		return 1;
+	}
+
+	while (1) {
+		idx = -1;
+		opt = getopt_long(argc, argv, short_opts, long_opts, &idx);
+		if (opt == -1)
+			break;
+		switch (opt) {
+		case 'p':
+			req->pid = atoi(optarg);
+			break;
+		case 'c':
+			req->show_flags |= TASK_DIAG_SHOW_CRED;
+			break;
+		case 'm':
+			req->show_flags |= TASK_DIAG_SHOW_VMA;
+			break;
+		case 's':
+			req->show_flags |= TASK_DIAG_SHOW_VMA_STAT | TASK_DIAG_SHOW_VMA;
+			break;
+		default:
+			usage(argv[0]);
+			return 1;
+		}
+	}
+
+	fd = open("/proc/task-diag", O_RDWR);
+	if (fd < 0)
+		return -1;
+
+	if (write(fd, hdr, hdr->nlmsg_len) != hdr->nlmsg_len)
+		return -1;
+
+	while (1) {
+		char buf[163840];
+		size = read(fd, buf, sizeof(buf));
+
+		if (size < 0)
+			goto err;
+
+		if (size == 0)
+			break;
+
+		err = nlmsg_receive(buf, size, &show_task, &last_pid);
+		if (err < 0)
+			goto err;
+
+		if (err == 0)
+			break;
+	}
+
+	exit_status = 0;
+err:
+	return exit_status;
+}
diff --git a/tools/testing/selftests/task_diag/task_diag_comm.c b/tools/testing/selftests/task_diag/task_diag_comm.c
new file mode 100644
index 0000000..65f2536
--- /dev/null
+++ b/tools/testing/selftests/task_diag/task_diag_comm.c
@@ -0,0 +1,172 @@
+#include <errno.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/types.h>
+#include <linux/netlink.h>
+#include <netlink/cli/utils.h>
+
+#include "task_diag.h"
+#include "task_diag_comm.h"
+
+int quiet;
+
+#define PSS_SHIFT 12
+
+int nlmsg_receive(void *buf, int len, int (*cb)(struct nlmsghdr *, void *), void *args)
+{
+	struct nlmsghdr *hdr;
+
+	for (hdr = (struct nlmsghdr *)buf;
+			NLMSG_OK(hdr, len); hdr = NLMSG_NEXT(hdr, len)) {
+
+		if (hdr->nlmsg_type == NLMSG_DONE) {
+			int *len = (int *)NLMSG_DATA(hdr);
+
+			if (*len < 0) {
+				pr_err("ERROR %d reported by netlink (%s)\n",
+					*len, strerror(-*len));
+				return *len;
+			}
+
+			return 0;
+		}
+
+		if (hdr->nlmsg_type == NLMSG_ERROR) {
+			struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(hdr);
+
+			if (hdr->nlmsg_len - sizeof(*hdr) < sizeof(struct nlmsgerr)) {
+				pr_err("ERROR truncated\n");
+				return -1;
+			}
+
+			if (err->error == 0)
+				return 0;
+
+			return -1;
+		}
+		if (cb && cb(hdr, args))
+			return -1;
+	}
+
+	return 1;
+}
+
+int show_task(struct nlmsghdr *hdr, void *arg)
+{
+	int msg_len;
+	struct msgtemplate *msg;
+	struct task_diag_msg *diag_msg;
+	struct nlattr *na;
+	int *last_pid = arg;
+	int len;
+
+	msg_len = NLMSG_PAYLOAD(hdr, 0);
+
+	msg = (struct msgtemplate *)hdr;
+	diag_msg = NLMSG_DATA(msg);
+
+#if 1
+	if (diag_msg->pid != *last_pid)
+		pr_info("Start getting information about %d\n", diag_msg->pid);
+	else
+		pr_info("Continue getting information about %d\n", diag_msg->pid);
+#endif
+	*last_pid = diag_msg->pid;
+
+	na = ((void *) diag_msg) + NLMSG_ALIGN(sizeof(*diag_msg));
+	len = NLMSG_ALIGN(sizeof(*diag_msg));
+	while (len < msg_len) {
+		len += NLA_ALIGN(na->nla_len);
+		switch (na->nla_type) {
+		case TASK_DIAG_BASE:
+		{
+			struct task_diag_base *msg;
+
+			/* For nested attributes, na follows */
+			msg = NLA_DATA(na);
+			pr_info("pid %5d tgid %5d ppid %5d sid %5d pgid %5d comm %s\n",
+				msg->pid, msg->tgid, msg->ppid, msg->sid, msg->pgid, msg->comm);
+		}
+		break;
+
+		case TASK_DIAG_CRED:
+		{
+			struct task_diag_creds *creds;
+
+			creds = NLA_DATA(na);
+			pr_info("uid: %d %d %d %d\n", creds->uid,
+					creds->euid, creds->suid, creds->fsuid);
+			pr_info("gid: %d %d %d %d\n", creds->gid,
+					creds->egid, creds->sgid, creds->fsgid);
+			pr_info("CapInh: %08x%08x\n",
+						creds->cap_inheritable.cap[1],
+						creds->cap_inheritable.cap[0]);
+			pr_info("CapPrm: %08x%08x\n",
+						creds->cap_permitted.cap[1],
+						creds->cap_permitted.cap[0]);
+			pr_info("CapEff: %08x%08x\n",
+						creds->cap_effective.cap[1],
+						creds->cap_effective.cap[0]);
+			pr_info("CapBnd: %08x%08x\n", creds->cap_bset.cap[1],
+						creds->cap_bset.cap[0]);
+		}
+		break;
+
+		case TASK_DIAG_VMA:
+		{
+			struct task_diag_vma *vma_tmp, vma;
+
+			task_diag_for_each_vma(vma_tmp, na) {
+				char *name;
+				struct task_diag_vma_stat *stat_tmp, stat;
+
+				name = task_diag_vma_name(vma_tmp);
+				if (name == NULL)
+					name = "";
+
+				memcpy(&vma, vma_tmp, sizeof(vma));
+				pr_info("%016llx-%016llx %016llx %s\n",
+					vma.start, vma.end, vma.vm_flags, name);
+
+				stat_tmp = task_diag_vma_stat(vma_tmp);
+				if (stat_tmp)
+					memcpy(&stat, stat_tmp, sizeof(stat));
+				else
+					memset(&stat, 0, sizeof(stat));
+
+				pr_info(
+					   "Size:           %8llu kB\n"
+					   "Rss:            %8llu kB\n"
+					   "Pss:            %8llu kB\n"
+					   "Shared_Clean:   %8llu kB\n"
+					   "Shared_Dirty:   %8llu kB\n"
+					   "Private_Clean:  %8llu kB\n"
+					   "Private_Dirty:  %8llu kB\n"
+					   "Referenced:     %8llu kB\n"
+					   "Anonymous:      %8llu kB\n"
+					   "AnonHugePages:  %8llu kB\n"
+					   "Swap:           %8llu kB\n",
+					   (vma.end - vma.start) >> 10,
+					   stat.resident >> 10,
+					   (stat.pss >> (10 + PSS_SHIFT)),
+					   stat.shared_clean  >> 10,
+					   stat.shared_dirty  >> 10,
+					   stat.private_clean >> 10,
+					   stat.private_dirty >> 10,
+					   stat.referenced >> 10,
+					   stat.anonymous >> 10,
+					   stat.anonymous_thp >> 10,
+					   stat.swap >> 10);
+			}
+		}
+		break;
+		default:
+			pr_info("Unknown nla_type %d\n",
+				na->nla_type);
+		}
+		na = ((void *) diag_msg) + len;
+	}
+
+	return 0;
+}
diff --git a/tools/testing/selftests/task_diag/task_diag_comm.h b/tools/testing/selftests/task_diag/task_diag_comm.h
new file mode 100644
index 0000000..40e83b7
--- /dev/null
+++ b/tools/testing/selftests/task_diag/task_diag_comm.h
@@ -0,0 +1,34 @@
+#ifndef __TASK_DIAG_COMM__
+#define __TASK_DIAG_COMM__
+
+#include <stdio.h>
+
+#include "task_diag.h"
+
+/*
+ * Generic macros for dealing with netlink sockets. Might be duplicated
+ * elsewhere. It is recommended that commercial grade applications use
+ * libnl or libnetlink and use the interfaces provided by the library
+ */
+#define GENLMSG_DATA(glh)	((void *)(NLMSG_DATA(glh) + GENL_HDRLEN))
+#define GENLMSG_PAYLOAD(glh)	(NLMSG_PAYLOAD(glh, 0) - GENL_HDRLEN)
+#define NLA_DATA(na)		((void *)((char *)(na) + NLA_HDRLEN))
+#define NLA_PAYLOAD(len)	(len - NLA_HDRLEN)
+
+#define pr_err(fmt, ...)				\
+		fprintf(stderr, "%s:%d" fmt"\n", __func__, __LINE__, ##__VA_ARGS__)
+
+#define pr_perror(fmt, ...)				\
+		fprintf(stderr, fmt " : %m\n", ##__VA_ARGS__)
+
+extern int quiet;
+#define pr_info(fmt, arg...)			\
+	do {					\
+		if (!quiet)			\
+			printf(fmt, ##arg);	\
+	} while (0)
+
+int nlmsg_receive(void *buf, int len, int (*cb)(struct nlmsghdr *, void *), void *args);
+extern int show_task(struct nlmsghdr *hdr, void *arg);
+
+#endif /* __TASK_DIAG_COMM__ */
diff --git a/tools/testing/selftests/task_diag/task_proc_all.c b/tools/testing/selftests/task_diag/task_proc_all.c
new file mode 100644
index 0000000..07ee80c
--- /dev/null
+++ b/tools/testing/selftests/task_diag/task_proc_all.c
@@ -0,0 +1,35 @@
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <dirent.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+
+int main(int argc, char **argv)
+{
+	DIR *d;
+	int fd, tasks = 0;
+	struct dirent *de;
+	char buf[4096];
+
+	d = opendir("/proc");
+	if (d == NULL)
+		return 1;
+
+	while ((de = readdir(d))) {
+		if (de->d_name[0] < '0' || de->d_name[0] > '9')
+			continue;
+		snprintf(buf, sizeof(buf), "/proc/%s/stat", de->d_name);
+		fd = open(buf, O_RDONLY);
+		read(fd, buf, sizeof(buf));
+		close(fd);
+		tasks++;
+	}
+
+	closedir(d);
+
+	printf("tasks: %d\n", tasks);
+
+	return 0;
+}
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 14/15] task_diag: Enhance fork tool to spawn threads
  2016-04-11 23:35 [PATCH 0/15] task_diag: add a new interface to get information about processes (v3) Andrey Vagin
                   ` (12 preceding siblings ...)
  2016-04-11 23:35 ` [PATCH 13/15] selftest: check the task_diag functionality Andrey Vagin
@ 2016-04-11 23:35 ` Andrey Vagin
  2016-04-11 23:35 ` [PATCH 15/15] test: check that task_diag can dump all threads of one process Andrey Vagin
  14 siblings, 0 replies; 21+ messages in thread
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

From: David Ahern <dsahern@gmail.com>

Add option to fork threads as well as processes.
Make the sleep time configurable too so that spawned
tasks exit on their own.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 tools/testing/selftests/task_diag/Makefile |  2 ++
 tools/testing/selftests/task_diag/_run.sh  |  4 ++--
 tools/testing/selftests/task_diag/fork.c   | 36 ++++++++++++++++++++++++++----
 3 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/task_diag/Makefile b/tools/testing/selftests/task_diag/Makefile
index 69b7934..c9977231 100644
--- a/tools/testing/selftests/task_diag/Makefile
+++ b/tools/testing/selftests/task_diag/Makefile
@@ -10,6 +10,8 @@ task_diag_comm.o: task_diag_comm.c task_diag_comm.h
 
 task_diag_all: task_diag_all.o task_diag_comm.o
 fork: fork.c
+	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^ -lpthread
+
 task_proc_all: task_proc_all.c
 
 clean:
diff --git a/tools/testing/selftests/task_diag/_run.sh b/tools/testing/selftests/task_diag/_run.sh
index 3f541fe..559f02a 100755
--- a/tools/testing/selftests/task_diag/_run.sh
+++ b/tools/testing/selftests/task_diag/_run.sh
@@ -2,7 +2,7 @@
 set -o pipefail
 set -e -x
 
-./fork 1000
+./fork 1000 10
 
 nprocesses=`./task_diag_all all --maps | grep 'pid.*tgid.*ppid.*comm fork$' | wc -l`
 nthreads=`./task_diag_all All --smaps --cred | grep 'pid.*tgid.*ppid.*comm fork$' | wc -l`
@@ -12,7 +12,7 @@ nchildren=`./task_diag_all children --pid 1 | grep 'pid.*tgid.*ppid.*comm fork$'
 
 killall -9 fork
 
-[ "$nthreads"     -eq 1000 ] &&
+[ "$nthreads"     -eq 10000 ] &&
 [ "$nprocesses"   -eq 1000  ] &&
 [ "$nchildren"    -eq 1000  ] &&
 true ||  {
diff --git a/tools/testing/selftests/task_diag/fork.c b/tools/testing/selftests/task_diag/fork.c
index c6e17d1..ebddedd2 100644
--- a/tools/testing/selftests/task_diag/fork.c
+++ b/tools/testing/selftests/task_diag/fork.c
@@ -2,15 +2,39 @@
 #include <string.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include <pthread.h>
 
+void *f(void *arg)
+{
+	unsigned long t = (unsigned long) arg;
+
+	sleep(t);
+	return NULL;
+}
+
+/* usage: fork nproc [mthreads [sleep]] */
 int main(int argc, char **argv)
 {
-	int i, n;
+	int i, j, n, m = 0;
+	unsigned long t_sleep = 1000;
+	pthread_attr_t attr;
+	pthread_t id;
 
-	if (argc < 2)
+	if (argc < 2) {
+		fprintf(stderr, "usage: fork nproc [mthreads [sleep]]\n");
 		return 1;
+	}
 
 	n = atoi(argv[1]);
+
+	if (argc > 2)
+		m = atoi(argv[2]);
+
+	if (argc > 3)
+		t_sleep = atoi(argv[3]);
+
+	pthread_attr_init(&attr);
+
 	for (i = 0; i < n; i++) {
 		pid_t pid;
 
@@ -20,8 +44,12 @@ int main(int argc, char **argv)
 			return 1;
 		}
 		if (pid == 0) {
-			while (1)
-				sleep(1000);
+			if (m) {
+				for (j = 0; j < m-1; ++j)
+					pthread_create(&id, &attr, f, (void *)t_sleep);
+			}
+
+			sleep(t_sleep);
 			return 0;
 		}
 	}
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 15/15] test: check that task_diag can dump all threads of one process
  2016-04-11 23:35 [PATCH 0/15] task_diag: add a new interface to get information about processes (v3) Andrey Vagin
                   ` (13 preceding siblings ...)
  2016-04-11 23:35 ` [PATCH 14/15] task_diag: Enhance fork tool to spawn threads Andrey Vagin
@ 2016-04-11 23:35 ` Andrey Vagin
  14 siblings, 0 replies; 21+ messages in thread
From: Andrey Vagin @ 2016-04-11 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrey Vagin, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 tools/testing/selftests/task_diag/_run.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/task_diag/_run.sh b/tools/testing/selftests/task_diag/_run.sh
index 559f02a..d2e8544 100755
--- a/tools/testing/selftests/task_diag/_run.sh
+++ b/tools/testing/selftests/task_diag/_run.sh
@@ -10,11 +10,15 @@ nchildren=`./task_diag_all children --pid 1 | grep 'pid.*tgid.*ppid.*comm fork$'
 
 ./task_diag_all one --pid 1 --cred
 
+( exec -a fork_thread ./fork 1 1234 )
+pid=`pidof fork_thread`
+ntaskthreads=`./task_diag_all thread --maps --cred --smaps --pid $pid |  grep 'pid.*tgid.*ppid.*comm' | wc -l`
 killall -9 fork
 
 [ "$nthreads"     -eq 10000 ] &&
 [ "$nprocesses"   -eq 1000  ] &&
 [ "$nchildren"    -eq 1000  ] &&
+[ "$ntaskthreads" -eq 1234  ] &&
 true ||  {
 	echo "Unexpected number of tasks $nthreads:$nprocesses" 1>&2
 	exit 1
-- 
2.5.5

* Re: [PATCH 04/15] task_diag: add a new interface to get information about tasks (v4)
  2016-04-11 23:35 ` [PATCH 04/15] task_diag: add a new interface to get information about tasks (v4) Andrey Vagin
@ 2016-04-12  1:03   ` kbuild test robot
  2016-04-13  0:45     ` Andrew Vagin
  2016-04-12  7:08   ` Cyrill Gorcunov
  1 sibling, 1 reply; 21+ messages in thread
From: kbuild test robot @ 2016-04-12  1:03 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: kbuild-all, linux-kernel, Andrey Vagin, Oleg Nesterov,
	Andrew Morton, Cyrill Gorcunov, Pavel Emelyanov, Roger Luethi,
	Arnd Bergmann, Arnaldo Carvalho de Melo, David Ahern,
	Andy Lutomirski, Pavel Odintsov

Hi Andrey,

[auto build test ERROR on v4.6-rc3]
[also build test ERROR on next-20160411]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Andrey-Vagin/task_diag-add-a-new-interface-to-get-information-about-processes-v3/20160412-074109
config: microblaze-allmodconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=microblaze 

Note: the linux-review/Andrey-Vagin/task_diag-add-a-new-interface-to-get-information-about-processes-v3/20160412-074109 HEAD 2e0f174ce7e6fddc8fdd89f3dbb8d990626358a0 builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

>> fs/proc/task_diag.c:139:19: error: field 'tgid' has incomplete type
     struct tgid_iter tgid;
                      ^
   fs/proc/task_diag.c: In function 'iter_start':
>> fs/proc/task_diag.c:187:3: error: implicit declaration of function 'next_tgid' [-Werror=implicit-function-declaration]
      iter->tgid = next_tgid(iter->ns, iter->tgid);
      ^
   cc1: some warnings being treated as errors

vim +/tgid +139 fs/proc/task_diag.c

   133		struct task_diag_pid	req;
   134		struct pid_namespace	*ns;
   135		struct task_struct	*parent;
   136	
   137		struct task_diag_cb	*cb;
   138	
 > 139		struct tgid_iter	tgid;
   140		unsigned int		pos;
   141		struct task_struct	*task;
   142	};
   143	
   144	static void iter_stop(struct task_iter *iter)
   145	{
   146		struct task_struct *task;
   147	
   148		if (iter->parent)
   149			put_task_struct(iter->parent);
   150	
   151		switch (iter->req.dump_strategy) {
   152		case TASK_DIAG_DUMP_ALL:
   153			task = iter->tgid.task;
   154			break;
   155		default:
   156			task = iter->task;
   157		}
   158		if (task)
   159			put_task_struct(task);
   160	}
   161	
   162	static struct task_struct *iter_start(struct task_iter *iter)
   163	{
   164		if (iter->req.pid > 0) {
   165			rcu_read_lock();
   166			iter->parent = find_task_by_pid_ns(iter->req.pid, iter->ns);
   167			if (iter->parent)
   168				get_task_struct(iter->parent);
   169			rcu_read_unlock();
   170		}
   171	
   172		switch (iter->req.dump_strategy) {
   173		case TASK_DIAG_DUMP_ONE:
   174			if (iter->parent == NULL)
   175				return ERR_PTR(-ESRCH);
   176			iter->pos = iter->cb->pos;
   177			if (iter->pos == 0) {
   178				iter->task = iter->parent;
   179				iter->parent = NULL;
   180			} else
   181				iter->task = NULL;
   182			return iter->task;
   183	
   184		case TASK_DIAG_DUMP_ALL:
   185			iter->tgid.tgid = iter->cb->pid;
   186			iter->tgid.task = NULL;
 > 187			iter->tgid = next_tgid(iter->ns, iter->tgid);
   188			return iter->tgid.task;
   189		}
   190	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

* Re: [PATCH 04/15] task_diag: add a new interface to get information about tasks (v4)
  2016-04-11 23:35 ` [PATCH 04/15] task_diag: add a new interface to get information about tasks (v4) Andrey Vagin
  2016-04-12  1:03   ` kbuild test robot
@ 2016-04-12  7:08   ` Cyrill Gorcunov
  2016-04-13  0:39     ` Andrew Vagin
  1 sibling, 1 reply; 21+ messages in thread
From: Cyrill Gorcunov @ 2016-04-12  7:08 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: linux-kernel, Oleg Nesterov, Andrew Morton, Pavel Emelyanov,
	Roger Luethi, Arnd Bergmann, Arnaldo Carvalho de Melo,
	David Ahern, Andy Lutomirski, Pavel Odintsov

On Mon, Apr 11, 2016 at 04:35:44PM -0700, Andrey Vagin wrote:
...
> +static int __taskdiag_dumpit(struct task_iter *iter,
> +			     struct task_diag_cb *cb, struct task_struct **start)
> +{
> +	struct user_namespace *userns = current_user_ns();
> +	struct task_struct *task = *start;
> +	int rc;
> +
> +	for (; task; task = iter_next(iter)) {
> +		if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
> +			continue;
> +
> +		rc = task_diag_fill(task, cb->resp, &iter->req,
> +				cb, iter->ns, userns);
> +		if (rc < 0) {
> +			if (rc != -EMSGSIZE)
> +				return rc;
> +			break;
> +		}
> +	}
> +	*start = task;

task = NULL always here?

> +
> +	return 0;
> +}

	Cyrill

* Re: [PATCH 04/15] task_diag: add a new interface to get information about tasks (v4)
  2016-04-12  7:08   ` Cyrill Gorcunov
@ 2016-04-13  0:39     ` Andrew Vagin
  2016-04-13  5:26       ` Cyrill Gorcunov
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Vagin @ 2016-04-13  0:39 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Andrey Vagin, linux-kernel, Oleg Nesterov, Andrew Morton,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

On Tue, Apr 12, 2016 at 10:08:57AM +0300, Cyrill Gorcunov wrote:
> On Mon, Apr 11, 2016 at 04:35:44PM -0700, Andrey Vagin wrote:
> ...
> > +static int __taskdiag_dumpit(struct task_iter *iter,
> > +			     struct task_diag_cb *cb, struct task_struct **start)
> > +{
> > +	struct user_namespace *userns = current_user_ns();
> > +	struct task_struct *task = *start;
> > +	int rc;
> > +
> > +	for (; task; task = iter_next(iter)) {
> > +		if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
> > +			continue;
> > +
> > +		rc = task_diag_fill(task, cb->resp, &iter->req,
> > +				cb, iter->ns, userns);
> > +		if (rc < 0) {
> > +			if (rc != -EMSGSIZE)
> > +				return rc;
> > +			break;

task isn't NULL here

> > +		}
> > +	}
> > +	*start = task;
> 
> task = NULL always here?

No, it isn't if the loop is interrupted by break.

Thanks,
Andrew
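
The point about the break can be shown outside the kernel. Below is a
minimal userspace sketch, not the code from the patch: struct node and
dump_some() are hypothetical stand-ins for the task iterator and
__taskdiag_dumpit(), and the "room" counter plays the role of the
response buffer that produces -EMSGSIZE. After an early break the cursor
still points at the entry that did not fit, presumably so that a later
call can pick up from it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not the kernel's struct task_struct. */
struct node {
	int size;		/* space this entry needs in the response */
	struct node *next;
};

/*
 * Same loop shape as __taskdiag_dumpit(): walk the list, "fill" each
 * entry while there is room, and stop early when one does not fit
 * (the -EMSGSIZE case in the patch).
 *
 * On return, *start is NULL only if the whole list was consumed.
 * After an early break it points at the entry that did not fit.
 * Returns the number of entries dumped by this call.
 */
int dump_some(struct node **start, int room)
{
	struct node *n;
	int dumped = 0;

	assert(start != NULL);

	for (n = *start; n; n = n->next) {
		if (n->size > room)
			break;		/* n is not NULL here */
		room -= n->size;
		dumped++;
	}
	*start = n;

	return dumped;
}
```

Calling dump_some() repeatedly with the same cursor walks the list in
buffer-sized pieces; the cursor becomes NULL only on the call that
reaches the end of the list.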

* Re: [PATCH 04/15] task_diag: add a new interface to get information about tasks (v4)
  2016-04-12  1:03   ` kbuild test robot
@ 2016-04-13  0:45     ` Andrew Vagin
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Vagin @ 2016-04-13  0:45 UTC (permalink / raw)
  To: kbuild test robot
  Cc: Andrey Vagin, kbuild-all, linux-kernel, Oleg Nesterov,
	Andrew Morton, Cyrill Gorcunov, Pavel Emelyanov, Roger Luethi,
	Arnd Bergmann, Arnaldo Carvalho de Melo, David Ahern,
	Andy Lutomirski, Pavel Odintsov

On Tue, Apr 12, 2016 at 09:03:39AM +0800, kbuild test robot wrote:
> Hi Andrey,
> 
> [auto build test ERROR on v4.6-rc3]
> [also build test ERROR on next-20160411]
> [if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Andrey-Vagin/task_diag-add-a-new-interface-to-get-information-about-processes-v3/20160412-074109
> config: microblaze-allmodconfig (attached as .config)
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=microblaze 
> 
> Note: the linux-review/Andrey-Vagin/task_diag-add-a-new-interface-to-get-information-about-processes-v3/20160412-074109 HEAD 2e0f174ce7e6fddc8fdd89f3dbb8d990626358a0 builds fine.
>       It only hurts bisectibility.
> 
> All errors (new ones prefixed by >>):
> 
> >> fs/proc/task_diag.c:139:19: error: field 'tgid' has incomplete type
>      struct tgid_iter tgid;
>                       ^
>    fs/proc/task_diag.c: In function 'iter_start':
> >> fs/proc/task_diag.c:187:3: error: implicit declaration of function 'next_tgid' [-Werror=implicit-function-declaration]
>       iter->tgid = next_tgid(iter->ns, iter->tgid);
>       ^
>    cc1: some warnings being treated as errors
>

Unfortunately, I forgot to include fs/proc/internal.h here; the include is
only added in the sixth patch. I will fix this in the final version.

Thanks,
Andrew
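
For readers unfamiliar with the two errors, here is a small userspace
sketch of why the declarations have to be visible first. It is an
illustration only: struct toy_task, the array-based lookup and the extra
parameters are assumptions made for the example, and only the names
tgid_iter, next_tgid, task_iter and iter_start mirror the report. Without
the full definition of struct tgid_iter the by-value member is an
incomplete type, and without the prototype the call to next_tgid() is an
implicit declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct task_struct; only the id matters here. */
struct toy_task {
	unsigned int tgid;
};

/*
 * The complete definition must be visible before the type is embedded
 * by value. With only "struct tgid_iter;" in scope, the member in
 * struct task_iter below fails with "field 'tgid' has incomplete type".
 */
struct tgid_iter {
	unsigned int tgid;
	struct toy_task *task;
};

/* The prototype must be visible before the first call. */
struct tgid_iter next_tgid(struct toy_task *tasks, size_t n,
			   struct tgid_iter iter);

struct task_iter {
	struct tgid_iter tgid;	/* embedded by value */
};

/* Same assignments as the TASK_DIAG_DUMP_ALL branch in the report. */
struct toy_task *iter_start(struct task_iter *iter,
			    struct toy_task *tasks, size_t n,
			    unsigned int from)
{
	assert(iter != NULL);

	iter->tgid.tgid = from;
	iter->tgid.task = NULL;
	iter->tgid = next_tgid(tasks, n, iter->tgid);

	return iter->tgid.task;
}

/* Toy lookup: first entry of an id-sorted array with tgid >= iter.tgid. */
struct tgid_iter next_tgid(struct toy_task *tasks, size_t n,
			   struct tgid_iter iter)
{
	size_t i;

	iter.task = NULL;
	for (i = 0; i < n; i++) {
		if (tasks[i].tgid >= iter.tgid) {
			iter.tgid = tasks[i].tgid;
			iter.task = &tasks[i];
			break;
		}
	}

	return iter;
}
```

In the series itself the equivalent of the two declarations at the top
comes from fs/proc/internal.h, which is why adding the include to this
patch is enough to keep every intermediate commit buildable.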

* Re: [PATCH 04/15] task_diag: add a new interface to get information about tasks (v4)
  2016-04-13  0:39     ` Andrew Vagin
@ 2016-04-13  5:26       ` Cyrill Gorcunov
  0 siblings, 0 replies; 21+ messages in thread
From: Cyrill Gorcunov @ 2016-04-13  5:26 UTC (permalink / raw)
  To: Andrew Vagin
  Cc: Andrey Vagin, linux-kernel, Oleg Nesterov, Andrew Morton,
	Pavel Emelyanov, Roger Luethi, Arnd Bergmann,
	Arnaldo Carvalho de Melo, David Ahern, Andy Lutomirski,
	Pavel Odintsov

On Tue, Apr 12, 2016 at 05:39:14PM -0700, Andrew Vagin wrote:
> > 
> > task = NULL always here?
> 
> No, it isn't if the loop is interrupted by break.

Yeah, I managed to miss the break, thanks!

Thread overview: 21+ messages
2016-04-11 23:35 [PATCH 0/15] task_diag: add a new interface to get information about processes (v3) Andrey Vagin
2016-04-11 23:35 ` [PATCH 01/15] proc: pick out a function to iterate task children Andrey Vagin
2016-04-11 23:35 ` [PATCH 02/15] proc: export task_first_tid() and task_next_tid() Andrey Vagin
2016-04-11 23:35 ` [PATCH 03/15] proc: export next_tgid() Andrey Vagin
2016-04-11 23:35 ` [PATCH 04/15] task_diag: add a new interface to get information about tasks (v4) Andrey Vagin
2016-04-12  1:03   ` kbuild test robot
2016-04-13  0:45     ` Andrew Vagin
2016-04-12  7:08   ` Cyrill Gorcunov
2016-04-13  0:39     ` Andrew Vagin
2016-04-13  5:26       ` Cyrill Gorcunov
2016-04-11 23:35 ` [PATCH 05/15] task_diag: add a new group to get process credentials Andrey Vagin
2016-04-11 23:35 ` [PATCH 06/15] task_diag: add a new group to get tasks memory mappings (v2) Andrey Vagin
2016-04-11 23:35 ` [PATCH 07/15] task_diag: add ability to dump children and threads Andrey Vagin
2016-04-11 23:35 ` [PATCH 08/15] task_diag: Only add VMAs for thread_group leader Andrey Vagin
2016-04-11 23:35 ` [PATCH 09/15] task_diag: add a flag to mark incomplete messages Andrey Vagin
2016-04-11 23:35 ` [PATCH 10/15] task_diag: add a new group to get resource usage Andrey Vagin
2016-04-11 23:35 ` [PATCH 11/15] task_diag: add a new group to get memory usage Andrey Vagin
2016-04-11 23:35 ` [PATCH 12/15] Documentation: add documentation for task_diag Andrey Vagin
2016-04-11 23:35 ` [PATCH 13/15] selftest: check the task_diag functinonality Andrey Vagin
2016-04-11 23:35 ` [PATCH 14/15] task_diag: Enhance fork tool to spawn threads Andrey Vagin
2016-04-11 23:35 ` [PATCH 15/15] test: check that task_diag can dump all thread of one process Andrey Vagin
