linux-kernel.vger.kernel.org archive mirror
* [resend][PATCH v9 0/3] ns, procfs: pid conversion between ns and showing pidns hierarchy
@ 2014-12-23 10:20 Chen Hanxiao
  2014-12-23 10:20 ` [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace Chen Hanxiao
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Chen Hanxiao @ 2014-12-23 10:20 UTC (permalink / raw)
  To: Eric W. Biederman, Serge Hallyn, Andrew Morton, Pavel Emelyanov
  Cc: containers, linux-kernel, David Howells, Vasiliy Kulikov,
	Mateusz Guzik, Oleg Nesterov, Richard Weinberger

This series exposes the pids a task has inside containers
via procfs, and also shows the hierarchy of pid namespaces.
With both, we can tell what a pid looks like inside a container
and how the pid namespaces relate to each other.

1. helpful for nested container checkpoint/restore

  We can tell whether two pids are related
  to each other.

      init_pid_ns 1
            │
   ┌────────────┐
  ns1                       ns2
   │                        │
  200                       300
                             │
                            ns3
                             │
                            400

  #cat /proc/pidns_hierarchy
  200 1 1
  300 1 1
  400 300 2

2. useful for pid translation from container
  Ex:
       init_pid_ns    ns1         ns2
   t1  2
   t2   `- 3          1
   t3   `- 4          3
   t4       `- 5      `- 5        1
   t5       `- 6      `- 8        3

  It helps with questions such as: the container's log shows that
  pid 3 went wrong; what is its pid on the host?
  a) inside container:
  # readlink /proc/3/ns/pid
  pid:[4026532388]

  b) on host:
  The hierarchy is shown in the form:
  <init_PID> <parent_of_init_PID> <relative PID level>

  # cat /proc/pidns_hierarchy
  14918 1 1
  16263 14918 2
  16581 1 1
  From this we can easily find that /proc/16263/ns/pid -> 4026532388.
  So on the host we know that the reported pid 3 is at level 2,
  and that its parent pid ns is the one whose init is pid 14918.

  c) on host, check the children of 16263 and grep NSpid from their status:
  NSpid:  16268   8       3

  So pid 16268 on the host is the pid 3 reported by the container.
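
  Step c) can be automated from user space. The following is only a
  minimal sketch, assuming the NSpid line format shown above (reader's
  pid first, innermost pid last). The helper names parse_nspid_line()
  and host_pid_if_match() and the MAX_NS_DEPTH cap are made up for this
  illustration and are not part of the series.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NS_DEPTH 32	/* assumed cap for this sketch only */

/*
 * Parse one "NSpid:" line from /proc/<pid>/status into out[].
 * out[0] is the pid in the reader's namespace, the last entry is the
 * pid in the task's own (innermost) namespace.
 * Returns the number of values parsed, or -1 if this is not an NSpid line.
 */
static int parse_nspid_line(const char *line, long *out, int max)
{
	const char *p;
	char *end;
	int n = 0;

	assert(line && out && max > 0);
	if (strncmp(line, "NSpid:", 6) != 0)
		return -1;

	p = line + 6;
	while (n < max) {
		long v = strtol(p, &end, 10);

		if (end == p)	/* no more numbers on this line */
			break;
		out[n++] = v;
		p = end;
	}
	return n;
}

/*
 * Hypothetical helper: if the task described by `line` has pid `ns_pid`
 * at nesting depth `depth` (0 = reader's namespace), return its pid in
 * the reader's namespace, otherwise return -1.
 */
static long host_pid_if_match(const char *line, int depth, long ns_pid)
{
	long pids[MAX_NS_DEPTH];
	int n = parse_nspid_line(line, pids, MAX_NS_DEPTH);

	if (depth < 0 || n <= depth || pids[depth] != ns_pid)
		return -1;
	return pids[0];
}
```

  A caller would read the status file of each child of 16263 line by
  line and pass every line to host_pid_if_match(line, 2, 3); for the
  NSpid line above this yields 16268.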

v9: fix code being included if CONFIG_PID_NS=n
    add docs to describe the usage of pidns_hierarchy procfs
v8: fix some improper comments
    use max() from kernel.h
v7: change pidns_hierarchy style to be consistent
    with current interface like:
    <init_PID> <parent_of_init_PID> <relative PID level>
    remove EXPERT dependency in Kconfig.
v6: fix some get_pid leaks and do some cleanups.
v5: collect pid by find_ge_pid;
    use local list inside nslist_proc_show;
    use get_pid, remove mutex lock.
v4: simplify pid collection and some performance optimization;
    fix another race issue.
v3: fix a race issue and memory leak issue in pidns_hierarchy;
    add another two fields: NSpgid and NSsid.
v2: use a procfs text file, replacing dirs under /proc for
    showing pidns hierarchy;
    add two new fields: NStgid and NSpid
    keep fields of Tgid and Pid unchanged for backward compatibility.

Chen Hanxiao (3):
  procfs: show hierarchy of pid namespace
  /proc/PID/status: show all sets of pid according to ns
  Documentation: add docs for /proc/pidns_hierarchy

 Documentation/namespaces/pidns-hierarchy.txt |  51 +++++
 fs/proc/Kconfig                              |   6 +
 fs/proc/Makefile                             |   1 +
 fs/proc/array.c                              |  16 ++
 fs/proc/internal.h                           |   9 +
 fs/proc/pidns_hierarchy.c                    | 280 +++++++++++++++++++++++++++
 fs/proc/root.c                               |   1 +
 7 files changed, 364 insertions(+)
 create mode 100644 Documentation/namespaces/pidns-hierarchy.txt
 create mode 100644 fs/proc/pidns_hierarchy.c

-- 
1.9.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace
  2014-12-23 10:20 [resend][PATCH v9 0/3] ns, procfs: pid conversion between ns and showing pidns hierarchy Chen Hanxiao
@ 2014-12-23 10:20 ` Chen Hanxiao
  2014-12-30  1:17   ` Chen, Hanxiao
  2014-12-30  5:54   ` Eric W. Biederman
  2014-12-23 10:20 ` [resend][PATCH v9 2/3] /proc/PID/status: show all sets of pid according to ns Chen Hanxiao
  2014-12-23 10:20 ` [PATCH v9 3/3] Documentation: add docs for /proc/pidns_hierarchy Chen Hanxiao
  2 siblings, 2 replies; 8+ messages in thread
From: Chen Hanxiao @ 2014-12-23 10:20 UTC (permalink / raw)
  To: Eric W. Biederman, Serge Hallyn, Andrew Morton, Pavel Emelyanov
  Cc: containers, linux-kernel, David Howells, Vasiliy Kulikov,
	Mateusz Guzik, Oleg Nesterov, Richard Weinberger

We lack pid namespace hierarchy information, which leads to:
  a) we don't know how pids relate to each other (who is whose child):
   /proc/PID/ns/pid only tells us whether two pids live in different ns
  b) trouble for nested lxc container checkpoint/restore/migration
  c) trouble for pid translation between containers

This patch shows the hierarchy of pid namespaces
via /proc/pidns_hierarchy in the form:

<init_PID> <parent_of_init_PID> <relative PID level>

Ex:
[root@localhost ~]#cat /proc/pidns_hierarchy
18060 1 1
18102 18060 2
1534  18102 3
1600  18102 3
1550  1 1
*Note: each number is the pid, as seen from the caller's ns, of pid 1 in a different ns

It shows the pid hierarchy below:

      init_pid_ns 1
              │
┌────────────┐
ns1                      ns2
│                        │
1550                    18060
                          │
                          │
                         ns3
                          │
                        18102
                          │
                 ┌──────────┐
                 ns4                   ns5
                 │                    │
                1534                  1600

Every pid printed in pidns_hierarchy
is the init pid of the pid ns at that level.
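
As an illustration of how a user-space tool might consume this file,
here is a rough sketch that parses the three-column lines and walks
from one init pid up to the reader's own init. It assumes only the
line format described above; struct pidns_entry and the helpers
parse_hierarchy_line(), find_entry() and ancestor_chain() are
hypothetical names for this example, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* One row of /proc/pidns_hierarchy as described in this patch. */
struct pidns_entry {
	int init_pid;	/* init of a pid ns, as seen from the reader's ns */
	int parent_pid;	/* init of the parent pid ns */
	int level;	/* level relative to the reader's ns, from 1 */
};

/* Parse "<init_PID> <parent_of_init_PID> <level>"; returns 0 on success. */
static int parse_hierarchy_line(const char *line, struct pidns_entry *e)
{
	assert(line && e);
	if (sscanf(line, "%d %d %d",
		   &e->init_pid, &e->parent_pid, &e->level) != 3)
		return -1;
	return 0;
}

/* Linear search for the row whose first column is init_pid. */
static const struct pidns_entry *
find_entry(const struct pidns_entry *tab, int n, int init_pid)
{
	int i;

	for (i = 0; i < n; i++)
		if (tab[i].init_pid == init_pid)
			return &tab[i];
	return NULL;
}

/*
 * Fill chain[] with init_pid followed by the init pids of all its
 * ancestor namespaces, ending with the reader's own init (which has
 * no row of its own). Returns the chain length, or -1 if init_pid
 * is not listed. The walk stops after max entries.
 */
static int ancestor_chain(const struct pidns_entry *tab, int n,
			  int init_pid, int *chain, int max)
{
	const struct pidns_entry *e = find_entry(tab, n, init_pid);
	int len = 0;

	if (!e || max < 1)
		return -1;

	chain[len++] = init_pid;
	while (e && len < max) {
		chain[len++] = e->parent_pid;
		e = find_entry(tab, n, e->parent_pid);
	}
	return len;
}
```

With the five example lines above loaded into a table, asking for the
chain of 1534 gives 1534, 18102, 18060, 1, which matches the ns4 branch
of the diagram.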

Acked-by: Richard Weinberger <richard@nod.at>

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
---
v9: fix code being included if CONFIG_PID_NS=n
v8: use max() from kernel.h
    fix some improper comments
v7: change style to be consistent with current interface like
    <init_PID> <parent_of_init_PID> <relative PID level>
    remove EXPERT dependency in Kconfig
v6: fix a get_pid leak and do some cleanups;
v5: collect pid by find_ge_pid;
    use local list inside nslist_proc_show;
    use get_pid, remove mutex lock.
v4: simplify pid collection and some performance optimization
    fix another race issue.
v3: fix a race issue and memory leak issue
v2: use a procfs text file instead of dirs under /proc

 fs/proc/Kconfig           |   6 +
 fs/proc/Makefile          |   1 +
 fs/proc/internal.h        |   9 ++
 fs/proc/pidns_hierarchy.c | 280 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/proc/root.c            |   1 +
 5 files changed, 297 insertions(+)
 create mode 100644 fs/proc/pidns_hierarchy.c

diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index 2183fcf..82dda55 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -71,3 +71,9 @@ config PROC_PAGE_MONITOR
 	  /proc/pid/smaps, /proc/pid/clear_refs, /proc/pid/pagemap,
 	  /proc/kpagecount, and /proc/kpageflags. Disabling these
           interfaces will reduce the size of the kernel by approximately 4kb.
+
+config PROC_PID_HIERARCHY
+	bool "Enable /proc/pidns_hierarchy support"
+	depends on PROC_FS
+	help
+	  Show pid namespace hierarchy information
diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index 7151ea4..33e384b 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -30,3 +30,4 @@ proc-$(CONFIG_PROC_KCORE)	+= kcore.o
 proc-$(CONFIG_PROC_VMCORE)	+= vmcore.o
 proc-$(CONFIG_PRINTK)	+= kmsg.o
 proc-$(CONFIG_PROC_PAGE_MONITOR)	+= page.o
+proc-$(CONFIG_PROC_PID_HIERARCHY)	+= pidns_hierarchy.o
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 6fcdba5..18e0773 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -280,6 +280,15 @@ struct proc_maps_private {
 #endif
 };
 
+/*
+ * pidns_hierarchy.c
+ */
+#ifdef CONFIG_PROC_PID_HIERARCHY
+	extern void proc_pidns_hierarchy_init(void);
+#else
+	static inline void proc_pidns_hierarchy_init(void) {}
+#endif
+
 struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode);
 
 extern const struct file_operations proc_pid_maps_operations;
diff --git a/fs/proc/pidns_hierarchy.c b/fs/proc/pidns_hierarchy.c
new file mode 100644
index 0000000..ab1c665
--- /dev/null
+++ b/fs/proc/pidns_hierarchy.c
@@ -0,0 +1,280 @@
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/proc_fs.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/pid_namespace.h>
+#include <linux/seq_file.h>
+#include <linux/kernel.h>
+
+/*
+ *  /proc/pidns_hierarchy
+ *
+ *  show the hierarchy of pid namespace as:
+ *  <init_PID> <parent_of_init_PID> <relative PID level>
+ *
+ *  init_PID: child reaper in ns
+ *  parent_of_init_PID: init_PID's parent, child reaper too
+ *  relative PID level: pid level relative to caller's ns
+ */
+
+#define NS_HIERARCHY	"pidns_hierarchy"
+
+/* list for host pid collection */
+struct pidns_list {
+	struct list_head list;
+	struct pid *pid;
+	unsigned int level;
+};
+
+static void free_pidns_list(struct list_head *head)
+{
+	struct pidns_list *tmp, *pos;
+
+	list_for_each_entry_safe(pos, tmp, head, list) {
+		list_del(&pos->list);
+		put_pid(pos->pid);
+		kfree(pos);
+	}
+}
+
+static int
+pidns_list_add(struct pid *pid, struct list_head *list_head,
+		int level)
+{
+	struct pidns_list *ent;
+
+	ent = kmalloc(sizeof(*ent), GFP_KERNEL);
+	if (!ent)
+		return -ENOMEM;
+
+	ent->pid = pid;
+	ent->level = level;
+	list_add_tail(&ent->list, list_head);
+
+	return 0;
+}
+
+static int
+pidns_list_filter(struct list_head *pidns_pid_list,
+		struct list_head *pidns_pid_tree)
+{
+	struct pidns_list *pos, *pos_t;
+	struct pid_namespace *ns0, *ns1;
+	struct pid *pid0, *pid1;
+	int rc, flag = 0;
+
+	/*
+	 * Screen out pids whose namespace is an ancestor of another
+	 * collected pid's namespace. pidns_pid_list may hold pids like:
+	 * ns0   ns1   ns2
+	 * pid1->pid2->pid3
+	 * We drop pid1 and pid2 and keep only pid3.
+	 */
+	list_for_each_entry(pos, pidns_pid_list, list) {
+		list_for_each_entry(pos_t, pidns_pid_list, list) {
+			flag = 0;
+			pid0 = pos->pid;
+			pid1 = pos_t->pid;
+			ns0 = pid0->numbers[pid0->level].ns;
+			ns1 = pid1->numbers[pid1->level].ns;
+			if (pos->pid->level < pos_t->pid->level)
+				for (; ns1 != NULL; ns1 = ns1->parent)
+					if (ns0 == ns1) {
+						flag = 1;
+						break;
+					}
+			/* a redundant pid found */
+			if (flag == 1)
+				break;
+		}
+
+		if (flag == 0) {
+			get_pid(pos->pid);
+			rc = pidns_list_add(pos->pid, pidns_pid_tree, 0);
+			if (rc) {
+				put_pid(pos->pid);
+				goto cleanup;
+			}
+		}
+	}
+
+	/*
+	 *  Now all useful stuffs are in pidns_pid_tree,
+	 *  free pidns_pid_list
+	 */
+	free_pidns_list(pidns_pid_list);
+
+	return 0;
+
+cleanup:
+	free_pidns_list(pidns_pid_tree);
+	return rc;
+}
+
+static void
+pidns_list_set_level(struct list_head *pidns_list_in,
+		struct pid_namespace *curr_ns)
+{
+	struct pidns_list *pos, *pos_t;
+	struct pid *pid0, *pid1;
+	int i;
+
+	/*
+	 * From the pid hierarchy point of view,
+	 * we now have a list of pids, none of whose
+	 * namespace chains is a prefix of another's.
+	 * But parts of their chains may still be shared.
+	 * We need to set the level of each pid:
+	 * pid0:         A->B->C   pid1:       A->B->D
+	 * level:           2                  0
+	 * We use level to mark where
+	 * the shared part of the chain ends.
+	 */
+	list_for_each_entry(pos, pidns_list_in, list) {
+		list_for_each_entry(pos_t, pidns_list_in, list) {
+			pid0 = pos->pid;
+			pid1 = pos_t->pid;
+			if (pid0 == pid1)
+				continue;
+			if (pos_t->level > 0)
+				continue;
+			for (i = curr_ns->level + 1; i <= pid0->level; i++) {
+				/* skip the public parts */
+				if (pid0->numbers[i].ns ==
+						pid1->numbers[i].ns)
+					continue;
+				else
+					break;
+			}
+			pos->level = i - 1;
+		}
+	}
+}
+
+/*
+ * Finds all init pids, places them into
+ * pidns_pid_list and then stores the hierarchy
+ * into pidns_pid_tree.
+ */
+static int proc_pidns_list_refresh(struct pid_namespace *curr_ns,
+		struct list_head *pidns_pid_list,
+		struct list_head *pidns_pid_tree)
+{
+	struct pid *pid;
+	int new_nr, nr = 0;
+	int rc;
+
+	/* collect pids in current namespace */
+	while (nr < PID_MAX_LIMIT) {
+		rcu_read_lock();
+		pid = find_ge_pid(nr, curr_ns);
+		if (!pid) {
+			rcu_read_unlock();
+			break;
+		}
+
+		new_nr = pid_vnr(pid);
+		if (!is_child_reaper(pid)) {
+			nr = new_nr + 1;
+			rcu_read_unlock();
+			continue;
+		}
+		get_pid(pid);
+		rcu_read_unlock();
+		rc = pidns_list_add(pid, pidns_pid_list, 0);
+		if (rc) {
+			put_pid(pid);
+			goto cleanup;
+		}
+		nr = new_nr + 1;
+	}
+
+	/*
+	 * Only one pid found as the child reaper,
+	 * so the current pid namespace does not have any sub-namespace,
+	 * return 0 directly.
+	 */
+	if (list_is_singular(pidns_pid_list)) {
+		rc = 0;
+		goto cleanup;
+	}
+
+	/*
+	 * screen duplicate pids from pidns_pid_list
+	 * and form a new list pidns_pid_tree.
+	 */
+	rc = pidns_list_filter(pidns_pid_list, pidns_pid_tree);
+	if (rc)
+		goto cleanup;
+
+	return 0;
+
+cleanup:
+	free_pidns_list(pidns_pid_list);
+	return rc;
+}
+
+static int nslist_proc_show(struct seq_file *m, void *v)
+{
+	struct pidns_list *pos;
+	struct pid_namespace *ns, *curr_ns;
+	struct pid *pid;
+	char pid_buf[16], ppid_buf[16];
+	int i, rc;
+
+	LIST_HEAD(pidns_pid_list);
+	LIST_HEAD(pidns_pid_tree);
+
+	curr_ns = task_active_pid_ns(current);
+
+	rc = proc_pidns_list_refresh(curr_ns,
+			&pidns_pid_list, &pidns_pid_tree);
+	if (rc)
+		return rc;
+
+	pidns_list_set_level(&pidns_pid_tree, curr_ns);
+
+	/* print pid namespace's hierarchy */
+	list_for_each_entry(pos, &pidns_pid_tree, list) {
+		pid = pos->pid;
+		for (i = max(curr_ns->level, pos->level) + 1;
+				i <= pid->level; i++) {
+			ns = pid->numbers[i].ns;
+			/* show PID '1' in specific pid ns */
+			snprintf(pid_buf, 16, "%u",
+				pid_vnr(find_pid_ns(1, ns)));
+			ns = pid->numbers[i - 1].ns;
+			snprintf(ppid_buf, 16, "%u",
+					pid_vnr(find_pid_ns(1, ns)));
+			seq_printf(m, "%s\t%s\t%d\n", pid_buf, ppid_buf,
+					i - curr_ns->level);
+		}
+	}
+
+	free_pidns_list(&pidns_pid_tree);
+
+	return 0;
+}
+
+static int nslist_proc_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, nslist_proc_show, NULL);
+}
+
+static const struct file_operations proc_nspid_nslist_fops = {
+	.open		= nslist_proc_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+/*
+ *  Called by proc_root_init() to initialize the /proc/pidns_hierarchy
+ */
+void __init proc_pidns_hierarchy_init(void)
+{
+	proc_create(NS_HIERARCHY, S_IRUGO,
+		NULL, &proc_nspid_nslist_fops);
+}
diff --git a/fs/proc/root.c b/fs/proc/root.c
index e74ac9f..bcb55c7 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -190,6 +190,7 @@ void __init proc_root_init(void)
 	proc_tty_init();
 	proc_mkdir("bus", NULL);
 	proc_sys_init();
+	proc_pidns_hierarchy_init();
 }
 
 static int proc_root_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [resend][PATCH v9 2/3] /proc/PID/status: show all sets of pid according to ns
  2014-12-23 10:20 [resend][PATCH v9 0/3] ns, procfs: pid conversion between ns and showing pidns hierarchy Chen Hanxiao
  2014-12-23 10:20 ` [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace Chen Hanxiao
@ 2014-12-23 10:20 ` Chen Hanxiao
  2014-12-30  5:39   ` Eric W. Biederman
  2014-12-23 10:20 ` [PATCH v9 3/3] Documentation: add docs for /proc/pidns_hierarchy Chen Hanxiao
  2 siblings, 1 reply; 8+ messages in thread
From: Chen Hanxiao @ 2014-12-23 10:20 UTC (permalink / raw)
  To: Eric W. Biederman, Serge Hallyn, Andrew Morton, Pavel Emelyanov
  Cc: containers, linux-kernel, David Howells, Vasiliy Kulikov,
	Mateusz Guzik, Oleg Nesterov, Richard Weinberger

If something goes wrong inside a container guest, a host user
cannot tell which process is in trouble from the guest pid alone:
users of the container guest only know the pid inside the container.
This is an obstacle to troubleshooting.

This patch adds four fields: NStgid, NSpid, NSpgid and NSsid:
a) In init_pid_ns, nothing changed;

b) For a task in a pid namespace, they show its pids inside the containers:
  NStgid: 21776   5       1
  NSpid:  21776   5       1
  NSpgid: 21776   5       1
  NSsid:  21729   1       0
  ** Process id is 21776 at level 0, 5 at level 1, 1 at level 2.

c) If pid namespaces are nested, the output depends on which pidns you are in.
  NStgid: 5       1
  NSpid:  5       1
  NSpgid: 5       1
  NSsid:  1       0
  ** View from level 1
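
The difference between b) and c) comes only from where the loop over
levels starts. The following user-space model is a sketch under that
assumption: numbers[] stands in for the per-level ids of one task, and
format_ns_ids() is a hypothetical name, not a kernel function. Unlike
the patch, it does not emit the field name or the trailing space.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Model of the per-namespace id fields: numbers[g] is the task's id at
 * level g, for g = 0..task_level. A reader living at viewer_level sees
 * only the entries from its own level down to the task's level.
 * Writes the tab-separated ids into buf and returns how many were
 * written, or -1 on invalid levels or if buf is too small.
 */
static int format_ns_ids(const int *numbers, int task_level,
			 int viewer_level, char *buf, size_t size)
{
	size_t used = 0;
	int g, n;

	assert(numbers && buf && size > 0);
	buf[0] = '\0';
	if (viewer_level < 0 || viewer_level > task_level)
		return -1;

	for (g = viewer_level; g <= task_level; g++) {
		n = snprintf(buf + used, size - used, "%s%d",
			     g == viewer_level ? "" : "\t", numbers[g]);
		if (n < 0 || (size_t)n >= size - used)
			return -1;	/* output would be truncated */
		used += (size_t)n;
	}
	return task_level - viewer_level + 1;
}
```

For the task above, with ids {21776, 5, 1}, a reader at level 0 gets
three columns and a reader at level 1 gets two, as in examples b) and c).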

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Tested-by: Serge Hallyn <serge.hallyn@canonical.com>

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
---
v9: rebased on 3.19-rc1
No change from v4-v8
v3: add another two fields: NSpgid and NSsid.
v2: add two new fields: NStgid and NSpid.
    keep fields of Tgid and Pid unchanged for backward compatibility.

 fs/proc/array.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index bd117d0..35205d4 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -208,6 +208,22 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 			   from_kgid_munged(user_ns, GROUP_AT(group_info, g)));
 	put_cred(cred);
 
+	seq_puts(m, "\nNStgid:");
+	for (g = ns->level; g <= pid->level; g++)
+		seq_printf(m, "\t%d ",
+			task_tgid_nr_ns(p, pid->numbers[g].ns));
+	seq_puts(m, "\nNSpid:");
+	for (g = ns->level; g <= pid->level; g++)
+		seq_printf(m, "\t%d ",
+			task_pid_nr_ns(p, pid->numbers[g].ns));
+	seq_puts(m, "\nNSpgid:");
+	for (g = ns->level; g <= pid->level; g++)
+		seq_printf(m, "\t%d ",
+			task_pgrp_nr_ns(p, pid->numbers[g].ns));
+	seq_puts(m, "\nNSsid:");
+	for (g = ns->level; g <= pid->level; g++)
+		seq_printf(m, "\t%d ",
+			task_session_nr_ns(p, pid->numbers[g].ns));
 	seq_putc(m, '\n');
 }
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v9 3/3] Documentation: add docs for /proc/pidns_hierarchy
  2014-12-23 10:20 [resend][PATCH v9 0/3] ns, procfs: pid conversion between ns and showing pidns hierarchy Chen Hanxiao
  2014-12-23 10:20 ` [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace Chen Hanxiao
  2014-12-23 10:20 ` [resend][PATCH v9 2/3] /proc/PID/status: show all sets of pid according to ns Chen Hanxiao
@ 2014-12-23 10:20 ` Chen Hanxiao
  2 siblings, 0 replies; 8+ messages in thread
From: Chen Hanxiao @ 2014-12-23 10:20 UTC (permalink / raw)
  To: Eric W. Biederman, Serge Hallyn, Andrew Morton, Pavel Emelyanov
  Cc: containers, linux-kernel, David Howells, Vasiliy Kulikov,
	Mateusz Guzik, Oleg Nesterov, Richard Weinberger

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
---
 Documentation/namespaces/pidns-hierarchy.txt | 51 ++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100644 Documentation/namespaces/pidns-hierarchy.txt

diff --git a/Documentation/namespaces/pidns-hierarchy.txt b/Documentation/namespaces/pidns-hierarchy.txt
new file mode 100644
index 0000000..feb92a9
--- /dev/null
+++ b/Documentation/namespaces/pidns-hierarchy.txt
@@ -0,0 +1,51 @@
+This document describes how to use the pid namespace hierarchy procfs file.
+
+We can tell whether two pids live in the same pid namespace
+from /proc/PID/ns/pid, but the relationships
+between the pids remain unknown:
+we cannot tell whether one pid is another one's parent or sibling.
+/proc/pidns_hierarchy gives us the answer.
+
+/proc/pidns_hierarchy will show the hierarchy of pid namespace
+in the form of:
+
+<init_PID> <parent_of_init_PID> <relative PID level>
+
+init_PID:            child reaper in a pid namespace
+parent_of_init_PID:  init_PID's parent, child reaper too
+relative PID level:  pid level relative to caller's ns,
+                     starting from '1'.
+
+Here is a chart to describe the relationship between
+some pids:
+
+         init_pid_ns                          level 0
+              |
+              1
+              |
+┌────────────┐
+ns1                      ns2                  level 1
+|                         |
+1550                    18060
+                          |
+                          |
+                         ns3                  level 2
+                          |
+                        18102
+                          |
+                 ┌──────────┐
+                 ns4                   ns5    level 3
+                 |                     |
+                1534                  1600
+
+It will be shown by /proc/pidns_hierarchy as below:
+
+#cat /proc/pidns_hierarchy
+18060 1 1
+18102 18060 2
+1534  18102 3
+1600  18102 3
+1550  1 1
+
+Note: numbers in column 1 are pid numbers in current ns,
+    they represent the pid '1' in different ns
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* RE: [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace
  2014-12-23 10:20 ` [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace Chen Hanxiao
@ 2014-12-30  1:17   ` Chen, Hanxiao
  2014-12-30  5:54   ` Eric W. Biederman
  1 sibling, 0 replies; 8+ messages in thread
From: Chen, Hanxiao @ 2014-12-30  1:17 UTC (permalink / raw)
  To: Eric W. Biederman, Serge Hallyn, Andrew Morton, Pavel Emelyanov
  Cc: Richard Weinberger, containers, linux-kernel, Oleg Nesterov,
	David Howells, Mateusz Guzik




> -----Original Message-----
> From: containers-bounces@lists.linux-foundation.org
> [mailto:containers-bounces@lists.linux-foundation.org] On Behalf Of Chen Hanxiao
> Sent: Tuesday, December 23, 2014 6:21 PM
> To: Eric W. Biederman; Serge Hallyn; Andrew Morton; Pavel Emelyanov
> Cc: Richard Weinberger; containers@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org; Oleg Nesterov; David Howells; Mateusz Guzik
> Subject: [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace
> 
> We lack pid namespace hierarchy information, which leads to:
>   a) we don't know how pids relate to each other (who is whose child):
>    /proc/PID/ns/pid only tells us whether two pids live in different ns
>   b) trouble for nested lxc container checkpoint/restore/migration
>   c) trouble for pid translation between containers
> 
> This patch will show the hierarchy of pid namespace
> by pidns_hierarchy like:
> 
> <init_PID> <parent_of_init_PID> <relative PID level>
> 

Hi Eric, Pavel
 
Any comments?

Regards,
- Chen

> Ex:
> [root@localhost ~]#cat /proc/pidns_hierarchy
> 18060 1 1
> 18102 18060 2
> 1534  18102 3
> 1600  18102 3
> 1550  1 1
> *Note: numbers represent the pid 1 in different ns
> 
> It shows the pid hierarchy below:
> 
>       init_pid_ns 1
>               │
> ┌────────────┐
> ns1                      ns2
> │                        │
> 1550                    18060
>                           │
>                           │
>                          ns3
>                           │
>                         18102
>                           │
>                  ┌──────────┐
>                  ns4                   ns5
>                  │                    │
>                 1534                  1600
> 
> Every pid printed in pidns_hierarchy
> is the init pid of that pid ns level.
> 
> Acked-by: Richard Weinberer <richard@nod.at>
> 
> Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
> ---
> v9: fix codes be included if CONFIG_PID_NS=n
> v8: use max() from kernel.h
>     fix some improper comments
> v7: change stype to be consistent with current interface like
>     <init_PID> <parent_of_init_PID> <relative PID level>
>     remove EXPERT dependent in Kconfig
> v6: fix a get_pid leak and do some cleanups;
> v5: collect pid by find_ge_pid;
>     use local list inside nslist_proc_show;
>     use get_pid, remove mutex lock.
> v4: simplify pid collection and some performance optimizamtion
>     fix another race issue.
> v3: fix a race issue and memory leak issue
> v2: use a procfs text file instead of dirs under /proc
> 
>  fs/proc/Kconfig           |   6 +
>  fs/proc/Makefile          |   1 +
>  fs/proc/internal.h        |   9 ++
>  fs/proc/pidns_hierarchy.c | 280 ++++++++++++++++++++++++++++++++++++++++++++++
>  fs/proc/root.c            |   1 +
>  5 files changed, 297 insertions(+)
>  create mode 100644 fs/proc/pidns_hierarchy.c
> 
> diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
> index 2183fcf..82dda55 100644
> --- a/fs/proc/Kconfig
> +++ b/fs/proc/Kconfig
> @@ -71,3 +71,9 @@ config PROC_PAGE_MONITOR
>  	  /proc/pid/smaps, /proc/pid/clear_refs, /proc/pid/pagemap,
>  	  /proc/kpagecount, and /proc/kpageflags. Disabling these
>            interfaces will reduce the size of the kernel by approximately 4kb.
> +
> +config PROC_PID_HIERARCHY
> +	bool "Enable /proc/pidns_hierarchy support"
> +	depends on PROC_FS
> +	help
> +	  Show pid namespace hierarchy information
> diff --git a/fs/proc/Makefile b/fs/proc/Makefile
> index 7151ea4..33e384b 100644
> --- a/fs/proc/Makefile
> +++ b/fs/proc/Makefile
> @@ -30,3 +30,4 @@ proc-$(CONFIG_PROC_KCORE)	+= kcore.o
>  proc-$(CONFIG_PROC_VMCORE)	+= vmcore.o
>  proc-$(CONFIG_PRINTK)	+= kmsg.o
>  proc-$(CONFIG_PROC_PAGE_MONITOR)	+= page.o
> +proc-$(CONFIG_PROC_PID_HIERARCHY)	+= pidns_hierarchy.o
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index 6fcdba5..18e0773 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -280,6 +280,15 @@ struct proc_maps_private {
>  #endif
>  };
> 
> +/*
> + * pidns_hierarchy.c
> + */
> +#ifdef CONFIG_PROC_PID_HIERARCHY
> +	extern void proc_pidns_hierarchy_init(void);
> +#else
> +	static inline void proc_pidns_hierarchy_init(void) {}
> +#endif
> +
>  struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode);
> 
>  extern const struct file_operations proc_pid_maps_operations;
> diff --git a/fs/proc/pidns_hierarchy.c b/fs/proc/pidns_hierarchy.c
> new file mode 100644
> index 0000000..ab1c665
> --- /dev/null
> +++ b/fs/proc/pidns_hierarchy.c
> @@ -0,0 +1,280 @@
> +#include <linux/init.h>
> +#include <linux/errno.h>
> +#include <linux/proc_fs.h>
> +#include <linux/module.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/pid_namespace.h>
> +#include <linux/seq_file.h>
> +#include <linux/kernel.h>
> +
> +/*
> + *  /proc/pidns_hierarchy
> + *
> + *  show the hierarchy of pid namespace as:
> + *  <init_PID> <parent_of_init_PID> <relative PID level>
> + *
> + *  init_PID: child reaper in ns
> + *  parent_of_init_PID: init_PID's parent, child reaper too
> + *  relative PID level: pid level relative to caller's ns
> + */
> +
> +#define NS_HIERARCHY	"pidns_hierarchy"
> +
> +/* list for host pid collection */
> +struct pidns_list {
> +	struct list_head list;
> +	struct pid *pid;
> +	unsigned int level;
> +};
> +
> +static void free_pidns_list(struct list_head *head)
> +{
> +	struct pidns_list *tmp, *pos;
> +
> +	list_for_each_entry_safe(pos, tmp, head, list) {
> +		list_del(&pos->list);
> +		put_pid(pos->pid);
> +		kfree(pos);
> +	}
> +}
> +
> +static int
> +pidns_list_add(struct pid *pid, struct list_head *list_head,
> +		int level)
> +{
> +	struct pidns_list *ent;
> +
> +	ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> +	if (!ent)
> +		return -ENOMEM;
> +
> +	ent->pid = pid;
> +	ent->level = level;
> +	list_add_tail(&ent->list, list_head);
> +
> +	return 0;
> +}
> +
> +static int
> +pidns_list_filter(struct list_head *pidns_pid_list,
> +		struct list_head *pidns_pid_tree)
> +{
> +	struct pidns_list *pos, *pos_t;
> +	struct pid_namespace *ns0, *ns1;
> +	struct pid *pid0, *pid1;
> +	int rc, flag = 0;
> +
> +	/*
> +	 * screen pids with relationship
> +	 * in pidns_pid_list, we may add pids like:
> +	 * ns0   ns1   ns2
> +	 * pid1->pid2->pid3
> +	 * we should screen pid1, pid2 and keep pid3
> +	 */
> +	list_for_each_entry(pos, pidns_pid_list, list) {
> +		list_for_each_entry(pos_t, pidns_pid_list, list) {
> +			flag = 0;
> +			pid0 = pos->pid;
> +			pid1 = pos_t->pid;
> +			ns0 = pid0->numbers[pid0->level].ns;
> +			ns1 = pid1->numbers[pid1->level].ns;
> +			if (pos->pid->level < pos_t->pid->level)
> +				for (; ns1 != NULL; ns1 = ns1->parent)
> +					if (ns0 == ns1) {
> +						flag = 1;
> +						break;
> +					}
> +			/* a redundant pid found */
> +			if (flag == 1)
> +				break;
> +		}
> +
> +		if (flag == 0) {
> +			get_pid(pos->pid);
> +			rc = pidns_list_add(pos->pid, pidns_pid_tree, 0);
> +			if (rc) {
> +				put_pid(pos->pid);
> +				goto cleanup;
> +			}
> +		}
> +	}
> +
> +	/*
> +	 *  Now all useful stuffs are in pidns_pid_tree,
> +	 *  free pidns_pid_list
> +	 */
> +	free_pidns_list(pidns_pid_list);
> +
> +	return 0;
> +
> +cleanup:
> +	free_pidns_list(pidns_pid_tree);
> +	return rc;
> +}
> +
> +static void
> +pidns_list_set_level(struct list_head *pidns_list_in,
> +		struct pid_namespace *curr_ns)
> +{
> +	struct pidns_list *pos, *pos_t;
> +	struct pid *pid0, *pid1;
> +	int i;
> +
> +	/*
> +	 * From the pid hierarchy point of view,
> +	 * we already had a list of pids who are not
> +	 * the subsets of each other.
> +	 * But part of them may be same.
> +	 * We need to set the level of each pids:
> +	 * pid0:         A->B->C   pid1:       A->B->D
> +	 * level:           2                  0
> +	 * We use level to identify
> +	 * the public part of each pids.
> +	 */
> +	list_for_each_entry(pos, pidns_list_in, list) {
> +		list_for_each_entry(pos_t, pidns_list_in, list) {
> +			pid0 = pos->pid;
> +			pid1 = pos_t->pid;
> +			if (pid0 == pid1)
> +				continue;
> +			if (pos_t->level > 0)
> +				continue;
> +			for (i = curr_ns->level + 1; i <= pid0->level; i++) {
> +				/* skip the public parts */
> +				if (pid0->numbers[i].ns ==
> +						pid1->numbers[i].ns)
> +					continue;
> +				else
> +					break;
> +			}
> +			pos->level = i - 1;
> +		}
> +	}
> +}
> +
> +/*
> + * Finds all init pids, places them into
> + * pidns_pid_list and then stores the hierarchy
> + * into pidns_pid_tree.
> + */
> +static int proc_pidns_list_refresh(struct pid_namespace *curr_ns,
> +		struct list_head *pidns_pid_list,
> +		struct list_head *pidns_pid_tree)
> +{
> +	struct pid *pid;
> +	int new_nr, nr = 0;
> +	int rc;
> +
> +	/* collect pids in current namespace */
> +	while (nr < PID_MAX_LIMIT) {
> +		rcu_read_lock();
> +		pid = find_ge_pid(nr, curr_ns);
> +		if (!pid) {
> +			rcu_read_unlock();
> +			break;
> +		}
> +
> +		new_nr = pid_vnr(pid);
> +		if (!is_child_reaper(pid)) {
> +			nr = new_nr + 1;
> +			rcu_read_unlock();
> +			continue;
> +		}
> +		get_pid(pid);
> +		rcu_read_unlock();
> +		rc = pidns_list_add(pid, pidns_pid_list, 0);
> +		if (rc) {
> +			put_pid(pid);
> +			goto cleanup;
> +		}
> +		nr = new_nr + 1;
> +	}
> +
> +	/*
> +	 * Only one pid found as the child reaper,
> +	 * so current pid namespace do not have sub-namespace,
> +	 * return 0 directly.
> +	 */
> +	if (list_is_singular(pidns_pid_list)) {
> +		rc = 0;
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * screen duplicate pids from pidns_pid_list
> +	 * and form a new list pidns_pid_tree.
> +	 */
> +	rc = pidns_list_filter(pidns_pid_list, pidns_pid_tree);
> +	if (rc)
> +		goto cleanup;
> +
> +	return 0;
> +
> +cleanup:
> +	free_pidns_list(pidns_pid_list);
> +	return rc;
> +}
> +
> +static int nslist_proc_show(struct seq_file *m, void *v)
> +{
> +	struct pidns_list *pos;
> +	struct pid_namespace *ns, *curr_ns;
> +	struct pid *pid;
> +	char pid_buf[16], ppid_buf[16];
> +	int i, rc;
> +
> +	LIST_HEAD(pidns_pid_list);
> +	LIST_HEAD(pidns_pid_tree);
> +
> +	curr_ns = task_active_pid_ns(current);
> +
> +	rc = proc_pidns_list_refresh(curr_ns,
> +			&pidns_pid_list, &pidns_pid_tree);
> +	if (rc)
> +		return rc;
> +
> +	pidns_list_set_level(&pidns_pid_tree, curr_ns);
> +
> +	/* print pid namespace's hierarchy */
> +	list_for_each_entry(pos, &pidns_pid_tree, list) {
> +		pid = pos->pid;
> +		for (i = max(curr_ns->level, pos->level) + 1;
> +				i <= pid->level; i++) {
> +			ns = pid->numbers[i].ns;
> +			/* show PID '1' in specific pid ns */
> +			snprintf(pid_buf, 16, "%u",
> +				pid_vnr(find_pid_ns(1, ns)));
> +			ns = pid->numbers[i - 1].ns;
> +			snprintf(ppid_buf, 16, "%u",
> +					pid_vnr(find_pid_ns(1, ns)));
> +			seq_printf(m, "%s\t%s\t%d\n", pid_buf, ppid_buf,
> +					i - curr_ns->level);
> +		}
> +	}
> +
> +	free_pidns_list(&pidns_pid_tree);
> +
> +	return 0;
> +}
> +
> +static int nslist_proc_open(struct inode *inode, struct file *file)
> +{
> +	return single_open(file, nslist_proc_show, NULL);
> +}
> +
> +static const struct file_operations proc_nspid_nslist_fops = {
> +	.open		= nslist_proc_open,
> +	.read		= seq_read,
> +	.llseek		= seq_lseek,
> +	.release	= single_release,
> +};
> +
> +/*
> + *  Called by proc_root_init() to initialize the /proc/pidns_hierarchy
> + */
> +void __init proc_pidns_hierarchy_init(void)
> +{
> +	proc_create(NS_HIERARCHY, S_IRUGO,
> +		NULL, &proc_nspid_nslist_fops);
> +}
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index e74ac9f..bcb55c7 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -190,6 +190,7 @@ void __init proc_root_init(void)
>  	proc_tty_init();
>  	proc_mkdir("bus", NULL);
>  	proc_sys_init();
> +	proc_pidns_hierarchy_init();
>  }
> 
>  static int proc_root_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat
> --
> 1.9.3
> 


* Re: [resend][PATCH v9 2/3] /proc/PID/status: show all sets of pid according to ns
  2014-12-23 10:20 ` [resend][PATCH v9 2/3] /proc/PID/status: show all sets of pid according to ns Chen Hanxiao
@ 2014-12-30  5:39   ` Eric W. Biederman
  0 siblings, 0 replies; 8+ messages in thread
From: Eric W. Biederman @ 2014-12-30  5:39 UTC (permalink / raw)
  To: Chen Hanxiao
  Cc: Serge Hallyn, Andrew Morton, Pavel Emelyanov, containers,
	linux-kernel, David Howells, Vasiliy Kulikov, Mateusz Guzik,
	Oleg Nesterov, Richard Weinberger

Chen Hanxiao <chenhanxiao@cn.fujitsu.com> writes:

> If an issue occurs inside a container guest, the host user
> cannot tell which process is in trouble from the guest pid alone:
> users of the container guest only know the pid inside the container.
> This is an obstacle to troubleshooting.
>
> This patch adds four fields: NStgid, NSpid, NSpgid and NSsid:
> a) In init_pid_ns, nothing changed;
>
> b) In one pidns, the fields tell the pid inside containers:
>   NStgid: 21776   5       1
>   NSpid:  21776   5       1
>   NSpgid: 21776   5       1
>   NSsid:  21729   1       0
>   ** Process id is 21776 in level 0, 5 in level 1, 1 in level 2.
>
> c) If pidns is nested, it depends on which pidns you are in.
>   NStgid: 5       1
>   NSpid:  5       1
>   NSpgid: 5       1
>   NSsid:  1       0
>   ** Views from level 1
>
> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
> Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
>
> Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

At a quick review and read through this looks good.  Once I finish
clearing the security bug fixes from my tree I will see about picking this
up.

Eric


> ---
> v9: rebased on 3.19-rc1
> No change from v4-v8
> v3: add another two fields: NSpgid and NSsid.
> v2: add two new fields: NStgid and NSpid.
>     keep fields of Tgid and Pid unchanged for back compatibility.
>
>  fs/proc/array.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index bd117d0..35205d4 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -208,6 +208,22 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
>  			   from_kgid_munged(user_ns, GROUP_AT(group_info, g)));
>  	put_cred(cred);
>  
> +	seq_puts(m, "\nNStgid:");
> +	for (g = ns->level; g <= pid->level; g++)
> +		seq_printf(m, "\t%d ",
> +			task_tgid_nr_ns(p, pid->numbers[g].ns));
> +	seq_puts(m, "\nNSpid:");
> +	for (g = ns->level; g <= pid->level; g++)
> +		seq_printf(m, "\t%d ",
> +			task_pid_nr_ns(p, pid->numbers[g].ns));
> +	seq_puts(m, "\nNSpgid:");
> +	for (g = ns->level; g <= pid->level; g++)
> +		seq_printf(m, "\t%d ",
> +			task_pgrp_nr_ns(p, pid->numbers[g].ns));
> +	seq_puts(m, "\nNSsid:");
> +	for (g = ns->level; g <= pid->level; g++)
> +		seq_printf(m, "\t%d ",
> +			task_session_nr_ns(p, pid->numbers[g].ns));
>  	seq_putc(m, '\n');
>  }
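
For anyone wanting to consume the new fields from userspace, here is a
minimal sketch of a parser for one NS* line. It assumes the format
produced by the hunk above (key, colon, then one "\t%d " per namespace
level, outermost first). parse_ns_ids() is a hypothetical helper name,
not something from the patch.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse one "NSpid:"-style line from /proc/PID/status into out[].
 * Returns the number of ids stored (outermost namespace first),
 * or -1 if the line does not start with "<key>:".
 */
static int parse_ns_ids(const char *line, const char *key,
			long *out, int max)
{
	size_t klen = strlen(key);
	const char *p;
	char *end;
	int n = 0;

	if (strncmp(line, key, klen) != 0 || line[klen] != ':')
		return -1;

	p = line + klen + 1;
	while (n < max) {
		/* strtol() skips the tabs and spaces between ids */
		long v = strtol(p, &end, 10);

		if (end == p)	/* nothing left to convert */
			break;
		out[n++] = v;
		p = end;
	}
	return n;
}
```

With the example from the changelog, the line for NSpid yields three
ids (21776, 5, 1); the last one is the pid in the innermost namespace.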


* Re: [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace
  2014-12-23 10:20 ` [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace Chen Hanxiao
  2014-12-30  1:17   ` Chen, Hanxiao
@ 2014-12-30  5:54   ` Eric W. Biederman
  1 sibling, 0 replies; 8+ messages in thread
From: Eric W. Biederman @ 2014-12-30  5:54 UTC (permalink / raw)
  To: Chen Hanxiao
  Cc: Serge Hallyn, Andrew Morton, Pavel Emelyanov, containers,
	linux-kernel, David Howells, Vasiliy Kulikov, Mateusz Guzik,
	Oleg Nesterov, Richard Weinberger

Chen Hanxiao <chenhanxiao@cn.fujitsu.com> writes:

> We lack pid hierarchy information, which leads to:
>   a) we don't know the pids' relationship, i.e. who is whose child:
>    /proc/PID/ns/pid only tells us whether two pids live in different ns
>   b) trouble for nested lxc container checkpoint/restore/migration
>   c) trouble for pid translation between containers;
>
> This patch will show the hierarchy of pid namespace
> by pidns_hierarchy like:
>
> <init_PID> <parent_of_init_PID> <relative PID level>

I am still trying to figure out if this is a good idea.

The problem is real, though I am not certain how severe it is.  Is there
interesting code this would allow you to write?

It would be nice if we could use the same solution for both user
namespace and pid namespace hierarchy description.  This solution
doesn't have a chance of doing that.

The patch itself, though, is currently incorrect.  What is read from a
file should be determined at open time and, better still, be the same
no matter who reads the file.

Your pidns_hierarchy file morphs depending on who is reading it and that
is at a minimum confusing, and will cause problems if someone decides to
pass the file descriptor.

There is also an issue that this hierarchy does not seem to be able to
deal with pid namespaces that currently have no pids in them.  If the
goal is to use this for checkpoint/restart, that may make certain
pid namespace states uncheckpointable.  So that seems like a significant
oversight.

Eric


> Ex:
> [root@localhost ~]#cat /proc/pidns_hierarchy
> 18060 1 1
> 18102 18060 2
> 1534  18102 3
> 1600  18102 3
> 1550  1 1
> *Note: numbers represent the pid 1 in different ns
>
> It shows the pid hierarchy below:
>
>       init_pid_ns 1
>               │
> ┌────────────┐
> ns1                      ns2
> │                        │
> 1550                    18060
>                           │
>                           │
>                          ns3
>                           │
>                         18102
>                           │
>                  ┌──────────┐
>                  ns4                   ns5
>                  │                    │
>                 1534                  1600
>
> Every pid printed in pidns_hierarchy
> is the init pid of that pid ns level.
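
A minimal userspace sketch of how a tool might read this format and
recover the parent links, assuming exactly the three whitespace-separated
columns shown above. struct pidns_entry, parse_pidns_hierarchy() and
find_parent() are hypothetical names used only for illustration.

```c
#include <stdio.h>

struct pidns_entry {
	int init_pid;	/* pid 1 of the namespace, as seen by the reader */
	int parent_pid;	/* init pid of the parent namespace */
	int level;	/* level relative to the reader's namespace */
};

/* Parse "<init_PID> <parent_of_init_PID> <level>" records from text. */
static int parse_pidns_hierarchy(const char *text,
				 struct pidns_entry *out, int max)
{
	int n = 0, used;

	while (n < max &&
	       sscanf(text, "%d %d %d%n", &out[n].init_pid,
		      &out[n].parent_pid, &out[n].level, &used) == 3) {
		text += used;	/* advance past the record just read */
		n++;
	}
	return n;
}

/*
 * Return the index of the entry describing idx's parent namespace,
 * or -1 if the parent is the reader's own namespace (not listed).
 */
static int find_parent(const struct pidns_entry *tab, int n, int idx)
{
	int i;

	for (i = 0; i < n; i++)
		if (tab[i].init_pid == tab[idx].parent_pid &&
		    tab[i].level == tab[idx].level - 1)
			return i;
	return -1;
}
```

For the sample output above, the entry for 1534 resolves to the entry
for 18102, which in turn resolves to 18060; 18060 and 1550 have no
listed parent because their parent is the reader's namespace.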
>
> Acked-by: Richard Weinberger <richard@nod.at>
>
> Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
> ---
> v9: fix code being included if CONFIG_PID_NS=n
> v8: use max() from kernel.h
>     fix some improper comments
> v7: change style to be consistent with current interface like
>     <init_PID> <parent_of_init_PID> <relative PID level>
>     remove EXPERT dependency in Kconfig
> v6: fix a get_pid leak and do some cleanups;
> v5: collect pid by find_ge_pid;
>     use local list inside nslist_proc_show;
>     use get_pid, remove mutex lock.
> v4: simplify pid collection and some performance optimization
>     fix another race issue.
> v3: fix a race issue and memory leak issue
> v2: use a procfs text file instead of dirs under /proc
>
>  fs/proc/Kconfig           |   6 +
>  fs/proc/Makefile          |   1 +
>  fs/proc/internal.h        |   9 ++
>  fs/proc/pidns_hierarchy.c | 280 ++++++++++++++++++++++++++++++++++++++++++++++
>  fs/proc/root.c            |   1 +
>  5 files changed, 297 insertions(+)
>  create mode 100644 fs/proc/pidns_hierarchy.c
>
> diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
> index 2183fcf..82dda55 100644
> --- a/fs/proc/Kconfig
> +++ b/fs/proc/Kconfig
> @@ -71,3 +71,9 @@ config PROC_PAGE_MONITOR
>  	  /proc/pid/smaps, /proc/pid/clear_refs, /proc/pid/pagemap,
>  	  /proc/kpagecount, and /proc/kpageflags. Disabling these
>            interfaces will reduce the size of the kernel by approximately 4kb.
> +
> +config PROC_PID_HIERARCHY
> +	bool "Enable /proc/pidns_hierarchy support"
> +	depends on PROC_FS
> +	help
> +	  Show pid namespace hierarchy information
> diff --git a/fs/proc/Makefile b/fs/proc/Makefile
> index 7151ea4..33e384b 100644
> --- a/fs/proc/Makefile
> +++ b/fs/proc/Makefile
> @@ -30,3 +30,4 @@ proc-$(CONFIG_PROC_KCORE)	+= kcore.o
>  proc-$(CONFIG_PROC_VMCORE)	+= vmcore.o
>  proc-$(CONFIG_PRINTK)	+= kmsg.o
>  proc-$(CONFIG_PROC_PAGE_MONITOR)	+= page.o
> +proc-$(CONFIG_PROC_PID_HIERARCHY)	+= pidns_hierarchy.o
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index 6fcdba5..18e0773 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -280,6 +280,15 @@ struct proc_maps_private {
>  #endif
>  };
>  
> +/*
> + * pidns_hierarchy.c
> + */
> +#ifdef CONFIG_PROC_PID_HIERARCHY
> +	extern void proc_pidns_hierarchy_init(void);
> +#else
> +	static inline void proc_pidns_hierarchy_init(void) {}
> +#endif
> +
>  struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode);
>  
>  extern const struct file_operations proc_pid_maps_operations;
> diff --git a/fs/proc/pidns_hierarchy.c b/fs/proc/pidns_hierarchy.c
> new file mode 100644
> index 0000000..ab1c665
> --- /dev/null
> +++ b/fs/proc/pidns_hierarchy.c
> @@ -0,0 +1,280 @@
> +#include <linux/init.h>
> +#include <linux/errno.h>
> +#include <linux/proc_fs.h>
> +#include <linux/module.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/pid_namespace.h>
> +#include <linux/seq_file.h>
> +#include <linux/kernel.h>
> +
> +/*
> + *  /proc/pidns_hierarchy
> + *
> + *  show the hierarchy of pid namespace as:
> + *  <init_PID> <parent_of_init_PID> <relative PID level>
> + *
> + *  init_PID: child reaper in ns
> + *  parent_of_init_PID: init_PID's parent, child reaper too
> + *  relative PID level: pid level relative to caller's ns
> + */
> +
> +#define NS_HIERARCHY	"pidns_hierarchy"
> +
> +/* list for host pid collection */
> +struct pidns_list {
> +	struct list_head list;
> +	struct pid *pid;
> +	unsigned int level;
> +};
> +
> +static void free_pidns_list(struct list_head *head)
> +{
> +	struct pidns_list *tmp, *pos;
> +
> +	list_for_each_entry_safe(pos, tmp, head, list) {
> +		list_del(&pos->list);
> +		put_pid(pos->pid);
> +		kfree(pos);
> +	}
> +}
> +
> +static int
> +pidns_list_add(struct pid *pid, struct list_head *list_head,
> +		int level)
> +{
> +	struct pidns_list *ent;
> +
> +	ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> +	if (!ent)
> +		return -ENOMEM;
> +
> +	ent->pid = pid;
> +	ent->level = level;
> +	list_add_tail(&ent->list, list_head);
> +
> +	return 0;
> +}
> +
> +static int
> +pidns_list_filter(struct list_head *pidns_pid_list,
> +		struct list_head *pidns_pid_tree)
> +{
> +	struct pidns_list *pos, *pos_t;
> +	struct pid_namespace *ns0, *ns1;
> +	struct pid *pid0, *pid1;
> +	int rc, flag = 0;
> +
> +	/*
> +	 * screen pids with relationship
> +	 * in pidns_pid_list, we may add pids like:
> +	 * ns0   ns1   ns2
> +	 * pid1->pid2->pid3
> +	 * we should screen pid1, pid2 and keep pid3
> +	 */
> +	list_for_each_entry(pos, pidns_pid_list, list) {
> +		list_for_each_entry(pos_t, pidns_pid_list, list) {
> +			flag = 0;
> +			pid0 = pos->pid;
> +			pid1 = pos_t->pid;
> +			ns0 = pid0->numbers[pid0->level].ns;
> +			ns1 = pid1->numbers[pid1->level].ns;
> +			if (pos->pid->level < pos_t->pid->level)
> +				for (; ns1 != NULL; ns1 = ns1->parent)
> +					if (ns0 == ns1) {
> +						flag = 1;
> +						break;
> +					}
> +			/* a redundant pid found */
> +			if (flag == 1)
> +				break;
> +		}
> +
> +		if (flag == 0) {
> +			get_pid(pos->pid);
> +			rc = pidns_list_add(pos->pid, pidns_pid_tree, 0);
> +			if (rc) {
> +				put_pid(pos->pid);
> +				goto cleanup;
> +			}
> +		}
> +	}
> +
> +	/*
> +	 *  Now all useful stuffs are in pidns_pid_tree,
> +	 *  free pidns_pid_list
> +	 */
> +	free_pidns_list(pidns_pid_list);
> +
> +	return 0;
> +
> +cleanup:
> +	free_pidns_list(pidns_pid_tree);
> +	return rc;
> +}
> +
> +static void
> +pidns_list_set_level(struct list_head *pidns_list_in,
> +		struct pid_namespace *curr_ns)
> +{
> +	struct pidns_list *pos, *pos_t;
> +	struct pid *pid0, *pid1;
> +	int i;
> +
> +	/*
> +	 * From the pid hierarchy point of view,
> +	 * we already had a list of pids who are not
> +	 * the subsets of each other.
> +	 * But part of them may be same.
> +	 * We need to set the level of each pids:
> +	 * pid0:         A->B->C   pid1:       A->B->D
> +	 * level:           2                  0
> +	 * We use level to identify
> +	 * the public part of each pids.
> +	 */
> +	list_for_each_entry(pos, pidns_list_in, list) {
> +		list_for_each_entry(pos_t, pidns_list_in, list) {
> +			pid0 = pos->pid;
> +			pid1 = pos_t->pid;
> +			if (pid0 == pid1)
> +				continue;
> +			if (pos_t->level > 0)
> +				continue;
> +			for (i = curr_ns->level + 1; i <= pid0->level; i++) {
> +				/* skip the public parts */
> +				if (pid0->numbers[i].ns ==
> +						pid1->numbers[i].ns)
> +					continue;
> +				else
> +					break;
> +			}
> +			pos->level = i - 1;
> +		}
> +	}
> +}
> +
> +/*
> + * Finds all init pids, places them into
> + * pidns_pid_list and then stores the hierarchy
> + * into pidns_pid_tree.
> + */
> +static int proc_pidns_list_refresh(struct pid_namespace *curr_ns,
> +		struct list_head *pidns_pid_list,
> +		struct list_head *pidns_pid_tree)
> +{
> +	struct pid *pid;
> +	int new_nr, nr = 0;
> +	int rc;
> +
> +	/* collect pids in current namespace */
> +	while (nr < PID_MAX_LIMIT) {
> +		rcu_read_lock();
> +		pid = find_ge_pid(nr, curr_ns);
> +		if (!pid) {
> +			rcu_read_unlock();
> +			break;
> +		}
> +
> +		new_nr = pid_vnr(pid);
> +		if (!is_child_reaper(pid)) {
> +			nr = new_nr + 1;
> +			rcu_read_unlock();
> +			continue;
> +		}
> +		get_pid(pid);
> +		rcu_read_unlock();
> +		rc = pidns_list_add(pid, pidns_pid_list, 0);
> +		if (rc) {
> +			put_pid(pid);
> +			goto cleanup;
> +		}
> +		nr = new_nr + 1;
> +	}
> +
> +	/*
> +	 * Only one pid found as the child reaper,
> +	 * so current pid namespace do not have sub-namespace,
> +	 * return 0 directly.
> +	 */
> +	if (list_is_singular(pidns_pid_list)) {
> +		rc = 0;
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * screen duplicate pids from pidns_pid_list
> +	 * and form a new list pidns_pid_tree.
> +	 */
> +	rc = pidns_list_filter(pidns_pid_list, pidns_pid_tree);
> +	if (rc)
> +		goto cleanup;
> +
> +	return 0;
> +
> +cleanup:
> +	free_pidns_list(pidns_pid_list);
> +	return rc;
> +}
> +
> +static int nslist_proc_show(struct seq_file *m, void *v)
> +{
> +	struct pidns_list *pos;
> +	struct pid_namespace *ns, *curr_ns;
> +	struct pid *pid;
> +	char pid_buf[16], ppid_buf[16];
> +	int i, rc;
> +
> +	LIST_HEAD(pidns_pid_list);
> +	LIST_HEAD(pidns_pid_tree);
> +
> +	curr_ns = task_active_pid_ns(current);
> +
> +	rc = proc_pidns_list_refresh(curr_ns,
> +			&pidns_pid_list, &pidns_pid_tree);
> +	if (rc)
> +		return rc;
> +
> +	pidns_list_set_level(&pidns_pid_tree, curr_ns);
> +
> +	/* print pid namespace's hierarchy */
> +	list_for_each_entry(pos, &pidns_pid_tree, list) {
> +		pid = pos->pid;
> +		for (i = max(curr_ns->level, pos->level) + 1;
> +				i <= pid->level; i++) {
> +			ns = pid->numbers[i].ns;
> +			/* show PID '1' in specific pid ns */
> +			snprintf(pid_buf, 16, "%u",
> +				pid_vnr(find_pid_ns(1, ns)));
> +			ns = pid->numbers[i - 1].ns;
> +			snprintf(ppid_buf, 16, "%u",
> +					pid_vnr(find_pid_ns(1, ns)));
> +			seq_printf(m, "%s\t%s\t%d\n", pid_buf, ppid_buf,
> +					i - curr_ns->level);
> +		}
> +	}
> +
> +	free_pidns_list(&pidns_pid_tree);
> +
> +	return 0;
> +}
> +
> +static int nslist_proc_open(struct inode *inode, struct file *file)
> +{
> +	return single_open(file, nslist_proc_show, NULL);
> +}
> +
> +static const struct file_operations proc_nspid_nslist_fops = {
> +	.open		= nslist_proc_open,
> +	.read		= seq_read,
> +	.llseek		= seq_lseek,
> +	.release	= single_release,
> +};
> +
> +/*
> + *  Called by proc_root_init() to initialize the /proc/pidns_hierarchy
> + */
> +void __init proc_pidns_hierarchy_init(void)
> +{
> +	proc_create(NS_HIERARCHY, S_IRUGO,
> +		NULL, &proc_nspid_nslist_fops);
> +}
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index e74ac9f..bcb55c7 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -190,6 +190,7 @@ void __init proc_root_init(void)
>  	proc_tty_init();
>  	proc_mkdir("bus", NULL);
>  	proc_sys_init();
> +	proc_pidns_hierarchy_init();
>  }
>  
>  static int proc_root_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat
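
The screening step in pidns_list_filter() quoted above can be modelled
in userspace roughly as follows. This is only a sketch of the idea
(drop every namespace that is an ancestor of another collected one, so
only the deepest chains remain); struct ns_node is a hypothetical
stand-in for struct pid_namespace that keeps just the parent link, and
the helper names are not from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pid_namespace: parent link only. */
struct ns_node {
	struct ns_node *parent;
	int level;
};

/* Return 1 if anc is a strict ancestor of ns, else 0. */
static int ns_is_ancestor(const struct ns_node *anc,
			  const struct ns_node *ns)
{
	for (ns = ns->parent; ns != NULL; ns = ns->parent)
		if (ns == anc)
			return 1;
	return 0;
}

/*
 * Set keep[i] = 0 for every entry that is an ancestor of another
 * entry in the list, 1 otherwise.  Returns the number of kept entries.
 */
static int filter_leaves(struct ns_node *const *list, int n, int *keep)
{
	int i, j, kept = 0;

	for (i = 0; i < n; i++) {
		keep[i] = 1;
		for (j = 0; j < n; j++)
			if (j != i && ns_is_ancestor(list[i], list[j])) {
				keep[i] = 0;	/* redundant: covered by list[j] */
				break;
			}
		kept += keep[i];
	}
	return kept;
}
```

Like the quoted code, this is quadratic in the number of collected
namespaces, which is worth keeping in mind on hosts with many
containers.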


* Re: [resend][PATCH v9 2/3] /proc/PID/status: show all sets of pid according to ns
       [not found] <55349719.6775592.1423011445985.JavaMail.zimbra@redhat.com>
@ 2015-02-04  1:12 ` Nathan Scott
  0 siblings, 0 replies; 8+ messages in thread
From: Nathan Scott @ 2015-02-04  1:12 UTC (permalink / raw)
  To: Chen Hanxiao, Eric W. Biederman; +Cc: Serge Hallyn, containers, linux-kernel

Hi Chen, Eric,

Eric W. Biederman <ebiederm@xmission.com> writes:
> Chen Hanxiao <chenhanxiao@cn.fujitsu.com> writes:
> > If some issues occurred inside a container guest, host user
> > could not know which process is in trouble just by guest pid:
> > [...]
> > Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
> > Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
> >
> > Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
> 
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> At a quick review and read through this looks good.  Once I finish
> clearing the security bug fixes from my tree I will see about picking
> this up.

I recently came across a need for this patch, so I just wanted to
say thanks; since I've used it a fair bit, feel free to add:

Tested-by: Nathan Scott <nathans@redhat.com>

One small tweak you could make is to drop the extra whitespace
from those new seq_printf calls - "\t%d " has a trailing space
that isn't needed.

Also, there are proc status docs below Documentation/ that should be
updated for these changes.  They are slightly out of date already
and there are a few typos in the vicinity - something like this may
do the trick though ... ?  (It will need to be updated at merge time
with the correct kernel version.)


docs: add missing and new /proc/PID/status file entries, fix typos

Signed-off-by: Nathan Scott <nathans@redhat.com>

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index aae9dd1..457cebd 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -197,12 +197,12 @@ contains details information about the process itself.  Its fields are
 explained in Table 1-4.
 
 (for SMP CONFIG users)
-For making accounting scalable, RSS related information are handled in
-asynchronous manner and the vaule may not be very precise. To see a precise
+For making accounting scalable, RSS related information are handled in an
+asynchronous manner and the value may not be very precise. To see a precise
 snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
 It's slow but very precise.
 
-Table 1-2: Contents of the status files (as of 2.6.30-rc7)
+Table 1-2: Contents of the status files (as of 3.20.0)
 ..............................................................................
  Field                       Content
  Name                        filename of the executable
@@ -210,6 +210,7 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
                              in an uninterruptible wait, Z is zombie,
 			     T is traced or stopped)
  Tgid                        thread group ID
+ Ngid                        NUMA group ID (0 if none)
  Pid                         process id
  PPid                        process id of the parent process
  TracerPid                   PID of process tracing this process (0 if not)
@@ -217,6 +218,10 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
  Gid                         Real, effective, saved set, and  file system GIDs
  FDSize                      number of file descriptor slots currently allocated
  Groups                      supplementary group list
+ NStgid                      descendant namespace thread group ID hierarchy
+ NSpid                       descendant namespace process ID hierarchy
+ NSpgid                      descendant namespace process group ID hierarchy
+ NSsid                       descendant namespace session ID hierarchy
  VmPeak                      peak virtual memory size
  VmSize                      total program size
  VmLck                       locked memory size



end of thread, other threads:[~2015-02-04  1:12 UTC | newest]

Thread overview: 8+ messages
2014-12-23 10:20 [resend][PATCH v9 0/3] ns, procfs: pid conversion between ns and showing pidns hierarchy Chen Hanxiao
2014-12-23 10:20 ` [resend][PATCH v9 1/3] procfs: show hierarchy of pid namespace Chen Hanxiao
2014-12-30  1:17   ` Chen, Hanxiao
2014-12-30  5:54   ` Eric W. Biederman
2014-12-23 10:20 ` [resend][PATCH v9 2/3] /proc/PID/status: show all sets of pid according to ns Chen Hanxiao
2014-12-30  5:39   ` Eric W. Biederman
2014-12-23 10:20 ` [PATCH v9 3/3] Documentation: add docs for /proc/pidns_hierarchy Chen Hanxiao
     [not found] <55349719.6775592.1423011445985.JavaMail.zimbra@redhat.com>
2015-02-04  1:12 ` [resend][PATCH v9 2/3] /proc/PID/status: show all sets of pid according to ns Nathan Scott
