linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Colascione <dancol@google.com>
To: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
Cc: timmurray@google.com, primiano@google.com, joelaf@google.com,
	Daniel Colascione <dancol@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Vlastimil Babka <vbabka@suse.cz>, Roman Gushchin <guro@fb.com>,
	Prashant Dhamdhere <pdhamdhe@redhat.com>,
	"Dennis Zhou (Facebook)" <dennisszhou@gmail.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Steven Rostedt (VMware)" <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Dominik Brodowski <linux@dominikbrodowski.net>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Michal Hocko <mhocko@suse.com>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	KJ Tsanaktsidis <ktsanaktsidis@zendesk.com>,
	David Howells <dhowells@redhat.com>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>
Subject: [PATCH v2] Add /proc/pid_gen
Date: Wed, 21 Nov 2018 12:54:20 -0800	[thread overview]
Message-ID: <20181121205428.165205-1-dancol@google.com> (raw)
In-Reply-To: <20181121201452.77173-1-dancol@google.com>

Trace analysis code needs a coherent picture of the set of processes
and threads running on a system. While it's possible to enumerate all
tasks via /proc, this enumeration is not atomic. If PID numbering
rolls over during snapshot collection, the resulting snapshot of the
process and thread state of the system may be incoherent, confusing
trace analysis tools. The fundamental problem is that if a PID is
reused during a userspace scan of /proc, it's impossible to tell, in
post-processing, whether a fact that the userspace /proc scanner
reports regarding a given PID refers to the old or new task named by
that PID, as the scan of that PID may or may not have occurred before
the PID reuse, and there's no way to "stamp" a fact read from the
kernel with a trace timestamp.

This change adds a per-pid-namespace 64-bit generation number,
incremented on PID rollover, and exposes it via a new proc file
/proc/pid_gen. By examining this file before and after /proc
enumeration, user code can detect the potential reuse of a PID and
restart the task enumeration process, repeating until it gets a
coherent snapshot.

PID rollover ought to be rare, so in practice, scan repetitions will
be rare.

Signed-off-by: Daniel Colascione <dancol@google.com>
---

Make commit message match the code.

 Documentation/filesystems/proc.txt |  1 +
 include/linux/pid.h                |  1 +
 include/linux/pid_namespace.h      |  2 ++
 init/main.c                        |  1 +
 kernel/pid.c                       | 36 +++++++++++++++++++++++++++++-
 5 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 12a5e6e693b6..f58a359f9a2c 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -615,6 +615,7 @@ Table 1-5: Kernel info in /proc
  partitions  Table of partitions known to the system           
  pci	     Deprecated info of PCI bus (new way -> /proc/bus/pci/,
              decoupled by lspci					(2.4)
+ pid_gen     PID rollover count
  rtc         Real time clock                                   
  scsi        SCSI info (see text)                              
  slabinfo    Slab pool info                                    
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 14a9a39da9c7..2e4b41a32e86 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -112,6 +112,7 @@ extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
 int next_pidmap(struct pid_namespace *pid_ns, unsigned int last);
 
 extern struct pid *alloc_pid(struct pid_namespace *ns);
+extern u64 read_pid_generation(struct pid_namespace *ns);
 extern void free_pid(struct pid *pid);
 extern void disable_pid_allocation(struct pid_namespace *ns);
 
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 49538b172483..fa92ae66fb98 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -44,6 +44,7 @@ struct pid_namespace {
 	kgid_t pid_gid;
 	int hide_pid;
 	int reboot;	/* group exit code if this pidns was rebooted */
+	u64 generation;  /* incremented on wraparound */
 	struct ns_common ns;
 } __randomize_layout;
 
@@ -99,5 +100,6 @@ static inline int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
 extern struct pid_namespace *task_active_pid_ns(struct task_struct *tsk);
 void pidhash_init(void);
 void pid_idr_init(void);
+void pid_proc_init(void);
 
 #endif /* _LINUX_PID_NS_H */
diff --git a/init/main.c b/init/main.c
index ee147103ba1b..20c595e852c6 100644
--- a/init/main.c
+++ b/init/main.c
@@ -730,6 +730,7 @@ asmlinkage __visible void __init start_kernel(void)
 	cgroup_init();
 	taskstats_init_early();
 	delayacct_init();
+	pid_proc_init();
 
 	check_bugs();
 
diff --git a/kernel/pid.c b/kernel/pid.c
index b2f6c506035d..cd5f4aa8eb55 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -174,6 +174,7 @@ struct pid *alloc_pid(struct pid_namespace *ns)
 
 	for (i = ns->level; i >= 0; i--) {
 		int pid_min = 1;
+		unsigned int old_cursor;
 
 		idr_preload(GFP_KERNEL);
 		spin_lock_irq(&pidmap_lock);
@@ -182,7 +183,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
 		 * init really needs pid 1, but after reaching the maximum
 		 * wrap back to RESERVED_PIDS
 		 */
-		if (idr_get_cursor(&tmp->idr) > RESERVED_PIDS)
+		old_cursor = idr_get_cursor(&tmp->idr);
+		if (old_cursor > RESERVED_PIDS)
 			pid_min = RESERVED_PIDS;
 
 		/*
@@ -191,6 +193,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
 		 */
 		nr = idr_alloc_cyclic(&tmp->idr, NULL, pid_min,
 				      pid_max, GFP_ATOMIC);
+		if (unlikely(idr_get_cursor(&tmp->idr) <= old_cursor))
+			tmp->generation += 1;
 		spin_unlock_irq(&pidmap_lock);
 		idr_preload_end();
 
@@ -246,6 +250,16 @@ struct pid *alloc_pid(struct pid_namespace *ns)
 	return ERR_PTR(retval);
 }
 
+u64 read_pid_generation(struct pid_namespace *ns)
+{
+	u64 generation;
+
+	spin_lock_irq(&pidmap_lock);
+	generation = ns->generation;
+	spin_unlock_irq(&pidmap_lock);
+	return generation;
+}
+
 void disable_pid_allocation(struct pid_namespace *ns)
 {
 	spin_lock_irq(&pidmap_lock);
@@ -449,6 +463,17 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
 	return idr_get_next(&ns->idr, &nr);
 }
 
+#ifdef CONFIG_PROC_FS
+static int pid_generation_show(struct seq_file *m, void *v)
+{
+	u64 generation =
+		read_pid_generation(proc_pid_ns(file_inode(m->file)));
+	seq_printf(m, "%llu\n", generation);
+	return 0;
+
+};
+#endif
+
 void __init pid_idr_init(void)
 {
 	/* Verify no one has done anything silly: */
@@ -465,4 +490,13 @@ void __init pid_idr_init(void)
 
 	init_pid_ns.pid_cachep = KMEM_CACHE(pid,
 			SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
+
+}
+
+void __init pid_proc_init(void)
+{
+	/* pid_idr_init is too early, so get a separate init function. */
+#ifdef CONFIG_PROC_FS
+	WARN_ON(!proc_create_single("pid_gen", 0, NULL, pid_generation_show));
+#endif
 }
-- 
2.19.1.1215.g8438c0b245-goog


  parent reply	other threads:[~2018-11-21 20:54 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-21 20:14 [PATCH] Add /proc/pid_generation Daniel Colascione
2018-11-21 20:31 ` Matthew Wilcox
2018-11-21 20:38   ` Daniel Colascione
2018-11-22  2:06     ` Matthew Wilcox
2018-11-25 22:55       ` Pavel Machek
2018-11-21 20:54 ` Daniel Colascione [this message]
2018-11-21 22:12   ` [PATCH v2] Add /proc/pid_gen Andrew Morton
2018-11-21 22:40     ` Daniel Colascione
2018-11-21 22:48       ` Jann Horn
2018-11-21 22:52         ` Daniel Colascione
2018-11-21 22:50       ` Andrew Morton
2018-11-21 23:21         ` Daniel Colascione
2018-11-21 23:35           ` Andy Lutomirski
2018-11-22  0:21             ` Daniel Colascione
2018-11-22 13:58             ` Cyrill Gorcunov
2018-11-22  0:22           ` Andrew Morton
2018-11-22  0:28             ` Daniel Colascione
2018-11-22  0:30               ` Daniel Colascione
2018-11-22 15:27                 ` Mathieu Desnoyers
2018-11-22  0:57               ` Andrew Morton
2018-11-22  1:08                 ` Daniel Colascione
2018-11-22  1:29                   ` Andrew Morton
2018-11-22  2:35                     ` Tim Murray
2018-11-22  5:30                       ` Daniel Colascione
2018-11-22 11:19 ` [PATCH] Add /proc/pid_generation Kevin Easton
2018-11-23 11:14   ` David Laight
2018-11-25 23:00     ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181121205428.165205-1-dancol@google.com \
    --to=dancol@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=ard.biesheuvel@linaro.org \
    --cc=corbet@lwn.net \
    --cc=dennisszhou@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=guro@fb.com \
    --cc=joelaf@google.com \
    --cc=jpoimboe@redhat.com \
    --cc=ktsanaktsidis@zendesk.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@dominikbrodowski.net \
    --cc=mhocko@suse.com \
    --cc=mingo@kernel.org \
    --cc=pdhamdhe@redhat.com \
    --cc=primiano@google.com \
    --cc=rostedt@goodmis.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=sfr@canb.auug.org.au \
    --cc=tglx@linutronix.de \
    --cc=timmurray@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).