All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH, RESEND] procfs: silence lockdep warning about read vs. exec seq_file
@ 2014-08-02 20:10 Kirill A. Shutemov
  2014-08-03 16:44 ` Oleg Nesterov
  2014-08-05  3:42 ` [PATCH, RESEND] procfs: silence lockdep warning about read vs. exec seq_file Eric W. Biederman
  0 siblings, 2 replies; 13+ messages in thread
From: Kirill A. Shutemov @ 2014-08-02 20:10 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, David Howells, Peter Zijlstra,
	Sasha Levin, Cyrill Gorcunov, Oleg Nesterov, David S. Miller,
	Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Testcase:

  cat /proc/self/maps >/dev/null
  chmod +x /proc/self/net/packet
  exec /proc/self/net/packet

It triggers lockdep warning:

	 [ INFO: possible circular locking dependency detected ]
	 3.16.0-rc7-00064-g26bcd8b72563 #8 Not tainted
	 -------------------------------------------------------
	 sh/157 is trying to acquire lock:
	  (&p->lock){+.+.+.}, at: [<ffffffff8117f4f8>] seq_read+0x38/0x3e0

	 but task is already holding lock:
	  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff81160018>] prepare_bprm_creds+0x28/0x90

	 which lock already depends on the new lock.

	 the existing dependency chain (in reverse order) is:

	-> #1 (&sig->cred_guard_mutex){+.+.+.}:
	        [<ffffffff8109a9c1>] __lock_acquire+0x531/0xde0
	        [<ffffffff8109b959>] lock_acquire+0x79/0xd0
	        [<ffffffff8173f838>] mutex_lock_killable_nested+0x68/0x460
	        [<ffffffff811c0d9f>] lock_trace+0x1f/0x60
	        [<ffffffff811c0ed7>] proc_pid_personality+0x17/0x60
	        [<ffffffff811be39b>] proc_single_show+0x4b/0x90
	        [<ffffffff8117f5a0>] seq_read+0xe0/0x3e0
	        [<ffffffff81158f1e>] vfs_read+0x8e/0x170
	        [<ffffffff81159be8>] SyS_read+0x48/0xc0
	        [<ffffffff81743712>] system_call_fastpath+0x16/0x1b

	-> #0 (&p->lock){+.+.+.}:
	        [<ffffffff81097437>] validate_chain.isra.37+0xfe7/0x13b0
	        [<ffffffff8109a9c1>] __lock_acquire+0x531/0xde0
	        [<ffffffff8109b959>] lock_acquire+0x79/0xd0
	        [<ffffffff8173f09a>] mutex_lock_nested+0x6a/0x3d0
	        [<ffffffff8117f4f8>] seq_read+0x38/0x3e0
	        [<ffffffff811bd5f3>] proc_reg_read+0x43/0x70
	        [<ffffffff81158f1e>] vfs_read+0x8e/0x170
	        [<ffffffff8115ea13>] kernel_read+0x43/0x60
	        [<ffffffff8115ec65>] prepare_binprm+0xd5/0x170
	        [<ffffffff811605c8>] do_execve_common.isra.32+0x548/0x800
	        [<ffffffff81160893>] do_execve+0x13/0x20
	        [<ffffffff81160b70>] SyS_execve+0x20/0x30
	        [<ffffffff81743c89>] stub_execve+0x69/0xa0

	 other info that might help us debug this:

	  Possible unsafe locking scenario:

	        CPU0                    CPU1
	        ----                    ----
	   lock(&sig->cred_guard_mutex);
	                                lock(&p->lock);
	                                lock(&sig->cred_guard_mutex);
	   lock(&p->lock);

	  *** DEADLOCK ***

	 1 lock held by sh/157:
	  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff81160018>] prepare_bprm_creds+0x28/0x90

It's a false positive: seq files which take cred_guard_mutex are never
executable. Let's use separate lock class for them.

I don't know why we allow "chmod +x" on some proc files, notably net-related.
Is it a bug?

Also I suspect eb94cd96e05d fixes non-existing bug, like this one.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/proc/base.c       | 24 +++++++++++++++++++++++-
 fs/proc/task_mmu.c   | 14 ++++++++++++++
 fs/proc/task_nommu.c |  4 ++++
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 2d696b0c93bf..c05b4a227acb 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -655,9 +655,31 @@ static int proc_single_show(struct seq_file *m, void *v)
 	return ret;
 }
 
+/*
+ * proc_pid_personality() and proc_pid_stack() take cred_guard_mutex via
+ * lock_trace() with seq_file->lock held.
+ * execve(2) calls vfs_read() with cred_guard_mutex held.
+ *
+ * So if you will try to execute a seq_file, lockdep will report a possible
+ * circular locking dependency. It's false-positive, since ONE() files are
+ * never executable.
+ *
+ * Let's set separate lock class for seq_file->lock of ONE() files.
+ */
+static struct lock_class_key proc_single_open_lock_class;
+
 static int proc_single_open(struct inode *inode, struct file *filp)
 {
-	return single_open(filp, proc_single_show, inode);
+	struct seq_file *m;
+	int ret;
+
+	ret = single_open(filp, proc_single_show, inode);
+	if (ret)
+		return ret;
+
+	m = filp->private_data;
+	lockdep_set_class(&m->lock, &proc_single_open_lock_class);
+	return 0;
 }
 
 static const struct file_operations proc_single_file_operations = {
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index cfa63ee92c96..536b9f9a9ff5 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -19,6 +19,18 @@
 #include <asm/tlbflush.h>
 #include "internal.h"
 
+/*
+ * m_start() takes cred_guard_mutex via mm_access() with seq_file->lock held.
+ * execve(2) calls vfs_read() with cred_guard_mutex held.
+ *
+ * So if you will try to execute a seq_file, lockdep will report a possible
+ * circular locking dependency. It's false positive, since m_start() users are
+ * never executable.
+ *
+ * Let's set separate class lock for seq_file->lock of m_start() users.
+ */
+static struct lock_class_key pid_maps_seq_file_lock;
+
 void task_mem(struct seq_file *m, struct mm_struct *mm)
 {
 	unsigned long data, text, lib, swap;
@@ -242,6 +254,7 @@ static int do_maps_open(struct inode *inode, struct file *file,
 		ret = seq_open(file, ops);
 		if (!ret) {
 			struct seq_file *m = file->private_data;
+			lockdep_set_class(&m->lock, &pid_maps_seq_file_lock);
 			m->private = priv;
 		} else {
 			kfree(priv);
@@ -1512,6 +1525,7 @@ static int numa_maps_open(struct inode *inode, struct file *file,
 		ret = seq_open(file, ops);
 		if (!ret) {
 			struct seq_file *m = file->private_data;
+			lockdep_set_class(&m->lock, &pid_maps_seq_file_lock);
 			m->private = priv;
 		} else {
 			kfree(priv);
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 678455d2d683..35a799443990 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -9,6 +9,9 @@
 #include <linux/seq_file.h>
 #include "internal.h"
 
+/* See comment in task_mmu.c */
+static struct lock_class_key pid_maps_seq_file_lock;
+
 /*
  * Logic: we've got two memory sums for each process, "shared", and
  * "non-shared". Shared memory may get counted more than once, for
@@ -277,6 +280,7 @@ static int maps_open(struct inode *inode, struct file *file,
 		ret = seq_open(file, ops);
 		if (!ret) {
 			struct seq_file *m = file->private_data;
+			lockdep_set_class(&m->lock, &pid_maps_seq_file_lock);
 			m->private = priv;
 		} else {
 			kfree(priv);
-- 
2.0.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH, RESEND] procfs: silence lockdep warning about read vs. exec seq_file
  2014-08-02 20:10 [PATCH, RESEND] procfs: silence lockdep warning about read vs. exec seq_file Kirill A. Shutemov
@ 2014-08-03 16:44 ` Oleg Nesterov
  2014-08-03 21:18   ` [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file) Oleg Nesterov
  2014-08-05  3:42 ` [PATCH, RESEND] procfs: silence lockdep warning about read vs. exec seq_file Eric W. Biederman
  1 sibling, 1 reply; 13+ messages in thread
From: Oleg Nesterov @ 2014-08-03 16:44 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, David Howells,
	Peter Zijlstra, Sasha Levin, Cyrill Gorcunov, David S. Miller,
	Kirill A. Shutemov

Sorry for delay,

On 08/02, Kirill A. Shutemov wrote:
>
> +/*
> + * proc_pid_personality() and proc_pid_stack() take cred_guard_mutex via
> + * lock_trace()

And at first glance they lock_trace() can die. But lets temporary ignore,
m_start() is trickier.

> +static struct lock_class_key pid_maps_seq_file_lock;
> +
>  void task_mem(struct seq_file *m, struct mm_struct *mm)
>  {
>  	unsigned long data, text, lib, swap;
> @@ -242,6 +254,7 @@ static int do_maps_open(struct inode *inode, struct file *file,
>  		ret = seq_open(file, ops);
>  		if (!ret) {
>  			struct seq_file *m = file->private_data;
> +			lockdep_set_class(&m->lock, &pid_maps_seq_file_lock);

Perhaps lockdep_set_subclass() would be better... But this doesn't matter.

The question is, why m_start() calls mm_access(). This is not even
strictly correct if the task execs between m_stop() + m_start().

Can't we do something like below? The patch is obviously horrible and
incomplete, just to explain what I meant. Basically this is what
proc_mem_operations does.

Oleg.


diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 3ab6d14..c16b70e 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -267,6 +267,7 @@ extern int proc_remount(struct super_block *, int *, char *);
 struct proc_maps_private {
 	struct pid *pid;
 	struct task_struct *task;
+	struct mm_struct *mm;
 #ifdef CONFIG_MMU
 	struct vm_area_struct *tail_vma;
 #endif
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index cfa63ee..9b88248 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -165,9 +165,9 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 	if (!priv->task)
 		return ERR_PTR(-ESRCH);
 
-	mm = mm_access(priv->task, PTRACE_MODE_READ);
-	if (!mm || IS_ERR(mm))
-		return mm;
+	mm = priv->mm;
+	if (!mm || !atomic_inc_not_zero(&mm->mm_users))
+		return NULL;
 	down_read(&mm->mmap_sem);
 
 	tail_vma = get_gate_vma(priv->task->mm);
@@ -231,6 +231,27 @@ static void m_stop(struct seq_file *m, void *v)
 		put_task_struct(priv->task);
 }
 
+// TODO: change __mem_open() to use this helper
+static struct mm_struct *xxx(struct inode *inode, struct file *file, unsigned int mode)
+{
+	struct task_struct *task = get_proc_task(inode);
+	struct mm_struct *mm = ERR_PTR(-ESRCH);
+
+	if (task) {
+		mm = mm_access(task, mode);
+		put_task_struct(task);
+
+		if (!IS_ERR_OR_NULL(mm)) {
+			/* ensure this mm_struct can't be freed */
+			atomic_inc(&mm->mm_count);
+			/* but do not pin its memory */
+			mmput(mm);
+		}
+	}
+
+	return mm;
+}
+
 static int do_maps_open(struct inode *inode, struct file *file,
 			const struct seq_operations *ops)
 {
@@ -239,17 +260,38 @@ static int do_maps_open(struct inode *inode, struct file *file,
 	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
 	if (priv) {
 		priv->pid = proc_pid(inode);
+		priv->mm = xxx(inode, file, PTRACE_MODE_READ);
+
+		// XXX cleanup me
+		if (IS_ERR(priv->mm)) {
+			ret = -EACCES;
+			goto free;
+		}
+
 		ret = seq_open(file, ops);
 		if (!ret) {
 			struct seq_file *m = file->private_data;
 			m->private = priv;
 		} else {
+			// XXX cleanup me
+			if (priv->mm)
+				mmdrop(priv->mm);
+ free:
 			kfree(priv);
 		}
 	}
 	return ret;
 }
 
+static int xxx_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq = file->private_data;
+	struct proc_maps_private *priv = seq->private;
+	if (priv->mm)
+		mmdrop(priv->mm);
+	return 0;
+}
+
 static void
 show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
 {
@@ -399,14 +441,14 @@ const struct file_operations proc_pid_maps_operations = {
 	.open		= pid_maps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= xxx_release,
 };
 
 const struct file_operations proc_tid_maps_operations = {
 	.open		= tid_maps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= xxx_release,
 };
 
 /*
@@ -682,14 +724,14 @@ const struct file_operations proc_pid_smaps_operations = {
 	.open		= pid_smaps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= xxx_release,
 };
 
 const struct file_operations proc_tid_smaps_operations = {
 	.open		= tid_smaps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= xxx_release,
 };
 
 /*


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file)
  2014-08-03 16:44 ` Oleg Nesterov
@ 2014-08-03 21:18   ` Oleg Nesterov
  2014-08-03 21:19     ` [PATCH 1/5] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
                       ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Oleg Nesterov @ 2014-08-03 21:18 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, David Howells,
	Peter Zijlstra, Sasha Levin, Cyrill Gorcunov, David S. Miller,
	Kirill A. Shutemov

On 08/03, Oleg Nesterov wrote:
>
> The question is, why m_start() calls mm_access(). This is not even
> strictly correct if the task execs between m_stop() + m_start().
>
> Can't we do something like below? The patch is obviously horrible and
> incomplete, just to explain what I meant. Basically this is what
> proc_mem_operations does.

Absolutely untested, only for review.

What do you all think?

Sure, with this change you can't open (say) /proc/pid/maps, and read the
new mappings after exec. But hopefully this is fine? And again, this
matches /proc/pid/mem.

lock_trace() users need another fix.

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/5] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map()
  2014-08-03 21:18   ` [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file) Oleg Nesterov
@ 2014-08-03 21:19     ` Oleg Nesterov
  2014-08-03 21:19     ` [PATCH 2/5] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open() Oleg Nesterov
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2014-08-03 21:19 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, David Howells,
	Peter Zijlstra, Sasha Levin, Cyrill Gorcunov, David S. Miller,
	Kirill A. Shutemov

get_gate_vma(priv->task->mm) looks ugly and wrong, task->mm can be
already NULL.

I think that priv->task should simply die and hold_task_mempolicy()
logic can be simplified.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_mmu.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index cfa63ee..8a5271e 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -170,7 +170,7 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 		return mm;
 	down_read(&mm->mmap_sem);
 
-	tail_vma = get_gate_vma(priv->task->mm);
+	tail_vma = get_gate_vma(mm);
 	priv->tail_vma = tail_vma;
 	hold_task_mempolicy(priv);
 	/* Start with last addr hint */
@@ -351,12 +351,11 @@ static int show_map(struct seq_file *m, void *v, int is_pid)
 {
 	struct vm_area_struct *vma = v;
 	struct proc_maps_private *priv = m->private;
-	struct task_struct *task = priv->task;
 
 	show_map_vma(m, vma, is_pid);
 
 	if (m->count < m->size)  /* vma is copied successfully */
-		m->version = (vma != get_gate_vma(task->mm))
+		m->version = (vma != priv->tail_vma)
 			? vma->vm_start : 0;
 	return 0;
 }
@@ -584,7 +583,6 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 static int show_smap(struct seq_file *m, void *v, int is_pid)
 {
 	struct proc_maps_private *priv = m->private;
-	struct task_struct *task = priv->task;
 	struct vm_area_struct *vma = v;
 	struct mem_size_stats mss;
 	struct mm_walk smaps_walk = {
@@ -639,7 +637,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 	show_smap_vma_flags(m, vma);
 
 	if (m->count < m->size)  /* vma is copied successfully */
-		m->version = (vma != get_gate_vma(task->mm))
+		m->version = (vma != priv->tail_vma)
 			? vma->vm_start : 0;
 	return 0;
 }
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/5] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open()
  2014-08-03 21:18   ` [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file) Oleg Nesterov
  2014-08-03 21:19     ` [PATCH 1/5] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
@ 2014-08-03 21:19     ` Oleg Nesterov
  2014-08-03 21:20     ` [PATCH 3/5] proc: introduce proc_mem_open() Oleg Nesterov
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2014-08-03 21:19 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, David Howells,
	Peter Zijlstra, Sasha Levin, Cyrill Gorcunov, David S. Miller,
	Kirill A. Shutemov

do_maps_open() and numa_maps_open() are overcomplicated, they could
use __seq_open_private(). Plus they do the same, just sizeof(*priv)
differs.

Change them to use a new simple helper, proc_maps_open(ops, psize).
This simplifies the code and allows us to do the next changes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_mmu.c |   44 ++++++++++++++++----------------------------
 1 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 8a5271e..794aeb6 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -231,23 +231,23 @@ static void m_stop(struct seq_file *m, void *v)
 		put_task_struct(priv->task);
 }
 
+static int proc_maps_open(struct inode *inode, struct file *file,
+			const struct seq_operations *ops, int psize)
+{
+	struct proc_maps_private *priv = __seq_open_private(file, ops, psize);
+
+	if (!priv)
+		return -ENOMEM;
+
+	priv->pid = proc_pid(inode);
+	return 0;
+}
+
 static int do_maps_open(struct inode *inode, struct file *file,
 			const struct seq_operations *ops)
 {
-	struct proc_maps_private *priv;
-	int ret = -ENOMEM;
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (priv) {
-		priv->pid = proc_pid(inode);
-		ret = seq_open(file, ops);
-		if (!ret) {
-			struct seq_file *m = file->private_data;
-			m->private = priv;
-		} else {
-			kfree(priv);
-		}
-	}
-	return ret;
+	return proc_maps_open(inode, file, ops,
+				sizeof(struct proc_maps_private));
 }
 
 static void
@@ -1502,20 +1502,8 @@ static const struct seq_operations proc_tid_numa_maps_op = {
 static int numa_maps_open(struct inode *inode, struct file *file,
 			  const struct seq_operations *ops)
 {
-	struct numa_maps_private *priv;
-	int ret = -ENOMEM;
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (priv) {
-		priv->proc_maps.pid = proc_pid(inode);
-		ret = seq_open(file, ops);
-		if (!ret) {
-			struct seq_file *m = file->private_data;
-			m->private = priv;
-		} else {
-			kfree(priv);
-		}
-	}
-	return ret;
+	return proc_maps_open(inode, file, ops,
+				sizeof(struct numa_maps_private));
 }
 
 static int pid_numa_maps_open(struct inode *inode, struct file *file)
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/5] proc: introduce proc_mem_open()
  2014-08-03 21:18   ` [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file) Oleg Nesterov
  2014-08-03 21:19     ` [PATCH 1/5] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
  2014-08-03 21:19     ` [PATCH 2/5] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open() Oleg Nesterov
@ 2014-08-03 21:20     ` Oleg Nesterov
  2014-08-03 21:20     ` [PATCH 4/5] fs/proc/task_mmu.c: introduce the "stable" proc_maps_private->mm Oleg Nesterov
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2014-08-03 21:20 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, David Howells,
	Peter Zijlstra, Sasha Levin, Cyrill Gorcunov, David S. Miller,
	Kirill A. Shutemov

Extract the mm_access() code from __mem_open() into the new helper,
proc_mem_open(), the next patch will add another caller.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/base.c     |   36 +++++++++++++++++++++---------------
 fs/proc/internal.h |    2 ++
 2 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 2d696b0..9aef9ac 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -667,29 +667,35 @@ static const struct file_operations proc_single_file_operations = {
 	.release	= single_release,
 };
 
-static int __mem_open(struct inode *inode, struct file *file, unsigned int mode)
+
+struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode)
 {
-	struct task_struct *task = get_proc_task(file_inode(file));
-	struct mm_struct *mm;
+	struct task_struct *task = get_proc_task(inode);
+	struct mm_struct *mm = ERR_PTR(-ESRCH);
 
-	if (!task)
-		return -ESRCH;
+	if (task) {
+		mm = mm_access(task, mode);
+		put_task_struct(task);
 
-	mm = mm_access(task, mode);
-	put_task_struct(task);
+		if (!IS_ERR_OR_NULL(mm)) {
+			/* ensure this mm_struct can't be freed */
+			atomic_inc(&mm->mm_count);
+			/* but do not pin its memory */
+			mmput(mm);
+		}
+	}
+
+	return mm;
+}
+
+static int __mem_open(struct inode *inode, struct file *file, unsigned int mode)
+{
+	struct mm_struct *mm = proc_mem_open(inode, mode);
 
 	if (IS_ERR(mm))
 		return PTR_ERR(mm);
 
-	if (mm) {
-		/* ensure this mm_struct can't be freed */
-		atomic_inc(&mm->mm_count);
-		/* but do not pin its memory */
-		mmput(mm);
-	}
-
 	file->private_data = mm;
-
 	return 0;
 }
 
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 3ab6d14..ba0c1c1 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -275,6 +275,8 @@ struct proc_maps_private {
 #endif
 };
 
+struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode);
+
 extern const struct file_operations proc_pid_maps_operations;
 extern const struct file_operations proc_tid_maps_operations;
 extern const struct file_operations proc_pid_numa_maps_operations;
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/5] fs/proc/task_mmu.c: introduce the "stable" proc_maps_private->mm
  2014-08-03 21:18   ` [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file) Oleg Nesterov
                       ` (2 preceding siblings ...)
  2014-08-03 21:20     ` [PATCH 3/5] proc: introduce proc_mem_open() Oleg Nesterov
@ 2014-08-03 21:20     ` Oleg Nesterov
  2014-08-03 21:20     ` [PATCH 5/5] fs/proc/task_mmu.c: change m_start() to rely on priv->mm and avoid mm_access() Oleg Nesterov
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2014-08-03 21:20 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, David Howells,
	Peter Zijlstra, Sasha Levin, Cyrill Gorcunov, David S. Miller,
	Kirill A. Shutemov

A separate change to simplify the review of the next one.

Add the new "mm_struct *mm" member into struct proc_maps_private, it is
initialized by proc_maps_open() at "open" time.

This obviously means that proc_maps_open() users should additionally do
mmdrop() in fop->release(), add the new helper, proc_map_release().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/internal.h |    1 +
 fs/proc/task_mmu.c |   31 +++++++++++++++++++++++++------
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index ba0c1c1..78784cd 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -267,6 +267,7 @@ extern int proc_remount(struct super_block *, int *, char *);
 struct proc_maps_private {
 	struct pid *pid;
 	struct task_struct *task;
+	struct mm_struct *mm;
 #ifdef CONFIG_MMU
 	struct vm_area_struct *tail_vma;
 #endif
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 794aeb6..1cc623d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -240,9 +240,28 @@ static int proc_maps_open(struct inode *inode, struct file *file,
 		return -ENOMEM;
 
 	priv->pid = proc_pid(inode);
+	priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
+	if (IS_ERR(priv->mm)) {
+		int err = PTR_ERR(priv->mm);
+		seq_release_private(inode, file);
+		return err;
+	}
+
 	return 0;
 }
 
+static int proc_map_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq = file->private_data;
+	struct proc_maps_private *priv = seq->private;
+
+	if (priv->mm)
+		mmdrop(priv->mm);
+
+	return seq_release_private(inode, file);
+}
+
+
 static int do_maps_open(struct inode *inode, struct file *file,
 			const struct seq_operations *ops)
 {
@@ -398,14 +417,14 @@ const struct file_operations proc_pid_maps_operations = {
 	.open		= pid_maps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= proc_map_release,
 };
 
 const struct file_operations proc_tid_maps_operations = {
 	.open		= tid_maps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= proc_map_release,
 };
 
 /*
@@ -680,14 +699,14 @@ const struct file_operations proc_pid_smaps_operations = {
 	.open		= pid_smaps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= proc_map_release,
 };
 
 const struct file_operations proc_tid_smaps_operations = {
 	.open		= tid_smaps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= proc_map_release,
 };
 
 /*
@@ -1520,13 +1539,13 @@ const struct file_operations proc_pid_numa_maps_operations = {
 	.open		= pid_numa_maps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= proc_map_release,
 };
 
 const struct file_operations proc_tid_numa_maps_operations = {
 	.open		= tid_numa_maps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= proc_map_release,
 };
 #endif /* CONFIG_NUMA */
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/5] fs/proc/task_mmu.c: change m_start() to rely on priv->mm and avoid mm_access()
  2014-08-03 21:18   ` [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file) Oleg Nesterov
                       ` (3 preceding siblings ...)
  2014-08-03 21:20     ` [PATCH 4/5] fs/proc/task_mmu.c: introduce the "stable" proc_maps_private->mm Oleg Nesterov
@ 2014-08-03 21:20     ` Oleg Nesterov
  2014-08-04  6:59     ` [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file) Cyrill Gorcunov
  2014-08-04  9:20     ` Kirill A. Shutemov
  6 siblings, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2014-08-03 21:20 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, David Howells,
	Peter Zijlstra, Sasha Levin, Cyrill Gorcunov, David S. Miller,
	Kirill A. Shutemov

Finally we can change m_start() to avoid mm_access(), it can simply do
atomic_inc_not_zero(mm_users).

I'll try to verify, if this is the only change we need, then this one
line change doesn't deserve a separate patch.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_mmu.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 1cc623d..7ec8eb5 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -165,9 +165,9 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 	if (!priv->task)
 		return ERR_PTR(-ESRCH);
 
-	mm = mm_access(priv->task, PTRACE_MODE_READ);
-	if (!mm || IS_ERR(mm))
-		return mm;
+	mm = priv->mm;
+	if (!mm || !atomic_inc_not_zero(&mm->mm_users))
+		return NULL;
 	down_read(&mm->mmap_sem);
 
 	tail_vma = get_gate_vma(mm);
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file)
  2014-08-03 21:18   ` [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file) Oleg Nesterov
                       ` (4 preceding siblings ...)
  2014-08-03 21:20     ` [PATCH 5/5] fs/proc/task_mmu.c: change m_start() to rely on priv->mm and avoid mm_access() Oleg Nesterov
@ 2014-08-04  6:59     ` Cyrill Gorcunov
  2014-08-04  9:20     ` Kirill A. Shutemov
  6 siblings, 0 replies; 13+ messages in thread
From: Cyrill Gorcunov @ 2014-08-04  6:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Kirill A. Shutemov, Alexander Viro, linux-fsdevel, linux-kernel,
	David Howells, Peter Zijlstra, Sasha Levin, David S. Miller,
	Kirill A. Shutemov

On Sun, Aug 03, 2014 at 11:18:43PM +0200, Oleg Nesterov wrote:
> On 08/03, Oleg Nesterov wrote:
> >
> > The question is, why m_start() calls mm_access(). This is not even
> > strictly correct if the task execs between m_stop() + m_start().
> >
> > Can't we do something like below? The patch is obviously horrible and
> > incomplete, just to explain what I meant. Basically this is what
> > proc_mem_operations does.
> 
> Absolutely untested, only for review.
> 
> What do you all think?
> 
> Sure, with this change you can't open (say) /proc/pid/maps, and read the
> new mappings after exec. But hopefully this is fine? And again, this
> matches /proc/pid/mem.
> 
> lock_trace() users need another fix.

If only I didn't miss something obvious this is quite the good cleanup,
thanks Oleg!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file)
  2014-08-03 21:18   ` [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file) Oleg Nesterov
                       ` (5 preceding siblings ...)
  2014-08-04  6:59     ` [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file) Cyrill Gorcunov
@ 2014-08-04  9:20     ` Kirill A. Shutemov
  2014-08-04 14:55       ` Oleg Nesterov
  6 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2014-08-04  9:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, David Howells,
	Peter Zijlstra, Sasha Levin, Cyrill Gorcunov, David S. Miller,
	Kirill A. Shutemov

On Sun, Aug 03, 2014 at 11:18:43PM +0200, Oleg Nesterov wrote:
> On 08/03, Oleg Nesterov wrote:
> >
> > The question is, why m_start() calls mm_access(). This is not even
> > strictly correct if the task execs between m_stop() + m_start().
> >
> > Can't we do something like below? The patch is obviously horrible and
> > incomplete, just to explain what I meant. Basically this is what
> > proc_mem_operations does.
> 
> Absolutely untested, only for review.
> 
> What do you all think?

Look good. And works for me.

> Sure, with this change you can't open (say) /proc/pid/maps, and read the
> new mappings after exec. But hopefully this is fine? And again, this
> matches /proc/pid/mem.
> 
> lock_trace() users need another fix.

task_nommu.c need to be covered too, I believe.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file)
  2014-08-04  9:20     ` Kirill A. Shutemov
@ 2014-08-04 14:55       ` Oleg Nesterov
  0 siblings, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2014-08-04 14:55 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, David Howells,
	Peter Zijlstra, Sasha Levin, Cyrill Gorcunov, David S. Miller,
	Kirill A. Shutemov

On 08/04, Kirill A. Shutemov wrote:
>
> On Sun, Aug 03, 2014 at 11:18:43PM +0200, Oleg Nesterov wrote:
> >
> > Absolutely untested, only for review.
> >
> > What do you all think?
>
> Look good. And works for me.

OK, thanks Kirill and Cyrill ;)

I'll try to recheck, update the changelogs and resend after the testing,
hopefully tomorrow.

> task_nommu.c need to be covered too, I believe.

Yes, I know, thanks. I'd prefer to make a separate patch or two, simply
because I can't test the changes in nommu. The only question is how much
should I try to avoid the copy-and-paste... probably I won't try at all.

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH, RESEND] procfs: silence lockdep warning about read vs. exec seq_file
  2014-08-02 20:10 [PATCH, RESEND] procfs: silence lockdep warning about read vs. exec seq_file Kirill A. Shutemov
  2014-08-03 16:44 ` Oleg Nesterov
@ 2014-08-05  3:42 ` Eric W. Biederman
  2014-08-05  8:46   ` Kirill A. Shutemov
  1 sibling, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2014-08-05  3:42 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, David Howells,
	Peter Zijlstra, Sasha Levin, Cyrill Gorcunov, Oleg Nesterov,
	David S. Miller, Kirill A. Shutemov

"Kirill A. Shutemov" <kirill@shutemov.name> writes:

> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> Testcase:
>
>   cat /proc/self/maps >/dev/null
>   chmod +x /proc/self/net/packet
>   exec /proc/self/net/packet
>
> It triggers lockdep warning:

> I don't know why we allow "chmod +x" on some proc files, notably net-related.
> Is it a bug?

It looks like we simply did not remove the ability to make those files
executable when we realized executable proc files could be a problem.

I expect that part of proc could use an audit where someone figures out
what makes sense.  It does appear that chmod XXX /proc/generic_file
is explicitly supported.  So we would have to be delicate with any
changes in that area to avoid creating userspace regressions.

Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH, RESEND] procfs: silence lockdep warning about read vs. exec seq_file
  2014-08-05  3:42 ` [PATCH, RESEND] procfs: silence lockdep warning about read vs. exec seq_file Eric W. Biederman
@ 2014-08-05  8:46   ` Kirill A. Shutemov
  0 siblings, 0 replies; 13+ messages in thread
From: Kirill A. Shutemov @ 2014-08-05  8:46 UTC (permalink / raw)
  To: Eric W. Biederman, David S. Miller
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, David Howells,
	Peter Zijlstra, Sasha Levin, Cyrill Gorcunov, Oleg Nesterov,
	Kirill A. Shutemov

On Mon, Aug 04, 2014 at 08:42:11PM -0700, Eric W. Biederman wrote:
> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
> 
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> >
> > Testcase:
> >
> >   cat /proc/self/maps >/dev/null
> >   chmod +x /proc/self/net/packet
> >   exec /proc/self/net/packet
> >
> > It triggers lockdep warning:
> 
> > I don't know why we allow "chmod +x" on some proc files, notably net-related.
> > Is it a bug?
> 
> It looks like we simply did not remove the ability to make those files
> executable when we realized executable proc files could be a problem.
> 
> I expect that part of proc could use an audit where someone figures out
> what makes sense.  It does appear that chmod XXX /proc/generic_file
> is explicitly supported.  So we would have to be delicate with any
> changes in that area to avoid creating userspace regressions.

I can't imagine valid use-case, taking into account that changing access
rights for one PID changes it for every PID. It's potential security hole
if user is not aware about this (I wasn't until recently).

List of files under /proc/PID/ which doesn't fail chmod to its current
access rights -- chmod "$(stat --format="%a" "$file")" "$file":

net/anycast6
net/arp
net/connector
net/dev
net/dev_mcast
net/dev_snmp6/eth0
net/dev_snmp6/lo
net/fib_trie
net/fib_triestat
net/icmp
net/icmp6
net/if_inet6
net/igmp
net/igmp6
net/ip6_flowlabel
net/ip6_tables_matches
net/ip6_tables_names
net/ip6_tables_targets
net/ip_mr_cache
net/ip_mr_vif
net/ip_tables_matches
net/ip_tables_names
net/ip_tables_targets
net/ipv6_route
net/mcfilter
net/mcfilter6
net/netfilter/nf_log
net/netlink
net/netstat
net/nf_conntrack
net/nf_conntrack_expect
net/packet
net/pnp
net/protocols
net/psched
net/ptype
net/raw
net/raw6
net/route
net/rt6_stats
net/rt_cache
net/snmp
net/snmp6
net/sockstat
net/sockstat6
net/softnet_stat
net/stat/arp_cache
net/stat/ndisc_cache
net/stat/nf_conntrack
net/stat/rt_cache
net/tcp
net/tcp6
net/udp
net/udp6
net/udplite
net/udplite6
net/unix

David, could you comment on this?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-08-05  8:47 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-02 20:10 [PATCH, RESEND] procfs: silence lockdep warning about read vs. exec seq_file Kirill A. Shutemov
2014-08-03 16:44 ` Oleg Nesterov
2014-08-03 21:18   ` [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file) Oleg Nesterov
2014-08-03 21:19     ` [PATCH 1/5] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
2014-08-03 21:19     ` [PATCH 2/5] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open() Oleg Nesterov
2014-08-03 21:20     ` [PATCH 3/5] proc: introduce proc_mem_open() Oleg Nesterov
2014-08-03 21:20     ` [PATCH 4/5] fs/proc/task_mmu.c: introduce the "stable" proc_maps_private->mm Oleg Nesterov
2014-08-03 21:20     ` [PATCH 5/5] fs/proc/task_mmu.c: change m_start() to rely on priv->mm and avoid mm_access() Oleg Nesterov
2014-08-04  6:59     ` [PATCH 0/5] (Was: procfs: silence lockdep warning about read vs. exec seq_file) Cyrill Gorcunov
2014-08-04  9:20     ` Kirill A. Shutemov
2014-08-04 14:55       ` Oleg Nesterov
2014-08-05  3:42 ` [PATCH, RESEND] procfs: silence lockdep warning about read vs. exec seq_file Eric W. Biederman
2014-08-05  8:46   ` Kirill A. Shutemov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.