linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Question] race condition with remove_proc_entry
@ 2005-12-30 20:04 Steven Rostedt
  2005-12-30 21:28 ` [PATCH] protect remove_proc_entry Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2005-12-30 20:04 UTC (permalink / raw)
  To: LKML; +Cc: Andrew Morton, Ingo Molnar

I'm just curious if it is know that remove_proc_entry has an inherit
race condition?  I have a modified kernel that would add and remove
stuff from the proc system and it would every so often crash.  I traced
the bug to remove_proc_entry.

	for (p = &parent->subdir; *p; p=&(*p)->next ) {
		if (!proc_match(len, fn, *p))
			continue;

Looking at proc_match

int proc_match(int len, const char *name, struct proc_dir_entry *de)
{
	if (de->namelen != len)
		return 0;
	return !memcmp(name, de->name, len);
}


The bug would happen either at de->namelen in proc_match or in the loop
of p=&(*p)->next.


The race is if two threads remove two entries that are siblings.  Since
p = &(*p)->next,  and this is then dereferenced, the race is with *p
becoming NULL.

The way I'm fixing this is to put a lock around the call to
remove_proc_entry.  But is this race already known and the solution is
to have the callers perform their own locking?  Or is this an actual
bug?  If it is not a bug, where's the documentation on having callers
protect it?

Thanks,

-- Steve


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] protect remove_proc_entry
  2005-12-30 20:04 [Question] race condition with remove_proc_entry Steven Rostedt
@ 2005-12-30 21:28 ` Steven Rostedt
  2005-12-30 21:34   ` Daniel Walker
                     ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Steven Rostedt @ 2005-12-30 21:28 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, Andrew Morton

Working on a custom kernel that adds and removes proc entries quite a
bit, I discovered that remove_proc_entry is not protected against
multiple threads removing entries belonging to the same parent.  At
first I thought that this is only a problem with my changes, but after
inspecting the vanilla kernel, I see that there's several places that
calls remove_proc_entry with the same parent (most noticeably
/proc/drivers).

I've added a global remove_proc_lock to protect this section of code.  I
was going to add a lock to proc_dir_entry so that the locking is only
cut down to the same parent, but since this function is called so
infrequently, why waste more memory then is needed.  One global lock
should not cause too much of a headache here.

I'm not sure if remove_proc_entry is called from interrupt context, so I
did a irqsave just in case.

-- Steve


Index: linux-2.6.15-rc7/fs/proc/generic.c
===================================================================
--- linux-2.6.15-rc7.orig/fs/proc/generic.c	2005-12-30 14:19:39.000000000 -0500
+++ linux-2.6.15-rc7/fs/proc/generic.c	2005-12-30 16:18:42.000000000 -0500
@@ -19,6 +19,7 @@
 #include <linux/idr.h>
 #include <linux/namei.h>
 #include <linux/bitops.h>
+#include <linux/spinlock.h>
 #include <asm/uaccess.h>
 
 static ssize_t proc_file_read(struct file *file, char __user *buf,
@@ -27,6 +28,8 @@
 			       size_t count, loff_t *ppos);
 static loff_t proc_file_lseek(struct file *, loff_t, int);
 
+static DEFINE_SPINLOCK(remove_proc_lock);
+
 int proc_match(int len, const char *name, struct proc_dir_entry *de)
 {
 	if (de->namelen != len)
@@ -689,10 +692,13 @@
 	struct proc_dir_entry *de;
 	const char *fn = name;
 	int len;
+	unsigned long flags;
 
 	if (!parent && xlate_proc_name(name, &parent, &fn) != 0)
 		goto out;
 	len = strlen(fn);
+
+	spin_lock_irqsave(&remove_proc_lock, flags);
 	for (p = &parent->subdir; *p; p=&(*p)->next ) {
 		if (!proc_match(len, fn, *p))
 			continue;
@@ -713,6 +719,7 @@
 		}
 		break;
 	}
+	spin_unlock_irqrestore(&remove_proc_lock, flags);
 out:
 	return;
 }



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-30 21:28 ` [PATCH] protect remove_proc_entry Steven Rostedt
@ 2005-12-30 21:34   ` Daniel Walker
  2005-12-30 21:55     ` Steven Rostedt
  2005-12-30 21:55   ` Mitchell Blank Jr
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Daniel Walker @ 2005-12-30 21:34 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Ingo Molnar, Andrew Morton

On Fri, 2005-12-30 at 16:28 -0500, Steven Rostedt wrote:

> +	spin_lock_irqsave(&remove_proc_lock, flags);
...
> +	spin_unlock_irqrestore(&remove_proc_lock, flags);

Why do an irqsave here? Are you not sure what context it gets called
from?

Daniel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-30 21:28 ` [PATCH] protect remove_proc_entry Steven Rostedt
  2005-12-30 21:34   ` Daniel Walker
@ 2005-12-30 21:55   ` Mitchell Blank Jr
  2005-12-30 22:09     ` Steven Rostedt
  2005-12-30 22:11     ` Steven Rostedt
  2005-12-30 23:46   ` Andrew Morton
  2006-01-07 11:36   ` Andrew Morton
  3 siblings, 2 replies; 25+ messages in thread
From: Mitchell Blank Jr @ 2005-12-30 21:55 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Ingo Molnar, Andrew Morton

Steven Rostedt wrote:
> I've added a global remove_proc_lock to protect this section of code.  I
> was going to add a lock to proc_dir_entry so that the locking is only
> cut down to the same parent, but since this function is called so
> infrequently, why waste more memory then is needed.  One global lock
> should not cause too much of a headache here.

Are you sure that it's the only place where we need guard ->subdir?  It
looks like proc_lookup() and proc_readdir() use the BLK when walking that
list, so probably the best fix would be to use that lock everywhere else
->subdir is touched

-Mitch

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-30 21:34   ` Daniel Walker
@ 2005-12-30 21:55     ` Steven Rostedt
  0 siblings, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2005-12-30 21:55 UTC (permalink / raw)
  To: Daniel Walker; +Cc: LKML, Ingo Molnar, Andrew Morton



On Fri, 30 Dec 2005, Daniel Walker wrote:

> On Fri, 2005-12-30 at 16:28 -0500, Steven Rostedt wrote:
>
> > +	spin_lock_irqsave(&remove_proc_lock, flags);
> ...
> > +	spin_unlock_irqrestore(&remove_proc_lock, flags);
>
> Why do an irqsave here? Are you not sure what context it gets called
> from?
>

[snipped from original message]
"I'm not sure if remove_proc_entry is called from interrupt context, so I
did a irqsave just in case."

  ;-)

-- Steve



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-30 21:55   ` Mitchell Blank Jr
@ 2005-12-30 22:09     ` Steven Rostedt
  2005-12-30 22:18       ` Steven Rostedt
  2006-01-07 11:25       ` Andrew Morton
  2005-12-30 22:11     ` Steven Rostedt
  1 sibling, 2 replies; 25+ messages in thread
From: Steven Rostedt @ 2005-12-30 22:09 UTC (permalink / raw)
  To: Mitchell Blank Jr; +Cc: LKML, Ingo Molnar, Andrew Morton

On Fri, 2005-12-30 at 13:55 -0800, Mitchell Blank Jr wrote:
> Steven Rostedt wrote:
> > I've added a global remove_proc_lock to protect this section of code.  I
> > was going to add a lock to proc_dir_entry so that the locking is only
> > cut down to the same parent, but since this function is called so
> > infrequently, why waste more memory then is needed.  One global lock
> > should not cause too much of a headache here.
> 
> Are you sure that it's the only place where we need guard ->subdir?  It
> looks like proc_lookup() and proc_readdir() use the BLK when walking that
> list, so probably the best fix would be to use that lock everywhere else
> ->subdir is touched

Good point.

God, we should be getting rid of the stupid BKL, not add more.  But
seeing that this is what is used to protect that list, I guess I'll add
it.

I'm also assuming that interrupt context wont use this.

-- Steve

Index: linux-2.6.15-rc7/fs/proc/generic.c
===================================================================
--- linux-2.6.15-rc7.orig/fs/proc/generic.c	2005-12-30 14:19:39.000000000 -0500
+++ linux-2.6.15-rc7/fs/proc/generic.c	2005-12-30 17:05:56.000000000 -0500
@@ -693,6 +693,8 @@
 	if (!parent && xlate_proc_name(name, &parent, &fn) != 0)
 		goto out;
 	len = strlen(fn);
+
+	lock_kernel();
 	for (p = &parent->subdir; *p; p=&(*p)->next ) {
 		if (!proc_match(len, fn, *p))
 			continue;
@@ -713,6 +715,7 @@
 		}
 		break;
 	}
+	unlock_kernel();
 out:
 	return;
 }



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-30 21:55   ` Mitchell Blank Jr
  2005-12-30 22:09     ` Steven Rostedt
@ 2005-12-30 22:11     ` Steven Rostedt
  1 sibling, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2005-12-30 22:11 UTC (permalink / raw)
  To: Mitchell Blank Jr; +Cc: LKML, Ingo Molnar, Andrew Morton

On Fri, 2005-12-30 at 13:55 -0800, Mitchell Blank Jr wrote:
> Steven Rostedt wrote:
> > I've added a global remove_proc_lock to protect this section of code.  I
> > was going to add a lock to proc_dir_entry so that the locking is only
> > cut down to the same parent, but since this function is called so
> > infrequently, why waste more memory then is needed.  One global lock
> > should not cause too much of a headache here.
> 
> Are you sure that it's the only place where we need guard ->subdir?  It
> looks like proc_lookup() and proc_readdir() use the BLK when walking that
> list, so probably the best fix would be to use that lock everywhere else
> ->subdir is touched

Perhaps this is a good candidate to have the BKL removed from this
protection and replaced with a spin lock or something else.  If I
remember, I'll look into that further.

-- Steve



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-30 22:09     ` Steven Rostedt
@ 2005-12-30 22:18       ` Steven Rostedt
  2006-01-04  9:21         ` Andrew Morton
  2006-01-07 11:25       ` Andrew Morton
  1 sibling, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2005-12-30 22:18 UTC (permalink / raw)
  To: Mitchell Blank Jr; +Cc: Andrew Morton, Ingo Molnar, LKML

On Fri, 2005-12-30 at 17:09 -0500, Steven Rostedt wrote:

> Index: linux-2.6.15-rc7/fs/proc/generic.c
> ===================================================================
> --- linux-2.6.15-rc7.orig/fs/proc/generic.c	2005-12-30 14:19:39.000000000 -0500
> +++ linux-2.6.15-rc7/fs/proc/generic.c	2005-12-30 17:05:56.000000000 -0500
> @@ -693,6 +693,8 @@
>  	if (!parent && xlate_proc_name(name, &parent, &fn) != 0)
>  		goto out;
>  	len = strlen(fn);
> +
> +	lock_kernel();
>  	for (p = &parent->subdir; *p; p=&(*p)->next ) {
>  		if (!proc_match(len, fn, *p))
>  			continue;
> @@ -713,6 +715,7 @@
>  		}
>  		break;
>  	}
> +	unlock_kernel();
>  out:
>  	return;
>  }
> 

FYI, to make sure that this solves the problem, I'm removing my locking
in my kernel and using this instead.  It usually crashes in a day or
two, so I can say this works if it makes it three days.

-- Steve



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-30 21:28 ` [PATCH] protect remove_proc_entry Steven Rostedt
  2005-12-30 21:34   ` Daniel Walker
  2005-12-30 21:55   ` Mitchell Blank Jr
@ 2005-12-30 23:46   ` Andrew Morton
  2005-12-31  6:58     ` Steven Rostedt
                       ` (2 more replies)
  2006-01-07 11:36   ` Andrew Morton
  3 siblings, 3 replies; 25+ messages in thread
From: Andrew Morton @ 2005-12-30 23:46 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, mingo

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> +static DEFINE_SPINLOCK(remove_proc_lock);
>

I'll take a closer look at this next week.

The official way of protecting the contents of a directory from concurrent
lookup or modification is to take its i_sem.  But procfs is totally weird
and that approach may well not be practical here.  We'd certainly prefer
not to rely upon lock_kernel().

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-30 23:46   ` Andrew Morton
@ 2005-12-31  6:58     ` Steven Rostedt
  2005-12-31  8:34       ` Arjan van de Ven
  2005-12-31  8:53     ` Kirill Korotaev
  2006-01-02 13:02     ` Steven Rostedt
  2 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2005-12-31  6:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mingo

On Fri, 30 Dec 2005, Andrew Morton wrote:

> Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > +static DEFINE_SPINLOCK(remove_proc_lock);
> >
>
> I'll take a closer look at this next week.
>
> The official way of protecting the contents of a directory from concurrent
> lookup or modification is to take its i_sem.  But procfs is totally weird
> and that approach may well not be practical here.  We'd certainly prefer
> not to rely upon lock_kernel().
>

Yeah, I thought about using the sem (or mutex ;) but remove_proc_entry is
used so darn much around the kernel, I wasn't sure it's not used in irq
context.  So I'm not sure I like the idea of making a non-scheduling
function schedule.  But it may not be a problem and it could very well be
ok to schedule here.

Also as Mitchell Blank pointed out, this list should probably be protected
everywhere by the same protection used, and an analysis should be done to
see what replacing the BKL affects.

-- Steve


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-31  6:58     ` Steven Rostedt
@ 2005-12-31  8:34       ` Arjan van de Ven
  0 siblings, 0 replies; 25+ messages in thread
From: Arjan van de Ven @ 2005-12-31  8:34 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Andrew Morton, linux-kernel, mingo

On Sat, 2005-12-31 at 01:58 -0500, Steven Rostedt wrote:
> On Fri, 30 Dec 2005, Andrew Morton wrote:
> 
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > +static DEFINE_SPINLOCK(remove_proc_lock);
> > >
> >
> > I'll take a closer look at this next week.
> >
> > The official way of protecting the contents of a directory from concurrent
> > lookup or modification is to take its i_sem.  But procfs is totally weird
> > and that approach may well not be practical here.  We'd certainly prefer
> > not to rely upon lock_kernel().
> >
> 
> Yeah, I thought about using the sem (or mutex ;) but remove_proc_entry is
> used so darn much around the kernel, I wasn't sure it's not used in irq
> context. 

it can't be; "anything-VFS" like this can sleep if the file is busy etc
etc.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-30 23:46   ` Andrew Morton
  2005-12-31  6:58     ` Steven Rostedt
@ 2005-12-31  8:53     ` Kirill Korotaev
  2006-01-04  9:36       ` Andrew Morton
  2006-01-02 13:02     ` Steven Rostedt
  2 siblings, 1 reply; 25+ messages in thread
From: Kirill Korotaev @ 2005-12-31  8:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Steven Rostedt, linux-kernel, mingo

[-- Attachment #1: Type: text/plain, Size: 1407 bytes --]

Hi Andrew,

I have a full patch for this.
I don't remember the details yet, but lock was not god here, we used 
semaphore. I pointed to this problem long ago when fixed error path in 
proc with moduleget.

This patch protects proc_dir_entry tree with a proc_tree_sem semaphore. 
I suppose lock_kernel() can be removed later after checking that no proc 
handlers require it.
Also this patch remakes de refcounters a bit making it more clear and 
more similar to dentry scheme - this is required to make sure that 
everything works correctly.

Patch is against 2.6.15-rcX and was tested for about a week. Also works 
half a year on 2.6.8 :)

Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
Signed-Off-By: Kirill Korotaev <dev@openvz.org>

Kirill

> Steven Rostedt <rostedt@goodmis.org> wrote:
>> +static DEFINE_SPINLOCK(remove_proc_lock);
>>
> 
> I'll take a closer look at this next week.
> 
> The official way of protecting the contents of a directory from concurrent
> lookup or modification is to take its i_sem.  But procfs is totally weird
> and that approach may well not be practical here.  We'd certainly prefer
> not to rely upon lock_kernel().
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

[-- Attachment #2: diff-ms-proc-locks-20051231 --]
[-- Type: text/plain, Size: 7112 bytes --]

--- ./fs/proc/generic.c.proclk	2005-12-26 13:43:14.000000000 +0300
+++ ./fs/proc/generic.c	2005-12-31 11:48:16.000000000 +0300
@@ -27,6 +27,8 @@ static ssize_t proc_file_write(struct fi
 			       size_t count, loff_t *ppos);
 static loff_t proc_file_lseek(struct file *, loff_t, int);
 
+static DECLARE_RWSEM(proc_tree_sem);
+
 int proc_match(int len, const char *name, struct proc_dir_entry *de)
 {
 	if (de->namelen != len)
@@ -381,6 +383,7 @@ struct dentry *proc_lookup(struct inode 
 	lock_kernel();
 	de = PDE(dir);
 	if (de) {
+		down_read(&proc_tree_sem);
 		for (de = de->subdir; de ; de = de->next) {
 			if (de->namelen != dentry->d_name.len)
 				continue;
@@ -392,6 +395,7 @@ struct dentry *proc_lookup(struct inode 
 				break;
 			}
 		}
+		up_read(&proc_tree_sem);
 	}
 	unlock_kernel();
 
@@ -446,12 +450,13 @@ int proc_readdir(struct file * filp,
 			filp->f_pos++;
 			/* fall through */
 		default:
+			down_read(&proc_tree_sem);
 			de = de->subdir;
 			i -= 2;
 			for (;;) {
 				if (!de) {
 					ret = 1;
-					goto out;
+					goto out_up;
 				}
 				if (!i)
 					break;
@@ -462,14 +467,18 @@ int proc_readdir(struct file * filp,
 			do {
 				if (filldir(dirent, de->name, de->namelen, filp->f_pos,
 					    de->low_ino, de->mode >> 12) < 0)
-					goto out;
+					goto out_up;
 				filp->f_pos++;
 				de = de->next;
 			} while (de);
+			up_read(&proc_tree_sem);
 	}
 	ret = 1;
 out:	unlock_kernel();
 	return ret;	
+out_up:
+	up_read(&proc_tree_sem);
+	goto out;
 }
 
 /*
@@ -517,6 +526,7 @@ static int proc_register(struct proc_dir
 		if (dp->proc_iops == NULL)
 			dp->proc_iops = &proc_file_inode_operations;
 	}
+	de_get(dir);
 	return 0;
 }
 
@@ -576,6 +586,7 @@ static struct proc_dir_entry *proc_creat
 
 	memset(ent, 0, sizeof(struct proc_dir_entry));
 	memcpy(((char *) ent) + sizeof(struct proc_dir_entry), fn, len + 1);
+	atomic_set(&ent->count, 1);
 	ent->name = ((char *) ent) + sizeof(*ent);
 	ent->namelen = len;
 	ent->mode = mode;
@@ -589,6 +600,7 @@ struct proc_dir_entry *proc_symlink(cons
 {
 	struct proc_dir_entry *ent;
 
+	down_write(&proc_tree_sem);
 	ent = proc_create(&parent,name,
 			  (S_IFLNK | S_IRUGO | S_IWUGO | S_IXUGO),1);
 
@@ -606,6 +618,7 @@ struct proc_dir_entry *proc_symlink(cons
 			ent = NULL;
 		}
 	}
+	up_write(&proc_tree_sem);
 	return ent;
 }
 
@@ -614,6 +627,7 @@ struct proc_dir_entry *proc_mkdir_mode(c
 {
 	struct proc_dir_entry *ent;
 
+	down_write(&proc_tree_sem);
 	ent = proc_create(&parent, name, S_IFDIR | mode, 2);
 	if (ent) {
 		ent->proc_fops = &proc_dir_operations;
@@ -624,6 +638,7 @@ struct proc_dir_entry *proc_mkdir_mode(c
 			ent = NULL;
 		}
 	}
+	up_write(&proc_tree_sem);
 	return ent;
 }
 
@@ -633,7 +648,7 @@ struct proc_dir_entry *proc_mkdir(const 
 	return proc_mkdir_mode(name, S_IRUGO | S_IXUGO, parent);
 }
 
-struct proc_dir_entry *create_proc_entry(const char *name, mode_t mode,
+static struct proc_dir_entry *__create_proc_entry(const char *name, mode_t mode,
 					 struct proc_dir_entry *parent)
 {
 	struct proc_dir_entry *ent;
@@ -665,6 +680,17 @@ struct proc_dir_entry *create_proc_entry
 	return ent;
 }
 
+struct proc_dir_entry *create_proc_entry(const char *name, mode_t mode,
+					 struct proc_dir_entry *parent)
+{
+	struct proc_dir_entry *ent;
+
+	down_write(&proc_tree_sem);
+	ent = __create_proc_entry(name, mode, parent);
+	up_write(&proc_tree_sem);
+	return ent;
+}
+
 void free_proc_entry(struct proc_dir_entry *de)
 {
 	unsigned int ino = de->low_ino;
@@ -683,15 +709,13 @@ void free_proc_entry(struct proc_dir_ent
  * Remove a /proc entry and free it if it's not currently in use.
  * If it is in use, we set the 'deleted' flag.
  */
-void remove_proc_entry(const char *name, struct proc_dir_entry *parent)
+static void __remove_proc_entry(const char *name, struct proc_dir_entry *parent)
 {
 	struct proc_dir_entry **p;
 	struct proc_dir_entry *de;
 	const char *fn = name;
 	int len;
 
-	if (!parent && xlate_proc_name(name, &parent, &fn) != 0)
-		goto out;
 	len = strlen(fn);
 	for (p = &parent->subdir; *p; p=&(*p)->next ) {
 		if (!proc_match(len, fn, *p))
@@ -699,20 +723,24 @@ void remove_proc_entry(const char *name,
 		de = *p;
 		*p = de->next;
 		de->next = NULL;
+		de_put(parent);
 		if (S_ISDIR(de->mode))
 			parent->nlink--;
 		proc_kill_inodes(de);
 		de->nlink = 0;
 		WARN_ON(de->subdir);
-		if (!atomic_read(&de->count))
-			free_proc_entry(de);
-		else {
-			de->deleted = 1;
-			printk("remove_proc_entry: %s/%s busy, count=%d\n",
-				parent->name, de->name, atomic_read(&de->count));
-		}
+		de->deleted = 1;
+		de_put(de);
 		break;
 	}
-out:
-	return;
+}
+
+void remove_proc_entry(const char *name, struct proc_dir_entry *parent)
+{
+	const char *fn = name;
+
+	down_write(&proc_tree_sem);
+	if (parent || xlate_proc_name(name, &parent, &fn) == 0)
+		__remove_proc_entry(name, parent);
+	up_write(&proc_tree_sem);
 }
--- ./fs/proc/inode.c.proclk	2005-12-26 13:43:14.000000000 +0300
+++ ./fs/proc/inode.c	2005-12-31 11:48:16.000000000 +0300
@@ -21,34 +21,25 @@
 
 extern void free_proc_entry(struct proc_dir_entry *);
 
-static inline struct proc_dir_entry * de_get(struct proc_dir_entry *de)
-{
-	if (de)
-		atomic_inc(&de->count);
-	return de;
-}
-
 /*
  * Decrements the use count and checks for deferred deletion.
  */
-static void de_put(struct proc_dir_entry *de)
+void de_put(struct proc_dir_entry *de)
 {
 	if (de) {	
-		lock_kernel();		
 		if (!atomic_read(&de->count)) {
 			printk("de_put: entry %s already free!\n", de->name);
-			unlock_kernel();
 			return;
 		}
 
 		if (atomic_dec_and_test(&de->count)) {
-			if (de->deleted) {
-				printk("de_put: deferred delete of %s\n",
-					de->name);
-				free_proc_entry(de);
+			if (!de->deleted) {
+				printk("de_put: entry %s is not removed yet\n",
+						de->name);
+				return;
 			}
-		}		
-		unlock_kernel();
+			free_proc_entry(de);
+		}
 	}
 }
 
--- ./include/linux/proc_fs.h.proclk	2005-12-26 13:43:16.000000000 +0300
+++ ./include/linux/proc_fs.h	2005-12-31 11:48:16.000000000 +0300
@@ -69,6 +69,14 @@ struct proc_dir_entry {
 	void *set;
 };
 
+extern void de_put(struct proc_dir_entry *);
+static inline struct proc_dir_entry *de_get(struct proc_dir_entry *de)
+{
+	if (de)
+		atomic_inc(&de->count);
+	return de;
+}
+
 struct kcore_list {
 	struct kcore_list *next;
 	unsigned long addr;
--- ./kernel/sysctl.c.proclk	2005-12-26 13:43:16.000000000 +0300
+++ ./kernel/sysctl.c	2005-12-31 11:48:37.000000000 +0300
@@ -1396,19 +1396,15 @@ static void unregister_proc_table(ctl_ta
 				continue;
 		}
 
-		/*
-		 * In any case, mark the entry as goner; we'll keep it
-		 * around if it's busy, but we'll know to do nothing with
-		 * its fields.  We are under sysctl_lock here.
-		 */
 		de->data = NULL;
-
-		/* Don't unregister proc entries that are still being used.. */
-		if (atomic_read(&de->count))
-			continue;
-
 		table->de = NULL;
+		/*
+		 * sys_sysctl can't find us, since we are removed from list.
+		 * proc won't touch either, since de->data is NULL.
+		 */
+		spin_unlock(&sysctl_lock);
 		remove_proc_entry(table->procname, root);
+		spin_lock(&sysctl_lock);
 	}
 }
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-30 23:46   ` Andrew Morton
  2005-12-31  6:58     ` Steven Rostedt
  2005-12-31  8:53     ` Kirill Korotaev
@ 2006-01-02 13:02     ` Steven Rostedt
  2 siblings, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2006-01-02 13:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mingo

On Fri, 2005-12-30 at 15:46 -0800, Andrew Morton wrote:
> Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > +static DEFINE_SPINLOCK(remove_proc_lock);
> >
> 
> I'll take a closer look at this next week.
> 
> The official way of protecting the contents of a directory from concurrent
> lookup or modification is to take its i_sem.  But procfs is totally weird
> and that approach may well not be practical here.  We'd certainly prefer
> not to rely upon lock_kernel().

FWIW,

My test that would crash within two days has been running for three days
now with the lock_kernel patch.  So, at least this fixes the problem,
whether we use another locking or not, it's good to know what to fix.

-- Steve



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-30 22:18       ` Steven Rostedt
@ 2006-01-04  9:21         ` Andrew Morton
  2006-01-04 12:18           ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2006-01-04  9:21 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: mitch, mingo, linux-kernel

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 2005-12-30 at 17:09 -0500, Steven Rostedt wrote:
> 
> > Index: linux-2.6.15-rc7/fs/proc/generic.c
> > ===================================================================
> > --- linux-2.6.15-rc7.orig/fs/proc/generic.c	2005-12-30 14:19:39.000000000 -0500
> > +++ linux-2.6.15-rc7/fs/proc/generic.c	2005-12-30 17:05:56.000000000 -0500
> > @@ -693,6 +693,8 @@
> >  	if (!parent && xlate_proc_name(name, &parent, &fn) != 0)
> >  		goto out;
> >  	len = strlen(fn);
> > +
> > +	lock_kernel();
> >  	for (p = &parent->subdir; *p; p=&(*p)->next ) {
> >  		if (!proc_match(len, fn, *p))
> >  			continue;
> > @@ -713,6 +715,7 @@
> >  		}
> >  		break;
> >  	}
> > +	unlock_kernel();
> >  out:
> >  	return;
> >  }
> > 
> 
> FYI, to make sure that this solves the problem, I'm removing my locking
> in my kernel and using this instead.  It usually crashes in a day or
> two, so I can say this works if it makes it three days.
> 

I guess the lock_kernel() approach is the way to go.  Fixing a race and
de-BKLing procfs are separate exercises...

Did the patch work?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-31  8:53     ` Kirill Korotaev
@ 2006-01-04  9:36       ` Andrew Morton
  2006-01-04 11:27         ` Kirill Korotaev
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2006-01-04  9:36 UTC (permalink / raw)
  To: Kirill Korotaev; +Cc: rostedt, linux-kernel, mingo

Kirill Korotaev <dev@sw.ru> wrote:
>
> Hi Andrew,
> 
> I have a full patch for this.

Please don't top-post.  It makes things hard...

> I don't remember the details yet, but lock was not god here, we used 
> semaphore. I pointed to this problem long ago when fixed error path in 
> proc with moduleget.
> 
> This patch protects proc_dir_entry tree with a proc_tree_sem semaphore. 
> I suppose lock_kernel() can be removed later after checking that no proc 
> handlers require it.
> Also this patch remakes de refcounters a bit making it more clear and 
> more similar to dentry scheme - this is required to make sure that 
> everything works correctly.
> 
> Patch is against 2.6.15-rcX and was tested for about a week. Also works 
> half a year on 2.6.8 :)
> 
> [ patch which uses an rwsem for procfs and somewhat removes lock_kernel() ]
>

I worry about replacing a spinlock with a sleeping lock.  In some
circumstances it can cause a complete scalability collapse and I suspect
this could happen with /proc.  Although I guess the only fastpath here is
proc_readdir(), and as the lock is taken there for reading, we'll be OK..

The patch does leave some lock_kernel() calls behind.  If we're going to do
this, I think they should all be removed?

Races in /proc have been plentiful and hard to find.  The patch worries me,
frankly.  I'd like to see quite a bit more description of the locking
schema and some demonstration that it's actually complete before taking the
plunge.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2006-01-04  9:36       ` Andrew Morton
@ 2006-01-04 11:27         ` Kirill Korotaev
  0 siblings, 0 replies; 25+ messages in thread
From: Kirill Korotaev @ 2006-01-04 11:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rostedt, linux-kernel, mingo

>>Hi Andrew,
>>
>>I have a full patch for this.
> 
> 
> Please don't top-post.  It makes things hard...
do you prefer separate mails with patch and with reference to original 
report? will do so.

>>I don't remember the details yet, but lock was not god here, we used 
>>semaphore. I pointed to this problem long ago when fixed error path in 
>>proc with moduleget.
>>
>>This patch protects proc_dir_entry tree with a proc_tree_sem semaphore. 
>>I suppose lock_kernel() can be removed later after checking that no proc 
>>handlers require it.
>>Also this patch remakes de refcounters a bit making it more clear and 
>>more similar to dentry scheme - this is required to make sure that 
>>everything works correctly.
>>
>>Patch is against 2.6.15-rcX and was tested for about a week. Also works 
>>half a year on 2.6.8 :)
>>
>>[ patch which uses an rwsem for procfs and somewhat removes lock_kernel() ]
>>
> 
> 
> I worry about replacing a spinlock with a sleeping lock.  In some
> circumstances it can cause a complete scalability collapse and I suspect
> this could happen with /proc.  Although I guess the only fastpath here is
> proc_readdir(), and as the lock is taken there for reading, we'll be OK..
> 
> The patch does leave some lock_kernel() calls behind.  If we're going to do
> this, I think they should all be removed?
> 
> Races in /proc have been plentiful and hard to find.  The patch worries me,
> frankly.  I'd like to see quite a bit more description of the locking
> schema and some demonstration that it's actually complete before taking the
> plunge.

ok, here goes some more descriptions:

1.
proc_readdir is a sleeping ops:
sys_getdents
`- vfs_readdir
      `- proc_readdir
           `- filldir
                `- put_user/copy_to_user
that's why we must use semaphore. of course spinlock can be used too,
but this will case another problem: it must be dropped to call filldir, but
beofre it will be retaken the dentry being filldir-ed may be removed and
we won't see it's siblings in output.

2.
lock_kernel() is needed to handle at least simultaneous sys_read vs
create_proc_entry with sequental de->proc_fops = XXX. this can be
handled by passing fops directly to create_proc_entry.
i.e. there is a 3rd problem I pointed to you before:
create_proc_entry() and setting of de->owner/de->proc_fops is inatomic.
lock_kernel() is not a best way to protect against this, sure...
I would prefer to fix it with a separate patch somehow...

3.
refcounting:
each proc_dir_entry's refcounter is the reference from inode plus
references from children. once this count reaches zero - dentry is freed.
so on each proc_register() parent is get-ed, on each remove_proc_entry
parent is put-ed.

Kirill


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2006-01-04  9:21         ` Andrew Morton
@ 2006-01-04 12:18           ` Steven Rostedt
  2006-01-05  1:48             ` Mitchell Blank Jr
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2006-01-04 12:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mitch, mingo, linux-kernel


On Wed, 4 Jan 2006, Andrew Morton wrote:

> Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > FYI, to make sure that this solves the problem, I'm removing my locking
> > in my kernel and using this instead.  It usually crashes in a day or
> > two, so I can say this works if it makes it three days.
> >
>
> I guess the lock_kernel() approach is the way to go.  Fixing a race and
> de-BKLing procfs are separate exercises...
>
> Did the patch work?
>

Sorry, I forgot to respond, because the test is still running ;)

So yes, it not only ran for three days, it ran for six. I just killed it.

-- Steve


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2006-01-04 12:18           ` Steven Rostedt
@ 2006-01-05  1:48             ` Mitchell Blank Jr
  0 siblings, 0 replies; 25+ messages in thread
From: Mitchell Blank Jr @ 2006-01-05  1:48 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Andrew Morton, mingo, linux-kernel

Steven Rostedt wrote:
> > I guess the lock_kernel() approach is the way to go.  Fixing a race and
> > de-BKLing procfs are separate exercises...
> >
> > Did the patch work?
> 
> Sorry, I forgot to respond, because the test is still running ;)
> 
> So yes, it not only ran for three days, it ran for six. I just killed it.

Have you looked at the other paths that touch ->subdir?  Namely:
  proc_devtree.c:
    proc_device_tree_add_node() -- plays games with ->subdir directly
  generic.c:
    proc_create() -- calls xlate_proc_name() which touches ->subdir and
      should therefore probably be called under BKL
    proc_register() -- both the call to xlate_proc_name() and the
      following accesses to ->subdir should be under BKL

-Mitch

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-30 22:09     ` Steven Rostedt
  2005-12-30 22:18       ` Steven Rostedt
@ 2006-01-07 11:25       ` Andrew Morton
  1 sibling, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2006-01-07 11:25 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: mitch, linux-kernel, mingo

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> God, we should be getting rid of the stupid BKL, not add more.  But
>  seeing that this is what is used to protect that list, I guess I'll add
>  it.
> 
>  I'm also assuming that interrupt context wont use this.
> 
>  -- Steve
> 
>  Index: linux-2.6.15-rc7/fs/proc/generic.c
>  ===================================================================
>  --- linux-2.6.15-rc7.orig/fs/proc/generic.c	2005-12-30 14:19:39.000000000 -0500
>  +++ linux-2.6.15-rc7/fs/proc/generic.c	2005-12-30 17:05:56.000000000 -0500
>  @@ -693,6 +693,8 @@
>   	if (!parent && xlate_proc_name(name, &parent, &fn) != 0)
>   		goto out;
>   	len = strlen(fn);
>  +
>  +	lock_kernel();
>   	for (p = &parent->subdir; *p; p=&(*p)->next ) {
>   		if (!proc_match(len, fn, *p))
>   			continue;
>  @@ -713,6 +715,7 @@
>   		}
>   		break;
>   	}
>  +	unlock_kernel();
>   out:
>   	return;
>   }

OK, we're kind of screwed here.

Debug: sleeping function called from invalid context at include/asm/semaphore.h:105
in_atomic():1, irqs_disabled():0

Call Trace:<ffffffff8012689b>{__might_sleep+190} <ffffffff803e83ce>{lock_kernel+53}
       <ffffffff801a6db2>{remove_proc_entry+74} <ffffffff8016b295>{poison_obj+58}
       <ffffffff80134ee5>{unregister_proc_table+121} <ffffffff80134eb6>{unregister_proc_table+74}
       <ffffffff80134eb6>{unregister_proc_table+74} <ffffffff80134eb6>{unregister_proc_table+74}
       <ffffffff80134eb6>{unregister_proc_table+74} <ffffffff80134fdb>{unregister_sysctl_table+232}
       <ffffffff803a2a0e>{ip_mc_dec_group+181} <ffffffff803e802e>{_write_lock_irqsave+32}
       <ffffffff80133dc2>{local_bh_enable+114} <ffffffff803e82b3>{_write_unlock_bh+24}
       <ffffffff8039ebfe>{devinet_sysctl_unregister+31} <ffffffff8039ecc1>{inetdev_destroy+171}
       <ffffffff8039f1c1>{inet_del_ifa+509} <ffffffff8039f2dc>{inet_rtm_deladdr+268}
       <ffffffff8036efea>{rtnetlink_rcv_msg+437} <ffffffff803761c3>{netlink_run_queue+140}
       <ffffffff8036ee35>{rtnetlink_rcv_msg+0} <ffffffff8036f032>{rtnetlink_rcv+41}
       <ffffffff803758af>{netlink_data_ready+23} <ffffffff80374d77>{netlink_sendskb+41}
       <ffffffff80374ff4>{netlink_unicast+539} <ffffffff80375881>{netlink_sendmsg+667}
       <ffffffff8035bedf>{sock_sendmsg+232} <ffffffff803e8203>{_read_unlock_irq+20}
       <ffffffff80142e90>{autoremove_wake_function+0} <ffffffff80171a1e>{fget+157}
       <ffffffff80142e90>{autoremove_wake_function+0} <ffffffff80171a1e>{fget+157}
       <ffffffff8035bbea>{sockfd_lookup+18} <ffffffff8035d408>{sys_sendto+246}
       <ffffffff803e80cc>{_spin_unlock_irqrestore+27} <ffffffff80238031>{__up_write+371}
       <ffffffff8010db46>{system_call+126} 


Because CONFIG_PREEMPT_BKL makes lock_kernel do down() and some callers of
remove_proc_entry() do it from inside spinlock.

And the spinlocking variant of lock_kernel() is also pretty much illegal,
because (in this case) whatever lock networking has taken may be taken
elsewhere inside lock_kernel(), so we have an ab/ba deadlock.

Best I can think of is that we need a private spinlock for this list.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2005-12-30 21:28 ` [PATCH] protect remove_proc_entry Steven Rostedt
                     ` (2 preceding siblings ...)
  2005-12-30 23:46   ` Andrew Morton
@ 2006-01-07 11:36   ` Andrew Morton
  2006-01-07 12:04     ` Steven Rostedt
  2006-01-09 19:16     ` Steven Rostedt
  3 siblings, 2 replies; 25+ messages in thread
From: Andrew Morton @ 2006-01-07 11:36 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, mingo

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Working on a custom kernel that adds and removes proc entries quite a
> bit, I discovered that remove_proc_entry is not protected against
> multiple threads removing entries belonging to the same parent.  At
> first I thought that this is only a problem with my changes, but after
> inspecting the vanilla kernel, I see that there's several places that
> calls remove_proc_entry with the same parent (most noticeably
> /proc/drivers).
> 
> I've added a global remove_proc_lock to protect this section of code.  I
> was going to add a lock to proc_dir_entry so that the locking is only
> cut down to the same parent, but since this function is called so
> infrequently, why waste more memory then is needed.  One global lock
> should not cause too much of a headache here.
> 
> I'm not sure if remove_proc_entry is called from interrupt context, so I
> did a irqsave just in case.
> 
> -- Steve
> 
> 
> Index: linux-2.6.15-rc7/fs/proc/generic.c
> ===================================================================
> --- linux-2.6.15-rc7.orig/fs/proc/generic.c	2005-12-30 14:19:39.000000000 -0500
> +++ linux-2.6.15-rc7/fs/proc/generic.c	2005-12-30 16:18:42.000000000 -0500
> @@ -19,6 +19,7 @@
>  #include <linux/idr.h>
>  #include <linux/namei.h>
>  #include <linux/bitops.h>
> +#include <linux/spinlock.h>
>  #include <asm/uaccess.h>
>  
>  static ssize_t proc_file_read(struct file *file, char __user *buf,
> @@ -27,6 +28,8 @@
>  			       size_t count, loff_t *ppos);
>  static loff_t proc_file_lseek(struct file *, loff_t, int);
>  
> +static DEFINE_SPINLOCK(remove_proc_lock);
> +
>  int proc_match(int len, const char *name, struct proc_dir_entry *de)
>  {
>  	if (de->namelen != len)
> @@ -689,10 +692,13 @@
>  	struct proc_dir_entry *de;
>  	const char *fn = name;
>  	int len;
> +	unsigned long flags;
>  
>  	if (!parent && xlate_proc_name(name, &parent, &fn) != 0)
>  		goto out;
>  	len = strlen(fn);
> +
> +	spin_lock_irqsave(&remove_proc_lock, flags);
>  	for (p = &parent->subdir; *p; p=&(*p)->next ) {
>  		if (!proc_match(len, fn, *p))
>  			continue;
> @@ -713,6 +719,7 @@
>  		}
>  		break;
>  	}
> +	spin_unlock_irqrestore(&remove_proc_lock, flags);
>  out:
>  	return;
>  }

Aren't there other places where we need to take this lock?  Code which
traverses that list, code which adds things to it?



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2006-01-07 11:36   ` Andrew Morton
@ 2006-01-07 12:04     ` Steven Rostedt
  2006-01-09 19:16     ` Steven Rostedt
  1 sibling, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2006-01-07 12:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mingo


On Sat, 7 Jan 2006, Andrew Morton wrote:

>
> Aren't there other places where we need to take this lock?  Code which
> traverses that list, code which adds things to it?
>

Yeah, that patch was just a quick fix.  I'll look more into that on
Monday. (My wife has too many chores for me this weekend ;)

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2006-01-07 11:36   ` Andrew Morton
  2006-01-07 12:04     ` Steven Rostedt
@ 2006-01-09 19:16     ` Steven Rostedt
  2006-01-10  0:59       ` Steven Rostedt
  2006-01-10 13:26       ` Steven Rostedt
  1 sibling, 2 replies; 25+ messages in thread
From: Steven Rostedt @ 2006-01-09 19:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mingo

On Sat, 2006-01-07 at 03:36 -0800, Andrew Morton wrote:

> 
> Aren't there other places where we need to take this lock?  Code which
> traverses that list, code which adds things to it?
> 

Andrew,

How's this patch look?  I tested this with the following module:

http://www.kihontech.com/tests/proc/proc_stress.c

Without the patch, I could hang the processes (the processes would
either crash, or just get stuck spinning inside the list).  With the
patch, the module ran to completion each time.

I don't believe any of the calls are called from interrupt context, so I
held off from using the _irqsave versions of spin_lock.

-- Steve

Index: linux-2.6.15/fs/proc/generic.c
===================================================================
--- linux-2.6.15.orig/fs/proc/generic.c	2006-01-09 13:58:23.000000000 -0500
+++ linux-2.6.15/fs/proc/generic.c	2006-01-09 13:58:34.000000000 -0500
@@ -19,6 +19,7 @@
 #include <linux/idr.h>
 #include <linux/namei.h>
 #include <linux/bitops.h>
+#include <linux/spinlock.h>
 #include <asm/uaccess.h>
 
 static ssize_t proc_file_read(struct file *file, char __user *buf,
@@ -27,6 +28,8 @@
 			       size_t count, loff_t *ppos);
 static loff_t proc_file_lseek(struct file *, loff_t, int);
 
+DEFINE_SPINLOCK(proc_subdir_lock);
+
 int proc_match(int len, const char *name, struct proc_dir_entry *de)
 {
 	if (de->namelen != len)
@@ -275,7 +278,9 @@
 	const char     		*cp = name, *next;
 	struct proc_dir_entry	*de;
 	int			len;
+	int 			rtn = 0;
 
+	spin_lock(&proc_subdir_lock);
 	de = &proc_root;
 	while (1) {
 		next = strchr(cp, '/');
@@ -287,13 +292,17 @@
 			if (proc_match(len, cp, de))
 				break;
 		}
-		if (!de)
-			return -ENOENT;
+		if (!de) {
+			rtn = -ENOENT;
+			goto out;
+		}
 		cp += len + 1;
 	}
 	*residual = cp;
 	*ret = de;
-	return 0;
+out:
+	spin_unlock(&proc_subdir_lock);
+	return rtn;
 }
 
 static DEFINE_IDR(proc_inum_idr);
@@ -378,6 +387,7 @@
 	int error = -ENOENT;
 
 	lock_kernel();
+	spin_lock(&proc_subdir_lock);
 	de = PDE(dir);
 	if (de) {
 		for (de = de->subdir; de ; de = de->next) {
@@ -386,12 +396,15 @@
 			if (!memcmp(dentry->d_name.name, de->name, de->namelen)) {
 				unsigned int ino = de->low_ino;
 
+				spin_unlock(&proc_subdir_lock);
 				error = -EINVAL;
 				inode = proc_get_inode(dir->i_sb, ino, de);
+				spin_lock(&proc_subdir_lock);
 				break;
 			}
 		}
 	}
+	spin_unlock(&proc_subdir_lock);
 	unlock_kernel();
 
 	if (inode) {
@@ -445,11 +458,13 @@
 			filp->f_pos++;
 			/* fall through */
 		default:
+			spin_lock(&proc_subdir_lock);
 			de = de->subdir;
 			i -= 2;
 			for (;;) {
 				if (!de) {
 					ret = 1;
+					spin_unlock(&proc_subdir_lock);
 					goto out;
 				}
 				if (!i)
@@ -459,12 +474,16 @@
 			}
 
 			do {
+				/* filldir passes info to user space */
+				spin_unlock(&proc_subdir_lock);
 				if (filldir(dirent, de->name, de->namelen, filp->f_pos,
 					    de->low_ino, de->mode >> 12) < 0)
 					goto out;
+				spin_lock(&proc_subdir_lock);
 				filp->f_pos++;
 				de = de->next;
 			} while (de);
+			spin_unlock(&proc_subdir_lock);
 	}
 	ret = 1;
 out:	unlock_kernel();
@@ -498,9 +517,13 @@
 	if (i == 0)
 		return -EAGAIN;
 	dp->low_ino = i;
+
+	spin_lock(&proc_subdir_lock);
 	dp->next = dir->subdir;
 	dp->parent = dir;
 	dir->subdir = dp;
+	spin_unlock(&proc_subdir_lock);
+
 	if (S_ISDIR(dp->mode)) {
 		if (dp->proc_iops == NULL) {
 			dp->proc_fops = &proc_dir_operations;
@@ -692,6 +715,8 @@
 	if (!parent && xlate_proc_name(name, &parent, &fn) != 0)
 		goto out;
 	len = strlen(fn);
+
+	spin_lock(&proc_subdir_lock);
 	for (p = &parent->subdir; *p; p=&(*p)->next ) {
 		if (!proc_match(len, fn, *p))
 			continue;
@@ -712,6 +737,7 @@
 		}
 		break;
 	}
+	spin_unlock(&proc_subdir_lock);
 out:
 	return;
 }
Index: linux-2.6.15/fs/proc/proc_devtree.c
===================================================================
--- linux-2.6.15.orig/fs/proc/proc_devtree.c	2006-01-09 13:58:23.000000000 -0500
+++ linux-2.6.15/fs/proc/proc_devtree.c	2006-01-09 14:05:10.000000000 -0500
@@ -112,9 +112,11 @@
 		 * properties are quite unimportant for us though, thus we
 		 * simply "skip" them here, but we do have to check.
 		 */
+		spin_lock(&proc_subdir_lock);
 		for (ent = de->subdir; ent != NULL; ent = ent->next)
 			if (!strcmp(ent->name, pp->name))
 				break;
+		spin_unlock(&proc_subdir_lock);
 		if (ent != NULL) {
 			printk(KERN_WARNING "device-tree: property \"%s\" name"
 			       " conflicts with node in %s\n", pp->name,
Index: linux-2.6.15/include/linux/proc_fs.h
===================================================================
--- linux-2.6.15.orig/include/linux/proc_fs.h	2006-01-09 08:59:13.000000000 -0500
+++ linux-2.6.15/include/linux/proc_fs.h	2006-01-09 14:07:09.000000000 -0500
@@ -4,6 +4,7 @@
 #include <linux/config.h>
 #include <linux/slab.h>
 #include <linux/fs.h>
+#include <linux/spinlock.h>
 #include <asm/atomic.h>
 
 /*
@@ -92,6 +93,8 @@
 extern struct proc_dir_entry *proc_root_driver;
 extern struct proc_dir_entry *proc_root_kcore;
 
+extern spinlock_t proc_subdir_lock;
+
 extern void proc_root_init(void);
 extern void proc_misc_init(void);
 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2006-01-09 19:16     ` Steven Rostedt
@ 2006-01-10  0:59       ` Steven Rostedt
  2006-01-10  1:05         ` Ingo Molnar
  2006-01-10 13:26       ` Steven Rostedt
  1 sibling, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2006-01-10  0:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

Ingo, 

FYI

I just uploaded my 2.6.15-rt2-sr3 which includes the latest patch to fix
the bug in remove_proc_entry.

http://home.stny.rr.com/rostedt/patches/patch-2.6.15-rt2-sr3

Again, the module to test this is here:

http://www.kihontech.com/tests/proc/proc_stress.c

I tested it like the following:

# insmod proc_stress.ko & for i in `seq 1 10000`; do ls /proc/proc_tests; done

-- Steve



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2006-01-10  0:59       ` Steven Rostedt
@ 2006-01-10  1:05         ` Ingo Molnar
  0 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2006-01-10  1:05 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Ingo,
> 
> FYI
> 
> I just uploaded my 2.6.15-rt2-sr3 which includes the latest patch to 
> fix the bug in remove_proc_entry.
> 
> http://home.stny.rr.com/rostedt/patches/patch-2.6.15-rt2-sr3

thanks Steve - i've applied your fixes and have uploaded 2.6.15-rt3 to 
the usual place:

   http://redhat.com/~mingo/realtime-preempt/

(other than the version string it is the same as -rt2-sr3.)

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] protect remove_proc_entry
  2006-01-09 19:16     ` Steven Rostedt
  2006-01-10  0:59       ` Steven Rostedt
@ 2006-01-10 13:26       ` Steven Rostedt
  1 sibling, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2006-01-10 13:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, linux-kernel

Andrew, here's a better explanation of the patch as well as my
Signed-off.

Description:

It has been discovered that the remove_proc_entry has a race in the
removing of entries in the proc file system that are siblings.  There's
no protection around the traversing and removing of elements that belong
in the same subdirectory.

This subdirectory list is protected in other areas by the BKL.  So the
BKL was at first used to protect this area too, but unfortunately,
remove_proc_entry may be called with spinlocks held.  The BKL may
schedule, so this was not a solution.

The final solution was to add a new global spin lock to protect this
list, called proc_subdir_lock.  This lock now protects the list in
remove_proc_entry, and I also went around looking for other areas that
this list is modified and added this protection there too.  Care must be
taken since these locations call several functions that may also
schedule.

Since I don't see any location that these functions that modify the
subdirectory list are called by interrupts, the irqsave/restore versions
of the spin lock was _not_ used.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


-- Steve



^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2006-01-10 13:27 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-12-30 20:04 [Question] race condition with remove_proc_entry Steven Rostedt
2005-12-30 21:28 ` [PATCH] protect remove_proc_entry Steven Rostedt
2005-12-30 21:34   ` Daniel Walker
2005-12-30 21:55     ` Steven Rostedt
2005-12-30 21:55   ` Mitchell Blank Jr
2005-12-30 22:09     ` Steven Rostedt
2005-12-30 22:18       ` Steven Rostedt
2006-01-04  9:21         ` Andrew Morton
2006-01-04 12:18           ` Steven Rostedt
2006-01-05  1:48             ` Mitchell Blank Jr
2006-01-07 11:25       ` Andrew Morton
2005-12-30 22:11     ` Steven Rostedt
2005-12-30 23:46   ` Andrew Morton
2005-12-31  6:58     ` Steven Rostedt
2005-12-31  8:34       ` Arjan van de Ven
2005-12-31  8:53     ` Kirill Korotaev
2006-01-04  9:36       ` Andrew Morton
2006-01-04 11:27         ` Kirill Korotaev
2006-01-02 13:02     ` Steven Rostedt
2006-01-07 11:36   ` Andrew Morton
2006-01-07 12:04     ` Steven Rostedt
2006-01-09 19:16     ` Steven Rostedt
2006-01-10  0:59       ` Steven Rostedt
2006-01-10  1:05         ` Ingo Molnar
2006-01-10 13:26       ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).