linux-kernel.vger.kernel.org archive mirror
* PROBLEM: /proc (procfs) task exit race condition causes a kernel crash
@ 2006-05-26  0:43 Tony Griffiths
  2006-05-28 15:37 ` Eric W. Biederman
  0 siblings, 1 reply; 4+ messages in thread
From: Tony Griffiths @ 2006-05-26  0:43 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 5076 bytes --]

Summary:

A condition exists that crashes the kernel when one or more tasks are
exiting while another task is reading their /proc entries.  The crash
is caused either by a bad virtual address (NULL, LIST_POISON1, or
LIST_POISON2) in prune_dcache() or by a BUG_ON() sanity check in
include/linux/list.h.

Detailed Description:

If there is a great deal of modification activity in /proc caused by
task creation [fork()] and task exit, and at the same time other
task(s) are reading /proc/<pid>/... files, the dentry_unused list
becomes corrupted and the kernel crashes, usually in the function
prune_dcache() in fs/dcache.c.  A simple program that forks itself, run
in a continuous loop and combined with a 'find /proc ... cat {} \;' to
read the /proc task entries, is all that is needed to induce the
condition.  Two sample crash outputs follow:

(a)  BUG_ON() --
 ------------[ cut here ]------------
kernel BUG at include/linux/list.h:167!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /class/vc/vcs1/dev
Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core 
microcode binfmt_misc video thermal sony_acpi processor fan button 
battery ac ehci_hcd usbcore ide_cd cdrom sg ext3 jbd dm_mod mptspi 
scsi_transport_spi mptscsih mptbase sd_mod scsi_mod
CPU:    1
EIP:    0060:[<c017ec60>]    Not tainted VLI
EFLAGS: 00010203   (2.6.16-mm2 #1)
EIP is at prune_dcache+0x3c6/0x3d3
eax: 00000010   ebx: f7326b08   ecx: f7326b10   edx: c017e280
esi: f7326ae0   edi: f7e81e5c   ebp: 00000001   esp: f7e81e4c
ds: 007b   es: 007b   ss: 0068
Process init (pid: 1, threadinfo=f7e80000 task=c352eaa0)
Stack: <0>c0401c00 f7e81e5c f7e81ead c017ef28 f7e81e5c f7e81e5c f7326df8 
f620c000
       f7326df8 f7e81ea8 c017efe6 00000006 f7ec0e00 f620c000 c019c71a 
f7326df8
       f7e81e98 c036a63a 000077a4 089c21d9 00000005 f7e81ea8 c0117643 
32363033
Call Trace:
 <c017ef28> select_parent+0x17/0xbc   <c017efe6> 
shrink_dcache_parent+0x19/0x2c
 <c019c71a> proc_flush_task+0x5f/0x1f5   <c0117643> sched_exit+0xb1/0xc8
 <c0120552> release_task+0x84/0x101   <c01028c3> handle_signal+0x108/0x143
 <c0122094> wait_task_zombie+0x2de/0x3cf   <c0102984> do_signal+0x86/0x11c
 <c012291d> do_wait+0x36f/0x40f   <c0119153> default_wake_function+0x0/0x12
 <c01f08ec> copy_to_user+0x3c/0x50   <c0119153> 
default_wake_function+0x0/0x12
 <c0122a8c> sys_wait4+0x3f/0x43   <c0122ab7> sys_waitpid+0x27/0x2b
 <c0102b5f> syscall_call+0x7/0xb
Code: 31 ff ff ff 0f 0b a7 00 9f 69 35 c0 e9 8c fd ff ff 0f 0b a8 00 9f 
69 35 c0 e9 8b fd ff ff 0f 0b a8 00 9f 69 35 c0 e9 9e fe ff ff <0f> 0b 
a7 00 9f 69 35 c0 e9 85 fe ff ff 55 b8 00 1c 40 c0 57 56

(b)  LIST_POISON1/LIST_POISON2 --
# Unable to handle kernel paging request at virtual address 00100104
 printing eip:
c0179d12
*pde = 3780b001
Oops: 0002 [#1]
SMP
Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core 
microcode binfmt_misc video thermal processor fan button battery ac 
ehci_hcd usbcore ide_cd cdrom sg ext3 jbd dm_mod mptspi mptscsih mptbase 
sd_mod scsi_mod
CPU:    7
EIP:    0060:[<c0179d12>]    Not tainted VLI
EFLAGS: 00010202   (2.6.16.18 #1)
EIP is at prune_dcache+0x231/0x327
eax: 00100100   ebx: f553d55c   ecx: f553d564   edx: 00200200
esi: f553d534   edi: f7e81e94   ebp: 00000002   esp: f7e81e84
ds: 007b   es: 007b   ss: 0068
Process init (pid: 1, threadinfo=f7e80000 task=c352ea90)
Stack: <0>c03f2c00 f7e81e94 f65fcac0 c017a03d f7e81e94 f7e81e94 f576a954 
f59c2a90
       f576a954 00000000 c017a0fa 0000000a f7ecf000 f576a954 c0196c8a 
f576a954
       f59c2a90 c01212b2 f576a954 c0102993 f59c2a90 00000000 00003b88 
00000000
Call Trace:
 [<c017a03d>] select_parent+0x17/0xbb
 [<c017a0fa>] shrink_dcache_parent+0x19/0x2c
 [<c0196c8a>] proc_pid_flush+0x14/0x26
 [<c01212b2>] release_task+0xa3/0x12e
 [<c0102993>] handle_signal+0x108/0x143
 [<c0122dac>] wait_task_zombie+0x2de/0x3c9
 [<c0102a5e>] do_signal+0x90/0x127
 [<c01235c9>] do_wait+0x34d/0x3de
 [<c011a913>] default_wake_function+0x0/0x12
 [<c01e9b5e>] copy_to_user+0x3c/0x50
 [<c011a913>] default_wake_function+0x0/0x12
 [<c0123722>] sys_wait4+0x3f/0x43
 [<c012374d>] sys_waitpid+0x27/0x2b
 [<c0102c2d>] syscall_call+0x7/0xb
Code: fe ff ff 8b 4e 50 e9 bd fe ff ff 8d 7c 24 10 89 7c 24 10 89 7c 24 
14 8b 46 04 a8 10 0f 84 a8 00 00 00 8d 4e 30 8b 46 30 8b 51 04 <89> 50 
04 89 02 c7 41 04 00 02 20 00 c7 46 30 00 01 10 00 83 2d
 <0>Kernel panic - not syncing: Attempted to kill init!

All kernels from 2.6.15 through 2.6.17, with any of the applicable
patch-sets (-git or -mm), are affected, as are the RedHat FC<n> kernels.

Environment:

The environment is any SMP hardware with the kernel built with or
without PREEMPT enabled.  Any P4 hyperthreaded chip, or Xeon
multi-processor system [DELL 1425 & 1850 dual-Xeon and also dual-core
dual-Xeon in my case] will exhibit the crash.

The attached forkalot.c program combined with the simple shell scripts
is enough to reproduce it.  Running the forkalot shell script while at
the same time running any of the proc-*.sh scripts in a
'while true; do ... ; done' loop crashes my systems within a couple of
minutes.


[-- Attachment #2: forkalot.c --]
[-- Type: text/x-csrc, Size: 885 bytes --]

// Program:  forkalot.c
//
// Compile:  cc forkalot.c -o forkalot
// Run:      ./forkalot [100] [1]
//
// Args:     arg1 = # of copies of program to run simultaneously [100]
//           arg2 = Sleep time before exiting [1]

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>

#define CHILDREN    100
#define SLEEP_FOR   1

int main(int argc, char *argv[])
{
	int	count, this_long;
	int	pid;

	if (argc > 1)
		count = atoi(argv[1]);
	else
		count = CHILDREN;
	if (argc > 2)
		this_long = atoi(argv[2]);
	else
		this_long = SLEEP_FOR;

	/* fork count-1 children */
	while (count-- > 1) {
		pid = fork();
		if (pid == 0) {
			/* child */
			break;
		} else if (pid < 0) {
			perror("fork");
			exit(1);
		}
	}

	/* Sleepy... sleepy... */
	sleep(this_long);

	/* All done... return success! */
	return 0;
}

[-- Attachment #3: forkalot-test.sh --]
[-- Type: application/x-shellscript, Size: 222 bytes --]

[-- Attachment #4: proc-cmdline.sh --]
[-- Type: application/x-shellscript, Size: 78 bytes --]

[-- Attachment #5: proc-status.sh --]
[-- Type: application/x-shellscript, Size: 77 bytes --]

[-- Attachment #6: proc-torture.sh --]
[-- Type: application/x-shellscript, Size: 178 bytes --]


* Re: PROBLEM: /proc (procfs) task exit race condition causes a kernel crash
  2006-05-26  0:43 PROBLEM: /proc (procfs) task exit race condition causes a kernel crash Tony Griffiths
@ 2006-05-28 15:37 ` Eric W. Biederman
  2006-05-29  0:28   ` Tony Griffiths
  2006-05-31  6:50   ` Tony Griffiths
  0 siblings, 2 replies; 4+ messages in thread
From: Eric W. Biederman @ 2006-05-28 15:37 UTC (permalink / raw)
  To: Tony Griffiths; +Cc: linux-kernel, Andrew Morton


I have tried to reproduce this.  The circumstances weren't the most
controlled but they did overlap with what you described and I haven't seen
anything.

So I am guessing that you are having memory corruption from some source.
Either bad ram or a bad module.

I'm off on vacation for a week, so I won't be able to follow up.


Eric


* Re: PROBLEM: /proc (procfs) task exit race condition causes a kernel crash
  2006-05-28 15:37 ` Eric W. Biederman
@ 2006-05-29  0:28   ` Tony Griffiths
  2006-05-31  6:50   ` Tony Griffiths
  1 sibling, 0 replies; 4+ messages in thread
From: Tony Griffiths @ 2006-05-29  0:28 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, Andrew Morton

Eric W. Biederman wrote:

>I have tried to reproduce this.  The circumstances weren't the most
>controlled but they did overlap with what you described and I haven't seen
>anything.
>  
>
What version of the kernel and patch-set did you test against?

Over the last month I've tried *EVERYTHING* I can lay my hands on and 
can still cause a crash VERY easily!

>So I am guessing that you are having memory corruption from some source.
>Either bad ram or a bad module.
>  
>
On my DELL 1850 dual-core dual-Xeon system I did have a flaky DIMM which
caused a few correctable ECC errors, but that has been replaced and the
crash still occurs.  My other test systems are (multiple) DELL 1425
dual-Xeon machines [2.8 or 3.0 GHz chips].
 

>I'm off on vacation for a week, so I won't be able to follow up.
>  
>
Have a good one...

>
>Eric
>  
>



* Re: PROBLEM: /proc (procfs) task exit race condition causes a kernel crash
  2006-05-28 15:37 ` Eric W. Biederman
  2006-05-29  0:28   ` Tony Griffiths
@ 2006-05-31  6:50   ` Tony Griffiths
  1 sibling, 0 replies; 4+ messages in thread
From: Tony Griffiths @ 2006-05-31  6:50 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1307 bytes --]

Eric,

I've attached a patch file which rolls up a set of patches to dcache.c
along with a few changes I made to locking semantics.  So far a 2.6.16
(+ -mm2) with the patch applied survives the harshest testing I can
throw at it!

Note that I've also made some minor changes to exit.c [preempt
disable/enable during task exit] and truncate.c [BUG_ON check of
'private' page].  I have tested a kernel built without preemption and
one built with voluntary preemption.  A kernel built with forced
preemption and with spinlock debugging enabled did NOT work very well!
Also, the performance of a kernel built with voluntary preemption and
with the cond_resched_lock() calls in dcache.c was surprisingly *BAD*,
with the system going into a wheel-spin for a long period [high system
cpu of 87%+] when I fired up a number of large cpu+memory hungry
tasks.  This might need to be looked at more closely.

Eric W. Biederman wrote:

>I have tried to reproduce this.  The circumstances weren't the most
>controlled but they did overlap with what you described and I haven't seen
>anything.
>
>So I am guessing that you are having memory corruption from some source.
>Either bad ram or a bad module.
>
>I'm off on vacation for a week, so I won't be able to follow up.
>
>
>Eric
>  
>


[-- Attachment #2: post-2.6.16-mm2-dcache.patch --]
[-- Type: text/x-patch, Size: 15151 bytes --]

diff -urpN linux-2.6.16-mm2/fs/dcache.c linux-2.6.16/fs/dcache.c
--- linux-2.6.16-mm2/fs/dcache.c	2006-05-31 16:30:13.000000000 +1000
+++ linux-2.6.16/fs/dcache.c	2006-05-31 16:25:53.000000000 +1000
@@ -36,12 +36,10 @@
 
 
 int sysctl_vfs_cache_pressure __read_mostly = 100;
-EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
 
  __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
 static seqlock_t rename_lock __cacheline_aligned_in_smp = SEQLOCK_UNLOCKED;
 
-EXPORT_SYMBOL(dcache_lock);
 
 static kmem_cache_t *dentry_cache __read_mostly;
 
@@ -142,21 +140,18 @@ static void dentry_iput(struct dentry * 
  * no dcache lock, please.
  */
 
-void dput(struct dentry *dentry)
+static void dput_locked(struct dentry *dentry, struct list_head *list)
 {
 	if (!dentry)
 		return;
 
-repeat:
-	if (atomic_read(&dentry->d_count) == 1)
-		might_sleep();
-	if (!atomic_dec_and_lock(&dentry->d_count, &dcache_lock))
+	if (!atomic_dec_and_test(&dentry->d_count))
 		return;
 
+repeat:
 	spin_lock(&dentry->d_lock);
 	if (atomic_read(&dentry->d_count)) {
 		spin_unlock(&dentry->d_lock);
-		spin_unlock(&dcache_lock);
 		return;
 	}
 
@@ -176,33 +171,59 @@ repeat:
   		dentry_stat.nr_unused++;
   	}
  	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
 	return;
 
 unhash_it:
 	__d_drop(dentry);
 
 kill_it: {
-		struct dentry *parent;
-
 		/* If dentry was on d_lru list
 		 * delete it from there
 		 */
   		if (!list_empty(&dentry->d_lru)) {
-  			list_del(&dentry->d_lru);
+  			list_del_init(&dentry->d_lru);
   			dentry_stat.nr_unused--;
   		}
   		list_del(&dentry->d_u.d_child);
 		dentry_stat.nr_dentry--;	/* For d_free, below */
-		/*drops the locks, at that point nobody can reach this dentry */
-		dentry_iput(dentry);
-		parent = dentry->d_parent;
-		d_free(dentry);
-		if (dentry == parent)
+		/* at this point nobody can reach this dentry */
+		list_add(&dentry->d_lru, list);
+		spin_unlock(&dentry->d_lock);
+		if (dentry == dentry->d_parent)
 			return;
-		dentry = parent;
-		goto repeat;
+		dentry = dentry->d_parent;
+		if (atomic_dec_and_test(&dentry->d_count))
+			goto repeat;
+		/* out */
+	}
+}
+
+void dput(struct dentry *dentry)
+{
+	LIST_HEAD(free_list);
+
+	if (!dentry)
+		goto do_return;
+
+	if (atomic_add_unless(&dentry->d_count, -1, 1))
+		goto do_return;
+
+	spin_lock(&dcache_lock);			/* While we hold the dcache_lock */
+	dput_locked(dentry, &free_list);		/* Put ALL free-able dentry's onto 'free_list' */
+
+	if (!list_empty(&free_list)) {			/* Then process as a single batch! */
+		struct dentry *dentry, *p;
+		list_for_each_entry_safe(dentry, p, &free_list, d_lru) {
+			spin_lock(&dentry->d_lock);	/* Lock dentry while also holding dcache_lock! */
+			list_del(&dentry->d_lru);
+			dentry_iput(dentry);		/* Enter with locks held; Exit with no locks! */
+			d_free(dentry);
+			spin_lock(&dcache_lock);	/* Assume we will iterate again so ... */
+		}
 	}
+	spin_unlock(&dcache_lock);			/* There *MUST* be a better way of doing this?! */
+do_return:
+	return;
 }
 
 /**
@@ -219,13 +240,15 @@ kill_it: {
  
 int d_invalidate(struct dentry * dentry)
 {
+	int	ret = 0;
+
 	/*
 	 * If it's already been dropped, return OK.
 	 */
 	spin_lock(&dcache_lock);
 	if (d_unhashed(dentry)) {
 		spin_unlock(&dcache_lock);
-		return 0;
+		goto do_return;
 	}
 	/*
 	 * Check whether to do a partial shrink_dcache
@@ -252,14 +275,16 @@ int d_invalidate(struct dentry * dentry)
 		if (dentry->d_inode && S_ISDIR(dentry->d_inode->i_mode)) {
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
-			return -EBUSY;
+			ret = -EBUSY;
+			goto do_return;
 		}
 	}
 
 	__d_drop(dentry);
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
-	return 0;
+do_return:
+	return ret;
 }
 
 /* This should be called _only_ with dcache_lock held */
@@ -276,7 +301,10 @@ static inline struct dentry * __dget_loc
 
 struct dentry * dget_locked(struct dentry *dentry)
 {
-	return __dget_locked(dentry);
+	struct dentry*	ret;
+
+	ret = __dget_locked(dentry);
+	return ret;
 }
 
 /**
@@ -366,22 +394,39 @@ restart:
  */
 static inline void prune_one_dentry(struct dentry * dentry)
 {
-	struct dentry * parent;
+	LIST_HEAD(free_list);
 
 	__d_drop(dentry);
 	list_del(&dentry->d_u.d_child);
 	dentry_stat.nr_dentry--;	/* For d_free, below */
-	dentry_iput(dentry);
-	parent = dentry->d_parent;
+
+	/* dput the parent here before we release dcache_lock */
+	if (dentry != dentry->d_parent)
+		dput_locked(dentry->d_parent, &free_list);
+
+	dentry_iput(dentry);		/* drop locks */
 	d_free(dentry);
-	if (parent != dentry)
-		dput(parent);
+
+	if (!list_empty(&free_list)) {
+		struct dentry *tmp, *p;
+
+		list_for_each_entry_safe(tmp, p, &free_list, d_lru) {
+			spin_lock(&dcache_lock);	/* All of this locking/unlocking  */
+			spin_lock(&tmp->d_lock);	/* is so incredibly UGLY!!! */
+			list_del(&tmp->d_lru);
+			dentry_iput(tmp);
+			d_free(tmp);
+		}
+	}
+
 	spin_lock(&dcache_lock);
 }
 
 /**
  * prune_dcache - shrink the dcache
  * @count: number of entries to try and free
+ * @sb: if given, ignore dentries for other superblocks
+ *         which are being unmounted.
  *
  * Shrink the dcache. This is done when we need
  * more memory, or simply when we need to unmount
@@ -392,16 +437,30 @@ static inline void prune_one_dentry(stru
  * all the dentries are in use.
  */
  
-static void prune_dcache(int count)
+static void prune_dcache(int count, struct super_block *sb)
 {
 	spin_lock(&dcache_lock);
 	for (; count ; count--) {
 		struct dentry *dentry;
 		struct list_head *tmp;
+		struct rw_semaphore *s_umount;
 
-		cond_resched_lock(&dcache_lock);
+		/*cond_resched_lock(&dcache_lock);	** ?BAD PERFORMANCE? **   */
 
 		tmp = dentry_unused.prev;
+		if (unlikely(sb)) {
+			/* Try to find a dentry for this sb, but don't try
+			 * too hard, if they aren't near the tail they will
+			 * be moved down again soon
+			 */
+			int skip = count;
+			while (skip &&
+			       tmp != &dentry_unused &&
+			       list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
+				skip--;
+				tmp = tmp->prev;
+			}
+		}
 		if (tmp == &dentry_unused)
 			break;
 		list_del_init(tmp);
@@ -427,7 +486,45 @@ static void prune_dcache(int count)
  			spin_unlock(&dentry->d_lock);
 			continue;
 		}
-		prune_one_dentry(dentry);
+		/*
+		 * If the dentry is not DCACHED_REFERENCED, it is time
+		 * to remove it from the dcache, provided the super block is
+		 * NULL (which means we are trying to reclaim memory)
+		 * or this dentry belongs to the same super block that
+		 * we want to shrink.
+		 */
+		/*
+		 * If this dentry is for "my" filesystem, then I can prune it
+		 * without taking the s_umount lock (I already hold it).
+		 */
+		if (sb && dentry->d_sb == sb) {
+			prune_one_dentry(dentry);
+			continue;
+		}
+		/*
+		 * ...otherwise we need to be sure this filesystem isn't being
+		 * unmounted, otherwise we could race with
+		 * generic_shutdown_super(), and end up holding a reference to
+		 * an inode while the filesystem is unmounted.
+		 * So we try to get s_umount, and make sure s_root isn't NULL.
+		 * (Take a local copy of s_umount to avoid a use-after-free of
+		 * `dentry').
+		 */
+		s_umount = &dentry->d_sb->s_umount;
+		if (down_read_trylock(s_umount)) {
+			if (dentry->d_sb->s_root != NULL) {
+				prune_one_dentry(dentry);
+				up_read(s_umount);
+				continue;
+			}
+			up_read(s_umount);
+		}
+		spin_unlock(&dentry->d_lock);
+		/* Cannot remove the first dentry, and it isn't appropriate
+		 * to move it to the head of the list, so give up, and try
+		 * later
+		 */
+		break;
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -481,14 +578,14 @@ repeat:
 		if (dentry->d_sb != sb)
 			continue;
 		dentry_stat.nr_unused--;
-		list_del_init(tmp);
 		spin_lock(&dentry->d_lock);
+		list_del_init(tmp);
 		if (atomic_read(&dentry->d_count)) {
 			spin_unlock(&dentry->d_lock);
 			continue;
 		}
 		prune_one_dentry(dentry);
-		cond_resched_lock(&dcache_lock);
+		/*cond_resched_lock(&dcache_lock);	** ?BAD PERFORMANCE? **   */
 		goto repeat;
 	}
 	spin_unlock(&dcache_lock);
@@ -512,6 +609,7 @@ int have_submounts(struct dentry *parent
 {
 	struct dentry *this_parent = parent;
 	struct list_head *next;
+	int	ret = 1;
 
 	spin_lock(&dcache_lock);
 	if (d_mountpoint(parent))
@@ -539,11 +637,10 @@ resume:
 		this_parent = this_parent->d_parent;
 		goto resume;
 	}
-	spin_unlock(&dcache_lock);
-	return 0; /* No mount points found in tree */
+	ret = 0; /* No mount points found in tree */
 positive:
 	spin_unlock(&dcache_lock);
-	return 1;
+	return ret;
 }
 
 /*
@@ -630,7 +727,7 @@ void shrink_dcache_parent(struct dentry 
 	int found;
 
 	while ((found = select_parent(parent)) != 0)
-		prune_dcache(found);
+		prune_dcache(found, parent->d_sb);
 }
 
 /**
@@ -643,9 +740,10 @@ void shrink_dcache_parent(struct dentry 
  * done under dcache_lock.
  *
  */
-void shrink_dcache_anon(struct hlist_head *head)
+void shrink_dcache_anon(struct super_block *sb)
 {
 	struct hlist_node *lp;
+	struct hlist_head *head = &sb->s_anon;
 	int found;
 	do {
 		found = 0;
@@ -668,7 +766,7 @@ void shrink_dcache_anon(struct hlist_hea
 			}
 		}
 		spin_unlock(&dcache_lock);
-		prune_dcache(found);
+		prune_dcache(found, sb);
 	} while(found);
 }
 
@@ -686,12 +784,16 @@ void shrink_dcache_anon(struct hlist_hea
  */
 static int shrink_dcache_memory(int nr, gfp_t gfp_mask)
 {
+	int	ret = -1;
+
 	if (nr) {
 		if (!(gfp_mask & __GFP_FS))
-			return -1;
-		prune_dcache(nr);
+			goto do_return;
+		prune_dcache(nr, NULL);
 	}
-	return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
+	ret = (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
+do_return:
+	return ret;
 }
 
 /**
@@ -711,13 +813,14 @@ struct dentry *d_alloc(struct dentry * p
 
 	dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL); 
 	if (!dentry)
-		return NULL;
+		goto do_return;
 
 	if (name->len > DNAME_INLINE_LEN-1) {
 		dname = kmalloc(name->len + 1, GFP_KERNEL);
 		if (!dname) {
 			kmem_cache_free(dentry_cache, dentry); 
-			return NULL;
+			dentry = NULL;
+			goto do_return;
 		}
 	} else  {
 		dname = dentry->d_iname;
@@ -758,18 +861,20 @@ struct dentry *d_alloc(struct dentry * p
 		list_add(&dentry->d_u.d_child, &parent->d_subdirs);
 	dentry_stat.nr_dentry++;
 	spin_unlock(&dcache_lock);
-
+do_return:
 	return dentry;
 }
 
 struct dentry *d_alloc_name(struct dentry *parent, const char *name)
 {
 	struct qstr q;
+	struct dentry * ret;
 
 	q.name = name;
 	q.len = strlen(name);
 	q.hash = full_name_hash(q.name, q.len);
-	return d_alloc(parent, &q);
+	ret = d_alloc(parent, &q);
+	return ret;
 }
 
 /**
@@ -817,7 +922,7 @@ void d_instantiate(struct dentry *entry,
  */
 struct dentry *d_instantiate_unique(struct dentry *entry, struct inode *inode)
 {
-	struct dentry *alias;
+	struct dentry *alias = NULL;
 	int len = entry->d_name.len;
 	const char *name = entry->d_name.name;
 	unsigned int hash = entry->d_name.hash;
@@ -841,7 +946,7 @@ struct dentry *d_instantiate_unique(stru
 		spin_unlock(&dcache_lock);
 		BUG_ON(!d_unhashed(alias));
 		iput(inode);
-		return alias;
+		goto do_return;
 	}
 	list_add(&entry->d_alias, &inode->i_dentry);
 do_negative:
@@ -849,9 +954,10 @@ do_negative:
 	fsnotify_d_instantiate(entry, inode);
 	spin_unlock(&dcache_lock);
 	security_d_instantiate(entry, inode);
-	return NULL;
+	alias = NULL;
+do_return:
+	return alias;
 }
-EXPORT_SYMBOL(d_instantiate_unique);
 
 /**
  * d_alloc_root - allocate root dentry
@@ -915,12 +1021,14 @@ struct dentry * d_alloc_anon(struct inod
 
 	if ((res = d_find_alias(inode))) {
 		iput(inode);
-		return res;
+		goto do_return;
 	}
 
 	tmp = d_alloc(NULL, &anonstring);
-	if (!tmp)
-		return NULL;
+	if (!tmp) {
+		res = NULL;
+		goto do_return;
+	}
 
 	tmp->d_parent = tmp; /* make sure dput doesn't croak */
 	
@@ -948,6 +1056,7 @@ struct dentry * d_alloc_anon(struct inod
 		iput(inode);
 	if (tmp)
 		dput(tmp);
+do_return:
 	return res;
 }
 
@@ -1142,6 +1251,7 @@ int d_validate(struct dentry *dentry, st
 {
 	struct hlist_head *base;
 	struct hlist_node *lhp;
+	int		   ret = 0;
 
 	/* Check whether the ptr might be valid at all.. */
 	if (!kmem_ptr_validate(dentry_cache, dentry))
@@ -1159,12 +1269,13 @@ int d_validate(struct dentry *dentry, st
 		if (dentry == hlist_entry(lhp, struct dentry, d_hash)) {
 			__dget_locked(dentry);
 			spin_unlock(&dcache_lock);
-			return 1;
+			ret = 1;
+			goto out;
 		}
 	}
 	spin_unlock(&dcache_lock);
 out:
-	return 0;
+	return ret;
 }
 
 /*
@@ -1191,6 +1302,7 @@ out:
 void d_delete(struct dentry * dentry)
 {
 	int isdir = 0;
+
 	/*
 	 * Are we the only user?
 	 */
@@ -1203,7 +1315,7 @@ void d_delete(struct dentry * dentry)
 
 		dentry_iput(dentry);
 		fsnotify_nameremove(dentry, isdir);
-		return;
+		goto do_return;
 	}
 
 	if (!d_unhashed(dentry))
@@ -1213,6 +1325,8 @@ void d_delete(struct dentry * dentry)
 	spin_unlock(&dcache_lock);
 
 	fsnotify_nameremove(dentry, isdir);
+do_return:
+	return;
 }
 
 static void __d_rehash(struct dentry * entry, struct hlist_head *list)
@@ -1497,13 +1611,13 @@ char * d_path(struct dentry *dentry, str
  */
 asmlinkage long sys_getcwd(char __user *buf, unsigned long size)
 {
-	int error;
+	int error = -ENOMEM;
 	struct vfsmount *pwdmnt, *rootmnt;
 	struct dentry *pwd, *root;
 	char *page = (char *) __get_free_page(GFP_USER);
 
 	if (!page)
-		return -ENOMEM;
+		goto do_return;
 
 	read_lock(&current->fs->lock);
 	pwdmnt = mntget(current->fs->pwdmnt);
@@ -1542,6 +1656,7 @@ out:
 	dput(root);
 	mntput(rootmnt);
 	free_page((unsigned long) page);
+do_return:
 	return error;
 }
 
@@ -1729,7 +1844,6 @@ kmem_cache_t *names_cachep __read_mostly
 /* SLAB cache for file structures */
 kmem_cache_t *filp_cachep __read_mostly;
 
-EXPORT_SYMBOL(d_genocide);
 
 extern void bdev_cache_init(void);
 extern void chrdev_init(void);
@@ -1764,6 +1878,10 @@ void __init vfs_caches_init(unsigned lon
 	chrdev_init();
 }
 
+EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
+EXPORT_SYMBOL(dcache_lock);
+EXPORT_SYMBOL(d_instantiate_unique);
+EXPORT_SYMBOL(d_genocide);
 EXPORT_SYMBOL(d_alloc);
 EXPORT_SYMBOL(d_alloc_anon);
 EXPORT_SYMBOL(d_alloc_root);
diff -urpN linux-2.6.16-mm2/kernel/exit.c linux-2.6.16/kernel/exit.c
--- linux-2.6.16-mm2/kernel/exit.c	2006-05-31 16:30:14.000000000 +1000
+++ linux-2.6.16/kernel/exit.c	2006-05-30 14:49:35.000000000 +1000
@@ -136,6 +136,7 @@ void release_task(struct task_struct * p
 {
 	int zap_leader;
 	task_t *leader;
+	preempt_disable();	// ** Cleanup as fast as we can! **
 repeat:
 	atomic_dec(&p->user->processes);
 	write_lock_irq(&tasklist_lock);
@@ -173,6 +174,7 @@ repeat:
 	p = leader;
 	if (unlikely(zap_leader))
 		goto repeat;
+	preempt_enable();	// ** OK to give other tasks some cycles now **
 }
 
 /*
diff -urpN linux-2.6.16-mm2/mm/truncate.c linux-2.6.16/mm/truncate.c
--- linux-2.6.16-mm2/mm/truncate.c	2006-05-31 16:30:14.000000000 +1000
+++ linux-2.6.16/mm/truncate.c	2006-05-28 17:22:46.000000000 +1000
@@ -80,7 +80,7 @@ invalidate_complete_page(struct address_
 		return 0;
 	}
 
-	BUG_ON(PagePrivate(page));
+	BUG_ON(PagePrivate(page) && (page_private(page) != 0));
 	__remove_from_page_cache(page);
 	write_unlock_irq(&mapping->tree_lock);
 	ClearPageUptodate(page);
