linux-kernel.vger.kernel.org archive mirror
* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
@ 2003-12-01  2:59 James W McMechan
  2003-12-01 11:37 ` Maneesh Soni
  0 siblings, 1 reply; 19+ messages in thread
From: James W McMechan @ 2003-12-01  2:59 UTC (permalink / raw)
  To: hugh; +Cc: linux-kernel

Hello, I have a test program which generates the Oops easily.
No maintainer was listed for tmpfs, and the best Google reference is
about 2 years old and does not seem to be about this issue.

This oopses both 2.4.22 and 2.6.0-test11.
It results from an ARCH=um bug report, and I kept making the
test program shorter; it is now down to one executable line.

It oopses with the list poison address on 2.6.0-test11.
Neither William Lee Irwin III nor I know what the
list_del(q);
list_add(q, &dentry->d_subdirs);
from fs/libfs.c:90 or 137 is intended to do, but he suggested you might
know.
I think that is where it is corrupting the list entries.

/* by James_McMechan at hotmail com */
/* test2 program to Oops shmfs mounted at /dev/shm */
/* yes it is dumb but unprivileged users should not be able */
/* to Oops the kernel regardless of how dumb the program */
#include <sys/types.h>
#include <dirent.h>
int main(void)
{/* off 0 is "." off 1 is ".." off 2 is empty */
        seekdir(opendir("/dev/shm"), (off_t) 2);
}

On Sun, 30 Nov 2003 20:51:01 -0800 William Lee Irwin III
<wli@holomorphy.com> writes:
> On Sun, Nov 30, 2003 at 06:06:41PM -0800, James W McMechan wrote:
> > Have you got a suggestion on who to bug, I have not found
> > maintainers on tmpfs or now the libfs section.
> 
> Hugh Dickins is highly clueful and generally maintains tmpfs. He's
> fixed bugs in fs/libfs.c before, too.
> 
> 
> -- wli

________________________________________________________________
The best thing to hit the internet in years - Juno SpeedBand!
Surf the web up to FIVE TIMES FASTER!
Only $14.95/ month - visit www.juno.com to sign up today!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
  2003-12-01  2:59 Oops with tmpfs on both 2.4.22 & 2.6.0-test11 James W McMechan
@ 2003-12-01 11:37 ` Maneesh Soni
  0 siblings, 0 replies; 19+ messages in thread
From: Maneesh Soni @ 2003-12-01 11:37 UTC (permalink / raw)
  To: James W McMechan
  Cc: hugh, linux-kernel, William Lee Irwin III, Al Viro, Andrew Morton

On Mon, Dec 01, 2003 at 05:50:13AM +0000, James W McMechan wrote:
> Hello, I have a test program  which will generate the Oops easily.
> No maintainer was listed for tmpfs and the best Google reference is
> about 2 years back, and it does not seem to be about this issue.
> 
> This Oops both 2.4.22 and 2.6.0-test11
> It results from a ARCH=um bugreport and I kept making the
> test program shorter, now down to one executable line.
> 
> It oops with the list poison address on 2.6.0-test11
> Neither myself nor William Lee Irwin III know what the
> list_del(q);
> list_add(q, &dentry->d_subdirs);
> from fs/libfs.c:90 or 137 is intended to do but he suggested you might
> know
> I think that is where it is corrupting the list entries.
> 
> /* by James_McMechan at hotmail com */
> /* test2 program to Oops shmfs mounted at /dev/shm */
> /* yes it is dumb but unprivileged users should not be able */
> /* to Oops the kernel regardless of how dumb the program */
> #include <sys/types.h>
> #include <dirent.h>
> int main(void)
> {/* off 0 is "." off 1 is ".." off 2 is empty */
>         seekdir(opendir("/dev/shm"), (off_t) 2);
> }
> 
> On Sun, 30 Nov 2003 20:51:01 -0800 William Lee Irwin III
> <wli@holomorphy.com> writes:
> > On Sun, Nov 30, 2003 at 06:06:41PM -0800, James W McMechan wrote:
> > > Have you got a suggestion on who to bug, I have not found
> > > maintainers on tmpfs or now the libfs section.
> > 
> > Hugh Dickins is highly clueful and generally maintains tmpfs. He's
> > fixed bugs in fs/libfs.c before, too.
> > 
> > 
> > -- wli

Hi,

I hope nobody minds me jumping in this thread. I have been looking at this
code for some time and hope I have got the facts correct.

The two list_xxx calls mentioned (fs/libfs.c, line 137) move the
cursor dentry to the beginning of the d_subdirs list. This is needed for
(file->f_pos == 2) because entries can be added to the d_subdirs list after
the open call and before the ->lseek or ->readdir call.

The cursor adjustment in dcache_dir_lseek() (fs/libfs.c, line 90) always
puts the cursor just before the last dentry looked at in the while loop.

But it is problematic when we have an empty directory and (file->f_pos == 2).
In this case the loop pointer p points to the cursor itself, and doing
list_del and list_add_tail on the same list node results in an oops.

The following patch treats (file->f_pos == 2) as a special case and adjusts
the cursor dentry by putting it right at the beginning of the d_subdirs
list.


Thanks
Maneesh

 fs/libfs.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff -puN fs/libfs.c~dcache_dir_lseek-fix fs/libfs.c
--- linux-2.6.0-test11/fs/libfs.c~dcache_dir_lseek-fix	2003-12-01 15:48:22.000000000 +0530
+++ linux-2.6.0-test11-maneesh/fs/libfs.c	2003-12-01 16:28:27.000000000 +0530
@@ -75,12 +75,13 @@ loff_t dcache_dir_lseek(struct file *fil
 		file->f_pos = offset;
 		if (file->f_pos >= 2) {
 			struct list_head *p;
+			struct dentry * dentry = file->f_dentry;
 			struct dentry *cursor = file->private_data;
 			loff_t n = file->f_pos - 2;
 
 			spin_lock(&dcache_lock);
-			p = file->f_dentry->d_subdirs.next;
-			while (n && p != &file->f_dentry->d_subdirs) {
+			p = dentry->d_subdirs.next;
+			while (n && p != &dentry->d_subdirs) {
 				struct dentry *next;
 				next = list_entry(p, struct dentry, d_child);
 				if (!d_unhashed(next) && next->d_inode)
@@ -88,7 +89,10 @@ loff_t dcache_dir_lseek(struct file *fil
 				p = p->next;
 			}
 			list_del(&cursor->d_child);
-			list_add_tail(&cursor->d_child, p);
+			if (file->f_pos == 2)
+				list_add(&cursor->d_child, &dentry->d_subdirs);
+			else
+				list_add_tail(&cursor->d_child, p);
 			spin_unlock(&dcache_lock);
 		}
 	}

_




-- 
Maneesh Soni
Linux Technology Center, 
IBM Software Lab, Bangalore, India
email: maneesh@in.ibm.com
Phone: 91-80-5044999 Fax: 91-80-5268553
T/L : 9243696

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
       [not found] <20031207.140732.-1654081.3.mcmechanjw@juno.com>
@ 2003-12-08  5:10 ` Maneesh Soni
  0 siblings, 0 replies; 19+ messages in thread
From: Maneesh Soni @ 2003-12-08  5:10 UTC (permalink / raw)
  To: James W McMechan; +Cc: hugh, linux-kernel, wli, viro, akpm

On Sun, Dec 07, 2003 at 02:07:28PM -0800, James W McMechan wrote:
> After tinkering with patches for the last week I finally have a version
> that does not look quite so bad, my first attempts at improvement were
> awful in their awkwardness.
> The problem was that the cursor was in the list being walked, and when
> the pointer pointed to the cursor the list_del/list_add_tail pair would
> Oops trying to find the entry pointed to by the prev pointer of the
> freshly deleted cursor element.
> 
> The solution I finally found was to move the list_del earlier, before the
> beginning of the list walk, since it is not used during the list walk and
> should not count in the list enumeration it can be deleted, then the list
> pointer cannot point to it so it can be added safely with the list_add_tail
> without Oopsing, and everything works as expected I am unable to Oops this
> version with any of my test programs.
> 
> And of course since this Oops both 2.4 & 2.6 I will need to prepare
> a second set for the 2.4 tree.
> 
> My question to you who expressed interest, is anything odd looking about
> this code, anything that I am doing wrong or could do better?
> 

Looks better than my patch. The aim of dcache_dir_lseek() is to put the
cursor dentry at the required position, and that's what it does now: it deletes
the cursor, finds the desired location and then puts it there.

Thanks
Maneesh


-- 
Maneesh Soni
Linux Technology Center, 
IBM Software Lab, Bangalore, India
email: maneesh@in.ibm.com
Phone: 91-80-5044999 Fax: 91-80-5268553
T/L : 9243696

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
@ 2003-12-07 22:48 James McMechan
  0 siblings, 0 replies; 19+ messages in thread
From: James McMechan @ 2003-12-07 22:48 UTC (permalink / raw)
  To: linux-kernel

After tinkering with patches for the last week I finally have a version
that does not look quite so bad; my first attempts at improvement were
awful in their awkwardness.
The problem was that the cursor was in the list being walked, and when
the pointer pointed to the cursor, the list_del/list_add_tail pair would
oops trying to find the entry pointed to by the prev pointer of the
deleted cursor element.

The solution I finally found was to move the list_del earlier, before
the beginning of the list walk. The cursor is not used during the walk
and should not count in the list enumeration, so it can be deleted first.
The list pointer then cannot point to it, so it can be added back safely
with list_add_tail without oopsing. Everything works as expected, and I
am unable to oops this version with any of my test programs.

And of course, since this oopses both 2.4 & 2.6, I will need to prepare
a second set for the 2.4 tree.

My question to those of you who expressed interest: is there anything odd
looking about this code, anything that I am doing wrong or could do better?

diff -Nur linux-2.6.0-test11/fs/libfs.c build-2.6.0-test11-bug/fs/libfs.c
--- linux-2.6.0-test11/fs/libfs.c 2003-11-26 12:42:48.000000000 -0800
+++ build-2.6.0-test11-bug/fs/libfs.c 2003-12-07 13:07:19.000000000 -0800
@@ -79,6 +79,7 @@
    loff_t n = file->f_pos - 2;

    spin_lock(&dcache_lock);
+   list_del(&cursor->d_child);
    p = file->f_dentry->d_subdirs.next;
    while (n && p != &file->f_dentry->d_subdirs) {
     struct dentry *next;
@@ -87,7 +88,6 @@
      n--;
     p = p->next;
    }
-   list_del(&cursor->d_child);
    list_add_tail(&cursor->d_child, p);
    spin_unlock(&dcache_lock);
   }

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
  2003-12-03 11:06 James W McMechan
@ 2003-12-03 12:02 ` Maneesh Soni
  0 siblings, 0 replies; 19+ messages in thread
From: Maneesh Soni @ 2003-12-03 12:02 UTC (permalink / raw)
  To: James W McMechan; +Cc: hugh, linux-kernel, wli, viro, akpm

On Wed, Dec 03, 2003 at 03:06:14AM -0800, James W McMechan wrote:
[..]

> > 
> > The cursor adjustment in dcache_dir_lseek() (fs/libfs.c: line 90) 
> > always puts the cursor just before the last looked dentry in the
> > while loop. 
> > 
> > But it is problematic when we have an empty directory and 
> > (file->f_pos == 2)
> > In this case we have the loop counter p pointing to the cursor and 
> > doing list_del and list_add_tail of the same list node results in oops.
> > 
> This is where I get mildly lost, from what you are saying here I
> would have expected a test on list_empty rather than on
> fpos==2 also this occurs in every file, will starting in a different
> pos in the list cause problems?

The cursor dentry is added to the d_subdirs list in the ->open call for
the directory. So even if the directory is empty from a user's point
of view, the d_subdirs list will at least have the cursor dentry.
In other words, when we come to the ->lseek or ->readdir call, we will not
have an empty d_subdirs list.

> 
> With further testing it also Oops even when the dir is not empty
> I did a "touch /dev/shm/1 /dev/shm/2 /dev/shm/3" to put some
> entries in the dir first and the original still oops at offset 2
> 
> I should do more testing, to see if I can find out what happens
> on non empty dirs, because I was thinking it was due to the
> dir being empty, which now appears not to be true.

Hmm, yes. The original code will always oops for offset 2, irrespective
of whether the directory is empty or not, because in the non-empty case
p also points to the cursor dentry for offset 2.
Thanks for letting me know one more fact.

> 
> > The following patch takes (file->f_pos == 2) as a special case and
> > adjusts the cursor dentry by putting it right at the beginning of the 
> > d_subdirs list.
> > 
> Also is the new variable dentry needed or just a optimization?
> It looks functionally equivalent, but perhaps it is needed for 
> something I am not seeing at the moment.
That's just to make the code readable. Without it, the lines would run beyond
80 columns, and the code would also have to dereference multiple levels of
pointers.


Thanks
Maneesh

-- 
Maneesh Soni
Linux Technology Center, 
IBM Software Lab, Bangalore, India
email: maneesh@in.ibm.com
Phone: 91-80-5044999 Fax: 91-80-5268553
T/L : 9243696

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
@ 2003-12-03 11:06 James W McMechan
  2003-12-03 12:02 ` Maneesh Soni
  0 siblings, 1 reply; 19+ messages in thread
From: James W McMechan @ 2003-12-03 11:06 UTC (permalink / raw)
  To: maneesh; +Cc: hugh, linux-kernel, wli, viro, akpm

> Hi,
> 
> I hope nobody minds me jumping in this thread. I have been looking 
> at this code for some time and hope I have got the facts correct.
> 
Thank you, so far it is no longer crashing :)

> The two list_xxx macros as mentioned (fs/libfs.c:line 137) adjusts 
> the cursor dentry to the beginning of the d_subdirs list needed for 
> (file->f_pos == 2) as there can be additions in the d_subdirs list 
> after the open call and before ->lseek or ->readdir call.
> 
> The cursor adjustment in dcache_dir_lseek() (fs/libfs.c: line 90) 
> always puts the cursor just before the last looked dentry in the
> while loop. 
> 
> But it is problematic when we have an empty directory and 
> (file->f_pos == 2)
> In this case we have the loop counter p pointing to the cursor and 
> doing list_del and list_add_tail of the same list node results in oops.
> 
This is where I get mildly lost. From what you are saying here I
would have expected a test on list_empty rather than on
f_pos == 2. Also, this occurs in every file; will starting at a different
position in the list cause problems?

With further testing, it also oopses even when the dir is not empty.
I did a "touch /dev/shm/1 /dev/shm/2 /dev/shm/3" to put some
entries in the dir first, and the original still oopses at offset 2.

I should do more testing to see if I can find out what happens
on non-empty dirs, because I was thinking it was due to the
dir being empty, which now appears not to be true.

> The following patch takes (file->f_pos == 2) as a special case and
> adjusts the cursor dentry by putting it right at the beginning of the 
> d_subdirs list.
> 
Also, is the new variable dentry needed or just an optimization?
It looks functionally equivalent, but perhaps it is needed for 
something I am not seeing at the moment.
> 
> Thanks
> Maneesh


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
  2003-12-01  7:58   ` Andries Brouwer
@ 2003-12-01  8:00     ` William Lee Irwin III
  0 siblings, 0 replies; 19+ messages in thread
From: William Lee Irwin III @ 2003-12-01  8:00 UTC (permalink / raw)
  To: Andries Brouwer; +Cc: James W McMechan, linux-kernel

On Sun, Nov 30, 2003 at 05:21:26PM -0800, William Lee Irwin III wrote:
>> This is significantly different in nature from the 2.4 oops, since
>> 2.4 hit NULL and this pointer is total garbage.
>> Either it's a double bitflip or even worse is afoot.

On Mon, Dec 01, 2003 at 08:58:24AM +0100, Andries Brouwer wrote:
> This oops is completely understood. I was going to write to you
> yesterday evening, but then saw that Oleg Drokin already had
> written. Didn't you see his mail?

I'm sorry if my mail came out after Oleg's reply; I at least started
writing it before his arrived on my system.


-- wli

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
  2003-12-01  1:21 ` William Lee Irwin III
@ 2003-12-01  7:58   ` Andries Brouwer
  2003-12-01  8:00     ` William Lee Irwin III
  0 siblings, 1 reply; 19+ messages in thread
From: Andries Brouwer @ 2003-12-01  7:58 UTC (permalink / raw)
  To: William Lee Irwin III, James W McMechan, linux-kernel

On Sun, Nov 30, 2003 at 05:21:26PM -0800, William Lee Irwin III wrote:
> On Sun, Nov 30, 2003 at 01:17:46PM -0800, James W McMechan wrote:
> > Unable to handle kernel paging request at virtual address 00200200
> > c018a152
> > *pde = 00000000
> > Oops: 0002 [#1]
> > CPU:    0
> > EIP:    0060:[<c018a152>]    Not tainted
> 
> This is significantly different in nature from the 2.4 oops, since
> 2.4 hit NULL and this pointer is total garbage.
> 
> Either it's a double bitflip or even worse is afoot.

This oops is completely understood. I was going to write to you
yesterday evening, but then saw that Oleg Drokin already had
written. Didn't you see his mail?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
  2003-12-01  2:06 James W McMechan
@ 2003-12-01  4:51 ` William Lee Irwin III
  0 siblings, 0 replies; 19+ messages in thread
From: William Lee Irwin III @ 2003-12-01  4:51 UTC (permalink / raw)
  To: James W McMechan; +Cc: linux-kernel

On Sun, Nov 30, 2003 at 06:06:41PM -0800, James W McMechan wrote:
> Have you got a suggestion on who to bug, I have not found
> maintainers on tmpfs or now the libfs section.

Hugh Dickins is highly clueful and generally maintains tmpfs. He's
fixed bugs in fs/libfs.c before, too.


-- wli

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
  2003-12-01  1:06 James W McMechan
@ 2003-12-01  3:43 ` William Lee Irwin III
  0 siblings, 0 replies; 19+ messages in thread
From: William Lee Irwin III @ 2003-12-01  3:43 UTC (permalink / raw)
  To: James W McMechan; +Cc: linux-kernel

At some point in the past, I wrote:
>> Either it's a double bitflip or even worse is afoot.

On Sun, Nov 30, 2003 at 05:06:04PM -0800, James W McMechan wrote:
> Umm from include/linux/list.h
> #define LIST_POISON1  ((void *) 0x00100100)
> #define LIST_POISON2  ((void *) 0x00200200)
> though perhaps we need a better poison
> 0xdead0001 for example but that might be valid
> Were you thinking of a hardware fault?
> The test program oops both a Athlon and a
> PentiumMMX and I followed this in from a user
> bugreport over on uml-devel

No, it looks like the list poison fooled me.


On Sun, Nov 30, 2003 at 05:06:04PM -0800, James W McMechan wrote:
> I single stepped through on a UML machine and it looked
> like the prev pointer in the list is getting corrupted, I was
> suspecting that fs/libfs.c:dcache_readdir:137
> list_del(q);
> list_add(q, &dentry->d_subdirs);
> when q is a empty list entry this occurs when fpos is 2
> and has no comment :(
> there is a similar chunk at dcache_dir_lseek:90 with a
> list_del(&cursor->d_child);
> list_add_tail(&cursor->d_child, p);

I'm really not sure what the ->d_subdirs rearrangement is supposed
to accomplish.


On Sun, Nov 30, 2003 at 05:06:04PM -0800, James W McMechan wrote:
> I suspect that deleting from a empty? list and adding
> back the deleted entry will mangle things...
> The problem came from looping over roughly
> dirfile = opendir(dirname)
> seekdir(dirfile,pos)
> ent = readdir(dirfile)
> pos=telldir(dirfile)
> closedir(dirfile)
> it started with pos == 0
> seekdir is fine
> readdir returns "."
> telldir returned 1 -> pos
> seekdir is fine
> readdir then got ".." and
> telldir returned 2 -> pos
> seekdir then blew up on the empty entry
> Have you tried the test program?

No, I've gotten as far as I can with your oopsen. Either someone else
will have to pick it up from here or I'll have to spend more time
looking at fs/libfs.c


-- wli

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
@ 2003-12-01  2:06 James W McMechan
  2003-12-01  4:51 ` William Lee Irwin III
  0 siblings, 1 reply; 19+ messages in thread
From: James W McMechan @ 2003-12-01  2:06 UTC (permalink / raw)
  To: wli; +Cc: linux-kernel

> No, it looks like the list poison fooled me.

It took staring at the list_del for me to notice also

> 
> I'm really not sure what the ->d_subdirs rearrangement is supposed
> to accomplish.

Neither am I, it needs a few comments

> No, I've gotten as far as I can with your oopsen. Either someone else
> will have to pick it up from here or I'll have to spend more time
> looking at fs/libfs.c
> 
> 
> -- wli

Have you got a suggestion on who to bug? I have not found
maintainers for tmpfs or, now, for the libfs section.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
  2003-11-30 21:17 James W McMechan
@ 2003-12-01  1:21 ` William Lee Irwin III
  2003-12-01  7:58   ` Andries Brouwer
  0 siblings, 1 reply; 19+ messages in thread
From: William Lee Irwin III @ 2003-12-01  1:21 UTC (permalink / raw)
  To: James W McMechan; +Cc: linux-kernel

On Sun, Nov 30, 2003 at 01:17:46PM -0800, James W McMechan wrote:
> Unable to handle kernel paging request at virtual address 00200200
> c018a152
> *pde = 00000000
> Oops: 0002 [#1]
> CPU:    0
> EIP:    0060:[<c018a152>]    Not tainted

This is significantly different in nature from the 2.4 oops, since
2.4 hit NULL and this pointer is total garbage.

Either it's a double bitflip or even worse is afoot.


-- wli

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
@ 2003-12-01  1:06 James W McMechan
  2003-12-01  3:43 ` William Lee Irwin III
  0 siblings, 1 reply; 19+ messages in thread
From: James W McMechan @ 2003-12-01  1:06 UTC (permalink / raw)
  To: wli; +Cc: linux-kernel

> This is significantly different in nature from the 2.4 oops, since
> 2.4 hit NULL and this pointer is total garbage.
> 
> Either it's a double bitflip or even worse is afoot.

Umm from include/linux/list.h
#define LIST_POISON1  ((void *) 0x00100100)
#define LIST_POISON2  ((void *) 0x00200200)
though perhaps we need a better poison
0xdead0001 for example but that might be valid

Were you thinking of a hardware fault?
The test program oopses both an Athlon and a
Pentium MMX, and I followed this in from a user
bug report over on uml-devel.

I single-stepped through on a UML machine and it looked
like the prev pointer in the list is getting corrupted. I was
suspecting fs/libfs.c:dcache_readdir:137
list_del(q);
list_add(q, &dentry->d_subdirs);
when q is an empty list entry. This occurs when f_pos is 2
and has no comment :(
There is a similar chunk at dcache_dir_lseek:90 with a
list_del(&cursor->d_child);
list_add_tail(&cursor->d_child, p);

I suspect that deleting from a empty? list and adding
back the deleted entry will mangle things...
The problem came from looping over roughly

dirfile = opendir(dirname)
seekdir(dirfile,pos)
ent = readdir(dirfile)
pos=telldir(dirfile)
closedir(dirfile)

it started with pos == 0
seekdir is fine
readdir returns "."
telldir returned 1 -> pos
seekdir is fine
readdir then got ".." and
telldir returned 2 -> pos
seekdir then blew up on the empty entry

Have you tried the test program?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
  2003-11-30 20:06 ` William Lee Irwin III
@ 2003-11-30 21:21   ` Oleg Drokin
  0 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2003-11-30 21:21 UTC (permalink / raw)
  To: linux-kernel

Hello!

William Lee Irwin III <wli@holomorphy.com> wrote:

WLII> Could you try 2.6 with the following patch and send in the resulting
WLII> oops/BUG? Please turn on kallsyms for the run.
WLII>                         while (n && p != &file->f_dentry->d_subdirs) {
WLII>                                 struct dentry *next;
WLII>                                 next = list_entry(p, struct dentry, d_child);
WLII> +                               BUG_ON(!next);
WLII>                                 if (!d_unhashed(next) && next->d_inode)
WLII>                                         n--;
WLII>                                 p = p->next;
WLII>                         }

This loop is never run since n is 0

WLII> +                       BUG_ON(!cursor);
WLII>                         list_del(&cursor->d_child);
WLII>                         list_add_tail(&cursor->d_child, p);

The problem seems to be because &cursor->d_child is equal to p,
so on list_del we zero p->prev, and then assign to p->prev->next in
list_add_tail.
&cursor->d_child is equal to p probably because we just created it this
way in dcache_dir_open()

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
@ 2003-11-30 21:17 James W McMechan
  2003-12-01  1:21 ` William Lee Irwin III
  0 siblings, 1 reply; 19+ messages in thread
From: James W McMechan @ 2003-11-30 21:17 UTC (permalink / raw)
  To: wli; +Cc: linux-kernel

> Could you try 2.6 with the following patch and send in the resulting
> oops/BUG? Please turn on kallsyms for the run.
> 
> 
> Thanks.
> 
> 
> -- wli
OK, it took a while to recompile. Did you try the test program?
If you have tmpfs mounted at /dev/shm as recommended, it crashes
for me on both kernels, and it might be easier for you if you can
reproduce it on your machine. I can also send the longer version
of the test program if you are using POSIX shm. I was having
trouble with all the inlines hiding where it is going wrong.

Oops from 2.6.0-test11 plus the wli test patch.
ksymoops can't find /proc/ksyms; I had kallsyms on,
but ksymoops did not like it as -k.

ksymoops 2.4.9 on i586 2.6.0-test11.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.6.0-test11/ (default)
     -m /boot/System.map-2.6.0-test11 (specified)

Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel paging request at virtual address 00200200
c018a152
*pde = 00000000
Oops: 0002 [#1]
CPU:    0
EIP:    0060:[<c018a152>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010292
eax: 00200200   ebx: c2d19f64   ecx: c2d09f6c   edx: c2d19f64
esi: c2d19f38   edi: 00000000   ebp: c3513f7c   esp: c3513f64
ds: 007b   es: 007b   ss: 0068
Stack: c2d19f38 00000002 00000000 00000000 00000000 c2de9f60 c3513fbc c0162ba9
       c2de9f60 00000002 00000000 00000000 00000002 c3513fbc c017879f 00000003
       c0189f90 ffffffea 00000000 00000003 0804a050 00000002 c3512000 c0109c17
Call Trace:
 [<c0162ba9>] sys_lseek+0x59/0xb0
 [<c017879f>] sys_fcntl64+0x5f/0x80
 [<c0189f90>] dcache_dir_lseek+0x0/0x2f0
 [<c0109c17>] syscall_call+0x7/0xb
Code: 89 10 81 3d 70 f4 2c c0 3c 4b 24 1d 74 19 68 70 f4 2c c0 6a 


>>EIP; c018a152 <dcache_dir_lseek+1c2/2f0>   <=====

>>eax; 00200200 <__crc___user_walk+3d8ad/9422e>
>>ebx; c2d19f64 <__crc_device_unregister_wait+2144d0/4c44fe>
>>ecx; c2d09f6c <__crc_device_unregister_wait+2044d8/4c44fe>
>>edx; c2d19f64 <__crc_device_unregister_wait+2144d0/4c44fe>
>>esi; c2d19f38 <__crc_device_unregister_wait+2144a4/4c44fe>
>>ebp; c3513f7c <__crc_proc_root_driver+314d45/7e3f13>
>>esp; c3513f64 <__crc_proc_root_driver+314d2d/7e3f13>

Trace; c0162ba9 <sys_lseek+59/b0>
Trace; c017879f <sys_fcntl64+5f/80>
Trace; c0189f90 <dcache_dir_lseek+0/2f0>
Trace; c0109c17 <syscall_call+7/b>
                                                                         
      
Code;  c018a152 <dcache_dir_lseek+1c2/2f0>
00000000 <_EIP>:
Code;  c018a152 <dcache_dir_lseek+1c2/2f0>   <=====
   0:   89 10                     mov    %edx,(%eax)   <=====
Code;  c018a154 <dcache_dir_lseek+1c4/2f0>
   2:   81 3d 70 f4 2c c0 3c      cmpl   $0x1d244b3c,0xc02cf470
Code;  c018a15b <dcache_dir_lseek+1cb/2f0>
   9:   4b 24 1d 
Code;  c018a15e <dcache_dir_lseek+1ce/2f0>
   c:   74 19                     je     27 <_EIP+0x27>
Code;  c018a160 <dcache_dir_lseek+1d0/2f0>
   e:   68 70 f4 2c c0            push   $0xc02cf470
Code;  c018a165 <dcache_dir_lseek+1d5/2f0>
  13:   6a 00                     push   $0x0

Unable to handle kernel paging request at virtual address 00200200
c017d371
*pde = 00000000
Oops: 0002 [#2]
CPU:    0
EIP:    0060:[<c017d371>]    Not tainted
EFLAGS: 00010246
eax: c2d19f64   ebx: c2d19f38   ecx: c2d19f64   edx: 00200200
esi: c10bc194   edi: c2cf8e3c   ebp: c3513de0   esp: c3513dd8
ds: 007b   es: 007b   ss: 0068
Stack: c2de9f60 c10bc194 c3513dec c0189f7f c2d19f38 c3513e0c c0163ef4 c2cf8e3c
       c2de9f60 c2d09f38 c2de9f60 00000000 c351ce44 c3513e30 c01625b4 c2de9f60
       c351ce44 c2de9f60 c351ce44 00040001 00000003 c351ce44 c3513e50 c0120787
Call Trace:
 [<c0189f7f>] dcache_dir_close+0xf/0x20
 [<c0163ef4>] __fput+0xe4/0x100
 [<c01625b4>] filp_close+0x44/0x70
 [<c0120787>] put_files_struct+0x67/0xd0
 [<c01219c5>] do_exit+0x335/0x6e0
 [<c010a599>] die+0x1a9/0x1b0
 [<c0116b36>] do_page_fault+0x2a6/0x576
 [<c017f2c9>] d_alloc+0x19/0x330
 [<c017f2c9>] d_alloc+0x19/0x330
 [<c0147853>] kmem_cache_alloc+0x133/0x1c0
 [<c016f9dd>] cp_new_stat64+0x10d/0x130
 [<c0116890>] do_page_fault+0x0/0x576
 [<c0109e7d>] error_code+0x2d/0x40
 [<c018a152>] dcache_dir_lseek+0x1c2/0x2f0
 [<c0162ba9>] sys_lseek+0x59/0xb0
 [<c017879f>] sys_fcntl64+0x5f/0x80
 [<c0189f90>] dcache_dir_lseek+0x0/0x2f0
 [<c0109c17>] syscall_call+0x7/0xb
Code: 89 02 c7 41 04 00 02 20 00 c7 43 2c 00 01 10 00 a1 ac f4 2c 


>>EIP; c017d371 <dput+d1/550>   <=====

>>eax; c2d19f64 <__crc_device_unregister_wait+2144d0/4c44fe>
>>ebx; c2d19f38 <__crc_device_unregister_wait+2144a4/4c44fe>
>>ecx; c2d19f64 <__crc_device_unregister_wait+2144d0/4c44fe>
>>edx; 00200200 <__crc___user_walk+3d8ad/9422e>
>>esi; c10bc194 <__crc_idle_cpu+2674d7/3833d6>
>>edi; c2cf8e3c <__crc_device_unregister_wait+1f33a8/4c44fe>
>>ebp; c3513de0 <__crc_proc_root_driver+314ba9/7e3f13>
>>esp; c3513dd8 <__crc_proc_root_driver+314ba1/7e3f13>

Trace; c0189f7f <dcache_dir_close+f/20>
Trace; c0163ef4 <__fput+e4/100>
Trace; c01625b4 <filp_close+44/70>
Trace; c0120787 <put_files_struct+67/d0>
Trace; c01219c5 <do_exit+335/6e0>
Trace; c010a599 <die+1a9/1b0>
Trace; c0116b36 <do_page_fault+2a6/576>
Trace; c017f2c9 <d_alloc+19/330>
Trace; c017f2c9 <d_alloc+19/330>
Trace; c0147853 <kmem_cache_alloc+133/1c0>
Trace; c016f9dd <cp_new_stat64+10d/130>
Trace; c0116890 <do_page_fault+0/576>
Trace; c0109e7d <error_code+2d/40>
Trace; c018a152 <dcache_dir_lseek+1c2/2f0>
Trace; c0162ba9 <sys_lseek+59/b0>
Trace; c017879f <sys_fcntl64+5f/80>
Trace; c0189f90 <dcache_dir_lseek+0/2f0>
Trace; c0109c17 <syscall_call+7/b>

Code;  c017d371 <dput+d1/550>
00000000 <_EIP>:
Code;  c017d371 <dput+d1/550>   <=====
   0:   89 02                     mov    %eax,(%edx)   <=====
Code;  c017d373 <dput+d3/550>
   2:   c7 41 04 00 02 20 00      movl   $0x200200,0x4(%ecx)
Code;  c017d37a <dput+da/550>
   9:   c7 43 2c 00 01 10 00      movl   $0x100100,0x2c(%ebx)
Code;  c017d381 <dput+e1/550>
  10:   a1 ac f4 2c 00            mov    0x2cf4ac,%eax

1 error issued.  Results may not be reliable.



* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
  2003-11-30 17:34 James W McMechan
@ 2003-11-30 20:06 ` William Lee Irwin III
  2003-11-30 21:21   ` Oleg Drokin
  0 siblings, 1 reply; 19+ messages in thread
From: William Lee Irwin III @ 2003-11-30 20:06 UTC (permalink / raw)
  To: James W McMechan; +Cc: linux-kernel

At some point in the past, I wrote:
>> Please post the oops (run through ksymoops as-needed).

On Sun, Nov 30, 2003 at 09:34:44AM -0800, James W McMechan wrote:
> I still think the one line test program was easier...
> I hope this helps

Could you try 2.6 with the following patch and send in the resulting
oops/BUG? Please turn on kallsyms for the run.


Thanks.


-- wli


--- fs/libfs.c.orig	2003-11-30 12:02:09.000000000 -0800
+++ fs/libfs.c	2003-11-30 12:04:36.000000000 -0800
@@ -60,6 +60,9 @@
 
 loff_t dcache_dir_lseek(struct file *file, loff_t offset, int origin)
 {
+	BUG_ON(!file);
+	BUG_ON(!file->f_dentry);
+	BUG_ON(!file->f_dentry->d_inode);
 	down(&file->f_dentry->d_inode->i_sem);
 	switch (origin) {
 		case 1:
@@ -83,10 +86,12 @@
 			while (n && p != &file->f_dentry->d_subdirs) {
 				struct dentry *next;
 				next = list_entry(p, struct dentry, d_child);
+				BUG_ON(!next);
 				if (!d_unhashed(next) && next->d_inode)
 					n--;
 				p = p->next;
 			}
+			BUG_ON(!cursor);
 			list_del(&cursor->d_child);
 			list_add_tail(&cursor->d_child, p);
 			spin_unlock(&dcache_lock);


* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
  2003-11-30 16:57 James W McMechan
@ 2003-11-30 19:27 ` William Lee Irwin III
  0 siblings, 0 replies; 19+ messages in thread
From: William Lee Irwin III @ 2003-11-30 19:27 UTC (permalink / raw)
  To: James W McMechan; +Cc: linux-kernel

On Sun, Nov 30, 2003 at 08:57:26AM -0800, James W McMechan wrote:
> I have a test program that is much shorter than the Oops.
> If someone wants to work from an Oops I will send
> one. No maintainer was listed, and the last Google
> reference is about 2 years back and does not
> seem to be about this issue.
> This oopses both 2.4.22 and 2.6.0-test11.
> It came from an ARCH=um bug report, and
> I kept making the test program shorter.
> This seems silly, but one line to Oops?

Please post the oops (run through ksymoops as-needed).


-- wli


* Re: Oops with tmpfs on both 2.4.22 & 2.6.0-test11
@ 2003-11-30 17:34 James W McMechan
  2003-11-30 20:06 ` William Lee Irwin III
  0 siblings, 1 reply; 19+ messages in thread
From: James W McMechan @ 2003-11-30 17:34 UTC (permalink / raw)
  To: wli; +Cc: linux-kernel

>Please post the oops (run through ksymoops as-needed).

I still think the one line test program was easier...

I hope this helps

# ksymoops -m /boot/System.map-2.4.22 Oops.file
ksymoops 2.4.9 on i586 2.4.22.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.22/ (default)
     -m /boot/System.map-2.4.22 (specified)

Unable to handle kernel NULL pointer dereference at virtual address 00000000
c0141937
*pde = 00000000
Oops: 0002
CPU:    0
EIP:    0010:[<c0141937>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: 00000000   ebx: c1b14410   ecx: c1b143f0   edx: c1b14410
esi: 00000002   edi: 00000000   ebp: c1f81f80   esp: c1f81f60
ds: 0018   es: 0018   ss: 0018
Process a.out (pid: 1331, stackpage=c1f81000)
Stack: c1fe4f24 c38d25c0 c1fe46a0 c1fe46ac c1b143f0 00000000 00000000 c2047654
       c1f81fbc c0133d06 c2047654 00000002 00000000 00000000 c0123c5c 0804a000
       00001000 c1f80000 c0141820 ffffffea c1f80000 080495a8 00000002 bffffaa8
Call Trace:    [<c0133d06>] [<c0123c5c>] [<c0141820>] [<c01071e3>]
Code: 89 10 8b 5d 08 8b 43 08 8b 40 08 8d 48 6c ff 40 6c 0f 8e f7 


>>EIP; c0141937 <dcache_dir_lseek+117/150>   <=====

>>ebx; c1b14410 <_end+186dcb8/455b908>
>>ecx; c1b143f0 <_end+186dc98/455b908>
>>edx; c1b14410 <_end+186dcb8/455b908>
>>ebp; c1f81f80 <_end+1cdb828/455b908>
>>esp; c1f81f60 <_end+1cdb808/455b908>

Trace; c0133d06 <sys_lseek+56/a0>
Trace; c0123c5c <sys_brk+ec/120>
Trace; c0141820 <dcache_dir_lseek+0/150>
Trace; c01071e3 <system_call+33/40>

Code;  c0141937 <dcache_dir_lseek+117/150>
00000000 <_EIP>:
Code;  c0141937 <dcache_dir_lseek+117/150>   <=====
   0:   89 10                     mov    %edx,(%eax)   <=====
Code;  c0141939 <dcache_dir_lseek+119/150>
   2:   8b 5d 08                  mov    0x8(%ebp),%ebx
Code;  c014193c <dcache_dir_lseek+11c/150>
   5:   8b 43 08                  mov    0x8(%ebx),%eax
Code;  c014193f <dcache_dir_lseek+11f/150>
   8:   8b 40 08                  mov    0x8(%eax),%eax
Code;  c0141942 <dcache_dir_lseek+122/150>
   b:   8d 48 6c                  lea    0x6c(%eax),%ecx
Code;  c0141945 <dcache_dir_lseek+125/150>
   e:   ff 40 6c                  incl   0x6c(%eax)
Code;  c0141948 <dcache_dir_lseek+128/150>
  11:   0f 8e f7 00 00 00         jle    10e <_EIP+0x10e>



* Oops with tmpfs on both 2.4.22 & 2.6.0-test11
@ 2003-11-30 16:57 James W McMechan
  2003-11-30 19:27 ` William Lee Irwin III
  0 siblings, 1 reply; 19+ messages in thread
From: James W McMechan @ 2003-11-30 16:57 UTC (permalink / raw)
  To: linux-kernel

I have a test program that is much shorter than the Oops.
If someone wants to work from an Oops I will send
one. No maintainer was listed, and the last Google
reference is about 2 years back and does not
seem to be about this issue.

This oopses both 2.4.22 and 2.6.0-test11.
It came from an ARCH=um bug report, and
I kept making the test program shorter.
This seems silly, but one line to Oops?

/* by James_McMechan at hotmail com */
/* test2 program to Oops shmfs mounted at /dev/shm */
/* yes it is dumb but unprivileged users should not be able */
/* to Oops the kernel regardless of how dumb the program */
#include <sys/types.h>
#include <dirent.h>
int main(void)
{/* off 0 is "." off 1 is ".." off 2 is empty */
        seekdir(opendir("/dev/shm"), (off_t) 2);
}



end of thread, other threads:[~2003-12-08  5:12 UTC | newest]

Thread overview: 19+ messages
2003-12-01  2:59 Oops with tmpfs on both 2.4.22 & 2.6.0-test11 James W McMechan
2003-12-01 11:37 ` Maneesh Soni
     [not found] <20031207.140732.-1654081.3.mcmechanjw@juno.com>
2003-12-08  5:10 ` Maneesh Soni
  -- strict thread matches above, loose matches on Subject: below --
2003-12-07 22:48 James McMechan
2003-12-03 11:06 James W McMechan
2003-12-03 12:02 ` Maneesh Soni
2003-12-01  2:06 James W McMechan
2003-12-01  4:51 ` William Lee Irwin III
2003-12-01  1:06 James W McMechan
2003-12-01  3:43 ` William Lee Irwin III
2003-11-30 21:17 James W McMechan
2003-12-01  1:21 ` William Lee Irwin III
2003-12-01  7:58   ` Andries Brouwer
2003-12-01  8:00     ` William Lee Irwin III
2003-11-30 17:34 James W McMechan
2003-11-30 20:06 ` William Lee Irwin III
2003-11-30 21:21   ` Oleg Drokin
2003-11-30 16:57 James W McMechan
2003-11-30 19:27 ` William Lee Irwin III
