All of lore.kernel.org
 help / color / mirror / Atom feed
* Patch to fixe Data Acess error in dup_fd
@ 2006-11-10  9:32 Sharyathi Nagesh
  2006-11-14 15:16 ` Sergey Vlasov
  0 siblings, 1 reply; 11+ messages in thread
From: Sharyathi Nagesh @ 2006-11-10  9:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Pavel Emelianov, Linus Torvalds, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1266 bytes --]

On running the Stress Test on machine for more than 72 hours following
error message was observed.

0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0]
    pc: c000000000060d90: .dup_fd+0x240/0x39c
    lr: c000000000060d6c: .dup_fd+0x21c/0x39c
    sp: c00000007ce2fa70
   msr: 800000000000b032
   dar: ffffffff00000028
 dsisr: 40000000
  current = 0xc000000074950980
  paca    = 0xc000000000454500
    pid   = 27330, comm = bash

0:mon> t
[c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable)
[c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88
[c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520
[c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4
[c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74
[c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc
--- Exception: c00 (System Call) at 000000000fee9c60
SP (fcb2e770) is in userspace

---------------------------
The problem is because of race window. When if(expand) block is executed in dup_fd 
unlocking of oldf->file_lock give a window for fdtable in oldf to be
modified. So actual open_files in oldf may not match with open_files
variable.
This is the debug patch to fix the problem
  Please let me know of your opinion. It is generated on:2.6.19-rc1

[-- Attachment #2: dup_fd.patch --]
[-- Type: text/x-patch, Size: 322 bytes --]

--- kernel/fork.c.orig	2006-11-10 14:42:02.000000000 +0530
+++ kernel/fork.c	2006-11-10 14:42:30.000000000 +0530
@@ -687,6 +687,7 @@ static struct files_struct *dup_fd(struc
 		 * the latest pointer.
 		 */
 		spin_lock(&oldf->file_lock);
+		open_files = count_open_files(old_fdt);
 		old_fdt = files_fdtable(oldf);
 	}
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch to fixe Data Acess error in dup_fd
  2006-11-10  9:32 Patch to fixe Data Acess error in dup_fd Sharyathi Nagesh
@ 2006-11-14 15:16 ` Sergey Vlasov
  2006-11-14 18:49   ` Vadim Lobanov
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Vlasov @ 2006-11-14 15:16 UTC (permalink / raw)
  To: sharyath
  Cc: Pavel Emelianov, Vadim Lobanov, Linus Torvalds, Andrew Morton,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3362 bytes --]

On Fri, 10 Nov 2006 15:02:01 +0530 Sharyathi Nagesh wrote:

> On running the Stress Test on machine for more than 72 hours following
> error message was observed.

Was your stress test using threads (or other nonstandard clone() calls)?

> 0:mon> e
> cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0]
>     pc: c000000000060d90: .dup_fd+0x240/0x39c
>     lr: c000000000060d6c: .dup_fd+0x21c/0x39c
>     sp: c00000007ce2fa70
>    msr: 800000000000b032
>    dar: ffffffff00000028
>  dsisr: 40000000
>   current = 0xc000000074950980
>   paca    = 0xc000000000454500
>     pid   = 27330, comm = bash
>
> 0:mon> t
> [c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable)
> [c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88
> [c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520
> [c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4
> [c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74
> [c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc
> --- Exception: c00 (System Call) at 000000000fee9c60
> SP (fcb2e770) is in userspace

Did you find the exact instruction which faulted, and the place in
dup_fd() C code which it corresponds to?  Was the oops due to a NULL
dereference or an invalid pointer?

> The problem is because of race window. When if(expand) block is executed in dup_fd
> unlocking of oldf->file_lock give a window for fdtable in oldf to be
> modified. So actual open_files in oldf may not match with open_files
> variable.
> This is the debug patch to fix the problem

> --- kernel/fork.c.orig	2006-11-10 14:42:02.000000000 +0530
> +++ kernel/fork.c	2006-11-10 14:42:30.000000000 +0530
> @@ -687,6 +687,7 @@ static struct files_struct *dup_fd(struc
>  		 * the latest pointer.
>  		 */
>  		spin_lock(&oldf->file_lock);
> +		open_files = count_open_files(old_fdt);
>  		old_fdt = files_fdtable(oldf);
>  	}

I don't see immediately how the wrong open_files value could cause an
oops in this function.  If the stale open_files value was too big (some
files were closed while we dropped the lock), it should still be safe
(AFAIK file tables can only grow, but never shrink, so access to entries
which were valid before should not access invalid memory).  If the stale
open_files value was too small (some more files were opened), the copy
would miss some files, which should be OK (except that memcpy() calls
which copy fd_sets will copy bits for some of that missed files which
happened to be in the last word - this would cause some fd's to be
permanently busy, and potentially could cause problems later, but an
oops in dup_fd() due to this problem does not look likely).  Seeing the
exact place in the code which oopsed would be interesting.

However, the new code does not look safe in all cases.  If some other
task has opened more files while dup_fd() released oldf->file_lock, the
new code will update open_files to the new larger value.  But newf was
allocated with the old smaller value of open_files, therefore subsequent
accesses to newf may try to write into unallocated memory.

Hmm, and actually the patch seems to be wrong for yet another reason -
the old_fdt pointer which you pass to count_open_files() may be stale
(the comment above even warns about that, and the following line updates
old_fdt).  The count_open_files() call must be after the "old_fdt =
files_fdtable(oldf)" line.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch to fixe Data Acess error in dup_fd
  2006-11-14 15:16 ` Sergey Vlasov
@ 2006-11-14 18:49   ` Vadim Lobanov
  2006-11-14 20:42     ` Sergey Vlasov
  0 siblings, 1 reply; 11+ messages in thread
From: Vadim Lobanov @ 2006-11-14 18:49 UTC (permalink / raw)
  To: Sergey Vlasov
  Cc: sharyath, Pavel Emelianov, Linus Torvalds, Andrew Morton, linux-kernel

On Tue, 2006-11-14 at 18:16 +0300, Sergey Vlasov wrote:
> On Fri, 10 Nov 2006 15:02:01 +0530 Sharyathi Nagesh wrote:
> > --- kernel/fork.c.orig	2006-11-10 14:42:02.000000000 +0530
> > +++ kernel/fork.c	2006-11-10 14:42:30.000000000 +0530
> > @@ -687,6 +687,7 @@ static struct files_struct *dup_fd(struc
> >  		 * the latest pointer.
> >  		 */
> >  		spin_lock(&oldf->file_lock);
> > +		open_files = count_open_files(old_fdt);
> >  		old_fdt = files_fdtable(oldf);
> >  	}

Looks like your analysis of the proposed patch's side-effects agrees
with mine (call it independent verification, if you will :) ); I was
expressing the very same concerns about it introducing a race condition
on the mm-commits@ and stable@ lists. The only concern is that, although
this patch is not correct, it does fix "something" -- it would be good
to identify what exactly that "something" is.

> [...] If the stale
> open_files value was too small (some more files were opened), the copy
> would miss some files, which should be OK (except that memcpy() calls
> which copy fd_sets will copy bits for some of that missed files which
> happened to be in the last word - this would cause some fd's to be
> permanently busy, and potentially could cause problems later [...]

Nope, this logic also looks fine. The open_files value that
count_open_files() returns will always be a multiple of BITS_PER_LONG,
so no extraneous bits will ever be copied. It's a tad confusing since
count_open_files() does something a bit different than what its name
suggests.

Speaking of: does a maintainer currently exist for this part of the
kernel? Who's familiar with all the related code? :)

Also, here's some extra information from the other email thread
regarding this patch, that might aid in debugging. I'm merely
copy-pasting it here for reference:
0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0]
    pc: c000000000060d90: .dup_fd+0x240/0x39c
    lr: c000000000060d6c: .dup_fd+0x21c/0x39c
    sp: c00000007ce2fa70
   msr: 800000000000b032
   dar: ffffffff00000028
 dsisr: 40000000
  current = 0xc000000074950980
  paca    = 0xc000000000454500
    pid   = 27330, comm = bash

0:mon> t
[c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable)
[c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88
[c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520
[c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4
[c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74
[c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc

The PC translates to:
        for (i = open_files; i != 0; i--) {
                struct file *f = *old_fds++;
                if (f) {
                        get_file(f);       <-- Data access error
                } else {

And more info still:
        0:mon> r
R00 = ffffffff00000028   R16 = 00000000100e0000
R01 = c00000007ce2fa70   R17 = 000000000fff1d38
R02 = c00000000056cd20   R18 = 0000000000000000
R03 = c000000029f40a58   R19 = 0000000001200011
R04 = c000000029f442d8   R20 = c0000000a544a2a0
R05 = 0000000000000001   R21 = 0000000000000000
R06 = 0000000000000024   R22 = 0000000000000100
R07 = 0000001000000000   R23 = c00000008635f5e8
R08 = 0000000000000000   R24 = c0000000919c5448
R09 = 0000000000000024   R25 = 0000000000000100
R10 = 00000000000000dc   R26 = c000000086359c30
R11 = ffffffff00000000   R27 = c000000089e5e230
R12 = 0000000006bbd9e9   R28 = c00000000c8d3d80
R13 = c000000000454500   R29 = 0000000000000020
R14 = c00000007ce2fea0   R30 = c000000000491fc8
R15 = 00000000fcb2e770   R31 = c0000000b8369b08
pc  = c000000000060d90 .dup_fd+0x240/0x39c
lr  = c000000000060d6c .dup_fd+0x21c/0x39c
msr = 800000000000b032   cr  = 24242428
ctr = 0000000000000000   xer = 0000000000000000   trap =  300
dar = ffffffff00000028   dsisr = 40000000
-----------------------
0:mon> di c000000000060d90 <==PC
c000000000060d90  7d200028      lwarx   r9,r0,r0
c000000000060d94  31290001      addic   r9,r9,1
c000000000060d98  7d20012d      stwcx.  r9,r0,r0
c000000000060d9c  40a2fff4      bne     c000000000060d90        #
.dup_fd+0x240/0x39c
c000000000060da0  48000014      b       c000000000060db4        #
.dup_fd+0x264/0x39c
c000000000060da4  e93b0018      ld      r9,24(r27)
c000000000060da8  7c08482a      ldx     r0,r8,r9
c000000000060dac  7c003878      andc    r0,r0,r7
c000000000060db0  7c08492a      stdx    r0,r8,r9
c000000000060db4  3b180008      addi    r24,r24,8
c000000000060db8  7c0006ac      eieio
c000000000060dbc  380affff      addi    r0,r10,-1
c000000000060dc0  f97c0000      std     r11,0(r28)
c000000000060dc4  38c60001      addi    r6,r6,1
c000000000060dc8  3b9c0008      addi    r28,r28,8
c000000000060dcc  7c0a07b4      extsw   r10,r0

Thanks.
-- Vadim Lobanov



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch to fixe Data Acess error in dup_fd
  2006-11-14 18:49   ` Vadim Lobanov
@ 2006-11-14 20:42     ` Sergey Vlasov
  2006-11-14 21:35       ` Vadim Lobanov
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Vlasov @ 2006-11-14 20:42 UTC (permalink / raw)
  To: Vadim Lobanov
  Cc: sharyath, Pavel Emelianov, Linus Torvalds, Andrew Morton, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 5804 bytes --]

On Tue, Nov 14, 2006 at 10:49:14AM -0800, Vadim Lobanov wrote:
> On Tue, 2006-11-14 at 18:16 +0300, Sergey Vlasov wrote:
> > On Fri, 10 Nov 2006 15:02:01 +0530 Sharyathi Nagesh wrote:
> > > --- kernel/fork.c.orig	2006-11-10 14:42:02.000000000 +0530
> > > +++ kernel/fork.c	2006-11-10 14:42:30.000000000 +0530
> > > @@ -687,6 +687,7 @@ static struct files_struct *dup_fd(struc
> > >  		 * the latest pointer.
> > >  		 */
> > >  		spin_lock(&oldf->file_lock);
> > > +		open_files = count_open_files(old_fdt);
> > >  		old_fdt = files_fdtable(oldf);
> > >  	}
> 
> Looks like your analysis of the proposed patch's side-effects agrees
> with mine (call it independent verification, if you will :) ); I was
> expressing the very same concerns about it introducing a race condition
> on the mm-commits@ and stable@ lists. The only concern is that, although
> this patch is not correct, it does fix "something" -- it would be good
> to identify what exactly that "something" is.

Yes, very interesting (although if the problem appeared only after 72
hours of testing, it is hard to be sure that the bug is really fixed).

> [...] The open_files value that
> count_open_files() returns will always be a multiple of BITS_PER_LONG,
> so no extraneous bits will ever be copied. It's a tad confusing since
> count_open_files() does something a bit different than what its name
> suggests.

Yes, then the logic looks fine.  (The comment in count_open_files()
says "Find the last open fd", which is _almost_ what it does.)

There is also some unused code and slightly incorrect comment in
dup_fd():

	size = old_fdt->max_fdset;

... here "size" is not used ....

	/* if the old fdset gets grown now, we'll only copy up to "size" fds */

... here "size" is not used either ....

	size = (new_fdt->max_fds - open_files) * sizeof(struct file *);

The result of the first assignment to "size" is not used anywhere,
even if it is mentioned in the comment.  However, the intent to keep
the old size of fdset is noted again.

> Also, here's some extra information from the other email thread
> regarding this patch, that might aid in debugging. I'm merely
> copy-pasting it here for reference:

Thanks - I cannot find this discussion anywhere in archives...

> 0:mon> e
> cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0]
>     pc: c000000000060d90: .dup_fd+0x240/0x39c
>     lr: c000000000060d6c: .dup_fd+0x21c/0x39c
>     sp: c00000007ce2fa70
>    msr: 800000000000b032
>    dar: ffffffff00000028
>  dsisr: 40000000
>   current = 0xc000000074950980
>   paca    = 0xc000000000454500
>     pid   = 27330, comm = bash
> 
> 0:mon> t
> [c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable)
> [c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88
> [c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520
> [c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4
> [c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74
> [c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc
> 
> The PC translates to:
>         for (i = open_files; i != 0; i--) {
>                 struct file *f = *old_fds++;
>                 if (f) {
>                         get_file(f);       <-- Data access error

So we probably got a bogus "struct file" pointer...

>                 } else {
> 
> And more info still:
>         0:mon> r
> R00 = ffffffff00000028   R16 = 00000000100e0000
> R01 = c00000007ce2fa70   R17 = 000000000fff1d38
> R02 = c00000000056cd20   R18 = 0000000000000000
> R03 = c000000029f40a58   R19 = 0000000001200011
> R04 = c000000029f442d8   R20 = c0000000a544a2a0
> R05 = 0000000000000001   R21 = 0000000000000000
> R06 = 0000000000000024   R22 = 0000000000000100
> R07 = 0000001000000000   R23 = c00000008635f5e8
> R08 = 0000000000000000   R24 = c0000000919c5448
> R09 = 0000000000000024   R25 = 0000000000000100
> R10 = 00000000000000dc   R26 = c000000086359c30
> R11 = ffffffff00000000   R27 = c000000089e5e230
> R12 = 0000000006bbd9e9   R28 = c00000000c8d3d80
> R13 = c000000000454500   R29 = 0000000000000020
> R14 = c00000007ce2fea0   R30 = c000000000491fc8
> R15 = 00000000fcb2e770   R31 = c0000000b8369b08
> pc  = c000000000060d90 .dup_fd+0x240/0x39c
> lr  = c000000000060d6c .dup_fd+0x21c/0x39c
> msr = 800000000000b032   cr  = 24242428
> ctr = 0000000000000000   xer = 0000000000000000   trap =  300
> dar = ffffffff00000028   dsisr = 40000000
> -----------------------
> 0:mon> di c000000000060d90 <==PC
> c000000000060d90  7d200028      lwarx   r9,r0,r0
> c000000000060d94  31290001      addic   r9,r9,1
> c000000000060d98  7d20012d      stwcx.  r9,r0,r0
> c000000000060d9c  40a2fff4      bne     c000000000060d90        # .dup_fd+0x240/0x39c

 From what little I know about PowerPC, this looks like an atomic
increment of the memory word pointed to by r0, which contains
0xffffffff00000028 - definitely looks like a bogus address.  The
offset of file->f_count should be 0x28 on a 64-bit architecture, so
apparently we got f == 0xffffffff00000000 from *old_fds.  Something
scribbled over that memory?

> c000000000060da0  48000014      b       c000000000060db4        # .dup_fd+0x264/0x39c
> c000000000060da4  e93b0018      ld      r9,24(r27)
> c000000000060da8  7c08482a      ldx     r0,r8,r9
> c000000000060dac  7c003878      andc    r0,r0,r7
> c000000000060db0  7c08492a      stdx    r0,r8,r9
> c000000000060db4  3b180008      addi    r24,r24,8
> c000000000060db8  7c0006ac      eieio
> c000000000060dbc  380affff      addi    r0,r10,-1
> c000000000060dc0  f97c0000      std     r11,0(r28)
> c000000000060dc4  38c60001      addi    r6,r6,1
> c000000000060dc8  3b9c0008      addi    r28,r28,8
> c000000000060dcc  7c0a07b4      extsw   r10,r0

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch to fixe Data Acess error in dup_fd
  2006-11-14 20:42     ` Sergey Vlasov
@ 2006-11-14 21:35       ` Vadim Lobanov
  2006-11-15  7:38         ` Sharyathi Nagesh
  0 siblings, 1 reply; 11+ messages in thread
From: Vadim Lobanov @ 2006-11-14 21:35 UTC (permalink / raw)
  To: Sergey Vlasov
  Cc: sharyath, Pavel Emelianov, Linus Torvalds, Andrew Morton, linux-kernel

On Tue, 2006-11-14 at 23:42 +0300, Sergey Vlasov wrote:
> Yes, very interesting (although if the problem appeared only after 72
> hours of testing, it is hard to be sure that the bug is really fixed).

Yep. Could be random memory corruption from some other piece of the
code.

> > [...] The open_files value that
> > count_open_files() returns will always be a multiple of BITS_PER_LONG,
> > so no extraneous bits will ever be copied. It's a tad confusing since
> > count_open_files() does something a bit different than what its name
> > suggests.
> 
> Yes, then the logic looks fine.  (The comment in count_open_files()
> says "Find the last open fd", which is _almost_ what it does.)

It's always the details that getcha. :) There's other implicit tricks
that the code pulls, such as implicit requirements for minimum size
granularity of certain data structures. It basically comes down to
trying to read and understand all the surrounding code.

> There is also some unused code and slightly incorrect comment in
> dup_fd():
> 
> 	size = old_fdt->max_fdset;
> 
> ... here "size" is not used ....
> 
> 	/* if the old fdset gets grown now, we'll only copy up to "size" fds */
> 
> ... here "size" is not used either ....
> 
> 	size = (new_fdt->max_fds - open_files) * sizeof(struct file *);
> 
> The result of the first assignment to "size" is not used anywhere,
> even if it is mentioned in the comment.  However, the intent to keep
> the old size of fdset is noted again.

I read that comment in my mind like this: s/"size"/"open_files"/g. The
wording can definitely be improved, though. Any takers?

I also already sent a patch to Andrew a while ago to clean up that
unused assignment to "size", as part of a bigger fdtable-rework
patchset. This code is currently getting its testing in -mm for the time
being.

> > 0:mon> e
> > cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0]
> >     pc: c000000000060d90: .dup_fd+0x240/0x39c
> >     lr: c000000000060d6c: .dup_fd+0x21c/0x39c
> >     sp: c00000007ce2fa70
> >    msr: 800000000000b032
> >    dar: ffffffff00000028
> >  dsisr: 40000000
> >   current = 0xc000000074950980
> >   paca    = 0xc000000000454500
> >     pid   = 27330, comm = bash
> > 
> > 0:mon> t
> > [c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable)
> > [c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88
> > [c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520
> > [c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4
> > [c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74
> > [c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc
> > 
> > The PC translates to:
> >         for (i = open_files; i != 0; i--) {
> >                 struct file *f = *old_fds++;
> >                 if (f) {
> >                         get_file(f);       <-- Data access error
> 
> So we probably got a bogus "struct file" pointer...
> 
> >                 } else {
> > 
> > And more info still:
> >         0:mon> r
> > R00 = ffffffff00000028   R16 = 00000000100e0000
> > R01 = c00000007ce2fa70   R17 = 000000000fff1d38
> > R02 = c00000000056cd20   R18 = 0000000000000000
> > R03 = c000000029f40a58   R19 = 0000000001200011
> > R04 = c000000029f442d8   R20 = c0000000a544a2a0
> > R05 = 0000000000000001   R21 = 0000000000000000
> > R06 = 0000000000000024   R22 = 0000000000000100
> > R07 = 0000001000000000   R23 = c00000008635f5e8
> > R08 = 0000000000000000   R24 = c0000000919c5448
> > R09 = 0000000000000024   R25 = 0000000000000100
> > R10 = 00000000000000dc   R26 = c000000086359c30
> > R11 = ffffffff00000000   R27 = c000000089e5e230
> > R12 = 0000000006bbd9e9   R28 = c00000000c8d3d80
> > R13 = c000000000454500   R29 = 0000000000000020
> > R14 = c00000007ce2fea0   R30 = c000000000491fc8
> > R15 = 00000000fcb2e770   R31 = c0000000b8369b08
> > pc  = c000000000060d90 .dup_fd+0x240/0x39c
> > lr  = c000000000060d6c .dup_fd+0x21c/0x39c
> > msr = 800000000000b032   cr  = 24242428
> > ctr = 0000000000000000   xer = 0000000000000000   trap =  300
> > dar = ffffffff00000028   dsisr = 40000000
> > -----------------------
> > 0:mon> di c000000000060d90 <==PC
> > c000000000060d90  7d200028      lwarx   r9,r0,r0
> > c000000000060d94  31290001      addic   r9,r9,1
> > c000000000060d98  7d20012d      stwcx.  r9,r0,r0
> > c000000000060d9c  40a2fff4      bne     c000000000060d90        # .dup_fd+0x240/0x39c
> 
>  From what little I know about PowerPC, this looks like an atomic
> increment of the memory word pointed to by r0, which contains
> 0xffffffff00000028 - definitely looks like a bogus address.  The
> offset of file->f_count should be 0x28 on a 64-bit architecture, so
> apparently we got f == 0xffffffff00000000 from *old_fds.  Something
> scribbled over that memory?

Looks very likely. It would be ironic if the fdsets somehow scribbled
there -- a series of all ones in the open_fds. :)

I sent a request asking which version of the kernel did this. 2.6.19-mm1
without hotfixes could crash like this, due to a bug in the new fdtable
code. Aside from that, and if noone can recognize some sort of magical
constant value in those registers above, it might be bisection time.

-- Vadim Lobanov



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch to fixe Data Acess error in dup_fd
  2006-11-14 21:35       ` Vadim Lobanov
@ 2006-11-15  7:38         ` Sharyathi Nagesh
  2006-11-15  8:15           ` Vadim Lobanov
  0 siblings, 1 reply; 11+ messages in thread
From: Sharyathi Nagesh @ 2006-11-15  7:38 UTC (permalink / raw)
  To: Sergey Vlasov, Vadim Lobanov
  Cc: Pavel Emelianov, Linus Torvalds, Andrew Morton, linux-kernel

Lobanov/Sergey
On Tue, 2006-11-14 at 13:35 -0800, Vadim Lobanov wrote:
> On Tue, 2006-11-14 at 23:42 +0300, Sergey Vlasov wrote:
> > Yes, very interesting (although if the problem appeared only after 72
> > hours of testing, it is hard to be sure that the bug is really fixed).
> 
> Yep. Could be random memory corruption from some other piece of the
> code.
> 
> > > [...] The open_files value that
> > > count_open_files() returns will always be a multiple of BITS_PER_LONG,
> > > so no extraneous bits will ever be copied. It's a tad confusing since
> > > count_open_files() does something a bit different than what its name
> > > suggests.
> > 
> > Yes, then the logic looks fine.  (The comment in count_open_files()
> > says "Find the last open fd", which is _almost_ what it does.)
> 
> It's always the details that getcha. :) There's other implicit tricks
> that the code pulls, such as implicit requirements for minimum size
> granularity of certain data structures. It basically comes down to
> trying to read and understand all the surrounding code.
> 
> > There is also some unused code and slightly incorrect comment in
> > dup_fd():
> > 
> > 	size = old_fdt->max_fdset;
> > 
> > ... here "size" is not used ....
> > 
> > 	/* if the old fdset gets grown now, we'll only copy up to "size" fds */
> > 
> > ... here "size" is not used either ....
> > 
> > 	size = (new_fdt->max_fds - open_files) * sizeof(struct file *);
> > 
> > The result of the first assignment to "size" is not used anywhere,
> > even if it is mentioned in the comment.  However, the intent to keep
> > the old size of fdset is noted again.
> 
> I read that comment in my mind like this: s/"size"/"open_files"/g. The
> wording can definitely be improved, though. Any takers?
> 
> I also already sent a patch to Andrew a while ago to clean up that
> unused assignment to "size", as part of a bigger fdtable-rework
> patchset. This code is currently getting its testing in -mm for the time
> being.
> 
> > > 0:mon> e
> > > cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0]
> > >     pc: c000000000060d90: .dup_fd+0x240/0x39c
> > >     lr: c000000000060d6c: .dup_fd+0x21c/0x39c
> > >     sp: c00000007ce2fa70
> > >    msr: 800000000000b032
> > >    dar: ffffffff00000028
> > >  dsisr: 40000000
> > >   current = 0xc000000074950980
> > >   paca    = 0xc000000000454500
> > >     pid   = 27330, comm = bash
> > > 
> > > 0:mon> t
> > > [c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable)
> > > [c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88
> > > [c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520
> > > [c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4
> > > [c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74
> > > [c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc
> > > 
> > > The PC translates to:
> > >         for (i = open_files; i != 0; i--) {
> > >                 struct file *f = *old_fds++;
> > >                 if (f) {
> > >                         get_file(f);       <-- Data access error
> > 
> > So we probably got a bogus "struct file" pointer...
> > 
> > >                 } else {
> > > 
> > > And more info still:
> > >         0:mon> r
> > > R00 = ffffffff00000028   R16 = 00000000100e0000
> > > R01 = c00000007ce2fa70   R17 = 000000000fff1d38
> > > R02 = c00000000056cd20   R18 = 0000000000000000
> > > R03 = c000000029f40a58   R19 = 0000000001200011
> > > R04 = c000000029f442d8   R20 = c0000000a544a2a0
> > > R05 = 0000000000000001   R21 = 0000000000000000
> > > R06 = 0000000000000024   R22 = 0000000000000100
> > > R07 = 0000001000000000   R23 = c00000008635f5e8
> > > R08 = 0000000000000000   R24 = c0000000919c5448
> > > R09 = 0000000000000024   R25 = 0000000000000100
> > > R10 = 00000000000000dc   R26 = c000000086359c30
> > > R11 = ffffffff00000000   R27 = c000000089e5e230
> > > R12 = 0000000006bbd9e9   R28 = c00000000c8d3d80
> > > R13 = c000000000454500   R29 = 0000000000000020
> > > R14 = c00000007ce2fea0   R30 = c000000000491fc8
> > > R15 = 00000000fcb2e770   R31 = c0000000b8369b08
> > > pc  = c000000000060d90 .dup_fd+0x240/0x39c
> > > lr  = c000000000060d6c .dup_fd+0x21c/0x39c
> > > msr = 800000000000b032   cr  = 24242428
> > > ctr = 0000000000000000   xer = 0000000000000000   trap =  300
> > > dar = ffffffff00000028   dsisr = 40000000
> > > -----------------------
> > > 0:mon> di c000000000060d90 <==PC
> > > c000000000060d90  7d200028      lwarx   r9,r0,r0
> > > c000000000060d94  31290001      addic   r9,r9,1
> > > c000000000060d98  7d20012d      stwcx.  r9,r0,r0
> > > c000000000060d9c  40a2fff4      bne     c000000000060d90        # .dup_fd+0x240/0x39c
> > 
> >  From what little I know about PowerPC, this looks like an atomic
> > increment of the memory word pointed to by r0, which contains
> > 0xffffffff00000028 - definitely looks like a bogus address.  The
> > offset of file->f_count should be 0x28 on a 64-bit architecture, so
> > apparently we got f == 0xffffffff00000000 from *old_fds.  Something
> > scribbled over that memory?

   This is very interesting: after reading through I am feeling there is high chance this 
could as well be a memory corruption issue. But if the issue is memory getting corrupted 
what could be the possible reasons.
   I had observed random slab corruption issues in the machine, could
that may have resulted in corruption, we may be opening up larger issues
here about which I am not much aware of, 

> Looks very likely. It would be ironic if the fdsets somehow scribbled
> there -- a series of all ones in the open_fds. :)
> 
> I sent a request asking which version of the kernel did this. 2.6.19-mm1
> without hotfixes could crash like this, due to a bug in the new fdtable
> code. Aside from that, and if noone can recognize some sort of magical
> constant value in those registers above, it might be bisection time.
> 
Lobanov
   Excuse me for the late response.
   The kernel version on which it is tested is: 2.6.18-1 (Distro   
   variant)
   Test suite:
   TCP/IO stress are from ltp test suite: you can find about them  
   here:           http://ltp.sourceforge.net/testPlan.php#6.0 

   



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch to fixe Data Acess error in dup_fd
  2006-11-15  7:38         ` Sharyathi Nagesh
@ 2006-11-15  8:15           ` Vadim Lobanov
  2006-11-15  9:03             ` Sharyathi Nagesh
  2006-11-17 13:08             ` Sharyathi Nagesh
  0 siblings, 2 replies; 11+ messages in thread
From: Vadim Lobanov @ 2006-11-15  8:15 UTC (permalink / raw)
  To: sharyath
  Cc: Sergey Vlasov, Pavel Emelianov, Linus Torvalds, Andrew Morton,
	linux-kernel

On Wed, 2006-11-15 at 13:08 +0530, Sharyathi Nagesh wrote:
>    This is very interesting: after reading through I am feeling there is high chance this 
> could as well be a memory corruption issue. But if the issue is memory getting corrupted 
> what could be the possible reasons.
>    I had observed random slab corruption issues in the machine, could
> that may have resulted in corruption, we may be opening up larger issues
> here about which I am not much aware of, 

I'm guessing that you've already tried this, but it never hurts to be
sure: does this machine pass memtest? :)

>    The kernel version on which it is tested is: 2.6.18-1 (Distro   
>    variant)

Unless someone recognizes special magic values from the register dumps
to point at any particular part of the kernel, the corruption could be
coming from almost anywhere. If noone has any better guesses, then
narrowing down the problem might be worthwhile: grab a vanilla
non-distro 2.6.18-1 kernel (from kernel.org) and see if you can
reproduce the problem with that, and then try to find the previous
release where the problem disappears. Or use git instead, which folks
say can do this bisection process rather well. :)

Thanks,
-- Vadim Lobanov


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch to fixe Data Acess error in dup_fd
  2006-11-15  8:15           ` Vadim Lobanov
@ 2006-11-15  9:03             ` Sharyathi Nagesh
  2006-11-17 13:08             ` Sharyathi Nagesh
  1 sibling, 0 replies; 11+ messages in thread
From: Sharyathi Nagesh @ 2006-11-15  9:03 UTC (permalink / raw)
  To: Vadim Lobanov
  Cc: wangzyu, Sergey Vlasov, Pavel Emelianov, Linus Torvalds,
	Andrew Morton, linux-kernel

On Wed, 2006-11-15 at 00:15 -0800, Vadim Lobanov wrote:
> On Wed, 2006-11-15 at 13:08 +0530, Sharyathi Nagesh wrote:
> >    This is very interesting: after reading through I am feeling there is high chance this 
> > could as well be a memory corruption issue. But if the issue is memory getting corrupted 
> > what could be the possible reasons.
> >    I had observed random slab corruption issues in the machine, could
> > that may have resulted in corruption, we may be opening up larger issues
> > here about which I am not much aware of, 
> 
> I'm guessing that you've already tried this, but it never hurts to be
> sure: does this machine pass memtest? :)
Haven't run memtest will run and check if the problem gets recreated.

> >    The kernel version on which it is tested is: 2.6.18-1 (Distro   
> >    variant)
> 
> Unless someone recognizes special magic values from the register dumps
> to point at any particular part of the kernel, the corruption could be
> coming from almost anywhere. If noone has any better guesses, then
> narrowing down the problem might be worthwhile: grab a vanilla
> non-distro 2.6.18-1 kernel (from kernel.org) and see if you can
> reproduce the problem with that, and then try to find the previous
> release where the problem disappears. Or use git instead, which folks
> say can do this bisection process rather well. :)
> 
> Thanks,
> -- Vadim Lobanov
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch to fixe Data Acess error in dup_fd
  2006-11-15  8:15           ` Vadim Lobanov
  2006-11-15  9:03             ` Sharyathi Nagesh
@ 2006-11-17 13:08             ` Sharyathi Nagesh
  2006-11-17 19:26               ` Vadim Lobanov
  2007-01-04 13:10               ` Sharyathi Nagesh
  1 sibling, 2 replies; 11+ messages in thread
From: Sharyathi Nagesh @ 2006-11-17 13:08 UTC (permalink / raw)
  To: Sergey Vlasov, Vadim Lobanov
  Cc: Zhao Yu Wang, Pavel Emelianov, Linus Torvalds, Andrew Morton,
	linux-kernel

  I looked into a few memtests that were run in similar machine. There
are a few slab corruption issues but not while running memtest and no
other issues.
  Seems difficult to replicate.

On Wed, 2006-11-15 at 00:15 -0800, Vadim Lobanov wrote:
> On Wed, 2006-11-15 at 13:08 +0530, Sharyathi Nagesh wrote:
> >    This is very interesting: after reading through I am feeling there is high chance this 
> > could as well be a memory corruption issue. But if the issue is memory getting corrupted 
> > what could be the possible reasons.
> >    I had observed random slab corruption issues in the machine, could
> > that may have resulted in corruption, we may be opening up larger issues
> > here about which I am not much aware of, 
> 
> I'm guessing that you've already tried this, but it never hurts to be
> sure: does this machine pass memtest? :)
> 
> >    The kernel version on which it is tested is: 2.6.18-1 (Distro   
> >    variant)
> 
> Unless someone recognizes special magic values from the register dumps
> to point at any particular part of the kernel, the corruption could be
> coming from almost anywhere. If noone has any better guesses, then
> narrowing down the problem might be worthwhile: grab a vanilla
> non-distro 2.6.18-1 kernel (from kernel.org) and see if you can
> reproduce the problem with that, and then try to find the previous
> release where the problem disappears. Or use git instead, which folks
> say can do this bisection process rather well. :)
> 
> Thanks,
> -- Vadim Lobanov
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch to fixe Data Acess error in dup_fd
  2006-11-17 13:08             ` Sharyathi Nagesh
@ 2006-11-17 19:26               ` Vadim Lobanov
  2007-01-04 13:10               ` Sharyathi Nagesh
  1 sibling, 0 replies; 11+ messages in thread
From: Vadim Lobanov @ 2006-11-17 19:26 UTC (permalink / raw)
  To: sharyath
  Cc: Sergey Vlasov, Zhao Yu Wang, Pavel Emelianov, Linus Torvalds,
	Andrew Morton, linux-kernel

On Fri, 2006-11-17 at 18:38 +0530, Sharyathi Nagesh wrote:
>   I looked into a few memtests that were run in similar machine. There
> are a few slab corruption issues but not while running memtest and no
> other issues.
>   Seems difficult to replicate.

Bummer.

Bisection with vanilla (non-distro) kernels may be your best bet, then.
I have tried to replicate this issue in my spare time, but so far have
been unsuccessful.

-- Vadim Lobanov


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch to fixe Data Acess error in dup_fd
  2006-11-17 13:08             ` Sharyathi Nagesh
  2006-11-17 19:26               ` Vadim Lobanov
@ 2007-01-04 13:10               ` Sharyathi Nagesh
  1 sibling, 0 replies; 11+ messages in thread
From: Sharyathi Nagesh @ 2007-01-04 13:10 UTC (permalink / raw)
  To: linux-kernel, Sergey Vlasov
  Cc: Vadim Lobanov, Zhao Yu Wang, Pavel Emelianov, Linus Torvalds,
	Andrew Morton

Excuse me for the late response

On Fri, 2006-11-17 at 18:38 +0530, Sharyathi Nagesh wrote:

> > I'm guessing that you've already tried this, but it never hurts to be
> > sure: does this machine pass memtest? :)
    Machine passes memtest without fail, even after running for many
hours.
    Since the problem looked genuine to me and may hit again in future
do let me know how I should improve the patch. Please let us know of the
opinion...
---------------------------------------------------------------------
To restate the problem:
Data access error was observed after testing IO and TCP Stress test for
more than 72 hrs over ppc-64 machine.

[c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable)
[c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88
[c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520
[c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4
[c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74
[c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc
--- Exception: c00 (System Call) at 000000000fee9c60

The location of the problem was 
-----------
for (i = open_files; i != 0; i--) {
                struct file *f = *old_fds++;
                if (f) {
                        get_file(f); <== Get file does increment of f_count of 
     
                                         struct file
                } else {
-----------
in kernel/fork.c.Accessing f_count gives Data access error as 'struct file' pointer f is 
pointing to invalid location

Regards
Sharyathi Nagesh


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2007-01-04 13:00 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-11-10  9:32 Patch to fixe Data Acess error in dup_fd Sharyathi Nagesh
2006-11-14 15:16 ` Sergey Vlasov
2006-11-14 18:49   ` Vadim Lobanov
2006-11-14 20:42     ` Sergey Vlasov
2006-11-14 21:35       ` Vadim Lobanov
2006-11-15  7:38         ` Sharyathi Nagesh
2006-11-15  8:15           ` Vadim Lobanov
2006-11-15  9:03             ` Sharyathi Nagesh
2006-11-17 13:08             ` Sharyathi Nagesh
2006-11-17 19:26               ` Vadim Lobanov
2007-01-04 13:10               ` Sharyathi Nagesh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.