* Crash on 2.6.21.7 Vanilla + DRBD 0.7
@ 2007-10-04 13:33 vindex+lists-xfs
  2007-10-04 14:27 ` Hannes Dorbath
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: vindex+lists-xfs @ 2007-10-04 13:33 UTC
  To: xfs


Hi,

I compiled a fresh 2.6.21.7 kernel from kernel.org (no distro patch,
....), and latest svn (3062) 0.7.X drbd.

After just 2 days of uptime, I experienced another crash.

I wonder if it is an XFS related bug, a DRBD one, or related to XFS on
top of DRBD.

This bug seems to occur with intensive IO operations.

What do you think about it ?

Thanks


Oct  3 18:55:23  kernel: Oops: 0002 [#1]
Oct  3 18:55:23  kernel: SMP 
Oct  3 18:55:23  kernel: CPU:    7
Oct  3 18:55:23  kernel: EIP:    0060:[<c016540c>]    Not tainted VLI
Oct  3 18:55:23  kernel: EFLAGS: 00010046   (2.6.21-dl380-g5-20071001 #1)
Oct  3 18:55:23  kernel: EIP is at cache_alloc_refill+0x11c/0x4f0
Oct  3 18:55:23  kernel: eax: f79c2940   ebx: 00000015   ecx: 00000005   edx: 65b567b0
Oct  3 18:55:23  kernel: esi: 0000000a   edi: d5d26000   ebp: f79d03c0   esp: d2531c98
Oct  3 18:55:23  kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Oct  3 18:55:23  kernel: Process rsync (pid: 22409, ti=d2530000 task=da1e8070 task.ti=d2530000)
Oct  3 18:55:23  kernel: Stack: 00000010 000002d0 ce9ca0b8 000002d0 f79cfe00 f79d1c00 f79c2940 00000000 
Oct  3 18:55:23  kernel: 00000001 d2531cd4 ce9ca088 c022aade d5d2601c 00000282 f79cfe00 000002d0 
Oct  3 18:55:23  kernel: f79cfe00 c01652e6 00000000 00000001 c0265a4e 00000011 d2531d60 d7acfb40 
Oct  3 18:55:23  kernel: Call Trace:
Oct  3 18:55:23  kernel: [<c022aade>] xfs_da_brelse+0x6e/0xb0
Oct  3 18:55:23  kernel: [<c01652e6>] kmem_cache_alloc+0x46/0x50
Oct  3 18:55:23  kernel: [<c0265a4e>] kmem_zone_alloc+0x4e/0xc0
Oct  3 18:55:23  kernel: [<c027015f>] xfs_fs_alloc_inode+0xf/0x20
Oct  3 18:55:23  kernel: [<c017bbd6>] alloc_inode+0x16/0x170
Oct  3 18:55:23  kernel: [<c017bd89>] iget_locked+0x59/0x130
Oct  3 18:55:23  kernel: [<c023fa38>] xfs_iget+0x78/0x160
Oct  3 18:55:23  kernel: [<c020a49c>] xfs_acl_vget+0x6c/0x160
Oct  3 18:55:23  kernel: [<c025b143>] xfs_dir_lookup_int+0x93/0xf0
Oct  3 18:55:23  kernel: [<c025ea55>] xfs_lookup+0x75/0xa0
Oct  3 18:55:23  kernel: [<c026d0c2>] xfs_vn_lookup+0x52/0x90
Oct  3 18:55:23  kernel: [<c016fd08>] do_lookup+0x148/0x190
Oct  3 18:55:23  kernel: [<c0171cb4>] __link_path_walk+0x814/0xe40
Oct  3 18:55:23  kernel: [<c0172325>] link_path_walk+0x45/0xc0
Oct  3 18:55:23  kernel: [<c0172581>] do_path_lookup+0x81/0x1c0
Oct  3 18:55:23  kernel: [<c01712c3>] getname+0xb3/0xe0
Oct  3 18:55:23  kernel: [<c0172f8b>] __user_walk_fd+0x3b/0x60
Oct  3 18:55:23  kernel: [<c016bcdf>] vfs_lstat_fd+0x1f/0x50
Oct  3 18:55:23  kernel: [<c016bd5f>] sys_lstat64+0xf/0x30
Oct  3 18:55:23  kernel: [<c01040b0>] sysenter_past_esp+0x5d/0x81
Oct  3 18:55:23  kernel: =======================
Oct  3 18:55:23  kernel: Code: 10 8b 77 14 01 c2 8b 44 24 30 8b 34 b0 89 77 14 89 54 8d 14 8d 51 01 89 55 00 8b 44 24 10 8b 77 10 3b 70 5c 72 c0 8b 17 8b 47 04 <89> 42 04 89 10 83 7f 14 ff c7 07 00 01 10 00 c7 47 04 00 02 20 
Oct  3 18:55:23  kernel: EIP: [<c016540c>] cache_alloc_refill+0x11c/0x4f0 SS:ESP 0068:d2531c98
Oct  3 18:55:26  kernel: Oops: 0002 [#2]
Oct  3 18:55:26  kernel: SMP 
Oct  3 18:55:26  kernel: CPU:    7
Oct  3 18:55:26  kernel: EIP:    0060:[<c017bbe0>]    Not tainted VLI
Oct  3 18:55:26  kernel: EFLAGS: 00210282   (2.6.21-dl380-g5-20071001 #1)
Oct  3 18:55:26  kernel: EIP is at alloc_inode+0x20/0x170
Oct  3 18:55:26  kernel: eax: b4fd89ba   ebx: b4fd89ba   ecx: b4fd89ba   edx: b4fd89ba
Oct  3 18:55:26  kernel: esi: f29bb000   edi: f29bb000   ebp: ca743575   esp: d6747c64
Oct  3 18:55:26  kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Oct  3 18:55:26  kernel: Process imapd (pid: 20054, ti=d6746000 task=e04a20b0 task.ti=d6746000)
Oct  3 18:55:26  kernel: Stack: 00000000 c76fe0dc f29bb000 c017bd89 ffffffff ffffffff c04abda0 ca743575 
Oct  3 18:55:26  kernel: ca743575 f53b5800 c023fa38 cb2b4524 1b2595f3 00000020 f0dd7400 ded8b7a8 
Oct  3 18:55:26  kernel: 00000000 f53b5800 c04abda0 cb2b4524 cb2b4524 ca743575 00000000 00000004 
Oct  3 18:55:26  kernel: Call Trace:
Oct  3 18:55:26  kernel: [<c017bd89>] iget_locked+0x59/0x130
Oct  3 18:55:26  kernel: [<c023fa38>] xfs_iget+0x78/0x160
Oct  3 18:55:26  kernel: [<c025a697>] xfs_trans_iget+0x117/0x190
Oct  3 18:55:26  kernel: [<c0243d87>] xfs_ialloc+0xc7/0x570
Oct  3 18:55:26  kernel: [<c024aabc>] xlog_grant_push_ail+0x3c/0x150
Oct  3 18:55:26  kernel: [<c025b261>] xfs_dir_ialloc+0x81/0x2d0
Oct  3 18:55:26  kernel: [<c025855b>] xfs_trans_reserve+0xab/0x230
Oct  3 18:55:26  kernel: [<c0261aa5>] xfs_create+0x395/0x6a0
Oct  3 18:55:26  kernel: [<c023eac5>] xfs_iunlock+0x85/0xa0
Oct  3 18:55:26  kernel: [<c026d6b5>] xfs_vn_mknod+0x235/0x360
Oct  3 18:55:26  kernel: [<c01705cd>] vfs_create+0xdd/0x140
Oct  3 18:55:26  kernel: [<c01738ae>] open_namei+0x58e/0x5f0
Oct  3 18:55:26  kernel: [<c016716e>] do_filp_open+0x2e/0x60
Oct  3 18:55:26  kernel: [<c0166e4f>] get_unused_fd+0x4f/0xb0
Oct  3 18:55:26  kernel: [<c01671ea>] do_sys_open+0x4a/0xe0
Oct  3 18:55:26  kernel: [<c01672bc>] sys_open+0x1c/0x20
Oct  3 18:55:26  kernel: [<c01040b0>] sysenter_past_esp+0x5d/0x81
Oct  3 18:55:26  kernel: =======================
Oct  3 18:55:26  kernel: Code: 90 90 90 90 90 90 90 90 90 90 90 57 56 89 c6 53 8b 40 20 8b 10 85 d2 0f 84 1e 01 00 00 89 f0 ff d2 89 c3 85 db 0f 84 ee 00 00 00 <89> b3 98 00 00 00 b9 02 00 00 00 0f b6 46 10 8d bb f8 00 00 00 
Oct  3 18:55:26  kernel: EIP: [<c017bbe0>] alloc_inode+0x20/0x170 SS:ESP 0068:d6747c64
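
For readers decoding the first line of each trace: assuming these oopses come from the i386 page-fault path, the number after "Oops:" is the hardware page-fault error code, printed in hexadecimal. The sketch below decodes its three low bits; the function name is illustrative and not a kernel interface.

```python
def decode_page_fault_code(code: int) -> dict:
    """Decode the three low bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = fault taken in kernel mode, 1 = in user mode
        "mode": "user" if code & 0x4 else "kernel",
    }


# "Oops: 0002" in the log above is a hexadecimal value.
print(decode_page_fault_code(int("0002", 16)))
```

For 0002 this reports a kernel-mode write to a page that is not present, which is what a store through a bad pointer would look like.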


* Re: Crash on 2.6.21.7 Vanilla + DRBD 0.7
  2007-10-04 13:33 Crash on 2.6.21.7 Vanilla + DRBD 0.7 vindex+lists-xfs
@ 2007-10-04 14:27 ` Hannes Dorbath
  2007-10-04 16:42   ` Laurent CARON
  2007-10-04 14:35 ` Hannes Dorbath
  2007-10-04 23:10 ` David Chinner
  2 siblings, 1 reply; 12+ messages in thread
From: Hannes Dorbath @ 2007-10-04 14:27 UTC
  To: xfs

On 04.10.2007 15:33, vindex+lists-xfs@apartia.org wrote:
> What do you think about it ?

Is that by any chance a kernel with 4k stack size?


-- 
Regards,
Hannes Dorbath


* Re: Crash on 2.6.21.7 Vanilla + DRBD 0.7
  2007-10-04 13:33 Crash on 2.6.21.7 Vanilla + DRBD 0.7 vindex+lists-xfs
  2007-10-04 14:27 ` Hannes Dorbath
@ 2007-10-04 14:35 ` Hannes Dorbath
  2007-10-04 16:33   ` Laurent CARON
  2007-10-10 15:15   ` Louis-David Mitterrand
  2007-10-04 23:10 ` David Chinner
  2 siblings, 2 replies; 12+ messages in thread
From: Hannes Dorbath @ 2007-10-04 14:35 UTC
  To: xfs

On 04.10.2007 15:33, vindex+lists-xfs@apartia.org wrote:
> What do you think about it ?

Another thing, is there a special reason why you use DRBD 0.7.x branch? 
AFAIK it will still deadlock with kernel 2.6.22. You are not running 
.22, but if you upgrade you might have serious problems. You should 
really go with DRBD 8.0.6 if you can.


-- 
Regards,
Hannes Dorbath


* Re: Crash on 2.6.21.7 Vanilla + DRBD 0.7
  2007-10-04 14:35 ` Hannes Dorbath
@ 2007-10-04 16:33   ` Laurent CARON
  2007-10-10 15:15   ` Louis-David Mitterrand
  1 sibling, 0 replies; 12+ messages in thread
From: Laurent CARON @ 2007-10-04 16:33 UTC
  To: xfs; +Cc: Hannes Dorbath

Hannes Dorbath wrote:
> On 04.10.2007 15:33, vindex+lists-xfs@apartia.org wrote:
>> What do you think about it ?
> 
> Another thing, is there a special reason why you use DRBD 0.7.x branch?
> AFAIK it will still deadlock with kernel 2.6.22. You are not running
> .22, but if you upgrade you might have serious problems. You should
> really go with DRBD 8.0.6 if you can.
> 
> 

Hi,

We use 0.7.X since we had a major problem with 8.0.x.

The initial sync never completed.

I tried to solve this problem with Lars Ellenberg to no avail, and
decided to go back to 0.7 which is a well tested version.


* Re: Crash on 2.6.21.7 Vanilla + DRBD 0.7
  2007-10-04 14:27 ` Hannes Dorbath
@ 2007-10-04 16:42   ` Laurent CARON
  0 siblings, 0 replies; 12+ messages in thread
From: Laurent CARON @ 2007-10-04 16:42 UTC
  To: Hannes Dorbath; +Cc: xfs

Hannes Dorbath wrote:
> On 04.10.2007 15:33, vindex+lists-xfs@apartia.org wrote:
>> What do you think about it ?
> 
> Is that by any chance a kernel with 4k stack size?
> 
> 

We're using an 8 KB stack size (default value).

Laurent


* Re: Crash on 2.6.21.7 Vanilla + DRBD 0.7
  2007-10-04 13:33 Crash on 2.6.21.7 Vanilla + DRBD 0.7 vindex+lists-xfs
  2007-10-04 14:27 ` Hannes Dorbath
  2007-10-04 14:35 ` Hannes Dorbath
@ 2007-10-04 23:10 ` David Chinner
  2 siblings, 0 replies; 12+ messages in thread
From: David Chinner @ 2007-10-04 23:10 UTC
  To: xfs

On Thu, Oct 04, 2007 at 03:33:02PM +0200, vindex+lists-xfs@apartia.org wrote:
> 
> Hi,
> 
> I compiled a fresh 2.6.21.7 kernel from kernel.org (no distro patch,
> ....), and latest svn (3062) 0.7.X drbd.
> 
> After just 2 days of uptime, I experienced another crash.
> 
> I wonder if it is an XFS related bug, a DRBD one, or related to XFS on
> top of DRBD.
> 
> This bug seems to occur with intensive IO operations.
> 
> What do you think about it ?
> 
> Thanks
> 
> 
> Oct  3 18:55:23  kernel: Oops: 0002 [#1]
> Oct  3 18:55:23  kernel: SMP 
> Oct  3 18:55:23  kernel: CPU:    7
> Oct  3 18:55:23  kernel: EIP:    0060:[<c016540c>]    Not tainted VLI
> Oct  3 18:55:23  kernel: EFLAGS: 00010046   (2.6.21-dl380-g5-20071001 #1)
> Oct  3 18:55:23  kernel: EIP is at cache_alloc_refill+0x11c/0x4f0

Use-after-free somewhere, I'd say. Turn on slab/slub poisoning and
other memory debugging options and see where it panics next
time.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: Crash on 2.6.21.7 Vanilla + DRBD 0.7
  2007-10-04 14:35 ` Hannes Dorbath
  2007-10-04 16:33   ` Laurent CARON
@ 2007-10-10 15:15   ` Louis-David Mitterrand
  2007-10-10 22:46     ` David Chinner
  1 sibling, 1 reply; 12+ messages in thread
From: Louis-David Mitterrand @ 2007-10-10 15:15 UTC
  To: xfs

On Thu, Oct 04, 2007 at 04:35:05PM +0200, Hannes Dorbath wrote:
> On 04.10.2007 15:33, vindex+lists-xfs@apartia.org wrote:
>> What do you think about it ?
>
> Another thing, is there a special reason why you use DRBD 0.7.x branch? 
> AFAIK it will still deadlock with kernel 2.6.22. You are not running .22, 
> but if you upgrade you might have serious problems. You should really go 
> with DRBD 8.0.6 if you can.
>

After upgrading to 8.0.6 we had another xfs-related crash 4 days later. 
In desperation we are about to abandon xfs and convert this huge 
partition to ext3. Is there anything else we could try before taking that 
step?

Thanks,

Oct  9 12:20:05 sargon/sargon kernel: SMP
Oct  9 12:20:05 sargon/sargon kernel: CPU:    1
Oct  9 12:20:05 sargon/sargon kernel: EIP:    0060:[<c015edc2>]    Not tainted VLI
Oct  9 12:20:05 sargon/sargon kernel: EFLAGS: 00010082   (2.6.22-dl380-g5-20070917 #1)
Oct  9 12:20:05 sargon/sargon kernel: EIP is at free_block+0x67/0xfe
Oct  9 12:20:05 sargon/sargon kernel: eax: a9b1fb46   ebx: 00000000   ecx: f65f4200   edx: d9741040
Oct  9 12:20:05 sargon/sargon kernel: esi: f65f4000   edi: f79e8f40   ebp: f79da680   esp: f797de5c
Oct  9 12:20:05 sargon/sargon kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
Oct  9 12:20:05 sargon/sargon kernel: Process kswapd0 (pid: 248, ti=f797c000 task=f7c22a90 task.ti=f797c000)
Oct  9 12:20:05 sargon/sargon kernel: Stack: 00000000 00000000 0000001b 00000010 f79e9f14 0000001b 000000d8 f79da680
Oct  9 12:20:05 sargon/sargon kernel: f79e8f40 c015eb6e 00000000 f79e9ec0 f79e9ec0 00000246 cb353240 00000001
Oct  9 12:20:05 sargon/sargon kernel: c015eca7 cb353240 f52f2cf0 dbe833c0 c0210cd0 00000001 dbe833dc dbe833c0
Oct  9 12:20:05 sargon/sargon kernel: Call Trace:
Oct  9 12:20:05 sargon/sargon kernel: [<c015eb6e>] cache_flusharray+0x70/0x96
Oct  9 12:20:05 sargon/sargon kernel: [<c015eca7>] kmem_cache_free+0x7d/0x96
Oct  9 12:20:05 sargon/sargon kernel: [<c0210cd0>] xfs_finish_reclaim+0x121/0x129
Oct  9 12:20:05 sargon/sargon kernel: [<c021e892>] xfs_fs_clear_inode+0x8f/0xb1
Oct  9 12:20:05 sargon/sargon kernel: [<c0172379>] clear_inode+0xa2/0xf0
Oct  9 12:20:05 sargon/sargon kernel: [<c0172639>] dispose_list+0x46/0xc2
Oct  9 12:20:05 sargon/sargon kernel: [<c0172841>] shrink_icache_memory+0x18c/0x1b4
Oct  9 12:20:05 sargon/sargon kernel: [<c014ca77>] shrink_slab+0xd9/0x138
Oct  9 12:20:05 sargon/sargon kernel: [<c014ce04>] kswapd+0x297/0x3e8
Oct  9 12:20:05 sargon/sargon kernel: [<c012d2f1>] autoremove_wake_function+0x0/0x35
Oct  9 12:20:05 sargon/sargon kernel: [<c014cb6d>] kswapd+0x0/0x3e8
Oct  9 12:20:05 sargon/sargon kernel: [<c012d22b>] kthread+0x38/0x5d
Oct  9 12:20:05 sargon/sargon kernel: [<c012d1f3>] kthread+0x0/0x5d
Oct  9 12:20:05 sargon/sargon kernel: [<c0104963>] kernel_thread_helper+0x7/0x10
Oct  9 12:20:05 sargon/sargon kernel: =======================
Oct  9 12:20:05 sargon/sargon kernel: Code: 00 3d 00 40 02 00 75 03 8b 52 0c 8b 02 84 c0 78 04 0f 0b eb fe 8b 72 1c 8b 54 24 28 8b 46 04 8b bc 95 88 00 00 00 8b 16 89 42 04 <89> 10 2b 4e 0c c7 06 00 01 10 00 c7 46 04 00 02 20 00 89 c8 f7
Oct  9 12:20:05 sargon/sargon kernel: EIP: [<c015edc2>] free_block+0x67/0xfe SS:ESP 0068:f797de5c
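
A quick way to compare the oopses in this thread is to pull the faulting function and the running process out of each one. The following Python sketch does that with two regular expressions. The sample lines are copied from the logs in this thread; the helper name and the regular expressions are made up for illustration and only cover the i386 oops layout seen here.

```python
import re

# "EIP is at <function>+0x<offset>/0x<size>"
EIP_RE = re.compile(r"EIP is at (?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x[0-9a-f]+")
# "Process <name> (pid: <pid>, ..."
PROC_RE = re.compile(r"Process (?P<comm>\S+) \(pid: (?P<pid>\d+)")


def summarize_oopses(log_text):
    """Return a list of (faulting function, process name) pairs, one per oops."""
    oopses = []
    for line in log_text.splitlines():
        m = EIP_RE.search(line)
        if m:
            # A new "EIP is at" line starts a new oops record.
            oopses.append([m["func"], None])
            continue
        m = PROC_RE.search(line)
        if m and oopses and oopses[-1][1] is None:
            oopses[-1][1] = m["comm"]
    return [tuple(entry) for entry in oopses]


SAMPLE = """\
Oct  3 18:55:23  kernel: EIP is at cache_alloc_refill+0x11c/0x4f0
Oct  3 18:55:23  kernel: Process rsync (pid: 22409, ti=d2530000 task=da1e8070 task.ti=d2530000)
Oct  3 18:55:26  kernel: EIP is at alloc_inode+0x20/0x170
Oct  3 18:55:26  kernel: Process imapd (pid: 20054, ti=d6746000 task=e04a20b0 task.ti=d6746000)
Oct  9 12:20:05 sargon/sargon kernel: EIP is at free_block+0x67/0xfe
Oct  9 12:20:05 sargon/sargon kernel: Process kswapd0 (pid: 248, ti=f797c000 task=f7c22a90 task.ti=f797c000)
"""

for func, comm in summarize_oopses(SAMPLE):
    print(f"{comm}: faulted in {func}")
```

On the sample this lists rsync in cache_alloc_refill, imapd in alloc_inode and kswapd0 in free_block: three unrelated processes, all faulting while allocating or freeing memory.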


* Re: Crash on 2.6.21.7 Vanilla + DRBD 0.7
  2007-10-10 15:15   ` Louis-David Mitterrand
@ 2007-10-10 22:46     ` David Chinner
  2007-10-11  7:36       ` Laurent CARON
  0 siblings, 1 reply; 12+ messages in thread
From: David Chinner @ 2007-10-10 22:46 UTC
  To: xfs

On Wed, Oct 10, 2007 at 05:15:37PM +0200, Louis-David Mitterrand wrote:
> On Thu, Oct 04, 2007 at 04:35:05PM +0200, Hannes Dorbath wrote:
> > On 04.10.2007 15:33, vindex+lists-xfs@apartia.org wrote:
> >> What do you think about it ?
> >
> > Another thing, is there a special reason why you use DRBD 0.7.x branch? 
> > AFAIK it will still deadlock with kernel 2.6.22. You are not running .22, 
> > but if you upgrade you might have serious problems. You should really go 
> > with DRBD 8.0.6 if you can.
> >
> 
> After upgrading to 8.0.6 we had another xfs-related crash 4 days later. 
> In desperation we are about to abandon xfs and convert this huge 
> partition to ext3. Is there anything else we could try before taking that 
> step?

Yes, please turn on slab debugging so we can try to find the cause
of this memory corruption. I expect the problem to be in DRBD as
nobody else running XFS is reporting this problem. However, without
running with the right debug options enabled we'll never get to
the bottom of the problem.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: Crash on 2.6.21.7 Vanilla + DRBD 0.7
  2007-10-10 22:46     ` David Chinner
@ 2007-10-11  7:36       ` Laurent CARON
  0 siblings, 0 replies; 12+ messages in thread
From: Laurent CARON @ 2007-10-11  7:36 UTC
  To: xfs; +Cc: David Chinner

David Chinner wrote:
> Yes, please turn on slab debugging so we can try to find the cause
> of this memory corruption. I expect the problem to be in DRBD as
> nobody else running XFS is reporting this problem. However, without
> running with the right debug options enabled we'll never get to
> the bottom of the problem.


Hi,

Before installing a new kernel, I've got a (little?) clue.

The setup is as follows:

The drbd partition is mounted to a generic mountpoint
/dev/drbd1 on /data/web type xfs (rw)

The subdirectories of /data/web are mounted (mount --bind) to another
directory
/data/web/var/www on /var/www type xfs (rw,bind)
/data/web/var/lib/postgresql on /var/lib/postgresql type xfs (rw,bind)
/data/web/var/lib/mysql on /var/lib/mysql type xfs (rw,bind)


It seems I made a mistake here.

mount -t xfs --bind /data/web/var/www /var/www

instead of

mount --bind /data/web/var/www /var/www

Could this be 'THE' root of the problem (if the system then sees
/var/www as a 'real' XFS filesystem and not a directory mounted over) ?

Thanks

Laurent


* Re: Crash on 2.6.21.7 Vanilla + DRBD 0.7
  2007-10-07 23:36 ` David Chinner
@ 2007-10-08 12:47   ` Laurent CARON
  0 siblings, 0 replies; 12+ messages in thread
From: Laurent CARON @ 2007-10-08 12:47 UTC
  To: David Chinner; +Cc: drbd-user, linux-kernel

David Chinner wrote:
> Can you turn on slab debug and poisoning and see where
> the kernel fails with that? e.g. set:
> 
> CONFIG_DEBUG_SLAB=y
> CONFIG_DEBUG_SLAB_LEAK=y


I was a little worried about leaving those servers in such a bad state,
and went the "easy" way.

I did upgrade from drbd 0.7.X to latest svn 8.0.X

Laurent

PS: Should this bug reappear, I'll change the kernel's config, and let
you know the result.


* Re: Crash on 2.6.21.7 Vanilla + DRBD 0.7
  2007-10-04  7:29 Laurent Caron
@ 2007-10-07 23:36 ` David Chinner
  2007-10-08 12:47   ` Laurent CARON
  0 siblings, 1 reply; 12+ messages in thread
From: David Chinner @ 2007-10-07 23:36 UTC
  To: Laurent Caron; +Cc: drbd-user, linux-kernel

On Thu, Oct 04, 2007 at 09:29:40AM +0200, Laurent Caron wrote:
> 
> Hi,
> 
> I compiled a fresh 2.6.21.7 kernel from kernel.org (no distro patch, ....), and latest svn (3062) 0.7.X drbd.
> 
> After just 2 days of uptime, I experienced another crash.
> 
> I wonder if it is an XFS related bug, a DRBD one, or related to XFS on top of DRBD.
> 
> This bug seems to occur with intensive IO operations.
> 
> What do you think about it ?

This still looks like memory corruption of some sort. I'd
suspect DRBD at this point because nobody is reporting this against
other block devices in 2.6.21...

> Oct  3 18:55:23  kernel: Oops: 0002 [#1]
> Oct  3 18:55:23  kernel: SMP 
> Oct  3 18:55:23  kernel: CPU:    7
> Oct  3 18:55:23  kernel: EIP:    0060:[<c016540c>]    Not tainted VLI
> Oct  3 18:55:23  kernel: EFLAGS: 00010046   (2.6.21-dl380-g5-20071001 #1)
> Oct  3 18:55:23  kernel: EIP is at cache_alloc_refill+0x11c/0x4f0

Can you turn on slab debug and poisoning and see where
the kernel fails with that? e.g. set:

CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
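
Before booting the debug kernel, it can help to confirm that the two options Dave names actually ended up in the build configuration. A minimal sketch, assuming the usual NAME=y line format of a kernel .config; the function name and the sample text are hypothetical, and only the two option names come from the message above.

```python
# Options named in the message above.
REQUIRED = ("CONFIG_DEBUG_SLAB", "CONFIG_DEBUG_SLAB_LEAK")


def missing_debug_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


# Hypothetical excerpt of a .config in which only one option is enabled.
sample_config = """\
CONFIG_SLAB=y
CONFIG_DEBUG_SLAB=y
# CONFIG_DEBUG_SLAB_LEAK is not set
"""

print(missing_debug_options(sample_config))
```

In practice the text would be read from the .config in the kernel build tree; an empty list means both options are enabled.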


* Crash on 2.6.21.7 Vanilla + DRBD 0.7
@ 2007-10-04  7:29 Laurent Caron
  2007-10-07 23:36 ` David Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Laurent Caron @ 2007-10-04  7:29 UTC
  To: drbd-user; +Cc: linux-kernel


Hi,

I compiled a fresh 2.6.21.7 kernel from kernel.org (no distro patch, ....), and latest svn (3062) 0.7.X drbd.

After just 2 days of uptime, I experienced another crash.

I wonder if it is an XFS related bug, a DRBD one, or related to XFS on top of DRBD.

This bug seems to occur with intensive IO operations.

What do you think about it ?

Thanks

Laurent




Oct  3 18:55:23  kernel: Oops: 0002 [#1]
Oct  3 18:55:23  kernel: SMP 
Oct  3 18:55:23  kernel: CPU:    7
Oct  3 18:55:23  kernel: EIP:    0060:[<c016540c>]    Not tainted VLI
Oct  3 18:55:23  kernel: EFLAGS: 00010046   (2.6.21-dl380-g5-20071001 #1)
Oct  3 18:55:23  kernel: EIP is at cache_alloc_refill+0x11c/0x4f0
Oct  3 18:55:23  kernel: eax: f79c2940   ebx: 00000015   ecx: 00000005   edx: 65b567b0
Oct  3 18:55:23  kernel: esi: 0000000a   edi: d5d26000   ebp: f79d03c0   esp: d2531c98
Oct  3 18:55:23  kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Oct  3 18:55:23  kernel: Process rsync (pid: 22409, ti=d2530000 task=da1e8070 task.ti=d2530000)
Oct  3 18:55:23  kernel: Stack: 00000010 000002d0 ce9ca0b8 000002d0 f79cfe00 f79d1c00 f79c2940 00000000 
Oct  3 18:55:23  kernel: 00000001 d2531cd4 ce9ca088 c022aade d5d2601c 00000282 f79cfe00 000002d0 
Oct  3 18:55:23  kernel: f79cfe00 c01652e6 00000000 00000001 c0265a4e 00000011 d2531d60 d7acfb40 
Oct  3 18:55:23  kernel: Call Trace:
Oct  3 18:55:23  kernel: [<c022aade>] xfs_da_brelse+0x6e/0xb0
Oct  3 18:55:23  kernel: [<c01652e6>] kmem_cache_alloc+0x46/0x50
Oct  3 18:55:23  kernel: [<c0265a4e>] kmem_zone_alloc+0x4e/0xc0
Oct  3 18:55:23  kernel: [<c027015f>] xfs_fs_alloc_inode+0xf/0x20
Oct  3 18:55:23  kernel: [<c017bbd6>] alloc_inode+0x16/0x170
Oct  3 18:55:23  kernel: [<c017bd89>] iget_locked+0x59/0x130
Oct  3 18:55:23  kernel: [<c023fa38>] xfs_iget+0x78/0x160
Oct  3 18:55:23  kernel: [<c020a49c>] xfs_acl_vget+0x6c/0x160
Oct  3 18:55:23  kernel: [<c025b143>] xfs_dir_lookup_int+0x93/0xf0
Oct  3 18:55:23  kernel: [<c025ea55>] xfs_lookup+0x75/0xa0
Oct  3 18:55:23  kernel: [<c026d0c2>] xfs_vn_lookup+0x52/0x90
Oct  3 18:55:23  kernel: [<c016fd08>] do_lookup+0x148/0x190
Oct  3 18:55:23  kernel: [<c0171cb4>] __link_path_walk+0x814/0xe40
Oct  3 18:55:23  kernel: [<c0172325>] link_path_walk+0x45/0xc0
Oct  3 18:55:23  kernel: [<c0172581>] do_path_lookup+0x81/0x1c0
Oct  3 18:55:23  kernel: [<c01712c3>] getname+0xb3/0xe0
Oct  3 18:55:23  kernel: [<c0172f8b>] __user_walk_fd+0x3b/0x60
Oct  3 18:55:23  kernel: [<c016bcdf>] vfs_lstat_fd+0x1f/0x50
Oct  3 18:55:23  kernel: [<c016bd5f>] sys_lstat64+0xf/0x30
Oct  3 18:55:23  kernel: [<c01040b0>] sysenter_past_esp+0x5d/0x81
Oct  3 18:55:23  kernel: =======================
Oct  3 18:55:23  kernel: Code: 10 8b 77 14 01 c2 8b 44 24 30 8b 34 b0 89 77 14 89 54 8d 14 8d 51 01 89 55 00 8b 44 24 10 8b 77 10 3b 70 5c 72 c0 8b 17 8b 47 04 <89> 42 04 89 10 83 7f 14 ff c7 07 00 01 10 00 c7 47 04 00 02 20 
Oct  3 18:55:23  kernel: EIP: [<c016540c>] cache_alloc_refill+0x11c/0x4f0 SS:ESP 0068:d2531c98
Oct  3 18:55:26  kernel: Oops: 0002 [#2]
Oct  3 18:55:26  kernel: SMP 
Oct  3 18:55:26  kernel: CPU:    7
Oct  3 18:55:26  kernel: EIP:    0060:[<c017bbe0>]    Not tainted VLI
Oct  3 18:55:26  kernel: EFLAGS: 00210282   (2.6.21-dl380-g5-20071001 #1)
Oct  3 18:55:26  kernel: EIP is at alloc_inode+0x20/0x170
Oct  3 18:55:26  kernel: eax: b4fd89ba   ebx: b4fd89ba   ecx: b4fd89ba   edx: b4fd89ba
Oct  3 18:55:26  kernel: esi: f29bb000   edi: f29bb000   ebp: ca743575   esp: d6747c64
Oct  3 18:55:26  kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Oct  3 18:55:26  kernel: Process imapd (pid: 20054, ti=d6746000 task=e04a20b0 task.ti=d6746000)
Oct  3 18:55:26  kernel: Stack: 00000000 c76fe0dc f29bb000 c017bd89 ffffffff ffffffff c04abda0 ca743575 
Oct  3 18:55:26  kernel: ca743575 f53b5800 c023fa38 cb2b4524 1b2595f3 00000020 f0dd7400 ded8b7a8 
Oct  3 18:55:26  kernel: 00000000 f53b5800 c04abda0 cb2b4524 cb2b4524 ca743575 00000000 00000004 
Oct  3 18:55:26  kernel: Call Trace:
Oct  3 18:55:26  kernel: [<c017bd89>] iget_locked+0x59/0x130
Oct  3 18:55:26  kernel: [<c023fa38>] xfs_iget+0x78/0x160
Oct  3 18:55:26  kernel: [<c025a697>] xfs_trans_iget+0x117/0x190
Oct  3 18:55:26  kernel: [<c0243d87>] xfs_ialloc+0xc7/0x570
Oct  3 18:55:26  kernel: [<c024aabc>] xlog_grant_push_ail+0x3c/0x150
Oct  3 18:55:26  kernel: [<c025b261>] xfs_dir_ialloc+0x81/0x2d0
Oct  3 18:55:26  kernel: [<c025855b>] xfs_trans_reserve+0xab/0x230
Oct  3 18:55:26  kernel: [<c0261aa5>] xfs_create+0x395/0x6a0
Oct  3 18:55:26  kernel: [<c023eac5>] xfs_iunlock+0x85/0xa0
Oct  3 18:55:26  kernel: [<c026d6b5>] xfs_vn_mknod+0x235/0x360
Oct  3 18:55:26  kernel: [<c01705cd>] vfs_create+0xdd/0x140
Oct  3 18:55:26  kernel: [<c01738ae>] open_namei+0x58e/0x5f0
Oct  3 18:55:26  kernel: [<c016716e>] do_filp_open+0x2e/0x60
Oct  3 18:55:26  kernel: [<c0166e4f>] get_unused_fd+0x4f/0xb0
Oct  3 18:55:26  kernel: [<c01671ea>] do_sys_open+0x4a/0xe0
Oct  3 18:55:26  kernel: [<c01672bc>] sys_open+0x1c/0x20
Oct  3 18:55:26  kernel: [<c01040b0>] sysenter_past_esp+0x5d/0x81
Oct  3 18:55:26  kernel: =======================
Oct  3 18:55:26  kernel: Code: 90 90 90 90 90 90 90 90 90 90 90 57 56 89 c6 53 8b 40 20 8b 10 85 d2 0f 84 1e 01 00 00 89 f0 ff d2 89 c3 85 db 0f 84 ee 00 00 00 <89> b3 98 00 00 00 b9 02 00 00 00 0f b6 46 10 8d bb f8 00 00 00 
Oct  3 18:55:26  kernel: EIP: [<c017bbe0>] alloc_inode+0x20/0x170 SS:ESP 0068:d6747c64


