* [bug, 2.6.37-current] Assertion failed: atomic_read(&pag->pag_ref) == 0
@ 2010-10-26  7:13 Dave Chinner
  2010-10-28 11:58 ` Christoph Hellwig
  2010-11-04 23:00 ` Dave Chinner
  0 siblings, 2 replies; 4+ messages in thread
From: Dave Chinner @ 2010-10-26  7:13 UTC (permalink / raw)
  To: xfs

Folks,

Since the mainline merge, I've been getting unmount failures during
shutdown that look like:

Unmounting local filesystems...done.
Shutting down LVM Volume Groups[ 7088.820123] Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 259
[ 7088.821811] ------------[ cut here ]------------
[ 7088.822594] kernel BUG at fs/xfs/support/debug.c:108!
[ 7088.823383] invalid opcode: 0000 [#1] SMP 
[ 7088.824019] last sysfs file: /sys/devices/system/node/node0/cpumap
[ 7088.824045] CPU 1 
[ 7088.824045] Modules linked in:
[ 7088.824045] 
[ 7088.824045] Pid: 0, comm: kworker/0:0 Not tainted 2.6.36-dgc+ #587 /Bochs
[ 7088.824045] RIP: 0010:[<ffffffff814b74cf>]  [<ffffffff814b74cf>] assfail+0x1f/0x30
[ 7088.824045] RSP: 0018:ffff8800df003e50  EFLAGS: 00010286
[ 7088.824045] RAX: 0000000000000069 RBX: ffff88011760a400 RCX: 0000000000000001
[ 7088.824045] RDX: ffff88011b7742c0 RSI: 0000000000000001 RDI: 0000000000000246
[ 7088.824045] RBP: ffff8800df003e50 R08: 0000000000000001 R09: 0000000000000001
[ 7088.824045] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff81ef8f00
[ 7088.824045] R13: ffff880117118df8 R14: ffff8800df1cecf0 R15: ffff880116ebf6e8
[ 7088.824045] FS:  0000000000000000(0000) GS:ffff8800df000000(0000) knlGS:0000000000000000
[ 7088.824045] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7088.824045] CR2: 00007ffd8c8b6990 CR3: 0000000001edb000 CR4: 00000000000006e0
[ 7088.824045] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7088.824045] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7088.824045] Process kworker/0:0 (pid: 0, threadinfo ffff88011b776000, task ffff88011b7742c0)
[ 7088.824045] Stack:
[ 7088.824045]  ffff8800df003e70 ffffffff81499007 ffff8800df003e70 ffff8800df1cecc0
[ 7088.824045] <0> ffff8800df003ed0 ffffffff810e900a 0000000000000001 000000000000000a
[ 7088.824045] <0> ffff880100000006 0000000000000202 0000000000000100 0000000000000048
[ 7088.824045] Call Trace:
[ 7088.824045]  <IRQ> 
[ 7088.824045]  [<ffffffff81499007>] __xfs_free_perag+0x37/0x50
[ 7088.824045]  [<ffffffff810e900a>] __rcu_process_callbacks+0x13a/0x3e0
[ 7088.824045]  [<ffffffff810e92d8>] rcu_process_callbacks+0x28/0x50
[ 7088.824045]  [<ffffffff8108848d>] __do_softirq+0xcd/0x290
[ 7088.824045]  [<ffffffff810a8808>] ? hrtimer_interrupt+0x138/0x250
[ 7088.824045]  [<ffffffff81037f5c>] call_softirq+0x1c/0x50
[ 7088.824045]  [<ffffffff810398dd>] do_softirq+0x9d/0xd0
[ 7088.824045]  [<ffffffff810881e5>] irq_exit+0x95/0xa0
[ 7088.824045]  [<ffffffff81b06380>] smp_apic_timer_interrupt+0x70/0x9b
[ 7088.824045]  [<ffffffff81037a13>] apic_timer_interrupt+0x13/0x20
[ 7088.824045]  <EOI> 
[ 7088.824045]  [<ffffffff81060f6b>] ? native_safe_halt+0xb/0x10
[ 7088.824045]  [<ffffffff810baded>] ? trace_hardirqs_on+0xd/0x10
[ 7088.824045]  [<ffffffff8103fd70>] default_idle+0x50/0xb0
[ 7088.824045]  [<ffffffff81035e28>] cpu_idle+0x78/0x100
[ 7088.824045]  [<ffffffff81af627b>] start_secondary+0x1ac/0x1b1
[ 7088.824045] Code: 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 31 c0 89 d1 48 89 f2 48 89 fe 48 c7 c7 08 38 df 81 e8 7b 34 64 00 <0f> 0b eb fe 66 66 66 66 2e  
[ 7088.824045] RIP  [<ffffffff814b74cf>] assfail+0x1f/0x30
[ 7088.824045]  RSP <ffff8800df003e50>
[ 7088.863091] ---[ end trace ec76f8135c3adba9 ]---

I'm not seeing failures during xfstests runs; it seems that dbench may be the
trigger.  Is anyone else seeing reference counting problems like this on the
current Linus tree?
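
For context, the assert that fires is in the RCU callback that frees the
per-AG structures at unmount time.  From memory it looks roughly like this
(sketch only, not verbatim from the tree):

	/* fs/xfs/xfs_mount.c - unmount-time perag free path (sketch) */
	STATIC void
	__xfs_free_perag(
		struct rcu_head	*head)
	{
		struct xfs_perag *pag = container_of(head,
						struct xfs_perag, rcu_head);

		/*
		 * Every xfs_perag_get() must have been balanced by an
		 * xfs_perag_put() by now; a non-zero pag_ref here means
		 * someone leaked a reference.
		 */
		ASSERT(atomic_read(&pag->pag_ref) == 0);
		kmem_free(pag);
	}

So something is still holding (or has leaked) a perag reference by the time
the callback runs.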

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [bug, 2.6.37-current] Assertion failed: atomic_read(&pag->pag_ref) == 0
  2010-10-26  7:13 [bug, 2.6.37-current] Assertion failed: atomic_read(&pag->pag_ref) == 0 Dave Chinner
@ 2010-10-28 11:58 ` Christoph Hellwig
  2010-10-30 14:38   ` Christoph Hellwig
  2010-11-04 23:00 ` Dave Chinner
  1 sibling, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2010-10-28 11:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Oct 26, 2010 at 06:13:56PM +1100, Dave Chinner wrote:
> Folks,
> 
> Since the mainline merge, I've been getting unmount failures during
> shutdown that look like:

I've done quite a few mainline runs, but haven't seen anything like
that.  On the other hand I see completely silent hangs in 076 once in
a while.


* Re: [bug, 2.6.37-current] Assertion failed: atomic_read(&pag->pag_ref) == 0
  2010-10-28 11:58 ` Christoph Hellwig
@ 2010-10-30 14:38   ` Christoph Hellwig
  0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2010-10-30 14:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Oct 28, 2010 at 07:58:00AM -0400, Christoph Hellwig wrote:
> On Tue, Oct 26, 2010 at 06:13:56PM +1100, Dave Chinner wrote:
> > Folks,
> > 
> > Since the mainline merge, I've been getting unmount failures during
> > shutdown that look like:
> 
> I've done quite a few mainline runs, but haven't seen anything like
> that.  On the other hand I see completely silent hangs in 076 once in
> a while.

I can't reproduce that anymore since upgrading to a newer Linus tree.
But I've hit the following twice now:

070 7s ...
[ 1208.818651] Assertion failed: dp->i_d.di_forkoff, file: fs/xfs/xfs_attr_leaf.c, line: 373
[ 1208.823852] ------------[ cut here ]------------
[ 1208.825880] kernel BUG at fs/xfs/support/debug.c:108!
[ 1208.827647] invalid opcode: 0000 [#1] SMP 
[ 1208.827724] last sysfs file: /sys/devices/virtio-pci/virtio1/block/vdb/removable
[ 1208.827724] Modules linked in:
[ 1208.827724] 
[ 1208.827724] Pid: 4422, comm: fsstress Not tainted 2.6.36-xfs+ #68 /Bochs
[ 1208.827724] EIP: 0060:[<c04eba6e>] EFLAGS: 00010282 CPU: 0
[ 1208.827724] EIP is at assfail+0x1e/0x30
[ 1208.827724] EAX: 00000060 EBX: f4c8d530 ECX: ffffffa0 EDX: 016e0000
[ 1208.827724] ESI: 00000051 EDI: 00000000 EBP: ece99d20 ESP: ece99d10
[ 1208.827724]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 1208.827724] Process fsstress (pid: 4422, ti=ece98000 task=f6310040 task.ti=ece98000)
[ 1208.827724] Stack:
[ 1208.827724]  c0b9dbb8 c0b5fa7e c0b5f912 00000175 ece99d58 c04848a9 00000001 f65cea88
[ 1208.827724] <0> 02e99d44 f5533258 ece99d78 f4c8d530 00000002 00000003 00000051 f4c8d530
[ 1208.827724] <0> 00000000 0000000a ece99df8 c04807d5 00000000 00000004 00000003 ece99d78
[ 1208.827724] Call Trace:
[ 1208.827724]  [<c04848a9>] ? xfs_attr_shortform_remove+0x159/0x270
[ 1208.827724]  [<c04807d5>] ? xfs_attr_remove_int+0x225/0x280
[ 1208.827724]  [<c04b512a>] ? xfs_iunlock+0xaa/0x160
[ 1208.827724]  [<c04808cc>] ? xfs_attr_remove+0x9c/0xc0
[ 1208.827724]  [<c04eb769>] ? xfs_xattr_set+0x89/0x90
[ 1208.827724]  [<c02283ec>] ? generic_removexattr+0x8c/0xa0
[ 1208.827724]  [<c022884e>] ? vfs_removexattr+0x7e/0xf0
[ 1208.827724]  [<c01f0e46>] ? might_fault+0x46/0xa0
[ 1208.827724]  [<c02288fb>] ? removexattr+0x3b/0x60
[ 1208.827724]  [<c0206d8e>] ? kfree_debugcheck+0xe/0x30
[ 1208.827724]  [<c02071bc>] ? cache_free_debugcheck+0x17c/0x250
[ 1208.827724]  [<c067f994>] ? debug_check_no_obj_freed+0x124/0x180
[ 1208.827724]  [<c0195d16>] ? debug_check_no_locks_freed+0xb6/0x140
[ 1208.827724]  [<c0207345>] ? kmem_cache_free+0xb5/0x120
[ 1208.827724]  [<c0195c5b>] ? trace_hardirqs_on+0xb/0x10
[ 1208.827724]  [<c0219d4a>] ? user_path_at+0x4a/0x80
[ 1208.827724]  [<c02120d2>] ? sys_stat64+0x22/0x30
[ 1208.827724]  [<c02289fb>] ? sys_lremovexattr+0x6b/0x80
[ 1208.827724]  [<c0911f2d>] ? syscall_call+0x7/0xb


* Re: [bug, 2.6.37-current] Assertion failed: atomic_read(&pag->pag_ref) == 0
  2010-10-26  7:13 [bug, 2.6.37-current] Assertion failed: atomic_read(&pag->pag_ref) == 0 Dave Chinner
  2010-10-28 11:58 ` Christoph Hellwig
@ 2010-11-04 23:00 ` Dave Chinner
  1 sibling, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2010-11-04 23:00 UTC (permalink / raw)
  To: xfs

On Tue, Oct 26, 2010 at 06:13:56PM +1100, Dave Chinner wrote:
> Folks,
> 
> Since the mainline merge, I've been getting unmount failures during
> shutdown that look like:
> 
> Unmounting local filesystems...done.
> Shutting down LVM Volume Groups[ 7088.820123] Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 259
> [ 7088.821811] ------------[ cut here ]------------
> [ 7088.822594] kernel BUG at fs/xfs/support/debug.c:108!
> [ 7088.823383] invalid opcode: 0000 [#1] SMP 
> [ 7088.824019] last sysfs file: /sys/devices/system/node/node0/cpumap
> [ 7088.824045] CPU 1 
> [ 7088.824045] Modules linked in:
> [ 7088.824045] 
> [ 7088.824045] Pid: 0, comm: kworker/0:0 Not tainted 2.6.36-dgc+ #587 /Bochs
> [ 7088.824045] RIP: 0010:[<ffffffff814b74cf>]  [<ffffffff814b74cf>] assfail+0x1f/0x30
> [ 7088.824045] RSP: 0018:ffff8800df003e50  EFLAGS: 00010286
> [ 7088.824045] RAX: 0000000000000069 RBX: ffff88011760a400 RCX: 0000000000000001
> [ 7088.824045] RDX: ffff88011b7742c0 RSI: 0000000000000001 RDI: 0000000000000246
> [ 7088.824045] RBP: ffff8800df003e50 R08: 0000000000000001 R09: 0000000000000001
> [ 7088.824045] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff81ef8f00
> [ 7088.824045] R13: ffff880117118df8 R14: ffff8800df1cecf0 R15: ffff880116ebf6e8
> [ 7088.824045] FS:  0000000000000000(0000) GS:ffff8800df000000(0000) knlGS:0000000000000000
> [ 7088.824045] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 7088.824045] CR2: 00007ffd8c8b6990 CR3: 0000000001edb000 CR4: 00000000000006e0
> [ 7088.824045] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 7088.824045] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 7088.824045] Process kworker/0:0 (pid: 0, threadinfo ffff88011b776000, task ffff88011b7742c0)
> [ 7088.824045] Stack:
> [ 7088.824045]  ffff8800df003e70 ffffffff81499007 ffff8800df003e70 ffff8800df1cecc0
> [ 7088.824045] <0> ffff8800df003ed0 ffffffff810e900a 0000000000000001 000000000000000a
> [ 7088.824045] <0> ffff880100000006 0000000000000202 0000000000000100 0000000000000048
> [ 7088.824045] Call Trace:
> [ 7088.824045]  <IRQ> 
> [ 7088.824045]  [<ffffffff81499007>] __xfs_free_perag+0x37/0x50
> [ 7088.824045]  [<ffffffff810e900a>] __rcu_process_callbacks+0x13a/0x3e0
> [ 7088.824045]  [<ffffffff810e92d8>] rcu_process_callbacks+0x28/0x50
> [ 7088.824045]  [<ffffffff8108848d>] __do_softirq+0xcd/0x290
> [ 7088.824045]  [<ffffffff810a8808>] ? hrtimer_interrupt+0x138/0x250
> [ 7088.824045]  [<ffffffff81037f5c>] call_softirq+0x1c/0x50
> [ 7088.824045]  [<ffffffff810398dd>] do_softirq+0x9d/0xd0
> [ 7088.824045]  [<ffffffff810881e5>] irq_exit+0x95/0xa0
> [ 7088.824045]  [<ffffffff81b06380>] smp_apic_timer_interrupt+0x70/0x9b
> [ 7088.824045]  [<ffffffff81037a13>] apic_timer_interrupt+0x13/0x20
> [ 7088.824045]  <EOI> 
> [ 7088.824045]  [<ffffffff81060f6b>] ? native_safe_halt+0xb/0x10
> [ 7088.824045]  [<ffffffff810baded>] ? trace_hardirqs_on+0xd/0x10
> [ 7088.824045]  [<ffffffff8103fd70>] default_idle+0x50/0xb0
> [ 7088.824045]  [<ffffffff81035e28>] cpu_idle+0x78/0x100
> [ 7088.824045]  [<ffffffff81af627b>] start_secondary+0x1ac/0x1b1
> [ 7088.824045] Code: 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 31 c0 89 d1 48 89 f2 48 89 fe 48 c7 c7 08 38 df 81 e8 7b 34 64 00 <0f> 0b eb fe 66 66 66 66 2e  
> [ 7088.824045] RIP  [<ffffffff814b74cf>] assfail+0x1f/0x30
> [ 7088.824045]  RSP <ffff8800df003e50>
> [ 7088.863091] ---[ end trace ec76f8135c3adba9 ]---
> 
> I'm not seeing failures during xfstests runs; it seems that dbench may be the
> trigger.  Is anyone else seeing reference counting problems like this on the
> current Linus tree?

Ok, found the bug - it's in the reclaim scalability patchset that
was merged into .37-rc1: when the shrinker skips a locked AG it
misses an xfs_perag_put() call.  I'll push out a patch soon.
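
The pattern is the usual get/put imbalance in the AG walk.  Roughly
(illustrative only - names and structure are from memory, not the exact
code):

	ag = 0;
	while ((pag = xfs_perag_get_tag(mp, ag, XFS_ICI_RECLAIM_TAG))) {
		ag = pag->pag_agno + 1;

		if (!mutex_trylock(&pag->pag_ici_reclaim_lock)) {
			/*
			 * AG is already being reclaimed by someone else,
			 * so skip it.  The buggy version just did
			 * "continue" here without dropping the reference
			 * taken by xfs_perag_get_tag() above, so pag_ref
			 * never gets back to zero and the unmount-time
			 * assert fires.
			 */
			xfs_perag_put(pag);	/* the missing put */
			continue;
		}

		/* ... reclaim inodes in this AG ... */

		mutex_unlock(&pag->pag_ici_reclaim_lock);
		xfs_perag_put(pag);
	}

i.e. every reference taken in the walk needs a matching put on every exit
path, including the skip case.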

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


