On Thu, Mar 17 2011 at 2:31pm -0400, Jens Axboe wrote:

> On 2011-03-17 16:51, Mike Snitzer wrote:
> > On Tue, Mar 08 2011 at 5:05pm -0500,
> > Mike Snitzer wrote:
> >
> >> On Tue, Mar 08 2011 at 3:27pm -0500,
> >> Jens Axboe wrote:
> >>
> >>> On 2011-03-08 21:21, Mike Snitzer wrote:
> >>>> On Tue, Mar 08 2011 at 7:16am -0500,
> >>>> Jens Axboe wrote:
> >>>>
> >>>>> On 2011-03-03 23:13, Mike Snitzer wrote:
> >>>>>> I'm now hitting a lockdep issue, while running a 'for-2.6.39/stack-plug'
> >>>>>> kernel, when I try an fsync heavy workload to a request-based mpath
> >>>>>> device (the kernel ultimately goes down in flames, I've yet to look at
> >>>>>> the crashdump I took)
> >>>>>
> >>>>> Mike, can you re-run with the current stack-plug branch? I've fixed the
> >>>>> !CONFIG_BLOCK and rebase issues, and also added a change for this flush
> >>>>> on schedule event. It's run outside of the runqueue lock now, so
> >>>>> hopefully that should solve this one.
> >>>>
> >>>> Works for me, thanks.
> >>>
> >>> Super, thanks! Out of curiosity, did you use dm/md?
> >>
> >> Yes, I've been using a request-based DM multipath device.
> >
> > Against latest 'for-2.6.39/core', I just ran that same fsync heavy
> > workload against XFS (on top of a DM multipath volume). ffsb induced the
> > following hangs (ripple effect causing NetworkManager to get hung up on
> > this data-only XFS volume, etc):
>
> Ugh. Care to send the recipe for how to reproduce this? Essentially
> it just looks like IO got stuck.

Here is the sequence to reproduce with the attached fsync-happy.ffsb
(I've been running the following in a KVM guest):

mkfs.xfs /dev/mapper/mpathb
mount /dev/mapper/mpathb /mnt/test
./ffsb fsync-happy.ffsb

And I just verified that the deadlock does _not_ seem to occur without
DM multipath -- by directly using an underlying SCSI device instead.

So multipath is exposing this somehow (it could just be changing timing?).

Mike

p.s. though I did get this lockdep warning when unmounting the xfs
filesystem:

=================================
[ INFO: inconsistent lock state ]
2.6.38-rc6-snitm+ #8
---------------------------------
inconsistent {IN-RECLAIM_FS-R} -> {RECLAIM_FS-ON-W} usage.
umount/1524 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (iprune_sem){+++++-}, at: [] evict_inodes+0x2f/0x107
{IN-RECLAIM_FS-R} state was registered at:
  [] __lock_acquire+0x3a4/0xd26
  [] lock_acquire+0xe3/0x110
  [] down_read+0x51/0x96
  [] shrink_icache_memory+0x4a/0x215
  [] shrink_slab+0xe0/0x164
  [] kswapd+0x5e7/0x9dc
  [] kthread+0xa0/0xa8
  [] kernel_thread_helper+0x4/0x10
irq event stamp: 73433
hardirqs last  enabled at (73433): [] debug_check_no_locks_freed+0x12e/0x145
hardirqs last disabled at (73432): [] debug_check_no_locks_freed+0x43/0x145
softirqs last  enabled at (72996): [] __do_softirq+0x1b4/0x1d3
softirqs last disabled at (72991): [] call_softirq+0x1c/0x28

other info that might help us debug this:
2 locks held by umount/1524:
 #0:  (&type->s_umount_key#24){++++++}, at: [] deactivate_super+0x3d/0x4a
 #1:  (iprune_sem){+++++-}, at: [] evict_inodes+0x2f/0x107

stack backtrace:
Pid: 1524, comm: umount Not tainted 2.6.38-rc6-snitm+ #8
Call Trace:
 [] ? valid_state+0x17e/0x191
 [] ? check_usage_backwards+0x0/0x81
 [] ? mark_lock+0x152/0x22d
 [] ? mark_held_locks+0x52/0x70
 [] ? lockdep_trace_alloc+0x99/0xbb
 [] ? kmem_cache_alloc+0x30/0x145
 [] ? kmem_zone_alloc+0x69/0xb1 [xfs]
 [] ? kmem_zone_zalloc+0x14/0x35 [xfs]
 [] ? _xfs_trans_alloc+0x27/0x64 [xfs]
 [] ? xfs_trans_alloc+0x9f/0xac [xfs]
 [] ? up_read+0x23/0x3c
 [] ? xfs_iunlock+0x7e/0xbc [xfs]
 [] ? xfs_free_eofblocks+0xea/0x1f1 [xfs]
 [] ? xfs_inactive+0x108/0x3a6 [xfs]
 [] ? lockdep_init_map+0xa6/0x11b
 [] ? xfs_fs_evict_inode+0xf6/0xfe [xfs]
 [] ? evict+0x24/0x8c
 [] ? dispose_list+0x31/0xaf
 [] ? evict_inodes+0xf0/0x107
 [] ? generic_shutdown_super+0x5c/0xdf
 [] ? kill_block_super+0x27/0x69
 [] ? deactivate_locked_super+0x26/0x4b
 [] ? deactivate_super+0x45/0x4a
 [] ? mntput_no_expire+0x105/0x10e
 [] ? sys_umount+0x2d9/0x304
 [] ? trace_hardirqs_on_caller+0x11d/0x141
 [] ? system_call_fastpath+0x16/0x1b
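
[Editor's note: a sketch of the no-multipath control run described above, for
anyone trying to reproduce. The device name /dev/sdb is only an assumed
example; substitute whichever SCSI path 'multipath -ll mpathb' reports on
your setup.]

multipath -ll mpathb          # list the SCSI paths backing the mpath device
mkfs.xfs /dev/sdb             # assumed example path; bypasses DM multipath
mount /dev/sdb /mnt/test
./ffsb fsync-happy.ffsb       # same fsync-heavy profile, no multipath in the stack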