* 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
@ 2012-07-12  5:47 Mike Galbraith
  2012-07-12  8:44 ` Mike Galbraith
                   ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Mike Galbraith @ 2012-07-12  5:47 UTC (permalink / raw)
  To: linux-rt-users; +Cc: LKML, linux-fsdevel, Thomas Gleixner, Steven Rostedt

Greetings,

I'm chasing btrfs critters in an enterprise 3.0-rt kernel, and just
checked to see if they're alive in virgin latest/greatest rt kernel.  

Both are indeed alive and well, ie I didn't break it, nor did the
zillion patches in enterprise base kernel, so others may have an
opportunity to meet these critters up close and personal as well.  

Unfortunately, this kernel refuses to crash dump, but both appear to be
my exact critters, so I'll report them, then go back to squabbling with
the things where I can at least rummage in piles of wreckage to gather
rocks and sharpen sticks. 

Box: x3550 M3 1 x E5620, HT enabled ATM.

Reproducer1: xfstests 006 in a loop, box doesn't last long at all. 

[  189.300478] ------------[ cut here ]------------
[  189.300482] kernel BUG at kernel/rtmutex_common.h:75!
[  189.300486] invalid opcode: 0000 [#1] PREEMPT SMP 
[  189.300489] CPU 2 
[  189.300490] Modules linked in: ibm_rtl nfsd ipmi_devintf lockd nfs_acl auth_rpcgss sunrpc ipmi_si ipmi_msghandler ipv6 af_packet edd cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf fuse loop dm_mod tpm_tis tpm ioatdma shpchp tpm_bios pci_hotplug sg cdc_ether usbnet i2c_i801 serio_raw mii pcspkr i7core_edac i2c_core dca iTCO_wdt edac_core button iTCO_vendor_support bnx2 usbhid hid uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif rtc_cmos usb_common fan processor ata_generic ata_piix libata megaraid_sas scsi_mod thermal thermal_sys hwmon
[  189.300531] 
[  189.300534] Pid: 15363, comm: btrfs-worker-1 Not tainted 3.4.4-rt13 #24 IBM System x3550 M3 -[7944K3G]-/69Y5698     
[  189.300539] RIP: 0010:[<ffffffff81089db9>]  [<ffffffff81089db9>] __try_to_take_rt_mutex+0x169/0x170
[  189.300551] RSP: 0018:ffff880174527b90  EFLAGS: 00010296
[  189.300554] RAX: 0000000000000000 RBX: ffff880177a0edd0 RCX: 0000000000000001
[  189.300557] RDX: 0000000000000000 RSI: ffff8801760a77c8 RDI: ffff8801760a77b0
[  189.300559] RBP: ffff880174527bd0 R08: 0000000000000001 R09: 0000000000000001
[  189.300562] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801791812c0
[  189.300565] R13: ffff880177a0edd8 R14: ffff880177a0edd0 R15: ffff8801791812c0
[  189.300569] FS:  0000000000000000(0000) GS:ffff88017f240000(0000) knlGS:0000000000000000
[  189.300572] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  189.300575] CR2: 00007f6449423f90 CR3: 000000000180e000 CR4: 00000000000007e0
[  189.300578] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  189.300582] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  189.300585] Process btrfs-worker-1 (pid: 15363, threadinfo ffff880174526000, task ffff8801791812c0)
[  189.300587] Stack:
[  189.300589]  ffff880179234500 ffff88017927b6b0 ffff88017a6a4180 ffff880177a0edd0
[  189.300595]  ffff880175b10da0 ffff880177ad9e98 ffff880177a0edd0 ffff8801791812c0
[  189.300599]  ffff880174527ca0 ffffffff814c466e 0000000000011200 ffff88017f24ca40
[  189.300604] Call Trace:
[  189.300611]  [<ffffffff814c466e>] rt_spin_lock_slowlock+0x4e/0x291
[  189.300618]  [<ffffffff8110ce64>] ? kmem_cache_alloc+0x114/0x1f0
[  189.300626]  [<ffffffff8114f710>] ? bvec_alloc_bs+0x60/0x110
[  189.300631]  [<ffffffff814c4a01>] rt_spin_lock+0x21/0x30
[  189.300636]  [<ffffffff81244b63>] schedule_bio+0x63/0x130
[  189.300640]  [<ffffffff8114f8f7>] ? bio_clone+0x47/0x90
[  189.300645]  [<ffffffff8124a862>] btrfs_map_bio+0xc2/0x230
[  189.300650]  [<ffffffff81215086>] __btree_submit_bio_done+0x16/0x20
[  189.300654]  [<ffffffff812157b8>] run_one_async_done+0xa8/0xc0
[  189.300658]  [<ffffffff8124c5c8>] run_ordered_completions+0x88/0xe0
[  189.300663]  [<ffffffff8124cfb5>] worker_loop+0xc5/0x430
[  189.300669]  [<ffffffff814c32f0>] ? __schedule+0x2b0/0x630
[  189.300673]  [<ffffffff8124cef0>] ? btrfs_queue_worker+0x1e0/0x1e0
[  189.300677]  [<ffffffff8124cef0>] ? btrfs_queue_worker+0x1e0/0x1e0
[  189.300684]  [<ffffffff81058246>] kthread+0x96/0xa0
[  189.300690]  [<ffffffff81061ae4>] ? finish_task_switch+0x54/0xd0
[  189.300695]  [<ffffffff814cbc64>] kernel_thread_helper+0x4/0x10
[  189.300700]  [<ffffffff810581b0>] ? __init_kthread_worker+0x50/0x50
[  189.300704]  [<ffffffff814cbc60>] ? gs_change+0x13/0x13
[  189.300706] Code: 02 ff ff ff e9 49 ff ff ff 49 39 f5 74 18 4d 8d b4 24 b0 05 00 00 4c 89 f7 e8 74 ae 43 00 49 89 c7 e9 67 ff ff ff 4c 89 e0 eb aa <0f> 0b 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 18 e8 ef 
[  189.300735] RIP  [<ffffffff81089db9>] __try_to_take_rt_mutex+0x169/0x170
[  189.300740]  RSP <ffff880174527b90>
[  189.636837] ---[ end trace 0000000000000002 ]---

From 3.0-rt, which will crash dump:

crash> struct rt_mutex 0xffff8801770601c8
struct rt_mutex {
  wait_lock = {
    raw_lock = {
      slock = 7966
    }
  }, 
  wait_list = {
    node_list = {
      next = 0xffff880175eedbe0, 
      prev = 0xffff880175eedbe0
    }, 
    rawlock = 0xffff880175eedbd8, 
    spinlock = 0x0
  }, 
  owner = 0x1, 
  save_state = 0, 
  file = 0x0, 
  name = 0xffffffff81781b9b "&(&device->io_lock)->lock", 
  line = 0, 
  magic = 0x0
}
crash> struct list_head 0xffff880175eedbe0
struct list_head {
  next = 0x6b6b6b6b6b6b6b6b, 
  prev = 0x6b6b6b6b6b6b6b6b
}

Reproducer2: dbench -t 30 8

[  692.857164] 
[  692.857165] ============================================
[  692.863963] [ BUG: circular locking deadlock detected! ]
[  692.869264] Not tainted
[  692.871708] --------------------------------------------
[  692.877008] btrfs-delayed-m/1404 is deadlocking current task dbench/7937
[  692.877009] 
[  692.885183] 
[  692.885184] 1) dbench/7937 is trying to acquire this lock:
[  692.892149]  [ffff88014d6aea80] {&(&eb->lock)->lock}
[  692.897102] .. ->owner: ffff880175808501
[  692.901018] .. held by:   btrfs-delayed-m: 1404 [ffff880175808500, 120]
[  692.907657] 
[  692.907657] 2) btrfs-delayed-m/1404 is blocked on this lock:
[  692.914797]  [ffff88014bf58d60] {&(&eb->lock)->lock}
[  692.919751] .. ->owner: ffff880175186101
[  692.923672] .. held by:            dbench: 7937 [ffff880175186100, 120]
[  692.930309] 
[  692.930309] btrfs-delayed-m/1404's [blocked] stackdump:
[  692.930310] 
[  692.938504]  ffff880177575aa0 0000000000000046 ffff88014bf58d60 000000000000fb00
[  692.938507]  000000000000fb00 ffff880177575fd8 000000000000fb00 ffff880177574000
[  692.938509]  ffff880177575fd8 000000000000fb00 ffff88017662f240 ffff880175808500
[  692.960635] Call Trace:
[  692.963085]  [<ffffffff814c68e9>] schedule+0x29/0x90
[  692.963087]  [<ffffffff814c745d>] rt_spin_lock_slowlock+0xfd/0x330
[  692.963090]  [<ffffffff814c7a39>] __rt_spin_lock+0x9/0x10
[  692.963092]  [<ffffffff814c7b27>] rt_read_lock+0x27/0x40
[  692.963096]  [<ffffffff812550cf>] btrfs_clear_lock_blocking_rw+0x6f/0x180
[  692.963099]  [<ffffffff811fabd2>] btrfs_clear_path_blocking+0x32/0x70
[  692.963102]  [<ffffffff81200342>] btrfs_search_slot+0x6b2/0x810
[  692.963105]  [<ffffffff812148da>] btrfs_lookup_inode+0x2a/0xa0
[  692.963107]  [<ffffffff814c7312>] ? rt_mutex_lock+0x12/0x20
[  692.963111]  [<ffffffff8126d0bc>] btrfs_update_delayed_inode+0x6c/0x160
[  692.963113]  [<ffffffff814c7ab9>] ? _mutex_unlock+0x9/0x10
[  692.963116]  [<ffffffff8126e142>] btrfs_async_run_delayed_node_done+0x182/0x1a0
[  692.963119]  [<ffffffff8124ed5f>] worker_loop+0xaf/0x430
[  692.963121]  [<ffffffff8124ecb0>] ? btrfs_queue_worker+0x1e0/0x1e0
[  692.963123]  [<ffffffff8124ecb0>] ? btrfs_queue_worker+0x1e0/0x1e0
[  692.963127]  [<ffffffff8105850e>] kthread+0xae/0xc0
[  692.963129]  [<ffffffff814c68e9>] ? schedule+0x29/0x90
[  692.963133]  [<ffffffff810015bc>] ? __switch_to+0x14c/0x410
[  692.963137]  [<ffffffff81061e44>] ? finish_task_switch+0x54/0xd0
[  692.963140]  [<ffffffff814ceca4>] kernel_thread_helper+0x4/0x10
[  692.963143]  [<ffffffff81058460>] ? __init_kthread_worker+0x50/0x50
[  692.963145]  [<ffffffff814ceca0>] ? gs_change+0x13/0x13
[  692.963146] 
[  692.963147] dbench/7937's [current] stackdump:
[  692.963147] 
[  693.098724] Pid: 7937, comm: dbench Not tainted 3.4.4-rt13 #25
[  693.104544] Call Trace:
[  693.106993]  [<ffffffff8108b436>] debug_rt_mutex_print_deadlock+0x176/0x190
[  693.106995]  [<ffffffff814c74ec>] rt_spin_lock_slowlock+0x18c/0x330
[  693.106998]  [<ffffffff814c7a39>] __rt_spin_lock+0x9/0x10
[  693.107000]  [<ffffffff814c7b27>] rt_read_lock+0x27/0x40
[  693.107002]  [<ffffffff8125538c>] btrfs_try_tree_read_lock+0x4c/0x80
[  693.107004]  [<ffffffff812001bd>] btrfs_search_slot+0x52d/0x810
[  693.107007]  [<ffffffff812027ba>] btrfs_next_leaf+0xea/0x440
[  693.107010]  [<ffffffff81238ed8>] ? btrfs_token_dir_data_len+0x58/0xd0
[  693.107012]  [<ffffffff81238ed8>] ? btrfs_token_dir_data_len+0x58/0xd0
[  693.107016]  [<ffffffff81222e17>] btrfs_real_readdir+0x247/0x610
[  693.107020]  [<ffffffff81131350>] ? sys_ioctl+0xa0/0xa0
[  693.107022]  [<ffffffff81131350>] ? sys_ioctl+0xa0/0xa0
[  693.107024]  [<ffffffff81131660>] vfs_readdir+0xb0/0xd0
[  693.107026]  [<ffffffff81131840>] sys_getdents64+0x80/0xe0
[  693.107030]  [<ffffffff814cd9b9>] system_call_fastpath+0x16/0x1b
[  693.107032] [ turning off deadlock detection.Please report this trace. ]
[  693.107033] 




* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-12  5:47 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free! Mike Galbraith
@ 2012-07-12  8:44 ` Mike Galbraith
  2012-07-12  9:53   ` Mike Galbraith
  2012-07-12 11:07 ` Thomas Gleixner
  2012-07-13 12:50 ` Chris Mason
  2 siblings, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2012-07-12  8:44 UTC (permalink / raw)
  To: linux-rt-users; +Cc: LKML, linux-fsdevel, Thomas Gleixner, Steven Rostedt

On Thu, 2012-07-12 at 07:47 +0200, Mike Galbraith wrote: 
> Greetings,
> 
> I'm chasing btrfs critters in an enterprise 3.0-rt kernel, and just
> checked to see if they're alive in virgin latest/greatest rt kernel.  
> 
> Both are indeed alive and well, ie I didn't break it, nor did the
> zillion patches in enterprise base kernel, so others may have an
> opportunity to meet these critters up close and personal as well.

3.2-rt both explodes and deadlocks as well.  3.0-rt (virgin I mean) does
neither, so with enough re-integrate investment, it might be bisectable.

Rummaging in btrfs, that begins to look downright attractive ;-)

-Mike



* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-12  8:44 ` Mike Galbraith
@ 2012-07-12  9:53   ` Mike Galbraith
  2012-07-12 11:43     ` Thomas Gleixner
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2012-07-12  9:53 UTC (permalink / raw)
  To: linux-rt-users; +Cc: LKML, linux-fsdevel, Thomas Gleixner, Steven Rostedt

On Thu, 2012-07-12 at 10:44 +0200, Mike Galbraith wrote: 
> On Thu, 2012-07-12 at 07:47 +0200, Mike Galbraith wrote: 
> > Greetings,
> > 
> > I'm chasing btrfs critters in an enterprise 3.0-rt kernel, and just
> > checked to see if they're alive in virgin latest/greatest rt kernel.  
> > 
> > Both are indeed alive and well, ie I didn't break it, nor did the
> > zillion patches in enterprise base kernel, so others may have an
> > opportunity to meet these critters up close and personal as well.
> 
> 3.2-rt both explodes and deadlocks as well.  3.0-rt (virgin I mean) does
> neither, so with enough re-integrate investment, it might be bisectable.

Nope, virgin 3.0-rt just didn't feel like it at the time.  Booted it
again to run hefty test over lunch, it didn't survive 1 xfstests 006,
much less hundreds.

crash> bt
PID: 7604   TASK: ffff880174238b20  CPU: 0   COMMAND: "btrfs-worker-0"
 #0 [ffff88017455d9c8] machine_kexec at ffffffff81025794
 #1 [ffff88017455da28] crash_kexec at ffffffff8109781d
 #2 [ffff88017455daf8] panic at ffffffff814a0661
 #3 [ffff88017455db78] __try_to_take_rt_mutex at ffffffff81086d2f
 #4 [ffff88017455dbc8] rt_spin_lock_slowlock at ffffffff814a2670
 #5 [ffff88017455dca8] rt_spin_lock at ffffffff814a2db9
 #6 [ffff88017455dcb8] schedule_bio at ffffffff81243133
 #7 [ffff88017455dcf8] btrfs_map_bio at ffffffff812477be
 #8 [ffff88017455dd68] __btree_submit_bio_done at ffffffff812152f6
 #9 [ffff88017455dd78] run_one_async_done at ffffffff812148fa
#10 [ffff88017455dd98] run_ordered_completions at ffffffff812493e8
#11 [ffff88017455ddd8] worker_loop at ffffffff81249dc9
#12 [ffff88017455de88] kthread at ffffffff81070266
#13 [ffff88017455df48] kernel_thread_helper at ffffffff814a9be4
crash> struct rt_mutex 0xffff880174530108
struct rt_mutex {
  wait_lock = {
    raw_lock = {
      slock = 7966
    }
  }, 
  wait_list = {
    node_list = {
      next = 0xffff880175ecc970, 
      prev = 0xffff880175ecc970
    }, 
    rawlock = 0xffff880175ecc968, 
    spinlock = 0x0
  }, 
  owner = 0x1, 
  save_state = 0, 
  file = 0x0, 
  name = 0xffffffff81763d02 "&(&device->io_lock)->lock", 
  line = 0, 
  magic = 0x0
}



* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-12  5:47 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free! Mike Galbraith
  2012-07-12  8:44 ` Mike Galbraith
@ 2012-07-12 11:07 ` Thomas Gleixner
  2012-07-12 17:09   ` Chris Mason
  2012-07-13 12:50 ` Chris Mason
  2 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2012-07-12 11:07 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: linux-rt-users, LKML, linux-fsdevel, Steven Rostedt, Peter Zijlstra

On Thu, 12 Jul 2012, Mike Galbraith wrote:
> crash> struct rt_mutex 0xffff8801770601c8
> struct rt_mutex {
>   wait_lock = {
>     raw_lock = {
>       slock = 7966
>     }
>   }, 
>   wait_list = {
>     node_list = {
>       next = 0xffff880175eedbe0, 
>       prev = 0xffff880175eedbe0
>     }, 
>     rawlock = 0xffff880175eedbd8, 

Urgh. Here is something completely wrong. That should point to
wait_lock, i.e. the rt_mutex itself, but that points into lala land.

>     spinlock = 0x0
>   }, 
>   owner = 0x1, 
>   save_state = 0, 
>   file = 0x0, 
>   name = 0xffffffff81781b9b "&(&device->io_lock)->lock", 
>   line = 0, 
>   magic = 0x0
> }
> crash> struct list_head 0xffff880175eedbe0
> struct list_head {
>   next = 0x6b6b6b6b6b6b6b6b, 
>   prev = 0x6b6b6b6b6b6b6b6b
> }

That's POISON_FREE. How the heck can this happen ?
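(For reference: slab debug poisoning fills freed objects with the POISON_FREE byte 0x6b, which is exactly the 0x6b6b6b6b6b6b6b6b pattern in the list_head above. A tiny user-space sketch, purely illustrative:)

#include <stdio.h>
#include <string.h>

/* stand-in for the kernel's struct list_head */
struct list_head {
	struct list_head *next, *prev;
};

#define POISON_FREE 0x6b	/* byte the slab allocator writes over freed objects */

int main(void)
{
	struct list_head lh;

	/* simulate what slab poisoning does to an object on kfree() */
	memset(&lh, POISON_FREE, sizeof(lh));

	/* prints 0x6b6b6b6b6b6b6b6b for both pointers, as in the crash dump */
	printf("next = %p, prev = %p\n", (void *)lh.next, (void *)lh.prev);
	return 0;
}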

 
> Reproducer2: dbench -t 30 8
> 
> [  692.857164] 
> [  692.857165] ============================================
> [  692.863963] [ BUG: circular locking deadlock detected! ]
> [  692.869264] Not tainted
> [  692.871708] --------------------------------------------
> [  692.877008] btrfs-delayed-m/1404 is deadlocking current task dbench/7937
> [  692.877009] 
> [  692.885183] 
> [  692.885184] 1) dbench/7937 is trying to acquire this lock:
> [  692.892149]  [ffff88014d6aea80] {&(&eb->lock)->lock}
> [  692.897102] .. ->owner: ffff880175808501
> [  692.901018] .. held by:   btrfs-delayed-m: 1404 [ffff880175808500, 120]
> [  692.907657] 
> [  692.907657] 2) btrfs-delayed-m/1404 is blocked on this lock:
> [  692.914797]  [ffff88014bf58d60] {&(&eb->lock)->lock}
> [  692.919751] .. ->owner: ffff880175186101
> [  692.923672] .. held by:            dbench: 7937 [ffff880175186100, 120]
> [  692.930309] 
> [  692.930309] btrfs-delayed-m/1404's [blocked] stackdump:

Hrmm. Both locks are rw_locks and we prevent multiple readers for the
known reasons in RT. No idea how to deal with that one :(
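(For anyone not steeped in -rt: a rwlock there boils down to an rtmutex with a single owner, so read locks exclude each other. Two tasks each holding one eb->lock for read and wanting the other's is then a plain ABBA deadlock, which mainline's real rwlocks never see. A minimal user-space sketch; the lock and task names are made up:)

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* two extent-buffer-style locks, names purely illustrative */
static pthread_rwlock_t eb1 = PTHREAD_RWLOCK_INITIALIZER;
static pthread_rwlock_t eb2 = PTHREAD_RWLOCK_INITIALIZER;

static void *reader_a(void *arg)	/* think: dbench */
{
	(void)arg;
	pthread_rwlock_rdlock(&eb1);
	usleep(1000);
	pthread_rwlock_rdlock(&eb2);	/* wants the lock reader_b holds */
	puts("reader_a got both");
	pthread_rwlock_unlock(&eb2);
	pthread_rwlock_unlock(&eb1);
	return NULL;
}

static void *reader_b(void *arg)	/* think: btrfs-delayed-m */
{
	(void)arg;
	pthread_rwlock_rdlock(&eb2);
	usleep(1000);
	pthread_rwlock_rdlock(&eb1);	/* wants the lock reader_a holds */
	puts("reader_b got both");
	pthread_rwlock_unlock(&eb1);
	pthread_rwlock_unlock(&eb2);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	/*
	 * With real rwlocks both readers get in and this finishes.  Make the
	 * read lock exclusive (what -rt effectively does) and the same
	 * interleaving is the ABBA deadlock reported above.
	 */
	pthread_create(&a, NULL, reader_a, NULL);
	pthread_create(&b, NULL, reader_b, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}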

Thanks,

	tglx




* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-12  9:53   ` Mike Galbraith
@ 2012-07-12 11:43     ` Thomas Gleixner
  2012-07-12 11:57       ` Mike Galbraith
  0 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2012-07-12 11:43 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: linux-rt-users, LKML, linux-fsdevel, Steven Rostedt, Peter Zijlstra

On Thu, 12 Jul 2012, Mike Galbraith wrote:
> On Thu, 2012-07-12 at 10:44 +0200, Mike Galbraith wrote: 
> > On Thu, 2012-07-12 at 07:47 +0200, Mike Galbraith wrote: 
> > > Greetings,
> > > 
> > > I'm chasing btrfs critters in an enterprise 3.0-rt kernel, and just
> > > checked to see if they're alive in virgin latest/greatest rt kernel.  
> > > 
> > > Both are indeed alive and well, ie I didn't break it, nor did the
> > > zillion patches in enterprise base kernel, so others may have an
> > > opportunity to meet these critters up close and personal as well.
> > 
> > 3.2-rt both explodes and deadlocks as well.  3.0-rt (virgin I mean) does
> > neither, so with enough re-integrate investment, it might be bisectable.
> 
> Nope, virgin 3.0-rt just didn't feel like it at the time.  Booted it
> again to run hefty test over lunch, it didn't survive 1 xfstests 006,
> much less hundreds.
> 
> crash> bt
> PID: 7604   TASK: ffff880174238b20  CPU: 0   COMMAND: "btrfs-worker-0"
>  #0 [ffff88017455d9c8] machine_kexec at ffffffff81025794
>  #1 [ffff88017455da28] crash_kexec at ffffffff8109781d
>  #2 [ffff88017455daf8] panic at ffffffff814a0661
>  #3 [ffff88017455db78] __try_to_take_rt_mutex at ffffffff81086d2f
>  #4 [ffff88017455dbc8] rt_spin_lock_slowlock at ffffffff814a2670
>  #5 [ffff88017455dca8] rt_spin_lock at ffffffff814a2db9
>  #6 [ffff88017455dcb8] schedule_bio at ffffffff81243133
>  #7 [ffff88017455dcf8] btrfs_map_bio at ffffffff812477be
>  #8 [ffff88017455dd68] __btree_submit_bio_done at ffffffff812152f6
>  #9 [ffff88017455dd78] run_one_async_done at ffffffff812148fa
> #10 [ffff88017455dd98] run_ordered_completions at ffffffff812493e8
> #11 [ffff88017455ddd8] worker_loop at ffffffff81249dc9
> #12 [ffff88017455de88] kthread at ffffffff81070266
> #13 [ffff88017455df48] kernel_thread_helper at ffffffff814a9be4
> crash> struct rt_mutex 0xffff880174530108
> struct rt_mutex {
>   wait_lock = {
>     raw_lock = {
>       slock = 7966
>     }
>   }, 
>   wait_list = {
>     node_list = {
>       next = 0xffff880175ecc970, 
>       prev = 0xffff880175ecc970
>     }, 
>     rawlock = 0xffff880175ecc968, 

Pointer into lala land again.

rawlock points to ...968 and the node_list to ...970.

struct rt_mutex {
        raw_spinlock_t          wait_lock;
        struct plist_head       wait_list;

The raw_lock pointer of the plist_head is initialized in
__rt_mutex_init() so it points to wait_lock. 

Can you check the offset of wait_list vs. the rt_mutex itself?

I wouldn't be surprised if it's exactly 8 bytes. And then this thing
looks like a copied lock with stale pointers to hell. Eew.
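(IOW the lock embeds a pointer back into itself, so a plain struct copy keeps pointing at the original, and once the original is freed the copy chases poisoned memory. A minimal user-space sketch of that failure mode, with toy struct names:)

#include <stdio.h>
#include <string.h>

/* toy stand-ins, laid out like rt_mutex: lock word first, plist head at offset 8 */
struct toy_plist_head {
	void *node_list;
	int  *rawlock;		/* points back at the owning lock's wait_lock */
};

struct toy_rt_mutex {
	int wait_lock;
	struct toy_plist_head wait_list;
};

static void toy_rt_mutex_init(struct toy_rt_mutex *m)
{
	m->wait_lock = 0;
	m->wait_list.node_list = NULL;
	m->wait_list.rawlock = &m->wait_lock;	/* the self-reference __rt_mutex_init() sets up */
}

int main(void)
{
	struct toy_rt_mutex orig, copy;

	toy_rt_mutex_init(&orig);

	/* a structure copy without re-initializing the embedded lock */
	memcpy(&copy, &orig, sizeof(copy));

	/* copy.wait_list.rawlock still points into 'orig', 8 bytes below orig.wait_list */
	printf("orig.wait_lock at %p, copy.wait_lock at %p\n",
	       (void *)&orig.wait_lock, (void *)&copy.wait_lock);
	printf("copy.wait_list.rawlock = %p\n", (void *)copy.wait_list.rawlock);
	return 0;
}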

Thanks,

	tglx




* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-12 11:43     ` Thomas Gleixner
@ 2012-07-12 11:57       ` Mike Galbraith
  2012-07-12 13:31         ` Thomas Gleixner
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2012-07-12 11:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-rt-users, LKML, linux-fsdevel, Steven Rostedt, Peter Zijlstra

On Thu, 2012-07-12 at 13:43 +0200, Thomas Gleixner wrote: 
> On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 10:44 +0200, Mike Galbraith wrote: 
> > > On Thu, 2012-07-12 at 07:47 +0200, Mike Galbraith wrote: 
> > > > Greetings,
> > > > 
> > > > I'm chasing btrfs critters in an enterprise 3.0-rt kernel, and just
> > > > checked to see if they're alive in virgin latest/greatest rt kernel.  
> > > > 
> > > > Both are indeed alive and well, ie I didn't break it, nor did the
> > > > zillion patches in enterprise base kernel, so others may have an
> > > > opportunity to meet these critters up close and personal as well.
> > > 
> > > 3.2-rt both explodes and deadlocks as well.  3.0-rt (virgin I mean) does
> > > neither, so with enough re-integrate investment, it might be bisectable.
> > 
> > Nope, virgin 3.0-rt just didn't feel like it at the time.  Booted it
> > again to run hefty test over lunch, it didn't survive 1 xfstests 006,
> > much less hundreds.
> > 
> > crash> bt
> > PID: 7604   TASK: ffff880174238b20  CPU: 0   COMMAND: "btrfs-worker-0"
> >  #0 [ffff88017455d9c8] machine_kexec at ffffffff81025794
> >  #1 [ffff88017455da28] crash_kexec at ffffffff8109781d
> >  #2 [ffff88017455daf8] panic at ffffffff814a0661
> >  #3 [ffff88017455db78] __try_to_take_rt_mutex at ffffffff81086d2f
> >  #4 [ffff88017455dbc8] rt_spin_lock_slowlock at ffffffff814a2670
> >  #5 [ffff88017455dca8] rt_spin_lock at ffffffff814a2db9
> >  #6 [ffff88017455dcb8] schedule_bio at ffffffff81243133
> >  #7 [ffff88017455dcf8] btrfs_map_bio at ffffffff812477be
> >  #8 [ffff88017455dd68] __btree_submit_bio_done at ffffffff812152f6
> >  #9 [ffff88017455dd78] run_one_async_done at ffffffff812148fa
> > #10 [ffff88017455dd98] run_ordered_completions at ffffffff812493e8
> > #11 [ffff88017455ddd8] worker_loop at ffffffff81249dc9
> > #12 [ffff88017455de88] kthread at ffffffff81070266
> > #13 [ffff88017455df48] kernel_thread_helper at ffffffff814a9be4
> > crash> struct rt_mutex 0xffff880174530108
> > struct rt_mutex {
> >   wait_lock = {
> >     raw_lock = {
> >       slock = 7966
> >     }
> >   }, 
> >   wait_list = {
> >     node_list = {
> >       next = 0xffff880175ecc970, 
> >       prev = 0xffff880175ecc970
> >     }, 
> >     rawlock = 0xffff880175ecc968, 
> 
> Pointer into lala land again.

Yeah, and freed again.

> rawlock points to ...968 and the node_list to ...970.
> 
> struct rt_mutex {
>         raw_spinlock_t          wait_lock;
>         struct plist_head       wait_list;
> 
> The raw_lock pointer of the plist_head is initialized in
> __rt_mutex_init() so it points to wait_lock. 
> 
> Can you check the offset of wait_list vs. the rt_mutex itself?
> 
> I wouldn't be surprised if it's exactly 8 bytes. And then this thing
> looks like a copied lock with stale pointers to hell. Eew.

crash> struct rt_mutex -o
struct rt_mutex {
   [0] raw_spinlock_t wait_lock;
   [8] struct plist_head wait_list;
  [40] struct task_struct *owner;
  [48] int save_state;
  [56] const char *file;
  [64] const char *name;
  [72] int line;
  [80] void *magic;
}
SIZE: 88


-Mike



* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-12 11:57       ` Mike Galbraith
@ 2012-07-12 13:31         ` Thomas Gleixner
  2012-07-12 13:37           ` Mike Galbraith
  2012-07-13  6:31           ` Mike Galbraith
  0 siblings, 2 replies; 39+ messages in thread
From: Thomas Gleixner @ 2012-07-12 13:31 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: linux-rt-users, LKML, linux-fsdevel, Steven Rostedt, Peter Zijlstra

On Thu, 12 Jul 2012, Mike Galbraith wrote:
> On Thu, 2012-07-12 at 13:43 +0200, Thomas Gleixner wrote: 
> > rawlock points to ...968 and the node_list to ...970.
> > 
> > struct rt_mutex {
> >         raw_spinlock_t          wait_lock;
> >         struct plist_head       wait_list;
> > 
> > The raw_lock pointer of the plist_head is initialized in
> > __rt_mutex_init() so it points to wait_lock. 
> > 
> > Can you check the offset of wait_list vs. the rt_mutex itself?
> > 
> > I wouldn't be surprised if it's exactly 8 bytes. And then this thing
> > looks like a copied lock with stale pointers to hell. Eew.
> 
> crash> struct rt_mutex -o
> struct rt_mutex {
>    [0] raw_spinlock_t wait_lock;
>    [8] struct plist_head wait_list;

Bingo, that makes it more likely that this is caused by copying w/o
initializing the lock and then freeing the original structure.

A quick check for memcpy finds that __btrfs_close_devices() does a
memcpy of btrfs_device structs w/o initializing the lock in the new
copy, but I have no idea whether that's the place we are looking for.

Thanks,

	tglx

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 43baaf0..06c8ced 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -512,6 +512,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 		new_device->writeable = 0;
 		new_device->in_fs_metadata = 0;
 		new_device->can_discard = 0;
+		spin_lock_init(&new_device->io_lock);
 		list_replace_rcu(&device->dev_list, &new_device->dev_list);
 
 		call_rcu(&device->rcu, free_device);





* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-12 13:31         ` Thomas Gleixner
@ 2012-07-12 13:37           ` Mike Galbraith
  2012-07-12 13:43             ` Thomas Gleixner
  2012-07-13  6:31           ` Mike Galbraith
  1 sibling, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2012-07-12 13:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-rt-users, LKML, linux-fsdevel, Steven Rostedt, Peter Zijlstra

On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 13:43 +0200, Thomas Gleixner wrote: 
> > > rawlock points to ...968 and the node_list to ...970.
> > > 
> > > struct rt_mutex {
> > >         raw_spinlock_t          wait_lock;
> > >         struct plist_head       wait_list;
> > > 
> > > The raw_lock pointer of the plist_head is initialized in
> > > __rt_mutex_init() so it points to wait_lock. 
> > > 
> > > Can you check the offset of wait_list vs. the rt_mutex itself?
> > > 
> > > I wouldn't be surprised if it's exactly 8 bytes. And then this thing
> > > looks like a copied lock with stale pointers to hell. Eew.
> > 
> > crash> struct rt_mutex -o
> > struct rt_mutex {
> >    [0] raw_spinlock_t wait_lock;
> >    [8] struct plist_head wait_list;
> 
> Bingo, that makes it more likely that this is caused by copying w/o
> initializing the lock and then freeing the original structure.
> 
> A quick check for memcpy finds that __btrfs_close_devices() does a
> memcpy of btrfs_device structs w/o initializing the lock in the new
> copy, but I have no idea whether that's the place we are looking for.


Cool, you found one, thanks!  I'm setting boobytraps.

Um, correction, box says I'm setting _buggy_ boobytraps :)

Tomorrow-man will test this and frob traps anew.

> 
> Thanks,
> 
> 	tglx
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 43baaf0..06c8ced 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -512,6 +512,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
>  		new_device->writeable = 0;
>  		new_device->in_fs_metadata = 0;
>  		new_device->can_discard = 0;
> +		spin_lock_init(&new_device->io_lock);
>  		list_replace_rcu(&device->dev_list, &new_device->dev_list);
>  
>  		call_rcu(&device->rcu, free_device);
> 
> 
> 




* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-12 13:37           ` Mike Galbraith
@ 2012-07-12 13:43             ` Thomas Gleixner
  2012-07-12 13:48               ` Mike Galbraith
  0 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2012-07-12 13:43 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: linux-rt-users, LKML, linux-fsdevel, Steven Rostedt, Peter Zijlstra

On Thu, 12 Jul 2012, Mike Galbraith wrote:
> On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > A quick check for memcpy finds that __btrfs_close_devices() does a
> > memcpy of btrfs_device structs w/o initializing the lock in the new
> > copy, but I have no idea whether that's the place we are looking for.
> 
> 
> Cool, you found one, thanks!  I'm setting boobytraps.
> 
> Um, correction, box says I'm setting _buggy_ boobytraps :)
> 
> Tomorrow-man will test this and frob traps anew.

What kind of test setup do you have? i.e. raid, single disk ...

Thanks,

	tglx


* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-12 13:43             ` Thomas Gleixner
@ 2012-07-12 13:48               ` Mike Galbraith
  2012-07-12 13:51                 ` Mike Galbraith
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2012-07-12 13:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-rt-users, LKML, linux-fsdevel, Steven Rostedt, Peter Zijlstra

On Thu, 2012-07-12 at 15:43 +0200, Thomas Gleixner wrote: 
> On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > > A quick check for memcpy finds that __btrfs_close_devices() does a
> > > memcpy of btrfs_device structs w/o initializing the lock in the new
> > > copy, but I have no idea whether that's the place we are looking for.
> > 
> > 
> > Cool, you found one, thanks!  I'm setting boobytraps.
> > 
> > Um, correction, box says I'm setting _buggy_ boobytraps :)
> > 
> > Tomorrow-man will test this and frob traps anew.
> 
> What kind of test setup do you have? i.e. raid, single disk ...

Yeah, megaraid sas.. x3550 M3.

-Mike



* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-12 13:48               ` Mike Galbraith
@ 2012-07-12 13:51                 ` Mike Galbraith
  0 siblings, 0 replies; 39+ messages in thread
From: Mike Galbraith @ 2012-07-12 13:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-rt-users, LKML, linux-fsdevel, Steven Rostedt, Peter Zijlstra

On Thu, 2012-07-12 at 15:48 +0200, Mike Galbraith wrote: 
> On Thu, 2012-07-12 at 15:43 +0200, Thomas Gleixner wrote: 
> > On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > > > A quick check for memcpy finds that __btrfs_close_devices() does a
> > > > memcpy of btrfs_device structs w/o initializing the lock in the new
> > > > copy, but I have no idea whether that's the place we are looking for.
> > > 
> > > 
> > > Cool, you found one, thanks!  I'm setting boobytraps.
> > > 
> > > Um, correction, box says I'm setting _buggy_ boobytraps :)
> > > 
> > > Tomorrow-man will test this and frob traps anew.
> > 
> > What kind of test setup do you have? i.e. raid, single disk ...
> 
> Yeah, megaraid sas.. x3550 M3.

(one disk for OS, one disk for xfstests to mangle)



* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-12 11:07 ` Thomas Gleixner
@ 2012-07-12 17:09   ` Chris Mason
  2012-07-13 10:04     ` Thomas Gleixner
  0 siblings, 1 reply; 39+ messages in thread
From: Chris Mason @ 2012-07-12 17:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mike Galbraith, linux-rt-users, LKML, linux-fsdevel,
	Steven Rostedt, Peter Zijlstra

On Thu, Jul 12, 2012 at 05:07:58AM -0600, Thomas Gleixner wrote:
> On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > crash> struct rt_mutex 0xffff8801770601c8
> > struct rt_mutex {
> >   wait_lock = {
> >     raw_lock = {
> >       slock = 7966
> >     }
> >   }, 
> >   wait_list = {
> >     node_list = {
> >       next = 0xffff880175eedbe0, 
> >       prev = 0xffff880175eedbe0
> >     }, 
> >     rawlock = 0xffff880175eedbd8, 
> 
> Urgh. Here is something completely wrong. That should point to
> wait_lock, i.e. the rt_mutex itself, but that points into lala land.

This is probably the memcpy you found later this morning, right?

>  
> > Reproducer2: dbench -t 30 8
> > 
> > [  692.857164] 
> > [  692.857165] ============================================
> > [  692.863963] [ BUG: circular locking deadlock detected! ]
> > [  692.869264] Not tainted
> > [  692.871708] --------------------------------------------
> > [  692.877008] btrfs-delayed-m/1404 is deadlocking current task dbench/7937
> > [  692.877009] 
> > [  692.885183] 
> > [  692.885184] 1) dbench/7937 is trying to acquire this lock:
> > [  692.892149]  [ffff88014d6aea80] {&(&eb->lock)->lock}
> > [  692.897102] .. ->owner: ffff880175808501
> > [  692.901018] .. held by:   btrfs-delayed-m: 1404 [ffff880175808500, 120]
> > [  692.907657] 
> > [  692.907657] 2) btrfs-delayed-m/1404 is blocked on this lock:
> > [  692.914797]  [ffff88014bf58d60] {&(&eb->lock)->lock}
> > [  692.919751] .. ->owner: ffff880175186101
> > [  692.923672] .. held by:            dbench: 7937 [ffff880175186100, 120]
> > [  692.930309] 
> > [  692.930309] btrfs-delayed-m/1404's [blocked] stackdump:
> 
> Hrmm. Both locks are rw_locks and we prevent multiple readers for the
> known reasons in RT. No idea how to deal with that one :(

The reader/writer part in btrfs is just an optimization.  If we need
them to be all writer locks for RT purposes, that's not a problem.

But, before we go down that road, we do annotations trying
to make sure lockdep doesn't get confused about lock classes.  Basically
the tree is locked level by level.  So it's safe to take eb->lock while
holding eb->lock as long as you follow the rules.
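(To illustrate that level-by-level rule with a generic hand-over-hand descent; nothing here is btrfs code, node and field names are made up:)

#include <pthread.h>
#include <stdio.h>
#include <stddef.h>

/* toy tree node; 'lock' stands in for eb->lock */
struct node {
	pthread_mutex_t lock;
	struct node *child[2];
	int key;
};

/*
 * Hand-over-hand descent: take the child's lock while still holding the
 * parent's, then drop the parent.  Since every path acquires locks
 * strictly top-down, holding one node's lock while taking another can
 * never form an ABBA cycle; lockdep just needs per-level annotations to
 * know the nesting is intentional.
 */
static struct node *search(struct node *root, int key)
{
	struct node *cur = root, *next;

	pthread_mutex_lock(&cur->lock);
	while (cur->key != key) {
		next = cur->child[key > cur->key];
		if (!next) {
			pthread_mutex_unlock(&cur->lock);
			return NULL;
		}
		pthread_mutex_lock(&next->lock);	/* child, parent still held */
		pthread_mutex_unlock(&cur->lock);	/* then release the parent */
		cur = next;
	}
	return cur;				/* returned with cur->lock held */
}

static struct node leaf = { PTHREAD_MUTEX_INITIALIZER, { NULL, NULL }, 7 };
static struct node root = { PTHREAD_MUTEX_INITIALIZER, { &leaf, NULL }, 10 };

int main(void)
{
	struct node *hit = search(&root, 7);

	if (hit) {
		printf("found %d\n", hit->key);
		pthread_mutex_unlock(&hit->lock);
	}
	return 0;
}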

Are additional annotations required for RT?

-chris



* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-12 13:31         ` Thomas Gleixner
  2012-07-12 13:37           ` Mike Galbraith
@ 2012-07-13  6:31           ` Mike Galbraith
  2012-07-13  9:52             ` Thomas Gleixner
  1 sibling, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2012-07-13  6:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-rt-users, LKML, linux-fsdevel, Steven Rostedt, Peter Zijlstra

On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 13:43 +0200, Thomas Gleixner wrote: 
> > > rawlock points to ...968 and the node_list to ...970.
> > > 
> > > struct rt_mutex {
> > >         raw_spinlock_t          wait_lock;
> > >         struct plist_head       wait_list;
> > > 
> > > The raw_lock pointer of the plist_head is initialized in
> > > __rt_mutex_init() so it points to wait_lock. 
> > > 
> > > Can you check the offset of wait_list vs. the rt_mutex itself?
> > > 
> > > I wouldn't be surprised if it's exactly 8 bytes. And then this thing
> > > looks like a copied lock with stale pointers to hell. Eew.
> > 
> > crash> struct rt_mutex -o
> > struct rt_mutex {
> >    [0] raw_spinlock_t wait_lock;
> >    [8] struct plist_head wait_list;
> 
> Bingo, that makes it more likely that this is caused by copying w/o
> initializing the lock and then freeing the original structure.
> 
> A quick check for memcpy finds that __btrfs_close_devices() does a
> memcpy of btrfs_device structs w/o initializing the lock in the new
> copy, but I have no idea whether that's the place we are looking for.

Thanks a bunch Thomas.  I doubt I would have ever figured out that lala
land resulted from _copying_ a lock.  That's one I won't be forgetting
any time soon.  Box not only survived a few thousand xfstests 006 runs,
dbench seemed disinterested in deadlocking virgin 3.0-rt.

btrfs still locks up in my enterprise kernel, so I suppose I had better
plug your fix into 3.4-rt and see what happens, and go beat hell out of
virgin 3.0-rt again to be sure box really really survives dbench.

> 	tglx
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 43baaf0..06c8ced 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -512,6 +512,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
>  		new_device->writeable = 0;
>  		new_device->in_fs_metadata = 0;
>  		new_device->can_discard = 0;
> +		spin_lock_init(&new_device->io_lock);
>  		list_replace_rcu(&device->dev_list, &new_device->dev_list);
>  
>  		call_rcu(&device->rcu, free_device);
> 
> 
> 




* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-13  6:31           ` Mike Galbraith
@ 2012-07-13  9:52             ` Thomas Gleixner
  2012-07-13 10:14               ` Mike Galbraith
  0 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2012-07-13  9:52 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: linux-rt-users, LKML, linux-fsdevel, Steven Rostedt, Peter Zijlstra

On Fri, 13 Jul 2012, Mike Galbraith wrote:
> On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > Bingo, that makes it more likely that this is caused by copying w/o
> > initializing the lock and then freeing the original structure.
> > 
> > A quick check for memcpy finds that __btrfs_close_devices() does a
> > memcpy of btrfs_device structs w/o initializing the lock in the new
> > copy, but I have no idea whether that's the place we are looking for.
> 
> Thanks a bunch Thomas.  I doubt I would have ever figured out that lala
> land resulted from _copying_ a lock.  That's one I won't be forgetting
> any time soon.  Box not only survived a few thousand xfstests 006 runs,
> dbench seemed disinterested in deadlocking virgin 3.0-rt.

Cute. I think that the lock copying caused the deadlock problem as
the list pointed to the wrong place, so we might have ended up with
following down the wrong chain when walking the list as long as the
original struct was not freed. That beast is freed under RCU so there
could be a rcu read side critical section fiddling with the old lock
and cause utter confusion.
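(Sketch of the clone-and-replace pattern in question, abridged and not the verbatim btrfs code, error handling left out; the marked line is the one the fix adds:)

/*
 * The old 'device' stays visible to RCU readers until the grace period
 * ends, so any lock embedded in the clone has to be re-initialized
 * instead of inherited via memcpy(), otherwise its plist/list internals
 * keep pointing into memory that is about to be freed (and poisoned).
 */
new_device = kmalloc(sizeof(*new_device), GFP_NOFS);
memcpy(new_device, device, sizeof(*new_device));
spin_lock_init(&new_device->io_lock);	/* <-- the line the patch adds */
list_replace_rcu(&device->dev_list, &new_device->dev_list);
call_rcu(&device->rcu, free_device);	/* old struct freed once readers drain */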

/me goes and writes a nastigram^W proper changelog

> btrfs still locks up in my enterprise kernel, so I suppose I had better
> plug your fix into 3.4-rt and see what happens, and go beat hell out of
> virgin 3.0-rt again to be sure box really really survives dbench.

A test against 3.4-rt sans enterprise mess might be nice as well.

Thanks,

	tglx


* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-12 17:09   ` Chris Mason
@ 2012-07-13 10:04     ` Thomas Gleixner
  0 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2012-07-13 10:04 UTC (permalink / raw)
  To: Chris Mason
  Cc: Mike Galbraith, linux-rt-users, LKML, linux-fsdevel,
	Steven Rostedt, Peter Zijlstra

On Thu, 12 Jul 2012, Chris Mason wrote:
> On Thu, Jul 12, 2012 at 05:07:58AM -0600, Thomas Gleixner wrote:
> > On Thu, 12 Jul 2012, Mike Galbraith wrote:
> > > crash> struct rt_mutex 0xffff8801770601c8
> > > struct rt_mutex {
> > >   wait_lock = {
> > >     raw_lock = {
> > >       slock = 7966
> > >     }
> > >   }, 
> > >   wait_list = {
> > >     node_list = {
> > >       next = 0xffff880175eedbe0, 
> > >       prev = 0xffff880175eedbe0
> > >     }, 
> > >     rawlock = 0xffff880175eedbd8, 
> > 
> > Urgh. Here is something completely wrong. That should point to
> > wait_lock, i.e. the rt_mutex itself, but that points into lala land.
> 
> This is probably the memcpy you found later this morning, right?

As Mike found out, it looks like the culprit.
 
> The reader/writer part in btrfs is just an optimization.  If we need
> them to be all writer locks for RT purposes, that's not a problem.
> 
> But, before we go down that road, we do annotations trying
> to make sure lockdep doesn't get confused about lock classes.  Basically
> the tree is locked level by level.  So it's safe to take eb->lock while
> holding eb->lock as long as you follow the rules.
> 
> Are additional annotations required for RT?

I don't think so. I'm sure it has been caused by the lock copying as
well. Walking the wrong list can cause complete confusion all over the
place. So let's wait for Mike to beat the hell out of it.

Find the patch with a proper changelog below.

Thanks,

	tglx
------------------>
From: Thomas Gleixner <tglx@linutronix.de>
Date: Thu, 12 Jul 2012 15:30:02 +0200
Subject: btrfs: Init io_lock after cloning btrfs device struct

__btrfs_close_devices() clones btrfs device structs with
memcpy(). Some of the fields in the clone are reinitialized, but it
misses initializing io_lock. In mainline this goes unnoticed, but on RT
it leaves the plist pointing to the original, about-to-be-freed lock
struct.

Initialize io_lock after cloning, so no references to the original
struct are left.

Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 43baaf0..06c8ced 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -512,6 +512,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 		new_device->writeable = 0;
 		new_device->in_fs_metadata = 0;
 		new_device->can_discard = 0;
+		spin_lock_init(&new_device->io_lock);
 		list_replace_rcu(&device->dev_list, &new_device->dev_list);
 
 		call_rcu(&device->rcu, free_device);


* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-13  9:52             ` Thomas Gleixner
@ 2012-07-13 10:14               ` Mike Galbraith
  2012-07-13 10:26                 ` Thomas Gleixner
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2012-07-13 10:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-rt-users, LKML, linux-fsdevel, Steven Rostedt, Peter Zijlstra

On Fri, 2012-07-13 at 11:52 +0200, Thomas Gleixner wrote: 
> On Fri, 13 Jul 2012, Mike Galbraith wrote:
> > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > > Bingo, that makes it more likely that this is caused by copying w/o
> > > initializing the lock and then freeing the original structure.
> > > 
> > > A quick check for memcpy finds that __btrfs_close_devices() does a
> > > memcpy of btrfs_device structs w/o initializing the lock in the new
> > > copy, but I have no idea whether that's the place we are looking for.
> > 
> > Thanks a bunch Thomas.  I doubt I would have ever figured out that lala
> > land resulted from _copying_ a lock.  That's one I won't be forgetting
> > any time soon.  Box not only survived a few thousand xfstests 006 runs,
> > dbench seemed disinterested in deadlocking virgin 3.0-rt.
> 
> Cute. I think that the lock copying caused the deadlock problem as
> the list pointed to the wrong place, so we might have ended up with
> following down the wrong chain when walking the list as long as the
> original struct was not freed. That beast is freed under RCU so there
> could be a rcu read side critical section fiddling with the old lock
> and cause utter confusion.

Virgin 3.0-rt appears to really be solid.  But then it doesn't have
pesky rwlocks.

> /me goes and writes a nastigram^W proper changelog
> 
> > btrfs still locks up in my enterprise kernel, so I suppose I had better
> > plug your fix into 3.4-rt and see what happens, and go beat hell out of
> > virgin 3.0-rt again to be sure box really really survives dbench.
> 
> A test against 3.4-rt sans enterprise mess might be nice as well.

Enterprise is 3.0-stable with um 555 btrfs patches (oh dear).

Virgin 3.4-rt and 3.2-rt deadlock gripe.  Enterprise doesn't gripe, but
deadlocks, so I have another adventure in my future even if I figure out
wth to do about rwlocks.

-Mike



* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-13 10:14               ` Mike Galbraith
@ 2012-07-13 10:26                 ` Thomas Gleixner
  2012-07-13 10:47                   ` Chris Mason
  0 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2012-07-13 10:26 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: linux-rt-users, LKML, linux-fsdevel, Steven Rostedt, Peter Zijlstra

On Fri, 13 Jul 2012, Mike Galbraith wrote:
> On Fri, 2012-07-13 at 11:52 +0200, Thomas Gleixner wrote: 
> > On Fri, 13 Jul 2012, Mike Galbraith wrote:
> > > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > > > Bingo, that makes it more likely that this is caused by copying w/o
> > > > initializing the lock and then freeing the original structure.
> > > > 
> > > > A quick check for memcpy finds that __btrfs_close_devices() does a
> > > > memcpy of btrfs_device structs w/o initializing the lock in the new
> > > > copy, but I have no idea whether that's the place we are looking for.
> > > 
> > > Thanks a bunch Thomas.  I doubt I would have ever figured out that lala
> > > land resulted from _copying_ a lock.  That's one I won't be forgetting
> > > any time soon.  Box not only survived a few thousand xfstests 006 runs,
> > > dbench seemed disinterested in deadlocking virgin 3.0-rt.
> > 
> > Cute. I think that the lock copying caused the deadlock problem as
> > the list pointed to the wrong place, so we might have ended up with
> > following down the wrong chain when walking the list as long as the
> > original struct was not freed. That beast is freed under RCU so there
> > could be a rcu read side critical section fiddling with the old lock
> > and cause utter confusion.
> 
> Virgin 3.0-rt appears to really be solid.  But then it doesn't have
> pesky rwlocks.

Ah. So 3.0 is not having those rwlock thingies. Bummer.
 
> > /me goes and writes a nastigram^W proper changelog
> > 
> > > btrfs still locks up in my enterprise kernel, so I suppose I had better
> > > plug your fix into 3.4-rt and see what happens, and go beat hell out of
> > > virgin 3.0-rt again to be sure box really really survives dbench.
> > 
> > A test against 3.4-rt sans enterprise mess might be nice as well.
> 
> Enterprise is 3.0-stable with um 555 btrfs patches (oh dear).
> 
> Virgin 3.4-rt and 3.2-rt deadlock gripe.  Enterprise doesn't gripe, but
> deadlocks, so I have another adventure in my future even if I figure out
> wth to do about rwlocks.

Hrmpf. /me goes to stare into fs/btrfs/ some more.


* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-13 10:26                 ` Thomas Gleixner
@ 2012-07-13 10:47                   ` Chris Mason
  2012-07-13 12:50                     ` Mike Galbraith
  0 siblings, 1 reply; 39+ messages in thread
From: Chris Mason @ 2012-07-13 10:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mike Galbraith, linux-rt-users, LKML, linux-fsdevel,
	Steven Rostedt, Peter Zijlstra

On Fri, Jul 13, 2012 at 04:26:26AM -0600, Thomas Gleixner wrote:
> On Fri, 13 Jul 2012, Mike Galbraith wrote:
> > On Fri, 2012-07-13 at 11:52 +0200, Thomas Gleixner wrote: 
> > > On Fri, 13 Jul 2012, Mike Galbraith wrote:
> > > > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > > > > Bingo, that makes it more likely that this is caused by copying w/o
> > > > > initializing the lock and then freeing the original structure.
> > > > > 
> > > > > A quick check for memcpy finds that __btrfs_close_devices() does a
> > > > > memcpy of btrfs_device structs w/o initializing the lock in the new
> > > > > copy, but I have no idea whether that's the place we are looking for.
> > > > 
> > > > Thanks a bunch Thomas.  I doubt I would have ever figured out that lala
> > > > land resulted from _copying_ a lock.  That's one I won't be forgetting
> > > > any time soon.  Box not only survived a few thousand xfstests 006 runs,
> > > > dbench seemed disinterested in deadlocking virgin 3.0-rt.
> > > 
> > > Cute. I think that the lock copying caused the deadlock problem as
> > > the list pointed to the wrong place, so we might have ended up with
> > > following down the wrong chain when walking the list as long as the
> > > original struct was not freed. That beast is freed under RCU so there
> > > could be a rcu read side critical section fiddling with the old lock
> > > and cause utter confusion.
> > 
> > Virgin 3.0-rt appears to really be solid.  But then it doesn't have
> > pesky rwlocks.
> 
> Ah. So 3.0 is not having those rwlock thingies. Bummer.
>  
> > > /me goes and writes a nastigram^W proper changelog
> > > 
> > > > btrfs still locks up in my enterprise kernel, so I suppose I had better
> > > > plug your fix into 3.4-rt and see what happens, and go beat hell out of
> > > > virgin 3.0-rt again to be sure box really really survives dbench.
> > > 
> > > A test against 3.4-rt sans enterprise mess might be nice as well.
> > 
> > Enterprise is 3.0-stable with um 555 btrfs patches (oh dear).
> > 
> > Virgin 3.4-rt and 3.2-rt deadlock gripe.  Enterprise doesn't gripe, but
> > deadlocks, so I have another adventure in my future even if I figure out
> > wth to do about rwlocks.
> 
> Hrmpf. /me goes to stare into fs/btrfs/ some more.

Please post the deadlocks here, I'll help ;)

-chris



* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-13 10:47                   ` Chris Mason
@ 2012-07-13 12:50                     ` Mike Galbraith
  0 siblings, 0 replies; 39+ messages in thread
From: Mike Galbraith @ 2012-07-13 12:50 UTC (permalink / raw)
  To: Chris Mason
  Cc: Thomas Gleixner, linux-rt-users, LKML, linux-fsdevel,
	Steven Rostedt, Peter Zijlstra

On Fri, 2012-07-13 at 06:47 -0400, Chris Mason wrote: 
> On Fri, Jul 13, 2012 at 04:26:26AM -0600, Thomas Gleixner wrote:
> > On Fri, 13 Jul 2012, Mike Galbraith wrote:
> > > On Fri, 2012-07-13 at 11:52 +0200, Thomas Gleixner wrote: 
> > > > On Fri, 13 Jul 2012, Mike Galbraith wrote:
> > > > > On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote: 
> > > > > > Bingo, that makes it more likely that this is caused by copying w/o
> > > > > > initializing the lock and then freeing the original structure.
> > > > > > 
> > > > > > A quick check for memcpy finds that __btrfs_close_devices() does a
> > > > > > memcpy of btrfs_device structs w/o initializing the lock in the new
> > > > > > copy, but I have no idea whether that's the place we are looking for.
> > > > > 
> > > > > Thanks a bunch Thomas.  I doubt I would have ever figured out that lala
> > > > > land resulted from _copying_ a lock.  That's one I won't be forgetting
> > > > > any time soon.  Box not only survived a few thousand xfstests 006 runs,
> > > > > dbench seemed disinterested in deadlocking virgin 3.0-rt.
> > > > 
> > > > Cute. I think that the lock copying caused the deadlock problem as
> > > > the list pointed to the wrong place, so we might have ended up with
> > > > following down the wrong chain when walking the list as long as the
> > > > original struct was not freed. That beast is freed under RCU so there
> > > > could be a rcu read side critical section fiddling with the old lock
> > > > and cause utter confusion.
> > > 
> > > Virgin 3.0-rt appears to really be solid.  But then it doesn't have
> > > pesky rwlocks.
> > 
> > Ah. So 3.0 is not having those rwlock thingies. Bummer.
> >  
> > > > /me goes and writes a nastigram^W proper changelog
> > > > 
> > > > > btrfs still locks up in my enterprise kernel, so I suppose I had better
> > > > > plug your fix into 3.4-rt and see what happens, and go beat hell out of
> > > > > virgin 3.0-rt again to be sure box really really survives dbench.
> > > > 
> > > > A test against 3.4-rt sans enterprise mess might be nice as well.
> > > 
> > > Enterprise is 3.0-stable with um 555 btrfs patches (oh dear).
> > > 
> > > Virgin 3.4-rt and 3.2-rt deadlock gripe.  Enterprise doesn't gripe, but
> > > deadlocks, so I have another adventure in my future even if I figure out
> > > wth to do about rwlocks.
> > 
> > Hrmpf. /me goes to stare into fs/btrfs/ some more.
> 
> Please post the deadlocks here, I'll help ;)

This is the one from the top of the thread.  Below that is what it looks
like without the deadlock detector.

[  692.857165] ============================================
[  692.863963] [ BUG: circular locking deadlock detected! ]
[  692.869264] Not tainted
[  692.871708] --------------------------------------------
[  692.877008] btrfs-delayed-m/1404 is deadlocking current task dbench/7937
[  692.877009] 
[  692.885183] 
[  692.885184] 1) dbench/7937 is trying to acquire this lock:
[  692.892149]  [ffff88014d6aea80] {&(&eb->lock)->lock}
[  692.897102] .. ->owner: ffff880175808501
[  692.901018] .. held by:   btrfs-delayed-m: 1404 [ffff880175808500, 120]
[  692.907657] 
[  692.907657] 2) btrfs-delayed-m/1404 is blocked on this lock:
[  692.914797]  [ffff88014bf58d60] {&(&eb->lock)->lock}
[  692.919751] .. ->owner: ffff880175186101
[  692.923672] .. held by:            dbench: 7937 [ffff880175186100, 120]
[  692.930309] 
[  692.930309] btrfs-delayed-m/1404's [blocked] stackdump:
[  692.930310] 
[  692.938504]  ffff880177575aa0 0000000000000046 ffff88014bf58d60 000000000000fb00
[  692.938507]  000000000000fb00 ffff880177575fd8 000000000000fb00 ffff880177574000
[  692.938509]  ffff880177575fd8 000000000000fb00 ffff88017662f240 ffff880175808500
[  692.960635] Call Trace:
[  692.963085]  [<ffffffff814c68e9>] schedule+0x29/0x90
[  692.963087]  [<ffffffff814c745d>] rt_spin_lock_slowlock+0xfd/0x330
[  692.963090]  [<ffffffff814c7a39>] __rt_spin_lock+0x9/0x10
[  692.963092]  [<ffffffff814c7b27>] rt_read_lock+0x27/0x40
[  692.963096]  [<ffffffff812550cf>] btrfs_clear_lock_blocking_rw+0x6f/0x180
[  692.963099]  [<ffffffff811fabd2>] btrfs_clear_path_blocking+0x32/0x70
[  692.963102]  [<ffffffff81200342>] btrfs_search_slot+0x6b2/0x810
[  692.963105]  [<ffffffff812148da>] btrfs_lookup_inode+0x2a/0xa0
[  692.963107]  [<ffffffff814c7312>] ? rt_mutex_lock+0x12/0x20
[  692.963111]  [<ffffffff8126d0bc>] btrfs_update_delayed_inode+0x6c/0x160
[  692.963113]  [<ffffffff814c7ab9>] ? _mutex_unlock+0x9/0x10
[  692.963116]  [<ffffffff8126e142>] btrfs_async_run_delayed_node_done+0x182/0x1a0
[  692.963119]  [<ffffffff8124ed5f>] worker_loop+0xaf/0x430
[  692.963121]  [<ffffffff8124ecb0>] ? btrfs_queue_worker+0x1e0/0x1e0
[  692.963123]  [<ffffffff8124ecb0>] ? btrfs_queue_worker+0x1e0/0x1e0
[  692.963127]  [<ffffffff8105850e>] kthread+0xae/0xc0
[  692.963129]  [<ffffffff814c68e9>] ? schedule+0x29/0x90
[  692.963133]  [<ffffffff810015bc>] ? __switch_to+0x14c/0x410
[  692.963137]  [<ffffffff81061e44>] ? finish_task_switch+0x54/0xd0
[  692.963140]  [<ffffffff814ceca4>] kernel_thread_helper+0x4/0x10
[  692.963143]  [<ffffffff81058460>] ? __init_kthread_worker+0x50/0x50
[  692.963145]  [<ffffffff814ceca0>] ? gs_change+0x13/0x13
[  692.963146] 
[  692.963147] dbench/7937's [current] stackdump:
[  692.963147] 
[  693.098724] Pid: 7937, comm: dbench Not tainted 3.4.4-rt13 #25
[  693.104544] Call Trace:
[  693.106993]  [<ffffffff8108b436>] debug_rt_mutex_print_deadlock+0x176/0x190
[  693.106995]  [<ffffffff814c74ec>] rt_spin_lock_slowlock+0x18c/0x330
[  693.106998]  [<ffffffff814c7a39>] __rt_spin_lock+0x9/0x10
[  693.107000]  [<ffffffff814c7b27>] rt_read_lock+0x27/0x40
[  693.107002]  [<ffffffff8125538c>] btrfs_try_tree_read_lock+0x4c/0x80
[  693.107004]  [<ffffffff812001bd>] btrfs_search_slot+0x52d/0x810
[  693.107007]  [<ffffffff812027ba>] btrfs_next_leaf+0xea/0x440
[  693.107010]  [<ffffffff81238ed8>] ? btrfs_token_dir_data_len+0x58/0xd0
[  693.107012]  [<ffffffff81238ed8>] ? btrfs_token_dir_data_len+0x58/0xd0
[  693.107016]  [<ffffffff81222e17>] btrfs_real_readdir+0x247/0x610
[  693.107020]  [<ffffffff81131350>] ? sys_ioctl+0xa0/0xa0
[  693.107022]  [<ffffffff81131350>] ? sys_ioctl+0xa0/0xa0
[  693.107024]  [<ffffffff81131660>] vfs_readdir+0xb0/0xd0
[  693.107026]  [<ffffffff81131840>] sys_getdents64+0x80/0xe0
[  693.107030]  [<ffffffff814cd9b9>] system_call_fastpath+0x16/0x1b
[  693.107032] [ turning off deadlock detection.Please report this trace. ]
[  693.107033] 


[  679.476016] SysRq : Show Blocked State
[  679.479781]   task                        PC stack   pid father
[  679.485708] btrfs-endio-wri D ffffffff81605920     0  1314      2 0x00000000
[  679.492785]  ffff880172939810 0000000000000046 ffff8801774ca538 000000000000f7c0
[  679.492789]  000000000000f7c0 ffff880172939fd8 ffff880172938000 000000000000f7c0
[  679.492792]  ffff880172939fd8 000000000000f7c0 ffff88017a4caa60 ffff8801744a88e0
[  679.514922] Call Trace:
[  679.517374]  [<ffffffff814ca569>] schedule+0x29/0x90
[  679.517378]  [<ffffffff814cb525>] rt_spin_lock_slowlock+0xd5/0x2bf
[  679.517382]  [<ffffffff814cb821>] __rt_spin_lock+0x21/0x30
[  679.517384]  [<ffffffff814cb927>] rt_read_lock+0x27/0x40
[  679.517388]  [<ffffffff81257d5d>] btrfs_tree_read_lock+0x5d/0x160
[  679.517392]  [<ffffffff812e46c6>] ? cpumask_next_and+0x36/0x50
[  679.517397]  [<ffffffff811fd863>] btrfs_read_lock_root_node+0x33/0x40
[  679.517400]  [<ffffffff81202919>] btrfs_search_slot+0x459/0x810
[  679.517404]  [<ffffffff8110dd60>] ? poison_obj+0x30/0x50
[  679.517409]  [<ffffffff81215957>] btrfs_lookup_file_extent+0x37/0x40
[  679.517412]  [<ffffffff811fd375>] ? btrfs_alloc_path+0x15/0x20
[  679.517417]  [<ffffffff81231bce>] btrfs_drop_extents+0xfe/0xa70
[  679.517421]  [<ffffffff814cf69d>] ? sub_preempt_count+0x9d/0xd0
[  679.517424]  [<ffffffff81066bb4>] ? migrate_enable+0xf4/0x1d0
[  679.517429]  [<ffffffff8110eb81>] ? cache_alloc_debugcheck_after+0x101/0x1a0
[  679.517432]  [<ffffffff81110545>] ? kmem_cache_alloc+0x1d5/0x200
[  679.517436]  [<ffffffff81224add>] insert_reserved_file_extent.clone.0+0x7d/0x270
[  679.517440]  [<ffffffff812218ab>] ? start_transaction+0x8b/0x290
[  679.517443]  [<ffffffff8122847e>] btrfs_finish_ordered_io+0x32e/0x3b0
[  679.517451]  [<ffffffff81047d1b>] ? try_to_del_timer_sync+0x6b/0xa0
[  679.517455]  [<ffffffff81228515>] btrfs_writepage_end_io_hook+0x15/0x20
[  679.517459]  [<ffffffff812446b4>] end_extent_writepage+0x64/0x100
[  679.517463]  [<ffffffff8124478b>] end_bio_extent_writepage+0x3b/0xa0
[  679.517468]  [<ffffffff81151af8>] bio_endio+0x18/0x30
[  679.517470]  [<ffffffff81219ba0>] end_workqueue_fn+0x40/0x50
[  679.517473]  [<ffffffff81251883>] worker_loop+0xc3/0x450
[  679.517476]  [<ffffffff814ca17f>] ? __schedule+0x2df/0x640
[  679.517480]  [<ffffffff812517c0>] ? btrfs_queue_worker+0x220/0x220
[  679.517483]  [<ffffffff812517c0>] ? btrfs_queue_worker+0x220/0x220
[  679.517486]  [<ffffffff81059586>] kthread+0x96/0xa0
[  679.517490]  [<ffffffff81062fb4>] ? finish_task_switch+0x54/0xd0
[  679.517494]  [<ffffffff814d2d24>] kernel_thread_helper+0x4/0x10
[  679.517498]  [<ffffffff810594f0>] ? __init_kthread_worker+0x50/0x50
[  679.517501]  [<ffffffff814d2d20>] ? gs_change+0x13/0x13
[  679.517509] btrfs-transacti D ffffffff81605920     0  1320      2 0x00000000
[  679.725747]  ffff880172d07a50 0000000000000046 ffff8801774ca538 000000000000f7c0
[  679.725750]  000000000000f7c0 ffff880172d07fd8 ffff880172d06000 000000000000f7c0
[  679.725753]  ffff880172d07fd8 000000000000f7c0 ffff88017a4929e0 ffff880176320920
[  679.747879] Call Trace:
[  679.750326]  [<ffffffff814ca569>] schedule+0x29/0x90
[  679.750328]  [<ffffffff814cb525>] rt_spin_lock_slowlock+0xd5/0x2bf
[  679.750331]  [<ffffffff814cb821>] __rt_spin_lock+0x21/0x30
[  679.750333]  [<ffffffff814cb927>] rt_read_lock+0x27/0x40
[  679.750335]  [<ffffffff81257d5d>] btrfs_tree_read_lock+0x5d/0x160
[  679.750338]  [<ffffffff814cf69d>] ? sub_preempt_count+0x9d/0xd0
[  679.750340]  [<ffffffff81066bb4>] ? migrate_enable+0xf4/0x1d0
[  679.750343]  [<ffffffff811fd863>] btrfs_read_lock_root_node+0x33/0x40
[  679.750345]  [<ffffffff81202919>] btrfs_search_slot+0x459/0x810
[  679.750349]  [<ffffffff81270d13>] btrfs_delete_delayed_items+0x63/0x100
[  679.750352]  [<ffffffff812710d2>] btrfs_run_delayed_items+0x112/0x160
[  679.750355]  [<ffffffff81220da2>] btrfs_commit_transaction+0x322/0xa70
[  679.750357]  [<ffffffff8122021a>] ? join_transaction+0x35a/0x3a0
[  679.750360]  [<ffffffff81059d30>] ? wake_up_bit+0x40/0x40
[  679.750362]  [<ffffffff81219a33>] transaction_kthread+0x273/0x2f0
[  679.750364]  [<ffffffff812197c0>] ? btrfs_congested_fn+0xb0/0xb0
[  679.750367]  [<ffffffff812197c0>] ? btrfs_congested_fn+0xb0/0xb0
[  679.750369]  [<ffffffff81059586>] kthread+0x96/0xa0
[  679.750371]  [<ffffffff81062fb4>] ? finish_task_switch+0x54/0xd0
[  679.750374]  [<ffffffff814d2d24>] kernel_thread_helper+0x4/0x10
[  679.750377]  [<ffffffff810594f0>] ? __init_kthread_worker+0x50/0x50
[  679.750379]  [<ffffffff814d2d20>] ? gs_change+0x13/0x13
[  679.750401] dbench          D ffffffff81605920     0  7812      1 0x00000004
[  679.886585]  ffff880174d99a98 0000000000000086 000000000000b380 000000000000f7c0
[  679.886587]  000000000000f7c0 ffff880174d99fd8 ffff880174d98000 000000000000f7c0
[  679.886590]  ffff880174d99fd8 000000000000f7c0 ffffffff81816020 ffff8801738be700
[  679.908712] Call Trace:
[  679.911158]  [<ffffffff810c87e0>] ? __lock_page+0x70/0x70
[  679.911160]  [<ffffffff814ca569>] schedule+0x29/0x90
[  679.911163]  [<ffffffff814ca657>] io_schedule+0x87/0xd0
[  679.911165]  [<ffffffff810c87e9>] sleep_on_page+0x9/0x10
[  679.911167]  [<ffffffff814c97a7>] __wait_on_bit+0x57/0x80
[  679.911170]  [<ffffffff810c983f>] ? find_get_pages_tag+0xcf/0x190
[  679.911172]  [<ffffffff810c8a0e>] wait_on_page_bit+0x6e/0x80
[  679.911175]  [<ffffffff81059d70>] ? autoremove_wake_function+0x40/0x40
[  679.911177]  [<ffffffff810d58b0>] ? pagevec_lookup_tag+0x20/0x30
[  679.911180]  [<ffffffff810c900e>] filemap_fdatawait_range+0xee/0x190
[  679.911183]  [<ffffffff812460bc>] ? extent_writepages+0x4c/0x60
[  679.911185]  [<ffffffff81226190>] ? btrfs_submit_direct+0x1d0/0x1d0
[  679.911188]  [<ffffffff811113e6>] ? kfree+0x1a6/0x2e0
[  679.911190]  [<ffffffff812240b2>] ? btrfs_writepages+0x22/0x30
[  679.911192]  [<ffffffff810d4dbf>] ? do_writepages+0x1f/0x40
[  679.911195]  [<ffffffff810c9e30>] filemap_write_and_wait_range+0x70/0x80
[  679.911198]  [<ffffffff81230587>] btrfs_sync_file+0x37/0x1b0
[  679.911201]  [<ffffffff8114ba50>] generic_write_sync+0x50/0x70
[  679.911203]  [<ffffffff812316dc>] btrfs_file_aio_write+0x31c/0x370
[  679.911207]  [<ffffffff811208da>] do_sync_write+0xda/0x120
[  679.911210]  [<ffffffff810ec752>] ? handle_mm_fault+0x162/0x220
[  679.911213]  [<ffffffff8104bc99>] ? kill_something_info+0x49/0x160
[  679.911216]  [<ffffffff812b45a3>] ? apparmor_file_permission+0x13/0x20
[  679.911219]  [<ffffffff8128ef27>] ? security_file_permission+0x27/0xb0
[  679.911222]  [<ffffffff811211c6>] vfs_write+0xc6/0x180
[  679.911224]  [<ffffffff81121672>] sys_pwrite64+0xa2/0xb0
[  679.911227]  [<ffffffff814d1a39>] system_call_fastpath+0x16/0x1b
[  679.911229] dbench          D ffffffff81605920     0  7813      1 0x00000004
[  680.074937]  ffff880174445908 0000000000000086 ffff8801744458a8 000000000000f7c0
[  680.074940]  000000000000f7c0 ffff880174445fd8 ffff880174444000 000000000000f7c0
[  680.074942]  ffff880174445fd8 000000000000f7c0 ffff88017a4e4ae0 ffff880174eb62c0
[  680.097069] Call Trace:
[  680.099514]  [<ffffffff814ca569>] schedule+0x29/0x90
[  680.099516]  [<ffffffff814cb525>] rt_spin_lock_slowlock+0xd5/0x2bf
[  680.099519]  [<ffffffff810d083a>] ? prep_new_page+0x12a/0x190
[  680.099521]  [<ffffffff814cb821>] __rt_spin_lock+0x21/0x30
[  680.099523]  [<ffffffff814cb927>] rt_read_lock+0x27/0x40
[  680.099525]  [<ffffffff81257d5d>] btrfs_tree_read_lock+0x5d/0x160
[  680.099529]  [<ffffffff812c21c6>] ? chksum_update+0x16/0x30
[  680.099531]  [<ffffffff811fd863>] btrfs_read_lock_root_node+0x33/0x40
[  680.099533]  [<ffffffff81202919>] btrfs_search_slot+0x459/0x810
[  680.099536]  [<ffffffff812148da>] btrfs_lookup_xattr+0x7a/0xc0
[  680.099539]  [<ffffffff8123e447>] __btrfs_getxattr+0x77/0x150
[  680.099542]  [<ffffffff8123e9cd>] btrfs_getxattr+0x7d/0x80
[  680.099544]  [<ffffffff8128d618>] cap_inode_need_killpriv+0x28/0x40
[  680.099547]  [<ffffffff8128ee41>] security_inode_need_killpriv+0x11/0x20
[  680.099549]  [<ffffffff810c85cb>] file_remove_suid+0x4b/0xc0
[  680.099551]  [<ffffffff810c88b5>] ? unlock_page+0x25/0x30
[  680.099556]  [<ffffffff810e7931>] ? __do_fault+0x431/0x530
[  680.099559]  [<ffffffff81231550>] btrfs_file_aio_write+0x190/0x370
[  680.099562]  [<ffffffff810eb437>] ? handle_pte_fault+0xe7/0x200
[  680.099565]  [<ffffffff811208da>] do_sync_write+0xda/0x120
[  680.099567]  [<ffffffff810ec752>] ? handle_mm_fault+0x162/0x220
[  680.099570]  [<ffffffff8104bc99>] ? kill_something_info+0x49/0x160
[  680.099572]  [<ffffffff812b45a3>] ? apparmor_file_permission+0x13/0x20
[  680.099574]  [<ffffffff8128ef27>] ? security_file_permission+0x27/0xb0
[  680.099577]  [<ffffffff811211c6>] vfs_write+0xc6/0x180
[  680.099579]  [<ffffffff81121672>] sys_pwrite64+0xa2/0xb0
[  680.099582]  [<ffffffff814d1a39>] system_call_fastpath+0x16/0x1b
[  680.099584] dbench          D ffffffff81605920     0  7814      1 0x00000004
[  680.263730]  ffff880176169be8 0000000000000086 ffff88017759a9b8 000000000000f7c0
[  680.263733]  000000000000f7c0 ffff880176169fd8 ffff880176168000 000000000000f7c0
[  680.263735]  ffff880176169fd8 000000000000f7c0 ffff88017a476960 ffff8801756ec1c0
[  680.285863] Call Trace:
[  680.288309]  [<ffffffff814ca569>] schedule+0x29/0x90
[  680.288311]  [<ffffffff814cb525>] rt_spin_lock_slowlock+0xd5/0x2bf
[  680.288314]  [<ffffffff814cb821>] __rt_spin_lock+0x21/0x30
[  680.288316]  [<ffffffff814cb927>] rt_read_lock+0x27/0x40
[  680.288319]  [<ffffffff81257eac>] btrfs_try_tree_read_lock+0x4c/0x80
[  680.288321]  [<ffffffff812029ed>] btrfs_search_slot+0x52d/0x810
[  680.288324]  [<ffffffff812256bf>] btrfs_real_readdir+0x1af/0x5f0
[  680.288326]  [<ffffffff81133690>] ? sys_ioctl+0xa0/0xa0
[  680.288329]  [<ffffffff81133690>] ? sys_ioctl+0xa0/0xa0
[  680.288331]  [<ffffffff811339a0>] vfs_readdir+0xb0/0xd0
[  680.288333]  [<ffffffff81133b80>] sys_getdents64+0x80/0xe0
[  680.288336]  [<ffffffff814d1a39>] system_call_fastpath+0x16/0x1b
[  680.288338] dbench          D ffffffff81605920     0  7815      1 0x00000004
[  680.362625]  ffff880174ea7ad8 0000000000000086 ffff88017759abf8 000000000000f7c0
[  680.362627]  000000000000f7c0 ffff880174ea7fd8 ffff880174ea6000 000000000000f7c0
[  680.362630]  ffff880174ea7fd8 000000000000f7c0 ffff88017a4fab60 ffff880174d5ca60
[  680.384756] Call Trace:
[  680.387202]  [<ffffffff814ca569>] schedule+0x29/0x90
[  680.387204]  [<ffffffff814cb525>] rt_spin_lock_slowlock+0xd5/0x2bf
[  680.387207]  [<ffffffff814cb821>] __rt_spin_lock+0x21/0x30
[  680.387209]  [<ffffffff814cb927>] rt_read_lock+0x27/0x40
[  680.387211]  [<ffffffff81257eac>] btrfs_try_tree_read_lock+0x4c/0x80
[  680.387214]  [<ffffffff812029ed>] btrfs_search_slot+0x52d/0x810
[  680.387216]  [<ffffffff812148da>] btrfs_lookup_xattr+0x7a/0xc0
[  680.387219]  [<ffffffff8123e447>] __btrfs_getxattr+0x77/0x150
[  680.387222]  [<ffffffff814cf69d>] ? sub_preempt_count+0x9d/0xd0
[  680.387225]  [<ffffffff8127b060>] btrfs_get_acl+0xf0/0x250
[  680.387227]  [<ffffffff8122d33d>] ? btrfs_new_inode+0x2bd/0x360
[  680.387230]  [<ffffffff8127b2e1>] btrfs_init_acl+0x81/0x150
[  680.387232]  [<ffffffff8122258c>] btrfs_init_inode_security+0x2c/0x60
[  680.387235]  [<ffffffff8122ea51>] btrfs_mkdir+0x121/0x1f0
[  680.387237]  [<ffffffff8112dfc8>] vfs_mkdir+0xb8/0x130
[  680.387240]  [<ffffffff81131273>] sys_mkdirat+0xf3/0x100
[  680.387242]  [<ffffffff81131294>] sys_mkdir+0x14/0x20
[  680.387244]  [<ffffffff814d1a39>] system_call_fastpath+0x16/0x1b
[  680.387246] dbench          D ffffffff81605920     0  7818      1 0x00000004
[  680.496083]  ffff8801790c98e8 0000000000000082 ffff8801790c9888 000000000000f7c0
[  680.496086]  000000000000f7c0 ffff8801790c9fd8 ffff8801790c8000 000000000000f7c0
[  680.496088]  ffff8801790c9fd8 000000000000f7c0 ffff88017a536be0 ffff880176072700
[  680.518214] Call Trace:
[  680.520658]  [<ffffffff814ca569>] schedule+0x29/0x90
[  680.520660]  [<ffffffff814cb525>] rt_spin_lock_slowlock+0xd5/0x2bf
[  680.520663]  [<ffffffff814cb821>] ? __rt_spin_lock+0x21/0x30
[  680.520665]  [<ffffffff814cb821>] __rt_spin_lock+0x21/0x30
[  680.520667]  [<ffffffff814cb927>] rt_read_lock+0x27/0x40
[  680.520669]  [<ffffffff81257bef>] btrfs_clear_lock_blocking_rw+0x6f/0x180
[  680.520672]  [<ffffffff811fd422>] btrfs_clear_path_blocking+0x32/0x70
[  680.520674]  [<ffffffff81202ba2>] btrfs_search_slot+0x6e2/0x810
[  680.520676]  [<ffffffff812c21c6>] ? chksum_update+0x16/0x30
[  680.520679]  [<ffffffff81214bd6>] btrfs_lookup_dir_item+0x76/0xc0
[  680.520682]  [<ffffffff8122c8db>] btrfs_lookup_dentry+0x9b/0x370
[  680.520685]  [<ffffffff810659c1>] ? get_parent_ip+0x11/0x50
[  680.520687]  [<ffffffff814cf69d>] ? sub_preempt_count+0x9d/0xd0
[  680.520690]  [<ffffffff8122cbc8>] btrfs_lookup+0x18/0x70
[  680.520693]  [<ffffffff8112bba8>] __lookup_hash+0x58/0x120
[  680.520695]  [<ffffffff8112e59f>] do_lookup+0x2af/0x330
[  680.520697]  [<ffffffff8112f174>] path_lookupat+0x134/0x750
[  680.520700]  [<ffffffff8110eb10>] ? cache_alloc_debugcheck_after+0x90/0x1a0
[  680.520702]  [<ffffffff8111048a>] ? kmem_cache_alloc+0x11a/0x200
[  680.520705]  [<ffffffff8112f7bc>] do_path_lookup+0x2c/0xc0
[  680.520707]  [<ffffffff810659c1>] ? get_parent_ip+0x11/0x50
[  680.520709]  [<ffffffff81130d74>] user_path_at_empty+0x54/0xa0
[  680.520712]  [<ffffffff81049380>] ? check_kill_permission+0x100/0x190
[  680.520714]  [<ffffffff8104baad>] ? group_send_sig_info+0x3d/0x80
[  680.520716]  [<ffffffff8104bc23>] ? kill_pid_info+0x53/0x80
[  680.520718]  [<ffffffff81130dcc>] user_path_at+0xc/0x10
[  680.520721]  [<ffffffff81125f1f>] vfs_fstatat+0x3f/0x80
[  680.520723]  [<ffffffff81125f96>] vfs_stat+0x16/0x20
[  680.520725]  [<ffffffff811260df>] sys_newstat+0x1f/0x40
[  680.520727]  [<ffffffff81122d6d>] ? fput+0x1d/0x30
[  680.520730]  [<ffffffff8111edd1>] ? filp_close+0x61/0x90
[  680.520732]  [<ffffffff8111eea8>] ? sys_close+0xa8/0x110
[  680.520735]  [<ffffffff814d1a39>] system_call_fastpath+0x16/0x1b
[  680.520737] rm              D ffffffff81605920     0  7820      1 0x00000004
[  680.714842]  ffff880176411a08 0000000000000086 ffff8801774ca538 000000000000f7c0
[  680.714844]  000000000000f7c0 ffff880176411fd8 ffff880176410000 000000000000f7c0
[  680.714847]  ffff880176411fd8 000000000000f7c0 ffff88017a4e4ae0 ffff880175892d20
[  680.736972] Call Trace:
[  680.739420]  [<ffffffff814ca569>] schedule+0x29/0x90
[  680.739422]  [<ffffffff814cb525>] rt_spin_lock_slowlock+0xd5/0x2bf
[  680.739426]  [<ffffffff810659c1>] ? get_parent_ip+0x11/0x50
[  680.739428]  [<ffffffff814cb821>] __rt_spin_lock+0x21/0x30
[  680.739430]  [<ffffffff814cb927>] rt_read_lock+0x27/0x40
[  680.739432]  [<ffffffff81257d5d>] btrfs_tree_read_lock+0x5d/0x160
[  680.739435]  [<ffffffff81246a34>] ? free_extent_buffer+0x34/0x70
[  680.739437]  [<ffffffff812002f1>] ? read_block_for_search+0x161/0x210
[  680.739440]  [<ffffffff811fd863>] btrfs_read_lock_root_node+0x33/0x40
[  680.739442]  [<ffffffff81202919>] btrfs_search_slot+0x459/0x810
[  680.739445]  [<ffffffff8121716a>] btrfs_lookup_inode+0x2a/0xa0
[  680.739448]  [<ffffffff81224e1e>] btrfs_read_locked_inode+0x7e/0x3a0
[  680.739450]  [<ffffffff81223eb0>] ? btrfs_permission+0x60/0x60
[  680.739453]  [<ffffffff8122c33f>] btrfs_iget+0x9f/0x100
[  680.739455]  [<ffffffff8122cad0>] btrfs_lookup_dentry+0x290/0x370
[  680.739458]  [<ffffffff8122cbc8>] btrfs_lookup+0x18/0x70
[  680.739461]  [<ffffffff8112bba8>] __lookup_hash+0x58/0x120
[  680.739463]  [<ffffffff8112faf7>] ? user_path_parent+0x47/0x80
[  680.739466]  [<ffffffff8112bc84>] lookup_hash+0x14/0x20
[  680.739468]  [<ffffffff8112fbc6>] do_unlinkat+0x96/0x1d0
[  680.739470]  [<ffffffff81133987>] ? vfs_readdir+0x97/0xd0
[  680.739472]  [<ffffffff81133baf>] ? sys_getdents64+0xaf/0xe0
[  680.739474]  [<ffffffff811312dd>] sys_unlinkat+0x1d/0x40
[  680.739477]  [<ffffffff814d1a39>] system_call_fastpath+0x16/0x1b
[  680.739479] rm              D ffffffff81605920     0  7822      1 0x00000004
[  680.883473]  ffff880172e0fa28 0000000000000086 ffff8801774ca538 000000000000f7c0
[  680.883475]  000000000000f7c0 ffff880172e0ffd8 ffff880172e0e000 000000000000f7c0
[  680.883478]  ffff880172e0ffd8 000000000000f7c0 ffff88017a476960 ffff8801791a4500
[  680.905605] Call Trace:
[  680.908050]  [<ffffffff814ca569>] schedule+0x29/0x90
[  680.908052]  [<ffffffff814cb525>] rt_spin_lock_slowlock+0xd5/0x2bf
[  680.908055]  [<ffffffff814cb821>] __rt_spin_lock+0x21/0x30
[  680.908057]  [<ffffffff814cb927>] rt_read_lock+0x27/0x40
[  680.908059]  [<ffffffff81257d5d>] btrfs_tree_read_lock+0x5d/0x160
[  680.908062]  [<ffffffff814cf69d>] ? sub_preempt_count+0x9d/0xd0
[  680.908065]  [<ffffffff811fd863>] btrfs_read_lock_root_node+0x33/0x40
[  680.908067]  [<ffffffff81202919>] btrfs_search_slot+0x459/0x810
[  680.908069]  [<ffffffff81066bb4>] ? migrate_enable+0xf4/0x1d0
[  680.908072]  [<ffffffff8122aefc>] btrfs_truncate_inode_items+0x13c/0x880
[  680.908074]  [<ffffffff814cf69d>] ? sub_preempt_count+0x9d/0xd0
[  680.908077]  [<ffffffff810659c1>] ? get_parent_ip+0x11/0x50
[  680.908079]  [<ffffffff814cf69d>] ? sub_preempt_count+0x9d/0xd0
[  680.908082]  [<ffffffff812218ab>] ? start_transaction+0x8b/0x290
[  680.908085]  [<ffffffff8122bf54>] btrfs_evict_inode+0x194/0x2c0
[  680.908087]  [<ffffffff814cf69d>] ? sub_preempt_count+0x9d/0xd0
[  680.908090]  [<ffffffff8113b875>] evict+0xb5/0x1d0
[  680.908092]  [<ffffffff8113ba83>] iput_final+0xf3/0x220
[  680.908095]  [<ffffffff8113bbe9>] iput+0x39/0x50
[  680.908097]  [<ffffffff8112fc87>] do_unlinkat+0x157/0x1d0
[  680.908099]  [<ffffffff81133987>] ? vfs_readdir+0x97/0xd0
[  680.908101]  [<ffffffff81133baf>] ? sys_getdents64+0xaf/0xe0
[  680.908104]  [<ffffffff811312dd>] sys_unlinkat+0x1d/0x40
[  680.908106]  [<ffffffff814d1a39>] system_call_fastpath+0x16/0x1b
[  680.908108] rm              D ffffffff81605920     0  7827      1 0x00000004
[  681.051241]  ffff880172dffa98 0000000000000086 ffff8801774ca538 000000000000f7c0
[  681.051244]  000000000000f7c0 ffff880172dfffd8 ffff880172dfe000 000000000000f7c0
[  681.051246]  ffff880172dfffd8 000000000000f7c0 ffff88017a476960 ffff8801761e4340
[  681.073378] Call Trace:
[  681.075823]  [<ffffffff814ca569>] schedule+0x29/0x90
[  681.075825]  [<ffffffff814cb525>] rt_spin_lock_slowlock+0xd5/0x2bf
[  681.075827]  [<ffffffff814cb821>] __rt_spin_lock+0x21/0x30
[  681.075829]  [<ffffffff814cb927>] rt_read_lock+0x27/0x40
[  681.075832]  [<ffffffff81257d5d>] btrfs_tree_read_lock+0x5d/0x160
[  681.075834]  [<ffffffff810659c1>] ? get_parent_ip+0x11/0x50
[  681.075837]  [<ffffffff814cf69d>] ? sub_preempt_count+0x9d/0xd0
[  681.075840]  [<ffffffff811fd863>] btrfs_read_lock_root_node+0x33/0x40
[  681.075842]  [<ffffffff81202919>] btrfs_search_slot+0x459/0x810
[  681.075844]  [<ffffffff812c21c6>] ? chksum_update+0x16/0x30
[  681.075847]  [<ffffffff81214bd6>] btrfs_lookup_dir_item+0x76/0xc0
[  681.075850]  [<ffffffff8122c8db>] btrfs_lookup_dentry+0x9b/0x370
[  681.075852]  [<ffffffff810659c1>] ? get_parent_ip+0x11/0x50
[  681.075855]  [<ffffffff814cf69d>] ? sub_preempt_count+0x9d/0xd0
[  681.075857]  [<ffffffff8122cbc8>] btrfs_lookup+0x18/0x70
[  681.075860]  [<ffffffff8112bba8>] __lookup_hash+0x58/0x120
[  681.075862]  [<ffffffff8112faf7>] ? user_path_parent+0x47/0x80
[  681.075864]  [<ffffffff8112bc84>] lookup_hash+0x14/0x20
[  681.075866]  [<ffffffff8112fbc6>] do_unlinkat+0x96/0x1d0
[  681.075869]  [<ffffffff81133987>] ? vfs_readdir+0x97/0xd0
[  681.075871]  [<ffffffff81133baf>] ? sys_getdents64+0xaf/0xe0
[  681.075873]  [<ffffffff811312dd>] sys_unlinkat+0x1d/0x40
[  681.075875]  [<ffffffff814d1a39>] system_call_fastpath+0x16/0x1b
[  681.075878] Sched Debug Version: v0.10, 3.4.4-rt13 #37




* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-12  5:47 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free! Mike Galbraith
  2012-07-12  8:44 ` Mike Galbraith
  2012-07-12 11:07 ` Thomas Gleixner
@ 2012-07-13 12:50 ` Chris Mason
  2012-07-13 14:47   ` Thomas Gleixner
                     ` (2 more replies)
  2 siblings, 3 replies; 39+ messages in thread
From: Chris Mason @ 2012-07-13 12:50 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: linux-rt-users, LKML, linux-fsdevel, Thomas Gleixner, Steven Rostedt

On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> Greetings,

[ deadlocks with btrfs and the recent RT kernels ]

I talked with Thomas about this and I think the problem is the
single-reader nature of the RW rwlocks.  The lockdep report below
mentions that btrfs is calling:

> [  692.963099]  [<ffffffff811fabd2>] btrfs_clear_path_blocking+0x32/0x70

In this case, the task has a number of blocking read locks on the btrfs buffers,
and we're trying to turn them back into spinning read locks.  Even
though btrfs is taking the read rwlock, it doesn't think of this as a new
lock operation because we were blocking out new writers.

If the second task has taken the spinning read lock, it is going to
prevent that clear_path_blocking operation from progressing, even though
it would have worked on a non-RT kernel.

The solution should be to make the blocking read locks in btrfs honor the
single-reader semantics.  This means not allowing more than one blocking
reader and not allowing a spinning reader when there is a blocking
reader.  Strictly speaking btrfs shouldn't need recursive readers on a
single lock, so I wouldn't worry about that part.
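
As a rough sketch of that rule (untested, and the real patch may well look
different), the reader side could gate on the existing blocking_readers
count the same way btrfs_tree_read_lock() already waits out blocking
writers via read_lock_wq:

void btrfs_tree_read_lock(struct extent_buffer *eb)
{
again:
	/* no new spinning readers while someone holds a blocking read lock */
	if (atomic_read(&eb->blocking_readers))
		wait_event(eb->read_lock_wq,
			   atomic_read(&eb->blocking_readers) == 0);
	read_lock(&eb->lock);
	/* existing blocking_writers / recursion handling omitted here */
	if (atomic_read(&eb->blocking_readers)) {
		/* a reader went blocking while we slept, try again */
		read_unlock(&eb->lock);
		goto again;
	}
	atomic_inc(&eb->read_locks);
	atomic_inc(&eb->spinning_readers);
}

The "only one blocking reader" half would be a similar check in the place
where a spinning read lock gets converted to a blocking one.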

There is also a chunk of code in btrfs_clear_path_blocking that makes
sure to strictly honor top down locking order during the conversion.  It
only does this when lockdep is enabled because in non-RT kernels we
don't need to worry about it.  For RT we'll want to enable that as well.

I'll give this a shot later today.

-chris



* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-13 12:50 ` Chris Mason
@ 2012-07-13 14:47   ` Thomas Gleixner
  2012-07-14 10:14   ` Mike Galbraith
  2012-07-14 13:38   ` Mike Galbraith
  2 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2012-07-13 14:47 UTC (permalink / raw)
  To: Chris Mason
  Cc: Mike Galbraith, linux-rt-users, LKML, linux-fsdevel, Steven Rostedt

Chris,

On Fri, 13 Jul 2012, Chris Mason wrote:
> On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > Greetings,
> 
> [ deadlocks with btrfs and the recent RT kernels ]
> 
> I talked with Thomas about this and I think the problem is the
> single-reader nature of the RW rwlocks.  The lockdep report below
> mentions that btrfs is calling:
> 
> > [  692.963099]  [<ffffffff811fabd2>] btrfs_clear_path_blocking+0x32/0x70
> 
> In this case, the task has a number of blocking read locks on the btrfs buffers,
> and we're trying to turn them back into spinning read locks.  Even
> though btrfs is taking the read rwlock, it doesn't think of this as a new
> lock operation because we were blocking out new writers.
> 
> If the second task has taken the spinning read lock, it is going to
> prevent that clear_path_blocking operation from progressing, even though
> it would have worked on a non-RT kernel.
> 
> The solution should be to make the blocking read locks in btrfs honor the
> single-reader semantics.  This means not allowing more than one blocking
> reader and not allowing a spinning reader when there is a blocking
> reader.  Strictly speaking btrfs shouldn't need recursive readers on a
> single lock, so I wouldn't worry about that part.
> 
> There is also a chunk of code in btrfs_clear_path_blocking that makes
> sure to strictly honor top down locking order during the conversion.  It
> only does this when lockdep is enabled because in non-RT kernels we
> don't need to worry about it.  For RT we'll want to enable that as well.

thanks for explaining this. I really got lost in that code completely.
 
> I'll give this a shot later today.

Cool.

Aside of that I'm still pondering to experiment with a non-pi variant
of rw locks which allows multiple readers. For such cases as btrfs I
think they would be well suited and avoid the performance overhead of
the single writer restriction. But that's not going to happen before
my vacation, so we'll stick with your workaround for now and let Mike
beat the hell out of it.

Thanks,

	Thomas


* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-13 12:50 ` Chris Mason
  2012-07-13 14:47   ` Thomas Gleixner
@ 2012-07-14 10:14   ` Mike Galbraith
  2012-07-15 17:56     ` Chris Mason
  2012-07-16 10:55     ` Mike Galbraith
  2012-07-14 13:38   ` Mike Galbraith
  2 siblings, 2 replies; 39+ messages in thread
From: Mike Galbraith @ 2012-07-14 10:14 UTC (permalink / raw)
  To: Chris Mason
  Cc: linux-rt-users, LKML, linux-fsdevel, Thomas Gleixner, Steven Rostedt

On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote: 
> On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > Greetings,
> 
> [ deadlocks with btrfs and the recent RT kernels ]
> 
> I talked with Thomas about this and I think the problem is the
> single-reader nature of the RW rwlocks.  The lockdep report below
> mentions that btrfs is calling:
> 
> > [  692.963099]  [<ffffffff811fabd2>] btrfs_clear_path_blocking+0x32/0x70
> 
> In this case, the task has a number of blocking read locks on the btrfs buffers,
> and we're trying to turn them back into spinning read locks.  Even
> though btrfs is taking the read rwlock, it doesn't think of this as a new
> lock operation because we were blocking out new writers.
> 
> If the second task has taken the spinning read lock, it is going to
> prevent that clear_path_blocking operation from progressing, even though
> it would have worked on a non-RT kernel.
> 
> The solution should be to make the blocking read locks in btrfs honor the
> single-reader semantics.  This means not allowing more than one blocking
> reader and not allowing a spinning reader when there is a blocking
> reader.  Strictly speaking btrfs shouldn't need recursive readers on a
> single lock, so I wouldn't worry about that part.
> 
> There is also a chunk of code in btrfs_clear_path_blocking that makes
> sure to strictly honor top down locking order during the conversion.  It
> only does this when lockdep is enabled because in non-RT kernels we
> don't need to worry about it.  For RT we'll want to enable that as well.
> 
> I'll give this a shot later today.

I took a poke at it.  Did I do something similar to what you had in
mind, or just hide behind performance stealing paranoid trylock loops?
Box survived 1000 x xfstests 006 and dbench [-s] massive right off the
bat, so it gets posted despite skepticism.

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4106264..ae47cc2 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -77,7 +77,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
 {
 	int i;
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
 	/* lockdep really cares that we take all of these spinlocks
 	 * in the right order.  If any of the locks in the path are not
 	 * currently blocking, it is going to complain.  So, make really
@@ -104,7 +104,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
 		}
 	}
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
 	if (held)
 		btrfs_clear_lock_blocking_rw(held, held_rw);
 #endif
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 272f911..4db7c14 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -19,6 +19,7 @@
 #include <linux/pagemap.h>
 #include <linux/spinlock.h>
 #include <linux/page-flags.h>
+#include <linux/delay.h>
 #include <asm/bug.h>
 #include "ctree.h"
 #include "extent_io.h"
@@ -97,7 +98,18 @@ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw)
 void btrfs_tree_read_lock(struct extent_buffer *eb)
 {
 again:
+#ifdef CONFIG_PREEMPT_RT_BASE
+	while (atomic_read(&eb->blocking_readers))
+		cpu_chill();
+	while(!read_trylock(&eb->lock))
+		cpu_chill();
+	if (atomic_read(&eb->blocking_readers)) {
+		read_unlock(&eb->lock);
+		goto again;
+	}
+#else
 	read_lock(&eb->lock);
+#endif
 	if (atomic_read(&eb->blocking_writers) &&
 	    current->pid == eb->lock_owner) {
 		/*
@@ -131,11 +143,26 @@ int btrfs_try_tree_read_lock(struct extent_buffer *eb)
 	if (atomic_read(&eb->blocking_writers))
 		return 0;
 
+#ifdef CONFIG_PREEMPT_RT_BASE
+	if (atomic_read(&eb->blocking_readers))
+		return 0;
+	while(!read_trylock(&eb->lock))
+		cpu_chill();
+#else
 	read_lock(&eb->lock);
+#endif
+
 	if (atomic_read(&eb->blocking_writers)) {
 		read_unlock(&eb->lock);
 		return 0;
 	}
+
+#ifdef CONFIG_PREEMPT_RT_BASE
+	if (atomic_read(&eb->blocking_readers)) {
+		read_unlock(&eb->lock);
+		return 0;
+	}
+#endif
 	atomic_inc(&eb->read_locks);
 	atomic_inc(&eb->spinning_readers);
 	return 1;




* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-13 12:50 ` Chris Mason
  2012-07-13 14:47   ` Thomas Gleixner
  2012-07-14 10:14   ` Mike Galbraith
@ 2012-07-14 13:38   ` Mike Galbraith
  2 siblings, 0 replies; 39+ messages in thread
From: Mike Galbraith @ 2012-07-14 13:38 UTC (permalink / raw)
  To: Chris Mason
  Cc: linux-rt-users, LKML, linux-fsdevel, Thomas Gleixner, Steven Rostedt

On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote:

> There is also a chunk of code in btrfs_clear_path_blocking that makes
> sure to strictly honor top down locking order during the conversion.  It
> only does this when lockdep is enabled because in non-RT kernels we
> don't need to worry about it.  For RT we'll want to enable that as well.

Hm, _seems_ that alone is enough to prevent deadlock.  Throughput really
sucks though.  The other bits of my stab bump throughput for dbench 128
from ~200 mb/s to ~360 mb/s (appears it's the paranoid trylock loops).
ext3 does 775 mb/s with the same kernel.  Or, dbench 8 on ext3 gives
~1800 mb/s and ~480 mb/s btrfs.  Not exactly wonderful.

Hohum, guess I'll wait and see what your patch looks like.  I bet it'll
work a lot better than mine does :)

-Mike



* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-14 10:14   ` Mike Galbraith
@ 2012-07-15 17:56     ` Chris Mason
  2012-07-16  2:02       ` Mike Galbraith
  2012-07-16 10:55     ` Mike Galbraith
  1 sibling, 1 reply; 39+ messages in thread
From: Chris Mason @ 2012-07-15 17:56 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Chris L. Mason, linux-rt-users, LKML, linux-fsdevel,
	Thomas Gleixner, Steven Rostedt

On Sat, Jul 14, 2012 at 04:14:43AM -0600, Mike Galbraith wrote:
> On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote: 
> > On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > > Greetings,
> > 
> > [ deadlocks with btrfs and the recent RT kernels ]
> > 
> > I talked with Thomas about this and I think the problem is the
> > single-reader nature of the RW rwlocks.  The lockdep report below
> > mentions that btrfs is calling:
> > 
> > > [  692.963099]  [<ffffffff811fabd2>] btrfs_clear_path_blocking+0x32/0x70
> > 
> > In this case, the task has a number of blocking read locks on the btrfs buffers,
> > and we're trying to turn them back into spinning read locks.  Even
> > though btrfs is taking the read rwlock, it doesn't think of this as a new
> > lock operation because we were blocking out new writers.
> > 
> > If the second task has taken the spinning read lock, it is going to
> > prevent that clear_path_blocking operation from progressing, even though
> > it would have worked on a non-RT kernel.
> > 
> > The solution should be to make the blocking read locks in btrfs honor the
> > single-reader semantics.  This means not allowing more than one blocking
> > reader and not allowing a spinning reader when there is a blocking
> > reader.  Strictly speaking btrfs shouldn't need recursive readers on a
> > single lock, so I wouldn't worry about that part.
> > 
> > There is also a chunk of code in btrfs_clear_path_blocking that makes
> > sure to strictly honor top down locking order during the conversion.  It
> > only does this when lockdep is enabled because in non-RT kernels we
> > don't need to worry about it.  For RT we'll want to enable that as well.
> > 
> > I'll give this a shot later today.
> 
> I took a poke at it.  Did I do something similar to what you had in
> mind, or just hide behind performance stealing paranoid trylock loops?
> Box survived 1000 x xfstests 006 and dbench [-s] massive right off the
> bat, so it gets posted despite skepticism.

Great, thanks!  I got stuck in bug land on Friday.  You mentioned
performance problems earlier on Saturday, did this improve performance?

One other question:

>  again:
> +#ifdef CONFIG_PREEMPT_RT_BASE
> +	while (atomic_read(&eb->blocking_readers))
> +		cpu_chill();
> +	while(!read_trylock(&eb->lock))
> +		cpu_chill();
> +	if (atomic_read(&eb->blocking_readers)) {
> +		read_unlock(&eb->lock);
> +		goto again;
> +	}

Why use read_trylock() in a loop instead of just trying to take the
lock?  Is this an RTism or are there other reasons?  

-chris


* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-15 17:56     ` Chris Mason
@ 2012-07-16  2:02       ` Mike Galbraith
  2012-07-16 16:02         ` Steven Rostedt
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2012-07-16  2:02 UTC (permalink / raw)
  To: Chris Mason
  Cc: Chris L. Mason, linux-rt-users, LKML, linux-fsdevel,
	Thomas Gleixner, Steven Rostedt

On Sun, 2012-07-15 at 13:56 -0400, Chris Mason wrote: 
> On Sat, Jul 14, 2012 at 04:14:43AM -0600, Mike Galbraith wrote:
> > On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote: 
> > > On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > > > Greetings,
> > > 
> > > [ deadlocks with btrfs and the recent RT kernels ]
> > > 
> > > I talked with Thomas about this and I think the problem is the
> > > single-reader nature of the RW rwlocks.  The lockdep report below
> > > mentions that btrfs is calling:
> > > 
> > > > [  692.963099]  [<ffffffff811fabd2>] btrfs_clear_path_blocking+0x32/0x70
> > > 
> > > In this case, the task has a number of blocking read locks on the btrfs buffers,
> > > and we're trying to turn them back into spinning read locks.  Even
> > > though btrfs is taking the read rwlock, it doesn't think of this as a new
> > > lock operation because we were blocking out new writers.
> > > 
> > > If the second task has taken the spinning read lock, it is going to
> > > prevent that clear_path_blocking operation from progressing, even though
> > > it would have worked on a non-RT kernel.
> > > 
> > > The solution should be to make the blocking read locks in btrfs honor the
> > > single-reader semantics.  This means not allowing more than one blocking
> > > reader and not allowing a spinning reader when there is a blocking
> > > reader.  Strictly speaking btrfs shouldn't need recursive readers on a
> > > single lock, so I wouldn't worry about that part.
> > > 
> > > There is also a chunk of code in btrfs_clear_path_blocking that makes
> > > sure to strictly honor top down locking order during the conversion.  It
> > > only does this when lockdep is enabled because in non-RT kernels we
> > > don't need to worry about it.  For RT we'll want to enable that as well.
> > > 
> > > I'll give this a shot later today.
> > 
> > I took a poke at it.  Did I do something similar to what you had in
> > mind, or just hide behind performance stealing paranoid trylock loops?
> > Box survived 1000 x xfstests 006 and dbench [-s] massive right off the
> > bat, so it gets posted despite skepticism.
> 
> Great, thanks!  I got stuck in bug land on Friday.  You mentioned
> performance problems earlier on Saturday, did this improve performance?

Yeah, the read_trylock() seems to improve throughput.  That's not
heavily tested, but it certainly looks like it does.  No idea why.

WRT performance, dbench isn't thrilled, but btrfs seems to work just
fine for my routine usage, and spinning rust bucket is being all it can
be.  I hope I don't have to care overly much about dbench's opinion.  It
doesn't make happy multi-thread numbers with btrfs, but those numbers
suddenly look great if you rebase relative to xfs -rt throughput :)

> One other question:
> 
> >  again:
> > +#ifdef CONFIG_PREEMPT_RT_BASE
> > +	while (atomic_read(&eb->blocking_readers))
> > +		cpu_chill();
> > +	while(!read_trylock(&eb->lock))
> > +		cpu_chill();
> > +	if (atomic_read(&eb->blocking_readers)) {
> > +		read_unlock(&eb->lock);
> > +		goto again;
> > +	}
> 
> Why use read_trylock() in a loop instead of just trying to take the
> lock?  Is this an RTism or are there other reasons?

First stab paranoia.  It worked, so I removed it.  It still worked but
lost throughput, removed all my bits leaving only the lockdep bits, it
still worked.

-Mike



* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-14 10:14   ` Mike Galbraith
  2012-07-15 17:56     ` Chris Mason
@ 2012-07-16 10:55     ` Mike Galbraith
  2012-07-16 15:43       ` Chris Mason
  1 sibling, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2012-07-16 10:55 UTC (permalink / raw)
  To: Chris Mason
  Cc: linux-rt-users, LKML, linux-fsdevel, Thomas Gleixner, Steven Rostedt

On Sat, 2012-07-14 at 12:14 +0200, Mike Galbraith wrote: 
> On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote: 
> > On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > > Greetings,
> > 
> > [ deadlocks with btrfs and the recent RT kernels ]
> > 
> > I talked with Thomas about this and I think the problem is the
> > single-reader nature of the RW rwlocks.  The lockdep report below
> > mentions that btrfs is calling:
> > 
> > > [  692.963099]  [<ffffffff811fabd2>] btrfs_clear_path_blocking+0x32/0x70
> > 
> > In this case, the task has a number of blocking read locks on the btrfs buffers,
> > and we're trying to turn them back into spinning read locks.  Even
> > though btrfs is taking the read rwlock, it doesn't think of this as a new
> > lock operation because we were blocking out new writers.
> > 
> > If the second task has taken the spinning read lock, it is going to
> > prevent that clear_path_blocking operation from progressing, even though
> > it would have worked on a non-RT kernel.
> > 
> > The solution should be to make the blocking read locks in btrfs honor the
> > single-reader semantics.  This means not allowing more than one blocking
> > reader and not allowing a spinning reader when there is a blocking
> > reader.  Strictly speaking btrfs shouldn't need recursive readers on a
> > single lock, so I wouldn't worry about that part.
> > 
> > There is also a chunk of code in btrfs_clear_path_blocking that makes
> > sure to strictly honor top down locking order during the conversion.  It
> > only does this when lockdep is enabled because in non-RT kernels we
> > don't need to worry about it.  For RT we'll want to enable that as well.
> > 
> > I'll give this a shot later today.
> 
> I took a poke at it.  Did I do something similar to what you had in
> mind, or just hide behind performance stealing paranoid trylock loops?
> Box survived 1000 x xfstests 006 and dbench [-s] massive right off the
> bat, so it gets posted despite skepticism.

Seems btrfs isn't entirely convinced either.

[ 2292.336229] use_block_rsv: 1810 callbacks suppressed
[ 2292.336231] ------------[ cut here ]------------
[ 2292.336255] WARNING: at fs/btrfs/extent-tree.c:6344 use_block_rsv+0x17d/0x190 [btrfs]()
[ 2292.336257] Hardware name: System x3550 M3 -[7944K3G]-
[ 2292.336259] btrfs: block rsv returned -28
[ 2292.336260] Modules linked in: joydev st sr_mod ide_gd_mod(N) ide_cd_mod ide_core cdrom ibm_rtl nfsd lockd ipmi_devintf nfs_acl auth_rpcgss sunrpc ipmi_si ipmi_msghandler ipv6 ipv6_lib af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf edd fuse btrfs zlib_deflate ext3 jbd loop dm_mod usbhid hid cdc_ether usbnet mii sg shpchp pci_hotplug pcspkr bnx2 ioatdma i2c_i801 i2c_core tpm_tis tpm tpm_bios serio_raw i7core_edac edac_core button dca iTCO_wdt iTCO_vendor_support ext4 mbcache jbd2 uhci_hcd ehci_hcd sd_mod usbcore rtc_cmos crc_t10dif usb_common fan processor ata_generic ata_piix libata megaraid_sas scsi_mod thermal thermal_sys hwmon
[ 2292.336296] Supported: Yes
[ 2292.336298] Pid: 12975, comm: bonnie Tainted: G        W  N  3.0.35-rt56-rt #27
[ 2292.336300] Call Trace:
[ 2292.336312]  [<ffffffff81004562>] dump_trace+0x82/0x2e0
[ 2292.336320]  [<ffffffff814542b3>] dump_stack+0x69/0x6f
[ 2292.336325]  [<ffffffff8105900b>] warn_slowpath_common+0x7b/0xc0
[ 2292.336330]  [<ffffffff81059105>] warn_slowpath_fmt+0x45/0x50
[ 2292.336342]  [<ffffffffa034db7d>] use_block_rsv+0x17d/0x190 [btrfs]
[ 2292.336389]  [<ffffffffa0350d49>] btrfs_alloc_free_block+0x49/0x240 [btrfs]
[ 2292.336432]  [<ffffffffa033d49e>] __btrfs_cow_block+0x13e/0x510 [btrfs]
[ 2292.336457]  [<ffffffffa033d96f>] btrfs_cow_block+0xff/0x230 [btrfs]
[ 2292.336482]  [<ffffffffa0341ab0>] btrfs_search_slot+0x360/0x7e0 [btrfs]
[ 2292.336513]  [<ffffffffa03567c5>] btrfs_del_csums+0x175/0x2f0 [btrfs]
[ 2292.336562]  [<ffffffffa034a0f0>] __btrfs_free_extent+0x550/0x760 [btrfs]
[ 2292.336599]  [<ffffffffa034a53d>] run_delayed_data_ref+0x9d/0x190 [btrfs]
[ 2292.336636]  [<ffffffffa034f355>] run_clustered_refs+0xd5/0x3a0 [btrfs]
[ 2292.336678]  [<ffffffffa034f768>] btrfs_run_delayed_refs+0x148/0x350 [btrfs]
[ 2292.336723]  [<ffffffffa0362047>] __btrfs_end_transaction+0xb7/0x2b0 [btrfs]
[ 2292.336796]  [<ffffffffa036d153>] btrfs_evict_inode+0x2d3/0x340 [btrfs]
[ 2292.336863]  [<ffffffff81170121>] evict+0x91/0x190
[ 2292.336868]  [<ffffffff81163c07>] do_unlinkat+0x177/0x1f0
[ 2292.336875]  [<ffffffff8145e312>] system_call_fastpath+0x16/0x1b
[ 2292.336881]  [<00007fea187f9e67>] 0x7fea187f9e66
[ 2292.336887] ---[ end trace 0000000000000004 ]---
[ 2610.370398] use_block_rsv: 1947 callbacks suppressed
[ 2610.370400] ------------[ cut here ]------------

> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 4106264..ae47cc2 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -77,7 +77,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
>  {
>  	int i;
>  
> -#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
>  	/* lockdep really cares that we take all of these spinlocks
>  	 * in the right order.  If any of the locks in the path are not
>  	 * currently blocking, it is going to complain.  So, make really
> @@ -104,7 +104,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
>  		}
>  	}
>  
> -#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE))
>  	if (held)
>  		btrfs_clear_lock_blocking_rw(held, held_rw);
>  #endif
> diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
> index 272f911..4db7c14 100644
> --- a/fs/btrfs/locking.c
> +++ b/fs/btrfs/locking.c
> @@ -19,6 +19,7 @@
>  #include <linux/pagemap.h>
>  #include <linux/spinlock.h>
>  #include <linux/page-flags.h>
> +#include <linux/delay.h>
>  #include <asm/bug.h>
>  #include "ctree.h"
>  #include "extent_io.h"
> @@ -97,7 +98,18 @@ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw)
>  void btrfs_tree_read_lock(struct extent_buffer *eb)
>  {
>  again:
> +#ifdef CONFIG_PREEMPT_RT_BASE
> +	while (atomic_read(&eb->blocking_readers))
> +		cpu_chill();
> +	while(!read_trylock(&eb->lock))
> +		cpu_chill();
> +	if (atomic_read(&eb->blocking_readers)) {
> +		read_unlock(&eb->lock);
> +		goto again;
> +	}
> +#else
>  	read_lock(&eb->lock);
> +#endif
>  	if (atomic_read(&eb->blocking_writers) &&
>  	    current->pid == eb->lock_owner) {
>  		/*
> @@ -131,11 +143,26 @@ int btrfs_try_tree_read_lock(struct extent_buffer *eb)
>  	if (atomic_read(&eb->blocking_writers))
>  		return 0;
>  
> +#ifdef CONFIG_PREEMPT_RT_BASE
> +	if (atomic_read(&eb->blocking_readers))
> +		return 0;
> +	while(!read_trylock(&eb->lock))
> +		cpu_chill();
> +#else
>  	read_lock(&eb->lock);
> +#endif
> +
>  	if (atomic_read(&eb->blocking_writers)) {
>  		read_unlock(&eb->lock);
>  		return 0;
>  	}
> +
> +#ifdef CONFIG_PREEMPT_RT_BASE
> +	if (atomic_read(&eb->blocking_readers)) {
> +		read_unlock(&eb->lock);
> +		return 0;
> +	}
> +#endif
>  	atomic_inc(&eb->read_locks);
>  	atomic_inc(&eb->spinning_readers);
>  	return 1;
> 
> 




* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-16 10:55     ` Mike Galbraith
@ 2012-07-16 15:43       ` Chris Mason
  2012-07-16 16:16         ` Mike Galbraith
  0 siblings, 1 reply; 39+ messages in thread
From: Chris Mason @ 2012-07-16 15:43 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Chris L. Mason, linux-rt-users, LKML, linux-fsdevel,
	Thomas Gleixner, Steven Rostedt

On Mon, Jul 16, 2012 at 04:55:44AM -0600, Mike Galbraith wrote:
> On Sat, 2012-07-14 at 12:14 +0200, Mike Galbraith wrote: 
> > On Fri, 2012-07-13 at 08:50 -0400, Chris Mason wrote: 
> > > On Wed, Jul 11, 2012 at 11:47:40PM -0600, Mike Galbraith wrote:
> > > > Greetings,
> > > 
> > > [ deadlocks with btrfs and the recent RT kernels ]
> > > 
> > > I talked with Thomas about this and I think the problem is the
> > > single-reader nature of the RW rwlocks.  The lockdep report below
> > > mentions that btrfs is calling:
> > > 
> > > > [  692.963099]  [<ffffffff811fabd2>] btrfs_clear_path_blocking+0x32/0x70
> > > 
> > > In this case, the task has a number of blocking read locks on the btrfs buffers,
> > > and we're trying to turn them back into spinning read locks.  Even
> > > though btrfs is taking the read rwlock, it doesn't think of this as a new
> > > lock operation because we were blocking out new writers.
> > > 
> > > If the second task has taken the spinning read lock, it is going to
> > > prevent that clear_path_blocking operation from progressing, even though
> > > it would have worked on a non-RT kernel.
> > > 
> > > The solution should be to make the blocking read locks in btrfs honor the
> > > single-reader semantics.  This means not allowing more than one blocking
> > > reader and not allowing a spinning reader when there is a blocking
> > > reader.  Strictly speaking btrfs shouldn't need recursive readers on a
> > > single lock, so I wouldn't worry about that part.
> > > 
> > > There is also a chunk of code in btrfs_clear_path_blocking that makes
> > > sure to strictly honor top down locking order during the conversion.  It
> > > only does this when lockdep is enabled because in non-RT kernels we
> > > don't need to worry about it.  For RT we'll want to enable that as well.
> > > 
> > > I'll give this a shot later today.
> > 
> > I took a poke at it.  Did I do something similar to what you had in
> > mind, or just hide behind performance stealing paranoid trylock loops?
> > Box survived 1000 x xfstests 006 and dbench [-s] massive right off the
> > bat, so it gets posted despite skepticism.
> 
> Seems btrfs isn't entirely convinced either.
> 
> [ 2292.336229] use_block_rsv: 1810 callbacks suppressed
> [ 2292.336231] ------------[ cut here ]------------
> [ 2292.336255] WARNING: at fs/btrfs/extent-tree.c:6344 use_block_rsv+0x17d/0x190 [btrfs]()
> [ 2292.336257] Hardware name: System x3550 M3 -[7944K3G]-
> [ 2292.336259] btrfs: block rsv returned -28

This is unrelated.  You got far enough into the benchmark to hit an
ENOSPC warning.  This can be ignored (I just deleted it when we used 3.0
for oracle).

re: dbench performance.  dbench tends to penalize fairness.  I can
imagine RT making it slower in general.

It also triggers lots of lock contention in btrfs because the dataset is
fairly small and the trees don't fan out a lot.

-chris



* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-16  2:02       ` Mike Galbraith
@ 2012-07-16 16:02         ` Steven Rostedt
  2012-07-16 16:26           ` Mike Galbraith
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Rostedt @ 2012-07-16 16:02 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Chris Mason, Chris L. Mason, linux-rt-users, LKML, linux-fsdevel,
	Thomas Gleixner

On Mon, 2012-07-16 at 04:02 +0200, Mike Galbraith wrote:

> > Great, thanks!  I got stuck in bug land on Friday.  You mentioned
> > performance problems earlier on Saturday, did this improve performance?
> 
> Yeah, the read_trylock() seems to improve throughput.  That's not
> heavily tested, but it certainly looks like it does.  No idea why.

Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
priority process preempted a lower priority process that holds the same
lock, it will deadlock.

I'm not sure why you would get a performance benefit from this, as the
mutex used is an adaptive one (failure to acquire the lock will only
sleep if preempted or if the owner is not running).

We should look at why this performs better (if it really does).

-- Steve

> 
> WRT performance, dbench isn't thrilled, but btrfs seems to work just
> fine for my routine usage, and spinning rust bucket is being all it can
> be.  I hope I don't have to care overly much about dbench's opinion.  It
> doesn't make happy multi-thread numbers with btrfs, but those numbers
> suddenly look great if you rebase relative to xfs -rt throughput :)
> 
> > One other question:
> > 
> > >  again:
> > > +#ifdef CONFIG_PREEMPT_RT_BASE
> > > +	while (atomic_read(&eb->blocking_readers))
> > > +		cpu_chill();
> > > +	while(!read_trylock(&eb->lock))
> > > +		cpu_chill();
> > > +	if (atomic_read(&eb->blocking_readers)) {
> > > +		read_unlock(&eb->lock);
> > > +		goto again;
> > > +	}
> > 
> > Why use read_trylock() in a loop instead of just trying to take the
> > lock?  Is this an RTism or are there other reasons?
> 
> First stab paranoia.  It worked, so I removed it.  It still worked but
> lost throughput, removed all my bits leaving only the lockdep bits, it
> still worked.
> 
> -Mike




* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-16 15:43       ` Chris Mason
@ 2012-07-16 16:16         ` Mike Galbraith
  0 siblings, 0 replies; 39+ messages in thread
From: Mike Galbraith @ 2012-07-16 16:16 UTC (permalink / raw)
  To: Chris Mason
  Cc: Chris L. Mason, linux-rt-users, LKML, linux-fsdevel,
	Thomas Gleixner, Steven Rostedt

On Mon, 2012-07-16 at 11:43 -0400, Chris Mason wrote: 
> On Mon, Jul 16, 2012 at 04:55:44AM -0600, Mike Galbraith wrote:

> > Seems btrfs isn't entirely convinced either.
> > 
> > [ 2292.336229] use_block_rsv: 1810 callbacks suppressed
> > [ 2292.336231] ------------[ cut here ]------------
> > [ 2292.336255] WARNING: at fs/btrfs/extent-tree.c:6344 use_block_rsv+0x17d/0x190 [btrfs]()
> > [ 2292.336257] Hardware name: System x3550 M3 -[7944K3G]-
> > [ 2292.336259] btrfs: block rsv returned -28
> 
> This is unrelated.  You got far enough into the benchmark to hit an
> ENOSPC warning.  This can be ignored (I just deleted it when we used 3.0
> for oracle).

Ah great, thanks.  I'll whack it in my tree as well then.

> re: dbench performance.  dbench tends to penalize fairness.  I can
> imagine RT making it slower in general.

It seems to work just fine for my normal workloads, and cyclictest is
happy, so I'm happy.  Zillion threads is 'keep the pieces' to me ;-)

If you think the patch is ok as is, I'll go ahead and submit it after I
let dbench hammer on it overnight at least.

-Mike



* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-16 16:02         ` Steven Rostedt
@ 2012-07-16 16:26           ` Mike Galbraith
  2012-07-16 16:35             ` Chris Mason
  2012-07-16 16:36             ` Mike Galbraith
  0 siblings, 2 replies; 39+ messages in thread
From: Mike Galbraith @ 2012-07-16 16:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Chris Mason, Chris L. Mason, linux-rt-users, LKML, linux-fsdevel,
	Thomas Gleixner

On Mon, 2012-07-16 at 12:02 -0400, Steven Rostedt wrote: 
> On Mon, 2012-07-16 at 04:02 +0200, Mike Galbraith wrote:
> 
> > > Great, thanks!  I got stuck in bug land on Friday.  You mentioned
> > > performance problems earlier on Saturday, did this improve performance?
> > 
> > Yeah, the read_trylock() seems to improve throughput.  That's not
> > heavily tested, but it certainly looks like it does.  No idea why.
> 
> Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> priority process preempted a lower priority process that holds the same
> lock, it will deadlock.

Hm, how, it's doing cpu_chill()?

> I'm not sure why you would get a performance benefit from this, as the
> mutex used is an adaptive one (failure to acquire the lock will only
> sleep if preempted or if the owner is not running).

I'm not attached to it, can whack it in a heartbeat.. especially so if
the thing can deadlock.  I've seen enough of those of late.

> We should look at why this performs better (if it really does).

Not sure it really does, there's variance, but it looked like it did.

-Mike




* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-16 16:26           ` Mike Galbraith
@ 2012-07-16 16:35             ` Chris Mason
  2012-07-16 16:36             ` Mike Galbraith
  1 sibling, 0 replies; 39+ messages in thread
From: Chris Mason @ 2012-07-16 16:35 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Steven Rostedt, Chris L. Mason, linux-rt-users, LKML,
	linux-fsdevel, Thomas Gleixner

On Mon, Jul 16, 2012 at 10:26:08AM -0600, Mike Galbraith wrote:
> On Mon, 2012-07-16 at 12:02 -0400, Steven Rostedt wrote: 
> > On Mon, 2012-07-16 at 04:02 +0200, Mike Galbraith wrote:
> > 
> > > > Great, thanks!  I got stuck in bug land on Friday.  You mentioned
> > > > performance problems earlier on Saturday, did this improve performance?
> > > 
> > > Yeah, the read_trylock() seems to improve throughput.  That's not
> > > heavily tested, but it certainly looks like it does.  No idea why.
> > 
> > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > priority process preempted a lower priority process that holds the same
> > lock, it will deadlock.
> 
> Hm, how, it's doing cpu_chill()?
> 
> > I'm not sure why you would get a performance benefit from this, as the
> > mutex used is an adaptive one (failure to acquire the lock will only
> > sleep if preempted or if the owner is not running).
> 
> I'm not attached to it, can whack it in a heartbeat.. especially so if
> the thing can deadlock.  I've seen enough of those of late.
> 
> > We should look at why this performs better (if it really does).
> 
> Not sure it really does, there's variance, but it looked like it did.
> 

I'd use a benchmark that is more consistent than dbench for this.  I
love dbench for generating load (and the occasional deadlock) but it
tends to steer you in the wrong direction on performance.

-chris



* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-16 16:26           ` Mike Galbraith
  2012-07-16 16:35             ` Chris Mason
@ 2012-07-16 16:36             ` Mike Galbraith
  2012-07-16 17:03               ` Steven Rostedt
  1 sibling, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2012-07-16 16:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Chris Mason, Chris L. Mason, linux-rt-users, LKML, linux-fsdevel,
	Thomas Gleixner

On Mon, 2012-07-16 at 18:26 +0200, Mike Galbraith wrote: 
> On Mon, 2012-07-16 at 12:02 -0400, Steven Rostedt wrote: 
> > On Mon, 2012-07-16 at 04:02 +0200, Mike Galbraith wrote:
> > 
> > > > Great, thanks!  I got stuck in bug land on Friday.  You mentioned
> > > > performance problems earlier on Saturday, did this improve performance?
> > > 
> > > Yeah, the read_trylock() seems to improve throughput.  That's not
> > > heavily tested, but it certainly looks like it does.  No idea why.
> > 
> > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > priority process preempted a lower priority process that holds the same
> > lock, it will deadlock.
> 
> Hm, how, it's doing cpu_chill()?

'course PI is toast, so *poof*.  Since just enabling the lockdep bits
seems to fix it up, maybe that's the patchlet to submit (less is more).

-Mike


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-16 16:36             ` Mike Galbraith
@ 2012-07-16 17:03               ` Steven Rostedt
  2012-07-17  4:18                 ` Mike Galbraith
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Rostedt @ 2012-07-16 17:03 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Chris Mason, Chris L. Mason, linux-rt-users, LKML, linux-fsdevel,
	Thomas Gleixner

On Mon, 2012-07-16 at 18:36 +0200, Mike Galbraith wrote:
>  
> > > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > > priority process preempted a lower priority process that holds the same
> > > lock, it will deadlock.
> > 
> > Hm, how, it's doing cpu_chill()?
> 
> 'course PI is toast, so *poof*.  Since just enabling the lockdep bits
> seems to fix it up, maybe that's the patchlet to submit (less is more).

There's that too. But the issue I was talking about is with all trylock
loops. As holding an rt-mutex now disables migration, if a high priority
process preempts a task that holds the lock, and then the high prio task
starts spinning waiting for that lock to release, the lower priority
process will never get to run to release it. The cpu_chill() doesn't
help.
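
To put the scenario in code, the loop shape in question looks roughly
like this (a made-up minimal example, not the actual btrfs locking code):

	/*
	 * The lock owner has migration disabled while it holds the
	 * rt_mutex-backed lock.  If a higher priority task on the same
	 * CPU loops here without truly sleeping, the preempted owner
	 * never gets to run again to release the lock.
	 */
	static void read_lock_with_backoff(rwlock_t *lock)
	{
		while (!read_trylock(lock))
			cpu_chill();	/* must really sleep on RT */
	}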

-- Steve



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-16 17:03               ` Steven Rostedt
@ 2012-07-17  4:18                 ` Mike Galbraith
  2012-07-17  4:27                   ` Steven Rostedt
  2012-07-17 12:54                   ` Mike Galbraith
  0 siblings, 2 replies; 39+ messages in thread
From: Mike Galbraith @ 2012-07-17  4:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Chris Mason, Chris L. Mason, linux-rt-users, LKML, linux-fsdevel,
	Thomas Gleixner

On Mon, 2012-07-16 at 13:03 -0400, Steven Rostedt wrote: 
> On Mon, 2012-07-16 at 18:36 +0200, Mike Galbraith wrote:
> >  
> > > > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > > > priority process preempted a lower priority process that holds the same
> > > > lock, it will deadlock.
> > > 
> > > Hm, how, it's doing cpu_chill()?
> > 
> > 'course PI is toast, so *poof*.  Since just enabling the lockdep bits
> > seems to fix it up, maybe that's the patchlet to submit (less is more).
> 
> There's that too. But the issue I was talking about is with all trylock
> loops. As holding an rt-mutex now disables migration, if a high priority
> process preempts a task that holds the lock, and then the high prio task
> starts spinning waiting for that lock to release, the lower priority
> process will never get to run to release it. The cpu_chill() doesn't
> help.

Hrm.  I better go make a testcase, this one definitely wants pounding
through thick skull.

I think all of the chilling in patchlet is really ugly anyway, so would
prefer to trash it all, just enable the lockdep bits.  If it turns out
we really do need to bounce off of counts, go get a bigger hammer when
the need arises.  For the nonce, the pre-installed hammer _seemed_ big
enough for the job.

What's a good way to beat the living hell out of btrfs?  I've never been
into destructive fs testing, since my filesystems usually lived on my one
and only disk.  The x3550 has two, and the OS clone has already been
sacrificed.

-Mike


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-17  4:18                 ` Mike Galbraith
@ 2012-07-17  4:27                   ` Steven Rostedt
  2012-07-17  4:34                     ` Steven Rostedt
  2012-07-17  4:44                     ` Mike Galbraith
  2012-07-17 12:54                   ` Mike Galbraith
  1 sibling, 2 replies; 39+ messages in thread
From: Steven Rostedt @ 2012-07-17  4:27 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Chris Mason, Chris L. Mason, linux-rt-users, LKML, linux-fsdevel,
	Thomas Gleixner

On Tue, 2012-07-17 at 06:18 +0200, Mike Galbraith wrote:
>  
> > There's that too. But the issue I was talking about is with all trylock
> > loops. As holding an rt-mutex now disables migration, if a high priority
> > process preempts a task that holds the lock, and then the high prio task
> > starts spinning waiting for that lock to release, the lower priority
> > process will never get to run to release it. The cpu_chill() doesn't
> > help.
> 
> Hrm.  I better go make a testcase, this one definitely wants pounding
> through thick skull.

Actually, I was mistaken. I forgot that we defined 'cpu_chill()' as
msleep(1) on RT, which would keep a deadlock from happening.

It doesn't explain the performance enhancement you get :-/
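
For reference, the definition in question looks roughly like this in the
-rt series (from memory, so treat it as a sketch rather than a verbatim
quote):

	/* include/linux/delay.h, -rt patches, approximately */
	#ifdef CONFIG_PREEMPT_RT_FULL
	# define cpu_chill()	msleep(1)	/* real sleep, lets the preempted owner run */
	#else
	# define cpu_chill()	cpu_relax()
	#endif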

-- Steve



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-17  4:27                   ` Steven Rostedt
@ 2012-07-17  4:34                     ` Steven Rostedt
  2012-07-17  4:46                       ` Mike Galbraith
  2012-07-17  4:44                     ` Mike Galbraith
  1 sibling, 1 reply; 39+ messages in thread
From: Steven Rostedt @ 2012-07-17  4:34 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Chris Mason, Chris L. Mason, linux-rt-users, LKML, linux-fsdevel,
	Thomas Gleixner

On Tue, 2012-07-17 at 00:27 -0400, Steven Rostedt wrote:

> Actually, I was mistaken. I forgot that we defined 'cpu_chill()' as
> msleep(1) on RT, which would keep a deadlock from happening.

Perhaps cpu_chill() isn't a good name, as it doesn't really explain what
is happening. Perhaps one of the following?

  cpu_rest()
  cpu_sleep()
  cpu_deep_relax()
  cpu_dream()
  cpu_hypnotize()

-- Steve
 



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-17  4:27                   ` Steven Rostedt
  2012-07-17  4:34                     ` Steven Rostedt
@ 2012-07-17  4:44                     ` Mike Galbraith
  1 sibling, 0 replies; 39+ messages in thread
From: Mike Galbraith @ 2012-07-17  4:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Chris Mason, Chris L. Mason, linux-rt-users, LKML, linux-fsdevel,
	Thomas Gleixner

On Tue, 2012-07-17 at 00:27 -0400, Steven Rostedt wrote: 
> On Tue, 2012-07-17 at 06:18 +0200, Mike Galbraith wrote:
> >  
> > > There's that too. But the issue I was talking about is with all trylock
> > > loops. As holding an rt-mutex now disables migration, if a high priority
> > > process preempts a task that holds the lock, and then the high prio task
> > > starts spinning waiting for that lock to release, the lower priority
> > > process will never get to run to release it. The cpu_chill() doesn't
> > > help.
> > 
> > Hrm.  I better go make a testcase, this one definitely wants pounding
> > through thick skull.
> 
> Actually, I was mistaken. I forgot that we defined 'cpu_chill()' as
> msleep(1) on RT, which would keep a deadlock from happening.

Whew!  There are no stars and moons on my pointy hat, just a plain white
cone, so you had me worried I was missing something critical there.

> It doesn't explain the performance enhancement you get :-/

No, it doesn't.  The only thing I can think of is that while folks are
in a timed sleep, they aren't preempting and interleaving IO as much, but
I'm pulling that out of thin air.  A timed sleep should be a lot longer
than a regular wakeup, so to my mind there should be less interleaving
due to more thumb twiddling.

-Mike


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-17  4:34                     ` Steven Rostedt
@ 2012-07-17  4:46                       ` Mike Galbraith
  0 siblings, 0 replies; 39+ messages in thread
From: Mike Galbraith @ 2012-07-17  4:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Chris Mason, Chris L. Mason, linux-rt-users, LKML, linux-fsdevel,
	Thomas Gleixner

On Tue, 2012-07-17 at 00:34 -0400, Steven Rostedt wrote: 
> On Tue, 2012-07-17 at 00:27 -0400, Steven Rostedt wrote:
> 
> > Actually, I was mistaken. I forgot that we defined 'cpu_chill()' as
> > msleep(1) on RT, which would keep a deadlock from happening.
> 
> Perhaps cpu_chill() isn't a good name, as it doesn't really explain what
> is happening. Perhaps one of the following?
> 
>   cpu_rest()
>   cpu_sleep()
>   cpu_deep_relax()
>   cpu_dream()
>   cpu_hypnotize()

(   cpu_waste_truckloads_of_time();)


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 3.4.4-rt13: btrfs + xfstests 006 = BOOM..  and a bonus rt_mutex deadlock report for absolutely free!
  2012-07-17  4:18                 ` Mike Galbraith
  2012-07-17  4:27                   ` Steven Rostedt
@ 2012-07-17 12:54                   ` Mike Galbraith
  1 sibling, 0 replies; 39+ messages in thread
From: Mike Galbraith @ 2012-07-17 12:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Chris Mason, Chris L. Mason, linux-rt-users, LKML, linux-fsdevel,
	Thomas Gleixner

On Tue, 2012-07-17 at 06:18 +0200, Mike Galbraith wrote: 
> On Mon, 2012-07-16 at 13:03 -0400, Steven Rostedt wrote: 
> > On Mon, 2012-07-16 at 18:36 +0200, Mike Galbraith wrote:
> > >  
> > > > > Ouch, you just turned the rt_read_lock() into a spin lock. If a higher
> > > > > priority process preempted a lower priority process that holds the same
> > > > > lock, it will deadlock.
> > > > 
> > > > Hm, how, it's doing cpu_chill()?
> > > 
> > > 'course PI is toast, so *poof*.  Since just enabling the lockdep bits
> > > seems to fix it up, maybe that's the patchlet to submit (less is more).
> > 
> > There's that too. But the issue I was talking about is with all trylock
> > loops. As holding an rt-mutex now disables migration, if a high priority
> > process preempts a task that holds the lock, and then the high prio task
> > starts spinning waiting for that lock to release, the lower priority
> > process will never get to run to release it. The cpu_chill() doesn't
> > help.
> 
> Hrm.  I better go make a testcase, this one definitely wants pounding
> through thick skull.
> 
> I think all of the chilling in patchlet is really ugly anyway, so would
> prefer to trash it all, just enable the lockdep bits.  If it turns out
> we really do need to bounce off of counts, go get a bigger hammer when
> the need arises.  For the nonce, the pre-installed hammer _seemed_ big
> enough for the job.

An all-night dbench session, plus a full day of xfstests runs (those
that will run on btrfs), fsstress -p64, and generic beating, says the
pre-installed tool is fine all by itself, so here comes a zero line
patch.. the second best kind ;-)

rt,fs,btrfs: fix rt deadlock on extent_buffer->lock

Trivially repeatable deadlock is cured by enabling the lockdep code in
btrfs_clear_path_blocking(), as suggested by Chris Mason.  He also
suggested restricting the blocking reader count to one, and not allowing
a spinning reader while a blocking reader exists.  That has proven to be
unnecessary; the strict lock order enforcement is enough.. or rather,
that's my box's opinion after long hours of hard pounding.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Chris Mason <chris.mason@fusionio.com>

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 2e66786..1f71eb0 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -72,7 +72,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
 {
 	int i;
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined (CONFIG_PREEMPT_RT_BASE))
 	/* lockdep really cares that we take all of these spinlocks
 	 * in the right order.  If any of the locks in the path are not
 	 * currently blocking, it is going to complain.  So, make really
@@ -89,7 +89,7 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
 			btrfs_clear_lock_blocking(p->nodes[i]);
 	}
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined (CONFIG_PREEMPT_RT_BASE))
 	if (held)
 		btrfs_clear_lock_blocking(held);
 #endif



^ permalink raw reply related	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2012-07-17 12:54 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-12  5:47 3.4.4-rt13: btrfs + xfstests 006 = BOOM.. and a bonus rt_mutex deadlock report for absolutely free! Mike Galbraith
2012-07-12  8:44 ` Mike Galbraith
2012-07-12  9:53   ` Mike Galbraith
2012-07-12 11:43     ` Thomas Gleixner
2012-07-12 11:57       ` Mike Galbraith
2012-07-12 13:31         ` Thomas Gleixner
2012-07-12 13:37           ` Mike Galbraith
2012-07-12 13:43             ` Thomas Gleixner
2012-07-12 13:48               ` Mike Galbraith
2012-07-12 13:51                 ` Mike Galbraith
2012-07-13  6:31           ` Mike Galbraith
2012-07-13  9:52             ` Thomas Gleixner
2012-07-13 10:14               ` Mike Galbraith
2012-07-13 10:26                 ` Thomas Gleixner
2012-07-13 10:47                   ` Chris Mason
2012-07-13 12:50                     ` Mike Galbraith
2012-07-12 11:07 ` Thomas Gleixner
2012-07-12 17:09   ` Chris Mason
2012-07-13 10:04     ` Thomas Gleixner
2012-07-13 12:50 ` Chris Mason
2012-07-13 14:47   ` Thomas Gleixner
2012-07-14 10:14   ` Mike Galbraith
2012-07-15 17:56     ` Chris Mason
2012-07-16  2:02       ` Mike Galbraith
2012-07-16 16:02         ` Steven Rostedt
2012-07-16 16:26           ` Mike Galbraith
2012-07-16 16:35             ` Chris Mason
2012-07-16 16:36             ` Mike Galbraith
2012-07-16 17:03               ` Steven Rostedt
2012-07-17  4:18                 ` Mike Galbraith
2012-07-17  4:27                   ` Steven Rostedt
2012-07-17  4:34                     ` Steven Rostedt
2012-07-17  4:46                       ` Mike Galbraith
2012-07-17  4:44                     ` Mike Galbraith
2012-07-17 12:54                   ` Mike Galbraith
2012-07-16 10:55     ` Mike Galbraith
2012-07-16 15:43       ` Chris Mason
2012-07-16 16:16         ` Mike Galbraith
2012-07-14 13:38   ` Mike Galbraith
