* Filesystem lockup with CONFIG_PREEMPT_RT
@ 2014-05-14  2:29 ` Austin Schuh
  0 siblings, 0 replies; 43+ messages in thread
From: Austin Schuh @ 2014-05-14  2:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: xfs

[-- Attachment #1: Type: text/plain, Size: 831 bytes --]

Hi,

I am observing a filesystem lockup with XFS on a CONFIG_PREEMPT_RT
patched kernel.  I have currently only triggered it using dpkg.  Dave
Chinner on the XFS mailing list suggested that it was an rt-kernel
workqueue issue rather than an XFS problem, after looking at the
kernel messages.

$ uname -a
Linux vpc5 3.10.24-rt22abs #15 SMP PREEMPT RT Tue May 13 14:42:22 PDT
2014 x86_64 GNU/Linux

The only modification to the kernel besides the RT patch is that I
have applied tglx's "genirq: Sanitize spurious interrupt detection of
threaded irqs" patch.

Any ideas on what could be wrong?

Is there any information that I can pull before I reboot the machine
that would be useful?  I have the output of triggering sysrq with l
and t, if that helps.  Attached is the kernel blocked-task message
output.
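
For reference, the dumps were captured roughly along these lines (a
sketch, from a root shell; the exact invocation may have differed):

# echo l > /proc/sysrq-trigger   # backtraces of all active CPUs
# echo t > /proc/sysrq-trigger   # state and stack trace of every task
# dmesg > sysrq-dump.txt         # save the result before rebooting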

Thanks,
    Austin

[-- Attachment #2: vpc5_xfs_lockup_locks --]
[-- Type: application/octet-stream, Size: 31428 bytes --]

May 13 18:45:18 vpc5 kernel: [  959.966849] INFO: task kworker/2:1:82 blocked for more than 120 seconds.
May 13 18:45:18 vpc5 kernel: [  959.966917] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 18:45:18 vpc5 kernel: [  959.966972] kworker/2:1     D 0000000000000000     0    82      2 0x00000000
May 13 18:45:18 vpc5 kernel: [  959.966981] Workqueue: xfs-data/sda5 xfs_end_io
May 13 18:45:18 vpc5 kernel: [  959.966985]  ffff8803f64b9b58 0000000000000002 0000000000000000 ffff8803f659c600
May 13 18:45:18 vpc5 kernel: [  959.966987]  ffff880366153a40 ffff8803f64b9fd8 0000000000063800 ffff8803f64b9fd8
May 13 18:45:18 vpc5 kernel: [  959.966988]  0000000000063800 ffff8803f64b1180 ffff8803f64b9b78 ffff8803f64b1180
May 13 18:45:18 vpc5 kernel: [  959.966991] Call Trace:
May 13 18:45:18 vpc5 kernel: [  959.966997]  [<ffffffff814b7711>] schedule+0x75/0x87
May 13 18:45:18 vpc5 kernel: [  959.966999]  [<ffffffff814b7c74>] __rt_mutex_slowlock+0x8e/0xc7
May 13 18:45:18 vpc5 kernel: [  959.967001]  [<ffffffff814b7db6>] rt_mutex_slowlock+0x109/0x18a
May 13 18:45:18 vpc5 kernel: [  959.967005]  [<ffffffff8107a629>] ? __lock_acquire.isra.27+0x1ce/0x541
May 13 18:45:18 vpc5 kernel: [  959.967008]  [<ffffffff811b1f1f>] ? xfs_setfilesize+0x80/0x148
May 13 18:45:18 vpc5 kernel: [  959.967009]  [<ffffffff814b7e4e>] rt_mutex_lock+0x17/0x19
May 13 18:45:18 vpc5 kernel: [  959.967011]  [<ffffffff8107fdbd>] rt_down_write_nested+0x3a/0x41
May 13 18:45:18 vpc5 kernel: [  959.967014]  [<ffffffff811f18d0>] ? xfs_ilock+0x99/0xd6
May 13 18:45:18 vpc5 kernel: [  959.967015]  [<ffffffff811f18d0>] xfs_ilock+0x99/0xd6
May 13 18:45:18 vpc5 kernel: [  959.967017]  [<ffffffff811b1f1f>] xfs_setfilesize+0x80/0x148
May 13 18:45:18 vpc5 kernel: [  959.967019]  [<ffffffff811b1ea4>] ? xfs_setfilesize+0x5/0x148
May 13 18:45:18 vpc5 kernel: [  959.967021]  [<ffffffff8104f2e0>] ? process_one_work+0x16f/0x397
May 13 18:45:18 vpc5 kernel: [  959.967023]  [<ffffffff811b2bba>] xfs_end_io+0x83/0x99
May 13 18:45:18 vpc5 kernel: [  959.967025]  [<ffffffff8104f384>] process_one_work+0x213/0x397
May 13 18:45:18 vpc5 kernel: [  959.967026]  [<ffffffff8104f2e0>] ? process_one_work+0x16f/0x397
May 13 18:45:18 vpc5 kernel: [  959.967028]  [<ffffffff8104f925>] worker_thread+0x149/0x224
May 13 18:45:18 vpc5 kernel: [  959.967030]  [<ffffffff8104f7dc>] ? rescuer_thread+0x2a5/0x2a5
May 13 18:45:18 vpc5 kernel: [  959.967032]  [<ffffffff810550b2>] kthread+0xa2/0xaa
May 13 18:45:18 vpc5 kernel: [  959.967035]  [<ffffffff81055010>] ? __kthread_parkme+0x65/0x65
May 13 18:45:18 vpc5 kernel: [  959.967037]  [<ffffffff814be1dc>] ret_from_fork+0x7c/0xb0
May 13 18:45:18 vpc5 kernel: [  959.967038]  [<ffffffff81055010>] ? __kthread_parkme+0x65/0x65
May 13 18:45:18 vpc5 kernel: [  959.967040] 4 locks held by kworker/2:1/82:
May 13 18:45:18 vpc5 kernel: [  959.967051]  #0:  (xfs-data/%s){......}, at: [<ffffffff8104f2e0>] process_one_work+0x16f/0x397
May 13 18:45:18 vpc5 kernel: [  959.967056]  #1:  ((&ioend->io_work)){......}, at: [<ffffffff8104f2e0>] process_one_work+0x16f/0x397
May 13 18:45:18 vpc5 kernel: [  959.967061]  #2:  (sb_internal){......}, at: [<ffffffff811b1ea4>] xfs_setfilesize+0x5/0x148
May 13 18:45:18 vpc5 kernel: [  959.967066]  #3:  (&(&ip->i_lock)->mr_lock){......}, at: [<ffffffff811f18d0>] xfs_ilock+0x99/0xd6
May 13 18:45:18 vpc5 kernel: [  959.967073] INFO: task kworker/u16:5:241 blocked for more than 120 seconds.
May 13 18:45:18 vpc5 kernel: [  959.967127] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 18:45:18 vpc5 kernel: [  959.967181] kworker/u16:5   D 0000000000000002     0   241      2 0x00000000
May 13 18:45:18 vpc5 kernel: [  959.967187] Workqueue: writeback bdi_writeback_workfn (flush-8:0)
May 13 18:45:18 vpc5 kernel: [  959.967190]  ffff8803f572b3c8 0000000000000002 ffff88042db62b80 ffff8803f8f9c600
May 13 18:45:18 vpc5 kernel: [  959.967192]  ffff8803f63b6900 ffff8803f572bfd8 0000000000063800 ffff8803f572bfd8
May 13 18:45:18 vpc5 kernel: [  959.967193]  0000000000063800 ffff8803f63b6900 ffff880300000001 ffff8803f63b6900
May 13 18:45:18 vpc5 kernel: [  959.967195] Call Trace:
May 13 18:45:18 vpc5 kernel: [  959.967198]  [<ffffffff814b7711>] schedule+0x75/0x87
May 13 18:45:18 vpc5 kernel: [  959.967200]  [<ffffffff814b6790>] schedule_timeout+0x37/0xf9
May 13 18:45:18 vpc5 kernel: [  959.967201]  [<ffffffff814b6f25>] ? __wait_for_common+0x2a/0xda
May 13 18:45:18 vpc5 kernel: [  959.967203]  [<ffffffff814b6f78>] __wait_for_common+0x7d/0xda
May 13 18:45:18 vpc5 kernel: [  959.967205]  [<ffffffff814b6759>] ? console_conditional_schedule+0x19/0x19
May 13 18:45:18 vpc5 kernel: [  959.967207]  [<ffffffff814b6ff9>] wait_for_completion+0x24/0x26
May 13 18:45:18 vpc5 kernel: [  959.967209]  [<ffffffff811da0fe>] xfs_bmapi_allocate+0xd8/0xea
May 13 18:45:18 vpc5 kernel: [  959.967211]  [<ffffffff811da453>] xfs_bmapi_write+0x343/0x59b
May 13 18:45:18 vpc5 kernel: [  959.967213]  [<ffffffff811d7d75>] ? __xfs_bmapi_allocate+0x23c/0x23c
May 13 18:45:18 vpc5 kernel: [  959.967216]  [<ffffffff811bfb94>] xfs_iomap_write_allocate+0x1b9/0x2c2
May 13 18:45:18 vpc5 kernel: [  959.967218]  [<ffffffff811b2218>] xfs_map_blocks+0x12b/0x203
May 13 18:45:18 vpc5 kernel: [  959.967221]  [<ffffffff810d451c>] ? rcu_read_unlock+0x23/0x23
May 13 18:45:18 vpc5 kernel: [  959.967223]  [<ffffffff811b326a>] xfs_vm_writepage+0x280/0x4b8
May 13 18:45:18 vpc5 kernel: [  959.967226]  [<ffffffff810dc27c>] __writepage+0x18/0x37
May 13 18:45:18 vpc5 kernel: [  959.967228]  [<ffffffff810dcdd0>] write_cache_pages+0x25a/0x37e
May 13 18:45:18 vpc5 kernel: [  959.967230]  [<ffffffff810dc264>] ? page_index+0x1a/0x1a
May 13 18:45:18 vpc5 kernel: [  959.967232]  [<ffffffff814b81d8>] ? rt_spin_lock_slowunlock+0x14/0x20
May 13 18:45:18 vpc5 kernel: [  959.967234]  [<ffffffff810dcf35>] generic_writepages+0x41/0x5b
May 13 18:45:18 vpc5 kernel: [  959.967236]  [<ffffffff811b1e29>] xfs_vm_writepages+0x51/0x5c
May 13 18:45:18 vpc5 kernel: [  959.967238]  [<ffffffff810de05b>] do_writepages+0x21/0x2f
May 13 18:45:18 vpc5 kernel: [  959.967239]  [<ffffffff81142c1c>] __writeback_single_inode+0x7b/0x238
May 13 18:45:18 vpc5 kernel: [  959.967240]  [<ffffffff81143e74>] writeback_sb_inodes+0x220/0x37a
May 13 18:45:18 vpc5 kernel: [  959.967242]  [<ffffffff81144042>] __writeback_inodes_wb+0x74/0xb9
May 13 18:45:18 vpc5 kernel: [  959.967244]  [<ffffffff811441c5>] wb_writeback+0x13e/0x2a3
May 13 18:45:18 vpc5 kernel: [  959.967245]  [<ffffffff8114463b>] wb_do_writeback+0x163/0x1d9
May 13 18:45:18 vpc5 kernel: [  959.967247]  [<ffffffff8114471d>] bdi_writeback_workfn+0x6c/0xfe
May 13 18:45:18 vpc5 kernel: [  959.967249]  [<ffffffff8104f384>] process_one_work+0x213/0x397
May 13 18:45:18 vpc5 kernel: [  959.967250]  [<ffffffff8104f2e0>] ? process_one_work+0x16f/0x397
May 13 18:45:18 vpc5 kernel: [  959.967252]  [<ffffffff8104f925>] worker_thread+0x149/0x224
May 13 18:45:18 vpc5 kernel: [  959.967254]  [<ffffffff8104f7dc>] ? rescuer_thread+0x2a5/0x2a5
May 13 18:45:18 vpc5 kernel: [  959.967255]  [<ffffffff810550b2>] kthread+0xa2/0xaa
May 13 18:45:18 vpc5 kernel: [  959.967258]  [<ffffffff81055010>] ? __kthread_parkme+0x65/0x65
May 13 18:45:18 vpc5 kernel: [  959.967259]  [<ffffffff814be1dc>] ret_from_fork+0x7c/0xb0
May 13 18:45:18 vpc5 kernel: [  959.967261]  [<ffffffff81055010>] ? __kthread_parkme+0x65/0x65
May 13 18:45:18 vpc5 kernel: [  959.967262] 5 locks held by kworker/u16:5/241:
May 13 18:45:18 vpc5 kernel: [  959.967278]  #0:  (writeback){......}, at: [<ffffffff8104f2e0>] process_one_work+0x16f/0x397
May 13 18:45:18 vpc5 kernel: [  959.967283]  #1:  ((&(&wb->dwork)->work)){......}, at: [<ffffffff8104f2e0>] process_one_work+0x16f/0x397
May 13 18:45:18 vpc5 kernel: [  959.967288]  #2:  (&type->s_umount_key#18){......}, at: [<ffffffff81123d4f>] grab_super_passive+0x60/0x8a
May 13 18:45:18 vpc5 kernel: [  959.967294]  #3:  (sb_internal){......}, at: [<ffffffff811ff638>] xfs_trans_alloc+0x24/0x3d
May 13 18:45:18 vpc5 kernel: [  959.967299]  #4:  (&(&ip->i_lock)->mr_lock){......}, at: [<ffffffff811f18d0>] xfs_ilock+0x99/0xd6
May 13 18:45:18 vpc5 kernel: [  959.967315] INFO: task dpkg:5825 blocked for more than 120 seconds.
May 13 18:45:18 vpc5 kernel: [  959.967367] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 18:45:18 vpc5 kernel: [  959.967422] dpkg            D 0000000000000002     0  5825   5789 0x00000000
May 13 18:45:18 vpc5 kernel: [  959.967425]  ffff8803f57cd778 0000000000000002 0000000000000001 ffff8803f8f9c600
May 13 18:45:18 vpc5 kernel: [  959.967427]  ffff8803f57cd778 ffff8803f57cdfd8 0000000000063800 ffff8803f57cdfd8
May 13 18:45:18 vpc5 kernel: [  959.967428]  0000000000063800 ffff8803f659c600 ffff880300000001 ffff8803f659c600
May 13 18:45:18 vpc5 kernel: [  959.967430] Call Trace:
May 13 18:45:18 vpc5 kernel: [  959.967433]  [<ffffffff814b7711>] schedule+0x75/0x87
May 13 18:45:18 vpc5 kernel: [  959.967435]  [<ffffffff814b6790>] schedule_timeout+0x37/0xf9
May 13 18:45:18 vpc5 kernel: [  959.967436]  [<ffffffff814b6f25>] ? __wait_for_common+0x2a/0xda
May 13 18:45:18 vpc5 kernel: [  959.967438]  [<ffffffff814b6f78>] __wait_for_common+0x7d/0xda
May 13 18:45:18 vpc5 kernel: [  959.967440]  [<ffffffff814b6759>] ? console_conditional_schedule+0x19/0x19
May 13 18:45:18 vpc5 kernel: [  959.967441]  [<ffffffff814b6ff9>] wait_for_completion+0x24/0x26
May 13 18:45:18 vpc5 kernel: [  959.967443]  [<ffffffff811da0fe>] xfs_bmapi_allocate+0xd8/0xea
May 13 18:45:18 vpc5 kernel: [  959.967444]  [<ffffffff811da453>] xfs_bmapi_write+0x343/0x59b
May 13 18:45:18 vpc5 kernel: [  959.967447]  [<ffffffff811d7d75>] ? __xfs_bmapi_allocate+0x23c/0x23c
May 13 18:45:18 vpc5 kernel: [  959.967449]  [<ffffffff811bfb94>] xfs_iomap_write_allocate+0x1b9/0x2c2
May 13 18:45:18 vpc5 kernel: [  959.967451]  [<ffffffff811b2218>] xfs_map_blocks+0x12b/0x203
May 13 18:45:18 vpc5 kernel: [  959.967453]  [<ffffffff811b326a>] xfs_vm_writepage+0x280/0x4b8
May 13 18:45:18 vpc5 kernel: [  959.967456]  [<ffffffff810dc27c>] __writepage+0x18/0x37
May 13 18:45:18 vpc5 kernel: [  959.967458]  [<ffffffff810dcdd0>] write_cache_pages+0x25a/0x37e
May 13 18:45:18 vpc5 kernel: [  959.967460]  [<ffffffff810dc264>] ? page_index+0x1a/0x1a
May 13 18:45:18 vpc5 kernel: [  959.967462]  [<ffffffff810dcf35>] generic_writepages+0x41/0x5b
May 13 18:45:18 vpc5 kernel: [  959.967464]  [<ffffffff811b1e29>] xfs_vm_writepages+0x51/0x5c
May 13 18:45:18 vpc5 kernel: [  959.967466]  [<ffffffff810de05b>] do_writepages+0x21/0x2f
May 13 18:45:18 vpc5 kernel: [  959.967467]  [<ffffffff810d54b6>] __filemap_fdatawrite_range+0x53/0x55
May 13 18:45:18 vpc5 kernel: [  959.967469]  [<ffffffff810d5f57>] filemap_fdatawrite_range+0x13/0x15
May 13 18:45:18 vpc5 kernel: [  959.967471]  [<ffffffff811479c6>] SyS_sync_file_range+0xe9/0x12d
May 13 18:45:18 vpc5 kernel: [  959.967473]  [<ffffffff814be482>] tracesys+0xdd/0xe2
May 13 18:45:18 vpc5 kernel: [  959.967474] 2 locks held by dpkg/5825:
May 13 18:45:18 vpc5 kernel: [  959.967485]  #0:  (sb_internal){......}, at: [<ffffffff811ff638>] xfs_trans_alloc+0x24/0x3d
May 13 18:45:18 vpc5 kernel: [  959.967490]  #1:  (&(&ip->i_lock)->mr_lock){......}, at: [<ffffffff811f18d0>] xfs_ilock+0x99/0xd6
May 13 18:45:18 vpc5 kernel: [  959.967496] INFO: task shutdown_watche:5964 blocked for more than 120 seconds.
May 13 18:45:18 vpc5 kernel: [  959.967549] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 18:45:18 vpc5 kernel: [  959.967604] shutdown_watche D 0000000000000002     0  5964   2604 0x00000000
May 13 18:45:18 vpc5 kernel: [  959.967607]  ffff8803c1561ce8 0000000000000002 ffff8803c1561ce8 ffffffff81a1a400
May 13 18:45:18 vpc5 kernel: [  959.967609]  ffff8803f63b2300 ffff8803c1561fd8 0000000000063800 ffff8803c1561fd8
May 13 18:45:18 vpc5 kernel: [  959.967610]  0000000000063800 ffff8803f63b2300 ffff880300000001 ffff8803f63b2300
May 13 18:45:18 vpc5 kernel: [  959.967612] Call Trace:
May 13 18:45:18 vpc5 kernel: [  959.967615]  [<ffffffff814b7711>] schedule+0x75/0x87
May 13 18:45:18 vpc5 kernel: [  959.967616]  [<ffffffff814b6790>] schedule_timeout+0x37/0xf9
May 13 18:45:18 vpc5 kernel: [  959.967618]  [<ffffffff814b6f25>] ? __wait_for_common+0x2a/0xda
May 13 18:45:18 vpc5 kernel: [  959.967620]  [<ffffffff814b6f78>] __wait_for_common+0x7d/0xda
May 13 18:45:18 vpc5 kernel: [  959.967621]  [<ffffffff814b6759>] ? console_conditional_schedule+0x19/0x19
May 13 18:45:18 vpc5 kernel: [  959.967623]  [<ffffffff8104dce4>] ? flush_work+0x198/0x209
May 13 18:45:18 vpc5 kernel: [  959.967625]  [<ffffffff814b6ff9>] wait_for_completion+0x24/0x26
May 13 18:45:18 vpc5 kernel: [  959.967626]  [<ffffffff8104dd2d>] flush_work+0x1e1/0x209
May 13 18:45:18 vpc5 kernel: [  959.967627]  [<ffffffff8104dce4>] ? flush_work+0x198/0x209
May 13 18:45:18 vpc5 kernel: [  959.967629]  [<ffffffff8104c4b4>] ? create_and_start_worker+0x6e/0x6e
May 13 18:45:18 vpc5 kernel: [  959.967631]  [<ffffffff810dfe68>] ? __pagevec_release+0x2c/0x2c
May 13 18:45:18 vpc5 kernel: [  959.967633]  [<ffffffff8104ff67>] schedule_on_each_cpu+0xca/0x104
May 13 18:45:18 vpc5 kernel: [  959.967635]  [<ffffffff810dfe8d>] lru_add_drain_all+0x15/0x18
May 13 18:45:18 vpc5 kernel: [  959.967637]  [<ffffffff810f7aa9>] SyS_mlockall+0x48/0x11d
May 13 18:45:18 vpc5 kernel: [  959.967638]  [<ffffffff814be482>] tracesys+0xdd/0xe2
May 13 18:45:18 vpc5 kernel: [  959.967639] no locks held by shutdown_watche/5964.
May 13 18:45:18 vpc5 kernel: [  959.967648] INFO: task j1939_vehicle_m:5965 blocked for more than 120 seconds.
May 13 18:45:18 vpc5 kernel: [  959.967702] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 18:45:18 vpc5 kernel: [  959.967757] j1939_vehicle_m D 0000000000000002     0  5965   2604 0x00000000
May 13 18:45:18 vpc5 kernel: [  959.967760]  ffff8803c14f9ce8 0000000000000002 ffff8803c14f9ce8 ffff8803f8f9d780
May 13 18:45:18 vpc5 kernel: [  959.967761]  ffff8803f63b4600 ffff8803c14f9fd8 0000000000063800 ffff8803c14f9fd8
May 13 18:45:18 vpc5 kernel: [  959.967763]  0000000000063800 ffff8803f63b4600 ffff880300000001 ffff8803f63b4600
May 13 18:45:18 vpc5 kernel: [  959.967765] Call Trace:
May 13 18:45:18 vpc5 kernel: [  959.967768]  [<ffffffff814b7711>] schedule+0x75/0x87
May 13 18:45:18 vpc5 kernel: [  959.967769]  [<ffffffff814b6790>] schedule_timeout+0x37/0xf9
May 13 18:45:18 vpc5 kernel: [  959.967770]  [<ffffffff814b6f25>] ? __wait_for_common+0x2a/0xda
May 13 18:45:18 vpc5 kernel: [  959.967772]  [<ffffffff814b6f78>] __wait_for_common+0x7d/0xda
May 13 18:45:18 vpc5 kernel: [  959.967774]  [<ffffffff814b6759>] ? console_conditional_schedule+0x19/0x19
May 13 18:45:18 vpc5 kernel: [  959.967775]  [<ffffffff8104dce4>] ? flush_work+0x198/0x209
May 13 18:45:18 vpc5 kernel: [  959.967777]  [<ffffffff814b6ff9>] wait_for_completion+0x24/0x26
May 13 18:45:18 vpc5 kernel: [  959.967779]  [<ffffffff8104dd2d>] flush_work+0x1e1/0x209
May 13 18:45:18 vpc5 kernel: [  959.967780]  [<ffffffff8104dce4>] ? flush_work+0x198/0x209
May 13 18:45:18 vpc5 kernel: [  959.967783]  [<ffffffff8104c4b4>] ? create_and_start_worker+0x6e/0x6e
May 13 18:45:18 vpc5 kernel: [  959.967785]  [<ffffffff810dfe68>] ? __pagevec_release+0x2c/0x2c
May 13 18:45:18 vpc5 kernel: [  959.967786]  [<ffffffff8104ff67>] schedule_on_each_cpu+0xca/0x104
May 13 18:45:18 vpc5 kernel: [  959.967788]  [<ffffffff810dfe8d>] lru_add_drain_all+0x15/0x18
May 13 18:45:18 vpc5 kernel: [  959.967789]  [<ffffffff810f7aa9>] SyS_mlockall+0x48/0x11d
May 13 18:45:18 vpc5 kernel: [  959.967791]  [<ffffffff814be482>] tracesys+0xdd/0xe2
May 13 18:45:18 vpc5 kernel: [  959.967792] no locks held by j1939_vehicle_m/5965.
May 13 18:47:18 vpc5 kernel: [ 1079.835312] INFO: task kworker/2:1:82 blocked for more than 120 seconds.
May 13 18:47:18 vpc5 kernel: [ 1079.835369] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 18:47:18 vpc5 kernel: [ 1079.835426] kworker/2:1     D 0000000000000000     0    82      2 0x00000000
May 13 18:47:18 vpc5 kernel: [ 1079.835434] Workqueue: xfs-data/sda5 xfs_end_io
May 13 18:47:18 vpc5 kernel: [ 1079.835438]  ffff8803f64b9b58 0000000000000002 0000000000000000 ffff8803f659c600
May 13 18:47:18 vpc5 kernel: [ 1079.835439]  ffff880366153a40 ffff8803f64b9fd8 0000000000063800 ffff8803f64b9fd8
May 13 18:47:18 vpc5 kernel: [ 1079.835441]  0000000000063800 ffff8803f64b1180 ffff8803f64b9b78 ffff8803f64b1180
May 13 18:47:18 vpc5 kernel: [ 1079.835444] Call Trace:
May 13 18:47:18 vpc5 kernel: [ 1079.835449]  [<ffffffff814b7711>] schedule+0x75/0x87
May 13 18:47:18 vpc5 kernel: [ 1079.835451]  [<ffffffff814b7c74>] __rt_mutex_slowlock+0x8e/0xc7
May 13 18:47:18 vpc5 kernel: [ 1079.835453]  [<ffffffff814b7db6>] rt_mutex_slowlock+0x109/0x18a
May 13 18:47:18 vpc5 kernel: [ 1079.835456]  [<ffffffff8107a629>] ? __lock_acquire.isra.27+0x1ce/0x541
May 13 18:47:18 vpc5 kernel: [ 1079.835459]  [<ffffffff811b1f1f>] ? xfs_setfilesize+0x80/0x148
May 13 18:47:18 vpc5 kernel: [ 1079.835461]  [<ffffffff814b7e4e>] rt_mutex_lock+0x17/0x19
May 13 18:47:18 vpc5 kernel: [ 1079.835462]  [<ffffffff8107fdbd>] rt_down_write_nested+0x3a/0x41
May 13 18:47:18 vpc5 kernel: [ 1079.835465]  [<ffffffff811f18d0>] ? xfs_ilock+0x99/0xd6
May 13 18:47:18 vpc5 kernel: [ 1079.835466]  [<ffffffff811f18d0>] xfs_ilock+0x99/0xd6
May 13 18:47:18 vpc5 kernel: [ 1079.835468]  [<ffffffff811b1f1f>] xfs_setfilesize+0x80/0x148
May 13 18:47:18 vpc5 kernel: [ 1079.835470]  [<ffffffff811b1ea4>] ? xfs_setfilesize+0x5/0x148
May 13 18:47:18 vpc5 kernel: [ 1079.835473]  [<ffffffff8104f2e0>] ? process_one_work+0x16f/0x397
May 13 18:47:18 vpc5 kernel: [ 1079.835475]  [<ffffffff811b2bba>] xfs_end_io+0x83/0x99
May 13 18:47:18 vpc5 kernel: [ 1079.835476]  [<ffffffff8104f384>] process_one_work+0x213/0x397
May 13 18:47:18 vpc5 kernel: [ 1079.835478]  [<ffffffff8104f2e0>] ? process_one_work+0x16f/0x397
May 13 18:47:18 vpc5 kernel: [ 1079.835480]  [<ffffffff8104f925>] worker_thread+0x149/0x224
May 13 18:47:18 vpc5 kernel: [ 1079.835482]  [<ffffffff8104f7dc>] ? rescuer_thread+0x2a5/0x2a5
May 13 18:47:18 vpc5 kernel: [ 1079.835484]  [<ffffffff810550b2>] kthread+0xa2/0xaa
May 13 18:47:18 vpc5 kernel: [ 1079.835487]  [<ffffffff81055010>] ? __kthread_parkme+0x65/0x65
May 13 18:47:18 vpc5 kernel: [ 1079.835489]  [<ffffffff814be1dc>] ret_from_fork+0x7c/0xb0
May 13 18:47:18 vpc5 kernel: [ 1079.835490]  [<ffffffff81055010>] ? __kthread_parkme+0x65/0x65
May 13 18:47:18 vpc5 kernel: [ 1079.835492] 4 locks held by kworker/2:1/82:
May 13 18:47:18 vpc5 kernel: [ 1079.835503]  #0:  (xfs-data/%s){......}, at: [<ffffffff8104f2e0>] process_one_work+0x16f/0x397
May 13 18:47:18 vpc5 kernel: [ 1079.835509]  #1:  ((&ioend->io_work)){......}, at: [<ffffffff8104f2e0>] process_one_work+0x16f/0x397
May 13 18:47:18 vpc5 kernel: [ 1079.835514]  #2:  (sb_internal){......}, at: [<ffffffff811b1ea4>] xfs_setfilesize+0x5/0x148
May 13 18:47:18 vpc5 kernel: [ 1079.835519]  #3:  (&(&ip->i_lock)->mr_lock){......}, at: [<ffffffff811f18d0>] xfs_ilock+0x99/0xd6
May 13 18:47:18 vpc5 kernel: [ 1079.835526] INFO: task kworker/u16:5:241 blocked for more than 120 seconds.
May 13 18:47:18 vpc5 kernel: [ 1079.835590] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 18:47:18 vpc5 kernel: [ 1079.835665] kworker/u16:5   D 0000000000000002     0   241      2 0x00000000
May 13 18:47:18 vpc5 kernel: [ 1079.835670] Workqueue: writeback bdi_writeback_workfn (flush-8:0)
May 13 18:47:18 vpc5 kernel: [ 1079.835674]  ffff8803f572b3c8 0000000000000002 ffff88042db62b80 ffff8803f8f9c600
May 13 18:47:18 vpc5 kernel: [ 1079.835675]  ffff8803f63b6900 ffff8803f572bfd8 0000000000063800 ffff8803f572bfd8
May 13 18:47:18 vpc5 kernel: [ 1079.835677]  0000000000063800 ffff8803f63b6900 ffff880300000001 ffff8803f63b6900
May 13 18:47:18 vpc5 kernel: [ 1079.835679] Call Trace:
May 13 18:47:18 vpc5 kernel: [ 1079.835682]  [<ffffffff814b7711>] schedule+0x75/0x87
May 13 18:47:18 vpc5 kernel: [ 1079.835684]  [<ffffffff814b6790>] schedule_timeout+0x37/0xf9
May 13 18:47:18 vpc5 kernel: [ 1079.835686]  [<ffffffff814b6f25>] ? __wait_for_common+0x2a/0xda
May 13 18:47:18 vpc5 kernel: [ 1079.835688]  [<ffffffff814b6f78>] __wait_for_common+0x7d/0xda
May 13 18:47:18 vpc5 kernel: [ 1079.835690]  [<ffffffff814b6759>] ? console_conditional_schedule+0x19/0x19
May 13 18:47:18 vpc5 kernel: [ 1079.835692]  [<ffffffff814b6ff9>] wait_for_completion+0x24/0x26
May 13 18:47:18 vpc5 kernel: [ 1079.835693]  [<ffffffff811da0fe>] xfs_bmapi_allocate+0xd8/0xea
May 13 18:47:18 vpc5 kernel: [ 1079.835695]  [<ffffffff811da453>] xfs_bmapi_write+0x343/0x59b
May 13 18:47:18 vpc5 kernel: [ 1079.835698]  [<ffffffff811d7d75>] ? __xfs_bmapi_allocate+0x23c/0x23c
May 13 18:47:18 vpc5 kernel: [ 1079.835701]  [<ffffffff811bfb94>] xfs_iomap_write_allocate+0x1b9/0x2c2
May 13 18:47:18 vpc5 kernel: [ 1079.835703]  [<ffffffff811b2218>] xfs_map_blocks+0x12b/0x203
May 13 18:47:18 vpc5 kernel: [ 1079.835705]  [<ffffffff810d451c>] ? rcu_read_unlock+0x23/0x23
May 13 18:47:18 vpc5 kernel: [ 1079.835707]  [<ffffffff811b326a>] xfs_vm_writepage+0x280/0x4b8
May 13 18:47:18 vpc5 kernel: [ 1079.835710]  [<ffffffff810dc27c>] __writepage+0x18/0x37
May 13 18:47:18 vpc5 kernel: [ 1079.835713]  [<ffffffff810dcdd0>] write_cache_pages+0x25a/0x37e
May 13 18:47:18 vpc5 kernel: [ 1079.835715]  [<ffffffff810dc264>] ? page_index+0x1a/0x1a
May 13 18:47:18 vpc5 kernel: [ 1079.835717]  [<ffffffff814b81d8>] ? rt_spin_lock_slowunlock+0x14/0x20
May 13 18:47:18 vpc5 kernel: [ 1079.835719]  [<ffffffff810dcf35>] generic_writepages+0x41/0x5b
May 13 18:47:18 vpc5 kernel: [ 1079.835721]  [<ffffffff811b1e29>] xfs_vm_writepages+0x51/0x5c
May 13 18:47:18 vpc5 kernel: [ 1079.835723]  [<ffffffff810de05b>] do_writepages+0x21/0x2f
May 13 18:47:18 vpc5 kernel: [ 1079.835724]  [<ffffffff81142c1c>] __writeback_single_inode+0x7b/0x238
May 13 18:47:18 vpc5 kernel: [ 1079.835726]  [<ffffffff81143e74>] writeback_sb_inodes+0x220/0x37a
May 13 18:47:18 vpc5 kernel: [ 1079.835728]  [<ffffffff81144042>] __writeback_inodes_wb+0x74/0xb9
May 13 18:47:18 vpc5 kernel: [ 1079.835729]  [<ffffffff811441c5>] wb_writeback+0x13e/0x2a3
May 13 18:47:18 vpc5 kernel: [ 1079.835731]  [<ffffffff8114463b>] wb_do_writeback+0x163/0x1d9
May 13 18:47:18 vpc5 kernel: [ 1079.835733]  [<ffffffff8114471d>] bdi_writeback_workfn+0x6c/0xfe
May 13 18:47:18 vpc5 kernel: [ 1079.835735]  [<ffffffff8104f384>] process_one_work+0x213/0x397
May 13 18:47:18 vpc5 kernel: [ 1079.835737]  [<ffffffff8104f2e0>] ? process_one_work+0x16f/0x397
May 13 18:47:18 vpc5 kernel: [ 1079.835739]  [<ffffffff8104f925>] worker_thread+0x149/0x224
May 13 18:47:18 vpc5 kernel: [ 1079.835741]  [<ffffffff8104f7dc>] ? rescuer_thread+0x2a5/0x2a5
May 13 18:47:18 vpc5 kernel: [ 1079.835742]  [<ffffffff810550b2>] kthread+0xa2/0xaa
May 13 18:47:18 vpc5 kernel: [ 1079.835744]  [<ffffffff81055010>] ? __kthread_parkme+0x65/0x65
May 13 18:47:18 vpc5 kernel: [ 1079.835746]  [<ffffffff814be1dc>] ret_from_fork+0x7c/0xb0
May 13 18:47:18 vpc5 kernel: [ 1079.835748]  [<ffffffff81055010>] ? __kthread_parkme+0x65/0x65
May 13 18:47:18 vpc5 kernel: [ 1079.835749] 5 locks held by kworker/u16:5/241:
May 13 18:47:18 vpc5 kernel: [ 1079.835767]  #0:  (writeback){......}, at: [<ffffffff8104f2e0>] process_one_work+0x16f/0x397
May 13 18:47:18 vpc5 kernel: [ 1079.835772]  #1:  ((&(&wb->dwork)->work)){......}, at: [<ffffffff8104f2e0>] process_one_work+0x16f/0x397
May 13 18:47:18 vpc5 kernel: [ 1079.835776]  #2:  (&type->s_umount_key#18){......}, at: [<ffffffff81123d4f>] grab_super_passive+0x60/0x8a
May 13 18:47:18 vpc5 kernel: [ 1079.835782]  #3:  (sb_internal){......}, at: [<ffffffff811ff638>] xfs_trans_alloc+0x24/0x3d
May 13 18:47:18 vpc5 kernel: [ 1079.835788]  #4:  (&(&ip->i_lock)->mr_lock){......}, at: [<ffffffff811f18d0>] xfs_ilock+0x99/0xd6
May 13 18:47:18 vpc5 kernel: [ 1079.835802] INFO: task dpkg:5825 blocked for more than 120 seconds.
May 13 18:47:18 vpc5 kernel: [ 1079.835863] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 18:47:18 vpc5 kernel: [ 1079.835938] dpkg            D 0000000000000002     0  5825   5789 0x00000000
May 13 18:47:18 vpc5 kernel: [ 1079.835942]  ffff8803f57cd778 0000000000000002 0000000000000001 ffff8803f8f9c600
May 13 18:47:18 vpc5 kernel: [ 1079.835944]  ffff8803f57cd778 ffff8803f57cdfd8 0000000000063800 ffff8803f57cdfd8
May 13 18:47:18 vpc5 kernel: [ 1079.835945]  0000000000063800 ffff8803f659c600 ffff880300000001 ffff8803f659c600
May 13 18:47:18 vpc5 kernel: [ 1079.835947] Call Trace:
May 13 18:47:18 vpc5 kernel: [ 1079.835950]  [<ffffffff814b7711>] schedule+0x75/0x87
May 13 18:47:18 vpc5 kernel: [ 1079.835952]  [<ffffffff814b6790>] schedule_timeout+0x37/0xf9
May 13 18:47:18 vpc5 kernel: [ 1079.835953]  [<ffffffff814b6f25>] ? __wait_for_common+0x2a/0xda
May 13 18:47:18 vpc5 kernel: [ 1079.835955]  [<ffffffff814b6f78>] __wait_for_common+0x7d/0xda
May 13 18:47:18 vpc5 kernel: [ 1079.835957]  [<ffffffff814b6759>] ? console_conditional_schedule+0x19/0x19
May 13 18:47:18 vpc5 kernel: [ 1079.835959]  [<ffffffff814b6ff9>] wait_for_completion+0x24/0x26
May 13 18:47:18 vpc5 kernel: [ 1079.835960]  [<ffffffff811da0fe>] xfs_bmapi_allocate+0xd8/0xea
May 13 18:47:18 vpc5 kernel: [ 1079.835962]  [<ffffffff811da453>] xfs_bmapi_write+0x343/0x59b
May 13 18:47:18 vpc5 kernel: [ 1079.835965]  [<ffffffff811d7d75>] ? __xfs_bmapi_allocate+0x23c/0x23c
May 13 18:47:18 vpc5 kernel: [ 1079.835967]  [<ffffffff811bfb94>] xfs_iomap_write_allocate+0x1b9/0x2c2
May 13 18:47:18 vpc5 kernel: [ 1079.835970]  [<ffffffff811b2218>] xfs_map_blocks+0x12b/0x203
May 13 18:47:18 vpc5 kernel: [ 1079.835972]  [<ffffffff811b326a>] xfs_vm_writepage+0x280/0x4b8
May 13 18:47:18 vpc5 kernel: [ 1079.835975]  [<ffffffff810dc27c>] __writepage+0x18/0x37
May 13 18:47:18 vpc5 kernel: [ 1079.835977]  [<ffffffff810dcdd0>] write_cache_pages+0x25a/0x37e
May 13 18:47:18 vpc5 kernel: [ 1079.835978]  [<ffffffff810dc264>] ? page_index+0x1a/0x1a
May 13 18:47:18 vpc5 kernel: [ 1079.835981]  [<ffffffff810dcf35>] generic_writepages+0x41/0x5b
May 13 18:47:18 vpc5 kernel: [ 1079.835983]  [<ffffffff811b1e29>] xfs_vm_writepages+0x51/0x5c
May 13 18:47:18 vpc5 kernel: [ 1079.835985]  [<ffffffff810de05b>] do_writepages+0x21/0x2f
May 13 18:47:18 vpc5 kernel: [ 1079.835987]  [<ffffffff810d54b6>] __filemap_fdatawrite_range+0x53/0x55
May 13 18:47:18 vpc5 kernel: [ 1079.835989]  [<ffffffff810d5f57>] filemap_fdatawrite_range+0x13/0x15
May 13 18:47:18 vpc5 kernel: [ 1079.835991]  [<ffffffff811479c6>] SyS_sync_file_range+0xe9/0x12d
May 13 18:47:18 vpc5 kernel: [ 1079.835993]  [<ffffffff814be482>] tracesys+0xdd/0xe2
May 13 18:47:18 vpc5 kernel: [ 1079.835994] 2 locks held by dpkg/5825:
May 13 18:47:18 vpc5 kernel: [ 1079.836006]  #0:  (sb_internal){......}, at: [<ffffffff811ff638>] xfs_trans_alloc+0x24/0x3d
May 13 18:47:18 vpc5 kernel: [ 1079.836011]  #1:  (&(&ip->i_lock)->mr_lock){......}, at: [<ffffffff811f18d0>] xfs_ilock+0x99/0xd6
May 13 18:47:18 vpc5 kernel: [ 1079.836017] INFO: task shutdown_watche:5964 blocked for more than 120 seconds.
May 13 18:47:18 vpc5 kernel: [ 1079.836091] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 18:47:18 vpc5 kernel: [ 1079.836165] shutdown_watche D 0000000000000002     0  5964   2604 0x00000000
May 13 18:47:18 vpc5 kernel: [ 1079.836169]  ffff8803c1561ce8 0000000000000002 ffff8803c1561ce8 ffffffff81a1a400
May 13 18:47:18 vpc5 kernel: [ 1079.836171]  ffff8803f63b2300 ffff8803c1561fd8 0000000000063800 ffff8803c1561fd8
May 13 18:47:18 vpc5 kernel: [ 1079.836172]  0000000000063800 ffff8803f63b2300 ffff880300000001 ffff8803f63b2300
May 13 18:47:18 vpc5 kernel: [ 1079.836174] Call Trace:
May 13 18:47:18 vpc5 kernel: [ 1079.836177]  [<ffffffff814b7711>] schedule+0x75/0x87
May 13 18:47:18 vpc5 kernel: [ 1079.836179]  [<ffffffff814b6790>] schedule_timeout+0x37/0xf9
May 13 18:47:18 vpc5 kernel: [ 1079.836180]  [<ffffffff814b6f25>] ? __wait_for_common+0x2a/0xda
May 13 18:47:18 vpc5 kernel: [ 1079.836183]  [<ffffffff814b6f78>] __wait_for_common+0x7d/0xda
May 13 18:47:18 vpc5 kernel: [ 1079.836184]  [<ffffffff814b6759>] ? console_conditional_schedule+0x19/0x19
May 13 18:47:18 vpc5 kernel: [ 1079.836186]  [<ffffffff8104dce4>] ? flush_work+0x198/0x209
May 13 18:47:18 vpc5 kernel: [ 1079.836188]  [<ffffffff814b6ff9>] wait_for_completion+0x24/0x26
May 13 18:47:18 vpc5 kernel: [ 1079.836189]  [<ffffffff8104dd2d>] flush_work+0x1e1/0x209
May 13 18:47:18 vpc5 kernel: [ 1079.836191]  [<ffffffff8104dce4>] ? flush_work+0x198/0x209
May 13 18:47:18 vpc5 kernel: [ 1079.836193]  [<ffffffff8104c4b4>] ? create_and_start_worker+0x6e/0x6e
May 13 18:47:18 vpc5 kernel: [ 1079.836195]  [<ffffffff810dfe68>] ? __pagevec_release+0x2c/0x2c
May 13 18:47:18 vpc5 kernel: [ 1079.836197]  [<ffffffff8104ff67>] schedule_on_each_cpu+0xca/0x104
May 13 18:47:18 vpc5 kernel: [ 1079.836198]  [<ffffffff810dfe8d>] lru_add_drain_all+0x15/0x18
May 13 18:47:18 vpc5 kernel: [ 1079.836200]  [<ffffffff810f7aa9>] SyS_mlockall+0x48/0x11d
May 13 18:47:18 vpc5 kernel: [ 1079.836202]  [<ffffffff814be482>] tracesys+0xdd/0xe2
May 13 18:47:18 vpc5 kernel: [ 1079.836203] no locks held by shutdown_watche/5964.
May 13 18:47:18 vpc5 kernel: [ 1079.836212] INFO: task j1939_vehicle_m:5965 blocked for more than 120 seconds.
May 13 18:47:18 vpc5 kernel: [ 1079.836312] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 18:47:18 vpc5 kernel: [ 1079.836388] j1939_vehicle_m D 0000000000000002     0  5965   2604 0x00000000
May 13 18:47:18 vpc5 kernel: [ 1079.836394]  ffff8803c14f9ce8 0000000000000002 ffff8803c14f9ce8 ffff8803f8f9d780
May 13 18:47:18 vpc5 kernel: [ 1079.836396]  ffff8803f63b4600 ffff8803c14f9fd8 0000000000063800 ffff8803c14f9fd8
May 13 18:47:18 vpc5 kernel: [ 1079.836398]  0000000000063800 ffff8803f63b4600 ffff880300000001 ffff8803f63b4600
May 13 18:47:18 vpc5 kernel: [ 1079.836402] Call Trace:
May 13 18:47:18 vpc5 kernel: [ 1079.836407]  [<ffffffff814b7711>] schedule+0x75/0x87
May 13 18:47:18 vpc5 kernel: [ 1079.836410]  [<ffffffff814b6790>] schedule_timeout+0x37/0xf9
May 13 18:47:18 vpc5 kernel: [ 1079.836412]  [<ffffffff814b6f25>] ? __wait_for_common+0x2a/0xda
May 13 18:47:18 vpc5 kernel: [ 1079.836415]  [<ffffffff814b6f78>] __wait_for_common+0x7d/0xda
May 13 18:47:18 vpc5 kernel: [ 1079.836417]  [<ffffffff814b6759>] ? console_conditional_schedule+0x19/0x19
May 13 18:47:18 vpc5 kernel: [ 1079.836420]  [<ffffffff8104dce4>] ? flush_work+0x198/0x209
May 13 18:47:18 vpc5 kernel: [ 1079.836423]  [<ffffffff814b6ff9>] wait_for_completion+0x24/0x26
May 13 18:47:18 vpc5 kernel: [ 1079.836425]  [<ffffffff8104dd2d>] flush_work+0x1e1/0x209
May 13 18:47:18 vpc5 kernel: [ 1079.836427]  [<ffffffff8104dce4>] ? flush_work+0x198/0x209
May 13 18:47:18 vpc5 kernel: [ 1079.836431]  [<ffffffff8104c4b4>] ? create_and_start_worker+0x6e/0x6e
May 13 18:47:18 vpc5 kernel: [ 1079.836434]  [<ffffffff810dfe68>] ? __pagevec_release+0x2c/0x2c
May 13 18:47:18 vpc5 kernel: [ 1079.836437]  [<ffffffff8104ff67>] schedule_on_each_cpu+0xca/0x104
May 13 18:47:18 vpc5 kernel: [ 1079.836439]  [<ffffffff810dfe8d>] lru_add_drain_all+0x15/0x18
May 13 18:47:18 vpc5 kernel: [ 1079.836441]  [<ffffffff810f7aa9>] SyS_mlockall+0x48/0x11d
May 13 18:47:18 vpc5 kernel: [ 1079.836444]  [<ffffffff814be482>] tracesys+0xdd/0xe2
May 13 18:47:18 vpc5 kernel: [ 1079.836446] no locks held by j1939_vehicle_m/5965.

* Re: Filesystem lockup with CONFIG_PREEMPT_RT
@ 2014-05-21 19:30 ` John Blackwood
  0 siblings, 0 replies; 43+ messages in thread
From: John Blackwood @ 2014-05-21 19:30 UTC (permalink / raw)
  To: Richard Weinberger, Austin Schuh; +Cc: linux-kernel, xfs, linux-rt-users

 > Date: Wed, 21 May 2014 03:33:49 -0400
 > From: Richard Weinberger <richard.weinberger@gmail.com>
 > To: Austin Schuh <austin@peloton-tech.com>
 > CC: LKML <linux-kernel@vger.kernel.org>, xfs <xfs@oss.sgi.com>, rt-users
 > 	<linux-rt-users@vger.kernel.org>
 > Subject: Re: Filesystem lockup with CONFIG_PREEMPT_RT
 >
 > CC'ing RT folks
 >
 > On Wed, May 21, 2014 at 8:23 AM, Austin Schuh <austin@peloton-tech.com> wrote:
 > > > On Tue, May 13, 2014 at 7:29 PM, Austin Schuh <austin@peloton-tech.com> wrote:
 > >> >> Hi,
 > >> >>
 > >> >> I am observing a filesystem lockup with XFS on a CONFIG_PREEMPT_RT
 > >> >> patched kernel.  I have currently only triggered it using dpkg.  Dave
 > >> >> Chinner on the XFS mailing list suggested that it was an rt-kernel
 > >> >> workqueue issue rather than an XFS problem, after looking at the
 > >> >> kernel messages.
 > >> >>
 > >> >> The only modification to the kernel besides the RT patch is that I
 > >> >> have applied tglx's "genirq: Sanitize spurious interrupt detection of
 > >> >> threaded irqs" patch.
 > > >
 > > > I upgraded to 3.14.3-rt4, and the problem still persists.
 > > >
 > > > I turned on event tracing and tracked it down further.  I'm able to
 > > > lock it up by scping a new kernel debian package to /tmp/ on the
 > > > machine.  scp is locking the inode, and then scheduling
 > > > xfs_bmapi_allocate_worker in the work queue.  The work then never gets
 > > > run.  The kworkers then lock up waiting for the inode lock.
 > > >
 > > > Here are the relevant events from the trace.  ffff8803e9f10288
 > > > (blk_delay_work) gets run later on in the trace, but ffff8803b4c158d0
 > > > (xfs_bmapi_allocate_worker) never does.  The kernel then warns about
 > > > blocked tasks 120 seconds later.

Austin and Richard,

I'm not 100% sure that the patch below will fix your problem, but we
saw something that sounds very similar to your issue, involving the
nvidia driver and the preempt-rt patch.  The nvidia driver uses the
completion support to build its own notion of an internally used
semaphore.

Some tasks were failing to ever wake up from wait_for_completion()
calls, due to a race in the underlying do_wait_for_common() routine.
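
To make the failure mode concrete, here is a minimal userspace sketch
of the same pattern (illustration only; this is not the kernel code,
and the types and helpers are invented for the example).  It models
the -rt simple-wait completion, where complete() dequeues the waiter
it wakes, so a woken waiter that loses the race for 'done' has to put
itself back on the wait list before sleeping again:

#include <pthread.h>

struct waiter {
	struct waiter *next;
	pthread_cond_t cv;
};

struct completion {
	pthread_mutex_t lock;		/* stands in for x->wait.lock */
	int done;
	struct waiter *head;		/* the simple wait list */
};

static int on_list(struct completion *x, struct waiter *w)
{
	struct waiter *p;

	for (p = x->head; p; p = p->next)
		if (p == w)
			return 1;
	return 0;
}

static void dequeue(struct completion *x, struct waiter *w)
{
	struct waiter **p;

	for (p = &x->head; *p; p = &(*p)->next)
		if (*p == w) {
			*p = w->next;
			break;
		}
}

/* complete(): bump 'done', then wake *and dequeue* the first waiter. */
static void my_complete(struct completion *x)
{
	pthread_mutex_lock(&x->lock);
	x->done++;
	if (x->head) {
		struct waiter *w = x->head;

		x->head = w->next;	/* the waiter leaves the list here */
		pthread_cond_signal(&w->cv);
	}
	pthread_mutex_unlock(&x->lock);
}

static void my_wait_for_completion(struct completion *x)
{
	pthread_mutex_lock(&x->lock);
	if (!x->done) {
		struct waiter self = { .next = NULL };

		pthread_cond_init(&self.cv, NULL);
		self.next = x->head;		/* enqueue ourselves */
		x->head = &self;
		do {
			/* The fix: we may have been woken (and dequeued)
			 * yet lost the race for 'done' to another waiter;
			 * re-queue before sleeping again, or no future
			 * complete() will ever find us. */
			if (!on_list(x, &self)) {
				self.next = x->head;
				x->head = &self;
			}
			pthread_cond_wait(&self.cv, &x->lock);
		} while (!x->done);
		dequeue(x, &self);		/* may already be off-list */
		pthread_cond_destroy(&self.cv);
	}
	x->done--;				/* consume the completion */
	pthread_mutex_unlock(&x->lock);
}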

This is the patch that we used to fix this issue:

------------------- -------------------

Fix a race in the PREEMPT_RT wait-for-completion simple-wait code.

A wait_for_completion() waiter task can be awoken by a task calling
complete(), but fail to consume the 'done' completion resource if it
loses a race with another task calling wait_for_completion() just as
it is waking up.

In this case, the awoken task will call schedule_timeout() again
without being in the simple wait queue.

So if the awoken task is unable to claim the 'done' completion resource,
check to see if it needs to be re-inserted into the wait list before
waiting again in schedule_timeout().

Fix-by: John Blackwood <john.blackwood@ccur.com>
Index: b/kernel/sched/core.c
===================================================================
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3529,11 +3529,19 @@ static inline long __sched
 do_wait_for_common(struct completion *x,
 		   long (*action)(long), long timeout, int state)
 {
+	int again = 0;
+
 	if (!x->done) {
 		DEFINE_SWAITER(wait);
 
 		swait_prepare_locked(&x->wait, &wait);
 		do {
+			/* Check to see if we lost the race for 'done'
+			 * and are no longer in the wait list.
+			 */
+			if (unlikely(again) && list_empty(&wait.node))
+				swait_prepare_locked(&x->wait, &wait);
+
 			if (signal_pending_state(state, current)) {
 				timeout = -ERESTARTSYS;
 				break;
@@ -3542,6 +3550,7 @@ do_wait_for_common(struct completion *x,
 			raw_spin_unlock_irq(&x->wait.lock);
 			timeout = action(timeout);
 			raw_spin_lock_irq(&x->wait.lock);
+			again = 1;
 		} while (!x->done && timeout);
 		swait_finish_locked(&x->wait, &wait);
 		if (!x->done)


* Re: Filesystem lockup with CONFIG_PREEMPT_RT
@ 2014-07-05 19:30 Jan de Kruyf
  0 siblings, 0 replies; 43+ messages in thread
From: Jan de Kruyf @ 2014-07-05 19:30 UTC (permalink / raw)
  To: linux-rt-users

Good people,

Is it possible that the copy of the mail to the xfs mailing list below
is connected to this problem?
If so, is there anything I can do to help?  I am not a kernel developer,
but I have run debugging tools before; just give me instructions.

Greetings from dark Africa

Jan de Kruijf.


---------- Forwarded message ----------
From: Jan de Kruyf <jan.de.kruyf@gmail.com>
Date: Sat, Jul 5, 2014 at 2:41 PM
Subject: Data loss XFS with RT kernel on Debian.
To: xfs@oss.sgi.com


Hello,

While doing a reasonably I/O-intensive job, like rsyncing a
subdirectory from one place to another, or tarring it to a pipe
and untarring it at the other end, I note that the CPU usage goes
to practically 100%, and when I reset the computer after 5 minutes
or so, the writing has not finished at all.  However, on the stock
Debian kernel it works without a problem.
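
Concretely, jobs of roughly this shape trigger it (the paths here are
just placeholders):

root@jan:~# rsync -a /usr/src/tree/ /home/jan/copy/
root@jan:~# tar -C /usr/src/tree -cf - . | tar -C /home/jan/copy -xf -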

Could I still use this combination in an industrial environment,
reading and writing reasonably short text files?  So far I have not
experienced this problem in normal day-to-day use.  It reared its
head during the installation of gnat-gpl-2014-x86_64-linux-bin
from the http://libre.adacore.com/download/ page.  The offending
code is in the Makefile in the top-level directory.  The xterm
output will show you where it gets stuck.

Regards,

Jan de Kruijf.



Here are the details of the installation:

root@jan:~# xfs_info -V
xfs_info version 3.1.7

root@jan:~# xfs_info /usr
meta-data=/dev/sda3              isize=256    agcount=4, agsize=732416 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=2929664, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

This combination does not work:
root@jan:~# uname -a
Linux jan 3.14-0.bpo.1-rt-amd64 #1 SMP PREEMPT RT Debian
3.14.7-1~bpo70+1 (2014-06-21) x86_64 GNU/Linux

Kernel 3.10-0.bpo.3-rt-amd64 does not work either.

But this combination works:
root@jan:~# uname -a
Linux jan 3.2.0-4-amd64 #1 SMP Debian 3.2.57-3+deb7u2 x86_64 GNU/Linux

* Filesystem lockup with CONFIG_PREEMPT_RT
@ 2014-07-07  8:48 Jan de Kruyf
  2014-07-07 13:00 ` Thomas Gleixner
  2014-07-07 16:23 ` Austin Schuh
  0 siblings, 2 replies; 43+ messages in thread
From: Jan de Kruyf @ 2014-07-07  8:48 UTC (permalink / raw)
  To: linux-rt-users

Hope I am not stating the obvious, but

Linux JanDell 3.2.0-0.bpo.3-rt-686-pae #1 SMP PREEMPT RT Thu Aug 23
09:55:27 UTC 2012 i686 GNU/Linux
and
Linux jan 3.2.0-4-rt-amd64 #1 SMP PREEMPT RT Debian 3.2.57-3+deb7u2
x86_64 GNU/Linux

do not exhibit the problem that I experienced with XFS.
So the hiccup appeared somewhere between 3.2-rt and 3.10-rt, I would say.

cheers,

j.


Thread overview: 43+ messages
2014-05-14  2:29 Filesystem lockup with CONFIG_PREEMPT_RT Austin Schuh
2014-05-14  2:29 ` Austin Schuh
2014-05-21  6:23 ` Austin Schuh
2014-05-21  6:23   ` Austin Schuh
2014-05-21  7:33   ` Richard Weinberger
2014-05-21  7:33     ` Richard Weinberger
2014-06-26 19:50     ` Austin Schuh
2014-06-26 22:35       ` Thomas Gleixner
2014-06-27  0:07         ` Austin Schuh
2014-06-27  3:22           ` Mike Galbraith
2014-06-27 12:57           ` Mike Galbraith
2014-06-27 14:01             ` Steven Rostedt
2014-06-27 17:34               ` Mike Galbraith
2014-06-27 17:54                 ` Steven Rostedt
2014-06-27 18:07                   ` Mike Galbraith
2014-06-27 18:19                     ` Steven Rostedt
2014-06-27 19:11                       ` Mike Galbraith
2014-06-28  1:18                       ` Austin Schuh
2014-06-28  3:32                         ` Mike Galbraith
2014-06-28  6:20                           ` Austin Schuh
2014-06-28  7:11                             ` Mike Galbraith
2014-06-27 14:24           ` Thomas Gleixner
2014-06-28  4:51             ` Mike Galbraith
2014-07-01  0:12             ` Austin Schuh
2014-07-01  0:53               ` Austin Schuh
2014-07-05 20:26                 ` Thomas Gleixner
2014-07-06  4:55                   ` Austin Schuh
2014-07-01  3:01             ` Austin Schuh
2014-07-01 19:32               ` Austin Schuh
2014-07-03 23:08                 ` Austin Schuh
2014-07-04  4:42                   ` Mike Galbraith
2014-05-21 19:30 John Blackwood
2014-05-21 19:30 ` John Blackwood
2014-05-21 21:59 ` Austin Schuh
2014-05-21 21:59   ` Austin Schuh
2014-07-05 20:36 ` Thomas Gleixner
2014-07-05 20:36   ` Thomas Gleixner
2014-07-05 19:30 Jan de Kruyf
2014-07-07  8:48 Jan de Kruyf
2014-07-07 13:00 ` Thomas Gleixner
2014-07-07 16:23 ` Austin Schuh
2014-07-08  8:03   ` Jan de Kruyf
2014-07-08 16:09     ` Austin Schuh
