* softlockup with CONFIG_XFS_ONLINE_SCRUB enabled
@ 2019-10-25 10:24 Christoph Hellwig
  2019-10-27 18:32 ` Darrick J. Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2019-10-25 10:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Hi Darrick,

the current xfs tree seems to easily cause softlockups in generic/175
(and a few other tests, but not as reproducible) for me.  This is on
20GB 4k block size images on a VM with 4 CPUs and 4G of RAM.

Trace below

generic/175 files ... [ 1815.804060] run fstests generic/175 at 2019-10-25 08:32
[ 1816.168474] XFS (vdb): Mounting V5 Filesystem
[ 1816.674372] XFS (vdb): Ending clean mount
[ 1816.679621] Mounted xfs file system at /mnt/test supports timestamps until 2)
[ 1817.023736] XFS (vdc): Mounting V5 Filesystem
[ 1817.031661] XFS (vdc): Ending clean mount
[ 1817.035755] Mounted xfs file system at /mnt/scratch supports timestamps unti)
[ 1817.061998] XFS (vdc): Unmounting Filesystem
[ 1817.278028] XFS (vdc): Mounting V5 Filesystem
[ 1817.285522] XFS (vdc): Ending clean mount
[ 1817.289734] Mounted xfs file system at /mnt/scratch supports timestamps unti)
[ 1965.113511] XFS (vdc): Unmounting Filesystem
[ 1965.505196] XFS (vdc): Mounting V5 Filesystem
[ 1965.898141] XFS (vdc): Ending clean mount
[ 1965.903121] Mounted xfs file system at /mnt/scratch supports timestamps unti)
[ 2122.950581] XFS (vdb): Unmounting Filesystem
[ 2148.474472] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [xfs_scrub:161]
[ 2148.475763] Modules linked in:
[ 2148.476389] irq event stamp: 41692326
[ 2148.477095] hardirqs last  enabled at (41692325): [<ffffffff8232c3b7>] _raw_0
[ 2148.478878] hardirqs last disabled at (41692326): [<ffffffff81001c5a>] trace0
[ 2148.480493] softirqs last  enabled at (41684994): [<ffffffff8260031f>] __do_e
[ 2148.481977] softirqs last disabled at (41684987): [<ffffffff81127d8c>] irq_e0
[ 2148.483306] CPU: 3 PID: 16189 Comm: xfs_scrub Not tainted 5.4.0-rc3+ #30
[ 2148.484391] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.124
[ 2148.485755] RIP: 0010:_raw_spin_unlock_irqrestore+0x39/0x40
[ 2148.486699] Code: 89 f3 be 01 00 00 00 e8 d5 3a e5 fe 48 89 ef e8 ed 87 e5 f2
[ 2148.489710] RSP: 0018:ffffc9000233f970 EFLAGS: 00000286 ORIG_RAX: ffffffffff3
[ 2148.491004] RAX: ffff88813b398040 RBX: 0000000000000286 RCX: 0000000000000006
[ 2148.492160] RDX: 0000000000000006 RSI: ffff88813b3988c0 RDI: ffff88813b398040
[ 2148.493313] RBP: ffff888137958640 R08: 0000000000000001 R09: 0000000000000000
[ 2148.494466] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00042b0c00
[ 2148.495600] R13: 0000000000000001 R14: ffff88810ac32308 R15: ffff8881376fc040
[ 2148.497119] FS:  00007f6113dea700(0000) GS:ffff88813bb80000(0000) knlGS:00000
[ 2148.498606] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2148.499553] CR2: 00007f6113de8ff8 CR3: 000000012f290000 CR4: 00000000000006e0
[ 2148.500718] Call Trace:
[ 2148.501145]  free_debug_processing+0x1dd/0x240
[ 2148.501897]  ? xchk_ag_btcur_free+0x76/0xb0
[ 2148.502616]  __slab_free+0x231/0x410
[ 2148.503206]  ? mark_held_locks+0x48/0x70
[ 2148.503863]  ? _raw_spin_unlock_irqrestore+0x37/0x40
[ 2148.504698]  ? debug_check_no_obj_freed+0x110/0x1d7
[ 2148.505511]  ? xchk_ag_btcur_free+0x76/0xb0
[ 2148.506197]  kmem_cache_free+0x30e/0x360
[ 2148.506920]  xchk_ag_btcur_free+0x76/0xb0
[ 2148.507642]  xchk_ag_free+0x10/0x80
[ 2148.508254]  xchk_bmap_iextent_xref.isra.14+0xd9/0x120
[ 2148.509228]  xchk_bmap_iextent+0x187/0x210
[ 2148.510000]  xchk_bmap+0x2e0/0x3b0
[ 2148.510643]  xfs_scrub_metadata+0x2e7/0x500
[ 2148.511421]  xfs_ioc_scrub_metadata+0x4a/0xa0
[ 2148.512166]  xfs_file_ioctl+0x58a/0xcd0
[ 2148.512793]  ? __lock_acquire+0x252/0x1470
[ 2148.513485]  ? find_held_lock+0x2d/0x90
[ 2148.514125]  do_vfs_ioctl+0xa0/0x6f0
[ 2148.514730]  ? __fget+0x101/0x1e0
[ 2148.515279]  ksys_ioctl+0x5b/0x90
[ 2148.515834]  __x64_sys_ioctl+0x11/0x20
[ 2148.516464]  do_syscall_64+0x4b/0x1a0
[ 2148.517077]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2148.517921] RIP: 0033:0x7f6114a08f07
[ 2148.518525] Code: b3 66 90 48 8b 05 81 5f 2c 00 64 c7 00 26 00 00 00 48 c7 c8
[ 2148.521539] RSP: 002b:00007f6113de97e8 EFLAGS: 00000246 ORIG_RAX: 00000000000
[ 2148.522781] RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f6114a08f07
[ 2148.523965] RDX: 00007f6113de9940 RSI: 00000000c040583c RDI: 0000000000000003
[ 2148.525128] RBP: 00007f6113de9940 R08: 000000000000000e R09: 000055761ead0256
[ 2148.526284] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff7c3f41f0
[ 2148.527605] R13: 00007fff7c3f4050 R14: 00007f6113de9800 R15: 0000000000000006

Message from syslogd@localhost at Oct 25 08:43:15 ...
 kernel:[ 2148.474472] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [xfs_s]
[ 2149.498479] rcu: INFO: rcu_sched self-detected stall on CPU
[ 2149.499799] rcu: 	3-....: (6486 ticks this GP) idle=662/1/0x4000000000000 
[ 2149.502054] 	(t=6500 jiffies g=455813 q=201)
[ 2149.503003] NMI backtrace for cpu 3
[ 2149.503787] CPU: 3 PID: 16189 Comm: xfs_scrub Tainted: G             L    5.0
[ 2149.505599] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.124
[ 2149.507451] Call Trace:
[ 2149.508009]  <IRQ>
[ 2149.508473]  dump_stack+0x67/0x90
[ 2149.509170]  nmi_cpu_backtrace.cold.6+0x13/0x50
[ 2149.510184]  ? lapic_can_unplug_cpu.cold.31+0x3e/0x3e
[ 2149.511301]  nmi_trigger_cpumask_backtrace+0x8b/0x98
[ 2149.512400]  rcu_dump_cpu_stacks+0x8c/0xb8
[ 2149.513322]  rcu_sched_clock_irq.cold.91+0x1c0/0x3c0
[ 2149.514418]  ? tick_sched_do_timer+0x50/0x50
[ 2149.515367]  ? rcu_read_lock_sched_held+0x4d/0x60
[ 2149.516416]  ? tick_sched_do_timer+0x50/0x50
[ 2149.517368]  update_process_times+0x1f/0x50
[ 2149.518334]  tick_sched_handle+0x1d/0x50
[ 2149.519205]  tick_sched_timer+0x32/0x70
[ 2149.520063]  __hrtimer_run_queues+0x119/0x3e0
[ 2149.521035]  hrtimer_interrupt+0xef/0x200
[ 2149.521950]  smp_apic_timer_interrupt+0x7c/0x1f0
[ 2149.522980]  apic_timer_interrupt+0xf/0x20
[ 2149.523862]  </IRQ>
[ 2149.524297] RIP: 0010:__slab_alloc.constprop.93+0x4f/0x60
[ 2149.525475] Code: 44 89 e6 e8 13 fa ff ff f6 c7 02 48 89 c5 75 13 53 9d e8 f0
[ 2149.529618] RSP: 0018:ffffc9000233f8f0 EFLAGS: 00000246 ORIG_RAX: ffffffffff3
[ 2149.531338] RAX: ffff88813b398040 RBX: 0000000000000246 RCX: 0000000000000006
[ 2149.532929] RDX: 0000000000000006 RSI: ffff88813b3988c0 RDI: ffff88813b398040
[ 2149.534510] RBP: ffff888109016d28 R08: ffff8881376fc5c0 R09: 0000000000000000
[ 2149.536078] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000002dc0
[ 2149.537707] R13: ffffffff816990d6 R14: ffff88813bbb1780 R15: ffffffff816a271b
[ 2149.539282]  ? xfs_buf_item_init+0x6b/0x220
[ 2149.540218]  ? kmem_zone_alloc+0x96/0x1a0
[ 2149.541115]  ? kmem_zone_alloc+0x96/0x1a0
[ 2149.542018]  kmem_cache_alloc+0x1e2/0x220
[ 2149.542907]  ? xfs_buf_item_init+0x6b/0x220
[ 2149.543844]  kmem_zone_alloc+0x96/0x1a0
[ 2149.544705]  xfs_buf_item_init+0x6b/0x220
[ 2149.545610]  _xfs_trans_bjoin+0x26/0xf0
[ 2149.546468]  xfs_trans_read_buf_map+0x17f/0x5f0
[ 2149.547481]  xfs_read_agi+0xb1/0x1c0
[ 2149.548290]  xfs_ialloc_read_agi+0x45/0x170
[ 2149.549250]  xchk_ag_read_headers+0x29/0xa0
[ 2149.550204]  xchk_ag_init+0x1c/0x30
[ 2149.550988]  xchk_bmap_iextent_xref.isra.14+0x48/0x120
[ 2149.552135]  xchk_bmap_iextent+0x187/0x210
[ 2149.553061]  xchk_bmap+0x2e0/0x3b0
[ 2149.553847]  xfs_scrub_metadata+0x2e7/0x500
[ 2149.554727]  xfs_ioc_scrub_metadata+0x4a/0xa0
[ 2149.555618]  xfs_file_ioctl+0x58a/0xcd0
[ 2149.556496]  ? __lock_acquire+0x252/0x1470
[ 2149.557460]  ? find_held_lock+0x2d/0x90
[ 2149.558342]  do_vfs_ioctl+0xa0/0x6f0
[ 2149.559124]  ? __fget+0x101/0x1e0
[ 2149.559880]  ksys_ioctl+0x5b/0x90
[ 2149.560635]  __x64_sys_ioctl+0x11/0x20
[ 2149.561481]  do_syscall_64+0x4b/0x1a0
[ 2149.562300]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2149.563426] RIP: 0033:0x7f6114a08f07
[ 2149.564226] Code: b3 66 90 48 8b 05 81 5f 2c 00 64 c7 00 26 00 00 00 48 c7 c8
[ 2149.568334] RSP: 002b:00007f6113de97e8 EFLAGS: 00000246 ORIG_RAX: 00000000000
[ 2149.570071] RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f6114a08f07
[ 2149.571718] RDX: 00007f6113de9940 RSI: 00000000c040583c RDI: 0000000000000003
[ 2149.573367] RBP: 00007f6113de9940 R08: 000000000000000e R09: 000055761ead0256
[ 2149.574984] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff7c3f41f0
[ 2149.576633] R13: 00007fff7c3f4050 R14: 00007f6113de9800 R15: 0000000000000006


* Re: softlockup with CONFIG_XFS_ONLINE_SCRUB enabled
  2019-10-25 10:24 softlockup with CONFIG_XFS_ONLINE_SCRUB enabled Christoph Hellwig
@ 2019-10-27 18:32 ` Darrick J. Wong
  2019-10-28  7:30   ` Christoph Hellwig
  0 siblings, 1 reply; 4+ messages in thread
From: Darrick J. Wong @ 2019-10-27 18:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Fri, Oct 25, 2019 at 12:24:04PM +0200, Christoph Hellwig wrote:
> Hi Darrick,
> 
> the current xfs tree seems to easily cause softlockups in generic/175
> (and a few other tests, but not as reproducible) for me.  This is on
> 20GB 4k block size images on a VM with 4 CPUs and 4G of RAM.

Hrm.  I haven't seen that before... what's your kernel config?
This looks like some kind of lockup in slub debugging...?

Also, is this a new thing?  Or something that used to happen with low
frequency but has slowly increased to the point that it's annoying?

(Or something else?)

--D

> Trace below
> 
> [trace snipped]


* Re: softlockup with CONFIG_XFS_ONLINE_SCRUB enabled
  2019-10-27 18:32 ` Darrick J. Wong
@ 2019-10-28  7:30   ` Christoph Hellwig
  2019-10-29  3:58     ` Darrick J. Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2019-10-28  7:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

[-- Attachment #1: Type: text/plain, Size: 831 bytes --]

On Sun, Oct 27, 2019 at 11:32:32AM -0700, Darrick J. Wong wrote:
> On Fri, Oct 25, 2019 at 12:24:04PM +0200, Christoph Hellwig wrote:
> > Hi Darrick,
> > 
> > the current xfs tree seems to easily cause softlockups in generic/175
> > (and a few other tests, but not as reproducible) for me.  This is on
> > 20GB 4k block size images on a VM with 4 CPUs and 4G of RAM.
> 
> Hrm.  I haven't seen that before... what's your kernel config?
> This looks like some kind of lockup in slub debugging...?
> 
> Also, is this a new thing?  Or something that used to happen with low
> frequency but has slowly increased to the point that it's annoying?
> 
> (Or something else?)

Seems to happen with 5.3 as well.  I only recently turned
CONFIG_XFS_ONLINE_SCRUB back on in my usual test config, which is what
made it show up.

.config attached.

[-- Attachment #2: config.gz --]
[-- Type: application/x-gzip, Size: 32546 bytes --]


* Re: softlockup with CONFIG_XFS_ONLINE_SCRUB enabled
  2019-10-28  7:30   ` Christoph Hellwig
@ 2019-10-29  3:58     ` Darrick J. Wong
  0 siblings, 0 replies; 4+ messages in thread
From: Darrick J. Wong @ 2019-10-29  3:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Oct 28, 2019 at 08:30:03AM +0100, Christoph Hellwig wrote:
> On Sun, Oct 27, 2019 at 11:32:32AM -0700, Darrick J. Wong wrote:
> > On Fri, Oct 25, 2019 at 12:24:04PM +0200, Christoph Hellwig wrote:
> > > Hi Darrick,
> > > 
> > > the current xfs tree seems to easily cause softlockups in generic/175
> > > (and a few other tests, but not as reproducible) for me.  This is on
> > > 20GB 4k block size images on a VM with 4 CPUs and 4G of RAM.
> > 
> > Hrm.  I haven't seen that before... what's your kernel config?
> > This looks like some kind of lockup in slub debugging...?
> > 
> > Also, is this a new thing?  Or something that used to happen with low
> > frequency but has slowly increased to the point that it's annoying?
> > 
> > (Or something else?)
> 
> Seems to happen with 5.3 as well.  I only recently turned
> CONFIG_XFS_ONLINE_SCRUB back on in my usual test config, which is what
> made it show up.
> 
> .config attached.

Aha, you have preempt disabled and slub debugging on by default, which
(on the million-extent files produced by generic/175) means that scrub
takes long enough to trip the soft lockup watchdog while checking the
bmap.  The test eventually finishes, but the obvious(ly stupid) bandaid
of calling touch_softlockup_watchdog merely plunged the VM into
"rcu_sched self-detected stall on CPU" messages, so as it's late I'll
set it aside until tomorrow.

IOWs I think I know what's going on but don't yet know how to fix it. :/

--D


end of thread, other threads:[~2019-10-29  3:58 UTC | newest]

Thread overview: 4+ messages
-- links below jump to the message on this page --
2019-10-25 10:24 softlockup with CONFIG_XFS_ONLINE_SCRUB enabled Christoph Hellwig
2019-10-27 18:32 ` Darrick J. Wong
2019-10-28  7:30   ` Christoph Hellwig
2019-10-29  3:58     ` Darrick J. Wong
